Search Results: "error"

11 May 2026

Freexian Collaborators: Debusine workflow performance issues (by Colin Watson)

During March and April, we had a number of performance issues that made Debusine's core functions of running work requests and reflecting their results in workflows quite unreliable. Investigating and fixing this took up a lot of time from both the Debusine development team and Freexian's sysadmins. The central problems involved a series of database concurrency and worker communication issues that interacted in complex ways. On bad days, this caused between 10% and 25% of processed work requests to fail unnecessarily. We communicated some of the problems to users on IRC, but not consistently, since we didn't entirely understand the scope of the problems at the time. Most of the problems are fixed now, but we had a retrospective meeting to make sure we understood what happened and that we learn from it. Here's a summary.

Data model

Debusine's workflows consist of many individual work requests. Each work request has a database row representing its state, which means that the overall state of a workflow is distributed across many rows. Changes to one work request (for example, when it is completed) can cause changes to other work requests (perhaps unblocking them so that they can be scheduled to an idle worker). Those changes may happen concurrently, and in practice often do.

Workers typically need to create artifacts containing the output of tasks: these include things like packages, build logs, and test output.

Debusine records task history so that it can make better decisions about how to schedule work requests. Since this might otherwise grow without bound, the server expires older parts of that history after a while. The same is true for many other kinds of data.

Causes
  • Because workflows involve changes that propagate between work requests, there were historically some cases where different parts of the system could deadlock due to trying to take update locks on overlapping sets of work request rows in different orders. We mitigated that somewhere around 2025-11-05 by locking entire workflows in one go before making any change that might need to propagate between work requests like this; that dealt with the deadlocks, but it's quite a heavyweight locking strategy that sometimes caused significant delays.
  • We've been working for some time to make Debusine useful to Debian developers, and regression tracking is an important part of that: it lets developers test uploads without being too badly misled by tests in related packages that were already failing before they started. On 2026-03-11 we enabled this by default on debusine.debian.net, after testing it for a while. Although this is useful, it put more load on the system as a whole, often approximately doubling the number of work requests in a given workflow with many additional dependencies between them.
  • Like much of the world, we're in an arms race with unethical scrapers desperately trying to feed everyone else's data into LLMs before they run out of money. We saw a substantial uptick here towards the end of March, which meant that we had to temporarily disable regression tracking and to put some other mitigations in front of our web interface.
  • We historically haven't had systematic internal timeouts. Prompted by ruff, a Google Summer of Code applicant went through and added timeouts in many places, including some calls between the worker and the server. This was fiddly work and the student did a solid job, so I'm not putting them on blast or anything! However, it did mean that some things that came in under load balancer timeouts now timed out earlier on the client side of the request (and hence in Debusine workers), which made some problems show up in different ways and be more obvious. This was deployed on 2026-04-03.
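
The deadlock in the first cause above comes from two parties locking overlapping rows in different orders. As a rough illustration only, the Python sketch below uses in-process threading locks as a stand-in for database row locks and shows the common remedy of always acquiring locks in one agreed order. The names row_locks and update_rows are hypothetical and this is not Debusine's code; Debusine's own mitigation was to lock the whole workflow in one go.

```python
import threading

# Hypothetical stand-in for per-row locks, keyed by work request id.
row_locks = {wr_id: threading.Lock() for wr_id in range(1, 6)}


def update_rows(name, wr_ids, completed):
    """Lock every requested row in ascending id order, then record completion."""
    ordered = sorted(set(wr_ids))
    for wr_id in ordered:
        row_locks[wr_id].acquire()
    try:
        completed.append(name)  # placeholder for the real row updates
    finally:
        for wr_id in reversed(ordered):
            row_locks[wr_id].release()


def run_demo():
    """Run two updaters that name overlapping rows in opposite orders."""
    completed = []
    threads = [
        threading.Thread(target=update_rows, args=("a", [1, 2, 3], completed)),
        threading.Thread(target=update_rows, args=("b", [3, 2, 1], completed)),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join(timeout=5)
    return sorted(completed)
```

Because both callers end up acquiring locks 1, 2, 3 in that order, neither can hold a lock the other needs while waiting for one the other holds.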

Fixes

Workflow orchestration

Figuring out what individual work requests need to be run as part of a workflow - the process we call "orchestration" - can be challenging. Unlike typical CI pipelines, these workflows often span substantial chunks of a distribution: a glibc update can involve retesting nearly everything! Nevertheless, it's not particularly helpful for it to take hours just to build the workflow graph.

Fixing this involved many classic database optimizations such as adding indexes and CTEs, but probably the most effective fix was adding a cache for lookups within each orchestrator run or work request. Profiling showed that resolving lookups was a hot spot, and the way that task data is often passed down through a workflow meant that the same lookup could be resolved hundreds or thousands of times in a large workflow.
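
A per-run lookup cache of the kind described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: LookupCache, slow_resolve and orchestrate are hypothetical names, and the resolver is a placeholder for whatever database queries the real lookup performs.

```python
class LookupCache:
    """Memoize lookup resolution for the lifetime of one orchestrator run."""

    def __init__(self, resolve):
        self._resolve = resolve  # the expensive resolver (hypothetical)
        self._results = {}
        self.misses = 0

    def get(self, lookup):
        if lookup not in self._results:
            self.misses += 1
            self._results[lookup] = self._resolve(lookup)
        return self._results[lookup]


def slow_resolve(lookup):
    """Placeholder for a resolver that would normally query the database."""
    return f"artifact-for:{lookup}"


def orchestrate(lookups):
    """Resolve every lookup using a cache that lives only for this run."""
    cache = LookupCache(slow_resolve)
    resolved = [cache.get(lookup) for lookup in lookups]
    return resolved, cache.misses
```

Creating the cache inside orchestrate keeps its lifetime short, so a later run never sees results cached by an earlier one.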

Expiry

We knew for quite some time that our expiry job took very aggressive locks, effectively blocking most of the rest of the system. This was an early decision to make the expiry logic simpler by allowing it to follow graphs without worrying about concurrent activity, but it clearly couldn't stay that way forever.

"Row locks in PostgreSQL" was very helpful in figuring out the correct approach here. Since we're mainly concerned about the possibility of new foreign key references being created to artifacts we're considering for expiry, and since that would involve taking FOR KEY SHARE locks on those rows, we can explicitly take FOR UPDATE locks (which conflict with FOR KEY SHARE), and then recompute the set of artifacts to expire with any locked artifacts marked to keep. This was delicate work, but it saved minutes of downtime every day.
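
The choice of FOR UPDATE follows from PostgreSQL's documented row-level lock conflict table. The Python sketch below merely encodes that table as data so the reasoning can be checked; the helper names conflicts and modes_blocking are hypothetical.

```python
# PostgreSQL row-level lock modes and the modes each one conflicts with,
# transcribed from the conflict table in the PostgreSQL documentation.
CONFLICTS = {
    "FOR KEY SHARE": {"FOR UPDATE"},
    "FOR SHARE": {"FOR NO KEY UPDATE", "FOR UPDATE"},
    "FOR NO KEY UPDATE": {"FOR SHARE", "FOR NO KEY UPDATE", "FOR UPDATE"},
    "FOR UPDATE": {
        "FOR KEY SHARE",
        "FOR SHARE",
        "FOR NO KEY UPDATE",
        "FOR UPDATE",
    },
}


def conflicts(requested, held):
    """Return True if a requested lock mode must wait for a held one."""
    return held in CONFLICTS[requested]


def modes_blocking(requested):
    """List the held lock modes that would block the requested mode."""
    return sorted(mode for mode in CONFLICTS if conflicts(requested, mode))
```

Only FOR UPDATE blocks FOR KEY SHARE, which is why a weaker lock such as FOR NO KEY UPDATE would not stop a new foreign key reference from being created.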

Whole-workflow locking

I mentioned earlier that we avoided some deadlock issues by taking locks on entire workflows. To ensure that these locks are effective even against code that isn't specifically aware of them, this is implemented by using SELECT FOR UPDATE on all the work request rows in the workflow. In some cases the search for which rows to lock itself tripped up the PostgreSQL planner.

Scheduling

We run multiple Celery workers for various purposes. Some of them can do many things in parallel, but in some specific cases (notably the task scheduler) we only ever want a single instance to run at once. Unfortunately a bug in the systemd service meant that the scheduler often ran concurrently anyway! Once we fixed that, the scheduler logs became a lot less confusing.

When Debusine was small, it was reasonable for it to perform scheduling very aggressively, typically as soon as any change occurred to a work request or a worker that might possibly influence it. This doesn't scale very well, though, and even though we tried to batch multiple scheduling triggers that occurred within a single transaction, it could still make debugging very confusing. We reduced the number of changes that would result in immediate scheduling, and deferred everything else to a regular "tick".

The scheduler may not be able to assign a work request to an idle worker due to the workflow being locked. That isn't a major problem in itself; it can just try again later. However, in very large workflows, we found that it often worked its way down all the pending work requests one by one finding that each of them was locked, which was slow and also produced a huge amount of log noise. It now assumes that if a work request is locked, then it might as well skip other work requests in the same workflow until the next scheduler run.

Between them, these changes reduced the number of locks typically being held on debusine.debian.net by about 80%.

[Figure: lock graph]
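
The "skip the rest of a locked workflow" rule can be illustrated with a small Python sketch. It is a simplification under assumptions: assign_pending is a hypothetical function, pending work requests are plain (work request id, workflow id) pairs, and a set of locked workflow ids stands in for the real lock check.

```python
def assign_pending(pending, locked_workflows):
    """Assign pending work requests, probing each locked workflow only once.

    pending is a list of (work_request_id, workflow_id) pairs.
    Returns the assigned work request ids and the number of lock checks made.
    """
    assigned = []
    skipped_workflows = set()
    lock_checks = 0
    for work_request_id, workflow_id in pending:
        if workflow_id in skipped_workflows:
            continue  # already found locked during this scheduler run
        lock_checks += 1
        if workflow_id in locked_workflows:
            skipped_workflows.add(workflow_id)
            continue
        assigned.append(work_request_id)
    return assigned, lock_checks
```

With one locked workflow holding a thousand pending work requests, this performs a single lock check for that workflow instead of a thousand, and leaves the rest for the next run.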

Worker refactoring

The Debusine worker has always been partially asynchronous, but while it was actually executing a task - in other words, most of the time, at least in busy periods - it didn't respond to inbound websocket messages, causing spurious disconnections. We restructured the whole worker to be fully event-based.

We also had to put quite a bit of effort into improving the path by which workers report work request completion, because if that hits a timeout then it can mean throwing away hours of work. We have some further improvements in mind, but for now we defer most of this work to a Celery task so that whole-workflow locks aren't on the critical path.
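
To show what staying responsive during a task can look like, here is a minimal asyncio sketch. It is not the Debusine worker: an asyncio.Queue stands in for the websocket, run_task is a placeholder blocking task, and all names are hypothetical. The point is only that the blocking work runs in a thread while the event loop keeps answering messages.

```python
import asyncio
import time


def run_task(duration):
    """Placeholder for a long, blocking task execution."""
    time.sleep(duration)
    return "completed"


async def answer_messages(inbox, replies):
    """Reply to each inbound message until a None sentinel arrives."""
    while True:
        message = await inbox.get()
        if message is None:
            return
        replies.append(f"pong:{message}")


async def worker():
    inbox = asyncio.Queue()  # stand-in for the websocket connection
    replies = []
    responder = asyncio.create_task(answer_messages(inbox, replies))
    # Run the blocking task in a thread so the event loop stays free.
    task = asyncio.create_task(asyncio.to_thread(run_task, 0.5))
    for n in range(3):
        await inbox.put(n)
        await asyncio.sleep(0.01)
    replies_during_task = list(replies)
    task_was_running = not task.done()
    result = await task
    await inbox.put(None)
    await responder
    return result, replies_during_task, task_was_running
```

Calling asyncio.run(worker()) returns the task result together with the replies that were sent while the task was still executing.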

Database write volume

One of our sysadmins observed that our database write volume was consistently very high. This was a puzzle, but for a long time we left that unexplored. Eventually we thought to ask PostgreSQL's own statistics, and we found a surprise:
debusine=> SELECT relname AS table_name,
debusine->        n_tup_ins AS inserts,
debusine->        n_tup_upd AS updates,
debusine->        n_tup_del AS deletes,
debusine->        (n_tup_ins + n_tup_upd + n_tup_del) AS total_dml
debusine-> FROM pg_stat_user_tables
debusine-> WHERE (n_tup_ins + n_tup_upd + n_tup_del) > 0
debusine-> ORDER BY total_dml DESC
debusine-> LIMIT 20;
              table_name              | inserts |  updates   | deletes | total_dml
--------------------------------------+---------+------------+---------+------------
 db_collectionitem                    | 1418251 | 3578202388 | 3630143 | 3583250782
 db_token                             |   15143 |   11212106 |   11389 |   11238638
 db_workrequest                       |  386196 |    6399071 | 1820500 |    8605767
 db_fileinartifact                    | 2783021 |    1837929 | 1663887 |    6284837
 django_celery_results_taskresult     | 1819301 |    1501623 | 1791656 |    5112580
 db_artifact                          |  960077 |    3340859 |  663890 |    4964826
 db_collectionitemmatchconstraint     | 1550457 |          0 | 2207486 |    3757943
 db_artifactrelation                  | 2229382 |          0 | 1363825 |    3593207
 db_fileupload                        | 1023400 |    1057036 | 1023346 |    3103782
 db_file                              | 1673194 |          0 |  970252 |    2643446
 db_fileinstore                       | 1411995 |          0 |  970259 |    2382254
 db_filestore                         |       0 |    2381578 |       0 |    2381578
 django_session                       |  645423 |    1519880 |     531 |    2165834
 db_workrequest_dependencies          |  365877 |          0 |  936537 |    1302414
 db_worker                            |   18317 |     949280 |    9487 |     977084
 db_collection                        |   10061 |         85 |  177741 |     187887
 db_workerpooltaskexecutionstatistics |   28721 |          0 |       0 |      28721
 db_workerpoolstatistics              |    1640 |          0 |       0 |       1640
 db_workflowtemplate                  |     130 |        158 |     649 |        937
 db_identity                          |      76 |        661 |       0 |        737
(20 rows)
Oh my - that's a lot of db_collectionitem updates and must surely be out of proportion with what we really need. Can we narrow that down by asking about the most recently-updated tuples?
debusine=> SELECT DISTINCT category
debusine-> FROM db_collectionitem
debusine-> WHERE id IN (
debusine->     SELECT id FROM db_collectionitem
debusine->     ORDER BY xmin::text::integer DESC LIMIT 10000
debusine-> );
           category
------------------------------
 debusine:historical-task-run
(1 row)
That might not be absolutely reliable, but it was certainly a hint. As per PostgreSQL's documentation, by default UPDATE always performs physical updates to every matching row regardless of whether the data has changed, and our code to expire old task history entries wasn't restricting its updates to rows that actually needed to change. Once we knew where to look, it was easy to add some extra constraints. This reduced our mean write volume on debusine.debian.net from about 23 MB/s to about 3 MB/s, which had an immediate knock-on effect on our request failure rate.

[Figures: disk write graph; HTTP errors]
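
The effect of such an extra constraint can be demonstrated with a small Python sketch. Note the assumptions: it uses SQLite from the standard library as a stand-in for PostgreSQL, the history table and its columns are hypothetical, and rowcount here only shows how many rows the statement touched rather than measuring actual write volume.

```python
import sqlite3


def expire_history(conn, cutoff, only_if_changed):
    """Mark old rows as expired and return how many rows were touched."""
    sql = "UPDATE history SET expired = 1 WHERE age > ?"
    if only_if_changed:
        # The extra constraint: skip rows that already hold the new value.
        sql += " AND expired = 0"
    return conn.execute(sql, (cutoff,)).rowcount


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE history ("
        "id INTEGER PRIMARY KEY, age INTEGER, expired INTEGER DEFAULT 0)"
    )
    conn.executemany(
        "INSERT INTO history (age) VALUES (?)", [(age,) for age in range(100)]
    )
    first = expire_history(conn, 49, only_if_changed=False)
    repeat = expire_history(conn, 49, only_if_changed=False)
    guarded = expire_history(conn, 49, only_if_changed=True)
    conn.close()
    return first, repeat, guarded
```

The first run changes 50 rows, the unguarded repeat rewrites the same 50 rows even though nothing changes, and the guarded version touches none.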

Current state

Our metrics indicate that things are a lot better now. We still have a few things to deal with, such as:
  • Some more performance fixes are on their way to fix some remaining cases where views are very slow or where file uploads from workers fail due to locks.
  • We have some changes in the works to revamp how work request changes propagate through workflows in a way that doesn't require so many heavyweight locks.
  • We have a number of monitoring and alerting improvements we'd like to make, both for outcomes (things like slow Celery tasks) and possible root causes (database performance). We'd also like to deploy some more modern observability tools; hunting for things using journalctl isn't terrible, but it's not really the state of the art.
  • We need to improve how we communicate to users when we're having operational problems, both informally (IRC, etc.) and on the site.
  • Retries don't always behave the way you'd expect in workflows.
I hope this has been an interesting tour through the sorts of things that can go wrong in this kind of distributed system!

9 May 2026

Russell Coker: Systemd, Mobile Linux, and Containers

I've had some problems running apps I want on my Furilabs FLX1s [1], so I decided to install some container environments to test various versions. I started with Debian/Testing so I can test the build process for some packages I'm about to upload to Unstable.

Systemd Issues

When running debootstrap testing testing to set up the chroot the process aborted with errors including the following from the systemd postinst:
Failed to enable units: Protocol driver not attached.
Cannot open '/etc/machine-id': Protocol driver not attached
This turned out to be from trying to run systemctl in the postinst. I just removed the set -e line from /chroot/testing/var/lib/dpkg/info/systemd.postinst and kept on going (I'm not planning to actually use systemd so its failure to set up wasn't a problem). Then I installed a bunch of -dev packages needed to build my package, which had a dependency chain that included udev, leading to the following error:
Setting up udev (260.1-1) ...
Failed to chase and open directory '/etc/udev/hwdb.d', ignoring: Protocol driver not attached
Failed to chase and open directory '/usr/lib/udev/hwdb.d', ignoring: Protocol driver not attached
Udev is also a part of systemd. Googling for this turned up a closed systemd bug indicating that it has a minimum kernel version of 5.10 [2]. The Furiphone has kernel 4.19.325-furiphone-radon due to being based on Android. Checking the kernel version isn't that hard to do; if the systemd programs in question checked the version and reported "can't run on kernels prior to 5.10" then it would avoid a lot of confusion and also bug reports that the systemd developers don't want.

Some Debian package dependencies can probably do with revision. Installing the packages libkdb3-dev libkf5archive-dev qtdeclarative5-dev qtpositioning5-dev qttools5-dev ideally wouldn't have a dependency chain leading to udev.

The Furilabs people appear to have patched the latest Debian version of systemd to work with the older kernels; the version is currently 260.1-1+furios0+git20260425023744.8401044.forky.production.

Compile Times

I got this working by just editing every postinst script and either removing the set -e or adding an exit 0 at the top. I don't need things to be configured properly for a running OS, I just need the files in the right locations for a container.

One issue I discovered when I started compiling is that it was only running on 1 core and the nproc program was returning "1". The lscpu program showed that only 1 of the 8 cores was online; it was a single Cortex-A78 core. Some combination of putting it in caffeine mode and having the screen on enabled all 6*Cortex-A55 and 2*Cortex-A78 cores.

The below table compares compiling Harbour-Amazfish on the Furiphone with all 8 CPU cores active, my E5-2696 v4 workstation (almost the fastest socket 2011-3 CPU ever made), running ARM64 software emulation on a system with two E5-2699A v4 CPUs, and a Radxa 8 core ARM SBC (which I will review in a future blog post).

Given that the source apparently limits the parallelism to less than 7 cores on average it's pretty impressive for the elapsed time to be only 2.5* longer on the phone. Emulating the ARM64 build at about 4* the CPU time is impressive too; as the system has 5.5* as many CPU cores (44 versus 8) it could theoretically compile ARM code faster than the native ARM hardware I own for any project that uses enough cores.
System                       User time  System time   Elapsed  %CPU
Furiphone                      2252.76       164.51   7:00.88   574
E5-2696 v4 workstation          679.64       119.07   1:58.63   673
2*22core Intel CPUs (qemu)     8476.65       113.14  10:24.57  1375
Radxa                          2011.45       239.40   6:25.55   583
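
The figures in the table can be cross-checked with a few lines of Python. This is only a sketch: it assumes the user and system columns are CPU seconds and that %CPU is (user + system) / elapsed, truncated, as time(1) style output usually reports; the helper names are hypothetical.

```python
# Rows copied from the table: (user, system, elapsed, %CPU).
TIMINGS = {
    "Furiphone": (2252.76, 164.51, "7:00.88", 574),
    "E5-2696 v4 workstation": (679.64, 119.07, "1:58.63", 673),
    "2*22core Intel CPUs (qemu)": (8476.65, 113.14, "10:24.57", 1375),
    "Radxa": (2011.45, 239.40, "6:25.55", 583),
}


def to_seconds(elapsed):
    """Convert an 'M:SS.ss' elapsed time into seconds."""
    minutes, seconds = elapsed.split(":")
    return int(minutes) * 60 + float(seconds)


def cpu_percent(user, system, elapsed):
    """CPU utilisation as a truncated percentage of elapsed time."""
    return int((user + system) / to_seconds(elapsed) * 100)


def elapsed_ratio(system_a, system_b):
    """How many times as long system_a took compared with system_b."""
    return to_seconds(TIMINGS[system_a][2]) / to_seconds(TIMINGS[system_b][2])


def user_time_ratio(system_a, system_b):
    """Ratio of user CPU time between two systems."""
    return TIMINGS[system_a][0] / TIMINGS[system_b][0]
```

Under those assumptions the phone takes about 3.55 times as long as the workstation (that is, roughly 2.5 times longer), and the emulated build uses about 3.76 times the phone's user CPU time.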

4 May 2026

Russell Coker: Copy Fail on Debian and SE Linux

I have just learned of the Copy Fail kernel vulnerability [1] thanks to alexanderkjall@mastodon.social (who I have just followed on Mastodon and I recommend that you follow too). The question for me (after installing the patched kernel on the systems of mine that are most exposed) is whether SE Linux would have stopped that.

Basic Policy Analysis

For the SE Linux policy analysis the alg_socket class is the one that is related to the exploit. So the following policy analysis command (run as non-root with policy copied to /tmp from a running system) shows what domains are allowed access on my current Debian development system:
$ sesearch -A -c alg_socket /tmp/policy.35 
allow NetworkManager_t NetworkManager_t:alg_socket { accept bind create read setopt write };
allow bluetooth_t bluetooth_t:alg_socket { accept append bind connect create getattr getopt ioctl listen read setattr setopt shutdown write };
allow daemon init_t:alg_socket { getattr getopt ioctl read setopt write };
allow devicekit_disk_t domain:alg_socket getattr;
allow lvm_t lvm_t:alg_socket { append bind connect create getattr getopt ioctl read setattr setopt shutdown write };
allow sosreport_t domain:alg_socket getattr;
allow sysadm_t domain:alg_socket getattr;
allow unconfined_domain_type domain:alg_socket { accept append bind connect create getattr getopt ioctl listen lock map name_bind read recvfrom relabelfrom relabelto sendto setattr setopt shutdown write };
The above is the same as on the Trixie release policy as these things aren't changed often. Below is from Debian/Bookworm which is the same apart from Bookworm not allowing lvm_t:
$ sesearch -A -c alg_socket /tmp/policy.33
allow NetworkManager_t NetworkManager_t:alg_socket { accept bind create read setopt write };
allow bluetooth_t bluetooth_t:alg_socket { accept append bind connect create getattr getopt ioctl listen read setattr setopt shutdown write };
allow daemon init_t:alg_socket { getattr getopt ioctl read setopt write };
allow devicekit_disk_t domain:alg_socket getattr;
allow sosreport_t domain:alg_socket getattr;
allow sysadm_t domain:alg_socket getattr;
allow unconfined_domain_type domain:alg_socket { accept append bind connect create getattr getopt ioctl listen lock map name_bind read recvfrom relabelfrom relabelto sendto setattr setopt shutdown write };
I checked every Debian policy back to when the alg_socket class was first added and found that the older versions had fewer domains granted access. The most recently added was bluetooth_t and the one before that was NetworkManager_t.

The Risky Lines

Of those allow statements the following are the risks:

Unconfined Domains and the unconfined_domain_type Attribute

When writing policy, lines like the following aren't generally considered a problem as unconfined domains are allowed full access to the system. However it can be an issue if you have a process in an unconfined domain without root access, which means a regular user login. Unfortunately this happens to be where this exploit and the default Debian SE Linux configuration intersect.
allow unconfined_domain_type domain:alg_socket { accept append bind connect create getattr getopt ioctl listen lock map name_bind read recvfrom relabelfrom relabelto sendto setattr setopt shutdown write };
The following shell code gets a list of unconfined domains which can be entered from user domains.
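A=""
# Build an egrep alternation from every unconfined domain type name
for n in $(seinfo -x -a unconfined_domain_type | grep '_t$') ; do
  A="$A|($n)"
done
# Drop the leading "|" left over from the first iteration
A=$(echo "$A" | sed -e 's/^.//')
# List type transitions from user application domains into those types
sesearch -T -s user_application_exec_domain -c process | egrep "$A;"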
Below is the output on a Debian/Trixie (Stable) system. So a confined user in the user_t domain could run an X server and try to get it to run the exploit code (which seems difficult), or run a Wine or Mono program from the window manager in a Wayland environment.
type_transition user_t xserver_exec_t:process xserver_t;
type_transition user_wm_t mono_exec_t:process mono_t;
type_transition user_wm_t wine_exec_t:process wine_t;
type_transition user_wm_t xserver_exec_t:process xserver_t;
The issue of unconfined domains in SE Linux policy needs much more work. I'll write some blog posts about it later and the next release of Debian will be significantly better in this regard.

Daemons that Have Access

allow NetworkManager_t NetworkManager_t:alg_socket { accept bind create read setopt write };
allow bluetooth_t bluetooth_t:alg_socket { accept append bind connect create getattr getopt ioctl listen read setattr setopt shutdown write };
Network Manager is something that can potentially be exploited by a desktop user as it has a large attack surface for the desktop interface. But as the vast majority of desktop user accounts are unconfined that's not an issue. This might be an issue for some restricted desktop PCs, maybe kiosk systems and those PCs that were being installed in prisons. The bluetooth_t domain is used by the bluetooth daemon that runs as root. While we generally are less concerned about a root process being exploited, the daemon will handle some data from hostile sources and it could be used as an escalation attack by someone with a hostile Bluetooth device. These can't be exploited without another bug.

The Lines that Aren't Problems

The getattr Lines

allow devicekit_disk_t domain:alg_socket getattr;
allow sosreport_t domain:alg_socket getattr;
allow sysadm_t domain:alg_socket getattr;
The above getattr access isn't an issue as it just allows seeing process information, and it's also by privileged domains.

The init_t Sockets

allow daemon init_t:alg_socket { getattr getopt ioctl read setopt write };
The daemon access to sockets inherited from init_t probably isn't a great idea. It comes from the following section in init.te, which allows socket activation for daemons; the comment is concerning in this context. Also socket_class_set is overly broad, as without even inspecting the systemd source code I'm pretty sure that far fewer than 1/3 of the 55 classes allowed by that rule are actually supported in systemd.
ifdef(`init_systemd',`
        # Until systemd is fixed
        allow daemon init_t:socket_class_set { getattr getopt ioctl read setopt write };
But that's not really a problem as systemd has to just not create a socket of that type; if a hostile party can make systemd create such sockets then you have probably already lost.

SE Linux Protection

Overall SE Linux systems running confined users (kiosks and other confined GUI environments) will be protected barring a bug in Network Manager or the Bluetooth daemon as long as there is no X server installed (or the X server won't run scripts on startup), no Wine system installed, and no Mono. SE Linux servers and VMs will be protected against daemon issues as long as the daemon isn't unconfined. To convert the default login to user_t run the following commands:
semanage login -m -s user_u -r s0 __default__
restorecon -R -v -F /home
But it is still possible to access an unconfined domain from user_t (a topic I will address in detail in a future blog post). To remove unconfined entirely (not a task for novices or something to be done in production without testing and planning) run the following commands:
semanage login -m -s root -r s0 root
# logout and login again
semodule -X 100 -r unconfined
Then a Debian/Trixie system running SE Linux will be safe against this attack even when running a vulnerable kernel. If you still want to use root as unconfined_t but still have untrusted shell users then run the following command to remove the easiest ways for users to run a program in an unconfined domain:
semodule -X 100 -r mono wine
Success and Failure

Blocked by SE Linux

Below is what happens on stdout/stderr when SE Linux blocks the exploit (tested with vulnerable Debian kernel 6.12.74+deb13+1-amd64):
test@testing1:~$ python3 ./copy_fail_exp.py 
Traceback (most recent call last):
  File "/home/test/./copy_fail_exp.py", line 9, in <module>
    while i<len(e):c(f,i,e[i:i+4]);i+=4
                   ~^^^^^^^^^^^^^^
  File "/home/test/./copy_fail_exp.py", line 5, in c
    a=s.socket(38,5,0);a.bind(("aead","authencesn(hmac(sha256),cbc(aes))"));h=279;v=a.setsockopt;v(h,1,d('0800010000000010'+'0'*64));v(h,5,None,4);u,_=a.accept();o=t+4;i=d('00');u.sendmsg([b"A"*4+c],[(h,3,i*4),(h,2,b'\x10'+i*19),(h,4,b'\x08'+i*3),],32768);r,w=g.pipe();n=g.splice;n(f,w,o,offset_src=0);n(r,u.fileno(),o)
  File "/usr/lib/python3.13/socket.py", line 233, in __init__
    _socket.socket.__init__(self, family, type, proto, fileno)
    ~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
PermissionError: [Errno 13] Permission denied
test@testing1:~$ su
Password:
When the attack is blocked by SE Linux there will be no messages in the kernel message log but the SE Linux audit log (typically stored in /var/log/audit/audit.log) will have lines like the following:
type=AVC msg=audit(1777803068.070:76): avc:  denied  { create } for  pid=811 comm="python3" scontext=user_u:user_r:user_t:s0 tcontext=user_u:user_r:user_t:s0 tclass=alg_socket permissive=0
type=SYSCALL msg=audit(1777803068.070:76): arch=c000003e syscall=41 success=no exit=-13 a0=26 a1=80005 a2=0 a3=0 items=0 ppid=791 pid=811 auid=1000 uid=1000 gid=1000 euid=1000 suid=1000 fsuid=1000 egid=1000 sgid=1000 fsgid=1000 tty=pts0 ses=1 comm="python3" exe="/usr/bin/python3.13" subj=user_u:user_r:user_t:s0 key=(null)ARCH=x86_64 SYSCALL=socket AUID="test" UID="test" GID="test" EUID="test" SUID="test" FSUID="test" EGID="test" SGID="test" FSGID="test"
type=PROCTITLE msg=audit(1777803068.070:76): proctitle=707974686F6E33002E2F636F70795F6661696C5F6578702E7079
In those lines the :76 is the audit log entry number; the command ausearch -i -a 76 will interpret that entry with the following output:
type=PROCTITLE msg=audit(05/03/26 10:11:08.070:76) : proctitle=python3 ./copy_fail_exp.py 
type=SYSCALL msg=audit(05/03/26 10:11:08.070:76) : arch=x86_64 syscall=socket success=no exit=EACCES(Permission denied) a0=alg a1=SOCK_SEQPACKET a2=ip a3=0x0 items=0 ppid=791 pid=811 auid=test uid=test gid=test euid=test suid=test fsuid=test egid=test sgid=test fsgid=test tty=pts0 ses=1 comm=python3 exe=/usr/bin/python3.13 subj=user_u:user_r:user_t:s0 key=(null) 
type=AVC msg=audit(05/03/26 10:11:08.070:76) : avc:  denied  { create } for  pid=811 comm=python3 scontext=user_u:user_r:user_t:s0 tcontext=user_u:user_r:user_t:s0 tclass=alg_socket permissive=0 
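
Much of what ausearch -i does for these records can be reproduced by hand, which helps when reading raw audit logs. The Python sketch below decodes the hex proctitle, the timestamp and the socket() arguments from the raw entries above. The function names are hypothetical, and the numeric constants are the Linux values for AF_ALG, SOCK_SEQPACKET and SOCK_CLOEXEC written out literally so that the sketch runs on any platform.

```python
from datetime import datetime, timezone

# Linux constant values, written out so the sketch runs anywhere.
AF_ALG = 38
SOCK_SEQPACKET = 5
SOCK_CLOEXEC = 0x80000


def decode_proctitle(hex_string):
    """Decode a hex-encoded, NUL-separated command line into its arguments."""
    return [arg.decode() for arg in bytes.fromhex(hex_string).split(b"\x00")]


def decode_audit_stamp(stamp):
    """Split 'seconds.millis:serial' into a UTC datetime and the entry number."""
    seconds, serial = stamp.split(":")
    when = datetime.fromtimestamp(int(float(seconds)), tz=timezone.utc)
    return when, int(serial)


def decode_socket_args(a0, a1):
    """Interpret the hex a0/a1 fields of a socket() SYSCALL record."""
    family = int(a0, 16)
    sock_type = int(a1, 16)
    return family, sock_type & ~SOCK_CLOEXEC, bool(sock_type & SOCK_CLOEXEC)
```

For the entry above, a0=26 and a1=80005 decode to family 38 and type 5 with the close-on-exec flag set, which matches the s.socket(38,5,0) call in the traceback.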
When it Works

Below is what happens when it works (again tested with Debian kernel 6.12.74+deb13+1-amd64):
test@testing1:~$ python3 ./copy_fail_exp.py 
# 
Here is the kernel log when the attack works:
[   30.441830] alg: No test for authencesn(hmac(sha256),cbc(aes)) (authencesn(hmac(sha256-avx2),cbc-aes-aesni))
[   30.447466] process 'su' launched '/bin/sh' with NULL argv: empty string added
When the Kernel Isn't Vulnerable

If the kernel isn't vulnerable and SE Linux permits the attack (e.g. when run from an unconfined domain) the following is seen on stdout/stderr:
$ python3 ./copy_fail_exp.py 
Password: 
su: Authentication failure
In that situation the kernel will log something like the following:
[   36.647023] alg: No test for authencesn(hmac(sha256),cbc(aes)) (authencesn(hmac-sha256-lib,cbc-aes-aesni))
This was tested on the Debian/Unstable kernel 6.19.13+deb14-amd64.

Conclusion

Run the following commands and then force all users to log out to make a Debian SE Linux system offering shell access reasonably safe against this bug. But also upgrade your kernel as soon as convenient because having multiple layers of protection is always good.
semanage login -m -s user_u -r s0 __default__
restorecon -R -v -F /home
semodule -X 100 -r mono wine
The GrapheneOS people are doing really good work on securing phones. I am most interested in Mobian (Debian on phones) but for people who have made different choices GrapheneOS is a good option. Here is the GrapheneOS statement on Copy Fail (they are not vulnerable to it) [3]. For people interested in running a secure Android build GrapheneOS is the best option. Their supported devices list shows Pixel 6 to Pixel 10 supported and Pixel 8 to Pixel 10a recommended [4]. In Australia Kogan sells refurbished Pixel 6 phones starting at $251 including delivery and refurbished Pixel 8 phones starting at $499 with First membership; they seem to have the cheapest Pixel phones. I want to make Debian more like Android in terms of security, but that's a topic for other blog posts. Here is the Debian page listing kernels that have been fixed against this exploit [5].

2 May 2026

Bits from Debian: Debian welcomes the 2026 GSoC interns

We are very excited to announce that Debian has been assigned seven contributors to work under mentorship on a variety of projects with us during the Google Summer of Code. Here is a list of the projects and contributors, along with details of the tasks to be performed.
Project: Automated Debian Packaging with debianize Deliverables of the project: Debianize is a tool that aims to automatically create debian packages from scratch from upstream source trees. As for the current version, it works for some of the packages but it is not reliable. This project aims at making it production ready such that it can work with most of the projects. Along with that improving its reliability, coverage, integration with the broader ecosystem and other enhancements.
Project: Linux Livepatching Deliverables of the project: Linux Kernel Livepatching is the process of replacing functions in the kernel code affected by CVEs with the patch-applied functions during system runtime. It's basically a method to apply security kernel patches to a running system.
Project: DebNet: Visualising the Bus Factor Graph Analysis of Debian's Infrastructure Deliverables of the project: DebNet models the Debian archive as a graph to identify critical packages maintained by too few people. Using data from the Ultimate Debian Database (UDD), it builds a package dependency graph and a maintainer-package graph to compute practical metrics like the Bus Factor, Fragility Score, and Dependency Impact for every source package.
Project: Attack of the Clones: Fight Back Using Code Duplication Detection From Security Patches Deliverables of the project: This project aims to detect vulnerable code clones in the Debian archive by automatically extracting signatures from security patches. Using a two-signal approach that separates vulnerable patterns from fix patterns, the system generates high-specificity queries to search the entire archive via Debian CodeSearch.
Project: Debusine: debuginfod server Deliverables of the project: This project implements a debuginfod-compatible server within Debusine to provide automated debug symbol resolution for Debian developers.
Project: Debian-LSP: Improve File Format Support Deliverables of the project: The Debian LSP Language Server currently provides only basic features (field completion, parse-error diagnostics, and simple quick fixes), leaving Debian maintainers without the rich IDE experience available in other ecosystems.
Project: Debusine: live log streaming Deliverables of the project: Debusine currently only shows task logs after a task has fully completed. This means developers working with long-running jobs (such as package builds or test pipelines) have no way to monitor progress in real time or catch failures early. This project adds live log streaming to Debusine.
Congratulations and welcome to all the contributors! The Google Summer of Code program is possible in Debian thanks to the efforts of Debian Developers and Debian Contributors that dedicate part of their free time to mentor contributors and outreach tasks. Join us and help extend Debian! You can follow the contributors' weekly reports on the debian-outreach mailing-list, chat with us on our IRC channel or reach out to the individual projects' team mailing lists.

15 April 2026

Paul Tagliamonte: designing arf, an sdr iq encoding format

Interested in future updates? Follow me on mastodon at @paul@soylent.green. Posts about hz.tools will be tagged #hztools.

Want to jump right to the draft? I'll be maintaining ARF going forward at /draft-tagliamonte-arf-00.txt.
It's true: processing data from software defined radios can be a bit complex, which tends to keep all but the most grizzled experts and bravest souls from playing with it. While I wouldn't describe myself as either, I will say that I've stuck with it for longer than most would have expected of me. One of the biggest takeaways I have from my adventures with software defined radio is that there's a lot of cool crossover opportunity between RF and nearly every other field of engineering. Fairly early on, I decided on a very light metadata scheme to track SDR captures, called rfcap. rfcap has withstood my test of time, and I can go back to even my earliest captures and still make sense of what they are: IQ format, capture frequencies, sample rates, etc. A huge part of this was the simplicity of the scheme (fixed-length header, byte-aligned to supported capture formats), which made it roughly as easy to work with as a raw file of IQ samples. However, rfcap has a number of downsides. It's only a single, fixed-length header. If the frequency of operation changed during the capture, that change is not represented in the capture information. It's not possible to easily represent multi-channel coherent IQ streams, and additional metadata is condemned to adjacent text files.

ARF (Archive of RF) A few years ago, I needed to finally solve some of these shortcomings and tried to see if a new format would stick. I sat down and wrote out my design goals before I started figuring out what it looked like. First, whatever I come up with must be capable of being streamed and processed while being streamed. This includes streaming across the network or merely being written to disk as it's being created. No post-processing required. This is mostly an artifact of how I've built all my tools and how I interact with my SDRs. I use them extensively over the network (both locally, as well as remotely by friends across my wider lan). This decision sometimes even prompts me to do some crazy things from time to time. Second, I need actual, real support for multiple IQ channels from my multi-channel SDRs (Ettus, Kerberos/Kraken SDR, etc) for playing with things like beamforming. My new format must be capable of storing multiple streams in a single capture file, rather than a pile of files in a directory (and hope they're aligned). Finally, metadata must be capable of being stored in-band. The initial set of metadata I needed to formalize in-stream were Frequency Changes and Discontinuities. Since then, ARF has grown a few more. After getting all that down, I opted to start at what I thought the simplest container would look like: TLV (tag-length-value) encoded packets. This is a fairly well trodden path, and used by a bunch of existing protocols we all know and love. Each ARF file (or stream) is a set of encoded packets (sometimes called data units in other specs). This means that unknown packet types may be skipped (since the length is included) and additional data can be added after the existing fields without breaking existing decoders.
tag
length
value
Heads up! Once this is posted, I'm not super likely to update this page. Once this goes out, the latest stable copy of the ARF spec is maintained at draft-tagliamonte-arf-00.txt. This page may quickly become out of date, so if you're actually interested in implementing this, I've put a lot of effort into making the draft comprehensive, and I plan to maintain it as I edit the format.
Unlike a traditional TLV structure, I opted to add flags to the top-level packet. This gives me a bit of wiggle room down the line, and gives me a feature that I like from ASN.1: a critical bit. The critical bit indicates that the packet must be understood fully by implementers, which allows future backward incompatible changes by marking a new packet type as critical. This would only really be done if something meaningfully changed the interpretation of the backwards compatible data to follow.
Flag  Description
0x01  Critical (tag must be understood)
Within each Packet is a tag field. This tag indicates how the contents of the value field should be interpreted.
Tag ID  Description
0x01    Header
0x02    Stream Header
0x03    Samples
0x04    Frequency Change
0x05    Timing
0x06    Discontinuity
0x07    Location
0xFE    Vendor Extension
In order to help with checking the basic parsing and encoding of this format, the following is an example packet which should parse without error.
 00, // tag (0; no subpacket is 0 yet)
 00, // flags (0; no flags)
 00, 00 // length (0; no data)
 // data would go here, but there is none
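As a rough sketch of how a decoder might walk this framing, the Python below reads one-byte tag and flags fields followed by a two-byte length, as in the example packet above. The big-endian length, the `iter_packets` name, and the choice to raise an error on unknown critical packets are assumptions made for illustration; the draft is the authority on the real layout and behaviour.

```python
import struct
from typing import Iterator, NamedTuple

FLAG_CRITICAL = 0x01  # from the flags table above
# Tag IDs listed in the tag table above.
KNOWN_TAGS = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0xFE}


class Packet(NamedTuple):
    tag: int
    flags: int
    value: bytes


def iter_packets(buf: bytes) -> Iterator[Packet]:
    """Yield known packets from buf, skipping unknown non-critical ones."""
    offset = 0
    while offset < len(buf):
        if len(buf) - offset < 4:
            raise ValueError("truncated packet header")
        # Assumption: tag (u8), flags (u8), length (u16, big-endian).
        tag, flags, length = struct.unpack_from(">BBH", buf, offset)
        offset += 4
        value = buf[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated packet value")
        offset += length
        if tag not in KNOWN_TAGS:
            if flags & FLAG_CRITICAL:
                # Assumption: a decoder refuses to continue past a
                # critical packet it does not understand.
                raise ValueError(f"unknown critical tag 0x{tag:02x}")
            continue  # the length field lets us skip it safely
        yield Packet(tag, flags, value)
```

Because the length travels with every packet, the skip path never needs to understand the payload, which is what keeps older decoders working when new packet types appear.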
Additionally, throughout the rest of the subpackets, there are a few unique and shared datatypes. I document them all more clearly in the draft, but to quickly run through them here too:

UUID This field represents a globally unique identifier, as defined by RFC 9562, as 16 raw bytes.

Frequency Data encoded in a Frequency field is stored as microhz (1 Hz is stored as 1000000, 2 Hz is stored as 2000000) as an unsigned 64 bit integer. This has a minimum value of 0 Hz, and a maximum value of 18446744073709551615 uHz, or just above 18.4 THz. This is a bit of a tradeoff, but it's a set of issues that I would gladly contend with rather than deal with the related issues with storing frequency data as a floating point value downstream. Not a huge factor, but as an aside, this is also how my current generation SDR processing code (sparky) stores Frequency data internally, which makes conversion between the two natural.
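To make the integer encoding concrete, here is a small Python sketch. The helper names are hypothetical, and the big-endian byte order is an assumption inferred from the example byte sequences elsewhere in this post (for instance, 100 MHz appearing as `00 00 5a f3 10 7a 40 00`).

```python
import struct

MICROHZ_PER_HZ = 1_000_000


def encode_frequency(hz: int) -> bytes:
    """Encode a whole number of Hz as an unsigned 64 bit microhz value."""
    # Assumption: big-endian, matching the example packets in this post.
    # struct.pack raises struct.error if the value does not fit in a u64.
    return struct.pack(">Q", hz * MICROHZ_PER_HZ)


def decode_frequency(raw: bytes) -> int:
    """Return the frequency in microhz from its 8 byte encoding."""
    (microhz,) = struct.unpack(">Q", raw)
    return microhz
```

Staying in integers end to end means two decoders always agree on the exact value, which is the tradeoff described above against floating point.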

IQ samples ARF supports IQ samples in a number of different formats. Part of the idea here is I want it to be easy for capturing programs to encode ARF for a specific radio without mandating a single iq format representation. For IQ types with a scalar value which takes more than a single byte, this is always paired with a Byte Order field, to indicate if the IQ scalar values are little or big endian.
ID    Name  Description
0x01  f32   interleaved 32 bit floating point scalar values
0x02  i8    interleaved 8 bit signed integer scalar values
0x03  i16   interleaved 16 bit signed integer scalar values
0x04  u8    interleaved 8 bit unsigned integer scalar values
0x05  f64   interleaved 64 bit floating point scalar values
0x06  f16   interleaved 16 bit floating point scalar values

Stream Header Immediately after the arf Header, some number of Stream Headers follow. There must be exactly the same number of Stream Header packets as are indicated by the num streams field of the Header. This has the nice effect of enabling clients to read all the stream headers without requiring buffering of unread packets from the stream.
id
flags
fmt
bo
rate
freq
guid
site
In order to help with checking the basic parsing and encoding of this format, the following is an example stream header subpacket (when encoded or decoded this will be found inside an ARF packet as described above) which should parse without error, with known values.
00, 01, // id (1)
00, 00, 00, 00, 00, 00, 00, 00, // flags
01, // format (float32)
01, // byte order (Little Endian)
00, 00, 01, d1, a9, 4a, 20, 00, // rate (2 MHz)
00, 00, 5a, f3, 10, 7a, 40, 00, // frequency (100 MHz)

// guid (7b98019d-694e-417a-8f18-167e2052be4d)
7b, 98, 01, 9d, 69, 4e, 41, 7a,
8f, 18, 16, 7e, 20, 52, be, 4d,

// site_id (98c98dc7-c3c6-47fe-bc05-05fb37b2e0db)
98, c9, 8d, c7, c3, c6, 47, fe,
bc, 05, 05, fb, 37, b2, e0, db,
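The example above can be checked with a few lines of Python. This is only a sketch: the field widths and big-endian integers are read off the example bytes rather than taken from the draft, and the `StreamHeader` and `parse_stream_header` names are made up for illustration.

```python
import struct
import uuid
from typing import NamedTuple

# Assumed layout, inferred from the example bytes: id (u16), flags (u64),
# format (u8), byte order (u8), rate (u64), frequency (u64), two raw UUIDs.
STREAM_HEADER = struct.Struct(">HQBBQQ16s16s")


class StreamHeader(NamedTuple):
    stream_id: int
    flags: int
    fmt: int
    byte_order: int
    rate_uhz: int
    freq_uhz: int
    guid: uuid.UUID
    site: uuid.UUID


def parse_stream_header(value: bytes) -> StreamHeader:
    sid, flags, fmt, bo, rate, freq, guid, site = STREAM_HEADER.unpack_from(value)
    return StreamHeader(sid, flags, fmt, bo, rate, freq,
                        uuid.UUID(bytes=guid), uuid.UUID(bytes=site))


# The example subpacket from above, as one byte string.
EXAMPLE = bytes.fromhex(
    "0001" "0000000000000000" "01" "01"
    "000001d1a94a2000" "00005af3107a4000"
    "7b98019d694e417a8f18167e2052be4d"
    "98c98dc7c3c647febc0505fb37b2e0db"
)
```

Calling `parse_stream_header(EXAMPLE)` should give stream id 1, a rate of 2 MHz and a frequency of 100 MHz (both in microhz), and the two UUIDs shown in the comments.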

Samples Block of IQ samples in the format indicated by this stream's format and byte_order field sent in the related Stream Header.
id
iq samples
In order to help with checking the basic parsing and encoding of this format, the following is a samples subpacket (when encoded or decoded this will be found inside an ARF packet as described above). The IQ values here are notional (and are either two 8 bit samples, or one 16 bit sample, depending on what the related Stream Header was).
01, // id
ab, cd, ab, cd, // iq samples
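To illustrate how the same four bytes read differently depending on the Stream Header, here is a hedged Python sketch that unpacks interleaved I/Q scalars. The format IDs come from the IQ samples table; the `decode_iq` helper and its `little_endian` parameter are hypothetical, and the function expects only the sample bytes that follow the id field.

```python
import struct
from typing import List, Tuple

# Format ID (from the IQ samples table) -> struct code for one scalar.
SCALAR_CODES = {
    0x01: "f",  # f32
    0x02: "b",  # i8
    0x03: "h",  # i16
    0x04: "B",  # u8
    0x05: "d",  # f64
    0x06: "e",  # f16
}


def decode_iq(samples: bytes, fmt: int, little_endian: bool = True) -> List[Tuple]:
    """Return a list of (I, Q) tuples from interleaved scalar values."""
    prefix = "<" if little_endian else ">"
    pair = struct.Struct(prefix + SCALAR_CODES[fmt] * 2)
    if len(samples) % pair.size:
        raise ValueError("sample data is not a whole number of IQ pairs")
    return list(pair.iter_unpack(samples))
```

With the notional bytes above, the i8 format yields two pairs and the little-endian i16 format yields one, matching the description of the example.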

Frequency Change The center frequency of the IQ stream has changed since the Stream Header or last Frequency Change has been sent. This is useful to capture IQ streams that are jumping around in frequency during the duration of the capture, rather than starting and stopping them.
id
frequency
In order to help with checking the basic parsing and encoding of this format, the following is a frequency change subpacket (when encoded or decoded this will be found inside an ARF packet as described above).
01, // id
00, 00, b5, e6, 20, f4, 80, 00 // frequency (200 MHz)

Discontinuity Since the last Samples packet for this stream, samples have been dropped or not encoded to this stream. This can be used for a stream that has dropped samples for some reason, a large gap (radio was needed for something else), or communicating IQ "snippets".
id
In order to help with checking the basic parsing and encoding of this format, the following is a discontinuity subpacket (when encoded or decoded this will be found inside an ARF packet as described above).
01, // id

Location Up-to-date location as of this moment of the IQ stream, usually from a GPS. This allows for in-band geospatial information to be marked in the IQ stream. This can be used for all sorts of things (detected IQ packet snippets aligned with a time and location, or a survey of RF noise in an area).
flags
sys
lat
long
el
accuracy
The sys field indicates the Geodetic system to be used for the provided latitude, longitude and elevation fields. The full list of supported geodetic systems is currently just WGS84, but in case something meaningfully changes in the future, it'd be nice to migrate forward. Unfortunately, being a bit of a coward here, the accuracy field is a bit of a cop-out. I'd really rather it be what we see out of kinematic state estimation tools like a kalman filter, or at minimum, some sort of ellipsoid. This is neither of those - it's a perfect sphere of error where we pick the largest error in any direction and use that. Truthfully, I can't be bothered to model this accurately, and I don't want to contort myself into half-assing something I know I will half-ass just because I know better.
System Description
0x01 WGS84 - World Geodetic System 1984
In order to help with checking the basic parsing and encoding of this format, the following is a location subpacket (when encoded or decoded this will be found inside an ARF packet as described above).
00, 00, 00, 00, 00, 00, 00, 00, // flags
01, // system (wgs84)
3f, f3, be, 76, c8, b4, 39, 58, // latitude (1.234)
40, 02, c2, 8f, 5c, 28, f5, c3, // longitude (2.345)
40, 59, 00, 00, 00, 00, 00, 00, // elevation (100)
40, 24, 00, 00, 00, 00, 00, 00 // accuracy (10)
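The location example can be decoded in Python as a quick check. This sketch assumes, based only on the example bytes, that flags is a big-endian u64, sys is a u8, and the remaining four fields are big-endian IEEE 754 doubles; the `Location` and `parse_location` names are illustrative.

```python
import struct
from typing import NamedTuple

# Assumed layout: flags (u64), sys (u8), then four float64 values.
LOCATION = struct.Struct(">QB4d")

SYS_WGS84 = 0x01  # from the System table above


class Location(NamedTuple):
    flags: int
    sys: int
    lat: float
    long: float
    el: float
    accuracy: float


def parse_location(value: bytes) -> Location:
    return Location(*LOCATION.unpack_from(value))


# The example subpacket from above, as one byte string.
EXAMPLE = bytes.fromhex(
    "0000000000000000" "01"
    "3ff3be76c8b43958" "4002c28f5c28f5c3"
    "4059000000000000" "4024000000000000"
)
```

Parsing `EXAMPLE` should give a WGS84 fix with latitude 1.234, longitude 2.345, elevation 100 and accuracy 10, as annotated in the example.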

Vendor Extension In addition to the fields I put in the spec, I expect that I may need custom packet types I can't think of now. There's all sorts of useful data that could be encoded into the stream, so I'd rather there be an officially sanctioned mechanism that allows future work on the spec without constraining myself. As just one example, I've used a custom subpacket to create test vectors: the data is encoded into a Vendor Extension, followed by the IQ for the modulated packet. If the demodulated data and in-band original data don't match, we've regressed. You could imagine in-band speech-to-text, antenna rotator azimuth information, or demodulated digital sideband data (like FM HDR data) too. Or even things I can't even think of!
id
data
In order to help with checking the basic parsing and encoding of this format, the following is a vendor extension subpacket (when encoded or decoded this will be found inside an ARF packet as described above).
// extension id (b24305f6-ff73-4b7a-ae99-7a6b37a5d5cd)
b2, 43, 05, f6, ff, 73, 4b, 7a,
ae, 99, 7a, 6b, 37, a5, d5, cd,

// data (0x01, 0x02, 0x03, 0x04, 0x05)
01, 02, 03, 04, 05

Tradeoffs The biggest tradeoff that I'm not entirely happy with is limiting the length of a packet to a u16 (65535 bytes). Given the u8 sample header, this limits us to 8191 32 bit sample pairs at a time. I wound up believing that the overhead in terms of additional packet framing is worth it, because always encoding 4 byte lengths felt like overkill, and a dynamic length scheme ballooned codepaths in the decoder that I was trying to keep as easy to change as possible as I worked with the format.
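The 8191 figure follows from simple arithmetic, sketched here in Python. The function name is hypothetical; the constants are the 65535 byte length limit and the one-byte sample header mentioned above.

```python
MAX_PACKET_VALUE_BYTES = 0xFFFF  # largest length a u16 can express
SAMPLE_ID_BYTES = 1              # the u8 sample header


def max_iq_pairs(scalar_bytes: int) -> int:
    """Whole IQ pairs that fit in one Samples packet for a scalar width."""
    pair_bytes = 2 * scalar_bytes  # one I scalar plus one Q scalar
    return (MAX_PACKET_VALUE_BYTES - SAMPLE_ID_BYTES) // pair_bytes
```

For 32 bit scalars each pair takes 8 bytes, so `max_iq_pairs(4)` gives (65535 - 1) // 8 = 8191.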

Freexian Collaborators: Debian Contributions: Debusine projects in GSoC, Debian CI updates, Salsa CI maintenance and more! (by Anupa Ann Joseph)

Debian Contributions: 2026-03 Contributing to Debian is part of Freexian's mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

Debusine projects in Google's Summer of Code While Freexian initiated Debusine, and is investing a lot of resources in the project, we manage it as a true free software project that can and should have a broader community. We always had documentation for new contributors and we aim to be reactive with them when they interact via the issue tracker or via merge requests. We decided to put those intentions under stress tests by proposing five projects for Google's Summer of Code as part of Debian's participation in that program. Given that at least 11 candidates managed to get their merge request accepted in the last 30 days (interacting with the development team is part of the pre-requisites to apply to Google Summer of Code projects these days), the contributing experience must not be too bad. If you want to try it out, we maintain a list of quick fixes that are accessible to newcomers. And as always, we welcome your feedback!

Debian CI: incus backend and upgrade to Bootstrap 5, by Antonio Terceiro debci 3.14 was released on March 4th, with a followup 3.14.1 release with regression fixes a few days afterwards. Those releases were followed by new development and maintenance work that will provide extra capabilities and stability to the platform. This month saw the initial version of an incus backend land in Debian CI. The transition into the new backend will be done carefully so as to not disrupt testing migration. Each package will be running jobs with both the current lxc backend and with incus. Packages that have the same result on both backends will be migrated over, and packages that exhibit different results will be investigated further, resulting in bug reports and/or other communication with the maintainers. On the frontend side, the code has been ported to Bootstrap 5 over from the now ancient Bootstrap 3. This need has been originally reported back in 2024 based on the lack of security support for Bootstrap 3. Beyond improving maintainability, this upgrade also enables support for dark mode in debci, which is still work in progress. Both updates mentioned in this section will be available in a following debci release.

Salsa CI maintenance by Santiago Ruano Rincón et al. Santiago reviewed some Salsa CI issues and reviewed associated merge requests. For example, he investigated a regression (#545), introduced by the move to sbuild, on the use of extra repositories configured as .source files; and reviewed the MR (!712) that fixes it. Also, there were conflicts with changes made in debci 3.14 and debci 3.14.1 (those updates are mentioned above), and different people have contributed to fixing the subsequent issues in a long-term way. This includes Raphaël, who proposed MR !707 and who also suggested that Antonio merge the Salsa CI patches to avoid similar errors in the future. This happened shortly after. Those fixes finally required the unrelated MR !709, which will prevent similar problems when building images. To identify bugs related to the autopkgtest support in the backport suites as early as possible, Santiago proposed MR !708. Finally, Santiago, in collaboration with Emmanuel Arias, also had exchanges with GSoC candidates for the Salsa CI project, including on the contributions they have made as merge requests. It is important to note that there are several very good candidates interested in participating. Thanks a lot to them for their work so far!

Miscellaneous contributions
  • Raphaël reported a zim bug affecting Debian Unstable users, which was apparently already fixed in git. He could thus cherry-pick the fix and update the package in Debian Unstable.
  • Carles created a new page on the InstallingDebianOn in Debian Wiki.
  • Carles submitted translation errors in the debian-installer Weblate.
  • Carles, using po-debconf-manager, improved Catalan translations: reviewed and submitted 3 packages. Also improved error handling when forking or submitting an MR if the fork already existed.
  • Carles kept improving check-relations: code base related general improvements (added strict typing, enabled pre-commit). Also added DebPorts support, virtual packages support and added commands for reporting missing relations and importing bugs from bugs.debian.org.
  • Antonio handled miscellaneous Salsa support requests.
  • Antonio improved the management of MiniDebConf websites by keeping all non-secret settings in git and fixed exporting these sites as static HTML.
  • Stefano uploaded routine updates to hatchling, python-mitogen, python-virtualenv, python-discovery, dh-python, pypy3, python-pipx, and git-filter-repo.
  • Faidon uploaded routine updates to crun, libmaxminddb, librdkafka, lowdown, platformdirs, python-discovery, sphinx-argparse-cli, tox, tox-uv.
  • Stefano and Santiago continued to help with DebConf 26 preparations.
  • Stefano reviewed some contributions to debian-reimbursements and handled admin for reimbursements.debian.net.
  • Stefano attended the Debian Technical Committee meeting.
  • Helmut sent 8 patches for cross build failures.
  • Building on the work of postmarketOS, Helmut managed to cross build systemd for musl in rebootstrap and sent several patches in the process.
  • Helmut reviewed several MRs of Johannes Schauer Marin Rodrigues expanding support for DPKG_ROOT to support installing hurd.
  • Helmut incorporated a final round of feedback for the Multi-Arch documentation in Debian policy, which finally made it into unstable together with documentation of Build-Profiles.
  • In order to fix python-memray, Helmut NMUed libunwind generally disabling C++ exception support as being an incompatible duplication of the gcc implementation. Unfortunately, that ended up breaking suricata on riscv64. After another NMU, python-memray finally migrated.
  • Thorsten uploaded new upstream versions of epson-inkjet-printer-escpr and sane-airscan. He also fixed a packaging bug in printer-driver-oki. As of systemd 260.1-1 the configuration of lpadmin has been added to the sysusers.d configuration. All printing packages can now simply depend on the systemd-sysusers package and don't have to take care of its creation in maintainer scripts anymore.
  • In collaboration with Emmanuel Arias, Santiago had exchanges with GSoC candidates and reviewed the proposals of the Linux livepatching GSoC 2026 project.
  • Colin helped to fix CVE-2026-3497 in openssh and CVE-2026-28356 in multipart.
  • Colin upgraded tango and pytango to new upstream releases and packaged pybind11-stubgen (needed for pytango), thanks to a Freexian customer. Tests of reproducible builds revealed that pybind11-stubgen didn't generate imports in a stable order; this is now fixed upstream.
  • Lucas fixed CVE-2025-67733 and CVE-2026-21863 affecting src:valkey in unstable and testing. Also reviewed the same fixes targeting stable proposed by Peter Wienemann.
  • Faidon worked with upstream and build-dep Debian maintainers on resolving blockers in order to bring pyHanko into Debian, starting with the adoption of python-pyhanko-certvalidator. pyHanko is a suite for signing and stamping PDF files, and one of the few libraries that can be leveraged to sign PDFs with eIDAS Qualified Electronic Signatures.
  • Anupa co-organized MiniDebConf Kanpur and attended the event with many others from all across India. She handled the accommodation arrangements along with the registration team members, worked on the budget and expenses. She was also a speaker at the event.
  • Lucas helped with content review/schedule for the MiniDebConf Campinas. Thanks Freexian for being a Gold sponsor!
  • Lucas organized and took part in a one-day in-person sprint to work on Ruby 3.4 transition. It was held in a coworking space in Brasilia - Brazil on April 6th. There were 5 DDs and they fixed multiple packages FTBFSing against Ruby 3.4 (coming to unstable soon hopefully). Lucas has been postponing a blog post about this sprint since then :-)

14 April 2026

Russell Coker: Furilabs FLX1s Finally Working

I've been using the Furilabs FLX1s phone [1] as my daily driver for 6 weeks. It's a decent phone, not as good as I hoped but good enough to use every day and rely on for phone calls about job interviews etc. I intend to keep using it as my main phone and as a platform to improve phone software in Debian, as you really can't effectively find bugs unless you use the platform for important tasks.

Support Problems I previously wrote about the phone after I received it without a SIM caddy on the 13th of Jan. I had a saga with support about this. On the 16th of Jan one support person said that they would ship it immediately but didn't provide a tracking number or any indication of when it would arrive. On the 5th of Feb I contacted support again and asked how long it would be; the new support person seemed to have no record of my previous communication but said that they would send it. On the 17th of Feb I made another support request, including asking for a way of direct communication as the support email came from an address that wouldn't accept replies, and I was asked for a photo showing where the problem is. The support person also said that they might have to send a replacement phone! The last support request I sent included my disappointment at the time taken to resolve the issue and at the proposed solution of replacing the entire phone (why have two international shipments of a fragile and expensive phone when a single letter with a cheap SIM caddy would do?). I didn't receive a reply but the SIM caddy arrived on the 2nd of Mar. One thing that should be noted is that some of the support people seemed to be very good at their jobs and they were all friendly. It was the system that failed here, turning a minor issue of a missing part into a 6 week saga. Furilabs needs to do the following to address this issue:
  1. Make it possible to reply directly to a message from a support person. Accept email with a custom subject to sort it, give a URL for a web form, anything. Collating discussions with a customer allows giving better support while taking less time for the support people.
  2. Have someone monitor every social media address that is used by the company. When someone sends a support request in a public Mastodon post it indicates that something has gone wrong and you want to move quickly to resolve it.
  3. Take care of the little things, like sending a tracking number for every parcel. If it's something too small for a parcel (the SIM caddy could have fit in a regular letter) then just tell the customer what date it was posted and where it was posted from so they have some idea of when it will arrive.
This is not just a single failure of Furilabs support, it's a systemic failure of their processes.

Problems I Will Fix Unless Someone Beats Me to it Here are some issues I plan to work on.

Smart Watch Support I need to port one of the smart watch programs to Debian. Also I want to make one of them support the Colmi P80 [2]. A smart watch significantly increases the utility of a phone even though IMHO they aren't doing nearly all the things that they could and should do. When we get Debian programs talking to the PineTime it will make a good platform for development of new smart phone and OS features.

Nextcloud I have ongoing issues with my test Nextcloud installation on a Debian VM not allowing connection from the Linux desktop app (as packaged in Debian) and from the Android client (from f-droid). The desktop client works with a friend's Nextcloud installation on Ubuntu so I may try running it on an Ubuntu VM I run while waiting for the Debian issue to get resolved. There was a bug recently fixed in Nextcloud that appears related so maybe the next release will fix it. For the moment I've been running without these features and I call and SMS people from knowing their number or just returning calls. Phone calls generally aren't very useful for me nowadays except when applying for jobs. If I could deal with recruiters and hiring managers via video calls then I would consider just not having a phone number.

Wifi IPv6 Periodically IPv6 support just stops working and I can't ping the gateway. I turn wifi off and on again and it works. This might be an issue with my wifi network configuration or with the way I have configured my IPv6 networking, although that problem doesn't happen with any of my laptops.

Chatty Sorting Chatty is the program for SMS that is installed by default (part of the phosh/phoc setup); it also does Jabber.
Version 0.8.7 is installed, which apparently has some Furios modifications, and it doesn't properly support sorting SMS/Jabber conversations. Version 0.8.9 from Debian sorts in the same way as most SMS and Jabber programs, with the most recent at the top. But the Debian version doesn't support Jabber (only SMS and Matrix). When I went back to the Furilabs version of Chatty it still sorted for a while but then suddenly stopped. Killing Chatty (not just closing the window and reopening it) seems to make it sort the conversations sometimes.

Problems for Others to Fix Here are the current issues I have, starting with the most important.

Important The following issues seriously reduce the usability of the device.

Hotspot The Wifi hotspot functionality wasn't working for a few weeks; this Gitlab issue seems to match it [3]. It started working correctly for a day and I was not sure if an update I applied fixed the bug or if it's some sort of race condition that worked for this boot and will return next time I reboot it. Later on I rebooted it and found that it's somewhat random whether it works or not. Also while it is mostly working it seemed to stop working about every 25 minutes or so and I had to turn it off and on again to get it going. On another day it went to a stage where it got repeated packet loss when I pinged the phone as a hotspot from my laptop. A pattern of 3 ping responses and 3 "Destination Host Unreachable" messages was often repeated. I don't know if this is related to the way Android software is run in a container to access the hardware.

4G Reliability Sometimes 4G connectivity has just stopped; sometimes I can stop and restart the 4G data through software to fix it and sometimes I need to use the hardware switch. I haven't noticed this for a week or two so there is a possibility that one fix addressed both Hotspot and 4G. One thing that I will do is set up monitoring to give an alert on the phone if it can't connect to the Internet.
I don't want it to just quietly stop doing networking stuff and not tell me!

On-screen Keyboard The compatibility issues of the GNOME and KDE on-screen keyboards are getting me. I use phosh/phoc as the login environment as I want to stick to defaults at first to not make things any more difficult than they need to be. When I use programs that use QT such as Nheko the keyboard doesn't always appear when it should and it forgets the setting for word completion (which means spelling correction). The spelling correction system doesn't suggest replacing "dont" with "don't", which is really annoying as a major advantage for spelling checkers on touch screens is inserting an apostrophe. An apostrophe takes at least 3 times longer than a regular character and saving that delay makes a difference to typing speed. The spelling correction doesn't correct two words run together.

Medium Priority These issues are ongoing annoyances.

Delay on Power Button In the best case scenario this phone has a much slower response to pressing the power button than the Android phones I tested (Huawei Mate 10 Pro and Samsung Galaxy Note 9) and a much slower response than my recollection of the vast majority of Android phones I've ever used. For testing, pressing buttons on the phones simultaneously resulted in the Android phone screens lighting up much sooner. Something like 200ms vs 600ms; I don't have a good setup to time these things but it's very obvious when I test. In a less common case scenario (the phone having been unused for some time) the response can be something like 5 seconds. The worst case scenario is something in excess of 20 seconds. For UI designers, if you get multiple press events from a button that can turn the screen on/off please make your UI leave the screen on and ignore all the stacked events. Having the screen start turning on and off repeatedly when the phone recovers and processes all the button presses isn't good, especially when each screen flash takes half a second.
Notifications Touching on a notification for a program often doesn't bring it to the foreground. I haven't yet found a connection between when it does and when it doesn't. Also the lack of icons in the top bar on the screen to indicate notifications is annoying, but that seems to be an issue of design not the implementation.

Charge Delay When I connect the phone to a power source there is a delay of about 22 seconds before it starts to charge. Having it miss 22 seconds of charge time is no big deal, but having to wait 22 seconds to be sure it's charging before leaving it is really annoying. Also the phone makes an audible alert when it gets to 0% charge, which woke me up one night when I had failed to push the USB-C connector in hard enough. This phone requires a slightly deeper connector than most phones so with some plugs it's easy to not quite insert them far enough.

Torch aka Flash The light for the torch or flash for camera is not bright at all. In a quick test staring into the light from 40cm away wasn't unpleasant, compared to my Huawei Mate 10 Pro which has a light bright enough that it hurts to look at it from 4 meters away. Because of this photos at night are not viable, not even when photographing something that's less than a meter away. The torch has a brightness setting which doesn't seem to change the brightness, so it seems likely that this is a software issue and the brightness is set at a low level and the software isn't changing it.

Audio When I connect to my car the Lollypop player starts playing before the phone directs audio to the car, so the music starts coming from the phone for about a second. This is an annoying cosmetic error. Sometimes audio playing pauses for no apparent reason. It doesn't support the phone profile with Bluetooth so phone calls can't go through the car audio system. Also it doesn't always connect to my car when I start driving; sometimes I need to disable and enable Bluetooth to make it connect.
When I initially set the phone up Lollypop would send the track name when playing music through my car (Nissan LEAF) Bluetooth connection. After an update that often doesn't happen (sometimes it does work), so the car doesn't display the track name or whether the music is playing, but the pause icon works to pause and resume music. About 30 seconds into a phone call it switches to hands-free mode while the icon to indicate hands-free is not highlighted, so I have to press the hands-free button twice to get it back to normal phone mode.

Low Priority I could live with these things remaining as-is but it's annoying.

Ticket Mode There is apparently some code written to display tickets on screen without unlocking. I want to get this working and store screen-caps of the Android barcode screens of the different loyalty cards so I can scan them without unlocking. My threat model does not include someone trying to steal my phone to get a free loaf of bread on the bakery loyalty program.

Camera The camera app works with both the back and front cameras, which is nice, and sadly based on my experience with other Debian phones it's noteworthy. The problem is that it takes a long time to take a photo, something like a second after the button is pressed, long enough for you to think that it just silently took a photo and then move the phone. The UI of the furios-camera app is also a little annoying: when viewing photos there is an icon at the bottom left of the screen for a video camera and an icon at the bottom right with a cross. Which every time makes me think "record videos" and "leave this screen", not "return to taking photos" and "delete current photo". I can get used to the surprising icons, but being so slow is a real problem.

GUI App Installation The program for managing software doesn't work very well. It said that there were two updates for the Mesa package needed, but didn't seem to want to install them. I ran flatpak update as root to fix that.
The process of selecting software defaults to including non-free, and most of the available apps are for desktop/laptop with no way to search for phone/tablet apps. Generally I think it's best to just avoid this and use apt and flatpak directly from the command-line. Being able to ssh to my phone from a desktop or laptop is good!

Android Emulation

The file /home/furios/.local/share/andromeda/data/system/uiderrors.txt is created by the Andromeda system which runs Android apps in a LXC container and appears to grow without end. After using the phone for a month it was 3.5G in size. The disk space usage isn't directly a problem, out of the 110G storage space only 17G is used and I don't have a need to put much else on it, even if I wanted to put backups of /home from my laptop on it when travelling that would still leave plenty of free space. But that sort of thing is a problem for backing up the phone and wasting 3.5G out of 110G total is a fairly significant step towards breaking the entire system. Also having lots of logging messages from a subsystem that isn't even being used is a bad sign. I just tried using it and it doesn't start from either the settings menu or from the f-droid icon. Android isn't that important to me as I want to get away from the proprietary app space so I won't bother trying this any more.

Unfixable Problems

Unlocking

After getting used to fingerprint unlocking going back to a password is a pain. I think that the hardware isn't sufficient for modern quality face recognition that can't be fooled by a photo and there isn't fingerprint hardware. When I first used an Android phone using a pin to unlock didn't seem like a big deal, but after getting used to fingerprint unlock it's a real drag to go without. This is a real annoyance when doing things like checking Wikipedia while watching TV. This phone would be significantly improved with a fingerprint sensor or a camera that worked well enough for face unlock.
Plasma Mobile

According to Reddit, Plasma Mobile (KDE for phones) doesn't support Halium and can never work on this phone because of it [4]. This is one of a number of potential issues with the phone; running on hardware that was never designed for open OSs is always going to have issues.

Wifi MAC Address

The MAC keeps changing on reboot so I can't assign a permanent IPv4 address to the phone. It appears from the MAC prefix of 00:08:22 that the network hardware is made by InPro Comm which is well known for using random addresses in the products it OEMs. They apparently have one allocation of 2^24 addresses and each device randomly chooses a MAC from that range on boot. In the settings for a Wifi connection the "Identity" tab has a field named "Cloned Address" which can be set to "Stable for SSID", which prevents it from changing and allows a static IP address allocation from DHCP. It's not ideal but it works. Network Manager can be configured to have a permanent assigned MAC address for all connections or for just some connections. In the past I have copied MAC addresses from ethernet devices that were being discarded and used them for such things. For the moment the "Stable for SSID" setting does what I need but I will consider setting a permanent address at some future time.

Docks

Having the ability to connect to a dock is really handy. The PinePhonePro and Librem5 support it and on the proprietary side a lot of Samsung devices do it with a special desktop GUI named Dex and some Huawei devices also have a desktop version of the GUI. It's unfortunate that this phone can't do it.

The Good Things

It's good to be able to ssh in to my phone, even if the on-screen keyboard worked as well as the Android ones it would still be a major pain to use when compared to a real keyboard.
The phone doesn't support connecting to a dock (unlike Samsung phones I've used for which I found Dex to be very useful with a 4K monitor and proper keyboard) so ssh is the best way to access it. This phone has very reliable connections to my home wifi. I've had ssh sessions from my desktop to my phone that have remained open for multiple days. I don't really need this, I've just forgotten to logout and noticed days later that the connection is still running. None of the other phones running Debian could do that. Running the same OS on desktop and phone makes things easier to test and debug. Having support for all the things that Linux distributions support is good. For example none of the Android music players support all the encodings of audio that comes from YouTube so to play all of my music collection on Android I would need to transcode most of them which means either losing quality, wasting storage space, or both, while Lollypop plays FLAC, mp3, m4a, mka, webm, ogg, and more.

Conclusion

This is a step towards where I want to go but it's far from the end goal. The PinePhonePro and Librem5 are more open hardware platforms which have some significant benefits. But the battery life issues make them unusable for me. Running Mobian on a OnePlus 6 or Droidian on a Note 9 works well for the small tablet features but without VoLTE. While the telcos have blocked phones without VoLTE, data devices still work, so if recruiters etc would stop requiring phone calls then I could make one of them an option. The phone works well enough that it could potentially be used by one of my older relatives. If I could ssh in to my parents' phones when they mess things up that would be convenient. I've run this phone as my daily driver since the 3rd of March and it has worked reasonably well. 6 weeks compared to my previous use of the PinePhonePro for 3 days. This is the first time in 15 years that a non-Android phone has worked for me personally.
I have briefly used an iPhone 7 for work which basically did what it needed to do, it was at the bottom of the pile of unused phones at work and I didn't want to take a newer iPhone that could be used by someone who's doing more than the occasional SMS or Slack message. So this is better than it might have been, not as good as I hoped, but a decent platform to use while developing for it.

Ravi Dwivedi: Hungary Visa

The annual LibreOffice conference 2025 was held in Budapest, Hungary, from the 3rd to the 6th of September 2025. Thanks to The Document Foundation (TDF) for sponsoring me to attend the conference. As Hungary is a part of the Schengen area, I needed a Schengen visa to attend the conference. In order to apply for a Schengen visa, one needs to get an appointment at VFS Global and submit all the required documents there, which are then forwarded to the embassy.

I got an appointment for a Hungary visa at VFS Global in New Delhi for the 24th of July. There were many appointment slots available for the Hungary visa. One could easily get an appointment for the next day at the Delhi center. There were some technical problems on the VFS website, though, as I was unable to upload a scanned copy of my passport while booking the appointment. I got an error saying, "Unfortunately, you have exceeded the maximum upload limit." The problem didn't get fixed even after contacting the VFS helpline. They asked me to try the Firefox browser and delete the cache, which I had already done. So I created another account with a different email address and phone number, after which I was able to upload my passport and book an appointment. Other conference attendees from India also reported facing some technical issues on the VFS Hungary website.

Anyway, I went to the VFS Hungary application center as per my appointment on the 24th of July. Going inside, I located the Hungary visa application counter. There were two applicants ahead of me. When it was my turn, the VFS staff warned me that my passport was damaged. The damage was on the bio-data page. All the details could be seen, but the lamination of the details page had worn off a bit. They asked me to write an application to the Embassy of Hungary in New Delhi stating that I insisted that VFS submit my application, and describing the damage on my passport. I got a bit worried about my application getting rejected due to the damage.
But I decided to gamble my money on this one, as I didn't have time (and energy) to apply for a new passport before this trip. Moreover, I had struck down a couple of fields in my visa application form which were not applicable to me, due to which the VFS staff asked me to fill out another visa application. After this, the application got submitted, and it cost 11,000 INR (including the fee to book the appointment at VFS). Here is the list of documents I submitted: It took 2 hours for me to submit my visa application, even though there were only two applicants before me. This was by far the longest time to submit a Schengen visa application for me.

Fast-forward to the 30th of July, and I received an email from the Embassy of Hungary asking me to submit an additional document (a paid air ticket) for my application. I had only submitted dummy flight tickets, and they had been enough for the Schengen visas I applied for until now. This was the first time a country was asking me to submit a confirmed flight ticket during the visa process. I consulted my travel agent on this, and they were fairly confident that I would get the visa if the embassy was asking me to submit confirmed flight tickets. So I asked the travel agent to book the flight tickets. These tickets were 78,000, and the airline was Emirates. Then, I sent the flight tickets to the embassy by email.

The embassy sent the visa results on the 6th of August, which I received the next day. My visa had been approved! It took 14 days for me to get the Hungary visa after submitting the application. See you in the next one! Thanks to Badri for proofreading.

12 April 2026

Colin Watson: Free software activity in March 2026

My Debian contributions this month were all sponsored by Freexian. You can also support my work directly via Liberapay or GitHub Sponsors. OpenSSH I fixed CVE-2026-3497 in unstable, thanks to a fix in Ubuntu by Marc Deslauriers. Relatedly, I applied an Ubuntu patch by Athos Ribeiro to not default to weak GSS-API exchange algorithms. I'm looking forward to being able to split out GSS-API key exchange support in OpenSSH once Ubuntu 26.04 LTS has been released! This stuff will still be my problem, but at least it won't be in packages that nearly everyone has installed. Python packaging New upstream versions: I packaged pybind11-stubgen, needed for new upstream versions of pytango. Tests of reproducible builds revealed that it didn't generate imports in a stable order; I contributed a fix for that upstream. I worked with the security team to release DSA-6161-1 in multipart, fixing CVE-2026-28356 (upstream discussion). (Most of the work for this was in February, but the vulnerability was still embargoed when I published my last monthly update.) In trixie-backports, I updated pytest-django to 4.12.0. I fixed a number of packages to support building with pyo3 0.28: Other build/test failures: Rust packaging New upstream versions: Other bits and pieces I upgraded tango to 10.1.2, and yubihsm-shell to 2.7.2. Code reviews

1 April 2026

Ben Hutchings: FOSS activity in March 2026

31 March 2026

Russ Allbery: Review: Code Blue Emergency

Review: Code Blue Emergency, by James White
Series: Sector General #7
Publisher: Orb
Copyright: 1987
Printing: May 2003
ISBN: 0-7653-0663-8
Format: Trade paperback
Pages: 252
Code Blue Emergency (annoying em-dash in original title) is the seventh book of James White's Sector General science fiction series about a vast multi-species hospital station. While there are some references to (and spoilers for) earlier books in the series, you don't have to remember the previous books to read this one. I had no trouble despite a nine-year gap. I read this as part of the Orb General Practice omnibus, which collects this novel and The Genocidal Healer. Cha Thrat is a Sommaradvan warrior-surgeon, member of a newly-discovered species that is beginning the process of contact with the Federation. She saved a Monitor corps human after an accident on her world, performing some highly competent surgery on a species she had never seen before. That plus her somewhat outcast status on her own world due to her very traditional attitude towards medical ethics led Sector General to extend an offer of medical internship, and led her to leap into the unknown by accepting. This may have been a mistake; there is a great deal that Sector General does not understand about Sommaradvan medical ethics. This series entry is another proper (if somewhat episodic) novel and the first book of the series that doesn't primarily focus on Conway. He makes an appearance in his new role as Diagnostician, but only as a supporting character. Code Blue Emergency is told in the tight third-person perspective of Cha Thrat, an alien who finds many things about Sector General baffling, confusing, and ethically troubling (and who therefore provides a good reader surrogate for reintroducing the basics of how the hospital works). Using an alien viewpoint is a more sophisticated narrative technique than White has used previously. I'm glad he tried it, and it mostly works, although I have some complaints.
Cha Thrat comes from the middle caste of a strictly hierarchical society of three castes, but is also immensely stubborn and used to a medical system in which doctors take sole responsibility for their patients. This creates a lot of cultural conflicts, and I do enjoy science fiction where the human attitudes are portrayed as the strange ones, but the cultural analysis offered by this novel is not very deep. The pattern of this book is for Cha Thrat to stumble into a successful approach to a problem while being either oblivious to or hostile to the normal hierarchical structure expected of medical trainees. This is believable as far as it goes. She is a skilled and intelligent doctor with some good instincts and a strong commitment to patient care, but is also culturally inclined to not ask for help. It makes sense for that to be a serious problem in a hospital. Unfortunately, no one says this directly. Sector General staff get quite upset in ways that seem more territorial than oriented towards patient safety, no one directly explains to Cha Thrat why following a process is important or shows examples of what could go wrong, and plot armor means that her mistakes usually have positive outcomes. One can extrapolate the reasons why she is not a good medical student, but the reader is forced to do the extrapolation. This is the sort of book where the narration makes clear there are unresolved cultural clashes that are going to cause problems but hides the details. To Cha Thrat, her perspective is so obvious she never bothers to explain it to the reader, so the specifics come as a surprise. As with the alien perspective, I've seen this technique used with more subtlety and sophistication in other books, but White's version mostly works. Cha Thrat is a sympathetic protagonist because she is truly trying to take the most ethical and empathetic action in every situation and is clearly competent. 
Most of my frustration as a reader, ironically, lands on the other Sector General doctors who seem to make little to no effort to understand her perspective when she fails to conform to their expectations. This is believable in the abstract, but the whole point of Sector General is that they're supposed to be wiser about interspecies difference than this. Also, sometimes their reactions just seem petty. Cha Thrat has a very hierarchical concept of medicine that matches the social classes of her culture. For her, the highest tier of doctor are wizards who treat rulers, because the work of rulers is mostly mental and intellectual and therefore the diseases of rulers are treated with magic spells performed with words to reshape their thinking rather than surgery on their bodies. O'Mara and the other Sector General psychologists take great offense at this, muttering about being called witch doctors, which I found completely absurd. This is a comprehensible, if odd, description of psychology from a wholly alien species. Surely one's first reaction should be that words like "wizard" or "magic" are translation errors. Don't get offended; look to see if the underlying substance matches, which it clearly does. Apart from cultural and psychological clashes, Code Blue Emergency has the standard episodic Sector General structure of interesting medical mysteries that require lateral thinking. I find this sort of puzzle story satisfying, particularly given the firm belief of every character in an essentially pacifist and empathetic approach to even the most alien of creatures. This determined non-violence is one of the more interesting things about this series, and it continues here. White does tend towards both biological and gender essentialism for everyone other than the protagonist and main supporting characters, but he seemed to be walking back some of the more outrageous limitations on women that appeared in previous books. 
There is still some nonsense in here about how females of any species can't be Diagnosticians, but then Cha Thrat, who is female, seems to violate the justification for that rule over the course of this novel (sadly without comment). Perhaps he's setting up for proving Sector General wrong about this prejudice. I picked this up after reading Elizabeth Bear's Machine, which is essentially a (better written) Sector General novel that got me in the mood for reading more. I wouldn't give Code Blue Emergency any awards, but it delivered exactly what I was looking for. This series is not as deep or well-written as some more recent SF, but it is reliably itself and reliably entertaining. There are worse things in a series. Recommended if you're in the mood for alien ER in space. The omnibus edition that I read has an introduction to both novels by John Clute. It does add some interesting insights, but (as is somewhat typical for Clute) it also spoils parts of both books. You may want to read it after you read the novels. Followed by The Genocidal Healer. Rating: 7 out of 10

30 March 2026

Jamie McClelland: Mailman3 has 2 databases. Whoops.

At May First we have been carefully planning our migration of about 1200 lists from mailman2 to mailman3 for almost six months now. We did a lot of user communications, had several months of beta testing with a handful of lists ported over, and everything was looking good. So we kicked off the migration! But, about 15% of the way through I started seeing sqlite lock errors. Wait, what? I carefully re-configured mailman3 to use postgres, not sqlite. Well, yes, but apparently that was for the database managing the email list configuration, not the database powering the django web app, which, incidentally, also includes hundreds of gigabytes of archives. In other words, the one we really need in postgres, not sqlite.

Moving from sqlite to postgres

Well that sucks. We immediately stopped the migration to deal with this. I noticed that the web is full of useful django instructions on how to migrate your database from one database to another. However, if you read the fine print, those convenient looking dumpdata/loaddata workflows are designed to move the table definitions and a small amount of data. In our case, even after just 15% of our lists moved, our sqlite database was about 30GB. I considered some of the hacks to manage memory and try to run this via django, but eventually decided that pgloader was a more robust option. This option also allowed me to more easily test things out on a copy of our sqlite database (made while mailman was turned off). This way I could migrate and re-migrate the sqlite database over and over without impacting our live installation until I was satisfied it was all working. My first decision was to opt out of pgloader's schema creation. I used django's schema creation tool by:
  • Turning off mailman3 and mailman3-web and changing the mailman web configuration to use the new postgresql database.
  • Running mailman-web migrate
  • Changing the mailman web configuration back to sqlite and starting everything again.
Note: I tried just adding new database settings in the mailman web configuration indexed to "new" - django has the ability to define different databases by name, then you can run mailman-web migrate --database new. But, during the migration, I caught django querying the sqlite database for some migrations that required referencing existing fields (specifically hyperkitty's 0003_thread_starting_email). I didn't want any of these steps to touch the live database so I opted for the cleaner approach. Once I had a clean postgres schema, I dumped it so I could easily return to this spot. Next I started working on our pgloader load file. After a lot of trial and error, I ended with:
LOAD DATABASE
    FROM sqlite:///var/lib/mailman3/sqlite-postgres-migration/mailman3web.clean.backup.db
    INTO postgresql://mailmanweb:xxxxxxxxxxx@localhost:5432/mailmanweb

WITH data only,
    reset sequences,
    include no drop,
    disable triggers,
    create no tables,
    batch size = 5MB,
    batch rows = 500,
    prefetch rows = 50,
    workers = 2,
    concurrency = 1

SET work_mem to '64MB',
    maintenance_work_mem to '512MB'

CAST type datetime to timestamptz drop default drop not null,
    type date to date drop default drop not null,
    type int when (= precision 1) to boolean using tinyint-to-boolean,
    type text to varchar using remove-null-characters;
The batch, prefetch, workers and concurrency settings are all there to ensure memory doesn't blow up. I also discovered that I had to make some changes to the schema before loading data. Mostly truncating tables that the django migrate command populated to avoid duplicate key errors:
TRUNCATE TABLE django_migrations CASCADE;
TRUNCATE TABLE django_content_type CASCADE;
TRUNCATE TABLE auth_permission CASCADE;
TRUNCATE TABLE django_site CASCADE;
And also, I had to change a column type. Apparently the mailman import process allowed an attachment file name that exceeds the limit for postgres, but was allowed into sqlite:
ALTER TABLE hyperkitty_attachment ALTER COLUMN name TYPE text;
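SQLite does not enforce declared varchar lengths, which is how an over-long file name could get in there in the first place. As a rough sketch (not part of the original migration), a check like the following could be run against the sqlite copy to find such values before loading. The helper name overlong_values and the limit of 255 are assumptions for illustration; the table and column names are the ones from the ALTER TABLE statement above.

```python
import sqlite3


def overlong_values(conn, table, column, limit):
    """Return (rowid, length) pairs for values longer than `limit` characters."""
    # Identifiers cannot be bound as parameters, so they are quoted inline.
    # Only pass trusted table and column names here.
    query = (
        f'SELECT rowid, length("{column}") FROM "{table}" '
        f'WHERE length("{column}") > ? ORDER BY rowid'
    )
    return conn.execute(query, (limit,)).fetchall()


if __name__ == "__main__":
    # Demonstration on an in-memory database; the limit of 255 is an assumed
    # column size, not one taken from the post.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE hyperkitty_attachment (name varchar(255))")
    conn.executemany(
        "INSERT INTO hyperkitty_attachment (name) VALUES (?)",
        [("short.txt",), ("x" * 300,)],
    )
    print(overlong_values(conn, "hyperkitty_attachment", "name", 255))
```

Pointing the same function at a connection to the real sqlite copy would list the rows that need either a wider column or a shorter name.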
When pgloader runs, we still get a lot of warnings from pgloader, which wants to cast columns differently than django does. These are harmless (I was able to import the data without a problem). And there are still a lot of warnings along the lines of:
2026-03-30T14:08:01.691990Z WARNING PostgreSQL warning: constraint hyperkitty_vote_email_id_73a50f4d_fk_hyperkitty_email_id of relation hyperkitty_vote does not exist, skipping
These are harmless as well. They appear because disable triggers disables foreign key constraints. Without it, we wouldn't be able to load tables that require values in tables that have not yet been populated. After all the tweaking, the import of our 30GB sqlite database took about 40 minutes.

Final Steps

I think the reset sequences from pgloader should take care of this, but just in case:
mailman-web sqlsequencereset hyperkitty mailman_django auth | mailman-web dbshell
And, just to ensure postgres is optimized, run this in the psql shell:
ANALYZE VERBOSE;

Last thoughts

I understand very well all the decisions the mailman3 devs made in designing the next version of mailman, and if I was in the same place I might have made the same ones. For example, separating the code running the mailing list from the code managing the archives and the web interface makes perfectly good sense - many people might want to run just the mailing list part without a web interface. And building the web interface in django makes a lot of sense as well - why re-invent the wheel? I'm sure a lot of time and effort was saved by simply using the built in features you get for free with django. But the unfortunate consequence of these decisions is that sys admins have a much harder time. Almost everyone wants the email lists along with the web interface and the archives. But nobody wants two different configuration files with different syntaxes and logic, not to mention two different command lines to use for maintenance and configuration with completely different APIs. Trying to understand how to change a default template or set list defaults requires a lot of research and usually you have to write a python script to do it. I have finally come to the conclusion that mailman2 is designed for sys admins, while mailman3 is designed for developers. Despite these shortcomings, I am impressed with the community and their quick and friendly responses to the questions of a confused sys admin. That might be more valuable than anything else.

28 March 2026

Evgeni Golov: Converting Dovecot password schemes on the fly without (too much) cursing

I finally upgraded my mail server to Debian 13 and, as expected, the Dovecot part was quite a ride. The configuration syntax changed between Dovecot 2.3 (Debian 12) and Dovecot 2.4 (Debian 13), so I started first with diffing my configuration against a vanilla Debian 12 one (this setup is slightly old) and then applied the same (logical) changes to a vanilla Debian 13 one. This mostly went well. Mostly because my user database is stored in SQL and while the Dovecot Configuration Upgrader says it can convert old dovecot-auth-sql.conf.ext files to the new syntax, it only does so for the structure, not the SQL queries themselves. While I don't expect it to be able to parse the queries and adopt them correctly, at least a hint that the field names in userdb changed and might require adjustment would've been cool. Once I got that all sorted, Dovecot would still refuse to let me in:
Error: sql: Invalid password in passdb: Weak password scheme 'MD5-CRYPT' used and refused
Yeah, right. Did I mention that this setup is old? The quick cure against this is an auth_allow_weak_schemes = yes in /etc/dovecot/conf.d/10-auth.conf, but long term I really should upgrade the password hashes in the database to something more modern. And this is what this post is about.

My database only contains hashed (and salted) passwords, so I can't just update them without changing the password. And while there are only 9 users in total, I wanted to play nice and professional. (LOL) There is a Converting Password Schemes howto in the Dovecot documentation, but it uses a rather odd looking PHP script, wrapped in a shell script which leaks the plaintext password to the process list, and I really didn't want to remember how to write PHP to complete this task. Luckily, I know Python.

The general idea is: To make the plaintext password available to the post-login script, we add '%{password}' as userdb_plain_pass to the SELECT statement of our passdb query. The original howto also says to add a prefetch userdb, which we do. The sql userdb remains, as otherwise Postfix can't use Dovecot to deliver mail. Now comes the interesting part. We need to write a script that is executed by Dovecot's script-login and that will update the database for us. Thanks to Python's passlib and mysqlclient, the database and hashing parts are relatively straightforward:
#!/usr/bin/env python3
import os
import MySQLdb
import passlib.hash
DB_SETTINGS = {"host": "127.0.0.1", "user": "user", "password": "password", "database": "mail"}
SELECT_QUERY = "SELECT password_enc FROM mail_users WHERE username=%(username)s"
UPDATE_QUERY = "UPDATE mail_users SET password_enc=%(pwhash)s WHERE username=%(username)s"
SCHEME = "bcrypt"
EXPECTED_PREFIX = "$2b$"
def main():
    # https://doc.dovecot.org/2.4.3/core/config/post_login_scripting.html
    # https://doc.dovecot.org/2.4.3/howto/convert_password_schemes.html
    user = os.environ.get("USER")
    plain_pass = os.environ.get("PLAIN_PASS")
    if plain_pass is not None:
        db = MySQLdb.connect(**DB_SETTINGS)
        cursor = db.cursor()
        cursor.execute(SELECT_QUERY, {"username": user})
        result = cursor.fetchone()
        current_pwhash = result[0]
        if not current_pwhash.startswith(EXPECTED_PREFIX):
            hash_module = getattr(passlib.hash, SCHEME)
            pwhash = hash_module.hash(plain_pass)
            data = {"pwhash": pwhash, "username": user}
            cursor.execute(UPDATE_QUERY, data)
            # MySQLdb does not autocommit by default, so persist the new hash
            db.commit()
        cursor.close()
        db.close()
if __name__ == "__main__":
    main()
But if we add that as executable = script-login /etc/dovecot/dpsu.py to our imap-postlogin service, as the howto suggests, the users won't be able to login anymore:
Error: Post-login script denied access to user
WAT? Remember that shell script I wanted to avoid? It ends with exec "$@". Turns out the script-login "API" is rather interesting. It's not "pass in a list of scripts to call and I'll call all of them". It's "pass a list of scripts, I'll execv the first item and pass the rest as args, and every item is expected to execv the next one again". With that (cursed) knowledge, the script becomes:
#!/usr/bin/env python3
import os
import sys
import MySQLdb
import passlib.hash
DB_SETTINGS = {"host": "127.0.0.1", "user": "user", "password": "password", "database": "mail"}
SELECT_QUERY = "SELECT password_enc FROM mail_users WHERE username=%(username)s"
UPDATE_QUERY = "UPDATE mail_users SET password_enc=%(pwhash)s WHERE username=%(username)s"
SCHEME = "bcrypt"
EXPECTED_PREFIX = "$2b$"
def main():
    # https://doc.dovecot.org/2.4.3/core/config/post_login_scripting.html
    # https://doc.dovecot.org/2.4.3/howto/convert_password_schemes.html
    user = os.environ.get("USER")
    plain_pass = os.environ.get("PLAIN_PASS")
    if plain_pass is not None:
        db = MySQLdb.connect(**DB_SETTINGS)
        cursor = db.cursor()
        cursor.execute(SELECT_QUERY, {"username": user})
        result = cursor.fetchone()
        current_pwhash = result[0]
        if not current_pwhash.startswith(EXPECTED_PREFIX):
            hash_module = getattr(passlib.hash, SCHEME)
            pwhash = hash_module.hash(plain_pass)
            data = {"pwhash": pwhash, "username": user}
            cursor.execute(UPDATE_QUERY, data)
            # MySQLdb does not autocommit by default, so persist the new hash
            db.commit()
        cursor.close()
        db.close()
    os.execv(sys.argv[1], sys.argv[1:])
if __name__ == "__main__":
    main()
And the passwords are getting gradually updated as the users log in. Once all are updated, we can remove the post-login script and drop the auth_allow_weak_schemes = yes.
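To know when "all are updated", something has to look at the stored hashes. Here is a minimal sketch of that check, assuming the hashes are stored with their crypt-style prefix as in the script above. The names KNOWN_PREFIXES, scheme_of and pending_users are made up for this example, and the rows would come from a query like the SELECT above rather than from a hard-coded list.

```python
# Crypt-style prefixes for a few common schemes; anything else is "unknown".
KNOWN_PREFIXES = {
    "$1$": "md5-crypt",
    "$2b$": "bcrypt",
    "$6$": "sha512-crypt",
}


def scheme_of(pwhash):
    """Guess the hashing scheme from the prefix of a stored hash."""
    for prefix, name in KNOWN_PREFIXES.items():
        if pwhash.startswith(prefix):
            return name
    return "unknown"


def pending_users(rows, expected_prefix="$2b$"):
    """Return the sorted usernames whose hash does not use the target scheme."""
    return sorted(user for user, pwhash in rows if not pwhash.startswith(expected_prefix))


if __name__ == "__main__":
    # Hypothetical sample data standing in for (username, password_enc) rows.
    rows = [
        ("alice", "$1$salt$oldmd5crypthash"),
        ("bob", "$2b$12$alreadyconvertedhash"),
    ]
    for user, pwhash in rows:
        print(user, scheme_of(pwhash))
    print("still pending:", pending_users(rows))
```

Once pending_users returns an empty list, nobody depends on the weak scheme any more.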

27 March 2026

Paul Tagliamonte: librtlsdr.so for fun and profit

Interested in future updates? Follow me on mastodon at @paul@soylent.green. Posts about hz.tools will be tagged #hztools.
It's well known and universally agreed that radios are cool. Among the contested field of coolest radios, Software Defined Radios (SDRs) are definitely the most interesting to me. Out of all of the (entirely too many) SDRs I own, the rtlsdr is still my #1. It's just good. It's a great price, extremely capable, reliable, well-supported, and compact. Why bother with anything else? Sure, it can't transmit, uses a (fairly weird) 8 bit unsigned integer IQ representation, and has a limited sampling rate and limited frequency range, but even with all that, it's still the radio I will pack first. Don't get me wrong, I love my Ettus radios, PlutoSDRs, HackRFs, my AirspyHF+ - they're great! I just always find myself falling back to an rtl-sdr, every time. Perhaps the best reason to use an rtlsdr is the absolutely mind-boggling amount of cool stuff people have written for it. The rtlsdr API is super easy to use and widely supported; if you're building on top of existing radio processing frameworks, it's still a shock to me when something omits rtlsdr support.
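For anyone who hasn't met the 8 bit unsigned IQ representation mentioned above: samples arrive as interleaved I and Q bytes, each in the range 0 to 255. A minimal sketch of turning such a buffer into complex samples might look like this. The function name iq_u8_to_complex is made up for this example, and centring on 127.5 is just one common scaling convention, not necessarily what any particular tool does.

```python
def iq_u8_to_complex(raw: bytes) -> list[complex]:
    """Convert interleaved unsigned 8 bit I/Q bytes to complex samples in [-1, 1]."""
    if len(raw) % 2 != 0:
        raise ValueError("expected an even number of bytes (I/Q pairs)")
    # Assumed convention: the midpoint between 0 and 255 maps to zero.
    scale = 127.5
    return [
        complex((raw[i] - scale) / scale, (raw[i + 1] - scale) / scale)
        for i in range(0, len(raw), 2)
    ]


if __name__ == "__main__":
    # Two samples: the extreme corner values, then one near the centre.
    print(iq_u8_to_complex(bytes([0, 255, 128, 127])))
```

A real pipeline would do this with a vectorised library rather than a list comprehension, but the arithmetic is the same.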

sparky

Over the last 7 years, I've been learning about radios: I got my ham radio license (de K3XEC), hacked on some cool stuff where I've learned how radios work "by doing", and even was lucky enough to give my first rf-centric talk at districtcon. Embarrassingly, I still haven't gotten around to learning how the fancy stuff like GNU Radio works. I'm sure I'm going to love it when I do. As part of this, I've also cooked up some very unprofessional formats and protocols I use for convenience. Locally, all my on-disk captures are stored in rfcap or more recently arf, while direct SDR access at my house is almost entirely a mix of the widely used rtl-tcp protocol, and my riq protocol (post on this coming soon). Both rtl-tcp and riq operate over the network, so I don't have to bother with plugging things into USB ports, and I can share my radios with my friends. All of that work sits in my current generation of radio processing code, sparky (a reference to spark-gap transmitters), which is a heap of Rust, supporting everything from no_std for embedded experiments, conditional support for interfacing with all the radios I own, and tokio-based async support in addition to blocking i/o for highly concurrent daemons. This quickly advanced beyond my old Go-based code (hz.tools/go-sdr), which I archived so I can focus on learning. I still think Go is a great language to write RF code in, but I can't focus on that tech tree anymore. Of course, this now poses a new problem: no one supports my format(s) or radio protocol(s), since, well, I'm the only one using them. I've committed a fair amount of my hardware to this setup, and yanking it from the rack to try something out does pose a bit of a pickle. This isn't a huge deal for learning, but it does make it tedious to try out something from the internets.

librtlsdr.so

Thankfully, Rust has robust support for "wrap[ping itself] in a grotesque simulacra of C's skin and mak[ing its] flesh undulate", which is an attractive nuisance if I've ever seen one. Naturally, my ability to restrain myself from engaging in ill-advised rf adventures is basically zero, so it's time to do the thing any similarly situated person would do: reimplement the API and ABI of librtlsdr.so, backed with sparky instead. Since enumeration of devices is going to be annoying (specifically, they're over the network), I decided early on to rely on an explicit list of devices via a configuration file. I'd rather only load that once so programs don't get confused, so I opted to use a CTOR to run a stub when the ELF is linked at runtime.
// lightly edited for clarity

#[used]
#[expect(unused)]
#[unsafe(link_section = ".init_array")]
pub static INITIALIZE: extern "C" fn() = sparky_rtlsdr_ctor;

#[unsafe(no_mangle)]
pub extern "C" fn sparky_rtlsdr_ctor() {
    let config: Config = {
        if let Ok(config_bytes) = std::fs::read("/etc/sparky-rtlsdr.toml") {
            toml::from_slice(&config_bytes).unwrap()
        } else {
            Config { device: vec![] }
        }
    };
    CONFIG.set(config);
}
Next, it's time to start with the basics: opening and closing a handle using rtlsdr_open and rtlsdr_close. Given we don't control the runtime, and the rtl-sdr device handle is opaque (for good reason!), I opted to smuggle a rust Box<Device> non-FFI safe heap-allocated struct through the device handle pointer, and let C take ownership of the Box. No one should be looking in there anyway.
// lightly edited for clarity

#[unsafe(no_mangle)]
pub unsafe extern "C" fn rtlsdr_open(dev: *mut *mut Handle, index: u32) -> int {
    let config = &CONFIG.device[index as usize];
    let sdr = match config.load() {
        Ok(v) => v,
        Err(err) => {
            return -1;
        }
    };
    let handle = Box::new(Handle { config, sdr });
    unsafe { *dev = Box::into_raw(handle) };
    0
}

#[unsafe(no_mangle)]
pub unsafe extern "C" fn rtlsdr_close(dev: *mut Handle) -> int {
    let dev = unsafe { Box::from_raw(dev) };
    drop(dev);
    0
}
With that in place, we can chip away at the API surface, translating calls as best as we can. I won't bother listing it all, since it's not very interesting, but here's an example implementation of rtlsdr_set_sample_rate and rtlsdr_get_sample_rate. These calls are translating from an rtl-sdr frequency (which is a u32 containing the value as Hz) into a sparky Frequency type, and invoking get_sample_rate or set_sample_rate on the device's rust handle. Since each device implements the sparky Sdr trait, the actual underlying device doesn't matter much here.
#[unsafe(no_mangle)]
pub unsafe extern "C" fn rtlsdr_set_sample_rate(dev: *mut Handle, rate: u32) -> int {
    let dev = unsafe { &mut *dev };
    let rate = Frequency::from_hz(rate as i64);
    if let Err(err) = dev.sdr.set_sample_rate(dev.channel, rate) {
        return -1;
    }
    0
}

#[unsafe(no_mangle)]
pub unsafe extern "C" fn rtlsdr_get_sample_rate(dev: *mut Handle) -> u32 {
    let dev = unsafe { &mut *dev };
    let freq = match dev.sdr.get_sample_rate(dev.channel) {
        Ok(freq) => freq,
        Err(err) => {
            return 0;
        }
    };
    freq.as_hz() as u32
}
After repeating this process for the rest of the stubs I could (and otherwise setting error conditions if the functionality is not supported), I was ready to try it out. Within sparky, I patched my MockSDR (basically a Sdr traited Mock type) to implement the same testmode IQ protocol that the RTL-SDR has, and decided to see if rtl_test from apt without any changes could be fooled.
$ rtl_test
No supported devices found.
Great, cool. No devices plugged in. Looks great. Let's try it with my librtlsdr.so LD_PRELOAD-ed into the binary first:
$ LD_PRELOAD=target/release/librtlsdr.so rtl_test
Found 1 device(s):
 0: hz.tools, mock sdr, SN: totally legit no tricks

Using device 0: sparky mock sdr
Supported gain values (0):
Sampling at 2048000 S/s.

Info: This tool will continuously read from the device, and report if
samples get lost. If you observe no further output, everything is fine.

Reading samples in async mode...
^CSignal caught, exiting!

User cancel, exiting...
Samples per million lost (minimum): 0
$
Outstanding. Even more outstandingly, if I change my testmode implementation to skip samples, rtl_test correctly reports the errors. I think it's showing promise! On to try the real endgame here: let's have our new librtlsdr.so connect to an rtl-tcp endpoint and see if rtl_fm works:
LD_PRELOAD=target/release/librtlsdr.so \
 rtl_fm -d 1 -s 120k -E deemp -M fm -f 90.9M | \
 ffplay -f s16le -ar 120k -i -
Found 2 device(s):
 0: hz.tools, mock sdr, SN: totally legit no tricks
 1: hz.tools, rtl-tcp, SN: node2.rf.lan:1202

Using device 1: sparky rtltcp node2
Tuner gain set to automatic.
Tuned to 91170000 Hz.
Oversampling input by: 9x.
Oversampling output by: 1x.
Buffer size: 7.59ms
Sampling at 1080000 S/s.
Output at 120000 Hz.
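The numbers in that output can be reproduced with a little arithmetic. The sketch below reflects my understanding of how rtl_fm picks its settings (oversample until the capture rate passes roughly 1 MS/s, then tune a quarter of the capture rate above the requested frequency); treat both rules as assumptions rather than a reading of the rtl_fm source, and the function name as hypothetical.

```rust
/// Returns (oversampling factor, capture rate in S/s, tuned frequency in Hz)
/// for a requested frequency and output sample rate.
pub fn rtl_fm_plan(freq_hz: u32, rate_in: u32) -> (u32, u32, u32) {
    // Assumed rule: pick the smallest factor that pushes the rate above 1 MS/s.
    let downsample = 1_000_000 / rate_in + 1;
    let capture_rate = downsample * rate_in;
    // Assumed rule: offset the tuner by a quarter of the capture rate.
    let capture_freq = freq_hz + capture_rate / 4;
    (downsample, capture_rate, capture_freq)
}
```

With -f 90.9M and -s 120k this gives a factor of 9, a capture rate of 1080000 S/s, and a tuned frequency of 91170000 Hz, matching the log lines above.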
And there it was! Not the best audio quality (mostly due to my inability to correctly read the rtl_fm manpage to tune the filter and downsample/oversampling rates to audio), but it's definitely passable. I figured I'd try something that was a bit more interesting next: gqrx, since it's super handy, I use it a ton, and it will definitely amuse me to no end. To my surprise and delight, LD_PRELOAD=target/release/librtlsdr.so gqrx wound up running, and I saw my devices pop right up in the settings menu. Huge. Huge. Amazing. It did crash as soon as I tried to actually use the radio, but after fixing a few dangling bugs in the API surface (and some assumptions I think some underlying gnuradio driver may be making that I need to double check in the code), I was able to get a super solid stream of broadcast fm radio, with gqrx being none the wiser. It thought it was just talking to the device it knows as rtl=1. Nice. I can't wait to try this with the rest of the rtl-sdr based tools I like having around, using my riq protocol next. I don't think that'll be worth a post, but hopefully I'll get around to publishing details on that stack next.

epilogue

Well. That's it. End of story. A bit anticlimactic, sure. While this new shim will provide me endless minutes of mild amusement, I could see using this to expose my sparky testing utilities via librtlsdr.so: my mock sdr driver allows for replaying captures off disk, which could be interesting to make sure that signals are still properly decoded after changes, or to instrument performance changes (via SNR, BER, packets observed, etc) on reference samples I have on my NAS. Maybe that'll come in handy one day! Truth be told, I'm not sure I actually want to encourage anyone to do this for real (although I think I'll definitely be using it on my LAN to see what happens). I also don't have a repo to share. I don't particularly feel like dealing with the secondary effects of publishing sparky (and sparky-rtlsdr) yet, since I'm still getting my feet under me on the radio aspect of all this. I'll be sure to post updates if anything changes with this here (tagged sparky) and at @paul@soylent.green. I can't wait to post more about some of the odd sidequests (like this one!) I've completed over the last few years. I've been waiting to feel confident that my work has matured and has withstood the new problems I've thrown at it, and it largely has. It's my hope that these projects (and this project in particular) have provided a glimpse into the world of software defined radio for my systems friends, and a bit about systems for my radio friends. It's not all magic, and I hope someone out there feels inclined to have some fun with radios themselves!

21 March 2026

C.J. Collier: The WWW::Mechanize::Chrome Saga: A Comprehensive Narrative of PR #104

The WWW::Mechanize::Chrome Saga: A Comprehensive Narrative of PR #104

This document synthesizes the extensive work performed from March
13th to March 20th, 2026, to harden, stabilize, and refactor the
WWW::Mechanize::Chrome library and its test suite. This
effort involved deep dives into asynchronous programming,
platform-specific bug hunting, and strategic architectural
decisions.

Part I: The Quest for Cross-Platform Stability (March 13-16)

The initial phase of work focused on achieving a green test suite
across a variety of Linux distributions and preparing for a new release.
This involved significant hardening of the library to account for
different browser versions, OS-level security restrictions, and
filesystem differences.

Key Milestones & Engineering Decisions:
  • Fedora & RHEL-family Success: A major effort
    was undertaken to achieve a 100% pass rate on modern Fedora 43 and
    CentOS Stream 10. This required several key engineering decisions to
    handle modern browser behavior:
    • Decision: Implement Asynchronous DOM Serialization
      Fallback. Synchronous fallbacks in an async context are
      dangerous. To prevent "Resource was not cached" errors during
      saveResources, we implemented a fully asynchronous fallback
      in _saveResourceTree. By chaining
      _cached_document with DOM.getOuterHTML
      messages, we can reconstruct document content without blocking the event
      loop, even if Chromium has evicted the resource from its cache. This
      also proved resilient against Fedora's security policies, which often
      block file:// access.
    • Decision: Truncate Filenames for Cross-Platform
      Safety. To avoid "File name too long" errors,
      especially on Windows where the MAX_PATH limit is 260
      characters, filenameFromUrl was hardened. The filename
      truncation was reduced to a more conservative 150
      characters, leaving ample headroom for deeply nested CI
      temporary directories. Logic was also added to preserve file extensions
      during truncation and to sanitize backslashes from URI paths.
    • Decision: Expand Browser Discovery Paths. To
      support RHEL-based systems out-of-the-box, the
      default_executable_names was expanded to include
      headless_shell and search paths were updated to include
      /usr/lib64/chromium-browser/.
    • Decision: Mitigate Race Conditions with Stabilization Waits
      and Resilient Fetching. On fast systems,
      DOM.documentUpdated events could invalidate
      nodeIds immediately after navigation, causing XPath queries
      to fail with "Could not find node with given id". A small stabilization
      sleep(0.25s) was added after page loads to ensure the DOM
      is settled. Furthermore, the asynchronous DOM fetching loop was hardened
      to gracefully handle these errors by catching protocol errors and
      returning an empty string for any node that was invalidated during
      serialization, ensuring the overall process could complete.
  • Windows Hardening:
    • Decision: Adopt Platform-Aware Watchdogs. The test
      suite's reliance on ualarm was a blocker for Windows, where
      it is not implemented. The t::helper::set_watchdog function
      was refactored to use standard alarm() (seconds) on Windows
      and ualarm (microseconds) on Unix-like systems, enabling
      consistent test-level timeout enforcement.
  • Version 0.77 Release:
    • Decision: Adopt SOP for Version Synchronization.
      The project maintains duplicate version strings across 24+ files. A
      Standard Operating Procedure was adopted to use a batch-replacement tool
      to update all sub-modules in lib/ and to always run
      make clean and perl Makefile.PL to ensure
      META.json and META.yml reflect the new
      version. After achieving stability on Linux, the project version was
      bumped to 0.77.
  • Infrastructure & Strategic Work:
    • The ad2 Windows Server 2025 instance was restored and
      optimized, with Active Directory demoted and disk I/O performance
      improved.
    • A strategic proposal for the Heterogeneous Directory
      Replication Protocol (HDRP) was drafted and published.
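The filename truncation decision above (cap the length, keep the extension, sanitize backslashes) can be sketched as follows. This is an illustration in Rust rather than the library's Perl, and it is not the actual filenameFromUrl code: the function name, the underscore replacement, and the 10-character limit on what counts as an extension are all assumptions of this sketch.

```rust
/// Truncate `name` to at most `max_chars` characters while keeping a short
/// file extension, after replacing backslashes with underscores.
pub fn truncate_filename(name: &str, max_chars: usize) -> String {
    // Sanitize backslashes so a URI path cannot introduce Windows separators.
    let clean: String = name.chars().map(|c| if c == '\\' { '_' } else { c }).collect();
    if clean.chars().count() <= max_chars {
        return clean;
    }
    // Treat the text after the last dot as an extension when it is short.
    let (stem, ext) = match clean.rfind('.') {
        Some(pos) if pos > 0 && clean[pos..].chars().count() <= 10 => {
            (&clean[..pos], &clean[pos..])
        }
        _ => (clean.as_str(), ""),
    };
    // Shorten only the stem so the extension survives the cut.
    let keep = max_chars.saturating_sub(ext.chars().count());
    let mut out: String = stem.chars().take(keep).collect();
    out.push_str(ext);
    out
}
```

Calling it with a limit of 150 leaves roughly 110 characters of a 260-character MAX_PATH budget for the directory prefix, which is the headroom the decision describes.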

Part II: The Great Async Refactor (March 17-18)

Despite success on Linux, tests on the slow ad2 Windows
host were still plagued by intermittent, indefinite hangs. This
triggered a fundamental architectural shift to move the library's core
from a mix of synchronous and asynchronous code to a fully non-blocking
internal API.

Key Milestones & Engineering Decisions:
  • Decision: Expose a _future API.
    Instead of hardcoding timeouts in the library, the core strategy was to
    refactor all blocking methods (xpath, field,
    get, etc.) into thin wrappers around new non-blocking
    ..._future counterparts. This moved timeout management to
    the test harness, allowing for flexible and explicit handling of
    stalls.
    # Example library implementation
    sub xpath($self, $query, %options) {
        return $self->xpath_future($query, %options)->get;
    }

    sub xpath_future($self, $query, %options) {
        # Async implementation using $self->target->send_message(...)
    }
  • Decision: Centralize Test Hardening in a Helper.
    A dedicated test library, t/lib/t/helper.pm, was created to
    contain all stabilization logic. Safe wrappers (safe_get,
    safe_xpath) were implemented there, using
    Future->wait_any to race asynchronous operations against
    a timeout, preventing tests from hanging.
    # Example test helper implementation
    sub safe_xpath {
        my ($mech, $query, %options) = @_;
        my $timeout = delete $options{timeout} || 5;
        my $call_f = $mech->xpath_future($query, %options);
        my $timeout_f = $mech->sleep_future($timeout)->then(sub { Future->fail("Timeout") });
        return Future->wait_any($call_f, $timeout_f)->get;
    }
  • Decision: Refactor Node Attribute Cache.
    Investigations into flaky checkbox tests (t/50-tick.t)
    revealed that WWW::Mechanize::Chrome::Node was storing
    attributes as a flat list ([key, val, key, val]), which was
    inefficient for lookups and individual updates. The cache was refactored
    to definitively use a HashRef, providing O(1) lookups
    and enabling atomic dual-updates where both the browser property (via
    JS) and the internal library attribute are synchronized
    simultaneously.
  • Decision: Implement Self-Cancelling Socket
    Watchdog. On Windows, traditional watchdog processes often
    failed to detect parent termination, leading to 60-second hangs after
    successful tests. We implemented a new socket-based watchdog in
    t::helper that listens on an ephemeral port; the background
    process terminates immediately when the parent socket closes,
    eliminating these cumulative delays.
  • Decision: Deep Recursive Refactoring & Form
    Selection. To make the API truly non-blocking, the entire
    internal call stack had to be refactored. For example, making
    get_set_value_future non-blocking required first making its
    dependency, _field_by_name, asynchronous. This culminated
    in refactoring the entire form selection API (form_name,
    form_id, etc.) to use the new asynchronous
    _future lookups, which was a key step in mitigating the
    Windows deadlocks.
  • Decision: Fix Critical Regressions & Memory
    Cycles.
    • Evaluation Normalization: Implemented a
      _process_eval_result helper to centralize the parsing of
      results from Runtime.evaluate. This ensures consistent
      handling of return values and exceptions between synchronous
      (eval_in_page) and asynchronous (eval_future)
      calls.
    • Memory Cycle Mitigation: A significant memory
      leak was discovered where closures attached to CDP event futures (like
      for asynchronous body retrieval) would capture strong references to
      $self and the $response object, creating a
      circular reference. The established rule is to now always use
      Scalar::Util::weaken on both $self and any
      other relevant objects before they are used inside a
      ->then block that is stored on an object.
    • Context Propagation (wantarray): A
      major regression was discovered where Perl's wantarray
      context, which distinguishes between scalar and list context, was lost
      inside asynchronous Future->then blocks. This caused
      methods like xpath to return incorrect results (e.g., a
      count instead of a list of nodes). The solution was to adopt the "Async
      Context Pattern": capture wantarray in the synchronous
      wrapper, pass it as an option to the _future method, and
      then use that captured value inside the future's final resolution
      block.
      # Synchronous Wrapper
      sub xpath($self, $query, %options) {
          $options{wantarray} = wantarray; # 1. Capture
          return $self->xpath_future($query, %options)->get; # 2. Pass
      }

      # Asynchronous Implementation
      sub xpath_future($self, $query, %options) {
          my $wantarray = delete $options{wantarray}; # 3. Retrieve
          # ... async logic ...
          return $doc->then(sub {
              if ($wantarray) { # 4. Respect
                  return Future->done(@results);
              } else {
                  return Future->done($results[0]);
              }
          });
      }
    • Asynchronous Body Retrieval & Robust Content
      Fallbacks: Fixed a bug where decoded_content()
      would return empty strings by ensuring it awaited a
      __body_future. This was implemented by storing the
      retrieval future directly on the response object
      ($response->{__body_future}). To make this more robust,
      a tiered strategy was implemented: first try to get the content from the
      network response, but if that fails (e.g., for about:blank
      or due to cache eviction), fall back to a JavaScript
      XMLSerializer to get the live DOM content.
    • Signature Hardening: Fixed "Too few arguments"
      errors when using modern Perl signatures with
      Future->then. Callbacks were updated to use optional
      parameters (sub($result = undef) { ... }) to gracefully
      handle futures that resolve with no value.
    • XHTML Split-Brain Bug: Resolved a
      long-standing Chromium bug (40130141) where content provided via
      setDocumentContent is parsed differently than content
      loaded from a URL. A workaround was implemented: for XHTML documents,
      WMC now uses a JavaScript-based XPath evaluation
      (document.evaluate) against the live DOM, bypassing the
      broken CDP search mechanism.
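The node attribute cache refactor described earlier in this list replaces the flat [key, val, key, val] list that CDP returns with a keyed map. A minimal sketch of that conversion, written in Rust rather than the library's Perl and using a hypothetical function name, might look like this:

```rust
use std::collections::HashMap;

/// Turn a flat [name1, value1, name2, value2, ...] attribute list into a map.
/// A trailing name without a value is ignored in this sketch.
pub fn attributes_to_map(flat: &[&str]) -> HashMap<String, String> {
    flat.chunks_exact(2)
        .map(|kv| (kv[0].to_string(), kv[1].to_string()))
        .collect()
}
```

Once the attributes live in a map, reading or updating a single attribute is one keyed operation instead of a scan over the list, which is what makes the paired browser/cache update straightforward.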

Derived Architectural Rules & SOPs:
  • Rule: Always provide _future variants.
    Every library method that interacts with the browser via CDP must have a
    non-blocking asynchronous counterpart.
  • Rule: Centralize stabilization in the test layer.
    All timeout and retry logic should reside in the test harness
    (t/lib/t/helper.pm), not in the core library.
  • Rule: Explicitly propagate wantarray
    context. Synchronous wrappers must capture the caller's context
    and pass it down the Future chain to ensure correct
    scalar/list behavior.
  • Rule: The entire call chain must be asynchronous.
    To enable non-blocking timeouts, even a single hidden blocking call in
    an otherwise asynchronous method will cause a stall.
  • SOP: Reduce Library Noise. Diagnostic messages
    (warn, note, diag) should be
    removed from library code before commits. All such messages should be
    converted to use the internal $self->log('debug', ...)
    mechanism, ensuring a clean TAP output for CI systems.

Part III: The MutationObserver Saga (March 19)

With most of the library refactored to be asynchronous, one stubborn
test, t/65-is_visible.t, continued to fail with timeouts.
This led to an ambitious, but ultimately unsuccessful, attempt to
replace the wait_until_visible polling logic with a more
modern MutationObserver.

Key Milestones & Challenges:
  • The Theory: The goal was to replace an inefficient
    repeat { sleep } loop with an event-driven
    MutationObserver in JavaScript that would notify Perl
    immediately when an element's visibility changed.
  • Implementation & Cascade Failure: The
    implementation proved incredibly difficult and introduced a series of
    new, hard-to-diagnose bugs:
    1. An incorrect function signature for
      callFunctionOn_future.
    2. A critical unit mismatch, passing seconds from Perl to JavaScript's
      setTimeout, which expected milliseconds.
    3. A fundamental hang where the MutationObserver's
      JavaScript Promise would never resolve, even after the
      underlying DOM element changed.
  • Debugging Maze: Multiple attempts to fix the
    checkVisibility JavaScript logic inside the observer
    callback, including making it more robust by adding DOM tree traversal
    and extensive console.log tracing, failed to resolve the
    hang. This highlighted the opacity and difficulty of debugging complex,
    cross-language asynchronous interactions, especially when dealing with
    low-level browser APIs.

Procedural Learning: Granular Edits

The effort was plagued by procedural missteps in using automated
file-editing tools. Initial attempts to replace large code blocks in a
single operation led to accidental code loss and match failures.
  • Decision: Adopt "Delete, then Add" Workflow.
    Following forceful user correction, a new SOP was established for all
    future modifications:
    1. Isolate: Break the file into small, manageable
      chunks (e.g., 250 lines).
    2. Delete: Perform a delete operation by replacing
      the old code block with an empty string.
    3. Add: Perform an add operation by inserting the
      new code into the empty space.
    4. Verify: Verify each atomic step before
      proceeding.
    This granular process, while slower, ensured surgical
    precision and regained technical control over the large
    Chrome.pm module.
The consistent failure of the MutationObserver approach
eventually led to the decision to abandon it in favor of stabilizing the
original, more transparent implementation.

Part IV: Reversion and Final Stabilization (March 20)

After exhausting all reasonable attempts to fix the
MutationObserver, a strategic decision was made to revert
to the simpler, more transparent polling implementation and fix it
correctly. This proved to be the correct path to a stable solution.

Key Milestones & Engineering Decisions:
  • Decision: Perform Strategic Reversion. The
    MutationObserver implementation, when integrated via
    callFunctionOn_future with awaitPromise,
    proved fundamentally unstable. Its JavaScript promise would consistently
    fail to resolve, causing indefinite hangs. A decision was made to
    revert all MutationObserver code from
    WWW::Mechanize::Chrome.pm and restore the original
    repeat { sleep } polling mechanism. A stable,
    understandable solution was prioritized over an elegant but broken
    one.
  • Decision: Correct Timeout Delegation in the
    Harness. The root cause of the original timeout failure was
    identified as a race condition in the t/lib/t/helper.pm
    test harness. The safe_wait_until_* wrappers were
    implementing their own timeout (via wait_any and
    sleep_future) that raced against the underlying polling
    function's internal timeout. This led to intermittent failures on slow
    machines. The helpers were refactored to delegate all timeout
    management to the library's polling functions, ensuring a
    single, authoritative timer controlled the operation.
  • Decision: Optimize Polling Performance. At the
    user's request, the polling interval was reduced from 300ms to
    150ms. This modest performance improvement reduced the
    test suite's wallclock execution time by over a second while maintaining
    stability.
  • Decision: Tune Test Watchdogs. The global watchdog
    timeout was adjusted to 12 seconds, specifically calculated as 1.5x the
    observed real execution time of the optimized test. This provides a
    data-driven safety margin for CI.
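The "single, authoritative timer" idea from the timeout delegation decision above can be illustrated with a small polling helper. This is a sketch in Rust rather than the library's Perl, with a hypothetical poll_until name; it is not the wait_until_visible implementation, only the shape of a poll loop that owns its own deadline so callers do not race a second timer against it.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Poll `check` every `interval` until it returns true or `timeout` elapses.
/// The deadline is computed once, so there is exactly one timer in play.
pub fn poll_until<F: FnMut() -> bool>(
    mut check: F,
    interval: Duration,
    timeout: Duration,
) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if check() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}
```

A caller would pass something like a 150 ms interval and let this function alone decide when to give up, instead of wrapping it in a separate timeout.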

Part V: The Last Bug - A Platform-Specific Memory Leak (March 20)

With all other tests passing, a single memory leak failure in
t/78-memleak.t persisted, but only on the Windows
ad2 environment. This required a different approach than
the timeout fixes.

Key Milestones:
  • The Bug: A strong reference cycle involving the
    on_dialog event listener was not being broken on Windows,
    despite multiple attempts to fix it. Fixes that worked on Linux (such as
    calling on_dialog(undef) in DESTROY) were not
    sufficient on the Windows host.
  • The Diagnosis: The issue was determined to be a
    deep, platform-specific interaction between Perl's garbage collector,
    the IO::Async event loop implementation on Windows, and the
    Test::Memory::Cycle module. The cycle report was identical
    on both platforms, but the cleanup behavior was different.
  • Failed Attempts: A series of increasingly
    aggressive fixes were attempted to break the cycle, including:
    1. Moving the on_dialog(undef) call from
      close() to DESTROY().
    2. Explicitly deleting the listener and callback
      properties from the object hash in DESTROY.
    3. Swapping between $self->remove_listener and
      $self->target->unlisten in a mistaken attempt to find
      the correct un-registration method.
  • Pragmatic Solution: After exhausting all reasonable
    code-level fixes without a resolution on Windows, the user opted to mark
    the failing test as a known issue for that specific platform.
  • Final Fix: The single failing test in
    t/78-memleak.t was wrapped in a conditional
    TODO block that only executes on Windows
    (if ($^O =~ /MSWin32/i)), formally acknowledging the bug
    without blocking the build. This allows the test suite to pass in CI
    environments while flagging the issue for future, deeper
    investigation.

Part VI: CI Hardening (March 20)

A final failure in the GitHub Actions CI environment revealed one
last configuration flaw.

Key Milestones:
  • The Bug: The CI was running
    prove --nocount --jobs 3 -I local/ -bl xt t directly. This
    command was missing the crucial -It/lib include path, which
    is necessary for test files to locate the t::helper module.
    This resulted in nearly all tests failing with
    Can't locate t/helper.pm in @INC.
  • The Investigation: An analysis of
    Makefile.PL revealed a custom MY::test block
    specifically designed to inject the -It/lib flag into the
    make test command. This confirmed that
    make test is the correct, canonical way to run the test
    suite for this project.
  • The Fix: The
    .github/workflows/linux.yml file was modified to replace
    the direct prove call with make test in the
    Run Tests step. This ensures the CI environment runs the
    tests in the exact same way as a local developer, with all necessary
    include paths correctly configured by the project's build system.

Final Outcome

After this long and arduous journey, the
WWW::Mechanize::Chrome test suite is now stable and
passing on all targeted platforms, with known
platform-specific issues clearly documented in the code. The project is
in a vastly more robust and reliable state.

20 March 2026

Dirk Eddelbuettel: RcppSpdlog 0.0.28 on CRAN: Micro-Maintenance

Version 0.0.28 of RcppSpdlog arrived on CRAN today, has been uploaded to Debian and built for r2u. The (nice) documentation site has been refreshed too. RcppSpdlog bundles spdlog, a wonderful header-only C++ logging library with all the bells and whistles you would want that was written by Gabi Melman, and also includes fmt by Victor Zverovich. You can learn more at the nice package documentation site. This release contains a rebuilt RcppExports.cpp to aid Rcpp in the transition towards Rcpp::stop() and away from Rf_error() in its user packages. No other changes were made. The NEWS entry for this release follows.

Changes in RcppSpdlog version 0.0.28 (2026-03-19)
  • Regenerate RcppExports.cpp to switch to (Rf_error) aiding in Rcpp transition to Rcpp::stop()

Courtesy of my CRANberries, there is also a diffstat report detailing changes. More detailed information is on the RcppSpdlog page, or the package documentation site.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

19 March 2026

Otto Kekäläinen: Automated security validation: How 7,000+ tests shaped MariaDB's new AppArmor profile

Linux kernel security modules provide a good additional layer of security around individual programs by restricting what they are allowed to do, and at best block and detect zero-day security vulnerabilities as soon as anyone tries to exploit them, long before they are widely known and reported. However, the challenge is how to create these security profiles without accidentally also blocking legitimate actions. For MariaDB in Debian and Ubuntu, a new AppArmor profile was recently created by leveraging the extensive test suite with 7000+ tests, giving good confidence that AppArmor is unlikely to yield false positive alerts with it.

AppArmor is a Mandatory Access Control (MAC) system, meaning that each process controlled by AppArmor has a sort of an allowlist, called a profile, that defines all capabilities and file paths a program can access. If a program tries to do something not covered by the rules in its AppArmor profile, the action will be denied on the Linux kernel level and a warning logged in the system journal. This additional security layer is valuable because even if a malicious user found a security vulnerability some day in the future, the AppArmor profile severely restricts the ability to exploit it and gain access to the operating system.

AppArmor was originally developed by Novell for use in SUSE Linux, but nowadays the main driver is Canonical and AppArmor is extensively used in Ubuntu and Debian, and many of their derivatives (e.g. Linux Mint, Pop!_OS, Zorin OS) and in Arch. AppArmor's benefit compared to the main alternative SELinux (used mainly in the RedHat/Fedora ecosystem) is that AppArmor is easier to manage. AppArmor continues to be actively developed, with new major version 5.0 expected to arrive soon.
I also have some personal history contributing some notification handler scripts in Python and I also created the website that AppArmor.net still runs.

Regular review of denials in the system log required

Any system administrator using Debian/Ubuntu needs to know how to check for AppArmor denials. The point of using AppArmor is kind of moot if nobody is checking the denials. When AppArmor blocks an action, it logs the event to the system audit or kernel logs. Understanding these logs is crucial for troubleshooting custom configurations or identifying potential security incidents. To view recent denials, check /var/log/audit/audit.log or run journalctl -ke --grep=apparmor. A typical denial entry for MariaDB will look like this (split across multiple lines for legibility):
msg=audit(1700000000.123:456): apparmor="DENIED" operation="open"
profile="/usr/sbin/mariadbd" name="/custom/data/path/test.ibd" pid=1234
comm="mariadbd" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
How to interpret this output:
  • msg=audit(...): The audit timestamp and event serial number.
  • apparmor="DENIED": Indicates AppArmor blocked the action.
  • operation: The action being attempted (e.g., open, mknod, file_mmap, file_perm).
  • profile: The specific AppArmor profile that triggered the denial (in this case the /usr/sbin/mariadbd profile).
  • name: The file path or resource that was blocked. In the example above, a custom data path was denied access because it wasn't defined in the profile's allowed abstractions.
  • comm: The command name that triggered the denial (here mariadbd).
  • requested_mask / denied_mask: Shows the permissions requested (e.g., r for read, w for write).
  • pid: The process ID.
  • fsuid: The user ID of the process attempting the action.
  • ouid: The owner user ID of the target file.
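For reviewing many denials at once, the key=value structure described above is easy to split programmatically. The following is a minimal sketch with a hypothetical function name; it assumes fields are separated by whitespace and that values contain no spaces, which holds for the sample entry but is not guaranteed for every audit record.

```rust
use std::collections::HashMap;

/// Split an audit line into key/value pairs, stripping surrounding double
/// quotes from values. Values containing spaces are not handled here.
pub fn parse_audit_fields(line: &str) -> HashMap<String, String> {
    line.split_whitespace()
        .filter_map(|token| token.split_once('='))
        .map(|(key, value)| (key.to_string(), value.trim_matches('"').to_string()))
        .collect()
}
```

Applied to the sample entry, looking up "name" yields /custom/data/path/test.ibd and "denied_mask" yields r, which are usually the two fields needed to decide whether a rule should be added.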
If an action seems legit and should not be denied, the sysadmin needs to update the existing rules in /etc/apparmor.d/ or drop a local customization file into /etc/apparmor.d/local/. If the denied action looks malicious, the sysadmin should start a security investigation and, if needed, report a suspected zero-day vulnerability to the upstream software vendor (e.g. Ubuntu customers to Canonical, or MariaDB customers to MariaDB).

AppArmor in MariaDB - not a novel thing, and not easy to implement well

Based on old bug reports, there was an AppArmor profile already back in 2011, but it was removed in MariaDB 5.1.56 due to backlash from users running into various issues. A new profile was created in 2015, but kept opt-in only due to the risk of side effects. It likely had very few users and saw minimal maintenance, getting only a handful of updates in the past 10 years.

The primary challenge in using mandatory access control systems with MariaDB lies in the sheer breadth of MariaDB's operational footprint, with diverse storage engines and plugins. Also, the code base in MariaDB assumes that system calls to Linux always work (which they do under normal circumstances) and does not handle errors well if AppArmor suddenly denies a system call. MariaDB is also a large and complex piece of software to run and operate, and it can be very challenging for system administrators to work out that a misbehavior in their system was caused by AppArmor blocking a single syscall.

Ironically, these same reasons are exactly why AppArmor is most beneficial for MariaDB. The larger and more complex a piece of software is, the greater the odds of a security vulnerability arising between the various components. An AppArmor profile helps reduce this complexity down to a single access list.

Over the years there have been users requesting to get the AppArmor profile back, such as in Debian Bug#875890 from 2017. The need was raised again recently by the Ubuntu security team during the MariaDB Ubuntu main inclusion review in 2025, which prompted a renewed effort by Debian/Ubuntu developers, mainly myself and Aquila Macedo, with upstream MariaDB assistance from Daniel Black.

A fresh approach: leverage the MariaDB test suite for automated testing and the open source community for reviews

The key to creating a robust AppArmor profile is the ability to know in detail what is expected and normal behavior of the system. One could in theory read all of the source code in MariaDB, but with over two million lines, it is of course not feasible in practice. However, MariaDB does have a very extensive test suite of 7000+ tests, and running it should trigger most code paths in MariaDB.

Utilizing the test suite was key in creating the new AppArmor profile for MariaDB: we installed MariaDB on an Ubuntu system, enabled AppArmor in complain mode and iterated on the allowlist by running the full mariadb-test-run with all MariaDB plugins and features enabled until we had a comprehensive yet clean list of rules.

To be extra diligent, we also reworked the autopkgtest for MariaDB in the Debian and Ubuntu CI systems to run with the AppArmor profile enabled and to print all AppArmor notices at the end of the run, making it easy to detect, now and in the future, whether the MariaDB test suite triggers any AppArmor denials. If any test fails, the release will not get promoted further, protecting users from regressions.

While developing and triggering manual test runs we used the maximal achievable test suite with 7177 tests. That run is however so extensive that it takes over two hours, and it also has some brittle tests, so the standard test run in Debian and Ubuntu autopkgtest is limited to just MariaDB's main suite with about 1000 tests. Having some tests fail while testing the AppArmor profile was not a problem, because we didn't need all the tests to pass; we merely needed them to exercise as many code paths as possible to see if they make any system calls not accounted for in the AppArmor profile.

Note that extending the profile was not just mechanical copying of log messages to the profile. For example, even though a couple of tests involve running the dash shell, we decided not to allow it, as it opens too much of a path for a potential exploit to access the operating system.

The result of this effort is a modernized, robust profile that is now production-ready. Those interested in the exact technical details can read Debian Bug#1130272 and the Merge Request discussions at salsa.debian.org, which hosts the Debian packaging source code.
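To illustrate the kind of iteration described above, here is a small Python sketch that condenses a batch of AppArmor log lines into unique (operation, path, mask) combinations with counts, so a human can review them before deciding on any rule. It is not the tooling the packagers actually used. It assumes that complain-mode entries are logged with apparmor="ALLOWED", and the function name, default profile name and sample lines are all hypothetical.

```python
import re
from collections import Counter

# Same key=value convention as in the example denial entry shown earlier.
FIELD_RE = re.compile(r'(\w+)=("([^"]*)"|\S+)')


def summarize(lines, profile="/usr/sbin/mariadbd"):
    """Count unique (operation, name, requested_mask) tuples for one profile."""
    counts = Counter()
    for line in lines:
        fields = {
            key: (quoted if raw.startswith('"') else raw)
            for key, raw, quoted in FIELD_RE.findall(line)
        }
        # Assumption: complain mode logs ALLOWED, enforce mode logs DENIED.
        if fields.get("apparmor") not in ("ALLOWED", "DENIED"):
            continue
        if fields.get("profile") != profile:
            continue
        key = (
            fields.get("operation", "?"),
            fields.get("name", "?"),
            fields.get("requested_mask", "?"),
        )
        counts[key] += 1
    return counts


# Hypothetical log lines, made up for this sketch.
EXAMPLE_LINES = [
    'apparmor="ALLOWED" operation="open" profile="/usr/sbin/mariadbd" '
    'name="/tmp/example/a.ibd" requested_mask="r"',
    'apparmor="ALLOWED" operation="open" profile="/usr/sbin/mariadbd" '
    'name="/tmp/example/a.ibd" requested_mask="r"',
    'apparmor="ALLOWED" operation="exec" profile="/usr/sbin/mariadbd" '
    'name="/usr/bin/dash" requested_mask="x"',
    'apparmor="DENIED" operation="open" profile="/usr/bin/other" '
    'name="/etc/example" requested_mask="r"',
]

if __name__ == "__main__":
    for (operation, name, mask), n in summarize(EXAMPLE_LINES).most_common():
        print(f"{n:4} {operation:6} {mask:3} {name}")
```

A summary like this only shows what was attempted; as the dash example makes clear, whether an entry deserves a rule remains a judgment call.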

Now available in Debian unstable, soon Ubuntu - feedback welcome!

Even though the file is just 200 lines long, the work to craft it spanned several weeks. To minimize risk we also did a gradual rollout by releasing the first new profile version in complain mode, so AppArmor only logs would-be denials without blocking anything. The AppArmor profile was switched to enforce mode only in the very latest MariaDB revision 1:11.8.6-4 in Debian, and a NEWS item was issued to help increase user awareness of this change. It is also slated for the upcoming Ubuntu 26.04 Resolute Raccoon release next month, providing out-of-the-box hardening for the wider ecosystem.

While automated testing is extensive, it cannot simulate everything. Most notably, various complicated replication topologies and all Galera setups are likely not covered. Thus, I am calling on the community to deploy this profile and monitor for any audit denials in the kernel logs. If you encounter unexpected behavior or legitimate denials, please submit a bug report via the Debian Bug Tracking System.

To ensure you are running the latest MariaDB version, run apt install --update --yes mariadb-server. To view the latest profile rules, run cat /etc/apparmor.d/mariadbd, and to see if it is enforced, review the output of aa-status. To quickly check if there were any AppArmor denials, simply run journalctl -k | grep -i apparmor | grep -i mariadb.
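For those who prefer to script the quick check (kernel log lines that mention both apparmor and mariadb, ignoring case), a minimal Python sketch could look like this. The function names read_kernel_log and filter_denials and the sample lines are hypothetical; the only external command used is journalctl -k with --no-pager.

```python
import subprocess


def read_kernel_log() -> list[str]:
    """Return the kernel journal as a list of lines (needs systemd's journalctl)."""
    result = subprocess.run(
        ["journalctl", "-k", "--no-pager"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.splitlines()


def filter_denials(lines, needles=("apparmor", "mariadb")):
    """Keep lines containing every needle, ignoring case, like chained grep -i."""
    lowered = [needle.lower() for needle in needles]
    return [
        line
        for line in lines
        if all(needle in line.lower() for needle in lowered)
    ]


# Hypothetical kernel log lines, made up for this sketch.
EXAMPLE_LOG = [
    'audit: apparmor="DENIED" operation="open" profile="/usr/sbin/mariadbd" '
    'name="/custom/data/path/test.ibd" comm="mariadbd"',
    'audit: apparmor="DENIED" operation="open" profile="/usr/bin/other" '
    'name="/etc/example" comm="other"',
    "usb 1-1: new high-speed USB device number 2 using xhci_hcd",
]
```

On a machine with systemd and permission to read the kernel journal, `for line in filter_denials(read_kernel_log()): print(line)` prints the matching entries; an empty result means no matching denials were logged in the current journal.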

Systemd hardening also adopted as security features keep evolving

For those interested in MariaDB security hardening, note that new systemd hardening options were also rolled out in Debian/Ubuntu recently. Debian and Ubuntu are mainly volunteer-driven open source developer communities, so if you find this topic interesting and you think you have the necessary skills, feel free to submit your improvement ideas as Merge Requests at salsa.debian.org/mariadb-team. If your improvement suggestions are not Debian/Ubuntu specific, please submit them directly to upstream at GitHub.com/MariaDB.

18 March 2026

Dirk Eddelbuettel: tidyCpp 0.0.11 on CRAN: Microfix

And yet another maintenance release of the tidyCpp package arrived on CRAN this morning, just days after the previous release, which itself came a mere week and a half after its predecessor. It has been built for r2u as well. The package offers a clean C++ layer (as well as one small C++ helper class) on top of the C API for R, which aims to make use of this robust (if awkward) C API a little easier and more consistent. See the vignette for motivating examples. This release restores the small CSS file used by the vignette which we, in a last-second decision, omitted from the previous release. Oddly, it only failed under oldrel, i.e. the R release from now nearly two years ago. But it was still an unforced error, and this upload corrects it. Changes are summarized in the NEWS entry that follows.

Changes in tidyCpp version 0.0.11 (2026-03-17)
  • Keep a CSS file in the package to allow vignette build on r-oldrel too

Thanks to my CRANberries, there is also a diffstat report for this release. For questions, suggestions, or issues please use the issue tracker at the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

16 March 2026

Dirk Eddelbuettel: RcppClassicExamples 0.1.4 on CRAN: Maintenance

Another minor maintenance release, version 0.1.4, of package RcppClassicExamples arrived earlier today on CRAN, and has been built for r2u. This package illustrates usage of the old and otherwise deprecated initial Rcpp API, which no new projects should use as the normal and current Rcpp API is so much better. This release, the first in two and a half years, mostly aids Rcpp in moving from Rf_error() to Rcpp::stop() for better behaviour under error conditions or exceptions. A few other things were updated in the interim, such as the standard upgrade to continuous integration, use of Authors@R, a switch to static linking, and an improved build to support multiple macOS architectures. No new code or features. Full details below. And as a reminder, don't use the old RcppClassic; use Rcpp instead.

Changes in version 0.1.4 (2026-03-16)
  • Continuous integration has been updated several times
  • DESCRIPTION now uses Authors@R
  • Static linking is enforced, RcppClassic (>= 0.9.14) required
  • Calls to Rf_error() have been replaced with Rcpp::stop()
  • Updated versioned dependencies

Thanks to CRANberries, you can also look at a diff to the previous release.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

Russ Allbery: Review: The Martian Contingency

Review: The Martian Contingency, by Mary Robinette Kowal
Series: Lady Astronaut #4
Publisher: Tor
Copyright: 2025
ISBN: 1-250-23703-3
Format: Kindle
Pages: 390
The Martian Contingency is the fourth book of the mostly-realistic science fiction alternate history series that began with the novelette "The Lady Astronaut of Mars" and the novel The Calculating Stars. It returns to Elma York as the main character, covering her second trip to Mars after the events of The Fated Sky. It's helpful to remember the events of the previous two books to follow some of the plot. Elma is back on Mars, this time as second in command. The immediate goal of the second Mars mission is to open more domes and land additional crew currently in orbit, creating the first permanent human settlement on Mars. The long-term goal is to set up Mars as a refuge in case the greenhouse effect caused by the meteor strike in The Calculating Stars continues to spiral out of control. Elma is anxious and not looking forward to being partly in charge, particularly since her position is partly due to her fame with the public (and connection with the American president). She'd rather just be a pilot. But she'll do what the mission needs from her, and at least this time her husband is with her on Mars. As one might expect from earlier installments of this series, The Martian Contingency starts with the details and rhythms of life in a dangerous, highly technical, and mission-driven scientific environment: hard science fiction of the type most closely modeled on NASA and real space missions. Given that this is aimed at permanent Mars colonies that would theoretically have to be independent of Earth, it requires a huge amount of suspension of disbelief for the premise, but Kowal at least tries for verisimilitude in the small details. I am not an expert in early space program technology (Kowal's alternate history diverges into a greatly accelerated space program in the 1950s and, for example, uses female mathematicians for most calculations), so I don't know how successful this is, but it feels crunchy and believable. 
As with the previous books, though, this is not just a day in the life of an astronaut. There's something wrong, something that happened during the first Mars expedition while Elma was in orbit and left odd physical clues, and no one is willing to talk about it. Elma is just starting to poke around before the politics at home go off the rails (again), exacerbated by a cringe-worthy social error by Elma herself, and she once again has to navigate egregious sexism and political meddling in a highly dangerous environment a long way from home. It is a little surprising that I like this series as much as I do. I don't particularly care for pseudo-realistic science fiction, although I admit there is something deeply satisfying about reading about people following checklists properly. The idea of permanent Mars colonies as an escape from a doomed Earth is unbelievable and deeply silly, but Kowal locked herself into that alternate future with "The Lady Astronaut of Mars," which is still set in the future of all of the books so far. A primary conflict in each of the books comes from the egregious sexism and racism of a culture based on 1950s American attitudes towards both, and the amount of progress Elma can make against either is limited, contingent, and constantly compromised. And yet. At its best, this series is excellent competence porn, both in the spirit of the Apollo 13 movie and for the navigation of social and political obstacles and idiocy. Elma is highly competent in a believable and sympathetic way, with strengths, weaknesses, and an ongoing struggle with anxiety. There is something rewarding in watching people solve problems and eventually triumph by being professional, careful, principled, and creative. It's enough to make a good book, even if I am not that interested in the setting and technology. As with the rest of the series, this will not be for everyone. 
You have to be up for reading about a lot of truly awful sexism and racism without the payoff of a complete triumph. This is a system that Elma navigates, not overthrows, and that's not going to be enough for some readers. You also have to accept the premise of a Mars colony, which in an otherwise hard science fiction novel is a bit much despite Kowal's attempts to acknowledge some of the difficulties. But if you don't mind those drawbacks, this series continues to be an opportunity to read about people being quietly and professionally competent. This is not my favorite entry, mostly because Elma makes a rather humiliating mistake that's central to the plot and has a lot of after-effects (and therefore a lot of time in the spotlight), and because there is rather a lot of discussion of sexuality that felt childish to me. The intent was to try to capture the way people in the 1950s talked about sex, and perhaps Kowal was successful in that, but I didn't enjoy the experience. But I still found myself pulled into the plot and happily rooting for the characters, even though a reader of "The Lady Astronaut of Mars" has a pretty good idea of how everything will turn out. If you liked the series so far, recommended, although I doubt it will be the favorite entry for most readers. If you did not like the earlier books of the series, this one will not change your mind.
Content notes: Way, way more detailed discussion of an injury to a fingernail than I wanted to read, as well as some other rather explicit description of physical injury. Reproductive health care through the lens of the 1950s, so, uh, yeah. A whole lot of sexism, racism, and other forms of discrimination that are mostly worked around rather than confronted.
Rating: 7 out of 10
