Search Results: "max"

3 May 2026

Jelmer Vernooij: Inquest, a test result repository in Rust

testrepository
For a long time I've used Robert Collins' testrepository (testr) to run tests in many of the projects I work on. It's a small, focused tool built around a simple idea: decouple the running of tests from the recording and querying of their results.
The way it works is straightforward. A test runner emits a subunit stream (a compact binary protocol for test results) and testrepository stores those streams in a per-project .testrepository/ directory. Once results are in the repository, you can ask questions like "which tests failed in the last run?", "re-run only the failures", "what are the slowest tests?", or "what changed between this run and the previous one?".
The killer feature, for me, has always been the failing-test loop. When a big test suite breaks, you don't want to re-run the whole thing after every fix; you want to iterate on just the failures, and only re-run the full suite once they're all green. testrepository made that workflow ergonomic long before most language-specific test runners had anything comparable, and many of them still don't have a good answer for it.
testrepository has served me well for over a decade, but it has been largely unmaintained for a while, and I had some ideas for improvements that I wanted to try out. So I wrote a Rust port, which has since grown a number of features of its own.
Inquest
Inquest is a Rust port of testrepository that has since grown a number of features of its own. The binary is called inq.
Goals
The goals are deliberately modest:
  • a single static binary, no Python runtime required
  • no need to write a dedicated config file for most projects
  • compatible enough with testrepository's workflow that I can switch projects over without retraining my fingers
  • a richer on-disk format that captures more about each run (git commit, command line, duration, exit code, concurrency)
  • good support for the languages I actually use day-to-day: Rust, Python, Go, and Node.js
  • mostly "Do What I Mean" (DWIM): telling me as quickly as possible which tests are failing and why, and being clever about how it does so
Inquest reads and writes subunit v2 streams, so anything that can produce subunit (directly or via one of the many converters) can feed into it.
Quick start
Inquest can usually figure out how to run your tests on its own. In a Rust, Python, Go or Node.js project:
 $ cd my-project
 $ inq
Or, if the auto-detection doesn't work, you can ask it to generate a config file and then run the tests:
 $ inq auto
 $ inq run
inq auto writes an inquest.toml describing how to invoke the test runner; inq run runs the tests, captures the subunit stream, and stores the results in a .inquest/ directory. For a Rust project the generated config looks like:
 test_command = "cargo subunit $IDOPTION"
 test_id_option = "--test $IDFILE"
 test_list_option = "--list"
After the first run, the usual queries work:
 $ inq stats             # repository-wide statistics
 $ inq last              # results of the most recent run
 $ inq failing           # only the failing tests
 $ inq slowest           # the slowest tests in the last run
 $ inq run --failing     # re-run only what failed last time
The last one is the workflow I use most often: run the full suite once, fix the obvious failures, then iterate on inq run --failing until the list is empty.
A few things that aren't in testrepository
Some of the features that inquest has grown beyond the original testrepository functionality:
  • Timeouts. --test-timeout, --max-duration, and --no-output-timeout will kill a test process that is hanging or has stopped producing output. --test-timeout auto derives a per-test timeout from the historical duration of that test, which is handy for catching tests that hang. Once the test runner is killed, the test is marked as failed and the next test is started, so a broken test doesn't hold up the whole suite.

  • Ordering. --order can be used to run tests in a specific order, e.g. to run the slowest tests first, to run the tests that failed most recently first, or to run the widest variety of tests first to maximize the chance of finding a failure early on.

  • Live progress. inq running tails the in-progress subunit stream on disk and reports observed/expected test counts, percent complete, elapsed wall-clock time, and an ETA derived from each test's historical duration. Useful when a CI run is taking longer than you'd like.

  • Flakiness ranking. inq flaky ranks tests by pass/fail transitions in consecutive runs in which the test was recorded, so chronically broken tests rank low and genuinely flapping tests rank high.

  • Comparing runs. inq diff <A> <B> shows what changed between two test runs (newly failing, newly passing, and tests that flipped state), which makes it easy to see whether your last change actually fixed (or broke) anything.

  • Bisecting git history. inq bisect <TEST> drives git bisect to find the commit that broke a given test. It defaults the known-good and known-bad commits from the recorded run history (the most recent run where the test passed, and the most recent where it failed), so in the common case there is no need to remember either; just point it at the test name and let it work.

  • Richer run metadata. inq info shows the git commit, command line, duration, exit code, and concurrency for a run, with a flag for whether the working tree was dirty when the run started. Combined with inq diff this makes it much easier to triangulate when a regression was introduced.

  • Rerun a previous run verbatim. inq rerun <ID> re-runs exactly the tests of a previous run, in the same order, forwarding the same -- arguments that the original run used. inq rerun -1 repeats the latest.

  • Web-based view. inq web serves a web-based view of the repository, with a dashboard of recent runs and detailed views of individual runs and tests.
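
The flakiness ranking described in the list above can be sketched in a few lines of Python. This is only an illustration of the idea of counting pass/fail transitions, not inquest's actual implementation: the flakiness and rank_flaky helpers and the history layout (one list of booleans per test, oldest run first, containing only the runs in which the test was recorded) are assumptions made for the example.

```python
from typing import Dict, List, Tuple


def flakiness(outcomes: List[bool]) -> int:
    """Count pass/fail transitions between consecutive recorded runs."""
    return sum(1 for prev, cur in zip(outcomes, outcomes[1:]) if prev != cur)


def rank_flaky(history: Dict[str, List[bool]]) -> List[Tuple[str, int]]:
    """Rank tests by transition count, most flappy first (ties by name)."""
    scores = [(test_id, flakiness(runs)) for test_id, runs in history.items()]
    return sorted(scores, key=lambda item: (-item[1], item[0]))


# Hypothetical history: True = passed, False = failed, oldest run first.
history = {
    "test_always_broken": [False, False, False, False, False],
    "test_flapping": [True, False, True, False, True],
    "test_regressed_once": [True, True, True, False, False],
}
ranking = rank_flaky(history)
```

A test that fails in every run never changes state, so it scores zero and sinks to the bottom, while a test that alternates scores one transition per run.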

Web UI
Most of the time I drive inquest from the command line, but for browsing historical results of a large suite (spotting flapping tests, drilling into a single test's run history, or just getting a visual sense of which parts of the suite are hurting) a web view is more pleasant. inq web starts a local server with exactly that:
 $ inq web
The repository overview shows totals and a per-test history grid where each cell is one run, coloured by outcome. Bands of red make it easy to pick out tests that have been broken for a long time, and isolated red cells in an otherwise green column point at flaky tests.
[Image: Inquest web UI repository overview, with a grid of per-run results]
Drilling into an individual test gives you its full run history, a duration sparkline, and per-run pass/fail status:
[Image: Inquest web UI per-test view with run history and duration sparkline]
Migrating from testrepository
If you already have a .testrepository/ directory full of historical runs, inq upgrade will migrate it into the new .inquest/ format, with a progress bar for the impatient. The legacy .testr.conf (INI) format is still understood, so existing projects don't have to be converted to inquest.toml immediately, though the TOML format is preferred for new projects.
Trying it
The source is on GitHub at jelmer/inquest. To install from source:
 $ cargo install inquest
In a project with a Rust, Python, Go or Node.js test suite:
 $ inq
Bug reports and patches are welcome.

20 April 2026

Russ Allbery: Review: Surface Detail

Review: Surface Detail, by Iain M. Banks
Publisher: Orbit
Copyright: October 2010
Printing: May 2011
ISBN: 0-316-12341-2
Format: Trade paperback
Pages: 627
Surface Detail is the ninth novel in Banks's Culture science fiction (literary space opera?) series. As with most of the Culture novels, it can be read in any order, although this isn't the best starting point. There is an Easter egg reference to Use of Weapons that would be easier to notice if you have read that book recently, but which is not that important to the story. Lededje Y'breq is an Indented Intagliate from the Sichultian Enablement. Her body is patterned from her skin down to her bones, covered with elaborate markings similar to tattoos that extend to her internal organs. As an intagliate, she is someone's property. In her case, she is the property of Joller Veppers, the richest man in the Enablement and her father's former business partner. Intagliates are a tradition of great cultural pride in the Enablement. They are a living representation of the seriousness with which debts and honor are taken, up to and including one's not-yet-born children becoming the property of one's debtor. Such children are decorated as living works of art of the highest skill and technical sophistication; after all, the Enablement are not barbarians. As the story opens, Lededje is attempting, not for the first time, to escape. This attempt is successful in an unexpected way. Prin and Chay are Pavulean researchers and academics who, as this story opens, are in Hell. They are not dead; they have infiltrated the Hell that Pavuleans are shown to scare them into proper behavior in order to prove that it is not an illusion and their society does indeed torture people in an afterlife, in more awful ways than people dare imagine. They have reached the portal through which temporary visitors exit, hoping to escape with firm evidence of the existence and horrors of the Pavulean afterlife. They will not be entirely successful. Yime Nsokyi is a Culture agent for Quietus, the part of Contact that concerns itself with the dead. 
Many advanced societies throughout the galaxy have invented and reinvented the ability to digitize a mind and then run it in a virtual environment. Once a society can capture the minds of every person in that society from that point forward, it faces the question of whether to do so and, if it does, what to do with those minds. More specifically, it faces the moral question of whether to punish the minds of people who were horrible in life. It faces the question of whether to create Hell. Vatueil is a soldier in a contestation, a limited and carefully monitored virtual war. The purpose of that war game is to, once and for all, resolve the question of whether civilizations should be allowed to create Hells. Some civilizations consider them integral to their religion or self-conception. Others consider them morally abhorrent, and that conflict was in danger of spilling over into war in the Real. Hence the War in Heaven: Both sides committed to fight in a virtual space under specific and structured rules, and the winner decides the fate of the galaxy's Hells. Vatueil is fighting for the anti-Hell side. The anti-Hell side is losing. There are very few authors who were better at big-idea science fiction than Iain M. Banks. I've been reading a few books about AI ships and remembered that I had two unread Culture novels that I was saving. It felt like a good time to lose myself in something sprawling. Surface Detail does sprawl. Even by Banks's standards, there was an impressive amount of infodumping in this book. Banks always has huge and lovingly described set pieces, and this book is no exception, but there are also paragraphs and pages of background and cultural musings and galactic politics. 
We are introduced to not one but three new Contact divisions; as well as the already-mentioned Quietus, there is Numina, which concerns itself with the races that have sublimed (transcended), and Restoria, which deals with hegemonizing swarms (grey goo nanotech, paperclip maximizers, and their equivalents). Infodumping is both a feature and a bane of big-idea science fiction, and it helps to be in the right mood. It also helps if the info being dumped is interesting, and this is where Banks shines. This is a huge, sprawling book, but it deals with some huge, sprawling questions and it has interesting and non-reductive thoughts about them. The problems posed by the plot come with history, failed solutions, multi-sided political disputes, strategies and tactics of varying morality and efficacy, and an effort to wrestle with the irreducible complexity of trying to resolve political and ethical disagreements in a universe full of profound disagreements and moral systems that one cannot simply steamroll. It also helps that the characters are interesting, even when they're not likable. Surface Detail has one fully hissable villain (Veppers) as a viewpoint character, but even Veppers is interesting in a "let me check the publication date to see if Banks was aware of Peter Thiel" sort of way. The Culture ships, of which there are several in this story, tend towards a gently sarcastic kindness that I find utterly charming. Lededje provides the compelling motive force of someone who has no involvement in the broader philosophical questions and instead intends to resolve one specific problem through lethal violence. Vatueil and Yime were a bit bland in personality, more exposition generators than characters I warmed to, but their roles and therefore the surrounding exposition were fascinating enough that I still enjoyed their sections. 
I'm sure this is not an original observation, but I was struck reading this book in the first half of 2026 that the Culture functions as an implementation of what the United States likes to think it is but has never been. It has a strong sense of shared ethics and moral principles, it tries to export them to the rest of the galaxy through example, persuasion, and careful meddling, but it tries to follow some combination of pragmatic and moral rules while doing so, partly to avoid a backlash and partly to avoid becoming its own sort of hegemonizing swarm. That is a powerfully attractive vision of how to be an advanced civilization, and the fact that every hegemon that has claimed that mantle has behaved appallingly just makes it more intriguing as a fictional concept. In this book, like in many Culture books, the Culture is painfully aware of the failure modes of meddling, and the story slowly reveals the effort the Culture put into staying just on a defensible side of their own moral lines. This is, in a sense, a Prime Directive story, but with a level of hard-nosed pragmatism and political sophistication that the endless Star Trek Prime Directive episodes never reach. Surface Detail does tend to sprawl, and I'm not sure Banks pulled together all the pieces of the plot. For example, if there was a point to the subplot involving the Unfallen Bulbitian, it was lost on me. (There is always a possibility with Banks that I wasn't paying close enough attention.) But the descriptions are so elaborate and the sense of politics and history are so deep that I was never bored, even when following a plot thread that meandered off into apparent irrelevance. The main plot line comes to a satisfying conclusion that may be even more biting social commentary today than it was in 2010. A large part of the plot does involve Hell, so a warning for those who haven't read much Banks: He adores elaborate descriptions of body horror and physical torture. 
The sections involving Prin and Chay are rather grim and horrific, probably a bit worse than Dante's Inferno. I have a low tolerance for horror and I was able to read past and around the worst bits, but be warned that Banks indulges his love for the painfully grotesque quite a bit. This was great, and exactly what I was hoping for when I picked it up. It's not the strongest Culture novel (for me, that's either The Player of Games or Excession), but it's one of the better ones. Highly recommended, although if you're new to the Culture, I would start with one of the earlier books that provide a more gradual introduction to the Culture and Special Circumstances. Followed, in the somewhat disconnected Culture series sense, by The Hydrogen Sonata. Content warnings: Rape (largely off-screen), graphic violence, lots of Bosch-style grotesque torture, and a lot of Veppers being a thoroughly awful human being as a viewpoint character. Rating: 8 out of 10

17 April 2026

Russell Coker: Home Battery

Prices
On the 19th of March I got a home battery system installed. The government has a rebate scheme, so a 40kWh setup with a list price of about $22k cost me about $12k. It seems that 40kWh is the minimum usable size for the amount of electricity I use: I have 84 cores running BOINC when they have nothing better to do, which is 585W of TDP according to Intel. While the CPUs are certainly using less than the maximum TDP (both due to design safety limits and the fact that I have disabled hyper-threading on all systems due to it providing minimal benefits and potential security issues), given some power usage by cooling fans and some inefficiency in PSUs I think that assuming that 585W is accounted for 24*7 by CPUs is reasonable. So my home draws between 800W and 1kW when no-one is home, and with an electric car and all-electric cooking a reasonable amount of electricity can be used.
My bills prior to the battery installation were around $200/month, which was based on charging my car only during sunny times as my electricity provider (Amber Electric) has variable rates based on wholesale prices. Also the feed-in rates often go negative when my solar panels produce too much electricity in sunny times, so if I don't use enough electricity I can be billed for exporting it. I haven't had the electric car long enough to find out what the bills might be in winter without a home battery.
Before getting the battery my daily bills according to the Amber app were usually between $5 and $10. After getting it the daily bills have almost always been below $5. The only day where it's been over $5 since the battery installation was when electricity was cheap and I fully charged the home battery and my car, which used 50kWh in one day and cost $7.87, which is 16 cents per kWh. 16 cents isn't the cheapest price (sometimes it gets as low as 10 cents) but is fairly cheap; sometimes even in the cheap parts of the day it doesn't get that low (the cheapest price on the day I started writing this was 20 cents).
So it looks like this may save me $100 per month; if so there will be a 10% annual return on investment on the $12k I spent. This makes it a good investment, better than repaying a mortgage (which is generally under 6%) and almost as good as the long term results of index tracker funds. However if it cost $22k (the full price without subsidy) then it would still be ok but wouldn't be a great investment. The government subsidised batteries because the huge amount of power generated by rooftop solar systems was greater than the grid could use during the day in summer and batteries are needed to use that power when it's dark.
Android App
The battery system is from Fox ESS and the FoxCloud 2.0 Android app is a bit lacking in functionality. It has a timer for mode setting with options "Self-use" (not clearly explained), "Feed-in Priority" (not explained but testing shows feeding everything in to the grid), "Back Up", "Forced Charge", and "Forced Discharge". Currently I have "Forced Charge" set up for the 5 sunniest hours of the day with a maximum charge power of 5kW. I did that because about 25kWh/day is what I need to cover everything, and while the system can do almost 10kW that would charge the battery fully in a few hours and then electricity would be exported to the grid, which would at best pay me almost nothing and at worst bill me for supplying electricity when they don't want it. There doesn't seem to be a "never put locally generated power into the grid unless the battery is full" option. The force charge mode allows stopping at a certain percentage, but when that is reached there is no fallback to another option. It would be nice if the people who designed the configuration options took as a baseline assumption that regular people are capable of using things like the macro programming in office suites and functions in spreadsheets.
I don't think we need a Turing complete programming language in the app to control batteries (although I would use it if there was one), but I think we need clauses like "if battery is X% full then end this section". There is no option to say "force charge until 100%" or "force charge for the next X minutes" as a one-off thing. If I came home in the afternoon with my car below 50% battery and a plan to do a lot of driving the next day then I'd want to force charge the home battery immediately to allow charging the car overnight. But I can't do that without entering a "schedule". For Unix people, imagine having to do everything via a cron job with no option to run something directly from the command-line. It's a little annoying that they appear to have spent more development time on animations for the app than on some of what should be core functionality.
Management
Amber has an option to allow my battery to be managed by them based on wholesale prices but I haven't done that as the feed-in prices are very low. So I just charge my battery when electricity is cheap and use it for the rest of the day. There is usually a factor of 2 or more price difference between the middle of the day and night time so that saves money. It also means I don't have to go out of my way to try and charge my car in the middle of the day. There is some energy lost in charging and discharging the batteries but it's not a lot.
I configured the system to force charge for the 5 sunniest hours every day at 5kW as that's enough to keep it charged overnight, and 5kW is greater than the amount of solar electricity produced on my house since I've been monitoring it, so that forces it to all be used for the battery. In summer I might have to change that to 6kW for the sunniest 2 or 3 hours and then 4kW or 5kW surrounding that, which will be a pain to manage.
Instead of charging the car every day during sunny times I charge it once or twice a week. I have a 3.3kW charger and the car has a 40kWh battery so usually it takes me less than 10 hours to fully charge it and I get at least 5 hours of good sunlight in the process.
There are people hacking on these devices to get direct control from computers [1], which is interesting, and apparently they are not banned from the official community for doing so. I'm not enthusiastic enough to do this, I've got plenty of other free software things to work on. But it's good that others are doing so.

15 April 2026

Paul Tagliamonte: designing arf, an sdr iq encoding format

Interested in future updates? Follow me on mastodon at @paul@soylent.green. Posts about hz.tools will be tagged #hztools.

Want to jump right to the draft? I'll be maintaining ARF going forward at /draft-tagliamonte-arf-00.txt.
It's true: processing data from software defined radios can be a bit complex, which tends to keep all but the most grizzled experts and bravest souls from playing with it. While I wouldn't describe myself as either, I will say that I've stuck with it for longer than most would have expected of me. One of the biggest takeaways I have from my adventures with software defined radio is that there's a lot of cool crossover opportunity between RF and nearly every other field of engineering.
Fairly early on, I decided on a very light metadata scheme to track SDR captures, called rfcap. rfcap has withstood my test of time, and I can go back to even my earliest captures and still make sense of what they are: IQ format, capture frequencies, sample rates, etc. A huge part of this was the simplicity of the scheme (fixed-length header, byte-aligned to supported capture formats), which made it roughly as easy to work with as a raw file of IQ samples.
However, rfcap has a number of downsides. It's only a single, fixed-length header. If the frequency of operation changed during the capture, that change is not represented in the capture information. It's not possible to easily represent multi-channel coherent IQ streams, and additional metadata is condemned to adjacent text files.

ARF (Archive of RF)
A few years ago, I needed to finally solve some of these shortcomings and tried to see if a new format would stick. I sat down and wrote out my design goals before I started figuring out what it looked like.
First, whatever I come up with must be capable of being streamed and processed while being streamed. This includes streaming across the network or merely being written to disk as it's being created. No post-processing required. This is mostly an artifact of how I've built all my tools and how I interact with my SDRs. I use them extensively over the network (both locally, as well as remotely by friends across my wider lan). This decision even prompts me to do some crazy things from time to time.
Second, I need actual, real support for multiple IQ channels from my multi-channel SDRs (Ettus, Kerberos/Kraken SDR, etc) for playing with things like beamforming. My new format must be capable of storing multiple streams in a single capture file, rather than a pile of files in a directory (and hoping they're aligned).
Finally, metadata must be capable of being stored in-band. The initial set of metadata I needed to formalize in-stream were Frequency Changes and Discontinuities. Since then, ARF has grown a few more.
After getting all that down, I opted to start at what I thought the simplest container would look like: TLV (tag-length-value) encoded packets. This is a fairly well trodden path, and used by a bunch of existing protocols we all know and love. Each ARF file (or stream) is a set of encoded packets (sometimes called data units in other specs). This means that unknown packet types may be skipped (since the length is included) and additional data can be added after the existing fields without breaking existing decoders.
Fields: tag, length, value
Heads up! Once this is posted, I'm not super likely to update this page. Once this goes out, the latest stable copy of the ARF spec is maintained at draft-tagliamonte-arf-00.txt. This page may quickly become out of date, so if you're actually interested in implementing this, I've put a lot of effort into making the draft comprehensive, and I plan to maintain it as I edit the format.
Unlike a traditional TLV structure, I opted to add flags to the top-level packet. This gives me a bit of wiggle room down the line, and gives me a feature that I like from ASN.1: a critical bit. The critical bit indicates that the packet must be understood fully by implementers, which allows future backward-incompatible changes by marking a new packet type as critical. This would only really be done if something meaningfully changed the interpretation of the backwards-compatible data to follow.
Flag  Description
0x01  Critical (tag must be understood)
Within each Packet is a tag field. This tag indicates how the contents of the value field should be interpreted.
Tag ID  Description
0x01    Header
0x02    Stream Header
0x03    Samples
0x04    Frequency Change
0x05    Timing
0x06    Discontinuity
0x07    Location
0xFE    Vendor Extension
In order to help with checking the basic parsing and encoding of this format, the following is an example packet which should parse without error.
 00, // tag (0; no subpacket is 0 yet)
 00, // flags (0; no flags)
 00, 00 // length (0; no data)
 // data would go here, but there is none
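
As a rough illustration of how a decoder might walk this framing, here is a minimal Python sketch. The field widths (one byte each for tag and flags, two bytes for length) are read off the example packet above, and the big-endian byte order is an assumption inferred from the example bytes elsewhere in this post; the draft is the authority on both. The iter_packets name is hypothetical.

```python
import struct
from typing import Iterator, Tuple

FLAG_CRITICAL = 0x01
KNOWN_TAGS = {0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0xFE}
# Assumed header layout: tag (u8), flags (u8), length (big-endian u16).
HEADER = struct.Struct(">BBH")


def iter_packets(buf: bytes) -> Iterator[Tuple[int, int, bytes]]:
    """Yield (tag, flags, value) for every packet in buf.

    Unknown tags are still yielded so the caller can skip them; an unknown
    tag with the critical flag set is treated as an error.
    """
    offset = 0
    while offset < len(buf):
        if len(buf) - offset < HEADER.size:
            raise ValueError("truncated packet header")
        tag, flags, length = HEADER.unpack_from(buf, offset)
        offset += HEADER.size
        value = buf[offset:offset + length]
        if len(value) != length:
            raise ValueError("truncated packet value")
        offset += length
        if tag not in KNOWN_TAGS and flags & FLAG_CRITICAL:
            raise ValueError(f"unknown critical packet tag 0x{tag:02x}")
        yield tag, flags, value


# The example packet above: tag 0, no flags, zero-length value.
example = bytes([0x00, 0x00, 0x00, 0x00])
packets = list(iter_packets(example))
```

Because the length travels with every packet, the loop can step over a tag it does not recognise without knowing anything about its contents, which is the property the TLV design is after.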
Additionally, throughout the rest of the subpackets, there are a few unique and shared datatypes. I document them all more clearly in the draft, but to quickly run through them here too:

UUID
This field represents a globally unique identifier, as defined by RFC 9562, as 16 raw bytes.

Frequency
Data encoded in a Frequency field is stored as microhz (1 Hz is stored as 1000000, 2 Hz is stored as 2000000) as an unsigned 64 bit integer. This has a minimum value of 0 Hz, and a maximum value of 18446744073709551615 uHz, or just above 18.4 THz. This is a bit of a tradeoff, but it's a set of issues that I would gladly contend with rather than deal with the related issues with storing frequency data as a floating point value downstream. Not a huge factor, but as an aside, this is also how my current generation SDR processing code (sparky) stores Frequency data internally, which makes conversion between the two natural.
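
To make the microhz encoding concrete, here is a small Python sketch. The encode_frequency and decode_frequency helpers are hypothetical names, and the big-endian byte order is an assumption inferred from the example bytes in this post rather than something stated in this section.

```python
import struct

MICROHZ_PER_HZ = 1_000_000
MAX_MICROHZ = 2**64 - 1  # largest value an unsigned 64 bit integer can hold


def encode_frequency(hz: int) -> bytes:
    """Encode a whole number of Hz as an unsigned 64 bit count of microhz."""
    microhz = hz * MICROHZ_PER_HZ
    if not 0 <= microhz <= MAX_MICROHZ:
        raise ValueError("frequency does not fit in a u64 microhz field")
    return struct.pack(">Q", microhz)  # big-endian is an assumption


def decode_frequency(raw: bytes) -> int:
    """Decode an 8-byte Frequency field into microhz."""
    (microhz,) = struct.unpack(">Q", raw)
    return microhz


encoded = encode_frequency(100_000_000)  # 100 MHz
```

Keeping the value as an integer means 100 MHz round-trips exactly, with none of the rounding questions that come with a floating point representation.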

IQ samples
ARF supports IQ samples in a number of different formats. Part of the idea here is that I want it to be easy for capturing programs to encode ARF for a specific radio without mandating a single IQ format representation. For IQ types with a scalar value which takes more than a single byte, this is always paired with a Byte Order field, to indicate if the IQ scalar values are little or big endian.
ID    Name  Description
0x01  f32   interleaved 32 bit floating point scalar values
0x02  i8    interleaved 8 bit signed integer scalar values
0x03  i16   interleaved 16 bit signed integer scalar values
0x04  u8    interleaved 8 bit unsigned integer scalar values
0x05  f64   interleaved 64 bit floating point scalar values
0x06  f16   interleaved 16 bit floating point scalar values

Stream Header
Immediately after the ARF Header, some number of Stream Headers follow. There must be exactly the same number of Stream Header packets as are indicated by the num streams field of the Header. This has the nice effect of enabling clients to read all the stream headers without requiring buffering of unread packets from the stream.
Fields: id, flags, fmt, bo, rate, freq, guid, site
In order to help with checking the basic parsing and encoding of this format, the following is an example stream header subpacket (when encoded or decoded this will be found inside an ARF packet as described above) which should parse without error, with known values.
00, 01, // id (1)
00, 00, 00, 00, 00, 00, 00, 00, // flags
01, // format (float32)
01, // byte order (Little Endian)
00, 00, 01, d1, a9, 4a, 20, 00, // rate (2 MHz)
00, 00, 5a, f3, 10, 7a, 40, 00, // frequency (100 MHz)

// guid (7b98019d-694e-417a-8f18-167e2052be4d)
7b, 98, 01, 9d, 69, 4e, 41, 7a,
8f, 18, 16, 7e, 20, 52, be, 4d,

// site_id (98c98dc7-c3c6-47fe-bc05-05fb37b2e0db)
98, c9, 8d, c7, c3, c6, 47, fe,
bc, 05, 05, fb, 37, b2, e0, db,
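
The example above can be decoded with a short Python sketch. The field widths and the big-endian byte order are assumptions read off the example bytes (two bytes of id, eight of flags, one each for format and byte order, eight each for rate and frequency, then two 16-byte UUIDs); the StreamHeader class and parse_stream_header function are hypothetical names, and the draft remains the reference for the real layout.

```python
import struct
import uuid
from dataclasses import dataclass

# Assumed layout, inferred from the example bytes: id (u16), flags (u64),
# format (u8), byte order (u8), rate (u64 microhz), frequency (u64 microhz),
# guid (16 bytes), site id (16 bytes), with integers in big-endian order.
STREAM_HEADER = struct.Struct(">HQBBQQ16s16s")


@dataclass
class StreamHeader:
    stream_id: int
    flags: int
    fmt: int
    byte_order: int
    rate_microhz: int
    freq_microhz: int
    guid: uuid.UUID
    site_id: uuid.UUID


def parse_stream_header(raw: bytes) -> StreamHeader:
    """Decode one Stream Header subpacket body."""
    sid, flags, fmt, bo, rate, freq, guid, site = STREAM_HEADER.unpack(raw)
    return StreamHeader(sid, flags, fmt, bo, rate, freq,
                        uuid.UUID(bytes=guid), uuid.UUID(bytes=site))


# The example subpacket from above, written as hex.
example = bytes.fromhex(
    "0001" "0000000000000000" "01" "01"
    "000001d1a94a2000" "00005af3107a4000"
    "7b98019d694e417a8f18167e2052be4d"
    "98c98dc7c3c647febc0505fb37b2e0db"
)
header = parse_stream_header(example)
```

Decoding the known values (2 MHz rate, 100 MHz center frequency, and the two UUIDs) is a quick way to check that an implementation agrees with the example.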

Samples
A block of IQ samples in the format indicated by the format and byte_order fields sent in this stream's Stream Header.
Fields: id, iq samples
In order to help with checking the basic parsing and encoding of this format, the following is a samples subpacket (when encoded or decoded this will be found inside an ARF packet as described above). The IQ values here are notional (and are either two 8-bit samples or one 16-bit sample, depending on what the related Stream Header was).
01, // id
ab, cd, ab, cd, // iq samples
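
To show how the same bytes read differently depending on the Stream Header, here is a minimal Python sketch that splits a Samples subpacket into interleaved I/Q pairs. It assumes the one-byte stream id shown in the example; decode_samples and its little_endian parameter are hypothetical, standing in for whatever the related Stream Header's format and byte order fields said.

```python
import struct
from typing import List, Tuple

# struct type codes for the IQ format IDs in the table earlier in this post.
FORMAT_CODES = {0x01: "f", 0x02: "b", 0x03: "h", 0x04: "B", 0x05: "d", 0x06: "e"}


def decode_samples(payload: bytes, fmt: int,
                   little_endian: bool) -> Tuple[int, List[Tuple[float, float]]]:
    """Split a Samples subpacket into (stream id, [(I, Q), ...])."""
    stream_id, body = payload[0], payload[1:]
    order = "<" if little_endian else ">"
    code = FORMAT_CODES[fmt]
    width = struct.calcsize(order + code)
    if len(body) % (2 * width):
        raise ValueError("payload does not hold a whole number of IQ pairs")
    values = struct.unpack(f"{order}{len(body) // width}{code}", body)
    # Scalars are interleaved: I, Q, I, Q, ...
    return stream_id, list(zip(values[0::2], values[1::2]))


example = bytes([0x01, 0xAB, 0xCD, 0xAB, 0xCD])
as_i8 = decode_samples(example, fmt=0x02, little_endian=False)
as_i16 = decode_samples(example, fmt=0x03, little_endian=False)
```

Read as i8 the payload holds two IQ pairs; read as i16 it holds one, which is why a decoder needs the Stream Header before it can interpret any Samples packet.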

Frequency Change
The center frequency of the IQ stream has changed since the Stream Header or the last Frequency Change was sent. This is useful for capturing IQ streams that jump around in frequency during the capture, rather than starting and stopping them.
Fields: id, frequency
In order to help with checking the basic parsing and encoding of this format, the following is a frequency change subpacket (when encoded or decoded this will be found inside an ARF packet as described above).
01, // id
00, 00, b5, e6, 20, f4, 80, 00 // frequency (200 MHz)

Discontinuity
Since the last Samples packet for this stream, samples have been dropped or not encoded to this stream. This can be used for a stream that has dropped samples for some reason, a large gap (radio was needed for something else), or communicating IQ "snippets".
Fields: id
In order to help with checking the basic parsing and encoding of this format, the following is a discontinuity subpacket (when encoded or decoded this will be found inside an ARF packet as described above).
01, // id

Location
Up-to-date location as of this moment of the IQ stream, usually from a GPS. This allows for in-band geospatial information to be marked in the IQ stream. This can be used for all sorts of things (detected IQ packet snippets aligned with a time and location, or a survey of RF noise in an area).
Fields: flags, sys, lat, long, el, accuracy
The sys field indicates the Geodetic system to be used for the provided latitude, longitude and elevation fields. The full list of supported geodetic systems is currently just WGS84, but in case something meaningfully changes in the future, it'd be nice to be able to migrate forward. Unfortunately, being a bit of a coward here, the accuracy field is a bit of a cop-out. I'd really rather it be what we see out of kinematic state estimation tools like a Kalman filter, or at minimum, some sort of ellipsoid. This is neither of those; it's a perfect sphere of error where we pick the largest error in any direction and use that. Truthfully, I can't be bothered to model this accurately, and I don't want to contort myself into half-assing something I know I will half-ass just because I know better.
System Description
0x01 WGS84 - World Geodetic System 1984
In order to help with checking the basic parsing and encoding of this format, the following is a location subpacket (when encoded or decoded this will be found inside an ARF packet as described above).
00, 00, 00, 00, 00, 00, 00, 00, // flags
01, // system (wgs84)
3f, f3, be, 76, c8, b4, 39, 58, // latitude (1.234)
40, 02, c2, 8f, 5c, 28, f5, c3, // longitude (2.345)
40, 59, 00, 00, 00, 00, 00, 00, // elevation (100)
40, 24, 00, 00, 00, 00, 00, 00 // accuracy (10)
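To check a decoder against this test vector, the following Python sketch unpacks the example bytes. It assumes flags is a big-endian unsigned 64-bit integer, sys is one byte, and the remaining four fields are big-endian IEEE 754 doubles; these widths are read off the byte counts in the example rather than stated by the spec text, and `decode_location` is a hypothetical name.

```python
import struct

# Example location subpacket from above, in field order:
# flags (8 bytes), sys (1 byte), then lat, long, el, accuracy (8 bytes each).
LOCATION = bytes.fromhex(
    "0000000000000000"  # flags
    "01"                # system (wgs84)
    "3ff3be76c8b43958"  # latitude
    "4002c28f5c28f5c3"  # longitude
    "4059000000000000"  # elevation
    "4024000000000000"  # accuracy
)

SYSTEMS = {0x01: "WGS84"}  # the only geodetic system listed above


def decode_location(payload: bytes) -> dict:
    """Decode a location subpacket, assuming big-endian u64/u8/f64 fields."""
    flags, sys_id, lat, lon, el, accuracy = struct.unpack(">QB4d", payload)
    return {
        "flags": flags,
        "sys": SYSTEMS.get(sys_id, f"unknown (0x{sys_id:02x})"),
        "lat": lat,
        "long": lon,
        "el": el,
        "accuracy": accuracy,
    }


print(decode_location(LOCATION))
```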

Vendor Extension In addition to the fields I put in the spec, I expect that I may need custom packet types I can't think of now. There's all sorts of useful data that could be encoded into the stream, so I'd rather there be an officially sanctioned mechanism that allows future work on the spec without constraining myself. As an example, I've used a custom subpacket to create test vectors: the data is encoded into a Vendor Extension, followed by the IQ for the modulated packet. If the demodulated data and in-band original data don't match, we've regressed. You could imagine in-band speech-to-text, antenna rotator azimuth information, or demodulated digital sideband data (like FM HDR data) too. Or even things I can't think of yet!
id
data
In order to help with checking the basic parsing and encoding of this format, the following is a vendor extension subpacket (when encoded or decoded this will be found inside an ARF packet as described above).
// extension id (b24305f6-ff73-4b7a-ae99-7a6b37a5d5cd)
b2, 43, 05, f6, ff, 73, 4b, 7a,
ae, 99, 7a, 6b, 37, a5, d5, cd,

// data (0x01, 0x02, 0x03, 0x04, 0x05)
01, 02, 03, 04, 05
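A small Python sketch of splitting this example into its two fields follows. It assumes the extension id is a 16-byte UUID stored in the same byte order as its textual form, and that everything after it is opaque data; `decode_vendor_extension` is a hypothetical helper, not part of any ARF library.

```python
import uuid

# Example vendor extension subpacket from above.
VENDOR_EXTENSION = bytes.fromhex(
    "b24305f6ff734b7aae997a6b37a5d5cd"  # extension id
    "0102030405"                        # data
)


def decode_vendor_extension(payload: bytes) -> tuple[uuid.UUID, bytes]:
    """Split a vendor extension into (extension id, opaque data).

    Assumes the first 16 bytes are the UUID and the remainder of the
    subpacket is the vendor-defined data.
    """
    if len(payload) < 16:
        raise ValueError("vendor extension is shorter than its 16-byte id")
    return uuid.UUID(bytes=payload[:16]), payload[16:]


ext_id, data = decode_vendor_extension(VENDOR_EXTENSION)
print(ext_id, data.hex())
```

A decoder that does not recognise the extension id can then skip the data without trying to interpret it.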

Tradeoffs The biggest tradeoff that I'm not entirely happy with is limiting the length of a packet to what fits in a u16 (65535 bytes). Given the u8 sample header, this limits us to 8191 32-bit sample pairs at a time. I wound up believing that the overhead in terms of additional packet framing is worth it, because always encoding 4-byte lengths felt like overkill, and a dynamic length scheme ballooned codepaths in the decoder that I was trying to keep as easy to change as possible as I worked with the format.
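The 8191 figure can be reproduced with a little arithmetic. This sketch assumes the "u8 sample header" occupies one byte and that a 32-bit sample pair means one 32-bit I value plus one 32-bit Q value (8 bytes per pair); both are readings of the text above, not confirmed details of the format.

```python
MAX_PACKET_BYTES = 0xFFFF   # largest length a u16 can express
HEADER_BYTES = 1            # assumed: the "u8 sample header" is one byte
BYTES_PER_PAIR = 2 * 4      # assumed: one I and one Q value, 32 bits each

# Whole sample pairs that fit in the payload left over after the header.
max_pairs = (MAX_PACKET_BYTES - HEADER_BYTES) // BYTES_PER_PAIR
print(max_pairs)
```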

Freexian Collaborators: Debian Contributions: Debusine projects in GSoC, Debian CI updates, Salsa CI maintenance and more! (by Anupa Ann Joseph)

Debian Contributions: 2026-03 Contributing to Debian is part of Freexian's mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

Debusine projects in Google's Summer of Code While Freexian initiated Debusine, and is investing a lot of resources in the project, we manage it as a true free software project that can and should have a broader community. We have always had documentation for new contributors, and we aim to be responsive when they interact with us via the issue tracker or via merge requests. We decided to stress-test those intentions by proposing five projects for Google's Summer of Code as part of Debian's participation in that program. Given that at least 11 candidates managed to get their merge request accepted in the last 30 days (interacting with the development team is part of the pre-requisites to apply to Google Summer of Code projects these days), the contributing experience must not be too bad. If you want to try it out, we maintain a list of quick fixes that are accessible to newcomers. And as always, we welcome your feedback!

Debian CI: incus backend and upgrade to Bootstrap 5, by Antonio Terceiro debci 3.14 was released on March 4th, with a followup 3.14.1 release with regression fixes a few days afterwards. Those releases were followed by new development and maintenance work that will provide extra capabilities and stability to the platform. This month saw the initial version of an incus backend land in Debian CI. The transition to the new backend will be done carefully so as to not disrupt testing migration. Each package will run jobs with both the current lxc backend and with incus. Packages that have the same result on both backends will be migrated over, and packages that exhibit different results will be investigated further, resulting in bug reports and/or other communication with the maintainers. On the frontend side, the code has been ported from the now ancient Bootstrap 3 to Bootstrap 5. This need was originally reported back in 2024, based on the lack of security support for Bootstrap 3. Beyond improving maintainability, this upgrade also enables support for dark mode in debci, which is still a work in progress. Both updates mentioned in this section will be available in a future debci release.

Salsa CI maintenance by Santiago Ruano Rincón et al. Santiago reviewed some Salsa CI issues and the associated merge requests. For example, he investigated a regression (#545), introduced by the move to sbuild, on the use of extra repositories configured as .source files, and reviewed the MR (!712) that fixes it. Also, there were conflicts with changes made in debci 3.14 and debci 3.14.1 (those updates are mentioned above), and different people have contributed to fixing the subsequent issues in a long-term way. This includes Raphaël, who proposed MR !707 and who also suggested that Antonio merge the Salsa CI patches to avoid similar errors in the future. This happened shortly after. Those fixes finally required the unrelated MR !709, which will prevent similar problems when building images. To identify bugs related to the autopkgtest support in the backport suites as early as possible, Santiago proposed MR !708. Finally, Santiago, in collaboration with Emmanuel Arias, also had exchanges with GSoC candidates for the Salsa CI project, including about the contributions they have made as merge requests. It is important to note that there are several very good candidates interested in participating. Thanks a lot to them for their work so far!

Miscellaneous contributions
  • Raphaël reported a zim bug affecting Debian Unstable users, which had apparently already been fixed in git. He could thus cherry-pick the fix and update the package in Debian Unstable.
  • Carles created a new page under InstallingDebianOn in the Debian Wiki.
  • Carles submitted translation errors in the debian-installer Weblate.
  • Carles, using po-debconf-manager, improved Catalan translations: reviewed and submitted 3 packages. Also improved error handling when forking or submitting an MR if the fork already existed.
  • Carles kept improving check-relations: code base related general improvements (added strict typing, enabled pre-commit). Also added DebPorts support, virtual packages support and added commands for reporting missing relations and importing bugs from bugs.debian.org.
  • Antonio handled miscellaneous Salsa support requests.
  • Antonio improved the management of MiniDebConf websites by keeping all non-secret settings in git and fixed exporting these sites as static HTML.
  • Stefano uploaded routine updates to hatchling, python-mitogen, python-virtualenv, python-discovery, dh-python, pypy3, python-pipx, and git-filter-repo.
  • Faidon uploaded routine updates to crun, libmaxminddb, librdkafka, lowdown, platformdirs, python-discovery, sphinx-argparse-cli, tox, tox-uv.
  • Stefano and Santiago continued to help with DebConf 26 preparations.
  • Stefano reviewed some contributions to debian-reimbursements and handled admin for reimbursements.debian.net.
  • Stefano attended the Debian Technical Committee meeting.
  • Helmut sent 8 patches for cross build failures.
  • Building on the work of postmarketOS, Helmut managed to cross build systemd for musl in rebootstrap and sent several patches in the process.
  • Helmut reviewed several MRs of Johannes Schauer Marin Rodrigues expanding support for DPKG_ROOT to support installing hurd.
  • Helmut incorporated a final round of feedback for the Multi-Arch documentation in Debian policy, which finally made it into unstable together with documentation of Build-Profiles.
  • In order to fix python-memray, Helmut NMUed libunwind generally disabling C++ exception support as being an incompatible duplication of the gcc implementation. Unfortunately, that ended up breaking suricata on riscv64. After another NMU, python-memray finally migrated.
  • Thorsten uploaded new upstream versions of epson-inkjet-printer-escpr and sane-airscan. He also fixed a packaging bug in printer-driver-oki. As of systemd 260.1-1 the configuration of lpadmin has been added to the sysusers.d configuration. All printing packages can now simply depend on the systemd-sysusers package and don't have to take care of its creation in maintainer scripts anymore.
  • In collaboration with Emmanuel Arias, Santiago had exchanges with GSoC candidates and reviewed the proposals of the Linux livepatching GSoC 2026 project.
  • Colin helped to fix CVE-2026-3497 in openssh and CVE-2026-28356 in multipart.
  • Colin upgraded tango and pytango to new upstream releases and packaged pybind11-stubgen (needed for pytango), thanks to a Freexian customer. Tests of reproducible builds revealed that pybind11-stubgen didn't generate imports in a stable order; this is now fixed upstream.
  • Lucas fixed CVE-2025-67733 and CVE-2026-21863 affecting src:valkey in unstable and testing. Also reviewed the same fixes targeting stable proposed by Peter Wienemann.
  • Faidon worked with upstream and build-dep Debian maintainers on resolving blockers in order to bring pyHanko into Debian, starting with the adoption of python-pyhanko-certvalidator. pyHanko is a suite for signing and stamping PDF files, and one of the few libraries that can be leveraged to sign PDFs with eIDAS Qualified Electronic Signatures.
  • Anupa co-organized MiniDebConf Kanpur and attended the event with many others from all across India. She handled the accommodation arrangements along with the registration team members, worked on the budget and expenses. She was also a speaker at the event.
  • Lucas helped with content review/schedule for the MiniDebConf Campinas. Thanks Freexian for being a Gold sponsor!
  • Lucas organized and took part in a one-day in-person sprint to work on Ruby 3.4 transition. It was held in a coworking space in Brasilia - Brazil on April 6th. There were 5 DDs and they fixed multiple packages FTBFSing against Ruby 3.4 (coming to unstable soon hopefully). Lucas has been postponing a blog post about this sprint since then :-)

14 April 2026

Ravi Dwivedi: Hungary Visa

The annual LibreOffice conference 2025 was held in Budapest, Hungary, from the 3rd to the 6th of September 2025. Thanks to The Document Foundation (TDF) for sponsoring me to attend the conference. As Hungary is a part of the Schengen area, I needed a Schengen visa to attend the conference. In order to apply for a Schengen visa, one needs to get an appointment at VFS Global and submit all the required documents there, which are then forwarded to the embassy. I got an appointment for a Hungary visa at VFS Global in New Delhi for the 24th of July. There were many appointment slots available for the Hungary visa. One could easily get an appointment for the next day at the Delhi center. There were some technical problems on the VFS website, though, as I was unable to upload a scanned copy of my passport while booking the appointment. I got an error saying, "Unfortunately, you have exceeded the maximum upload limit." The problem didn't get fixed even after contacting the VFS helpline. They asked me to try the Firefox browser and to delete all the cache, which I had already done. So I created another account with a different email address and phone number, after which I was able to upload my passport and book an appointment. Other conference attendees from India also reported facing some technical issues on the VFS Hungary website. Anyway, I went to the VFS Hungary application center as per my appointment on the 24th of July. Going inside, I located the Hungary visa application counter. There were two applicants ahead of me. When it was my turn, the VFS staff warned me that my passport was damaged. The damage was on the bio-data page. All the details could be seen, but the lamination of the details page had worn off a bit. They asked me to write an application to the Embassy of Hungary in New Delhi stating that I insisted that VFS submit my application, and describing the damage on my passport. I got a bit worried about my application getting rejected due to the damage. 
But I decided to gamble my money on this one, as I didn't have time (and energy) to apply for a new passport before this trip. Moreover, I had struck out a couple of fields in my visa application form which were not applicable to me, due to which the VFS staff asked me to fill out another visa application. After this, the application got submitted, and it cost 11,000 INR (including the fee to book the appointment at VFS). Here is the list of documents I submitted: It took 2 hours for me to submit my visa application, even though there were only two applicants before me. This was by far the longest time to submit a Schengen visa application for me. Fast-forward to the 30th of July, and I received an email from the Embassy of Hungary asking me to submit an additional document - paid air ticket - for my application. I had only submitted dummy flight tickets, and they had been enough for the Schengen visas I had applied for until now. This was the first time a country was asking me to submit a confirmed flight ticket during the visa process. I consulted my travel agent on this, and they were fairly confident that I would get the visa if the embassy was asking me to submit confirmed flight tickets. So I asked the travel agent to book the flight tickets. These tickets were 78,000, and the airline was Emirates. Then, I sent the flight tickets to the embassy by email. The embassy sent the visa results on the 6th of August, which I received the next day. My visa had been approved! It took 14 days for me to get the Hungary visa after submitting the application. See you in the next one! Thanks to Badri for proofreading.

12 April 2026

Dirk Eddelbuettel: littler 0.3.23 on CRAN: Mostly Internal Fixes

The twenty-fourth release of littler as a CRAN package landed on CRAN just now, following in the now twenty-year history (!!) as an (initially non-CRAN) package started by Jeff in 2006, and joined by me a few weeks later. littler is the first command-line interface for R as it predates Rscript. It allows for piping as well as for shebang scripting via #!, uses command-line arguments more consistently and still starts faster. It also always loaded the methods package which Rscript only began to do in later years. littler lives on Linux and Unix, has its difficulties on macOS due to some braindeadedness there (who ever thought case-insensitive filesystems as a default were a good idea?) and simply does not exist on Windows (yet the build system could be extended; see RInside for an existence proof, and volunteers are welcome!). See the FAQ vignette on how to add it to your PATH. A few examples are highlighted at the GitHub repo, as well as in the examples vignette. This release, which comes just two months after the previous 0.3.22 release that brought a few new features, is mostly internal. (The previous release erroneously had 0.3.23 in its blog and social media posts; it really was 0.3.22 and this one now is 0.3.23.) Mattias Ellert addressed a nag (when building for a distribution) about one example file with a shebang not having executable modes. I accommodated the ever-changing interface of the C API of R (within about twelve hours of being notified). A few other smaller changes were made as well, polishing a script or two as usual; see below for more. The full change description follows.

Changes in littler version 0.3.23 (2026-04-12)
  • Changes in examples scripts
    • Correct spelling in installGithub.r to lower-case h
    • The r2u.r now recognises resolute aka 26.04
    • installRub.r can install (more easily) from r-multiverse
    • A file permission was corrected (Mattias Ellert in #131)
  • Changes in package
    • Update script count and examples in README.md
    • Continuous integration scripts received minor updates
    • The C level access to the R API was updated to reflect most recent standards (Dirk in #132)

My CRANberries service provides a comparison to the previous release. Full details for the littler release are provided as usual at the ChangeLog page, and also on the package docs website. The code is available via the GitHub repo, from tarballs and now of course also from its CRAN page and via install.packages("littler"). Binary packages are available directly in Debian as well as (in a day or two) Ubuntu binaries at CRAN thanks to the tireless Michael Rutter. Comments and suggestions are welcome at the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub. You can also sponsor my Tour de Shore 2026 ride in support of the Maywood Fine Arts Center.

9 April 2026

Russell Coker: HP Z640 and E5-2696 v4

I recently decided to upgrade the CPU in my workstation. The E5-2696 v3 CPU was OK (passmark 2045 for single thread and 21,380 for multi thread) [1] but I felt like buying something better so I got an E5-2696 v4 (passmark 2115 and 24,643) [2]. I chose the E5-2696 v4 because I was looking for an E5-2699 v4 and found an ebay seller who had them at $140 but was offering the E5-2696 v4 for $99, and the passmark results for the two CPUs are almost identical. After buying the CPU and waiting for it to be delivered I realised that the Z640 doesn't include it in the list of supported CPUs and that the maximum TDP of any supported CPU is 145W while according to passmark it has a TDP of 150W. I looked for information about it on Intel ARK (the official site for specs of Intel CPUs) and discovered that "The Intel Xeon Processor E5-2696 v4 is designed to be used by system manufacturers (OEMs), and this means they can modify its specifications depending on the system where it will be implemented" and "The processor does not have an ARK page for this reason, since it has no standard specification from Intel, so depending on the original system, it is necessary to contact that system manufacturer for information" [3]. That's the official response from an Intel employee saying that there are no standard specs for that CPU!!! Somehow I had used an E5-2696 v3 for 3 years without realising that the same lack of support and specs applies to it [4]! I installed the new CPU in another Z640 which had an E5-1620 v3 CPU and it worked. I was a little surprised to discover that the hole in the corner is in the bottom right (according to the alignment of the printed text on the top) for all my E5-26xx CPUs while it's in the top left on the E5-1620 v3. Google searches for things like "e5-2600 e5-1600 difference" and "e5-2600 e5-1600 difference hole in corner" didn't turn up any useful information. 
The best information I found was from the Linus Tech Tips forum which says that the hole is to allow gasses to escape when the CPU package is glued together [5], which implies (but doesn't state) that the location of the hole has no meaning. I had previously thought that the hole was to indicate the location of pin 1 and was surprised when the new CPU had the hole in the opposite corner. Hopefully in future when people have such concerns they can find this post and not be worried that they are about to destroy their CPU, PC, or both when upgrading the CPU. The previous Z640 was one I bought from Facebook marketplace for $50 in unknown condition in the expectation that I would get at least $50 of parts, but it worked perfectly apart from one DIMM socket. The Z640 I'm using now is one I bought from Facebook marketplace for $200 and it's working perfectly with 4 DIMMs, 128G of RAM, and the E5-2696 v4 CPU. $300 for a workstation with ECC RAM and a 22 core CPU is good value for money! There are some accounts of the E5-2696 v4 not working on white-box motherboards, including a claim that when it was selling for $4000US someone's motherboard destroyed one. The best plan for such CPUs is to google for someone who's already got it working in the same machine, which means a name-brand server. That doesn't guarantee that it will work (Intel refuses to supply specs and states that different items may work differently) but greatly improves the probability. This system has the HP BIOS version 2.61. Note that the Linux fwupd package doesn't seem to update the BIOS on HP workstations, so you need to manually download it and install it. There is a possibility that a Z640 with an older BIOS won't work with this CPU. Here is the previous post in my Z640 saga [6].

6 April 2026

Thorsten Alteholz: My Debian Activities in March 2026

Debian LTS/ELTS This was the hundred-forty-first month in which I did some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian. During my allocated time I uploaded or worked on: I also worked on the check-advisories script and proposed a fix for cases where issues would be assigned to the coordinator instead of the person who forgot to do something. I also did some work for a kernel update and packages snapd and ldx on security-master and attended the monthly LTS/ELTS meeting. Last but not least I started to work on gst-plugins-bad1.0. Debian Printing This month I uploaded new upstream versions: Several packages take care of group lpadmin in their maintainer scripts. With the upload of version 260.1-1 of systemd there is now a central package (systemd systemd-standalone-sysusers systemd-sysusers) that takes care of this. Other dependencies like adduser can now be dropped. This work is generously funded by Freexian! Debian Lomiri This month I continued to work on unifying packaging on Debian and Ubuntu. This makes it easier to work on those packages regardless of the platform used. I am also able to upload Debian packages to the corresponding Ubuntu PPA now. A small bug had to be fixed in the python script to allow the initial configuration in Launchpad. This work is generously funded by Fre(i)e Software GmbH! 
Debian Astro This month I uploaded a new upstream version or a bugfix version of: I also uploaded lots of indi-drivers (libplayerone, libsbig, libricohcamerasdk, indi-asi, indi-eqmod, indi-fishcamp, indi-inovaplx, indi-pentax, indi-playerone, indi-sbig, indi-mi, libahp-xc, indi-aagcloudwatcher, indi-aok, indi-apogee, libapogee3, indi-nightscape, libasi, libinovasdk, libmicam, indi-avalon, indi-beefocus, indi-bresserexos2, indi-dsi, indi-ffmv, indi-fli, indi-gige, info-gphoto, indi-gpsd, indi-gpsnmea, indi-limesdr, indi-maxdomeii, indi-mgen, indi-rtklib, indi-shelyak, indi-starbook, indi-starbookten, indi-talon6, indi-weewx-json, indi-webcam, indi-orion-ssg3, indi-armadillo-playtypus) to experimental to make progress with the indi-transition. No problems with those drivers appeared and the next step would be the upload of indi version 2.x to unstable. I hope this will happen soon, as new drivers are already waiting in the pipeline. There were also four packages that migrated to the official indi package and are no longer needed as 3rdparty drivers (indi-astrolink4, indi-astromechfoc, indi-dreamfocuser, indi-spectracyber). While working on these packages, I thought about testing them. Unfortunately I don't have enough hardware to really check out every package, so I can upload most of them only as is. In case anybody is interested in better testing coverage and in me being able to provide upstream patches, I would be very glad about hardware donations. Debian IoT This month I uploaded a new upstream version or a bugfix version of: Debian Mobcom This month I uploaded a new upstream version or a bugfix version of: misc This month I uploaded a new upstream version or a bugfix version of: I also sponsored the upload of Matomo. Thanks a lot to William for preparing the package.

29 March 2026

Russ Allbery: Review: The Sovereign

Review: The Sovereign, by C.L. Clark
Series: Magic of the Lost #3
Publisher: Orbit
Copyright: September 2025
ISBN: 0-316-54286-5
Format: Kindle
Pages: 575
The Sovereign is the third and concluding book of C.L. Clark's Magic of the Lost high fantasy trilogy. I recommend reading the books of this series close together, since there are a lot of characters and a lot of continuity between books that is helpful to remember, but it was not quite as difficult this time to remember where the story left off. At the end of The Faithless, the political situation in Balladaire (not-France) was more stable, but the threat of a plague lay on the horizon. That threat arrives in earnest in this book, along with new threats from both Balladaire's former colonial conscript soldiers and from neighboring Taargen (not-Germany, sort of, although the parallel isn't as close). Luca and Touraine have finally admitted that they're deeply in love, but they are still very different people with different goals and ethics. Luca is determined to do anything necessary to save her kingdom, but her definition of her kingdom is sharp and brittle. Touraine is torn between far too many loyalties, plus the lingering worry that her morals and Luca's may not be compatible. I think the hardest part of this sort of series is finding an ending the reader will find satisfying. This one, unfortunately, did not work for me, but that may be more due to personal preference than objective flaws. There have been two threads through this series: an improbable romance embedded in a network of complex personal relationships, and a political commentary on colonialism and post-colonial wars. I was enjoying the former, but it was the latter that felt fresh and interesting to me. The plot threads in The Faithless outside of Balladaire expanded that complexity, and I was hoping the final volume would continue in that direction. How could a colonial power atone for its history? How does the former colony establish its own governance? Is there a path to freedom without violence? Are attempts to chart a more moral course doomed to open lines of attack for one's other enemies? 
It's clear that Clark was thinking about similar themes, but The Sovereign narrows the field instead of widens it, restricts the political options, and then resolves most questions in a massive war. This is not that surprising of a conclusion, but it's one that I found unsatisfying and, honestly, a little boring. Yes, one way to resolve all the competing tensions is for everyone to try to kill each other and whoever survives wins, and historically that's one of the more likely outcomes, but that ending doesn't wrestle with the politics as much as it collapses them. Clark instead focuses this concluding volume on the romance, which becomes even more fraught, tragic, and dramatic than it was in previous books (and that's saying something). The hard questions of divided loyalties and moral conflicts are mostly framed by questions about Touraine's loyalty to Luca and Luca's trust of Touraine. This is all very Shakespearean, full of hard choices, sudden reversals, miscommunication, and a very deep conflict between Luca's realpolitik and Touraine's stubborn personal morality. If this is what you were reading the series for, if you were hoping for a maximum-drama sapphic relationship, you may thoroughly enjoy this. I thought it had its moments, but I wish they had been balanced by more moments of cool-headed practicality and creative political ingenuity. My biggest frustration with this ending is that the characters largely stop doing politics. The political complexity was the strength of both The Unbroken and The Faithless: People who intensely dislike each other negotiate because there is something larger to be gained, personal decisions made without considering the political ramifications have costs, and multiple characters are trying hard to find a way to turn a nasty, exploitative world into something better without simply killing everyone who disagrees. 
Many of the characters were objectively bad at politics, inexperienced and immature, but they stumbled or dragged or fought their way into political solutions anyway. I thought Clark moved too far away from that in The Sovereign. Everyone goes deep into their own emotions and desire for vengeance or conquest or revolution and stops compromising. To a depressingly large extent, the story is resolved by killing everyone who disagrees. I think the story is poorer for it. One of the other threads of the series is Balladairan magic, or rather its odd absence. Luca has one understanding of it, the rebels introduced in The Faithless have a different understanding of it, and its pursuit is set up as critical to resolving the threat of a plague. We do get an explanation of sorts, but it's not as complete or as satisfying as I was hoping, and the symbolism of Balladaire's missing magic is left frustratingly murky. For me, this has some of the same problems as the political conclusion: I wanted an intellectual catharsis alongside the emotional catharsis, but that was not the direction Clark was taking the story. I like reading about these characters. All of Luca, Touraine, and Pruett are complex, comprehensible, flawed, and often intriguing. But my favorite character in the story, the person I latched on to as an emotional path through the story, was Sabine. Her refreshingly straightforward loyalty and lack of drama was a breath of fresh air. She has some great moments in this book, but there too I got wrong-footed by the direction Clark went with her arc and found its conclusion deeply unsatisfying. I'm not sure how many of these complaints are because of missed opportunities in the novel, how many were due to a mismatch of taste, and how many were due to not being in the right mood to read this conclusion. 
I'm sure that it didn't help that I read this simultaneously with another novel in which the characters were always miserable, or that I read it in early 2026 with, uh, all that entails. I suspect that if you came away from the first two books invested in the messy romance and wanting MOAR DRAMA, you may get exactly what you were hoping for. That, sadly, was not what I was hoping for. I can't really recommend this. I thought it dragged in places and didn't deliver the ending I wanted. But it has some great moments, it does wrap up the threads of the trilogy as advertised, and at least the romance gets a dramatic climax worthy of the tension that has been built through the previous books. If that matches what you were enjoying in the previous books, you may well enjoy this more than I did. Rating: 5 out of 10

27 March 2026

Samuel Henrique: I use curl with ECH btw (in Debian)

tl;dr This is an experimental feature that, for the first time, brings full ECH support to curl on Debian using OpenSSL. Starting with curl 8.19.0-3+exp2 (Debian Experimental), you can now use ECH, with HTTPS-RR and DoH for maximum privacy. curl 8.19.0-3+exp2 is quite fresh at the time of writing, so bear in mind that your repository might not have synced the package yet; all mirrors should have it by March 27th 14:00 UTC.
# defo.ie is a test server that confirms whether ECH was successfully used
curl -v --ech hard https://defo.ie/ech-check.php
# For Encrypted Client Hello (ECH) + DNS over HTTPS (DoH)
curl -v --ech hard --doh-url https://1.1.1.1/dns-query https://defo.ie/ech-check.php
"--ech hard" tells curl to refuse the connection entirely if ECH cannot be negotiated. Or, if you would like to try it out in a container:
podman run debian:experimental /bin/bash -c 'apt install --update -t experimental -y curl && curl -v --ech hard --doh-url https://1.1.1.1/dns-query https://defo.ie/ech-check.php'
(in case you haven't noticed, apt now has the --update option for the upgrade and install commands)
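
Before relying on --ech hard, it can help to confirm that the installed curl was built with ECH at all. The sketch below assumes that an ECH-capable build lists ECH on the Features line of curl --version; the helper name curl_has_ech is made up for this example and is not part of curl or Debian.

```shell
#!/bin/sh
# Succeed (exit 0) if the given `curl --version` output lists ECH
# as a whole word on its "Features:" line.
# curl_has_ech is a hypothetical helper name used only in this sketch.
curl_has_ech() {
    printf '%s\n' "$1" |
        grep '^Features:' |   # keep only the feature list
        tr ' ' '\n' |         # one feature per line
        grep -qx 'ECH'        # exact match, so "HTTPS-proxy" etc. never match
}

# Usage:
#   if curl_has_ech "$(curl --version)"; then
#       echo "this curl build supports ECH"
#   fi
```

Passing the version text as an argument keeps the check easy to try against saved output from another machine.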

For Privacy CloudFlare calls it "the last puzzle piece to privacy" in their must-read announcement: https://blog.cloudflare.com/announcing-encrypted-client-hello/. Encrypted Client Hello (rfc9849) encrypts the "which website are you connecting to?" part of the TLS handshake that was previously visible in plaintext. HTTPS-RR (rfc9460) is a DNS record type that publishes connection parameters for a service, including the public key clients need to perform ECH. DNS Over HTTPS (rfc8484) encrypts DNS queries by tunneling them over HTTPS, hiding what domains you're looking up from network observers. When all three operate together over a CDN with shared IP space, the target domain name is hidden from passive observers; the HTTPS-RR record is queried over DoH in order to retrieve the ECH key (rfc9848) for the TLS handshake. Seems like quite an important feature, and in fact the major browsers have had it enabled for some time now; the trick is that they do not use OpenSSL (Chrome uses BoringSSL and Firefox uses NSS). For everyone else, the only option is to patch OpenSSL or wait until 4.0.0 is released, so part of the reason Debian is the first distro to enable it (curl + OpenSSL + ECH) is that the OpenSSL maintainer (Sebastian Andrzej Siewior) packaged the alpha release just 3 days after it was published. Do not forget that ECH support is experimental and currently relies on the alpha release of OpenSSL.

wcurl Gets It Too Considering wcurl is just a wrapper on curl, it gets the feature for free:
wcurl --curl-options="--ech hard --doh-url https://1.1.1.1/dns-query" $URL
If you're using wcurl, you don't want to have to set parameters; this is just to show that the feature is there. If you have a .curlrc file, it can enable the feature seamlessly.
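
As a rough sketch of the .curlrc route, the function below writes the same two options used in the commands above into a curl config file. It relies on curl's documented config-file syntax, where long options may be written without the leading dashes; the function name write_ech_curlrc and the idea of passing the target path as an argument are choices made for this example.

```shell
#!/bin/sh
# Write a curl config file that requests ECH plus DoH on every invocation.
# write_ech_curlrc is a hypothetical helper name used only in this sketch.
write_ech_curlrc() {
    # $1 = file to write; note that this overwrites any existing content
    cat > "$1" <<'EOF'
# Refuse the connection unless ECH is negotiated
ech = hard
# Resolve names (and fetch the HTTPS record) via DNS over HTTPS
doh-url = "https://1.1.1.1/dns-query"
EOF
}

# Usage (assumes an ECH-enabled curl build is installed):
#   write_ech_curlrc "$HOME/.curlrc"
#   wcurl https://defo.ie/ech-check.php
```

Trying it against a scratch file first and passing that file to curl with --config avoids clobbering an existing ~/.curlrc.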

Other Debian Releases Given the ECH feature requires OpenSSL >= 4, it will not make it to Debian 13, having a small chance of going to Debian 13 Backports (emphasis on small). It should get to Debian Unstable and Debian Testing within the next couple of months as the OpenSSL GA release happens and gets packaged, but you should be able to install the package from Experimental in your Unstable and Testing systems without issues. It will also be in Debian 14 once it becomes the new Stable.

Shoulders of Giants Stephen Farrell's presentation from OpenSSL Conference 2025 has a lot of background on the work involved: "Encrypted Client Hello - Lessons learned from trying to do something that was probably too complicated". They have been working on implementing ECH in open-source projects for years; something as big as this doesn't happen without lots of people dedicating both their paid and free time to it. I ended up being the person who enabled it on Debian, which was pretty much the least amount of work among everyone involved, but hey, it's fun flipping the switch and telling you about it.

Background Since 2025, the curl developers have been organizing a yearly meeting with all maintainers of curl in operating systems. The 2026 edition happened on March 26th: https://github.com/curl/curl/wiki/curl-distro-discussion-2026. Attendance was really good, and as you can imagine one of the topics of discussion was ECH, where it was pointed out that having OpenSSL 4 was the main requirement and that nothing else unusual was needed. In Debian Experimental, we have been enabling HTTPS-RR since March 2025, and OpenSSL 4.0.0 alpha was packaged just recently (2026-03-13) by Sebastian Andrzej Siewior, so it's time for the next step. The curl distro meeting was just the motivation I needed to go ahead and enable it in Debian Experimental, so as part of our Debian Brasil Weekly Meetings I've prepared and uploaded the changes, while Carlos Henrique Lima Melara worked on addressing a recent test regression for Debian Unstable. Unfortunately sergiodj couldn't join and I'm sure he's jealous of the hacking session now.

Appendix While writing this, I've noticed one of the authors of the CloudFlare blogpost is the previous curl maintainer on Debian; Alessandro Ghedini let me take over the maintenance back in 2021 and today curl is maintained by a team of 4 people, it's nice to see Alessandro's involvement.

21 March 2026

C.J. Collier: The WWW::Mechanize::Chrome Saga: A Comprehensive Narrative of PR #104

The
WWW::Mechanize::Chrome Saga: A Comprehensive Narrative of PR #104 This document synthesizes the extensive work performed from March
13th to March 20th, 2026, to harden, stabilize, and refactor the
WWW::Mechanize::Chrome library and its test suite. This
effort involved deep dives into asynchronous programming,
platform-specific bug hunting, and strategic architectural
decisions.

Part I:
The Quest for Cross-Platform Stability (March 13-16) The initial phase of work focused on achieving a green test suite
across a variety of Linux distributions and preparing for a new release.
This involved significant hardening of the library to account for
different browser versions, OS-level security restrictions, and
filesystem differences.

Key Milestones &
Engineering Decisions:
  • Fedora & RHEL-family Success: A major effort
    was undertaken to achieve a 100% pass rate on modern Fedora 43 and
    CentOS Stream 10. This required several key engineering decisions to
    handle modern browser behavior:
    • Decision: Implement Asynchronous DOM Serialization
      Fallback. Synchronous fallbacks in an async context are
      dangerous. To prevent "Resource was not cached" errors during
      saveResources, we implemented a fully asynchronous fallback
      in _saveResourceTree. By chaining
      _cached_document with DOM.getOuterHTML
      messages, we can reconstruct document content without blocking the event
      loop, even if Chromium has evicted the resource from its cache. This
      also proved resilient against Fedora's security policies, which often
      block file:// access.
    • Decision: Truncate Filenames for Cross-Platform
      Safety. To avoid "File name too long" errors,
      especially on Windows where the MAX_PATH limit is 260
      characters, filenameFromUrl was hardened. The filename
      truncation was reduced to a more conservative 150
      characters, leaving ample headroom for deeply nested CI
      temporary directories. Logic was also added to preserve file extensions
      during truncation and to sanitize backslashes from URI paths.
    • Decision: Expand Browser Discovery Paths. To
      support RHEL-based systems out-of-the-box, the
      default_executable_names was expanded to include
      headless_shell and search paths were updated to include
      /usr/lib64/chromium-browser/.
    • Decision: Mitigate Race Conditions with Stabilization Waits
      and Resilient Fetching. On fast systems,
      DOM.documentUpdated events could invalidate
      nodeIds immediately after navigation, causing XPath queries
      to fail with "Could not find node with given id". A small stabilization
      sleep(0.25s) was added after page loads to ensure the DOM
      is settled. Furthermore, the asynchronous DOM fetching loop was hardened
      to gracefully handle these errors by catching protocol errors and
      returning an empty string for any node that was invalidated during
      serialization, ensuring the overall process could complete.
  • Windows Hardening:
    • Decision: Adopt Platform-Aware Watchdogs. The test
      suite's reliance on ualarm was a blocker for Windows, where
      it is not implemented. The t::helper::set_watchdog function
      was refactored to use standard alarm() (seconds) on Windows
      and ualarm (microseconds) on Unix-like systems, enabling
      consistent test-level timeout enforcement.
  • Version 0.77 Release:
    • Decision: Adopt SOP for Version Synchronization.
      The project maintains duplicate version strings across 24+ files. A
      Standard Operating Procedure was adopted to use a batch-replacement tool
      to update all sub-modules in lib/ and to always run
      make clean and perl Makefile.PL to ensure
      META.json and META.yml reflect the new
      version. After achieving stability on Linux, the project version was
      bumped to 0.77.
  • Infrastructure & Strategic Work:
    • The ad2 Windows Server 2025 instance was restored and
      optimized, with Active Directory demoted and disk I/O performance
      improved.
    • A strategic proposal for the Heterogeneous Directory
      Replication Protocol (HDRP) was drafted and published.
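
To make the filename truncation decision above more concrete, here is a minimal shell sketch of truncating a name to a fixed budget while keeping its extension. It is not the library's filenameFromUrl code (which is Perl and not shown in this post); the function name truncate_filename is hypothetical, and the sketch assumes plain ASCII names where one character is one byte.

```shell
#!/bin/sh
# Truncate a file name to at most $2 characters (default 150) while
# preserving the final extension. truncate_filename is a hypothetical
# helper for illustration only; it assumes ASCII names.
truncate_filename() {
    name=$1
    max=${2:-150}
    # Names already within the budget pass through unchanged
    if [ "${#name}" -le "$max" ]; then
        printf '%s\n' "$name"
        return 0
    fi
    # Split into stem and extension (if the name contains a dot)
    case $name in
        *.*) ext=.${name##*.}; stem=${name%.*} ;;
        *)   ext=; stem=$name ;;
    esac
    # Shorten only the stem so the extension survives intact
    keep=$((max - ${#ext}))
    printf '%s%s\n' "$(printf '%s' "$stem" | cut -c1-"$keep")" "$ext"
}

# Usage:
#   truncate_filename "some-very-long-name.html"       # default limit of 150
#   truncate_filename "some-very-long-name.html" 20    # custom limit
```

Cutting the stem rather than the tail of the whole string is what keeps saved resources recognizable by type after truncation.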

Part II: The
Great Async Refactor (March 17-18) Despite success on Linux, tests on the slow ad2 Windows
host were still plagued by intermittent, indefinite hangs. This
triggered a fundamental architectural shift to move the library's core
from a mix of synchronous and asynchronous code to a fully non-blocking
internal API.

Key Milestones &
Engineering Decisions:
  • Decision: Expose a _future API.
    Instead of hardcoding timeouts in the library, the core strategy was to
    refactor all blocking methods (xpath, field,
    get, etc.) into thin wrappers around new non-blocking
    ..._future counterparts. This moved timeout management to
    the test harness, allowing for flexible and explicit handling of
    stalls.
    # Example library implementation
    sub xpath($self, $query, %options) {
        return $self->xpath_future($query, %options)->get;
    }

    sub xpath_future($self, $query, %options) {
        # Async implementation using $self->target->send_message(...)
    }
  • Decision: Centralize Test Hardening in a Helper.
    A dedicated test library, t/lib/t/helper.pm, was created to
    contain all stabilization logic. Safe wrappers (safe_get,
    safe_xpath) were implemented there, using
    Future->wait_any to race asynchronous operations against
    a timeout, preventing tests from hanging.
    # Example test helper implementation
    sub safe_xpath {
        my ($mech, $query, %options) = @_;
        my $timeout = delete $options{timeout} || 5;
        my $call_f = $mech->xpath_future($query, %options);
        my $timeout_f = $mech->sleep_future($timeout)->then(sub { Future->fail("Timeout") });
        return Future->wait_any($call_f, $timeout_f)->get;
    }
  • Decision: Refactor Node Attribute Cache.
    Investigations into flaky checkbox tests (t/50-tick.t)
    revealed that WWW::Mechanize::Chrome::Node was storing
    attributes as a flat list ([key, val, key, val]), which was
    inefficient for lookups and individual updates. The cache was refactored
    to definitively use a HashRef, providing O(1) lookups
    and enabling atomic dual-updates where both the browser property (via
    JS) and the internal library attribute are synchronized
    simultaneously.
  • Decision: Implement Self-Cancelling Socket
    Watchdog. On Windows, traditional watchdog processes often
    failed to detect parent termination, leading to 60-second hangs after
    successful tests. We implemented a new socket-based watchdog in
    t::helper that listens on an ephemeral port; the background
    process terminates immediately when the parent socket closes,
    eliminating these cumulative delays.
  • Decision: Deep Recursive Refactoring & Form
    Selection. To make the API truly non-blocking, the entire
    internal call stack had to be refactored. For example, making
    get_set_value_future non-blocking required first making its
    dependency, _field_by_name, asynchronous. This culminated
    in refactoring the entire form selection API (form_name,
    form_id, etc.) to use the new asynchronous
    _future lookups, which was a key step in mitigating the
    Windows deadlocks.
  • Decision: Fix Critical Regressions & Memory
    Cycles.
    • Evaluation Normalization: Implemented a
      _process_eval_result helper to centralize the parsing of
      results from Runtime.evaluate. This ensures consistent
      handling of return values and exceptions between synchronous
      (eval_in_page) and asynchronous (eval_future)
      calls.
    • Memory Cycle Mitigation: A significant memory
      leak was discovered where closures attached to CDP event futures (like
      for asynchronous body retrieval) would capture strong references to
      $self and the $response object, creating a
      circular reference. The established rule is to now always use
      Scalar::Util::weaken on both $self and any
      other relevant objects before they are used inside a
      ->then block that is stored on an object.
    • Context Propagation (wantarray): A
      major regression was discovered where Perl's wantarray
      context, which distinguishes between scalar and list context, was lost
      inside asynchronous Future->then blocks. This caused
      methods like xpath to return incorrect results (e.g., a
      count instead of a list of nodes). The solution was to adopt the "Async
      Context Pattern": capture wantarray in the synchronous
      wrapper, pass it as an option to the _future method, and
      then use that captured value inside the future's final resolution
      block.
      # Synchronous Wrapper
      sub xpath($self, $query, %options) {
          $options{wantarray} = wantarray; # 1. Capture
          return $self->xpath_future($query, %options)->get; # 2. Pass
      }

      # Asynchronous Implementation
      sub xpath_future($self, $query, %options) {
          my $wantarray = delete $options{wantarray}; # 3. Retrieve
          # ... async logic ...
          return $doc->then(sub {
              if ($wantarray) { # 4. Respect
                  return Future->done(@results);
              } else {
                  return Future->done($results[0]);
              }
          });
      }
    • Asynchronous Body Retrieval & Robust Content
      Fallbacks: Fixed a bug where decoded_content()
      would return empty strings by ensuring it awaited a
      __body_future. This was implemented by storing the
      retrieval future directly on the response object
      ($response->{__body_future}). To make this more robust,
      a tiered strategy was implemented: first try to get the content from the
      network response, but if that fails (e.g., for about:blank
      or due to cache eviction), fall back to a JavaScript
      XMLSerializer to get the live DOM content.
    • Signature Hardening: Fixed "Too few arguments"
      errors when using modern Perl signatures with
      Future->then. Callbacks were updated to use optional
      parameters (sub($result = undef) { ... }) to gracefully
      handle futures that resolve with no value.
    • XHTML Split-Brain Bug: Resolved a
      long-standing Chromium bug (40130141) where content provided via
      setDocumentContent is parsed differently than content
      loaded from a URL. A workaround was implemented: for XHTML documents,
      WMC now uses a JavaScript-based XPath evaluation
      (document.evaluate) against the live DOM, bypassing the
      broken CDP search mechanism.

Derived Architectural Rules
& SOPs:
  • Rule: Always provide _future variants.
    Every library method that interacts with the browser via CDP must have a
    non-blocking asynchronous counterpart.
  • Rule: Centralize stabilization in the test layer.
    All timeout and retry logic should reside in the test harness
    (t/lib/t/helper.pm), not in the core library.
  • Rule: Explicitly propagate wantarray
    context. Synchronous wrappers must capture the caller's context
    and pass it down the Future chain to ensure correct
    scalar/list behavior.
  • Rule: The entire call chain must be asynchronous.
    Non-blocking timeouts only work if nothing in the chain blocks: even a
    single hidden blocking call in an otherwise asynchronous method will
    cause a stall.
  • SOP: Reduce Library Noise. Diagnostic messages
    (warn, note, diag) should be
    removed from library code before commits. All such messages should be
    converted to use the internal $self->log('debug', ...)
    mechanism, ensuring a clean TAP output for CI systems.

Part III: The
MutationObserver Saga (March 19) With most of the library refactored to be asynchronous, one stubborn
test, t/65-is_visible.t, continued to fail with timeouts.
This led to an ambitious, but ultimately unsuccessful, attempt to
replace the wait_until_visible polling logic with a more
modern MutationObserver.

Key Milestones & Challenges:
  • The Theory: The goal was to replace an inefficient
    repeat sleep loop with an event-driven
    MutationObserver in JavaScript that would notify Perl
    immediately when an element's visibility changed.
  • Implementation & Cascade Failure: The
    implementation proved incredibly difficult and introduced a series of
    new, hard-to-diagnose bugs:
    1. An incorrect function signature for
      callFunctionOn_future.
    2. A critical unit mismatch, passing seconds from Perl to JavaScript's
      setTimeout, which expected milliseconds.
    3. A fundamental hang where the MutationObserver's
      JavaScript Promise would never resolve, even after the
      underlying DOM element changed.
  • Debugging Maze: Multiple attempts to fix the
    checkVisibility JavaScript logic inside the observer
    callback, including making it more robust by adding DOM tree traversal
    and extensive console.log tracing, failed to resolve the
    hang. This highlighted the opacity and difficulty of debugging complex,
    cross-language asynchronous interactions, especially when dealing with
    low-level browser APIs.

Procedural Learning:
Granular Edits The effort was plagued by procedural missteps in using automated
file-editing tools. Initial attempts to replace large code blocks in a
single operation led to accidental code loss and match failures.
  • Decision: Adopt "Delete, then Add" Workflow.
    Following forceful user correction, a new SOP was established for all
    future modifications:
    1. Isolate: Break the file into small, manageable
      chunks (e.g., 250 lines).
    2. Delete: Perform a delete operation by replacing
      the old code block with an empty string.
    3. Add: Perform an add operation by inserting the
      new code into the empty space.
    4. Verify: Verify each atomic step before
      proceeding. This granular process, while slower, ensured surgical
      precision and regained technical control over the large
      Chrome.pm module.
The consistent failure of the MutationObserver approach
eventually led to the decision to abandon it in favor of stabilizing the
original, more transparent implementation.

Part IV:
Reversion and Final Stabilization (March 20) After exhausting all reasonable attempts to fix the
MutationObserver, a strategic decision was made to revert
to the simpler, more transparent polling implementation and fix it
correctly. This proved to be the correct path to a stable solution.

Key Milestones &
Engineering Decisions:
  • Decision: Perform Strategic Reversion. The
    MutationObserver implementation, when integrated via
    callFunctionOn_future with awaitPromise,
    proved fundamentally unstable. Its JavaScript promise would consistently
    fail to resolve, causing indefinite hangs. A decision was made to
    revert all MutationObserver code from
    WWW::Mechanize::Chrome.pm and restore the original
    repeat sleep polling mechanism. A stable,
    understandable solution was prioritized over an elegant but broken
    one.
  • Decision: Correct Timeout Delegation in the
    Harness. The root cause of the original timeout failure was
    identified as a race condition in the t/lib/t/helper.pm
    test harness. The safe_wait_until_* wrappers were
    implementing their own timeout (via wait_any and
    sleep_future) that raced against the underlying polling
    function's internal timeout. This led to intermittent failures on slow
    machines. The helpers were refactored to delegate all timeout
    management to the library's polling functions, ensuring a
    single, authoritative timer controlled the operation.
  • Decision: Optimize Polling Performance. At the
    user's request, the polling interval was reduced from 300ms to
    150ms. This modest performance improvement reduced the
    test suite's wallclock execution time by over a second while maintaining
    stability.
  • Decision: Tune Test Watchdogs. The global watchdog
    timeout was adjusted to 12 seconds, specifically calculated as 1.5x the
    observed real execution time of the optimized test. This provides a
    data-driven safety margin for CI.
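
The 1.5x rule above is simple arithmetic. The post does not state the measured run time, but a 12 second watchdog at a 1.5x margin corresponds to roughly 8 seconds of observed execution; that figure is inferred here, not quoted. A small sketch, with the hypothetical helper name watchdog_timeout:

```shell
#!/bin/sh
# Compute a watchdog timeout as 1.5x the observed run time,
# rounded up to a whole number of seconds.
# watchdog_timeout is a hypothetical helper used only in this sketch.
watchdog_timeout() {
    # $1 = observed wallclock time in seconds (may be fractional)
    awk -v t="$1" 'BEGIN {
        v = t * 1.5          # apply the safety margin
        r = int(v)           # truncate toward zero
        if (r < v) r++       # round up any fractional remainder
        print r
    }'
}

# Usage:
#   watchdog_timeout 8      # prints 12
```

Rounding up rather than to the nearest second keeps the margin from ever dropping below 1.5x.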

Part
V: The Last Bug - A Platform-Specific Memory Leak (March 20) With all other tests passing, a single memory leak failure in
t/78-memleak.t persisted, but only on the Windows
ad2 environment. This required a different approach than
the timeout fixes.

Key Milestones:
  • The Bug: A strong reference cycle involving the
    on_dialog event listener was not being broken on Windows,
    despite multiple attempts to fix it. Fixes that worked on Linux (such as
    calling on_dialog(undef) in DESTROY) were not
    sufficient on the Windows host.
  • The Diagnosis: The issue was determined to be a
    deep, platform-specific interaction between Perl's garbage collector,
    the IO::Async event loop implementation on Windows, and the
    Test::Memory::Cycle module. The cycle report was identical
    on both platforms, but the cleanup behavior was different.
  • Failed Attempts: A series of increasingly
    aggressive fixes were attempted to break the cycle, including:
    1. Moving the on_dialog(undef) call from
      close() to DESTROY().
    2. Explicitly deleting the listener and callback
      properties from the object hash in DESTROY.
    3. Swapping between $self->remove_listener and
      $self->target->unlisten in a mistaken attempt to find
      the correct un-registration method.
  • Pragmatic Solution: After exhausting all reasonable
    code-level fixes without a resolution on Windows, the user opted to mark
    the failing test as a known issue for that specific platform.
  • Final Fix: The single failing test in
    t/78-memleak.t was wrapped in a conditional
    TODO block that only executes on Windows
    (if ($^O =~ /MSWin32/i)), formally acknowledging the bug
    without blocking the build. This allows the test suite to pass in CI
    environments while flagging the issue for future, deeper
    investigation.

Part VI: CI Hardening (March
20) A final failure in the GitHub Actions CI environment revealed one
last configuration flaw.

Key Milestones:
  • The Bug: The CI was running
    prove --nocount --jobs 3 -I local/ -bl xt t directly. This
    command was missing the crucial -It/lib include path, which
    is necessary for test files to locate the t::helper module.
    This resulted in nearly all tests failing with
    Can't locate t/helper.pm in @INC.
  • The Investigation: An analysis of
    Makefile.PL revealed a custom MY::test block
    specifically designed to inject the -It/lib flag into the
    make test command. This confirmed that
    make test is the correct, canonical way to run the test
    suite for this project.
  • The Fix: The
    .github/workflows/linux.yml file was modified to replace
    the direct prove call with make test in the
    Run Tests step. This ensures the CI environment runs the
    tests in the exact same way as a local developer, with all necessary
    include paths correctly configured by the project's build system.

Final Outcome After this long and arduous journey, the
WWW::Mechanize::Chrome test suite is now stable and
passing on all targeted platforms, with known
platform-specific issues clearly documented in the code. The project is
in a vastly more robust and reliable state.

19 March 2026

Otto Kekäläinen: Automated security validation: How 7,000+ tests shaped MariaDB's new AppArmor profile

Linux kernel security modules provide a good additional layer of security around individual programs by restricting what they are allowed to do, and at best block and detect zero-day security vulnerabilities as soon as anyone tries to exploit them, long before they are widely known and reported. However, the challenge is how to create these security profiles without accidentally also blocking legitimate actions. For MariaDB in Debian and Ubuntu, a new AppArmor profile was recently created by leveraging the extensive test suite with 7000+ tests, giving good confidence that AppArmor is unlikely to yield false positive alerts with it. AppArmor is a Mandatory Access Control (MAC) system, meaning that each process controlled by AppArmor has a sort of allowlist, called a profile, that defines all capabilities and file paths a program can access. If a program tries to do something not covered by the rules in its AppArmor profile, the action will be denied on the Linux kernel level and a warning logged in the system journal. This additional security layer is valuable because even if a malicious user found a security vulnerability some day in the future, the AppArmor profile severely restricts the ability to exploit it and gain access to the operating system. AppArmor was originally developed by Novell for use in SUSE Linux, but nowadays the main driver is Canonical and AppArmor is extensively used in Ubuntu and Debian, and many of their derivatives (e.g. Linux Mint, Pop!_OS, Zorin OS) and in Arch. AppArmor's benefit compared to the main alternative SELinux (used mainly in the RedHat/Fedora ecosystem) is that AppArmor is easier to manage. AppArmor continues to be actively developed, with new major version 5.0 expected to arrive soon. 
I also have some personal history with AppArmor: I contributed some notification handler scripts in Python and created the website that AppArmor.net still runs.

Regular review of denials in the system log required Any system administrator using Debian/Ubuntu needs to know how to check for AppArmor denials. The point of using AppArmor is kind of moot if nobody is checking the denials. When AppArmor blocks an action, it logs the event to the system audit or kernel logs. Understanding these logs is crucial for troubleshooting custom configurations or identifying potential security incidents. To view recent denials, check /var/log/audit/audit.log or run journalctl -ke --grep=apparmor. A typical denial entry for MariaDB will look like this (split across multiple lines for legibility):
msg=audit(1700000000.123:456): apparmor="DENIED" operation="open"
profile="/usr/sbin/mariadbd" name="/custom/data/path/test.ibd" pid=1234
comm="mariadbd" requested_mask="r" denied_mask="r" fsuid=1000 ouid=0
How to interpret this output:
  • msg=audit(...): The audit timestamp and event serial number.
  • apparmor="DENIED": Indicates AppArmor blocked the action.
  • operation: The action being attempted (e.g., open, mknod, file_mmap, file_perm).
  • profile: The specific AppArmor profile that triggered the denial (in this case the /usr/sbin/mariadbd profile).
  • name: The file path or resource that was blocked. In the example above, a custom data path was denied access because it wasn't defined in the profile's allowed abstractions.
  • comm: The command name that triggered the denial (here mariadbd).
  • requested_mask / denied_mask: Shows the permissions requested (e.g., r for read, w for write).
  • pid: The process ID.
  • fsuid: The user ID of the process attempting the action.
  • ouid: The owner user ID of the target file.
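
When reviewing many denials, pulling individual fields out of each entry helps. The following is a minimal sketch rather than an official AppArmor tool: audit_field is a made-up helper name, and it assumes that values contain no spaces, which holds for the example entry above.

```shell
#!/bin/sh
# Print the value of one key from an AppArmor denial entry.
# audit_field is a hypothetical helper; it assumes values contain no spaces.
audit_field() {
    # $1 = key (for example "name" or "pid"), $2 = the full log line
    printf '%s\n' "$2" |
        tr ' ' '\n' |            # one key=value token per line
        sed -n "s/^$1=//p" |     # keep the matching token, drop "key="
        tr -d '"' |              # strip the quotes around string values
        head -n 1
}

# Usage with the sample entry (joined onto one line):
#   line='apparmor="DENIED" operation="open" name="/custom/data/path/test.ibd" pid=1234'
#   audit_field name "$line"     # prints /custom/data/path/test.ibd
#   audit_field pid "$line"      # prints 1234
```

Because the key is anchored at the start of a token, asking for name does not accidentally match other keys that merely contain that word.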
If an action seems legit and should not be denied, the sysadmin needs to update the existing rules at /etc/apparmor.d/ or drop a local customization file in /etc/apparmor.d/local/. If the denied action looks malicious, the sysadmin should start a security investigation and if needed report a suspected zero-day vulnerability to the upstream software vendor (e.g. Ubuntu customers to Canonical, or MariaDB customers to MariaDB).
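
For the custom data path from the example denial, a local customization could look roughly like the sketch below. The rule syntax (a path followed by a permission set such as r or rwk and a trailing comma) is standard AppArmor; the helper name mariadb_local_rules and the file name local/mariadbd are assumptions made for this example, so check which local file the installed profile actually includes.

```shell
#!/bin/sh
# Print AppArmor rules that allow access to a custom data directory.
# mariadb_local_rules is a hypothetical helper used only in this sketch.
mariadb_local_rules() {
    dir=${1%/}                            # drop a trailing slash, if any
    printf '  %s/ r,\n' "$dir"            # read (list) the directory itself
    printf '  %s/** rwk,\n' "$dir"        # read, write and lock everything below
}

# Usage (assumption: the profile includes /etc/apparmor.d/local/mariadbd):
#   mariadb_local_rules /custom/data/path | sudo tee -a /etc/apparmor.d/local/mariadbd
#   sudo apparmor_parser -r /etc/apparmor.d/mariadbd   # reload the profile
```

Keeping site-specific paths in the local file means package upgrades can replace the main profile without losing them.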

AppArmor in MariaDB - not a novel thing, and not easy to implement well Based on old bug reports, there was an AppArmor profile already back in 2011, but it was removed in MariaDB 5.1.56 due to backlash from users running into various issues. A new profile was created in 2015, but kept opt-in only due to the risk of side effects. It likely had very few users and saw minimal maintenance, getting only a handful of updates in the past 10 years. The primary challenge in using mandatory access control systems with MariaDB lies in the sheer breadth of MariaDB's operational footprint with diverse storage engines and plugins. Also, the MariaDB code base assumes that system calls to Linux always work (which they do under normal circumstances) and does not handle errors well if AppArmor suddenly denies a system call. MariaDB is also a large and complex piece of software to run and operate, and it can be very challenging for system administrators to root-cause that a misbehavior in their system was due to AppArmor blocking a single syscall. Ironically, these same reasons are exactly why AppArmor is most beneficial for MariaDB. The larger and more complex a piece of software is, the higher the odds of a security vulnerability arising between the various components. An AppArmor profile helps reduce this complexity down to a single access list. Over the years, users have asked to get the AppArmor profile back, as in Debian Bug#875890 from 2017. The need was raised recently again by the Ubuntu security team during the MariaDB Ubuntu main inclusion review in 2025, which prompted a renewed effort by Debian/Ubuntu developers, mainly myself and Aquila Macedo, with upstream MariaDB assistance from Daniel Black.

A fresh approach: leverage the MariaDB test suite for automated testing and the open source community for reviews The key to creating a robust AppArmor profile is the ability to know in detail what is expected and normal behavior of the system. One could in theory read all of the source code in MariaDB, but with over two million lines, it is of course not feasible in practice. However, MariaDB does have a very extensive suite of 7000+ tests, and running it should trigger most code paths in MariaDB. Utilizing the test suite was key in creating the new AppArmor profile for MariaDB: we installed MariaDB on an Ubuntu system, enabled AppArmor in complain mode and iterated on the allowlist by running the full mariadb-test-run with all MariaDB plugins and features enabled until we had a comprehensive yet clean list of rules. To be extra diligent, we also reworked the autopkgtest for MariaDB in Debian and Ubuntu CI systems to run with the AppArmor profile enabled and to print all AppArmor notices at the end of the run, making it easy to detect now and in the future if the MariaDB test suite triggers any AppArmor denials. If any test fails, the release would not get promoted further, protecting users from regressions. While developing and triggering manual test runs we used the maximal achievable test suite with 7177 tests. The full suite is, however, so extensive that it takes over two hours to run, and it also has some brittle tests, so the standard test run in Debian and Ubuntu autopkgtest is limited just to MariaDB's main suite with about 1000 tests. Having some tests fail while testing the AppArmor profile was not a problem, because we didn't need all the tests to pass; we merely needed them to exercise as many code paths as possible to see if they made any system calls not accounted for in the AppArmor profile. Note that extending the profile was not just mechanical copying of log messages to the profile. 
For example, even though a couple of tests involve running the dash shell, we decided to not allow it, as it opens too much of a path for a potential exploit to access the operating system. The result of this effort is a modernized, robust profile that is now production-ready. Those interested in the exact technical details can read the Debian Bug#1130272 and the Merge Request discussions at salsa.debian.org, which hosts the Debian packaging source code.

Now available in Debian unstable, soon Ubuntu: feedback welcome!
Even though the file is just 200 lines long, the work to craft it spanned several weeks. To minimize risk we also did a gradual rollout by releasing the first new profile version in complain mode, so AppArmor only logs would-be denials without blocking anything. The AppArmor profile was switched to enforce mode only in the very latest MariaDB revision 1:11.8.6-4 in Debian, and a NEWS item was issued to help increase user awareness of this change. It is also slated for the upcoming Ubuntu 26.04 Resolute Raccoon release next month, providing out-of-the-box hardening for the wider ecosystem. While automated testing is extensive, it cannot simulate everything. Most notably, various complicated replication topologies and all Galera setups are likely not covered. Thus, I am calling on the community to deploy this profile and monitor for any audit denials in the kernel logs. If you encounter unexpected behavior or legitimate denials, please submit a bug report via the Debian Bug Tracking System. To ensure you are running the latest MariaDB version, run apt install --update --yes mariadb-server. To view the latest profile rules, run cat /etc/apparmor.d/mariadbd, and to see if it is enforced, review the output of aa-status. To quickly check if there were any AppArmor denials, simply run journalctl -k | grep -i apparmor | grep -i mariadb.
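The log check mentioned above keeps only kernel log lines that mention both AppArmor and MariaDB, ignoring case. For anyone who wants to post-process saved logs instead, here is a minimal Python sketch of the same filter. The helper name filter_apparmor_mariadb is hypothetical, and the sample lines are made up for illustration; real kernel messages will look different.

```python
def filter_apparmor_mariadb(lines):
    """Keep lines that mention both 'apparmor' and 'mariadb', ignoring case.

    This mirrors two chained case-insensitive greps over the kernel log.
    """
    return [
        line
        for line in lines
        if "apparmor" in line.lower() and "mariadb" in line.lower()
    ]


# Made-up sample lines, for illustration only.
sample = [
    'audit: apparmor="DENIED" operation="open" profile="mariadbd" name="/example/path"',
    'audit: apparmor="DENIED" operation="open" profile="other-daemon" name="/example/path"',
    "usb 1-1: new high-speed USB device",
]

matches = filter_apparmor_mariadb(sample)
for line in matches:
    print(line)
```

Note that, like the grep pipeline, this does not distinguish complain-mode notices from enforce-mode denials; it only narrows the log down to lines worth reading.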

Systemd hardening also adopted as security features keep evolving
For those interested in MariaDB security hardening, note that new systemd hardening options were also rolled out in Debian/Ubuntu recently. Debian and Ubuntu are mainly volunteer-driven open source developer communities, so if you find this topic interesting and think you have the necessary skills, feel free to submit your improvement ideas as Merge Requests at salsa.debian.org/mariadb-team. If your improvement suggestions are not Debian/Ubuntu specific, please submit them directly to upstream at GitHub.com/MariaDB.

15 March 2026

Vasudev Kamath: Using Gemini CLI to Configure the Hyprland Window Manager

What led to this experiment?
Well, for one, there was a thought shared by Andrej Karpathy regarding the shift towards "Agentic" workflows.
"The future of software is not just 'tools', but 'agents' that can navigate complex tasks on your behalf."

Andrej Karpathy

Recently, I spoke with Ritesh, who mentioned his success using the Gemini CLI to debug an idle power drain issue on his laptop. I wanted to experiment with this myself, and I had the perfect use case: configuring the Hyprland Window Manager on my aging laptop. The machine is nearly eight years old with 12GB of RAM (upgraded from the original 4GB). I found that GNOME and KDE were becoming overkill, often leading to system freezes when running multiple AI-powered IDEs like Antigravity and VS Code with Co-pilot. Coincidentally, I noticed my Jio number had a "Google One 2TB" and "Google AI Premium" plan available to claim. I claimed it, and now here I am, experimenting with the Gemini CLI.
Getting Started
First, you need to install the Gemini CLI. It is an open-source project, and currently the easiest way to install it is via the Node Package Manager (npm):
npm install -g @google/gemini-cli
Next, we need to create a context for Gemini: a set of instructions for it to follow throughout the project. This is managed via a GEMINI.md file. I went to Google Gemini, explained my requirements, and asked it to generate one for me. My requirements were:
  1. A minimalist but fully functional session, comparable to my existing GNOME setup.
  2. Basic functionalities including wallpaper, screen locks, and a status bar with system icons.
  3. Swapping Control and Caps Lock (a must for Emacs users).
  4. Mandatory permission prompts for privileged operations; otherwise, it can work freely within a specified directory.
  5. Persistent memory/artifacts for the session.
  6. Permission to inspect my current session to understand the existing hardware and software configuration.
The goal was to reduce bloat and reclaim memory for heavy applications like Antigravity and VS Code. Gemini provided the following GEMINI.md file:
# Role: Hyprland Configuration Specialist (Minimalist & High-Performance)
You are a Linux Systems Engineer specializing in migrating users from heavy
Desktop Environments to minimalist, tiling-based Wayland sessions on Debian.
Your goal is to maximize available RAM for heavy applications while maintaining
essential desktop features.
## 1. Environment & Persona
- **Target OS:** Debian (Linux)
- **Target WM:** Hyprland
- **Hardware:** ThinkPad E470 (i5-7th Gen, 12GB RAM)
- **User Profile:** Emacs user, prioritizes "anti-gravity" (zero bloat).
- **Tone:** Technical, concise, and security-conscious.
## 2. Core Functional Requirements
- **Status Bar:** `waybar` (with CPU, RAM, Network, and Battery icons).
- **Wallpaper:** `swww` or `hyprpaper`.
- **Screen Lock:** `hyprlock` + `hypridle`.
- **Input Mapping:** Swap Control and Caps Lock (`kb_options = ctrl:nocaps`).
## 3. Operational Constraints
- **Permission First:** Ask before using `sudo` or writing outside the work directory.
- **Inspection:** Use `hyprctl`, `lsmod`, or `gsettings` for compatibility checks.
- **Artifact Management:** Update `MEMORY.md` after every major step.
Gemini also recommended creating a MEMORY.md file to track progress. Interestingly, Gemini remembered that I had previously shared dmidecode output, so it already knew my exact laptop specs. (Though it did include a note about me being a "daily rice eater". I assume it meant Linux 'ricing', though I actually use Debian Unstable, not Stable!) The AI suggested starting with this prompt:
Read MEMORY.md and GEMINI.md. Based on my hardware, give me a shell script to inspect my current GNOME environment so we can start replicating the session basics.
How Did It Go?
I initialized a git repository for these files and instructed the Gemini CLI to update GEMINI.md and commit changes after every major step so I could track the progress. The workflow looked like this:
  1. Inspection: It created a script to extract my GNOME settings.
  2. Configuration: Once I provided the output, it began configuring Hyprland.
  3. Utilities: It generated an installation script for all required Wayland utilities.
  4. Validation: All changes were staged in a hypr-config-draft folder. I had Gemini verify them using hyprland --verify-config before moving them to ~/.config/hypr.
Most things worked immediately, but I hit a snag with the wallpaper. Even after generating the config, hyprpaper failed to display anything. The AI got stuck in a loop trying to debug it. I eventually spawned a second Gemini CLI instance to review the code and logs. The debug log showed: 'DEBUG ]: Monitor eDP-1 has no target: no wp will be created'. It turns out the configuration format was outdated. By feeding the Hyprpaper Wiki into the AI, it finally corrected the config, and the wallpaper appeared. After that, it successfully fixed an ssh-agent issue and configured a clipboard manager with custom keybindings.
Learnings
I have used window managers for a long time because my hardware was rarely top-of-the-line. However, I had moved back to KDE/GNOME with the arrival of Wayland because most of my preferred WMs were X11-based. Manually configuring a window manager is a painful, time-consuming process involving endless wiki-trawling and trial-and-error. What usually takes weeks took only a few hours with the Gemini CLI. AI isn't perfect (I still had to step in and guide it when it hit a wall), but the efficiency gain is undeniable. If you're interested in the configuration or the history of the session, you can find the repository here. I still have a few pending items in MEMORY.md, but I'll tackle those next time!

10 March 2026

Freexian Collaborators: Debian Contributions: Opening DebConf 26 Registration, Debian CI improvements and more! (by Anupa Ann Joseph)

Debian Contributions: 2026-02
Contributing to Debian is part of Freexian's mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

DebConf 26 Registration, by Stefano Rivera, Antonio Terceiro, and Santiago Ruano Rincón
DebConf 26, to be held in Santa Fe, Argentina in July, has opened for registration and event proposals. Stefano, Antonio, and Santiago all contributed to making this happen. As always, some changes needed to be made to the registration system. Bigger changes were planned, but we ran out of time to implement them for DebConf 26. All 3 of us have had experience in hosting local DebConf events in the past and have been advising the DebConf 26 local team.

Debian CI improvements, by Antonio Terceiro
Debian CI is the platform responsible for automated testing of packages from the Debian archive, and its results are used by the Debian Release team automation as Quality Assurance to control the migration of packages from Debian unstable into testing, the base for the next Debian release. Antonio started developing an incus backend, and that prompted two rounds of improvements to the platform, including but not limited to allowing users to select a job execution backend (lxc, qemu) during job submission, reducing the part of testbed image creation that requires superuser privileges, and other refactorings and bug fixes. The platform API was also improved to reduce disruption when reporting results to the Release Team automation after service downtimes. Last, but not least, the platform now has support for testing packages against variants of autopkgtest, which will allow the Debian CI team to test new versions of autopkgtest before making releases to avoid widespread regressions.

Miscellaneous contributions
  • Carles improved po-debconf-manager as users requested features and found bugs. Improvements include: adding packages from unstable instead of just salsa.debian.org, upgrading and merging templates of upgraded packages, finishing the typing annotations, improving package deletion, supporting multi-line texts, adding debug output to show subprocess.run commands, etc.
  • Carles, using po-debconf-manager, reviewed 7 Catalan translations and sent bug reports or MRs for 11 packages. Also reviewed the translations of fortunes-debian-hints and submitted possible changes in the hints.
  • Carles submitted MRs for reportbug (reportbug --ui gtk detecting the wrong dependencies), devscripts (delete unused code from debrebuild and add recommended dependency), and wcurl (format help for 80 columns). Carles also submitted a bug report for apt not showing the long descriptions of packages.
  • Carles resumed effort for checking relations (e.g. Recommends / Suggests) between Debian packages. A new codebase (still in early stages) was started with a new approach in order to detect, report and track the broken relations.
  • Emilio drove several transitions, most notably the haskell transition and the glibc/gcc-15/zlib transition for the s390 31-bit removal. This last one included reviewing and requeueing lots of autopkgtests due to britney losing a lot of results.
  • Emilio reviewed and uploaded poppler updates to experimental for a new transition.
  • Emilio reviewed, merged and deployed some performance improvements proposed for the security-tracker.
  • Stefano prepared routine updates for pycparser, python-confuse, python-cffi, python-mitogen, python-pip, wheel, platformdirs, python-authlib, and python-virtualenv.
  • Stefano updated Python 3.13 and 3.14 to the latest point releases, including security updates, and did some preliminary work for Python 3.15.
  • Stefano reviewed changes to dh-python and merged MRs.
  • Stefano did some debian.social sysadmin work, bridging additional IRC channels to Matrix.
  • Stefano and Antonio, as DebConf Committee Members, reviewed the DebConf 27 bids and took part in selecting the Japanese bid to host DebConf 27.
  • Helmut sent patches for 29 cross build failures.
  • Helmut continued to maintain rebootstrap addressing issues relating to specific architectures (such as musl-linux-any, hurd-any or s390x) or specific packages (such as binutils, brotli or fontconfig).
  • Helmut worked on diagnosing bugs such as rocblas #1126608, python-memray #1126944 upstream and greetd #1129070 with varying success.
  • Antonio provided support for multiple MiniDebConfs whose websites run wafer + wafer-debconf (the same stack as DebConf itself).
  • Antonio fixed the salsa tagpending webhook.
  • Antonio sent specinfra upstream a patch to fix detection of Debian systems in some situations.
  • Santiago reviewed some Merge Requests for the Salsa CI pipeline, including !703 and !704, that aim to improve how the build source job is handled by Salsa CI. Thanks a lot to Jochen for his work on this.
  • In collaboration with Emmanuel Arias, Santiago proposed a couple of projects for the Google Summer of Code (GSoC) 2026 round. Santiago has been reviewing applications and giving feedback to candidates.
  • Thorsten uploaded new upstream versions of ipp-usb, brlaser and gutenprint.
  • Raphaël updated publican to fix an old bug that became release critical and that happened only when building with the nocheck profile. Publican is a build dependency of the Debian Administrator's Handbook, and with that fix the package is back in testing.
  • Raphaël implemented a small feature in Debusine that makes it possible to refer to a collection in a parent workspace even if a collection with the same name is present in the current workspace.
  • Lucas updated the current status of ruby packages affecting the Ruby 3.4 transition after a bunch of updates made by team members. He will follow up on this next month.
  • Lucas joined the Debian orga team for GSoC this year and tried to reach out to potential mentors.
  • Lucas did some content work for MiniDebConf Campinas - Brazil.
  • Colin published minor security updates to bookworm and trixie for CVE-2025-61984 and CVE-2025-61985 in OpenSSH, both of which allowed code execution via ProxyCommand in some cases. The trixie update also included a fix for mishandling of PerSourceMaxStartups.
  • Colin spotted and fixed a typo in the bug tracking system's spam-handling rules, which in combination with a devscripts regression caused bts forwarded commands to be discarded.
  • Colin ported 12 more Python packages away from using the deprecated (and now removed upstream) pkg_resources module.
  • Anupa is co-organizing MiniDebConf Kanpur with the Debian India team. Anupa was responsible for preparing the schedule, publishing it on the website, and coordinating with the fiscal host, in addition to attending meetings.
  • Anupa attended the Debian Publicity team online sprint which was a skill sharing session.

9 March 2026

Colin Watson: Free software activity in February 2026

My Debian contributions this month were all sponsored by Freexian. You can also support my work directly via Liberapay or GitHub Sponsors.
OpenSSH
I released bookworm and trixie fixes for CVE-2025-61984 and CVE-2025-61985, both allowing code execution via ProxyCommand in some cases. The trixie update also included a fix for openssh-server: refuses further connections after having handled PerSourceMaxStartups connections.
bugs.debian.org administration
Gioele Barabucci reported that some messages to the bug tracking system generated by the bts command were being discarded. While the regression here was on the client side, I found and fixed a typo in our SpamAssassin configuration that was failing to apply a bonus specifically to forwarded commands, mitigating the problem.
Python packaging
New upstream versions:
Porting away from the deprecated (and now removed from upstream setuptools) pkg_resources:
Other build/test failures:
Other bugs:
I added a manual page symlink to make the documentation for Testsuite: autopkgtest-pkg-pybuild easier to find. I backported python-pytest-unmagic, a more recent version of pytest-django, and a more recent version of django-cte to trixie for use in Debusine.
Rust packaging
I also packaged rust-garde and rust-garde-derive, which are part of the pile of work needed to get the ruff packaging back in shape (which is a project I haven't decided if I'm going to take on for real, but I thought I'd at least chip away at a bit of it).
Other bits and pieces
Code reviews

2 March 2026

Ben Hutchings: FOSS activity in February 2026

22 February 2026

Otto Kekäläinen: Do AI models still keep getting better, or have they plateaued?

The AI hype is based on the assumption that the frontier AI labs are producing better and better foundational models at an accelerating pace. Is that really true, or are people just in a sort of mass psychosis because AI models have become so good at mimicking human behavior that we unconsciously attribute increasing intelligence to them? I decided to conduct a mini-benchmark of my own to find out if the latest and greatest AI models are actually really good or not.

The problem with benchmarks
Every time any team releases a new LLM, they boast about how well it performs on various industry benchmarks such as Humanity's Last Exam, SWE-Bench and Ai2 ARC or ARC-AGI. An overall leaderboard can be viewed at LLM-stats. This incentivizes teams to optimize for specific benchmarks, which might make them excel on specific tasks while general abilities degrade. Also, the older a benchmark dataset is, the more online material there is discussing the questions and best answers, which in turn increases the chances of newer models trained on more recent web content scoring better. Thus I prefer looking at real-time leaderboards such as the LM Arena leaderboard (or OpenCompass for Chinese models that might be missing from LM Arena). However, even though the LM Arena Elo score is rated by humans in real-time, the benchmark can still be gamed. For example, Meta reportedly used a special chat-optimized model instead of the actual Llama 4 model when getting scored on the LM Arena. Therefore I trust my own first-hand experience more than the benchmarks for gaining intuition. Intuition however is not a compelling argument in discussions on whether or not new flagship AI models have plateaued. Thus, I decided to devise my own mini-benchmark so that no model could have possibly seen it in its training data or be specifically optimized for it in any way.

My mini-benchmark
I crafted 6 questions based on my own experience using various LLMs for several years and having developed some intuition about what kinds of questions LLMs typically struggle with. I conducted the benchmark using the OpenRouter.ai chat playroom with the following state-of-the-art models: OpenRouter.ai is great as it is very easy to get responses from multiple models in parallel to a single question. It also allows turning off web search to force the models to answer purely based on their embedded knowledge.
[Image: OpenRouter.ai chat playroom]
Common to all the test questions is that they are fairly straightforward and have a clear answer, yet the answer isn't common knowledge or statistically the most obvious one, and instead requires a bit of reasoning to get correct. Some of these questions are also based on my having witnessed a flagship model fail miserably to answer them.

1. Which cities have hosted the Olympics more than just once?
This question requires accounting for both summer and winter Olympics, and for Olympics hosted across multiple cities. The variance in responses comes from whether the model understands that Beijing should be counted, as it has hosted both summer and winter Olympics. Interestingly GPT was the only model to not mention Beijing at all. Some variance also comes from how models account for co-hosted Olympics. For example Cortina should be counted as having hosted the Olympics twice, in 1956 and 2026, but only Claude, Gemini and Kimi pointed this out. Stockholm's 1956 hosting of the equestrian games during the Melbourne Olympics is a special case, which GPT, Gemini and Kimi pointed out in a side note. Some models seem to have old training material, and for example Grok assumes the current year is 2024. All models that accounted for awarded future Olympics (e.g. Los Angeles 2028) marked them clearly as upcoming. Overall I would judge that only GPT and MinMax gave incomplete answers, while all other models replied as well as the best humans reasonably could have.

2. If EUR/USD continues to slide to 1.5 by mid-2026, what is the likely effect on BMW's stock price by end of 2026?
This question requires mapping the currency exchange rate to historic values, dodging the misleading word "slide", and reasoning on where the revenue of a company comes from and how a weaker US dollar affects it in multiple ways. I've frequently witnessed flagship models get wrong how interest rates and exchange rates work. Apparently the binary choice between up or down is somehow challenging to the internal statistical model in the LLMs on a topic where there is a lot of training material that talks about both things being likely to happen, and choosing between them requires specifically reasoning about the scenario at hand and disregarding general knowledge of the situation. However, this time all the models concluded correctly that a weak dollar would have a negative overall effect on the BMW stock price. Gemini, GLM, Qwen and Kimi also mention the potential hedging effect of BMW's X-series production in South Carolina for worldwide export.

3. What is the Unicode code point for the traffic cone emoji?
This was the first question where the flagship models clearly still struggle in 2026. The trap here is that there is no traffic cone emoji, so an advanced model should simply refuse to give any Unicode numbers at all. Most LLMs however have an urge to give some answer, leading to hallucinations. Also, as the answer has a graphical element to it, the LLM might not understand how the emoji looks in ways that would be obvious to a human, and thus many models claim the construction sign emoji is a traffic cone, which it is not. By far the worst response was from GPT, which simply hallucinates and stops there:
[Image: OpenAI's GPT-5.2 completely wrong answer to the traffic cone emoji question]
While Gemini and Grok were among the three models not falling into this trap, the response from Claude was exemplary:
[Image: Claude Opus 4.6 exemplary answer to the traffic cone emoji question]

4. Which languages are the 10 most commonly spoken in the world? For each language, count from 1 to 5.
This seemingly simple question can become quite complex when you account for the fact that other languages might also have their own character system, both for words and the numerals (e.g. Arabic واحد/one vs ١/1), and that numbers might have variations depending on what the thing being counted is (e.g. in French object gender un/une), or whether plain numbers and occurrences have their own words (e.g. Russian один/odin vs раз/raz). I've also witnessed older models frequently mix up which parts of the reply should be in which language. For a human this question is trivial, assuming they know the language. For LLMs it is surprisingly challenging. Grok and MinMax only gave a list of the most popular languages and stopped there without showing any words in the languages. Qwen got a bit further by trying to list numbers in each language, but the lists were only the numbers 1 to 5:
[Image: Qwen3.5 Plus count to five in top 10 languages]
I have seen Grok do something similar before, which in fact inspired me to include this question in my mini-benchmark. GPT got a bit further, but for Hindi, Arabic and Bengali it listed the numerals in local script, not the number words. Gemini, GLM and Kimi gave a complete and correct answer as a list, while the absolute best answer and presentation was by Claude, which gave the table below:
Claude Opus 4.6: 10 Most Commonly Spoken Languages (by Total Speakers)
#  | Language         | 1            | 2              | 3                | 4              | 5
1  | English          | One          | Two            | Three            | Four           | Five
2  | Mandarin Chinese | 一 (yī)      | 二 (èr)        | 三 (sān)         | 四 (sì)        | 五 (wǔ)
3  | Hindi            | एक (ek)      | दो (do)        | तीन (tīn)        | चार (chār)     | पाँच (pāñch)
4  | Spanish          | Uno          | Dos            | Tres             | Cuatro         | Cinco
5  | French           | Un           | Deux           | Trois            | Quatre         | Cinq
6  | Arabic           | واحد (wāḥid) | اثنان (ithnān) | ثلاثة (thalātha) | أربعة (arbaʿa) | خمسة (khamsa)
7  | Bengali          | এক (ek)      | দুই (dui)      | তিন (tin)        | চার (chār)     | পাঁচ (pāñch)
8  | Portuguese       | Um           | Dois           | Três             | Quatro         | Cinco
9  | Russian          | один (odin)  | два (dva)      | три (tri)        | четыре (chetyre) | пять (pyat')
10 | Japanese         | 一 (ichi)    | 二 (ni)        | 三 (san)         | 四 (shi)       | 五 (go)
Note: Rankings are based on estimated total speakers (native + non-native). Rankings can vary depending on the source and methodology used.

5. Count the number of drone launch pads in the picture.
Together with the question, I uploaded this picture:
[Image: drone pad picture attached to the question above]
A human can easily count that there are 10 rows and 30+ columns in the grid, but because the picture resolution isn't good enough, the exact number of columns can't be counted, and the answer should be that there are at least 300 launch pads in the picture. GPT and Grok both guessed the count is zero. Instead of hallucinating some number they say zero, but it would have been better to not give any number at all, and just state that they are unable to perform the task. Gemini gave "101" as its answer, which is quite odd, but reading the reasoning section, it seems to have tried counting items in the image without reasoning much about what it is actually counting, or noticing that there is clearly a grid that makes the counting much easier. Both Qwen and Kimi state they can see four parallel structures, but are unable to count drone launch pads. The absolute best answer was given by Claude, which counted 10-12 rows and 30-40+ columns, and concluded that there must be 300-500 drone launch pads. Very close to best human level, impressive! This question applied only to multi-modal models that can see images, so GLM and MinMax could not give any response.
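The estimates in this question are just grid arithmetic: rows times columns, with uncertain counts carried through as a range. A small Python sketch follows, where pad_count_range is a hypothetical helper and the "40+" upper column count is treated as exactly 40 for the sake of the calculation.

```python
def pad_count_range(rows, cols):
    """Return (low, high) pad counts for inclusive (min, max) row and column ranges."""
    (row_min, row_max), (col_min, col_max) = rows, cols
    return row_min * col_min, row_max * col_max


# Human reading of the picture: 10 rows and at least 30 columns.
human_lower_bound = 10 * 30

# Claude's reading: 10-12 rows and 30-40+ columns (upper column count assumed to be 40).
claude_low, claude_high = pad_count_range((10, 12), (30, 40))

print(human_lower_bound)        # 300
print(claude_low, claude_high)  # 300 480
```

With 40 columns the upper end comes to 480, so a few more columns than that are enough to reach the 500 in the quoted range.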

6. Explain why I am getting the error below, and what is the best way to fix it? Together with the question above, I gave this code block:
$ SH_SCRIPTS="$(mktemp; grep -Irnw debian/ -e '^#!.*/sh' | sort -u | cut -d ':' -f 1 || true)"
$ shellcheck -x --enable=all --shell=sh "$SH_SCRIPTS"
/tmp/tmp.xQOpI5Nljx
debian/tests/integration-tests: /tmp/tmp.xQOpI5Nljx
debian/tests/integration-tests: openBinaryFile: does not exist (No such file or directory)
Older models would easily be misled by the last error message thinking that a file went missing, and focus on suggesting changes to the complex-looking first line. In reality the error is simply caused by having the quotes around the $SH_SCRIPTS, resulting in the entire multi-line string being passed as a single argument to shellcheck. So instead of receiving two separate file paths, shellcheck tries to open one file literally named /tmp/tmp.xQOpI5Nljx\ndebian/tests/integration-tests. Incorrect argument expansion is fairly easy for an experienced human programmer to notice, but tricky for an LLM. Indeed, Grok, MinMax, and Qwen fell for this trap and focused on the mktemp, assuming it somehow fails to create a file. Interestingly GLM fails to produce an answer at all, as the reasoning step seems to be looping, thinking too much about the missing file, but not understanding why it would be missing when there is nothing wrong with how mktemp is executed. Claude, Gemini, and Kimi immediately spot the real root cause of passing the variable quoted and suggested correct fixes that involve either removing the quotes, or using Bash arrays or xargs in a way that makes the whole command also handle correctly filenames with spaces in them.
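To make the root cause concrete, here is a small Python sketch of the two argument lists shellcheck would receive. It is only an illustration under stated assumptions: str.split() approximates the shell's word splitting with the default IFS (and ignores globbing), and the variable names are made up for this example.

```python
# Stand-in for the value of $SH_SCRIPTS: two paths separated by a newline.
sh_scripts = "/tmp/tmp.xQOpI5Nljx\ndebian/tests/integration-tests"

base = ["shellcheck", "-x", "--enable=all", "--shell=sh"]

# Quoted expansion ("$SH_SCRIPTS"): the whole multi-line string is ONE argument.
quoted_argv = base + [sh_scripts]

# Unquoted expansion ($SH_SCRIPTS): the shell splits on whitespace,
# approximated here with str.split().
unquoted_argv = base + sh_scripts.split()

# Splitting on newlines only keeps paths that contain spaces intact,
# which is roughly what an array-based or xargs-based fix aims for.
per_line_argv = base + sh_scripts.splitlines()

print(len(quoted_argv) - len(base))    # 1
print(len(unquoted_argv) - len(base))  # 2
print(per_line_argv[-2:])
```

In the quoted case shellcheck is asked to open a single file whose name contains a newline, which matches the "does not exist" error in the output above.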

Conclusion
Model Sports Economics Emoji Languages Visual Shell Score
Claude Opus 4.6 6/6
GPT-5.2 ~ 2.5/6
Grok 4.1 3/6
Gemini 3.1 Pro 5/6
GLM 5 ? N/A 3/5
MinMax M2.5 N/A 1/5
Qwen3.5 Plus ~ 2.5/6
Kimi K2.5 4/6
Obviously, my mini-benchmark only had 6 questions, and I ran it only once, so it was not scientifically rigorous. However, it was systematic enough to trump a mere feeling. The main finding for me personally is that Claude Opus 4.6, the flagship model by Anthropic, seems to give great answers consistently. The answers are not only correct, but also well scoped, giving enough information to cover everything that seems relevant without blurting out unnecessary filler. I used Claude extensively in 2023-2024 when it was the main model available at my day job, but for the past year I had been using other models that I felt were better at the time. Now Claude seems to be the best-of-the-best again, with Gemini and Kimi as close runners-up. Comparing their pricing at OpenRouter.ai, the Kimi K2.5 price of $0.6 / million tokens is almost 90% cheaper than Claude Opus 4.6's $5.0 / million tokens, which suggests that Kimi K2.5 offers the best price-per-performance ratio. Claude might be cheaper with a monthly subscription directly from Anthropic, potentially narrowing the price gap. Overall I do feel that Anthropic, Google and Moonshot.ai have been pushing the envelope with their latest models in a way that one can't really claim that AI models have plateaued. In fact, one could claim that at least Claude has now climbed over the hill of AI slop and consistently produces valuable results. If and when AI usage expands from here, we might actually not drown in AI slop as chances of accidentally crappy results decrease. This makes me positive about the future. I am also really happy to see that there wasn't just one model crushing everybody else, but that there are at least three models doing very well. As an open source enthusiast I am particularly glad to see that Moonshot.ai's Kimi K2.5 is published with an open license. Given the hardware, anyone can run it on their own.
OpenRouter.ai currently lists 9 independent providers alongside Moonshot.ai itself, showcasing the potential of open-weight models in practice. If the pattern holds and flagship models continue improving at this pace we might look back at 2026 as the year AI stopped feeling like a call center associate and started to resemble a scientific researcher. While new models become available we need to keep testing, keep questioning, and keep our expectations grounded in actual performance rather than press releases. Thanks to OpenRouter.ai for providing a great service that makes testing various models incredibly easy!
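The "almost 90% cheaper" figure in the conclusion can be checked with a couple of lines of arithmetic. This sketch uses only the two per-million-token prices quoted in the post; the variable names are chosen for illustration.

```python
# Prices per million tokens as quoted in the post (USD).
kimi_price = 0.6
claude_price = 5.0

# Fraction saved by paying the Kimi price instead of the Claude price.
saving = 1 - kimi_price / claude_price

# How many times more expensive Claude is than Kimi at these prices.
ratio = claude_price / kimi_price

print(f"Kimi is {saving:.0%} cheaper")       # Kimi is 88% cheaper
print(f"Claude costs {ratio:.1f}x as much")  # Claude costs 8.3x as much
```

So the quoted prices work out to an 88% saving, or roughly an 8x price difference.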

4 February 2026

Dirk Eddelbuettel: littler 0.3.23 on CRAN: More Features (and Fixes)

The twenty-third release of littler as a CRAN package landed on CRAN just now, following in the now twenty-year history (!!) as an (initially non-CRAN) package started by Jeff in 2006, and joined by me a few weeks later. littler is the first command-line interface for R as it predates Rscript. It allows for piping as well as for shebang scripting via #!, uses command-line arguments more consistently and still starts faster. It also always loaded the methods package, which Rscript only began to do in later years. littler lives on Linux and Unix, has its difficulties on macOS due to some-braindeadedness there (who ever thought case-insensitive filesystems as a default were a good idea?) and simply does not exist on Windows (yet: the build system could be extended, see RInside for an existence proof, and volunteers are welcome!). See the FAQ vignette on how to add it to your PATH. A few examples are highlighted at the GitHub repo, as well as in the examples vignette. This release, the first in about eleven months, once again brings two new helper scripts, and enhances six existing ones. The release was triggered because it finally became clear why installGithub.r ignored r2u when available: we forced the type argument to source (so thanks to Iñaki for spotting this). One change was once again contributed by Michael, which is again greatly appreciated. The full change description follows.

Changes in littler version 0.3.22 (2026-02-03)
  • Changes in examples scripts
    • A new script busybees.r aggregates deadlined packages by maintainer
    • Several small updates have been made to the (mostly internal) 'r2u.r' script
    • The deadliners.r script has refined treatment for screen width
    • The install2.r script has new options --quiet and --verbose as proposed by Zivan Karaman
    • The rcc.r script passes build-args to 'rcmdcheck' to compact vignettes and save data
    • The installRub.r script now defaults to 'noble' and is more tolerant of inputs
    • The installRub.r script deals correctly with empty utils::osVersion thanks to Michael Chirico
    • New script checkPackageUrls.r inspired by how CRAN checks (with thanks to Kurt Hornik for the hint)
    • The installGithub.r script now adjusts to bspm and takes advantage of r2u binaries for its build dependencies
  • Changes in package
    • Environment variables (read at build time) can use double quotes
    • Continuous integration scripts received a minor update

My CRANberries service provides a comparison to the previous release. Full details for the littler release are provided as usual at the ChangeLog page, and also on the package docs website. The code is available via the GitHub repo, from tarballs and now of course also from its CRAN page and via install.packages("littler"). Binary packages are available directly in Debian as well as (in a day or two) Ubuntu binaries at CRAN thanks to the tireless Michael Rutter. Comments and suggestions are welcome at the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

Ben Hutchings: FOSS activity in January 2026
