Search Results: "ender"

31 March 2026

Russ Allbery: Review: Code Blue Emergency

Review: Code Blue Emergency, by James White
Series: Sector General #7
Publisher: Orb
Copyright: 1987
Printing: May 2003
ISBN: 0-7653-0663-8
Format: Trade paperback
Pages: 252
Code Blue Emergency (annoying em-dash in original title) is the seventh book of James White's Sector General science fiction series about a vast multi-species hospital station. While there are some references to (and spoilers for) earlier books in the series, you don't have to remember the previous books to read this one. I had no trouble despite a nine-year gap. I read this as part of the Orb General Practice omnibus, which collects this novel and The Genocidal Healer.

Cha Thrat is a Sommaradvan warrior-surgeon, member of a newly-discovered species that is beginning the process of contact with the Federation. She saved a Monitor Corps human after an accident on her world, performing some highly competent surgery on a species she had never seen before. That, plus her somewhat outcast status on her own world due to her very traditional attitude towards medical ethics, led Sector General to extend an offer of medical internship, and led her to leap into the unknown by accepting. This may have been a mistake; there is a great deal that Sector General does not understand about Sommaradvan medical ethics.

This series entry is another proper (if somewhat episodic) novel and the first book of the series that doesn't primarily focus on Conway. He makes an appearance in his new role as Diagnostician, but only as a supporting character. Code Blue Emergency is told in the tight third-person perspective of Cha Thrat, an alien who finds many things about Sector General baffling, confusing, and ethically troubling (and who therefore provides a good reader surrogate for reintroducing the basics of how the hospital works). Using an alien viewpoint is a more sophisticated narrative technique than White has used previously. I'm glad he tried it, and it mostly works, although I have some complaints.

Cha Thrat comes from the middle caste of a strictly hierarchical society of three castes, but is also immensely stubborn and used to a medical system in which doctors take sole responsibility for their patients. This creates a lot of cultural conflicts, and I do enjoy science fiction where the human attitudes are portrayed as the strange ones, but the cultural analysis offered by this novel is not very deep. The pattern of this book is for Cha Thrat to stumble into a successful approach to a problem while being either oblivious to or hostile to the normal hierarchical structure expected of medical trainees.

This is believable as far as it goes. She is a skilled and intelligent doctor with some good instincts and a strong commitment to patient care, but is also culturally inclined to not ask for help. It makes sense for that to be a serious problem in a hospital. Unfortunately, no one says this directly. Sector General staff get quite upset in ways that seem more territorial than oriented towards patient safety, no one directly explains to Cha Thrat why following a process is important or shows examples of what could go wrong, and plot armor means that her mistakes usually have positive outcomes. One can extrapolate the reasons why she is not a good medical student, but the reader is forced to do the extrapolation. This is the sort of book where the narration makes clear there are unresolved cultural clashes that are going to cause problems but hides the details. To Cha Thrat, her perspective is so obvious she never bothers to explain it to the reader, so the specifics come as a surprise.
As with the alien perspective, I've seen this technique used with more subtlety and sophistication in other books, but White's version mostly works. Cha Thrat is a sympathetic protagonist because she is truly trying to take the most ethical and empathetic action in every situation and is clearly competent. Most of my frustration as a reader, ironically, lands on the other Sector General doctors, who seem to make little to no effort to understand her perspective when she fails to conform to their expectations. This is believable in the abstract, but the whole point of Sector General is that they're supposed to be wiser about interspecies difference than this. Also, sometimes their reactions just seem petty.

Cha Thrat has a very hierarchical concept of medicine that matches the social classes of her culture. For her, the highest tier of doctors are wizards who treat rulers, because the work of rulers is mostly mental and intellectual and therefore the diseases of rulers are treated with magic spells performed with words to reshape their thinking rather than surgery on their bodies. O'Mara and the other Sector General psychologists take great offense at this, muttering about being called witch doctors, which I found completely absurd. This is a comprehensible, if odd, description of psychology from a wholly alien species. Surely one's first reaction should be that words like "wizard" or "magic" are translation errors. Don't get offended; look to see if the underlying substance matches, which it clearly does.

Apart from cultural and psychological clashes, Code Blue Emergency has the standard episodic Sector General structure of interesting medical mysteries that require lateral thinking. I find this sort of puzzle story satisfying, particularly given the firm belief of every character in an essentially pacifist and empathetic approach to even the most alien of creatures. This determined non-violence is one of the more interesting things about this series, and it continues here. White does tend towards both biological and gender essentialism for everyone other than the protagonist and main supporting characters, but he seemed to be walking back some of the more outrageous limitations on women that appeared in previous books. There is still some nonsense in here about how females of any species can't be Diagnosticians, but then Cha Thrat, who is female, seems to violate the justification for that rule over the course of this novel (sadly without comment). Perhaps he's setting up for proving Sector General wrong about this prejudice.

I picked this up after reading Elizabeth Bear's Machine, which is essentially a (better written) Sector General novel that got me in the mood for reading more. I wouldn't give Code Blue Emergency any awards, but it delivered exactly what I was looking for. This series is not as deep or well-written as some more recent SF, but it is reliably itself and reliably entertaining. There are worse things in a series. Recommended if you're in the mood for alien ER in space.

The omnibus edition that I read has an introduction to both novels by John Clute. It does add some interesting insights, but (as is somewhat typical for Clute) it also spoils parts of both books. You may want to read it after you read the novels.

Followed by The Genocidal Healer.

Rating: 7 out of 10

28 March 2026

Russell Coker: Communication and Hostile AIs

We seem to be entering an AI apocalypse of sorts, but they aren't going to kill us or even take our jobs. What they are doing is destroying the Internet commons by filling it with rubbish. This isn't even real AI, just pattern matching and prediction systems, mostly LLMs.

The Problem

Scott Shambaugh's saga of being attacked and defamed by an OpenClaw AI bot is interesting and raises some disturbing possibilities for future online discussion [1]. Imagine what it would be like if everyone who was in any way notable for free software work had 100 such bots going after them. Dania Dumas wrote an insightful blog post about why OpenClaw is impossible to secure and why it won't go away [2]. Bruce Schneier and Nathan E. Sanders wrote an insightful article about the AI-generated text arms race [3], primarily concentrating on situations in which text that was assumed to be written by humans but was actually written in bulk by bots was performing a DOS attack on the people reviewing it. There are many situations, such as book publishing and publishing letters to the editor of newspapers, where getting new material from unknown people is an important part of the job but where there are also people making low quality submissions that are almost a DOS attack at the best of times.

Currently the email spam problem continues to get worse, and when LLM use increases it will get significantly worse. Email encryption isn't viable [4]. The PGP web of trust never really worked well as it's too difficult for most users. The amount of AI-generated content that's being recommended to users on platforms like YouTube and Facebook is steadily increasing, and the amount of LLM-generated commentary that purports to be from real people on Twitter and Facebook is also increasing. Here's an informative blog post by Erich Schubert about this [5].

Potential Solutions

Surrender?

One option, and possibly the default option, is to surrender to this and just let everything we built on the Internet over decades get destroyed. Whether to surrender is a decision that can be made on a per-service basis.

Twitter is pretty much useless anyway; I quit Twitter because Elon deliberately made it suck [6]. In my opinion this is not surrendering to what's being done there, I'm just stopping wasting time on it and using better options. I used to have about 300 followers on Twitter and I don't think that many of them would ever choose to stop following me, so I presume that about 1/3 of the people following me have decided to totally quit Twitter and delete their accounts. I also presume that some of the remainder have done the same as me and just kept a mostly inactive account. If Elon suddenly stopped being a stupid asshole it probably wouldn't change anything, as the value of the system was connections to others. Some people will consider my abandonment of Twitter as surrender, and I accept that it's not an unreasonable opinion. I think that the possibly 100 Twitter followers of mine who deleted their accounts surrendered.

Facebook has been becoming a worse service: its business model is becoming increasingly exploitative and its interface is designed to be addictive. It's probably best avoided unless you really need it. The only good thing about Facebook at the moment is that Facebook Marketplace doesn't take a cut on sales and there are some really good deals on computers if you know what to look for. Unfortunately Facebook has a large number of users who are from marginalised communities and have no other alternatives for communication.
It would be good to get them migrated to other platforms. We could just give up on a lot of general communications services and have everyone accept that good content is drowned out by rubbish, with the Internet becoming divided between people who accept the rubbish and those who cease using large portions of the Internet environment to avoid it.

Using Non-Commercial Services

Lemmy is a good FOSS federated alternative to Reddit which also covers some of the uses of Facebook. It needs more users to get critical mass but is still quite usable. A post that might get a dozen comments on Reddit may get one comment on Lemmy, but that one comment will be a good one. Reddit doesn't appear to be attacked much by LLM-generated content, at least not yet. Even if the Reddit model proves to be resilient to LLM attack, the Lemmy software can be used to replace some things that are done on Facebook.

Mastodon is a good FOSS federated replacement for Twitter; it has a decent user-base including some VIPs. While it is aimed at the Twitter use case it can also cover a significant part of the Facebook use case. There are some other FOSS social media programs which could take over other parts of the commercial social media environment. Generally, commercially run Internet services will have a financial incentive to allow the problems to get worse, so we need to rely on FOSS software, non-commercial implementations, and government services.

Web Search

For a long time Google has had a monopoly on web search, but now they default to including an AI Overview at the start of the results which is sometimes useful but also sometimes very wrong. You can use the search URL https://www.google.com/search?q=%s&udm=web to get Google results without rubbish, but I presume that they will break that if it gets too popular. Searxng is an AGPL-licensed metasearch engine that aggregates results from other engines; here's the Searxng source [7] and here's a list of Searxng instances if you want to try one [8]. Even using meta search engines like Searxng won't help if the original data is overloaded with spam, but alleviating the problem is a good temporary measure.

Web of Trust for the Web?

I've idly considered the possibility of having some sort of rating system for web pages that uses a web of trust, so that you can securely use trust ratings of friends of friends etc. But given all the difficulties in using a web of trust for signing GPG keys for software developers (the demographic that is most skilled at doing such things) it doesn't seem viable. Should we surrender the idea of having a usable public web? In the early days of the web (before Google) it was standard practice to rely on recommendations from other people or from trusted sites to find other sites, and that could be considered to be an informal web of trust. We could go back to that sort of usage pattern if Google and many of the big sites get overwhelmed by LLM-generated spam.
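To make the friends-of-friends idea a bit more concrete, here is a minimal sketch of how trust ratings could be propagated; the names, numbers, data structures, and two-hop limit are just assumptions for illustration, not a design I'm proposing.

# Sketch of the friends-of-friends idea: combine page ratings from people you
# trust (directly or via intermediaries), weighted by trust along the path.
# All names, numbers, and data structures here are made up for illustration.
trust = {
    "me":    {"alice": 0.9, "bob": 0.6},
    "alice": {"carol": 0.8},
    "bob":   {"dave": 0.7},
}
ratings = {
    "alice": {"https://example.org/": +1},
    "carol": {"https://example.org/": -1},
    "dave":  {"https://example.org/": +1},
}

def page_score(user, url, max_hops=2):
    frontier = [(user, 1.0)]
    seen = {user}
    score = weight_total = 0.0
    for _ in range(max_hops):
        next_frontier = []
        for person, weight in frontier:
            for friend, t in trust.get(person, {}).items():
                if friend in seen:
                    continue
                seen.add(friend)
                w = weight * t          # trust decays with each extra hop
                if url in ratings.get(friend, {}):
                    score += w * ratings[friend][url]
                    weight_total += w
                next_frontier.append((friend, w))
        frontier = next_frontier
    return score / weight_total if weight_total else None

print(page_score("me", "https://example.org/"))

Even a toy like this shows where the hard parts are: getting honest trust values, keeping the graph fresh, and doing the lookups securely at scale.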
Wikipedia

I believe that Wikipedia will be at the front lines of this battle. Its model has always included anonymous contributions. Benjamin Mako Hill wrote an interesting blog post about research he did with Kaylea Champion into Wikipedia pages on taboo topics, which have a larger portion of contributors choosing to be anonymous than non-taboo pages [9]. Wikipedia also has a long history of being abused for various reasons; one that I witnessed was someone putting false content into Wikipedia pages to immediately cite them in support of their Facebook arguments. That sort of thing can be dealt with at human scale, but a large scale attack by bots is a different problem to solve. Also, with the recent developments in AI, developing multiple web sites populated entirely for the purpose of supporting one fake entry in Wikipedia is plausible.

The upside of these attacks that I predict is that they will attract the attention of all the people who have skills related to developing counter-measures. While LLM bots are filling the inboxes of publishers with rubbish and messing up the Stack Overflow comments section not a lot of people are bothered, but once the attacks on Wikipedia get serious everyone will take notice.

National AI

Bruce Schneier and Nathan E. Sanders wrote an interesting blog post about nationalised public AI [10]. While that won't directly address this issue, it will get the right technology into the hands of people who can use it in the right way.

Conclusion

This is going to be a difficult problem to solve, more difficult than the email spam problem that we have been unable to solve after 30 years of working on it. This is also a very important problem: we are currently in an age where we have access to information that most people couldn't even dream of 30 years ago. We also have disinformation that combines some of the worst aspects of authoritarian regimes throughout history with the worst aspects of cult brainwashing. If we lose access to the information but the disinformation remains (or gets worse) then the result will be terrible.

I don't have great ideas for solving this. I have outlined some small ideas to mitigate things and I hope that others can expand on them. Please write comments with any good ideas you have, or even ideas that don't totally suck. A problem this difficult is not going to be solved in a blog comment, but a blog comment might point in the right direction.

25 March 2026

John Goerzen: Artificial Intelligence: Shades of Gray

AI sure is a hot topic right now, and I see a lot of people arguing about it. To a lot of people around here, I'm the computer person they know, and I get asked a lot about AI. I'm going to suggest a lot of things can be true at once. For instance: Or how about: And:

I have sympathy for the naysayers; those that say it's nothing but a stochastic parrot. But I don't have a lot of sympathy for the naysayers that deny ever using it; you can't form a credible argument against something without having an understanding of it informed by experience. I also have sympathy for the cheerleaders. I have seen some impressive things from AI; for instance, a story from an engineer who has a child with a rare disease without a credible cure. The engineer did a lot of research on it, started feeding research papers into AI to analyze, and the AI started finding correlations between different areas of research that humans hadn't yet found, leading to a positive result for the child. To be fair, I have rarely seen an AI deliver a 100% correct answer on anything with any real level of complexity. I have seen it both waste more time than it saves, and save a ton of time. My point here is: it is neither always fantastic nor always terrible.

Let me talk you through an example. I am a fan of inbox zero for email. That is, the inbox should be empty. Unfortunately, mine has 8000 messages in it. According to the oldest messages in my inbox, I last had inbox zero 8 years ago. But really, only a handful are older than 2020. I guess something must have happened that year. I've been chipping away at this for quite some time now. The problem is, there are certain emails in there that really do still need some action: maybe it's photos to save off into our photo collection, for instance. But when looking at things sorted by date or thread, there are old shipping confirmations next to phishing attempts and family photos. One can't just scan down the list.

I've tried all the usual tricks, most of which involve selecting groups of messages that are easy to bulk erase, or at least easy to scan visually for the occasional thing worth saving. Sort by sender or subject line, for instance. Then I can, for instance, delete all the old messages from the shopping sites I commonly use all at once. But then they start using different senders and different subject lines and that doesn't get all of them. I've tried keyword searches for this sort of thing too. Still, that got me down to about 8000 messages.

So I thought: why not see if an LLM could help me classify these? Maybe it could categorize them, and then I could look at emails grouped by category. I have one machine with a discrete GPU, an Nvidia RTX 4070. It's a desktop machine I don't use all that often. But I set up Ollama on it, running in a Docker container. Ollama runs models locally. I should also mention at this point that we are solar-powered, and this time of year is a time of peak production of excess solar, because it is sunny and not much heat or AC is required. So that machine is solar-powered and isn't causing environmental harm. In any case, charging the EV uses much more power than that GPU.

I figured I would do this in two passes. First, ask the LLM to classify each message (or a sampling of them would probably work too), letting it pick its own categories for each. Then, look at the patterns that emerge and give it a single, much smaller, set of broad categories to use and rerun it over that.
Then I can easily select messages from my Maildirs by category and process them in bulk. I used open-interpreter pointing to that GPU on my network to help me write the scripts for this. It didn't get things right on its own; for instance, it didn't call the Ollama API correctly, and insisted on appending /cur to the path to the Maildir (which was not going to fly with Python's maildir module). It took roughly an hour to classify those 8000 messages (or, as I had it do, the first 2000 characters of them), and then the same to do it a second time. I had it output lines in the form of filename\tcategory and hand-wrote the shell script that processed those.

In the end, was it useful? Yes, quite. Its classifications weren't perfect (and it didn't even follow my prompt perfectly; sometimes it would give me a long discussion on why it picked a certain category rather than just that category, and occasionally it picked categories not on the list). But then, neither were my manual keyword searches. So far I've gotten rid of nearly 1000 more messages. Several categories were a "visual scan for sanity and then delete all" sort of thing.

My emails never left my network. I didn't rely on a cloud AI to process them. I didn't contribute to global warming (this may have even been a case of saving energy, since it no doubt will offset quite a bit of manual time that would keep screens and room lights energized and so forth). I used about as much energy as watching a movie on a TV. Did it complete the task for me entirely autonomously? Also no. AI isn't a mind reader and it can't possibly evaluate exactly what my thought process would be for a given task. But it can do a decent enough job to save me some time. Still, this didn't require hyperscaler datacenters. AI even runs on-phone (Google Translate being one of the most useful AI-driven apps I've ever seen, and it can run on-device).
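For the curious, here is a rough sketch of what that kind of classification pass can look like, assuming a local Ollama instance on its default port; the model name, prompt, category list, and Maildir path are placeholders rather than what I actually used.

#!/usr/bin/env python3
# Rough sketch: walk a Maildir, send the first 2000 characters of each message
# to a local Ollama instance, and print "filename<TAB>category" lines.
# Model name, categories, and paths are placeholders, not the original script.
import email
import email.policy
import json
import os
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"   # default Ollama endpoint
MODEL = "llama3"                                      # any locally pulled model
CATEGORIES = ["shopping", "family-photos", "finance", "phishing", "other"]
MAILDIR = os.path.expanduser("~/Maildir")

def classify(text):
    prompt = ("Classify this email into exactly one of these categories: "
              + ", ".join(CATEGORIES)
              + ". Reply with the category only.\n\n" + text[:2000])
    body = json.dumps({"model": MODEL, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(OLLAMA_URL, data=body,
                                 headers={"Content-Type": "application/json"})
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["response"].strip()

for sub in ("cur", "new"):
    folder = os.path.join(MAILDIR, sub)
    if not os.path.isdir(folder):
        continue
    for name in sorted(os.listdir(folder)):
        with open(os.path.join(folder, name), "rb") as fh:
            msg = email.message_from_binary_file(fh, policy=email.policy.default)
        part = msg.get_body(preferencelist=("plain",))
        text = (msg["Subject"] or "") + "\n" + (part.get_content() if part else "")
        print(f"{name}\t{classify(text)}")

The output can then be sorted by category and handled in bulk, which is the part that still needs a human.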

23 March 2026

Benjamin Mako Hill: How taboo shapes knowledge production on Wikipedia

Note: I have not published blog posts about my academic papers over the past few years. To ensure that my blog contains a more comprehensive record of my published papers and to surface them for folks who missed them, I will periodically (re)publish blog posts about some older published projects. This post draws material from a previously published post by Kaylea Champion on the Community Data Science Blog.

Taboo subjects such as sexuality and mental health are as important to discuss as they are difficult to raise in conversation. Although many people turn to online resources for information on taboo subjects, censorship and low-quality information are common in search results. In two papers I recently published at CSCW, both led by Kaylea Champion, we presented a series of analyses showing how taboo shapes the process of collaborative knowledge building on English Wikipedia.

The first study is a quantitative analysis showing that articles on taboo subjects are much more popular and are the subject of more vandalism than articles on non-taboo topics. In surprising news, we also found that they were edited more often and were of higher quality!

Short video of Kaylea's presentation of the work given at Wikimania in August 2023.

The first challenge we faced in conducting this work was identifying taboo articles. Kaylea had a brilliant idea for a new computational approach to doing so without relying on our individual intuitions about what qualifies as taboo (something we understood would be highly specific to our own culture, class, etc.). Her approach was to make use of an insight from linguistics: people develop euphemisms as ways to talk about taboos (i.e., think about all the euphemisms we've devised for death, or sex, or menstruation, or mental health).

We used this insight to build a new machine-learning classifier based on English Wiktionary definitions. If a sense of a word was tagged as euphemistic, we treated the words in the definition as indicators of taboo. The end result was a series of words and phrases that most powerfully differentiate taboo from non-taboo. We then did a simple match between those words and phrases and the titles of Wikipedia articles. The topics were taboo enough that we were a little uncomfortable discussing them in our meetings! We built a comparison sample of articles whose titles are words that, like our taboo articles, appear in Wiktionary definitions.

In the first paper, we used this new dataset to test a series of hypotheses about how taboo shapes collaborative production in Wikipedia. Our initial hypotheses were based on the idea that taboo information is often in high demand but that Wikipedians might be reluctant to associate their names (or usernames) with taboo topics. The result, we argued, would be articles that were in high demand but of low quality. We found that taboo articles are thriving on Wikipedia! In summary, we found that in comparison to non-taboo articles:

Image of the estimated quality of the four articles in the second mixed-methods paper. Extreme dips reflect periods of frequent vandalism.
Kaylea attempted to understand these somewhat confusing results by designing a fantastic mixed-methods analysis that sought to unpack some of the nuance missing in the quantitative analysis by delving deep into the life histories of four articles on English Wikipedia: two on taboo topics related to women's anatomy (Clitoris and Menstruation) and two non-taboo articles chosen for comparison (Cell membrane and Philip Pullman). Although the findings from the analysis can be difficult to summarize succinctly (as with many qualitative studies), we showed how the taboo example articles' success was hard-won amid real challenges and attacks. The paper describes how challenges were overcome through resilient leadership, often provided by a single dedicated individual. The paper provides a template for how taboo can be, and frequently is, overcome by dedicated Wikipedians in ways that provide useful knowledge resources in real demand. For more details, visualizations, statistics, and more, we hope you'll take a look at our papers, both linked below.

The full citations for the papers are: (1) Champion, Kaylea, and Benjamin Mako Hill. 2023. "Taboo and Collaborative Knowledge Production: Evidence from Wikipedia." Proceedings of the ACM on Human-Computer Interaction 7 (CSCW2): 299:1-299:25. https://doi.org/10.1145/3610090. (2) Champion, Kaylea, and Benjamin Mako Hill. 2024. "Life Histories of Taboo Knowledge Artifacts." Proceedings of the ACM on Human-Computer Interaction 8 (CSCW2): 505:1-505:32. https://doi.org/10.1145/3687044.

We have also released replication materials for the paper, including all the data and code used to conduct the analyses.

This blog post and the paper it describes are collaborative work by Kaylea Champion and Benjamin Mako Hill.

15 March 2026

Russell Coker: The Difference Between Email and Instant Messaging

Introduction

With various forms of IM becoming so prevalent, and a lot of communication that used to be via email happening via IM, I've been thinking about the differences between email and IM. I think it's worth comparing them not for the purpose of convincing people to use one or the other (most people will use whatever is necessary to communicate with the people who are important to them) but for the purpose of considering ways to improve them and use them more effectively. Also, I don't think that users of various electronic communications systems have had a free choice in what to use for at least 25 years, and possibly much longer depending on how you define a free choice. What you use is determined by who you want to communicate with and by what systems are available in your region. So there's no possibility of an analysis of this issue giving a result of "let's all change what we use", as almost everyone lacks the ability to make a choice.

What the Difference is Not

The name Instant Messaging implies that it is fast, and probably faster than other options. This isn't necessarily the case: when using a federated IM system such as Matrix or Jabber there can be delays while the servers communicate with each other. Email used to be a slow communication method; in the times of UUCP and Fidonet email there could be multiple days of delay in sending email. In recent times it's expected that email is quite fast: many web sites have options for authenticating an email address which have to be done within 5 minutes, so the common expectation seems to be that all email is delivered to the end user in less than 5 minutes. When an organisation has a mail server on site (which is a common configuration choice for a small company) the mail delivery can be faster than common IM implementations. The Wikipedia page about Instant Messaging [1] links to the Wikipedia page about Real Time Computing [2], which is incorrect. Most IM systems are obviously designed for minimum average delays at best. For most software it's not a bad thing to design for the highest performance on average and just let users exercise patience when they get an unusual corner case that takes much longer than expected. If an IM message takes a few minutes to arrive then "that's life on the Internet", which was the catchphrase of an Australian Internet entrepreneur in the 90s that infuriated some of his customers.

Protocol and Data Format Differences

Data Formats

Email data contains the sender, one or more recipients, some other metadata (time, subject, etc), and the message body. The recipients are typically an arbitrary list of addresses which can only be validated by the destination mail servers. The sender addresses weren't validated in any way and are now only minimally validated as part of anti-spam measures. IM data is sent through predefined connections called rooms or channels. When an IM message is sent to a room it can tag one or more members of the room to indicate that they may receive a special notification of the message. In many implementations it's possible to tag a user who isn't in the room, which may result in them being invited to the room. But in IM there is no possibility to add a user to the CC list for part of a discussion and then just stop CCing messages to them later on in the discussion.

Protocols

Internet email is a well established system with an extensive user base. Adding new mandatory features to the protocols isn't viable because many old systems won't be updated any time soon.
So while it is possible to send mail that's SSL encrypted and has a variety of authentication mechanisms, that isn't something that can be mandatory for all email. Most mail servers are configured to use the SSL option if it's available but send in cleartext otherwise, so a hostile party could launch a Man In the Middle (MITM) attack and pretend to be the mail server in question but without SSL support. Modern IM protocols tend to be based on encryption; even XMPP (Jabber), which is quite an old IM protocol, can easily be configured to only support encrypted messaging, and it's reasonable to expect that all other servers that will talk to you will at least support SSL. Even for an IM system that is run by a single company, the fact that communication with the servers is encrypted by SSL makes it safer than most email. A security model of "this can only be read by you, me, and the staff at an American corporation" isn't the worst type of Internet security.

The Internet mail infrastructure makes no attempt to send mail in order, and the design of the Simple Mail Transfer Protocol (SMTP) means that a network problem after a message has been sent but before the recipient has confirmed receipt will mean that the message is duplicated; this is not considered to be a problem. The IM protocols are designed to support reliable ordered transfer of messages, and Matrix (the most recently designed IM protocol) has cryptographic connections between users.

Forgery

For most email systems there is no common implementation that prevents forging email. For Internet email transferred via SMTP it's possible to use technologies like SPF and DKIM/DMARC to make recipients aware of attempts at forgery, but many recipient systems will still allow email that fails such checks to be delivered. The default configuration tends to be permitting everything, and all of the measures to prevent forgery require extra configuration work and often trade-offs, as some users desire features that go against security. The default configuration of most mail servers doesn't even prevent trivial forgeries of email from the domain(s) owned by that server. For evidence, check the SPF records of some domains that you communicate with and see if they end with -all (to block email from bad sources), ~all (to allow email from bad sources through after possibly logging an error), ?all (to be neutral on mail from unknown sources), or just lack an SPF record entirely. The below shows that of the top four mail servers in the world only outlook.com has a policy to reject mail from bad sources.
# dig -t txt _spf.google.com | grep spf1
_spf.google.com.	300	IN	TXT	"v=spf1 ip4:74.125.0.0/16 ip4:209.85.128.0/17 ip6:2001:4860:4864::/56 ip6:2404:6800:4864::/56 ip6:2607:f8b0:4000::/36 ip6:2800:3f0:4000::/36 ip6:2a00:1450:4000::/36 ip6:2c0f:fb50:4000::/36 ~all"
# dig -t txt outlook.com | grep spf1
outlook.com.		126	IN	TXT	"v=spf1 include:spf2.outlook.com -all"
# dig -t txt _spf.mail.yahoo.com | grep spf1
_spf.mail.yahoo.com.	1800	IN	TXT	"v=spf1 ptr:yahoo.com ptr:yahoo.net ip4:34.2.71.64/26 ip4:34.2.75.0/26 ip4:34.2.84.64/26 ip4:34.2.85.64/26 ip4:34.2.64.0/22 ip4:34.2.68.0/23 ip4:34.2.70.0/23 ip4:34.2.72.0/22 ip4:34.2.78.0/23 ip4:34.2.80.0/23 ip4:34.2.82.0/23 ip4:34.2.84.0/24 ip4:34.2.86.0" "/23 ip4:34.2.88.0/23 ip4:34.2.90.0/23 ip4:34.2.92.0/23 ip4:34.2.85.0/24 ip4:34.2.94.0/23 ?all"
# dig -t txt icloud.com | grep spf1
icloud.com.		3586	IN	TXT	"v=spf1 ip4:17.41.0.0/16 ip4:17.58.0.0/16 ip4:17.142.0.0/15 ip4:17.57.155.0/24 ip4:17.57.156.0/24 ip4:144.178.36.0/24 ip4:144.178.38.0/24 ip4:112.19.199.64/29 ip4:112.19.242.64/29 ip4:222.73.195.64/29 ip4:157.255.1.64/29" " ip4:106.39.212.64/29 ip4:123.126.78.64/29 ip4:183.240.219.64/29 ip4:39.156.163.64/29 ip4:57.103.64.0/18" " ip6:2a01:b747:3000:200::/56 ip6:2a01:b747:3001:200::/56 ip6:2a01:b747:3002:200::/56 ip6:2a01:b747:3003:200::/56 ip6:2a01:b747:3004:200::/56 ip6:2a01:b747:3005:200::/56 ip6:2a01:b747:3006:200::/56 ~all"
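As a small illustration of the check described above, the following sketch (assuming dig and Python 3 are available) extracts the trailing "all" qualifier from a domain's SPF record; the domain list is just the example set used above.

#!/usr/bin/env python3
# Sketch: report the trailing "all" qualifier of each domain's SPF record
# by shelling out to dig, roughly mirroring the manual checks above.
import re
import subprocess

def spf_policy(domain):
    out = subprocess.run(["dig", "+short", "-t", "txt", domain],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        record = line.strip().strip('"').replace('" "', '')  # rejoin split TXT strings
        if record.startswith("v=spf1"):
            m = re.search(r"([-~?+]?)all\b", record.split()[-1])
            return m.group(1) + "all" if m else "no all mechanism"
    return "no SPF record"

for d in ("_spf.google.com", "outlook.com", "_spf.mail.yahoo.com", "icloud.com"):
    print(d, spf_policy(d))

Only a -all result means mail from unauthorised sources should actually be rejected; ~all and ?all leave the decision to the recipient's filters.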
In most IM systems there is a strong connection between people who communicate. If I send you two direct messages they will appear in the same room, and if someone else tries forging messages from me (e.g. by replacing the c and e letters in my address with Cyrillic letters that look like them, or by mis-spelling my name) a separate room will be created and it will be obvious that something unexpected is happening. Protecting against the same attacks in email requires the user carefully reading the message; given that it's not uncommon for someone to start a message to me with "Hi Russel" (being unable to correctly copy my name from the To: field of the message they are writing), it's obvious that any security measure relying on such careful reading will fail. The IM protections against casual forgery also apply to rooms with multiple users: a new user can join a room for the purpose of spamming but they can't send a casual message impersonating a member of the room. A user can join a Matrix room I'm in with the name Russell from another server, but the potential for confusion will be minimised by a message notifying everyone that another Russell has joined the room, and the list of users will show two Russells. For email, the protections against forgery when sending to a list server are no different than those when sending to an individual directly, which means very weak protections. Authenticating the conversation context once, as done with IM, is easier and more reliable than authenticating each message independently.

Is Email Sucking the Main Technical Difference?

It seems that the problems with forgery, spam, and general confusion when using email are a large part of the difference between email and IM. But in terms of technical issues, the fact that email has significantly more users (if only because you need an email account to sign up for an IM system) is a major difference. Internet email is currently a universal system (apart from when it breaks from spam) and it has historically been used to gateway to other email systems like Fidonet, UUCP, and others. The lack of tight connection between parties that exchange messages in email makes it easier to bridge between protocols but harder to authenticate communication. Most of the problems with Internet email are not problems for everyone at all times; they are technical trade-offs that work well for some situations and for some times. Unfortunately many of those trade-offs are for things that worked well 25+ years ago.

The GUI

From a user perspective there doesn't have to be a great difference between email and IM. Email is usually delivered quickly enough to be in the same range as IM. The differences in layout between IM client software and email client software are cosmetic; someone could write an email client that organises messages in the same way as Slack or another popular IM system, such that the less technical users wouldn't necessarily know the difference. The significant difference in the GUI for email and IM software was a design choice.

Conversation Organisation

The most significant difference in the operation of email and IM at the transport level is the establishment of connections in IM. Another difference is the fact that there are no standards implemented for the common IM implementations to interoperate, which is an issue of big corporations creating IM systems and deliberately making them incompatible. The methods for managing email need to be improved.
Having an inbox that's an unsorted mess of mail isn't useful if you want to track one discussion; breaking it out into different sub-folders for common senders (similar to IM folders for DMs) as a standard feature, without having to set up rules for each sender, would be nice. Someone could design an email program with multiple layouts, one being the traditional form (which seems to be copied from Eudora [3]) and one with the inbox (or other folders) split up into conversations. There are email clients that support managing email threads, which can be handy in some situations but often isn't the best option for quickly responding to messages that arrived recently.

Archiving

Most IM systems have no method for selectively archiving messages; there's a request open for a bookmark function in Matrix, and there's nothing stopping a user from manually copying a message. But there's nothing like the convenient ability to move email to an archive folder in most IM systems. Without good archiving, IM is a transient medium. This is OK for conversations but not good for determining the solutions to technical problems unless there is a Wiki or other result which can be used without relying on archives.

Composing Messages

In a modern email client, when sending a message it prompts you about things that it considers incomplete, so if you don't enter a Subject, or have the word "attached" in the message body but no file attached to the message, then it will prompt you to confirm that you aren't making a mistake. In an IM client the default is usually that pressing ENTER sends the message, so every paragraph is a new message. IM clients are programmed to encourage lots of short messages while email clients are programmed to encourage more complete messages.

Social Issues

Quality

The way people think about IM and email is very different; as one example, there was never a need for a site like nohello.net for email. The idea that it's acceptable to use even lower quality writing in IM than people tend to use in email seems to be a major difference between the communication systems. It can be a good thing to have a chatty environment with messages that are regarded as transient for socialising, but that doesn't seem ideal for business use.

Ownership

Email is generally regarded as being comparable to physical letters. It is illegal, and widely considered to be socially wrong, to take a letter out of someone's letterbox because you regret sending it. In email the only unsend function I'm aware of is the one in Microsoft software, which is documented to only work within the same organisation, and only if the recipient hasn't read the message. The message is considered to be owned by the recipient. But for IM it's a widely supported and socially acceptable function to delete or edit messages that have been sent. The message is regarded as permanently the property of the sender.

What Should We Do?

Community Creators

When creating a community (and I use this in the broadest sense, including companies) you should consider what types of communication will work well. When I started the Flounder group [4] I made a deliberate decision that non-free communication systems go against the aim of the group. I started it with a mailing list and then created a Matrix room which became very popular. Now the list hardly gets any use. It seems that most of the communication in the group is fairly informal and works better with IM. Does it make sense to use both?
Should IM systems be supplemented with other systems that facilitate more detail, such as a Wiki or a Lemmy room/instance [5], to cover the lack of long form communication? I have created a Lemmy room for Flounder but it hasn't got much interest so far. It seems that almost no-one makes a strategic decision about such issues.

Software Developers

It would be good to have the same options for archiving IM as there are for email. Also, some options to encourage quality in IM communication, similar to the way email clients want confirmation before sending messages without a subject or that might be missing an attachment. It would also be good to have better options for managing conversations in email. The Inbox as currently used is good for some things, but a button to switch between that and a conversation view would be good. There are email clients that allow selecting message sort order and aggregation (kmail has a good selection of options) but they are designed for choosing a single setup that you like, not for switching between multiple views based on the task you are doing. It would be good to have links between different communication systems; if users had the option of putting their email address in their IM profile it would make things much easier. Having entirely separate systems for email and IM isn't good for users.

Users

The overall communications infrastructure could be improved if more people made tactical decisions about where and how to communicate. Keep the long messages to email and the chatty things to IM. Also, for IM, just do the communication, don't start with "hello". To discourage wasting time I generally don't reply to messages that just say "hello" unless it's the first ever IM from someone.

Conclusion

A large part of the inefficiencies in electronic communication are due to platforms and usage patterns evolving with little strategic thought. The only apparent strategic thought is coming from corporations that provide IM services and have customer lock-in at the core of their strategies. Free software developers have done great work in developing software to solve tactical problems, but the strategies of large scale communications aren't being addressed. Email is loosely coupled and universal while IM is tightly coupled, authenticated, and often siloed. This makes email a good option for initial contact but a risk for ongoing discussions. There is no great solution to these issues as they are largely a problem due to the installed user base. But I think we can mitigate things with some GUI design changes and strategic planning of communication.

13 March 2026

Jonathan Dowland: debian swirl font glyph

When I wrote about the redhat logo in a shell prompt, a commenter said it would be nice to achieve something similar for Debian, and suggested "🍥" (U+1F365 FISH CAKE WITH SWIRL DESIGN) which, in some renderings, looks to have a red swirl on top. This is not bad, but I thought we could do better.

On Apple systems, the character "" (U+F8FF) displays as the corporate Apple logo. That particular unicode code point is reserved: systems are free to use it for something private and internal, but other systems won't use it for the same thing. So if an Apple user tries to send a document with that character in it to someone else, they won't see the Apple unless they are also viewing it on an Apple computer. (Some folks use it for Klingon).

Here's a font that maps the Debian swirl to the same code point. It's covered by the Debian logo license terms. Nerd Font maps the Debian swirl logo to codepoints e77d, f306, ebc5 and f08da (all of which are also in the Private Use Area). I've gone ahead and mapped it to all those points but the last one (simply because I couldn't find it in FontForge.)

Note that, unless your recipients have this font, or the Nerd Font, or similar set up, they aren't going to see the swirl. But enjoy it for private use. Getting your system to actually use the font is, I'm afraid, left as an exercise for the reader (but feel free to leave comments).

Thanks to mirabilos for chatting to me about this back in 2019. It's taken me that long to get this blog post out of draft!
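Roughly speaking, this kind of mapping can be scripted with FontForge's Python bindings; the sketch below is illustrative only, and assumes the swirl glyph is named "debianswirl" in the source font (the real file names, glyph name, and steps may differ).

# Sketch: map an existing swirl glyph onto extra (Private Use Area) code points
# with FontForge's Python bindings. Glyph and file names here are assumptions.
# Run with something like: fontforge -script map_swirl.py
import fontforge

TARGETS = [0xF8FF, 0xE77D, 0xF306, 0xEBC5]    # Apple PUA point plus Nerd Font points

font = fontforge.open("debian-swirl.sfd")      # source font containing the swirl
for codepoint in TARGETS:
    glyph = font.createChar(codepoint)         # create (or fetch) the slot
    glyph.clear()
    glyph.addReference("debianswirl")          # reference the existing swirl outline
    glyph.width = font["debianswirl"].width    # keep the same advance width
font.generate("debian-swirl.ttf")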

4 March 2026

Sean Whitton: Southern Biscuits with British ingredients

I miss the US more and more, and have recently been trying to perfect Southern Biscuits using British ingredients. It took me eight or nine tries before I was consistently getting good results. Here is my recipe.

Ingredients

Method
  1. Slice and then chill the butter in the freezer for at least fifteen minutes.
  2. Preheat oven to 220°C with the fan turned off.
  3. Twice sieve together the flours, leaveners and salt. Some salt may not go through the sieve; just tip it back into the bowl.
  4. Cut cold butter slices into the flour with a pastry blender until the mixture resembles coarse crumbs: some small lumps of fat remaining is desirable. In particular, the fine crumbs you are looking for when making British scones are not wanted here. Rubbing in with fingertips just won't do; biscuits demand keeping things cold even more than shortcrust pastry does.
  5. Make a well in the centre, pour in the buttermilk, and stir with a metal spoon until the dough comes together and pulls away from the sides of the bowl. Avoid overmixing, but I've found that so long as the ingredients are cold, you don't have to be too gentle at this stage and can make sure all the crumbs are mixed in.
  6. Flour your hands, turn dough onto a floured work surface, and pat together into a rectangle. Some suggest dusting the top of the dough with flour, too, here.
  7. Fold the dough in half, then gather any crumbs and pat it back into the same shape. Turn ninety degrees and do the same again, until you have completed a total of eight folds, two in each cardinal direction. The dough should now be a little springy.
  8. Roll to about inch thick.
  9. Cut out biscuits. If using a round cutter, do not twist it, as that seals the edges of the biscuits and so spoils the layering.
  10. Transfer to a baking sheet, placing the biscuits close together (this helps them rise). Flour your thumb and use it to press an indent into the top of each biscuit (this helps them rise straight), then brush with buttermilk.
  11. Bake until flaky and golden brown: about fifteen minutes.
Gravy

It turns out that the pepper gravy that one commonly has with biscuits is just a white/béchamel sauce made with lots of black pepper. I haven't got a recipe I really like for this yet. Better is a "sausage gravy"; again this has a white sauce as its base, I believe. I have a vegetarian recipe for this to try at some point.

Variations

Notes

2 March 2026

Isoken Ibizugbe: Wrapping Up My Outreachy Internship at Debian

Twelve weeks ago, I stepped into the Debian ecosystem as an Outreachy intern with a curiosity for Quality Assurance. It feels like just yesterday, and time has flown by so fast! Now, I am wrapping up that journey, not just with a completed project, but with improved technical reasoning.

I have learned how to use documentation to understand a complex project, how to be a good collaborator, and that learning is a continuous process. These experiences have helped me grow much more confident in my skills as an engineer.

My Achievements

As I close this chapter, I am leaving a permanent Proof-of-Work in the Debian repositories:

  • Full Test Coverage: I automated apps_startstop tests for Cinnamon, LXQt, and XFCE, covering both Live images and Netinst installations.
  • Synergy: I used symbolic links and a single Perl script to handle common application tests across different desktops, which reduces code redundancy.
  • The Contributor Style Guide: I created a guide for future contributors to make documentation clearer and reviews faster, helping to reduce the burden on reviewers.

Final Month: Wrap Up

In this final month, things became easier as my understanding of the project grew. I focused on stability and finishing my remaining tasks:

  • I spent time exploring different QEMU video options like VGA, qxl, and virtio on the KDE desktop environment. This was important to ensure screen rendering remained stable so that our needles (visual test markers) wouldn't fail because of minor glitches.
  • I successfully moved from familiarization to test automation for the XFCE desktop. This included writing prepare steps and creating the visual needles needed to make the tests reliable.
  • One of my final challenges was the app launcher function. Originally, my code used else-if blocks for each desktop. I proposed a unified solution, but hit a blocker: XFCE has two ways to launch apps (App Finder and the Application Menu). Because using different methods sometimes caused failures, I chose to use the application menu button across the board.

What's Next?

I don't want my journey with Debian to end here. I plan to stay involved in the community and extend these same tests to the LXDE desktop to complete the coverage for all major Debian desktop environments. I am excited to keep exploring and learning more about the Debian ecosystem.

Thank You

This journey wouldn't have been possible without the steady guidance of my mentors: Tassia Camoes Araujo, Roland Clobus, and Philip Hands. Thank you for teaching me that in the world of Free and Open Source Software (FOSS), your voice and your code are equally important.

To my fellow intern Hellen and the entire Outreachy community, thank you for the shared learning and support. It has been an incredible 12 weeks.

24 February 2026

John Goerzen: Screen Power Saving in the Linux Console

I just set up a Debian trixie system that has no need for a GUI. In fact, I rarely use the text console either. However, because the machine is dual boot and also serves another purpose, it's connected to my main monitor and KVM switch. The monitor has three inputs, and when whatever display it's set to goes into powersave mode, it will seek out another one that's active and automatically switch to it. You can probably see where this is heading: it's really inconvenient if one of the inputs never goes into powersave mode. And, of course, it wastes energy.

I have concluded that the Linux text console has lost the ability to enter powersave mode after an inactivity timeout. It can still do screen blanking (setting every pixel to black), but that is a distinct and much less useful thing. You can do a lot of searching online that will tell you what to do. Almost all of it is wrong these days; none of the commonly suggested approaches work. Why is this? Well, we are on at least the third generation of Linux text console display subsystems. (Maybe more than 3, depending on how you count.) The three major ones were:
  1. The VGA text console
  2. fbdev
  3. DRI/KMS
As I mentioned recently in my post about running an accurate 80×25 DOS-style console on modern Linux, the VGA text console mode is pretty much gone these days. It relied on hardware rendering of the text fonts, and that capability simply isn't present on systems that aren't PCs, or even on PCs that use UEFI, which is most of them now. fbdev, or a framebuffer console under earlier names, has been in Linux since the late 1990s. It was the default for most distros until more recently. It supported DPMS powersave modes, and most of the instructions you will find online reference it. Nowadays, the DRI/KMS system is used for graphics. Unfortunately, it is targeted mainly at X11 and Wayland. It is also used for the text console, but things like DPMS-enabled timeouts were never implemented there.

You can find some manual workarounds: for instance, using ddcutil or similar for an external monitor, or adjusting the backlight files under /sys on a laptop. But these have a number of flaws, with unwanted brightness adjustments and not automatically waking up on keypress among them.

My workaround

I finally gave up and ran apt-get install xdm. Then in /etc/X11/xdm/Xsetup, I added one line:
xset dpms 0 0 120
Now the system boots into an xdm login screen, and shuts down the screen after 2 minutes of inactivity. On the rare occasion where I want a text console from it, I can switch to it and it won't have a timeout, but I can live with that. Thus, quite hopefully, concludes my series of way too much information about the Linux text console!

22 February 2026

Otto Kekäläinen: Do AI models still keep getting better, or have they plateaued?

The AI hype is based on the assumption that the frontier AI labs are producing better and better foundational models at an accelerating pace. Is that really true, or are people just in sort of a mass psychosis because AI models have become so good at mimicking human behavior that we unconsciously attribute increasing intelligence to them? I decided to conduct a mini-benchmark of my own to find out if the latest and greatest AI models are actually really good or not.

The problem with benchmarks

Every time any team releases a new LLM, they boast how well it performs on various industry benchmarks such as Humanity's Last Exam, SWE-Bench and Ai2 ARC or ARC-AGI. An overall leaderboard can be viewed at LLM-stats. This incentivizes teams to optimize for specific benchmarks, which might make them excel on specific tasks while general abilities degrade. Also, the older a benchmark dataset is, the more online material there is discussing the questions and best answers, which in turn increases the chances of newer models trained on more recent web content scoring better. Thus I prefer looking at real-time leaderboards such as the LM Arena leaderboard (or OpenCompass for Chinese models that might be missing from LM Arena). However, even though the LM Arena Elo score is rated by humans in real time, the benchmark can still be gamed. For example, Meta reportedly used a special chat-optimized model instead of the actual Llama 4 model when getting scored on the LM Arena. Therefore I trust my own first-hand experience more than the benchmarks for gaining intuition. Intuition, however, is not a compelling argument in discussions on whether or not new flagship AI models have plateaued. Thus, I decided to devise my own mini-benchmark so that no model could have possibly seen it in its training data or be specifically optimized for it in any way.

My mini-benchmark

I crafted 6 questions based on my own experience using various LLMs for several years and having developed some intuition about what kinds of questions LLMs typically struggle with. I conducted the benchmark using the OpenRouter.ai chat playroom with the following state-of-the-art models:

OpenRouter.ai is great as it is very easy to get responses from multiple models in parallel to a single question. It also allows turning off web search to force the models to answer purely based on their embedded knowledge.

OpenRouter.ai Chat playroom

Common to all the test questions is that they are fairly straightforward and have a clear answer, yet the answer isn't common knowledge or statistically the most obvious one, and instead requires a bit of reasoning to get correct. Some of these questions are also based on my witnessing a flagship model fail miserably to answer them.
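For anyone who wants to reproduce this kind of comparison programmatically rather than in the playroom, here is a rough sketch using OpenRouter's OpenAI-compatible chat endpoint; the model identifiers are only examples, and the API key is assumed to be in the OPENROUTER_API_KEY environment variable.

#!/usr/bin/env python3
# Sketch: send one benchmark question to several models via OpenRouter.
# Model identifiers below are examples; substitute whichever models you test.
import json
import os
import urllib.request

API_URL = "https://openrouter.ai/api/v1/chat/completions"
MODELS = [
    "openai/gpt-4o",
    "anthropic/claude-3.5-sonnet",
    "google/gemini-pro-1.5",
]
QUESTION = "Which cities have hosted the Olympics more than just once?"

def ask(model, question):
    body = json.dumps({
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }).encode()
    req = urllib.request.Request(API_URL, data=body, headers={
        "Authorization": f"Bearer {os.environ['OPENROUTER_API_KEY']}",
        "Content-Type": "application/json",
    })
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]

for model in MODELS:
    print(f"=== {model} ===")
    print(ask(model, QUESTION))

Note that the API does not perform web search unless you ask for it, so this roughly matches the "embedded knowledge only" setting described above.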

1. Which cities have hosted the Olympics more than just once?

This question requires accounting for both summer and winter Olympics, and for Olympics hosted across multiple cities. The variance in responses comes from whether the model understands that Beijing should be counted, as it has hosted both summer and winter Olympics. Interestingly, GPT was the only model to not mention Beijing at all. Some variance also comes from how models account for co-hosted Olympics. For example, Cortina should be counted as having hosted the Olympics twice, in 1956 and 2026, but only Claude, Gemini and Kimi pointed this out. Stockholm's 1956 hosting of the equestrian games during the Melbourne Olympics is a special case, which GPT, Gemini and Kimi pointed out in a side note. Some models seem to have old training material; for example, Grok assumes the current year is 2024. All models that accounted for awarded future Olympics (e.g. Los Angeles 2028) marked them clearly as upcoming. Overall I would judge that only GPT and MinMax gave incomplete answers, while all other models replied as well as the best humans reasonably could.

2. If EUR/USD continues to slide to 1.5 by mid-2026, what is the likely effect on BMW's stock price by end of 2026? This question requires mapping the currency exchange rate to its historic range, dodging the misleading word "slide", and reasoning about where a company's revenue comes from and how a weaker US dollar affects it in multiple ways. I've frequently witnessed flagship models get how interest rates and exchange rates work wrong. Apparently the binary choice between up and down is somehow challenging for the internal statistical model of an LLM on a topic where there is a lot of training material arguing that either outcome is likely; choosing between them requires reasoning specifically about the scenario at hand and disregarding general knowledge of the situation. However, this time all the models concluded correctly that a weak dollar would have a negative overall effect on the BMW stock price. Gemini, GLM, Qwen and Kimi also mention the potential hedging effect of BMW's X-series production in South Carolina for worldwide export.

3. What is the Unicode code point for the traffic cone emoji? This was the first question where the flagship models clearly still struggled in 2026. The trap here is that there is no traffic cone emoji, so an advanced model should simply refuse to give any Unicode numbers at all. Most LLMs however have an urge to give some answer, leading to hallucinations. Also, as the answer has a graphical element to it, an LLM might not understand how the emoji looks in ways that would be obvious to a human, and thus many models claim the construction sign emoji is a traffic cone, which it is not. By far the worst response was from GPT, which simply hallucinated an answer and stopped there: OpenAI's GPT-5.2: completely wrong answer to the traffic cone emoji question While Gemini and Grok were among the three models not falling into this trap, the response from Claude was exemplary: Claude Opus 4.6: exemplary answer to the traffic cone emoji question
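A claim like this is easy to check locally: Python's unicodedata module can look up characters by their official names. This is my own sketch, not from the post, and whether a given name resolves depends on the Unicode database bundled with your Python build:
python
# Minimal sketch: look up characters by Unicode name. lookup() raises KeyError when
# the installed Unicode database has no character (or name alias) with that name,
# so the exact output depends on the Unicode version shipped with your Python build.
import unicodedata

for name in ("TRAFFIC CONE", "CONSTRUCTION SIGN"):
    try:
        char = unicodedata.lookup(name)
        print(f"{name}: U+{ord(char):04X} {char}")
    except KeyError:
        print(f"{name}: no such character in this Unicode database")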

4. Which languages are the 10 most commonly spoken in the world? For each language, count from 1 to 5. This seemingly simple question can become quite complex when you account for the fact that other languages might have their own writing system, both for words and for numerals (e.g. Arabic واحد/one vs ١/1), that numbers might have variations depending on what is being counted (e.g. grammatical gender in French: un/une), and that plain numbers and counting occurrences may have their own words (e.g. Russian один/odin vs раз/raz). I've also witnessed older models frequently mix up which parts of the reply should use which language. For a human this question is trivial, assuming they know the languages. For LLMs it is surprisingly challenging. Grok and MinMax only gave a list of the most popular languages and stopped there, without showing any words in those languages. Qwen got a bit further by trying to list numbers in each language, but the lists contained only the digits 1 to 5: Qwen3.5 Plus count to five in top 10 languages I have seen Grok do something similar before, which in fact inspired me to include this question in my mini-benchmark. GPT got a bit further, but for Hindi, Arabic and Bengali it listed the numerals in the local script, not the number words. Gemini, GLM and Kimi gave a complete and correct answer as a list, while the best answer and presentation by far came from Claude, which gave the table below:
Claude Opus 4.6: 10 Most Commonly Spoken Languages (by Total Speakers)
# Language 1 2 3 4 5
1 English One Two Three Four Five
2 Mandarin Chinese 一 (yī) 二 (èr) 三 (sān) 四 (sì) 五 (wǔ)
3 Hindi एक (ek) दो (do) तीन (tīn) चार (chār) पाँच (pāñch)
4 Spanish Uno Dos Tres Cuatro Cinco
5 French Un Deux Trois Quatre Cinq
6 Arabic واحد (wāḥid) اثنان (ithnān) ثلاثة (thalātha) أربعة (arbaʿa) خمسة (khamsa)
7 Bengali এক (ek) দুই (dui) তিন (tin) চার (chār) পাঁচ (pā̃ch)
8 Portuguese Um Dois Três Quatro Cinco
9 Russian один (odin) два (dva) три (tri) четыре (chetyre) пять (pyat')
10 Japanese 一 (ichi) 二 (ni) 三 (san) 四 (shi) 五 (go)
Note: Rankings are based on estimated total speakers (native + non-native). Rankings can vary depending on the source and methodology used.

5. Count the number of drone launch pads in the picture. Together with the question, I uploaded this picture: Drone pad picture attached to the question above A human can easily count that there are 10 rows and 30+ columns in the grid, but because the picture resolution isn't good enough, the exact number of columns can't be counted, and the answer should be that there are at least 300 launch pads in the picture. GPT and Grok both guessed that the count is zero. Instead of hallucinating some number they said zero, but it would have been better not to give any number at all and just state that they are unable to perform the task. Gemini gave "101" as its answer, which is quite odd, but reading the reasoning section, it seems to have tried counting items in the image without reasoning much about what it was actually counting, or noticing that there is clearly a grid that would make the counting much easier. Both Qwen and Kimi stated they could see four parallel structures, but were unable to count the drone launch pads. The best answer by far was given by Claude, which counted 10-12 rows and 30-40+ columns, and concluded that there must be 300-500 drone launch pads. Very close to the best human level - impressive! This question applied only to multi-modal models that can see images, so GLM and MinMax could not give any response.

6. Explain why I am getting the error below, and what is the best way to fix it? Together with the question above, I gave this code block:
$ SH_SCRIPTS="$(mktemp; grep -Irnw debian/ -e '^#!.*/sh' | sort -u | cut -d ':' -f 1 || true)"
$ shellcheck -x --enable=all --shell=sh "$SH_SCRIPTS"
/tmp/tmp.xQOpI5Nljx
debian/tests/integration-tests: /tmp/tmp.xQOpI5Nljx
debian/tests/integration-tests: openBinaryFile: does not exist (No such file or directory)
Older models would easily be misled by the last error message into thinking that a file went missing, and would focus on suggesting changes to the complex-looking first line. In reality the error is simply caused by the quotes around $SH_SCRIPTS, which result in the entire multi-line string being passed as a single argument to shellcheck. So instead of receiving two separate file paths, shellcheck tries to open one file literally named /tmp/tmp.xQOpI5Nljx\ndebian/tests/integration-tests. Incorrect argument expansion is fairly easy for an experienced human programmer to notice, but tricky for an LLM. Indeed, Grok, MinMax, and Qwen fell for this trap and focused on the mktemp, assuming it somehow failed to create a file. Interestingly, GLM failed to produce an answer at all, as its reasoning step seemed to loop, thinking too much about the missing file but not understanding why it would be missing when there is nothing wrong with how mktemp is executed. Claude, Gemini, and Kimi immediately spotted the real root cause, passing the variable quoted, and suggested correct fixes that involve either removing the quotes or using Bash arrays or xargs in a way that also makes the whole command handle filenames with spaces correctly.

Conclusion
Model Sports Economics Emoji Languages Visual Shell Score
Claude Opus 4.6 6/6
GPT-5.2 ~ 2.5/6
Grok 4.1 3/6
Gemini 3.1 Pro 5/6
GLM 5 ? N/A 3/5
MinMax M2.5 N/A 1/5
Qwen3.5 Plus ~ 2.5/6
Kimi K2.5 4/6
Obviously, my mini-benchmark only had 6 questions, and I ran it only once, so it was not scientifically rigorous. However, it was systematic enough to trump a mere feeling. The main finding for me personally is that Claude Opus 4.6, the flagship model by Anthropic, seems to give great answers consistently. The answers are not only correct, but also well scoped, giving enough information to cover everything that seems relevant without padding it with unnecessary filler. I used Claude extensively in 2023-2024 when it was the main model available at my day job, but for the past year I had been using other models that I felt were better at the time. Now Claude seems to be the best-of-the-best again, with Gemini and Kimi as close runners-up.
Comparing their pricing at OpenRouter.ai, the Kimi K2.5 price of $0.6 per million tokens is almost 90% cheaper than Claude Opus 4.6's $5.0 per million tokens, which suggests that Kimi K2.5 offers the best price-to-performance ratio. Claude might be cheaper with a monthly subscription directly from Anthropic, potentially narrowing the price gap. Overall, I do feel that Anthropic, Google and Moonshot.ai have been pushing the envelope with their latest models in a way that makes it hard to claim that AI models have plateaued. In fact, one could claim that at least Claude has now climbed over the hill of AI slop and consistently produces valuable results. If and when AI usage expands from here, we might actually not drown in AI slop, as the chances of accidentally crappy results decrease. This makes me positive about the future.
I am also really happy to see that there wasn't just one model crushing everybody else, but that there are at least three models doing very well. As an open source enthusiast I am particularly glad to see that Moonshot.ai's Kimi K2.5 is published with an open license. Given the hardware, anyone can run it on their own. OpenRouter.ai currently lists 9 independent providers alongside Moonshot.ai itself, showcasing the potential of open-weight models in practice. If the pattern holds and flagship models continue improving at this pace, we might look back at 2026 as the year AI stopped feeling like a call center associate and started to resemble a scientific researcher. As new models become available we need to keep testing, keep questioning, and keep our expectations grounded in actual performance rather than press releases. Thanks to OpenRouter.ai for providing a great service that makes testing various models incredibly easy!

21 February 2026

Thomas Goirand: Seamlessly upgrading a production OpenStack cluster in 4 hours: with a 2k-line shell script


tl;dr: To the question "what does it take to upgrade OpenStack", my personal answer is: less than 2K lines of dash script. I'll describe its internals here, and why I believe it is the correct solution.
Why write this blog post? During FOSDEM 2024, I was asked "how do you handle upgrades". I answered with a big smile and a short "with a very small shell script", as I couldn't explain in 2 minutes how it was done. Just saying "it is great this way" doesn't give readers enough hints to trust it. Why and how did I do it "the right way"? This blog post is an attempt to answer that question more thoroughly.
Upgrading OpenStack in production I wrote this script maybe 2 or 3 years ago, though I'm only blogging about it today because I did such an upgrade on a public cloud in production last Tuesday evening (i.e. the first region of the Infomaniak public cloud). I'd say the cluster is moderately large (as of today: about 8K+ VMs running, 83 compute nodes, 12 network nodes, for a total of 10880 physical CPU cores and 125 TB of RAM if I only count the compute servers). It took only 4 hours to do the upgrade (though I have already written some more code to speed this up for next time). It went super smoothly, without a glitch. I mostly just sat reading the script output, and went to bed once it finished running. The next day, all my colleagues at Infomaniak were nicely congratulating me that it went that smoothly (a big thanks to all of you who did). I couldn't have dreamed of a smoother upgrade! :) Still not impressed? Boring read? Then let's dive into more technical details.
Intention behind the implementation My script isn't perfect. I won't ever pretend it is. But at least it minimizes the downtime of every OpenStack service. It is also a by-the-book implementation of what's written in the OpenStack documentation, following all upstream advice. As a result, it is fully seamless for some OpenStack services, and as HA as OpenStack can be for others. The upgrade process is of course idempotent and can be re-run in case of failure. Here's why.
General idea My upgrade script does things in a certain order, respecting what is documented about upgrades in the OpenStack documentation.
Installing dependencies The first thing the upgrade script does is install the new dependencies. For this, a static list of all needed dependency upgrades is maintained between each release of OpenStack, and for each type of node. Then, for all packages in this list, the script checks with dpkg-query that the package is really installed, and with apt-cache policy that it really is going to be upgraded (maybe there's an easier way to do this?). This way, no package is marked as manually installed by mistake during the upgrade process. This ensures that apt-get purge / autoremove really does what it should, and that the script is really idempotent. The idea, then, is that once all dependencies are installed, upgrading and restarting leaf packages (i.e. OpenStack services like Nova, Glance, Cinder, etc.) is very fast, because the apt-get command doesn't need to install all the dependencies. So at this point, doing apt-get install python3-cinder for example (which will also, thanks to dependencies, upgrade cinder-api and cinder-scheduler if it's on a controller node) only takes a few seconds. This principle applies to all nodes (controller nodes, network nodes, compute nodes, etc.), which helps a lot in speeding up the upgrade and reducing unavailability.
hapc At its core, the oci-cluster-upgrade-openstack-release script uses haproxy-cmd (i.e. /usr/bin/hapc) to drain each API server that is about to be upgraded from haproxy. Hapc is a simple Python wrapper around the haproxy admin socket: it sends commands to it with an easy-to-understand CLI. So it is possible to reliably upgrade an API service only after it has been drained away. Draining means waiting for the last query to finish and the client to disconnect from HTTP before giving the backend server any more queries. If you do not know hapc / haproxy-cmd, I recommend trying it: it's going to be hard for you to stop using it once you've tested it. Its bash-completion script makes it VERY easy to use, and it is helpful in production. But not only that: it is also nice to have when writing this type of upgrade script. Let's dive into haproxy-cmd.
Example of how to use haproxy-cmd Let me show you. First, ssh into one of the 3 controllers and find where the virtual IP (VIP) is located with crm resource locate openstack-api-vip or with a (simpler) crm status. Let's ssh into the server that has the VIP, and now let's drain it away from haproxy.
$ hapc list-backends
$ hapc drain-server --backend glancebe --server cl1-controller-1.infomaniak.ch --verbose --wait --timeout 50
$ apt-get install glance-api
$ hapc enable-server --backend glancebe --server cl1-controller-1.infomaniak.ch
Upgrading the control plane My upgrade script leverages hapc just like above. For each OpenStack project, it's done in this order on the first node holding the VIP: Starting at [1], the risk is that other nodes may have a new version of the database schema, but an old version of the code that isn't compatible with it. But this window doesn't last long, because the next step is to take care of the other (usually 2) nodes of the OpenStack control plane. So while there's technically zero downtime, some issues may still happen between [1] and [2] above, because the new DB schema and the old code (both for the API and the other services) are up and running at the same time. These are however supposed to be rare cases (some OpenStack projects don't even have DB changes between some OpenStack releases, and it often continues to work for most queries with the upgraded DB), and the cluster will be in that state for a very short time, so that's fine, and better than a full API downtime.
Satellite services Then there are the satellite services that need to be upgraded, like Neutron, Nova and Cinder. Nova is the least offender, as it has all the code needed to rewrite JSON object schemas on the fly so that it continues to work during an upgrade. It's a known issue that Cinder doesn't have this feature (last time I checked), and it's probably the same for Neutron (maybe recent-ish versions of OpenStack do use oslo.versionedobjects?). Anyway, upgrades on these nodes are done right after the control plane for each service.
Parallelism and upgrade timings As we're dealing with potentially hundreds of nodes per cluster, a lot of operations are performed in parallel. I chose to simply use the shell's & with some wait logic so that not too many jobs run in parallel. For example, when disabling SSH on all nodes, this is done 24 nodes at a time, which is fine. The number of nodes depends on the type of operation being performed. For example, while it's perfectly OK to disable puppet on 24 nodes at the same time, it is not OK to do that with Neutron services. In fact, each time a Neutron agent is restarted, the script explicitly waits for 30 seconds. This conveniently avoids a hailstorm of messages in RabbitMQ and neutron-rpc-server becoming too busy. All of this waiting is necessary, and it is one of the reasons why it can sometimes take that long to upgrade a (moderately big) cluster.
Not using config management tooling Some of my colleagues would have preferred that I used something like Ansible. However, there's no reason to use such a tool if the idea is just to run some shell commands on every server. It is way more efficient (in terms of programming) to just use bash / dash to do the work. And if you want my point of view about Ansible: using YAML for this kind of programming would be crazy. YAML is simply not suited to a job where if, case, and loops are needed. I am well aware that Ansible has workarounds and it could be done, but it wasn't my choice.

16 February 2026

Antoine Beaupré: Keeping track of decisions using the ADR model

In the Tor Project system Administrator's team (colloquially known as TPA), we've recently changed how we take decisions, which means you'll get clearer communications from us about upcoming changes or targeted questions about a proposal. Note that this change only affects the TPA team. At Tor, each team has its own way of coordinating and making decisions, and so far this process is only used inside TPA. We encourage other teams inside and outside Tor to evaluate this process to see if it can improve your processes and documentation.

The new process We had traditionally been using a "RFC" ("Request For Comments") process and have recently switched to "ADR" ("Architecture Decision Record"). The ADR process is, for us, pretty simple. It consists of three things:
  1. a simpler template
  2. a simpler process
  3. communication guidelines separate from the decision record

The template As team lead, the first thing I did was to propose a new template (in ADR-100), a variation of the Nygard template. The TPA variation of the template is similarly simple, as it has only 5 headings, and is worth quoting in full:
  • Context: What is the issue that we're seeing that is motivating this decision or change?
  • Decision: What is the change that we're proposing and/or doing?
  • Consequences: What becomes easier or more difficult to do because of this change?
  • More Information (optional): What else should we know? For larger projects, consider including a timeline and cost estimate, along with the impact on affected users (perhaps including existing Personas). Generally, this includes a short evaluation of alternatives considered.
  • Metadata: status, decision date, decision makers, consulted, informed users, and link to a discussion forum
The previous RFC template had 17 (seventeen!) headings, which encouraged much longer documents. Now, the decision record will be easier to read and digest at one glance. An immediate effect of this is that I've started using GitLab issues more for comparisons and brainstorming. Instead of dumping in a document all sorts of details like pricing or in-depth alternatives comparison, we record those in the discussion issue, keeping the document shorter.

The process The whole process is simple enough that it's worth quoting in full as well:
Major decisions are introduced to stakeholders in a meeting, smaller ones by email. A delay allows people to submit final comments before adoption.
Now, of course, the devil is in the details (and ADR-101), but the point is to keep things simple. A crucial aspect of the proposal, which Jacob Kaplan-Moss calls the one weird trick, is to "decide who decides". Our previous process was vague about who makes the decision and the new template (and process) clarifies decision makers, for each decision. Inversely, some decisions degenerate into endless discussions around trivial issues because too many stakeholders are consulted, a problem known as the Law of triviality, also known as the "Bike Shed syndrome". The new process better identifies stakeholders:
  • "informed" users (previously "affected users")
  • "consulted" (previously undefined!)
  • "decision maker" (instead of the vague "approval")
Picking those stakeholders is still tricky, but our definitions are more explicit and aligned to the classic RACI matrix (Responsible, Accountable, Consulted, Informed).

Communication guidelines Finally, a crucial part of the process (ADR-102) is to decouple the act of making and recording decisions from communicating about the decision. Those are two radically different problems to solve. We have found that a single document can't serve both purposes. Because ADRs can affect a wide range of things, we don't have a specific template for communications. We suggest the Five Ws method (Who? What? When? Where? Why?) and, again, to keep things simple.

How we got there The ADR process is not something I invented. I first stumbled upon it in the Thunderbird Android project. Then, in parallel, I was in the process of reviewing the RFC process, following Jacob Kaplan-Moss's criticism of the RFC process. Essentially, he argues that:
  1. the RFC process "doesn't include any sort of decision-making framework"
  2. "RFC processes tend to lead to endless discussion"
  3. the process "rewards people who can write to exhaustion"
  4. "these processes are insensitive to expertise", "power dynamics and power structures"
And, indeed, I have been guilty of a lot of those issues. A verbose writer, I have written extremely long proposals that I suspect no one has ever fully read. Some proposals were adopted by exhaustion, or ignored because not looping in the right stakeholders. Our discussion issue on the topic has more details on the issues I found with our RFC process. But to give credit to the old process, it did serve us well while it was there: it's better than nothing, and it allowed us to document a staggering number of changes and decisions (95 RFCs!) made over the course of 6 years of work.

What's next? We're still experimenting with the communication around decisions, as this text might suggest. Because it's a separate step, we also have a tendency to forget or postpone it, like this post, which comes a couple of months late. Previously, we'd just ship a copy of the RFC to everyone, which was easy and quick, but incomprehensible to most. Now we need to write a separate communication, which is more work but, hopefully, worth it, as the result is more digestible. We can't wait to hear what you think of the new process and how it works for you, here or in the discussion issue! We're particularly interested in people who are already using a similar process, or who will adopt one after reading this.
Note: this article was also published on the Tor Blog.

5 January 2026

Colin Watson: Free software activity in December 2025

About 95% of my Debian contributions this month were sponsored by Freexian. You can also support my work directly via Liberapay or GitHub Sponsors. Python packaging I upgraded these packages to new upstream versions: Python 3.14 is now a supported version in unstable, and we're working to get that into testing. As usual this is a pretty arduous effort because it requires going round and fixing lots of odds and ends across the whole ecosystem. We can deal with a fair number of problems by keeping up with upstream (see above), but there tends to be a long tail of packages whose upstreams are less active and where we need to chase them, or where problems only show up in Debian for one reason or another. I spent a lot of time working on this: Fixes for pytest 9: I filed lintian: Report Python egg-info files/directories to help us track the migration to pybuild-plugin-pyproject. I did some work on dh-python: Normalize names in pydist lookups and pyproject plugin: Support headers (the latter of which allowed converting python-persistent and zope.proxy to pybuild-plugin-pyproject, although it needed a follow-up fix). I fixed or helped to fix several other build/test failures: Other bugs: Other bits and pieces Code reviews

4 January 2026

Matthew Garrett: What is a PC compatible?

Wikipedia says "An IBM PC compatible is any personal computer that is hardware- and software-compatible with the IBM Personal Computer (IBM PC) and its subsequent models". But what does this actually mean? The obvious literal interpretation is that for a device to be PC compatible, all software originally written for the IBM 5150 must run on it. Is this a reasonable definition? Is it one that any modern hardware can meet? Before we dig into that, let's go back to the early days of the x86 industry. IBM had launched the PC built almost entirely around off-the-shelf Intel components, and shipped full schematics in the IBM PC Technical Reference Manual. Anyone could buy the same parts from Intel and build a compatible board. They'd still need an operating system, but Microsoft was happy to sell MS-DOS to anyone who'd turn up with money. The only thing stopping people from cloning the entire board was the BIOS, the component that sat between the raw hardware and much of the software running on it. The concept of a BIOS originated in CP/M, an operating system originally written in the 70s for systems based on the Intel 8080. At that point in time there was no meaningful standardisation - systems might use the same CPU but otherwise have entirely different hardware, and any software that made assumptions about the underlying hardware wouldn't run elsewhere. CP/M's BIOS was effectively an abstraction layer, a set of code that could be modified to suit the specific underlying hardware without needing to modify the rest of the OS. As long as applications only called BIOS functions, they didn't need to care about the underlying hardware and would run on all systems that had a working CP/M port. By 1979, boards based on the 8086, Intel's successor to the 8080, were hitting the market. The 8086 wasn't machine code compatible with the 8080, but 8080 assembly code could be assembled to 8086 instructions to simplify porting old code. Despite this, the 8086 version of CP/M was taking some time to appear, and a company called Seattle Computer Products started producing a new OS closely modelled on CP/M and using the same BIOS abstraction layer concept. When IBM started looking for an OS for their upcoming 8088 (an 8086 with an 8-bit data bus rather than a 16-bit one) based PC, a complicated chain of events resulted in Microsoft paying a one-off fee to Seattle Computer Products, porting their OS to IBM's hardware, and the rest is history. But one key part of this was that despite what was now MS-DOS existing only to support IBM's hardware, the BIOS abstraction remained, and the BIOS was owned by the hardware vendor - in this case, IBM. One key difference, though, was that while CP/M systems typically included the BIOS on boot media, IBM integrated it into ROM. This meant that MS-DOS floppies didn't include all the code needed to run on a PC - you needed IBM's BIOS. To begin with this wasn't obviously a problem in the US market since, in a way that seems extremely odd from where we are now in history, it wasn't clear that machine code was actually copyrightable. In 1982 Williams v. Artic determined that it could be, even if fixed in ROM - this ended up having broader industry impact in Apple v. Franklin, and it became clear that clone machines making use of the original vendor's ROM code weren't going to fly. Anyone wanting to make hardware compatible with the PC was going to have to find another way. And here's where things diverge somewhat.
Compaq famously performed clean-room reverse engineering of the IBM BIOS to produce a functionally equivalent implementation without violating copyright. Other vendors, well, were less fastidious - they came up with BIOS implementations that either implemented a subset of IBM's functionality, or didn't implement all the same behavioural quirks, and compatibility was restricted. In this era several vendors shipped customised versions of MS-DOS that supported different hardware (which you'd think wouldn't be necessary given that's what the BIOS was for, but still), and the set of PC software that would run on their hardware varied wildly. This was the era where vendors even shipped systems based on the Intel 80186, an improved 8086 that was both faster than the 8086 at the same clock speed and was also available at higher clock speeds. Clone vendors saw an opportunity to ship hardware that outperformed the PC, and some of them went for it. You'd think that IBM would have immediately jumped on this as well, but no - the 80186 integrated many components that were separate chips on 8086 (and 8088) based platforms, but crucially didn't maintain compatibility. As long as everything went via the BIOS this shouldn't have mattered, but there were many cases where going via the BIOS introduced performance overhead or simply didn't offer the functionality that people wanted, and since this was the era of single-user operating systems with no memory protection, there was nothing stopping developers from just hitting the hardware directly to get what they wanted. Changing the underlying hardware would break them. And that's what happened. IBM was the biggest player, so people targeted IBM's platform. When BIOS interfaces weren't sufficient they hit the hardware directly - and even if they weren't doing that, they'd end up depending on behavioural quirks of IBM's BIOS implementation. The market for DOS-compatible but not PC-compatible mostly vanished, although there were notable exceptions - in Japan the PC-98 platform achieved significant success, largely as a result of the Japanese market being pretty distinct from the rest of the world at that point in time, but also because it actually handled Japanese at a point where the PC platform was basically restricted to ASCII or minor variants thereof. So, things remained fairly stable for some time. Underlying hardware changed - the 80286 introduced the ability to access more than a megabyte of address space and would promptly have broken a bunch of things except IBM came up with an utterly terrifying hack that bit me back in 2009, and which ended up sufficiently codified into Intel design that it was one mechanism for breaking the original Xbox security. The first 286 PC even introduced a new keyboard controller that supported better keyboards but which remained backwards compatible with the original PC to avoid breaking software. Even when IBM launched the PS/2, the first significant rearchitecture of the PC platform with a brand new expansion bus and associated patents to prevent people cloning it without paying off IBM, they made sure that all the hardware was backwards compatible. For decades, PC compatibility meant not only supporting the officially supported interfaces, it meant supporting the underlying hardware. This is what made it possible to ship install media that was expected to work on any PC, even if you'd need some additional media for hardware-specific drivers.
It's something that still distinguishes the PC market from the ARM desktop market. But it's not as true as it used to be, and it's interesting to think about whether it ever was as true as people thought. Let's take an extreme case. If I buy a modern laptop, can I run 1981-era DOS on it? The answer is clearly no. First, modern systems largely don't implement the legacy BIOS. The entire abstraction layer that DOS relies on isn't there, having been replaced with UEFI. When UEFI first appeared it generally shipped with a Compatibility Support Module, a layer that would translate BIOS interrupts into UEFI calls, allowing vendors to ship hardware with more modern firmware and drivers without having to duplicate them to support older operating systems[1]. Is this system PC compatible? By the strictest of definitions, no. Ok. But the hardware is broadly the same, right? There are projects like CSMWrap that allow a CSM to be implemented on top of stock UEFI, so everything that hits BIOS should work just fine. And well yes, assuming they implement the BIOS interfaces fully, anything using the BIOS interfaces will be happy. But what about stuff that doesn't? Old software is going to expect that my Sound Blaster is going to be on a limited set of IRQs and is going to assume that it's going to be able to install its own interrupt handler and ACK those on the interrupt controller itself, and that's really not going to work when you have a PCI card that's been mapped onto some APIC vector, and also if your keyboard is attached via USB or SPI then reading it via the CSM will work (because it's calling into UEFI to get the actual data) but trying to read the keyboard controller directly won't[2], so you're still actually relying on the firmware to do the right thing but it's not, because the average person who wants to run DOS on a modern computer owns three fursuits and some knee length socks and while you are important and vital and I love you all you're not enough to actually convince a transglobal megacorp to flip the bit in the chipset that makes all this old stuff work. But imagine you are, or imagine you're the sort of person who (like me) thinks writing their own firmware for their weird Chinese Thinkpad knockoff motherboard is a good and sensible use of their time - can you make this work fully? Haha no of course not. Yes, you can probably make sure that the PCI Sound Blaster that's plugged into a Thunderbolt dock has interrupt routing to something that is absolutely no longer an 8259 but is pretending to be, so you can just handle IRQ 5 yourself, and you can probably still even write some SMM code that will make your keyboard work, but what about the corner cases? What if you're trying to run something built with IBM Pascal 1.0? There's a risk that it'll assume that trying to access an address just over 1MB will give it the data stored just above 0, and now it'll break. It'd work fine on an actual PC, and it won't work here, so are we PC compatible? That's a very interesting abstract question and I'm going to entirely ignore it. Let's talk about PC graphics[3]. The original PC shipped with two different optional graphics cards - the Monochrome Display Adapter and the Color Graphics Adapter. If you wanted to run games you were doing it on CGA, because MDA had no mechanism to address individual pixels so you could only render full characters. So, even on the original PC, there was software that would run on some hardware but not on other hardware. Things got worse from there.
CGA was, to put it mildly, shit. Even IBM knew this - in 1984 they launched the PCjr, intended to make the PC platform more attractive to home users. As well as maybe the worst keyboard ever to be associated with the IBM brand, IBM added some new video modes that allowed displaying more than 4 colours on screen at once[4], and software that depended on that wouldn't display correctly on an original PC. Of course, because the PCjr was a complete commercial failure, it wouldn't display correctly on any future PCs either. This is going to become a theme. There's never been a properly specified PC graphics platform. BIOS support for advanced graphics modes[5] ended up specified by VESA rather than IBM, and even then getting good performance involved hitting hardware directly. It wasn't until Microsoft specced DirectX that anything was broadly usable even if you limited yourself to Microsoft platforms, and this was an OS-level API rather than a hardware one. If you stick to BIOS interfaces then CGA-era code will work fine on graphics hardware produced up until the 20-teens, but if you were trying to hit CGA hardware registers directly then you're going to have a bad time. This isn't even a new thing - even if we restrict ourselves to the authentic IBM PC range (and ignore the PCjr), by the time we get to the Enhanced Graphics Adapter we're not entirely CGA compatible. Is an IBM PC/AT with EGA PC compatible? You'd likely say "yes", but there's software written for the original PC that won't work there. And, well, let's go even more basic. The original PC had a well defined CPU frequency and a well defined CPU that would take a well defined number of cycles to execute any given instruction. People could write software that depended on that. When CPUs got faster, some software broke. This resulted in systems with a Turbo Button - a button that would drop the clock rate to something approximating the original PC so stuff would stop breaking. It's fine, we'd later end up with Windows crashing on fast machines because hardware details will absolutely bleed through. So, what's a PC compatible? No modern PC will run the DOS that the original PC ran. If you try hard enough you can get it into a state where it'll run most old software, as long as it doesn't have assumptions about memory segmentation or your CPU or want to talk to your GPU directly. And even then it'll potentially be unusable or crash because time is hard. The truth is that there's no way we can technically describe a PC Compatible now - or, honestly, ever. If you sent a modern PC back to 1981 the media would be amazed and also point out that it didn't run Flight Simulator. PC Compatible is a socially defined construct, just like "Woman". We can get hung up on the details or we can just chill.

  1. Windows 7 is entirely happy to boot on UEFI systems except that it relies on being able to use a BIOS call to set the video mode during boot, which has resulted in things like UEFISeven being created to make that work on modern systems that don't provide BIOS compatibility
  2. Back in the 90s and early 2000s operating systems didn't necessarily have native drivers for USB input devices, so there was hardware support for trapping OS accesses to the keyboard controller and redirecting that into System Management Mode, where some software that was invisible to the OS would speak to the USB controller and then fake a response - anyway, that's how I made a laptop that could boot unmodified MacOS X
  3. (my name will not be Wolfwings Shadowflight)
  4. Yes yes ok 8088 MPH demonstrates that if you really want to you can do better than that on CGA
  5. and by "advanced" we're still talking about the 90s, don't get excited

1 January 2026

Russ Allbery: Review: This Brutal Moon

Review: This Brutal Moon, by Bethany Jacobs
Series: Kindom Trilogy #3
Publisher: Orbit
Copyright: December 2025
ISBN: 0-316-46373-6
Format: Kindle
Pages: 497
This Brutal Moon is a science fiction thriller with bits of cyberpunk and space opera. It concludes the trilogy begun with These Burning Stars. The three books tell one story in three volumes, and ideally you would read all three in close succession. There is a massive twist in the first book that I am still not trying to spoil, so please forgive some vague description. At the conclusion of These Burning Stars, Jacobs had moved a lot of pieces into position, but it was not yet clear to me where the plot was going, or even if it would come to a solid ending in three volumes as promised by the series title. It does. This Brutal Moon opens with some of the political maneuvering that characterized These Burning Stars, but once things start happening, the reader gets all of the action they could wish for and then some. I am pleased to report that, at least as far as I'm concerned, Jacobs nails the ending. Not only is it deeply satisfying, the characterization in this book is so good, and adds so smoothly to the characterization of the previous books, that I saw the whole series in a new light. I thought this was one of the best science fiction series finales I've ever read. Take that with a grain of salt, since some of those reasons are specific to me and the mood I was in when I read it, but this is fantastic stuff. There is a lot of action at the climax of this book, split across at least four vantage points and linked in a grand strategy with chaotic surprises. I kept all of the pieces straight and understood how they were linked thanks to Jacobs's clear narration, which is impressive given the number of pieces in motion. That's not the heart of this book, though. The action climax is payoff for the readers who want to see some ass-kicking, and it does contain some moving and memorable moments, but it relies on some questionable villain behavior and a convenient plot device introduced only in this volume. The action-thriller payoff is competent but not, I think, outstanding. What put this book into a category of its own were the characters, and specifically how Jacobs assembles sweeping political consequences from characters who, each alone, would never have brought about such a thing, and in some cases had little desire for it. Looking back on the trilogy, I think Jacobs has captured, among all of the violence and action-movie combat and space-opera politics, the understanding that political upheaval is a relay race. The people who have the personalities to start it don't have the personality required to nurture it or supply it, and those who can end it are yet again different. This series is a fascinating catalog of political actors the instigator, the idealist, the pragmatist, the soldier, the one who supports her friends, and several varieties and intensities of leaders and it respects all of them without anointing any of them as the One True Revolutionary. The characters are larger than life, yes, and this series isn't going to win awards for gritty realism, but it's saying something satisfyingly complex about where we find courage and how a cause is pushed forward by different people with different skills and emotions at different points in time. Sometimes accidentally, and often in entirely unexpected ways. As before, the main story is interwoven with flashbacks. This time, we finally see the full story of the destruction of the moon of Jeve. 
The reader has known about this since the first volume, but Jacobs has a few more secrets to show (including, I will admit, setting up a plot device) and some pointed commentary on resource extraction economies. I think this part of the book was a bit obviously constructed, although the characterization was great and the visible junction points of the plot didn't stop me from enjoying the thrill when the pieces came together. But the best part of this book was the fact there was 10% of it left after the climax. Jacobs wrote an actual denouement, and it was everything I wanted and then some. We get proper story conclusions for each of the characters, several powerful emotional gut punches, some remarkably subtle and thoughtful discussion of political construction for a series that tended more towards space-opera action, and a conclusion for the primary series relationship that may not be to every reader's taste but was utterly, perfectly, beautifully correct for mine. I spent a whole lot of the last fifty pages of this book trying not to cry, in the best way. The character evolution over the course of this series is simply superb. Each character ages like fine wine, developing more depth, more nuance, but without merging. They become more themselves, which is an impressive feat across at least four very different major characters. You can see the vulnerabilities and know what put them there, you can see the strengths they developed to compensate, and you can see why they need the support the other characters provide. And each of them is so delightfully different. This was so good. This was so precisely the type of story that I was in the mood for, with just the type of tenderness for its characters that I wanted, that I am certain I am not objective about it. It will be one of those books where other people will complain about flaws that I didn't see or didn't care about because it was doing the things I wanted from it so perfectly. It's so good that it elevated the entire trilogy; the journey was so worth the ending. I'm afraid this review will be less than helpful because it's mostly nonspecific raving. This series is such a spoiler minefield that I'd need a full spoiler review to be specific, but my reaction is so driven by emotion that I'm not sure that would help if the characters didn't strike you the way that they struck me. I think the best advice I can offer is to say that if you liked the emotional tone of the end of These Burning Stars (not the big plot twist, the character reaction to the political goal that you learn drove the plot), stick with the series, because that's a sign of the questions Jacobs is asking. If you didn't like the characters at the end (not the middle) of the first novel, bail out, because you're going to get a lot more of that. Highly, highly recommended, and the best thing I've read all year, with the caveats that you should read the content notes, and that some people are going to bounce off this series because it's too intense and melodramatic. That intensity will not let up, so if that's not what you're in the mood for, wait on this trilogy until you are. Content notes: Graphic violence, torture, mentions of off-screen child sexual assault, a graphic corpse, and a whole lot of trauma. One somewhat grumbly postscript: This is the sort of book where I need to not read other people's reviews because I'll get too defensive of it (it's just a book I liked!). 
But there is one bit of review commentary I've seen about the trilogy that annoys me enough I have to mention it. Other reviewers seem to be latching on to the Jeveni (an ethnic group in the trilogy) as Space Jews and then having various feelings about that. I can see some parallels, I'm not going to say that it's completely wrong, but I also beg people to read about a fictional oppressed ethnic and religious minority and not immediately think "oh, they must be stand-ins for Jews." That's kind of weird? And people from the US, in particular, perhaps should not read a story about an ethnic group enslaved due to their productive skill and economic value and think "they must be analogous to Jews, there are no other possible parallels here." There are a lot of other comparisons that can be made, including to the commonalities between the methods many different oppressed minorities have used to survive and preserve their culture. Rating: 10 out of 10

19 December 2025

Otto Kekäläinen: Backtesting trailing stop-loss strategies with Python and market data

In January 2024 I wrote about the insanity of the Magnificent Seven dominating the MSCI World Index, and I wondered how long the number could continue to go up. It has continued to surge upward at an accelerating pace, which makes me worry that a crash is likely getting closer. As a software professional, I decided to analyze whether using stop-loss orders could reliably automate avoiding deep drawdowns. As everyone with some savings in the stock market (hopefully) knows, the stock market eventually experiences crashes. It is just a matter of when and how deep the crash will be. Staying on the sidelines for years is not a good investment strategy, as inflation will erode the value of your savings. Assuming the current true inflation rate is around 7%, a restaurant dinner that costs 20 euros today will cost 24.50 euros in three years. Savings of 1000 euros today would drop in purchasing power from 50 dinners to only 40 dinners in three years. Hence, if you intend to retain the value of your hard-earned savings, they need to be invested in something that grows in value. Most people try to beat inflation by buying shares in stable companies, directly or via broad market ETFs. These historically grow faster than inflation during normal years, but likely drop in value during recessions.

What is a trailing stop-loss order? What if you could buy stocks to benefit from their value increasing without having to worry about a potential crash? All modern online stock brokers have a feature called a stop-loss order, where you enter a price at which your stocks automatically get sold if they drop to that price. A trailing stop-loss order is similar, but instead of a fixed price, you enter a margin (e.g. 10%). If the stock price rises, the stop-loss price will trail upwards by that margin. For example, if you buy a share at 100 euros and it has risen to 110 euros, a 10% trailing stop-loss order automatically sells it if the price drops 10% from the peak of 110 euros, i.e. at 99 euros. Thus, no matter what happens, you only lose 1 euro. And if the stock price continues to rise to 150 euros, the trailing stop-loss would automatically readjust to 150 euros minus 10%, which is 135 euros (150 - 15 = 135). If the price then dropped to 135 euros, you would lock in a gain of 35 euros, which is not the peak price of 150 euros, but still better than whatever the price falls to as a result of a large crash. In the simple case above it obviously makes sense in theory, but it might not make sense in practice. Prices constantly oscillate, so you don't want a margin that is too small, otherwise you exit too early. Conversely, having a large margin may result in too large a drawdown before exiting. If markets crash rapidly, it might be that nobody buys your stocks at the stop-loss price, and the shares have to be sold at an even lower price. Also, what will you do once the position is sold? The reason you invested in the stock market was to avoid holding cash, so would you buy the same stock back when the crash bottoms out? But how will you know when the bottom has been reached?
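To make the mechanics concrete in code, here is a small sketch of my own (not from the post) that walks a price series and tracks the trailing stop level, using the 10% margin from the example above. The price path is made up purely for illustration:
python
# Sketch of the trailing stop-loss mechanics described above (10% margin).
def trailing_stop_exit(prices, margin=0.10):
    """Return (exit_price, stop_level) at the first price that falls to/below the trailing stop."""
    peak = prices[0]
    for price in prices:
        peak = max(peak, price)           # the stop trails the highest price seen so far
        stop = peak * (1 - margin)        # e.g. peak 110 -> stop 99, peak 150 -> stop 135
        if price <= stop:
            return price, stop            # stop triggered: the position is sold here
    return None, peak * (1 - margin)      # never triggered

# Buy at 100, the price peaks at 110, then dips: the position is sold at (about) 99.
exit_price, stop = trailing_stop_exit([100, 105, 110, 104, 99, 120])
print(exit_price, stop)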

Backtesting stock market strategies with Python, YFinance, Pandas and Lightweight Charts I am not a professional investor, and nobody should take investment advice from me. However, I know what backtesting is and how to leverage open source software. So, I wrote a Python script to test if the trading strategy of using trailing stop-loss orders with specific margin values would have worked for a particular stock. First you need to have data. YFinance is a handy Python library that can be used to download the historic price data for any stock ticker on Yahoo.com. Then you need to manipulate the data. Pandas is the Python data analysis library with advanced data structures for working with relational or labeled data. Finally, to visualize the results, I used Lightweight Charts, which is a fast, interactive library for rendering financial charts, allowing you to plot the stock price, the trailing stop-loss line, and the points where trades would have occurred. I really like how the zoom is implemented in Lightweight Charts, which makes drilling into the data points feel effortless. The full solution is not polished enough to be published for others to use, but you can piece together your own by reusing some of the key snippets. To avoid re-downloading the same data repeatedly, I implemented a small caching wrapper that saves the data locally (as Parquet files):
python
from datetime import datetime

import pandas
import yfinance

# TICKER, START_DATE and CACHE_DIR (a pathlib.Path) are configuration constants
# defined earlier in the script, e.g. TICKER = "BNP.PA".
CACHE_DIR.mkdir(parents=True, exist_ok=True)
end_date = datetime.today().strftime("%Y-%m-%d")
cache_file = CACHE_DIR / f"{TICKER}-{START_DATE}--{end_date}.parquet"

if cache_file.is_file():
    dataframe = pandas.read_parquet(cache_file)
    print(f"Loaded price data from cache: {cache_file}")
else:
    dataframe = yfinance.download(
        TICKER,
        start=START_DATE,
        end=end_date,
        progress=False,
        auto_adjust=False,
    )
    dataframe.to_parquet(cache_file)
    print(f"Fetched new price data from Yahoo Finance and cached to: {cache_file}")
The dataframe is a Pandas object with a powerful API. For example, to print a snippet from the beginning and the end of the dataframe to see what the data looks like, you can use:
python
print("First 5 rows of the raw data:")
print(df.head())
print("Last 5 rows of the raw data:")
print(df.tail())
Example output:
First 5 rows of the raw data
Price Adj Close Close High Low Open Volume
Ticker BNP.PA BNP.PA BNP.PA BNP.PA BNP.PA BNP.PA
Date
2014-01-02 29.956285 55.540001 56.910000 55.349998 56.700001 316552
2014-01-03 30.031801 55.680000 55.990002 55.290001 55.580002 210044
2014-01-06 30.080338 55.770000 56.230000 55.529999 55.560001 185142
2014-01-07 30.943321 57.369999 57.619999 55.790001 55.880001 370397
2014-01-08 31.385597 58.189999 59.209999 57.750000 57.790001 489940
Last 5 rows of the raw data
Price Adj Close Close High Low Open Volume
Ticker BNP.PA BNP.PA BNP.PA BNP.PA BNP.PA BNP.PA
Date
2025-12-11 78.669998 78.669998 78.919998 76.900002 76.919998 357918
2025-12-12 78.089996 78.089996 80.269997 78.089996 79.470001 280477
2025-12-15 79.080002 79.080002 79.449997 78.559998 78.559998 233852
2025-12-16 78.860001 78.860001 79.980003 78.809998 79.430000 283057
2025-12-17 80.080002 80.080002 80.150002 79.080002 79.199997 262818
Adding new columns to the dataframe is easy. For example, I used a custom function to calculate the Relative Strength Index (RSI). To add a new RSI column with a value for every row based on that row's price, only one line of code is needed, with no explicit loops:
python
df["RSI"] = compute_rsi(df["price"], period=14)
After manipulating the data, the series can be converted into an array structure and printed as JSON into a placeholder in an HTML template:
python
import json

import jinja2

# Convert the Pandas series into a list of {"time": ..., "value": ...} dicts
baseline_series = [
    {"time": ts, "value": val}
    for ts, val in df_plot[["timestamp", BASELINE_LABEL]].itertuples(index=False)
]

baseline_json = json.dumps(baseline_series)

# Load the Jinja2 template from disk and render it with the data
with open("template.html", encoding="utf-8") as f:
    template = jinja2.Template(f.read())
rendered_html = template.render(
    title=title,
    heading=heading,
    description=description_html,
    # ...
    baseline_json=baseline_json,
    # ...
)

with open("report.html", "w", encoding="utf-8") as f:
    f.write(rendered_html)
print("Report generated!")
In the HTML template, the placeholder variables in Jinja syntax get replaced with the actual JSON:
html
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <title>{{ title }}</title>
  ...
</head>
<body>
  <h1>{{ heading }}</h1>
  <div id="chart"></div>
  <script>
  // Ensure the DOM is ready before we initialise the chart
  document.addEventListener('DOMContentLoaded', () => {
    // Parse the JSON data passed from Python
    const baselineData = {{ baseline_json | safe }};
    const strategyData = {{ strategy_json | safe }};
    const markersData = {{ markers_json | safe }};

    // Create the chart
    const chart = LightweightCharts.createChart(document.getElementById('chart'), {
      width: document.getElementById('chart').clientWidth,
      height: 500,
      layout: {
        background: { color: "#222" },
        textColor: "#ccc"
      },
      grid: {
        vertLines: { color: "#555" },
        horzLines: { color: "#555" }
      }
    });

    // Add baseline series
    const baselineSeries = chart.addLineSeries({
      title: '{{ baseline_label }}',
      lastValueVisible: false,
      priceLineVisible: false,
      priceLineWidth: 1
    });
    baselineSeries.setData(baselineData);

    baselineSeries.priceScale().applyOptions({
      entireTextOnly: true
    });

    // Add strategy series
    const strategySeries = chart.addLineSeries({
      title: '{{ strategy_label }}',
      lastValueVisible: false,
      priceLineVisible: false,
      color: '#FF6D00'
    });
    strategySeries.setData(strategyData);

    // Add buy/sell markers to the strategy series
    strategySeries.setMarkers(markersData);

    // Fit the chart to show the full data range (full zoom)
    chart.timeScale().fitContent();
  });
  </script>
</body>
</html>
There are also Python libraries built specifically for backtesting investment strategies, such as Backtrader and Zipline, but they do not seem to be actively maintained, and they probably have far more features and complexity than I needed for this simple test. The screenshot below shows an example of backtesting a strategy on the Waste Management Inc stock from January 2015 to December 2025. The baseline "buy and hold" scenario is shown as the blue line, and it fully tracks the stock price, while the orange line shows how the strategy would have performed, with markers for the sells and buys along the way. Backtest run example
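For readers who want to piece the rest together, the core of such a backtest can be sketched roughly as below. This is my own illustration, not the author's script: the all-in/all-out trades, the absence of fees and slippage, and the simple re-entry rule are all assumptions; only the df["price"] column follows the snippets above.
python
# Rough sketch of a trailing stop-loss backtest against daily close prices in df["price"].
# Assumptions (not the author's actual logic): all-in/all-out trades, no fees or slippage,
# re-entry once the price has bounced a fixed percentage off its post-sale low.
def backtest_trailing_stop(df, margin=0.10, reentry=0.05, start_cash=10_000.0):
    cash, shares = 0.0, start_cash / df["price"].iloc[0]   # start fully invested
    peak = trough = df["price"].iloc[0]
    for price in df["price"]:
        if shares > 0:
            peak = max(peak, price)
            if price <= peak * (1 - margin):       # trailing stop hit: sell everything
                cash, shares, trough = shares * price, 0.0, price
        else:
            trough = min(trough, price)
            if price >= trough * (1 + reentry):    # naive re-entry after a bounce
                shares, cash, peak = cash / price, 0.0, price
    return cash + shares * df["price"].iloc[-1]    # final portfolio value

# Buy-and-hold baseline for comparison:
# start_cash * df["price"].iloc[-1] / df["price"].iloc[0]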

Results

I experimented with multiple strategies and tested them with various parameters, but I don't think I found a strategy that was consistently and clearly better than just buy-and-hold. It basically boils down to the fact that I was not able to find any way to calculate, based on historical data, when a crash has bottomed out. You only know in hindsight that the price has stopped dropping and is on a steady path to recovery, and at that point it is already too late to buy in. In my testing, most strategies underperformed buy-and-hold because they sold when the crash started but bought back after it recovered, at a slightly higher price.

In particular, when using narrow margins and selling on a 3-6% drawdown, the strategy performed very badly, as those small dips tend to recover in a few days. Essentially, the strategy kept repeating the pattern of selling 100 shares at a 6% discount, being able to buy back only 94 shares the next day, then selling those 94 shares at a 6% discount and buying back only 88 or so shares after the next recovery, and so forth, never catching up to buy-and-hold.

The strategy worked better in large market crashes, as they tended to last longer and there were higher chances of buying back the shares while the price was still low. For example, in the 2020 crash selling at a 20% drawdown was a good strategy, as the stock I tested dropped nearly 50% and remained low for several weeks; thus, the strategy bought back the shares while the price was still low and had not yet started to climb significantly. But that was just a lucky incident, as the delta between the 20% trailing stop-loss margin and the total 50% crash was large enough. If the crash had been only 25%, the strategy would have missed the rebound and ended up buying back the shares at a slightly higher price.

Also, note that the simulation assumes that the trade itself is too small to affect price formation. We should keep in mind that in reality, if many people have stop-loss orders in place, a large price drop would trigger all of them, creating a flood of sell orders, which in turn would drive the price lower even faster and deeper. Luckily, it seems that stop-loss orders are generally not a good strategy, so we don't need to fear that too many people will be using them.
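To make the repeated sell-low, buy-back-higher pattern described above concrete, here is a toy calculation (illustrative numbers only, not output from the backtest): if every 6% dip fully recovers before the buy-back, each round trip shrinks the position by about 6%, and the losses multiply:
python
shares = 100.0
peak_price = 100.0
for dip in range(1, 6):
    sell_price = peak_price * 0.94   # stop-loss triggers 6% below the peak
    cash = shares * sell_price       # the whole position is sold on the dip
    shares = cash / peak_price       # ...and bought back after the recovery
    print(f"after dip {dip}: {shares:.1f} shares")
# after dip 5: 73.4 shares, versus the 100 a buy-and-hold investor still owns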

Conclusion

Even though a trailing stop-loss strategy does not seem to deliver consistently higher returns in my backtesting, I would still say it is useful for protecting against the downside of stock investing. It acts as a kind of insurance policy: it considerably decreases the chance of losing big, at the cost of increasing the chance of losing a little. If you are risk-averse, which I think I probably am, this tradeoff can make sense. I'd rather avoid an initial 50% loss, even if it means missing an overall 3% gain on the recovery, than have to sit through weeks or months with a 50% loss before the price recovers to prior levels. Most notably, the trailing stop-loss strategy works best if used only once. If it is repeated multiple times, the small losses compound into a big loss overall. Thus, I think I might actually put this automation in place, at least on the stocks in my portfolio that have had the highest gains. If they keep going up, I will ride along, but once a crash happens, I will be out of those particular stocks permanently. Do you have a favorite open source investment tool, or are you aware of a strategy that actually works? Comment below!

2 December 2025

Birger Schacht: Status update, November 2025

I started this month with a week of vacation, which was followed by a small planned surgery and two weeks of sick leave. Nonetheless, I packaged and uploaded new releases of a couple of packages. Besides that, I reactivated a project I started in summer 2024: debiverse.org. The idea was to have interfaces to Debian bugs and packages that are usable on mobile devices (I know, ludicrous!). Back then I started with Flask and SQLAlchemy, but that soon got out of hand. I have now switched the whole stack to FastAPI and SQLModel, which makes it a lot easier to manage. The upside is that it comes with an API and OpenAPI docs. For the rendered HTML pages I use Jinja2 with Tailwind as the CSS framework. I am currently using udd-mirror as the database backend, which works pretty well (for this single-user project). It would be nice to have some of the data in a faster index, like Typesense or Meilisearch. This way it would be possible to have faceted search or more performant full text search. But I haven't found any software packaged in Debian that could provide this.
[Screenshot: the debiverse bug report list]
[Screenshot: the debiverse swagger API]
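As a minimal sketch of why that combination is pleasant (a hypothetical model and route for illustration, not debiverse's actual code, which queries a udd-mirror database), a SQLModel class doubles as the table definition and the response schema, and FastAPI generates the OpenAPI docs from it automatically:
from fastapi import FastAPI
from sqlmodel import Field, Session, SQLModel, create_engine, select

class Package(SQLModel, table=True):
    # Hypothetical table, for illustration only
    name: str = Field(primary_key=True)
    maintainer: str

engine = create_engine("sqlite:///demo.db")
SQLModel.metadata.create_all(engine)
app = FastAPI()

@app.get("/packages", response_model=list[Package])
def list_packages() -> list[Package]:
    # The same model is reused as the JSON response schema
    with Session(engine) as session:
        return list(session.exec(select(Package)).all())
# Running this under uvicorn serves interactive OpenAPI docs at /docs for free.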

25 November 2025

Freexian Collaborators: How we implemented a dark mode in Debusine (by Enrico Zini)

Having learnt that Bootstrap supports color modes, we decided to implement an option for users to enable dark mode in Debusine. By default, the color mode is selected depending on the user's browser preferences. If explicitly selected, we use a cookie to store the theme selection, so that a user can choose different color modes in different browsers. The work is in merge request !2401 and minimizes JavaScript dependencies, as we do in other parts of Debusine.

A view to select the theme

First is a simple view to configure the selected theme and store it in a cookie. If auto is selected, the cookie is deleted to delegate theme selection to JavaScript:
import datetime as dt
from typing import Any

from django.http import HttpRequest, HttpResponse, HttpResponseRedirect
from django.urls import reverse
from django.views import View


class ThemeSelectionView(View):
    """Select and save the current theme."""

    def post(
        self, request: HttpRequest, *args: Any, **kwargs: Any  # noqa: U100
    ) -> HttpResponse:
        """Set the selected theme."""
        value = request.POST.get("theme", "auto")
        next_url = request.POST.get("next", None)
        if next_url is None:
            next_url = reverse("homepage:homepage")
        response = HttpResponseRedirect(next_url)
        if value == "auto":
            response.delete_cookie("theme")
        else:
            response.set_cookie(
                "theme", value, httponly=False, max_age=dt.timedelta(days=3650)
            )
        return response
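A quick way to see the cookie behaviour (a hypothetical sketch using Django's test client, not part of the merge request) is to post to the view and inspect the response cookies:
from django.test import Client
from django.urls import reverse

client = Client()
url = reverse("theme-selection")

# Picking an explicit theme stores it in a long-lived cookie
response = client.post(url, {"theme": "dark", "next": "/"})
assert response.status_code == 302
assert response.cookies["theme"].value == "dark"

# Picking "auto" deletes the cookie again (Django expires it by emptying it)
response = client.post(url, {"theme": "auto", "next": "/"})
assert response.cookies["theme"].value == ""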
The main base view of Debusine reads the value from the cookie and makes it available to the templates:
      def get_context_data(self, **kwargs: Any) -> dict[str, Any]:
          ctx = super().get_context_data(**kwargs)
          ctx["theme"] = self.request.COOKIES.get("theme", None)
          # ...
          return ctx
The base template uses this value to set data-bs-theme on the main <html> element, and that's all that is needed to select the color mode in Bootstrap:
<html lang="en" % if theme %  data-bs-theme="  theme  " % endif % >
The view uses HTTP POST as it changes state, so theme selection happens in a form:
<form id="footer-theme" class="col-auto" method="post"
      action=" % url "theme-selection" % ">
     % csrf_token % 
    <input type="hidden" name="next" value="  request.get_full_path  ">
    Theme:
    <button type="submit" name="theme" value="dark">dark</button>
     
    <button type="submit" name="theme" value="light">light</button>
     
    <button type="submit" name="theme" value="auto">auto</button>
</form>
Since we added the theme selection buttons in the footer, we use CSS to render the buttons in the same way as the rest of the footer links. Bootstrap has a set of CSS variables that make it easy to stay in sync with the site theme, and they are especially useful now that the theme is configurable:
footer button {
    background: none;
    border: none;
    margin: 0;
    padding: 0;
    color: var(--bs-link-color);
}

Theme autoselection

Bootstrap would support theme autoselection via browser preferences, but that requires rebuilding its Sass sources. Alternatively, one can use JavaScript:
{% if not theme %}
    <script blocking="render">
    (function() {
        let theme = window.matchMedia('(prefers-color-scheme: dark)').matches ? 'dark' : 'light';
        let [html] = document.getElementsByTagName("html");
        html.setAttribute("data-bs-theme", theme);
    })();
    </script>
{% endif %}
This reads the color scheme preferences and sets the data-bs-theme attribute on <html>. The script is provided inline as it needs to use blocking="render" to avoid flashing a light background at the beginning of page load until the attribute is set. Given that this is a render-blocking snippet, as an extra optimization it is not added to the page if a theme has been set.

Bootstrap CSS fixes

We were making use of the Bootstrap btn-light class in navbars to highlight elements on hover, and that doesn't work well with theme selection. Lacking a button class that does the right thing across themes, we came up with a new CSS class that uses variables to define a button with a hover highlight while preserving the underlying color:
:root[data-bs-theme=light] {
    --debusine-hover-layer: rgb(0 0 0 / 20%);
    --debusine-hover-color-multiplier: 0.8;
    --debusine-disabled-color-multiplier: 1.5;
}

:root[data-bs-theme=dark] {
    --debusine-hover-layer: rgb(255 255 255 / 20%);
    --debusine-hover-color-multiplier: 1.2;
    --debusine-disabled-color-multiplier: 0.5;
}

/* Button that preserves the underlying color scheme */
.btn-debusine {
  --bs-btn-hover-color: rgb(from var(--bs-btn-color) calc(r * var(--debusine-hover-color-multiplier)) calc(g * var(--debusine-hover-color-multiplier)) calc(b * var(--debusine-hover-color-multiplier)));
  --bs-btn-hover-bg: var(--debusine-hover-layer);
  --bs-btn-disabled-color: rgb(from var(--bs-btn-color) calc(r * var(--debusine-disabled-color-multiplier)) calc(g * var(--debusine-disabled-color-multiplier)) calc(b * var(--debusine-disabled-color-multiplier)));
  --bs-btn-disabled-bg: var(--bs-btn-bg);
  --bs-btn-disabled-border-color: var(--bs-btn-border-color);
}

Dark mode!

This was a nice integration exercise with many little tricks: reading color scheme preferences from the browser, rendering form buttons as links, using Bootstrap variables, preventing a flashing background, and handling cookies in Django. And Debusine now has a dark mode!

3 November 2025

Russ Allbery: Review: The Raven Scholar

Review: The Raven Scholar, by Antonia Hodgson
Series: Eternal Path Trilogy #1
Publisher: Orbit
Copyright: April 2025
ISBN: 0-316-57723-5
Format: Kindle
Pages: 651
The Raven Scholar is an epic fantasy and the first book of a projected trilogy. It is Antonia Hodgson's first published fantasy novel; her previous published novels are historical mysteries. I would classify this as adult fantasy (the main character is thirty-four with a stable court position), but it has strong YA vibes because of the generational turnover feel of the main plot.

Eight years before the start of this book, Andren Valit attempted to assassinate the emperor and failed. Since then, his widow and three children (twins Yana and Ruko and the infant Nisthala) have been living in disgrace in a cramped apartment, subject to constant inspections and suspicion. As the story opens, they have been summoned to appear before the emperor, escorted by a young and earnest Hound (essentially the state security services) named Shal Worthy. The resulting interrogation is full of dangerous traps. Not all of them will be avoided. The formalization of the consequences of that imperial summons falls to an unpopular Junior Archivist (Third Class) whose one notable skill is her penmanship. A meeting that was disastrous for the Valits becomes unexpectedly fortunate for the archivist, albeit with a poisonous core.

Eight years later, Neema Kraa is High Scholar, and Emperor Bersun's twenty-four years of permitted reign are coming to an end. The Festival is about to begin. One representative from each of the empire's eight anats (religious schools) will compete in seven days of Trials, save for the Dragons, who do not want the throne and will send a proxy. The victor according to the Trials scoring system will become emperor and reign unquestioned for twenty-four years or until resignation. This is the system that put an end to the era of chaos and has been in place for over a thousand years. On the eve of the Trials, the Raven contender is found murdered. Neema is immediately a suspect; she even has reasons to suspect herself. She volunteers to lead the investigation because she has to know what happened. She is also volunteered to be the replacement Raven contender. There is no chance that she will become emperor; she doesn't even know how to fight. But agnostic Neema has a rather unexpected ally.
As the last chime fades we drop neatly on to the balcony's rusting hand rail, folding our wings with a soft shuffle. Noon, on the ninth day of the eighth month, 1531. Neema Kraa's lodgings. We are here, exactly where we should be, at exactly the right moment, because we are the Raven, and we are magnificent.
The Raven Scholar is a rather good epic fantasy, with some caveats that I'll get to in a moment, but I found it even more fascinating as a genre artifact. I've read my share of epic fantasy over the years, although most of my familiarity with the current wave of new adult fairy epics comes from reviews rather than personal experience. The Raven Scholar is epic fantasy, through and through. There is court intrigue, a main character who is a court functionary unexpectedly thrown into the middle of some problem, civilization-wide stakes, dramatic political alliances, detailed magic and mythological systems, and gods. There were moments that reminded me of a Guy Gavriel Kay novel, although Hodgson's characters tend more towards disarming moments of humanization instead of Kay's operatic scenes of emotional intensity.

But The Raven Scholar is also a murder mystery, complete with a crime scene, clues, suspects, evidence, an investigation, a possibly compromised detective, and a morass of possible motives and red herrings. I'm not much of a mystery reader, but this didn't feel like the sort of ancillary mystery that might crop up in the course of a typical epic fantasy. It felt like a full-fledged investigation with an amateur detective; one can tell that Hodgson's previous four books were historical mysteries.

And then there's the Trials, which are the centerpiece of the book. This book helped me notice that people (okay, me, I'm the people) have been sleeping on the influence of The Hunger Games, Battle Royale, and reality TV (specifically Survivor) on genre fiction, possibly because the more obvious riffs on the idea (Powerless, The Selection) have been young adult or new adult. Once I started looking, I realized this idea is everywhere now: Throne of Glass, Fourth Wing, even The Night Circus to some extent. Competitions with consequences are having a moment. I suspect having a competition to decide the next emperor is going to strike some traditional fantasy readers as sufficiently absurd and unbelievable that it will kick them out of the book. I had a moment of "okay, this is weird, why would anyone stick with this system for so long" myself. But I would encourage such readers to interrogate whether that's only a response from unfamiliarity; after all, strange women lying in ponds distributing swords is no basis for a system of government either. This is hardly the most unrealistic epic fantasy trope, and it has the advantage of being a hell of a plot generator when handled well.

Hodgson handles it well. Society in this novel is structured around the anats and the eight Guardians, gods who, according to myth, had returned seven times previously to save the world, but who will destroy the world when they return again. Each Guardian represents a group of characteristics and useful societal functions: the Ox is trustworthy, competent and hard-working; the Fox is a trickster and a rule-bender; the Raven is shrewd and careful and is the Guardian of scholars and lawyers. Each Trial is organized by one of the anats and tests the contenders for the skills most valued by that Guardian, often in subtle and rather ingenious ways. There are flaws here that you could poke at if you wanted to, but I was charmed and thoroughly entertained by how well Hodgson weaves the story around the Trials and uses the conflicting values to create character conflict, unexpected alliances, and engrossing plot. Most importantly for a book of this sort, I liked Neema.
She has a charming combination of competence, quirks (she is almost physically unable to not correct people's factual errors), insecurity, imposter syndrome, and determination. She is way out of her depth and knows it, but she has an ethical core and an insatiable curiosity that won't let her leave the central mysteries of the book alone. And the character dynamics are great; there are a lot of characters, including the competition problem of having to juggle eight contenders and give them all sufficient characterization to be meaningful, but this book uses its length to give each character some room to breathe. This is a long book, well over 600 pages, but it felt packed with events and plot twists. After every chapter I had to fight the urge to read just one more.

The biggest drawback of this book is that it is very much the first book of a trilogy, none of the other volumes are out yet, and the ending is rather nasty. This is the sort of trilogy that opens with a whole lot of bad things happening, and while I am thoroughly hooked and will purchase the next volume as soon as it's available, I wish Hodgson had found a way to end the book on a somewhat more positive or hopeful note. The middle of the book was great; the end was a bit of an emotional slog, alas. The writing is good enough here that I'm fairly sure the depression will be worth it, but if you need your endings to be triumphant (and who could blame you in this moment in history), you may want to wait on this one until more volumes are out.

Apart from that, though, this was a lot of fun. The Guardians felt like they came from a different strand of fantasy than you usually see in epic, more of a traditional folk tale vibe, which adds an intriguing twist to the epic fantasy setting. The characters all work, and Hodgson even pulls off some Game of Thrones style twists that make you sympathetic to characters you previously hated. The magic system apart from the Guardians felt underbaked, but the politics had more depth than a lot of fantasy novels. If you want the truly complex and twisty politics you would get from one of Guy Gavriel Kay's historical rewrites, you will come away disappointed, but it was good enough for me. And I did enjoy the Raven.
Respect, that's all we demand. Recognition of our magnificence. Offerings. Love. Fear. Trembling awe. Worship. Shiny things. Blood sacrifice, some of us very much enjoy blood sacrifice. Truly, we ask for so little.
Followed by an as-yet untitled sequel that I hope will materialize. Rating: 7 out of 10

8 October 2025

Sven Hoexter: Backstage Render Markdown in a Collapsible Block

Brief note to maybe spare someone else the trouble. If you want to hide e.g. a huge table in Backstage (techdocs/mkdocs) behind a collapsible element you need the md_in_html extension and use the markdown attribute for it to kick in on the <details> html tag. Add the extension to your mkdocs.yaml:
markdown_extensions:
  - md_in_html
Hide the table in your markdown document in a collapsible element like this:
<details markdown>
<summary>Long Table</summary>

| Foo  | Bar  |
|------|------|
| Fizz | Buzz |
</details>
It's also required to have an empty line between the html tag and starting the markdown part. Rendered for me that way in VSCode, GitHub and Backstage.
