Search Results: "alan"

11 May 2026

Freexian Collaborators: Debusine workflow performance issues (by Colin Watson)

During March and April, we had a number of performance issues that made Debusine s core functions of running work requests and reflecting their results in workflows quite unreliable. Investigating and fixing this took up a lot of time from both the Debusine development team and Freexian s sysadmins. The central problems involved a series of database concurrency and worker communication issues that interacted in complex ways. On bad days, this caused between 10% and 25% of processed work requests to fail unnecessarily. We communicated some of the problems to users on IRC, but not consistently since we didn t entirely understand the scope of the problems at the time. Most of the problems are fixed now, but we had a retrospective meeting to make sure we understood what happened and that we learn from it. Here s a summary.

Data model Debusine s workflows consist of many individual work requests. Each work request has a database row representing its state, which means that the overall state of a workflow is distributed across many rows. Changes to one work request (for example, when it is completed) can cause changes to other work requests (perhaps unblocking it so that it can be scheduled to an idle worker). Those changes may happen concurrently, and in practice often do. Workers typically need to create artifacts containing the output of tasks: these include things like packages, build logs, and test output. Debusine records task history so that it can make better decisions about how to schedule work requests. Since this might otherwise grow without bound, the server expires older parts of that history after a while. The same is true for many other kinds of data.

Causes
  • Because workflows involve changes that propagate between work requests, there were historically some cases where different parts of the system could deadlock due to trying to take update locks on overlapping sets of work request rows in different orders. We mitigated that somewhere around 2025-11-05 by locking entire workflows in one go before making any change that might need to propagate between work requests like this; that dealt with the deadlocks, but it s quite a heavyweight locking strategy that sometimes caused significant delays.
  • We ve been working for some time to make Debusine useful to Debian developers, and regression tracking is an important part of that: it lets developers test uploads without being too badly misled by tests in related packages that were already failing before they started. On 2026-03-11 we enabled this by default on debusine.debian.net, after testing it for a while. Although this is useful, it put more load on the system as a whole, often approximately doubling the number of work requests in a given workflow with many additional dependencies between them.
  • Like much of the world, we re in an arms race with unethical scrapers desperately trying to feed everyone else s data into LLMs before they run out of money. We saw a substantial uptick here towards the end of March, which meant that we had to temporarily disable regression tracking and to put some other mitigations in front of our web interface.
  • We historically haven t had systematic internal timeouts. Prompted by ruff, a Google Summer of Code applicant went through and added timeouts in many places, including some calls between the worker and the server. This was fiddly work and the student did a solid job, so I m not putting them on blast or anything! However, it did mean that some things that came in under load balancer timeouts now timed out earlier on the client side of the request (and hence in Debusine workers), which made some problems show up in different ways and be more obvious. This was deployed on 2026-04-03.

Fixes

Workflow orchestration Figuring out what individual work requests need to be run as part of a workflow - the process we call orchestration - can be challenging. Unlike typical CI pipelines, these workflows often span substantial chunks of a distribution: a glibc update can involve retesting nearly everything! Nevertheless, it s not particularly helpful for it to take hours just to build the workflow graph. Fixing this involved many classic database optimizations such as adding indexes and CTEs, but probably the most effective fix was adding a cache for lookups within each orchestrator run or work request. Profiling showed that resolving lookups was a hot spot, and the way that task data is often passed down through a workflow meant that the same lookup could be resolved hundreds or thousands of times in a large workflow.

Expiry We knew for quite some time that our expiry job took very aggressive locks, effectively blocking most of the rest of the system. This was an early decision to make the expiry logic simpler by allowing it to follow graphs without worrying about concurrent activity, but it clearly couldn t stay that way forever. Row locks in PostgreSQL was very helpful in figuring out the correct approach here. Since we re mainly concerned about the possibility of new foreign key references being created to artifacts we re considering for expiry, and since that would involve taking FOR KEY SHARE locks on those rows, we can explicitly take FOR UPDATE locks (which conflict with FOR KEY SHARE), and then recompute the set of artifacts to expire with any locked artifacts marked to keep. This was delicate work, but it saved minutes of downtime every day.

Whole-workflow locking I mentioned earlier that we avoided some deadlock issues by taking locks on entire workflows. To ensure that these locks are effective even against code that isn t specifically aware of them, this is implemented by using SELECT FOR UPDATE on all the work request rows in the workflow. In some cases the search for which rows to lock itself tripped up the PostgreSQL planner.

Scheduling We run multiple Celery workers for various purposes. Some of them can do many things in parallel, but in some specific cases (notably the task scheduler) we only ever want a single instance to run at once. Unfortunately a bug in the systemd service meant that the scheduler often ran concurrently anyway! Once we fixed that, the scheduler logs became a lot less confusing. When Debusine was small, it was reasonable for it to perform scheduling very aggressively, typically as soon as any change occurred to a work request or a worker that might possibly influence it. This doesn t scale very well, though, and even though we tried to batch multiple scheduling triggers that occurred within a single transaction, it could still make debugging very confusing. We reduced the number of changes that would result in immediate scheduling, and deferred everything else to a regular tick . The scheduler may not be able to assign a work request to an idle worker due to the workflow being locked. That isn t a major problem in itself; it can just try again later. However, in very large workflows, we found that it often worked its way down all the pending work requests one by one finding that each of them was locked, which was slow and also produced a huge amount of log noise. It now assumes that if a work request is locked, then it might as well skip other work requests in the same workflow until the next scheduler run. Between them, these changes reduced the number of locks typically being held on debusine.debian.net by about 80%: Lock graph

Worker refactoring The Debusine worker has always been partially asynchronous, but while it was actually executing a task - in other words, most of the time, at least in busy periods - it didn t respond to inbound websocket messages, causing spurious disconnections. We restructured the whole worker to be fully event-based. We also had to put quite a bit of effort into improving the path by which workers report work request completion, because if that hits a timeout then it can mean throwing away hours of work. We have some further improvements in mind, but for now we defer most of this work to a Celery task so that whole-workflow locks aren t on the critical path.

Database write volume One of our sysadmins observed that our database write volume was consistently very high. This was a puzzle, but for a long time we left that unexplored. Eventually we thought to ask PostgreSQL s own statistics, and we found a surprise:
debusine=> SELECT relname AS table_name,
debusine->        n_tup_ins AS inserts,
debusine->        n_tup_upd AS updates,
debusine->        n_tup_del AS deletes,
debusine->        (n_tup_ins + n_tup_upd + n_tup_del) AS total_dml
debusine-> FROM pg_stat_user_tables
debusine-> WHERE (n_tup_ins + n_tup_upd + n_tup_del) > 0
debusine-> ORDER BY total_dml DESC
debusine-> LIMIT 20;
              table_name                inserts    updates     deletes   total_dml
--------------------------------------+---------+------------+---------+------------
 db_collectionitem                      1418251   3578202388   3630143   3583250782
 db_token                                 15143     11212106     11389     11238638
 db_workrequest                          386196      6399071   1820500      8605767
 db_fileinartifact                      2783021      1837929   1663887      6284837
 django_celery_results_taskresult       1819301      1501623   1791656      5112580
 db_artifact                             960077      3340859    663890      4964826
 db_collectionitemmatchconstraint       1550457            0   2207486      3757943
 db_artifactrelation                    2229382            0   1363825      3593207
 db_fileupload                          1023400      1057036   1023346      3103782
 db_file                                1673194            0    970252      2643446
 db_fileinstore                         1411995            0    970259      2382254
 db_filestore                                 0      2381578         0      2381578
 django_session                          645423      1519880       531      2165834
 db_workrequest_dependencies             365877            0    936537      1302414
 db_worker                                18317       949280      9487       977084
 db_collection                            10061           85    177741       187887
 db_workerpooltaskexecutionstatistics     28721            0         0        28721
 db_workerpoolstatistics                   1640            0         0         1640
 db_workflowtemplate                        130          158       649          937
 db_identity                                 76          661         0          737
(20 rows)
Oh my - that s a lot of db_collectionitem updates and must surely be out of proportion with what we really need. Can we narrow that down by asking about the most recently-updated tuples?
debusine=> SELECT DISTINCT category
debusine-> FROM db_collectionitem
debusine-> WHERE id IN (
debusine->     SELECT id FROM db_collectionitem
debusine->     ORDER BY xmin::text::integer DESC LIMIT 10000
debusine-> );
           category
------------------------------
 debusine:historical-task-run
(1 row)
That might not be absolutely reliable, but it was certainly a hint. As per PostgreSQL s documentation, by default UPDATE always performs physical updates to every matching row regardless of whether the data has changed, and our code to expire old task history entries wasn t doing that properly. Once we knew where to look, it was easy to add some extra constraints. This reduced our mean write volume on debusine.debian.net from about 23 MB/s to about 3 MB/s, which had an immediate knock-on effect on our request failure rate: Disk write graph HTTP errors

Current state Our metrics indicate that things are a lot better now. We still have a few things to deal with, such as:
  • Some more performance fixes are on their way to fix some remaining cases where views are very slow or where file uploads from workers fail due to locks.
  • We have some changes in the works to revamp how work request changes propagate through workflows in a way that doesn t require so many heavyweight locks.
  • We have a number of monitoring and alerting improvements we d like to make, both for outcomes (things like slow Celery tasks) and possible root causes (database performance). We d also like to deploy some more modern observability tools; hunting for things using journalctl isn t terrible, but it s not really the state of the art.
  • We need to improve how we communicate to users when we re having operational problems, both informally (IRC, etc.) and on the site.
  • Retries don t always behave the way you d expect in workflows.
I hope this has been an interesting tour through the sorts of things that can go wrong in this kind of distributed system!

9 May 2026

Russell Coker: Bad Criticism of LLMs (not AI)

Discussion of AI systems seems to be dominated by fears of uncommon and unlikely threats. I think that we should be focusing more on real issues with LLMs and with society in general and put the most effort towards the biggest problems. It s Not AI True Artificial Intelligence [1] (IE a computer that has the mental capacity of a household pet) is something that I think can be developed, but it hasn t been developed and we don t have good plans for developing it. We seem to be a lot further away from achieving that goal than we were from landing on the moon in 1962 when JFK gave his historic speech. What we have is a variety of pattern recognition systems that can predict what fits into a pattern. The most well known type of Machine Learning (ML) is the Large Language Model (LLM) which means ChatGPT and similar systems which predict which text would be likely to come next and can make an essay from it. They can give interesting and useful output, but there is no thought behind it, it s just a better form of Eliza (the famous program from 1964 that simulates conversation by pattern matching) [2]. By analysing billions of documents, storing the data in a condensed mathematical way, and then using computation to extract from that record LLMs can produce output that is unfortunately considered by some people to be good enough to include in legal documents submitted to courts, university assignments, and many other documents. But they do so without even having the thinking ability of a mouse. To call current systems AIs without any significant qualifiers when criticising them is to concede the debate about the worth of such things. If we develop AIs that can actually think we will have to deal with the issues in the SciFi horror short story Lena by qntm [3]. The Bad Arguments Here is a list of some of the most unreasonable arguments I ve seen against AI which distract attention from real problems both related to AI and other problems in society. Suicide and Homicide Wikipedia has a page listing Deaths Linked to Chatbots [4] which right now has 16 entries from 2023 to Feb 2026. They are all tragedies and as a society we should try to prevent such things. But what I would like to see from the media is some analysis of overall trends, yes it gets people s attention when someone dies in an unusual way but we need to have attention paid to the more numerous deaths which are preventable. It has become a standard practice to give information on Lifeline in media referencing suicide, it would be good if they also developed a practice of mentioning the relative incidence of a problem when publishing an article about it. One of the many factors that cause more suicides than chatbots is school, Scientific American has an informative article from 2022 about the correlation between child suicide and school [5]. It is based on US statistics and shows that the lowest suicide rate is in July (a no-school month in the US) which has a rate of 2.3 per 100,000 person years. So if kids had a quality of life equivalent to July all year around then there would be 2.3 suicides per 100,000 kids every year while if they had a quality of life equivalent to a Monday in January or November it would be 3.9 suicides per 100,000 kids every year. The article states Any time I present these data to teachers, parents, principals or school administrators, they are shocked. This should be common knowledge. It is common knowledge to anyone who takes any notice of what happens in schools, but paying attention to serious problems is unpleasant, it s more fun to pretend that school is good for everyone. No parent wants to think that they sent their child to a place that was horrible, no teacher wants to think that they are part of a system that harms kids. The US CDC has an informative article about youth suicide [6] which documents it as the 3rd largest cause of death in the 14-18 age range fro 2021. This article was published in 2024 and based on statistics from 2023 and earlier. It notes significant differences in suicides, attempts, and persistent feelings of sadness or hopelessness which had girls at more than twice the rate of boys and LGBQ+ kids at more than twice the rate of heterosexual students. It seems obvious that misogyny and homophobia is correlated with suicide and that s something that could and should be addressed in schools. My state has a Safer Schools program [7] to try and alleviate the problems related to homophobia, but I expect that things are getting worse in the US in that regard. 39.7% of kids in US high schools had persistent feelings of sadness or hopelessness before LLMs became popular, school could and should be a happy time for the vast majority of kids but instead almost half of the kids don t enjoy it and a majority of girls and LGBQ+ kids don t. Having no mention of trans kids is a significant omission from that article, based on everything I ve heard from trans people I expect that their statistics would be even worse. One could argue that the small number of deaths inspired by use or misuse of LLMs is an indication of a larger number of people suffering in ways that don t result in death and don t get noticed. But I don t think that can compare to the fact that the majority of girls and LGBQ+ kids have persistent feelings of sadness or hopelessness in the current school system. Regarding homicide, the Australian Institute of Criminology has an article showing that in the 2003-2004 time period 49% of women who were killed were listed as a domestic argument [8], that s something that could and should be addressed. That article claimed 308 homicide victims in that time period which is larger than the world-wide death toll from LLMs but also less than 1/3 the death toll from car accidents in Australia. Australia has less than 0.4% of the world population, a fairly low homicide rate, and a number of homicides that vastly outnumbers all world homicides related to LLMs. I think it s great to address any cause of suicide or homicide, but devoting government resources and legislation towards very uncommon causes instead of things that happen every day is not a good strategy. It would be fine to address all factors leading to suicide, but problems with the school system have been a major factor for decades with little effort applied to fix it. Fraud and Other Crime There is evidence of criminals using LLMs to help prepare for crimes, the ability to generate large amounts of text quickly can be used for fraud and extortion. This is going to be a serious problem and we need structural changes to society to deal with it. There is an ongoing issue of scammers convincing older people that their child or other young relative is in trouble and a large amount of cash is required to address it. This sort of scam as well as the more well known Nigerian scams will probably become more common as the cost of running them decreases. This may be more of a problem for people in developing countries as currently a common scam business model is to have people in regions where wages are low (such as Pakistan for one who I spoke to) scamming people in relatively wealthy countries like Australia so an attack with a low probability of success is financially viable. Cheaper attacks will make less affluent victims financially viable to the scammers. While writing this post I received a financial scam phone call trying to get me to invest in SpaceX that was run by an AI chat system, I expect to receive more of them and this is something that needs to be dealt with via both technical measures and legislation. Do we have to accept less freedom and less anonymity in finances as a cost of reducing financial crime? Greater restrictions on the use of cash would make some crimes more difficult or less profitable for criminals. As a society I think we need to have a discussion about a balance between financial freedom and freedom from criminal exploitation, failing to have such a discussion is likely to lead to policies which don t work well. Also one thing that ML systems are good at is recognising patterns in data. Banks could scan all their transactions and look for patterns that correlate with fraud. They currently do this badly and do things like locking credit cards when someone goes to another country and spends money. They could do a better job of that and involve the police in cases of obvious fraud even when the customer doesn t realise that they are a victim. This isn t a reason to criticise AIs , it s a reason to plan defensive technology that matches the capabilities of attackers. As an aside I used to work for a company that was developing AI software to scan bank phone calls and allow banks to recognise employees who acted illegally. Unfortunately the Royal Commission into banking misconduct [9] didn t impose any penalties that gave the banks a financial reason to avoid criminal activity. Unemployment and Inequality There are many claims about AI systems making large numbers of jobs obsolete, some of them are outlandish such as the claims that all white-collar jobs will be obsolete in the near future. There are some reasonable claims like the ability to replace some mundane jobs. Replacing jobs that suck with computers, robots, and other machinery is a good thing! Very few people wish that they were working on a farm without a tractor. In 1900 it s estimated that between 60% and 70% of the world labour force worked in agriculture and 40% of the US labour force did so. Now it s something like 27% globally and between 1% and 3% in developed countries. Automated factories are also a good thing, it s best to avoid boring and dangerous work. The most plausible claims about job replacement from AI is jobs that involve analysing and summarising documents. One example that comes to mind is the worst kind of journalism where press releases from companies are massaged into the format of a feature article. I don t think anyone wants that sort of job and doing it with AI hopefully means no human has to sign their name to it. For work like programming few people will be directly replaced by AI but if people can do their work more efficiently while using it then less people are required. I don t think that any programmer likes the part of their job where they have to skim read long documents looking for a clue about how to solve a problem with a library or protocol. A LLM processing the document and finding the potentially useful things will take away the drudgery from the work and allow greater productivity. The trend in replacing people has been making people work longer. If you force all employees to work 60 hour weeks then that can theoretically allow hiring fewer people than having 40 hour weeks. For some work that applies but for skilled work it mostly doesn t as productivity and work quality on average drops when people work more than 40 hours in a week. Another trend for exploiting people is having a low minimum wage and making accommodation expensive so that many people need to work two jobs. What we need is legislation to restore the situation in the 70s where a single full time job was sufficient to provide for a family. The low minimum wage and high expenses for many things is a problem that s been slowly developing over the course of decades while being mostly ignored by journalists. If they could concentrate on the real issues that are hurting workers today they could incite political action to fix these problems. Academic Cheating There is no shortage of ways of cheating in school and university. There are people who are paid to write essays, mobile phones are used for cheating in exams, etc. Getting an AI to write essays makes it easier to cheat for the essay writing part but does so with lower quality and in a less stealthy way. What s the worst case scenario? That we have to change to oral exams for all university subjects? In the US the average annual price for tuition at a university is apparently $25,000, if each student had individually supervised assessment for their exams at a cost of $100 per hour it would make the degree cost 4% more. The cost of university in the US is unreasonably high and that s a problem that needs to be fixed, but a hypothetical case of increasing the price by 4% isn t going to be a major part of it. Weak Arguments Against AI Computer Security Attacks There have been many claims made that AI will break the security of all systems and cause the type of disruption that was previously predicted for year 2000. Bruce Schneier has written a good analysis of the issues including how AI can be used by both attackers and defenders [10], he doesn t have a strong conclusion on whether the net result will be good or bad but his article does make it clear that the result is not going to be a total disaster. While I was working on this post I read another post by Bruce Schneier that was significantly more negative about this issue [11]. While I still don t think this will destroy civilisation I found his other post convincing enough to move computer security from the bad argument section to the weak argument section. Spidering the Web to Death There are issues of bots from AI companies doing a bad job of trying to download all the Internet s content and using a lot of resources. When it was just the major search engines and the Wayback Machine doing it the load was small due to having a small number of organisations that were very good at the way they did it having evolved practices over many years. Now we have a lot of idiots doing it badly and repeatedly hitting generated content. This is really annoying but is something that we can deal with. Currently my blog and many other sites are hosted on a Hetzner server with a E3-1271 v3 CPU with 32G of RAM and there are occasions where more than half the CPU power is being used to service web requests from such systems. Even on the server bidding (renting servers previously used by other customers) Hetzner isn t offering systems so slow nowadays, the slowest they offer is about 20% faster than that. This is something that can be dealt with by spending a little more on hosting until the companies doing that go bankrupt. I m sure this is a serious problem for some people, but for most people it s not a big deal. Also hostile traffic on the Internet is something we have all had to deal with as a part of life since the mid to late 90s. RAM Prices The unreasonably high prices for RAM are annoying and hurt the development of useful computer projects. Big companies can afford it, even with current high prices and large quantities of RAM used for some servers it s still not significant. But it is a major issue for hobbyists and small projects. Things like setting up a dozen test VMs for FOSS development are now too expensive for many people who develop software in their spare time. But this is a temporary thing, if AI companies were to keep buying RAM at high rates for a few years companies would just manufacture more of it to meet demand. In some situations capitalism can work. Environmental Damage There are many people claiming that power used by data centers for AI will lead to environmental damage, using power and water when there isn t enough. The trend of computer hardware is to get smaller and faster. It hasn t been going as fast as it used to in many areas but it hasn t stopped either and it s an exponential trend. There has been an increase in data centers (DCs) for AI use as the use has been increasing faster than the hardware gets smaller. Eventually they will stop increasing faster than advances in hardware and software can match and the size of DCs will decrease. As the production of renewable energy is increases the environmental cost of energy hungry industries decreases. In a few years this won t be an issue anyone is bothered about. False Claims About Danger as PR Jamie McClelland makes an interesting claim that the AI companies are pushing dangers of AI as a method of PR [12]. That seems plausible and combined with the tendency of many journalists to just massage press releases from companies into articles could be the reason for a lot of the bad arguments against AI. Good Arguments Against AI Spam Everywhere I ve previously written about Communication and Hostile AIs [13]. I think that filling all communication channels with rubbish is a denial of service attack against society. In the past communication took some effort, even the simplest email that was directly targeted at the recipient took some human effort and that reduced it s frequency. I get a lot of spam saying something like I see your web site doesn t rank in the top for Google searches while my web site in fact rates well and the actor named Russell Coker is ranking below me, so I know that such spam hasn t had the minimum of human involvement. Now a spammer who wanted to do a better job could get an LLM written spam for every target so the message was specifically aimed at them and would take much longer to be recognised by a human as spam and would also avoid most anti-spam software. Searching for businesses used to be easy, the phone book had listings for them and there was a real cost to being in the book as well as humans actively trying to stop fraud. Creating fake web sites to get business isn t too difficult but it s also not trivial at the moment and such fake sites won t look complete. Now with LLMs it s possible to create hundreds of sites that have content and look reasonable without human involvement. Instead of the small number of suicides and homicides inspired by AI chat systems we should probably be concerned about people who need psychological or medical advice being misled by bogus web sites created as part of fraud campaigns. Imagine people searching for mental health assistance finding web sites run by cults who oppose psychology as a profession. Imagine people searching for basic medical advice such as how to cook a healthy meal getting sucked in to web sites that start sane and then lead people to Ivermectin as a universal medicine. LLMs have the potential to take spam from quick and simple attacks to large scale targeted fraud aimed at people and organisations that don t have the resources to defend against it. There have been many reports of CEO impersonation fraud against major corporations aiming to steal hundreds of thousands of dollars and fraud against individuals who are persuaded to get amounts like $50,000 to help a relative who is allegedly in a difficult situation. But if every corner store experienced the same type of attack that CEOs experience and if every child had someone trying to steal the pocket money in the same way that relatively wealthy people are being targeted now it would really change things. David Brin wrote an insightful and informative blog post about this focusing on how AI generated content is being allowed to destroy YouTube [14]. Deep Fakes There is some overlap between filling all communications channels with rubbish (fake news etc) and deep fake. Making a fake photo of a politician or celebrity to lobby for legislative changes is a real issue but it s not what most people think of when the term deep fake is used. Using photo and video faking targeting non-consenting people is a serious issue. It s not just fake porn (which is a major issue and will cause some suicides) as there are many other possibilities. Fake videos showing behaviour that justifies sacking people from their jobs is going to become an issue, for people in public facing positions even proof that the videos are fake won t necessarily help them. Will we find ourselves in a situation where every politician gets deep-fake porn made of them and the only people who run for public office are ones who are cool with that? Will positions of leadership in the technology industry be restricted to people who aren t bothered by having the most depraved fake porn made of them? The Justice System We have seen a lot of evidence of law enforcement and the court system based on bias leading to bad results. The Innocence Project attempts to correct that and it s web site documents some of the things that have gone wrong [15]. Using AI systems to do some of the work of law enforcement by training computers on the flawed results of current systems can entrench bias and also make it harder to spot. When determining whether someone should be considered a suspect or whether a prisoner should be eligible for parole the number of factors that a human can use is limited. But a computer can take many more factors into account so the issues of whether inappropriate factors are being used can be masked. Computers are also unable to explain decisions that they made and are also able to come up with better fake reasons. In the past there have been racist policies in the US about banks not lending to people living in suburbs where most houses were owned by non-white people, these policies were documented and the documents have become part of the historical record showing racist policies. If a LLM decides not to lend money to people based on mathematical correlations it determined based on historical banking practices it could assign negative weights to factors such as non-English names and implement the racism in a large array of numbers with no proof. The current cases of lawyers getting LLM systems to do some of their work and having their incompetence revealed when the computer generated work is shown to be ridiculously bad are amusing. But that is not the real problem. The real problems will start when the computers in police cars start flagging every car owned by a non-write person as having a probable cause for a drug stop. Technically Not Financial Fraud The majority of the ecosystem around AI is a financial scam [16]. There are companies and individuals doing good things with machine learning some of which is based on hardware and software developed as part of this ecosystem. But the majority of it has no plausible path to profits and a the future of it inevitably ends with some bankruptcies. There are circular flows of money that have the major cloud providers and NVidia looped in, when the values of these companies correct it will become apparent that they have all burned a lot of money keeping this running and all the senior people have got a share of it (the entire purpose of stock options is to allow senior people to suck money out of the company). Then every cloud provider will increase costs while under chapter 11 and all the companies that depend on them will pay whatever it takes. That includes all major companies and most governments. Unlike the dot-com boom and crash and the housing crash the coming financial crash will impact every company that we deal with and most governments. So the people in first-world countries will effectively be taxed to pay for this scam while the executives go party in Monaco. This may seem like an extreme claim but it all happened before with the dot com crash and the housing market crash. The CEO class has an ongoing practice of doing things that aren t crimes because they lobby (bribe) politicians to make them legal. So the current stock market shenanigans around AI don t seem to involve things that governments consider to be crimes. But any normal person might be surprised to learn that such things are legal and most people would vote for such things to be crimes if they had the opportunity. A global financial crisis is the least of the problems that seem likely to afflict society from AI systems. But it will be more immediately obvious when it happens which could be this year! Propaganda Creating art requires skills that the type of people who want to create propaganda tend to lack. AI technologies allow creating art that is based on mathematical models of actual art to the requirements of the person running the program. I have seen the term AI Fascism used to describe the use of AI to help authoritarian governments. I am dubious about whether it deserves that term and while every article I ve read about the topic has had some good points I thought that they were all weak points. But there are lots of ways that governments can abuse their populations without going full fascist. In the last century there were lots of truly terrible governments that didn t even make the top 10 of fascism. AI Sycophants Bruce Schneier wrote an informative blog post about AI Chatbots and Trust which focused on sycophantic chatbots [17]. We have seen a lot of evidence of terrible behaviour and stupid decisions from rich people due to having no negative consequences for bad choices. The vast majority of the history of kings concerns bad decisions made by such people. A future where middle class and poor people can make the same bad decisions as rich people wouldn t be good. Good Things About ML Machine Learning (abbreviated as ML) can do useful things. It s not just Large Language Models (LLMs) such as ChatGPT etc. There are also ML systems that can analyse images and other data sets. I have found ChatGPT to be very useful for making suggestions for improving blog posts. I don t get it to write anything just ask for suggestions. It has pointed out things that I missed such as when I didn t include the price when reviewing a car because the car in question was much more expensive than I will ever pay, the price wasn t relevant to me but would be to some readers. It has also made useful suggestions about structure of blog posts, repeating points, and having a good conclusion. It has some downsides which include trying to erase my voice from my writing, suggesting that the rhetorical question does email suck? is unprofessional. I have worked for a company that used ML systems to analyse driver performance and alert people if a driver is falling asleep, using a phone, or otherwise seems unable to drive safely. Their business model involved a human reviewing the images from the drivers the computer flagged and then determining who is actually doing the wrong thing. This seems a good use of the technology. I have also worked for a company that used ML systems to analyse the performance of bank employees and detect potentially fraudulent behaviour. Preventing crime seems to be clearly a good thing and in this case the manager of the employee in question would review the evidence to make sure that they weren t being falsely accused. Conclusion I don t think that the problems with managing the changes that so called AI is introducing are particularly new. An example of how society handles change that s worth considering is car safety. The seat belt first became mandatory for aeroplanes in some jurisdictions in 1928. The Model T Ford is widely regarded as the first vehicle to start a mass market for cars and it was released in 1925. So if society acted in a reasonable way then for the majority of mass market cars seat belts would have been a standard feature. However seat belts were first made compulsory in 1970 in Victoria Australia and there are still people who think that they are safer without seat belts! The delay in adoption of car seat belts is only one example of needless deaths caused by not taking reasonable measures for car safety but it s one that s easy to demonstrate and measure. The difference between past problems like car safety and the current problems of AI is that the AI problems will be more pervasive. Most of my history as a car driver and car passenger was in cars that are much less safe than cars made in the last 10 years. But partly through luck I ve never been in a serious crash so being in cars that wouldn t have given me a low probability of surviving a freeway speed crash didn t affect me. There is no possibility that through any combination of luck and skill someone could avoid the downsides of AI . If nothing else the results of elections will be affected and no-one can avoid that. As a society we really need to address the real issues related to AI which in some cases requires legislation.

21 April 2026

Dirk Eddelbuettel: RcppArmadillo 15.2.6-1 on CRAN: Several Updates

armadillo image Armadillo is a powerful and expressive C++ template library for linear algebra and scientific computing. It aims towards a good balance between speed and ease of use, has a syntax deliberately close to Matlab, and is useful for algorithm development directly in C++, or quick conversion of research code into production environments. RcppArmadillo integrates this library with the R environment and language and is widely used by (currently) 1263 other packages on CRAN, downloaded 45.7 million times (per the partial logs from the cloud mirrors of CRAN), and the CSDA paper (preprint / vignette) by Conrad and myself has been cited 683 times according to Google Scholar. This versions updates to the 15.2.5 and 15.2.6 upstream Armadillo releases from, respectively, two and five days ago. The package has already been updated for Debian, and built for r2u. When we ran the reverse-dependency check for 15.2.5 at the end of last week, one package failed. I got in touch with the authors, filed an issue, poked some more, isolated the one line that caused an example to fail and right then 15.2.6 came out fixing just that. It was after all an upstream issue. We used to ran these checks before Conrad made a release, he now skips this and hence needed a quick follow-up release. It can happen. The other big change is that this R package release phases out the dual support for both C++14 or newer (as in current Armadillo) along with a C++11 fallback for more slowly updating packages. I am happy to say that after over eight months of this managed transition (during which CRAN expulsed some laggard packages that were not moving in from C++11) we are now at all packages using C++14 or newer which is nice. And I will take this as an opportunity to stress that one can in fact manage a disruptive API change this way as we just demonstrated. Sadly, R Core does not seem to have gotten that message and rollout of this package was also still a little delayed because of the commotion created by the last minute API changes preceding the R 4.6.0 release later this week. Smaller changes in the package are a switch in pdf vignette production to the Rcpp::asis() driver, and a higher-precision computation in rmultinom() (matching a change made in R-devel during last week in its use of Kahan summation). All detailed changes since the last CRAN release follow.

Changes in RcppArmadillo version 15.2.6-1 (2026-04-20)
  • Upgraded to Armadillo release 15.2.6 (Medium Roast Deluxe)
    • Ensure internally computed tolerances are not NaN
  • The rmultinom deploys 'Kahan summation' as R-devel does now.

Changes in RcppArmadillo version 15.2.5-1 [github-only] (2026-04-18)
  • Upgraded to Armadillo release 15.2.5 (Medium Roast Deluxe)
    • Fix for handling NaN elements in .is_zero()
    • Fix for handling NaN in tolerance and conformance checks
    • Faster handling of diagonal views and submatrices with one row>

  • Sunset the C++11 fallback of including Armadillo 14.6.3 (#504 closing #503)
  • The vignettes have refreshed bibliographies, and are now built using the Rcpp::asis vignette builder (#506)
  • One rmultinom test is skipped under R-devel which has switched to a higher precisions calc
Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the Rcpp R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub. You can also sponsor my Tour de Shore 2026 ride in support of the Maywood Fine Arts Center.

19 April 2026

Russ Allbery: Review: Collision Course

Review: Collision Course, by Michelle Diener
Series: Class 5 #6
Publisher: Eclipse
Copyright: November 2024
ISBN: 1-7637844-0-1
Format: Kindle
Pages: 289
Collision Course is the sixth novel in the Class 5 science fiction series and the first that doesn't use the Dark X naming convention. There are lots of spoilers in this story for the earlier books, but you don't have to remember all the details of previous events. Like the novella, Dark Ambitions, this novel returns to Rose, Sazo, and Dav instead of introducing another Earth woman and Class 5 ship. In Dark Class, Ellie discovered an interesting artifact of a previously-unknown space-faring civilization. Rose, Sazo, and Dav are on their way to make first contact when, during a routine shuttle flight between the Class 5 and Dav's Grih military ship, Rose is abducted. The aliens they came to contact have an aggressive, leverage-based negotiating strategy. They're also in the middle of a complicated war with more sides than are readily apparent. What I liked most about Dark Horse, the first book of this series and our introduction to Rose, was the revealed ethical system and a tense plot that hinged primarily on establishing mutual trust when there were excellent reasons for the characters to not trust each other. As the series has continued, I think the plots have become more complicated but the ethical dilemmas and revealing moments of culture shock have become less common. That is certainly true of Collision Course; this is science fiction as thriller, with a complex factional conflict, a lot of events, more plot reversals than the earlier books, but also less ethics and philosophy. I'm not sure if this is a complaint. I kind of miss the ethics and philosophy, but Diener also hasn't had much new to say for the past few books. The plot of Collision Course is quite satisfyingly twisty for a popcorn-style science fiction series. I was kept guessing about the merits of some of the factions quite late into the book, although admittedly I was in the mood for light entertainment and was not trying too hard to figure out where the book was going. I did read nearly the entire book in one sitting and stayed up until 2am to finish it, which is a solid indication that something Diener was doing worked. I do have quibbles, though. One is that the ending is a bit unsatisfying. Like Sazo, I was getting quite annoyed at the people capturing (and recapturing) Rose and would have enjoyed somewhat more decisive consequences. Also, and here I have to be vague to avoid spoilers, I was expecting a bit more of a redemption arc for one of the players in the multi-sided conflict. The ending I did get was believable but rather sad, and I wish Diener had either chosen a different outcome (this is light happily-ever-after science fiction, after all) or wrestled more directly with the implications. There were a bit too many "wait, one more thing" ending reversals and not quite enough emotional payoff for me. The other quibble is that Collision Course was a bit too damsel in distress for this series. Rose is pregnant, which Diener uses throughout the book as a way to raise the stakes of the plot and also make Rose more annoyed but also less capable than she was in her earlier novel. Both Sazo and Dav are in full heroic rescue mode, and while Diener still ensures Rose is primarily responsible for her own fate, there is some "military men attempt to protect the vulnerable woman" here. One of the things I like about this series is that it does not use that plot, so while the balance between Rose rescuing herself and other people rescuing her is still tilted towards Rose, I would have liked this book more if Rose were in firmer control of events. I will mostly ignore the fact that a human and a Grih sexually reproducing makes little to no biological sense, since Star Trek did similar things routinely and it's an established genre trope. But I admit that it still annoys me a bit that the alien hunk is essentially human except that he's obsessed with Rose's singing and has pointy ears. Diener cares about Rose's pregnancy a lot more than I did, which added to my mild grumpiness at how often it came up. Overall, this was fine. I prefer a bit more of a protagonist discovering how powerful she is by making ingenious use of the ethical dilemmas her captors have trapped themselves in, and a bit less of Rose untangling a complicated political situation by getting abducted by every player serially, but it still kept the pages turning. Any book that is sufficiently engrossing for me to read straight through is working at some level. Collision Course was highly readable, undemanding, and distracting, which is what I was looking for when I read it. I would put it about middle of pack in the series. If Rose's pregnancy is more interesting to you than it was to me, that might push it a bit higher. If you have gotten this far in the series, you will probably enjoy this, although it does feel like Diener is running out of new things to say about this universe. That's unfortunate given the number of threads about AI sentience and rights that could still be followed, but I think tracing them properly would require more philosophical meat than Diener intends for these books. Which is why the next book I grabbed was a Culture novel. Currently this is the final book in the Class 5 series, but there is no inherent reason why Diener couldn't write more of them. Rating: 7 out of 10

18 April 2026

Valhalla's Things: Pizza!

Posted on April 18, 2026
Tags: madeof:atoms, craft:cooking
This post contains a bit of consumerism and is full of references to commercial products, none of which caused me to receive any money nor non-monetary compensation. This post has also been written after eating in one meal the amount of bread-like stuff that we usually have in more than 24 hours. I ve been baking bread since a long time ago. I don t know exactly when, but probably it was the early 2000s or so, and remained a regular-ish thing until 2020, when it became an extremely regular thing, as in I believe I bake bread on average every other day. In the before times, I ve had a chance to bake pizza in a wood fired oven a few times: a friend had one and would offer the house, my partner would mind the fire, and I would get there with the dough and prepare the pizza. Now that we have moved to a new house, we don t have a good and convenient place for a proper wood fired oven in masonry, but we can use one of the portable ones, and having dealt with more urgent expenses, I decided that just before the potential collapse of the global economy was a good time as any to buy the oven I had been looking at since we found this house. I decided to get an Ooni Karu 2, having heard good things about the brand, and since it looked like a good balance between size and portability. I also didn t consider their gas fired ovens (nor did I buy the gas burner) because I m trying to get rid of gas, not add stuff that uses it, and I didn t get an electric one because I m not at all unhappy with the bakery-style pizza we make in our regular oven, and I have to admit we also wanted to play with fire1. We also needed an outdoor table suitable to use the oven on and store it. Here I looked for inspiration at the Ooni tables (and for cheaper alternatives in the same style), but my mother who shares the outdoor area with us wasn t happy with the idea of steel2. And then I was browsing the modern viking shores, and found that there was a new piece in the N MMAR series my mother likes (and of which we already have some reclining chairs): a kitchen unit in wood with a steel top. At first I expected to just skip the back panel, since it would be in the way when using the oven, but then I realized that it could probably be assembled upside down, down from the top between the table legs, and we decided to try that option. This week everything had arrived, and we could try it. Yesterday evening, after dinner (around 21, I think) I prepared the dough with the flour I usually use for bakery-style pizza: Farina di Grano Tenero Tipo 0 PANE (320 - 340 W); since I wanted to make things easier for myself I only used 55% hydration, so the recipe was:
  • 1 kg flour
  • 550 g water
  • 2 g dry yeast
  • 12 g salt
The next time I think I ll try with one of my other staples: Molino Bogetto etichetta blu (260/280 W) Then this morning we assembled the N MMAR , then I divided the dough in eight balls, put them in a covered but not sealed container 3, well floured with rice flour and then we fired the oven (as in: my partner did, I looked for a short while and then set the table and stuff), using charcoal, because we already had some, and could conveniently get more at the supermarket. When the oven had reached temperatures in the orange range4 I stretched the smallest ball out, working on my wooden peel, sprayed it with water5, sprinkled it with coarse salt and put it in the oven. After 30 seconds I turned it around with the new metal peel, then again after 30 seconds, and then I lost count of how many times I repeated this6, but it was probably 2 or 3 minutes until it looked good. A flatbread on a regular plate: it's only a bit more than half the plate in diameter, puffed up near the borders and thin in the middle, and only lightly browned in places, not burnt. It's sitting on the lower shelf of a wooden table. And it was good. The kind of pizza that is quite soft, especially near the borders. We ate it with fresh mozzarella and tomatoes, and then made another one the same way, to finish the mozzarella. Another flatbread on the same plate, this time it's about 4 cm smaller than the plate on all sides, and it's covered with brownish-red chopped up vegetables. This was supposed to be our lunch, but we decided to try one with some leftover cooked radicchio, and that also worked quite nicely. And finally, we decided we needed to try a more classical pizza, with tomato sauce and cured meat, of which we forgot to take pictures. Up to here we had eaten about half of the dough, and we were getting full: I had prepared significantly more than what I expected to eat, to be able to accidentally burn some, but also with the idea to bake something else to be eaten later. So I made two more focaccias with just water and salt, and then I tried to cook some bread with what I expected to be residual heat. Another flatbread with coarse salt and two bread rolls, one of which is completely carbonized on one side. The other one has been cut, and while it has a carbonized spot, it is also well cooked in the middle, and perfectly edible. Except that the oven was getting a bit too cold, so my partner added some charcoal, and when I put the last two unflattened balls right at the back of the oven where it was still warmer, that side carbonized. After 5 minutes I moved them to the middle of the oven, and turned them, and then after another turn and 5 more minutes they were ready. And other than the burnt crust, they were pretty edible. So, the thoughts after our first experience. Everybody around the table (my SO, my mother and me) was quite happy with the results, and they are different enough from the ones I could get with the regular oven. As I should have expected, it s much faster than a masonry oven, both in getting to temperature and in cooling down: my plan for residual heat bread cooking will have to be adjusted with experience. We were able to get it hot enough, but not as hot as it s supposed to be able to get: we suspect that using just charcoal may have influenced it, and next week we ll try to get some wood, and try with a mix. As for the recipe, dividing the dough in eight parts worked quite well: maybe the pizzas are a bit on the smaller side, but since they come one at a time it s more convenient to cut and share them, and maybe make a couple more at the end. Of course, I ll want to try different recipes, for different styles of pizzas (including some almost-trademark-violating ones) and for other types of flatbread. I expect it won t be hard to find volunteers to help us with the experiments. :D

  1. any insinuation that there may have been considerations of having a way to have freshly baked bread in case of a prolonged blackout may or may not be based on reality. But it wasn t the only or even the main reason.
  2. come on! it s made of STEEL. how can it be not good? :D
  3. IKEA 365+ 3.1 glass, the one that is 32 cm 21 cm 9 cm; it was just big enough for the amount of dough, and then I covered it with a lid that is missing the seal.
  4. why did they put a thermometer on it, and not add labels with the actual temperature? WHY???
  5. if you don t have dietary restrictions a bit of olive oil would taste even better.
  6. numbers above 2 are all basically the same, right?

15 April 2026

Freexian Collaborators: Debian Contributions: Debusine projects in GSoC, Debian CI updates, Salsa CI maintenance and more! (by Anupa Ann Joseph)

Debian Contributions: 2026-03 Contributing to Debian is part of Freexian s mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

Debusine projects in Google s Summer of Code While Freexian initiated Debusine, and is investing a lot of resources in the project, we manage it as a true free software project that can and should have a broader community. We always had documentation for new contributors and we aim to be reactive with them when they interact via the issue tracker or via merge requests. We decided to put those intentions under stress tests by proposing five projects for Google s Summer of Code as part of Debian s participation in that program. Given that at least 11 candidates managed to get their merge request accepted in the last 30 days (interacting with the development team is part of the pre-requisites to apply to Google Summer of Code projects these days), the contributing experience must not be too bad. If you want to try it out, we maintain a list of quick fixes that are accessible to newcomers. And as always, we welcome your feedback!

Debian CI: incus backend and upgrade to Bootstrap 5, by Antonio Terceiro debci 3.14 was released on March 4th, with a followup 3.14.1 release with regression fixes a few days afterwards. Those releases were followed by new development and maintenance work that will provide extra capabilities and stability to the platform. This month saw the initial version of an incus backend land in Debian CI. The transition into the new backend will be done carefully so as to not disrupt testing migration. Each package will be running jobs with both the current lxc backend and with incus. Packages that have the same result on both backends will be migrated over, and packages that exhibit different results will be investigated further, resulting in bug reports and/or other communication with the maintainers. On the frontend side, the code has been ported to Bootstrap 5 over from the now ancient Bootstrap 3. This need has been originally reported back in 2024 based on the lack of security support for Bootstrap 3. Beyond improving maintainability, this upgrade also enables support for dark mode in debci, which is still work in progress. Both updates mentioned in this section will be available in a following debci release.

Salsa CI maintenance by Santiago Ruano Rinc n et al. Santiago reviewed some Salsa CI issues and reviewed associated merge requests. For example, he investigated a regression (#545), introduced by the move to sbuild, on the use of extra repositories configured as .source files; and reviewed the MR (!712) that fixes it. Also, there were conflicts with changes made in debci 3.14 and debci 3.14.1 (those updates are mentioned above), and different people have contributed to fix the subsequent issues, in a long-term way. This includes Rapha l who proposed MR !707 and who also suggested Antonio to merge the Salsa CI patches to avoid similar errors in the future. This happened shortly after. Those fixes finally required the unrelated MR !709, which will prevent similar problems when building images. To identify bugs related to the autopkgtest support in the backport suites as early as possible, Santiago proposed MR !708. Finally, Santiago, in collaboration with Emmanuel Arias also had exchanges with GSoC candidates for the Salsa CI project, including the contributions they have made as merge requests. It is important to note that there are several very good candidates interested in participating. Thanks a lot to them for their work so far!

Miscellaneous contributions
  • Rapha l reported a zim bug affecting Debian Unstable users, which was already fixed in git apparently. He could thus cherry-pick the fix and update the package in Debian Unstable.
  • Carles created a new page on the InstallingDebianOn in Debian Wiki.
  • Carles submitted translation errors in the debian-installer Weblate.
  • Carles, using po-debconf-manager, improved Catalan translations: reviewed and submitted 3 packages. Also improved error handling when forking or submitting an MR if the fork already existed.
  • Carles kept improving check-relations: code base related general improvements (added strict typing, enabled pre-commit). Also added DebPorts support, virtual packages support and added commands for reporting missing relations and importing bugs from bugs.debian.org.
  • Antonio handled miscellaneous Salsa support requests.
  • Antonio improved the management of MiniDebConf websites by keeping all non-secret settings in git and fixed exporting these sites as static HTML.
  • Stefano uploaded routine updates to hatchling, python-mitogen, python-virtualenv, python-discovery, dh-python, pypy3, python-pipx, and git-filter-repo.
  • Faidon uploaded routine updates to crun, libmaxminddb, librdkafka, lowdown, platformdirs, python-discovery, sphinx-argparse-cli, tox, tox-uv.
  • Stefano and Santiago continued to help with DebConf 26 preparations.
  • Stefano reviewed some contributions to debian-reimbursements and handled admin for reimbursements.debian.net.
  • Stefano attended the Debian Technical Committee meeting.
  • Helmut sent 8 patches for cross build failures.
  • Building on the work of postmarketOS, Helmut managed to cross build systemd for musl in rebootstrap and sent several patches in the process.
  • Helmut reviewed several MRs of Johannes Schauer Marin Rodrigues expanding support for DPKG_ROOT to support installing hurd.
  • Helmut incorporated a final round of feedback for the Multi-Arch documentation in Debian policy, which finally made it into unstable together with documentation of Build-Profiles.
  • In order to fix python-memray, Helmut NMUed libunwind generally disabling C++ exception support as being an incompatible duplication of the gcc implementation. Unfortunately, that ended up breaking suricata on riscv64. After another NMU, python-memray finally migrated.
  • Thorsten uploaded new upstream versions of epson-inkjet-printer-escpr and sane-airscan. He also fixed a packaging bug in printer-driver-oki. As of systemd 260.1-1 the configuration of lpadmin has been added to the sysusers.d configuration. All printing packages can now simply depend on the systemd-sysusers package and don t have to take care of its creation in maintainer scripts anymore.
  • In collaboration with Emmanuel Arias, Santiago had exchanges with GSoC candidates and reviewed the proposals of the Linux livepatching GSoC 2026 project.
  • Colin helped to fix CVE-2026-3497 in openssh and CVE-2026-28356 in multipart.
  • Colin upgraded tango and pytango to new upstream releases and packaged pybind11-stubgen (needed for pytango), thanks to a Freexian customer. Tests of reproducible builds revealed that pybind11-stubgen didn t generate imports in a stable order; this is now fixed upstream.
  • Lucas fixed CVE-2025-67733 and CVE-2026-21863 affecting src:valkey in unstable and testing. Also reviewed the same fixes targeting stable proposed by Peter Wienemann.
  • Faidon worked with upstream and build-dep Debian maintainers on resolving blockers in order to bring pyHanko into Debian, starting with the adoption of python-pyhanko-certvalidator. pyHanko is a suite for signing and stamping PDF files, and one of the few libraries that can be leveraged to sign PDFs with eIDAS Qualified Electronic Signatures.
  • Anupa co-organized MiniDebConf Kanpur and attended the event with many others from all across India. She handled the accommodation arrangements along with the registration team members, worked on the budget and expenses. She was also a speaker at the event.
  • Lucas helped with content review/schedule for the MiniDebConf Campinas. Thanks Freexian for being a Gold sponsor!
  • Lucas organized and took part in a one-day in-person sprint to work on Ruby 3.4 transition. It was held in a coworking space in Brasilia - Brazil on April 6th. There were 5 DDs and they fixed multiple packages FTBFSing against Ruby 3.4 (coming to unstable soon hopefully). Lucas has been postponing a blog post about this sprint since then :-)

29 March 2026

Russ Allbery: Review: The Sovereign

Review: The Sovereign, by C.L. Clark
Series: Magic of the Lost #3
Publisher: Orbit
Copyright: September 2025
ISBN: 0-316-54286-5
Format: Kindle
Pages: 575
The Sovereign is the third and concluding book of C.L. Clark's Magic of the Lost high fantasy trilogy. I recommend reading the books of this series close together, since there are a lot of characters and a lot of continuity between books that is helpful to remember, but it was not quite as difficult this time to remember where the story left off. At the end of The Faithless, the political situation in Balladaire (not-France) was more stable, but the threat of a plague lay on the horizon. That threat arrives in earnest in this book, along with new threats from both Balladaire's former colonial conscript soldiers and from neighboring Taargen (not-Germany, sort of, although the parallel isn't as close). Luca and Touraine have finally admitted that they're deeply in love, but they are still very different people with different goals and ethics. Luca is determined to do anything necessary to save her kingdom, but her definition of her kingdom is sharp and brittle. Touraine is torn between far too many loyalties, plus the lingering worry that her morals and Luca's may not be compatible. I think the hardest part of this sort of series is finding an ending the reader will find satisfying. This one, unfortunately, did not work for me, but that may be more due to personal preference than objective flaws. There have been two threads through this series: an improbable romance embedded in a network of complex personal relationships, and a political commentary on colonialism and post-colonial wars. I was enjoying the former, but it was the latter that felt fresh and interesting to me. The plot threads in The Faithless outside of Balladaire expanded that complexity, and I was hoping the final volume would continue in that direction. How could a colonial power atone for its history? How does the former colony establish its own governance? Is there a path to freedom without violence? Are attempts to chart a more moral course doomed to open lines of attack for one's other enemies? It's clear that Clark was thinking about similar themes, but The Sovereign narrows the field instead of widens it, restricts the political options, and then resolves most questions in a massive war. This is not that surprising of a conclusion, but it's one that I found unsatisfying and, honestly, a little boring. Yes, one way to resolve all the competing tensions is for everyone to try to kill each other and whoever survives wins, and historically that's one of the more likely outcomes, but that ending doesn't wrestle with the politics as much as it collapses them. Clark instead focuses this concluding volume on the romance, which becomes even more fraught, tragic, and dramatic than it was in previous books (and that's saying something). The hard questions of divided loyalties and moral conflicts are mostly framed by questions about Touraine's loyalty to Luca and Luca's trust of Touraine. This is all very Shakespearean, full of hard choices, sudden reversals, miscommunication, and a very deep conflict between Luca's realpolitik and Touraine's stubborn personal morality. If this is what you were reading the series for, if you were hoping for a maximum-drama sapphic relationship, you may thoroughly enjoy this. I thought it had its moments, but I wish they had been balanced by more moments of cool-headed practicality and creative political ingenuity. My biggest frustration with this ending is that the characters largely stop doing politics. The political complexity was the strength of both The Unbroken and The Faithless: People who intensely dislike each other negotiate because there is something larger to be gained, personal decisions made without considering the political ramifications have costs, and multiple characters are trying hard to find a way to turn a nasty, exploitative world into something better without simply killing everyone who disagrees. Many of the characters were objectively bad at politics, inexperienced and immature, but they stumbled or dragged or fought their way into political solutions anyway. I thought Clark moved too far away from that in The Sovereign. Everyone goes deep into their own emotions and desire for vengeance or conquest or revolution and stops compromising. To a depressingly large extent, the story is resolved by killing everyone who disagrees. I think the story is poorer for it. One of the other threads of the series is Balladairan magic, or rather its odd absence. Luca has one understanding of it, the rebels introduced in The Faithless have a different understanding of it, and its pursuit is set up as critical to resolving the threat of a plague. We do get an explanation of sorts, but it's not as complete or as satisfying as I was hoping, and the symbolism of Balladaire's missing magic is left frustratingly murky. For me, this has some of the same problems as the political conclusion: I wanted an intellectual catharsis alongside the emotional catharsis, but that was not the direction Clark was taking the story. I like reading about these characters. All of Luca, Touraine, and Pruett are complex, comprehensible, flawed, and often intriguing. But my favorite character in the story, the person I latched on to as an emotional path through the story, was Sabine. Her refreshingly straightforward loyalty and lack of drama was a breath of fresh air. She has some great moments in this book, but there too I got wrong-footed by the direction Clark went with her arc and found its conclusion deeply unsatisfying. I'm not sure how many of these complaints are because of missed opportunities in the novel, how many were due to a mismatch of taste, and how many were due to not being in the right mood to read this conclusion. I'm sure that it didn't help that I read this simultaneous with another novel in which the characters were always miserable, or that I read it in early 2026 with, uh, all that entails. I suspect that if you came away from the first two books invested in the messy romance and wanting MOAR DRAMA, you may get exactly what you were hoping for. That, sadly, was not what I was hoping for. I can't really recommend this. I thought it dragged in places and didn't deliver the ending I wanted. But it has some great moments, it does wrap up the threads of the trilogy as advertised, and at least the romance gets a dramatic climax worthy of the tension that has been built through the previous books. If that matches what you were enjoying in the previous books, you may well enjoy this more than I did. Rating: 5 out of 10

17 March 2026

Dirk Eddelbuettel: RcppArmadillo 15.2.4-1 on CRAN: Upstream Update

armadillo image Armadillo is a powerful and expressive C++ template library for linear algebra and scientific computing. It aims towards a good balance between speed and ease of use, has a syntax deliberately close to Matlab, and is useful for algorithm development directly in C++, or quick conversion of research code into production environments. RcppArmadillo integrates this library with the R environment and language and is widely used by (currently) 1235 other packages on CRAN, downloaded 44.9 million times (per the partial logs from the cloud mirrors of CRAN), and the CSDA paper (preprint / vignette) by Conrad and myself has been cited 672 times according to Google Scholar. This versions updates to the 15.2.4 upstream Armadillo release from yesterday. The package has already been updated for Debian, and for r2u. This release, which we as usual checked against the reverse-dependencies, brings minor changes over the RcppArmadillo release 15.2.3 made in December (and described here) by addressing some corner-case ASAN/UBSAN reports (which Conrad, true to his style of course labels as false positive just how he initially responded that he would never add a fix based on such a false report; as always it is best to just watch what does as he is rather good at it, and, written comments notwithstanding, quite responsive) as well as speed-ups for empty sparse matrices. I made one more follow-up refinement on the OpenMP setup which should now just work on all suitable platforms. The detailed changes since the last release follow.

Changes in RcppArmadillo version 15.2.4-1 (2026-03-17)
  • Upgraded to Armadillo release 15.2.4 (Medium Roast Deluxe)
    • Workarounds for bugs in GCC and Clang sanitisers (ASAN false positives)
    • Faster handling of blank sparse matrices
  • Refined OpenMP setup (Dirk in #500)

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the Rcpp R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

13 March 2026

Hellen Chemtai: One week later after the Outreachy internship: Managing Work-Life Balance

Hello world. I have been doing a lot after my internship with Outreachy. We are still working on some tasks :

  1. I am working on running locales for my native language in live images.
  2. I am also working on points to add to talk proposals for a Debian conference.

As I am moving around constantly, there are problems I had encountered when changing my networks. I had to connect my virtual machine to different networks and the network would not reflect within the machine. From terminal I edited the virtual machine XML settings:

su -
// input password
sudo virsh edit <machine_name> #its openqa for me
// Look for the interface within devices and replace this:
<interface type=&aposnetwork&apos>
<source network=&aposdefault&apos/>
#some other code in here
</interface>
// With just this then restart your machine:
<interface type=&aposuser&apos>
<model type=&aposvirtio&apos/>
</interface>

Hopefully the above will help someone out there. I am still working on a lot of tasks regarding the conference, so much to do and so little time. I am hoping I won t get any burnout during this period. I won t be updating much further till the conference. Have a nice time

10 March 2026

Freexian Collaborators: Debian Contributions: Opening DebConf 26 Registration, Debian CI improvements and more! (by Anupa Ann Joseph)

Debian Contributions: 2026-02 Contributing to Debian is part of Freexian s mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

DebConf 26 Registration, by Stefano Rivera, Antonio Terceiro, and Santiago Ruano Rinc n DebConf 26, to be held in Santa Fe Argentina in July, has opened for registration and event proposals. Stefano, Antonio, and Santiago all contributed to making this happen. As always, some changes needed to be made to the registration system. Bigger changes were planned, but we ran out of time to implement them for DebConf 26. All 3 of us have had experience in hosting local DebConf events in the past and have been advising the DebConf 26 local team.

Debian CI improvements, by Antonio Terceiro Debian CI is the platform responsible for automated testing of packages from the Debian archive, and its results are used by the Debian Release team automation as Quality Assurance to control the migration of packages from Debian unstable into testing, the base for the next Debian release. Antonio started developing an incus backend, and that prompted two rounds of improvements to the platform, including but not limited to allowing user to select a job execution backend (lxc, qemu) during the job submission, reducing the part of testbed image creation that requires superuser privileges and other refactorings and bug fixes. The platform API was also improved to reduce disruption when reporting results to the Release Team automation after service downtimes. Last, but not least, the platform now has support for testing packages against variants of autopkgtest, which will allow the Debian CI team to test new versions of autopkgtest before making releases to avoid widespread regressions.

Miscellaneous contributions
  • Carles improved po-debconf-manager while users requested features / found bugs. Improvements done - add packages from unstable instead of just salsa.debian.org, upgrade and merge templates of upgraded packages, finished adding typing annotations, improved deleting packages: support multiple line texts, add debug to see subprocess.run commands, etc.
  • Carles, using po-debconf-manager, reviewed 7 Catalan translations and sent bug reports or MRs for 11 packages. Also reviewed the translations of fortunes-debian-hints and submitted possible changes in the hints.
  • Carles submitted MRs for reportbug (reportbug --ui gtk detecting the wrong dependencies), devscript (delete unused code from debrebuild and add recommended dependency), wcurl (format help for 80 columns). Carles submitted a bug report for apt not showing the long descriptions of packages.
  • Carles resumed effort for checking relations (e.g. Recommends / Suggests) between Debian packages. A new codebase (still in early stages) was started with a new approach in order to detect, report and track the broken relations.
  • Emilio drove several transitions, most notably the haskell transition and the glibc/gcc-15/zlib transition for the s390 31-bit removal. This last one included reviewing and requeueing lots of autopkgtests due to britney losing a lot of results.
  • Emilio reviewed and uploaded poppler updates to experimental for a new transition.
  • Emilio reviewed, merged and deployed some performance improvements proposed for the security-tracker.
  • Stefano prepared routine updates for pycparser, python-confuse, python-cffi, python-mitogen, python-pip, wheel, platformdirs, python-authlib, and python-virtualenv.
  • Stefano updated Python 3.13 and 3.14 to the latest point releases, including security updates, and did some preliminary work for Python 3.15.
  • Stefano reviewed changes to dh-python and merged MRs.
  • Stefano did some debian.social sysadmin work, bridging additional IRC channels to Matrix.
  • Stefano and Antonio, as DebConf Committee Members, reviewed the DebConf 27 bids and took part in selecting the Japanese bid to host DebConf 27.
  • Helmut sent patches for 29 cross build failures.
  • Helmut continued to maintain rebootstrap addressing issues relating to specific architectures (such as musl-linux-any, hurd-any or s390x) or specific packages (such as binutils, brotli or fontconfig).
  • Helmut worked on diagnosing bugs such as rocblas #1126608, python-memray #1126944 upstream and greetd #1129070 with varying success.
  • Antonio provided support for multiple MiniDebConfs whose websites run wafer + wafer-debconf (the same stack as DebConf itself).
  • Antonio fixed the salsa tagpending webhook.
  • Antonio sent specinfra upstream a patch to fix detection of Debian systems in some situations.
  • Santiago reviewed some Merge Requests for the Salsa CI pipeline, including !703 and !704, that aim to improve how the build source job is handled by Salsa CI. Thanks a lot to Jochen for his work on this.
  • In collaboration with Emmanuel Arias, Santiago proposed a couple of projects for the Google Summer of Code (GSoC) 2026 round. Santiago has been reviewing applications and giving feedback to candidates.
  • Thorsten uploaded new upstream versions of ipp-usb, brlaser and gutenprint.
  • Rapha l updated publican to fix an old bug that became release critical and that happened only when building with the nocheck profile. Publican is a build dependency of the Debian s Administrator Handbook and with that fix, the package is back into testing.
  • Rapha l implemented a small feature in Debusine that makes it possible to refer to a collection in a parent workspace even if a collection with the same name is present in the current workspace.
  • Lucas updated the current status of ruby packages affecting the Ruby 3.4 transition after a bunch of updates made by team members. He will follow up on this next month.
  • Lucas joined the Debian orga team for GSoC this year and tried to reach out to potential mentors.
  • Lucas did some content work for MiniDebConf Campinas - Brazil.
  • Colin published minor security updates to bookworm and trixie for CVE-2025-61984 and CVE-2025-61985 in OpenSSH, both of which allowed code execution via ProxyCommand in some cases. The trixie update also included a fix for mishandling of PerSourceMaxStartups.
  • Colin spotted and fixed a typo in the bug tracking system s spam-handling rules, which in combination with a devscripts regression caused bts forwarded commands to be discarded.
  • Colin ported 12 more Python packages away from using the deprecated (and now removed upstream) pkg_resources module.
  • Anupa is co-organizing MiniDebConf Kanpur with Debian India team. Anupa was responsible for preparing the schedule, publishing it on the website, co-ordination with the fiscal host in addition to attending meetings.
  • Anupa attended the Debian Publicity team online sprint which was a skill sharing session.

5 March 2026

Vincent Bernat: Automatic Prometheus metrics discovery with Docker labels

Akvorado, a network flow collector, relies on Traefik, a reverse HTTP proxy, to expose HTTP endpoints for its Docker Compose services. Docker labels attached to each service define the routing rules. Traefik picks them up automatically when a container starts. Instead of maintaining a static configuration file to collect Prometheus metrics, we apply the same approach with Grafana Alloy.

Traefik & Docker Traefik listens for events on the Docker socket. Each service advertises its configuration through labels. For example, here is the Loki service in Akvorado:
services:
  loki:
    #  
    expose:
      - 3100/tcp
    labels:
      - traefik.enable=true
      - traefik.http.routers.loki.rule=PathPrefix( /loki )
Once the container is healthy, Traefik creates a router forwarding requests matching /loki to its first exposed port. Colocating Traefik configuration with the service definition is attractive. How do we achieve the same for Prometheus metrics?

Metrics discovery with Alloy Grafana Alloy, a metrics collector that scrapes Prometheus endpoints, includes a discovery.docker component. Just like Traefik, it connects to the Docker socket.1 With a few relabeling rules, we teach it to use Docker labels to locate and scrape metrics. We define three labels on each service:
  • metrics.enable set to true enables metrics collection,
  • metrics.port specifies the port exposing the Prometheus endpoint, and
  • metrics.path specifies the path to the metrics endpoint.
If a service exposes more than one port, metrics.port is mandatory. Otherwise, it defaults to the only exposed port. The default value for metrics.path is /metrics. The Loki service from earlier becomes:
services:
  loki:
    #  
    expose:
      - 3100/tcp
    labels:
      - traefik.enable=true
      - traefik.http.routers.loki.rule=PathPrefix( /loki )
      - metrics.enable=true
      - metrics.path=/loki/metrics
Alloy s configuration is split into four parts:
  1. discover containers through the Docker socket,
  2. filter and relabel targets using Docker labels,
  3. scrape the matching endpoints, and
  4. forward the metrics to Prometheus.

Discovering Docker containers The first building block discovers running containers:
discovery.docker "docker"  
  host             = "unix:///var/run/docker.sock"
  refresh_interval = "30s"
  filter  
    name   = "label"
    values = ["com.docker.compose.project=akvorado"]
   
 
This connects to the Docker socket and lists containers every 30 seconds.2 The filter block restricts discovery to containers belonging to the akvorado project, avoiding interference with unrelated containers on the same host. For each discovered container, Alloy produces a target with labels such as __meta_docker_container_label_metrics_port for the metrics.port Docker label.

Relabeling targets The relabeling step filters and transforms raw targets from Docker discovery into scrape targets. The first stage keeps only targets with metrics.enable set to true:
discovery.relabel "prometheus"  
  targets = discovery.docker.docker.targets
  // Keep only targets with metrics.enable=true
  rule  
    source_labels = ["__meta_docker_container_label_metrics_enable"]
    regex         =  true 
    action        = "keep"
   
  //  
 
The second stage overrides the discovered port when the service defines metrics.port:
// When metrics.port is set, override __address__.
rule  
  source_labels = ["__address__", "__meta_docker_container_label_metrics_port"]
  regex         =  (.+):\d+;(.+) 
  target_label  = "__address__"
  replacement   = "$1:$2"
 
Next, we handle containers in host network mode. When __meta_docker_network_name equals host, Alloy rewrites the address to host.docker.internal instead of localhost:3
// When host networking, override __address__ to host.docker.internal.
rule  
  source_labels = ["__meta_docker_container_label_metrics_port", "__meta_docker_network_name"]
  regex         =  (.+);host 
  target_label  = "__address__"
  replacement   = "host.docker.internal:$1"
 
The next stage derives the job name from the service name, stripping any numbered suffix. The instance label is the address without the port:
rule  
  source_labels = ["__meta_docker_container_label_com_docker_compose_service"]
  regex         =  (.+)(?:-\d+)? 
  target_label  = "job"
 
rule  
  source_labels = ["__address__"]
  regex         =  (.+):\d+ 
  target_label  = "instance"
 
If a container defines metrics.path, Alloy uses it. Otherwise, it defaults to /metrics:
rule  
  source_labels = ["__meta_docker_container_label_metrics_path"]
  regex         =  (.+) 
  target_label  = "__metrics_path__"
 
rule  
  source_labels = ["__metrics_path__"]
  regex         = ""
  target_label  = "__metrics_path__"
  replacement   = "/metrics"
 

Scraping and forwarding With the targets properly relabeled, scraping and forwarding are straightforward:
prometheus.scrape "docker"  
  targets         = discovery.relabel.prometheus.output
  forward_to      = [prometheus.remote_write.default.receiver]
  scrape_interval = "30s"
 
prometheus.remote_write "default"  
  endpoint  
    url = "http://prometheus:9090/api/v1/write"
   
 
prometheus.scrape periodically fetches metrics from the discovered targets. prometheus.remote_write sends them to Prometheus.

Built-in exporters Some services do not expose a Prometheus endpoint. Redis and Kafka are common examples. Alloy ships built-in Prometheus exporters that query these services and expose metrics on their behalf.
prometheus.exporter.redis "docker"  
  redis_addr = "redis:6379"
 
discovery.relabel "redis"  
  targets = prometheus.exporter.redis.docker.targets
  rule  
    target_label = "job"
    replacement  = "redis"
   
 
prometheus.scrape "redis"  
  targets         = discovery.relabel.redis.output
  forward_to      = [prometheus.remote_write.default.receiver]
  scrape_interval = "30s"
 
The same pattern applies to Kafka:
prometheus.exporter.kafka "docker"  
  kafka_uris = ["kafka:9092"]
 
discovery.relabel "kafka"  
  targets = prometheus.exporter.kafka.docker.targets
  rule  
    target_label = "job"
    replacement  = "kafka"
   
 
prometheus.scrape "kafka"  
  targets         = discovery.relabel.kafka.output
  forward_to      = [prometheus.remote_write.default.receiver]
  scrape_interval = "30s"
 
Each exporter is a separate component with its own relabeling and scrape configuration. We set the job label explicitly since no Docker metadata can provide it.
With this setup, adding metrics to a new service with a Prometheus endpoint requires only a few labels in docker-compose.yml, just like adding a Traefik route. Alloy picks it up automatically. You can apply the same pattern with another discovery method, like discovery.kubernetes, discovery.scaleway, or discovery.http.

  1. Both Traefik and Alloy require access to the Docker socket, which grants root-level access to the host. A Docker socket proxy mitigates this by exposing only the read-only API endpoints needed for discovery.
  2. Unlike Traefik, which watches for events, Grafana Alloy polls the container list at regular intervals a behavior inherited from Prometheus.
  3. The Alloy service needs extra_hosts: ["host.docker.internal:host-gateway"] in its definition.

3 March 2026

Matthew Garrett: To update blobs or not to update blobs

A lot of hardware runs non-free software. Sometimes that non-free software is in ROM. Sometimes it s in flash. Sometimes it s not stored on the device at all, it s pushed into it at runtime by another piece of hardware or by the operating system. We typically refer to this software as firmware to differentiate it from the software run on the CPU after the OS has started1, but a lot of it (and, these days, probably most of it) is software written in C or some other systems programming language and targeting Arm or RISC-V or maybe MIPS and even sometimes x862. There s no real distinction between it and any other bit of software you run, except it s generally not run within the context of the OS3. Anyway. It s code. I m going to simplify things here and stop using the words software or firmware and just say code instead, because that way we don t need to worry about semantics. A fundamental problem for free software enthusiasts is that almost all of the code we re talking about here is non-free. In some cases, it s cryptographically signed in a way that makes it difficult or impossible to replace it with free code. In some cases it s even encrypted, such that even examining the code is impossible. But because it s code, sometimes the vendor responsible for it will provide updates, and now you get to choose whether or not to apply those updates. I m now going to present some things to consider. These are not in any particular order and are not intended to form any sort of argument in themselves, but are representative of the opinions you will get from various people and I would like you to read these, think about them, and come to your own set of opinions before I tell you what my opinion is. THINGS TO CONSIDER Ok we re done with the things to consider. Please spend a few seconds thinking about what the tradeoffs are here and what your feelings are. Proceed when ready. I trust my CPU vendor. I don t trust my CPU vendor because I want to, I trust my CPU vendor because I have no choice. I don t think it s likely that my CPU vendor has designed a CPU that identifies when I m generating cryptographic keys and biases the RNG output so my keys are significantly weaker than they look, but it s not literally impossible. I generate keys on it anyway, because what choice do I have? At some point I will buy a new laptop because Electron will no longer fit in 32GB of RAM and I will have to make the same affirmation of trust, because the alternative is that I just don t have a computer. And in any case, I will be communicating with other people who generated their keys on CPUs I have no control over, and I will also be relying on them to be trustworthy. If I refuse to trust my CPU then I don t get to computer, and if I don t get to computer then I will be sad. I suspect I m not alone here. Why would I install a code update on my CPU when my CPU s job is to run my code in the first place? Because it turns out that CPUs are complicated and messy and they have their own bugs, and those bugs may be functional (for example, some performance counter functionality was broken on Sandybridge at release, and was then fixed with a microcode blob update) and if you update it your hardware works better. Or it might be that you re running a CPU with speculative execution bugs and there s a microcode update that provides a mitigation for that even if your CPU is slower when you enable it, but at least now you can run virtual machines without code in those virtual machines being able to reach outside the hypervisor boundary and extract secrets from other contexts. When it s put that way, why would I not install the update? And the straightforward answer is that theoretically it could include new code that doesn t act in my interests, either deliberately or not. And, yes, this is theoretically possible. Of course, if you don t trust your CPU vendor, why are you buying CPUs from them, but well maybe they ve been corrupted (in which case don t buy any new CPUs from them either) or maybe they ve just introduced a new vulnerability by accident, and also you re in a position to determine whether the alleged security improvements matter to you at all. Do you care about speculative execution attacks if all software running on your system is trustworthy? Probably not! Do you need to update a blob that fixes something you don t care about and which might introduce some sort of vulnerability? Seems like no! But there s a difference between a recommendation for a fully informed device owner who has a full understanding of threats, and a recommendation for an average user who just wants their computer to work and to not be ransomwared. A code update on a wifi card may introduce a backdoor, or it may fix the ability for someone to compromise your machine with a hostile access point. Most people are just not going to be in a position to figure out which is more likely, and there s no single answer that s correct for everyone. What we do know is that where vulnerabilities in this sort of code have been discovered, updates have tended to fix them - but nobody has flagged such an update as a real-world vector for system compromise. My personal opinion? You should make your own mind up, but also you shouldn t impose that choice on others, because your threat model is not necessarily their threat model. Code updates are a reasonable default, but they shouldn t be unilaterally imposed, and nor should they be blocked outright. And the best way to shift the balance of power away from vendors who insist on distributing non-free blobs is to demonstrate the benefits gained from them being free - a vendor who ships free code on their system enables their customers to improve their code and enable new functionality and make their hardware more attractive. It s impossible to say with absolute certainty that your security will be improved by installing code blobs. It s also impossible to say with absolute certainty that it won t. So far evidence tends to support the idea that most updates that claim to fix security issues do, and there s not a lot of evidence to support the idea that updates add new backdoors. Overall I d say that providing the updates is likely the right default for most users - and that that should never be strongly enforced, because people should be allowed to define their own security model, and whatever set of threats I m worried about, someone else may have a good reason to focus on different ones.

  1. Code that runs on the CPU before the OS is still usually described as firmware - UEFI is firmware even though it s executing on the CPU, which should give a strong indication that the difference between firmware and software is largely arbitrary
  2. And, obviously 8051
  3. Because UEFI makes everything more complicated, UEFI makes this more complicated. Triggering a UEFI runtime service involves your OS jumping into firmware code at runtime, in the same context as the OS kernel. Sometimes this will trigger a jump into System Management Mode, but other times it won t, and it s just your kernel executing code that got dumped into RAM when your system booted.
  4. I don t understand most of the diff between one kernel version and the next, and I don t have time to read all of it either.
  5. There s a bunch of reasons to do this, the most reasonable of which is probably not wanting customers to replace the code and break their hardware and deal with the support overhead of that, but not being able to replace code running on hardware I own is always going to be an affront to me.

23 February 2026

Antoine Beaupr : PSA: North america changes time forward soon, Europe next

This is a copy of an email I used to send internally at work and now made public. I'm not sure I'll make a habit of posting it here, especially not twice a year, unless people really like it. Right now, it's mostly here to keep with my current writing spree going.
This is your bi-yearly reminder that time is changing soon!

What's happening? For people not on tor-internal, you should know that I've been sending semi-regular announcements when daylight saving changes occur. Starting now, I'm making those announcements public so they can be shared with the wider community because, after all, this affects everyone (kind of). For those of you lucky enough to have no idea what I'm talking about, you should know that some places in the world implement what is called Daylight saving time or DST. Normally, you shouldn't have to do anything: computers automatically change time following local rules, assuming they are correctly configured, provided recent updates have been applied in the case of a recent change in said rules (because yes, this happens). Appliances, of course, will likely not change time and will need to adjusted unless they are so-called "smart" (also known as "part of a bot net"). If your clock is flashing "0:00" or "12:00", you have no action to take, congratulations on having the right time once or twice a day. If you haven't changed those clocks in six months, congratulations, they will be accurate again! In any case, you should still consider DST because it might affect some of your meeting schedules, particularly if you set up a new meeting schedule in the last 6 months and forgot to consider this change.

If your location does not have DST Properly scheduled meetings affecting multiple time zones are set in UTC time, which does not change. So if your location does not observer time changes, your (local!) meeting time will not change. But be aware that some other folks attending your meeting might have the DST bug and their meeting times will change. They might miss entire meetings or arrive late as you frantically ping them over IRC, Matrix, Signal, SMS, Ricochet, Mattermost, SimpleX, Whatsapp, Discord, Slack, Wechat, Snapchat, Telegram, XMPP, Briar, Zulip, RocketChat, DeltaChat, talk(1), write(1), actual telegrams, Meshtastic, Meshcore, Reticulum, APRS, snail mail, and, finally, flying a remote presence drone to their house, asking what's going on. (Sorry if I forgot your preferred messaging client here, I tried my best.) Be kind; those poor folks might be more sleep deprived as DST steals one hour of sleep from them on the night that implements the change.

If you do observe DST If you are affected by the DST bug, your local meeting times will change access the board. Normally, you can trust that your meetings are scheduled to take this change into account and the new time should still be reasonable. Trust, but verify; make sure the new times are adequate and there are no scheduling conflicts. Do this now: take a look at your calendar in two week and in April. See if any meeting need to be rescheduled because of an impossible or conflicting time.

When does time change, how and where? Notice how I mentioned "North America" in the subject? That's a lie. ("The doctor lies", as they say on the BBC.) Other places, including Europe, also changes times, just not all at once (and not all North America). We'll get into "where" soon, but first let's look at the "how". As you might already know, the trick is:
Spring forward, fall backwards.
This northern-centric (sorry!) proverb says that clocks will move forward by an hour this "spring", after moving backwards last "fall". This is why we lose an hour of work, sorry, sleep. It sucks, to put it bluntly. I want it to stop and will keep writing those advisories until it does. To see where and when, we, unfortunately, still need to go into politics.

USA and Canada First, we start with "North America" which, really, is just some parts of USA[1] and Canada[2]. As usual, on the Second Sunday in March (the 8th) at 02:00 local (not UTC!), the clocks will move forward. This means that properly set clocks will flip from 1:59 to 3:00, coldly depriving us from an hour of sleep that was perniciously granted 6 months ago and making calendar software stupidly hard to write. Practically, set your wrist watch and alarm clocks[3] back one hour before going to bed and go to bed early. [1] except Arizona (except the Navajo nation), US territories, and Hawaii [2] except Yukon, most of Saskatchewan, and parts of British Columbia (northeast), one island in Nunavut (Southampton Island), one town in Ontario (Atikokan) and small parts of Quebec (Le Golfe-du-Saint-Laurent), a list which I keep recopying because I find it just so amazing how chaotic it is. When your clock has its own Wikipedia page, you know something is wrong. [3] hopefully not managed by a botnet, otherwise kindly ask your bot net operator to apply proper software upgrades in a timely manner

Europe Next we look at our dear Europe, which will change time on the last Sunday in March (the 29th) at 01:00 UTC (not local!). I think it means that, Amsterdam-time, the clocks will flip from 1:59 to 3:00 AM local on that night. (Every time I write this, I have doubts. I would welcome independent confirmation from night owls that observe that funky behavior experimentally.) Just like your poor fellows out west, just fix your old-school clocks before going to bed, and go to sleep early, it's good for you.

Rest of the world with DST Renewed and recurring apologies again to the people of Cuba, Mexico, Moldova, Israel, Lebanon, Palestine, Egypt, Chile (except Magallanes Region), parts of Australia, and New Zealand which all have their own individual DST rules, omitted here for brevity. In general, changes also happen in March, but either on different times or different days, except in the south hemisphere, where they happen in April.

Rest of the world without DST All of you other folks without DST, rejoice! Thank you for reminding us how manage calendars and clocks normally. Sometimes, doing nothing is precisely the right thing to do. You're an inspiration for us all.

Changes since last time There were, again, no changes since last year on daylight savings that I'm aware of. It seems the US congress debating switching to a "half-daylight" time zone which is an half-baked idea that I should have expected from the current USA politics. The plan is to, say, switch from "Eastern is UTC-4 in the summer" to "Eastern is UTC-4.5". The bill also proposes to do this 90 days after enactment, which is dangerously optimistic about our capacity at deploying any significant change in human society. In general, I rely on the Wikipedia time nerds for this and Paul Eggert which seems to singlehandledly be keeping everything in order for all of us, on the tz-announce mailing list. This time, I've also looked at the tz mailing list which is where I learned about the congress bill. If your country has changed time and no one above noticed, now would be an extremely late time to do something about this, typically writing to the above list. (Incredibly, I need to write to the list because of this post.) One thing that did change since last year is that I've implemented what I hope to be a robust calendar for this, which was surprisingly tricky. If you have access to our Nextcloud, it should be visible under the heading "Daylight saving times". If you don't, you can access it using this direct link. The procedures around how this calendar was created, how this email was written, and curses found along the way, are also documented in this wiki page, if someone ever needs to pick up the Time Lord duty.

12 February 2026

Freexian Collaborators: Debian Contributions: cross building, rebootstrap updates, Refresh of the patch tagging guidelines and more! (by Anupa Ann Joseph)

Debian Contributions: 2026-01 Contributing to Debian is part of Freexian s mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

cross building, by Helmut Grohne In version 1.10.1, Meson merged a patch to make it call the correct g-ir-scanner by default thanks to Eli Schwarz. This problem affected more than 130 source packages. Helmut retried building them all and filed 69 patches as a result. A significant portion of those packages require another Meson change to call the correct vapigen. Another notable change is converting gnu-efi to multiarch, which ended up requiring changes to a number of other packages. Since Aurelien dropped the libcrypt-dev dependency from libc6-dev, this transition now is mostly complete and has resulted in most of the Perl ecosystem correctly expressing perl-xs-dev dependencies needed for cross building. It is these infrastructure changes affecting several client packages that this work targets. As a result of this continued work, about 66% of Debian s source packages now have satisfiable cross Build-Depends in unstable and about 10000 (55%) actually can be cross built. There are now more than 500 open bug reports affecting more than 2000 packages most of which carry patches.

rebootstrap, by Helmut Grohne Maintaining architecture cross-bootstrap requires continued effort for adapting to archive changes such as glib2.0 dropping a build profile or an e2fsprogs FTBFS. Beyond those generic problems, architecture-specific problems with e.g. musl-linux-any or sparc may arise. While all these changes move things forward on the surface, the bootstrap tooling has become a growing pile of patches. Helmut managed to upstream two changes to glibc for reducing its Build-Depends in the stage2 build profile and thanks Aurelien Jarno.

Refresh of the patch tagging guidelines, by Rapha l Hertzog Debian Enhancement Proposal #3 (DEP-3) is named Patch Tagging Guidelines and standardizes meta-information that Debian contributors can put in patches included in Debian source packages. With the feedback received over the years, and with the change in the package management landscape, the need to refresh those guidelines became evident. As the initial driver of that DEP, I spent a good day reviewing all the feedback (that I kept in a folder) and producing a new version of the document. The changes aim to give more weight to the syntax that is compatible with git format-patch s output, and also to clarify the expected uses and meanings of a couple of fields, including some algorithm that parsers should follow to define the state of the patch. After the announcement of the new draft on debian-devel, the revised DEP-3 received a significant number of comments that I still have to process.

Miscellaneous contributions
  • Helmut uploaded debvm making it work with unstable as a target distribution again.
  • Helmut modernized the code base backing dedup.debian.net significantly expanding the support for type checking.
  • Helmut fixed the multiarch hinter once more given feedback from Fabian Gr nbichler.
  • Helmut worked on migrating the rocblas package to forky.
  • Rapha l fixed RC bug #1111812 in publican and did some maintenance for tracker.debian.org.
  • Carles added support in the festival Debian package for systemd socket activation and systemd service and socket units. Adapted the patch for upstream and created a merge request (also fixed a MacOS X building system error while working on it). Updated Orca Wiki documentation regarding festival. Discussed a 2007 bug/feature in festival which allowed having a local shell and that the new systemd socket activation has the same code path.
  • Carles using po-debconf-manager worked on Catalan translations: 7 reviewed and sent; 5 follow ups, 5 deleted packages.
  • Carles made some po-debconf-manager changes: now it attaches the translation file on follow ups, fixed bullseye compatibility issues.
  • Carles reviewed a new Catalan apt translation.
  • Carles investigated and reported a lxhotkey bug and sent a patch for the abcde package.
  • Carles made minor updates for Debian Wiki for different pages (lxde for dead keys, Ripping with abcde troubleshooting, VirtualBox troubleshooting).
  • Stefano renamed build-details.json in Python 3.14 to fix multiarch coinstallability.
  • Stefano audited the tooling and ignore lists for checking the contents of the python3.X-minimal packages, finding and fixing some issues in the process.
  • Stefano made a few uploads of python3-defaults and dh-python in support of Python 3.14-as-default in Ubuntu. Also investigated the risk of ignoring byte-compilation failures by default, and started down the road of implementing this.
  • Stefano did some sysadmin work on debian.social infrastructure.
  • Stefano and Santiago worked on preparations for DebConf 26. Especially to help the local team on opening the registration, and reviewing the budget to be presented for approval.
  • Stefano uploaded routine updates of python-virtualenv and python-flexmock.
  • Antonio collaborated with DSA on enabling a new proxy for salsa to prevent scrapers from taking the service down.
  • Antonio did miscellaneous salsa administrative tasks.
  • Antonio fixed a few Ruby packages towards the Ruby 3.4 transition.
  • Antonio started work on planned improvements to the DebConf registration system.
  • Santiago prepared unstable updates for the latest upstream versions of knot-dns and knot-resolver. The authoritative DNS server and DNS resolver software developed by CZ.NIC. It is worth highlighting that, given the separation of functionality compared to other implementations, knot-dns and knot-resolver are also less complex software, which results in advantages in terms of security: only three CVEs have been reported for knot-dns since 2011).
  • Santiago made some routine reviews of merge requests proposed for the Salsa CI s pipeline. E.g. a proposal to fix how sbuild chooses the chroot when building a package for experimental.
  • Colin fixed lots of Python packages to handle Python 3.14 and to avoid using the deprecated pkg_resources module.
  • Colin added forky support to the images used in Salsa CI pipelines.
  • Colin began working on getting a release candidate of groff 1.24.0 (the first upstream release since mid-2023, so a very large set of changes) into experimental.
  • Lucas kept working on the preparation for Ruby 3.4 transition. Some packages fixed (support build against Ruby 3.3 and 3.4): ruby-rbpdf, jekyll, origami-pdf, ruby-kdl, ruby-twitter, ruby-twitter-text, ruby-globalid.
  • Lucas supported some potential mentors in the Google Summer of Code 26 program to submit their projects.
  • Anupa worked on the point release announcements for Debian 12.13 and 13.3 from the Debian publicity team side.
  • Anupa attended the publicity team meeting to discuss the team activities and to plan an online sprint in February.
  • Anupa attended meetings with the Debian India team to plan and coordinate the MinDebConf Kanpur and sent out related Micronews.
  • Emilio coordinated various transitions and helped get rid of llvm-toolchain-17 from sid.

13 January 2026

Freexian Collaborators: Debian Contributions: dh-python development, Python 3.14 and Ruby 3.4 transitions, Surviving scraper traffic in Debian CI and more! (by Anupa Ann Joseph)

Debian Contributions: 2025-12 Contributing to Debian is part of Freexian s mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

dh-python development, by Stefano Rivera In Debian we build our Python packages with the help of a debhelper-compatible tool, dh-python. Before starting the 3.14 transition (that would rebuild many packages) we landed some updates to dh-python to fix bugs and add features. This started a month of attention on dh-python, iterating through several bug fixes, and a couple of unfortunate regressions. dh-python is used by almost all packages containing Python (over 5000). Most of these are very simple, but some are complex and use dh-python in unexpected ways. It s hard to avoid almost any change (including obvious bug fixes) from causing some unexpected knock-on behaviour. There is a fair amount of complexity in dh-python, and some rather clever code, which can make it tricky to work on. All of this means that good QA is important. Stefano spent some time adding type annotations and specialized types to make it easier to see what the code is doing and catch mistakes. This has already made work on dh-python easier. Now that Debusine has built-in repositories and debdiff support, Stefano could quickly test the effects of changes on many other packages. After each big change, he could upload dh-python to a repository, rebuild e.g. 50 Python packages with it, and see what differences appeared in the output. Reviewing the diffs is still a manual process, but can be improved. Stefano did a small test on what it would take to replace direct setuptools setup.py calls with PEP-517 (pyproject-style) builds. There is more work to do here.

Python 3.14 transition, by Stefano Rivera (et al.) In December the transition to add Python 3.14 as a supported version started in Debian unstable. To do this, we update the list of supported versions in python3-defaults, and then start rebuilding modules with C extensions from the leaves inwards. This had already been tested in a PPA and Ubuntu, so many of the biggest blocking compatibility issues with 3.14 had already been found and fixed. But there are always new issues to discover. Thanks to a number of people in the Debian Python team, we got through the first bit of the transition fairly quickly. There are still a number of open bugs that need attention and many failed tests blocking migration to testing. Python 3.14.1 released just after we started the transition, and very soon after, a follow-up 3.14.2 release came out to address a regression. We ran into another regression in Python 3.14.2.

Ruby 3.4 transition, by Lucas Kanashiro (et al.) The Debian Ruby team just started the preparation to move the default Ruby interpreter version to 3.4. At the moment, ruby3.4 source package is already available in experimental, also ruby-defaults added support to Ruby 3.4. Lucas rebuilt all reverse dependencies against this new version of the interpreter and published the results here. Lucas also reached out to some stakeholders to coordinate the work. Next steps are: 1) announcing the results to the whole team and asking for help to fix packages failing to build against the new interpreter; 2) file bugs against packages FTBFSing against Ruby 3.4 which are not fixed yet; 3) once we have a low number of build failures against Ruby 3.4, ask the Debian Release team to start the transition in unstable.

Surviving scraper traffic in Debian CI, by Antonio Terceiro Like most of the open web, Debian Continuous Integration has been struggling for a while to keep up with the insatiable hunger from data scrapers everywhere. Solving this involved a lot of trial and error; the final result seems to be stable, and consists of two parts. First, all Debian CI data pages, except the direct links to test log files (such as those provided by the Release Team s testing migration excuses), now require users to be authenticated before being accessed. This means that the Debian CI data is no longer publicly browseable, which is a bit sad. However, this is where we are now. Additionally, there is now a fail2ban powered firewall-level access limitation for clients that display an abusive access pattern. This went through several iterations, with some of them unfortunately blocking legitimate Debian contributors, but the current state seems to strike a good balance between blocking scrapers and not blocking real users. Please get in touch with the team on the #debci OFTC channel if you are affected by this.

A hybrid dependency solver for crossqa.debian.net, by Helmut Grohne crossqa.debian.net continuously cross builds packages from the Debian archive. Like Debian s native build infrastructure, it uses dose-builddebcheck to determine whether a package s dependencies can be satisfied before attempting a build. About one third of Debian s packages fail this check, so understanding the reasons is key to improving cross building. Unfortunately, dose-builddebcheck stops after reporting the first problem and does not display additional ones. To address this, a greedy solver implemented in Python now examines each build-dependency individually and can report multiple causes. dose-builddebcheck is still used as a fall-back when the greedy solver does not identify any problems. The report for bazel-bootstrap is a lengthy example.

rebootstrap, by Helmut Grohne Due to the changes suggested by Loongson earlier, rebootstrap now adds debhelper to its final installability test and builds a few more packages required for installing it. It also now uses a variant of build-essential that has been marked Multi-Arch: same (see foundational work from last year). This in turn made the use of a non-default GCC version more difficult and required more work to make it work for gcc-16 from experimental. Ongoing archive changes temporarily regressed building fribidi and dash. libselinux and groff have received patches for architecture specific changes and libverto has been NMUed to remove the glib2.0 dependency.

Miscellaneous contributions
  • Stefano did some administrative work on debian.social and debian.net instances and Debian reimbursements.
  • Stefano did routine updates of python-authlib, python-mitogen, xdot.
  • Stefano spent several hours discussing Debian s Python package layout with the PyPA upstream community. Debian has ended up with a very different on-disk installed Python layout than other distributions, and this continues to cause some frustration in many communities that have to have special workarounds to handle it. This ended up impacting cross builds as Helmut discovered.
  • Rapha l set up Debusine workflows for the various backports repositories on debusine.debian.net.
  • Zulip is not yet in Debian (RFP in #800052), but Rapha l helped on the French translation as he is experimenting with that discussion platform.
  • Antonio performed several routine Salsa maintenance tasks, including fixing salsa-nm-sync, the service that synchronizes project members data from LDAP to Salsa, which had been broken since salsa.debian.org was upgraded to trixie .
  • Antonio deployed a new amd64 worker host for Debian CI.
  • Antonio did several DebConf technical and administrative bits, including but adding support for custom check-in/check-out dates in the MiniDebConf registration module, publishing a call for bids for DebConf27.
  • Carles reviewed and submitted 14 Catalan translations using po-debconf-manager.
  • Carles improved po-debconf-manager: added delete-package command, show-information now uses properly formatted output (YAML), it now attaches the translation on the bug reports for which a merge request has been opened too long.
  • Carles investigated why some packages appeared in po-debconf-manager but not in the Debian l10n list. Turns out that some packages had debian/po/templates.pot (appearing in po-debconf-manager) but not the POTFILES.in file as expected. Created a script to find out which packages were in this or similar situation and reported bugs.
  • Carles tested and documented how to set up voices (mbrola and festival) if using Orca speech synthesizer. Commented a few issues and possible improvements in the debian-accessibility list.
  • Helmut sent patches for 48 cross build failures and initiated discussions on how to deal with two non-trivial matters. Besides Python mentioned above, CMake introduced a cmake_pkg_config builtin which is not aware of the host architecture. He also forwarded a Meson patch upstream.
  • Thorsten uploaded a new upstream version of cups to fix a nasty bug that was introduced by the latest security update.
  • Along with many other Python 3.14 fixes, Colin fixed a tricky segfault in python-confluent-kafka after a helpful debugging hint from upstream.
  • Colin upstreamed an improved version of an OpenSSH patch we ve been carrying since 2008 to fix misleading verbose output from scp.
  • Colin used Debusine to coordinate transitions for astroid and pygments, and wrote up the astroid case on his blog.
  • Emilio helped with various transitions, and provided a build fix for opencv for the ffmpeg 8 transition.
  • Emilio tested the GNOME updates for trixie proposed updates (gnome-shell, mutter, glib2.0).
  • Santiago helped to review the status of how to test different build profiles in parallel on the same pipeline, using the test-build-profiles job. This means, for example, to simultaneously test build profiles such as nocheck and nodoc for the same git tree. Finally, Santiago provided MR !685 to fix the documentation.
  • Anupa prepared a bits post for Outreachy interns announcement along with T ssia Cam es Ara jo and worked on publicity team tasks.

5 January 2026

Colin Watson: Free software activity in December 2025

About 95% of my Debian contributions this month were sponsored by Freexian. You can also support my work directly via Liberapay or GitHub Sponsors. Python packaging I upgraded these packages to new upstream versions: Python 3.14 is now a supported version in unstable, and we re working to get that into testing. As usual this is a pretty arduous effort because it requires going round and fixing lots of odds and ends across the whole ecosystem. We can deal with a fair number of problems by keeping up with upstream (see above), but there tends to be a long tail of packages whose upstreams are less active and where we need to chase them, or where problems only show up in Debian for one reason or another. I spent a lot of time working on this: Fixes for pytest 9: I filed lintian: Report Python egg-info files/directories to help us track the migration to pybuild-plugin-pyproject. I did some work on dh-python: Normalize names in pydist lookups and pyproject plugin: Support headers (the latter of which allowed converting python-persistent and zope.proxy to pybuild-plugin-pyproject, although it needed a follow-up fix). I fixed or helped to fix several other build/test failures: Other bugs: Other bits and pieces Code reviews

Vincent Bernat: Using eBPF to load-balance traffic across UDP sockets with Go

Akvorado collects sFlow and IPFIX flows over UDP. Because UDP does not retransmit lost packets, it needs to process them quickly. Akvorado runs several workers listening to the same port. The kernel should load-balance received packets fairly between these workers. However, this does not work as expected. A couple of workers exhibit high packet loss:
$ curl -s 127.0.0.1:8080/api/v0/inlet/metrics \
>   sed -n s/akvorado_inlet_flow_input_udp_in_dropped//p
packets_total listener="0.0.0.0:2055",worker="0"  0
packets_total listener="0.0.0.0:2055",worker="1"  0
packets_total listener="0.0.0.0:2055",worker="2"  0
packets_total listener="0.0.0.0:2055",worker="3"  1.614933572278264e+15
packets_total listener="0.0.0.0:2055",worker="4"  0
packets_total listener="0.0.0.0:2055",worker="5"  0
packets_total listener="0.0.0.0:2055",worker="6"  9.59964121598348e+14
packets_total listener="0.0.0.0:2055",worker="7"  0
eBPF can help by implementing an alternate balancing algorithm.

Options for load-balancing There are three methods to load-balance UDP packets across workers:
  1. One worker receives the packets and dispatches them to the other workers.
  2. All workers share the same socket.
  3. Each worker has its own socket, listening to the same port, with the SO_REUSEPORT socket option.

SO_REUSEPORT option Tom Hebert added the SO_REUSEPORT socket option in Linux 3.9. The cover letter for his patch series explains why this new option is better than the two existing ones from a performance point of view:
SO_REUSEPORT allows multiple listener sockets to be bound to the same port. [ ] Received packets are distributed to multiple sockets bound to the same port using a 4-tuple hash. The motivating case for SO_RESUSEPORT in TCP would be something like a web server binding to port 80 running with multiple threads, where each thread might have it s own listener socket. This could be done as an alternative to other models:
  1. have one listener thread which dispatches completed connections to workers, or
  2. accept on a single listener socket from multiple threads.
In case #1, the listener thread can easily become the bottleneck with high connection turn-over rate. In case #2, the proportion of connections accepted per thread tends to be uneven under high connection load. [ ] We have seen the disproportion to be as high as 3:1 ratio between thread accepting most connections and the one accepting the fewest. With SO_REUSEPORT the distribution is uniform. The motivating case for SO_REUSEPORT in UDP would be something like a DNS server. An alternative would be to receive on the same socket from multiple threads. As in the case of TCP, the load across these threads tends to be disproportionate and we also see a lot of contection on the socket lock.
Akvorado uses the SO_REUSEPORT option to dispatch the packets across the workers. However, because the distribution uses a 4-tuple hash, a single socket handles all the flows from one exporter.

SO_ATTACH_REUSEPORT_EBPF option In Linux 4.5, Craig Gallek added the SO_ATTACH_REUSEPORT_EBPF option to attach an eBPF program to select the target UDP socket. In Linux 4.6, he extended it to support TCP. The socket(7) manual page documents this mechanism:1
The BPF program must return an index between 0 and N-1 representing the socket which should receive the packet (where N is the number of sockets in the group). If the BPF program returns an invalid index, socket selection will fall back to the plain SO_REUSEPORT mechanism.
In Linux 4.19, Martin KaFai Lau added the BPF_PROG_TYPE_SK_REUSEPORT program type. Such an eBPF program selects the socket from a BPF_MAP_TYPE_REUSEPORT_ARRAY map instead. This new approach is more reliable when switching target sockets from one instance to another for example, when upgrading, a new instance can add its sockets and remove the old ones.

Load-balancing with eBPF and Go Altering the load-balancing algorithm for a group of sockets requires two steps:
  1. write and compile an eBPF program in C,2 and
  2. load it and attach it in Go.

eBPF program in C A simple load-balancing algorithm is to randomly choose the destination socket. The kernel provides the bpf_get_prandom_u32() helper function to get a pseudo-random number.
volatile const __u32 num_sockets; //  
struct  
    __uint(type, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY);
    __type(key, __u32);
    __type(value, __u64);
    __uint(max_entries, 256);
  socket_map SEC(".maps"); //  
SEC("sk_reuseport")
int reuseport_balance_prog(struct sk_reuseport_md *reuse_md)
 
    __u32 index = bpf_get_prandom_u32() % num_sockets; //  
    bpf_sk_select_reuseport(reuse_md, &socket_map, &index, 0); //  
    return SK_PASS; //  
 
char _license[] SEC("license") = "GPL";
In , we declare a volatile constant for the number of sockets in the group. We will initialize this constant before loading the eBPF program into the kernel. In , we define the socket map. We will populate it with the socket file descriptors. In , we randomly select the index of the target socket.3 In , we invoke the bpf_sk_select_reuseport() helper to record our decision. Finally, in , we accept the packet.

Header files If you compile the C source with clang, you get errors due to missing headers. The recommended way to solve this is to generate a vmlinux.h file with bpftool:
$ bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h
Then, include the following headers:4
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
For my 6.17 kernel, the generated vmlinux.h is quite big: 2.7 MiB. Moreover, bpf/bpf_helpers.h is shipped with libbpf. This adds another dependency for users. As the eBPF program is quite small, I prefer to put the strict minimum in vmlinux.h by cherry-picking the definitions I need.

Compilation The eBPF Library for Go ships bpf2go, a tool to compile eBPF programs and to generate some scaffolding code. We create a gen.go file with the following content:
package main
//go:generate go tool bpf2go -tags linux reuseport reuseport_kern.c
After running go generate ./..., we can inspect the resulting objects with readelf and llvm-objdump:
$ readelf -S reuseport_bpfeb.o
There are 14 section headers, starting at offset 0x840:
  [Nr] Name              Type             Address           Offset
[ ]
  [ 3] sk_reuseport      PROGBITS         0000000000000000  00000040
  [ 6] .maps             PROGBITS         0000000000000000  000000c8
  [ 7] license           PROGBITS         0000000000000000  000000e8
[ ]
$ llvm-objdump -S reuseport_bpfeb.o
reuseport_bpfeb.o:  file format elf64-bpf
Disassembly of section sk_reuseport:
0000000000000000 <reuseport_balance_prog>:
;  
       0:   bf 61 00 00 00 00 00 00     r6 = r1
;     __u32 index = bpf_get_prandom_u32() % num_sockets;
       1:   85 00 00 00 00 00 00 07     call 0x7
[ ]

Usage from Go Let s set up 10 workers listening to the same port.5 Each socket enables the SO_REUSEPORT option before binding:6
var (
    err error
    fds []uintptr
    conns []*net.UDPConn
)
workers := 10
listenAddr := "127.0.0.1:0"
listenConfig := net.ListenConfig 
    Control: func(_, _ string, c syscall.RawConn) error  
        c.Control(func(fd uintptr)  
            err = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_REUSEPORT, 1)
            fds = append(fds, fd)
         )
        return err
     ,
 
for range workers  
    pconn, err := listenConfig.ListenPacket(t.Context(), "udp", listenAddr)
    if err != nil  
        t.Fatalf("ListenPacket() error:\n%+v", err)
     
    udpConn := pconn.(*net.UDPConn)
    listenAddr = udpConn.LocalAddr().String()
    conns = append(conns, udpConn)
 
The second step is to load the eBPF program, initialize the num_sockets variable, populate the socket map, and attach the program to the first socket.7
// Load the eBPF collection.
spec, err := loadReuseport()
if err != nil  
    t.Fatalf("loadVariables() error:\n%+v", err)
 
// Set "num_sockets" global variable to the number of file descriptors we will register
if err := spec.Variables["num_sockets"].Set(uint32(len(fds))); err != nil  
    t.Fatalf("NumSockets.Set() error:\n%+v", err)
 
// Load the map and the program into the kernel.
var objs reuseportObjects
if err := spec.LoadAndAssign(&objs, nil); err != nil  
    t.Fatalf("loadReuseportObjects() error:\n%+v", err)
 
t.Cleanup(func()   objs.Close()  )
// Assign the file descriptors to the socket map.
for worker, fd := range fds  
    if err := objs.reuseportMaps.SocketMap.Put(uint32(worker), uint64(fd)); err != nil  
        t.Fatalf("SocketMap.Put() error:\n%+v", err)
     
 
// Attach the eBPF program to the first socket.
socketFD := int(fds[0])
progFD := objs.reuseportPrograms.ReuseportBalanceProg.FD()
if err := unix.SetsockoptInt(socketFD, unix.SOL_SOCKET, unix.SO_ATTACH_REUSEPORT_EBPF, progFD); err != nil  
    t.Fatalf("SetsockoptInt() error:\n%+v", err)
 
We are now ready to process incoming packets. Each worker is a Go routine incrementing a counter for each received packet:8
var wg sync.WaitGroup
receivedPackets := make([]int, workers)
for worker := range workers  
    conn := conns[worker]
    packets := &receivedPackets[worker]
    wg.Go(func()  
        payload := make([]byte, 9000)
        for  
            if _, err := conn.Read(payload); err != nil  
                if errors.Is(err, net.ErrClosed)  
                    return
                 
                t.Logf("Read() error:\n%+v", err)
             
            *packets++
         
     )
 
Let s send 1000 packets:
sentPackets := 1000
conn, err := net.Dial("udp", conns[0].LocalAddr().String())
if err != nil  
    t.Fatalf("Dial() error:\n%+v", err)
 
defer conn.Close()
for range sentPackets  
    if _, err := conn.Write([]byte("hello world!")); err != nil  
        t.Fatalf("Write() error:\n%+v", err)
     
 
If we print the content of the receivedPackets array, we can check the balancing works as expected, with each worker getting about 100 packets:
=== RUN   TestUDPWorkerBalancing
    balancing_test.go:84: receivedPackets[0] = 107
    balancing_test.go:84: receivedPackets[1] = 92
    balancing_test.go:84: receivedPackets[2] = 99
    balancing_test.go:84: receivedPackets[3] = 105
    balancing_test.go:84: receivedPackets[4] = 107
    balancing_test.go:84: receivedPackets[5] = 96
    balancing_test.go:84: receivedPackets[6] = 102
    balancing_test.go:84: receivedPackets[7] = 105
    balancing_test.go:84: receivedPackets[8] = 99
    balancing_test.go:84: receivedPackets[9] = 88
    balancing_test.go:91: receivedPackets = 1000
    balancing_test.go:92: sentPackets     = 1000

Graceful restart You can also use SO_ATTACH_REUSEPORT_EBPF to gracefully restart an application. A new instance of the application binds to the same address and prepare its own version of the socket map. Once it attaches the eBPF program to the first socket, the kernel steers incoming packets to this new instance. The old instance needs to drain the already received packets before shutting down. To check we are not losing any packet, we spawn a Go routine to send as many packets as possible:
sentPackets := 0
notSentPackets := 0
done := make(chan bool)
conn, err := net.Dial("udp", conns1[0].LocalAddr().String())
if err != nil  
    t.Fatalf("Dial() error:\n%+v", err)
 
defer conn.Close()
go func()  
    for  
        if _, err := conn.Write([]byte("hello world!")); err != nil  
            notSentPackets++
          else  
            sentPackets++
         
        select  
        case <-done:
            return
        default:
         
     
 ()
Then, while the Go routine runs, we start the second set of workers. Once they are running, they start receiving packets. If we gracefully stop the initial set of workers, not a single packet is lost!9
=== RUN   TestGracefulRestart
    graceful_test.go:135: receivedPackets1[0] = 165
    graceful_test.go:135: receivedPackets1[1] = 195
    graceful_test.go:135: receivedPackets1[2] = 194
    graceful_test.go:135: receivedPackets1[3] = 190
    graceful_test.go:135: receivedPackets1[4] = 213
    graceful_test.go:135: receivedPackets1[5] = 187
    graceful_test.go:135: receivedPackets1[6] = 170
    graceful_test.go:135: receivedPackets1[7] = 190
    graceful_test.go:135: receivedPackets1[8] = 194
    graceful_test.go:135: receivedPackets1[9] = 155
    graceful_test.go:139: receivedPackets2[0] = 1631
    graceful_test.go:139: receivedPackets2[1] = 1582
    graceful_test.go:139: receivedPackets2[2] = 1594
    graceful_test.go:139: receivedPackets2[3] = 1611
    graceful_test.go:139: receivedPackets2[4] = 1571
    graceful_test.go:139: receivedPackets2[5] = 1660
    graceful_test.go:139: receivedPackets2[6] = 1587
    graceful_test.go:139: receivedPackets2[7] = 1605
    graceful_test.go:139: receivedPackets2[8] = 1631
    graceful_test.go:139: receivedPackets2[9] = 1689
    graceful_test.go:147: receivedPackets = 18014
    graceful_test.go:148: sentPackets     = 18014
Unfortunately, gracefully shutting down a UDP socket is not trivial in Go.10 Previously, we were terminating workers by closing their sockets. However, if we close them too soon, the application loses packets that were assigned to them but not yet processed. Before stopping, a worker needs to call conn.Read() until there are no more packets. A solution is to set a deadline for conn.Read() and check if we should stop the Go routine when the deadline is exceeded:
payload := make([]byte, 9000)
for  
    conn.SetReadDeadline(time.Now().Add(50 * time.Millisecond))
    if _, err := conn.Read(payload); err != nil  
        if errors.Is(err, os.ErrDeadlineExceeded)  
            select  
            case <-done:
                return
            default:
                continue
             
         
        t.Logf("Read() error:\n%+v", err)
     
    *packets++
 
With TCP, this aspect is simpler: after enabling the net.ipv4.tcp_migrate_req sysctl, the kernel automatically migrates waiting connections to a random socket in the same group. Alternatively, eBPF can also control this migration. Both features are available since Linux 5.14.

Addendum After implementing this strategy in Akvorado, all workers now drop packets!
$ curl -s 127.0.0.1:8080/api/v0/inlet/metrics \
>   sed -n s/akvorado_inlet_flow_input_udp_in_dropped//p
packets_total listener="0.0.0.0:2055",worker="0"  838673
packets_total listener="0.0.0.0:2055",worker="1"  843675
packets_total listener="0.0.0.0:2055",worker="2"  837922
packets_total listener="0.0.0.0:2055",worker="3"  841443
packets_total listener="0.0.0.0:2055",worker="4"  840668
packets_total listener="0.0.0.0:2055",worker="5"  850274
packets_total listener="0.0.0.0:2055",worker="6"  835488
packets_total listener="0.0.0.0:2055",worker="7"  834479
The root cause is the default limit of 32 records for Kafka batch sizes. This limit is too low because the brokers have a large overhead when handling each batch: they need to ensure they persist correctly before acknowledging them. Increasing the limit to 4096 records fixes this issue. While load-balancing incoming flows with eBPF remains useful, it did not solve the main issue. At least the even distribution of dropped packets helped identify the real bottleneck.

  1. The current version of the manual page is incomplete and does not cover the evolution introduced in Linux 4.19. There is a pending patch about this.
  2. Rust is another option. However, the program we use is so trivial that it does not make sense to use Rust.
  3. As bpf_get_prandom_u32() returns a pseudo-random 32-bit unsigned value, this method exhibits a very slight bias towards the first indexes. This is unlikely to be worth fixing.
  4. Some examples include <linux/bpf.h> instead of "vmlinux.h". This makes your eBPF program dependent on the installed kernel headers.
  5. listenAddr is initially set to 127.0.0.1:0 to allocate a random port. After the first iteration, it is updated with the allocated port.
  6. This is the setupSockets() function in fixtures_test.go.
  7. This is the setupEBPF() function in fixtures_test.go.
  8. The complete code is in balancing_test.go
  9. The complete code is in graceful_test.go
  10. In C, we would poll() both the socket and a pipe used to signal for shutdown. When the second condition is triggered, we drain the socket by executing a series of non-blocking read() until we get EWOULDBLOCK.

1 January 2026

Russ Allbery: 2025 Book Reading in Review

In 2025, I finished and reviewed 32 books, not counting another five books I've finished but not yet reviewed and which will therefore roll over to 2026. This was not a great reading year, although not my worst reading year since I started keeping track. I'm not entirely sure why, although part of the explanation was that I hit a bad stretch of books in spring of 2025 and got into a bit of a reading slump. Mostly, though, I shifted a lot of reading this year to short non-fiction (newsletters and doom-scrolling) and spent rather more time than I intended watching YouTube videos, and sadly each hour in the day can only be allocated one way. This year felt a bit like a holding pattern. I have some hopes of being more proactive and intentional in 2026. I'm still working on finding a good balance between all of my hobbies and the enjoyment of mindless entertainment. The best book I read this year was also the last book I reviewed (and yes, I snuck the review under the wire for that reason): Bethany Jacobs's This Brutal Moon, the conclusion of the Kindom Trilogy that started with These Burning Stars. I thought the first two books of the series were interesting but flawed, but the conclusion blew me away and improved the entire trilogy in retrospect. Like all books I rate 10 out of 10, I'm sure a large part of my reaction is idiosyncratic, but two friends of mine also loved the conclusion so it's not just me. The stand-out non-fiction book of the year was Rory Stewart's Politics on the Edge. I have a lot of disagreements with Stewart's political positions (the more I listen to him, the more disagreements I find), but he is an excellent memoirist who skewers the banality, superficiality, and contempt for competence that has become so prevailing in centrist and right-wing politics. It's hard not to read this book and despair of electoralism and the current structures of governments, but it's bracing to know that even some people I disagree with believe in the value of expertise. I also finished Suzanne Palmer's excellent Finder Chronicles series, reading The Scavenger Door and Ghostdrift. This series is some of the best science fiction I've read in a long time and I'm sad it is over (at least for now). Palmer has a new, unrelated book coming in 2026 (Ode to the Half-Broken), and I'm looking forward to reading that. This year, I experimented with re-reading books I had already reviewed for the first time since I started writing reviews. After my reading slump, I felt like revisiting something I knew I liked, and therefore re-read C.J. Cherryh's Cyteen and Regenesis. Cyteen mostly held up, but Regenesis was worse than I had remembered. I experimented with a way to add on to my previous reviews, but I didn't like the results and the whole process of re-reading and re-reviewing annoyed me. I'm counting this as a failed experiment, which means I've still not solved the problem of how to revisit series that I read long enough ago that I want to re-read them before picking up the new book. (You may have noticed that I've not read the new Jacqueline Carey Kushiel novel, for example.) You may have also noticed that I didn't start a new series re-read, or continue my semi-in-progress re-reads of Mercedes Lackey or David Eddings. I have tentative plans to kick off a new series re-read in 2026, but I'm not ready to commit to that yet. As always, I have no firm numeric goals for the next year, but I hope to avoid another reading slump and drag my reading attention back from lower-quality and mostly-depressing material in 2026. The full analysis includes some additional personal reading statistics, probably only of interest to me.

Daniel Baumann: P r ch storage (in Western Europe)

The majority of P r ch ( ) is stored in Asia (predominantly China, some in Malaysia and Taiwan) because of its climatic conditions:
  • warm (~30 C)
  • humid (~75% RH)
  • stable (comparatively small day to day, day to night, and seasonal differences)
These are ideal for ageing and allow efficient storage in simple storehouses, usually without the need for any air conditioning. The climate in Western European countries is significantly colder, drier, and more variable. However, residential houses offers throughout the year a warm (~20 C), humid (~50% RH), and stable baseline to start with. High quality, long term storage and ageing is possible too with some of the Asian procedures slightly adjusted for the local conditions. Nevertheless, fast accelerated ageing still doesn t work here (even with massive climate control). Personally I prefer the balanced, natural storage over the classical wet or dry , or the mixed traditional storage (of course all storage types are not that meaningful as they are all relative terms anyway). Also I don t like strong fermentation either (P r is not my favourite tea, I only drink it in the 3 months of winter). Therefore, my intention is primarily to preserve the tea while it continues to age normally, keep it in optimal drinking condition and don t rush its fermentation. The value of correct storage is of great importance and has a big effect on P r, and is often overlooked and underrated. Here s a short summary on how to store P r ch (in Western Europe).
P ' r ch  storage (in Western Europe) Image: some D y P r ch stored at home
1. Location
1.1 No light Choose a location with neither direct nor indirect sunlight (= ultraviolet radiation) exposure:
  • direct sunlight damages organic material ( bleeching )
  • indirect sunlight by heating up the containers inbalances the microclimate ( suppression )
1.2 No odors Choose a location with neither direct nor indirect odor exposure:
  • direct odor immissions (like incense, food, air polution, etc.) makes the tea undrinkable
  • indirect odor immissions (especially other teas stored next to it) taint the taste and dilute the flavours
Always use individual containers for each tea, store only identical tea of the same year and batch in the same container. Idealy never store one cake, brick or tuo on its own, buy multiples and store them together for better ageing.
2. Climate
2.1 Consistent temperature Use a location with no interference from any devices or structures next to it:
  • use an regular indoor location for mild temperature curves (i.e. no attics with large day/night temperature differences)
  • aim for >= 20 C average temperature for natural storage (i.e. no cold basements)
  • don t use a place next to any heat dissipating devices (like radiators, computers, etc.)
  • don t use a place next to an outside facing wall
  • always leave 5 to 10cm distance to a wall for air to circulate (generally prevents mold on the wall but also isolates the containers further from possible immissions)
As consistent temperature as possible allows even and steady fermentation. However, neither air conditioning or filtering is needed. Regular day to day, day to night, and season to season fluctuations are balanced out fine with otherwise correct indoor storage. Also humidity control is much more important for storage and ageing, and much less forgiving than temperature control.
2.2 Consistent humidity Use humidity control packs to ensure a stable humidity level:
  • aim for ~65% RH
Lower than 60% RH will (completely) dry out the tea, higher than 70% RH increases chances for mold.
3. Equipment
3.1 Proper containers Use containers that are long term available and that are easily stackable, both in form and dimensions as well as load-bearing capacity. They should be inexpensive, otherwise it s most likely scam (there are people selling snake-oil containers specifically for tea storage). For long term storage and ageing:
  • never use plastic: they leak chemicals over time (no tupperdors , mylar and zip lock bags, etc.)
  • never use cardboard or bamboo: they let to much air in, absorb too much humity and dry to slow
  • never use wood: they emit odors (no humidors)
  • never use clay: they absorb all humidity and dry out the tea (glazed or porcelain urns are acceptable for short term storage)
Always use sealed but not air tight, cleaned steal cans (i.e. with no oil residues from manufacturing; such as these).
3.2 Proper humidity control Use two-way humidity control packs to either absorb or add moisture depending on the needs:
  • use humidity control pack(s) in every container
  • use one 60g pack (such as these) per 500g of tea
3.3 Proper labels Put labels on the outside of the containers to not need to open them to know what is inside and disturbe the ageing:
  • write at least manufacturer and name, pressing year, and batch number on the label (such as these)
  • if desired and not otherwise kept track already elsewhere, add additional information such as the amount of items, weights, previous storage location/type, date of purchase, vendor, or price, etc.
3.4 Proper monitoring Measuring and checking temperature and humidity regularly prevents storage parameters turning bad going unnoticed:
  • put temperature and humidity measurement sensors (such as these) inside some of the containers (e.g. at least one in every different size/type of container)
  • keep at least one temperature and humidity measurement sensor next to the containers on the outside to monitor the storage location
4. Storage
4.1 Continued maintenance Before beginning to initially store a new tea, let it acclimatize for 2h after unpacking from transport (or longer if temperature differences between indoors and outdoors are high, i.e. in winter). Then continuesly:
  • once a month briefly air all containers for a minute or two
  • once a month check for mold, just to be safe
  • every 3 to 6 months check all humidity control packs for need of replacement
  • monitor battery life of temperature and humidity measurement sensors
Humidity control packs can be recharged (usually voiding any warranty) by submerging them for 2 days in distilled water and 4 hours drying on paper towl afterwards. Check with a weight scale after recharging, they should regain their original weight (e.g. 60g plus ~2 to 5g for the packaging).
Finally With a correct P r ch storage:
  • beginn to drink Sh ng P r ( ) roughly after about 10-15 years, or later
  • beginn to drink Sh u P r ( ) roughly after about 3-5 years, or later
Prepare the tea by breaking it into, or breaking off from it, ~5 to 10g pieces about 1 to 3 months ahead of drinking and consider increase humidity to 70% RH during that time. 2026

31 December 2025

Sergio Cipriano: Zero-Code Instrumentation of an Envoy TCP Proxy using eBPF

Zero-Code Instrumentation of an Envoy TCP Proxy using eBPF I recently had to debug an Envoy Network Load Balancer, and the options Envoy provides just weren't enough. We were seeing a small number of HTTP 499 errors caused by latency somewhere in our cloud, but it wasn't clear what the bottleneck was. As a result, each team had to set up additional instrumentation to catch latency spikes and figure out what was going on. My team is responsible for the LBaaS product (Load Balancer as a Service) and, of course, we are the first suspects when this kind of problem appear. Before going for the current solution, I read a lot of Envoy's documentation. It is possible to enable access logs for Envoy, but they don't provide the information required for this debug. This is an example of the output:
[2025-12-08T20:44:49.918Z] "- - -" 0 - 78 223 1 - "-" "-" "-" "-" "172.18.0.2:8080"
I won't go into detail about the line above, since it's not possible to trace the request using access logs alone. Envoy also has OpenTelemetry tracing, which is perfect for understanding sources of latency. Unfortunatly, it is only available for Application Load Balancers. Most of the HTTP 499 were happening every 10 minutes, so we managed to get some of the requests with tcpdump, Wireshark and using http headers to filter the requests. This approach helped us reproduce and track down the problem, but it wasn't a great solution. We clearly needed better tools to catch this kind of issue the next time it happened. Therefore, I decided to try out OpenTelemetry eBPF Instrumentation, also referred to as OBI. I saw the announcement of Grafana Beyla before it was renamed to OBI, but I didn't have the time or a strong reason to try it out until now. Even then, I really liked the idea, and the possibility of using eBPF to solve this instrumentation problem had been in the back of my mind. OBI promises zero-code automatic instrumentation for Linux services using eBPF, so I put together a minimal setup to see how well it works.

Reproducible setup I used the following tools: Setting up a TCP Proxy with Envoy was straightforward:
static_resources:
  listeners:
  - name: go_server_listener
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 8000
    filter_chains:
    - filters:
      - name: envoy.filters.network.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: go_server_tcp
          cluster: go_server_cluster
  clusters:
  - name: go_server_cluster
    connect_timeout: 1s
    type: LOGICAL_DNS
    load_assignment:
      cluster_name: go_server_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: target-backend
                port_value: 8080
This is the simplest Envoy TCP proxy configuration: a listener on port 8000 forwarding traffic to a backend running on port 8080. For the backend, I used a basic Go HTTP server:
package main

import (
    "fmt"
    "net/http"
)

func main()  
    http.Handle("/", http.FileServer(http.Dir(".")))

    server := http.Server Addr: ":8080" 

    fmt.Println("Starting server on :8080")
    panic(server.ListenAndServe())
 
Finally, I wrapped everything together with Docker Compose:
services:
  autoinstrumenter:
    image: otel/ebpf-instrument:main
    pid: "service:envoy"
    privileged: true
    environment:
      OTEL_EBPF_TRACE_PRINTER: text
      OTEL_EBPF_OPEN_PORT: 8000

  envoy:
    image: envoyproxy/envoy:v1.33-latest
    ports:
      - 8000:8000
    volumes:
      - ./envoy.yaml:/etc/envoy/envoy.yaml 
    depends_on:
      - target-backend
  
  target-backend:
    image: golang:1.22-alpine
    command: go run /app/backend.go
    volumes:
      - ./backend.go:/app/backend.go:ro
    expose:
      - 8080
OBI should output traces to the standard output similar to this when a HTTP request is made to Envoy:
2025-12-08 20:44:49.12884449 (305.572 s[305.572 s]) HTTPClient 200 GET /(/) [172.18.0.3 as envoy:36832]->[172.18.0.2 as localhost:8080] contentLen:78B responseLen:0B svc=[envoy generic] traceparent=[00-529458a2be271956134872668dc5ee47-6dba451ec8935e3e[06c7f817e6a5dae2]-01]
2025-12-08 20:44:49.12884449 (1.260901ms[366.65 s]) HTTP 200 GET /(/) [172.18.0.1 as 172.18.0.1:36282]->[172.18.0.3 as envoy:8000] contentLen:78B responseLen:223B svc=[envoy generic] traceparent=[00-529458a2be271956134872668dc5ee47-06c7f817e6a5dae2[0000000000000000]-01]
This is exactly what we needed, with zero-code. The above trace shows:
  • 2025-12-08 20:44:49.12884449: time of the trace.
  • (1.260901ms[366.65 s]): total response time for the request, with the actual internal execution time of the request (not counting the request enqueuing time).
  • HTTP 200 GET /: protocol, response code, HTTP method, and URL path.
  • [172.18.0.1 as 172.18.0.1:36282]->[172.18.0.3 as envoy:8000]: source and destination host:port. The initial request originates from my machine through the gateway (172.18.0.1), hits the Envoy (172.23.0.3), the proxy then forwards it to the backend application (172.23.0.2).
  • contentLen:78B: HTTP Content-Length. I used curl and the default request size for it is 78B.
  • responseLen:223B: Size of the response body.
  • svc=[envoy generic]: traced service.
  • traceparent: ids to trace the parent request. We can see that the Envoy makes a request to the target and this request has the other one as parent.
Let's add one more Envoy to show that it's also possible to track multiple services.
  envoy1:
    image: envoyproxy/envoy:v1.33-latest
    ports:
      - 9000:9000
    volumes:
      - ./envoy1.yaml:/etc/envoy/envoy.yaml
    depends_on:
      - envoy
The new Envoy will listen on port 9000 and forward the request to the other Envoy listening on port 8000. Now we just need to change OBI open port variable to look at a range:
OTEL_EBPF_OPEN_PORT: 8000-9000
And change the pid field of the autoinstrumenter service to use the host's PID namespace inside the container:
pid: host
This is the output I got after one curl:
2025-12-09 12:28:05.12912285 (2.202041ms[1.524713ms]) HTTP 200 GET /(/) [172.19.0.1 as 172.19.0.1:59030]->[172.19.0.5 as envoy:9000] contentLen:78B responseLen:223B svc=[envoy generic] traceparent=[00-69977bee0c2964b8fe53cdd16f8a9d19-856c9f700e73bf0d[0000000000000000]-01]
2025-12-09 12:28:05.12912285 (1.389336ms[1.389336ms]) HTTPClient 200 GET /(/) [172.19.0.5 as envoy:59806]->[172.19.0.4 as localhost:8000] contentLen:78B responseLen:0B svc=[envoy generic] traceparent=[00-69977bee0c2964b8fe53cdd16f8a9d19-caa7f1ad1c68fa77[856c9f700e73bf0d]-01]
2025-12-09 12:28:05.12912285 (1.5431ms[848.574 s]) HTTP 200 GET /(/) [172.19.0.5 as 172.19.0.5:59806]->[172.19.0.4 as envoy:8000] contentLen:78B responseLen:223B svc=[envoy generic] traceparent=[00-69977bee0c2964b8fe53cdd16f8a9d19-cbca9d64d3d26b40[caa7f1ad1c68fa77]-01]
2025-12-09 12:28:05.12912285 (690.217 s[690.217 s]) HTTPClient 200 GET /(/) [172.19.0.4 as envoy:34256]->[172.19.0.3 as localhost:8080] contentLen:78B responseLen:0B svc=[envoy generic] traceparent=[00-69977bee0c2964b8fe53cdd16f8a9d19-5502f7760ed77b5b[cbca9d64d3d26b40]-01]
2025-12-09 12:28:05.12912285 (267.9 s[238.737 s]) HTTP 200 GET /(/) [172.19.0.4 as 172.19.0.4:34256]->[172.19.0.3 as backend:8080] contentLen:0B responseLen:0B svc=[backend go] traceparent=[00-69977bee0c2964b8fe53cdd16f8a9d19-ac05c7ebe26f2530[5502f7760ed77b5b]-01]
Each log line represents a span belonging to the same trace (69977bee0c2964b8fe53cdd16f8a9d19). For readability, I ordered the spans by their traceparent relationship, showing the request's path as it moves through the system: from the client-facing Envoy, through the internal Envoy hop, and finally to the Go backend. You can see both server-side (HTTP) and client-side (HTTPClient) spans at each hop, along with per-span latency, source and destination addresses, and response sizes, making it easy to pinpoint where time is spent along the request chain. The log lines are helpful, but we need better ways to visualize the traces and the metrics generated by OBI. I'll share another setup that more closely reflects what we actually use.

Production setup I'll be using the following tools this time: The goal of this setup is to mirror an environment similar to what I used in production. This time, I've omitted the load balancer and shifted the emphasis to observability instead. setup diagram I will run three HTTP servers on port 8080: two inside Incus containers and one on the host machine. The OBI process will export metrics and traces to an OpenTelemetry Collector, which will forward traces to Jaeger and expose a metrics endpoint for Prometheus to scrape. Grafana will also be added to visualize the collected metrics using dashboards. The aim of this approach is to instrument only one of the HTTP servers while ignoring the others. This simulates an environment with hundreds of Incus containers, where the objective is to debug a single container without being overwhelmed by excessive and irrelevant telemetry data from the rest of the system. OBI can filter metrics and traces based on attribute values, but I was not able to filter by process PID. This is where the OBI Collector comes into play, it allows me to use a processor to filter telemetry data by the PID of the process being instrumented. These are the steps to reproduce this setup:
  1. Create the incus containers.
$ incus launch images:debian/trixie server01
Launching server01
$ incus launch images:debian/trixie server02
Launching server02
  1. Start the HTTP server on each container.
$ apt install python3 --update -y
$ tee /etc/systemd/system/server.service > /dev/null <<'EOF'
[Unit]
Description=Python HTTP server
After=network.target
[Service]
User=root
Group=root
Type=simple
ExecStart=/usr/bin/python3 -m http.server 8080
Restart=always
StandardOutput=journal
StandardError=journal
[Install]
WantedBy=multi-user.target
EOF
$ systemctl start server.service
  1. Start the HTTP server on the host.
$ python3 -m http.server 8080
  1. Start the Docker compose.
services:
  autoinstrumenter:
    image: otel/ebpf-instrument:main
    pid: host
    privileged: true
    environment:
      OTEL_EBPF_CONFIG_PATH: /etc/obi/obi.yml 
    volumes:
      - ./obi.yml:/etc/obi/obi.yml

  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.98.0
    command: ["--config=/etc/otel-collector-config.yml"]
    volumes:
      - ./otel-collector-config.yml:/etc/otel-collector-config.yml
    ports:
      - "4318:4318" # Otel Receiver
      - "8889:8889" # Prometheus Scrape
    depends_on:
      - autoinstrumenter
      - jaeger
      - prometheus

  prometheus:
    image: prom/prometheus
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
    ports:
      - "9090:9090" # Prometheus UI

  grafana:
    image: grafana/grafana
    restart: always
    environment:
      - GF_SECURITY_ADMIN_USER=admin
      - GF_SECURITY_ADMIN_PASSWORD=RandomString123!
    volumes:
      - ./grafana-ds.yml:/etc/grafana/provisioning/datasources/datasource.yml
    ports:
      - "3000:3000" # Grafana UI

  jaeger:
    image: jaegertracing/all-in-one
    container_name: jaeger
    ports:
      - "16686:16686" # Jaeger UI
      - "4317:4317"  # Jaeger OTLP/gRPC Collector
Here's what the configuration files look like:
  • obi.yml:
log_level: INFO
trace_printer: text

discovery:
  instrument:
    - open_ports: 8080

otel_metrics_export:
  endpoint: http://otel-collector:4318
otel_traces_export:
  endpoint: http://otel-collector:4318
  • prometheus.yml:
global:
  scrape_interval: 5s

scrape_configs:
  - job_name: 'otel-collector'
    static_configs:
      - targets: ['otel-collector:8889']
  • grafana-ds.yml:
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    url: http://prometheus:9090
    isDefault: true
  • otel-collector-config.yml:
receivers:
  otlp:
    protocols:
      http:
        endpoint: otel-collector:4318

exporters:
  otlp/jaeger:
    endpoint: jaeger:4317
    tls:
      insecure: true

  prometheus:
    endpoint: 0.0.0.0:8889
    namespace: default

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [otlp/jaeger]

    metrics:
      receivers: [otlp] 
      exporters: [prometheus]
We're almost there, the OpenTelemetry Collector is just missing a processor. To create the processor filter, we can look at the OBI logs to find the PID of the HTTP server being instrumented:
autoinstrumenter-1    time=2025-12-30T19:57:17.593Z level=INFO msg="instrumenting process" component=discover.traceAttacher cmd=/usr/bin/python3.13 pid=297514 ino=460310 type=python service=""
autoinstrumenter-1    time=2025-12-30T19:57:18.320Z level=INFO msg="instrumenting process" component=discover.traceAttacher cmd=/usr/bin/python3.13 pid=310288 ino=722998 type=python service=""
autoinstrumenter-1    time=2025-12-30T19:57:18.512Z level=INFO msg="instrumenting process" component=discover.traceAttacher cmd=/usr/bin/python3.13 pid=315183 ino=2888480 type=python service=""
Which can also be obtained using standard GNU/Linux utilities:
$ cat /sys/fs/cgroup/lxc.payload.server01/system.slice/server.service/cgroup.procs 
297514
$ cat /sys/fs/cgroup/lxc.payload.server02/system.slice/server.service/cgroup.procs 
310288
$ ps -aux   grep http.server
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
1000000   297514  0.0  0.1  32120 14856 ?        Ss   16:03   0:00 /usr/bin/python3 -m http.server 8080
1000000   310288  0.0  0.1  32120 10616 ?        Ss   16:09   0:00 /usr/bin/python3 -m http.server 8080
cipriano  315183  0.0  0.1 103476 11480 pts/3    S+   16:17   0:00 python -m http.server 8080
If we search for the PID in the OpenTelemetry Collector endpoint where Prometheus metrics are exposed, we can find the attribute values to filter on.
$ curl http://localhost:8889/metrics   rg 297514
default_target_info host_id="148f400ad3ea",host_name="148f400ad3ea",instance="148f400ad3ea:297514",job="python3.13",os_type="linux",service_instance_id="148f400ad3ea:297514",service_name="python3.13",telemetry_sdk_language="python",telemetry_sdk_name="opentelemetry-ebpf-instrumentation",telemetry_sdk_version="main"  1
Now we just need to add the processor to the collector configuration:
processors: # <--- NEW BLOCK
  filter/host_id:
    traces:
      span:
        - 'resource.attributes["service.instance.id"] == "148f400ad3ea:297514"'

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [filter/host_id] # <--- NEW LINE
      exporters: [otlp/jaeger]

    metrics:
      receivers: [otlp] 
      processors:  # <--- NEW BLOCK
        - filter/host_id
      exporters: [prometheus]
That's it! The processor will handle the filtering for us, and we'll only see traces and metrics from the HTTP server running in the server01 container. Below are some screenshots from Jaeger and Grafana: jaeger search will all traces one jaeger trace grafana request duration panel

Closing Notes I am still amazed at how powerful OBI can be. For those curious about the debug, we found out that a service responsible for the network orchestration of the Envoy containers was running netplan apply every 10 minutes because of a bug. Netplan apply causes interfaces to go down temporarily and this made the latency go above 500ms which caused the 499s.

Next.