Search Results: "zed"

12 March 2026

Reproducible Builds: Reproducible Builds in February 2026

Welcome to the February 2026 report from the Reproducible Builds project! These reports outline what we've been up to over the past month, highlighting items of news from elsewhere in the increasingly-important area of software supply-chain security. As ever, if you are interested in contributing to the Reproducible Builds project, please see the Contribute page on our website.

  1. reproduce.debian.net
  2. Tool development
  3. Distribution work
  4. Miscellaneous news
  5. Upstream patches
  6. Documentation updates
  7. Four new academic papers

reproduce.debian.net The last year has seen the introduction, development and deployment of reproduce.debian.net. In technical terms, this is an instance of rebuilderd, our server designed to monitor the official package repositories of Linux distributions and attempt to reproduce the observed results there. This month, Holger Levsen added suite-based navigation (e.g. Debian trixie vs forky) to the service (in addition to the already existing architecture-based navigation), which can be observed on, for instance, the Debian trixie-backports or trixie-security pages.

Tool development diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made a number of changes, including preparing and uploading versions 312 and 313 to Debian. In particular, Chris updated the post-release deployment pipeline to ensure that the pipeline does not fail if the automatic deployment to PyPI fails. In addition, Vagrant Cascadian updated an external reference for the 7z tool for GNU Guix, and also updated diffoscope in GNU Guix to versions 312 and 313.

Distribution work In Debian this month:
  • 26 reviews of Debian packages were added, 5 were updated and 19 were removed this month, adding to our extensive knowledge about identified issues.
  • A new debsbom package was uploaded to unstable. According to the package description, this package generates SBOMs (Software Bill of Materials) for distributions based on Debian in the two standard formats, SPDX and CycloneDX. The generated SBOM includes all installed binary packages and also contains Debian Source packages.
  • In addition, an sbom-toolkit package was uploaded, which provides a collection of scripts for generating SBOMs. This is the tooling used in Apertis to generate the Licenses SBOM and the Build Dependency SBOM. It also includes dh-setup-copyright, a Debhelper addon that generates SBOMs from DWARF debug information by running dwarf2sources on every ELF binary in the package and saving the output.
Lastly, Bernhard M. Wiedemann posted another openSUSE monthly update for their work there.

Miscellaneous news

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
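The individual patches are not reproduced in this extract, but a very common class of fix is making build tools honour the SOURCE_DATE_EPOCH environment variable standardized by the Reproducible Builds project, instead of embedding the current time. A minimal sketch in Python (the function name is mine, for illustration only):

```python
import os
import time

def build_timestamp() -> int:
    """Return the timestamp to embed in build artifacts.

    Honours the SOURCE_DATE_EPOCH convention from the Reproducible
    Builds project: if the variable is set, use it instead of the
    current time, so two builds of the same source embed the same date.
    """
    sde = os.environ.get("SOURCE_DATE_EPOCH")
    if sde is not None:
        return int(sde)
    return int(time.time())

# With the variable set (as dpkg and most build systems do), the
# result is deterministic no matter when the build actually runs.
os.environ["SOURCE_DATE_EPOCH"] = "1700000000"
print(build_timestamp())  # 1700000000
```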

Documentation updates Once again, there were a number of improvements made to our website this month including:

Four new academic papers Julien Malka and Arnout Engelen published a paper titled Lila: Decentralized Build Reproducibility Monitoring for the Functional Package Management Model:
[While] recent studies have shown that high reproducibility rates are achievable at scale (demonstrated by the Nix ecosystem achieving over 90% reproducibility on more than 80,000 packages), the problem of effective reproducibility monitoring remains largely unsolved. In this work, we address the reproducibility monitoring challenge by introducing Lila, a decentralized system for reproducibility assessment tailored to the functional package management model. Lila enables distributed reporting of build results and aggregation into a reproducibility database. […]
A PDF of their paper is available online.
Javier Ron and Martin Monperrus of KTH Royal Institute of Technology, Sweden, also published a paper, titled Verifiable Provenance of Software Artifacts with Zero-Knowledge Compilation:
Verifying that a compiled binary originates from its claimed source code is a fundamental security requirement, called source code provenance. Achieving verifiable source code provenance in practice remains challenging. The most popular technique, called reproducible builds, requires difficult matching and reexecution of build toolchains and environments. We propose a novel approach to verifiable provenance based on compiling software with zero-knowledge virtual machines (zkVMs). By executing a compiler within a zkVM, our system produces both the compiled output and a cryptographic proof attesting that the compilation was performed on the claimed source code with the claimed compiler. […]
A PDF of the paper is available online.
Oreofe Solarin of the Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, Ohio, USA, published It's Not Just Timestamps: A Study on Docker Reproducibility:
Reproducible container builds promise a simple integrity check for software supply chains: rebuild an image from its Dockerfile and compare hashes. We built a Docker measurement pipeline and apply it to a stratified sample of 2,000 GitHub repositories that contained a Dockerfile. We found that only 56% produce any buildable image, and just 2.7% of those are bitwise reproducible without any infrastructure configurations. After modifying infrastructure configurations, we raise bitwise reproducibility by 18.6%, but 78.7% of buildable Dockerfiles remain non-reproducible.
A PDF of Oreofe's paper is available online.
Lastly, Jens Dietrich and Behnaz Hassanshahi published On the Variability of Source Code in Maven Package Rebuilds:
[In] this paper we test the assumption that the same source code is being used [by] alternative builds. To study this, we compare the sources released with packages on Maven Central with the sources associated with independently built packages from Google's Assured Open Source and Oracle's Build-from-Source projects. […]
A PDF of their paper is available online.

Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:

11 March 2026

Sven Hoexter: RFC 9849 - Encrypted Client Hello

Now that ECH is standardized I started to look into it to understand what's coming. While it is generally desirable not to leak the SNI information, I'm not sure if it will ever make it to the masses of (web)servers outside of big CDNs. Besides the extension of the TLS protocol to have an inner and outer ClientHello, you also need (frequent) updates to your HTTPS/SVCB DNS records. The idea is to rotate the key quickly; the OpenSSL API document talks about hourly rotation. Which means you have to have encrypted DNS in place (I guess these days DNS-over-HTTPS is the most common case), and you need to be able to distribute the private key between all involved hosts and update DNS records in time. In addition to that you can also use a "shared mode" where you handle the outer ClientHello (the one using the public key from DNS) centrally and the inner ClientHello on your backend servers. I'm not yet sure if that makes it easier or even harder to get it right. That all makes sense, and is feasible for setups like those at Cloudflare, where the common case is that they provide the NS servers for your domain and terminate your HTTPS connections. But for the average webserver setup I guess we will not see a huge adoption rate. Or we will soon see something like a Caddy webserver on steroids which integrates a DNS server for DoT, with not only automatic certificate renewal built in, but also automatic ECHConfig updates. If you want to read up yourself, here are my starting points:
  • RFC 9849 TLS Encrypted Client Hello
  • RFC 9848 Bootstrapping TLS Encrypted ClientHello with DNS Service Bindings
  • RFC 9934 Privacy-Enhanced Mail (PEM) File Format for Encrypted ClientHello (ECH)
  • OpenSSL 4.0 ECH APIs
  • curl ECH Support
  • nginx ECH Support
  • Cloudflare: Good-bye ESNI, hello ECH!
If you're looking for a test endpoint, I see one hosted by Cloudflare:
$ dig +short IN HTTPS cloudflare-ech.com
1 . alpn="h3,h2" ipv4hint=104.18.10.118,104.18.11.118 ech=AEX+DQBBFQAgACDBFqmr34YRf/8Ymf+N5ZJCtNkLm3qnjylCCLZc8rUZcwAEAAEAAQASY2xvdWRmbGFyZS1lY2guY29tAAA= ipv6hint=2606:4700::6812:a76,2606:4700::6812:b76
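The base64 blob in the ech= parameter is an ECHConfigList in the TLS wire format. As a quick sanity check it can be decoded with a few lines of Python; this is a minimal parser I sketched for illustration (field layout per the ECH specification), not part of any official tooling:

```python
import base64

def parse_ech_config_list(b64: str) -> list[dict]:
    """Parse a TLS ECHConfigList just far enough to pull out the
    fields relevant when debugging DNS HTTPS records."""
    data = base64.b64decode(b64)
    total_len = int.from_bytes(data[:2], "big")
    assert total_len == len(data) - 2, "length prefix mismatch"
    configs, off = [], 2
    while off < len(data):
        version = int.from_bytes(data[off:off + 2], "big")
        length = int.from_bytes(data[off + 2:off + 4], "big")
        body = data[off + 4:off + 4 + length]
        off += 4 + length
        if version != 0xFE0D:           # only the final ECH version is parsed
            configs.append({"version": version})
            continue
        config_id = body[0]
        kem_id = int.from_bytes(body[1:3], "big")
        pk_len = int.from_bytes(body[3:5], "big")
        p = 5 + pk_len                  # skip the HPKE public key
        cs_len = int.from_bytes(body[p:p + 2], "big")
        p += 2 + cs_len                 # skip the cipher suite list
        p += 1                          # skip maximum_name_length
        name_len = body[p]
        public_name = body[p + 1:p + 1 + name_len].decode("ascii")
        configs.append({"version": version, "config_id": config_id,
                        "kem_id": kem_id, "public_name": public_name})
    return configs

# The ECHConfigList from the dig output above
ECH_B64 = ("AEX+DQBBFQAgACDBFqmr34YRf/8Ymf+N5ZJCtNkLm3qnjylCCLZc8rUZcwAE"
           "AAEAAQASY2xvdWRmbGFyZS1lY2guY29tAAA=")
print(parse_ech_config_list(ECH_B64))
```

For this record it reports version 0xfe0d, KEM 0x0020 (X25519-HKDF-SHA256) and the outer public_name cloudflare-ech.com, i.e. the name the outer ClientHello will show on the wire.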

8 March 2026

Gunnar Wolf: As Answers Get Cheaper, Questions Grow Dearer

This post is an unpublished review for As Answers Get Cheaper, Questions Grow Dearer
This opinion article tackles the much-discussed issues of Large Language Models (LLMs) both endangering jobs and improving productivity. The authors begin by making a comparison, likening the current understanding of the effects LLMs are having upon knowledge-intensive work to that of artists in the early nineteenth century, when photography was first invented: they explain that photography didn't result in painting becoming obsolete, but painting undeniably changed in a fundamental way. Realism was no longer the goal of painters, as they could no longer compete on equal terms with photography. Painters then began experimenting with the subjective experiences of color and light: Impressionism no longer limited itself to copying reality, but added elements of human feeling to its creations. The authors argue that LLMs make getting answers terribly cheap: not necessarily correct, but immediate and plausible. In order for the use of LLMs to be advantageous to users, a good working knowledge of the domain in which LLMs are queried is key. They cite LLMs increasing productivity by 14% on average at call centers, where questions have unambiguous answers and the knowledge domain is limited, but hurting inexperienced entrepreneurs who followed their advice by close to 10%, in an environment where understanding of the situation and critical judgment are key. The problem, thus, becomes that LLMs are optimized to generate plausible answers. If the user is not a domain expert, plausibility becomes a stand-in for truth. They identify that, with this in mind, good questions become strategic: questions that continue a line of inquiry, that expand the user's field of awareness, that reveal where we must keep looking. They liken this to Clayton Christensen's 2010 text on consulting: a consultant's value is not in having all the answers, but in teaching clients how to think. LLMs are already game-changing for society, and will likely become more so as they improve.
The authors argue that for much of the 20th century, an individual's success was measured by domain mastery, but that the defining factor is no longer knowledge accumulation: it is the ability to formulate the right questions. Of course, the authors acknowledge (it's even the literal title of one of the article's sections) that good questions need strong theoretical foundations. Knowing a specific domain enables users to imagine what should happen if following a specific lead, anticipate second-order effects, and evaluate whether plausible answers are meaningful or misleading. Shortly after I read the article I am reviewing, I came across a data point that quite validates its claims: a short, informally published paper on combinatorics and graph theory titled "Claude's Cycles", written by Donald Knuth (one of the most respected Computer Science professors and researchers, and author of the very well known The Art of Computer Programming series of books). Knuth's text, and particularly its postscripts, perfectly illustrate what the article of this review conveys: LLMs can help a skillful researcher connect the dots between very varied fields of knowledge, perform tiring and burdensome calculations, even try mixing together some ideas that will fail or succeed. But only when guided by a true expert of the field, asking the right, insightful and informed questions, will the answers prove to be of value and, in this case, of immense value. Knuth writes of a particular piece of the solution, "I would have found this solution myself if I'd taken time to look carefully at all 760 of the generalizable solutions for m=3", but having an LLM perform all the legwork was surely a better use of his time. Christensen, C.M. How Will You Measure Your Life? Harvard Business Review Press (2017). Knuth, D. Claude's Cycles. https://cs.stanford.edu/~knuth/papers/claude-cycles.pdf

2 March 2026

Valhalla's Things: A Pen Case (or a Few)

Posted on March 2, 2026
Tags: madeof:atoms, FreeSoftWear, craft:sewing
A pen case made of two pieces of a relatively stiff black material with a flat base and three separate channels on top, plus a flap covering everything and a band to keep the flap closed; there is visible light blue stitching all around the channels. For my birthday, I've bought myself a fancy new expensive1 fountain pen. A two slot pen case in the same material as above, but brown: the flap is too short to cover the pens, and there isn't a band to keep it closed. Such a fancy pen, of course, requires a suitable case: I couldn't use the failed prototype of a case I've been keeping my Preppys in, so I had to get out the nice vegetable tanned leather... Yeah, nope, I don't have that (yet). I got out the latex and cardboard material that is sold as a (cheaper) leather substitute; it doesn't look like leather at all, but is quite nice (and easy) to work with. The project is not vegan anyway, because I used waxed linen thread, waxing it myself with a lot of very nicely smelling beeswax. A case similar to the one above, but this one only has two slots, and there is a Faber-Castell pen nested on top of the case between the two slots. Here the stitches are white, and in a coarser thread. I got the measurements2 from the less failed prototype where I keep my desktop pens, and this time I made a proper pattern I could share online, under the usual Free Culture license. A case like the one above, except that the stitches are in black, and not as regular. This one has also been scrunched up a bit for a different look, and now the band is a bit too wide.
From the width of the material I could conveniently cut two cases, so that's what I did: started sewing the first one, realized that I got the order of stitching wrong, and also that if I used light blue thread instead of the black one it would look nicer and be easier to see in the pictures for the published pattern, started sewing the second one, and kept alternating between the two, depending on the availability of light for taking pictures. The open pen case, showing two pens, a blue Preppy and a gunmetal Plaisir cosily nested in the two outer slots, while the middle slot is ominously empty. One of the two took the place of my desktop one, where I had one more pen than slots, and one of the old prototypes was moved to keep my bedside pen; the other new case was used for the new pen in my handbag, together with a Preppy, and now I have a free slot and you can see how this is going to go wrong, right? :D

  1. 16 €, plus a 9 € converter, and another 6 € pen to get the EF nib from, since it wasn't available for the expensive pen.
  2. I have them written down somewhere. I couldn't find them. So I measured the real thing, with some approximation.

28 February 2026

Daniel Baumann: Debian Fast Forward: An alternative backports repository

The Debian project releases a new stable version of its Linux distribution approximately every two years. During its lifetime, a stable release usually gets security updates only, but in general no feature updates. For some packages it is desirable to get feature updates earlier than with the next stable release. Some new packages included in Debian after the initial release of a stable distribution are desirable for stable too. Both use-cases can be solved by recompiling the newer version of a package from testing/unstable on stable (aka backporting). Packages are backported together with only the minimal amount of required build-depends or depends not already fulfilled in stable (if any), and without any changes unless required to fix building on stable (if needed). There are official Debian Backports available, as well as several well-known unofficial backports repositories. I have been involved in one of these unofficial repositories since 2005, which in 2010 subsequently turned into its own Debian derivative, mixing both backports and modified packages in one repository for simplicity. Starting with the Debian 13 (trixie) release, the (otherwise unmodified) backports of this derivative have been split out from the derivative distribution into a separate repository. This way the backports are more accessible and useful for all interested Debian users too.
TL;DR: Debian Fast Forward - https://fastforward.debian.net
  • is an alternative Debian repository containing complementary backports from testing/unstable to stable
  • with packages organized in a curated, self-contained selection of coherent sets
  • supporting amd64, i386, and arm64 architectures
  • containing around 400 packages in trixie-fastforward-backports
  • with 1,800 uploads since July 2025
End user documentation about how to enable Debian Fast Forward is available. Have fun!
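For orientation, enabling a third-party repository like this one typically means dropping a deb822-style file under /etc/apt/sources.list.d/. The sketch below is illustrative only: the suite name is the one mentioned above, but the URI path, component and keyring location are my guesses, so follow the official end-user documentation rather than this fragment:

```
# /etc/apt/sources.list.d/fastforward.sources (illustrative sketch;
# the real URI path, component and key location may differ)
Types: deb
URIs: https://fastforward.debian.net/debian
Suites: trixie-fastforward-backports
Components: main
Signed-By: /etc/apt/keyrings/fastforward.gpg
```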

27 February 2026

Petter Reinholdtsen: Free software toolchain for the simplest RISC-V CPU in a small FPGA?

On Wednesday I had the pleasure of attending a presentation organized by the Norwegian Unix Users Group on implementing RISC-V using a small FPGA. This project is the result of a university teacher wanting to teach students assembly programming using a real instruction set, while still providing a simple and transparent CPU environment. The CPU in question implements the smallest set of opcodes needed to still call the CPU a RISC-V CPU, the RV32I base set. The author and presenter, Kristoffer Robin Stokke, demonstrated how to build both the FPGA setup and a small startup code providing a "Hello World" message over both serial port and a small LCD display. The FPGA is programmed using VHDL, and the entire source code is available from GitHub, but unfortunately the target FPGA setup is compiled using the proprietary tool Quartus. It is such a pity that such a cool little piece of free software should be chained down by non-free software, so my friend Jon Nordby set out to see if we could liberate this small RISC-V CPU. After all, it would be an unforgivable sin to force students to use non-free software to study at the University of Oslo. The VHDL code for the CPU instructions itself is only 1138 lines, if I am to believe wc -l lib/riscv_common/* lib/rv32i/*. On the small FPGA used during the talk, the entire CPU, ROM, display and serial port driver only used up half the capacity. These days, there exists a free software toolchain for FPGA programming not only in Verilog but also in VHDL, and we hope the support in yosys, ghdl, and yosys-plugin-ghdl (sadly and strangely enough, removed from Debian unstable) is complete enough to at least build this small and simple project with some minor portability fixes. Or perhaps there are other approaches that work better? The first patches are already floating on GitHub, to make the VHDL code more portable and to test out the build. If you are interested in running your own little RISC-V CPU on an FPGA chip, please get in touch.
At the moment we sadly have hit a GHDL bug, which we do not quite know how to work around or fix:
******************** GHDL Bug occurred ***************************
Please report this bug on https://github.com/ghdl/ghdl/issues
GHDL release: 5.0.1 (Debian 5.0.1+dfsg-1+b1) [Dunoon edition]
Compiled with unknown compiler version
Target: x86_64-linux-gnu
/scratch/pere/src/fpga/memstick-fpga-riscv-upstream/
Command line:
Exception CONSTRAINT_ERROR raised
Exception information:
raised CONSTRAINT_ERROR : synth-vhdl_expr.adb:1763 discriminant check failed
******************************************************************
Thus more work is needed. For me, this simple project is the first stepping stone for a larger dream I have of converting the MESA machine controller system to build its firmware using a free software toolchain. I just need to learn more FPGA programming first. :) As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

Dirk Eddelbuettel: x13binary 1.1.61.2 on CRAN: Micro Maintenance

The x13binary team is happy to share the availability of Release 1.1.61.2 of the x13binary package providing the X-13ARIMA-SEATS program by the US Census Bureau, which arrived on CRAN earlier today and has already been built for r2u. This release responds to a CRAN request to display the compiler version when building. x13binary, just like three other packages there, creates and ships a local binary it interfaces with. So our build was a little outside of R CMD INSTALL ... but now signals build versions like R does. We also modernized and simplified our continuous integration script based on r-ci. Courtesy of my CRANberries, there is also a diffstat report for this release showing changes to the previous release.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

22 February 2026

Benjamin Mako Hill: What makes online groups vulnerable to governance capture?

Note: I have not published blog posts about my academic papers over the past few years. To ensure that my blog contains a more comprehensive record of my published papers and to surface these for folks who missed them, I will be periodically (re)publishing blog posts about some older published projects. This post is closely based on a previously published post by Zarine Kharazian on the Community Data Science Blog. For nearly a decade, the Croatian language version of Wikipedia was run by a cabal of far-right nationalists who edited articles in ways that promoted fringe political ideas and involved cases of historical revisionism related to the Ustaše regime, a fascist movement that ruled the Nazi puppet state called the Independent State of Croatia during World War II. This cabal seized complete control of the encyclopedia's governance, banned and blocked those who disagreed with them, and operated a network of fake accounts to create the appearance of grassroots support for their policies. Thankfully, Croatian Wikipedia appears to be an outlier. Though both the Croatian and Serbian language editions have been documented to contain nationalist bias and historical revisionism, Croatian Wikipedia seems unique among Wikipedia editions in the extent to which its governance institutions were captured by a small group of users.

The situation in Croatian Wikipedia was well documented and is now largely fixed, but we still know very little about why it was taken over, while other language editions seem to have rebuffed similar capture attempts. In a paper published in the Proceedings of the ACM: Human-Computer Interaction (CSCW), Zarine Kharazian, Kate Starbird, and I present an interview-based study that provides an explanation for why Croatian was captured while several other editions facing similar contexts and threats fared better.

Short video presentation of the work given at Wikimania in August 2023.
Based on insights from interviews with 15 participants from both the Croatian and Serbian Wikipedia projects and from the broader Wikimedia movement, we arrived at three propositions that, together, help explain why Croatian Wikipedia succumbed to capture while Serbian Wikipedia did not:
  1. Perceived Value as a Target. Is the project worth expending the effort to capture?
  2. Bureaucratic Openness. How easy is it for contributors outside the core founding team to ascend to local governance positions?
  3. Institutional Formalization. To what degree does the project prefer personalistic, informal forms of organization over formal ones?
The conceptual model from our paper, visualizing possible institutional configurations among Wikipedia projects that affect the risk of governance capture.
We found that both Croatian and Serbian Wikipedias were attractive targets for far-right nationalist capture due to their sizable readership and resonance with national identity. However, we also found that the two projects diverged early in their trajectories in how open they remained to new contributors ascending to local governance positions and in the degree to which they privileged informal relationships over formal rules and processes as the project's organizing principles. Ultimately, Croatian Wikipedia's relative lack of bureaucratic openness and of rules constraining administrator behavior created a window of opportunity for a motivated contingent of editors to seize control of the governance mechanisms of the project. Though our empirical setting was Wikipedia, our theoretical model may offer insight into the challenges faced by self-governed online communities more broadly. As interest in decentralized alternatives to Facebook and X (formerly Twitter) grows, communities on these sites will likely face similar threats from motivated actors. Understanding the vulnerabilities inherent in these self-governing systems is crucial to building resilient defenses against threats like disinformation. For more details on our findings, take a look at the published version of our paper.

Citation for the full paper: Kharazian, Zarine, Kate Starbird, and Benjamin Mako Hill. 2024. Governance Capture in a Self-Governing Community: A Qualitative Comparison of the Croatian, Serbian, Bosnian, and Serbo-Croatian Wikipedias. Proceedings of the ACM on Human-Computer Interaction 8 (CSCW1): 61:1-61:26. https://doi.org/10.1145/3637338.

This blog post and the paper it describes are collaborative work by Zarine Kharazian, Benjamin Mako Hill, and Kate Starbird.

Otto Kekäläinen: Do AI models still keep getting better, or have they plateaued?

The AI hype is based on the assumption that the frontier AI labs are producing better and better foundational models at an accelerating pace. Is that really true, or are people just in sort of a mass psychosis because AI models have become so good at mimicking human behavior that we unconsciously attribute increasing intelligence to them? I decided to conduct a mini-benchmark of my own to find out if the latest and greatest AI models are actually really good or not.

The problem with benchmarks Every time any team releases a new LLM, they boast about how well it performs on various industry benchmarks such as Humanity's Last Exam, SWE-Bench and Ai2 ARC or ARC-AGI. An overall leaderboard can be viewed at LLM-stats. This incentivizes teams to optimize for specific benchmarks, which might make them excel at specific tasks while their general abilities degrade. Also, the older a benchmark dataset is, the more online material there is discussing the questions and best answers, which in turn increases the chances of newer models trained on more recent web content scoring better. Thus I prefer looking at real-time leaderboards such as the LM Arena leaderboard (or OpenCompass for Chinese models that might be missing from LM Arena). However, even though the LM Arena Elo score is rated by humans in real-time, the benchmark can still be gamed. For example, Meta reportedly used a special chat-optimized model instead of the actual Llama 4 model when getting scored on the LM Arena. Therefore I trust my own first-hand experience more than the benchmarks for gaining intuition. Intuition, however, is not a compelling argument in discussions on whether or not new flagship AI models have plateaued. Thus, I decided to devise my own mini-benchmark so that no model could have possibly seen it in its training data or be specifically optimized for it in any way.

My mini-benchmark I crafted 6 questions based on my own experience using various LLMs for several years and having developed some intuition about what kinds of questions LLMs typically struggle with. I conducted the benchmark using the OpenRouter.ai chat playroom with the following state-of-the-art models: OpenRouter.ai is great as it is very easy to get responses from multiple models in parallel to a single question. It also allows turning off web search, forcing the models to answer purely based on their embedded knowledge. Common to all the test questions is that they are fairly straightforward and have a clear answer, yet the answer isn't common knowledge or statistically the most obvious one, and instead requires a bit of reasoning to get correct. Some of these questions are also based on my witnessing a flagship model fail miserably to answer them.
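For readers who would rather script such a comparison than click through a chat UI, OpenRouter exposes an OpenAI-compatible chat completions endpoint. The sketch below only builds the request payloads; the model identifiers are placeholders I made up, and actually sending a request requires an API key:

```python
import json
import urllib.request  # only needed for the commented-out real request

API_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(model: str, question: str) -> dict:
    """Build one OpenAI-style chat completion payload for OpenRouter."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": question}],
    }

# Hypothetical model identifiers; check openrouter.ai for the real ones.
MODELS = ["anthropic/claude-example", "openai/gpt-example"]
QUESTION = "Which cities have hosted the Olympics more than just once?"

payloads = [build_request(m, QUESTION) for m in MODELS]
for p in payloads:
    print(json.dumps(p))

# To actually send one (needs OPENROUTER_API_KEY in the environment):
# import os
# req = urllib.request.Request(
#     API_URL,
#     data=json.dumps(payloads[0]).encode(),
#     headers={"Authorization": "Bearer " + os.environ["OPENROUTER_API_KEY"],
#              "Content-Type": "application/json"})
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["message"]["content"])
```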

1. Which cities have hosted the Olympics more than just once? This question requires accounting for both summer and winter Olympics, and for Olympics hosted across multiple cities. The variance in responses comes from whether the model understands that Beijing should be counted, as it has hosted both summer and winter Olympics. Interestingly, GPT was the only model not to mention Beijing at all. Some variance also comes from how models account for co-hosted Olympics. For example, Cortina should be counted as having hosted the Olympics twice, in 1956 and 2026, but only Claude, Gemini and Kimi pointed this out. Stockholm's 1956 hosting of the equestrian games during the Melbourne Olympics is a special case, which GPT, Gemini and Kimi pointed out in a side note. Some models seem to have old training material; for example, Grok assumes the current year is 2024. All models that accounted for awarded future Olympics (e.g. Los Angeles 2028) marked them clearly as upcoming. Overall I would judge that only GPT and MinMax gave incomplete answers, while all other models replied as well as the best humans reasonably could have.

2. If EUR/USD continues to slide to 1.5 by mid-2026, what is the likely effect on BMW's stock price by end of 2026? This question requires mapping the currency exchange rate to its historic values, dodging the misleading word "slide", and reasoning about where the revenue of a company comes from and how a weaker US dollar affects it in multiple ways. I've frequently witnessed flagship models get wrong how interest rates and exchange rates work. Apparently the binary choice between up or down is somehow challenging to the internal statistical model of an LLM on a topic where there is a lot of training material arguing for both outcomes, and choosing between them requires specifically reasoning about the scenario at hand and disregarding general knowledge of the situation. However, this time all the models concluded correctly that a weak dollar would have a negative overall effect on the BMW stock price. Gemini, GLM, Qwen and Kimi also mention the potential hedging effect of BMW's X-series production in South Carolina for worldwide export.

3. What is the Unicode code point for the traffic cone emoji? This was the first question where the flagship models clearly still struggle in 2026. The trap here is that there is no traffic cone emoji, so an advanced model should simply refuse to give any Unicode numbers at all. Most LLMs however have an urge to give some answer, leading to hallucinations. Also, as the answer has a graphical element to it, the LLM might not understand how the emoji looks in ways that would be obvious to a human, and thus many models claim the construction sign emoji is a traffic cone, which it is not. By far the worst response was from GPT, which simply hallucinates and stops there (screenshot: OpenAI's GPT-5.2 giving a completely wrong answer to the traffic cone emoji question). While Gemini and Grok were among the three models not falling into this trap, the response from Claude was exemplary (screenshot: Claude Opus 4.6's exemplary answer to the traffic cone emoji question).
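The claim that no such character exists can be checked mechanically: Python's unicodedata module (reflecting the Unicode version shipped with the interpreter) knows a CONSTRUCTION SIGN but no TRAFFIC CONE. A small helper, written here purely for illustration:

```python
import unicodedata

def codepoint_for(name: str):
    """Return U+XXXX for a Unicode character name, or None if no
    character with that name exists in this Python's Unicode data."""
    try:
        return f"U+{ord(unicodedata.lookup(name)):04X}"
    except KeyError:
        return None

print(codepoint_for("CONSTRUCTION SIGN"))  # U+1F6A7
print(codepoint_for("TRAFFIC CONE"))       # None
```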

4. Which languages are the 10 most commonly spoken in the world? For each language, count from 1 to 5. This seemingly simple question becomes quite complex once you account for the fact that many languages have their own script, both for words and for numerals (e.g. Arabic واحد/one vs ١/1), that numbers may vary with the thing being counted (e.g. French grammatical gender: un/une), and that plain numbers and counting out loud may use different words (e.g. Russian один/odin vs раз/raz). I've also witnessed older models frequently mix up which parts of the reply should be in which language. For a human who knows the languages this question is trivial; for LLMs it is surprisingly challenging. Grok and MinMax only gave a list of the most popular languages and stopped there, without showing any words in those languages. Qwen got a bit further by trying to list numbers in each language, but the lists contained only the digits 1 to 5: [screenshot: Qwen3.5 Plus counting to five in the top 10 languages] I have seen Grok do something similar before, which in fact inspired me to include this question in my mini-benchmark. GPT got further still, but for Hindi, Arabic and Bengali it listed the numerals in the local script rather than the number words. Gemini, GLM and Kimi gave complete and correct answers as lists, while the best answer and presentation by far came from Claude, which gave the table below:
Claude Opus 4.6 10 Most Commonly Spoken Languages (by Total Speakers)
# Language 1 2 3 4 5
1 English One Two Three Four Five
2 Mandarin Chinese 一 (yī) 二 (èr) 三 (sān) 四 (sì) 五 (wǔ)
3 Hindi एक (ek) दो (do) तीन (tīn) चार (chār) पाँच (pāñch)
4 Spanish Uno Dos Tres Cuatro Cinco
5 French Un Deux Trois Quatre Cinq
6 Arabic واحد (wāḥid) اثنان (ithnān) ثلاثة (thalātha) أربعة (arbaʿa) خمسة (khamsa)
7 Bengali এক (ek) দুই (dui) তিন (tin) চার (chār) পাঁচ (pāñch)
8 Portuguese Um Dois Três Quatro Cinco
9 Russian один (odin) два (dva) три (tri) четыре (chetyre) пять (pyat')
10 Japanese 一 (ichi) 二 (ni) 三 (san) 四 (shi) 五 (go)
Note: Rankings are based on estimated total speakers (native + non-native). Rankings can vary depending on the source and methodology used.

5. Count the number of drone launch pads in the picture. Together with the question, I uploaded this picture: [picture: the drone pad photo attached to the question] A human can easily count that there are 10 rows and 30+ columns in the grid, but because the picture resolution isn't good enough to count the exact number of columns, the answer should be that there are at least 300 launch pads in the picture. GPT and Grok both answered that the count is zero. Instead of hallucinating some number they say zero, but it would have been better to give no number at all and simply state that they are unable to perform the task. Gemini gave "101" as its answer, which is quite odd; reading its reasoning section, it seems to have tried counting items in the image without reasoning much about what it was actually counting, or noticing that there is clearly a grid that would make the counting much easier. Both Qwen and Kimi state that they can see four parallel structures but are unable to count drone launch pads. The best answer by far came from Claude, which counted 10-12 rows and 30-40+ columns and concluded that there must be 300-500 drone launch pads. Very close to the best human level - impressive! This question applied only to multi-modal models that can see images, so GLM and MinMax could not give any response.

6. Explain why I am getting the error below, and what is the best way to fix it? Together with the question above, I gave this code block:
$ SH_SCRIPTS="$(mktemp; grep -Irnw debian/ -e '^#!.*/sh' | sort -u | cut -d ':' -f 1 || true)"
$ shellcheck -x --enable=all --shell=sh "$SH_SCRIPTS"
/tmp/tmp.xQOpI5Nljx
debian/tests/integration-tests: /tmp/tmp.xQOpI5Nljx
debian/tests/integration-tests: openBinaryFile: does not exist (No such file or directory)
Older models would easily be misled by the last error message into thinking that a file went missing, and focus on suggesting changes to the complex-looking first line. In reality the error is simply caused by the quotes around $SH_SCRIPTS, which pass the entire multi-line string as a single argument to shellcheck. So instead of receiving two separate file paths, shellcheck tries to open one file literally named /tmp/tmp.xQOpI5Nljx\ndebian/tests/integration-tests. Incorrect argument expansion is fairly easy for an experienced human programmer to notice, but tricky for an LLM. Indeed, Grok, MinMax and Qwen fell for this trap and focused on the mktemp, assuming it somehow failed to create a file. Interestingly, GLM failed to produce an answer at all: its reasoning step seemed to loop, thinking too much about the missing file without understanding why it would be missing when there is nothing wrong with how mktemp is executed. Claude, Gemini and Kimi immediately spotted the real root cause of passing the variable quoted, and suggested correct fixes that involve removing the quotes, or using Bash arrays or xargs in a way that also makes the whole command handle filenames with spaces correctly.
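The underlying word-splitting behaviour is easy to demonstrate in isolation (a minimal illustration, unrelated to the benchmark's actual files):

```shell
# A newline-separated list of two names, like the mktemp + grep output above.
FILES="$(printf 'a\nb')"

# Quoted: the whole string is passed as ONE argument (shellcheck's failure mode).
set -- "$FILES"
echo "quoted:   $# argument(s)"

# Unquoted: the shell splits on whitespace, giving TWO separate arguments.
set -- $FILES
echo "unquoted: $# argument(s)"
```

The unquoted form fixes the immediate problem but still breaks on filenames containing spaces, which is why piping the list into xargs or collecting it into a Bash array is the more robust fix.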

Conclusion
Model Sports Economics Emoji Languages Visual Shell Score
Claude Opus 4.6 6/6
GPT-5.2 ~ 2.5/6
Grok 4.1 3/6
Gemini 3.1 Pro 5/6
GLM 5 ? N/A 3/5
MinMax M2.5 N/A 1/5
Qwen3.5 Plus ~ 2.5/6
Kimi K2.5 4/6
Obviously, my mini-benchmark had only 6 questions, and I ran it only once, so it was not scientifically rigorous. However, it was systematic enough to beat a mere gut feeling. The main finding for me personally is that Claude Opus 4.6, the flagship model by Anthropic, seems to give great answers consistently. The answers are not only correct but also well scoped, giving enough information to cover everything that seems relevant without blurting unnecessary filler. I used Claude extensively in 2023-2024 when it was the main model available at my day job, but for the past year I had been using other models that I felt were better at the time. Now Claude seems to be the best-of-the-best again, with Gemini and Kimi as close runners-up.

Comparing their pricing at OpenRouter.ai, where Kimi K2.5's price of $0.6 / million tokens is almost 90% cheaper than Claude Opus 4.6's $5.0 / million tokens, suggests that Kimi K2.5 offers the best price-per-performance ratio. Claude might be cheaper with a monthly subscription directly from Anthropic, potentially narrowing the price gap.

Overall, I do feel that Anthropic, Google and Moonshot.ai have been pushing the envelope with their latest models in a way that makes it hard to claim that AI models have plateaued. In fact, one could claim that at least Claude has now climbed over the hill of AI slop and consistently produces valuable results. If and when AI usage expands from here, we might actually not drown in AI slop, as the chances of accidentally crappy results decrease. This makes me positive about the future. I am also really happy to see that there wasn't just one model crushing everybody else, but at least three models doing very well. As an open source enthusiast, I am particularly glad that Moonshot.ai's Kimi K2.5 is published with an open license: given the hardware, anyone can run it on their own.
OpenRouter.ai currently lists 9 independent providers alongside Moonshot.ai itself, showcasing the potential of open-weight models in practice. If the pattern holds and flagship models continue improving at this pace, we might look back at 2026 as the year AI stopped feeling like a call center associate and started to resemble a scientific researcher. As new models become available we need to keep testing, keep questioning, and keep our expectations grounded in actual performance rather than press releases. Thanks to OpenRouter.ai for providing a great service that makes testing various models incredibly easy!

21 February 2026

Vasudev Kamath: Learning Notes: Debsecan MCP Server

Since Generative AI is currently the most popular topic, I wanted to get my hands dirty and learn something new. I was learning about the Model Context Protocol at the time and wanted to apply it to build something simple.
Idea On Debian systems, we use debsecan to find vulnerabilities. However, the tool currently provides a simple list of vulnerabilities and packages with no indication of the system's security posture: no criticality information is exposed, and no executive summary is provided regarding what needs to be fixed. Of course, one can simply run the following to install existing fixes and be done with it:
apt-get install $(debsecan --suite sid --format packages --only-fixed)
But this is not how things work in corporate environments; you need to provide a report showing the system's previous state and the actions taken to bring it to a safe state. It is all about metrics and reports. My goal was to use debsecan to generate a list of vulnerabilities, find more detailed information on them, and prioritize them as critical, high, medium, or low. By providing this information to an AI, I could ask it to generate an executive summary report detailing what needs to be addressed immediately and the overall security posture of the system.
Initial Take My initial thought was to use an existing LLM, either self-hosted or a cloud-based LLM like Gemini (which provides an API with generous limits via AI Studio). I designed functions to output the list of vulnerabilities on the system and provide detailed information on each. The idea was to use these as "tools" for the LLM.
Learnings
  1. I learned about open-source LLMs using Ollama, which allows you to download and use models on your laptop.
  2. I used Llama 3.1, Llama 3.2, and Granite 4 on my laptop without a GPU. I managed to run my experiments, even though they were time-consuming and occasionally caused my laptop to crash.
  3. I learned about Pydantic and how to use it to parse custom JSON schemas with minimal effort.
  4. I learned about osv.dev, an open-source initiative by Google that aggregates vulnerability information from various sources and provides data in a well-documented JSON schema format.
  5. I learned about the EPSS (Exploit Prediction Scoring System) and how it is used alongside static CVSS scoring to detect truly critical vulnerabilities. The EPSS score provides an idea of the probability of a vulnerability being exploited in the wild based on actual real-world attacks.
These experiments led to a collection of notebooks. One key takeaway was that when defining tools, I cannot simply output massive amounts of text because it consumes tokens and increases costs for paid models (though it is fine for local models using your own hardware and energy). Self-hosted models require significant prompting to produce proper output, which helped me understand the real-world application of prompt engineering.
Change of Plans Despite extensive experimentation, I felt I was nowhere close to a full implementation. While using a Gemini learning tool to study MCP, it suddenly occurred to me: why not write the entire thing as an MCP server? This would save me from implementing the agent side and allow me to hook it into any IDE-based LLM.
Design This MCP server is primarily a mix of a "tool" (which executes on the server machine to identify installed packages and their vulnerabilities) and a "resource" (which exposes read-only information for a specific CVE ID). The MCP exposes two tools:
  1. List Vulnerabilities: This tool identifies vulnerabilities in the packages installed on the system, categorizes them using CVE and EPSS scores, and provides a dictionary of critical, high, medium, and low vulnerabilities.
  2. Research Vulnerabilities: Based on the user prompt, the LLM can identify relevant vulnerabilities and pass them to this function to retrieve details such as whether a fix is available, the fixed version, and criticality.
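As a rough sketch of the kind of categorization the listing tool performs (hypothetical thresholds and function names, not the actual debsecan-mcp code):

```python
def categorize(cvss: float, epss: float) -> str:
    """Bucket a CVE by its CVSS base score, promoting anything with a
    high exploit probability (EPSS) regardless of its static score."""
    if epss >= 0.5 or cvss >= 9.0:   # likely exploited in the wild, or critical score
        return "critical"
    if cvss >= 7.0:
        return "high"
    if cvss >= 4.0:
        return "medium"
    return "low"

def list_vulnerabilities(vulns):
    """Return {severity: sorted CVE IDs}; sets keep the lists duplicate-free."""
    buckets = {"critical": set(), "high": set(), "medium": set(), "low": set()}
    for cve_id, cvss, epss in vulns:
        buckets[categorize(cvss, epss)].add(cve_id)
    return {sev: sorted(ids) for sev, ids in buckets.items()}
```

Returning only CVE IDs per severity, rather than full vulnerability text, is what keeps the tool output small enough not to flood the LLM's context.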
Vibe Coding "Vibe coding" is the latest trend, with many claiming that software engineering jobs are a thing of the past. Without going into too much detail, I decided to give it a try. While this is not my first "vibe coded" project (I have done this previously at work using corporate tools), it is my first attempt to vibe code a hobby/learning project. I chose Antigravity because it seemed to be the only editor providing a sufficient amount of free tokens. For every vibe coding project, I spend time thinking about the barebones skeleton: the modules, function return values, and data structures. This allows me to maintain control over the LLM-generated code so it doesn't become overly complicated or incomprehensible. As a first step, I wrote down my initial design in a requirements document. In that document, I explicitly called for using debsecan as the model for various components. Additionally, I asked the AI to reference my specific code for the EPSS logic. The reasons were:
  1. debsecan already solves the core problem; I am simply rebuilding it. debsecan uses a single file generated by the Debian Security team containing all necessary information, which prevents us from needing multiple external sources.
  2. This provides the flexibility to categorize vulnerabilities within the listing tool itself since all required information is readily available, unlike my original notebook-based design.
I initially used Gemini 3 Flash as the model because I was concerned about exceeding my free limits.
Hiccups Although it initially seemed successful, I soon noticed discrepancies between the local debsecan outputs and the outputs generated by the tools. I asked the AI to fix this, but after two attempts, it still could not match the outputs. I realized it was writing its own version-comparison logic and failing significantly. Finally, I instructed it to depend entirely on the python-apt module for version comparison; since it is not on PyPI, I asked it to pull directly from the Git source. This solved some issues, but the problem persisted. By then, my weekly quota was exhausted, and I stopped debugging. A week later, I resumed debugging with the Claude 3.5 Sonnet model. Within 20-25 minutes, it found the fix, which involved four lines of changes in the parsing logic. However, I ran out of limits again before I could proceed further. In the requirements, I specified that the list vulnerabilities tool should only provide a dictionary of CVE IDs divided by severity. However, the AI instead provided full text for all vulnerability details, resulting in excessive data including negligible vulnerabilities being sent to the LLM. Consequently, it never called the research vulnerabilities tool. Since I had run out of limits, I manually fixed this in a follow-up commit.
How to Use I have published the current work in the debsecan-mcp repository. I have used the same license as the original debsecan. I am not entirely sure how to interpret licenses for vibe-coded projects, but here we are. To use this, you need to install the tool in a virtual environment and configure your IDE to use the MCP. Here is how I set it up for Visual Studio Code:
  1. Follow the guide from the VS Code documentation regarding adding an MCP server.
  2. My global mcp.json looks like this:
 
{
    "servers": {
        "debsecan-mcp": {
            "command": "uv",
            "args": [
                "--directory",
                "/home/vasudev/Documents/personal/FOSS/debsecan-mcp/debsecan-mcp",
                "run",
                "debsecan-mcp"
            ]
        }
    },
    "inputs": []
}
 
  3. I am running it directly from my local codebase using a virtualenv created with uv. You may need to tweak the path based on your installation.
  4. To use the MCP server in the Copilot chat window, reference it using #debsecan-mcp. The LLM will then use the server for the query.
  5. Use a prompt like: "Give an executive summary of the system security status and immediate actions to be taken."
  6. You can observe the LLM using list_vulnerabilities followed by research_cves. Because the first tool only provides CVE IDs based on severity, the LLM is smart enough to research only high and critical vulnerabilities, thereby saving tokens.
What's Next? This MCP is not yet perfect and has the following issues:
  1. The list_vulnerabilities dictionary contains duplicate CVE IDs because the code used a list instead of a set. While the LLM is smart enough to deduplicate these, it still costs extra tokens.
  2. Because I initially modeled this on debsecan, it uses a raw method for parsing /var/lib/dpkg/status instead of python-apt. I am considering switching to python-apt to reduce maintenance overhead.
  3. Interestingly, the AI did not add a single unit test, which is disappointing. I will add these once my limits are restored.
  4. I need to create a cleaner README with usage instructions.
  5. I need to determine if the MCP can be used via HTTP as well as stdio.
Conclusion Vibe coding is interesting, but things can get out of hand if not managed properly. Even with a good process, code must be reviewed and tested; you cannot blindly trust an AI to handle everything. Even if it adds tests, you must validate them, or you are doomed!

20 February 2026

Bits from Debian: Proxmox Platinum Sponsor of DebConf26

We are pleased to announce that Proxmox has committed to sponsor DebConf26 as a Platinum Sponsor. Proxmox develops powerful, yet easy-to-use open-source server solutions. The comprehensive open-source ecosystem is designed to manage diverse IT landscapes, from single servers to large-scale distributed data centers. Our unified platform integrates server virtualization, easy backup, and rock-solid email security, ensuring seamless interoperability across the entire portfolio. With the Proxmox Datacenter Manager, the ecosystem also offers a "single pane of glass" for centralized management across different locations. Since 2005, all Proxmox solutions have been built on the rock-solid Debian platform. We are proud to return to DebConf26 as a sponsor because the Debian community provides the foundation that makes our work possible. We believe in keeping IT simple, open, and under your control. Thank you very much, Proxmox, for your support of DebConf26! Become a sponsor too! DebConf26 will take place from July 20th to 25th, 2026, in Santa Fe, Argentina, and will be preceded by DebCamp, from July 13th to 19th, 2026. DebConf26 is accepting sponsors! Interested companies and organizations may contact the DebConf team through sponsors@debconf.org, and visit the DebConf26 website at https://debconf26.debconf.org/sponsors/become-a-sponsor/.

18 February 2026

Thomas Lange: 42.000 FAI.me jobs created

The FAI.me service has reached another milestone: the 42.000th job was submitted via the web interface since the beginning of this service in 2017. The idea was to provide a simple web interface that lets end users create the configs for a fully automatic installation with only minimal questions and without knowing the syntax of the configuration files. Thanks a lot for using this service and for all your feedback. The next job can be yours! P.S.: I would like to get more feedback on the FAI.me service. What do you like most? What's missing? Do you have a success story about how you use the customized ISO for your deployment? Please fill out the FAI questionnaire or send feedback via email to fai.me@fai-project.org. About FAI.me FAI.me is the service for building your own customized images via a web interface. You can create an installation or live ISO or a cloud image. For Debian, multiple release versions can be chosen, as well as installations for Ubuntu Server, Ubuntu Desktop, or Linux Mint. Multiple options are available, like selecting different desktop environments, the language and keyboard, and adding a user with a password. Optional settings include adding your own package list, choosing a backports kernel, adding a postinst script, adding an ssh public key, choosing a partition layout, and some more.

16 February 2026

Benjamin Mako Hill: Why do people participate in similar online communities?

Note: I have not published blog posts about my academic papers over the past few years. To ensure that my blog contains a more comprehensive record of my published papers and to surface these for folks who missed them, I will be periodically (re)publishing blog posts about some older published projects. It seems natural to think of online communities competing for the time and attention of their participants. Over the last few years, I've worked with a team of collaborators led by Nathan TeBlunthuis to use mathematical and statistical techniques from ecology to understand these dynamics. What we've found surprised us: competition between online communities is rare and typically short-lived. When we started this research, we figured competition would be most likely among communities discussing similar topics. As a first step, we identified clusters of such communities on Reddit. One surprising thing we noticed in our Reddit data was that many of these communities that used similar language also had very high levels of overlap among their users. This was puzzling: why were the same groups of people talking to each other about the same things in different places? And why don't they appear to be in competition with each other for their users' time and activity? We didn't know how to answer this question using quantitative methods. As a result, we recruited and interviewed 20 active participants in clusters of highly related subreddits with overlapping user bases (for example, one cluster was focused on vintage audio). We found that the answer to the puzzle lay in the fact that the people we talked to were looking for three distinct things from the communities they participated in:

  1. The ability to connect to specific information and narrowly scoped discussions.
  2. The ability to socialize with people who are similar to themselves.
  3. Attention from the largest possible audience.
Critically, we also found that these three things represented a trilemma, and that no single community can meet all three needs. You might find two of the three in a single community, but you could never have all three.
Figure from No Community Can Do Everything: Why People Participate in Similar Online Communities depicts three key benefits that people seek from online communities and how individual communities tend not to optimally provide all three. For example, large communities tend not to afford a tight-knit homophilous community.
The end result is something I recognize in how I engage with online communities on platforms like Reddit. People tend to engage with a portfolio of communities that vary in size, specialization, topical focus, and rules. Compared with any single community, such overlapping systems can provide a wider range of benefits. No community can do everything.

This work was published as a paper at CSCW: TeBlunthuis, Nathan, Charles Kiene, Isabella Brown, Laura (Alia) Levi, Nicole McGinnis, and Benjamin Mako Hill. 2022. No Community Can Do Everything: Why People Participate in Similar Online Communities. Proceedings of the ACM on Human-Computer Interaction 6 (CSCW1): 61:1-61:25. https://doi.org/10.1145/3512908.

This work was supported by the National Science Foundation (awards IIS-1908850, IIS-1910202, and GRFP-2016220885). A full list of acknowledgements is in the paper.

12 February 2026

Freexian Collaborators: Debian Contributions: cross building, rebootstrap updates, Refresh of the patch tagging guidelines and more! (by Anupa Ann Joseph)

Debian Contributions: 2026-01 Contributing to Debian is part of Freexian s mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

cross building, by Helmut Grohne In version 1.10.1, Meson merged a patch, thanks to Eli Schwartz, to make it call the correct g-ir-scanner by default. This problem affected more than 130 source packages. Helmut retried building them all and filed 69 patches as a result. A significant portion of those packages require another Meson change to call the correct vapigen. Another notable change is converting gnu-efi to multiarch, which ended up requiring changes to a number of other packages. Since Aurelien dropped the libcrypt-dev dependency from libc6-dev, that transition is now mostly complete and has resulted in most of the Perl ecosystem correctly expressing the perl-xs-dev dependencies needed for cross building. It is infrastructure changes like these, affecting many client packages, that this work targets. As a result of this continued work, about 66% of Debian's source packages now have satisfiable cross Build-Depends in unstable and about 10000 (55%) can actually be cross built. There are now more than 500 open bug reports affecting more than 2000 packages, most of which carry patches.

rebootstrap, by Helmut Grohne Maintaining architecture cross-bootstrap requires continued effort to adapt to archive changes such as glib2.0 dropping a build profile or an e2fsprogs FTBFS. Beyond those generic problems, architecture-specific problems with e.g. musl-linux-any or sparc may arise. While all these changes move things forward on the surface, the bootstrap tooling has become a growing pile of patches. Helmut managed to upstream two changes to glibc that reduce its Build-Depends in the stage2 build profile, and thanks Aurelien Jarno for that.

Refresh of the patch tagging guidelines, by Raphaël Hertzog Debian Enhancement Proposal #3 (DEP-3) is named Patch Tagging Guidelines and standardizes meta-information that Debian contributors can put in patches included in Debian source packages. With the feedback received over the years, and with the changes in the package management landscape, the need to refresh those guidelines became evident. As the initial driver of that DEP, I spent a good day reviewing all the feedback (which I had kept in a folder) and producing a new version of the document. The changes aim to give more weight to the syntax that is compatible with git format-patch's output, and also to clarify the expected uses and meanings of a couple of fields, including an algorithm that parsers should follow to determine the state of a patch. After the announcement of the new draft on debian-devel, the revised DEP-3 received a significant number of comments that I still have to process.

Miscellaneous contributions
  • Helmut uploaded debvm making it work with unstable as a target distribution again.
  • Helmut modernized the code base backing dedup.debian.net significantly expanding the support for type checking.
  • Helmut fixed the multiarch hinter once more given feedback from Fabian Grünbichler.
  • Helmut worked on migrating the rocblas package to forky.
  • Raphaël fixed RC bug #1111812 in publican and did some maintenance for tracker.debian.org.
  • Carles added support in the festival Debian package for systemd socket activation and systemd service and socket units. He adapted the patch for upstream and created a merge request (also fixing a macOS build system error while working on it), updated the Orca Wiki documentation regarding festival, and discussed a 2007 bug/feature in festival that allowed getting a local shell, noting that the new systemd socket activation uses the same code path.
  • Carles using po-debconf-manager worked on Catalan translations: 7 reviewed and sent; 5 follow ups, 5 deleted packages.
  • Carles made some po-debconf-manager changes: now it attaches the translation file on follow ups, fixed bullseye compatibility issues.
  • Carles reviewed a new Catalan apt translation.
  • Carles investigated and reported a lxhotkey bug and sent a patch for the abcde package.
  • Carles made minor updates for Debian Wiki for different pages (lxde for dead keys, Ripping with abcde troubleshooting, VirtualBox troubleshooting).
  • Stefano renamed build-details.json in Python 3.14 to fix multiarch coinstallability.
  • Stefano audited the tooling and ignore lists for checking the contents of the python3.X-minimal packages, finding and fixing some issues in the process.
  • Stefano made a few uploads of python3-defaults and dh-python in support of Python 3.14-as-default in Ubuntu. Also investigated the risk of ignoring byte-compilation failures by default, and started down the road of implementing this.
  • Stefano did some sysadmin work on debian.social infrastructure.
  • Stefano and Santiago worked on preparations for DebConf 26. Especially to help the local team on opening the registration, and reviewing the budget to be presented for approval.
  • Stefano uploaded routine updates of python-virtualenv and python-flexmock.
  • Antonio collaborated with DSA on enabling a new proxy for salsa to prevent scrapers from taking the service down.
  • Antonio did miscellaneous salsa administrative tasks.
  • Antonio fixed a few Ruby packages towards the Ruby 3.4 transition.
  • Antonio started work on planned improvements to the DebConf registration system.
  • Santiago prepared unstable updates for the latest upstream versions of knot-dns and knot-resolver, the authoritative DNS server and DNS resolver software developed by CZ.NIC. It is worth highlighting that, given their separation of functionality compared to other implementations, knot-dns and knot-resolver are also less complex software, which brings advantages in terms of security: only three CVEs have been reported for knot-dns since 2011.
  • Santiago made some routine reviews of merge requests proposed for the Salsa CI pipeline, e.g. a proposal to fix how sbuild chooses the chroot when building a package for experimental.
  • Colin fixed lots of Python packages to handle Python 3.14 and to avoid using the deprecated pkg_resources module.
  • Colin added forky support to the images used in Salsa CI pipelines.
  • Colin began working on getting a release candidate of groff 1.24.0 (the first upstream release since mid-2023, so a very large set of changes) into experimental.
  • Lucas kept working on the preparation for Ruby 3.4 transition. Some packages fixed (support build against Ruby 3.3 and 3.4): ruby-rbpdf, jekyll, origami-pdf, ruby-kdl, ruby-twitter, ruby-twitter-text, ruby-globalid.
  • Lucas supported some potential mentors in the Google Summer of Code 26 program to submit their projects.
  • Anupa worked on the point release announcements for Debian 12.13 and 13.3 from the Debian publicity team side.
  • Anupa attended the publicity team meeting to discuss the team activities and to plan an online sprint in February.
  • Anupa attended meetings with the Debian India team to plan and coordinate the MinDebConf Kanpur and sent out related Micronews.
  • Emilio coordinated various transitions and helped get rid of llvm-toolchain-17 from sid.

10 February 2026

Freexian Collaborators: Writing a new worker task for Debusine (by Carles Pina i Estany)

Debusine is a tool designed for Debian developers and Operating System developers in general. You can try out Debusine on debusine.debian.net, and follow its development on salsa.debian.org. This post describes how to write a new worker task for Debusine. It can be used to add tasks to a self-hosted Debusine instance, or to submit to the Debusine project new tasks to add new capabilities to Debusine. Tasks are the lower-level pieces of Debusine workflows. Examples of tasks are Sbuild, Lintian, Debdiff (see the available tasks). This post will document the steps to write a new basic worker task. The example will add a worker task that runs reprotest and creates an artifact of the new type ReprotestArtifact with the reprotest log. Tasks are usually used by workflows. Workflows solve high-level goals by creating and orchestrating different tasks (e.g. a Sbuild workflow would create different Sbuild tasks, one for each architecture).

Overview of tasks A task usually does the following:
  • It receives structured data defining its input artifacts and configuration
  • Input artifacts are downloaded
  • A process is run by the worker (e.g. lintian, debdiff, etc.). In this blog post, it will run reprotest
  • The output (files, logs, exit code, etc.) is analyzed, artifacts and relations might be generated, and the work request is marked as completed, either with Success or Failure
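The lifecycle above can be sketched roughly as follows. All names in this sketch (run_task, fetch_artifact, run_process, upload_log) are illustrative placeholders, not the real Debusine API:

```python
# A rough sketch of the task lifecycle described above. All names here
# are illustrative placeholders and NOT the real Debusine API.

def fetch_artifact(lookup: str) -> str:
    """Pretend to download an input artifact; return a local path."""
    return f"/tmp/{lookup}.dsc"

def run_process(cmd: list[str]) -> tuple[int, str]:
    """Pretend to run a process on the worker; return (exit code, log)."""
    return 0, "ran: " + " ".join(cmd)

def upload_log(log: str) -> None:
    """Pretend to upload the output as an artifact."""

def run_task(task_data: dict) -> str:
    # 1. structured input data -> 2. download artifacts ->
    # 3. run the process -> 4. analyze output and mark the result
    target = fetch_artifact(task_data["source_artifact"])
    returncode, log = run_process(["reprotest", target])
    upload_log(log)
    return "success" if returncode == 0 else "failure"

result = run_task({"source_artifact": "hello_2.10-5"})
```

The rest of the post fills in each of these steps with the real Debusine classes and methods.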
If you want to follow the tutorial and add the Reprotest task, your Debusine development instance should have at least one worker, one user, a debusine client set up, and permissions for the client to create tasks. All of this can be set up by following the steps in the Contribute section of the documentation. This blog post shows a functional Reprotest task, which is not currently part of Debusine. The Reprotest task implementation is simplified (no error handling, unit tests, specific view or docs, and some shortcuts in the environment preparation, etc.). At some point we might add a debrebuild task to Debusine, based on buildinfo files and using snapshot.debian.org to recreate the binary packages.

Defining the inputs of the task The input of the reprotest task will be a source artifact (a Debian source package). We model the input with pydantic in debusine/tasks/models.py:
class ReprotestData(BaseTaskDataWithExecutor):
   """Data for Reprotest task."""

   source_artifact: LookupSingle

class ReprotestDynamicData(BaseDynamicTaskDataWithExecutor):
   """Reprotest dynamic data."""

   source_artifact_id: int | None = None
The ReprotestData is what the user will input. A LookupSingle is a lookup that resolves to a single artifact. We would also have configuration for the desired variations to test, but we have left that out of this example for simplicity; configuring variations is left as an exercise for the reader. Since ReprotestData is a subclass of BaseTaskDataWithExecutor, it also contains environment, where the user can specify in which environment the task will run. The environment is an artifact with a Debian image. The ReprotestDynamicData holds the resolution of all lookups. These can be seen in the Internals tab of the work request view.

Add the new Reprotest artifact data class In order for the reprotest task to create a new artifact of the type DebianReprotest with the log and output metadata, add the new category to ArtifactCategory in debusine/artifacts/models.py:
    REPROTEST = "debian:reprotest"
In the same file add the DebianReprotest class:
class DebianReprotest(ArtifactData):
   """Data for debian:reprotest artifacts."""

   reproducible: bool | None = None

   def get_label(self) -> str:
       """Return a short human-readable label for the artifact."""
       return "reprotest analysis"
It could also include the package name or version. In order to have the category listed in the work request output artifacts table, edit the file debusine/db/models/artifacts.py: In ARTIFACT_CATEGORY_ICON_NAMES add ArtifactCategory.REPROTEST: "folder", and in ARTIFACT_CATEGORY_SHORT_NAMES add ArtifactCategory.REPROTEST: "reprotest",.

Create the new Task class In debusine/tasks/ create a new file reprotest.py.
reprotest.py
# Copyright © The Debusine Developers
# See the AUTHORS file at the top-level directory of this distribution
#
# This file is part of Debusine. It is subject to the license terms
# in the LICENSE file found in the top-level directory of this
# distribution. No part of Debusine, including this file, may be copied,
# modified, propagated, or distributed except according to the terms
# contained in the LICENSE file.

"""Task to use reprotest in debusine."""

from pathlib import Path
from typing import Any

from debusine import utils
from debusine.artifacts.local_artifact import ReprotestArtifact
from debusine.artifacts.models import (
    ArtifactCategory,
    CollectionCategory,
    DebianSourcePackage,
    DebianUpload,
    WorkRequestResults,
    get_source_package_name,
    get_source_package_version,
)
from debusine.client.models import RelationType
from debusine.tasks import BaseTaskWithExecutor, RunCommandTask
from debusine.tasks.models import ReprotestData, ReprotestDynamicData
from debusine.tasks.server import TaskDatabaseInterface


class Reprotest(
    RunCommandTask[ReprotestData, ReprotestDynamicData],
    BaseTaskWithExecutor[ReprotestData, ReprotestDynamicData],
):
    """Task to use reprotest in debusine."""

    TASK_VERSION = 1

    CAPTURE_OUTPUT_FILENAME = "reprotest.log"

    def __init__(
        self,
        task_data: dict[str, Any],
        dynamic_task_data: dict[str, Any] | None = None,
    ) -> None:
        """Initialize object."""
        super().__init__(task_data, dynamic_task_data)

        self._reprotest_target: Path | None = None

    def build_dynamic_data(
        self, task_database: TaskDatabaseInterface
    ) -> ReprotestDynamicData:
        """Compute and return ReprotestDynamicData."""
        input_source_artifact = task_database.lookup_single_artifact(
            self.data.source_artifact
        )

        assert input_source_artifact is not None
        self.ensure_artifact_categories(
            configuration_key="input.source_artifact",
            category=input_source_artifact.category,
            expected=(
                ArtifactCategory.SOURCE_PACKAGE,
                ArtifactCategory.UPLOAD,
            ),
        )
        assert isinstance(
            input_source_artifact.data, (DebianSourcePackage, DebianUpload)
        )
        subject = get_source_package_name(input_source_artifact.data)
        version = get_source_package_version(input_source_artifact.data)

        assert self.data.environment is not None

        environment = self.get_environment(
            task_database,
            self.data.environment,
            default_category=CollectionCategory.ENVIRONMENTS,
        )

        return ReprotestDynamicData(
            source_artifact_id=input_source_artifact.id,
            subject=subject,
            parameter_summary=f"{subject}_{version}",
            environment_id=environment.id,
        )

    def get_input_artifacts_ids(self) -> list[int]:
        """Return the list of input artifact IDs used by this task."""
        if not self.dynamic_data:
            return []

        return [
            self.dynamic_data.source_artifact_id,
            self.dynamic_data.environment_id,
        ]

    def fetch_input(self, destination: Path) -> bool:
        """Download the required artifacts."""
        assert self.dynamic_data

        artifact_id = self.dynamic_data.source_artifact_id
        assert artifact_id is not None
        self.fetch_artifact(artifact_id, destination)

        return True

    def configure_for_execution(self, download_directory: Path) -> bool:
        """
        Find a .dsc in download_directory.

        Install reprotest and other utilities used in _cmdline.
        Set self._reprotest_target to it.

        :param download_directory: where to search the files
        :return: True if valid files were found
        """
        self._prepare_executor_instance()

        if self.executor_instance is None:
            raise AssertionError("self.executor_instance cannot be None")

        self.run_executor_command(
            ["apt-get", "update"],
            log_filename="install.log",
            run_as_root=True,
            check=True,
        )
        self.run_executor_command(
            [
                "apt-get",
                "--yes",
                "--no-install-recommends",
                "install",
                "reprotest",
                "dpkg-dev",
                "devscripts",
                "equivs",
                "sudo",
            ],
            log_filename="install.log",
            run_as_root=True,
        )

        self._reprotest_target = utils.find_file_suffixes(
            download_directory, [".dsc"]
        )
        return True

    def _cmdline(self) -> list[str]:
        """
        Build the reprotest command line.

        Use configuration of self.data and self._reprotest_target.
        """
        target = self._reprotest_target
        assert target is not None

        cmd = [
            "bash",
            "-c",
            f"TMPDIR=/tmp ; cd /tmp ; dpkg-source -x {target} package/; "
            "cd package/ ; mk-build-deps ; apt-get install --yes ./*.deb ; "
            "rm *.deb ; "
            "reprotest --vary=-time,-user_group,-fileordering,-domain_host .",
        ]

        return cmd

    @staticmethod
    def _cmdline_as_root() -> bool:
        r"""apt-get install --yes ./\*.deb must be run as root."""
        return True

    def task_result(
        self,
        returncode: int | None,
        execute_directory: Path,  # noqa: U100
    ) -> WorkRequestResults:
        """
        Evaluate task output and return success.

        For a successful run of reprotest:
        -must have the output file
        -exit code is 0

        :return: WorkRequestResults.SUCCESS or WorkRequestResults.FAILURE.
        """
        reprotest_file = execute_directory / self.CAPTURE_OUTPUT_FILENAME

        if reprotest_file.exists() and returncode == 0:
            return WorkRequestResults.SUCCESS

        return WorkRequestResults.FAILURE

    def upload_artifacts(
        self, exec_directory: Path, *, execution_result: WorkRequestResults
    ) -> None:
        """Upload the ReprotestArtifact with the files and relationships."""
        if not self.debusine:
            raise AssertionError("self.debusine not set")

        assert self.dynamic_data is not None
        assert self.dynamic_data.parameter_summary is not None

        reprotest_artifact = ReprotestArtifact.create(
            reprotest_output=exec_directory / self.CAPTURE_OUTPUT_FILENAME,
            reproducible=execution_result == WorkRequestResults.SUCCESS,
            package=self.dynamic_data.parameter_summary,
        )

        uploaded = self.debusine.upload_artifact(
            reprotest_artifact,
            workspace=self.workspace_name,
            work_request=self.work_request_id,
        )

        assert self.dynamic_data is not None
        assert self.dynamic_data.source_artifact_id is not None
        self.debusine.relation_create(
            uploaded.id,
            self.dynamic_data.source_artifact_id,
            RelationType.RELATES_TO,
        )
Below are the main methods with some basic explanation. In order for Debusine to discover the task, add "Reprotest" to the __all__ list in the file debusine/tasks/__init__.py. Let's go through the different methods of the Reprotest class:

build_dynamic_data method The worker has no access to Debusine's database. Lookups are all resolved before the task gets dispatched to a worker, so all the worker has to do is download the specified input artifacts. The build_dynamic_data method looks up the artifact, asserts that it has a valid category, extracts the package name and version, and gets the environment in which the task will be executed. The environment is needed to run the task (reprotest will run in a container using unshare, incus, etc.).
    def build_dynamic_data(
        self, task_database: TaskDatabaseInterface
    ) -> ReprotestDynamicData:
        """Compute and return ReprotestDynamicData."""
        input_source_artifact = task_database.lookup_single_artifact(
            self.data.source_artifact
        )

        assert input_source_artifact is not None
        self.ensure_artifact_categories(
            configuration_key="input.source_artifact",
            category=input_source_artifact.category,
            expected=(
                ArtifactCategory.SOURCE_PACKAGE,
                ArtifactCategory.UPLOAD,
            ),
        )
        assert isinstance(
            input_source_artifact.data, (DebianSourcePackage, DebianUpload)
        )
        subject = get_source_package_name(input_source_artifact.data)
        version = get_source_package_version(input_source_artifact.data)

        assert self.data.environment is not None

        environment = self.get_environment(
            task_database,
            self.data.environment,
            default_category=CollectionCategory.ENVIRONMENTS,
        )

        return ReprotestDynamicData(
            source_artifact_id=input_source_artifact.id,
            subject=subject,
            parameter_summary=f"{subject}_{version}",
            environment_id=environment.id,
        )

get_input_artifacts_ids method Used to list the task s input artifacts in the web UI.
   def get_input_artifacts_ids(self) -> list[int]:
       """Return the list of input artifact IDs used by this task."""
       if not self.dynamic_data:
           return []

       assert self.dynamic_data.source_artifact_id is not None
       return [self.dynamic_data.source_artifact_id]

fetch_input method Download the required artifacts on the worker.
    def fetch_input(self, destination: Path) -> bool:
        """Download the required artifacts."""
        assert self.dynamic_data

        artifact_id = self.dynamic_data.source_artifact_id
        assert artifact_id is not None
        self.fetch_artifact(artifact_id, destination)

        return True

configure_for_execution method Install the packages needed by the task and set _reprotest_target, which is used to build the task s command line.
   def configure_for_execution(self, download_directory: Path) -> bool:
       """
       Find a .dsc in download_directory.

       Install reprotest and other utilities used in _cmdline.
       Set self._reprotest_target to it.

       :param download_directory: where to search the files
       :return: True if valid files were found
       """
       self._prepare_executor_instance()

       if self.executor_instance is None:
           raise AssertionError("self.executor_instance cannot be None")

       self.run_executor_command(
           ["apt-get", "update"],
           log_filename="install.log",
           run_as_root=True,
           check=True,
       )
       self.run_executor_command(
           [
               "apt-get",
               "--yes",
               "--no-install-recommends",
               "install",
               "reprotest",
               "dpkg-dev",
               "devscripts",
               "equivs",
               "sudo",
           ],
           log_filename="install.log",
           run_as_root=True,
       )

       self._reprotest_target = utils.find_file_suffixes(
           download_directory, [".dsc"]
       )
       return True

_cmdline method Return the command line to run the task. In this case, to keep the example simple, we run reprotest directly in the worker's executor VM/container, without giving it an isolated virtual server. So this command installs the build dependencies required by the package (so reprotest can build it) and then runs reprotest itself.
   def _cmdline(self) -> list[str]:
       """
       Build the reprotest command line.

       Use configuration of self.data and self._reprotest_target.
       """
       target = self._reprotest_target
       assert target is not None

       cmd = [
           "bash",
           "-c",
           f"TMPDIR=/tmp ; cd /tmp ; dpkg-source -x {target} package/; "
           "cd package/ ; mk-build-deps ; apt-get install --yes ./*.deb ; "
           "rm *.deb ; "
           "reprotest --vary=-time,-user_group,-fileordering,-domain_host .",
       ]

       return cmd
Some reprotest variations are disabled. This keeps the example simple, both in the set of packages that must be installed and in the reprotest features exercised.

_cmdline_as_root method Since packages need to be installed during execution, the command is run as root (in the container):
   @staticmethod
   def _cmdline_as_root() -> bool:
       r"""apt-get install --yes ./\*.deb must be run as root."""
       return True

task_result method The task succeeds if a log was generated and the return code is 0.
    def task_result(
        self,
        returncode: int | None,
        execute_directory: Path,  # noqa: U100
    ) -> WorkRequestResults:
        """
        Evaluate task output and return success.

        For a successful run of reprotest:
        -must have the output file
        -exit code is 0

        :return: WorkRequestResults.SUCCESS or WorkRequestResults.FAILURE.
        """
        reprotest_file = execute_directory / self.CAPTURE_OUTPUT_FILENAME

        if reprotest_file.exists() and returncode == 0:
            return WorkRequestResults.SUCCESS

        return WorkRequestResults.FAILURE

upload_artifacts method Create the ReprotestArtifact with the log and the reproducible boolean, upload it, and then add a relation between the ReprotestArtifact and the source package:
    def upload_artifacts(
        self, exec_directory: Path, *, execution_result: WorkRequestResults
    ) -> None:
        """Upload the ReprotestArtifact with the files and relationships."""
        if not self.debusine:
            raise AssertionError("self.debusine not set")

        assert self.dynamic_data is not None
        assert self.dynamic_data.parameter_summary is not None

        reprotest_artifact = ReprotestArtifact.create(
            reprotest_output=exec_directory / self.CAPTURE_OUTPUT_FILENAME,
            reproducible=execution_result == WorkRequestResults.SUCCESS,
            package=self.dynamic_data.parameter_summary,
        )

        uploaded = self.debusine.upload_artifact(
            reprotest_artifact,
            workspace=self.workspace_name,
            work_request=self.work_request_id,
        )

        assert self.dynamic_data is not None
        assert self.dynamic_data.source_artifact_id is not None
        self.debusine.relation_create(
            uploaded.id,
            self.dynamic_data.source_artifact_id,
            RelationType.RELATES_TO,
        )

Execution example To run this task in a local Debusine instance (see the steps to have it ready, with an environment, permissions and users created), you can do:
$ python3 -m debusine.client artifact import-debian -w System http://deb.debian.org/debian/pool/main/h/hello/hello_2.10-5.dsc
Get the artifact ID from the output of that command. The artifact can be seen at http://$DEBUSINE/debusine/System/artifact/$ARTIFACT_ID/. Then create a reprotest.yaml:
$ cat <<EOF > reprotest.yaml
source_artifact: $ARTIFACT_ID
environment: "debian/match:codename=bookworm"
EOF
Instead of debian/match:codename=bookworm, the artifact ID of an environment could be used. Finally, create the work request to run the task:
$ python3 -m debusine.client create-work-request -w System reprotest --data reprotest.yaml
Using the Debusine web interface you can see the work request, which should go to the Running status and then to Completed, with Success or Failure (depending on whether reprotest could reproduce the package or not). The Output tab will show an artifact of type debian:reprotest with one file: the log. The Metadata tab of the artifact shows its Data: the package name and reproducible (true or false).

What is left to do? This was a simple example of creating a task. Other things that could be done:
  • unit tests
  • documentation
  • configurable variations
  • running reprotest directly on the worker host, using the executor environment as a reprotest virtual server
  • in this specific example, the command line is doing too many things; some of them could be moved to other parts of the task, such as prepare_environment
  • integrate it in a workflow so it's easier to use (e.g. as part of QaWorkflow)
  • extract more from the log than just pass/fail
  • display the output in a more useful way (implement an artifact specialized view)
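As a tiny illustration of the last two points, a post-processing step could turn the log into structured data rather than a single pass/fail flag. The log format below is entirely made up for the example (it is not reprotest's real output), so treat this as a sketch of the idea only:

```python
# Hypothetical post-processing of the captured log. NOTE: the log lines
# used here are invented for illustration; real reprotest output differs,
# so a real implementation would need to parse reprotest's actual format.

def summarize(log: str) -> dict[str, list[str]]:
    """Group artifact names from an illustrative log by result."""
    results: dict[str, list[str]] = {"reproducible": [], "differs": []}
    for line in log.splitlines():
        if line.startswith("REPRODUCIBLE: "):
            results["reproducible"].append(line.removeprefix("REPRODUCIBLE: "))
        elif line.startswith("DIFFERS: "):
            results["differs"].append(line.removeprefix("DIFFERS: "))
    return results

example_log = (
    "REPRODUCIBLE: hello_2.10-5_amd64.deb\n"
    "DIFFERS: hello-dbgsym_2.10-5_amd64.deb\n"
)
summary = summarize(example_log)
```

A structured summary like this could then feed a specialized artifact view instead of forcing users to read the raw log.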

8 February 2026

Thorsten Alteholz: My Debian Activities in January 2026

Debian LTS/ELTS This was my hundred-thirty-ninth month doing some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian (as the LTS and ELTS teams have been merged now, there is only one paragraph left for both activities). During my allocated time I uploaded or worked on: I also attended the monthly LTS/ELTS meeting. While working on updates, I stumbled upon packages whose CVEs have been postponed for a long time although their CVSS scores were rather high. I wonder whether one should pay more attention to postponed issues; otherwise one could just as well have marked them as ignored already.

Debian Printing Unfortunately I didn't find any time to work on this topic.

Debian Lomiri This month I worked on unifying the packaging on Debian and Ubuntu. This makes it easier to work on those packages independent of the platform used. This work is generously funded by Fre(i)e Software GmbH!

Debian Astro This month I uploaded a new upstream version or a bugfix version of:

Debian IoT Unfortunately I didn't find any time to work on this topic.

Debian Mobcom Unfortunately I didn't find any time to work on this topic.

misc This month I uploaded a new upstream version or a bugfix version of:

Unfortunately this month I was distracted from my normal Debian work by other unpleasant things, so the paragraphs above are mostly empty. I now have to think about how much of my spare time I am able to dedicate to Debian in the future.

6 February 2026

Reproducible Builds: Reproducible Builds in January 2026

Welcome to the first monthly report in 2026 from the Reproducible Builds project! These reports outline what we've been up to over the past month, highlighting items of news from elsewhere in the increasingly-important area of software supply-chain security. As ever, if you are interested in contributing to the Reproducible Builds project, please see the Contribute page on our website.

  1. Flathub now testing for reproducibility
  2. Reproducibility identifying projects that will fail to build in 2038
  3. Distribution work
  4. Tool development
  5. Two new academic papers
  6. Upstream patches

Flathub now testing for reproducibility Flathub, the primary repository/app store for Flatpak-based applications, has begun checking for build reproducibility. According to a recent blog post:
We have started testing binary reproducibility of x86_64 builds targeting the stable repository. This is possible thanks to flathub-repro-checker, a tool doing the necessary legwork to recreate the build environment and compare the result of the rebuild with what is published on Flathub. While these tests have been running for a while now, we have recently restarted them from scratch after enabling S3 storage for diffoscope artifacts.
The test results and status are available on their reproducible builds page.

Reproducibility identifying software projects that will fail to build in 2038 Longtime Reproducible Builds developer Bernhard M. Wiedemann posted on Reddit on "Y2K38 commemoration day T-12", that is to say, twelve years to the day before the UNIX Epoch will no longer fit into a signed 32-bit integer variable on 19th January 2038. Bernhard's comment succinctly outlines the problem and notes some of the potential remedies, as well as linking to a discussion with the GCC developers regarding adding warnings for int/time_t conversions. At the time of publication, Bernhard's topic had generated 50 comments in response.
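The cut-off moment itself is easy to recompute. As a quick sanity check with Python's standard datetime module:

```python
from datetime import datetime, timezone

# The largest value a signed 32-bit time_t can hold, and the UTC moment it
# corresponds to; one second later the counter overflows.
T32_MAX = 2**31 - 1
last_second = datetime.fromtimestamp(T32_MAX, tz=timezone.utc)
print(T32_MAX, last_second.isoformat())  # 2147483647 2038-01-19T03:14:07+00:00
```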

Distribution work Conda is a language-agnostic package manager which was originally developed to help Python data scientists and is now a popular package manager for Python and R. conda-forge, a community-led packaging infrastructure for Conda, recently revamped the dashboards that rebuild packages in order to track reproducibility. There have been changes over the past two years to make the conda-forge build tooling fully reproducible by embedding the lockfile of the entire build environment inside the packages.
In Debian this month:
In NixOS this month, it was announced that the GNU Guix Full Source Bootstrap was ported to NixOS as part of Wire Jansen's bachelor's thesis (PDF). At the time of publication, this change has landed in the Nix stdenv.
Lastly, Bernhard M. Wiedemann posted another openSUSE monthly update for his work there.

Tool development diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 310 and 311 to Debian:
  • Fix test compatibility with u-boot-tools version 2026-01. [ ]
  • Drop the implied Rules-Requires-Root: no entry in debian/control. [ ]
  • Bump Standards-Version to 4.7.3. [ ]
  • Reference the Debian ocaml package instead of ocaml-nox. (#1125094)
  • Apply a patch by Jelle van der Waa to adjust a test fixture to match new lines. [ ]
  • Also drop the implied Priority: optional from debian/control. [ ]

In addition, Holger Levsen uploaded two versions of disorderfs, first updating the package from FUSE 2 to FUSE 3 as described in last month's report, as well as updating the packaging to the latest Debian standards. A second upload (0.6.2-1) was subsequently made, with Holger adding instructions on how to add the upstream release to our release archive and incorporating changes by Roland Clobus to set _FILE_OFFSET_BITS on 32-bit platforms, fixing a build failure on 32-bit systems. Vagrant Cascadian updated diffoscope in GNU Guix to version 311-2-ge4ec97f7 and disorderfs to 0.6.2.

Two new academic papers Julien Malka, Stefano Zacchiroli and Théo Zimmermann of Télécom Paris' in-house research laboratory, the Information Processing and Communications Laboratory (LTCI), published a paper this month titled Docker Does Not Guarantee Reproducibility:
[ ] While Docker is frequently cited in the literature as a tool that enables reproducibility in theory, the extent of its guarantees and limitations in practice remains under-explored. In this work, we address this gap through two complementary approaches. First, we conduct a systematic literature review to examine how Docker is framed in scientific discourse on reproducibility and to identify documented best practices for writing Dockerfiles enabling reproducible image building. Then, we perform a large-scale empirical study of 5,298 Docker builds collected from GitHub workflows. By rebuilding these images and comparing the results with their historical counterparts, we assess the real reproducibility of Docker images and evaluate the effectiveness of the best practices identified in the literature.
A PDF of their paper is available online.
Quentin Guilloteau, Antoine Waehren and Florina M. Ciorba of the University of Basel in Switzerland also published a Docker-related paper, theirs called Longitudinal Study of the Software Environments Produced by Dockerfiles from Research Artifacts:
The reproducibility crisis has affected all scientific disciplines, including computer science (CS). To address this issue, the CS community has established artifact evaluation processes at conferences and in journals to evaluate the reproducibility of the results shared in publications. Authors are therefore required to share their artifacts with reviewers, including code, data, and the software environment necessary to reproduce the results. One method for sharing the software environment proposed by conferences and journals is to utilize container technologies such as Docker and Apptainer. However, these tools rely on non-reproducible tools, resulting in non-reproducible containers. In this paper, we present a tool and methodology to evaluate variations over time in software environments of container images derived from research artifacts. We also present initial results on a small set of Dockerfiles from the Euro-Par 2024 conference.
A PDF of their paper is available online.

Miscellaneous news On our mailing list this month: Lastly, kpcyrd added a Rust section to the Stable order for outputs page on our website. [ ]

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:

3 February 2026

Valhalla's Things: A Day Off

Posted on February 3, 2026
Tags: madeof:atoms, madeof:bits
Today I had a day off. Some of it went great. Some less so. I woke up, went out to pay our tribute to NotOurCat, and it was snowing! yay! And I had a day off, so if it had snowed enough that shovelling was needed, I had time to do it (it didn't, it started to rain soon afterwards, but still, YAY snow!). Then I had breakfast, with the fruit rye bread I had baked yesterday, and I treated myself to some of the strong Irish tea I have left, instead of the milder ones I want to finish before buying more of the Irish. And then, I bought myself a fancy new expensive fountain pen. One that costs 16€! more than three times as much as my usual ones! I hope it will work as well, but I'm quite confident it should. I'll find out when it arrives from Germany (together with a few ink samples that will result in a future blog post with some SCIENCE). I decided to try and use bank transfers instead of my visa debit card when buying from online shops that give the option to do so: it's a tiny bit more effort, but it means I'm paying 0.25€ to my bank1 rather than the seller having to pay some unknown amount to a US-based payment provider. Unluckily, the fountain pen website offered a huge number of payment methods, but not bank transfers. sigh. And then, I could start working a bit on the connecting wires for the LED strips for our living room: I soldered two pieces, six wires each (it's one RGB strip, 4 pins, and a warm white one requiring two more), then did a bit of tests, including writing some micropython code to add a test mode that lights up each colour in sequence, and the morning was almost gone. For some reason this project, as simple as it is, is taking forever. But it is showing progress. There was a break, when the postman delivered a package of chemicals2 for a future project or two. There will be blog posts! After lunch I spent some time finishing eyelets on the outfit I wanted to wear this evening, as I had not been able to finish it during fosdem.
This one will result in two blog posts! Meanwhile, in the morning I didn't remember the name of the program I used to load software on micropython boards such as the one that will control the LED strips (that's thonny), and while searching for it in the documentation, I found that there is also a command line program I can use, mpremote, and that's a much better fit for my preferences! I mentioned it in an xmpp room full of nerds, and one of them mentioned that he could try it on his Inkplate, when he had time, and I was nerd-sniped into trying it on mine, which had been sitting unused showing the temperatures in our old house on the last day it spent there and needs to be updated for the sensors in the new house. And that led to the writing of some notes on how to set it up from the command line (good), and to the opening of one upstream issue (bad), because I have an old model, and the board-specific library isn't working. at all. And that's when I realized that it was 17:00, I still had to cook the bread I had been working on since yesterday evening (ciabatta, one of my favourites, but it needs almost one hour in the oven), the outfit I wanted to wear in the evening was still not wearable, the table needed cleaning and some panicking was due. Thankfully, my mother was cooking dinner, so I didn't have to do that too. I turned the oven on, sewed the shoulder seams of the bodice while spraying water on the bread every 5 minutes, and then while it was cooking on its own, started to attach a closure to the skirt, decided that a safety pin was a perfectly reasonable closure for the first day an outfit is worn, took care of the table, took care of the bread, used some twine to close the bodice, because I still haven't worked out what to use for laces, realized my bodkin is still misplaced, used a long and sharp and big needle meant for sewing mattresses instead of a bodkin, managed not to stab myself, and less than half an hour late we could have dinner.
There was bread, there was Swedish crispbread, there were spreads (tuna, and beans), and vegetables, and then there was the cake that caused my mother to panic when she added her last honey to the milk and it curdled (my SO and I tried it, it had no odd taste, and we decided it could be used), and it was good, although I had to get a second slice just to be 100% sure of it. And now I'm exhausted, and I've only done half of the things I had planned to do, but I'd still say I've had quite a good day.

  1. Banca Etica, so one that avoids any investment in weapons and a number of other problematic things.
  2. Not food grade, except for one, but kitchen-safe.

2 February 2026

Isoken Ibizugbe: How Open Source Contributions Define Your Career Path

Hi there, I'm more than halfway (8 weeks) through my Outreachy internship with Debian, working on the openQA project to test Live Images.

My journey into tech began as a software engineering trainee, during which I built a foundation in Bash scripting, C programming, and Python. Later, I worked for a startup as a Product Manager. As is common in startups, I wore many hats, but I found myself drawn most to the Quality Assurance team. Testing user flows and edge-case scenarios sparked my curiosity, and that curiosity is exactly what led me to the Debian Live Image testing project.

From Manual to Automated

In my previous roles, I was accustomed to manual testing: simulating user actions one by one. While effective, I quickly realized it could be a bottleneck in fast-paced environments. This internship has been a masterclass in removing that bottleneck. I've learned that automating repetitive actions makes life (and engineering) much easier. Life's too short for manual testing.

One of my proudest technical wins so far has been creating synergy across desktop environments. I proposed a solution to group common applications so we could use a single Perl script to handle tests for multiple desktop environments. With my mentor's guidance, we implemented this using symbolic links, which significantly reduced code redundancy.
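The symlink idea can be sketched roughly like this. It is a minimal illustration under assumed names, not the actual openQA test layout: the directory and file names (`shared/apps_startstop.pm`, `cinnamon`, `lxqt`) are made up for the example.

```python
import os
import tempfile

# Minimal sketch: one shared test script, with each desktop
# environment's directory holding only a symlink to it.
root = tempfile.mkdtemp()
shared = os.path.join(root, "shared", "apps_startstop.pm")
os.makedirs(os.path.dirname(shared))
with open(shared, "w") as f:
    f.write("# common start/stop test logic\n")

for desktop in ("cinnamon", "lxqt"):
    d = os.path.join(root, desktop)
    os.makedirs(d)
    # Relative symlink so the tree stays relocatable.
    os.symlink(os.path.join("..", "shared", "apps_startstop.pm"),
               os.path.join(d, "apps_startstop.pm"))
```

Any fix to the shared script is then picked up by every desktop environment at once, which is where the reduction in redundancy comes from.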

Expanding My Technical Toolkit

Over the last 8 weeks, my technical toolkit has expanded significantly:

  • Perl & openQA: I've learned to write Perl for automation within the openQA framework, and I've successfully automated the apps_startstop tests for Cinnamon and LXQt.
  • Technical Documentation: I authored a contributor guide. This required paying close attention to detail, ensuring that new contributors can have faster reviews and merged contributions.
  • Ansible: I am learning that testing doesn't happen in a vacuum. To ensure new tests are truly automated, they must be integrated into the system's configuration.

Working on this project has shaped my perspective on where I fit in the tech ecosystem. In Open Source, my resume isn't just a PDF; it's a public trail of merged code, technical proposals, and collaborative discussions.

As my mentor recently pointed out, this is my proof-of-work. It's a transparent record of what I am capable of and where my interests lie.

Finally, I've grown as a team player. Working with a global team across different time zones has taught me the importance of asynchronous communication and respecting everyone's time. Whether I am proposing a new logic or documenting a process, I am learning that open communication is just as vital as clean code.

1 February 2026

Benjamin Mako Hill: What do people do when they edit Wikipedia through Tor?

Note: I have not published blog posts about my academic papers over the past few years. To ensure that my blog contains a more comprehensive record of my published papers and to surface these for folks who missed them, I will be periodically (re)publishing blog posts about some older published projects. This post is closely based on a previously published post by Kaylea Champion on the Community Data Science Blog.

Many individuals use the Tor network to reduce their visibility to widespread internet surveillance. Tor protects users from being identified by their IP address, which can be tied to a physical location. However, if you'd like to contribute to Wikipedia using Tor, you'll run into a problem. Although most IP addresses can edit without an account, Tor users are blocked from editing.
Tor users attempting to contribute to Wikipedia are shown a screen that informs them that they are not allowed to edit Wikipedia.
Other research by my team has shown that Wikipedia's attempt to block Tor is imperfect and that some people have been able to edit despite the ban. As part of this work, we built a dataset of more than 11,000 contributions to Wikipedia via Tor and used quantitative analysis to show that contributions from Tor were of about the same quality as contributions from other new editors and other contributors without accounts.

Of course, given the unusual circumstances Tor-based contributors face, we wondered whether a deeper look at the content of their edits might reveal more about their motives and the kinds of contributions they seek to make. Kaylea Champion (then a student, now faculty at UW Bothell) led a qualitative investigation to explore these questions. Given the challenges of studying anonymity seekers, we designed a novel forensic qualitative approach inspired by techniques common in computer security and criminal investigation. We applied this new technique to a sample of 500 editing sessions and categorized each session based on what the editor seemed to be intending to do. Most of the contributions we found fell into two main categories, and beyond those we also found evidence of several other types of contributor intent.
An exploratory mapping of our themes in terms of the value a type of contribution represents to the Wikipedia community and the importance of anonymity in facilitating it. Anonymity-protecting tools play a critical role in facilitating contributions on the right side of the figure, while edits on the left are more likely to occur even when anonymity is impossible. Contributions toward the top reflect valuable forms of participation in Wikipedia, while edits at the bottom reflect damage.
In all, these themes led us to reflect on how the risks individuals face when contributing to online communities sometimes diverge from the risks the communities face in accepting their work. Expressing minoritized perspectives, maintaining community standards even when you may be targeted by the rulebreaker, highlighting injustice, or acting as a whistleblower can be very risky for an individual, and may not be possible without privacy protections. Of course, in platforms seeking to support the public good, such knowledge and accountability may be crucial.

This work was published as a paper at CSCW: Kaylea Champion, Nora McDonald, Stephanie Bankes, Joseph Zhang, Rachel Greenstadt, Andrea Forte, and Benjamin Mako Hill. 2019. A Forensic Qualitative Analysis of Contributions to Wikipedia from Anonymity Seeking Users. Proceedings of the ACM on Human-Computer Interaction. 3, CSCW, Article 53 (November 2019), 26 pages. https://doi.org/10.1145/3359155

This project was conducted by Kaylea Champion, Nora McDonald, Stephanie Bankes, Joseph Zhang, Rachel Greenstadt, Andrea Forte, and Benjamin Mako Hill. This work was supported by the National Science Foundation (awards CNS-1703736 and CNS-1703049) and included the work of two undergraduates supported through an NSF REU supplement.
