Reproducible Builds: Reproducible Builds in February 2026
Welcome to the February 2026 report from the Reproducible Builds project!
These reports outline what we've been up to over the past month, highlighting items of news from elsewhere in the increasingly-important area of software supply-chain security. As ever, if you are interested in contributing to the Reproducible Builds project, please see the Contribute page on our website.
- reproduce.debian.net
- Tool development
- Distribution work
- Miscellaneous news
- Upstream patches
- Documentation updates
- Four new academic papers
reproduce.debian.net
The last year has seen the introduction, development and deployment of reproduce.debian.net. In technical terms, this is an instance of rebuilderd, our server designed to monitor the official package repositories of Linux distributions and attempt to reproduce the observed results there.
This month, however, Holger Levsen added suite-based navigation (e.g. Debian trixie vs forky) to the service, in addition to the already existing architecture-based navigation. This can be observed on, for instance, the Debian trixie-backports or trixie-security pages.
Tool development
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made a number of changes, including preparing and uploading versions 312 and 313 to Debian.
In particular, Chris updated the post-release deployment pipeline to ensure that the pipeline does not fail if the automatic deployment to PyPI fails. In addition, Vagrant Cascadian updated an external reference for the 7z tool for GNU Guix. Vagrant Cascadian also updated diffoscope in GNU Guix to versions 312 and 313.
Distribution work
In Debian this month:
- 26 reviews of Debian packages were added, 5 were updated and 19 were removed this month, adding to our extensive knowledge about identified issues.
- A new debsbom package was uploaded to unstable. According to the package description, this package generates SBOMs (Software Bill of Materials) for distributions based on Debian in the two standard formats, SPDX and CycloneDX. The generated SBOM includes all installed binary packages and also contains Debian Source packages.
- In addition, an sbom-toolkit package was uploaded, which provides a collection of scripts for generating SBOMs. This is the tooling used in Apertis to generate the Licenses SBOM and the Build Dependency SBOM. It also includes dh-setup-copyright, a Debhelper addon to generate SBOMs from DWARF debug information, which is extracted by running dwarf2sources on every ELF binary in the package and saving the output.
Lastly, Bernhard M. Wiedemann posted another openSUSE monthly update for their work there.
Miscellaneous news
- Sören Tempel (nmeum) wrote up their insightful notes on Debugging Reproducibility Issues in Rust Software after nondeterministic issues were found and investigated for pimsync in the GNU Guix review process.
- Jeremy Bicha reported a bug in GNOME Clocks after they noticed that version 50.beta regressed in reproducibility compared to 49.0. Specifically, "the new generated .oga files differ in their Serial No. and Checksum [fields]". However, Jeremy ended up fixing the issue by replacing ffmpeg with oggenc.
- kpcyrd shared some information from the archlinux-dev-public mailing list on our mailing list this month, after a discussion at our latest Summit meeting on the topic of Link-Time Optimisation (LTO), specifically on the reasons why LTO often needs to be disabled in relation to Arch Linux's approach to binary hardening.
- Janneke Nieuwenhuizen posed a question to our list about whether there might be situations where using the UNIX epoch itself (i.e. 0) may materially differ from using SOURCE_DATE_EPOCH when a situation demands the use of a fixed timestamp.
- Laurent Huberdeau announced that they had recently finished their master's thesis "arguing for the use of POSIX shell for diverse double-compilation and reproducible builds". Laurent also presents pnut, a C compiler capable of bootstrapping itself and TCC from any POSIX-compliant shell and human-readable source files.
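To make the SOURCE_DATE_EPOCH question above concrete, here is a minimal Python sketch of how a build tool might choose the timestamp it embeds. The function name build_timestamp and the fallback to the current time are assumptions made for illustration; they are not taken from the mailing list thread or from any particular tool.

```python
import os
import time
from datetime import datetime, timezone


def build_timestamp(environ=None):
    """Return the timestamp a build should embed, as an aware UTC datetime.

    Hypothetical helper: honours SOURCE_DATE_EPOCH when it is set and
    falls back to the current time otherwise.
    """
    env = os.environ if environ is None else environ
    raw = env.get("SOURCE_DATE_EPOCH")
    if raw is None:
        # No fixed timestamp requested, so the output depends on build time.
        seconds = int(time.time())
    else:
        # SOURCE_DATE_EPOCH is a count of seconds since the UNIX epoch.
        seconds = int(raw)
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# A value of 0 is the UNIX epoch itself.
print(build_timestamp({"SOURCE_DATE_EPOCH": "0"}).isoformat())
# Any other fixed value is equally deterministic, but yields a different date.
print(build_timestamp({"SOURCE_DATE_EPOCH": "86400"}).isoformat())
```

Both calls are deterministic, so the practical difference between the two choices lies in which date ends up embedded in the artifact, not in whether the build is repeatable.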
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
- Bernhard M. Wiedemann:
- Gioele Barabucci:
- #1127641 filed against bitsnpicas.
- #1127643 filed against fonts-topaz-unicode.
- #1128901 filed against bitsnpicas.
Documentation updates
Once again, there were a number of improvements made to our website this month including:
- Aman Sharma added a Java reproducible builds paper to the Academic publications page.
- Chris Lamb added a reference to repro-build to the Tools page.
- Michiel Hendriks corrected an issue on the JVM page in relation to .properties files.
- kpcyrd added Homebrew to the Who is involved page.
Four new academic papers
Julien Malka and Arnout Engelen published a paper titled Lila: Decentralized Build Reproducibility Monitoring for the Functional Package Management Model:
[While] recent studies have shown that high reproducibility rates are achievable at scale, demonstrated by the Nix ecosystem achieving over 90% reproducibility on more than 80,000 packages, the problem of effective reproducibility monitoring remains largely unsolved. In this work, we address the reproducibility monitoring challenge by introducing Lila, a decentralized system for reproducibility assessment tailored to the functional package management model. Lila enables distributed reporting of build results and aggregation into a reproducibility database [...].
A PDF of their paper is available online.
Javier Ron and Martin Monperrus of KTH Royal Institute of Technology, Sweden, also published a paper, titled Verifiable Provenance of Software Artifacts with Zero-Knowledge Compilation:
Verifying that a compiled binary originates from its claimed source code is a fundamental security requirement, called source code provenance. Achieving verifiable source code provenance in practice remains challenging. The most popular technique, called reproducible builds, requires difficult matching and reexecution of build toolchains and environments. We propose a novel approach to verifiable provenance based on compiling software with zero-knowledge virtual machines (zkVMs). By executing a compiler within a zkVM, our system produces both the compiled output and a cryptographic proof attesting that the compilation was performed on the claimed source code with the claimed compiler. [...]
A PDF of the paper is available online.
Oreofe Solarin of the Department of Computer and Data Sciences, Case Western Reserve University, Cleveland, Ohio, USA, published It's Not Just Timestamps: A Study on Docker Reproducibility:
Reproducible container builds promise a simple integrity check for software supply chains: rebuild an image from its Dockerfile and compare hashes. We built a Docker measurement pipeline and apply it to a stratified sample of 2,000 GitHub repositories that contained a Dockerfile. We found that only 56% produce any buildable image, and just 2.7% of those are bitwise reproducible without any infrastructure configurations. After modifying infrastructure configurations, we raise bitwise reproducibility by 18.6%, but 78.7% of buildable Dockerfiles remain non-reproducible.
A PDF of Oreofe's paper is available online.
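The "rebuild and compare hashes" check described in the abstract can be sketched in a few lines of Python. This is only an illustration of a bitwise comparison between two build outputs that have already been exported to files. It is not the measurement pipeline from the paper, and the function names and the choice of SHA-256 are assumptions.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA-256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_bitwise_reproducible(first_artifact, second_artifact):
    """Two builds are bitwise reproducible only if their digests are equal."""
    return sha256_of(first_artifact) == sha256_of(second_artifact)
```

A caller would pass the paths of two independently built artifacts, for example two exported image tarballs. Any single differing byte, such as an embedded timestamp, makes the check fail.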
Lastly, Jens Dietrich and Behnaz Hassanshahi published On the Variability of Source Code in Maven Package Rebuilds:
[In] this paper we test the assumption that the same source code is being used [by] alternative builds. To study this, we compare the sources released with packages on Maven Central, with the sources associated with independently built packages from Google's Assured Open Source and Oracle's Build-from-Source projects. [...]
A PDF of their paper is available online.
Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:
- IRC: #reproducible-builds on irc.oftc.net.
- Mastodon: @reproducible_builds@fosstodon.org
- Mailing list: rb-general@lists.reproducible-builds.org
A lot of hardware runs non-free software. Sometimes that non-free software is in ROM. Sometimes it's in flash. Sometimes it's not stored on the device at all, it's pushed into it at runtime by another piece of hardware or by the operating system. We typically refer to this software as firmware to differentiate it from the software run on the CPU after the OS has started.
For my birthday, I've bought myself a fancy new expensive
Such a fancy pen, of course, requires a suitable case: I couldn't use the
failed prototype of a case I've been keeping my Preppys in, so I had to
get out the nice vegetable tanned leather... Yeah, nope, I don't have that
(yet). I got out the latex and cardboard material that is sold as a
(cheaper) leather substitute, doesn't look like leather at all, but is
quite nice (and easy) to work with. The project is not vegan anyway,
because I used waxed linen thread, waxing it myself with a lot of very
nicely smelling beeswax.
I got the measurements
From the width of the material I could conveniently cut two cases, so
that's what I did, started sewing the first one, realized that I got the
order of stitching wrong, and also that if I used light blue thread
instead of the black one it would look nice, and be easier to see in the
pictures for the published pattern, started sewing the second one, and
kept alternating between the two, depending on the availability of light
for taking pictures.
One of the two took the place of my desktop one, where I had one more
pen than slots, and one of the old prototypes was moved to keep my
bedside pen, and the other new case was used for the new pen in my
handbag, together with a Preppy, and now I have a free slot and you can
see how this is going to go wrong, right? :D
Debian Lomiri for Debian 13 (previous project)
In our previous project around Debian and Lomiri (lasting until July
2025), we managed to get Lomiri 0.5.0 (and with it another 130
packages) into Debian (with two minor exceptions [1]) just in time
for the Debian 13 release in August 2025.
Debian Lomiri for Debian 14
At DebConf in Brest, a follow-up project was designed between the
project sponsor and Fre(i)e Software GmbH [2]. The new project (on paper)
started on 1st August 2025 and its duration was agreed to be 2
years, allowing our company to work with an equivalent of ~5 FTE on
Lomiri, targeting the Debian 14 release some time in the second half of
2027 (an assumed date, let's see what happens).
Ongoing work was to be covered from day one of the new project and, once
all contract details had been properly put on paper at the end of September,
Fre(i)e Software GmbH started hiring a new team of software developers
and (future) Debian maintainers. (More on that new team in our next
Q4/2025 report.)
The ongoing work of Q3/2025 was basically Guido Berhörster and myself
working on Morph Browser Qt6 (mostly Guido together with Bhushan from
MiraLab [3]) and package maintenance in Debian (mostly me).
Morph Browser Qt6
The first milestone in the Qt6 porting of Morph Browser [4]
and related components (LUITK aka lomiri-ui-toolkit (big chunk! [5]),
lomiri-content-hub, lomiri-download-manager and a few other components)
was reached on 21st Sep 2025 with an upload of Morph Browser
1.2.0~git20250813.1ca2aa7+dfsg-1~exp1 to Debian experimental and the
Lomiri PPA [6].
Preparation of Debian 13 Updates (still pending)
In the background, various Lomiri updates for Debian 13 were prepared
during Q3/2025 (with a huge patchset), but publishing those to Debian 13
is still pending as the test results are not yet satisfactory.
[1] lomiri-push-service and nuntium
Earlier this week a colleague of mine, Emilio Jesús Gallego Arias, shared a demo of something he built as an experiment, and I felt the desire to share this and add a bit of reflection. (Not keen on watching a 5 min video? Read on below.)
When the Free Software movement started in the 1980s, most of the world
had just made a transition from free university-written software to
non-free, proprietary, company-written software. Because of that, the
initial ethical standpoint of the Free Software Foundation was that it's
fine to run a non-free operating system, as long as all the software you
run on that operating system is free.
Initially this was just the


The conceptual model from our paper, visualizing possible institutional configurations among Wikipedia projects that affect the risk of governance capture.
AI generated code and its quality. It's hard to get larger tasks done, and for smaller tasks I am faster myself. I suspect this will change soon, but as of today things are challenging. Large chunks of AI-generated code are hard to review and generally not of great quality.
There are possibly two layers that cause quality issues. One is that the instructions aren't clear to the AI, and the misunderstanding shows; I could sometimes reverse engineer the misunderstanding, and that could be resolved in the future. The other is that the corpus the AI has learnt from is probably not fit for the purpose. I suspect this can be improved in the future with better methodology and improvements in how the corpus is obtained, how the learnings are redirected, or how they are distilled.
I'm noting down what I think today, as the world is changing rapidly, and I am bound to see a very different scene soon.
The AI hype is based on the assumption that the frontier AI labs are producing better and better foundational models at an accelerating pace. Is that really true, or are people just in sort of a mass psychosis because AI models have become so good at mimicking human behavior that we unconsciously attribute increasing intelligence to them? I decided to conduct a mini-benchmark of my own to find out if the latest and greatest AI models are actually really good or not.
Common to all the test questions is that they are fairly straightforward and have a clear answer, yet the answer isn't common knowledge or statistically the most obvious one, and instead requires a bit of reasoning to get correct.
Some of these questions are also based on my having witnessed a flagship model fail miserably to answer them.
While Gemini and Grok were among the three models not falling into this trap, the response from Claude was exemplary:
I have seen Grok do something similar before, which in fact inspired me to include this question in my mini-benchmark.
GPT got a bit further, but for Hindi, Arabic and Bengali it listed the numerals in local script, not the number words. Gemini, GLM and Kimi gave a complete and correct answer as a list, while the absolute best answer and presentation was by Claude, which gave the table below:
A human can easily count that there are 10 rows and 30+ columns in the grid, but because the picture resolution isn't good enough, the exact number of columns can't be counted, and the answer should be that there are at least 300 launch pads in the picture.
GPT and Grok both guessed the count is zero. Instead of hallucinating some number they say zero, but it would have been better to not give any number at all, and just state that they are unable to perform the task. Gemini gave as its answer "101", which is quite odd, but reading the reasoning section, it seems to have tried counting items in the image without reasoning much about what it is actually counting and that there is clearly a grid that can make the counting much easier. Both Qwen and Kimi state they can see four parallel structures, but are unable to count drone launch pads.
The absolutely best answer was given by Claude, which counted 10-12 rows and 30-40+ columns, and concluded that there must be 300-500 drone launch pads. Very close to best human level - impressive!
This question applied only to multi-modal models that can see images, so GLM and MinMax could not give any response.
I want to get back into the habit of blogging, but I've struggled.
I've had several ideas of topics to try and write about, but I've
not managed to put aside the time to do it. I thought I'd try and
bash out a one-take, stream-of-consciousness-style post now, to get
back into the swing.
I'm writing from the lounge of my hotel room in Lanzarote, where
my family have gone for the School break. The weather at home has
been pretty awful this year, and this week is traditionally quite
miserable at the best of times. It's been dry with highs of around
25 °C.
It's been an unusual holiday in one respect: one of my kids is
struggling with Autistic Burnout. We were really unsure whether
taking her was a good idea: and certainly towards the beginning
of the holiday felt we may have made a mistake. Writing now, at
the end, I'm not so sure. But we're very unlikely to have anything
resembling a traditional summer holiday for the foreseeable.
Managing Autistic Burnout and the ways the UK healthcare and
education systems manage it (or fail to) has been a huge part of my
recent life. Perhaps I should write more about that. This coming
week the government are likely to publish
The eighteenth release of the
Since Generative AI is currently the most popular topic, I wanted to get my
hands dirty and learn something new. I was learning about the Model Context
Protocol at the time and wanted to apply it to build something simple.