Search Results: "xam"

19 October 2025

Otto Kekäläinen: Could the XZ backdoor have been detected with better Git and Debian packaging practices?

The discovery of a backdoor in XZ Utils in the spring of 2024 shocked the open source community, raising critical questions about software supply chain security. This post explores whether better Debian packaging practices could have detected this threat, offering a guide to auditing packages and suggesting future improvements. The XZ backdoor in versions 5.6.0/5.6.1 briefly made its way into many major Linux distributions such as Debian and Fedora, but luckily didn't reach that many actual users, as the backdoored releases were quickly removed thanks to the heroic diligence of Andres Freund. We are all extremely lucky that he detected a half-second performance regression in SSH, cared enough to trace it down, discovered malicious code in the XZ library loaded by SSH, and reported promptly to various security teams for quick coordinated action. This episode left many software engineers pondering how such an attack could be detected and prevented. As a Debian Developer, I decided to audit the xz package in Debian, share my methodology and findings in this post, and also suggest some improvements on how software supply-chain security could be tightened in Debian specifically. Note that the scope here is only to inspect how Debian imports software from its upstreams, and how it is distributed to Debian's users. This excludes the whole story of how to assess whether an upstream project is following software development security best practices. This post doesn't discuss how to operate an individual computer running Debian to ensure it remains untampered, as there are plenty of guides on that already.

Downloading Debian and upstream source packages

Let's start by working backwards from what the Debian package repositories offer for download. As auditing binaries is extremely complicated, we skip that and assume the Debian build hosts are trustworthy and reliably build binaries from the source packages, so the focus should be on auditing the source packages. As with everything in Debian, there are multiple tools and ways to do the same thing, but for brevity this post presents only one (and hopefully the best) way to do each thing. The first step is to download the latest version and some past versions of the package from the Debian archive, which is easiest done with debsnap. The following command will download all Debian source packages of xz-utils from version 5.2.4-1 onwards:
$ debsnap --verbose --first 5.2.4-1 xz-utils
Getting json https://snapshot.debian.org/mr/package/xz-utils/
...
Getting dsc file xz-utils_5.2.4-1.dsc: https://snapshot.debian.org/file/a98271e4291bed8df795ce04d9dc8e4ce959462d
Getting file xz-utils_5.2.4.orig.tar.xz.asc: https://snapshot.debian.org/file/59ccbfb2405abe510999afef4b374cad30c09275
Getting file xz-utils_5.2.4-1.debian.tar.xz: https://snapshot.debian.org/file/667c14fd9409ca54c397b07d2d70140d6297393f
source-xz-utils/xz-utils_5.2.4-1.dsc:
Good signature found
validating xz-utils_5.2.4.orig.tar.xz
validating xz-utils_5.2.4.orig.tar.xz.asc
validating xz-utils_5.2.4-1.debian.tar.xz
All files validated successfully.
Once debsnap completes there will be a subfolder source-<package name> with the following types of files:
  • *.orig.tar.xz: source code from upstream
  • *.orig.tar.xz.asc: detached signature (if upstream signs their releases)
  • *.debian.tar.xz: Debian packaging source, i.e. the debian/ subdirectory contents
  • *.dsc: Debian source control file, including signature by Debian Developer/Maintainer
Example:
$ ls -1 source-xz-utils/
...
xz-utils_5.6.4.orig.tar.xz
xz-utils_5.6.4.orig.tar.xz.asc
xz-utils_5.6.4-1.debian.tar.xz
xz-utils_5.6.4-1.dsc
xz-utils_5.8.0.orig.tar.xz
xz-utils_5.8.0.orig.tar.xz.asc
xz-utils_5.8.0-1.debian.tar.xz
xz-utils_5.8.0-1.dsc
xz-utils_5.8.1.orig.tar.xz
xz-utils_5.8.1.orig.tar.xz.asc
xz-utils_5.8.1-1.1.debian.tar.xz
xz-utils_5.8.1-1.1.dsc
xz-utils_5.8.1-1.debian.tar.xz
xz-utils_5.8.1-1.dsc
xz-utils_5.8.1-2.debian.tar.xz
xz-utils_5.8.1-2.dsc

Verifying authenticity of upstream and Debian sources using OpenPGP signatures

As seen in the output of debsnap, it already automatically verifies that the downloaded files match the OpenPGP signatures. To have full clarity on what files were authenticated with what keys, we should verify the Debian packager's signature with:
$ gpg --verify --auto-key-retrieve --keyserver hkps://keyring.debian.org xz-utils_5.8.1-2.dsc
gpg: Signature made Fri Oct 3 22:04:44 2025 UTC
gpg: using RSA key 57892E705233051337F6FDD105641F175712FA5B
gpg: requesting key 05641F175712FA5B from hkps://keyring.debian.org
gpg: key 7B96E8162A8CF5D1: public key "Sebastian Andrzej Siewior" imported
gpg: Total number processed: 1
gpg: imported: 1
gpg: Good signature from "Sebastian Andrzej Siewior" [unknown]
gpg: aka "Sebastian Andrzej Siewior <bigeasy@linutronix.de>" [unknown]
gpg: aka "Sebastian Andrzej Siewior <sebastian@breakpoint.cc>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 6425 4695 FFF0 AA44 66CC 19E6 7B96 E816 2A8C F5D1
Subkey fingerprint: 5789 2E70 5233 0513 37F6 FDD1 0564 1F17 5712 FA5B
The upstream tarball signature (if available) can be verified with:
$ gpg --verify --auto-key-retrieve xz-utils_5.8.1.orig.tar.xz.asc
gpg: assuming signed data in 'xz-utils_5.8.1.orig.tar.xz'
gpg: Signature made Thu Apr 3 11:38:23 2025 UTC
gpg: using RSA key 3690C240CE51B4670D30AD1C38EE757D69184620
gpg: key 38EE757D69184620: public key "Lasse Collin <lasse.collin@tukaani.org>" imported
gpg: Total number processed: 1
gpg: imported: 1
gpg: Good signature from "Lasse Collin <lasse.collin@tukaani.org>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 3690 C240 CE51 B467 0D30 AD1C 38EE 757D 6918 4620
Note that this only proves that there is a key that created a valid signature for this content. The authenticity of the keys themselves needs to be validated separately before trusting that they are in fact the keys of these people. That can be done by checking e.g. the upstream website for the key fingerprints they published, the Debian keyring for Debian Developers and Maintainers, or by relying on the OpenPGP web of trust.
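For the Debian side, one way to cross-check a fingerprint is against the keyring shipped in the debian-keyring package; a sketch, assuming that package is installed:
$ gpg --no-default-keyring --keyring /usr/share/keyrings/debian-keyring.gpg \
      --list-keys 64254695FFF0AA4466CC19E67B96E8162A8CF5D1
If the fingerprint is listed there, the key belongs to a current Debian Developer as recorded by the keyring maintainers.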

Verifying authenticity of upstream sources by comparing checksums

In case the upstream in question does not publish release signatures, the second best way to verify the authenticity of the sources used in Debian is to download the sources directly from upstream and check that the sha256 checksums match. This should be done using the debian/watch file inside the Debian packaging, which defines where the upstream source is downloaded from. Continuing the example above, we can unpack the latest Debian packaging sources, enter the directory, and then run uscan to download:
$ tar xvf xz-utils_5.8.1-2.debian.tar.xz
...
debian/rules
debian/source/format
debian/source.lintian-overrides
debian/symbols
debian/tests/control
debian/tests/testsuite
debian/upstream/signing-key.asc
debian/watch
...
$ uscan --download-current-version --destdir /tmp
Newest version of xz-utils on remote site is 5.8.1, specified download version is 5.8.1
gpgv: Signature made Thu Apr 3 11:38:23 2025 UTC
gpgv: using RSA key 3690C240CE51B4670D30AD1C38EE757D69184620
gpgv: Good signature from "Lasse Collin <lasse.collin@tukaani.org>"
Successfully symlinked /tmp/xz-5.8.1.tar.xz to /tmp/xz-utils_5.8.1.orig.tar.xz.
The original files downloaded from upstream are now in /tmp along with the files renamed to follow Debian conventions. Using everything downloaded so far the sha256 checksums can be compared across the files and also to what the .dsc file advertised:
$ ls -1 /tmp/
xz-5.8.1.tar.xz
xz-5.8.1.tar.xz.sig
xz-utils_5.8.1.orig.tar.xz
xz-utils_5.8.1.orig.tar.xz.asc
$ sha256sum xz-utils_5.8.1.orig.tar.xz /tmp/xz-5.8.1.tar.xz
0b54f79df85912504de0b14aec7971e3f964491af1812d83447005807513cd9e xz-utils_5.8.1.orig.tar.xz
0b54f79df85912504de0b14aec7971e3f964491af1812d83447005807513cd9e /tmp/xz-5.8.1.tar.xz
$ grep -A 3 Sha256 xz-utils_5.8.1-2.dsc
Checksums-Sha256:
0b54f79df85912504de0b14aec7971e3f964491af1812d83447005807513cd9e 1461872 xz-utils_5.8.1.orig.tar.xz
4138f4ceca1aa7fd2085fb15a23f6d495d27bca6d3c49c429a8520ea622c27ae 833 xz-utils_5.8.1.orig.tar.xz.asc
3ed458da17e4023ec45b2c398480ed4fe6a7bfc1d108675ec837b5ca9a4b5ccb 24648 xz-utils_5.8.1-2.debian.tar.xz
In the example above the checksum 0b54f79df85... is the same across the files, so it is a match.

Repackaged upstream sources can't be verified as easily

Note that uscan may in rare cases repackage some upstream sources, for example to exclude files that don't adhere to Debian's copyright and licensing requirements. Those files and paths would be listed under the Files-Excluded field in the debian/copyright file. There are also other situations where the file that represents the upstream sources in Debian isn't bit-by-bit identical to what upstream published. If checksums don't match, an experienced Debian Developer should review all package settings (e.g. debian/source/options) to see if there was a valid and intentional reason for the divergence.
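As an illustration (not taken from the xz-utils packaging), a Files-Excluded entry in the header paragraph of debian/copyright might look like this, telling uscan to repack the upstream tarball without the listed paths:
Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
Upstream-Name: example
Files-Excluded: docs/nonfree-logo.png
                vendor/*.min.js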

Reviewing changes between two source packages using diffoscope

Diffoscope is an incredibly capable and handy tool for comparing arbitrary files. For example, to view an HTML report of the differences between two XZ releases, run:
diffoscope --html-dir xz-utils-5.6.4_vs_5.8.0 xz-utils_5.6.4.orig.tar.xz xz-utils_5.8.0.orig.tar.xz
browse xz-utils-5.6.4_vs_5.8.0/index.html
Inspecting diffoscope output of differences between two XZ Utils releases

If the changes are extensive and you want to use an LLM to help spot potential security issues, generate reports of both the upstream and Debian packaging differences in Markdown with:
diffoscope --markdown diffoscope-debian.md xz-utils_5.6.4-1.debian.tar.xz xz-utils_5.8.1-2.debian.tar.xz
diffoscope --markdown diffoscope.md xz-utils_5.6.4.orig.tar.xz xz-utils_5.8.0.orig.tar.xz
The Markdown files created above can then be passed to your favorite LLM, along with a prompt such as:
Based on the attached diffoscope output for a new Debian package version compared with the previous one, list all suspicious changes that might have introduced a backdoor, followed by other potential security issues. If there are none, list a short summary of changes as the conclusion.

Reviewing Debian source packages in version control

As of today, only 93% of all Debian source packages are tracked in git on Debian's GitLab instance at salsa.debian.org. Some key packages such as Coreutils and Bash are not using version control at all, as their maintainers apparently don't see value in using git for Debian packaging, and the Debian Policy does not require it. Thus, the only reliable and consistent way to audit changes in Debian packages is to compare the full versions from the archive as shown above. However, for packages that are hosted on Salsa, one can view the git history to gain additional insight into what exactly changed, when and why. For packages that use version control, the repository location can be found in the Vcs-Git field in the debian/control file. For xz-utils the location is salsa.debian.org/debian/xz-utils. Note that the Debian Policy does not state anything about how Salsa should be used, or what git repository layout or development practices to follow. In practice most packages follow the DEP-14 proposal and use git-buildpackage as the tool for managing changes and pushing and pulling them between upstream and salsa.debian.org. To get the XZ Utils source, run:
$ gbp clone https://salsa.debian.org/debian/xz-utils.git
gbp:info: Cloning from 'https://salsa.debian.org/debian/xz-utils.git'
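For reference, the clone URL above comes from the Vcs fields in debian/control, which for xz-utils look roughly like this (illustrative):
Vcs-Browser: https://salsa.debian.org/debian/xz-utils
Vcs-Git: https://salsa.debian.org/debian/xz-utils.git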
At the time of writing this post the git history shows:
$ git log --graph --oneline
* bb787585 (HEAD -> debian/unstable, origin/debian/unstable, origin/HEAD) Prepare 5.8.1-2
* 4b769547 d: Remove the symlinks from -dev package.
* a39f3428 Correct the nocheck build profile
* 1b806b8d Import Debian changes 5.8.1-1.1
* b1cad34b Prepare 5.8.1-1
* a8646015 Import 5.8.1
* 2808ec2d Update upstream source from tag 'upstream/5.8.1'
 \
  * fa1e8796 (origin/upstream/v5.8, upstream/v5.8) New upstream version 5.8.1
  * a522a226 Bump version and soname for 5.8.1
  * 1c462c2a Add NEWS for 5.8.1
  * 513cabcf Tests: Call lzma_code() in smaller chunks in fuzz_common.h
  * 48440e24 Tests: Add a fuzzing target for the multithreaded .xz decoder
  * 0c80045a liblzma: mt dec: Fix lack of parallelization in single-shot decoding
  * 81880488 liblzma: mt dec: Don't modify thr->in_size in the worker thread
  * d5a2ffe4 liblzma: mt dec: Don't free the input buffer too early (CVE-2025-31115)
  * c0c83596 liblzma: mt dec: Simplify by removing the THR_STOP state
  * 831b55b9 liblzma: mt dec: Fix a comment
  * b9d168ee liblzma: Add assertions to lzma_bufcpy()
  * c8e0a489 DOS: Update Makefile to fix the build
  * 307c02ed sysdefs.h: Avoid <stdalign.h> even with C11 compilers
  * 7ce38b31 Update THANKS
  * 688e51bd Translations: Update the Croatian translation
*   a6b54dde Prepare 5.8.0-1.
*   77d9470f Add 5.8 symbols.
*   9268eb66 Import 5.8.0
*   6f85ef4f Update upstream source from tag 'upstream/5.8.0'
 \ \
  *   afba662b New upstream version 5.8.0
   /
  * 173fb5c6 doc/SHA256SUMS: Add 5.8.0
  * db9258e8 Bump version and soname for 5.8.0
  * bfb752a3 Add NEWS for 5.8.0
  * 6ccbb904 Translations: Run "make -C po update-po"
  * 891a5f05 Translations: Run po4a/update-po
  * 4f52e738 Translations: Partially fix overtranslation in Serbian man pages
  * ff5d9447 liblzma: Count the extra bytes in LZMA/LZMA2 decoder memory usage
  * 943b012d liblzma: Use SSE2 intrinsics instead of memcpy() in dict_repeat()
This shows the changes on the debian/unstable branch as well as the intermediate upstream import branch and the actual upstream development branch. See my Debian source packages in git explainer for details of what these branches are used for. To only view changes on the Debian branch, run git log --graph --oneline --first-parent or git log --graph --oneline -- debian. The Debian branch should only have changes inside the debian/ subdirectory, which is easy to check with:
$ git diff --stat upstream/v5.8
debian/README.source   16 +++
debian/autogen.sh   32 +++++
debian/changelog   949 ++++++++++++++++++++++++++
...
debian/upstream/signing-key.asc   52 +++++++++
debian/watch   4 +
debian/xz-utils.README.Debian   47 ++++++++
debian/xz-utils.docs   6 +
debian/xz-utils.install   28 +++++
debian/xz-utils.postinst   19 +++
debian/xz-utils.prerm   10 ++
debian/xzdec.docs   6 +
debian/xzdec.install   4 +
33 files changed, 2014 insertions(+)
All the files outside the debian/ directory originate from upstream, and for example running git blame on them should show only upstream commits:
$ git blame CMakeLists.txt
22af94128 (Lasse Collin 2024-02-12 17:09:10 +0200 1) # SPDX-License-Identifier: 0BSD
22af94128 (Lasse Collin 2024-02-12 17:09:10 +0200 2)
7e3493d40 (Lasse Collin 2020-02-24 23:38:16 +0200 3) ###############
7e3493d40 (Lasse Collin 2020-02-24 23:38:16 +0200 4) #
426bdc709 (Lasse Collin 2024-02-17 21:45:07 +0200 5) # CMake support for building XZ Utils
If the upstream in question signs commits or tags, they can be verified with e.g.:
$ git verify-tag v5.6.2
gpg: Signature made Wed 29 May 2024 09:39:42 AM PDT
gpg: using RSA key 3690C240CE51B4670D30AD1C38EE757D69184620
gpg: issuer "lasse.collin@tukaani.org"
gpg: Good signature from "Lasse Collin <lasse.collin@tukaani.org>" [expired]
gpg: Note: This key has expired!
The main benefit of reviewing changes in git is the ability to see detailed information about each individual change, instead of just staring at a massive list of changes without any explanations. In this example, to view all the upstream commits since the previous import into Debian, one would view the commit range from afba662b (New upstream version 5.8.0) to fa1e8796 (New upstream version 5.8.1) with git log --reverse -p afba662b...fa1e8796. However, a far superior way to review changes is to browse this range using a visual git history viewer such as gitk. Either way, looking at one code change at a time and reading the git commit message makes the review much easier.

Browsing git history in gitk --all

Comparing Debian source packages to git contents

As stated in the beginning of the previous section, and worth repeating, there is no guarantee that the contents of the Debian packaging git repository match what was actually uploaded to Debian. While the tag2upload project in Debian is getting more and more popular, Debian is still far from having any system to enforce that the git repository is in sync with the Debian archive contents. To detect such differences we can run diff across the Debian source package downloaded with debsnap earlier (path source-xz-utils/xz-utils_5.8.1-2.debian) and the git repository cloned in the previous section (path xz-utils):
$ diff -u source-xz-utils/xz-utils_5.8.1-2.debian/ xz-utils/debian/
diff -u source-xz-utils/xz-utils_5.8.1-2.debian/changelog xz-utils/debian/changelog
--- debsnap/source-xz-utils/xz-utils_5.8.1-2.debian/changelog 2025-10-03 09:32:16.000000000 -0700
+++ xz-utils/debian/changelog 2025-10-12 12:18:04.623054758 -0700
@@ -5,7 +5,7 @@
 * Remove the symlinks from -dev, pointing to the lib package.
 (Closes: #1109354)

- -- Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Fri, 03 Oct 2025 18:32:16 +0200
+ -- Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Fri, 03 Oct 2025 18:36:59 +0200
In the case above, diff revealed that the changelog timestamp in the version uploaded to Debian differs from what was committed to git. This is not malicious, just a mistake by the maintainer, who probably didn't run gbp tag immediately after the upload but instead some dch command, and ended up with a different timestamp in git compared to what was actually uploaded to Debian.

Creating synthetic Debian packaging git repositories

If no Debian packaging git repository exists, or if it is lagging behind what was uploaded to Debian's archive, one can use git-buildpackage's import-dscs feature to create synthetic git commits based on the files downloaded by debsnap, ensuring the git contents fully match what was uploaded to the archive. To import a single version there is gbp import-dsc (no s at the end), of which an example invocation would be:
$ gbp import-dsc --verbose ../source-xz-utils/xz-utils_5.8.1-2.dsc
Version '5.8.1-2' imported under '/home/otto/debian/xz-utils-2025-09-29'
Example commit history from a repository with commits added with gbp import-dsc:
$ git log --graph --oneline
* 86aed07b (HEAD -> debian/unstable, tag: debian/5.8.1-2, origin/debian/unstable) Import Debian changes 5.8.1-2
* f111d93b (tag: debian/5.8.1-1.1) Import Debian changes 5.8.1-1.1
* 1106e19b (tag: debian/5.8.1-1) Import Debian changes 5.8.1-1
 \
  * 08edbe38 (tag: upstream/5.8.1, origin/upstream/v5.8, upstream/v5.8) Import Upstream version 5.8.1
   \
    * a522a226 (tag: v5.8.1) Bump version and soname for 5.8.1
    * 1c462c2a Add NEWS for 5.8.1
    * 513cabcf Tests: Call lzma_code() in smaller chunks in fuzz_common.h
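To import a whole range of versions in one go, gbp import-dscs can be pointed at the .dsc files downloaded earlier; a sketch, assuming they are still in ../source-xz-utils/:
$ gbp import-dscs ../source-xz-utils/*.dsc
It should also be able to fetch the files itself via its --debsnap option, e.g. gbp import-dscs --debsnap xz-utils.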
An online example repository with only a few missing uploads added using gbp import-dsc can be viewed at salsa.debian.org/otto/xz-utils-2025-09-29/-/network/debian%2Funstable. An example repository that was fully crafted using gbp import-dscs can be viewed at salsa.debian.org/otto/xz-utils-gbp-import-dscs-debsnap-generated/-/network/debian%2Flatest. There is also dgit, which in a similar way creates a synthetic git history to allow viewing the Debian archive contents via git tools. However, its focus is on producing new package versions, so fetching a package with dgit that has not had its history recorded in dgit earlier will only show the latest version:
$ dgit clone xz-utils
canonical suite name for unstable is sid
starting new git history
last upload to archive: NO git hash
downloading http://ftp.debian.org/debian//pool/main/x/xz-utils/xz-utils_5.8.1.orig.tar.xz...
downloading http://ftp.debian.org/debian//pool/main/x/xz-utils/xz-utils_5.8.1.orig.tar.xz.asc...
downloading http://ftp.debian.org/debian//pool/main/x/xz-utils/xz-utils_5.8.1-2.debian.tar.xz...
dpkg-source: info: extracting xz-utils in unpacked
dpkg-source: info: unpacking xz-utils_5.8.1.orig.tar.xz
dpkg-source: info: unpacking xz-utils_5.8.1-2.debian.tar.xz
synthesised git commit from .dsc 5.8.1-2
HEAD is now at f9bcaf7 xz-utils (5.8.1-2) unstable; urgency=medium
dgit ok: ready for work in xz-utils
$ git log --graph --oneline   # on branch dgit/sid
* f9bcaf7 xz-utils (5.8.1-2) unstable; urgency=medium 9 days ago (HEAD -> dgit/sid, dgit/dgit/sid)
 \
  * 11d3a62 Import xz-utils_5.8.1-2.debian.tar.xz 9 days ago
* 15dcd95 Import xz-utils_5.8.1.orig.tar.xz 6 months ago
Unlike git-buildpackage managed git repositories, the dgit managed repositories cannot incorporate the upstream git history and are thus less useful for auditing the full software supply-chain in git.

Comparing upstream source packages to git contents

Equally important to the note at the beginning of the previous section, one must also keep in mind that the upstream release source packages, often called release tarballs, are not guaranteed to have the exact same contents as the upstream git repository. Projects might strip out test data or extra development files from their release tarballs to avoid shipping unnecessary files to users, or projects might add documentation files or versioning information into the tarball that isn't stored in git. While a small minority, there are also upstreams that don't use git at all, so the plain files in a release tarball are still the lowest common denominator for all open source software projects, and tools that export and import source code need to interface with them. In the case of XZ, the release tarball has additional version info and also a sizeable amount of pregenerated build configuration files. Detecting and comparing differences between git contents and tarballs can of course be done manually by running diff across an unpacked tarball and a checked-out git repository. If using git-buildpackage, the difference between the git contents and the tarball contents can be made visible directly in the import commit. In this XZ example, consider this git history:
* b1cad34b Prepare 5.8.1-1
* a8646015 Import 5.8.1
* 2808ec2d Update upstream source from tag 'upstream/5.8.1'
 \
  * fa1e8796 (debian/upstream/v5.8, upstream/v5.8) New upstream version 5.8.1
  * a522a226 (tag: v5.8.1) Bump version and soname for 5.8.1
  * 1c462c2a Add NEWS for 5.8.1
The commit a522a226 was the upstream release commit, which upstream also tagged v5.8.1. The merge commit 2808ec2d applied the new upstream import branch contents on the Debian branch. Between these is the special commit fa1e8796 New upstream version 5.8.1, tagged upstream/v5.8. This commit and tag exist only in the Debian packaging repository, and they show exactly what content was imported into Debian. It is generated automatically by git-buildpackage when running gbp import-orig --uscan for Debian packages with the correct settings in debian/gbp.conf. By viewing this commit one can see exactly how the upstream release tarball differs from the upstream git contents (if at all). In the case of XZ, the difference is substantial, and it is shown below in full as it is very interesting:
$ git show --stat fa1e8796
commit fa1e8796dabd91a0f667b9e90f9841825225413a
(debian/upstream/v5.8, upstream/v5.8)
Author: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Date: Thu Apr 3 22:58:39 2025 +0200
New upstream version 5.8.1
.codespellrc   30 -
.gitattributes   8 -
.github/workflows/ci.yml   163 -
.github/workflows/freebsd.yml   32 -
.github/workflows/netbsd.yml   32 -
.github/workflows/openbsd.yml   35 -
.github/workflows/solaris.yml   32 -
.github/workflows/windows-ci.yml   124 -
.gitignore   113 -
ABOUT-NLS   1 +
ChangeLog   17392 +++++++++++++++++++++
Makefile.in   1097 +++++++
aclocal.m4   1353 ++++++++
build-aux/ci_build.bash   286 --
build-aux/compile   351 ++
build-aux/config.guess   1815 ++++++++++
build-aux/config.rpath   751 +++++
build-aux/config.sub   2354 +++++++++++++
build-aux/depcomp   792 +++++
build-aux/install-sh   541 +++
build-aux/ltmain.sh   11524 ++++++++++++++++++++++
build-aux/missing   236 ++
build-aux/test-driver   160 +
config.h.in   634 ++++
configure   26434 ++++++++++++++++++++++
debug/Makefile.in   756 +++++
doc/SHA256SUMS   236 --
doc/man/txt/lzmainfo.txt   36 +
doc/man/txt/xz.txt   1708 ++++++++++
doc/man/txt/xzdec.txt   76 +
doc/man/txt/xzdiff.txt   39 +
doc/man/txt/xzgrep.txt   70 +
doc/man/txt/xzless.txt   36 +
doc/man/txt/xzmore.txt   31 +
lib/Makefile.in   623 ++++
m4/.gitignore   40 -
m4/build-to-host.m4   274 ++
m4/gettext.m4   392 +++
m4/host-cpu-c-abi.m4   529 +++
m4/iconv.m4   324 ++
m4/intlmacosx.m4   71 +
m4/lib-ld.m4   170 +
m4/lib-link.m4   815 +++++
m4/lib-prefix.m4   334 ++
m4/libtool.m4   8488 +++++++++++++++++++++
m4/ltoptions.m4   467 +++
m4/ltsugar.m4   124 +
m4/ltversion.m4   24 +
m4/lt~obsolete.m4   99 +
m4/nls.m4   33 +
m4/po.m4   456 +++
m4/progtest.m4   92 +
po/.gitignore   31 -
po/Makefile.in.in   517 +++
po/Rules-quot   66 +
po/boldquot.sed   21 +
po/ca.gmo   Bin 0 -> 15587 bytes
po/cs.gmo   Bin 0 -> 7983 bytes
po/da.gmo   Bin 0 -> 9040 bytes
po/de.gmo   Bin 0 -> 29882 bytes
po/en@boldquot.header   35 +
po/en@quot.header   32 +
po/eo.gmo   Bin 0 -> 15060 bytes
po/es.gmo   Bin 0 -> 29228 bytes
po/fi.gmo   Bin 0 -> 28225 bytes
po/fr.gmo   Bin 0 -> 10232 bytes
To easily inspect exactly what changed in the release tarball compared to the git release tag contents, the best tool for the job is Meld, invoked via git difftool --dir-diff fa1e8796^..fa1e8796.

Meld invoked by git difftool --dir-diff to show differences between git release tag and release tarball contents

To compare changes across the new and old upstream tarballs, one would compare commits afba662b (New upstream version 5.8.0) and fa1e8796 (New upstream version 5.8.1) by running git difftool --dir-diff afba662b..fa1e8796.

Meld invoked by git difftool --dir-diff afba662b..fa1e8796 to show differences between two upstream release tarball contents

With all the above tips you can now go and try to audit your own favorite package in Debian and see if it is identical with upstream, and if not, how it differs.
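For the fully manual route mentioned earlier in this section, a rough sketch of diffing an unpacked release tarball against the checked-out upstream tag (paths as used in the earlier steps) could be:
$ tar xf xz-utils_5.8.1.orig.tar.xz        # unpacks into xz-5.8.1/
$ git -C xz-utils checkout v5.8.1
$ diff -r --brief --exclude=.git xz-5.8.1 xz-utils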

Should the XZ backdoor have been detected using these tools?

The famous XZ Utils backdoor (CVE-2024-3094) consisted of two parts: the actual backdoor inside two binary blobs masqueraded as test files (tests/files/bad-3-corrupt_lzma2.xz, tests/files/good-large_compressed.lzma), and a small modification in the build scripts (m4/build-to-host.m4) to extract the backdoor and plant it into the built binary. The build script was not tracked in version control, but generated with GNU Autotools at release time and only shipped as an additional file in the release tarball. The entire reason for me to write this post was to ponder whether a diligent engineer using git-buildpackage best practices could have reasonably spotted this while importing the new upstream release into Debian. The short answer is no. The malicious actor here clearly anticipated all the typical ways anyone might inspect both git commits and release tarball contents, and masqueraded the changes very well and over a long timespan. First of all, XZ has, for legitimate reasons, several carefully crafted .xz files as test data to help catch regressions in the decompression code path. The test files are shipped in the release so users can run the test suite and validate that the binary is built correctly and xz works properly. Debian famously runs massive amounts of testing in its CI and autopkgtest systems across tens of thousands of packages to uphold high quality despite frequent upgrades of the build toolchain and while supporting more CPU architectures than any other distro. Test data is useful and should stay. When git-buildpackage is used correctly, the upstream commits are visible in the Debian packaging for easy review, but the commit cf44e4b that introduced the test files does not deviate enough from regular sloppy coding practices to really stand out. It is unfortunately very common for git commits to lack a message body explaining why the change was done, to not be properly atomic with test code and test data together in the same commit, and to be pushed directly to mainline without code review (the commit was not part of any PR in this case). Only another upstream developer could have spotted that this change is not on par with what the project expects, that the test code was never added, only test data, and thus that this commit was not just sloppy but potentially malicious. Secondly, the fact that a new Autotools file (m4/build-to-host.m4) appeared in XZ Utils 5.6.0 is not suspicious. This is perfectly normal for Autotools. In fact, starting from version 5.8.1, XZ Utils ships an m4/build-to-host.m4 file that it actually uses. Spotting that anything is fishy is practically impossible by simply reading the code, as Autotools files are full of custom m4 syntax interwoven with shell script, and there are plenty of backticks that spawn subshells and evals that execute variable contents, which is just normal for Autotools. Russ Cox's XZ post explains how exactly the Autotools code fetched the actual backdoor from the test files and injected it into the build.

Inspecting the m4/build-to-host.m4 changes in Meld launched via git difftool

There is only one tiny thing that maybe a very experienced Autotools user could potentially have noticed: the serial 30 in the version header is way too high. In theory one could also have noticed that this Autotools file deviates from what other packages in Debian ship under the same filename, such as the serial 3, serial 5a or 5b versions. That would however require an insane amount of extra checking work, and is not something we should plan to start doing. A much simpler solution would be to strongly recommend all open source projects to stop using Autotools and eventually get rid of it entirely.

Not detectable with reasonable effort

While planting backdoors is evil, it is hard not to feel some respect for the level of skill and dedication of the people behind this. I've been involved in a bunch of security breach investigations during my IT career, and never have I seen anything this well executed. If it hadn't slowed down SSH by ~500 milliseconds and been discovered due to that, it would most likely have stayed undetected for months or years. Hiding backdoors in closed source software is relatively trivial, but hiding backdoors in plain sight in a popular open source project requires an unusual amount of expertise and creativity, as shown above.

Is the software supply-chain in Debian easy to audit?

While maintaining a Debian package source using git-buildpackage can make the package history a lot easier to inspect, most packages have incomplete configurations in their debian/gbp.conf, and thus their package development histories are not always correctly constructed, uniform, or easy to compare. The Debian Policy does not mandate git usage, and there are many important packages that are not using git at all. Additionally, the Debian Policy also allows non-maintainers to upload new versions to Debian without committing anything to git, even for packages where the original maintainer wanted to use git. Uploads that bypass git unfortunately happen surprisingly often. Because of this situation, I am afraid that we could have multiple similar backdoors lurking that simply haven't been detected yet. More audits, which hopefully also get published openly, would be welcome! More people auditing the contents of the Debian archives would probably also help surface what tools and policies Debian might be missing to make the work easier, and thus help improve the security of Debian's users and improve trust in Debian.
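For illustration, a reasonably complete debian/gbp.conf following the DEP-14 branch naming might look like the sketch below (generic values, not the actual xz-utils configuration):
[DEFAULT]
debian-branch = debian/latest
upstream-branch = upstream/latest
upstream-vcs-tag = v%(version)s
pristine-tar = True
With settings like these, gbp import-orig produces the kind of uniform upstream import commits and tags discussed earlier, which is what makes histories comparable across packages.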

Is Debian currently missing some software that could help detect similar things?

To my knowledge there is currently no system in place as part of Debian's QA or security infrastructure to verify that the upstream source packages in Debian actually come from upstream. I've come across a lot of packages where the debian/watch or other configs are incorrect, and even cases where maintainers have manually created upstream tarballs because it was easier than configuring the automation to work. It is obvious that for those packages the source tarball now in Debian is not at all the same as upstream. I am not aware of any malicious cases though (if I were, I would of course report them). I am also aware of packages in the Debian repository that are misconfigured to be of type 1.0 (native), mixing the upstream files and debian/ contents and having patches applied, while they actually should be configured as 3.0 (quilt) and not hide what the true upstream sources are. Debian should extend its QA tools to scan for such things. If I find a sponsor, I might build it myself as my next major contribution to Debian. In addition to better tooling for finding mismatches in the source code, Debian could also have better tooling for tracking which source files built binaries came from, but solutions like Fraunhofer-AISEC's supply-graph or Sony's ESSTRA are not practical yet. Julien Malka's post about NixOS discusses the role of reproducible builds, which may help in some cases across all distros.

Or, is Debian missing some policies or practices to mitigate this?

Perhaps more importantly than more security scanning, the Debian Developer community should shift its general mindset from "anyone is free to do anything" towards valuing shared workflows. The ability to audit anything is severely hampered by the fact that there are so many ways to do the same thing, and distinguishing a normal deviation from a malicious deviation is too hard, as "normal" can be basically anything. Also, as there is no documented and recommended default workflow, both those who are old and new to Debian packaging might never learn any one optimal workflow, and end up doing many steps in the packaging process in a way that kind of works but is actually wrong or unnecessary, causing process deviations that look malicious but turn out to just be the result of not fully understanding what the right way to do something would have been. In the long run, once individual developers' workflows are more aligned, doing code reviews will become a lot easier and smoother as the excess noise of workflow differences diminishes, and reviews will feel much more productive to all participants. Debian fostering a culture of code reviews would allow us to slowly move from the current practice of mainly solo packaging work towards true collaboration forming around those code reviews. I have been promoting increased use of Merge Requests in Debian for some time already, for example by proposing DEP-18: Encourage Continuous Integration and Merge Request based Collaboration for Debian packages. If you are involved in Debian development, please give a thumbs up in dep-team/deps!21 if you want me to continue promoting it.

Can we trust open source software?

Yes, and I would argue that we can only trust open source software. There is no way to audit closed source software, and anyone using e.g. Windows or macOS just has to trust the vendor's word when they say they have no intentional or accidental backdoors in their software. Or, when the news gets out that the systems of a closed source vendor were compromised, like CrowdStrike some weeks ago, we can't audit anything, and time after time we simply need to take their word when they say they have properly cleaned up their code base. In theory, a vendor could give some kind of contractual or financial guarantee to its customers that there are no preventable security issues, but in practice that never happens. I am not aware of a single case where e.g. Microsoft or Oracle paid damages to their customers after a security flaw was found in their software. In theory you could also pay a vendor more to have them focus more effort on security, but since there is no way to verify what they did, or to get compensation when they didn't, any increased fees are likely just pocketed as increased profit. Open source is clearly better overall. You can, if you are an individual with the time and skills, audit every step in the supply-chain, or you could as an organization invest in open source security improvements and actually verify what changes were made and how security improved. If your organisation is using Debian (or derivatives such as Ubuntu) and you are interested in sponsoring my work to improve Debian, please reach out.

17 October 2025

Dirk Eddelbuettel: ML quacks: Combining duckdb and mlpack

A side project I have been working on a little since last winter, and which explores extending duckdb with mlpack, is now public at the duckdb-mlpack repo. duckdb is an excellent small (as in "runs as a self-contained binary") database engine with both a focus on analytical payloads (OLAP rather than OLTP) and an impressive number of already bolted-on extensions (for example for cloud data access), delivered as a single-build C++ executable (or of course as a library used from other front-ends). mlpack is an excellent C++ library containing many/most machine learning algorithms, also built in a self-contained manner (binary or library), making it possible to build compact yet powerful binaries, or to embed (as opposed to other ML frameworks accessed from powerful but not lightweight run-times such as Python or R). The compact build aspect as well as the common build tools (C++, cmake) make these two natural candidates for combining. Moreover, duckdb is a champion of data access, management and control, and the machine learning insights and predictions offered by mlpack are fully complementary and hence fit this rather well. duckdb also has a very robust and active extension system. To use it, one starts from a template repository and its "use this template" button, runs a script, and can then start experimenting. I have now grouped my initial start and test functions into a separate repository, duckdb-example-extension, to keep the duckdb-mlpack one focused on the extension to mlpack. duckdb-mlpack is right now an MVP, i.e. a minimally viable product (or demo). It just runs the adaboost classifier, but does so on any dataset fitting the rectangular setup with columns of features (real valued) and a final column (integer valued) of labels. I had hoped to use two select queries for the features and the labels, but it turns out a table function (returning a table of data from a query) can only run one select *. So the basic demo, also on the repo README, is now to run the following script (where SELECT * FROM mlpack_adaboost((SELECT * FROM D)); is the key invocation of the added functionality):
#!/bin/bash

cat <<EOF | build/release/duckdb
SET autoinstall_known_extensions=1;
SET autoload_known_extensions=1; # for httpfs

CREATE TEMP TABLE Xd AS SELECT * FROM read_csv("https://mlpack.org/datasets/iris.csv");
CREATE TEMP TABLE X AS SELECT row_number() OVER () AS id, * FROM Xd;
CREATE TEMP TABLE Yd AS SELECT * FROM read_csv("https://mlpack.org/datasets/iris_labels.csv");
CREATE TEMP TABLE Y AS SELECT row_number() OVER () AS id, CAST(column0 AS double) as label FROM Yd;
CREATE TEMP TABLE D AS SELECT * FROM X INNER JOIN Y ON X.id = Y.id;
ALTER TABLE D DROP id;
ALTER TABLE D DROP id_1;
CREATE TEMP TABLE A AS SELECT * FROM mlpack_adaboost((SELECT * FROM D));

SELECT COUNT(*) as n, predicted FROM A GROUP BY predicted;
EOF
to produce the following tabulation / group by:
./sampleCallRemote.sh
Misclassified: 1
┌───────┬───────────┐
│   n   │ predicted │
│ int64 │   int32   │
├───────┼───────────┤
│    50 │         0 │
│    49 │         1 │
│    51 │         2 │
└───────┴───────────┘
$
(Note that this requires the httpfs extension. When you build from a freshly created extension repository you may be ahead of the most recent release of duckdb by a few commits. It is easy to check out the most recent release tag (or the one matching your local duckdb binary) to take advantage of the extensions you likely already have for that version. So here, in the middle of October 2025, I picked v1.4.1 as I run duckdb version 1.4.1 on my box.) There are many other neat duckdb extensions. The core ones are regrouped here, while lists of community extensions are here and here. For this (still fairly minimal) extension, I added a few TODO items to the README.md. Please reach out if you are interested in working on any of this.
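As a sketch of that pinning step, assuming the extension template vendors duckdb as a git submodule named duckdb and uses the template's default make targets:
cd duckdb            # the vendored duckdb submodule inside the extension repo
git fetch --tags
git checkout v1.4.1  # match the locally installed duckdb version
cd ..
make release         # rebuild the extension against that duckdb version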

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

14 October 2025

Dirk Eddelbuettel: qlcal 0.0.17 on CRAN: Regular Update

The seventeenth release of the qlcal package arrived at CRAN today, once again following a QuantLib release as 1.40 came out this morning. qlcal delivers the calendaring parts of QuantLib. It is provided (for the R package) as a set of included files, so the package is self-contained and does not depend on an external QuantLib library (which can be demanding to build). qlcal covers over sixty country / market calendars and can compute holiday lists, their complement (i.e. business day lists) and much more. Examples are in the README at the repository, the package page, and of course at the CRAN package page. This release mainly synchronizes qlcal with the QuantLib release 1.40. Only one country calendar got updated; the diffstat looks larger as the URL part of the copyright got updated throughout. We also updated the URL for the GPL-2 badge: when CRAN checks this, they always hit a timeout as the FSF server possibly keeps track of incoming requests; we now link to the version from the R Licenses page to avoid this.

Changes in version 0.0.17 (2025-07-14)
  • Synchronized with QuantLib 1.40 released today
  • Calendar updates for Singapore
  • URL update in README.md

Courtesy of my CRANberries, there is a diffstat report for this release. See the project page and package documentation for more details, and more examples.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

Freexian Collaborators: Debian Contributions: Old Debian Printing software and C23, Work to decommission packages.qa.debian.org, rebootstrap uses *-for-host and more! (by Anupa Ann Joseph)

Debian Contributions: 2025-09

Contributing to Debian is part of Freexian's mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

Updating old Debian Printing software to meet C23 requirements, by Thorsten Alteholz

The work of Thorsten fell under the motto "gcc15". Due to the introduction of gcc15 in Debian, the default language version was changed to C23. This means, for example, that function declarations without parameters are no longer allowed. As old software, which was created with ANSI C (or C89) syntax, made use of such function declarations, it was a busy month. One could have used something like -std=c17 as a compile flag, but this would have just postponed the task. As a result Thorsten uploaded modernized versions of ink, nm2ppa and rlpr for the Debian printing team.

Work done to decommission packages.qa.debian.org, by Raphaël Hertzog

Raphaël worked to decommission the old package tracking system (packages.qa.debian.org). After figuring out that it was still receiving emails from the bug tracking system (bugs.debian.org), from multiple Debian lists, and from some release team tools, he reached out to the respective teams to either drop those emails or adjust them so that they are sent to the current Debian Package Tracker (tracker.debian.org).

rebootstrap uses *-for-host, by Helmut Grohne

Architecture cross bootstrapping is an ongoing effort that has shaped Debian in various ways over the years. A longer effort to express toolchain dependencies now bears fruit. When cross compiling, it becomes important to express in Build-Depends what architecture one is compiling for. As these packages have become available in trixie, more and more packages add this extra information, and in August the libtool package gained a gfortran-for-host dependency. It was the first package in the essential build closure to adopt this and required putting the pieces together in rebootstrap, which now has to build gcc-defaults early on. There still are hundreds of packages whose dependencies need to be updated though.
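As an illustration (a generic sketch, not libtool's actual control file), such a toolchain dependency is expressed in debian/control like this:
Source: example
Build-Depends: debhelper-compat (= 13), gfortran-for-host
The -for-host suffix resolves to the compiler targeting the host architecture of the package being built, which is what matters when cross compiling.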

Miscellaneous contributions
  • Rapha l dropped the Build Log Scan integration in tracker.debian.org since it was showing stale data for a while as the underlying service has been discontinued.
  • Emilio updated pixman to 0.46.4.
  • Emilio coordinated several transitions, and NMUed guestfs-tools to unblock one.
  • Stefano uploaded Python 3.14rc3 to Debian unstable. It s not yet used by any packages, but it allows testing the level of support in packages to begin.
  • Stefano upgraded almost all of the debian-social infrastructure to Debian trixie .
  • Stefano published the sponsorship brochures for DebConf 26.
  • Stefano attended the Debian Technical Committee meeting.
  • Stefano uploaded routine upstream updates for a handful of Python packages (pycparser, beautifulsoup4, platformdirs, pycparser, python-authlib, python-cffi, python-mitogen, python-resolvelib, python-super-collections, twine).
  • Stefano reviewed and responded to DebConf 25 feedback.
  • Stefano investigated and fixed a request visibility bug in debian-reimbursements (for admin-altered requests).
  • Lucas reviewed a couple of merge requests from external contributors for Go and Ruby packages.
  • Lucas updated some Ruby packages to their latest upstream versions (thin and passenger; puma is still WIP).
  • Lucas set up the build environment to run rebuilds of reverse dependencies of ruby using ruby3.4. As an alternative, he is looking for personal repositories provided by Debusine to perform this task more easily. This is the preparation for the transition to ruby3.4 as the default in Debian.
  • Lucas helped on the next round of the Outreachy internship program.
  • Helmut sent patches for 30 cross build failures and responded to cross building support questions on the mailing list.
  • Helmut continued to maintain rebootstrap. As gcc version 15 became the default, test jobs for version 14 had to be dropped. A fair number of patches were applied to packages and could be dropped.
  • Helmut resumed removing RC-buggy packages from unstable and sponsored a termrec upload to avoid its deletion. This work was paused to give packages some time to migrate to forky.
  • Santiago reviewed different merge requests created by different contributors. Those MRs include a new test to build reverse dependencies, created by Aquila Macedo as part of his GSoC internship; restoring how lintian was used in experimental, thanks to Otto Kekäläinen; and the fix by Christian Bayle to again support extra repositories in deb822-style sources, whose support was broken with the move to sbuild+unshare last month.
  • While doing some new upstream release updates, thanks to Debusine's reverse-dependency autopkgtest checks, Santiago discovered that paramiko 4.0 would introduce a regression in libcloud by dropping support for the obsolete DSA keys. Santiago finally uploaded to unstable both paramiko 4.0 and a regression fix for libcloud.
  • Santiago has taken part in different discussions and meetings for the preparation of DebConf 26. The DebConf 26 local team aims to prepare for the conference with enough time in advance.
  • Carles kept working on missing-package-relations, reporting missing Recommends. He improved the tooling to detect and report bugs, creating 269 bugs and following up on comments. 37 bugs have been resolved, others acknowledged. The missing Recommends are a mixture of packages that are gone from Debian, packages that changed name, typos, and also packages that were recommended but are not packaged in Debian.
  • Carles improved missing-package-relations to report broken Suggests only for packages that used to be in Debian but have since been removed. No bugs have been created yet for this case, but he identified 1320 of them.
  • Colin spent much of the month chasing down build/test regressions in various Python packages due to other upgrades, particularly relating to pydantic, python-pytest-asyncio, and rust-pyo3.
  • Colin optimized some code in ubuntu-dev-tools (affecting e.g. pull-debian-source) that made O(n) HTTP requests when it could instead make O(1).
  • Anupa published Micronews as part of Debian Publicity team work.

10 October 2025

Sergio Cipriano: Avoiding 5XX errors by adjusting Load Balancer Idle Timeout

Recently I faced a problem in production where a client was running a RabbitMQ server behind the Load Balancers we provisioned, and the TCP connections were closed every minute. My team is responsible for the LBaaS (Load Balancer as a Service) product, and this Load Balancer was an Envoy proxy provisioned by our control plane. The error was similar to this:
[2025-10-03 12:37:17,525 - pika.adapters.utils.connection_workflow - ERROR] AMQPConnector - reporting failure: AMQPConnectorSocketConnectError: timeout("TCP connection attempt timed out: ''/(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('<IP>', 5672))")
[2025-10-03 12:37:17,526 - pika.adapters.utils.connection_workflow - ERROR] AMQP connection workflow failed: AMQPConnectionWorkflowFailed: 1 exceptions in all; last exception - AMQPConnectorSocketConnectError: timeout("TCP connection attempt timed out: ''/(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('<IP>', 5672))"); first exception - None.
[2025-10-03 12:37:17,526 - pika.adapters.utils.connection_workflow - ERROR] AMQPConnectionWorkflow - reporting failure: AMQPConnectionWorkflowFailed: 1 exceptions in all; last exception - AMQPConnectorSocketConnectError: timeout("TCP connection attempt timed out: ''/(<AddressFamily.AF_INET: 2>, <SocketKind.SOCK_STREAM: 1>, 6, '', ('<IP>', 5672))"); first exception - None
At first glance, the issue is simple: the Load Balancer's idle timeout is shorter than the RabbitMQ heartbeat interval. The idle timeout is the time at which a downstream or upstream connection will be terminated if there are no active streams. Heartbeats generate periodic network traffic to prevent idle TCP connections from closing prematurely. Adjusting these timeout settings to align properly solved the issue. However, what I want to explore in this post are other similar scenarios where it's not so obvious that the idle timeout is the problem. Adding an extra network layer, such as an Envoy proxy, can introduce unpredictable behavior across your services, like intermittent 5XX errors. To make this issue more concrete, let's look at a minimal, reproducible setup that demonstrates how adding an Envoy proxy can lead to sporadic errors.

Reproducible setup

I'll be using the following tools: Envoy running in a Docker container, a small Go HTTP server as the backend, and oha to generate load. This setup is based on what Kai Burjack presented in his article. Setting up Envoy with Docker is straightforward:
$ docker run \
    --name envoy --rm \
    --network host \
    -v $(pwd)/envoy.yaml:/etc/envoy/envoy.yaml \
    envoyproxy/envoy:v1.33-latest
I'll be running experiments with two different envoy.yaml configurations: one that uses Envoy's TCP proxy, and another that uses Envoy's HTTP connection manager. Here's the simplest Envoy TCP proxy setup: a listener on port 8000 forwarding traffic to a backend running on port 8080.
static_resources:
  listeners:
  - name: go_server_listener
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 8000
    filter_chains:
    - filters:
      - name: envoy.filters.network.tcp_proxy
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
          stat_prefix: go_server_tcp
          cluster: go_server_cluster
  clusters:
  - name: go_server_cluster
    connect_timeout: 1s
    type: static
    load_assignment:
      cluster_name: go_server_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 127.0.0.1
                port_value: 8080
The default idle timeout if not otherwise specified is 1 hour, which is the case here. The backend setup is simple as well:
package main

import (
    "fmt"
    "net/http"
    "time"
)

func helloHandler(w http.ResponseWriter, r *http.Request) {
    w.Write([]byte("Hello from Go!"))
}

func main() {
    http.HandleFunc("/", helloHandler)

    server := http.Server{
        Addr:        ":8080",
        IdleTimeout: 3 * time.Second,
    }

    fmt.Println("Starting server on :8080")
    panic(server.ListenAndServe())
}
The IdleTimeout is set to 3 seconds to make it easier to test. Now, oha is the perfect tool to generate the HTTP requests for this test. The load test is not meant to stress this setup; the idea is to wait long enough so that some connections are closed. The burst-delay feature will help with that:
$ oha -z 30s -w --burst-delay 3s --burst-rate 100 http://localhost:8000
I'm running the load test for 30 seconds, sending 100 requests at three-second intervals. I also use the -w option to wait for ongoing requests when the duration is reached. The result looks like this:

oha test report tcp fail

We had 886 responses with status code 200 and 64 connections closed: the backend terminated 64 connections while the load balancer still had active requests directed to it. Let's change the Load Balancer idle_timeout to two seconds.
filter_chains:
- filters:
  - name: envoy.filters.network.tcp_proxy
    typed_config:
      "@type": type.googleapis.com/envoy.extensions.filters.network.tcp_proxy.v3.TcpProxy
      stat_prefix: go_server_tcp
      cluster: go_server_cluster
      idle_timeout: 2s # <--- NEW LINE
Run the same test again.

oha test report tcp success

Great! Now all the requests worked. This is a common issue, not specific to Envoy Proxy or the setup shown earlier. Major cloud providers have all documented it. The AWS troubleshooting guide for Application Load Balancers says this: "The target closed the connection with a TCP RST or a TCP FIN while the load balancer had an outstanding request to the target. Check whether the keep-alive duration of the target is shorter than the idle timeout value of the load balancer." The Google troubleshooting guide for Application Load Balancers mentions this as well: "Verify that the keepalive configuration parameter for the HTTP server software running on the backend instance is not less than the keepalive timeout of the load balancer, whose value is fixed at 10 minutes (600 seconds) and is not configurable. The load balancer generates an HTTP 5XX response code when the connection to the backend has unexpectedly closed while sending the HTTP request or before the complete HTTP response has been received. This can happen because the keepalive configuration parameter for the web server software running on the backend instance is less than the fixed keepalive timeout of the load balancer. Ensure that the keepalive timeout configuration for HTTP server software on each backend is set to slightly greater than 10 minutes (the recommended value is 620 seconds)." The RabbitMQ docs also warn about this: "Certain networking tools (HAproxy, AWS ELB) and equipment (hardware load balancers) may terminate 'idle' TCP connections when there is no activity on them for a certain period of time. Most of the time it is not desirable." Most of these are talking about Application Load Balancers, while my test used a Network Load Balancer. For the sake of completeness, I will do the same test using Envoy's HTTP connection manager. The updated envoy.yaml:
static_resources:
  listeners:
  - name: listener
    address:
      socket_address:
        address: 0.0.0.0
        port_value: 8000
    filter_chains:
    - filters:
      - name: envoy.filters.network.http_connection_manager
        typed_config:
          "@type": type.googleapis.com/envoy.extensions.filters.network.http_connection_manager.v3.HttpConnectionManager
          stat_prefix: go_server_http
          access_log:
          - name: envoy.access_loggers.stdout
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.access_loggers.stream.v3.StdoutAccessLog
          http_filters:
          - name: envoy.filters.http.router
            typed_config:
              "@type": type.googleapis.com/envoy.extensions.filters.http.router.v3.Router
          route_config:
            name: http_route
            virtual_hosts:
            - name: local_service
              domains: ["*"]
              routes:
              - match:
                  prefix: "/"
                route:
                  cluster: go_server_cluster
  clusters:
  - name: go_server_cluster
    type: STATIC
    load_assignment:
      cluster_name: go_server_cluster
      endpoints:
      - lb_endpoints:
        - endpoint:
            address:
              socket_address:
                address: 0.0.0.0
                port_value: 8080
The YAML above is an example of a service proxying HTTP from 0.0.0.0:8000 to 0.0.0.0:8080. The only difference from a minimal configuration is that I enabled access logs. Let's run the same test with oha. oha test report http fail Even though the success rate is 100%, the status code distribution shows some responses with status code 503. In this case it is not as obvious that the problem is related to the idle timeout. However, it becomes clear when we look at the Envoy access logs:
[2025-10-10T13:32:26.617Z] "GET / HTTP/1.1" 503 UC 0 95 0 - "-" "oha/1.10.0" "9b1cb963-449b-41d7-b614-f851ced92c3b" "localhost:8000" "0.0.0.0:8080"
UC is the short name for UpstreamConnectionTermination. This means the upstream, which is the Go server, terminated the connection. To fix this, once again the load balancer idle timeout needs to change:
  clusters:
  - name: go_server_cluster
    type: STATIC
    typed_extension_protocol_options: # <--- NEW BLOCK
      envoy.extensions.upstreams.http.v3.HttpProtocolOptions:
        "@type": type.googleapis.com/envoy.extensions.upstreams.http.v3.HttpProtocolOptions
        common_http_protocol_options:
          idle_timeout: 2s # <--- NEW VALUE
        explicit_http_config:
          http_protocol_options: {}
Finally, the sporadic 503 errors are gone: oha test report http success

To Sum Up Here's an example of the values my team recommends to our clients: recap drawing Key Takeaways:
  1. The Load Balancer idle timeout should be less than the backend (upstream) idle/keepalive timeout.
  2. When working with long-lived connections, the client (downstream) should use a keepalive smaller than the LB idle timeout; a minimal client sketch follows below.
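For the second point, here is a minimal, hypothetical Go client sketch (not part of the setup above) that keeps idle connections for less time than the load balancer's 2-second idle timeout, so the client closes idle connections before the proxy can:
package main

import (
    "fmt"
    "io"
    "net/http"
    "time"
)

func main() {
    // Hypothetical values: the LB idle timeout in this post is 2s,
    // so the client drops idle connections slightly earlier (1s).
    transport := &http.Transport{
        IdleConnTimeout:     1 * time.Second, // < LB idle_timeout (2s)
        MaxIdleConnsPerHost: 10,
    }
    client := &http.Client{Transport: transport, Timeout: 5 * time.Second}

    resp, err := client.Get("http://localhost:8000/")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()
    body, _ := io.ReadAll(resp.Body)
    fmt.Println(resp.StatusCode, string(body))
}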

2 October 2025

Dirk Eddelbuettel: tinythemes 0.0.4 at CRAN: Micro Maintenance

tinythemes demo A tiniest of tiny violins micro-maintenance release 0.0.4 of our tinythemes package arrived on CRAN today. tinythemes provides the theme_ipsum_rc() function from hrbrthemes by Bob Rudis in a zero (added) dependency way. A simple example (also available as a demo inside the package) contrasts the default style (on the left) with the one added by this package (on the right). This version adjusts to the fact that hrbrthemes is no longer on CRAN, so the help page can no longer link to its documentation. No other changes were made. The set of changes since the last release follows.

Changes in tinythemes version 0.0.4 (2025-10-02)
  • No longer \link to now-archived hrbrthemes

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the repo where comments and suggestions are welcome.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

1 October 2025

Dirk Eddelbuettel: #052: Running r-ci with Coverage

Welcome to post 52 in the R4 series. Following up on post #51 yesterday and its stated intent of posting some more here: the r-ci setup (which was introduced in post #32 and updated in post #45) offers portable continuous integration which can take advantage of different backends: GitHub Actions, Azure Pipelines, GitLab, Bitbucket, and possibly others, as it only requires a basic Ubuntu shell after which it customizes itself and runs via shell script. Portably. Now, most of us, I suspect, still use it with GitHub Actions, but it is reassuring to know that one can take it elsewhere should the need or desire arise. One thing many repos did, and which stopped working reliably, is coverage analysis. This is made easy by the covr package, and made fast, easy, reliable (as we quip) thanks to r2u. But transmission to codecov stopped working a while back, so I had mostly commented it out in my repos, rendering the reports more and more stale. Which is not ideal. A few weeks ago I gave this another look, and it turns out that codecov now requires an API token to upload. One can generate it in the user -> settings menu on their website under the access tab. It then needs to be stored in each repo wanting to upload. For GitHub, this is under settings -> secrets and variables -> actions as a repository secret. I suggest using CODECOV_TOKEN for its name. After that, the three-line block in the yaml file can reference it as a secret as in the following snippet, now five lines, taken from one of my ci.yaml files:
      - name: Coverage
        if: ${{ matrix.os == 'ubuntu-latest' }}
        run: ./run.sh coverage
        env:
          CODECOV_TOKEN: ${{ secrets.CODECOV_TOKEN }}
It takes the secret we stored on the website, references it via the secrets. prefix, and assigns it to the environment variable CODECOV_TOKEN. After this, reports flow again, as one can see on repositories where I re-enabled this, for example here for RcppArmadillo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

30 September 2025

Dirk Eddelbuettel: #051: A Neat Little Rcpp Trick

Welcome to post 51 in the R4 series. A while back I realized I should really just post a little more, as not all posts have to be as deep and introspective as, for example, the recent-ish "two cultures" post #49. So this post is a neat little trick I (somewhat belatedly) realized somewhat recently. The context is the ongoing transition from (Rcpp)Armadillo 14.6.3 and earlier to (Rcpp)Armadillo 15.0.2 or later. (I need to write a bit more about that, but that may require a bit more time.) (And there are a total of seven (!!) issue tickets managing the transition with issue #475 being the main parent issue, please see there for more details.) In brief, the newer and current Armadillo no longer allows C++11 (which also means it no longer allows suppression of deprecation warnings). It so happens that around a decade ago packages were actively encouraged to move towards C++11, so many either set an explicit SystemRequirements: for it, or set CXX_STD=CXX11 in src/Makevars{.win}. CRAN has for some time now issued NOTEs asking for this to be removed, and more recently enforced this with actual deadlines. In RcppArmadillo I opted to accommodate old(er) packages (using this by-now anti-pattern) and flip to Armadillo 14.6.3 during a transition period. That is what the package does now: it gives you either Armadillo 14.6.3 in case C++11 was detected (or this legacy version was actively selected via a compile-time #define), or it uses Armadillo 15.0.2 or later. So this means we can have either one of two versions, and may want to know which one we have. Armadillo carries its own version macros, as many libraries or projects do (R of course included). Many many years ago (git blame points to sixteen and twelve for a revision) we added the following helper function to the package (full source here; we show it here without the full roxygen2 comment header):
// [[Rcpp::export]]
Rcpp::IntegerVector armadillo_version(bool single) {

    // These are declared as constexpr in Armadillo which actually does not define them
    // They are also defined as macros in arma_version.hpp so we just use that
    const unsigned int major = ARMA_VERSION_MAJOR;
    const unsigned int minor = ARMA_VERSION_MINOR;
    const unsigned int patch = ARMA_VERSION_PATCH;

    if (single) {
        return Rcpp::wrap(10000 * major + 100 * minor + patch);
    } else {
        return Rcpp::IntegerVector::create(Rcpp::Named("major") = major,
                                           Rcpp::Named("minor") = minor,
                                           Rcpp::Named("patch") = patch);
    }
}
It either returns a (named) vector of the standard 'major', 'minor', 'patch' form of the common package versioning pattern, or a single integer which can be used more easily in C(++) via preprocessor macros. And this being an Rcpp-using package, we can of course access either easily from R:
> library(RcppArmadillo)
> armadillo_version(FALSE)
major minor patch 
   15     0     2 
> armadillo_version(TRUE)
[1] 150002
>
Perfectly valid and truthful. But cumbersome at the R level. So when preparing for these (Rcpp)Armadillo changes in one of my packages, I realized I could alter such a function and set the S3 class to package_version. (Full version of one such variant here.)
// [[Rcpp::export]]
Rcpp::List armadilloVersion() {
    // create a vector of major, minor, patch
    auto ver = Rcpp::IntegerVector::create(ARMA_VERSION_MAJOR, ARMA_VERSION_MINOR, ARMA_VERSION_PATCH);
    // and place it in a list (as e.g. packageVersion() in R returns)
    auto lst = Rcpp::List::create(ver);
    // and class it as 'package_version' accessing print() etc methods
    lst.attr("class") = Rcpp::CharacterVector::create("package_version", "numeric_version");
    return lst;
}
Three statements to create, class, and return the value. And now in R we can operate more easily on this (using three colons as I didn't export it from this package):
> RcppDE:::armadilloVersion()
[1] '15.0.2'
> RcppDE:::armadilloVersion() >= "15.0.0"
[1] TRUE
>
An object of class package_version inheriting from numeric_version can be compared directly against a (human- but not normally machine-readable) string like "15.0.0" because the simple S3 class defines appropriate operators, as well as print() / format() methods, as the first expression shows. It is these little things that make working with R so smooth, and we can easily (three statements!!) do so from Rcpp-based packages too. The underlying object really is merely a list containing a vector:
> unclass(RcppDE:::armadilloVersion())
[[1]]
[1] 15  0  2

> 
but the S3 glue around it makes it behave nicely. So next time you are working with an object you plan to return to R, consider classing it to take advantage of existing infrastructure (if it exists, of course). It's easy enough to do, and may smooth the experience on the R side.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

29 September 2025

Russ Allbery: Review: The Incandescent

Review: The Incandescent, by Emily Tesh
Publisher: Tor
Copyright: 2025
ISBN: 1-250-83502-X
Format: Kindle
Pages: 417
The Incandescent is a stand-alone magical boarding school fantasy.
Your students forgot you. It was natural for them to forget you. You were a brief cameo in their lives, a walk-on character from the prologue. For every sentimental my teacher changed my life story you heard, there were dozens of my teacher made me moderately bored a few times a week and then I got through the year and moved on with my life and never thought about them again. They forgot you. But you did not forget them.
Doctor Saffy Walden is Director of Magic at Chetwood, an elite boarding school for prospective British magicians. She has a collection of impressive degrees in academic magic, a specialization in demonic invocation, and a history of vague but lucrative government job offers that go with that specialty. She turned them down to be a teacher, and although she's now in a mostly administrative position, she's a good teacher, with the usual crop of promising, lazy, irritating, and nervous students. As the story opens, Walden's primary problem is Nikki Conway. Or, rather, Walden's primary problem is protecting Nikki Conway from the Marshals, and the infuriating Laura Kenning in particular. When Nikki was seven, she summoned a demon who killed her entire family and left her a ward of the school. To Laura Kenning, that makes her a risk who should ideally be kept far away from invocation. To Walden, that makes Nikki a prodigious natural talent who is developing into a brilliant student and who needs careful, professional training before she's tempted into trying to learn on her own. Most novels with this setup would become Nikki's story. This one does not. The Incandescent is Walden's story. There have been a lot of young-adult magical boarding school novels since Harry Potter became a mass phenomenon, but most of them focus on the students and the inevitable coming-of-age story. This is a story about the teachers: the paperwork, the faculty meetings, the funding challenges, the students who repeat in endless variations, and the frustrations and joys of attempting to grab the interest of a young mind. It's also about the temptation of higher-paying, higher-status, and less ethical work, which however firmly dismissed still nibbles around the edges. Even if you didn't know Emily Tesh is herself a teacher, you would guess that before you get far into this novel. There is a vividness and a depth of characterization that comes from being deeply immersed in the nuance and tedium of the life that your characters are living. Walden's exasperated fondness for her students was the emotional backbone of this book for me. She likes teenagers without idealizing the process of being a teenager, which I think is harder to pull off in a novel than it sounds.
It was hard to quantify the difference between a merely very intelligent student and a brilliant one. It didn't show up in a list of exam results. Sometimes, in fact, brilliance could be a disadvantage when all you needed to do was neatly jump the hoop of an examiner's grading rubric without ever asking why. It was the teachers who knew, the teachers who could feel the difference. A few times in your career, you would have the privilege of teaching someone truly remarkable; someone who was hard work to teach because they made you work harder, who asked you questions that had never occurred to you before, who stretched you to the very edge of your own abilities. If you were lucky as Walden, this time, had been lucky your remarkable student's chief interest was in your discipline: and then you could have the extraordinary, humbling experience of teaching a child whom you knew would one day totally surpass you.
I also loved the world-building, and I say this as someone who is generally not a fan of demons. The demons themselves are a bit of a disappointment and mostly hew to one of the stock demon conventions, but the rest of the magic system is deep enough to have practitioners who approach it from different angles and meaty enough to have some satisfying layered complexity. This is magic, not magical science, so don't expect a fully fleshed-out set of laws, but the magical system felt substantial and satisfying to me. Tesh's first novel, Some Desperate Glory, was by far my favorite science fiction novel of 2023. This is a much different book, which says good things about Tesh's range and the potential of her work yet to come: adult rather than YA, fantasy rather than science fiction, restrained and subtle in places where Some Desperate Glory was forceful and pointed. One thing the books do have in common, though, is some structure, particularly the false climax near the midpoint of the book. I like the feeling of uncertainty and possibility that gives both books, but in the case of The Incandescent, I was not quite in the mood for the second half of the story. My problem with this book is more of a reader preference than an objective critique: I was in the mood for a story about a confident, capable protagonist who was being underestimated, and Tesh was writing a novel with a more complicated and fraught emotional arc. (I'm being intentionally vague to avoid spoilers.) There's nothing wrong with the story that Tesh wanted to tell, and I admire the skill with which she did it, but I got a tight feeling in my stomach when I realized where she was going. There is a satisfying ending, and I'm still very happy I read this book, but be warned that this might not be the novel to read if you're in the mood for a purer competence porn experience. Recommended, and I am once again eagerly awaiting the next thing Emily Tesh writes (and reminding myself to go back and read her novellas). Content warnings: Grievous physical harm, mind control, and some body horror. Rating: 8 out of 10

25 September 2025

Dirk Eddelbuettel: tint 0.1.6 on CRAN: Maintenance

A new version 0.1.6 of the tint package arrived at CRAN today. tint provides a style not unlike Tufte's for use in html and pdf documents created from markdown. The github repo shows several examples in its README, with more, as usual, in the package documentation. This release addresses a small issue where, in pdf mode, pandoc (3.2.1 or newer) needs a particular macro defined when static (premade) images or figure files are included. No other changes.

Changes in tint version 0.1.6 (2025-09-25)
  • An additional LaTeX command needed by pandoc (>= 3.2.1) has been defined for the two pdf variants.

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More information is on the tint page. For questions or comments use the issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

kpcyrd: Release: rebuilderd v0.25.0

rebuilderd v0.25.0 was recently released, this version has improved in-toto support for cryptographic attestations that this blog post briefly outlines. As a quick recap, rebuilderd is an automatic build scheduler that emerged in 2019/2020 from the Reproducible Builds project doing the following:
  1. Track binary packages available in a Linux distribution
  2. Attempt to compile the official binary packages from their (alleged) source code
  3. Check if the package we compiled is bit-for-bit identical
    1. If so, mark it GOOD, issue an attestation
    2. In every other case, mark it BAD, generate a diffoscope
The binary packages in question are explicitly the packages users would also fetch and install. This project has caught the attention of Arch Linux, Debian and Fedora.

Before this version The original in-toto integration was added 4 years ago by Joy Liu during GSoC 2021, with help from Santiago Torres and Aditya Sirish (shoutout to the real ones!). Each rebuilderd-worker had its own cryptographic key and included a signed attestation along with the build result, which could then be fetched from /api/v0/builds/<id>/attestation. Since these workers are potentially ephemeral, and the list of worker public keys wasn't publicly known, it was difficult to make use of those signatures.

Since this version This version introduces the following:
  1. The rebuilderd daemon itself generates a long-term signing key
  2. All attestations signed by a trusted worker also get signed by the rebuilderd daemon
  3. The rebuilderd daemon gets a new endpoint that can be used to query the public-key this instance identifies with: /api/v0/public-keys
An example of this new endpoint can be found here: https://reproducible.archlinux.org/api/v0/public-keys The response looks something like this (this is the real long-term signing key used by reproducible.archlinux.org):
{
    "current": [
        "-----BEGIN PUBLIC KEY-----\r\nMCwwBwYDK2VwBQADIQBLNcEcgErQ1rZz9oIkUnzc3fPuqJEALr22rNbrBK7iqQ==\r\n-----END PUBLIC KEY-----\r\n"
    ]
}
It's a list so keys can potentially be rolled over time, and in future versions it should also list the public keys the instance has used in the past. I haven't developed any integrations for this yet (partially also to allow deployments to catch up with the new release), but I'm planning to do so using the in-toto crate.
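As an illustration only (not an official rebuilderd integration), fetching and parsing this endpoint takes only a few lines of Go, assuming the response shape shown above:
package main

import (
    "encoding/json"
    "fmt"
    "net/http"
)

// publicKeys mirrors the response shown above: a list of current PEM-encoded keys.
type publicKeys struct {
    Current []string `json:"current"`
}

func main() {
    resp, err := http.Get("https://reproducible.archlinux.org/api/v0/public-keys")
    if err != nil {
        panic(err)
    }
    defer resp.Body.Close()

    var keys publicKeys
    if err := json.NewDecoder(resp.Body).Decode(&keys); err != nil {
        panic(err)
    }
    for _, pem := range keys.Current {
        fmt.Println(pem)
    }
}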

Closing words To give credit where credit is due (and because people pointed out I tend to end my blog posts too abruptly), rebuilderd is only the scheduler software; the actual build in the correct build environment is outsourced to external tooling like archlinux-repro and debrebuild. For further reading on applied reproducible builds, see also my previous blogpost Disagreeing rebuilders and what that means. Also, there are currently efforts by the European Commission to outlaw unregulated end-to-end encrypted chat, so this may be a good time to prepare for (potential) impact and check what tools you have available to reduce unchecked trust in (open source) software authorities, to keep them operating honestly and accountably. Never lose the plot~ Sincerely yours

23 September 2025

Ravi Dwivedi: Singapore Trip

In December 2024, I went on a trip through four countries - Singapore, Malaysia, Brunei, and Vietnam - with my friend Badri. This post covers our experiences in Singapore. I took an IndiGo flight from Delhi to Singapore, with a layover in Chennai. At the Chennai airport, I was joined by Badri. We had an early morning flight from Chennai that would land in Singapore in the afternoon. Within 48 hours of our scheduled arrival in Singapore, we submitted an arrival card online. At immigration, we simply needed to scan our passports at the gates, which opened automatically to let us through, and then give our address to an official nearby. The process was quick and smooth, but it unfortunately meant that we didn t get our passports stamped by Singapore. Before I left the airport, I wanted to visit the nature-themed park with a fountain I saw in pictures online. It is called Jewel Changi, and it took quite some walking to get there. After reaching the park, we saw a fountain that could be seen from all the levels. We roamed around for a couple of hours, then proceeded to the airport metro station to get to our hotel.
Jewel Changi A shot of Jewel Changi. Photo by Ravi Dwivedi. Released under the CC-BY-SA 4.0.
There were four ATMs on the way to the metro station, but none of them provided us with any cash. This was the first country (outside India, of course!) where my card didn t work at ATMs. To use the metro, one can tap the EZ-Link card or bank cards at the AFC gates to get in. You cannot buy tickets using cash. Before boarding the metro, I used my credit card to get Badri an EZ-Link card from a vending machine. It was 10 Singapore dollars ( 630) - 5 for the card, and 5 for the balance. I had planned to use my Visa credit card to pay for my own fare. I was relieved to see that my card worked, and I passed through the AFC gates. We had booked our stay at a hostel named Campbell s Inn, which was the cheapest we could find in Singapore. It was 1500 per night for dorm beds. The hostel was located in Little India. While Little India has an eponymous metro station, the one closest to our hostel was Rochor. On the way to the hostel, we found out that our booking had been canceled. We had booked from the Hostelworld website, opting to pay the deposit in advance and to pay the balance amount in person upon reaching. However, Hostelworld still tried to charge Badri s card again before our arrival. When the unauthorized charge failed, they sent an automatic message saying we tried to charge and to contact them soon to avoid cancellation, which we couldn t do as we were in the plane. Despite this, we went to the hostel to check the status of our booking. The trip from the airport to Rochor required a couple of transfers. It was 2 Singapore dollars (approx. 130) and took approximately an hour. Upon reaching the hostel, we were informed that our booking had indeed been canceled, and were not given any reason for the cancelation. Furthermore, no beds were available at the hostel for us to book on the spot. We decided to roam around and look for accommodation at other hostels in the area. Soon, we found a hostel by the name of Snooze Inn, which had two beds available. It was 36 Singapore dollars per person (around 2300) for a dormitory bed. Snooze Inn advertised supporting RuPay cards and UPI. Some other places in that area did the same. We paid using my card. We checked in and slept for a couple of hours after taking a shower. By the time we woke up, it was dark. We met Praveen s friend Sabeel to get my FLX1 phone. We also went to Mustafa Center nearby to exchange Indian rupees for Singapore dollars. Mustafa Center also had a shopping center with shops selling electronic items and souvenirs, among other things. When we were dropping off Sabeel at a bus stop, we discovered that the bus stops in Singapore had a digital board mentioning the bus routes for the stop and the number of minutes each bus was going to take. In addition to an organized bus system, Singapore had good pedestrian infrastructure. There were traffic lights and zebra crossings for pedestrians to cross the roads. Unlike in Indian cities, rules were being followed. Cars would stop for pedestrians at unmanaged zebra crossings; pedestrians would in turn wait for their crossing signal to turn green before attempting to walk across. Therefore, walking in Singapore was easy. Traffic rules were taken so seriously in Singapore I (as a pedestrian) was afraid of unintentionally breaking them, which could get me in trouble, as breaking rules is dealt with heavy fines in the country. For example, crossing roads without using a marked crossing (while being within 50 meters of it) - also known as jaywalking - is an offence in Singapore. 
Moreover, the streets were litter-free, and cleanliness seemed like an obsession. After exploring Mustafa Center, we went to a nearby 7-Eleven to top up Badri s EZ-Link card. He gave 20 Singapore dollars for the recharge, which credited the card by 19.40 Singapore dollars (0.6 dollars being the recharge fee). When I was planning this trip, I discovered that the World Chess Championship match was being held in Singapore. I seized the opportunity and bought a ticket in advance. The next day - the 5th of December - I went to watch the 9th game between Gukesh Dommaraju of India and Ding Liren of China. The venue was a hotel on Sentosa Island, and the ticket was 70 Singapore dollars, which was around 4000 at the time. We checked out from our hostel in the morning, as we were planning to stay with Badri s aunt that night. We had breakfast at a place in Little India. Then we took a couple of buses, followed by a walk to Sentosa Island. Paying the fare for the buses was similar to the metro - I tapped my credit card in the bus, while Badri tapped his EZ-Link card. We also had to tap it while getting off. If you are tapping your credit card to use public transport in Singapore, keep in mind that the total amount of all the trips taken on a day is deducted at the end. This makes it hard to determine the cost of individual trips. For example, I could take a bus and get off after tapping my card, but I would have no way to determine how much this journey cost. When you tap in, the maximum fare amount gets deducted. When you tap out, the balance amount gets refunded (if it s a shorter journey than the maximum fare one). So, there is incentive for passengers not to get off without tapping out. Going by your card statement, it looks like all that happens virtually, and only one statement comes in at the end. Maybe this combining only happens for international cards. We got off the bus a kilometer away from Sentosa Island and walked the rest of the way. We went on the Sentosa Boardwalk, which is itself a tourist attraction. I was using Organic Maps to navigate to the hotel Resorts World Sentosa, but Organic Maps route led us through an amusement park. I tried asking the locals (people working in shops) for directions, but it was a Chinese-speaking region, and they didn t understand English. Fortunately, we managed to find a local who helped us with the directions.
Sentosa Boardwalk A shot of Sentosa Boardwalk. Photo by Ravi Dwivedi. Released under the CC-BY-SA 4.0.
Following the directions, we somehow ended up having to walk on a road which did not have pedestrian paths. Singapore is a country with strict laws, so we did not want to walk on that road. Avoiding that road led us to the Michael Hotel. There was a person standing at the entrance, and I asked him for directions to Resorts World Sentosa. The person told me that the bus (which was standing at the entrance) would drop me there! The bus was a free service for getting to Resorts World Sentosa. Here I parted ways with Badri, who went to his aunt s place. I got to the Resorts Sentosa and showed my ticket to get in. There were two zones inside - the first was a room with a glass wall separating the audience and the players. This was the room to watch the game physically, and resembled a zoo or an aquarium. :) The room was also a silent room, which means talking or making noise was prohibited. Audiences were only allowed to have mobile phones for the first 30 minutes of the game - since I arrived late, I could not bring my phone inside that room. The other zone was outside this room. It had a big TV on which the game was being broadcast along with commentary by David Howell and Jovanka Houska - the official FIDE commentators for the event. If you don t already know, FIDE is the authoritative international chess body. I spent most of the time outside that silent room, giving me an opportunity to socialize. A lot of people were from Singapore. I saw there were many Indians there as well. Moreover, I had a good time with Vasudevan, a journalist from Tamil Nadu who was covering the match. He also asked questions to Gukesh during the post-match conference. His questions were in Tamil to lift Gukesh s spirits, as Gukesh is a Tamil speaker. Tea and coffee were free for the audience. I also bought a T-shirt from their stall as a souvenir. After the game, I took a shuttle bus from Resorts World Sentosa to a metro station, then travelled to Pasir Ris by metro, where Badri was staying with his aunt. I thought of getting something to eat, but could not find any caf s or restaurants while I was walking from the Pasir Ris metro station to my destination, and was positively starving when I got there. Badri s aunt s place was an apartment in a gated community. On the gate was a security guard who asked me the address of the apartment. Upon entering, there were many buildings. To enter the building, you need to dial the number of the apartment you want to go to and speak to them. I had seen that in the TV show Seinfeld, where Jerry s friends used to dial Jerry to get into his building. I was afraid they might not have anything to eat because I told them I was planning to get something on the way. This was fortunately not the case, and I was relieved to not have to sleep with an empty stomach. Badri s uncle gave us an idea of how safe Singapore is. He said that even if you forget your laptop in a public space, you can go back the next day to find it right there in the same spot. I also learned that owning cars was discouraged in Singapore - the government imposes a high registration fee on them, while also making public transport easy to use and affordable. I also found out that 7-Eleven was not that popular among residents in Singapore, unlike in Malaysia or Thailand. The next day was our third and final day in Singapore. We had a bus in the evening to Johor Bahru in Malaysia. We got up early, had breakfast, and checked out from Badri s aunt s home. 
A store by the name of Cat Socrates was our first stop for the day, as Badri wanted to buy some stationery. The plan was to take the metro, followed by the bus. So we got to Pasir Ris metro station. Next to the metro station was a mall. In the mall, Badri found an ATM where our cards worked, and we got some Singapore dollars. It was noon when we reached the stationery shop mentioned above. We had to walk a kilometer from the place where the bus dropped us. It was a hot, sunny day in Singapore, so walking was not comfortable. We had to go through residential areas in Singapore. We saw some non-touristy parts of Singapore. After we were done with the stationery shop, we went to a hawker center to get lunch. Hawker centers are unique to Singapore. They have a lot of shops that sell local food at cheap prices. It is similar to a food court. However, unlike the food courts in malls, hawker centers are open-air and can get quite hot.
Jewel Changi This is the hawker center we went to. Photo by Ravi Dwivedi. Released under the CC-BY-SA 4.0.
To have something, you just need to buy it from one of the shops and find a table. After you are done, you need to put your tray in the tray-collecting spots. I had a kaya toast with chai, since there weren t many vegetarian options. I also bought a persimmon from a nearby fruit vendor. On the other hand, Badri sampled some local non-vegetarian dishes.
A sign saying, 'No table littering, by law.' Table littering at the hawker center was prohibited by law. Photo by Ravi Dwivedi. Released under the CC-BY-SA 4.0.
Next, we took a metro to Raffles Place, as we wanted to visit Merlion, the icon of Singapore. It is a statue having the head of a lion and the body of a fish. While getting through the AFC gates, my card was declined. Therefore, I had to buy an EZ-Link card, which I had been avoiding because the card itself costs 5 Singapore dollars. From the Raffles Place metro station, we walked to Merlion. The place also gave a nice view of Marina Bay Sands. It was filled with tourists clicking pictures, and we also did the same.
Merlion from behind Merlion from behind, giving a good view of Marina Bay Sands. Photo by Ravi Dwivedi. Released under the CC-BY-SA 4.0.
After this, we went to the bus stop to catch our bus to the border city of Johor Bahru, Malaysia. The bus was more than an hour late, and we worried that we had missed the bus. I asked an Indian woman at the stop who also planned to take the same bus, and she told us that the bus was late. Finally, our bus arrived, and we set off for Johor Bahru. Before I finish, let me give you an idea of my expenditure. Singapore is an expensive country, and I realized that expenses could go up pretty quickly. Overall, my stay in Singapore for 3 days and 2 nights was approx. 5500 rupees. That too, when we stayed one night at Badri's aunt's place (so we didn't have to pay for accommodation for one of the nights) and didn't have to pay for a couple of meals. This amount doesn't include the ticket for the chess game, but includes the costs of getting there. If you are in Singapore, it is likely you will pay a visit to Sentosa Island anyway. Stay tuned for our experiences in Malaysia! Credits: Thanks to Dione, Sahil, Badri and Contrapunctus for reviewing the draft. Thanks to Bhe for spotting a duplicate sentence.

22 September 2025

Vincent Bernat: Akvorado release 2.0

Akvorado 2.0 was released today! Akvorado collects network flows with IPFIX and sFlow. It enriches flows and stores them in a ClickHouse database. Users can browse the data through a web console. This release introduces an important architectural change and other smaller improvements. Let's dive in!
$ git diff --shortstat v1.11.5
 493 files changed, 25015 insertions(+), 21135 deletions(-)

New outlet service The major change in Akvorado 2.0 is splitting the inlet service into two parts: the inlet and the outlet. Previously, the inlet handled all flow processing: receiving, decoding, and enrichment. Flows were then sent to Kafka for storage in ClickHouse:
Akvorado flow processing before the change: flows are received and processed by the inlet, sent to Kafka and stored in ClickHouse
Akvorado flow processing before the introduction of the outlet service
Network flows reach the inlet service using UDP, an unreliable protocol. The inlet must process them fast enough to avoid losing packets. To handle a high number of flows, the inlet spawns several sets of workers to receive flows, fetch metadata, and assemble enriched flows for Kafka. Many configuration options existed for scaling, which increased complexity for users. The code needed to avoid blocking at any cost, making the processing pipeline complex and sometimes unreliable, particularly the BMP receiver.1 Adding new features became difficult without making the problem worse.2 In Akvorado 2.0, the inlet receives flows and pushes them to Kafka without decoding them. The new outlet service handles the remaining tasks:
Akvorado flow processing after the change: flows are received by the inlet, sent to Kafka, processed by the outlet and inserted in ClickHouse
Akvorado flow processing after the introduction of the outlet service
This change goes beyond a simple split:3 the outlet now reads flows from Kafka and pushes them to ClickHouse, two tasks that Akvorado did not handle before. Flows are heavily batched to increase efficiency and reduce the load on ClickHouse using ch-go, a low-level Go client for ClickHouse. When batches are too small, asynchronous inserts are used (e20645). The number of outlet workers scales dynamically (e5a625) based on the target batch size and latency (50,000 flows and 5 seconds by default). This new architecture also allows us to simplify and optimize the code. The outlet fetches metadata synchronously (e20645). The BMP component becomes simpler by removing cooperative multitasking (3b9486). Reusing the same RawFlow object to decode protobuf-encoded flows from Kafka reduces pressure on the garbage collector (8b580f). The effect on Akvorado's overall performance was somewhat uncertain, but a user reported 35% lower CPU usage after migrating from the previous version, plus resolution of the long-standing BMP component issue.
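As a rough sketch of the batching idea (heavily simplified, not Akvorado's actual code), an outlet-style worker could buffer decoded flows and flush either when the batch reaches the target size or when the maximum latency elapses:
package main

import (
    "fmt"
    "time"
)

// Flow stands in for a decoded flow record; the real schema is much richer.
type Flow struct{ Bytes uint64 }

// batchInsert collects flows from in and flushes them either when the batch
// reaches maxSize or when maxWait has elapsed, whichever comes first.
func batchInsert(in <-chan Flow, maxSize int, maxWait time.Duration) {
    batch := make([]Flow, 0, maxSize)
    timer := time.NewTimer(maxWait)
    defer timer.Stop()

    flush := func() {
        if len(batch) == 0 {
            return
        }
        // The real code would write the batch to ClickHouse here.
        fmt.Printf("inserting %d flows\n", len(batch))
        batch = batch[:0]
    }

    for {
        select {
        case f, ok := <-in:
            if !ok {
                flush()
                return
            }
            batch = append(batch, f)
            if len(batch) >= maxSize {
                flush()
                timer.Reset(maxWait)
            }
        case <-timer.C:
            flush()
            timer.Reset(maxWait)
        }
    }
}

func main() {
    in := make(chan Flow)
    go func() {
        for i := 0; i < 120000; i++ {
            in <- Flow{Bytes: 1500}
        }
        close(in)
    }()
    batchInsert(in, 50000, 5*time.Second)
}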

Other changes This new version includes many miscellaneous changes, such as completion for source and destination ports (f92d2e), and automatic restart of the orchestrator service (0f72ff) when the configuration changes, to avoid a common pitfall for newcomers. Let's focus on some key areas for this release: observability, documentation, CI, Docker, Go, and JavaScript.

Observability Akvorado exposes metrics to provide visibility into the processing pipeline and help troubleshoot issues. These are available through Prometheus HTTP metrics endpoints, such as /api/v0/inlet/metrics. With the introduction of the outlet, many metrics moved. Some were also renamed (4c0b15) to match Prometheus best practices. Kafka consumer lag was added as a new metric (e3a778). If you do not have your own observability stack, the Docker Compose setup shipped with Akvorado provides one. You can enable it by activating the profiles introduced for this purpose (529a8f). The prometheus profile ships Prometheus to store metrics and Alloy to collect them (2b3c46, f81299, and 8eb7cd). Redis and Kafka metrics are collected through the exporter bundled with Alloy (560113). Other metrics are exposed using Prometheus metrics endpoints and are automatically fetched by Alloy with the help of some Docker labels, similar to what is done to configure Traefik. cAdvisor was also added (83d855) to provide some container-related metrics. The loki profile ships Loki to store logs (45c684). While Alloy can collect and ship logs to Loki, its parsing abilities are limited: I could not find a way to preserve all metadata associated with structured logs produced by many applications, including Akvorado. Vector replaces Alloy (95e201) and features a domain-specific language, VRL, to transform logs. Annoyingly, Vector currently cannot retrieve Docker logs from before it was started. Finally, the grafana profile ships Grafana, but the shipped dashboards are broken. This is planned for a future version.

Documentation The Docker Compose setup provided by Akvorado makes it easy to get the web interface up and running quickly. However, Akvorado requires a few mandatory steps to be functional. It ships with comprehensive documentation, including a chapter about troubleshooting problems. I hoped this documentation would reduce the support burden. It is difficult to know if it works. Happy users rarely report their success, while some users open discussions asking for help without reading much of the documentation. In this release, the documentation was significantly improved.
$ git diff --shortstat v1.11.5 -- console/data/docs
 10 files changed, 1873 insertions(+), 1203 deletions(-)
The documentation was updated (fc1028) to match Akvorado's new architecture. The troubleshooting section was rewritten (17a272). Instructions on how to improve ClickHouse performance when upgrading from versions earlier than 1.10.0 were added (5f1e9a). An LLM proofread the entire content (06e3f3). Developer-focused documentation was also improved (548bbb, e41bae, and 871fc5). From a usability perspective, table-of-contents sections are now collapsible (c142e5). Admonitions help draw user attention to important points (8ac894).
Admonition in Akvorado documentation to ask a user not to open an issue or start a discussion before reading the documentation
Example of use of admonitions in Akvorado's documentation

Continuous integration This release includes efforts to speed up continuous integration on GitHub. Coverage and race tests run in parallel (6af216 and fa9e48). The Docker image builds during the tests but gets tagged only after they succeed (8b0dce).
GitHub workflow for CI with many jobs, some of them running in parallel, some not
GitHub workflow to test and build Akvorado
End-to-end tests (883e19) ensure the shipped Docker Compose setup works as expected. Hurl runs tests on various HTTP endpoints, particularly to verify metrics (42679b and 169fa9). For example:
## Test inlet has received NetFlow flows
GET http://127.0.0.1:8080/prometheus/api/v1/query
[Query]
query: sum(akvorado_inlet_flow_input_udp_packets_total{job="akvorado-inlet",listener=":2055"})
HTTP 200
[Captures]
inlet_receivedflows: jsonpath "$.data.result[0].value[1]" toInt
[Asserts]
variable "inlet_receivedflows" > 10
## Test inlet has sent them to Kafka
GET http://127.0.0.1:8080/prometheus/api/v1/query
[Query]
query: sum(akvorado_inlet_kafka_sent_messages_total{job="akvorado-inlet"})
HTTP 200
[Captures]
inlet_sentflows: jsonpath "$.data.result[0].value[1]" toInt
[Asserts]
variable "inlet_sentflows" >=   inlet_receivedflows  

Docker Akvorado ships with a comprehensive Docker Compose setup to help users get started quickly. It ensures a consistent deployment, eliminating many configuration-related issues. It also serves as a living documentation of the complete architecture. This release brings some small enhancements around Docker: Previously, many Docker images were pulled from the Bitnami Containers library. However, VMWare acquired Bitnami in 2019 and Broadcom acquired VMWare in 2023. As a result, Bitnami images were deprecated in less than a month. This was not really a surprise4. Previous versions of Akvorado had already started moving away from them. In this release, the Apache project s Kafka image replaces the Bitnami one (1eb382). Thanks to the switch to KRaft mode, Zookeeper is no longer needed (0a2ea1, 8a49ca, and f65d20). Akvorado s Docker images were previously compiled with Nix. However, building AArch64 images on x86-64 is slow because it relies on QEMU userland emulation. The updated Dockerfile uses multi-stage and multi-platform builds: one stage builds the JavaScript part on the host platform, one stage builds the Go part cross-compiled on the host platform, and the final stage assembles the image on top of a slim distroless image (268e95 and d526ca).
# This is a simplified version
FROM --platform=$BUILDPLATFORM node:20-alpine AS build-js
RUN apk add --no-cache make
WORKDIR /build
COPY console/frontend console/frontend
COPY Makefile .
RUN make console/data/frontend
FROM --platform=$BUILDPLATFORM golang:alpine AS build-go
RUN apk add --no-cache make curl zip
WORKDIR /build
COPY . .
COPY --from=build-js /build/console/data/frontend console/data/frontend
RUN go mod download
RUN make all-indep
ARG TARGETOS TARGETARCH TARGETVARIANT VERSION
RUN make
FROM gcr.io/distroless/static:latest
COPY --from=build-go /build/bin/akvorado /usr/local/bin/akvorado
ENTRYPOINT [ "/usr/local/bin/akvorado" ]
When building for multiple platforms with --platform linux/amd64,linux/arm64,linux/arm/v7, the build steps up to the ARG TARGETOS line execute only once for all platforms. This significantly speeds up the build. Akvorado now ships Docker images for these platforms: linux/amd64, linux/amd64/v3, linux/arm64, and linux/arm/v7. When requesting ghcr.io/akvorado/akvorado, Docker selects the best image for the current CPU. On x86-64, there are two choices. If your CPU is recent enough, Docker downloads linux/amd64/v3. This version contains additional optimizations and should run faster than the linux/amd64 version. It would be interesting to ship an image for linux/arm64/v8.2, but Docker does not support the same mechanism for AArch64 yet (792808).

Go This release includes many changes related to Go but not visible to the users.

Toolchain In the past, Akvorado supported the two latest Go versions, preventing immediate use of the latest enhancements. The goal was to allow users of stable distributions to use Go versions shipped with their distribution to compile Akvorado. However, this became frustrating when interesting features, like go tool, were released. Akvorado 2.0 requires Go 1.25 (77306d) but can be compiled with older toolchains by automatically downloading a newer one (94fb1c).5 Users can still override GOTOOLCHAIN to revert this decision. The recommended toolchain updates weekly through CI to ensure we get the latest minor release (5b11ec). This change also simplifies updates to newer versions: only go.mod needs updating. Thanks to this change, Akvorado now uses wg.Go() (77306d) and I have started converting some unit tests to the new testing/synctest package (bd787e, 7016d8, and 159085).
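For reference, the new WaitGroup.Go helper from Go 1.25 removes the usual Add/Done boilerplate; a minimal, standalone illustration (not Akvorado code):
package main

import (
    "fmt"
    "sync"
)

func main() {
    var wg sync.WaitGroup
    for i := 0; i < 3; i++ {
        // wg.Go increments the counter, runs the function in a goroutine,
        // and calls Done when it returns.
        wg.Go(func() {
            fmt.Println("worker", i)
        })
    }
    wg.Wait()
}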

Testing When testing equality, I use a helper function Diff() to display the differences when it fails:
got := input.Keys()
expected := []int{1, 2, 3}
if diff := helpers.Diff(got, expected); diff != "" {
    t.Fatalf("Keys() (-got, +want):\n%s", diff)
}
This function uses kylelemons/godebug. This package is no longer maintained and has some shortcomings: for example, by default, it does not compare struct private fields, which may cause unexpectedly successful tests. I replaced it with google/go-cmp, which is stricter and has better output (e2f1df).
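For illustration, a plain google/go-cmp version of the same pattern looks roughly like this (a standalone sketch, not Akvorado's actual helpers.Diff wrapper):
package example_test

import (
    "testing"

    "github.com/google/go-cmp/cmp"
)

func TestKeys(t *testing.T) {
    got := []int{1, 2, 3}      // hypothetical value under test
    expected := []int{1, 2, 3} // expected value
    // cmp.Diff returns an empty string when both values are equal,
    // and a readable -got/+want diff otherwise.
    if diff := cmp.Diff(got, expected); diff != "" {
        t.Fatalf("Keys() (-got, +want):\n%s", diff)
    }
}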

Another package for Kafka Another change is the switch from Sarama to franz-go to interact with Kafka (756e4a and 2d26c5). The main motivation for this change is to get a better concurrency model. Sarama heavily relies on channels and it is difficult to understand the lifecycle of an object handed to this package. franz-go uses a more modern approach with callbacks6 that is both more performant and easier to understand. It also ships with a package to spawn fake Kafka broker clusters, which is more convenient than the mocking functions provided by Sarama.

Improved routing table for BMP To store its routing table, the BMP component used kentik/patricia, an implementation of a patricia tree focused on reducing garbage collection pressure. gaissmai/bart is a more recent alternative using an adaptation of Donald Knuth's ART algorithm that promises better performance and delivers it: 90% faster lookups and 27% faster insertions (92ee2e and fdb65c). Unlike kentik/patricia, gaissmai/bart does not help efficiently store values attached to each prefix. I adapted the same approach as kentik/patricia to store route lists for each prefix: store a 32-bit index for each prefix, and use it to build a 64-bit index for looking up routes in a map. This leverages Go's efficient map structure (a small sketch of this indexing idea follows below). gaissmai/bart also supports a lockless routing table version, but this is not simple because we would need to extend this to the map storing the routes and to the interning mechanism. I also attempted to use Go's new unique package to replace the intern package included in Akvorado, but performance was worse.7
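Sketched in Go (hypothetical types, not the actual Akvorado code), the indexing idea looks like this: the prefix tree stores only a 32-bit index per prefix, and the routes live in a regular map keyed by a 64-bit value combining that prefix index with a per-route index:
package main

import "fmt"

// Route is a stand-in for the data attached to each prefix.
type Route struct{ NextHop string }

// routeKey combines a 32-bit prefix index with a 32-bit route index
// into a single 64-bit map key.
func routeKey(prefixIdx, routeIdx uint32) uint64 {
    return uint64(prefixIdx)<<32 | uint64(routeIdx)
}

func main() {
    routes := make(map[uint64]Route)

    // The prefix tree (e.g. gaissmai/bart) only stores prefixIdx as its value;
    // the routes themselves are looked up in the map.
    prefixIdx := uint32(42)
    routes[routeKey(prefixIdx, 0)] = Route{NextHop: "192.0.2.1"}
    routes[routeKey(prefixIdx, 1)] = Route{NextHop: "192.0.2.2"}

    fmt.Println(routes[routeKey(prefixIdx, 0)].NextHop)
}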

Miscellaneous Previous versions of Akvorado were using a custom Protobuf encoder for performance and flexibility. With the introduction of the outlet service, Akvorado only needs a simple static schema, so this code was removed. However, it is possible to enhance performance with planetscale/vtprotobuf (e49a74 and 8b580f). Moreover, the dependency on protoc, a C++ program, was somewhat annoying. Therefore, Akvorado now uses buf, written in Go, to convert a Protobuf schema into Go code (f4c879). Another small optimization, which reduces the size of the Akvorado binary by 10 MB, was to compress the static assets embedded in Akvorado into a ZIP file. It includes the ASN database, as well as the SVG images for the documentation. A small layer of code makes this change transparent (b1d638 and e69b91); a sketch of the idea follows below.
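A minimal sketch of the embedded-ZIP idea (file and package names are made up, not Akvorado's actual layout): the archive is embedded with go:embed and read back through archive/zip, so the rest of the code can keep asking for individual files:
package assets

import (
    "archive/zip"
    "bytes"
    _ "embed"
    "io"
)

// data.zip is a hypothetical archive holding the static assets
// (ASN database, SVG images, ...).
//
//go:embed data.zip
var zipped []byte

// Read returns the content of one file from the embedded archive.
func Read(name string) ([]byte, error) {
    r, err := zip.NewReader(bytes.NewReader(zipped), int64(len(zipped)))
    if err != nil {
        return nil, err
    }
    f, err := r.Open(name)
    if err != nil {
        return nil, err
    }
    defer f.Close()
    return io.ReadAll(f)
}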

JavaScript Recently, two large supply-chain attacks hit the JavaScript ecosystem: one affecting the popular packages chalk and debug and another impacting the popular package @ctrl/tinycolor. These attacks also exist in other ecosystems, but JavaScript is a prime target due to heavy use of small third-party dependencies. The previous version of Akvorado relied on 653 dependencies. npm-run-all was removed (3424e8, 132 dependencies). patch-package was removed (625805 and e85ff0, 69 dependencies) by moving missing TypeScript definitions to env.d.ts. eslint was replaced with oxlint, a linter written in Rust (97fd8c, 125 dependencies, including the plugins). I switched from npm to Pnpm, an alternative package manager (fce383). Pnpm does not run install scripts by default8 and prevents installing packages that are too recent. It is also significantly faster.9 Node.js does not ship Pnpm but it ships Corepack, which allows us to use Pnpm without installing it. Pnpm can also list licenses used by each dependency, removing the need for license-compliance (a35ca8, 42 dependencies). For additional speed improvements, beyond switching to Pnpm and Oxlint, Vite was replaced with its faster Rolldown version (463827). After these changes, Akvorado only pulls 225 dependencies.

Next steps I would like to land three features in the next version of Akvorado:
  • Add Grafana dashboards to complete the observability stack. See issue #1906 for details.
  • Integrate OVH s Grafana plugin by providing a stable API for such integrations. Akvorado s web console would still be useful for browsing results, but if you want to build and share dashboards, you should switch to Grafana. See issue #1895.
  • Move some work currently done in ClickHouse (custom dictionaries, GeoIP and IP enrichment) back into the outlet service. This should give more flexibility for adding features like the one requested in issue #1030. See issue #2006.

I started working on splitting the inlet into two parts more than one year ago. I found more motivation in recent months, partly thanks to Claude Code, which I used as a rubber duck. Almost none of the produced code was kept:10 it is like an intern who does not learn.

  1. Many attempts were made to make the BMP component both performant and not blocking. See for example PR #254, PR #255, and PR #278. Despite these efforts, this component remained problematic for most users. See issue #1461 as an example.
  2. Some features have been pushed to ClickHouse to avoid the processing cost in the inlet. See for example PR #1059.
  3. This is the biggest commit:
    $ git show --shortstat ac68c5970e2c | tail -1
    231 files changed, 6474 insertions(+), 3877 deletions(-)
    
  4. Broadcom is known for its user-hostile moves. Look at what happened with VMWare.
  5. As a Debian developer, I dislike these mechanisms that circumvent the distribution package manager. The final straw came when Go 1.25 spent one month in the Debian NEW queue, an arbitrary mechanism I don't like at all.
  6. In the early years of Go, channels were heavily promoted. Sarama was designed during this period. A few years later, a more nuanced approach emerged. See notably Go channels are bad and you should feel bad.
  7. This should be investigated further, but my theory is that the intern package uses 32-bit integers, while unique uses 64-bit pointers. See commit 74e5ac.
  8. This is also possible with npm. See commit dab2f7.
  9. An even faster alternative is Bun, but it is less available.
  10. The exceptions are part of the code for the admonition blocks, the code for collapsing the table of content, and part of the documentation.

21 September 2025

Gunnar Wolf: We, Programmers A Chronicle of Coders from Ada to AI

This post is a review for Computing Reviews for We, Programmers A Chronicle of Coders from Ada to AI , a book published in Addison-Wesley
When this book was presented as available for review, I jumped on it. After all, who doesn't love reading a nice bit of computing history, as told by a well-known author (affectionately known as "Uncle Bob"), one who has been immersed in computing since forever? What's not to like there? Reading on, the book does not disappoint. Much to the contrary, it digs into details absent in most computer history books that, being an operating systems and computer architecture geek, I absolutely enjoyed. But let me first address the book's organization. The book is split into four parts. Part 1, Setting the Stage, is a short introduction, answering the question "Who are we?" ("we" being the programmers, of course). It describes the fascination many of us felt when we realized that the computer was there to obey us, to do our bidding, and we could absolutely control it. Part 2 talks about the giants of the computing world, on whose shoulders we stand. It digs in with a level of detail I have never seen before, discussing their personal lives and technical contributions (as well as the hoops they had to jump through to get their work done). Nine chapters cover these giants, ranging chronologically from Charles Babbage and Ada Lovelace to Ken Thompson, Dennis Ritchie, and Brian Kernighan (understandably, giants who worked together are grouped in the same chapter). This is the part with the most historically overlooked technical details. For example, what was the word size in the first computers, before even the concept of a byte had been brought into regular use? What was the register structure of early central processing units (CPUs), and why did it lead to requiring self-modifying code to be able to execute loops? Then, just as Unix and C get invented, Part 3 skips to computer history as seen through the eyes of Uncle Bob. I must admit, while the change of rhythm initially startled me, it ends up working quite well. The focus is no longer on the giants of the field, but on one particular person (who casts a very long shadow). The narrative follows the author's life: a boy with access to electronics due to his father's line of work; a computing industry leader, in the early 2000s, with extreme programming; one of the first producers of training materials in video format, a role that today might be recognized as an "influencer". This first-person narrative reaches the year 2023. But the book is not just a historical overview of the computing world, of course. Uncle Bob includes a final section with his thoughts on the future of computing. As this is a book for programmers, it is fitting to start with the changes in programming languages that we should expect to see and where such changes are likely to take place. The unavoidable topic of artificial intelligence is presented next: What is it and what does it spell for computing, and in particular for programming? Interesting (and sometimes surprising) questions follow: What does the future of hardware development look like? What is the evolution of the World Wide Web likely to be? What is the future of programming and programmers? At just under 500 pages, the book is a volume to be taken seriously. But space is very well used with this text. The material is easy to read, often funny and always informative. If you enjoy computer history and understanding the little details in the implementations, it might very well be the book you want.

Dirk Eddelbuettel: rcppmlpackexamples 0.0.1 on CRAN: New Package

mlpack is a fabulous project providing countless machine learning algorithms in clean and performant C++ code as a header-only library. This gives both high performance and the ability to run the code in resource-constrained environments such as embedded systems. Bindings to a number of other languages are available, and an examples repo provides examples. The project also has a mature R package on CRAN which offers the various algorithms directly in R. Sometimes, however, one might want to use the header-only C++ code in another R package. How to do that was not well documented. A user alerted me by email to this fact a few weeks ago, and this led to both an updated mlpack release at CRAN and this package. In short, we show via three complete examples how to access the mlpack code from C++ code in this package, offering a re-usable stanza to start a different package from. The only other (header-only) dependencies are provided by CRAN packages RcppArmadillo and RcppEnsmallen wrapping, respectively, the linear algebra and optimization libraries used by mlpack. Courtesy of my CRANberries, there is also a new package note (no diffstat report yet). More detailed information is on the rcppmlpackexamples page, or the github repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

19 September 2025

Dirk Eddelbuettel: RcppArmadillo 15.0.2-2 on CRAN: Transition to Armadillo 15

Armadillo is a powerful and expressive C++ template library for linear algebra and scientific computing. It aims towards a good balance between speed and ease of use, has a syntax deliberately close to Matlab, and is useful for algorithm development directly in C++, or quick conversion of research code into production environments. RcppArmadillo integrates this library with the R environment and language and is widely used by (currently) 1261 other packages on CRAN, downloaded 41.4 million times (per the partial logs from the cloud mirrors of CRAN), and the CSDA paper (preprint / vignette) by Conrad and myself has been cited 647 times according to Google Scholar. This version updates the 15.0.2-1 release from last week. Following fairly extensive email discussions with CRAN, we are now accelerating the transition to Armadillo 15. When C++14 or newer is used (which after all is the default since R 4.1.0 released May 2021, see WRE Section 1.2.4), or when opted into, the newer Armadillo is selected. If on the other hand either C++11 is still forced, or the legacy version is explicitly selected (which currently one package at CRAN does), then Armadillo 14.6.3 is selected. Most packages will not see a difference and will automatically switch to the newer Armadillo. However, some packages will see one or two types of warning. First, if C++11 is still actively selected via, for example, CXX_STD, then CRAN will nudge a change to a newer compilation standard (as they have been doing for some time already). Preferably the change should be to simply remove the constraint and let R pick the standard based on its version and compiler availability. These days that gives us C++17 in most cases; see WRE Section 1.2.4 for details. (Some packages may need C++14 or C++17 or C++20 explicitly and can also do so.) Second, some packages may see a deprecation warning. Up until Armadillo 14.6.3, the package suppressed these, and you can still get that effect by opting into that version by setting -DARMA_USE_LEGACY. (However this route will be sunset eventually too.) But one really should update the code to the non-deprecated version. In a large number of cases this simply means switching from using arma::is_finite() (typically called on a scalar double) to calling std::isfinite(). But there are some other cases, and we will help as needed. If you maintain a package showing deprecation warnings, and are lost here and cannot work out the conversion to current coding styles, please open an issue at the RcppArmadillo repository (i.e. here) or in your own repository and tag me. I will also reach out to the maintainers of a smaller set of packages with more than one reverse dependency. A few small changes have been made to internal packaging and documentation, a small synchronization with upstream for two commits since the 15.0.2 release, as well as a link to the ldlasb2 repository and its demonstration regarding some ill-stated benchmarks done elsewhere. The detailed changes since the last CRAN release follow.

Changes in RcppArmadillo version 15.0.2-2 (2025-09-18)
  • Minor update to skeleton Makevars, Makevars.win
  • Update README.md to mention ldlasb2 repository
  • Minor documentation update (#487)
  • Synchronized with Armadillo upstream (#488)
  • Refine Armadillo version selection in coordination with CRAN maintainers to support transition towards Armadillo 15.0.*

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the Rcpp R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

18 September 2025

John Goerzen: Running an Accurate 80x25 DOS-Style Console on Modern Linux Is Possible After All

Here, in classic Goerzen deep dive fashion, is more information than you knew you wanted about a topic you've probably never thought of. I found it pretty interesting, because it took me down a rabbit hole of subsystems I've never worked with much and a mishmash of 1980s and 2020s tech. I had previously tried and failed to get an actual 80x25 Linux console, but I've since figured it out! This post is about the Linux text console, not X or Wayland. We're going to get the console right without using those systems. These instructions are for Debian trixie, but should be broadly applicable elsewhere also. The end result can look like this: Photo of a color VGA monitor displaying a BBS login screen (That's a Wifi Retromodem that I got at VCFMW last year in the Hayes modem case)

What's a pixel? How would you define a pixel these days? Probably something like "a uniquely-addressable square dot in a two-dimensional grid". In the world of VGA and CRTs, that was just a logical abstraction. We got an API centered around that because it was convenient. But, down the VGA cable and on the device, that's not what a pixel was. A pixel, back then, was a time interval. On multisync monitors, which were common except in the very early days of VGA, the timings could be adjusted, which produced logical pixels of different sizes. Those screens often had a maximum resolution, but not necessarily a native resolution in the sense that an LCD panel does. Different timings produced different-sized pixels with equal clarity (or, on cheaper monitors, equal fuzziness). A side effect of this was that pixels need not be square. And, in fact, in the standard DOS VGA 80x25 text mode, they weren't. You might be seeing why DVI, DisplayPort, and HDMI replaced VGA for LCD monitors: with a VGA cable, you did a pixel-to-analog-timings conversion, then the display did a timings-to-pixels conversion, and this process could be a bit lossy. (Hence why you sometimes needed to fill the screen with an image and push the center button on those older LCD screens.) (Note to the pedantically-inclined: yes, I am aware that I have simplified several things here; for instance, a color LCD pixel is made up of approximately 3 sub-dots of varying colors, and things like color eInk displays have two pixel grids with different sizes of pixels layered atop each other, and printers are another confusing thing altogether, and and and. MOST PEOPLE THINK OF A PIXEL AS A DOT THESE DAYS, OK?)

What was DOS text mode? We think of this as the standard display: 80 columns wide and 25 rows tall. 80x25. By the time Linux came along, the standard Linux console was VGA text mode, something like the 4th incarnation of text modes on PCs (after CGA, MDA, and EGA). VGA also supported certain other sizes of characters giving certain other text dimensions, but if I cover all of those, this will explode into a ridiculously more massive page than it already is. So, to display text on an 80x25 DOS VGA system, ultimately characters and attributes were written into the text buffer in memory. The VGA system then rendered it to the display as a 720x400 image (at 70Hz) with non-square pixels, such that the result was approximately a 4:3 aspect ratio. The font used for this rendering was a bitmapped one using 8x16 cells. You might do some math here and point out that 8 * 80 is only 640, and you'd be correct. The fonts were 8x16, but the rendered cells were 9x16. The extra pixel was normally used for spacing between characters. However, in line graphics mode, characters 0xC0 through 0xDF repeated the 8th column in the position of the 9th, allowing the continuous line-drawing characters we're used to from TUIs.

Problems rendering DOS fonts on modern systems By now, you're probably seeing some of the issues we have rendering DOS screens on more modern systems. These aren't new at all; I remember some of these from back in the days when I ran OS/2, and I think I also saw them on various terminals and consoles in OS/2 and Windows. Some issues you'd encounter would be:
  • Incorrect aspect ratio caused by using the original font and rendering it using 1:1 square pixels (resulting in a squashed appearance)
  • Incorrect aspect ratio for ANOTHER reason, caused by failing to render column 9, resulting in text that is overall too narrow
  • Characters appearing to be touching each other when they shouldn't (failing to render column 9; looking at you, dosbox)
  • Gaps between line drawing characters that should be continuous, caused by rendering column 9 as empty space in all cases

Character set issues DOS was around long before Unicode was. In the DOS world, there were codepages that selected the glyphs for roughly the high half of the 256 possible characters. CP437 was the standard for the USA; others existed for other locations that needed different characters. On Unix, the USA pre-Unicode standard was Latin-1. Same concept, but with different character mappings. Nowadays, just about everything is based on UTF-8. So, we need some way to map our CP437 glyphs into Unicode space. If we are displaying DOS-based content, we'll also need a way to map CP437 characters to Unicode for display later, and we need these maps to match so that everything comes out right. Whew. So, let's get on with setting this up!
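Before moving on, here is a quick illustration of that CP437-to-Unicode mapping using only standard tools (nothing specific to this setup); it asks iconv what a single CP437 byte becomes in UTF-8:
# CP437 byte 0xC4 is the horizontal line drawing character
printf '\xc4' | iconv -f CP437 -t UTF-8 | hexdump -C
# expect the bytes e2 94 80, i.e. the UTF-8 encoding of U+2500 BOX DRAWINGS LIGHT HORIZONTAL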

Selecting the proper video mode As explained in my previous post, proper hardware support for DOS text mode is limited to x86 machines that do not use UEFI. Non-x86 machines, or x86 machines with UEFI, simply do not contain the necessary support for it. As these are now standard, most of the time the text console you see on Linux is actually the kernel driving the video hardware in graphics mode, and doing the text rendering in software. That's all well and good, but it makes it quite difficult to actually get an 80x25 console. First, we need to be running at 720x400. This is where I ran into difficulty last time. I realized that my laptop's LCD didn't advertise any video modes other than its own native resolution. However, almost all external monitors will, and 720x400@70 is a standard VGA mode from way back, so it should be well-supported. You need to find the Linux device name for your device. You can look at the possible devices with ls -l /sys/class/drm. If you also have a GUI, xrandr may help too. But in any case, each directory under /sys/class/drm has a file named modes, and if you cat them all, you will eventually come across one with a bunch of modes defined. Drop the leading card0 or whatever from the directory name, and that's your device. (Verify that 720x400 is in modes while you're at it; see the short helper sketch at the end of this section.) Now, you're going to edit /etc/default/grub and add something like this to GRUB_CMDLINE_LINUX_DEFAULT:
video=DP-1:720x400@70
Of course, replace DP-1 with whatever your device is. Now you can run update-grub and reboot. You should have a 720x400 display. At first, I thought I had succeeded by using Linux's built-in VGA font with that mode. But it looked too tall. After noticing that repeated 0s were touching, I got suspicious about the missing 9th column in the cells. stty -a showed that my screen was 90x25, which is exactly what it would show if I was using 8x16 instead of 9x16 cells. Sooo, I need to prepare a 9x16 font.
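As promised above, here is a small sketch that bundles the connector-hunting steps from this section into one loop (device names such as card0-DP-1 vary by hardware):
# print each DRM connector together with the modes it advertises; look for 720x400
for m in /sys/class/drm/*/modes; do
  echo "== $m =="
  cat "$m"
done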

Preparing a font Here's where it gets complicated. I'll give you the simple version and the hard mode. The simple mode is this: Download https://www.complete.org/downloads/CP437-VGA.psf.gz and stick it in /usr/local/etc, then skip to the Activating the font section below. The font assembled here is based on the Ultimate Oldschool PC Font Pack v2.2, which is (c) 2016-2020 VileR and licensed under the Creative Commons Attribution-ShareAlike 4.0 International License. My psf file is derived from this using the instructions below.

Building it yourself First, install some necessary software: apt-get install fontforge bdf2psf. Start by going to the Oldschool PC Font Pack download page. Download oldschool_pc_font_pack_v2.2_FULL.zip and unpack it. The file we're interested in is otb - Bm (linux bitmap)/Bm437_IBM_VGA_9x16.otb. Open it in fontforge by running fontforge Bm437_IBM_VGA_9x16.otb. When it asks if you will load the bitmap fonts, hit select all, then yes. Go to File -> Generate Fonts. Save as a BDF, no need for outlines, and use guess for resolution. Now you have a file such as Bm437_IBM_VGA_9x16-16.bdf. Excellent. Now we need to generate a Unicode map file. We will make sure this matches the system's by enumerating every character from 0x00 to 0xFF, converting it from CP437 to Unicode, and writing the appropriate map. Here's a Python script to do that:
for i in range(0, 256):
    cp437b = b'%c' % i
    uni = ord(cp437b.decode('cp437'))
    print(f"U+ uni:04x ")
Save that file as genmap.py and run python3 genmap.py > cp437-uni. Now, we're ready to build the psf file:
bdf2psf --fb Bm437_IBM_VGA_9x16-16.bdf \
  /dev/null cp437-uni 256 CP437-VGA.psf
By convention, we normally store these files gzipped, so gzip CP437-VGA.psf. You can test it on the console with setfont CP437-VGA.psf.gz. Now copy this file into /usr/local/etc.
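If you want to eyeball the result after loading it with setfont, the showconsolefont utility from the kbd package (an optional extra step, not part of the original instructions) prints the glyph grid of whatever font the console currently has loaded:
# load the new font on the current console, then dump its glyph table
setfont CP437-VGA.psf.gz
showconsolefont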

Activating the font Now, edit /etc/default/console-setup. It should look like this:
# CONFIGURATION FILE FOR SETUPCON

# Consult the console-setup(5) manual page.

ACTIVE_CONSOLES="/dev/tty[1-6]"

CHARMAP="UTF-8"

CODESET="Lat15"
FONTFACE="VGA"
FONTSIZE="8x16"
FONT=/usr/local/etc/CP437-VGA.psf.gz

VIDEOMODE=

# The following is an example how to use a braille font
# FONT='lat9w-08.psf.gz brl-8x8.psf'
At this point, you should be able to reboot. You should have a proper 80x25 display! Log in and run stty -a to verify it is indeed 80x25.
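If you would rather not reboot, running setupcon (the tool console-setup itself uses at boot) should, as far as I know, apply the new settings to the current text console; treat this as an optional shortcut rather than part of the tested procedure:
sudo setupcon
stty -a | head -1   # should now report 25 rows and 80 columns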

Using and testing CP437 Part of the point of CP437 is to be able to access BBSs, ANSI art, and similar. Now, remember, the Linux console is still in UTF-8 mode, so we have to translate CP437 to UTF-8, then let our font map translate it back to CP437. A weird trip, but it works. Let's test it using the Textfiles ANSI art collection. In the artworks section, I randomly grabbed a file near the top: borgman.ans. Download that, and display it with:
clear; iconv -f CP437 -t UTF-8 < borgman.ans
You should see something similar to, but actually more accurate than, the textfiles PNG rendering of it, which you'll note has an incorrect aspect ratio and some rendering issues. I spot-checked with a few others and they seemed to look good. belinda.ans in particular tries quite a few characters and should give you a good sense of whether it is working.

Use with interactive programs That's all well and good, but you're probably going to want to actually use this with some interactive program that expects CP437. Maybe Minicom, Kermit, or even just telnet? For this, you'll want to apt-get install luit. luit maps CP437 (or any other encoding) to UTF-8 for display, and then of course the Linux console maps UTF-8 back to the CP437 font. Here's a way you can repeat the earlier experiment using luit to run the cat program:
clear; luit -encoding CP437 cat borgman.ans
You can run any command under luit. You can even run luit -encoding CP437 bash if you like. If you do this, it is probably a good idea to follow my instructions on generating locales in my post on serial terminals, and then, within luit, set LANG=en_US.IBM437. But note especially that you can run programs like minicom and others for accessing BBSs under luit.
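Putting those two pieces together, a session for a CP437 program might look like the sketch below; it assumes you have generated an en_US.IBM437 locale as described in that earlier post (check locale -a for the exact name on your system):
# run minicom under luit, with a CP437 locale inside the luit session
luit -encoding CP437 env LANG=en_US.IBM437 minicom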

Final words This gave you a nice DOS-type console. Although it doesn't have glyphs for many codepoints, it does run in UTF-8 mode and is therefore compatible with modern software. You can achieve greater compatibility with more UTF-8 codepoints with the DOS font, at the expense of accuracy of character rendering (especially for the double-line drawing characters), by using /usr/share/bdf2psf/standard.equivalents instead of /dev/null in the bdf2psf command; see the sketch below. Or you could go for another challenge, such as using the DEC vt-series fonts for coverage of ISO-8859-1. But just using fonts extracted from DEC ROM won't work properly, because DEC terminals had even more strangeness going on than DOS fonts.
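Concretely, the broader-coverage variant mentioned above is just the earlier bdf2psf invocation with the equivalence table swapped in (the output name CP437-VGA-compat.psf is my own choice):
bdf2psf --fb Bm437_IBM_VGA_9x16-16.bdf \
  /usr/share/bdf2psf/standard.equivalents cp437-uni 256 CP437-VGA-compat.psf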

15 September 2025

Sven Hoexter: HaProxy: Configuring SNI for a TLS Proxy

If you use HaProxy to e.g. terminate TLS on the frontend and connect via TLS to a backend, you have to take care of sending the SNI (server name indication) extension in the TLS handshake sort of manually. Even if you use host names to address the backend server, e.g.
server foobar foobar.example:2342 ssl verify required ca-file /etc/haproxy/ca/foo.crt
HaProxy will try to establish the connection without SNI. You have to enforce SNI manually here, e.g.
server foobar foobar.example:2342 ssl verify required ca-file /etc/haproxy/ca/foo.crt sni str(foobar.example)
The surprising thing here was that it requires an expression, so you cannot just write sni foobar.example; you have to wrap it in an expression. The simplest one is making sure it's a string. Update: it might be noteworthy that you have to configure SNI for the health check separately, and in that case it's a string, not an expression. E.g.
server foobar foobar.example:2342 check check-ssl check-sni foobar.example ssl verify required ca-file /etc/haproxy/ca/foo.crt sni str(foobar.example)
The ca-file is shared between the ssl context and the check-ssl.
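If you want to double-check how the backend reacts with and without SNI, openssl s_client can simulate both cases. This is only a sketch using the example host and port from above; the -noservername option needs OpenSSL 1.1.1 or newer:
# handshake with SNI, then without, and compare which certificate the backend presents
openssl s_client -connect foobar.example:2342 -servername foobar.example </dev/null 2>/dev/null | openssl x509 -noout -subject
openssl s_client -connect foobar.example:2342 -noservername </dev/null 2>/dev/null | openssl x509 -noout -subject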

14 September 2025

Ian Jackson: tag2upload in the first month of forky

tl;dr: tag2upload (beta) is going well so far, and is already handling around one in 13 uploads to Debian. Introduction and some stats We announced tag2upload's open beta in mid-July. That was in the middle of the freeze for trixie, so usage was fairly light until the forky floodgates opened. Since then the service has successfully performed 637 uploads, of which 420 were in the last 32 days. That's an average of about 13 per day. For comparison, during the first half of September up to today there have been 2475 uploads to unstable. That's about 176/day. So, tag2upload is already handling around 7.5% of uploads. This is very gratifying for a service which is advertised as still being in beta! Sean and I are very pleased both with the uptake, and with the way the system has been performing. Recent UI/UX improvements During this open beta period we have been hard at work. We have made many improvements to the user experience. Current git-debpush in forky, or trixie-backports, is much better at detecting various problems ahead of time. When uploads do fail on the service, the emailed error reports are now more informative. For example, anomalies involving orig tarballs, which by definition can't be detected locally (since one point of tag2upload is not to have tarballs locally), now generally result in failure reports containing a diffstat, and instructions for a local repro. Why we are still in beta There are a few outstanding work items that we currently want to complete before we declare the end of the beta. Retrying on Salsa-side failures The biggest of these is that the service should be able to retry when Salsa fails. Sadly, Salsa isn't wholly reliable, and right now, if it breaks when the service is trying to handle your tag, your upload can fail. We think most of these failures could be avoided. Implementing retries is a fairly substantial task, but doesn't pose any fundamental difficulties. We're working on this right now. Other notable ongoing work We want to support pristine-tar, so that pristine-tar users can do a new upstream release. Andrea Pappacoda is working on that with us. See #1106071. (Note that we would generally recommend against use of pristine-tar within Debian. But we want to support it.) We have been having conversations with Debusine folks about what integration between tag2upload and Debusine would look like. We're making some progress there, but a lot is still up in the air. We are considering how best to provide tag2upload pre-checks as part of Salsa CI. There are several problems detected by the tag2upload service that could be detected by Salsa CI too, but which can't be detected by git-debpush. Common problems We've been monitoring the service, and until very recently we have investigated every service-side failure, to understand the root causes. This has given us insight into the kinds of things our users want, and the kinds of packaging and git practices that are common. We've been able to improve the system's handling of various anomalies and also improved the documentation. Right now our failure rate is still rather high, at around 7%. Partly this is because people are trying out the system on packages that haven't ever seen git tooling with such a level of rigour.
There are two classes of problem that are responsible for the vast majority of the failures that we're still seeing. Reuse of version numbers, and attempts to re-tag: tag2upload, like git (and like dgit), hates it when you reuse a version number, or try to pretend that a (perhaps busted) release never happened. git tags aren't namespaced, and tend to spread about promiscuously. So replacing a signed git tag with a different tag of the same name is a bad idea. More generally, reusing the same version number for a different (signed!) package is poor practice. Likewise, it's usually a bad idea to remove changelog entries for versions which were actually released, just because they were later deemed improper. We understand that many Debian contributors have gotten used to this kind of thing. Indeed, tools like dcut encourage it. It does allow you to make things neat-looking, even if you've made mistakes - but really it does so by covering up those mistakes! The bottom line is that tag2upload can't support such history-rewriting. If you discover a mistake after you've signed the tag, please just burn the version number and add a new changelog stanza. One bonus of tag2upload's approach is that it will discover if you are accidentally overwriting an NMU, and report that as an error. Discrepancies between git and orig tarballs: tag2upload promises that the source package that it generates corresponds precisely to the git tree you tag and sign. Orig tarballs make this complicated. They aren't present on your laptop when you git-debpush. When you're not uploading a new upstream version, the tag2upload service reuses existing orig tarballs from the archive. If your git and the archive's orig don't agree, the tag2upload service will report an error, rather than upload a package with contents that differ from your git tag. With the most common Debian workflows, everything is fine: if you base everything on upstream git, and make your orig tarballs with git archive (or git deborig), your orig tarballs are the same as the git, by construction. We recommend usually ignoring upstream tarballs: most upstreams work in git, and their tarballs can contain weirdness that we don't want. (At worst, the tarball can contain an attack that isn't visible in git, as with xz!) Alternatively, if you use gbp import-orig, the differences (including an attack like Jia Tan's) are imported into git for you. Then, once again, your git and the orig tarball will correspond. But there are other workflows where this correspondence may not hold. Those workflows are hazardous, because the thing you're probably working with locally for your routine development is the git view. Then, when you upload, your work is transplanted onto the orig tarball, which might be quite different - so what you upload isn't what you've been working on! This situation is detected by tag2upload, precisely because tag2upload checks that it's keeping its promise: the source package is identical to the git view. (dgit push makes the same promise.) A rough recipe for checking this correspondence locally is sketched at the end of this post. Get involved Of course the easiest way to get involved is to start using tag2upload. We would love to have more contributors. There are some easy tasks to get started with, in bugs we've tagged newcomer, mostly UX improvements such as detecting certain problems earlier, in git-debpush. More substantially, we are looking for help with sbuild: we'd like it to be able to work directly from git, rather than needing to build source packages: #868527.
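As referenced above, here is one rough way to check locally that an existing orig tarball matches your upstream git tag. This is only a sketch with hypothetical package and tag names (foo, 1.2.3, upstream/1.2.3); adjust to your package and layout:
# extract the upstream git tag and the orig tarball side by side, then compare
mkdir from-git from-orig
git archive upstream/1.2.3 | tar -x -C from-git
tar -xf ../foo_1.2.3.orig.tar.xz --strip-components=1 -C from-orig
diff -r from-git from-orig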


Otto Kekäläinen: Zero-configuration TLS and password management best practices in MariaDB 11.8

Featured image of post Zero-configuration TLS and password management best practices in MariaDB 11.8 Locking down database access is probably the single most important thing a system administrator or software developer can do to prevent their application from leaking its data. As MariaDB 11.8 is the first long-term supported version with a few new key security features, let's recap the most important things every DBA should know about MariaDB in 2025. Back in the old days, MySQL administrators had a habit of running the clumsy mysql-secure-installation script, but it has long been obsolete. A modern MariaDB database server is already secure by default and locked down out of the box, and no such extra scripts are needed. On the contrary, the database administrator is expected to open up access to MariaDB according to the specific needs of each server. Therefore, it is important that the DBA can understand and correctly configure three things:
  1. Separate application-specific users with granular permissions allowing only necessary access and no more.
  2. Distributing and storing passwords and credentials securely
  3. Ensuring all remote connections are properly encrypted
For holistic security, one should also consider proper auditing, logging, backups, regular security updates and more, but in this post we will focus only on the above aspects related to securing database access.

How encrypting database connections with TLS differs from web server HTTP(S) Even though MariaDB (and other databases) use the same SSL/TLS protocol for encrypting remote connections as web servers and HTTPS, the way it is implemented is significantly different, and the different security assumptions are important for a database administrator to grasp. Firstly, most HTTP requests to a web server are unauthenticated, meaning the web server serves public web pages and does not require users to log in. Traditionally, when a user logged in over an HTTP connection, the username and password were transmitted in plaintext in an HTTP POST request. Modern TLS, which was previously called SSL, does not change how HTTP works but simply encapsulates it. When using HTTPS, a web browser and a web server will start an encrypted TLS connection as the very first thing, and only once it is established do they send HTTP requests and responses inside it. There are no passwords or other shared secrets needed to form the TLS connection. Instead, the web server relies on a trusted third party, a Certificate Authority (CA), to vet that the TLS certificate offered by the web server can be trusted by the web browser. For a database server like MariaDB, the situation is quite different. All users need to first authenticate and log in to the server before being allowed to run any SQL and get any data out of the server. The database server and client programs have built-in authentication methods, and passwords are not, and have never been, sent in plaintext. Over the years, MySQL and its successor, MariaDB, have had multiple password authentication methods: the original SHA-1-based hashing, later the double SHA-1-based mysql_native_password, followed by sha256_password and caching_sha256_password in MySQL and ed25519 in MariaDB. The MariaDB.org blog post by Sergei Golubchik recaps the history of these well. Even though most modern MariaDB installations should be using TLS to encrypt all remote connections in 2025, having the authentication method be as secure as possible still matters, because authentication is done before the TLS connection is fully established. To further harden the authentication against man-in-the-middle attacks, a new password authentication method, PARSEC, was introduced in MariaDB 11.8. It builds upon the previous ed25519 public-key-based verification (similar to how modern SSH does it), and also combines key derivation with PBKDF2 with hash functions (SHA512, SHA256) and a high iteration count. At first it may seem like a disadvantage to not wrap all connections in a TLS tunnel like HTTPS does, but actually, having the authentication done in a MitM-resistant way regardless of the connection encryption status allows a clever extra capability that is now available in MariaDB: as the database server and client already have a shared secret that is used by the server to authenticate the user, it can also be used by the client to validate the server's TLS certificate, and no third parties like CAs or root certificates are needed. MariaDB 11.8 was the first LTS version to ship with this capability for zero-configuration TLS. Note that the zero-configuration TLS also works with older password authentication methods and does not require users to have PARSEC enabled. As PARSEC is not yet the default authentication method in MariaDB, it is recommended to enable it in installations that use zero-configuration TLS encryption to maximize the security of the TLS certificate validation.

Why the root user in MariaDB has no password and how it makes the database more secure Relying on passwords for security is problematic, as there is always a risk that they could leak, and a malicious user could access the system using the leaked password. It is unfortunately far too common for database passwords to be stored in plaintext in configuration files that are accidentally committed into version control and published on GitHub and similar platforms. Every application or administrative password that exists should be tracked to ensure only people who need it know it, and rotated at regular intervals to ensure former employees and the like won't be able to use old passwords. This password management is complex and error-prone. Replacing passwords with other authentication methods is advisable whenever possible. On a database server, whoever installed the database by running e.g. apt install mariadb-server, and configured it with e.g. nano /etc/mysql/mariadb.cnf, already has full root access to the operating system, and asking them for a password to access the MariaDB database shell is moot, since they could circumvent any checks by directly accessing the files on the system anyway. Therefore, since version 10.4, MariaDB stopped requiring the root user to enter a password when connecting locally, and instead checks using socket authentication whether the user is the operating-system root user or equivalent (e.g. running sudo). This is an elegant way to get rid of a password that was unnecessary to begin with. As there is no root password anymore, the risk of an external user accessing the database as root with a leaked password is fully eliminated. Note that socket authentication only works for local connections on the same server. If you want to access a MariaDB server remotely as the root user, you would need to configure a password for it first. This is not generally recommended, as explained in the next section.
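A quick way to see this in action on a default Debian/Ubuntu install (a sketch; the exact plugin list in the output may differ between versions):
# connects without any password because the OS root user is verified via unix_socket
sudo mariadb -e "SHOW CREATE USER 'root'@'localhost';"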

Create separate database users for normal use and keep root for administrative use only Out of the box a MariaDB installation is already secure by default, and only the local root user can connect to it. This account is intended for administrative use only, and for regular daily use you should create separate database users with access limited to the databases they need and the permissions required. The most typical commands needed to create a new database for an app and a user the app can use to connect to the database would be the following:
CREATE DATABASE app_db;
CREATE USER 'app_user'@'%' IDENTIFIED BY 'your_secure_password';
GRANT ALL PRIVILEGES ON app_db.* TO 'app_user'@'%';
FLUSH PRIVILEGES;
Alternatively, if you want to use the parsec authentication method, run this to create the user:
CREATE OR REPLACE USER 'app_user'@'%'
 IDENTIFIED VIA parsec
 USING PASSWORD('your_secure_password');
Note that the plugin auth_parsec is not enabled by default. If you see the error message ERROR 1524 (HY000): Plugin 'parsec' is not loaded, fix this by running INSTALL SONAME 'auth_parsec';. In the CREATE USER statements, the @'%' means that the user is allowed to connect from any host. This needs to be defined, as MariaDB always checks permissions based on both the username and the remote IP address or hostname of the user, combined with the authentication method. Note that it is possible to have multiple user@remote combinations, and they can have different authentication methods. A user could, for example, be allowed to log in locally using socket authentication and over the network using a password. If you are running a custom application and you know exactly what permissions are sufficient for the database users, replace ALL PRIVILEGES with a subset of the privileges listed in the MariaDB documentation, as sketched below. For new permissions to take effect, restart the database or run FLUSH PRIVILEGES.
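For example, a minimal sketch for a typical web application that only needs data manipulation rights (the privilege subset here is my assumption; pick whatever your app actually requires):
sudo mariadb <<'SQL'
GRANT SELECT, INSERT, UPDATE, DELETE ON app_db.* TO 'app_user'@'%';
FLUSH PRIVILEGES;
SQL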

Allow MariaDB to accept remote connections and enforce TLS Using the above 'app_user'@'%' is not enough on its own to allow remote connections to MariaDB. The MariaDB server also needs to be configured to listen on a network interface to accept remote connections. As MariaDB is secure by default, it only accepts connections from localhost until the administrator updates its configuration. On a typical Debian/Ubuntu system, the recommended way is to drop a new custom config in e.g. /etc/mysql/mariadb.conf.d/99-server-customizations.cnf, with the contents:
[mariadbd]
# Listen for connections from anywhere
bind-address = 0.0.0.0
# Only allow TLS encrypted connections
require-secure-transport = on
For the settings to take effect, restart the server with systemctl restart mariadb. After this, the server will accept connections on any network interface. If the system is using a firewall, port 3306 would additionally need to be allow-listed. To confirm that the settings took effect, run e.g. mariadb -e "SHOW VARIABLES LIKE 'bind_address';", which should now show 0.0.0.0. When allowing remote connections, it is important to also always define require-secure-transport = on to enforce that only TLS-encrypted connections are allowed. If the server is running MariaDB 11.8 and the clients are also MariaDB 11.8 or newer, no additional configuration is needed thanks to MariaDB automatically providing TLS certificates and appropriate certificate validation in recent versions. On older long-term-supported versions of the MariaDB server, one would have had to manually create the certificates and configure the ssl_key, ssl_cert and ssl_ca values on the server, and distribute the certificate to the clients as well, which was cumbersome, so it is good that this is no longer required. In MariaDB 11.8 the only additional related config that might still be worth setting is tls_version = TLSv1.3 to ensure only the latest TLS protocol version is used. Finally, test connections to ensure they work and to confirm that TLS is used by running e.g.:
mariadb --user=app_user --password=your_secure_password \
 --host=192.168.1.66 -e '\s'
The result should show something along the lines of:
--------------
mariadb from 11.8.3-MariaDB, client 15.2 for debian-linux-gnu (x86_64)
...
Current user: app_user@192.168.1.66
SSL: Cipher in use is TLS_AES_256_GCM_SHA384, cert is OK
...
If running a Debian/Ubuntu system, see the bundled README with zcat /usr/share/doc/mariadb-server/README.Debian.gz to read more configuration tips.

Should TLS encryption be used also on internal networks? If a database server and app are running on the same private network, the chances that the connection gets eavesdropped on or man-in-the-middle attacked by a malicious user are low. However, the risk is not zero, and if it happens, it can be difficult to detect or prove that it didn't happen. The benefit of using end-to-end encryption is that both the database server and the client can validate the certificates and keys used, log it, and later have the logs audited to prove that connections were indeed encrypted and show how they were encrypted. If all the computers on an internal network already have centralized user account management and centralized log collection that includes all database sessions, reusing existing SSH connections, SOCKS proxies, dedicated HTTPS tunnels, point-to-point VPNs, or similar solutions might also be a practical option. Note that the zero-configuration TLS only works with password validation methods. This means that systems configured to use PAM or Kerberos/GSSAPI can't use it, but again those systems are typically part of a centrally configured network anyway and are likely to have certificate authorities and key distribution or network encryption facilities already set up. In a typical software app stack, however, the simplest solution is often the best, and I recommend DBAs use the end-to-end TLS encryption in MariaDB 11.8 in most cases. Hopefully with these tips you can enjoy having your MariaDB deployments both simpler and more secure than before!
