Search Results: "js"

17 January 2026

Simon Josefsson: Backup of S3 Objects Using rsnapshot

I've been using rsnapshot to take backups of around 10 servers and laptops for well over 15 years, and it is a remarkably reliable tool that has proven itself many times. Rsnapshot uses rsync over SSH and maintains a temporal hard-link file pool. Once rsnapshot is configured and running on the backup server, you get a hardlink farm with directories like this for the remote server:
/backup/serverA.domain/.sync/foo
/backup/serverA.domain/daily.0/foo
/backup/serverA.domain/daily.1/foo
/backup/serverA.domain/daily.2/foo
...
/backup/serverA.domain/daily.6/foo
/backup/serverA.domain/weekly.0/foo
/backup/serverA.domain/weekly.1/foo
...
/backup/serverA.domain/monthly.0/foo
/backup/serverA.domain/monthly.1/foo
...
/backup/serverA.domain/yearly.0/foo
I can browse and rescue files easily, going back in time when needed. The rsnapshot project README explains more, and there is a long rsnapshot HOWTO, although I usually find the rsnapshot man page the easiest to digest.

I have stored multi-TB Git-LFS data on GitLab.com for some time. The yearly renewal is coming up, and the price for Git-LFS storage on GitLab.com is now excessive (~$10.000/year). I have reworked my workflow and finally migrated debdistget to only store Git-LFS stubs on GitLab.com and push the real files to S3 object storage. The cost for this is barely measurable; I have yet to run into the 25/month warning threshold.

But how do you back up stuff stored in S3? For some time, my S3 backup solution has been to run the minio-client mirror command to download all S3 objects to my laptop, and rely on rsnapshot to keep backups of this. While 4TB NVMes are relatively cheap, I've felt for quite some time that this disk and network churn on my laptop is unsatisfactory. What is a better approach?

I find S3 hosting sites fairly unreliable by design. Only a couple of clicks in your web browser and you have dropped 100TB of data. Or someone else has, after stealing your plaintext-equivalent cookie. Thus, I haven't really felt comfortable using any S3-based backup option. I prefer to self-host, although continuously running a mirror job is not sufficient: if I accidentally drop the entire S3 object store, my mirror run will remove all files locally too. The rsnapshot approach, which allows going back in time and keeps the data on self-managed servers, feels superior to me.

What if we could use rsnapshot with an S3 client instead of rsync? Someone else asked about this several years ago, and the suggestion was to use the FUSE-based s3fs, which sounded unreliable to me. After some experimentation, working around some hard-coded assumptions in the rsnapshot implementation, I came up with a small configuration pattern and a wrapper tool to implement what I desired. Here is my configuration snippet:
cmd_rsync    /backup/s3/s3rsync
rsync_short_args    -Q
rsync_long_args    --json --remove
lockfile    /backup/s3/rsnapshot.pid
snapshot_root    /backup/s3
backup    s3:://hetzner/debdistget-gnuinos    ./debdistget-gnuinos
backup    s3:://hetzner/debdistget-tacos  ./debdistget-tacos
backup    s3:://hetzner/debdistget-diffos ./debdistget-diffos
backup    s3:://hetzner/debdistget-pureos ./debdistget-pureos
backup    s3:://hetzner/debdistget-kali   ./debdistget-kali
backup    s3:://hetzner/debdistget-devuan ./debdistget-devuan
backup    s3:://hetzner/debdistget-trisquel   ./debdistget-trisquel
backup    s3:://hetzner/debdistget-debian ./debdistget-debian
The idea is to save a backup of a couple of S3 buckets under /backup/s3/. I have some scripts that take an rsnapshot.conf template and append my per-directory configuration so that this becomes a complete configuration. If you are curious how I roll this, backup-all invokes backup-one, which appends the snippet above to my rsnapshot.conf template. The s3rsync wrapper script is the essential hack that converts rsnapshot's rsync parameters into something that talks S3, and the script is as follows:
#!/bin/sh
set -eu
S3ARG=
for ARG in "$@"; do
    case $ARG in
    s3:://*) S3ARG="$S3ARG "$(echo $ARG | sed -e 's,s3:://,,');;
    -Q*) ;;
    *) S3ARG="$S3ARG $ARG";;
    esac
done
echo /backup/s3/mc mirror $S3ARG
exec /backup/s3/mc mirror $S3ARG
It uses the minio-client tool. I first tried s3cmd, but its sync command reads all files to compute MD5 checksums every time you invoke it, which is very slow. The mc mirror command is blazingly fast since it only compares mtimes, just like rsync or git. First you need to store credentials for your S3 bucket. These are stored in plaintext in ~/.mc/config.json, which I find to be sloppy security practice, but I don't know of any better way to do this. Replace AKEY and SKEY with your access token and secret token from your S3 provider:
/backup/s3/mc alias set hetzner AKEY SKEY
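Note that, depending on your MinIO Client version, mc alias set also expects the S3 endpoint URL between the alias name and the keys; the endpoint below is a placeholder for your provider's URL:
/backup/s3/mc alias set hetzner https://your-s3-endpoint.example AKEY SKEY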
If I invoke a sync job for a fully synced-up directory, the output looks like this:
root@hamster /backup# /run/current-system/profile/bin/rsnapshot -c /backup/s3/rsnapshot.conf -V sync
Setting locale to POSIX "C"
echo 1443 > /backup/s3/rsnapshot.pid 
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-gnuinos \
    /backup/s3/.sync//debdistget-gnuinos 
/backup/s3/mc mirror --json --remove hetzner/debdistget-gnuinos /backup/s3/.sync//debdistget-gnuinos
 "status":"success","total":0,"transferred":0,"duration":0,"speed":0 
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-tacos \
    /backup/s3/.sync//debdistget-tacos 
/backup/s3/mc mirror --json --remove hetzner/debdistget-tacos /backup/s3/.sync//debdistget-tacos
 "status":"success","total":0,"transferred":0,"duration":0,"speed":0 
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-diffos \
    /backup/s3/.sync//debdistget-diffos 
/backup/s3/mc mirror --json --remove hetzner/debdistget-diffos /backup/s3/.sync//debdistget-diffos
 "status":"success","total":0,"transferred":0,"duration":0,"speed":0 
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-pureos \
    /backup/s3/.sync//debdistget-pureos 
/backup/s3/mc mirror --json --remove hetzner/debdistget-pureos /backup/s3/.sync//debdistget-pureos
 "status":"success","total":0,"transferred":0,"duration":0,"speed":0 
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-kali \
    /backup/s3/.sync//debdistget-kali 
/backup/s3/mc mirror --json --remove hetzner/debdistget-kali /backup/s3/.sync//debdistget-kali
 "status":"success","total":0,"transferred":0,"duration":0,"speed":0 
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-devuan \
    /backup/s3/.sync//debdistget-devuan 
/backup/s3/mc mirror --json --remove hetzner/debdistget-devuan /backup/s3/.sync//debdistget-devuan
 "status":"success","total":0,"transferred":0,"duration":0,"speed":0 
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-trisquel \
    /backup/s3/.sync//debdistget-trisquel 
/backup/s3/mc mirror --json --remove hetzner/debdistget-trisquel /backup/s3/.sync//debdistget-trisquel
 "status":"success","total":0,"transferred":0,"duration":0,"speed":0 
/backup/s3/s3rsync -Qv --json --remove s3:://hetzner/debdistget-debian \
    /backup/s3/.sync//debdistget-debian 
/backup/s3/mc mirror --json --remove hetzner/debdistget-debian /backup/s3/.sync//debdistget-debian
 "status":"success","total":0,"transferred":0,"duration":0,"speed":0 
touch /backup/s3/.sync/ 
rm -f /backup/s3/rsnapshot.pid 
/run/current-system/profile/bin/logger -p user.info -t rsnapshot[1443] \
    /run/current-system/profile/bin/rsnapshot -c /backup/s3/rsnapshot.conf \
    -V sync: completed successfully 
root@hamster /backup# 
You can tell from the paths that this machine runs Guix. This was the first production use of the Guix System for me, and the machine has been running since 2015 (with the occasional new hard drive). Before, I used rsnapshot on Debian, but some stable release of Debian dropped the rsnapshot package, paving the way for me to test Guix in production on a non-Internet exposed machine. Unfortunately, mc is not packaged in Guix, so you will have to install it from the MinIO Client GitHub page manually. Running the daily rotation looks like this:
root@hamster /backup# /run/current-system/profile/bin/rsnapshot -c /backup/s3/rsnapshot.conf -V daily
Setting locale to POSIX "C"
echo 1549 > /backup/s3/rsnapshot.pid 
mv /backup/s3/daily.5/ /backup/s3/daily.6/ 
mv /backup/s3/daily.4/ /backup/s3/daily.5/ 
mv /backup/s3/daily.3/ /backup/s3/daily.4/ 
mv /backup/s3/daily.2/ /backup/s3/daily.3/ 
mv /backup/s3/daily.1/ /backup/s3/daily.2/ 
mv /backup/s3/daily.0/ /backup/s3/daily.1/ 
/run/current-system/profile/bin/cp -al /backup/s3/.sync /backup/s3/daily.0 
rm -f /backup/s3/rsnapshot.pid 
/run/current-system/profile/bin/logger -p user.info -t rsnapshot[1549] \
    /run/current-system/profile/bin/rsnapshot -c /backup/s3/rsnapshot.conf \
    -V daily: completed successfully 
root@hamster /backup# 
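If you want to run this unattended, a minimal crontab sketch could look like the following; the schedule and the weekly/monthly lines are assumptions on my part, so match them to the retain levels in your own rsnapshot.conf:
30 3 * * *   /run/current-system/profile/bin/rsnapshot -c /backup/s3/rsnapshot.conf sync && /run/current-system/profile/bin/rsnapshot -c /backup/s3/rsnapshot.conf daily
0  5 * * 1   /run/current-system/profile/bin/rsnapshot -c /backup/s3/rsnapshot.conf weekly
30 5 1 * *   /run/current-system/profile/bin/rsnapshot -c /backup/s3/rsnapshot.conf monthly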
Hopefully you will feel inspired to take backups of your S3 buckets now!

16 January 2026

Freexian Collaborators: Monthly report about Debian Long Term Support, December 2025 (by Santiago Ruano Rincón)

The Debian LTS Team, funded by Freexian's Debian LTS offering (https://www.freexian.com/lts/debian/), is pleased to report its activities for December.

Activity summary During the month of December, 18 contributors have been paid to work on Debian LTS (links to individual contributor reports are located below). The team released 41 DLAs fixing 252 CVEs. The team currently focuses on preparing security updates for Debian 11 "bullseye", but also contributes updates for Debian 12 "bookworm", Debian 13 "trixie" and even Debian unstable. Notable security updates:
  • libsoup2.4 (DLA-4398-1), prepared by Andreas Henrikson, fixing several vulnerabilities.
  • glib2.0 (DLA-4412-1), published by Emilio Pozuelo Monfort, addressing multiple issues.
  • lasso (DLA-4397-1), prepared by Sylvain Beucler, addressing multiple issues, including a critical remote code execution (RCE) vulnerability (CVE-2025-47151).
  • roundcube (DLA 4415-1), prepared by Guilhem Moulin, fixing a cross-site scripting (XSS) vulnerability (CVE-2025-68461) and an information disclosure vulnerability (CVE-2025-68460).
  • mediawiki (DLA 4428-1), published by Guilhem, fixing multiple vulnerabilities that could lead to information disclosure, denial of service or privilege escalation.
  • While the DLA has not been published yet, Charles Henrique Melara proposed upstream fixes for seven CVEs in ffmpeg: https://code.ffmpeg.org/FFmpeg/FFmpeg/pulls/21275.
  • python-apt (DLA 4408-1), prepared by Utkarsh Gupta, in coordination with the Debian Security Team and Julian Andres Klode, the apt maintainer.
  • libpng1.6 (DLA-4396-1), published by Tobias Frost, completing the work started the previous month.
Notable non-security updates:
  • tzdata (DLA-4403-1), prepared by Emilio, including the latest changes to the leap second list and its expiry date, which was set for the end of December.
Contributions from outside the LTS Team:
  • Christoph Berg, co-maintainer of PostgreSQL in Debian, prepared a postgresql-13 update, released as DLA-4420-1
The LTS Team has also contributed updates to the latest Debian releases:

Individual Debian LTS contributor reports

Thanks to our sponsors Sponsors that joined recently are in bold.

14 January 2026

Dirk Eddelbuettel: RcppSimdJson 0.1.15 on CRAN: New Upstream, Some Maintenance

A brand new release 0.1.15 of the RcppSimdJson package is now on CRAN. RcppSimdJson wraps the fantastic and genuinely impressive simdjson library by Daniel Lemire and collaborators. Via very clever algorithmic engineering to obtain largely branch-free code, coupled with modern C++ and newer compiler instructions, it parses gigabytes of JSON per second, which is quite mind-boggling. The best-case performance is faster than CPU speed as use of parallel SIMD instructions and careful branch avoidance can lead to less than one CPU cycle per byte parsed; see the video of the talk by Daniel Lemire at QCon. This version updates to the current 4.2.4 upstream release. It also updates the RcppExports.cpp file with glue between C++ and R. We want to move away from using Rf_error() (as Rcpp::stop() is generally preferable). Packages (such as this one) that declare an interface have an actual Rf_error() call generated in RcppExports.cpp, which current Rcpp code generation can protect, hence the regeneration. Long story short, a minor internal reason. The short NEWS entry for this release follows.

Changes in version 0.1.15 (2026-01-14)
  • simdjson was upgraded to version 4.2.4 (Dirk in #97)
  • RcppExports.cpp was regenerated to aid an Rcpp transition
  • Standard maintenance updates for continuous integration and URLs

Courtesy of my CRANberries, there is also a diffstat report for this release. For questions, suggestions, or issues please use the issue tracker at the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

12 January 2026

Daniel Kahn Gillmor: AI as a Compression Problem

A recent article in The Atlantic makes the case that very large language models effectively contain much of the works they're trained on. This article is an attempt to popularize the insights in the recent academic paper Extracting books from production language models by Ahmed et al. The authors of the paper demonstrate convincingly that well-known copyrighted textual material can be extracted from the chatbot interfaces of popular commercial LLM services.

The Atlantic article cites a podcast quote about the Stable Diffusion AI image-generator model, saying "We took 100,000 gigabytes of images and compressed it to a two-gigabyte file that can re-create any of those and iterations of those". By analogy, this suggests we might think of LLMs (which work on text, not the images handled by Stable Diffusion) as a form of lossy textual compression. The entire text of Moby Dick, the canonical Big American Novel, is merely 1.2MiB uncompressed (and less than 0.4MiB losslessly compressed with bzip2 -9). It's not surprising to imagine that a model with hundreds of billions of parameters might contain copies of these works.

Warning: The next paragraph contains fuzzy math with no real concrete engineering practice behind it! Consider a hypothetical model with 100 billion parameters, where each parameter is stored as a 16-bit floating point value. The model weights would take 200 GB of storage. If you were to fill the parameter space only with losslessly compressed copies of books like Moby Dick, you could still fit half a million books, more than anyone can read in a lifetime. And lossy compression is typically orders of magnitude smaller than lossless compression, so we're talking about millions of works effectively encoded, with the acceptance of some artifacts being injected in the output.

I first encountered this "compression" view of AI nearly three years ago, in Ted Chiang's insightful ChatGPT is a Blurry JPEG of the Web. I was surprised that The Atlantic article didn't cite Chiang's piece. If you haven't read Ted Chiang, i strongly recommend his work, and this piece is a great place to start.

Chiang aside, the more recent writing that focuses on the idea of compressed works being "contained" in the model weights seems to be used by people interested in wielding some sort of copyright claims against the AI companies that maintain or provide access to these models. There are many, many problems with AI today, but attacking AI companies based on copyright concerns seems similar to going after Al Capone for tax evasion. We should be much more concerned with the effect these projects have on cultural homogeneity, mental health, labor rights, privacy, and social control than whether they're violating copyright in some specific instance.
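As a rough sanity check of the fuzzy math above, using its own numbers (100 billion 16-bit parameters, and roughly 0.4 MiB, about 419430 bytes, per losslessly compressed novel), a one-line shell calculation:
# 100e9 parameters * 2 bytes = 200 GB of weights, divided by ~0.4 MiB per compressed book
echo $(( 200 * 1000 * 1000 * 1000 / 419430 ))   # prints 476837 -- roughly half a million books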

11 January 2026

Dirk Eddelbuettel: RProtoBuf 0.4.25 on CRAN: Mostly Maintenance

A new maintenance release 0.4.25 of RProtoBuf arrived on CRAN today. RProtoBuf provides R with bindings for the Google Protocol Buffers ("ProtoBuf") data encoding and serialization library used and released by Google, and deployed very widely in numerous projects as a language and operating-system agnostic protocol. This release brings an update to a header use forced by R-devel, the usual set of continuous integration updates, and a large overhaul of URLs as CRAN is now running more powerful checks. As a benefit, the three vignettes have all been refreshed; they are now also delivered via the new Rcpp::asis() vignette builder that permits pre-made PDF files to be used easily. The following section from the NEWS.Rd file has full details.

Changes in RProtoBuf version 0.4.25 (2026-01-11)
  • Several routine updates to continuous integration script
  • Include ObjectTable.h instead of Callback.h to accommodate R 4.6.0
  • Switch vignettes to Rcpp::asis driver, update references

Thanks to my CRANberries, there is a diff to the previous release. The RProtoBuf page has copies of the (older) package vignette, the quick overview vignette, and the pre-print of our JSS paper. Questions, comments etc should go to the GitHub issue tracker off the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

10 January 2026

Dirk Eddelbuettel: Rcpp 1.1.1 on CRAN: Many Improvements in Semi-Annual Update

Team Rcpp is thrilled to share that an exciting new version 1.1.1 of Rcpp is now on CRAN (and also uploaded to Debian and already built for r2u). Having switched to C++11 as the minimum standard in the previous 1.1.0 release, this version takes full advantage of it and removes a lot of conditional code catering to older standards that no longer need to be supported. Consequently, the source tarball shrinks by 39% from 3.11 mb to 1.88 mb. That is a big deal. (Size peaked with Rcpp 1.0.12 two years ago at 3.43 mb; relative to that size we are down 45%!!) Removing unused code also makes maintenance easier, and quickens both compilation and installation in general. This release continues as usual with the six-month January-July cycle started with release 1.0.5 in July 2020. Interim snapshots are always available via the r-universe page and repo. We continue to strongly encourage the use of these development releases and their testing; we tend to run our systems with them too. Rcpp has long established itself as the most popular way of enhancing R with C or C++ code. Right now, 3020 packages on CRAN depend on Rcpp for making analytical code go faster and further. On CRAN, 13.1% of all packages depend (directly) on Rcpp, and 60.9% of all compiled packages do. From the cloud mirror of CRAN (which is but a subset of all CRAN downloads), Rcpp has been downloaded 109.8 million times. The two published papers (also included in the package as preprint vignettes) have, respectively, 2151 (JSS, 2011) and 405 (TAS, 2018) citations, while the book (Springer useR!, 2013) has another 715. This time, I am not attempting to summarize the different changes. The full list follows below and details all these changes, their respective PRs and, if applicable, issue tickets. Big thanks from all of us to all contributors!

Changes in Rcpp release version 1.1.1 (2026-01-08)
  • Changes in Rcpp API:
    • An unused old R function for a compiler version check has been removed after checking no known package uses it (Dirk in #1395)
    • A narrowing warning is avoided via a cast (Dirk in #1398)
    • Demangling checks have been simplified (Iñaki in #1401 addressing #1400)
    • The treatment of signed zeros is now improved in the Sugar code (Iñaki in #1404)
    • Preparations for phasing out use of Rf_error have been made (Iñaki in #1407)
    • The long-deprecated function loadRcppModules() has been removed (Dirk in #1416 closing #1415)
    • Some non-API includes from R were refactored to accommodate R-devel changes (Iñaki in #1418 addressing #1417)
    • An accessor to Rf_rnbeta has been removed (Dirk in #1419 also addressing #1420)
    • Code accessing non-API Rf_findVarInFrame now uses R_getVarEx (Dirk in #1423 fixing #1421)
    • Code conditional on the R version now expects at least R 3.5.0; older code has been removed (Dirk in #1426 fixing #1425)
    • The non-API ATTRIB entry point to the R API is no longer used (Dirk in #1430 addressing #1429)
    • The unwind-protect mechanism is now used unconditionally (Dirk in #1437 closing #1436)
  • Changes in Rcpp Attributes:
    • The OpenMP plugin has been generalized for different macOS compiler installations (Kevin in #1414)
  • Changes in Rcpp Documentation:
    • Vignettes are now processed via a new "asis" processor adopted from R.rsp (Dirk in #1394 fixing #1393)
    • R is now cited via its DOI (Dirk)
    • A (very) stale help page has been removed (Dirk in #1428 fixing #1427)
    • The main README.md was updated emphasizing r-universe in favor of the local drat repos (Dirk in #1431)
  • Changes in Rcpp Deployment:
    • A temporary change in R-devel concerning NA part in complex variables was accommodated, and then reverted (Dirk in #1399 fixing #1397)
    • The macOS CI runners now use macos-14 (Dirk in #1405)
    • A message is shown if R.h is included before Rcpp headers as this can lead to errors (Dirk in #1411 closing #1410)
    • Old helper functions use message() to signal they are not used, deprecation and removal to follow (Dirk in #1413 closing #1412)
    • Three tests were being silenced following #1413 (Dirk in #1422)
    • The heuristic whether to run all available tests was refined (Dirk in #1434 addressing #1433)
    • Coverage has been tweaked via additional #nocov tags (Dirk in #1435)
  • Non-release Changes:
    • Two interim non-releases 1.1.0.8.1 and .2 were made in order to unblock CRAN due to changes in R-devel rather than Rcpp

Thanks to my CRANberries, you can also look at a diff to the previous interim release along with pre-releases 1.1.0.8, 1.1.0.8.1 and 1.1.0.8.2 that were needed because R-devel all of a sudden decided to move fast and break things. Not our doing. Questions, comments etc should go to the GitHub discussion section or the rcpp-devel list off the R-Forge page. Bug reports are welcome at the GitHub issue tracker as well. Both sections can be searched as well.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

8 January 2026

Reproducible Builds: Reproducible Builds in December 2025

Welcome to the December 2025 report from the Reproducible Builds project! Our monthly reports outline what we've been up to over the past month, highlighting items of news from elsewhere in the increasingly-important area of software supply-chain security. As ever, if you are interested in contributing to the Reproducible Builds project, please see the Contribute page on our website.

  1. New orig-check service to validate Debian upstream tarballs
  2. Distribution work
  3. disorderfs updated to FUSE 3
  4. Mailing list updates
  5. Three new academic papers published
  6. Website updates
  7. Upstream patches

New orig-check service to validate Debian upstream tarballs This month, Debian Developer Lucas Nussbaum announced the orig-check service, which attempts to automatically reproduce the generation of upstream tarballs (i.e. the original source component of a Debian source package), comparing that to the upstream tarball actually shipped with Debian. As of the time of writing, it is possible for a Debian developer to upload a source archive that does not actually correspond to upstream's version. Whilst this is not inherently malicious (it typically indicates some tooling/process issue), the very possibility that a maintainer's version may differ potentially permits a maintainer to make (malicious) changes that would be misattributed to upstream. This service therefore nicely complements the whatsrc.org service, which was reported in our reports for both April and August. The orig-check service is dedicated to Lunar, who sadly passed away a year ago.

Distribution work In Arch Linux this month, Robin Candau and Mark Hegreberg worked on making the Arch Linux WSL image bit-for-bit reproducible. Robin also shared some implementation details and future related work on our mailing list.

Continuing a series reported in these reports for March, April and July 2025 (etc.), Simon Josefsson has published another interesting article this month, itself a followup to a post Simon published in December 2024 regarding GNU Guix Container Images that are hosted on GitLab.

In Debian this month, Micha Lenk posted to the debian-backports-announce mailing list with the news that the Backports archive will now discard binaries generated and uploaded by maintainers: "The benefit is that all binary packages [will] get built by the Debian buildds before we distribute them within the archive." Felix Moessbauer of Siemens then filed a bug in the Debian bug tracker to signal their intention to package debsbom, a software bill of materials (SBOM) generator for distributions based on Debian. This generated a discussion on the bug inquiring about the output format as well as a question about how these SBOMs might be distributed. Holger Levsen merged a number of significant changes written by Alper Nebi Yasak to the Debian Installer in order to improve its reproducibility. As noted in Alper's merge request, "These are the reproducibility fixes I looked into before bookworm release, but was a bit afraid to send as it's just before the release, because the things like the xorriso conversion changes the content of the files to try to make them reproducible." In addition, 76 reviews of Debian packages were added, 8 were updated and 27 were removed this month, adding to our knowledge about identified issues. A new different_package_content_when_built_with_nocheck issue type was added by Holger Levsen. [ ]

Arnout Engelen posted to our mailing list reporting that they successfully reproduced the NixOS minimal installation ISO for the 25.11 release without relying on a pre-compiled package archive, with more details on their blog.

Lastly, Bernhard M. Wiedemann posted another openSUSE monthly update for his work there.

disorderfs updated to FUSE 3 disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into system calls to reliably flush out reproducibility issues. This month, however, Roland Clobus upgraded disorderfs from FUSE 2 to FUSE 3 after its package automatically got removed from Debian testing. Some tests in Debian currently require disorderfs to make the Debian live images reproducible, although disorderfs is not a Debian-specific tool.

Mailing list updates On our mailing list this month:
  • Luca Di Maio announced stampdalf, a filesystem timestamp preservation tool that wraps arbitrary commands and ensures "filesystem timestamp reproducibility":
    stampdalf allows you to run any command that modifies files in a directory tree, then automatically resets all timestamps back to their original values. Any new files created during command execution are set to [the UNIX epoch] or a custom timestamp via SOURCE_DATE_EPOCH.
    The project's GitHub page helpfully reveals that the project is pronounced "stamp-dalf" (stamp like time-stamp, dalf like Gandalf the wizard) as it's "a wizard of time and stamps".
  • Lastly, Reproducible Builds developer cen1 posted to our list announcing that early/experimental/alpha support for FreeBSD was added to rebuilderd. In their post, cen1 reports that the initial builds are in progress and "look quite decent". cen1 also interestingly notes that since the upstream is currently not technically reproducible, "I had to relax the bit-for-bit identical requirement of rebuilderd [ ] I consider the pkg to be reproducible if the tar is content-identical (via diffoscope), ignoring timestamps and some of the manifest files."

Three new academic papers published Yogya Gamage and Benoit Baudry of Université de Montréal, Canada, together with Deepika Tiwari and Martin Monperrus of KTH Royal Institute of Technology, Sweden, published a paper on The Design Space of Lockfiles Across Package Managers:
Most package managers also generate a lockfile, which records the exact set of resolved dependency versions. Lockfiles are used to reduce build times; to verify the integrity of resolved packages; and to support build reproducibility across environments and time. Despite these beneficial features, developers often struggle with their maintenance, usage, and interpretation. In this study, we unveil the major challenges related to lockfiles, such that future researchers and engineers can address them. [ ]
A PDF of their paper is available online. Benoit Baudry also posted an announcement to our mailing list, which generated a number of replies.
Betul Gokkaya, Leonardo Aniello and Basel Halak of the University of Southampton then published a paper on A taxonomy of attacks, mitigations and risk assessment strategies within the software supply chain:
While existing studies primarily focus on software supply chain attacks prevention and detection methods, there is a need for a broad overview of attacks and comprehensive risk assessment for software supply chain security. This study conducts a systematic literature review to fill this gap. By analyzing 96 papers published between 2015-2023, we identified 19 distinct SSC attacks, including 6 novel attacks highlighted in recent studies. Additionally, we developed 25 specific security controls and established a precisely mapped taxonomy that transparently links each control to one or more specific attacks. [ ]
A PDF of the paper is available online via the article s canonical page.
Aman Sharma and Martin Monperrus of the KTH Royal Institute of Technology, Sweden, along with Benoit Baudry of Université de Montréal, Canada, published a paper this month on Causes and Canonicalization of Unreproducible Builds in Java. The abstract of the paper is as follows:
[Achieving] reproducibility at scale remains difficult, especially in Java, due to a range of non-deterministic factors and caveats in the build process. In this work, we focus on reproducibility in Java-based software, archetypal of enterprise applications. We introduce a conceptual framework for reproducible builds, we analyze a large dataset from Reproducible Central, and we develop a novel taxonomy of six root causes of unreproducibility. [ ]
A PDF of the paper is available online.

Website updates Once again, there were a number of improvements made to our website this month including:

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

5 January 2026

Vincent Bernat: Using eBPF to load-balance traffic across UDP sockets with Go

Akvorado collects sFlow and IPFIX flows over UDP. Because UDP does not retransmit lost packets, it needs to process them quickly. Akvorado runs several workers listening to the same port. The kernel should load-balance received packets fairly between these workers. However, this does not work as expected. A couple of workers exhibit high packet loss:
$ curl -s 127.0.0.1:8080/api/v0/inlet/metrics \
>   | sed -n 's/akvorado_inlet_flow_input_udp_in_dropped//p'
packets_total{listener="0.0.0.0:2055",worker="0"} 0
packets_total{listener="0.0.0.0:2055",worker="1"} 0
packets_total{listener="0.0.0.0:2055",worker="2"} 0
packets_total{listener="0.0.0.0:2055",worker="3"} 1.614933572278264e+15
packets_total{listener="0.0.0.0:2055",worker="4"} 0
packets_total{listener="0.0.0.0:2055",worker="5"} 0
packets_total{listener="0.0.0.0:2055",worker="6"} 9.59964121598348e+14
packets_total{listener="0.0.0.0:2055",worker="7"} 0
eBPF can help by implementing an alternate balancing algorithm.

Options for load-balancing There are three methods to load-balance UDP packets across workers:
  1. One worker receives the packets and dispatches them to the other workers.
  2. All workers share the same socket.
  3. Each worker has its own socket, listening to the same port, with the SO_REUSEPORT socket option.

SO_REUSEPORT option Tom Herbert added the SO_REUSEPORT socket option in Linux 3.9. The cover letter for his patch series explains why this new option is better than the two existing ones from a performance point of view:
SO_REUSEPORT allows multiple listener sockets to be bound to the same port. [ ] Received packets are distributed to multiple sockets bound to the same port using a 4-tuple hash. The motivating case for SO_REUSEPORT in TCP would be something like a web server binding to port 80 running with multiple threads, where each thread might have it's own listener socket. This could be done as an alternative to other models:
  1. have one listener thread which dispatches completed connections to workers, or
  2. accept on a single listener socket from multiple threads.
In case #1, the listener thread can easily become the bottleneck with high connection turn-over rate. In case #2, the proportion of connections accepted per thread tends to be uneven under high connection load. [ ] We have seen the disproportion to be as high as 3:1 ratio between thread accepting most connections and the one accepting the fewest. With SO_REUSEPORT the distribution is uniform. The motivating case for SO_REUSEPORT in UDP would be something like a DNS server. An alternative would be to receive on the same socket from multiple threads. As in the case of TCP, the load across these threads tends to be disproportionate and we also see a lot of contention on the socket lock.
Akvorado uses the SO_REUSEPORT option to dispatch the packets across the workers. However, because the distribution uses a 4-tuple hash, a single socket handles all the flows from one exporter.

SO_ATTACH_REUSEPORT_EBPF option In Linux 4.5, Craig Gallek added the SO_ATTACH_REUSEPORT_EBPF option to attach an eBPF program to select the target UDP socket. In Linux 4.6, he extended it to support TCP. The socket(7) manual page documents this mechanism:1
The BPF program must return an index between 0 and N-1 representing the socket which should receive the packet (where N is the number of sockets in the group). If the BPF program returns an invalid index, socket selection will fall back to the plain SO_REUSEPORT mechanism.
In Linux 4.19, Martin KaFai Lau added the BPF_PROG_TYPE_SK_REUSEPORT program type. Such an eBPF program selects the socket from a BPF_MAP_TYPE_REUSEPORT_SOCKARRAY map instead. This new approach is more reliable when switching target sockets from one instance to another: for example, when upgrading, a new instance can add its sockets and remove the old ones.

Load-balancing with eBPF and Go Altering the load-balancing algorithm for a group of sockets requires two steps:
  1. write and compile an eBPF program in C,2 and
  2. load it and attach it in Go.

eBPF program in C A simple load-balancing algorithm is to randomly choose the destination socket. The kernel provides the bpf_get_prandom_u32() helper function to get a pseudo-random number.
volatile const __u32 num_sockets; // ❶
struct {
    __uint(type, BPF_MAP_TYPE_REUSEPORT_SOCKARRAY);
    __type(key, __u32);
    __type(value, __u64);
    __uint(max_entries, 256);
} socket_map SEC(".maps"); // ❷
SEC("sk_reuseport")
int reuseport_balance_prog(struct sk_reuseport_md *reuse_md)
{
    __u32 index = bpf_get_prandom_u32() % num_sockets; // ❸
    bpf_sk_select_reuseport(reuse_md, &socket_map, &index, 0); // ❹
    return SK_PASS; // ❺
}
char _license[] SEC("license") = "GPL";
In ❶, we declare a volatile constant for the number of sockets in the group. We will initialize this constant before loading the eBPF program into the kernel. In ❷, we define the socket map. We will populate it with the socket file descriptors. In ❸, we randomly select the index of the target socket.3 In ❹, we invoke the bpf_sk_select_reuseport() helper to record our decision. Finally, in ❺, we accept the packet.

Header files If you compile the C source with clang, you get errors due to missing headers. The recommended way to solve this is to generate a vmlinux.h file with bpftool:
$ bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h
Then, include the following headers:4
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
For my 6.17 kernel, the generated vmlinux.h is quite big: 2.7 MiB. Moreover, bpf/bpf_helpers.h is shipped with libbpf. This adds another dependency for users. As the eBPF program is quite small, I prefer to put the strict minimum in vmlinux.h by cherry-picking the definitions I need.

Compilation The eBPF Library for Go ships bpf2go, a tool to compile eBPF programs and to generate some scaffolding code. We create a gen.go file with the following content:
package main
//go:generate go tool bpf2go -tags linux reuseport reuseport_kern.c
After running go generate ./..., we can inspect the resulting objects with readelf and llvm-objdump:
$ readelf -S reuseport_bpfeb.o
There are 14 section headers, starting at offset 0x840:
  [Nr] Name              Type             Address           Offset
[ ]
  [ 3] sk_reuseport      PROGBITS         0000000000000000  00000040
  [ 6] .maps             PROGBITS         0000000000000000  000000c8
  [ 7] license           PROGBITS         0000000000000000  000000e8
[ ]
$ llvm-objdump -S reuseport_bpfeb.o
reuseport_bpfeb.o:  file format elf64-bpf
Disassembly of section sk_reuseport:
0000000000000000 <reuseport_balance_prog>:
;  
       0:   bf 61 00 00 00 00 00 00     r6 = r1
;     __u32 index = bpf_get_prandom_u32() % num_sockets;
       1:   85 00 00 00 00 00 00 07     call 0x7
[ ]

Usage from Go Let's set up 10 workers listening to the same port.5 Each socket enables the SO_REUSEPORT option before binding:6
var (
    err error
    fds []uintptr
    conns []*net.UDPConn
)
workers := 10
listenAddr := "127.0.0.1:0"
listenConfig := net.ListenConfig{
    Control: func(_, _ string, c syscall.RawConn) error {
        c.Control(func(fd uintptr) {
            err = unix.SetsockoptInt(int(fd), unix.SOL_SOCKET, unix.SO_REUSEPORT, 1)
            fds = append(fds, fd)
        })
        return err
    },
}
for range workers {
    pconn, err := listenConfig.ListenPacket(t.Context(), "udp", listenAddr)
    if err != nil {
        t.Fatalf("ListenPacket() error:\n%+v", err)
    }
    udpConn := pconn.(*net.UDPConn)
    listenAddr = udpConn.LocalAddr().String()
    conns = append(conns, udpConn)
}
The second step is to load the eBPF program, initialize the num_sockets variable, populate the socket map, and attach the program to the first socket.7
// Load the eBPF collection.
spec, err := loadReuseport()
if err != nil {
    t.Fatalf("loadVariables() error:\n%+v", err)
}
// Set "num_sockets" global variable to the number of file descriptors we will register
if err := spec.Variables["num_sockets"].Set(uint32(len(fds))); err != nil {
    t.Fatalf("NumSockets.Set() error:\n%+v", err)
}
// Load the map and the program into the kernel.
var objs reuseportObjects
if err := spec.LoadAndAssign(&objs, nil); err != nil {
    t.Fatalf("loadReuseportObjects() error:\n%+v", err)
}
t.Cleanup(func() { objs.Close() })
// Assign the file descriptors to the socket map.
for worker, fd := range fds {
    if err := objs.reuseportMaps.SocketMap.Put(uint32(worker), uint64(fd)); err != nil {
        t.Fatalf("SocketMap.Put() error:\n%+v", err)
    }
}
// Attach the eBPF program to the first socket.
socketFD := int(fds[0])
progFD := objs.reuseportPrograms.ReuseportBalanceProg.FD()
if err := unix.SetsockoptInt(socketFD, unix.SOL_SOCKET, unix.SO_ATTACH_REUSEPORT_EBPF, progFD); err != nil {
    t.Fatalf("SetsockoptInt() error:\n%+v", err)
}
We are now ready to process incoming packets. Each worker is a Go routine incrementing a counter for each received packet:8
var wg sync.WaitGroup
receivedPackets := make([]int, workers)
for worker := range workers {
    conn := conns[worker]
    packets := &receivedPackets[worker]
    wg.Go(func() {
        payload := make([]byte, 9000)
        for {
            if _, err := conn.Read(payload); err != nil {
                if errors.Is(err, net.ErrClosed) {
                    return
                }
                t.Logf("Read() error:\n%+v", err)
            }
            *packets++
        }
    })
}
Let's send 1000 packets:
sentPackets := 1000
conn, err := net.Dial("udp", conns[0].LocalAddr().String())
if err != nil {
    t.Fatalf("Dial() error:\n%+v", err)
}
defer conn.Close()
for range sentPackets {
    if _, err := conn.Write([]byte("hello world!")); err != nil {
        t.Fatalf("Write() error:\n%+v", err)
    }
}
If we print the content of the receivedPackets array, we can check the balancing works as expected, with each worker getting about 100 packets:
=== RUN   TestUDPWorkerBalancing
    balancing_test.go:84: receivedPackets[0] = 107
    balancing_test.go:84: receivedPackets[1] = 92
    balancing_test.go:84: receivedPackets[2] = 99
    balancing_test.go:84: receivedPackets[3] = 105
    balancing_test.go:84: receivedPackets[4] = 107
    balancing_test.go:84: receivedPackets[5] = 96
    balancing_test.go:84: receivedPackets[6] = 102
    balancing_test.go:84: receivedPackets[7] = 105
    balancing_test.go:84: receivedPackets[8] = 99
    balancing_test.go:84: receivedPackets[9] = 88
    balancing_test.go:91: receivedPackets = 1000
    balancing_test.go:92: sentPackets     = 1000

Graceful restart You can also use SO_ATTACH_REUSEPORT_EBPF to gracefully restart an application. A new instance of the application binds to the same address and prepares its own version of the socket map. Once it attaches the eBPF program to the first socket, the kernel steers incoming packets to this new instance. The old instance needs to drain the already received packets before shutting down. To check we are not losing any packets, we spawn a Go routine to send as many packets as possible:
sentPackets := 0
notSentPackets := 0
done := make(chan bool)
conn, err := net.Dial("udp", conns1[0].LocalAddr().String())
if err != nil {
    t.Fatalf("Dial() error:\n%+v", err)
}
defer conn.Close()
go func() {
    for {
        if _, err := conn.Write([]byte("hello world!")); err != nil {
            notSentPackets++
        } else {
            sentPackets++
        }
        select {
        case <-done:
            return
        default:
        }
    }
}()
Then, while the Go routine runs, we start the second set of workers. Once they are running, they start receiving packets. If we gracefully stop the initial set of workers, not a single packet is lost!9
=== RUN   TestGracefulRestart
    graceful_test.go:135: receivedPackets1[0] = 165
    graceful_test.go:135: receivedPackets1[1] = 195
    graceful_test.go:135: receivedPackets1[2] = 194
    graceful_test.go:135: receivedPackets1[3] = 190
    graceful_test.go:135: receivedPackets1[4] = 213
    graceful_test.go:135: receivedPackets1[5] = 187
    graceful_test.go:135: receivedPackets1[6] = 170
    graceful_test.go:135: receivedPackets1[7] = 190
    graceful_test.go:135: receivedPackets1[8] = 194
    graceful_test.go:135: receivedPackets1[9] = 155
    graceful_test.go:139: receivedPackets2[0] = 1631
    graceful_test.go:139: receivedPackets2[1] = 1582
    graceful_test.go:139: receivedPackets2[2] = 1594
    graceful_test.go:139: receivedPackets2[3] = 1611
    graceful_test.go:139: receivedPackets2[4] = 1571
    graceful_test.go:139: receivedPackets2[5] = 1660
    graceful_test.go:139: receivedPackets2[6] = 1587
    graceful_test.go:139: receivedPackets2[7] = 1605
    graceful_test.go:139: receivedPackets2[8] = 1631
    graceful_test.go:139: receivedPackets2[9] = 1689
    graceful_test.go:147: receivedPackets = 18014
    graceful_test.go:148: sentPackets     = 18014
Unfortunately, gracefully shutting down a UDP socket is not trivial in Go.10 Previously, we were terminating workers by closing their sockets. However, if we close them too soon, the application loses packets that were assigned to them but not yet processed. Before stopping, a worker needs to call conn.Read() until there are no more packets. A solution is to set a deadline for conn.Read() and check if we should stop the Go routine when the deadline is exceeded:
payload := make([]byte, 9000)
for {
    conn.SetReadDeadline(time.Now().Add(50 * time.Millisecond))
    if _, err := conn.Read(payload); err != nil {
        if errors.Is(err, os.ErrDeadlineExceeded) {
            select {
            case <-done:
                return
            default:
                continue
            }
        }
        t.Logf("Read() error:\n%+v", err)
    }
    *packets++
}
With TCP, this aspect is simpler: after enabling the net.ipv4.tcp_migrate_req sysctl, the kernel automatically migrates waiting connections to a random socket in the same group. Alternatively, eBPF can also control this migration. Both features are available since Linux 5.14.
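For reference, enabling that kernel-side migration is a one-line sysctl (same name as above, available since Linux 5.14):
sysctl -w net.ipv4.tcp_migrate_req=1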

Addendum After implementing this strategy in Akvorado, all workers now drop packets!
$ curl -s 127.0.0.1:8080/api/v0/inlet/metrics \
>   | sed -n 's/akvorado_inlet_flow_input_udp_in_dropped//p'
packets_total{listener="0.0.0.0:2055",worker="0"} 838673
packets_total{listener="0.0.0.0:2055",worker="1"} 843675
packets_total{listener="0.0.0.0:2055",worker="2"} 837922
packets_total{listener="0.0.0.0:2055",worker="3"} 841443
packets_total{listener="0.0.0.0:2055",worker="4"} 840668
packets_total{listener="0.0.0.0:2055",worker="5"} 850274
packets_total{listener="0.0.0.0:2055",worker="6"} 835488
packets_total{listener="0.0.0.0:2055",worker="7"} 834479
The root cause is the default limit of 32 records for Kafka batch sizes. This limit is too low because the brokers have a large overhead when handling each batch: they need to ensure it is persisted correctly before acknowledging it. Increasing the limit to 4096 records fixes this issue. While load-balancing incoming flows with eBPF remains useful, it did not solve the main issue. At least the even distribution of dropped packets helped identify the real bottleneck.

  1. The current version of the manual page is incomplete and does not cover the evolution introduced in Linux 4.19. There is a pending patch about this.
  2. Rust is another option. However, the program we use is so trivial that it does not make sense to use Rust.
  3. As bpf_get_prandom_u32() returns a pseudo-random 32-bit unsigned value, this method exhibits a very slight bias towards the first indexes. This is unlikely to be worth fixing.
  4. Some examples include <linux/bpf.h> instead of "vmlinux.h". This makes your eBPF program dependent on the installed kernel headers.
  5. listenAddr is initially set to 127.0.0.1:0 to allocate a random port. After the first iteration, it is updated with the allocated port.
  6. This is the setupSockets() function in fixtures_test.go.
  7. This is the setupEBPF() function in fixtures_test.go.
  8. The complete code is in balancing_test.go
  9. The complete code is in graceful_test.go
  10. In C, we would poll() both the socket and a pipe used to signal for shutdown. When the second condition is triggered, we drain the socket by executing a series of non-blocking read() until we get EWOULDBLOCK.

31 December 2025

Freexian Collaborators: How files are stored by Debusine (by Stefano Rivera)

Debusine is a tool designed for Debian developers and Operating System developers in general. This post describes how Debusine stores and manages files. Debusine has been designed to run a network of workers that can perform various tasks that consume and produce "artifacts". The artifact itself is a collection of files structured into an ontology of artifact types. This generic architecture should be suited to many sorts of build & CI problems. We have implemented artifacts to support building a Debian-like distribution, but the foundations of Debusine aim to be more general than that. For example, a package build task takes a debian:source-package as input and produces some debian:binary-packages and a debian:package-build-log as output. This generalized approach is quite different from traditional Debian APT archive implementations, which typically required having the archive contents on the filesystem. Traditionally, most Debian distribution management tasks happen within bespoke applications that cannot share much common infrastructure.

File Stores Debusine's files themselves are stored by the File Store layer. There can be multiple file stores configured, with different policies. Local storage is useful as the initial destination for uploads to Debusine, but it has to be backed up manually and might not scale to sufficiently large volumes of data. Remote storage such as S3 is also available. It is possible to serve a file from any store, with policies for which one to prefer for downloads and uploads. Administrators can set policies for which file stores to use at the scope level, as well as policies for populating and draining stores of files.

Artifacts As mentioned above, files are collected into Artifacts. They combine:
  • a set of files with names (including potentially parent directories)
  • a category, e.g. debian:source-package
  • key-value data in a schema specified by the category and stored as a JSON-encoded dictionary.
Within the stores, files are content-addressed: a file with a given SHA-256 digest is only stored once in any given store, and may be retrieved by that digest. When a new artifact is created, its files are uploaded to Debusine as needed. Some of the files may already be present in the Debusine instance. In that case, if the file is already part of the artifact's workspace, then the client will not need to re-upload the file. But if not, it must be re-uploaded to avoid users obtaining unauthorized access to existing file contents in another private workspace or multi-tenant scope. Because the content-addressing makes storing duplicates cheap, it's common to have artifacts that overlap in files. For example, a debian:upload will contain some of the same files as the related debian:source-package, as well as the .changes file. Looking at the debusine.debian.net instance that we run, we can see a content-addressing saving of 629 GiB across our (currently) 2 TiB file store. This is somewhat inflated by the Debian Archive import, which did not need to bother to share artifacts between suites. But it still shows reasonable real-world savings.
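As a general illustration of content-addressed storage (a minimal sketch of the concept, not Debusine's actual code; the store path is hypothetical), a file is stored once per digest and later retrieved by that digest:
#!/bin/sh
# Minimal content-addressed store: one copy per SHA-256 digest, so duplicates cost nothing extra.
set -eu
store=/srv/filestore                           # hypothetical store root
file=$1
digest=$(sha256sum "$file" | awk '{print $1}')
mkdir -p "$store"
[ -e "$store/$digest" ] || cp "$file" "$store/$digest"
echo "$digest"                                 # callers retrieve the content by this digest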

APT Repository Representation Unlike a traditional Debian APT repository management tool, the source package and binary packages are not stored directly in the pool of an APT repository on disk on the Debusine server. Instead we abstract the repository into a debian:suite collection within the Debusine database. The collection contains the artifacts that make up the APT repository. To ensure that it can be safely represented as a valid URL structure (or as files on disk), the suite collection maintains an index of the pool filenames of its artifacts. Suite collections can be combined into a debian:archive collection that shares a common file pool. Debusine collections can keep a historical record of when things were added and removed. This, combined with the database-backed, collection-driven repository representation, makes it very easy to provide APT-consumable snapshot views of every point in a repository's history.

Expiry While a published distribution probably wants to keep the full history of all its package builds, we don't need to retain all of the output of all QA tasks that were run. Artifacts can have an expiration delay or inherit one from their workspace. Once this delay has expired, artifacts which are not being held in any collection are eligible to be automatically cleaned up. QA work that is done in a workspace that has automatic artifact expiry, and isn't publishing the results to an APT suite, will safely automatically expire.

Daily Vacuum A daily vacuum task handles all of the file periodic maintenance for file stores. It does some cleanup of working areas, a scan for unreferenced & missing files, and enforces file store policies. The policy work could be copying files for backup or moving files between stores to keep them within size limits (e.g. from a local upload store into a general cloud store).

In Conclusion Debusine provides abstractions for low-level file storage and object collections. This allows storage to be scalable beyond a single filesystem and highly available. Using content-addressed storage minimizes data duplication within a Debusine instance. For Debian distributions, storing the archive metadata entirely in a database made providing built-in snapshot support easy in Debusine.

22 December 2025

C.J. Collier: I'm learning about perlguts today.


im-learning-about-perlguts-today.png


## 0.23	2025-12-20
commit be15aa25dea40aea66a8534143fb81b29d2e6c08
Author: C.J. Collier 
Date:   Sat Dec 20 22:40:44 2025 +0000
    Fixes C-level test infrastructure and adds more test cases for upb_to_sv conversions.
    
    - **Makefile.PL:**
        - Allow  extra_src  in  c_test_config.json  to be an array.
        - Add ASan flags to CCFLAGS and LDDLFLAGS for better debugging.
        - Corrected echo newlines in  test_c  target.
    - **c_test_config.json:**
        - Added missing type test files to  deps  and  extra_src  for  convert/sv_to_upb  and  convert/upb_to_sv  test runners.
    - **t/c/convert/upb_to_sv.c:**
        - Fixed a double free of  test_pool .
        - Added missing includes for type test headers.
        - Updated test plan counts.
    - **t/c/convert/sv_to_upb.c:**
        - Added missing includes for type test headers.
        - Updated test plan counts.
        - Corrected Perl interpreter initialization.
    - **t/c/convert/types/**:
        - Added missing  test_util.h  include in new type test headers.
        - Completed the set of  upb_to_sv  test cases for all scalar types by adding optional and repeated tests for  sfixed32 ,  sfixed64 ,  sint32 , and  sint64 , and adding repeated tests to the remaining scalar type files.
    - **Documentation:**
        - Updated  01-xs-testing.md  with more debugging tips, including ASan usage and checking for double frees and typos.
        - Updated  xs_learnings.md  with details from the recent segfault.
        - Updated  llm-plan-execution-instructions.md  to emphasize debugging steps.
## 0.22	2025-12-19
commit 2c171d9a5027e0150eae629729c9104e7f6b9d2b
Author: C.J. Collier 
Date:   Fri Dec 19 23:41:02 2025 +0000
    feat(perl,testing): Initialize C test framework and build system
    
    This commit sets up the foundation for the C-level tests and the build system for the Perl Protobuf module:
    
    1.  **Makefile.PL Enhancements:**
        *   Integrates  Devel::PPPort  to generate  ppport.h  for better portability.
        *   Object files now retain their path structure (e.g.,  xs/convert/sv_to_upb.o ) instead of being flattened, improving build clarity.
        *   The  MY::postamble  is significantly revamped to dynamically generate build rules for all C tests located in  t/c/  based on the  t/c/c_test_config.json  file.
        *   C tests are linked against  libprotobuf_common.a  and use  ExtUtils::Embed  flags.
        *   Added  JSON::MaybeXS  to  PREREQ_PM .
        *   The  test  target now also depends on the  test_c  target.
    
    2.  **C Test Infrastructure ( t/c/ ):
        *   Introduced  t/c/c_test_config.json  to configure individual C test builds, specifying dependencies and extra source files.
        *   Created  t/c/convert/test_util.c  and  .h  for shared test functions like loading descriptors.
        *   Initial  t/c/convert/upb_to_sv.c  and  t/c/convert/sv_to_upb.c  test runners.
        *   Basic  t/c/integration/030_protobuf_coro.c  for Coro safety testing on core utils using  libcoro .
        *   Basic  t/c/integration/035_croak_test.c  for testing exception handling.
        *   Basic  t/c/integration/050_convert.c  for integration testing conversions.
    
    3.  **Test Proto:** Updated  t/data/test.proto  with more field types for conversion testing and regenerated  test_descriptor.bin .
    
    4.  **XS Test Harness ( t/c/upb-perl-test.h ):** Added  like_n  macro for length-aware regex matching.
    
    5.  **Documentation:** Updated architecture and plan documents to reflect the C test structure.
    6.  **ERRSV Testing:** Note that the C tests ( t/c/ ) will primarily check *if* a  croak  occurs (i.e., that the exception path is taken), but will not assert on the string content of  ERRSV . Reliably testing  $@  content requires the full Perl test environment with  Test::More , which will be done in the  .t  files when testing the Perl API.
    
    This provides a solid base for developing and testing the XS and C components of the module.
## 0.21	2025-12-18
commit a8b6b6100b2cf29c6df1358adddb291537d979bc
Author: C.J. Collier 
Date:   Thu Dec 18 04:20:47 2025 +0000
    test(C): Add integration tests for Milestone 2 components
    
    - Created t/c/integration/030_protobuf.c to test interactions
      between obj_cache, arena, and utils.
    - Added this test to t/c/c_test_config.json.
    - Verified that all C tests for Milestones 2 and 3 pass,
      including the libcoro-based stress test.
## 0.20	2025-12-18
commit 0fcad68680b1f700a83972a7c1c48bf3a6958695
Author: C.J. Collier 
Date:   Thu Dec 18 04:14:04 2025 +0000
    docs(plan): Add guideline review reminders to milestones
    
    - Added a "[ ] REFRESH: Review all documents in @perl/doc/guidelines/**"
      checklist item to the start of each component implementation
      milestone (C and Perl layers).
    - This excludes Integration Test milestones.
## 0.19	2025-12-18
commit 987126c4b09fcdf06967a98fa3adb63d7de59a34
Author: C.J. Collier 
Date:   Thu Dec 18 04:05:53 2025 +0000
    docs(plan): Add C-level and Perl-level Coro tests to milestones
    
    - Added checklist items for  libcoro -based C tests
      (e.g.,  t/c/integration/050_convert_coro.c ) to all C layer
      integration milestones (050 through 220).
    - Updated  030_Integration_Protobuf.md  to standardise checklist
      items for the existing  030_protobuf_coro.c  test.
    - Removed the single  xt/author/coro-safe.t  item from
       010_Build.md .
    - Added checklist items for Perl-level  Coro  tests
      (e.g.,  xt/coro/240_arena.t ) to each Perl layer
      integration milestone (240 through 400).
    - Created  perl/t/c/c_test_config.json  to manage C test
      configurations externally.
    - Updated  perl/doc/architecture/testing/01-xs-testing.md  to describe
      both C-level  libcoro  and Perl-level  Coro  testing strategies.
## 0.18	2025-12-18
commit 6095a5a610401a6035a81429d0ccb9884d53687b
Author: C.J. Collier 
Date:   Thu Dec 18 02:34:31 2025 +0000
    added coro testing to c layer milestones
## 0.17	2025-12-18
commit cc0aae78b1f7f675fc8a1e99aa876c0764ea1cce
Author: C.J. Collier 
Date:   Thu Dec 18 02:26:59 2025 +0000
    docs(plan): Refine test coverage checklist items for SMARTness
    
    - Updated the "Tests provide full coverage" checklist items in
      C layer plan files (020, 040, 060, 080, 100, 120, 140, 160, 180, 200)
      to explicitly mention testing all public functions in the
      corresponding header files.
    - Expanded placeholder checklists in 140, 160, 180, 200.
    - Updated the "Tests provide full coverage" and "Add coverage checks"
      checklist items in Perl layer plan files (230, 250, 270, 290, 310, 330,
      350, 370, 390) to be more specific about the scope of testing
      and the use of  Test::TestCoverage .
    - Expanded Well-Known Types milestone (350) to detail each type.
## 0.16	2025-12-18
commit e4b601f14e3817a17b0f4a38698d981dd4cb2818
Author: C.J. Collier 
Date:   Thu Dec 18 02:07:35 2025 +0000
    docs(plan): Full refactoring of C and Perl plan files
    
    - Split both ProtobufPlan-C.md and ProtobufPlan-Perl.md into
      per-milestone files under the  perl/doc/plan/  directory.
    - Introduced Integration Test milestones after each component
      milestone in both C and Perl plans.
    - Numbered milestone files sequentially (e.g., 010_Build.md,
      230_Perl_Arena.md).
    - Updated main ProtobufPlan-C.md and ProtobufPlan-Perl.md to
      act as Tables of Contents.
    - Ensured consistent naming for integration test files
      (e.g.,  t/c/integration/030_protobuf.c ,  t/integration/260_descriptor_pool.t ).
    - Added architecture review steps to the end of all milestones.
    - Moved Coro safety test to C layer Milestone 1.
    - Updated Makefile.PL to support new test structure and added Coro.
    - Moved and split t/c/convert.c into t/c/convert/*.c.
    - Moved other t/c/*.c tests into t/c/protobuf/*.c.
    - Deleted old t/c/convert.c.
## 0.15	2025-12-17
commit 649cbacf03abb5e7293e3038bb451c0406e9d0ce
Author: C.J. Collier 
Date:   Wed Dec 17 23:51:22 2025 +0000
    docs(plan): Refactor and reset ProtobufPlan.md
    
    - Split the plan into ProtobufPlan-C.md and ProtobufPlan-Perl.md.
    - Reorganized milestones to clearly separate C layer and Perl layer development.
    - Added more granular checkboxes for each component:
      - C Layer: Create test, Test coverage, Implement, Tests pass.
      - Perl Layer: Create test, Test coverage, Implement Module/XS, Tests pass, C-Layer adjustments.
    - Reset all checkboxes to  [ ]  to prepare for a full audit.
    - Updated status in architecture/api and architecture/core documents to "Not Started".
    
    feat(obj_cache): Add unregister function and enhance tests
    
    - Added  protobuf_unregister_object  to  xs/protobuf/obj_cache.c .
    - Updated  xs/protobuf/obj_cache.h  with the new function declaration.
    - Expanded tests in  t/c/protobuf_obj_cache.c  to cover unregistering,
      overwriting keys, and unregistering non-existent keys.
    - Corrected the test plan count in  t/c/protobuf_obj_cache.c  to 17.
## 0.14	2025-12-17
commit 40b6ad14ca32cf16958d490bb575962f88d868a1
Author: C.J. Collier 
Date:   Wed Dec 17 23:18:27 2025 +0000
    feat(arena): Complete C layer for Arena wrapper
    
    This commit finalizes the C-level implementation for the Protobuf::Arena wrapper.
    
    - Adds  PerlUpb_Arena_Destroy  for proper cleanup from Perl's DEMOLISH.
    - Enhances error checking in  PerlUpb_Arena_Get .
    - Expands C-level tests in  t/c/protobuf_arena.c  to cover memory allocation
      on the arena and lifecycle through  PerlUpb_Arena_Destroy .
    - Corrects embedded Perl initialization in the C test.
    
    docs(plan): Refactor ProtobufPlan.md
    
    - Restructures the development plan to clearly separate "C Layer" and
      "Perl Layer" tasks within each milestone.
    - This aligns the plan with the "C-First Implementation Strategy" and improves progress tracking.
## 0.13	2025-12-17
commit c1e566c25f62d0ae9f195a6df43b895682652c71
Author: C.J. Collier 
Date:   Wed Dec 17 22:00:40 2025 +0000
    refactor(perl): Rename C tests and enhance Makefile.PL
    
    - Renamed test files in  t/c/  to better match the  xs  module structure:
        -  01-cache.c  ->  protobuf_obj_cache.c 
        -  02-arena.c  ->  protobuf_arena.c 
        -  03-utils.c  ->  protobuf_utils.c 
        -  04-convert.c  ->  convert.c 
        -  load_test.c  ->  upb_descriptor_load.c 
    - Updated  perl/Makefile.PL  to reflect the new test names in  MY::postamble 's  $c_test_config .
    - Refactored the  $c_test_config  generation in  Makefile.PL  to reduce repetition by using a default flags hash and common dependencies array.
    - Added a  fail()  macro to  perl/t/c/upb-perl-test.h  for consistency.
    - Modified  t/c/upb_descriptor_load.c  to use the  t/c/upb-perl-test.h  macros, making its output consistent with other C tests.
    - Added a skeleton for  t/c/convert.c  to test the conversion functions.
    - Updated documentation in  ProtobufPlan.md  and  architecture/testing/01-xs-testing.md  to reflect new test names.
## 0.12	2025-12-17
commit d8cb5dd415c6c129e71cd452f78e29de398a82c9
Author: C.J. Collier 
Date:   Wed Dec 17 20:47:38 2025 +0000
    feat(perl): Refactor XS code into subdirectories
    
    This commit reorganizes the C code in the  perl/xs/  directory into subdirectories, mirroring the structure of the Python UPB extension. This enhances modularity and maintainability.
    
    - Created subdirectories for each major component:  convert ,  descriptor ,  descriptor_containers ,  descriptor_pool ,  extension_dict ,  map ,  message ,  protobuf ,  repeated , and  unknown_fields .
    - Created skeleton  .h  and  .c  files within each subdirectory to house the component-specific logic.
    - Updated top-level component headers (e.g.,  perl/xs/descriptor.h ) to include the new sub-headers.
    - Updated top-level component source files (e.g.,  perl/xs/descriptor.c ) to include their main header and added stub initialization functions (e.g.,  PerlUpb_InitDescriptor ).
    - Moved code from the original  perl/xs/protobuf.c  to new files in  perl/xs/protobuf/  (arena, obj_cache, utils).
    - Moved code from the original  perl/xs/convert.c  to new files in  perl/xs/convert/  (upb_to_sv, sv_to_upb).
    - Updated  perl/Makefile.PL  to use a glob ( xs/*/*.c ) to find the new C source files in the subdirectories.
    - Added  perl/doc/architecture/core/07-xs-file-organization.md  to document the new structure.
    - Updated  perl/doc/ProtobufPlan.md  and other architecture documents to reference the new organization.
    - Corrected self-referential includes in the newly created .c files.
    
    This restructuring provides a solid foundation for further development and makes it easier to port logic from the Python implementation.
## 0.11	2025-12-17
commit cdedcd13ded4511b0464f5d3bdd72ce6d34e73fc
Author: C.J. Collier 
Date:   Wed Dec 17 19:57:52 2025 +0000
    feat(perl): Implement C-first testing and core XS infrastructure
    
    This commit introduces a significant refactoring of the Perl XS extension, adopting a C-first development approach to ensure a robust foundation.
    
    Key changes include:
    
    -   **C-Level Testing Framework:** Established a C-level testing system in  t/c/  with a dedicated Makefile, using an embedded Perl interpreter. Initial tests cover the object cache ( 01-cache.c ), arena wrapper ( 02-arena.c ), and utility functions ( 03-utils.c ).
    -   **Core XS Infrastructure:**
        -   Implemented a global object cache ( xs/protobuf.c ) to manage Perl wrappers for UPB objects, using weak references.
        -   Created an  upb_Arena  wrapper ( xs/protobuf.c ).
        -   Consolidated common XS helper functions into  xs/protobuf.h  and  xs/protobuf.c .
    -   **Makefile.PL Enhancements:** Updated to support building and linking C tests, incorporating flags from  ExtUtils::Embed , and handling both  .c  and  .cc  source files.
    -   **XS File Reorganization:** Restructured XS files to mirror the Python UPB extension's layout (e.g.,  message.c ,  descriptor.c ). Removed older, monolithic  .xs  files.
    -   **Typemap Expansion:** Added extensive typemap entries in  perl/typemap  to handle conversions between Perl objects and various  const upb_*Def*  pointers.
    -   **Descriptor Tests:** Added a new test suite  t/02-descriptor.t  to validate descriptor loading and accessor methods.
    -   **Documentation:** Updated development plans and guidelines ( ProtobufPlan.md ,  xs_learnings.md , etc.) to reflect the C-first strategy, new testing methods, and lessons learned.
    -   **Build Cleanup:** Removed  ppport.h  from  .gitignore  as it's no longer used, due to  -DPERL_NO_PPPORT  being set in  Makefile.PL .
    
    This C-first approach allows for more isolated and reliable testing of the core logic interacting with the UPB library before higher-level Perl APIs are built upon it.
## 0.10	2025-12-17
commit 1ef20ade24603573905cb0376670945f1ab5d829
Author: C.J. Collier 
Date:   Wed Dec 17 07:08:29 2025 +0000
    feat(perl): Implement C-level tests and core XS utils
    
    This commit introduces a C-level testing framework for the XS layer and implements key components:
    
    1.  **C-Level Tests ( t/c/ )**:
        *   Added  t/c/Makefile  to build standalone C tests.
        *   Created  t/c/upb-perl-test.h  with macros for TAP-compliant C tests ( plan ,  ok ,  is ,  is_string ,  diag ).
        *   Implemented  t/c/01-cache.c  to test the object cache.
        *   Implemented  t/c/02-arena.c  to test  Protobuf::Arena  wrappers.
        *   Implemented  t/c/03-utils.c  to test string utility functions.
        *   Corrected include paths and diagnostic messages in C tests.
    
    2.  **XS Object Cache ( xs/protobuf.c )**:
        *   Switched to using stringified pointers ( %p ) as hash keys for stability.
        *   Fixed a critical double-free bug in  PerlUpb_ObjCache_Delete  by removing an extra  SvREFCNT_dec  on the lookup key.
    
    3.  **XS Arena Wrapper ( xs/protobuf.c )**:
        *   Corrected  PerlUpb_Arena_New  to use  newSVrv  and  PTR2IV  for opaque object wrapping.
        *   Corrected  PerlUpb_Arena_Get  to safely unwrap the arena pointer.
    
    4.  **Makefile.PL ( perl/Makefile.PL )**:
        *   Added  -Ixs  to  INC  to allow C tests to find  t/c/upb-perl-test.h  and  xs/protobuf.h .
        *   Added  LIBS  to link  libprotobuf_common.a  into the main  Protobuf.so .
        *   Added C test targets  01-cache ,  02-arena ,  03-utils  to the test config in  MY::postamble .
    
    5.  **Protobuf.pm ( perl/lib/Protobuf.pm )**:
        *   Added  use XSLoader;  to load the compiled XS code.
    
    6.  **New files  xs/util.h **:
        *   Added initial type conversion function.
    
    These changes establish a foundation for testing the C-level interface with UPB and fix crucial bugs in the object cache implementation.
## 0.09	2025-12-17
commit 07d61652b032b32790ca2d3848243f9d75ea98f4
Author: C.J. Collier 
Date:   Wed Dec 17 04:53:34 2025 +0000
    feat(perl): Build system and C cache test for Perl XS
    
    This commit introduces the foundational pieces for the Perl XS implementation, focusing on the build system and a C-level test for the object cache.
    
    -   **Makefile.PL:**
        -   Refactored C test compilation rules in  MY::postamble  to use a hash ( $c_test_config ) for better organization and test-specific flags.
        -   Integrated  ExtUtils::Embed  to provide necessary compiler and linker flags for embedding the Perl interpreter, specifically for the  t/c/01-cache.c  test.
        -   Correctly constructs the path to the versioned Perl library ( libperl.so.X.Y.Z ) using  $Config{archlib}  and  $Config{libperl}  to ensure portability.
        -   Removed  VERSION_FROM  and  ABSTRACT_FROM  to avoid dependency on  .pm  files for now.
    
    -   **C Cache Test (t/c/01-cache.c):**
        -   Added a C test to exercise the object cache functions implemented in  xs/protobuf.c .
        -   Includes tests for adding, getting, deleting, and weak reference behavior.
    
    -   **XS Cache Implementation (xs/protobuf.c, xs/protobuf.h):**
        -   Implemented  PerlUpb_ObjCache_Init ,  PerlUpb_ObjCache_Add ,  PerlUpb_ObjCache_Get ,  PerlUpb_ObjCache_Delete , and  PerlUpb_ObjCache_Destroy .
        -   Uses a Perl hash ( HV* ) for the cache.
        -   Keys are string representations of the C pointers, created using  snprintf  with  "%llx" .
        -   Values are weak references ( sv_rvweaken ) to the Perl objects ( SV* ).
        -    PerlUpb_ObjCache_Get  now correctly returns an incremented reference to the original SV, not a copy.
        -    PerlUpb_ObjCache_Destroy  now clears the hash before decrementing its refcount.
    
    -   **t/c/upb-perl-test.h:**
        -   Updated  is_sv  to perform direct pointer comparison ( got == expected ).
    
    -   **Minor:** Added  util.h  (currently empty), updated  typemap .
    
    These changes establish a working C-level test environment for the XS components.
## 0.08	2025-12-17
commit d131fd22ea3ed8158acb9b0b1fe6efd856dc380e
Author: C.J. Collier 
Date:   Wed Dec 17 02:57:48 2025 +0000
    feat(perl): Update docs and core XS files
    
    - Explicitly add TDD cycle to ProtobufPlan.md.
    - Clarify mirroring of Python implementation in upb-interfacing.md for both C and Perl layers.
    - Branch and adapt python/protobuf.h and python/protobuf.c to perl/xs/protobuf.h and perl/xs/protobuf.c, including the object cache implementation. Removed old cache.* files.
    - Create initial C test for the object cache in t/c/01-cache.c.
## 0.07	2025-12-17
commit 56fd6862732c423736a2f9a9fb1a2816fc59e9b0
Author: C.J. Collier 
Date:   Wed Dec 17 01:09:18 2025 +0000
    feat(perl): Align Perl UPB architecture docs with Python
    
    Updates the Perl Protobuf architecture documents to more closely align with the design and implementation strategies used in the Python UPB extension.
    
    Key changes:
    
    -   **Object Caching:** Mandates a global, per-interpreter cache using weak references for all UPB-derived objects, mirroring Python's  PyUpb_ObjCache .
    -   **Descriptor Containers:** Introduces a new document outlining the plan to use generic XS container types (Sequence, ByNameMap, ByNumberMap) with vtables to handle collections of descriptors, similar to Python's  descriptor_containers.c .
    -   **Testing:** Adds a note to the testing strategy to port relevant test cases from the Python implementation to ensure feature parity.
## 0.06	2025-12-17
commit 6009ce6ab64eccce5c48729128e5adf3ef98e9ae
Author: C.J. Collier 
Date:   Wed Dec 17 00:28:20 2025 +0000
    feat(perl): Implement object caching and fix build
    
    This commit introduces several key improvements to the Perl XS build system and core functionality:
    
    1.  **Object Caching:**
        *   Introduces  xs/protobuf.c  and  xs/protobuf.h  to implement a caching mechanism ( protobuf_c_to_perl_obj ) for wrapping UPB C pointers into Perl objects. This uses a hash and weak references to ensure object identity and prevent memory leaks.
        *   Updates the  typemap  to use  protobuf_c_to_perl_obj  for  upb_MessageDef *  output, ensuring descriptor objects are cached.
        *   Corrected  sv_weaken  to the correct  sv_rvweaken  function.
    
    2.  **Makefile.PL Enhancements:**
        *   Switched to using the Bazel-generated UPB descriptor sources from  bazel-bin/src/google/protobuf/_virtual_imports/descriptor_proto/google/protobuf/ .
        *   Updated  INC  paths to correctly locate the generated headers.
        *   Refactored  MY::dynamic_lib  to ensure the static library  libprotobuf_common.a  is correctly linked into each generated  .so  module, resolving undefined symbol errors.
        *   Overrode  MY::test  to use  prove -b -j$(nproc) t/*.t xt/*.t  for running tests.
        *   Cleaned up  LIBS  and  LDDLFLAGS  usage.
    
    3.  **Documentation:**
        *   Updated  ProtobufPlan.md  to reflect the current status and design decisions.
        *   Reorganized architecture documents into subdirectories.
        *   Added  object-caching.md  and  c-perl-interface.md .
        *   Updated  llm-guidance.md  with notes on  upb/upb.h  and  sv_rvweaken .
    
    4.  **Testing:**
        *   Fixed  xt/03-moo_immutable.t  to skip tests if no Moo modules are found.
    
    This resolves the build issues and makes the core test suite pass.
## 0.05	2025-12-16
commit 177d2f3b2608b9d9c415994e076a77d8560423b8
Author: C.J. Collier 
Date:   Tue Dec 16 19:51:36 2025 +0000
    Refactor: Rename namespace to Protobuf, build system and doc updates
    
    This commit refactors the primary namespace from  ProtoBuf  to  Protobuf 
    to align with the style guide. This involves renaming files, directories,
    and updating package names within all Perl and XS files.
    
    **Namespace Changes:**
    
    *   Renamed  perl/lib/ProtoBuf  to  perl/lib/Protobuf .
    *   Moved and updated  ProtoBuf.pm  to  Protobuf.pm .
    *   Moved and updated  ProtoBuf::Descriptor  to  Protobuf::Descriptor  (.pm & .xs).
    *   Removed other  ProtoBuf::*  stubs (Arena, DescriptorPool, Message).
    *   Updated  MODULE  and  PACKAGE  in  Descriptor.xs .
    *   Updated  NAME ,  *_FROM  in  perl/Makefile.PL .
    *   Replaced  ProtoBuf  with  Protobuf  throughout  perl/typemap .
    *   Updated namespaces in test files  t/01-load-protobuf-descriptor.t  and  t/02-descriptor.t .
    *   Updated namespaces in all documentation files under  perl/doc/ .
    *   Updated paths in  perl/.gitignore .
    
    **Build System Enhancements (Makefile.PL):**
    
    *   Included  xs/*.c  files in the common object files list.
    *   Added  -I.  to the  INC  paths.
    *   Switched from  MYEXTLIB  to  LIBS => ['-L$(CURDIR) -lprotobuf_common']  for linking.
    *   Removed custom keys passed to  WriteMakefile  for postamble.
    *    MY::postamble  now sources variables directly from the main script scope.
*   Added  all :: $(common_lib)  dependency in  MY::postamble .
    *   Added  t/c/load_test.c  compilation rule in  MY::postamble .
    *   Updated  clean  target to include  blib .
    *   Added more modules to  TEST_REQUIRES .
    *   Removed the explicit  PM  and  XS  keys from  WriteMakefile , relying on  XSMULTI => 1 .
    
    **New Files:**
    
    *    perl/lib/Protobuf.pm 
    *    perl/lib/Protobuf/Descriptor.pm 
    *    perl/lib/Protobuf/Descriptor.xs 
    *    perl/t/01-load-protobuf-descriptor.t 
    *    perl/t/02-descriptor.t 
    *    perl/t/c/load_test.c : Standalone C test for UPB.
    *    perl/xs/types.c  &  perl/xs/types.h : For Perl/C type conversions.
    *    perl/doc/architecture/upb-interfacing.md 
    *    perl/xt/03-moo_immutable.t : Test for Moo immutability.
    
    **Deletions:**
    
    *   Old test files:  t/00_load.t ,  t/01_basic.t ,  t/02_serialize.t ,  t/03_message.t ,  t/04_descriptor_pool.t ,  t/05_arena.t ,  t/05_message.t .
    *   Removed  lib/ProtoBuf.xs  as it's not needed with  XSMULTI .
    
    **Other:**
    
    *   Updated  test_descriptor.bin  (binary change).
    *   Significant content updates to markdown documentation files in  perl/doc/architecture  and  perl/doc/internal  reflecting the new architecture and learnings.
## 0.04	2025-12-14
commit 92de5d482c8deb9af228f4b5ce31715d3664d6ee
Author: C.J. Collier 
Date:   Sun Dec 14 21:28:19 2025 +0000
    feat(perl): Implement Message object creation and fix lifecycles
    
    This commit introduces the basic structure for  ProtoBuf::Message  object
    creation, linking it with  ProtoBuf::Descriptor  and  ProtoBuf::DescriptorPool ,
    and crucially resolves a SEGV by fixing object lifecycle management.
    
    Key Changes:
    
    1.  ** ProtoBuf::Descriptor :** Added  _pool  attribute to hold a strong
        reference to the parent  ProtoBuf::DescriptorPool . This is essential to
        prevent the pool and its C  upb_DefPool  from being garbage collected
        while a descriptor is still in use.
    
    2.  ** ProtoBuf::DescriptorPool :**
        *    find_message_by_name : Now passes the  $self  (the pool object) to the
             ProtoBuf::Descriptor  constructor to establish the lifecycle link.
        *   XSUB  pb_dp_find_message_by_name : Updated to accept the pool  SV*  and
            store it in the descriptor's  _pool  attribute.
        *   XSUB  _load_serialized_descriptor_set : Renamed to avoid clashing with the
            Perl method name. The Perl wrapper now correctly calls this internal XSUB.
        *    DEMOLISH : Made safer by checking for attribute existence.
    
    3.  ** ProtoBuf::Message :**
        *   Implemented using Moo with lazy builders for  _upb_arena  and
             _upb_message .
        *    _descriptor  is a required argument to  new() .
        *   XS functions added for creating the arena ( pb_msg_create_arena ) and
            the  upb_Message  ( pb_msg_create_upb_message ).
        *    pb_msg_create_upb_message  now extracts the  upb_MessageDef*  from the
            descriptor and uses  upb_MessageDef_MiniTable()  to get the minitable
            for  upb_Message_New() .
        *    DEMOLISH : Added to free the message's arena.
    
    4.  ** Makefile.PL :**
        *   Added  -g  to  CCFLAGS  for debugging symbols.
        *   Added Perl CORE include path to  MY::postamble 's  base_flags .
    
    5.  **Tests:**
        *    t/04_descriptor_pool.t : Updated to check the structure of the
            returned  ProtoBuf::Descriptor .
        *    t/05_message.t : Now uses a descriptor obtained from a real pool to
            test  ProtoBuf::Message->new() .
    
    6.  **Documentation:**
        *   Updated  ProtobufPlan.md  to reflect progress.
        *   Updated several files in  doc/architecture/  to match the current
            implementation details, especially regarding arena management and object
            lifecycles.
        *   Added  doc/internal/development_cycle.md  and  doc/internal/xs_learnings.md .
    
    With these changes, the SEGV is resolved, and message objects can be successfully
    created from descriptors.
## 0.03	2025-12-14
commit 6537ad23e93680c2385e1b571d84ed8dbe2f68e8
Author: C.J. Collier 
Date:   Sun Dec 14 20:23:41 2025 +0000
    Refactor(perl): Object-Oriented DescriptorPool with Moo
    
    This commit refactors the  ProtoBuf::DescriptorPool  to be fully object-oriented using Moo, and resolves several issues related to XS, typemaps, and test data.
    
    Key Changes:
    
    1.  **Moo Object:**  ProtoBuf::DescriptorPool.pm  now uses  Moo  to define the class. The  upb_DefPool  pointer is stored as a lazy attribute  _upb_defpool .
    2.  **XS Lifecycle:**  DescriptorPool.xs  now has  pb_dp_create_pool  called by the Moo builder and  pb_dp_free_pool  called from  DEMOLISH  to manage the  upb_DefPool  lifecycle per object.
    3.  **Typemap:** The  perl/typemap  file has been significantly updated to handle the conversion between the  ProtoBuf::DescriptorPool  Perl object and the  upb_DefPool *  C pointer. This includes:
        *   Mapping  upb_DefPool *  to  T_PTR .
        *   An  INPUT  section for  ProtoBuf::DescriptorPool  to extract the pointer from the object's hash, triggering the lazy builder if needed via  call_method .
        *   An  OUTPUT  section for  upb_DefPool *  to convert the pointer back to a Perl integer, used by the builder.
    4.  **Method Renaming:**  add_file_descriptor_set_binary  is now  load_serialized_descriptor_set .
    5.  **Test Data:**
        *   Added  perl/t/data/test.proto  with a sample message and enum.
        *   Generated  perl/t/data/test_descriptor.bin  using  protoc .
        *   Removed  t/data/  from  .gitignore  to ensure test data is versioned.
    6.  **Test Update:**  t/04_descriptor_pool.t  is updated to use the new OO interface, load the generated descriptor set, and check for message definitions.
    7.  **Build Fixes:**
        *   Corrected  #include  paths in  DescriptorPool.xs  to be relative to the  upb/  directory (e.g.,  upb/wire/decode.h ).
        *   Added  -I../upb  to  CCFLAGS  in  Makefile.PL .
        *   Reordered  INC  paths in  Makefile.PL  to prioritize local headers.
    
    **Note:** While tests now pass in some environments, a SEGV issue persists in  make test  runs, indicating a potential memory or lifecycle issue within the XS layer that needs further investigation.
## 0.02	2025-12-14
commit 6c9a6f1a5f774dae176beff02219f504ea3a6e07
Author: C.J. Collier 
Date:   Sun Dec 14 20:13:09 2025 +0000
    Fix(perl): Correct UPB build integration and generated file handling
    
    This commit resolves several issues to achieve a successful build of the Perl extension:
    
    1.  **Use Bazel Generated Files:** Switched from compiling UPB's stage0 descriptor.upb.c to using the Bazel-generated  descriptor.upb.c  and  descriptor.upb_minitable.c  located in  bazel-bin/src/google/protobuf/_virtual_imports/descriptor_proto/google/protobuf/ .
    2.  **Updated Include Paths:** Added the  bazel-bin  path to  INC  in  WriteMakefile  and to  base_flags  in  MY::postamble  to ensure the generated headers are found during both XS and static library compilation.
    3.  **Removed Stage0:** Removed references to  UPB_STAGE0_DIR  and no longer include headers or source files from  upb/reflection/stage0/ .
    4.  **-fPIC:** Explicitly added  -fPIC  to  CCFLAGS  in  WriteMakefile  and ensured  $(CCFLAGS)  is used in the custom compilation rules in  MY::postamble . This guarantees all object files in the static library are compiled with position-independent code, resolving linker errors when creating the shared objects for the XS modules.
    5.  **Refined UPB Sources:** Used  File::Find  to recursively find UPB C sources, excluding  /conformance/  and  /reflection/stage0/  to avoid conflicts and unnecessary compilations.
    6.  **Arena Constructor:** Modified  ProtoBuf::Arena::pb_arena_new  XSUB to accept the class name argument passed from Perl, making it a proper constructor.
    7.  **.gitignore:** Added patterns to  perl/.gitignore  to ignore generated C files from XS ( lib/*.c ,  lib/ProtoBuf/*.c ), the copied  src_google_protobuf_descriptor.pb.cc , and the  t/data  directory.
    8.  **Build Documentation:** Updated  perl/doc/architecture/upb-build-integration.md  to reflect the new build process, including the Bazel prerequisite, include paths,  -fPIC  usage, and  File::Find .
    
    Build Steps:
    1.   bazel build //src/google/protobuf:descriptor_upb_proto  (from repo root)
    2.   cd perl 
    3.   perl Makefile.PL 
    4.   make 
    5.   make test  (Currently has expected failures due to missing test data implementation).
## 0.01	2025-12-14
commit 3e237e8a26442558c94075766e0d4456daaeb71d
Author: C.J. Collier 
Date:   Sun Dec 14 19:34:28 2025 +0000
    feat(perl): Initialize Perl extension scaffold and build system
    
    This commit introduces the  perl/  directory, laying the groundwork for the Perl Protocol Buffers extension. It includes the essential build files, linters, formatter configurations, and a vendored Devel::PPPort for XS portability.
    
    Key components added:
    
    *   ** Makefile.PL **: The core  ExtUtils::MakeMaker  build script. It's configured to:
        *   Build a static library ( libprotobuf_common.a ) from UPB, UTF8_Range, and generated protobuf C/C++ sources.
        *   Utilize  XSMULTI => 1  to create separate shared objects for  ProtoBuf ,  ProtoBuf::Arena , and  ProtoBuf::DescriptorPool .
        *   Link each XS module against the common static library.
        *   Define custom compilation rules in  MY::postamble  to handle C vs. C++ flags and build the static library.
        *   Set up include paths for the project root, UPB, and other dependencies.
    
    *   **XS Stubs ( .xs  files)**:
        *    lib/ProtoBuf.xs : Placeholder for the main module's XS functions.
        *    lib/ProtoBuf/Arena.xs : XS interface for  upb_Arena  management.
        *    lib/ProtoBuf/DescriptorPool.xs : XS interface for  upb_DefPool  management.
    
    *   **Perl Module Stubs ( .pm  files)**:
        *    lib/ProtoBuf.pm : Main module, loads XS.
        *    lib/ProtoBuf/Arena.pm : Perl class for Arenas.
        *    lib/ProtoBuf/DescriptorPool.pm : Perl class for Descriptor Pools.
        *    lib/ProtoBuf/Message.pm : Base class for messages (TBD).
    
    *   **Configuration Files**:
        *    .gitignore : Ignores build artifacts, editor files, etc.
        *    .perlcriticrc : Configures Perl::Critic for static analysis.
        *    .perltidyrc : Configures perltidy for code formatting.
    
    *   ** Devel::PPPort **: Vendored version 3.72 to generate  ppport.h  for XS compatibility across different Perl versions.
    
    *   ** typemap **: Custom typemap for XS argument/result conversion.
    
    *   **Documentation ( doc/ )**: Initial architecture and plan documents.
    
    This provides a solid foundation for developing the UPB-based Perl extension.

21 December 2025

Russell Coker: Links December 2025

Russ Allbery wrote an interesting review of Politics on the Edge, by Rory Stewart, who seems like one of the few conservative politicians I could respect and possibly even like [1]. It has some good insights about the problems with our current political environment. The NY Times has an amusing article about the attempt to sell the solution to the CIA's encrypted artwork [2]. Wired has an interesting article about computer face recognition systems failing on people with facial disabilities or scars [3]. This is a major accessibility issue, potentially violating disability legislation, and a demonstration of the problems of fully automating systems when there should be a human in the loop. The October 2025 report from the Debian Reproducible Builds team is particularly interesting [4]. kpcyrd forwarded a fascinating tidbit regarding so-called ninja and samurai build ordering, which uses data structures in which the pointer values returned from malloc determine some order of execution, LOL. Louis Rossmann made an insightful YouTube video about the moral case for piracy of software and media [5]. Louis Rossmann made an insightful video about the way that Hyundai is circumventing Right to Repair laws to make repairs needlessly expensive [6]. Korean cars aren't much good nowadays: their prices keep increasing and the quality doesn't. Brian Krebs wrote an interesting article about how Google is taking legal action against SMS phishing crime groups [7]. We need more of this! Josh Griffiths wrote an informative blog post about how YouTube is awful [8]. I really should investigate PeerTube. Louis Rossmann made an informative YouTube video about Right to Repair and the US military; if even the US military is getting ripped off by this, it's a bigger problem than most people realise [9]. He also asks the rhetorical question of whether politicians are bought or whether it's a subscription model. Brian Krebs wrote an informative article about the US plans to ban TP-Link devices; OpenWRT seems like a good option [10]. Brian Krebs wrote an informative article about free streaming Android TV boxes that act as hidden residential VPN proxies [11]. Also, the free streaming violates copyright law. Bruce Schneier and Nathan E. Sanders wrote an interesting article about ways that AI is being used to strengthen democracy [12]. Cory Doctorow wrote an insightful article about the incentives for making shitty goods and services and why we need legislation to protect consumers [13]. Linus Tech Tips has an interesting interview with Linus Torvalds [14]. Interesting video about the Kowloon Walled City [15]. It would be nice if a government deliberately created a hive city like that; the only example I know of is the Alaskan town in a single building. David Brin wrote an insightful set of 3 blog posts about a Democratic American deal that could improve the situation there [16].

19 December 2025

Otto Kek l inen: Backtesting trailing stop-loss strategies with Python and market data

Featured image of post Backtesting trailing stop-loss strategies with Python and market dataIn January 2024 I wrote about the insanity of the Magnificent Seven dominating the MSCI World Index, and I wondered how long the number can continue to go up? It has continued to surge upward at an accelerating pace, which makes me worry that a crash is likely closer. As a software professional, I decided to analyze whether using stop-loss orders could reliably automate avoiding deep drawdowns. As everyone with some savings in the stock market (hopefully) knows, the stock market eventually experiences crashes. It is just a matter of when and how deep the crash will be. Staying on the sidelines for years is not a good investment strategy, as inflation will erode the value of your savings. Assuming the current true inflation rate is around 7%, a restaurant dinner that costs 20 euros today will cost 24.50 euros in three years. Savings of 1000 euros today would drop in purchasing power from 50 dinners to only 40 dinners in three years. Hence, if you intend to retain the value of your hard-earned savings, they need to be invested in something that grows in value. Most people try to beat inflation by buying shares in stable companies, directly or via broad market ETFs. These historically grow faster than inflation during normal years, but likely drop in value during recessions.

What is a trailing stop-loss order? What if you could buy stocks to benefit from their value increasing without having to worry about a potential crash? All modern online stock brokers have a feature called stop-loss, where you can enter a price at which your stocks automatically get sold if they drop down to that price. A trailing stop-loss order is similar, but instead of a fixed price, you enter a margin (e.g. 10%). If the stock price rises, the stop-loss price will trail upwards by that margin. For example, if you buy a share at 100 euros and it has risen to 110 euros, you can set a 10% trailing stop-loss order which automatically sells it if the price drops 10% from the peak of 110 euros, at 99 euros. Thus, no matter what happens, you only lost 1 euro. And if the stock price continues to rise to 150 euros, the trailing stop-loss would automatically readjust to 150 euros minus 10%, which is 135 euros (150-15=135). If the price dropped to 135 euros, you would lock in a gain of 35 euros, which is not the peak price of 150 euros, but still better than whatever the price fell down to as a result of a large crash. In the simple case above, it obviously makes sense in theory, but it might not make sense in practice. Prices constantly oscillate, so you don t want a margin that is too small, otherwise you exit too early. Conversely, having a large margin may result in too large a drawdown before exiting. If markets crash rapidly, it might be that nobody buys your stocks at the stop-loss price, and shares have to be sold at an even lower price. Also, what will you do once the position is sold? The reason you invested in the stock market was to avoid holding cash, so would you buy the same stock back when the crash bottoms? But how will you know when the bottom has been reached?
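To make the arithmetic above concrete, here is a minimal sketch of the trailing-stop bookkeeping (my own illustration with a made-up price series, not code from the backtesting script described below):
python
def trailing_stop(peak_price: float, margin: float) -> float:
    """Stop-loss level that trails the highest price seen so far."""
    return peak_price * (1 - margin)

# The example from the text: buy at 100 euros with a 10% trailing margin.
peak = 100.0
for price in [100, 105, 110, 108, 99]:
    peak = max(peak, price)            # the peak only ever moves up
    stop = trailing_stop(peak, 0.10)
    if price <= stop:
        print(f"sold at {price} (stop was {stop:.2f}, peak was {peak})")
        break
With the peak at 110 the stop sits at 99, so the position is sold for a loss of 1 euro relative to the purchase price, exactly as in the example above.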

Backtesting stock market strategies with Python, YFinance, Pandas and Lightweight Charts I am not a professional investor, and nobody should take investment advice from me. However, I know what backtesting is and how to leverage open source software. So, I wrote a Python script to test if the trading strategy of using trailing stop-loss orders with specific margin values would have worked for a particular stock. First you need to have data. YFinance is a handy Python library that can be used to download the historic price data for any stock ticker on Yahoo.com. Then you need to manipulate the data. Pandas is the Python data analysis library with advanced data structures for working with relational or labeled data. Finally, to visualize the results, I used Lightweight Charts, which is a fast, interactive library for rendering financial charts, allowing you to plot the stock price, the trailing stop-loss line, and the points where trades would have occurred. I really like how the zoom is implemented in Lightweight Charts, which makes drilling into the data points feel effortless. The full solution is not polished enough to be published for others to use, but you can piece together your own by reusing some of the key snippets. To avoid re-downloading the same data repeatedly, I implemented a small caching wrapper that saves the data locally (as Parquet files):
python
from datetime import datetime

import pandas
import yfinance

# TICKER, START_DATE and CACHE_DIR (a pathlib.Path) are defined earlier in the script.
CACHE_DIR.mkdir(parents=True, exist_ok=True)
end_date = datetime.today().strftime("%Y-%m-%d")
cache_file = CACHE_DIR / f"{TICKER}-{START_DATE}--{end_date}.parquet"

if cache_file.is_file():
    dataframe = pandas.read_parquet(cache_file)
    print(f"Loaded price data from cache: {cache_file}")
else:
    dataframe = yfinance.download(
        TICKER,
        start=START_DATE,
        end=end_date,
        progress=False,
        auto_adjust=False,
    )

    dataframe.to_parquet(cache_file)
    print(f"Fetched new price data from Yahoo Finance and cached to: {cache_file}")
The dataframe is a Pandas object with a powerful API. For example, to print a snippet from the beginning and the end of the dataframe to see what the data looks like, you can use:
python
print("First 5 rows of the raw data:")
print(df.head())
print("Last 5 rows of the raw data:")
print(df.tail())
Example output:
First 5 rows of the raw data
Price Adj Close Close High Low Open Volume
Ticker BNP.PA BNP.PA BNP.PA BNP.PA BNP.PA BNP.PA
Date
2014-01-02 29.956285 55.540001 56.910000 55.349998 56.700001 316552
2014-01-03 30.031801 55.680000 55.990002 55.290001 55.580002 210044
2014-01-06 30.080338 55.770000 56.230000 55.529999 55.560001 185142
2014-01-07 30.943321 57.369999 57.619999 55.790001 55.880001 370397
2014-01-08 31.385597 58.189999 59.209999 57.750000 57.790001 489940
Last 5 rows of the raw data
Price Adj Close Close High Low Open Volume
Ticker BNP.PA BNP.PA BNP.PA BNP.PA BNP.PA BNP.PA
Date
2025-12-11 78.669998 78.669998 78.919998 76.900002 76.919998 357918
2025-12-12 78.089996 78.089996 80.269997 78.089996 79.470001 280477
2025-12-15 79.080002 79.080002 79.449997 78.559998 78.559998 233852
2025-12-16 78.860001 78.860001 79.980003 78.809998 79.430000 283057
2025-12-17 80.080002 80.080002 80.150002 79.080002 79.199997 262818
Adding new columns to the dataframe is easy. For example, I used a custom function to calculate the Relative Strength Index (RSI). To add a new column RSI with a value for every row based on the price from that row, only one line of code is needed, without custom loops:
python
df["RSI"] = compute_rsi(df["price"], period=14)
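The blog post does not show compute_rsi itself; a minimal pandas-only version (a simple rolling-mean variant of the RSI, my own sketch rather than the author's implementation) could look like this:
python
import pandas

def compute_rsi(prices: pandas.Series, period: int = 14) -> pandas.Series:
    """Relative Strength Index over a rolling window of prices."""
    delta = prices.diff()
    gains = delta.clip(lower=0)        # positive day-to-day changes
    losses = -delta.clip(upper=0)      # negative changes, as positive numbers
    avg_gain = gains.rolling(window=period).mean()
    avg_loss = losses.rolling(window=period).mean()
    rs = avg_gain / avg_loss
    return 100 - 100 / (1 + rs)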
After manipulating the data, the series can be converted into an array structure and printed as JSON into a placeholder in an HTML template:
python
baseline_series = [
    {"time": ts, "value": val}
    for ts, val in df_plot[["timestamp", BASELINE_LABEL]].itertuples(index=False)
]

baseline_json = json.dumps(baseline_series)
# Read the Jinja template source from disk before rendering it.
template = jinja2.Template(open("template.html").read())
rendered_html = template.render(
    title=title,
    heading=heading,
    description=description_html,
    # ...
    baseline_json=baseline_json,
    # ...
)

with open("report.html", "w", encoding="utf-8") as f:
    f.write(rendered_html)
print("Report generated!")
In the HTML template, the Jinja placeholder variables get replaced with the actual JSON:
html
<!DOCTYPE html>
<html lang="en">
<head>
 <meta charset="UTF-8">
 <title>{{ title }}</title>
 ...
</head>
<body>
 <h1>{{ heading }}</h1>
 <div id="chart"></div>
 <script>
 // Ensure the DOM is ready before we initialise the chart
 document.addEventListener('DOMContentLoaded', () => {
   // Parse the JSON data passed from Python
   const baselineData = {{ baseline_json | safe }};
   const strategyData = {{ strategy_json | safe }};
   const markersData = {{ markers_json | safe }};

   // Create the chart
   const chart = LightweightCharts.createChart(document.getElementById('chart'), {
     width: document.getElementById('chart').clientWidth,
     height: 500,
     layout: {
       background: { color: "#222" },
       textColor: "#ccc"
     },
     grid: {
       vertLines: { color: "#555" },
       horzLines: { color: "#555" }
     }
   });

   // Add baseline series
   const baselineSeries = chart.addLineSeries({
     title: '{{ baseline_label }}',
     lastValueVisible: false,
     priceLineVisible: false,
     priceLineWidth: 1
   });
   baselineSeries.setData(baselineData);

   baselineSeries.priceScale().applyOptions({
     entireTextOnly: true
   });

   // Add strategy series
   const strategySeries = chart.addLineSeries({
     title: '{{ strategy_label }}',
     lastValueVisible: false,
     priceLineVisible: false,
     color: '#FF6D00'
   });
   strategySeries.setData(strategyData);

   // Add buy/sell markers to the strategy series
   strategySeries.setMarkers(markersData);

   // Fit the chart to show the full data range (full zoom)
   chart.timeScale().fitContent();
 });
 </script>
</body>
</html>
There are also Python libraries built specifically for backtesting investment strategies, such as Backtrader and Zipline, but they do not seem to be actively maintained, and probably have too many features and complexity compared to what I needed for doing this simple test. The screenshot below shows an example of backtesting a strategy on the Waste Management Inc stock from January 2015 to December 2025. The baseline "Buy and hold" scenario is shown as the blue line and it fully tracks the stock price, while the orange line shows how the strategy would have performed, with markers for the sells and buys along the way.
Backtest run example

Results I experimented with multiple strategies and tested them with various parameters, but I don't think I found a strategy that was consistently and clearly better than just buy-and-hold. It basically boils down to the fact that I was not able to find any way to calculate when the crash has bottomed based on historical data. You can only know in hindsight that the price has stopped dropping and is on a steady path to recovery, but at that point it is already too late to buy in. In my testing, most strategies underperformed buy-and-hold because they sold when the crash started, but bought back after it recovered at a slightly higher price. In particular, when using narrow margins and selling on a 3-6% drawdown, the strategy performed very badly, as those small dips tend to recover in a few days. Essentially, the strategy was repeating the pattern of selling 100 stocks at a 6% discount, then being able to buy back only 94 shares the next day, then again selling 94 shares at a 6% discount, and only being able to buy back maybe 90 shares after recovery, and so forth, never catching up to the buy-and-hold. The strategy worked better in large market crashes as they tended to last longer, and there were higher chances of buying back the shares while the price was still low. For example, in the 2020 crash selling at a 20% drawdown was a good strategy, as the stock I tested dropped nearly 50% and remained low for several weeks; thus, the strategy bought back the stocks while the price was still low and had not yet started to climb significantly. But that was just a lucky incident, as the delta between the trailing stop-loss margin of 20% and total crash of 50% was large enough. If the crash had been only 25%, the strategy would have missed the rebound and ended up buying back the stocks at a slightly higher price. Also, note that the simulation assumes that the trade itself is too small to affect the price formation. We should keep in mind that in reality, if many people have stop-loss orders in place, a large price drop would trigger all of them, creating a flood of sell orders, which in turn would affect the price and drive it lower even faster and deeper. Luckily, it seems that stop-loss orders are generally not a good strategy, and we don't need to fear that too many people will be using them.
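The core of such a simulation is just a loop over the daily closes that tracks the running peak and flips between holding shares and holding cash. The following is a heavily condensed sketch of that idea with a naive re-entry rule (my own code, not the author's script; the parameter names and the re-buy heuristic are mine):
python
def backtest_trailing_stop(closes, margin=0.20, rebuy_after=0.05):
    """Sell when the price falls `margin` below the peak since (re)entry,
    buy back once it has risen `rebuy_after` above the post-sale low.
    Returns the final value of an initial stake of 1.0."""
    shares, cash = 1.0 / closes[0], 0.0    # start fully invested
    peak = low = closes[0]
    for price in closes[1:]:
        if shares > 0:
            peak = max(peak, price)
            if price <= peak * (1 - margin):       # trailing stop triggered
                cash, shares = shares * price, 0.0
                low = price
        else:
            low = min(low, price)
            if price >= low * (1 + rebuy_after):   # crude re-entry rule
                shares, cash = cash / price, 0.0
                peak = price
    return cash + shares * closes[-1]

# Compare against buy-and-hold, e.g. with closes = dataframe["Close"].squeeze().tolist():
# backtest_trailing_stop(closes) versus closes[-1] / closes[0]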

Conclusion Even though using a trailing stop-loss strategy does not seem to help in getting consistently higher returns based on my backtesting, I would still say it is useful in protecting from the downside of stock investing. It can act as a kind of insurance policy to considerably decrease the chances of losing big while increasing the chances of losing a little bit. If you are risk-averse, which I think I probably am, this tradeoff can make sense. I'd rather miss out on an initial 50% loss and an overall 3% gain on recovery than have to sit through weeks or months with a 50% loss before the price recovers to prior levels. Most notably, the trailing stop-loss strategy works best if used only once. If it is repeated multiple times, the small losses in gains will compound into big losses overall. Thus, I think I might actually put this automation in place at least on the stocks in my portfolio that have had the highest gains. If they keep going up, I will ride along, but once the crash happens, I will be out of those particular stocks permanently. Do you have a favorite open source investment tool or are you aware of any strategy that actually works? Comment below!

14 December 2025

Evgeni Golov: Home Assistant, Govee Lights Local, VLANs, Oh my!

We recently bought some Govee Glide Hexa Light Panels, because they have a local LAN API that is well integrated into Home Assistant. Or so we thought. Our network is not that complicated, but there is a dedicated VLAN for IOT devices. Home Assistant runs in a container (with network=host) on a box in the basement, and that box has a NIC in the IOT VLAN so it can reach devices there easily. So far, this has never been a problem. Enter the Govee LAN API. Or maybe its Python implementation. Not exactly sure who's to blame here. The API involves sending JSON over multicast, which the Govee device will answer to.
No devices found on the network
After turning logging for homeassistant.components.govee_light_local to 11, erm debug, we see:
DEBUG (MainThread) [homeassistant.components.govee_light_local.config_flow] Starting discovery with IP 192.168.42.2
DEBUG (MainThread) [homeassistant.components.govee_light_local.config_flow] No devices found with IP 192.168.42.2
That's not the IP address in the IOT VLAN! Turns out the integration recently got support for multiple NICs, but Home Assistant doesn't just use all the interfaces it sees by default. You need to go to Settings → Network → Network adapter and deselect "Autoconfigure", which will allow you to select individual interfaces. Once you've done that, you'll see "Starting discovery with IP" messages for all selected interfaces and adding of Govee Lights Local will work.
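For the curious, the discovery step boils down to sending a small JSON scan request to a multicast group and listening for replies. Below is a rough standalone sketch; the multicast address, ports and payload follow my reading of the public Govee LAN API documentation, and the interface address is a made-up example, so treat the details as assumptions rather than exactly what the Home Assistant integration does. The important part is that the request has to leave through the NIC sitting in the IOT VLAN, which is what the Autoconfigure setting was getting wrong.
python
import json
import socket

# Assumed values from the public Govee LAN API docs (not taken from the integration).
MULTICAST_GROUP = "239.255.255.250"
SCAN_PORT = 4001                # devices listen here for scan requests
RESPONSE_PORT = 4002            # devices reply to this port on the sender
IOT_VLAN_IP = "192.168.107.2"   # hypothetical address of the NIC in the IOT VLAN

scan_request = json.dumps({"msg": {"cmd": "scan", "data": {"account_topic": "reserve"}}})

sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
# Force the scan out of the IOT VLAN interface instead of the default route.
sock.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_IF, socket.inet_aton(IOT_VLAN_IP))
sock.bind((IOT_VLAN_IP, RESPONSE_PORT))
sock.settimeout(5)
sock.sendto(scan_request.encode(), (MULTICAST_GROUP, SCAN_PORT))

try:
    while True:
        data, addr = sock.recvfrom(65535)
        print(f"device at {addr[0]}: {json.loads(data)}")
except socket.timeout:
    print("no devices answered on this interface")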

7 December 2025

Vincent Bernat: Compressing embedded files in Go

Go's embed feature lets you bundle static assets into an executable, but it stores them uncompressed. This wastes space: a web interface with documentation can bloat your binary by dozens of megabytes. A proposal to optionally enable compression was declined because it is difficult to handle all use cases. One solution? Put all the assets into a ZIP archive!

Code The Go standard library includes a module to read and write ZIP archives. It contains a function that turns a ZIP archive into an io/fs.FS structure that can replace embed.FS in most contexts.1
package embed
import (
  "archive/zip"
  "bytes"
  _ "embed"
  "fmt"
  "io/fs"
  "sync"
)
//go:embed data/embed.zip
var embeddedZip []byte
var dataOnce = sync.OnceValue(func() *zip.Reader {
  r, err := zip.NewReader(bytes.NewReader(embeddedZip), int64(len(embeddedZip)))
  if err != nil {
    panic(fmt.Sprintf("cannot read embedded archive: %s", err))
  }
  return r
})
func Data() fs.FS {
  return dataOnce()
}
We can build the embed.zip archive with a rule in a Makefile. We specify the files to embed as dependencies to ensure changes are detected.
common/embed/data/embed.zip: console/data/frontend console/data/docs
common/embed/data/embed.zip: orchestrator/clickhouse/data/protocols.csv 
common/embed/data/embed.zip: orchestrator/clickhouse/data/icmp.csv
common/embed/data/embed.zip: orchestrator/clickhouse/data/asns.csv
common/embed/data/embed.zip:
    mkdir -p common/embed/data && zip --quiet --recurse-paths --filesync $@ $^
The automatic variable $@ is the rule target, while $^ expands to all the dependencies, modified or not.

Space gain Akvorado, a flow collector written in Go, embeds several static assets:
  • CSV files to translate port numbers, protocols or AS numbers, and
  • HTML, CSS, JS, and image files for the web interface, and
  • the documentation.
Breakdown of the space used by each component before (left) and after (right) the introduction of embed.zip, displayed as a treemap: many individually embedded files are replaced by a single, bigger one.
Embedding these assets into a ZIP archive reduced the size of the Akvorado executable by more than 4 MiB:
$ unzip -p common/embed/data/embed.zip | wc -c | numfmt --to=iec
7.3M
$ ll common/embed/data/embed.zip
-rw-r--r-- 1 bernat users 2.9M Dec  7 17:17 common/embed/data/embed.zip

Performance loss Reading from a compressed archive is not as fast as reading a flat file. A simple benchmark shows it is more than 4× slower. It also allocates some memory.2
goos: linux
goarch: amd64
pkg: akvorado/common/embed
cpu: AMD Ryzen 5 5600X 6-Core Processor
BenchmarkData/compressed-12     2262   526553 ns/op   610 B/op   10 allocs/op
BenchmarkData/uncompressed-12   9482   123175 ns/op     0 B/op    0 allocs/op
Each access to an asset requires a decompression step, as seen in this flame graph:
CPU flame graph comparing the time spent on CPU when reading data from embed.zip (left) versus reading data directly (right). Because the Go testing framework executes the benchmark for uncompressed data 4 times more often, it uses the same horizontal space as the benchmark for compressed data.
While a ZIP archive has an index to quickly find the requested file, seeking inside a compressed file is currently not possible.3 Therefore, the files from a compressed archive do not implement the io.ReaderAt or io.Seeker interfaces, unlike directly embedded files. This prevents some features, like serving partial files or detecting MIME types when serving files over HTTP.
For Akvorado, this is an acceptable compromise to save a few mebibytes from an executable of almost 100 MiB. Next week, I will continue this futile adventure by explaining how I prevented Go from disabling dead code elimination!

  1. You can safely read multiple files concurrently. However, it does not implement ReadDir() and ReadFile() methods.
  2. You could keep frequently accessed assets in memory. This reduces CPU usage and trades cached memory for resident memory.
  3. SOZip is a profile that enables fast random access in a compressed file. However, Go's archive/zip module does not support it.

6 December 2025

Jonathan Dowland: thesis

It's done! It's over! I've graduated, I have the scroll, I'm staring at the eye-watering prices for the official photographer snap, I'm adjusting to post-thesis life. My PhD thesis revisions have been accepted and my thesis is now available from Newcastle University Library's eThesis repository. As part of submitting my corrections, I wrote a brief report detailing the changes I made from my thesis at the time of the viva. I also produced a latexdiff marked-up copy of the thesis to visualise the exact changes. In order to shed some light on the post-viva corrections process, at least at my institution, and in the hope that they are of some use to someone, I'm sharing those documents:

3 December 2025

Reproducible Builds: Reproducible Builds in November 2025

Welcome to the report for November 2025 from the Reproducible Builds project! These monthly reports outline what we've been up to over the past month, highlighting items of news from elsewhere in the increasingly-important area of software supply-chain security. As always, if you are interested in contributing to the Reproducible Builds project, please see the Contribute page on our website. In this report:

  1. 10 years of Reproducible Builds at SeaGL
  2. Distribution work
  3. Tool development
  4. Website updates
  5. Miscellaneous news
  6. Software Supply Chain Security of Web3
  7. Upstream patches

10 years of Reproducible Builds at SeaGL 2025 On Friday 8th November, Chris Lamb gave a talk called "10 years of Reproducible Builds" at SeaGL in Seattle, WA. Founded in 2013, SeaGL is a free, grassroots technical summit dedicated to spreading awareness and knowledge about free source software, hardware and culture. Chris's talk:
[…] introduces the concept of reproducible builds, its technical underpinnings and its potentially transformative impact on software security and transparency. It is aimed at developers, security professionals and policy-makers who are concerned with enhancing trust and accountability in our software. It also provides a history of the Reproducible Builds project, which is approximately ten years old. How are we getting on? What have we got left to do? Aren't all the builds reproducible now?

Distribution work In Debian this month, Jochen Sprickerhof created a merge request to replace the use of reprotest in Debian's Salsa Continuous Integration (CI) pipeline with debrebuild. Jochen cites the advantages as being threefold: firstly, that "only one extra build needed"; it "uses the same sbuild and ccache tooling as the normal build"; and it "works for any Debian release". The merge request was merged by Emmanuel Arias and is now active. kpcyrd posted to our mailing list announcing the initial release of repro-threshold, which implements an APT transport that defines a threshold of "at least X of my N trusted rebuilders need to confirm they reproduced the binary" before installing Debian packages. Configuration can be done through a config file, or through a curses-like user interface. Holger then merged two commits by Jochen Sprickerhof in order to address a fakeroot-related reproducibility issue in the debian-installer, and Jörg Jaspert deployed a patch by Ivo De Decker for a bug originally filed by Holger in February 2025 related to some Debian packages not being archived on snapshot.debian.org. Elsewhere, Roland Clobus performed some analysis on the live Debian trixie images, which he determined were not reproducible. However, in a follow-up post, Roland happily reports that the issues have been handled. In addition, 145 reviews of Debian packages were added, 12 were updated and 15 were removed this month, adding to our knowledge about identified issues. Lastly, Jochen Sprickerhof filed a bug announcing their intention to binary NMU a very large number of R packages after a reproducibility-related toolchain bug was fixed.
Bernhard M. Wiedemann posted another openSUSE monthly update for their work there.
Julien Malka and Arnout Engelen launched the new hash collection server for NixOS. Aside from improved reporting to help focus reproducible builds efforts within NixOS, it collects build hashes as individually-signed attestations from independent builders, laying the groundwork for further tooling.

Tool development diffoscope version 307 was uploaded to Debian unstable (as well as version 309). These changes included further attempts to automatically deploy to PyPI by liaising with the PyPI developers/maintainers (this is still an experimental feature). […][…][…] In addition, reprotest versions 0.7.31 and 0.7.32 were uploaded to Debian unstable by Holger Levsen, who also made the following changes:
  • Do not vary the architecture personality if the kernel is not varied. (Thanks to Raúl Cumplido). […]
  • Drop the debian/watch file, as Lintian now flags this as an error for native Debian packages. […][…]
  • Bump Standards-Version to 4.7.2, with no changes needed. […]
  • Drop the Rules-Requires-Root header as it is no longer required. […]
In addition, Vagrant Cascadian fixed a build failure by removing some extra whitespace from an older changelog entry. […]

Website updates Once again, there were a number of improvements made to our website this month including:

Miscellaneous news

Software Supply Chain Security of Web3 Via our mailing list, Martin Monperrus let us know about their recently-published page on the Software Supply Chain Security of Web3. The abstract of their paper is as follows:
Web3 applications, built on blockchain technology, manage billions of dollars in digital assets through decentralized applications (dApps) and smart contracts. These systems rely on complex software supply chains that introduce significant security vulnerabilities. This paper examines the software supply chain security challenges unique to the Web3 ecosystem, where traditional Web2 software supply chain problems intersect with the immutable and high-stakes nature of blockchain technology. We analyze the threat landscape and propose mitigation strategies to strengthen the security posture of Web3 systems.
Their paper lists reproducible builds as one of the mitigating strategies. A PDF of the full text is available to download.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Finally, if you are interested in contributing to the Reproducible Builds project, please visit the Contribute page on our website. However, you can also get in touch with us via:

19 November 2025

Michael Ablassmeier: building SLES 16 vagrant/libvirt images using guestfs tools

SLES 16 has been released. In the past, SUSE offered ready-built vagrant images, but unfortunately that's not the case anymore; with more recent SLES 15 releases the official images were already gone. It used to be possible to clone existing projects on the openSUSE Build Service to build the images yourself, but I couldn't find any templates for SLES 16. Naturally, there are several ways to build images, and the tooling involves kiwi-ng, the openSUSE Build Service, packer recipes, etc. (existing packer recipes won't work anymore, as YaST has been replaced by a new installer, called Agama). All pretty complicated. So my current take on creating a vagrant image for SLES 16 has been the following: two guestfs-tools can now be used to modify the created qcow2 image:
 virt-sysprep -a sles16.qcow2
#!/bin/bash
useradd vagrant
mkdir -p /home/vagrant/.ssh/
chmod 0700 /home/vagrant/.ssh/
echo "ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA6NF8iallvQVp22WDkTkyrtvp9eWW6A8YVr+kz4TjGYe7gHzIw+niNltGEFHzD8+v1I2YJ6oXevct1YeS0o9HZyN1Q9qgCgzUFtdOKLv6IedplqoPkcmF0aYet2PkEDo3MlTBckFXPITAMzF8dJSIF
o9D8HfdOV0IAdx4O7PtixWKn5y2hMNG0zQPyUecp4pzC6kivAIhyfHilFR61RGL+GPXQ2MWZWFYbAGjyiYJnAmCP3NOTd0jMZEnDkbUvxhMmBYSdETk1rRgm+R4LOzFUGaHqHDLKLX+FIPKcF96hrucXzcWyLbIbEgE98OHlnVYCzRdK8jlqm8tehUc9c9W
hQ== vagrant insecure public key" > /home/vagrant/.ssh/authorized_keys
chmod 0600 /home/vagrant/.ssh/authorized_keys
chown -R vagrant:vagrant /home/vagrant/
# apply recommended ssh settings for vagrant boxes
SSHD_CONFIG=/etc/ssh/sshd_config.d/99-vagrant.conf
if [[ ! -d "$(dirname ${SSHD_CONFIG})" ]]; then
    SSHD_CONFIG=/etc/ssh/sshd_config
    # prepend the settings, so that they take precedence
    echo -e "UseDNS no\nGSSAPIAuthentication no\n$(cat ${SSHD_CONFIG})" > ${SSHD_CONFIG}
else
    echo -e "UseDNS no\nGSSAPIAuthentication no" > ${SSHD_CONFIG}
fi
SUDOERS_LINE="vagrant ALL=(ALL) NOPASSWD: ALL"
if [ -d /etc/sudoers.d ]; then
    echo "$SUDOERS_LINE" >  /etc/sudoers.d/vagrant
    visudo -cf /etc/sudoers.d/vagrant
    chmod 0440 /etc/sudoers.d/vagrant
else
    echo "$SUDOERS_LINE" >> /etc/sudoers
    visudo -cf /etc/sudoers
fi
 
mkdir -p /vagrant
chown -R vagrant:vagrant /vagrant
systemctl enable sshd
 virt-customize -a sles16.qcow2 --upload vagrant.sh:/tmp/vagrant.sh
 virt-customize -a sles16.qcow2 --run-command "/tmp/vagrant.sh"
After this, use the create_box.sh script from the vagrant-libvirt project to create a box image: https://github.com/vagrant-libvirt/vagrant-libvirt/blob/main/tools/create_box.sh and add the image to your environment:
 create_box.sh sles16.qcow2 sles16.box
 vagrant box add --name my/sles16 sles16.box
The resulting box is working well within my CI environment, as far as I can tell.
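For completeness, a quick smoke test of the freshly added box could look like the following. These commands are only illustrative; they assume the vagrant-libvirt plugin is installed and use the box name chosen above:
 # create a minimal Vagrantfile for the box and boot it with the libvirt provider
 vagrant init my/sles16
 vagrant up --provider=libvirt
 # check that the guest really runs SLES 16, then clean up again
 vagrant ssh -c 'cat /etc/os-release'
 vagrant destroy -f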

8 November 2025

Thorsten Alteholz: My Debian Activities in October 2025

Debian LTS This was my hundred-thirty-sixth month that I did some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian. During my allocated time I uploaded or worked on: I also attended the monthly LTS/ELTS meeting.
Debian ELTS This month was the eighty-seventh ELTS month. During my allocated time I uploaded or worked on: I also attended the monthly LTS/ELTS meeting.
Debian Printing This month I uploaded a new upstream version or a bugfix version of: This work is generously funded by Freexian!
Debian Astro This month I uploaded a new upstream version or a bugfix version of:
Debian IoT Unfortunately I didn't find any time to work on this topic.
Debian Mobcom This month I uploaded a new upstream version or a bugfix version of:
misc This month I uploaded a new upstream version or a bugfix version of: In my fight against outdated RFPs, I closed 31 of them in October. I could even close one RFP by uploading the new package gypsy. Meanwhile only 3373 are still open, so don't hesitate to help close one or another.
FTP master This month I accepted 420 and rejected 45 packages. The overall number of packages that got accepted was 423. I would like to remind everybody that in case you don't agree with the removal of a package, please set the moreinfo tag on this bug. This is the only reliable way to prevent processing of that RM-bug. Well, there is a second way: of course you could also achieve this by closing the bug.

26 October 2025

Dirk Eddelbuettel: duckdb-mlpack 0.0.2: mlpack is now a duckdb community extension

A couple of days ago in a short post, I announced duckdb-mlpack as "ML quacks": combining the powerful C++ machine learning library mlpack with the amazing analytical database engine duckdb. See that post for more background. The duckdb-mlpack package is now a community extension, joining an impressive list of existing extensions. This means duckdb builds and distributes duckdb-mlpack for all supported platforms, allowing users to just install the resulting (signed) binary. (We currently only support Linux on both arm64 and amd64; adding macOS should be straightforward once we sort out one build issue. Windows and WASM should work too, with a little love and polish, as both duckdb and mlpack support them.) Given the binary build, a simple
INSTALL mlpack FROM community;
LOAD mlpack;
installs and loads the package. By duckdb convention, the code is stored per user and per version, so the first line needs to be executed only once per duckdb release used; the second line is then needed once per session. We also extended the capabilities of duckdb-mlpack. While still an MVP, stressing the "minimal" in minimal viable product, the two supported methods, adaBoost and (regularized) linear regression, both serialize and store their model object, permitting rapid prediction on new data as shown in the adaBoost example:
-- Perform adaBoost (using weak learner 'Perceptron' by default)
-- Read 'features' into 'X', 'labels' into 'Y', use optional parameters
-- from 'Z', and prepare model storage in 'M'
CREATE TABLE X AS SELECT * FROM read_csv("https://eddelbuettel.github.io/duckdb-mlpack/data/iris.csv");
CREATE TABLE Y AS SELECT * FROM read_csv("https://eddelbuettel.github.io/duckdb-mlpack/data/iris_labels.csv");
CREATE TABLE Z (name VARCHAR, value VARCHAR);
INSERT INTO Z VALUES ('iterations', '50'), ('tolerance', '1e-7');
CREATE TABLE M (json VARCHAR);

-- Train model for 'Y' on 'X' using parameters 'Z', store in 'M'
CREATE TEMP TABLE A AS SELECT * FROM mlpack_adaboost("X", "Y", "Z", "M");

-- Count by predicted group
SELECT COUNT(*) as n, predicted FROM A GROUP BY predicted;

-- Model 'M' can be used to predict
CREATE TABLE N (x1 DOUBLE, x2 DOUBLE, x3 DOUBLE, x4 DOUBLE);
-- inserting a row of approximate column mean values
INSERT INTO N VALUES (5.843, 3.054, 3.759, 1.199);
-- plus rows of approximate column min values and max values
INSERT INTO N VALUES (4.3, 2.0, 1.0, 0.1), (7.9, 4.4, 6.9, 2.5);
-- and this predicts one element per row
SELECT * FROM mlpack_adaboost_pred("N", "M");
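The second supported method, (regularized) linear regression, follows the same table-based calling convention. The following is only a rough sketch: the function names, the parameter name and the table shapes are assumptions for illustration and should be checked against the duckdb-mlpack documentation rather than taken as the actual interface.
-- Hypothetical sketch (assumed function names, not the verified API):
-- features in 'XR', numeric targets in 'YR', optional parameters in 'PR',
-- and model storage in 'MR', mirroring the adaBoost convention above
CREATE TABLE XR (x1 DOUBLE, x2 DOUBLE);
INSERT INTO XR VALUES (1.0, 2.0), (2.0, 3.0), (3.0, 5.0);
CREATE TABLE YR (y DOUBLE);
INSERT INTO YR VALUES (3.1), (5.2), (8.1);
CREATE TABLE PR (name VARCHAR, value VARCHAR);
INSERT INTO PR VALUES ('lambda', '0.1');   -- assumed name for the ridge penalty
CREATE TABLE MR (json VARCHAR);

-- Train, then predict on the training features again
CREATE TEMP TABLE FIT AS SELECT * FROM mlpack_linear_regression("XR", "YR", "PR", "MR");
SELECT * FROM mlpack_linear_regression_pred("XR", "MR");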
Ryan and I have some ideas for where to go from here, ideally towards autogenerating bindings for most (if not all) methods as is done for the mlpack language bindings. Anybody interested and willing to help should reach out to us.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

20 October 2025

Birger Schacht: A plea for <dialog>

A couple of weeks ago there was an article on the Freexian blog about Using JavaScript in Debusine without depending on JavaScript. It describes how JavaScript is used in the Debusine Django app, namely for "progressive enhancement rather than core functionality". This is an approach I also follow when implementing web interfaces, and I think developments in web technologies and standardization in recent years have made this a lot easier. One of the examples described in the post, the Bootstrap toast messages, was something that I implemented myself recently, in a similar but slightly different way.
In the main app I develop for my day job we also use the Bootstrap framework. I have also used it for different personal projects (for example the GSOC project I did for Debian in 2018 was also a Django app that used Bootstrap). Bootstrap is still primarily a CSS framework, but it also comes with a JavaScript library for some functionality. Previous versions of Bootstrap depended on jQuery, but since version 5 of Bootstrap you don't need jQuery anymore. In my experience, two of the more commonly used JavaScript utilities of Bootstrap are modals (also called lightbox or popup; they are elements that are displayed above the main content of a website) and toasts (also called alerts; they are little notification windows that often disappear after a timeout).
The thing is, Bootstrap 5 was released in 2021 and a lot has happened since then regarding web technologies. I believe that both of these UI components can nowadays be implemented using standard HTML5 elements. An eye-opening talk I watched was Stop using JS for that from last year's JSConf(!). In this talk the speaker argues that the Rule of least power is one of the core principles of web development, which means we should use HTML over CSS and CSS over JavaScript. The speaker also presents some CSS rules and HTML elements that were added recently and that help to make that happen, one of them being the dialog element:
The <dialog> HTML element represents a modal or non-modal dialog box or other interactive component, such as a dismissible alert, inspector, or subwindow. The Dialog element at MDN
The baseline for this element is "widely available":
This feature is well established and works across many devices and browser versions. It's been available across browsers since March 2022. The Dialog element at MDN
This means there is an HTML element that does what a Bootstrap modal does! Once I had watched that talk I removed all my Bootstrap modals and replaced them with HTML <dialog> elements (JavaScript is still needed to .show() and .close() the elements, but those are two methods instead of a full library). Not only did I replace code that depended on an external library, I'm now also a lot more flexible regarding the styling of the elements.
When I started implementing notifications for our app, my first approach was to use Bootstrap toasts, similar to how it is implemented in Debusine. But looking at the amount of HTML code I had to write for a simple toast message, I thought that it might be possible to also implement toasts with the <dialog> element. I mean, basically it is the same, only the styling is a bit different. So what I did was add a #snackbar area to the DOM of the app. This is the container for the toast messages. All the toast messages are simply <dialog> elements with the open attribute, which means that they are visible right away when the page loads.
<div id="snackbar">
   % for message in messages % 
    <dialog class="mytoast alert alert-  message.tags  " role="alert" open>
        message  
    </dialog>
   % endfor % 
</div>
This looks a lot simpler than the Bootstrap toasts would have looked. To make the <dialog> elements a little bit more fancy, I added some CSS to make them fade in and out:
.mytoast {
    z-index: 1;
    animation: fadein 0.5s, fadeout 0.5s 2.6s;
}
@keyframes fadein {
    from {
        opacity: 0;
    }
    to {
        opacity: 1;
    }
}
@keyframes fadeout {
    from {
        opacity: 1;
    }
    to {
        opacity: 0;
    }
}
To close a <dialog> element once it has faded away, I had to add one JavaScript event listener:
window.addEventListener('load', () => {
    document.querySelectorAll(".mytoast").forEach((element) => {
        element.addEventListener('animationend', function(e) {
            e.animationName == "fadeout" && element.close();
        });
    });
});
(If one wanted to use the same HTML code for both script and noscript users, the CSS should probably be adapted: the toast fades away, and if there is no JavaScript to close the element, it stays visible after the animation is over. A solution would, for example, be to use a close button and, for noscript users, simply let the message stay visible; this is also what happens with the noscript messages in Debusine.)
So there are many new elements in HTML and a lot of new features in CSS. It makes sense to sometimes ask ourselves whether, instead of the solution we know (or what a web search or some AI shows us as the most common solution), there might be a newer solution that did not exist when the first choice was created. Using standardized solutions instead of custom libraries makes the software more maintainable. In web development I also prefer standardized elements over a third-party library because they usually have better accessibility and UX. In How Functional Programming Shaped (and Twisted) Frontend Development the author writes:
Consider the humble modal dialog. The web has <dialog>, a native element with built-in functionality: it manages focus trapping, handles Escape key dismissal, provides a backdrop, controls scroll-locking on the body, and integrates with the accessibility tree. It exists in the DOM but remains hidden until opened. No JavaScript mounting required. […] you've trained developers to not even look for native solutions. The platform becomes invisible. When someone asks "how do I build a modal?", the answer is "install a library" or "here's my custom hook", never "use <dialog>". Ahmad Alfy
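As a minimal sketch of what that native behaviour looks like in practice (the ids, text and handlers below are purely illustrative, not taken from Debusine or my app): showModal() provides the focus trapping, backdrop and Escape handling mentioned in the quote, and a form with method="dialog" gives you a close button that works without any JavaScript at all.
<!-- Minimal sketch of a native modal; illustrative ids and text only -->
<dialog id="confirm">
    <p>Really delete this item?</p>
    <!-- a form with method="dialog" closes the dialog on submit, no JavaScript needed -->
    <form method="dialog">
        <button value="cancel">Cancel</button>
        <button value="confirm">Delete</button>
    </form>
</dialog>
<button id="open-confirm">Delete</button>
<script>
    // showModal() gives focus trapping, a ::backdrop and Escape-to-dismiss for free
    const dialog = document.getElementById('confirm');
    document.getElementById('open-confirm').addEventListener('click', () => {
        dialog.showModal();
    });
    // the value of the button that closed the dialog ends up in returnValue
    dialog.addEventListener('close', () => {
        console.log('closed with:', dialog.returnValue);
    });
</script>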
