
6 August 2025

Reproducible Builds: Reproducible Builds in July 2025

Welcome to the seventh report from the Reproducible Builds project in 2025. Our monthly reports outline what we've been up to over the past month, and highlight items of news from elsewhere in the increasingly-important area of software supply-chain security. If you are interested in contributing to the Reproducible Builds project, please see the Contribute page on our website. In this report:
  1. Reproducible Builds Summit 2025
  2. Reproducible Builds an official goal for SUSE Enterprise Linux
  3. Reproducible Builds at FOSSY 2025
  4. New OSS Rebuild project from Google
  5. New extension of Python setuptools to support reproducible builds
  6. diffoscope
  7. New library to patch system functions for reproducibility
  8. Independently Reproducible Git Bundles
  9. Website updates
  10. Distribution work
  11. Reproducibility testing framework
  12. Upstream patches

Reproducible Builds Summit 2025 We are extremely pleased to announce the upcoming Reproducible Builds Summit, set to take place from October 28th to 30th 2025 in Vienna, Austria! We are thrilled to host the eighth edition of this exciting event, following the success of previous summits in various iconic locations around the world, including Venice, Marrakesh, Paris, Berlin, Hamburg and Athens. Our summits are a unique gathering that brings together attendees from diverse projects, united by a shared vision of advancing the Reproducible Builds effort. During this enriching event, participants will have the opportunity to engage in discussions, establish connections and exchange ideas to drive progress in this vital field. Our aim is to create an inclusive space that fosters collaboration, innovation and problem-solving. If you're interested in joining us this year, please make sure to read the event page, which has more details about the event and location. Registration is open until 20th September 2025, and we are very much looking forward to seeing many readers of these reports there!

Reproducible Builds an official goal for SUSE Enterprise Linux On our mailing list this month, Bernhard M. Wiedemann revealed the big news that reproducibility is now an official goal for SUSE Linux Enterprise Server (SLES) 16:
[Everything] changed earlier this year when reproducible-builds for SLES-16 became an official goal for the product. More people are talking about digital sovereignty and supply-chain security now. [ ] Today, only 9 of 3319 (source) packages have significant problems left (plus 7 with pending fixes), so 99.5% of packages have reproducible builds.

Reproducible Builds at FOSSY 2025 On Saturday 2nd August, Vagrant Cascadian and Chris Lamb presented at this year's FOSSY 2025. Their talk, titled Never Mind the Checkboxes, Here's Reproducible Builds!, was introduced as follows:
There are numerous policy compliance and regulatory processes being developed that target software development, but do they solve actual problems? Do they improve the quality of software? Do Software Bills of Materials (SBOMs) actually give you the information necessary to verify how a given software artifact was built? What is the goal of all these compliance checklists anyway, or more importantly, what should the goals be? If a software object is signed, who should be trusted to sign it, and can they be trusted forever?
Hosted by the Software Freedom Conservancy and taking place in Portland, Oregon, USA, FOSSY aims to be a community-focused event: "Whether you are a long time contributing member of a free software project, a recent graduate of a coding bootcamp or university, or just have an interest in the possibilities that free and open source software bring, FOSSY will have something for you." More information on the event is available on the FOSSY 2025 website, including the full programme schedule. Vagrant and Chris also staffed a table, where they were available to answer any questions about Reproducible Builds and discuss collaborations with other projects.

New OSS Rebuild project from Google The Google Open Source Security Team (GOSST) published an article this month announcing OSS Rebuild, a new project to strengthen trust in open source package ecosystems by reproducing upstream artifacts. As the post itself documents, the new project comprises four facets:
  • Automation to derive declarative build definitions for existing PyPI (Python), npm (JS/TS), and Crates.io (Rust) packages.
  • SLSA Provenance for thousands of packages across our supported ecosystems, meeting SLSA Build Level 3 requirements with no publisher intervention.
  • Build observability and verification tools that security teams can integrate into their existing vulnerability management workflows.
  • Infrastructure definitions to allow organizations to easily run their own instances of OSS Rebuild to rebuild, generate, sign, and distribute provenance.
Unlike most projects, which aim for bit-for-bit reproducibility, OSS Rebuild aims for a kind of semantic reproducibility:
Through automation and heuristics, we determine a prospective build definition for a target package and rebuild it. We semantically compare the result with the existing upstream artifact, normalizing each one to remove instabilities that cause bit-for-bit comparisons to fail (e.g. archive compression).
The extensive post includes examples about how to access OSS Rebuild attestations using the Go-based command-line interface.
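To picture what normalizing away compression instabilities can mean in practice, here is a deliberately crude shell sketch (the file names are hypothetical, and OSS Rebuild's actual normalization is far more sophisticated): two gzip archives that differ bit-for-bit may still carry an identical payload:

# the compressed archives differ bit-for-bit (compression level, metadata)...
cmp upstream.tar.gz rebuilt.tar.gz || echo "archives differ"
# ...but decompressing strips that unstable layer before comparing
gzip -dc upstream.tar.gz > a.tar
gzip -dc rebuilt.tar.gz > b.tar
cmp a.tar b.tar && echo "identical payload"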

New extension of Python setuptools to support reproducible builds Wim Jeantine-Glenn has written a PEP 517 build backend to enable reproducible builds when building Python projects that use setuptools. Called setuptools-reproducible, the project's README file contains the following:
Setuptools can create reproducible wheel archives (.whl) by setting SOURCE_DATE_EPOCH at build time, but setting the env var is insufficient for creating reproducible sdists (.tar.gz). setuptools-reproducible [therefore] wraps the build_sdist and build_wheel hooks with some modifications to make builds reproducible by default.
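To illustrate the general mechanism the README refers to, a sketch like the following (the project directory and the use of the python3-build tool are assumptions for illustration) pins SOURCE_DATE_EPOCH so that repeated wheel builds come out bit-for-bit identical:

# build the same project twice with a pinned timestamp
SOURCE_DATE_EPOCH=$(git log -1 --pretty=%ct)
export SOURCE_DATE_EPOCH
python3 -m build --wheel --outdir out-1
python3 -m build --wheel --outdir out-2
sha256sum out-1/*.whl out-2/*.whl   # checksums should match; plain sdists would not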

diffoscope diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 301, 302 and 303 to Debian:
  • Improvements:
    • Use Difference.from_operation in an attempt to pipeline the output of the extract-vmlinux script, potentially avoiding holding it all in memory. [ ]
    • Memoize a number of calls to --version, saving a very large number of external subprocess calls.
  • Bug fixes:
    • Don't check for PyPDF version 3 specifically; check for versions greater than 3. [ ]
    • Ensure that Java class files are named .class on the filesystem before passing them to javap(1). [ ]
    • Mask stderr from extract-vmlinux script. [ ][ ]
    • Avoid spurious differences in h5dump output caused by exposure of absolute internal extraction paths. (#1108690)
  • Misc:
    • Use our_check_output in the ODT comparator. [ ]
    • Update copyright years. [ ]
Lastly, Chris Lamb added a tmpfs to try.diffoscope.org so that diffoscope has a non-trivial temporary area to unpack archives, etc. [ ] Elsewhere in our tooling, reprotest is our tool for building the same source code twice in different environments and then checking the binaries produced by each build for any differences. This month, reprotest version 0.7.30 was uploaded to Debian unstable by Holger Levsen, chiefly including a change by Rebecca N. Palmer to not call sudo with the -h flag in order to fix Debian bug #1108550. [ ]
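For readers unfamiliar with reprotest, a typical invocation looks something like the following sketch (the build command and artifact glob are illustrative):

# build twice while varying the environment, then diff the resulting artifacts
reprotest 'dpkg-buildpackage -us -uc -b' '../*.deb'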

New library to patch system functions for reproducibility Nicolas Graves has written and published libfate, a simple collection of tiny libraries to patch system functions deterministically using LD_PRELOAD. According to the project's README:
libfate provides deterministic replacements for common non-deterministic system functions that can break reproducible builds. Instead of relying on complex build systems or apps or extensive patching, libfate uses the LD_PRELOAD trick to intercept system calls and return fixed, predictable values.
Describing why he wrote it, Nicolas writes:
I originally used the openSUSE dettrace approach to make Emacs reproducible in Guix. But when Guix switched to GCC@14, dettrace stopped working as expected. dettrace is a complex piece of software; my need was much less heavy: I don't need to systematically patch all sources of nondeterminism, just the ones that make a process/binary unreproducible in a container/chroot.
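To make the LD_PRELOAD trick concrete, here is a minimal standalone sketch in the same spirit (an illustration, not libfate's actual code): a shim that pins the system clock for most dynamically linked programs:

# write a tiny shim that pins CLOCK_REALTIME to a fixed date
cat > fixed_clock.c <<'EOF'
#include <time.h>
int clock_gettime(clockid_t clk, struct timespec *ts) {
    ts->tv_sec = 1735689600;   /* 2025-01-01 00:00:00 UTC */
    ts->tv_nsec = 0;
    return 0;
}
EOF
cc -shared -fPIC -o fixed_clock.so fixed_clock.c
LD_PRELOAD=$PWD/fixed_clock.so date -u   # should always print the pinned date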

Independently Reproducible Git Bundles Simon Josefsson has published another interesting article this month. Titled Independently Reproducible Git Bundles, the blog post describes why you might want a reproducible bundle, and the pitfalls that can arise when trying to create one:
One desirable property is that someone else should be able to reproduce the same git bundle, and not only that a single individual is able to reproduce things on one machine. It surprised me to see that when I ran the same set of commands on a different machine (started from a fresh git clone), I got a different checksum. The different checksums occurred even when nothing had been committed on the server side between the two runs.

Website updates Once again, a number of improvements were made to our website this month.

Distribution work In Debian this month:
Debian contributors have made significant progress toward ensuring package builds produce byte-for-byte reproducible results. You can check the status of the packages installed on your system using the new debian-repro-status package, or visit reproduce.debian.net for Debian's overall statistics for trixie and later. You can contribute to these efforts by joining #debian-reproducible on IRC to discuss fixes, or verify the statistics by installing the new rebuilderd package and setting up your own instance.
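Checking your own system takes only a couple of commands (a quick sketch; the exact output format may differ):

sudo apt install debian-repro-status
debian-repro-status   # reports the reproducibility status of your installed packages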

The IzzyOnDroid Android APK repository made further progress in July, crossing the 50% reproducibility threshold. Congratulations! Furthermore, a new version of the Neo Store was released, which exposes the reproducible status directly next to the version of each app.
In GNU Guix, a series of patches intended to fix the reproducibility for the Mono programming language was merged, fixing reproducibility in Mono versions 1.9 [ ], 2.4 [ ] and 2.6 [ ].
Lastly, in addition to the news that SUSE Linux Enterprise now has an official goal of reproducibility (https://lists.reproducible-builds.org/pipermail/rb-general/2025-July/003846.html), Bernhard M. Wiedemann posted another monthly update on their work there.

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In July, a number of changes were made by Holger Levsen, including:
  • Switch the URL for the Tails package set. [ ]
  • Make the dsa-check-packages output more useful. [ ]
  • Set up the ppc64el architecture again, as it has returned, this time with a 2.7 GiB database instead of 72 GiB. [ ]
In addition, Jochen Sprickerhof improved the reproducibility statistics generation:
  • Enable caching of statistics. [ ][ ][ ]
  • Add some common non-reproducible patterns. [ ]
  • Change output to directory. [ ]
  • Add a page sorted by diffoscope size. [ ][ ]
  • Switch to Python s argparse module and separate output(). [ ]
Holger also submitted a number of Debian bugs against rebuilderd and rebuilderd-worker:
  • Config files and scripts for a simple one machine setup. [ ][ ]
  • Create a rebuilderd user. [ ]
  • Create rebuilderd-worker user with sbuild. [ ]
Lastly, Mattia Rizzolo added a scheduled job to renew some SSL certificates [ ] and Vagrant Cascadian performed some node maintenance [ ][ ].

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, and there were also a number of patches from openSUSE developers.

Finally, if you are interested in contributing to the Reproducible Builds project, please visit the Contribute page on our website, or get in touch with us directly.

5 August 2025

Michael Ablassmeier: PVE 9.0 - Snapshots for LVM

The new Proxmox release advertises a new feature for easier snapshot handling of virtual machines whose disks are stored on LVM volumes. I wondered: what's the deal? To be able to use the new feature, you need to enable a special flag for the LVM volume group. This example shows the general workflow for a fresh setup. 1) Create the volume group with the snapshot-as-volume-chain feature turned on:
 pvesm add lvm lvmthick --content images --vgname lvm --snapshot-as-volume-chain 1
2) From this point on, you can create virtual machines right away, BUT those virtual machines' disks must use the QCOW image format for their disk volumes. If you use the RAW format, you still won't be able to create snapshots.
 VMID=401
 qm create $VMID --name vm-lvmthick
 qm set $VMID -scsi1 lvmthick:2,format=qcow2
So, why would it make sense to format the LVM volume as QCOW? Snapshots on LVM thick-provisioned devices are, as everybody knows, a very I/O intensive task. For each snapshot, a special -cow device is created that tracks the changed block regions and the original block data for each change to the active volume. This wastes quite some space within your volume group for each snapshot. Formatting the LVM volume as a QCOW image makes it possible to use the QCOW backing-image option for these devices, and this is the way PVE 9 handles these kinds of snapshots. Creating a snapshot looks like this:
 qm snapshot $VMID id
 snapshotting 'drive-scsi1' (lvmthick3:vm-401-disk-0.qcow2)
 Renamed "vm-401-disk-0.qcow2" to "snap_vm-401-disk-0_id.qcow2" in volume group "lvm"
 Rounding up size to full physical extent 1.00 GiB
 Logical volume "vm-401-disk-0.qcow2" created.
 Formatting '/dev/lvm/vm-401-disk-0.qcow2', fmt=qcow2 cluster_size=131072 extended_l2=on preallocation=metadata compression_type=zlib size=1073741824 backing_file=snap_vm-401-disk-0_id.qcow2 backing_fmt=qcow2 lazy_refcounts=off refcount_bits=16
So it will rename the currently active disk and create another QCOW formatted LVM volume, pointing it to the snapshot image using the backing_file option. Neat.
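You can inspect the resulting chain yourself with qemu-img (a sketch, using the volume path from the output above):

 qemu-img info --backing-chain /dev/lvm/vm-401-disk-0.qcow2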

4 August 2025

Aigars Mahinovs: Snapshot mirroring in Debian (and Ubuntu)

The use of snapshots has been routine in both Debian and Ubuntu for several years now, or more than 15 years for Debian, to be precise. Snapshots have become not only very reliable, but also an increasingly important part of the Debian package archive. This week, I encountered a problem at work that could be perfectly solved by correctly using the snapshot service. However, while trying to figure it out, I ran into some shortcomings in the documentation. Until the docs are updated, I am publishing this blog post to make this information easier to find.
Problem 1: Ensure fully reproducible creation of Docker containers with the exact same packages installed, even years after the original images were generated.
Solution 1: Pin everything! Use a pinned source image in the FROM statement, such as debian:trixie-20250721-slim, and also pin the APT package sources to the "same" date - "20250722". Hint: the APT packages need to be newer than the Docker image base. If the APT packages are a bit newer, that's not a problem, as APT can upgrade packages without issues. However, if your Docker image has a newer package than your APT package sources, you will have a big problem. For example, if you have "libbearssl0" version 0.6-2 installed in the Docker image, but your package sources only have the older version 0.6-1, you will fail when trying to install the "libbearssl-dev" package. This is because you only have version 0.6-1 of the "-dev" package available, which hard-depends on exactly version 0.6-1 of "libbearssl0", and APT will refuse to downgrade an already installed package to satisfy that dependency.
Problem 2: You are using a lot of images in a lot of executions and building tens of thousands of images per day. It would be a bad idea to put all this load on public Debian servers. Using local sources is also faster and adds extra security.
Solution 2: Use local (transparently caching) mirrors for both the Docker Hub repository and the APT package source. At this point, I ran into another issue: I could not easily figure out how to specify a local mirror for the snapshot part of the archive service.
First of all, snapshot support in both Ubuntu and Debian accepts both syntaxes described in the Debian and Ubuntu documentation above. The documentation on both sites presents different approaches and syntax examples, but both work. The best approach nowadays is to use the "deb822" sources syntax. Remove /etc/apt/sources.list (if it still exists), delete all contents of the /etc/apt/sources.list.d directory, and instead create this file at /etc/apt/sources.list.d/debian.sources:
Types: deb
URIs: https://common.mirror-proxy.local/ftp.debian.org/debian/
Suites: trixie
Components: main non-free-firmware non-free contrib
Signed-By: /usr/share/keyrings/debian-archive-keyring.gpg
Snapshot: 20250722
Hint: This assumes you have a mirror service running at common.mirror-proxy.local that proxies requests (with caching) to whitelisted domains, based on the name of the first folder in the path. If you now run sudo apt update --print-uris, you will see that your configuration accesses your mirror, but does not actually use the snapshot. Next, add the following to /etc/apt/apt.conf.d/80snapshots:
APT::Snapshot "20250722";
That should work, right? Let's try sudo apt update --print-uris again. I've got good news and bad news! The good news is that we are now actually using the snapshot we specified (twice). The bad news is that we are completely ignoring the mirror and going directly to snapshots.debian.org instead. Finding the right information was a bit of a challenge, but after a few tries, this worked: to specify a custom local mirror of the Debian (or Ubuntu) snapshot service, simply add the following line to the same file, /etc/apt/apt.conf.d/80snapshots:
Acquire::Snapshots::URI::Override::Origin::debian "https://common.mirror-proxy.local/snapshot.debian.org/archive/debian/@SNAPSHOTID@/";
Now, if you check again with sudo apt update --print-uris, you will see that the requests go to your mirror and include the specified snapshot identifier. Success! Now you can install any packages you want, and everything will be completely local and fully reproducible, even years later!
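As a final sanity check, the whole setup can be verified in one go (assuming the mirror host and snapshot ID from the examples above):

sudo apt update --print-uris | head -n 3
# the URIs should now start with:
# https://common.mirror-proxy.local/snapshot.debian.org/archive/debian/20250722/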

2 August 2025

Raju Devidas: Use phone/tablets/other laptops as external monitor with your laptop

This method is for Wayland based systems. There are better ways to do this on GNOME or KDE desktops, but the method we are going to use is independent of the DE/WM that you are using. I am doing this on the sway window manager, but you can try this on any other Wayland based WM or DE. I have not tried this on Xorg based systems; there are several other guides for Xorg based systems online. When we connect a physical monitor to our laptops, it creates a second display output in our display settings that we can then re-arrange in layout, set resolution, set scale etc. Since we are not connecting via a physical interface like HDMI, DP or VGA, we need to create a virtual display within our system and set the display properties manually.
Get a list of current display outputs. You can also just check it in the display settings of your DE/WM, e.g. with wdisplays:
rajudev@sanganak ~> swaymsg -t get_outputs
Output LVDS-1 'Seiko Epson Corporation 0x3047 Unknown' (focused)
  Current mode: 1366x768 @ 60.002 Hz
  Power: on
  Position: 0,0
  Scale factor: 1.000000
  Scale filter: nearest
  Subpixel hinting: rgb
  Transform: normal
  Workspace: 2
  Max render time: off
  Adaptive sync: disabled
  Allow tearing: no
  Available modes:
    1366x768 @ 60.002 Hz
[Image: Single physical display of the laptop]
Currently we are seeing only one display output. Our goal is to create a second virtual display that we will then share to the tablet/phone. There are various tools available to do this. We are using sway-vdctl. It is currently not available in the Debian archive, so we need to install it manually.
$ git clone https://github.com/odincat/sway-vdctl.git
$ cd sway-vdctl
$ cargo build --release
This will generate the binary with the name main under target/release. We can then copy this binary to our bin folder.
$ sudo cp target/release/main /usr/local/bin/vdctl
Now we have the vdctl command available.
$ vdctl --help
Usage: vdctl [OPTIONS] <ACTION> [VALUE]
Arguments:
  <ACTION>
          Possible values:
          - create:      Create new output based on a preset
          - kill:        Terminate / unplug an active preset
          - list:        List out active presets
          - next-number: Manually set the next output number, in case something breaks
          - sync-number: Sync the next output number using 'swaymsg -t get_outputs'
  [VALUE]
          Preset name to apply, alternatively a value
          
          [default: ]
Options:
      --novnc
          do not launch a vnc server, just create the output
  -h, --help
          Print help (see a summary with '-h')
Before creating the virtual display, we need to set its properties in ~/.config/vdctl/config.json. I am using a Xiaomi Pad 6 tablet as my external display. You can adjust the properties according to the device you want to use as a second display. $ (text-editor) ~/.config/vdctl/config.json
{
    "host": "0.0.0.0",
    "presets": [
        {
            "name": "pad6",
            "scale_factor": 2,
            "port": 9901,
            "resolution": {
                "width": 2800,
                "height": 1800
            }
        }
    ]
}
In the JSON, you can set the display resolution according to your external device, and other configurations. If you want to configure multiple displays, you can add another entry to the presets array in the JSON file. You can refer to the example JSON file in the git repository. Now we need to actually create the virtual monitor.
$ vdctl create pad6
Created output, presumably 'HEADLESS-1'
Set resolution of 'HEADLESS-1' to 2800x1800
Set scale factor of 'HEADLESS-1' to 2
Preset 'pad6' ('HEADLESS-1': 2800x1800) is now active on port 9901
Now if you check the display outputs in your display settings or from the command line, you will see two different displays.
$ swaymsg -t get_outputs
Output LVDS-1 'Seiko Epson Corporation 0x3047 Unknown'
  Current mode: 1366x768 @ 60.002 Hz
  Power: on
  Position: 0,0
  Scale factor: 1.000000
  Scale filter: nearest
  Subpixel hinting: rgb
  Transform: normal
  Workspace: 2
  Max render time: off
  Adaptive sync: disabled
  Allow tearing: no
  Available modes:
    1366x768 @ 60.002 Hz
Output HEADLESS-1 'Unknown Unknown Unknown' (focused)
  Current mode: 2800x1800 @ 0.000 Hz
  Power: on
  Position: 1366,0
  Scale factor: 2.000000
  Scale filter: nearest
  Subpixel hinting: unknown
  Transform: normal
  Workspace: 3
  Max render time: off
  Adaptive sync: disabled
  Allow tearing: no
Also in the display settings.
[Image: Display settings on Wayland with physical and virtual monitor output]
Now we need to make this virtual display available over VNC, which we will access with a VNC client on the tablet. To accomplish this I am using wayvnc, but you can use any VNC server package. Install wayvnc:
$ sudo apt install wayvnc
Now we will serve our virtual display HEADLESS-1 with wayvnc.
$ wayvnc -o HEADLESS-1 0.0.0.0 5900
You can adjust the port number as per your need. The process from the laptop side is done. Now install any VNC software on your tablet. I am using AVNC, which is available on F-Droid. In the VNC software interface, add a new connection with the IP address of your laptop and the port started by wayvnc. Remember, both your laptop and phone need to be on the same Wi-Fi network.
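If you are unsure which IP address to enter in the client, you can look it up on the laptop first (interface names will vary per machine):

ip -4 -brief addr show   # note the address of your Wi-Fi interface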
[Image: AVNC interface with the connection details to connect to the virtual monitor.]
Save and connect. Now you will be able to see an extended display on your tablet. Enjoy working with multiple screens in a portable setup. Till next time.. Have a great time.

1 August 2025

puer-robustus: My Google Summer of Code '25 at Debian

I participated in this year's Google Summer of Code (GSoC) program and have been working on the small (90h) "autopkgtests for the rsync package" project at Debian.

Writing my proposal Before you can start writing a proposal, you need to select an organization you want to work with. Since many organizations participate in GSoC, I used the following criteria to narrow things down:
  • Programming language familiarity: For me, only Python (preferably) as well as shell and Go projects would have made sense. While learning another programming language is cool, I wouldn't be as effective and helpful to the project as someone who is already proficient in the language.
  • Standing of the organization: Some of the organizations participating in GSoC are well-known for the outstanding quality of the software they produce. Debian is one of them, but so is e.g. the Django Foundation or PostgreSQL. And my thinking was that the higher the quality of the organization, the more there is to learn for me as a GSoC student.
  • Mentor interactions: Apart from the advantage you get from mentor feedback when writing your proposal (more on that further below), it is also helpful to gauge how responsive/helpful your potential mentor is during the application phase. This is important since you will be working together for a period of at least 2 months; if the mentor-student communication doesn't work, the GSoC project is going to be difficult.
  • Free and Open-Source Software (FOSS) communication platforms: I generally believe that FOSS projects should be built on FOSS infrastructure. I personally won't run proprietary software when I want to contribute to FOSS in my spare time.
  • Be a user of the project: As Eric S. Raymond pointed out in his seminal The Cathedral and the Bazaar 25 years ago:
    Every good work of software starts by scratching a developer's personal itch.
Once I had some organizations in mind whose projects I'd be interested in working on, I started writing proposals for them. Turns out, I started writing my proposals way too late: in the end I only managed to hand in a single one, which is risky. Competition for the GSoC projects is fierce, and the more quality (!) proposals you send out, the better your chances are at getting one. However, don't write proposals for the sake of it: reviewers get way too many AI-slop proposals already, and you will not do yourself a favor with a low-quality proposal. Take the time to read the instructions/ideas/problem descriptions the project mentors have provided and follow their guidelines. Don't hesitate to reach out to project mentors: in my case, I asked Samuel Henrique a few clarifying questions, and the ensuing (email) discussion helped me greatly in improving my proposal. Once I had finalized my proposal draft, I sent it to Samuel for a review, which again led to some improvements to the final proposal that I uploaded to the GSoC program webpage.

Community bonding period Once you get the information that you've been accepted into the GSoC program (don't take it personally if you don't make it; this was my second attempt after not making the cut in 2024), get in touch with your prospective mentor ASAP. Agree upon a communication channel and some response times. Put yourself in the loop for project news and discussions, whatever that means in the context of your organization: in Debian's case this boiled down to subscribing to a bunch of mailing lists and IRC channels. Also make sure to set up a functioning development environment, if you haven't done so for writing the proposal already.

Payoneer setup The by far most annoying part of GSoC for me. But since you don't have a choice if you want to get the stipend, you will need to sign up for an account at Payoneer. In this iteration of GSoC, all participants got a personalized link to open a Payoneer account. When I tried to open an account by following this link, I got an email after the registration and email verification that my account was being blocked because Payoneer deemed the email address I gave a temporary one. Well, the email in question is most certainly anything but temporary, so I tried to get in touch with the Payoneer support - and ended up in an LLM-infused kafkaesque support hell. Emails are answered by an LLM, which for me meant utterly off-topic replies and no help whatsoever. The Payoneer website offers a real-time chat, but it is yet another instance of a bullshit-spewing LLM bot. When I at last tried to call them (the support lines are not listed on the Payoneer website but were provided by the GSoC program), I kid you not, I was told that their platform was currently suffering from technical problems and was hung up on. Only thanks to the swift and helpful support of the GSoC administrators (who get priority support from Payoneer) was I able to set up a Payoneer account in the end. Apart from showing no respect to customers, Payoneer is also ripping them off big time with fees (unless you get paid in USD). They charge you 2% for currency conversions to EUR on top of the FX spread they take. What worked for me to avoid all of those fees was to open a USD account at Wise and have Payoneer transfer my GSoC stipend in USD to that account. Then I exchanged the USD to my local currency at Wise for significantly less than Payoneer would have charged me. Also make sure to close your Payoneer account after the end of GSoC to avoid their annual fee.

Project work With all this prelude out of the way, I can finally get to the actual work I've been doing over the course of my GSoC project.

Background The upstream rsync project generally sees little development. Nonetheless, they released version 3.4.0 including some CVE fixes earlier this year. Unfortunately, their changes broke the -H flag. Now, Debian package maintainers need to apply those security fixes to the package versions in the Debian repositories, and those are typically a bit older. This usually means that the patches cannot be applied as-is but will need some amendments by the Debian maintainers. For these cases it is helpful to have autopkgtests defined, which check the package's functionality in an automated way upon every build. The question then is: why should the tests not be written upstream, such that regressions are caught in the development rather than the distribution process? There's a lot to say on this question and it probably depends a lot on the package at hand, but for rsync the main benefits are twofold:
  1. The upstream project mocks the ssh connection over which rsync is most typically used. Mocking is better than nothing but not the real thing. In addition to being a more realistic test scenario for the typical rsync use case, involving an ssh server in the test automatically extends the overall resilience of Debian packages, as new versions of the openssh-server package in Debian now benefit from the test cases in the rsync reverse dependency.
  2. The upstream rsync test framework is somewhat idiosyncratic and difficult to port to reimplementations of rsync. Given that the original rsync upstream sees little development, an extensive test suite further downstream can serve as a threshold for drop-in replacements for rsync.

Goal(s) At the start of the project, the Debian rsync package was just running (a part of) the upstream tests as autopkgtests. The relevant snippet from the build log for the rsync_3.4.1+ds1-3 package reads:
114s ------------------------------------------------------------
114s ----- overall results:
114s 36 passed
114s 7 skipped
Samuel and I agreed that it would be a good first milestone to make the skipped tests run. Afterwards, I should write some rsync test cases for local calls, i.e. without an ssh connection, effectively using rsync as a more powerful cp. And once that was done, I should extend the tests such that they run over an active ssh connection. With these milestones, I went to work.

Upstream tests Running the seven skipped upstream tests turned out to be fairly straightforward:
  • Two upstream tests concern access control lists and extended filesystem attributes. These tests rely on functionality provided by the acl and xattr Debian packages. Adding those to the Build-Depends list in the debian/control file of the rsync Debian package repo made them run.
  • Four upstream tests required root privileges to run. The autopkgtest tool knows the needs-root restriction for that reason. However, Samuel and I agreed that the tests should not exclusively run with root privileges. So, instead of just adding the restriction to the existing autopkgtest test, we created a new one which has the needs-root restriction and runs the upstream-tests-as-root script - which is nothing else than a symlink to the existing upstream-tests script.
The commits to implement these changes can be found in this merge request. The careful reader will have noticed that I only made 2 + 4 = 6 upstream test cases run out of 7: the leftover upstream test checks the functionality of the --crtimes rsync option. In the context of Debian, the problem is that the Linux kernel doesn't have a syscall to set the creation time of a file. As long as that is the case, this test will always be skipped for the Debian package.

Local tests When it came to writing Debian specific test cases I started from a completely clean slate. Which is a blessing and a curse at the same time: you have full flexibility but also full responsibility. There were a few things to consider at this point in time:
  • Which language to write the tests in? The programming language I am most proficient in is Python. But testing a CLI tool in Python would have been weird: it would have meant that I'd have to make repeated subprocess calls to run rsync and then read from the filesystem to get the file statistics I want to check. Samuel suggested I stick with shell scripts and make use of diffoscope - one of the main tools used and maintained by the Reproducible Builds project - to check whether the file contents and file metadata are as expected after rsync calls. Since I did not have good reasons to use bash, I decided to write the scripts to be POSIX compliant.
  • How to avoid boilerplate? If one makes use of a testing framework, which one? Writing the tests would involve quite a bit of boilerplate, mostly related to giving informative output on and during the test run, preparing the file structure we want to run rsync on, and cleaning the files up after the test has run. It would be very repetitive and in violation of DRY to have the code for this appear in every test. Good testing frameworks should provide convenience functions for these tasks. shunit2 comes with those functions, is packaged for Debian, and given that it is already being used in the curl project, I decided to go with it (a minimal sketch follows this list).
  • Do we use the same directory structure and files for every test or should every test have an individual setup? The tradeoff in this question being test isolation vs. idiosyncratic code. If every test has its own setup, it takes a) more work to write the test and b) more work to understand the differences between tests. However, one can be sure that changes to the setup in one test will have no side effects on other tests. In my opinion, this guarantee was worth the additional effort in writing/reading the tests.
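To give a flavour of what such a test looks like, here is a minimal sketch of a shunit2 test case in that spirit; the file layout and test names are illustrative, not the actual Debian rsync tests:

#!/bin/sh
# minimal illustrative shunit2 test: names and checks are hypothetical
setUp() {
    SRC=$(mktemp -d)
    DST=$(mktemp -d)
    echo hello > "$SRC/file"
    touch -d '2025-01-01 00:00:00' "$SRC/file"   # pin the mtime explicitly
}
tearDown() {
    rm -rf "$SRC" "$DST"
}
test_times_option_preserves_mtime() {
    rsync --times "$SRC/file" "$DST/file"
    # diffoscope exits non-zero when it finds a difference
    diffoscope "$SRC/file" "$DST/file" || fail "source and destination differ"
}
# shunit2 is sourced last and runs all test_* functions
. shunit2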
Having made these decisions, I simply started writing tests and ran into issues very quickly.

rsync and subsecond mtime diffs When testing the rsync --times option, I observed a weird phenomenon: if the source and destination files have modification times which differ only in the nanoseconds, an rsync --times call will not synchronize the modification times. More details about this behavior and examples can be found in the upstream issue I raised. In the Debian tests we had to occasionally work around this by setting the timestamps explicitly with touch -d.
diffoscope regression In one test case, I was expecting a difference in the modification times but diffoscope would not report a diff. After a good amount of time spent on debugging the problem (my default, and usually correct, assumption is that something about my code is seriously broken if I run into issues like that), I was able to show that diffoscope only displayed this behavior in the version in the unstable suite, not on Debian stable (which I am running on my development machine). Since everything pointed to a regression in the diffoscope project and with diffoscope being written in Python, a language I am familiar with, I wanted to spend some time investigating (and hopefully fixing) the problem. Running git bisect on the diffoscope repo helped me in identifying the commit which introduced the regression: The commit contained an optimization via an early return for bit-by-bit identical files. Unfortunately, the early return also caused an explicitly requested metadata comparison (which could be different between the files) to be skipped. With a nicely diagnosed issue like that, I was able to go to a local hackerspace event, where people work on FOSS together for an evening every month. In a group, we were able to first, write a test which showcases the broken behavior in the latest diffoscope version, and second, make a fix to the code such that the same test passes going forward. All details can be found in this merge request.
shunit2 failures At some point I had a few autopkgtests set up and passing, but adding a new one would throw totally inexplicable errors at me. After trying to isolate the problem as much as possible, it turned out that shunit2 doesn't play well with the -e shell option. The project mentions this in the release notes for the 2.1.8 version [1], but in my opinion a constraint this severe should be featured much more prominently, e.g. in the README.

Tests over an ssh connection The centrepiece of this project; everything else has in a way only been preparation for this. Obviously, the goal was to reuse the previously written local tests in some way. Not only because lazy me would have less work to do this way, but also because of the reduced long-term maintenance burden of one rather than two test sets. As it turns out, it is actually possible to accomplish that: the remote-tests script doesn't do much apart from starting an ssh server on localhost and running the local-tests script with the REMOTE environment variable set. The REMOTE environment variable changes the behavior of the local-tests script in such a way that it prepends "$REMOTE": to the destination of the rsync invocations. And given that we set REMOTE=rsync@localhost in the remote-tests script, local-tests copies the files to the exact same locations as before, just over ssh. The implementation details for this can be found in this merge request.
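Conceptually, the destination handling boils down to something like this (a simplified sketch with hypothetical variable names, not the actual script):

# prepend "$REMOTE": to the destination when testing over ssh
DEST="$DSTDIR/"
if [ -n "$REMOTE" ]; then
    DEST="$REMOTE:$DEST"    # e.g. REMOTE=rsync@localhost
fi
rsync --times "$SRCDIR/file" "$DEST"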

proposed-updates Most of my development work on the Debian rsync package took place during the Debian freeze, as the release of Debian Trixie is just around the corner. This means that uploading by Debian Developers (DD) and Debian Maintainers (DM) to the unstable suite is discouraged, as it makes migrating the packages to testing more difficult for the Debian release team. If DDs/DMs want to have the package version in unstable migrated to testing during the freeze, they have to file an unblock request. Samuel has done this twice (1, 2) for my work for Trixie, but has asked me to file the proposed-updates request for current stable (i.e. Debian Bookworm) myself after I've backported my tests to bookworm.

Unfinished business To run the upstream tests which check access control list and extended file system attributes functionality, I've added the acl and xattr packages to Build-Depends in debian/control. This, however, will only make the packages available at build time: if Debian users install the rsync package, the acl and xattr packages will not be installed alongside it. For that, the dependencies would have to be added to Depends or Suggests in debian/control. Depends is probably too strong a relation, since rsync clearly works well in practice without them, but adding them to Suggests might be worthwhile. A decision on this would involve checking what happens if rsync is called with the relevant options on a host machine which has those packages installed, but where the destination machine lacks them. Apart from the issue described above, the 15 tests I managed to write are a drop in the ocean in light of the infinitude of rsync options and their combinations. Most glaringly, not all options of the --archive option are covered separately (which would help indicate which code path of rsync broke in a regression). To increase the likelihood of catching regressions with the autopkgtests, the test coverage should be extended in the future.

Conclusion Generally, I am happy with my contributions to Debian over the course of my small GSoC project: I've created an extensible, easy to understand, and working autopkgtest setup for the Debian rsync package. There are two things which bother me, however:
  1. In hindsight, I probably shouldn't have gone with shunit2 as a testing framework. The fact that it behaves erratically with the -e flag is a serious drawback for a shell testing framework: you really don't want a shell command to fail silently and the test to continue running.
  2. As alluded to in the previous section, I'm not particularly proud of the number of tests I managed to write.
On the other hand, finding and fixing the regression in diffoscope - while derailing me from the GSoC project itself - might have a redeeming quality.

DebConf25 By sheer luck I happened to work on a GSoC project at Debian over a time period during which the annual Debian conference would take place close enough to my place of residence. Samuel pointed the opportunity to attend DebConf out to me during the community bonding period, and since I could make time for the event in my schedule, I signed up. DebConf was a great experience which - aside from gaining more knowledge about Debian development - allowed me to meet the actual people usually hidden behind email addresses and IRC nicks. I can wholeheartedly recommend attending a DebConf to every interested Debian user! For those who missed this year's iteration of the conference, I can recommend watching the recorded talks. While not featuring as a keynote speaker (understandably so, as the newcomer to the Debian community that I am), I could still contribute a bit to the conference program.

GSoC project presentation The Debian Outreach team has scheduled a session in which all GSoC and Outreachy students over the past year had the chance to present their work in a lightning talk. The session has been recorded and is available online, just like my slides and the source for them.

Debian install workshop Additionally, with so many Debian experts gathering in one place while KDE's End of 10 campaign is ongoing, I felt it natural to organize a Debian install workshop. In hindsight I can say that I underestimated how much work it would be, especially for me, who does not speak a word of French. But although the turnout of people who wanted us to install Linux on their machines was disappointingly low, it was still worth it: not only because the material in the repo can be helpful to others planning install workshops, but also because it was nice to meet a) the person behind the Debian installer images and b) the local Brest/Finistère Linux user group as well as the motivated and helpful people at Infini.

Credits I want to thank the Open Source team at Google for organizing GSoC: the highly structured program with a one-to-one mentorship is a great avenue to start contributing to well established and at times intimidating FOSS projects. And as much as I disagree with Google's surveillance-capitalist business model, I have to give it to them that the company at least takes its responsibility for FOSS (somewhat) seriously - unlike many other businesses which rely on FOSS and choose to freeride off of it. Big thanks to the Debian community! I've experienced nothing but friendliness in my interactions with the community. And lastly, the biggest thanks to my GSoC mentor Samuel Henrique. He has dealt patiently and competently with all my stupid newbie questions. His support enabled me to make - albeit small - contributions to Debian. It has been a pleasure to work with him during GSoC and I'm looking forward to working together with him in the future.

  1. Obviously, I've only read them after experiencing the problem.

31 July 2025

Simon Josefsson: Independently Reproducible Git Bundles

The gnulib project publishes a git bundle as a stable archival copy of the gnulib git repository once in a while. Why? We don't know exactly what this may be useful for, but I'm promoting this to see if we can establish some good use-case. A git bundle may help to establish provenance in case of an attack on the Savannah hosting platform that compromises the gnulib git repository. Another use is in the Debian gnulib package: that gnulib bundle is git cloned when building some Debian packages, to get to exactly the gnulib commit used by each upstream project (see my talk on gnulib at Debconf24), and this approach reduces the amount of vendored code that is part of Debian's source code, which is relevant to mitigate XZ-style attacks. The first time we published the bundle, I wanted it to be possible for others to re-create it bit-by-bit identically. At the time I discovered a well-written blog post by Paul Beacher on reproducible git bundles and thought he had solved the problem for me. Essentially it boils down to disabling threading during compression when producing the bundle, and his final example shows this results in a predictable bit-by-bit identical output:
$ for i in $(seq 1 100); do \
> git -c 'pack.threads=1' bundle create -q /tmp/bundle-$i --all; \
> done
$ md5sum /tmp/bundle-* | cut -f 1 -d ' ' | uniq -c
    100 4898971d4d3b8ddd59022d28c467ffca
So what remains to be said about this? It seems reproducibility goes deeper than that. One desirable property is that someone else should be able to reproduce the same git bundle, and not only that a single individual is able to reproduce things on one machine. It surprised me to see that when I ran the same set of commands on a different machine (started from a fresh git clone), I got a different checksum. The different checksums occurred even when nothing had been committed on the server side between the two runs. I thought the reason had to do with other sources of unpredictable data, and I explored several ways to work around this but eventually gave up. I settled for the following sequence of commands:
REV=ac9dd0041307b1d3a68d26bf73567aa61222df54 # master branch commit to package
git clone https://git.savannah.gnu.org/git/gnulib.git
cd gnulib
git fsck # attempt to validate input
# inspect that the new tree matches a trusted copy
git checkout -B master $REV # put $REV at master
for b in $(git branch -r | grep origin/stable- | sort --version-sort); do git checkout ${b#origin/}; done
git remote remove origin # drop some unrelated branches
git gc --prune=now # drop any commits after $REV
git -c 'pack.threads=1' bundle create gnulib.bundle --all
V=$(env TZ=UTC0 git show -s --date=format:%Y%m%d --pretty=%cd master)
mv gnulib.bundle gnulib-$V.bundle
build-aux/gnupload --to ftp.gnu.org:gnulib gnulib-$V.bundle
At the time it felt more important to publish something than to reach for perfection, so we did so using the above snippet. Afterwards I reached out to the git community on this and there were good discussions about my challenge. At the end of that thread you see that I was finally able to reproduce bit-by-bit identical bundles from two different clones, by using an intermediate git -c pack.threads=1 repack -adF step. I now assume that the unpredictable data I got earlier was introduced during the git clone steps, compressing the pack differently each time due to threaded compression. The outcome could also depend on what content the server provided, so if someone ran git gc or git repack on the server side, things would change for the user, even if the user forced threading to 1 during cloning; more experiments on what kind of server-side alterations result in client-side differences would be good research. A couple of months have passed and it is now time to publish another gnulib bundle, somewhat paired to the bi-yearly stable gnulib branches, so let's walk through the commands and explain what they do. First, clone the repository:
REV=225973a89f50c2b494ad947399425182dd42618c   # master branch commit to package
S1REV=475dd38289d33270d0080085084bf687ad77c74d # stable-202501 branch commit
S2REV=e8cc0791e6bb0814cf4e88395c06d5e06655d8b5 # stable-202507 branch commit
git clone https://git.savannah.gnu.org/git/gnulib.git
cd gnulib
git fsck # attempt to validate input
I believe the git fsck will validate that the chain of SHA1 commits is linked together, preventing someone from smuggling in unrelated commits earlier in the history without having to do a SHA1 collision. SHA1 collisions are economically feasible today, so this isn't much of a guarantee of anything though.
git checkout -B master $REV # put $REV at master
# Add all stable-* branches locally:
for b in $(git branch -r | grep origin/stable- | sort --version-sort); do git checkout ${b#origin/}; done
git checkout -B stable-202501 $S1REV
git checkout -B stable-202507 $S2REV
git remote remove origin # drop some unrelated branches
git gc --prune=now # drop any unrelated commits, not clear this helps
This establishes a set of branches pinned to particular commits. The older stable-* branches are no longer updated, so they shouldn't be moving targets. In case they are modified in the future, the particular commits we used will be found in the official git bundle.
time git -c pack.threads=1 repack -adF
That's the new magic command to repack and recompress things in a hopefully more predictable way. This leads to a 72MB git pack under .git/objects/pack/ and a 62MB git bundle. The runtime on my laptop is around 5 minutes. I experimented with -c pack.compression=1 and -c pack.compression=9, but the size was roughly the same: 76MB and 66MB for level 1, and 72MB and 62MB for level 9. Runtime still around 5 minutes. Git uses zlib by default, which isn't the most optimal compression around. I tried -c pack.compression=0 and got a 163MB git pack and a 153MB git bundle. The runtime is still around 5 minutes, indicating that compression is not the bottleneck for the git repack command. That 153MB uncompressed git bundle compresses to 48MB with gzip default settings and 46MB with gzip -9; to 39MB with zst defaults and 34MB with zst -9; and to 28MB using xz defaults, with a small 26MB using xz -9. Still, the inconvenience of having to uncompress a 30-40MB archive into the much larger 153MB is probably not worth the savings compared to shipping and using the (still relatively modest) 62MB git bundle. Now finally prepare the bundle and ship it:
git -c 'pack.threads=1' bundle create gnulib.bundle --all
V=$(env TZ=UTC0 git show -s --date=format:%Y%m%d --pretty=%cd master)
mv gnulib.bundle gnulib-$V.bundle
build-aux/gnupload --to ftp.gnu.org:gnulib gnulib-$V.bundle
Yay! Another gnulib git bundle snapshot is available from https://ftp.gnu.org/gnu/gnulib/. The essential part of the git repack command is the -F parameter. In the thread -f was suggested, which translates into the git pack-objects --no-reuse-delta parameter:
--no-reuse-delta
When creating a packed archive in a repository that has existing packs, the command reuses existing deltas. This sometimes results in a slightly suboptimal pack. This flag tells the command not to reuse existing deltas but compute them from scratch.
When reading the man page, I thought that using -F, which translates into --no-reuse-object, would be slightly stronger:
--no-reuse-object
This flag tells the command not to reuse existing object data at all, including non deltified object, forcing recompression of everything. This implies --no-reuse-delta. Useful only in the obscure case where wholesale enforcement of a different compression level on the packed data is desired.
On the surface, without --no-reuse-object, some amount of earlier compression could taint the final result. Still, I was able to get bit-by-bit identical bundles by using -f, so possibly reaching for -F is not necessary. All the commands were done using git 2.51.0 as packaged by Guix. I fear the result may be different with other git versions and/or zlib libraries. I was able to reproduce the same bundle on a Trisquel 12 aramo (derived from Ubuntu 22.04) machine, which uses git 2.34.1. This suggests there is some chance of this being possible to reproduce in 20 years' time. Time will tell. I also fear these commands may be insufficient if something is moving on the server side of the git repository of gnulib (even something as simple as a new commit); I tried to make some experiments with this, but let's aim for incremental progress here. At least I have now been able to reproduce the same bundle on different machines, which wasn't the case last time. Happy Reproducible Git Bundle Hacking!
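As a condensed recipe, a check of cross-clone reproducibility might look like the following sketch (simplified: it skips the branch pinning done above, so treat it as an illustration rather than the exact published procedure):

for d in clone-a clone-b; do
    git clone --quiet https://git.savannah.gnu.org/git/gnulib.git $d
    (cd $d &&
     git -c pack.threads=1 repack -adF &&
     git -c pack.threads=1 bundle create ../$d.bundle --all)
done
sha256sum clone-a.bundle clone-b.bundle   # the two checksums should match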

29 July 2025

Christoph Berg: The Debian Conference 2025 in Brest

It's Sunday and I'm now sitting in the train from Brest to Paris, where I will change to Germany, on the way back from the annual Debian conference. A full week of presentations, discussions, talks and socializing lies behind me and my head is still spinning from the intensity.
Pollito and the gang of DebConf mascots wearing their conference badges (photo: Christoph Berg)
Sunday, July 13th It started last Sunday with traveling to the conference. I got on the Eurostar in Duisburg and we left on time, but even before reaching Cologne, the train was already one hour delayed for external reasons, collecting yet another hour between Aachen and Liège for its own technical problems. "The train driver is working on trying to fix the problem." My original schedule had well over two hours for changing train stations in Paris, but being that late, I missed the connection to Brest in Montparnasse. At least in the end, the total delay was only one hour when finally arriving at the destination. With the French quatorze juillet fireworks approaching, buses in Brest were rerouted, but I managed to catch the right bus to the conference venue, already meeting a few Debian people on the way. The conference was hosted at the IMT Atlantique Brest campus, giving the event a nice university touch. I arrived shortly after 10 in the evening and after settling down a bit, got on one of the "magic" buses for transportation to the camping site where half of the attendees were stationed. I shared a mobile home with three other Debianites, where I got a small room for myself. Monday, July 14th Next morning, we took the bus back to the venue for a small breakfast and the opening session, where Enrico Zini invited me to come to his and Nicolas Dandrimont's session about Debian community governance and curation, which I gladly did. Many ideas about conflict moderation and community steering were floated around. I hope some of that can be put into effect to make flamewars on the mailing lists less heated and more directed. After that, I attended Olly Betts' "Stemming with Snowball" session, about the stemmer that is also used in PostgreSQL. Text search is one of the areas in PostgreSQL that I never really looked closely at, including the integration into the postgresql-common package, so it was nice to get more information about that. In preparation for the conference, a few of us Ham radio operators in Debian had decided to bring some radio gear to DebConf this year in order to perhaps spark more interest in our hobby among the fellow geeks. In the afternoon after the talks, I found a quieter spot just outside of the main hall and set up a shortwave antenna by attaching a 10m mast to one of the park benches there. The 40m band was still pretty much closed, but I could work a few stations from England, just across the channel from Bretagne, answering questions from interested Debian passers-by between the contacts. Over time, the band opened and more European stations got into the log.
F/DF7CB in Brest (photo: Evangelos Ribeiro Tzaras)
Tuesday, July 15th Tuesday started with Helmut Grohne's session about "Reviving (un)schroot". The schroot program has been Debian's standard way of managing build chroots for a long time, but it is more and more being regarded as obsolete with all kinds of newer containerization and virtualization technologies taking over. Since many bits of Debian infrastructure depend on schroot, and its user interface is still very useful, Helmut reimplemented it using Linux namespaces and the "unshare" system call. I had already worked with him at the Hamburg MiniDebConf to replace the apt.postgresql.org buildd machinery with the new system, but we were not quite there yet (network isolation is nice, but we still sometimes need proper networking), so it was nice to see the effort is still progressing, and I will give his new scripts a try when I'm back home. Next, Stefano Rivera and Colin Watson presented Debusine, a new package repository and workflow management system. It looks very promising for anyone running their own repository, so perhaps yet another bit of apt.postgresql.org infrastructure to replace in the future. After that, I went to the Debian LTS BoF session by Santiago Ruano Rincón and Bastien Roucariès - Debian releases plus LTS is what we are covering with apt.postgresql.org. Then there were bits from the DPL (Debian Project Leader), and a session moderated by Stefano Rivera (interesting to me as a member of the Debian Technical Committee) on the future structure of the packages required for cross-building in Debian, a topic which had been brought to the TC a while ago. I am happy that we could resolve the issue without having to issue a formal TC ruling, as the involved parties (kernel, glibc, gcc and the cross-build people) found a promising way forward themselves. DebConf is really a good way to get such issues unstuck. Ten years ago at the 2015 Heidelberg DebConf, Enrico had given a seminal "Semi-serious stand-up comedy" talk, drawing parallels between the Debian Open Source community and the BDSM community - "People doing things consensually together". (Back then, the talk was announced as "probably unsuitable for people of all ages".) With his unique presentation style and witty insights, the session made a lasting impression on everyone attending. Now, ten years later (and he and many in the audience being ten years older), he gave an updated version of it. We are now looking forward to the sequel in 2035. The evening closed with the famous DebConf tradition of the Cheese & Wine party in an old fort next to the coast, just below the conference venue. Even though he's a fellow Debian Developer, Ham and also TC member, I had never met Paul Tagliamonte in person before, but we spent most of the evening together geeking out on all things Debian and Ham radio.
The northern coast of Ushant (photo: Christoph Berg)
Wednesday, July 16th Wednesday already marked the end of the first half of the week, the day of the day trips. I had chosen to go to Ouessant island (Ushant in English), which marks the westernmost point of metropolitan France and hosts one of the lighthouses guiding the way into the English Channel. The ferry trip included surprisingly big waves which left some participants seasick, but everyone recovered fast. After around one and a half hours we arrived, picked up the bicycles, and spent the rest of the day roaming the island. The weather forecast had originally promised very cloudy skies and 18°C, but over noon this turned into sunny and warm, so many got an unplanned sunburn. I enjoyed the trip very much - it made up for not having time to visit the city during the week. After returning, we spent the rest of the evening playing DebConf's standard game, Mao (spoiler alert: don't follow the link if you ever intend to play).
Having a nice day (photo: Christoph Berg)
Thursday, July 17th The next day started with the traditional "Meet the Technical Committee" session. This year, we trimmed the usual slide deck down to remove the boring boilerplate parts, so after a very short introduction to the work of the committee by our chairman Matthew Vernon, we opened up the discussion with the audience, with seven (out of eight) TC members on stage. I think the format worked very well, with good input from attendees. Next up was "Don't fear the TPM" by Jonathan McDowell. A common misconception in the Free Software community is that the TPM is evil DRM hardware working against the user, but while it could in theory be used that way, the necessary TPM attestations seem to be impossible to attain in practice, so that wouldn't happen anyway. Instead, it is a crypto coprocessor present in almost all modern computers that can be used to hold keys, for example to be used for SSH. It will also be interesting to research if we can make use of it for holding the Transparent Data Encryption keys for CYBERTEC's PostgreSQL Enterprise Edition. Aigars Mahinovs then directed everyone into place for the DebConf group picture, and Lucas Nussbaum started a discussion about archive-wide QA tasks in Debian, an area where I did a lot of work in the past and that still interests me. Antonio Terceiro and Paul Gevers followed up with techniques to track archive-wide rebuilding and testing of packages, and in turn filing a lot of bugs to track the problems. The evening ended with the conference dinner, again in the old fort close by the coast. DebConf is good for meeting new people, and I incidentally ran into another Chris, who happened to be one of the original maintainers of pgaccess, the pre-predecessor of today's pgadmin. I admit still missing this PostgreSQL frontend for its simplicity and ability to easily edit table data, but it disappeared around 2004. Friday, July 18th On Friday, I participated in discussion sessions around contributors.debian.org (PostgreSQL is planning to set up something similar) and the New Member process, which I had helped to run and reform a decade or two ago. Agathe Porte (also a Ham radio operator, like so many others at the conference I had no idea about) then shared her work on rewriting the slower parts of Lintian, the Debian package linter, in Rust. Craig Small talked about "Free as in Bytes", the evolution of the Linux procps free command. Over time and many kernel versions, the summary numbers printed became better and better, but there will probably never be a version that suits all use cases alike. Later over dinner, Craig (who is also a TC member) and I shared our experiences with these numbers and customers (not) understanding them. He pointed out that for PostgreSQL, when looking at used memory in the presence of large shared memory buffers, USS (unique set size) and PSS (proportional set size) are more realistic numbers than the standard RSS (resident set size) that the top utility shows by default. Antonio Terceiro and Paul Gevers again joined to lead a session, now on ci.debian.net and autopkgtest, the test driver used for running tests on packages after they have been installed on a system. The PostgreSQL packages use this heavily to make sure no regressions creep in even after builds have successfully completed, and test re-runs are rescheduled periodically.
The day ended with Bdale Garbee's electronics team BoF and Paul Tagliamonte and me setting up the radio station in the courtyard, again answering countless questions about ionospheric conditions and operating practice. Saturday, July 19th Saturday was the last conference day. In the first session, Nikos Tsipinakis and Federico Vaga from CERN announced that the LHC will be moving to Debian for the accelerator's frontend computers during their next "long shutdown" maintenance period next year. CentOS broke compatibility too often, and Debian trixie together with the extended LTS support will cover the time until the next long shutdown window in 2035, by which time the computers should all have been replaced with newer processors supporting higher x86_64 baseline versions. The audience was delighted to hear that Debian is now also being used in this prestigious project. Ben Hutchings then presented new Linux kernel features. Particularly interesting for me was the support for atomic writes spanning more than one filesystem block. When configured correctly, this would mean PostgreSQL didn't have to record full-page images in the WAL anymore, increasing throughput and performance. After that, the Debian ftp team discussed ways to improve review of new packages in the archive, and which of their processes could be relaxed given new US laws around Open Source and cryptography algorithm export. Emmanuel Arias led a session on Salsa CI, Debian's Gitlab instance and standard CI pipeline. (I think it's too slow, but the runners are not under their control.) Julian Klode then presented new features in APT, Debian's package manager. I like the new display format (and a tiny bit of that is also from me sending in wishlist bugs). In the last round of sessions this week, I then led the Ham radio BoF with an introduction into the hobby and how Debian can be used. Bdale mentioned that the sBitx family of SDR radios natively runs Debian, so stock packages can be used from the radio's touch display. We also briefly discussed his involvement in ARDC and the possibility of getting grants from them for Ham radio projects. Finally, DebConf wrapped up with everyone gathering in the main auditorium and cheering the organizers for making the conference possible and passing Pollito, the DebConf mascot, to the next organizer team.
Pollito on stage (photo: Christoph Berg)
Sunday, July 20th Zoom back to the train: I made it through the Paris metro and I'm now on the Eurostar back to Germany. It has been an intense week with all the conference sessions and meeting all the people I had not seen in so long. There are a lot of new ideas to follow up on, both for my Debian and my PostgreSQL work. Next year's DebConf will take place in Santa Fe, Argentina. I haven't yet decided if I will be going, but I can recommend the experience to everyone! The post The Debian Conference 2025 in Brest appeared first on CYBERTEC PostgreSQL Services & Support.

28 July 2025

Dimitri John Ledkov: Achieving actually full disk encryption of UEFI ESP at rest with TCG OPAL, FIPS, LUKS

Achieving full disk encryption using FIPS, TCG OPAL and LUKS to encrypt UEFI ESP on bare-metal and in VMs
Many security standards such as CIS and STIG require protecting information at rest. For example, NIST SP 800-53r5 SC-28 advocates using cryptographic protection, offline storage and TPMs to enhance protection of information confidentiality and/or integrity.

Traditionally, to satisfy such controls on portable devices such as laptops, one would use software-based Full Disk Encryption: Mac OS X FileVault, Windows BitLocker, Linux cryptsetup LUKS2. In cases where FIPS cryptography is required, an additional burden is placed onto these systems to operate their kernels in FIPS mode.

The Trusted Computing Group works on establishing many industry standards and specifications, which are widely adopted to improve the safety and security of computing whilst keeping it easy to use. One of their most famous specifications is TCG TPM 2.0 (Trusted Platform Module). TPMs are now widely available on most devices and help to protect secret keys and attest systems. For example, most software full disk encryption solutions can use a TCG TPM to store full disk encryption keys, providing passwordless, biometric or PIN-based ways to unlock the drives, as well as attesting that the system has not been modified or compromised whilst offline.

The TCG Storage Security Subsystem Class: Opal Specification is a set of specifications for features of data storage devices. The authors and contributors to Opal are leading and well-trusted storage manufacturers such as Samsung, Western Digital, Seagate Technologies, Dell, Google, Lenovo, IBM and Kioxia, among others. One of the features the Opal Specification enables is self-encrypting drives, which become very powerful when combined with pre-boot authentication. Out of the box, such drives always and transparently encrypt all disk data using hardware acceleration. To protect the data, one can enter the UEFI firmware setup (BIOS) and set an NVMe single user password (or user + administrator/recovery passwords) to encrypt the disk encryption key. If one's firmware didn't come with such features, one can also use sedutil to inspect and configure all of this; the latest releases of major Linux distributions already package sedutil.

Once a password is set, pre-boot authentication will request the password on startup, prior to booting any operating system. It means the full disk is actually encrypted, including the UEFI ESP and all operating systems installed in the case of dual- or multi-boot setups. This also prevents tampering with the ESP, UEFI bootloaders and kernels, which with traditional software-based encryption often remain unencrypted and accessible. It also means one doesn't have to do special OS-level repartitioning or installation steps to ensure all data is encrypted at rest.

What about FIPS compliance? Well, the good news is that the majority of Opal-compliant hard drives and/or their security sub-chips do have FIPS 140-3 certification, meaning they have been tested by independent laboratories to ensure they do in fact encrypt data. On the CMVP website one can search for module name terms "OPAL" or "NVMe", or the name of a hardware vendor, to locate FIPS certificates.

Are such drives widely available? Yes. For example, a common Thinkpad X1 gen 11 has Opal NVMe drives as standard, and they have FIPS certification too. Thus, it is likely these are already widely available in your hardware fleet. Use sedutil to check whether the MediaEncrypt and LockingSupported features are available (a quick sketch follows below).

Well, this is great for laptops and physical servers, but you may ask: what about public or private cloud?
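Before turning to clouds, here is what that sedutil check can look like; a minimal sketch, assuming the drive shows up as /dev/nvme0 (output abridged, and the exact field layout varies by drive and sedutil version):

$ sudo sedutil-cli --query /dev/nvme0
Locking function (0x0002)
    Locked = N, LockingEnabled = N, LockingSupported = Y, MBRDone = N, MBREnabled = N, MediaEncrypt = Y

If LockingSupported and MediaEncrypt both read Y, the drive can operate as a self-encrypting drive.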
Actually, more or less the same is already in place in both. On the CMVP website, all major clouds have their disk encryption hardware certified, and all of them always encrypt all virtual machines with FIPS-certified cryptography, without an ability to opt out. One is, however, in full control of how the encryption keys are managed: cloud-provider or self-managed (either with a cloud HSM or KMS, or bring-your-own / external). See the relevant encryption options and key management docs for GCP, Azure, AWS. The key takeaway: without doing anything, VMs in public cloud are always encrypted at rest and satisfy NIST SP 800-53 controls.

What about private cloud? Most Linux-based private clouds ultimately use qemu, typically with qcow2 virtual disk images. Qemu supports user-space encryption of qcow2 disks, see its manpage. Such encryption covers the full virtual machine disk, including the bootloader and ESP, and it is handled entirely outside of the VM on the host, meaning the VM never has access to the disk encryption keys. Qemu implements this encryption entirely in userspace using gnutls, nettle or libgcrypt, depending on how it was compiled. This also means one can satisfy FIPS requirements entirely in userspace, without a Linux kernel in FIPS mode. Higher-level APIs built on top of qemu also support qcow2 disk encryption, as in projects such as libvirt and OpenStack Cinder.

If you carefully read the docs, you may notice that agent support is sometimes explicitly called out as not supported, or not mentioned at all. Quite often, agents running inside the OS do not have enough observability to assess whether there is external encryption. It does mean that monitoring the above encryption options requires different approaches: for example, monitor your cloud configuration using tools such as Wiz and Orca, rather than using agents inside individual VMs. For laptop / endpoint security agents, I do wish they would start gaining the capability to report Opal SED availability and whether it is active.

What about using software encryption on top of the above solutions nonetheless? This is commonly referred to as double or multiple encryption. There will be an additional performance impact, but it can be worthwhile. It really depends on what you define as data at rest for yourself and which controls you need. If one has a dual-boot laptop and wants to keep one OS encrypted whilst booted into the other, it can be perfectly reasonable to encrypt the two using separate software encryption keys, in addition to the Opal encryption of the ESP. For more targeted per-file / per-folder encryption, one can look into using gocryptfs, the best successor to the once popular but now deprecated eCryptfs (an amazing tool, but it has fallen behind in development and can lead to data loss).

All of the above mostly talks about cryptographic encryption, which only provides confidentiality, not data integrity. To protect integrity, one needs to choose how to maintain it. dm-verity is a good choice for read-only and rigid installations. For read-write workloads, it may be easier to deploy ZFS or Btrfs instead. If one is using a filesystem without built-in integrity support, such as XFS or Ext4, one can retrofit an integrity layer by using dm-integrity (either standalone, or via cryptsetup's --integrity option).
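As a rough sketch of that last option (the device name, cipher and integrity algorithm are illustrative assumptions; note that cryptsetup still flags LUKS2 authenticated encryption as experimental):

$ sudo cryptsetup luksFormat --type luks2 --cipher aes-xts-plain64 --integrity hmac-sha256 /dev/sdX
$ sudo cryptsetup open /dev/sdX protected
$ sudo mkfs.ext4 /dev/mapper/protected

With the integrity layer in place, offline tampering with the ciphertext surfaces as read errors rather than silently corrupted data.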

If one has a large estate and a lot of encryption keys to keep track of, a key management solution is likely needed. The most popular is likely the one from Thales Group, marketed as the CipherTrust Data Security Platform (previously Vormetric), but there are many others, including OEM-, vendor-, hardware- or cloud-specific as well as agnostic solutions.

I hope this crash-course guide piques your interest in modern confidentiality and integrity solutions, and prompts you to re-affirm or change your existing controls for data protection at rest.

Full disk encryption, including the UEFI ESP (/boot/efi), is now widely achievable by default on both bare-metal machines and VMs, including with FIPS certification. To discuss further, let's connect on LinkedIn.

Russ Allbery: Review: Cyteen

Review: Cyteen, by C.J. Cherryh
Series: Cyteen #1
Publisher: Warner Aspect
Copyright: 1988
Printing: September 1995
ISBN: 0-446-67127-4
Format: Trade paperback
Pages: 680
The main text below is an edited version of my original review of Cyteen written on 2012-01-03. Additional comments from my re-read are after the original review. I've reviewed several other C.J. Cherryh books somewhat negatively, which might give the impression I'm not a fan. That is an artifact of when I started reviewing. I first discovered Cherryh with Cyteen some 20 years ago, and it remains one of my favorite SF novels of all time. After finishing my reading for 2011, I was casting about for what to start next, saw Cyteen on my parents' shelves, and decided it was past time for my third reading, particularly given the recent release of a sequel, Regenesis. Cyteen is set in Cherryh's Alliance-Union universe following the Company Wars. It references several other books in that universe, most notably Forty Thousand in Gehenna but also Downbelow Station and others. It also has mentions of the Compact Space series (The Pride of Chanur and sequels). More generally, almost all of Cherryh's writing is loosely tied together by an overarching future history. One does not need to read any of those other books before reading Cyteen; this book will fill you in on all of the politics and history you need to know. I read Cyteen first and have never felt the lack. Cyteen was at one time split into three books for publishing reasons: The Betrayal, The Rebirth, and The Vindication. This is an awful way to think of the book. There are no internal pauses or reasonable volume breaks; Cyteen is a single coherent novel, and Cherryh has requested that it never be broken up that way again. If you happen to find all three portions as your reading copy, they contain all the same words and are serviceable if you remember it's a single novel under three covers, but I recommend against reading the portions in isolation. Human colonization of the galaxy started with slower-than-light travel sponsored by the private Sol Corporation. The inhabitants of the far-flung stations and the crews of the merchant ships that supplied them have formed their own separate cultures, but initially remained attached to Earth. That changed with the discovery of FTL travel and a botched attempt by Earth to reassert its authority. At the time of Cyteen, there are three human powers: distant Earth (which plays little role in this book), the merchanter Alliance, and Union. The planet Cyteen is one of only a few Earth-like worlds discovered by human expansion, and is the seat of government and the most powerful force in Union. This is primarily because of Reseune: the Cyteen lab that produces the azi. If Cyteen is about any one thing, it's about azi: genetically engineered human clones who are programmed via intensive psychological conditioning starting before birth. The conditioning uses a combination of drugs to make them receptive and "tape," specific patterns of instruction and sensory stimulation. They are designed for specific jobs or roles, they're conditioned to be obedient to regular humans, and they're not citizens. They are, in short, slaves. In a lot of books, that's as deep as the analysis would go. Azi are slaves, and slavery is certainly bad, so there would probably be a plot around azi overthrowing their conditioning, or around the protagonists trying to free them from servitude. But Cyteen is not just any SF novel, and azi are considerably more complex and difficult than that analysis.
We learn over the course of the book that the immensely powerful head of Reseune Labs, Ariane Emory, has a specific broader purpose in mind for the azi. One of the reasons why Reseune fought for and gained the role of legal protector of all azi in Union, regardless of where they were birthed, is so that Reseune could act to break any permanent dependence on azi as labor. And yet, they are slaves; one of the protagonists of Cyteen is an experimental azi, which makes him the permanent property of Reseune and puts him in constant jeopardy of being used as a political prisoner and lever of manipulation against those who care about him. Cyteen is a book about manipulation, about programming people, about what it means to have power over someone else's thoughts, and what one can do with that power. But it's also a book about connection and identity, about what makes up a personality, about what constitutes identity and how people construct the moral codes and values that they hold at their core. It's also a book about certainty. Azi are absolutely certain, and are capable of absolute trust, because that's part of their conditioning. Naturally-raised humans are not. This means humans can do things that azi can't, but the reverse is also true. The azi are not mindless slaves, nor are they mindlessly programmed, and several of the characters, both human and azi, find a lot of appeal in the core of certainty and deep self-knowledge of their own psychological rules that azis can have. Cyteen is a book about emotions, and logic, and where they come from and how to balance them. About whether emotional pain and uncertainty is beneficial or damaging, and about how one's experiences make up and alter one's identity. This is also a book about politics, both institutional and personal. It opens with Ariane Emory, Councilor for Science for five decades and the head of the ruling Union Expansionist party. She's powerful, brilliant, dangerously good at reading people, and dangerously willing to manipulate and control people for her own ends. What she wants, at the start of the book, is to completely clone a Special (the legal status given to the most brilliant minds of Union). This was attempted before and failed, but Ariane believes it's now possible, with a combination of tape, genetic engineering, and environmental control, to reproduce the brilliance of the original mind. To give Union another lifespan of work by their most brilliant thinkers. Jordan Warrick, another scientist at Reseune, has had a long-standing professional and personal feud with Ariane Emory. As the book opens, he is fighting to be transferred out from under her to the new research station that would be part of the Special cloning project, and he wants to bring his son Justin and Justin's companion azi Grant with them. Justin is a PR, a parental replicate, meaning he shares Jordan's genetic makeup but was not an attempt to reproduce the conditions of Jordan's rearing. Grant was raised as his brother. And both have, for reasons that are initially unclear, attracted the attention of Ariane, who may be using them as pawns. This is just the initial setup, and along with this should come a warning: the first 150 pages set up a very complex and dangerous political situation and build the tension that will carry the rest of the book, and they do this by, largely, torturing Justin and Grant. The viewpoint jumps around, but Justin and Grant are the primary protagonists for this first section of the book. 
While one feels sympathy for both of them, I have never, in my multiple readings of the book, particularly liked them. They're hard to like, as opposed to pity, during this setup; they have very little agency, are in way over their heads, are constantly making mistakes, and are essentially having their lives destroyed. Don't let this turn you off on the rest of the book. Cyteen takes a dramatic shift about 150 pages in. A new set of protagonists are introduced who are some of the most interesting, complex, and delightful protagonists in any SF novel I have read, and who are very much worth waiting for. While Justin has his moments later on (his life is so hard that his courage can be profoundly moving), it's not necessary to like him to love this book. That's one of the reasons why I so strongly dislike breaking it into three sections; that first section, which is mostly Justin and Grant, is not representative of the book. I can't talk too much more about the plot without risking spoiling it, but it's a beautiful, taut, and complex story that is full of my favorite things in both settings and protagonists. Cyteen is a book about brilliant people who think on their feet. Cherryh succeeds at showing this through what they do, which is rarely done as well as it is here. It's a book about remembering one's friends and remembering one's enemies, and waiting for the most effective moment to act, but it also achieves some remarkable transformations. About 150 pages in, you are likely to loathe almost everyone in Reseune; by the end of the book, you find yourself liking, or at least understanding, nearly everyone. This is extremely hard, and Cherryh pulls it off in most cases without even giving the people she's redeeming their own viewpoint sections. Other than perhaps George R.R. Martin I've not seen another author do this as well. And, more than anything else, Cyteen is a book with the most wonderful feeling of catharsis. I think this is one of the reasons why I adore this book and have difficulties with some of Cherryh's other works. She's always good at ramping up the tension and putting her characters in awful, untenable positions. Less frequently does she provide the emotional payoff of turning the tables, where you get to watch a protagonist do everything you've been wanting them to do for hundreds of pages, except even better and more delightfully than you would have come up with. Cyteen is one of the most emotionally satisfying books I've ever read. I could go on and on; there is just so much here that I love. Deep questions of ethics and self-control, presented in a way that one can see the consequences of both bad decisions and good ones and contrast them. Some of the best political negotiations in fiction. A wonderful look at friendship and loyalty from several directions. Two of the best semi-human protagonists I've seen, who one can see simultaneously as both wonderful friends and utterly non-human and who put nearly all of the androids in fiction to shame by being something trickier and more complex. A wonderful unfolding sense of power. A computer that can somewhat anticipate problems and somewhat can't, and that encapsulates much of what I love about semi-intelligent bases in science fiction. Cyteen has that rarest of properties of SF novels: Both the characters and the technology meld in a wonderful combination where neither could exist without the other, where the character issues are illuminated by the technology and the technology supports the characters. 
I have, for this book, two warnings. The first, as previously mentioned, is that the first 150 pages of setup is necessary but painful to read, and I never fully warmed to Justin and Grant throughout. I would not be surprised to hear that someone started this book but gave up on it after 50 or 100 pages. I do think it's worth sticking out the rocky beginning, though. Justin and Grant continue to be a little annoying, but there's so much other good stuff going on that it doesn't matter. The other warning is that part of the setup of the story involves the rape of an underage character. This is mostly off-camera, but the emotional consequences are significant (as they should be) and are frequently discussed throughout the book. There is also rather frank discussion of adolescent sexuality later in the book. I think both of these are relevant to the story and handled in a way that isn't gratuitous, but they made me uncomfortable and I don't have any past history with those topics. Those warnings notwithstanding, this is simply one of the best SF novels ever written. It uses technology to pose deep questions about human emotions, identity, and interactions, and it uses complex and interesting characters to take a close look at the impact of technology on lives. And it does this with a wonderfully taut, complicated plot that sustains its tension through all 680 pages, and with characters whom I absolutely love. I have no doubt that I'll be reading it for a fourth and fifth time some years down the road. Followed by Regenesis, although Cyteen stands well entirely on its own and there's no pressing need to read the sequel. Rating: 10 out of 10

Some additional thoughts after re-reading Cyteen in 2025: I touched on this briefly in my original review, but I was really struck during this re-read how much the azi are a commentary on and a complication of the role of androids in earlier science fiction. Asimov's Three Laws of Robotics were an attempt to control the risks of robots, but can also be read as turning robots into slaves. Azi make the slavery more explicit and disturbing by running the programming on a human biological platform, but they're more explicitly programmed and artificial than a lot of science fiction androids. Artificial beings and their relationship to humans have been a recurring theme of SF since Frankenstein, but I can't remember a novel that makes the comparison to humans this ambiguous and conflicted. The azi not only like being azi, they can describe why they prefer it. It's clear that Union made azi for many of the same reasons that humans enslave other humans, and that Ariane Emory is using them as machinery in a larger (and highly ethically questionable) plan, but Cherryh gets deeper into the emergent social complications and societal impact than most SF novels manage. Azi are apparently closer to humans than the famous SF examples such as Commander Data, but the deep differences are both more subtle and more profound. I've seen some reviewers who are disturbed by the lack of a clear moral stance by the protagonists against the creation of azi. I'm not sure what to think about that. It's clear the characters mostly like the society they've created, and the groups attempting to "free" azi from their "captivity" are portrayed as idiots who have no understanding of azi psychology. Emory says she doesn't want azi to be a permanent aspect of society but clearly has no intention of ending production any time soon. The book does seem oddly unaware that the production of azi is unethical per se and, unlike androids, has an obvious exit ramp: Continue cloning gene lines as needed to maintain a sufficient population for a growing industrial civilization, but raise the children as children rather than using azi programming. If Cherryh included some reason why that was infeasible, I didn't see it, and I don't think the characters directly confronted it. I don't think societies in books need to be ethical, or that Cherryh intended to defend this one. There are a lot of nasty moral traps that civilizations can fall into that make for interesting stories. But the lack of acknowledgment of the problem within the novel did seem odd this time around. The other part of this novel that was harder to read past in this re-read is the sexual ethics. There's a lot of adolescent sexuality in this book, and even apart from the rape scene (which was more on-the-page than I had remembered, and which is quite intentionally disturbing) there is a whole lot of somewhat dubious consent. Maybe I've gotten older or just more discriminating, but it felt weirdly voyeuristic to know this much about the sex lives of characters who are, at several critical points in the story, just a bunch of kids. All that being said, and with the repeated warning that the first 150 pages of this novel are just not very good, there is still something magic about the last two-thirds of this book.
It has competence porn featuring a precociously brilliant teenager who I really like, it has one of the more interesting non-AI programmed computer systems that I've read in SF, it has satisfying politics that feel like modern politics (media strategy and relationships and negotiated alliances, rather than brute force and ideology), and it has a truly excellent feeling of catharsis. The plot resolution is a bit too abrupt and a bit insufficiently explained (there's more in Regenesis), but even though this was my fourth time through this book, the pacing grabbed me again and I could barely put down the last part of the story. Ethics aside (and I realize that's quite the way to start a sentence), I find the azi stuff fascinating. I know the psychology in this book is not real and is hopelessly simplified compared to real psychology, but there's something in the discussions of value sets and flux and self-knowledge that grabs my interest and makes me want to ponder. I think it's the illusion of simplicity and control, the what-if premise of thought where core motivations and moral rules could be knowable instead of endlessly fluid the way they are in us humans. Cherryh's azi are some of the most intriguing androids in science fiction to me precisely because they don't start with computers and add the humanity in, but instead start with humanity and overlay a computer-like certainty of purpose that's fully self-aware. The result is more subtle and interesting than anything Star Trek managed. I was not quite as enamored with this book this time around, but it's still excellent once the story gets properly started. I still would recommend it, but I might add more warnings about the disturbing parts.

Re-read rating: 9 out of 10

26 July 2025

Matthew Palmer: Object deserialization attacks using Ruby's Oj JSON parser

tl;dr: there is an attack in the wild which is triggering dangerous-but-seemingly-intended behaviour in the Oj JSON parser when used in the default and recommended manner, which can lead to everyone's favourite kind of security problem: object deserialization bugs! If you have the oj gem anywhere in your Gemfile.lock, the quickest mitigation is to make sure you have Oj.default_options = { mode: :strict } somewhere, and that no library is overwriting that setting with something else.

Prologue As a sensible sysadmin, I have all the sites I run send me a notification if any unhandled exception gets raised. Mostly, what I get sent are error-handling corner cases I missed, but now and then things get more interesting. In this case, it was a PG::UndefinedColumn exception, which looked something like this:
PG::UndefinedColumn: ERROR:  column "xyzzydeadbeef" does not exist
This is weird on two fronts: firstly, this application has been running for a while, and if there was a schema problem, I'd expect it to have made itself apparent long before now. And secondly, while I don't profess to perfection in my programming, I'm usually better at naming my database columns than that. Something is definitely hinky here, so let's jump into the mystery mobile!

The column name is coming from outside the building! The exception notifications I get sent include a whole lot of information about the request that caused the exception, including the request body. In this case, the request body was JSON, and looked like this:
{"name":":xyzzydeadbeef", ...}
The leading colon looks an awful lot like the syntax for a Ruby symbol, but it's in a JSON string. Surely there's no way a JSON parser would be turning that into a symbol, right? Right?!? Immediately, I thought that that possibly was what was happening, because I use Sequel for my SQL database access needs, and Sequel treats symbols as database column names. It seemed like too much of a coincidence that a vaguely symbol-shaped string was being sent in, and the exact same name was showing up as a column name. But how the flying fudgepickles was a JSON string being turned into a Ruby symbol, anyway? Enter Oj.

Oj? I barely know aj A long, long time ago, the standard Ruby JSON library had a reputation for being slow. Thus did many competitors flourish, claiming more features and better performance. Strong amongst the contenders was oj (for "Optimized JSON"), touted as "The fastest JSON parser and object serializer". Given the history, it's not surprising that people who wanted the best possible performance turned to Oj, leading to it being found in a great many projects, often as a sub-dependency of a dependency of a dependency (which is how it ended up in my project). You might have noticed in Oj's description that, in addition to claiming "fastest", it also describes itself as an "object serializer". Anyone who has kept an eye on the security bug landscape will recall that object deserialization is a rich vein of vulnerabilities to mine. Libraries that do object deserialization, especially ones with a history that goes back to before the vulnerability class was well-understood, are likely to be trouble magnets. And thus it turns out to be with Oj. By default, Oj will happily turn any string that starts with a colon into a symbol:

>> require "oj"
>> Oj.load('{"name":":xyzzydeadbeef","username":"bob","answer":42}')
=> {"name"=>:xyzzydeadbeef, "username"=>"bob", "answer"=>42}

How that gets exploited is only limited by the creativity of an attacker. Which I'll talk about more shortly, but first, a word from my rant cortex.

Insecure By Default is a Cancer While the object of my ire today is Oj and its fast-and-loose approach to deserialization, it is just one example of a pervasive problem in software: insecurity by default. Whether it's a database listening on 0.0.0.0 with no password as soon as it's installed, or a library whose default behaviour is to permit arbitrary code execution, it all contributes to a software ecosystem that is an appalling security nightmare. When a user (in this case, a developer who wants to parse JSON) comes across a new piece of software, they have by definition no idea what they're doing with that software. They're going to use the defaults, and follow the most easily-available documentation, to achieve their goal. It is unrealistic to assume that a new user of a piece of software is going to do things "the right way", unless that right way is the only way, or at least the by-far-the-easiest way. Conversely, the developer(s) of the software is/are the domain experts. They have knowledge of the problem domain, through their exploration while building the software, and unrivalled expertise in the codebase. Given this disparity in knowledge, it is tantamount to malpractice for the experts (the developers) to off-load the responsibility for the safe and secure use of the software to the party that has the least knowledge of how to do that (the new user). To apply this general principle to the specific case, take the Using section of the Oj README. The example code there calls Oj.load, with no indication that this code will, in fact, parse specially-crafted JSON documents into Ruby objects. The brand-new user of the library, no doubt being under pressure to Get Things Done, is almost certainly going to look at this Using example, get the apparent result they were after (a parsed JSON document), and call it a day. It is unlikely that a brand-new user will, for instance, scroll down to the Further Reading section, find the second last (of ten) listed documents, "Security.md", and carefully peruse it. If they do, they'll find an oblique suggestion that parsing untrusted input is "never a good idea". While that's true, it's also rather unhelpful, because I'd wager that by far the majority of JSON parsed in the world is "untrusted", in one way or another, given the predominance of JSON as a format for serializing data passing over the Internet. This guidance is roughly akin to putting a label on a car's airbags that "driving at speed can be hazardous to your health": true, but unhelpful under the circumstances. The solution is for default behaviours to be secure, and any deviation from that default that has the potential to degrade security must, at the very least, be clearly labelled as such. For example, the Oj.load function should be named Oj.unsafe_load, and the Oj.load function should behave as the Oj.safe_load function does presently. By naming the unsafe function as explicitly unsafe, developers (and reviewers) have at least a fighting chance of recognising they're doing something risky. We put warning labels on just about everything in the real world; the same should be true of dangerous function calls. OK, rant over. Back to the story.

But how is this exploitable? So far, I've hopefully made it clear that Oj does some Weird Stuff with parsing certain JSON strings. It caused an unhandled exception in a web application I run, which isn't cool, but apart from bombing me with exception notifications, what's the harm? For starters, let's look at our original example: when presented with a symbol, Sequel will interpret that as a column name, rather than a string value. Thus, if our "save an update to the user" code looked like this:

# request_body has the JSON representation of the form being submitted
body = Oj.load(request_body)
DB[:users].where(id: user_id).update(name: body["name"])

In normal operation, this will issue an SQL query along the lines of UPDATE users SET name='Jaime' WHERE id=42. If the name given is Jaime O'Dowd, all is still good, because Sequel quotes string values, etc etc. All's well so far. But, imagine there is a column in the users table that normally users cannot read, perhaps admin_notes. Or perhaps an attacker has gotten temporary access to an account, and wants to dump the user's password hash for offline cracking. So, they send an update claiming that their name is :admin_notes (or :password_hash). In JSON, that'll look like {"name":":admin_notes"}, and Oj.load will happily turn that into a Ruby object of {"name"=>:admin_notes}. When run through the above "update the user" code fragment, it'll produce the SQL UPDATE users SET name=admin_notes WHERE id=42. In other words, it'll copy the contents of the admin_notes column into the name column, which the attacker can then read out just by refreshing their profile page.
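To see the two behaviours side by side, here is a minimal sketch (the payload is the hypothetical attack value from above; Oj.safe_load is the strict variant recommended in the mitigations below):

>> require "oj"
>> Oj.load('{"name":":admin_notes"}')
=> {"name"=>:admin_notes}
>> Oj.safe_load('{"name":":admin_notes"}')
=> {"name"=>":admin_notes"}

In the first call the leading colon becomes a Ruby Symbol, which Sequel then treats as a column name; in the second it stays an ordinary string.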

But Wait, There's More! That an attacker can read other fields in the same table isn't great, but that's barely scratching the surface. Remember before I said that Oj does "object serialization"? That means that, in general, you can create arbitrary Ruby objects from JSON. Since objects contain code, it's entirely possible to trigger arbitrary code execution by instantiating an appropriate Ruby object. I'm not going to go into details about how to do this, because it's not really my area of expertise, and many others have covered it in detail. But rest assured, if an attacker can feed input of their choosing into a default call to Oj.load, they've been handed remote code execution on a platter.

Mitigations As Oj's object deserialization is intended and documented behaviour, don't expect a future release to make any of this any safer. Instead, we need to mitigate the risks. Here are my recommended steps:
  1. Look in your Gemfile.lock (or SBOM, if that's your thing) to see if the oj gem is anywhere in your codebase. Remember that even if you don't use it directly, it's popular enough that it is used in a lot of places. If you find it in your transitive dependency tree anywhere, there's a chance you're vulnerable, limited only by the ingenuity of attackers to feed crafted JSON into a deeply-hidden Oj.load call.
  2. If you depend on oj directly and use it in your project, consider not doing that. The json gem is acceptably fast, and JSON.parse won't create arbitrary Ruby objects.
  3. If you really, really need to squeeze the last erg of performance out of your JSON parsing, and decide to use oj to do so, find all calls to Oj.load in your code and switch them to call Oj.safe_load.
  4. It is a really, really bad idea to ever use Oj to deserialize JSON into objects, as it lacks the safety features needed to mitigate the worst of the risks of doing so (for example, restricting which classes can be instantiated, as is provided by the permitted_classes argument to Psych.load). I'd make it a priority to move away from using Oj for that, and switch to something somewhat safer (such as the aforementioned Psych). At the very least, audit and comment heavily to minimise the risk of user-provided input sneaking into those calls somehow, and pass { mode: :object } as the second argument to Oj.load, to make it explicit that you are opting in to this far more dangerous behaviour only when it's absolutely necessary.
  5. To secure any unsafe uses of Oj.load in your dependencies, consider setting the default Oj parsing mode to :strict, by putting Oj.default_options = { mode: :strict } somewhere in your initialization code (and make sure no dependencies are setting it to something else later!); see the sketch after this list. There is a small chance that this change of default might break something, if a dependency is using Oj to deliberately create Ruby objects from JSON, but the overwhelming likelihood is that Oj's just being used to parse ordinary JSON, and these calls are just RCE vulnerabilities waiting to give you a bad time.
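A minimal sketch of that last mitigation, assuming it runs in an initializer before any dependency touches Oj:

>> require "oj"
>> Oj.default_options = { mode: :strict }
>> Oj.load('{"name":":admin_notes"}')
=> {"name"=>":admin_notes"}

With the strict default in place, even Oj.load calls buried deep in your dependency tree hand back plain strings rather than Symbols or arbitrary objects.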

Is Your Bacon Saved? If I've helped you identify and fix potential RCE vulnerabilities in your software, or even just opened your eyes to the risks of object deserialization, please help me out by buying me a refreshing beverage. I would really appreciate any support you can give. Alternately, if you'd like my help in fixing these (and many other) sorts of problems, I'm looking for work, so email me.

23 July 2025

Peter Pentchev: Ringlet software updates (2025-03-23)

Recent initial releases of Ringlet software (a fancy name for my pet projects):

Dirk Eddelbuettel: qlcal 0.0.16 on CRAN: Regular Update

The sixteenth release of the qlcal package arrived at CRAN today, once again following the QuantLib 1.39 release this morning. qlcal delivers the calendaring parts of QuantLib. It is provided (for the R package) as a set of included files, so the package is self-contained and does not depend on an external QuantLib library (which can be demanding to build). qlcal covers over sixty country / market calendars and can compute holiday lists, their complement (i.e. business day lists) and much more. Examples are in the README at the repository, the package page, and of course at the CRAN package page. This release mainly synchronizes qlcal with the QuantLib release 1.39.

Changes in version 0.0.16 (2025-07-23)
  • Synchronized with QuantLib 1.39 released today
  • Calendar updates for Israel, minor utility functions update
  • Minor package maintenance updates

Courtesy of my CRANberries, there is a diffstat report for this release. See the project page and package documentation for more details, and more examples.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

Sergio Cipriano: How I finally tracked my Debian uploads correctly

How I finally tracked my Debian uploads correctly A long time ago, I became aware of UDD (Ultimate Debian Database), which gathers various Debian data into a single SQL database. At that time, we were trying to do something simple: list the contributions (package uploads) of our local community, Debian Brasília. We ended up with a script that counted uploads to unstable and experimental. I was never satisfied with the final result because some uploads were always missing. Here is an example:
debci (3.0) experimental; urgency=medium
...
   [ Sergio de almeida cipriano Junior ]
   * Fix Style/GlovalVars issue
   * Rename blacklist to rejectlist
...
I made changes in debci 3.0, but the upload was done by someone else. This kind of contribution cannot be tracked by that script. Then, a few years ago, I learned about Minechangelogs, which allows us to search through the changelogs of all Debian packages currently published. Today, I decided to explore how this was done, since I couldn't find anything useful for that kind of query in UDD's tables. That's when I came across ProjectB. It was my first time hearing about it. ProjectB is a database that stores all the metadata about the packages in the Debian archive, including the changelogs of those packages. Now that I'm a Debian Developer, I have access to this database. If you also have access and want to try some queries, you can do this:
$ ssh mirror.ftp-master.debian.org -N -L 15434:danzi.debian.org:5435
$ psql postgresql://guest@localhost:15434/projectb?sslmode=allow
In the end, it finally solved my problem. Using the code below, with UDD, I get 38 uploads:
import psycopg2

contributor = 'almeida cipriano'

try:
    connection = psycopg2.connect(
        user="udd-mirror",
        password="udd-mirror",
        host="udd-mirror.debian.net",
        port="5432",
        database="udd"
    )

    cursor = connection.cursor()

    query = f"SELECT source,version,date,distribution,signed_by_name \
FROM public.upload_history \
WHERE changed_by_name ILIKE '%{contributor}%' \
ORDER BY date;"

    cursor.execute(query)
    records = cursor.fetchall()

    print(f"I have {len(records)} uploads.")

    cursor.close()
    connection.close()

except (Exception, psycopg2.Error) as error:
    print("Error while fetching data from PostgreSQL", error)
Using the code below, with ProjectB, I get 43 uploads (the correct amount):
import psycopg2

contributor = 'almeida cipriano'

try:
    # SSH tunnel is required to access the database:
    # ssh <username>@mirror.ftp-master.debian.org -N -L 15434:danzi.debian.org:5435
    connection = psycopg2.connect(
        user="guest",
        host="localhost",
        port="15434",
        database="projectb",
        sslmode="allow"
    )
    connection.set_client_encoding('UTF8')

    cursor = connection.cursor()

    query = f"SELECT c.source, c.version, c.changedby \
FROM changes c \
JOIN changelogs ch ON ch.id = c.changelog_id \
WHERE c.source != 'debian-keyring' \
  AND (\
    ch.changelog ILIKE '%{contributor}%' \
    OR c.changedby ILIKE '%{contributor}%' \
  )\
ORDER BY c.seen;"

    cursor.execute(query)
    records = cursor.fetchall()

    print(f"I have {len(records)} uploads.")

    cursor.close()
    connection.close()

except (Exception, psycopg2.Error) as error:
    print("Error while fetching data from PostgreSQL", error)
It feels good to finally scratch this itch I've had for years.

Russell Coker: Annoying Wrongness on TV

One thing that annoys me on TV shows and movies is getting the details wrong. Yes, it's fiction, and yes, some things can't be done correctly, and in some situations portraying things correctly goes against the plot. But otherwise I think they should try to make it accurate. I was just watching The Americans (a generally good show that I recommend watching), and in Season 4 Episode 9 there's a close-up of a glass of wine which clearly shows that the Tears of Wine effect is missing; the liquid in the glass obviously has the surface tension of water, not of wine. When they run a show about spies they have to expect that the core audience will be the type of detail-oriented people who notice these things. Having actors not actually drink alcohol on set is a standard practice; if they have to do 10 takes of someone drinking a glass of wine, that would be a problem if they actually drank real wine. But they could substitute real wine for the close-up shots, and of course just getting it right the first time is a good option. Some ridiculous inaccuracies we just need to deal with, like knives making a schwing sound when pulled out of scabbards and silenced guns usually still being quite loud (so many people are used to it being wrong). Organisations like the KGB had guns that were actually silent, but they generally looked obviously different to regular guns and had a much lower effective range. The gold coins shown on TV are another ridiculous thing. The sound of metal hitting something depends on how hard it is and how dense it is. Surely most people have heard the sounds of dropping steel nuts and ball bearings and the sound of dropping lead sinkers, and know that the sounds of items of similar size and shape differ greatly based on density and hardness. A modern coin made of copper, cupro-nickel (the current silver coins), or copper-aluminium (the current gold coins) sounds very different to a gold coin when dropped on a bench. For a show like The Witcher it wouldn't be difficult to make actual gold coins of a similar quality to iron-age coin production; any jeweller could make the blanks, and making stamps hard enough to press gold isn't an engineering challenge (stamping copper coins would be much more difficult). The coins used for the show could be sold to fans afterwards. Once coins are made they can't just be heaped up. Even if you are a sorcerer, you probably couldn't fill a barrel a meter high with gold coins and not have it break from the weight and/or have the coins at the bottom cold welded. Gold coins are supposed to have a precise amount of gold, and if you pile them up too high then the cold-welding process will transfer gold between coins, changing the value. If someone was going to store a significant quantity of gold, it would be in gold ingots with separators between layers to prevent cold welding. Movies tend not to show coins close up; I presume that's because they considered it too difficult to make coins and they just use some random coins from their own country. Another annoying thing is shows that don't match up the build dates of objects used. It's nice when they get it right, like the movie Titanic featuring an M1911 pistol, which is something that a rich person in 1912 would likely have. The series Carnival Row (which I recommend) has weapons that mostly match our WW1 era; everything that doesn't involve magic seems legit.
One of the worst examples of this is the movie Anna (by Luc Besson, which is mostly a recreation of his film Nikita, but set in the early 90s and with the KGB). That film features laptops with color screens and USB ports before USB was invented and when color screens weren't common on laptops; as an aside, military-spec laptops tend to have older designs than consumer-spec ones. I've mostly given up on hoping that movies will get hacking scenes that are any more accurate than knives making a schwing sound. But it shouldn't be that hard for them to find computer gear that was manufactured in the right year to use for the film. Why can't they hire experts on technology to check everything?

22 July 2025

Iustin Pop: Watching website scanning bots

Ever since I put up http://demo.corydalis.io and set up logcheck, I'm inadvertently keeping up with recent exploits in common CMS frameworks, and maybe even normal web framework issues, by seeing what 404s I get from the logs. Now, I didn't intend to do this per se; I just wanted to make sure I don't have any 500s, and at one point I did actually catch a bug by seeing seemingly valid URLs, with my own pages as the referrer, leading to 404s. But besides that, it's mainly that a couple of times per week a bot finds the site and then tries, in fast succession, something like this (real log entries, with the source IP address removed):
[21/Jul/2025:09:27:09 +0200] "GET /pms?module=logging&file_name=../../../../../../~/.aws/credentials&number_of_lines=10000 HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:11 +0200] "GET /admin/config?cmd=cat+/root/.aws/credentials HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:11 +0200] "GET /.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:13 +0200] "GET /.env.local HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:13 +0200] "GET /.env.production HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:16 +0200] "GET /.env.dev HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:17 +0200] "GET /.env.development HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:19 +0200] "GET /.env.prod HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:19 +0200] "GET /.env.stage HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:22 +0200] "GET /.env.test HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:23 +0200] "GET /.env.example HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:25 +0200] "GET /.env.bak HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:26 +0200] "GET /.env.old HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:28 +0200] "GET /.envs/.production/.django HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:28 +0200] "GET /blog.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:31 +0200] "GET /wp-content/.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:32 +0200] "GET /application/.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:34 +0200] "GET /app/.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:35 +0200] "GET /apps/.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:37 +0200] "GET /config/.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:38 +0200] "GET /config/config.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:40 +0200] "GET /config/.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:41 +0200] "GET /api/.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:43 +0200] "GET /vendor/.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:44 +0200] "GET /backend/.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:46 +0200] "GET /server/.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:46 +0200] "GET /home/user/.aws/credentials HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:49 +0200] "GET /aws/credentials HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:50 +0200] "GET /.aws/credentials HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:52 +0200] "GET /.aws/config HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:52 +0200] "GET /config/aws.yml HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:55 +0200] "GET /config/aws.json HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:55 +0200] "GET /.env.production HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:58 +0200] "GET /config.json HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:59 +0200] "GET /config/config.json HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:01 +0200] "GET /config/settings.json HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:02 +0200] "GET /config/secrets.json HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:04 +0200] "GET /config.yaml HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:04 +0200] "GET /config.yml HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:07 +0200] "GET /config.py HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:08 +0200] "GET /secrets.json HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:10 +0200] "GET /secrets.yml HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:11 +0200] "GET /credentials.json HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:13 +0200] "GET /.git-credentials HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:14 +0200] "GET /.git/config HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:16 +0200] "GET /.gitignore HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:18 +0200] "GET /.gitlab-ci.yml HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:19 +0200] "GET /.github/workflows HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:22 +0200] "GET /.idea/workspace.xml HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:22 +0200] "GET /.vscode/settings.json HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:25 +0200] "GET /docker-compose.yml HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:25 +0200] "GET /docker-compose.override.yml HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:28 +0200] "GET /docker-compose.prod.yml HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:28 +0200] "GET /docker-compose.dev.yml HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:32 +0200] "GET /phpinfo HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:32 +0200] "GET /_profiler/phpinfo HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:34 +0200] "GET /phpinfo.php HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:34 +0200] "GET /info.php HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:37 +0200] "GET /storage/logs/laravel.log HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:37 +0200] "GET /storage/logs/error.log HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:40 +0200] "GET /logs/debug.log HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:40 +0200] "GET /logs/app.log HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:49 +0200] "GET /debug.log HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:51 +0200] "GET /error.log HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:53 +0200] "GET /.DS_Store HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:55 +0200] "GET /backup.zip HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:58 +0200] "GET /.backup HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:29:00 +0200] "GET /db.sql HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:29:03 +0200] "GET /dump.sql HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:29:06 +0200] "GET /database.sql HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:29:09 +0200] "GET /backup.tar.gz HTTP/1.1" 404 - "" "Mozilla/5.0"
Now, this example is actually trying to catch quite a few things, but many times a scan is focused on some specific thing, or two. Here we have Docker, macOS .DS_Store files (I'm not sure how that's useful; to find more filenames?), VSCode settings, various secrets, GitHub workflows, log output, database dumps, AWS credentials, and, judging from the wp- filename, still WordPress settings. The first few years were full of WordPress scanners; now it seems to have quieted down, and I haven't seen a bot scanning 200 potential WP filenames in ages. And this bot even bothers to put in Mozilla/5.0 as "browser identification".

Side note: I don't think the path in the first log entry, i.e. ../../../../../../~/, ever properly resolves to the home directory of any user. So I'm not sure that particular scan ever works, but who knows? Maybe some framework does bad tilde expansion, but at least bash will not expand ~ inside a path; it seems that path is passed as-is to an invoked command (strace confirms it).

What's surprising here is that these are usually plain dumb scanners: from the same IP address, no concern for throttling, no attempt to hide, just 2 minutes of brute-forcing a random list of known "treasures", then moving on. For this to be worthwhile, it means there are still victims being found by this method, sadly. Well, sometimes I get a single, one-off "GET /wp-login.php HTTP/1.1", which is strange enough that it might not even be a bot, who knows. But in general, periods of activity of this type come and go, probably aligned with new CVEs.

And another surprising thing: for this type of scanning to work (and I've seen many over the years), the website framework/configuration must allow arbitrary file download. Corydalis itself is written in Haskell, using Yesod, and it has a hardcoded (built at compile time) list of static resources it will serve. I haven't made the switch to fully embedding them in the binary, but at that point it won't need to read from the filesystem at all. Right now it will serve a few CSS and JS files, plus fonts, but that's it; no arbitrary filesystem traversal. Strange that some frameworks allow it.

This is not productively spent time, but it is fun, especially seeing how this changes over time. And it is probably the most use anyone gets out of http://demo.corydalis.io.
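As an aside, if you want a quick summary of what bots are probing for on your own server, a one-liner over the access log goes a long way. A minimal sketch, assuming the common Apache/nginx combined log format (request path in field 7, status code in field 9; adjust the log path and field numbers to your setup):

$ # rank the most frequently requested paths that returned 404
$ awk '$9 == 404 {print $7}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head -20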

20 July 2025

Michael Prokop: What to expect from Debian/trixie #newintrixie

Trixie Banner, Copyright 2024 Elise Couper

Update on 2025-07-28: added note about Debian 13/trixie support for OpenVox (thanks, Ben Ford!)

Debian v13 with codename trixie is scheduled to be published as the new stable release on 9th of August 2025. I was the driving force at several of my customers in being well prepared for the upcoming stable release (my efforts for trixie started in August 2024). On the one hand, to make sure packages we care about are available and actually make it into the release. On the other hand, to ensure there are no severe issues that make it into the release, and to get proper and working upgrades. So far everything is looking pretty good and working fine; the efforts seem to have paid off. :) As usual with major upgrades, there are some things to be aware of, and here I'm starting my public notes on trixie that might be worthwhile for other folks. My focus is primarily on server systems, looking at things from a sysadmin perspective.

Further reading: As usual, start at the official Debian release notes, and make sure to especially go through "What's new in Debian 13" plus the issues to be aware of for trixie (strongly recommended read!).

Package versions: As a starting point, let's look at some selected packages and their versions in bookworm vs. trixie as of 2025-07-20 (mainly having amd64 in mind):
Package               bookworm/v12   trixie/v13
ansible               2.14.3         2.19.0
apache                2.4.62         2.4.64
apt                   2.6.1          3.0.3
bash                  5.2.15         5.2.37
ceph                  16.2.11        18.2.7
docker                20.10.24       26.1.5
dovecot               2.3.19         2.4.1
dpkg                  1.21.22        1.22.21
emacs                 28.2           30.1
gcc                   12.2.0         14.2.0
git                   2.39.5         2.47.2
golang                1.19           1.24
libc                  2.36           2.41
linux kernel          6.1            6.12
llvm                  14.0           19.0
lxc                   5.0.2          6.0.4
mariadb               10.11          11.8
nginx                 1.22.1         1.26.3
nodejs                18.13          20.19
openjdk               17.0           21.0
openssh               9.2p1          10.0p1
openssl               3.0            3.5
perl                  5.36.0         5.40.1
php                   8.2+93         8.4+96
podman                4.3.1          5.4.2
postfix               3.7.11         3.10.3
postgres              15             17
puppet                7.23.0         8.10.0
python3               3.11.2         3.13.5
qemu/kvm              7.2            10.0
rsync                 3.2.7          3.4.1
ruby                  3.1            3.3
rust                  1.63.0         1.85.0
samba                 4.17.12        4.22.3
systemd               252.36         257.7-1
unattended-upgrades   2.9.1          2.12
util-linux            2.38.1         2.41
vagrant               2.3.4          2.3.7
vim                   9.0.1378       9.1.1230
zsh                   5.9            5.9
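For the upgrade itself, here is a minimal sketch of the usual procedure (my condensed summary; always work through the official release notes first, and adjust for deb822-style .sources files or third-party repositories):

$ sudo sed -i 's/bookworm/trixie/g' /etc/apt/sources.list /etc/apt/sources.list.d/*.list
$ sudo apt update
$ sudo apt upgrade --without-new-pkgs   # minimal upgrade first, as the release notes suggest
$ sudo apt full-upgrade                 # then the full upgrade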
Misc unsorted

apt: The new apt version 3.0 brings several new features, including:

systemd: systemd got upgraded from v252.36-1~deb12u1 to 257.7-1 and there are lots of changes. Be aware that systemd v257 has a new net.naming-scheme: with v257 the PCI slot number is now read from the firmware_node/sun sysfs file, and the naming scheme based on devicetree aliases was extended to support aliases for individual interfaces of controllers with multiple ports. This might affect you, see e.g. #1092176 and #1107187; the Debian Wiki provides further useful information. There are new systemd tools available: The tools provided by systemd gained several new options: Debian's systemd ships new binary packages:

Linux kernel: The trixie release ships a Linux kernel based on the latest longterm version 6.12. As usual there are lots of changes in the kernel area, including better hardware support, and this might warrant a separate blog entry. To highlight some changes with Debian trixie: See Kernelnewbies.org for further changes between kernel versions.

Configuration management: For puppet users, Debian provides the puppet-agent (v8.10.0), puppetserver (v8.7.0) and puppetdb (v8.4.1) packages. Puppet's upstream does not provide packages for trixie yet. Given how long it took them for Debian bookworm, and with their recent "Plans for Open Source Puppet in 2025", it's unclear when (and whether at all) we might get something. As a result of this upstream behavior, the OpenVox project evolved, and they already provide Debian 13/trixie support (https://apt.voxpupuli.org/openvox8-release-debian13.deb). FYI: the AIO puppet-agent package for bookworm (v7.34.0-1bookworm) so far works fine for me on Debian/trixie. Be aware that due to the apt-key removal you need a recent version of the puppetlabs-apt module for usage with trixie. The puppetlabs-ntp module isn't yet ready for trixie (regarding ntp/ntpsec), in case you depend on that. ansible is available and made it into trixie with version 2.19.

Prometheus stack: The Prometheus server was updated from v2.42.0 to v2.53, and all the exporters that were shipped with bookworm are still around (in more recent versions, of course). Trixie gained some new exporters:

Virtualization: docker (v26.1.5), ganeti (v3.1.0), libvirt (v11.3.0, be aware of significant changes to libvirt packaging), lxc (v6.0.4), podman (v5.4.2), openstack (see openstack-team on Salsa), qemu/kvm (v10.0.2) and xen (v4.20.0) are all still around. Proxmox already announced their PVE 9.0 BETA, based on trixie and providing a 6.14.8-1 kernel, QEMU 10.0.2, LXC 6.0.4 and OpenZFS 2.3.3. Vagrant is available in version 2.3.7, but Vagrant upstream does not provide packages for trixie yet. Given that HashiCorp adopted the BSL, the future of vagrant in Debian is unclear. If you're relying on VirtualBox, be aware that upstream doesn't provide packages for trixie yet. VirtualBox is available from Debian/unstable (version 7.1.12-dfsg-1 as of 2025-07-20), but hasn't been shipped with a stable release for quite some time (due to lack of cooperation from upstream on security support for older releases, see #794466). Be aware that starting with Linux kernel 6.12, KVM initializes virtualization on module loading by default, which prevents VirtualBox VMs from starting. To avoid this, either add the kvm.enable_virt_at_load=0 parameter to the kernel command line (see the sketch below) or unload the corresponding kvm_intel / kvm_amd module.
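A minimal sketch of applying that kernel parameter (this assumes a standard GRUB setup; kvm.enable_virt_at_load=0 is the parameter named above, the rest is generic boilerplate to verify against your own configuration):

$ # append the parameter to the default kernel command line
$ sudo sed -i 's/^GRUB_CMDLINE_LINUX_DEFAULT="/&kvm.enable_virt_at_load=0 /' /etc/default/grub
$ sudo update-grub && sudo reboot

Alternatively, the same module parameter can be set via modprobe configuration:

$ echo 'options kvm enable_virt_at_load=0' | sudo tee /etc/modprobe.d/kvm-virtualbox.conf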
If you want to use Vagrant with VirtualBox on trixie, be aware that Debian's vagrant package as present in trixie doesn't support the VirtualBox package version 7.1 as present in Debian/unstable (manually patching vagrant's meta.rb and rebuilding the package without "Breaks: virtualbox (>= 7.1)" is known to work).

util-linux: There are plenty of new options available in the tools provided by util-linux: Now no longer present in util-linux as of trixie: The following binaries got moved from util-linux to the util-linux-extra package: And the util-linux-extra package also provides new tools:

OpenSSH: OpenSSH was updated from v9.2p1 to 10.0p1-5, so if you're interested in all the changes, check out the release notes between those versions (9.3, 9.4, 9.5, 9.6, 9.7, 9.8, 9.9 + 10.0). Let's highlight some notable behavior changes in Debian: There are some notable new features:

Thanks to everyone involved in the release; looking forward to trixie, and happy upgrading!
Let's continue working towards Debian/forky. :)

17 July 2025

Gunnar Wolf: About our proof-of-concept LLM tool for navigating Debian's manpages

So, for the first year, this year's DebConf had the "DebConf Academic Track", that is, content for a one-day-long set of short sessions, each of which had a written article presenting the content, often with a very academic format, but not necessarily. I hope that future DebConfs will continue to hold this track, and that we can help bridge the gap: to get people who are not usually from the academic/university world to prepare content that will be formally expressed and included in a long-lasting, indexed book of proceedings. We did have (informal) proceedings in many of the early DebConfs, and I'm very happy to regain that, but with 20 years of better practices.

Anyway, of course, I took the opportunity to join this experiment, together with my Mexican colleague Tzolkin Garduño, who is finishing her PhD here in France (or should I say, second to her, as she is the true leading author of our work). And here you can see the paper we submitted to the DebConf Academic Track, which will be published soon: "A retrieval-augmented-generation pipeline to help users query system-provided documentation". The corresponding source code is all available at Tzolkin's repository on GitHub.

So, what is it about, in shorter words and layman's terms? Debian has lots of documentation, but lacks discoverability. We targeted our venerable manpages: it is well known that manpages are relevant, well-organized documentation describing how to use each of the binaries we ship in Debian. Eric Raymond wrote in his well-known essay The Art of Unix Programming (2003) that the Unix cultural style is "telegraphic but complete. It does not hold you by the hand, but it usually points in the right direction." Our original intent was to digest all of the information in our manpages, but we narrowed it to only the first section of the manual due to the limitations of the hardware a good friend lent us to play with LLMs. We took four different base, redistributable (although, yes, non-DFSG-free) Large Language Models downloaded from HuggingFace (T5-small, MiniLM-L6-v2, BGE-small-en and Multilingual-e5-small), and trained them with the 34579 pages found inside /usr/share/man/man1 of all of the existing Debian packages. We did some interesting fine-tuning (for further details, please refer to the paper itself or to our GitHub repository).

The idea is to present an interactive tool that understands natural language queries, and answers with the specific manpage to which they relate best (I'd like to say they "answer best", but Tzolkin has repeatedly tried to correct my understanding of the resulting vectorial distances). I had prepared some images to present as interaction examples, but I'll wrap up this post with something even better. So, this is what you would get with the following queries:

We were happy to present it like this. During DebCamp, however, I was able to devote some time to translating my client into a Web-accessible system. Do note that it's a bit brittle, and might break now and then. But you are welcome to play with it! Play with the Web interface for our RAG for Debian manuals!

I find it worth sharing that, while we used quite a bit of GPU for the training (not too much: a couple of nights on an old-although-powerful nVidia 1070 card lent to us by the Felipe Esquivel Fund for Science and Cultural Support), all querying takes place on the CPU, and I usually get between 0.1 and 0.3 seconds per query.
My server's architecture is far from rock-solid, and it can break now and then, but at least it should respawn upon failure, and it should be available there for a couple of weeks into August. Anyway, this has been a very interesting journey, getting my toes wet in the waters of LLMs, and while the current results are a bit crude, I think this can be made into an interesting tool for Debian exploration.
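As a side note, and purely as a sketch of mine (not from the paper): the raw material for such a corpus is easy to gather on any Debian system, although a single installation ships far fewer than the 34579 man1 pages the paper collected from all Debian packages:

$ mkdir -p corpus
$ for f in /usr/share/man/man1/*.gz; do
>   # render each roff manpage to plain text, ready for embedding
>   zcat "$f" | man -l - > "corpus/$(basename "$f" .gz).txt" 2>/dev/null
> done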

Birger Schacht: My first tag2upload upload

Following the DebConf25 talk by Ian Jackson, "tag2upload - upload simply by pushing a signed git tag", I decided to use the quiet time during the day of the DayTrip at DebConf 25 to try out uploading a package using tag2upload. Given the current freeze, a couple of the packages I maintain have new releases waiting. I decided on uploading the new version of labwc to experimental. Labwc is a Wayland compositor based on the wlroots compositor library (the library that sway is using); Labwc is inspired by the Openbox window manager. The upstream team of Labwc released version 0.9.0 last week, the first version that is based on wlroots 0.19.x. Wlroots 0.19.x is also only available in experimental, so that was a good fit for trying an upload with tag2upload.

I first used my usual workflow: going into my package repository, doing git fetch origin, checking out the tag of the new release and tagging that with git tag upstream/0.9.0. Then I bumped the version in the debian/experimental branch, adapted the debian/control file for the changed wlroots dependency, committed, and built the package using git-buildpackage to check that it builds fine and there are no lintian errors.

Then I moved on to look at tag2upload. As a starting point I read the blog post by Jonathan Carter, "My first tag2upload upload". It pointed me to one very important option of git debpush, namely the --baredebian option, which I have to use because I use the bare Debian git layout. Given that my last upload of labwc was to unstable, I also had to add --force=changed-suite. I also started right away with the --tag-only option, because for my first tests I only wanted local changes and nothing pushed anywhere, and I used the --dry-run option as well. This led to the following command:
> git debpush --baredebian --force=changed-suite --dry-run --tag-only
tags 0.9.0, upstream/0.9.0 all exist in this repository
tell me which one you want to make an orig.tar from: git deborig --just-print '--version=0.9.0-1' TAG
git-debpush: git-deborig failed; maybe try git-debpush --upstream=TAG
This was a bit confusing, because the error message talked about git-deborig, but I was using git-debpush. I also did not want to make an orig.tar! The --upstream option in the git-debpush(1) manual gave an explanation for this:
When pushing a non-native package, git-debpush needs a tag for the upstream part of your package. By default git-debpush asks git-deborig(1), which searches for a suitable tag based on the upstream version in debian/changelog.
So apparently git-debpush could not find out what the correct tag for the upstream version is, because git-deborig could not find out what the correct tag for the upstream version is. git-debpush simply calls git deborig --just-print --version="$version" in line 437. This fails because I had initially created a second tag, upstream/0.9.0, in addition to the existing 0.9.0 release tag. I do this so that git-buildpackage finds the upstream sources, but with multiple tags pointing at the upstream release, git-deborig is not sure which one it should use (although both point to the same commit). So I removed the upstream/0.9.0 tag and ran git debpush again, and now there was no error message (besides the warning regarding the changed suite), but it also did not give any feedback about what was happening. So I tried without the --dry-run option. Again, no output whatsoever, other than the warning about the changed release, BUT my gnupg asked me for permission to sign via my yubikey! And when I looked at the list of tags, I saw that there is now a debian/0.9.0-1 tag that was not there before! Looking at the tag I saw that it was a tag in the format described in the tag2upload(5) manual page, containing the following lines:
labwc release 0.9.0-1 for experimental
[dgit distro=debian split --quilt=baredebian]
[dgit please-upload source=labwc version=0.9.0-1 upstream-tag=0.9.0 upstream=4beee3851f75b53afc2e8699c594c0cc222115bd]
and the tag was signed by me. The 4beee3851f75b53afc2e8699c594c0cc222115bd commit ID is the commit the 0.9.0 tag points to. Now that I had a signed tag in the correct format, I went to the labwc packaging repository on salsa and enabled the webhook to trigger the tag2upload service (I just saw that the documentation was updated and there is now a global webhook on salsa, so this step is not needed anymore). Finally I pushed the tags using git push --tags. I could also have used git-debpush for this step, but I'd rather use git directly. I then looked at the tag2upload queue and saw how a worker built and uploaded the newest labwc release, and I also got an email from the tag2upload service: "[tag2upload 275] uploaded labwc 0.9.0-1". And a couple of minutes later I got the confirmation that labwc 0.9.0-1 was accepted into experimental. Great!

So, to conclude: for tag2upload to work we simply need a git tag in the correct format. The tag2upload service now gets triggered by every pushed tag on salsa, but only acts on tags that adhere to the tag2upload(5) format. git-debpush is a simple bash script that creates such a tag and by default also pushes it. I think the script could be a bit more verbose, for example telling me that it created a tag, and the name of that tag. I think the dependency on git-deborig is also a problem. I use git-buildpackage to build my packages, and by default git-buildpackage assumes upstream tags are of the form upstream/%(version)s (source). I could now change that for all the packages I maintain, but I also think it makes sense to control the tag myself and not use a tag that is controlled by upstream. Upstream could change or delete that tag, or I might need to create a tag for a version that is not tagged by upstream. I also think git-debpush is a rather misleading command name, given that the main task of the script is to create a tag in the correct format. Other than that, I'm pretty happy about this service. I have a rather crappy uplink at home and it is not so uncommon for my uploads to fail because the connection dropped during dput. Using a simple git-based upload approach makes these problems a thing of the past. I might look into other ways to create the needed tag, though.
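As an aside, instead of creating a duplicate upstream/0.9.0 tag, the tag format used by git-buildpackage can be changed per package. A minimal sketch of what I mean, as a debian/gbp.conf snippet (my suggestion, not something the tag2upload documentation prescribes):

[DEFAULT]
# use upstream's own release tags (e.g. "0.9.0") instead of upstream/%(version)s
upstream-tag = %(version)s

With only one tag pointing at the upstream release, git-deborig's tag search would no longer be ambiguous.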

Otto Kekäläinen: Debcraft - Easiest way to modify and build Debian packages

Debian packaging is notoriously hard. Far too many new contributors give up while trying, and many long-time contributors leave due to burnout from having to do too many thankless maintenance tasks. Some just skip testing their changes properly because it feels like too much toil. Debcraft is my attempt to solve this by automating all the boring stuff, making it easier to learn the correct practices, and helping new and old packagers better track changes in both source code and build artifacts.

The challenge of declarative packaging code

Unlike rpm or apk packaging, deb package sources by design avoid having one massive procedural packaging recipe. Instead, the packaging is defined in multiple declarative files in the debian/ subdirectory. For example, instead of a script running install -m 755 bin/btop /usr/bin/btop, there is a file debian/btop.install containing the line usr/bin/btop. This makes the overall system more robust and reliable, and allows, for example, extensive static analysis to find problems without having to build the package. The notable exception is the debian/rules file, which contains procedural code that can modify any aspect of the package build; almost all other files are declarative. Benefits include, among others, that the effect of a Debian-wide policy change can be relatively easily predicted by scanning what attributes and configurations all packages have declared. The drawback is that to understand the syntax and meaning of each file, one must understand which build tools read which files, and traverse potentially multiple layers of abstraction. In my view, this is the root cause of most of the perceived complexity.

Common complaints about .deb packaging

Related to the above, people learning Debian packaging frequently voice the following complaints:
  • Debian has too many tools to learn, often with overlapping or duplicate functionality.
  • Too much outdated and inconsistent documentation that makes learning the numerous tools needlessly hard.
  • Lack of documentation of the generally agreed best practices, mainly due to Debian's reluctance as a project to pick one tool and deprecate the alternatives.
  • Multiple layers of abstraction and lack of clarity on what any single change in the debian/ subdirectory leads to in the final package.
  • Requirement of Debian packages to be developed on a Debian system.

How Debcraft solves (some of) this

Debcraft is intentionally opinionated for the sake of simplicity, and makes heavy use of git, git-buildpackage and, most importantly, Linux containers, supporting both Docker and Podman. By using containers, Debcraft frees the user from the requirement of having to run Debian. This makes .deb packaging more accessible to developers running some other Linux distro, or even Mac or Windows (with WSL). Of course we want developers to run Debian (or a derivative like Ubuntu), but we want them even more to build, test and ship their software as .deb. Even for Debian/Ubuntu users, having everything done inside clean, hermetic containers of the latest target distribution version will yield more robust, secure and reproducible builds and tests. All containers are built automatically on the fly, using best practices for layer caching, making everything easy and fast.

Debcraft has simple commands to make it easy to build, rebuild, test and update packages; see the sketch of a typical session below. The most fundamental command is debcraft build, which will not only build the package but also fetch the sources if not already present, and with flags such as --distribution or --source-only build for any requested Debian or Ubuntu release, or generate source packages only, for Debian or PPA upload purposes. For ease of use, the output is colored and includes helpful explanations of what is being done, and suggests relevant Debian documentation for more information. Most importantly, the build artifacts, along with various logs, are stored in separate directories, making it easy to compare before and after to see what changed as a result of code or dependency updates (utilizing diffoscope, among others). While the above helps to debug successful builds, there is also the debcraft shell command, which makes debugging failed builds significantly easier by dropping into a shell where one can run various dh commands one by one. Once the build works, running autopkgtests is as easy as debcraft test. As with all other commands, Debcraft is smart enough to read information like the target distribution from the debian/changelog entry. When the package is ready to be released, there is the debcraft release command that will create the Debian source package in the correct format and facilitate uploading it either to your Personal Package Archive (PPA) or, if you are a Debian Developer, to the official Debian archive.
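A sketch of how a typical session might look, using only the commands and flags mentioned above (exact syntax and output may differ; see debcraft --help):

$ debcraft build                      # fetch sources if needed, build in a clean container
$ debcraft build --distribution sid   # build against another Debian or Ubuntu release
$ debcraft shell                      # debug a failed build by running dh commands one by one
$ debcraft test                       # run the package's autopkgtests
$ debcraft release                    # create a source package for PPA or Debian upload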

Automatically improve and update packages

Additionally, the command debcraft improve will try to fix all issues that can be addressed automatically. It utilizes, among others, lintian-brush, codespell and debputy. This makes repetitive Debian maintenance tasks easier, such as updating the package to follow the latest Debian policies. To update the package to the latest upstream version there is also debcraft update. It will read the package configuration files, such as debian/gbp.conf and debian/watch, and attempt to import the latest upstream version, refresh patches, build, and run autopkgtests. If everything passes, the new version is committed. This helps automate the process of updating to new upstream versions.

Try out Debcraft now!

On recent versions of Debian and Ubuntu, Debcraft can be installed simply by running apt install debcraft. To use Debcraft on some other distribution, or to get the latest features available in the development version, install it using:
git clone https://salsa.debian.org/debian/debcraft.git
cd debcraft
make install-local
To see exact usage instructions run debcraft --help.

Contributions welcome

The current Debcraft version 0.5 still has some rough edges and missing features, but I have personally been using it for over a year to maintain all my packages in Debian. If you come across an issue, feel free to file a report at https://salsa.debian.org/debian/debcraft/-/issues or submit an improvement at https://salsa.debian.org/debian/debcraft/-/merge_requests. The code is intentionally written entirely in shell script to keep the barrier to code contribution as low as possible.
By the way, if you aspire to become a Debian Developer and want to follow my example in using state-of-the-art tooling and collaborating on salsa.debian.org, feel free to reach out for mentorship. I am glad to see more people contribute to Debian!

Arnaud Rebillout: Acquire-By-Hash for APT package repositories, and the lack of it in Kali Linux

This is a lengthy blog post. It features a long introduction that explains how apt update acquires various files from a package repository, what Acquire-By-Hash is, and how it all works for Kali Linux: a Debian-based distro that doesn't support Acquire-By-Hash, and which is distributed via a network of mirrors and a redirector. In a second part, I explore some "Hash Sum Mismatch" errors that we can hit with Kali Linux, errors that would not happen if only Acquire-By-Hash was supported. If anything, this blog post supports the case for adding Acquire-By-Hash support in reprepro, as requested at https://bugs.debian.org/820660. All of this could have just remained some personal notes for myself, but I got carried away and turned it into a blog post, dunno why... Hopefully others will find it interesting, but you really need to like troubleshooting stories, packed with details, and poorly written at that. You've been warned!

Introducing Acquire-By-Hash

Acquire-By-Hash is a feature of APT package repositories that might or might not be supported by your favorite Debian-based distribution. A repository that supports it says so in the Release file, by setting the field Acquire-By-Hash: yes. It's easy to check. Debian and Ubuntu both support it:
$ wget -qO- http://deb.debian.org/debian/dists/sid/Release | grep -i ^Acquire-By-Hash:
Acquire-By-Hash: yes
$ wget -qO- http://archive.ubuntu.com/ubuntu/dists/devel/Release | grep -i ^Acquire-By-Hash:
Acquire-By-Hash: yes
What about other Debian derivatives?
$ wget -qO- http://http.kali.org/kali/dists/kali-rolling/Release | grep -i ^Acquire-By-Hash: || echo not supported
not supported
$ wget -qO- https://archive.raspberrypi.com/debian/dists/trixie/Release | grep -i ^Acquire-By-Hash: || echo not supported
not supported
$ wget -qO- http://packages.linuxmint.com/dists/faye/Release | grep -i ^Acquire-By-Hash: || echo not supported
not supported
$ wget -qO- https://apt.pop-os.org/release/dists/noble/Release | grep -i ^Acquire-By-Hash: || echo not supported
not supported
Huhu, Acquire-By-Hash is not ubiquitous. But wait, what is Acquire-By-Hash to start with? To answer that, we have to take a step back and cover some basics first.

The HTTP requests performed by 'apt update'

What happens when one runs apt update? APT first requests the Release file from the repository(ies) configured in the APT sources. This file is a starting point: it contains a list of other files (sometimes called "Index files") that are available in the repository, along with their hashes. After fetching the Release file, APT proceeds to request those Index files. To give you an idea, there are many kinds of Index files, among which: There's an excellent Wiki page that details the structure of a Debian package repository: https://wiki.debian.org/DebianRepository/Format. Note that APT doesn't necessarily download ALL of those Index files. For simplicity, we'll limit ourselves to the minimal scenario, where apt update downloads only the Packages files. Let's try to make it more visual: here's a representation of an apt update transaction, assuming that all the components of the repository are enabled:
apt update -> Release -> Packages (main/amd64)
                      -> Packages (contrib/amd64)
                      -> Packages (non-free/amd64)
                      -> Packages (non-free-firmware/amd64)
Meaning that, in a first step, APT downloads the Release file, reads its content, and then in a second step it downloads the Index files in parallel. You can actually see that happen with a command such as apt -q -o Debug::Acquire::http=true update 2>&1 | grep ^GET. For Kali Linux you'll see something pretty similar to what I described above. Try it!
$ podman run --rm kali-rolling apt -q -o Debug::Acquire::http=true update 2>&1 | grep ^GET
GET /kali/dists/kali-rolling/InRelease HTTP/1.1    # <- returns a redirect, that is why the file is requested twice
GET /kali/dists/kali-rolling/InRelease HTTP/1.1
GET /kali/dists/kali-rolling/non-free/binary-amd64/Packages.gz HTTP/1.1
GET /kali/dists/kali-rolling/main/binary-amd64/Packages.gz HTTP/1.1
GET /kali/dists/kali-rolling/non-free-firmware/binary-amd64/Packages.gz HTTP/1.1
GET /kali/dists/kali-rolling/contrib/binary-amd64/Packages.gz HTTP/1.1
However, and it's now becoming interesting, for Debian or Ubuntu you won't see the same kind of URLs:
$ podman run --rm debian:sid apt -q -o Debug::Acquire::http=true update 2>&1 | grep ^GET
GET /debian/dists/sid/InRelease HTTP/1.1
GET /debian/dists/sid/main/binary-amd64/by-hash/SHA256/22709f0ce67e5e0a33a6e6e64d96a83805903a3376e042c83d64886bb555a9c3 HTTP/1.1
APT doesn't download a file named Packages; instead it fetches a file named after a hash. Why? This is due to the field Acquire-By-Hash: yes that is present in Debian's Release file.

What does Acquire-By-Hash mean for 'apt update'

The idea with Acquire-By-Hash is that the Index files are named after their hash on the repository. So if the MD5 sum of main/binary-amd64/Packages is 77b2c1539f816832e2d762adb20a2bb1, then the file will be stored at main/binary-amd64/by-hash/MD5Sum/77b2c1539f816832e2d762adb20a2bb1. The path main/binary-amd64/Packages still exists (it's the "Canonical Location" of this particular Index file), but APT won't use it; instead it downloads the file located in the by-hash/ directory (see the sketch at the end of this section).

Why does it matter? This has to do with repository updates, and allowing the package repository to be updated atomically, without interruption of service and without risk of failure client-side. It's important to understand that the Release file and the Index files are part of a whole, a set of files that go together, given that Index files are validated by their hash (as listed in the Release file) after download by APT. If those files are simply named "Release" and "Packages", it means they are not immutable: when the repository is updated, all of those files are updated "in place". And that causes problems. A typical failure mode for the client, during a repository update, is that: 1) APT requests the Release file, then 2) the repository is updated, and finally 3) APT requests the Packages files, but their checksums don't match, causing apt update to fail. There are variations of this error, but you get the idea: updating a set of files "in place" is problematic.

The Acquire-By-Hash mechanism was introduced exactly to solve this problem: now the Index files have a unique, immutable name. When the repository is updated, new Index files are first added to the by-hash/ directory, and only afterwards is the Release file updated. Old Index files in by-hash/ are retained for a while, so there's a grace period during which both the old and the new Release files are valid and working: the Index files that they refer to are available in the repo. As a result: no interruption of service, no failure client-side during repository updates. This is explained in more detail at https://www.chiark.greenend.org.uk/~cjwatson/blog/no-more-hash-sum-mismatch-errors.html, the blog post from Colin Watson that came out at the time Acquire-By-Hash was introduced in... 2016. It is still an excellent read in 2025.

So you might be wondering why I'm rambling about a problem that was solved 10 years ago, but then, as I've shown in the introduction, the problem is not solved for everyone. Support for Acquire-By-Hash server-side cannot be taken for granted, and unfortunately it never landed in reprepro, as one can see at https://bugs.debian.org/820660. reprepro is a popular tool for creating APT package repositories. In particular, at Kali Linux we use reprepro, and that's why there's no Acquire-By-Hash: yes in the Kali Release file. As one can guess, this leads to subtle issues during those moments when the repository is updated. However... we're not ready to talk about that yet! There's still another topic that we need to cover: this window of time during which a repository is being updated, and during which apt update might fail.
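To make the by-hash layout concrete, here is a minimal sketch that resolves an Index file's by-hash URL by hand, reading the SHA256 entry from Debian's Release file (a rough illustration; APT does all of this internally):

$ base=http://deb.debian.org/debian/dists/sid
$ # extract the SHA256 of main/binary-amd64/Packages.gz from the Release file
$ hash=$(wget -qO- "$base/Release" | awk '/^SHA256:/{s=1; next} /^[A-Za-z]/{s=0} s && $3=="main/binary-amd64/Packages.gz"{print $1}')
$ wget -q -O Packages.gz "$base/main/binary-amd64/by-hash/SHA256/$hash"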
The window for Hash Sum Mismatches, and the APT trick that saves the day

Pay attention! In this section, we're now talking about package repositories that do NOT support Acquire-By-Hash, such as the Kali Linux repository.

As I've said above, it's only when the repository is being updated that there is a "Hash Sum Mismatch Window", i.e. a moment when apt update might fail for some unlucky clients, due to invalid Index files. Surely, it's a very very short window of time, right? I mean, it can't take that long to update files on a server, especially when you know that a repository is usually updated via rsync, and rsync goes to great lengths to update files as atomically as it can (with the option --delay-updates). So if apt update fails for me, I've been very unlucky, but I can just retry in a few seconds and it should be fixed, right? The answer is: it's not that simple.

So far I pictured the "package repository" as a single server, for simplicity. But that's not always what it is. For Kali Linux, by default users have http.kali.org configured in their APT sources, and it is a redirector, i.e. a web server that redirects requests to mirrors that are near the client. Some context that matters for what comes next: the Kali repository is synced with ~70 mirrors all around the world, 4 times a day. What happens if your apt update requests are redirected to 2 mirrors close by, and one was just synced while the other is still syncing (or even worse, failed to sync entirely)? You'll get a mix of old and new Index files. Hash Sum Mismatch! As you can see, with this setup the "Hash Sum Mismatch Window" becomes much longer than a few seconds: the window stays open for as long as nearby mirrors are syncing. You could have a fast and a slow mirror next to you, and they can be out of sync with each other for several minutes every time the repository is updated, for example.

For Kali Linux in particular, there's a "detail" in our network of mirrors that, as a side-effect, almost guarantees that this window lasts several minutes at least. This is because the pool of mirrors includes kali.download, which is in fact the Cloudflare CDN, and from the redirector's point of view it's seen as a "super mirror" that is present in every country. So when APT fires a bunch of requests against http.kali.org, it's likely that some of them will be redirected to the Kali CDN, and others will be redirected to a mirror nearby you. So far so good, but there's another point of detail to be aware of: the Kali CDN is synced first, before the other mirrors. Another thing: usually the mirrors that are the farthest from the Tier-0 mirror take the longest to sync. Packing all of that together: if you live somewhere in Asia, it's not uncommon for your "Hash Sum Mismatch Window" to be as long as 30 minutes, between the moment the Kali CDN is synced and the moment your nearby mirrors catch up and are finally in sync as well.

Having said all of that, and assuming you're still reading (anyone here?), you might be wondering... Does that mean that apt update is broken 4 times a day, for around 30 minutes, for every Kali user out there? How can they bear that? The answer is: no, of course not, it's not broken like that. It works despite all of that, and this is thanks to yet another detail that we didn't go into yet. This detail lies in APT itself. APT is in fact "redirector aware", in a sense. When it fetches a Release file, and if ever the request is redirected, it then fires the subsequent requests against the server where it was initially redirected.
So you are guaranteed that the Release file and the Index files are retrieved from the same mirror! Which brings our "Hash Sum Mismatch Window" back down to the window for a single server, i.e. something like a few seconds at worst, hopefully. And that's what makes it work for Kali, literally. Without this trick, everything would fall apart. For reference, this feature was implemented in APT back in... 2016! A busy year, it seems! Here's the link to the commit: use the same redirection mirror for all index files. To finish, a dump from the console. You can see this behaviour play out easily, again with APT debugging turned on. Below we can see that only the first request hits the Kali redirector:
$ podman run --rm kali-rolling apt -q -o Debug::Acquire::http=true update 2>&1 | grep -e ^Answer -e ^HTTP
Answer for: http://http.kali.org/kali/dists/kali-rolling/InRelease
HTTP/1.1 302 Found
Answer for: http://mirror.freedif.org/kali/dists/kali-rolling/InRelease
HTTP/1.1 200 OK
Answer for: http://mirror.freedif.org/kali/dists/kali-rolling/non-free-firmware/binary-amd64/Packages.gz
HTTP/1.1 200 OK
Answer for: http://mirror.freedif.org/kali/dists/kali-rolling/contrib/binary-amd64/Packages.gz
HTTP/1.1 200 OK
Answer for: http://mirror.freedif.org/kali/dists/kali-rolling/main/binary-amd64/Packages.gz
HTTP/1.1 200 OK
Answer for: http://mirror.freedif.org/kali/dists/kali-rolling/non-free/binary-amd64/Packages.gz
HTTP/1.1 200 OK
Interlude

Believe it or not, we're done with the introduction! At this point, we have a good understanding of what apt update does (in terms of HTTP requests), we know that Release files and Index files are part of a whole, and we know that a repository can be updated atomically thanks to the Acquire-By-Hash feature, so that users don't experience interruptions of service or failures of any sort, even with a rolling repository that is updated several times a day, like Debian sid. We've also learnt that, despite the fact that Acquire-By-Hash landed almost 10 years ago, some distributions like Kali Linux are still doing without it... and yet it works! But the reason why it works is more complicated to grasp, especially when you add a network of mirrors and a redirector to the picture. Moreover, it doesn't work as flawlessly as with the Acquire-By-Hash feature: we still expect some short (seconds at worst) "Hash Sum Mismatch Windows" for those unlucky users that run apt update at the wrong moment.

This was a long intro, but it really sets the stage for what comes next: the edge cases. Some situations in which we can hit Hash Sum Mismatch errors with Kali. Error cases that I've collected and investigated over time... If anything, they support the case that Acquire-By-Hash is really something that should be implemented in reprepro. More on that in the conclusion, but for now, let's look at those edge cases.

Edge Case 1: the caching proxy

If you put a caching proxy (such as approx, my APT caching proxy of choice) between yourself and the actual package repository, then obviously it's the caching proxy that performs the HTTP requests, and therefore APT will never know about the redirections returned by the server, if any. So the APT trick of downloading all the Index files from the same server in case of a redirect doesn't work anymore. It was rather easy to confirm that by building a Kali package during a mirror sync, and watching it fail at the "Update chroot" step:
$ sudo rm /var/cache/approx/kali/dists/ -fr
$ gbp buildpackage --git-builder=sbuild
+------------------------------------------------------------------------------+
  Update chroot                                Wed, 11 Jun 2025 10:33:32 +0000  
+------------------------------------------------------------------------------+
Get:1 http://http.kali.org/kali kali-dev InRelease [41.4 kB]
Get:2 http://http.kali.org/kali kali-dev/contrib Sources [81.6 kB]
Get:3 http://http.kali.org/kali kali-dev/main Sources [17.3 MB]
Get:4 http://http.kali.org/kali kali-dev/non-free Sources [122 kB]
Get:5 http://http.kali.org/kali kali-dev/non-free-firmware Sources [8297 B]
Get:6 http://http.kali.org/kali kali-dev/non-free amd64 Packages [197 kB]
Get:7 http://http.kali.org/kali kali-dev/non-free-firmware amd64 Packages [10.6 kB]
Get:8 http://http.kali.org/kali kali-dev/contrib amd64 Packages [120 kB]
Get:9 http://http.kali.org/kali kali-dev/main amd64 Packages [21.0 MB]
Err:9 http://http.kali.org/kali kali-dev/main amd64 Packages
  File has unexpected size (20984689 != 20984861). Mirror sync in progress? [IP: ::1 9999]
  Hashes of expected file:
   - Filesize:20984861 [weak]
   - SHA256:6cbbee5838849ffb24a800bdcd1477e2f4adf5838a844f3838b8b66b7493879e
   - SHA1:a5c7e557a506013bd0cf938ab575fc084ed57dba [weak]
   - MD5Sum:1433ce57419414ffb348fca14ca1b00f [weak]
  Release file created at: Wed, 11 Jun 2025 07:15:10 +0000
Fetched 17.9 MB in 9s (1893 kB/s)
Reading package lists...
E: Failed to fetch http://http.kali.org/kali/dists/kali-dev/main/binary-amd64/Packages.gz  File has unexpected size (20984689 != 20984861). Mirror sync in progress? [IP: ::1 9999]
   Hashes of expected file:
    - Filesize:20984861 [weak]
    - SHA256:6cbbee5838849ffb24a800bdcd1477e2f4adf5838a844f3838b8b66b7493879e
    - SHA1:a5c7e557a506013bd0cf938ab575fc084ed57dba [weak]
    - MD5Sum:1433ce57419414ffb348fca14ca1b00f [weak]
   Release file created at: Wed, 11 Jun 2025 07:15:10 +0000
E: Some index files failed to download. They have been ignored, or old ones used instead.
E: apt-get update failed
The obvious workaround is to NOT use the redirector in the approx configuration. Either use a mirror close by, or the Kali CDN:
$ grep kali /etc/approx/approx.conf 
#kali http://http.kali.org/kali <- do not use the redirector!
kali  http://kali.download/kali
Edge Case 2: debootstrap struggles

What if one tries to debootstrap Kali while mirrors are being synced? It can give you some ugly logs, but it might not be fatal:
$ sudo debootstrap kali-dev kali-dev http://http.kali.org/kali
[...]
I: Target architecture can be executed
I: Retrieving InRelease 
I: Checking Release signature
I: Valid Release signature (key id 827C8569F2518CC677FECA1AED65462EC8D5E4C5)
I: Retrieving Packages 
I: Validating Packages 
W: Retrying failed download of http://http.kali.org/kali/dists/kali-dev/main/binary-amd64/Packages.gz
I: Retrieving Packages 
I: Validating Packages 
W: Retrying failed download of http://http.kali.org/kali/dists/kali-dev/main/binary-amd64/Packages.gz
I: Retrieving Packages 
I: Validating Packages 
W: Retrying failed download of http://http.kali.org/kali/dists/kali-dev/main/binary-amd64/Packages.gz
I: Retrieving Packages 
I: Validating Packages 
W: Retrying failed download of http://http.kali.org/kali/dists/kali-dev/main/binary-amd64/Packages.gz
I: Retrieving Packages 
I: Validating Packages 
I: Resolving dependencies of required packages...
I: Resolving dependencies of base packages...
I: Checking component main on http://http.kali.org/kali...
I: Retrieving adduser 3.152
[...]
To understand this one, we have to go and look at the debootstrap source code. How does debootstrap fetch the Release file and the Index files? It uses wget, and it retries up to 10 times in case of failure. It's not as sophisticated as APT: it doesn't detect when the Release file is served via a redirect. As a consequence, what happens above can be explained as such:
  1. debootstrap requests the Release file, gets redirected to a mirror, and retrieves it from there
  2. then it requests the Packages file, gets redirected to another mirror that is not in sync with the first one, and retrieves it from there
  3. validation fails, since the checksum is not as expected
  4. try again and again
Since debootstrap retries up to 10 times, at some point it's lucky enough to get redirected to the same mirror as the one it got its Release file from, and this time it gets the right Packages file, with the expected checksum. So ultimately it succeeds.

Edge Case 3: post-debootstrap failure

I like this one, because it gets us to yet another detail that we didn't talk about yet. So, what happens after we successfully debootstrapped Kali? We have only the main component enabled, and only the Index files for this component have been retrieved. It looks like this:
$ sudo debootstrap kali-dev kali-dev http://http.kali.org/kali
[...]
I: Base system installed successfully.
$ cat kali-dev/etc/apt/sources.list
deb http://http.kali.org/kali kali-dev main
$ ls -l kali-dev/var/lib/apt/lists/
total 80468
-rw-r--r-- 1 root root    41445 Jun 19 07:02 http.kali.org_kali_dists_kali-dev_InRelease
-rw-r--r-- 1 root root 82299122 Jun 19 07:01 http.kali.org_kali_dists_kali-dev_main_binary-amd64_Packages
-rw-r--r-- 1 root root    40562 Jun 19 11:54 http.kali.org_kali_dists_kali-dev_Release
-rw-r--r-- 1 root root      833 Jun 19 11:54 http.kali.org_kali_dists_kali-dev_Release.gpg
drwxr-xr-x 2 root root     4096 Jun 19 11:54 partial
So far so good. The next step would be to complete the sources.list with other components, then run apt update: APT will download the missing Index files. But if you're unlucky, that might fail:
$ sudo sed -i 's/main$/main contrib non-free non-free-firmware/' kali-dev/etc/apt/sources.list
$ cat kali-dev/etc/apt/sources.list
deb http://http.kali.org/kali kali-dev main contrib non-free non-free-firmware
$ sudo chroot kali-dev apt update
Hit:1 http://http.kali.org/kali kali-dev InRelease
Get:2 http://kali.download/kali kali-dev/contrib amd64 Packages [121 kB]
Get:4 http://mirror.sg.gs/kali kali-dev/non-free-firmware amd64 Packages [10.6 kB]
Get:3 http://mirror.freedif.org/kali kali-dev/non-free amd64 Packages [198 kB]
Err:3 http://mirror.freedif.org/kali kali-dev/non-free amd64 Packages
  File has unexpected size (10442 != 10584). Mirror sync in progress? [IP: 66.96.199.63 80]
  Hashes of expected file:
   - Filesize:10584 [weak]
   - SHA256:71a83d895f3488d8ebf63ccd3216923a7196f06f088461f8770cee3645376abb
   - SHA1:c4ff126b151f5150d6a8464bc6ed3c768627a197 [weak]
   - MD5Sum:a49f46a85febb275346c51ba0aa8c110 [weak]
  Release file created at: Fri, 23 May 2025 06:48:41 +0000
Fetched 336 kB in 4s (77.5 kB/s)  
Reading package lists... Done
E: Failed to fetch http://mirror.freedif.org/kali/dists/kali-dev/non-free/binary-amd64/Packages.gz  File has unexpected size (10442 != 10584). Mirror sync in progress? [IP: 66.96.199.63 80]
   Hashes of expected file:
    - Filesize:10584 [weak]
    - SHA256:71a83d895f3488d8ebf63ccd3216923a7196f06f088461f8770cee3645376abb
    - SHA1:c4ff126b151f5150d6a8464bc6ed3c768627a197 [weak]
    - MD5Sum:a49f46a85febb275346c51ba0aa8c110 [weak]
   Release file created at: Fri, 23 May 2025 06:48:41 +0000
E: Some index files failed to download. They have been ignored, or old ones used instead.
What happened here? Again, we need APT debugging options to get a hint:
$ sudo chroot kali-dev apt -q -o Debug::Acquire::http=true update 2>&1 | grep -e ^Answer -e ^HTTP
Answer for: http://http.kali.org/kali/dists/kali-dev/InRelease
HTTP/1.1 304 Not Modified
Answer for: http://http.kali.org/kali/dists/kali-dev/contrib/binary-amd64/Packages.gz
HTTP/1.1 302 Found
Answer for: http://http.kali.org/kali/dists/kali-dev/non-free/binary-amd64/Packages.gz
HTTP/1.1 302 Found
Answer for: http://http.kali.org/kali/dists/kali-dev/non-free-firmware/binary-amd64/Packages.gz
HTTP/1.1 302 Found
Answer for: http://kali.download/kali/dists/kali-dev/contrib/binary-amd64/Packages.gz
HTTP/1.1 200 OK
Answer for: http://mirror.sg.gs/kali/dists/kali-dev/non-free-firmware/binary-amd64/Packages.gz
HTTP/1.1 200 OK
Answer for: http://mirror.freedif.org/kali/dists/kali-dev/non-free/binary-amd64/Packages.gz
HTTP/1.1 200 OK
As we can see above, for the Release file we get a 304 (aka "Not Modified") from the redirector. Why is that? This is due to If-Modified-Since, also known as RFC 7232. APT supports this feature when it retrieves the Release file; it basically says to the server "Give me the Release file, but only if it's newer than what I already have". If the file on the server is not newer than that, it answers with a 304, which basically says to the client "You have the latest version already". So APT doesn't get a new Release file; it uses the Release file that is already present locally in /var/lib/apt/lists/, and then it proceeds to download the missing Index files. And as we can see above: it then hits the redirector for each request, and might be redirected to a different mirror for each Index file.

So the important bit here is: the APT "trick" of downloading all the Index files from the same mirror only works if the Release file is served via a redirect. If it's not, like in this case, then APT hits the redirector for each file it needs to download, and it's subject to the "Hash Sum Mismatch" error again. In practice, for the casual user running apt update every now and then, it's not an issue. If they have the latest Release file, no extra requests are made, because they also have the latest Index files from a previous apt update transaction, so APT doesn't re-download those Index files. The only reason they'd have the latest Release file but be missing some Index files would be that they added new components to their APT sources, like we just did above. Not so common, and then they'd need to run apt update at an unlucky moment. I don't think many users are affected in practice.

Note that this issue is rather new for Kali Linux. The redirector running on http.kali.org is mirrorbits, and support for If-Modified-Since just landed in the latest release, version 0.6. This feature was added by none other than me, a great example of the expression "shooting oneself in the foot". An obvious workaround here is to empty /var/lib/apt/lists/ in the chroot after debootstrap completed. Or we could disable support for If-Modified-Since entirely for Kali's instance of mirrorbits.

Summary and Conclusion

The Hash Sum Mismatch failures above are caused by a combination of things: At the same time: All in all, it seems that all those issues would go away if only Acquire-By-Hash was supported in the Kali package repository. Now is not a bad moment to try to land this feature in reprepro. After development halted in 2019, there's now a new upstream, and patches are being merged again. But it won't be easy: reprepro is a C codebase of around 50k lines of code, and it will take time and effort for a newcomer to get acquainted with the codebase, to the point of being able to implement a significant feature like this one. As an alternative, aptly is another popular tool to manage APT package repositories, and it seems to support Acquire-By-Hash already. Another alternative: I was told that debusine has (experimental) support for package repositories, and that Acquire-By-Hash is supported as well. Options are on the table, and I hope that Kali will eventually get support for Acquire-By-Hash, one way or another.

To finish, due credit: this blog post exists thanks to my employer OffSec. Thanks for reading!
