Search Results: "error"

20 August 2025

Emmanuel Kasper: Benchmarking 3D graphic cards and their drivers

I have benchmarked network links and disks in the past, so as to have a rough idea of the performance of the hardware I am confronted with at $WORK. As I started to dabble in Linux gaming (on non-PC hardware!), I wanted to have some numbers from the graphics stack as well. I am using the command glmark2 --size 1920x1080, which tests the performance of an OpenGL implementation, hardware + drivers. OpenGL is the classic 3D API used by most open source gaming on Linux (Doom3 Engine, SuperTuxKart, 0AD, Cube 2 Engine). Vulkan is getting traction as a newer 3D API; however, the equivalent Vulkan benchmark, vkmark, was crashing when using the NVIDIA semi-proprietary drivers (vkmark --size 1920x1080 was throwing an ugly "Error: Selected present mode Mailbox is not supported by the used Vulkan physical device.").
# apt install glmark2
$ lspci | grep -i vga # integrated GPU
00:02.0 VGA compatible controller: Intel Corporation HD Graphics 615 (rev 02)
$ glmark2 --size 1920x1080
...
...
glmark2 Score: 2063
$ lspci | grep -i vga # integrated GPU
00:02.0 VGA compatible controller: Intel Corporation Meteor Lake-P [Intel Graphics] (rev 08)
glmark2 Score: 3095
$ lspci | grep -i vga # discrete GPU, using nouveau
0000:01:00.0 VGA compatible controller: NVIDIA Corporation AD107GL [RTX 2000 / 2000E Ada Generation] (rev a1)
glmark2 Score: 2463
$ lspci | grep -i vga # discrete GPU, using nvidia-open semi-proprietary driver
0000:01:00.0 VGA compatible controller: NVIDIA Corporation AD107GL [RTX 2000 / 2000E Ada Generation] (rev a1)
glmark2 Score: 4960
Nouveau currently has some graphical glitches with Doom3, so I am using the nvidia-open driver for this hardware. In my testing with Doom3 and SuperTuxKart, post-2015 integrated Intel hardware is more than enough to play at HD resolution.

17 August 2025

Valhalla's Things: rrdtool and Trixie

Posted on August 17, 2025
Tags: madeof:bits
TL;DR: if you're using rrdtool on a 32-bit architecture like armhf, make an XML dump of your RRD files just before upgrading to Debian Trixie. I am an old person at heart, so the sensor data from my home monitoring system[1] doesn't go to one of those newfangled javascript-heavy data visualization platforms, but into good old RRD files, using rrdtool to generate various graphs. This happens on the home server, which is an armhf single board computer[2], hosting a few containers[3]. So, yesterday I started upgrading one of the containers to Trixie, and luckily I started from the one with the RRD, because when I rebooted into the fresh system and checked the relevant service I found it stopped on ERROR: '<file>' is too small (should be <size> bytes). Some searxing later, I've[4] found this was caused by the 64-bit time_t transition, which changed the format of the files, and that (somewhat unexpectedly) there was no way to fix it on the machine itself. What needed to be done instead was to export the data to an XML dump before the upgrade, and then import it back afterwards. Easy enough, right? If you know about it, which is why I'm blogging this, so that other people will know in advance :) Anyway, luckily I still had the other containers on bookworm, so I copied the files over there, did the upgrade, and my home monitoring system is happily running as before.
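For reference, the dump/restore round-trip looks roughly like this (the file names here are placeholders, not taken from the original post):
# on the old (pre-upgrade, 32-bit time_t) system
rrdtool dump sensors.rrd > sensors.xml
# after the upgrade (or on another machine), move the old file aside and rebuild it from the dump
mv sensors.rrd sensors.rrd.old
rrdtool restore sensors.xml sensors.rrd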

  1. of course one has a self-built home monitoring system, right?
  2. an A20-OLinuXino-MICRO, if anybody wants to know.
  3. mostly for ease of migrating things between different hardware, rather than insulation, since everything comes from Debian packages anyway.
  4. and by "I" I really mean Diego, as I was still in denial / distraction mode.

3 August 2025

Ben Hutchings: FOSS activity in July 2025

In July I attended DebCamp and DebConf in Brest, France. I very much enjoyed the opportunity to reconnect with other Debian contributors in person. I had a number of interesting and fruitful conversations there, besides the formally organised BoFs and talks. I also gave my own talk on "What's new in the Linux kernel (and what's missing in Debian)". Here's the usual categorisation of activity:

Sergio Cipriano: Handling malicious requests with fail2ban

Handling malicious requests with fail2ban I've been receiving a lot of malicious requests for a while now, so I decided to try out fail2ban as a possible solution. I see fail2ban as a nice-to-have tool that is useful to keep down the "noise", but I wouldn't rely on it for security. If you need a tool to block unauthorized access attempts, or find yourself monitoring log files excessively, you are probably doing something wrong. I'm currently using fail2ban 1.0.2-2 from Debian Bookworm. Unfortunately, I quickly ran into a problem: fail2ban doesn't work out of the box with this version:
systemd[1]: Started fail2ban.service - Fail2Ban Service.
fail2ban-server[2840]: 2025-07-28 14:40:13,450 fail2ban.configreader   [2840]: WARNING 'allowipv6' not defined in 'Definition'. Using default one: 'auto'
fail2ban-server[2840]: 2025-07-28 14:40:13,456 fail2ban                [2840]: ERROR   Failed during configuration: Have not found any log file for sshd jail
fail2ban-server[2840]: 2025-07-28 14:40:13,456 fail2ban                [2840]: ERROR   Async configuration of server failed
systemd[1]: fail2ban.service: Main process exited, code=exited, status=255/EXCEPTION
systemd[1]: fail2ban.service: Failed with result 'exit-code'.
The good news is that this issue has already been addressed for Debian Trixie. Since I prefer to manage my own configuration, I removed the default file at /etc/fail2ban/jail.d/defaults-debian.conf and replaced it with a custom setup. To fix the earlier issue, I also added a systemd backend to the sshd jail so it would stop expecting a logpath. Here's the configuration I'm using:
$ cat /etc/fail2ban/jail.d/custom.conf 
[DEFAULT]
maxretry = 3
findtime = 24h
bantime  = 24h
[nginx-bad-request]
enabled  = true
port     = http,https
filter   = nginx-bad-request
logpath  = /var/log/nginx/access.log
[nginx-botsearch]
enabled  = true
port     = http,https
filter   = nginx-botsearch
logpath  = /var/log/nginx/access.log
[sshd]
enabled  = true
port     = ssh
filter   = sshd
backend  = systemd
I like to make things explicit, so I repeated some lines from the default jail.conf file. In the end, I'm quite happy with it so far. Soon after I set it up, fail2ban was already banning a few hosts.
$ sudo fail2ban-client status nginx-bad-request
Status for the jail: nginx-bad-request
 - Filter
    - Currently failed: 42
    - Total failed: 454
 - Actions
    - Currently banned: 12
    - Total banned: 39
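If a legitimate client ever trips a filter, the ban can be lifted by hand with standard fail2ban-client commands (my addition, not part of the post; the address below is just a placeholder):
$ sudo fail2ban-client set nginx-bad-request unbanip 192.0.2.10
$ sudo fail2ban-client status   # lists all configured jails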

1 August 2025

Jonathan Dowland: School of Computing Technical Reports

(You wait ages for an archiving blog post and two come along at once!) Between 1969 and 2019, the Newcastle University School of Computing published a Technical Reports Series. Until 2017-ish, the full list of individually-numbered reports was available on the School's website, as well as full-text PDFs for every report. At some time around 2014 I was responsible for migrating the School's website from self-managed to centrally-managed. The driver was to improve the website from the perspective of student recruitment. The TR listings (as well as full listings and texts for awarded PhD theses, MSc dissertations, Director's reports and various others) survived the initial move. After I left (as staff) in 2015, anything not specifically about student recruitment degraded, and by 2017 the listings were gone. I've been trying, on and off, to convince different parts of the University to restore and take ownership of these lists ever since. For one reason or another each avenue I've pursued has gone nowhere. Recently the last remaining promising way forward failed, so I gave up and did it myself. The list is now hosted by the Historic Computing Committee, here: https://nuhc.ncl.ac.uk/computing/techreports/ It's not complete (most of the missing entries are towards the end of the run), but it's a start. The approach that finally yielded results was simply scraping the Internet Archive Wayback Machine for various pages from back when the material was represented on the School website, and then filling in the gaps from some other sources. What I envisage in the future: per-report pages with the relevant metadata (including abstracts); authors de-duplicated and cross-referenced; PDFs OCRed; providing access to the whole metadata DB (probably as a lump of JSON); a mechanism for people to report errors; a platform for students to perform data mining projects, perhaps some kind of classification/tagging by automated content analysis; cross-referencing copies of papers in other venues (lots of TRs are pre-prints).

Sergio Cipriano: How I deployed this Website

How I deployed this Website I will describe the step-by-step process I followed to make this static website accessible on the Internet.

DNS I bought this domain on NameCheap and am using their DNS for now, where I created these records:
Record Type   Host                 Value
A             sergiocipriano.com   201.54.0.17
CNAME         www                  sergiocipriano.com
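Once the records have propagated, a quick sanity check from any machine (my own addition, not from the post) should show the values configured above:
$ dig +short sergiocipriano.com A
201.54.0.17
$ dig +short www.sergiocipriano.com CNAME
sergiocipriano.com.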

Virtual Machine I am using Magalu Cloud for hosting my VM, since employees have free credits. Besides creating a VM with a public IP, I only needed to set up a Security Group with the following rules:
Type          Protocol   Port   Direction   CIDR
IPv4 / IPv6   TCP        80     IN          Any IP
IPv4 / IPv6   TCP        443    IN          Any IP

Firewall The first thing I did in the VM was enabling ufw (Uncomplicated Firewall). Enabling ufw without pre-allowing SSH is a common pitfall and can lock you out of your VM. I did this once :) A safe way to enable ufw:
$ sudo ufw allow OpenSSH      # or: sudo ufw allow 22/tcp
$ sudo ufw allow 'Nginx Full' # or: sudo ufw allow 80,443/tcp
$ sudo ufw enable
To check if everything is ok, run:
$ sudo ufw status verbose
Status: active
Logging: on (low)
Default: deny (incoming), allow (outgoing), disabled (routed)
New profiles: skip
To                           Action      From
--                           ------      ----
22/tcp (OpenSSH)             ALLOW IN    Anywhere                  
80,443/tcp (Nginx Full)      ALLOW IN    Anywhere                  
22/tcp (OpenSSH (v6))        ALLOW IN    Anywhere (v6)             
80,443/tcp (Nginx Full (v6)) ALLOW IN    Anywhere (v6) 

Reverse Proxy I'm using Nginx as the reverse proxy. Since I use the Debian package, I just needed to add this file:
/etc/nginx/sites-enabled/sergiocipriano.com
with this content:
server {
    listen 443 ssl;      # IPv4
    listen [::]:443 ssl; # IPv6
    server_name sergiocipriano.com www.sergiocipriano.com;
    root /path/to/website/sergiocipriano.com;
    index index.html;
    location / {
        try_files $uri /index.html;
    }
}
server {
    listen 80;
    listen [::]:80;
    server_name sergiocipriano.com www.sergiocipriano.com;
    # Redirect all HTTP traffic to HTTPS
    return 301 https://$host$request_uri;
}

TLS It's really easy to set up TLS thanks to Let's Encrypt:
$ sudo apt-get install certbot python3-certbot-nginx
$ sudo certbot install --cert-name sergiocipriano.com
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Deploying certificate
Successfully deployed certificate for sergiocipriano.com to /etc/nginx/sites-enabled/sergiocipriano.com
Successfully deployed certificate for www.sergiocipriano.com to /etc/nginx/sites-enabled/sergiocipriano.com
Certbot will edit the nginx configuration with the path to the certificate.

HTTP Security Headers I decided to use wapiti, which is a web application vulnerability scanner, and the report found these problems:
  1. CSP is not set
  2. X-Frame-Options is not set
  3. X-XSS-Protection is not set
  4. X-Content-Type-Options is not set
  5. Strict-Transport-Security is not set
I'll explain one by one:
  1. The Content-Security-Policy header prevents XSS and data injection by restricting sources of scripts, images, styles, etc.
  2. The X-Frame-Options header prevents a website from being embedded in iframes (clickjacking).
  3. The X-XSS-Protection header is deprecated. It is recommended that CSP is used instead of XSS filtering.
  4. The X-Content-Type-Options header stops MIME-type sniffing to prevent certain attacks.
  5. The Strict-Transport-Security header informs browsers that the host should only be accessed using HTTPS, and that any future attempts to access it using HTTP should automatically be upgraded to HTTPS. Additionally, on future connections to the host, the browser will not allow the user to bypass secure connection errors, such as an invalid certificate. HSTS identifies a host by its domain name only.
I added these security headers inside both the HTTPS and HTTP server blocks, outside the location block, so they apply globally to all responses. Here's what the Nginx config looks like:
add_header Content-Security-Policy "default-src 'self'; style-src 'self';" always;
add_header X-Frame-Options "DENY" always;
add_header X-Content-Type-Options "nosniff" always;
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
I added always to ensure that nginx sends the headers regardless of the response code. To add the Content-Security-Policy header I had to move the CSS to a separate file, because browsers block inline styles under a strict CSP unless you allow them explicitly. They're considered unsafe inline unless you move them to a separate file and link it like this:
<link rel="stylesheet" href="./resources/header.css">
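A quick way to confirm the headers are actually being served (my own check, not part of the original post):
$ curl -sI https://sergiocipriano.com | grep -iE 'content-security-policy|x-frame-options|x-content-type-options|strict-transport-security'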

puer-robustus: My Google Summer of Code '25 at Debian

I've participated in this year's Google Summer of Code (GSoC) program and have been working on the small (90h) "autopkgtests for the rsync package" project at Debian.

Writing my proposal Before you can start writing a proposal, you need to select an organization you want to work with. Since many organizations participate in GSoC, I've used the following criteria to narrow things down for me:
  • Programming language familiarity: For me only Python (preferably) as well as shell and Go projects would have made sense. While learning another programming language is cool, I wouldn't be as effective and helpful to the project as someone who is proficient in the language already.
  • Standing of the organization: Some of the organizations participating in GSoC are well-known for the outstanding quality of the software they produce. Debian is one of them, but so are, e.g., the Django Foundation and PostgreSQL. And my thinking was that the higher the quality of the organization, the more there is to learn for me as a GSoC student.
  • Mentor interactions: Apart from the advantage you get from mentor feedback when writing your proposal (more on that further below), it is also helpful to gauge how responsive/helpful your potential mentor is during the application phase. This is important since you will be working together for a period of at least 2 months; if the mentor-student communication doesn't work, the GSoC project is going to be difficult.
  • Free and Open-Source Software (FOSS) communication platforms: I generally believe that FOSS projects should be built on FOSS infrastructure. I personally won't run proprietary software when I want to contribute to FOSS in my spare time.
  • Be a user of the project: As Eric S. Raymond pointed out in his seminal "The Cathedral and the Bazaar" 25 years ago:
    Every good work of software starts by scratching a developer's personal itch.
Once I had some organizations in mind whose projects I'd be interested in working on, I started writing proposals for them. Turns out, I started writing my proposals way too late: in the end I only managed to hand in a single one, which is risky. Competition for the GSoC projects is fierce, and the more quality (!) proposals you send out, the better your chances of getting one. However, don't write proposals for the sake of it: reviewers get way too many AI-slop proposals already, and you will not do yourself a favor with a low-quality proposal. Take the time to read the instructions/ideas/problem descriptions the project mentors have provided and follow their guidelines. Don't hesitate to reach out to project mentors: in my case, I asked Samuel Henrique a few clarification questions, and the ensuing (email) discussion helped me greatly in improving my proposal. Once I had finalized my proposal draft, I sent it to Samuel for a review, which again led to some improvements to the final proposal that I uploaded to the GSoC program webpage.

Community bonding period Once you get the news that you've been accepted into the GSoC program (don't take it personally if you don't make it; this was my second attempt after not making the cut in 2024), get in touch with your prospective mentor ASAP. Agree upon a communication channel and some response times. Put yourself in the loop for project news and discussions, whatever that means in the context of your organization: in Debian's case this boiled down to subscribing to a bunch of mailing lists and IRC channels. Also make sure to set up a functioning development environment if you haven't already done so while writing the proposal.

Payoneer setup By far the most annoying part of GSoC for me. But since you don't have a choice if you want to get the stipend, you will need to sign up for an account at Payoneer. In this iteration of GSoC all participants got a personalized link to open a Payoneer account. When I tried to open an account by following this link, I got an email after the registration and email verification that my account was being blocked because Payoneer deemed the email address I gave a temporary one. Well, the email in question is most certainly anything but temporary, so I tried to get in touch with Payoneer support - and ended up in an LLM-infused kafkaesque support hell. Emails are answered by an LLM, which for me meant utterly off-topic replies and no help whatsoever. The Payoneer website offers a real-time chat, but it is yet another instance of a bullshit-spewing LLM bot. When I at last tried to call them (the support lines are not listed on the Payoneer website but were provided by the GSoC program), I kid you not, I was told that their platform was currently suffering from technical problems and was hung up on. Only thanks to the swift and helpful support of the GSoC administrators (who get priority support from Payoneer) was I able to set up a Payoneer account in the end. Apart from showing no respect to customers, Payoneer also rips them off big time with fees (unless you get paid in USD). They charge you 2% for currency conversions to EUR on top of the FX spread they take. What worked for me to avoid all of those fees was to open a USD account at Wise and have Payoneer transfer my GSoC stipend in USD to that account. Then I exchanged the USD to my local currency at Wise for significantly less than Payoneer would have charged me. Also make sure to close your Payoneer account after the end of GSoC to avoid their annual fee.

Project work With all this prelude out of the way, I can finally get to the actual work I've been doing over the course of my GSoC project.

Background The upstream rsync project generally sees little development. Nonetheless, they released version 3.4.0, including some CVE fixes, earlier this year. Unfortunately, their changes broke the -H flag. Debian package maintainers now need to apply those security fixes to the package versions in the Debian repositories, and those are typically a bit older, which usually means that the patches cannot be applied as is but need some amendments by the Debian maintainers. For these cases it is helpful to have autopkgtests defined, which check the package's functionality in an automated way upon every build. The question then is: why should the tests not be written upstream, such that regressions are caught in the development rather than the distribution process? There's a lot to say on this question and it probably depends a lot on the package at hand, but for rsync the main benefits are twofold:
  1. The upstream project mocks the ssh connection over which rsync is most typically used. Mocking is better than nothing but not the real thing. In addition to being a more realistic test scenario for the typical rsync use case, involving an ssh server in the test would automatically extend the overall resilience of Debian packages, as new versions of the openssh-server package in Debian would then benefit from the test cases in the rsync reverse dependency.
  2. The upstream rsync test framework is somewhat idiosyncratic and difficult to port to reimplementations of rsync. Given that the original rsync upstream sees little development, an extensive test suite further downstream can serve as a threshold for drop-in replacements for rsync.

Goal(s) At the start of the project, the Debian rsync package was just running (a part of) the upstream tests as autopkgtests. The relevant snippet from the build log for the rsync_3.4.1+ds1-3 package reads:
114s ------------------------------------------------------------
114s ----- overall results:
114s 36 passed
114s 7 skipped
Samuel and I agreed that it would be a good first milestone to make the skipped tests run. Afterwards, I should write some rsync test cases for local calls, i.e. without an ssh connection, effectively using rsync as a more powerful cp. And once that was done, I should extend the tests such that they run over an active ssh connection. With these milestones, I went to work.

Upstream tests Running the seven skipped upstream tests turned out to be fairly straightforward:
  • Two upstream tests concern access control lists and extended filesystem attributes. For these tests to run they rely on functionality provided by the acl and xattr Debian packages. Adding those to the Build-Depends list in the debian/control file of the rsync Debian package repo made them run.
  • Four upstream tests required root privileges to run. The autopkgtest tool knows the needs-root restriction for that reason. However, Samuel and I agreed that the tests should not exclusively run with root privileges. So, instead of just adding the restriction to the existing autopkgtest test, we created a new one which has the needs-root restriction and runs the upstream-tests-as-root script - which is nothing more than a symlink to the existing upstream-tests script.
The commits to implement these changes can be found in this merge request. The careful reader will have noticed that I only made 2 + 4 = 6 upstream test cases run out of 7: the leftover upstream test checks the functionality of the --ctimes rsync option. In the context of Debian, the problem is that the Linux kernel doesn't have a syscall to set the creation time of a file. As long as that is the case, this test will always be skipped for the Debian package.
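For illustration, the resulting debian/tests/control stanzas would look roughly like this; the field values are my own sketch (using the standard @ and @builddeps@ autopkgtest substitutions), not copied from the actual package:
Tests: upstream-tests
Depends: @, @builddeps@

Tests: upstream-tests-as-root
Depends: @, @builddeps@
Restrictions: needs-root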

Local tests When it came to writing Debian-specific test cases I started from a completely clean slate, which is a blessing and a curse at the same time: you have full flexibility but also full responsibility. There were a few things to consider at this point in time:
  • Which language to write the tests in? The programming language I am most proficient in is Python. But testing a CLI tool in Python would have been weird: it would have meant that I'd have to make repeated subprocess calls to run rsync and then read from the filesystem to get the file statistics I want to check. Samuel suggested I stick with shell scripts and make use of diffoscope - one of the main tools used and maintained by the Reproducible Builds project - to check whether the file contents and file metadata are as expected after rsync calls. Since I did not have good reasons to use bash, I decided to write the scripts to be POSIX compliant.
  • How to avoid boilerplate? If one makes use of a testing framework, which one? Writing the tests would involve quite a bit of boilerplate, mostly related to giving informative output on and during the test run, preparing the file structure we want to run rsync on, and cleaning the files up after the test has run. It would be very repetitive and in violation of DRY to have the code for this appear in every test. Good testing frameworks should provide convenience functions for these tasks. shunit2 comes with those functions, is packaged for Debian, and given that it is already being used in the curl project, I decided to go with it (see the minimal sketch after this list).
  • Do we use the same directory structure and files for every test or should every test have an individual setup? The tradeoff in this question being test isolation vs. idiosyncratic code. If every test has its own setup, it takes a) more work to write the test and b) more work to understand the differences between tests. However, one can be sure that changes to the setup in one test will have no side effects on other tests. In my opinion, this guarantee was worth the additional effort in writing/reading the tests.
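A minimal sketch of what such a shunit2-based test can look like (illustrative only, not one of the actual Debian tests):
#!/bin/sh
# Each test* function is discovered and run automatically when shunit2 is sourced.
testTimesArePreserved() {
    src="$SHUNIT_TMPDIR/src.txt"
    dst="$SHUNIT_TMPDIR/dst.txt"
    echo "hello" > "$src"
    rsync --times "$src" "$dst"
    assertEquals "contents differ" "$(cat "$src")" "$(cat "$dst")"
    assertEquals "mtimes differ" "$(stat -c %Y "$src")" "$(stat -c %Y "$dst")"
}
# Load the framework last; on Debian this resolves to /usr/bin/shunit2.
. shunit2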
Having made these decisions, I simply started writing tests and ran into issues very quickly.

rsync and subsecond mtime diffs When testing the rsync --times option, I observed a weird phenomenon: If the source and destination file have modification times which differ only in the nanoseconds, an rsync --times call will not synchronize the modification times. More details about this behavior and examples can be found in the upstream issue I raised. In the Debian tests we had to occasionally work around this by setting the timestamps explicitly with touch -d.
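An illustrative version of that workaround (file names are placeholders): give both files explicit whole-second mtimes so a sub-second-only difference cannot occur, then let rsync --times do its job.
$ touch -d '2025-01-01 12:00:00' src/data.txt
$ touch -d '2025-01-01 12:00:05' dst/data.txt
$ rsync --times src/data.txt dst/data.txt   # the mtimes now differ by whole seconds, so they get synchronized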
diffoscope regression In one test case, I was expecting a difference in the modification times but diffoscope would not report a diff. After a good amount of time spent debugging the problem (my default, and usually correct, assumption is that something about my code is seriously broken if I run into issues like that), I was able to show that diffoscope only displayed this behavior in the version in the unstable suite, not in Debian stable (which I am running on my development machine). Since everything pointed to a regression in the diffoscope project, and with diffoscope being written in Python, a language I am familiar with, I wanted to spend some time investigating (and hopefully fixing) the problem. Running git bisect on the diffoscope repo helped me identify the commit which introduced the regression: the commit contained an optimization via an early return for bit-by-bit identical files. Unfortunately, the early return also caused an explicitly requested metadata comparison (which could differ between the files) to be skipped. With a nicely diagnosed issue like that, I was able to go to a local hackerspace event, where people work on FOSS together for an evening every month. As a group, we were able to, first, write a test which showcases the broken behavior in the latest diffoscope version and, second, fix the code such that the same test passes going forward. All details can be found in this merge request.
shunit2 failures At some point I had a few autopkgtests set up and passing, but adding a new one would throw totally inexplicable errors at me. After trying to isolate the problem as much as possible, it turned out that shunit2 doesn't play well with the -e shell option. The project mentions this in the release notes for the 2.1.8 version[1], but in my opinion a constraint this severe should be featured much more prominently, e.g. in the README.

Tests over an ssh connection The centrepiece of this project; everything else has, in a way, only been preparation for this. Obviously, the goal was to reuse the previously written local tests in some way. Not only because lazy me would have less work to do this way, but also because of the reduced long-term maintenance burden of one rather than two test sets. As it turns out, it is actually possible to accomplish that: the remote-tests script doesn't do much apart from starting an ssh server on localhost and running the local-tests script with the REMOTE environment variable set. The REMOTE environment variable changes the behavior of the local-tests script in such a way that it prepends "$REMOTE": to the destination of the rsync invocations. And given that we set REMOTE=rsync@localhost in the remote-tests script, local-tests copies the files to the exact same locations as before, just over ssh. The implementation details for this can be found in this merge request.
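A rough sketch of that mechanism (the variable values and paths here are placeholders, not the actual script contents):
DEST=/tmp/rsync-tests/dst
if [ -n "${REMOTE:-}" ]; then
    # remote-tests exports REMOTE=rsync@localhost, so the destination becomes
    # rsync@localhost:/tmp/rsync-tests/dst and the copy runs over ssh
    DEST="$REMOTE:$DEST"
fi
rsync --archive /tmp/rsync-tests/src/ "$DEST/"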

proposed-updates Most of my development work on the Debian rsync package took place during the Debian freeze, as the release of Debian Trixie is just around the corner. This means that uploading to the unstable suite by Debian Developers (DDs) and Debian Maintainers (DMs) is discouraged, as it makes migrating packages to testing more difficult for the Debian release team. If DDs/DMs want to have the package version in unstable migrated to testing during the freeze, they have to file an unblock request. Samuel has done this twice (1, 2) for my work for Trixie, but has asked me to file the proposed-updates request for the current stable (i.e. Debian Bookworm) myself, after I've backported my tests to bookworm.

Unfinished business To run the upstream tests which check access control list and extended filesystem attribute functionality, I've added the acl and xattr packages to Build-Depends in debian/control. This, however, will only make the packages available at build time: if Debian users install the rsync package, the acl and xattr packages will not be installed alongside it. For that, the dependencies would have to be added to Depends or Suggests in debian/control. Depends is probably too strong a relation, since rsync clearly works well in practice without them, but adding them to Suggests might be worthwhile. A decision on this would involve checking what happens if rsync is called with the relevant options on a host machine which has those packages installed, but where the destination machine lacks them. Apart from the issue described above, the 15 tests I managed to write are a drop in the ocean in light of the sheer number of rsync options and their combinations. Most glaringly, not all of the options implied by --archive are covered separately (which would help indicate which rsync code path broke in a regression). To increase the likelihood of catching regressions with the autopkgtests, the test coverage should be extended in the future.

Conclusion Generally, I am happy with my contributions to Debian over the course of my small GSoC project: I've created an extensible, easy-to-understand, and working autopkgtest setup for the Debian rsync package. There are two things which bother me, however:
  1. In hindsight, I probably shouldn't have gone with shunit2 as a testing framework. The fact that it behaves erratically with the -e flag is a serious drawback for a shell testing framework: you really don't want a shell command to fail silently and the test to continue running.
  2. As alluded to in the previous section, I'm not particularly proud of the number of tests I managed to write.
On the other hand, finding and fixing the regression in diffoscope - while derailing me from the GSoC project itself - might have a redeeming quality.

DebConf25 By sheer luck I happened to work on a GSoC project at Debian over a time period during which the annual Debian conference took place close enough to my place of residence. Samuel pointed out the opportunity to attend DebConf to me during the community bonding period, and since I could make time for the event in my schedule, I signed up. DebConf was a great experience which - aside from gaining more knowledge about Debian development - allowed me to meet the actual people usually hidden behind email addresses and IRC nicks. I can wholeheartedly recommend attending a DebConf to every interested Debian user! For those who missed this year's iteration of the conference, I can recommend the following recorded talks: While not featuring as a keynote speaker (understandably so, as the newcomer to the Debian community that I am), I could still contribute a bit to the conference program.

GSoC project presentation The Debian Outreach team has scheduled a session in which all GSoC and Outreachy students over the past year had the chance to present their work in a lightning talk. The session has been recorded and is available online, just like my slides and the source for them.

Debian install workshop Additionally, with so many Debian experts gathering in one place while KDE's End of 10 campaign is ongoing, I felt it natural to organize a Debian install workshop. In hindsight I can say that I underestimated how much work it would be, especially for someone like me who does not speak a word of French. But although the turnout of people who wanted us to install Linux on their machines was disappointingly low, it was still worth it: not only because the material in the repo can be helpful to others planning install workshops, but also because it was nice to meet a) the person behind the Debian installer images and b) the local Brest/Finistère Linux user group as well as the motivated and helpful people at Infini.

Credits I want to thank the Open Source team at Google for organizing GSoC: the highly structured program with one-to-one mentorship is a great avenue to start contributing to well-established and at times intimidating FOSS projects. And as much as I disagree with Google's surveillance-capitalist business model, I have to give it to them that the company at least takes its responsibility for FOSS (somewhat) seriously - unlike many other businesses which rely on FOSS and choose to free-ride off it. Big thanks to the Debian community! I've experienced nothing but friendliness in my interactions with the community. And lastly, the biggest thanks to my GSoC mentor Samuel Henrique. He has dealt patiently and competently with all my stupid newbie questions. His support enabled me to make - albeit small - contributions to Debian. It has been a pleasure to work with him during GSoC and I'm looking forward to working with him again in the future.

  1. Obviously, I've only read them after experiencing the problem.

31 July 2025

Matthew Garrett: Secure boot certificate rollover is real but probably won't hurt you

LWN wrote an article which opens with the assertion "Linux users who have Secure Boot enabled on their systems knowingly or unknowingly rely on a key from Microsoft that is set to expire in September". This is, depending on interpretation, either misleading or just plain wrong, but also there's not a good source of truth here, so.

First, how does secure boot signing work? Every system that supports UEFI secure boot ships with a set of trusted certificates in a database called "db". Any binary signed with a chain of certificates that chains to a root in db is trusted, unless either the binary (via hash) or an intermediate certificate is added to "dbx", a separate database of things whose trust has been revoked[1]. But, in general, the firmware doesn't care about the intermediate or the number of intermediates or whatever - as long as there's a valid chain back to a certificate that's in db, it's going to be happy.

That's the conceptual version. What about the real world one? Most x86 systems that implement UEFI secure boot have at least two root certificates in db - one called "Microsoft Windows Production PCA 2011", and one called "Microsoft Corporation UEFI CA 2011". The former is the root of a chain used to sign the Windows bootloader, and the latter is the root used to sign, well, everything else.
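If you are curious which roots your own machine's db actually contains, one way to check from a running Linux system (my addition, not part of the article) is the mokutil tool:

$ mokutil --db | grep -i 'subject:'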

What is "everything else"? For people in the Linux ecosystem, the most obvious thing is the Shim bootloader that's used to bridge between the Microsoft root of trust and a given Linux distribution's root of trust[2]. But that's not the only third party code executed in the UEFI environment. Graphics cards, network cards, RAID and iSCSI cards and so on all tend to have their own unique initialisation process, and need board-specific drivers. Even if you added support for everything on the market to your system firmware, a system built last year wouldn't know how to drive a graphics card released this year. Cards need to provide their own drivers, and these drivers are stored in flash on the card so they can be updated. But since UEFI doesn't have any sandboxing environment, those drivers could do pretty much anything they wanted to. Someone could compromise the UEFI secure boot chain by just plugging in a card with a malicious driver on it, and have that hotpatch the bootloader and introduce a backdoor into your kernel.

This is avoided by enforcing secure boot for these drivers as well. Every plug-in card that carries its own driver has it signed by Microsoft, and up until now that's been a certificate chain going back to the same "Microsoft Corporation UEFI CA 2011" certificate used in signing Shim. This is important for reasons we'll get to.

The "Microsoft Windows Production PCA 2011" certificate expires in October 2026, and the "Microsoft Corporation UEFI CA 2011" one in June 2026. These dates are not that far in the future! Most of you have probably at some point tried to visit a website and got an error message telling you that the site's certificate had expired and that it's no longer trusted, and so it's natural to assume that the outcome of time's arrow marching past those expiry dates would be that systems will stop booting. Thankfully, that's not what's going to happen.

First up: if you grab a copy of the Shim currently shipped in Fedora and extract the certificates from it, you'll learn it's not directly signed with the "Microsoft Corporation UEFI CA 2011" certificate. Instead, it's signed with a "Microsoft Windows UEFI Driver Publisher" certificate that chains to the "Microsoft Corporation UEFI CA 2011" certificate. That's not unusual; intermediates are commonly used and rotated. But if we look more closely at that certificate, we learn that it was issued in 2023 and expired in 2024. Older versions of Shim were signed with older intermediates. A very large number of Linux systems are already booting binaries signed with certificates that have expired, and yet things keep working. Why?

Let's talk about time. In the ways we care about in this discussion, time is a social construct rather than a meaningful reality. There's no way for a computer to observe the state of the universe and know what time it is - it needs to be told. It has no idea whether that time is accurate or an elaborate fiction, and so it can't with any degree of certainty declare that a certificate is valid from an external frame of reference. The failure modes of getting this wrong are also extremely bad! If a system has a GPU that relies on an option ROM, and if you stop trusting the option ROM because either its certificate has genuinely expired or because your clock is wrong, you can't display any graphical output[3] and the user can't fix the clock and, well, crap.

The upshot is that nobody actually enforces these expiry dates - here's the reference code that disables it. In a year's time we'll have gone past the expiration date for "Microsoft Windows UEFI Driver Publisher" and everything will still be working, and a few months later "Microsoft Windows Production PCA 2011" will also expire and systems will keep booting Windows despite being signed with a now-expired certificate. This isn't a Y2K scenario where everything keeps working because people have done a huge amount of work - it's a situation where everything keeps working even if nobody does any work.

So, uh, what's the story here? Why is there any engineering effort going on at all? What's all this talk of new certificates? Why are there sensationalist pieces about how Linux is going to stop working on old computers or new computers or maybe all computers?

Microsoft will shortly start signing things with a new certificate that chains to a new root, and most systems don't trust that new root. System vendors are supplying updates[4] to their systems to add the new root to the set of trusted keys, and Microsoft has supplied a fallback that can be applied to all systems even without vendor support[5]. If something is signed purely with the new certificate then it won't boot on something that only trusts the old certificate (which shouldn't be a realistic scenario due to the above), but if something is signed purely with the old certificate then it won't boot on something that only trusts the new certificate.

How meaningful a risk is this? We don't have an explicit statement from Microsoft as yet as to what's going to happen here, but we expect that there'll be at least a period of time where Microsoft signs binaries with both the old and the new certificate, and in that case those objects should work just fine on both old and new computers. The problem arises if Microsoft stops signing things with the old certificate, at which point new releases will stop booting on systems that don't trust the new key (which, again, shouldn't happen). But even if that does turn out to be a problem, nothing is going to force Linux distributions to stop using existing Shims signed with the old certificate, and having a Shim signed with an old certificate does nothing to stop distributions signing new versions of grub and kernels. In an ideal world we have no reason to ever update Shim[6] and so we just keep on shipping one signed with two certs.

If there's a point in the future where Microsoft only signs with the new key, and if we were to somehow end up in a world where systems only trust the old key and not the new key[7], then those systems wouldn't boot with new graphics cards, wouldn't be able to run new versions of Windows, wouldn't be able to run any Linux distros that ship with a Shim signed only with the new certificate. That would be bad, but we have a mechanism to avoid it. On the other hand, systems that only trust the new certificate and not the old one would refuse to boot older Linux, wouldn't support old graphics cards, and also wouldn't boot old versions of Windows. Nobody wants that, and for the foreseeable future we're going to see new systems continue trusting the old certificate and old systems have updates that add the new certificate, and everything will just continue working exactly as it does now.

Conclusion: Outside some corner cases, the worst case is you might need to boot an old Linux to update your trusted keys to be able to install a new Linux, and no computer currently running Linux will break in any way whatsoever.

[1] (there's also a separate revocation mechanism called SBAT which I wrote about here, but it's not relevant in this scenario)

[2] Microsoft won't sign GPLed code for reasons I think are unreasonable, so having them sign grub was a non-starter, but also the point of Shim was to allow distributions to have something that doesn't change often and be able to sign their own bootloaders and kernels and so on without having to have Microsoft involved, which means grub and the kernel can be updated without having to ask Microsoft to sign anything and updates can be pushed without any additional delays

[3] It's been a long time since graphics cards booted directly into a state that provided any well-defined programming interface. Even back in the 90s, cards didn't present VGA-compatible registers until card-specific code had been executed (hence DEC Alphas having an x86 emulator in their firmware to run the driver on the card). No driver? No video output.

[4] There's a UEFI-defined mechanism for updating the keys that doesn't require a full firmware update, and it'll work on all devices that use the same keys rather than being per-device

[5] Using the generic update without a vendor-specific update means it wouldn't be possible to issue further updates for the next key rollover, or any additional revocation updates, but I'm hoping to be retired by then and I hope all these computers will also be retired by then

[6] I said this in 2012 and it turned out to be wrong then so it's probably wrong now sorry, but at least SBAT means we can revoke vulnerable grubs without having to revoke Shim

[7] Which shouldn't happen! There's an update to add the new key that should work on all PCs, but there's always the chance of firmware bugs


26 July 2025

Matthew Palmer: Object deserialization attacks using Ruby's Oj JSON parser

tl;dr: there is an attack in the wild which is triggering dangerous-but-seemingly-intended behaviour in the Oj JSON parser when used in the default and recommended manner, which can lead to everyone's favourite kind of security problem: object deserialization bugs! If you have the oj gem anywhere in your Gemfile.lock, the quickest mitigation is to make sure you have Oj.default_options = { mode: :strict } somewhere, and that no library is overwriting that setting with something else.

Prologue As a sensible sysadmin, I have all the sites I run send me a notification if any unhandled exception gets raised. Mostly, what I get sent is error-handling corner cases I missed, but now and then things get more interesting. In this case, it was a PG::UndefinedColumn exception, which looked something like this:
PG::UndefinedColumn: ERROR:  column "xyzzydeadbeef" does not exist
This is weird on two fronts: firstly, this application has been running for a while, and if there was a schema problem, I'd expect it to have made itself apparent long before now. And secondly, while I don't profess to perfection in my programming, I'm usually better at naming my database columns than that. Something is definitely hinky here, so let's jump into the mystery mobile!

The column name is coming from outside the building! The exception notifications I get sent include a whole lot of information about the request that caused the exception, including the request body. In this case, the request body was JSON, and looked like this:
 "name":":xyzzydeadbeef", ... 
The leading colon looks an awful lot like the syntax for a Ruby symbol, but it's in a JSON string. Surely there's no way a JSON parser would be turning that into a symbol, right? Right?!? Immediately, I thought that that possibly was what was happening, because I use Sequel for my SQL database access needs, and Sequel treats symbols as database column names. It seemed like too much of a coincidence that a vaguely symbol-shaped string was being sent in, and the exact same name was showing up as a column name. But how the flying fudgepickles was a JSON string being turned into a Ruby symbol, anyway? Enter Oj.

Oj? I barely know aj A long, long time ago, the standard Ruby JSON library had a reputation for being slow. Thus did many competitors flourish, claiming more features and better performance. Strong amongst the contenders was oj (for "Optimized JSON"), touted as "The fastest JSON parser and object serializer". Given the history, it's not surprising that people who wanted the best possible performance turned to Oj, leading to it being found in a great many projects, often as a sub-dependency of a dependency of a dependency (which is how it ended up in my project). You might have noticed in Oj's description that, in addition to claiming "fastest", it also describes itself as an "object serializer". Anyone who has kept an eye on the security bug landscape will recall that object deserialization is a rich vein of vulnerabilities to mine. Libraries that do object deserialization, especially ones with a history that goes back to before the vulnerability class was well-understood, are likely to be trouble magnets. And thus, it turns out to be with Oj. By default, Oj will happily turn any string that starts with a colon into a symbol:

>> require "oj"
>> Oj.load('{"name":":xyzzydeadbeef","username":"bob","answer":42}')
=> {"name"=>:xyzzydeadbeef, "username"=>"bob", "answer"=>42}

How that gets exploited is only limited by the creativity of an attacker, which I'll talk about more shortly. But first, a word from my rant cortex.

Insecure By Default is a Cancer While the object of my ire today is Oj and its fast-and-loose approach to deserialization, it is just one example of a pervasive problem in software: insecurity by default. Whether it's a database listening on 0.0.0.0 with no password as soon as it's installed, or a library whose default behaviour is to permit arbitrary code execution, it all contributes to a software ecosystem that is an appalling security nightmare. When a user (in this case, a developer who wants to parse JSON) comes across a new piece of software, they have by definition no idea what they're doing with that software. They're going to use the defaults, and follow the most easily available documentation, to achieve their goal. It is unrealistic to assume that a new user of a piece of software is going to do things "the right way", unless that right way is the only way, or at least the by-far-the-easiest way. Conversely, the developer(s) of the software is/are the domain experts. They have knowledge of the problem domain, through their exploration while building the software, and unrivalled expertise in the codebase. Given this disparity in knowledge, it is tantamount to malpractice for the experts (the developers) to off-load the responsibility for the safe and secure use of the software to the party that has the least knowledge of how to do that (the new user). To apply this general principle to the specific case, take the "Using" section of the Oj README. The example code there calls Oj.load, with no indication that this code will, in fact, parse specially-crafted JSON documents into Ruby objects. The brand-new user of the library, no doubt under pressure to Get Things Done, is almost certainly going to look at this "Using" example, get the apparent result they were after (a parsed JSON document), and call it a day. It is unlikely that a brand-new user will, for instance, scroll down to the "Further Reading" section, find the second last (of ten) listed documents, "Security.md", and carefully peruse it. If they do, they'll find an oblique suggestion that parsing untrusted input is "never a good idea". While that's true, it's also rather unhelpful, because I'd wager that by far the majority of JSON parsed in the world is "untrusted", in one way or another, given the predominance of JSON as a format for serializing data passing over the Internet. This guidance is roughly akin to putting a label on a car's airbags saying that "driving at speed can be hazardous to your health": true, but unhelpful under the circumstances. The solution is for default behaviours to be secure, and any deviation from that default that has the potential to degrade security must, at the very least, be clearly labelled as such. For example, the Oj.load function should be named Oj.unsafe_load, and the Oj.load function should behave as the Oj.safe_load function does presently. By naming the unsafe function as explicitly unsafe, developers (and reviewers) have at least a fighting chance of recognising they're doing something risky. We put warning labels on just about everything in the real world; the same should be true of dangerous function calls. OK, rant over. Back to the story.

But how is this exploitable? So far, I've hopefully made it clear that Oj does some Weird Stuff when parsing certain JSON strings. It caused an unhandled exception in a web application I run, which isn't cool, but apart from bombing me with exception notifications, what's the harm? For starters, let's look at our original example: when presented with a symbol, Sequel will interpret that as a column name, rather than a string value. Thus, if our "save an update to the user" code looked like this:

# request_body has the JSON representation of the form being submitted
body = Oj.load(request_body)
DB[:users].where(id: user_id).update(name: body["name"])

In normal operation, this will issue an SQL query along the lines of UPDATE users SET name='Jaime' WHERE id=42. If the name given is "Jaime O'Dowd", all is still good, because Sequel quotes string values, etc etc. All's well so far. But, imagine there is a column in the users table that normally users cannot read, perhaps admin_notes. Or perhaps an attacker has gotten temporary access to an account, and wants to dump the user's password hash for offline cracking. So, they send an update claiming that their name is :admin_notes (or :password_hash). In JSON, that'll look like {"name":":admin_notes"}, and Oj.load will happily turn that into a Ruby object of {"name"=>:admin_notes}. When run through the above "update the user" code fragment, it'll produce the SQL UPDATE users SET name=admin_notes WHERE id=42. In other words, it'll copy the contents of the admin_notes column into the name column, which the attacker can then read out just by refreshing their profile page.

But Wait, There's More! That an attacker can read other fields in the same table isn't great, but that's barely scratching the surface. Remember before I said that Oj does "object serialization"? That means that, in general, you can create arbitrary Ruby objects from JSON. Since objects contain code, it's entirely possible to trigger arbitrary code execution by instantiating an appropriate Ruby object. I'm not going to go into details about how to do this, because it's not really my area of expertise, and many others have covered it in detail. But rest assured, if an attacker can feed input of their choosing into a default call to Oj.load, they've been handed remote code execution on a platter.

Mitigations As Oj's object deserialization is intended and documented behaviour, don't expect a future release to make any of this any safer. Instead, we need to mitigate the risks. Here are my recommended steps:
  1. Look in your Gemfile.lock (or SBOM, if that's your thing) to see if the oj gem is anywhere in your codebase. Remember that even if you don't use it directly, it's popular enough that it is used in a lot of places. If you find it in your transitive dependency tree anywhere, there's a chance you're vulnerable, limited only by the ingenuity of attackers to feed crafted JSON into a deeply-hidden Oj.load call.
  2. If you depend on oj directly and use it in your project, consider not doing that. The json gem is acceptably fast, and JSON.parse won't create arbitrary Ruby objects.
  3. If you really, really need to squeeze the last erg of performance out of your JSON parsing, and decide to use oj to do so, find all calls to Oj.load in your code and switch them to call Oj.safe_load.
  4. It is a really, really bad idea to ever use Oj to deserialize JSON into objects, as it lacks the safety features needed to mitigate the worst of the risks of doing so (for example, restricting which classes can be instantiated, as is provided by the permitted_classes argument to Psych.load). I'd make it a priority to move away from using Oj for that, and switch to something somewhat safer (such as the aforementioned Psych). At the very least, audit and comment heavily to minimise the risk of user-provided input sneaking into those calls somehow, and pass mode: :object as the second argument to Oj.load, to make it explicit that you are opting in to this far more dangerous behaviour only when it's absolutely necessary.
  5. To secure any unsafe uses of Oj.load in your dependencies, consider setting the default Oj parsing mode to :strict, by putting Oj.default_options = { mode: :strict } somewhere in your initialization code (and make sure no dependencies are setting it to something else later!). There is a small chance that this change of default might break something, if a dependency is using Oj to deliberately create Ruby objects from JSON, but the overwhelming likelihood is that Oj's just being used to parse ordinary JSON, and these calls are just RCE vulnerabilities waiting to give you a bad time. (A sketch of such an initializer follows this list.)
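As a concrete illustration of point 5, such an initializer can be very small; the file path below is an assumption, and any code that runs early at application boot will do:

# e.g. config/initializers/oj.rb (path is an assumption)
require "oj"

# Parse plain JSON only: no symbols or arbitrary Ruby objects from input.
Oj.default_options = { mode: :strict }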

Is Your Bacon Saved? If I've helped you identify and fix potential RCE vulnerabilities in your software, or even just opened your eyes to the risks of object deserialization, please help me out by buying me a refreshing beverage. I would really appreciate any support you can give. Alternately, if you'd like my help in fixing these (and many other) sorts of problems, I'm looking for work, so email me.

23 July 2025

Sergio Cipriano: How I finally tracked my Debian uploads correctly

How I finally tracked my Debian uploads correctly A long time ago, I became aware of UDD (Ultimate Debian Database), which gathers various Debian data into a single SQL database. At that time, we were trying to do something simple: list the contributions (package uploads) of our local community, Debian Brasília. We ended up with a script that counted uploads to unstable and experimental. I was never satisfied with the final result because some uploads were always missing. Here is an example:
debci (3.0) experimental; urgency=medium
...
   [ Sergio de almeida cipriano Junior ]
   * Fix Style/GlovalVars issue
   * Rename blacklist to rejectlist
...
I made changes in debci 3.0, but the upload was done by someone else. This kind of contribution cannot be tracked by that script. Then, a few years ago, I learned about Minechangelogs, which allows us to search through the changelogs of all Debian packages currently published. Today, I decided to explore how this was done, since I couldn't find anything useful for that kind of query in UDD's tables. That's when I came across ProjectB. It was my first time hearing about it. ProjectB is a database that stores all the metadata about the packages in the Debian archive, including the changelogs of those packages. Now that I'm a Debian Developer, I have access to this database. If you also have access and want to try some queries, you can do this:
$ ssh mirror.ftp-master.debian.org -N -L 15434:danzi.debian.org:5435
$ psql postgresql://guest@localhost:15434/projectb?sslmode=allow
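Once the tunnel is up you don't even need a script for a quick look; a one-off psql query works too. This is just a sketch, using the changes table and changedby column from the script further below, with my name as the example search string:
$ psql "postgresql://guest@localhost:15434/projectb?sslmode=allow" \
    -c "SELECT count(*) FROM changes WHERE changedby ILIKE '%almeida cipriano%';"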
In the end, it finally solved my problem. Using the code below, with UDD, I get 38 uploads:
import psycopg2

contributor = 'almeida cipriano'

try:
    connection = psycopg2.connect(
        user="udd-mirror",
        password="udd-mirror",
        host="udd-mirror.debian.net",
        port="5432",
        database="udd"
    )

    cursor = connection.cursor()

    query = f"SELECT source,version,date,distribution,signed_by_name \
FROM public.upload_history \
WHERE changed_by_name ILIKE '%{contributor}%' \
ORDER BY date;"

    cursor.execute(query)
    records = cursor.fetchall()

    print(f"I have  len(records)  uploads.")

    cursor.close()
    connection.close()

except (Exception, psycopg2.Error) as error:
    print("Error while fetching data from PostgreSQL", error)
Using the code below, with ProjectB, I get 43 uploads (the correct amount):
import psycopg2

contributor = 'almeida cipriano'

try:
    # SSH tunnel is required to access the database:
    # ssh <username>@mirror.ftp-master.debian.org -N -L 15434:danzi.debian.org:5435
    connection = psycopg2.connect(
        user="guest",
        host="localhost",
        port="15434",
        database="projectb",
        sslmode="allow"
    )
    connection.set_client_encoding('UTF8')

    cursor = connection.cursor()

    query = f"SELECT c.source, c.version, c.changedby \
FROM changes c \
JOIN changelogs ch ON ch.id = c.changelog_id \
WHERE c.source != 'debian-keyring' \
  AND (\
    ch.changelog ILIKE '%{contributor}%' \
    OR c.changedby ILIKE '%{contributor}%' \
  )\
ORDER BY c.seen;"

    cursor.execute(query)
    records = cursor.fetchall()

    print(f"I have  len(records)  uploads.")

    cursor.close()
    connection.close()

except (Exception, psycopg2.Error) as error:
    print("Error while fetching data from PostgreSQL", error)
It feels good to finally solve this itch I've had for years.

22 July 2025

Iustin Pop: Watching website scanning bots

Ever since I put up http://demo.corydalis.io, and set up logcheck, I'm inadvertently keeping up with recent exploits in common CMS frameworks, or maybe even normal web framework issues, by seeing what 404s I get from the logs. Now, I didn't intend to do this per se, I just wanted to make sure I don't have any 500s, and at one point I did actually catch a bug by seeing seemingly valid URLs, with referrer my own pages, leading to 404s. But besides that, it's mainly this: a couple of times per week, a bot finds the site, and then it tries in fast succession something like this (real log entries, with the source IP address removed):
[21/Jul/2025:09:27:09 +0200] "GET /pms?module=logging&file_name=../../../../../../~/.aws/credentials&number_of_lines=10000 HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:11 +0200] "GET /admin/config?cmd=cat+/root/.aws/credentials HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:11 +0200] "GET /.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:13 +0200] "GET /.env.local HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:13 +0200] "GET /.env.production HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:16 +0200] "GET /.env.dev HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:17 +0200] "GET /.env.development HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:19 +0200] "GET /.env.prod HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:19 +0200] "GET /.env.stage HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:22 +0200] "GET /.env.test HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:23 +0200] "GET /.env.example HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:25 +0200] "GET /.env.bak HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:26 +0200] "GET /.env.old HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:28 +0200] "GET /.envs/.production/.django HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:28 +0200] "GET /blog.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:31 +0200] "GET /wp-content/.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:32 +0200] "GET /application/.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:34 +0200] "GET /app/.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:35 +0200] "GET /apps/.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:37 +0200] "GET /config/.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:38 +0200] "GET /config/config.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:40 +0200] "GET /config/.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:41 +0200] "GET /api/.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:43 +0200] "GET /vendor/.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:44 +0200] "GET /backend/.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:46 +0200] "GET /server/.env HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:46 +0200] "GET /home/user/.aws/credentials HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:49 +0200] "GET /aws/credentials HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:50 +0200] "GET /.aws/credentials HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:52 +0200] "GET /.aws/config HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:52 +0200] "GET /config/aws.yml HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:55 +0200] "GET /config/aws.json HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:55 +0200] "GET /.env.production HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:58 +0200] "GET /config.json HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:27:59 +0200] "GET /config/config.json HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:01 +0200] "GET /config/settings.json HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:02 +0200] "GET /config/secrets.json HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:04 +0200] "GET /config.yaml HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:04 +0200] "GET /config.yml HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:07 +0200] "GET /config.py HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:08 +0200] "GET /secrets.json HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:10 +0200] "GET /secrets.yml HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:11 +0200] "GET /credentials.json HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:13 +0200] "GET /.git-credentials HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:14 +0200] "GET /.git/config HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:16 +0200] "GET /.gitignore HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:18 +0200] "GET /.gitlab-ci.yml HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:19 +0200] "GET /.github/workflows HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:22 +0200] "GET /.idea/workspace.xml HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:22 +0200] "GET /.vscode/settings.json HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:25 +0200] "GET /docker-compose.yml HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:25 +0200] "GET /docker-compose.override.yml HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:28 +0200] "GET /docker-compose.prod.yml HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:28 +0200] "GET /docker-compose.dev.yml HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:32 +0200] "GET /phpinfo HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:32 +0200] "GET /_profiler/phpinfo HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:34 +0200] "GET /phpinfo.php HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:34 +0200] "GET /info.php HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:37 +0200] "GET /storage/logs/laravel.log HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:37 +0200] "GET /storage/logs/error.log HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:40 +0200] "GET /logs/debug.log HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:40 +0200] "GET /logs/app.log HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:49 +0200] "GET /debug.log HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:51 +0200] "GET /error.log HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:53 +0200] "GET /.DS_Store HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:55 +0200] "GET /backup.zip HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:28:58 +0200] "GET /.backup HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:29:00 +0200] "GET /db.sql HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:29:03 +0200] "GET /dump.sql HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:29:06 +0200] "GET /database.sql HTTP/1.1" 404 - "" "Mozilla/5.0"
[21/Jul/2025:09:29:09 +0200] "GET /backup.tar.gz HTTP/1.1" 404 - "" "Mozilla/5.0"
Now, this example is actually trying to catch a bit more things, but many times it's focused on some specific thing, or two things. Here we have docker, MacOS .DS_Store (I'm not sure how that's useful - to find more filenames?), VSCode settings, various secrets, GitHub workflows, log output, database dumps, AWS credentials, and - I guess from the wp filename - WordPress settings. The first few years were full of WordPress scanners; now it seems to have quieted down, I haven't seen a bot scanning 200 potential WP filenames in ages. And this bot even bothers to put in Mozilla/5.0 as "browser identification". Side-note: I don't think the filename path in the first log entry, i.e. ../../../../../../~/, ever properly resolves to the home directory of any user. So I'm not sure that particular scanner ever works, but who knows? Maybe some framework does bad tilde expansion, but at least bash will not expand ~ inside a path; it seems that path is passed as-is to an invoked command (strace confirms it). What's surprising here is that these are usually plain dumb scanners, from the same IP address, no concern for throttling, no attempt to hide, just 2 minutes of brute-forcing a random list of known "treasures", then moving on. For this to be worth it, it means there are still victims found using this method, sadly. Well, sometimes I get a single, one-off "GET /wp-login.php HTTP/1.1", which is strange enough that it might not even be a bot, who knows. But in general, periods of activity of this type come and go, probably aligned with new CVEs. And another surprising thing is that for this type of scanning to work (and I've seen many over the years), the website framework/configuration must allow random file download. Corydalis itself is written in Haskell, using Yesod, and it has a hardcoded (built at compile time) list of static resources it will serve. I haven't made the switch to fully embedding them in the binary, but at that point, it won't need to read from the filesystem at all. Right now it will serve a few CSS and JS files, plus fonts, but that's it, no arbitrary filesystem traversal. Strange that some frameworks allow it. This is not productively spent time, but it is fun, especially seeing how this changes over time. And probably the most use anyone gets out of http://demo.corydalis.io.
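If you want to pull a similar summary out of your own access logs, a rough one-liner along these lines does the job. It's only a sketch: it assumes the standard combined log format, where the status code is field 9 and the request path field 7, and the log path is just an example.
awk '$9 == 404 {print $7}' /var/log/apache2/access.log | sort | uniq -c | sort -rn | head -20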

David Bremner: Hibernate on the pocket reform 10/n

Context

Finally applying the pci reset series.
$ b4 am 20250715-pci-port-reset-v6-0-6f9cce94e7bb@oss.qualcomm.com
$ git am -3 v6_20250715_manivannan_sadhasivam_pci_add_support_for_resetting_the_root_ports_in_a_platform_specifi.mbx
There is quite a scary looking conflict between the last patch in the series and https://lore.kernel.org/r/1744940759-23823-1-git-send-email-shawn.lin@rock-chips.com which is now upstream (collabora) in rockchip-devel. I resolved the second basically by taking both, as it seemed like two independent sets of additions to the same parts of the file. For the first, it looks like Shawn's commit referenced above should prevail.
  • If anyone is curious about the (possibly incorrectly) rebased patches, they are at https://salsa.debian.org/bremner/collabora-rockchip-3588 (reform-patches is the default, and relevant branch).

testing
  • The new (6.16~rc7+) kernel boots
  • It successfully reboots
  • devices test passes, although the UBSAN warning / error is still there
[  174.559032] UBSAN: array-index-out-of-bounds in net/mac80211/rc80211_minstrel_ht.c:409:33
[  174.559830] index 15 is out of range for type 'minstrel_rate_stats [10]'
[  174.560462] CPU: 7 UID: 0 PID: 213 Comm: kworker/u32:10 Tainted: G        WC OE       6.16.0-rc7+ #6 NONE
[  174.560470] Tainted: [W]=WARN, [C]=CRAP, [O]=OOT_MODULE, [E]=UNSIGNED_MODULE
[  174.560472] Hardware name: MNT Pocket Reform with RCORE RK3588 Module (DT)
[  174.560474] Workqueue: mt76 mt76u_tx_status_data [mt76_usb]
[  174.560489] Call trace:
[  174.560491]  show_stack+0x34/0x98 (C)
[  174.560501]  dump_stack_lvl+0x60/0x80
[  174.560508]  dump_stack+0x18/0x24
[  174.560514]  ubsan_epilogue+0x10/0x48
[  174.560520]  __ubsan_handle_out_of_bounds+0xa0/0xd0
[  174.560526]  minstrel_ht_tx_status+0x890/0xc68 [mac80211]
[  174.560633]  rate_control_tx_status+0xbc/0x180 [mac80211]
[  174.560730]  ieee80211_tx_status_ext+0x1d8/0x9a0 [mac80211]
[  174.560822]  mt76_tx_status_unlock+0x188/0x2a0 [mt76]
[  174.560844]  mt76x02_send_tx_status+0x130/0x4a0 [mt76x02_lib]
[  174.560860]  mt76x02_tx_status_data+0x64/0xa8 [mt76x02_lib]
[  174.560872]  mt76u_tx_status_data+0x84/0x120 [mt76_usb]
[  174.560879]  process_one_work+0x178/0x3c8
[  174.560885]  worker_thread+0x208/0x400
[  174.560890]  kthread+0x120/0x220
[  174.560894]  ret_from_fork+0x10/0x20
[  174.560898] ---[ end trace ]---
  • "platform" test still fails with
[   88.484072] rk_gmac-dwmac fe1b0000.ethernet end0: Link is Down
[   88.597026] rockchip-dw-pcie a40c00000.pcie: Failed to receive PME_TO_Ack
[   88.598523] PM: hibernation: hibernation debug: Waiting for 5 second(s).
[   94.667723] rockchip-dw-pcie a40c00000.pcie: Phy link never came up
[   94.668281] rockchip-dw-pcie a40c00000.pcie: fail to resume
[   94.668783] rockchip-dw-pcie a40c00000.pcie: PM: dpm_run_callback(): genpd_restore_noirq returns -110
[   94.669594] rockchip-dw-pcie a40c00000.pcie: PM: failed to restore noirq: error -110
[  120.035426] watchdog: CPU4: Watchdog detected hard LOCKUP on cpu 5
[  120.035978] Modules linked in: xt_CHECKSUM xt_tcpudp nft_chain_nat xt_MASQUERADE nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 nft_compat bridge stp llc nf_tables aes_neon_bs aes_neon_blk ccm snd_seq_dummy snd_hrtimer snd_seq snd_seq_device dwmac_rk binfmt_misc mt76x2_common mt76x02_usb mt76_usb mt76x02_lib mt76 mac80211 libarc4 snd_soc_simple_card rockchip_saradc industrialio_triggered_buffer cfg80211 snd_soc_tlv320aic31xx rk805_pwrkey kfifo_buf reform2_lpc(OE) industrialio rockchip_thermal rfkill rockchip_rng hantro_vpu cdc_acm rockchip_rga v4l2_vp9 snd_soc_rockchip_i2s_tdm rockchip_vdec2 panthor videobuf2_dma_sg v4l2_jpeg drm_gpuvm v4l2_h264 drm_exec snd_soc_audio_graph_card snd_soc_simple_card_utils joydev evdev dm_mod nvme_fabrics efi_pstore configfs nfnetlink ip_tables x_tables autofs4 ext4 crc16 mbcache jbd2 btrfs blake2b_generic xor xor_neon raid6_pq mali_dp snd_soc_meson_axg_toddr snd_soc_meson_axg_fifo snd_soc_meson_codec_glue panfrost drm_shmem_helper gpu_sched ao_cec_g12a meson_vdec(C)
[  120.036066]  videobuf2_dma_contig hid_generic videobuf2_memops v4l2_mem2mem videobuf2_v4l2 videodev videobuf2_common mc dw_hdmi_i2s_audio meson_drm meson_canvas meson_dw_mipi_dsi meson_dw_hdmi usbhid hid mxsfb mux_mmio panel_edp imx_dcss ti_sn65dsi86 nwl_dsi mux_core pwm_imx27 xhci_plat_hcd xhci_hcd onboard_usb_dev snd_soc_hdmi_codec snd_soc_core micrel snd_pcm_dmaengine nvme snd_pcm nvme_core snd_timer snd nvme_keyring nvme_auth soundcore stmmac_platform stmmac pcs_xpcs phylink mdio_devres of_mdio sdhci_of_dwcmshc fixed_phy sdhci_pltfm phy_rockchip_usbdp dw_mmc_rockchip fwnode_mdio ehci_platform typec phy_rockchip_samsung_hdptx phy_rockchip_naneng_combphy rk808_regulator pwm_rockchip dwc3 dw_wdt libphy fan53555 ohci_platform sdhci ehci_hcd ulpi rtc_pcf8523 dw_mmc_pltfm udc_core ohci_hcd dw_mmc cqhci mdio_bus rockchip_dfi rockchipdrm dw_hdmi_qp analogix_dp i2c_rk3x usbcore phy_rockchip_inno_usb2 dw_mipi_dsi dw_mipi_dsi2 usb_common cpufreq_dt drm_dp_aux_bus [last unloaded: mt76x2u]
[  120.036150] Sending NMI from CPU 4 to CPUs 5:
  • The results are similar if I uncomment the unloading of the dwc3 module
set -x
echo platform >  /sys/power/pm_test
echo reboot > /sys/power/disk
sleep 2
rmmod mt76x2u
sleep 2
#rmmod dwc3
#sleep 2
echo disk >  /sys/power/state
sleep 2
#modprobe dwc3
#sleep 2
modprobe mt76x2u
  • Unsurprisingly, if I try an actual resume (instead of a "platform" test), I get the same messages about "Phy link never came up" and the system needs a hard reboot after trying to resume.
  • Barring inspiration, my next move will be to report my lack of success to the appropriate kernel mailing list(s).
previous episode

17 July 2025

Birger Schacht: My first tag2upload upload

Following the DebConf25 talk by Ian Jackson tag2upload - upload simply by pushing a signed git tag I decided to use the quiet time during the day of the DayTrip on DebConf 25 to try out uploading a package using tag2upload. Given the current freeze, a couple of the packages I maintain have new releases waiting. I decided on uploading the new version of labwc to experimental. Labwc is a Wayland compositor based on the wlroots compositor library (the library that sway is using). Labwc is inspired by the Openbox window manager. The upstream team of Labwc released version 0.9.0 last week, the first version that is based on wlroots 0.19.x. Wlroots 0.19.x is also only available in experimental, so that was a good fit for trying an upload with tag2upload. I first used my usual workflow, going into my package repository, doing git fetch origin, checking out the tag of the new release and tagging that with git tag upstream/0.9.0. Then I bumped the version in the debian/experimental branch, adapted the debian/control file for the changed wlroots dependency, committed and built the package using git-buildpackage to check if it builds fine and there are no lintian errors. Then I moved on to look at tag2upload. As a starting point for using tag2upload I read the blogpost by Jonathan Carter My first tag2upload upload. It pointed me to one very important option of git debpush, namely the --baredebian option which I have to use because I use the bare Debian git layout. Given that the last upload of labwc I did was to unstable, I also had to add --force=changed-suite. I also started right away to use the --tag-only option, because for my first tests I only wanted to have local changes and nothing pushed anywhere. I also used the --dry-run option. This led to the following command:
> git debpush --baredebian --force=changed-suite --dry-run --tag-only
tags 0.9.0, upstream/0.9.0 all exist in this repository
tell me which one you want to make an orig.tar from: git deborig --just-print '--version=0.9.0-1' TAG
git-debpush: git-deborig failed; maybe try git-debpush --upstream=TAG
This was a bit confusing, because the error message talked about git-deborig, but I was using git-debpush. I also did not want to make an orig.tar! The --upstream option in the git-debpush(1) manual gave an explanation for this:
When pushing a non-native package, git-debpush needs a tag for the upstream part of your package. By default git-debpush asks git-deborig(1), which searches for a suitable tag based on the upstream version in debian/changelog.
So apparently git-debpush can not find out what the correct tag for the upstream version is, because git-deborig can not find out what the correct tag for the upstream version is. git-debpush simply calls git deborig --just-print --version="$version" in line 437. This fails because I had initially created a second tag, upstream/0.9.0, in addition to the existing 0.9.0 release tag. I do this for git-buildpackage to find the upstream sources, but with multiple tags git-deborig is not sure which one is the tag it should use (although both point to the same commit). So I removed the upstream/0.9.0 tag and ran git debpush again, and now there was no error message (besides the warning regarding the changed suite) but it also did not give any feedback about what was happening. So I tried without the --dry-run option. Again, no output whatsoever, other than the warning about the changed release, BUT my gnupg asked me for permission to sign via my yubikey! And when I looked at the list of tags, I saw that there is now a debian/0.9.0-1 tag that was not there before! Looking at the tag I saw that it was a tag in the format described in the tag2upload(5) manual page, containing the following lines:
labwc release 0.9.0-1 for experimental
[dgit distro=debian split --quilt=baredebian]
[dgit please-upload source=labwc version=0.9.0-1 upstream-tag=0.9.0 upstream=4beee3851f75b53afc2e8699c594c0cc222115bd]
and the tag was signed by me. The 4beee3851f75b53afc2e8699c594c0cc222115bd commit ID is the commit the 0.9.0 tag points to. Now that I had a signed commit tag in the correct format, I went to the labwc packaging repository on salsa and enabled the webhook to trigger the tag2upload service (I just saw that the documentation was updated and there is now a global webhook on salsa, so this step is not needed anymore). Finally I pushed the tags using git push --tags. I could also have used git-debpush for this step, but I'd rather use git directly. I then looked at the tag2upload queue and saw how a worker built and uploaded the newest labwc release, and I also got an email from the tag2upload service: "[tag2upload 275] uploaded labwc 0.9.0-1". And a couple of minutes later I got the confirmation that labwc 0.9.0-1 was accepted into experimental. Great! So, to conclude: for tag2upload to work we simply need a git tag in the correct format. The tag2upload service now gets triggered by every pushed tag on salsa but only acts on tags that adhere to the tag2upload(5) format. git-debpush is a simple bash script that creates such a tag and by default also pushes the tag. I think the script could be a bit more verbose, for example telling me that it created a tag and the name of that tag. I think the dependency on git-deborig is also a problem. I use git-buildpackage to build my packages and by default git-buildpackage assumes upstream tags are of the form upstream/%(version)s (source). I could now change that for all the packages I maintain, but I also think it makes sense to control the tag myself and not use a tag that is controlled by upstream. Upstream could change or delete that tag or I might need to create a tag for a version that is not tagged by upstream. I also think git-debpush is a rather misleading command name, given that the main task of the script is to create a tag in the correct format. Other than that, I'm pretty happy about this service. I have a rather crappy uplink at home and it is not so uncommon for my uploads to fail because the connection dropped during dput. Using a simple git based upload approach makes these problems a thing of the past. I might look into other ways to create the needed tag, though.
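For future reference, the whole flow described above boils down to a handful of commands. This is only a recap of my particular case: the --baredebian and --force=changed-suite options, and the duplicate upstream/0.9.0 tag, are specific to my setup.
# make sure git-deborig sees exactly one tag for the upstream version
git tag -d upstream/0.9.0

# create (but don't push) the signed tag2upload tag, here debian/0.9.0-1
git debpush --baredebian --force=changed-suite --tag-only

# push everything, including the new tag, to salsa; tag2upload takes it from there
git push --tags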

Arnaud Rebillout: Acquire-By-Hash for APT packages repositories, and the lack of it in Kali Linux

This is a lengthy blog post. It features a long introduction that explains how apt update acquires various files from a package repository, what Acquire-By-Hash is, and how it all works for Kali Linux: a Debian-based distro that doesn't support Acquire-By-Hash, and which is distributed via a network of mirrors and a redirector. In a second part, I explore some "Hash Sum Mismatch" errors that we can hit with Kali Linux, errors that would not happen if only Acquire-By-Hash was supported. If anything, this blog post supports the case for adding Acquire-By-Hash support in reprepro, as requested at https://bugs.debian.org/820660. All of this could have just remained some personal notes for myself, but I got carried away and turned it into a blog post, dunno why... Hopefully others will find it interesting, but you really need to like troubleshooting stories, packed with details, and poorly written at that. You've been warned! Introducing Acquire-By-Hash Acquire-By-Hash is a feature of APT package repositories, that might or might not be supported by your favorite Debian-based distribution. A repository that supports it says so, in the Release file, by setting the field Acquire-By-Hash: yes. It's easy to check. Debian and Ubuntu both support it:
$ wget -qO- http://deb.debian.org/debian/dists/sid/Release | grep -i ^Acquire-By-Hash:
Acquire-By-Hash: yes
$ wget -qO- http://archive.ubuntu.com/ubuntu/dists/devel/Release | grep -i ^Acquire-By-Hash:
Acquire-By-Hash: yes
What about other Debian derivatives?
$ wget -qO- http://http.kali.org/kali/dists/kali-rolling/Release | grep -i ^Acquire-By-Hash: || echo not supported
not supported
$ wget -qO- https://archive.raspberrypi.com/debian/dists/trixie/Release | grep -i ^Acquire-By-Hash: || echo not supported
not supported
$ wget -qO- http://packages.linuxmint.com/dists/faye/Release | grep -i ^Acquire-By-Hash: || echo not supported
not supported
$ wget -qO- https://apt.pop-os.org/release/dists/noble/Release | grep -i ^Acquire-By-Hash: || echo not supported
not supported
Huhu, Acquire-By-Hash is not ubiquitous. But wait, what is Acquire-By-Hash to start with? To answer that, we have to take a step back and cover some basics first. The HTTP requests performed by 'apt update' What happens when one runs apt update? APT first requests the Release file from the repository(ies) configured in the APT sources. This file is a starting point, it contains a list of other files (sometimes called "Index files") that are available in the repository, along with their hashes. After fetching the Release file, APT proceeds to request those Index files. To give you an idea, there are many kinds of Index files, the Packages files being the ones that matter for us here. There's an excellent Wiki page that details the structure of a Debian package repository, it's there: https://wiki.debian.org/DebianRepository/Format. Note that APT doesn't necessarily download ALL of those Index files. For simplicity, we'll limit ourselves to the minimal scenario, where apt update downloads only the Packages files. Let's try to make it more visual: here's a representation of an apt update transaction, assuming that all the components of the repository are enabled:
apt update -> Release -> Packages (main/amd64)
                      -> Packages (contrib/amd64)
                      -> Packages (non-free/amd64)
                      -> Packages (non-free-firmware/amd64)
Meaning that, in a first step, APT downloads the Release file, reads its content, and then in a second step it downloads the Index files in parallel. You can actually see that happen with a command such as apt -q -o Debug::Acquire::http=true update 2>&1 | grep ^GET. For Kali Linux you'll see something pretty similar to what I described above. Try it!
$ podman run --rm kali-rolling apt -q -o Debug::Acquire::http=true update 2>&1 | grep ^GET
GET /kali/dists/kali-rolling/InRelease HTTP/1.1    # <- returns a redirect, that is why the file is requested twice
GET /kali/dists/kali-rolling/InRelease HTTP/1.1
GET /kali/dists/kali-rolling/non-free/binary-amd64/Packages.gz HTTP/1.1
GET /kali/dists/kali-rolling/main/binary-amd64/Packages.gz HTTP/1.1
GET /kali/dists/kali-rolling/non-free-firmware/binary-amd64/Packages.gz HTTP/1.1
GET /kali/dists/kali-rolling/contrib/binary-amd64/Packages.gz HTTP/1.1
However, and it's now becoming interesting, for Debian or Ubuntu you won't see the same kind of URLs:
$ podman run --rm debian:sid apt -q -o Debug::Acquire::http=true update 2>&1 | grep ^GET
GET /debian/dists/sid/InRelease HTTP/1.1
GET /debian/dists/sid/main/binary-amd64/by-hash/SHA256/22709f0ce67e5e0a33a6e6e64d96a83805903a3376e042c83d64886bb555a9c3 HTTP/1.1
APT doesn't download a file named Packages, instead it fetches a file named after a hash. Why? This is due to the field Acquire-By-Hash: yes that is present in Debian's Release file. What does Acquire-By-Hash mean for 'apt update' The idea with Acquire-By-Hash is that the Index files are named after their hash on the repository, so if the MD5 sum of main/binary-amd64/Packages is 77b2c1539f816832e2d762adb20a2bb1, then the file will be stored at main/binary-amd64/by-hash/MD5Sum/77b2c1539f816832e2d762adb20a2bb1. The path main/binary-amd64/Packages still exists (it's the "Canonical Location" of this particular Index file), but APT won't use it, instead it downloads the file located in the by-hash/ directory. Why does it matter? This has to do with repository updates, and allowing the package repository to be updated atomically, without interruption of service, and without risk of failure client-side. It's important to understand that the Release file and the Index files are part of a whole, a set of files that go together, given that Index files are validated by their hash (as listed in the Release file) after download by APT. If those files are simply named "Release" and "Packages", it means they are not immutable: when the repository is updated, all of those files are updated "in place". And it causes problems. A typical failure mode for the client, during a repository update, is that: 1) APT requests the Release file, then 2) the repository is updated and finally 3) APT requests the Packages files, but their checksums don't match, causing apt update to fail. There are variations of this error, but you get the idea: updating a set of files "in place" is problematic. The Acquire-By-Hash mechanism was introduced exactly to solve this problem: now the Index files have a unique, immutable name. When the repository is updated, at first new Index files are added in the by-hash/ directory, and only after that is the Release file updated. Old Index files in by-hash/ are retained for a while, so there's a grace period during which both the old and the new Release files are valid and working: the Index files that they refer to are available in the repo. As a result: no interruption of service, no failure client-side during repository updates. This is explained in more detail at https://www.chiark.greenend.org.uk/~cjwatson/blog/no-more-hash-sum-mismatch-errors.html, which is the blog post from Colin Watson that came out at the time Acquire-By-Hash was introduced in... 2016. This is still an excellent read in 2025. So you might be wondering why I'm rambling about a problem that's been solved 10 years ago, but then as I've shown in the introduction, the problem is not solved for everyone. Support for Acquire-By-Hash server side is not a given, and unfortunately it never landed in reprepro, as one can see at https://bugs.debian.org/820660. reprepro is a popular tool for creating APT package repositories. In particular, at Kali Linux we use reprepro, and that's why there's no Acquire-By-Hash: yes in the Kali Release file. As one can guess, it leads to subtle issues during those moments when the repository is updated. However... we're not ready to talk about that yet! There's still another topic that we need to cover: this window of time during which a repository is being updated, and during which apt update might fail. The window for Hash Sum Mismatches, and the APT trick that saves the day Pay attention!
In this section, we're now talking about packages repositories that do NOT support Acquire-By-Hash, such as the Kali Linux repository. As I've said above, it's only when the repository is being updated that there is a "Hash Sum Mismatch Window", ie. a moment when apt update might fail for some unlucky clients, due to invalid Index files. Surely, it's a very very short window of time, right? I mean, it can't take that long to update files on a server, especially when you know that a repository is usually updated via rsync, and rsync goes to great lengths to update files as atomically as it can (with the option --delay-updates). So if apt update fails for me, I've been very unlucky, but I can just retry in a few seconds and it should be fixed, shouldn't it? The answer is: it's not that simple. So far I pictured the "package repository" as a single server, for simplicity. But that's not always what it is. For Kali Linux, by default users have http.kali.org configured in their APT sources, and it is a redirector, ie. a web server that redirects requests to mirrors that are nearby the client. Some context that matters for what comes next: the Kali repository is synced with ~70 mirrors all around the world, 4 times a day. What happens if your apt update requests are redirected to 2 mirrors close-by, and one was just synced, while the other is still syncing (or even worse, failed to sync entirely)? You'll get a mix of old and new Index files. Hash Sum Mismatch! As you can see, with this setup the "Hash Sum Mismatch Window" becomes much longer than a few seconds: as long as nearby mirrors are syncing, the window is open. You could have a fast and a slow mirror next to you, and they can be out of sync with each other for several minutes every time the repository is updated, for example. For Kali Linux in particular, there's a "detail" in our network of mirrors that, as a side-effect, almost guarantees that this window lasts several minutes at least. This is because the pool of mirrors includes kali.download which is in fact the Cloudflare CDN, and from the redirector point of view, it's seen as a "super mirror" that is present in every country. So when APT fires a bunch of requests against http.kali.org, it's likely that some of them will be redirected to the Kali CDN, and others will be redirected to a mirror nearby you. So far so good, but there's another point of detail to be aware of: the Kali CDN is synced first, before the other mirrors. Another thing: usually the mirrors that are the farthest from the Tier-0 mirror are the longest to sync. Packing all of that together: if you live somewhere in Asia, it's not uncommon for your "Hash Sum Mismatch Window" to be as long as 30 minutes, between the moment the Kali CDN is synced, and the moment your nearby mirrors catch up and are finally in sync as well. Having said all of that, and assuming you're still reading (anyone here?), you might be wondering... Does that mean that apt update is broken 4 times a day, for around 30 minutes, for every Kali user out there? How can they bear with that? Answer is: no, of course not, it's not broken like that. It works despite all of that, and this is thanks to yet another detail that we didn't go into yet. This detail lies in APT itself. APT is in fact "redirector aware", in a sense. When it fetches a Release file, and if ever the request is redirected, it then fires the subsequent requests against the server it was initially redirected to.
So you are guaranteed that the Release file and the Index files are retrieved from the same mirror! Which brings back our "Hash Sum Mismatch Window" to the window for a single server, ie. something like a few seconds at worst, hopefully. And that's what makes it work for Kali, literally. Without this trick, everything would fall apart. For reference, this feature was implemented in APT back in... 2016! A busy year it seems! Here's the link to the commit: use the same redirection mirror for all index files. To finish, a dump from the console. You can see this behaviour play out easily, again with APT debugging turned on. Below we can see that only the first request hits the Kali redirector:
$ podman run --rm kali-rolling apt -q -o Debug::Acquire::http=true update 2>&1 | grep -e ^Answer -e ^HTTP
Answer for: http://http.kali.org/kali/dists/kali-rolling/InRelease
HTTP/1.1 302 Found
Answer for: http://mirror.freedif.org/kali/dists/kali-rolling/InRelease
HTTP/1.1 200 OK
Answer for: http://mirror.freedif.org/kali/dists/kali-rolling/non-free-firmware/binary-amd64/Packages.gz
HTTP/1.1 200 OK
Answer for: http://mirror.freedif.org/kali/dists/kali-rolling/contrib/binary-amd64/Packages.gz
HTTP/1.1 200 OK
Answer for: http://mirror.freedif.org/kali/dists/kali-rolling/main/binary-amd64/Packages.gz
HTTP/1.1 200 OK
Answer for: http://mirror.freedif.org/kali/dists/kali-rolling/non-free/binary-amd64/Packages.gz
HTTP/1.1 200 OK
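Before the interlude: if you want to poke at the by-hash layout described earlier for yourself, here's a rough sketch against the Debian archive (Kali won't do, since it lacks by-hash); the awk is just a quick-and-dirty parser of the SHA256 section of the Release file.
$ dist=http://deb.debian.org/debian/dists/sid
$ hash=$(wget -qO- "$dist/Release" \
    | awk '/^SHA256:/ {s=1; next} /^[^ ]/ {s=0}
           s && $3 == "main/binary-amd64/Packages.gz" {print $1; exit}')
$ wget -qO /dev/null "$dist/main/binary-amd64/by-hash/SHA256/$hash" && echo by-hash fetch OK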
Interlude Believe it or not, we're done with the introduction! At this point, we have a good understanding of what apt update does (in terms of HTTP requests), we know that Release files and Index files are part of a whole, and we know that a repository can be updated atomically thanks to the Acquire-By-Hash feature, so that users don't experience interruption of service or failures of any sort, even with a rolling repository that is updated several times a day, like Debian sid. We've also learnt that, despite the fact that Acquire-By-Hash landed almost 10 years ago, some distributions like Kali Linux are still doing without it... and yet it works! But the reason why it works is more complicated to grasp, especially when you add a network of mirrors and a redirector to the picture. Moreover, it doesn't work as flawlessly as with the Acquire-By-Hash feature: we still expect some short (seconds at worst) "Hash Sum Mismatch Windows" for those unlucky users that run apt update at the wrong moment. This was a long intro, but that really sets the stage for what comes next: the edge cases. Some situations in which we can hit some Hash Sum Mismatch errors with Kali. Error cases that I've collected and investigated over time... If anything, it supports the case that Acquire-By-Hash is really something that should be implemented in reprepro. More on that in the conclusion, but for now, let's look at those edge cases. Edge Case 1: the caching proxy If you put a caching proxy (such as approx, my APT caching proxy of choice) between yourself and the actual packages repository, then obviously it's the caching proxy that performs the HTTP requests, and therefore APT will never know about the redirections returned by the server, if any. So the APT trick of downloading all the Index files from the same server in case of redirect doesn't work anymore. It was rather easy to confirm that by building a Kali package during a mirror sync, and watching it fail at the "Update chroot" step:
$ sudo rm /var/cache/approx/kali/dists/ -fr
$ gbp buildpackage --git-builder=sbuild
+------------------------------------------------------------------------------+
  Update chroot                                Wed, 11 Jun 2025 10:33:32 +0000  
+------------------------------------------------------------------------------+
Get:1 http://http.kali.org/kali kali-dev InRelease [41.4 kB]
Get:2 http://http.kali.org/kali kali-dev/contrib Sources [81.6 kB]
Get:3 http://http.kali.org/kali kali-dev/main Sources [17.3 MB]
Get:4 http://http.kali.org/kali kali-dev/non-free Sources [122 kB]
Get:5 http://http.kali.org/kali kali-dev/non-free-firmware Sources [8297 B]
Get:6 http://http.kali.org/kali kali-dev/non-free amd64 Packages [197 kB]
Get:7 http://http.kali.org/kali kali-dev/non-free-firmware amd64 Packages [10.6 kB]
Get:8 http://http.kali.org/kali kali-dev/contrib amd64 Packages [120 kB]
Get:9 http://http.kali.org/kali kali-dev/main amd64 Packages [21.0 MB]
Err:9 http://http.kali.org/kali kali-dev/main amd64 Packages
  File has unexpected size (20984689 != 20984861). Mirror sync in progress? [IP: ::1 9999]
  Hashes of expected file:
   - Filesize:20984861 [weak]
   - SHA256:6cbbee5838849ffb24a800bdcd1477e2f4adf5838a844f3838b8b66b7493879e
   - SHA1:a5c7e557a506013bd0cf938ab575fc084ed57dba [weak]
   - MD5Sum:1433ce57419414ffb348fca14ca1b00f [weak]
  Release file created at: Wed, 11 Jun 2025 07:15:10 +0000
Fetched 17.9 MB in 9s (1893 kB/s)
Reading package lists...
E: Failed to fetch http://http.kali.org/kali/dists/kali-dev/main/binary-amd64/Packages.gz  File has unexpected size (20984689 != 20984861). Mirror sync in progress? [IP: ::1 9999]
   Hashes of expected file:
    - Filesize:20984861 [weak]
    - SHA256:6cbbee5838849ffb24a800bdcd1477e2f4adf5838a844f3838b8b66b7493879e
    - SHA1:a5c7e557a506013bd0cf938ab575fc084ed57dba [weak]
    - MD5Sum:1433ce57419414ffb348fca14ca1b00f [weak]
   Release file created at: Wed, 11 Jun 2025 07:15:10 +0000
E: Some index files failed to download. They have been ignored, or old ones used instead.
E: apt-get update failed
The obvious workaround is to NOT use the redirector in the approx configuration. Either use a mirror close by, or the Kali CDN:
$ grep kali /etc/approx/approx.conf 
#kali http://http.kali.org/kali <- do not use the redirector!
kali  http://kali.download/kali
Edge Case 2: debootstrap struggles What if one tries to debootstrap Kali while mirrors are being synced? It can give you some ugly logs, but it might not be fatal:
$ sudo debootstrap kali-dev kali-dev http://http.kali.org/kali
[...]
I: Target architecture can be executed
I: Retrieving InRelease 
I: Checking Release signature
I: Valid Release signature (key id 827C8569F2518CC677FECA1AED65462EC8D5E4C5)
I: Retrieving Packages 
I: Validating Packages 
W: Retrying failed download of http://http.kali.org/kali/dists/kali-dev/main/binary-amd64/Packages.gz
I: Retrieving Packages 
I: Validating Packages 
W: Retrying failed download of http://http.kali.org/kali/dists/kali-dev/main/binary-amd64/Packages.gz
I: Retrieving Packages 
I: Validating Packages 
W: Retrying failed download of http://http.kali.org/kali/dists/kali-dev/main/binary-amd64/Packages.gz
I: Retrieving Packages 
I: Validating Packages 
W: Retrying failed download of http://http.kali.org/kali/dists/kali-dev/main/binary-amd64/Packages.gz
I: Retrieving Packages 
I: Validating Packages 
I: Resolving dependencies of required packages...
I: Resolving dependencies of base packages...
I: Checking component main on http://http.kali.org/kali...
I: Retrieving adduser 3.152
[...]
To understand this one, we have to go and look at the debootstrap source code. How does debootstrap fetch the Release file and the Index files? It uses wget, and it retries up to 10 times in case of failure. It's not as sophisticated as APT: it doesn't detect when the Release file is served via a redirect. As a consequence, what happens above can be explained as such:
  1. debootstrap requests the Release file, gets redirected to a mirror, and retrieves it from there
  2. then it requests the Packages file, gets redirected to another mirror that is not in sync with the first one, and retrieves it from there
  3. validation fails, since the checksum is not as expected
  4. try again and again
Since debootstrap retries up to 10 times, at some point it's lucky enough to get redirected to the same mirror as the one it got its Release file from, and this time it gets the right Packages file, with the expected checksum. So ultimately it succeeds. Edge Case 3: post-debootstrap failure I like this one, because it gets us to yet another detail that we didn't talk about yet. So, what happens after we have successfully debootstrapped Kali? We have only the main component enabled, and only the Index file for this component has been retrieved. It looks like that:
$ sudo debootstrap kali-dev kali-dev http://http.kali.org/kali
[...]
I: Base system installed successfully.
$ cat kali-dev/etc/apt/sources.list
deb http://http.kali.org/kali kali-dev main
$ ls -l kali-dev/var/lib/apt/lists/
total 80468
-rw-r--r-- 1 root root    41445 Jun 19 07:02 http.kali.org_kali_dists_kali-dev_InRelease
-rw-r--r-- 1 root root 82299122 Jun 19 07:01 http.kali.org_kali_dists_kali-dev_main_binary-amd64_Packages
-rw-r--r-- 1 root root    40562 Jun 19 11:54 http.kali.org_kali_dists_kali-dev_Release
-rw-r--r-- 1 root root      833 Jun 19 11:54 http.kali.org_kali_dists_kali-dev_Release.gpg
drwxr-xr-x 2 root root     4096 Jun 19 11:54 partial
So far so good. Next step would be to complete the sources.list with other components, then run apt update: APT will download the missing Index files. But if you're unlucky, that might fail:
$ sudo sed -i 's/main$/main contrib non-free non-free-firmware/' kali-dev/etc/apt/sources.list
$ cat kali-dev/etc/apt/sources.list
deb http://http.kali.org/kali kali-dev main contrib non-free non-free-firmware
$ sudo chroot kali-dev apt update
Hit:1 http://http.kali.org/kali kali-dev InRelease
Get:2 http://kali.download/kali kali-dev/contrib amd64 Packages [121 kB]
Get:4 http://mirror.sg.gs/kali kali-dev/non-free-firmware amd64 Packages [10.6 kB]
Get:3 http://mirror.freedif.org/kali kali-dev/non-free amd64 Packages [198 kB]
Err:3 http://mirror.freedif.org/kali kali-dev/non-free amd64 Packages
  File has unexpected size (10442 != 10584). Mirror sync in progress? [IP: 66.96.199.63 80]
  Hashes of expected file:
   - Filesize:10584 [weak]
   - SHA256:71a83d895f3488d8ebf63ccd3216923a7196f06f088461f8770cee3645376abb
   - SHA1:c4ff126b151f5150d6a8464bc6ed3c768627a197 [weak]
   - MD5Sum:a49f46a85febb275346c51ba0aa8c110 [weak]
  Release file created at: Fri, 23 May 2025 06:48:41 +0000
Fetched 336 kB in 4s (77.5 kB/s)  
Reading package lists... Done
E: Failed to fetch http://mirror.freedif.org/kali/dists/kali-dev/non-free/binary-amd64/Packages.gz  File has unexpected size (10442 != 10584). Mirror sync in progress? [IP: 66.96.199.63 80]
   Hashes of expected file:
    - Filesize:10584 [weak]
    - SHA256:71a83d895f3488d8ebf63ccd3216923a7196f06f088461f8770cee3645376abb
    - SHA1:c4ff126b151f5150d6a8464bc6ed3c768627a197 [weak]
    - MD5Sum:a49f46a85febb275346c51ba0aa8c110 [weak]
   Release file created at: Fri, 23 May 2025 06:48:41 +0000
E: Some index files failed to download. They have been ignored, or old ones used instead.
What happened here? Again, we need APT debugging options to have a hint:
$ sudo chroot kali-dev apt -q -o Debug::Acquire::http=true update 2>&1 | grep -e ^Answer -e ^HTTP
Answer for: http://http.kali.org/kali/dists/kali-dev/InRelease
HTTP/1.1 304 Not Modified
Answer for: http://http.kali.org/kali/dists/kali-dev/contrib/binary-amd64/Packages.gz
HTTP/1.1 302 Found
Answer for: http://http.kali.org/kali/dists/kali-dev/non-free/binary-amd64/Packages.gz
HTTP/1.1 302 Found
Answer for: http://http.kali.org/kali/dists/kali-dev/non-free-firmware/binary-amd64/Packages.gz
HTTP/1.1 302 Found
Answer for: http://kali.download/kali/dists/kali-dev/contrib/binary-amd64/Packages.gz
HTTP/1.1 200 OK
Answer for: http://mirror.sg.gs/kali/dists/kali-dev/non-free-firmware/binary-amd64/Packages.gz
HTTP/1.1 200 OK
Answer for: http://mirror.freedif.org/kali/dists/kali-dev/non-free/binary-amd64/Packages.gz
HTTP/1.1 200 OK
As we can see above, for the Release file we get a 304 (aka. "Not Modified") from the redirector. Why is that? This is due to If-Modified-Since, also known as RFC-7232. APT supports this feature when it retrieves the Release file, it basically says to the server "Give me the Release file, but only if it's newer than what I already have". If the file on the server is not newer than that, it answers with a 304, which basically says to the client "You have the latest version already". So APT doesn't get a new Release file, it uses the Release file that is already present locally in /var/lib/apt/lists/, and then it proceeds to download the missing Index files. And as we can see above: it then hits the redirector for each request, and might be redirected to different mirrors for each Index file. So the important bit here is: the APT "trick" of downloading all the Index files from the same mirror only works if the Release file is served via a redirect. If it's not, like in this case, then APT hits the redirector for each file it needs to download, and it's subject to the "Hash Sum Mismatch" error again. In practice, for the casual user running apt update every now and then, it's not an issue. If they have the latest Release file, no extra requests are done, because they also have the latest Index files, from a previous apt update transaction. So APT doesn't re-download those Index files. The only reason why they'd have the latest Release file, and would miss some Index files, would be that they added new components to their APT sources, like we just did above. Not so common, and then they'd need to run apt update at an unlucky moment. I don't think many users are affected in practice. Note that this issue is rather new for Kali Linux. The redirector running on http.kali.org is mirrorbits, and support for If-Modified-Since just landed in the latest release, version 0.6. This feature was added by none other than me, a great example of the expression "shooting oneself in the foot". An obvious workaround here is to empty /var/lib/apt/lists/ in the chroot after debootstrap has completed. Or we could disable support for If-Modified-Since entirely for Kali's instance of mirrorbits. Summary and Conclusion The Hash Sum Mismatch failures above are caused by a combination of things: the Kali repository doesn't support Acquire-By-Hash, and it is served by a network of mirrors behind a redirector, so in some situations (a caching proxy, debootstrap, an If-Modified-Since 304) the Release file and the Index files end up being fetched from different mirrors. At the same time: those windows are short, the edge cases above are not the common path, and workarounds exist, so in practice most users never notice. All in all, it seems that all those issues would go away if only Acquire-By-Hash was supported in the Kali packages repository. Now is not a bad moment to try to land this feature in reprepro. After development halted in 2019, there's now a new upstream, and patches are being merged again. But it won't be easy: reprepro is a C codebase of around 50k lines of code, and it will take time and effort for the newcomer to get acquainted with the codebase, to the point of being able to implement a significant feature like this one. As an alternative, aptly is another popular tool to manage APT package repositories. And it seems to support Acquire-By-Hash already. Another alternative: I was told that debusine has (experimental) support for package repositories, and that Acquire-By-Hash is supported as well. Options are on the table, and I hope that Kali will eventually get support for Acquire-By-Hash, one way or another. To finish, due credits: this blog post exists thanks to my employer OffSec. Thanks for reading!

15 July 2025

Dirk Eddelbuettel: anytime 0.3.12 on CRAN: Minor Bugfix and Maintenance

A maintenance release 0.3.12 of the anytime package arrived on CRAN today. The package is fairly feature-complete, and code and functionality remain mature and stable. anytime is a very focused package aiming to do just one thing really well: to convert anything in integer, numeric, character, factor, or ordered input format to either POSIXct (when called as anytime) or Date objects (when called as anydate) and to do so without requiring a format string as well as accommodating different formats in one input vector. See the anytime page, or the GitHub repo for a few examples, and the beautiful documentation site for all documentation. This release covers a corner case reported in a GitHub issue: the (nonsensical but possible) input of zero-length (floating point or integer) vectors was not dealt with properly, which led to an error. We now return the requested type (POSIXct or Date, depending on the call) also with length zero. Two minor maintenance tasks were also addressed since the last release six months ago. The short list of changes follows.

Changes in anytime version 0.3.12 (2025-07-14)
  • Continuous integration now uses r-ci action with embedded bootstrap
  • The versioned depends on Rcpp now requires 1.0.8 or newer to support use of the updated header file structure
  • The corner-case of an empty (numeric or integer) vector argument is now addressed, new tests have been added (#135)

Courtesy of my CRANberries, there is also a diffstat report of changes relative to the previous release. The issue tracker off the GitHub repo can be used for questions and comments. More information about the package is at the package page, the GitHub repo and the documentation site.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

11 July 2025

Jamie McClelland: Avoiding Apache Max Request Workers Errors

Wow, I hate this error:
AH00484: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting
For starters, it means I have to relearn how MaxRequestWorkers functions in Apache:
For threaded and hybrid servers (e.g. event or worker), MaxRequestWorkers restricts the total number of threads that will be available to serve clients. For hybrid MPMs, the default value is 16 (ServerLimit) multiplied by the value of 25 (ThreadsPerChild). Therefore, to increase MaxRequestWorkers to a value that requires more than 16 processes, you must also raise ServerLimit.
Ok remind me what ServerLimit refers to?
For the prefork MPM, this directive sets the maximum configured value for MaxRequestWorkers for the lifetime of the Apache httpd process. For the worker and event MPMs, this directive in combination with ThreadLimit sets the maximum configured value for MaxRequestWorkers for the lifetime of the Apache httpd process. For the event MPM, this directive also defines how many old server processes may keep running and finish processing open connections. Any attempts to change this directive during a restart will be ignored, but MaxRequestWorkers can be modified during a restart. Special care must be taken when using this directive. If ServerLimit is set to a value much higher than necessary, extra, unused shared memory will be allocated. If both ServerLimit and MaxRequestWorkers are set to values higher than the system can handle, Apache httpd may not start or the system may become unstable. With the prefork MPM, use this directive only if you need to set MaxRequestWorkers higher than 256 (default). Do not set the value of this directive any higher than what you might want to set MaxRequestWorkers to. With worker, use this directive only if your MaxRequestWorkers and ThreadsPerChild settings require more than 16 server processes (default). Do not set the value of this directive any higher than the number of server processes required by what you may want for MaxRequestWorkers and ThreadsPerChild. With event, increase this directive if the process number defined by your MaxRequestWorkers and ThreadsPerChild settings, plus the number of gracefully shutting down processes, is more than 16 server processes (default).
Got it? In other words, you can consider raising the MaxRequestWorkers setting all you want, but you can't just change that setting, you have to read about several other complicated settings, do some math, and spend a lot of time wondering if you are going to remember what you just did and how to undo it if you blow up your server. On the plus side, typically, nobody should increase this limit - because if the server runs out of connections, it usually means something else is wrong. In our case, on a shared web server running Apache2 and PHP-FPM, it's usually because a single web site has gone out of control. But wait! How can that happen, when we are using PHP-FPM's max_children setting to prevent a single PHP web site from taking down the server? After years of struggling with this problem I have finally made some headway. Our PHP pool configuration typically looks like this:
user = site342999writer
group = site342999writer
listen = /run/php/8.1-site342999.sock
listen.owner = www-data
listen.group = www-data
pm = ondemand
pm.max_children = 12
pm.max_requests = 500
php_admin_value[memory_limit] = 256M
And we invoke PHP-FPM via this apache snippet:
<FilesMatch \.php$>
        SetHandler "proxy:unix:/var/run/php/8.1-site342999.sock fcgi://localhost"
</FilesMatch>
With these settings in place, what happens when we use up all 12 max_children? According to the docs:
By default, mod_proxy will allow and retain the maximum number of connections that could be used simultaneously by that web server child process. Use the max parameter to reduce the number from the default. The pool of connections is maintained per web server child process, and max and other settings are not coordinated among all child processes, except when only one child process is allowed by configuration or MPM design.
The max parameter seems to default to ThreadsPerChild, so it seems that the default here is to allow any web site to consume ThreadsPerChild (25) x ServerLimit (16), which is also the max number of overall connections. Not great. To make matters worse, there is another setting available which is mysteriously called acquire:
If set, this will be the maximum time to wait for a free connection in the connection pool, in milliseconds. If there are no free connections in the pool, the Apache httpd will return SERVER_BUSY status to the client.
By default this is not set, which seems to suggest Apache will just hang on to connections forever until a free PHP process becomes available (or some other timeout happens). So, let's try something different:
<Proxy "fcgi://localhost">
    ProxySet acquire=1 max=12
</Proxy>
This snippet is how you configure the proxy we set up in the SetHandler statement above; it's documented on the Apache mod_proxy page. Now we limit the maximum pool size per process to half of what is available for the entire server, and we tell Apache to immediately throw a 503 error if we have exceeded our maximum number of connections. Now, if a site is overwhelmed with traffic, instead of maxing out the available Apache connections while leaving users with constantly spinning browsers, the users will get 503'ed and the server will be able to serve other sites.
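
For reference, a minimal sketch of how the two pieces above can sit together in a single vhost; the site name is hypothetical and the values simply mirror the example pool above, so treat it as an illustration rather than a drop-in configuration:

<VirtualHost *:80>
    ServerName site342999.example.org          # hypothetical site name

    # hand .php requests to this site's PHP-FPM socket
    <FilesMatch \.php$>
        SetHandler "proxy:unix:/run/php/8.1-site342999.sock|fcgi://localhost"
    </FilesMatch>

    # cap this site's share of Apache workers: at most 12 proxied
    # connections (matching pm.max_children), and give up after 1 ms
    # with a 503 instead of queueing behind a saturated pool
    <Proxy "fcgi://localhost">
        ProxySet acquire=1 max=12
    </Proxy>
</VirtualHost>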

10 July 2025

David Bremner: Hibernate on the pocket reform 4/n

Context

Log from (failed) platform test

After some fun I got the serial console working and re-ran the platform test. After a bit of reading the serial console, I realized that rmmod dwc3 was causing more problems than it solved, in particular a reliable hard lockup on one of the CPUs. My revised test script is
set -x
# select the "platform" test level: the kernel runs through the whole
# hibernation sequence, waits a few seconds instead of powering off,
# then resumes (see the "Waiting for 5 seconds" line in the log below)
echo platform >  /sys/power/pm_test
# choose what would happen after the image is written: reboot, not poweroff
echo reboot > /sys/power/disk
sleep 2
# unload the USB wifi driver before the test
rmmod mt76x2u
sleep 2
# start hibernation (a test run, because of pm_test above)
echo disk >  /sys/power/state
sleep 2
# reload the wifi driver after resume
modprobe mt76x2u
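For context (not part of the original script), the test levels and hibernation actions the kernel offers can be inspected via sysfs; this is a minimal sketch assuming the kernel was built with CONFIG_PM_DEBUG, and the example outputs are indicative only:
# list the available pm_test levels; the active one is shown in brackets
cat /sys/power/pm_test
# e.g.: [none] core processors platform devices freezer
# list the action taken after the hibernation image is created
cat /sys/power/disk
# e.g.: [platform] shutdown reboot suspend test_resume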
The current problem seems to be pcie not resuming properly.
[   65.306842] usbcore: deregistering interface driver mt76x2u
[   65.343606] wlx000a5205eb2d: deauthenticating from 20:05:b7:00:2d:89 by local choice (Reason: 3=DEAUTH_LEAVING)
[   67.995239] PM: hibernation: hibernation entry
[   68.048103] Filesystems sync: 0.022 seconds
[   68.049005] Freezing user space processes
[   68.051075] Freezing user space processes completed (elapsed 0.001 seconds)
[   68.051760] OOM killer disabled.
[   68.052597] PM: hibernation: Basic memory bitmaps created
[   68.053108] PM: hibernation: Preallocating image memory
[   69.719040] PM: hibernation: Allocated 366708 pages for snapshot
[   69.719650] PM: hibernation: Allocated 1466832 kbytes in 1.66 seconds (883.63 MB/s)
[   69.720370] Freezing remaining freezable tasks
[   69.723558] Freezing remaining freezable tasks completed (elapsed 0.002 seconds)
[   69.728002] rk_gmac-dwmac fe1b0000.ethernet end0: Link is Down
[   69.992324] rockchip-dw-pcie a40c00000.pcie: Failed to receive PME_TO_Ack
[   69.993405] PM: hibernation: debug: Waiting for 5 seconds.
[   76.059484] rockchip-dw-pcie a40c00000.pcie: Phy link never came up
[   76.060043] rockchip-dw-pcie a40c00000.pcie: fail to resume
[   76.060546] rockchip-dw-pcie a40c00000.pcie: PM: dpm_run_callback(): genpd_restore_noirq returns -110
[   76.061363] rockchip-dw-pcie a40c00000.pcie: PM: failed to restore noirq: error -110
previous episode next episode

4 July 2025

Sahil Dhiman: Secondary Authoritative Name Server Options for Self-Hosted Domains

In the past few months, I have moved the authoritative name servers (NS) of two of my domains (sahilister.net and sahil.rocks) in-house using PowerDNS. Subdomains of sahilister.net see roughly 320,000 hits/day across my IN and DE mirror nodes, so adding secondary name servers with good availability (in addition to my own servers) was one of my first priorities. I explored the following options for my secondary NS, which also didn't cost me anything:

1984 Hosting

Hurricane Electric

Afraid.org

Puck

NS-Global

Asking friends

Two of my friends and fellow mirror hosts have their own authoritative name server setups: Shrirang (i.e. albony) and Luke. Shrirang gave me another POP in IN, and through Luke (who has an insane amount of in-house NS, see dig ns jing.rocks +short), I added a JP POP. If we know each other, I would be glad to host a secondary NS for you (in IN and/or DE locations).

Some notes
  • Adding a third-party secondary means putting trust in the third party to serve your zone correctly.
  • Hurricane Electric and 1984 Hosting provide multiple NS. One can use some or all of them. Ideally, you can get away with just your own NS plus the full set from either of these two. Play around with adding and removing secondaries to see what gives you the best results. Using everyone is overkill anyway, unless you have specific reasons for it.
  • Moving NS in-house isn't that hard. Though, be prepared to get it wrong a few times (and some more). I have already faced partial outages because:
    • Recursive resolvers (RR) in the wild behave in weird ways and cache the wrong NS response for longer than the TTL.
    • NS expiry took more time than expected. 2 out of 3 of Netim's NS (my domain registrar) had stopped serving my domain, while RRs in the wild hadn't picked up my new in-house NS. I couldn't really do anything about it, though.
    • The trailing dot is pretty important: without it, a name in a zone file is treated as relative and gets the zone origin appended.
    • With HE.net, I forgot to add my domain on their panel and just added their NS to my NS set, thinking I'd already done so (which I had, but for another domain), leading to a lame server situation.
  • In terms of serving traffic, there's no distinction between primary and secondary NS. RRs don't really care which server they send the query to, so one can have a hidden primary too (see the sketch after these notes).
  • I initially thought of adding periodic RIPE Atlas measurements from the global set, but decided against it, as I already host a termux mirror, which brings in thousands of queries from around the world, leading to a diverse set of RRs querying my domain already.
  • In most cases, query resolution time will increase with out-of-zone NS servers (which external secondaries most likely are): 1 query vs. 2 queries. Pay close attention to the ADDITIONAL SECTION in Shrirang's case, followed by mine:
$ dig ns albony.in
; <<>> DiG 9.18.36 <<>> ns albony.in
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 60525
;; flags: qr rd ra; QUERY: 1, ANSWER: 4, AUTHORITY: 0, ADDITIONAL: 9
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;albony.in.			IN	NS
;; ANSWER SECTION:
albony.in.		1049	IN	NS	ns3.albony.in.
albony.in.		1049	IN	NS	ns4.albony.in.
albony.in.		1049	IN	NS	ns2.albony.in.
albony.in.		1049	IN	NS	ns1.albony.in.
;; ADDITIONAL SECTION:
ns3.albony.in.		1049	IN	AAAA	2a14:3f87:f002:7::a
ns1.albony.in.		1049	IN	A	82.180.145.196
ns2.albony.in.		1049	IN	AAAA	2403:44c0:1:4::2
ns4.albony.in.		1049	IN	A	45.64.190.62
ns2.albony.in.		1049	IN	A	103.77.111.150
ns1.albony.in.		1049	IN	AAAA	2400:d321:2191:8363::1
ns3.albony.in.		1049	IN	A	45.90.187.14
ns4.albony.in.		1049	IN	AAAA	2402:c4c0:1:10::2
;; Query time: 29 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Fri Jul 04 07:57:01 IST 2025
;; MSG SIZE  rcvd: 286
vs mine
$ dig ns sahil.rocks
; <<>> DiG 9.18.36 <<>> ns sahil.rocks
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64497
;; flags: qr rd ra; QUERY: 1, ANSWER: 11, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;sahil.rocks.			IN	NS
;; ANSWER SECTION:
sahil.rocks.		6385	IN	NS	ns5.he.net.
sahil.rocks.		6385	IN	NS	puck.nether.net.
sahil.rocks.		6385	IN	NS	colin.sahilister.net.
sahil.rocks.		6385	IN	NS	marvin.sahilister.net.
sahil.rocks.		6385	IN	NS	ns2.afraid.org.
sahil.rocks.		6385	IN	NS	ns4.he.net.
sahil.rocks.		6385	IN	NS	ns2.albony.in.
sahil.rocks.		6385	IN	NS	ns3.jing.rocks.
sahil.rocks.		6385	IN	NS	ns0.1984.is.
sahil.rocks.		6385	IN	NS	ns1.1984.is.
sahil.rocks.		6385	IN	NS	ns-global.kjsl.com.
;; Query time: 24 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Fri Jul 04 07:57:20 IST 2025
;; MSG SIZE  rcvd: 313
  • Theoretically speaking, a small increase/decrease in resolution time would occur based on the chosen TLD and its popularity in the query originator's area (already cached vs. fresh recursion).
  • One can get away with having only 3 NS (or be like Google and have 4 anycast NS or like Amazon and have 8 or like Verisign and make it 13 :P).
  • Nowhere is it written that your NS needs to be called dns* or ns1, ns2, etc. Get creative with naming your NS; be deceptive with the naming :D.
  • A good understanding of RR behavior can help engineer a good authoritative NS system.
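
To make the hidden-primary note above concrete, here is a minimal sketch of what such a split can look like, assuming a PowerDNS authoritative server as the unpublished primary and a BIND 9 host as one published secondary; every name and address below is made up for illustration:

# pdns.conf on the hidden primary (never listed in the zone's NS records)
primary=yes                                  # "master=yes" on older PowerDNS releases
allow-axfr-ips=192.0.2.10,198.51.100.20      # only the published secondaries may AXFR
also-notify=192.0.2.10,198.51.100.20         # push NOTIFYs to them on zone changes

# named.conf on a published secondary (BIND 9.16+ syntax)
zone "example.net" {
    type secondary;                 # "slave" in older BIND versions
    primaries { 203.0.113.5; };     # the hidden primary's address
    file "secondary/example.net.db";
};

The zone's published NS records then point only at the secondaries, while the primary stays reachable solely for AXFR and NOTIFY traffic.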


30 June 2025

Colin Watson: Free software activity in June 2025

My Debian contributions this month were all sponsored by Freexian. This was a very light month; I did a few things that were easy or that seemed urgent for the upcoming trixie release, but otherwise most of my energy went into Debusine. I'll be giving a talk about that at DebConf in a couple of weeks; this is the first DebConf I'll have managed to make it to in over a decade, so I'm pretty excited. You can also support my work directly via Liberapay or GitHub Sponsors.

PuTTY

After reading a bunch of recent discourse about X11 and Wayland, I decided to try switching my laptop (a Framework 13 AMD running Debian trixie with GNOME) over to Wayland. I don't remember why it was running X; I think I must have either inherited some configuration from my previous laptop (in which case it could have been due to anything up to ten years ago or so), or else I had some initial problem while setting up my new laptop and failed to make a note of it. Anyway, the switch was hardly noticeable, which was great. One problem I did notice is that my preferred terminal emulator, pterm, crashed after the upgrade. I run a slightly-modified version from git to make some small terminal emulation changes that I really must either get upstream or work out how to live without one of these days, so it took me a while to notice that it only crashed when running from the packaged version, because the crash was in code that only runs when pterm has a set-id bit. I reported this upstream, they quickly fixed it, and I backported it to the Debian package.

groff

Upstream bug #67169 reported URLs being dropped from PDF output in some cases. I investigated the history both upstream and in Debian, identified the correct upstream patch to backport, and uploaded a fix.

libfido2

I upgraded libfido2 to 1.16.0 in experimental.

Python team

I upgraded pydantic-extra-types to a new upstream version, and fixed some resulting fallout in pendulum. I updated python-typing-extensions in bookworm-backports, to help fix python3-tango: python3-pytango from bookworm-backports does not work (10.0.2-1~bpo12+1). I upgraded twisted to a new upstream version in experimental. I fixed or helped to fix a few release-critical bugs:

Russell Coker: Links June 2025

Jonathan McDowell wrote part 2 of his blog series about setting up a voice assistant on Debian, I look forward to reading further posts [1]. I'm working on some related things for Debian that will hopefully work with this. I'm testing out OpenSnitch on Trixie inspired by this blog post, it's an interesting package [2]. Valerie wrote an informative article about creating mesh networks using LORA for emergency use [3]. Interesting article about Signal and Windows Recall. That gives us some things to consider regarding ML features on Linux systems [4]. Insightful article about AI and the end of prestige [5]. We should all learn about LLMs. Jonathan Dowland wrote an informative blog post about how to manage namespaces on Linux [6]. The Consumer Rights wiki is a great resource for raising awareness of corporations exploiting their customers for computer related goods and services [7]. Interesting article about Schizophrenia and the cliff-edge function of evolution [8].
