Search Results: "ks"

16 June 2025

Paul Tagliamonte: The Promised LAN

The Internet has changed a lot in the last 40+ years. Fads have come and gone. Network protocols have been designed, deployed, adopted, and abandoned. Industries have come and gone. The types of people on the internet have changed a lot. The number of people on the internet has changed a lot, creating an information medium unlike anything ever seen before in human history. There s a lot of good things about the Internet as of 2025, but there s also an inescapable hole in what it used to be, for me. I miss being able to throw a site up to send around to friends to play with without worrying about hordes of AI-feeding HTML combine harvesters DoS-ing my website, costing me thousands in network transfer for the privilege. I miss being able to put a lightly authenticated game server up and not worry too much at night wondering if that process is now mining bitcoin. I miss being able to run a server in my home closet. Decades of cat and mouse games have rendered running a mail server nearly impossible. Those who are brave enough to try are met with weekslong stretches of delivery failures and countless hours yelling ineffectually into a pipe that leads from the cheerful lobby of some disinterested corporation directly into a void somewhere 4 layers below ground level. I miss the spirit of curiosity, exploration, and trying new things. I miss building things for fun without having to worry about being too successful, after which security offices start demanding my supplier paperwork in triplicate as heartfelt thanks from their engineering teams. I miss communities that are run because it is important to them, not for ad revenue. I miss community operated spaces and having more than four websites that are all full of nothing except screenshots of each other. Every other page I find myself on now has an AI generated click-bait title, shared for rage-clicks all brought-to-you-by-our-sponsors completely covered wall-to-wall with popup modals, telling me how much they respect my privacy, with the real content hidden at the bottom bracketed by deceptive ads served by companies that definitely know which new coffee shop I went to last month. This is wrong, and those who have seen what was know it. I can t keep doing it. I m not doing it any more. I reject the notion that this is as it needs to be. It is wrong. The hole left in what the Internet used to be must be filled. I will fill it.

What comes before part b? Throughout the 2000s, some of my favorite memories were from LAN parties at my friends places. Dragging your setup somewhere, long nights playing games, goofing off, even building software all night to get something working being able to do something fiercely technical in the context of a uniquely social activity. It wasn t really much about the games or the projects it was an excuse to spend time together, just hanging out. A huge reason I learned so much in college was that campus was a non-stop LAN party we could freely stand up servers, talk between dorms on the LAN, and hit my dorm room computer from the lab. Things could go from individual to social in the matter of seconds. The Internet used to work this way my dorm had public IPs handed out by DHCP, and my workstation could serve traffic from anywhere on the internet. I haven t been back to campus in a few years, but I d be surprised if this were still the case. In December of 2021, three of us got together and connected our houses together in what we now call The Promised LAN. The idea is simple fill the hole we feel is gone from our lives. Build our own always-on 24/7 nonstop LAN party. Build a space that is intrinsically social, even though we re doing technical things. We can freely host insecure game servers or one-off side projects without worrying about what someone will do with it. Over the years, it s evolved very slowly we haven t pulled any all-nighters. Our mantra has become old growth , building each layer carefully. As of May 2025, the LAN is now 19 friends running around 25 network segments. Those 25 networks are connected to 3 backbone nodes, exchanging routes and IP traffic for the LAN. We refer to the set of backbone operators as The Bureau of LAN Management . Combined decades of operating critical infrastructure has driven The Bureau to make a set of well-understood, boring, predictable, interoperable and easily debuggable decisions to make this all happen. Nothing here is exotic or even technically interesting.

Applications of trusting trust The hardest part, however, is rejecting the idea that anything outside our own LAN is untrustworthy nearly irreversible damage inflicted on us by the Internet. We have solved this by not solving it. We strictly control membership the absolute hard minimum for joining the LAN requires 10 years of friendship with at least one member of the Bureau, with another 10 years of friendship planned. Members of the LAN can veto new members even if all other criteria is met. Even with those strict rules, there s no shortage of friends that meet the qualifications but we are not equipped to take that many folks on. It s hard to join -both socially and technically. Doing something malicious on the LAN requires a lot of highly technical effort upfront, and it would endanger a decade of friendship. We have relied on those human, social, interpersonal bonds to bring us all together. It s worked for the last 4 years, and it should continue working until we think of something better. We assume roommates, partners, kids, and visitors all have access to The Promised LAN. If they re let into our friends' network, there is a level of trust that works transitively for us I trust them to be on mine. This LAN is not for security , rather, the network border is a social one. Benign hacking in the original sense of misusing systems to do fun and interesting things is encouraged. Robust ACLs and firewalls on the LAN are, by definition, an interpersonal not technical failure. We all trust every other network operator to run their segment in a way that aligns with our collective values and norms. Over the last 4 years, we ve grown our own culture and fads around half of the people on the LAN have thermal receipt printers with open access, for printing out quips or jokes on each other s counters. It s incredible how much network transport and a trusting culture gets you there s a 3-node IRC network, exotic hardware to gawk at, radios galore, a NAS storage swap, LAN only email, and even a SIP phone network of redphones .

DIY We do not wish to, nor will we, rebuild the internet. We do not wish to, nor will we, scale this. We will never be friends with enough people, as hard as we may try. Participation hinges on us all having fun. As a result, membership will never be open, and we will never have enough connected LANs to deal with the technical and social problems that start to happen with scale. This is a feature, not a bug. This is a call for you to do the same. Build your own LAN. Connect it with friends homes. Remember what is missing from your life, and fill it in. Use software you know how to operate and get it running. Build slowly. Build your community. Do it with joy. Remember how we got here. Rebuild a community space that doesn t need to be mediated by faceless corporations and ad revenue. Build something sustainable that brings you joy. Rebuild something you use daily. Bring back what we re missing.

Kentaro Hayashi: Fixing long standing font issue about Debian Graphical Installer

Introduction This is just a note-taking about how fixed the long standing font issue about Debian Graphical Installer for up-coming trixie ready. Recently, this issue had been resolved by Cyril Brulebois. Thanks!

What is the problem? Because of Han unification, wrong font typefaces are rendered by default when you choose Japanese language using Graphical Debian installer.
"Wrong" glyph for Japanese
Most of typefaces seems correct, but there are wrong typefaces (Simplified Chinese) which is used for widget rendering. This issue will not be solved during using DroidSansFallback.ttf continuously for Japanese. Thus, it means that we need to switch font itself which contains Japanese typeface to fix this issue. If you wan to know about how Han Unification is harmful in this context, See

What causes this problem? In short, fonts-android (DroidSansFallback.ttf) had been used for CJK, especially for Japanese. Since Debian 9 (stretch), fonts-android was adopted for CJK fonts by default. Thus this issue was not resolved in Debian 9, Debian 10, Debian 11 and Debian 12 release cycle!

What is the impact about this issue? For sadly, Japanese native speakers can recognize such a unexpectedly rendered "Wrong" glyph, so it is not hard to continue Debian installation process. Even if there is no problem with the installer's functionality, it gives a terrible user experience for newbie. For example, how can you trust an installer which contains full of typos? It is similar situation for Japanese users.

How Debian Graphical Installer was fixed? In short, newly fonts-motoya-l-cedar-udeb was bundled for Japanese, and changed to switch that font via gtk-set-font command. It was difficult that what is the best font to deal font file size and visibility. Typically Japanese font file occupies extra a few MB. Luckily, some space was back for Installer, it was not seen as a problem (I guess). As a bonus, we tried to investigate a possibility of font compression mechanism for Installer, but it was regarded as too complicated and not suitable for trixie release cycle.

Conclution
  • The font issue was fixed in Debian Graphical Installer for Japanese
  • As recently fixed, not officially shipped yet (NOTE Debian Installer Trixie RC1 does not contain this fix) Try daily build installer if you want.
This article was written with Ultimate Hacking Keyboard 60 v2 with Rizer 60 (New my gear!).

15 June 2025

Iustin Pop: Markdown lint and site cleanup

I was not aware that one can write bad Markdown, since Markdown has such a simple syntax, that I thought you just write, and it s fine. Na ve, I know! I ve started editing the files for this blog/site with Visual Studio Code too, and I had from another project the markdown lint extension installed, so as I was opening old files, more and more problems appeared. On a whim, I searched and found the lint all files command, and after running it, oops more than 400 problems! Now, some of them were entirely trivial and a matter of subjective style, like mixing both underscore and asterisk for emphasis in a single file, and asterisks and dashes for list items. Others, seemingly trivial like tab indentation, were actually also causing rendering issues, so fixing that solved a real cosmetic issue. But some of the issues flagged were actual problems. For example, one sentence that I had, was:
there seems to be some race condition between <something> and ntp
Here something was interpreted as an (invalid) HTML tag, and not rendered at all. Another problem, but more minor, was that I had links to Wikipedia with spaces in the link name, which Visual Studio Code breaks at first space, rather than encoded spaces or underscores-based, as Wikipedia generates today. In the rendered output, Pandoc seemed to do the right think though. However, the most interesting issue that was flagged was no details in HTML links, i.e. links of the form:
for more details, see [here](http://example.com).
Which works for non-visually impaired people, but not for people using assistive technologies. And while trying to fix this, it turns out that you can do much better, for everyone, because here is really non-descriptive. You can use either the content as label ( an article about configuring BIND ), or the destination ( an article on this-website ), rather than the plain here . The only, really only check I disabled, was tweaking the trailing punctuation checks in headers, as I really like to write a header that ends with exclamation marks. I like exclamation marks in general! So why not use them in headers too. The question mark is allowlisted by default, though that I use rarely. During the changes/tweaks, I also did random improvements, but I didn t change the updated tag, since most of them were minor. But a non-minor thing was tweaking the CSS for code blocks, since I had a really stupid non-symmetry between top and bottom padding (5px vs 0), and which I don t know where it came from. But the MDN article on padding has as an example exactly what I had (except combined, I had it split). Did I just copy blindly? Possible So, all good and then, and I hope this doesn t trigger a flow of updates on any aggregators, since all the changes were really trivial. And while I don t write often, I did touch about 60 posts or pages, ouch! Who knew that changing editors can have such a large impact

Sahil Dhiman: A Look at .UA ccTLD Authoritative Name Servers

I find the case of the .UA country code top level domain (ccTLD) interesting simply because of the different name server secondaries they have now. Post Russian invasion, the cyber warfare peaked, and critical infrastructure like getting one side ccTLD down would be big news in anycase. Most (g/cc)TLDs are served by two (and less likely) by three or more providers. Even in those cases, not all authoritative name servers are anycasted. Take, example of .NL ccTLD name servers:
$ dig ns nl +short
ns1.dns.nl.
ns3.dns.nl.
ns4.dns.nl.
ns1.dns.nl is SIDN which also manages their registry. ns3.dns.nl is ReCodeZero/ipcom, another anycast secondary. ns4.dns.nl is CIRA, anycast secondary. That s 3 diverse, anycast networks to serve the .NL ccTLD. .DE has a bit more at name servers at 6 but only 3 seems anycasted. Now let s take a look at .UA. Hostmaster LLC is the registry operator of the .UA ccTLD since 2001.
$ dig soa ua +short
in1.ns.ua. domain-master.cctld.ua. 2025061434 1818 909 3024000 2020
Shows in1.ns.ua as primary nameserver (which can be intentionally deceptive too). I used bgp.tools for checking anycast and dns.coffee for timeline of when secondary nameserver was added. dns.coffee only has data going back till 2011 though. Let s deep dive at who s hosting each of the name servers:

in1.ns.ua by Intuix LLC
  • 74.123.224.40
  • 2604:ee00:0:101:0:0:0:40
  • unicast
  • Serving .UA since 13/12/2018.
  • Company by Dmitry Kohmanyuk and Igor Sviridov who re administrative and technical contacts for .UA zone as well as the IANA DB.

ho1.ns.ua by Hostmaster LLC
  • 195.47.253.1
  • 2001:67c:258:0:0:0:0:1
  • bgp.tools doesn t mark the prefix as anycast but basis test from various location, this is indeed anycasted (visible in atleast DE, US, UA etc.). Total POPs unknown.
  • Serving .UA atleast since 2011.
  • The registry themselves.

bg.ns.ua by ClouDNS
  • 185.136.96.185 and 185.136.97.185
  • 2a06:fb00:1:0:0:0:4:185 and 2a06:fb00:1:0:0:0:2:185
  • anycast
  • Serving .UA since 01/03/2022.
  • atleast 62 PoPs

cz.ns.ua by NIC.cz

nn.ns.ua by Netnod
  • 194.58.197.4
  • 2a01:3f1:c001:0:0:0:0:53
  • anycast
  • atleast 80 PoPs.
  • Serving .UA since 01/12/2022.
  • Netnod has the distinction of being one of the 13 root server operator (i.root-servers.net) and .SE operator.

pch.ns.ua by PCH
  • 204.61.216.12
  • 2001:500:14:6012:ad:0:0:1
  • anycast
  • atleast 328 POPs.
  • Serving .UA atleast since 2011.
  • With more than 36 years of production anycast DNS experience, two of the root name server operators and more than 172 top-level domain registries using our infrastructure, and more than 120 million resource records in service from https://www.pch.net/services/anycast.

rcz.ns.ua by RcodeZero
  • 193.46.128.10
  • 2a02:850:ffe0:0:0:0:0:10
  • anycast
  • Atleast 56 PoPs via 2 different cloud providers.
  • Serving .UA since 04/02/2022.
  • sister company of nic.at (.AT operator).

Some points to note
  • That s 1 unicast and 6 anycast name servers with hundreds of POPs from 7 different organizations.
  • Having X number of Point of Presence (POP) doesn t always mean each location is serving the .UA nameserver prefix.
  • Number of POPs keeps going up or down based on operational requirements and optimizations.
  • Highest concentration of DNS queries for a ccTLD would essentially originate in the country (or larger region) itself. If one of the secondaries doesn t have POP inside UA, the query might very well be served from outside the country, which can affect resolution and may even stop during outages and fiber cuts (which have become common there it seems). - Global POPs do help in faster resolutions for others/outside users though and ofcourse availability.
  • Having this much diversity does lessen the chance of the ccTLD going down. Theoretically, the adversary has to bring down 7 different networks/setups before resolution starts failing (post TTLs expiry).

12 June 2025

Dirk Eddelbuettel: #50: Introducing almm: Activate-Linux (based) Market Monitor

Welcome to post 50 in the R4 series. Today we reconnect to a previous post, namely #36 on pub/sub for live market monitoring with R and Redis. It introduced both Redis as well as the (then fairly recent) extensions to RcppRedis to support the publish-subscibe ( pub/sub ) model of Redis. In short, it manages both subscribing clients as well as producer for live, fast and lightweight data transmission. Using pub/sub is generally more efficient than the (conceptually simpler) poll-sleep loops as polling creates cpu and network load. Subscriptions are lighterweight as they get notified, they are also a little (but not much!) more involved as they require a callback function. We should mention that Redis has a recent fork in Valkey that arose when the former did one of these non-uncommon-among-db-companies licenuse suicides which, happy to say, they reversed more recently so that we now have both the original as well as this leading fork (among others). Both work, the latter is now included in several Linux distros, and the C library hiredis used to connect to either is still licensed permissibly as well. All this came about because Yahoo! Finance recently had another hickup in which they changed something leading to some data clients having hiccups. This includes GNOME applet Stocks Extension I had been running. There is a lively discussion on its issue #120 suggestions for example a curl wrapper (which then makes each access a new system call). Separating data acquisition and presentation becomes an attractive alternative, especially given how the standard Python and R accessors to the Yahoo! Finance service continued to work (and how per post #36 I already run data acquisition). Moreoever, and somewhat independently, it occurred to me that the cute (and both funny in its pun, and very pretty in its display) ActivateLinux program might offer an easy-enough way to display updates on the desktop. There were two aspects to address. First, the subscription side needed to be covered in either plain C or C++. That, it turns out, is very straightforward and there are existing documentation and prior examples (e.g. at StackOverflow) as well as the ability to have an LLM generate a quick stanza as I did with Claude. A modified variant is now in the example repo redis-pubsub-examples in file subscriber.c. It is deliberately minimal and the directory does not even have a Makefile: just compile and link against both libevent (for the event loop controlling this) and libhiredis (for the Redis or Valkey connection). This should work on any standard Linux (or macOS) machine with those two (very standard) libraries installed. The second aspect was trickier. While we can get Claude to modify the program to also display under x11, it still uses a single controlling event loop. It took a little bit of probing on my event to understand how to modify (the x11 use of) ActivateLinux, but as always it was reasonably straightforward in the end: instead of one single while loop awaiting events we now first check for pending events and deal with them if present but otherwise do not idle and wait but continue in another loop that also checks on the Redis or Valkey pub/sub events. So two thumbs up to vibe coding which clearly turned me into an x11-savvy programmer too The result is in a new (and currently fairly bare-bones) repo almm. It includes all files needed to build the application, borrowed with love from ActivateLinux (which is GPL-licensed, as is of course our minimal extension) and adds the minimal modifications we made, namely linking with libhiredis and some minimal changes to x11/x11.c. (Supporting wayland as well is on the TODO list, and I also need to release a new RcppRedis version to CRAN as one currently needs the GitHub version.) We also made a simple mp4 video with a sound overlay which describes the components briefly: Comments and questions welcome. I will probably add a little bit of command-line support to the almm. Selecting the symbol subscribed to is currently done in the most minimal way via environment variable SYMBOL (NB: not SYM as the video using the default value shows). I also worked out how to show the display only one of my multiple monitors so I may add an explicit screen id selector too. A little bit of discussion (including minimal Docker use around r2u) is also in issue #121 where I first floated the idea of having StocksExtension listen to Redis (or Valkey). Other suggestions are most welcome, please use issue tickets at the almm repository.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

11 June 2025

Iustin Pop: This blog finally goes git-annex!

A long, long time ago I have a few pictures on this blog, mostly in earlier years, because even with small pictures, the git repository became 80MiB soon this is not much in absolute terms, but the actual Markdown/Haskell/CSS/HTML total size is tiny compared to the picture, PDFs and fonts. I realised I need a better solution, probably about ten years ago, and that I should investigate git-annex. Then time passed, and I heard about git-lfs, so I thought that s the way forward. Now, I recently got interested again into doing something about this repository, and started researching.

Detour: git-lfs I was sure that git-lfs, being supported by large providers, would be the modern solution. But to my surprise, git-lfs is very server centric, which in hindsight makes sense, but for a home setup, it s not very good. Maybe I misunderstood, but git-lfs is more a protocol/method for a forge to store files, rather than an end-user solution. But then you need to backup those files separately (together with the rest of the forge), or implement another way of safeguarding them. Further details such as the fact that it keeps two copies of the files (one in the actual checked-out tree, one in internal storage) means it s not a good solution. Well, for my blog yes, but not in general. Then posts on Reddit about horror stories people being locked out of github due to quota, as an example, or this Stack Overflow post about git-lfs constraining how one uses git, convinced me that s not what I want. To each their own, but not for me I might want to push this blog s repo to github, but I definitely wouldn t want in that case to pay for github storage for my blog images (which are copies, not originals). And yes, even in 2025, those quotas are real GitHub limits and I agree with GitHub, storage and large bandwidth can t be free.

Back to the future: git-annex So back to git-annex. I thought it s going to be a simple thing, but oh boy, was I wrong. It took me half a week of continuous (well, in free time) reading and discussions with LLMs to understand a bit how it works. I think, honestly, it s a bit too complex, which is why the workflows page lists seven (!) levels of workflow complexity, from fully-managed, to fully-manual. IMHO, respect to the author for the awesome tool, but if you need a web app to help you manage git, it hints that the tool is too complex. I made the mistake of running git annex sync once, to realise it actually starts pushing to my upstream repo and creating new branches and whatnot, so after enough reading, I settled on workflow 6/7, since I don t want another tool to manage my git history. Maybe I m an outlier here, but everything automatic is a bit too much for me. Once you do managed yourself how git-annex works (on the surface, at least), it is a pretty cool thing. It uses a git-annex git branch to store metainformation, and that is relatively clean. If you do run git annex sync, it creates some extra branches, which I don t like, but meh.

Trick question: what is a remote? One of the most confusing things about git-annex was understanding its remote concept. I thought a remote is a place where you replicate your data. But not, that s a special remote. A normal remote is a git remote, but which is expected to be git/ssh/with command line access. So if you have a git+ssh remote, git-annex will not only try to push it s above-mentioned branch, but also copy the files. If such a remote is on a forge that doesn t support git-annex, then it will complain and get confused. Of course, if you read the extensive docs, you just do git config remote.<name>.annex-ignore true, and it will understand that it should not sync to it. But, aside, from this case, git-annex expects that all checkouts and clones of the repository are both metadata and data. And if you do any annex commands in them, all other clones will know about them! This can be unexpected, and you find people complaining about it, but nowadays there s a solution:
git clone   dir && cd dir
git config annex.private true
git annex init "temp copy"
This is important. Any leaf git clone must be followed by that annex.private true config, especially on CI/CD machines. Honestly, I don t understand why by default clones should be official data stores, but it is what it is. I settled on not making any of my checkouts stable , but only the actual storage places. Except those are not git repositories, but just git-annex storage things. I.e., special remotes. Is it confusing enough yet ?

Special remotes The special remotes, as said, is what I expected to be the normal git annex remotes, i.e. places where the data is stored. But well, they exist, and while I m only using a couple simple ones, there is a large number of them. Among the interesting ones: git-lfs, a remote that allows also storing the git repository itself (git-remote-annex), although I m bit confused about this one, and most of the common storage providers via the rclone remote. Plus, all of the special remotes support encryption, so this is a really neat way to store your files across a large number of things, and handle replication, number of copies, from which copy to retrieve, etc. as you with.

And many of other features git-annex has tons of other features, so to some extent, the sky s the limit. Automatic selection of what to add git it vs plain git, encryption handling, number of copies, clusters, computed files, etc. etc. etc. I still think it s cool but too complex, though!

Uses Aside from my blog post, of course. I ve seen blog posts/comments about people using git-annex to track/store their photo collection, and I could see very well how the remote encrypted repos any of the services supported by rclone could be an N+2 copy or so. For me, tracking photos would be a bit too tedious, but it could maybe work after more research. A more practical thing would probably be replicating my local movie collection (all legal, to be clear) better than just run rsync from time to time and tracking the large files in it via git-annex. That s an exercise for another day, though, once I get more mileage with it - my blog pictures are copies, so I don t care much if they get lost, but movies are primary online copies, and I don t want to re-dump the discs. Anyway, for later.

Migrating to git-annex Migrating here means ending in a state where all large files are in git-annex, and the plain git repo is small. Just moving the files to git annex at the current head doesn t remove them from history, so your git repository is still large; it won t grow in the future, but remains with old size (and contains the large files in its history). In my mind, a nice migration would be: run a custom command, and all the history is migrated to git-annex, so I can go back in time and the still use git-annex. I na vely expected this would be easy and already available, only to find comments on the git-annex site with unsure git-filter-branch calls and some web discussions. This is the discussion on the git annex website, but it didn t make me confident it would do the right thing. But that discussion is now 8 years old. Surely in 2025, with git-filter-repo, it s easier? And, maybe I m missing something, but it is not. Not from the point of view of plain git, that s easy, but because interacting with git-annex, which stores its data in git itself, so doing this properly across successive steps of a repo (when replaying the commits) is, I think, not well defined behaviour. So I was stuck here for a few days, until I got an epiphany: As I m going to rewrite the repository, of course I m keeping a copy of it from before git-annex. If so, I don t need the history, back in time, to be correct in the sense of being able to retrieve the binary files too. It just needs to be correct from the point of view of the actual Markdown and Haskell files that represent the meat of the blog. This simplified the problem a lot. At first, I wanted to just skip these files, but this could also drop commits (git-filter-repo, by default, drops the commits if they re empty), and removing the files loses information - when they were added, what were the paths, etc. So instead I came up with a rather clever idea, if I might say so: since git-annex replaces files with symlinks already, just replace the files with symlinks in the whole history, except symlinks that are dangling (to represent the fact that files are missing). One could also use empty files, but empty files are more valid in a sense than dangling symlinks, hence why I settled on those. Doing this with git-filter-repo is easy, in newer versions, with the new --file-info-callback. Here is the simple code I used:
import os
import os.path
import pathlib

SKIP_EXTENSIONS= 'jpg', 'jpeg', 'png', 'pdf', 'woff', 'woff2' 
FILE_MODES =  b"100644", b"100755" 
SYMLINK_MODE = b"120000"

fas_string = filename.decode()
path = pathlib.PurePosixPath(fas_string)
ext = path.suffix.removeprefix('.')

if ext not in SKIP_EXTENSIONS:
  return (filename, mode, blob_id)

if mode not in FILE_MODES:
  return (filename, mode, blob_id)

print(f"Replacing ' filename ' (extension '. ext ') in  os.getcwd() ")

symlink_target = '/none/binary-file-removed-from-git-history'.encode()
new_blob_id = value.insert_file_with_contents(symlink_target)
return (filename, SYMLINK_MODE, new_blob_id)
This goes and replaces files with a symlink to nowhere, but the symlink should explain why it s dangling. Then later renames or moving the files around work naturally , as the rename/mv doesn t care about file contents. Then, when the filtering is done via:
git-filter-repo --file-info-callback <(cat ~/filter-big.py ) --force
It is easy to onboard to git annex:
  • remove all dangling symlinks
  • copy the (binary) files from the original repository
  • since they re named the same, and in the same places, git sees a type change
  • then simply run git annex add on those files
For me it was easy as all such files were in a few directories, so just copying those directories back, a few git-annex add commands, and done. Of course, then adding a few rsync remotes, git annex copy --to, and the repository was ready. Well, I also found a bug in my own Hakyll setup: on a fresh clone, when the large files are just dangling symlinks, the builder doesn t complain, just ignores the images. Will have to fix.

Other resources This is a blog that I read at the beginning, and I found it very useful as an intro: https://switowski.com/blog/git-annex/. It didn t help me understand how it works under the covers, but it is well written. The author does use the sync command though, which is too magic for me, but also agrees about its complexity

The proof is in the pudding And now, for the actual first image to be added that never lived in the old plain git repository. It s not full-res/full-size, it s cropped a bit on the bottom. Earlier in the year, I went to Paris for a very brief work trip, and I walked around a bit it was more beautiful than what I remembered from way way back. So a bit random selection of a picture, but here it is:
Un bateau sur la Seine Un bateau sur la Seine
Enjoy!

Gunnar Wolf: Understanding Misunderstandings - Evaluating LLMs on Networking Questions

This post is a review for Computing Reviews for Understanding Misunderstandings - Evaluating LLMs on Networking Questions , a article published in Association for Computing Machinery (ACM), SIGCOMM Computer Communication Review
Large Language Models have awed the world, emerging as the fastest-growing application of all time ChatGPT reached 100 million active users in January 2023, just two months after its launch. After an initial cycle, they been gradually mostly accepted and incorporated in various workflows, and their basic mechanics are no longer beyond the understanding of people with moderate computer literacy. Now, given the technology is better understood, we face the question of how convenient LLM chatbots are for different occupations. This article embarks on the question of how much LLMs can be useful for networking applications. This article systematizes querying three popular LLMs (GPT-3.5, GPT-4 and Claude 3) with questions taken from several network management online courses and certifications, and presents a taxonomy of six axes along which the incorrect responses were classified: Accuracy (correctness of the answers provided by LLMs), Detectability (how easily errors in the LLM output can be identified), Cause (for each incorrect answer, the underlying causes behind the error), Explainability (the quality of explanations with which the LLMs support their answers), Effects (impact of wrong answers on the users) and Stability (whether a minor change, such as the change of the order of prompts, yields vastly different answers for a single query). The authors also measure four strategies towards improving answers: Self-correction (giving back the LLM the original question and received answer, as well as the expected correct answer, as part of the prompt), One-shot prompting (adding to the prompt, when answering user questions, follow this example followed by a similar correct answer), Majority voting (using the answer that most models agree upon) and Fine tuning (further train on a specific dataset to adapt the LLM to the particular task or domain). The authors noted that they observed that, while some of thos strategies were marginally useful, they sometimes resulted in degraded performance. The authors queried the commercially available instances of Gemini and GPT, reaching quite high results (89.4% for Claude 3, 88.7% for GPT-4 and 76.0% for GPT-3.5), reaching scores over 90% for basic subjects, but faring notably worse in topics that require understanding and converting between different numeric notations, such as working with IP addresses, even if they are trivial (i.e. presenting the subnet mask for a given network address expressed as the typical IPv4 dotted-quad representation). As a last item in the article, the authors menioned they also compared performance with three popular open source models (Llama3.1, Gemma2 and Mistral with their default settings). They mention that, although those models are almost 20 times smaller than the GPT-3.5 commercial model used, they reached comparable performance levels. Sadly, the article does not delve deeper into these models, that can be deployed locally and adapted to specific scenarios. The article is easy to read and does not require deep mathematical or AI-related knowledge. It presents a clear comparison along the described axes for the 503 multiple-choice questions presented. This article can be used as a guide for structuring similar studies over different fields.

John Goerzen: I Learned We All Have Linux Seats, and I m Not Entirely Pleased

I recently wrote about How to Use SSH with FIDO2/U2F Security Keys, which I now use on almost all of my machines. The last one that needed this was my Raspberry Pi hooked up to my DEC vt510 terminal and IBM mechanical keyboard. Yes I do still use that setup! To my surprise, generating a key on it failed. I very quickly saw that /dev/hidraw0 had incorrect permissions, accessible only to root. On other machines, it looks like this:
crw-rw----+ 1 root root 243, 16 May 24 16:47 /dev/hidraw16
And, if I run getfacl on it, I see:
# file: dev/hidraw16
# owner: root
# group: root
user::rw-
user:jgoerzen:rw-
group::---
mask::rw-
other::---
Yes, something was setting an ACL on it. Thus began to saga to figure out what was doing that. Firing up inotifywatch, I saw it was systemd-udevd or its udev-worker. But cranking up logging on that to maximum only showed me that uaccess was somehow doing this. I started digging. uaccess turned out to be almost entirely undocumented. People say to use it, but there s no description of what it does or how. Its purpose appears to be to grant access to devices to those logged in to a machine by dynamically adding them to ACLs for devices. OK, that s a nice goal, but why was machine A doing this and not machine B? I dug some more. I came across a hint that uaccess may only do that for a seat . A seat? I ve not heard of that in Linux before. Turns out there s some information (older and newer) about this out there. Sure enough, on the machine with KDE, loginctl list-sessions shows me on seat0, but on the machine where I log in from ttyUSB0, it shows an empty seat. But how to make myself part of the seat? I tried various udev rules to add the seat or master-of-seat tags, but nothing made any difference. I finally gave up and did the old-fashioned rule to just make it work already:
TAG=="security-device",SUBSYSTEM=="hidraw",GROUP="mygroup"
I still don t know how to teach logind to add a seat for ttyUSB0, but oh well. At least I learned something. An annoying something, but hey. This all had a laudable goal, but when there are so many layers of indirection, poorly documented, with poor logging, it gets pretty annoying.

Scarlett Gately Moore: KDE Application snaps 25.04.2 released!

KDE MascotKDE Mascot
Release notes: https://kde.org/announcements/gear/25.04.2/ Now available in the snap store! Along with that, I have fixed some outstanding bugs: Ark: now can open/save files in removable media Kasts: Once again has sound WIP: Updating Qt6 to 6.9 and frameworks to 6.14 Enjoy everyone! Unlike our software, life is not free. Please consider a donation, thanks!

Freexian Collaborators: Monthly report about Debian Long Term Support, May 2025 (by Roberto C. S nchez)

Like each month, have a look at the work funded by Freexian s Debian LTS offering.

Debian LTS contributors In May, 22 contributors have been paid to work on Debian LTS, their reports are available:
  • Abhijith PA did 8.0h (out of 0.0h assigned and 8.0h from previous period).
  • Adrian Bunk did 26.0h (out of 26.0h assigned).
  • Andreas Henriksson did 1.0h (out of 15.0h assigned and 3.0h from previous period), thus carrying over 17.0h to the next month.
  • Andrej Shadura did 3.0h (out of 10.0h assigned), thus carrying over 7.0h to the next month.
  • Bastien Roucari s did 20.0h (out of 20.0h assigned).
  • Ben Hutchings did 8.0h (out of 20.0h assigned and 4.0h from previous period), thus carrying over 16.0h to the next month.
  • Carlos Henrique Lima Melara did 12.0h (out of 11.0h assigned and 1.0h from previous period).
  • Chris Lamb did 15.5h (out of 0.0h assigned and 15.5h from previous period).
  • Daniel Leidert did 25.0h (out of 26.0h assigned), thus carrying over 1.0h to the next month.
  • Emilio Pozuelo Monfort did 21.0h (out of 16.75h assigned and 11.0h from previous period), thus carrying over 6.75h to the next month.
  • Guilhem Moulin did 11.5h (out of 8.5h assigned and 6.5h from previous period), thus carrying over 3.5h to the next month.
  • Jochen Sprickerhof did 3.5h (out of 8.75h assigned and 17.5h from previous period), thus carrying over 22.75h to the next month.
  • Lee Garrett did 26.0h (out of 12.75h assigned and 13.25h from previous period).
  • Lucas Kanashiro did 20.0h (out of 18.0h assigned and 2.0h from previous period).
  • Markus Koschany did 20.0h (out of 26.25h assigned), thus carrying over 6.25h to the next month.
  • Roberto C. S nchez did 20.75h (out of 24.0h assigned), thus carrying over 3.25h to the next month.
  • Santiago Ruano Rinc n did 15.0h (out of 12.5h assigned and 2.5h from previous period).
  • Sean Whitton did 6.25h (out of 6.0h assigned and 2.0h from previous period), thus carrying over 1.75h to the next month.
  • Sylvain Beucler did 26.25h (out of 26.25h assigned).
  • Thorsten Alteholz did 15.0h (out of 15.0h assigned).
  • Tobias Frost did 12.0h (out of 12.0h assigned).
  • Utkarsh Gupta did 1.0h (out of 15.0h assigned), thus carrying over 14.0h to the next month.

Evolution of the situation In May, we released 54 DLAs. The LTS Team was particularly active in May, publishing a higher than normal number of advisories, as well as helping with a wide range of updates to packages in stable and unstable, plus some other interesting work. We are also pleased to welcome several updates from contributors outside the regular team.
  • Notable security updates:
    • containerd, prepared by Andreas Henriksson, fixes a vulnerability that could cause containers launched as non-root users to be run as root
    • libapache2-mod-auth-openidc, prepared by Moritz Schlarb, fixes a vulnerability which could allow an attacker to crash an Apache web server with libapache2-mod-auth-openidc installed
    • request-tracker4, prepared by Andrew Ruthven, fixes multiple vulnerabilities which could result in information disclosure, cross-site scripting and use of weak encryption for S/MIME emails
    • postgresql-13, prepared by Bastien Roucari s, fixes an application crash vulnerability that could affect the server or applications using libpq
    • dropbear, prepared by Guilhem Moulin, fixes a vulnerability which could potentially result in execution of arbitrary shell commands
    • openjdk-17, openjdk-11, prepared by Thorsten Glaser, fixes several vulnerabilities, which include denial of service, information disclosure or bypass of sandbox restrictions
    • glibc, prepared by Sean Whitton, fixes a privilege escalation vulnerability
  • Notable non-security updates:
    • wireless-regdb, prepared by Ben Hutchings, updates information reflecting changes to radio regulations in many countries
This month s contributions from outside the regular team include the libapache2-mod-auth-openidc update mentioned above, prepared by Moritz Schlarb (the maintainer of the package); the update of request-tracker4, prepared by Andrew Ruthven (the maintainer of the package); and the updates of openjdk-17 and openjdk-11, also noted above, prepared by Thorsten Glaser. Additionally, LTS Team members contributed stable updates of the following packages:
  • rubygems and yelp/yelp-xsl, prepared by Lucas Kanashiro
  • simplesamlphp, prepared by Tobias Frost
  • libbson-xs-perl, prepared by Roberto C. S nchez
  • fossil, prepared by Sylvain Beucler
  • setuptools and mydumper, prepared by Lee Garrett
  • redis and webpy, prepared by Adrian Bunk
  • xrdp, prepared by Abhijith PA
  • tcpdf, prepared by Santiago Ruano Rinc n
  • kmail-account-wizard, prepared by Thorsten Alteholz
Other contributions were also made by LTS Team members to packages in unstable:
  • proftpd-dfsg DEP-8 tests (autopkgtests) were provided to the maintainer, prepared by Lucas Kanashiro
  • a regular upload of libsoup2.4, prepared by Sean Whitton
  • a regular upload of setuptools, prepared by Lee Garrett
Freexian, the entity behind the management of the Debian LTS project, has been working for some time now on the development of an advanced CI platform for Debian-based distributions, called Debusine. Recently, Debusine has reached a level of feature implementation that makes it very usable. Some members of the LTS Team have been using Debusine informally, and during May LTS coordinator Santiago Ruano Rinc n has made a call for the team to help with testing of Debusine, and to help evaluate its suitability for the LTS Team to eventually begin using as the primary mechanism for uploading packages into Debian. Team members who have started using Debusine are providing valuable feedback to the Debusine development team, thus helping to improve the platform for all users. Actually, a number of updates, for both bullseye and bookworm, made during the month of May were handled using Debusine, e.g. rubygems s DLA-4163-1. By the way, if you are a Debian Developer, you can easily test Debusine following the instructions found at https://wiki.debian.org/DebusineDebianNet. DebConf, the annual Debian Conference, is coming up in July and, as is customary each year, the week preceding the conference will feature an event called DebCamp. The DebCamp week provides an opportunity for teams and other interested groups/individuals to meet together in person in the same venue as the conference itself, with the purpose of doing focused work, often called sprints . LTS coordinator Roberto C. S nchez has announced that the LTS Team is planning to hold a sprint primarily focused on the Debian security tracker and the associated tooling used by the LTS Team and the Debian Security Team.

Thanks to our sponsors Sponsors that joined recently are in bold.

Freexian Collaborators: Debian Contributions: Updated Austin, DebConf 25 preparations continue and more! (by Anupa Ann Joseph)

Debian Contributions: 2025-05 Contributing to Debian is part of Freexian s mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

Updated Austin, by Colin Watson and Helmut Grohne Austin is a frame stack sampling profiler for Python. It allows profiling Python applications without instrumenting them while losing some accuracy in the process, and is the only one of its kind presently packaged for Debian. Unfortunately, it hadn t been uploaded in a while and hence the last Python version it worked with was 3.8. We updated it to a current version and also dealt with a number of architecture-specific problems (such as unintended sign promotion, 64bit time_t fallout and strictness due to -Wformat-security ) in cooperation with upstream. With luck, it will migrate in time for trixie.

Preparing for DebConf 25, by Stefano Rivera and Santiago Ruano Rinc n DebConf 25 is quickly approaching, and the organization work doesn t stop. In May, Stefano continued supporting the different teams. Just to give a couple of examples, Stefano made changes in DebConf 25 website to make BoF and sprints submissions public, so interested people can already know if a BoF or sprint for a given subject is planned, allowing coordination with the proposer; or to enhance how statistics are made public to help the work of the local team. Santiago has participated in different tasks, including the logistics of the conference, like preparing more information about the public transportation that will be available. Santiago has also taken part in activities related to fundraising and reviewing more event proposals.

Miscellaneous contributions
  • Lucas fixed security issues in Valkey in unstable.
  • Lucas tried to help with the update of Redis to version 8 in unstable. The package hadn t been updated for a while due to licensing issues, but now upstream maintainers fixed them.
  • Lucas uploaded around 20 ruby-* packages to unstable that weren t updated for some years to make them build reproducible. Thanks to reproducible builds folks to point out those issues. Also some unblock requests (and follow-ups) were needed to make them reach trixie in time for the release.
  • Lucas is organizing a Debian Outreach session for DebConf 25, reaching out to all interns of Google Summer of Code and Outreachy programs from the last year. The session will be presented by in-person interns and also video recordings from the interns interested in participating but did not manage to attend the conference.
  • Lucas continuously works on DebConf Content team tasks. Replying to speakers, sponsors, and communicating internally with the team.
  • Carles improved po-debconf-manager: fixed bugs reported by Catalan translator, added possibility to import packages out of salsa, added using non-default project branches on salsa, polish to get ready for DebCamp.
  • Carles tested new apt in trixie and reported bugs to apt , installation-report , libqt6widget6 .
  • Carles used po-debconf-manager and imported remaining 80 packages, reviewed 20 translations, submitted (MR or bugs) 54 translations.
  • Carles prepared some topics for translation BoF in DebConf (gathered feedback, first pass on topics).
  • Helmut gave an introductory talk about the mechanics of Linux namespaces at MiniDebConf Hamburg.
  • Helmut sent 25 patches for cross compilation failures.
  • Helmut reviewed, refined and applied a patch from Jochen Sprickerhof to make the Multi-Arch hinter emit more hints for pure Python modules.
  • Helmut sat down with Christoph Berg (not affiliated with Freexian) and extended unschroot to support directory-based chroots with overlayfs. This is a feature that was lost in transitioning from sbuild s schroot backend to its unshare backend. unschroot implements the schroot API just enough to be usable with sbuild and otherwise works a lot like the unshare backend. As a result, apt.postgresql.org now performs its builds contained in a user namespace.
  • Helmut looked into a fair number of rebootstrap failures most of which related to musl or gcc-15 and imported patches or workarounds to make those builds proceed.
  • Helmut updated dumat to use sqop fixing earlier PGP verification problems thanks to Justus Winter and Neal Walfield explaining a lot of sequoia at MiniDebConf Hamburg.
  • Helmut got the previous zutils update for /usr-move wrong again and had to send another update.
  • Helmut looked into why debvm s autopkgtests were flaky and with lots of help from Paul Gevers and Michael Tokarev tracked it down to a race condition in qemu. He updated debvm to trigger the problem less often and also fixed a wrong dependency using Luca Boccassi s patch.
  • Santiago continued the switch to sbuild for Salsa CI (that was stopped for some months), and has been mainly testing linux, since it s a complex project that heavily customizes the pipeline. Santiago is preparing the changes for linux to submit a MR soon.
  • In openssh, Colin tracked down some intermittent sshd crashes to a root cause, and issued bookworm and bullseye updates for CVE-2025-32728.
  • Colin spent some time fixing up fail2ban, mainly reverting a patch that caused its tests to fail and would have banned legitimate users in some common cases.
  • Colin backported upstream fixes for CVE-2025-48383 (django-select2) and CVE-2025-47287 (python-tornado) to unstable.
  • Stefano supported video streaming and recording for 2 miniDebConfs in May: Macei and Hamburg. These had overlapping streams for one day, which is a first for us.
  • Stefano packaged the new version of python-virtualenv that includes our patches for not including the wheel for wheel.
  • Stefano got all involved parties to agree (in principle) to meet at DebConf for a mediated discussion on a dispute that was brought to the technical committee.
  • Anupa coordinated the swag purchase for DebConf 25 with Juliana and Nattie.
  • Anupa joined the publicity team meeting for discussing the upcoming events and BoF at DebConf 25.
  • Anupa worked with the publicity team to publish Bits post to welcome GSoc 2025 Interns.

8 June 2025

Thorsten Alteholz: My Debian Activities in May 2025

Debian LTS This was my hundred-thirty-first month that I did some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian. During my allocated time I uploaded or worked on: I also continued my to work on libxmltok and suricata. This month I also had to do some support on seger, for example to inject packages newly needed for builds. Debian ELTS This month was the eighty-second ELTS month. During my allocated time I uploaded or worked on: All packages I worked on have been on the list of longstanding packages. For example espeak-ng has been on this list for more than nine month. I now understood that there is a reason why packages are on this list. Some parts of the software have been almost completely reworked, so that the patches need a reverse rework. For some packages this is easy, but for others this rework needs quite some time. I also continued to work on libxmltok and suricata. Debian Printing Unfortunately I didn t found any time to work on this topic. Debian Astro This month I uploaded bugfix versions of: Debian Mobcom This month I uploaded bugfix versions of: misc This month I uploaded bugfix versions of: Thanks a lot to the Release Team who quickly handled all my unblock bugs! FTP master It is this time of the year when just a few packages arrive in NEW: it is Hard Freeze. So I enjoy this period and basically just take care of kernels or other important packages. As people seem to be more interested in discussions than in fixing RC bugs, my period of rest seems to continue for a while. So thanks for all this valuable discussions and really thanks to the few people who still take care of Trixie. This month I accepted 146 and rejected 10 packages. The overall number of packages that got accepted was 147.

7 June 2025

Evgeni Golov: show your desk - 2025 edition

Back in 2020 I posted about my desk setup at home. Recently someone in our #remotees channel at work asked about WFH setups and given quite a few things changed in mine, I thought it's time to post an update. But first, a picture! standing desk with a monitor, laptop etc (Yes, it's cleaner than usual, how could you tell?!) desk It's still the same Flexispot E5B, no change here. After 7 years (I bought mine in 2018) it still works fine. If I'd have to buy a new one, I'd probably get a four-legged one for more stability (they got quite affordable now), but there is no immediate need for that. chair It's still the IKEA Volmar. Again, no complaints here. hardware Now here we finally have some updates! laptop A Lenovo ThinkPad X1 Carbon Gen 12, Intel Core Ultra 7 165U, 32GB RAM, running Fedora (42 at the moment). It's connected to a Lenovo ThinkPad Thunderbolt 4 Dock. It just works . workstation It's still the P410, but mostly unused these days. monitor An AOC U2790PQU 27" 4K. I'm running it at 150% scaling, which works quite decently these days (no comparison to when I got it). speakers As the new monitor didn't want to take the old Dell soundbar, I have upgraded to a pair of Alesis M1Active 330 USB. They sound good and were not too expensive. I had to fix the volume control after some time though. webcam It's still the Logitech C920 Pro. microphone The built in mic of the C920 is really fine, but to do conference-grade talks (and some podcasts ), I decided to get something better. I got a FIFINE K669B, with a nice arm. It's not a Shure, for sure, but does the job well and Christian was quite satisfied with the results when we recorded the Debian and Foreman specials of Focus on Linux. keyboard It's still the ThinkPad Compact USB Keyboard with TrackPoint. I had to print a few fixes and replacement parts for it, but otherwise it's doing great. Seems Lenovo stopped making those, so I really shouldn't break it any further. mouse Logitech MX Master 3S. The surface of the old MX Master 2 got very sticky at some point and it had to be replaced. other notepad I'm still terrible at remembering things, so I still write them down in an A5 notepad. whiteboard I've also added a (small) whiteboard on the wall right of the desk, mostly used for long term todo lists. coaster Turns out Xeon-based coasters are super stable, so it lives on! yubikey Yepp, still a thing. Still USB-A because... reasons. headphones Still the Bose QC25, by now on the third set of ear cushions, but otherwise working great and the odd 15 cushion replacement does not justify buying anything newer (which would have the same problem after some time, I guess). I did add a cheap (~10 ) Bluetooth-to-Headphonejack dongle, so I can use them with my phone too (shakes fist at modern phones). And I do use the headphones more in meetings, as the Alesis speakers fill the room more with sound and thus sometimes produce a bit of an echo. charger The Bose need AAA batteries, and so do some other gadgets in the house, so there is a technoline BC 700 charger for AA and AAA on my desk these days. light Yepp, I've added an IKEA Tertial and an ALDI "face" light. No, I don't use them much. KVM switch I've "built" a KVM switch out of an USB switch, but given I don't use the workstation that often these days, the switch is also mostly unused.

6 June 2025

Reproducible Builds: Reproducible Builds in May 2025

Welcome to our 5th report from the Reproducible Builds project in 2025! Our monthly reports outline what we ve been up to over the past month, and highlight items of news from elsewhere in the increasingly-important area of software supply-chain security. If you are interested in contributing to the Reproducible Builds project, please do visit the Contribute page on our website. In this report:
  1. Security audit of Reproducible Builds tools published
  2. When good pseudorandom numbers go bad
  3. Academic articles
  4. Distribution work
  5. diffoscope and disorderfs
  6. Website updates
  7. Reproducibility testing framework
  8. Upstream patches

Security audit of Reproducible Builds tools published The Open Technology Fund s (OTF) security partner Security Research Labs recently an conducted audit of some specific parts of tools developed by Reproducible Builds. This form of security audit, sometimes called a whitebox audit, is a form testing in which auditors have complete knowledge of the item being tested. They auditors assessed the various codebases for resilience against hacking, with key areas including differential report formats in diffoscope, common client web attacks, command injection, privilege management, hidden modifications in the build process and attack vectors that might enable denials of service. The audit focused on three core Reproducible Builds tools: diffoscope, a Python application that unpacks archives of files and directories and transforms their binary formats into human-readable form in order to compare them; strip-nondeterminism, a Perl program that improves reproducibility by stripping out non-deterministic information such as timestamps or other elements introduced during packaging; and reprotest, a Python application that builds source code multiple times in various environments in order to to test reproducibility. OTF s announcement contains more of an overview of the audit, and the full 24-page report is available in PDF form as well.

When good pseudorandom numbers go bad Danielle Navarro published an interesting and amusing article on their blog on When good pseudorandom numbers go bad. Danielle sets the stage as follows:
[Colleagues] approached me to talk about a reproducibility issue they d been having with some R code. They d been running simulations that rely on generating samples from a multivariate normal distribution, and despite doing the prudent thing and using set.seed() to control the state of the random number generator (RNG), the results were not computationally reproducible. The same code, executed on different machines, would produce different random numbers. The numbers weren t just a little bit different in the way that we ve all wearily learned to expect when you try to force computers to do mathematics. They were painfully, brutally, catastrophically, irreproducible different. Somewhere, somehow, something broke.
Thanks to David Wheeler for posting about this article on our mailing list

Academic articles There were two scholarly articles published this month that related to reproducibility: Daniel Hugenroth and Alastair R. Beresford of the University of Cambridge in the United Kingdom and Mario Lins and Ren Mayrhofer of Johannes Kepler University in Linz, Austria published an article titled Attestable builds: compiling verifiable binaries on untrusted systems using trusted execution environments. In their paper, they:
present attestable builds, a new paradigm to provide strong source-to-binary correspondence in software artifacts. We tackle the challenge of opaque build pipelines that disconnect the trust between source code, which can be understood and audited, and the final binary artifact, which is difficult to inspect. Our system uses modern trusted execution environments (TEEs) and sandboxed build containers to provide strong guarantees that a given artifact was correctly built from a specific source code snapshot. As such it complements existing approaches like reproducible builds which typically require time-intensive modifications to existing build configurations and dependencies, and require independent parties to continuously build and verify artifacts.
The authors compare attestable builds with reproducible builds by noting an attestable build requires only minimal changes to an existing project, and offers nearly instantaneous verification of the correspondence between a given binary and the source code and build pipeline used to construct it , and proceed by determining that t he overhead (42 seconds start-up latency and 14% increase in build duration) is small in comparison to the overall build time.
Timo Pohl, Pavel Nov k, Marc Ohm and Michael Meier have published a paper called Towards Reproducibility for Software Packages in Scripting Language Ecosystems. The authors note that past research into Reproducible Builds has focused primarily on compiled languages and their ecosystems, with a further emphasis on Linux distribution packages:
However, the popular scripting language ecosystems potentially face unique issues given the systematic difference in distributed artifacts. This Systemization of Knowledge (SoK) [paper] provides an overview of existing research, aiming to highlight future directions, as well as chances to transfer existing knowledge from compiled language ecosystems. To that end, we work out key aspects in current research, systematize identified challenges for software reproducibility, and map them between the ecosystems.
Ultimately, the three authors find that the literature is sparse , focusing on few individual problems and ecosystems, and therefore identify space for more critical research.

Distribution work In Debian this month:
Hans-Christoph Steiner of the F-Droid catalogue of open source applications for the Android platform published a blog post on Making reproducible builds visible. Noting that Reproducible builds are essential in order to have trustworthy software , Hans also mentions that F-Droid has been delivering reproducible builds since 2015 . However:
There is now a Reproducibility Status link for each app on f-droid.org, listed on every app s page. Our verification server shows or based on its build results, where means our rebuilder reproduced the same APK file and means it did not. The IzzyOnDroid repository has developed a more elaborate system of badges which displays a for each rebuilder. Additionally, there is a sketch of a five-level graph to represent some aspects about which processes were run.
Hans compares the approach with projects such as Arch Linux and Debian that provide developer-facing tools to give feedback about reproducible builds, but do not display information about reproducible builds in the user-facing interfaces like the package management GUIs.
Arnout Engelen of the NixOS project has been working on reproducing the minimal installation ISO image. This month, Arnout has successfully reproduced the build of the minimal image for the 25.05 release without relying on the binary cache. Work on also reproducing the graphical installer image is ongoing.
In openSUSE news, Bernhard M. Wiedemann posted another monthly update for their work there.
Lastly in Fedora news, Jelle van der Waa opened issues tracking reproducible issues in Haskell documentation, Qt6 recording the host kernel and R packages recording the current date. The R packages can be made reproducible with packaging changes in Fedora.

diffoscope & disorderfs diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 295, 296 and 297 to Debian:
  • Don t rely on zipdetails --walk argument being available, and only add that argument on newer versions after we test for that. [ ]
  • Review and merge support for NuGet packages from Omair Majid. [ ]
  • Update copyright years. [ ]
  • Merge support for an lzma comparator from Will Hollywood. [ ][ ]
Chris also merged an impressive changeset from Siva Mahadevan to make disorderfs more portable, especially on FreeBSD. disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into directory system calls in order to flush out reproducibility issues [ ]. This was then uploaded to Debian as version 0.6.0-1. Lastly, Vagrant Cascadian updated diffoscope in GNU Guix to version 296 [ ][ ] and 297 [ ][ ], and disorderfs to version 0.6.0 [ ][ ].

Website updates Once again, there were a number of improvements made to our website this month including:

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. However, Holger Levsen posted to our mailing list this month in order to bring a wider awareness to funding issues faced by the Oregon State University (OSU) Open Source Lab (OSL). As mentioned on OSL s public post, recent changes in university funding makes our current funding model no longer sustainable [and that] unless we secure $250,000 in committed funds, the OSL will shut down later this year . As Holger notes in his post to our mailing list, the Reproducible Builds project relies on hardware nodes hosted there. Nevertheless, Lance Albertson of OSL posted an update to the funding situation later in the month with broadly positive news.
Separate to this, there were various changes to the Jenkins setup this month, which is used as the backend driver of for both tests.reproducible-builds.org and reproduce.debian.net, including:
  • Migrating the central jenkins.debian.net server AMD Opteron to Intel Haswell CPUs. Thanks to IONOS for hosting this server since 2012.
  • After testing it for almost ten years, the i386 architecture has been dropped from tests.reproducible-builds.org. This is because that, with the upcoming release of Debian trixie, i386 is no longer supported as a regular architecture there will be no official kernel and no Debian installer for i386 systems. As a result, a large number of nodes hosted by Infomaniak have been retooled from i386 to amd64.
  • Another node, ionos17-amd64.debian.net, which is used for verifying packages for all.reproduce.debian.net (hosted by IONOS) has had its memory increased from 40 to 64GB, and the number of cores doubled to 32 as well. In addition, two nodes generously hosted by OSUOSL have had their memory doubled to 16GB.
  • Lastly, we have been granted access to more riscv64 architecture boards, so now we have seven such nodes, all with 16GB memory and 4 cores that are verifying packages for riscv64.reproduce.debian.net. Many thanks to PLCT Lab, ISCAS for providing those.

Outside of this, a number of smaller changes were also made by Holger Levsen:
  • reproduce.debian.net-related:
    • Only use two workers for the ppc64el architecture due to RAM size. [ ]
    • Monitor nginx_request and nginx_status with the Munin monitoring system. [ ][ ]
    • Detect various variants of network and memory errors. [ ][ ][ ][ ]
    • Add a prominent link to reproducible-builds.org. [ ]
    • Add a rebuilderd-cache-cleanup.service and run it daily via timer. [ ][ ][ ][ ][ ]
    • Be more verbose what sources are being downloaded. [ ]
    • Correctly deal with packages with an epoch in their version [ ] and deal with binNMUs versions with an epoch as well [ ][ ].
    • Document how to reschedule all other errors on all archs. [ ]
    • Misc documentation improvements. [ ][ ][ ][ ]
    • Include the $HOSTNAME variable in the rebuilderd logfiles. [ ]
    • Install the equivs package on all worker nodes. [ ][ ]
  • Jenkins nodes:
    • Permit the sudo tool to fix up permission issues. [ ][ ]
    • Document how to manage diskspace with OpenStack. [ ]
    • Ignore a number of spurious monitoring errors on riscv64, FreeBSD, etc.. [ ][ ][ ][ ]
    • Install ntpsec-ntpdate (instead of ntpdate) as the former is available on Debian trixie and bookworm. [ ][ ]
    • Use the same SSH ControlPath for all nodes. [ ]
    • Make sure the munin user uses the same SSH config as the jenkins user. [ ]
  • tests.reproducible-builds.org-related:
    • Disable testing of the i386 architecture. [ ][ ][ ][ ][ ]
    • Document the current disk usage. [ ][ ]
    • Address some image placement now that we only test three architectures. [ ]
    • Keep track of build performance. [ ]
  • Misc:
    • Fix a (harmless) typo in the multiarch_versionskew script. [ ]
In addition, Jochen Sprickerhof made a series of changes related to reproduce.debian.net:
  • Add out of memory detection to the statistics page. [ ]
  • Reverse the sorting order on the statistics page. [ ][ ][ ][ ]
  • Improve the spacing between statistics groups. [ ]
  • Update a (hard-coded) line number in error message detection pertaining to a debrebuild line number. [ ]
  • Support Debian unstable in the rebuilder-debian.sh script. [ ] ]
  • Rely on rebuildctl to sync only arch-specific packages. [ ][ ]

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. This month, we wrote a large number of such patches, including:

Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

Dirk Eddelbuettel: #49: The Two Cultures of Deploying Statistical Software

Welcome to post 49 in the R4 series. The Two Cultures is a term first used by C.P. Snow in a 1959 speech and monograph focused on the split between humanities and the sciences. Decades later, the term was (quite famously) re-used by Leo Breiman in a (somewhat prophetic) 2001 article about the split between data models and algorithmic models . In this note, we argue that statistical computing practice and deployment can also be described via this Two Cultures moniker. Referring to the term linking these foundational pieces is of course headline bait. Yet when preparing for the discussion of r2u in the invited talk in Mons (video, slides), it occurred to me that there is in fact a wide gulf between two alternative approaches of using R and, specifically, deploying packages. On the one hand we have the approach described by my friend Jeff as you go to the Apple store, buy the nicest machine you can afford, install what you need and then never ever touch it . A computer / workstation / laptop is seen as an immutable object where every attempt at change may lead to breakage, instability, and general chaos and is hence best avoided. If you know Jeff, you know he exaggerates. Maybe only slightly though. Similarly, an entire sub-culture of users striving for reproducibility (and sometimes also replicability ) does the same. This is for example evidenced by the popularity of package renv by Rcpp collaborator and pal Kevin. The expressed hope is that by nailing down a (sub)set of packages, outcomes are constrained to be unchanged. Hope springs eternal, clearly. (Personally, if need be, I do the same with Docker containers and their respective Dockerfile.) On the other hand, rolling is fundamentally different approach. One (well known) example is Google building everything at @HEAD . The entire (ginormous) code base is considered as a mono-repo which at any point in time is expected to be buildable as is. All changes made are pre-tested to be free of side effects to other parts. This sounds hard, and likely is more involved than an alternative of a whatever works approach of independent changes and just hoping for the best. Another example is a rolling (Linux) distribution as for example Debian. Changes are first committed to a staging place (Debian calls this the unstable distribution) and, if no side effects are seen, propagated after a fixed number of days to the rolling distribution (called testing ). With this mechanism, testing should always be installable too. And based on the rolling distribution, at certain times (for Debian roughly every two years) a release is made from testing into stable (following more elaborate testing). The released stable version is then immutable (apart from fixes for seriously grave bugs and of course security updates). So this provides the connection between frequent and rolling updates, and produces immutable fixed set: a release. This Debian approach has been influential for any other projects including CRAN as can be seen in aspects of its system providing a rolling set of curated packages. Instead of a staging area for all packages, extensive tests are made for candidate packages before adding an update. This aims to ensure quality and consistence and has worked remarkably well. We argue that it has clearly contributed to the success and renown of CRAN. Now, when accessing CRAN from R, we fundamentally have two accessor functions. But seemingly only one is widely known and used. In what we may call the Jeff model , everybody is happy to deploy install.packages() for initial installations. That sentiment is clearly expressed by this bsky post:
One of my #rstats coding rituals is that every time I load a @vincentab.bsky.social package I go check for a new version because invariably it s been updated with 18 new major features
And that is why we have two cultures. Because some of us, yours truly included, also use update.packages() at recurring (frequent !!) intervals: daily or near-daily for me. The goodness and, dare I say, gift of packages is not limited to those by my pal Vincent. CRAN updates all the time, and updates are (generally) full of (usually excellent) changes, fixes, or new features. So update frequently! Doing (many but small) updates (frequently) is less invasive than (large, infrequent) waterfall -style changes! But the fear of change, or disruption, is clearly pervasive. One can only speculate why. Is the experience of updating so painful on other operating systems? Is it maybe a lack of exposure / tutorials on best practices? These Two Cultures coexist. When I delivered the talk in Mons, I briefly asked for a show of hands among all the R users in the audience to see who in fact does use update.packages() regularly. And maybe a handful of hands went up: surprisingly few! Now back to the context of installing packages: Clearly only installing has its uses. For continuous integration checks we generally install into ephemeral temporary setups. Some debugging work may be with one-off container or virtual machine setups. But all other uses may well be under maintained setups. So consider calling update.packages() once in while. Or even weekly or daily. The rolling feature of CRAN is a real benefit, and it is there for the taking and enrichment of your statistical computing experience. So to sum up, the real power is to use For both tasks, relying on binary installations accelerates and eases the process. And where available, using binary installation with system-dependency support as r2u does makes it easier still, following the r2u slogan of Fast. Easy. Reliable. Pick All Three. Give it a try!

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

5 June 2025

Matthew Garrett: How Twitter could (somewhat) fix their encrypted DMs

As I wrote in my last post, Twitter's new encrypted DM infrastructure is pretty awful. But the amount of work required to make it somewhat better isn't large.

When Juicebox is used with HSMs, it supports encrypting the communication between the client and the backend. This is handled by generating a unique keypair for each HSM. The public key is provided to the client, while the private key remains within the HSM. Even if you can see the traffic sent to the HSM, it's encrypted using the Noise protocol and so the user's encrypted secret data can't be retrieved.

But this is only useful if you know that the public key corresponds to a private key in the HSM! Right now there's no way to know this, but there's worse - the client doesn't have the public key built into it, it's supplied as a response to an API request made to Twitter's servers. Even if the current keys are associated with the HSMs, Twitter could swap them out with ones that aren't, terminate the encrypted connection at their endpoint, and then fake your query to the HSM and get the encrypted data that way. Worse, this could be done for specific targeted users, without any indication to the user that this has happened, making it almost impossible to detect in general.

This is at least partially fixable. Twitter could prove to a third party that their Juicebox keys were generated in an HSM, and the key material could be moved into clients. This makes attacking individual users more difficult (the backdoor code would need to be shipped in the public client), but can't easily help with the website version[1] even if a framework exists to analyse the clients and verify that the correct public keys are in use.

It's still worse than Signal. Use Signal.

[1] Since they could still just serve backdoored Javascript to specific users. This is, unfortunately, kind of an inherent problem when it comes to web-based clients - we don't have good frameworks to detect whether the site itself is malicious.

comment count unavailable comments

4 June 2025

Gunnar Wolf: The subjective value of privacy Assessing individuals' calculus of costs and benefits in the context of state surveillance

This post is an unpublished review for The subjective value of privacy Assessing individuals' calculus of costs and benefits in the context of state surveillance
Internet users, software developers, academics, entrepreneurs basically everybody is now aware of the importance of considering privacy as a core part of our online experience. User demand, and various national or regional laws, have made privacy a continuously present subject. However, how do regular people like ourselves, in our many capacities feel about privacy? Lukas Antoine presents a series of experiments aiming at better understanding how people throughout the world understands privacy, and when is privacy held as more or less important than security in different aspects, Particularly, privacy is often portrayed as a value set at tension against surveillance, and particularly state surveillance, in the name of security: conventional wisdom presents the idea of privacy calculus. This is, it is often assumed that individuals continuously evaluate the costs and benefits of divulging their personal data, sharing data when they expect a positive net outcome, and denying it otherwise. This framework has been accepted for decades, and the author wishes to challenge it. This book is clearly his doctoral thesis on political sciences, and its contents are as thorough as expected in this kind of product. The author presents three empirical studies based on cross-survey analysis. The first experiment explores the security justifications for surveillance and how they influence their support. The second one searches whether the stance on surveillance can be made dependent on personal convenience or financial cost. The third study explores whether privacy attitude is context-dependant or can be seen as a stable personality trait. The studies aim to address the shortcomings of published literature in the field, mainly, (a) the lack of comprehensive research on state surveillance, needed or better understanding privacy appreciation, (b) while several studies have tackled the subjective measure of privacy, there is a lack of cross-national studies to explain wide-ranging phenomena, (c) most studies in this regard are based on population-based surveys, which cannot establish causal relationships, (d) a seemingly blind acceptance of the privacy calculus mentioned above, with no strong evidence that it accurately measures people s motivations for disclosing or withholding their data. The book is full with theoretical references and does a very good job of explaining the path followed by the author. It is, though, a heavy read, and, for people not coming from the social sciences tradition, leads to the occasional feeling of being lost. The conceptual and theoretical frameworks and presented studies are thorough and clear. The author is honest in explaining when the data points at some of his hypotheses being disproven, while others are confirmed. The aim of the book is for people digging deep into this topic. Personally, I have authored several works on different aspects of privacy, but this book did get me thinking on many issues I had not previously considered. My only complaint would be that, for the publication as part of its highly prestigious publisher, little attention has been paid to editorial aspects: sub-subsection depth is often excessive and unclear. Also, when publishing monographs based on doctoral works, it is customary to no longer refer to the work as a thesis and to soften some of the formal requirements such a work often has, with the aim of producing a more gentle and readable book; this book seems just like the mass-production of an (otherwise very interesting and well made) thesis work.

Gunnar Wolf: Humanities and big data in Ibero-America Theory, methodology and practical applications

This post is an unpublished review for Humanities and big data in Ibero-America Theory, methodology and practical applications
Digital humanities is a young though established field. It deals with different expressions in which digital data manipulation techniques can be applied and used to analyze subjects that are identified as belonging to the humanities. Although most often used to analyze different aspects of literature or social network analysis, it can also be applied to other humanistic disciplines or artistic expressions. Digital humanities employs many tools, but those categorized as big data are among the most frequently employed. This book samples different takes on digital humanities, with the particularity that it focuses on Ibero-American uses. It is worth noting that this book is the second in a series of four volumes, published or set to be published between 2022 and 2026. Being the output of a field survey, I perceive this book to be targeted towards fellow Digital Humanists people interested in applying computational methods to further understand and research topics in the humanities. It is not a technical book in the sense Computer Science people would recognize as such, but several of the presented works do benefit from understanding some technical concepts. The 12 articles (plus an introduction) that make up this book are organized in three parts: (1) Theoretical Framework presents the ideas and techniques of data science (that make up the tools for handling big data), and explores how data science can contribute to literary analysis, all while noting that many such techniques are usually frowned upon in Latin America as data science smells neoliberal ; (2) Methodological Issues looks at specific issues through the lens of how they can be applied to big data, with specific attention given to works in Spanish; and (3) Practical Applications analyzes specific Spanish works and communities based on big data techniques. Several chapters treat a recurring theme: the simultaneous resistance and appropriation of big data by humanists. For example, at least three of the chapters describe the tensions between humanism ( aesthesis ) and cold, number-oriented data analysis ( mathesis ). The analyzed works of Parts 2 and 3 are interesting and relatively easy to follow. Some inescapable ideological gleans from several word uses from the book s and series name, which refers to the Spanish-speaking regions as Ibero-America , often seen as Eurocentric, in contrast with the Latin America term much more widely used throughout the region. I will end with some notes about the specific versions of the book I reviewed. I read both an EPUB version and a print copy. The EPUB did not include links for easy navigation to footnotes, that is, the typographical superindexes are not hyperlinked to the location of the notes, so it is very impractical to try to follow them. The print version (unlike the EPUB) did not have an index, that is, the six pages before the introduction are missing from the print copy I received. For a book such as this one, not having an index hampers the ease of reading and referencing.

Gunnar Wolf: Beyond data poisoning in federated learning

This post is an unpublished review for Beyond data poisoning in federated learning
The current boom of artificial intelligence (AI) is based upon neural networks (NNs). In order for these to be useful, the network has to undergo a machine learning (ML) process: work over a series of inputs, and adjust the inner weights of the connections between neurons so that each of the data samples the network was trained on produces the right set of labels for each item. Federated learning (FL) appeared as a reaction given the data centralization power that traditional ML provides: instead of centrally controlling the whole training data, various different actors analyze disjoint subsets of data, and provide only the results of this analysis, thus increasing privacy while analyzing a large dataset. Finally, given multiple actors are involved in FL, how hard is it for a hostile actor to provide data that will confuse the NN, instead of helping it reach better performance? This kind of attack is termed a poisoning attack, and is the main focus of this paper. The authors set out to research how effective can a hyperdimensional data poisoning attack (HDPA) be to confuse a NN and cause it to misclassify both the items trained on and yet unseen items. Data used for NN training is usually represented as a large set of orthogonal vectors, each describing a different aspect of the item, allowing for very simple vector arithmetic operations. Thus, NN training is termed as high-dimensional or hyperdimensional. The attack method described by the authors employs cosine similarity, that is, in order to preserve similarity, a target hypervector is reflected over a given dimension, yielding a cosine-similar result that will trick ML models, even if using byzantine-robust defenses. The paper is clear, though not an easy read. It explains in detail the mathematical operations, following several related although different threat models. The authors present the results of the experimental evaluation of their proposed model, comparing it to several other well-known adversarial attacks for visual recognition tasks, over pre-labeled datasets frequently used as training data, such as MNIST, Fashion-MNIST and CIFAR-10. They show that their method is not only more effective as an attack, but falls within the same time range as other surveyed attacks. Adversarial attacks are, all in all, an important way to advance any field of knowledge; by publishing this attack, the authors will surely spark other works to detect and prevent this kind of alteration. It is important for AI implementers to understand the nature of this field and be aware of the risks that this work, as well as others cited in it, highlight: ML will train a computer system to recognize a dataset, warts and all; efficient as AI is, if noise is allowed into the training data (particularly adversarially generated noise), the trained model might present impaired performance.

Gunnar Wolf: Computational modelling of robot personhood and relationality

This post is an unpublished review for Computational modelling of robot personhood and relationality
If humans and robots were to be able to roam around the same spaces, mutually recognizing each other for what they are, how would interaction be? How can we model such interactions in a way that we can reason about and understand the implications of a given behavior? This book aims at answering this question. The book is split into two very different parts. Chapters 1 through 3 are mostly written with a philosophical angle. It starts by framing the possibility of having sentient androids exist in the same plane as humans, without them trying to pass as us or vice versa. The first chapters look at issues related to personhood, that is, how androids can be treated as valid interaction partners in a society with humans, and how interactions with them can be seen as meaningful. In doing so, several landmarks of the past 40 years in the AI field are reviewed. The issues of the Significant Concerns that make up a society and give it coherence and of Personhood and Relationality , describing how this permeates from a society into each of the individuals that make it up, the relations between them and the social objects that bring individuals closer together (or farther apart) are introduced and explained. The second part of the book is written from a very different angle, and the change in pace took me somewhat by surprise. Each subsequent chapter presents a different angle of the Affinity system, a model that follows some aspects of human behavior over time and in a given space. Chapter 4 introduces the Affinity environment: a 3D simulated environment with simulated physical laws and characteristics, where a number of agents (30-50 is mentioned as usual) interact. Agents have a series of attributes ( value memory ), can adhere to different programs ( narratives ), and gain or lose on some vectors ( economy ). They can sense the world around them with sensors, and can modify the world or signal other agents using effectors. The last two chapters round out the book, as expected: the first presents a set of results from analyzing a given set of value systems, and the second gives readers the conclusions reached by the author. However, I was expecting more either having at least a link to download the Affinity system and continue exploring it or modifying some of the aspects it models to get it to model a set of agents with different stories and narratives, or extend it to yet unforseen behaviors, or at least have the author present a more complete comparison of results than the evaluation of patterns resulting from a given run. The author is a well-known, prolific author in the field, and I was expecting bigger insights from this book. Nevertheless, the book is an interesting and fun read, with important insights in both the first and second parts. There is a certain lack of connection between their respective rhythms, and the second part indeed builds on the concepts introduced in the first one. Overall, I enjoyed reading the book despite expecting more.

Next.