Search Results: "tats"

6 June 2025

Dirk Eddelbuettel: #49: The Two Cultures of Deploying Statistical Software

Welcome to post 49 in the R4 series. The Two Cultures is a term first used by C.P. Snow in a 1959 speech and monograph focused on the split between humanities and the sciences. Decades later, the term was (quite famously) re-used by Leo Breiman in a (somewhat prophetic) 2001 article about the split between data models and algorithmic models . In this note, we argue that statistical computing practice and deployment can also be described via this Two Cultures moniker. Referring to the term linking these foundational pieces is of course headline bait. Yet when preparing for the discussion of r2u in the invited talk in Mons (video, slides), it occurred to me that there is in fact a wide gulf between two alternative approaches of using R and, specifically, deploying packages. On the one hand we have the approach described by my friend Jeff as you go to the Apple store, buy the nicest machine you can afford, install what you need and then never ever touch it . A computer / workstation / laptop is seen as an immutable object where every attempt at change may lead to breakage, instability, and general chaos and is hence best avoided. If you know Jeff, you know he exaggerates. Maybe only slightly though. Similarly, an entire sub-culture of users striving for reproducibility (and sometimes also replicability ) does the same. This is for example evidenced by the popularity of package renv by Rcpp collaborator and pal Kevin. The expressed hope is that by nailing down a (sub)set of packages, outcomes are constrained to be unchanged. Hope springs eternal, clearly. (Personally, if need be, I do the same with Docker containers and their respective Dockerfile.) On the other hand, rolling is fundamentally different approach. One (well known) example is Google building everything at @HEAD . The entire (ginormous) code base is considered as a mono-repo which at any point in time is expected to be buildable as is. All changes made are pre-tested to be free of side effects to other parts. This sounds hard, and likely is more involved than an alternative of a whatever works approach of independent changes and just hoping for the best. Another example is a rolling (Linux) distribution as for example Debian. Changes are first committed to a staging place (Debian calls this the unstable distribution) and, if no side effects are seen, propagated after a fixed number of days to the rolling distribution (called testing ). With this mechanism, testing should always be installable too. And based on the rolling distribution, at certain times (for Debian roughly every two years) a release is made from testing into stable (following more elaborate testing). The released stable version is then immutable (apart from fixes for seriously grave bugs and of course security updates). So this provides the connection between frequent and rolling updates, and produces immutable fixed set: a release. This Debian approach has been influential for any other projects including CRAN as can be seen in aspects of its system providing a rolling set of curated packages. Instead of a staging area for all packages, extensive tests are made for candidate packages before adding an update. This aims to ensure quality and consistence and has worked remarkably well. We argue that it has clearly contributed to the success and renown of CRAN. Now, when accessing CRAN from R, we fundamentally have two accessor functions. But seemingly only one is widely known and used. In what we may call the Jeff model , everybody is happy to deploy install.packages() for initial installations. That sentiment is clearly expressed by this bsky post:
One of my #rstats coding rituals is that every time I load a @vincentab.bsky.social package I go check for a new version because invariably it s been updated with 18 new major features
And that is why we have two cultures. Because some of us, yours truly included, also use update.packages() at recurring (frequent !!) intervals: daily or near-daily for me. The goodness and, dare I say, gift of packages is not limited to those by my pal Vincent. CRAN updates all the time, and updates are (generally) full of (usually excellent) changes, fixes, or new features. So update frequently! Doing (many but small) updates (frequently) is less invasive than (large, infrequent) waterfall -style changes! But the fear of change, or disruption, is clearly pervasive. One can only speculate why. Is the experience of updating so painful on other operating systems? Is it maybe a lack of exposure / tutorials on best practices? These Two Cultures coexist. When I delivered the talk in Mons, I briefly asked for a show of hands among all the R users in the audience to see who in fact does use update.packages() regularly. And maybe a handful of hands went up: surprisingly few! Now back to the context of installing packages: Clearly only installing has its uses. For continuous integration checks we generally install into ephemeral temporary setups. Some debugging work may be with one-off container or virtual machine setups. But all other uses may well be under maintained setups. So consider calling update.packages() once in while. Or even weekly or daily. The rolling feature of CRAN is a real benefit, and it is there for the taking and enrichment of your statistical computing experience. So to sum up, the real power is to use For both tasks, relying on binary installations accelerates and eases the process. And where available, using binary installation with system-dependency support as r2u does makes it easier still, following the r2u slogan of Fast. Easy. Reliable. Pick All Three. Give it a try!

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

31 May 2025

Antoine Beaupr : Traffic meter per ASN without logs

Have you ever found yourself in the situation where you had no or anonymized logs and still wanted to figure out where your traffic was coming from? Or you have multiple upstreams and are looking to see if you can save fees by getting into peering agreements with some other party? Or your site is getting heavy load but you can't pinpoint it on a single IP and you suspect some amoral corporation is training their degenerate AI on your content with a bot army? (You might be getting onto something there.) If that rings a bell, read on.

TL;DR: ... or just skip the cruft and install asncounter:
pip install asncounter
Also available in Debian 14 or later, or possibly in Debian 13 backports (soon to be released) if people are interested:
apt install asncounter
Then count whoever is hitting your network with:
awk ' print $2 ' /var/log/apache2/*access*.log   asncounter
or:
tail -F /var/log/apache2/*access*.log   awk ' print $2 '   asncounter
or:
tcpdump -q -n   asncounter --input-format=tcpdump --repl
or:
tcpdump -q -i eth0 -n -Q in "tcp and tcp[tcpflags] & tcp-syn != 0 and (port 80 or port 443)"   asncounter --input-format=tcpdump --repl
Read on for why this matters, and why I wrote yet another weird tool (almost) from scratch.

Background and manual work This is a tool I've been dreaming of for a long, long time. Back in 2006, at Koumbit a colleague had setup TAS ("Traffic Accounting System", " " in Russian, apparently), a collection of Perl script that would do per-IP accounting. It was pretty cool: it would count bytes per IP addresses and, from that, you could do analysis. But the project died, and it was kind of bespoke. Fast forward twenty years, and I find myself fighting off bots at the Tor Project (the irony...), with our GitLab suffering pretty bad slowdowns (see issue tpo/tpa/team#41677 for the latest public issue, the juicier one is confidential, unfortunately). (We did have some issues caused by overloads in CI, as we host, after all, a fork of Firefox, which is a massive repository, but the applications team did sustained, awesome work to fix issues on that side, again and again (see tpo/applications/tor-browser#43121 for the latest, and tpo/applications/tor-browser#43121 for some pretty impressive correlation work, I work with really skilled people). But those issues, I believe were fixed.) So I had the feeling it was our turn to get hammered by the AI bots. But how do we tell? I could tell something was hammering at the costly /commit/ and (especially costly) /blame/ endpoint. So at first, I pulled out the trusted awk, sort uniq -c sort -n tail pipeline I am sure others have worked out before:
awk ' print $1 ' /var/log/nginx/*.log   sort   uniq -c   sort -n   tail -10
For people new to this, that pulls the first field out of web server log files, sort the list, counts the number of unique entries, and sorts that so that the most common entries (or IPs) show up first, then show the top 10. That, other words, answers the question of "which IP address visits this web server the most?" Based on this, I found a couple of IP addresses that looked like Alibaba. I had already addressed an abuse complaint to them (tpo/tpa/team#42152) but never got a response, so I just blocked their entire network blocks, rather violently:
for cidr in 47.240.0.0/14 47.246.0.0/16 47.244.0.0/15 47.235.0.0/16 47.236.0.0/14; do 
  iptables-legacy -I INPUT -s $cidr -j REJECT
done
That made Ali Baba and his forty thieves (specifically their AL-3 network go away, but our load was still high, and I was still seeing various IPs crawling the costly endpoints. And this time, it was hard to tell who they were: you'll notice all the Alibaba IPs are inside the same 47.0.0.0/8 prefix. Although it's not a /8 itself, it's all inside the same prefix, so it's visually easy to pick it apart, especially for a brain like mine who's stared too long at logs flowing by too fast for their own mental health. What I had then was different, and I was tired of doing the stupid thing I had been doing for decades at this point. I had recently stumbled upon pyasn recently (in January, according to my notes) and somehow found it again, and thought "I bet I could write a quick script that loops over IPs and counts IPs per ASN". (Obviously, there are lots of other tools out there for that kind of monitoring. Argos, for example, presumably does this, but it's a kind of a huge stack. You can also get into netflows, but there's serious privacy implications with those. There are also lots of per-IP counters like promacct, but that doesn't scale. Or maybe someone already had solved this problem and I just wasted a week of my life, who knows. Someone will let me know, I hope, either way.)

ASNs and networks A quick aside, for people not familiar with how the internet works. People that know about ASNs, BGP announcements and so on can skip. The internet is the network of networks. It's made of multiple networks that talk to each other. The way this works is there is a Border Gateway Protocol (BGP), a relatively simple TCP-based protocol, that the edge routers of those networks used to announce each other what network they manage. Each of those network is called an Autonomous System (AS) and has an AS number (ASN) to uniquely identify it. Just like IP addresses, ASNs are allocated by IANA and local registries, they're pretty cheap and useful if you like running your own routers, get one. When you have an ASN, you'll use it to, say, announce to your BGP neighbors "I have 198.51.100.0/24" over here and the others might say "okay, and I have 216.90.108.31/19 over here, and I know of this other ASN over there that has 192.0.2.1/24 too! And gradually, those announcements flood the entire network, and you end up with each BGP having a routing table of the global internet, with a map of which network block, or "prefix" is announced by which ASN. It's how the internet works, and it's a useful thing to know, because it's what, ultimately, makes an organisation responsible for an IP address. There are "looking glass" tools like the one provided by routeviews.org which allow you to effectively run "trace routes" (but not the same as traceroute, which actively sends probes from your location), type an IP address in that form to fiddle with it. You will end up with an "AS path", the way to get from the looking glass to the announced network. But I digress, and that's kind of out of scope. Point is, internet is made of networks, networks are autonomous systems (AS) and they have numbers (ASNs), and they announced IP prefixes (or "network blocks") that ultimately tells you who is responsible for traffic on the internet.

Introducing asncounter So my goal was to get from "lots of IP addresses" to "list of ASNs", possibly also the list of prefixes (because why not). Turns out pyasn makes that really easy. I managed to build a prototype in probably less than an hour, just look at the first version, it's 44 lines (sloccount) of Python, and it works, provided you have already downloaded the required datafiles from routeviews.org. (Obviously, the latest version is longer at close to 1000 lines, but it downloads the data files automatically, and has many more features). The way the first prototype (and later versions too, mostly) worked is that you feed it a list of IP addresses on standard input, it looks up the ASN and prefix associated with the IP, and increments a counter for those, then print the result. That showed me something like this:
root@gitlab-02:~/anarcat-scripts# tcpdump -q -i eth0 -n -Q in "(udp or tcp)"   ./asncounter.py --tcpdump                                                                                                                                                                          
tcpdump: verbose output suppressed, use -v[v]... for full protocol decode                                                                
listening on eth0, link-type EN10MB (Ethernet), snapshot length 262144 bytes                                                             
INFO: collecting IPs from stdin, using datfile ipasn_20250523.1600.dat.gz                                                                
INFO: loading datfile /root/.cache/pyasn/ipasn_20250523.1600.dat.gz...                                                                   
INFO: loading /root/.cache/pyasn/asnames.json                       
ASN     count   AS               
136907  7811    HWCLOUDS-AS-AP HUAWEI CLOUDS, HK                                                                                         
[----]  359     [REDACTED]
[----]  313     [REDACTED]
8075    254     MICROSOFT-CORP-MSN-AS-BLOCK, US
[---]   164     [REDACTED]
[----]  136     [REDACTED]
24940   114     HETZNER-AS, DE  
[----]  98      [REDACTED]
14618   82      AMAZON-AES, US                                                                                                           
[----]  79      [REDACTED]
prefix  count                                         
166.108.192.0/20        1294                                                                                                             
188.239.32.0/20 1056                                          
166.108.224.0/20        970                    
111.119.192.0/20        951              
124.243.128.0/18        667                                         
94.74.80.0/20   651                                                 
111.119.224.0/20        622                                         
111.119.240.0/20        566           
111.119.208.0/20        538                                         
[REDACTED]  313           
Even without ratios and a total count (which will come later), it was quite clear that Huawei was doing something big on the server. At that point, it was responsible for a quarter to half of the traffic on our GitLab server or about 5-10 queries per second. But just looking at the logs, or per IP hit counts, it was really hard to tell. That traffic is really well distributed. If you look more closely at the output above, you'll notice I redacted a couple of entries except major providers, for privacy reasons. But you'll also notice almost nothing is redacted in the prefix list, why? Because all of those networks are Huawei! Their announcements are kind of bonkers: they have hundreds of such prefixes. Now, clever people in the know will say "of course they do, it's an hyperscaler; just ASN14618 (AMAZON-AES) there is way more announcements, they have 1416 prefixes!" Yes, of course, but they are not generating half of my traffic (at least, not yet). But even then: this also applies to Amazon! This way of counting traffic is way more useful for large scale operations like this, because you group by organisation instead of by server or individual endpoint. And, ultimately, this is why asncounter matters: it allows you to group your traffic by organisation, the place you can actually negotiate with. Now, of course, that assumes those are entities you can talk with. I have written to both Alibaba and Huawei, and have yet to receive a response. I assume I never will. In their defence, I wrote in English, perhaps I should have made the effort of translating my message in Chinese, but then again English is the Lingua Franca of the Internet, and I doubt that's actually the issue.

The Huawei and Facebook blocks Another aside, because this is my blog and I am not looking for a Pullitzer here. So I blocked Huawei from our GitLab server (and before you tear your shirt open: only our GitLab server, everything else is still accessible to them, including our email server to respond to my complaint). I did so 24h after emailing them, and after examining their user agent (UA) headers. Boy that was fun. In a sample of 268 requests I analyzed, they churned out 246 different UAs. At first glance, they looked legit, like:
Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/131.0.0.0 Safari/537.36
Safari on a Mac, so far so good. But when you start digging, you notice some strange things, like here's Safari running on Linux:
Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.3 (KHTML, like Gecko) Chrome/6.0.457.0 Safari/534.3
Was Safari ported to Linux? I guess that's.. possible? But here is Safari running on a 15 year old Ubuntu release (10.10):
Mozilla/5.0 (X11; Linux i686) AppleWebKit/534.24 (KHTML, like Gecko) Ubuntu/10.10 Chromium/12.0.702.0 Chrome/12.0.702.0 Safari/534.24
Speaking of old, here's Safari again, but this time running on Windows NT 5.1, AKA Windows XP, released 2001, EOL since 2019:
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-CA) AppleWebKit/534.13 (KHTML like Gecko) Chrome/9.0.597.98 Safari/534.13
Really? Here's Firefox 3.6, released 14 years ago, there were quite a lot of those:
Mozilla/5.0 (Windows; U; Windows NT 6.1; lt; rv:1.9.2) Gecko/20100115 Firefox/3.6
I remember running those old Firefox releases, those were the days. But to me, those look like entirely fake UAs, deliberately rotated to make it look like legitimate traffic. In comparison, Facebook seemed a bit more legit, in the sense that they don't fake it. most hits are from:
meta-externalagent/1.1 (+https://developers.facebook.com/docs/sharing/webmasters/crawler)
which, according their documentation:
crawls the web for use cases such as training AI models or improving products by indexing content directly
From what I could tell, it was even respecting our rather liberal robots.txt rules, in that it wasn't crawling the sprawling /blame/ or /commit/ endpoints, explicitly forbidden by robots.txt. So I've blocked the Facebook bot in robots.txt and, amazingly, it just went away. Good job Facebook, as much as I think you've given the empire to neo-nazis, cause depression and genocide, you know how to run a crawler, thanks. Huawei was blocked at the web server level, with a friendly 429 status code telling people to contact us (over email) if they need help. And they don't care: they're still hammering the server, from what I can tell, but then again, I didn't block the entire ASN just yet, just the blocks I found crawling the server over a couple hours.

A full asncounter run So what does a day in asncounter look like? Well, you start with a problem, say you're getting too much traffic and want to see where it's from. First you need to sample it. Typically, you'd do that with tcpdump or tailing a log file:
tail -F /var/log/apache2/*access*.log   awk ' print $2 '   asncounter
If you have lots of traffic or care about your users' privacy, you're not going to log IP addresses, so tcpdump is likely a good option instead:
tcpdump -q -n   asncounter --input-format=tcpdump --repl
If you really get a lot of traffic, you might want to get a subset of that to avoid overwhelming asncounter, it's not fast enough to do multiple gigabit/second, I bet, so here's only incoming SYN IPv4 packets:
tcpdump -q -n -Q in "tcp and tcp[tcpflags] & tcp-syn != 0 and (port 80 or port 443)"   asncounter --input-format=tcpdump --repl
In any case, at this point you're staring at a process, just sitting there. If you passed the --repl or --manhole arguments, you're lucky: you have a Python shell inside the program. Otherwise, send SIGHUP to the thing to have it dump the nice tables out:
pkill -HUP asncounter
Here's an example run:
> awk ' print $2 ' /var/log/apache2/*access*.log   asncounter
INFO: using datfile ipasn_20250527.1600.dat.gz
INFO: collecting addresses from <stdin>
INFO: loading datfile /home/anarcat/.cache/pyasn/ipasn_20250527.1600.dat.gz...
INFO: finished reading data
INFO: loading /home/anarcat/.cache/pyasn/asnames.json
count   percent ASN AS
12779   69.33   66496   SAMPLE, CA
3361    18.23   None    None
366 1.99    66497   EXAMPLE, FR
337 1.83    16276   OVH, FR
321 1.74    8075    MICROSOFT-CORP-MSN-AS-BLOCK, US
309 1.68    14061   DIGITALOCEAN-ASN, US
128 0.69    16509   AMAZON-02, US
77  0.42    48090   DMZHOST, GB
56  0.3 136907  HWCLOUDS-AS-AP HUAWEI CLOUDS, HK
53  0.29    17621   CNCGROUP-SH China Unicom Shanghai network, CN
total: 18433
count   percent prefix  ASN AS
12779   69.33   192.0.2.0/24    66496   SAMPLE, CA
3361    18.23   None        
298 1.62    178.128.208.0/20    14061   DIGITALOCEAN-ASN, US
289 1.57    51.222.0.0/16   16276   OVH, FR
272 1.48    2001:DB8::/48   66497   EXAMPLE, FR
235 1.27    172.160.0.0/11  8075    MICROSOFT-CORP-MSN-AS-BLOCK, US
94  0.51    2001:DB8:1::/48 66497   EXAMPLE, FR
72  0.39    47.128.0.0/14   16509   AMAZON-02, US
69  0.37    93.123.109.0/24 48090   DMZHOST, GB
53  0.29    27.115.124.0/24 17621   CNCGROUP-SH China Unicom Shanghai network, CN
Those numbers are actually from my home network, not GitLab. Over there, the battle still rages on, but at least the vampire bots are banging their heads against the solid Nginx wall instead of eating the fragile heart of GitLab. We had a significant improvement in latency thanks to the Facebook and Huawei blocks... Here are the "workhorse request duration stats" for various time ranges, 20h after the block:
range mean max stdev
20h 449ms 958ms 39ms
7d 1.78s 5m 14.9s
30d 2.08s 3.86m 8.86s
6m 901ms 27.3s 2.43s
We went from two seconds mean to 500ms! And look at that standard deviation! 39ms! It was ten seconds before! I doubt we'll keep it that way very long but for now, it feels like I won a battle, and I didn't even have to setup anubis or go-away, although I suspect that will unfortunately come. Note that asncounter also supports exporting Prometheus metrics, but you should be careful with this, as it can lead to cardinal explosion, especially if you track by prefix (which can be disabled with --no-prefixes . Folks interested in more details should read the fine manual for more examples, usage, and discussion. It shows, among other things, how to effectively block lots of networks from Nginx, aggregate multiple prefixes, block entire ASNs, and more! So there you have it: I now have the tool I wish I had 20 years ago. Hopefully it will stay useful for another 20 years, although I'm not sure we'll have still have internet in 20 years. I welcome constructive feedback, "oh no you rewrote X", Grafana dashboards, bug reports, pull requests, and "hell yeah" comments. Hacker News, let it rip, I know you can give me another juicy quote for my blog. This work was done as part of my paid work for the Tor Project, currently in a fundraising drive, give us money if you like what you read.

11 May 2025

Sergio Durigan Junior: Debian Bug Squashing Party Brazil 2025

With the trixie release approaching, I had the idea back in April to organize a bug squashing party with the Debian Brasil community. I believe the outcome was very positive, and we were able to tackle and fix quite a number of release-critical bugs. This is a brief report of what we did.

A remote BSP It s not the first time I organize a BSP: back in 2019, I helped throw another similar party in Toronto. The difference this time is that, because Brazil is a big country and (perhaps most importantly) because I m not currently living there, the BSP had to be done online. I m a fan of social interactions (especially with the Brazilian community), and in my experience we usually can achieve much more when we get together in a physical place, but hey, you gotta do what you gotta do Most (if not all) of the folks interested in participating had busy weekdays, so it was decided that we would meet during the weekends and try to work on a few bugs over Jitsi. Nothing stopped people from working on bugs during the week as well, of course.

A tag to rule them all We used the bsp-2025-04-brazil usertag to mark those bugs that were touched by us. You can see the full list of bugs here, although the current list (as of 2025-05-11) is smaller than the one we had by the end of April. I don t know what happened; maybe it s some glitch with the BTS, or maybe someone removed the usertag by mistake.

Stats In total, we had:
  • 7 participants
  • 37 bugs handled. Of those,
  • 35 bugs fixed
The BSP officially started on 04 April 2025, and ended on 30 April 2025. I was able to attend meetings during two weekends; other people participated more sporadically.

Outcome As I said above, the Debian Brasil community is great and very engaged in the project. Speaking more specifically about the Debian Brasil Devel group, I can say that We have contributors with strong technical skills, and I really love that we have this inclusive, extremely technical culture where debugging and understanding things is really core to pretty much all our discussions. We already meet weekly on Thursdays to talk shop and help newcomers, so having a remote BSP with this group seemed like a logical thing to do. I m really glad to see our results and even happier to hear positive feedback from the community during the last MiniDebConf in Macei . There s some interest in organizing another BSP, this time face-to-face and during the next DebConf. I m all for it, as I love fixing bugs and having a great time with friends. If you re interested in attending, let me know. Thanks, and until next time.

2 May 2025

Daniel Lange: The Stallman wars

So, 2021 isn't bad enough yet, but don't despair, people are working to fix that:

Welcome to the Stallman wars Team Cancel: https://rms-open-letter.github.io/ (repo) Team Support: https://rms-support-letter.github.io/ (repo) Current Final stats are:

Team Cancel:  3019 signers from 1415 individual commit authors
Team Support: 6853 signers from 5418 individual commit authors
Git shortlog (Top 10):
rms_cancel.git (Last update: 2021-08-16 00:11:15 (UTC))
  1230  Neil McGovern
   251  Joan Touzet
    99  Elana Hashman
    73  Molly de Blanc
    36  Shauna
    19  Juke
    18  Stefano Zacchiroli
    17  Alexey Mirages
    16  Devin Halladay
    14  Nader Jafari
rms_support.git (Last update: 2021-09-29 07:14:39 (UTC))
  1821  shenlebantongying
  1585  nukeop
  1560  Ivanq
  1057  Victor
   880  Job Bautista
   123  nekonee
   101  Victor Gridnevsky
    41  Patrick Spek
    25  Borys Kabakov
    17  KIM Taeyeob
(data as of 2021-10-01) Technical info:
Signers are counted from their "Signed / Individuals" sections. Commits are counted with git shortlog -s.
Team Cancel also has organizational signatures with Mozilla, Suse and X.Org being among the notable signatories. The 16 original signers of the Cancel petition are added in their count. Neil McGovern, Juke and shenlebantongying need .mailmap support as they have committed with different names. Further reading: 12.04.2021 Statements from the accused 18.04.2021 Debian General Resolution The Debian General Resolution (GR) vote of the developers has concluded to not issue a public statement at all, see https://www.debian.org/vote/2021/vote_002#outcome for the results.
It is better to keep quiet and seem ignorant than to speak up and remove all doubt.
See Quote Investigator for the many people that rephrased these words over the centuries. They still need to be recalled more often as too many people in the FLOSS community have forgotten about that wisdom... 01.10.2021 Final stats It seems enough dust has settled on this unfortunate episode of mob activity now. Hence I stopped the cronjob that updated the stats above regularly. Team Support has kept adding signature all the time while Team Cancel gave up very soon after the FSF decided to stand with Mr. Stallman. So this battle was decided within two months. The stamina of the accused and determined support from some dissenting web devs trumped the orchestrated outrage of well known community figures and their publicity power this time. But history teaches us that does not mean the war is over. There will a the next opportunity to call for arms. And people will call. Unfortunately. 01.11.2024 Team Cancel is opening a new round; Team Support responds with exposing the author of "The Stallman report" I hate to be right. Three years later than the above: An anonymous member of team Cancel has published https://stallman-report.org/ [local pdf mirror, 504kB] to "justify our unqualified condemnation of Richard Stallman". It contains a detailed collection of quotes that are used to allege supporting (sexual) misconduct. The demand is again that Mr. Stallman "step[s] down from all positions at the FSF and the GNU project". Addressing him: "the scope and extent of your misconduct disqualifies you from formal positions of power within our community indefinitely". Team Support has not issues a rebuttal (yet?) but has instead identified the anonymous author as Drew "sircmpwn" DeVault, a gifted software developer, but also a vocal and controversial figure in the Open Source / Free Software space. Ironically quite similar to Richard "rms" Stallman. Their piece is published at https://dmpwn.info/ [local pdf mirror, 929kB]. They also allege a proximity of Mr. DeVault to questionable "Lolita" anime preferences and societal positions to disqualify him.

1 April 2025

Colin Watson: Free software activity in March 2025

Most of my Debian contributions this month were sponsored by Freexian. You can also support my work directly via Liberapay. OpenSSH Changes in dropbear 2025.87 broke OpenSSH s regression tests. I cherry-picked the fix. I reviewed and merged patches from Luca Boccassi to send and accept the COLORTERM and NO_COLOR environment variables. Python team Following up on last month, I fixed some more uscan errors: I upgraded these packages to new upstream versions: In bookworm-backports, I updated python-django to 3:4.2.19-1. Although Debian s upgrade to python-click 8.2.0 was reverted for the time being, I fixed a number of related problems anyway since we re going to have to deal with it eventually: dh-python dropped its dependency on python3-setuptools in 6.20250306, which was long overdue, but it had quite a bit of fallout; in most cases this was simply a question of adding build-dependencies on python3-setuptools, but in a few cases there was a missing build-dependency on python3-typing-extensions which had previously been pulled in as a dependency of python3-setuptools. I fixed these bugs resulting from this: We agreed to remove python-pytest-flake8. In support of this, I removed unnecessary build-dependencies from pytest-pylint, python-proton-core, python-pyzipper, python-tatsu, python-tatsu-lts, and python-tinycss, and filed #1101178 on eccodes-python and #1101179 on rpmlint. There was a dnspython autopkgtest regression on s390x. I independently tracked that down to a pylsqpack bug and came up with a reduced test case before realizing that Pranav P had already been working on it; we then worked together on it and I uploaded their patch to Debian. I fixed various other build/test failures: I enabled more tests in python-moto and contributed a supporting fix upstream. I sponsored Maximilian Engelhardt to reintroduce zope.sqlalchemy. I fixed various odds and ends of bugs: I contributed a small documentation improvement to pybuild-autopkgtest(1). Rust team I upgraded rust-asn1 to 0.20.0. Science team I finally gave in and joined the Debian Science Team this month, since it often has a lot of overlap with the Python team, and Freexian maintains several packages under it. I fixed a uscan error in hdf5-blosc (maintained by Freexian), and upgraded it to a new upstream version. I fixed python-vispy: missing dependency on numpy abi. Other bits and pieces I fixed debconf should automatically be noninteractive if input is /dev/null. I fixed a build failure with GCC 15 in yubihsm-shell (maintained by Freexian). Prompted by a CI failure in debusine, I submitted a large batch of spelling fixes and some improved static analysis to incus (#1777, #1778) and distrobuilder. After regaining access to the repository, I fixed telegnome: missing app icon in About dialogue and made a new 0.3.7 release.

5 March 2025

Otto Kek l inen: Will decentralized social media soon go mainstream?

Featured image of post Will decentralized social media soon go mainstream?In today s digital landscape, social media is more than just a communication tool it is the primary medium for global discourse. Heads of state, corporate leaders and cultural influencers now broadcast their statements directly to the world, shaping public opinion in real time. However, the dominance of a few centralized platforms X/Twitter, Facebook and YouTube raises critical concerns about control, censorship and the monopolization of information. Those who control these networks effectively wield significant power over public discourse. In response, a new wave of distributed social media platforms has emerged, each built on different decentralized protocols designed to provide greater autonomy, censorship resistance and user control. While Wikipedia maintains a comprehensive list of distributed social networking software and protocols, it does not cover recent blockchain-based systems, nor does it highlight which have the most potential for mainstream adoption. This post explores the leading decentralized social media platforms and the protocols they are based on: Mastodon (ActivityPub), Bluesky (AT Protocol), Warpcast (Farcaster), Hey (Lens) and Primal (Nostr).

Comparison of architecture and mainstream adoption potential
Protocol Identity System Example Storage model Cost for end users Potential
Mastodon Tied to server domain @ottok@mastodon.social Federated instances Free (some instances charge) High
Bluesky Portable (DID) ottoke.bsky.social Federated instances Free Moderate
Farcaster ENS (Ethereum) @ottok Blockchain + off-chain Small gas fees Moderate
Lens NFT-based (Polygon) @ottok Blockchain + off-chain Small gas fees Niche
Nostr Cryptographic Keys npub16lc6uhqpg6dnqajylkhwuh3j7ynhcnje508tt4v6703w9kjlv9vqzz4z7f Federated instances Free (some instances charge) Niche

1. Mastodon (ActivityPub) Screenshot of Mastodon Mastodon was created in 2016 by Eugen Rochko, a German software developer who sought to provide a decentralized and user-controlled alternative to Twitter. It was built on the ActivityPub protocol, now standardized by W3C Social Web Working Group, to allow users to join independent servers while still communicating across the broader Mastodon network. Mastodon operates on a federated model, where multiple independently run servers communicate via ActivityPub. Each server sets its own moderation policies, leading to a decentralized but fragmented experience. The servers can alternatively be called instances, relays or nodes, depending on what vocabulary a protocol has standardized on.
  • Identity: User identity is tied to the instance where they registered, represented as @username@instance.tld.
  • Storage: Data is stored on individual instances, which federate messages to other instances based on their configurations.
  • Cost: Free to use, but relies on instance operators willing to run the servers.
The protocol defines multiple activities such as:
  • Creating a post
  • Liking
  • Sharing
  • Following
  • Commenting

Example Message in ActivityPub (JSON-LD Format)
json
 
 "@context": "https://www.w3.org/ns/activitystreams",
 "type": "Create",
 "actor": "https://mastodon.social/users/ottok",
 "object":  
 "type": "Note",
 "content": "Hello from #Mastodon!",
 "published": "2025-03-03T12:00:00Z",
 "to": ["https://www.w3.org/ns/activitystreams#Public"]
  
 
Servers communicate across different platforms by publishing activities to their followers or forwarding activities between servers. Standard HTTPS is used between servers for communication, and the messages use JSON-LD for data representation. The WebFinger protocol is used for user discovery. There is however no neat way for home server discovery yet. This means that if you are browsing e.g. Fosstodon and want to follow a user and press Follow, a dialog will pop up asking you to enter your own home server (e.g. mastodon.social) to redirect you there for actually executing the Follow action on with your account. Mastodon is open source under the AGPL at github.com/mastodon/mastodon. Anyone can operate their own instance. It just requires to run your own server and some skills to maintain a Ruby on Rails app with a PostgreSQL database backend, and basic understanding of the protocol to configure federation with other ActivityPub instances.

Popularity: Already established, but will it grow more? Mastodon has seen steady growth, especially after Twitter s acquisition in 2022, with some estimates stating it peaked at 10 million users across thousands of instances. However, its fragmented user experience and the complexity of choosing instances have hindered mainstream adoption. Still, it remains the most established decentralized alternative to Twitter. Note that Donald Trump s Truth Social is based on the Mastodon software but does not federate with the ActivityPub network. The ActivityPub protocol is the most widely used of its kind. One of the other most popular services is the Lemmy link sharing service, similar to Reddit. The larger ecosystem of ActivityPub is called Fediverse, and estimates put the total active user count around 6 million.

2. Bluesky (AT Protocol) Screenshot of Bluesky Interestingly, Bluesky was conceived within Twitter in 2019 by Twitter founder Jack Dorsey. After being incubated as a Twitter-funded project, it spun off as an independent Public Benefit LLC in February 2022 and launched its public beta in February 2023. Bluesky runs on top of the Authenticated Transfer (AT) Protocol published at https://github.com/bluesky-social/atproto. The protocol enables portable identities and data ownership, meaning users can migrate between platforms while keeping their identity and content intact. In practice, however, there is only one popular server at the moment, which is Bluesky itself.
  • Identity: Usernames are domain-based (e.g., @user.bsky.social).
  • Storage: Content is theoretically federated among various servers.
  • Cost: Free to use, but relies on instance operators willing to run the servers.

Example Message in AT Protocol (JSON Format)
json
 
 "repo": "did:plc:ottoke.bsky.social",
 "collection": "app.bsky.feed.post",
 "record":  
 "$type": "app.bsky.feed.post",
 "text": "Hello from Bluesky!",
 "createdAt": "2025-03-03T12:00:00Z",
 "langs": ["en"]
  
 

Popularity: Hybrid approach may have business benefits? Bluesky reported over 3 million users by 2024, probably getting traction due to its Twitter-like interface and Jack Dorsey s involvement. Its hybrid approach decentralized identity with centralized components could make it a strong candidate for mainstream adoption, assuming it can scale effectively.

3. Warpcast (Farcaster Network) Farcaster was launched in 2021 by Dan Romero and Varun Srinivasan, both former crypto exchange Coinbase executives, to create a decentralized but user-friendly social network. Built on the Ethereum blockchain, it could potentially offer a very attack-resistant communication medium. However, in my own testing, Farcaster does not seem to fully leverage what Ethereum could offer. First of all, there is no diversity in programs implementing the protocol as at the moment there is only Warpcast. In Warpcast the signup requires an initial 5 USD fee that is not payable in ETH, and users need to create a new wallet address on the Ethereum layer 2 network Base instead of simply reusing their existing Ethereum wallet address or ENS name. Despite this, I can understand why Farcaster may have decided to start out like this. Having a single client program may be the best strategy initially. One of the decentralized chat protocol Matrix founders, Matthew Hodgson, shared in his FOSDEM 2025 talk that he slightly regrets focusing too much on developing the protocol instead of making sure the app to use it is attractive to end users. So it may be sensible to ensure Warpcast gets popular first, before attempting to make the Farcaster protocol widely used. As a protocol Farcaster s hybrid approach makes it more scalable than fully on-chain networks, giving it a higher chance of mainstream adoption if it integrates seamlessly with broader Web3 ecosystems.
  • Identity: ENS (Ethereum Name Service) domains are used as usernames.
  • Storage: Messages are stored in off-chain hubs, while identity is on-chain.
  • Cost: Users must pay gas fees for some operations but reading and posting messages is mostly free.

Example Message in Farcaster (JSON Format)
json
 
 "fid": 766579,
 "username": "ottok",
 "custodyAddress": "0x127853e48be3870172baa4215d63b6d815d18f21",
 "connectedWallet": "0x3ebe43aa3ae5b891ca1577d9c49563c0cee8da88",
 "text": "Hello from Farcaster!",
 "publishedAt": 1709424000,
 "replyTo": null,
 "embeds": []
 

Popularity: Decentralized social media + decentralized payments a winning combo? Ethereum founder Vitalik Buterin (warpcast.com/vbuterin) and many core developers are active on the platform. Warpcast, the main client for Farcaster, has seen increasing adoption, especially among Ethereum developers and Web3 enthusiasts. I too have an profile at warpcast.com/ottok. However, the numbers are still very low and far from reaching network effects to really take off. Blockchain-based social media networks, particularly those built on Ethereum, are compelling because they leverage existing user wallets and persistent identities while enabling native payment functionality. When combined with decentralized content funding through micropayments, these blockchain-backed social networks could offer unique advantages that centralized platforms may find difficult to replicate, being decentralized both as a technical network and in a funding mechanism.

4. Hey.xyz (Lens Network) The Lens Protocol was developed by decentralized finance (DeFi) team Aave and launched in May 2022 to provide a user-owned social media network. While initially built on Polygon, it has since launched its own Layer 2 network called the Lens Network in February 2024. Lens is currently the main competitor to Farcaster. Lens stores profile ownership and references on-chain, while content is stored on IPFS/Arweave, enabling composability with DeFi and NFTs.
  • Identity: Profile ownership is tied to NFTs on the Polygon blockchain.
  • Storage: Content is on-chain and integrates with IPFS/Arweave (like NFTs).
  • Cost: Users must pay gas fees for some operations but reading and posting messages is mostly free.

Example Message in Lens (JSON Format)
json
 
 "profileId": "@ottok",
 "contentURI": "ar://QmExampleHash",
 "collectModule": "0x23b9467334bEb345aAa6fd1545538F3d54436e96",
 "referenceModule": "0x0000000000000000000000000000000000000000",
 "timestamp": 1709558400
 

Popularity: Probably not as social media site, but maybe as protocol? The social media side of Lens is mainly the Hey.xyz website, which seems to have fewer users than Warpcast, and is even further away from reaching critical mass for network effects. The Lens protocol however has a lot of advanced features and it may gain adoption as the building block for many Web3 apps.

5. Primal.net (Nostr Network) Nostr (Notes and Other Stuff Transmitted by Relays) was conceptualized in 2020 by an anonymous developer known as fiatjaf. One of the primary design tenets was to be a censorship-resistant protocol and it is popular among Bitcoin enthusiasts, with Jack Dorsey being one of the public supporters. Unlike the Farcaster and Lens protocols, Nostr is not blockchain-based but just a network of relay servers for message distribution. If does however use public key cryptography for identities, similar to how wallets work in crypto.
  • Identity: Public-private key pairs define identity (with prefix npub...).
  • Storage: Content is federated among multiple servers, which in Nostr vocabulary are called relays.
  • Cost: No gas fees, but relies on relay operators willing to run the servers.

Example Message in Nostr (JSON Format)
json
 
 "id": "note1xyz...",
 "pubkey": "npub1...",
 "kind": 1,
 "content": "Hello from Nostr!",
 "created_at": 1709558400,
 "tags": [],
 "sig": "sig1..."
 

Popularity: If Jack Dorsey and Bitcoiners promote it enough? Primal.net as a web app is pretty solid, but it does not stand out much. While Jack Dorsey has shown support by donating $1.5 million to the protocol development in December 2021, its success likely depends on broader adoption by the Bitcoin community.

Will any of these replace X/Twitter? As usage patterns vary, the statistics are not fully comparable, but this overview of the situation in March 2025 gives a decent overview.
Platform Total Accounts Active Users Growth Trend
Mastodon ~10 million ~1 million Steady
Bluesky ~33 million ~1 million Steady
Nostr ~41 million ~20 thousand Steady
Farcaster ~850 thousand ~50 thousand Flat
Lens ~140 thousand ~20 thousand Flat
Mastodon and Bluesky have already reached millions of users, while Lens and Farcaster are growing within crypto communities. It is however clear that none of these are anywhere close to how popular X/Twitter is. In particular, Mastodon had a huge influx of users in the fall of 2022 when Twitter was acquired, but to challenge the incumbents the growth would need to significantly accelerate. We can all accelerate this development by embracing decentralized social media now alongside existing dominant platforms. Who knows, given the right circumstances maybe X.com leadership decides to change the operating model and start federating contents to break out from a walled garden model. The likelyhood of such development would increase if decentralized networks get popular, and the encumbents feel they need to participate to not lose out.

Past and future The idea of decentralized social media is not new. One early pioneer identi.ca launched in 2008, only two years after Twitter, using the OStatus protocol to promote decentralization. A few years later it evolved into pump.io with the ActivityPump protocol, and also forked into GNU Social that continued with OStatus. I remember when these happened, and that in 2010 also Diaspora launched with fairly large publicity. Surprisingly both of these still operate (I can still post both on identi.ca and diasp.org), but the activity fizzled out years ago. The protocol however survived partially and evolved into ActivityPub, which is now the backbone of the Fediverse. The evolution of decentralized social media over the next decade will likely parallel developments in democracy, freedom of speech and public discourse. While the early 2010s emphasized maximum independence and freedom, the late 2010s saw growing support for content moderation to combat misinformation. The AI era introduces new challenges, potentially requiring proof-of-humanity verification for content authenticity. Key factors that will determine success:
  • User experience and ease of onboarding
  • Network effects and critical mass of users
  • Integration with existing web3 infrastructure
  • Balance between decentralization and usability
  • Sustainable economic models for infrastructure
This is clearly an area of development worth monitoring closely, as the next few years may determine which protocol becomes the de facto standard for decentralized social communication.

11 February 2025

Freexian Collaborators: Debian Contributions: Python 3.13 as the default Python 3 version, Fixing qtpaths6 for cross compilation, sbuild support for Salsa CI, Rails 7 transition, DebConf preparations and more! (by Anupa Ann Joseph)

Debian Contributions: 2025-01 Contributing to Debian is part of Freexian s mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

Python 3.13 is now the default Python 3 version in Debian, by Stefano Rivera and Colin Watson The Python 3.13 as default transition has now completed. The next step is to remove Python 3.12 from the archive, which should be very straightforward, it just requires rebuilding C extension packages in no particular order. Stefano fixed some miscellaneous bugs blocking the completion of the 3.13 as default transition.

Fixing qtpaths6 for cross compilation, by Helmut Grohne While Qt5 used to use qmake to query installation properties, Qt6 is moving more and more to CMake and to ease that transition it relies on more qtpaths. Since this tool is not naturally aware of the architecture it is called for, it tends to produce results for the build architecture. Therefore, more than 100 packages were picking up a multiarch directory for the build architecture during cross builds. In collaboration with the Qt/KDE team and Sandro Knau in particular (none affiliated with Freexian), we added an architecture-specific wrapper script in the same way qmake has one for Qt5 and Qt6 already. The relevant CMake module has been updated to prefer the triplet-prefixed wrapper. As a result, most of the KDE packages now cross build on unstable ready in time for the trixie release.

/usr-move, by Helmut Grohne In December, Emil S dergren reported that a live-build was not working for him and in January, Colin Watson reported that the proposed mitigation for debian-installer-utils would practically fail. Both failures were to be attributed to a wrong understanding of implementation-defined behavior in dpkg-divert. As a result, all M18 mitigations had to be reviewed and many of them replaced. Many have been uploaded already and all instances have received updated patches. Even though dumat has been in operation for more than a year, it gained recent changes. For one thing, analysis of architectures other than amd64 was requested. Chris Hofstaedler (not affiliated with Freexian) kindly provided computing resources for repeatedly running it on the larger set. Doing so revealed various cross-architecture undeclared file conflicts in gcc, glibc, and binutils-z80, but it also revealed a previously unknown /usr-move issue in rpi.rpi-common. On top of that, dumat produced false positive diagnostics and wrongly associated Debian bugs in some cases, both of which have now been fixed. As a result, a supposedly fixed python3-sepolicy issue had to be reopened.

rebootstrap, by Helmut Grohne As much as we think of our base system as stable, it is changing a lot and the architecture cross bootstrap tooling is very sensitive to such changes requiring permanent maintenance. A problem that recently surfaced was that building a binutils cross toolchain would result in a binutils-for-host package that would not be practically installable as it would depend on a binutils-common package that was not built. This turned into an examination of binutils-common and noticing that it actually differed across architectures even though it should not. Johannes Schauer Marin Rodrigues (not affiliated with Freexian) and Colin Watson kindly helped brainstorm possible solutions. Eventually, Helmut provided a patch to move gprofng bits out of binutils-common. Independently, Matthias Klose (not affiliated with Freexian) split out binutils-gold into a separate source package. As a result, binutils-common is now equal across architectures and can be marked Multi-Arch: foreign resolving the initial problem.

Salsa CI, by Santiago Ruano Rinc n Santiago continued the work about the sbuild support for Salsa CI, that was mentioned in the previous month report. The !568 merge request that created the new build image was merged, making it easier to test !569 with external projects. Santiago used a fork of the debusine repo to try the draft !569, and some issues were spotted, and part of them fixed. This is the last debusine pipeline run with the current !569: https://salsa.debian.org/santiago/debusine/-/pipelines/794233. One of the last improvements relates to how to enable projects to customize the pipeline, in an equivalent way than they currently do in the extract-source and build jobs. While this is work-in-progress, the results are rather promising. Next steps include deciding on introducing schroot support for bookworm, bookworm-security, and older releases, as they are done in the official debian buildd.

DebConf preparations, by Stefano Rivera and Santiago Ruano Rinc n DebConf will be happening in Brest, France, in July. Santiago continued the DebConf 25 organization work, looking for catering providers. Both Stefano and Santiago have been reaching out to some potential sponsors. DebConf depends on sponsors to cover the organization cost, if your company depends on Debian, please consider sponsoring DebConf. Stefano has been winding up some of the finances from previous DebConfs. Finalizing reimbursements to team members from DebConf 23, and handling some outstanding issues from DebConf 24. Stefano and the rest of the DebConf committee have been reviewing bids for DebConf 26, to select the next venue.

Ruby 3.3 is now the default Ruby interpreter, by Lucas Kanashiro Ruby 3.3 is about to become the default Ruby interpreter for Trixie. Many bugs were fixed by Lucas and the Debian Ruby team during the sprint hold in Paris during Jan 27-31. The next step is to remove support of Ruby 3.1, which is the alternative Ruby interpreter for now. Thanks to the Debian Release team for all the support, especially Emilio Pozuelo Monfort.

Rails 7 transition, by Lucas Kanashiro Rails 6 has been shipped by Debian since Bullseye, and as a WEB framework, many issues (especially security related issues) have been encountered and the maintainability of it becomes harder and harder. With that in mind, during the Debian Ruby team sprint last month, the transition to Rack 3 (an important dependency of rails containing many breaking changes) was started in Debian unstable, it is ongoing. Once it is done, the Rails 7 transition will take place, and Rails 7 should be shipped in Debian Trixie.

Miscellaneous contributions
  • Stefano improved a poor ImportError for users of the turtle module on Python 3, who haven t installed the python3-tk package.
  • Stefano updated several packages to new upstream releases.
  • Stefano added the Python extension to the re2 package, allowing for the use of the Google RE2 regular expression library as a direct replacement for the standard library re module.
  • Stefano started provisioning a new physical server for the debian.social infrastructure.
  • Carles improved simplemonitor (documentation on systemd integration, worked with upstream for fixing a bug).
  • Carles upgraded packages to new upstream versions: python-ring-doorbell and python-asyncclick.
  • Carles did po-debconf translations to Catalan: reviewed 44 packages and submitted translations to 90 packages (via salsa merge requests or bugtracker bugs).
  • Carles maintained po-debconf-manager with small fixes.
  • Rapha l worked on some outstanding DEP-14 merge request and participated in the associated discussion. The discussions have been more contentious than anticipated, somewhat exacerbated by Otto s desire to conclude fast while the required tool support is not yet there.
  • Rapha l, with the help of Philipp Kern from the DSA team, upgraded tracker.debian.org to use Django 4.2 (from bookworm-backports) which in turn enabled him to configure authentication via salsa.debian.org. It s now possible to login to tracker.debian.org with your salsa credentials!
  • Rapha l updated zim a nice desktop wiki that is very handy to organize your day-to-day digital life to the latest upstream version (0.76).
  • Helmut sent patches for 10 cross build failures.
  • Helmut continued working on a tool for memory-based concurrency limit of builds.
  • Helmut NMUed libtool, opensysusers and virtualbox.
  • Enrico tried to support Helmut in working out tricky usrmerge situations
  • Thorsten Alteholz uploaded a new upstream version of brlaser.
  • Colin Watson upgraded 33 Python packages to new upstream versions, including fixes for CVE-2024-42353, CVE-2024-47532, and CVE-2025-22153.
  • Emilio Pozuelo managed various transitions, and fixed various RC bugs (telepathy-glib, xorg, xserver-xorg-video-vesa, apitrace, mesa).
  • Anupa attended the monthly team meeting for Debian publicity team and shared the social media stats.
  • Anupa assisted Jean-Pierre Giraud in the point release announcement for Debian 12.9 and published the Micronews.
  • Anupa took part in multiple Debian publicity team discussions regarding our presence in social media platforms.

9 February 2025

Antoine Beaupr : A slow blogging year

Well, 2024 will be remembered, won't it? I guess 2025 already wants to make its mark too, but let's not worry about that right now, and instead let's talk about me. A little over a year ago, I was gloating over how I had such a great blogging year in 2022, and was considering 2023 to be average, then went on to gather more stats and traffic analysis... Then I said, and I quote:
I hope to write more next year. I've been thinking about a few posts I could write for work, about how things work behind the scenes at Tor, that could be informative for many people. We run a rather old setup, but things hold up pretty well for what we throw at it, and it's worth sharing that with the world...
What a load of bollocks.

A bad year for this blog 2024 was the second worst year ever in my blogging history, tied with 2009 at a measly 6 posts for the year:
anarcat@angela:anarc.at$ curl -sSL https://anarc.at/blog/   grep 'href="\./'   grep -o 20[0-9][0-9]   sort   uniq -c   sort -nr   grep -v 2025   tail -3
      6 2024
      6 2009
      3 2014
I did write about my work though, detailing the migration from Gitolite to GitLab we completed that year. But after August, total radio silence until now.

Loads of drafts It's not that I have nothing to say: I have no less than five drafts in my working tree here, not counting three actual drafts recorded in the Git repository here:
anarcat@angela:anarc.at$ git s blog
## main...origin/main
?? blog/bell-bot.md
?? blog/fish.md
?? blog/kensington.md
?? blog/nixos.md
?? blog/tmux.md
anarcat@angela:anarc.at$ git grep -l '\!tag draft'
blog/mobile-massive-gallery.md
blog/on-dying.mdwn
blog/secrets-recovery.md
I just don't have time to wrap those things up. I think part of me is disgusted by seeing my work stolen by large corporations to build proprietary large language models while my idols have been pushed to suicide for trying to share science with the world. Another part of me wants to make those things just right. The "tagged drafts" above are nothing more than a huge pile of chaotic links, far from being useful for anyone else than me, and even then. The on-dying article, in particular, is becoming my nemesis. I've been wanting to write that article for over 6 years now, I think. It's just too hard.

Writing elsewhere There's also the fact that I write for work already. A lot. Here are the top-10 contributors to our team's wiki:
anarcat@angela:help.torproject.org$ git shortlog --numbered --summary --group="format:%al"   head -10
  4272  anarcat
   423  jerome
   117  zen
   116  lelutin
   104  peter
    58  kez
    45  irl
    43  hiro
    18  gaba
    17  groente
... but that's a bit unfair, since I've been there half a decade. Here's the last year:
anarcat@angela:help.torproject.org$ git shortlog --since=2024-01-01 --numbered --summary --group="format:%al"   head -10
   827  anarcat
   117  zen
   116  lelutin
    91  jerome
    17  groente
    10  gaba
     8  micah
     7  kez
     5  jnewsome
     4  stephen.swift
So I still write the most commits! But to truly get a sense of the amount I wrote in there, we should count actual changes. Here it is by number of lines (from commandlinefu.com):
anarcat@angela:help.torproject.org$ git ls-files   xargs -n1 git blame --line-porcelain   sed -n 's/^author //p'   sort -f   uniq -ic   sort -nr   head -10
  99046 Antoine Beaupr 
   6900 Zen Fu
   4784 J r me Charaoui
   1446 Gabriel Filion
   1146 Jerome Charaoui
    837 groente
    705 kez
    569 Gaba
    381 Matt Traudt
    237 Stephen Swift
That, of course, is the entire history of the git repo, again. We should take only the last year into account, and probably ignore the tails directory, as sneaky Zen Fu imported the entire docs from another wiki there...
anarcat@angela:help.torproject.org$ find [d-s]* -type f -mtime -365   xargs -n1 git blame --line-porcelain 2>/dev/null   sed -n 's/^author //p'   sort -f   uniq -ic   sort -nr   head -10
  75037 Antoine Beaupr 
   2932 J r me Charaoui
   1442 Gabriel Filion
   1400 Zen Fu
    929 Jerome Charaoui
    837 groente
    702 kez
    569 Gaba
    381 Matt Traudt
    237 Stephen Swift
Pretty good! 75k lines. But those are the files that were modified in the last year. If we go a little more nuts, we find that:
anarcat@angela:help.torproject.org$ $ git-count-words-range.py    sort -k6 -nr   head -10
parsing commits for words changes from command: git log '--since=1 year ago' '--format=%H %al'
anarcat 126116 - 36932 = 89184
zen 31774 - 5749 = 26025
groente 9732 - 607 = 9125
lelutin 10768 - 2578 = 8190
jerome 6236 - 2586 = 3650
gaba 3164 - 491 = 2673
stephen.swift 2443 - 673 = 1770
kez 1034 - 74 = 960
micah 772 - 250 = 522
weasel 410 - 0 = 410
I wrote 126,116 words in that wiki, only in the last year. I also deleted 37k words, so the final total is more like 89k words, but still: that's about forty (40!) articles of the average size (~2k) I wrote in 2022. (And yes, I did go nuts and write a new log parser, essentially from scratch, to figure out those word diffs. I did get the courage only after asking GPT-4o for an example first, I must admit.) Let's celebrate that again: I wrote 90 thousand words in that wiki in 2024. According to Wikipedia, a "novella" is 17,500 to 40,000 words, which would mean I wrote about a novella and a novel, in the past year. But interestingly, if I look at the repository analytics. I certainly didn't write that much more in the past year. So that alone cannot explain the lull in my production here.

Arguments Another part of me is just tired of the bickering and arguing on the internet. I have at least two articles in there that I suspect is going to get me a lot of push-back (NixOS and Fish). I know how to deal with this: you need to write well, consider the controversy, spell it out, and defuse things before they happen. But that's hard work and, frankly, I don't really care that much about what people think anymore. I'm not writing here to convince people. I have stop evangelizing a long time ago. Now, I'm more into documenting, and teaching. And, while teaching, there's a two-way interaction: when you give out a speech or workshop, people can ask questions, or respond, and you all learn something. When you document, you quickly get told "where is this? I couldn't find it" or "I don't understand this" or "I tried that and it didn't work" or "wait, really? shouldn't we do X instead", and you learn. Here, it's static. It's my little soapbox where I scream in the void. The only thing people can do is scream back.

Collaboration So. Let's see if we can work together here. If you don't like something I say, disagree, or find something wrong or to be improved, instead of screaming on social media or ignoring me, try contributing back. This site here is backed by a git repository and I promise to read everything you send there, whether it is an issue or a merge request. I will, of course, still read comments sent by email or IRC or social media, but please, be kind. You can also, of course, follow the latest changes on the TPA wiki. If you want to catch up with the last year, some of the "novellas" I wrote include: (Well, no, you can't actually follow changes on a GitLab wiki. But we have a wiki-replica git repository where you can see the latest commits, and subscribe to the RSS feed.) See you there!

Antoine Beaupr : Qalculate hacks

This is going to be a controversial statement because some people are absolute nerds about this, but, I need to say it. Qalculate is the best calculator that has ever been made. I am not going to try to convince you of this, I just wanted to put out my bias out there before writing down those notes. I am a total fan. This page will collect my notes of cool hacks I do with Qalculate. Most examples are copy-pasted from the command-line interface (qalc(1)), but I typically use the graphical interface as it's slightly better at displaying complex formulas. Discoverability is obviously also better for the cornucopia of features this fantastic application ships.

Qalc commandline primer On Debian, Qalculate's CLI interface can be installed with:
apt install qalc
Then you start it with the qalc command, and end up on a prompt:
anarcat@angela:~$ qalc
> 
Then it's a normal calculator:
anarcat@angela:~$ qalc
> 1+1
  1 + 1 = 2
> 1/7
  1 / 7   0.1429
> pi
  pi   3.142
> 
There's a bunch of variables to control display, approximation, and so on:
> set precision 6
> 1/7
  1 / 7   0.142857
> set precision 20
> pi
  pi   3.1415926535897932385
When I need more, I typically browse around the menus. One big issue I have with Qalculate is there are a lot of menus and features. I had to fiddle quite a bit to figure out that set precision command above. I might add more examples here as I find them.

Bandwidth estimates I often use the data units to estimate bandwidths. For example, here's what 1 megabit per second is over a month ("about 300 GiB"):
> 1 megabit/s * 30 day to gibibyte 
  (1 megabit/second)   (30 days)   301.7 GiB
Or, "how long will it take to download X", in this case, 1GiB over a 100 mbps link:
> 1GiB/(100 megabit/s)
  (1 gibibyte) / (100 megabits/second)   1 min + 25.90 s

Password entropy To calculate how much entropy (in bits) a given password structure, you count the number of possibilities in each entry (say, [a-z] is 26 possibilities, "one word in a 8k dictionary" is 8000), extract the base-2 logarithm, multiplied by the number of entries. For example, an alphabetic 14-character password is:
> log2(26*2)*14
  log (26   2)   14   79.81
... 80 bits of entropy. To get the equivalent in a Diceware password with a 8000 word dictionary, you would need:
> log2(8k)*x = 80
  (log (8   000)   x) = 80  
  x   6.170
... about 6 words, which gives you:
> log2(8k)*6
  log (8   1000)   6   77.79
78 bits of entropy.

Exchange rates You can convert between currencies!
> 1 EUR to USD
  1 EUR   1.038 USD
Even fake ones!
> 1 BTC to USD
  1 BTC   96712 USD
This relies on a database pulled form the internet (typically the central european bank rates, see the source). It will prompt you if it's too old:
It has been 256 days since the exchange rates last were updated.
Do you wish to update the exchange rates now? y
As a reader pointed out, you can set the refresh rate for currencies, as some countries will require way more frequent exchange rates. The graphical version has a little graphical indicator that, when you mouse over, tells you where the rate comes from.

Other conversions Here are other neat conversions extracted from my history
> teaspoon to ml
  teaspoon = 5 mL
> tablespoon to ml
  tablespoon = 15 mL
> 1 cup to ml 
  1 cup   236.6 mL
> 6 L/100km to mpg
  (6 liters) / (100 kilometers)   39.20 mpg
> 100 kph to mph
  100 kph   62.14 mph
> (108km - 72km) / 110km/h
  ((108 kilometers)   (72 kilometers)) / (110 kilometers/hour)  
  19 min + 38.18 s

Completion time estimates This is a more involved example I often do.

Background Say you have started a long running copy job and you don't have the luxury of having a pipe you can insert pv(1) into to get a nice progress bar. For example, rsync or cp -R can have that problem (but not tar!). (Yes, you can use --info=progress2 in rsync, but that estimate is incremental and therefore inaccurate unless you disable the incremental mode with --no-inc-recursive, but then you pay a huge up-front wait cost while the entire directory gets crawled.)

Extracting a process start time First step is to gather data. Find the process start time. If you were unfortunate enough to forget to run date --iso-8601=seconds before starting, you can get a similar timestamp with stat(1) on the process tree in /proc with:
$ stat /proc/11232
  File: /proc/11232
  Size: 0               Blocks: 0          IO Block: 1024   directory
Device: 0,21    Inode: 57021       Links: 9
Access: (0555/dr-xr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2025-02-07 15:50:25.287220819 -0500
Modify: 2025-02-07 15:50:25.287220819 -0500
Change: 2025-02-07 15:50:25.287220819 -0500
 Birth: -
So our start time is 2025-02-07 15:50:25, we shave off the nanoseconds there, they're below our precision noise floor. If you're not dealing with an actual UNIX process, you need to figure out a start time: this can be a SQL query, a network request, whatever, exercise for the reader.

Saving a variable This is optional, but for the sake of demonstration, let's save this as a variable:
> start="2025-02-07 15:50:25"
  save("2025-02-07T15:50:25"; start; Temporary; ; 1) =
  "2025-02-07T15:50:25"

Estimating data size Next, estimate your data size. That will vary wildly with the job you're running: this can be anything: number of files, documents being processed, rows to be destroyed in a database, whatever. In this case, rsync tells me how many bytes it has transferred so far:
# rsync -ASHaXx --info=progress2 /srv/ /srv-zfs/
2.968.252.503.968  94%    7,63MB/s    6:04:58  xfr#464440, ir-chk=1000/982266) 
Strip off the weird dots in there, because that will confuse qalculate, which will count this as:
  2.968252503968 bytes   2.968 B
Or, essentially, three bytes. We actually transferred almost 3TB here:
  2968252503968 bytes   2.968 TB
So let's use that. If you had the misfortune of making rsync silent, but were lucky enough to transfer entire partitions, you can use df (without -h! we want to be more precise here), in my case:
Filesystem              1K-blocks       Used  Available Use% Mounted on
/dev/mapper/vg_hdd-srv 7512681384 7258298036  179205040  98% /srv
tank/srv               7667173248 2870444032 4796729216  38% /srv-zfs
(Otherwise, of course, you use du -sh $DIRECTORY.)

Digression over bytes Those are 1 K bytes which is actually (and rather unfortunately) Ki, or "kibibytes" (1024 bytes), not "kilobytes" (1000 bytes). Ugh.
> 2870444032 KiB
  2870444032 kibibytes   2.939 TB
> 2870444032 kB
  2870444032 kilobytes   2.870 TB
At this scale, those details matter quite a bit, we're talking about a 69GB (64GiB) difference here:
> 2870444032 KiB - 2870444032 kB
  (2870444032 kibibytes)   (2870444032 kilobytes)   68.89 GB
Anyways. Let's take 2968252503968 bytes as our current progress. Our entire dataset is 7258298064 KiB, as seen above.

Solving a cross-multiplication We have 3 out of four variables for our equation here, so we can already solve:
> (now-start)/x = (2996538438607 bytes)/(7258298064 KiB) to h
  ((actual   start) / x) = ((2996538438607 bytes) / (7258298064
  kibibytes))
  x   59.24 h
The entire transfer will take about 60 hours to complete! Note that's not the time left, that is the total time. To break this down step by step, we could calculate how long it has taken so far:
> now-start
  now   start   23 h + 53 min + 6.762 s
> now-start to s
  now   start   85987 s
... and do the cross-multiplication manually, it's basically:
x/(now-start) = (total/current)
so:
x = (total/current) * (now-start)
or, in Qalc:
> ((7258298064  kibibytes) / ( 2996538438607 bytes) ) *  85987 s
  ((7258298064 kibibytes) / (2996538438607 bytes))   (85987 secondes)  
  2 d + 11 h + 14 min + 38.81 s
It's interesting it gives us different units here! Not sure why.

Now and built-in variables The now here is actually a built-in variable:
> now
  now   "2025-02-08T22:25:25"
There is a bewildering list of such variables, for example:
> uptime
  uptime = 5 d + 6 h + 34 min + 12.11 s
> golden
  golden   1.618
> exact
  golden = ( (5) + 1) / 2

Computing dates In any case, yay! We know the transfer is going to take roughly 60 hours total, and we've already spent around 24h of that, so, we have 36h left. But I did that all in my head, we can ask more of Qalc yet! Let's make another variable, for that total estimated time:
> total=(now-start)/x = (2996538438607 bytes)/(7258298064 KiB)
  save(((now   start) / x) = ((2996538438607 bytes) / (7258298064
  kibibytes)); total; Temporary; ; 1)  
  2 d + 11 h + 14 min + 38.22 s
And we can plug that into another formula with our start time to figure out when we'll be done!
> start+total
  start + total   "2025-02-10T03:28:52"
> start+total-now
  start + total   now   1 d + 11 h + 34 min + 48.52 s
> start+total-now to h
  start + total   now   35 h + 34 min + 32.01 s
That transfer has ~1d left, or 35h24m32s, and should complete around 4 in the morning on February 10th. But that's icing on top. I typically only do the cross-multiplication and calculate the remaining time in my head. I mostly did the last bit to show Qalculate could compute dates and time differences, as long as you use ISO timestamps. Although it can also convert to and from UNIX timestamps, it cannot parse arbitrary date strings (yet?).

Other functionality Qalculate can:
  • Plot graphs;
  • Use RPN input;
  • Do all sorts of algebraic, calculus, matrix, statistics, trigonometry functions (and more!);
  • ... and so much more!
I have a hard time finding things it cannot do. When I get there, I typically need to resort to programming code in Python, use a spreadsheet, and others will turn to more complete engines like Maple, Mathematica or R. But for daily use, Qalculate is just fantastic. And it's pink! Use it!

Further reading and installation This is just scratching the surface, the fine manual has more information, including more examples. There is also of course a qalc(1) manual page which also ships an excellent EXAMPLES section. Qalculate is packaged for over 30 Linux distributions, but also ships packages for Windows and MacOS. There are third-party derivatives as well including a web version and an Android app.

Updates Colin Watson liked this blog post and was inspired to write his own hacks, similar to what's here, but with extras, check it out!

5 February 2025

Reproducible Builds: Reproducible Builds in January 2025

Welcome to the first report in 2025 from the Reproducible Builds project! Our monthly reports outline what we ve been up to over the past month and highlight items of news from elsewhere in the world of software supply-chain security when relevant. As usual, though, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. Table of contents:
  1. reproduce.debian.net
  2. Two new academic papers
  3. Distribution work
  4. On our mailing list
  5. Upstream patches
  6. diffoscope
  7. Website updates
  8. Reproducibility testing framework

reproduce.debian.net The last few months saw the introduction of reproduce.debian.net. Announced at the recent Debian MiniDebConf in Toulouse, reproduce.debian.net is an instance of rebuilderd operated by the Reproducible Builds project. Powering that is rebuilderd, our server designed monitor the official package repositories of Linux distributions and attempt to reproduce the observed results there. This month, however, we are pleased to announce that in addition to the existing amd64.reproduce.debian.net and i386.reproduce.debian.net architecture-specific pages, we now build for a three more architectures (for a total of five) arm64 armhf and riscv64.

Two new academic papers Giacomo Benedetti, Oreofe Solarin, Courtney Miller, Greg Tystahl, William Enck, Christian K stner, Alexandros Kapravelos, Alessio Merlo and Luca Verderame published an interesting article recently. Titled An Empirical Study on Reproducible Packaging in Open-Source Ecosystem, the abstract outlines its optimistic findings:
[We] identified that with relatively straightforward infrastructure configuration and patching of build tools, we can achieve very high rates of reproducible builds in all studied ecosystems. We conclude that if the ecosystems adopt our suggestions, the build process of published packages can be independently confirmed for nearly all packages without individual developer actions, and doing so will prevent significant future software supply chain attacks.
The entire PDF is available online to view.
In addition, Julien Malka, Stefano Zacchiroli and Th o Zimmermann of T l com Paris in-house research laboratory, the Information Processing and Communications Laboratory (LTCI) published an article asking the question: Does Functional Package Management Enable Reproducible Builds at Scale?. Answering strongly in the affirmative, the article s abstract reads as follows:
In this work, we perform the first large-scale study of bitwise reproducibility, in the context of the Nix functional package manager, rebuilding 709,816 packages from historical snapshots of the nixpkgs repository[. We] obtain very high bitwise reproducibility rates, between 69 and 91% with an upward trend, and even higher rebuildability rates, over 99%. We investigate unreproducibility causes, showing that about 15% of failures are due to embedded build dates. We release a novel dataset with all build statuses, logs, as well as full diffoscopes: recursive diffs of where unreproducible build artifacts differ.
As above, the entire PDF of the article is available to view online.

Distribution work There as been the usual work in various distributions this month, such as:
  • 10+ reviews of Debian packages were added, 11 were updated and 10 were removed this month adding to our knowledge about identified issues. A number of issue types were updated also.
  • The FreeBSD Foundation announced that a planned project to deliver zero-trust builds has begun in January 2025 . Supported by the Sovereign Tech Agency, this project is centered on the various build processes, and that the primary goal of this work is to enable the entire release process to run without requiring root access, and that build artifacts build reproducibly that is, that a third party can build bit-for-bit identical artifacts. The full announcement can be found online, which includes an estimated schedule and other details.

On our mailing list On our mailing list this month:
  • Following-up to a substantial amount of previous work pertaining the Sphinx documentation generator, James Addison asked a question pertaining to the relationship between SOURCE_DATE_EPOCH environment variable and testing that generated a number of replies.
  • Adithya Balakumar of Toshiba asked a question about whether it is possible to make ext4 filesystem images reproducible. Adithya s issue is that even the smallest amount of post-processing of the filesystem results in the modification of the Last mount and Last write timestamps.
  • James Addison also investigated an interesting issue surrounding our disorderfs filesystem. In particular:
    FUSE (Filesystem in USErspace) filesystems such as disorderfs do not delete files from the underlying filesystem when they are deleted from the overlay. This can cause seemingly straightforward tests for example, cases that expect directory contents to be empty after deletion is requested for all files listed within them to fail.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

diffoscope diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 285, 286 and 287 to Debian:
  • Security fixes:
    • Validate the --css command-line argument to prevent a potential Cross-site scripting (XSS) attack. Thanks to Daniel Schmidt from SRLabs for the report. [ ]
    • Prevent XML entity expansion attacks. Thanks to Florian Wilkens from SRLabs for the report.. [ ][ ]
    • Print a warning if we have disabled XML comparisons due to a potentially vulnerable version of pyexpat. [ ]
  • Bug fixes:
    • Correctly identify changes to only the line-endings of files; don t mark them as Ordering differences only. [ ]
    • When passing files on the command line, don t call specialize( ) before we ve checked that the files are identical or not. [ ]
    • Do not exit with a traceback if paths are inaccessible, either directly, via symbolic links or within a directory. [ ]
    • Don t cause a traceback if cbfstool extraction failed.. [ ]
    • Use the surrogateescape mechanism to avoid a UnicodeDecodeError and crash when any decoding zipinfo output that is not UTF-8 compliant. [ ]
  • Testsuite improvements:
    • Don t mangle newlines when opening test fixtures; we want them untouched. [ ]
    • Move to assert_diff in test_text.py. [ ]
  • Misc improvements:
    • Drop unused subprocess imports. [ ][ ]
    • Drop an unused function in iso9600.py. [ ]
    • Inline a call and check of Config().force_details; no need for an additional variable in this particular method. [ ]
    • Remove an unnecessary return value from the Difference.check_for_ordering_differences method. [ ]
    • Remove unused logging facility from a few comparators. [ ]
    • Update copyright years. [ ][ ]
In addition, fridtjof added support for the ASAR .tar-like archive format. [ ][ ][ ][ ] and lastly, Vagrant Cascadian updated diffoscope in GNU Guix to version 285 [ ][ ] and 286 [ ][ ].
strip-nondeterminism is our sister tool to remove specific non-deterministic results from a completed build. This month version 1.14.1-1 was uploaded to Debian unstable by Chris Lamb, making the following the changes:
  • Clarify the --verbose and non --verbose output of bin/strip-nondeterminism so we don t imply we are normalizing files that we are not. [ ]
  • Bump Standards-Version to 4.7.0. [ ]

Website updates There were a large number of improvements made to our website this month, including:

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In January, a number of changes were made by Holger Levsen, including:
  • reproduce.debian.net-related:
    • Add support for rebuilding the armhf architecture. [ ][ ]
    • Add support for rebuilding the arm64 architecture. [ ][ ][ ][ ]
    • Add support for rebuilding the riscv64 architecture. [ ][ ]
    • Move the i386 builder to the osuosl5 node. [ ][ ][ ][ ]
    • Don t run our rebuilders on a public port. [ ][ ]
    • Add database backups on all builders and add links. [ ][ ]
    • Rework and dramatically improve the statistics collection and generation. [ ][ ][ ][ ][ ][ ]
    • Add contact info to the main page [ ], thumbnails [ ] as well as the new, missing architectures. [ ]
    • Move the amd64 worker to the osuosl4 and node. [ ]
    • Run the underlying debrebuild script under nice. [ ]
    • Try to use TMPDIR when calling debrebuild. [ ][ ]
  • buildinfos.debian.net-related:
    • Stop creating buildinfo-pool_$ suite _$ arch .list files. [ ]
    • Temporarily disable automatic updates of pool links. [ ]
  • FreeBSD-related:
    • Fix the sudoers to actually permit builds. [ ]
    • Disable debug output for FreeBSD rebuilding jobs. [ ]
    • Upgrade to FreeBSD 14.2 [ ] and document that bmake was installed on the underlying FreeBSD virtual machine image [ ].
  • Misc:
    • Update the real year to 2025. [ ]
    • Don t try to install a Debian bookworm kernel from backports on the infom08 node which is running Debian trixie. [ ]
    • Don t warn about system updates for systems running Debian testing. [ ]
    • Fix a typo in the ZOMBIES definition. [ ][ ]
In addition:
  • Ed Maste modified the FreeBSD build system to the clean the object directory before commencing a build. [ ]
  • Gioele Barabucci updated the rebuilder stats to first add a category for network errors [ ] as well as to categorise failures without a diffoscope log [ ].
  • Jessica Clarke also made some FreeBSD-related changes, including:
    • Ensuring we clean up the object directory for second build as well. [ ][ ]
    • Updating the sudoers for the relevant rm -rf command. [ ]
    • Update the cleanup_tmpdirs method to to match other removals. [ ]
  • Jochen Sprickerhof:
  • Roland Clobus:
    • Update the reproducible_debstrap job to call Debian s debootstrap with the full path [ ] and to use eatmydata as well [ ][ ].
    • Make some changes to deduce the CPU load in the debian_live_build job. [ ]
Lastly, both Holger Levsen [ ] and Vagrant Cascadian [ ] performed some node maintenance.
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

9 January 2025

Reproducible Builds: Reproducible Builds in December 2024

Welcome to the December 2024 report from the Reproducible Builds project! Our monthly reports outline what we ve been up to over the past month and highlight items of news from elsewhere in the world of software supply-chain security when relevant. As ever, however, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. Table of contents:
  1. reproduce.debian.net
  2. debian-repro-status
  3. On our mailing list
  4. Enhancing the Security of Software Supply Chains
  5. diffoscope
  6. Supply-chain attack in the Solana ecosystem
  7. Website updates
  8. Debian changes
  9. Other development news
  10. Upstream patches
  11. Reproducibility testing framework

reproduce.debian.net Last month saw the introduction of reproduce.debian.net. Announced at the recent Debian MiniDebConf in Toulouse, reproduce.debian.net is an instance of rebuilderd operated by the Reproducible Builds project. rebuilderd is our server designed monitor the official package repositories of Linux distributions and attempts to reproduce the observed results there. This month, however, we are pleased to announce that not only does the service now produce graphs, the reproduce.debian.net homepage itself has become a start page of sorts, and the amd64.reproduce.debian.net and i386.reproduce.debian.net pages have emerged. The first of these rebuilds the amd64 architecture, naturally, but it also is building Debian packages that are marked with the no architecture label, all. The second builder is, however, only rebuilding the i386 architecture. Both of these services were also switched to reproduce the Debian trixie distribution instead of unstable, which started with 43% of the archive rebuild with 79.3% reproduced successfully. This is very much a work in progress, and we ll start reproducing Debian unstable soon. Our i386 hosts are very kindly sponsored by Infomaniak whilst the amd64 node is sponsored by OSUOSL thank you! Indeed, we are looking for more workers for more Debian architectures; please contact us if you are able to help.

debian-repro-status Reproducible builds developer kpcyrd has published a client program for reproduce.debian.net (see above) that queries the status of the locally installed packages and rates the system with a percentage score. This tool works analogously to arch-repro-status for the Arch Linux Reproducible Builds setup. The tool was packaged for Debian and is currently available in Debian trixie: it can be installed with apt install debian-repro-status.

On our mailing list On our mailing list this month:
  • Bernhard M. Wiedemann wrote a detailed post on his long journey towards a bit-reproducible Emacs package. In his interesting message, Bernhard goes into depth about the tools that they used and the lower-level technical details of, for instance, compatibility with the version for glibc within openSUSE.
  • Shivanand Kunijadar posed a question pertaining to the reproducibility issues with encrypted images. Shivanand explains that they must use a random IV for encryption with AES CBC. The resulting artifact is not reproducible due to the random IV used. The message resulted in a handful of replies, hopefully helpful!
  • User Danilo posted an in interesting question related to their attempts in trying to achieve reproducible builds for Threema Desktop 2.0. The question resulted in a number of replies attempting to find the right combination of compiler and linker flags (for example).
  • Longstanding contributor David A. Wheeler wrote to our list announcing the release of the Census III of Free and Open Source Software: Application Libraries report written by Frank Nagle, Kate Powell, Richie Zitomer and David himself. As David writes in his message, the report attempts to answer the question what is the most popular Free and Open Source Software (FOSS)? .
  • Lastly, kpcyrd followed-up to a post from September 2024 which mentioned their desire for someone to implement a hashset of allowed module hashes that is generated during the kernel build and then embedded in the kernel image , thus enabling a deterministic and reproducible build. However, they are now reporting that somebody implemented the hash-based allow list feature and submitted it to the Linux kernel mailing list . Like kpcyrd, we hope it gets merged.

Enhancing the Security of Software Supply Chains: Methods and Practices Mehdi Keshani of the Delft University of Technology in the Netherlands has published their thesis on Enhancing the Security of Software Supply Chains: Methods and Practices . Their introductory summary first begins with an outline of software supply chains and the importance of the Maven ecosystem before outlining the issues that it faces that threaten its security and effectiveness . To address these:
First, we propose an automated approach for library reproducibility to enhance library security during the deployment phase. We then develop a scalable call graph generation technique to support various use cases, such as method-level vulnerability analysis and change impact analysis, which help mitigate security challenges within the ecosystem. Utilizing the generated call graphs, we explore the impact of libraries on their users. Finally, through empirical research and mining techniques, we investigate the current state of the Maven ecosystem, identify harmful practices, and propose recommendations to address them.
A PDF of Mehdi s entire thesis is available to download.

diffoscope diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 283 and 284 to Debian:
  • Update copyright years. [ ]
  • Update tests to support file 5.46. [ ][ ]
  • Simplify tests_quines.py::test_ differences,differences_deb to simply use assert_diff and not mangle the test fixture. [ ]

Supply-chain attack in the Solana ecosystem A significant supply-chain attack impacted Solana, an ecosystem for decentralised applications running on a blockchain. Hackers targeted the @solana/web3.js JavaScript library and embedded malicious code that extracted private keys and drained funds from cryptocurrency wallets. According to some reports, about $160,000 worth of assets were stolen, not including SOL tokens and other crypto assets.

Website updates Similar to last month, there was a large number of changes made to our website this month, including:
  • Chris Lamb:
    • Make the landing page hero look nicer when the vertical height component of the viewport is restricted, not just the horizontal width.
    • Rename the Buy-in page to Why Reproducible Builds? [ ]
    • Removing the top black border. [ ][ ]
  • Holger Levsen:
  • hulkoba:
    • Remove the sidebar-type layout and move to a static navigation element. [ ][ ][ ][ ]
    • Create and merge a new Success stories page, which highlights the success stories of Reproducible Builds, showcasing real-world examples of projects shipping with verifiable, reproducible builds. These stories aim to enhance the technical resilience of the initiative by encouraging community involvement and inspiring new contributions. . [ ]
    • Further changes to the homepage. [ ]
    • Remove the translation icon from the navigation bar. [ ]
    • Remove unused CSS styles pertaining to the sidebar. [ ]
    • Add sponsors to the global footer. [ ]
    • Add extra space on large screens on the Who page. [ ]
    • Hide the side navigation on small screens on the Documentation pages. [ ]

Debian changes There were a significant number of reproducibility-related changes within Debian this month, including:
  • Santiago Vila uploaded version 0.11+nmu4 of the dh-buildinfo package. In this release, the dh_buildinfo becomes a no-op ie. it no longer does anything beyond warning the developer that the dh-buildinfo package is now obsolete. In his upload, Santiago wrote that We still want packages to drop their [dependency] on dh-buildinfo, but now they will immediately benefit from this change after a simple rebuild.
  • Holger Levsen filed Debian bug #1091550 requesting a rebuild of a number of packages that were built with a very old version of dpkg.
  • Fay Stegerman contributed to an extensive thread on the debian-devel development mailing list on the topic of Supporting alternative zlib implementations . In particular, Fay wrote about her results experimenting whether zlib-ng produces identical results or not.
  • kpcyrd uploaded a new rust-rebuilderd-worker, rust-derp, rust-in-toto and debian-repro-status to Debian, which passed successfully through the so-called NEW queue.
  • Gioele Barabucci filed a number of bugs against the debrebuild component/script of the devscripts package, including:
    • #1089087: Address a spurious extra subdirectory in the build path.
    • #1089201: Extra zero bytes added to .dynstr when rebuilding CMake projects.
    • #1089088: Some binNMUs have a 1-second offset in some timestamps.
  • Gioele Barabucci also filed a bug against the dh-r package to report that the Recommends and Suggests fields are missing from rebuilt R packages. At the time of writing, this bug has no patch and needs some help to make over 350 binary packages reproducible.
  • Lastly, 8 reviews of Debian packages were added, 11 were updated and 11 were removed this month adding to our knowledge about identified issues.

Other development news In other ecosystem and distribution news:
  • Lastly, in openSUSE, Bernhard M. Wiedemann published another report for the distribution. There, Bernhard reports about the success of building R-B-OS , a partial fork of openSUSE with only 100% bit-reproducible packages. This effort was sponsored by the NLNet NGI0 initiative.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In November, a number of changes were made by Holger Levsen, including:
  • reproduce.debian.net-related:
    • Add a new i386.reproduce.debian.net rebuilder. [ ][ ][ ][ ][ ][ ]
    • Make a number of updates to the documentation. [ ][ ][ ][ ][ ]
    • Run i386.reproduce.debian.net run on a public port to allow external workers. [ ]
    • Add a link to the /api/v0/pkgs/list endpoint. [ ]
    • Add support for a statistics page. [ ][ ][ ][ ][ ][ ]
    • Limit build logs to 20 MiB and diffoscope output to 10 MiB. [ ]
    • Improve the frontpage. [ ][ ]
    • Explain that we re testing arch:any and arch:all on the amd64 architecture, but only arch:any on i386. [ ]
  • Misc:
    • Remove code for testing Arch Linux, which has moved to reproduce.archlinux.org. [ ][ ]
    • Don t install dstat on Jenkins nodes anymore as its been removed from Debian trixie. [ ]
    • Prepare the infom08-i386 node to become another rebuilder. [ ]
    • Add debug date output for benchmarking the reproducible_pool_buildinfos.sh script. [ ]
    • Install installation-birthday everywhere. [ ]
    • Temporarily disable automatic updates of pool links on buildinfos.debian.net. [ ]
    • Install Recommends by default on Jenkins nodes. [ ]
    • Rename rebuilder_stats.py to rebuilderd_stats.py. [ ]
    • r.d.n/stats: minor formatting changes. [ ]
    • Install files under /etc/cron.d/ with the correct permissions. [ ]
and Jochen Sprickerhof made the following changes: Lastly, Gioele Barabucci also classified packages affected by 1-second offset issue filed as Debian bug #1089088 [ ][ ][ ][ ], Chris Hofstaedtler updated the URL for Grml s dpkg.selections file [ ], Roland Clobus updated the Jenkins log parser to parse warnings from diffoscope [ ] and Mattia Rizzolo banned a number of bots and crawlers from the service [ ][ ].
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

5 December 2024

Reproducible Builds: Reproducible Builds in November 2024

Welcome to the November 2024 report from the Reproducible Builds project! Our monthly reports outline what we ve been up to over the past month and highlight items of news from elsewhere in the world of software supply-chain security where relevant. As ever, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. Table of contents:
  1. Reproducible Builds mourns the passing of Lunar
  2. Introducing reproduce.debian.net
  3. New landing page design
  4. SBOMs for Python packages
  5. Debian updates
  6. Reproducible builds by default in Maven 4
  7. PyPI now supports digital attestations
  8. Dependency Challenges in OSS Package Registries
  9. Zig programming language demonstrated reproducible
  10. Website updates
  11. Upstream patches
  12. Misc development news
  13. Reproducibility testing framework

Reproducible Builds mourns the passing of Lunar The Reproducible Builds community sadly announced it has lost its founding member, Lunar. J r my Bobbio aka Lunar passed away on Friday November 8th in palliative care in Rennes, France. Lunar was instrumental in starting the Reproducible Builds project in 2013 as a loose initiative within the Debian project. He was the author of our earliest status reports and many of our key tools in use today are based on his design. Lunar s creativity, insight and kindness were often noted. You can view our full tribute elsewhere on our website. He will be greatly missed.

Introducing reproduce.debian.net In happier news, this month saw the introduction of reproduce.debian.net. Announced at the recent Debian MiniDebConf in Toulouse, reproduce.debian.net is an instance of rebuilderd operated by the Reproducible Builds project. rebuilderd is our server designed monitor the official package repositories of Linux distributions and attempts to reproduce the observed results there. In November, reproduce.debian.net began rebuilding Debian unstable on the amd64 architecture, but throughout the MiniDebConf, it had attempted to rebuild 66% of the official archive. From this, it could be determined that it is currently possible to bit-for-bit reproduce and corroborate approximately 78% of the actual binaries distributed by Debian that is, using the .buildinfo files hosted by Debian itself. reproduce.debian.net also contains instructions how to setup one s own rebuilderd instance, and we very much invite everyone with a machine to spare to setup their own version and to share the results. Whilst rebuilderd is still in development, it has been used to reproduce Arch Linux since 2019. We are especially looking for installations targeting Debian architectures other than i386 and amd64.

New landing page design As part of a very productive partnership with the Sovereign Tech Fund and Neighbourhoodie, we are pleased to unveil our new homepage/landing page. We are very happy with our collaboration with both STF and Neighbourhoodie (including many changes not directly related to the website), and look forward to working with them in the future.

SBOMs for Python packages The Python Software Foundation has announced a new cross-functional project for SBOMs and Python packages . Seth Michael Larson writes that the project is specifically looking to solve these issues :
  • Enable Python users that require SBOM documents (likely due to regulations like CRA or SSDF) to self-serve using existing SBOM generation tools.
  • Solve the phantom dependency problem, where non-Python software is bundled in Python packages but not recorded in any metadata. This makes the job of software composition analysis (SCA) tools difficult or impossible.
  • Make the adoption work by relevant projects such as build backends, auditwheel-esque tools, as minimal as possible. Empower users who are interested in having better SBOM data for the Python projects they are using to be able to contribute engineering time towards that goal.
A GitHub repository for the initiative is available, and there are a number of queries, comments and remarks on Seth s Discourse forum post.

Debian updates There was significant development within Debian this month. Firstly, at the recent MiniDebConf in Toulouse, France, Holger Levsen gave a Debian-specific talk on rebuilding packages distributed from ftp.debian.org that is to say, how to reproduce the results from the official Debian build servers: Holger described the talk as follows:
For more than ten years, the Reproducible Builds project has worked towards reproducible builds of many projects, and for ten years now we have build Debian packages twice with maximal variations applied to see if they can be build reproducible still. Since about a month, we ve also been rebuilding trying to exactly match the builds being distributed via ftp.debian.org. This talk will describe the setup and the lessons learned so far, and why the results currently are what they are (spoiler: they are less than 30% reproducible), and what we can do to fix that.
The Debian Project Leader, Andreas Tille, was present at the talk and remarked later in his Bits from the DPL update that:
It might be unfair to single out a specific talk from Toulouse, but I d like to highlight the one on reproducible builds. Beyond its technical focus, the talk also addressed the recent loss of Lunar, whom we mourn deeply. It served as a tribute to Lunar s contributions and legacy. Personally, I ve encountered packages maintained by Lunar and bugs he had filed. I believe that taking over his packages and addressing the bugs he reported is a meaningful way to honor his memory and acknowledge the value of his work.
Holger s slides and video in .webm format are available.
Next, rebuilderd is the server to monitor package repositories of Linux distributions and attempt to reproduce the observed results. This month, version 0.21.0 released, most notably with improved support for binNMUs by Jochen Sprickerhof and updating the rebuilderd-debian.sh integration to the latest debrebuild version by Holger Levsen. There has also been significant work to get the rebuilderd package into the Debian archive, in particular, both rust-rebuilderd-common version 0.20.0-1 and rust-rust-lzma version 0.6.0-1 were packaged by kpcyrd and uploaded by Holger Levsen. Related to this, Holger Levsen submitted three additional issues against rebuilderd as well:
  • rebuildctl should be more verbose when encountering issues. [ ]
  • Please add an option to used randomised queues. [ ]
  • Scheduling and re-scheduling multiple packages at once. [ ]
and lastly, Jochen Sprickerhof submitted one an issue requested that rebuilderd downloads the source package in addition to the .buildinfo file [ ] and kpcyrd also submitted and fixed an issue surrounding dependencies and clarifying the license [ ]
Separate to this, back in 2018, Chris Lamb filed a bug report against the sphinx-gallery package as it generates unreproducible content in various ways. This month, however, Dmitry Shachnev finally closed the bug, listing the multiple sub-issues that were part of the problem and how they were resolved.
Elsewhere, Roland Clobus posted to our mailing list this month, asking for input on a bug in Debian s ca-certificates-java package. The issue is that the Java key management tools embed timestamps in its output, and this output ends up in the /etc/ssl/certs/java/cacerts file on the generated ISO images. A discussion resulted from Roland s post suggesting some short- and medium-term solutions to the problem.
Holger Levsen uploaded some packages with reproducibility-related changes:
Lastly, 12 reviews of Debian packages were added, 5 were updated and 21 were removed this month adding to our knowledge about identified issues in Debian.

Reproducible builds by default in Maven 4 On our mailing list this month, Herv Boutemy reported the latest release of Maven (4.0.0-beta-5) has reproducible builds enabled by default. In his mailing list post, Herv mentions that this story started during our Reproducible Builds summit in Hamburg , where he created the upstream issue that builds on a multi-year effort to have Maven builds configured for reproducibility.

PyPI now supports digital attestations Elsewhere in the Python ecosystem and as reported on LWN and elsewhere, the Python Package Index (PyPI) has announced that it has finalised support for PEP 740 ( Index support for digital attestations ). Trail of Bits, who performed much of the development work, has an in-depth blog post about the work and its adoption, as well as what is left undone:
One thing is notably missing from all of this work: downstream verification. [ ] This isn t an acceptable end state (cryptographic attestations have defensive properties only insofar as they re actually verified), so we re looking into ways to bring verification to individual installing clients. In particular, we re currently working on a plugin architecture for pip that will enable users to load verification logic directly into their pip install flows.
There was an in-depth discussion on LWN s announcement page, as well as on Hacker News.

Dependency Challenges in OSS Package Registries At BENEVOL, the Belgium-Netherlands Software Evolution workshop in Namur, Belgium, Tom Mens and Alexandre Decan presented their paper, An Overview and Catalogue of Dependency Challenges in Open Source Software Package Registries . The abstract of their paper is as follows:
While open-source software has enabled significant levels of reuse to speed up software development, it has also given rise to the dreadful dependency hell that all software practitioners face on a regular basis. This article provides a catalogue of dependency-related challenges that come with relying on OSS packages or libraries. The catalogue is based on the scientific literature on empirical research that has been conducted to understand, quantify and overcome these challenges. [ ]
A PDF of the paper is available online.

Zig programming language demonstrated reproducible Motiejus Jak ty posted an interesting and practical blog post on his successful attempt to reproduce the Zig programming language without using the pre-compiled binaries checked into the repository, and despite the circular dependency inherent in its bootstrapping process. As a summary, Motiejus concludes that:
I can now confidently say (and you can also check, you don t need to trust me) that there is nothing hiding in zig1.wasm [the checked-in binary] that hasn t been checked-in as a source file.
The full post is full of practical details, and includes a few open questions.

Website updates Notwithstanding the significant change to the landing page (screenshot above), there were an enormous number of changes made to our website this month. This included:
  • Alex Feyerke and Mariano Gim nez:
    • Dramatically overhaul the website s landing page with new benefit cards tailored to the expected visitors to our website and a reworking of the visual hierarchy and design. [ ][ ][ ][ ][ ][ ][ ][ ][ ][ ]
  • Bernhard M. Wiedemann:
    • Update the System images page to document the e2fsprogs approach. [ ]
  • Chris Lamb:
  • FC (Fay) Stegerman:
    • Replace more inline markdown with HTML on the Success stories page. [ ]
    • Add some links, fix some other links and correct some spelling errors on the Tools page. [ ]
  • Holger Levsen:
    • Add a historical presentation ( Reproducible builds everywhere eg. in Debian, OpenWrt and LEDE ) from October 2016. [ ]
    • Add jochensp and Oejet to the list of known contributors. [ ][ ]
  • Julia Kr ger:
  • Ninette Adhikari & hulkoba:
    • Add/rework the list of success stories into a new page that clearly shows milestones in Reproducible Builds. [ ][ ][ ][ ][ ][ ]
  • Philip Rinn:
    • Import 47 historical weekly reports. [ ]
  • hulkoba:
    • Add alt text to almost all images (!). [ ][ ]
    • Fix a number of links on the Talks . [ ][ ]
    • Avoid so-called ghost buttons by not using <button> elements as links, as the affordance of a <button> implies an action with (potentially) a side effect. [ ][ ]
    • Center the sponsor logos on the homepage. [ ]
    • Move publications and generate them instead from a data.yml file with an improved layout. [ ][ ]
    • Make a large number of small but impactful stylisting changes. [ ][ ][ ][ ]
    • Expand the Tools to include a number of missing tools, fix some styling issues and fix a number of stale/broken links. [ ][ ][ ][ ][ ][ ]

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Misc development news

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In November, a number of changes were made by Holger Levsen, including:
  • reproduce.debian.net-related changes:
    • Create and introduce a new reproduce.debian.net service and subdomain [ ]
    • Make a large number of documentation changes relevant to rebuilderd. [ ][ ][ ][ ][ ]
    • Explain a temporary workaround for a specific issue in rebuilderd. [ ]
    • Setup another rebuilderd instance on the o4 node and update installation documentation to match. [ ][ ]
    • Make a number of helpful/cosmetic changes to the interface, such as clarifying terms and adding links. [ ][ ][ ][ ][ ]
    • Deploy configuration to the /opt and /var directories. [ ][ ]
    • Add an infancy (or alpha ) disclaimer. [ ][ ]
    • Add more notes to the temporary rebuilderd documentation. [ ]
    • Commit an nginx configuration file for reproduce.debian.net s Stats page. [ ]
    • Commit a rebuilder-worker.conf configuration for the o5 node. [ ]
  • Debian-related changes:
    • Grant jspricke and jochensp access to the o5 node. [ ][ ]
    • Build the qemu package with the nocheck build flag. [ ]
  • Misc changes:
    • Adapt the update_jdn.sh script for new Debian trixie systems. [ ]
    • Stop installing the PostgreSQL database engine on the o4 and o5 nodes. [ ]
    • Prevent accidental reboots of the o4 node because of a long-running job owned by josch. [ ][ ]
In addition, Mattia Rizzolo addressed a number of issues with reproduce.debian.net [ ][ ][ ][ ]. And lastly, both Holger Levsen [ ][ ][ ][ ] and Vagrant Cascadian [ ][ ][ ][ ] performed node maintenance.
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

21 October 2024

Sahil Dhiman: Free Software Mirrors in India

Last Updated on 15/11/2024. List of public mirrors in India. Location discovered basis personal knowledge, traces or GeoIP. Mirrors which aren t accessible outside their own ASN are excluded.

North India

East India

South India

West India

CDN (or behind one) Many thanks to Shrirang and Saswata for tips and corrections. Let me know if I m missing someone or something is amiss.

30 August 2024

Sahil Dhiman: Debconf24 Busan

DebConf24 was held in Busan, South Korea, from July 28th to August 4th 2024 and preceded by DebCamp from July 21st to July 27th. This was my second IRL DebConf (DC) and fourth one in total. I started in Debian with a DebConf, so its always an occasion when one happens. This year again, I worked in fundraising team, working to raise funds from International sponsors. We did manage to raise good enough funding, albeit less than budgeted. Though, the local Korean team was able to connect and gather many Governmental sponsors, which was quite surprising for me. I wasn t seriously considering attending DebConf until I discussed this with Nilesh. More or less, his efforts helped push me through the whole process. Thanks, Nilesh, for this. In March, I got my passport and started preparing documents for South Korean visa. It did require quite a lot of paper work but seeing South Korea s s fresh passport visa rejection rate, I had doubts about visa acceptance. The visa finally got approved, which could be attributed to great documentation and help from DebConf visa team. This was also my first trip outside India, and this being to DebConf made many things easy. Most stuff were documented on DebConf website and wiki. Asking some query got immediate responses from someone in the DebConf channels. We then booked a direct flight from Delhi, reaching Seoul in the morning. With good directions from Sourab TK who had reached Seoul a few hours earlier, we quickly got Korean Won, local SIM and T Money card (transportation card) and headed towards Seoul by AREX, airport metro. We spent the next two days exploring Seoul, which is huge. It probably has the highest number of skyscrapers I have ever seen. The city has a good mix of modern and ancient culture. We explored various places in Seoul including Gyeongbokgung Palace, Statue of King Sejong, Bukchon Hanok village, N Seoul Tower and various food markets which were amazing. A Street in Seoul
A Street in Seoul
Next, we headed to Busan for DebConf using KTX (Korean high speed rail). (Fun fact, slogan for City of Busan is Busan is Good .) South Korea has a good network of frequently running high speed trains. We had pre-booked our tickets because, despite the frequency, trains were sold out most of the time. KTX ride was quite smooth, despite travelling at 300 Kmph at times through Korean countryside and long mountain tunnels. View from Dorm Room
PKNU Entrance
The venue for DebConf was Pukyong National University (PKNU), Daeyeon Campus. PKNU had two campuses in the Busan and some folks ended up in wrong campus too. With good help and guidance from the front desk, we got our dormitory rooms assigned. Dorms here were quite different, ie: View from Dorm Room
View from Dorm Room
Settling in was easy. We started meeting familiar folks after almost a year. The long conversations started again. Everyone was excited for DebConf. Like everytime, the first day was full of action (and chaos). Meet and greet, volunteers check in, video team running around and fixing stuff and things working (or not). There were some interesting talks and sponsors stalls. After day one, things more or less settled down. I again volunteered for video team stuff and helped in camera operations and talk directions, which is always fun. As the tradition applies, saw few talks live on stream too sitting in the dorm room during the conf, which is always fun, when too tired to get ready and go out. From Talk Director's chair
From Talk Director's chair
DebConf takes care of food needs for vegan/vegetarianism folks well, of which I m one. I got to try different food items, which was quite an experience. Tried using chopsticks again which didn t work, which I later figured that handling metal ones were more difficult. We had late night ramens and wooden chopsticks worked perfectly. One of the days, we even went out to an Indian restaurant to have some desi aloo paratha, paneer dishes, samosas and chai (milk tea). I wasn t particularly craving desi food but wasn t able to get something according to my taste so went there. As usual Bits from DPL talk was packed
As usual Bits from DPL talk was packed
For day trip, I went to Ulsan. San means mountains in Korean. Ulsan is a port city with many industries including Hyundai car factory, petrochemical industry, paint industry, ship building etc. We saw bamboo forest, Ulsan tower (quite a view towards Ulsan port), whale village, Ulsan Onggi Museum and the sea (which was beautiful). The beautiful sea
The beautiful sea

View from Ulsan Bridge Observatory
View from Ulsan Bridge Observatory
Amongst the sponsors, I was most interested in our network sponsors, folks who were National research and education networks (NREN) here. We had two network sponsors, KOREN and KREONET, thanks to efforts by local team. Initially it was discussed that they ll provide 20G uplink each, so 40G in total, which was whopping but by the time the closing talk happened, we got to know we had 200G uplink to the Internet. This was a massive update to last year when we had 1G main and 100M backup link. 200G wasn t what is required, but it was massive capacity and IIRC from the talk, we peaked at around 500M in usage, but it s always fun to have astronomical amount of bandwidth for bragging rights ;) Various mascots in attendance
Various mascots in attendance

Video and Network stats. Screengrab from closing ceremony
Video and Network stats. Screengrab from closing ceremony
Now let s talk about things I found interesting about South Korea in general: Gyeongbokgung Palace Entrance Gyeongbokgung Palace Entrance Gyeongbokgung Palace Entrance
Grand Gyeongbokgung Palace, Seoul

Starfield Library
Starfield Library, Seoul
If one has to get the whole DebConf experience, it s better to attend DebCamp as well because that s when you can sit and interact with everyone better. As DebConf starts, everyone gets busy in various talks and events and things take a pace. DebConf days literally fly. This year, attending DebConf in person was a different experience. Attending DebConf without any organizational work/stress so was better, and I was able to understand working of different Debian team and workflows better while also identified a few where I would like to join and help. A general conclusion was that almost all Debian teams needs more folks to help out. So if someone want to join, they can probably reach out to the team, and would be able to onboard new folks. Though this would require some patience. Kudos to the Korean team who were able to pull off this event under this tight timeline and thanks for all the hospitality. DebConf24 Group Photo
DebConf24 Group Photo. Click to enlarge.
Credits - Aigars Mahinovs
This whole experience expanded my world view. There s so much to see and explore and understand. Looking forward to DebConf25 in Brest, France. PS - Shoutout to abbyck (aka hamCK)!

19 June 2024

Sahil Dhiman: First Iteration of My Free Software Mirror

As I m gearing towards setting up a Free Software download mirror in India, it occurred to me that I haven t chronicled the work and motivation behind setting up the original mirror in the first place. Also, seems like it would be good to document stuff here for observing the progression, as the mirror is going multi-country now. Right now, my existing mirror i.e., mirrors.de.sahilister.net (was mirrors.sahilister.in), is hosted in Germany and serves traffic for Termux, NomadBSD, Blender, BlendOS and GIMP. For a while in between, it hosted OSMC project mirror as well. To explain what is a Free Software download mirror thing is first, I ll quote myself from work blog -
As most Free Software doesn t have commercial backing and require heavy downloads, the concept of software download mirrors helps take the traffic load off of the primary server, leading to geographical redundancy, higher availability and faster download in general.
So whenever someone wants to download a particular (mirrored) software and click download, upstream redirects the download to one of the mirror server which is geographical (or in other parameters) nearby to the user, leading to faster downloads and load sharing amongst all mirrors. Since the time I got into Linux and servers, I always wanted to help the community somehow, and mirroring seemed to be the most obvious thing. India seems to be a country which has traditionally seen less number of public download mirrors. IITB, TiFR, and some of the public institutions used to host them for popular Linux and Free Softwares, but they seem to be diminishing these days. In the last months of 2021, I started using Termux and saw that it had only a few mirrors (back then). I tried getting a high capacity, high bandwidth node in budget but it was hard in India in 2021-22. So after much deliberation, I decided to go where it s available and chose a German hosting provider with the thought of adding India node when conditions are favorable (thankfully that happened, and India node is live too now.). Termux required only 29 GB of storage, so went ahead and started mirroring it. I raised this issue in Termux s GitHub repository in January 2022. This blog post chronicles the start of the mirror. Termux has high request counts from a mirror point of view. Each Termux client, usually checks every mirror in selected group for availability before randomly selecting one for download (only other case is when client has explicitly selected a single mirror using termux-repo-change). The mirror started getting thousands of requests daily due to this but only a small percentage would actually get my mirror in selection, so download traffic was lower. Similar thing happened with OSMC too (which I started mirroring later). With this start, I started exploring various project that would be benefit from additional mirrors. Public information from Academic Computer Club in Ume s mirror and Freedif s mirror stats helped to figure out storage and bandwidth requirements for potential projects. Fun fact, Academic Computer Club in Ume (which is one of the prominent Debian, Ubuntu etc.) mirror, now has 200 Gbits/s uplink to the internet through SUNET. Later, I migrated to a different provider for better speeds and added LibreSpeed test on the mirror server. Those were fun times. Between OSMC, Termux and LibreSpeed, I was getting almost 1.2 millions hits/day on the server at its peak, crossing for the first time a TB/day traffic number. Next came Blender, which took the longest time to set up of around 9 10 months. Blender had a push-trigger requirement for rsync from upstream that took quite some back and forth. It now contributes the most amount of traffic on the mirror. On release days, mirror does more than 3 TB/day and normal days, it hovers around 2 TB/day. Gimp project is the latest addition. At one time, the mirror traffic touched 4.97 TB/day traffic number. That s when I decided on dropping LibreSpeed server to solely focus on mirroring for now, keeping the bandwidth allotment for serving downloads only. The mirror projects selection grew organically. I used to reach out many projects discussing the need of for additional mirrors. Some projects outright denied mirroring request as Germany already has a good academic mirrors boosting 20-25 Gbits/s speeds from FTP era, which seems fair. Finding the niche was essential to only add softwares, which would truly benefit from additional capacity. There were months when nothing much would happen with the mirror, rsync would continue to update the mirror while nginx would keep on serving the traffic. Nowadays, the mirror pushes around 70 TB/month. I occasionally check logs, vnstat, add new security stuff here and there and pay the bills. It now saturates the Gigabit link sometimes and goes beyond that, peaking around 1.42 Gbits/s (the hosting provider seems to be upping their game). The plan is to upgrade the link to better speeds. vnstat yearly
Yearly traffic stats (through vnstat -y )
On the way, learned quite a few things like - GeoIP Map of Clients from Yesterday Access Logs
GeoIP Map of Clients from Yesterday's Access Logs. Click to enlarge
Generated from IPinfo.io
In hindsight, the statistics look amazing, hundreds of TBs of traffic served from the mirror, month after month. That does show that there s still an appetite for public mirrors in time of commercially donated CDNs and GitHub. The world could have done with one less mirror, but it saved some time, lessened the burden for others, while providing redundancy and traffic localization with one additional mirror. And it s fun for someone like me who s into infrastructure that powers the Internet. Now, I ll try focusing and expanding the India mirror, which in itself started pushing almost half a TB/day. Long live Free Software and public download mirrors.

27 May 2024

Sahil Dhiman: A Late, Late Debconf23 Post

After much procrastination, I have gotten around to complete my DebConf23 (DC23), Kochi blog post. I lost the original etherpad which was started before DebConf23, for jotting down things. Now, I have started afresh with whatever I can remember, months after the actual conference ended. So things might be as accurate as my memory. DebConf23, the 24th annual Debian Conference, happened in Infopark, Kochi, India from 10th September to 17th September 2023. It was preceded by DebCamp from 3rd September to 9th September 2023. First formal bid to host DebConf in India was made during DebConf18 in Hsinchu, Taiwan by Raju Dev, which didn t came our way. In next DebConf, DebConf19 in Curitiba, Brazil, another bid was made by him with help and support from Sruthi, Utkarsh and the whole team.This time, India got the opportunity to host DebConf22, which eventually became DebConf23 for the reasons you all know. I initially met the local team on the sidelines of DebConf20, which was also my first DebConf. Having recently switched to Debian, DC20 introduced me to how things work in Debian. Video team s call for volunteers email pulled me in. Things stuck, and I kept hanging out and helping the local Indian DC team with various stuff. We did manage to organize multiple events leading to DebConf23 including MiniDebConf India 2021 Online, MiniDebConf Palakkad 2022, MiniDebConf Tamil Nadu 2023 and DebUtsav Kochi 2023, which gave us quite a bit of experience and workout. Many local organizers from these conferences later joined various DebConf teams during the conference to help out. For DebConf23, originally I was part of publicity team because that was my usual thing. After a team redistribution exercise, Sruthi and Praveen moved me to sponsorship team, as anyhow we didn t had to do much publicity and sponsorship was one of those things I could get involved remotely. Sponsorship team had to take care of raising funds by reaching out to sponsors, managing invoices and fulfillment. Praveen joined as well in sponsorship team. We also had international sponsorship team, Anisa, Daniel and various Debian Trusted Organizations (TO)s which took care of reaching out to international organizations, and we took care of reaching out to Indian organizations for sponsorship. It was really proud moment when my present employer, Unmukti (makers of hopbox) came aboard as Bronze sponsor. Though fundraising seem to be hit hard from tech industry slowdown and layoffs. Many of our yesteryear sponsors couldn t sponsor. We had biweekly local team meetings, which were turned to weekly as we neared the event. This was done in addition to biweekly global team meeting. Pathu
Pathu, DebConf23 mascot
To describe the conference venue, it happened in InfoPark, Kochi with the main conference hall being Athulya Hall and food, accommodation and two smaller halls in Four Point Hotel, right outside Infopark. We got Athulya Hall as part of venue sponsorship from Infopark. The distance between both of them was around 300 meters. Halls were named Anamudi, Kuthiran and Ponmudi based on hills and mountain areas in host state of Kerala. Other than Annamudi hall which was the main hall, I couldn t remember the names of the hall, I still can t. Four Points was big and expensive, and we had, as expected, cost overruns. Due to how DebConf function, an Indian university wasn t suitable to host a conference of this scale. Infinity Pool at Night
Four Point's Infinity Pool at Night
I landed in Kochi on the first day of DebCamp on 3rd September. As usual, met Abraham first, and the better part of the next hour was spent on meet and greet. It was my first IRL DebConf so met many old friends and new folks. I got a room to myself. Abraham lived nearby and hadn t taken the accommodation, so I asked him to join. He finally joined from second day onwards. All through the conference, room 928 became in-famous for various reasons, and I had various roommates for company. In DebCamp days, we would get up to have breakfast and go back to sleep and get active only past lunch for hacking and helping in the hack lab for the day, followed by fun late night discussions and parties. Nilesh, Chirag and Apple at DC23
Nilesh, Chirag and Apple at DC23
The team even managed to get a press conference arranged as well, and we got an opportunity to go to Press Club, Ernakulam. Sruthi and Jonathan gave the speech and answered questions from journalists. The event was covered by media as well due to this. Ernakulam Press Club
Ernakulam Press Club
During the conference, every night the team use to have 9 PM meetings for retrospection and planning for next day, which was always dotted with new problems. Every day, we used to hijack Silent Hacklab for the meeting and gently ask the only people there at the time to give us space. DebConf, it itself is a well oiled machine. Network was brought up from scratch. Video team built the recording, audio mixing, live-streaming, editing and transcoding infrastructure on site. A gaming rig served as router and gateway. We got internet uplinks, a 1 Gbps sponsored leased line from Kerala Vision and a paid backup 100 Mbps connection from a different provider. IPv6 was added through HE s Tunnelbroker. Overall the network worked fine as additionally we had hotel Wi-Fi, so the conference network wasn t stretched much. I must highlight, DebConf is my only conference where almost everything and every piece of software in developed in-house, for the conference and modified according to need on the fly. Even event recording cameras, audio check, direction, recording and editing is all done on in-house software by volunteer-attendees (in some cases remote ones as well), all trained on the sideline of the conference. The core recording and mixing equipment is owned by Debian and travels to each venue. The rest is sourced locally. Gaming Rig which served as DC23 gateway router
Gaming Rig which served as DC23 gateway router
It was fun seeing how almost all the things were coordinated over text on Internet Relay Chat (IRC). If a talk/event was missing a talkmeister or a director or a camera person, a quick text on #debconf channel would be enough for someone to volunteer. Video team had a dedicated support channel for each conference venue for any issues and were quick to respond and fix stuff. Network information. Screengrab from closing ceremony
Network information. Screengrab from closing ceremony
It rained for the initial days, which gave us a cool weather. Swag team had decided to hand out umbrellas in swag kit which turned out to be quite useful. The swag kit was praised for quality and selection - many thanks to Anupa, Sruthi and others. It was fun wearing different color T-shirts, all designed by Abraham. Red for volunteers, light green for Video team, green for core-team i.e. staff and yellow for conference attendees. With highvoltage
With highvoltage
We were already acclimatized by the time DebConf really started as we had been talking, hacking and hanging out since last 7 days. Rush really started with the start of DebConf. More people joined on the first and second day of the conference. As has been the tradition, an opening talk was prepared by the Sruthi and local team (which I highly recommend watching to get more insights of the process). DebConf day 1 also saw job fair, where Canonical and FOSSEE, IIT Bombay had stalls for community interactions, which judging by the crowd itself turned out to be quite a hit. For me, association with DebConf (and Debian) started due to volunteering with video team, so anyhow I was going to continue doing that this conference as well. I usually volunteer for talks/events which anyhow I m interested in. Handling the camera, talkmeister-ing and direction are fun activities, though I didn t do sound this time around. Sound seemed difficult, and I didn t want to spoil someone s stream and recording. Talk attendance varied a lot, like in Bits from DPL talk, the hall was full but for some there were barely enough people to handle the volunteering tasks, but that s what usually happens. DebConf is more of a place to come together and collaborate, so talk attendance is an afterthought sometimes. Audience in highvoltage's Bits from DPL talk
Audience in highvoltage's Bits from DPL talk
I didn t submit any talk proposals this time around, as just being in the orga team was too much work already, and I knew, the talk preparation would get delayed to the last moment and I would have to rush through it. Enrico's talk
Enrico's talk
From Day 2 onward, more sponsor stalls were introduced in the hallway area. Hopbox by Unmukti , MostlyHarmless and Deeproot (joint stall) and FOSEE. MostlyHarmless stall had nice mechanical keyboards and other fun gadgets. Whenever I got the time, I would go and start typing racing to enjoy the nice, clicky keyboards. As the DebConf tradition dictates, we had a Cheese and Wine party. Everyone brought in cheese and other delicacies from their region. Then there was yummy Sadya. Sadya is a traditional vegetarian Malayalis lunch served over banana leaves. There were loads of different dishes served, the names of most I couldn t pronounce or recollect properly, but everything was super delicious. Day 4 was day trip and I chose to go to Athirappilly Waterfalls and Jungle safari. Pictures would describe the beauty better than words. The journey was bit long though. Athirappilly Falls
Athirappilly Falls

Pathu Pathu Tea Gardens
Tea Gardens
Late that day, we heard the news of Abraham gone missing. We lost Abraham. He had worked really hard all through the years for Debian and making this conference possible. Talks were cancelled for the next day and Jonathan addressed everyone. We went to Abraham s home the next day to meet his family. Team had arranged buses to Abraham s place. It was an unfortunate moment that I only got an opportunity to visit his place after he was gone. Days went by slowly after that. The last day was marked by a small conference dinner. Some of the people had already left. All through the day and next, we kept saying goodbye to friends, with whom we spent almost a fortnight together. Athirappilly Falls
Group photo with all DebConf T-shirts chronologically
This was 2nd trip to Kochi. Vistara Airway s UK886 has become the default flight now. I have almost learned how to travel in and around Kochi by Metro, Water Metro, Airport Shuttle and auto. Things are quite accessible in Kochi but metro is a bit expensive compared to Delhi. I left Kochi on 19th. My flight was due to leave around 8 PM, so I had the whole day to myself. A direct option would have taken less than 1 hour, but as I had time and chose to take the long way to the airport. First took an auto rickshaw to Kakkanad Water Metro station. Then sailed in the water metro to Vyttila Water Metro station. Vyttila serves as intermobility hub which connects water metro, metro, bus at once place. I switched to Metro here at Vyttila Metro station till Aluva Metro station. Here, I had lunch and then boarded the Airport feeder bus to reach Kochi Airport. All in all, I did auto rickshaw > water metro > metro > feeder bus to reach Airport. I was fun and scenic. I must say, public transport and intermodal integration is quite good and once can transition seamlessly from one mode to next. Kochi Water Metro
Kochi Water Metro

Scenes from Kochi Water Metro Scenes from Kochi Water Metro
Scenes from Kochi Water Metro
DebConf23 served its purpose of getting existing Debian people together, as well as getting new people interested and contributing to Debian. People who came are still contributing to Debian, and that s amazing. Streaming video stats
Streaming video stats. Screengrab from closing ceremony
The conference wasn t without its fair share of troubles. There were multiple money transfer woes, and being in India didn t help. Many thanks to multiple organizations who were proactive in helping out. On top of this, there was conference visa uncertainty and other issues which troubled visa team a lot. Kudos to everyone who made this possible. Surely, I m going to miss the name, so thank you for it, you know how much you have done to make this event possible. Now, DebConf24 is scheduled for Busan, South Korea, and work is already in full swing. As usual, I m helping with the fundraising part and plan to attend too. Let s see if I can make it or not. DebConf23 Group Photo
DebConf23 Group Photo. Click to enlarge.
Credits - Aigars Mahinovs
In the end, we kept on saying, no DebConf at this scale would come back to India for the next 10 or 20 years. It s too much trouble to be frank. It was probably the peak that we might not reach again. I would be happy to be proven wrong though :)

6 May 2024

Thomas Lange: Removing tens of thousands of web pages

In January I've removed tens of thousands of web pages on www.debian.org. Have you noticed it? In the past From 1997 onwards, we had web pages for security announcements. We had to manually prepare a .data and a .wml file which then generated a web page for each security announcement (DSA or DLA). We have listed the 6 most recent messages in a short list that was created from these files. Most of the work that went into the Debian web pages was creating these files. Our search engine often listed the pages with security announcements instead of a more relevant web page for a particular topic. Preparation At DebConf Kosovo (2022) I started with a proof of concept and wrote a script, that generates this list without using the .data/.wml files in the Git repository, but instead reading the primary sources of security information[1]. This new list now includes links to the security tracker and the email of the announcement. Following web pages and scripts were also using these .data and .wml files: Before I could remove all the security web pages, I had to adjust the scripts, that create the above information. When I looked at the OVAL files and the apache logs of our web server, I saw that more than 99% of the web traffic was generated by these XML files (134TB of 135TB total in two weeks). They were not compressed and were around 50MB in size. With the help of Carsten Sch nert we managed to modify the python scripts that generate this OVAL file without using the .data/.wml files and now we only provide bzip2 compressed XML files[2]. The RSS feeds are created by the new Perl script which reads the DSA/DLA list the security tracker and determines the URL of the email of all entries. This script also generates the list of the most recent DSA/DLA entries. Currently we show the last 350 entries which covers more than the last year and includes links to the announcement email and the security tracker. The huge list of crossreferences is not needed any more, since the mapping of CVE to DSA is already included in the DSA list[3] of the security tracker. The amount of translations of the DSA/DLA was very different. French translations were almost all done, but all other languages did translations for a couple of months or years only. E.g. in 2022, Italian had 2 translations, Russian 15, Danish 212, French and English each 279. But from 2023 on only French translations were made. By generating the list of DSA/DLA we lost the ability to translate these web pages, but since these announcements are made of simple, identical sentences it is easy to use an automatic translation service if needed. Now the translation statistics of all web pages are more accurate. Instead of 12200 pages that need to be translated (including all these old DSA/DLA) there are now only 2500 pages to translate[4]. Languages that had a lot of old translations of DSA/DLA lost some percentage but languages that are doing translations of newer web pages won in the statistics of how many pages are translated. Examples: Before
German (de)   3501  28.5%
Italian (it)  1005   8.2%
Danish (da)   6336  51.7%
After
German (de)   1486  59.0%
Italian (it)   909  36.1%
Danish (da)    982  39.0%
Cleanup of all the security web pages Finally in January, I could remove all web pages of the security announcements in one git commit[5]. Using several git rm -rf commands this commit removed 54335 files, including around 9650 DSA/DLA data files, 44189 wml files, nearly 500 Makefiles. Outcome No more manual work is needed for the security team and we now have direct links from a DSA-NNN/DLA-NNN to the email in our mailing list archive. This was not possible before. The search results became more accurate. But we still host a lot of other old content on the Debian web pages which may be removed in the future. [1] https://www.debian.org/security/#infos [2] https://www.debian.org/security/oval/ [3] https://salsa.debian.org/security-tracker-team/security-tracker/-/raw/master/data/DSA/list [4] https://www.debian.org/devel/website/stats [5] https://salsa.debian.org/webmaster-team/webwml/-/commit/2aa73ff15bfc4eb2afd85c

1 May 2024

Antoine Beaupr : Tor migrates from Gitolite/GitWeb to GitLab

Note: I've been awfully silent here for the past ... (checks notes) oh dear, 3 months! But that's not because I've been idle, quite the contrary, I've been very busy but just didn't have time to write about anything. So I've taken it upon myself to write something about my work this week, and published this post on the Tor blog which I copy here for a broader audience. Let me know if you like this or not.
Tor has finally completed a long migration from legacy Git infrastructure (Gitolite and GitWeb) to our self-hosted GitLab server. Git repository addresses have therefore changed. Many of you probably have made the switch already, but if not, you will need to change:
https://git.torproject.org/
to:
https://gitlab.torproject.org/
In your Git configuration. The GitWeb front page is now an archived listing of all the repositories before the migration. Inactive git repositories were archived in GitLab legacy/gitolite namespace and the gitweb.torproject.org and git.torproject.org web sites now redirect to GitLab. Best effort was made to reproduce the original gitolite repositories faithfully and also avoid duplicating too much data in the migration. But it's possible that some data present in Gitolite has not migrated to GitLab. User repositories are particularly at risk, because they were massively migrated, and they were "re-forked" from their upstreams, to avoid wasting disk space. If a user had a project with a matching name it was assumed to have the right data, which might be inaccurate. The two virtual machines responsible for the legacy service (cupani for git-rw.torproject.org and vineale for git.torproject.org and gitweb.torproject.org) have been shutdown. Their disks will remain for 3 months (until the end of July 2024) and their backups for another year after that (until the end of July 2025), after which point all the data from those hosts will be destroyed, with only the GitLab archives remaining. The rest of this article expands on how this was done and what kind of problems we faced during the migration.

Where is the code? Normally, nothing should be lost. All repositories in gitolite have been either explicitly migrated by their owners, forcibly migrated by the sysadmin team (TPA), or explicitly destroyed at their owner's request. An exhaustive rewrite map translates gitolite projects to GitLab projects. Some of those projects actually redirect to their parent in cases of empty repositories that were obvious forks. Destroyed repositories redirect to the GitLab front page. Because the migration happened progressively, it's technically possible that commits pushed to gitolite were lost after the migration. We took great care to avoid that scenario. First, we adopted a proposal (TPA-RFC-36) in June 2023 to announce the transition. Then, in March 2024, we locked down all repositories from any further changes. Around that time, only a handful of repositories had changes made after the adoption date, and we examined each repository carefully to make sure nothing was lost. Still, we built a diff of all the changes in the git references that archivists can peruse to check for data loss. It's large (6MiB+) because a lot of repositories were migrated before the mass migration and then kept evolving in GitLab. Many other repositories were rebuilt in GitLab from parent to rebuild a fork relationship which added extra references to those clones. A note to amateur archivists out there, it's probably too late for one last crawl now. The Git repositories now all redirect to GitLab and are effectively unavailable in their original form. That said, the GitWeb site was crawled into the Internet Archive in February 2024, so at least some copy of it is available in the Wayback Machine. At that point, however, many developers had already migrated their projects to GitLab, so the copies there were already possibly out of date compared with the repositories in GitLab. Software Heritage also has a copy of all repositories hosted on Gitolite since June 2023 and have continuously kept mirroring the repositories, where they will be kept hopefully in eternity. There's an issue where the main website can't find the repositories when you search for gitweb.torproject.org, instead search for git.torproject.org. In any case, if you believe data is missing, please do let us know by opening an issue with TPA.

Why? This is an old project in the making. The first discussion about migrating from gitolite to GitLab started in 2020 (almost 4 years ago). But going further back, the first GitLab experiment was in 2016, almost a decade ago. The current GitLab server dates from 2019, replacing Trac for issue tracking in 2020. It was originally supposed to host only mirrors for merge requests and issue trackers but, naturally, one thing led to another and eventually, GitLab had grown a container registry, continuous integration (CI) runners, GitLab Pages, and, of course, hosted most Git repositories. There were hesitations at moving to GitLab for code hosting. We had discussions about the increased attack surface and ways to mitigate that, but, ultimately, it seems the issues were not that serious and the community embraced GitLab. TPA actually migrated its most critical repositories out of shared hosting entirely, into specific servers (e.g. the Puppet Git repository is just on the Puppet server now), leveraging Git's decentralized nature and removing an entire attack surface from our infrastructure. Some of those repositories are mirrored back into GitLab, but the authoritative copy is not on GitLab. In any case, the proposal to migrate from Gitolite to GitLab was effectively just formalizing a fait accompli.

How to migrate from Gitolite / cgit to GitLab The progressive migration was a challenge. If you intend to migrate between hosting platforms, we strongly recommend to make a "flag day" during which you migrate all repositories at once. This ensures a smoother transition and avoids elaborate rewrite rules. When Gitolite access was shutdown, we had repositories on both GitLab and Gitolite, without a clear relationship between the two. A priori, the plan then was to import all the remaining Gitolite repositories into the legacy/gitolite namespace, but that seemed wasteful, particularly for large repositories like Tor Browser which uses nearly a gigabyte of disk space. So we took special care to avoid duplicating repositories. When the mass migration started, only 71 of the 538 Gitolite repositories were Migrated to GitLab in the gitolite.conf file. So, given that we had hundreds of repositories to migrate:, we developed some automation to "save time". We already automate similar ad-hoc tasks with Fabric, so we used that framework here as well. (Our normal configuration management tool is Puppet, which is a poor fit here.) So a relatively large amount of Python code was produced to basically do the following:
  1. check if all on-disk repositories are listed in gitolite.conf (and vice versa) and either add missing repositories or delete them from disk if garbage
  2. for each repository in gitolite.conf, if its category is marked Migrated to GitLab, skip, otherwise;
  3. find a matching GitLab project by name, prompt the user for multiple matches
  4. if a match is found, redirect if the repository is non-empty
    • we have GitLab projects that look like the real thing, but are only present to host migrated Trac issues
    • in such cases we cloned the Gitolite project locally and pushed to the existing repository instead
  5. otherwise, a new repository is created in the legacy/gitolite namespace, using the "import" mechanism in GitLab to automatically import the repository from Gitolite, creating redirections and updating gitolite.conf to document the change
User repositories (those under the user/ directory in Gitolite) were handled specially. First, the existing redirection map was checked to see if a similarly named project was migrated (so that, e.g. user/dgoulet/tor is properly treated as a fork of tpo/core/tor). Then the parent project was forked in GitLab and the Gitolite project force-pushed to the fork. This allows us to show the fork relationship in GitLab and, more importantly, benefit from the "pool" feature in GitLab which deduplicates disk usage between forks. Sometimes, we found no such relationships. Then we simply imported multiple repositories with similar names in the legacy/gitolite namespace, sometimes creating forks between user repositories, on a first-come-first-served basis from the gitolite.conf order. The code used in this migration is now available publicly. We encourage other groups planning to migrate from Gitolite/GitWeb to GitLab to use (and contribute to) our fabric-tasks repository, even though it does have its fair share of hard-coded assertions. The main entry point is the gitolite.mass-repos-migration task. A typical migration job looked like:
anarcat@angela:fabric-tasks$ fab -H cupani.torproject.org gitolite.mass-repos-migration 
[...]
INFO: skipping project project/help/infra in category Migrated to GitLab
INFO: skipping project project/help/wiki in category Migrated to GitLab
INFO: skipping project project/jenkins/jobs in category Migrated to GitLab
INFO: skipping project project/jenkins/tools in category Migrated to GitLab
INFO: searching for projects matching fastlane
INFO: Successfully connected to https://gitlab.torproject.org
import gitolite project project/tor-browser/fastlane into gitlab legacy/gitolite/project/tor-browser/fastlane with desc 'Tor Browser app store and deployment configuration for Fastlane'? [Y/n] 
INFO: importing gitolite project project/tor-browser/fastlane into gitlab legacy/gitolite/project/tor-browser/fastlane with desc 'Tor Browser app store and deployment configuration for Fastlane'
INFO: building a new connect to cupani
INFO: defaulting name to fastlane
INFO: importing project into GitLab
INFO: Successfully connected to https://gitlab.torproject.org
INFO: loading group legacy/gitolite/project/tor-browser
INFO: archiving project
INFO: creating repository fastlane (fastlane) in namespace legacy/gitolite/project/tor-browser from https://git.torproject.org/project/tor-browser/fastlane into https://gitlab.torproject.org/legacy/gitolite/project/tor-browser/fastlane
INFO: migrating Gitolite repository project/tor-browser/fastlane to GitLab project legacy/gitolite/project/tor-browser/fastlane
INFO: uploading 399 bytes to /srv/git.torproject.org/repositories/project/tor-browser/fastlane.git/hooks/pre-receive
INFO: making /srv/git.torproject.org/repositories/project/tor-browser/fastlane.git/hooks/pre-receive executable
INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project project/tor-browser/fastlane to category Migrated to GitLab
INFO: skipping project project/bridges/bridgedb-admin in category Migrated to GitLab
[...]
In the above, you can see migrated repositories skipped then the fastlane project being archived into GitLab. Another example with a later version of the script, processing only user repositories and showing the interactive prompt and a force-push into a fork:
$ fab -H cupani.torproject.org  gitolite.mass-repos-migration --include 'user/.*' --exclude '.*tor-?browser.*'
INFO: skipping project user/aagbsn/bridgedb in category Migrated to GitLab
[...]
INFO: skipping project user/phw/atlas in category Migrated to GitLab
INFO: processing project user/phw/obfsproxy (Philipp's obfsproxy repository) in category Users' development repositories (Attic)
INFO: Successfully connected to https://gitlab.torproject.org
INFO: user repository detected, trying to find fork phw/obfsproxy
WARNING: no existing fork found, entering user fork subroutine
INFO: found 6 GitLab projects matching 'obfsproxy' (https://gitweb.torproject.org/user/phw/obfsproxy.git)
0 legacy/gitolite/debian/obfsproxy
1 legacy/gitolite/debian/obfsproxy-legacy
2 legacy/gitolite/user/asn/obfsproxy
3 legacy/gitolite/user/ioerror/obfsproxy
4 tpo/anti-censorship/pluggable-transports/obfsproxy
5 tpo/anti-censorship/pluggable-transports/obfsproxy-legacy
select parent to fork from, or enter to abort: ^G4
INFO: repository is not empty: in-pack: 2104, packs: 1, size-pack: 414
fork project tpo/anti-censorship/pluggable-transports/obfsproxy into legacy/gitolite/user/phw/obfsproxy^G [Y/n] 
INFO: loading project tpo/anti-censorship/pluggable-transports/obfsproxy
INFO: forking project user/phw/obfsproxy into namespace legacy/gitolite/user/phw
INFO: waiting for fork to complete...
INFO: fork status: started, sleeping...
INFO: fork finished
INFO: cloning and force pushing from user/phw/obfsproxy to legacy/gitolite/user/phw/obfsproxy
INFO: deleting branch protection: <class 'gitlab.v4.objects.branches.ProjectProtectedBranch'> =>  'id': 2723, 'name': 'master', 'push_access_levels': [ 'id': 2864, 'access_level': 40, 'access_level_description': 'Maintainers', 'deploy_key_id': None ], 'merge_access_levels': [ 'id': 2753, 'access_level': 40, 'access_level_description': 'Maintainers' ], 'allow_force_push': False 
INFO: cloning repository git-rw.torproject.org:/srv/git.torproject.org/repositories/user/phw/obfsproxy.git in /tmp/tmp6orvjggy/user/phw/obfsproxy
Cloning into bare repository '/tmp/tmp6orvjggy/user/phw/obfsproxy'...
INFO: pushing to GitLab: https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
remote: 
remote: To create a merge request for bug_10887, visit:        
remote:   https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy/-/merge_requests/new?merge_request%5Bsource_branch%5D=bug_10887        
remote: 
[...]
To ssh://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
 + 2bf9d09...a8e54d5 master -> master (forced update)
 * [new branch]      bug_10887 -> bug_10887
[...]
INFO: migrating repo
INFO: migrating Gitolite repository https://gitweb.torproject.org/user/phw/obfsproxy.git to GitLab project https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project user/phw/obfsproxy to category Migrated to GitLab
INFO: processing project user/phw/scramblesuit (Philipp's ScrambleSuit repository) in category Users' development repositories (Attic)
INFO: user repository detected, trying to find fork phw/scramblesuit
WARNING: no existing fork found, entering user fork subroutine
WARNING: no matching gitlab project found for user/phw/scramblesuit
INFO: user fork subroutine failed, resuming normal procedure
INFO: searching for projects matching scramblesuit
import gitolite project user/phw/scramblesuit into gitlab legacy/gitolite/user/phw/scramblesuit with desc 'Philipp's ScrambleSuit repository'?^G [Y/n] 
INFO: checking if remote repo https://git.torproject.org/user/phw/scramblesuit exists
INFO: importing gitolite project user/phw/scramblesuit into gitlab legacy/gitolite/user/phw/scramblesuit with desc 'Philipp's ScrambleSuit repository'
INFO: importing project into GitLab
INFO: Successfully connected to https://gitlab.torproject.org
INFO: loading group legacy/gitolite/user/phw
INFO: creating repository scramblesuit (scramblesuit) in namespace legacy/gitolite/user/phw from https://git.torproject.org/user/phw/scramblesuit into https://gitlab.torproject.org/legacy/gitolite/user/phw/scramblesuit
INFO: archiving project
INFO: migrating Gitolite repository https://gitweb.torproject.org/user/phw/scramblesuit.git to GitLab project https://gitlab.torproject.org/legacy/gitolite/user/phw/scramblesuit
INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project user/phw/scramblesuit to category Migrated to GitLab
[...]
Acute eyes will notice the bell used as a notification mechanism as well in this transcript. A lot of the code is now useless for us, but some, like "commit and push" or is-repo-empty live on in the git module and, of course, the gitlab module has grown some legs along the way. We've also found fun bugs, like a file descriptor exhaustion in bash, among other oddities. The retirement milestone and issue 41215 has a detailed log of the migration, for those curious. This was a challenging project, but it feels nice to have this behind us. This gets rid of 2 of the 4 remaining machines running Debian "old-old-stable", which moves a bit further ahead in our late bullseye upgrades milestone. Full transparency: we tested GPT-3.5, GPT-4, and other large language models to see if they could answer the question "write a set of rewrite rules to redirect GitWeb to GitLab". This has become a standard LLM test for your faithful writer to figure out how good a LLM is at technical responses. None of them gave an accurate, complete, and functional response, for the record. The actual rewrite rules as of this writing follow, for humans that actually like working answers provided by expert humans instead of artificial intelligence which currently seem to be, glorified, mansplaining interns.

git.torproject.org rewrite rules Those rules are relatively simple in that they rewrite a single URL to its equivalent GitLab counterpart in a 1:1 fashion. It relies on the rewrite map mentioned above, of course.
RewriteEngine on
# this RewriteMap connects the gitweb projects to their GitLab
# equivalent
RewriteMap gitolite2gitlab "txt:/etc/apache2/gitolite2gitlab.txt"
# if this becomes a performance bottleneck, convert to a DBM map with:
#
#  $ httxt2dbm -i mapfile.txt -o mapfile.map
#
# and:
#
# RewriteMap mapname "dbm:/etc/apache/mapfile.map"
#
# according to reports lavamind found online, we hit such a
# performance bottleneck only around millions of entries, which is not our case
# those two rules can go away once all the projects are
# migrated to GitLab
#
# this matches the request URI so we can check the RewriteMap
# for a match next
#
# WARNING: this won't match URLs without .git in them, which
# *do* work now. one possibility would be to match the request
# URI (without query string!) with:
#
# /git/(.*)(.git)?/(((branches hooks info objects/).*) git-.* upload-pack receive-pack HEAD config description)?.
#
# I haven't been able to figure out the actual structure of
# those URLs, so it's really hard to figure out the boundaries
# of the project name here. I stopped after pouring around the
# http-backend.c code in git
# itself. https://www.git-scm.com/docs/http-protocol is also
# kind of incomplete and unsatisfying.
RewriteCond % REQUEST_URI  ^/(git/)?(.*).git/.*$
# this makes the RewriteRule match only if there's a match in
# the rewrite map
RewriteCond $ gitolite2gitlab:%2 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(git/)?(.*).git/(.*)$ https://gitlab.torproject.org/$ gitolite2gitlab:$2 .git/$3 [R=302,L]
# Fallback everything else to GitLab
RewriteRule (.*) https://gitlab.torproject.org [R=302,L]

gitweb.torproject.org rewrite rules Those are the vastly more complicated GitWeb to GitLab rewrite rules. Note that we say "GitWeb" but we were actually not running GitWeb but cgit, as the former didn't actually scale for us.
RewriteEngine on
# this RewriteMap connects the gitweb projects to their GitLab
# equivalent
RewriteMap gitolite2gitlab "txt:/etc/apache2/gitolite2gitlab.txt"
# special rule to process targets of the old spec.tpo site and
# bring them to the right redirect on the new spec.tpo site. that should turn, for example:
#
# https://gitweb.torproject.org/torspec.git/tree/address-spec.txt
#
# into:
#
# https://spec.torproject.org/address-spec
RewriteRule ^/torspec.git/tree/(.*).txt$ https://spec.torproject.org/$1 [R=302]
# list of endpoints taken from cgit's cmd.c
# those two RewriteCond are necessary because we don't move
# all repositories at once. once the migration is completed,
# they can be removed.
#
# and yes, they are copied all over the place below
#
# create a match for the project name to check if the project
# has been moved to GitLab
RewriteCond % REQUEST_URI  ^/(.*).git(/.*)?$
# this makes the RewriteRule match only if there's a match in
# the rewrite map
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
# main project page, like summary below
RewriteRule ^/(.*).git/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 / [R=302,L]
# summary
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/summary/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 / [R=302,L]
# about
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/about/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 / [R=302,L]
# commit
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond "% QUERY_STRING " "(.*(?:^ &))id=([^&]*)(&.*)?$"
RewriteRule ^/(.*).git/commit/? https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commit/%2 [R=302,L,QSD]
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/commit/? https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commits/HEAD [R=302,L]
# diff, incomplete because can diff arbitrary refs and files in cgit but not in GitLab, hard to parse
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond % QUERY_STRING  id=([^&]*)
RewriteRule ^/(.*).git/diff/? https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commit/%1 [R=302,L,QSD]
# patch
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond % QUERY_STRING  id=([^&]*)
RewriteRule ^/(.*).git/patch/? https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commit/%1.patch [R=302,L,QSD]
# rawdiff, incomplete because can show only one file diff, which GitLab cannot
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond % QUERY_STRING  id=([^&]*)
RewriteRule ^/(.*).git/rawdiff/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commit/%1.diff [R=302,L,QSD]
# log
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond % QUERY_STRING  h=([^&]*)
RewriteRule ^/(.*).git/log/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commits/%1 [R=302,L,QSD]
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/log/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commits/HEAD [R=302,L]
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/log(/?.*)$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commits/HEAD$2 [R=302,L]
# atom
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond % QUERY_STRING  h=([^&]*)
RewriteRule ^/(.*).git/atom/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commits/%1 [R=302,L,QSD]
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/atom/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commits/HEAD [R=302,L,QSD]
# refs, incomplete because two pages in GitLab, defaulting to "tags"
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/refs/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/tags [R=302,L]
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond % QUERY_STRING  h=([^&]*)
RewriteRule ^/(.*).git/tag/? https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/tags/%1 [R=302,L,QSD]
# tree
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond % QUERY_STRING  id=([^&]*)
RewriteRule ^/(.*).git/tree(/?.*)$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/tree/%1$2 [R=302,L,QSD]
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/tree(/?.*)$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/tree/HEAD$2 [R=302,L]
# /-/tree has no good default in GitLab, revert to HEAD which is a good
# approximation (we can't assume "master" here anymore)
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/tree/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/tree/HEAD [R=302,L]
# plain
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond % QUERY_STRING  h=([^&]*)
RewriteRule ^/(.*).git/plain(/?.*)$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/raw/%1$2 [R=302,L,QSD]
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/plain(/?.*)$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/raw/HEAD$2 [R=302,L]
# blame: disabled
#RewriteCond % REQUEST_URI  ^/(.*).git/.*$
#RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
#RewriteCond % QUERY_STRING  h=([^&]*)
#RewriteRule ^/(.*).git/blame(/?.*)$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/blame/%1$2 [R=302,L,QSD]
# same default as tree above
#RewriteCond % REQUEST_URI  ^/(.*).git/.*$
#RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
#RewriteRule ^/(.*).git/blame(/?.*)$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/blame/HEAD/$2 [R=302,L]
# stats
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/stats/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/graphs/HEAD [R=302,L]
# still TODO:
# repolist: once migration is complete
#
# cannot be done:
# atom: needs a feed token, user must be logged in
# blob: no direct equivalent
# info: not working on main cgit website?
# ls_cache: not working, irrelevant?
# objects: undocumented?
# snapshot: pattern too hard to match on cgit's side
# special case, we keep a copy of the main index on the archive
RewriteRule ^/?$ https://archive.torproject.org/websites/gitweb.torproject.org.html [R=302,L]
# Fallback: everything else to GitLab
RewriteRule .* https://gitlab.torproject.org [R=302,L]
The reference copy of those is available in our (currently private) Puppet git repository.

18 April 2024

Jonathan McDowell: Sorting out backup internet #2: 5G modem

Having setup recursive DNS it was time to actually sort out a backup internet connection. I live in a Virgin Media area, but I still haven t forgiven them for my terrible Virgin experiences when moving here. Plus it involves a bigger contractual commitment. There are no altnets locally (though I m watching youfibre who have already rolled out in a few Belfast exchanges), so I decided to go for a 5G modem. That gives some flexibility, and is a bit easier to get up and running. I started by purchasing a ZTE MC7010. This had the advantage of being reasonably cheap off eBay, not having any wifi functionality I would just have to disable (it s going to plug it into the same router the FTTP connection terminates on), being outdoor mountable should I decide to go that way, and, finally, being powered via PoE. For now this device sits on the window sill in my study, which is at the top of the house. I printed a table stand for it which mostly does the job (though not as well with a normal, rather than flat, network cable). The router lives downstairs, so I ve extended a dedicated VLAN through the study switch, down to the core switch and out to the router. The PoE study switch can only do GigE, not 2.5Gb/s, but at present that s far from the limiting factor on the speed of the connection. The device is 3 branded, and, as it happens, I ve ended up with a 3 SIM in it. Up until recently my personal phone was with them, but they ve kicked me off Go Roam, so I ve moved. Going with 3 for the backup connection provides some slight extra measure of resiliency; we now have devices on all 4 major UK networks in the house. The SIM is a preloaded data only SIM good for a year; I don t expect to use all of the data allowance, but I didn t want to have to worry about unexpected excess charges. Performance turns out to be disappointing; I end up locking the device to 4G as the 5G signal is marginal - leaving it enabled results in constantly switching between 4G + 5G and a significant extra latency. The smokeping graph below shows a brief period where I removed the 4G lock and allowed 5G: Smokeping 4G vs 5G graph (There s a handy zte.js script to allow doing this from the device web interface.) I get about 10Mb/s sustained downloads out of it. EE/Vodafone did not lead to significantly better results, so for now I m accepting it is what it is. I tried relocating the device to another part of the house (a little tricky while still providing switch-based PoE, but I have an injector), without much improvement. Equally pinning the 4G to certain bands provided a short term improvement (I got up to 40-50Mb/s sustained), but not reliably so. speedtest.net results This is disappointing, but if it turns out to be a problem I can look at mounting it externally. I also assume as 5G is gradually rolled out further things will naturally improve, but that might be wishful thinking on my part. Rather than wait until my main link had a problem I decided to try a day working over the 5G connection. I spend a lot of my time either in browser based apps or accessing remote systems via SSH, so I m reasonably sensitive to a jittery or otherwise flaky connection. I picked a day that I did not have any meetings planned, but as it happened I ended up with an adhoc video call arranged. I m pleased to say that it all worked just fine; definitely noticeable as slower than the FTTP connection (to be expected), but all workable and even the video call was fine (at least from my end). Looking at the traffic graph shows the expected ~ 10Mb/s peak (actually a little higher, and looking at the FTTP stats for previous days not out of keeping with what we see there), and you can just about see the ~ 3Mb/s symmetric use by the video call at 2pm: 4G traffic during the work day The test run also helped iron out the fact that the content filter was still enabled on the SIM, but that was easily resolved. Up next, vaguely automatic failover.

13 April 2024

Paul Tagliamonte: Domo Arigato, Mr. debugfs

Years ago, at what I think I remember was DebConf 15, I hacked for a while on debhelper to write build-ids to debian binary control files, so that the build-id (more specifically, the ELF note .note.gnu.build-id) wound up in the Debian apt archive metadata. I ve always thought this was super cool, and seeing as how Michael Stapelberg blogged some great pointers around the ecosystem, including the fancy new debuginfod service, and the find-dbgsym-packages helper, which uses these same headers, I don t think I m the only one. At work I ve been using a lot of rust, specifically, async rust using tokio. To try and work on my style, and to dig deeper into the how and why of the decisions made in these frameworks, I ve decided to hack up a project that I ve wanted to do ever since 2015 write a debug filesystem. Let s get to it.

Back to the Future Time to admit something. I really love Plan 9. It s just so good. So many ideas from Plan 9 are just so prescient, and everything just feels right. Not just right like, feels good like, correct. The bit that I ve always liked the most is 9p, the network protocol for serving a filesystem over a network. This leads to all sorts of fun programs, like the Plan 9 ftp client being a 9p server you mount the ftp server and access files like any other files. It s kinda like if fuse were more fully a part of how the operating system worked, but fuse is all running client-side. With 9p there s a single client, and different servers that you can connect to, which may be backed by a hard drive, remote resources over something like SFTP, FTP, HTTP or even purely synthetic. The interesting (maybe sad?) part here is that 9p wound up outliving Plan 9 in terms of adoption 9p is in all sorts of places folks don t usually expect. For instance, the Windows Subsystem for Linux uses the 9p protocol to share files between Windows and Linux. ChromeOS uses it to share files with Crostini, and qemu uses 9p (virtio-p9) to share files between guest and host. If you re noticing a pattern here, you d be right; for some reason 9p is the go-to protocol to exchange files between hypervisor and guest. Why? I have no idea, except maybe due to being designed well, simple to implement, and it s a lot easier to validate the data being shared and validate security boundaries. Simplicity has its value. As a result, there s a lot of lingering 9p support kicking around. Turns out Linux can even handle mounting 9p filesystems out of the box. This means that I can deploy a filesystem to my LAN or my localhost by running a process on top of a computer that needs nothing special, and mount it over the network on an unmodified machine unlike fuse, where you d need client-specific software to run in order to mount the directory. For instance, let s mount a 9p filesystem running on my localhost machine, serving requests on 127.0.0.1:564 (tcp) that goes by the name mountpointname to /mnt.
$ mount -t 9p \
-o trans=tcp,port=564,version=9p2000.u,aname=mountpointname \
127.0.0.1 \
/mnt
Linux will mount away, and attach to the filesystem as the root user, and by default, attach to that mountpoint again for each local user that attempts to use it. Nifty, right? I think so. The server is able to keep track of per-user access and authorization along with the host OS.

WHEREIN I STYX WITH IT Since I wanted to push myself a bit more with rust and tokio specifically, I opted to implement the whole stack myself, without third party libraries on the critical path where I could avoid it. The 9p protocol (sometimes called Styx, the original name for it) is incredibly simple. It s a series of client to server requests, which receive a server to client response. These are, respectively, T messages, which transmit a request to the server, which trigger an R message in response (Reply messages). These messages are TLV payload with a very straight forward structure so straight forward, in fact, that I was able to implement a working server off nothing more than a handful of man pages. Later on after the basics worked, I found a more complete spec page that contains more information about the unix specific variant that I opted to use (9P2000.u rather than 9P2000) due to the level of Linux specific support for the 9P2000.u variant over the 9P2000 protocol.

MR ROBOTO The backend stack over at zoo is rust and tokio running i/o for an HTTP and WebRTC server. I figured I d pick something fairly similar to write my filesystem with, since 9P can be implemented on basically anything with I/O. That means tokio tcp server bits, which construct and use a 9p server, which has an idiomatic Rusty API that partially abstracts the raw R and T messages, but not so much as to cause issues with hiding implementation possibilities. At each abstraction level, there s an escape hatch allowing someone to implement any of the layers if required. I called this framework arigato which can be found over on docs.rs and crates.io.
/// Simplified version of the arigato File trait; this isn't actually
/// the same trait; there's some small cosmetic differences. The
/// actual trait can be found at:
///
/// https://docs.rs/arigato/latest/arigato/server/trait.File.html
trait File  
/// OpenFile is the type returned by this File via an Open call.
 type OpenFile: OpenFile;
/// Return the 9p Qid for this file. A file is the same if the Qid is
 /// the same. A Qid contains information about the mode of the file,
 /// version of the file, and a unique 64 bit identifier.
 fn qid(&self) -> Qid;
/// Construct the 9p Stat struct with metadata about a file.
 async fn stat(&self) -> FileResult<Stat>;
/// Attempt to update the file metadata.
 async fn wstat(&mut self, s: &Stat) -> FileResult<()>;
/// Traverse the filesystem tree.
 async fn walk(&self, path: &[&str]) -> FileResult<(Option<Self>, Vec<Self>)>;
/// Request that a file's reference be removed from the file tree.
 async fn unlink(&mut self) -> FileResult<()>;
/// Create a file at a specific location in the file tree.
 async fn create(
&mut self,
name: &str,
perm: u16,
ty: FileType,
mode: OpenMode,
extension: &str,
) -> FileResult<Self>;
/// Open the File, returning a handle to the open file, which handles
 /// file i/o. This is split into a second type since it is genuinely
 /// unrelated -- and the fact that a file is Open or Closed can be
 /// handled by the  arigato  server for us.
 async fn open(&mut self, mode: OpenMode) -> FileResult<Self::OpenFile>;
 
/// Simplified version of the arigato OpenFile trait; this isn't actually
/// the same trait; there's some small cosmetic differences. The
/// actual trait can be found at:
///
/// https://docs.rs/arigato/latest/arigato/server/trait.OpenFile.html
trait OpenFile  
/// iounit to report for this file. The iounit reported is used for Read
 /// or Write operations to signal, if non-zero, the maximum size that is
 /// guaranteed to be transferred atomically.
 fn iounit(&self) -> u32;
/// Read some number of bytes up to  buf.len()  from the provided
 ///  offset  of the underlying file. The number of bytes read is
 /// returned.
 async fn read_at(
&mut self,
buf: &mut [u8],
offset: u64,
) -> FileResult<u32>;
/// Write some number of bytes up to  buf.len()  from the provided
 ///  offset  of the underlying file. The number of bytes written
 /// is returned.
 fn write_at(
&mut self,
buf: &mut [u8],
offset: u64,
) -> FileResult<u32>;
 

Thanks, decade ago paultag! Let s do it! Let s use arigato to implement a 9p filesystem we ll call debugfs that will serve all the debug files shipped according to the Packages metadata from the apt archive. We ll fetch the Packages file and construct a filesystem based on the reported Build-Id entries. For those who don t know much about how an apt repo works, here s the 2-second crash course on what we re doing. The first is to fetch the Packages file, which is specific to a binary architecture (such as amd64, arm64 or riscv64). That architecture is specific to a component (such as main, contrib or non-free). That component is specific to a suite, such as stable, unstable or any of its aliases (bullseye, bookworm, etc). Let s take a look at the Packages.xz file for the unstable-debug suite, main component, for all amd64 binaries.
$ curl \
https://deb.debian.org/debian-debug/dists/unstable-debug/main/binary-amd64/Packages.xz \
  unxz
This will return the Debian-style rfc2822-like headers, which is an export of the metadata contained inside each .deb file which apt (or other tools that can use the apt repo format) use to fetch information about debs. Let s take a look at the debug headers for the netlabel-tools package in unstable which is a package named netlabel-tools-dbgsym in unstable-debug.
Package: netlabel-tools-dbgsym
Source: netlabel-tools (0.30.0-1)
Version: 0.30.0-1+b1
Installed-Size: 79
Maintainer: Paul Tagliamonte <paultag@debian.org>
Architecture: amd64
Depends: netlabel-tools (= 0.30.0-1+b1)
Description: debug symbols for netlabel-tools
Auto-Built-Package: debug-symbols
Build-Ids: e59f81f6573dadd5d95a6e4474d9388ab2777e2a
Description-md5: a0e587a0cf730c88a4010f78562e6db7
Section: debug
Priority: optional
Filename: pool/main/n/netlabel-tools/netlabel-tools-dbgsym_0.30.0-1+b1_amd64.deb
Size: 62776
SHA256: 0e9bdb087617f0350995a84fb9aa84541bc4df45c6cd717f2157aa83711d0c60
So here, we can parse the package headers in the Packages.xz file, and store, for each Build-Id, the Filename where we can fetch the .deb at. Each .deb contains a number of files but we re only really interested in the files inside the .deb located at or under /usr/lib/debug/.build-id/, which you can find in debugfs under rfc822.rs. It s crude, and very single-purpose, but I m feeling a bit lazy.

Who needs dpkg?! For folks who haven t seen it yet, a .deb file is a special type of .ar file, that contains (usually) three files inside debian-binary, control.tar.xz and data.tar.xz. The core of an .ar file is a fixed size (60 byte) entry header, followed by the specified size number of bytes.
[8 byte .ar file magic]
[60 byte entry header]
[N bytes of data]
[60 byte entry header]
[N bytes of data]
[60 byte entry header]
[N bytes of data]
...
First up was to implement a basic ar parser in ar.rs. Before we get into using it to parse a deb, as a quick diversion, let s break apart a .deb file by hand something that is a bit of a rite of passage (or at least it used to be? I m getting old) during the Debian nm (new member) process, to take a look at where exactly the .debug file lives inside the .deb file.
$ ar x netlabel-tools-dbgsym_0.30.0-1+b1_amd64.deb
$ ls
control.tar.xz debian-binary
data.tar.xz netlabel-tools-dbgsym_0.30.0-1+b1_amd64.deb
$ tar --list -f data.tar.xz   grep '.debug$'
./usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
Since we know quite a bit about the structure of a .deb file, and I had to implement support from scratch anyway, I opted to implement a (very!) basic debfile parser using HTTP Range requests. HTTP Range requests, if supported by the server (denoted by a accept-ranges: bytes HTTP header in response to an HTTP HEAD request to that file) means that we can add a header such as range: bytes=8-68 to specifically request that the returned GET body be the byte range provided (in the above case, the bytes starting from byte offset 8 until byte offset 68). This means we can fetch just the ar file entry from the .deb file until we get to the file inside the .deb we are interested in (in our case, the data.tar.xz file) at which point we can request the body of that file with a final range request. I wound up writing a struct to handle a read_at-style API surface in hrange.rs, which we can pair with ar.rs above and start to find our data in the .deb remotely without downloading and unpacking the .deb at all. After we have the body of the data.tar.xz coming back through the HTTP response, we get to pipe it through an xz decompressor (this kinda sucked in Rust, since a tokio AsyncRead is not the same as an http Body response is not the same as std::io::Read, is not the same as an async (or sync) Iterator is not the same as what the xz2 crate expects; leading me to read blocks of data to a buffer and stuff them through the decoder by looping over the buffer for each lzma2 packet in a loop), and tarfile parser (similarly troublesome). From there we get to iterate over all entries in the tarfile, stopping when we reach our file of interest. Since we can t seek, but gdb needs to, we ll pull it out of the stream into a Cursor<Vec<u8>> in-memory and pass a handle to it back to the user. From here on out its a matter of gluing together a File traited struct in debugfs, and serving the filesystem over TCP using arigato. Done deal!

A quick diversion about compression I was originally hoping to avoid transferring the whole tar file over the network (and therefore also reading the whole debug file into ram, which objectively sucks), but quickly hit issues with figuring out a way around seeking around an xz file. What s interesting is xz has a great primitive to solve this specific problem (specifically, use a block size that allows you to seek to the block as close to your desired seek position just before it, only discarding at most block size - 1 bytes), but data.tar.xz files generated by dpkg appear to have a single mega-huge block for the whole file. I don t know why I would have expected any different, in retrospect. That means that this now devolves into the base case of How do I seek around an lzma2 compressed data stream ; which is a lot more complex of a question. Thankfully, notoriously brilliant tianon was nice enough to introduce me to Jon Johnson who did something super similar adapted a technique to seek inside a compressed gzip file, which lets his service oci.dag.dev seek through Docker container images super fast based on some prior work such as soci-snapshotter, gztool, and zran.c. He also pulled this party trick off for apk based distros over at apk.dag.dev, which seems apropos. Jon was nice enough to publish a lot of his work on this specifically in a central place under the name targz on his GitHub, which has been a ton of fun to read through. The gist is that, by dumping the decompressor s state (window of previous bytes, in-memory data derived from the last N-1 bytes) at specific checkpoints along with the compressed data stream offset in bytes and decompressed offset in bytes, one can seek to that checkpoint in the compressed stream and pick up where you left off creating a similar block mechanism against the wishes of gzip. It means you d need to do an O(n) run over the file, but every request after that will be sped up according to the number of checkpoints you ve taken. Given the complexity of xz and lzma2, I don t think this is possible for me at the moment especially given most of the files I ll be requesting will not be loaded from again especially when I can just cache the debug header by Build-Id. I want to implement this (because I m generally curious and Jon has a way of getting someone excited about compression schemes, which is not a sentence I thought I d ever say out loud), but for now I m going to move on without this optimization. Such a shame, since it kills a lot of the work that went into seeking around the .deb file in the first place, given the debian-binary and control.tar.gz members are so small.

The Good First, the good news right? It works! That s pretty cool. I m positive my younger self would be amused and happy to see this working; as is current day paultag. Let s take debugfs out for a spin! First, we need to mount the filesystem. It even works on an entirely unmodified, stock Debian box on my LAN, which is huge. Let s take it for a spin:
$ mount \
-t 9p \
-o trans=tcp,version=9p2000.u,aname=unstable-debug \
192.168.0.2 \
/usr/lib/debug/.build-id/
And, let s prove to ourselves that this actually mounted before we go trying to use it:
$ mount   grep build-id
192.168.0.2 on /usr/lib/debug/.build-id type 9p (rw,relatime,aname=unstable-debug,access=user,trans=tcp,version=9p2000.u,port=564)
Slick. We ve got an open connection to the server, where our host will keep a connection alive as root, attached to the filesystem provided in aname. Let s take a look at it.
$ ls /usr/lib/debug/.build-id/
00 0d 1a 27 34 41 4e 5b 68 75 82 8E 9b a8 b5 c2 CE db e7 f3
01 0e 1b 28 35 42 4f 5c 69 76 83 8f 9c a9 b6 c3 cf dc E7 f4
02 0f 1c 29 36 43 50 5d 6a 77 84 90 9d aa b7 c4 d0 dd e8 f5
03 10 1d 2a 37 44 51 5e 6b 78 85 91 9e ab b8 c5 d1 de e9 f6
04 11 1e 2b 38 45 52 5f 6c 79 86 92 9f ac b9 c6 d2 df ea f7
05 12 1f 2c 39 46 53 60 6d 7a 87 93 a0 ad ba c7 d3 e0 eb f8
06 13 20 2d 3a 47 54 61 6e 7b 88 94 a1 ae bb c8 d4 e1 ec f9
07 14 21 2e 3b 48 55 62 6f 7c 89 95 a2 af bc c9 d5 e2 ed fa
08 15 22 2f 3c 49 56 63 70 7d 8a 96 a3 b0 bd ca d6 e3 ee fb
09 16 23 30 3d 4a 57 64 71 7e 8b 97 a4 b1 be cb d7 e4 ef fc
0a 17 24 31 3e 4b 58 65 72 7f 8c 98 a5 b2 bf cc d8 E4 f0 fd
0b 18 25 32 3f 4c 59 66 73 80 8d 99 a6 b3 c0 cd d9 e5 f1 fe
0c 19 26 33 40 4d 5a 67 74 81 8e 9a a7 b4 c1 ce da e6 f2 ff
Outstanding. Let s try using gdb to debug a binary that was provided by the Debian archive, and see if it ll load the ELF by build-id from the right .deb in the unstable-debug suite:
$ gdb -q /usr/sbin/netlabelctl
Reading symbols from /usr/sbin/netlabelctl...
Reading symbols from /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug...
(gdb)
Yes! Yes it will!
$ file /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
/usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter *empty*, BuildID[sha1]=e59f81f6573dadd5d95a6e4474d9388ab2777e2a, for GNU/Linux 3.2.0, with debug_info, not stripped

The Bad Linux s support for 9p is mainline, which is great, but it s not robust. Network issues or server restarts will wedge the mountpoint (Linux can t reconnect when the tcp connection breaks), and things that work fine on local filesystems get translated in a way that causes a lot of network chatter for instance, just due to the way the syscalls are translated, doing an ls, will result in a stat call for each file in the directory, even though linux had just got a stat entry for every file while it was resolving directory names. On top of that, Linux will serialize all I/O with the server, so there s no concurrent requests for file information, writes, or reads pending at the same time to the server; and read and write throughput will degrade as latency increases due to increasing round-trip time, even though there are offsets included in the read and write calls. It works well enough, but is frustrating to run up against, since there s not a lot you can do server-side to help with this beyond implementing the 9P2000.L variant (which, maybe is worth it).

The Ugly Unfortunately, we don t know the file size(s) until we ve actually opened the underlying tar file and found the correct member, so for most files, we don t know the real size to report when getting a stat. We can t parse the tarfiles for every stat call, since that d make ls even slower (bummer). Only hiccup is that when I report a filesize of zero, gdb throws a bit of a fit; let s try with a size of 0 to start:
$ ls -lah /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
-r--r--r-- 1 root root 0 Dec 31 1969 /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
$ gdb -q /usr/sbin/netlabelctl
Reading symbols from /usr/sbin/netlabelctl...
Reading symbols from /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug...
warning: Discarding section .note.gnu.build-id which has a section size (24) larger than the file size [in module /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug]
[...]
This obviously won t work since gdb will throw away all our hard work because of stat s output, and neither will loading the real size of the underlying file. That only leaves us with hardcoding a file size and hope nothing else breaks significantly as a result. Let s try it again:
$ ls -lah /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
-r--r--r-- 1 root root 954M Dec 31 1969 /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
$ gdb -q /usr/sbin/netlabelctl
Reading symbols from /usr/sbin/netlabelctl...
Reading symbols from /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug...
(gdb)
Much better. I mean, terrible but better. Better for now, anyway.

Kilroy was here Do I think this is a particularly good idea? I mean; kinda. I m probably going to make some fun 9p arigato-based filesystems for use around my LAN, but I don t think I ll be moving to use debugfs until I can figure out how to ensure the connection is more resilient to changing networks, server restarts and fixes on i/o performance. I think it was a useful exercise and is a pretty great hack, but I don t think this ll be shipping anywhere anytime soon. Along with me publishing this post, I ve pushed up all my repos; so you should be able to play along at home! There s a lot more work to be done on arigato; but it does handshake and successfully export a working 9P2000.u filesystem. Check it out on on my github at arigato, debugfs and also on crates.io and docs.rs. At least I can say I was here and I got it working after all these years.

Next.