Search Results: "will"

30 June 2025

Russell Coker: Links June 2025

Jonathan McDowell wrote part 2 of his blog series about setting up a voice assistant on Debian; I look forward to reading further posts [1]. I'm working on some related things for Debian that will hopefully work with this. I'm testing out OpenSnitch on Trixie, inspired by this blog post; it's an interesting package [2]. Valerie wrote an informative article about creating mesh networks using LoRa for emergency use [3]. Interesting article about Signal and Windows Recall, which gives us some things to consider regarding ML features on Linux systems [4]. Insightful article about AI and the end of prestige [5]. We should all learn about LLMs. Jonathan Dowland wrote an informative blog post about how to manage namespaces on Linux [6]. The Consumer Rights wiki is a great resource for raising awareness of corporations exploiting their customers for computer-related goods and services [7]. Interesting article about schizophrenia and the cliff-edge function of evolution [8].

Otto Kekäläinen: Corporate best practices for upstream open source contributions

This post is based on a presentation given at the Validos annual members' meeting on June 25th, 2025.
When I started getting into Linux and open source over 25 years ago, the majority of the software development in this area was done by academics and hobbyists. The number of companies participating in open source has since exploded in parallel with the growth of mobile and cloud software, the majority of which is built on top of open source. For example, Android powers most mobile phones today and is based on Linux. Almost all software used to operate large cloud provider data centers, such as AWS or Google, is either open source or made in-house by the cloud provider. Pretty much all companies, regardless of the industry, have been using open source software at least to some extent for years. However, the degree to which they collaborate with the upstream origins of the software varies. I encourage all companies in a technical industry to start contributing upstream. There are many benefits to having a good relationship with your upstream open source software vendors, both for the short term and especially for the long term. Moreover, with the rollout of the CRA in the EU in 2025-2027, the law will require software companies to contribute security fixes upstream to the open source projects their products use. To ensure the process is well managed, business-aligned and legally compliant, there are a few dos and don'ts that are important to be aware of.

Maintain your SBOMs
For every piece of software, regardless of whether the code was written in-house, taken from an open source project, or a combination of these, every company needs to produce a Software Bill of Materials (SBOM). The SBOMs provide a standardized and interoperable way to track what software and which versions are used where, what software licenses apply, who holds the copyright of which component, which security fixes have been applied, and so forth. A catalog of SBOMs, or equivalent, forms the backbone of software supply-chain management in corporations.
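As a minimal sketch of what producing such an SBOM can look like, assuming a scanner such as syft is installed (the output file names and image reference here are just placeholders), a source tree or a container image can be described in a standard format like CycloneDX or SPDX and stored next to the release artifacts:
$ syft dir:. -o cyclonedx-json > sbom-source.cdx.json                # scan the current source tree
$ syft registry.example.com/myapp:1.2.3 -o spdx-json > sbom-image.spdx.json  # scan a (placeholder) container image
The resulting JSON lists components, versions and licenses, which is exactly the information a catalog of SBOMs needs to track.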

Identify your strategic upstream vendors
The SBOMs are likely to reveal that for any piece of non-trivial software, there are hundreds or thousands of upstream open source projects in use. Few organizations have resources to contribute to all of their upstreams. If your organization is just starting to organize upstream contribution activities, identify the key projects that have the largest impact on your business and prioritize forming a relationship with them first. Organizations with a mature contribution process will be collaborating with tens or hundreds of upstreams.

Appoint an internal coordinator and champions
Having a written policy on how to contribute upstream will help ensure a consistent process and avoid common pitfalls. However, a written policy alone does not automatically translate into a well-running process. It is highly recommended to appoint at least one internal coordinator who is knowledgeable about how open source communities work, how software licensing and patents work, and is senior enough to have a good sense of what business priorities to optimize for. In small organizations it can be a single person, while larger organizations typically have a full Open Source Programs Office. This coordinator should oversee the contribution process, track all contributions made across the organization, and further optimize the process by working with stakeholders across the business, including legal experts, business owners and CTOs. The marketing and recruiting folks should also be involved, as upstream contributions will have a reputation-building aspect as well, which can be enhanced with systematic tracking and publishing of activities. Additionally, at least in the beginning, the organization should also appoint key staff members as open source champions. Implementing a new process always includes some obstacles and occasional setbacks, which may discourage employees from putting in the extra effort to reap the full long-term benefits for the company. Having named champions will empower them to make the first few contributions themselves, setting a good example and encouraging and mentoring others to contribute upstream as well.

Avoid excessive approvals
To maintain a high quality bar, it is always good to have all outgoing submissions reviewed by at least one or two people. Two or three pairs of eyeballs are significantly more likely to catch issues that might slip by someone working alone. The review also slows down the process by a day or two, which gives the author time to sleep on it, which usually helps to ensure the final submission is well-thought-out. Do not require more than one or two reviewers. The marginal utility goes quickly to zero beyond a few reviewers, and at around four or five people the effect becomes negative, as the weight of each approval decreases and the reviewers begin to take less personal responsibility. Having too many people in the loop also makes each feedback round slow and expensive, to the extent that the author will hesitate to make updates and ask for re-reviews due to the costs involved. If the organization experiences setbacks due to mistakes slipping through the review process, do not respond by adding more reviewers, as that will just grind the contribution process to a halt. If there are quality concerns, invest in training for engineers, CI systems and perhaps an internal certification program for those making public upstream code submissions. A typical software engineer is more likely to put serious effort into a one-off certification exam and then make multiple high-quality contributions than to improve, or even want to keep contributing upstream, while burdened by a heavy review process for every submission.

Don't expect upstream to accept all code contributions
Sure, identifying the root cause of and fixing a tricky bug or writing a new feature requires significant effort. While an open source project will certainly appreciate the effort invested, it doesn't mean it will always welcome all contributions with open arms. Occasionally, the project won't agree that the code is correct or the feature is useful, and some contributions are bound to be rejected. You can minimize the chance of experiencing rejections by having a solid internal review process that includes assessing how the upstream community is likely to understand the proposal. Sometimes how things are communicated is more important than how they are coded. Polishing inline comments and git commit messages helps ensure high-quality communication, along with a commitment to respond quickly to review feedback and conducting regular follow-ups until a contribution is finalized and accepted.

Start small to grow expertise and reputation
In addition to keeping the open source contribution policy lean and nimble, it is also good to start practical contributions with small issues. Don't aim to contribute massive features until you have a track record of being able to make multiple small contributions. Keep in mind that not all open source projects are equal. Each has its own culture, written and unwritten rules, development process, documented requirements (which may be outdated) and more. Starting with a tiny contribution, even just a typo fix, is a good way to validate how code submissions, reviews and approvals work in a particular project. Once you have staff who have successfully landed smaller contributions, you can start planning larger proposals. The exact same proposal might be unsuccessful when proposed by a new person, and successful when proposed by a person who already has a reputation for prior high-quality work.

Embrace any and all publicity you get
Some companies have concerns about their employees working in the open. Indeed, every email and code patch an employee submits, and all related discussions, become public. This may initially sound scary, but it is actually a potential source of good publicity. Employees need to be trained on how to conduct themselves publicly, and the discussions about code should contain only information strictly related to the code, without any references to actual production environments or other sensitive information. In the long run most employees contributing have a positive impact and the company should reap the benefits of positive publicity. If there are quality issues or employee judgment issues, hiding the activity or forcing employees to contribute under pseudonyms is not a proper solution. Instead, the problems should be addressed at the root, and bad behavior addressed rather than tolerated. When people are working publicly, there tends to also be some degree of additional pride involved, which motivates people to try their best. Contributions need to be public for the sponsoring corporation to later be able to claim copyright or licenses. Considering that thousands of companies participate in open source every day, the prevalence of bad publicity is quite low, and the benefits far exceed the risks.

Scratch your own itch
When choosing what to contribute, select things that benefit your own company. This is not purely about being selfish: often the people working on resolving a problem they suffer from are the same people with the best expertise on what the problem is and what kind of solution is optimal. Also, the issues that are most pressing to your company are more likely to be universally useful to solve than any random bug or feature request in the upstream project's issue tracker.

Remember there are many ways to help upstream
While submitting code is often considered the primary way to contribute, please keep in mind there are also other highly impactful ways to contribute. Submitting high-quality bug reports will help developers quickly identify and prioritize issues to fix. Providing good research, benchmarks, statistics or feedback helps guide development and helps the project make better design decisions. Documentation, translations, organizing events and providing marketing support can help increase adoption and strengthen long-term viability for the project. In some of the largest open source projects there are already far more pending contributions than the core maintainers can process. Therefore, developers who contribute code should also get into the habit of contributing reviews. As Linus's law states, given enough eyeballs, all bugs are shallow. Reviewing other contributors' submissions will help improve quality, and also alleviate the pressure on core maintainers who are otherwise the only ones providing feedback. Reviewing code submitted by others is also a great learning opportunity for the reviewer. The reviewer does not need to be better than the submitter - any feedback is useful; merely posting review feedback is not the same thing as making an approval decision. Many projects are also happy to accept monetary support and sponsorships. Some offer specific perks in return. By human nature, the largest sponsors always get their voice heard in important decisions, as no open source project wants to take actions that scare away major financial contributors.

Starting is the hardest part
Long-term success in open source comes from a positive feedback loop of an ever-increasing number of users and collaborators. As seen in the examples of countless corporations contributing to open source, the benefits are concrete, and the process usually runs well after the initial ramp-up and organizational learning phase has passed. In open source ecosystems, contributing upstream should be as natural as paying vendors in any business. If you are using open source and not contributing at all, you likely have latent business risks without realizing it. You don't want to wake up one morning to learn that your top talent left because they were forbidden from participating in open source for the company's benefit, or that you were fined due to CRA violations and mismanagement in sharing security fixes with the correct parties. The faster you start with the process, the less likely those risks will materialize.

29 June 2025

Matthias Geiger: Hello world

I finally got around to setting up a blog with pelican as SSG, so here I will be posting about my various Debian-related activities.

Sergio Cipriano: How I deployed this Website

I will describe the step-by-step process I followed to make this static website accessible on the Internet.

DNS
I bought this domain on NameCheap and am using their DNS for now, where I created these records:
Record Type    Host                  Value
A              sergiocipriano.com    201.54.0.17
CNAME          www                   sergiocipriano.com
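As a quick sanity check, the records can be queried directly with dig once they have propagated:
$ dig +short A sergiocipriano.com          # should print 201.54.0.17
$ dig +short CNAME www.sergiocipriano.com  # should print sergiocipriano.com.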

Virtual Machine
I am using Magalu Cloud for hosting my VM, since employees have free credits. Besides creating a VM with a public IP, I only needed to set up a Security Group with the following rules:
Type           Protocol    Port    Direction    CIDR
IPv4 / IPv6    TCP         80      IN           Any IP
IPv4 / IPv6    TCP         443     IN           Any IP
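Once a service is listening on those ports, reachability from outside can be sanity-checked with netcat (a quick check, assuming nc is available on the client machine):
$ nc -zv sergiocipriano.com 80
$ nc -zv sergiocipriano.com 443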

Firewall
The first thing I did in the VM was enable ufw (Uncomplicated Firewall). Enabling ufw without pre-allowing SSH is a common pitfall that can lock you out of your VM. I did this once :) A safe way to enable ufw:
$ sudo ufw allow OpenSSH      # or: sudo ufw allow 22/tcp
$ sudo ufw allow 'Nginx Full' # or: sudo ufw allow 80,443/tcp
$ sudo ufw enable
To check if everything is ok, run:
$ sudo ufw status verbose
Status: active
Logging: on (low)
Default: deny (incoming), allow (outgoing), disabled (routed)
New profiles: skip
To                           Action      From
--                           ------      ----
22/tcp (OpenSSH)             ALLOW IN    Anywhere                  
80,443/tcp (Nginx Full)      ALLOW IN    Anywhere                  
22/tcp (OpenSSH (v6))        ALLOW IN    Anywhere (v6)             
80,443/tcp (Nginx Full (v6)) ALLOW IN    Anywhere (v6) 

Reverse Proxy
I'm using Nginx as the reverse proxy. Since I use the Debian package, I just needed to add this file:
/etc/nginx/sites-enabled/sergiocipriano.com
with this content:
server {
    listen 443 ssl;      # IPv4
    listen [::]:443 ssl; # IPv6
    server_name sergiocipriano.com www.sergiocipriano.com;
    root /path/to/website/sergiocipriano.com;
    index index.html;
    location / {
        try_files $uri /index.html;
    }
}

server {
    listen 80;
    listen [::]:80;
    server_name sergiocipriano.com www.sergiocipriano.com;
    # Redirect all HTTP traffic to HTTPS
    return 301 https://$host$request_uri;
}

TLS
It's really easy to set up TLS thanks to Let's Encrypt:
$ sudo apt-get install certbot python3-certbot-nginx
$ sudo certbot install --cert-name sergiocipriano.com
Saving debug log to /var/log/letsencrypt/letsencrypt.log
Deploying certificate
Successfully deployed certificate for sergiocipriano.com to /etc/nginx/sites-enabled/sergiocipriano.com
Successfully deployed certificate for www.sergiocipriano.com to /etc/nginx/sites-enabled/sergiocipriano.com
Certbot will edit the nginx configuration with the path to the certificate.
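The Debian certbot package also ships a systemd timer that renews certificates automatically; assuming the default setup, a dry run confirms renewal will work:
$ sudo certbot renew --dry-run
$ systemctl list-timers certbot.timer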

HTTP Security Headers
I decided to use wapiti, which is a web application vulnerability scanner, and the report found these problems:
  1. CSP is not set
  2. X-Frame-Options is not set
  3. X-XSS-Protection is not set
  4. X-Content-Type-Options is not set
  5. Strict-Transport-Security is not set
I'll explain one by one:
  1. The Content-Security-Policy header prevents XSS and data injection by restricting sources of scripts, images, styles, etc.
  2. The X-Frame-Options header prevents a website from being embedded in iframes (clickjacking).
  3. The X-XSS-Protection header is deprecated. It is recommended that CSP is used instead of XSS filtering.
  4. The X-Content-Type-Options header stops MIME-type sniffing to prevent certain attacks.
  5. The Strict-Transport-Security header informs browsers that the host should only be accessed using HTTPS, and that any future attempts to access it using HTTP should automatically be upgraded to HTTPS. Additionally, on future connections to the host, the browser will not allow the user to bypass secure connection errors, such as an invalid certificate. HSTS identifies a host by its domain name only.
I added these security headers inside the HTTPS and HTTP server blocks, outside the location block, so they apply globally to all responses. Here's what the Nginx config looks like:
add_header Content-Security-Policy "default-src 'self'; style-src 'self';" always;
add_header X-Frame-Options "DENY" always;
add_header X-Content-Type-Options "nosniff" always;
add_header Strict-Transport-Security "max-age=31536000; includeSubDomains" always;
I added always to ensure that nginx sends the headers regardless of the response code. To add the Content-Security-Policy header I had to move the CSS to a separate file, because browsers block inline styles under a strict CSP unless you allow them explicitly. They're considered unsafe inline styles unless you move them to a separate file and link it like this:
<link rel="stylesheet" href="./assets/header.css">
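To verify the headers are actually being sent, a quick check with curl (assuming the site is already live) looks like this:
$ curl -sI https://sergiocipriano.com | grep -iE 'content-security|x-frame|x-content-type|strict-transport'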

26 June 2025

Bits from Debian: AMD Platinum Sponsor of DebConf25

We are pleased to announce that AMD has committed to sponsor DebConf25 as a Platinum Sponsor. The AMD ROCm platform includes programming models, tools, compilers, libraries, and runtimes for AI and HPC solution development on AMD GPUs. Debian is an officially supported platform for AMD ROCm and a growing number of components are now included directly in the Debian distribution. For more than 55 years AMD has driven innovation in high-performance computing, graphics and visualization technologies. AMD is deeply committed to supporting and contributing to open-source projects, foundations, and open-standards organizations, taking pride in fostering innovation and collaboration within the open-source community. With this commitment as Platinum Sponsor, AMD is contributing to the annual Debian Developers Conference, directly supporting the progress of Debian and Free Software. AMD contributes to strengthening the worldwide community that collaborates on Debian projects year-round. Thank you very much, AMD, for your support of DebConf25! Become a sponsor too! DebConf25 will take place from 14 to 20 July 2025 in Brest, France, and will be preceded by DebCamp, from 7 to 13 July 2025. DebConf25 is accepting sponsors! Interested companies and organizations may contact the DebConf team through sponsors@debconf.org, and visit the DebConf25 website at https://debconf25.debconf.org/sponsors/become-a-sponsor/.

24 June 2025

Evgeni Golov: Using LXCFS together with Podman

JP was puzzled that using podman run --memory=2G would not result in the 2G limit being visible inside the container. While we were able to identify this as a visualization problem (tools like free(1) only look at /proc/meminfo, which is not virtualized inside a container; you'd have to look at /sys/fs/cgroup/memory.max and friends instead), I couldn't leave it at that. And then I remembered there is actually something that can provide a virtual (cgroup-aware) /proc for containers: LXCFS! But does it work with Podman?! I always used it with LXC, but there is technically no reason why it wouldn't work with a different container solution; cgroups are cgroups, after all. As we all know: there is only one way to find out! Take a fresh Debian 12 VM, install podman and verify things behave as expected:
user@debian12:~$ podman run -ti --rm --memory=2G centos:stream9
bash-5.1# grep MemTotal /proc/meminfo
MemTotal:        6067396 kB
bash-5.1# cat /sys/fs/cgroup/memory.max
2147483648
And after installing (and starting) lxcfs, we can use the virtual /proc/meminfo it generates by bind-mounting it into the container (LXC does that part automatically for us):
user@debian12:~$ podman run -ti --rm --memory=2G --mount=type=bind,source=/var/lib/lxcfs/proc/meminfo,destination=/proc/meminfo centos:stream9
bash-5.1# grep MemTotal /proc/meminfo
MemTotal:        2097152 kB
bash-5.1# cat /sys/fs/cgroup/memory.max
2147483648
The same of course works with all the other proc entries lxcfs provides (cpuinfo, diskstats, loadavg, meminfo, slabinfo, stat, swaps, and uptime here), just bind-mount them. And yes, free(1) now works too!
bash-5.1# free -m
               total        used        free      shared  buff/cache   available
Mem:            2048           3        1976           0          67        2044
Swap:              0           0           0
Just don't blindly mount the whole /var/lib/lxcfs/proc over the container's /proc. It did work (as in: "bash and free didn't crash") for me, but with /proc/$PID etc missing, I bet things will go south pretty quickly.
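As a sketch (the loop is just an illustration, using the same paths as above), the per-file bind mounts for everything lxcfs provides could be assembled like this:
user@debian12:~$ mounts=""
user@debian12:~$ for f in cpuinfo diskstats loadavg meminfo slabinfo stat swaps uptime; do
>   mounts="$mounts --mount=type=bind,source=/var/lib/lxcfs/proc/$f,destination=/proc/$f"
> done
user@debian12:~$ podman run -ti --rm --memory=2G $mounts centos:stream9
That keeps /proc/$PID and friends intact while still giving the container the cgroup-aware views.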

23 June 2025

Russell Coker: PFAs

For some time I've been noticing news reports about PFAs [1]. I hadn't thought much about that issue; I grew up when leaded petrol was standard, when almost all thermometers had mercury, when all small batteries had mercury, and I had generally considered that I had already had so many nasty chemicals in my body that as long as I don't eat bottom-feeding seafood often I didn't have much to worry about. I already had a higher risk of a large number of medical issues than I'd like due to decisions made before I was born, and there's not much to do about it given that there are regulations restricting the emissions of lead, mercury etc. I just watched a Veritasium video about Teflon and the PFA poisoning related to its production [2]. This made me realise that it's more of a problem than I thought and it's a problem that's getting worse. PFA levels in the parts-per-trillion range in the environment can cause parts-per-billion levels in the body, which increases the risks of several cancers and causes other health problems. Fortunately there is some work being done on water filtering; you can get filters for home use now and they are working on filters that can work at a sufficient scale for a city water plant. There is a map showing PFAs in the environment in Australia which shows some sites with concerning levels that are near residential areas [3]. One of the major causes for that in Australia is fire retardant foam; Australia has never had much, if any, Teflon manufacturing AFAIK. Also they noted that donating blood regularly can decrease levels of PFAs in the bloodstream. So presumably people who have medical conditions that require receiving donated blood regularly will have really high levels.

22 June 2025

Iustin Pop: Coding, as we knew it, has forever changed

Back when I was terribly naïve
When I was younger, and definitely naïve, I was so looking forward to AI, which would help us write lots of good, reliable code faster. Well, principally me, not thinking what impact it would have industry-wide. Other more general concerns, like societal issues, the role of humans in the future and so on, were totally not on my radar. At the same time, I didn't expect this would actually happen. Even years later, things didn't change dramatically. Even the first release of ChatGPT a few years back didn't click for me, as the limitations were still significant.

Hints of serious change
The first hint of the change, for me, was when a few months ago (yes, behind the curve), I asked ChatGPT to re-explain a concept to me, and it just wrote a lot of words, but without a clear explanation. On a whim, I asked Grok (then recently launched, I think) to do the same. And for the first time, the explanation clicked and I felt I could have a conversation with it. Of course, now I've forgotten that theoretical CS concept again, but the first step was done: I can ask an LLM to explain something, and it will, and I can have a back-and-forth logical discussion, even if on some theoretical concept. Additionally, I learned that not all LLMs are the same, and that means there's real competition and that leapfrogging is possible. Another topic on which I tried to adopt early and failed to get mileage out of was GitHub Copilot (in VSC). I tried it, it helped, but I didn't feel any speed-up at all. Then more recently, in May, I asked Grok what's the state of the art in AI-assisted coding. It said either Claude in a browser tab, or in VSC via the continue.dev extension. The continue.dev extension/tooling is a bit of a strange/interesting thing. It seems to want to be a middle-man between the user and actual LLM services, i.e. you pay a subscription to continue.dev, not to Anthropic itself, and they manage the keys/APIs for whatever backend LLMs you want to use. The integration with Visual Studio Code is very nice, but I don't know if their business model will make sense long-term. Well, not my problem.

Claude: reverse engineering my old code and teaching new concepts
So I installed the latter and subscribed, thinking 20 CHF for a month is good for testing. I skipped the tutorial model/assistant, created a new one from scratch, just enabled Claude 3.7 Sonnet, and started using it. And then, my mind was blown, not just by the LLM, but by the ecosystem. As said, I've used GitHub Copilot before, but it didn't seem effective. I don't know if a threshold has been reached, or Claude (3.7 at that time) is just better than ChatGPT. I didn't use the AI to write (non-trivial) code for me, at most boilerplate snippets. But I used it both as a partner for discussion - "I want to do X, what do you think, A or B?" - and as a teacher, especially for frontend topics, which I'm not familiar with. Since May, in mostly fragmented sessions, I've achieved more than in the last two years. Migration from old-school JS to ECMA modules, a webpacker (reducing bundle size by 50%), replacing an old JavaScript library with hand-written code using modern APIs, implementing the zoom feature together with all of keyboard, mouse, touchpad and touchscreen support, simplifying layout from manually computed to automatic layout, and finding a bug in WebKit for which it also wrote a cool minimal test (cool, as in, way better than I'd have ever, ever written, because for me it didn't matter that much). And more. Could I have done all this? Yes, definitely, nothing was especially tricky here. But hours and hours of reading MDN, scouring Stack Overflow and Reddit, and lots of trial and error. So doable, but much more toily. This, to me, feels like cheating. 20 CHF per month to make me 3x more productive is free money - well, except that I don't make money on my code, which is written basically for myself. However, I don't get stuck anymore searching for hours on the web for guidance; I ask my question, and I get at least a direction if not an answer, and I'm finished way earlier. I can now actually juggle more hobbies in the same amount of time, if my personal code takes less time - or, differently said, if I'm more efficient at it. Not all is roses, of course. Once, it did write code with such an endearing error that it made me laugh. It was so blatantly obvious that you shouldn't keep other state in the array that holds pointer status, because that confuses the calculation of "how many pointers are down" - probably obvious to itself too, if I'd have asked. But I didn't, since it felt a bit embarrassing to point out such a dumb mistake. Yes, I'm anthropomorphising again, because this is the easiest way to deal with things. In general, it does an OK-to-good-to-sometimes-awesome job, and the best thing is that it summarises documentation and all of Reddit and Stack Overflow. And gives links to those. Now, I have no idea yet what this means for the job of a software engineer. If on open source code, my own code, it makes me 3x faster (reverse engineering my code from 10 years ago is no small feat), then for working on large codebases it should do at least the same, if not more. As an example of how open-ended the assistance can be, at one point I started implementing a new feature - threading a new attribute to a large number of call points. This is not complex at all, just add a new field to a Haskell record, and modify everything to take it into account, populate it, merge it when merging the data structures, etc.
The code is not complex, tending toward boilerplate a bit, and I was wondering about a few possible choices for implementation, so, with just a few lines of code written that were not even compiling, I asked "I want to add a new feature, should I do A or B if I want it to behave like this?", and the answer was something along the lines of "I see you want to add the specific feature I was working on, but the implementation is incomplete, you still need to do X, Y and Z". My mind was blown at this point, as I thought, if the code doesn't compile, surely the computer won't be able to parse it; but this is not a program, this is an LLM, so of course it could read it kind of as a human would. Again, the code complexity is not great, but the fact that it was able to read a half-written patch, understand what I was working towards, and reason about it, was mind-blowing, and scary. Like always.

Non-code writing
Now, after all this, while writing a recent blog post, I thought: this is going to be public anyway, so let me ask Claude what it thinks about it. And I was very surprised, again: gone was all the pain of rereading my post three times to catch typos (easy) or phrasing and structure issues. It gave me very clear points, and helped me cut 30-40% of the total time. So not only coding, but wordsmithing too has changed. If I were an author, I'd be delighted (and scared). Here is the overall reply it gave me:
  • Spelling and grammar fixes, all of them on point except one mistake (I claimed I didn't capitalize one word, but I did). To the level of a good grammar checker.
  • Flow suggestions, which were way beyond normal spelling and grammar. It felt like a teacher telling me to do better in my writing, i.e. nitpicking on things that actually were true even if they'd still work. I.e. lousy phrase structure: still understandable, but lousy nevertheless.
  • Other notes: an overall summary. This was mostly just "praising my post". I wish LLMs were not so focused on "praise the user".
So yeah, this speeds me up to about 2x on writing blog posts, too. It definitely feels not fair.

Whither the future?
After all this, I'm a bit flabbergasted. Gone are the 2000s with code without unit tests, gone are the 2010s without CI/CD, and now, mid-2020s, gone is the lone programmer that scours the internet to learn new things, alone? What this all means for our skills in software development, I have no idea, except I know things have irreversibly changed (a Butlerian jihad aside). Do I learn better with a dedicated tutor even if I don't fight with the problem for so long? Or is struggling to find good docs the main method of learning? I don't know yet. I feel like I understand the topics I'm discussing with the AI, but who knows what it will mean long term for the stickiness of that learning. For better or for worse, things have changed. After all the advances over the last five centuries in the mechanical sciences, it has now come to some aspects of intellectual work. Maybe this is the answer to the ever-growing complexity of tech stacks? I.e. a return of the lone programmer that builds things end-to-end, but with AI taming the complexity added in the last 25 years? I can dream, of course, but this also means that the industry overall will increase in complexity even more, because large companies tend to do that, so maybe the net effect is not much. One thing I did learn so far is that my expectation that AI (at this level) will only help junior/beginner people, i.e. that it would flatten the skills band, is not true. I think AI can speed up at least the middle band, likely the middle-top band; I don't know about the 10x programmers (I'm not one of them). So, my question about AI now is how to best use it, not to lament how all my learning (90% self-learning, to be clear) is obsolete. No, it isn't. AI helped me start and finish one migration (that I had delayed for ages), then start the second, in the same day. At the end of this somewhat rambling reflection on the past month and a half, I still have many questions about AI and humanity. But one has been answered: yes, "AI", quotes or no quotes, has already changed this field (producing software), and we've not seen the end of it, for sure.

Sahil Dhiman: Case of (broken) maharashtra.gov.in Authoritative Name Servers

Maharashtra is a state here in India, which has Mumbai, the financial capital of India, as its capital. maharashtra.gov.in is the official website of the State Government of Maharashtra. We're going to talk about the authoritative name servers serving it (and a bunch of child zones under maharashtra.gov.in). Here's a simple trace for the main domain:
$ dig +trace maharashtra.gov.in
; <<>> DiG 9.18.33-1~deb12u2-Debian <<>> +trace maharashtra.gov.in
;; global options: +cmd
.            33128    IN    NS    j.root-servers.net.
.            33128    IN    NS    h.root-servers.net.
.            33128    IN    NS    l.root-servers.net.
.            33128    IN    NS    k.root-servers.net.
.            33128    IN    NS    i.root-servers.net.
.            33128    IN    NS    g.root-servers.net.
.            33128    IN    NS    f.root-servers.net.
.            33128    IN    NS    e.root-servers.net.
.            33128    IN    NS    b.root-servers.net.
.            33128    IN    NS    d.root-servers.net.
.            33128    IN    NS    c.root-servers.net.
.            33128    IN    NS    m.root-servers.net.
.            33128    IN    NS    a.root-servers.net.
.            33128    IN    RRSIG    NS 8 0 518400 20250704050000 20250621040000 53148 . pGxGZftwj+6VNTSQtstTKVN95Z7/b5Q8GSjRCXI68GoVYbVai9HNelxs OGIRKL4YmSrsiSsndXuEsBuvL9QvQ+qbybNLkekJUAiicKYNgr3KM3+X 69rsS9KxHgT2T8/oqG8KN8EJLJ8VkuM2PJ2HfSKijtF7ULtgBbERNQ4i u2I/wQ7elOyeF2M76iEOa7UGhgiBHSBqPulsbpnB//WbKL71yyFhWSk0 tiFEPuZM+iLrN2qBsElriF4kkw37uRHq8sSGcCjfBVdkpbb3/Sb3sIgN /zKU17f+hOvuBQTDr5qFIymqGAENA5UZ2RQjikk6+zK5EfBUXNpq1+oo 2y64DQ==
;; Received 525 bytes from 9.9.9.9#53(9.9.9.9) in 3 ms
in.            172800    IN    NS    ns01.trs-dns.com.
in.            172800    IN    NS    ns01.trs-dns.net.
in.            172800    IN    NS    ns10.trs-dns.org.
in.            172800    IN    NS    ns10.trs-dns.info.
in.            86400    IN    DS    48140 8 2 5EE4748C2069B99C98BC39A56881A64AF17CC78711E6297D43AC5A4F 4B5BB6E5
in.            86400    IN    RRSIG    DS 8 1 86400 20250704050000 20250621040000 53148 . jkCotYosapreoKKPvr9zPOEDECYVe9OtJLjkQbFfTin8uYbm/kdWzieW CkN5sabif5IHTFU4FEVOShfu4DFeUolhNav56TPKjGqEGjQ7qCghpqTj dNN4iY2s8BcJ2ujHwhm6HRfdbQRVoKYQ73UUZ+oWSute6lXWHE9+Snk2 1ZCAYPdZ2s1s7NZhrZW2YXVw/nHIcRl/rHqWIQ9sgUlsd6MwmahcAAG+ v15HG9Q48rCG1A2gJlJPbxWpVe0EUEu8LzDsp+ORqy1pHhzgJynrJHJz qMiYU0egv2j7xVPSoQHXjx3PG2rsOLNnqDBYCA+piEXOLsY3d+7c1SZl w9u66g==
;; Received 679 bytes from 199.7.83.42#53(l.root-servers.net) in 3 ms
maharashtra.gov.in.    900    IN    NS    ns8.maharashtra.gov.in.
maharashtra.gov.in.    900    IN    NS    ns9.maharashtra.gov.in.
maharashtra.gov.in.    900    IN    NS    ns10.maharashtra.gov.in.
maharashtra.gov.in.    900    IN    NS    ns18.maharashtra.gov.in.
maharashtra.gov.in.    900    IN    NS    ns20.maharashtra.gov.in.
npk19skvsdmju264d4ono0khqf7eafqv.gov.in. 300 IN    NSEC3 1 1 0 - P0KKR4BMBGLJDOKBGBI0KDM39DSM0EA4 NS SOA MX TXT RRSIG DNSKEY NSEC3PARAM
npk19skvsdmju264d4ono0khqf7eafqv.gov.in. 300 IN    RRSIG NSEC3 8 3 300 20250626140337 20250528184339 48544 gov.in. Khcq3n1Jn34HvuBEZExusVqoduEMH6DzqkWHk9dFkM+q0RVBYBHBbW+u LsSnc2/Rqc3HAYutk3EZeS+kXVF07GA/A486dr17Hqf3lHszvG/MNT/s CJfcdrqO0Q8NZ9NQxvAwWo44bCPaECQV+fhznmIaVSgbw7de9xC6RxWG ZFcsPYwYt07yB5neKa99RlVvJXk4GHX3ISxiSfusCNOuEKGy5cMxZg04 4PbYsP0AQNiJWALAduq2aNs80FQdWweLhd2swYuZyfsbk1nSXJQcYbTX aONc0VkYFeEJzTscX8/wNbkJeoLP0r/W2ebahvFExl3NYpb7b2rMwGBY omC/QA==
npk19skvsdmju264d4ono0khqf7eafqv.gov.in. 300 IN    RRSIG NSEC3 13 3 300 20250718144138 20250619135610 22437 gov.in. mbj7td3E6YE7kIhYoSlDTZR047TXY3Z60NY0aBwU7obyg5enBQU9j5nl GUxn9zUiwVUzei7v5GIPxXS7XDpk7g==
6bflkoouitlvj011i2mau7ql5pk61sks.gov.in. 300 IN    NSEC3 1 1 0 - 78S0UO5LI1KV1SVMH1889FHUCNC40U6T TXT RRSIG
6bflkoouitlvj011i2mau7ql5pk61sks.gov.in. 300 IN    RRSIG NSEC3 8 3 300 20250626133905 20250528184339 48544 gov.in. M2yPThQpX0sEf4klooQ06h+rLR3e3Q/BqDTSFogyTIuGwjgm6nwate19 jGmgCeWCYL3w/oxsg1z7SfCvDBCXOObH8ftEBOfLe8/AGHAEkWFSu3e0 s09Ccoz8FJiCfBJbbZK5Vf4HWXtBLfBq+ncGCEE24tCQLXaS5cT85BxZ Zne6Y6u8s/WPgo8jybsvlGnL4QhIPlW5UkHDs7cLLQSwlkZs3dwxyHTn EgjNWClhghGXP9nlvOlnDjUkmacEYeq5ItnCQjYPl4uwh9fBJ9CD/8LV K+Tn3+dgqDBek6+2HRzjGs59NzuHX8J9wVFxP7/nd+fUgaSgz+sST80O vrXlHA==
6bflkoouitlvj011i2mau7ql5pk61sks.gov.in. 300 IN    RRSIG NSEC3 13 3 300 20250718141148 20250619135610 22437 gov.in. raWzWsQnPkXYtr2v1SRH/fk2dEAv/K85NH+06pNUwkxPxQk01nS8eYlq BPQ41b26kikg8mNOgr2ULlBpJHb1OQ==
couldn't get address for 'ns18.maharashtra.gov.in': not found
couldn't get address for 'ns20.maharashtra.gov.in': not found
;; Received 1171 bytes from 2620:171:813:1534:8::1#53(ns10.trs-dns.org) in 0 ms
;; communications error to 10.187.202.24#53: timed out
;; communications error to 10.187.202.24#53: timed out
;; communications error to 10.187.202.24#53: timed out
;; communications error to 10.187.202.28#53: timed out
;; communications error to 10.187.203.201#53: timed out
;; no servers could be reached
Quick takeaway: it's hit or miss for this DNS query resolution.

Looking at in-zone data
Let's look at the NS records added in the zone itself (queried via 9.9.9.9):
$ dig ns maharashtra.gov.in
; <<>> DiG 9.18.33-1~deb12u2-Debian <<>> ns maharashtra.gov.in
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 172
;; flags: qr rd ra; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 3
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;maharashtra.gov.in.        IN    NS
;; ANSWER SECTION:
maharashtra.gov.in.    300    IN    NS    ns8.maharashtra.gov.in.
maharashtra.gov.in.    300    IN    NS    ns9.maharashtra.gov.in.
;; ADDITIONAL SECTION:
ns9.maharashtra.gov.in.    300    IN    A    10.187.202.24
ns8.maharashtra.gov.in.    300    IN    A    10.187.202.28
;; Query time: 180 msec
;; SERVER: 9.9.9.9#53(9.9.9.9) (UDP)
;; WHEN: Sat Jun 21 23:00:49 IST 2025
;; MSG SIZE  rcvd: 115
Pay special attention to the ADDITIONAL SECTION. Running dig ns9.maharashtra.gov.in and dig ns8.maharashtra.gov.in returns these RFC 1918, i.e. private, addresses. This is coming from the zone itself, so the in-zone A records of NS8 and NS9 point to 10.187.202.28 and 10.187.202.24 respectively. Cloudflare's 1.1.1.1 has a slightly different version:
$ dig ns maharashtra.gov.in @1.1.1.1
; <<>> DiG 9.18.33-1~deb12u2-Debian <<>> ns maharashtra.gov.in @1.1.1.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 36005
;; flags: qr rd ra; QUERY: 1, ANSWER: 3, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;maharashtra.gov.in.        IN    NS
;; ANSWER SECTION:
maharashtra.gov.in.    300    IN    NS    ns8.
maharashtra.gov.in.    300    IN    NS    ns10.maharashtra.gov.in.
maharashtra.gov.in.    300    IN    NS    ns9.
;; Query time: 7 msec
;; SERVER: 1.1.1.1#53(1.1.1.1) (UDP)
;; WHEN: Sun Jun 22 10:38:30 IST 2025
;; MSG SIZE  rcvd: 100
Interesting response here for sure :D. The reason for the difference between the responses from 1.1.1.1 and 9.9.9.9 is in the next section.

Looking at the parent zone
gov.in is the parent zone here. Tucows is the operator for gov.in as well as the .in ccTLD zone:
$ dig ns gov.in +short
ns01.trs-dns.net.
ns01.trs-dns.com.
ns10.trs-dns.org.
ns10.trs-dns.info.
Let's take a look at what the parent zone (NS) holds:
$ dig ns maharashtra.gov.in @ns01.trs-dns.net.
; <<>> DiG 9.18.36 <<>> ns maharashtra.gov.in @ns01.trs-dns.net.
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 56535
;; flags: qr rd; QUERY: 1, ANSWER: 0, AUTHORITY: 5, ADDITIONAL: 6
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
; COOKIE: f13027aa39632404010000006856fa2a9c97d6bbc973ba4f (good)
;; QUESTION SECTION:
;maharashtra.gov.in.        IN    NS
;; AUTHORITY SECTION:
maharashtra.gov.in.    900    IN    NS    ns8.maharashtra.gov.in.
maharashtra.gov.in.    900    IN    NS    ns18.maharashtra.gov.in.
maharashtra.gov.in.    900    IN    NS    ns10.maharashtra.gov.in.
maharashtra.gov.in.    900    IN    NS    ns9.maharashtra.gov.in.
maharashtra.gov.in.    900    IN    NS    ns20.maharashtra.gov.in.
;; ADDITIONAL SECTION:
ns20.maharashtra.gov.in. 900    IN    A    52.183.143.210
ns18.maharashtra.gov.in. 900    IN    A    35.154.30.166
ns10.maharashtra.gov.in. 900    IN    A    164.100.128.234
ns9.maharashtra.gov.in.    900    IN    A    103.23.150.89
ns8.maharashtra.gov.in.    900    IN    A    103.23.150.88
;; Query time: 28 msec
;; SERVER: 64.96.2.1#53(ns01.trs-dns.net.) (UDP)
;; WHEN: Sun Jun 22 00:00:02 IST 2025
;; MSG SIZE  rcvd: 248
The ADDITIONAL SECTION gives a completely different picture (different from the in-zone NSes). Maybe this was how it was supposed to be, but none of the IPs listed for NS10, NS18 and NS20 are responding to any DNS query. Assuming NS8 is 103.23.150.88 and NS9 is 103.23.150.89, checking the SOA on each gives the following:
$ dig soa maharashtra.gov.in @103.23.150.88 +short
ns8.maharashtra.gov.in. postmaster.maharashtra.gov.in. 2013116777 1200 600 1296000 300
$ dig soa maharashtra.gov.in @103.23.150.89 +short
ns8.maharashtra.gov.in. postmaster.maharashtra.gov.in. 2013116757 1200 600 1296000 300
NS8 (which is marked as primary in the SOA) has serial 2013116777 and NS9 is on serial 2013116757, so it looks like the sync (IXFR/AXFR) between primary and secondary is broken. That's why NS8 and NS9 are serving different responses, evident from the following:
$ dig ns8.maharashtra.gov.in @103.23.150.88 +short
103.23.150.88
$ dig ns8.maharashtra.gov.in @103.23.150.89 +short
10.187.202.28
$ dig ns9.maharashtra.gov.in @103.23.150.88 +short
103.23.150.89
$ dig ns9.maharashtra.gov.in @103.23.150.89 +short
10.187.202.24
$ dig ns maharashtra.gov.in @103.23.150.88 +short
ns9.
ns8.
ns10.maharashtra.gov.in.
$ dig ns maharashtra.gov.in @103.23.150.89 +short
ns9.maharashtra.gov.in.
ns8.maharashtra.gov.in.
$ dig ns10.maharashtra.gov.in @103.23.150.88 +short
10.187.203.201
$ dig ns10.maharashtra.gov.in @103.23.150.89 +short
# No/empty response ^
This is the reason for difference in 1.1.1.1 and 9.9.9.9 responses in previous section.
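As an aside, dig's +nssearch option is a quick way to spot this kind of primary/secondary drift: it asks each reachable authoritative server for the zone's SOA and prints the serials side by side:
$ dig +nssearch maharashtra.gov.in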

To summarize:
  • Primary and secondary NS aren't in sync. The serials don't match, and NS8 and NS9 respond differently to the same queries.
  • Some NSes have A records with private addresses, not reachable on the internet, so they are lame servers.
  • Incomplete NS names, not even FQDNs in some cases.
  • Differences between the NS records delegated in the parent zone and the NS records added in the zone itself.
  • Name resolution only works in a very particular order (in my initial trace it failed).
Initially, I thought of citing RFCs, but I don't really think it's even required. 1.1.1.1, 8.8.8.8 and 9.9.9.9 are handling this problem (lame servers) well, handing out the A record for the main website, so dig maharashtra.gov.in would mostly pass, and that's the reason I started this post with +trace: to recurse the complete zone and show the problem. For later reference:
$ dig maharashtra.gov.in @8.8.8.8 +short
103.8.188.109

Email to SOA address
I have sent the following email to the address listed in the SOA:
Subject - maharashtra.gov.in authoritative DNS servers not reachable
Hello, I wanted to highlight the confusing state of the maharashtra.gov.in authoritative DNS servers. The parent zone lists the following as name servers for your DNS zone:
  • ns8.maharashtra.gov.in.
  • ns18.maharashtra.gov.in.
  • ns10.maharashtra.gov.in.
  • ns9.maharashtra.gov.in.
  • ns20.maharashtra.gov.in.
Out of these, ns18 and ns20 don't have public A/AAAA records and are thus not reachable. ns10 keeps on shuffling between NO A record and 10.187.203.201 (private, not reachable address). ns8 keeps on shuffling between 103.23.150.88 and 10.187.202.28 (private, not reachable address). ns9 keeps on shuffling between 103.23.150.89 and 10.187.202.24 (private, not reachable address). These are leading to long, broken, or no DNS resolution for the website(s). Can you take a look at the problem? Regards, Sahil
I'll update here if I get a response. Hopefully, they'll listen and fix their problem.

21 June 2025

Ravi Dwivedi: Getting Brunei visa

In December 2024, my friend Badri and I were planning a trip to Southeast Asia. At this point, we were planning to visit Singapore, Malaysia and Vietnam. My Singapore visa had already been approved, and Malaysia was visa-free for us. For Vietnam, we had to apply for an e-visa online. We considered adding Brunei to our itinerary. I saw some videos of the Brunei visa process and got the impression that we needed to go to the Brunei embassy in Kuching, Malaysia in person. However, when I happened to search for Brunei on Organic Maps1, I stumbled upon the Brunei Embassy in Delhi. It seemed to be somewhere in Hauz Khas. As I was going to Delhi to collect my Singapore visa the next day, I figured I'd also visit the Brunei Embassy to get information about the visa process. The next day I went to the location displayed by Organic Maps. It was next to the embassy of Madagascar, and a sign on the road divider confirmed that I was at the right place. That said, it actually looked like someone's apartment. I entered and asked for directions to the Brunei embassy, but the people inside did not seem to understand my query. After some back and forth, I realized that the embassy wasn't there. I then searched for the Brunei embassy on the Internet, and this time I got an address in Vasant Vihar. It seemed like the embassy had been moved from Hauz Khas to Vasant Vihar. Going by the timings mentioned on the web page, the embassy was closing in an hour. I took the Metro from Hauz Khas to Vasant Vihar. After deboarding at the Vasant Vihar metro station, I took an auto to reach the embassy. The address listed on the webpage got me into the correct block. However, the embassy was still nowhere to be seen. I asked around, but the security guards in that area pointed me to the Burundi embassy instead. After some more looking around, I did end up finding the embassy. I spoke to the security guards at the gate and told them that I would like to know the visa process. They dialled a number and asked that person to tell me the visa process. I spoke to a lady on the phone. She listed the documents required for the visa process and mentioned that the timings for visa applications were from 9 o'clock to 11 o'clock in the morning. She also informed me that the visa fee was 1000. I also asked about the process for Badri, who lives far away in Tamil Nadu and cannot report to the embassy physically. She told me that I could submit a visa application on his behalf, along with an authorization letter. Having found the embassy in Delhi was a huge relief. The other plan - going to Kuching, Malaysia - was a bit uncertain, and we didn't know how much time it would take. Getting our passports submitted at an embassy in a foreign country was also not ideal. A few days later, Badri sent me all the documents required for his visa. I went to the embassy and submitted both applications. The lady who collected our visa submissions asked me for our flight reservations from Delhi to Brunei, whereas ours were (keeping with our itinerary) from Kuala Lumpur. She said that she might contact me later if it was required. For reference, here is the list of documents we submitted - I then asked about the procedure to collect the passports and visa results. Usually, embassies will tell you that they will contact you when they have decided on your applications. However, here I was informed that if they don't contact me within 5 days, I can come and collect our passports and visa results between 13:30-14:30 hours on the fifth day.
That was strange :) I did visit the embassy to collect our visa results on the fifth day. However, the lady scolded me for not bringing the receipt she had given me. I was afraid that I might have to go all the way back home and bring the receipt to get our passports. The travel date was close, and it would take some time for Badri to receive his passport via courier as well. Fortunately, she gave me our passports (with the visas attached) and asked me to share a scanned copy of the receipt via email after I got home. We were elated that our visas were approved. Now we could focus on booking our flights. If you are going to Brunei, remember to fill in their arrival card on the website within 48 hours of your arrival! Thanks to Badri and Contrapunctus for reviewing the draft before publishing the article.

  1. Nowadays, I prefer using Comaps instead of Organic Maps and recommend you do the same. Organic Maps had some issues with its governance and the community's issues weren't being addressed.

19 June 2025

Russell Coker: Matching Intel CPUs

To run an SMP system with multiple CPUs you need to have CPUs that are identical; the question is what identical means. In this case I'm interested in Intel CPUs because SMP motherboards and server systems for Intel CPUs are readily available and affordable. There are people selling matched pairs of CPUs on ebay which tend to be more expensive than randomly buying 2 of the same CPU model, so if you can identify 2 CPUs that are identical which are sold separately then you can save some money. Also if you own a two-CPU system with only one CPU installed then buying a second CPU to match the first is cheaper and easier than buying two more CPUs and removing a perfectly working CPU.
e5-2640 v4 cpus
Intel (R) Xeon (R)
E5-2640V4
SR2NZ 2.40GHZ
J717B324 (e4)
7758S4100843
Above is a pic of 2 E5-2640v4 CPUs that were in an SMP system I purchased, along with a plain ASCII representation of the text on one of them. The bottom code (starting with 77) is apparently the serial number; one of the two codes above it is what determines how identical those CPUs are. The code on the same line as the nominal clock speed (in this case SR2NZ) is the spec number, which is sometimes referred to as sspec [1]. The line below the sspec and above the serial number has J717B324, which doesn't have a Google hit. I looked at more than 20 pics of E5-2640v4 CPUs on ebay; they all had the code SR2NZ but had different numbers on the line below. I conclude that the number on the line below probably indicates the model AND stepping while SR2NZ just means E5-2640v4 regardless of stepping. As I wasn't able to find another CPU on ebay with the same number on the line below the sspec, I believe that it will be unreasonably difficult to get a match for an existing CPU. For the purpose of matching CPUs I believe that if the line above the serial number matches then the CPUs can be used together. I am not certain that CPUs with this number slightly mismatching won't work, but I definitely wouldn't want to spend money on CPUs with this number being different.
smpboot: CPU0: Intel(R) Xeon(R) CPU E5-2699A v4 @ 2.40GHz (family: 0x6, model: 0x4f, stepping: 0x1)
When you boot Linux the kernel identifies the CPU in a manner like the above; the combination of family and model seems to map to one spec number. The combination of family, model, and stepping should be all that's required to have them work together. I think that Intel did the wrong thing in not making this clearer. It would have been very easy to print the stepping on the CPU case next to the sspec or the CPU model name. It also wouldn't have been too hard to make the CPU provide the magic number that is apparently the required match for SMP to the OS. Having the Intel web site provide a mapping of those numbers to steppings of CPUs also shouldn't be difficult for them. If anyone knows more about these issues please let me know.
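On a running system the same values can be read from /proc/cpuinfo with standard tools (a quick check; the hex values in the boot message above correspond to the decimal values here):
$ grep -E 'cpu family|^model[[:space:]]*:|stepping' /proc/cpuinfo | sort -u
A family of 6, model 79 and stepping 1 is the decimal form of the 0x6 / 0x4f / 0x1 shown in the kernel line.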

Debian Outreach Team: GSoC 2025 Introduction: Make Debian for Raspberry Pi Build Again

Hello everyone! I am Kurva Prashanth, interested in the lower-level workings of system software, CPUs/SoCs and hardware design. I was introduced to Open Hardware and Embedded Linux while studying electronics and embedded systems as part of robotics coursework. Initially, I did not pay much attention to it and quickly moved on. However, a short talk on Liberating SBCs using Debian by Yuvraj at MiniDebConf India, 2021 caught my interest. The talk focused on Open Hardware platforms such as Olimex and BeagleBone Black, as well as the Debian distributions tailored for these ARM-based single-board computers, and it intrigued me to delve deeper into the realm of Open Hardware and Embedded Linux. These days I'm trying to improve my abilities to contribute to Debian and Linux kernel development. Before finding out about the Google Summer of Code project, I had already started my journey with Debian. I extensively used Debian system build tools (debootstrap, sbuild, dpkg-buildpackage, qemu-debootstrap) for building a Debian image for Bela Cape, a real-time OS for music making, to achieve extremely fast audio and sensor processing times. In 2023, I had the opportunity to attend DebConf23 in Kochi, India - thanks to Nilesh Patra (@nilesh). I met Hector Oron (@zumbi) over dinner at DebConf23, and it was nice talking about his contributions/work at Debian on the armhf port and Debian system administration. That conversation got me interested in knowing more about Debian ARM and the installer, and I found it fascinating that EmDebian was once an external project bringing Debian to embedded systems and that now Debian itself can be run on many embedded systems. Also, during DebCamp I got introduced to PGP/GPG keys and the web of trust by Carlos Henrique Lima Melara (@charles), and I learned how to use and generate GPG keys. After DebConf23 I tried Debian packaging and miserably failed to get sponsorship for a Python library I packaged. I came across the Debian project for this year's Google Summer of Code and found the project titled Make Debian for Raspberry Pi Build Again quite interesting, and applied. Gladly, on May 8th, I received an acceptance e-mail from GSoC. I got excited that I'll spend the summer working on something that I like doing. I am thrilled to be part of this project and I am super excited for the summer of '25. I'm looking forward to working on what I like most, to new connections and to learning opportunities. So, let me talk a bit more about my project. I will be working on making Debian for Raspberry Pi SBCs build again under the guidance of Gunnar Wolf (@gwolf). In this post, I will describe the project I will be working on.

Why make Debian for Raspberry Pi build again?
There is an available set of images for running Debian on Raspberry Pi computers (all models below the 5 series)! However, the maintainer severely lacks the time to take care of them; they have called for somebody to adopt them, but have not been successful. The image generation scripts might have bitrotted a bit, but it is mostly all done. And there is a lot of interest and use still in having the images freshly generated and decently tested! This GSoC project is about getting the Raspberry Pi Debian images site (https://raspi.debian.net/) working reliably, making daily-built images automatic again, and ideally making it easily deployable to run on project machines and migrating the existing hosting infrastructure to Debian.

How much does it differ from the Debian build process?
While the goal is to stay as close as possible to the Debian build process, Raspberry Pi boards require some platform-specific changes, primarily in the early boot sequence and firmware handling. Unlike typical Debian systems, Raspberry Pi boards depend on a non-standard bootloader and use non-free firmware (raspi-firmware), introducing some hardware-specific differences in the initialization process. These differences are largely confined to the early boot and hardware initialization stages. Once the system boots, the userspace remains closely aligned with a typical Debian install, using Debian packages. The current modifications are required due to the non-free firmware. However, a few areas merit review and might be worth changing:
  1. Boot flow: Transitioning to a U-Boot based boot process (as used in Debian installer images for many other SBCs) would reduce divergence and better align with Debian Installer.
  2. Current scripts/workarounds: Some existing hacks may now be redundant with recent upstream support and could be removed.
  3. Board-specific images: Shifting to architecture-specific base images with runtime detection could simplify builds and reduce duplication.
Debian is already building SD card images for a wide range of SBCs (e.g., BeagleBone, BananaPi, OLinuXino, Cubieboard, etc.) in installer-arm64/images/u-boot and installer-armhf/images/u-boot; a similar approach for Raspberry Pi could improve maintainability and consistency with Debian's broader SBC support.

Quoted from Mail Discussion Thread with Mentor (Gunnar Wolf)
"One direction we wanted to explore was whether we should still be building one image per family, or whether we could instead switch to one image per architecture (armel, armhf, arm64). There were some details to iron out as RPi3 and RPi4 were quite different, but I think it will be similar to the differences between the RPi 0 and 1, which are handled at first-boot time. To understand what differs between families, take a look at Cyril Brulebois generate-recipe (in the repo), which is a great improvement over the ugly mess I had before he contributed it"
In this project, I intend to build one image per architecture (armel, armhf, arm64) rather than continuing with the current model of building one image per board. This change simplifies image management, reduces redundancy, and leverages dynamic configuration at boot time to support all supported boards within each architecture. By using U-Boot and flash-kernel, we can detect the board type and configure kernel parameters, DTBs, and firmware during the first boot, reducing duplication across images and simplifying the maintenance burden, while still supporting board-specific behavior at runtime. This method aligns with existing practices in the DebianInstaller team, fits Debian's long-term maintainability goals, and better leverages upstream capabilities, ensuring a consistent and scalable boot experience.

To streamline and standardize the process of building bootable Debian images for Raspberry Pi devices, I proposed a new workflow that leverages the U-Boot and flash-kernel Debian packages. This provides a clean, maintainable, and reproducible way to generate images for armel, armhf and arm64 boards. The workflow is built around vmdb2, a lightweight, declarative tool designed to automate the creation of disk images. A typical vmdb2 recipe defines the disk layout, base system installation (via debootstrap), architecture-specific packages, and any custom post-install hooks; the image should include U-Boot (the u-boot-rpi package), flash-kernel, and a suitable Debian kernel package like linux-image-arm64 or linux-image-armmp.

U-Boot serves as the platform's bootloader and is responsible for loading the kernel and initramfs. Unlike the Raspberry Pi's non-free, proprietary bootloader, U-Boot provides an open and scriptable interface, allowing us to follow a more standard Debian boot process. It can be configured to boot using either an extlinux.conf or a boot.scr script generated automatically by flash-kernel. The role of flash-kernel is to bridge Debian's kernel installation system with the specifics of embedded bootloaders like U-Boot. When installed, it automatically copies the kernel image, initrd, and device tree blobs (DTBs) to the /boot partition. It also generates the necessary boot.scr script if the board configuration demands it. To work correctly, flash-kernel requires that the target machine be identified via /etc/flash-kernel/machine, which must correspond to an entry in its internal machine database.

Once the vmdb2 build is complete, the resulting image will contain a fully configured bootable system with all necessary boot components correctly installed. The image can be flashed to an SD card and used to boot the intended device without additional manual configuration. Because all key packages (U-Boot, kernel, flash-kernel) are managed through Debian's package system, kernel updates and boot script regeneration are handled automatically during system upgrades.

Current Workflow: Builds one Image per family The current vmdb2 recipe uses the Raspberry Pi GPU bootloader provided via the raspi-firmware package. This is the traditional boot process followed by Raspberry Pi OS, and it's tightly coupled with firmware files like bootcode.bin, start.elf, and fixup.dat. These files are installed to /boot/firmware, which is mounted from a FAT32 partition labeled RASPIFIRM. The device tree files (*.dtb) are manually copied from /usr/lib/linux-image-*-arm64/broadcom/ into this partition. The kernel is installed via the linux-image-arm64 package, and the boot arguments are injected by modifying /boot/firmware/cmdline.txt using sed commands. Booting depends on the root partition being labeled RASPIROOT, referenced through that file. There is no bootloader like UEFI or U-Boot involved: the Raspberry Pi firmware directly loads the kernel, which is standard for Raspberry Pi boards.
- apt: install
  packages:
    ...
    - raspi-firmware  
The boot partition contents and kernel boot setup are tightly controlled via scripting in the recipe. Limitations of the Current Workflow: While this setup works, it has several drawbacks:
  1. Proprietary and Raspberry Pi specific - It relies on the closed-source GPU bootloader from the raspi-firmware package, which is tightly coupled to specific Raspberry Pi models.
  2. Manual DTB handling - Device tree files are manually copied and hardcoded, making upgrades or board-specific changes error-prone.
  3. Not easily extendable to future Raspberry Pi boards - Any change in bootloader behavior (as seen in the Raspberry Pi 5, which introduces a more flexible firmware boot process) would require significant rework.
  4. No UEFI or U-Boot - The current method bypasses the standard bootloader layers, making it inconsistent with other Debian ARM platforms and harder to maintain long-term.
As Raspberry Pi firmware and boot processes evolve, especially with the introduction of Pi 5 and potentially Pi 6, maintaining compatibility will require more flexibility - something best delivered by adopting U-Boot and flash-kernel.

New Workflow: Building Architecture-Specific Images with vmdb2, U-Boot, flash-kernel, and the Debian Kernel This workflow outlines an improved approach to generating architecture-specific bootable Debian images using vmdb2, U-Boot, flash-kernel, and Debian kernels, moving away from Raspberry Pi's proprietary bootloader to a fully open-source boot process, which improves maintainability, consistency, and cross-board support.

New Method: Shift to U-Boot + flash-kernel U-Boot (via the Debian u-boot-rpi package) and flash-kernel bring the image building process closer to how Debian officially boots ARM devices. flash-kernel integrates with the system's initramfs and kernel packages to install bootloaders, prepare boot.scr or extlinux.conf, and copy kernel/initrd/DTBs to /boot in a format that U-Boot expects. U-Boot will be used as a second-stage bootloader, loaded by the Raspberry Pi's built-in firmware. Once U-Boot is in place, it will read standard boot scripts (boot.scr) generated by flash-kernel, providing a Debian-compatible and board-flexible solution. Extending the YAML spec for the vmdb2 build with U-Boot and flash-kernel: the plan is to improve an existing vmdb2 YAML spec (https://salsa.debian.org/raspi-team/image-specs/raspi_master.yaml) to integrate U-Boot, flash-kernel, and the architecture-specific Debian kernel into the image build process. By incorporating u-boot-rpi and flash-kernel from Debian packages, alongside the standard initramfs-tools, we align the image closer to Debian best practices while supporting both armhf and arm64 architectures. Below are the key additions and adjustments needed in a vmdb2 YAML spec to support this workflow: install U-Boot, flash-kernel, initramfs-tools and the architecture-specific Debian kernel.
- apt: install
  packages:
    - u-boot-rpi
    - flash-kernel
    - initramfs-tools
    - linux-image-arm64 # or linux-image-armmp for armhf 
  tag: tag-root
Replace linux-image-arm64 with the correct kernel package for the specific target architecture. These packages should be added under the tag-root section in the YAML spec for the vmdb2 build recipe. This ensures that the necessary bootloader, kernel, and initramfs tools are included and properly configured in the image. Configure the Raspberry Pi firmware to load U-Boot: install the U-Boot binary as kernel.img in /boot/firmware (we could also download and build U-Boot from source, but Debian provides tested binaries).
- shell: |
    cp /usr/lib/u-boot/rpi_4/u-boot.bin ${ROOT?}/boot/firmware/kernel.img
    echo "enable_uart=1" >> ${ROOT?}/boot/firmware/config.txt
  root-fs: tag-root
This makes the RPi firmware load u-boot.bin instead of the Linux kernel directly. Set up flash-kernel for a Debian-style boot: flash-kernel integrates with initramfs-tools and writes a boot configuration suitable for U-Boot. We need to make sure /etc/flash-kernel/db contains an entry for the board (most Raspberry Pi boards are already supported in Bookworm). Set up /etc/flash-kernel.conf with:
- create-file: /etc/flash-kernel.conf
  contents: |
    MACHINE="Raspberry Pi 4"
    BOOTPART="/dev/disk/by-label/RASPIFIRM"
    ROOTPART="/dev/disk/by-label/RASPIROOT"
  unless: rootfs_unpacked
This allows flash-kernel to write an extlinux.conf or boot.scr into /boot/firmware. Clean up the proprietary/non-free firmware boot flow by removing the direct kernel loading steps:
- shell: |
    rm -f ${ROOT?}/boot/firmware/vmlinuz*
    rm -f ${ROOT?}/boot/firmware/initrd.img*
    rm -f ${ROOT?}/boot/firmware/cmdline.txt
  root-fs: tag-root
Let U-Boot and flash-kernel manage the kernel/initrd and boot parameters instead. Boot flow after this change (an illustrative extlinux.conf stanza is sketched after the list below):
[SoC ROM] -> [start.elf] -> [U-Boot] -> [boot.scr] -> [Linux Kernel]
  1. This still depends on the Raspberry Pi firmware to start, but it only loads U-Boot, not the Linux kernel.
  2. U-Boot gives you more flexibility (e.g., networking, boot menus, signed boot).
  3. Using flash-kernel ensures kernel updates are handled the Debian Installer way.
  4. Test with a serial console (enable_uart=1) in case HDMI doesn t show early boot logs.
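For illustration, this is roughly the kind of boot stanza the flash-kernel/U-Boot distro-boot path is expected to end up using on first boot. It is a sketch only: the file location, console device, labels and paths below are my assumptions, not output taken from an actual build.
# /boot/firmware/extlinux/extlinux.conf (illustrative sketch)
default debian
label debian
    menu label Debian GNU/Linux
    linux /vmlinuz
    initrd /initrd.img
    fdtdir /dtbs
    append root=LABEL=RASPIROOT console=ttyS1,115200 console=tty0 rw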
Advantages of the New Workflow
  1. Open-source boot chain - Replaces the proprietary Raspberry Pi bootloader with upstream U-Boot.
  2. Debian-native tooling - Uses flash-kernel and initramfs-tools to manage boot configuration.
  3. Consistent across boards - Works for both armhf and arm64, unifying the image build process.
  4. Easier to support new boards - Like the Raspberry Pi 5 and future models.
This transition will standardize the image-building process somewhat, making it aligned with upstream Debian Installer workflows.

vmdb2 configuration for arm64 using u-boot and flash-kernel NOTE: This is a baseline example and may require tuning.
# Raspberry Pi arm64 image using U-Boot and flash-kernel
steps:
  # ... (existing mkimg, partitions, mount, debootstrap, etc.) ...
  # Install U-Boot, flash-kernel, initramfs-tools and architecture specific kernel
  - apt: install
    packages:
      - u-boot-rpi
      - flash-kernel
      - initramfs-tools
      - linux-image-arm64 # or linux-image-armmp for armhf
    tag: tag-root
  # Install U-Boot binary as kernel.img in firmware partition
  - shell: |
      cp /usr/lib/u-boot/rpi_arm64/u-boot.bin ${ROOT?}/boot/firmware/kernel.img
      echo "enable_uart=1" >> ${ROOT?}/boot/firmware/config.txt
    root-fs: tag-root
  # Configure flash-kernel for Raspberry Pi
  - create-file: /etc/flash-kernel.conf
    contents: |
      MACHINE="Generic Raspberry Pi ARM64"
      BOOTPART="/dev/disk/by-label/RASPIFIRM"
      ROOTPART="/dev/disk/by-label/RASPIROOT"
    unless: rootfs_unpacked
  # Remove direct kernel boot files from Raspberry Pi firmware
  - shell: |
      rm -f ${ROOT?}/boot/firmware/vmlinuz*
      rm -f ${ROOT?}/boot/firmware/initrd.img*
      rm -f ${ROOT?}/boot/firmware/cmdline.txt
    root-fs: tag-root
  # flash-kernel will manage boot scripts and extlinux.conf
  # Rest of image build continues...
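Once the recipe is complete, the image build itself is a single vmdb2 invocation. The exact flags depend on how the raspi-team tooling drives it; the following is only a sketch of a typical call, with placeholder file names:
sudo vmdb2 --verbose --rootfs-tarball=raspi_arm64.tar.gz --output=raspi_arm64.img raspi_arm64.yaml --log raspi_arm64.log
# The resulting raspi_arm64.img can then be written to an SD card, e.g. with dd or bmaptool.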

Required Changes to Support Raspberry Pi Boards in Debian (flash-kernel + U-Boot)

Overview of Required Changes
Component - Required Task
Debian U-Boot Package - Add build target for rpi_arm64 in u-boot-rpi. Optionally deprecate legacy 32-bit targets.
Debian flash-kernel Package - Add or verify entries in db/all.db for Pi 4, Pi 5, Zero 2W, CM4. Ensure boot script generation works via bootscr.uboot-generic.
Debian Kernel - Ensure DTBs are installed at /usr/lib/linux-image-<version>/ and available for flash-kernel to reference.

flash-kernel

Already Supported Boards in flash-kernel Debian Package https://sources.debian.org/src/flash-kernel/3.109/db/all.db/#L1700
Model Arch DTB-Id
Raspberry Pi 1 A/B/B+, Rev2 armel bcm2835-*
Raspberry Pi CM1 armel bcm2835-rpi-cm1-io1.dtb
Raspberry Pi Zero/Zero W armel bcm2835-rpi-zero*.dtb
Raspberry Pi 2B armhf bcm2836-rpi-2-b.dtb
Raspberry Pi 3B/3B+ arm64 bcm2837-*
Raspberry Pi CM3 arm64 bcm2837-rpi-cm3-io3.dtb
Raspberry Pi 400 arm64 bcm2711-rpi-400.dtb

uboot

Already Supported Boards in Debian U-Boot Package https://salsa.debian.org/installer-team/flash-kernel/-/blob/master/db/all.db

arm64
Model Arch Upstream Defconfig Debian Target
Raspberry Pi 3B arm64 rpi_3_defconfig rpi_3
Raspberry Pi 4B arm64 rpi_4_defconfig rpi_4
Raspberry Pi 3B/3B+/CM3/CM3+/4B/CM4/400/5B/Zero 2W arm64 rpi_arm64_defconfig rpi_arm64
armhf
Model Arch Upstream Defconfig Debian Target
Raspberry Pi 2 armhf rpi_2_defconfig rpi_2
Raspberry Pi 3B (32-bit) armhf rpi_3_32b_defconfig rpi_3_32b
Raspberry Pi 4B (32-bit) armhf rpi_4_32b_defconfig rpi_4_32b
armel
Model Arch Upstream Defconfig Debian Target
Raspberry Pi armel rpi_defconfig rpi
Raspberry Pi 1/Zero armel rpi_0_w rpi_0_w
These boards are already defined in debian/rules under the u-boot-rpi source package, which generates usable U-Boot binaries for the corresponding Raspberry Pi models.

To-Do: Add Missing Board Support to U-Boot and flash-kernel in Debian Several Raspberry Pi models are missing from the Debian U-Boot and flash-kernel packages: upstream support and DTBs exist in the Debian kernel, but the corresponding entries in the flash-kernel database, needed for bootloader installation and initrd handling, are missing. An illustrative flash-kernel entry is sketched after the table below.

Boards Not Yet Supported in flash-kernel Debian Package
Model Arch DTB-Id
Raspberry Pi 3A+ (32 & 64 bit) armhf, arm64 bcm2837-rpi-3-a-plus.dtb
Raspberry Pi 4B (32 & 64 bit) armhf, arm64 bcm2711-rpi-4-b.dtb
Raspberry Pi CM4 arm64 bcm2711-rpi-cm4-io.dtb
Raspberry Pi CM 4S arm64 -
Raspberry Pi Zero 2 W arm64 bcm2710-rpi-zero-2-w.dtb
Raspberry Pi 5 arm64 bcm2712-rpi-5-b.dtb
Raspberry Pi CM5 arm64 -
Raspberry Pi 500 arm64 -
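To give an idea of the work involved, a flash-kernel db entry for one of the missing boards would look roughly like the following. The field values here are my assumptions based on the existing Raspberry Pi entries in all.db (in particular, the Machine string must match /proc/device-tree/model on the target); this is a sketch, not a tested patch.
Machine: Raspberry Pi 5 Model B
Kernel-Flavors: arm64
DTB-Id: bcm2712-rpi-5-b.dtb
Boot-Script-Path: /boot/firmware/boot.scr
U-Boot-Script-Name: bootscr.uboot-generic
Required-Packages: u-boot-tools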

Boards Not Yet Supported in Debian U-Boot Package
Model Arch Upstream defconfig(s)
Raspberry Pi 3A+/3B+ arm64 -, rpi_3_b_plus_defconfig
Raspberry Pi CM 4S arm64 -
Raspberry Pi 5 arm64 -
Raspberry Pi CM5 arm64 -
Raspberry Pi 500 arm64 -
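For reference, building the unified arm64 target from upstream U-Boot by hand looks roughly like this (the cross-compiler prefix is an assumption; the Debian package would wire the same defconfig into its packaging rules instead):
# illustrative manual build of the upstream rpi_arm64 target
make rpi_arm64_defconfig
make CROSS_COMPILE=aarch64-linux-gnu- -j$(nproc)
# the resulting u-boot.bin is what gets installed as /boot/firmware/kernel.img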

So, what next? During the Community Bonding Period, I got hands-on with workflow improvements, set up test environments, and began reviewing Raspberry Pi support in Debian's U-Boot and flash-kernel packages. These are the logs of the project, where I provide weekly reports on the work done; you can check them here: Community Bonding Period logs. My next steps include submitting patches to the u-boot and flash-kernel packages to ensure all missing Raspberry Pi entries are built and shipped, confirming the kernel DTB installation paths, and making sure the necessary files are included for all Raspberry Pi variants. Finally, I plan to validate the changes with test builds on Raspberry Pi hardware. In parallel, I'm organizing my tasks and setting up my environment to contribute more effectively. It's been exciting to explore how things work under the hood and to prepare for a summer of learning and contributing to this great community.

18 June 2025

Sergio Durigan Junior: GCC, glibc, stack unwinding and relocations A war story

I've been meaning to write a post about this bug for a while, so here it is (before I forget the details!). First, I'd like to thank a few people. I'll probably forget some details because it's been more than a week (and life at $DAYJOB moves fast), but we'll see.

The background story Wolfi OS takes security seriously, and one of the things we have is a package which sets the hardening compiler flags for C/C++ according to the best practices recommended by OpenSSF. At the time of this writing, these flags are (in GCC's spec file parlance):
*self_spec:
+ %{!O:%{!O1:%{!O2:%{!O3:%{!O0:%{!Os:%{!0fast:%{!0g:%{!0z:-O2}}}}}}}}} -fhardened -Wno-error=hardened -Wno-hardened %{!fdelete-null-pointer-checks:-fno-delete-null-pointer-checks} -fno-strict-overflow -fno-strict-aliasing %{!fomit-frame-pointer:-fno-omit-frame-pointer} -mno-omit-leaf-frame-pointer
*link:
+ --as-needed -O1 --sort-common -z noexecstack -z relro -z now
The important part for our bug is the usage of -z now and -fno-strict-aliasing. As I was saying, these flags are set for almost every build, but sometimes things don't work as they should and we need to disable them. Unfortunately, one of these problematic cases has been glibc. There was an attempt to enable hardening while building glibc, but that introduced a strange breakage in several of our packages and had to be reverted. Things stayed pretty much the same until a few weeks ago, when I started working on one of my roadmap items: figure out why hardening glibc wasn't working, and get it to work as much as possible.
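As a side note, whether a given shared object actually ended up with immediate binding can be checked from its dynamic section. This is just the quick check I would reach for, not something taken from the original investigation:
$ readelf -d /lib/libc.so.6 | grep NOW
# a BIND_NOW flag (or FLAGS_1: NOW) in the output indicates the object was linked with -z now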

Reproducing the bug I started off by trying to reproduce the problem. It's important to mention this because I often see young engineers forgetting to check if the problem is even valid anymore. I don't blame them; the anxiety to get the bug fixed can be really blinding. Fortunately, I already had one simple test to trigger the failure. All I had to do was install the py3-matplotlib package and then invoke:
$ python3 -c 'import matplotlib'
This would result in an abort with a coredump. I followed the steps above, and readily saw the problem manifesting again. OK, first step done; I wasn't getting out of this one easily.

Initial debug The next step is to actually try to debug the failure. In an ideal world you get lucky and are able to spot what's wrong after just a few minutes. Or even better: you can also devise a patch to fix the bug and contribute it upstream. I installed GDB, and then ran the py3-matplotlib command inside it. When the abort happened, I issued a backtrace command inside GDB to see where exactly things had gone wrong. I got a stack trace similar to the following:
#0  0x00007c43afe9972c in __pthread_kill_implementation () from /lib/libc.so.6
#1  0x00007c43afe3d8be in raise () from /lib/libc.so.6
#2  0x00007c43afe2531f in abort () from /lib/libc.so.6
#3  0x00007c43af84f79d in uw_init_context_1[cold] () from /usr/lib/libgcc_s.so.1
#4  0x00007c43af86d4d8 in _Unwind_RaiseException () from /usr/lib/libgcc_s.so.1
#5  0x00007c43acac9014 in __cxxabiv1::__cxa_throw (obj=0x5b7d7f52fab0, tinfo=0x7c429b6fd218 <typeinfo for pybind11::attribute_error>, dest=0x7c429b5f7f70 <pybind11::reference_cast_error::~reference_cast_error() [clone .lto_priv.0]>)
    at ../../../../libstdc++-v3/libsupc++/eh_throw.cc:93
#6  0x00007c429b5ec3a7 in ft2font__getattr__(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >) [clone .lto_priv.0] [clone .cold] () from /usr/lib/python3.13/site-packages/matplotlib/ft2font.cpython-313-x86_64-linux-gnu.so
#7  0x00007c429b62f086 in pybind11::cpp_function::initialize<pybind11::object (*&)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), pybind11::object, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, pybind11::name, pybind11::scope, pybind11::sibling>(pybind11::object (*&)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), pybind11::object (*)(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >), pybind11::name const&, pybind11::scope const&, pybind11::sibling const&):: lambda(pybind11::detail::function_call&)#1 ::_FUN(pybind11::detail::function_call&) [clone .lto_priv.0] ()
   from /usr/lib/python3.13/site-packages/matplotlib/ft2font.cpython-313-x86_64-linux-gnu.so
#8  0x00007c429b603886 in pybind11::cpp_function::dispatcher(_object*, _object*, _object*) () from /usr/lib/python3.13/site-packages/matplotlib/ft2font.cpython-313-x86_64-linux-gnu.so
...
Huh. Initially this didn't provide me with much information. There was something strange about seeing the abort function being called right after _Unwind_RaiseException, but at the time I didn't pay much attention to it. OK, time to expand our horizons a little. Remember when I said that several of our packages would crash with a hardened glibc? I decided to look for another problematic package so that I could make it crash and get its stack trace. My thinking here was that maybe if I could compare both traces, something would come up. I happened to find an old discussion where Dann Frazier mentioned that Emacs was also crashing for him. He and I share the Emacs passion, and I totally agreed with him when he said that Emacs crashing is priority -1! (I'm paraphrasing). I installed Emacs, ran it, and voilà: the crash happened again. OK, that was good. When I ran Emacs inside GDB and asked for a backtrace, here's what I got:
#0  0x00007eede329972c in __pthread_kill_implementation () from /lib/libc.so.6
#1  0x00007eede323d8be in raise () from /lib/libc.so.6
#2  0x00007eede322531f in abort () from /lib/libc.so.6
#3  0x00007eede262879d in uw_init_context_1[cold] () from /usr/lib/libgcc_s.so.1
#4  0x00007eede2646e7c in _Unwind_Backtrace () from /usr/lib/libgcc_s.so.1
#5  0x00007eede3327b11 in backtrace () from /lib/libc.so.6
#6  0x000059535963a8a1 in emacs_backtrace ()
#7  0x000059535956499a in main ()
Ah, this backtrace is much simpler to follow. Nice. Hmmm. Now the crash is happening inside _Unwind_Backtrace. A pattern emerges! This must have something to do with stack unwinding (or so I thought; keep reading to discover the whole truth). You see, the backtrace function (yes, it's a function) and C++'s exception handling mechanism use similar techniques to do their jobs, and it pretty much boils down to unwinding frames from the stack. I looked into Emacs' source code, specifically the emacs_backtrace function, but could not find anything strange over there. This bug was probably not going to be an easy fix.

The quest for a minimal reproducer Being able to easily reproduce the bug is awesome and really helps with debugging, but even better is being able to have a minimal reproducer for the problem. You see, py3-matplotlib is a huge package and pulls in a bunch of extra dependencies, so it's not easy to ask other people to "just install this big package plus these other dependencies, and then run this command", especially if we have to file an upstream bug and talk to people who may not even run the distribution we're using. So I set out to try and come up with a smaller recipe to reproduce the issue, ideally something that's not tied to a specific package from the distribution. Having all the information gathered from the initial debug session, especially the Emacs backtrace, I thought that I could write a very simple program that just invoked the backtrace function from glibc in order to trigger the code path that leads to _Unwind_Backtrace. Here's what I wrote:
#include <execinfo.h>

int
main(int argc, char *argv[])
{
  void *a[4096];
  backtrace (a, 100);
  return 0;
}
After compiling it, I determined that yes, the problem did happen with this small program as well. There was only a small nuisance: the manifestation of the bug was not deterministic, so I had to execute the program a few times until it crashed. But that's much better than what I had before, and a small price to pay. Having a minimal reproducer pretty much allows us to switch our focus to what really matters. I wouldn't need to dive into Emacs' or Python's source code anymore. At the time, I was sure this was a glibc bug. But then something else happened.
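For completeness, the way I would drive such a reproducer is simply to compile it and loop until it aborts; the file name and flags here are mine, not necessarily what was used:
$ gcc -O2 -o backtrace-repro backtrace-repro.c
$ while ./backtrace-repro; do :; done   # keeps running until the non-deterministic abort kills the loop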

GCC 15 I had to stop my investigation efforts because something more important came up: it was time to upload GCC 15 to Wolfi. I spent a couple of weeks working on this (it involved rebuilding the whole archive, filing hundreds of FTBFS bugs, patching some programs, etc.), and by the end of it the transition went smoothly. When the GCC 15 upload was finally done, I switched my focus back to the glibc hardening problem. The first thing I did was to, yes, reproduce the bug again. It had been a few weeks since I had touched the package, after all. So I built a hardened glibc with the latest GCC and the bug did not happen anymore! Fortunately, the very first thing I thought was "this must be GCC", so I rebuilt the hardened glibc with GCC 14, and the bug was there again. Huh, unexpected but very interesting.

Diving into glibc and libgcc At this point, I was ready to start some serious debugging. And then I got a message on Signal. It was one of those moments where two minds think alike: Gabriel decided to check how I was doing, and I was thinking about him because this involved glibc, and Gabriel contributed to the project for many years. I explained what I was doing, and he promptly offered to help. Yes, there are more people who love low level debugging! We spent several hours going through disassemblies of certain functions (because we didn't have any debug information in the beginning), trying to make sense of what we were seeing. There was some heavy GDB involved; unfortunately I completely lost the session's history because it was done inside a container running inside an ephemeral VM. But we learned a lot. For example:
  • It was hard to actually understand the full stack trace leading to uw_init_context_1[cold]. _Unwind_Backtrace obviously didn't call it (it called uw_init_context_1, but what was that [cold] doing?). We had to investigate the disassembly of uw_init_context_1 in order to determine where uw_init_context_1[cold] was being called.
  • The [cold] suffix is a GCC function attribute that can be used to tell the compiler that the function is unlikely to be reached. When I read that, my mind immediately jumped to "this must be an assertion", so I went to the source code and found the spot.
  • We were able to determine that the return code of uw_frame_state_for was 5, which means _URC_END_OF_STACK. That s why the assertion was triggering.
After finding these facts without debug information, I decided to bite the bullet and recompile GCC 14 with -O0 -g3, so that we could debug what uw_frame_state_for was doing. After banging our heads a bit more, we found that fde is NULL at this excerpt:
// ...
  fde = _Unwind_Find_FDE (context->ra + _Unwind_IsSignalFrame (context) - 1,
                          &context->bases);
  if (fde == NULL)
    {
#ifdef MD_FALLBACK_FRAME_STATE_FOR
      /* Couldn't find frame unwind info for this function.  Try a
         target-specific fallback mechanism.  This will necessarily
         not provide a personality routine or LSDA.  */
      return MD_FALLBACK_FRAME_STATE_FOR (context, fs);
#else
      return _URC_END_OF_STACK;
#endif
    }
// ...
We're debugging on amd64, which means that MD_FALLBACK_FRAME_STATE_FOR is defined and therefore is called. But that's not really important for our case here, because we had established before that _Unwind_Find_FDE would never return NULL when using a non-hardened glibc (or a glibc compiled with GCC 15). So we decided to look into what _Unwind_Find_FDE did. The function is complex because it deals with .eh_frame, but we were able to pinpoint the exact location where find_fde_tail (one of the functions called by _Unwind_Find_FDE) is returning NULL:
if (pc < table[0].initial_loc + data_base)
  return NULL;
We looked at the addresses of pc and table[0].initial_loc + data_base, and found that the former fell within libgcc's text section, while the latter fell within the /lib/ld-linux-x86-64.so.2 text section. At this point, we were already too tired to continue. I decided to keep looking at the problem later and see if I could get any further.
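For anyone wanting to poke at the same spot, this is roughly the kind of GDB session involved once libgcc is rebuilt with debug info. It is a reconstruction rather than a transcript, and the variable names simply follow the find_fde_tail excerpt above:
(gdb) print/x pc
(gdb) print/x table[0].initial_loc + data_base
(gdb) info symbol pc             # which object the address belongs to
(gdb) info proc mappings         # address ranges of libgcc_s.so.1, ld-linux-x86-64.so.2, etc.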

Bisecting GCC The next day, I woke up determined to find what changed in GCC 15 that caused the bug to disappear. Unless you know GCC's internals like they are your own home (which I definitely don't), the best way to do that is to git bisect the commits between GCC 14 and 15. I spent a few days running the bisect. It took me more time than I'd have liked to find the right range of commits to pass to git bisect (because of how branches and tags are done in GCC's repository), and I also had to write some helper scripts (a rough sketch of the bisect driver follows the list) that:
  • Modified the gcc.yaml package definition to make it build with the commit being bisected.
  • Built glibc using the GCC that was just built.
  • Ran tests inside a docker container (with the recently built glibc installed) to determine whether the bug was present.
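The driver itself boils down to standard git bisect run usage; roughly the following, where the tag names and the helper script are illustrative and the real invocation differed:
# looking for the commit where the crash stopped happening, so swap the usual terms
git bisect start --term-old=crashes --term-new=fixed
git bisect fixed releases/gcc-15.1.0
git bisect crashes releases/gcc-14.1.0
git bisect run ./check-hardened-glibc.sh   # hypothetical helper: rebuilds GCC and glibc, runs the reproducer,
                                           # exits 0 while the crash is still present and 1 once it is gone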
At the end, I had a commit to point to:
commit 99b1daae18c095d6c94d32efb77442838e11cbfb
Author: Richard Biener <rguenther@suse.de>
Date:   Fri May 3 14:04:41 2024 +0200
    tree-optimization/114589 - remove profile based sink heuristics
Makes sense, right?! No? Well, it didn't for me either. Even after reading what was changed in the code and the upstream bug fixed by the commit, I was still clueless as to why this change "fixed" the problem (I say "fixed" because it may very well be an unintended consequence of the change, and some other problem might have been introduced).

Upstream takes over After obtaining the commit that possibly fixed the bug, while talking to Dann and explaining what I did, he suggested that I should file an upstream bug and check with them. Great idea, of course. I filed the following upstream bug: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120653 It's a bit long, very dense and complex, but ultimately upstream was able to find the real problem and have a patch accepted in just two days. Nothing like knowing the code base. The initial bug became: https://sourceware.org/bugzilla/show_bug.cgi?id=33088 In the end, the problem was indeed in how the linker defines __ehdr_start, which, according to the code (from elf/dl-support.c):
if (_dl_phdr == NULL)
  {
    /* Starting from binutils-2.23, the linker will define the
       magic symbol __ehdr_start to point to our own ELF header
       if it is visible in a segment that also includes the phdrs.
       So we can set up _dl_phdr and _dl_phnum even without any
       information from auxv.  */


    extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
    assert (__ehdr_start.e_phentsize == sizeof *GL(dl_phdr));
    _dl_phdr = (const void *) &__ehdr_start + __ehdr_start.e_phoff;
    _dl_phnum = __ehdr_start.e_phnum;
  }
But the following definition is the problematic one (from elf/rtld.c):
extern const ElfW(Ehdr) __ehdr_start attribute_hidden;
This symbol (along with its counterpart, __ehdr_end) was being run-time relocated when it shouldn't be. The fix that was pushed added optimization barriers to prevent the compiler from doing the relocations. I don't claim to fully understand what was done here, and Jakub's analysis is a thing to behold, but in the end I was able to confirm that the patch fixed the bug. And in the end, it was indeed a glibc bug.

Conclusion This was an awesome bug to investigate. It's one of those that deserve a blog post, even though some of the final details of the fix flew over my head. I'd like to start blogging more about these sorts of bugs, because I've encountered my fair share of them throughout my career. And it was great being able to do some debugging with another person, exchange ideas, learn things together, and ultimately share that deep satisfaction when we find out why a crash is happening. I have at least one more bug on my TODO list to write about (another one with glibc, but this time I was able to get to the end of it and come up with a patch). Stay tuned. P.S.: After having published the post I realized that I forgot to explain why the -z now and -fno-strict-aliasing flags were important. -z now is the flag that I determined to be the root cause of the breakage. If I compiled glibc with every hardening flag except -z now, everything worked. So initially I thought that the problem had to do with how ld.so was resolving symbols at runtime. As it turns out, this ended up being more a symptom than the real cause of the bug. As for -fno-strict-aliasing, a Gentoo developer who commented on the GCC bug above mentioned that this OpenSSF bug had a good point against using this flag for hardening. I still have to do a deep dive on what was discussed in the issue, but this is certainly something to take into consideration. There's this very good write-up about strict aliasing in general if you're interested in understanding it better.

17 June 2025

Evgeni Golov: Arguing with an AI or how Evgeni tried to use CodeRabbit

Everybody is trying out AI assistants these days, so I figured I'd jump on that train and see how fast it derails. I went with CodeRabbit because I've seen it on YouTube ads work, I guess. I am trying to answer the following questions: To reduce the amount of output and not to confuse contributors, CodeRabbit was configured to only do reviews on demand. What follows is a rather unscientific evaluation of CodeRabbit based on PRs in two Foreman-related repositories, looking at the summaries CodeRabbit posted as well as the comments/suggestions it had about the code. Ansible 2.19 support PR: theforeman/foreman-ansible-modules#1848 summary posted The summary CodeRabbit posted is technically correct.
This update introduces several changes across CI configuration, Ansible roles, plugins, and test playbooks. It expands CI test coverage to a new Ansible version, adjusts YAML key types in test variables, refines conditional logic in Ansible tasks, adds new default variables, and improves clarity and consistency in playbook task definitions and debug output.
Yeah, it does all of that, all right. But it kinda misses the point that the addition here is "Ansible 2.19 support", which starts with adding it to the CI matrix and then adjusting the code to actually work with that version. Also, the changes are not for "clarity" or "consistency", they are fixing bugs in the code that the older Ansible versions accepted, but the new one is more strict about. Then it adds a table with the changed files and what changed in there. To me, as the author, it felt redundant, and IMHO doesn't add any clarity for understanding the changes. (And yes, same "clarity" vs bugfix mistake here, but that makes sense as it apparently mis-identified the change reason.) And then the sequence diagrams... They probably help if you have a dedicated change to a library or a library consumer, but for this PR it's just noise, especially as it only covers two of the changes (addition of 2.19 to the test matrix and a change to the inventory plugin), completely ignoring other important parts. Overall verdict: noise, don't need this. comments posted CodeRabbit also posted 4 comments/suggestions to the changes. Guard against undefined result.task IMHO a valid suggestion, even if on the picky side as I am not sure how to make it undefined here. I ended up implementing it, even if with slightly different (and IMHO better readable) syntax. Inconsistent pipeline in when for composite CV versions That one was funny! The original complaint was that the when condition used slightly different data manipulation than the data that was passed when the condition was true. The code was supposed to do "clean up the data, but only if there are any items left after removing the first 5, as we always want to keep 5 items". And I do agree with the analysis that it's badly maintainable code. But the suggested fix was to re-use the data in the variable we later use for performing the cleanup. While this is (to my surprise!) valid Ansible syntax, it didn't make the code much more readable, as you need to go and look at the variable definition. The better suggestion then came from Ewoud: to compare the length of the data with the number we want to keep. Humans, so smart! But Ansible is not Ewoud's native turf, so he asked whether there is a more elegant way to count how much data we have than to use list count in Jinja (the data comes from a Python generator, so needs to be converted to a list first). And the AI helpfully suggested to use count instead! However, count is just an alias for length in Jinja, so it behaves identically and needs a list. Luckily the AI quickly apologized for being wrong after being pointed at the Jinja source and didn't try to waste my time any further. Had I not known about the count alias, we'd have committed that suggestion and let CI fail before reverting again. Apply the same fix for non-composite CV versions The very same complaint was posted a few lines later, as the logic there is very similar, just with slightly different data to be filtered and cleaned up. Interestingly, here the suggestion also was to use the variable. But there is no variable with the data! The text actually says one needs to "define" it, yet the "committable suggestion" doesn't contain that part. Interestingly, when asked where it sees the "inconsistency" in that hunk, it said the inconsistency is with the composite case above.
That however is nonsense: while we want to keep the same number of composite and non-composite CV versions, the data used in the task is different (it even gets consumed by a totally different playbook), so there can't be any real consistency between the branches. I ended up applying the same logic as suggested by Ewoud above, as that refactoring was possible in a consistent way. Ensure consistent naming for Oracle Linux subscription defaults One of the changes in Ansible 2.19 is that Ansible fails when there are undefined variables, even if they are only undefined for cases where they are unused. CodeRabbit complains that the names of the defaults I added are inconsistent. And that is technically correct. But those names are already used in other places in the code, so I'd have to refactor more to make it work properly. Once pointed at the fact that the variables already exist, the AI is as usual quick to apologize, yay. add new parameters to the repository module PR: theforeman/foreman-ansible-modules#1860 summary posted Again, the summary is technically correct
The repository module was updated to support additional parameters for repository synchronization and authentication. New options were added for ansible collections, ostree, Python packages, and yum repositories, including authentication tokens, filtering controls, and version retention settings. All changes were limited to module documentation and argument specification.
But it doesn't add anything you'd not get from looking at the diff, especially as it contains a large documentation chunk explaining those parameters. No sequence diagram this time. That's a good thing! Overall verdict: noise (even if the amount is small), don't need this. comments posted CodeRabbit generated two comments for this PR. Interestingly, none of them overlapped with the issues ansible-lint and friends found. get rid of the FIXMEs Yepp, that's fair add validation for the new parameters Yepp, I forgot these (not intentionally!). The diff it suggests is nonsense, as it doesn't take into account the existing Ansible and Yum validations, but it clearly has read them as the style etc of the new ones matches. It also managed to group the parameters correctly by repository type, so it's something.
 if module.foreman_params['content_type'] != 'ansible_collection':
     invalid_list = [key for key in ['ansible_collection_requirements'] if key in module.foreman_params]
     if invalid_list:
         module.fail_json(msg="{0} can only be used with content_type 'ansible_collection'".format(",".join(invalid_list)))
+
+# Validate ansible_collection specific parameters
+if module.foreman_params['content_type'] != 'ansible_collection':
+    invalid_list = [key for key in ['ansible_collection_auth_token', 'ansible_collection_auth_url'] if key in module.foreman_params]
+    if invalid_list:
+        module.fail_json(msg="{0} can only be used with content_type 'ansible_collection'".format(",".join(invalid_list)))
+
+# Validate ostree specific parameters
+if module.foreman_params['content_type'] != 'ostree':
+    invalid_list = [key for key in ['depth', 'exclude_refs', 'include_refs'] if key in module.foreman_params]
+    if invalid_list:
+        module.fail_json(msg="{0} can only be used with content_type 'ostree'".format(",".join(invalid_list)))
+
+# Validate python package specific parameters
+if module.foreman_params['content_type'] != 'python':
+    invalid_list = [key for key in ['excludes', 'includes', 'package_types', 'keep_latest_packages'] if key in module.foreman_params]
+    if invalid_list:
+        module.fail_json(msg="{0} can only be used with content_type 'python'".format(",".join(invalid_list)))
+
+# Validate yum specific parameter
+if module.foreman_params['content_type'] != 'yum' and 'upstream_authentication_token' in module.foreman_params:
+    module.fail_json(msg="upstream_authentication_token can only be used with content_type 'yum'")
Interestingly, it also said "Note: If 'python' is not a valid content_type, please adjust the validation accordingly." which is quite a hint at a bug in itself. The module currently does not even allow to create content_type=python repositories. That should have been more prominent, as it's a BUG! parameter persistence in obsah PR: theforeman/obsah#72 summary posted Mostly correct. It did miss-interpret the change to a test playbook as an actual "behavior" change: "Introduced new playbook variables for database configuration" there is no database configuration in this repository, just the test playbook using the same metadata as a consumer of the library. Later on it does say "Playbook metadata and test fixtures", so unclear whether this is a miss-interpretation or just badly summarized. As long as you also look at the diff, it won't confuse you, but if you're using the summary as the sole source of information (bad!) it would. This time the sequence diagram is actually useful, yay. Again, not 100% accurate: it's missing the fact that saving the parameters is hidden behind an "if enabled" flag something it did represent correctly for loading them. Overall verdict: not really useful, don't need this. comments posted Here I was a bit surprised, especially as the nitpicks were useful! Persist-path should respect per-user state locations (nitpick) My original code used os.environ.get('OBSAH_PERSIST_PATH', '/var/lib/obsah/parameters.yaml') for the location of the persistence file. CodeRabbit correctly pointed out that this won't work for non-root users and one should respect XDG_STATE_HOME. Ewoud did point that out in his own review, so I am not sure whether CodeRabbit came up with this on its own, or also took the human comments into account. The suggested code seems fine too just doesn't use /var/lib/obsah at all anymore. This might be a good idea for the generic library we're working on here, and then be overridden to a static /var/lib path in a consumer (which always runs as root). In the end I did not implement it, but mostly because I was lazy and was sure we'd override it anyway. Positional parameters are silently excluded from persistence (nitpick) The library allows you to generate both positional (foo without --) and non-positional (--foo) parameters, but the code I wrote would only ever persist non-positional parameters. This was intentional, but there is no documentation of the intent in a comment which the rabbit thought would be worth pointing out. It's a fair nitpick and I ended up adding a comment. Enforce FQDN validation for database_host The library has a way to perform type checking on passed parameters, and one of the supported types is "FQDN" so a fully qualified domain name, with dots and stuff. The test playbook I added has a database_host variable, but I didn't bother adding a type to it, as I don't really need any type checking here. While using "FQDN" might be a bit too strict here technically a working database connection can also use a non-qualified name or an IP address, I was positively surprised by this suggestion. It shows that the rest of the repository was taken into context when preparing the suggestion. reset_args() can raise AttributeError when a key is absent This is a correct finding, the code is not written in a way that would survive if it tries to reset things that are not set. However, that's only true for the case where users pass in --reset-<parameter> without ever having set parameter before. 
The complaint about the part where the parameter is part of the persisted set but not in the parsed args is wrong as parsed args inherit from the persisted set. The suggested code is not well readable, so I ended up fixing it slightly differently. Persisted values bypass argparse type validation When persisting, I just yaml.safe_dump the parsed parameters, which means the YAML will contain native types like integers. The argparse documentation warns that the type checking argparse does only applies to strings and is skipped if you pass anything else (via default values). While correct, it doesn't really hurt here as the persisting only happens after the values were type-checked. So there is not really a reason to type-check them again. Well, unless the type changes, anyway. Not sure what I'll do with this comment. consider using contextlib.suppress This was added when I asked CodeRabbit for a re-review after pushing some changes. Interestingly, the PR already contained try: except: pass code before, and it did not flag that. Also, the code suggestion contained import contextlib in the middle of the code, instead in the head of the file. Who would do that?! But the comment as such was valid, so I fixed it in all places it is applicable, not only the one the rabbit found. workaround to ensure LCE and CV are always sent together PR: theforeman/foreman-ansible-modules#1867 summary posted
A workaround was added to the _update_entity method in the ForemanAnsibleModule class to ensure that when updating a host, both content_view_id and lifecycle_environment_id are always included together in the update payload. This prevents partial updates that could cause inconsistencies.
Partial updates are not a thing. The workaround is purely for the fact that Katello expects both parameters to be sent, even if only one of them needs an actual update. No diagram, good. Overall verdict: misleading summaries are bad! comments posted Given a small patch, there was only one comment. Implementation looks correct, but consider adding error handling for robustness. This reads correct on the first glance. More error handling is always better, right? But if you dig into the argumentation, you see it's wrong. Either: The AI accepted defeat once I asked it to analyze things in more detail, but why did I have to ask in the first place?! Summary Well, idk, really. Did the AI find things that humans did not find (or didn't bother to mention)? Yes. It's debatable whether these were useful (see e.g. the database_host example), but I tend to be in the "better to nitpick/suggest more and dismiss than oversee" team, so IMHO a positive win. Did the AI output help the humans with the review (useful summary etc)? In my opinion it did not. The summaries were either "lots of words, no real value" or plain wrong. The sequence diagrams were not useful either. Luckily all of that can be turned off in the settings, which is what I'd do if I'd continue using it. Did the AI output help the humans with the code (useful suggestions etc)? While the actual patches it posted were "meh" at best, there were useful findings that resulted in improvements to the code. Was the AI output misleading? Absolutely! The whole Jinja discussion would have been easier without the AI "help". Same applies for the "error handling" in the workaround PR. Was the AI output distracting? The output is certainly a lot, so yes I think it can be distracting. As mentioned, I think dropping the summaries can make the experience less distracting. What does all that mean? I will disable the summaries for the repositories, but will leave the @coderabbitai review trigger active if someone wants an AI-assisted review. This won't be something that I'll force on our contributors and maintainers, but they surely can use it if they want. But I don't think I'll be using this myself on a regular basis. Yes, it can be made "usable". But so can be vim ;-) Also, I'd prefer to have a junior human asking all the questions and making bad suggestions, so they can learn from it, and not some planet burning machine.

Matthew Garrett: Locally hosting an internet-connected server

I'm lucky enough to have a weird niche ISP available to me, so I'm paying $35 a month for around 600MBit symmetric data. Unfortunately they don't offer static IP addresses to residential customers, and nor do they allow multiple IP addresses per connection, and I'm the sort of person who'd like to run a bunch of stuff myself, so I've been looking for ways to manage this.

What I've ended up doing is renting a cheap VPS from a vendor that lets me add multiple IP addresses for minimal extra cost. The precise nature of the VPS isn't relevant - you just want a machine (it doesn't need much CPU, RAM, or storage) that has multiple world routeable IPv4 addresses associated with it and has no port blocks on incoming traffic. Ideally it's geographically local and peers with your ISP in order to reduce additional latency, but that's a nice to have rather than a requirement.

By setting that up you now have multiple real-world IP addresses that people can get to. How do we get them to the machine in your house you want to be accessible? First we need a connection between that machine and your VPS, and the easiest approach here is Wireguard. We only need a point-to-point link, nothing routable, and none of the IP addresses involved need to have anything to do with any of the rest of your network. So, on your local machine you want something like:

[Interface]
PrivateKey = privkeyhere
ListenPort = 51820
Address = localaddr/32

[Peer]
Endpoint = VPS:51820
PublicKey = pubkeyhere
AllowedIPs = VPS/0


And on your VPS, something like:

[Interface]
Address = vpswgaddr/32
SaveConfig = true
ListenPort = 51820
PrivateKey = privkeyhere

[Peer]
PublicKey = pubkeyhere
AllowedIPs = localaddr/32


The addresses here are (other than the VPS address) arbitrary - but they do need to be consistent, otherwise Wireguard is going to be unhappy and your packets will not have a fun time. Bring that interface up with wg-quick and make sure the devices can ping each other. Hurrah! That's the easy bit.

Now you want packets from the outside world to get to your internal machine. Let's say the external IP address you're going to use for that machine is 321.985.520.309 and the wireguard address of your local system is 867.420.696.005. On the VPS, you're going to want to do:

iptables -t nat -A PREROUTING -p tcp -d 321.985.520.309 -j DNAT --to-destination 867.420.696.005

Now, all incoming packets for 321.985.520.309 will be rewritten to head towards 867.420.696.005 instead (make sure you've set net.ipv4.ip_forward to 1 via sysctl!). Victory! Or is it? Well, no.
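If forwarding isn't already enabled on the VPS, it is a one-liner, plus a drop-in file if you want it to survive reboots:
sysctl -w net.ipv4.ip_forward=1
echo 'net.ipv4.ip_forward = 1' > /etc/sysctl.d/99-forwarding.conf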

What we're doing here is rewriting the destination address of the packets so instead of heading to an address associated with the VPS, they're now going to head to your internal system over the Wireguard link. Which is then going to ignore them, because the AllowedIPs statement in the config only allows packets coming from your VPS, and these packets still have their original source IP. We could rewrite the source IP to match the VPS IP, but then you'd have no idea where any of these packets were coming from, and that sucks. Let's do something better. On the local machine, in the peer, let's update AllowedIPs to 0.0.0.0/0 to permit packets from any source to appear over our Wireguard link. But if we bring the interface up now, it'll try to route all traffic over the Wireguard link, which isn't what we want. So we'll add table = off to the interface stanza of the config to disable that, and now we can bring the interface up without breaking everything but still allowing packets to reach us. However, we do still need to tell the kernel how to reach the remote VPN endpoint, which we can do with ip route add vpswgaddr dev wg0. Add this to the interface stanza as:

PostUp = ip route add vpswgaddr dev wg0
PreDown = ip route del vpswgaddr dev wg0


That's half the battle. The problem is that they're going to show up there with the source address still set to the original source IP, and your internal system is (because Linux) going to notice it has the ability to just send replies to the outside world via your ISP rather than via Wireguard and nothing is going to work. Thanks, Linux. Thinux.

But there's a way to solve this - policy routing. Linux allows you to have multiple separate routing tables, and define policy that controls which routing table will be used for a given packet. First, let's define a new table reference. On the local machine, edit /etc/iproute2/rt_tables and add a new entry that's something like:

1 wireguard


where "1" is just a standin for a number not otherwise used there. Now edit your wireguard config and replace table=off with table=wireguard - Wireguard will now update the wireguard routing table rather than the global one. Now all we need to do is to tell the kernel to push packets into the appropriate routing table - we can do that with ip rule add from localaddr lookup wireguard, which tells the kernel to take any packet coming from our Wireguard address and push it via the Wireguard routing table. Add that to your Wireguard interface config as:

PostUp = ip rule add from localaddr lookup wireguard
PreDown = ip rule del from localaddr lookup wireguard

and now your local system is effectively on the internet.
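Putting all the local-machine pieces together, the final wg0 config ends up looking something like the following. The keys and addresses are the same placeholders as above; this is my consolidation of the steps in this post, not a config lifted verbatim from a working setup:

[Interface]
PrivateKey = privkeyhere
ListenPort = 51820
Address = localaddr/32
Table = wireguard
PostUp = ip route add vpswgaddr dev wg0
PreDown = ip route del vpswgaddr dev wg0
PostUp = ip rule add from localaddr lookup wireguard
PreDown = ip rule del from localaddr lookup wireguard

[Peer]
Endpoint = VPS:51820
PublicKey = pubkeyhere
AllowedIPs = 0.0.0.0/0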

You can do this for multiple systems - just configure additional Wireguard interfaces on the VPS and make sure they're all listening on different ports. If your local IP changes then your local machines will end up reconnecting to the VPS, but to the outside world their accessible IP address will remain the same. It's like having a real IP without the pain of convincing your ISP to give it to you.
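For the multiple-systems case, each extra machine just gets its own interface and DNAT rule on the VPS; something along these lines, where every name and address is a placeholder:

# /etc/wireguard/wg1.conf on the VPS, for a second home machine
[Interface]
Address = vpswgaddr2/32
ListenPort = 51821
PrivateKey = otherprivkeyhere

[Peer]
PublicKey = otherpubkeyhere
AllowedIPs = otherlocaladdr/32

# plus a DNAT rule for the second public IP
iptables -t nat -A PREROUTING -p tcp -d secondpublicip -j DNAT --to-destination otherlocaladdr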

comment count unavailable comments

16 June 2025

Paul Tagliamonte: The Promised LAN

The Internet has changed a lot in the last 40+ years. Fads have come and gone. Network protocols have been designed, deployed, adopted, and abandoned. Industries have come and gone. The types of people on the internet have changed a lot. The number of people on the internet has changed a lot, creating an information medium unlike anything ever seen before in human history. There's a lot of good things about the Internet as of 2025, but there's also an inescapable hole in what it used to be, for me. I miss being able to throw a site up to send around to friends to play with without worrying about hordes of AI-feeding HTML combine harvesters DoS-ing my website, costing me thousands in network transfer for the privilege. I miss being able to put a lightly authenticated game server up and not worry too much at night wondering if that process is now mining bitcoin. I miss being able to run a server in my home closet. Decades of cat and mouse games have rendered running a mail server nearly impossible. Those who are brave enough to try are met with weekslong stretches of delivery failures and countless hours yelling ineffectually into a pipe that leads from the cheerful lobby of some disinterested corporation directly into a void somewhere 4 layers below ground level. I miss the spirit of curiosity, exploration, and trying new things. I miss building things for fun without having to worry about being too successful, after which security offices start demanding my supplier paperwork in triplicate as heartfelt thanks from their engineering teams. I miss communities that are run because it is important to them, not for ad revenue. I miss community operated spaces and having more than four websites that are all full of nothing except screenshots of each other. Every other page I find myself on now has an AI generated click-bait title, shared for rage-clicks, all brought-to-you-by-our-sponsors, completely covered wall-to-wall with popup modals telling me how much they respect my privacy, with the real content hidden at the bottom, bracketed by deceptive ads served by companies that definitely know which new coffee shop I went to last month. This is wrong, and those who have seen what was know it. I can't keep doing it. I'm not doing it any more. I reject the notion that this is as it needs to be. It is wrong. The hole left in what the Internet used to be must be filled. I will fill it.

What comes before part b? Throughout the 2000s, some of my favorite memories were from LAN parties at my friends' places. Dragging your setup somewhere, long nights playing games, goofing off, even building software all night to get something working - being able to do something fiercely technical in the context of a uniquely social activity. It wasn't really much about the games or the projects - it was an excuse to spend time together, just hanging out. A huge reason I learned so much in college was that campus was a non-stop LAN party - we could freely stand up servers, talk between dorms on the LAN, and hit my dorm room computer from the lab. Things could go from individual to social in a matter of seconds. The Internet used to work this way - my dorm had public IPs handed out by DHCP, and my workstation could serve traffic from anywhere on the internet. I haven't been back to campus in a few years, but I'd be surprised if this were still the case. In December of 2021, three of us got together and connected our houses together in what we now call The Promised LAN. The idea is simple - fill the hole we feel is gone from our lives. Build our own always-on 24/7 nonstop LAN party. Build a space that is intrinsically social, even though we're doing technical things. We can freely host insecure game servers or one-off side projects without worrying about what someone will do with it. Over the years, it's evolved very slowly - we haven't pulled any all-nighters. Our mantra has become "old growth", building each layer carefully. As of May 2025, the LAN is now 19 friends running around 25 network segments. Those 25 networks are connected to 3 backbone nodes, exchanging routes and IP traffic for the LAN. We refer to the set of backbone operators as "The Bureau of LAN Management". Combined decades of operating critical infrastructure have driven The Bureau to make a set of well-understood, boring, predictable, interoperable and easily debuggable decisions to make this all happen. Nothing here is exotic or even technically interesting.

Applications of trusting trust The hardest part, however, is rejecting the idea that anything outside our own LAN is untrustworthy - nearly irreversible damage inflicted on us by the Internet. We have solved this by not solving it. We strictly control membership - the absolute hard minimum for joining the LAN requires 10 years of friendship with at least one member of the Bureau, with another 10 years of friendship planned. Members of the LAN can veto new members even if all other criteria are met. Even with those strict rules, there's no shortage of friends that meet the qualifications - but we are not equipped to take that many folks on. It's hard to join - both socially and technically. Doing something malicious on the LAN requires a lot of highly technical effort upfront, and it would endanger a decade of friendship. We have relied on those human, social, interpersonal bonds to bring us all together. It's worked for the last 4 years, and it should continue working until we think of something better. We assume roommates, partners, kids, and visitors all have access to The Promised LAN. If they're let into our friends' network, there is a level of trust that works transitively for us - I trust them to be on mine. This LAN is not for "security"; rather, the network border is a social one. Benign hacking - in the original sense of misusing systems to do fun and interesting things - is encouraged. Robust ACLs and firewalls on the LAN are, by definition, an interpersonal - not technical - failure. We all trust every other network operator to run their segment in a way that aligns with our collective values and norms. Over the last 4 years, we've grown our own culture and fads - around half of the people on the LAN have thermal receipt printers with open access, for printing out quips or jokes on each other's counters. It's incredible how much network transport and a trusting culture gets you - there's a 3-node IRC network, exotic hardware to gawk at, radios galore, a NAS storage swap, LAN-only email, and even a SIP phone network of "redphones".

DIY We do not wish to, nor will we, rebuild the internet. We do not wish to, nor will we, scale this. We will never be friends with enough people, as hard as we may try. Participation hinges on us all having fun. As a result, membership will never be open, and we will never have enough connected LANs to deal with the technical and social problems that start to happen with scale. This is a feature, not a bug. This is a call for you to do the same. Build your own LAN. Connect it with friends' homes. Remember what is missing from your life, and fill it in. Use software you know how to operate and get it running. Build slowly. Build your community. Do it with joy. Remember how we got here. Rebuild a community space that doesn't need to be mediated by faceless corporations and ad revenue. Build something sustainable that brings you joy. Rebuild something you use daily. Bring back what we're missing.

Kentaro Hayashi: Fixing long standing font issue about Debian Graphical Installer

Introduction This is just a quick note about how the long-standing font issue in the Debian Graphical Installer was fixed, ready for the upcoming trixie release. Recently, this issue was resolved by Cyril Brulebois. Thanks!

What is the problem? Because of Han unification, the wrong typefaces are rendered by default when you choose the Japanese language in the Graphical Debian Installer.
"Wrong" glyph for Japanese
Most typefaces seem correct, but there are wrong typefaces (Simplified Chinese) used for widget rendering. This issue cannot be solved while DroidSansFallback.ttf continues to be used for Japanese; we need to switch to a font that contains proper Japanese typefaces. If you want to know more about how Han unification is harmful in this context, see:

What causes this problem? In short, fonts-android (DroidSansFallback.ttf) had been used for CJK, especially for Japanese. Since Debian 9 (stretch), fonts-android was adopted for CJK fonts by default. Thus this issue was not resolved during the Debian 9, Debian 10, Debian 11 and Debian 12 release cycles!

What is the impact of this issue? Sadly, Japanese native speakers can still recognize such unexpectedly rendered "wrong" glyphs, so it is not hard to continue the Debian installation process. But even if there is no problem with the installer's functionality, it gives a terrible user experience for newcomers. For example, how much would you trust an installer that is full of typos? It is a similar situation for Japanese users.

How was the Debian Graphical Installer fixed? In short, a new fonts-motoya-l-cedar-udeb package was bundled for Japanese, and the installer was changed to switch to that font via the gtk-set-font command. It was difficult to decide which font best balances file size against legibility; a Japanese font typically adds a few extra MB. Luckily, some space had recently been freed up in the installer, so this was not seen as a problem (I guess). As a bonus, we also investigated the possibility of a font compression mechanism for the installer, but it was regarded as too complicated and not suitable for the trixie release cycle.
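
If you want to see which typeface fontconfig would actually pick for Japanese on an installed system (a general diagnostic, not part of the installer fix itself), something like this works:

# which font is selected first for Japanese text
fc-match 'sans-serif:lang=ja'
# the full fallback order fontconfig would use
fc-match -s 'sans-serif:lang=ja' | head
# list installed font families that declare Japanese coverage
fc-list :lang=ja family | sort -u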

Conclusion
  • The font issue was fixed in the Debian Graphical Installer for Japanese
  • As it was fixed only recently, it is not officially shipped yet (NOTE: Debian Installer Trixie RC1 does not contain this fix). Try a daily build of the installer if you want to test it.
This article was written with an Ultimate Hacking Keyboard 60 v2 with Rizer 60 (my new gear!).

Sven Hoexter: vym 3 Development Version in experimental

Took some time yesterday to upload the current state of what will at some point be vym 3 to experimental. If you're a user of this tool you can give it a try, but be aware that the file format changed and can't be processed with vym releases before 2.9.500! Thus it's important to create a backup until you're sure that you're ready to move on. On the technical side this is also the switch from Qt5 to Qt6.

12 June 2025

Dirk Eddelbuettel: #50: Introducing almm: Activate-Linux (based) Market Monitor

Welcome to post 50 in the R4 series. Today we reconnect to a previous post, namely #36 on pub/sub for live market monitoring with R and Redis. It introduced both Redis as well as the (then fairly recent) extensions to RcppRedis to support the publish-subscribe ("pub/sub") model of Redis. In short, it manages both subscribing clients as well as producers for live, fast and lightweight data transmission. Using pub/sub is generally more efficient than the (conceptually simpler) poll-sleep loops, as polling creates cpu and network load. Subscriptions are lighter-weight as they get notified; they are also a little (but not much!) more involved as they require a callback function. We should mention that Redis has a recent fork in Valkey that arose when the former committed one of these not-uncommon-among-db-companies license suicides - which, happy to say, they reversed more recently, so that we now have both the original as well as this leading fork (among others). Both work, the latter is now included in several Linux distros, and the C library hiredis used to connect to either is still licensed permissively as well. All this came about because Yahoo! Finance recently had another hiccup in which they changed something, leading to some data clients having hiccups. This includes the GNOME applet Stocks Extension I had been running. There is a lively discussion on its issue #120 with suggestions, for example a curl wrapper (which then makes each access a new system call). Separating data acquisition and presentation becomes an attractive alternative, especially given how the standard Python and R accessors to the Yahoo! Finance service continued to work (and how, per post #36, I already run data acquisition). Moreover, and somewhat independently, it occurred to me that the cute (and both funny in its pun, and very pretty in its display) ActivateLinux program might offer an easy-enough way to display updates on the desktop. There were two aspects to address. First, the subscription side needed to be covered in either plain C or C++. That, it turns out, is very straightforward and there are existing documentation and prior examples (e.g. at StackOverflow), as well as the ability to have an LLM generate a quick stanza as I did with Claude. A modified variant is now in the example repo redis-pubsub-examples in file subscriber.c. It is deliberately minimal and the directory does not even have a Makefile: just compile and link against both libevent (for the event loop controlling this) and libhiredis (for the Redis or Valkey connection). This should work on any standard Linux (or macOS) machine with those two (very standard) libraries installed. The second aspect was trickier. While we can get Claude to modify the program to also display under X11, it still uses a single controlling event loop. It took a little bit of probing on my end to understand how to modify (the X11 use of) ActivateLinux, but as always it was reasonably straightforward in the end: instead of one single while loop awaiting events, we now first check for pending events and deal with them if present, but otherwise do not idle and wait, and instead continue in another loop that also checks on the Redis or Valkey pub/sub events. So two thumbs up to vibe coding, which clearly turned me into an X11-savvy programmer too. The result is in a new (and currently fairly bare-bones) repo almm.
It includes all files needed to build the application, borrowed with love from ActivateLinux (which is GPL-licensed, as is of course our minimal extension), and adds the minimal modifications we made, namely linking with libhiredis and some minimal changes to x11/x11.c. (Supporting wayland as well is on the TODO list, and I also need to release a new RcppRedis version to CRAN as one currently needs the GitHub version.) We also made a simple mp4 video with a sound overlay which describes the components briefly. Comments and questions are welcome. I will probably add a little bit of command-line support to almm. Selecting the symbol subscribed to is currently done in the most minimal way via the environment variable SYMBOL (NB: not SYM as the video using the default value shows). I also worked out how to show the display on only one of my multiple monitors, so I may add an explicit screen id selector too. A little bit of discussion (including minimal Docker use around r2u) is also in issue #121 where I first floated the idea of having StocksExtension listen to Redis (or Valkey). Other suggestions are most welcome, please use issue tickets at the almm repository.
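
As a rough illustration of the pub/sub flow described above - the channel name and message are made up for this example, the compile line is just the obvious one for the two libraries mentioned, and the binary name almm is assumed, none of it taken verbatim from the repo:

# terminal 1: subscribe to a channel (works the same against Valkey)
redis-cli SUBSCRIBE quotes
# terminal 2: publish an update into it
redis-cli PUBLISH quotes "SPY 512.34"
# building the minimal C subscriber should amount to roughly
cc subscriber.c -o subscriber -lhiredis -levent
# select the symbol for the desktop display via the SYMBOL environment variable
SYMBOL=SPY ./almm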

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

11 June 2025

Iustin Pop: This blog finally goes git-annex!

For a long, long time I have had only a few pictures on this blog, mostly in earlier years, because even with small pictures, the git repository soon became 80MiB - this is not much in absolute terms, but the actual Markdown/Haskell/CSS/HTML total size is tiny compared to the pictures, PDFs and fonts. I realised I needed a better solution, probably about ten years ago, and that I should investigate git-annex. Then time passed, and I heard about git-lfs, so I thought that's the way forward. Now, I recently got interested again in doing something about this repository, and started researching.

Detour: git-lfs I was sure that git-lfs, being supported by large providers, would be the modern solution. But to my surprise, git-lfs is very server-centric, which in hindsight makes sense, but for a home setup it's not very good. Maybe I misunderstood, but git-lfs is more a protocol/method for a forge to store files, rather than an end-user solution. But then you need to back up those files separately (together with the rest of the forge), or implement another way of safeguarding them. Further details, such as the fact that it keeps two copies of the files (one in the actual checked-out tree, one in internal storage), mean it's not a good solution. Well, for my blog maybe, but not in general. Then posts on Reddit about horror stories - people being locked out of GitHub due to quota, as an example - or this Stack Overflow post about git-lfs constraining how one uses git, convinced me that's not what I want. To each their own, but not for me - I might want to push this blog's repo to GitHub, but I definitely wouldn't want in that case to pay for GitHub storage for my blog images (which are copies, not originals). And yes, even in 2025, those quotas are real GitHub limits - and I agree with GitHub, storage and large bandwidth can't be free.

Back to the future: git-annex So back to git-annex. I thought it was going to be a simple thing, but oh boy, was I wrong. It took me half a week of continuous (well, in free time) reading and discussions with LLMs to understand a bit how it works. I think, honestly, it's a bit too complex, which is why the workflows page lists seven (!) levels of workflow complexity, from fully-managed to fully-manual. IMHO, respect to the author for the awesome tool, but if you need a web app to help you manage git, it hints that the tool is too complex. I made the mistake of running git annex sync once, only to realise it actually starts pushing to my upstream repo and creating new branches and whatnot, so after enough reading, I settled on workflow 6/7, since I don't want another tool to manage my git history. Maybe I'm an outlier here, but everything automatic is a bit too much for me. Once you do manage to understand how git-annex works (on the surface, at least), it is a pretty cool thing. It uses a git-annex git branch to store metainformation, and that is relatively clean. If you do run git annex sync, it creates some extra branches, which I don't like, but meh.

Trick question: what is a remote? One of the most confusing things about git-annex was understanding its remote concept. I thought a remote is a place where you replicate your data. But no, that's a special remote. A normal remote is a git remote, but one which is expected to be git/ssh/with command line access. So if you have a git+ssh remote, git-annex will not only try to push its above-mentioned branch, but also copy the files. If such a remote is on a forge that doesn't support git-annex, then it will complain and get confused. Of course, if you read the extensive docs, you just do git config remote.<name>.annex-ignore true, and it will understand that it should not sync to it. But, aside from this case, git-annex expects that all checkouts and clones of the repository hold both metadata and data. And if you do any annex commands in them, all other clones will know about them! This can be unexpected, and you find people complaining about it, but nowadays there's a solution:
git clone <url> dir && cd dir
git config annex.private true
git annex init "temp copy"
This is important. Any leaf git clone must be followed by that annex.private true config, especially on CI/CD machines. Honestly, I don't understand why clones should be official data stores by default, but it is what it is. I settled on not making any of my checkouts "stable", but only the actual storage places. Except those are not git repositories, but just git-annex storage things, i.e. special remotes. Is it confusing enough yet?

Special remotes The special remotes, as said, are what I expected to be the normal git-annex remotes, i.e. places where the data is stored. But well, they exist, and while I'm only using a couple of simple ones, there is a large number of them. Among the interesting ones: git-lfs, a remote that also allows storing the git repository itself (git-remote-annex) - although I'm a bit confused about this one - and most of the common storage providers via the rclone remote. Plus, all of the special remotes support encryption, so this is a really neat way to store your files across a large number of things, and handle replication, number of copies, which copy to retrieve from, etc. as you wish.
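
As a concrete example of a simple special remote, a plain directory with shared encryption can be set up roughly like this (the remote name and path are just for illustration):

# create an encrypted special remote backed by a local or mounted directory
git annex initremote usbdrive type=directory directory=/mnt/usbdrive encryption=shared
# replicate annexed content to it, and ask for at least two copies overall
git annex copy --to usbdrive
git annex numcopies 2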

And many other features git-annex has tons of other features, so to some extent, the sky's the limit. Automatic selection of what to add to git-annex vs plain git, encryption handling, number of copies, clusters, computed files, etc. etc. etc. I still think it's cool but too complex, though!

Uses Aside from my blog, of course. I've seen blog posts/comments about people using git-annex to track/store their photo collection, and I could see very well how remote encrypted repos - on any of the services supported by rclone - could be an N+2 copy or so. For me, tracking photos would be a bit too tedious, but it could maybe work after more research. A more practical thing would probably be replicating my local movie collection (all legal, to be clear) - better than just running rsync from time to time - and tracking the large files in it via git-annex. That's an exercise for another day, though, once I get more mileage with it - my blog pictures are copies, so I don't care much if they get lost, but the movies are primary online copies, and I don't want to re-dump the discs. Anyway, for later.

Migrating to git-annex Migrating here means ending in a state where all large files are in git-annex, and the plain git repo is small. Just moving the files to git-annex at the current head doesn't remove them from history, so your git repository is still large; it won't grow in the future, but remains at the old size (and contains the large files in its history). In my mind, a nice migration would be: run a custom command, and all the history is migrated to git-annex, so I can go back in time and still use git-annex. I naïvely expected this would be easy and already available, only to find comments on the git-annex site with unsure git-filter-branch calls and some web discussions. This is the discussion on the git-annex website, but it didn't make me confident it would do the right thing. But that discussion is now 8 years old. Surely in 2025, with git-filter-repo, it's easier? And, maybe I'm missing something, but it is not. Not from the point of view of plain git - that's easy - but because of interacting with git-annex, which stores its data in git itself, so doing this properly across successive steps of a repo (when replaying the commits) is, I think, not well-defined behaviour. So I was stuck here for a few days, until I got an epiphany: as I'm going to rewrite the repository, of course I'm keeping a copy of it from before git-annex. If so, I don't need the history, back in time, to be correct in the sense of being able to retrieve the binary files too. It just needs to be correct from the point of view of the actual Markdown and Haskell files that represent the meat of the blog. This simplified the problem a lot. At first, I wanted to just skip these files, but this could also drop commits (git-filter-repo, by default, drops commits if they're empty), and removing the files loses information - when they were added, what the paths were, etc. So instead I came up with a rather clever idea, if I might say so: since git-annex replaces files with symlinks already, just replace the files with symlinks in the whole history, except these symlinks are dangling (to represent the fact that the files are missing). One could also use empty files, but empty files are more "valid" in a sense than dangling symlinks, hence why I settled on those. Doing this with git-filter-repo is easy, in newer versions, with the new --file-info-callback. Here is the simple code I used:
import os
import os.path
import pathlib

SKIP_EXTENSIONS = {'jpg', 'jpeg', 'png', 'pdf', 'woff', 'woff2'}
FILE_MODES = {b"100644", b"100755"}
SYMLINK_MODE = b"120000"

# Body of the --file-info-callback: git-filter-repo supplies
# filename, mode, blob_id and the value helper object.
fas_string = filename.decode()
path = pathlib.PurePosixPath(fas_string)
ext = path.suffix.removeprefix('.')

# Keep anything that is not a known binary/large-file extension.
if ext not in SKIP_EXTENSIONS:
  return (filename, mode, blob_id)

# Only touch regular files; leave existing symlinks etc. alone.
if mode not in FILE_MODES:
  return (filename, mode, blob_id)

print(f"Replacing '{filename}' (extension '.{ext}') in {os.getcwd()}")

# Replace the blob with a dangling symlink whose target explains what happened.
symlink_target = '/none/binary-file-removed-from-git-history'.encode()
new_blob_id = value.insert_file_with_contents(symlink_target)
return (filename, SYMLINK_MODE, new_blob_id)
This goes and replaces files with a symlink to nowhere, but the symlink's target should explain why it's dangling. Later renames or moves of the files then work "naturally", as the rename/mv doesn't care about file contents. Then, when the filtering is done via:
git-filter-repo --file-info-callback <(cat ~/filter-big.py) --force
It is easy to onboard to git annex:
  • remove all dangling symlinks
  • copy the (binary) files from the original repository
  • since they're named the same, and in the same places, git sees a type change
  • then simply run git annex add on those files
For me it was easy, as all such files were in a few directories, so it was just a matter of copying those directories back, a few git annex add commands, and done - see the sketch below. Of course, after then adding a few rsync remotes and running git annex copy --to, the repository was ready. Well, I also found a bug in my own Hakyll setup: on a fresh clone, when the large files are just dangling symlinks, the builder doesn't complain, it just ignores the images. Will have to fix.
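
In shell terms, the onboarding steps above look roughly like this (the directory names are placeholders for wherever the binary files lived, and the pre-rewrite copy of the repo is assumed to sit next to the new one):

# 1. drop the dangling symlinks left behind by the history rewrite
find . -not -path './.git/*' -xtype l -delete
# 2. copy the real binary files back from the pre-rewrite copy
cp -a ../blog-before-annex/images .
# 3. let git-annex take over the large files (git sees a type change)
git annex add images/
git commit -m "move large files into git-annex"
# 4. replicate the content to one of the configured remotes
git annex copy --to <remote>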

Other resources This is a blog post that I read at the beginning, and I found it very useful as an intro: https://switowski.com/blog/git-annex/. It didn't help me understand how it works under the covers, but it is well written. The author does use the sync command though, which is too magic for me, but he also agrees about its complexity.

The proof is in the pudding And now, for the actual first image to be added that never lived in the old plain git repository. It's not full-res/full-size; it's cropped a bit on the bottom. Earlier in the year, I went to Paris for a very brief work trip, and I walked around a bit - it was more beautiful than what I remembered from way, way back. So a bit of a random selection of a picture, but here it is:
Un bateau sur la Seine (a boat on the Seine)
Enjoy!
