Search Results: "stw"

27 January 2026

Elana Hashman: A beginner's guide to improving your digital security

In 2017, I led a series of workshops aimed at teaching beginners a better understanding of encryption, how the internet works, and their digital security. Nearly a decade later, there is still a great need to share reliable resources and guides on improving these skills. I have worked professionally in computer security one way or another for well over a decade, at many major technology companies and in many open source software projects. There are many inaccurate and unreliable resources out there on this subject, put together by well-meaning people without a background in security, which can lead to sharing misinformation, exaggeration and fearmongering. I hope that I can offer you a trusted, curated list of high impact things that you can do right now, using whichever vetted guide you prefer. In addition, I also include how long it should take, why you should do each task, and any limitations. This guide is aimed at improving your personal security, and does not apply to your work-owned devices. Always assume your company can monitor all of your messages and activities on work devices. What can I do to improve my security right away? I put together this list in order of effort, easiest tasks first. You should be able to complete many of the low effort tasks in a single hour. The medium to high effort tasks are very much worth doing, but may take you a few days or even weeks to complete them. Low effort (<15 minutes) Upgrade your software to the latest versions Why? I don't know anyone who hasn't complained about software updates breaking features, introducing bugs, and causing headaches. If it ain't broke, why upgrade, right? Well, alongside all of those annoying bugs and breaking changes, software updates also include security fixes, which will protect your device from being exploited by bad actors. Security issues can be found in software at any time, even software that's been available for many years and thought to be secure. You want to install these as soon as they are available. Recommendation: Turn on automatic upgrades and always keep your devices as up-to-date as possible. If you have some software you know will not work if you upgrade it, at least be sure to upgrade your laptop and phone operating system (iOS, Android, Windows, etc.) and web browser (Chrome, Safari, Firefox, etc.). Do not use devices that do not receive security support (e.g. old Android or iPhones). Guides: Limitations: This will prevent someone from exploiting known security issues on your devices, but it won't help if your device was already compromised. If this is a concern, doing a factory reset, upgrade, and turning on automatic upgrades may help. This also won't protect against all types of attacks, but it is a necessary foundation. Use Signal Why? Signal is a trusted, vetted, secure messaging application that allows you to send end-to-end encrypted messages and make video/phone calls. This means that only you and your intended recipient can decrypt the messages and someone cannot intercept and read your messages, in contrast to texting (SMS) and other insecure forms of messaging. Other applications advertise themselves as end-to-end encrypted, but Signal provides the strongest protections. Recommendation: I recommend installing the Signal app and using it! My mom loves that she can video call me on Wi-Fi on my Android phone. It also supports group chats. I use it as a secure alternative to texting (SMS) and other chat platforms. I also like Signal's "disappearing messages" feature which I enable by default because it automatically deletes messages after a certain period of time. This avoids your messages taking up too much storage. Guides: Limitations: Signal is only able to protect your messages in transit. If someone has access to your phone or the phone of the person you sent messages to, they will still be able to read them. As a rule of thumb, if you don't want someone to read something, don't write it down! Meet in person or make an encrypted phone call where you will not be overheard. If you are talking to someone you don't know, assume your messages are as public as posting on social media. Set passwords and turn on device encryption Why? Passwords ensure that someone else can't unlock your device without your consent or knowledge. They also are required to turn on device encryption, which protects your information on your device from being accessed when it is locked. Biometric (fingerprint or face ID) locking provides some privacy, but your fingerprint or face ID can be used against your wishes, whereas if you are the only person who knows your password, only you can use it. Recommendation: Always set passwords and have device encryption enabled in order to protect your personal privacy. It may be convenient to allow kids or family members access to an unlocked device, but anyone else can access it, too! Use strong passwords that cannot be guessed avoid using names, birthdays, phone numbers, addresses, or other public information. Using a password manager will make creating and managing passwords even easier. Disable biometric unlock, or at least know how to disable it. Most devices will enable disk encryption by default, but you should double-check. Guides: Limitations: If your device is unlocked, the password and encryption will provide no protections; the device must be locked for this to protect your privacy. It is possible, though unlikely, for someone to gain remote access to your device (for example through malware or stalkerware), which would bypass these protections. Some forensic tools are also sophisticated enough to work with physical access to a device that is turned on and locked, but not a device that is turned off/freshly powered on and encrypted. If you lose your password or disk encryption key, you may lose access to your device. For this reason, Windows and Apple laptops can make a cloud backup of your disk encryption key. However, a cloud backup can potentially be disclosed to law enforcement. Install an ad blocker Why? Online ad networks are often exploited to spread malware to unsuspecting visitors. If you've ever visited a regular website and suddenly seen an urgent, flashing pop-up claiming your device was hacked, it is often due to a bad ad. Blocking ads provides an additional layer of protection against these kinds of attacks. Recommendation: I recommend everyone uses an ad blocker at all times. Not only are ads annoying and disruptive, but they can even result in your devices being compromised! Guides: Limitations: Sometimes the use of ad blockers can break functionality on websites, which can be annoying, but you can temporarily disable them to fix the problem. These may not be able to block all ads or all tracking, but they make browsing the web much more pleasant and lower risk! Some people might also be concerned that blocking ads might impact the revenue of their favourite websites or creators. In this case, I recommend either donating directly or sharing the site with a wider audience, but keep using the ad blocker for your safety. Enable HTTPS-Only Mode Why? The "S" in "HTTPS" stands for "secure". This feature, which can be enabled on your web browser, ensures that every time you visit a website, your connection is always end-to-end encrypted (just like when you use Signal!) This ensures that someone can't intercept what you search for, what pages on websites you visit, and any information you or the website share such as your banking details. Recommendation: I recommend enabling this for everyone, though with improvements in web browser security and adoption of HTTPS over the years, your devices will often do this by default! There is a small risk you will encounter some websites that do not support HTTPS, usually older sites. Guides: Limitations: HTTPS protects the information on your connection to a website. It does not hide or protect the fact that you visited that website, only the information you accessed. If the website is malicious, HTTPS does not provide any protection. In certain settings, like when you use a work-managed computer that was set up for you, it can still be possible for your IT Department to see what you are browsing, even over an HTTPS connection, because they have administrator access to your computer and the network. Medium to high effort (1+ hours) These tasks require more effort but are worth the investment. Set up a password manager Why? It is not possible for a person to remember a unique password for every single website and app that they use. I have, as of writing, 556 passwords stored in my password manager. Password managers do three important things very well:
  1. They generate secure passwords with ease. You don't need to worry about getting your digits and special characters just right; the app will do it for you, and generate long, secure passwords.
  2. They remember all your passwords for you, and you just need to remember one password to access all of them. The most common reason people's accounts get hacked online is because they used the same password across multiple websites, and one of the websites had all their passwords leaked. When you use a unique password on every website, it doesn't matter if your password gets leaked!
  3. They autofill passwords based on the website you're visiting. This is important because it helps prevent you from getting phished. If you're tricked into visiting an evil lookalike site, your password manager will refuse to fill the password.
Recommendation: These benefits are extremely important, and setting up a password manager is often one of the most impactful things you can do for your digital security. However, they take time to get used to, and migrating all of your passwords into the app (and immediately changing them!) can take a few minutes at a time... over weeks. I recommend you prioritize the most important sites, such as your email accounts, banking/financial sites, and cellphone provider. This process will feel like a lot of work, but you will get to enjoy the benefits of never having to remember new passwords and the autofill functionality for websites. My recommended password manager is 1Password, but it stores passwords in the cloud and costs money. There are some good free options as well if cost is a concern. You can also use web browser- or OS-based password managers, but I do not prefer these. Guides: Limitations: Many people are concerned about the risk of using a password manager causing all of their passwords to be compromised. For this reason, it's very important to use a vetted, reputable password manager that has passed audits, such as 1Password or Bitwarden. It is also extremely important to choose a strong password to unlock your password manager. 1Password makes this easier by generating a secret to strengthen your unlock password, but I recommend using a long, memorable password in any case. Another risk is that if you forget your password manager's password, you will lose access to all your passwords. This is why I recommend 1Password, which has you set up an Emergency Kit to recover access to your account. Set up two-factor authentication (2FA) for your accounts Why? If your password is compromised in a website leak or due to a phishing attack, two-factor authentication will require a second piece of information to log in and potentially thwart the intruder. This provides you with an extra layer of security on your accounts. Recommendation: You don't necessarily need to enable 2FA on every account, but prioritize enabling it on your most important accounts (email, banking, cellphone, etc.) There are typically a few different kinds: email-based (which is why your email account's security is so important), text message or SMS-based (which is why your cell phone account's security is so important), app-based, and hardware token-based. Email and text message 2FA are fine for most accounts. You may want to enable app- or hardware token-based 2FA for your most sensitive accounts. Guides: Limitations: The major limitation is that if you lose access to 2FA, you can be locked out of an account. This can happen if you're travelling abroad and can't access your usual cellphone number, if you break your phone and you don't have a backup of your authenticator app, or if you lose your hardware-based token. For this reason, many websites will provide you with "backup tokens" you can print them out and store them in a secure location or use your password manager. I also recommend if you use an app, you choose one that will allow you to make secure backups, such as Ente. You are also limited by the types of 2FA a website supports; many don't support app- or hardware token-based 2FA. Remove your information from data brokers Why? This is a problem that mostly affects people in the US. It surprises many people that information from their credit reports and other public records is scraped and available (for free or at a low cost) online through "data broker" websites. I have shocked friends who didn't believe this was an issue by searching for their full names and within 5 minutes being able to show them their birthday, home address, and phone number. This is a serious privacy problem! Recommendation: Opt out of any and all data broker websites to remove this information from the internet. This is especially important if you are at risk of being stalked or harassed. Guides: Limitations: It can take time for your information to be removed once you opt out, and unfortunately search engines may have cached your information for a while longer. This is also not a one-and-done process. New data brokers are constantly popping up and some may not properly honour your opt out, so you will need to check on a regular basis (perhaps once or twice a year) to make sure your data has been properly scrubbed. This also cannot prevent someone from directly searching public records to find your information, but that requires much more effort. "Recommended security measures" I think beginners should avoid We've covered a lot of tasks you should do, but I also think it's important to cover what not to do. I see many of these tools recommended to security beginners, and I think that's a mistake. For each tool, I will explain my reasoning around why I don't think you should use it, and the scenarios in which it might make sense to use. "Secure email" What is it? Many email providers, such as Proton Mail, advertise themselves as providing secure email. They are often recommended as a "more secure" alternative to typical email providers such as GMail. What's the problem? Email is fundamentally insecure by design. The email specification (RFC-3207) states that any publicly available email server MUST NOT require the use of end-to-end encryption in transit. Email providers can of course provide additional security by encrypting their copies of your email, and providing you access to your email by HTTPS, but the messages themselves can always be sent without encryption. Some platforms such as Proton Mail advertise end-to-end encrypted emails so long as you email another Proton user. This is not truly email, but their own internal encrypted messaging platform that follows the email format. What should I do instead? Use Signal to send encrypted messages. NEVER assume the contents of an email are secure. Who should use it? I don't believe there are any major advantages to using a service such as this one. Even if you pay for a more "secure" email provider, the majority of your emails will still be delivered to people who don't. Additionally, while I don't use or necessarily recommend their service, Google offers an Advanced Protection Program for people who may be targeted by state-level actors. PGP/GPG Encryption What is it? PGP ("Pretty Good Privacy") and GPG ("GNU Privacy Guard") are encryption and cryptographic signing software. They are often recommended to encrypt messages or email. What's the problem? GPG is decades old and its usability has always been terrible. It is extremely easy to accidentally send a message that you thought was encrypted without encryption! The problems with PGP/GPG have been extensively documented. What should I do instead? Use Signal to send encrypted messages. Again, NEVER use email for sensitive information. Who should use it? Software developers who contribute to projects where there is a requirement to use GPG should continue to use it until an adequate alternative is available. Everyone else should live their lives in PGP-free bliss. Installing a "secure" operating system (OS) on your phone What is it? There are a number of self-installed operating systems for Android phones, such as GrapheneOS, that advertise as being "more secure" than using the version of the Android operating system provided by your phone manufacturer. They often remove core Google APIs and services to allow you to "de-Google" your phone. What's the problem? These projects are relatively niche, and don't have nearly enough resourcing to be able to respond to the high levels of security pressure Android experiences (such as against the forensic tools I mentioned earlier). You may suddenly lose security support with no notice, as with CalyxOS. You need a high level of technical know-how and a lot of spare time to maintain your device with a custom operating system, which is not a reasonable expectation for the average person. By stripping all Google APIs such as Google Play Services, some useful apps can no longer function. And some law enforcement organizations have gone as far as accusing people who install GrapheneOS on Pixel phones to be engaging in criminal activity. What should I do instead? For the best security on an Android device, use a phone manufactured by Google or Samsung (smaller manufacturers are more unreliable), or consider buying an iPhone. Make sure your device is receiving security updates and up-to-date. Who should use it? These projects are great for tech enthusiasts who are interested in contributing to and developing them further. They can be used to give new life to old phones that are not receiving security or software updates. They are also great for people with an interest in free and open source software and digital autonomy. But these tools are not a good choice for a general audience, nor do they provide more practical security than using an up-to-date Google or Samsung Android phone. Virtual Private Network (VPN) Services What is it? A virtual private network or VPN service can provide you with a secure tunnel from your device to the location that the VPN operates. This means that if I am using my phone in Seattle connected to a VPN in Amsterdam, if I access a website, it appears to the website that my phone is located in Amsterdam. What's the problem? VPN services are frequently advertised as providing security or protection from nefarious bad actors, or helping protect your privacy. These benefits are often far overstated, and there are predatory VPN providers that can actually be harmful. It costs money and resources to provide a VPN, so free VPN services are especially suspect. When you use a VPN, the VPN provider knows the websites you are visiting in order to provide you with the service. Free VPN providers may sell this data in order to cover the cost of providing the service, leaving you with less security and privacy. The average person does not have the knowledge to be able to determine if a VPN service is trustworthy or not. VPNs also don't provide any additional encryption benefits if you are already using HTTPS. They may provide a small amount of privacy benefit if you are connected to an untrusted network with an attacker. What should I do instead? Always use HTTPS to access websites. Don't connect to untrusted internet providers for example, use cellphone network data instead of a sketchy Wi-Fi access point. Your local neighbourhood coffee shop is probably fine. Who should use it? There are three main use cases for VPNs. The first is to bypass geographic restrictions. A VPN will cause all of your web traffic to appear to be coming from another location. If you live in an area that has local internet censorship policies, you can use a VPN to access the internet from a location that lacks such policies. The second is if you know your internet service provider is actively hostile or malicious. A trusted VPN will protect the visibility of all your traffic, including which websites you visit, from your internet service provider, and the only thing they will be able to see is that you are accessing a VPN. The third use case is to access a network that isn't connected to the public internet, such as a corporate intranet. I strongly discourage the use of VPNs for "general-purpose security." Tor What is it? Tor, "The Onion Router", is a free and open source software project that provides anonymous networking. Unlike with a VPN, where the VPN provider knows who you are and what websites you are requesting, Tor's architecture makes it extremely difficult to determine who sent a request. What's the problem? Tor is difficult to set up properly; similar to PGP-encrypted email, it is possible to accidentally not be connected to Tor and not know the difference. This usability has improved over the years, but Tor is still not a good tool for beginners to use. Due to the way Tor works, it is also extremely slow. If you have used cable or fiber internet, get ready to go back to dialup speeds. Tor also doesn't provide perfect privacy and without a strong understanding of its limitations, it can be possible to deanonymize someone despite using it. Additionally, many websites are able to detect connections from the Tor network and block them. What should I do instead? If you want to use Tor to bypass censorship, it is often better to use a trusted VPN provider, particularly if you need high bandwidth (e.g. for streaming). If you want to use Tor to access a website anonymously, Tor itself might not be enough to protect you. For example, if you need to provide an email address or personal information, you can decline to provide accurate information and use a masked email address. A friend of mine once used the alias "Nunya Biznes" Who should use it? Tor should only be used by people who are experienced users of security tools and understand its strengths and limitations. Tor also is best used on a purpose-built system, such as Tor Browser or Freedom of the Press Foundation's SecureDrop. I want to learn more! I hope you've found this guide to be a useful starting point. I always welcome folks reaching out to me with questions, though I might take a little bit of time to respond. You can always email me. If there's enough interest, I might cover the following topics in a future post: Stay safe out there!

17 November 2025

Rodrigo Siqueira: XDC 2025

It has been a long time since I published any update in this space. Since this was a year of colossal changes for me, maybe it is also time for me to make something different with this blog and publish something just for a change why not start talking about XDC 2025? This year, I attended XDC 2025 in Vienna as an Igalia developer. I was thrilled to see some faces from people I worked with in the past and people I m working with now. I had a chance to hang out with some folks I worked with at AMD (Harry, Alex, Leo, Christian, Shashank, and Pierre), many Igalians ( an, Job, Ricardo, Paulo, Tvrtko, and many others), and finally some developers from Valve. In particular, I met T mur in person for the first time, even though we have been talking for months about GPU recovery. Speaking of GPU recovery, we held a workshop on this topic together. The workshop was packed with developers from different companies, which was nice because it added different angles on this topic. We began our discussion by focusing on the topic of job resubmission. Christian began sharing a brief history of how the AMDGPU driver started handling resubmission and the associated issues. After learning from erstwhile experience, amdgpu ended up adopting the following approach:
  1. When a job cause a hang, call driver specific handler.
  2. Stop the scheduler.
  3. Copy all jobs from the ring buffer, minus the job that caused the issue, to a temporary ring.
  4. Reset the ring buffer.
  5. Copy back the other jobs to the ring buffer.
  6. Resume the scheduler.
Below, you can see one crucial series associated with amdgpu recovery implementation: The next topic was a discussion around the replacement of drm_sched_resubmit_jobs() since this function became deprecated. Just a few drivers still use this function, and they need a replacement for that. Some ideas were floating around to extract part of the specific implementation from some drivers into a generic function. The next day, Philipp Stanner continued to discuss this topic in his workshop, DRM GPU Scheduler. Another crucial topic discussed was improving GPU reset debuggability to narrow down which operations cause the hang (keep in mind that GPU recovery is a medicine, not the cure to the problem). Intel developers shared their strategy for dealing with this by obtaining hints from userspace, which helped them provide a better set of information to append to the devcoredump. AMD could adopt this alongside dumping the IB data into the devcoredump (I am already investigating this). Finally, we discussed strategies to avoid hang issues regressions. In summary, we have two lines of defense: Lighting talk This year, as always, XDC was super cool, packed with many engaging presentations which I highly recommend everyone check out. If you are interested, check the schedule and the presentation recordings available on the X.Org Foundation Youtube page. Anyway, I hope this blog post marks the inauguration of a new era for this site, where I will start posting more content ranging from updates to tutorials. See you soon.

5 November 2025

Reproducible Builds: Reproducible Builds in October 2025

Welcome to the October 2025 report from the Reproducible Builds project! Welcome to the very latest report from the Reproducible Builds project. Our monthly reports outline what we ve been up to over the past month, and highlight items of news from elsewhere in the increasingly-important area of software supply-chain security. As ever, if you are interested in contributing to the Reproducible Builds project, please see the Contribute page on our website. In this report:

  1. Farewell from the Reproducible Builds Summit 2025
  2. Google s Play Store breaks reproducible builds for Signal
  3. Mailing list updates
  4. The Original Sin of Computing that no one can fix
  5. Reproducible Builds at the Transparency.dev summit
  6. Supply Chain Security for Go
  7. Three new academic papers published
  8. Distribution work
  9. Upstream patches
  10. Website updates
  11. Tool development

Farewell from the Reproducible Builds Summit 2025 Thank you to everyone who joined us at the Reproducible Builds Summit in Vienna, Austria! We were thrilled to host the eighth edition of this exciting event, following the success of previous summits in various iconic locations around the world, including Venice, Marrakesh, Paris, Berlin, Hamburg and Athens. During this event, participants had the opportunity to engage in discussions, establish connections and exchange ideas to drive progress in this vital field. Our aim was to create an inclusive space that fosters collaboration, innovation and problem-solving. The agenda of the three main days is available online however, some working sessions may still lack notes at time of publication. One tangible outcome of the summit is that Johannes Starosta finished their rebuilderd tutorial, which is now available online and Johannes is actively seeking feedback.

Google s Play Store breaks reproducible builds for Signal On the issue tracker for the popular Signal messenger app, developer Greyson Parrelli reports that updates to the Google Play store have, in effect, broken reproducible builds:
The most recent issues have to do with changes to the APKs that are made by the Play Store. Specifically, they add some attributes to some .xml files around languages are resources, which is not unexpected because of how the whole bundle system works. This is trickier to resolve, because unlike current expected differences (like signing information), we can t just exclude a whole file from the comparison. We have to take a more nuanced look at the diff. I ve been hesitant to do that because it ll complicate our currently-very-readable comparison script, but I don t think there s any other reasonable option here.
The full thread with additional context is available on GitHub.

Mailing list updates On our mailing list this month:
  • kpcyrd forwarded a fascinating tidbit regarding so-called ninja and samurai build ordering, that uses data structures in which the pointer values returned from malloc are used to determine some order of execution.
  • Arnout Engelen, Justin Cappos, Ludovic Court s and kpcyrd continued a conversation started in September regarding the Minimum Elements for a Software Bill of Materials . (Full thread)
  • Felix Moessbauer of Siemens posted to the list reporting that he had recently stumbled upon a couple of Debian source packages on the snapshot mirrors that are listed multiple times (same name and version), but each time with a different checksum . The thread, which Felix titled, Debian: what precisely identifies a source package is about precisely that what can be axiomatically relied upon by consumers of the Debian archives, as well as indicating an issue where we can t exactly say which packages were used during build time (even when having the .buildinfo files).
  • Luca DiMaio posted to the list announcing the release of xfsprogs 6.17.0 which specifically includes a commit that implements the functionality to populate a newly created XFS filesystem directly from an existing directory structure which makes it easier to create populated filesystems without having to mount them [and thus is] particularly useful for reproducible builds . Luca asked the list how they might contribute to the docs of the System images page.

The Original Sin of Computing that no one can fix Popular YouTuber @laurewired published a video this month with an engaging take on the Trusting Trust problem. Titled The Original Sin of Computing that no one can fix, the video touches on David A. Wheeler s Diverse Double-Compiling dissertation. GNU developer Janneke Nieuwenhuizen followed-up with an email (additionally sent to our mailing list) as well, underscoring that GNU Mes s current solution [to this issue] uses ancient softwares in its bootstrap path, such as gcc-2.95.3 and glibc-2.2.5 . (According to Colby Russell, the GNU Mes bootstrapping sequence is shown at 18m54s in the video.)

Reproducible Builds at the Transparency.dev summit Holger Levsen gave a talk at this year s Transparency.dev summit in Gothenburg, Sweden, outlining the achievements of the Reproducible Builds project in the last 12 years, covering both upstream developments as well as some distribution-specific details. As mentioned on the talk s page, Holger s presentation concluded with an outlook into the future and an invitation to collaborate to bring transparency logs into Reproducible Builds projects . The slides of the talk are available, although a video has yet to be released. Nevertheless, as a result of the discussions at Transparency.dev there is a new page on the Debian wiki with the aim of describing a potential transparency log setup for Debian.

Supply Chain Security for Go Andrew Ayer has setup a new service at sourcespotter.com that aims to monitor the supply chain security for Go releases. It consists of four separate trackers:
  1. A tool to verify that the Go Module Mirror and Checksum Database is behaving honestly and has not presented inconsistent information to clients.
  2. A module monitor that records every module version served by the Go Module Mirror and Checksum Database, allowing you to monitor for unexpected versions of your modules.
  3. A tool to verifies that the Go toolchains published in the Go Module Mirror can be reproduced from source code, making it difficult to hide backdoors in the binaries downloaded by the go command.
  4. A telemetry config tracker that tracks the names of telemetry counters uploaded by the Go toolchain, to ensure that Go telemetry is not violating users privacy.
As the homepage of the service mentions, the trackers are free software and do not rely on Google infrastructure.

Three new academic papers published Julien Malka of the Institut Polytechnique de Paris published an exciting paper this month on How NixOS could have detected the XZ supply-chain attack for the benefit of all thanks to reproducible-builds. Julien outlines his paper as follows:
In March 2024, a sophisticated backdoor was discovered in xz, a core compression library in Linux distributions, covertly inserted over three years by a malicious maintainer, Jia Tan. The attack, which enabled remote code execution via ssh, was only uncovered by chance when Andres Freund investigated a minor performance issue. This incident highlights the vulnerability of the open-source supply chain and the effort attackers are willing to invest in gaining trust and access. In this article, I analyze the backdoor s mechanics and explore how bitwise build reproducibility could have helped detect it.
A PDF of the paper is available online.
Iy n M ndez Veiga and Esther H nggi (of the Lucerne University of Applied Sciences and Arts and ETH Zurich) published a paper this month on the topic of Reproducible Builds for Quantum Computing. The abstract of their paper mentions the following:
Although quantum computing is a rapidly evolving field of research, it can already benefit from adopting reproducible builds. This paper aims to bridge the gap between the quantum computing and reproducible builds communities. We propose a generalization of the definition of reproducible builds in the quantum setting, motivated by two threat models: one targeting the confidentiality of end users data during circuit preparation and submission to a quantum computer, and another compromising the integrity of quantum computation results. This work presents three examples that show how classical information can be hidden in transpiled quantum circuits, and two cases illustrating how even minimal modifications to these circuits can lead to incorrect quantum computation results.
A full PDF of their paper is available.
Congratulations to Georg Kofler who submitted their Master s thesis for the Johannes Kepler University of Linz, Austria on the topic of Reproducible builds of E2EE-messengers for Android using Nix hermetic builds:
The thesis focuses on providing a reproducible build process for two open-source E2EE messaging applications: Signal and Wire. The motivation to ensure reproducibility and thereby the integrity of E2EE messaging applications stems from their central role as essential tools for modern digital privacy. These applications provide confidentiality for private and sensitive communications, and their compromise could undermine encryption mechanisms, potentially leaking sensitive data to third parties.
A full PDF of their thesis is available online.
Shawkot Hossain of Aalto University, Finland has also submitted their Master s thesis on the The Role of SBOM in Modern Development with a focus on the extant tooling:
Currently, there are numerous solutions and techniques available in the market to tackle supply chain security, and all claim to be the best solution. This thesis delves deeper by implementing those solutions and evaluates them for better understanding. Some of the tools that this thesis implemented are Syft, Trivy, Grype, FOSSA, dependency-check, and Gemnasium. Software dependencies are generated in a Software Bill of Materials (SBOM) format by using these open-source tools, and the corresponding results have been analyzed. Among these tools, Syft and Trivy outperform others as they provide relevant and accurate information on software dependencies.
A PDF of the thesis is also available.

Distribution work Michael Plura published an interesting article on Heise.de on the topic of Trust is good, reproducibility is better:
In the wake of growing supply chain attacks, the FreeBSD developers are relying on a transparent build concept in the form of Zero-Trust Builds. The approach builds on the established Reproducible Builds, where binary files can be rebuilt bit-for-bit from the published source code. While reproducible builds primarily ensure verifiability, the zero-trust model goes a step further and removes trust from the build process itself. No single server, maintainer, or compiler can be considered more than potentially trustworthy.
The article mentions that this goal has now been achieved with a slight delay and can be used in the current development branch for FreeBSD 15 .
In Debian this month, 7 reviews of Debian packages were added, 5 were updated and 11 were removed this month adding to our knowledge about identified issues. For the Debian CI tests Holger fixed #786644 and set nocheck in DEB_BUILD_OPTIONS for the 2nd build..
Lastly, Bernhard M. Wiedemann posted another openSUSE monthly update for their work there.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Website updates Once again, there were a number of improvements made to our website this month including: In addition, a number of contributors added a series of notes from our recent summit to the website, including Alexander Couzens [ ], Robin Candau [ ][ ][ ][ ][ ][ ][ ][ ][ ] and kpcyrd [ ].

Tool development diffoscope version 307 was uploaded to Debian unstable by Chris Lamb, who made a number of changes including fixing compatibility with LLVM version 21 [ ], an attempt to automatically attempt to deploy to PyPI by liaising with the PyPI developers/maintainers (with this experimental feature). [ ] In addition, Vagrant Cascadian updated diffoscope in GNU Guix to version 307.

Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

3 November 2025

Russ Allbery: Review: The Raven Scholar

Review: The Raven Scholar, by Antonia Hodgson
Series: Eternal Path Trilogy #1
Publisher: Orbit
Copyright: April 2025
ISBN: 0-316-57723-5
Format: Kindle
Pages: 651
The Raven Scholar is an epic fantasy and the first book of a projected trilogy. It is Antonia Hodgson's first published fantasy novel; her previous published novels are historical mystery. I would classify this as adult fantasy the main character is thirty-four with a stable court position but it has strong YA vibes because of the generational turnover feel of the main plot. Eight years before the start of this book, Andren Valit attempted to assassinate the emperor and failed. Since then, his widow and three children twins Yana and Ruko and infant Nisthala have been living in disgrace in a cramped apartment, subject to constant inspections and suspicion. As the story opens, they have been summoned to appear before the emperor, escorted by a young and earnest Hound (essentially the state security services) named Shal Worthy. The resulting interrogation is full of dangerous traps. Not all of them will be avoided. The formalization of the consequences of that imperial summons falls to an unpopular Junior Archivist (Third Class) whose one notable skill is her penmanship. A meeting that was disasterous for the Valits becomes unexpectedly fortunate for the archivist, albeit with a poisonous core. Eight years later, Neema Kraa is High Scholar, and Emperor Bersun's twenty-four years of permitted reign is coming to an end. The Festival is about to begin. One representative from each of the empire's eight anats (religious schools) will compete in seven days of Trials, save for the Dragons who do not want the throne and will send a proxy. The victor according to the Trials scoring system will become emperor and reign unquestioned for twenty-four years or until resignation. This is the system that put an end to the era of chaos and has been in place for over a thousand years. On the eve of the Trials, the Raven contender is found murdered. Neema is immediately a suspect; she even has reasons to suspect herself. She volunteers to lead the investigation because she has to know what happened. She is also volunteered to be the replacement Raven contender. There is no chance that she will become emperor; she doesn't even know how to fight. But agnostic Neema has a rather unexpected ally.
As the last chime fades we drop neatly on to the balcony's rusting hand rail, folding our wings with a soft shuffle. Noon, on the ninth day of the eighth month, 1531. Neema Kraa's lodgings. We are here, exactly where we should be, at exactly the right moment, because we are the Raven, and we are magnificent.
The Raven Scholar is a rather good epic fantasy, with some caveats that I'll get to in a moment, but I found it even more fascinating as a genre artifact. I've read my share of epic fantasy over the years, although most of my familiarity of the current wave of new adult fairy epics comes from reviews rather than personal experience. The Raven Scholar is epic fantasy, through and through. There is court intrigue, a main character who is a court functionary unexpectedly thrown into the middle of some problem, civilization-wide stakes, dramatic political alliances, detailed magic and mythological systems, and gods. There were moments that reminded me of a Guy Gavriel Kay novel, although Hodgson's characters tend more towards disarming moments of humanization instead of Kay's operatic scenes of emotional intensity. But The Raven Scholar is also a murder mystery, complete with a crime scene, clues, suspects, evidence, an investigation, a possibly compromised detective, and a morass of possible motives and red herrings. I'm not much of a mystery reader, but this didn't feel like sort of ancillary mystery that might crop up in the course of a typical epic fantasy. It felt like a full-fledged investigation with an amateur detective; one can tell that Hodgson's previous four books were historical mysteries. And then there's the Trials, which are the centerpiece of the book. This book helped me notice that people (okay, me, I'm the people) have been sleeping on the influence of The Hunger Games, Battle Royale, and reality TV (specifically Survivor) on genre fiction, possibly because the more obvious riffs on the idea (Powerless, The Selection) have been young adult or new adult. Once I started looking, I realized this idea is everywhere now: Throne of Glass, Fourth Wing, even The Night Circus to some extent. Competitions with consequences are having a moment. I suspect having a competition to decide the next emperor is going to strike some traditional fantasy readers as sufficiently absurd and unbelievable that it will kick them out of the book. I had a moment of "okay, this is weird, why would anyone stick with this system for so long" myself. But I would encourage such readers to interrogate whether that's only a response from unfamiliarity; after all, strange women lying in ponds distributing swords is no basis for a system of government either. This is hardly the most unrealistic epic fantasy trope, and it has the advantage of being a hell of a plot generator when handled well. Hodgson handles it well. Society in this novel is structured around the anats and the eight Guardians, gods who, according to myth, had returned seven times previously to save the world, but who will destroy the world when they return again. Each Guardian represents a group of characteristics and useful societal functions: the Ox is trustworthy, competent and hard-working; the Fox is a trickster and a rule-bender; the Raven is shrewd and careful and is the Guardian of scholars and lawyers. Each Trial is organized by one of the anats and tests the contenders for the skills most valued by that Guardian, often in subtle and rather ingenious ways. There are flaws here that you could poke at if you wanted to, but I was charmed and thoroughly entertained by how well Hodgson weaves the story around the Trials and uses the conflicting values to create character conflict, unexpected alliances, and engrossing plot. Most importantly for a book of this sort, I liked Neema. She has a charming combination of competence, quirks (she is almost physically unable to not correct people's factual errors), insecurity, imposter syndrome, and determination. She is way out of her depth and knows it, but she has an ethical core and an insatiable curiosity that won't let her leave the central mysteries of the book alone. And the character dynamics are great; there are a lot of characters, including the competition problem of having to juggle eight contenders and give them all sufficient characterization to be meaningful, but this book uses its length to give each character some room to breathe. This is a long book, well over 600 pages, but it felt packed with events and plot twists. After every chapter I had to fight the urge to read just one more. The biggest drawback of this book is that it is very much the first book of a trilogy, none of the other volumes are out yet, and the ending is rather nasty. This is the sort of trilogy that opens with a whole lot of bad things happening, and while I am thoroughly hooked and will purchase the next volume as soon as it's available, I wish Hodgson had found a way to end the book on a somewhat more positive or hopeful note. The middle of the book was great; the end was a bit of an emotional slog, alas. The writing is good enough here that I'm fairly sure the depression will be worth it, but if you need your endings to be triumphant (and who could blame you in this moment in history), you may want to wait on this one until more volumes are out. Apart from that, though, this was a lot of fun. The Guardians felt like they came from a different strand of fantasy than you usually see in epic, more of a traditional folk tale vibe, which adds an intriguing twist to the epic fantasy setting. The characters all work, and Hodgson even pulls off some Game of Thrones style twists that make you sympathetic to characters you previously hated. The magic system apart from the Guardians felt underbaked, but the politics had more depth than a lot of fantasy novels. If you want the truly complex and twisty politics you would get from one of Guy Gavriel Kay's historical rewrites, you will come away disappointed, but it was good enough for me. And I did enjoy the Raven.
Respect, that's all we demand. Recognition of our magnificence. Offerings. Love. Fear. Trembling awe. Worship. Shiny things. Blood sacrifice, some of us very much enjoy blood sacrifice. Truly, we ask for so little.
Followed by an as-yet untitled sequel that I hope will materialize. Rating: 7 out of 10

20 October 2025

Matthew Garrett: Where are we on X Chat security?

AWS had an outage today and Signal was unavailable for some users for a while. This has confused some people, including Elon Musk, who are concerned that having a dependency on AWS means that Signal could somehow be compromised by anyone with sufficient influence over AWS (it can't). Which means we're back to the richest man in the world recommending his own "X Chat", saying The messages are fully encrypted with no advertising hooks or strange AWS dependencies such that I can t read your messages even if someone put a gun to my head.

Elon is either uninformed about his own product, lying, or both.

As I wrote back in June, X Chat genuinely end-to-end encrypted, but ownership of the keys is complicated. The encryption key is stored using the Juicebox protocol, sharded between multiple backends. Two of these are asserted to be HSM backed - a discussion of the commissioning ceremony was recently posted here. I have not watched the almost 7 hours of video to verify that this was performed correctly, and I also haven't been able to verify that the public keys included in the post were the keys generated during the ceremony, although that may be down to me just not finding the appropriate point in the video (sorry, Twitter's video hosting doesn't appear to have any skip feature and would frequently just sit spinning if I tried to seek to far and I should probably just download them and figure it out but I'm not doing that now). With enough effort it would probably also have been possible to fake the entire thing - I have no reason to believe that this has happened, but it's not externally verifiable.

But let's assume these published public keys are legitimately the ones used in the HSM Juicebox realms[1] and that everything was done correctly. Does that prevent Elon from obtaining your key and decrypting your messages? No.

On startup, the X Chat client makes an API call called GetPublicKeysResult, and the public keys of the realms are returned. Right now when I make that call I get the public keys listed above, so there's at least some indication that I'm going to be communicating with actual HSMs. But what if that API call returned different keys? Could Elon stick a proxy in front of the HSMs and grab a cleartext portion of the key shards? Yes, he absolutely could, and then he'd be able to decrypt your messages.

(I will accept that there is a plausible argument that Elon is telling the truth in that even if you held a gun to his head he's not smart enough to be able to do this himself, but that'd be true even if there were no security whatsoever, so it still says nothing about the security of his product)

The solution to this is remote attestation - a process where the device you're speaking to proves its identity to you. In theory the endpoint could attest that it's an HSM running this specific code, and we could look at the Juicebox repo and verify that it's that code and hasn't been tampered with, and then we'd know that our communication channel was secure. Elon hasn't done that, despite it being table stakes for this sort of thing (Signal uses remote attestation to verify the enclave code used for private contact discovery, for instance, which ensures that the client will refuse to hand over any data until it's verified the identity and state of the enclave). There's no excuse whatsoever to build a new end-to-end encrypted messenger which relies on a network service for security without providing a trustworthy mechanism to verify you're speaking to the real service.

We know how to do this properly. We have done for years. Launching without it is unforgivable.

[1] There are three Juicebox realms overall, one of which doesn't appear to use HSMs, but you need at least two in order to obtain the key so at least part of the key will always be held in HSMs

comment count unavailable comments

19 October 2025

Otto Kek l inen: Could the XZ backdoor have been detected with better Git and Debian packaging practices?

Featured image of post Could the XZ backdoor have been detected with better Git and Debian packaging practices?The discovery of a backdoor in XZ Utils in the spring of 2024 shocked the open source community, raising critical questions about software supply chain security. This post explores whether better Debian packaging practices could have detected this threat, offering a guide to auditing packages and suggesting future improvements. The XZ backdoor in versions 5.6.0/5.6.1 made its way briefly into many major Linux distributions such as Debian and Fedora, but luckily didn t reach that many actual users, as the backdoored releases were quickly removed thanks to the heroic diligence of Andres Freund. We are all extremely lucky that he detected a half a second performance regression in SSH, cared enough to trace it down, discovered malicious code in the XZ library loaded by SSH, and reported promtly to various security teams for quick coordinated actions. This episode makes software engineers pondering the following questions: As a Debian Developer, I decided to audit the xz package in Debian, share my methodology and findings in this post, and also suggest some improvements on how the software supply-chain security could be tightened in Debian specifically. Note that the scope here is only to inspect how Debian imports software from its upstreams, and how they are distributed to Debian s users. This excludes the whole story of how to assess if an upstream project is following software development security best practices. This post doesn t discuss how to operate an individual computer running Debian to ensure it remains untampered as there are plenty of guides on that already.

Downloading Debian and upstream source packages Let s start by working backwards from what the Debian package repositories offer for download. As auditing binaries is extremely complicated, we skip that, and assume the Debian build hosts are trustworthy and reliably building binaries from the source packages, and the focus should be on auditing the source code packages. As with everything in Debian, there are multiple tools and ways to do the same thing, but in this post only one (and hopefully the best) way to do something is presented for brevity. The first step is to download the latest version and some past versions of the package from the Debian archive, which is easiest done with debsnap. The following command will download all Debian source packages of xz-utils from Debian release 5.2.4-1 onwards:
$ debsnap --verbose --first 5.2.4-1 xz-utils
Getting json https://snapshot.debian.org/mr/package/xz-utils/
...
Getting dsc file xz-utils_5.2.4-1.dsc: https://snapshot.debian.org/file/a98271e4291bed8df795ce04d9dc8e4ce959462d
Getting file xz-utils_5.2.4.orig.tar.xz.asc: https://snapshot.debian.org/file/59ccbfb2405abe510999afef4b374cad30c09275
Getting file xz-utils_5.2.4-1.debian.tar.xz: https://snapshot.debian.org/file/667c14fd9409ca54c397b07d2d70140d6297393f
source-xz-utils/xz-utils_5.2.4-1.dsc:
Good signature found
validating xz-utils_5.2.4.orig.tar.xz
validating xz-utils_5.2.4.orig.tar.xz.asc
validating xz-utils_5.2.4-1.debian.tar.xz
All files validated successfully.
Once debsnap completes there will be a subfolder source-<package name> with the following types of files:
  • *.orig.tar.xz: source code from upstream
  • *.orig.tar.xz.asc: detached signature (if upstream signs their releases)
  • *.debian.tar.xz: Debian packaging source, i.e. the debian/ subdirectory contents
  • *.dsc: Debian source control file, including signature by Debian Developer/Maintainer
Example:
$ ls -1 source-xz-utils/
...
xz-utils_5.6.4.orig.tar.xz
xz-utils_5.6.4.orig.tar.xz.asc
xz-utils_5.6.4-1.debian.tar.xz
xz-utils_5.6.4-1.dsc
xz-utils_5.8.0.orig.tar.xz
xz-utils_5.8.0.orig.tar.xz.asc
xz-utils_5.8.0-1.debian.tar.xz
xz-utils_5.8.0-1.dsc
xz-utils_5.8.1.orig.tar.xz
xz-utils_5.8.1.orig.tar.xz.asc
xz-utils_5.8.1-1.1.debian.tar.xz
xz-utils_5.8.1-1.1.dsc
xz-utils_5.8.1-1.debian.tar.xz
xz-utils_5.8.1-1.dsc
xz-utils_5.8.1-2.debian.tar.xz
xz-utils_5.8.1-2.dsc

Verifying authenticity of upstream and Debian sources using OpenPGP signatures As seen in the output of debsnap, it already automatically verifies that the downloaded files match the OpenPGP signatures. To have full clarity on what files were authenticated with what keys, we should verify the Debian packagers signature with:
$ gpg --verify --auto-key-retrieve --keyserver hkps://keyring.debian.org xz-utils_5.8.1-2.dsc
gpg: Signature made Fri Oct 3 22:04:44 2025 UTC
gpg: using RSA key 57892E705233051337F6FDD105641F175712FA5B
gpg: requesting key 05641F175712FA5B from hkps://keyring.debian.org
gpg: key 7B96E8162A8CF5D1: public key "Sebastian Andrzej Siewior" imported
gpg: Total number processed: 1
gpg: imported: 1
gpg: Good signature from "Sebastian Andrzej Siewior" [unknown]
gpg: aka "Sebastian Andrzej Siewior <bigeasy@linutronix.de>" [unknown]
gpg: aka "Sebastian Andrzej Siewior <sebastian@breakpoint.cc>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 6425 4695 FFF0 AA44 66CC 19E6 7B96 E816 2A8C F5D1
Subkey fingerprint: 5789 2E70 5233 0513 37F6 FDD1 0564 1F17 5712 FA5B
The upstream tarball signature (if available) can be verified with:
$ gpg --verify --auto-key-retrieve xz-utils_5.8.1.orig.tar.xz.asc
gpg: assuming signed data in 'xz-utils_5.8.1.orig.tar.xz'
gpg: Signature made Thu Apr 3 11:38:23 2025 UTC
gpg: using RSA key 3690C240CE51B4670D30AD1C38EE757D69184620
gpg: key 38EE757D69184620: public key "Lasse Collin <lasse.collin@tukaani.org>" imported
gpg: Total number processed: 1
gpg: imported: 1
gpg: Good signature from "Lasse Collin <lasse.collin@tukaani.org>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 3690 C240 CE51 B467 0D30 AD1C 38EE 757D 6918 4620
Note that this only proves that there is a key that created a valid signature for this content. The authenticity of the keys themselves need to be validated separately before trusting they in fact are the keys of these people. That can be done by checking e.g. the upstream website for what key fingerprints they published, or the Debian keyring for Debian Developers and Maintainers, or by relying on the OpenPGP web-of-trust .

Verifying authenticity of upstream sources by comparing checksums In case the upstream in question does not publish release signatures, the second best way to verify the authenticity of the sources used in Debian is to download the sources directly from upstream and compare that the sha256 checksums match. This should be done using the debian/watch file inside the Debian packaging, which defines where the upstream source is downloaded from. Continuing on the example situation above, we can unpack the latest Debian sources, enter and then run uscan to download:
$ tar xvf xz-utils_5.8.1-2.debian.tar.xz
...
debian/rules
debian/source/format
debian/source.lintian-overrides
debian/symbols
debian/tests/control
debian/tests/testsuite
debian/upstream/signing-key.asc
debian/watch
...
$ uscan --download-current-version --destdir /tmp
Newest version of xz-utils on remote site is 5.8.1, specified download version is 5.8.1
gpgv: Signature made Thu Apr 3 11:38:23 2025 UTC
gpgv: using RSA key 3690C240CE51B4670D30AD1C38EE757D69184620
gpgv: Good signature from "Lasse Collin <lasse.collin@tukaani.org>"
Successfully symlinked /tmp/xz-5.8.1.tar.xz to /tmp/xz-utils_5.8.1.orig.tar.xz.
The original files downloaded from upstream are now in /tmp along with the files renamed to follow Debian conventions. Using everything downloaded so far the sha256 checksums can be compared across the files and also to what the .dsc file advertised:
$ ls -1 /tmp/
xz-5.8.1.tar.xz
xz-5.8.1.tar.xz.sig
xz-utils_5.8.1.orig.tar.xz
xz-utils_5.8.1.orig.tar.xz.asc
$ sha256sum xz-utils_5.8.1.orig.tar.xz /tmp/xz-5.8.1.tar.xz
0b54f79df85912504de0b14aec7971e3f964491af1812d83447005807513cd9e xz-utils_5.8.1.orig.tar.xz
0b54f79df85912504de0b14aec7971e3f964491af1812d83447005807513cd9e /tmp/xz-5.8.1.tar.xz
$ grep -A 3 Sha256 xz-utils_5.8.1-2.dsc
Checksums-Sha256:
0b54f79df85912504de0b14aec7971e3f964491af1812d83447005807513cd9e 1461872 xz-utils_5.8.1.orig.tar.xz
4138f4ceca1aa7fd2085fb15a23f6d495d27bca6d3c49c429a8520ea622c27ae 833 xz-utils_5.8.1.orig.tar.xz.asc
3ed458da17e4023ec45b2c398480ed4fe6a7bfc1d108675ec837b5ca9a4b5ccb 24648 xz-utils_5.8.1-2.debian.tar.xz
In the example above the checksum 0b54f79df85... is the same across the files, so it is a match.

Repackaged upstream sources can t be verified as easily Note that uscan may in rare cases repackage some upstream sources, for example to exclude files that don t adhere to Debian s copyright and licensing requirements. Those files and paths would be listed under the Files-Excluded section in the debian/copyright file. There are also other situations where the file that represents the upstream sources in Debian isn t bit-by-bit the same as what upstream published. If checksums don t match, an experienced Debian Developer should review all package settings (e.g. debian/source/options) to see if there was a valid and intentional reason for divergence.

Reviewing changes between two source packages using diffoscope Diffoscope is an incredibly capable and handy tool to compare arbitrary files. For example, to view a report in HTML format of the differences between two XZ releases, run:
diffoscope --html-dir xz-utils-5.6.4_vs_5.8.0 xz-utils_5.6.4.orig.tar.xz xz-utils_5.8.0.orig.tar.xz
browse xz-utils-5.6.4_vs_5.8.0/index.html
Inspecting diffoscope output of differences between two XZ Utils releases If the changes are extensive, and you want to use a LLM to help spot potential security issues, generate the report of both the upstream and Debian packaging differences in Markdown with:
diffoscope --markdown diffoscope-debian.md xz-utils_5.6.4-1.debian.tar.xz xz-utils_5.8.1-2.debian.tar.xz
diffoscope --markdown diffoscope.md xz-utils_5.6.4.orig.tar.xz xz-utils_5.8.0.orig.tar.xz
The Markdown files created above can then be passed to your favorite LLM, along with a prompt such as:
Based on the attached diffoscope output for a new Debian package version compared with the previous one, list all suspicious changes that might have introduced a backdoor, followed by other potential security issues. If there are none, list a short summary of changes as the conclusion.

Reviewing Debian source packages in version control As of today only 93% of all Debian source packages are tracked in git on Debian s GitLab instance at salsa.debian.org. Some key packages such as Coreutils and Bash are not using version control at all, as their maintainers apparently don t see value in using git for Debian packaging, and the Debian Policy does not require it. Thus, the only reliable and consistent way to audit changes in Debian packages is to compare the full versions from the archive as shown above. However, for packages that are hosted on Salsa, one can view the git history to gain additional insight into what exactly changed, when and why. For packages that are using version control, their location can be found in the Git-Vcs header in the debian/control file. For xz-utils the location is salsa.debian.org/debian/xz-utils. Note that the Debian policy does not state anything about how Salsa should be used, or what git repository layout or development practices to follow. In practice most packages follow the DEP-14 proposal, and use git-buildpackage as the tool for managing changes and pushing and pulling them between upstream and salsa.debian.org. To get the XZ Utils source, run:
$ gbp clone https://salsa.debian.org/debian/xz-utils.git
gbp:info: Cloning from 'https://salsa.debian.org/debian/xz-utils.git'
At the time of writing this post the git history shows:
$ git log --graph --oneline
* bb787585 (HEAD -> debian/unstable, origin/debian/unstable, origin/HEAD) Prepare 5.8.1-2
* 4b769547 d: Remove the symlinks from -dev package.
* a39f3428 Correct the nocheck build profile
* 1b806b8d Import Debian changes 5.8.1-1.1
* b1cad34b Prepare 5.8.1-1
* a8646015 Import 5.8.1
* 2808ec2d Update upstream source from tag 'upstream/5.8.1'
 \
  * fa1e8796 (origin/upstream/v5.8, upstream/v5.8) New upstream version 5.8.1
  * a522a226 Bump version and soname for 5.8.1
  * 1c462c2a Add NEWS for 5.8.1
  * 513cabcf Tests: Call lzma_code() in smaller chunks in fuzz_common.h
  * 48440e24 Tests: Add a fuzzing target for the multithreaded .xz decoder
  * 0c80045a liblzma: mt dec: Fix lack of parallelization in single-shot decoding
  * 81880488 liblzma: mt dec: Don't modify thr->in_size in the worker thread
  * d5a2ffe4 liblzma: mt dec: Don't free the input buffer too early (CVE-2025-31115)
  * c0c83596 liblzma: mt dec: Simplify by removing the THR_STOP state
  * 831b55b9 liblzma: mt dec: Fix a comment
  * b9d168ee liblzma: Add assertions to lzma_bufcpy()
  * c8e0a489 DOS: Update Makefile to fix the build
  * 307c02ed sysdefs.h: Avoid <stdalign.h> even with C11 compilers
  * 7ce38b31 Update THANKS
  * 688e51bd Translations: Update the Croatian translation
*   a6b54dde Prepare 5.8.0-1.
*   77d9470f Add 5.8 symbols.
*   9268eb66 Import 5.8.0
*   6f85ef4f Update upstream source from tag 'upstream/5.8.0'
 \ \
  *   afba662b New upstream version 5.8.0
   /
  * 173fb5c6 doc/SHA256SUMS: Add 5.8.0
  * db9258e8 Bump version and soname for 5.8.0
  * bfb752a3 Add NEWS for 5.8.0
  * 6ccbb904 Translations: Run "make -C po update-po"
  * 891a5f05 Translations: Run po4a/update-po
  * 4f52e738 Translations: Partially fix overtranslation in Serbian man pages
  * ff5d9447 liblzma: Count the extra bytes in LZMA/LZMA2 decoder memory usage
  * 943b012d liblzma: Use SSE2 intrinsics instead of memcpy() in dict_repeat()
This shows both the changes on the debian/unstable branch as well as the intermediate upstream import branch, and the actual real upstream development branch. See my Debian source packages in git explainer for details of what these branches are used for. To only view changes on the Debian branch, run git log --graph --oneline --first-parent or git log --graph --oneline -- debian. The Debian branch should only have changes inside the debian/ subdirectory, which is easy to check with:
$ git diff --stat upstream/v5.8
debian/README.source   16 +++
debian/autogen.sh   32 +++++
debian/changelog   949 ++++++++++++++++++++++++++
...
debian/upstream/signing-key.asc   52 +++++++++
debian/watch   4 +
debian/xz-utils.README.Debian   47 ++++++++
debian/xz-utils.docs   6 +
debian/xz-utils.install   28 +++++
debian/xz-utils.postinst   19 +++
debian/xz-utils.prerm   10 ++
debian/xzdec.docs   6 +
debian/xzdec.install   4 +
33 files changed, 2014 insertions(+)
All the files outside the debian/ directory originate from upstream, and for example running git blame on them should show only upstream commits:
$ git blame CMakeLists.txt
22af94128 (Lasse Collin 2024-02-12 17:09:10 +0200 1) # SPDX-License-Identifier: 0BSD
22af94128 (Lasse Collin 2024-02-12 17:09:10 +0200 2)
7e3493d40 (Lasse Collin 2020-02-24 23:38:16 +0200 3) ###############
7e3493d40 (Lasse Collin 2020-02-24 23:38:16 +0200 4) #
426bdc709 (Lasse Collin 2024-02-17 21:45:07 +0200 5) # CMake support for building XZ Utils
If the upstream in question signs commits or tags, they can be verified with e.g.:
$ git verify-tag v5.6.2
gpg: Signature made Wed 29 May 2024 09:39:42 AM PDT
gpg: using RSA key 3690C240CE51B4670D30AD1C38EE757D69184620
gpg: issuer "lasse.collin@tukaani.org"
gpg: Good signature from "Lasse Collin <lasse.collin@tukaani.org>" [expired]
gpg: Note: This key has expired!
The main benefit of reviewing changes in git is the ability to see detailed information about each individual change, instead of just staring at a massive list of changes without any explanations. In this example, to view all the upstream commits since the previous import to Debian, one would view the commit range from afba662b New upstream version 5.8.0 to fa1e8796 New upstream version 5.8.1 with git log --reverse -p afba662b...fa1e8796. However, a far superior way to review changes would be to browse this range using a visual git history viewer, such as gitk. Either way, looking at one code change at a time and reading the git commit message makes the review much easier. Browsing git history in gitk --all

Comparing Debian source packages to git contents As stated in the beginning of the previous section, and worth repeating, there is no guarantee that the contents in the Debian packaging git repository matches what was actually uploaded to Debian. While the tag2upload project in Debian is getting more and more popular, Debian is still far from having any system to enforce that the git repository would be in sync with the Debian archive contents. To detect such differences we can run diff across the Debian source packages downloaded with debsnap earlier (path source-xz-utils/xz-utils_5.8.1-2.debian) and the git repository cloned in the previous section (path xz-utils):
diff
$ diff -u source-xz-utils/xz-utils_5.8.1-2.debian/ xz-utils/debian/
diff -u source-xz-utils/xz-utils_5.8.1-2.debian/changelog xz-utils/debian/changelog
--- debsnap/source-xz-utils/xz-utils_5.8.1-2.debian/changelog 2025-10-03 09:32:16.000000000 -0700
+++ xz-utils/debian/changelog 2025-10-12 12:18:04.623054758 -0700
@@ -5,7 +5,7 @@
 * Remove the symlinks from -dev, pointing to the lib package.
 (Closes: #1109354)

- -- Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Fri, 03 Oct 2025 18:32:16 +0200
+ -- Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Fri, 03 Oct 2025 18:36:59 +0200
In the case above diff revealed that the timestamp in the changelog in the version uploaded to Debian is different from what was committed to git. This is not malicious, just a mistake by the maintainer who probably didn t run gbp tag immediately after upload, but instead some dch command and ended up with having a different timestamps in the git compared to what was actually uploaded to Debian.

Creating syntetic Debian packaging git repositories If no Debian packaging git repository exists, or if it is lagging behind what was uploaded to Debian s archive, one can use git-buildpackage s import-dscs feature to create synthetic git commits based on the files downloaded by debsnap, ensuring the git contents fully matches what was uploaded to the archive. To import a single version there is gbp import-dsc (no s at the end), of which an example invocation would be:
$ gbp import-dsc --verbose ../source-xz-utils/xz-utils_5.8.1-2.dsc
Version '5.8.1-2' imported under '/home/otto/debian/xz-utils-2025-09-29'
Example commit history from a repository with commits added with gbp import-dsc:
$ git log --graph --oneline
* 86aed07b (HEAD -> debian/unstable, tag: debian/5.8.1-2, origin/debian/unstable) Import Debian changes 5.8.1-2
* f111d93b (tag: debian/5.8.1-1.1) Import Debian changes 5.8.1-1.1
* 1106e19b (tag: debian/5.8.1-1) Import Debian changes 5.8.1-1
 \
  * 08edbe38 (tag: upstream/5.8.1, origin/upstream/v5.8, upstream/v5.8) Import Upstream version 5.8.1
   \
    * a522a226 (tag: v5.8.1) Bump version and soname for 5.8.1
    * 1c462c2a Add NEWS for 5.8.1
    * 513cabcf Tests: Call lzma_code() in smaller chunks in fuzz_common.h
An online example repository with only a few missing uploads added using gbp import-dsc can be viewed at salsa.debian.org/otto/xz-utils-2025-09-29/-/network/debian%2Funstable An example repository that was fully crafted using gbp import-dscs can be viewed at salsa.debian.org/otto/xz-utils-gbp-import-dscs-debsnap-generated/-/network/debian%2Flatest. There exists also dgit, which in a similar way creates a synthetic git history to allow viewing the Debian archive contents via git tools. However, its focus is on producing new package versions, so fetching a package with dgit that has not had the history recorded in dgit earlier will only show the latest versions:
$ dgit clone xz-utils
canonical suite name for unstable is sid
starting new git history
last upload to archive: NO git hash
downloading http://ftp.debian.org/debian//pool/main/x/xz-utils/xz-utils_5.8.1.orig.tar.xz...
downloading http://ftp.debian.org/debian//pool/main/x/xz-utils/xz-utils_5.8.1.orig.tar.xz.asc...
downloading http://ftp.debian.org/debian//pool/main/x/xz-utils/xz-utils_5.8.1-2.debian.tar.xz...
dpkg-source: info: extracting xz-utils in unpacked
dpkg-source: info: unpacking xz-utils_5.8.1.orig.tar.xz
dpkg-source: info: unpacking xz-utils_5.8.1-2.debian.tar.xz
synthesised git commit from .dsc 5.8.1-2
HEAD is now at f9bcaf7 xz-utils (5.8.1-2) unstable; urgency=medium
dgit ok: ready for work in xz-utils
$ dgit/sid   git log --graph --oneline
* f9bcaf7 xz-utils (5.8.1-2) unstable; urgency=medium 9 days ago (HEAD -> dgit/sid, dgit/dgit/sid)
 \
  * 11d3a62 Import xz-utils_5.8.1-2.debian.tar.xz 9 days ago
* 15dcd95 Import xz-utils_5.8.1.orig.tar.xz 6 months ago
Unlike git-buildpackage managed git repositories, the dgit managed repositories cannot incorporate the upstream git history and are thus less useful for auditing the full software supply-chain in git.

Comparing upstream source packages to git contents Equally important to the note in the beginning of the previous section, one must also keep in mind that the upstream release source packages, often called release tarballs, are not guaranteed to have the exact same contents as the upstream git repository. Projects might strip out test data or extra development files from their release tarballs to avoid shipping unnecessary files to users, or projects might add documentation files or versioning information into the tarball that isn t stored in git. While a small minority, there are also upstreams that don t use git at all, so the plain files in a release tarball is still the lowest common denominator for all open source software projects, and exporting and importing source code needs to interface with it. In the case of XZ, the release tarball has additional version info and also a sizeable amount of pregenerated compiler configuration files. Detecting and comparing differences between git contents and tarballs can of course be done manually by running diff across an unpacked tarball and a checked out git repository. If using git-buildpackage, the difference between the git contents and tarball contents can be made visible directly in the import commit. In this XZ example, consider this git history:
* b1cad34b Prepare 5.8.1-1
* a8646015 Import 5.8.1
* 2808ec2d Update upstream source from tag 'upstream/5.8.1'
 \
  * fa1e8796 (debian/upstream/v5.8, upstream/v5.8) New upstream version 5.8.1
  * a522a226 (tag: v5.8.1) Bump version and soname for 5.8.1
  * 1c462c2a Add NEWS for 5.8.1
The commit a522a226 was the upstream release commit, which upstream also tagged v5.8.1. The merge commit 2808ec2d applied the new upstream import branch contents on the Debian branch. Between these is the special commit fa1e8796 New upstream version 5.8.1 tagged upstream/v5.8. This commit and tag exists only in the Debian packaging repository, and they show what is the contents imported into Debian. This is generated automatically by git-buildpackage when running git import-orig --uscan for Debian packages with the correct settings in debian/gbp.conf. By viewing this commit one can see exactly how the upstream release tarball differs from the upstream git contents (if at all). In the case of XZ, the difference is substantial, and shown below in full as it is very interesting:
$ git show --stat fa1e8796
commit fa1e8796dabd91a0f667b9e90f9841825225413a
(debian/upstream/v5.8, upstream/v5.8)
Author: Sebastian Andrzej Siewior <sebastian@breakpoint.cc>
Date: Thu Apr 3 22:58:39 2025 +0200
New upstream version 5.8.1
.codespellrc   30 -
.gitattributes   8 -
.github/workflows/ci.yml   163 -
.github/workflows/freebsd.yml   32 -
.github/workflows/netbsd.yml   32 -
.github/workflows/openbsd.yml   35 -
.github/workflows/solaris.yml   32 -
.github/workflows/windows-ci.yml   124 -
.gitignore   113 -
ABOUT-NLS   1 +
ChangeLog   17392 +++++++++++++++++++++
Makefile.in   1097 +++++++
aclocal.m4   1353 ++++++++
build-aux/ci_build.bash   286 --
build-aux/compile   351 ++
build-aux/config.guess   1815 ++++++++++
build-aux/config.rpath   751 +++++
build-aux/config.sub   2354 +++++++++++++
build-aux/depcomp   792 +++++
build-aux/install-sh   541 +++
build-aux/ltmain.sh   11524 ++++++++++++++++++++++
build-aux/missing   236 ++
build-aux/test-driver   160 +
config.h.in   634 ++++
configure   26434 ++++++++++++++++++++++
debug/Makefile.in   756 +++++
doc/SHA256SUMS   236 --
doc/man/txt/lzmainfo.txt   36 +
doc/man/txt/xz.txt   1708 ++++++++++
doc/man/txt/xzdec.txt   76 +
doc/man/txt/xzdiff.txt   39 +
doc/man/txt/xzgrep.txt   70 +
doc/man/txt/xzless.txt   36 +
doc/man/txt/xzmore.txt   31 +
lib/Makefile.in   623 ++++
m4/.gitignore   40 -
m4/build-to-host.m4   274 ++
m4/gettext.m4   392 +++
m4/host-cpu-c-abi.m4   529 +++
m4/iconv.m4   324 ++
m4/intlmacosx.m4   71 +
m4/lib-ld.m4   170 +
m4/lib-link.m4   815 +++++
m4/lib-prefix.m4   334 ++
m4/libtool.m4   8488 +++++++++++++++++++++
m4/ltoptions.m4   467 +++
m4/ltsugar.m4   124 +
m4/ltversion.m4   24 +
m4/lt~obsolete.m4   99 +
m4/nls.m4   33 +
m4/po.m4   456 +++
m4/progtest.m4   92 +
po/.gitignore   31 -
po/Makefile.in.in   517 +++
po/Rules-quot   66 +
po/boldquot.sed   21 +
po/ca.gmo   Bin 0 -> 15587 bytes
po/cs.gmo   Bin 0 -> 7983 bytes
po/da.gmo   Bin 0 -> 9040 bytes
po/de.gmo   Bin 0 -> 29882 bytes
po/en@boldquot.header   35 +
po/en@quot.header   32 +
po/eo.gmo   Bin 0 -> 15060 bytes
po/es.gmo   Bin 0 -> 29228 bytes
po/fi.gmo   Bin 0 -> 28225 bytes
po/fr.gmo   Bin 0 -> 10232 bytes
To be able to easily inspect exactly what changed in the release tarball compared to git release tag contents, the best tool for the job is Meld, invoked via git difftool --dir-diff fa1e8796^..fa1e8796. Meld invoked by git difftool --dir-diff afba662b..fa1e8796 to show differences between git release tag and release tarball contents To compare changes across the new and old upstream tarball, one would need to compare commits afba662b New upstream version 5.8.0 and fa1e8796 New upstream version 5.8.1 by running git difftool --dir-diff afba662b..fa1e8796. Meld invoked by git difftool --dir-diff afba662b..fa1e8796 to show differences between to upstream release tarball contents With all the above tips you can now go and try to audit your own favorite package in Debian and see if it is identical with upstream, and if not, how it differs.

Should the XZ backdoor have been detected using these tools? The famous XZ Utils backdoor (CVE-2024-3094) consisted of two parts: the actual backdoor inside two binary blobs masqueraded as a test files (tests/files/bad-3-corrupt_lzma2.xz, tests/files/good-large_compressed.lzma), and a small modification in the build scripts (m4/build-to-host.m4) to extract the backdoor and plant it into the built binary. The build script was not tracked in version control, but generated with GNU Autotools at release time and only shipped as additional files in the release tarball. The entire reason for me to write this post was to ponder if a diligent engineer using git-buildpackage best practices could have reasonably spotted this while importing the new upstream release into Debian. The short answer is no . The malicious actor here clearly anticipated all the typical ways anyone might inspect both git commits, and release tarball contents, and masqueraded the changes very well and over a long timespan. First of all, XZ has for legitimate reasons for several carefully crafted .xz files as test data to help catch regressions in the decompression code path. The test files are shipped in the release so users can run the test suite and validate that the binary is built correctly and xz works properly. Debian famously runs massive amounts of testing in its CI and autopkgtest system across tens of thousands of packages to uphold high quality despite frequent upgrades of the build toolchain and while supporting more CPU architectures than any other distro. Test data is useful and should stay. When git-buildpackage is used correctly, the upstream commits are visible in the Debian packaging for easy review, but the commit cf44e4b that introduced the test files does not deviate enough from regular sloppy coding practices to really stand out. It is unfortunately very common for git commit to lack a message body explaining why the change was done, and to not be properly atomic with test code and test data together in the same commit, and for commits to be pushed directly to mainline without using code reviews (the commit was not part of any PR in this case). Only another upstream developer could have spotted that this change is not on par to what the project expects, and that the test code was never added, only test data, and thus that this commit was not just a sloppy one but potentially malicious. Secondly, the fact that a new Autotools file appeared (m4/build-to-host.m4) in the XZ Utils 5.6.0 is not suspicious. This is perfectly normal for Autotools. In fact, starting from XZ Utils version 5.8.1 it is now shipping a m4/build-to-host.m4 file that it actually uses now. Spotting that there is anything fishy is practically impossible by simply reading the code, as Autotools files are full custom m4 syntax interwoven with shell script, and there are plenty of backticks ( ) that spawn subshells and evals that execute variable contents further, which is just normal for Autotools. Russ Cox s XZ post explains how exactly the Autotools code fetched the actual backdoor from the test files and injected it into the build. Inspecting the m4/build-to-host.m4 changes in Meld launched via git difftool There is only one tiny thing that maybe a very experienced Autotools user could potentially have noticed: the serial 30 in the version header is way too high. In theory one could also have noticed this Autotools file deviates from what other packages in Debian ship with the same filename, such as e.g. the serial 3, serial 5a or 5b versions. That would however require and an insane amount extra checking work, and is not something we should plan to start doing. A much simpler solution would be to simply strongly recommend all open source projects to stop using Autotools to eventually get rid of it entirely.

Not detectable with reasonable effort While planting backdoors is evil, it is hard not to feel some respect to the level of skill and dedication of the people behind this. I ve been involved in a bunch of security breach investigations during my IT career, and never have I seen anything this well executed. If it hadn t slowed down SSH by ~500 milliseconds and been discovered due to that, it would most likely have stayed undetected for months or years. Hiding backdoors in closed source software is relatively trivial, but hiding backdoors in plain sight in a popular open source project requires some unusual amount of expertise and creativity as shown above.

Is the software supply-chain in Debian easy to audit? While maintaining a Debian package source using git-buildpackage can make the package history a lot easier to inspect, most packages have incomplete configurations in their debian/gbp.conf, and thus their package development histories are not always correctly constructed or uniform and easy to compare. The Debian Policy does not mandate git usage at all, and there are many important packages that are not using git at all. Additionally the Debian Policy also allows for non-maintainers to upload new versions to Debian without committing anything in git even for packages where the original maintainer wanted to use git. Uploads that bypass git unfortunately happen surpisingly often. Because of the situation, I am afraid that we could have multiple similar backdoors lurking that simply haven t been detected yet. More audits, that hopefully also get published openly, would be welcome! More people auditing the contents of the Debian archives would probably also help surface what tools and policies Debian might be missing to make the work easier, and thus help improve the security of Debian s users, and improve trust in Debian.

Is Debian currently missing some software that could help detect similar things? To my knowledge there is currently no system in place as part of Debian s QA or security infrastructure to verify that the upstream source packages in Debian are actually from upstream. I ve come across a lot of packages where the debian/watch or other configs are incorrect and even cases where maintainers have manually created upstream tarballs as it was easier than configuring automation to work. It is obvious that for those packages the source tarball now in Debian is not at all the same as upstream. I am not aware of any malicious cases though (if I was, I would report them of course). I am also aware of packages in the Debian repository that are misconfigured to be of type 1.0 (native) packages, mixing the upstream files and debian/ contents and having patches applied, while they actually should be configured as 3.0 (quilt), and not hide what is the true upstream sources. Debian should extend the QA tools to scan for such things. If I find a sponsor, I might build it myself as my next major contribution to Debian. In addition to better tooling for finding mismatches in the source code, Debian could also have better tooling for tracking in built binaries what their source files were, but solutions like Fraunhofer-AISEC s supply-graph or Sony s ESSTRA are not practical yet. Julien Malka s post about NixOS discusses the role of reproducible builds, which may help in some cases across all distros.

Or, is Debian missing some policies or practices to mitigate this? Perhaps more importantly than more security scanning, the Debian Developer community should switch the general mindset from anyone is free to do anything to valuing having more shared workflows. The ability to audit anything is severely hampered by the fact that there are so many ways to do the same thing, and distinguishing what is a normal deviation from a malicious deviation is too hard, as the normal can basically be almost anything. Also, as there is no documented and recommended default workflow, both those who are old and new to Debian packaging might never learn any one optimal workflow, and end up doing many steps in the packaging process in a way that kind of works, but is actually wrong or unnecessary, causing process deviations that look malicious, but turn out to just be a result of not fully understanding what would have been the right way to do something. In the long run, once individual developers workflows are more aligned, doing code reviews will become a lot easier and smoother as the excess noise of workflow differences diminishes and reviews will feel much more productive to all participants. Debian fostering a culture of code reviews would allow us to slowly move from the current practice of mainly solo packaging work towards true collaboration forming around those code reviews. I have been promoting increased use of Merge Requests in Debian already for some time, for example by proposing DEP-18: Encourage Continuous Integration and Merge Request based Collaboration for Debian packages. If you are involved in Debian development, please give a thumbs up in dep-team/deps!21 if you want me to continue promoting it.

Can we trust open source software? Yes and I would argue that we can only trust open source software. There is no way to audit closed source software, and anyone using e.g. Windows or MacOS just have to trust the vendor s word when they say they have no intentional or accidental backdoors in their software. Or, when the news gets out that the systems of a closed source vendor was compromised, like Crowdstrike some weeks ago, we can t audit anything, and time after time we simply need to take their word when they say they have properly cleaned up their code base. In theory, a vendor could give some kind of contractual or financial guarantee to its customer that there are no preventable security issues, but in practice that never happens. I am not aware of a single case of e.g. Microsoft or Oracle would have paid damages to their customers after a security flaw was found in their software. In theory you could also pay a vendor more to have them focus more effort in security, but since there is no way to verify what they did, or to get compensation when they didn t, any increased fees are likely just pocketed as increased profit. Open source is clearly better overall. You can, if you are an individual with the time and skills, audit every step in the supply-chain, or you could as an organization make investments in open source security improvements and actually verify what changes were made and how security improved. If your organisation is using Debian (or derivatives, such as Ubuntu) and you are interested in sponsoring my work to improve Debian, please reach out.

8 September 2025

Michael Ablassmeier: Vagrant images for trixie

It s no news that the vagrant license has changed while ago, which resulted in less motivation to maintain it in Debian (understandably). Unfortunately this means there are currently no official vagrant images for Debian trixie, for reasons Of course there are various boxes floating around on hashicorp s vagrant cloud, but either they do not fit my needs (too big) or i don t consider them trustworthy enough Building the images using the existing toolset is quite straight forward. The required scripts are maintained in the Debian Vagrant images repository. With a few additional changes applied and following the instructions of the README, you can build the images yourself. For me, the built images work like expected.

11 July 2025

Jamie McClelland: Avoiding Apache Max Request Workers Errors

Wow, I hate this error:
AH00484: server reached MaxRequestWorkers setting, consider raising the MaxRequestWorkers setting
For starters, it means I have to relearn how MaxRequestWorkers functions in Apache:
For threaded and hybrid servers (e.g. event or worker), MaxRequestWorkers restricts the total number of threads that will be available to serve clients. For hybrid MPMs, the default value is 16 (ServerLimit) multiplied by the value of 25 (ThreadsPerChild). Therefore, to increase MaxRequestWorkers to a value that requires more than 16 processes, you must also raise ServerLimit.
Ok remind me what ServerLimit refers to?
For the prefork MPM, this directive sets the maximum configured value for MaxRequestWorkers for the lifetime of the Apache httpd process. For the worker and event MPMs, this directive in combination with ThreadLimit sets the maximum configured value for MaxRequestWorkers for the lifetime of the Apache httpd process. For the event MPM, this directive also defines how many old server processes may keep running and finish processing open connections. Any attempts to change this directive during a restart will be ignored, but MaxRequestWorkers can be modified during a restart. Special care must be taken when using this directive. If ServerLimit is set to a value much higher than necessary, extra, unused shared memory will be allocated. If both ServerLimit and MaxRequestWorkers are set to values higher than the system can handle, Apache httpd may not start or the system may become unstable. With the prefork MPM, use this directive only if you need to set MaxRequestWorkers higher than 256 (default). Do not set the value of this directive any higher than what you might want to set MaxRequestWorkers to. With worker, use this directive only if your MaxRequestWorkers and ThreadsPerChild settings require more than 16 server processes (default). Do not set the value of this directive any higher than the number of server processes required by what you may want for MaxRequestWorkers and ThreadsPerChild. With event, increase this directive if the process number defined by your MaxRequestWorkers and ThreadsPerChild settings, plus the number of gracefully shutting down processes, is more than 16 server processes (default).
Got it? In other words, you can consider raising the MaxRequestWorkers setting all you want, but you can t just change that setting, you have to read about several other compliated settings, do some math, and spend a lot of time wondering if you are going to remember what you just did and how to undo it if you blow up your server. On the plus side, typically, nobody should increase this limit - because if the server runs out of connections, it usually means something else is wrong. In our case, on a shared web server running Apache2 and PHP-FPM, it s usually because a single web site has gone out of control. But wait! How can that happen, we are using PHP-FPM s max_children setting to prevent a single PHP web site from taking down the server? After years of struggling with this problem I have finally made some headway. Our PHP pool configuration typically looks like this:
user = site342999writer
group = site342999writer
listen = /run/php/8.1-site342999.sock
listen.owner = www-data
listen.group = www-data
pm = ondemand
pm.max_children = 12
pm.max_requests = 500
php_admin_value[memory_limit] = 256M
And we invoke PHP-FPM via this apache snippet:
<FilesMatch \.php$>
        SetHandler "proxy:unix:/var/run/php/8.1-site342999.sock fcgi://localhost"
</FilesMatch>
With these settings in place, what happens when we use up all 12 max_children? According to the docs:
By default, mod_proxy will allow and retain the maximum number of connections that could be used simultaneously by that web server child process. Use the max parameter to reduce the number from the default. The pool of connections is maintained per web server child process, and max and other settings are not coordinated among all child processes, except when only one child process is allowed by configuration or MPM design.
The max parameter seems to default to the ThreadsPerChild, so it seems that the default here is to allow any web site to consume ThreadsPerChild (25) x ServerLimit (16), which is also the max number of over all connections. Not great. To make matter worse, there is another setting available which is mysteriously called acquire:
If set, this will be the maximum time to wait for a free connection in the connection pool, in milliseconds. If there are no free connections in the pool, the Apache httpd will return SERVER_BUSY status to the client.
By default this is not set which seems to suggest Apache will just hang on to connections forever until a free PHP process becomes available (or some other time out happens). So, let s try something different:
 <Proxy "fcgi://localhost">
    ProxySet acquire=1 max=12
  </proxy>
This snippet is the way you can configure the proxy configuration we setup in the SetHandler statement above. It s documented on the Apache mod_proxy page. Now we limit the maximum pool size per process to half of what is available for the entire server and we tell Apache to immediately throw a 503 error if we have exceeded our maximum number of connecitons. Now, if a site is overwhelmed with traffic, instead of maxing out the available Apache connections while leaving user with constantly spinning browsers, the users will get 503 ed and the server will be able to server other sites.

16 June 2025

Paul Tagliamonte: The Promised LAN

The Internet has changed a lot in the last 40+ years. Fads have come and gone. Network protocols have been designed, deployed, adopted, and abandoned. Industries have come and gone. The types of people on the internet have changed a lot. The number of people on the internet has changed a lot, creating an information medium unlike anything ever seen before in human history. There s a lot of good things about the Internet as of 2025, but there s also an inescapable hole in what it used to be, for me. I miss being able to throw a site up to send around to friends to play with without worrying about hordes of AI-feeding HTML combine harvesters DoS-ing my website, costing me thousands in network transfer for the privilege. I miss being able to put a lightly authenticated game server up and not worry too much at night wondering if that process is now mining bitcoin. I miss being able to run a server in my home closet. Decades of cat and mouse games have rendered running a mail server nearly impossible. Those who are brave enough to try are met with weekslong stretches of delivery failures and countless hours yelling ineffectually into a pipe that leads from the cheerful lobby of some disinterested corporation directly into a void somewhere 4 layers below ground level. I miss the spirit of curiosity, exploration, and trying new things. I miss building things for fun without having to worry about being too successful, after which security offices start demanding my supplier paperwork in triplicate as heartfelt thanks from their engineering teams. I miss communities that are run because it is important to them, not for ad revenue. I miss community operated spaces and having more than four websites that are all full of nothing except screenshots of each other. Every other page I find myself on now has an AI generated click-bait title, shared for rage-clicks all brought-to-you-by-our-sponsors completely covered wall-to-wall with popup modals, telling me how much they respect my privacy, with the real content hidden at the bottom bracketed by deceptive ads served by companies that definitely know which new coffee shop I went to last month. This is wrong, and those who have seen what was know it. I can t keep doing it. I m not doing it any more. I reject the notion that this is as it needs to be. It is wrong. The hole left in what the Internet used to be must be filled. I will fill it.

What comes before part b? Throughout the 2000s, some of my favorite memories were from LAN parties at my friends places. Dragging your setup somewhere, long nights playing games, goofing off, even building software all night to get something working being able to do something fiercely technical in the context of a uniquely social activity. It wasn t really much about the games or the projects it was an excuse to spend time together, just hanging out. A huge reason I learned so much in college was that campus was a non-stop LAN party we could freely stand up servers, talk between dorms on the LAN, and hit my dorm room computer from the lab. Things could go from individual to social in the matter of seconds. The Internet used to work this way my dorm had public IPs handed out by DHCP, and my workstation could serve traffic from anywhere on the internet. I haven t been back to campus in a few years, but I d be surprised if this were still the case. In December of 2021, three of us got together and connected our houses together in what we now call The Promised LAN. The idea is simple fill the hole we feel is gone from our lives. Build our own always-on 24/7 nonstop LAN party. Build a space that is intrinsically social, even though we re doing technical things. We can freely host insecure game servers or one-off side projects without worrying about what someone will do with it. Over the years, it s evolved very slowly we haven t pulled any all-nighters. Our mantra has become old growth , building each layer carefully. As of May 2025, the LAN is now 19 friends running around 25 network segments. Those 25 networks are connected to 3 backbone nodes, exchanging routes and IP traffic for the LAN. We refer to the set of backbone operators as The Bureau of LAN Management . Combined decades of operating critical infrastructure has driven The Bureau to make a set of well-understood, boring, predictable, interoperable and easily debuggable decisions to make this all happen. Nothing here is exotic or even technically interesting.

Applications of trusting trust The hardest part, however, is rejecting the idea that anything outside our own LAN is untrustworthy nearly irreversible damage inflicted on us by the Internet. We have solved this by not solving it. We strictly control membership the absolute hard minimum for joining the LAN requires 10 years of friendship with at least one member of the Bureau, with another 10 years of friendship planned. Members of the LAN can veto new members even if all other criteria is met. Even with those strict rules, there s no shortage of friends that meet the qualifications but we are not equipped to take that many folks on. It s hard to join -both socially and technically. Doing something malicious on the LAN requires a lot of highly technical effort upfront, and it would endanger a decade of friendship. We have relied on those human, social, interpersonal bonds to bring us all together. It s worked for the last 4 years, and it should continue working until we think of something better. We assume roommates, partners, kids, and visitors all have access to The Promised LAN. If they re let into our friends' network, there is a level of trust that works transitively for us I trust them to be on mine. This LAN is not for security , rather, the network border is a social one. Benign hacking in the original sense of misusing systems to do fun and interesting things is encouraged. Robust ACLs and firewalls on the LAN are, by definition, an interpersonal not technical failure. We all trust every other network operator to run their segment in a way that aligns with our collective values and norms. Over the last 4 years, we ve grown our own culture and fads around half of the people on the LAN have thermal receipt printers with open access, for printing out quips or jokes on each other s counters. It s incredible how much network transport and a trusting culture gets you there s a 3-node IRC network, exotic hardware to gawk at, radios galore, a NAS storage swap, LAN only email, and even a SIP phone network of redphones .

DIY We do not wish to, nor will we, rebuild the internet. We do not wish to, nor will we, scale this. We will never be friends with enough people, as hard as we may try. Participation hinges on us all having fun. As a result, membership will never be open, and we will never have enough connected LANs to deal with the technical and social problems that start to happen with scale. This is a feature, not a bug. This is a call for you to do the same. Build your own LAN. Connect it with friends homes. Remember what is missing from your life, and fill it in. Use software you know how to operate and get it running. Build slowly. Build your community. Do it with joy. Remember how we got here. Rebuild a community space that doesn t need to be mediated by faceless corporations and ad revenue. Build something sustainable that brings you joy. Rebuild something you use daily. Bring back what we re missing.

6 June 2025

Reproducible Builds: Reproducible Builds in May 2025

Welcome to our 5th report from the Reproducible Builds project in 2025! Our monthly reports outline what we ve been up to over the past month, and highlight items of news from elsewhere in the increasingly-important area of software supply-chain security. If you are interested in contributing to the Reproducible Builds project, please do visit the Contribute page on our website. In this report:
  1. Security audit of Reproducible Builds tools published
  2. When good pseudorandom numbers go bad
  3. Academic articles
  4. Distribution work
  5. diffoscope and disorderfs
  6. Website updates
  7. Reproducibility testing framework
  8. Upstream patches

Security audit of Reproducible Builds tools published The Open Technology Fund s (OTF) security partner Security Research Labs recently an conducted audit of some specific parts of tools developed by Reproducible Builds. This form of security audit, sometimes called a whitebox audit, is a form testing in which auditors have complete knowledge of the item being tested. They auditors assessed the various codebases for resilience against hacking, with key areas including differential report formats in diffoscope, common client web attacks, command injection, privilege management, hidden modifications in the build process and attack vectors that might enable denials of service. The audit focused on three core Reproducible Builds tools: diffoscope, a Python application that unpacks archives of files and directories and transforms their binary formats into human-readable form in order to compare them; strip-nondeterminism, a Perl program that improves reproducibility by stripping out non-deterministic information such as timestamps or other elements introduced during packaging; and reprotest, a Python application that builds source code multiple times in various environments in order to to test reproducibility. OTF s announcement contains more of an overview of the audit, and the full 24-page report is available in PDF form as well.

When good pseudorandom numbers go bad Danielle Navarro published an interesting and amusing article on their blog on When good pseudorandom numbers go bad. Danielle sets the stage as follows:
[Colleagues] approached me to talk about a reproducibility issue they d been having with some R code. They d been running simulations that rely on generating samples from a multivariate normal distribution, and despite doing the prudent thing and using set.seed() to control the state of the random number generator (RNG), the results were not computationally reproducible. The same code, executed on different machines, would produce different random numbers. The numbers weren t just a little bit different in the way that we ve all wearily learned to expect when you try to force computers to do mathematics. They were painfully, brutally, catastrophically, irreproducible different. Somewhere, somehow, something broke.
Thanks to David Wheeler for posting about this article on our mailing list

Academic articles There were two scholarly articles published this month that related to reproducibility: Daniel Hugenroth and Alastair R. Beresford of the University of Cambridge in the United Kingdom and Mario Lins and Ren Mayrhofer of Johannes Kepler University in Linz, Austria published an article titled Attestable builds: compiling verifiable binaries on untrusted systems using trusted execution environments. In their paper, they:
present attestable builds, a new paradigm to provide strong source-to-binary correspondence in software artifacts. We tackle the challenge of opaque build pipelines that disconnect the trust between source code, which can be understood and audited, and the final binary artifact, which is difficult to inspect. Our system uses modern trusted execution environments (TEEs) and sandboxed build containers to provide strong guarantees that a given artifact was correctly built from a specific source code snapshot. As such it complements existing approaches like reproducible builds which typically require time-intensive modifications to existing build configurations and dependencies, and require independent parties to continuously build and verify artifacts.
The authors compare attestable builds with reproducible builds by noting an attestable build requires only minimal changes to an existing project, and offers nearly instantaneous verification of the correspondence between a given binary and the source code and build pipeline used to construct it , and proceed by determining that t he overhead (42 seconds start-up latency and 14% increase in build duration) is small in comparison to the overall build time.
Timo Pohl, Pavel Nov k, Marc Ohm and Michael Meier have published a paper called Towards Reproducibility for Software Packages in Scripting Language Ecosystems. The authors note that past research into Reproducible Builds has focused primarily on compiled languages and their ecosystems, with a further emphasis on Linux distribution packages:
However, the popular scripting language ecosystems potentially face unique issues given the systematic difference in distributed artifacts. This Systemization of Knowledge (SoK) [paper] provides an overview of existing research, aiming to highlight future directions, as well as chances to transfer existing knowledge from compiled language ecosystems. To that end, we work out key aspects in current research, systematize identified challenges for software reproducibility, and map them between the ecosystems.
Ultimately, the three authors find that the literature is sparse , focusing on few individual problems and ecosystems, and therefore identify space for more critical research.

Distribution work In Debian this month:
Hans-Christoph Steiner of the F-Droid catalogue of open source applications for the Android platform published a blog post on Making reproducible builds visible. Noting that Reproducible builds are essential in order to have trustworthy software , Hans also mentions that F-Droid has been delivering reproducible builds since 2015 . However:
There is now a Reproducibility Status link for each app on f-droid.org, listed on every app s page. Our verification server shows or based on its build results, where means our rebuilder reproduced the same APK file and means it did not. The IzzyOnDroid repository has developed a more elaborate system of badges which displays a for each rebuilder. Additionally, there is a sketch of a five-level graph to represent some aspects about which processes were run.
Hans compares the approach with projects such as Arch Linux and Debian that provide developer-facing tools to give feedback about reproducible builds, but do not display information about reproducible builds in the user-facing interfaces like the package management GUIs.
Arnout Engelen of the NixOS project has been working on reproducing the minimal installation ISO image. This month, Arnout has successfully reproduced the build of the minimal image for the 25.05 release without relying on the binary cache. Work on also reproducing the graphical installer image is ongoing.
In openSUSE news, Bernhard M. Wiedemann posted another monthly update for their work there.
Lastly in Fedora news, Jelle van der Waa opened issues tracking reproducible issues in Haskell documentation, Qt6 recording the host kernel and R packages recording the current date. The R packages can be made reproducible with packaging changes in Fedora.

diffoscope & disorderfs diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 295, 296 and 297 to Debian:
  • Don t rely on zipdetails --walk argument being available, and only add that argument on newer versions after we test for that. [ ]
  • Review and merge support for NuGet packages from Omair Majid. [ ]
  • Update copyright years. [ ]
  • Merge support for an lzma comparator from Will Hollywood. [ ][ ]
Chris also merged an impressive changeset from Siva Mahadevan to make disorderfs more portable, especially on FreeBSD. disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into directory system calls in order to flush out reproducibility issues [ ]. This was then uploaded to Debian as version 0.6.0-1. Lastly, Vagrant Cascadian updated diffoscope in GNU Guix to version 296 [ ][ ] and 297 [ ][ ], and disorderfs to version 0.6.0 [ ][ ].

Website updates Once again, there were a number of improvements made to our website this month including:

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. However, Holger Levsen posted to our mailing list this month in order to bring a wider awareness to funding issues faced by the Oregon State University (OSU) Open Source Lab (OSL). As mentioned on OSL s public post, recent changes in university funding makes our current funding model no longer sustainable [and that] unless we secure $250,000 in committed funds, the OSL will shut down later this year . As Holger notes in his post to our mailing list, the Reproducible Builds project relies on hardware nodes hosted there. Nevertheless, Lance Albertson of OSL posted an update to the funding situation later in the month with broadly positive news.
Separate to this, there were various changes to the Jenkins setup this month, which is used as the backend driver of for both tests.reproducible-builds.org and reproduce.debian.net, including:
  • Migrating the central jenkins.debian.net server AMD Opteron to Intel Haswell CPUs. Thanks to IONOS for hosting this server since 2012.
  • After testing it for almost ten years, the i386 architecture has been dropped from tests.reproducible-builds.org. This is because that, with the upcoming release of Debian trixie, i386 is no longer supported as a regular architecture there will be no official kernel and no Debian installer for i386 systems. As a result, a large number of nodes hosted by Infomaniak have been retooled from i386 to amd64.
  • Another node, ionos17-amd64.debian.net, which is used for verifying packages for all.reproduce.debian.net (hosted by IONOS) has had its memory increased from 40 to 64GB, and the number of cores doubled to 32 as well. In addition, two nodes generously hosted by OSUOSL have had their memory doubled to 16GB.
  • Lastly, we have been granted access to more riscv64 architecture boards, so now we have seven such nodes, all with 16GB memory and 4 cores that are verifying packages for riscv64.reproduce.debian.net. Many thanks to PLCT Lab, ISCAS for providing those.

Outside of this, a number of smaller changes were also made by Holger Levsen:
  • reproduce.debian.net-related:
    • Only use two workers for the ppc64el architecture due to RAM size. [ ]
    • Monitor nginx_request and nginx_status with the Munin monitoring system. [ ][ ]
    • Detect various variants of network and memory errors. [ ][ ][ ][ ]
    • Add a prominent link to reproducible-builds.org. [ ]
    • Add a rebuilderd-cache-cleanup.service and run it daily via timer. [ ][ ][ ][ ][ ]
    • Be more verbose what sources are being downloaded. [ ]
    • Correctly deal with packages with an epoch in their version [ ] and deal with binNMUs versions with an epoch as well [ ][ ].
    • Document how to reschedule all other errors on all archs. [ ]
    • Misc documentation improvements. [ ][ ][ ][ ]
    • Include the $HOSTNAME variable in the rebuilderd logfiles. [ ]
    • Install the equivs package on all worker nodes. [ ][ ]
  • Jenkins nodes:
    • Permit the sudo tool to fix up permission issues. [ ][ ]
    • Document how to manage diskspace with OpenStack. [ ]
    • Ignore a number of spurious monitoring errors on riscv64, FreeBSD, etc.. [ ][ ][ ][ ]
    • Install ntpsec-ntpdate (instead of ntpdate) as the former is available on Debian trixie and bookworm. [ ][ ]
    • Use the same SSH ControlPath for all nodes. [ ]
    • Make sure the munin user uses the same SSH config as the jenkins user. [ ]
  • tests.reproducible-builds.org-related:
    • Disable testing of the i386 architecture. [ ][ ][ ][ ][ ]
    • Document the current disk usage. [ ][ ]
    • Address some image placement now that we only test three architectures. [ ]
    • Keep track of build performance. [ ]
  • Misc:
    • Fix a (harmless) typo in the multiarch_versionskew script. [ ]
In addition, Jochen Sprickerhof made a series of changes related to reproduce.debian.net:
  • Add out of memory detection to the statistics page. [ ]
  • Reverse the sorting order on the statistics page. [ ][ ][ ][ ]
  • Improve the spacing between statistics groups. [ ]
  • Update a (hard-coded) line number in error message detection pertaining to a debrebuild line number. [ ]
  • Support Debian unstable in the rebuilder-debian.sh script. [ ] ]
  • Rely on rebuildctl to sync only arch-specific packages. [ ][ ]

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. This month, we wrote a large number of such patches, including:

Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

5 June 2025

Matthew Garrett: Twitter's new encrypted DMs aren't better than the old ones

(Edit: Twitter could improve this significantly with very few changes - I wrote about that here. It's unclear why they'd launch without doing that, since it entirely defeats the point of using HSMs)

When Twitter[1] launched encrypted DMs a couple
of years ago, it was the worst kind of end-to-end
encrypted - technically e2ee, but in a way that made it relatively easy for Twitter to inject new encryption keys and get everyone's messages anyway. It was also lacking a whole bunch of features such as "sending pictures", so the entire thing was largely a waste of time. But a couple of days ago, Elon announced the arrival of "XChat", a new encrypted message platform built on Rust with (Bitcoin style) encryption, whole new architecture. Maybe this time they've got it right?

tl;dr - no. Use Signal. Twitter can probably obtain your private keys, and admit that they can MITM you and have full access to your metadata.

The new approach is pretty similar to the old one in that it's based on pretty straightforward and well tested cryptographic primitives, but merely using good cryptography doesn't mean you end up with a good solution. This time they've pivoted away from using the underlying cryptographic primitives directly and into higher level abstractions, which is probably a good thing. They're using Libsodium's boxes for message encryption, which is, well, fine? It doesn't offer forward secrecy (if someone's private key is leaked then all existing messages can be decrypted) so it's a long way from the state of the art for a messaging client (Signal's had forward secrecy for over a decade!), but it's not inherently broken or anything. It is, however, written in C, not Rust[2].

That's about the extent of the good news. Twitter's old implementation involved clients generating keypairs and pushing the public key to Twitter. Each client (a physical device or a browser instance) had its own private key, and messages were simply encrypted to every public key associated with an account. This meant that new devices couldn't decrypt old messages, and also meant there was a maximum number of supported devices and terrible scaling issues and it was pretty bad. The new approach generates a keypair and then stores the private key using the Juicebox protocol. Other devices can then retrieve the private key.

Doesn't this mean Twitter has the private key? Well, no. There's a PIN involved, and the PIN is used to generate an encryption key. The stored copy of the private key is encrypted with that key, so if you don't know the PIN you can't decrypt the key. So we brute force the PIN, right? Juicebox actually protects against that - before the backend will hand over the encrypted key, you have to prove knowledge of the PIN to it (this is done in a clever way that doesn't directly reveal the PIN to the backend). If you ask for the key too many times while providing the wrong PIN, access is locked down.

But this is true only if the Juicebox backend is trustworthy. If the backend is controlled by someone untrustworthy[3] then they're going to be able to obtain the encrypted key material (even if it's in an HSM, they can simply watch what comes out of the HSM when the user authenticates if there's no validation of the HSM's keys). And now all they need is the PIN. Turning the PIN into an encryption key is done using the Argon2id key derivation function, using 32 iterations and a memory cost of 16MB (the Juicebox white paper says 16KB, but (a) that's laughably small and (b) the code says 16 * 1024 in an argument that takes kilobytes), which makes it computationally and moderately memory expensive to generate the encryption key used to decrypt the private key. How expensive? Well, on my (not very fast) laptop, that takes less than 0.2 seconds. How many attempts to I need to crack the PIN? Twitter's chosen to fix that to 4 digits, so a maximum of 10,000. You aren't going to need many machines running in parallel to bring this down to a very small amount of time, at which point private keys can, to a first approximation, be extracted at will.

Juicebox attempts to defend against this by supporting sharding your key over multiple backends, and only requiring a subset of those to recover the original. I can't find any evidence that Twitter's does seem to be making use of this,Twitter uses three backends and requires data from at least two, but all the backends used are under x.com so are presumably under Twitter's direct control. Trusting the keystore without needing to trust whoever's hosting it requires a trustworthy communications mechanism between the client and the keystore. If the device you're talking to can prove that it's an HSM that implements the attempt limiting protocol and has no other mechanism to export the data, this can be made to work. Signal makes use of something along these lines using Intel SGX for contact list and settings storage and recovery, and Google and Apple also have documentation about how they handle this in ways that make it difficult for them to obtain backed up key material. Twitter has no documentation of this, and as far as I can tell does nothing to prove that the backend is in any way trustworthy. (Edit to add: The Juicebox API does support authenticated communication between the client and the HSM, but that relies on you having some way to prove that the public key you're presented with corresponds to a private key that only exists in the HSM. Twitter gives you the public key whenever you communicate with them, so even if they've implemented this properly you can't prove they haven't made up a new key and MITMed you the next time you retrieve your key)

On the plus side, Juicebox is written in Rust, so Elon's not 100% wrong. Just mostly wrong.

But ok, at least you've got viable end-to-end encryption even if someone can put in some (not all that much, really) effort to obtain your private key and render it all pointless? Actually no, since you're still relying on the Twitter server to give you the public key of the other party and there's no out of band mechanism to do that or verify the authenticity of that public key at present. Twitter can simply give you a public key where they control the private key, decrypt the message, and then reencrypt it with the intended recipient's key and pass it on. The support page makes it clear that this is a known shortcoming and that it'll be fixed at some point, but they said that about the original encrypted DM support and it never was, so that's probably dependent on whether Elon gets distracted by something else again. And the server knows who and when you're messaging even if they haven't bothered to break your private key, so there's a lot of metadata leakage.

Signal doesn't have these shortcomings. Use Signal.

[1] I'll respect their name change once Elon respects his daughter

[2] There are implementations written in Rust, but Twitter's using the C one with these JNI bindings

[3] Or someone nominally trustworthy but who's been compelled to act against your interests - even if Elon were absolutely committed to protecting all his users, his overarching goals for Twitter require him to have legal presence in multiple jurisdictions that are not necessarily above placing employees in physical danger if there's a perception that they could obtain someone's encryption keys

comment count unavailable comments

5 May 2025

Daniel Lange: Make apt shut up about "modernize-sources" in Trixie

Apt in Trixie (Debian 13) has the annoying function to tell you "Notice: Some sources can be modernized. Run 'apt modernize-sources' to do so." ... every single time you run apt update. Not cool for logs and log monitoring. And - of course - if you had the option to do this, you ... would have run the indicated apt modernize-sources command to convert your sources.list to "deb822 .sources format" files already. So an information message once or twice would have done. Well, luckily you can help yourself: apt -o APT::Get::Update::SourceListWarnings=false will keep apt shut up. This could go into an alias or your systems management tool / update script. Alternatively add
# Keep apt shut about preferring the "deb822" sources file format
APT::Get::Update::SourceListWarnings "false";
to /etc/apt/apt.conf.d/10quellsourceformatwarnings . This silences the notices about sources file formats (not only the deb822 one) system-wide. That way you can decide when you can / want to migrate to the new, more verbose, apt sources format yourself.

30 April 2025

Simon Josefsson: Building Debian in a GitLab Pipeline

After thinking about multi-stage Debian rebuilds I wanted to implement the idea. Recall my illustration:
Earlier I rebuilt all packages that make up the difference between Ubuntu and Trisquel. It turned out to be a 42% bit-by-bit identical similarity. To check the generality of my approach, I rebuilt the difference between Debian and Devuan too. That was the debdistreproduce project. It only had to orchestrate building up to around 500 packages for each distribution and per architecture. Differential reproducible rebuilds doesn t give you the full picture: it ignore the shared package between the distribution, which make up over 90% of the packages. So I felt a desire to do full archive rebuilds. The motivation is that in order to trust Trisquel binary packages, I need to trust Ubuntu binary packages (because that make up 90% of the Trisquel packages), and many of those Ubuntu binaries are derived from Debian source packages. How to approach all of this? Last year I created the debdistrebuild project, and did top-50 popcon package rebuilds of Debian bullseye, bookworm, trixie, and Ubuntu noble and jammy, on a mix of amd64 and arm64. The amount of reproducibility was lower. Primarily the differences were caused by using different build inputs. Last year I spent (too much) time creating a mirror of snapshot.debian.org, to be able to have older packages available for use as build inputs. I have two copies hosted at different datacentres for reliability and archival safety. At the time, snapshot.d.o had serious rate-limiting making it pretty unusable for massive rebuild usage or even basic downloads. Watching the multi-month download complete last year had a meditating effect. The completion of my snapshot download co-incided with me realizing something about the nature of rebuilding packages. Let me below give a recap of the idempotent rebuilds idea, because it motivate my work to build all of Debian from a GitLab pipeline. One purpose for my effort is to be able to trust the binaries that I use on my laptop. I believe that without building binaries from source code, there is no practically feasible way to trust binaries. To trust any binary you receive, you can de-assemble the bits and audit the assembler instructions for the CPU you will execute it on. Doing that on a OS-wide level this is unpractical. A more practical approach is to audit the source code, and then confirm that the binary is 100% bit-by-bit identical to one that you can build yourself (from the same source) on your own trusted toolchain. This is similar to a reproducible build. My initial goal with debdistrebuild was to get to 100% bit-by-bit identical rebuilds, and then I would have trustworthy binaries. Or so I thought. This also appears to be the goal of reproduce.debian.net. They want to reproduce the official Debian binaries. That is a worthy and important goal. They achieve this by building packages using the build inputs that were used to build the binaries. The build inputs are earlier versions of Debian packages (not necessarily from any public Debian release), archived at snapshot.debian.org. I realized that these rebuilds would be not be sufficient for me: it doesn t solve the problem of how to trust the toolchain. Let s assume the reproduce.debian.net effort succeeds and is able to 100% bit-by-bit identically reproduce the official Debian binaries. Which appears to be within reach. To have trusted binaries we would only have to audit the source code for the latest version of the packages AND audit the tool chain used. There is no escaping from auditing all the source code that s what I think we all would prefer to focus on, to be able to improve upstream source code. The trouble is about auditing the tool chain. With the Reproduce.debian.net approach, that is a recursive problem back to really ancient Debian packages, some of them which may no longer build or work, or even be legally distributable. Auditing all those old packages is a LARGER effort than auditing all current packages! Doing auditing of old packages is of less use to making contributions: those releases are old, and chances are any improvements have already been implemented and released. Or that improvements are no longer applicable because the projects evolved since the earlier version. See where this is going now? I reached the conclusion that reproducing official binaries using the same build inputs is not what I m interested in. I want to be able to build the binaries that I use from source using a toolchain that I can also build from source. And preferably that all of this is using latest version of all packages, so that I can contribute and send patches for them, to improve matters. The toolchain that Reproduce.Debian.Net is using is not trustworthy unless all those ancient packages are audited or rebuilt bit-by-bit identically, and I don t see any practical way forward to achieve that goal. Nor have I seen anyone working on that problem. It is possible to do, though, but I think there are simpler ways to achieve the same goal. My approach to reach trusted binaries on my laptop appears to be a three-step effort: How to go about achieving this? Today s Debian build architecture is something that lack transparency and end-user control. The build environment and signing keys are managed by, or influenced by, unidentified people following undocumented (or at least not public) security procedures, under unknown legal jurisdictions. I always wondered why none of the Debian-derivates have adopted a modern GitDevOps-style approach as a method to improve binary build transparency, maybe I missed some project? If you want to contribute to some GitHub or GitLab project, you click the Fork button and get a CI/CD pipeline running which rebuild artifacts for the project. This makes it easy for people to contribute, and you get good QA control because the entire chain up until its artifact release are produced and tested. At least in theory. Many projects are behind on this, but it seems like this is a useful goal for all projects. This is also liberating: all users are able to reproduce artifacts. There is no longer any magic involved in preparing release artifacts. As we ve seen with many software supply-chain security incidents for the past years, where the magic is involved is a good place to introduce malicious code. To allow me to continue with my experiment, I thought the simplest way forward was to setup a GitDevOps-centric and user-controllable way to build the entire Debian archive. Let me introduce the debdistbuild project. Debdistbuild is a re-usable GitLab CI/CD pipeline, similar to the Salsa CI pipeline. It provide one build job definition and one deploy job definition. The pipeline can run on GitLab.org Shared Runners or you can set up your own runners, like my GitLab riscv64 runner setup. I have concerns about relying on GitLab (both as software and as a service), but my ideas are easy to transfer to some other GitDevSecOps setup such as Codeberg.org. Self-hosting GitLab, including self-hosted runners, is common today, and Debian rely increasingly on Salsa for this. All of the build infrastructure could be hosted on Salsa eventually. The build job is simple. From within an official Debian container image build packages using dpkg-buildpackage essentially by invoking the following commands.
sed -i 's/ deb$/ deb deb-src/' /etc/apt/sources.list.d/*.sources
apt-get -o Acquire::Check-Valid-Until=false update
apt-get dist-upgrade -q -y
apt-get install -q -y --no-install-recommends build-essential fakeroot
env DEBIAN_FRONTEND=noninteractive \
    apt-get build-dep -y --only-source $PACKAGE=$VERSION
useradd -m build
DDB_BUILDDIR=/build/reproducible-path
chgrp build $DDB_BUILDDIR
chmod g+w $DDB_BUILDDIR
su build -c "apt-get source --only-source $PACKAGE=$VERSION" > ../$PACKAGE_$VERSION.build
cd $DDB_BUILDDIR
su build -c "dpkg-buildpackage"
cd ..
mkdir out
mv -v $(find $DDB_BUILDDIR -maxdepth 1 -type f) out/
The deploy job is also simple. It commit artifacts to a Git project using Git-LFS to handle large objects, essentially something like this:
if ! grep -q '^pool/**' .gitattributes; then
    git lfs track 'pool/**'
    git add .gitattributes
    git commit -m"Track pool/* with Git-LFS." .gitattributes
fi
POOLDIR=$(if test "$(echo "$PACKAGE"   cut -c1-3)" = "lib"; then C=4; else C=1; fi; echo "$DDB_PACKAGE"   cut -c1-$C)
mkdir -pv pool/main/$POOLDIR/
rm -rfv pool/main/$POOLDIR/$PACKAGE
mv -v out pool/main/$POOLDIR/$PACKAGE
git add pool
git commit -m"Add $PACKAGE." -m "$CI_JOB_URL" -m "$VERSION" -a
if test "$ DDB_GIT_TOKEN:- " = ""; then
    echo "SKIP: Skipping git push due to missing DDB_GIT_TOKEN (see README)."
else
    git push -o ci.skip
fi
That s it! The actual implementation is a bit longer, but the major difference is for log and error handling. You may review the source code of the base Debdistbuild pipeline definition, the base Debdistbuild script and the rc.d/-style scripts implementing the build.d/ process and the deploy.d/ commands. There was one complication related to artifact size. GitLab.org job artifacts are limited to 1GB. Several packages in Debian produce artifacts larger than this. What to do? GitLab supports up to 5GB for files stored in its package registry, but this limit is too close for my comfort, having seen some multi-GB artifacts already. I made the build job optionally upload artifacts to a S3 bucket using SHA256 hashed file hierarchy. I m using Hetzner Object Storage but there are many S3 providers around, including self-hosting options. This hierarchy is compatible with the Git-LFS .git/lfs/object/ hierarchy, and it is easy to setup a separate Git-LFS object URL to allow Git-LFS object downloads from the S3 bucket. In this mode, only Git-LFS stubs are pushed to the git repository. It should have no trouble handling the large number of files, since I have earlier experience with Apt mirrors in Git-LFS. To speed up job execution, and to guarantee a stable build environment, instead of installing build-essential packages on every build job execution, I prepare some build container images. The project responsible for this is tentatively called stage-N-containers. Right now it create containers suitable for rolling builds of trixie on amd64, arm64, and riscv64, and a container intended for as use the stage-0 based on the 20250407 docker images of bookworm on amd64 and arm64 using the snapshot.d.o 20250407 archive. Or actually, I m using snapshot-cloudflare.d.o because of download speed and reliability. I would have prefered to use my own snapshot mirror with Hetzner bandwidth, alas the Debian snapshot team have concerns about me publishing the list of (SHA1 hash) filenames publicly and I haven t been bothered to set up non-public access. Debdistbuild has built around 2.500 packages for bookworm on amd64 and bookworm on arm64. To confirm the generality of my approach, it also build trixie on amd64, trixie on arm64 and trixie on riscv64. The riscv64 builds are all on my own hosted runners. For amd64 and arm64 my own runners are only used for large packages where the GitLab.com shared runners run into the 3 hour time limit. What s next in this venture? Some ideas include: What do you think?

11 April 2025

Reproducible Builds: Reproducible Builds in March 2025

Welcome to the third report in 2025 from the Reproducible Builds project. Our monthly reports outline what we ve been up to over the past month, and highlight items of news from elsewhere in the increasingly-important area of software supply-chain security. As usual, however, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. Table of contents:
  1. Debian bookworm live images now fully reproducible from their binary packages
  2. How NixOS and reproducible builds could have detected the xz backdoor
  3. LWN: Fedora change aims for 99% package reproducibility
  4. Python adopts PEP standard for specifying package dependencies
  5. OSS Rebuild real-time validation and tooling improvements
  6. SimpleX Chat server components now reproducible
  7. Three new scholarly papers
  8. Distribution roundup
  9. An overview of Supply Chain Attacks on Linux distributions
  10. diffoscope & strip-nondeterminism
  11. Website updates
  12. Reproducibility testing framework
  13. Upstream patches

Debian bookworm live images now fully reproducible from their binary packages Roland Clobus announced on our mailing list this month that all the major desktop variants (ie. Gnome, KDE, etc.) can be reproducibly created for Debian bullseye, bookworm and trixie from their (pre-compiled) binary packages. Building reproducible Debian live images does not require building from reproducible source code, but this is still a remarkable achievement. Some large proportion of the binary packages that comprise these live images can (and were) built reproducibly, but live image generation works at a higher level. (By contrast, full or end-to-end reproducibility of a bootable OS image will, in time, require both the compile-the-packages the build-the-bootable-image stages to be reproducible.) Nevertheless, in response, Roland s announcement generated significant congratulations as well as some discussion regarding the finer points of the terms employed: a full outline of the replies can be found here. The news was also picked up by Linux Weekly News (LWN) as well as to Hacker News.

How NixOS and reproducible builds could have detected the xz backdoor Julien Malka aka luj published an in-depth blog post this month with the highly-stimulating title How NixOS and reproducible builds could have detected the xz backdoor for the benefit of all . Starting with an dive into the relevant technical details of the XZ Utils backdoor, Julien s article goes on to describe how we might avoid the xz catastrophe in the future by building software from trusted sources and building trust into untrusted release tarballs by way of comparing sources and leveraging bitwise reproducibility, i.e. applying the practices of Reproducible Builds. The article generated significant discussion on Hacker News as well as on Linux Weekly News (LWN).

LWN: Fedora change aims for 99% package reproducibility Linux Weekly News (LWN) contributor Joe Brockmeier has published a detailed round-up on how Fedora change aims for 99% package reproducibility. The article opens by mentioning that although Debian has been working toward reproducible builds for more than a decade , the Fedora project has now:
progressed far enough that the project is now considering a change proposal for the Fedora 43 development cycle, expected to be released in October, with a goal of making 99% of Fedora s package builds reproducible. So far, reaction to the proposal seems favorable and focused primarily on how to achieve the goal with minimal pain for packagers rather than whether to attempt it.
The Change Proposal itself is worth reading:
Over the last few releases, we [Fedora] changed our build infrastructure to make package builds reproducible. This is enough to reach 90%. The remaining issues need to be fixed in individual packages. After this Change, package builds are expected to be reproducible. Bugs will be filed against packages when an irreproducibility is detected. The goal is to have no fewer than 99% of package builds reproducible.
Further discussion can be found on the Fedora mailing list as well as on Fedora s Discourse instance.

Python adopts PEP standard for specifying package dependencies Python developer Brett Cannon reported on Fosstodon that PEP 751 was recently accepted. This design document has the purpose of describing a file format to record Python dependencies for installation reproducibility . As the abstract of the proposal writes:
This PEP proposes a new file format for specifying dependencies to enable reproducible installation in a Python environment. The format is designed to be human-readable and machine-generated. Installers consuming the file should be able to calculate what to install without the need for dependency resolution at install-time.
The PEP, which itself supersedes PEP 665, mentions that there are at least five well-known solutions to this problem in the community .

OSS Rebuild real-time validation and tooling improvements OSS Rebuild aims to automate rebuilding upstream language packages (e.g. from PyPI, crates.io, npm registries) and publish signed attestations and build definitions for public use. OSS Rebuild is now attempting rebuilds as packages are published, shortening the time to validating rebuilds and publishing attestations. Aman Sharma contributed classifiers and fixes for common sources of non-determinism in JAR packages. Improvements were also made to some of the core tools in the project:
  • timewarp for simulating the registry responses from sometime in the past.
  • proxy for transparent interception and logging of network activity.
  • and stabilize, yet another nondeterminism fixer.

SimpleX Chat server components now reproducible SimpleX Chat is a privacy-oriented decentralised messaging platform that eliminates user identifiers and metadata, offers end-to-end encryption and has a unique approach to decentralised identity. Starting from version 6.3, however, Simplex has implemented reproducible builds for its server components. This advancement allows anyone to verify that the binaries distributed by SimpleX match the source code, improving transparency and trustworthiness.

Three new scholarly papers Aman Sharma of the KTH Royal Institute of Technology of Stockholm, Sweden published a paper on Build and Runtime Integrity for Java (PDF). The paper s abstract notes that Software Supply Chain attacks are increasingly threatening the security of software systems and goes on to compare build- and run-time integrity:
Build-time integrity ensures that the software artifact creation process, from source code to compiled binaries, remains untampered. Runtime integrity, on the other hand, guarantees that the executing application loads and runs only trusted code, preventing dynamic injection of malicious components.
Aman s paper explores solutions to safeguard Java applications and proposes some novel techniques to detect malicious code injection. A full PDF of the paper is available.
In addition, Hamed Okhravi and Nathan Burow of Massachusetts Institute of Technology (MIT) Lincoln Laboratory along with Fred B. Schneider of Cornell University published a paper in the most recent edition of IEEE Security & Privacy on Software Bill of Materials as a Proactive Defense:
The recently mandated software bill of materials (SBOM) is intended to help mitigate software supply-chain risk. We discuss extensions that would enable an SBOM to serve as a basis for making trust assessments thus also serving as a proactive defense.
A full PDF of the paper is available.
Lastly, congratulations to Giacomo Benedetti of the University of Genoa for publishing their PhD thesis. Titled Improving Transparency, Trust, and Automation in the Software Supply Chain, Giacomo s thesis:
addresses three critical aspects of the software supply chain to enhance security: transparency, trust, and automation. First, it investigates transparency as a mechanism to empower developers with accurate and complete insights into the software components integrated into their applications. To this end, the thesis introduces SUNSET and PIP-SBOM, leveraging modeling and SBOMs (Software Bill of Materials) as foundational tools for transparency and security. Second, it examines software trust, focusing on the effectiveness of reproducible builds in major ecosystems and proposing solutions to bolster their adoption. Finally, it emphasizes the role of automation in modern software management, particularly in ensuring user safety and application reliability. This includes developing a tool for automated security testing of GitHub Actions and analyzing the permission models of prominent platforms like GitHub, GitLab, and BitBucket.

Distribution roundup In Debian this month:
The IzzyOnDroid Android APK repository reached another milestone in March, crossing the 40% coverage mark specifically, more than 42% of the apps in the repository is now reproducible Thanks to funding by NLnet/Mobifree, the project was also to put more time into their tooling. For instance, developers can now run easily their own verification builder in less than 5 minutes . This currently supports Debian-based systems, but support for RPM-based systems is incoming. Future work in the pipeline, including documentation, guidelines and helpers for debugging.
Fedora developer Zbigniew J drzejewski-Szmek announced a work-in-progress script called fedora-repro-build which attempts to reproduce an existing package within a Koji build environment. Although the project s README file lists a number of fields will always or almost always vary (and there are a non-zero list of other known issues), this is an excellent first step towards full Fedora reproducibility (see above for more information).
Lastly, in openSUSE news, Bernhard M. Wiedemann posted another monthly update for his work there.

An overview of Supply Chain Attacks on Linux distributions Fenrisk, a cybersecurity risk-management company, has published a lengthy overview of Supply Chain Attacks on Linux distributions. Authored by Maxime Rinaudo, the article asks:
[What] would it take to compromise an entire Linux distribution directly through their public infrastructure? Is it possible to perform such a compromise as simple security researchers with no available resources but time?

diffoscope & strip-nondeterminism diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 290, 291, 292 and 293 and 293 to Debian:
  • Bug fixes:
    • file(1) version 5.46 now returns XHTML document for .xhtml files such as those found nested within our .epub tests. [ ]
    • Also consider .aar files as APK files, at least for the sake of diffoscope. [ ]
    • Require the new, upcoming, version of file(1) and update our quine-related testcase. [ ]
  • Codebase improvements:
    • Ensure all calls to our_check_output in the ELF comparator have the potential CalledProcessError exception caught. [ ][ ]
    • Correct an import masking issue. [ ]
    • Add a missing subprocess import. [ ]
    • Reformat openssl.py. [ ]
    • Update copyright years. [ ][ ][ ]
In addition, Ivan Trubach contributed a change to ignore the st_size metadata entry for directories as it is essentially arbitrary and introduces unnecessary or even spurious changes. [ ]

Website updates Once again, there were a number of improvements made to our website this month, including:

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In March, a number of changes were made by Holger Levsen, including:
  • reproduce.debian.net-related:
    • Add links to two related bugs about buildinfos.debian.net. [ ]
    • Add an extra sync to the database backup. [ ]
    • Overhaul description of what the service is about. [ ][ ][ ][ ][ ][ ]
    • Improve the documentation to indicate that need to fix syncronisation pipes. [ ][ ]
    • Improve the statistics page by breaking down output by architecture. [ ]
    • Add a copyright statement. [ ]
    • Add a space after the package name so one can search for specific packages more easily. [ ]
    • Add a script to work around/implement a missing feature of debrebuild. [ ]
  • Misc:
    • Run debian-repro-status at the end of the chroot-install tests. [ ][ ]
    • Document that we have unused diskspace at Ionos. [ ]
In addition:
  • James Addison made a number of changes to the reproduce.debian.net homepage. [ ][ ].
  • Jochen Sprickerhof updated the statistics generation to catch No space left on device issues. [ ]
  • Mattia Rizzolo added a better command to stop the builders [ ] and fixed the reStructuredText syntax in the README.infrastructure file. [ ]
And finally, node maintenance was performed by Holger Levsen [ ][ ][ ] and Mattia Rizzolo [ ][ ].

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

28 March 2025

John Goerzen: Why You Should (Still) Use Signal As Much As Possible

As I write this in March 2025, there is a lot of confusion about Signal messenger due to the recent news of people using Signal in government, and subsequent leaks. The short version is: there was no problem with Signal here. People were using it because they understood it to be secure, not the other way around. Both the government and the Electronic Frontier Foundation recommend people use Signal. This is an unusual alliance, and in the case of the government, was prompted because it understood other countries had a persistent attack against American telephone companies and SMS traffic. So let s dive in. I ll cover some basics of what security is, what happened in this situation, and why Signal is a good idea. This post isn t for programmers that work with cryptography every day. Rather, I hope it can make some of these concepts accessible to everyone else.

What makes communications secure? When most people are talking about secure communications, they mean some combination of these properties:
  1. Privacy - nobody except the intended recipient can decode a message.
  2. Authentication - guarantees that the person you are chatting with really is the intended recipient.
  3. Ephemerality - preventing a record of the communication from being stored. That is, making it more like a conversation around the table than a written email.
  4. Anonymity - keeping your set of contacts to yourself and even obfuscating the fact that communications are occurring.
If you think about it, most people care the most about the first two. In fact, authentication is a key part of privacy. There is an attack known as man in the middle in which somebody pretends to be the intended recipient. The interceptor reads the messages, and then passes them on to the real intended recipient. So we can t really have privacy without authentication. I ll have more to say about these later. For now, let s discuss attack scenarios.

What compromises security? There are a number of ways that security can be compromised. Let s think through some of them:

Communications infrastructure snooping Let s say you used no encryption at all, and connected to public WiFi in a coffee shop to send your message. Who all could potentially see it?
  • The owner of the coffee shop s WiFi
  • The coffee shop s Internet provider
  • The recipient s Internet provider
  • Any Internet providers along the network between the sender and the recipient
  • Any government or institution that can compel any of the above to hand over copies of the traffic
  • Any hackers that compromise any of the above systems
Back in the early days of the Internet, most traffic had no encryption. People were careful about putting their credit cards into webpages and emails because they knew it was easy to intercept them. We have been on a decades-long evolution towards more pervasive encryption, which is a good thing. Text messages (SMS) follow a similar path to the above scenario, and are unencrypted. We know that all of the above are ways people s texts can be compromised; for instance, governments can issue search warrants to obtain copies of texts, and China is believed to have a persistent hack into western telcos. SMS fails all four of our attributes of secure communication above (privacy, authentication, ephemerality, and anonymity). Also, think about what information is collected from SMS and by who. Texts you send could be retained in your phone, the recipient s phone, your phone company, their phone company, and so forth. They might also live in cloud backups of your devices. You only have control over your own phone s retention. So defenses against this involve things like:
  • Strong end-to-end encryption, so no intermediate party even the people that make the app can snoop on it.
  • Using strong authentication of your peers
  • Taking steps to prevent even app developers from being able to see your contact list or communication history
You may see some other apps saying they use strong encryption or use the Signal protocol. But while they may do that for some or all of your message content, they may still upload your contact list, history, location, etc. to a central location where it is still vulnerable to these kinds of attacks. When you think about anonymity, think about it like this: if you send a letter to a friend every week, every postal carrier that transports it even if they never open it or attempt to peak inside will be able to read the envelope and know that you communicate on a certain schedule with that friend. The same can be said of SMS, email, or most encrypted chat operators. Signal s design prevents it from retaining even this information, though nation-states or ISPs might still be able to notice patterns (every time you send something via Signal, your contact receives something from Signal a few milliseconds later). It is very difficult to provide perfect anonymity from well-funded adversaries, even if you can provide very good privacy.

Device compromise Let s say you use an app with strong end-to-end encryption. This takes away some of the easiest ways someone could get to your messages. But it doesn t take away all of them. What if somebody stole your phone? Perhaps the phone has a password, but if an attacker pulled out the storage unit, could they access your messages without a password? Or maybe they somehow trick or compel you into revealing your password. Now what? An even simpler attack doesn t require them to steal your device at all. All they need is a few minutes with it to steal your SIM card. Now they can receive any texts sent to your number - whether from your bank or your friend. Yikes, right? Signal stores your data in an encrypted form on your device. It can protect it in various ways. One of the most important protections is ephemerality - it can automatically delete your old texts. A text that is securely erased can never fall into the wrong hands if the device is compromised later. An actively-compromised phone, though, could still give up secrets. For instance, what if a malicious keyboard app sent every keypress to an adversary? Signal is only as secure as the phone it runs on but still, it protects against a wide variety of attacks.

Untrustworthy communication partner Perhaps you are sending sensitive information to a contact, but that person doesn t want to keep it in confidence. There is very little you can do about that technologically; with pretty much any tool out there, nothing stops them from taking a picture of your messages and handing the picture off.

Environmental compromise Perhaps your device is secure, but a hidden camera still captures what s on your screen. You can take some steps against things like this, of course.

Human error Sometimes humans make mistakes. For instance, the reason a reporter got copies of messages recently was because a participant in a group chat accidentally added him (presumably that participant meant to add someone else and just selected the wrong name). Phishing attacks can trick people into revealing passwords or other sensitive data. Humans are, quite often, the weakest link in the chain.

Protecting yourself So how can you protect yourself against these attacks? Let s consider:
  • Use a secure app like Signal that uses strong end-to-end encryption where even the provider can t access your messages
  • Keep your software and phone up-to-date
  • Be careful about phishing attacks and who you add to chat rooms
  • Be aware of your surroundings; don t send sensitive messages where people might be looking over your shoulder with their eyes or cameras
There are other methods besides Signal. For instance, you could install GnuPG (GPG) on a laptop that has no WiFi card or any other way to connect it to the Internet. You could always type your messages on that laptop, encrypt them, copy the encrypted text to a floppy disk (or USB device), take that USB drive to your Internet computer, and send the encrypted message by email or something. It would be exceptionally difficult to break the privacy of messages in that case (though anonymity would be mostly lost). Even if someone got the password to your secure laptop, it wouldn t do them any good unless they physically broke into your house or something. In some ways, it is probably safer than Signal. (For more on this, see my article How gapped is your air?) But, that approach is hard to use. Many people aren t familiar with GnuPG. You don t have the convenience of sending a quick text message from anywhere. Security that is hard to use most often simply isn t used. That is, you and your friends will probably just revert back to using insecure SMS instead of this GnuPG approach because SMS is so much easier. Signal strikes a unique balance of providing very good security while also being practical, easy, and useful. For most people, it is the most secure option available. Signal is also open source; you don t have to trust that it is as secure as it says, because you can inspect it for yourself. Also, while it s not federated, I previously addressed that.

Government use If you are a government, particularly one that is highly consequential to the world, you can imagine that you are a huge target. Other nations are likely spending billions of dollars to compromise your communications. Signal itself might be secure, but if some other government can add spyware to your phones, or conduct a successful phishing attack, you can still have your communications compromised. I have no direct knowledge, but I think it is generally understood that the US government maintains communications networks that are entirely separate from the Internet and can only be accessed from secure physical locations and secure rooms. These can be even more secure than the average person using Signal because they can protect against things like environmental compromise, human error, and so forth. The scandal in March of 2025 happened because government employees were using Signal rather than official government tools for sensitive information, had taken advantage of Signal s ephemerality (laws require records to be kept), and through apparent human error had directly shared this information with a reporter. Presumably a reporter would have lacked access to the restricted communications networks in the first place, so that wouldn t have been possible. This doesn t mean that Signal is bad. It just means that somebody that can spend billions of dollars on security can be more secure than you. Signal is still a great tool for people, and in many cases defeats even those that can spend lots of dollars trying to defeat it. And remember - to use those restricted networks, you have to go to specific rooms in specific buildings. They are still not as convenient as what you carry around in your pocket.

Conclusion Signal is practical security. Do you want phone companies reading your messages? How about Facebook or X? Have those companies demonstrated that they are completely trustworthy throughout their entire history? I say no. So, go install Signal. It s the best, most practical tool we have.
This post is also available on my website, where it may be periodically updated.

23 February 2025

Colin Watson: Qalculate time hacks

Anarcat recently wrote about Qalculate, and I think I m a convert, even though I ve only barely scratched the surface. The thing I almost immediately started using it for is time calculations. When I started tracking my time, I quickly found that Timewarrior was good at keeping all the data I needed, but I often found myself extracting bits of it and reprocessing it in variously clumsy ways. For example, I often don t finish a task in one sitting; maybe I take breaks, or I switch back and forth between a couple of different tasks. The raw output of timew summary is a bit clumsy for this, as it shows each chunk of time spent as a separate row:
$ timew summary 2025-02-18 Debian
Wk Date       Day Tags                            Start      End    Time   Total
W8 2025-02-18 Tue CVE-2025-26465, Debian,       9:41:44 10:24:17 0:42:33
                  next, openssh
                  Debian, FTBFS with GCC-15,   10:24:17 10:27:12 0:02:55
                  icoutils
                  Debian, FTBFS with GCC-15,   11:50:05 11:57:25 0:07:20
                  kali
                  Debian, Upgrade to 0.67,     11:58:21 12:12:41 0:14:20
                  python_holidays
                  Debian, FTBFS with GCC-15,   12:14:15 12:33:19 0:19:04
                  vigor
                  Debian, FTBFS with GCC-15,   12:39:02 12:39:38 0:00:36
                  python_setproctitle
                  Debian, Upgrade to 1.3.4,    12:39:39 12:46:05 0:06:26
                  python_setproctitle
                  Debian, FTBFS with GCC-15,   12:48:28 12:49:42 0:01:14
                  python_setproctitle
                  Debian, Upgrade to 3.4.1,    12:52:07 13:02:27 0:10:20 1:44:48
                  python_charset_normalizer
                                                                         1:44:48
So I wrote this Python program to help me:
#! /usr/bin/python3
"""
Summarize timewarrior data, grouped and sorted by time spent.
"""
import json
import subprocess
from argparse import ArgumentParser, RawDescriptionHelpFormatter
from collections import defaultdict
from datetime import datetime, timedelta, timezone
from operator import itemgetter
from rich import box, print
from rich.table import Table
parser = ArgumentParser(
    description=__doc__, formatter_class=RawDescriptionHelpFormatter
)
parser.add_argument("-t", "--only-total", default=False, action="store_true")
parser.add_argument(
    "range",
    nargs="?",
    default=":today",
    help="Time range (usually a hint, e.g. :lastweek)",
)
parser.add_argument("tag", nargs="*", help="Tags to filter by")
args = parser.parse_args()
entries: defaultdict[str, timedelta] = defaultdict(timedelta)
now = datetime.now(timezone.utc)
for entry in json.loads(
    subprocess.run(
        ["timew", "export", args.range, *args.tag],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
):
    start = datetime.fromisoformat(entry["start"])
    if "end" in entry:
        end = datetime.fromisoformat(entry["end"])
    else:
        end = now
    entries[", ".join(entry["tags"])] += end - start
if not args.only_total:
    table = Table(box=box.SIMPLE, highlight=True)
    table.add_column("Tags")
    table.add_column("Time", justify="right")
    for tags, time in sorted(entries.items(), key=itemgetter(1), reverse=True):
        table.add_row(tags, str(time))
    print(table)
total = sum(entries.values(), start=timedelta())
hours, rest = divmod(total, timedelta(hours=1))
minutes, rest = divmod(rest, timedelta(minutes=1))
seconds = rest.seconds
print(f"Total time:  hours:02 : minutes:02 : seconds:02 ")
$ summarize-time 2025-02-18 Debian
  Tags                                                     Time
  
  CVE-2025-26465, Debian, next, openssh                 0:42:33
  Debian, FTBFS with GCC-15, vigor                      0:19:04
  Debian, Upgrade to 0.67, python_holidays              0:14:20
  Debian, Upgrade to 3.4.1, python_charset_normalizer   0:10:20
  Debian, FTBFS with GCC-15, kali                       0:07:20
  Debian, Upgrade to 1.3.4, python_setproctitle         0:06:26
  Debian, FTBFS with GCC-15, icoutils                   0:02:55
  Debian, FTBFS with GCC-15, python_setproctitle        0:01:50
Total time: 01:44:48
Much nicer. But that only helps with some of my reporting. At the end of a month, I have to work out how much time to bill Freexian for and fill out a timesheet, and for various reasons those queries don t correspond to single timew tags: they sometimes correspond to the sum of all time spent on multiple tags, or to the time spent on one tag minus the time spent on another tag, or similar. As a result I quite often have to do basic arithmetic on time intervals; but that s surprisingly annoying! I didn t previously have good tools for that, and was reduced to doing things like str(timedelta(hours=..., minutes=..., seconds=...) + ...) in Python, which gets old fast. Instead:
$ qalc '62:46:30 - 51:02:42 to time'
(225990 / 3600)   (183762 / 3600) = 11:43:48
I also often want to work out how much of my time I ve spent on Debian work this month so far, since Freexian pays me for up to 20% of my work time on Debian; if I m under that then I might want to prioritize more Debian projects, and if I m over then I should be prioritizing more Freexian projects as otherwise I m not going to get paid for that time.
$ summarize-time -t :month Freexian
Total time: 69:19:42
$ summarize-time -t :month Debian
Total time: 24:05:30
$ qalc '24:05:30 / (24:05:30 + 69:19:42) to %'
(86730 / 3600) / ((86730 / 3600) + (249582 / 3600))   25.78855349%
I love it.

31 January 2025

Gunnar Wolf: ChatGPT is bullshit

This post is an unpublished review for ChatGPT is bullshit
As people around the world understand how LLMs behave, more and more people wonder as to why these models hallucinate, and what can be done about to reduce it. This provocatively named article by Michael Townsen Hicks, James Humphries and Joe Slater bring is an excellent primer to better understanding how LLMs work and what to expect from them. As humans carrying out our relations using our language as the main tool, we are easily at awe with the apparent ease with which ChatGPT (the first widely available, and to this day probably the best known, LLM-based automated chatbot) simulates human-like understanding and how it helps us to easily carry out even daunting data aggregation tasks. It is common that people ask ChatGPT for an answer and, if it gets part of the answer wrong, they justify it by stating that it s just a hallucination. Townsen et al. invite us to switch from that characterization to a more correct one: LLMs are bullshitting. This term is formally presented by Frankfurt [1]. To Bullshit is not the same as to lie, because lying requires to know (and want to cover) the truth. A bullshitter not necessarily knows the truth, they just have to provide a compelling description, regardless of what is really aligned with truth. After introducing Frankfurt s ideas, the authors explain the fundamental ideas behind LLM-based chatbots such as ChatGPT; a Generative Pre-trained Transformer (GPT) s have as their only goal to produce human-like text, and it is carried out mainly by presenting output that matches the input s high-dimensional abstract vector representation, and probabilistically outputs the next token (word) iteratively with the text produced so far. Clearly, a GPT s ask is not to seek truth or to convey useful information they are built to provide a normal-seeming response to the prompts provided by their user. Core data are not queried to find optimal solutions for the user s requests, but are generated on the requested topic, attempting to mimic the style of document set it was trained with. Erroneous data emitted by a LLM is, thus, not equiparable with what a person could hallucinate with, but appears because the model has no understanding of truth; in a way, this is very fitting with the current state of the world, a time often termed as the age of post-truth [2]. Requesting an LLM to provide truth in its answers is basically impossible, given the difference between intelligence and consciousness: Following Harari s definitions [3], LLM systems, or any AI-based system, can be seen as intelligent, as they have the ability to attain goals in various, flexible ways, but they cannot be seen as conscious, as they have no ability to experience subjectivity. This is, the LLM is, by definition, bullshitting its way towards an answer: their goal is to provide an answer, not to interpret the world in a trustworthy way. The authors close their article with a plea for literature on the topic to adopt the more correct bullshit term instead of the vacuous, anthropomorphizing hallucination . Of course, being the word already loaded with a negative meaning, it is an unlikely request. This is a great article that mixes together Computer Science and Philosophy, and can shed some light on a topic that is hard to grasp for many users. [1] Frankfurt, Harry (2005). On Bullshit. Princeton University Press. [2] Zoglauer, Thomas (2023). Constructed truths: truth and knowledge in a post-truth world. Springer. [3] Harari, Yuval Noah (2023. Nexus: A Brief History of Information Networks From the Stone Age to AI. Random House.

29 January 2025

Russ Allbery: Review: The Sky Road

Review: The Sky Road, by Ken MacLeod
Series: Fall Revolution #4
Publisher: Tor
Copyright: 1999
Printing: August 2001
ISBN: 0-8125-7759-0
Format: Mass market
Pages: 406
The Sky Road is the fourth book in the Fall Revolution series, but it represents an alternate future that diverges after (or during?) the events of The Sky Fraction. You probably want to read that book first, but I'm not sure reading The Stone Canal or The Cassini Division adds anything to this book other than frustration. Much more on that in a moment. Clovis colha Gree is a aspiring doctoral student in history with a summer job as a welder. He works on the platform for the project, which the reader either slowly discovers from the book or quickly discovers from the cover is a rocket to get to orbit. As the story opens, he meets (or, as he describes it) is targeted by a woman named Merrial, a tinker who works on the guidance system. The early chapters provide only a few hints about Clovis's world: a statue of the Deliverer on a horse that forms the backdrop of their meeting, the casual carrying of weapons, hints that tinkers are socially unacceptable, and some division between the white logic and the black logic in programming. Also, because this is a Ken MacLeod novel, everyone is obsessed with smoking and tobacco the way that the protagonists of erotica are obsessed with sex. Clovis's story is one thread of this novel. The other, told in the alternating chapters, is the story of Myra Godwin-Davidova, chair of the governing Council of People's Commissars of the International Scientific and Technical Workers' Republic, a micronation embedded in post-Soviet Kazakhstan. Series readers will remember Myra's former lover, David Reid, as the villain of The Stone Canal and the head of the corporation Mutual Protection, which is using slave labor (sort of) to support a resurgent space movement and its attempt to take control of a balkanized Earth. The ISTWR is in decline and a minor power by all standards except one: They still have nuclear weapons. So, first, we need to talk about the series divergence. I know from reading about this book on-line that The Sky Road is an alternate future that does not follow the events of The Stone Canal and The Cassini Division. I do not know this from the text of the book, which is completely silent about even being part of a series. More annoyingly, while the divergence in the Earth's future compared to The Cassini Division is obvious, I don't know what the Jonbar hinge is. Everything I can find on-line about this book is maddeningly coy. Wikipedia claims the divergence happens at the end of The Sky Fraction. Other reviews and the Wikipedia talk page claim it happens in the middle of The Stone Canal. I do have a guess, but it's an unsatisfying one and I'm not sure how to test its correctness. I suppose I shouldn't care and instead take each of the books on their own terms, but this is the type of thing that my brain obsesses over, and I find it intensely irritating that MacLeod didn't explain it in the books themselves. It's the sort of authorial trick that makes me feel dumb, and books that gratuitously make me feel dumb are less enjoyable to read. The second annoyance I have with this book is also only partly its fault. This series, and this book in particular, is frequently mentioned as good political science fiction that explores different ways of structuring human society. This was true of some of the earlier books in a surprisingly superficial way. Here, I would call it hogwash. This book, or at least the Myra portion of it, is full of people doing politics in a tactical sense, but like the previous books of this series, that politics is mostly embedded in personal grudges and prior romantic relationships. Everyone involved is essentially an authoritarian whose ability to act as they wish is only contested by other authoritarians and is largely unconstrained by such things as persuasion, discussions, elections, or even theory. Myra and most of the people she meets are profoundly cynical and almost contemptuous of any true discussion of political systems. This is the trappings and mechanisms of politics without the intellectual debate or attempt at consensus, turning it into a zero-sum game won by whoever can threaten the others more effectively. Given the glowing reviews I've seen in relatively political SF circles, presumably I am missing something that other people see in MacLeod's approach. Perhaps this level of pettiness and cynicism is an accurate depiction of what it's like inside left-wing political movements. (What an appalling condemnation of left-wing political movements, if so.) But many of the on-line reviews lead me to instead conclude that people's understanding of "political fiction" is stunted and superficial. For example, there is almost nothing Marxist about this book it contains essentially no economic or class analysis whatsoever but MacLeod uses a lot of Marxist terminology and sets half the book in an explicitly communist state, and this seems to be enough for large portions of the on-line commentariat to conclude that it's full of dangerous, radical ideas. I find this sadly hilarious given that MacLeod's societies tend, if anything, towards a low-grade libertarianism that would be at home in a Robert Heinlein novel. Apparently political labels are all that's needed to make political fiction; substance is optional. So much for the politics. What's left in Clovis's sections is a classic science fiction adventure in which the protagonist has a radically different perspective from the reader and the fun lies in figuring out the world-building through the skewed perspective of the characters. This was somewhat enjoyable, but would have been more fun if Clovis had any discernible personality. Sadly he instead seems to be an empty receptacle for the prejudices and perspective of his society, which involve a lot of quasi-religious taboos and an essentially magical view of the world. Merrial is a more interesting character, although as always in this series the romance made absolutely no sense to me and seemed to be conjured by authorial fiat and weirdly instant sexual attraction. Myra's portion of the story was the part I cared more about and was more invested in, aided by the fact that she's attempting to do something more interesting than launch a crewed space vehicle for no obvious reason. She at least faces some true moral challenges with no obviously correct response. It's all a bit depressing, though, and I found Myra's unwillingness to ground her decisions in a more comprehensive moral framework disappointing. If you're going to make a protagonist the ruler of a communist state, even an ironic one, I'd like to hear some real political philosophy, some theory of sociology and economics that she used to justify her decisions. The bits that rise above personal animosity and vibes were, I think, said better in The Cassini Division. This series was disappointing, and I can't say I'm glad to have read it. There is some small pleasure in finishing a set of award-winning genre books so that I can have a meaningful conversation about them, but the awards failed to find me better books to read than I would have found on my own. These aren't bad books, but the amount of enjoyment I got out of them didn't feel worth the frustration. Not recommended, I'm afraid. Rating: 6 out of 10

24 January 2025

Scarlett Gately Moore: KDE: Snaps bug fixes and Kubuntu: Noble updates

Fixed a major crash bug in our apps that use webengine, I also went ahead and updated these to core24 https://bugs.launchpad.net/snapd/+bug/2095418 andhttps://bugs.kde.org/show_bug.cgi?id=498663 Fixed okular
Can t import certificates to digitally sign in Okular https://bugs.kde.org/show_bug.cgi?id=498558 Can t open files https://bugs.kde.org/show_bug.cgi?id=421987 and https://bugs.kde.org/show_bug.cgi?id=415711 Skanpage won t launch https://bugs.kde.org/show_bug.cgi?id=493847 in edge please help test. Ghostwriter https://bugs.kde.org/show_bug.cgi?id=481258
Kalm - Breathing techniques
New KDE Snaps! Kalm Breathing techniques
Telly-skout Display TV guides Kubuntu: Plasma 5.27.12 has been uploaded to archive proposed and should make the .2 release! I hate asking but I am unemployable with this broken arm fiasco. If you could spare anything it would be appreciated! https://gofund.me/573cc38e

9 January 2025

Scarlett Gately Moore: KDE: Snaps 24.12.1 Release, Kubuntu Plasma 5.27.12 Call for testers

I have released more core24 snaps to edge for your testing pleasure. If you find any bugs please report them at bugs.kde.org and assign them to me. Thanks!
Kdenlive our amazing video editor!
Haruna is a video player that also supports youtube!
Kdevelop is our feature rich development IDE KDE applications 24.12.1 release https://kde.org/announcements/gear/24.12.1/ New qt6 ports
Kubuntu: We have Plasma 5.27.12 Bugfix release in staging https://launchpad.net/~kubuntu-ppa/+archive/ubuntu/staging-plasma for noble updates, please test! Do NOT do this on a production system. Thanks! I hate asking but I am unemployable with this broken arm fiasco and 6 hours a day hospital runs for treatment. If you could spare anything it would be appreciated! https://gofund.me/573cc38e

Next.