
19 February 2025

Thomas Lange: The secret maze of Debian images

TL;DR: It's difficult to find the right Debian image. We have thousands of ISO files and cloud images, we support multiple CPU architectures, and there are several download methods. The directory structure of our main image server is like a maze, and our web pages for downloading are also confusing.

The Debian maze Did you ever search for a specific Debian image that was not the default netinst ISO for amd64? How long did it take to find it? Debian is very good at hiding its images by offering a huge number of different versions and variants of images, multiple ways to download them, and multiple web pages for downloading. This is the secret Debian maze of images. It's currently filled with 8,700+ different ISO images and another 34,000+ files (raw and qcow2) for the cloud images. The main URL for the server hosting all Debian images is https://cdimage.debian.org/cdimage/ There you will find installer images, live images, and cloud images. Let's try to find the right image you need. We have three different types of images.

Images for the stable release Almost always, you are probably looking for the image to install the latest stable release. The URL https://cdimage.debian.org/cdimage/release/ shows:
12.9.0
12.9.0-live
current
current-live
but you cannot see that two are symlinks:
current -> 12.9.0/
current-live -> 12.9.0-live/
Here you will find the installer images and live images for the stable release (currently Debian 12, bookworm). If you choose https://cdimage.debian.org/cdimage/release/12.9.0/ you will see a list of CPU architectures:
amd64
arm64
armel
armhf
i386
mips64el
mipsel
ppc64el
s390x
source
trace
(BTW, source and trace are not CPU architectures.) The typical end user will not care about most architectures, because your computer will almost always need images from the amd64 folder. Maybe you have heard that your computer has a 64-bit CPU; even if you have an Intel processor, we call this architecture amd64. Let's see what's in the folder amd64:
bt-bd
bt-cd
bt-dvd
iso-bd
iso-cd
iso-dvd
jigdo-16G
jigdo-bd
jigdo-cd
jigdo-dlbd
jigdo-dvd
list-16G
list-bd
list-cd
list-dlbd
list-dvd
Wow. This is confusing, and there's no description of what all those folders mean. The first three prefixes (bt, iso, jigdo) are different methods of downloading an image. Use iso when a single network connection will be fast enough for you. Using bt can result in a faster download, because it downloads via a peer-to-peer file-sharing protocol; you need an additional torrent program for downloading. Then we have the variants (cd, dvd, bd, dlbd, 16G); the 16G and dlbd images are only available via jigdo. All iso-xx and bt-xx folders provide the same images but with a different access method. Here are examples of images:
  iso-cd/debian-12.9.0-amd64-netinst.iso
  iso-cd/debian-edu-12.9.0-amd64-netinst.iso
  iso-cd/debian-mac-12.9.0-amd64-netinst.iso
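If you do want to use the SHA... checksum files mentioned below, here is a minimal, hedged sketch: it assumes the checksum file is named SHA512SUMS (as in the current release folders) and uses /dev/sdX as a placeholder for your USB stick, so double-check the device name before running the last command.
URL=https://cdimage.debian.org/cdimage/release/current/amd64/iso-cd
wget "$URL/debian-12.9.0-amd64-netinst.iso" "$URL/SHA512SUMS"
sha512sum --check --ignore-missing SHA512SUMS
sudo dd if=debian-12.9.0-amd64-netinst.iso of=/dev/sdX bs=4M status=progress oflag=sync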
Fortunately the folder explains in detail the differences between these images and what else you will find there. You can ignore the SHA... files if you do not know what they are needed for; they are not important for you. These ISO files are small and contain only the core Debian installer code and a small set of programs. If you install a desktop environment, the other packages will be downloaded at the end of the installation. The folders bt-dvd and iso-dvd only contain debian-12.9.0-amd64-DVD-1.iso or the appropriate torrent file. In bt-bd and iso-bd you will only find debian-edu-12.9.0-amd64-BD-1.iso. These large images contain many more Debian packages, so you will not need a network connection during the installation. For the other CPU architectures (other than amd64) Debian provides fewer variants of images, but still a lot. In total, we have 44 ISO files (or torrents) for the current release of the Debian installer across all architectures. When using jigdo you can choose between 268 images. And these are only the installer images for the stable release; no older or newer versions are counted here. Take a breath before we dive into... the live images.

The live images in release/12.9.0-live/amd64/iso-hybrid/ are only available for the amd64 architecture, but for newer Debian releases there will also be images for arm64. We have 7 different live images, each containing one of the most common desktop environments, plus one with only a text interface (standard).
debian-live-12.9.0-amd64-xfce.iso
debian-live-12.9.0-amd64-mate.iso
debian-live-12.9.0-amd64-lxqt.iso
debian-live-12.9.0-amd64-gnome.iso
debian-live-12.9.0-amd64-lxde.iso
debian-live-12.9.0-amd64-standard.iso
debian-live-12.9.0-amd64-cinnamon.iso
debian-live-12.9.0-amd64-kde.iso
The folder name iso-hybrid refers to the technology that lets you burn those ISO files onto a CD/DVD/BD or write the same ISO file to a USB stick. bt-hybrid will give you the torrent files for downloading the same images using a torrent client program.

More recent installer and live images (aka testing) For newer versions of the images we currently have these folders:
daily-builds
weekly-builds
weekly-live-builds
trixie_di_alpha1
I suggest using the weekly-builds, because in this folder you find a structure and set of image variants similar to the release directory, for example weekly-builds/amd64/iso-cd/debian-testing-amd64-netinst.iso, and similar for the live images:
weekly-live-builds/amd64/iso-hybrid/debian-live-testing-amd64-kde.iso
weekly-live-builds/amd64/iso-hybrid/debian-live-testing-amd64-lxde.iso
weekly-live-builds/amd64/iso-hybrid/debian-live-testing-amd64-debian-junior.iso
weekly-live-builds/amd64/iso-hybrid/debian-live-testing-amd64-standard.iso
weekly-live-builds/amd64/iso-hybrid/debian-live-testing-amd64-lxqt.iso
weekly-live-builds/amd64/iso-hybrid/debian-live-testing-amd64-mate.iso
weekly-live-builds/amd64/iso-hybrid/debian-live-testing-amd64-xfce.iso
weekly-live-builds/amd64/iso-hybrid/debian-live-testing-amd64-gnome.iso
weekly-live-builds/amd64/iso-hybrid/debian-live-testing-amd64-cinnamon.iso
weekly-live-builds/arm64/iso-hybrid/debian-live-testing-arm64-gnome.iso
Here you see a new variant called debian-junior, which is a Debian blend. BitTorrent files are not available for weekly builds. The daily-builds folder structure is different and provides only the small network-install (netinst) ISOs, but in several versions from the last few days. Currently we have 55 ISO files available there. If you want to use the newest installation image, fetch this one: https://cdimage.debian.org/cdimage/daily-builds/sid_d-i/arch-latest/amd64/iso-cd/debian-testing-amd64-netinst.iso

Debian stable with a backports kernel Unfortunately, Debian does not provide any installation media using the stable release but including a backports kernel for newer hardware. This is because our installer environment is a very complex mix of special tools (like anna) and special .udeb versions of packages. But the FAIme web service of my FAI project can build a custom installation image using the backports kernel. Choose a desktop environment and a language, and add some package names if you like. Then select Debian 12 bookworm and enable the backports repository including the newer kernel. After a short time you can download your own installation image.

Older releases Usually you should not use older releases for a new installation. In our archive, the folder https://cdimage.debian.org/cdimage/archive/ contains 6163 ISO files, starting from Debian 3.0 (first released in 2002) and including every point release. The full DVD image for the oldstable release (Debian 11.11.0, including non-free firmware) is https://cdimage.debian.org/cdimage/unofficial/non-free/cd-including-firmware/archive/latest-oldstable/amd64/iso-dvd/firmware-11.11.0-amd64-DVD-1.iso and the smaller netinst image is https://cdimage.debian.org/cdimage/archive/11.10.0/amd64/iso-cd/debian-11.10.0-amd64-netinst.iso. The oldest ISO I could find is from 1999, using kernel 2.0.36. At first I didn't manage to boot it in KVM. UPDATE: I got a kernel panic because the VM had 4GB RAM. Reducing this to 500MB RAM (8MB also works) started the installer of Debian 2.1 without any problems.

Anything else? In this post, we still did not cover the ports folder (the not officially supported (older) hardware architectures), which contains around 760 ISO files, or the unofficial folder (1445 ISO files), which in the past also provided the ISOs that included the non-free firmware blobs. Then there are more than 34,000 cloud images. But hey, no ISO files are involved there. That may be part of a completely new post.

9 February 2025

Antoine Beaupré: A slow blogging year

Well, 2024 will be remembered, won't it? I guess 2025 already wants to make its mark too, but let's not worry about that right now, and instead let's talk about me. A little over a year ago, I was gloating over how I had such a great blogging year in 2022, and was considering 2023 to be average, then went on to gather more stats and traffic analysis... Then I said, and I quote:
I hope to write more next year. I've been thinking about a few posts I could write for work, about how things work behind the scenes at Tor, that could be informative for many people. We run a rather old setup, but things hold up pretty well for what we throw at it, and it's worth sharing that with the world...
What a load of bollocks.

A bad year for this blog 2024 was the second worst year ever in my blogging history, tied with 2009 at a measly 6 posts for the year:
anarcat@angela:anarc.at$ curl -sSL https://anarc.at/blog/ | grep 'href="\./' | grep -o 20[0-9][0-9] | sort | uniq -c | sort -nr | grep -v 2025 | tail -3
      6 2024
      6 2009
      3 2014
I did write about my work though, detailing the migration from Gitolite to GitLab we completed that year. But after August, total radio silence until now.

Loads of drafts It's not that I have nothing to say: I have no less than five drafts in my working tree here, not counting three actual drafts recorded in the Git repository here:
anarcat@angela:anarc.at$ git s blog
## main...origin/main
?? blog/bell-bot.md
?? blog/fish.md
?? blog/kensington.md
?? blog/nixos.md
?? blog/tmux.md
anarcat@angela:anarc.at$ git grep -l '\!tag draft'
blog/mobile-massive-gallery.md
blog/on-dying.mdwn
blog/secrets-recovery.md
I just don't have time to wrap those things up. I think part of me is disgusted by seeing my work stolen by large corporations to build proprietary large language models while my idols have been pushed to suicide for trying to share science with the world. Another part of me wants to make those things just right. The "tagged drafts" above are nothing more than a huge pile of chaotic links, far from being useful to anyone other than me, and even then. The on-dying article, in particular, is becoming my nemesis. I've been wanting to write that article for over 6 years now, I think. It's just too hard.

Writing elsewhere There's also the fact that I write for work already. A lot. Here are the top-10 contributors to our team's wiki:
anarcat@angela:help.torproject.org$ git shortlog --numbered --summary --group="format:%al" | head -10
  4272  anarcat
   423  jerome
   117  zen
   116  lelutin
   104  peter
    58  kez
    45  irl
    43  hiro
    18  gaba
    17  groente
... but that's a bit unfair, since I've been there half a decade. Here's the last year:
anarcat@angela:help.torproject.org$ git shortlog --since=2024-01-01 --numbered --summary --group="format:%al" | head -10
   827  anarcat
   117  zen
   116  lelutin
    91  jerome
    17  groente
    10  gaba
     8  micah
     7  kez
     5  jnewsome
     4  stephen.swift
So I still write the most commits! But to truly get a sense of the amount I wrote in there, we should count actual changes. Here it is by number of lines (from commandlinefu.com):
anarcat@angela:help.torproject.org$ git ls-files | xargs -n1 git blame --line-porcelain | sed -n 's/^author //p' | sort -f | uniq -ic | sort -nr | head -10
  99046 Antoine Beaupré
   6900 Zen Fu
   4784 Jérôme Charaoui
   1446 Gabriel Filion
   1146 Jerome Charaoui
    837 groente
    705 kez
    569 Gaba
    381 Matt Traudt
    237 Stephen Swift
That, of course, is the entire history of the git repo, again. We should take only the last year into account, and probably ignore the tails directory, as sneaky Zen Fu imported the entire docs from another wiki there...
anarcat@angela:help.torproject.org$ find [d-s]* -type f -mtime -365 | xargs -n1 git blame --line-porcelain 2>/dev/null | sed -n 's/^author //p' | sort -f | uniq -ic | sort -nr | head -10
  75037 Antoine Beaupré
   2932 Jérôme Charaoui
   1442 Gabriel Filion
   1400 Zen Fu
    929 Jerome Charaoui
    837 groente
    702 kez
    569 Gaba
    381 Matt Traudt
    237 Stephen Swift
Pretty good! 75k lines. But those are the files that were modified in the last year. If we go a little more nuts, we find that:
anarcat@angela:help.torproject.org$ git-count-words-range.py | sort -k6 -nr | head -10
parsing commits for words changes from command: git log '--since=1 year ago' '--format=%H %al'
anarcat 126116 - 36932 = 89184
zen 31774 - 5749 = 26025
groente 9732 - 607 = 9125
lelutin 10768 - 2578 = 8190
jerome 6236 - 2586 = 3650
gaba 3164 - 491 = 2673
stephen.swift 2443 - 673 = 1770
kez 1034 - 74 = 960
micah 772 - 250 = 522
weasel 410 - 0 = 410
I wrote 126,116 words in that wiki, only in the last year. I also deleted 37k words, so the final total is more like 89k words, but still: that's about forty (40!) articles of the average size (~2k) I wrote in 2022. (And yes, I did go nuts and write a new log parser, essentially from scratch, to figure out those word diffs. I only got the courage after asking GPT-4o for an example first, I must admit.) Let's celebrate that again: I wrote 90 thousand words in that wiki in 2024. According to Wikipedia, a "novella" is 17,500 to 40,000 words, which would mean I wrote about a novella and a novel in the past year. But interestingly, if I look at the repository analytics, I certainly didn't write that much more in the past year. So that alone cannot explain the lull in my production here.
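(For the curious: a rough, back-of-the-envelope version of that per-author word diff can be done with plain git and coreutils. This is only a sketch, not the actual git-count-words-range.py, and matching authors with --author is an approximation:)
for author in anarcat zen groente lelutin jerome; do
    added=$(git log --since='1 year ago' --author="$author" -p | grep '^+[^+]' | wc -w)
    removed=$(git log --since='1 year ago' --author="$author" -p | grep '^-[^-]' | wc -w)
    echo "$author: $added - $removed = $((added - removed))"
done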

Arguments Another part of me is just tired of the bickering and arguing on the internet. I have at least two articles in there that I suspect are going to get me a lot of push-back (NixOS and Fish). I know how to deal with this: you need to write well, consider the controversy, spell it out, and defuse things before they happen. But that's hard work and, frankly, I don't really care that much about what people think anymore. I'm not writing here to convince people. I stopped evangelizing a long time ago. Now, I'm more into documenting, and teaching. And, while teaching, there's a two-way interaction: when you give a speech or workshop, people can ask questions, or respond, and you all learn something. When you document, you quickly get told "where is this? I couldn't find it" or "I don't understand this" or "I tried that and it didn't work" or "wait, really? shouldn't we do X instead", and you learn. Here, it's static. It's my little soapbox where I scream in the void. The only thing people can do is scream back.

Collaboration So. Let's see if we can work together here. If you don't like something I say, disagree, or find something wrong or to be improved, instead of screaming on social media or ignoring me, try contributing back. This site here is backed by a git repository and I promise to read everything you send there, whether it is an issue or a merge request. I will, of course, still read comments sent by email or IRC or social media, but please, be kind. You can also, of course, follow the latest changes on the TPA wiki. If you want to catch up with the last year, some of the "novellas" I wrote include: (Well, no, you can't actually follow changes on a GitLab wiki. But we have a wiki-replica git repository where you can see the latest commits, and subscribe to the RSS feed.) See you there!

31 January 2025

Divine Attah-Ohiemi: Seeking Opportunities: Building a Career in Software Engineering and Beyond

My journey in CS has always been driven by curiosity, determination, and a deep love for understanding software solutions at their tiniest, most complex levels. Taking the ALX Africa Software Engineering track after high school was where it all started for me. During the 1-year intensive bootcamp, I delved into the intricacies of Linux programming and low-level programming with C, which solidified my foundational knowledge. This experience not only enhanced my technical skills but also taught me the importance of adaptability and self-directed learning. I discovered how to approach challenges with curiosity, igniting a passion for exploring software solutions in their most intricate forms. Each module pushed me to think critically and creatively, transforming my understanding of technology and its capabilities. Let's just say that I have always been drawn to asking, "How does this happen?" And I just go on and on until I find an answer eventually, and sometimes I don't, but that's okay. That curiosity, combined with a deep commitment to learning, has guided my journey.

Debian Webmaster My drive has led me to get involved in open-source contributions, where I can put my knowledge to the test while helping my community. Engaging with real-world experts and learning from my mistakes has been invaluable. One of the highlights of this journey was joining the Debian Webmasters team as an intern through Outreachy. Here, I have the honor of working on redesigning and migrating the old Debian webpages to make them more user-friendly. This experience not only allows me to apply my skills in a practical setting but also deepens my understanding of collaborative software development.

Building My Skills: The Foundation of My Experience Throughout my academic and professional journey, I have taken on many roles that have shaped my skills and prepared me for what's ahead, I believe. I am definitely not a one-trick pony, and maybe not completely a jack of all trades either, but I am a bit diverse, I'd like to think. Here are the key roles that have defined my journey so far:

Volunteer Developer at Yoris Africa (June 2022 - August 2023) I began my career by volunteering at Yoris, where I collaborated with a talented team to design and build the frontend for a mobile app. My contributions extended beyond just the frontend; I also worked on backend solutions and microservices, gaining hands-on experience in full-stack development. This role was instrumental in shaping my understanding of software architecture, allowing me to contribute meaningfully to projects while learning from experienced developers in a dynamic environment.

Freelance Academic Software Developer (September 2023 - October 2024) I freelanced as an academic software developer, where I pitched and developed software solutions for universities in my community. One of my most notable projects was creating a Computer-Based Testing (CBT) software for a medical school, which featured a unique questionnaire and scoring system tailored to their specific needs. This experience not only allowed me to apply my technical skills in a real-world setting but also deepened my understanding of educational software requirements and user experience, ultimately enhancing the learning process for students.

Open Source Intern at Debian Webmaster Team (November 2024 -) Perhaps the most transformative experience has been my role as an intern at Debian Webmasters. This opportunity allowed me to delve into the fascinating world of open source.
As an intern, I have the chance to work on a project where we are redesigning and migrating the Debian webpages to utilize a new and faster technology: Go templates with Hugo. For a detailed look at the work and progress I made during my internship, as well as information on this project and how to get involved, you can check out the wiki. My ultimate goal with this role is to build a vibrant community for Debian in Africa and, if given the chance, to host a debian-cd mirror for faster installations in my region. You can connect with me through LinkedIn, or X (formerly Twitter), or reach out via email.

20 January 2025

Divine Attah-Ohiemi: Progress Report: First Half of My Outreachy Internship

Hello everyone! I'm excited to share a progress report on my Outreachy internship with the Debian community. As I reach the halfway point of this journey, I want to reflect on what I've accomplished so far and outline my modified goals for the second half of the internship. In truth, there wasn't a strict timeline for my project (migrating Debian webpage content to Hugo), because the original repository contained thousands of pages. The initial goal was to develop a proof of concept for the migration. Thanks to our daily standups, where we brainstorm and revise contributions, we've made significant progress. The wiki documentation discussing the technical decisions taken to meet these goals is currently in progress here. During the first half of my internship, I have improved and refined my skills in several areas. I learned new Markdown syntaxes, studied and utilized Apache's mod_rewrite, and halfway studied GNU Make to use Perl scripts for processing data for dynamic content. I recommend Managing Projects with GNU Make by Robert Mecklenburg; it's a great book for beginners! While I didn't get stuck on any particular goal, the most challenging aspect was adding Hugo aliases to help with Apache's multilingual content negotiation. The way the webwml repository generates multilingual content differs from debianhugo. For instance, in webwml, the structure looks like this: english/index.wml -> /index.en.html (with a symlink from index.html to index.en.html) and french/index.wml -> /index.fr.html. In contrast, debianhugo uses en/_index.md -> /index.html and fr/_index.md -> /fr/index.html. Apache's multilingual content negotiation checks for index.<user preferred lang code>.html in the current directory, which works well with webwml since all related translations are generated in the same directory. However, with debianhugo using subdirectories for languages other than English, we had to set up aliases in the front matter so that a page is also generated for every other language. For example, in fr/_index.md, we added this to the front matter:
...
aliases:
  - /index.fr.html
...
This setup allows Hugo to generate multilingual HTML files in the initial home directory solely for the purpose of setting up a 301 redirect to the same page in the language subdirectory. However, if the client sets their preferred language to English, Apache content negotiation tries to find /index.en.html. If it doesn't find it, it defaults to any other language-suffixed file, which can lead to unexpected behavior. For example, if English is set as the preferred language, accessing the site may serve /index.fr.html, which then redirects to /fr/index.html. This was a significant challenge, and you can see a demo of this hosted here. If I were to start the project over, I would document every decision as I make it in the wiki, no matter how rough the documentation turns out. Waiting until the midpoint of the project to document was not a good idea. As I move into the second half of my internship, the goals we've set include improving our project wiki documentation and continuing the migration process while enhancing the user experience of complicated sections. I'm looking forward to making even more progress and sharing my journey with you all. Happy coding!

15 January 2025

Dirk Eddelbuettel: RcppFastFloat 0.0.5 on CRAN: New Upstream, Updates

A new minor release of RcppFastFloat just arrived on CRAN. The package wraps fast_float, another nice library by Daniel Lemire. For details, see the arXiv preprint or published paper showing that one can convert character representations of numbers into floating point at rates at or exceeding one gigabyte per second. This release updates the underlying fast_float library version to the current version 7.0.0, and updates a few packaging aspects.

Changes in version 0.0.5 (2025-01-15)
  • No longer set a compilation standard
  • Updates to continuous integration, badges, URLs, DESCRIPTION
  • Update to fast_float 7.0.0
  • Per CRAN Policy comment-out compiler 'diagnostic ignore' instances

Courtesy of my CRANberries, there is also a diffstat report for this release. For questions, suggestions, or issues please use the issue tracker at the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

7 January 2025

Jonathan Wiltshire: Using TPM for Automatic Disk Decryption in Debian 12

These days it's straightforward to have reasonably secure, automatic decryption of your root filesystem at boot time on Debian 12. Here's how I did it on an existing system which already had a stock kernel, secure boot enabled, grub2 and an encrypted root filesystem with the passphrase in key slot 0. There's no need to switch to systemd-boot for this setup but you will use systemd-cryptenroll to manage the TPM-sealed key. If that offends you, there are other ways of doing this.

Caveat The parameters I'll seal a key against in the TPM include a hash of the initial ramdisk. This is essential to prevent an attacker from swapping the image for one which discloses the key. However, it also means the key has to be re-sealed every time the image is rebuilt. This can be frequent, for example when installing/upgrading/removing packages which include a kernel module. You won't get locked out (as long as you still have a passphrase in another slot), but will need to re-seal the key to restore the automation. You can also choose not to include this parameter for the seal, but that opens the door to such an attack.

Caution: these are the steps I took on my own system. You may need to adjust them to avoid ending up with a non-booting system.

Check for a usable TPM device We'll bind the secure boot state, kernel parameters, and other boot measurements to a decryption key. Then, we'll seal it using the TPM. This prevents the disk from being moved to another system, the boot chain from being tampered with, and various other attacks.
# apt install tpm2-tools
# systemd-cryptenroll --tpm2-device list
PATH        DEVICE     DRIVER 
/dev/tpmrm0 STM0125:00 tpm_tis

Clean up older kernels including leftover configurations I found that previously-removed (but not purged) kernel packages sometimes cause dracut to try installing files to the wrong paths. Identify them with:
# apt install aptitude
# aptitude search '~c'
Change search to purge, or be more selective; this part is an exercise for the reader.

Switch to dracut for initramfs images Unless you have a particular requirement for the default initramfs-tools, replace it with dracut and customise:
# mkdir /etc/dracut.conf.d
# echo 'add_dracutmodules+=" tpm2-tss crypt "' > /etc/dracut.conf.d/crypt.conf
# apt install dracut

Remove root device from crypttab, configure grub Remove (or comment) the root device from /etc/crypttab and rebuild the initial ramdisk with dracut -f. Edit /etc/default/grub and add rd.auto rd.luks=1 to GRUB_CMDLINE_LINUX. Re-generate the config with update-grub. At this point it's a good idea to sanity-check the initrd contents with lsinitrd. Then, reboot using the new image to ensure there are no issues. This will also have up-to-date TPM measurements ready for the next step.
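For that lsinitrd sanity check, something like this is a reasonable sketch, grepping for the dracut modules configured above:
# lsinitrd | grep -E 'tpm2-tss|crypt'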

Identify device and seal a decryption key
# lsblk -ip -o NAME,TYPE,MOUNTPOINTS
NAME                                                    TYPE  MOUNTPOINTS
/dev/nvme0n1p4                                          part  /boot
/dev/nvme0n1p5                                          part  
`-/dev/mapper/luks-deff56a9-8f00-4337-b34a-0dcda772e326 crypt 
  |-/dev/mapper/lv-var                                  lvm   /var
  |-/dev/mapper/lv-root                                 lvm   /
  `-/dev/mapper/lv-home                                 lvm   /home
In this example my root filesystem is in a container on /dev/nvme0n1p5. The existing passphrase key is in slot 0.
# systemd-cryptenroll --tpm2-device=auto --tpm2-pcrs=7+8+9+14 /dev/nvme0n1p5
Please enter current passphrase for disk /dev/nvme0n1p5: ********
New TPM2 token enrolled as key slot 1.
The PCRs I chose (7, 8, 9 and 14) correspond to the secure boot policy, kernel command line (to prevent init=/bin/bash-style attacks), files read by grub including that crucial initrd measurement, and secure boot MOK certificates and hashes. You could also include PCR 5 for the partition table state, and any others appropriate for your setup.
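When a rebuilt initrd later changes those measurements (see the caveat above), one way to restore the automation is to reboot on the new image (unlocking with your passphrase once), then wipe the stale TPM2 slot and enroll a fresh one in a single call. A sketch, reusing the device and PCRs from above:
# systemd-cryptenroll --wipe-slot=tpm2 --tpm2-device=auto --tpm2-pcrs=7+8+9+14 /dev/nvme0n1p5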

Reboot You should now be able to reboot and the root device will be unlocked automatically, provided the secure boot measurements remain consistent. The key slot protected by a passphrase (mine is slot 0) is now your recovery key. Do not remove it!
Please consider supporting my work in Debian and elsewhere through Liberapay.

31 December 2024

Scarlett Gately Moore: KDE: Application snaps 24.12.0 release and more

https://kde.org/announcements/gear/24.12.0 I hope everyone had a wonderful holiday! Your present from me is shiny new application snaps! There are several new Qt6 ports in this release. Please visit https://snapcraft.io/store?q=kde I have also fixed the Krita snap's unable-to-open/save bug. Please test edge! I am continuing work on core24 support and hope to be done before the next release. I do look forward to 2025! Begone 2024! If you can help with gas, I still have 3 weeks of treatments to go. Thank you for your continued support. https://gofund.me/573cc38e
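If you would like to help test that fix from the edge channel and already have the Krita snap installed, switching over is roughly (a sketch; you can move back with --stable afterwards):
$ sudo snap refresh krita --edge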

12 December 2024

Matthew Garrett: When should we require that firmware be free?

The distinction between hardware and software has historically been relatively easy to understand - hardware is the physical object that software runs on. This is made more complicated by the existence of programmable logic like FPGAs, but by and large things tend to fall into fairly neat categories if we're drawing that distinction.

Conversations usually become more complicated when we introduce firmware, but should they? According to Wikipedia, "firmware is software that provides low-level control of computing device hardware", and basically anything that's generally described as firmware certainly fits into the "software" side of the above hardware/software binary. From a software freedom perspective, this seems like something where the obvious answer to "Should this be free" is "yes", but it's worth thinking about why the answer is yes - the goal of free software isn't freedom for freedom's sake, but because the freedoms embodied in the Free Software Definition (and by proxy the DFSG) are grounded in real world practicalities.

How do these line up for firmware? Firmware can fit into two main classes - it can be something that's responsible for initialisation of the hardware (such as, historically, BIOS, which is involved in initialisation and boot and then largely irrelevant for runtime[1]) or it can be something that makes the hardware work at runtime (wifi card firmware being an obvious example). The role of free software in the latter case feels fairly intuitive, since the interface and functionality the hardware offers to the operating system is frequently largely defined by the firmware running on it. Your wifi chipset is, these days, largely a software defined radio, and what you can do with it is determined by what the firmware it's running allows you to do. Sometimes those restrictions may be required by law, but other times they're simply because the people writing the firmware aren't interested in supporting a feature - they may see no reason to allow raw radio packets to be provided to the OS, for instance. We also shouldn't ignore the fact that sufficiently complicated firmware exposed to untrusted input (as is the case in most wifi scenarios) may contain exploitable vulnerabilities allowing attackers to gain arbitrary code execution on the wifi chipset - and potentially use that as a way to gain control of the host OS (see this writeup for an example). Vendors being in a unique position to update that firmware means users may never receive security updates, leaving them with a choice between discarding hardware that otherwise works perfectly or leaving themselves vulnerable to known security issues.

But even the cases where firmware does nothing other than initialise the hardware cause problems. A lot of hardware has functionality controlled by registers that can be locked during the boot process. Vendor firmware may choose to disable (or, rather, never to enable) functionality that may be beneficial to a user, and then lock out the ability to reconfigure the hardware later. Without any ability to modify that firmware, the user lacks the freedom to choose what functionality their hardware makes available to them. Again, the ability to inspect this firmware and modify it has a distinct benefit to the user.

So, from a practical perspective, I think there's a strong argument that users would benefit from most (if not all) firmware being free software, and I don't think that's an especially controversial argument. So I think this is less of a philosophical discussion, and more of a strategic one - is spending time focused on ensuring firmware is free worthwhile, and if so what's an appropriate way of achieving this?

I think there are two consistent ways to view this. One is to view free firmware as desirable but not necessary. This approach basically argues that code that's running on hardware that isn't the main CPU would benefit from being free, in the same way that code running on a remote network service would benefit from being free, but that this is much less important than ensuring that all the code running in the context of the OS on the primary CPU is free. The other is the maximalist position: not to compromise at all - all software on a system, whether it's running at boot or during runtime, and whether it's running on the primary CPU or any other component on the board, should be free.

Personally, I lean towards the former and think there's a reasonably coherent argument here. I think users would benefit from the ability to modify the code running on hardware that their OS talks to, in the same way that I think users would benefit from the ability to modify the code running on hardware the other side of a network link that their browser talks to. I also think that there's enough that remains to be done in terms of what's running on the host CPU that it's not worth having that fight yet. But I think the latter is absolutely intellectually consistent, and while I don't agree with it from a pragmatic perspective I think things would undeniably be better if we lived in that world.

This feels like a thing you'd expect the Free Software Foundation to have opinions on, and it does! There are two primarily relevant things - the Respects your Freedoms campaign focused on ensuring that certified hardware meets certain requirements (including around firmware), and the Free System Distribution Guidelines, which define a baseline for an OS to be considered free by the FSF (including requirements around firmware).

RYF requires that all software on a piece of hardware be free other than under one specific set of circumstances. If software runs on (a) a secondary processor and (b) within which software installation is not intended after the user obtains the product, then the software does not need to be free. (b) effectively means that the firmware has to be in ROM, since any runtime interface that allows the firmware to be loaded or updated is intended to allow software installation after the user obtains the product.

The Free System Distribution Guidelines require that all non-free firmware be removed from the OS before it can be considered free. The recommended mechanism to achieve this is via linux-libre, a project that produces tooling to remove anything that looks plausibly like a non-free firmware blob from the Linux source code, along with any incitement to the user to load firmware - including even removing suggestions to update CPU microcode in order to mitigate CPU vulnerabilities.

For hardware that requires non-free firmware to be loaded at runtime in order to work, linux-libre doesn't do anything to work around this - the hardware will simply not work. In this respect, linux-libre reduces the amount of non-free firmware running on a system in the same way that removing the hardware would. This presumably encourages users to purchase RYF compliant hardware.

But does that actually improve things? RYF doesn't require that a piece of hardware have no non-free firmware, it simply requires that any non-free firmware be hidden from the user. CPU microcode is an instructive example here. At the time of writing, every laptop listed here has an Intel CPU. Every Intel CPU has microcode in ROM, typically an early revision that is known to have many bugs. The expectation is that this microcode is updated in the field by either the firmware or the OS at boot time - the updated version is loaded into RAM on the CPU, and vanishes if power is cut. The combination of RYF and linux-libre doesn't reduce the amount of non-free code running inside the CPU, it just means that the user (a) is more likely to hit since-fixed bugs (including security ones!), and (b) has less guidance on how to avoid them.

As long as RYF permits hardware that makes use of non-free firmware I think it hurts more than it helps. In many cases users aren't guided away from non-free firmware - instead it's hidden away from them, leaving them less aware that their freedom is constrained. Linux-libre goes further, refusing to even inform the user that the non-free firmware that their hardware depends on can be upgraded to improve their security.

Out of sight shouldn't mean out of mind. If non-free firmware is a threat to user freedom then allowing it to exist in ROM doesn't do anything to solve that problem. And if it isn't a threat to user freedom, then what's the point of requiring linux-libre for a Linux distribution to be considered free by the FSF? We seem to have ended up in the worst case scenario, where nothing is being done to actually replace any of the non-free firmware running on people's systems and where users may even end up with a reduced awareness that the non-free firmware even exists.

[1] Yes yes SMM


11 December 2024

Divine Attah-Ohiemi: From Sisterly Wisdom to Debian Dreams: My Outreachy Journey

Discovering Open Source: How I Got Introduced
Hey there! I'm Divine Attah-Ohiemi, a sophomore studying Computer Science. My journey into the world of open source was anything but grand. It all started with a simple question to my sister: How do people get jobs without experience? Her answer? Open source! I dove into this vibrant community, and it felt like discovering a hidden treasure chest filled with knowledge and opportunities. Choosing Debian: Why This Community?
Why Debian, you ask? Well, I applied to Outreachy twice, and both times, I chose Debian. It's not just my first operating system; it feels like home. The Debian community is incredibly welcoming, like a big family gathering where everyone supports each other. Whether I was updating my distro or poring over documentation, the care and consideration in this community were palpable. It reminded me of the warmth of homeschooling with relatives. Plus, knowing that Debian's name comes from its creator Ian and his wife Debra adds a personal touch that makes me feel even more honored to contribute to making the website better! Why I Applied to Outreachy: What Inspired Me
Outreachy is my golden ticket to the open source world! As a 19-year-old, I see this internship as a unique opportunity to gain invaluable experience while contributing to something meaningful. It's the perfect platform for me to learn, grow, and connect with like-minded individuals who share my passion for technology and community. I'm excited for this journey and can't wait to see where it takes me!

9 December 2024

Paul Wise: FLOSS Activities November 2024

Focus This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

Communication
  • Respond to queries from Debian users and contributors on IRC

Sponsors The SWH work was sponsored. All other work was done on a volunteer basis.

6 December 2024

Bálint Réczey: Firebuild 0.8.3 is out with 100+ fixes and experimental macOS support!

The new Firebuild release contains plenty of small fixes and a few notable improvements.

Experimental macOS support The most frequently asked question from people getting to know Firebuild was whether it worked on their Mac, and the answer sadly used to be that, well, it did, but only in a Linux VM. This was far from what they were looking for. Linux and macOS have common UNIX roots, but porting Firebuild to macOS included bigger challenges, like ensuring that dyld(1), macOS's dynamic loader, initializes the preloaded interceptor library early enough to catch all interesting calls, and avoiding anything that uses malloc() or thread-local variables, which are not yet set up at that point. Preloading libraries on Linux is really easy: running LD_PRELOAD=my_lib.so ls just works if the library exports the symbols to be interposed, while macOS employs multiple lines of defense to prevent applications from using unknown libraries. Firebuild's guide for making DYLD_INSERT_LIBRARIES honored on Macs can be helpful for other projects that rely on injecting libraries as well. Since GitHub's Arm64 macOS runners don't allow intercepting binaries with the arm64e ABI yet, Firebuild's Apple Silicon tests are run at Bitrise, who are proud to be first to provide the latest Xcode stacks and were also quick to make the needed changes to their infrastructure to support Firebuild (thanks!). Firebuild on macOS can already accelerate simple projects and rebuild itself with Xcode. Since Xcode introduces a lot of nondeterminism to the build, Firebuild can't shine in acceleration with Xcode yet, but it can provide nice reports to show which part of the build is the most time consuming and how each sub-command is called. If you would like to try Firebuild on macOS please compile it from the GitHub repository for now. Precompiled binaries will be distributed on the Mac App Store and via CI providers. Contact us to get notified when those channels become available.

Dealing with the Epochalypse Glibc's API provides many functions with time parameters, and some of those functions are intercepted by Firebuild. Time parameters used to be passed as 32-bit values on 32-bit systems, preventing them from accurately representing timestamps after the year 2038, which is known as the Y2038 problem or the Epochalypse. To deal with the problem, glibc 2.34 started providing new function symbol variants with 64-bit time parameters, e.g. clock_gettime64() in addition to clock_gettime(). The new 64-bit variants are used when compiling consumers of the API with _TIME_BITS=64 defined. Processes intercepted by Firebuild may have been compiled with or without _TIME_BITS=64, thus libfirebuild now provides both variants on affected systems running glibc >= 2.34 to work safely with binaries using 64-bit and 32-bit time representation. Many Linux distributions have already stopped supporting 32-bit architectures, but Debian and Ubuntu still support armhf, for example, where the Y2038 problem still applies. Both Debian and Ubuntu performed a transition rebuilding every library (and their reverse dependencies) with -D_FILE_OFFSET_BITS=64 set where the libraries exported symbols that changed when switching to 64-bit time representation (thanks to Steve Langasek for driving this!). Thanks to the transition most programs are ready for 2038, but interposer libraries are trickier to fix, and if you maintain one it might be a good idea to check if it works well with both 32-bit and 64-bit libraries. Faketime, for example, is not fixed yet; see #1064555.
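If you maintain such an interposer library, a rough way to check which time symbol variants it (or a client binary) references is to look at the dynamic symbol table; the library path below is just a placeholder:
objdump -T /path/to/libinterposer.so | grep -E 'clock_gettime|time64'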

Select passed through environment variables with regular expressions Firebuild filters out most of the environment variables set when starting a build, to make the build more reproducible and achieve a higher cache hit rate. Extra environment variables to pass through can be specified on the command line one by one, but with many similarly named variables this may become hard to maintain. With regular expressions this just became easier:
firebuild -o 'env_vars.pass_through += "MY_VARS_.*"' my_build_command
If you are not interested in acceleration and just would like to explore what the build does by generating a report, you can simply pass all variables:
firebuild -r -o 'env_vars.pass_through += ".*"' my_build_command

Other highlights from the 0.8.3 release
  • Fixed and nicer report in Chrome and other WebKit based browsers
  • Support GLibc 2.39 by intercepting pidfd_spawn() and pidfd_spawnp()
  • Even faster Rust build acceleration
For all the changes please check out the release page on GitHub!  (This post is also published on The Firebuild blog.)

2 December 2024

Dirk Eddelbuettel: anytime 0.3.10 on CRAN: Multiple Enhancements

A new release of the anytime package arrived on CRAN today, the first in well over four years. The package is fairly feature-complete, and code and functionality remain mature and stable, of course. anytime is a very focused package aiming to do just one thing really well: to convert anything in integer, numeric, character, factor, or ordered input format to either POSIXct (when called as anytime) or Date objects (when called as anydate), and to do so without requiring a format string, while also accommodating different formats in one input vector. See the anytime page, or the GitHub repo for a few examples, and the beautiful documentation site for all documentation. This release slowly matured over four years. It combines a number of strictly internal repository maintenance changes (such as updates to continuous integration) with small enhancements (adding, for example, some new formats, responding better to an error condition, and treating logical input as an error) and a relaxation of the C++ compilation standard. While we once needed C++11, R itself is now quite proactive (the last two releases already defaulted to C++17, suitable compiler permitting), so we can now relax this constraint. The documentation site is new, as are some other small changes. See the full list of changes which follows.

Changes in anytime version 0.3.10 (2024-12-02)
  • A new documentation site was added.
  • Continuous Integration now uses run.sh from r-ci with bspm
  • Logical input vectors are now recognised as an error (#121)
  • Additional dot-separated format '%Y.%m.%d' is supported
  • Other small updates were made throughout the package
  • No longer set a C++ compilation standard as the default choices by R are sufficient for the package
  • Switch Rcpp include file to Rcpp/Lightest
  • We recommend ~/.R/Makevars compiler flag options -Wno-ignored-attributes -Wno-nonnull -Wno-parentheses
  • The tinytest runner was simplified
  • NA values from conversion now trigger a warning
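
For illustration, here is a minimal usage sketch of the single-call interface described above, invoked from the shell via Rscript (assuming the package is installed; the input values are made up):
Rscript -e 'library(anytime); anytime("2024-12-02 10:15:00"); anydate(20241202)'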

Courtesy of my CRANberries, there is also a diffstat report of changes relative to the previous release. The issue tracker at the GitHub repo can be used for questions and comments. More information about the package is at the package page, the GitHub repo and the documentation site. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

30 November 2024

Dima Kogan: Strava track filtering validation

After years of seeing people's strava tracks, I became convinced that they insufficiently filter the data, resulting in over-estimating the effort. Today I did a bit of lazy analysis, and half-confirmed this: in the one case I looked at, strava reported reasonable elevation gain numbers, but greatly overestimated the distance traveled. I looked at a single gps track of a long bike ride. This was uploaded to strava manually, as a .gpx file. I can imagine that different things happen if you use the strava app or some device that integrates with the service (the filtering might happen before the data hits the server, and the server could decide to not apply any more filtering). I processed the data with a simple hysteretic filter, ignoring small changes in position and elevation, trying out different thresholds in the process. I completely ignore the timestamps, and only look at the differences between successive points. This handles the usual GPS noise; it does not handle GPS jumps, which I completely ignore in this analysis. Ignoring these would produce inflated elevation/gain numbers, but I'm working with a looong track, so hopefully this is a small effect. Clearly this is not scientific, but it's something.

The code
Parsing .gpx is slow (this is a big file), so I cache that into a .vnl:
import sys
import gpxpy
filename_in  = 'INPUT.gpx'
filename_out = 'OUTPUT.vnl'
with open(filename_in, 'r') as f:
    gpx = gpxpy.parse(f)
f_out = open(filename_out, 'w')
tracks = gpx.tracks
if len(tracks) != 1:
    print("I want just one track", file=sys.stderr)
    sys.exit(1)
track = tracks[0]
segments = track.segments
if len(segments) != 1:
    print("I want just one segment", file=sys.stderr)
    sys.exit(1)
segment = segments[0]
time0 = segment.points[0].time
print("# time lat lon ele_m")
for p in segment.points:
    print(f"{(p.time - time0).seconds} {p.latitude} {p.longitude} {p.elevation}",
          file = f_out)
And I process this data with the different filters (this is a silly Python loop, and is slow):
#!/usr/bin/python3
import sys
import numpy as np
import numpysane as nps
import gnuplotlib as gp
import vnlog
import pyproj
geod = None
def dist_ft(lat0,lon0, lat1,lon1):
    global geod
    if geod is None:
        geod = pyproj.Geod(ellps='WGS84')
    return \
        geod.inv(lon0,lat0, lon1,lat1)[2] * 100./2.54/12.
f = 'OUTPUT.vnl'
track,list_keys,dict_key_index = \
    vnlog.slurp(f)
t      = track[:,dict_key_index['time' ]]
lat    = track[:,dict_key_index['lat'  ]]
lon    = track[:,dict_key_index['lon'  ]]
ele_ft = track[:,dict_key_index['ele_m']] * 100./2.54/12.
@nps.broadcast_define( ( (), ()),
                       (2,))
def filter_track(ele_hysteresis_ft,
                 dxy_hysteresis_ft):
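    # Hysteretic filter: a point is accepted only once it differs from the
    # previously accepted point by at least dxy_hysteresis_ft horizontally or
    # ele_hysteresis_ft in elevation; distance and climbing are accumulated
    # between accepted points only, which suppresses small GPS jitter.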
    dist        = 0.0
    ele_gain_ft = 0.0
    lon_accepted = None
    lat_accepted = None
    ele_accepted = None
    for i in range(len(lat)):
        if ele_accepted is not None:
            dxy_here  = dist_ft(lat_accepted,lon_accepted, lat[i],lon[i])
            dele_here = np.abs( ele_ft[i] - ele_accepted )
            if dxy_here < dxy_hysteresis_ft and dele_here < ele_hysteresis_ft:
                continue
            if ele_ft[i] > ele_accepted:
                ele_gain_ft += dele_here;
            dist += np.sqrt(dele_here * dele_here +
                            dxy_here  * dxy_here)
        lon_accepted = lon[i]
        lat_accepted = lat[i]
        ele_accepted = ele_ft[i]
    # lose the last point. It simply doesn't matter
    dist_mi = dist / 5280.
    return np.array((ele_gain_ft, dist_mi))
Nele_hysteresis_ft    = 20
ele_hysteresis0_ft    = 5
ele_hysteresis1_ft    = 100
ele_hysteresis_ft_all = np.linspace(ele_hysteresis0_ft,
                                    ele_hysteresis1_ft,
                                    Nele_hysteresis_ft)
Ndxy_hysteresis_ft = 20
dxy_hysteresis0_ft = 5
dxy_hysteresis1_ft = 1000
dxy_hysteresis_ft  = np.linspace(dxy_hysteresis0_ft,
                                 dxy_hysteresis1_ft,
                                 Ndxy_hysteresis_ft)
# shape (Nele,Ndxy,2)
gain,distance = \
    nps.mv( filter_track( nps.dummy(ele_hysteresis_ft_all,-1),
                          dxy_hysteresis_ft),
            -1,0 )
# Stolen from mrcal
def options_heatmap_with_contours( plotoptions, # we update this on output
                                   *,
                                   contour_min           = 0,
                                   contour_max,
                                   contour_increment     = None,
                                   do_contours           = True,
                                   contour_labels_styles = 'boxed',
                                   contour_labels_font   = None):
    r'''Update plotoptions, return curveoptions for a contoured heat map'''
    gp.add_plot_option(plotoptions,
                       'set',
                       ('view equal xy',
                        'view map'))
    if do_contours:
        if contour_increment is None:
            # Compute a "nice" contour increment. I pick a round number that gives
            # me a reasonable number of contours
            Nwant = 10
            increment = (contour_max - contour_min)/Nwant
            # I find the nearest 1eX or 2eX or 5eX
            base10_floor = np.power(10., np.floor(np.log10(increment)))
            # Look through the options, and pick the best one
            m   = np.array((1., 2., 5., 10.))
            err = np.abs(m * base10_floor - increment)
            contour_increment = -m[ np.argmin(err) ] * base10_floor
        gp.add_plot_option(plotoptions,
                           'set',
                           ('key box opaque',
                            'style textbox opaque',
                            'contour base',
                            f'cntrparam levels incremental {contour_max},{contour_increment},{contour_min}'))
        if contour_labels_font is not None:
            gp.add_plot_option(plotoptions,
                               'set',
                               f'cntrlabel format "%d" font "{contour_labels_font}"' )
        else:
            gp.add_plot_option(plotoptions,
                               'set',
                               f'cntrlabel format "%.0f"' )
        plotoptions['cbrange'] = [contour_min, contour_max]
        # I plot 3 times:
        # - to make the heat map
        # - to make the contours
        # - to make the contour labels
        _with = np.array(('image',
                          'lines nosurface',
                          f'labels {contour_labels_styles} nosurface'))
    else:
        gp.add_plot_option(plotoptions, 'unset', 'key')
        _with = 'image'
    using = \
        f'({dxy_hysteresis0_ft}+$1*{float(dxy_hysteresis1_ft-dxy_hysteresis0_ft)/(Ndxy_hysteresis_ft-1)}):' + \
        f'({ele_hysteresis0_ft}+$2*{float(ele_hysteresis1_ft-ele_hysteresis0_ft)/(Nele_hysteresis_ft-1)}):3'
    plotoptions['_3d']     = True
    plotoptions['_xrange'] = [dxy_hysteresis0_ft,dxy_hysteresis1_ft]
    plotoptions['_yrange'] = [ele_hysteresis0_ft,ele_hysteresis1_ft]
    plotoptions['ascii']   = True # needed for using to work
    gp.add_plot_option(plotoptions, 'unset', 'grid')
    return \
        dict( tuplesize=3,
              legend = "", # needed to force contour labels
              using = using,
              _with=_with)
contour_granularity = 1000
plotoptions = dict()
curveoptions = \
    options_heatmap_with_contours( plotoptions, # we update this on output
                                   # round down to the nearest contour_granularity
                                   contour_min = (np.min(gain) // contour_granularity)*contour_granularity,
                                   # round up to the nearest contour_granularity
                                   contour_max = ((np.max(gain) + (contour_granularity-1)) // contour_granularity) * contour_granularity,
                                   do_contours = True)
gp.add_plot_option(plotoptions, 'unset', 'key')
gp.add_plot_option(plotoptions, 'set', 'size square')
gp.plot(gain,
        xlabel  = "Distance hysteresis (ft)",
        ylabel  = "Elevation hysteresis (ft)",
        cblabel = "Elevation gain (ft)",
        wait = True,
        **curveoptions,
        **plotoptions,
        title    = 'Computed gain vs filtering parameters')
contour_granularity = 10
plotoptions = dict()
curveoptions = \
    options_heatmap_with_contours( plotoptions, # we update this on output
                                   # round down to the nearest contour_granularity
                                   contour_min = (np.min(distance) // contour_granularity)*contour_granularity,
                                   # round up to the nearest contour_granularity
                                   contour_max = ((np.max(distance) + (contour_granularity-1)) // contour_granularity) * contour_granularity,
                                   do_contours = True)
gp.add_plot_option(plotoptions, 'unset', 'key')
gp.add_plot_option(plotoptions, 'set', 'size square')
gp.plot(distance,
        xlabel  = "Distance hysteresis (ft)",
        ylabel  = "Elevation hysteresis (ft)",
        cblabel = "Distance (miles)",
        wait = True,
        **curveoptions,
        **plotoptions,
        title    = 'Computed distance vs filtering parameters')

Results: gain
Strava says the gain was 46307ft. The analysis says:
strava-gain.png
strava-gain-zoom.png
These show the filtered gain for different values of the distance and elevation hysteresis thresholds. The same data is shown at different zoom levels. There's no sweet spot, but we get 46307ft with a reasonable amount of filtering. Maybe 46307ft is even a bit low.

Results: distance
Strava says the distance covered was 322 miles. The analysis says:
strava-distance.png
strava-distance-zoom.png
Once again, there's no sweet spot, but we get 322 miles only if we apply no filtering at all. That's clearly too high, and is not reasonable. From the map (and from other people's strava routes) the true distance is closer to 305 miles. Why those people's strava numbers are more believable is anybody's guess.

29 November 2024

Raju Devidas: Finding all sub domains of a main domain

Problem: Need to know all the subdomains of a main domain, e.g. example.com has a subdomain dev.example.com; I also want to know the other subdomains. Solution: Install the package called sublist3r, written by Ahmed Aboul-Ela
$ sudo apt install sublist3r
Run the command:
$ sublist3r -d example.com -o subdomains-example.com.txt
                [sublist3r ASCII-art banner]
                # Coded By Ahmed Aboul-Ela - @aboul3la
[-] Enumerating subdomains now for example.com
[-] Searching now in Baidu..
[-] Searching now in Yahoo..
[-] Searching now in Google..
[-] Searching now in Bing..
[-] Searching now in Ask..
[-] Searching now in Netcraft..
[-] Searching now in DNSdumpster..
[-] Searching now in Virustotal..
[-] Searching now in ThreatCrowd..
[-] Searching now in SSL Certificates..
[-] Searching now in PassiveDNS..
Process DNSdumpster-8:
Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3/dist-packages/sublist3r.py", line 269, in run
    domain_list = self.enumerate()
                  ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/sublist3r.py", line 649, in enumerate
    token = self.get_csrftoken(resp)
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/sublist3r.py", line 644, in get_csrftoken
    token = csrf_regex.findall(resp)[0]
            ~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
[!] Error: Google probably now is blocking our requests
[~] Finished now the Google Enumeration ...
[!] Error: Virustotal probably now is blocking our requests
[-] Saving results to file: subdomains-example.com.txt
[-] Total Unique Subdomains Found: 7
AS207960 Test Intermediate - example.com
www.example.com
dev.example.com
m.example.com
products.example.com
support.example.com
m.testexample.com
We can see the subdomains listed at the end of the command output. enjoy, have fun, drink water!

22 November 2024

Matthew Palmer: Invalid Excuses for Why Your Release Process Sucks

In my companion article, I made the bold claim that your release process should consist of no more than two steps:
  1. Create an annotated Git tag;
  2. Run a single command to trigger the release pipeline.
As I have been on the Internet for more than five minutes, I'm aware that a great many people will have a great many objections to this simple and straightforward idea. In the interests of saving them a lot of wear and tear on their keyboards, I present this list of common reasons why these objections are invalid. If you have an objection I don't cover here, the comment box is down the bottom of the article. If you think you've got a real stumper, I'm available for consulting engagements, and if you turn out to have a release process which cannot feasibly be reduced to the above two steps for legitimate technical reasons, I'll waive my fees.

But I automatically generate my release notes from commit messages! This one is really easy to solve: have the release note generation tool feed directly into the annotation. Boom! Headshot.
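A minimal sketch of what that can look like, assuming a hypothetical generate-release-notes script that turns the commit messages since the last tag into prose (the script name and the VERSION variable are placeholders, not part of the original article):
# feed the generated notes straight into the tag annotation, then push to trigger the pipeline
./generate-release-notes > /tmp/release-notes
git tag -a "v${VERSION}" -F /tmp/release-notes
git push origin "v${VERSION}"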

But all these files need to be edited to make a release! No, they absolutely don't. But I can see why you might think you do, given how inflexible some packaging environments can seem, and since "that's how we've always done it".

Language Packages Most languages require you to encode the version of the library or binary in a file that you want to revision control. This is teh suck, but I'm yet to encounter a situation that can't be worked around some way or another. In Ruby, for instance, gemspec files are actually executable Ruby code, so I call code (that's part of git-version-bump, as an aside) to calculate the version number from the git tags. The Rust build tool, Cargo, uses a TOML file, which isn't as easy, but a small amount of release automation is used to take care of that.
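As a hedged illustration of deriving the version from the tag instead of committing it, assuming release tags of the form vX.Y.Z (the sed patch-up of Cargo.toml is my own sketch, not how git-version-bump works):
# derive the version from the most recent release tag, e.g. v1.2.3 -> 1.2.3
VERSION="$(git describe --tags --abbrev=0 | sed 's/^v//')"
# for Cargo, write it into Cargo.toml just before building the release artifact
sed -i "s/^version = \".*\"/version = \"${VERSION}\"/" Cargo.toml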

Distribution Packages If you're building Linux distribution packages, you can easily apply similar automation faffery. For example, Debian packages take their metadata from the debian/changelog file in the build directory. Don't keep that file in revision control, though: build it at release time. Everything you need to construct a Debian (or RPM) changelog is in the tag: version numbers, dates, times, authors, release notes. Use it for much good.
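A rough sketch of building debian/changelog from the tag at release time, assuming the tag annotation holds the release notes; the package name, maintainer address and exact layout below are placeholders:
VERSION="$(git describe --tags --abbrev=0 | sed 's/^v//')"
# pull the annotation text out of the tag object
NOTES="$(git tag -l --format='%(contents)' "v${VERSION}")"
cat > debian/changelog <<EOF
mypackage (${VERSION}-1) unstable; urgency=medium

$(printf '%s\n' "${NOTES}" | sed 's/^/  /')

 -- Jane Developer <jane@example.com>  $(date -R)
EOF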

The Dreaded Changelog Finally, there's the CHANGELOG file. If it's maintained during the development process, it typically has an archive of all the release notes, under version numbers, with an "Unreleased" heading at the top. It's one more place to remember to have to edit when making that "preparing release X.Y.Z" commit, and it is a gift to the Demon of Spurious Merge Conflicts if you follow the policy of "every commit must add a changelog entry". My solution: just burn it to the ground. Add a line to the top with a link to wherever the contents of annotated tags get published (such as GitHub Releases, if that's your bag) and never open it ever again.

But I need to know other things about my release, too! For some reason, you might think you need some other metadata about your releases. You're probably wrong; it's amazing how much information you can obtain or derive from the humble tag, so think creatively about your situation before you start making unnecessary complexity for yourself. But, on the off chance you're in a situation that legitimately needs some extra release-related information, here's the secret: structured annotation. The annotation on a tag can be literally any sequence of octets you like. How that data is interpreted is up to you. So, require that annotations on release tags use some sort of structured data format (say YAML or TOML or even XML if you hate your release manager), and mandate that it contain whatever information you need. You can make sure that the annotation has a valid structure and contains all the information you need with an update hook, which can reject the tag push if it doesn't meet the requirements, and you're sorted.
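A rough sketch of such an update hook, assuming YAML annotations and a Python interpreter with PyYAML available on the server (both of those choices are mine, purely for illustration):
#!/bin/sh
# update hook arguments: refname, old object id, new object id
refname="$1" newrev="$3"
case "$refname" in
refs/tags/v*)
    # the pushed object is the annotated tag itself; strip the object header to get the message
    if ! git cat-file -p "$newrev" | sed '1,/^$/d' | \
         python3 -c 'import sys, yaml; yaml.safe_load(sys.stdin)'; then
        echo "release tag annotation is not valid YAML; rejecting" >&2
        exit 1
    fi
    ;;
esac
exit 0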

But I have multiple packages in my repo, with different release cadences and versions! This one is common enough that I just refer to it as "the monorepo drama". Personally, I'm not a huge fan of monorepos, but you do you, boo. Annotated tags can still handle it just fine. The trick is to include the package name being released in the tag name. So rather than a release tag being named vX.Y.Z, you use foo/vX.Y.Z, bar/vX.Y.Z, and baz/vX.Y.Z. The release automation for each package just triggers on tags that match the pattern for that particular package, and limits itself to those tags when figuring out what the version number is.
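A sketch of how the per-package automation might dispatch on such tags (the foo/bar/baz names follow the article; release.sh is a stand-in for whatever actually performs the release):
TAG="$(git describe --tags --exact-match 2>/dev/null)" || exit 0  # not running against a release tag
PACKAGE="${TAG%/*}"    # foo/v1.2.3 -> foo
VERSION="${TAG##*/v}"  # foo/v1.2.3 -> 1.2.3
case "$PACKAGE" in
    foo|bar|baz) ./release.sh "$PACKAGE" "$VERSION" ;;
    *) echo "tag $TAG does not match a known package" >&2; exit 1 ;;
esac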

But we don't semver our releases! Oh, that's easy. The tag pattern that marks a release doesn't have to be vX.Y.Z. It can be anything you want. Relatedly, there is a (rare, but existent) need for packages that don't really have a conception of releases in the traditional sense. The example I've hit most often is automatically generated bindings packages, such as protobuf definitions. The source of truth for these is a bunch of .proto files, but to be useful, they need to be packaged into code for the various language(s) you're using. But those packages need versions, and while someone could manually make releases, the best option is to build new per-language packages automatically every time any of those definitions change. The versions of those packages, then, can be datestamps (I like something like YYYY.MM.DD.N, where N starts at 0 each day and increments if there are multiple releases in a single day). This process allows all the code that needs the definitions to declare the minimum version of the definitions that it relies on, and everything is kept in sync and tracked almost like magic.
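A hedged sketch of minting such a datestamp version automatically, counting how many releases already exist for today to pick N (the v prefix on the tag is my assumption):
TODAY="$(date +%Y.%m.%d)"
# N starts at 0 and increments for each additional release on the same day
N="$(git tag -l "v${TODAY}.*" | wc -l)"
VERSION="${TODAY}.${N}"
git tag -a "v${VERSION}" -m "Automated bindings release ${VERSION}"
git push origin "v${VERSION}"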

Th-th-th-th-that's all, folks! I hope you've enjoyed this bit of mild debunking. Show your gratitude by buying me a refreshing beverage, or purchase my professional expertise and I'll answer all of your questions and write all your CI jobs.

18 November 2024

Russ Allbery: Review: Delilah Green Doesn't Care

Review: Delilah Green Doesn't Care, by Ashley Herring Blake
Series: Bright Falls #1
Publisher: Jove
Copyright: February 2022
ISBN: 0-593-33641-0
Format: Kindle
Pages: 374
Delilah Green Doesn't Care is a sapphic romance novel. It's the first of a trilogy, although in the normal romance series fashion each book follows a different protagonist and has its own happy ending. It is apparently classified as romantic comedy, which did not occur to me while reading but which I suppose I can see in retrospect. Delilah Green got the hell out of Bright Falls as soon as she could and tried not to look back. After her father died, her step-mother lavished all of her perfectionist attention on her overachiever step-sister, leaving Delilah feeling like an unwanted ghost. She escaped to New York where there was space for a queer woman with an acerbic personality and a burgeoning career in photography. Her estranged step-sister's upcoming wedding was not a good enough reason to return to the stifling small town of her childhood. The pay for photographing the wedding was, since it amounted to three months of rent and trying to sell photographs in galleries was not exactly a steady living. So back to Bright Falls Delilah goes. Claire never left Bright Falls. She got pregnant young and ended up with a different life than she expected, although not a bad one. Now she's raising her daughter as a single mom, running the town bookstore, and dealing with her unreliable ex. She and Iris are Astrid Parker's best friends and have been since fifth grade, which means she wants to be happy for Astrid's upcoming wedding. There's only one problem: the groom. He's a controlling, boorish ass, but worse, Astrid seems to turn into a different person around him. Someone Claire doesn't like. Then, to make life even more complicated, Claire tries to pick up Astrid's estranged step-sister in Bright Falls's bar without recognizing her. I have a lot of things to say about this novel, but here's the core of my review: I started this book at 4pm on a Saturday because I hadn't read anything so far that day and wanted to at least start a book. I finished it at 11pm, having blown off everything else I had intended to do that evening, completely unable to put it down. It turns out there is a specific type of romance novel protagonist that I absolutely adore: the sarcastic, confident, no-bullshit character who is willing to pick the fights and say the things that the other overly polite and anxious characters aren't able to get out. Astrid does not react well to criticism, for reasons that are far more complicated than it may first appear, and Claire and Iris have been dancing around the obvious problems with her surprise engagement. As the title says, Delilah thinks she doesn't care: she's here to do a job and get out, and maybe she'll get to tweak her annoying step-sister a bit in the process. But that also means that she is unwilling to play along with Astrid's obsessively controlling mother or her obnoxious fiance, and thus, to the barely disguised glee of Claire and Iris, is a direct threat to the tidy life that Astrid's mother is trying to shoehorn her daughter into. This book is a great example of why I prefer sapphic romances: I think this character setup would not work, at least for me, in a heterosexual romance. Delilah's role only works if she's a woman; if a male character were the sarcastic conversational bulldozer, it would be almost impossible to avoid falling into the gender stereotype of a male rescuer. If this were a heterosexual romance trying to avoid that trap, the long-time friend who doesn't know how to directly confront Astrid would have to be the male protagonist. 
That could work, but it would be a tricky book to write without turning it into a story focused primarily on the subversion of gender roles. Making both protagonists women dodges the problem entirely and gives them so much narrative and conceptual space to simply be themselves, rather than characters obscured by the shadows of societal gender rules. This is also, at it's core, a book about friendship. Claire, Astrid, and Iris have the sort of close-knit friend group that looks exclusive and unapproachable from the outside. Delilah was the stereotypical outsider, mocked and excluded when they thought of her at all. This, at least, is how the dynamics look at the start of the book, but Blake did an impressive job of shifting my understanding of those relationships without changing their essential nature. She fleshes out all of the characters, not just the romantic leads, and adds complexity, nuance, and perspective. And, yes, past misunderstanding, but it's mostly not the cheap sort that sometimes drives romance plots. It's the misunderstanding rooted in remembered teenage social dynamics, the sort of misunderstanding that happens because communication is incredibly difficult, even more difficult when one has no practice or life experience, and requires knowing oneself well enough to even know what to communicate. The encounter between Delilah and Claire in the bar near the start of the book is cornerstone of the plot, but the moment that grabbed me and pulled me in was Delilah's first interaction with Claire's daughter Ruby. That was the point when I knew these were characters I could trust, and Blake never let me down. I love how Ruby is handled throughout this book, with all of the messy complexity of a kid of divorced parents with her own life and her own personality and complicated relationships with both parents that are independent of the relationship their parents have with each other. This is not a perfect book. There's one prank scene that I thought was excessively juvenile and should have been counter-productive, and there's one tricky question of (nonsexual) consent that the book raises and then later seems to ignore in a way that bugged me after I finished it. There is a third-act breakup, which is not my favorite plot structure, but I think Blake handles it reasonably well. I would probably find more niggles and nitpicks if I re-read it more slowly. But it was utterly engrossing reading that exactly matched my mood the day that I picked it up, and that was a fantastic reading experience. I'm not much of a romance reader and am not the traditional audience for sapphic romance, so I'm probably not the person you should be looking to for recommendations, but this is the sort of book that got me to immediately buy all of the sequels and start thinking about a re-read. It's also the sort of book that dragged me back in for several chapters when I was fact-checking bits of my review. Take that recommendation for whatever it's worth. Content note: Reviews of Delilah Green Doesn't Care tend to call it steamy or spicy. I have no calibration for this for romance novels. I did not find it very sex-focused (I have read genre fantasy novels with more sex), but there are several on-page sex scenes if that's something you care about one way or the other. Followed by Astrid Parker Doesn't Fail. Rating: 9 out of 10

12 November 2024

Paul Tagliamonte: Complex for Whom?

In basically every engineering organization I ve ever regarded as particularly high functioning, I ve sat through one specific recurring conversation which is not a conversation about complexity . Things are good or bad because they are or aren t complex, architectures needs to be redone because it s too complex some refactor of whatever it is won t work because it s too complex. You may have even been a part of some of these conversations or even been the one advocating for simple light-weight solutions. I ve done it. Many times. Rarely, if ever, do we talk about complexity within its rightful context complexity for whom. Is a solution complex because it s complex for the end user? Is it complex if it s complex for an API consumer? Is it complex if it s complex for the person maintaining the API service? Is it complex if it s complex for someone outside the team maintaining it to understand? Complexity within a problem domain I ve come to believe, is fairly zero-sum there s a fixed amount of complexity in the problem to be solved, and you can choose to either solve it, or leave it for those downstream of you to solve that problem on their own. That being said, while I believe there is a lower bound in complexity to contend with for a problem, I do not believe there is an upper bound to the complexity of solutions possible. It is always possible, and in fact, very likely that teams create problems for themselves while trying to solve a problem. The rest of this post is talking to the lower bound. When getting feedback on an early draft of this blog post, I ve been informed that Fred Brooks coined a term for what I call lower bound complexity Essential Complexity , in the paper No Silver Bullet Essence and Accident in Software Engineering , which is a better term and can be used interchangeably.

Complexity Culture In a large enough organization, where the team is high functioning enough to have and maintain trust amongst peers, members of the team will specialize. People will begin to engage with subsets of the work to be done, and begin to have their efficacy measured against that part of the organization s problems. Incentives shift, and over time it becomes increasingly likely that two engineers may have two very different priorities when working on the same system together. Someone accountable for uptime and tasked with responding to outages will begin to resist changes. Someone accountable for rapidly delivering features will resist gates between them and their users. Companies (either wittingly or unwittingly) will deal with this by tasking engineers with both production (feature development) and operational tasks (maintenance), so the difference in incentives isn t usually as bad as it could be. When we get a bunch of folks from far-flung corners of an organization in a room, fire up a slide deck and throw up some aspirational to-be architecture diagram in order to get a sign-off to solve some problem (be it someone needs a credible promotion packet, new feature needs to get delivered, or the system has begun to fail and needs fixing), the initial reaction will, more often than I d like, start to devolve into a discussion of how this is going to introduce a bunch of complexity, going to be hard to maintain, why can t you make it less complex? Right around here is when I start to try and contextualize the conversation happening around me understand what complexity is that being discussed, and understand who is taking on that burden. Think about who should be owning that problem, and work through the tradeoffs involved. Is it best solved here, or left to consumers (be them other systems, developers, or users). Should something become an API call s optional param, taking on all the edge-cases and on, or should users have to implement the logic using the data you return (leaving everyone else to take on all the edge-cases and maintenance)? Should you process the data, or require the user to preprocess it for you? Frequently it s right to make an active and explicit decision to simplify and leave problems to be solved downstream, since they may not actually need to be solved or perhaps you expect consumers will want to own the specifics of how the problem is solved, in which case you leave lots of documentation and examples. Many other times, especially when it s something downstream consumers are likely to hit, it s best solved internal to the system, since the only thing that can come of leaving it unsolved are bugs, frustration and half-correct solutions. This is a grey-space of tradeoffs, not a clear decision tree. No one wants the software manifestation of a katamari ball or a junk drawer, nor does anyone want a half-baked service unable to handle the simplest use-case.

Head-in-sand as a Service Popoffs about how complex something is, are, to a first approximation, best understood as meaning complicated for the person making comments . A lot of the #thoughtleadership believe that an AWS hosted EKS k8s cluster running images built by CI talking to an AWS hosted PostgreSQL RDS is not complex. They re right. Mostly right. This is less complex less complex for them. It s not, however, without complexity and its own tradeoffs it s just complexity that they do not have to deal with. Now they don t have to maintain machines that have pesky operating systems or hard drive failures. They don t have to deal with updating the version of k8s, nor ensuring the backups work. No one has to push some artifact to prod manually. Deployments happen unattended. You click a button and get a cluster. On the other hand, developers outside the ops function need to deal with troubleshooting CI, debugging access control rules encoded in turing complete YAML, permissions issues inside the cluster due to whatever the fuck a service mesh is, everyone needs to learn how to use some k8s tools they only actually use during a bad day, likely while doing some x.509 troubleshooting to connect to the cluster (an internal only endpoint; just port forward it) not to mention all sorts of rules to route packets to their project (a single repo s binary being run in 3 containers on a single vm host). Beyond that, there s the invisible complexity complexity on the interior of a service you depend on. I think about the dozens of teams maintaining the EKS service (which is either run on EC2 instances, or alternately, EC2 instances in a trench coat, moustache and even more shell scripts), the RDS service (also EC2 and shell scripts, but this time accounting for redundancy, backups, availability zones), scores of hypervisors pulled off the shelf (xen, kvm) smashed together with the ones built in-house (firecracker, nitro, etc) running on hardware that has to be refreshed and maintained continuously. Every request processed by network ACL rules, AWS IAM rules, security group rules, using IP space announced to the internet wired through IXPs directly into ISPs. I don t even want to begin to think about the complexity inherent in how those switches are designed. Shitloads of complexity to solve problems you may or may not have, or even know you had. What s more complex? An app running in an in-house 4u server racked in the office s telco closet in the back running off the office Verizon line, or an app running four hypervisors deep in an AWS datacenter? Which is more complex to you? What about to your organization? In total? Which is more prone to failure? Which is more secure? Is the complexity good or bad? What type of Complexity can you manage effectively? Which threaten the system? Which threaten your users?

COMPLEXIVIBES This extends beyond Engineering. Decisions regarding what tools are we able to use be them existing contracts with cloud providers, CIO mandated SaaS products, a list of the only permissible open source projects will incur costs in terms of expressed complexity . Pinning open source projects to a fixed set makes SBOM production less complex . Using only one SaaS provider s product suite (even if its terrible, because it has all the types of tools you need) makes accreditation less complex . If all you have is a contract with Pauly T s lowest price technically acceptable artisinal cloudary and haberdashery, the way you pay for your compute is less complex for the CIO shop, though you will find yourself building your own hosted database template, mechanism to spin up a k8s cluster, and all the operational and technical burden that comes with it. Or you won t and make it everyone else s problem in the organization. Nothing you can do will solve for the fact that you must now deal with this problem somewhere because it was less complicated for the business to put the workloads on the existing contract with a cut-rate vendor. Suddenly, the decision to reduce complexity because of an existing contract vehicle has resulted in a huge amount of technical risk and maintenance burden being onboarded. Complexity you would otherwise externalize has now been taken on internally. With a large enough organizations (specifically, in this case, i m talking about you, bureaucracies), this is largely ignored or accepted as normal since the personnel cost is understood to be free to everyone involved. Doing it this way is more expensive, more work, less reliable and less maintainable, and yet, somehow, is, in a lot of ways, less complex to the organization. It s particularly bad with bureaucracies, since screwing up a contract will get you into much more trouble than delivering a broken product, leaving basically no reason for anyone to care to fix this. I can t shake the feeling that for every story of technical mandates gone awry, somewhere just out of sight there s a decisionmaker optimizing for what they believe to be the least amount of complexity least hassle, fewest unique cases, most consistency as they can. They freely offload complexity from their accreditation and risk acceptance functions through mandates. They will never have to deal with it. That does not change the fact that someone does.

TC;DR (TOO COMPLEX; DIDN T REVIEW) We wish to rid ourselves of systemic Complexity after all, complexity is bad, simplicity is good. Removing upper-bound own-goal complexity ( accidental complexity in Brooks s terms) is important, but once you hit the lower bound complexity, the tradeoffs become zero-sum. Removing complexity from one part of the system means that somewhere else maybe outside your organization or in a non-engineering function must grow it back. Sometimes, the opposite is the case, such as when a previously manual business processes is automated. Maybe that s a good idea. Maybe it s not. All I know is that what doesn t help the situation is conflating complexity with everything we don t like legacy code, maintenance burden or toil, cost, delivery velocity.
  • Complexity is not the same as proclivity to failure. The most reliable systems I've interacted with are unimaginably complex, with layers of internal protection to prevent complete failure. This has its own set of costs which other people have written about extensively.
  • Complexity is not cost. Sometimes the cost of taking all the complexity in-house is less, for whatever value of cost you choose to use.
  • Complexity is not absolute. Something simple from one perspective may be wildly complex from another. The impulse to burn down complex sections of code is helpful to have generally, but sometimes things are complicated for a reason, even if that reason exists outside your codebase or organization.
  • Complexity is not something you can remove without introducing complexity elsewhere. Just as not making a decision is a decision itself; choosing to require someone else to deal with a problem rather than dealing with it internally is a choice that needs to be considered in its full context.
Next time you re sitting through a discussion and someone starts to talk about all the complexity about to be introduced, I want to pop up in the back of your head, politely asking what does complex mean in this context? Is it lower bound complexity? Is this complexity desirable? Is what they re saying mean something along the lines of I don t understand the problems being solved, or does it mean something along the lines of this problem should be solved elsewhere? Do they believe this will result in more work for them in a way that you don t see? Should this not solved at all by changing the bounds of what we should accept or redefine the understood limits of this system? Is the perceived complexity a result of a decision elsewhere? Who s taking this complexity on, or more to the point, is failing to address complexity required by the problem leaving it to others? Does it impact others? How specifically? What are you not seeing? What can change? What should change?

Sven Hoexter: fluxcd: Validate flux-system Root Kustomization

Not entirely sure how people use fluxcd, but I guess most people have something like a flux-system flux kustomization as the root to add more flux kustomizations to their kubernetes cluster. Here all of that is living in a monorepo, and as we're all humans, people figure out different ways to break it, which brings the reconciliation of the flux controllers down. Thus we set out to do some pre-flight validations. Note1: We do not use flux variable substitutions for those root kustomizations, so if you use those, you'll have to put additional work into the validation and pipe things through flux envsubst. First Iteration: Just Run kustomize Like Flux Would Do It. With a folder structure where we have a clusters folder with one subfolder per cluster, we just run a for loop over all of them:
for CLUSTER in ${CLUSTERS}; do
    pushd clusters/${CLUSTER}
    # validate if we can create and build a flux-system like kustomization file
    kustomize create --autodetect --recursive
    if ! kustomize build . -o /dev/null 2> error.log; then
        echo "Error building flux-system kustomization for cluster ${CLUSTER}"
        cat error.log
    fi
    popd
done
Second Iteration: Make Sure Our Workload Subfolders Have a kustomization.yaml. Next, someone figured out that you can delete some yaml files from a workload subfolder, including the kustomization.yaml, but not all of them. That left behind a resource definition which lacks some other referenced objects, but is still happily included into the root kustomization by kustomize create and flux, which of course did not work. Thus we started to catch that as well in our growing for loop:
for CLUSTER in ${CLUSTERS}; do
    pushd clusters/${CLUSTER}
    # validate if we can create and build a flux-system like kustomization file
    kustomize create --autodetect --recursive
    if ! kustomize build . -o /dev/null 2> error.log; then
        echo "Error building flux-system kustomization for cluster ${CLUSTER}"
        cat error.log
    fi
    # validate if we always have a kustomization file in folders with yaml files
    for CLFOLDER in $(find . -type d); do
        test -f ${CLFOLDER}/kustomization.yaml && continue
        test -f ${CLFOLDER}/kustomization.yml && continue
        if [[ $(find ${CLFOLDER} -maxdepth 1 \( -name '*.yaml' -o -name '*.yml' \) -type f | wc -l) != 0 ]]; then
            echo "Error: Cluster ${CLUSTER} folder ${CLFOLDER} lacks a kustomization.yaml"
        fi
    done
    popd
done
Note2: I shortened those snippets to the core parts. In our case some things are a bit specific to how we implemented the execution of those checks in GitHub Actions workflows. Hope that's enough to convey the idea of what to check for.
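Since the loops above only echo errors, one way to make a CI job actually fail on them is to track a flag and exit non-zero at the end; a minimal sketch under that assumption (the FAILED variable and overall structure are mine, not necessarily how the checks are wired into the GitHub Actions workflows mentioned above):
FAILED=0
for CLUSTER in ${CLUSTERS}; do
    pushd clusters/${CLUSTER}
    kustomize create --autodetect --recursive
    if ! kustomize build . -o /dev/null 2> error.log; then
        echo "Error building flux-system kustomization for cluster ${CLUSTER}"
        cat error.log
        FAILED=1
    fi
    popd
done
exit ${FAILED}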

11 November 2024

Gunnar Wolf: Why academics under-share research data - A social relational theory

This post is a review for Computing Reviews for "Why academics under-share research data - A social relational theory", an article published in the Journal of the Association for Information Science and Technology.
As an academic, I have cheered for and welcomed the open access (OA) mandates that, slowly but steadily, have been accepted in one way or another throughout academia. It is now often accepted that public funds means public research. Many of our universities or funding bodies will demand that, with varying intensities sometimes they demand research to be published in an OA venue, sometimes a mandate will only prefer it. Lately, some journals and funder bodies have expanded this mandate toward open science, requiring not only research outputs (that is, articles and books) to be published openly but for the data backing the results to be made public as well. As a person who has been involved with free software promotion since the mid 1990s, it was natural for me to join the OA movement and to celebrate when various universities adopt such mandates. Now, what happens after a university or funder body adopts such a mandate? Many individual academics cheer, as it is the right thing to do. However, the authors observe that this is not really followed thoroughly by academics. What can be observed, rather, is the slow pace or feet dragging of academics when they are compelled to comply with OA mandates, or even an outright refusal to do so. If OA and open science are close to the ethos of academia, why aren t more academics enthusiastically sharing the data used for their research? This paper finds a subversive practice embodied in the refusal to comply with such mandates, and explores an hypothesis based on Karl Marx s productive worker theory and Pierre Bourdieu s ideas of symbolic capital. The paper explains that academics, as productive workers, become targets for exploitation: given that it s not only the academics sharing ethos, but private industry s push for data collection and industry-aligned research, they adapt to technological changes and jump through all kinds of hurdles to create more products, in a result that can be understood as a neoliberal productivity measurement strategy. Neoliberalism assumes that mechanisms that produce more profit for academic institutions will result in better research; it also leads to the disempowerment of academics as a class, although they are rewarded as individuals due to the specific value they produce. The authors continue by explaining how open science mandates seem to ignore the historical ways of collaboration in different scientific fields, and exploring different angles of how and why data can be seen as under-shared, failing to comply with different aspects of said mandates. This paper, built on the social sciences tradition, is clearly a controversial work that can spark interesting discussions. While it does not specifically touch on computing, it is relevant to Computing Reviews readers due to the relatively high percentage of academics among us.

10 November 2024

Reproducible Builds: Reproducible Builds in October 2024

Welcome to the October 2024 report from the Reproducible Builds project. Our reports attempt to outline what we've been up to over the past month, highlighting news items from elsewhere in tech where they are related. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website. Table of contents:
  1. Beyond bitwise equality for Reproducible Builds?
  2. Two Ways to Trustworthy at SeaGL 2024
  3. Number of cores affected Android compiler output
  4. On our mailing list
  5. diffoscope
  6. IzzyOnDroid passed 25% reproducible apps
  7. Distribution work
  8. Website updates
  9. Reproducibility testing framework
  10. Supply-chain security at Open Source Summit EU
  11. Upstream patches

Beyond bitwise equality for Reproducible Builds? Jens Dietrich and Tim White of Victoria University of Wellington, New Zealand, along with Behnaz Hassanshahi and Paddy Krishnan of Oracle Labs Australia, published a paper entitled "Levels of Binary Equivalence for the Comparison of Binaries from Alternative Builds":
The availability of multiple binaries built from the same sources creates new challenges and opportunities, and raises questions such as: "Does build A confirm the integrity of build B?" or "Can build A reveal a compromised build B?". To answer such questions requires a notion of equivalence between binaries. We demonstrate that the obvious approach based on bitwise equality has significant shortcomings in practice, and that there is value in opting for alternative notions. We conceptualise this by introducing levels of equivalence, inspired by clone detection types.
A PDF of the paper is freely available.

Two Ways to Trustworthy at SeaGL 2024 On Friday 8th November, Vagrant Cascadian will present a talk entitled "Two Ways to Trustworthy" at SeaGL in Seattle, WA. Founded in 2013, SeaGL is a free, grassroots technical summit dedicated to spreading awareness and knowledge about free source software, hardware and culture. Vagrant's talk:
[ ] delves into how two project[s] approaches fundamental security features through Reproducible Builds, Bootstrappable Builds, code auditability, etc. to improve trustworthiness, allowing independent verification; trustworthy projects require little to no trust. Exploring the challenges that each project faces due to very different technical architectures, but also contextually relevant social structure, adoption patterns, and organizational history should provide a good backdrop to understand how different approaches to security might evolve, with real-world merits and downsides.

Number of cores affected Android compiler output Fay Stegerman wrote that the cause of the Android toolchain bug from September's report that she reported to the Android issue tracker has been found and the bug has been fixed.
the D8 Java to DEX compiler (part of the Android toolchain) eliminated a redundant field load if running the class's static initialiser was known to be free of side effects, which ended up accidentally depending on the sharding of the input, which is dependent on the number of CPU cores used during the build.
To make it easier to understand the bug and the patch, Fay also made a small example to illustrate when and why the optimisation involved is valid.

On our mailing list On our mailing list this month:

diffoscope diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 279, 280, 281 and 282 to Debian:
  • Ignore errors when listing .ar archives (#1085257). [ ]
  • Don't try and test with systemd-ukify in the Debian stable distribution. [ ]
  • Drop Depends on the deprecated python3-pkg-resources (#1083362). [ ]
In addition, Jelle van der Waa added support for Unified Kernel Image (UKI) files. [ ][ ][ ] Furthermore, Vagrant Cascadian updated diffoscope in GNU Guix to version 282. [ ][ ]

IzzyOnDroid passed 25% reproducible apps The IzzyOnDroid project has reached a nice milestone: over 25% of the ~1,200 Android apps provided by their repository (official APKs built by the original application developers) have now been confirmed to be reproducible by a rebuilder.

Distribution work In Debian this month:
  • Holger Levsen uploaded devscripts version 2.24.2, including many changes to the debootsnap, debrebuild and reproducible-check scripts. This is the first time that debrebuild actually works (using sbuild's unshare backend). As part of this, Holger also fixed an issue in the reproducible-check script where a typo in the code led to incorrect results [ ]
  • Recently, a news entry was added to snapshot.debian.org's homepage, describing the recent changes that made the system stable again:
    The new server has no problems keeping up with importing the full archives on every update, as each run finishes comfortably in time before it's time to run again. [While] the new server is the one doing all the importing of updated archives, the HTTP interface is being served by both the new server and one of the VMs at LeaseWeb.
    The entry lists a number of specific updates surrounding the API endpoints and rate limiting.
  • Lastly, 12 reviews of Debian packages were added, 3 were updated and 18 were removed this month adding to our knowledge about identified issues.
Elsewhere in distribution news, Zbigniew Jędrzejewski-Szmek performed another rebuild of Fedora 42 packages, with the headline result being that 91% of the packages are reproducible. Zbigniew also reported a reproducibility problem with QImage. Finally, in openSUSE, Bernhard M. Wiedemann published another report for that distribution.

Website updates There were an enormous number of improvements made to our website this month, including:
  • Alba Herrerias:
    • Improve consistency across distribution-specific guides. [ ]
    • Fix a number of links on the Contribute page. [ ]
  • Chris Lamb:
  • hulkoba
  • James Addison:
    • Huge and significant work on a (as-yet-unmerged) quickstart guide to be linked from the homepage [ ][ ][ ][ ][ ]
    • On the homepage, link directly to the Projects subpage. [ ]
    • Relocate dependency-drift notes to the Volatile inputs page. [ ]
  • Ninette Adhikari:
    • Add a brand new Success stories page that highlights the success stories of Reproducible Builds, showcasing real-world examples of projects shipping with verifiable, reproducible builds . [ ][ ][ ][ ][ ][ ]
  • Pol Dellaiera:
    • Update the website's README page for building the website under NixOS. [ ][ ][ ][ ][ ]
    • Add a new academic paper citation. [ ]
Lastly, Holger Levsen filed an extensive issue detailing a request to create an overview of recommendations and standards in relation to reproducible builds.

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In October, a number of changes were made by Holger Levsen, including:
  • Add a basic index.html for rebuilderd. [ ]
  • Update the nginx.conf configuration file for rebuilderd. [ ]
  • Document how to use a rescue system for Infomaniak's OpenStack cloud. [ ]
  • Update usage info for two particular nodes. [ ]
  • Fix up a version skew check to fix the name of the riscv64 architecture. [ ]
  • Update the rebuilderd-related TODO. [ ]
In addition, Mattia Rizzolo added a new IP address for the inos5 node [ ] and Vagrant Cascadian brought 4 virt nodes back online [ ].

Supply-chain security at Open Source Summit EU The Open Source Summit EU took place recently, and covered plenty of topics related to supply-chain security, including:

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:
