Search Results: "benjamin"

2 January 2025

Colin Watson: Free software activity in December 2024

Most of my Debian contributions this month were sponsored by Freexian, as well as one direct donation via Liberapay (thanks!). OpenSSH I issued a bookworm update with a number of fixes that had accumulated over the last year, especially fixing GSS-API key exchange which was quite broken in bookworm. base-passwd A few months ago, the adduser maintainer started a discussion with me (as the base-passwd maintainer) and the shadow maintainer about bringing all three source packages under one team, since they often need to cooperate on things like user and group names. I agreed, but hadn t got round to doing anything about it until recently. I ve now officially moved it under team maintenance. debconf Gioele Barabucci has been working on eliminating duplicated code between debconf and cdebconf, ultimately with the goal of migrating to cdebconf (which I m not sure I m convinced of as a goal, but if we can make improvements to both packages as part of working towards it then there s no harm in that). I finally got round to reviewing and merging confmodule changes in each of debconf and cdebconf. This caused an installer regression due to a weirdness in cdebconf-udeb s packaging, which I fixed - sorry about that! I ve also been dealing with a few patch submissions that had been in my queue for a long time, but more on that next month if all goes well. CI issues I noticed and fixed a problem with Restrictions: needs-sudo in autopkgtest. I fixed broken aptly images in the Salsa CI pipeline. Python team Last month, I mentioned some progress on sorting out the multipart vs. python-multipart name conflict in Debian (#1085728), and said that I thought we d be able to finish it soon. I was right! We got it all done this month: The Python 3.13 transition continues, and last month we were able to add it to the supported Python versions in testing. (The next step will be to make it the default.) I fixed lots of problems in aid of this, including: Sphinx 8.0 removed some old intersphinx_mapping syntax which turned out to still be in use by many packages in Debian. The fixes for this were individually trivial, but there were a lot of them: I found that twisted 24.11.0 broke tests in buildbot and wokkel, and fixed those. I packaged python-flatdict, needed for a new upstream version of python-semantic-release. I tracked down a test failure in vdirsyncer (which I ve been using for some years, but had never previously needed to modify) and contributed a fix upstream. I fixed some packages to tolerate future versions of dh-python that will drop their dependency on python3-setuptools: I fixed django-cte to remove a build-dependency on the obsolete python3-nose package. I added Django 5.1 support to django-polymorphic. (There are a number of other packages that still need work here.) I fixed various other build/test failures: I upgraded these packages to new upstream versions: I updated the team s library style guide to remove material related to Python 2 and early versions of Python 3, which is no longer relevant to any current Python packaging work. Other Python upstream work I happened to notice a Twisted upstream issue requesting the removal of the deprecated twisted.internet.defer.returnValue, realized it was still used in many places in Debian, and went on a PR-filing spree informed by codesearch to try to reduce the future impact of such a change on Debian: Other small fixes Santiago Vila has been building the archive with make --shuffle (also see its author s explanation). I fixed associated bugs in cccc (contributed upstream), groff, and spectemu. I backported an upstream patch to putty to fix undefined behaviour that affected use of the small keypad . I removed groff s Recommends: libpaper1 (#1091375, #1091376), since it isn t currently all that useful and was getting in the way of a transition to libpaper2. I filed an upstream bug suggesting better integration in this area.

31 December 2024

Chris Lamb: Favourites of 2024

Here are my favourite books and movies that I read and watched throughout 2024. It wasn't quite the stellar year for books as previous years: few of those books that make you want to recommend and/or buy them for all your friends. In subconscious compensation, perhaps, I reread a few classics (e.g. True Grit, Solaris), and I'm almost finished my second read of War and Peace.

Books

Elif Batuman: Either/Or (2022) Stella Gibbons: Cold Comfort Farm (1932) Michel Faber: Under The Skin (2000) Wallace Stegner: Crossing to Safety (1987) Gustave Flaubert: Madame Bovary (1857) Rachel Cusk: Outline (2014) Sara Gran: The Book of the Most Precious Substance (2022) Anonymous: The Railway Traveller s Handy Book (1862) Natalie Hodges: Uncommon Measure: A Journey Through Music, Performance, and the Science of Time (2022)Gary K. Wolf: Who Censored Roger Rabbit? (1981)

Films Recent releases

Seen at a 2023 festival. Disappointments this year included Blitz (Steve McQueen), Love Lies Bleeding (Rose Glass), The Room Next Door (Pedro Almod var) and Emilia P rez (Jacques Audiard), whilst the worst new film this year was likely The Substance (Coralie Fargeat), followed by Megalopolis (Francis Ford Coppola), Unfrosted (Jerry Seinfeld) and Joker: Folie Deux (Todd Phillips).
Older releases ie. Films released before 2023, and not including rewatches from previous years. Distinctly unenjoyable watches included The Island of Dr. Moreau (John Frankenheimer, 1996), Southland Tales (Richard Kelly, 2006), Any Given Sunday (Oliver Stone, 1999) & The Hairdresser s Husband (Patrice Leconte, 19990). On the other hand, unforgettable cinema experiences this year included big-screen rewatches of Solaris (Andrei Tarkovsky, 1972), Blade Runner (Ridley Scott, 1982), Apocalypse Now (Francis Ford Coppola, 1979) and Die Hard (John McTiernan, 1988).

21 December 2024

Benjamin Mako Hill: Thug Life

My current playlist is this diorama of Lulu the Piggy channeling Tupac Shakur in a toy vending machine in the basement of New World Mall in Flushing Chinatown.

19 December 2024

Gregory Colpart: MiniDebConf Toulouse 2024

After the MiniDebConf Marseille 2019, COVID-19 made it impossible or difficult to organize new MiniDebConfs for a few years. With the gradual resumption of in-person events (like FOSDEM, DebConf, etc.), the idea emerged to host another MiniDebConf in France, but with a lighter organizational load. In 2023, we decided to reach out to the organizers of Capitole du Libre to repeat the experience of 2017: hosting a MiniDebConf alongside their annual event in Toulouse in November. However, our request came too late for 2023. After discussions with Capitole du Libre in November 2023 in Toulouse and again in February 2024 in Brussels, we confirmed that a MiniDebConf Toulouse would take place in November 2024! We then assembled a small organizing team and got to work: a Call for Papers in May 2024, adding a two-day MiniDebCamp, coordinating with the DebConf video team, securing sponsors, creating a logo, ordering T-shirts and stickers, planning the schedule, and managing registrations. Even with lighter logistics (conference rooms, badges, and catering during the weekend were handled by Capitole du Libre), there was still quite a bit of preparation to do. On Thursday, November 14, and Friday, November 15, 2024, about forty developers arrived from around the world (France, Spain, Italy, Switzerland, Germany, England, Brazil, Uruguay, India, Brest, Marseille ) to spend two days at the MiniDebCamp in the beautiful collaborative spaces of Artilect in Toulouse city center.
Then, on Saturday, November 16, and Sunday, November 17, 2024, the MiniDebConf took place at ENSEEIHT as part of the Capitole du Libre event. The conference kicked off on Saturday morning with an opening session by J r my Lecour, which included a tribute to Lunar (Nicolas Dandrimont). This was followed by Reproducible Builds Rebuilding What is Distributed from ftp.debian.org (Holger Levsen) and Discussion on My Research Work on Sustainability of Debian OS (Eda). After lunch at the Capitole du Libre food trucks, the intense afternoon schedule began: What s New in the Linux Kernel (and What s Missing in Debian) (Ben Hutchings), Linux Live Patching in Debian (Santiago Ruano Rinc n), Trixie on Mobile: Are We There Yet? (Arnaud Ferraris), PostgreSQL Container Groups, aka cgroups Down the Road (C dric Villemain), Upgrading a Thousand Debian Hosts in Less Than an Hour (J r my Lecour and myself), and Using Debusine to Automate Your QA (Stefano Rivera & co). Sunday marked the second day, starting with a presentation on DebConf 25 (Benjamin Somers), which will be held in Brest in July 2025. The morning continued with talks: How LTS Goes Beyond LTS (Santiago Ruano Rinc n & Roberto C. S nchez), Cross-Building (Helmut Grohne), and State of JavaScript (Bastien Roucari s). In the afternoon, there were Lightning Talks, PyPI Security: Past, Present & Future (Salvo LtWorf Tomaselli), and the classic Bits from DPL (Andreas Tille), before closing with the final session led by Pierre-Elliott B cue. All talks are available on video (a huge thanks to the amazing DebConf video team), and many thanks to our sponsors (Viridien, Freexian, Evolix, Collabora, and Data Bene). A big thank-you as well to the entire Capitole du Libre team for hosting and supporting us see you in Brest in July 2025! Articles about (or mentioning) MiniDebConf Toulouse:

Benjamin Mako Hill: Being a bread torus

A concerned nutritional epidemiologist in Tokyo realizes that if you are what you eat, that means
It s a similar situation in Seoul, albeit with less oil and more confidence.

8 November 2024

Freexian Collaborators: Debian Contributions: October s report (by Anupa Ann Joseph)

Debian Contributions: 2024-10 Contributing to Debian is part of Freexian s mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

rebootstrap, by Helmut Grohne After significant changes earlier this year, the state of architecture cross bootstrap is normalizing again. More and more architectures manage to complete rebootstrap testing successfully again. Here are two examples of what kind of issues the bootstrap testing identifies. At some point, libpng1.6 would fail to cross build on musl architectures whereas it would succeed on other ones failing to locate zlib. Adding --debug-find to the cmake invocation eventually revealed that it would fail to search in /usr/lib/<triplet>, which is the default library path. This turned out to be a bug in cmake assuming that all linux systems use glibc. libpng1.6 also gained a baseline violation for powerpc and ppc64 by enabling the use of AltiVec there. The newt package would fail to cross build for many 32-bit architectures whereas it would succeed for armel and armhf due to -Wincompatible-pointer-types. It turns out that this flag was turned into -Werror and it was compiling with a warning earlier. The actual problem is a difference in signedness between wchar_t and FriBidChar (aka uint32_t) and actually affects native building on i386.

Miscellaneous contributions
  • Helmut sent 35 patches for cross build failures.
  • Stefano Rivera uploaded the Python 3.13.0 final release.
  • Stefano continued to rebuild Python packages with C extensions using Python 3.13, to catch compatibility issues before the 3.13-add transition starts.
  • Stefano uploaded new versions of a handful of Python packages, including: dh-python, objgraph, python-mitogen, python-truststore, and python-virtualenv.
  • Stefano packaged a new release of mkdocs-macros-plugin, which required packaging a new Python package for Debian, python-super-collections (now in NEW review).
  • Stefano helped the mini-DebConf Online Brazil get video infrastructure up and running for the event. Unfortunately, Debian s online-DebConf setup has bitrotted over the last couple of years, and it eventually required new temporary Jitsi and Jibri instances.
  • Colin Watson fixed a number of autopkgtest failures to get ansible back into testing.
  • Colin fixed an ssh client failure in certain cases when using GSS-API key exchange, and added an integration test to ensure this doesn t regress in future.
  • Colin worked on the Python 3.13 transition, fixing problems related to it in 15 packages. This included upstream work in a number of packages (postgresfixture, python-asyncssh, python-wadllib).
  • Colin upgraded 41 Python packages to new upstream versions.
  • Carles improved po-debconf-manager: now it can create merge requests to Salsa automatically (created 17, new batch coming this month), imported almost all the packages with debconf translation templates whose VCS is Salsa (currently 449 imported), added statistics per package and language, improved command line interface options. Performed user support fixing different issues. Also prepared an abstract for the talk at MiniDebConf Toulouse.
  • Santiago Ruano Rinc n continued the organization work for the DebConf 25 conference, to be held in Brest, France. Part of the work relates to the initial edits of the sponsoring brochure. Thanks to Benjamin Somers who finalized the French and English versions.
  • Rapha l forwarded a couple of zim and hamster bugs to the upstream developers, and tried to diagnose a delayed startup of gdm on his laptop (cf #1085633).
  • On behalf of the Debian Publicity Team, Anupa interviewed 7 women from the Debian community, old and new contributors. The interview was published in Bits from Debian.

17 September 2024

Benjamin Mako Hill: My Chair

I realize that because I have several chairs, the phrase my chair is ambiguous. To reduce confusion, I will refer to the head of my academic department as my office chair going forward.

10 August 2024

Benjamin Mako Hill: For Additional Confusion

The Wikipedia article on antipopes can be pretty confusing! If you d like to be even more confused, it can help with that!

4 March 2024

Paulo Henrique de Lima Santana: Bits from FOSDEM 2023 and 2024

Link para vers o em portugu s

Intro Since 2019, I have traveled to Brussels at the beginning of the year to join FOSDEM, considered the largest and most important Free Software event in Europe. The 2024 edition was the fourth in-person edition in a row that I joined (2021 and 2022 did not happen due to COVID-19) and always with the financial help of Debian, which kindly paid my flight tickets after receiving my request asking for help to travel and approved by the Debian leader. In 2020 I wrote several posts with a very complete report of the days I spent in Brussels. But in 2023 I didn t write anything, and becayse last year and this year I coordinated a room dedicated to translations of Free Software and Open Source projects, I m going to take the opportunity to write about these two years and how it was my experience. After my first trip to FOSDEM, I started to think that I could join in a more active way than just a regular attendee, so I had the desire to propose a talk to one of the rooms. But then I thought that instead of proposing a tal, I could organize a room for talks :-) and with the topic translations which is something that I m very interested in, because it s been a few years since I ve been helping to translate the Debian for Portuguese.

Joining FOSDEM 2023 In the second half of 2022 I did some research and saw that there had never been a room dedicated to translations, so when the FOSDEM organization opened the call to receive room proposals (called DevRoom) for the 2023 edition, I sent a proposal to a translation room and it was accepted! After the room was confirmed, the next step was for me, as room coordinator, to publicize the call for talk proposals. I spent a few weeks hoping to find out if I would receive a good number of proposals or if it would be a failure. But to my happiness, I received eight proposals and I had to select six to schedule the room programming schedule due to time constraints . FOSDEM 2023 took place from February 4th to 5th and the translation devroom was scheduled on the second day in the afternoon. Fosdem 2023 The talks held in the room were these below, and in each of them you can watch the recording video. And on the first day of FOSDEM I was at the Debian stand selling the t-shirts that I had taken from Brazil. People from France were also there selling other products and it was cool to interact with people who visited the booth to buy and/or talk about Debian.
Fosdem 2023

Fosdem 2023
Photos

Joining FOSDEM 2024 The 2023 result motivated me to propose the translation devroom again when the FOSDEM 2024 organization opened the call for rooms . I was waiting to find out if the FOSDEM organization would accept a room on this topic for the second year in a row and to my delight, my proposal was accepted again :-) This time I received 11 proposals! And again due to time constraints, I had to select six to schedule the room schedule grid. FOSDEM 2024 took place from February 3rd to 4th and the translation devroom was scheduled for the second day again, but this time in the morning. The talks held in the room were these below, and in each of them you can watch the recording video. This time I didn t help at the Debian stand because I couldn t bring t-shirts to sell from Brazil. So I just stopped by and talked to some people who were there like some DDs. But I volunteered for a few hours to operate the streaming camera in one of the main rooms.
Fosdem 2024

Fosdem 2024
Photos

Conclusion The topics of the talks in these two years were quite diverse, and all the lectures were really very good. In the 12 talks we can see how translations happen in some projects such as KDE, PostgreSQL, Debian and Mattermost. We had the presentation of tools such as LibreTranslate, Weblate, scripts, AI, data model. And also reports on the work carried out by communities in Africa, China and Indonesia. The rooms were full for some talks, a little more empty for others, but I was very satisfied with the final result of these two years. I leave my special thanks to Jonathan Carter, Debian Leader who approved my flight tickets requests so that I could join FOSDEM 2023 and 2024. This help was essential to make my trip to Brussels because flight tickets are not cheap at all. I would also like to thank my wife Jandira, who has been my travel partner :-) Bruxelas As there has been an increase in the number of proposals received, I believe that interest in the translations devroom is growing. So I intend to send the devroom proposal to FOSDEM 2025, and if it is accepted, wait for the future Debian Leader to approve helping me with the flight tickets again. We ll see.

16 November 2023

Dimitri John Ledkov: Ubuntu 23.10 significantly reduces the installed kernel footprint


Photo by Pixabay
Ubuntu systems typically have up to 3 kernels installed, before they are auto-removed by apt on classic installs. Historically the installation was optimized for metered download size only. However, kernel size growth and usage no longer warrant such optimizations. During the 23.10 Mantic Minatour cycle, I led a coordinated effort across multiple teams to implement lots of optimizations that together achieved unprecedented install footprint improvements.

Given a typical install of 3 generic kernel ABIs in the default configuration on a regular-sized VM (2 CPU cores 8GB of RAM) the following metrics are achieved in Ubuntu 23.10 versus Ubuntu 22.04 LTS:

  • 2x less disk space used (1,417MB vs 2,940MB, including initrd)

  • 3x less peak RAM usage for the initrd boot (68MB vs 204MB)

  • 0.5x increase in download size (949MB vs 600MB)

  • 2.5x faster initrd generation (4.5s vs 11.3s)

  • approximately the same total time (103s vs 98s, hardware dependent)


For minimal cloud images that do not install either linux-firmware or modules extra the numbers are:

  • 1.3x less disk space used (548MB vs 742MB)

  • 2.2x less peak RAM usage for initrd boot (27MB vs 62MB)

  • 0.4x increase in download size (207MB vs 146MB)


Hopefully, the compromise of download size, relative to the disk space & initrd savings is a win for the majority of platforms and use cases. For users on extremely expensive and metered connections, the likely best saving is to receive air-gapped updates or skip updates.

This was achieved by precompressing kernel modules & firmware files with the maximum level of Zstd compression at package build time; making actual .deb files uncompressed; assembling the initrd using split cpio archives - uncompressed for the pre-compressed files, whilst compressing only the userspace portions of the initrd; enabling in-kernel module decompression support with matching kmod; fixing bugs in all of the above, and landing all of these things in time for the feature freeze. Whilst leveraging the experience and some of the design choices implementations we have already been shipping on Ubuntu Core. Some of these changes are backported to Jammy, but only enough to support smooth upgrades to Mantic and later. Complete gains are only possible to experience on Mantic and later.

The discovered bugs in kernel module loading code likely affect systems that use LoadPin LSM with kernel space module uncompression as used on ChromeOS systems. Hopefully, Kees Cook or other ChromeOS developers pick up the kernel fixes from the stable trees. Or you know, just use Ubuntu kernels as they do get fixes and features like these first.

The team that designed and delivered these changes is large: Benjamin Drung, Andrea Righi, Juerg Haefliger, Julian Andres Klode, Steve Langasek, Michael Hudson-Doyle, Robert Kratky, Adrien Nader, Tim Gardner, Roxana Nicolescu - and myself Dimitri John Ledkov ensuring the most optimal solution is implemented, everything lands on time, and even implementing portions of the final solution.

Hi, It's me, I am a Staff Engineer at Canonical and we are hiring https://canonical.com/careers.

Lots of additional technical details and benchmarks on a huge range of diverse hardware and architectures, and bikeshedding all the things below:

For questions and comments please post to Kernel section on Ubuntu Discourse.



11 November 2023

Reproducible Builds: Reproducible Builds in October 2023

Welcome to the October 2023 report from the Reproducible Builds project. In these reports we outline the most important things that we have been up to over the past month. As a quick recap, whilst anyone may inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries.

Reproducible Builds Summit 2023 Between October 31st and November 2nd, we held our seventh Reproducible Builds Summit in Hamburg, Germany! Our summits are a unique gathering that brings together attendees from diverse projects, united by a shared vision of advancing the Reproducible Builds effort, and this instance was no different. During this enriching event, participants had the opportunity to engage in discussions, establish connections and exchange ideas to drive progress in this vital field. A number of concrete outcomes from the summit will documented in the report for November 2023 and elsewhere. Amazingly the agenda and all notes from all sessions are already online. The Reproducible Builds team would like to thank our event sponsors who include Mullvad VPN, openSUSE, Debian, Software Freedom Conservancy, Allotropia and Aspiration Tech.

Reflections on Reflections on Trusting Trust Russ Cox posted a fascinating article on his blog prompted by the fortieth anniversary of Ken Thompson s award-winning paper, Reflections on Trusting Trust:
[ ] In March 2023, Ken gave the closing keynote [and] during the Q&A session, someone jokingly asked about the Turing award lecture, specifically can you tell us right now whether you have a backdoor into every copy of gcc and Linux still today?
Although Ken reveals (or at least claims!) that he has no such backdoor, he does admit that he has the actual code which Russ requests and subsequently dissects in great but accessible detail.

Ecosystem factors of reproducible builds Rahul Bajaj, Eduardo Fernandes, Bram Adams and Ahmed E. Hassan from the Maintenance, Construction and Intelligence of Software (MCIS) laboratory within the School of Computing, Queen s University in Ontario, Canada have published a paper on the Time to fix, causes and correlation with external ecosystem factors of unreproducible builds. The authors compare various response times within the Debian and Arch Linux distributions including, for example:
Arch Linux packages become reproducible a median of 30 days quicker when compared to Debian packages, while Debian packages remain reproducible for a median of 68 days longer once fixed.
A full PDF of their paper is available online, as are many other interesting papers on MCIS publication page.

NixOS installation image reproducible On the NixOS Discourse instance, Arnout Engelen (raboof) announced that NixOS have created an independent, bit-for-bit identical rebuilding of the nixos-minimal image that is used to install NixOS. In their post, Arnout details what exactly can be reproduced, and even includes some of the history of this endeavour:
You may remember a 2021 announcement that the minimal ISO was 100% reproducible. While back then we successfully tested that all packages that were needed to build the ISO were individually reproducible, actually rebuilding the ISO still introduced differences. This was due to some remaining problems in the hydra cache and the way the ISO was created. By the time we fixed those, regressions had popped up (notably an upstream problem in Python 3.10), and it isn t until this week that we were back to having everything reproducible and being able to validate the complete chain.
Congratulations to NixOS team for reaching this important milestone! Discussion about this announcement can be found underneath the post itself, as well as on Hacker News.

CPython source tarballs now reproducible Seth Larson published a blog post investigating the reproducibility of the CPython source tarballs. Using diffoscope, reprotest and other tools, Seth documents his work that led to a pull request to make these files reproducible which was merged by ukasz Langa.

New arm64 hardware from Codethink Long-time sponsor of the project, Codethink, have generously replaced our old Moonshot-Slides , which they have generously hosted since 2016 with new KVM-based arm64 hardware. Holger Levsen integrated these new nodes to the Reproducible Builds continuous integration framework.

Community updates On our mailing list during October 2023 there were a number of threads, including:
  • Vagrant Cascadian continued a thread about the implementation details of a snapshot archive server required for reproducing previous builds. [ ]
  • Akihiro Suda shared an update on BuildKit, a toolkit for building Docker container images. Akihiro links to a interesting talk they recently gave at DockerCon titled Reproducible builds with BuildKit for software supply-chain security.
  • Alex Zakharov started a thread discussing and proposing fixes for various tools that create ext4 filesystem images. [ ]
Elsewhere, Pol Dellaiera made a number of improvements to our website, including fixing typos and links [ ][ ], adding a NixOS Flake file [ ] and sorting our publications page by date [ ]. Vagrant Cascadian presented Reproducible Builds All The Way Down at the Open Source Firmware Conference.

Distribution work distro-info is a Debian-oriented tool that can provide information about Debian (and Ubuntu) distributions such as their codenames (eg. bookworm) and so on. This month, Benjamin Drung uploaded a new version of distro-info that added support for the SOURCE_DATE_EPOCH environment variable in order to close bug #1034422. In addition, 8 reviews of packages were added, 74 were updated and 56 were removed this month, all adding to our knowledge about identified issues. Bernhard M. Wiedemann published another monthly report about reproducibility within openSUSE.

Software development The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including: In addition, Chris Lamb fixed an issue in diffoscope, where if the equivalent of file -i returns text/plain, fallback to comparing as a text file. This was originally filed as Debian bug #1053668) by Niels Thykier. [ ] This was then uploaded to Debian (and elsewhere) as version 251.

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In October, a number of changes were made by Holger Levsen:
  • Debian-related changes:
    • Refine the handling of package blacklisting, such as sending blacklisting notifications to the #debian-reproducible-changes IRC channel. [ ][ ][ ]
    • Install systemd-oomd on all Debian bookworm nodes (re. Debian bug #1052257). [ ]
    • Detect more cases of failures to delete schroots. [ ]
    • Document various bugs in bookworm which are (currently) being manually worked around. [ ]
  • Node-related changes:
    • Integrate the new arm64 machines from Codethink. [ ][ ][ ][ ][ ][ ]
    • Improve various node cleanup routines. [ ][ ][ ][ ]
    • General node maintenance. [ ][ ][ ][ ]
  • Monitoring-related changes:
    • Remove unused Munin monitoring plugins. [ ]
    • Complain less visibly about too many installed kernels. [ ]
  • Misc:
    • Enhance the firewall handling on Jenkins nodes. [ ][ ][ ][ ]
    • Install the fish shell everywhere. [ ]
In addition, Vagrant Cascadian added some packages and configuration for snapshot experiments. [ ]

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

26 November 2022

Benjamin Mako Hill: The Financial Times has been printing an obvious error on its Market Data page for 18 months and nobody else seems to have noticed

Market Data section of the Financial Times US Edition print edition from May 5, 2021.
If you ve flipped through printed broadsheet newspapers, you ve probably seen pages full of tiny text listing prices and other market information for stocks and commodities. And you ve almost certainly just turned the page. Anybody interested in this market prices today will turn to the internet where these numbers are available in real time and where you don t need to squint to find what you need. This is presumably why many newspapers have stopped printing these types of pages or dramatically reduced the space devoted to them. Major financial newspapers however like the Financial Times (FT) still print multiple pages of market data daily. But does anybody read them? The answer appears to be no. How do I know? I noticed an error in the FT s Market Data page that anybody looking in the relevant section of the page would have seen. And I have seen it reproduced every single day for the last 18 months. In early May last year, I noticed that the Japanese telecom giant Nippon Telegraph and Telephone (NTT) was listed twice on the FT s list of the 500 largest global companies: once as Nippon T&T and also as Nippon TT. One right above the other. All the numbers are identical. Clearly a mistake.
Reproduction of the FT Market Data section showing a subset of Japanese companies from the FT 500 list of global companies. The duplicate lines are highlighted in yellow. This page is from today s paper (November 26, 2022).
Wondering if it was a one-off error, I looked at a copy of the paper from about a week before and saw that the error did not exist then. I looked at a copy from one day before and saw that it did. Since the issue was apparently recurring, but new at the time, I figured someone at the paper would notice and fix it quickly. I was wrong. It has been 18 months now and the error has been reproduced every single day. Looking through the archives, it seems that the first day the error showed up was May 5, 2021. I ve included a screenshot from the electronic paper version from that day and from the fifth of every month since then (or the sixth if the paper was not printed on the fifth) that shows that the error is reproduced every day. A quick look in the archives suggests it not only appears in the US edition but also in the UK, European, Asian, and Middle East editions. All of them. Why does this matter? The FT prints over 112,000 copies of its paper, six days a week. This duplicate line takes up almost no space, of course, so it s not a big deal on its own. But devoting two full broadsheet pages to market data that is out date as soon as it is printed much of which nobody appears to be reading doesn t seem like a great use of resources. There s an argument to made that papers like the FT print these pages not because they are useful but because doing so is a signal of the publications identities as serious financial papers. But that hardly seems like a good enough reason on its own if nobody is looking at them. It seems well past time for newspapers to stop wasting paper and ink on these pages. I respect that some people think that printing paper newspapers at all is wasteful when one can just read the material online. Plenty of people disagree, of course. But who will disagree with a call to stop printing material that evidence suggests is not being seen by anybody? If an error this obvious can exist for so long, it seems clear that nobody not even anybody at the FT itself is reading it.

9 November 2021

Benjamin Mako Hill: The Hidden Costs of Requiring Accounts

Should online communities require people to create accounts before participating? This question has been a source of disagreement among people who start or manage online communities for decades. Requiring accounts makes some sense since users contributing without accounts are a common source of vandalism, harassment, and low quality content. In theory, creating an account can deter these kinds of attacks while still making it pretty quick and easy for newcomers to join. Also, an account requirement seems unlikely to affect contributors who already have accounts and are typically the source of most valuable contributions. Creating accounts might even help community members build deeper relationships and commitments to the group in ways that lead them to stick around longer and contribute more.
In a new paper published in Communication Research, I worked with Aaron Shaw provide an answer. We analyze data from natural experiments that occurred when 136 wikis on Fandom.com started requiring user accounts. Although we find strong evidence that the account requirements deterred low quality contributions, this came at a substantial (and usually hidden) cost: a much larger decrease in high quality contributions. Surprisingly, the cost includes lost contributions from community members who had accounts already, but whose activity appears to have been catalyzed by the (often low quality) contributions from those without accounts.
A version of this post was first posted on the Community Data Science blog. The full citation for the paper is: Hill, Benjamin Mako, and Aaron Shaw. 2020. The Hidden Costs of Requiring Accounts: Quasi-Experimental Evidence from Peer Production. Communication Research, 48 (6): 771 95. https://doi.org/10.1177/0093650220910345. If you do not have access to the paywalled journal, please check out this pre-print or get in touch with us. We have also released replication materials for the paper, including all the data and code used to conduct the analysis and compile the paper itself.

3 November 2021

Benjamin Mako Hill: Q&A about doing a PhD with my research group

Ever considered doing research about online communities, free culture/software, and peer production full time? It s PhD admission season and my research group the Community Data Science Collective is doing an open-to-anyone Q&A about PhD admissions this Friday November 5th. We ve got room in the session and its not too late to sign up to join us! The session will be a good opportunity to hear from and talk to faculty recruiting students to our various programs at the University of Washington, Purdue, and Northwestern and to talk with current and previous students in the group. I am hoping to admit at least one new PhD advisee to the Department of Communication at UW this year (maybe more) and am currently co-advising (and/or have previously co-advised) students in UW s Allen School of Computer Science & Engineering, Department of Human-Centered Design & Engineering, and Information School. One thing to keep in mind is that my primary/home department Communication has a deadline for PhD applications of November 15th this year. The registration deadline for the Q&A session is listed as today but we ll do what we can to sneak you in even if you register late. That said, please do register ASAP so we can get you the link to the session!

6 October 2021

Reproducible Builds: Reproducible Builds in September 2021

The goal behind reproducible builds is to ensure that no deliberate flaws have been introduced during compilation processes via promising or mandating that identical results are always generated from a given source. This allowing multiple third-parties to come to an agreement on whether a build was compromised or not by a system of distributed consensus. In these reports we outline the most important things that have been happening in the world of reproducible builds in the past month:
First mentioned in our March 2021 report, Martin Heinz published two blog posts on sigstore, a project that endeavours to offer software signing as a public good, [the] software-signing equivalent to Let s Encrypt . The two posts, the first entitled Sigstore: A Solution to Software Supply Chain Security outlines more about the project and justifies its existence:
Software signing is not a new problem, so there must be some solution already, right? Yes, but signing software and maintaining keys is very difficult especially for non-security folks and UX of existing tools such as PGP leave much to be desired. That s why we need something like sigstore - an easy to use software/toolset for signing software artifacts.
The second post (titled Signing Software The Easy Way with Sigstore and Cosign) goes into some technical details of getting started.
There was an interesting thread in the /r/Signal subreddit that started from the observation that Signal s apk doesn t match with the source code:
Some time ago I checked Signal s reproducibility and it failed. I asked others to test in case I did something wrong, but nobody made any reports. Since then I tried to test the Google Play Store version of the apk against one I compiled myself, and that doesn t match either.

BitcoinBinary.org was announced this month, which aims to be a repository of Reproducible Build Proofs for Bitcoin Projects :
Most users are not capable of building from source code themselves, but we can at least get them able enough to check signatures and shasums. When reputable people who can tell everyone they were able to reproduce the project s build, others at least have a secondary source of validation.

Distribution work Fr d ric Pierret announced a new testing service at beta.tests.reproducible-builds.org, showing actual rebuilds of binaries distributed by both the Debian and Qubes distributions. In Debian specifically, however, 51 reviews of Debian packages were added, 31 were updated and 31 were removed this month to our database of classified issues. As part of this, Chris Lamb refreshed a number of notes, including the build_path_in_record_file_generated_by_pybuild_flit_plugin issue. Elsewhere in Debian, Roland Clobus posted his Fourth status update about reproducible live-build ISO images in Jenkins to our mailing list, which mentions (amongst other things) that:
  • All major configurations are still built regularly using live-build and bullseye.
  • All major configurations are reproducible now; Jenkins is green.
    • I ve worked around the issue for the Cinnamon image.
    • The patch was accepted and released within a few hours.
  • My main focus for the last month was on the live-build tool itself.
Related to this, there was continuing discussion on how to embed/encode the build metadata for the Debian live images which were being worked on by Roland Clobus.
Ariadne Conill published another detailed blog post related to various security initiatives within the Alpine Linux distribution. After summarising some conventional security work being done (eg. with sudo and the release of OpenSSH version 3.0), Ariadne included another section on reproducible builds: The main blocker [was] determining what to do about storing the build metadata so that a build environment can be recreated precisely . Finally, Bernhard M. Wiedemann posted his monthly reproducible builds status report.

Community news On our website this month, Bernhard M. Wiedemann fixed some broken links [ ] and Holger Levsen made a number of changes to the Who is Involved? page [ ][ ][ ]. On our mailing list, Magnus Ihse Bursie started a thread with the subject Reproducible builds on Java, which begins as follows:
I m working for Oracle in the Build Group for OpenJDK which is primary responsible for creating a built artifact of the OpenJDK source code. [ ] For the last few years, we have worked on a low-effort, background-style project to make the build of OpenJDK itself building reproducible. We ve come far, but there are still issues I d like to address. [ ]

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 183, 184 and 185 as well as performed significant triaging of merge requests and other issues in addition to making the following changes:
  • New features:
    • Support a newer format version of the R language s .rds files. [ ]
    • Update tests for OCaml 4.12. [ ]
    • Add a missing format_class import. [ ]
  • Bug fixes:
    • Don t call close_archive when garbage collecting Archive instances, unless open_archive definitely returned successfully. This prevents, for example, an AttributeError where PGPContainer s cleanup routines were rightfully assuming that its temporary directory had actually been created. [ ]
    • Fix (and test) the comparison of R language s .rdb files after refactoring temporary directory handling. [ ]
    • Ensure that RPM archives exists in the Debian package description, regardless of whether python3-rpm is installed or not at build time. [ ]
  • Codebase improvements:
    • Use our assert_diff routine in tests/comparators/test_rdata.py. [ ]
    • Move diffoscope.versions to diffoscope.tests.utils.versions. [ ]
    • Reformat a number of modules with Black. [ ][ ]
However, the following changes were also made:
  • Mattia Rizzolo:
    • Fix an autopkgtest caused by the androguard module not being in the (expected) python3-androguard Debian package. [ ]
    • Appease a shellcheck warning in debian/tests/control.sh. [ ]
    • Ignore a warning from h5py in our tests that doesn t concern us. [ ]
    • Drop a trailing .1 from the Standards-Version field as it s required. [ ]
  • Zbigniew J drzejewski-Szmek:
    • Stop using the deprecated distutils.spawn.find_executable utility. [ ][ ][ ][ ][ ]
    • Adjust an LLVM-related test for LLVM version 13. [ ]
    • Update invocations of llvm-objdump. [ ]
    • Adjust a test with a one-byte text file for file version 5.40. [ ]
And, finally, Benjamin Peterson added a --diff-context option to control unified diff context size [ ] and Jean-Romain Garnier fixed the Macho comparator for architectures other than x86-64 [ ].

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Testing framework The Reproducible Builds project runs a testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Holger Levsen:
    • Drop my package rebuilder prototype as it s not useful anymore. [ ]
    • Schedule old packages in Debian bookworm. [ ]
    • Stop scheduling packages for Debian buster. [ ][ ]
    • Don t include PostgreSQL debug output in package lists. [ ]
    • Detect Python library mismatches during build in the node health check. [ ]
    • Update a note on updating the FreeBSD system. [ ]
  • Mattia Rizzolo:
    • Silence a warning from Git. [ ]
    • Update a setting to reflect that Debian bookworm is the new testing. [ ]
    • Upgrade the PostgreSQL database to version 13. [ ]
  • Roland Clobus (Debian live image generation):
    • Workaround non-reproducible config files in the libxml-sax-perl package. [ ]
    • Use the new DNS for the snapshot service. [ ]
  • Vagrant Cascadian:
    • Also note that the armhf architecture also systematically varies by the kernel. [ ]

Contributing If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

21 September 2021

Russell Coker: Links September 2021

Matthew Garrett wrote an interesting and insightful blog post about the license of software developed or co-developed by machine-learning systems [1]. One of his main points is that people in the FOSS community should aim for less copyright protection. The USENIX ATC 21/OSDI 21 Joint Keynote Address titled It s Time for Operating Systems to Rediscover Hardware has some inssightful points to make [2]. Timothy Roscoe makes some incendiaty points but backs them up with evidence. Is Linux really an OS? I recommend that everyone who s interested in OS design watch this lecture. Cory Doctorow wrote an interesting set of 6 articles about Disneyland, ride pricing, and crowd control [3]. He proposes some interesting ideas for reforming Disneyland. Benjamin Bratton wrote an insightful article about how philosophy failed in the pandemic [4]. He focuses on the Italian philosopher Giorgio Agamben who has a history of writing stupid articles that match Qanon talking points but with better language skills. Arstechnica has an interesting article about penetration testers extracting an encryption key from the bus used by the TPM on a laptop [5]. It s not a likely attack in the real world as most networks can be broken more easily by other methods. But it s still interesting to learn about how the technology works. The Portalist has an article about David Brin s Startide Rising series of novels and his thought s on the concept of Uplift (which he denies inventing) [6]. Jacobin has an insightful article titled You re Not Lazy But Your Boss Wants You to Think You Are [7]. Making people identify as lazy is bad for them and bad for getting them to do work. But this is the first time I ve seen it described as a facet of abusive capitalism. Jacobin has an insightful article about free public transport [8]. Apparently there are already many regions that have free public transport (Tallinn the Capital of Estonia being one example). Fare free public transport allows bus drivers to concentrate on driving not taking fares, removes the need for ticket inspectors, and generally provides a better service. It allows passengers to board buses and trams faster thus reducing traffic congestion and encourages more people to use public transport instead of driving and reduces road maintenance costs. Interesting research from Israel about bypassing facial ID [9]. Apparently they can make a set of 9 images that can pass for over 40% of the population. I didn t expect facial recognition to be an effective form of authentication, but I didn t expect it to be that bad. Edward Snowden wrote an insightful blog post about types of conspiracies [10]. Kevin Rudd wrote an informative article about Sky News in Australia [11]. We need to have a Royal Commission now before we have our own 6th Jan event. Steve from Big Mess O Wires wrote an informative blog post about USB-C and 4K 60Hz video [12]. Basically you can t have a single USB-C hub do 4K 60Hz video and be a USB 3.x hub unless you have compression software running on your PC (slow and only works on Windows), or have DisplayPort 1.4 or Thunderbolt (both not well supported). All of the options are not well documented on online store pages so lots of people will get unpleasant surprises when their deliveries arrive. Computers suck. Steinar H. Gunderson wrote an informative blog post about GaN technology for smaller power supplies [13]. A 65W USB-C PSU that fits the usual wall wart form factor is an interesting development.

17 September 2021

Reproducible Builds (diffoscope): diffoscope 184 released

The diffoscope maintainers are pleased to announce the release of diffoscope version 184. This version includes the following changes:
[ Chris Lamb ]
* Fix the semantic comparison of R's .rdb files after a refactoring of
  temporary directory handling in a previous version.
* Support a newer format version of R's .rds files.
* Update tests for OCaml 4.12. (Closes: reproducible-builds/diffoscope#274)
* Move diffoscope.versions to diffoscope.tests.utils.versions.
* Use assert_diff in tests/comparators/test_rdata.py.
* Reformat various modules with Black.
[ Zbigniew J drzejewski-Szmek ]
* Stop using the deprecated distutils module by adding a version
  comparison class based on the RPM version rules.
* Update invocations of llvm-objdump for the latest version of LLVM.
* Adjust a test with one-byte text file for file(1) version 5.40.
* Improve the parsing of the version of OpenSSH.
[ Benjamin Peterson ]
* Add a --diff-context option to control the unified diff context size.
  (reproducible-builds/diffoscope!88)
You find out more by visiting the project homepage.

31 August 2021

Benjamin Mako Hill: Returning to DebConf

I first started using Debian sometime in the mid 90s and started contributing as a developer and package maintainer more than two decades years ago. My first very first scholarly publication, collaborative work led by Martin Michlmayr that I did when I was still an undergrad at Hampshire College, was about quality and the reliance on individuals in Debian. To this day, many of my closest friends are people I first met through Debian. I met many of them at Debian s annual conference DebConf. Given my strong connections to Debian, I find it somewhat surprising that although all of my academic research has focused on peer production, free culture, and free software, I haven t actually published any Debian related research since that first paper with Martin in 2003! So it felt like coming full circle when, several days ago, I was able to sit in the virtual DebConf audience and watch two of my graduate student advisees Kaylea Champion and Wm Salt Hale present their research about Debian at DebConf21. Salt presented his masters thesis work which tried to understand the social dynamics behind organizational resilience among free software projects. Kaylea presented her work on a new technique she developed to identifying at risk software packages that are lower quality than we might hope given their popularity (you can read more about Kaylea s project in our blog post from earlier this year). If you missed either presentation, check out the blog post my research collective put up or watch the videos below. If you want to hear about new work we re doing including work on Debian you should follow our research group blog, and/or follow or engage with us in the Fediverse (@communitydata@social.coop), or on Twitter (@comdatasci). And if you re interested in joining us perhaps to do more research on FLOSS and/or Debian and/or a graduate degree of your own? please be in touch with me directly!
Wm Salt Hale s presentation plus Q&A. (WebM available)
Kaylea Champion s presentation plus Q&A. (WebM available)

4 May 2021

Benjamin Mako Hill: NSF CAREER Award

In exciting professional news, it was recently announced that I got an National Science Foundation CAREER award! The CAREER is the US NSF s most prestigious award for early-career faculty. In addition to the recognition, the award involves a bunch of money for me to put toward my research over the next 5 years. The Department of Communication at the University of Washington has put up a very nice web page announcing the thing. It s all very exciting and a huge honor. I m very humbled. The grant will support a bunch of new research to develop and test a theory about the relationship between governance and online community lifecycles. If you ve been reading this blog for a while, you ll know that I ve been involved in a bunch of research to describe how peer production communities tend to follow common patterns of growth and decline as well as a studies that show that many open communities become increasingly closed in ways that deter lots of the kinds contributions that made the communities successful in the first place. Over the last few years, I ve worked with Aaron Shaw to develop the outlines of an explanation for why many communities because increasingly closed over time in ways that hurt their ability to integrate contributions from newcomers. Over the course of the work on the CAREER, I ll be continuing that project with Aaron and I ll also be working to test that explanation empirically and to develop new strategies about what online communities can do as a result. In addition to supporting research, the grant will support a bunch of new outreach and community building within the Community Data Science Collective. In particular, I m planning to use the grant to do a better job of building relationships with community participants, community managers, and others in the platforms we study. I m also hoping to use the resources to help the CDSC do a better job of sharing our stuff out in ways that are useful as well doing a better job of listening and learning from the communities that our research seeks to inform. There are many to thank. The proposed work was the direct research of the work I did as the Center for Advanced Studies in the Behavioral Sciences at Stanford where I got to spend the 2018-2019 academic year in Claude Shannon s old office and talking through these ideas with an incredible range of other scholars over lunch every day. It s also the product of years of conversations with Aaron Shaw and Yochai Benkler. The proposal itself reflects the excellent work of the whole CDSC who did the work that made the award possible and provided me with detailed feedback on the proposal itself.

29 March 2021

Benjamin Mako Hill: Identifying Underproduced Software

I wrote this blog post with Kaylea Champion and a version of this post was originally posted on the Community Data Science Collective blog. Critical software we all rely on can silently crumble away beneath us. Unfortunately, we often don t find out software infrastructure is in poor condition until it is too late. Over the last year or so, I have been supporting Kaylea Champion on a project my group announced earlier to measure software underproduction a term we use to describe software that is low in quality but high in importance. Underproduction reflects an important type of risk in widely used free/libre open source software (FLOSS) because participants often choose their own projects and tasks. Because FLOSS contributors work as volunteers and choose what they work on, important projects aren t always the ones to which FLOSS developers devote the most attention. Even when developers want to work on important projects, relative neglect among important projects is often difficult for FLOSS contributors to see. Given all this, what can we do to detect problems in FLOSS infrastructure before major failures occur? Kaylea Champion and I recently published a paper laying out our new method for measuring underproduction at the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) 2021 that we believe provides one important answer to this question.

A conceptual diagram of underproduction. The x-axis shows relative importance, the y-axis relative quality. The top left area of the graph described by these axes is 'overproduction' -- high quality, low importance. The diagonal is Alignment: quality and importance are approximately the same. The lower right depicts underproduction -- high importance, low quality -- the area of potential risk.Conceptual diagram showing how our conception of underproduction relates to quality and importance of software.
In the paper, we describe a general approach for detecting underproduced software infrastructure that consists of five steps: (1) identifying a body of digital infrastructure (like a code repository); (2) identifying a measure of quality (like the time to takes to fix bugs); (3) identifying a measure of importance (like install base); (4) specifying a hypothesized relationship linking quality and importance if quality and importance are in perfect alignment; and (5) quantifying deviation from this theoretical baseline to find relative underproduction. To show how our method works in practice, we applied the technique to an important collection of FLOSS infrastructure: 21,902 packages in the Debian GNU/Linux distribution. Although there are many ways to measure quality, we used a measure of how quickly Debian maintainers have historically dealt with 461,656 bugs that have been filed over the last three decades. To measure importance, we used data from Debian s Popularity Contest opt-in survey. After some statistical machinations that are documented in our paper, the result was an estimate of relative underproduction for the 21,902 packages in Debian we looked at. One of our key findings is that underproduction is very common in Debian. By our estimates, at least 4,327 packages in Debian are underproduced. As you can see in the list of the most underproduced packages again, as estimated using just one more measure many of the most at risk packages are associated with the desktop and windowing environments where there are many users but also many extremely tricky integration-related bugs.
This table shows the 30 packages with the most severe underproduction problem in Debian, shown as a series of boxplots.These 30 packages have the highest level of underproduction in Debian according to our analysis.
We hope these results are useful to folks at Debian and the Debian QA team. We also hope that the basic method we ve laid out is something that others will build off in other contexts and apply to other software repositories.
In addition to the paper itself and the video of the conference presentation on Youtube by Kaylea, we ve put a repository with all our code and data in an archival repository Harvard Dataverse and we d love to work with others interested in applying our approach in other software ecosytems.

For more details, check out the full paper which is available as a freely accessible preprint.

This project was supported by the Ford/Sloan Digital Infrastructure Initiative. Wm Salt Hale of the Community Data Science Collective and Debian Developers Paul Wise and Don Armstrong provided valuable assistance in accessing and interpreting Debian bug data. Ren Just generously provided insight and feedback on the manuscript.

Paper Citation: Kaylea Champion and Benjamin Mako Hill. 2021. Underproduction: An Approach for Measuring Risk in Open Source Software. In Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2021). IEEE.

Contact Kaylea Champion (kaylea@uw.edu) with any questions or if you are interested in following up.

Next.