Search Results: "appro"

25 April 2024

Petter Reinholdtsen: 45 orphaned Debian packages moved to git, 391 to go

Nine days ago, I started migrating orphaned Debian packages with no version control system listed in debian/control of the source to git. At the time there were 438 such packages. Now there are 391, according to the UDD. In reality it is slightly less, as there is a delay between uploads and UDD updates. In the nine days since, I have thus been able to work my way through ten percent of the packages. I am starting to run out of steam, and hope someone else will also help brushing some dust of these packages. Here is a recipe how to do it. I start by picking a random package by querying the UDD for a list of 10 random packages from the set of remaining packages:
PGPASSWORD="udd-mirror" psql --port=5432 --host=udd-mirror.debian.net \
  --username=udd-mirror udd -c "select source from sources \
   where release = 'sid' and (vcs_url ilike '%anonscm.debian.org%' \
   OR vcs_browser ilike '%anonscm.debian.org%' or vcs_url IS NULL \
   OR vcs_browser IS NULL) AND maintainer ilike '%packages@qa.debian.org%' \
   order by random() limit 10;"
Next, I visit http://salsa.debian.org/debian and search for the package name, to ensure no git repository already exist. If it does, I clone it and try to get it to an uploadable state, and add the Vcs-* entries in d/control to make the repository more widely known. These packages are a minority, so I will not cover that use case here. For packages without an existing git repository, I run the following script debian-snap-to-salsa to prepare a git repository with the existing packaging.
#!/bin/sh
#
# See also https://bugs.debian.org/804722#31
set -e
# Move to this Standards-Version.
SV_LATEST=4.7.0
PKG="$1"
if [ -z "$PKG" ]; then
    echo "usage: $0 "
    exit 1
fi
if [ -e "$ PKG -salsa" ]; then
    echo "error: $ PKG -salsa already exist, aborting."
    exit 1
fi
if [ -z "ALLOWFAILURE" ] ; then
    ALLOWFAILURE=false
fi
# Fetch every snapshotted source package.  Manually loop until all
# transfers succeed, as 'gbp import-dscs --debsnap' do not fail on
# download failures.
until debsnap --force -v $PKG   $ALLOWFAILURE ; do sleep 1; done
mkdir $ PKG -salsa; cd $ PKG -salsa
git init
# Specify branches to override any debian/gbp.conf file present in the
# source package.
gbp import-dscs  --debian-branch=master --upstream-branch=upstream \
    --pristine-tar ../source-$PKG/*.dsc
# Add Vcs pointing to Salsa Debian project (must be manually created
# and pushed to).
if ! grep -q ^Vcs- debian/control ; then
    awk "BEGIN   s=1   /^\$/   if (s==1)   print \"Vcs-Browser: https://salsa.debian.org/debian/$PKG\"; print \"Vcs-Git: https://salsa.debian.org/debian/$PKG.git\"  ; s=0     print  " < debian/control > debian/control.new && mv debian/control.new debian/control
    git commit -m "Updated vcs in d/control to Salsa." debian/control
fi
# Tell gbp to enforce the use of pristine-tar.
inifile +inifile debian/gbp.conf +create +section DEFAULT +key pristine-tar +value True
git add debian/gbp.conf
git commit -m "Added d/gbp.conf to enforce the use of pristine-tar." debian/gbp.conf
# Update to latest Standards-Version.
SV="$(grep ^Standards-Version: debian/control awk ' print $2 ')"
if [ $SV_LATEST != $SV ]; then
    sed -i "s/\(Standards-Version: \)\(.*\)/\1$SV_LATEST/" debian/control
    git commit -m "Updated Standards-Version from $SV to $SV_LATEST." debian/control
fi
if grep -q pkg-config debian/control; then
    sed -i s/pkg-config/pkgconf/ debian/control
    git commit -m "Replaced obsolete pkg-config build dependency with pkgconf." debian/control
fi
if grep -q libncurses5-dev debian/control; then
    sed -i s/libncurses5-dev/libncurses-dev/ debian/control
    git commit -m "Replaced obsolete libncurses5-dev build dependency with libncurses-dev." debian/control
fi
Some times the debsnap script fail to download some of the versions. In those cases I investigate, and if I decide the failing versions will not be missed, I call it using ALLOWFAILURE=true to ignore the problem and create the git repository anyway. With the git repository in place, I do a test build (gbp buildpackage) to ensure the build is actually working. If it does not I pick a different package, or if the build failure is trivial to fix, I fix it before continuing. At this stage I revisit http://salsa.debian.org/debian and create the project under this group for the package. I then follow the instructions to publish the local git repository. Here is from a recent example:
git remote add origin git@salsa.debian.org:debian/perl-byacc.git
git push --set-upstream origin master upstream pristine-tar
git push --tags
With a working build, I have a look at the build rules if I want to remove some more dust. I normally try to move to debhelper compat level 13, which involves removing debian/compat and modifying debian/control to build depend on debhelper-compat (=13). I also test with 'Rules-Requires-Root: no' in debian/control and verify in debian/rules that hardening is enabled, and include all of these if the package still build. If it fail to build with level 13, I try with 12, 11, 10 and so on until I find a level where it build, as I do not want to spend a lot of time fixing build issues. Some times, when I feel inspired, I make sure debian/copyright is converted to the machine readable format, often by starting with 'debhelper -cc' and then cleaning up the autogenerated content until it matches realities. If I feel like it, I might also clean up non-dh-based debian/rules files to use the short style dh build rules. Once I have removed all the dust I care to process for the package, I run 'gbp dch' to generate a debian/changelog entry based on the commits done so far, run 'dch -r' to switch from 'UNRELEASED' to 'unstable' and get an editor to make sure the 'QA upload' marker is in place and that all long commit descriptions are wrapped into sensible lengths, run 'debcommit --release -a' to commit and tag the new debian/changelog entry, run 'debuild -S' to build a source only package, and 'dput ../perl-byacc_2.0-10_source.changes' to do the upload. During the entire process, and many times per step, I run 'debuild' to verify the changes done still work. I also some times verify the set of built files using 'find debian' to see if I can spot any problems (like no file in usr/bin any more or empty package). I also try to fix all lintian issues reported at the end of each 'debuild' run. If I find Debian specific patches, I try to ensure their metadata is fairly up to date and some times I even try to reach out to upstream, to make the upstream project aware of the patches. Most of my emails bounce, so the success rate is low. For projects with no Homepage entry in debian/control I try to track down one, and for packages with no debian/watch file I try to create one. But at least for some of the packages I have been unable to find a functioning upstream, and must skip both of these. If I could handle ten percent in nine days, twenty people could complete the rest in less then five days. I use approximately twenty minutes per package, when I have twenty minutes spare time to spend. Perhaps you got twenty minutes to spare too? As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

20 April 2024

Bastian Venthur: Help needed: creating a WSDL file to interact with debbugs

I am upstream and Debian package maintainer of python-debianbts, which is a Python library that allows for querying Debian s Bug Tracking System (BTS). python-debianbts is used by reportbug, the standard tool to report bugs in Debian, and therefore the glue between the reportbug and the BTS. debbugs, the software that powers Debian s BTS, provides a SOAP interface for querying the BTS. Unfortunately, SOAP is not a very popular protocol anymore, and I m facing the second migration to another underlying SOAP library as they continue to become unmaintained over time. Zeep, the library I m currently considering, requires a WSDL file in order to work with a SOAP service, however, debbugs does not provide one. Since I m not familiar with WSDL, I need help from someone who can create a WSDL file for debbugs, so I can migrate python-debianbts away from pysimplesoap to zeep. How did we get here? Back in the olden days, reportbug was querying the BTS by parsing its HTML output. While this worked, it tightly coupled the user-facing presentation of the BTS with critical functionality of the bug reporting tool. The setup was fragile, prone to breakage, and did not allow changing anything in the BTS frontend for fear of breaking reportbug itself. In 2007, I started to work on reportbug-ng, a user-friendly alternative to reportbug, targeted at users not comfortable using the command line. Early on, I decided to use the BTS SOAP interface instead of parsing HTML like reportbug did. 2008, I extracted the code that dealt with the BTS into a separate Python library, and after some collaboration with the reportbug maintainers, reportbug adopted python-debianbts in 2011 and has used it ever since. 2015, I was working on porting python-debianbts to Python 3. During that process, it turned out that its major dependency, SoapPy was pretty much unmaintained for years and blocking the Python3 transition. Thanks to the help of Gaetano Guerriero, who ported python-debianbts to pysimplesoap, the migration was unblocked and could proceed. In 2024, almost ten years later, pysimplesoap seems to be unmaintained as well, and I have to look again for alternatives. The most promising one right now seems to be zeep. Unfortunately, zeep requires a WSDL file for working with a SOAP service, which debbugs does not provide. How can you help? reportbug (and thus python-debianbts) is used by thousands of users and I have a certain responsibility to keep things working properly. Since I simply don t know enough about WSDL to create such a file for debbugs myself, I m looking for someone who can help me with this task. If you re familiar with SOAP, WSDL and optionally debbugs, please get in touch with me. I don t speak Pearl, so I m not really able to read debbugs code, but I do know some things about the SOAP requests and replies due to my work on python-debianbts, so I m sure we can work something out. There is a WSDL file for a debbugs version used by GNU, but I don t think it s official and it currently does not work with zeep. It may be a good starting point, though. The future of debbugs API While we can probably continue to support debbugs SOAP interface for a while, I don t think it s very sustainable in the long run. A simpler, well documented REST API that returns JSON seems more appropriate nowadays. The queries and replies that debbugs currently supports are simple enough to design a REST API with JSON around it. The benefit would be less complex libraries on the client side and probably easier maintainability on the server side as well. debbugs maintainer seemed to be in agreement with this idea back in 2018. I created an attempt to define a new API (HTML render), but somehow we got stuck and no progress has been made since then. I m still happy to help shaping such an API for debbugs, but I can t really implement anything in debbugs itself, as it is written in Perl, which I m not familiar with.

18 April 2024

Russ Allbery: Review: Unseen Academicals

Review: Unseen Academicals, by Terry Pratchett
Series: Discworld #37
Publisher: Harper
Copyright: October 2009
Printing: November 2014
ISBN: 0-06-233500-6
Format: Mass market
Pages: 517
Unseen Academicals is the 37th Discworld novel and includes many of the long-standing Ankh-Morpork cast, but mostly as supporting characters. The main characters are a new (and delightful) bunch with their own concerns. You arguably could start reading here if you really wanted to, although you would risk spoiling several previous books (most notably Thud!) and will miss some references that depend on familiarity with the cast. The Unseen University is, like most institutions of its sort, funded by an endowment that allows the wizards to focus on the pure life of the mind (or the stomach). Much to their dismay, they have just discovered that an endowment that amounts to most of their food budget requires that they field a football team. Glenda runs the night kitchen at the Unseen University. Given the deep and abiding love that wizards have for food, there is both a main kitchen and a night kitchen. The main kitchen is more prestigious, but the night kitchen is responsible for making pies, something that Glenda is quietly but exceptionally good at. Juliet is Glenda's new employee. She is exceptionally beautiful, not very bright, and a working class girl of the Ankh-Morpork streets down to her bones. Trevor Likely is a candle dribbler, responsible for assisting the Candle Knave in refreshing the endless university candles and ensuring that their wax is properly dribbled, although he pushes most of that work off onto the infallibly polite and oddly intelligent Mr. Nutt. Glenda, Trev, and Juliet are the sort of people who populate the great city of Ankh-Morpork. While the people everyone has heard of have political crises, adventures, and book plots, they keep institutions like the Unseen University running. They read romance novels, go to the football games, and nurse long-standing rivalries. They do not expect the high mucky-mucks to enter their world, let alone mess with their game. I approached Unseen Academicals with trepidation because I normally don't get along as well with the Discworld wizard books. I need not have worried; Pratchett realized that the wizards would work better as supporting characters and instead turns the main plot (or at least most of it; more on that later) over to the servants. This was a brilliant decision. The setup of this book is some of the best of Discworld up to this point. Trev is a streetwise rogue with an uncanny knack for kicking around a can that he developed after being forbidden to play football by his dear old mum. He falls for Juliet even though their families support different football teams, so you may think that a Romeo and Juliet spoof is coming. There are a few gestures of one, but Pratchett deftly avoids the pitfalls and predictability and instead makes Juliet one of the best characters in the book by playing directly against type. She is one of the characters that Pratchett is so astonishingly good at, the ones that are so thoroughly themselves that they transcend the stories they're put into. The heart of this book, though, is Glenda.
Glenda enjoyed her job. She didn't have a career; they were for people who could not hold down jobs.
She is the kind of person who knows where she fits in the world and likes what she does and is happy to stay there until she decides something isn't right, and then she changes the world through the power of common sense morality, righteous indignation, and sheer stubborn persistence. Discworld is full of complex and subtle characters fencing with each other, but there are few things I have enjoyed more than Glenda being a determinedly good person. Vetinari of course recognizes and respects (and uses) that inner core immediately. Unfortunately, as great as the setup and characters are, Unseen Academicals falls apart a bit at the end. I was eagerly reading the story, wondering what Pratchett was going to weave out of the stories of these individuals, and then it partly turned into yet another wizard book. Pratchett pulled another of his deus ex machina tricks for the climax in a way that I found unsatisfying and contrary to the tone of the rest of the story, and while the characters do get reasonable endings, it lacked the oomph I was hoping for. Rincewind is as determinedly one-note as ever, the wizards do all the standard wizard things, and the plot just isn't that interesting. I liked Mr. Nutt a great deal in the first part of the book, and I wish he could have kept that edge of enigmatic competence and unflappableness. Pratchett wanted to tell a different story that involved more angst and self-doubt, and while I appreciate that story, I found it less engaging and a bit more melodramatic than I was hoping for. Mr. Nutt's reactions in the last half of the book were believable and fit his background, but that was part of the problem: he slotted back into an archetype that I thought Pratchett was going to twist and upend. Mr. Nutt does, at least, get a fantastic closing line, and as usual there are a lot of great asides and quotes along the way, including possibly the sharpest and most biting Vetinari speech of the entire series.
The Patrician took a sip of his beer. "I have told this to few people, gentlemen, and I suspect never will again, but one day when I was a young boy on holiday in Uberwald I was walking along the bank of a stream when I saw a mother otter with her cubs. A very endearing sight, I'm sure you will agree, and even as I watched, the mother otter dived into the water and came up with a plump salmon, which she subdued and dragged on to a half-submerged log. As she ate it, while of course it was still alive, the body split and I remember to this day the sweet pinkness of its roes as they spilled out, much to the delight of the baby otters who scrambled over themselves to feed on the delicacy. One of nature's wonders, gentlemen: mother and children dining on mother and children. And that's when I first learned about evil. It is built into the very nature of the universe. Every world spins in pain. If there is any kind of supreme being, I told myself, it is up to all of us to become his moral superior."
My dissatisfaction with the ending prevents Unseen Academicals from rising to the level of Night Watch, and it's a bit more uneven than the best books of the series. Still, though, this is great stuff; recommended to anyone who is reading the series. Followed in publication order by I Shall Wear Midnight. Rating: 8 out of 10

13 April 2024

Simon Josefsson: Reproducible and minimal source-only tarballs

With the release of Libntlm version 1.8 the release tarball can be reproduced on several distributions. We also publish a signed minimal source-only tarball, produced by git-archive which is the same format used by Savannah, Codeberg, GitLab, GitHub and others. Reproducibility of both tarballs are tested continuously for regressions on GitLab through a CI/CD pipeline. If that wasn t enough to excite you, the Debian packages of Libntlm are now built from the reproducible minimal source-only tarball. The resulting binaries are reproducible on several architectures. What does that even mean? Why should you care? How you can do the same for your project? What are the open issues? Read on, dear reader This article describes my practical experiments with reproducible release artifacts, following up on my earlier thoughts that lead to discussion on Fosstodon and a patch by Janneke Nieuwenhuizen to make Guix tarballs reproducible that inspired me to some practical work. Let s look at how a maintainer release some software, and how a user can reproduce the released artifacts from the source code. Libntlm provides a shared library written in C and uses GNU Make, GNU Autoconf, GNU Automake, GNU Libtool and gnulib for build management, but these ideas should apply to most project and build system. The following illustrate the steps a maintainer would take to prepare a release:
git clone https://gitlab.com/gsasl/libntlm.git
cd libntlm
git checkout v1.8
./bootstrap
./configure
make distcheck
gpg -b libntlm-1.8.tar.gz
The generated files libntlm-1.8.tar.gz and libntlm-1.8.tar.gz.sig are published, and users download and use them. This is how the GNU project have been doing releases since the late 1980 s. That is a testament to how successful this pattern has been! These tarballs contain source code and some generated files, typically shell scripts generated by autoconf, makefile templates generated by automake, documentation in formats like Info, HTML, or PDF. Rarely do they contain binary object code, but historically that happened. The XZUtils incident illustrate that tarballs with files that are not included in the git archive offer an opportunity to disguise malicious backdoors. I blogged earlier how to mitigate this risk by using signed minimal source-only tarballs. The risk of hiding malware is not the only motivation to publish signed minimal source-only tarballs. With pre-generated content in tarballs, there is a risk that GNU/Linux distributions such as Trisquel, Guix, Debian/Ubuntu or Fedora ship generated files coming from the tarball into the binary *.deb or *.rpm package file. Typically the person packaging the upstream project never realized that some installed artifacts was not re-built through a typical autoconf -fi && ./configure && make install sequence, and never wrote the code to rebuild everything. This can also happen if the build rules are written but are buggy, shipping the old artifact. When a security problem is found, this can lead to time-consuming situations, as it may be that patching the relevant source code and rebuilding the package is not sufficient: the vulnerable generated object from the tarball would be shipped into the binary package instead of a rebuilt artifact. For architecture-specific binaries this rarely happens, since object code is usually not included in tarballs although for 10+ years I shipped the binary Java JAR file in the GNU Libidn release tarball, until I stopped shipping it. For interpreted languages and especially for generated content such as HTML, PDF, shell scripts this happens more than you would like. Publishing minimal source-only tarballs enable easier auditing of a project s code, to avoid the need to read through all generated files looking for malicious content. I have taken care to generate the source-only minimal tarball using git-archive. This is the same format that GitLab, GitHub etc offer for the automated download links on git tags. The minimal source-only tarballs can thus serve as a way to audit GitLab and GitHub download material! Consider if/when hosting sites like GitLab or GitHub has a security incident that cause generated tarballs to include a backdoor that is not present in the git repository. If people rely on the tag download artifact without verifying the maintainer PGP signature using GnuPG, this can lead to similar backdoor scenarios that we had for XZUtils but originated with the hosting provider instead of the release manager. This is even more concerning, since this attack can be mounted for some selected IP address that you want to target and not on everyone, thereby making it harder to discover. With all that discussion and rationale out of the way, let s return to the release process. I have added another step here:
make srcdist
gpg -b libntlm-1.8-src.tar.gz
Now the release is ready. I publish these four files in the Libntlm s Savannah Download area, but they can be uploaded to a GitLab/GitHub release area as well. These are the SHA256 checksums I got after building the tarballs on my Trisquel 11 aramo laptop:
91de864224913b9493c7a6cec2890e6eded3610d34c3d983132823de348ec2ca  libntlm-1.8-src.tar.gz
ce6569a47a21173ba69c990965f73eb82d9a093eb871f935ab64ee13df47fda1  libntlm-1.8.tar.gz
So how can you reproduce my artifacts? Here is how to reproduce them in a Ubuntu 22.04 container:
podman run -it --rm ubuntu:22.04
apt-get update
apt-get install -y --no-install-recommends autoconf automake libtool make git ca-certificates
git clone https://gitlab.com/gsasl/libntlm.git
cd libntlm
git checkout v1.8
./bootstrap
./configure
make dist srcdist
sha256sum libntlm-*.tar.gz
You should see the exact same SHA256 checksum values. Hooray! This works because Trisquel 11 and Ubuntu 22.04 uses the same version of git, autoconf, automake, and libtool. These tools do not guarantee the same output content for all versions, similar to how GNU GCC does not generate the same binary output for all versions. So there is still some delicate version pairing needed. Ideally, the artifacts should be possible to reproduce from the release artifacts themselves, and not only directly from git. It is possible to reproduce the full tarball in a AlmaLinux 8 container replace almalinux:8 with rockylinux:8 if you prefer RockyLinux:
podman run -it --rm almalinux:8
dnf update -y
dnf install -y make wget gcc
wget https://download.savannah.nongnu.org/releases/libntlm/libntlm-1.8.tar.gz
tar xfa libntlm-1.8.tar.gz
cd libntlm-1.8
./configure
make dist
sha256sum libntlm-1.8.tar.gz
The source-only minimal tarball can be regenerated on Debian 11:
podman run -it --rm debian:11
apt-get update
apt-get install -y --no-install-recommends make git ca-certificates
git clone https://gitlab.com/gsasl/libntlm.git
cd libntlm
git checkout v1.8
make -f cfg.mk srcdist
sha256sum libntlm-1.8-src.tar.gz 
As the Magnus Opus or chef-d uvre, let s recreate the full tarball directly from the minimal source-only tarball on Trisquel 11 replace docker.io/kpengboy/trisquel:11.0 with ubuntu:22.04 if you prefer.
podman run -it --rm docker.io/kpengboy/trisquel:11.0
apt-get update
apt-get install -y --no-install-recommends autoconf automake libtool make wget git ca-certificates
wget https://download.savannah.nongnu.org/releases/libntlm/libntlm-1.8-src.tar.gz
tar xfa libntlm-1.8-src.tar.gz
cd libntlm-v1.8
./bootstrap
./configure
make dist
sha256sum libntlm-1.8.tar.gz
Yay! You should now have great confidence in that the release artifacts correspond to what s in version control and also to what the maintainer intended to release. Your remaining job is to audit the source code for vulnerabilities, including the source code of the dependencies used in the build. You no longer have to worry about auditing the release artifacts. I find it somewhat amusing that the build infrastructure for Libntlm is now in a significantly better place than the code itself. Libntlm is written in old C style with plenty of string manipulation and uses broken cryptographic algorithms such as MD4 and single-DES. Remember folks: solving supply chain security issues has no bearing on what kind of code you eventually run. A clean gun can still shoot you in the foot. Side note on naming: GitLab exports tarballs with pathnames libntlm-v1.8/ (i.e.., PROJECT-TAG/) and I ve adopted the same pathnames, which means my libntlm-1.8-src.tar.gz tarballs are bit-by-bit identical to GitLab s exports and you can verify this with tools like diffoscope. GitLab name the tarball libntlm-v1.8.tar.gz (i.e., PROJECT-TAG.ARCHIVE) which I find too similar to the libntlm-1.8.tar.gz that we also publish. GitHub uses the same git archive style, but unfortunately they have logic that removes the v in the pathname so you will get a tarball with pathname libntlm-1.8/ instead of libntlm-v1.8/ that GitLab and I use. The content of the tarball is bit-by-bit identical, but the pathname and archive differs. Codeberg (running Forgejo) uses another approach: the tarball is called libntlm-v1.8.tar.gz (after the tag) just like GitLab, but the pathname inside the archive is libntlm/, otherwise the produced archive is bit-by-bit identical including timestamps. Savannah s CGIT interface uses archive name libntlm-1.8.tar.gz with pathname libntlm-1.8/, but otherwise file content is identical. Savannah s GitWeb interface provides snapshot links that are named after the git commit (e.g., libntlm-a812c2ca.tar.gz with libntlm-a812c2ca/) and I cannot find any tag-based download links at all. Overall, we are so close to get SHA256 checksum to match, but fail on pathname within the archive. I ve chosen to be compatible with GitLab regarding the content of tarballs but not on archive naming. From a simplicity point of view, it would be nice if everyone used PROJECT-TAG.ARCHIVE for the archive filename and PROJECT-TAG/ for the pathname within the archive. This aspect will probably need more discussion. Side note on git archive output: It seems different versions of git archive produce different results for the same repository. The version of git in Debian 11, Trisquel 11 and Ubuntu 22.04 behave the same. The version of git in Debian 12, AlmaLinux/RockyLinux 8/9, Alpine, ArchLinux, macOS homebrew, and upcoming Ubuntu 24.04 behave in another way. Hopefully this will not change that often, but this would invalidate reproducibility of these tarballs in the future, forcing you to use an old git release to reproduce the source-only tarball. Alas, GitLab and most other sites appears to be using modern git so the download tarballs from them would not match my tarballs even though the content would. Side note on ChangeLog: ChangeLog files were traditionally manually curated files with version history for a package. In recent years, several projects moved to dynamically generate them from git history (using tools like git2cl or gitlog-to-changelog). This has consequences for reproducibility of tarballs: you need to have the entire git history available! The gitlog-to-changelog tool also output different outputs depending on the time zone of the person using it, which arguable is a simple bug that can be fixed. However this entire approach is incompatible with rebuilding the full tarball from the minimal source-only tarball. It seems Libntlm s ChangeLog file died on the surgery table here. So how would a distribution build these minimal source-only tarballs? I happen to help on the libntlm package in Debian. It has historically used the generated tarballs as the source code to build from. This means that code coming from gnulib is vendored in the tarball. When a security problem is discovered in gnulib code, the security team needs to patch all packages that include that vendored code and rebuild them, instead of merely patching the gnulib package and rebuild all packages that rely on that particular code. To change this, the Debian libntlm package needs to Build-Depends on Debian s gnulib package. But there was one problem: similar to most projects that use gnulib, Libntlm depend on a particular git commit of gnulib, and Debian only ship one commit. There is no coordination about which commit to use. I have adopted gnulib in Debian, and add a git bundle to the *_all.deb binary package so that projects that rely on gnulib can pick whatever commit they need. This allow an no-network GNULIB_URL and GNULIB_REVISION approach when running Libntlm s ./bootstrap with the Debian gnulib package installed. Otherwise libntlm would pick up whatever latest version of gnulib that Debian happened to have in the gnulib package, which is not what the Libntlm maintainer intended to be used, and can lead to all sorts of version mismatches (and consequently security problems) over time. Libntlm in Debian is developed and tested on Salsa and there is continuous integration testing of it as well, thanks to the Salsa CI team. Side note on git bundles: unfortunately there appears to be no reproducible way to export a git repository into one or more files. So one unfortunate consequence of all this work is that the gnulib *.orig.tar.gz tarball in Debian is not reproducible any more. I have tried to get Git bundles to be reproducible but I never got it to work see my notes in gnulib s debian/README.source on this aspect. Of course, source tarball reproducibility has nothing to do with binary reproducibility of gnulib in Debian itself, fortunately. One open question is how to deal with the increased build dependencies that is triggered by this approach. Some people are surprised by this but I don t see how to get around it: if you depend on source code for tools in another package to build your package, it is a bad idea to hide that dependency. We ve done it for a long time through vendored code in non-minimal tarballs. Libntlm isn t the most critical project from a bootstrapping perspective, so adding git and gnulib as Build-Depends to it will probably be fine. However, consider if this pattern was used for other packages that uses gnulib such as coreutils, gzip, tar, bison etc (all are using gnulib) then they would all Build-Depends on git and gnulib. Cross-building those packages for a new architecture will therefor require git on that architecture first, which gets circular quick. The dependency on gnulib is real so I don t see that going away, and gnulib is a Architecture:all package. However, the dependency on git is merely a consequence of how the Debian gnulib package chose to make all gnulib git commits available to projects: through a git bundle. There are other ways to do this that doesn t require the git tool to extract the necessary files, but none that I found practical ideas welcome! Finally some brief notes on how this was implemented. Enabling bootstrappable source-only minimal tarballs via gnulib s ./bootstrap is achieved by using the GNULIB_REVISION mechanism, locking down the gnulib commit used. I have always disliked git submodules because they add extra steps and has complicated interaction with CI/CD. The reason why I gave up git submodules now is because the particular commit to use is not recorded in the git archive output when git submodules is used. So the particular gnulib commit has to be mentioned explicitly in some source code that goes into the git archive tarball. Colin Watson added the GNULIB_REVISION approach to ./bootstrap back in 2018, and now it no longer made sense to continue to use a gnulib git submodule. One alternative is to use ./bootstrap with --gnulib-srcdir or --gnulib-refdir if there is some practical problem with the GNULIB_URL towards a git bundle the GNULIB_REVISION in bootstrap.conf. The srcdist make rule is simple:
git archive --prefix=libntlm-v1.8/ -o libntlm-v1.8.tar.gz HEAD
Making the make dist generated tarball reproducible can be more complicated, however for Libntlm it was sufficient to make sure the modification times of all files were set deterministically to the timestamp of the last commit in the git repository. Interestingly there seems to be a couple of different ways to accomplish this, Guix doesn t support minimal source-only tarballs but rely on a .tarball-timestamp file inside the tarball. Paul Eggert explained what TZDB is using some time ago. The approach I m using now is fairly similar to the one I suggested over a year ago. If there are problems because all files in the tarball now use the same modification time, there is a solution by Bruno Haible that could be implemented. Side note on git tags: Some people may wonder why not verify a signed git tag instead of verifying a signed tarball of the git archive. Currently most git repositories uses SHA-1 for git commit identities, but SHA-1 is not a secure hash function. While current SHA-1 attacks can be detected and mitigated, there are fundamental doubts that a git SHA-1 commit identity uniquely refers to the same content that was intended. Verifying a git tag will never offer the same assurance, since a git tag can be moved or re-signed at any time. Verifying a git commit is better but then we need to trust SHA-1. Migrating git to SHA-256 would resolve this aspect, but most hosting sites such as GitLab and GitHub does not support this yet. There are other advantages to using signed tarballs instead of signed git commits or git tags as well, e.g., tar.gz can be a deterministically reproducible persistent stable offline storage format but .git sub-directory trees or git bundles do not offer this property. Doing continous testing of all this is critical to make sure things don t regress. Libntlm s pipeline definition now produce the generated libntlm-*.tar.gz tarballs and a checksum as a build artifact. Then I added the 000-reproducability job which compares the checksums and fails on mismatches. You can read its delicate output in the job for the v1.8 release. Right now we insists that builds on Trisquel 11 match Ubuntu 22.04, that PureOS 10 builds match Debian 11 builds, that AlmaLinux 8 builds match RockyLinux 8 builds, and AlmaLinux 9 builds match RockyLinux 9 builds. As you can see in pipeline job output, not all platforms lead to the same tarballs, but hopefully this state can be improved over time. There is also partial reproducibility, where the full tarball is reproducible across two distributions but not the minimal tarball, or vice versa. If this way of working plays out well, I hope to implement it in other projects too. What do you think? Happy Hacking!

12 April 2024

Freexian Collaborators: Debian Contributions: SSO Authentication for jitsi.debian.social, /usr-move updates, and more! (by Utkarsh Gupta)

Contributing to Debian is part of Freexian s mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services. P.S. We ve completed over a year of writing these blogs. If you have any suggestions on how to make them better or what you d like us to cover, or any other opinions/reviews you might have, et al, please let us know by dropping an email to us. We d be happy to hear your thoughts. :)

SSO Authentication for jitsi.debian.social, by Stefano Rivera Debian.social s jitsi instance has been getting some abuse by (non-Debian) people sharing sexually explicit content on the service. After playing whack-a-mole with this for a month, and shutting the instance off for another month, we opened it up again and the abuse immediately re-started. Stefano sat down and wrote an SSO Implementation that hooks into Jitsi s existing JWT SSO support. This requires everyone using jitsi.debian.social to have a Salsa account. With only a little bit of effort, we could change this in future, to only require an account to open a room, and allow guests to join the call.

/usr-move, by Helmut Grohne The biggest task this month was sending mitigation patches for all of the /usr-move issues arising from package renames due to the 2038 transition. As a result, we can now say that every affected package in unstable can either be converted with dh-sequence-movetousr or has an open bug report. The package set relevant to debootstrap except for the set that has to be uploaded concurrently has been moved to /usr and is awaiting migration. The move of coreutils happened to affect piuparts which hard codes the location of /bin/sync and received multiple updates as a result.

Miscellaneous contributions
  • Stefano Rivera uploaded a stable release update to python3.11 for bookworm, fixing a use-after-free crash.
  • Stefano uploaded a new version of python-html2text, and updated python3-defaults to build with it.
  • In support of Python 3.12, Stefano dropped distutils as a Build-Dependency from a few packages, and uploaded a complex set of patches to python-mitogen.
  • Stefano landed some merge requests to clean up dead code in dh-python, removed the flit plugin, and uploaded it.
  • Stefano uploaded new upstream versions of twisted, hatchling, python-flexmock, python-authlib, python mitogen, python-pipx, and xonsh.
  • Stefano requested removal of a few packages supporting the Opsis HDMI2USB hardware that DebConf Video team used to use for HDMI capture, as they are not being maintained upstream. They started to FTBFS, with recent sdcc changes.
  • DebConf 24 is getting ready to open registration, Stefano spent some time fixing bugs in the website, caused by infrastructure updates.
  • Stefano reviewed all the DebConf 23 travel reimbursements, filing requests for more information from SPI where our records mismatched.
  • Stefano spun up a Wafer website for the Berlin 2024 mini DebConf.
  • Roberto C. S nchez worked on facilitating the transfer of upstream maintenance responsibility for the dormant Shorewall project to a new team led by the current maintainer of the Shorewall packages in Debian.
  • Colin Watson fixed build failures in celery-haystack-ng, db1-compat, jsonpickle, libsdl-perl, kali, knews, openssh-ssh1, python-json-log-formatter, python-typing-extensions, trn4, vigor, and wcwidth. Some of these were related to the 64-bit time_t transition, since that involved enabling -Werror=implicit-function-declaration.
  • Colin fixed an off-by-one error in neovim, which was already causing a build failure in Ubuntu and would eventually have caused a build failure in Debian with stricter toolchain settings.
  • Colin added an sshd@.service template to openssh to help newer systemd versions make containers and VMs SSH-accessible over AF_VSOCK sockets.
  • Following the xz-utils backdoor, Colin spent some time testing and discussing OpenSSH upstream s proposed inline systemd notification patch, since the current implementation via libsystemd was part of the attack vector used by that backdoor.
  • Utkarsh reviewed and sponsored some Go packages for Lena Voytek and Rajudev.
  • Utkarsh also helped Mitchell Dzurick with the adoption of pyparted package.
  • Helmut sent 10 patches for cross build failures.
  • Helmut partially fixed architecture cross bootstrap tooling to deal with changes in linux-libc-dev and the recent gcc-for-host changes and also fixed a 64bit-time_t FTBFS in libtextwrap.
  • Thorsten Alteholz uploaded several packages from debian-printing: cjet, lprng, rlpr and epson-inkjet-printer-escpr were affected by the newly enabled compiler switch -Werror=implicit-function-declaration. Besides fixing these serious bugs, Thorsten also worked on other bugs and could fix one or the other.
  • Carles updated simplemonitor and python-ring-doorbell packages with new upstream versions.
  • Santiago is still working on the Salsa CI MRs to adapt the build jobs so they can rely on sbuild. Current work includes adapting the images used by the build job, implementing the basic sbuild support the related jobs, and adjusting the support for experimental and *-backports releases..
    Additionally, Santiago reviewed some MR such as Make timeout action explicit in the logs and the subsequent Implement conditional timeout verbosity, and the batch of MRs included in https://salsa.debian.org/salsa-ci-team/pipeline/-/merge_requests/482.
  • Santiago also reviewed applications for the improving Salsa CI in Debian GSoC 2024 project. We received applications from four very talented candidates. The selection process is currently ongoing. A huge thanks to all of them!
  • As part of the DebConf 24 organization, Santiago has taken part in the Content team discussions.

11 April 2024

Jonathan McDowell: Sorting out backup internet #1: recursive DNS

I work from home these days, and my nearest office is over 100 miles away, 3 hours door to door if I travel by train (and, to be honest, probably not a lot faster given rush hour traffic if I drive). So I m reliant on a functional internet connection in order to be able to work. I m lucky to have access to Openreach FTTP, provided by Aquiss, but I worry about what happens if there s a cable cut somewhere or some other long lasting problem. Worst case I could tether to my work phone, or try to find some local coworking space to use while things get sorted, but I felt like arranging a backup option was a wise move. Step 1 turned out to be sorting out recursive DNS. It s been many moons since I had to deal with running DNS in a production setting, and I ve mostly done my best to avoid doing it at home too. dnsmasq has done a decent job at providing for my needs over the years, covering DHCP, DNS (+ tftp for my test device network). However I just let it forward to my ISP s nameservers, which means if that link goes down it ll no longer be able to resolve anything outside the house. One option would have been to either point to a different recursive DNS server (Cloudfare s 1.1.1.1 or Google s Public DNS being the common choices), but I ve no desire to share my lookup information with them. As another approach I could have done some sort of failover of resolv.conf when the primary network went down, but then I would have to get into moving files around based on networking status and that felt a bit clunky. So I decided to finally setup a proper local recursive DNS server, which is something I ve kinda meant to do for a while but never had sufficient reason to look into. Last time I did this I did it with BIND 9 but there are more options these days, and I decided to go with unbound, which is primarily focused on recursive DNS. One extra wrinkle, pointed out by Lars, is that having dynamic name information from DHCP hosts is exceptionally convenient. I ve kept dnsmasq as the local DHCP server, so I wanted to be able to forward local queries there. I m doing all of this on my RB5009, running Debian. Installing unbound was a simple matter of apt install unbound. I needed 2 pieces of configuration over the default, one to enable recursive serving for the house networks, and one to enable forwarding of queries for the local domain to dnsmasq. I originally had specified the wildcard address for listening, but this caused problems with the fact my router has many interfaces and would sometimes respond from a different address than the request had come in on.
/etc/unbound/unbound.conf.d/network-resolver.conf
server:
  interface: 192.0.2.1
  interface: 2001::db8:f00d::1
  access-control: 192.0.2.0/24 allow
  access-control: 2001::db8:f00d::/56 allow

/etc/unbound/unbound.conf.d/local-to-dnsmasq.conf
server:
  domain-insecure: "example.org"
  do-not-query-localhost: no
forward-zone:
  name: "example.org"
  forward-addr: 127.0.0.1@5353

I then had to configure dnsmasq to not listen on port 53 (so unbound could), respond to requests on the loopback interface (I have dnsmasq restricted to only explicitly listed interfaces), and to hand out unbound as the appropriate nameserver in DHCP requests - once dnsmasq is not listening on port 53 it no longer does this by default.
/etc/dnsmasq.d/behind-unbound
interface=lo
port=5353
dhcp-option=option6:dns-server,[2001::db8:f00d::1]
dhcp-option=option:dns-server,192.0.2.1

With these minor changes in place I now have local recursive DNS being handled by unbound, without losing dynamic local DNS for DHCP hosts. As an added bonus I now get 10/10 on Test IPv6 - previously I was getting dinged on the ability for my DNS server to resolve purely IPv6 reachable addresses. Next step, actually sorting out a backup link.

Russell Coker: ML Training License

Last year a Debian Developer blogged about writing Haskell code to give a bad result for LLMs that were trained on it. I forgot who wrote the post and I d appreciate the URL if anyone has it. I respect such technical work to enforce one s legal rights when they aren t respected by corporations, but I have a different approach. As an aside the Fosdem lecture Fortify AI against regulation, litigation and lobotomies is interesting on this topic [1], it s what inspired me to write about this. For what I write I am at this time happy to allow it to be used as part of a large training data set (consider this blog post a licence grant that applies until such time as I edit this post to change it). But only if aggregated with so much other data that my content is only a tiny portion of the data set by any metric. So I don t want someone to make a programming LLM that has my code as the only C code or a political data set that has my blog posts as the only left-wing content. If someone wants to train an LLM on only my content to make a Russell-simulator then I don t license my work for that purpose but also as it s small enough that anyone with a bit of skill could do it on a weekend I can t stop it. I would be really interested in seeing the results if someone from the FOSS community wanted to make a Russell-simulator and would probably issue them a license for such work if asked. If my work comprises more than 0.1% of the content in a particular measure (theme, programming language, political position, etc) in a training data set then I don t permit that without prior discussion. Finally if someone wants to make a FOSS training data set to be used for FOSS LLM systems (maybe under the AGPL or some similar license) then I ll allow my writing to be used as part of that.

9 April 2024

Matthew Palmer: How I Tripped Over the Debian Weak Keys Vulnerability

Those of you who haven t been in IT for far, far too long might not know that next month will be the 16th(!) anniversary of the disclosure of what was, at the time, a fairly earth-shattering revelation: that for about 18 months, the Debian OpenSSL package was generating entirely predictable private keys. The recent xz-stential threat (thanks to @nixCraft for making me aware of that one), has got me thinking about my own serendipitous interaction with a major vulnerability. Given that the statute of limitations has (probably) run out, I thought I d share it as a tale of how huh, that s weird can be a powerful threat-hunting tool but only if you ve got the time to keep pulling at the thread.

Prelude to an Adventure Our story begins back in March 2008. I was working at Engine Yard (EY), a now largely-forgotten Rails-focused hosting company, which pioneered several advances in Rails application deployment. Probably EY s greatest claim to lasting fame is that they helped launch a little code hosting platform you might have heard of, by providing them free infrastructure when they were little more than a glimmer in the Internet s eye. I am, of course, talking about everyone s favourite Microsoft product: GitHub. Since GitHub was in the right place, at the right time, with a compelling product offering, they quickly started to gain traction, and grow their userbase. With growth comes challenges, amongst them the one we re focusing on today: SSH login times. Then, as now, GitHub provided SSH access to the git repos they hosted, by SSHing to git@github.com with publickey authentication. They were using the standard way that everyone manages SSH keys: the ~/.ssh/authorized_keys file, and that became a problem as the number of keys started to grow. The way that SSH uses this file is that, when a user connects and asks for publickey authentication, SSH opens the ~/.ssh/authorized_keys file and scans all of the keys listed in it, looking for a key which matches the key that the user presented. This linear search is normally not a huge problem, because nobody in their right mind puts more than a few keys in their ~/.ssh/authorized_keys, right?
2008-era GitHub giving monkey puppet side-eye to the idea that nobody stores many keys in an authorized_keys file
Of course, as a popular, rapidly-growing service, GitHub was gaining users at a fair clip, to the point that the one big file that stored all the SSH keys was starting to visibly impact SSH login times. This problem was also not going to get any better by itself. Something Had To Be Done. EY management was keen on making sure GitHub ran well, and so despite it not really being a hosting problem, they were willing to help fix this problem. For some reason, the late, great, Ezra Zygmuntowitz pointed GitHub in my direction, and let me take the time to really get into the problem with the GitHub team. After examining a variety of different possible solutions, we came to the conclusion that the least-worst option was to patch OpenSSH to lookup keys in a MySQL database, indexed on the key fingerprint. We didn t take this decision on a whim it wasn t a case of yeah, sure, let s just hack around with OpenSSH, what could possibly go wrong? . We knew it was potentially catastrophic if things went sideways, so you can imagine how much worse the other options available were. Ensuring that this wouldn t compromise security was a lot of the effort that went into the change. In the end, though, we rolled it out in early April, and lo! SSH logins were fast, and we were pretty sure we wouldn t have to worry about this problem for a long time to come. Normally, you d think patching OpenSSH to make mass SSH logins super fast would be a good story on its own. But no, this is just the opening scene.

Chekov s Gun Makes its Appearance Fast forward a little under a month, to the first few days of May 2008. I get a message from one of the GitHub team, saying that somehow users were able to access other users repos over SSH. Naturally, as we d recently rolled out the OpenSSH patch, which touched this very thing, the code I d written was suspect number one, so I was called in to help.
The lineup scene from the movie The Usual Suspects They're called The Usual Suspects for a reason, but sometimes, it really is Keyser S ze
Eventually, after more than a little debugging, we discovered that, somehow, there were two users with keys that had the same key fingerprint. This absolutely shouldn t happen it s a bit like winning the lottery twice in a row1 unless the users had somehow shared their keys with each other, of course. Still, it was worth investigating, just in case it was a web application bug, so the GitHub team reached out to the users impacted, to try and figure out what was going on. The users professed no knowledge of each other, neither admitted to publicising their key, and couldn t offer any explanation as to how the other person could possibly have gotten their key. Then things went from weird to what the ? . Because another pair of users showed up, sharing a key fingerprint but it was a different shared key fingerprint. The odds now have gone from winning the lottery multiple times in a row to as close to this literally cannot happen as makes no difference.
Milhouse from The Simpsons says that We're Through The Looking Glass Here, People
Once we were really, really confident that the OpenSSH patch wasn t the cause of the problem, my involvement in the problem basically ended. I wasn t a GitHub employee, and EY had plenty of other customers who needed my help, so I wasn t able to stay deeply involved in the on-going investigation of The Mystery of the Duplicate Keys. However, the GitHub team did keep talking to the users involved, and managed to determine the only apparent common factor was that all the users claimed to be using Debian or Ubuntu systems, which was where their SSH keys would have been generated. That was as far as the investigation had really gotten, when along came May 13, 2008.

Chekov s Gun Goes Off With the publication of DSA-1571-1, everything suddenly became clear. Through a well-meaning but ultimately disasterous cleanup of OpenSSL s randomness generation code, the Debian maintainer had inadvertently reduced the number of possible keys that could be generated by a given user from bazillions to a little over 32,000. With so many people signing up to GitHub some of them no doubt following best practice and freshly generating a separate key it s unsurprising that some collisions occurred. You can imagine the sense of oooooooh, so that s what s going on! that rippled out once the issue was understood. I was mostly glad that we had conclusive evidence that my OpenSSH patch wasn t at fault, little knowing how much more contact I was to have with Debian weak keys in the future, running a huge store of known-compromised keys and using them to find misbehaving Certificate Authorities, amongst other things.

Lessons Learned While I ve not found a description of exactly when and how Luciano Bello discovered the vulnerability that became CVE-2008-0166, I presume he first came across it some time before it was disclosed likely before GitHub tripped over it. The stable Debian release that included the vulnerable code had been released a year earlier, so there was plenty of time for Luciano to have discovered key collisions and go hmm, I wonder what s going on here? , then keep digging until the solution presented itself. The thought hmm, that s odd , followed by intense investigation, leading to the discovery of a major flaw is also what ultimately brought down the recent XZ backdoor. The critical part of that sequence is the ability to do that intense investigation, though. When I reflect on my brush with the Debian weak keys vulnerability, what sticks out to me is the fact that I didn t do the deep investigation. I wonder if Luciano hadn t found it, how long it might have been before it was found. The GitHub team would have continued investigating, presumably, and perhaps they (or I) would have eventually dug deep enough to find it. But we were all super busy myself, working support tickets at EY, and GitHub feverishly building features and fighting the fires in their rapidly-growing service. As it was, Luciano was able to take the time to dig in and find out what was happening, but just like the XZ backdoor, I feel like we, as an industry, got a bit lucky that someone with the skills, time, and energy was on hand at the right time to make a huge difference. It s a luxury to be able to take the time to really dig into a problem, and it s a luxury that most of us rarely have. Perhaps an understated takeaway is that somehow we all need to wrestle back some time to follow our hunches and really dig into the things that make us go hmm .

Support My Hunches If you d like to help me be able to do intense investigations of mysterious software phenomena, you can shout me a refreshing beverage on ko-fi.
  1. the odds are actually probably more like winning the lottery about twenty times in a row. The numbers involved are staggeringly huge, so it s easiest to just approximate it as really, really unlikely .

5 April 2024

Paul Wise: FLOSS Activities March 2024

Focus This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Administration
  • Debian wiki: approve accounts

Communication
  • Respond to queries from Debian users and contributors on the mailing lists and IRC

Sponsors The SWH work was sponsored. All other work was done on a volunteer basis.

Bits from Debian: apt install dpl-candidate: Andreas Tille

The Debian Project Developers will shortly vote for a new Debian Project Leader known as the DPL. The Project Leader is the official representative of The Debian Project tasked with managing the overall project, its vision, direction, and finances. The DPL is also responsible for the selection of Delegates, defining areas of responsibility within the project, the coordination of Developers, and making decisions required for the project. Our outgoing and present DPL Jonathan Carter served 4 terms, from 2020 through 2024. Jonathan shared his last Bits from the DPL post to Debian recently and his hopes for the future of Debian. Recently, we sat with the two present candidates for the DPL position asking questions to find out who they really are in a series of interviews about their platforms, visions for Debian, lives, and even their favorite text editors. The interviews were conducted by disaster2life (Yashraj Moghe) and made available from video and audio transcriptions: Voting for the position starts on April 6, 2024. Editors' note: This is our official return to Debian interviews, readers should stay tuned for more upcoming interviews with Developers and other important figures in Debian as part of our "Meet your Debian Developer" series. We used the following tools and services: Turboscribe.ai for the transcription from the audio and video files, IRC: Oftc.net for communication, Jitsi meet for interviews, and Open Broadcaster Software (OBS) for editing and video. While we encountered many technical difficulties in the return to this process, we are still able and proud to present the transcripts of the interviews edited only in a few areas for readability. 2024 Debian Project Leader Candidate: Andrea Tille Andreas' Interview Who are you? Tell us a little about yourself. [Andreas]:
How am I? Well, I'm, as I wrote in my platform, I'm a proud grandfather doing a lot of free software stuff, doing a lot of sports, have some goals in mind which I like to do and hopefully for the best of Debian.
And How are you today? [Andreas]:
How I'm doing today? Well, actually I have some headaches but it's fine for the interview. So, usually I feel very good. Spring was coming here and today it's raining and I plan to do a bicycle tour tomorrow and hope that I do not get really sick but yeah, for the interview it's fine.
What do you do in Debian? Could you mention your story here? [Andreas]:
Yeah, well, I started with Debian kind of an accident because I wanted to have some package salvaged which is called WordNet. It's a monolingual dictionary and I did not really plan to do more than maybe 10 packages or so. I had some kind of training with xTeddy which is totally unimportant, a cute teddy you can put on your desktop. So, and then well, more or less I thought how can I make Debian attractive for my employer which is a medical institute and so on. It could make sense to package bioinformatics and medicine software and it somehow evolved in a direction I did neither expect it nor wanted to do, that I'm currently the most busy uploader in Debian, created several teams around it. DebianMate is very well known from me. I created the Blends team to create teams and techniques around what we are doing which was Debian TIS, Debian Edu, Debian Science and so on and I also created the packaging team for R, for the statistics package R which is technically based and not topic based. All these blends are covering a certain topic and R is just needed by lots of these blends. So, yeah, and to cope with all this I have written a script which is routing an update to manage all these uploads more or less automatically. So, I think I had one day where I uploaded 21 new packages but it's just automatically generated, right? So, it's on one day more than I ever planned to do.
What is the first thing you think of when you think of Debian? Editors' note: The question was misunderstood as the worst thing you think of when you think of Debian [Andreas]:
The worst thing I think about Debian, it's complicated. I think today on Debian board I was asked about the technical progress I want to make and in my opinion we need to standardize things inside Debian. For instance, bringing all the packages to salsa, follow some common standards, some common workflow which is extremely helpful. As I said, if I'm that productive with my own packages we can adopt this in general, at least in most cases I think. I made a lot of good experience by the support of well-formed teams. Well-formed teams are those teams where people support each other, help each other. For instance, how to say, I'm a physicist by profession so I'm not an IT expert. I can tell apart what works and what not but I'm not an expert in those packages. I do and the amount of packages is so high that I do not even understand all the techniques they are covering like Go, Rust and something like this. And I also don't speak Java and I had a problem once in the middle of the night and I've sent the email to the list and was a Java problem and I woke up in the morning and it was solved. This is what I call a team. I don't call a team some common repository that is used by random people for different packages also but it's working together, don't hesitate to solve other people's problems and permit people to get active. This is what I call a team and this is also something I observed in, it's hard to give a percentage, in a lot of other teams but we have other people who do not even understand the concept of the team. Why is working together make some advantage and this is also a tough thing. I [would] like to tackle in my term if I get elected to form solid teams using the common workflow. This is one thing. The other thing is that we have a lot of good people in our infrastructure like FTP masters, DSA and so on. I have the feeling they have a lot of work and are working more or less on their limits, and I like to talk to them [to ask] what kind of change we could do to move that limits or move their personal health to the better side.
The DPL term lasts for a year, What would you do during that you couldn't do now? [Andreas]:
Yeah, well this is basically what I said are my main issues. I need to admit I have no really clear imagination what kind of tasks will come to me as a DPL because all these financial issues and law issues possible and issues [that] people who are not really friendly to Debian might create. I'm afraid these things might occupy a lot of time and I can't say much about this because I simply don't know.
What are three key terms about you and your candidacy? [Andreas]:
As I said, I like to work on standards, I d like to make Debian try [to get it right so] that people don't get overworked, this third key point is be inviting to newcomers, to everybody who wants to come. Yeah, I also mentioned in my term this diversity issue, geographical and from gender point of view. This may be the three points I consider most important.
Preferred text editor? [Andreas]:
Yeah, my preferred one? Ah, well, I have no preferred text editor. I'm using the Midnight Commander very frequently which has an internal editor which is convenient for small text. For other things, I usually use VI but I also use Emacs from time to time. So, no, I have not preferred text editor. Whatever works nicely for me.
What is the importance of the community in the Debian Project? How would like to see it evolving over the next few years? [Andreas]:
Yeah, I think the community is extremely important. So, I was on a lot of DebConfs. I think it's not really 20 but 17 or 18 DebCons and I really enjoyed these events every year because I met so many friends and met so many interesting people that it's really enriching my life and those who I never met in person but have read interesting things and yeah, Debian community makes really a part of my life.
And how do you think it should evolve specifically? [Andreas]:
Yeah, for instance, last year in Kochi, it became even clearer to me that the geographical diversity is a really strong point. Just discussing with some women from India who is afraid about not coming next year to Busan because there's a problem with Shanghai and so on. I'm not really sure how we can solve this but I think this is a problem at least I wish to tackle and yeah, this is an interesting point, the geographical diversity and I'm running the so-called mentoring of the month. This is a small project to attract newcomers for the Debian Med team which has the focus on medical packages and I learned that we had always men applying for this and so I said, okay, I dropped the constraint of medical packages. Any topic is fine, I teach you packaging but it must be someone who does not consider himself a man. I got only two applicants, no, actually, I got one applicant and one response which was kind of strange if I'm hunting for women or so. I did not understand but I got one response and interestingly, it was for me one of the least expected counters. It was from Iran and I met a very nice woman, very open, very skilled and gifted and did a good job or have even lose contact today and maybe we need more actively approach groups that are underrepresented. I don't know if what's a good means which I did but at least I tried and so I try to think about these kind of things.
What part of Debian has made you smile? What part of the project has kept you going all through the years? [Andreas]:
Well, the card game which is called Mao on the DebConf made me smile all the time. I admit I joined only two or three times even if I really love this kind of games but I was occupied by other stuff so this made me really smile. I also think the first online DebConf in 2020 made me smile because we had this kind of short video sequences and I tried to make a funny video sequence about every DebConf I attended before. This is really funny moments but yeah, it's not only smile but yeah. One thing maybe it's totally unconnected to Debian but I learned personally something in Debian that we have a do-ocracy and you can do things which you think that are right if not going in between someone else, right? So respect everybody else but otherwise you can do so. And in 2020 I also started to take trees which are growing widely in my garden and plant them into the woods because in our woods a lot of trees are dying and so I just do something because I can. I have the resource to do something, take the small tree and bring it into the woods because it does not harm anybody. I asked the forester if it is okay, yes, yes, okay. So everybody can do so but I think the idea to do something like this came also because of the free software idea. You have the resources, you have the computer, you can do something and you do something productive, right? And when thinking about this I think it was also my Debian work. Meanwhile I have planted more than 3,000 trees so it's not a small number but yeah, I enjoy this.
What part of Debian would you have some criticisms for? [Andreas]:
Yeah, it's basically the same as I said before. We need more standards to work together. I do not want to repeat this but this is what I think, yeah.
What field in Free Software generally do you think requires the most work to be put into it? What do you think is Debian's part in the field? [Andreas]:
It's also in general, the thing is the fact that I'm maintaining packages which are usually as modern software is maintained in Git, which is fine but we have some software which is at Sourceport, we have software laying around somewhere, we have software where Debian somehow became Upstream because nobody is caring anymore and free software is very different in several things, ways and well, I in principle like freedom of choice which is the basic of all our work. Sometimes this freedom goes in the way of productivity because everybody is free to re-implement. You asked me for the most favorite editor. In principle one really good working editor would be great to have and would work and we have maybe 500 in Debian or so, I don't know. I could imagine if people would concentrate and say five instead of 500 editors, we could get more productive, right? But I know this will not happen, right? But I think this is one thing which goes in the way of making things smooth and productive and we could have more manpower to replace one person who's [having] children, doing some other stuff and can't continue working on something and maybe this is a problem I will not solve, definitely not, but which I see.
What do you think is Debian's part in the field? [Andreas]:
Yeah, well, okay, we can bring together different Upstreams, so we are building some packages and have some general overview about similar things and can say, oh, you are doing this and some other person is doing more or less the same, do you want to join each other or so, but this is kind of a channel we have to our Upstreams which is probably not very successful. It starts with code copies of some libraries which are changed a little bit, which is fine license-wise, but not so helpful for different things and so I've tried to convince those Upstreams to forward their patches to the original one, but for this and I think we could do some kind of, yeah, [find] someone who brings Upstream together or to make them stop their forking stuff, but it costs a lot of energy and we probably don't have this and it's also not realistic that we can really help with this problem.
Do you have any questions for me? [Andreas]:
I enjoyed the interview, I enjoyed seeing you again after half a year or so. Yeah, actually I've seen you in the eating room or cheese and wine party or so, I do not remember we had to really talk together, but yeah, people around, yeah, for sure. Yeah.

4 April 2024

Lukas M rdian: Netplan v1.0 paves the way to stable, declarative network management

New netplan status diff subcommand, finding differences between configuration and system state As the maintainer and lead developer for Netplan, I m proud to announce the general availability of Netplan v1.0 after more than 7 years of development efforts. Over the years, we ve so far had about 80 individual contributors from around the globe. This includes many contributions from our Netplan core-team at Canonical, but also from other big corporations such as Microsoft or Deutsche Telekom. Those contributions, along with the many we receive from our community of individual contributors, solidify Netplan as a healthy and trusted open source project. In an effort to make Netplan even more dependable, we started shipping upstream patch releases, such as 0.106.1 and 0.107.1, which make it easier to integrate fixes into our users custom workflows. With the release of version 1.0 we primarily focused on stability. However, being a major version upgrade, it allowed us to drop some long-standing legacy code from the libnetplan1 library. Removing this technical debt increases the maintainability of Netplan s codebase going forward. The upcoming Ubuntu 24.04 LTS and Debian 13 releases will ship Netplan v1.0 to millions of users worldwide.

Highlights of version 1.0 In addition to stability and maintainability improvements, it s worth looking at some of the new features that were included in the latest release:
  • Simultaneous WPA2 & WPA3 support.
  • Introduction of a stable libnetplan1 API.
  • Mellanox VF-LAG support for high performance SR-IOV networking.
  • New hairpin and port-mac-learning settings, useful for VXLAN tunnels with FRRouting.
  • New netplan status diff subcommand, finding differences between configuration and system state.
Besides those highlights of the v1.0 release, I d also like to shed some light on new functionality that was integrated within the past two years for those upgrading from the previous Ubuntu 22.04 LTS which used Netplan v0.104:
  • We added support for the management of new network interface types, such as veth, dummy, VXLAN, VRF or InfiniBand (IPoIB).
  • Wireless functionality was improved by integrating Netplan with NetworkManager on desktop systems, adding support for WPA3 and adding the notion of a regulatory-domain, to choose proper frequencies for specific regions.
  • To improve maintainability, we moved to Meson as Netplan s buildsystem, added upstream CI coverage for multiple Linux distributions and integrations (such as Debian testing, NetworkManager, snapd or cloud-init), checks for ABI compatibility, and automatic memory leak detection.
  • We increased consistency between the supported backend renderers (systemd-networkd and NetworkManager), by matching physical network interfaces on permanent MAC address, when the match.macaddress setting is being used, and added new hardware offloading functionality for high performance networking, such as Single-Root IO Virtualisation virtual function link-aggregation (SR-IOV VF-LAG).
The much improved Netplan documentation, that is now hosted on Read the Docs , and new command line subcommands, such as netplan status, make Netplan a well vested tool for declarative network management and troubleshooting.

Integrations Those changes pave the way to integrate Netplan in 3rd party projects, such as system installers or cloud deployment methods. By shipping the new python3-netplan Python bindings to libnetplan, it is now easier than ever to access Netplan functionality and network validation from other projects. We are proud that the Debian Cloud Team chose Netplan to be the default network management tool in their official cloud-images for Debian Bookworm and beyond. Ubuntu s NetworkManager package now uses Netplan as it s default backend on Ubuntu 23.10 Desktop systems and beyond. Further integrations happened with cloud-init and the Calamares installer.
Please check out the Netplan version 1.0 release on GitHub! If you want to learn more, follow our activities on Netplan.io, GitHub, Launchpad, IRC or our Netplan Developer Diaries blog on discourse.

1 April 2024

Simon Josefsson: Towards reproducible minimal source code tarballs? On *-src.tar.gz

While the work to analyze the xz backdoor is in progress, several ideas have been suggested to improve the software supply chain ecosystem. Some of those ideas are good, some of the ideas are at best irrelevant and harmless, and some suggestions are plain bad. I d like to attempt to formalize two ideas, which have been discussed before, but the context in which they can be appreciated have not been as clear as it is today.
  1. Reproducible tarballs. The idea is that published source tarballs should be possible to reproduce independently somehow, and that this should be continuously tested and verified preferrably as part of the upstream project continuous integration system (e.g., GitHub action or GitLab pipeline). While nominally this looks easy to achieve, there are some complex matters in this, for example: what timestamps to use for files in the tarball? I ve brought up this aspect before.
  2. Minimal source tarballs without generated vendor files. Most GNU Autoconf/Automake-based tarballs pre-generated files which are important for bootstrapping on exotic systems that does not have the required dependencies. For the bootstrapping story to succeed, this approach is important to support. However it has become clear that this practice raise significant costs and risks. Most modern GNU/Linux distributions have all the required dependencies and actually prefers to re-build everything from source code. These pre-generated extra files introduce uncertainty to that process.
My strawman proposal to improve things is to define new tarball format *-src.tar.gz with at least the following properties:
  1. The tarball should allow users to build the project, which is the entire purpose of all this. This means that at least all source code for the project has to be included.
  2. The tarballs should be signed, for example with PGP or minisign.
  3. The tarball should be possible to reproduce bit-by-bit by a third party using upstream s version controlled sources and a pointer to which revision was used (e.g., git tag or git commit).
  4. The tarball should not require an Internet connection to download things.
    • Corollary: every external dependency either has to be explicitly documented as such (e.g., gcc and GnuTLS), or included in the tarball.
    • Observation: This means including all *.po gettext translations which are normally downloaded when building from version controlled sources.
  5. The tarball should contain everything required to build the project from source using as much externally released versioned tooling as possible. This is the minimal property lacking today.
    • Corollary: This means including a vendored copy of OpenSSL or libz is not acceptable: link to them as external projects.
    • Open question: How about non-released external tooling such as gnulib or autoconf archive macros? This is a bit more delicate: most distributions either just package one current version of gnulib or autoconf archive, not previous versions. While this could change, and distributions could package the gnulib git repository (up to some current version) and the autoconf archive git repository and packages were set up to extract the version they need (gnulib s ./bootstrap already supports this via the gnulib-refdir parameter), this is not normally in place.
    • Suggested Corollary: The tarball should contain content from git submodule s such as gnulib and the necessary Autoconf archive M4 macros required by the project.
  6. Similar to how the GNU project specify the ./configure interface we need a documented interface for how to bootstrap the project. I suggest to use the already well established idiom of running ./bootstrap to set up the package to later be able to be built via ./configure. Of course, some projects are not using the autotool ./configure interface and will not follow this aspect either, but like most build systems that compete with autotools have instructions on how to build the project, they should document similar interfaces for bootstrapping the source tarball to allow building.
If tarballs that achieve the above goals were available from popular upstream projects, distributions could more easily use them instead of current tarballs that include pre-generated content. The advantage would be that the build process is not tainted by unnecessary files. We need to develop tools for maintainers to create these tarballs, similar to make dist that generate today s foo-1.2.3.tar.gz files. I think one common argument against this approach will be: Why bother with all that, and just use git-archive outputs? Or avoid the entire tarball approach and move directly towards version controlled check outs and referring to upstream releases as git URL and commit tag or id. One problem with this is that SHA-1 is broken, so placing trust in a SHA-1 identifier is simply not secure. Another counter-argument is that this optimize for packagers benefits at the cost of upstream maintainers: most upstream maintainers do not want to store gettext *.po translations in their source code repository. A compromise between the needs of maintainers and packagers is useful, so this *-src.tar.gz tarball approach is the indirection we need to solve that. Update: In my experiment with source-only tarballs for Libntlm I actually did use git-archive output. What do you think?

Arturo Borrero Gonz lez: Kubecon and CloudNativeCon 2024 Europe summary

Kubecon EU 2024 Paris logo This blog post shares my thoughts on attending Kubecon and CloudNativeCon 2024 Europe in Paris. It was my third time at this conference, and it felt bigger than last year s in Amsterdam. Apparently it had an impact on public transport. I missed part of the opening keynote because of the extremely busy rush hour tram in Paris. On Artificial Intelligence, Machine Learning and GPUs Talks about AI, ML, and GPUs were everywhere this year. While it wasn t my main interest, I did learn about GPU resource sharing and power usage on Kubernetes. There were also ideas about offering Models-as-a-Service, which could be cool for Wikimedia Toolforge in the future. See also: On security, policy and authentication This was probably the main interest for me in the event, given Wikimedia Toolforge was about to migrate away from Pod Security Policy, and we were currently evaluating different alternatives. In contrast to my previous attendances to Kubecon, where there were three policy agents with presence in the program schedule, Kyverno, Kubewarden and OpenPolicyAgent (OPA), this time only OPA had the most relevant sessions. One surprising bit I got from one of the OPA sessions was that it could work to authorize linux PAM sessions. Could this be useful for Wikimedia Toolforge? OPA talk I attended several sessions related to authentication topics. I discovered the keycloak software, which looks very promising. I also attended an Oauth2 session which I had a hard time following, because I clearly missed some additional knowledge about how Oauth2 works internally. I also attended a couple of sessions that ended up being a vendor sales talk. See also: On container image builds, harbor registry, etc This topic was also of interest to me because, again, it is a core part of Wikimedia Toolforge. I attended a couple of sessions regarding container image builds, including topics like general best practices, image minimization, and buildpacks. I learned about kpack, which at first sight felt like a nice simplification of how the Toolforge build service was implemented. I also attended a session by the Harbor project maintainers where they shared some valuable information on things happening soon or in the future , for example: On networking I attended a couple of sessions regarding networking. One session in particular I paid special attention to, ragarding on network policies. They discussed new semantics being added to the Kubernetes API. The different layers of abstractions being added to the API, the different hook points, and override layers clearly resembled (to me at least) the network packet filtering stack of the linux kernel (netfilter), but without the 20 (plus) years of experience building the right semantics and user interfaces. Network talk I very recently missed some semantics for limiting the number of open connections per namespace, see Phabricator T356164: [toolforge] several tools get periods of connection refused (104) when connecting to wikis This functionality should be available in the lower level tools, I mean Netfilter. I may submit a proposal upstream at some point, so they consider adding this to the Kubernetes API. Final notes In general, I believe I learned many things, and perhaps even more importantly I re-learned some stuff I had forgotten because of lack of daily exposure. I m really happy that the cloud native way of thinking was reinforced in me, which I still need because most of my muscle memory to approach systems architecture and engineering is from the old pre-cloud days. That being said, I felt less engaged with the content of the conference schedule compared to last year. I don t know if the schedule itself was less interesting, or that I m losing interest? Finally, not an official track in the conference, but we met a bunch of folks from Wikimedia Deutschland. We had a really nice time talking about how wikibase.cloud uses Kubernetes, whether they could run in Wikimedia Cloud Services, and why structured data is so nice. Group photo

24 March 2024

Jacob Adams: Regular Reboots

Uptime is often considered a measure of system reliability, an indication that the running software is stable and can be counted on. However, this hides the insidious build-up of state throughout the system as it runs, the slow drift from the expected to the strange. As Nolan Lawson highlights in an excellent post entitled Programmers are bad at managing state, state is the most challenging part of programming. It s why did you try turning it off and on again is a classic tech support response to any problem. In addition to the problem of state, installing regular updates periodically requires a reboot, even if the rest of the process is automated through a tool like unattended-upgrades. For my personal homelab, I manage a handful of different machines running various services. I used to just schedule a day to update and reboot all of them, but that got very tedious very quickly. I then moved the reboot to a cronjob, and then recently to a systemd timer and service. I figure that laying out my path to better management of this might help others, and will almost certainly lead to someone telling me a better way to do this. UPDATE: Turns out there s another option for better systemd cron integration. See systemd-cron below.

Stage One: Reboot Cron The first, and easiest approach, is a simple cron job. Just adding the following line to /var/spool/cron/crontabs/root1 is enough to get your machine to reboot once a month2 on the 6th at 8:00 AM3:
0 8 6 * * reboot
I had this configured for many years and it works well. But you have no indication as to whether it succeeds except for checking your uptime regularly yourself.

Stage Two: Reboot systemd Timer The next evolution of this approach for me was to use a systemd timer. I created a regular-reboot.timer with the following contents:
[Unit]
Description=Reboot on a Regular Basis
[Timer]
Unit=regular-reboot.service
OnBootSec=1month
[Install]
WantedBy=timers.target
This timer will trigger the regular-reboot.service systemd unit when the system reaches one month of uptime. I ve seen some guides to creating timer units recommend adding a Wants=regular-reboot.service to the [Unit] section, but this has the consequence of running that service every time it starts the timer. In this case that will just reboot your system on startup which is not what you want. Care needs to be taken to use the OnBootSec directive instead of OnCalendar or any of the other time specifications, as your system could reboot, discover its still within the expected window and reboot again. With OnBootSec your system will not have that problem. Technically, this same problem could have occurred with the cronjob approach, but in practice it never did, as the systems took long enough to come back up that they were no longer within the expected window for the job. I then added the regular-reboot.service:
[Unit]
Description=Reboot on a Regular Basis
Wants=regular-reboot.timer
[Service]
Type=oneshot
ExecStart=shutdown -r 02:45
You ll note that this service is actually scheduling a specific reboot time via the shutdown command instead of just immediately rebooting. This is a bit of a hack needed because I can t control when the timer runs exactly when using OnBootSec. This way different systems have different reboot times so that everything doesn t just reboot and fail all at once. Were something to fail to come back up I would have some time to fix it, as each machine has a few hours between scheduled reboots. One you have both files in place, you ll simply need to reload configuration and then enable and start the timer unit:
systemctl daemon-reload
systemctl enable --now regular-reboot.timer
You can then check when it will fire next:
# systemctl status regular-reboot.timer
  regular-reboot.timer - Reboot on a Regular Basis
     Loaded: loaded (/etc/systemd/system/regular-reboot.timer; enabled; preset: enabled)
     Active: active (waiting) since Wed 2024-03-13 01:54:52 EDT; 1 week 4 days ago
    Trigger: Fri 2024-04-12 12:24:42 EDT; 2 weeks 4 days left
   Triggers:   regular-reboot.service
Mar 13 01:54:52 dorfl systemd[1]: Started regular-reboot.timer - Reboot on a Regular Basis.

Sidenote: Replacing all Cron Jobs with systemd Timers More generally, I ve now replaced all cronjobs on my personal systems with systemd timer units, mostly because I can now actually track failures via prometheus-node-exporter. There are plenty of ways to hack in cron support to the node exporter, but just moving to systemd units provides both support for tracking failure and logging, both of which make system administration much easier when things inevitably go wrong.

systemd-cron An alternative to converting everything by hand, if you happen to have a lot of cronjobs is systemd-cron. It will make each crontab and /etc/cron.* directory into automatic service and timer units. Thanks to Alexandre Detiste for letting me know about this project. I have few enough cron jobs that I ve already converted, but for anyone looking at a large number of jobs to convert you ll want to check it out!

Stage Three: Monitor that it s working The final step here is confirm that these units actually work, beyond just firing regularly. I now have the following rule in my prometheus-alertmanager rules:
  - alert: UptimeTooHigh
    expr: (time() - node_boot_time_seconds job="node" ) / 86400 > 35
    annotations:
      summary: "Instance  Has Been Up Too Long!"
      description: "Instance  Has Been Up Too Long!"
This will trigger an alert anytime that I have a machine up for more than 35 days. This actually helped me track down one machine that I had forgotten to set up this new unit on4.

Not everything needs to scale Is It Worth The Time One of the most common fallacies programmers fall into is that we will jump to automating a solution before we stop and figure out how much time it would even save. In taking a slow improvement route to solve this problem for myself, I ve managed not to invest too much time5 in worrying about this but also achieved a meaningful improvement beyond my first approach of doing it all by hand.
  1. You could also add a line to /etc/crontab or drop a script into /etc/cron.monthly depending on your system.
  2. Why once a month? Mostly to avoid regular disruptions, but still be reasonably timely on updates.
  3. If you re looking to understand the cron time format I recommend crontab guru.
  4. In the long term I really should set up something like ansible to automatically push fleetwide changes like this but with fewer machines than fingers this seems like overkill.
  5. Of course by now writing about it, I ve probably doubled the amount of time I ve spent thinking about this topic but oh well

23 March 2024

Erich Schubert: Do not get Amazon Kids+ or a Fire HD Kids

The Amazon Kids parental controls are extremely insufficient, and I strongly advise against getting any of the Amazon Kids series. The initial permise (and some older reviews) look okay: you can set some time limits, and you can disable anything that requires buying. With the hardware you get one year of the Amazon Kids+ subscription, which includes a lot of interesting content such as books and audio, but also some apps. This seemed attractive: some learning apps, some decent games. Sometimes there seems to be a special Amazon Kids+ edition , supposedly one that has advertisements reduced/removed and no purchasing. However, there are so many things just wrong in Amazon Kids: And, unfortunately, Amazon Kids is full of poor content for kids, such as DIY Fashion Star that I consider to be very dangerous for kids: it is extremely stereotypical, beginning with supposedly female color schemes, model-only body types, and judging people by their clothing (and body). You really thought you could hand-pick suitable apps for your kid on your own? No, you have to identify and remove such contents one by one, with many clicks each, because there is no whitelisting, and no mass-removal (anymore - apparently Amazon removed the workarounds that previously allowed you to mass remove contents). Not with Amazon Kids+, which apparently aims at raising the next generation of zombie customers that buy whatever you tell them to buy. Hence, do not get your kids an Amazon Fire HD tablet!

21 March 2024

Ian Jackson: How to use Rust on Debian (and Ubuntu, etc.)

tl;dr: Don t just apt install rustc cargo. Either do that and make sure to use only Rust libraries from your distro (with the tiresome config runes below); or, just use rustup. Don t do the obvious thing; it s never what you want Debian ships a Rust compiler, and a large number of Rust libraries. But if you just do things the obvious default way, with apt install rustc cargo, you will end up using Debian s compiler but upstream libraries, directly and uncurated from crates.io. This is not what you want. There are about two reasonable things to do, depending on your preferences. Q. Download and run whatever code from the internet? The key question is this: Are you comfortable downloading code, directly from hundreds of upstream Rust package maintainers, and running it ? That s what cargo does. It s one of the main things it s for. Debian s cargo behaves, in this respect, just like upstream s. Let me say that again: Debian s cargo promiscuously downloads code from crates.io just like upstream cargo. So if you use Debian s cargo in the most obvious way, you are still downloading and running all those random libraries. The only thing you re avoiding downloading is the Rust compiler itself, which is precisely the part that is most carefully maintained, and of least concern. Debian s cargo can even download from crates.io when you re building official Debian source packages written in Rust: if you run dpkg-buildpackage, the downloading is suppressed; but a plain cargo build will try to obtain and use dependencies from the upstream ecosystem. ( Happily , if you do this, it s quite likely to bail out early due to version mismatches, before actually downloading anything.) Option 1: WTF, no I don t want curl bash OK, but then you must limit yourself to libraries available within Debian. Each Debian release provides a curated set. It may or may not be sufficient for your needs. Many capable programs can be written using the packages in Debian. But any upstream Rust project that you encounter is likely to be a pain to get working, unless their maintainers specifically intend to support this. (This is fairly rare, and the Rust tooling doesn t make it easy.) To go with this plan, apt install rustc cargo and put this in your configuration, in $HOME/.cargo/config.toml:
[source.debian-packages]
directory = "/usr/share/cargo/registry"
[source.crates-io]
replace-with = "debian-packages"
This causes cargo to look in /usr/share for dependencies, rather than downloading them from crates.io. You must then install the librust-FOO-dev packages for each of your dependencies, with apt. This will allow you to write your own program in Rust, and build it using cargo build. Option 2: Biting the curl bash bullet If you want to build software that isn t specifically targeted at Debian s Rust you will probably need to use packages from crates.io, not from Debian. If you re doing to do that, there is little point not using rustup to get the latest compiler. rustup s install rune is alarming, but cargo will be doing exactly the same kind of thing, only worse (because it trusts many more people) and more hidden. So in this case: do run the curl bash install rune. Hopefully the Rust project you are trying to build have shipped a Cargo.lock; that contains hashes of all the dependencies that they last used and tested. If you run cargo build --locked, cargo will only use those versions, which are hopefully OK. And you can run cargo audit to see if there are any reported vulnerabilities or problems. But you ll have to bootstrap this with cargo install --locked cargo-audit; cargo-audit is from the RUSTSEC folks who do care about these kind of things, so hopefully running their code (and their dependencies) is fine. Note the --locked which is needed because cargo s default behaviour is wrong. Privilege separation This approach is rather alarming. For my personal use, I wrote a privsep tool which allows me to run all this upstream Rust code as a separate user. That tool is nailing-cargo. It s not particularly well productised, or tested, but it does work for at least one person besides me. You may wish to try it out, or consider alternative arrangements. Bug reports and patches welcome. OMG what a mess Indeed. There are large number of technical and social factors at play. cargo itself is deeply troubling, both in principle, and in detail. I often find myself severely disappointed with its maintainers decisions. In mitigation, much of the wider Rust upstream community does takes this kind of thing very seriously, and often makes good choices. RUSTSEC is one of the results. Debian s technical arrangements for Rust packaging are quite dysfunctional, too: IMO the scheme is based on fundamentally wrong design principles. But, the Debian Rust packaging team is dynamic, constantly working the update treadmills; and the team is generally welcoming and helpful. Sadly last time I explored the possibility, the Debian Rust Team didn t have the appetite for more fundamental changes to the workflow (including, for example, changes to dependency version handling). Significant improvements to upstream cargo s approach seem unlikely, too; we can only hope that eventually someone might manage to supplant it.
edited 2024-03-21 21:49 to add a cut tag


comment count unavailable comments

19 March 2024

Colin Watson: apt install everything?

On Mastodon, the question came up of how Ubuntu would deal with something like the npm install everything situation. I replied:
Ubuntu is curated, so it probably wouldn t get this far. If it did, then the worst case is that it would get in the way of CI allowing other packages to be removed (again from a curated system, so people are used to removal not being self-service); but the release team would have no hesitation in removing a package like this to fix that, and it certainly wouldn t cause this amount of angst. If you did this in a PPA, then I can t think of any particular negative effects.
OK, if you added lots of build-dependencies (as well as run-time dependencies) then you might be able to take out a builder. But Launchpad builders already run arbitrary user-submitted code by design and are therefore very carefully sandboxed and treated as ephemeral, so this is hardly novel. There s a lot to be said for the arrangement of having a curated system for the stuff people actually care about plus an ecosystem of add-on repositories. PPAs cover a wide range of levels of developer activity, from throwaway experiments to quasi-official distribution methods; there are certainly problems that arise from it being difficult to tell the difference between those extremes and from there being no systematic confinement, but for this particular kind of problem they re very nearly ideal. (Canonical has tried various other approaches to software distribution, and while they address some of the problems, they aren t obviously better at helping people make reliable social judgements about code they don t know.) For a hypothetical package with a huge number of dependencies, to even try to upload it directly to Ubuntu you d need to be an Ubuntu developer with upload rights (or to go via Debian, where you d have to clear a similar hurdle). If you have those, then the first upload has to pass manual review by an archive administrator. If your package passes that, then it still has to build and get through proposed-migration CI before it reaches anything that humans typically care about. On the other hand, if you were inclined to try this sort of experiment, you d almost certainly try it in a PPA, and that would trouble nobody but yourself.

18 March 2024

Simon Josefsson: Apt archive mirrors in Git-LFS

My effort to improve transparency and confidence of public apt archives continues. I started to work on this in Apt Archive Transparency in which I mention the debdistget project in passing. Debdistget is responsible for mirroring index files for some public apt archives. I ve realized that having a publicly auditable and preserved mirror of the apt repositories is central to being able to do apt transparency work, so the debdistget project has become more central to my project than I thought. Currently I track Trisquel, PureOS, Gnuinos and their upstreams Ubuntu, Debian and Devuan. Debdistget download Release/Package/Sources files and store them in a git repository published on GitLab. Due to size constraints, it uses two repositories: one for the Release/InRelease files (which are small) and one that also include the Package/Sources files (which are large). See for example the repository for Trisquel release files and the Trisquel package/sources files. Repositories for all distributions can be found in debdistutils archives GitLab sub-group. The reason for splitting into two repositories was that the git repository for the combined files become large, and that some of my use-cases only needed the release files. Currently the repositories with packages (which contain a couple of months worth of data now) are 9GB for Ubuntu, 2.5GB for Trisquel/Debian/PureOS, 970MB for Devuan and 450MB for Gnuinos. The repository size is correlated to the size of the archive (for the initial import) plus the frequency and size of updates. Ubuntu s use of Apt Phased Updates (which triggers a higher churn of Packages file modifications) appears to be the primary reason for its larger size. Working with large Git repositories is inefficient and the GitLab CI/CD jobs generate quite some network traffic downloading the git repository over and over again. The most heavy user is the debdistdiff project that download all distribution package repositories to do diff operations on the package lists between distributions. The daily job takes around 80 minutes to run, with the majority of time is spent on downloading the archives. Yes I know I could look into runner-side caching but I dislike complexity caused by caching. Fortunately not all use-cases requires the package files. The debdistcanary project only needs the Release/InRelease files, in order to commit signatures to the Sigstore and Sigsum transparency logs. These jobs still run fairly quickly, but watching the repository size growth worries me. Currently these repositories are at Debian 440MB, PureOS 130MB, Ubuntu/Devuan 90MB, Trisquel 12MB, Gnuinos 2MB. Here I believe the main size correlation is update frequency, and Debian is large because I track the volatile unstable. So I hit a scalability end with my first approach. A couple of months ago I solved this by discarding and resetting these archival repositories. The GitLab CI/CD jobs were fast again and all was well. However this meant discarding precious historic information. A couple of days ago I was reaching the limits of practicality again, and started to explore ways to fix this. I like having data stored in git (it allows easy integration with software integrity tools such as GnuPG and Sigstore, and the git log provides a kind of temporal ordering of data), so it felt like giving up on nice properties to use a traditional database with on-disk approach. So I started to learn about Git-LFS and understanding that it was able to handle multi-GB worth of data that looked promising. Fairly quickly I scripted up a GitLab CI/CD job that incrementally update the Release/Package/Sources files in a git repository that uses Git-LFS to store all the files. The repository size is now at Ubuntu 650kb, Debian 300kb, Trisquel 50kb, Devuan 250kb, PureOS 172kb and Gnuinos 17kb. As can be expected, jobs are quick to clone the git archives: debdistdiff pipelines went from a run-time of 80 minutes down to 10 minutes which more reasonable correlate with the archive size and CPU run-time. The LFS storage size for those repositories are at Ubuntu 15GB, Debian 8GB, Trisquel 1.7GB, Devuan 1.1GB, PureOS/Gnuinos 420MB. This is for a couple of days worth of data. It seems native Git is better at compressing/deduplicating data than Git-LFS is: the combined size for Ubuntu is already 15GB for a couple of days data compared to 8GB for a couple of months worth of data with pure Git. This may be a sub-optimal implementation of Git-LFS in GitLab but it does worry me that this new approach will be difficult to scale too. At some level the difference is understandable, Git-LFS probably store two different Packages files around 90MB each for Trisquel as two 90MB files, but native Git would store it as one compressed version of the 90MB file and one relatively small patch to turn the old files into the next file. So the Git-LFS approach surprisingly scale less well for overall storage-size. Still, the original repository is much smaller, and you usually don t have to pull all LFS files anyway. So it is net win. Throughout this work, I kept thinking about how my approach relates to Debian s snapshot service. Ultimately what I would want is a combination of these two services. To have a good foundation to do transparency work I would want to have a collection of all Release/Packages/Sources files ever published, and ultimately also the source code and binaries. While it makes sense to start on the latest stable releases of distributions, this effort should scale backwards in time as well. For reproducing binaries from source code, I need to be able to securely find earlier versions of binary packages used for rebuilds. So I need to import all the Release/Packages/Sources packages from snapshot into my repositories. The latency to retrieve files from that server is slow so I haven t been able to find an efficient/parallelized way to download the files. If I m able to finish this, I would have confidence that my new Git-LFS based approach to store these files will scale over many years to come. This remains to be seen. Perhaps the repository has to be split up per release or per architecture or similar. Another factor is storage costs. While the git repository size for a Git-LFS based repository with files from several years may be possible to sustain, the Git-LFS storage size surely won t be. It seems GitLab charges the same for files in repositories and in Git-LFS, and it is around $500 per 100GB per year. It may be possible to setup a separate Git-LFS backend not hosted at GitLab to serve the LFS files. Does anyone know of a suitable server implementation for this? I had a quick look at the Git-LFS implementation list and it seems the closest reasonable approach would be to setup the Gitea-clone Forgejo as a self-hosted server. Perhaps a cloud storage approach a la S3 is the way to go? The cost to host this on GitLab will be manageable for up to ~1TB ($5000/year) but scaling it to storing say 500TB of data would mean an yearly fee of $2.5M which seems like poor value for the money. I realized that ultimately I would want a git repository locally with the entire content of all apt archives, including their binary and source packages, ever published. The storage requirements for a service like snapshot (~300TB of data?) is today not prohibitly expensive: 20TB disks are $500 a piece, so a storage enclosure with 36 disks would be around $18.000 for 720TB and using RAID1 means 360TB which is a good start. While I have heard about ~TB-sized Git-LFS repositories, would Git-LFS scale to 1PB? Perhaps the size of a git repository with multi-millions number of Git-LFS pointer files will become unmanageable? To get started on this approach, I decided to import a mirror of Debian s bookworm for amd64 into a Git-LFS repository. That is around 175GB so reasonable cheap to host even on GitLab ($1000/year for 200GB). Having this repository publicly available will make it possible to write software that uses this approach (e.g., porting debdistreproduce), to find out if this is useful and if it could scale. Distributing the apt repository via Git-LFS would also enable other interesting ideas to protecting the data. Consider configuring apt to use a local file:// URL to this git repository, and verifying the git checkout using some method similar to Guix s approach to trusting git content or Sigstore s gitsign. A naive push of the 175GB archive in a single git commit ran into pack size limitations: remote: fatal: pack exceeds maximum allowed size (4.88 GiB) however breaking up the commit into smaller commits for parts of the archive made it possible to push the entire archive. Here are the commands to create this repository: git init
git lfs install
git lfs track 'dists/**' 'pool/**'
git add .gitattributes
git commit -m"Add Git-LFS track attributes." .gitattributes
time debmirror --method=rsync --host ftp.se.debian.org --root :debian --arch=amd64 --source --dist=bookworm,bookworm-updates --section=main --verbose --diff=none --keyring /usr/share/keyrings/debian-archive-keyring.gpg --ignore .git .
git add dists project
git commit -m"Add." -a
git remote add origin git@gitlab.com:debdistutils/archives/debian/mirror.git
git push --set-upstream origin --all
for d in pool//; do
echo $d;
time git add $d;
git commit -m"Add $d." -a
git push
done
The resulting repository size is around 27MB with Git LFS object storage around 174GB. I think this approach would scale to handle all architectures for one release, but working with a single git repository for all releases for all architectures may lead to a too large git repository (>1GB). So maybe one repository per release? These repositories could also be split up on a subset of pool/ files, or there could be one repository per release per architecture or sources. Finally, I have concerns about using SHA1 for identifying objects. It seems both Git and Debian s snapshot service is currently using SHA1. For Git there is SHA-256 transition and it seems GitLab is working on support for SHA256-based repositories. For serious long-term deployment of these concepts, it would be nice to go for SHA256 identifiers directly. Git-LFS already uses SHA256 but Git internally uses SHA1 as does the Debian snapshot service. What do you think? Happy Hacking!

15 March 2024

Patryk Cisek: OpenPGP Paper Backup

openpgp-paper-backup I ve been using OpenPGP through GnuPG since early 2000 . It s an essential part of Debian Developer s workflow. We use it regularly to authenticate package uploads and votes. Proper backups of that key are really important. Up until recently, the only reliable option for me was backing up a tarball of my ~/.gnupg offline on a set few flash drives. This approach is better than nothing, but it s not nearly as reliable as I d like it to be.

14 March 2024

Matthew Garrett: Digital forgeries are hard

Closing arguments in the trial between various people and Craig Wright over whether he's Satoshi Nakamoto are wrapping up today, amongst a bewildering array of presented evidence. But one utterly astonishing aspect of this lawsuit is that expert witnesses for both sides agreed that much of the digital evidence provided by Craig Wright was unreliable in one way or another, generally including indications that it wasn't produced at the point in time it claimed to be. And it's fascinating reading through the subtle (and, in some cases, not so subtle) ways that that's revealed.

One of the pieces of evidence entered is screenshots of data from Mind Your Own Business, a business management product that's been around for some time. Craig Wright relied on screenshots of various entries from this product to support his claims around having controlled meaningful number of bitcoin before he was publicly linked to being Satoshi. If these were authentic then they'd be strong evidence linking him to the mining of coins before Bitcoin's public availability. Unfortunately the screenshots themselves weren't contemporary - the metadata shows them being created in 2020. This wouldn't fundamentally be a problem (it's entirely reasonable to create new screenshots of old material), as long as it's possible to establish that the material shown in the screenshots was created at that point. Sadly, well.

One part of the disclosed information was an email that contained a zip file that contained a raw database in the format used by MYOB. Importing that into the tool allowed an audit record to be extracted - this record showed that the relevant entries had been added to the database in 2020, shortly before the screenshots were created. This was, obviously, not strong evidence that Craig had held Bitcoin in 2009. This evidence was reported, and was responded to with a couple of additional databases that had an audit trail that was consistent with the dates in the records in question. Well, partially. The audit record included session data, showing an administrator logging into the data base in 2011 and then, uh, logging out in 2023, which is rather more consistent with someone changing their system clock to 2011 to create an entry, and switching it back to present day before logging out. In addition, the audit log included fields that didn't exist in versions of the product released before 2016, strongly suggesting that the entries dated 2009-2011 were created in software released after 2016. And even worse, the order of insertions into the database didn't line up with calendar time - an entry dated before another entry may appear in the database afterwards, indicating that it was created later. But even more obvious? The database schema used for these old entries corresponded to a version of the software released in 2023.

This is all consistent with the idea that these records were created after the fact and backdated to 2009-2011, and that after this evidence was made available further evidence was created and backdated to obfuscate that. In an unusual turn of events, during the trial Craig Wright introduced further evidence in the form of a chain of emails to his former lawyers that indicated he had provided them with login details to his MYOB instance in 2019 - before the metadata associated with the screenshots. The implication isn't entirely clear, but it suggests that either they had an opportunity to examine this data before the metadata suggests it was created, or that they faked the data? So, well, the obvious thing happened, and his former lawyers were asked whether they received these emails. The chain consisted of three emails, two of which they confirmed they'd received. And they received a third email in the chain, but it was different to the one entered in evidence. And, uh, weirdly, they'd received a copy of the email that was submitted - but they'd received it a few days earlier. In 2024.

And again, the forensic evidence is helpful here! It turns out that the email client used associates a timestamp with any attachments, which in this case included an image in the email footer - and the mysterious time travelling email had a timestamp in 2024, not 2019. This was created by the client, so was consistent with the email having been sent in 2024, not being sent in 2019 and somehow getting stuck somewhere before delivery. The date header indicates 2019, as do encoded timestamps in the MIME headers - consistent with the mail being sent by a computer with the clock set to 2019.

But there's a very weird difference between the copy of the email that was submitted in evidence and the copy that was located afterwards! The first included a header inserted by gmail that included a 2019 timestamp, while the latter had a 2024 timestamp. Is there a way to determine which of these could be the truth? It turns out there is! The format of that header changed in 2022, and the version in the email is the new version. The version with the 2019 timestamp is anachronistic - the format simply doesn't match the header that gmail would have introduced in 2019, suggesting that an email sent in 2022 or later was modified to include a timestamp of 2019.

This is by no means the only indication that Craig Wright's evidence may be misleading (there's the whole argument that the Bitcoin white paper was written in LaTeX when general consensus is that it's written in OpenOffice, given that's what the metadata claims), but it's a lovely example of a more general issue.

Our technology chains are complicated. So many moving parts end up influencing the content of the data we generate, and those parts develop over time. It's fantastically difficult to generate an artifact now that precisely corresponds to how it would look in the past, even if we go to the effort of installing an old OS on an old PC and setting the clock appropriately (are you sure you're going to be able to mimic an entirely period appropriate patch level?). Even the version of the font you use in a document may indicate it's anachronistic. I'm pretty good at computers and I no longer have any belief I could fake an old document.

(References: this Dropbox, under "Expert reports", "Patrick Madden". Initial MYOB data is in "Appendix PM7", further analysis is in "Appendix PM42", email analysis is "Sixth Expert Report of Mr Patrick Madden")

comment count unavailable comments

Next.