Search Results: "Simon Josefsson"

13 April 2024

Simon Josefsson: Reproducible and minimal source-only tarballs

With the release of Libntlm version 1.8 the release tarball can be reproduced on several distributions. We also publish a signed minimal source-only tarball, produced by git-archive which is the same format used by Savannah, Codeberg, GitLab, GitHub and others. Reproducibility of both tarballs are tested continuously for regressions on GitLab through a CI/CD pipeline. If that wasn t enough to excite you, the Debian packages of Libntlm are now built from the reproducible minimal source-only tarball. The resulting binaries are hopefully reproducible on several architectures. What does that even mean? Why should you care? How you can do the same for your project? What are the open issues? Read on, dear reader This article describes my practical experiments with reproducible release artifacts, following up on my earlier thoughts that lead to discussion on Fosstodon and a patch by Janneke Nieuwenhuizen to make Guix tarballs reproducible that inspired me to some practical work. Let s look at how a maintainer release some software, and how a user can reproduce the released artifacts from the source code. Libntlm provides a shared library written in C and uses GNU Make, GNU Autoconf, GNU Automake, GNU Libtool and gnulib for build management, but these ideas should apply to most project and build system. The following illustrate the steps a maintainer would take to prepare a release:

git clone https://gitlab.com/gsasl/libntlm.git
cd libntlm
git checkout v1.8
./bootstrap
./configure
make distcheck
gpg -b libntlm-1.8.tar.gz

The generated files libntlm-1.8.tar.gz and libntlm-1.8.tar.gz.sig are published, and users download and use them. This is how the GNU project have been doing releases since the late 1980 s. That is a testament to how successful this pattern has been! These tarballs contain source code and some generated files, typically shell scripts generated by autoconf, makefile templates generated by automake, documentation in formats like Info, HTML, or PDF. Rarely do they contain binary object code, but historically that happened. The XZUtils incident illustrate that tarballs with files that are not included in the git archive offer an opportunity to disguise malicious backdoors. I blogged earlier how to mitigate this risk by using signed minimal source-only tarballs. The risk of hiding malware is not the only motivation to publish signed minimal source-only tarballs. With pre-generated content in tarballs, there is a risk that GNU/Linux distributions such as Trisquel, Guix, Debian/Ubuntu or Fedora ship generated files coming from the tarball into the binary *.deb or *.rpm package file. Typically the person packaging the upstream project never realized that some installed artifacts was not re-built through a typical autoconf -fi && ./configure && make install sequence, and never wrote the code to rebuild everything. This can also happen if the build rules are written but are buggy, shipping the old artifact. When a security problem is found, this can lead to time-consuming situations, as it may be that patching the relevant source code and rebuilding the package is not sufficient: the vulnerable generated object from the tarball would be shipped into the binary package instead of a rebuilt artifact. For architecture-specific binaries this rarely happens, since object code is usually not included in tarballs although for 10+ years I shipped the binary Java JAR file in the GNU Libidn release tarball, until I stopped shipping it. For interpreted languages and especially for generated content such as HTML, PDF, shell scripts this happens more than you would like. Publishing minimal source-only tarballs enable easier auditing of a project s code, to avoid the need to read through all generated files looking for malicious content. I have taken care to generate the source-only minimal tarball using git-archive. This is the same format that GitLab, GitHub etc offer for the automated download links on git tags. The minimal source-only tarballs can thus serve as a way to audit GitLab and GitHub download material! Consider if/when hosting sites like GitLab or GitHub has a security incident that cause generated tarballs to include a backdoor that is not present in the git repository. If people rely on the tag download artifact without verifying the maintainer PGP signature using GnuPG, this can lead to similar backdoor scenarios that we had for XZUtils but originated with the hosting provider instead of the release manager. This is even more concerning, since this attack can be mounted for some selected IP address that you want to target and not on everyone, thereby making it harder to discover. With all that discussion and rationale out of the way, let s return to the release process. I have added another step here:

make srcdist
gpg -b libntlm-1.8-src.tar.gz

Now the release is ready. I publish these four files in the Libntlm s Savannah Download area, but they can be uploaded to a GitLab/GitHub release area as well. These are the SHA256 checksums I got after building the tarballs on my Trisquel 11 aramo laptop:

91de864224913b9493c7a6cec2890e6eded3610d34c3d983132823de348ec2ca  libntlm-1.8-src.tar.gz
ce6569a47a21173ba69c990965f73eb82d9a093eb871f935ab64ee13df47fda1  libntlm-1.8.tar.gz

So how can you reproduce my artifacts? Here is how to reproduce them in a Ubuntu 22.04 container:

podman run -it --rm ubuntu:22.04
apt-get update
apt-get install -y --no-install-recommends autoconf automake libtool make git ca-certificates
git clone https://gitlab.com/gsasl/libntlm.git
cd libntlm
git checkout v1.8
./bootstrap
./configure
make dist srcdist
sha256sum libntlm-*.tar.gz

You should see the exact same SHA256 checksum values. Hooray! This works because Trisquel 11 and Ubuntu 22.04 uses the same version of git, autoconf, automake, and libtool. These tools do not guarantee the same output content for all versions, similar to how GNU GCC does not generate the same binary output for all versions. So there is still some delicate version pairing needed. Ideally, the artifacts should be possible to reproduce from the release artifacts themselves, and not only directly from git. It is possible to reproduce the full tarball in a AlmaLinux 8 container replace almalinux:8 with rockylinux:8 if you prefer RockyLinux:

podman run -it --rm almalinux:8
dnf update -y
dnf install -y make wget gcc
wget https://download.savannah.nongnu.org/releases/libntlm/libntlm-1.8.tar.gz
tar xfa libntlm-1.8.tar.gz
cd libntlm-1.8
./configure
make dist
sha256sum libntlm-1.8.tar.gz

The source-only minimal tarball can be regenerated on Debian 11:

podman run -it --rm debian:11
apt-get update
apt-get install -y --no-install-recommends make git ca-certificates
git clone https://gitlab.com/gsasl/libntlm.git
cd libntlm
git checkout v1.8
make -f cfg.mk srcdist
sha256sum libntlm-1.8-src.tar.gz

As the Magnus Opus or chef-d uvre, let s recreate the full tarball directly from the minimal source-only tarball on Trisquel 11 replace docker.io/kpengboy/trisquel:11.0 with ubuntu:22.04 if you prefer.

podman run -it --rm docker.io/kpengboy/trisquel:11.0
apt-get update
apt-get install -y --no-install-recommends autoconf automake libtool make wget git ca-certificates
wget https://download.savannah.nongnu.org/releases/libntlm/libntlm-1.8-src.tar.gz
tar xfa libntlm-1.8-src.tar.gz
cd libntlm-v1.8
./bootstrap
./configure
make dist
sha256sum libntlm-1.8.tar.gz

Yay! You should now have great confidence in that the release artifacts correspond to what s in version control and also to what the maintainer intended to release. Your remaining job is to audit the source code for vulnerabilities, including the source code of the dependencies used in the build. You no longer have to worry about auditing the release artifacts. I find it somewhat amusing that the build infrastructure for Libntlm is now in a significantly better place than the code itself. Libntlm is written in old C style with plenty of string manipulation and uses broken cryptographic algorithms such as MD4 and single-DES. Remember folks: solving supply chain security issues has no bearing on what kind of code you eventually run. A clean gun can still shoot you in the foot. Side note on naming: GitLab exports tarballs with pathnames libntlm-v1.8/ (i.e.., PROJECT-TAG/) and I ve adopted the same pathnames, which means my libntlm-1.8-src.tar.gz tarballs are bit-by-bit identical to GitLab s exports and you can verify this with tools like diffoscope. GitLab name the tarball libntlm-v1.8.tar.gz (i.e., PROJECT-TAG.ARCHIVE) which I find too similar to the libntlm-1.8.tar.gz that we also publish. GitHub uses the same git archive style, but unfortunately they have logic that removes the v in the pathname so you will get a tarball with pathname libntlm-1.8/ instead of libntlm-v1.8/ that GitLab and I use. The content of the tarball is bit-by-bit identical, but the pathname and archive differs. Codeberg (running Forgejo) uses another approach: the tarball is called libntlm-v1.8.tar.gz (after the tag) just like GitLab, but the pathname inside the archive is libntlm/, otherwise the produced archive is bit-by-bit identical including timestamps. Savannah s CGIT interface uses archive name libntlm-1.8.tar.gz with pathname libntlm-1.8/, but otherwise file content is identical. Savannah s GitWeb interface provides snapshot links that are named after the git commit (e.g., libntlm-a812c2ca.tar.gz with libntlm-a812c2ca/) and I cannot find any tag-based download links at all. Overall, we are so close to get SHA256 checksum to match, but fail on pathname within the archive. I ve chosen to be compatible with GitLab regarding the content of tarballs but not on archive naming. From a simplicity point of view, it would be nice if everyone used PROJECT-TAG.ARCHIVE for the archive filename and PROJECT-TAG/ for the pathname within the archive. This aspect will probably need more discussion. Side note on git archive output: It seems different versions of git archive produce different results for the same repository. The version of git in Debian 11, Trisquel 11 and Ubuntu 22.04 behave the same. The version of git in Debian 12, AlmaLinux/RockyLinux 8/9, Alpine, ArchLinux, macOS homebrew, and upcoming Ubuntu 24.04 behave in another way. Hopefully this will not change that often, but this would invalidate reproducibility of these tarballs in the future, forcing you to use an old git release to reproduce the source-only tarball. Alas, GitLab and most other sites appears to be using modern git so the download tarballs from them would not match my tarballs even though the content would. Side note on ChangeLog: ChangeLog files were traditionally manually curated files with version history for a package. In recent years, several projects moved to dynamically generate them from git history (using tools like git2cl or gitlog-to-changelog). This has consequences for reproducibility of tarballs: you need to have the entire git history available! The gitlog-to-changelog tool also output different outputs depending on the time zone of the person using it, which arguable is a simple bug that can be fixed. However this entire approach is incompatible with rebuilding the full tarball from the minimal source-only tarball. It seems Libntlm s ChangeLog file died on the surgery table here. So how would a distribution build these minimal source-only tarballs? I happen to help on the libntlm package in Debian. It has historically used the generated tarballs as the source code to build from. This means that code coming from gnulib is vendored in the tarball. When a security problem is discovered in gnulib code, the security team needs to patch all packages that include that vendored code and rebuild them, instead of merely patching the gnulib package and rebuild all packages that rely on that particular code. To change this, the Debian libntlm package needs to Build-Depends on Debian s gnulib package. But there was one problem: similar to most projects that use gnulib, Libntlm depend on a particular git commit of gnulib, and Debian only ship one commit. There is no coordination about which commit to use. I have adopted gnulib in Debian, and add a git bundle to the *_all.deb binary package so that projects that rely on gnulib can pick whatever commit they need. This allow an no-network GNULIB_URL and GNULIB_REVISION approach when running Libntlm s ./bootstrap with the Debian gnulib package installed. Otherwise libntlm would pick up whatever latest version of gnulib that Debian happened to have in the gnulib package, which is not what the Libntlm maintainer intended to be used, and can lead to all sorts of version mismatches (and consequently security problems) over time. Libntlm in Debian is developed and tested on Salsa and there is continuous integration testing of it as well, thanks to the Salsa CI team. Side note on git bundles: unfortunately there appears to be no reproducible way to export a git repository into one or more files. So one unfortunate consequence of all this work is that the gnulib *.orig.tar.gz tarball in Debian is not reproducible any more. I have tried to get Git bundles to be reproducible but I never got it to work see my notes in gnulib s debian/README.source on this aspect. Of course, source tarball reproducibility has nothing to do with binary reproducibility of gnulib in Debian itself, fortunately. One open question is how to deal with the increased build dependencies that is triggered by this approach. Some people are surprised by this but I don t see how to get around it: if you depend on source code for tools in another package to build your package, it is a bad idea to hide that dependency. We ve done it for a long time through vendored code in non-minimal tarballs. Libntlm isn t the most critical project from a bootstrapping perspective, so adding git and gnulib as Build-Depends to it will probably be fine. However, consider if this pattern was used for other packages that uses gnulib such as coreutils, gzip, tar, bison etc (all are using gnulib) then they would all Build-Depends on git and gnulib. Cross-building those packages for a new architecture will therefor require git on that architecture first, which gets circular quick. The dependency on gnulib is real so I don t see that going away, and gnulib is a Architecture:all package. However, the dependency on git is merely a consequence of how the Debian gnulib package chose to make all gnulib git commits available to projects: through a git bundle. There are other ways to do this that doesn t require the git tool to extract the necessary files, but none that I found practical ideas welcome! Finally some brief notes on how this was implemented. Enabling bootstrappable source-only minimal tarballs via gnulib s ./bootstrap is achieved by using the GNULIB_REVISION mechanism, locking down the gnulib commit used. I have always disliked git submodules because they add extra steps and has complicated interaction with CI/CD. The reason why I gave up git submodules now is because the particular commit to use is not recorded in the git archive output when git submodules is used. So the particular gnulib commit has to be mentioned explicitly in some source code that goes into the git archive tarball. Colin Watson added the GNULIB_REVISION approach to ./bootstrap back in 2018, and now it no longer made sense to continue to use a gnulib git submodule. One alternative is to use ./bootstrap with --gnulib-srcdir or --gnulib-refdir if there is some practical problem with the GNULIB_URL towards a git bundle the GNULIB_REVISION in bootstrap.conf. The srcdist make rule is simple:

git archive --prefix=libntlm-v1.8/ -o libntlm-v1.8.tar.gz HEAD

Making the make dist generated tarball reproducible can be more complicated, however for Libntlm it was sufficient to make sure the modification times of all files were set deterministically to the timestamp of the last commit in the git repository. Interestingly there seems to be a couple of different ways to accomplish this, Guix doesn t support minimal source-only tarballs but rely on a .tarball-timestamp file inside the tarball. Paul Eggert explained what TZDB is using some time ago. The approach I m using now is fairly similar to the one I suggested over a year ago. If there are problems because all files in the tarball now use the same modification time, there is a solution by Bruno Haible that could be implemented. Side note on git tags: Some people may wonder why not verify a signed git tag instead of verifying a signed tarball of the git archive. Currently most git repositories uses SHA-1 for git commit identities, but SHA-1 is not a secure hash function. While current SHA-1 attacks can be detected and mitigated, there are fundamental doubts that a git SHA-1 commit identity uniquely refers to the same content that was intended. Verifying a git tag will never offer the same assurance, since a git tag can be moved or re-signed at any time. Verifying a git commit is better but then we need to trust SHA-1. Migrating git to SHA-256 would resolve this aspect, but most hosting sites such as GitLab and GitHub does not support this yet. There are other advantages to using signed tarballs instead of signed git commits or git tags as well, e.g., tar.gz can be a deterministically reproducible persistent stable offline storage format but .git sub-directory trees or git bundles do not offer this property. Doing continous testing of all this is critical to make sure things don t regress. Libntlm s pipeline definition now produce the generated libntlm-*.tar.gz tarballs and a checksum as a build artifact. Then I added the 000-reproducability job which compares the checksums and fails on mismatches. You can read its delicate output in the job for the v1.8 release. Right now we insists that builds on Trisquel 11 match Ubuntu 22.04, that PureOS 10 builds match Debian 11 builds, that AlmaLinux 8 builds match RockyLinux 8 builds, and AlmaLinux 9 builds match RockyLinux 9 builds. As you can see in pipeline job output, not all platforms lead to the same tarballs, but hopefully this state can be improved over time. There is also partial reproducibility, where the full tarball is reproducible across two distributions but not the minimal tarball, or vice versa. If this way of working plays out well, I hope to implement it in other projects too. What do you think? Happy Hacking!

1 April 2024

Simon Josefsson: Towards reproducible minimal source code tarballs? On *-src.tar.gz

While the work to analyze the xz backdoor is in progress, several ideas have been suggested to improve the software supply chain ecosystem. Some of those ideas are good, some of the ideas are at best irrelevant and harmless, and some suggestions are plain bad. I d like to attempt to formalize two ideas, which have been discussed before, but the context in which they can be appreciated have not been as clear as it is today.

Reproducible tarballs. The idea is that published source tarballs should be possible to reproduce independently somehow, and that this should be continuously tested and verified preferrably as part of the upstream project continuous integration system (e.g., GitHub action or GitLab pipeline). While nominally this looks easy to achieve, there are some complex matters in this, for example: what timestamps to use for files in the tarball? I ve brought up this aspect before.
Minimal source tarballs without generated vendor files. Most GNU Autoconf/Automake-based tarballs pre-generated files which are important for bootstrapping on exotic systems that does not have the required dependencies. For the bootstrapping story to succeed, this approach is important to support. However it has become clear that this practice raise significant costs and risks. Most modern GNU/Linux distributions have all the required dependencies and actually prefers to re-build everything from source code. These pre-generated extra files introduce uncertainty to that process.

My strawman proposal to improve things is to define new tarball format *-src.tar.gz with at least the following properties:

The tarball should allow users to build the project, which is the entire purpose of all this. This means that at least all source code for the project has to be included.
The tarballs should be signed, for example with PGP or minisign.
The tarball should be possible to reproduce bit-by-bit by a third party using upstream s version controlled sources and a pointer to which revision was used (e.g., git tag or git commit).
The tarball should not require an Internet connection to download things.
- Corollary: every external dependency either has to be explicitly documented as such (e.g., gcc and GnuTLS), or included in the tarball.
- Observation: This means including all *.po gettext translations which are normally downloaded when building from version controlled sources.
The tarball should contain everything required to build the project from source using as much externally released versioned tooling as possible. This is the minimal property lacking today.
- Corollary: This means including a vendored copy of OpenSSL or libz is not acceptable: link to them as external projects.
- Open question: How about non-released external tooling such as gnulib or autoconf archive macros? This is a bit more delicate: most distributions either just package one current version of gnulib or autoconf archive, not previous versions. While this could change, and distributions could package the gnulib git repository (up to some current version) and the autoconf archive git repository and packages were set up to extract the version they need (gnulib s ./bootstrap already supports this via the gnulib-refdir parameter), this is not normally in place.
- Suggested Corollary: The tarball should contain content from git submodule s such as gnulib and the necessary Autoconf archive M4 macros required by the project.
Similar to how the GNU project specify the ./configure interface we need a documented interface for how to bootstrap the project. I suggest to use the already well established idiom of running ./bootstrap to set up the package to later be able to be built via ./configure. Of course, some projects are not using the autotool ./configure interface and will not follow this aspect either, but like most build systems that compete with autotools have instructions on how to build the project, they should document similar interfaces for bootstrapping the source tarball to allow building.

If tarballs that achieve the above goals were available from popular upstream projects, distributions could more easily use them instead of current tarballs that include pre-generated content. The advantage would be that the build process is not tainted by unnecessary files. We need to develop tools for maintainers to create these tarballs, similar to make dist that generate today s foo-1.2.3.tar.gz files. I think one common argument against this approach will be: Why bother with all that, and just use git-archive outputs? Or avoid the entire tarball approach and move directly towards version controlled check outs and referring to upstream releases as git URL and commit tag or id. One problem with this is that SHA-1 is broken, so placing trust in a SHA-1 identifier is simply not secure. Another counter-argument is that this optimize for packagers benefits at the cost of upstream maintainers: most upstream maintainers do not want to store gettext *.po translations in their source code repository. A compromise between the needs of maintainers and packagers is useful, so this *-src.tar.gz tarball approach is the indirection we need to solve that. Update: In my experiment with source-only tarballs for Libntlm I actually did use git-archive output. What do you think?

18 March 2024

Simon Josefsson: Apt archive mirrors in Git-LFS

My effort to improve transparency and confidence of public apt archives continues. I started to work on this in Apt Archive Transparency in which I mention the debdistget project in passing. Debdistget is responsible for mirroring index files for some public apt archives. I ve realized that having a publicly auditable and preserved mirror of the apt repositories is central to being able to do apt transparency work, so the debdistget project has become more central to my project than I thought. Currently I track Trisquel, PureOS, Gnuinos and their upstreams Ubuntu, Debian and Devuan. Debdistget download Release/Package/Sources files and store them in a git repository published on GitLab. Due to size constraints, it uses two repositories: one for the Release/InRelease files (which are small) and one that also include the Package/Sources files (which are large). See for example the repository for Trisquel release files and the Trisquel package/sources files. Repositories for all distributions can be found in debdistutils archives GitLab sub-group. The reason for splitting into two repositories was that the git repository for the combined files become large, and that some of my use-cases only needed the release files. Currently the repositories with packages (which contain a couple of months worth of data now) are 9GB for Ubuntu, 2.5GB for Trisquel/Debian/PureOS, 970MB for Devuan and 450MB for Gnuinos. The repository size is correlated to the size of the archive (for the initial import) plus the frequency and size of updates. Ubuntu s use of Apt Phased Updates (which triggers a higher churn of Packages file modifications) appears to be the primary reason for its larger size. Working with large Git repositories is inefficient and the GitLab CI/CD jobs generate quite some network traffic downloading the git repository over and over again. The most heavy user is the debdistdiff project that download all distribution package repositories to do diff operations on the package lists between distributions. The daily job takes around 80 minutes to run, with the majority of time is spent on downloading the archives. Yes I know I could look into runner-side caching but I dislike complexity caused by caching. Fortunately not all use-cases requires the package files. The debdistcanary project only needs the Release/InRelease files, in order to commit signatures to the Sigstore and Sigsum transparency logs. These jobs still run fairly quickly, but watching the repository size growth worries me. Currently these repositories are at Debian 440MB, PureOS 130MB, Ubuntu/Devuan 90MB, Trisquel 12MB, Gnuinos 2MB. Here I believe the main size correlation is update frequency, and Debian is large because I track the volatile unstable. So I hit a scalability end with my first approach. A couple of months ago I solved this by discarding and resetting these archival repositories. The GitLab CI/CD jobs were fast again and all was well. However this meant discarding precious historic information. A couple of days ago I was reaching the limits of practicality again, and started to explore ways to fix this. I like having data stored in git (it allows easy integration with software integrity tools such as GnuPG and Sigstore, and the git log provides a kind of temporal ordering of data), so it felt like giving up on nice properties to use a traditional database with on-disk approach. So I started to learn about Git-LFS and understanding that it was able to handle multi-GB worth of data that looked promising. Fairly quickly I scripted up a GitLab CI/CD job that incrementally update the Release/Package/Sources files in a git repository that uses Git-LFS to store all the files. The repository size is now at Ubuntu 650kb, Debian 300kb, Trisquel 50kb, Devuan 250kb, PureOS 172kb and Gnuinos 17kb. As can be expected, jobs are quick to clone the git archives: debdistdiff pipelines went from a run-time of 80 minutes down to 10 minutes which more reasonable correlate with the archive size and CPU run-time. The LFS storage size for those repositories are at Ubuntu 15GB, Debian 8GB, Trisquel 1.7GB, Devuan 1.1GB, PureOS/Gnuinos 420MB. This is for a couple of days worth of data. It seems native Git is better at compressing/deduplicating data than Git-LFS is: the combined size for Ubuntu is already 15GB for a couple of days data compared to 8GB for a couple of months worth of data with pure Git. This may be a sub-optimal implementation of Git-LFS in GitLab but it does worry me that this new approach will be difficult to scale too. At some level the difference is understandable, Git-LFS probably store two different Packages files around 90MB each for Trisquel as two 90MB files, but native Git would store it as one compressed version of the 90MB file and one relatively small patch to turn the old files into the next file. So the Git-LFS approach surprisingly scale less well for overall storage-size. Still, the original repository is much smaller, and you usually don t have to pull all LFS files anyway. So it is net win. Throughout this work, I kept thinking about how my approach relates to Debian s snapshot service. Ultimately what I would want is a combination of these two services. To have a good foundation to do transparency work I would want to have a collection of all Release/Packages/Sources files ever published, and ultimately also the source code and binaries. While it makes sense to start on the latest stable releases of distributions, this effort should scale backwards in time as well. For reproducing binaries from source code, I need to be able to securely find earlier versions of binary packages used for rebuilds. So I need to import all the Release/Packages/Sources packages from snapshot into my repositories. The latency to retrieve files from that server is slow so I haven t been able to find an efficient/parallelized way to download the files. If I m able to finish this, I would have confidence that my new Git-LFS based approach to store these files will scale over many years to come. This remains to be seen. Perhaps the repository has to be split up per release or per architecture or similar. Another factor is storage costs. While the git repository size for a Git-LFS based repository with files from several years may be possible to sustain, the Git-LFS storage size surely won t be. It seems GitLab charges the same for files in repositories and in Git-LFS, and it is around $500 per 100GB per year. It may be possible to setup a separate Git-LFS backend not hosted at GitLab to serve the LFS files. Does anyone know of a suitable server implementation for this? I had a quick look at the Git-LFS implementation list and it seems the closest reasonable approach would be to setup the Gitea-clone Forgejo as a self-hosted server. Perhaps a cloud storage approach a la S3 is the way to go? The cost to host this on GitLab will be manageable for up to ~1TB ($5000/year) but scaling it to storing say 500TB of data would mean an yearly fee of $2.5M which seems like poor value for the money. I realized that ultimately I would want a git repository locally with the entire content of all apt archives, including their binary and source packages, ever published. The storage requirements for a service like snapshot (~300TB of data?) is today not prohibitly expensive: 20TB disks are $500 a piece, so a storage enclosure with 36 disks would be around $18.000 for 720TB and using RAID1 means 360TB which is a good start. While I have heard about ~TB-sized Git-LFS repositories, would Git-LFS scale to 1PB? Perhaps the size of a git repository with multi-millions number of Git-LFS pointer files will become unmanageable? To get started on this approach, I decided to import a mirror of Debian s bookworm for amd64 into a Git-LFS repository. That is around 175GB so reasonable cheap to host even on GitLab ($1000/year for 200GB). Having this repository publicly available will make it possible to write software that uses this approach (e.g., porting debdistreproduce), to find out if this is useful and if it could scale. Distributing the apt repository via Git-LFS would also enable other interesting ideas to protecting the data. Consider configuring apt to use a local file:// URL to this git repository, and verifying the git checkout using some method similar to Guix s approach to trusting git content or Sigstore s gitsign. A naive push of the 175GB archive in a single git commit ran into pack size limitations: remote: fatal: pack exceeds maximum allowed size (4.88 GiB) however breaking up the commit into smaller commits for parts of the archive made it possible to push the entire archive. Here are the commands to create this repository:

git init
git lfs install
git lfs track 'dists/**' 'pool/**'
git add .gitattributes
git commit -m"Add Git-LFS track attributes." .gitattributes
time debmirror --method=rsync --host ftp.se.debian.org --root :debian --arch=amd64 --source --dist=bookworm,bookworm-updates --section=main --verbose --diff=none --keyring /usr/share/keyrings/debian-archive-keyring.gpg --ignore .git .
git add dists project
git commit -m"Add." -a
git remote add origin git@gitlab.com:debdistutils/archives/debian/mirror.git
git push --set-upstream origin --all
for d in pool//; do
   echo $d;
   time git add $d;
   git commit -m"Add $d." -a
   git push
done

The resulting repository size is around 27MB with Git LFS object storage around 174GB. I think this approach would scale to handle all architectures for one release, but working with a single git repository for all releases for all architectures may lead to a too large git repository (>1GB). So maybe one repository per release? These repositories could also be split up on a subset of pool/ files, or there could be one repository per release per architecture or sources. Finally, I have concerns about using SHA1 for identifying objects. It seems both Git and Debian s snapshot service is currently using SHA1. For Git there is SHA-256 transition and it seems GitLab is working on support for SHA256-based repositories. For serious long-term deployment of these concepts, it would be nice to go for SHA256 identifiers directly. Git-LFS already uses SHA256 but Git internally uses SHA1 as does the Debian snapshot service. What do you think? Happy Hacking!

10 January 2024

Simon Josefsson: Trisquel on arm64: Ampere Altra

Having had success running Trisquel on the ppc64 Talos II, I felt ready to get an arm64 machine running Trisquel. I have a Ampere Altra Developer Platform from ADLINK, which is a fairly powerful desktop machine. While there were some issues during installation, I m happy to say the machine is stable and everything appears to work fine.

ISO images for non-amd64 platforms are unfortunately still hidden from the main Trisquel download area, so you will have to use the following procedure to download and extract a netinst ISO image (using debian-installer) and write it to a USB memory device. Another unfortunate problem is that there are no OpenPGP signatures or hash checksums, but below I publish one checksum.

wget -q http://builds.trisquel.org/debian-installer-images/debian-installer-images_20210731+deb11u9+11.0trisquel15_arm64.tar.gz
tar xfa debian-installer-images_20210731+deb11u9+11.0trisquel15_arm64.tar.gz ./installer-arm64/20210731+deb11u9+11/images/netboot/mini.iso
echo '311732519cc8c7c1bb2fe873f134fdafb211ef3bcb5b0d2ecdc6ea4e3b336357  installer-arm64/20210731+deb11u9+11/images/netboot/mini.iso'   sha256sum -c
sudo wipefs -a /dev/sdX
sudo dd if=installer-arm64/20210731+deb11u9+11/images/netboot/mini.iso of=/dev/sdX conv=sync status=progress

Insert the USB stick in a USB slot in the machine, and power up. Press ESCAPE at the BIOS prompt and select the USB device as the boot device. The first problem that hit me was that translations didn t work, I selected Swedish but the strings were garbled. Rebooting and selecting the default English worked fine. For installation, you need Internet connectivity and I use the RJ45 port closest to VGA/serial which is available as enP5p1s0 in the installer. I wouldn t connect the BMC RJ45 port to anything unless you understand the security implications. During installation you have to create a EFI partition for booting, and I ended up with one 1GB EFI partition, one 512GB ext4 partition for / with discard/noatime options, and a 32GB swap partition. The installer did not know about any Trisquel mirrors, but only had the default archive.trisquel.org, so if you need to use a mirror, take a note of the necessary details. The installation asks me about which kernel to install, and I went with the default linux-generic which results in a 5.15 linux-libre kernel. At the end of installation, unfortunately grub failed with a mysterious error message: Unable to install GRUB in dummy. Executing 'grub-install dummy' failed. On another console there is a better error message: failed to register the EFI boot entry. There are some references to file descriptor issues. Perhaps I partitioned the disk in a bad way, or this is a real bug in the installer for this platform. I continued installation, and it appears the installer was able to write GRUB to the device, but not add the right boot menu. So I was able to finish the installation properly, and then reboot and manually type the following GRUB commands: linux (hd0,gpt2)/boot/vmlinuz initrd (hd0,gpt2)/boot/initrd.img boot. Use the GRUB ls command to find the right device. See images below for more information.

Booting and installing GRUB again manually works fine:

root@ampel:~# update-grub
Sourcing file  /etc/default/grub'
Sourcing file  /etc/default/grub.d/background.cfg'
Sourcing file  /etc/default/grub.d/init-select.cfg'
Generating grub configuration file ...
Found linux image: /boot/vmlinuz-5.15.0-91-generic
Found initrd image: /boot/initrd.img-5.15.0-91-generic
Found linux image: /boot/vmlinuz-5.15.0-58-generic
Found initrd image: /boot/initrd.img-5.15.0-58-generic
Warning: os-prober will not be executed to detect other bootable partitions.
Systems on them will not be added to the GRUB boot configuration.
Check GRUB_DISABLE_OS_PROBER documentation entry.
Adding boot menu entry for UEFI Firmware Settings ...
done
root@ampel:~#

During installation I tend to avoid selecting any tasksel components, in part because it didn t use a local mirror to gain network speed, and in part because I don t want to generate OpenSSH keys in a possibly outdated environment that is harder to audit and reproducible rebuild than the finally installed system. When I selected the OpenSSH and GNOME tasksel, I get an error, but fortunately using apt get directly is simple.

root@ampel:~# tasksel
Tasksel GNOME failed:
tasksel: apt-get failed (100)
root@ampel:~# apt-get install trisquel-gnome ssh

Graphics in GNOME was slow using the built-in ASPEED AST2500 VGA controller with linux-libre 5.15. There are kernels labeled 64k but I haven t tested them, and I m not sure they would bring any significant advantage. I simply upgraded to a more recent linux-libre 6.2 kernel via the linux-image-generic-hwe-11.0 virtual package. After a reboot, graphics in GNOME is usable.

root@ampel:~# apt-get install linux-image-generic-hwe-11.0

There seems to be some issue with power-saving inside GNOME, since the machine becomes unresponsive after 20 minutes, and I m unable to make it resume via keyboard or power button. Disabling the inactivity power setting in GNOME works fine to resolve this. I will now put this machine to some more heavy use and see how it handles it. I hope to find more suitable arm64-based servers to complement my ppc64el-based servers in the future, as this ADLINK Ampere Altra Developer Platform with liquid-cooling is more of a toy than a serious server for use in a datacentre. Happy Trisquel-on-arm64 Hacking!

28 December 2023

Simon Josefsson: Validating debian/copyright: licenserecon

Recently I noticed a new tool called licenserecon written by Peter Blackman, and I helped get licenserecon into Debian. The purpose of licenserecon is to reconcile licenses from debian/copyright against the output from licensecheck, a tool written by Jonas Smedegaard. It assumes DEP5 copyright files. You run the tool in a directory that has a debian/ sub-directory, and its output when it notices mismatches (this is for resolv-wrapper):

# sudo apt install licenserecon
jas@kaka:~/dpkg/resolv-wrapper$ lrc
Parsing Source Tree ....
Running licensecheck ....
d/copyright       licensecheck
BSD-3-Clauses     BSD-3-clause     src/resolv_wrapper.c
BSD-3-Clauses     BSD-3-clause     tests/dns_srv.c
BSD-3-Clauses     BSD-3-clause     tests/test_dns_fake.c
BSD-3-Clauses     BSD-3-clause     tests/test_res_query_search.c
BSD-3-Clauses     BSD-3-clause     tests/torture.c
BSD-3-Clauses     BSD-3-clause     tests/torture.h
jas@kaka:~/dpkg/resolv-wrapper$

Noticing one-character typos like this may not bring satisfaction except to the most obsessive-compulsive among us, however the tool has the potential of discovering more serious mistakes. Using it manually once in a while may be useful, however I tend to forget QA steps that are not automated. Could we add this to the Salsa CI/CD pipeline? I recently proposed a merge request to add a wrap-and-sort job to the Salsa CI/CD pipeline (disabled by default) and learned how easy it was to extend it. I think licenserecon is still a bit rough on the edges, and I haven t been able to successfully use it on any but the simplest packages yet. I wouldn t want to suggest it is added to the normal Salsa CI/CD pipeline, even if disabled. If you maintain a Debian package on Salsa and wish to add a licenserecon job to your pipeline, I wrote licenserecon.yml for you. The simplest way to use licenserecon.yml is to replace recipes/debian.yml@salsa-ci-team/pipeline as the Salsa CI/CD configuration file setting with debian/salsa-ci.yml@debian/licenserecon. If you use a debian/salsa-ci.yml file you may put something like this in it instead:

---
include:
  - https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/recipes/debian.yml
  - https://salsa.debian.org/debian/licenserecon/raw/main/debian/licenserecon.yml

Once you trigger the pipeline, this will result in a new job licenserecon that validates debian/copyright against licensecheck output on every build! I have added this to the libcpucycles package on Salsa and the pipeline contains a new job licenserecon whose output currently ends with:

$ cd $ WORKING_DIR /$ SOURCE_DIR 
$ lrc
Parsing Source Tree ....
Running licensecheck ....
No differences found
Cleaning up project directory and file based variables

If upstream releases a new version with files not matching our debian/copyright file, we will detect that on the next Salsa build job rather than months later when somebody happens to run the tools manually or there is some license conflict. Incidentally licenserecon is written in Pascal which brought back old memories with Turbo Pascal back in the MS-DOS days. Thanks Peter for licenserecon, and Jonas for licensecheck making this possible!

9 December 2023

Simon Josefsson: Classic McEliece goes to IETF and OpenSSH

My earlier work on Streamlined NTRU Prime has been progressing along. The IETF document on sntrup761 in SSH has passed several process points. GnuPG s libgcrypt has added support for sntrup761. The libssh support for sntrup761 is working, but the merge request is stuck mostly due to lack of time to debug why the regression test suite sporadically errors out in non-sntrup761 related parts with the patch. The foundation for lattice-based post-quantum algorithms has some uncertainty around it, and I have felt that there is more to the post-quantum story than adding sntrup761 to implementations. Classic McEliece has been mentioned to me a couple of times, and I took some time to learn it and did a cut n paste job of the proposed ISO standard and published draft-josefsson-mceliece in the IETF to make the algorithm easily available to the IETF community. A high-quality implementation of Classic McEliece has been published as libmceliece and I ve been supporting the work of Jan Moj to package libmceliece for Debian, alas it has been stuck in the ftp-master NEW queue for manual review for over two months. The pre-dependencies librandombytes and libcpucycles are available in Debian already. All that text writing and packaging work set the scene to write some code. When I added support for sntrup761 in libssh, I became familiar with the OpenSSH code base, so it was natural to return to OpenSSH to experiment with a new SSH KEX for Classic McEliece. DJB suggested to pick mceliece6688128 and combine it with the existing X25519+sntrup761 or with plain X25519. While a three-algorithm hybrid between X25519, sntrup761 and mceliece6688128 would be a simple drop-in for those that don t want to lose the benefits offered by sntrup761, I decided to start the journey on a pure combination of X25519 with mceliece6688128. The key combiner in sntrup761x25519 is a simple SHA512 call and the only good I can say about that is that it is simple to describe and implement, and doesn t raise too many questions since it is already deployed. After procrastinating coding for months, once I sat down to work it only took a couple of hours until I had a successful Classic McEliece SSH connection. I suppose my brain had sorted everything in background before I started. To reproduce it, please try the following in a Debian testing environment (I use podman to get a clean environment).

# podman run -it --rm debian:testing-slim
apt update
apt dist-upgrade -y
apt install -y wget python3 librandombytes-dev libcpucycles-dev gcc make git autoconf libz-dev libssl-dev
cd ~
wget -q -O- https://lib.mceliece.org/libmceliece-20230612.tar.gz   tar xfz -
cd libmceliece-20230612/
./configure
make install
ldconfig
cd ..
git clone https://gitlab.com/jas/openssh-portable
cd openssh-portable
git checkout jas/mceliece
autoreconf
./configure # verify 'libmceliece support: yes'
make # CC="cc -DDEBUG_KEX=1 -DDEBUG_KEXDH=1 -DDEBUG_KEXECDH=1"

You should now have a working SSH client and server that supports Classic McEliece! Verify support by running ./ssh -Q kex and it should mention mceliece6688128x25519-sha512@openssh.com. To have it print plenty of debug outputs, you may remove the # character on the final line, but don t use such a build in production. You can test it as follows:

./ssh-keygen -A # writes to /usr/local/etc/ssh_host_...
# setup public-key based login by running the following:
./ssh-keygen -t rsa -f ~/.ssh/id_rsa -P ""
cat ~/.ssh/id_rsa.pub > ~/.ssh/authorized_keys
adduser --system sshd
mkdir /var/empty
while true; do $PWD/sshd -p 2222 -f /dev/null; done &
./ssh -v -p 2222 localhost -oKexAlgorithms=mceliece6688128x25519-sha512@openssh.com date

On the client you should see output like this:

OpenSSH_9.5p1, OpenSSL 3.0.11 19 Sep 2023
...
debug1: SSH2_MSG_KEXINIT sent
debug1: SSH2_MSG_KEXINIT received
debug1: kex: algorithm: mceliece6688128x25519-sha512@openssh.com
debug1: kex: host key algorithm: ssh-ed25519
debug1: kex: server->client cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: kex: client->server cipher: chacha20-poly1305@openssh.com MAC: <implicit> compression: none
debug1: expecting SSH2_MSG_KEX_ECDH_REPLY
debug1: SSH2_MSG_KEX_ECDH_REPLY received
debug1: Server host key: ssh-ed25519 SHA256:YognhWY7+399J+/V8eAQWmM3UFDLT0dkmoj3pIJ0zXs
...
debug1: Host '[localhost]:2222' is known and matches the ED25519 host key.
debug1: Found key in /root/.ssh/known_hosts:1
debug1: rekey out after 134217728 blocks
debug1: SSH2_MSG_NEWKEYS sent
debug1: expecting SSH2_MSG_NEWKEYS
debug1: SSH2_MSG_NEWKEYS received
debug1: rekey in after 134217728 blocks
...
debug1: Sending command: date
debug1: pledge: fork
debug1: permanently_set_uid: 0/0
Environment:
  USER=root
  LOGNAME=root
  HOME=/root
  PATH=/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin
  MAIL=/var/mail/root
  SHELL=/bin/bash
  SSH_CLIENT=::1 46894 2222
  SSH_CONNECTION=::1 46894 ::1 2222
debug1: client_input_channel_req: channel 0 rtype exit-status reply 0
debug1: client_input_channel_req: channel 0 rtype eow@openssh.com reply 0
Sat Dec  9 22:22:40 UTC 2023
debug1: channel 0: free: client-session, nchannels 1
Transferred: sent 1048044, received 3500 bytes, in 0.0 seconds
Bytes per second: sent 23388935.4, received 78108.6
debug1: Exit status 0

Notice the kex: algorithm: mceliece6688128x25519-sha512@openssh.com output. How about network bandwidth usage? Below is a comparison of a complete SSH client connection such as the one above that log in and print date and logs out. Plain X25519 is around 7kb, X25519 with sntrup761 is around 9kb, and mceliece6688128 with X25519 is around 1MB. Yes, Classic McEliece has large keys, but for many environments, 1MB of data for the session establishment will barely be noticeable.

./ssh -v -p 2222 localhost -oKexAlgorithms=curve25519-sha256 date 2>&1   grep ^Transferred
Transferred: sent 3028, received 3612 bytes, in 0.0 seconds
./ssh -v -p 2222 localhost -oKexAlgorithms=sntrup761x25519-sha512@openssh.com date 2>&1   grep ^Transferred
Transferred: sent 4212, received 4596 bytes, in 0.0 seconds
./ssh -v -p 2222 localhost -oKexAlgorithms=mceliece6688128x25519-sha512@openssh.com date 2>&1   grep ^Transferred
Transferred: sent 1048044, received 3764 bytes, in 0.0 seconds

So how about session establishment time?

date; i=0; while test $i -le 100; do ./ssh -v -p 2222 localhost -oKexAlgorithms=curve25519-sha256 date > /dev/null 2>&1; i= expr $i + 1 ; done; date
Sat Dec  9 22:39:19 UTC 2023
Sat Dec  9 22:39:25 UTC 2023
# 6 seconds
date; i=0; while test $i -le 100; do ./ssh -v -p 2222 localhost -oKexAlgorithms=sntrup761x25519-sha512@openssh.com date > /dev/null 2>&1; i= expr $i + 1 ; done; date
Sat Dec  9 22:39:29 UTC 2023
Sat Dec  9 22:39:38 UTC 2023
# 9 seconds
date; i=0; while test $i -le 100; do ./ssh -v -p 2222 localhost -oKexAlgorithms=mceliece6688128x25519-sha512@openssh.com date > /dev/null 2>&1; i= expr $i + 1 ; done; date
Sat Dec  9 22:39:55 UTC 2023
Sat Dec  9 22:40:07 UTC 2023
# 12 seconds

I never noticed adding sntrup761, so I m pretty sure I wouldn t notice this increase either. This is all running on my laptop that runs Trisquel so take it with a grain of salt but at least the magnitude is clear. Future work items include:

Use a better hybrid mode combiner than SHA512? See for example KEM Combiners.
Write IETF document describing the Classic McEliece SSH protocol
Submit my patch to the OpenSSH community for discussion and feedback, please review meanwhile!
Implement a mceliece6688128sntrup761x25519 multi-hybrid mode?
Write a shell script a la sntrup761.sh to import a stripped-down mceliece6688128 implementation to avoid having OpenSSH depend on libmceliece?
My kexmceliece6688128x25519.c is really just kexsntrup761x25519.c with the three calls to sntrup761 replaced with mceliece6688128. This should be parametrized to avoid cut n paste of code, if the OpenSSH community is interested.
Consider if the behaviour of librandombytes related to closing of file descriptors is relevant to OpenSSH.

Happy post-quantum SSH ing! Update: Changing the mceliece6688128_keypair call to mceliece6688128f_keypair (i.e., using the fully compatible f-variant) results in McEliece being just as fast as sntrup761 on my machine. Update 2023-12-26: An initial IETF document draft-josefsson-ssh-mceliece-00 published.

1 September 2023

Simon Josefsson: Trisquel on ppc64el: Talos II

The release notes for Trisquel 11.0 Aramo mention support for POWER and ARM architectures, however the download area only contains links for x86, and forum posts suggest there is a lack of instructions how to run Trisquel on non-x86. Since the release of Trisquel 11 I have been busy migrating x86 machines from Debian to Trisquel. One would think that I would be finished after this time period, but re-installing and migrating machines is really time consuming, especially if you allow yourself to be distracted every time you notice something that Really Ought to be improved. Rabbit holes all the way down. One of my production machines is running Debian 11 bullseye on a Talos II Lite machine from Raptor Computing Systems, and migrating the virtual machines running on that host (including the VM that serves this blog) to a x86 machine running Trisquel felt unsatisfying to me. I want to migrate my computing towards hardware that harmonize with FSF s Respects Your Freedom and not away from it. Here I had to chose between using the non-free software present in newer Debian or the non-free software implied by most x86 systems: not an easy chose. So I have ignored the dilemma for some time. After all, the machine was running Debian 11 bullseye , which was released before Debian started to require use of non-free software. With the end-of-life date for bullseye approaching, it seems that this isn t a sustainable choice. There is a report open about providing ppc64el ISOs that was created by Jason Self shortly after the release, but for many months nothing happened. About a month ago, Luis Guzm n mentioned an initial ISO build and I started testing it. The setup has worked well for a month, and with this post I want to contribute instructions how to get it up and running since this is still missing. The setup of my soon-to-be new production machine:

Talos II Lite
POWER9 18-core v2 CPU
Inter-Tech 4U-4410 rack case with ASPOWER power supply
8x32GB DDR4-2666 ECC RDIMM
HighPoint SSD7505 (the Rocket 1504 or 1204 would be a more cost-effective choice, but I re-used a component I had laying around)
PERC H700 aka LSI MegaRAID 2108 SAS/SATA (also found laying around)
2x1TB NVMe
3x18TB disks

According to the notes in issue 14 the ISO image is available at https://builds.trisquel.org/debian-installer-images/ and the following commands download, integrity check and write it to a USB stick:

wget -q https://builds.trisquel.org/debian-installer-images/debian-installer-images_20210731+deb11u8+11.0trisquel14_ppc64el.tar.gz
tar xfa debian-installer-images_20210731+deb11u8+11.0trisquel14_ppc64el.tar.gz ./installer-ppc64el/20210731+deb11u8+11/images/netboot/mini.iso
echo '6df8f45fbc0e7a5fadf039e9de7fa2dc57a4d466e95d65f2eabeec80577631b7  ./installer-ppc64el/20210731+deb11u8+11/images/netboot/mini.iso'   sha256sum -c
sudo wipefs -a /dev/sdX
sudo dd if=./installer-ppc64el/20210731+deb11u8+11/images/netboot/mini.iso of=/dev/sdX conv=sync status=progress

Sadly, no hash checksums or OpenPGP signatures are published. Power off your device, insert the USB stick, and power it up, and you see a Petitboot menu offering to boot from the USB stick. For some reason, the "Expert Install" was the default in the menu, and instead I select "Default Install" for the regular experience. For this post, I will ignore BMC/IPMI, as interacting with it is not necessary. Make sure to not connect the BMC/IPMI ethernet port unless you are willing to enter that dungeon. The VGA console works fine with a normal USB keyboard, and you can chose to use only the second enP4p1s0f1 network card in the network card selection menu. If you are familiar with Debian netinst ISO s, the installation is straight-forward. I complicate the setup by partitioning two RAID1 partitions on the two NVMe sticks, one RAID1 for a 75GB ext4 root filesystem (discard,noatime) and one RAID1 for a 900GB LVM volume group for virtual machines, and two 20GB swap partitions on each of the NVMe sticks (to silence a warning about lack of swap, I m not sure swap is still a good idea?). The 3x18TB disks use DM-integrity with RAID1 however the installer does not support DM-integrity so I had to create it after the installation. There are two additional matters worth mentioning:

Selecting the apt mirror does not have the list of well-known Trisquel mirrors which the x86 installer offers. Instead I have to input the archive mirror manually, and fortunately the archive.trisquel.org hostname and path values are available as defaults, so I just press enter and fix this after the installation has finished. You may want to have the hostname/path of your local mirror handy, to speed things up.
The installer asks me which kernel to use, which the x86 installer does not do. I believe older Trisquel/Ubuntu installers asked this question, but that it was gone in aramo on x86. I select the default linux-image-generic which gives me a predictable 5.15 Linux-libre kernel, although you may want to chose linux-image-generic-hwe-11.0 for a more recent 6.2 Linux-libre kernel. Maybe this is intentional debinst-behaviour for non-x86 platforms?

I have re-installed the machine a couple of times, and have now finished installing the production setup. I haven t ran into any serious issues, and the system has been stable. Time to wrap up, and celebrate that I now run an operating system aligned with the Free System Distribution Guidelines on hardware that aligns with Respects Your Freedom Happy Hacking indeed!

16 August 2023

Simon Josefsson: Enforcing wrap-and-sort -satb

For Debian package maintainers, the wrap-and-sort tool is one of those nice tools that I use once in a while, and every time have to re-read the documentation to conclude that I want to use the --wrap-always --short-indent --trailing-comma --sort-binary-package options (or -satb for short). Every time, I also wish that I could automate this and have it always be invoked to keep my debian/ directory tidy, so I don t have to do this manually once every blue moon. I haven t found a way to achieve this automation in a non-obtrusive way that interacts well with my git-based packaging workflow. Ideally I would like for something like the lintian-hook during gbp buildpackage to check for this ideas? Meanwhile, I have come up with a way to make sure I don t forget to run wrap-and-sort for long, and that others who work on the same package won t either: create an autopkgtest which is invoked during the Salsa CI/CD pipeline using the following as debian/tests/wrap-and-sort:

#!/bin/sh
set -eu
TMPDIR=$(mktemp -d)
trap "rm -rf $TMPDIR" 0 INT QUIT ABRT PIPE TERM
cp -a debian $TMPDIR
cd $TMPDIR
wrap-and-sort -satb
diff -ur $OLDPWD/debian debian

Add the following to debian/tests/control to invoke it which is intentionally not indented properly so that the self-test will fail so you will learn how it behaves.

Tests: wrap-and-sort
Depends: devscripts, python3-debian
Restrictions: superficial

Now I will get build failures in the pipeline once I upload the package into Salsa, which I usually do before uploading into Debian. I will get a diff output, and it won t be happy until I push a commit with the output of running wrap-and-sort with the parameters I settled with. While autopkgtest is intended to test the installed package, the tooling around autopkgtest is powerful and easily allows this mild abuse of its purpose for a pleasant QA improvement. Thoughts? Happy hacking!

11 July 2023

Simon Josefsson: Coping with non-free software in Debian

A personal reflection on how I moved from my Debian home to find two new homes with Trisquel and Guix for my own ethical computing, and while doing so settled my dilemma about further Debian contributions. Debian s contributions to the free software community has been tremendous. Debian was one of the early distributions in the 1990 s that combined the GNU tools (compiler, linker, shell, editor, and a set of Unix tools) with the Linux kernel and published a free software operating system. Back then there were little guidance on how to publish free software binaries, let alone entire operating systems. There was a lack of established community processes and conflict resolution mechanisms, and lack of guiding principles to motivate the work. The community building efforts that came about in parallel with the technical work has resulted in a steady flow of releases over the years. From the work of Richard Stallman and the Free Software Foundation (FSF) during the 1980 s and early 1990 s, there was at the time already an established definition of free software. Inspired by free software definition, and a belief that a social contract helps to build a community and resolve conflicts, Debian s social contract (DSC) with the free software community was published in 1997. The DSC included the Debian Free Software Guidelines (DFSG), which directly led to the Open Source Definition.

I was introduced to GNU/Linux through Slackware in the early 1990 s (oh boy those nights calculating XFree86 modeline s and debugging sendmail.cf) and primarily used RedHat Linux during ca 1995-2003. I switched to Debian during the Woody release cycles, when the original RedHat Linux was abandoned and Fedora launched. It was Debian s explicit community processes and infrastructure that attracted me. The slow nature of community processes also kept me using RedHat for so long: centralized and dogmatic decision processes often produce quick and effective outcomes, and in my opinion RedHat Linux was technically better than Debian ca 1995-2003. However the RedHat model was not sustainable, and resulted in the RedHat vs Fedora split. Debian catched up, and reached technical stability once its community processes had been grounded. I started participating in the Debian community around late 2006. My interpretation of Debian s social contract is that Debian should be a distribution of works licensed 100% under a free license. The Debian community has always been inclusive towards non-free software, creating the contrib/non-free section and permitting use of the bug tracker to help resolve issues with non-free works. This is all explained in the social contract. There has always been a clear boundary between free and non-free work, and there has been a commitment that the Debian system itself would be 100% free. The concern that RedHat Linux was not 100% free software was not critical to me at the time: I primarily (and happily) ran GNU tools on Solaris, IRIX, AIX, OS/2, Windows etc. Running GNU tools on RedHat Linux was an improvement, and I hadn t realized it was possible to get rid of all non-free software on my own primary machine. Debian realized that goal for me. I ve been a believer in that model ever since. I can use Solaris, macOS, Android etc knowing that I have the option of using a 100% free Debian. While the inclusive approach towards non-free software invite and deserve criticism (some argue that being inclusive to non-inclusive behavior is a bad idea), I believe that Debian s approach was a successful survival technique: by being inclusive to and a compromise between free and non-free communities, Debian has been able to stay relevant and contribute to both environments. If Debian had not served and contributed to the free community, I believe free software people would have stopped contributing. If Debian had rejected non-free works completely, I don t think the successful Ubuntu distribution would have been based on Debian. I wrote the majority of the text above back in September 2022, intending to post it as a way to argue for my proposal to maintain the status quo within Debian. I didn t post it because I felt I was saying the obvious, and that the obvious do not need to be repeated, and the rest of the post was just me going down memory lane. The Debian project has been a sustainable producer of a 100% free OS up until Debian 11 bullseye. In the resolution on non-free firmware the community decided to leave the model that had resulted in a 100% free Debian for so long. The goal of Debian is no longer to publish a 100% free operating system, instead this was added: The Debian official media may include firmware . Indeed the Debian 12 bookworm release has confirmed that this would not only be an optional possibility. The Debian community could have published a 100% free Debian, in parallel with the non-free Debian, and still be consistent with their newly adopted policy, but chose not to. The result is that Debian s policies are not consistent with their actions. It doesn t make sense to claim that Debian is 100% free when the Debian installer contains non-free software. Actions speaks louder than words, so I m left reading the policies as well-intended prose that is no longer used for guidance, but for the peace of mind for people living in ivory towers. And to attract funding, I suppose. So how to deal with this, on a personal level? I did not have an answer to that back in October 2022 after the vote. It wasn t clear to me that I would ever want to contribute to Debian under the new social contract that promoted non-free software. I went on vacation from any Debian work. Meanwhile Debian 12 bookworm was released, confirming my fears. I kept coming back to this text, and my only take-away was that it would be unethical for me to use Debian on my machines. Letting actions speak for themselves, I switched to PureOS on my main laptop during October, barely noticing any difference since it is based on Debian 11 bullseye. Back in December, I bought a new laptop and tried Trisquel and Guix on it, as they promise a migration path towards ppc64el that PureOS do not. While I pondered how to approach my modest Debian contributions, I set out to learn Trisquel and gained trust in it. I migrated one Debian machine after another to Trisquel, and started to use Guix on others. Migration was easy because Trisquel is based on Ubuntu which is based on Debian. Using Guix has its challenges, but I enjoy its coherant documented environment. All of my essential self-hosted servers (VM hosts, DNS, e-mail, WWW, Nextcloud, CI/CD builders, backup etc) uses Trisquel or Guix now. I ve migrated many GitLab CI/CD rules to use Trisquel instead of Debian, to have a more ethical computing base for software development and deployment. I wish there were official Guix docker images around. Time has passed, and when I now think about any Debian contributions, I m a little less muddled by my disappointment of the exclusion of a 100% free Debian. I realize that today I can use Debian in the same way that I use macOS, Android, RHEL or Ubuntu. And what prevents me from contributing to free software on those platforms? So I will make the occasional Debian contribution again, knowing that it will also indirectly improve Trisquel. To avoid having to install Debian, I need a development environment in Trisquel that allows me to build Debian packages. I have found a recipe for doing this:

# System commands:
sudo apt-get install debhelper git-buildpackage debian-archive-keyring
sudo wget -O /usr/share/debootstrap/scripts/debian-common https://sources.debian.org/data/main/d/debootstrap/1.0.128%2Bnmu2/scripts/debian-common
sudo wget -O /usr/share/debootstrap/scripts/sid https://sources.debian.org/data/main/d/debootstrap/1.0.128%2Bnmu2/scripts/sid
# Run once to create build image:
DIST=sid git-pbuilder create --mirror http://deb.debian.org/debian/ --debootstrapopts "--exclude=usr-is-merged" --basepath /var/cache/pbuilder/base-sid.cow
# Run in a directory with debian/ to build a package:
gbp buildpackage --git-pbuilder --git-dist=sid

How to sustainably deliver a 100% free software binary distributions seems like an open question, and the challenges are not all that different compared to the 1990 s or early 2000 s. I m hoping Debian will come back to provide a 100% free platform, but my fear is that Debian will compromise even further on the free software ideals rather than the opposite. With similar arguments that were used to add the non-free firmware, Debian could compromise the free software spirit of the Linux boot process (e.g., non-free boot images signed by Debian) and media handling (e.g., web browsers and DRM), as Debian have already done with appstore-like functionality for non-free software (Python pip). To learn about other freedom issues in Debian packaging, browsing Trisquel s helper scripts may enlight you. Debian s setback and the recent setback for RHEL-derived distributions are sad, and it will be a challenge for these communities to find internally consistent coherency going forward. I wish them the best of luck, as Debian and RHEL are important for the wider free software eco-system. Let s see how the community around Trisquel, Guix and the other FSDG-distributions evolve in the future. The situation for free software today appears better than it was years ago regardless of Debian and RHEL s setbacks though, which is important to remember! I don t recall being able install a 100% free OS on a modern laptop and modern server as easily as I am able to do today. Happy Hacking! Addendum 22 July 2023: The original title of this post was Coping with non-free Debian, and there was a thread about it that included feedback on the title. I do agree that my initial title was confrontational, and I ve changed it to the more specific Coping with non-free software in Debian. I do appreciate all the fine free software that goes into Debian, and hope that this will continue and improve, although I have doubts given the opinions expressed by the majority of developers. For the philosophically inclined, it is interesting to think about what it means to say that a compilation of software is freely licensed. At what point does a compilation of software deserve the labels free vs non-free? Windows probably contains some software that is published as free software, let s say Windows is 1% free. Apple authors a lot of free software (as a tangent, Apple probably produce more free software than what Debian as an organization produces), and let s say macOS contains 20% free software. Solaris (or some still maintained derivative like OpenIndiana) is mostly freely licensed these days, isn t it? Let s say it is 80% free. Ubuntu and RHEL pushes that closer to let s say 95% free software. Debian used to be 100% but is now slightly less at maybe 99%. Trisquel and Guix are at 100%. At what point is it reasonable to call a compilation free? Does Debian deserve to be called freely licensed? Does macOS? Is it even possible to use these labels for compilations in any meaningful way? All numbers just taken from thin air. It isn t even clear how this can be measured (binary bytes? lines of code? CPU cycles? etc). The caveat about license review mistakes applies. I ignore Debian s own claims that Debian is 100% free software, which I believe is inconsistent and no longer true under any reasonable objective analysis. It was not true before the firmware vote since Debian ships with non-free blobs in the Linux kernel for example.

31 May 2023

Russell Coker: Links May 2023

Petter Reinholdtsen wrote an interesting blog post about their work on packaging speech to text for Debian [1]. The work of the Debian Deep Learning Team seems really interesting and I look forward to playing with this sort of thing after the release of Bookworm (the packages in question will NOT go in Bookworm but I ll run at least one system on Testing after Bookworm). It would be nice to get more information on the hardware used for running such programs, the minimum hardware needed for real-time speech to text would be interesting to know. Brian Krebs wrote an informative article about attacks involving supply chain compromise and fake LinkedIn profiles [2]. The attacks targetted Linux as well as Windows. Interesting video about the Illium cameras, a bit harsh though, they criticise Illium devices for being too low resolution, too expensive, and taking too much CPU time to process [3]. The Illium cameras still sell for decent prices on eBay, I wonder if it s because of curious people like me who would like to play with them and have money to spare or whether some other interesting things are being done. I wonder how a 4*4 array of the rectangular cameras secured together with duct tape would go. The ideas of Illium should work better if implemented for multi-core CPUs or GPUs. Bruce Schneier with Henry Farrell and Nathan Sanders wrote an insightful blog post about how AT Chatbots could improve democracy [4]. Wired has an interesting article about the way DJI drones transmit the location of the drone operator without encryption by design [5]. Apparently this has been used for targetting attacks on drone operators in Ukraine. This video about robot mice navigating mazes is interesting [6]. But I think it became less interesting when they got to the stage of milliseconds counting for the win, it s very optimised for one case just like F1. I think it would be interesting if they had a rally contest where they go across grass or sand, 3D mazes both in air and water, and contests where Tungsten weights have to be transported. They should push some of the other limits of engineering as completing a maze quickly has been solved. The Guardian has an interesting article about a blood test for sleepy driving [7]. Once they have an objective test they can punish people for it. This github repository listing public APIs is interesting [8]. Lots of fun ideas for phone apps there. Simon Josefsson wrote an insightful blog post about the threat model of security devices [9]. Unfortunately the security of most people is way below the level where this is an issue. But it s good to think about future steps needed for good security. Cory Doctorow wrote an interesting article The Swivel Eyed Loons have a Point [10] about the fact that some of the nuttiest people are protesting about real issues, just in the wrong way.

11 May 2023

Simon Josefsson: Streamlined NTRU Prime sntrup761 goes to IETF

The OpenSSH project added support for a hybrid Streamlined NTRU Prime post-quantum key encapsulation method sntrup761 to strengthen their X25519-based default in their version 8.5 released on 2021-03-03. While there has been a lot of talk about post-quantum crypto generally, my impression has been that there has been a slowdown in implementing and deploying them in the past two years. Why is that? Regardless of the answer, we can try to collaboratively change things, and one effort that appears strangely missing are IETF documents for these algorithms. Building on some earlier work that added X25519/X448 to SSH, writing a similar document was relatively straight-forward once I had spent a day reading OpenSSH and TinySSH source code to understand how it worked. While I am not perfectly happy with how the final key is derived from the sntrup761/X25519 secrets it is a SHA512 call on the concatenated secrets I think the construct deserves to be better documented, to pave the road for increased confidence or better designs. Also, reusing the RFC5656 4 structs makes for a worse specification (one unnecessary normative reference), but probably a simpler implementation. I have published draft-josefsson-ntruprime-ssh-00 here. Credit here goes to Jan Moj of TinySSH that designed the earlier sntrup4591761x25519-sha512@tinyssh.org in 2018, Markus Friedl who added it to OpenSSH in 2019, and Damien Miller that changed it to sntrup761 in 2020. Does anyone have more to add to the history of this work? Once I had sharpened my xml2rfc skills, preparing a document describing the hybrid construct between the sntrup761 key-encapsulation mechanism and the X25519 key agreement method in a non-SSH fashion was easy. I do not know if this work is useful, but it may serve as a reference for further study. I published draft-josefsson-ntruprime-hybrid-00 here. Finally, how about a IETF document on the base Streamlined NTRU Prime? Explaining all the details, and especially the math behind it would be a significant effort. I started doing that, but realized it is a subjective call when to stop explaining things. If we can t assume that the reader knows about lattice math, is a document like this the best place to teach it? I settled for the most minimal approach instead, merely giving an introduction to the algorithm, included SageMath and C reference implementations together with test vectors. The IETF audience rarely understands math, so I think it is better to focus on the bits on the wire and the algorithm interfaces. Everything here was created by the Streamlined NTRU Prime team, I merely modified it a bit hoping I didn t break too much. I have now published draft-josefsson-ntruprime-streamlined-00 here. I maintain the IETF documents on my ietf-ntruprime GitLab page, feel free to open merge requests or raise issues to help improve them. To have confidence in the code was working properly, I ended up preparing a branch with sntrup761 for the GNU-project Nettle and have submitted it upstream for review. I had the misfortune of having to understand and implement NIST s DRBG-CTR to compute the sntrup761 known-answer tests, and what a mess it is. Why does a deterministic random generator support re-seeding? Why does it support non-full entropy derivation? What s with the key size vs block size confusion? What s with the optional parameters? What s with having multiple algorithm descriptions? Luckily I was able to extract a minimal but working implementation that is easy to read. I can t locate DRBG-CTR test vectors, anyone? Does anyone have sntrup761 test vectors that doesn t use DRBG-CTR? One final reflection on publishing known-answer tests for an algorithm that uses random data: are the test vectors stable over different ways to implement the algorithm? Just consider of some optimization moved one randomness-extraction call before another, then wouldn t the output be different? Are there other ways to verify correctness of implementations? As always, happy hacking!

6 May 2023

Reproducible Builds: Reproducible Builds in April 2023

Welcome to the April 2023 report from the Reproducible Builds project! In these reports we outline the most important things that we have been up to over the past month. And, as always, if you are interested in contributing to the project, please visit our Contribute page on our website.

General news Trisquel is a fully-free operating system building on the work of Ubuntu Linux. This month, Simon Josefsson published an article on his blog titled Trisquel is 42% Reproducible!. Simon wrote:
The absolute number may not be impressive, but what I hope is at least a useful contribution is that there actually is a number on how much of Trisquel is reproducible. Hopefully this will inspire others to help improve the actual metric.
Simon wrote another blog post this month on a new tool to ensure that updates to Linux distribution archive metadata (eg. via `apt-get update`) will only use files that have been recorded in a globally immutable and tamper-resistant ledger. A similar solution exists for Arch Linux (called `pacman-bintrans`) which was announced in August 2021 where an archive of all issued signatures is publically accessible.
Joachim Breitner wrote an in-depth blog post on a bootstrap-capable GHC, the primary compiler for the Haskell programming language. As a quick background to what this is trying to solve, in order to generate a fully trustworthy compile chain, trustworthy root binaries are needed and a popular approach to address this problem is called bootstrappable builds where the core idea is to address previously-circular build dependencies by creating a new dependency path using simpler prerequisite versions of software. Joachim takes an somewhat recursive approach to the problem for Haskell, leading to the inadvertently humourous question: Can I turn all of GHC into one module, and compile that? Elsewhere in the world of bootstrapping, Janneke Nieuwenhuizen and Ludovic Court s wrote a blog post on the GNU Guix blog announcing The Full-Source Bootstrap, specifically:
[ ] the third reduction of the Guix bootstrap binaries has now been merged in the main branch of Guix! If you run `guix pull` today, you get a package graph of more than 22,000 nodes rooted in a 357-byte program something that had never been achieved, to our knowledge, since the birth of Unix.
More info about this change is available on the post itself, including:
The full-source bootstrap was once deemed impossible. Yet, here we are, building the foundations of a GNU/Linux distro entirely from source, a long way towards the ideal that the Guix project has been aiming for from the start. There are still some daunting tasks ahead. For example, what about the Linux kernel? The good news is that the bootstrappable community has grown a lot, from two people six years ago there are now around 100 people in the `#bootstrappable` IRC channel.

Michael Ablassmeier created a script called pypidiff as they were looking for a way to track differences between packages published on PyPI. According to Micahel, pypidiff uses diffoscope to create reports on the published releases and automatically pushes them to a GitHub repository. This can be seen on the pypi-diff GitHub page (example).
Eleuther AI, a non-profit AI research group, recently unveiled Pythia, a collection of 16 Large Language Model (LLMs) trained on public data in the same order designed specifically to facilitate scientific research. According to a post on MarkTechPost:
Pythia is the only publicly available model suite that includes models that were trained on the same data in the same order [and] all the corresponding data and tools to download and replicate the exact training process are publicly released to facilitate further research.
These properties are intended to allow researchers to understand how gender bias (etc.) can affected by training data and model scale.
Back in February s report we reported on a series of changes to the Sphinx documentation generator that was initiated after attempts to get the `alembic` Debian package to build reproducibly. Although Chris Lamb was able to identify the source problem and provided a potential patch that might fix it, James Addison has taken the issue in hand, leading to a large amount of activity resulting in a proposed pull request that is waiting to be merged.
WireGuard is a popular Virtual Private Network (VPN) service that aims to be faster, simpler and leaner than other solutions to create secure connections between computing devices. According to a post on the WireGuard developer mailing list, the WireGuard Android app can now be built reproducibly so that its contents can be publicly verified. According to the post by Jason A. Donenfeld, the F-Droid project now does this verification by comparing their build of WireGuard to the build that the WireGuard project publishes. When they match, the new version becomes available. This is very positive news.
Author and public speaker, V. M. Brasseur published a sample chapter from her upcoming book on corporate open source strategy which is the topic of Software Bill of Materials (SBOM):
A software bill of materials (SBOM) is defined as a nested inventory for software, a list of ingredients that make up software components. When you receive a physical delivery of some sort, the bill of materials tells you what s inside the box. Similarly, when you use software created outside of your organisation, the SBOM tells you what s inside that software. The SBOM is a file that declares the software supply chain (SSC) for that specific piece of software. [ ]

Several distributions noticed recent versions of the Linux Kernel are no longer reproducible because the BPF Type Format (BTF) metadata is not generated in a deterministic way. This was discussed on the `#reproducible-builds` IRC channel, but no solution appears to be in sight for now.

Community news On our mailing list this month:

Larry Doolittle shared an interesting puzzle with the group where three bytes in a `.zip` file were different between two builds.

Alexis PM wrote a message as they had observed a difference between binaries available in the Debian archive and the ones on tests.reproducible-builds.org. The thread generated a number of replies, including interesting responses from Vagrant Cascadian [ ] and kpcyrd [ ].

Holger Levsen gave a talk at foss-north 2023 in Gothenburg, Sweden on the topic of Reproducible Builds, the first ten years. Lastly, there were a number of updates to our website, including:

Chris Lamb attempted a number of ways to try and fix literal `: .lead` appearing in the page [ ][ ][ ], made all the Back to who is involved links italics [ ], and corrected the syntax of the `_data/sponsors.yml` file [ ].

Holger Levsen added his recent talk [ ], added Simon Josefsson, Mike Perry and Seth Schoen to the contributors page [ ][ ][ ], reworked the People page a little [ ] [ ], as well as fixed spelling of Arch Linux [ ].

Lastly, Mattia Rizzolo moved some old sponsors to a former section [ ] and Simon Josefsson added Trisquel GNU/Linux. [ ]

Debian

Vagrant Cascadian reported on the Debian s `build-essential` package set, which was inspired by how close we are to making the Debian `build-essential` set reproducible and how important that set of packages are in general . Vagrant mentioned that: I have some progress, some hope, and I daresay, some fears . [ ]

Debian Developer Cyril Brulebois (kibi) filed a bug against snapshot.debian.org after they noticed that there are many missing `dinstalls` that is to say, the snapshot service is not capturing 100% of all of historical states of the Debian archive. This is relevant to reproducibility because without the availability historical versions, it is becomes impossible to repeat a build at a future date in order to correlate checksums. .

20 reviews of Debian packages were added, 21 were updated and 5 were removed this month adding to our knowledge about identified issues. Chris Lamb added a new `build_path_in_line_annotations_added_by_ruby_ragel` toolchain issue. [ ]

Mattia Rizzolo announced that the data for the stretch archive on tests.reproducible-builds.org has been archived. This matches the archival of stretch within Debian itself. This is of some historical interest, as stretch was the first Debian release regularly tested by the Reproducible Builds project.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Bernhard M. Wiedemann:

`ghc` (workaround a parallelism-related issue)

Jan Zerebecki:

`ghc` (report a parallelism-related issue)

Chris Lamb:

#1034147 filed against `ruby-regexp-parser`.

Vagrant Cascadian:

#1033954, #1033955 and #1033957 filed against `pike8.0`.

#1033958 and #1033959 filed against `binutils`.

#1034129 filed against `lomiri-action-api`.

#1034199 and #1034200 filed against `lomiri`.

#1034327 filed against `nmodl`.

#1034423 filed against `php8.2`.

#1034431 filed against `qemu`.

#1034499 filed against `twisted`.

#1034740 filed against `boost1.74`.

#1034892 filed against `php8.2`.

#1035324 filed against `shaderc`.

#1035329 and #1035331 filed against `jackd2`.

diffoscope development diffoscope version `241` was uploaded to Debian unstable by Chris Lamb. It included contributions already covered in previous months as well a change by Chris Lamb to add a missing `raise` statement that was accidentally dropped in a previous commit. [ ]

Testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In April, a number of changes were made, including:

Holger Levsen:

Significant work on a new Documented Jenkins Maintenance (djm) script to support logged maintenance of nodes, etc. [ ][ ][ ][ ][ ][ ]

Add the new APT repo url for Jenkins itself with a new signing key. [ ][ ]

In the Jenkins shell monitor, allow 40 GiB of files for diffoscope for the Debian experimental distribution as Debian is frozen around the release at the moment. [ ]

Updated Arch Linux testing to cleanup leftover files left in `/tmp/archlinux-ci/` after three days. [ ][ ][ ]

Mark a number of nodes hosted by Oregon State University Open Source Lab (OSUOSL) as online and offline. [ ][ ][ ]

Update the node health checks to detect failures to end `schroot` sessions. [ ]

Filter out another duplicate contributor from the contributor statistics. [ ]

Mattia Rizzolo:

Code changes to properly archive the `stretch` Debian distribution. [ ][ ][ ][ ]

Introduce the archived suites configuration option. [ ]][ ]

Fix the KGB bot configuration to support `pyyaml` 6.0 as present in Debian bookworm. [ ]

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

IRC: `#reproducible-builds` on `irc.oftc.net`.

Twitter: @ReproBuilds

Mailing list: `rb-general@lists.reproducible-builds.org`

29 April 2023

Simon Josefsson: How To Trust A Machine

Let s reflect on some of my recent work that started with understanding Trisquel GNU/Linux, improving transparency into apt-archives, working on reproducible builds of Trisquel, strengthening verification of apt-archives with Sigstore, and finally thinking about security device threat models. A theme in all this is improving methods to have trust in machines, or generally any external entity. While I believe that everything starts by trusting something, usually something familiar and well-known, we need to deal with misuse of that trust that leads to failure to deliver what is desired and expected from the trusted entity. How can an entity behave to invite trust? Let s argue for some properties that can be quantitatively measured, with a focus on computer software and hardware:

Deterministic Behavior given a set of circumstances, it should behave the same.
Verifiability and Transparency the method (the source code) should be accessible for understanding (compare scientific method) and its binaries verifiable, i.e., it should be possible to verify that the entity actually follows the intended deterministic method (implying efforts like reproducible builds and bootstrappable builds).
Accountable the entity should behave the same for everyone, and deviation should be possible prove in a way that is hard to deny, implying efforts such as Certificate Transparency and more generic checksum logs like Sigstore and Sigsum.
Liberating the tools and documentation should be available as free software to enable you to replace the trusted entity if so desired. An entity that wants to restrict you from being able to replace the trusted entity is vulnerable to corruption and may stop acting trustworthy. This point of view reinforces that open source misses the point; it has become too common to use trademark laws to restrict re-use of open source software (e.g., firefox, chrome, rust).

Essentially, this boils down to: Trust, Verify and Hold Accountable. To put this dogma in perspective, it helps to understand that this approach may be harmful to human relationships (which could explain the social awkwardness of hackers), but it remains useful as a method to improve the design of computer systems, and a useful method to evaluate safety of computer systems. When a system fails some of the criteria above, we know we have more work to do to improve it. How far have we come on this journey? Through earlier efforts, we are in a fairly good situation. Richard Stallman through GNU/FSF made us aware of the importance of free software, the Reproducible/Bootstrappable build projects made us aware of the importance of verifiability, and Certificate Transparency highlighted the need for accountable signature logs leading to efforts like Sigstore for software. None of these efforts would have seen the light of day unless people wrote free software and packaged them into distributions that we can use, and built hardware that we can run it on. While there certainly exists more work to be done on the software side, with the recent amazing full-source build of Guix based on a 357-byte hand-written seed, I believe that we are closing that loop on the software engineering side. So what remains? Some inspiration for further work:

Accountable binary software distribution remains unresolved in practice, although we have some software components around (e.g., apt-sigstore and guix git authenticate). What is missing is using them for verification by default and/or to improve the signature process to use trustworthy hardware devices, and committing the signatures to transparency logs.
Trustworthy hardware to run trustworthy software on remains a challenge, and we owe FSF s Respect Your Freedom credit for raising awareness of this. Many modern devices requires non-free software to work which fails most of the criteria above and are thus inherently untrustworthy.
Verifying rebuilds of currently published binaries on trustworthy hardware is unresolved.
Completing a full-source rebuild from a small seed on trustworthy hardware remains, preferably on a platform wildly different than X86 such as Raptor s Talos II.
We need improved security hardware devices and improved established practices on how to use them. For example, while Gnuk on the FST enable a trustworthy software and hardware solution, the best process for using it that I can think of generate the cryptographic keys on a more complex device. Efforts like Tillitis are inspiring here.

Onwards and upwards, happy hacking! Update 2023-05-03: Added the Liberating property regarding free software, instead of having it be part of the Verifiability and Transparency .

27 April 2023

Simon Josefsson: A Security Device Threat Model: The Substitution Attack

I d like to describe and discuss a threat model for computational devices. This is generic but we will narrow it down to security-related devices. For example, portable hardware dongles used for OpenPGP/OpenSSH keys, FIDO/U2F, OATH HOTP/TOTP, PIV, payment cards, wallets etc and more permanently attached devices like a Hardware Security Module (HSM), a TPM-chip, or the hybrid variant of a mostly permanently-inserted but removable hardware security dongles. Our context is cryptographic hardware engineering, and the purpose of the threat model is to serve as as a thought experiment for how to build and design security devices that offer better protection. The threat model is related to the Evil maid attack. Our focus is to improve security for the end-user rather than the traditional focus to improve security for the organization that provides the token to the end-user, or to improve security for the site that the end-user is authenticating to. This is a critical but often under-appreciated distinction, and leads to surprising recommendations related to onboard key generation, randomness etc below.

The Substitution Attack
An attacker is able to substitute any component of the device (hardware or software) at any time for any period of time.
Your takeaway should be that devices should be designed to mitigate harmful consequences if any component of the device (hardware or software) is substituted for a malicious component for some period of time, at any time, during the lifespan of that component. Some designs protect better against this attack than other designs, and the threat model can be used to understand which designs are really bad, and which are less so.

Terminology The threat model involves at least one device that is well-behaving and one that is not, and we call these Good Device and Bad Device respectively. The bad device may be the same physical device as the good key, but with some minor software modification or a minor component replaced, but could also be a completely separate physical device. We don t care about that distinction, we just care if a particular device has a malicious component in it or not. I ll use terms like security device , device , hardware key , security co-processor etc interchangeably. From an engineering point of view, malicious here includes unintentional behavior such as software or hardware bugs. It is not possible to differentiate an intentionally malicious device from a well-designed device with a critical bug. Don t attribute to malice what can be adequately explained by stupidity, but don t na vely attribute to stupidity what may be deniable malicious.

What is some period of time ? Some period of time can be any length of time: seconds, minutes, days, weeks, etc. It may also occur at any time: During manufacturing, during transportation to the user, after first usage by the user, or after a couple of months usage by the user. Note that we intentionally consider time-of-manufacturing as a vulnerable phase. Even further, the substitution may occur multiple times. So the Good Key may be replaced with a Bad Key by the attacker for one day, then returned, and later this repeats a month later.

What is harmful consequences ? Since a security key has a fairly well-confined scope and purpose, we can get a fairly good exhaustive list of things that could go wrong. Harmful consequences include:

Attacker learns any secret keys stored on a Good Key.

Attacker causes user to trust a public generated by a Bad Key.

Attacker is able to sign something using a Good Key.

Attacker learns the PIN code used to unlock a Good Key.

Attacker learns data that is decrypted by a Good Key.

Thin vs Deep solutions One approach to mitigate many issues arising from device substitution is to have the host (or remote site) require that the device prove that it is the intended unique device before it continues to talk to it. This require an authentication/authorization protocol, which usually involves unique device identity and out-of-band trust anchors. Such trust anchors is often problematic, since a common use-case for security device is to connect it to a host that has never seen the device before. A weaker approach is to have the device prove that it merely belongs to a class of genuine devices from a trusted manufacturer, usually by providing a signature generated by a device-specific private key signed by the device manufacturer. This is weaker since then the user cannot differentiate two different good devices. In both cases, the host (or remote site) would stop talking to the device if it cannot prove that it is the intended key, or at least belongs to a class of known trusted genuine devices. Upon scrutiny, this solution is still vulnerable to a substitution attack, just earlier in the manufacturing chain: how can the process that injects the per-device or per-class identities/secrets know that it is putting them into a good key rather than a malicious device? Consider also the consequences if the cryptographic keys that guarantee that a device is genuine leaks. The model of the thin solution is similar to the old approach to network firewalls: have a filtering firewall that only lets through intended traffic, and then run completely insecure protocols internally such as telnet. The networking world has evolved, and now we have defense in depth: even within strongly firewall ed networks, it is prudent to run for example SSH with publickey-based user authentication even on locally physical trusted networks. This approach requires more thought and adds complexity, since each level has to provide some security checking. I m arguing we need similar defense-in-depth for security devices. Security key designs cannot simply dodge this problem by assuming it is working in a friendly environment where component substitution never occur.

Example: Device authentication using PIN codes To see how this threat model can be applied to reason about security key designs, let s consider a common design. Many security keys uses PIN codes to unlock private key operations, for example on OpenPGP cards that lacks built-in PIN-entry functionality. The software on the computer just sends a PIN code to the device, and the device allows private-key operations if the PIN code was correct. Let s apply the substitution threat model to this design: the attacker replaces the intended good key with a malicious device that saves a copy of the PIN code presented to it, and then gives out error messages. Once the user has entered the PIN code and gotten an error message, presumably temporarily giving up and doing other things, the attacker replaces the device back again. The attacker has learnt the PIN code, and can later use this to perform private-key operations on the good device. This means a good design involves not sending PIN codes in clear, but use a stronger authentication protocol that allows the card to know that the PIN was correct without learning the PIN. This is implemented optionally for many OpenPGP cards today as the key-derivation-function extension. That should be mandatory, and users should not use setups that sends device authentication in the clear, and ultimately security devices should not even include support for that. Compare how I build Gnuk on my PGP card with the `kdf_do=required` option.

Example: Onboard non-predictable key-generation Many devices offer both onboard key-generation, for example OpenPGP cards that generate a Ed25519 key internally on the devices, or externally where the device imports an externally generated cryptographic key. Let s apply the subsitution threat model to this design: the user wishes to generate a key and trust the public key that came out of that process. The attacker substitutes the device for a malicious device during key-generation, imports the private key into a good device and gives that back to the user. Most of the time except during key generation the user uses a good device but still the attacker succeeded in having the user trust a public key which the attacker knows the private key for. The substitution may be a software modification, and the method to leak the private key to the attacker may be out-of-band signalling. This means a good design never generates key on-board, but imports them from a user-controllable environment. That approach should be mandatory, and users should not use setups that generates private keys on-board, and ultimately security devices should not even include support for that.

Example: Non-predictable randomness-generation Many devices claims to generate random data, often with elaborate design documents explaining how good the randomness is. Let s apply the substitution threat model to this design: the attacker replaces the intended good key with a malicious design that generates (for the attacker) predictable randomness. The user will never be able to detect the difference since the random output is, well, random, and typically not distinguishable from weak randomness. The user cannot know if any cryptographic keys generated by a generator was faulty or not. This means a good design never generates non-predictable randomness on the device. That approach should be mandatory, and users should not use setups that generates non-predictable randomness on the device, and ideally devices should not have this functionality.

Case-Study: Tillitis I have warmed up a bit for this. Tillitis is a new security device with interesting properties, and core to its operation is the Compound Device Identifier (CDI), essentially your Ed25519 private key (used for SSH etc) is derived from the CDI that is computed like this:
cdi = blake2s(UDS, blake2s(device_app), USS)
Let s apply the substitution threat model to this design: Consider someone replacing the Tillitis key with a malicious key during postal delivery of the key to the user, and the replacement device is identical with the real Tillitis key but implements the following key derivation function:
cdi = weakprng(UDS , weakprng(device_app), USS)
Where `weakprng` is a compromised algorithm that is predictable for the attacker, but still appears random. Everything will work correctly, but the attacker will be able to learn the secrets used by the user, and the user will typically not be able to tell the difference since the CDI is secret and the Ed25519 public key is not self-verifiable.

Conclusion Remember that it is impossible to fully protect against this attack, that s why it is merely a thought experiment, intended to be used during design of these devices. Consider an attacker that never gives you access to a good key and as a user you only ever use a malicious device. There is no way to have good security in this situation. This is not hypothetical, many well-funded organizations do what they can to derive people from having access to trustworthy security devices. Philosophically it does not seem possible to tell if these organizations have succeeded 100% already and there are only bad security devices around where further resistance is futile, but to end on an optimistic note let s assume that there is a non-negligible chance that they haven t succeeded. In these situations, this threat model becomes useful to improve the situation by identifying less good designs, and that s why the design mantra of mitigate harmful consequences is crucial as a takeaway from this. Let s improve the design of security devices that further the security of its users!

20 April 2023

Simon Josefsson: Sigstore for Apt Archives: apt-cosign

As suggested in my initial announcement of apt-sigstore my plan was to look into stronger uses of Sigstore than rekor, and I m now happy to announce that the apt-cosign plugin has been added to apt-sigstore and the operational project debdistcanary is publishing cosign-statements about the InRelease file published by the following distributions: Trisquel GNU/Linux, PureOS, Gnuinos, Ubuntu, Debian and Devuan. Summarizing the commands that you need to run as root to experience the great new world:

# run everything as root: su / sudo -i / doas -s
apt-get install -y apt gpg bsdutils wget
wget -nv -O/usr/local/bin/apt-verify-gpgv https://gitlab.com/debdistutils/apt-verify/-/raw/main/apt-verify-gpgv
chmod +x /usr/local/bin/apt-verify-gpgv
mkdir -p /etc/apt/verify.d
ln -s /usr/bin/gpgv /etc/apt/verify.d
echo 'APT::Key::gpgvcommand "apt-verify-gpgv";' > /etc/apt/apt.conf.d/75verify
wget -O/usr/local/bin/cosign https://github.com/sigstore/cosign/releases/download/v2.0.1/cosign-linux-amd64
echo 924754b2e62f25683e3e74f90aa5e166944a0f0cf75b4196ee76cb2f487dd980  /usr/local/bin/cosign   sha256sum -c
chmod +x /usr/local/bin/cosign
wget -nv -O/etc/apt/verify.d/apt-cosign https://gitlab.com/debdistutils/apt-sigstore/-/raw/main/apt-cosign
chmod +x /etc/apt/verify.d/apt-cosign
mkdir -p /etc/apt/trusted.cosign.d
dist=$(lsb_release --short --id   tr A-Z a-z)
wget -O/etc/apt/trusted.cosign.d/cosign-public-key-$dist.txt "https://gitlab.com/debdistutils/debdistcanary/-/raw/main/cosign/cosign-public-key-$dist.txt"
echo "Cosign::Base-URL \"https://gitlab.com/debdistutils/canary/$dist/-/raw/main/cosign\";" > /etc/apt/apt.conf.d/77cosign

Then run your usual apt-get update and look in the syslog to debug things. This is the kind of work that gets done while waiting for the build machines to attempt to reproducibly build PureOS. Unfortunately, the results is that a meager 16% of the 765 added/modifed packages are reproducible by me. There is some infrastructure work to be done to improve things: we should use sbuild for example. The build infrastructure should produce signed statements for each package it builds: One statement saying that it attempted to reproducible build a particular binary package (thus generated some build logs and diffoscope-output for auditing), and one statements saying that it actually was able to reproduce a package. Verifying such claims during apt-get install or possibly dpkg -i is a logical next step. There is some code cleanups and release work to be done now. Which distribution will be the first apt-based distribution that includes native support for Sigstore? Let s see. Sigstore is not the only relevant transparency log around, and I ve been trying to learn a bit about Sigsum to be able to support it as well. The more improved confidence about system security, the merrier!

17 April 2023

Simon Josefsson: More on Differential Reproducible Builds: Devuan is 46% reproducible!

Building on my work to rebuild Trisquel GNU/Linux 11.0 aramo, it felt simple to generalize the tooling to any two apt-repository pairs and I ve created debdistreproduce as a template-project for doing this through the infrastructure of GitLab CI/CD and meanwhile even set up my own gitlab-runner on spare hardware. I ve brought over reproduce/trisquel to using debdistreproduce as well, and archived the old reproduce-trisquel project. After fixing some quirks, building Devuan GNU+Linux 4.0 Chimaera was fairly quick since they do not modify that many packages, and I m now able to reproduce 46% of the packages that Devuan Chimaera add/modify on amd64. I have more work in progress here (hint: reproduce/pureos), but PureOS is considerably larger than both Trisquel and Devuan together. I m not sure how interested Devuan or PureOS are in reproducible builds though. Reflecting on this work made me realize that while the natural thing to do here was to differentiate two different apt-based distributions, I have realized the same way I did for debdistdiff that it would also be interesting to compare, say, Debian bookworm from Debian unstable, especially now that they should be fairly close together. My tooling should support that too. However, to really provide any benefit from the more complete existing reproducible testing of Debian, some further benefit from doing that would be useful and I can t articulate one right now. One ultimate goal with my effort is to improve trust in apt-repositories, and combining transparency-style protection a la apt-sigstore with third-party validated reproducible builds may indeed be one such use-case that would benefit the wider community of apt-repositories. Imagine having your system not install any package unless it can verify it against a third-party reproducible build organization that commits their results in a tamper-proof transparency ledger. But I m now on repeat here, so will stop.

15 April 2023

Simon Josefsson: Sigstore protects Apt archives: apt-verify & apt-sigstore

Do you want your apt-get update to only ever use files whose hash checksum have been recorded in the globally immutable tamper-resistance ledger rekor provided by the Sigstore project? Well I thought you d never ask, but now you can, thanks to my new projects apt-verify and apt-sigstore. I have not done proper stable releases yet, so this is work in progress. To try it out, adapt to the modern era of running random stuff from the Internet as root, and run the following commands. Use a container or virtual machine if you have trust issues.

apt-get install -y apt gpg bsdutils wget
wget -nv -O/usr/local/bin/rekor-cli 'https://github.com/sigstore/rekor/releases/download/v1.1.0/rekor-cli-linux-amd64'
echo afde22f01d9b6f091a7829a6f5d759d185dc0a8f3fd21de22c6ae9463352cf7d  /usr/local/bin/rekor-cli   sha256sum -c
chmod +x /usr/local/bin/rekor-cli
wget -nv -O/usr/local/bin/apt-verify-gpgv https://gitlab.com/debdistutils/apt-verify/-/raw/main/apt-verify-gpgv
chmod +x /usr/local/bin/apt-verify-gpgv
mkdir -p /etc/apt/verify.d
ln -s /usr/bin/gpgv /etc/apt/verify.d
echo 'APT::Key::gpgvcommand "apt-verify-gpgv";' > /etc/apt/apt.conf.d/75verify
wget -nv -O/etc/apt/verify.d/apt-rekor https://gitlab.com/debdistutils/apt-sigstore/-/raw/main/apt-rekor
chmod +x /etc/apt/verify.d/apt-rekor
apt-get update
less /var/log/syslog

If the stars are aligned (and the puppet projects of debdistget and debdistcanary have ran their GitLab CI/CD pipeline recently enough) you will see a successful output from apt-get update and your syslog will contain debug logs showing the entries from the rekor log for the release index files that you downloaded. See sample outputs in the README. If you get tired of it, disabling is easy:

chmod -x /etc/apt/verify.d/apt-rekor

Our project currently supports Trisquel GNU/Linux 10 (nabia) & 11 (aramo), PureOS 10 (byzantium), Gnuinos chimaera, Ubuntu 20.04 (focal) & 22.04 (jammy), Debian 10 (buster) & 11 (bullseye), and Devuan GNU+Linux 4.0 (chimaera). Others can be supported to, please open an issue about it, although my focus is on FSDG-compliant distributions and their upstreams. This is a continuation of my previous work on apt-canary. I have realized that it was better to separate out the generic part of apt-canary into my new project apt-verify that offers a plugin-based method, and then rewrote apt-canary to be one such plugin. Then apt-sigstore s apt-rekor was my second plugin for apt-verify. Due to the design of things, and some current limitations, Ubuntu is the least stable since they push out new signed InRelease files frequently (mostly due to their use of Phased-Update-Percentage) and debdistget and debdistcanary CI/CD runs have a hard time keeping up. If you have insight on how to improve this, please comment me in the issue tracking the race condition. There are limitations of what additional safety a rekor-based solution actually provides, but I expect that to improve as I get a cosign-based approach up and running. Currently apt-rekor mostly make targeted attacks less deniable. With a cosign-based approach, we could design things such that your machine only downloads updates when they have been publicly archived in an immutable fashion, or submitted for validation by a third-party such as my reproducible build setup for Trisquel GNU/Linux aramo. What do you think? Happy Hacking!

10 April 2023

Simon Josefsson: Trisquel is 42% Reproducible!

The absolute number may not be impressive, but what I hope is at least a useful contribution is that there actually is a number on how much of Trisquel is reproducible. Hopefully this will inspire others to help improve the actual metric. tl;dr: go to reproduce-trisquel. When I set about to understand how Trisquel worked, I identified a number of things that would improve my confidence in it. The lowest hanging fruit for me was to manually audit the package archive, and I wrote a tool called debdistdiff to automate this for me. That led me to think about apt archive transparency more in general. I have made some further work in that area (hint: apt-verify) that deserve its own blog post eventually. Most of apt archive transparency is futile if we don t trust the intended packages that are in the archive. One way to measurable increase trust in the package are to provide reproducible builds of the packages, which should by now be an established best practice. Code review is still important, but since it will never provide positive guarantees we need other processes that can identify sub-optimal situations automatically. The way reproducible builds easily identify negative results is what I believe has driven much of its success: its results are tangible and measurable. The field of software engineering is in need of more such practices. The design of my setup to build Trisquel reproducible are as follows.

The project debdistget is responsible for downloading Release/Packages files (which are the most relevant files from dists/) from apt archives, and works by commiting them into GitLab-hosted git-repositories. I maintain several such repositories for popular apt-archives, including for Trisquel and its upstream Ubuntu. GitLab invokes a schedule pipeline to do the downloading, and there is some race conditions here.
The project debdistdiff is used to produce the list of added and modified packages, which are the input to actually being able to know what packages to reproduce. It publishes human readable summary of difference for several distributions, including Trisquel vs Ubuntu. Early on I decided that rebuilding all of the upstream Ubuntu packages is out of scope for me: my personal trust in the official Debian/Ubuntu apt archives are greater than my trust of the added/modified packages in Trisquel.
The final project reproduce-trisquel puts the pieces together briefly as follows, everything being driven from its .gitlab-ci.yml file.
- There is a (manually triggered) job generate-build-image to create a build image to speed up CI/CD runs, using a simple Dockerfile.
- There is a (manually triggered) job generate-package-lists that uses debdistdiff to generate and store package lists and puts its output in lists/. The reason this is manually triggered right now is due to a race condition.
- There is a (scheduled) job that does two things: from the package lists, the script generate-ci-packages.sh builds a GitLab CI/CD instruction file ci-packages.yml that describes jobs for each package to build. The second part is generate-readme.sh that re-generate the project s README.md based on the build logs and diffoscope outputs that stored in the git repository.
- Through the ci-packages.yml file, there is a large number of jobs that are dynamically defined, which currently are manually triggered to not overload the build servers. The script build-package.sh is invoked and attempts to rebuild a package, and stores build log and diffoscope output in the git project itself.

I did not expect to be able to use the GitLab shared runners to do the building, however they turned out to work quite well and I postponed setting up my own runner. There is a manually curated lists/disabled-aramo.txt with some packages that all required too much disk space or took over two hours to build. Today I finally took the time to setup a GitLab runner using podman running Trisquel aramo, and I expect to complete builds of the remaining packages soon one of my Dell R630 server with 256GB RAM and dual 2680v4 CPUs should deliver sufficient performance. Current limitations and ideas on further work (most are filed as project issues) include:

We don t support *.buildinfo files. As far as I am aware, Trisquel does not publish them for their builds. Improving this would be a first step forward, anyone able to help? Compare buildinfo.debian.net. For example, many packages differ only in their NT_GNU_BUILD_ID symbol inside the ELF binary, see example diffoscope output for libgpg-error. By poking around in jenkins.trisquel.org I managed to discover that Trisquel built initramfs-utils in the randomized path /build/initramfs-tools-bzRLUp and hard-coding that path allowed me to reproduce that package. I expect the same to hold for many other packages. Unfortunately, this failure turned into success with that package moved the needle from 42% reproducibility to 43% however I didn t let that stand in the way of a good headline.
The mechanism to download the Release/Package-files from dists/ is not fool-proof: we may not capture all ever published such files. While this is less of a concern for reproducibility, it is more of a concern for apt transparency. Still, having Trisquel provide a service similar to snapshot.debian.org would help.
Having at least one other CPU architecture would be nice.
Due to lack of time and mental focus, handling incremental updates of new versions of packages is not yet working. This means we only ever build one version of a package, and never discover any newly published versions of the same package. Now that Trisquel aramo is released, the expected rate of new versions should be low, but still happens due to security or backports.
Porting this to test supposedly FSDG-compliant distributions such as PureOS and Gnuinos should be relatively easy. I m also looking at Devuan because of Gnuinos.
The elephant in the room is how reproducible Ubuntu is in the first place.

Happy Easter Hacking! Update 2023-04-17: The original project reproduce-trisquel that was announced here has been archived and replaced with two projects, one generic debdistreproduce and one with results for Trisquel: reproduce/trisquel .

27 March 2023

Simon Josefsson: OpenPGP master key on Nitrokey Start

I ve used hardware-backed OpenPGP keys since 2006 when I imported newly generated rsa1024 subkeys to a FSFE Fellowship card. This worked well for several years, and I recall buying more ZeitControl cards for multi-machine usage and backup purposes. As a side note, I recall being unsatisfied with the weak 1024-bit RSA subkeys at the time my primary key was a somewhat stronger 1280-bit RSA key created back in 2002 but OpenPGP cards at the time didn t support more than 1024 bit RSA, and were (and still often are) also limited to power-of-two RSA key sizes which I dislike. I had my master key on disk with a strong password for a while, mostly to refresh expiration time of the subkeys and to sign other s OpenPGP keys. At some point I stopped carrying around encrypted copies of my master key. That was my main setup when I migrated to a new stronger RSA 3744 bit key with rsa2048 subkeys on a YubiKey NEO back in 2014. At that point, signing other s OpenPGP keys was a rare enough occurrence that I settled with bringing out my offline machine to perform this operation, transferring the public key to sign on USB sticks. In 2019 I re-evaluated my OpenPGP setup and ended up creating a offline Ed25519 key with subkeys on a FST-01G running Gnuk. My approach for signing other s OpenPGP keys were still to bring out my offline machine and sign things using the master secret using USB sticks for storage and transport. Which meant I almost never did that, because it took too much effort. So my 2019-era Ed25519 key still only has a handful of signatures on it, since I had essentially stopped signing other s keys which is the traditional way of getting signatures in return. None of this caused any critical problem for me because I continued to use my old 2014-era RSA3744 key in parallel with my new 2019-era Ed25519 key, since too many systems didn t handle Ed25519. However, during 2022 this changed, and the only remaining environment that I still used my RSA3744 key for was in Debian and they require OpenPGP signatures on the new key to allow it to replace an older key. I was in denial about this sub-optimal solution during 2022 and endured its practical consequences, having to use the YubiKey NEO (which I had replaced with a permanently inserted YubiKey Nano at some point) for Debian-related purposes alone. In December 2022 I bought a new laptop and setup a FST-01SZ with my Ed25519 key, and while I have taken a vacation from Debian, I continue to extend the expiration period on the old RSA3744-key in case I will ever have to use it again, so the overall OpenPGP setup was still sub-optimal. Having two valid OpenPGP keys at the same time causes people to use both for email encryption (leading me to have to use both devices), and the WKD Key Discovery protocol doesn t like two valid keys either. At FOSDEM 23 I ran into Andre Heinecke at GnuPG and I couldn t help complain about how complex and unsatisfying all OpenPGP-related matters were, and he mildly ignored my rant and asked why I didn t put the master key on another smartcard. The comment sunk in when I came home, and recently I connected all the dots and this post is a summary of what I did to move my offline OpenPGP master key to a Nitrokey Start. First a word about device choice, I still prefer to use hardware devices that are as compatible with free software as possible, but the FST-01G or FST-01SZ are no longer easily available for purchase. I got a comment about Nitrokey start in my last post, and had two of them available to experiment with. There are things to dislike with the Nitrokey Start compared to the YubiKey (e.g., relative insecure chip architecture, the bulkier form factor and lack of FIDO/U2F/OATH support) but as far as I know there is no more widely available owner-controlled device that is manufactured for an intended purpose of implementing an OpenPGP card. Thus it hits the sweet spot for me.

The first step is to run latest firmware on the Nitrokey Start for bug-fixes and important OpenSSH 9.0 compatibility and there are reproducible-built firmware published that you can install using pynitrokey. I run Trisquel 11 aramo on my laptop, which does not include the Python Pip package (likely because it promotes installing non-free software) so that was a slight complication. Building the firmware locally may have worked, and I would like to do that eventually to confirm the published firmware, however to save time I settled with installing the Ubuntu 22.04 packages on my machine:

$ sha256sum python3-pip*
ded6b3867a4a4cbaff0940cab366975d6aeecc76b9f2d2efa3deceb062668b1c  python3-pip_22.0.2+dfsg-1ubuntu0.2_all.deb
e1561575130c41dc3309023a345de337e84b4b04c21c74db57f599e267114325  python3-pip-whl_22.0.2+dfsg-1ubuntu0.2_all.deb
$ doas dpkg -i python3-pip*
...
$ doas apt install -f
...
$

Installing pynitrokey downloaded a bunch of dependencies, and it would be nice to audit the license and security vulnerabilities for each of them. (Verbose output below slightly redacted.)

jas@kaka:~$ pip3 install --user pynitrokey
Collecting pynitrokey
  Downloading pynitrokey-0.4.34-py3-none-any.whl (572 kB)
Collecting frozendict~=2.3.4
  Downloading frozendict-2.3.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (113 kB)
Requirement already satisfied: click<9,>=8.0.0 in /usr/lib/python3/dist-packages (from pynitrokey) (8.0.3)
Collecting ecdsa
  Downloading ecdsa-0.18.0-py2.py3-none-any.whl (142 kB)
Collecting python-dateutil~=2.7.0
  Downloading python_dateutil-2.7.5-py2.py3-none-any.whl (225 kB)
Collecting fido2<2,>=1.1.0
  Downloading fido2-1.1.0-py3-none-any.whl (201 kB)
Collecting tlv8
  Downloading tlv8-0.10.0.tar.gz (16 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: certifi>=14.5.14 in /usr/lib/python3/dist-packages (from pynitrokey) (2020.6.20)
Requirement already satisfied: pyusb in /usr/lib/python3/dist-packages (from pynitrokey) (1.2.1.post1)
Collecting urllib3~=1.26.7
  Downloading urllib3-1.26.15-py2.py3-none-any.whl (140 kB)
Collecting spsdk<1.8.0,>=1.7.0
  Downloading spsdk-1.7.1-py3-none-any.whl (684 kB)
Collecting typing_extensions~=4.3.0
  Downloading typing_extensions-4.3.0-py3-none-any.whl (25 kB)
Requirement already satisfied: cryptography<37,>=3.4.4 in /usr/lib/python3/dist-packages (from pynitrokey) (3.4.8)
Collecting intelhex
  Downloading intelhex-2.3.0-py2.py3-none-any.whl (50 kB)
Collecting nkdfu
  Downloading nkdfu-0.2-py3-none-any.whl (16 kB)
Requirement already satisfied: requests in /usr/lib/python3/dist-packages (from pynitrokey) (2.25.1)
Collecting tqdm
  Downloading tqdm-4.65.0-py3-none-any.whl (77 kB)
Collecting nrfutil<7,>=6.1.4
  Downloading nrfutil-6.1.7.tar.gz (845 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: cffi in /usr/lib/python3/dist-packages (from pynitrokey) (1.15.0)
Collecting crcmod
  Downloading crcmod-1.7.tar.gz (89 kB)
  Preparing metadata (setup.py) ... done
Collecting libusb1==1.9.3
  Downloading libusb1-1.9.3-py3-none-any.whl (60 kB)
Collecting pc_ble_driver_py>=0.16.4
  Downloading pc_ble_driver_py-0.17.0-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.9 MB)
Collecting piccata
  Downloading piccata-2.0.3-py3-none-any.whl (21 kB)
Collecting protobuf<4.0.0,>=3.17.3
  Downloading protobuf-3.20.3-cp310-cp310-manylinux_2_12_x86_64.manylinux2010_x86_64.whl (1.1 MB)
Collecting pyserial
  Downloading pyserial-3.5-py2.py3-none-any.whl (90 kB)
Collecting pyspinel>=1.0.0a3
  Downloading pyspinel-1.0.3.tar.gz (58 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: pyyaml in /usr/lib/python3/dist-packages (from nrfutil<7,>=6.1.4->pynitrokey) (5.4.1)
Requirement already satisfied: six>=1.5 in /usr/lib/python3/dist-packages (from python-dateutil~=2.7.0->pynitrokey) (1.16.0)
Collecting pylink-square<0.11.9,>=0.8.2
  Downloading pylink_square-0.11.1-py2.py3-none-any.whl (78 kB)
Collecting jinja2<3.1,>=2.11
  Downloading Jinja2-3.0.3-py3-none-any.whl (133 kB)
Collecting bincopy<17.11,>=17.10.2
  Downloading bincopy-17.10.3-py3-none-any.whl (17 kB)
Collecting fastjsonschema>=2.15.1
  Downloading fastjsonschema-2.16.3-py3-none-any.whl (23 kB)
Collecting astunparse<2,>=1.6
  Downloading astunparse-1.6.3-py2.py3-none-any.whl (12 kB)
Collecting oscrypto~=1.2
  Downloading oscrypto-1.3.0-py2.py3-none-any.whl (194 kB)
Collecting deepmerge==0.3.0
  Downloading deepmerge-0.3.0-py2.py3-none-any.whl (7.6 kB)
Collecting pyocd<=0.31.0,>=0.28.3
  Downloading pyocd-0.31.0-py3-none-any.whl (12.5 MB)
Collecting click-option-group<0.6,>=0.3.0
  Downloading click_option_group-0.5.5-py3-none-any.whl (12 kB)
Collecting pycryptodome<4,>=3.9.3
  Downloading pycryptodome-3.17-cp35-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (2.1 MB)
Collecting pyocd-pemicro<1.2.0,>=1.1.1
  Downloading pyocd_pemicro-1.1.5-py3-none-any.whl (9.0 kB)
Requirement already satisfied: colorama<1,>=0.4.4 in /usr/lib/python3/dist-packages (from spsdk<1.8.0,>=1.7.0->pynitrokey) (0.4.4)
Collecting commentjson<1,>=0.9
  Downloading commentjson-0.9.0.tar.gz (8.7 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: asn1crypto<2,>=1.2 in /usr/lib/python3/dist-packages (from spsdk<1.8.0,>=1.7.0->pynitrokey) (1.4.0)
Collecting pypemicro<0.2.0,>=0.1.9
  Downloading pypemicro-0.1.11-py3-none-any.whl (5.7 MB)
Collecting libusbsio>=2.1.11
  Downloading libusbsio-2.1.11-py3-none-any.whl (247 kB)
Collecting sly==0.4
  Downloading sly-0.4.tar.gz (60 kB)
  Preparing metadata (setup.py) ... done
Collecting ruamel.yaml<0.18.0,>=0.17
  Downloading ruamel.yaml-0.17.21-py3-none-any.whl (109 kB)
Collecting cmsis-pack-manager<0.3.0
  Downloading cmsis_pack_manager-0.2.10-py2.py3-none-manylinux1_x86_64.whl (25.1 MB)
Collecting click-command-tree==1.1.0
  Downloading click_command_tree-1.1.0-py3-none-any.whl (3.6 kB)
Requirement already satisfied: bitstring<3.2,>=3.1 in /usr/lib/python3/dist-packages (from spsdk<1.8.0,>=1.7.0->pynitrokey) (3.1.7)
Collecting hexdump~=3.3
  Downloading hexdump-3.3.zip (12 kB)
  Preparing metadata (setup.py) ... done
Collecting fire
  Downloading fire-0.5.0.tar.gz (88 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: wheel<1.0,>=0.23.0 in /usr/lib/python3/dist-packages (from astunparse<2,>=1.6->spsdk<1.8.0,>=1.7.0->pynitrokey) (0.37.1)
Collecting humanfriendly
  Downloading humanfriendly-10.0-py2.py3-none-any.whl (86 kB)
Collecting argparse-addons>=0.4.0
  Downloading argparse_addons-0.12.0-py3-none-any.whl (3.3 kB)
Collecting pyelftools
  Downloading pyelftools-0.29-py2.py3-none-any.whl (174 kB)
Collecting milksnake>=0.1.2
  Downloading milksnake-0.1.5-py2.py3-none-any.whl (9.6 kB)
Requirement already satisfied: appdirs>=1.4 in /usr/lib/python3/dist-packages (from cmsis-pack-manager<0.3.0->spsdk<1.8.0,>=1.7.0->pynitrokey) (1.4.4)
Collecting lark-parser<0.8.0,>=0.7.1
  Downloading lark-parser-0.7.8.tar.gz (276 kB)
  Preparing metadata (setup.py) ... done
Requirement already satisfied: MarkupSafe>=2.0 in /usr/lib/python3/dist-packages (from jinja2<3.1,>=2.11->spsdk<1.8.0,>=1.7.0->pynitrokey) (2.0.1)
Collecting asn1crypto<2,>=1.2
  Downloading asn1crypto-1.5.1-py2.py3-none-any.whl (105 kB)
Collecting wrapt
  Downloading wrapt-1.15.0-cp310-cp310-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (78 kB)
Collecting future
  Downloading future-0.18.3.tar.gz (840 kB)
  Preparing metadata (setup.py) ... done
Collecting psutil>=5.2.2
  Downloading psutil-5.9.4-cp36-abi3-manylinux_2_12_x86_64.manylinux2010_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (280 kB)
Collecting capstone<5.0,>=4.0
  Downloading capstone-4.0.2-py2.py3-none-manylinux1_x86_64.whl (2.1 MB)
Collecting naturalsort<2.0,>=1.5
  Downloading naturalsort-1.5.1.tar.gz (7.4 kB)
  Preparing metadata (setup.py) ... done
Collecting prettytable<3.0,>=2.0
  Downloading prettytable-2.5.0-py3-none-any.whl (24 kB)
Collecting intervaltree<4.0,>=3.0.2
  Downloading intervaltree-3.1.0.tar.gz (32 kB)
  Preparing metadata (setup.py) ... done
Collecting ruamel.yaml.clib>=0.2.6
  Downloading ruamel.yaml.clib-0.2.7-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.manylinux_2_24_x86_64.whl (485 kB)
Collecting termcolor
  Downloading termcolor-2.2.0-py3-none-any.whl (6.6 kB)
Collecting sortedcontainers<3.0,>=2.0
  Downloading sortedcontainers-2.4.0-py2.py3-none-any.whl (29 kB)
Requirement already satisfied: wcwidth in /usr/lib/python3/dist-packages (from prettytable<3.0,>=2.0->pyocd<=0.31.0,>=0.28.3->spsdk<1.8.0,>=1.7.0->pynitrokey) (0.2.5)
Building wheels for collected packages: nrfutil, crcmod, sly, tlv8, commentjson, hexdump, pyspinel, fire, intervaltree, lark-parser, naturalsort, future
  Building wheel for nrfutil (setup.py) ... done
  Created wheel for nrfutil: filename=nrfutil-6.1.7-py3-none-any.whl size=898520 sha256=de6f8803f51d6c26d24dc7df6292064a468ff3f389d73370433fde5582b84a10
  Stored in directory: /home/jas/.cache/pip/wheels/39/2b/9b/98ab2dd716da746290e6728bdb557b14c1c9a54cb9ed86e13b
  Building wheel for crcmod (setup.py) ... done
  Created wheel for crcmod: filename=crcmod-1.7-cp310-cp310-linux_x86_64.whl size=31422 sha256=5149ac56fcbfa0606760eef5220fcedc66be560adf68cf38c604af3ad0e4a8b0
  Stored in directory: /home/jas/.cache/pip/wheels/85/4c/07/72215c529bd59d67e3dac29711d7aba1b692f543c808ba9e86
  Building wheel for sly (setup.py) ... done
  Created wheel for sly: filename=sly-0.4-py3-none-any.whl size=27352 sha256=f614e413918de45c73d1e9a8dca61ca07dc760d9740553400efc234c891f7fde
  Stored in directory: /home/jas/.cache/pip/wheels/a2/23/4a/6a84282a0d2c29f003012dc565b3126e427972e8b8157ea51f
  Building wheel for tlv8 (setup.py) ... done
  Created wheel for tlv8: filename=tlv8-0.10.0-py3-none-any.whl size=11266 sha256=3ec8b3c45977a3addbc66b7b99e1d81b146607c3a269502b9b5651900a0e2d08
  Stored in directory: /home/jas/.cache/pip/wheels/e9/35/86/66a473cc2abb0c7f21ed39c30a3b2219b16bd2cdb4b33cfc2c
  Building wheel for commentjson (setup.py) ... done
  Created wheel for commentjson: filename=commentjson-0.9.0-py3-none-any.whl size=12092 sha256=28b6413132d6d7798a18cf8c76885dc69f676ea763ffcb08775a3c2c43444f4a
  Stored in directory: /home/jas/.cache/pip/wheels/7d/90/23/6358a234ca5b4ec0866d447079b97fedf9883387d1d7d074e5
  Building wheel for hexdump (setup.py) ... done
  Created wheel for hexdump: filename=hexdump-3.3-py3-none-any.whl size=8913 sha256=79dfadd42edbc9acaeac1987464f2df4053784fff18b96408c1309b74fd09f50
  Stored in directory: /home/jas/.cache/pip/wheels/26/28/f7/f47d7ecd9ae44c4457e72c8bb617ef18ab332ee2b2a1047e87
  Building wheel for pyspinel (setup.py) ... done
  Created wheel for pyspinel: filename=pyspinel-1.0.3-py3-none-any.whl size=65033 sha256=01dc27f81f28b4830a0cf2336dc737ef309a1287fcf33f57a8a4c5bed3b5f0a6
  Stored in directory: /home/jas/.cache/pip/wheels/95/ec/4b/6e3e2ee18e7292d26a65659f75d07411a6e69158bb05507590
  Building wheel for fire (setup.py) ... done
  Created wheel for fire: filename=fire-0.5.0-py2.py3-none-any.whl size=116951 sha256=3d288585478c91a6914629eb739ea789828eb2d0267febc7c5390cb24ba153e8
  Stored in directory: /home/jas/.cache/pip/wheels/90/d4/f7/9404e5db0116bd4d43e5666eaa3e70ab53723e1e3ea40c9a95
  Building wheel for intervaltree (setup.py) ... done
  Created wheel for intervaltree: filename=intervaltree-3.1.0-py2.py3-none-any.whl size=26119 sha256=5ff1def22ba883af25c90d90ef7c6518496fcd47dd2cbc53a57ec04cd60dc21d
  Stored in directory: /home/jas/.cache/pip/wheels/fa/80/8c/43488a924a046b733b64de3fac99252674c892a4c3801c0a61
  Building wheel for lark-parser (setup.py) ... done
  Created wheel for lark-parser: filename=lark_parser-0.7.8-py2.py3-none-any.whl size=62527 sha256=3d2ec1d0f926fc2688d40777f7ef93c9986f874169132b1af590b6afc038f4be
  Stored in directory: /home/jas/.cache/pip/wheels/29/30/94/33e8b58318aa05cb1842b365843036e0280af5983abb966b83
  Building wheel for naturalsort (setup.py) ... done
  Created wheel for naturalsort: filename=naturalsort-1.5.1-py3-none-any.whl size=7526 sha256=bdecac4a49f2416924548cae6c124c85d5333e9e61c563232678ed182969d453
  Stored in directory: /home/jas/.cache/pip/wheels/a6/8e/c9/98cfa614fff2979b457fa2d9ad45ec85fa417e7e3e2e43be51
  Building wheel for future (setup.py) ... done
  Created wheel for future: filename=future-0.18.3-py3-none-any.whl size=492037 sha256=57a01e68feca2b5563f5f624141267f399082d2f05f55886f71b5d6e6cf2b02c
  Stored in directory: /home/jas/.cache/pip/wheels/5e/a9/47/f118e66afd12240e4662752cc22cefae5d97275623aa8ef57d
Successfully built nrfutil crcmod sly tlv8 commentjson hexdump pyspinel fire intervaltree lark-parser naturalsort future
Installing collected packages: tlv8, sortedcontainers, sly, pyserial, pyelftools, piccata, naturalsort, libusb1, lark-parser, intelhex, hexdump, fastjsonschema, crcmod, asn1crypto, wrapt, urllib3, typing_extensions, tqdm, termcolor, ruamel.yaml.clib, python-dateutil, pyspinel, pypemicro, pycryptodome, psutil, protobuf, prettytable, oscrypto, milksnake, libusbsio, jinja2, intervaltree, humanfriendly, future, frozendict, fido2, ecdsa, deepmerge, commentjson, click-option-group, click-command-tree, capstone, astunparse, argparse-addons, ruamel.yaml, pyocd-pemicro, pylink-square, pc_ble_driver_py, fire, cmsis-pack-manager, bincopy, pyocd, nrfutil, nkdfu, spsdk, pynitrokey
  WARNING: The script nitropy is installed in '/home/jas/.local/bin' which is not on PATH.
  Consider adding this directory to PATH or, if you prefer to suppress this warning, use --no-warn-script-location.
Successfully installed argparse-addons-0.12.0 asn1crypto-1.5.1 astunparse-1.6.3 bincopy-17.10.3 capstone-4.0.2 click-command-tree-1.1.0 click-option-group-0.5.5 cmsis-pack-manager-0.2.10 commentjson-0.9.0 crcmod-1.7 deepmerge-0.3.0 ecdsa-0.18.0 fastjsonschema-2.16.3 fido2-1.1.0 fire-0.5.0 frozendict-2.3.5 future-0.18.3 hexdump-3.3 humanfriendly-10.0 intelhex-2.3.0 intervaltree-3.1.0 jinja2-3.0.3 lark-parser-0.7.8 libusb1-1.9.3 libusbsio-2.1.11 milksnake-0.1.5 naturalsort-1.5.1 nkdfu-0.2 nrfutil-6.1.7 oscrypto-1.3.0 pc_ble_driver_py-0.17.0 piccata-2.0.3 prettytable-2.5.0 protobuf-3.20.3 psutil-5.9.4 pycryptodome-3.17 pyelftools-0.29 pylink-square-0.11.1 pynitrokey-0.4.34 pyocd-0.31.0 pyocd-pemicro-1.1.5 pypemicro-0.1.11 pyserial-3.5 pyspinel-1.0.3 python-dateutil-2.7.5 ruamel.yaml-0.17.21 ruamel.yaml.clib-0.2.7 sly-0.4 sortedcontainers-2.4.0 spsdk-1.7.1 termcolor-2.2.0 tlv8-0.10.0 tqdm-4.65.0 typing_extensions-4.3.0 urllib3-1.26.15 wrapt-1.15.0
jas@kaka:~$

Then upgrading the device worked remarkable well, although I wish that the tool would have printed URLs and checksums for the firmware files to allow easy confirmation.

jas@kaka:~$ PATH=$PATH:/home/jas/.local/bin
jas@kaka:~$ nitropy start list
Command line tool to interact with Nitrokey devices 0.4.34
:: 'Nitrokey Start' keys:
FSIJ-1.2.15-5D271572: Nitrokey Nitrokey Start (RTM.12.1-RC2-modified)
jas@kaka:~$ nitropy start update
Command line tool to interact with Nitrokey devices 0.4.34
Nitrokey Start firmware update tool
Platform: Linux-5.15.0-67-generic-x86_64-with-glibc2.35
System: Linux, is_linux: True
Python: 3.10.6
Saving run log to: /tmp/nitropy.log.gc5753a8
Admin PIN: 
Firmware data to be used:
- FirmwareType.REGNUAL: 4408, hash: ...b'72a30389' valid (from ...built/RTM.13/regnual.bin)
- FirmwareType.GNUK: 129024, hash: ...b'25a4289b' valid (from ...prebuilt/RTM.13/gnuk.bin)
Currently connected device strings:
Device: 
    Vendor: Nitrokey
   Product: Nitrokey Start
    Serial: FSIJ-1.2.15-5D271572
  Revision: RTM.12.1-RC2-modified
    Config: *:*:8e82
       Sys: 3.0
     Board: NITROKEY-START-G
initial device strings: [ 'name': '', 'Vendor': 'Nitrokey', 'Product': 'Nitrokey Start', 'Serial': 'FSIJ-1.2.15-5D271572', 'Revision': 'RTM.12.1-RC2-modified', 'Config': '*:*:8e82', 'Sys': '3.0', 'Board': 'NITROKEY-START-G' ]
Please note:
- Latest firmware available is: 
  RTM.13 (published: 2022-12-08T10:59:11Z)
- provided firmware: None
- all data will be removed from the device!
- do not interrupt update process - the device may not run properly!
- the process should not take more than 1 minute
Do you want to continue? [yes/no]: yes
...
Starting bootloader upload procedure
Device: Nitrokey Start FSIJ-1.2.15-5D271572
Connected to the device
Running update!
Do NOT remove the device from the USB slot, until further notice
Downloading flash upgrade program...
Executing flash upgrade...
Waiting for device to appear:
  Wait 20 seconds.....
Downloading the program
Protecting device
Finish flashing
Resetting device
Update procedure finished. Device could be removed from USB slot.
Currently connected device strings (after upgrade):
Device: 
    Vendor: Nitrokey
   Product: Nitrokey Start
    Serial: FSIJ-1.2.19-5D271572
  Revision: RTM.13
    Config: *:*:8e82
       Sys: 3.0
     Board: NITROKEY-START-G
device can now be safely removed from the USB slot
final device strings: [ 'name': '', 'Vendor': 'Nitrokey', 'Product': 'Nitrokey Start', 'Serial': 'FSIJ-1.2.19-5D271572', 'Revision': 'RTM.13', 'Config': '*:*:8e82', 'Sys': '3.0', 'Board': 'NITROKEY-START-G' ]
finishing session 2023-03-16 21:49:07.371291
Log saved to: /tmp/nitropy.log.gc5753a8
jas@kaka:~$ 
jas@kaka:~$ nitropy start list
Command line tool to interact with Nitrokey devices 0.4.34
:: 'Nitrokey Start' keys:
FSIJ-1.2.19-5D271572: Nitrokey Nitrokey Start (RTM.13)
jas@kaka:~$

Before importing the master key to this device, it should be configured. Note the commands in the beginning to make sure scdaemon/pcscd is not running because they may have cached state from earlier cards. Change PIN code as you like after this, my experience with Gnuk was that the Admin PIN had to be changed first, then you import the key, and then you change the PIN.

jas@kaka:~$ gpg-connect-agent "SCD KILLSCD" "SCD BYE" /bye
OK
ERR 67125247 Slut p  fil <GPG Agent>
jas@kaka:~$ ps auxww grep -e pcsc -e scd
jas        11651  0.0  0.0   3468  1672 pts/0    R+   21:54   0:00 grep --color=auto -e pcsc -e scd
jas@kaka:~$ gpg --card-edit
Reader ...........: 20A0:4211:FSIJ-1.2.19-5D271572:0
Application ID ...: D276000124010200FFFE5D2715720000
Application type .: OpenPGP
Version ..........: 2.0
Manufacturer .....: unmanaged S/N range
Serial number ....: 5D271572
Name of cardholder: [not set]
Language prefs ...: [not set]
Salutation .......: 
URL of public key : [not set]
Login data .......: [not set]
Signature PIN ....: forced
Key attributes ...: rsa2048 rsa2048 rsa2048
Max. PIN lengths .: 127 127 127
PIN retry counter : 3 3 3
Signature counter : 0
KDF setting ......: off
Signature key ....: [none]
Encryption key....: [none]
Authentication key: [none]
General key info..: [none]
gpg/card> admin
Admin commands are allowed
gpg/card> kdf-setup
gpg/card> passwd
gpg: OpenPGP card no. D276000124010200FFFE5D2715720000 detected
1 - change PIN
2 - unblock PIN
3 - change Admin PIN
4 - set the Reset Code
Q - quit
Your selection? 3
PIN changed.
1 - change PIN
2 - unblock PIN
3 - change Admin PIN
4 - set the Reset Code
Q - quit
Your selection? q
gpg/card> name
Cardholder's surname: Josefsson
Cardholder's given name: Simon
gpg/card> lang
Language preferences: sv
gpg/card> sex
Salutation (M = Mr., F = Ms., or space): m
gpg/card> login
Login data (account name): jas
gpg/card> url
URL to retrieve public key: https://josefsson.org/key-20190320.txt
gpg/card> forcesig
gpg/card> key-attr
Changing card key attribute for: Signature key
Please select what kind of key you want:
   (1) RSA
   (2) ECC
Your selection? 2
Please select which elliptic curve you want:
   (1) Curve 25519
   (4) NIST P-384
Your selection? 1
The card will now be re-configured to generate a key of type: ed25519
Note: There is no guarantee that the card supports the requested size.
      If the key generation does not succeed, please check the
      documentation of your card to see what sizes are allowed.
Changing card key attribute for: Encryption key
Please select what kind of key you want:
   (1) RSA
   (2) ECC
Your selection? 2
Please select which elliptic curve you want:
   (1) Curve 25519
   (4) NIST P-384
Your selection? 1
The card will now be re-configured to generate a key of type: cv25519
Changing card key attribute for: Authentication key
Please select what kind of key you want:
   (1) RSA
   (2) ECC
Your selection? 2
Please select which elliptic curve you want:
   (1) Curve 25519
   (4) NIST P-384
Your selection? 1
The card will now be re-configured to generate a key of type: ed25519
gpg/card> 
jas@kaka:~$ gpg --card-edit
Reader ...........: 20A0:4211:FSIJ-1.2.19-5D271572:0
Application ID ...: D276000124010200FFFE5D2715720000
Application type .: OpenPGP
Version ..........: 2.0
Manufacturer .....: unmanaged S/N range
Serial number ....: 5D271572
Name of cardholder: Simon Josefsson
Language prefs ...: sv
Salutation .......: Mr.
URL of public key : https://josefsson.org/key-20190320.txt
Login data .......: jas
Signature PIN ....: not forced
Key attributes ...: ed25519 cv25519 ed25519
Max. PIN lengths .: 127 127 127
PIN retry counter : 3 3 3
Signature counter : 0
KDF setting ......: on
Signature key ....: [none]
Encryption key....: [none]
Authentication key: [none]
General key info..: [none]
jas@kaka:~$

Once setup, bring out your offline machine and boot it and mount your USB stick with the offline key. The paths below will be different, and this is using a somewhat unorthodox approach of working with fresh GnuPG configuration paths that I chose for the USB stick.

jas@kaka:/media/jas/2c699cbd-b77e-4434-a0d6-0c4965864296$ cp -a gnupghome-backup-masterkey gnupghome-import-nitrokey-5D271572
jas@kaka:/media/jas/2c699cbd-b77e-4434-a0d6-0c4965864296$ gpg --homedir $PWD/gnupghome-import-nitrokey-5D271572 --edit-key B1D2BD1375BECB784CF4F8C4D73CF638C53C06BE
gpg (GnuPG) 2.2.27; Copyright (C) 2021 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
Secret key is available.
sec  ed25519/D73CF638C53C06BE
     created: 2019-03-20  expired: 2019-10-22  usage: SC  
     trust: ultimate      validity: expired
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg> keytocard
Really move the primary key? (y/N) y
Please select where to store the key:
   (1) Signature key
   (3) Authentication key
Your selection? 1
sec  ed25519/D73CF638C53C06BE
     created: 2019-03-20  expired: 2019-10-22  usage: SC  
     trust: ultimate      validity: expired
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg> 
Save changes? (y/N) y
jas@kaka:/media/jas/2c699cbd-b77e-4434-a0d6-0c4965864296$

At this point it is useful to confirm that the Nitrokey has the master key available and that is possible to sign statements with it, back on your regular machine:

jas@kaka:~$ gpg --card-status
Reader ...........: 20A0:4211:FSIJ-1.2.19-5D271572:0
Application ID ...: D276000124010200FFFE5D2715720000
Application type .: OpenPGP
Version ..........: 2.0
Manufacturer .....: unmanaged S/N range
Serial number ....: 5D271572
Name of cardholder: Simon Josefsson
Language prefs ...: sv
Salutation .......: Mr.
URL of public key : https://josefsson.org/key-20190320.txt
Login data .......: jas
Signature PIN ....: not forced
Key attributes ...: ed25519 cv25519 ed25519
Max. PIN lengths .: 127 127 127
PIN retry counter : 3 3 3
Signature counter : 1
KDF setting ......: on
Signature key ....: B1D2 BD13 75BE CB78 4CF4  F8C4 D73C F638 C53C 06BE
      created ....: 2019-03-20 23:37:24
Encryption key....: [none]
Authentication key: [none]
General key info..: pub  ed25519/D73CF638C53C06BE 2019-03-20 Simon Josefsson <simon@josefsson.org>
sec>  ed25519/D73CF638C53C06BE  created: 2019-03-20  expires: 2023-09-19
                                card-no: FFFE 5D271572
ssb>  ed25519/80260EE8A9B92B2B  created: 2019-03-20  expires: 2023-09-19
                                card-no: FFFE 42315277
ssb>  ed25519/51722B08FE4745A2  created: 2019-03-20  expires: 2023-09-19
                                card-no: FFFE 42315277
ssb>  cv25519/02923D7EE76EBD60  created: 2019-03-20  expires: 2023-09-19
                                card-no: FFFE 42315277
jas@kaka:~$ echo foo gpg -a --sign gpg --verify
gpg: Signature made Thu Mar 16 22:11:02 2023 CET
gpg:                using EDDSA key B1D2BD1375BECB784CF4F8C4D73CF638C53C06BE
gpg: Good signature from "Simon Josefsson <simon@josefsson.org>" [ultimate]
jas@kaka:~$

Finally to retrieve and sign a key, for example Andre Heinecke s that I could confirm the OpenPGP key identifier from his business card.

jas@kaka:~$ gpg --locate-external-keys aheinecke@gnupg.com
gpg: key 1FDF723CF462B6B1: public key "Andre Heinecke <aheinecke@gnupg.com>" imported
gpg: Total number processed: 1
gpg:               imported: 1
gpg: marginals needed: 3  completes needed: 1  trust model: pgp
gpg: depth: 0  valid:   2  signed:   7  trust: 0-, 0q, 0n, 0m, 0f, 2u
gpg: depth: 1  valid:   7  signed:  64  trust: 7-, 0q, 0n, 0m, 0f, 0u
gpg: next trustdb check due at 2023-05-26
pub   rsa3072 2015-12-08 [SC] [expires: 2025-12-05]
      94A5C9A03C2FE5CA3B095D8E1FDF723CF462B6B1
uid           [ unknown] Andre Heinecke <aheinecke@gnupg.com>
sub   ed25519 2017-02-13 [S]
sub   ed25519 2017-02-13 [A]
sub   rsa3072 2015-12-08 [E] [expires: 2025-12-05]
sub   rsa3072 2015-12-08 [A] [expires: 2025-12-05]
jas@kaka:~$ gpg --edit-key "94A5C9A03C2FE5CA3B095D8E1FDF723CF462B6B1"
gpg (GnuPG) 2.2.27; Copyright (C) 2021 Free Software Foundation, Inc.
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
pub  rsa3072/1FDF723CF462B6B1
     created: 2015-12-08  expires: 2025-12-05  usage: SC  
     trust: unknown       validity: unknown
sub  ed25519/2978E9D40CBABA5C
     created: 2017-02-13  expires: never       usage: S   
sub  ed25519/DC74D901C8E2DD47
     created: 2017-02-13  expires: never       usage: A   
The following key was revoked on 2017-02-23 by RSA key 1FDF723CF462B6B1 Andre Heinecke <aheinecke@gnupg.com>
sub  cv25519/1FFE3151683260AB
     created: 2017-02-13  revoked: 2017-02-23  usage: E   
sub  rsa3072/8CC999BDAA45C71F
     created: 2015-12-08  expires: 2025-12-05  usage: E   
sub  rsa3072/6304A4B539CE444A
     created: 2015-12-08  expires: 2025-12-05  usage: A   
[ unknown] (1). Andre Heinecke <aheinecke@gnupg.com>
gpg> sign
pub  rsa3072/1FDF723CF462B6B1
     created: 2015-12-08  expires: 2025-12-05  usage: SC  
     trust: unknown       validity: unknown
 Primary key fingerprint: 94A5 C9A0 3C2F E5CA 3B09  5D8E 1FDF 723C F462 B6B1
     Andre Heinecke <aheinecke@gnupg.com>
This key is due to expire on 2025-12-05.
Are you sure that you want to sign this key with your
key "Simon Josefsson <simon@josefsson.org>" (D73CF638C53C06BE)
Really sign? (y/N) y
gpg> quit
Save changes? (y/N) y
jas@kaka:~$

This is on my day-to-day machine, using the NitroKey Start with the offline key. No need to boot the old offline machine just to sign keys or extend expiry anymore! At FOSDEM 23 I managed to get at least one DD signature on my new key, and the Debian keyring maintainers accepted my Ed25519 key. Hopefully I can now finally let my 2014-era RSA3744 key expire in 2023-09-19 and not extend it any further. This should finish my transition to a simpler OpenPGP key setup, yay!

1 February 2023

Simon Josefsson: Apt Archive Transparency: debdistdiff & apt-canary

I ve always found the operation of apt software package repositories to be a mystery. There appears to be a lack of transparency into which people have access to important apt package repositories out there, how the automatic non-human update mechanism is implemented, and what changes are published. I m thinking of big distributions like Ubuntu and Debian, but also the free GNU/Linux distributions like Trisquel and PureOS that are derived from the more well-known distributions. As far as I can tell, anyone who has the OpenPGP private key trusted by a apt-based GNU/Linux distribution can sign a modified Release/InRelease file and if my machine somehow downloads that version of the release file, my machine could be made to download and install packages that the distribution didn t intend me to install. Further, it seems that anyone who has access to the main HTTP server, or any of its mirrors, or is anywhere on the network between them and my machine (when plaintext HTTP is used), can either stall security updates on my machine (on a per-IP basis), or use it to send my machine (again, on a per-IP basis to avoid detection) a modified Release/InRelease file if they had been able to obtain the private signing key for the archive. These are mighty powers that warrant overview. I ve always put off learning about the processes to protect the apt infrastructure, mentally filing it under so many people rely on this infrastructure that enough people are likely to have invested time reviewing and improving these processes . Simultaneous, I ve always followed the more free-software friendly Debian-derived distributions such as gNewSense and have run it on some machines. I ve never put them into serious production use, because the trust issues with their apt package repositories has been a big question mark for me. The enough people part of my rationale for deferring this is not convincing. Even the simple question of is someone updating the apt repository is not easy to understand on a running gNewSense system. At some point in time the gNewSense cron job to pull in security updates from Debian must have stopped working, and I wouldn t have had any good mechanism to notice that. Most likely it happened without any public announcement. I ve recently switched to Trisquel on production machines, and these questions has come back to haunt me. The situation is unsatisfying and I looked into what could be done to improve it. I could try to understand who are the key people involved in each project, and may even learn what hardware component is used, or what software is involved to update and sign apt repositories. Is the server running non-free software? Proprietary BIOS or NIC firmware? Are the GnuPG private keys on disk? Smartcard? TPM? YubiKey? HSM? Where is the server co-located, and who has access to it? I tried to do a bit of this, and discovered things like Trisquel having a DSA1024 key in its default apt trust store (although for fairness, it seems that apt by default does not trust such signatures). However, I m not certain understanding this more would scale to securing my machines against attacks on this infrastructure. Even people with the best intentions, and the state of the art hardware and software, will have problems. To increase my trust in Trisquel I set out to understand how it worked. To make it easier to sort out what the interesting parts of the Trisquel archive to audit further were, I created debdistdiff to produce human readable text output comparing one apt archive with another apt archive. There is a GitLab CI/CD cron job that runs this every day, producing output comparing Trisquel vs Ubuntu and PureOS vs Debian. Working with these output files has made me learn more about how the process works, and I even stumbled upon something that is likely a bug where Trisquel aramo was imported from Ubuntu jammy while it contained a couple of package (e.g., gcc-8, python3.9) that were removed for the final Ubuntu jammy release. After working on auditing the Trisquel archive manually that way, I realized that whatever I could tell from comparing Trisquel with Ubuntu, it would only be something based on a current snapshot of the archives. Tomorrow it may look completely different. What felt necessary was to audit the differences of the Trisquel archive continously. I was quite happy to have developed debdistdiff for one purpose (comparing two different archives like Trisquel and Ubuntu) and discovered that the tool could be used for another purpose (comparing the Trisquel archive at two different points in time). At this time I realized that I needed a log of all different apt archive metadata to be able to produce an audit log of the differences in time for the archive. I create manually curated git-repositories with the Release/InRelease and the Packages files for each architecture/component of the well-known distributions Trisquel, Ubuntu, Debian and PureOS. Eventually I wrote scripts to automate this, which are now published in the debdistget project. At this point, one of the early question about per-IP substitution of Release files were lingering in my mind. However with the tooling I now had available, coming up with a way to resolve this was simple! Merely have apt compute a SHA256 checksum of the just downloaded InRelease file, and see if my git repository had the same file. At this point I started reading the Apt source code, and now I had more doubts about the security of my systems than I ever had before. Oh boy how the name Apt has never before felt more Apt?! Oh well, we must leave some exercises for the students. Eventually I realized I wanted to touch as little of apt code basis as possible, and noticed the SigVerify::CopyAndVerify function called ExecGPGV which called apt-key verify which called GnuPG s gpgv. By setting Apt::Key::gpgvcommand I could get apt-key verify to call another tool than gpgv. See where I m going? I thought wrapping this up would now be trivial but for some reason the hash checksum I computed locally never matched what was on my server. I gave up and started working on other things instead. Today I came back to this idea, and started to debug exactly how the local files looked that I got from apt and how they differed from what I had in my git repositories, that came straight from the apt archives. Eventually I traced this back to SplitClearSignedFile which takes an InRelease file and splits it into two files, probably mimicking the (old?) way of distributing both Release and Release.gpg. So the clearsigned InRelease file is split into one cleartext file (similar to the Release file) and one OpenPGP signature file (similar to the Release.gpg file). But why didn t the cleartext variant of the InRelease file hash to the same value as the hash of the Release file? Sadly they differ by the final newline. Having solved this technicality, wrapping the pieces up was easy, and I came up with a project apt-canary that provides a script apt-canary-gpgv that verify the local apt release files against something I call a apt canary witness file stored at a URL somewhere. I m now running apt-canary on my Trisquel aramo laptop, a Trisquel nabia server, and Talos II ppc64el Debian machine. This means I have solved the per-IP substitution worries (or at least made them less likely to occur, having to send the same malicious release files to both GitLab and my system), and allow me to have an audit log of all release files that I actually use for installing and downloading packages. What do you think? There are clearly a lot of work and improvements to be made. This is a proof-of-concept implementation of an idea, but instead of refining it until perfection and delaying feedback, I wanted to publish this to get others to think about the problems and various ways to resolve them. Btw, I m going to be at FOSDEM 23 this weekend, helping to manage the Security Devroom. Catch me if you want to chat about this or other things. Happy Hacking!

Next.