Search Results: "Niels Thykier"

7 July 2024

Niels Thykier: Improving packaging file detection in Debian

Debian packaging consists of a directory (debian/) containing a number of "hard-coded" filenames such as debian/control, debian/changelog, debian/copyright. In addition to these, many packages will also use a number of optional files that are named via a pattern such as debian/ PACKAGE .install. At a high level the patterns looks deceptively simple. However, if you start working on trying to automatically classify files in debian/ (which could be helpful to tell the user they have a typo in the filename), you will quickly realize these patterns are not machine friendly at all for this purpose.
The patterns deconstructed To appreciate the problem fully, here is a primer on the pattern and all its issues. If you are already well-versed in these, you might want to skip the section. The most common patterns are debian/package.stem or debian/stem and usually the go to example is the install stem ( a concrete example being debian/debhelper.install). However, the full pattern consists of 4 parts where 3 of them are optional.
  • The package name followed by a period. Optional, but must be the first if present.
  • The name segment followed by a period. Optional, but must appear between the package name (if present) and the stem. If the package name is not present, then the name segment must be first.
  • The stem. Mandatory.
  • An architecture restriction prefixed by a period. Optional, must appear after the stem if present.
To visualize it with [foo] to mark optional parts, it looks like debian/[PACKAGE.][NAME.]STEM[.ARCH] Detecting whether a given file is in fact a packaging file now boils down to reverse engineering its name against this pattern. Again, so far, it might still look manageable. One major complication is that every part (except ARCH) can contain periods. So a trivial "split by period" is not going to cut it. As an example:
debian/g++-3.0.user.service
This example is deliberately crafted to be ambiguous and show this problem in its full glory. This file name can be in multiple ways:
  • Is the stem service or user.service? (both are known stems from dh_installsystemd and dh_installsystemduser respectively). In fact, it can be both at the same time with "clever" usage of --name=user passed to dh_installsystemd.
  • The g++-3.0 can be a package prefix or part of the name segment. Even if there is a g++-3.0 package in debian/control, then debhelper (until compat 15) will still happily match this file for the main package if you pass --name=g++-3.0 to the helper. Side bar: Woe is you if there is a g++-3 and a g++-3.0 package in debian/control, then we have multiple options for the package prefix! Though, I do not think that happens in practice.
Therefore, there are a lot of possible ways to split this filename that all matches the pattern but with vastly different meaning and consequences.
Making detection practical To make this detection practical, lets look at the first problems that we need to solve.
  1. We need the possible stems up front to have a chance at all. When multiple stems are an option, go for the longest match (that is, the one with most periods) since --name is rare and "code golfing" is even rarer.
  2. We can make the package prefix mandatory for files with the name segment. This way, the moment there is something before the stem, we know the package prefix will be part of it and can cut it. It does not solve the ambiguity if one package name is a prefix of another package name (from the same source), but it still a lot better. This made its way into debhelper compat 15 and now it is "just" a slow long way to a better future.
A simple solution to the first problem could be to have a static list of known stems. That will get you started but the debhelper eco-system strive on decentralization, so this feels like a mismatch. There is also a second problem with the static list. Namely, a given stem is only "valid" if the command in question is actually in use. Which means you now need to dumpster dive into the mess that is Turning-complete debhelper configuration file known as debian/rules to fully solve that. Thanks to the Turning-completeness, we will never get a perfect solution for a static analysis. Instead, it is time to back out and instead apply some simplifications. Here is a sample flow:
  1. Check whether the dh sequencer is used. If so, use some heuristics to figure out which addons are used.
  2. Delegate to dh_assistant to figure out which commands will be used and which debhelper config file stems it knows about. Here we need to know which sequences are in use from step one (if relevant). Combine this with any other sources for stems you have.
  3. Deconstruct all files in debian/ against the stems and known package names from debian/control. In theory, dumpster diving after --name options would be helpful here, but personally I skipped that part as I want to keep my debian/rules parsing to an absolute minimum.
With this logic, you can now:
  • Provide typo detection of the stem (debian/foo.intsall -> debian/foo.install) provided to have adequate handling of the corner cases (such as debian/*.conf not needing correction into debian/*.config)
  • Detect possible invalid package prefix (debian/foo.install without foo being a package). Note this has to be a weak warning unless the package is using debhelper compat 15 or you dumpster dived to validate that dh_install was not passed dh_install --name foo. Agreed, no one should do that, but they can and false positives are the worst kind of positives for a linting tool.
  • With some limitations, detect files used without the relevant command being active. As an example, the some integration modes of debputy removes dh_install, so a debian/foo.install would not be used.
  • Associate a given file with a given command to assist users with the documentation look up. Like debian/foo.user.service is related to dh_installsystemduser, so man dh_installsystemduser is a natural start for documentation.
I have added the logic for all these features in debputy though the documentation association is currently not in a user facing command. All the others are now diagnostics emitted by debputy in its editor support mode (debputy lsp server) or via debputy lint. In the editor mode, the diagnostics are currently associated with the package name in debian/control due to technical limitations of how the editor integration works. Some of these features will the latest version of debhelper (moving target at times). Check with debputy lsp features for the Extra dh support feature, which will be enabled if you got all you need. Note: The detection is currently (mostly) ignoring files with architecture restrictions. That might be lifted in the future. However, architecture restricted config files tend to be rare, so they were not a priority at this point. Additionally, debputy for technical reasons ignores stem typos with multiple matches. That sadly means that typos of debian/docs will often be unreported due to its proximity to debian/dirs and vice versa.
Diving a bit deeper on getting the stems To get the stems, debputy has 3 primary sources:
  1. Its own plugins can provide packager provided files. These are only relevant if the package is using debputy.
  2. It is als possible to provide a debputy plugin that identifies packaging files (either static or named ones). Though in practice, we probably do not want people to roll their own debputy plugin for this purpose, since the detection only works if the plugin is installed. I have used this mechanism to have debhelper provide a debhelper-documentation plugin to enrich the auto-detected data and we can assume most people interested in this feature would have debhelper installed.
  3. It asks dh_assistant list-guessed-dh-config-files for config files, which is covered below.
The dh_assistant command uses the same logic as dh to identify the active add-ons and loads them. From there, it scans all commands mentioned in the sequence for the PROMISE: DH NOOP WITHOUT ...-hint and a new INTROSPECTABLE: CONFIG-FILES ...-hint. When these hints reference a packaging file (as an example, via pkgfile(foo)) then dh_assistant records that as a known packaging file for that helper. Additionally, debhelper now also tracks commands that were removed from the sequence. Several of the dh_assistant subcommand now use this to enrich their (JSON) output with notes about these commands being known but not active.
The end result With all of this work, you now get:
$ apt satisfy 'dh-debputy (>= 0.1.43~), debhelper (>= 13.16~), python3-lsprotocol, python3-levenshtein'
# For demo purposes, pull two known repos (feel free to use your own packages here)
$ git clone https://salsa.debian.org/debian/debhelper.git -b debian/13.16
$ git clone https://salsa.debian.org/debian/debputy.git -b debian/0.1.43
$ cd debhelper
$ mv debian/debhelper.install debian/debhelper.intsall
$ debputy lint
warning: File: debian/debhelper.intsall:1:0:1:0: The file "debian/debhelper.intsall" is likely a typo of "debian/debhelper.install"
    File-level diagnostic
$ mv debian/debhelper.intsall debian/debhleper.install
$ debputy lint
warning: File: debian/debhleper.install:1:0:1:0: Possible typo in "debian/debhleper.install". Consider renaming the file to "debian/debhelper.debhleper.install" or "debian/debhelper.install" if it is intended for debhelper
    File-level diagnostic
$ cd ../debputy
$ touch debian/install
$ debputy lint --no-warn-about-check-manifest
warning: File: debian/install:1:0:1:0: The file debian/install is related to a command that is not active in the dh sequence with the current addons
    File-level diagnostic
As mentioned, you also get these diagnostics in the editor via the debputy lsp server feature. Here the diagnostics appear in debian/control over the package name for technical reasons. The editor side still needs a bit more work. Notably, changes to the filename is not triggered automatically and will first be caught on the next change to debian/control. Likewise, changes to debian/rules to add --with to dh might also have some limitations depending on the editor. Saving both files and then triggering an edit of debian/control seems to work reliable but ideally it should not be that involved. The debhelper side could also do with some work to remove the unnecessary support for the name segment with many file stems that do not need them and announce that to debputy. Anyhow, it is still a vast improvement over the status quo that was "Why is my file silently ignored!?".

2 July 2024

Colin Watson: Free software activity in June 2024

My Debian contributions this month were all sponsored by Freexian. You can support my work directly via Liberapay.

1 July 2024

Niels Thykier: Debian packaging with style black

When I started working on the language server for debputy, one of several reasons was about automatic applying a formatting style. Such that you would not have to remember to manually reformat the file. One of the problems with supporting automatic formatting is that no one agrees on the "one true style". To make this concrete, Johannes Schauer Marin Rodrigues did the numbers of which wrap-and-sort option that are most common in https://bugs.debian.org/895570#46. Unsurprising, we end up with 14-15 different styles with various degrees of popularity. To make matters worse, wrap-and-sort does not provide a way to declare "this package uses options -sat". So that begged the question, how would debputy know which style it should use when it was going to reformat file. After a couple of false-starts, Christian Hofstaedtler mentioned that we could just have a field in debian/control for supporting a "per-package" setting in responds to my concern about adding a new "per-package" config file. At first, I was not happy with it, because how would you specify all of these options in a field (in a decent manner)? But then I realized that one I do not want all these styles and that I could start simpler. The Python code formatter black is quite successful despite not having a lot of personalized style options. In fact, black makes a statement out of not allowing a lot of different styles. Combing that, the result was X-Style: black (to be added to the Source stanza of debian/control), which every possible reference to the black tool for how styling would work. Namely, you outsource the style management to the tool (debputy) and then start using your focus on something else than discussing styles. As with black, this packaging formatting style is going to be opinionated and it will evolve over time. At the starting point, it is similar to wrap-and-sort -sat for the deb822 files (debputy does not reformat other files at the moment). But as mentioned, it will likely evolve and possible diverge from wrap-and-sort over time. The choice of the starting point was based on the numbers posted by Johannes #895570. It was not my personal favorite but it seemed to have a majority and is also close to the one suggested by salsa pipeline maintainers. The delta being -kb which I had originally but removed in 0.1.34 at request of Otto Kek l inen after reviewing the numbers from Johannes one more time. To facilitate this new change, I uploaded debputy/0.1.30 (a while back) to Debian unstable with the following changes:
  • Support for the X-Style: black header.
  • When a style is defined, the debputy lsp server command will now automatically reformat deb822 files on save (if the editor supports it) or on explicit "reformat file" request from the editor (usually indirectly from the user).
  • New subcommand debputy reformat command that will reformat the files, when a style is defined.
  • A new pre-commit hook repo to run debputy lint and debputy reformat. These hooks are available from https://salsa.debian.org/debian/debputy-pre-commit-hooks version v0.1 and can be used with the pre-commit tool (from the package of same name).
The obvious omission is a salsa-pipeline feature for this. Otto has put that on to his personal todo list and I am looking forward to that.
Beyond black Another thing I dislike about our existing style tooling is that if you run wrap-and-sort without any arguments, you have a higher probability of "trashing" the style of the current package than getting the desired result. Part of this is because wrap-and-sort's defaults are out of sync with the usage (which is basically what https://bugs.debian.org/895570 is about). But I see another problem. The wrap-and-sort tool explicitly defined options to tweak the style but provided maintainers no way to record their preference in any machine readable way. The net result is that we have tons of diverging styles and that you (as a user of wrap-and-sort) have to manually tell wrap-and-sort which style you want every time you run the tool. In my opinion that is not playing to the strengths of neither human nor machine. Rather, it is playing to the weaknesses of the human if anything at all. But the salsa-CI pipeline people also ran into this issue and decided to work around this deficiency. To use wrap-and-sort in the salsa-CI pipeline, you have to set a variable to activate the job and another variable with the actual options you want. The salsa-CI pipeline is quite machine readable and wrap-and-sort is widely used. I had debputy reformat also check for the salsa-CI variables as a fallback. This fallback also works for the editor mode (debputy lsp server), so you might not even have to run debputy reformat. :) This was a deliberate trade-off. While I do not want all us to have all these options, I also want Debian packaging to be less painful and have fewer paper cuts. Having debputy go extra lengths to meet wrap-and-sort users where they are came out as the better solution for me. A nice side-effect of this trade-off is that debputy reformat now a good tool for drive-by contributors. You can safely run debputy reformat on any package and either it will apply the styling or it will back out and inform you that no obvious style was detected. In the latter case, you would have to fallback to manually deducing the style and applying it.
Differences to wrap-and-sort The debputy reformat has some limitations or known differences to wrap-and-sort. Notably, debputy reformat (nor debputy lsp server) will not invoke wrap-and-sort. Instead, debputy has its own reformatting engine that provides similar features. One reason for not running wrap-and-sort is that I want debputy reformat to match the style that debputy lsp server will give you. That way, you get consistent style across all debputy commands. Another reason is that it is important to me that reformatting is safe and does not change semantics. This leads to two regrettable known differences to the wrap-and-sort behavior due to safety in addition to one scope limitation in debputy:
  1. debputy will ignore requests to sort the stanzas when the "keep first" option is disabled (-b --no-keep-first). This combination is unsafe reformatting. I feel it was a mistake for wrap-and-sort to ever allow this but at least it is no longer the default (-b is now -bk by default). This will be less of a problem in debhelper-compat 15, since the concept of "main package" will disappear and all multi-binary source packages will be required to use debian/package.install rather than debian/install.
  2. debputy will not reorder the contents of debhelper packaging files such as debian/install. This is also an (theoretical) unsafe thing to do. While the average package will not experience issues with this, there are rare corner cases where the re-ordering can affect the end result. I happen to know this, because I ran into issues when trying to optimize dh_install in a way that assumed the order did not matter. Stuff broke and there is now special-case code in dh_install to back out of that optimization when that happens.
  3. debputy has a limited list of wrap-and-sort options it understands. Some options may cause debputy to back out and disable reformatting entirely with a remark that it cannot apply that style. If you run into a case of this, feel free to file a feature request to support it. I will not promise to support everything, but if it is safe and trivially doable with the engine already, then I probably will.
As stated, where debputy cannot implement the wrap-and-sort styles fully, then it will currently implement a subset that is safe if that can be identified or back out entirely of the formatting when it cannot. In all cases, debputy will not break the formatting if it is correct. It may just fail at correcting one aspect of the wrap-and-sort style if you happen to get it wrong. It is also important to remember that the prerequisite for debputy applying any wrap-and-sort style is that you have set the salsa-CI pipeline variables to trigger wrap-and-sort with the salsa-CI pipeline. So there is still a CI check before the merge that will run the wrap-and-sort in its full glory that provides the final safety net for you.
Just give me a style In conclusion, if you, like me, are more interested in getting a consistent style rather than discussing what that style should be, now you can get that with X-Style: black. You can also have your custom wrap-and-sort style be picked up automatically for drive-by contributors.
$ apt satisfy 'dh-debputy (>= 0.1.30), python3-lsprotocol'
# Add  X-Style: black  to  debian/control  for "just give me a style"
#
# OR, if there is a specific  wrap-and-sort  style for you then set
# SALSA_CI_DISABLE_WRAP_AND_SORT=no plus set relevant options in
# SALSA_CI_WRAP_AND_SORT_ARGS in debian/salsa-ci.yml (or .gitlab-ci.yml)
$ debputy reformat
It is sadly not yet in the salsa-ci pipeline. Otto is looking into that and hopefully we will have it soon. :) And if you find yourself often doing archive-wide contributions and is tired of having to reverse engineer package formatting styles, consider using debputy reformat or debputy lsp server. If you use debputy in this way, please consider providing feedback on what would help you.

24 March 2024

Niels Thykier: debputy v0.1.21

Earlier today, I have just released debputy version 0.1.21 to Debian unstable. In the blog post, I will highlight some of the new features.
Package boilerplate reduction with automatic relationship substvar Last month, I started a discussion on rethinking how we do relationship substvars such as the $ misc:Depends . These generally ends up being boilerplate runes in the form of Depends: $ misc:Depends , $ shlibs:Depends where you as the packager has to remember exactly which runes apply to your package. My proposed solution was to automatically apply these substvars and this feature has now been implemented in debputy. It is also combined with the feature where essential packages should use Pre-Depends by default for dpkg-shlibdeps related dependencies. I am quite excited about this feature, because I noticed with libcleri that we are now down to 3-5 fields for defining a simple library package. Especially since most C library packages are trivial enough that debputy can auto-derive them to be Multi-Arch: same. As an example, the libcleric1 package is down to 3 fields (Package, Architecture, Description) with Section and Priority being inherited from the Source stanza. I have submitted a MR to show case the boilerplate reduction at https://salsa.debian.org/siridb-team/libcleri/-/merge_requests/3. The removal of libcleric1 (= $ binary:Version ) in that MR relies on another existing feature where debputy can auto-derive a dependency between an arch:any -dev package and the library package based on the .so symlink for the shared library. The arch:any restriction comes from the fact that arch:all and arch:any packages are not built together, so debputy cannot reliably see across the package boundaries during the build (and therefore refuses to do so at all). Packages that have already migrated to debputy can use debputy migrate-from-dh to detect any unnecessary relationship substitution variables in case you want to clean up. The removal of Multi-Arch: same and intra-source dependencies must be done manually and so only be done so when you have validated that it is safe and sane to do. I was willing to do it for the show-case MR, but I am less confident that would bother with these for existing packages in general. Note: I summarized the discussion of the automatic relationship substvar feature earlier this month in https://lists.debian.org/debian-devel/2024/03/msg00030.html for those who want more details. PS: The automatic relationship substvars feature will also appear in debhelper as a part of compat 14.
Language Server (LSP) and Linting I have long been frustrated by our poor editor support for Debian packaging files. To this end, I started working on a Language Server (LSP) feature in debputy that would cover some of our standard Debian packaging files. This release includes the first version of said language server, which covers the following files:
  • debian/control
  • debian/copyright (the machine readable variant)
  • debian/changelog (mostly just spelling)
  • debian/rules
  • debian/debputy.manifest (syntax checks only; use debputy check-manifest for the full validation for now)
Most of the effort has been spent on the Deb822 based files such as debian/control, which comes with diagnostics, quickfixes, spellchecking (but only for relevant fields!), and completion suggestions. Since not everyone has a LSP capable editor and because sometimes you just want diagnostics without having to open each file in an editor, there is also a batch version for the diagnostics via debputy lint. Please see debputy(1) for how debputy lint compares with lintian if you are curious about which tool to use at what time. To help you getting started, there is a now debputy lsp editor-config command that can provide you with the relevant editor config glue. At the moment, emacs (via eglot) and vim with vim-youcompleteme are supported. For those that followed the previous blog posts on writing the language server, I would like to point out that the command line for running the language server has changed to debputy lsp server and you no longer have to tell which format it is. I have decided to make the language server a "polyglot" server for now, which I will hopefully not regret... Time will tell. :) Anyhow, to get started, you will want:
$ apt satisfy 'dh-debputy (>= 0.1.21~), python3-pygls'
# Optionally, for spellchecking
$ apt install python3-hunspell hunspell-en-us
# For emacs integration
$ apt install elpa-dpkg-dev-el markdown-mode-el
# For vim integration via vim-youcompleteme
$ apt install vim-youcompleteme
Specifically for emacs, I also learned two things after the upload. First, you can auto-activate eglot via eglot-ensure. This badly feature interacts with imenu on debian/changelog for reasons I do not understand (causing a several second start up delay until something times out), but it works fine for the other formats. Oddly enough, opening a changelog file and then activating eglot does not trigger this issue at all. In the next version, editor config for emacs will auto-activate eglot on all files except debian/changelog. The second thing is that if you install elpa-markdown-mode, emacs will accept and process markdown in the hover documentation provided by the language server. Accordingly, the editor config for emacs will also mention this package from the next version on. Finally, on a related note, Jelmer and I have been looking at moving some of this logic into a new package called debpkg-metadata. The point being to support easier reuse of linting and LSP related metadata - like pulling a list of known fields for debian/control or sharing logic between lintian-brush and debputy.
Minimal integration mode for Rules-Requires-Root One of the original motivators for starting debputy was to be able to get rid of fakeroot in our build process. While this is possible, debputy currently does not support most of the complex packaging features such as maintscripts and debconf. Unfortunately, the kind of packages that need fakeroot for static ownership tend to also require very complex packaging features. To bridge this gap, the new version of debputy supports a very minimal integration with dh via the dh-sequence-zz-debputy-rrr. This integration mode keeps the vast majority of debhelper sequence in place meaning most dh add-ons will continue to work with dh-sequence-zz-debputy-rrr. The sequence only replaces the following commands:
  • dh_fixperms
  • dh_gencontrol
  • dh_md5sums
  • dh_builddeb
The installations feature of the manifest will be disabled in this integration mode to avoid feature interactions with debhelper tools that expect debian/<pkg> to contain the materialized package. On a related note, the debputy migrate-from-dh command now supports a --migration-target option, so you can choose the desired level of integration without doing code changes. The command will attempt to auto-detect the desired integration from existing package features such as a build-dependency on a relevant dh sequence, so you do not have to remember this new option every time once the migration has started. :)

13 March 2024

Freexian Collaborators: Debian Contributions: Upcoming Improvements to Salsa CI, /usr-move, packaging simplemonitor, and more! (by Utkarsh Gupta)

Contributing to Debian is part of Freexian s mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

/usr-move, by Helmut Grohne Much of the work was spent on handling interaction with time time64 transition and sending patches for mitigating fallout. The set of packages relevant to debootstrap is mostly converted and the patches for glibc and base-files have been refined due to feedback from the upload to Ubuntu noble. Beyond this, he sent patches for all remaining packages that cannot move their files with dh-sequence-movetousr and packages using dpkg-divert in ways that dumat would not recognize.

Upcoming improvements to Salsa CI, by Santiago Ruano Rinc n Last month, Santiago Ruano Rinc n started the work on integrating sbuild into the Salsa CI pipeline. Initially, Santiago used sbuild with the unshare chroot mode. However, after discussion with josch, jochensp and helmut (thanks to them!), it turns out that the unshare mode is not the most suitable for the pipeline, since the level of isolation it provides is not needed, and some test suites would fail (eg: krb5). Additionally, one of the requirements of the build job is the use of ccache, since it is needed by some C/C++ large projects to reduce the compilation time. In the preliminary work with unshare last month, it was not possible to make ccache to work. Finally, Santiago changed the chroot mode, and now has a couple of POC (cf: 1 and 2) that rely on the schroot and sudo, respectively. And the good news is that ccache is successfully used by sbuild with schroot! The image here comes from an example of building grep. At the end of the build, ccache -s shows the statistics of the cache that it used, and so a little more than half of the calls of that job were cacheable. The most important pieces are in place to finish the integration of sbuild into the pipeline. Other than that, Santiago also reviewed the very useful merge request !346, made by IOhannes zm lnig to autodetect the release from debian/changelog. As agreed with IOhannes, Santiago is preparing a merge request to include the release autodetection use case in the very own Salsa CI s CI.

Packaging simplemonitor, by Carles Pina i Estany Carles started using simplemonitor in 2017, opened a WNPP bug in 2022 and started packaging simplemonitor dependencies in October 2023. After packaging five direct and indirect dependencies, Carles finally uploaded simplemonitor to unstable in February. During the packaging of simplemonitor, Carles reported a few issues to upstream. Some of these were to make the simplemonitor package build and run tests reproducibly. A reproducibility issue was reprotest overriding the timezone, which broke simplemonitor s tests. There have been discussions on resolving this upstream in simplemonitor and in reprotest, too. Carles also started upgrading or improving some of simplemonitor s dependencies.

Miscellaneous contributions
  • Stefano Rivera spent some time doing admin on debian.social infrastructure. Including dealing with a spike of abuse on the Jitsi server.
  • Stefano started to prepare a new release of dh-python, including cleaning out a lot of old Python 2.x related code. Thanks to Niels Thykier (outside Freexian) for spear-heading this work.
  • DebConf 24 planning is beginning. Stefano discussed venues and finances with the local team and remotely supported a site-visit by Nattie (outside Freexian).
  • Also in the DebConf 24 context, Santiago took part in discussions and preparations related to the Content Team.
  • A JIT bug was reported against pypy3 in Debian Bookworm. Stefano bisected the upstream history to find the patch (it was already resolved upstream) and released an update to pypy3 in bookworm.
  • Enrico participated in /usr-merge discussions with Helmut.
  • Colin Watson backported a python-channels-redis fix to bookworm, rediscovered while working on debusine.
  • Colin dug into a cluster of celery build failures and tracked the hardest bit down to a Python 3.12 regression, now fixed in unstable. celery should be back in testing once the 64-bit time_t migration is out of the way.
  • Thorsten Alteholz uploaded a new upstream version of cpdb-libs. Unfortunately upstream changed the naming of their release tags, so updating the watch file was a bit demanding. Anyway this version 2.0 is a huge step towards introduction of the new Common Print Dialog Backends.
  • Helmut send patches for 48 cross build failures.
  • Helmut changed debvm to use mkfs.ext4 instead of genext2fs.
  • Helmut sent a debci MR for improving collector robustness.
  • In preparation for DebConf 25, Santiago worked on the Brest Bid.

24 February 2024

Niels Thykier: Language Server for Debian: Spellchecking

This is my third update on writing a language server for Debian packaging files, which aims at providing a better developer experience for Debian packagers. Lets go over what have done since the last report.
Semantic token support I have added support for what the Language Server Protocol (LSP) call semantic tokens. These are used to provide the editor insights into tokens of interest for users. Allegedly, this is what editors would use for syntax highlighting as well. Unfortunately, eglot (emacs) does not support semantic tokens, so I was not able to test this. There is a 3-year old PR for supporting with the last update being ~3 month basically saying "Please sign the Copyright Assignment". I pinged the GitHub issue in the hopes it will get unstuck. For good measure, I also checked if I could try it via neovim. Before installing, I read the neovim docs, which helpfully listed the features supported. Sadly, I did not spot semantic tokens among those and parked from there. That was a bit of a bummer, but I left the feature in for now. If you have an LSP capable editor that supports semantic tokens, let me know how it works for you! :)
Spellchecking Finally, I implemented something Otto was missing! :) This stared with Paul Wise reminding me that there were Python binding for the hunspell spellchecker. This enabled me to get started with a quick prototype that spellchecked the Description fields in debian/control. I also added spellchecking of comments while I was add it. The spellchecker runs with the standard en_US dictionary from hunspell-en-us, which does not have a lot of technical terms in it. Much less any of the Debian specific slang. I spend considerable time providing a "built-in" wordlist for technical and Debian specific slang to overcome this. I also made a "wordlist" for known Debian people that the spellchecker did not recognise. Said wordlist is fairly short as a proof of concept, and I fully expect it to be community maintained if the language server becomes a success. My second problem was performance. As I had suspected that spellchecking was not the fastest thing in the world. Therefore, I added a very small language server for the debian/changelog, which only supports spellchecking the textual part. Even for a small changelog of a 1000 lines, the spellchecking takes about 5 seconds, which confirmed my suspicion. With every change you do, the existing diagnostics hangs around for 5 seconds before being updated. Notably, in emacs, it seems that diagnostics gets translated into an absolute character offset, so all diagnostics after the change gets misplaced for every character you type. Now, there is little I could do to speed up hunspell. But I can, as always, cheat. The way diagnostics work in the LSP is that the server listens to a set of notifications like "document opened" or "document changed". In a response to that, the LSP can start its diagnostics scanning of the document and eventually publish all the diagnostics to the editor. The spec is quite clear that the server owns the diagnostics and the diagnostics are sent as a "notification" (that is, fire-and-forgot). Accordingly, there is nothing that prevents the server from publishing diagnostics multiple times for a single trigger. The only requirement is that the server publishes the accumulated diagnostics in every publish (that is, no delta updating). Leveraging this, I had the language server for debian/changelog scan the document and publish once for approximately every 25 typos (diagnostics) spotted. This means you quickly get your first result and that clears the obsolete diagnostics. Thereafter, you get frequent updates to the remainder of the document if you do not perform any further changes. That is, up to a predefined max of typos, so we do not overload the client for longer changelogs. If you do any changes, it resets and starts over. The only bit missing was dealing with concurrency. By default, a pygls language server is single threaded. It is not great if the language server hangs for 5 seconds everytime you type anything. Fortunately, pygls has builtin support for asyncio and threaded handlers. For now, I did an async handler that await after each line and setup some manual detection to stop an obsolete diagnostics run. This means the server will fairly quickly abandon an obsolete run. Also, as a side-effect of working on the spellchecking, I fixed multiple typos in the changelog of debputy. :)
Follow up on the "What next?" from my previous update In my previous update, I mentioned I had to finish up my python-debian changes to support getting the location of a token in a deb822 file. That was done, the MR is now filed, and is pending review. Hopefully, it will be merged and uploaded soon. :) I also submitted my proposal for a different way of handling relationship substvars to debian-devel. So far, it seems to have received only positive feedback. I hope it stays that way and we will have this feature soon. Guillem proposed to move some of this into dpkg, which might delay my plans a bit. However, it might be for the better in the long run, so I will wait a bit to see what happens on that front. :) As noted above, I managed to add debian/changelog as a support format for the language server. Even if it only does spellchecking and trimming of trailing newlines on save, it technically is a new format and therefore cross that item off my list. :D Unfortunately, I did not manage to write a linter variant that does not involve using an LSP-capable editor. So that is still pending. Instead, I submitted an MR against elpa-dpkg-dev-el to have it recognize all the fields that the debian/control LSP knows about at this time to offset the lack of semantic token support in eglot.
From here... My sprinting on this topic will soon come to an end, so I have to a bit more careful now with what tasks I open! I think I will narrow my focus to providing a batch linting interface. Ideally, with an auto-fix for some of the more mechanical issues, where this is little doubt about the answer. Additionally, I think the spellchecking will need a bit more maturing. My current code still trips on naming patterns that are "clearly" verbatim or code references like things written in CamelCase or SCREAMING_SNAKE_CASE. That gets annoying really quickly. It also trips on a lot of commands like dpkg-gencontrol, but that is harder to fix since it could have been a real word. I think those will have to be fixed people using quotes around the commands. Maybe the most popular ones will end up in the wordlist. Beyond that, I will play it by ear if I have any time left. :)

21 February 2024

Niels Thykier: Expanding on the Language Server (LSP) support for debian/control

I have spent some more time on improving my language server for debian/control. Today, I managed to provide the following features:
  • The X- style prefixes for field names are now understood and handled. This means the language server now considers XC-Package-Type the same as Package-Type.

  • More diagnostics:

    • Fields without values now trigger an error marker
    • Duplicated fields now trigger an error marker
    • Fields used in the wrong paragraph now trigger an error marker
    • Typos in field names or values now trigger a warning marker. For field names, X- style prefixes are stripped before typo detection is done.
    • The value of the Section field is now validated against a dataset of known sections and trigger a warning marker if not known.
  • The "on-save trim end of line whitespace" now works. I had a logic bug in the server side code that made it submit "no change" edits to the editor.

  • The language server now provides "hover" documentation for field names. There is a small screenshot of this below. Sadly, emacs does not support markdown or, if it does, it does not announce the support for markdown. For now, all the documentation is always in markdown format and the language server will tag it as either markdown or plaintext depending on the announced support.

  • The language server now provides quick fixes for some of the more trivial problems such as deprecated fields or typos of fields and values.

  • Added more known fields including the XS-Autobuild field for non-free packages along with a link to the relevant devref section in its hover doc.

This covers basically all my known omissions from last update except spellchecking of the Description field. An image of emacs showing documentation for the Provides field from the language server.
Spellchecking Personally, I feel spellchecking would be a very welcome addition to the current feature set. However, reviewing my options, it seems that most of the spellchecking python libraries out there are not packaged for Debian, or at least not other the name I assumed they would be. The alternative is to pipe the spellchecking to another program like aspell list. I did not test this fully, but aspell list does seem to do some input buffering that I cannot easily default (at least not in the shell). Though, either way, the logic for this will not be trivial and aspell list does not seem to include the corrections either. So best case, you would get typo markers but no suggestions for what you should have typed. Not ideal. Additionally, I am also concerned with the performance for this feature. For d/control, it will be a trivial matter in practice. However, I would be reusing this for d/changelog which is 99% free text with plenty of room for typos. For a regular linter, some slowness is acceptable as it is basically a batch tool. However, for a language server, this potentially translates into latency for your edits and that gets annoying. While it is definitely on my long term todo list, I am a bit afraid that it can easily become a time sink. Admittedly, this does annoy me, because I wanted to cross off at least one of Otto's requested features soon.
On wrap-and-sort support The other obvious request from Otto would be to automate wrap-and-sort formatting. Here, the problem is that "we" in Debian do not agree on the one true formatting of debian/control. In fact, I am fairly certain we do not even agree on whether we should all use wrap-and-sort. This implies we need a style configuration. However, if we have a style configuration per person, then you get style "ping-pong" for packages where the co-maintainers do not all have the same style configuration. Additionally, it is very likely that you are a member of multiple packaging teams or groups that all have their own unique style. Ergo, only having a personal config file is doomed to fail. The only "sane" option here that I can think of is to have or support "per package" style configuration. Something that would be committed to git, so the tooling would automatically pick up the configuration. Obviously, that is not fun for large packaging teams where you have to maintain one file per package if you want a consistent style across all packages. But it beats "style ping-pong" any day of the week. Note that I am perfectly open to having a personal configuration file as a fallback for when the "per package" configuration file is absent. The second problem is the question of which format to use and what to name this file. Since file formats and naming has never been controversial at all, this will obviously be the easy part of this problem. But the file should be parsable by both wrap-and-sort and the language server, so you get the same result regardless of which tool you use. If we do not ensure this, then we still have the style ping-pong problem as people use different tools. This also seems like time sink with no end. So, what next then...?
What next? On the language server front, I will have a look at its support for providing semantic hints to the editors that might be used for syntax highlighting. While I think most common Debian editors have built syntax highlighting already, I would like this language server to stand on its own. I would like us to be in a situation where we do not have implement yet another editor extension for Debian packaging files. At least not for editors that support the LSP spec. On a different front, I have an idea for how we go about relationship related substvars. It is not directly related to this language server, except I got triggered by the language server "missing" a diagnostic for reminding people to add the magic Depends: $ misc:Depends [, $ shlibs:Depends ] boilerplate. The magic boilerplate that you have to write even though we really should just fix this at a tooling level instead. Energy permitting, I will formulate a proposal for that and send it to debian-devel. Beyond that, I think I might start adding support for another file. I also need to wrap up my python-debian branch, so I can get the position support into the Debian soon, which would remove one papercut for using this language server. Finally, it might be interesting to see if I can extract a "batch-linter" version of the diagnostics and related quickfix features. If nothing else, the "linter" variant would enable many of you to get a "mini-Lintian" without having to do a package build first.

20 February 2024

Niels Thykier: Language Server (LSP) support for debian/control

About a month ago, Otto Kek l inen asked for editor extensions for debian related files on the debian-devel mailing list. In that thread, I concluded that what we were missing was a "Language Server" (LSP) for our packaging files. Last week, I started a prototype for such a LSP for the debian/control file as a starting point based on the pygls library. The initial prototype worked and I could do very basic diagnostics plus completion suggestion for field names.
Current features I got 4 basic features implemented, though I have only been able to test two of them in emacs.
  • Diagnostics or linting of basic issues.
  • Completion suggestions for all known field names that I could think of and values for some fields.
  • Folding ranges (untested). This feature enables the editor to "fold" multiple lines. It is often used with multi-line comments and that is the feature currently supported.
  • On save, trim trailing whitespace at the end of lines (untested). Might not be registered correctly on the server end.
Despite its very limited feature set, I feel editing debian/control in emacs is now a much more pleasant experience. Coming back to the features that Otto requested, the above covers a grand total of zero. Sorry, Otto. It is not you, it is me.
Completion suggestions For completion, all known fields are completed. Place the cursor at the start of the line or in a partially written out field name and trigger the completion in your editor. In my case, I can type R-R-R and trigger the completion and the editor will automatically replace it with Rules-Requires-Root as the only applicable match. Your milage may vary since I delegate most of the filtering to the editor, meaning the editor has the final say about whether your input matches anything. The only filtering done on the server side is that the server prunes out fields already used in the paragraph, so you are not presented with the option to repeat an already used field, which would be an error. Admittedly, not an error the language server detects at the moment, but other tools will. When completing field, if the field only has one non-default value such as Essential which can be either no (the default, but you should not use it) or yes, then the completion suggestion will complete the field along with its value. This is mostly only applicable for "yes/no" fields such as Essential and Protected. But it does also trigger for Package-Type at the moment. As for completing values, here the language server can complete the value for simple fields such as "yes/no" fields, Multi-Arch, Package-Type and Priority. I intend to add support for Section as well - maybe also Architecture.
Diagnostics On the diagnostic front, I have added multiple diagnostics:
  • An error marker for syntax errors.
  • An error marker for missing a mandatory field like Package or Architecture. This also includes Standards-Version, which is admittedly mandatory by policy rather than tooling falling part.
  • An error marker for adding Multi-Arch: same to an Architecture: all package.
  • Error marker for providing an unknown value to a field with a set of known values. As an example, writing foo in Multi-Arch would trigger this one.
  • Warning marker for using deprecated fields such as DM-Upload-Allowed, or when setting a field to its default value for fields like Essential. The latter rule only applies to selected fields and notably Multi-Arch: no does not trigger a warning.
  • Info level marker if a field like Priority duplicates the value of the Source paragraph.
Notable omission at this time:
  • No errors are raised if a field does not have a value.
  • No errors are raised if a field is duplicated inside a paragraph.
  • No errors are used if a field is used in the wrong paragraph.
  • No spellchecking of the Description field.
  • No understanding that Foo and X[CBS]-Foo are related. As an example, XC-Package-Type is completely ignored despite being the old name for Package-Type.
  • Quick fixes to solve these problems... :)
Trying it out If you want to try, it is sadly a bit more involved due to things not being uploaded or merged yet. Also, be advised that I will regularly rebase my git branches as I revise the code. The setup:
  • Build and install the deb of the main branch of pygls from https://salsa.debian.org/debian/pygls The package is in NEW and hopefully this step will soon just be a regular apt install.
  • Build and install the deb of the rts-locatable branch of my python-debian fork from https://salsa.debian.org/nthykier/python-debian There is a draft MR of it as well on the main repo.
  • Build and install the deb of the lsp-support branch of debputy from https://salsa.debian.org/debian/debputy
  • Configure your editor to run debputy lsp debian/control as the language server for debian/control. This is depends on your editor. I figured out how to do it for emacs (see below). I also found a guide for neovim at https://neovim.io/doc/user/lsp. Note that debputy can be run from any directory here. The debian/control is a reference to the file format and not a concrete file in this case.
Obviously, the setup should get easier over time. The first three bullet points should eventually get resolved by merges and upload meaning you end up with an apt install command instead of them. For the editor part, I would obviously love it if we can add snippets for editors to make the automatically pick up the language server when the relevant file is installed.
Using the debputy LSP in emacs The guide I found so far relies on eglot. The guide below assumes you have the elpa-dpkg-dev-el package installed for the debian-control-mode. Though it should be a trivially matter to replace debian-control-mode with a different mode if you use a different mode for your debian/control file. In your emacs init file (such as ~/.emacs or ~/.emacs.d/init.el), you add the follow blob.
(with-eval-after-load 'eglot
    (add-to-list 'eglot-server-programs
        '(debian-control-mode . ("debputy" "lsp" "debian/control"))))
Once you open the debian/control file in emacs, you can type M-x eglot to activate the language server. Not sure why that manual step is needed and if someone knows how to automate it such that eglot activates automatically on opening debian/control, please let me know. For testing completions, I often have to manually activate them (with C-M-i or M-x complete-symbol). Though, it is a bit unclear to me whether this is an emacs setting that I have not toggled or something I need to do on the language server side.
From here As next steps, I will probably look into fixing some of the "known missing" items under diagnostics. The quick fix would be a considerable improvement to assisting users. In the not so distant future, I will probably start to look at supporting other files such as debian/changelog or look into supporting configuration, so I can cover formatting features like wrap-and-sort. I am also very much open to how we can provide integrations for this feature into editors by default. I will probably create a separate binary package for specifically this feature that pulls all relevant dependencies that would be able to provide editor integrations as well.

Niels Thykier: Language Server (LSP) support for debian/control

Work done:
  • [X] No errors are raised if a field does not have a value.
  • [X] No errors are raised if a field is duplicated inside a paragraph.
  • [X] No errors are used if a field is used in the wrong paragraph.
  • [ ] No spellchecking of the Description field.
  • [X] No understanding that Foo and X[CBS]-Foo are related. As an example, XC-Package-Type is completely ignored despite being the old name for Package-Type.
  • [X] Fixed the on-save trim end of line whitespace. Bug in the server end.
  • [X] Hover text for field names

28 January 2024

Niels Thykier: Annotating the Debian packaging directory

In my previous blog post Providing online reference documentation for debputy, I made a point about how debhelper documentation was suboptimal on account of being static rather than online. The thing is that debhelper is not alone in this problem space, even if it is a major contributor to the number of packaging files you have to to know about. If we look at the "competition" here such as Fedora and Arch Linux, they tend to only have one packaging file. While most Debian people will tell you a long list of cons about having one packaging file (such a Fedora's spec file being 3+ domain specific languages "mashed" into one file), one major advantage is that there is only "the one packaging file". You only need to remember where to find the documentation for one file, which is great when you are running on wetware with limited storage capacity. Which means as a newbie, you can dedicate less mental resources to tracking multiple files and how they interact and more effort understanding the "one file" at hand. I started by asking myself how can we in Debian make the packaging stack more accessible to newcomers? Spoiler alert, I dug myself into rabbit hole and ended up somewhere else than where I thought I was going. I started by wanting to scan the debian directory and annotate all files that I could with documentation links. The logic was that if debputy could do that for you, then you could spend more mental effort elsewhere. So I combined debputy's packager provided files detection with a static list of files and I quickly had a good starting point for debputy-based packages.
Adding (non-static) dpkg and debhelper files to the mix Now, I could have closed the topic here and said "Look, I did debputy files plus couple of super common files". But I decided to take it a bit further. I added support for handling some dpkg files like packager provided files (such as debian/substvars and debian/symbols). But even then, we all know that debhelper is the big hurdle and a major part of the omission... In another previous blog post (A new Debian package helper: debputy), I made a point about how debputy could list all auxiliary files while debhelper could not. This was exactly the kind of feature that I would need for this feature, if this feature was to cover debhelper. Now, I also remarked in that blog post that I was not willing to maintain such a list. Also, I may have ranted about static documentation being unhelpful for debhelper as it excludes third-party provided tooling. Fortunately, a recent update to dh_assistant had provided some basic plumbing for loading dh sequences. This meant that getting a list of all relevant commands for a source package was a lot easier than it used to be. Once you have a list of commands, it would be possible to check all of them for dh's NOOP PROMISE hints. In these hints, a command can assert it does nothing if a given pkgfile is not present. This lead to the new dh_assistant list-guessed-dh-config-files command that will list all declared pkgfiles and which helpers listed them. With this combined feature set in place, debputy could call dh_assistant to get a list of pkgfiles, pretend they were packager provided files and annotate those along with manpage for the relevant debhelper command. The exciting thing about letting debpputy resolve the pkgfiles is that debputy will resolve "named" files automatically (debhelper tools will only do so when --name is passed), so it is much more likely to detect named pkgfiles correctly too. Side note: I am going to ignore the elephant in the room for now, which is dh_installsystemd and its package@.service files and the wide-spread use of debian/foo.service where there is no package called foo. For the latter case, the "proper" name would be debian/pkg.foo.service. With the new dh_assistant feature done and added to debputy, debputy could now detect the ubiquitous debian/install file. Excellent. But less great was that the very common debian/docs file was not. Turns out that dh_installdocs cannot be skipped by dh, so it cannot have NOOP PROMISE hints. Meh... Well, dh_assistant could learn about a new INTROSPECTABLE marker in addition to the NOOP PROMISE and then I could sprinkle that into a few commands. Indeed that worked and meant that debian/postinst (etc.) are now also detectable. At this point, debputy would be able to identify a wide range of debhelper related configuration files in debian/ and at least associate each of them with one or more commands. Nice, surely, this would be a good place to stop, right...?
Adding more metadata to the files The debhelper detected files only had a command name and manpage URI to that command. It would be nice if we could contextualize this a bit more. Like is this file installed into the package as is like debian/pam or is it a file list to be processed like debian/install. To make this distinction, I could add the most common debhelper file types to my static list and then merge the result together. Except, I do not want to maintain a full list in debputy. Fortunately, debputy has a quite extensible plugin infrastructure, so added a new plugin feature to provide this kind of detail and now I can outsource the problem! I split my definitions into two and placed the generic ones in the debputy-documentation plugin and moved the debhelper related ones to debhelper-documentation. Additionally, third-party dh addons could provide their own debputy plugin to add context to their configuration files. So, this gave birth file categories and configuration features, which described each file on different fronts. As an example, debian/gbp.conf could be tagged as a maint-config to signal that it is not directly related to the package build but more of a tool or style preference file. On the other hand, debian/install and debian/debputy.manifest would both be tagged as a pkg-helper-config. Files like debian/pam were tagged as ppf-file for packager provided file and so on. I mentioned configuration features above and those were added because, I have had a beef with debhelper's "standard" configuration file format as read by filearray and filedoublearray. They are often considered simple to understand, but it is hard to know how a tool will actually read the file. As an example, consider the following:
  • Will the debhelper use filearray, filedoublearray or none of them to read the file? This topic has about 2 bits of entropy.
  • Will the config file be executed if it is marked executable assuming you are using the right compat level? If it is executable, does dh-exec allow renaming for this file? This topic adds 1 or 2 bit of entropy depending on the context.
  • Will the config file be subject to glob expansions? This topic sounds like a boolean but is a complicated mess. The globs can be handled either by debhelper as it parses the file for you. In this case, the globs are applied to every token. However, this is not what dh_install does. Here the last token on each line is supposed to be a directory and therefore not subject to globs. Therefore, dh_install does the globbing itself afterwards but only on part of the tokens. So that is about 2 bits of entropy more. Actually, it gets worse...
    • If the file is executed, debhelper will refuse to expand globs in the output of the command, which was a deliberate design choice by the original debhelper maintainer took when he introduced the feature in debhelper/8.9.12. Except, dh_install feature interacts with the design choice and does enable glob expansion in the tool output, because it does so manually after its filedoublearray call.
So these "simple" files have way too many combinations of how they can be interpreted. I figured it would be helpful if debputy could highlight these difference, so I added support for those as well. Accordingly, debian/install is tagged with multiple tags including dh-executable-config and dh-glob-after-execute. Then, I added a datatable of these tags, so it would be easy for people to look up what they meant. Ok, this seems like a closed deal, right...?
Context, context, context However, the dh-executable-config tag among other are only applicable in compat 9 or later. It does not seem newbie friendly if you are told that this feature exist, but then have to read in the extended description that that it actually does not apply to your package. This problem seems fixable. Thanks to dh_assistant, it is easy to figure out which compat level the package is using. Then tweak some metadata to enable per compat level rules. With that tags like dh-executable-config only appears for packages using compat 9 or later. Also, debputy should be able to tell you where packager provided files like debian/pam are installed. We already have the logic for packager provided files that debputy supports and I am already using debputy engine for detecting the files. If only the plugin provided metadata gave me the install pattern, debputy would be able tell you where this file goes in the package. Indeed, a bit of tweaking later and setting install-pattern to usr/lib/pam.d/ name , debputy presented me with the correct install-path with the package name placing the name placeholder. Now, I have been using debian/pam as an example, because debian/pam is installed into usr/lib/pam.d in compat 14. But in earlier compat levels, it was installed into etc/pam.d. Well, I already had an infrastructure for doing compat file tags. Off we go to add install-pattern to the complat level infrastructure and now changing the compat level would change the path. Great. (Bug warning: The value is off-by-one in the current version of debhelper. This is fixed in git) Also, while we are in this install-pattern business, a number of debhelper config files causes files to be installed into a fixed directory. Like debian/docs which causes file to be installed into /usr/share/docs/ package . Surely, we can expand that as well and provide that bit of context too... and done. (Bug warning: The code currently does not account for the main documentation package context) It is rather common pattern for people to do debian/foo.in files, because they want to custom generation of debian/foo. Which means if you have debian/foo you get "Oh, let me tell you about debian/foo ". Then you rename it to debian/foo.in and the result is "debian/foo.in is a total mystery to me!". That is suboptimal, so lets detect those as well as if they were the original file but add a tag saying that they are a generate template and which file we suspect it generates. Finally, if you use debputy, almost all of the standard debhelper commands are removed from the sequence, since debputy replaces them. It would be weird if these commands still contributed configuration files when they are not actually going to be invoked. This mostly happened naturally due to the way the underlying dh_assistant command works. However, any file mentioned by the debhelper-documentation plugin would still appear unfortunately. So off I went to filter the list of known configuration files against which dh_ commands that dh_assistant thought would be used for this package.
Wrapping it up I was several layers into this and had to dig myself out. I have ended up with a lot of data and metadata. But it was quite difficult for me to arrange the output in a user friendly manner. However, all this data did seem like it would be useful any tool that wants to understand more about the package. So to get out of the rabbit hole, I for now wrapped all of this into JSON and now we have a debputy tool-support annotate-debian-directory command that might be useful for other tools. To try it out, you can try the following demo: In another day, I will figure out how to structure this output so it is useful for non-machine consumers. Suggestions are welcome. :)
Limitations of the approach As a closing remark, I should probably remind people that this feature relies heavily on declarative features. These include:
  • When determining which commands are relevant, using Build-Depends: dh-sequence-foo is much more reliable than configuring it via the Turing complete configuration we call debian/rules.
  • When debhelper commands use NOOP promise hints, dh_assistant can "see" the config files listed those hints, meaning the file will at least be detected. For new introspectable hint and the debputy plugin, it is probably better to wait until the dust settles a bit before adding any of those.
You can help yourself and others to better results by using the declarative way rather than using debian/rules, which is the bane of all introspection!

20 January 2024

Niels Thykier: Making debputy: Writing declarative parsing logic

In this blog post, I will cover how debputy parses its manifest and the conceptual improvements I did to make parsing of the manifest easier. All instructions to debputy are provided via the debian/debputy.manifest file and said manifest is written in the YAML format. After the YAML parser has read the basic file structure, debputy does another pass over the data to extract the information from the basic structure. As an example, the following YAML file:
manifest-version: "0.1"
installations:
  - install:
      source: foo
      dest-dir: usr/bin
would be transformed by the YAML parser into a structure resembling:
 
  "manifest-version": "0.1",
  "installations": [
      
       "install":  
         "source": "foo",
         "dest-dir": "usr/bin",
        
      
  ]
 
This structure is then what debputy does a pass on to translate this into an even higher level format where the "install" part is translated into an InstallRule. In the original prototype of debputy, I would hand-write functions to extract the data that should be transformed into the internal in-memory high level format. However, it was quite tedious. Especially because I wanted to catch every possible error condition and report "You are missing the required field X at Y" rather than the opaque KeyError: X message that would have been the default. Beyond being tedious, it was also quite error prone. As an example, in debputy/0.1.4 I added support for the install rule and you should allegedly have been able to add a dest-dir: or an as: inside it. Except I crewed up the code and debputy was attempting to look up these keywords from a dict that could never have them. Hand-writing these parsers were so annoying that it demotivated me from making manifest related changes to debputy simply because I did not want to code the parsing logic. When I got this realization, I figured I had to solve this problem better. While reflecting on this, I also considered that I eventually wanted plugins to be able to add vocabulary to the manifest. If the API was "provide a callback to extract the details of whatever the user provided here", then the result would be bad.
  1. Most plugins would probably throw KeyError: X or ValueError style errors for quite a while. Worst case, they would end on my table because the user would have a hard time telling where debputy ends and where the plugins starts. "Best" case, I would teach debputy to say "This poor error message was brought to you by plugin foo. Go complain to them". Either way, it would be a bad user experience.
  2. This even assumes plugin providers would actually bother writing manifest parsing code. If it is that difficult, then just providing a custom file in debian might tempt plugin providers and that would undermine the idea of having the manifest be the sole input for debputy.
So beyond me being unsatisfied with the current situation, it was also clear to me that I needed to come up with a better solution if I wanted externally provided plugins for debputy. To put a bit more perspective on what I expected from the end result:
  1. It had to cover as many parsing errors as possible. An error case this code would handle for you, would be an error where I could ensure it sufficient degree of detail and context for the user.
  2. It should be type-safe / provide typing support such that IDEs/mypy could help you when you work on the parsed result.
  3. It had to support "normalization" of the input, such as
           # User provides
           - install: "foo"
           # Which is normalized into:
           - install:
               source: "foo"
4) It must be simple to tell  debputy  what input you expected.
At this point, I remembered that I had seen a Python (PYPI) package where you could give it a TypedDict and an arbitrary input (Sadly, I do not remember the name). The package would then validate the said input against the TypedDict. If the match was successful, you would get the result back casted as the TypedDict. If the match was unsuccessful, the code would raise an error for you. Conceptually, this seemed to be a good starting point for where I wanted to be. Then I looked a bit on the normalization requirement (point 3). What is really going on here is that you have two "schemas" for the input. One is what the programmer will see (the normalized form) and the other is what the user can input (the manifest form). The problem is providing an automatic normalization from the user input to the simplified programmer structure. To expand a bit on the following example:
# User provides
- install: "foo"
# Which is normalized into:
- install:
    source: "foo"
Given that install has the attributes source, sources, dest-dir, as, into, and when, how exactly would you automatically normalize "foo" (str) into source: "foo"? Even if the code filtered by "type" for these attributes, you would end up with at least source, dest-dir, and as as candidates. Turns out that TypedDict actually got this covered. But the Python package was not going in this direction, so I parked it here and started looking into doing my own. At this point, I had a general idea of what I wanted. When defining an extension to the manifest, the plugin would provide debputy with one or two definitions of TypedDict. The first one would be the "parsed" or "target" format, which would be the normalized form that plugin provider wanted to work on. For this example, lets look at an earlier version of the install-examples rule:
# Example input matching this typed dict.
#    
#       "source": ["foo"]
#       "into": ["pkg"]
#    
class InstallExamplesTargetFormat(TypedDict):
    # Which source files to install (dest-dir is fixed)
    sources: List[str]
    # Which package(s) that should have these files installed.
    into: NotRequired[List[str]]
In this form, the install-examples has two attributes - both are list of strings. On the flip side, what the user can input would look something like this:
# Example input matching this typed dict.
#    
#       "source": "foo"
#       "into": "pkg"
#    
#
class InstallExamplesManifestFormat(TypedDict):
    # Note that sources here is split into source (str) vs. sources (List[str])
    sources: NotRequired[List[str]]
    source: NotRequired[str]
    # We allow the user to write  into: foo  in addition to  into: [foo] 
    into: Union[str, List[str]]
FullInstallExamplesManifestFormat = Union[
    InstallExamplesManifestFormat,
    List[str],
    str,
]
The idea was that the plugin provider would use these two definitions to tell debputy how to parse install-examples. Pseudo-registration code could look something like:
def _handler(
    normalized_form: InstallExamplesTargetFormat,
) -> InstallRule:
    ...  # Do something with the normalized form and return an InstallRule.
concept_debputy_api.add_install_rule(
  keyword="install-examples",
  target_form=InstallExamplesTargetFormat,
  manifest_form=FullInstallExamplesManifestFormat,
  handler=_handler,
)
This was my conceptual target and while the current actual API ended up being slightly different, the core concept remains the same.
From concept to basic implementation Building this code is kind like swallowing an elephant. There was no way I would just sit down and write it from one end to the other. So the first prototype of this did not have all the features it has now. Spoiler warning, these next couple of sections will contain some Python typing details. When reading this, it might be helpful to know things such as Union[str, List[str]] being the Python type for either a str (string) or a List[str] (list of strings). If typing makes your head spin, these sections might less interesting for you. To build this required a lot of playing around with Python's introspection and typing APIs. My very first draft only had one "schema" (the normalized form) and had the following features:
  • Read TypedDict.__required_attributes__ and TypedDict.__optional_attributes__ to determine which attributes where present and which were required. This was used for reporting errors when the input did not match.
  • Read the types of the provided TypedDict, strip the Required / NotRequired markers and use basic isinstance checks based on the resulting type for str and List[str]. Again, used for reporting errors when the input did not match.
This prototype did not take a long (I remember it being within a day) and worked surprisingly well though with some poor error messages here and there. Now came the first challenge, adding the manifest format schema plus relevant normalization rules. The very first normalization I did was transforming into: Union[str, List[str]] into into: List[str]. At that time, source was not a separate attribute. Instead, sources was a Union[str, List[str]], so it was the only normalization I needed for all my use-cases at the time. There are two problems when writing a normalization. First is determining what the "source" type is, what the target type is and how they relate. The second is providing a runtime rule for normalizing from the manifest format into the target format. Keeping it simple, the runtime normalizer for Union[str, List[str]] -> List[str] was written as:
def normalize_into_list(x: Union[str, List[str]]) -> List[str]:
    return x if isinstance(x, list) else [x]
This basic form basically works for all types (assuming none of the types will have List[List[...]]). The logic for determining when this rule is applicable is slightly more involved. My current code is about 100 lines of Python code that would probably lose most of the casual readers. For the interested, you are looking for _union_narrowing in declarative_parser.py With this, when the manifest format had Union[str, List[str]] and the target format had List[str] the generated parser would silently map a string into a list of strings for the plugin provider. But with that in place, I had covered the basics of what I needed to get started. I was quite excited about this milestone of having my first keyword parsed without handwriting the parser logic (at the expense of writing a more generic parse-generator framework).
Adding the first parse hint With the basic implementation done, I looked at what to do next. As mentioned, at the time sources in the manifest format was Union[str, List[str]] and I considered to split into a source: str and a sources: List[str] on the manifest side while keeping the normalized form as sources: List[str]. I ended up committing to this change and that meant I had to solve the problem getting my parser generator to understand the situation:
# Map from
class InstallExamplesManifestFormat(TypedDict):
    # Note that sources here is split into source (str) vs. sources (List[str])
    sources: NotRequired[List[str]]
    source: NotRequired[str]
    # We allow the user to write  into: foo  in addition to  into: [foo] 
    into: Union[str, List[str]]
# ... into
class InstallExamplesTargetFormat(TypedDict):
    # Which source files to install (dest-dir is fixed)
    sources: List[str]
    # Which package(s) that should have these files installed.
    into: NotRequired[List[str]]
There are two related problems to solve here:
  1. How will the parser generator understand that source should be normalized and then mapped into sources?
  2. Once that is solved, the parser generator has to understand that while source and sources are declared as NotRequired, they are part of a exactly one of rule (since sources in the target form is Required). This mainly came down to extra book keeping and an extra layer of validation once the previous step is solved.
While working on all of this type introspection for Python, I had noted the Annotated[X, ...] type. It is basically a fake type that enables you to attach metadata into the type system. A very random example:
# For all intents and purposes,  foo  is a string despite all the  Annotated  stuff.
foo: Annotated[str, "hello world"] = "my string here"
The exciting thing is that you can put arbitrary details into the type field and read it out again in your introspection code. Which meant, I could add "parse hints" into the type. Some "quick" prototyping later (a day or so), I got the following to work:
# Map from
#      
#        "source": "foo"  # (or "sources": ["foo"])
#        "into": "pkg"
#      
class InstallExamplesManifestFormat(TypedDict):
    # Note that sources here is split into source (str) vs. sources (List[str])
    sources: NotRequired[List[str]]
    source: NotRequired[
        Annotated[
            str,
            DebputyParseHint.target_attribute("sources")
        ]
    ]
    # We allow the user to write  into: foo  in addition to  into: [foo] 
    into: Union[str, List[str]]
# ... into
#      
#        "source": ["foo"]
#        "into": ["pkg"]
#      
class InstallExamplesTargetFormat(TypedDict):
    # Which source files to install (dest-dir is fixed)
    sources: List[str]
    # Which package(s) that should have these files installed.
    into: NotRequired[List[str]]
Without me (as a plugin provider) writing a line of code, I can have debputy rename or "merge" attributes from the manifest form into the normalized form. Obviously, this required me (as the debputy maintainer) to write a lot code so other me and future plugin providers did not have to write it.
High level typing At this point, basic normalization between one mapping to another mapping form worked. But one thing irked me with these install rules. The into was a list of strings when the parser handed them over to me. However, I needed to map them to the actual BinaryPackage (for technical reasons). While I felt I was careful with my manual mapping, I knew this was exactly the kind of case where a busy programmer would skip the "is this a known package name" check and some user would typo their package resulting in an opaque KeyError: foo. Side note: "Some user" was me today and I was super glad to see debputy tell me that I had typoed a package name (I would have been more happy if I had remembered to use debputy check-manifest, so I did not have to wait through the upstream part of the build that happened before debhelper passed control to debputy...) I thought adding this feature would be simple enough. It basically needs two things:
  1. Conversion table where the parser generator can tell that BinaryPackage requires an input of str and a callback to map from str to BinaryPackage. (That is probably lie. I think the conversion table came later, but honestly I do remember and I am not digging into the git history for this one)
  2. At runtime, said callback needed access to the list of known packages, so it could resolve the provided string.
It was not super difficult given the existing infrastructure, but it did take some hours of coding and debugging. Additionally, I added a parse hint to support making the into conditional based on whether it was a single binary package. With this done, you could now write something like:
# Map from
class InstallExamplesManifestFormat(TypedDict):
    # Note that sources here is split into source (str) vs. sources (List[str])
    sources: NotRequired[List[str]]
    source: NotRequired[
        Annotated[
            str,
            DebputyParseHint.target_attribute("sources")
        ]
    ]
    # We allow the user to write  into: foo  in addition to  into: [foo] 
    into: Union[BinaryPackage, List[BinaryPackage]]
# ... into
class InstallExamplesTargetFormat(TypedDict):
    # Which source files to install (dest-dir is fixed)
    sources: List[str]
    # Which package(s) that should have these files installed.
    into: NotRequired[
        Annotated[
            List[BinaryPackage],
            DebputyParseHint.required_when_multi_binary()
        ]
    ]
Code-wise, I still had to check for into being absent and providing a default for that case (that is still true in the current codebase - I will hopefully fix that eventually). But I now had less room for mistakes and a standardized error message when you misspell the package name, which was a plus.
The added side-effect - Introspection A lovely side-effect of all the parsing logic being provided to debputy in a declarative form was that the generated parser snippets had fields containing all expected attributes with their types, which attributes were required, etc. This meant that adding an introspection feature where you can ask debputy "What does an install rule look like?" was quite easy. The code base already knew all of this, so the "hard" part was resolving the input the to concrete rule and then rendering it to the user. I added this feature recently along with the ability to provide online documentation for parser rules. I covered that in more details in my blog post Providing online reference documentation for debputy in case you are interested. :)
Wrapping it up This was a short insight into how debputy parses your input. With this declarative technique:
  • The parser engine handles most of the error reporting meaning users get most of the errors in a standard format without the plugin provider having to spend any effort on it. There will be some effort in more complex cases. But the common cases are done for you.
  • It is easy to provide flexibility to users while avoiding having to write code to normalize the user input into a simplified programmer oriented format.
  • The parser handles mapping from basic types into higher forms for you. These days, we have high level types like FileSystemMode (either an octal or a symbolic mode), different kind of file system matches depending on whether globs should be performed, etc. These types includes their own validation and parsing rules that debputy handles for you.
  • Introspection and support for providing online reference documentation. Also, debputy checks that the provided attribute documentation covers all the attributes in the manifest form. If you add a new attribute, debputy will remind you if you forget to document it as well. :)
In this way everybody wins. Yes, writing this parser generator code was more enjoyable than writing the ad-hoc manual parsers it replaced. :)

26 November 2023

Niels Thykier: Providing online reference documentation for debputy

I do not think seasoned Debian contributors quite appreciate how much knowledge we have picked up and internalized. As an example, when I need to look up documentation for debhelper, I generally know which manpage to look in. I suspect most long time contributors would be able to a similar thing (maybe down 2-3 manpages). But new contributors does not have the luxury of years of experience. This problem is by no means unique to debhelper. One thing that debhelper does very well, is that it is hard for users to tell where a addon "starts" and debhelper "ends". It is clear you use addons, but the transition in and out of third party provided tools is generally smooth. This is a sign that things "just work(tm)". Except when it comes to documentation. Here, debhelper's static documentation does not include documentation for third party tooling. If you think from a debhelper maintainer's perspective, this seems obvious. Embedding documentation for all the third-party code would be very hard work, a layer-violation, etc.. But from a user perspective, we should not have to care "who" provides "what". As as user, I want to understand how this works and the more hoops I have to jump through to get that understanding, the more frustrated I will be with the toolstack. With this, I came to the conclusion that the best way to help users and solve the problem of finding the documentation was to provide "online documentation". It should be possible to ask debputy, "What attributes can I use in install-man?" or "What does path-metadata do?". Additionally, the lookup should work the same no matter if debputy provided the feature or some third-party plugin did. In the future, perhaps also other types of documentation such as tutorials or how-to guides. Below, I have some tentative results of my work so far. There are some improvements to be done. Notably, the commands for these documentation features are still treated a "plugin" subcommand features and should probably have its own top level "ask-me-anything" subcommand in the future.
Automatic discard rules Since the introduction of install rules, debputy has included an automatic filter mechanism that prunes out unwanted content. In 0.1.9, these filters have been named "Automatic discard rules" and you can now ask debputy to list them.
$ debputy plugin list automatic-discard-rules
+-----------------------+-------------+
  Name                    Provided By  
+-----------------------+-------------+
  python-cache-files      debputy      
  la-files                debputy      
  backup-files            debputy      
  version-control-paths   debputy      
  gnu-info-dir-file       debputy      
  debian-dir              debputy      
  doxygen-cruft-files     debputy      
+-----------------------+-------------+
For these rules, the provider can both provide a description but also an example of their usage.
$ debputy plugin show automatic-discard-rules la-files
Automatic Discard Rule: la-files
================================
Documentation: Discards any .la files beneath /usr/lib
Example
-------
    /usr/lib/libfoo.la        << Discarded (directly by the rule)
    /usr/lib/libfoo.so.1.0.0
The example is a live example. That is, the provider will provide debputy with a scenario and the expected outcome of that scenario. Here is the concrete code in debputy that registers this example:
api.automatic_discard_rule(
    "la-files",
    _debputy_prune_la_files,
    rule_reference_documentation="Discards any .la files beneath /usr/lib",
    examples=automatic_discard_rule_example(
        "usr/lib/libfoo.la",
        ("usr/lib/libfoo.so.1.0.0", False),
    ),
)
When showing the example, debputy will validate the example matches what the plugin provider intended. Lets say I was to introduce a bug in the code, so that the discard rule no longer worked. Then debputy would start to show the following:
# Output if the code or example is broken
$ debputy plugin show automatic-discard-rules la-files
[...]
Automatic Discard Rule: la-files
================================
Documentation: Discards any .la files beneath /usr/lib
Example
-------
    /usr/lib/libfoo.la        !! INCONSISTENT (code: keep, example: discard)
    /usr/lib/libfoo.so.1.0.0
debputy: warning: The example was inconsistent. Please file a bug against the plugin debputy
Obviously, it would be better if this validation could be added directly as a plugin test, so the CI pipeline would catch it. That is one my personal TODO list. :) One final remark about automatic discard rules before moving on. In 0.1.9, debputy will also list any path automatically discarded by one of these rules in the build output to make sure that the automatic discard rule feature is more discoverable.
Plugable manifest rules like the install rule In the manifest, there are several places where rules can be provided by plugins. To make life easier for users, debputy can now since 0.1.8 list all provided rules:
$ debputy plugin list plugable-manifest-rules
+-------------------------------+------------------------------+-------------+
  Rule Name                       Rule Type                      Provided By  
+-------------------------------+------------------------------+-------------+
  install                         InstallRule                    debputy      
  install-docs                    InstallRule                    debputy      
  install-examples                InstallRule                    debputy      
  install-doc                     InstallRule                    debputy      
  install-example                 InstallRule                    debputy      
  install-man                     InstallRule                    debputy      
  discard                         InstallRule                    debputy      
  move                            TransformationRule             debputy      
  remove                          TransformationRule             debputy      
  [...]                           [...]                          [...]        
  remove                          DpkgMaintscriptHelperCommand   debputy      
  rename                          DpkgMaintscriptHelperCommand   debputy      
  cross-compiling                 ManifestCondition              debputy      
  can-execute-compiled-binaries   ManifestCondition              debputy      
  run-build-time-tests            ManifestCondition              debputy      
  [...]                           [...]                          [...]        
+-------------------------------+------------------------------+-------------+
(Output trimmed a bit for space reasons) And you can then ask debputy to describe any of these rules:
$ debputy plugin show plugable-manifest-rules install
Generic install ( install )
===========================
The generic  install  rule can be used to install arbitrary paths into packages
and is *similar* to how  dh_install  from debhelper works.  It is a two "primary" uses.
  1) The classic "install into directory" similar to the standard  dh_install 
  2) The "install as" similar to  dh-exec 's  foo => bar  feature.
Attributes:
 -  source  (conditional): string
    sources  (conditional): List of string
   A path match ( source ) or a list of path matches ( sources ) defining the
   source path(s) to be installed. [...]
 -  dest-dir  (optional): string
   A path defining the destination *directory*. [...]
 -  into  (optional): string or a list of string
   A path defining the destination *directory*. [...]
 -  as  (optional): string
   A path defining the path to install the source as. [...]
 -  when  (optional): manifest condition (string or mapping of string)
   A condition as defined in [Conditional rules](https://salsa.debian.org/debian/debputy/-/blob/main/MANIFEST-FORMAT.md#Conditional rules).
This rule enforces the following restrictions:
 - The rule must use exactly one of:  source ,  sources 
 - The attribute  as  cannot be used with any of:  dest-dir ,  sources 
[...]
(Output trimmed a bit for space reasons) All the attributes and restrictions are auto-computed by debputy from information provided by the plugin. The associated documentation for each attribute is supplied by the plugin itself, The debputy API validates that all attributes are covered and the documentation does not describe non-existing fields. This ensures that you as a plugin provider never forget to document new attributes when you add them later. The debputy API for manifest rules are not quite stable yet. So currently only debputy provides rules here. However, it is my intention to lift that restriction in the future. I got the idea of supporting online validated examples when I was building this feature. However, sadly, I have not gotten around to supporting it yet.
Manifest variables like PACKAGE I also added a similar documentation feature for manifest variables such as PACKAGE . When I implemented this, I realized listing all manifest variables by default would probably be counter productive to new users. As an example, if you list all variables by default it would include DEB_HOST_MULTIARCH (the most common case) side-by-side with the the much less used DEB_BUILD_MULTIARCH and the even lessor used DEB_TARGET_MULTIARCH variable. Having them side-by-side implies they are of equal importance, which they are not. As an example, the ballpark number of unique packages for which DEB_TARGET_MULTIARCH is useful can be counted on two hands (and maybe two feet if you consider gcc-X distinct from gcc-Y). This is one of the cases, where experience makes us blind. Many of us probably have the "show me everything and I will find what I need" mentality. But that requires experience to be able to pull that off - especially if all alternatives are presented as equals. The cross-building terminology has proven to notoriously match poorly to people's expectation. Therefore, I took a deliberate choice to reduce the list of shown variables by default and in the output explicitly list what filters were active. In the current version of debputy (0.1.9), the listing of manifest-variables look something like this:
$ debputy plugin list manifest-variables
+----------------------------------+----------------------------------------+------+-------------+
  Variable (use via:   NAME  )   Value                                    Flag   Provided by  
+----------------------------------+----------------------------------------+------+-------------+
  DEB_HOST_ARCH                      amd64                                           debputy      
  [... other DEB_HOST_* vars ...]    [...]                                           debputy      
  DEB_HOST_MULTIARCH                 x86_64-linux-gnu                                debputy      
  DEB_SOURCE                         debputy                                         debputy      
  DEB_VERSION                        0.1.8                                           debputy      
  DEB_VERSION_EPOCH_UPSTREAM         0.1.8                                           debputy      
  DEB_VERSION_UPSTREAM               0.1.8                                           debputy      
  DEB_VERSION_UPSTREAM_REVISION      0.1.8                                           debputy      
  PACKAGE                            <package-name>                                  debputy      
  path:BASH_COMPLETION_DIR           /usr/share/bash-completion/completions          debputy      
+----------------------------------+----------------------------------------+------+-------------+
+-----------------------+--------+-------------------------------------------------------+
  Variable type           Value    Option                                                 
+-----------------------+--------+-------------------------------------------------------+
  Token variables         hidden   --show-token-variables OR --show-all-variables         
  Special use variables   hidden   --show-special-case-variables OR --show-all-variables  
+-----------------------+--------+-------------------------------------------------------+
I will probably tweak the concrete listing in the future. Personally, I am considering to provide short-hands variables for some of the DEB_HOST_* variables and then hide the DEB_HOST_* group from the default view as well. Maybe something like ARCH and MULTIARCH, which would default to their DEB_HOST_* counter part. This variable could then have extended documentation that high lights DEB_HOST_<X> as its source and imply that there are special cases for cross-building where you might need DEB_BUILD_<X> or DEB_TARGET_<X>. Speaking of variable documentation, you can also lookup the documentation for a given manifest variable:
$ debputy plugin show manifest-variables path:BASH_COMPLETION_DIR
Variable: path:BASH_COMPLETION_DIR
==================================
Documentation: Directory to install bash completions into
Resolved: /usr/share/bash-completion/completions
Plugin: debputy
This was my update on online reference documentation for debputy. I hope you found it useful. :)
Thanks On a closing note, I would like to thanks Jochen Sprickerhof, Andres Salomon, Paul Gevers for their recent contributions to debputy. Jochen and Paul provided a number of real world cases where debputy would crash or not work, which have now been fixed. Andres and Paul also provided corrections to the documentation.

11 November 2023

Reproducible Builds: Reproducible Builds in October 2023

Welcome to the October 2023 report from the Reproducible Builds project. In these reports we outline the most important things that we have been up to over the past month. As a quick recap, whilst anyone may inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries.

Reproducible Builds Summit 2023 Between October 31st and November 2nd, we held our seventh Reproducible Builds Summit in Hamburg, Germany! Our summits are a unique gathering that brings together attendees from diverse projects, united by a shared vision of advancing the Reproducible Builds effort, and this instance was no different. During this enriching event, participants had the opportunity to engage in discussions, establish connections and exchange ideas to drive progress in this vital field. A number of concrete outcomes from the summit will documented in the report for November 2023 and elsewhere. Amazingly the agenda and all notes from all sessions are already online. The Reproducible Builds team would like to thank our event sponsors who include Mullvad VPN, openSUSE, Debian, Software Freedom Conservancy, Allotropia and Aspiration Tech.

Reflections on Reflections on Trusting Trust Russ Cox posted a fascinating article on his blog prompted by the fortieth anniversary of Ken Thompson s award-winning paper, Reflections on Trusting Trust:
[ ] In March 2023, Ken gave the closing keynote [and] during the Q&A session, someone jokingly asked about the Turing award lecture, specifically can you tell us right now whether you have a backdoor into every copy of gcc and Linux still today?
Although Ken reveals (or at least claims!) that he has no such backdoor, he does admit that he has the actual code which Russ requests and subsequently dissects in great but accessible detail.

Ecosystem factors of reproducible builds Rahul Bajaj, Eduardo Fernandes, Bram Adams and Ahmed E. Hassan from the Maintenance, Construction and Intelligence of Software (MCIS) laboratory within the School of Computing, Queen s University in Ontario, Canada have published a paper on the Time to fix, causes and correlation with external ecosystem factors of unreproducible builds. The authors compare various response times within the Debian and Arch Linux distributions including, for example:
Arch Linux packages become reproducible a median of 30 days quicker when compared to Debian packages, while Debian packages remain reproducible for a median of 68 days longer once fixed.
A full PDF of their paper is available online, as are many other interesting papers on MCIS publication page.

NixOS installation image reproducible On the NixOS Discourse instance, Arnout Engelen (raboof) announced that NixOS have created an independent, bit-for-bit identical rebuilding of the nixos-minimal image that is used to install NixOS. In their post, Arnout details what exactly can be reproduced, and even includes some of the history of this endeavour:
You may remember a 2021 announcement that the minimal ISO was 100% reproducible. While back then we successfully tested that all packages that were needed to build the ISO were individually reproducible, actually rebuilding the ISO still introduced differences. This was due to some remaining problems in the hydra cache and the way the ISO was created. By the time we fixed those, regressions had popped up (notably an upstream problem in Python 3.10), and it isn t until this week that we were back to having everything reproducible and being able to validate the complete chain.
Congratulations to NixOS team for reaching this important milestone! Discussion about this announcement can be found underneath the post itself, as well as on Hacker News.

CPython source tarballs now reproducible Seth Larson published a blog post investigating the reproducibility of the CPython source tarballs. Using diffoscope, reprotest and other tools, Seth documents his work that led to a pull request to make these files reproducible which was merged by ukasz Langa.

New arm64 hardware from Codethink Long-time sponsor of the project, Codethink, have generously replaced our old Moonshot-Slides , which they have generously hosted since 2016 with new KVM-based arm64 hardware. Holger Levsen integrated these new nodes to the Reproducible Builds continuous integration framework.

Community updates On our mailing list during October 2023 there were a number of threads, including:
  • Vagrant Cascadian continued a thread about the implementation details of a snapshot archive server required for reproducing previous builds. [ ]
  • Akihiro Suda shared an update on BuildKit, a toolkit for building Docker container images. Akihiro links to a interesting talk they recently gave at DockerCon titled Reproducible builds with BuildKit for software supply-chain security.
  • Alex Zakharov started a thread discussing and proposing fixes for various tools that create ext4 filesystem images. [ ]
Elsewhere, Pol Dellaiera made a number of improvements to our website, including fixing typos and links [ ][ ], adding a NixOS Flake file [ ] and sorting our publications page by date [ ]. Vagrant Cascadian presented Reproducible Builds All The Way Down at the Open Source Firmware Conference.

Distribution work distro-info is a Debian-oriented tool that can provide information about Debian (and Ubuntu) distributions such as their codenames (eg. bookworm) and so on. This month, Benjamin Drung uploaded a new version of distro-info that added support for the SOURCE_DATE_EPOCH environment variable in order to close bug #1034422. In addition, 8 reviews of packages were added, 74 were updated and 56 were removed this month, all adding to our knowledge about identified issues. Bernhard M. Wiedemann published another monthly report about reproducibility within openSUSE.

Software development The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including: In addition, Chris Lamb fixed an issue in diffoscope, where if the equivalent of file -i returns text/plain, fallback to comparing as a text file. This was originally filed as Debian bug #1053668) by Niels Thykier. [ ] This was then uploaded to Debian (and elsewhere) as version 251.

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In October, a number of changes were made by Holger Levsen:
  • Debian-related changes:
    • Refine the handling of package blacklisting, such as sending blacklisting notifications to the #debian-reproducible-changes IRC channel. [ ][ ][ ]
    • Install systemd-oomd on all Debian bookworm nodes (re. Debian bug #1052257). [ ]
    • Detect more cases of failures to delete schroots. [ ]
    • Document various bugs in bookworm which are (currently) being manually worked around. [ ]
  • Node-related changes:
    • Integrate the new arm64 machines from Codethink. [ ][ ][ ][ ][ ][ ]
    • Improve various node cleanup routines. [ ][ ][ ][ ]
    • General node maintenance. [ ][ ][ ][ ]
  • Monitoring-related changes:
    • Remove unused Munin monitoring plugins. [ ]
    • Complain less visibly about too many installed kernels. [ ]
  • Misc:
    • Enhance the firewall handling on Jenkins nodes. [ ][ ][ ][ ]
    • Install the fish shell everywhere. [ ]
In addition, Vagrant Cascadian added some packages and configuration for snapshot experiments. [ ]

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

8 October 2023

Niels Thykier: A new Debian package helper: debputy

I have made a new helper for producing Debian packages called debputy. Today, I uploaded it to Debian unstable for the first time. This enables others to migrate their package build using dh +debputy rather than the classic dh. Eventually, I hope to remove dh entirely from this equation, so you only need debputy. But for now, debputy still leverages dh support for managing upstream build systems. The debputy tool takes a radicially different approach to packaging compared to our existing packaging methods by using a single highlevel manifest instead of all the debian/install (etc.) and no hook targets in debian/rules. Here are some of the things that debputy can do or does: There are also some features that debputy does not support at the moment: There are all limitations of the current work in progress. I hope to resolve them all in due time.

Trying debputy With the limitations aside, lets talk about how you would go about migrating a package:
# Assuming here you have already run: apt install dh-debputy
$ git clone https://salsa.debian.org/rra/kstart
[...]
$ cd kstart
# Add a Build-Dependency on dh-sequence-debputy
$ perl -n -i -e \
   'print; print " dh-sequence-debputy,\n" if m/debhelper-compat/;' \
    debian/control
$ debputy migrate-from-dh --apply-changes
debputy: info: Loading plugin debputy (version: archive/debian/4.3-1) ...
debputy: info: Verifying the generating manifest
debputy: info: Updated manifest debian/debputy.manifest
debputy: info: Removals:
debputy: info:   rm -f "./debian/docs"
debputy: info:   rm -f "./debian/examples"
debputy: info: Migrations performed successfully
debputy: info: Remember to validate the resulting binary packages after rebuilding with debputy
$ cat debian/debputy.manifest 
manifest-version: '0.1'
installations:
- install-docs:
    sources:
    - NEWS
    - README
    - TODO
- install-examples:
    source: examples/krenew-agent
$ git add debian/debputy.manifest
$ git commit --signoff -am"Migrate to debputy"
# Run build tool of choice to verify the output.
This is of course a specific example that works out of the box. If you were to try this on debianutils (from git), the output would look something like this:
$ debputy migrate-from-dh
debputy: info: Loading plugin debputy (version: 5.13-13-g9836721) ...
debputy: error: Unable to migrate automatically due to missing features in debputy.
  * The "debian/triggers" debhelper config file (used by dh_installdeb is currently not supported by debputy.
Use --acceptable-migration-issues=[...] to convert this into a warning [...]
And indeed, debianutils requires at least 4 debhelper features beyond what debputy can support at the moment (all related to maintscripts and triggers).

Rapid feedback Rapid feedback cycles are important for keeping developers engaged in their work. The debputy tool provides the following features to enable rapid feedback.

Immediate manifest validation It would be absolutely horrible if you had to do a full-rebuild only to realize you got the manifest syntax wrong. Therefore, debputy has a check-manifest command that checks the manifest for syntactical and semantic issues.
$ cat debian/debputy.manifest
manifest-version: '0.1'
installations:
- install-docs:
    sources:
    - GETTING-STARTED-WITH-dh-debputy.md
    - MANIFEST-FORMAT.md
    - MIGRATING-A-DH-PLUGIN.md
$ debputy check-manifest
debputy: info: Loading plugin debputy (version: 0.1.7-1-gf34bd66) ...
debputy: info: No errors detected.
$ cat <<EOF >> debian/debputy.manifest
- install:
    sourced: foo
    as: usr/bin/foo
EOF
# Did I typo anything?
$ debputy check-manifest
debputy: info: Loading plugin debputy (version: 0.1.7-2-g4ef8c2f) ...
debputy: warning: Possible typo: The key "sourced" at "installations[1].install" should probably have been 'source'
debputy: error: Unknown keys " 'sourced' " at installations[1].install".  Keys that could be used here are: sources, when, dest-dir, source, into.
debputy: info: Loading plugin debputy (version: 0.1.7-2-g4ef8c2f) ...
$ sed -i s/sourced:/source:/ debian/debputy.manifest
$ debputy check-manifest
debputy: info: Loading plugin debputy (version: 0.1.7-2-g4ef8c2f) ...
debputy: info: No errors detected.
The debputy check-manifest command is limited to the manifest itself and does not warn about foo not existing as it could be produced as apart of the upstream build system. Therefore, there are still issues that can only be detected at package build time. But where debputy can reliably give you immediate feedback, it will do so.

Idempotence: Clean re-runs of dh_debputy without clean/rebuild If you read the fine print of many debhelper commands, you may see the following note their manpage:
This command is not idempotent. dh_prep(1) should be called between invocations of this command Manpage of an anonymous debhelper tool
What this usually means, is that if you run the command twice, you will get its maintscript change (etc.) twice in the final deb. This fits into our single-use clean throw-away chroot builds on the buildds and CI as well as dpkg-buildpackage s no-clean (-nc) option. Single-use throw-away chroots are not very helpful for debugging though, so I rarely use them when doing the majority of my packaging work as I do not want to wait for the chroot initialization (including installing of build-depends). But even then, I have found that dpkg-buildpackage -nc has been useless for me in many cases as I am stuck between two options:
  • With -nc, you often still interact with the upstream build system. As an example, debhelper will do a dh_prep followed by dh_auto_install, so now we are waiting for upstream s install target to run again. What should have taken seconds now easily take 0.5-1 minute extra per attempt.
  • If you want to by-pass this, you have to manually call the helpers needed (in correct order) and every run accumulates cruft from previous runs to the point that cruft drowns out the actual change you want to see. Also, I am rarely in the mood to play human dh, when I am debugging an issue that I failed to fix in my first, second and third try.
As you can probably tell, neither option has worked that well for me. But with dh_debputy, I have made it a goal that it will not self-taint the final output. If dh_debputy fails, you should be able to tweak the manifest and re-run dh_debputy with the same arguments.
  • No waiting for dpkg-buildpackage -nc nor anything implied by that.
  • No self-tainting of the final deb. The result you get, is the result you would have gotten if the previous dh_debputy run never happened.
  • Because dh_debputy produces the final result, I do not have to run multiple tools in the right order.
Obviously, this is currently a lot easier, because debputy is not involved in the upstream build system at all. If this feature is useful to you, please do let me know and I will try to preserve it as debputy progresses in features.

Packager provided files On a different topic, have you ever wondered what kind of files you can place into the debian directory that debhelper automatically picks up or reacts too? I do not have an answer to that beyond it is over 80 files and that as the maintainer of debhelper, I am not willing to manually maintain such a list manually. However, I do know what the answer is in debputy, because I can just ask debputy:
$ debputy plugin list packager-provided-files
+-----------------------------+---------------------------------------------[...]
  Stem                          Installed As                                [...]
+-----------------------------+---------------------------------------------[...]
  NEWS                          /usr/share/doc/ name /NEWS.Debian           [...]
  README.Debian                 /usr/share/doc/ name /README.Debian         [...]
  TODO                          /usr/share/doc/ name /TODO.Debian           [...]
  bug-control                   /usr/share/bug/ name /control               [...]
  bug-presubj                   /usr/share/bug/ name /presubj               [...]
  bug-script                    /usr/share/bug/ name /script                [...]
  changelog                     /usr/share/doc/ name /changelog.Debian      [...]
  copyright                     /usr/share/doc/ name /copyright             [...]
[...]
This will list all file types (Stem column) that debputy knows about and it accounts for any plugin that debputy can find. Note to be deterministic, debputy will not auto-load plugins that have not been explicitly requested during package builds. So this list could list files that are available but not active for your current package. Note the output is not intended to be machine readable. That may come in later version. Feel free to chime in if you have a concrete use-case.

Take it for a spin As I started this blog post with, debputy is now available in unstable. I hope you will take it for a spin on some of your simpler packages and provide feedback on it.  For documentation, please have a look at: Thanks for considering PS: My deepest respect to the fakeroot maintainers. That game of whack-a-mole is not something I would have been willing to maintain. I think fakeroot is like the Python GIL in the sense that it has been important in getting Debian to where it is today. But at the same time, I feel it is time to let go of the crutch and find a proper solution.

20 June 2022

Niels Thykier: wrap-and-sort with experimental support for comments in devscripts/2.22.2

In the devscripts package currently in Debian testing (2.22.2), wrap-and-sort has opt-in support for preserving comments in deb822 control files such as debian/control and debian/tests/control. Currently, this is an opt-in feature to provide some exposure without breaking anything. To use the feature, add --experimental-rts-parser to the command line. A concrete example being (adjust to your relevant style):
wrap-and-sort --experimental-rts-parser -tabk
Please provide relevant feedback to #820625 if you have any. If you experience issues, please remember to provide the original control file along with the concrete command line used. As hinted above, the option is a temporary measure and will be removed again once the testing phase is over, so please do not put it into scripts or packages. For the same reason, wrap-and-sort will emit a slightly annoying warning when using the option. Enjoy.

25 December 2020

Niels Thykier: Improvements to IntelliJ/PyCharm support for Debian packaging files

I have updated my debpkg plugin for IDEA (e.g. IntelliJ, PyCharm, Android Studios) to v0.0.8. Here are some of the changes since last time I wrote about the plugin. New file types supported Links for URLs and bug closes There are often links in deb822 files or the debian/changelog and as of v0.0.8, the plugin will now highlight them and able you to easily open them via your browser. In the deb822 case, they generally appear in the Homepage field, the Vcs-* fields or the Format field of the debian/copyright field. For the changelog file, they often appear in the form of bug Closes statements such as the #123456 in Closes: #123456 , which is a reference to https://bugs.debian.org/123456. Improvements to debian/control The dependency validator now has per-field knowledge. This enables it to flag dependency relations in the Provides field that uses operators other than = (which is the only operator that is supported in that field). It also knows which fields support build-profile restrictions. It in theory also do Architecture restrictions, but I have not added it among other because it gets a bit spicy around binary packages. (Fun fact, you can have Depends: foo [amd64] but only for arch:any packages.) The plugin now suggests adding a Rules-Requires-Root field to the Source stanza along with a quick fix for adding the field. Admittedly, it was mostly done as exercise for me to learn how to do that kind of feature. Support for machine-readable debian/copyright The plugin now has a dedicated file type for debian/copyright that follows the machine-readable format. It should auto-detect it automatically based on the presence of the Format field being set to https://www.debian.org/doc/packaging-manuals/copyright-format/1.0. Sadly, I have not found the detection reliable in all cases, so you may have to apply it manually. With the copyright format, the plugin now scans the Files fields for common issues like pointing on non-existing paths and invalid escape sequences. When the plugin discovers a path that does not match anything, it highlights the part of the path that cannot be found. As an example, consider the pattern src/foo/data.c and that src/foo exist but data.c does not exist, then the plugin will only flag the data.c part of src/foo/data.c as invalid. The plugin will also suggest a quick fix if you a directory into the Files field to replace it with a directory wildcard (e.g. src/foo -> src/foo/* ), which is how the spec wants you to reference every file beneath a given directory. Finally, when the plugin can identify part of the path, then it will turn it into a link (reference in IDEA lingo). This means that you can CTRL + click on it to jump to the file. As a side-effect, it also provides refactoring assistance for renaming files, where renaming a file will often be automatically reflected in debian/copyright. This use case is admittedly mostly relevant people, who are both upstream and downstream maintainer. Folding support improvement for .dsc/.changes/.buildinfo files The new field types appeared with two cases, where I decided to improve the folding support logic. The first was the GPG signature (if present), which consists of two parts. The top part with is mostly a single line marker but often followed by a GPG armor header (e.g. Hash: SHA512 ) and then the signature blob with related marker lines around it. Both cases are folded into a single marker line by default to reduce their impact on content in the editor view. The second case was the following special-case pattern:
Files:
 <md5> <size> filename
Checksums-Sha256:
 <sha256> <size> filename
In the above example, where there is exactly on file name, those fields will by default now be folded into:
Files: <md5> <size> filename
Checksums-Sha256: <sha256> <size> filename
For all other multi-line fields, the plugin still falls back to a list of known fields to fold by default as in previous versions. Spellchecking improvements The plugin already supported selective spell checking in v0.0.3, where it often omitted spell checking for fields (in deb822 files) where it did not make sense. The spell check feature has been improved by providing a list of known packaging terms/jargo used by many contributors (so autopkgtests is no longer considered a typo). This applies to all file types (probably also those not handled by the plugin as it is just a dictionary). Furthermore, the plugin also attempts discover common patterns (e.g. file names or command arguments) and exempt these from spell checking in the debian/changelog. This also includes manpage references such as foo.1 or foo(1) . It is far from perfect and relies on common patterns to exclude spell checking. Nonetheless, it should reduce the number of false positive considerably. Feedback welcome Please let me know if you run into bugs or would like a particular feature implemented. You can submit bug reports and feature requests in the issue tracker on github.

8 August 2020

Reproducible Builds: Reproducible Builds in July 2020

Welcome to the July 2020 report from the Reproducible Builds project. In these monthly reports, we round-up the things that we have been up to over the past month. As a brief refresher, the motivation behind the Reproducible Builds effort is to ensure no flaws have been introduced from the original free software source code to the pre-compiled binaries we install on our systems. (If you re interested in contributing to the project, please visit our main website.)

General news At the upcoming DebConf20 conference (now being held online), Holger Levsen will present a talk on Thursday 27th August about Reproducing Bullseye in practice , focusing on independently verifying that the binaries distributed from ftp.debian.org were made from their claimed sources. Tavis Ormandy published a blog post making the provocative claim that You don t need reproducible builds , asserting elsewhere that the many attacks that have been extensively reported in our previous reports are fantasy threat models . A number of rebuttals have been made, including one from long-time contributor Reproducible Builds contributor Bernhard Wiedemann. On our mailing list this month, Debian Developer Graham Inggs posted to our list asking for ideas why the openorienteering-mapper Debian package was failing to build on the Reproducible Builds testing framework. Chris Lamb remarked from the build logs that the package may be missing a build dependency, although Graham then used our own diffoscope tool to show that the resulting package remains unchanged with or without it. Later, Nico Tyni noticed that the build failure may be due to the relationship between the FILE C preprocessor macro and the -ffile-prefix-map GCC flag. An issue in Zephyr, a small-footprint kernel designed for use on resource-constrained systems, around .a library files not being reproducible was closed after it was noticed that a key part of their toolchain was updated that now calls --enable-deterministic-archives by default. Reproducible Builds developer kpcyrd commented on a pull request against the libsodium cryptographic library wrapper for Rust, arguing against the testing of CPU features at compile-time. He noted that:
I ve accidentally shipped broken updates to users in the past because the build system was feature-tested and the final binary assumed the instructions would be present without further runtime checks
David Kleuker also asked a question on our mailing list about using SOURCE_DATE_EPOCH with the install(1) tool from GNU coreutils. When comparing two installed packages he noticed that the filesystem birth times differed between them. Chris Lamb replied, realising that this was actually a consequence of using an outdated version of diffoscope and that a fix was in diffoscope version 146 released in May 2020. Later in July, John Scott posted asking for clarification regarding on the Javascript files on our website to add metadata for LibreJS, the browser extension that blocks non-free Javascript scripts from executing. Chris Lamb investigated the issue and realised that we could drop a number of unused Javascript files [ ][ ][ ] and added unminified versions of Bootstrap and jQuery [ ].

Development work

Website On our website this month, Chris Lamb updated the main Reproducible Builds website and documentation to drop a number of unused Javascript files [ ][ ][ ] and added unminified versions of Bootstrap and jQuery [ ]. He also fixed a number of broken URLs [ ][ ]. Gonzalo Bulnes Guilpain made a large number of grammatical improvements [ ][ ][ ][ ][ ] as well as some misspellings, case and whitespace changes too [ ][ ][ ]. Lastly, Holger Levsen updated the README file [ ], marked the Alpine Linux continuous integration tests as currently disabled [ ] and linked the Arch Linux Reproducible Status page from our projects page [ ].

diffoscope diffoscope is our in-depth and content-aware diff utility that can not only locate and diagnose reproducibility issues, it provides human-readable diffs of all kinds. In July, Chris Lamb made the following changes to diffoscope, including releasing versions 150, 151, 152, 153 & 154:
  • New features:
    • Add support for flash-optimised F2FS filesystems. (#207)
    • Don t require zipnote(1) to determine differences in a .zip file as we can use libarchive. [ ]
    • Allow --profile as a synonym for --profile=-, ie. write profiling data to standard output. [ ]
    • Increase the minimum length of the output of strings(1) to eight characters to avoid unnecessary diff noise. [ ]
    • Drop some legacy argument styles: --exclude-directory-metadata and --no-exclude-directory-metadata have been replaced with --exclude-directory-metadata= yes,no . [ ]
  • Bug fixes:
    • Pass the absolute path when extracting members from SquashFS images as we run the command with working directory in a temporary directory. (#189)
    • Correct adding a comment when we cannot extract a filesystem due to missing libguestfs module. [ ]
    • Don t crash when listing entries in archives if they don t have a listed size such as hardlinks in ISO images. (#188)
  • Output improvements:
    • Strip off the file offset prefix from xxd(1) and show bytes in groups of 4. [ ]
    • Don t emit javap not found in path if it is available in the path but it did not result in an actual difference. [ ]
    • Fix ... not available in path messages when looking for Java decompilers that used the Python class name instead of the command. [ ]
  • Logging improvements:
    • Add a bit more debugging info when launching libguestfs. [ ]
    • Reduce the --debug log noise by truncating the has_some_content messages. [ ]
    • Fix the compare_files log message when the file does not have a literal name. [ ]
  • Codebase improvements:
    • Rewrite and rename exit_if_paths_do_not_exist to not check files multiple times. [ ][ ]
    • Add an add_comment helper method; don t mess with our internal list directly. [ ]
    • Replace some simple usages of str.format with Python f-strings [ ] and make it easier to navigate to the main.py entry point [ ].
    • In the RData comparator, always explicitly return None in the failure case as we return a non-None value in the success one. [ ]
    • Tidy some imports [ ][ ][ ] and don t alias a variable when we do not use it. [ ]
    • Clarify the use of a separate NullChanges quasi-file to represent missing data in the Debian package comparator [ ] and clarify use of a null diff in order to remember an exit code. [ ]
  • Other changes:
    • Profile the launch of libguestfs filesystems. [ ]
    • Clarify and correct our contributing info. [ ][ ][ ][ ][ ][ ]
Jean-Romain Garnier also made the following changes:
  • Allow passing a file with a list of arguments via diffoscope @args.txt. (!62)
  • Improve the output of side-by-side diffs by detecting added lines better. (!64)
  • Remove offsets before instructions in objdump [ ][ ] and remove raw instructions from ELF tests [ ].

Other tools strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. It is used automatically in most Debian package builds. In July, Chris Lamb ensured that we did not install the internal handler documentation generated from Perl POD documents [ ] and fixed a trivial typo [ ]. Marc Herbert added a --verbose-level warning when the Archive::Cpio Perl module is missing. (!6) reprotest is our end-user tool to build same source code twice in widely differing environments and then checks the binaries produced by each build for any differences. This month, Vagrant Cascadian made a number of changes to support diffoscope version 153 which had removed the (deprecated) --exclude-directory-metadata and --no-exclude-directory-metadata command-line arguments, and updated the testing configuration to also test under Python version 3.8 [ ].

Distributions

Debian In June 2020, Timo R hling filed a wishlist bug against the debhelper build tool impacting the reproducibility status of hundreds of packages that use the CMake build system. This month however, Niels Thykier uploaded debhelper version 13.2 that passes the -DCMAKE_SKIP_RPATH=ON and -DBUILD_RPATH_USE_ORIGIN=ON arguments to CMake when using the (currently-experimental) Debhelper compatibility level 14. According to Niels, this change:
should fix some reproducibility issues, but may cause breakage if packages run binaries directly from the build directory.
34 reviews of Debian packages were added, 14 were updated and 20 were removed this month adding to our knowledge about identified issues. Chris Lamb added and categorised the nondeterministic_order_of_debhelper_snippets_added_by_dh_fortran_mod [ ] and gem2deb_install_mkmf_log [ ] toolchain issues. Lastly, Holger Levsen filed two more wishlist bugs against the debrebuild Debian package rebuilder tool [ ][ ].

openSUSE In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update. Bernhard also published the results of performing 12,235 verification builds of packages from openSUSE Leap version 15.2 and, as a result, created three pull requests against the openSUSE Build Result Compare Script [ ][ ][ ].

Other distributions In Arch Linux, there was a mass rebuild of old packages in an attempt to make them reproducible. This was performed because building with a previous release of the pacman package manager caused file ordering and size calculation issues when using the btrfs filesystem. A system was also implemented for Arch Linux packagers to receive notifications if/when their package becomes unreproducible, and packagers now have access to a dashboard where they can all see all their unreproducible packages (more info). Paul Spooren sent two versions of a patch for the OpenWrt embedded distribution for adding a build system revision to the packages manifest so that all external feeds can be rebuilt and verified. [ ][ ]

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of these patches, including: Vagrant Cascadian also reported two issues, the first regarding a regression in u-boot boot loader reproducibility for a particular target [ ] and a non-deterministic segmentation fault in the guile-ssh test suite [ ]. Lastly, Jelle van der Waa filed a bug against the MeiliSearch search API to report that it embeds the current build date.

Testing framework We operate a large and many-featured Jenkins-based testing framework that powers tests.reproducible-builds.org. This month, Holger Levsen made the following changes:
  • Debian-related changes:
    • Tweak the rescheduling of various architecture and suite combinations. [ ][ ]
    • Fix links for 404 and not for us icons. (#959363)
    • Further work on a rebuilder prototype, for example correctly processing the sbuild exit code. [ ][ ]
    • Update the sudo configuration file to allow the node health job to work correctly. [ ]
    • Add php-horde packages back to the pkg-php-pear package set for the bullseye distribution. [ ]
    • Update the version of debrebuild. [ ]
  • System health check development:
    • Add checks for broken SSH [ ], logrotate [ ], pbuilder [ ], NetBSD [ ], unkillable processes [ ], unresponsive nodes [ ][ ][ ][ ], proxy connection failures [ ], too many installed kernels [ ], etc.
    • Automatically fix some failed systemd units. [ ]
    • Add notes explaining all the issues that hosts are experiencing [ ] and handle zipped job log files correctly [ ].
    • Separate nodes which have been automatically marked as down [ ] and show status icons for jobs with issues [ ].
  • Misc:
    • Disable all Alpine Linux jobs until they are or Alpine is fixed. [ ]
    • Perform some general upkeep of build nodes hosted by OSUOSL. [ ][ ][ ][ ]
In addition, Mattia Rizzolo updated the init_node script to suggest using sudo instead of explicit logout and logins [ ][ ] and the usual build node maintenance was performed by Holger Levsen [ ][ ][ ][ ][ ][ ], Mattia Rizzolo [ ][ ] and Vagrant Cascadian [ ][ ][ ][ ].

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

25 July 2020

Niels Thykier: Support for Debian packaging files in IDEA (IntelliJ/PyCharm)

I have been using the community editions of IntelliJ and PyCharm for a while now for Python or Perl projects. But it started to annoy me that for Debian packaging bits it would revert into a fancy version of notepad. Being fed up with it, I set down and spent the last week studying how to write a plugin to fix this.

After a few prototypes, I have now released IDEA-debpkg v0.0.3 (Link to JetBrain s official plugin listing with screenshots). It provides a set of basic features for debian/control like syntax highlighting, various degree of content validation, folding of long fields, code completion and CTRL + hover documentation. For debian/changelog, it is mostly just syntax highlighting with a bit of fancy linking for now. I have not done anything for debian/rules as I noted there is a Makefile plugin, which will have to do for now.

The code is available from github and licensed under Apache-2.0. Contributors, issues/feature requests and pull requests are very welcome. Among things I could help with are:

I hope you will take it for spin if you have been looking for a bit of Debian packaging support to your PyCharm or other IDEA IDE.  Please do file bugs/issues if you run into issues, rough edges or unhelpful documentation, etc.

29 October 2017

Niels Thykier: Building packages without (fake)root

Turns out that it is surprisingly easy to build most packages without (fake)root. You just need to basic changes:
  1. A way to set ownership to root:root of paths when dpkg-deb build constructs the binary.
  2. A way to have debhelper not do a bunch of (now) pointless chowns to root:root .
The above is sufficient for dpkg, debhelper, lintian, apt-file, mscgen, pbuilder and a long list of other packages that only provide paths owned by root:root . Obviously, packages differ and yours might need more tweaks than this (e.g. dh_usrlocal had to change behaviour to support this). But for me, the best part is that the above is not just some random prototype stuck in two git repos on alioth: Unfortunately, if you are working with games or core packages like shadow with need for static ownership different from root:root (usually with a setuid or setgid bit), then our first implementation does not support your needs at the moment[1]. We are working on a separate way to solve static ownership in a declarative way. [1] Note regarding /usr/local : If your package needs to provide directories there owned by root:staff with mode 02775, then dh_usrlocal can handle that. The non- root:root ownership here works because the directories are created in a maintainer script run as root during installation. Unfortunately, it cannot provide different ownership or modes with R != binary-targets at the moment.
Filed under: Debhelper, Debian

30 July 2017

Niels Thykier: Introducing the debhelper buildlabel prototype for multi-building packages

For most packages, the dh short-hand rules (possibly with a few overrides) work great. It can often auto-detect the buildsystem and handle all the trivial parts. With one notably exception: What if you need to compile the upstream code twice (or more) with different flags? This is the case for all source packages building both regular debs and udebs. In that case, you would previously need to override about 5-6 helpers for this to work at all. The five dh_auto_* helpers and usually also dh_install (to call it with different sourcedir for different packages). This gets even more complex if you want to support Build-Profiles such as noudeb and nodoc . The best way to support nodoc in debhelper is to move documentation out of dh_install s config files and use dh_installman, dh_installdocs, and dh_installexamples instead (NB: wait for compat 11 before doing this). This in turn will mean more overrides with sourcedir and -p/-N. And then there is noudeb , which currently requires manual handling in debian/rules. Basically, you need to use make or shell if-statements to conditionally skip the udeb part of the builds. All of this is needlessly complex. Improving the situation In an attempt to make things better, I have made a new prototype feature in debhelper called buildlabels in experimental. The current prototype is designed to deal with part (but not all) of the above problems: However, it currently not solve the need for overriding the dh_auto_* tools and I am not sure when/if it will. The feature relies on being able to relate packages to a given series of calls to dh_auto_*. In the following example, I will use udebs for the secondary build. However, this feature is not tied to udebs in any way and can be used any source package that needs to do two or more upstream builds for different packages. Assume our example source builds the following binary packages: And in the rules file, we would have something like:
[...]
override_dh_auto_configure:
    dh_auto_configure -B build-deb -- --with-feature1 --with-feature2
    dh_auto_configure -B build-udeb -- --without-feature1 --without-feature2
[...]
What is somewhat obvious to a human is that the first configure line is related to the regular debs and the second configure line is for the udebs. However, debhelper does not know how to infer this and this is where buildlabels come in. With buildlabels, you can let debhelper know which packages and builds that belong together. How to use buildlabels To use buildlabels, you have to do three things:
  1. Pick a reasonable label name for the secondary build. In the example, I will use udeb .
  2. Add buildlabel=$LABEL to all dh_auto_* calls related to your secondary build.
  3. Tag all packages related to my-label with X-DH-Buildlabel: $LABEL in debian/control. (For udeb packages, you may want to add Build-Profiles: <!noudeb> while you are at it).
For the example package, we would change the debian/rules snippet to:
[...]
override_dh_auto_configure:
    dh_auto_configure -B build-deb -- --with-feature1 --with-feature2
    dh_auto_configure --buildlabel=udeb -B build-udeb -- --without-feature1 --without-feature2
[...]
(Remember to update *all* calls to dh_auto_* helpers; the above only lists dh_auto_configure to keep the example short.) And then add X-DH-Buildlabel: udeb in the stanzas for foo-udeb + libfoo1-udeb. With those two minor changes: Real example Thanks to Michael Biebl, I was able to make an branch in the systemd git repository to play with this feature. Therefore I have an real example to use as a show case. The gist of it is in the following three commits: Full branch can be seen at: https://anonscm.debian.org/git/pkg-systemd/systemd.git/log/?h=wip-dh-prototype-smarter-multi-builds Request for comments / call for testing This prototype is now in experimental (debhelper/10.7+exp.buildlabels) and you are very welcome to take it for a spin. Please let me know if you find the idea useful and feel free to file bugs or feature requests. If deemed useful, I will merge into master and include in a future release. If you have any questions or comments about the feature or need help with trying it out, you are also very welcome to mail the debhelper-devel mailing list. Known issues / the fine print:
Filed under: Debhelper, Debian

Next.