Search Results: "bremner"

6 November 2021

Reproducible Builds: Reproducible Builds in October 2021

Welcome to the October 2021 report from the Reproducible Builds project!
This month Samanta Navarro posted to the oss-security security mailing on a novel category of exploit in the .tar archive format, where a single .tar file contains different contents depending on the tar utility being used. Naturally, this has consequences for reproducible builds as Samanta goes onto reply:

Arch Linux uses libarchive (bsdtar) in its build environment. The default tar program installed is GNU tar. It is possible to create a source distribution which leads to different files seen by the build environment than compared to a careful reviewer and other Linux distributions.
Samanta notes that addressing the tar utilities themselves will not be a sufficient fix:
I have submitted bug reports and patches to some projects but eventually I had to conclude that the problem itself cannot be fixed by these implementations alone. The best choice for these tools would be to only allow archives which are fully compatible to standards but this in turn would render a lot of archives broken.
Reproducible builds, with its twin ideas of reaching consensus on the build outputs as well as precisely recording and describing the build environment, would help address this problem at a higher level.
Codethink announced that they had achieved ISO-26262 ASIL D Tool Certification, a way of determining specific safety standards for software. Codethink used open source tooling to achieve this, but they also leverage:
Reproducibility, repeatability and traceability of builds, drawing heavily on best-practices championed by the Reproducible Builds project.

Elsewhere on the internet, according to a comment on Hacker News, Microsoft are now comparing NPM Javascript packages with their original source repositories:
I got a PR in my repository a few days ago leading back to a team trying to make it easier for packages to be reproducible from source.

Lastly, Martin Monperrus started an interesting thread on our mailing list about Github, specifically that their autogenerated release tarballs are not deterministic . The thread generated a significant number of replies that are worth reading.

Events and presentations

Community news On our mailing list this month:
There were quite a few changes to the Reproducible Builds website and documentation this month as well, including Feng Chai updating some links on our publications page [ ] and marco updated our project metadata around the Bitcoin Core building guide [ ].
Lastly, we ran another productive meeting on IRC during October. A full set of notes from the meeting is available to view.

Distribution work Qubes was heavily featured in the latest edition of Linux Weekly News, and a significant section was dedicated to discussing reproducibility. For example, it was mentioned that the Qubes project has been working on incorporating reproducible builds into its continuous integration (CI) infrastructure . But the LWN article goes on to describe that:
The current goal is to be able to build the Qubes OS Debian templates solely from packages that can be built reproducibly. Templates in Qubes OS are VM images that can be used to start an application qube quickly based on the template. The qube will have read-only access to the root filesystem of the template, so that the same root filesystem can be shared with multiple application qubes. There are official templates for several variants of both Fedora and Debian, as well as community maintained templates for several other distributions.
You can view the whole article on LWN, and Fr d ric also published a lengthy summary about their work on reproducible builds in Qubes as well for those wishing to learn more.
In Debian this month, 133 reviews of Debian packages were added, 81 were updated and 24 were removed this month, adding to Debian s ever-growing knowledge about identified issues. A number of issues were categorised and added by Chris Lamb and Vagrant Cascadian too [ ][ ][ ]. In addition, work on alternative snapshot service has made progress by Fr d ric Pierret and Holger Levsen this month, including moving from the existing host (snapshot.notset.fr) to snapshot.reproducible-builds.org (more info) thanks to OSUOSL for the machine and hosting and Debian for the disks.
Finally, Bernhard M. Wiedemann posted his monthly reproducible builds status report.

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb made the following changes, including preparing and uploading versions 186, 187, 188 and 189 to Debian
  • New features:
    • Add support for Python Sphinx inventory files (usually named objects.inv on-disk). [ ]
    • Add support for comparing .pyc files. Thanks to Sergei Trofimovich for the inspiration. [ ]
    • Try some alternative suffixes (e.g. .py) to support distributions that strip or retain them. [ ][ ]
  • Bug fixes:
    • Fix Python decompilation tests under Python 3.10+ [ ] and for Python 3.7 [ ].
    • Don t raise a traceback if we cannot unmarshal Python bytecode. This is in order to support Python 3.7 failing to load .pyc files generated with newer versions of Python. [ ]
    • Skip Python bytecode testing where we do not have an expected diff. [ ]
  • Codebase improvements:
    • Use our file_version_is_lt utility instead of accepting both versions of uImage expected diff. [ ]
    • Split out a custom call to assert_diff for a .startswith equivalent. [ ]
    • Use skipif instead of manual conditionals in some tests. [ ]
In addition, Jelle van der Waa added external tool references for Arch Linux for ocamlobjinfo, openssl and ffmpeg [ ][ ][ ] and added Arch Linux as a Continuous Integration (CI) test target. [ ] and Vagrant Cascadian updated the testsuite to skip Python bytecode comparisons when file(1) is older than 5.39. [ ] as well as added external tool references for the Guix distribution for dumppdf and ppudump. [ ][ ]. Vagrant Cascadian also updated the diffoscope package in GNU Guix [ ][ ]. Lastly, Guangyuan Yang updated the FreeBSD package name on the website [ ], Mattia Rizzolo made a change to override a new Lintian warning due to the new test files [ ], Roland Clobus added support to detect and log if the GNU_BUILD_ID field in an ELF binary been modified [ ], Sandro J ckel updated a number of helpful links on the website [ ] and Sergei Trofimovich made the uImage test output support file() version 5.41 [ ].

reprotest reprotest is the Reproducible Build s project end-user tool to build same source code twice in widely differing environments, checking the binaries produced by the builds for any differences. This month, reprotest version 0.7.18 was uploaded to Debian unstable by Holger Levsen, which also included a change by Holger to clarify that Python 3.9 is used nowadays [ ], but it also included two changes by Vasyl Gello to implement realistic CPU architecture shuffling [ ] and to log the selected variations when the verbosity is configured at a sufficiently high level [ ]. Finally, Vagrant Cascadian updated reprotest to version 0.7.18 in GNU Guix.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix unreproducible packages. We try to send all of our patches upstream where appropriate. We authored a large number of such patches this month, including:

Testing framework The Reproducible Builds project runs a testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Holger Levsen:
    • Debian-related changes:
      • Incorporate a fix from bremner into builtin-pho related to binary-NMUs. [ ]
      • Keep bullseye environments around longer, in an attempt to fix a Jenkins issue. [ ]
      • Improve the documentation of buildinfos.debian.net. [ ]
      • Improve documentation for the builtin-pho setup. [ ][ ]
    • OpenWrt-related changes:
      • Also use -j1 for better debugging. [ ]
      • Document that that Python 3.x is now used. [ ]
      • Enable further debugging for the toolchain build. [ ]
    • New snapshot.reproducible-builds.org service:
      • Actually add new node. [ ][ ]
      • Install xfsprogs on snapshot.reproducible-builds.org. [ ]
      • Create account for fpierret on new node. [ ]
      • Run node_health_check job on new node too. [ ]
  • Mattia Rizzolo:
    • Debian-related changes:
      • Handle schroot errors when invoking diffoscope instead of masking them. [ ][ ]
      • Declare and define some variables separately to avoid masking the subshell return code. [ ]
      • Fix variable name. [ ]
      • Improve log reporting. [ ]
      • Execute apt-get update with the -q argument to get more decent logs. [ ]
      • Set the Debian HTTP mirror and proxy for snapshot.reproducible-builds.org. [ ]
      • Install the libarchive-tools package (instead of bsdtar) when updating Jenkins nodes. [ ]
    • Be stricter about errors when starting the node agent [ ] and don t overwrite NODE_NAME so that we can expect Jenkins to properly set for us [ ].
    • Explicitly warn if the NODE_NAME is not a fully-qualified domain name (FQDN). [ ]
    • Document whether a node runs in the future. [ ]
    • Disable postgresql_autodoc as it not available in bullseye. [ ]
    • Don t be so eager when deleting schroot internals, call to schroot -e to terminate the schroots instead. [ ]
    • Only consider schroot underlays for deletion that are over a month old. [ ][ ]
    • Only try to unmount /proc if it s actually mounted. [ ]
    • Move the db_backup task to its own Jenkins job. [ ]
Lastly, Vasyl Gello added usage information to the reproducible_build.sh script [ ].

Contributing If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

10 June 2021

Louis-Philippe V ronneau: New Desktop Computer

I built my last desktop computer what seems like ages ago. In 2011, I was in a very different place, both financially and as a person. At the time, I was earning minimum wage at my school's caf to pay rent. Since the caf was owned by the school cooperative, I had an employee discount on computer parts. This gave me a chance to build my first computer from spare parts at a reasonable price. After 10 years of service1, the time has come to upgrade. Although this machine was still more than capable for day to day tasks like browsing the web or playing casual video games, it started to show its limits when time came to do more serious work. Old computer specs:
CPU: AMD FX-8530
Memory: 8GB DDR3 1600Mhz
Motherboard: ASUS TUF SABERTOOTH 990FX R2.0
Storage: Samsung 850 EVO 500GB SATA
I first started considering an upgrade in September 2020: David Bremner was kindly fixing a bug in ledger that kept me from balancing my books and since it seemed like a class of bug that would've been easily caught by an autopkgtest, I decided to add one. After adding the necessary snippets to run the upstream testsuite (an easy task I've done multiple times now), I ran sbuild and ... my computer froze and crashed. Somehow, what I thought was a simple Python package was maxing all the cores on my CPU and using all of the 8GB of memory I had available.2 A few month later, I worked on jruby and the builds took 20 to 30 minutes long enough to completely disrupt my flow. The same thing happened when I wanted to work on lintian: the testsuite would take more than 15 minutes to run, making quick iterations impossible. Sadly, the pandemic completely wrecked the computer hardware market and prices here in Canada have only recently started to go down again. As a result, I had to wait more time than I would've liked not to pay scalper prices. New computer specs:
CPU: AMD Ryzen 5900X
Memory: 64GB DDR4 3200MHz
Motherboard: MSI MPG B550 Gaming Plus
Storage: Corsair MP600 500 GB Gen4 NVME
The difference between the two machines is pretty staggering: I've gone from a CPU with 2 cores and 8 threads, to one with 12 cores and 24 threads. Not only that, but single-threaded performance has also vastly increased in those 10 years. A good example would be building grammalecte, a package I've recently sponsored. I feel it's a good benchmark, since the build relies on single-threaded performance for the normal Python operations, while being threaded when it compiles the dictionaries. On the old computer:
Build needed 00:10:07, 273040k disk space
And as you can see, on the new computer the build time has been significantly reduced:
Build needed 00:03:18, 273040k disk space
Same goes for things like the lintian testsuite. Since it's a very multi-threaded workload, it now takes less than 2 minutes to run; a 750% improvement. All this to say I'm happy with my purchase. And lo and behold I can now build ledger without a hitch, even though it maxes my 24 threads and uses 28GB of RAM. Who would've thought... Screen capture of htop showing how much resources ledger takes to build

  1. I managed to fry that PC's motherboard in 2016 and later replaced it with a brand new one. I also upgraded the storage along the way, from a very cheap cacheless 120GB SSD to a larger Samsung 850 EVO SATA drive.
  2. As it turns out, ledger is mostly written in C++ :)

1 June 2021

David Bremner: Baby steps towards schroot and slurm cooperation.

Unfortunately schroot does not maintain CPU affinity 1. This means in particular that parallel builds have the tendency to take over an entire slurm managed server, which is kindof rude. I haven't had time to automate this yet, but following demonstrates a simple workaround for interactive building.
  simplex:~
 % schroot --preserve-environment -r -c polymake
(unstable-amd64-sbuild)bremner@simplex:~$ echo $SLURM_CPU_BIND_LIST
0x55555555555555555555
(unstable-amd64-sbuild)bremner@simplex:~$ grep Cpus /proc/self/status
Cpus_allowed:   ffff,ffffffff,ffffffff
Cpus_allowed_list:      0-79
(unstable-amd64-sbuild)bremner@simplex:~$ taskset $SLURM_CPU_BIND_LIST bash
(unstable-amd64-sbuild)bremner@simplex:~$ grep Cpus /proc/self/status
Cpus_allowed:   5555,55555555,55555555
Cpus_allowed_list:      0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78

Next steps In principle the schroot configuration parameter can be used to run taskset before every command. In practice it's a bit fiddly because you need a shell script shim (because the environment variable) and you need to e.g. goof around with bind mounts to make sure that your script is available in the chroot. And then there's combining with ccache and eatmydata...

18 July 2020

David Bremner: git-annex and ikiwiki, not as hard as I expected

Background So apparently there's this pandemic thing, which means I'm teaching "Alternate Delivery" courses now. These are just like online courses, except possibly more synchronous, definitely less polished, and the tuition money doesn't go to the College of Extended Learning. I figure I'll need to manage share videos, and our learning management system, in the immortal words of Marie Kondo, does not bring me joy. This has caused me to revisit the problem of sharing large files in an ikiwiki based site (like the one you are reading). My goto solution for large file management is git-annex. The last time I looked at this (a decade ago or so?), I was blocked by git-annex using symlinks and ikiwiki ignoring them for security related reasons. Since then two things changed which made things relatively easy.
  1. I started using the rsync_command ikiwiki option to deploy my site.
  2. git-annex went through several design iterations for allowing non-symlink access to large files.

TL;DR In my ikiwiki config
    # attempt to hardlink source files? (optimisation for large files)
    hardlink => 1,
In my ikiwiki git repo
$ git annex init
$ git annex add foo.jpg
$ git commit -m&aposadd big photo&apos
$ git annex adjust --unlock                 # look ikiwiki, no symlinks
$ ikiwiki --setup ~/.config/ikiwiki/client  # rebuild my local copy, for review
$ ikiwiki --setup /home/bremner/.config/ikiwiki/rsync.setup --refresh  # deploy
You can see the result at photo

6 July 2020

Reproducible Builds: Reproducible Builds in June 2020

Welcome to the June 2020 report from the Reproducible Builds project. In these reports we outline the most important things that we and the rest of the community have been up to over the past month.

What are reproducible builds? One of the original promises of open source software is that distributed peer review and transparency of process results in enhanced end-user security. But whilst anyone may inspect the source code of free and open source software for malicious flaws, almost all software today is distributed as pre-compiled binaries. This allows nefarious third-parties to compromise systems by injecting malicious code into seemingly secure software during the various compilation and distribution processes.

News The GitHub Security Lab published a long article on the discovery of a piece of malware designed to backdoor open source projects that used the build process and its resulting artifacts to spread itself. In the course of their analysis and investigation, the GitHub team uncovered 26 open source projects that were backdoored by this malware and were actively serving malicious code. (Full article) Carl Dong from Chaincode Labs uploaded a presentation on Bitcoin Build System Security and reproducible builds to YouTube: The app intended to trace infection chains of Covid-19 in Switzerland published information on how to perform a reproducible build. The Reproducible Builds project has received funding in the past from the Open Technology Fund (OTF) to reach specific technical goals, as well as to enable the project to meet in-person at our summits. The OTF has actually also assisted countless other organisations that promote transparent, civil society as well as those that provide tools to circumvent censorship and repressive surveillance. However, the OTF has now been threatened with closure. (More info) It was noticed that Reproducible Builds was mentioned in the book End-user Computer Security by Mark Fernandes (published by WikiBooks) in the section titled Detection of malware in software. Lastly, reproducible builds and other ideas around software supply chain were mentioned in a recent episode of the Ubuntu Podcast in a wider discussion about the Snap and application stores (at approx 16:00).

Distribution work In the ArchLinux distribution, a goal to remove .doctrees from installed files was created via Arch s TODO list mechanism. These .doctree files are caches generated by the Sphinx documentation generator when developing documentation so that Sphinx does not have to reparse all input files across runs. They should not be packaged, especially as they lead to the package being unreproducible as their pickled format contains unreproducible data. Jelle van der Waa and Eli Schwartz submitted various upstream patches to fix projects that install these by default. Dimitry Andric was able to determine why the reproducibility status of FreeBSD s base.txz depended on the number of CPU cores, attributing it to an optimisation made to the Clang C compiler [ ]. After further detailed discussion on the FreeBSD bug it was possible to get the binaries reproducible again [ ]. For the GNU Guix operating system, Vagrant Cascadian started a thread about collecting reproducibility metrics and Jan janneke Nieuwenhuizen posted that they had further reduced their bootstrap seed to 25% which is intended to reduce the amount of code to be audited to avoid potential compiler backdoors. In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update as well as made the following changes within the distribution itself:

Debian Holger Levsen filed three bugs (#961857, #961858 & #961859) against the reproducible-check tool that reports on the reproducible status of installed packages on a running Debian system. They were subsequently all fixed by Chris Lamb [ ][ ][ ]. Timo R hling filed a wishlist bug against the debhelper build tool impacting the reproducibility status of 100s of packages that use the CMake build system which led to a number of tests and next steps. [ ] Chris Lamb contributed to a conversation regarding the nondeterministic execution of order of Debian maintainer scripts that results in the arbitrary allocation of UNIX group IDs, referencing the Tails operating system s approach this [ ]. Vagrant Cascadian also added to a discussion regarding verification formats for reproducible builds. 47 reviews of Debian packages were added, 37 were updated and 69 were removed this month adding to our knowledge about identified issues. Chris Lamb identified and classified a new uids_gids_in_tarballs_generated_by_cmake_kde_package_app_templates issue [ ] and updated the paths_vary_due_to_usrmerge as deterministic issue, and Vagrant Cascadian updated the cmake_rpath_contains_build_path and gcc_captures_build_path issues. [ ][ ][ ]. Lastly, Debian Developer Bill Allombert started a mailing list thread regarding setting the -fdebug-prefix-map command-line argument via an environment variable and Holger Levsen also filed three bugs against the debrebuild Debian package rebuilder tool (#961861, #961862 & #961864).

Development On our website this month, Arnout Engelen added a link to our Mastodon account [ ] and moved the SOURCE_DATE_EPOCH git log example to another section [ ]. Chris Lamb also limited the number of news posts to avoid showing items from (for example) 2017 [ ]. strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. It is used automatically in most Debian package builds. This month, Mattia Rizzolo bumped the debhelper compatibility level to 13 [ ] and adjusted a related dependency to avoid potential circular dependency [ ].

Upstream work The Reproducible Builds project attempts to fix unreproducible packages and we try to to send all of our patches upstream. This month, we wrote a large number of such patches including: Bernhard M. Wiedemann also filed reports for frr (build fails on single-processor machines), ghc-yesod-static/git-annex (a filesystem ordering issue) and ooRexx (ASLR-related issue).

diffoscope diffoscope is our in-depth diff-on-steroids utility which helps us diagnose reproducibility issues in packages. It does not define reproducibility, but rather provides a helpful and human-readable guidance for packages that are not reproducible, rather than relying essentially-useless binary diffs. This month, Chris Lamb uploaded versions 147, 148 and 149 to Debian and made the following changes:
  • New features:
    • Add output from strings(1) to ELF binaries. (#148)
    • Dump PE32+ executables (such as EFI applications) using objdump(1). (#181)
    • Add support for Zsh shell completion. (#158)
  • Bug fixes:
    • Prevent a traceback when comparing PDF documents that did not contain metadata (ie. a PDF /Info stanza). (#150)
    • Fix compatibility with jsondiff version 1.2.0. (#159)
    • Fix an issue in GnuPG keybox file handling that left filenames in the diff. [ ]
    • Correct detection of JSON files due to missing call to File.recognizes that checks candidates against file(1). [ ]
  • Output improvements:
    • Use the CSS word-break property over manually adding U+200B zero-width spaces as these were making copy-pasting cumbersome. (!53)
    • Downgrade the tlsh warning message to an info level warning. (#29)
  • Logging improvements:
  • Testsuite improvements:
    • Update tests for file(1) version 5.39. (#179)
    • Drop accidentally-duplicated copy of the --diff-mask tests. [ ]
    • Don t mask an existing test. [ ]
  • Codebase improvements:
    • Replace obscure references to WF with Wagner-Fischer for clarity. [ ]
    • Use a semantic AbstractMissingType type instead of remembering to check for both types of missing files. [ ]
    • Add a comment regarding potential security issue in the .changes, .dsc and .buildinfo comparators. [ ]
    • Drop a large number of unused imports. [ ][ ][ ][ ][ ]
    • Make many code sections more Pythonic. [ ][ ][ ][ ]
    • Prevent some variable aliasing issues. [ ][ ][ ]
    • Use some tactical f-strings to tidy up code [ ][ ] and remove explicit u"unicode" strings [ ].
    • Refactor a large number of routines for clarity. [ ][ ][ ][ ]
trydiffoscope is the web-based version of diffoscope. This month, Chris Lamb also corrected the location for the celerybeat scheduler to ensure that the clean/tidy tasks are actually called which had caused an accidental resource exhaustion. (#12) In addition Jean-Romain Garnier made the following changes:
  • Fix the --new-file option when comparing directories by merging DirectoryContainer.compare and Container.compare. (#180)
  • Allow user to mask/filter diff output via --diff-mask=REGEX. (!51)
  • Make child pages open in new window in the --html-dir presenter format. [ ]
  • Improve the diffs in the --html-dir format. [ ][ ]
Lastly, Daniel Fullmer fixed the Coreboot filesystem comparator [ ] and Mattia Rizzolo prevented warnings from the tlsh fuzzy-matching library during tests [ ] and tweaked the build system to remove an unwanted .build directory [ ]. For the GNU Guix distribution Vagrant Cascadian updated the version of diffoscope to version 147 [ ] and later 148 [ ].

Testing framework We operate a large and many-featured Jenkins-based testing framework that powers tests.reproducible-builds.org. Amongst many other tasks, this tracks the status of our reproducibility efforts across many distributions as well as identifies any regressions that have been introduced. This month, Holger Levsen made the following changes:
  • Debian-related changes:
    • Prevent bogus failure emails from rsync2buildinfos.debian.net every night. [ ]
    • Merge a fix from David Bremner s database of .buildinfo files to include a fix regarding comparing source vs. binary package versions. [ ]
    • Only run the Debian package rebuilder job twice per day. [ ]
    • Increase bullseye scheduling. [ ]
  • System health status page:
    • Add a note displaying whether a node needs to be rebooted for a kernel upgrade. [ ]
    • Fix sorting order of failed jobs. [ ]
    • Expand footer to link to the related Jenkins job. [ ]
    • Add archlinux_html_pages, openwrt_rebuilder_today and openwrt_rebuilder_future to known broken jobs. [ ]
    • Add HTML <meta> header to refresh the page every 5 minutes. [ ]
    • Count the number of ignored jobs [ ], ignore permanently known broken jobs [ ] and jobs on known offline nodes [ ].
    • Only consider the known offline status from Git. [ ]
    • Various output improvements. [ ][ ]
  • Tools:
    • Switch URLs for the Grml Live Linux and PureOS package sets. [ ][ ]
    • Don t try to build a disorderfs Debian source package. [ ][ ][ ]
    • Stop building diffoscope as we are moving this to Salsa. [ ][ ]
    • Merge several is diffoscope up-to-date on every platform? test jobs into one [ ] and fail less noisily if the version in Debian cannot be determined [ ].
In addition: Marcus Hoffmann was added as a maintainer of the F-Droid reproducible checking components [ ], Jelle van der Waa updated the is diffoscope up-to-date in every platform check for Arch Linux and diffoscope [ ], Mattia Rizzolo backed up a copy of a remove script run on the Codethink-hosted jump server [ ] and Vagrant Cascadian temporarily disabled the fixfilepath on bullseye, to get better data about the ftbfs_due_to_f-file-prefix-map categorised issue. Lastly, the usual build node maintenance was performed by Holger Levsen [ ][ ], Mattia Rizzolo [ ] and Vagrant Cascadian [ ][ ][ ][ ][ ].

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

This month s report was written by Bernhard M. Wiedemann, Chris Lamb, Eli Schwartz, Holger Levsen, Jelle van der Waa and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.

8 April 2020

David Bremner: Tangling multiple files

I have lately been using org-mode literate programming to generate example code and beamer slides from the same source. I hit a wall trying to re-use functions in multiple files, so I came up with the following hack. Thanks 'ngz' on #emacs and Charles Berry on the org-mode list for suggestions and discussion.
(defun db-extract-tangle-includes ()
  (goto-char (point-min))
  (let ((case-fold-search t)
        (retval nil))
    (while (re-search-forward "^#[+]TANGLE_INCLUDE:" nil t)
      (let ((element (org-element-at-point)))
        (when (eq (org-element-type element) &aposkeyword)
          (push (org-element-property :value element) retval))))
    retval))
(defun db-ob-tangle-hook ()
  (let ((includes (db-extract-tangle-includes)))
    (mapc #&aposorg-babel-lob-ingest includes)))
(add-hook &aposorg-babel-pre-tangle-hook #&aposdb-ob-tangle-hook t)
Use involves something like the following in your org-file.
#+SETUPFILE: presentation-settings.org
#+SETUPFILE: tangle-settings.org
#+TANGLE_INCLUDE: lecture21.org
#+TITLE: GC V: Mark & Sweep with free list
For batch export with make, I do something like
%.tangle-stamp: %.org
    emacs --batch --quick  -l org  -l $ HOME /.emacs.d/org-settings.el --eval "(org-babel-tangle-file \"$<\")"
    touch $@

6 March 2020

Reproducible Builds: Reproducible Builds in February 2020

Welcome to the February 2020 report from the Reproducible Builds project. One of the original promises of open source software is that distributed peer review and transparency of process results in enhanced end-user security. However, whilst anyone may inspect the source code of free and open source software for malicious flaws, almost all software today is distributed as pre-compiled binaries. This allows nefarious third-parties to compromise systems by injecting malicious code into ostensibly secure software during the various compilation and distribution processes. The motivation behind the reproducible builds effort is to provide the ability to demonstrate these binaries originated from a particular, trusted, source release: if identical results are generated from a given source in all circumstances, reproducible builds provides the means for multiple third-parties to reach a consensus on whether a build was compromised via distributed checksum validation or some other scheme. In this month s report, we cover:

If you are interested in contributing to the project, please visit our Contribute page on our website.

Media coverage & upstream news Omar Navarro Leija, a PhD student at the University Of Pennsylvania, published a paper entitled Reproducible Containers that describes in detail the workings of a new user-space container tool called DetTrace:
All computation that occurs inside a DetTrace container is a pure function of the initial filesystem state of the container. Reproducible containers can be used for a variety of purposes, including replication for fault-tolerance, reproducible software builds and reproducible data analytics. We use DetTrace to achieve, in an automatic fashion, reproducibility for 12,130 Debian package builds, containing over 800 million lines of code, as well as bioinformatics and machine learning workflows.
There was also considerable discussion on our mailing list regarding this research and a presentation based on the paper will occur at the ASPLOS 2020 conference between March 16th 20th in Lausanne, Switzerland. The many virtues of Reproducible Builds were touted as benefits for software compliance in a talk at FOSDEM 2020, debating whether the Careful Inventory of Licensing Bill of Materials Have Impact of FOSS License Compliance which pitted Jeff McAffer and Carol Smith against Bradley Kuhn and Max Sills. (~47 minutes in). Nobuyoshi Nakada updated the canonical implementation of the Ruby programming language a change such that filesystem globs (ie. calls to list the contents of filesystem directories) will henceforth be sorted in ascending order. Without this change, the underlying nondeterministic ordering of the filesystem is exposed to the language which often results in an unreproducible build. Vagrant Cascadian reported on our mailing list regarding a quick reproducible test for the GNU Guix distribution, which resulted in 81.9% of packages registering as reproducible in his installation:
$ guix challenge --verbose --diff=diffoscope ...
2,463 store items were analyzed:
  - 2,016 (81.9%) were identical
  - 37 (1.5%) differed
  - 410 (16.6%) were inconclusive
Jeremiah Orians announced on our mailing list the release of a number of tools related to cross-compilation such as M2-Planet and mescc-tools-seed. This project attemps a full bootstrap of a cross-platform compiler for the C programming language (written in C itself) from hex, the ultimate goal being able to demonstrate fully-bootstrapped compiler from hex to the GCC GNU Compiler Collection. This has many implications in and around Ken Thompson s Trusting Trust attack outlined in Thompson s 1983 Turing Award Lecture. Twitter user @TheYoctoJester posted an executive summary of reproducible builds in the Yocto Project: Finally, Reddit user tofflos posted to the /r/Java subreddit asking about how to achieve reproducible builds with Maven and Chris Lamb noticed that the Linux kernel documentation about reproducible builds of it is available on the kernel.org homepages in an attractive HTML format.

Distribution work

Debian Chris Lamb created a merge request for the core debian-installer package to allow all arguments and options from sources.list files (such as [check-valid-until=no] , etc.) in order that we can test the reproducibility of the installer images on the Reproducible Builds own testing infrastructure. (#13) Thorsten Glaser followed-up to a bug filed against the dpkg-source component that was originally filed in late 2015 that claims that the build tool does not respect permissions when unpacking tarballs if the umask is set to 0002. Matthew Garrett posted to the debian-devel mailing list on the topic of Producing verifiable initramfs images as part of a wider conversation on being able to trust the entire software stack on our computers. 59 reviews of Debian packages were added, 30 were updated and 42 were removed this month adding to our knowledge about identified issues. Many issue types were noticed and categorised by Chris Lamb, including:

openSUSE In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update as well as provided the following patches:

Software development

diffoscope diffoscope is our in-depth and content-aware diff-like utility that can locate and diagnose reproducibility issues. It is run countless times a day on our testing infrastructure and is essential for identifying fixes and causes of nondeterministic behaviour. Chris Lamb made the following changes this month, including uploading version 137 to Debian:
  • The sng image utility appears to return with an exit code of 1 if there are even minor errors in the file. (#950806)
  • Also extract classes2.dex, classes3.dex from .apk files extracted by apktool. (#88)
  • No need to use str.format if we are just returning the string. [ ]
  • Add generalised support for ignoring returncodes [ ] and move special-casing of returncodes in zip to use Command.VALID_RETURNCODES. [ ]

Other tools disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into directory system calls in order to flush out reproducibility issues. This month, Vagrant Cascadian updated the Vcs-Git to specify the debian packaging branch. [ ] reprotest is our end-user tool to build same source code twice in widely differing environments and then checks the binaries produced by each build for any differences. This month, versions 0.7.13 and 0.7.14 were uploaded to Debian unstable by Holger Levsen after Vagrant Cascadian added support for GNU Guix [ ].

Project documentation & website There was more work performed on our documentation and website this month. Bernhard M. Wiedemann added a Java Gradle Build Tool snippet to the SOURCE_DATE_EPOCH documentation [ ] and normalised various terms to unreproducible [ ]. Chris Lamb added a Meson.build example [ ] and improved the documentation for the CMake [ ] to the SOURCE_DATE_EPOCH documentation, replaced anyone can with anyone may as, well, not everyone has the resources, skills, time or funding to actually do what it refers to [ ] and improved the pre-processing for our report generation [ ][ ][ ][ ] etc. In addition, Holger Levsen updated our news page to improve the list of reports [ ], added an explicit mention of the weekly news time span [ ] and reverted sorting of news entries to have latest on top [ ] and Mattia Rizzolo added Codethink as a non-fiscal sponsor [ ] and lastly Tianon Gravi added a Docker Images link underneath the Debian project on our Projects page [ ].

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including: Vagrant Cascadian submitted patches via the Debian bug tracking system targeting the packages the Civil Infrastructure Platform has identified via the CIP and CIP build depends package sets:

Testing framework We operate a fully-featured and comprehensive Jenkins-based testing framework that powers tests.reproducible-builds.org. This month, the following changes were made by Holger Levsen: In addition, Mattia Rizzolo added an Apache web server redirect for buildinfos.debian.net [ ] and reverted the reshuffling of arm64 architecture builders [ ]. The usual build node maintenance was performed by Holger Levsen, Mattia Rizzolo [ ][ ] and Vagrant Cascadian.

Getting in touch If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

This month s report was written by Bernhard M. Wiedemann, Chris Lamb and Holger Levsen. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.

17 September 2017

Russ Allbery: Free software log (July and August 2017)

I've wanted to start making one of these posts for a few months but have struggled to find the time. But it seems like a good idea, particularly since I get more done when I write down what I do, so you all get a rather belated one. This covers July and August; hopefully the September one will come closer to the end of September. Debian August was DebConf, which included a ton of Policy work thanks to Sean Whitton's energy and encouragement. During DebConf, we incorporated work from Hideki Yamane to convert Policy to reStructuredText, which has already made it far easier to maintain. (Thanks also to David Bremner for a lot of proofreading of the result.) We also did a massive bug triage and closed a ton of older bugs on which there had been no forward progress for many years. After DebConf, as expected, we flushed out various bugs in the reStructuredText conversion and build infrastructure. I fixed a variety of build and packaging issues and started doing some more formatting cleanup, including moving some footnotes to make the resulting document more readable. During July and August, partly at DebConf and partly not, I also merged wording fixes for seven bugs and proposed wording (not yet finished) for three more, as well as participated in various Policy discussions. Policy was nearly all of my Debian work over these two months, but I did upload a new version of the webauth package to build with OpenSSL 1.1 and drop transitional packages. Kerberos I still haven't decided my long-term strategy with the Kerberos packages I maintain. My personal use of Kerberos is now fairly marginal, but I still care a lot about the software and can't convince myself to give it up. This month, I started dusting off pam-krb5 in preparation for a new release. There's been an open issue for a while around defer_pwchange support in Heimdal, and I spent some time on that and tracked it down to an upstream bug in Heimdal as well as a few issues in pam-krb5. The pam-krb5 issues are now fixed in Git, but I haven't gotten any response upstream from the Heimdal bug report. I also dusted off three old Heimdal patches and submitted them as upstream merge requests and reported some more deficiencies I found in FAST support. On the pam-krb5 front, I updated the test suite for the current version of Heimdal (which changed some of the prompting) and updated the portability support code, but haven't yet pulled the trigger on a new release. Other Software I merged a couple of pull requests in podlators, one to fix various typos (thanks, Jakub Wilk) and one to change the formatting of man page references and function names to match the current Linux manual page standard (thanks, Guillem Jover). I also documented a bad interaction with line-buffered output in the Term::ANSIColor man page. Neither of these have seen a new release yet.

2 September 2017

David Bremner: Indexing Debian's buildinfo

Introduction Debian is currently collecting buildinfo but they are not very conveniently searchable. Eventually Chris Lamb's buildinfo.debian.net may solve this problem, but in the mean time, I decided to see how practical indexing the full set of buildinfo files is with sqlite.

Hack
  1. First you need a copy of the buildinfo files. This is currently about 2.6G, and unfortunately you need to be a debian developer to fetch it.
     $ rsync -avz mirror.ftp-master.debian.org:/srv/ftp-master.debian.org/buildinfo .
    
  2. Indexing takes about 15 minutes on my 5 year old machine (with an SSD). If you index all dependencies, you get a database of about 4G, probably because of my natural genius for database design. Restricting to debhelper and dh-elpa, it's about 17M.
     $ python3 index.py
    
    You need at least python3-debian installed
  3. Now you can do queries like
     $ sqlite3 depends.sqlite "select * from depends where depend='dh-elpa' and depend_version<='0106'"
    
    where 0106 is some adhoc normalization of 1.6

Conclusions The version number hackery is pretty fragile, but good enough for my current purposes. A more serious limitation is that I don't currently have a nice (and you see how generous my definition of nice is) way of limiting to builds currently available e.g. in Debian unstable.

Antoine Beaupr : My free software activities, August 2017

Debian Long Term Support (LTS) This is my monthly Debian LTS report. This month I worked on a few major packages that took a long time instead of multiple smaller issues. Affected packages were Mercurial, libdbd-mysql-perl and Ruby.

Mercurial updates Mercurial was vulnerable to two CVEs: CVE-2017-1000116 (command injection on clients through malicious ssh URLs) and CVE-2017-1000115 (path traversal via symlink). The former is an issue that actually affects many other similar software like Git (CVE-2017-1000117), Subversion (CVE-2017-9800) and even CVS (CVE-2017-12836). The latter symlink issue is a distinct issue that came up during an internal audit. The fix, shipped as DLA-1072-1, involved a rather difficult backport, especially because the Mercurial test suite takes a long time to complete. This reminded me of the virtues of DEB_BUILD_OPTIONS=parallel=4, which sped up the builds considerably. I also discovered that the Wheezy build chain doesn't support sbuild's --source-only-changes flag which I had hardcoded in my sbuild.conf file. This seems to be simply because sbuild passes --build=source to dpkg-buildpackage, an option that is supported only in jessie or later.

libdbd-mysql-perl I have worked on fixing two issues with the libdbd-mysql-perl package, CVE-2017-10788 and CVE-2017-10789, which resulted in the DLA-1079-1 upload. Behind this mysteriously named package sits a critical piece of infrastructure, namely the mysql commandline client which is probably used and abused by hundreds if not thousands of home-made scripts, but also all of Perl's MySQL support, which is probably used by even a larger base of software. Through the Debian bug reports (Debian bug #866818 and Debian bug #866821), I have learned that the patches existed in the upstream tracker but were either ignored or even reverted in the latest 4.043 upstream release. It turns out that there are talks of forking that library because of maintainership issue. It blows my mind that such an important part of MySQL is basically unmaintained. I ended up backporting the upstream patches, which was also somewhat difficult because of the long-standing issues with SSL support in MySQL. The backport there was particularly hard to test, as you need to run that test suite by hand, twice: once with a server configured with a (valid!) SSL certificate and one without (!). I'm wondering how much time it is really worth spending on trying to fix SSL in MySQL, however. It has been badly broken forever, and while the patch is an improvement, I would actually still never trust SSL transports in MySQL over an untrusted network. The few people that I know use such transports wrap their connections around a simpler stunnel instead. The other issue was easier to fix so I submitted a pull request upstream to make sure that work isn't lost, although it is not clear what the future of that patch (or project!) will be at this point.

Rubygems I also worked on the rubygems issues, which, thanks to the "vendoring" practice of the Ruby community, also affects the ruby1.9 package. 4 distinct CVEs were triaged here (CVE-2017-0899, CVE-2017-0900, CVE-2017-0901 and CVE-2017-0902) and I determined the latter issue didn't affect wheezy as rubygems doesn't do its own DNS resolution there (later versions lookup SRV records). This is another package where the test suite takes a long time to run. Worse, the packages in Wheezy actually fails to build from source: the test suites just fail in various steps, particularly because of dh key too small errors for Rubygems, but also other errors for Ruby. I also had trouble backporting one test which I had to simply skip for Rubygems. I uploaded and announced test packages and hopefully I'll be able to complete this work soon, although I would certainly appreciate any help on this...

Triage I took a look at the sox, libvorbis and exiv2 issues. None had fixes available. sox and exiv2 were basically a list of fuzzing issues, which are often minor or at least of unknown severity. Those would have required a significant amount of work and I figured I would prioritize other work first. I also triaged CVE-2017-7506, which doesn't seem to affect the spice package in wheezy, after doing a fairly thorough audit of the code. The vulnerability is specifically bound to the reds_on_main_agent_monitors_config function, which is simply not present in our older version. A hostile message would fall through the code and not provoke memory allocation or out of bounds access, so I simply marked the wheezy version as not-affected, something which usually happens during the original triage but can also happen during the actual patching work, as in this case.

Other free software work This describes the volunteer work I do on various free software projects. This month, again, my internal reports show that I spent about the same time on volunteer and paid time, but this is probably a wrong estimate because I spent a lot of time at Debconf which I didn't clock in...

Debconf So I participated in the 17th Debian Conference in Montreal. It was great to see (and make!) so many friends from all over the world in person again, and I was happy to work on specific issues together with other Debian developers. I am especially thankful to David Bremner for fixing the syncing of the flagged tag when added to new messages (patch series). This allows me to easily sync the one tag (inbox) that is not statically assigned during notmuch new, by using flagged as a synchronization tool. This allows me to use notmuch more easily across multiple machines without having to sync all tags with dump/restore or using muchsync which wasn't working for me (although a new release came out which may fix my issues). The magic incantation looks something like this:
notmuch tag -inbox tag:inbox and not tag:flagged
notmuch tag +inbox not tag:inbox and tag:flagged
However, most of my time in the first week (Debcamp) was spent trying to complete the networking setup: configure switches, setup wiring and so on. I also configured an apt-cacher-ng proxy to serve packages to attendees during the conference. I configured it with Avahi to configure clients automatically, which led me to discover (and fix) issue Debian bug #870321) although there are more issues with the autodiscovery mechanism... I spent extra time to document the (somewhat simple) configuration of such a server in the Debian wiki because it was not the first time I had research that procedure... I somehow thought this was a great time to upgrade my laptop to stretch. Normally, I keep that device running stable because I don't use it often and I don't want to have major traumatizing upgrades every time I leave with it on a trip. But this time was special: there were literally hundreds of Debian developers to help me out if there was trouble. And there was, of course, trouble as it turns out! I had problems with the fonts on my display, because, well, I had suspended (twice) my laptop during the install. The fix was simply to flush the fontconfig cache, and I tried to document this in the fonts wiki page and my upgrades page. I also gave a short training called Debian packaging 101 which was pretty successful. Like the short presentation I made at the last Montreal BSP, the workshop was based on my quick debian development guide. I'm thinking of expanding this to a larger audience with a "102" course that would discuss more complex packaging problems. But my secret plan (well, secret until now I guess) is to make packaging procedures more uniform in Debian by training new Debian packagers using that same training for the next 2 decades. But I will probably start by just trying to do this again at the next Debconf, if I can attend.

Debian uploads I also sponsored two packages during Debconf: one was a "scratch an itch" upload (elpa-ivy) which I requested (Debian bug #863216) as part of a larger effort to ship the Emacs elisp packages as Debian packages. The other was an upload of diceware to build the documentation in a separate package and fix other issues I have found in the package during a review. I also uploaded a bunch of other fixes to the Debian archive:

Signing keys rotation I also started the process of moving my main OpenPGP certification key by adding a signing subkey. The subkey is stored in a cryptographic token so I can sign things on more than one machine without storing that critical key on all those devices physically. Unfortunately, this meant that I need to do some shenanigans when I want to sign content in my Debian work, because the new subkey takes time to propagate to the Debian archive. For example, I have to specify the primary key with a "bang" when signing packages (debsign -k '792152527B75921E!' ...) or use inline signatures in email sent for security announcement (since that trick doesn't work in Mutt or Notmuch). I tried to figure out how to better coordinate this next time by reading up documentation on keyring.debian.org, but there is no fixed date for key changes on the rsync interface. There are "monthly changes" so one's best bet is to look for the last change in their git repository.

GitLab.com and LFS migration I finally turned off my src.anarc.at git repository service by moving the remaining repos to GitLab. Unfortunately, GitLab removed support for git-annex recently, so I had to migrate my repositories to Git-LFS, which was an interesting experience. LFS is pretty easy to use, definitely simpler than git-annex. It also seems to be a good match for the use-case at hand, which is to store large files (videos, namely) as part of slides for presentations. It turns out that their migration guide could have been made much simpler. I tried to submit those changes to the documentation but couldn't fork the GitLab EE project to make a patch, so I just documented the issue in the original MR for now. While I was there I filed a feature request to add a new reference shortcut (GL-NNN) after noticing a similar token used on GitHub. This would be a useful addition because I often have numbering conflicts between Debian BTS bug numbers and GitLab issues in packages I maintain there. In particular, I have problems using GitLab issue numbers in Monkeysign, because commit logs end up in Debian changelogs and will be detected by the Debian infrastructure even though those are GitLab bug numbers. Using such a shortcut would avoid detection and such a conflict.

Numpy-stats I wrote a small tool to extract numeric statistics from a given file. I often do ad-hoc benchmarks where I store a bunch of numbers in a file and then try to make averages and so on. As an exercise in learning NumPy, I figured I would write such a simple tool, called numpy-stats, which probably sounds naive to seasoned Python scientists. My incentive was that I was trying to figure out what was the distribution of password length in a given password generator scheme. So I wrote this simple script:
for i in seq 10000 ; do
    shuf -n4 /usr/share/dict/words   tr -d '\n'
done > length
And then feed that data in the tool:
$ numpy-stats lengths 
 
  "max": 60, 
  "mean": 33.883293722913464, 
  "median": 34.0, 
  "min": 14, 
  "size": 143060, 
  "std": 5.101490225062775
 
I am surprised that there isn't such a tool already: hopefully I am wrong and will just be pointed towards the better alternative in the comments here!

Safe Eyes I added screensaver support to the new SafeEyes project, which I am considering as a replacement to the workrave project I have been using for years. I really like how the interruptions basically block the whole screen: way more effective than only blocking the keyboard, because all potential distractions go away. One feature that is missing is keystrokes and mouse movement counting and of course an official Debian package, although the latter would be easy to fix because upstream already has an unofficial build. I am thinking of writing my own little tool to count keystrokes, since the overlap between SafeEyes and such a counter isn't absolutely necessary. This is something that workrave does, but there are "idle time" extensions in Xorg that do not need to count keystrokes. There are already certain tools to count input events, but none seem to do what I want (most of them are basically keyloggers). It would be an interesting test to see if it's possible to write something that would work both for Xorg and Wayland at the same time. Unfortunately, preliminary research show that:
  1. in Xorg, the only way to implement this is to sniff all events, ie. to implement a keylogger
  2. in Wayland, this is completely unsupported. it seems some compositors could implement such a counter, but then it means that this is compositor specific, or, in other words, unportable
So there is little hope here, which brings to my mind "painmeter" as an appropriate name for this future programming nightmare.

Ansible I sent my first contribution to the ansible project with a small documentation fix. I had an eye opener recently when I discovered a GitLab ansible prototype that would manipulate GitLab settings. When I first discovered Ansible, I was frustrated by the YAML/Jinja DSL: it felt silly to write all this code in YAML when you are a Python developer. It was great to see reasonably well-written Python code that would do things and delegate the metadata storage (and only that!) to YAML, as opposed to using YAML as a DSL. So I figured I would look at the Ansible documentation on how this works, but unfortunately, the Ansible documentation is severly lacking in this area. There are broken links (I only fixed one page) and missing pieces. For example, the developing plugins page doesn't explain how to program a plugin at all. I was told on IRC that: "documentation around developing plugins is sparse in general. the code is the best documentation that exists (right now)". I didn't get a reply when asking which code in particular could provide good examples either. In comparison, Puppet has excellent documentation on how to create custom types, functions and facts. That is definitely a turn-off for a new contributor, but at least my pull request was merged in and I can only hope that seasoned Ansible contributors expand on this critical piece of documentation eventually.

Misc As you can see, I'm all over the place, as usual. GitHub tells me I "Opened 13 other pull requests in 11 repositories" (emphasis mine), which I guess means on top of the "9 commits in 5 repositories" mentioned earlier. My profile probably tells a more detailed story that what would be useful to mention here. I should also mention how difficult it is to write those reports: I basically do a combination of looking into my GitHub and GitLab profiles, the last 30 days of emails (!) and filesystem changes (!!). En vrac, a list of changes which may be of interest:
  • font-large (and its alias, font-small): shortcut to send the right escape sequence to rxvt so it changes its font
  • fix-acer: short script to hardcode the modeline (you remember those?!) for my screen which has a broken EDID pin (so autodetection fails, yay Xorg log files...)
  • ikiwiki-pandoc-quickie: fake ikiwiki renderer that (ab)uses pandoc to generate a HTML file with the right stylesheet to preview Markdown as it may look in this blog (the basic template is missing still)
  • git-annex-transfer: a command I've often been missing in git-annex, which is a way to transfer files between remotes without having to copy them locally (upstream feature request)
  • I linked the graphics of the Debian archive software architecture in the Debian wiki in the hope more people notice it.
  • I did some tweaks on my Taffybar to introduce a battery meter and hoping to have temperature sensors, which mostly failed. there's a pending pull request that may bring some sense into this, hopefully.
  • I made two small patches in Monkeysign to fix gpg.conf handling and multiple email output, a dumb bug I cannot believe anyone noticed or reported just yet. Thanks Valerie for the bug report! The upload of this in Debian is pending a review from the release team.

17 August 2017

Sean Whitton: DebCamp/DebConf17: reports on sprints and BoFs

In addition to my personal reflections on DebCamp/DebConf17, here is a brief summary of the activities that I had a hand in co-ordinating. I won t discuss here many other small items of work and valuable conversations that I had during the two weeks; hopefully the fruits of these will show themselves in my uploads to the archive over the next year. Debian Policy sprint & BoF Debian Emacs Team meeting/sprint Unfortunately we didn t make any significant progress towards converting all addons to use dh_elpa, as the work is not that much fun. Might be worth a more focused sprint next year. Report on team website Git for Debian packaging BoF & follow-up conversations The BoF was far more about dgit than I had wanted; however, I think that this was mostly because people had questions about dgit, rather than any unintended lecturing by me. I believe that several people came away from DebConf thinking that starting to use dgit would improve Debian for themselves and for users of their packages.

19 July 2016

Michael Prokop: DebConf16 in Capetown/South Africa: Lessons learnt

DebConf 16 in Capetown/South Africa was fantastic for many reasons. My Capetown/South Africa/Culture/Flight related lessons: My technical lessons from DebConf16: BTW, thanks to the video team the recordings from the sessions are available online.

5 May 2016

Sean Whitton: dh_make_elpa & dh_elpa_test

I recently completed and released some work on Debian s tooling for packaging Emacs Lisp addons to GNU Emacs. Emacs grew a package manager called package.el a few years ago, and last year David Bremner wrote the dh_elpa tool to simplify packaging addons for Debian by leveraging package.el features. Packaging a series of addons for Debian left me with a wishlist of features for dh_elpa and I was recently able to implement them. Debian tooling generally uses Perl, a language I didn t know before starting on this project. I was fortunate enough to receive a free review copy of Perl 5 by Example when I attended a meeting of the Bay Area Linux Users Group while I was visiting San Francisco a few months ago. I accepted the book with the intent of doing this work. dh_make_elpa dh_make_elpa (at present available from Debian experimental) is a Perl script to convert a git repository cloned from the upstream of an Emacs Lisp addon to a rudimentary Debian package. It performs a lot of guesswork, and its simple heuristics are something I hope to improve on. Since I am new to object-oriented program design in Perl and I wanted to leverage object-oriented Debian tooling library code, I took the structure of my project from dh_make_perl. In this manner I found it easy and pleasant to write a maintainable script. dh_elpa_test A lot of Emacs Lisp addon packages use a program called Cask to manage the Emacs Lisp dependencies needed to run their test suites. That meant that dh_auto_test often fails to run Emacs Lisp addon package test suites. Since the Debian packaging toolchain already has advanced dependency management, it s undesirable to involve Cask in the package build pipeline if it can be avoided. I had been copying and pasting the code needed to make the tests run in our environment to the debian/rules files of each package whose test suite I wanted to run. dh_elpa_test tries to detect Emacs Lisp addon package test suites and run them with the workarounds needed in our environment. This avoids boilerplate in debian/rules. dh_elpa_test also disables dh_auto_test to avoid a inadvertent Cask invocation. Future & acknowledgements My hope for this work was to make it easier and faster to package Emacs Lisp addon packages for Debian, for my own sake and for anyone new who is interested in joining the pkg-emacsen team. In the future, I want to have dh_elpa_test generate an autopkgtest definition so that a Testsuite: pkg-emacsen line in debian/control is enough to have an Emacs Lisp addon package test suite run on Debian CI. I m very grateful to David Bremner for reviewing and supporting this work, and also for supporting my Emacs Lisp addon packaging work more generally.

1 February 2016

Lunar: Reproducible builds: week 40 in Stretch cycle

What happened in the reproducible builds effort between January 24th and January 30th:

Media coverage Holger Levsen was interviewed by the FOSDEM team to introduce his talk on Sunday 31st.

Toolchain fixes Jonas Smedegaard uploaded d-shlibs/0.63 which makes the order of dependencies generated by d-devlibdeps stable accross locales. Original patch by Reiner Herrmann.

Packages fixed The following 53 packages have become reproducible due to changes in their build dependencies: appstream-glib, aptitude, arbtt, btrfs-tools, cinnamon-settings-daemon, cppcheck, debian-security-support, easytag, gitit, gnash, gnome-control-center, gnome-keyring, gnome-shell, gnome-software, graphite2, gtk+2.0, gupnp, gvfs, gyp, hgview, htmlcxx, i3status, imms, irker, jmapviewer, katarakt, kmod, lastpass-cli, libaccounts-glib, libam7xxx, libldm, libopenobex, libsecret, linthesia, mate-session-manager, mpris-remote, network-manager, paprefs, php-opencloud, pisa, pyacidobasic, python-pymzml, python-pyscss, qtquick1-opensource-src, rdkit, ruby-rails-html-sanitizer, shellex, slony1-2, spacezero, spamprobe, sugar-toolkit-gtk3, tachyon, tgt. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues, but not all of them:
  • gnubg/1.05.000-4 by Russ Allbery.
  • grcompiler/4.2-6 by Hideki Yamane.
  • sdlgfx/2.0.25-5 fix by Felix Geyer, uploaded by Gianfranco Costamagna.
Patches submitted which have not made their way to the archive yet:
  • #812876 on glib2.0 by Lunar: ensure that functions are sorted using the C locale when giotypefuncs.c is generated.

diffoscope development diffoscope 48 was released on January 26th. It fixes several issues introduced by the retrieval of extra symbols from Debian debug packages. It also restores compatibility with older versions of binutils which does not support readelf --decompress.

strip-nondeterminism development strip-nondeterminism 0.015-1 was uploaded on January 27th. It fixes handling of signed JAR files which are now going to be ignored to keep the signatures intact.

Package reviews 54 reviews have been removed, 36 added and 17 updated in the previous week. 30 new FTBFS bugs have been submitted by Chris Lamb, Michael Tautschnig, Mattia Rizzolo, Tobias Frost.

Misc. Alexander Couzens and Bryan Newbold have been busy fixing more issues in OpenWrt. Version 1.6.3 of FreeBSD's package manager pkg(8) now supports SOURCE_DATE_EPOCH. Ross Karchner did a lightning talk about reproducible builds at his work place and shared the slides.

24 January 2016

Lunar: Reproducible builds: week 39 in Stretch cycle

What happened in the reproducible builds effort between January 17th and January 23rd:

Toolchain fixes James McCoy uploaded subversion/1.9.3-2 which removes -Wdate-time from CPPFLAGS passed to swig enabling several packages to build again. The switch made in binutils/2.25-6 to use deterministic archives by default had the unfortunate effect of breaking a seldom used feature of make. Manoj Srivastava asked on debian-devel the best way to communicate the changes to Debian users. Lunar quickly came up with a patch that displays a warning when Make encounters deterministic archives. Manoj made it available in make/4.1-2 together with a NEWS file advertising the change. Following Guillem Jover's comment on the latest patch to make mtimes of packaged files deterministic, Daniel Kahn Gillmor updated and extended the patch adding the --clamp-mtime option to GNU Tar. Mattia Rizzolo updated texlive-bin in the reproducible experimental repository.

Packages fixed The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues, but not all of them: Patches submitted which have not made their way to the archive yet:

reproducible.debian.net Transition from reproducible.debian.net to the more general tests.reproducible-builds.org has started. More visual changes are coming. (h01ger) A plan on how to run tests for F-Droid has been worked out. (hc, mvdan, h01ger) A first step has been made by adding a Jenkins job to setup an F-Droid build environment. (h01ger)

diffoscope development diffoscope 46 has been released on January 19th, followed-up by version 47 made available on January 23rd. Try it online at try.diffoscope.org! The biggest visible change is the improvement to ELF file handling. Comparisons are now done section by section, using the most appropriate tool and options to get meaningful results, thanks to Dhole's work and Mike Hommey's suggestions. Also suggested by Mike, symbols for IP-relative ops are now filtered out to remove clutter. Understanding differences in ELF files belonging to Debian packages should also be much easier as diffoscope will now try to extract debug information from the matching dbgsym package. This means objdump disassembler should output line numbers for packages built with recent debhelper as long as the associated debug package is in the same directory. As diff tends to consume huge amount of memory on large inputs, diffoscope has a limit in place to prevent crashes. diffoscope used to display a difference every time the limit was hit. Because this was confusing in case there were actually no differences, a hash is now internally computed to only report a difference when one exists. Files in archives and other container members are now compared in the original order. This should not matter in most case but overall give more predictable results. Debian .buildinfo files are now supported. Amongst other minor fixes and improvements, diffoscope will now properly compare symlinks in directories. Thanks Tuomas Tynkkynen for reporting the problem.

Package reviews 70 reviews have been removed, 125 added and 33 updated in the previous week, gcc-5 amongst others. 25 FTBFS issues have been filled by Chris Lamb, Daniel Stender, Martin Michlmayr.

Misc. The 16th FOSDEM will happen in Brussels, Belgium on January 30-31st. Several talks will be about reproducible builds: h01ger about the general ecosystem, Fabian Keil about the security oriented ElectroBSD, Baptiste Daroussin about FreeBSD packages, Ludovic Court s about Guix.

14 January 2016

Lunar: Reproducible builds: week 37 in Stretch cycle

What happened in the reproducible builds effort between January 3rd and January 9th 2016:

Toolchain fixes David Bremner uploaded dh-elpa/0.0.18 which adds a --fix-autoload-date option (on by default) to take autoload dates from changelog. Lunar updated and sent the patch adding the generation of .buildinfo to dpkg.

Packages fixed The following packages have become reproducible due to changes in their build dependencies: aggressive-indent-mode, circe, company-mode, db4o, dh-elpa, editorconfig-emacs, expand-region-el, f-el, geiser, hyena, js2-mode, markdown-mode, mono-fuse, mysql-connector-net, openbve, regina-normal, sml-mode, vala-mode-el. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues, but not all of them: Patches submitted which have not made their way to the archive yet:
  • #809780 on flask-restful by Chris Lamb: implement support for SOURCE_DATE_EPOCH in the build system.
  • #810259 on avfs by Chris Lamb: implement support for SOURCE_DATE_EPOCH in the build system.
  • #810509 on apt by Mattia Rizzolo: ensure a stable file order is given to the linker.

reproducible.debian.net Add 2 more armhf build nodes provided by Vagrant Cascadian. This added 7 more armhf builder jobs. We now run around 900 tests of armhf packages each day. (h01ger) The footer of each page now indicates by which Jenkins jobs build it. (h01ger)

diffoscope development diffoscope 45 has been released on January 4th. It features huge memory improvements when comparing large files, several fixes of squashfs related issues that prevented comparing two Tails images, and improve the file list of tar and cpio archive to be more precise and consistent over time. It also fixes a typo that prevented the Mach-O to work (Rainer M ller), improves comparisons of ELF files when specified on the command line, and solves a few more encoding issues.

Package reviews 134 reviews have been removed, 30 added and 37 updated in the previous week. 20 new fail to build from source issues were reported by Chris Lamb and Chris West. prebuilder will now skip installing diffoscope to save time if the build results are identical. (Reiner Herrmann)

29 December 2015

David Bremner: Converting PDFs to DJVU

Today I was wondering about converting a pdf made from scan of a book into djvu, hopefully to reduce the size, without too much loss of quality. My initial experiments with pdf2djvu were a bit discouraging, so I invested some time building gsdjvu in order to be able to run djvudigital. Watching the messages from djvudigital I realized that the reason it was achieving so much better compression was that it was using black and white for the foreground layer by default. I also figured out that the default 300dpi looks crappy since my source document is apparently 600dpi. I then went back an compared djvudigital to pdf2djvu a bit more carefully. My not-very-scientific conclusions: Perhaps most compellingly, the output from pdf2djvu has sensible metadata and is searchable in evince. Even with the --words option, the output from djvudigital is not. This is possibly related to the error messages like
Can't build /Identity.Unicode /CIDDecoding resource. See gs_ciddc.ps .
It could well be my fault, because building gsdjvu involved guessing at corrections for several errors. Some of these issues have to do with building software from 2009 (the instructions suggestion building with ghostscript 8.64) in a modern toolchain; others I'm not sure. There was an upload of gsdjvu in February of 2015, somewhat to my surprise. AT&T has more or less crippled the project by licensing it under the CPL, which means binaries are not distributable, hence motivation to fix all the rough edges is minimal.
Version kilobytes per page position in figure
Original PDF 80.9 top
pdf2djvu --dpi=450 92.0 not shown
pdf2djvu --monochrome --dpi=450 27.5 second from top
pdf2djvu --monochrome --dpi=600 --loss-level=50 21.3 second from bottom
djvudigital --dpi=450 29.4 bottom
djvu-compare.png

23 December 2015

David Bremner: Offline key signing with caff

After a mildly ridiculous amount of effort I made a bootable-usb key. I then layered a bash script on top of a perl script on top of gpg. What could possibly go wrong?
 #!/bin/bash
 infile=$1
 keys=$(gpg --with-colons  $infile   sed -n 's/^pub//p'   cut -f5 -d: )
 gpg --homedir $HOME/.caff/gnupghome --import $infile
 caff -R -m no "$ keys[*] "
 today=$(date +"%Y-%m-%d")
 output="$(pwd)/keys-$today.tar"
 for key in $ keys[*] ; do
     (cd $HOME/.caff/keys/;   tar rvf "$output" $today/$key.mail*)
 done
The idea is that keys are exported to files on a networked host, the files are processed on an offline host, and the resulting tarball of mail messages sneakernetted back to the connected host.

21 December 2015

David Bremner: Bootable Debian USB

Umm. Somehow I thought this would be easier than learning about live-build. Probably I was wrong. There are probably many better tutorials on the web. Two useful observations: zeroing the key can eliminate mysterious grub errors, and systemd-nspawn is pretty handy. One thing that should have been obvious, but wasn't to me is that it's easier to install grub onto a device outside of any container. Find device
 $ dmesg
Count sectors
 # fdisk -l /dev/sdx
Assume that every command after here is dangerous. Zero it out. This is overkill for a fresh key, but fixed a problem with reusing a stick that had a previous live distro installed on it.
 # dd if=/dev/zero of=/dev/sdx bs=1048576 count=$sectors
Make file system. There are lots of options. I eventually used parted
 # parted
 (parted) mklabel msdos
 (parted) mkpart primary ext2 1 -1
 (parted) set 1 boot on
 (parted) quit
Make a file system
 # mkfs.ext2 /dev/sdx1
 # mount /dev/sdx1 /mnt
Install the base system
 # debootstrap --variant=minbase jessie /mnt http://httpredir.debian.org/debian/
Install grub (no chroot needed)
 # grub-install --boot-directory /mnt/boot /dev/sdx1
Set a root password
 # chroot /mnt
 # passwd root
 # exit
create up fstab
# blkid -p /dev/sdc1   cut -f2 -d' ' > /mnt/etc/fstab
Now edit to fix syntax, tell ext2, etc... Now switch to system-nspawn, to avoid boring bind mounting, etc..
# systemd-nspawn -b -D /mnt
login to the container, install linux-base, linux-image-amd64, grub-pc EDIT: fixed block size of dd, based on suggestion of tg.

11 December 2015

Lunar: Reproducible builds: week 32 in Stretch cycle

The first reproducible world summit was held in Athens, Greece, from December 1st-3rd with the support of the Linux Foundation, the Open Tech Fund, and Google. Faidon Liambotis has been an amazing help to sort out all local details. People at ImpactHub Athens have been perfect hosts. North of Athens from the Acropolis with ImpactHub in the center Nearly 40 participants from 14 different free software project had very busy days sharing knowledge, building understanding, and producing actual patches. Anyone interested in cross project discussions should join the rb-general mailing-list. What follows focuses mostly on what happened for Debian this previous week. A more detailed report about the summit will follow soon. You can also read the ones from Joachim Breitner from Debian, Clemens Lang from MacPorts, Georg Koppen from Tor, Dhiru Kholia from Fedora, and Ludovic Court s wrote one for Guix and for the GNU project. The Acropolis from  Infrastructure Several discussions at the meeting helped refine a shared understanding of what kind of information should be recorded on a build, and how they could be used. Daniel Kahn Gillmor sent a detailed update on how .buildinfo files should become part of the Debian archive. Some key changes compared to what we had in mind at DebConf15: Hopefully, ftpmasters will be able to comment on the updated proposal soon. Packages fixed The following packages have become reproducible due to changes in their build dependencies: fades, triplane, caml-crush, globus-authz. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues, but not all of them: Patches submitted which have not made their way to the archive yet: akira sent proposals on how to make bash reproducible. Alexander Couzens submitted a patch upstream to add support for SOURCE_DATE_EPOCH in grub image generator (#787795). reproducible.debian.net An issue with some armhf build nodes was tracked down to a bad interaction between uname26 personality and new glibc (Vagrant Cascadian). A Debian package was created for koji, the RPM building and tracking system used by Fedora amongst others. It is currently waiting for review in the NEW queue. (Ximin Luo, Marek Marczykowski-G recki) diffoscope development diffoscope now has a dedicated mailing list to better accommodate its growing user and developer base. Going through diffoscope's guts together enabled several new contributors. Baptiste Daroussin, Ed Maste, Clemens Lang, Mike McQuaid, Joachim Breitner all contributed their first patches to improve portability or add new features. Regular contributors Chris Lamb, Reiner Herrmann, and Levente Polyak also submitted improvements. diffoscope hacking session in Athens The next release should support more operating systems, filesystem image comparison via libguestfs, HTML reports with on-demand loading, and parallel processing for the most noticeable improvements. Package reviews 27 reviews have been removed, 17 added and 14 updated in the previous week. Chris Lamb and Val Lorentz filed 4 new FTBFS reports. Misc. Baptiste Daroussin has started to implement support for SOURCE_DATE_EPOCH in FreeBSD in libpkg and the ports tree. Thanks Joachim Breitner and h01ger for the pictures.

Next.