Search Results: "David Bremner"

10 June 2021

Louis-Philippe Véronneau: New Desktop Computer

I built my last desktop computer what seems like ages ago. In 2011, I was in a very different place, both financially and as a person. At the time, I was earning minimum wage at my school's café to pay rent. Since the café was owned by the school cooperative, I had an employee discount on computer parts. This gave me a chance to build my first computer from spare parts at a reasonable price. After 10 years of service [1], the time has come to upgrade. Although this machine was still more than capable for day-to-day tasks like browsing the web or playing casual video games, it started to show its limits when the time came to do more serious work.
Old computer specs:
CPU: AMD FX-8350
Memory: 8GB DDR3 1600MHz
Motherboard: ASUS TUF SABERTOOTH 990FX R2.0
Storage: Samsung 850 EVO 500GB SATA
I first started considering an upgrade in September 2020: David Bremner was kindly fixing a bug in ledger that kept me from balancing my books and since it seemed like a class of bug that would've been easily caught by an autopkgtest, I decided to add one. After adding the necessary snippets to run the upstream testsuite (an easy task I've done multiple times now), I ran sbuild and ... my computer froze and crashed. Somehow, what I thought was a simple Python package was maxing all the cores on my CPU and using all of the 8GB of memory I had available. [2] A few months later, I worked on jruby and the builds took 20 to 30 minutes, long enough to completely disrupt my flow. The same thing happened when I wanted to work on lintian: the testsuite would take more than 15 minutes to run, making quick iterations impossible. Sadly, the pandemic completely wrecked the computer hardware market and prices here in Canada have only recently started to go down again. As a result, I had to wait longer than I would've liked to avoid paying scalper prices.
New computer specs:
CPU: AMD Ryzen 5900X
Memory: 64GB DDR4 3200MHz
Motherboard: MSI MPG B550 Gaming Plus
Storage: Corsair MP600 500 GB Gen4 NVME
The difference between the two machines is pretty staggering: I've gone from a CPU with 2 cores and 8 threads, to one with 12 cores and 24 threads. Not only that, but single-threaded performance has also vastly increased in those 10 years. A good example would be building grammalecte, a package I've recently sponsored. I feel it's a good benchmark, since the build relies on single-threaded performance for the normal Python operations, while being threaded when it compiles the dictionaries. On the old computer:
Build needed 00:10:07, 273040k disk space
And as you can see, on the new computer the build time has been significantly reduced:
Build needed 00:03:18, 273040k disk space
Same goes for things like the lintian testsuite. Since it's a very multi-threaded workload, it now takes less than 2 minutes to run; a 750% improvement. All this to say I'm happy with my purchase. And lo and behold I can now build ledger without a hitch, even though it maxes my 24 threads and uses 28GB of RAM. Who would've thought...
[Image: screen capture of htop showing how much resources ledger takes to build]

  1. I managed to fry that PC's motherboard in 2016 and later replaced it with a brand new one. I also upgraded the storage along the way, from a very cheap cacheless 120GB SSD to a larger Samsung 850 EVO SATA drive.
  2. As it turns out, ledger is mostly written in C++ :)

1 June 2021

David Bremner: Baby steps towards schroot and slurm cooperation.

Unfortunately schroot does not maintain CPU affinity. This means in particular that parallel builds have the tendency to take over an entire slurm managed server, which is kind of rude. I haven't had time to automate this yet, but the following demonstrates a simple workaround for interactive building.
  simplex:~
 % schroot --preserve-environment -r -c polymake
(unstable-amd64-sbuild)bremner@simplex:~$ echo $SLURM_CPU_BIND_LIST
0x55555555555555555555
(unstable-amd64-sbuild)bremner@simplex:~$ grep Cpus /proc/self/status
Cpus_allowed:   ffff,ffffffff,ffffffff
Cpus_allowed_list:      0-79
(unstable-amd64-sbuild)bremner@simplex:~$ taskset $SLURM_CPU_BIND_LIST bash
(unstable-amd64-sbuild)bremner@simplex:~$ grep Cpus /proc/self/status
Cpus_allowed:   5555,55555555,55555555
Cpus_allowed_list:      0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78
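
The transcript above can be double-checked with a few lines of Python. This is only a sketch: it assumes SLURM_CPU_BIND_LIST holds a single hexadecimal mask, as it does in the session shown (other Slurm binding modes may produce a different format), and the helper names cpus_from_mask and bind_to_slurm_cpus are made up for illustration. The second helper uses os.sched_setaffinity, which is available on Linux only.

```python
import os
from typing import List


def cpus_from_mask(mask: str) -> List[int]:
    """Return the CPU indices whose bits are set in a hexadecimal affinity mask."""
    value = int(mask, 16)  # base 16 accepts an optional "0x" prefix
    return [cpu for cpu in range(value.bit_length()) if (value >> cpu) & 1]


def bind_to_slurm_cpus(environ=os.environ) -> List[int]:
    """Restrict the current process to the CPUs in SLURM_CPU_BIND_LIST (Linux only).

    Assumption: the variable contains exactly one hexadecimal mask.
    """
    cpus = cpus_from_mask(environ["SLURM_CPU_BIND_LIST"])
    os.sched_setaffinity(0, cpus)  # pid 0 means "the calling process"
    return cpus


if __name__ == "__main__":
    # The mask from the session above selects every second CPU from 0 to 78.
    print(cpus_from_mask("0x55555555555555555555"))
```

The decoded list matches the Cpus_allowed_list that taskset produced, which is a cheap sanity check before wiring anything like this into a wrapper.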

Next steps
In principle the schroot configuration parameter can be used to run taskset before every command. In practice it's a bit fiddly because you need a shell script shim (because of the environment variable) and you need to e.g. goof around with bind mounts to make sure that your script is available in the chroot. And then there's combining with ccache and eatmydata...

18 July 2020

David Bremner: git-annex and ikiwiki, not as hard as I expected

Background
So apparently there's this pandemic thing, which means I'm teaching "Alternate Delivery" courses now. These are just like online courses, except possibly more synchronous, definitely less polished, and the tuition money doesn't go to the College of Extended Learning. I figure I'll need to manage and share videos, and our learning management system, in the immortal words of Marie Kondo, does not bring me joy. This has caused me to revisit the problem of sharing large files in an ikiwiki based site (like the one you are reading). My go-to solution for large file management is git-annex. The last time I looked at this (a decade ago or so?), I was blocked by git-annex using symlinks and ikiwiki ignoring them for security related reasons. Since then two things changed which made things relatively easy.
  1. I started using the rsync_command ikiwiki option to deploy my site.
  2. git-annex went through several design iterations for allowing non-symlink access to large files.

TL;DR
In my ikiwiki config
    # attempt to hardlink source files? (optimisation for large files)
    hardlink => 1,
In my ikiwiki git repo
$ git annex init
$ git annex add foo.jpg
$ git commit -m'add big photo'
$ git annex adjust --unlock                 # look ikiwiki, no symlinks
$ ikiwiki --setup ~/.config/ikiwiki/client  # rebuild my local copy, for review
$ ikiwiki --setup /home/bremner/.config/ikiwiki/rsync.setup --refresh  # deploy
You can see the result at photo

6 July 2020

Reproducible Builds: Reproducible Builds in June 2020

Welcome to the June 2020 report from the Reproducible Builds project. In these reports we outline the most important things that we and the rest of the community have been up to over the past month.

What are reproducible builds? One of the original promises of open source software is that distributed peer review and transparency of process results in enhanced end-user security. But whilst anyone may inspect the source code of free and open source software for malicious flaws, almost all software today is distributed as pre-compiled binaries. This allows nefarious third-parties to compromise systems by injecting malicious code into seemingly secure software during the various compilation and distribution processes.

News
The GitHub Security Lab published a long article on the discovery of a piece of malware designed to backdoor open source projects that used the build process and its resulting artifacts to spread itself. In the course of their analysis and investigation, the GitHub team uncovered 26 open source projects that were backdoored by this malware and were actively serving malicious code. (Full article)
Carl Dong from Chaincode Labs uploaded a presentation on Bitcoin Build System Security and reproducible builds to YouTube.
The app intended to trace infection chains of Covid-19 in Switzerland published information on how to perform a reproducible build.
The Reproducible Builds project has received funding in the past from the Open Technology Fund (OTF) to reach specific technical goals, as well as to enable the project to meet in-person at our summits. The OTF has actually also assisted countless other organisations that promote transparent, civil society as well as those that provide tools to circumvent censorship and repressive surveillance. However, the OTF has now been threatened with closure. (More info)
It was noticed that Reproducible Builds was mentioned in the book End-user Computer Security by Mark Fernandes (published by WikiBooks) in the section titled Detection of malware in software.
Lastly, reproducible builds and other ideas around software supply chain were mentioned in a recent episode of the Ubuntu Podcast in a wider discussion about the Snap and application stores (at approx 16:00).

Distribution work
In the Arch Linux distribution, a goal to remove .doctrees from installed files was created via Arch's TODO list mechanism. These .doctree files are caches generated by the Sphinx documentation generator when developing documentation so that Sphinx does not have to reparse all input files across runs. They should not be packaged, especially as they lead to the package being unreproducible as their pickled format contains unreproducible data. Jelle van der Waa and Eli Schwartz submitted various upstream patches to fix projects that install these by default.
Dimitry Andric was able to determine why the reproducibility status of FreeBSD's base.txz depended on the number of CPU cores, attributing it to an optimisation made to the Clang C compiler. After further detailed discussion on the FreeBSD bug it was possible to get the binaries reproducible again.
For the GNU Guix operating system, Vagrant Cascadian started a thread about collecting reproducibility metrics and Jan "janneke" Nieuwenhuizen posted that they had further reduced their bootstrap seed to 25% which is intended to reduce the amount of code to be audited to avoid potential compiler backdoors.
In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update as well as made the following changes within the distribution itself:

Debian
Holger Levsen filed three bugs (#961857, #961858 & #961859) against the reproducible-check tool that reports on the reproducible status of installed packages on a running Debian system. They were subsequently all fixed by Chris Lamb. Timo Röhling filed a wishlist bug against the debhelper build tool impacting the reproducibility status of 100s of packages that use the CMake build system which led to a number of tests and next steps. Chris Lamb contributed to a conversation regarding the nondeterministic order of execution of Debian maintainer scripts that results in the arbitrary allocation of UNIX group IDs, referencing the Tails operating system's approach to this. Vagrant Cascadian also added to a discussion regarding verification formats for reproducible builds.
47 reviews of Debian packages were added, 37 were updated and 69 were removed this month adding to our knowledge about identified issues. Chris Lamb identified and classified a new uids_gids_in_tarballs_generated_by_cmake_kde_package_app_templates issue and updated the paths_vary_due_to_usrmerge as deterministic issue, and Vagrant Cascadian updated the cmake_rpath_contains_build_path and gcc_captures_build_path issues.
Lastly, Debian Developer Bill Allombert started a mailing list thread regarding setting the -fdebug-prefix-map command-line argument via an environment variable and Holger Levsen also filed three bugs against the debrebuild Debian package rebuilder tool (#961861, #961862 & #961864).

Development
On our website this month, Arnout Engelen added a link to our Mastodon account and moved the SOURCE_DATE_EPOCH git log example to another section. Chris Lamb also limited the number of news posts to avoid showing items from (for example) 2017.
strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. It is used automatically in most Debian package builds. This month, Mattia Rizzolo bumped the debhelper compatibility level to 13 and adjusted a related dependency to avoid potential circular dependency.

Upstream work
The Reproducible Builds project attempts to fix unreproducible packages and we try to send all of our patches upstream. This month, we wrote a large number of such patches including:
Bernhard M. Wiedemann also filed reports for frr (build fails on single-processor machines), ghc-yesod-static/git-annex (a filesystem ordering issue) and ooRexx (ASLR-related issue).

diffoscope
diffoscope is our in-depth diff-on-steroids utility which helps us diagnose reproducibility issues in packages. It does not define reproducibility, but rather provides helpful and human-readable guidance for packages that are not reproducible, rather than relying on essentially-useless binary diffs. This month, Chris Lamb uploaded versions 147, 148 and 149 to Debian and made the following changes:
  • New features:
    • Add output from strings(1) to ELF binaries. (#148)
    • Dump PE32+ executables (such as EFI applications) using objdump(1). (#181)
    • Add support for Zsh shell completion. (#158)
  • Bug fixes:
    • Prevent a traceback when comparing PDF documents that did not contain metadata (ie. a PDF /Info stanza). (#150)
    • Fix compatibility with jsondiff version 1.2.0. (#159)
    • Fix an issue in GnuPG keybox file handling that left filenames in the diff.
    • Correct detection of JSON files due to missing call to File.recognizes that checks candidates against file(1).
  • Output improvements:
    • Use the CSS word-break property over manually adding U+200B zero-width spaces as these were making copy-pasting cumbersome. (!53)
    • Downgrade the tlsh warning message to an info level warning. (#29)
  • Logging improvements:
  • Testsuite improvements:
    • Update tests for file(1) version 5.39. (#179)
    • Drop accidentally-duplicated copy of the --diff-mask tests.
    • Don't mask an existing test.
  • Codebase improvements:
    • Replace obscure references to WF with Wagner-Fischer for clarity.
    • Use a semantic AbstractMissingType type instead of remembering to check for both types of missing files.
    • Add a comment regarding potential security issue in the .changes, .dsc and .buildinfo comparators.
    • Drop a large number of unused imports.
    • Make many code sections more Pythonic.
    • Prevent some variable aliasing issues.
    • Use some tactical f-strings to tidy up code and remove explicit u"unicode" strings.
    • Refactor a large number of routines for clarity.
trydiffoscope is the web-based version of diffoscope. This month, Chris Lamb also corrected the location for the celerybeat scheduler to ensure that the clean/tidy tasks are actually called which had caused an accidental resource exhaustion. (#12) In addition Jean-Romain Garnier made the following changes:
  • Fix the --new-file option when comparing directories by merging DirectoryContainer.compare and Container.compare. (#180)
  • Allow user to mask/filter diff output via --diff-mask=REGEX. (!51)
  • Make child pages open in new window in the --html-dir presenter format.
  • Improve the diffs in the --html-dir format.
Lastly, Daniel Fullmer fixed the Coreboot filesystem comparator and Mattia Rizzolo prevented warnings from the tlsh fuzzy-matching library during tests and tweaked the build system to remove an unwanted .build directory. For the GNU Guix distribution Vagrant Cascadian updated the version of diffoscope to version 147 and later 148.

Testing framework
We operate a large and many-featured Jenkins-based testing framework that powers tests.reproducible-builds.org. Amongst many other tasks, this tracks the status of our reproducibility efforts across many distributions as well as identifies any regressions that have been introduced. This month, Holger Levsen made the following changes:
  • Debian-related changes:
    • Prevent bogus failure emails from rsync2buildinfos.debian.net every night.
    • Merge a fix from David Bremner's database of .buildinfo files to include a fix regarding comparing source vs. binary package versions.
    • Only run the Debian package rebuilder job twice per day.
    • Increase bullseye scheduling.
  • System health status page:
    • Add a note displaying whether a node needs to be rebooted for a kernel upgrade.
    • Fix sorting order of failed jobs.
    • Expand footer to link to the related Jenkins job.
    • Add archlinux_html_pages, openwrt_rebuilder_today and openwrt_rebuilder_future to known broken jobs.
    • Add HTML <meta> header to refresh the page every 5 minutes.
    • Count the number of ignored jobs, ignore permanently known broken jobs and jobs on known offline nodes.
    • Only consider the known offline status from Git.
    • Various output improvements.
  • Tools:
    • Switch URLs for the Grml Live Linux and PureOS package sets.
    • Don't try to build a disorderfs Debian source package.
    • Stop building diffoscope as we are moving this to Salsa.
    • Merge several "is diffoscope up-to-date on every platform?" test jobs into one and fail less noisily if the version in Debian cannot be determined.
In addition: Marcus Hoffmann was added as a maintainer of the F-Droid reproducible checking components, Jelle van der Waa updated the "is diffoscope up-to-date in every platform" check for Arch Linux and diffoscope, Mattia Rizzolo backed up a copy of a remove script run on the Codethink-hosted jump server and Vagrant Cascadian temporarily disabled the fixfilepath on bullseye, to get better data about the ftbfs_due_to_f-file-prefix-map categorised issue. Lastly, the usual build node maintenance was performed by Holger Levsen, Mattia Rizzolo and Vagrant Cascadian.

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

This month's report was written by Bernhard M. Wiedemann, Chris Lamb, Eli Schwartz, Holger Levsen, Jelle van der Waa and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.

8 April 2020

David Bremner: Tangling multiple files

I have lately been using org-mode literate programming to generate example code and beamer slides from the same source. I hit a wall trying to re-use functions in multiple files, so I came up with the following hack. Thanks 'ngz' on #emacs and Charles Berry on the org-mode list for suggestions and discussion.
(defun db-extract-tangle-includes ()
  "Return the values of all #+TANGLE_INCLUDE: keywords in the current buffer."
  (goto-char (point-min))
  (let ((case-fold-search t)
        (retval nil))
    (while (re-search-forward "^#[+]TANGLE_INCLUDE:" nil t)
      (let ((element (org-element-at-point)))
        (when (eq (org-element-type element) 'keyword)
          (push (org-element-property :value element) retval))))
    retval))
(defun db-ob-tangle-hook ()
  "Ingest every file named by a TANGLE_INCLUDE keyword into the Library of Babel."
  (let ((includes (db-extract-tangle-includes)))
    (mapc #'org-babel-lob-ingest includes)))
(add-hook 'org-babel-pre-tangle-hook #'db-ob-tangle-hook t)
Use involves something like the following in your org-file.
#+SETUPFILE: presentation-settings.org
#+SETUPFILE: tangle-settings.org
#+TANGLE_INCLUDE: lecture21.org
#+TITLE: GC V: Mark & Sweep with free list
For batch export with make, I do something like
%.tangle-stamp: %.org
    emacs --batch --quick  -l org  -l ${HOME}/.emacs.d/org-settings.el --eval "(org-babel-tangle-file \"$<\")"
    touch $@

6 March 2020

Reproducible Builds: Reproducible Builds in February 2020

Welcome to the February 2020 report from the Reproducible Builds project. One of the original promises of open source software is that distributed peer review and transparency of process results in enhanced end-user security. However, whilst anyone may inspect the source code of free and open source software for malicious flaws, almost all software today is distributed as pre-compiled binaries. This allows nefarious third-parties to compromise systems by injecting malicious code into ostensibly secure software during the various compilation and distribution processes. The motivation behind the reproducible builds effort is to provide the ability to demonstrate these binaries originated from a particular, trusted, source release: if identical results are generated from a given source in all circumstances, reproducible builds provides the means for multiple third-parties to reach a consensus on whether a build was compromised via distributed checksum validation or some other scheme. In this month's report, we cover:

If you are interested in contributing to the project, please visit our Contribute page on our website.

Media coverage & upstream news
Omar Navarro Leija, a PhD student at the University of Pennsylvania, published a paper entitled Reproducible Containers that describes in detail the workings of a new user-space container tool called DetTrace:
All computation that occurs inside a DetTrace container is a pure function of the initial filesystem state of the container. Reproducible containers can be used for a variety of purposes, including replication for fault-tolerance, reproducible software builds and reproducible data analytics. We use DetTrace to achieve, in an automatic fashion, reproducibility for 12,130 Debian package builds, containing over 800 million lines of code, as well as bioinformatics and machine learning workflows.
There was also considerable discussion on our mailing list regarding this research and a presentation based on the paper will occur at the ASPLOS 2020 conference between March 16th and 20th in Lausanne, Switzerland.
The many virtues of Reproducible Builds were touted as benefits for software compliance in a talk at FOSDEM 2020, debating whether the Careful Inventory of Licensing Bill of Materials Have Impact of FOSS License Compliance, which pitted Jeff McAffer and Carol Smith against Bradley Kuhn and Max Sills (~47 minutes in).
Nobuyoshi Nakada committed a change to the canonical implementation of the Ruby programming language such that filesystem globs (ie. calls to list the contents of filesystem directories) will henceforth be sorted in ascending order. Without this change, the underlying nondeterministic ordering of the filesystem is exposed to the language, which often results in an unreproducible build.
Vagrant Cascadian reported on our mailing list regarding a quick reproducible test for the GNU Guix distribution, which resulted in 81.9% of packages registering as reproducible in his installation:
$ guix challenge --verbose --diff=diffoscope ...
2,463 store items were analyzed:
  - 2,016 (81.9%) were identical
  - 37 (1.5%) differed
  - 410 (16.6%) were inconclusive
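
The filesystem-ordering problem that the Ruby glob change addresses applies to any build step that walks a directory. As a rough illustration in Python (not taken from any of the projects mentioned; the function names list_files_sorted and tree_digest are hypothetical), sorting directory entries explicitly keeps the result independent of the order the filesystem happens to return them in:

```python
import hashlib
import os
from typing import List


def list_files_sorted(root: str) -> List[str]:
    """Walk root and return relative file paths in a stable order."""
    paths = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # sorting in place makes os.walk descend in a fixed order
        for name in sorted(filenames):
            paths.append(os.path.relpath(os.path.join(dirpath, name), root))
    return paths


def tree_digest(root: str) -> str:
    """Hash file names and contents in the stable order produced above."""
    digest = hashlib.sha256()
    for rel in list_files_sorted(root):
        digest.update(rel.encode("utf-8") + b"\0")
        with open(os.path.join(root, rel), "rb") as handle:
            digest.update(handle.read())
    return digest.hexdigest()
```

Two trees with identical names and contents then hash identically regardless of the order in which their files were created, which is the property a tool like disorderfs is designed to stress.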
Jeremiah Orians announced on our mailing list the release of a number of tools related to cross-compilation such as M2-Planet and mescc-tools-seed. This project attempts a full bootstrap of a cross-platform compiler for the C programming language (written in C itself) from hex, the ultimate goal being able to demonstrate a fully-bootstrapped compiler from hex to the GCC GNU Compiler Collection. This has many implications in and around Ken Thompson's Trusting Trust attack outlined in Thompson's 1983 Turing Award Lecture.
Twitter user @TheYoctoJester posted an executive summary of reproducible builds in the Yocto Project.
Finally, Reddit user tofflos posted to the /r/Java subreddit asking about how to achieve reproducible builds with Maven and Chris Lamb noticed that the Linux kernel documentation about reproducible builds of it is available on the kernel.org homepages in an attractive HTML format.

Distribution work

Debian
Chris Lamb created a merge request for the core debian-installer package to allow all arguments and options from sources.list files (such as [check-valid-until=no], etc.) in order that we can test the reproducibility of the installer images on the Reproducible Builds' own testing infrastructure. (#13) Thorsten Glaser followed-up to a bug filed against the dpkg-source component that was originally filed in late 2015 that claims that the build tool does not respect permissions when unpacking tarballs if the umask is set to 0002. Matthew Garrett posted to the debian-devel mailing list on the topic of Producing verifiable initramfs images as part of a wider conversation on being able to trust the entire software stack on our computers. 59 reviews of Debian packages were added, 30 were updated and 42 were removed this month adding to our knowledge about identified issues. Many issue types were noticed and categorised by Chris Lamb, including:

openSUSE
In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update as well as provided the following patches:

Software development

diffoscope
diffoscope is our in-depth and content-aware diff-like utility that can locate and diagnose reproducibility issues. It is run countless times a day on our testing infrastructure and is essential for identifying fixes and causes of nondeterministic behaviour. Chris Lamb made the following changes this month, including uploading version 137 to Debian:
  • The sng image utility appears to return with an exit code of 1 if there are even minor errors in the file. (#950806)
  • Also extract classes2.dex, classes3.dex from .apk files extracted by apktool. (#88)
  • No need to use str.format if we are just returning the string.
  • Add generalised support for ignoring returncodes and move special-casing of returncodes in zip to use Command.VALID_RETURNCODES.

Other tools
disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into directory system calls in order to flush out reproducibility issues. This month, Vagrant Cascadian updated the Vcs-Git to specify the debian packaging branch.
reprotest is our end-user tool to build the same source code twice in widely differing environments and then check the binaries produced by each build for any differences. This month, versions 0.7.13 and 0.7.14 were uploaded to Debian unstable by Holger Levsen after Vagrant Cascadian added support for GNU Guix.

Project documentation & website
There was more work performed on our documentation and website this month. Bernhard M. Wiedemann added a Java Gradle Build Tool snippet to the SOURCE_DATE_EPOCH documentation and normalised various terms to "unreproducible". Chris Lamb added a Meson.build example to the SOURCE_DATE_EPOCH documentation and improved its CMake section, replaced "anyone can" with "anyone may" as, well, not everyone has the resources, skills, time or funding to actually do what it refers to, and improved the pre-processing for our report generation. In addition, Holger Levsen updated our news page to improve the list of reports, added an explicit mention of the weekly news time span and reverted sorting of news entries to have latest on top. Mattia Rizzolo added Codethink as a non-fiscal sponsor and lastly Tianon Gravi added a Docker Images link underneath the Debian project on our Projects page.
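
For readers unfamiliar with SOURCE_DATE_EPOCH, the idea can be sketched in a few lines of Python. This is an illustrative example rather than one of the snippets added to the project documentation; the helper names build_timestamp and build_date are hypothetical. A build tool that embeds a date reads the variable (a Unix timestamp in seconds) when it is set and only falls back to the current time otherwise:

```python
import os
import time


def build_timestamp(environ=os.environ) -> int:
    """Return SOURCE_DATE_EPOCH as an integer if set, else the current time."""
    value = environ.get("SOURCE_DATE_EPOCH")
    if value is not None:
        return int(value)
    return int(time.time())


def build_date(environ=os.environ) -> str:
    """Format the build timestamp as a UTC date, independent of the local timezone."""
    return time.strftime("%Y-%m-%d", time.gmtime(build_timestamp(environ)))


if __name__ == "__main__":
    print(build_date())
```

Formatting in UTC matters as much as reading the variable: a local-time conversion would reintroduce a dependency on the build machine's timezone.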

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including: Vagrant Cascadian submitted patches via the Debian bug tracking system targeting the packages the Civil Infrastructure Platform has identified via the CIP and CIP build depends package sets:

Testing framework
We operate a fully-featured and comprehensive Jenkins-based testing framework that powers tests.reproducible-builds.org. This month, the following changes were made by Holger Levsen:
In addition, Mattia Rizzolo added an Apache web server redirect for buildinfos.debian.net and reverted the reshuffling of arm64 architecture builders. The usual build node maintenance was performed by Holger Levsen, Mattia Rizzolo and Vagrant Cascadian.

Getting in touch If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

This month's report was written by Bernhard M. Wiedemann, Chris Lamb and Holger Levsen. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.

17 September 2017

Russ Allbery: Free software log (July and August 2017)

I've wanted to start making one of these posts for a few months but have struggled to find the time. But it seems like a good idea, particularly since I get more done when I write down what I do, so you all get a rather belated one. This covers July and August; hopefully the September one will come closer to the end of September.
Debian
August was DebConf, which included a ton of Policy work thanks to Sean Whitton's energy and encouragement. During DebConf, we incorporated work from Hideki Yamane to convert Policy to reStructuredText, which has already made it far easier to maintain. (Thanks also to David Bremner for a lot of proofreading of the result.) We also did a massive bug triage and closed a ton of older bugs on which there had been no forward progress for many years. After DebConf, as expected, we flushed out various bugs in the reStructuredText conversion and build infrastructure. I fixed a variety of build and packaging issues and started doing some more formatting cleanup, including moving some footnotes to make the resulting document more readable. During July and August, partly at DebConf and partly not, I also merged wording fixes for seven bugs and proposed wording (not yet finished) for three more, as well as participated in various Policy discussions. Policy was nearly all of my Debian work over these two months, but I did upload a new version of the webauth package to build with OpenSSL 1.1 and drop transitional packages.
Kerberos
I still haven't decided my long-term strategy with the Kerberos packages I maintain. My personal use of Kerberos is now fairly marginal, but I still care a lot about the software and can't convince myself to give it up. This month, I started dusting off pam-krb5 in preparation for a new release. There's been an open issue for a while around defer_pwchange support in Heimdal, and I spent some time on that and tracked it down to an upstream bug in Heimdal as well as a few issues in pam-krb5. The pam-krb5 issues are now fixed in Git, but I haven't gotten any response upstream from the Heimdal bug report. I also dusted off three old Heimdal patches and submitted them as upstream merge requests and reported some more deficiencies I found in FAST support. On the pam-krb5 front, I updated the test suite for the current version of Heimdal (which changed some of the prompting) and updated the portability support code, but haven't yet pulled the trigger on a new release.
Other Software
I merged a couple of pull requests in podlators, one to fix various typos (thanks, Jakub Wilk) and one to change the formatting of man page references and function names to match the current Linux manual page standard (thanks, Guillem Jover). I also documented a bad interaction with line-buffered output in the Term::ANSIColor man page. Neither of these have seen a new release yet.

2 September 2017

David Bremner: Indexing Debian's buildinfo

Introduction
Debian is currently collecting buildinfo but they are not very conveniently searchable. Eventually Chris Lamb's buildinfo.debian.net may solve this problem, but in the meantime, I decided to see how practical indexing the full set of buildinfo files is with sqlite.

Hack
  1. First you need a copy of the buildinfo files. This is currently about 2.6G, and unfortunately you need to be a Debian developer to fetch it.
     $ rsync -avz mirror.ftp-master.debian.org:/srv/ftp-master.debian.org/buildinfo .
    
  2. Indexing takes about 15 minutes on my 5 year old machine (with an SSD). If you index all dependencies, you get a database of about 4G, probably because of my natural genius for database design. Restricting to debhelper and dh-elpa, it's about 17M.
     $ python3 index.py
    
    You need at least python3-debian installed.
  3. Now you can do queries like
     $ sqlite3 depends.sqlite "select * from depends where depend='dh-elpa' and depend_version<='0106'"
    
    where 0106 is some ad hoc normalization of 1.6.
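The post does not show index.py, so the following is only a minimal sketch of the idea, not the author's script. It assumes the dependencies come from the Installed-Build-Depends field of each buildinfo file, that the table is named depends with the columns used in the query above, and that the "ad hoc normalization" zero-pads each dot-separated component to two digits. The sample field, the regular expression, and the function names are all made up for illustration, and the stdlib is used instead of python3-debian to keep the example self-contained.

```python
import re
import sqlite3

# Hypothetical excerpt of an Installed-Build-Depends field.
SAMPLE_FIELD = """
 debhelper (= 10.7.2),
 dh-elpa (= 1.6),
 emacs25 (= 25.2+1-6)
"""


def normalize_version(version):
    """Zero-pad each dot-separated component to two digits.

    "1.6" becomes "0106". This is a guess at the ad hoc scheme; it
    ignores epochs, Debian revisions, and non-numeric components.
    """
    return "".join(part.zfill(2) for part in version.split("."))


def parse_depends(field):
    """Yield (package, version) pairs from a build-depends field."""
    for match in re.finditer(r"(\S+) \(= ([^)]+)\)", field):
        yield match.group(1), match.group(2)


def index(conn, source, field, wanted=("debhelper", "dh-elpa")):
    """Record the wanted build dependencies of one source package."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS depends "
        "(source TEXT, depend TEXT, depend_version TEXT)"
    )
    rows = [
        (source, pkg, normalize_version(ver))
        for pkg, ver in parse_depends(field)
        if pkg in wanted
    ]
    conn.executemany("INSERT INTO depends VALUES (?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
index(conn, "some-elpa-package", SAMPLE_FIELD)
hits = conn.execute(
    "SELECT source, depend_version FROM depends "
    "WHERE depend = 'dh-elpa' AND depend_version <= '0106'"
).fetchall()
```

Comparing the padded strings only works while every component fits in two digits, which is one reason this kind of normalization is fragile.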

Conclusions
The version number hackery is pretty fragile, but good enough for my current purposes. A more serious limitation is that I don't currently have a nice (and you see how generous my definition of nice is) way of limiting to builds currently available e.g. in Debian unstable.

17 August 2017

Sean Whitton: DebCamp/DebConf17: reports on sprints and BoFs

In addition to my personal reflections on DebCamp/DebConf17, here is a brief summary of the activities that I had a hand in co-ordinating. I won't discuss here many other small items of work and valuable conversations that I had during the two weeks; hopefully the fruits of these will show themselves in my uploads to the archive over the next year.

Debian Policy sprint & BoF

Debian Emacs Team meeting/sprint
Unfortunately we didn't make any significant progress towards converting all addons to use dh_elpa, as the work is not that much fun. Might be worth a more focused sprint next year. Report on team website

Git for Debian packaging BoF & follow-up conversations
The BoF was far more about dgit than I had wanted; however, I think that this was mostly because people had questions about dgit, rather than any unintended lecturing by me. I believe that several people came away from DebConf thinking that starting to use dgit would improve Debian for themselves and for users of their packages.

19 July 2016

Michael Prokop: DebConf16 in Capetown/South Africa: Lessons learnt

DebConf 16 in Capetown/South Africa was fantastic for many reasons. My Capetown/South Africa/Culture/Flight related lessons: My technical lessons from DebConf16: BTW, thanks to the video team the recordings from the sessions are available online.

5 May 2016

Sean Whitton: dh_make_elpa & dh_elpa_test

I recently completed and released some work on Debian's tooling for packaging Emacs Lisp addons to GNU Emacs. Emacs grew a package manager called package.el a few years ago, and last year David Bremner wrote the dh_elpa tool to simplify packaging addons for Debian by leveraging package.el features. Packaging a series of addons for Debian left me with a wishlist of features for dh_elpa and I was recently able to implement them.

Debian tooling generally uses Perl, a language I didn't know before starting on this project. I was fortunate enough to receive a free review copy of Perl 5 by Example when I attended a meeting of the Bay Area Linux Users Group while I was visiting San Francisco a few months ago. I accepted the book with the intent of doing this work.

dh_make_elpa
dh_make_elpa (at present available from Debian experimental) is a Perl script to convert a git repository cloned from the upstream of an Emacs Lisp addon to a rudimentary Debian package. It performs a lot of guesswork, and its simple heuristics are something I hope to improve on. Since I am new to object-oriented program design in Perl and I wanted to leverage object-oriented Debian tooling library code, I took the structure of my project from dh_make_perl. In this manner I found it easy and pleasant to write a maintainable script.

dh_elpa_test
A lot of Emacs Lisp addon packages use a program called Cask to manage the Emacs Lisp dependencies needed to run their test suites. That means that dh_auto_test often fails to run Emacs Lisp addon package test suites. Since the Debian packaging toolchain already has advanced dependency management, it's undesirable to involve Cask in the package build pipeline if it can be avoided. I had been copying and pasting the code needed to make the tests run in our environment to the debian/rules files of each package whose test suite I wanted to run.

dh_elpa_test tries to detect Emacs Lisp addon package test suites and run them with the workarounds needed in our environment. This avoids boilerplate in debian/rules. dh_elpa_test also disables dh_auto_test to avoid an inadvertent Cask invocation.

Future & acknowledgements
My hope for this work was to make it easier and faster to package Emacs Lisp addon packages for Debian, for my own sake and for anyone new who is interested in joining the pkg-emacsen team. In the future, I want to have dh_elpa_test generate an autopkgtest definition so that a Testsuite: pkg-emacsen line in debian/control is enough to have an Emacs Lisp addon package test suite run on Debian CI. I'm very grateful to David Bremner for reviewing and supporting this work, and also for supporting my Emacs Lisp addon packaging work more generally.

1 February 2016

Lunar: Reproducible builds: week 40 in Stretch cycle

What happened in the reproducible builds effort between January 24th and January 30th:

Media coverage Holger Levsen was interviewed by the FOSDEM team to introduce his talk on Sunday 31st.

Toolchain fixes Jonas Smedegaard uploaded d-shlibs/0.63 which makes the order of dependencies generated by d-devlibdeps stable across locales. Original patch by Reiner Herrmann.

Packages fixed The following 53 packages have become reproducible due to changes in their build dependencies: appstream-glib, aptitude, arbtt, btrfs-tools, cinnamon-settings-daemon, cppcheck, debian-security-support, easytag, gitit, gnash, gnome-control-center, gnome-keyring, gnome-shell, gnome-software, graphite2, gtk+2.0, gupnp, gvfs, gyp, hgview, htmlcxx, i3status, imms, irker, jmapviewer, katarakt, kmod, lastpass-cli, libaccounts-glib, libam7xxx, libldm, libopenobex, libsecret, linthesia, mate-session-manager, mpris-remote, network-manager, paprefs, php-opencloud, pisa, pyacidobasic, python-pymzml, python-pyscss, qtquick1-opensource-src, rdkit, ruby-rails-html-sanitizer, shellex, slony1-2, spacezero, spamprobe, sugar-toolkit-gtk3, tachyon, tgt. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues, but not all of them:
  • gnubg/1.05.000-4 by Russ Allbery.
  • grcompiler/4.2-6 by Hideki Yamane.
  • sdlgfx/2.0.25-5 fix by Felix Geyer, uploaded by Gianfranco Costamagna.
Patches submitted which have not made their way to the archive yet:
  • #812876 on glib2.0 by Lunar: ensure that functions are sorted using the C locale when giotypefuncs.c is generated.
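Sorting "using the C locale", as in the glib2.0 patch above, means ordering names by raw byte value so the result does not depend on the builder's locale. The following is a small sketch of that idea only, not the actual patch; the function names in the sample list are made up.

```python
def c_locale_sorted(names):
    """Sort names by their encoded bytes, as `LC_ALL=C sort` would."""
    return sorted(names, key=lambda name: name.encode("utf-8"))


# Hypothetical function names. Many natural-language locales would
# interleave upper and lower case; byte order puts "G" (0x47) before
# "_" (0x5f) and both before lower-case letters.
ordered = c_locale_sorted(["g_file_get_type", "G_TYPE_APP", "g_app_get_type"])
```

Any generated file that lists these names in `ordered` order comes out identical on every build machine, whatever LANG or LC_COLLATE is set to.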

diffoscope development diffoscope 48 was released on January 26th. It fixes several issues introduced by the retrieval of extra symbols from Debian debug packages. It also restores compatibility with older versions of binutils which do not support readelf --decompress.

strip-nondeterminism development strip-nondeterminism 0.015-1 was uploaded on January 27th. It fixes handling of signed JAR files which are now going to be ignored to keep the signatures intact.

Package reviews 54 reviews have been removed, 36 added and 17 updated in the previous week. 30 new FTBFS bugs have been submitted by Chris Lamb, Michael Tautschnig, Mattia Rizzolo, Tobias Frost.

Misc. Alexander Couzens and Bryan Newbold have been busy fixing more issues in OpenWrt. Version 1.6.3 of FreeBSD's package manager pkg(8) now supports SOURCE_DATE_EPOCH. Ross Karchner did a lightning talk about reproducible builds at his work place and shared the slides.

24 January 2016

Lunar: Reproducible builds: week 39 in Stretch cycle

What happened in the reproducible builds effort between January 17th and January 23rd:

Toolchain fixes James McCoy uploaded subversion/1.9.3-2 which removes -Wdate-time from CPPFLAGS passed to swig enabling several packages to build again. The switch made in binutils/2.25-6 to use deterministic archives by default had the unfortunate effect of breaking a seldom used feature of make. Manoj Srivastava asked on debian-devel the best way to communicate the changes to Debian users. Lunar quickly came up with a patch that displays a warning when Make encounters deterministic archives. Manoj made it available in make/4.1-2 together with a NEWS file advertising the change. Following Guillem Jover's comment on the latest patch to make mtimes of packaged files deterministic, Daniel Kahn Gillmor updated and extended the patch adding the --clamp-mtime option to GNU Tar. Mattia Rizzolo updated texlive-bin in the reproducible experimental repository.
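The effect of clamping mtimes can be illustrated with Python's tarfile module: members newer than a reference time get that time, while older members keep their own. This is only a sketch of the behaviour that GNU Tar's --clamp-mtime (used together with --mtime) provides, not the patch itself; the helper names and the sample data are assumptions.

```python
import io
import tarfile


def clamp_mtime(limit):
    """Return a function capping a TarInfo's mtime at `limit`."""
    def _clamp(info):
        info.mtime = min(info.mtime, limit)
        return info
    return _clamp


def build_archive(members, limit):
    """Write (name, data, mtime) triples to an in-memory tar archive."""
    buffer = io.BytesIO()
    clamp = clamp_mtime(limit)
    with tarfile.open(fileobj=buffer, mode="w") as archive:
        for name, data, mtime in members:
            info = tarfile.TarInfo(name)
            info.size = len(data)
            info.mtime = mtime
            archive.addfile(clamp(info), io.BytesIO(data))
    return buffer.getvalue()


# A file older than the limit keeps its mtime; a newer one is clamped.
blob = build_archive([("old", b"a", 100), ("new", b"b", 5000)], limit=1000)
with tarfile.open(fileobj=io.BytesIO(blob)) as archive:
    mtimes = [member.mtime for member in archive.getmembers()]
```

Files generated during the build carry the build time, so clamping them to a fixed reference (for a Debian package, typically the changelog date) removes that variation without touching files shipped unmodified from upstream.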

Packages fixed The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues, but not all of them: Patches submitted which have not made their way to the archive yet:

reproducible.debian.net Transition from reproducible.debian.net to the more general tests.reproducible-builds.org has started. More visual changes are coming. (h01ger) A plan on how to run tests for F-Droid has been worked out. (hc, mvdan, h01ger) A first step has been made by adding a Jenkins job to setup an F-Droid build environment. (h01ger)

diffoscope development diffoscope 46 has been released on January 19th, followed-up by version 47 made available on January 23rd. Try it online at try.diffoscope.org! The biggest visible change is the improvement to ELF file handling. Comparisons are now done section by section, using the most appropriate tool and options to get meaningful results, thanks to Dhole's work and Mike Hommey's suggestions. Also suggested by Mike, symbols for IP-relative ops are now filtered out to remove clutter. Understanding differences in ELF files belonging to Debian packages should also be much easier as diffoscope will now try to extract debug information from the matching dbgsym package. This means the objdump disassembler should output line numbers for packages built with recent debhelper as long as the associated debug package is in the same directory. As diff tends to consume a huge amount of memory on large inputs, diffoscope has a limit in place to prevent crashes. diffoscope used to display a difference every time the limit was hit. Because this was confusing in case there were actually no differences, a hash is now internally computed to only report a difference when one exists. Files in archives and other container members are now compared in the original order. This should not matter in most cases but should overall give more predictable results. Debian .buildinfo files are now supported. Amongst other minor fixes and improvements, diffoscope will now properly compare symlinks in directories. Thanks Tuomas Tynkkynen for reporting the problem.

Package reviews 70 reviews have been removed, 125 added and 33 updated in the previous week, gcc-5 amongst others. 25 FTBFS issues have been filed by Chris Lamb, Daniel Stender, Martin Michlmayr.

Misc. The 16th FOSDEM will happen in Brussels, Belgium on January 30-31st. Several talks will be about reproducible builds: h01ger about the general ecosystem, Fabian Keil about the security oriented ElectroBSD, Baptiste Daroussin about FreeBSD packages, Ludovic Courtès about Guix.

14 January 2016

Lunar: Reproducible builds: week 37 in Stretch cycle

What happened in the reproducible builds effort between January 3rd and January 9th 2016:

Toolchain fixes David Bremner uploaded dh-elpa/0.0.18 which adds a --fix-autoload-date option (on by default) to take autoload dates from changelog. Lunar updated and sent the patch adding the generation of .buildinfo to dpkg.

Packages fixed The following packages have become reproducible due to changes in their build dependencies: aggressive-indent-mode, circe, company-mode, db4o, dh-elpa, editorconfig-emacs, expand-region-el, f-el, geiser, hyena, js2-mode, markdown-mode, mono-fuse, mysql-connector-net, openbve, regina-normal, sml-mode, vala-mode-el. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues, but not all of them: Patches submitted which have not made their way to the archive yet:
  • #809780 on flask-restful by Chris Lamb: implement support for SOURCE_DATE_EPOCH in the build system.
  • #810259 on avfs by Chris Lamb: implement support for SOURCE_DATE_EPOCH in the build system.
  • #810509 on apt by Mattia Rizzolo: ensure a stable file order is given to the linker.
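Supporting SOURCE_DATE_EPOCH, as in the flask-restful and avfs patches above, usually comes down to one rule: if the variable is set, use it as the timestamp instead of the current time. A minimal sketch of that rule follows; the function names are hypothetical and not taken from either patch.

```python
import os
import time


def build_timestamp(environ=os.environ):
    """Return the Unix timestamp to embed in build output."""
    epoch = environ.get("SOURCE_DATE_EPOCH")
    if epoch is not None:
        return int(epoch)      # reproducible: fixed by the build environment
    return int(time.time())    # fallback: the current time


def build_date(environ=os.environ):
    """Format the timestamp in UTC so the builder's time zone does not leak."""
    return time.strftime("%Y-%m-%d", time.gmtime(build_timestamp(environ)))
```

Formatting with gmtime rather than localtime matters as much as the variable itself, since two builders in different time zones would otherwise still disagree near midnight.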

reproducible.debian.net Two more armhf build nodes provided by Vagrant Cascadian have been added. This added 7 more armhf builder jobs. We now run around 900 tests of armhf packages each day. (h01ger) The footer of each page now indicates which Jenkins job built it. (h01ger)

diffoscope development diffoscope 45 has been released on January 4th. It features huge memory improvements when comparing large files, several fixes of squashfs related issues that prevented comparing two Tails images, and improvements to the file lists of tar and cpio archives, which are now more precise and consistent over time. It also fixes a typo that prevented Mach-O support from working (Rainer Müller), improves comparisons of ELF files when specified on the command line, and solves a few more encoding issues.

Package reviews 134 reviews have been removed, 30 added and 37 updated in the previous week. 20 new fail to build from source issues were reported by Chris Lamb and Chris West. prebuilder will now skip installing diffoscope to save time if the build results are identical. (Reiner Herrmann)

29 December 2015

David Bremner: Converting PDFs to DJVU

Today I was wondering about converting a pdf made from a scan of a book into djvu, hopefully to reduce the size, without too much loss of quality. My initial experiments with pdf2djvu were a bit discouraging, so I invested some time building gsdjvu in order to be able to run djvudigital. Watching the messages from djvudigital I realized that the reason it was achieving so much better compression was that it was using black and white for the foreground layer by default. I also figured out that the default 300dpi looks crappy since my source document is apparently 600dpi. I then went back and compared djvudigital to pdf2djvu a bit more carefully. My not-very-scientific conclusions: Perhaps most compellingly, the output from pdf2djvu has sensible metadata and is searchable in evince. Even with the --words option, the output from djvudigital is not. This is possibly related to the error messages like
Can't build /Identity.Unicode /CIDDecoding resource. See gs_ciddc.ps .
It could well be my fault, because building gsdjvu involved guessing at corrections for several errors. Some of these issues have to do with building software from 2009 (the instructions suggest building with ghostscript 8.64) in a modern toolchain; others I'm not sure. There was an upload of gsdjvu in February of 2015, somewhat to my surprise. AT&T has more or less crippled the project by licensing it under the CPL, which means binaries are not distributable, hence motivation to fix all the rough edges is minimal.
Version | kilobytes per page | position in figure
Original PDF | 80.9 | top
pdf2djvu --dpi=450 | 92.0 | not shown
pdf2djvu --monochrome --dpi=450 | 27.5 | second from top
pdf2djvu --monochrome --dpi=600 --loss-level=50 | 21.3 | second from bottom
djvudigital --dpi=450 | 29.4 | bottom
[Figure: djvu-compare.png]

23 December 2015

David Bremner: Offline key signing with caff

After a mildly ridiculous amount of effort I made a bootable-usb key. I then layered a bash script on top of a perl script on top of gpg. What could possibly go wrong?
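 #!/bin/bash
 # Usage: pass the file of exported keys as the first argument.
 infile=$1
 # Key IDs are field 5 of each "pub" record in gpg's colon-delimited listing.
 keys=$(gpg --with-colons  $infile | sed -n 's/^pub//p' | cut -f5 -d: )
 gpg --homedir $HOME/.caff/gnupghome --import $infile
 caff -R -m no "${keys[*]}"
 today=$(date +"%Y-%m-%d")
 output="$(pwd)/keys-$today.tar"
 # Append the mails caff generated today for each key to one tarball.
 for key in ${keys[*]}; do
     (cd $HOME/.caff/keys/;   tar rvf "$output" $today/$key.mail*)
 done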
The idea is that keys are exported to files on a networked host, the files are processed on an offline host, and the resulting tarball of mail messages sneakernetted back to the connected host.
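The key extraction in the script relies on gpg's colon-delimited listing, where each record is one line, the first field is the record type, and the fifth field of a "pub" record is the key ID. The same step can be sketched in Python as follows; the sample listing is made up and much shorter than real gpg output, and the function name is hypothetical.

```python
def key_ids(colon_listing):
    """Return the key ID of every "pub" record in a colon listing."""
    ids = []
    for line in colon_listing.splitlines():
        fields = line.split(":")
        if fields[0] == "pub" and len(fields) > 4:
            ids.append(fields[4])   # field 5, counting from 1 as cut does
    return ids


# Made-up records in the general shape of `gpg --with-colons` output.
SAMPLE = (
    "pub:-:4096:1:0123456789ABCDEF:1419811200:::-:\n"
    "uid:-::::1419811200::HASH::Example User <user@example.org>:\n"
    "sub:-:4096:1:FEDCBA9876543210:1419811200::::\n"
    "pub:-:2048:1:1111222233334444:1400000000:::-:\n"
)
found = key_ids(SAMPLE)
```

Only "pub" records are kept, so subkeys and user IDs in the exported file do not end up on the caff command line.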

21 December 2015

David Bremner: Bootable Debian USB

Umm. Somehow I thought this would be easier than learning about live-build. Probably I was wrong. There are probably many better tutorials on the web. Two useful observations: zeroing the key can eliminate mysterious grub errors, and systemd-nspawn is pretty handy. One thing that should have been obvious, but wasn't to me is that it's easier to install grub onto a device outside of any container.
Find device
 $ dmesg
Count sectors
 # fdisk -l /dev/sdx
Assume that every command after here is dangerous. Zero it out. This is overkill for a fresh key, but fixed a problem with reusing a stick that had a previous live distro installed on it.
 # dd if=/dev/zero of=/dev/sdx bs=1048576 count=$sectors
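If a count that matches the size of the key exactly is wanted, the sector count reported by fdisk has to be converted to the block size given to dd. A small sketch of that arithmetic, assuming 512-byte logical sectors (fdisk -l prints the actual sector size) and a hypothetical helper name:

```python
def dd_block_count(sectors, sector_size=512, block_size=1048576):
    """Number of block_size blocks needed to cover the whole device."""
    total_bytes = sectors * sector_size
    return -(-total_bytes // block_size)   # ceiling division


# Example: a key that fdisk reports as 15633408 sectors of 512 bytes.
blocks = dd_block_count(15633408)
```

Rounding up means the last block may run past the end of the device, in which case dd simply stops there with a "No space left on device" message.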
Partition the key. There are lots of options. I eventually used parted
 # parted
 (parted) mklabel msdos
 (parted) mkpart primary ext2 1 -1
 (parted) set 1 boot on
 (parted) quit
Make a file system
 # mkfs.ext2 /dev/sdx1
 # mount /dev/sdx1 /mnt
Install the base system
 # debootstrap --variant=minbase jessie /mnt http://httpredir.debian.org/debian/
Install grub (no chroot needed)
 # grub-install --boot-directory /mnt/boot /dev/sdx1
Set a root password
 # chroot /mnt
 # passwd root
 # exit
Set up fstab
# blkid -p /dev/sdx1 | cut -f2 -d' ' > /mnt/etc/fstab
Now edit to fix syntax, tell it the file system is ext2, etc. Now switch to systemd-nspawn, to avoid boring bind mounting, etc.
# systemd-nspawn -b -D /mnt
Log in to the container and install linux-base, linux-image-amd64, and grub-pc.
EDIT: fixed block size of dd, based on suggestion of tg.
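The fstab step above leaves a bare UUID="..." token in /mnt/etc/fstab, which then has to be edited by hand into a full entry. What that edit amounts to can be sketched as below. The mount options and the dump/pass values are common defaults chosen for illustration, not taken from the post, and the function name is hypothetical.

```python
def fstab_line(blkid_field, mount_point="/", fs_type="ext2",
               options="errors=remount-ro", dump=0, fsck_pass=1):
    """Turn a blkid token such as UUID="..." into one fstab entry."""
    key, _, value = blkid_field.strip().partition("=")
    if key != "UUID" or not value:
        raise ValueError(f"expected UUID=..., got {blkid_field!r}")
    uuid = value.strip('"')    # fstab wants the UUID without quotes
    return f"UUID={uuid} {mount_point} {fs_type} {options} {dump} {fsck_pass}"


# Made-up UUID in the format blkid prints.
entry = fstab_line('UUID="0a1b2c3d-0000-4000-8000-123456789abc"\n')
```

Mounting by UUID rather than by /dev/sdx1 keeps the key bootable on machines where it shows up under a different device name.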

11 December 2015

Lunar: Reproducible builds: week 32 in Stretch cycle

The first reproducible world summit was held in Athens, Greece, from December 1st-3rd with the support of the Linux Foundation, the Open Tech Fund, and Google. Faidon Liambotis has been an amazing help to sort out all local details. People at ImpactHub Athens have been perfect hosts.

[Picture: North of Athens from the Acropolis with ImpactHub in the center]

Nearly 40 participants from 14 different free software projects had very busy days sharing knowledge, building understanding, and producing actual patches. Anyone interested in cross project discussions should join the rb-general mailing-list. What follows focuses mostly on what happened for Debian this previous week. A more detailed report about the summit will follow soon. You can also read the ones from Joachim Breitner from Debian, Clemens Lang from MacPorts, Georg Koppen from Tor, Dhiru Kholia from Fedora, and Ludovic Courtès wrote one for Guix and for the GNU project.

[Picture: The Acropolis from]

Infrastructure Several discussions at the meeting helped refine a shared understanding of what kind of information should be recorded on a build, and how they could be used. Daniel Kahn Gillmor sent a detailed update on how .buildinfo files should become part of the Debian archive. Some key changes compared to what we had in mind at DebConf15: Hopefully, ftpmasters will be able to comment on the updated proposal soon.

Packages fixed The following packages have become reproducible due to changes in their build dependencies: fades, triplane, caml-crush, globus-authz. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues, but not all of them: Patches submitted which have not made their way to the archive yet: akira sent proposals on how to make bash reproducible. Alexander Couzens submitted a patch upstream to add support for SOURCE_DATE_EPOCH in grub image generator (#787795).

reproducible.debian.net An issue with some armhf build nodes was tracked down to a bad interaction between uname26 personality and new glibc (Vagrant Cascadian). A Debian package was created for koji, the RPM building and tracking system used by Fedora amongst others. It is currently waiting for review in the NEW queue. (Ximin Luo, Marek Marczykowski-Górecki)

diffoscope development diffoscope now has a dedicated mailing list to better accommodate its growing user and developer base. Going through diffoscope's guts together enabled several new contributors. Baptiste Daroussin, Ed Maste, Clemens Lang, Mike McQuaid, Joachim Breitner all contributed their first patches to improve portability or add new features. Regular contributors Chris Lamb, Reiner Herrmann, and Levente Polyak also submitted improvements.

[Picture: diffoscope hacking session in Athens]

The next release should support more operating systems, filesystem image comparison via libguestfs, HTML reports with on-demand loading, and parallel processing for the most noticeable improvements.

Package reviews 27 reviews have been removed, 17 added and 14 updated in the previous week. Chris Lamb and Val Lorentz filed 4 new FTBFS reports.

Misc. Baptiste Daroussin has started to implement support for SOURCE_DATE_EPOCH in FreeBSD in libpkg and the ports tree. Thanks Joachim Breitner and h01ger for the pictures.

20 February 2015

David Bremner: Dear Lenovo, it's not me, it's you.

I've been a mostly happy Thinkpad owner for almost 15 years. My first Thinkpad was a 570, followed by an X40, an X61s, and an X220. There might have been one more in there, my archives only go back a decade. Although it's lately gotten harder to buy Thinkpads at UNB as Dell gets better contracts with our purchasing people, I've persevered, mainly because I'm used to the Trackpoint, and I like the availability of hardware service manuals. Overall I've been pleased with the engineering of the X series. Over the last few days I learned about the installation of the superfish malware on new Lenovo systems, and Lenovo's completely inadequate response to the revelation. I don't use Windows, so this malware would not have directly affected me (unless I had the misfortune to use this system to download installation media for some GNU/Linux distribution). Nonetheless, how can I trust the firmware installed by a company that seems to value its users' security and privacy so little? Unless Lenovo can show some sign of understanding the gravity of this mistake, and undertake not to repeat it, then I'm afraid you will be joining Sony on my list of vendors I used to consider buying from. Sure, it's only a gross income loss of $500 a year or so, if you assume I'm alone in this reaction. I don't think I'm alone in being disgusted and angered by this incident.

25 April 2013

David Bremner: Exporting Debian packaging patches from git, (redux)*

(Debian) packaging and Git.
The big picture is as follows. In my view, the most natural way to work on a packaging project in version control [1] is to have an upstream branch which either tracks upstream Git/Hg/Svn, or imports of tarballs (or some combination thereof), and a Debian branch where both modifications to upstream source and commits to stuff in ./debian are added [2]. Deviations from this are mainly motivated by a desire to export source packages, a version control neutral interchange format that still preserves the distinction between upstream source and distro modifications. Of course, if you're happy with the distro modifications as one big diff, then you can stop reading now
gitpkg $debian_branch $upstream_branch
and you're done. The other easy case is if your changes don't touch upstream; then 3.0 (quilt) packages work nicely with ./debian in a separate tarball.

So the tension is between my preferred integration style, and making source packages with changes to upstream source organized in some nice way, preferably in logical patches like, uh, commits in a version control system. At some point we may be able to use some form of version control repo as a source package, but the issues with that are for another blog post. At the moment then we are stuck with trying to bridge the gap between a git repository and a 3.0 (quilt) source package. If you don't know the details of Debian packaging, just imagine a patch series like you would generate with git format-patch or apply with (surprise) quilt.

From Git to Quilt.
The most obvious (and the most common) way to bridge the gap between git and quilt is to export patches manually (or using a helper like gbp-pq) and commit them to the packaging repository. This has the advantage of not forcing anyone to use git or specialized helpers to collaborate on the package. On the other hand it's quite far from the vision of using git (or your favourite VCS) to do the integration that I started with.

The next level of sophistication is to maintain a branch of upstream-modifying commits. Roughly speaking, this is the approach taken by git-dpm, by gitpkg, and with some additional friction from manually importing and exporting the patches, by gbp-pq. There are some issues with rebasing a branch of patches, mainly it seems to rely on one person at a time working on the patch branch, and it forces the use of specialized tools or workflows. Nonetheless, both git-dpm and gitpkg support this mode of working reasonably well [3].

Lately I've been working on exporting patches from (an immutable) git history. My initial experiments with marking commits with git notes more or less worked [4]. I put this on the back-burner for two reasons, first sharing git notes is still not very well supported by git itself [5], and second Gitpkg maintainer Ron Lee convinced me to automagically pick out what patches to export. Ron's motivation (as I understand it) is to have tools which work on any git repository without extra metadata in the form of notes.

Linearizing History on the fly.
After a few iterations, I arrived at the following specification. Condition (4) suggests we want something roughly like git format-patch upstream..head, removing those patches which are only about Debian packaging. Because of (3), we have to be a bit careful about commits that touch upstream and ./debian. We also want to avoid outputting patches that have been applied (or worse partially applied) upstream. git patch-id can help identify cherry-picked patches, but not partial application. Eventually I arrived at the following strategy.
  1. Use git-filter-branch to construct a copy of the history upstream..head with ./debian (and for technical reasons .pc) excised.
  2. Filter these commits to remove e.g. those that are present exactly upstream, or those that introduce no changes, or changes unrepresentable in a patch.
  3. Try to revert the remaining commits, in reverse order. The idea here is twofold. First, a patch that occurs twice in history because of merging will only revert the most recent one, allowing earlier copies to be skipped. Second, the state of the temporary branch after all successful reverts represents the difference from upstream not accounted for by any patch.
  4. Generate a "fixup patch" accounting for any remaining differences, to be applied before any of the "nice" patches.
  5. Cherry-pick each "nice" patch on top of the fixup patch, to ensure we have a linear history that can be exported to quilt. If any of these cherry-picks fail, abort the export.
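Steps 3 to 5 can be illustrated on a toy model in which a tree is a dict from path to content and a patch is a dict from path to a (before, after) pair. This is a sketch of the strategy as described above, under those simplifying assumptions; it is not how git-debcherry is implemented, and all names in it are hypothetical.

```python
def diff(old, new):
    """Patch turning tree `old` into tree `new`: {path: (before, after)}."""
    return {
        path: (old.get(path), new.get(path))
        for path in sorted(set(old) | set(new))
        if old.get(path) != new.get(path)
    }


def apply_patch(tree, patch, reverse=False):
    """Apply (or revert) a patch; raise ValueError if it does not fit."""
    result = dict(tree)
    for path, (before, after) in patch.items():
        if reverse:
            before, after = after, before
        if result.get(path) != before:
            raise ValueError(f"patch does not apply to {path}")
        if after is None:
            result.pop(path, None)
        else:
            result[path] = after
    return result


def linearize(upstream, head, patches):
    """Return [fixup, nice patches...] rebuilding `head` from `upstream`."""
    tree, nice = head, []
    for patch in reversed(patches):          # step 3: revert newest first
        try:
            tree = apply_patch(tree, patch, reverse=True)
        except ValueError:
            continue                         # e.g. older copy of a merged patch
        nice.insert(0, patch)                # keep the original order
    series = [diff(upstream, tree)] + nice   # step 4: fixup patch goes first
    check = upstream
    for patch in series:                     # step 5: must apply linearly;
        check = apply_patch(check, patch)    # a ValueError aborts the export
    assert check == head
    return series


upstream = {"a.c": "1", "b.c": "x"}
head = {"a.c": "2", "b.c": "y", "c.c": "new"}   # c.c is in no patch
fix_a = {"a.c": ("1", "2")}
fix_b = {"b.c": ("x", "y")}
# fix_a occurs twice in history, as it would after a merge.
series = linearize(upstream, head, [fix_a, fix_b, fix_a])
```

In this example only the most recent copy of fix_a reverts cleanly, so the older copy is skipped, and the unexplained file c.c ends up in the fixup patch at the front of the series.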
Yep, it seems over-complicated to me too.

TL;DR: Show me the code.
You can clone my current version from
git://pivot.cs.unb.ca/gitpkg.git
This provides a script "git-debcherry" which does the history linearization discussed above. In order to test out how/if this works in your repository, you could run
git-debcherry --stat $UPSTREAM
For actual use, you probably want to use something like
git-debcherry -o debian/patches
There is a hook in hooks/debcherry-deb-export-hook that does this at source package export time. I'm aware this is not that fast; it does several expensive operations. On the other hand, you know what Don Knuth says about premature optimization, so I'm more interested in reports of when it does and doesn't work. In addition to crashing, generating a multi-megabyte "fixup patch" probably counts as failure.

Notes
  1. This first part doesn't seem too Debian or git specific to me, but I don't know much concrete about other packaging workflows or other version control systems.
  2. Another variation is to have a patched upstream branch and merge that into the Debian packaging branch. The trade-off here is that you can simplify the patch export process a bit, but the repo needs to have taken this disciplined approach from the beginning.
  3. git-dpm merges the patched upstream into the Debian branch. This makes the history a bit messier, but seems to be more robust. I've been thinking about trying this out (semi-manually) for gitpkg.
  4. See e.g. exporting. Although I did not then know the many surprising and horrible things people do in packaging histories, so it probably didn't work as well as I thought it did.
  5. It's doable, but one ends up spending a bunch of lines of code on duplicating basic git functionality; e.g. there is no real support for tags of notes.
  6. Since as far as I know quilt has no way of deleting files except to list the content, this means in particular exporting upstream should yield a DFSG Free source tree.
