Given a typical install of 3 generic kernel ABIs in the default configuration on a regular-sized VM (2 CPU cores, 8GB of RAM), the following metrics are achieved in Ubuntu 23.10 versus Ubuntu 22.04 LTS:
2x less disk space used (1,417MB vs 2,940MB, including initrd)
3x less peak RAM usage for the initrd boot (68MB vs 204MB)
1.6x larger download size (949MB vs 600MB)
2.5x faster initrd generation (4.5s vs 11.3s)
approximately the same total time (103s vs 98s, hardware dependent)
For minimal cloud images that do not install either linux-firmware or modules extra the numbers are:
1.3x less disk space used (548MB vs 742MB)
2.2x less peak RAM usage for initrd boot (27MB vs 62MB)
1.4x larger download size (207MB vs 146MB)
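The ratios in both lists can be recomputed from the raw figures. The sketch below uses only the numbers quoted above; the dictionary and function names are made up for illustration. It shows, for instance, that the download grows by roughly 1.6x for the full install and 1.4x for the minimal cloud image.

```python
# (Ubuntu 23.10, Ubuntu 22.04 LTS) pairs copied from the lists above.
FULL_INSTALL = {
    "disk space (MB)": (1417, 2940),
    "peak initrd RAM (MB)": (68, 204),
    "download size (MB)": (949, 600),
    "initrd generation (s)": (4.5, 11.3),
}
MINIMAL_CLOUD = {
    "disk space (MB)": (548, 742),
    "peak initrd RAM (MB)": (27, 62),
    "download size (MB)": (207, 146),
}


def ratio(new, old):
    """Return how many times larger the bigger figure is than the smaller one."""
    return max(new, old) / min(new, old)


def summarise(figures):
    """Map each metric to (ratio, direction), direction being 23.10 relative to 22.04."""
    return {
        name: (round(ratio(new, old), 1), "lower" if new < old else "higher")
        for name, (new, old) in figures.items()
    }


if __name__ == "__main__":
    for label, figures in (("full install", FULL_INSTALL), ("minimal cloud", MINIMAL_CLOUD)):
        for name, (factor, direction) in summarise(figures).items():
            print(f"{label}: {name} is {factor}x {direction} in 23.10")
```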
Hopefully, the compromise of download size, relative to the disk space & initrd savings is a win for the majority of platforms and use cases. For users on extremely expensive and metered connections, the likely best saving is to receive air-gapped updates or skip updates.
This was achieved by precompressing kernel modules & firmware files with the maximum level of Zstd compression at package build time; making the actual .deb files uncompressed; assembling the initrd from split cpio archives (uncompressed for the pre-compressed files, compressed only for the userspace portions of the initrd); enabling in-kernel module decompression support with a matching kmod; fixing bugs in all of the above; and landing all of these things in time for the feature freeze, whilst leveraging the experience and some of the design choices and implementations we have already been shipping on Ubuntu Core. Some of these changes are backported to Jammy, but only enough to support smooth upgrades to Mantic and later. The complete gains are only available on Mantic and later.
The bugs discovered in the kernel module loading code likely affect systems that use the LoadPin LSM with kernel-space module decompression, as used on ChromeOS systems. Hopefully, Kees Cook or other ChromeOS developers pick up the kernel fixes from the stable trees. Or you know, just use Ubuntu kernels as they do get fixes and features like these first.
The team that designed and delivered these changes is large: Benjamin Drung, Andrea Righi, Juerg Haefliger, Julian Andres Klode, Steve Langasek, Michael Hudson-Doyle, Robert Kratky, Adrien Nader, Tim Gardner, Roxana Nicolescu - and myself Dimitri John Ledkov ensuring the most optimal solution is implemented, everything lands on time, and even implementing portions of the final solution.
Hi, It's me, I am a Staff Engineer at Canonical and we are hiring https://canonical.com/careers.
Lots of additional technical details and benchmarks on a huge range of diverse hardware and architectures, and bikeshedding all the things below:
/ to /usr, we will also run into the problems that the file move moratorium was meant to prevent. The way forward is detecting them early and applying workarounds on a per-package basis. Said detection is now automated using the Debian Usr Merge Analysis Tool. As problems are reported to the bug tracking system, they are connected to the reports if properly usertagged. Bugs and patches for problem categories DEP17-P2 and DEP17-P6 have been filed.
After consensus has been reached on the bootstrapping matters, debootstrap has been changed to swap the initial unpack and merging to avoid unpack errors due to pre-existing links. This is a precondition for having base-files install the aliasing symbolic links eventually.
It was identified that the root filesystem used by the Debian installer is still unmerged and a change has been proposed.
debhelper was changed to recognize systemd units installed to /usr.
A discussion with the CTTE and release team on repealing the moratorium has been initiated.
dpkg-buildflags now defaults to issue arm64-specific compiler flags, more care is needed to distinguish between build architecture flags and host architecture flags than previously.
The 2020 Solarwinds attack was a tipping point that caused a heightened awareness about the security of the software supply chain and in particular the large amount of trust placed in build systems. Reproducible Builds (R-Bs) provide a strong foundation to build defenses for arbitrary attacks against build systems by ensuring that given the same source code, build environment, and build instructions, bitwise-identical artifacts are created. (PDF)
I have identified 16 root causes for unreproducible builds in my empirical study, which I have linked to the corresponding documentation. The initial MR right now contains information about 10 root causes. For each root cause, I have provided a definition, a notable instance, and a workaround. However, I have only found workarounds for 5 out of the 10 root causes listed in this merge request. In the upcoming commits, I plan to add an additional 6 root causes. I kindly request you review the text for any necessary refinements, modifications, or corrections. Additionally, I would appreciate the help with documentation for the solutions/workarounds for the remaining root causes: Archive Metadata, Build ID, File System Ordering, File Permissions, and Snippet Encoding. Your input on the identified root causes for unreproducible builds would be greatly appreciated. [ ]
while packaging govulncheck for Arch Linux I noticed a checksum mismatch for a tar file I downloaded from go.googlesource.com. I used diffoscope to compare the .tar file I downloaded with the .tar file the build server downloaded, and noticed the timestamps are different.
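The timestamp mismatch described above is one case of archive metadata leaking into an artifact. As a minimal sketch (not the method used by any tool mentioned here), the Python below writes a tar archive with sorted entries and fixed timestamps, ownership and permissions, so that two trees with identical file contents give identical bytes. The function name and the choice to archive regular files only are assumptions made for brevity.

```python
import io
import os
import tarfile
import tempfile


def normalised_tar_bytes(src_dir, mtime=0):
    """Archive the regular files under src_dir with normalised metadata."""
    buf = io.BytesIO()
    # Uncompressed tar in a pinned format, so no compressor header or
    # format default can vary between runs.
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        paths = []
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # fix the traversal order of subdirectories
            paths.extend(os.path.join(root, name) for name in sorted(files))
        for path in paths:
            info = tar.gettarinfo(path, arcname=os.path.relpath(path, src_dir))
            info.mtime = mtime            # drop the on-disk timestamp
            info.uid = info.gid = 0       # drop the builder's numeric ids
            info.uname = info.gname = ""  # drop the builder's user/group names
            info.mode = 0o644             # flatten permissions (sketch only)
            with open(path, "rb") as fh:
                tar.addfile(info, fh)
    return buf.getvalue()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tree:
        with open(os.path.join(tree, "hello.txt"), "w") as fh:
            fh.write("hello\n")
        print(len(normalised_tar_bytes(tree)), "bytes")
```

Flattening every mode to 0o644 discards executable bits; a real build would more likely mask the mode than overwrite it.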
ffile_prefix_map_passed_to_clang being fixed since Debian bullseye and adding a Debian bug tracker reference for the nondeterminism_added_by_pyqt5_pyrcc5 issue.
In addition, Roland Clobus posted another detailed update of the status of reproducible Debian ISO images on our mailing list. In particular, Roland helpfully summarised that live images are looking good, and the number of (passing) automated tests is growing.
util.inspect.object_description attempts to sort collections, but this can fail. The change handles the failure case by using string-based object descriptions as a fallback deterministic sort ordering, as well as adding recursive object-description calls for list and tuple datatypes. As a result, documentation generated by Sphinx will be more likely to be automatically reproducible.
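The fallback idea can be sketched in a few lines of Python. This is an illustration of the technique only, not the Sphinx implementation; the names stable_sorted and describe are hypothetical.

```python
def describe(obj):
    """Return a description of obj whose set ordering is deterministic."""
    if isinstance(obj, (set, frozenset)):
        if not obj:
            return "frozenset()" if isinstance(obj, frozenset) else "set()"
        inner = ", ".join(describe(item) for item in stable_sorted(obj))
        return "frozenset({%s})" % inner if isinstance(obj, frozenset) else "{%s}" % inner
    if isinstance(obj, list):
        # Recurse so that sets nested inside lists are also normalised.
        return "[%s]" % ", ".join(describe(item) for item in obj)
    if isinstance(obj, tuple):
        inner = ", ".join(describe(item) for item in obj)
        return "(%s,)" % inner if len(obj) == 1 else "(%s)" % inner
    return repr(obj)


def stable_sorted(items):
    """Sort natively when possible, else by string description."""
    try:
        return sorted(items)
    except TypeError:
        # Unorderable mix (for example int and str): fall back to a
        # string-based key, which is always comparable.
        return sorted(items, key=describe)


if __name__ == "__main__":
    print(describe({3, 1, 2}))
    print(describe([("b", {2, 1}), {"x", 1}]))
```

The string-based key does not give a meaningful order for mixed types, only a repeatable one, which is all that reproducible output needs.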
Lastly in news, kpcyrd posted to our mailing list announcing a new repro-env tool:
My initial interest in reproducible builds was how do I distribute pre-compiled binaries on GitHub without people raising security concerns about them. I've cycled back to this original problem about 5 years later and built a tool that is meant to address this.
django-graphql-jwt (fails to build in 2038)
doxygen (filesystem ordering issue)
git-interactive-rebase-tool (date-related issue)
obs-build
procmeter (parallelism race condition)
promu
python-cx_Freeze (version update for year 2038 fix)
python-zope.deprecation
python310 (ASLR-related issue)
python-control (fails to build -j4)
python-DateTime (fails to build in 2038)
python-pyface (date/time-related issue)
python-quantities (date/time-related issue)
python-scipy (date/time-related issue)
rpmlint
starship (filesystem ordering issue)
Telethon
xindy (fails to build in 2036)
yt (filesystem ordering issue)
python-bpython, python-flup, python-mysqlclient, python-waitress, python-WebOb, python-WebTest, python-zope.event, python-zope.hookable & python-zope.i18nmessageid
dotenv-cli.
unity-java.
ruby-babosa (forwarded upstream).
guidata (forwarded upstream).
SOURCE_DATE_EPOCH, a three-and-a-half year effort started by Bernhard M. Wiedemann in January 2020, taken over by John Neffenger in March 2021, integrated upstream in June 2023, and available starting with JavaFX 21 on September 19, 2023.
244, 245 and 246 were uploaded to Debian unstable by Chris Lamb, who also made the following changes:
libarchive-5.
test_dex::test_javap_14_differences test requires the procyon tool.
assert_diff in the .ico and .jpeg tests.
XFAIL due to Debian bugs #1040941 & #1040916.
create_meta_pkg_sets job into two (for Debian unstable and Debian testing) to halve the job runtime to approximately 90 minutes.
postgresql_autodoc is back in Debian bookworm.
kfreebsd-related tests now that it's officially dead.
dpkg-db-backup and munin-node services.
#reproducible-builds on irc.oftc.net.
rb-general@lists.reproducible-builds.org
dd if=/dev/sda of=/dev/disk/by-id/usb0 status=progress
If you confirm that dd is available on the netinst image and the previous command runs successfully, test that your windows partition is visible in the new disk's partition table. The start block of the windows partition on each should match, as should the partition size.
fdisk -l /dev/disk/by-id/usb0
fdisk -l /dev/sda
If the output from the first is the same as the output from the second, then you are probably safe to proceed. Once you confirm that you have made and tested a full copy of the blocks from your windows drive saved on your usb disk, nuke your windows partition table from orbit.
dd if=/dev/zero of=/dev/sda bs=1M count=42
You can press alt-f1 to return to the Debian installer now. Follow the instructions to install Debian. Don't forget to remove all attached USB drives. Once you install Debian, press ctrl-alt-f3 to get a root shell. Add your user to the sudo group:
# adduser cjac sudo
log out
# exit
log in as your user and confirm that you have sudo
$ sudo ls
Don't forget to read the spider man advice
enter your password
you'll need to install virt-manager. I think this should help:
$ sudo apt-get install virt-manager libvirt-daemon-driver-qemu qemu-system-x86
insert the USB drive. You can now create a qcow2 file for your virtual machine.
$ sudo qemu-img convert -O qcow2 \
    /dev/disk/by-id/usb0 \
    /var/lib/libvirt/images/windows.qcow2
I personally create a volume group called /dev/vg00 for the stuff I want to run raw and instead of converting to qcow2 like all of the other users do, I instead write it to a new logical volume.
sudo lvcreate /dev/vg00 -n windows -L 42G # or however large your drive was
sudo dd if=/dev/disk/by-id/usb0 of=/dev/vg00/windows status=progress
Now that you've got the qcow2 file created, press alt-left until you return to your GDM session. The apt-get install command above installed virt-manager, so log in to your system if you haven't already and open up gnome-terminal by pressing the windows key or moving your mouse/gesture to the top left of your screen. Type in gnome-terminal and either press enter or click/tap on the icon. I like to run this full screen so that I feel like I'm in a space ship. If you like to feel like you're in a spaceship, too, press F11. You can start virt-manager from this shell or you can press the windows key and type in virt-manager and press enter. You'll want the shell to run commands such as virsh console windows or virsh list. When virt-manager starts, right click on QEMU/KVM and select New.
Welcome to the April 2023 report from the Reproducible Builds project! In these reports we outline the most important things that we have been up to over the past month. And, as always, if you are interested in contributing to the project, please visit our Contribute page on our website.
The absolute number may not be impressive, but what I hope is at least a useful contribution is that there actually is a number on how much of Trisquel is reproducible. Hopefully this will inspire others to help improve the actual metric.
Simon wrote another blog post this month on a new tool to ensure that updates to Linux distribution archive metadata (eg. via apt-get update) will only use files that have been recorded in a globally immutable and tamper-resistant ledger. A similar solution exists for Arch Linux (called pacman-bintrans) which was announced in August 2021 where an archive of all issued signatures is publicly accessible.
[...] the third reduction of the Guix bootstrap binaries has now been merged in the main branch of Guix! If you run guix pull today, you get a package graph of more than 22,000 nodes rooted in a 357-byte program, something that had never been achieved, to our knowledge, since the birth of Unix.
More info about this change is available on the post itself, including:
The full-source bootstrap was once deemed impossible. Yet, here we are, building the foundations of a GNU/Linux distro entirely from source, a long way towards the ideal that the Guix project has been aiming for from the start.
There are still some daunting tasks ahead. For example, what about the Linux kernel? The good news is that the bootstrappable community has grown a lot, from two people six years ago there are now around 100 people in the #bootstrappable IRC channel.
Pythia is the only publicly available model suite that includes models that were trained on the same data in the same order [and] all the corresponding data and tools to download and replicate the exact training process are publicly released to facilitate further research.
These properties are intended to allow researchers to understand how gender bias (etc.) can be affected by training data and model scale.
alembic Debian package to build reproducibly. Although Chris Lamb was able to identify the source problem and provided a potential patch that might fix it, James Addison has taken the issue in hand, leading to a large amount of activity resulting in a proposed pull request that is waiting to be merged.
A software bill of materials (SBOM) is defined as a nested inventory for software, a list of ingredients that make up software components. When you receive a physical delivery of some sort, the bill of materials tells you what's inside the box. Similarly, when you use software created outside of your organisation, the SBOM tells you what's inside that software. The SBOM is a file that declares the software supply chain (SSC) for that specific piece of software.
#reproducible-builds IRC channel, but no solution appears to be in sight for now.
.zip file were different between two builds.
: .lead appearing in the page, made all the Back to who is involved links italics, and corrected the syntax of the _data/sponsors.yml file.
build-essential package set, which was inspired by how close we are to making the Debian build-essential set reproducible and how important that set of packages are in general. Vagrant mentioned that: I have some progress, some hope, and I daresay, some fears.
dinstalls, that is to say, the snapshot service is not capturing 100% of all historical states of the Debian archive. This is relevant to reproducibility because without the availability of historical versions, it becomes impossible to repeat a build at a future date in order to correlate checksums.
build_path_in_line_annotations_added_by_ruby_ragel toolchain issue.
ghc (workaround a parallelism-related issue)
ghc (report a parallelism-related issue)
ruby-regexp-parser
pike8.0
binutils
lomiri-action-api
lomiri
nmodl
php8.2
qemu
twisted
boost1.74
php8.2
shaderc
jackd2
241 was uploaded to Debian unstable by Chris Lamb. It included contributions already covered in previous months as well as a change by Chris Lamb to add a missing raise statement that was accidentally dropped in a previous commit.
/tmp/archlinux-ci/ after three days.
schroot sessions.
stretch Debian distribution.
pyyaml 6.0 as present in Debian bookworm.
#reproducible-builds on irc.oftc.net.
rb-general@lists.reproducible-builds.org
Changes in Rcpp release version 1.0.10 (2023-01-12)
- Changes in Rcpp API:
  - Unwind protection is enabled by default (Iñaki in #1225). It can be disabled by defining RCPP_NO_UNWIND_PROTECT before including Rcpp.h. RCPP_USE_UNWIND_PROTECT is not checked anymore and has no effect. The associated plugin unwindProtect is therefore deprecated and will be removed in a future release.
  - The 'finalize' method for Rcpp Modules is now eagerly materialized, fixing an issue where errors can occur when Module finalizers are run (Kevin in #1231 closing #1230).
  - Zero-row data.frame objects can receive push_back or push_front (Dirk in #1233 fixing #1232).
  - One remaining sprintf has been replaced by snprintf (Dirk and Kevin in #1236 and #1237).
  - Several conversion warnings found by clang++ have been addressed (Dirk in #1240 and #1241).
- Changes in Rcpp Attributes:
- Changes in Rcpp Deployment:
  - Several GitHub Actions have been updated.
Thanks to my CRANberries, you can also look at a diff to the previous release. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. Bug reports are welcome at the GitHub issue tracker as well (where one can also search among open or closed issues); questions are also welcome under the rcpp tag at StackOverflow which also allows searching among the (currently) 2932 previous questions.
If you like this or other open-source work I do, you can sponsor me at GitHub.
This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.
.apk files shipped by a number of free-software instant messenger applications. These scripts are often necessary in the Android/APK ecosystem due to these files containing embedded signatures so the conventional bit-for-bit comparison cannot be used. After detailing a litany of issues with these tools, they come to the conclusion that:
It's quite possible these messengers actually have reproducible builds, but the verification scripts they use don't actually allow us to verify whether they do.
This reflects the consensus view within the Reproducible Builds project: pursuing a situation in language or package ecosystems where binaries are bit-for-bit identical (over requiring a bespoke ecosystem-specific tool) is not a luxury demanded by purist engineers, but rather the only practical way to demonstrate reproducibility. obfusk also announced the first release of their own set of tools on our mailing list. Related to this, obfusk also posted to an issue filed against Mastodon regarding the difficulties of creating bit-by-bit identical APKs, especially with respect to copying v2/v3 APK signatures created by different tools; they also reported that some APK ordering differences were not caused by building on macOS after all, but by using Android Studio and that F-Droid added 16 more apps published with Reproducible Builds in December.
aespipe (#661079, #1020809), cdbackup (#1011428) & xmlrpc-epi (#865688, #1020651)
apr-util (#1006865), lirc (#979024) & ruby-omniauth-tumblr
amavisd-milter (#975954), apophenia (#940013), cfi (#995647), chessx (#881664), cmocka (#991181), desmume (#890312), golang-gonum-v1-plot (#968045), intel-gpu-tools (#945105), jhbuild (#971420), libjama (#986601), libjs-qunit (#976445), liblip (#1001513, #989583), libstatgrab (#961747), mlpost (#977179 and #977180), netcdf-parallel (#972930), netgen-lvs (#955783), perfect-scrollbar (#1000770), python-tomli (#994979), pytsk (#992060), smplayer (#997689), squeak-plugins-scratch (#876771, #942006), stgit (#942009), strace (#896016), surgescript (#992061), sympow (#973601), wxmaxima (#983148), xavs2 (#952493), xaw3d (#991180, #986704) and yard (#972668).
OpenRGB (filesystem ordering issue)
python-maturin (report an issue regarding random numbers)
rav1e (datetime-related issue)
weblate (report that the build fails in 2038)
osuosl167 machine is no longer a openqa-worker node anymore.
foot-terminfo package on Debian systems.
--timeout flag.
228, 229 and 230 to Debian:
file(1) version 5.43, with thanks to Christoph Biedl.
test_html.py::test_diff test if html2text is not installed. (#1026034)
Standards-Version on all of our packages, including diffoscope, strip-nondeterminism, disorderfs and reprotest.
#reproducible-builds on irc.oftc.net.
rb-general@lists.reproducible-builds.org
[...] proposes a general taxonomy for attacks on opensource supply chains, independent of specific programming languages or ecosystems, and covering all supply chain stages from code contributions to package distribution.
Taking the form of an attack tree, the paper covers 107 unique vectors linked to 94 real world supply-chain incidents which are then mapped to 33 mitigating safeguards including, of course, reproducible builds:
Reproducible Builds received a very high utility rating (5) from 10 participants (58.8%), but also a high-cost rating (4 or 5) from 12 (70.6%). One expert commented that a reproducible build like used by Solarwinds now, is a good measure against tampering with a single build system and another claimed this is going to be the single, biggest barrier.
[...] illustrate a concerning new reality for the software industry and illuminates the increasingly sophisticated threats made by outside nation-states to the supply chains and infrastructure on which we all rely.
The 12-month anniversary of the 2020 Solarwinds attack (which SolarWinds Worldwide LLC itself calls the SUNBURST attack) was, of course, the likely impetus for publication.
/build/1st/cyrus-imapd-3.6.0~beta3/
/build/2/cyrus-imapd-3.6.0~beta3/2nd/
git archive command doesn't match the tarball served by GitHub anymore. In his post, kpcyrd narrows the change to a specific commit in Git.
repro-get. According to Akihiro's post, repro-get is a tool to install a specific snapshot of apt/dnf/apk/pacman packages using SHA256SUMS files. This is needed in order to install specific (or pinned) dependencies needed to validate a build.
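Pinning by checksum can be illustrated with a short Python sketch. It is not repro-get's implementation. It only checks files on disk against the two-column text format written by the sha256sum utility, and the function names are hypothetical.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def parse_sha256sums(text):
    """Parse sha256sum-style lines of the form '<hex digest>  <filename>'."""
    entries = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, name = line.split(None, 1)
        # sha256sum marks binary mode with a leading '*' on the file name.
        entries[name.lstrip("*")] = digest.lower()
    return entries


def mismatched_files(directory, sums_text):
    """Return the listed names that are missing or whose digest differs."""
    bad = []
    for name, expected in parse_sha256sums(sums_text).items():
        path = Path(directory) / name
        actual = hashlib.sha256(path.read_bytes()).hexdigest() if path.is_file() else None
        if actual != expected:
            bad.append(name)
    return bad


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        Path(d, "pkg.deb").write_bytes(b"example payload")
        sums = hashlib.sha256(b"example payload").hexdigest() + "  pkg.deb\n"
        print(mismatched_files(d, sums))
```

A pinned file that changes upstream then shows up as a mismatch instead of being installed silently.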
man-db UNIX manual page indexing tool:
One of the people working on [reproducible builds] noticed that man-db's database files were an obstacle to [reproducibility]: in particular, the exact contents of the database seemed to depend on the order in which files were scanned when building it. The reporter proposed solving this by processing files in sorted order, but I wasn't keen on that approach: firstly because it would mean we could no longer process files in an order that makes it more efficient to read them all from disk (still valuable on rotational disks), but mostly because the differences seemed to point to other bugs.
Colin goes on to describe his approach to solving the problem, including fixing various fits of internal caching, and he ends his post with None of this is particularly glamorous work, but it paid off.
ascii2binary (Fixed #1020812, #998758 & #1007421)
bibclean (Fixed #829754 & #929036)
dradio (Fixed #1020814)
leave (Fixed #777403, #967002 & #999259)
libimage-imlib2-perl (Fixed #1020665)
mailto (Fixed #998978 & #777413)
remote-tty (Fixed #829721 & #977280)
xcolmix (Fixed #1020748, #999219 & #988018)
z80asm (Fixed #939775 & #1020875)
ario (Investigated #828876)
cloop (Fixed #787996)
elvis-tiny (Fixed #829755 & #901345)
hannah (Fixed #845782 & #901260)
mc (Investigated #828683)
mod-dnssd (Submitted alternate fix for #828752)
snake4 (Fixed #829715 & #913734)
the (Fixed #842550)
zephyr (Investigated #828867 & #1021374)
msp430mcu (Fixed #860275)
checkpw (Fixed #777299 & #1020887)
madlib (Fixed #778946)
debhelper, a set of tools used in the packaging of the majority of Debian packages. The patch addressed an issue in the dh_installsysusers utility so that the postinst post-installation script that debhelper generates contains the same data regardless of the underlying filesystem ordering.
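A common way to make generated text independent of filesystem ordering is to sort directory listings before emitting anything derived from them. The Python below is a generic sketch of that technique, not the debhelper patch (debhelper is written in Perl); the function names are made up for illustration.

```python
import os
import tempfile


def stable_listing(directory):
    """Return directory entries in sorted order.

    os.listdir() makes no ordering promise: the order depends on the
    filesystem and on how the directory was populated.
    """
    return sorted(os.listdir(directory))


def render_snippet(directory):
    """Build a generated-script fragment with one line per entry."""
    return "".join("process %s\n" % name for name in stable_listing(directory))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for name in ("c.conf", "a.conf", "b.conf"):
            open(os.path.join(d, name), "w").close()
        print(render_snippet(d), end="")
```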
asymptote (date-related issue)
fastjet-contrib (sort nondeterministic filesystem ordering)
forge (Sphinx doctree issue)
gau2grid (output varies with march=native)
gosec (date-related issue)
helmfile (date-related issue)
libnvme (date-related issue)
moab (CPU)
tcl (fails to build in 2038)
vectorscan (output varies with march=native)
xz2/lzma (Rust-related filesystem ordering)
puppet back in early 2018 was finally merged into Puppet and was released in Puppet 7.20.0.
puppet-agent
tpm2-pytss (forwarded upstream)
cclive
librep
zephyr
libdv
dbview
bwbasic
olpc-powerd
o3dgc
icon
rdist
stfl
pacman
lam
xsok
python-djvulibre
xzoom
nitpic
tcm
xxkb
yersinia
centrifuge
ssocr
jakarta-jmeter
guymager
crack
dc3dd
dlt-viewer
vart
pgrouting
libsx
device-tree-compiler
tsdecrypt
openjdk (Fixed JDK-8292892)
224 and 225 to Debian:
html2text.
ttx(1) from the fonttools suite.
stable-po pipeline to fail in the CI.
order1.diff test fixture to json_expected_ordering_diff.
assert_diff over get_data and a manual assert within the XML tests.
ALLOWED_TEST_FILES test; it was mostly just annoying.
tests/test_source.py file.
logparse tool to analyse results on the Debian Edu build logs.
btop(1) on all nodes running Debian.
debstrap jobs, correctly log the tool usage.
cdebootstrap-static binary for the 2nd runs of the cdebootstrap tests.
rm(1) warning into an info-level message.
osuosl168 node for running Debian bookworm already.
non-free-firmware suite on the o168 node.
/usr.
usrmerge package on Debian bookworm and above.
bc(1) syntax in the computation of the percentage of unreproducible packages in the dashboard.
index_suite_ pages, order the package status to be the same order of the menu.
--distribution parameter to the pbuilder utility.
#reproducible-builds on irc.oftc.net.
rb-general@lists.reproducible-builds.org
f-droid.org are built from source on our own builders. This is partly because F-Droid is backed by the free software community; that is, people who have engaged in the free software community long before Android was conceived, and, in particular, share many if not all of its values. Using F-Droid will therefore feel very familiar to anyone familiar with a modern Linux distribution.
fdroid verify command.)
Our signature & trust scheme means that F-Droid can verify that an app is 100% free software whilst still using the developer's original .APK file. More details about this may be found in our reproducibility documentation and on the page about our Verification Server.
[..] fixing unreproducible build issues poses a set of challenges [..], among which we consider the localization granularity and the historical knowledge utilization as the most significant ones. To tackle these challenges, we propose a novel approach [called] RepFix that combines tracing-based fine-grained localization with history-based patch generation mechanisms.
The paper (PDF, 3.5MB) uses the Debian mylvmbackup package as an example to show how RepFix can automatically generate patches to make software build reproducibly. As it happens, Reiner Herrmann submitted a patch for the mylvmbackup package which has remained unapplied by the Debian package maintainer for over seven years, thus this paper inadvertently underscores that achieving reproducible builds will require both technical and social solutions.
_m led to unreproducible .pyc files. In particular, the types module in Python 3.10 requires the following patch to make it reproducible:
--- a/Lib/types.py
+++ b/Lib/types.py
@@ -37,8 +37,8 @@ _ag = _ag()
 AsyncGeneratorType = type(_ag)
 class _C:
-    def _m(self): pass
-MethodType = type(_C()._m)
+    def _b(self): pass
+MethodType = type(_C()._b)
Simply renaming the dummy method from _m to _b was enough to work around the problem. Johannes' bug report first led to a number of improvements in diffoscope to aid in dissecting .pyc files, but upstream identified this as caused by an issue surrounding interned strings, which is being tracked in CPython bug #78274.
foreach package let their personal email domain expire, so they bought it and now controls foreach on NPM and the 36,826 projects that depend on it. Shortly afterwards, Drew DeVault published a related blog post titled When will we learn? that offers a brief timeline of major incidents in this area and, not uncontroversially, suggests that the correct way to ship packages is with your distribution's package manager.
There's some bugs open with the Rust maintainers to address reproducible builds, but with the number of issues they have to deal with in the language, I am not optimistic that this problem will be resolved anytime soon. Assuming the only driver of the unreproducibility is the inclusion of OS paths in the binary, one fix to this would be to re-configure our build system to run in some sort of a chroot environment or a virtual machine that fixes the paths in a way that almost anyone else could reproduce. I say almost anyone else because this fix would be OS-dependent, so we'd be able to get reproducible builds under, for example, Linux, but it would not help Windows users where chroot environments are not a thing.
(Full post)
#reproducible-builds on the OFTC network.
PKGBUILDs provide authentication in the context of signed Git tags (i.e. the ability to verify the Git tag was signed by one of the two trusted keys), they do not support pinning, ie. that upstream could create a new signed Git tag with an identical name, and arbitrarily change the source code without the [maintainer] noticing. Conversely, other PKGBUILDs support pinning but not authentication. The new tool, auth-tarball-from-git, fixes both problems, as nearly outlined in kpcyrd's original blog post.
212, 213 and 214 to Debian unstable.
Chris also made the following changes:
zipinfo and zipinfo -v.
assert_diff in test_zip over calling get_data with a separate assert.
re.compile and then call .sub on the result; just call re.sub directly.
--usage and --help.
xb-tool for GNU Guix as well as updated the diffoscope package in GNU Guix itself.
nondeterministic_ordering_in_deprecated_items_collected_by_doxygen toolchain issue as well as ones for mono_mastersummary_xml_files_inherit_filesystem_ordering, extended_attributes_in_jar_file_created_without_manifest and apxs_captures_build_path.
Vagrant Cascadian performed a rough check of the reproducibility of core package sets in GNU Guix, and in openSUSE, Bernhard M. Wiedemann posted his usual monthly reproducible builds status report.
gtkmm-documentation (merged; sorting issue)
librespot (merged; random BuildID issue)
lirc (merged)
lsof (uname/hostname problem)
solanum (merged, possibly a race condition)
mtink
fceux
glob2
coinor-cgl
metapixel
ragel
gdome2
sgml-base-doc
xarclock
xgammon
lwatch
bbrun
gscanbus
libnss-gw-name
pidgin-blinklight
dvbtune
efax
quelcom
xine-lib-1.2
fusesmb
mailfront
convlit
bitstormlite
coinor-osi
razor
autoclass
cdbackup
dds2tar
transcalc
libapache2-mod-authz-unixgroup
mgdiff
scsitools
fstrcmp
libxsettings-client
tamil-gtk2im
tdfsb
stymulator
wiipdf
gdigi
getstream
freecdb
modglue
nwall
parprouted
imagination
tuxcmd-modules
libapache2-mod-authn-yubikey
libapache2-mod-auth-plain
arm64 binaries to build reproducibly for the Debian systemd package.
.dsc files using reprepro.
#reproducible-builds on irc.oftc.net.
rb-general@lists.reproducible-builds.org
/usr
, /lib
, etc.), but there are other significant differences too, such as Guix being scriptable using Guile/Scheme, as well as Guix s dedication and focus on free software.
awk
, bash
, coreutils
, grep
sed
, etc., by Gash and Gash-Utils. The final goal of Mes is to help create a full-source bootstrap for any interested UNIX-like operating system.
guile-module
support and support running Gash and Gash Utils.
In working to create a full-source bootstrap, I have disregarded the kernel and Guix build system for now, but otherwise, all packages should be built from source, and obviously, no binary blobs should go in. We still need a Guile binary to execute some scripts, and it will take at least another one to two years to remove that binary. I m using the 80/20 approach, cutting corners initially to get something working and useful early.
Another metric would be how many architectures we have. We are quite a way with ARM, tinycc now works, but there are still problems with GCC and Glibc. RISC-V is coming, too, which could be another metric. Someone has looked into picking up NixOS this summer. How many distros do anything about reproducibility or bootstrappability? The bootstrappability community is so small that we don t need metrics, sadly. The number of bytes of binary seed is a nice metric, but running the whole thing on a full-fledged Linux system is tough to put into a metric. Also, it is worth noting that I m developing on a modern Intel machine (ie. a platform with a management engine), that s another key component that doesn t have metrics.
hex0, 357-byte binary, we can now build the entire Guix system.
This past year we have not made significant visible progress, however, as our funding was unfortunately not there. The Stage0 project has advanced in RISC-V. A month ago, though, I secured NLnet funding for another year, and thanks to NLnet, Ekaitz Zarraga and Timothy Sample will work on GNU Mes and the Guix bootstrap as well. Separate to this, the bootstrappable community has grown a lot from the two people it was six years ago: there are now over 100 people in the #bootstrappable IRC channel, for example. The enlarged community is possibly an even more important win going forward.
irc.libera.chat in the #bootstrappable channel).

    mklabel gpt
    mkpart EFI fat32 1 99
    mkpart boot ext3 99 300
    toggle 1 boot
    toggle 1 esp
    p
    # Model: CT1000P1SSD8 (nvme)
    # Disk /dev/nvme1n1: 1000GB
    # Sector size (logical/physical): 512B/512B
    # Partition Table: gpt
    # Disk Flags:
    #
    # Number  Start   End     Size    File system  Name  Flags
    #  1      1049kB  98.6MB  97.5MB  fat32        EFI   boot, esp
    #  2      98.6MB  300MB   201MB   ext3         boot
    q

Here are the commands needed to create the filesystems and install the necessary files. This is almost to the stage of being scriptable. Some minor changes need to be made to convert from NVMe device names to SATA/SAS but nothing serious.

    mkfs.vfat /dev/nvme1n1p1
    mkfs.ext3 -N 1000 /dev/nvme1n1p2
    file -s /dev/nvme1n1p2 | sed -e s/^.*UUID/UUID/ -e "s/ .*$/ \/boot ext3 noatime 0 1/" >> /etc/fstab
    file -s /dev/nvme1n1p1 | tr "[a-f]" "[A-F]" | sed -e s/^.*numBEr.0x/UUID=/ -e "s/, .*$/ \/boot\/efi vfat umask=0077 0 1/" >> /etc/fstab
    # edit /etc/fstab to put a hyphen between the 2 groups of 4 chars for the VFAT filesystem UUID
    mount /boot
    mkdir -p /boot/efi /boot/grub
    mount /boot/efi
    mkdir -p /boot/efi/EFI/debian
    apt install efibootmgr shim-unsigned grub-efi-amd64
    cp /usr/lib/shim/* /usr/lib/grub/x86_64-efi/monolithic/grubx64.efi /boot/efi/EFI/debian
    file -s /dev/nvme1n1p2 | sed -e "s/^.*UUID=/search.fs_uuid /" -e "s/ .needs.*$/ root hd0,gpt2/" > /boot/efi/EFI/debian/grub.cfg
    echo "set prefix=(\$root)'/boot/grub'" >> /boot/efi/EFI/debian/grub.cfg
    echo "configfile \$prefix/grub.cfg" >> /boot/efi/EFI/debian/grub.cfg
    grub-install
    update-grub

If someone would like to make a script that can handle the different partition names of regular SCSI/SATA disks, NVMe, CCISS, etc. then that would be great. It would be good to have a script in Debian that creates the partitions and sets up the EFI files.

If you want to have a second bootable device then the following commands will copy a GPT partition table and give it new UUIDs. Make very certain that $DISKB is the one you want to be wiped and refer to my previous mention of parted -l. Also note that parted has a rescue command which works very well.

    sgdisk /dev/$DISKA -R /dev/$DISKB
    sgdisk -G /dev/$DISKB

To back up a GPT partition table run a command like this. Note that if sgdisk is told to back up an MBR partitioned disk it will say "Found invalid GPT and valid MBR; converting MBR to GPT format", which is probably a viable way of converting MBR format to GPT.

    sgdisk -b sda.bak /dev/sda
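
As a possible starting point for the script asked for above, here is a minimal sketch of the partition naming rule. The function name part_name is made up for this example, and it assumes the usual Linux convention that a "p" is placed between the disk name and the partition number whenever the disk name ends in a digit (NVMe, CCISS and similar), while SCSI/SATA style names simply get the number appended.

```shell
# Hypothetical helper: print the device node for partition number $2 of disk $1.
# Assumption: disk names ending in a digit take a "p" before the partition number.
part_name() {
    case "$1" in
        *[0-9]) printf '%s\n' "${1}p${2}" ;;   # e.g. NVMe, CCISS
        *)      printf '%s\n' "${1}${2}" ;;    # e.g. SCSI/SATA
    esac
}

part_name /dev/nvme1n1 1   # prints /dev/nvme1n1p1
part_name /dev/sda 2       # prints /dev/sda2
```

With something like this, the commands above could take the disk as a variable and use "$(part_name "$DISK" 1)" in place of the hard-coded /dev/nvme1n1p1.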
std::rand() which CRAN flags. Another email to Brisbane, another late (one-line) fix back and all was good. We still encountered one package with an error but flagged this as internal to that package's setup, so Uwe let RcppArmadillo onto CRAN, I contacted that package's maintainer who was very receptive and a change should be forthcoming. So with all that we have 0.10.4.0.0 on CRAN giving us Armadillo 10.4.0.
The full set of changes follows. As Armadillo 10.3.0 was not uploaded to CRAN, its changes are included too.
Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. If you like this or other open-source work I do, you can sponsor me at GitHub.
Changes in RcppArmadillo version 0.10.4.0.0 (2021-04-12)
- Upgraded to Armadillo release 10.4.0 (Pressure Cooker)
- faster handling of triangular matrices by log_det()
- added log_det_sympd() for log determinant of symmetric positive matrices
- added ARMA_WARN_LEVEL configuration option, to control the degree of emitted warning messages
- reduced the default degree of warning messages, so that failed decompositions, failed saving/loading, etc, no longer emit warnings
- Apply one upstream correction for arma::randn draws when using alternative (here R) generator, and arma::randg.
Changes in RcppArmadillo version 0.10.3.0.0 (2021-03-10)
- Upgraded to Armadillo release 10.3 (Sunrise Chaos)
- faster handling of symmetric positive definite matrices by pinv()
- expanded .save() / .load() for dense matrices to handle coord_ascii format
- for out of bounds access, element accessors now throw the more nuanced std::out_of_range exception, instead of only std::logic_error
- improved quality of random numbers
This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.
drat repo, and we will continue to do so going forward. Releases to CRAN, however, are real work. If they then end up with as much nonsense as the last release 1.0.4, we think it is appropriate to slow things down some more so we intend to now switch to a six-months cycle. As mentioned, interim releases are always just one install.packages() call with a properly set repos argument away.
Rcpp has become the most popular way of enhancing R with C or C++ code. As of today, 2002 packages on CRAN depend on Rcpp for making analytical code go faster and further, along with 203 in BioConductor. And per the (partial) logs of CRAN downloads, we are running steady at around one million downloads per month.
This release features again a number of different pull requests by different contributors covering the full range of API improvements, attributes enhancements, changes to Sugar and helper functions, extended documentation as well as continuous integration deployment. See the list below for details.
Thanks to CRANberries, you can also look at a diff to the previous release. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. Bug reports are welcome at the GitHub issue tracker as well (where one can also search among open or closed issues); questions are also welcome under
Changes in Rcpp patch release version 1.0.5 (2020-07-01)
- Changes in Rcpp API:
- The exception handler code in #1043 was updated to ensure proper include behavior (Kevin in #1047 fixing #1046).
- A missing Rcpp_list6 definition was added to support R 3.3.* builds (Davis Vaughan in #1049 fixing #1048).
- Missing Rcpp_list 2,3,4,5 definitions were added to the Rcpp namespace (Dirk in #1054 fixing #1053).
- A further update corrected the header include and provided a missing else branch (Mattias Ellert in #1055).
- Two more assignments are protected with Rcpp::Shield (Dirk in #1059).
- One call to abs is now properly namespaced with std:: (Uwe Korn in #1069).
- String object memory preservation was corrected/simplified (Kevin in #1082).
- Changes in Rcpp Attributes:
- Changes in Rcpp Sugar:
- Changes in Rcpp support functions:
- Changes in Rcpp Documentation:
- Changes in Rcpp Deployment:
- Travis CI unit tests now run a matrix over the versions of R also tested at CRAN (rel/dev/oldrel/oldoldrel), and coverage runs in parallel for a net speed-up (Dirk in #1056 and #1057).
- The exceptions test is now partially skipped on Solaris as it already is on Windows (Dirk in #1065).
- The default CI runner was upgraded to R 4.0.0 (Dirk).
- The CI matrix spans R 3.5, 3.6, r-release and r-devel (Dirk).
rcpp tag at StackOverflow which also allows searching among the (currently) 2455 previous questions.
If you like this or other open-source work I do, you can now sponsor me at GitHub. For the first year, GitHub will match your contributions.
This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.
.doctrees from installed files was created via Arch's TODO list mechanism. These .doctree files are caches generated by the Sphinx documentation generator when developing documentation so that Sphinx does not have to reparse all input files across runs. They should not be packaged, especially as they lead to the package being unreproducible as their pickled format contains unreproducible data. Jelle van der Waa and Eli Schwartz submitted various upstream patches to fix projects that install these by default.
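
As a rough illustration of the cleanup involved, here is a minimal shell sketch that removes Sphinx cache directories from a staging tree before it is packaged. The function name strip_doctrees and all paths are invented for this example; it is not taken from the Arch patches, which fix the projects upstream instead.

```shell
# Hypothetical helper (not from the Arch patches): delete every Sphinx
# ".doctrees" cache directory found below the staging directory "$1".
strip_doctrees() {
    find "$1" -type d -name .doctrees -prune -exec rm -rf {} +
}

# Demonstration on a throwaway tree; all paths are invented for the example.
pkgdir=$(mktemp -d)
mkdir -p "$pkgdir/usr/share/doc/example/html/.doctrees"
touch "$pkgdir/usr/share/doc/example/html/index.html" \
      "$pkgdir/usr/share/doc/example/html/.doctrees/index.doctree"
strip_doctrees "$pkgdir"
find "$pkgdir" -type f    # lists the files left in the staging tree
rm -rf "$pkgdir"
```

Where the build itself can be changed, pointing the -d option of sphinx-build at a directory outside the installed tree should avoid creating the caches there in the first place.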
Dimitry Andric was able to determine why the reproducibility status of FreeBSD's base.txz depended on the number of CPU cores, attributing it to an optimisation made to the Clang C compiler. After further detailed discussion on the FreeBSD bug it was possible to get the binaries reproducible again.
For the GNU Guix operating system, Vagrant Cascadian started a thread about collecting reproducibility metrics and Jan "janneke" Nieuwenhuizen posted that they had further reduced their bootstrap seed to 25%, which is intended to reduce the amount of code to be audited to avoid potential compiler backdoors.
In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update as well as made the following changes within the distribution itself:
- autogen (Date issue)
- carla (Timestamp in Windows Portable Executable executables)
- fonttosfnt/xorg-x11-fonts (Address space layout randomization issue)
- fossil (Date issue)
- gcc10 C++ (Link-time optimisation issue)
- grep (Profile-guided optimisation issue)
- kubernetes1.18 (Remove Go build identifier)
- libjcat (Remove certificate)
- lifelines (Date issue)
- miredo (Drop hostname)
- stressapptest (Override date, user & host)
reproducible-check tool that reports on the reproducible status of installed packages on a running Debian system. They were subsequently all fixed by Chris Lamb.
Timo Röhling filed a wishlist bug against the debhelper build tool impacting the reproducibility status of 100s of packages that use the CMake build system which led to a number of tests and next steps.
Chris Lamb contributed to a conversation regarding the nondeterministic order of execution of Debian maintainer scripts that results in the arbitrary allocation of UNIX group IDs, referencing the Tails operating system's approach to this. Vagrant Cascadian also added to a discussion regarding verification formats for reproducible builds.
47 reviews of Debian packages were added, 37 were updated and 69 were removed this month adding to our knowledge about identified issues. Chris Lamb identified and classified a new uids_gids_in_tarballs_generated_by_cmake_kde_package_app_templates issue and updated the paths_vary_due_to_usrmerge as deterministic issue, and Vagrant Cascadian updated the cmake_rpath_contains_build_path and gcc_captures_build_path issues.
Lastly, Debian Developer Bill Allombert started a mailing list thread regarding setting the -fdebug-prefix-map command-line argument via an environment variable and Holger Levsen also filed three bugs against the debrebuild Debian package rebuilder tool (#961861, #961862 & #961864).
SOURCE_DATE_EPOCH git log example to another section. Chris Lamb also limited the number of news posts to avoid showing items from (for example) 2017.
strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. It is used automatically in most Debian package builds. This month, Mattia Rizzolo bumped the debhelper compatibility level to 13 and adjusted a related dependency to avoid a potential circular dependency.
- autogen (race condition)
- cockpit (date)
- fossil (date)
- libnvidia-container (date)
- libv3270 (date)
- netcdf-fortran
- seqtools
- python-pauvre
- petitboot
- fonts-anonymous-pro
- python-pyqtgraph (forwarded upstream)
- libqmi
- tkabber-plugins
- python-stem
- golang-v2ray-core
- critcl
- gftl
- libmbim
- neovim-qt
- golang-github-viant-toolbox
- libxml2 (random data corruption)
frr (build fails on single-processor machines), ghc-yesod-static/git-annex (a filesystem ordering issue) and ooRexx (ASLR-related issue).
147, 148 and 149 to Debian and made the following changes:
- /Info stanza). (#150)
- jsondiff version 1.2.0. (#159)
- File.recognizes that checks candidates against file(1).
- subprocess.check_output by using a wrapper. (#151)
- AbstractMissingType type instead of remembering to check for both types of missing files.
- .changes, .dsc and .buildinfo comparators.
- f-strings to tidy up code and remove explicit u"unicode" strings.
- --new-file option when comparing directories by merging DirectoryContainer.compare and Container.compare. (#180)
- --diff-mask=REGEX. (!51)
- --html-dir presenter format.
- --html-dir format.
tlsh fuzzy-matching library during tests and tweaked the build system to remove an unwanted .build directory. For the GNU Guix distribution Vagrant Cascadian updated the version of diffoscope to version 147 and later 148.
tests.reproducible-builds.org. Amongst many other tasks, this tracks the status of our reproducibility efforts across many distributions as well as identifies any regressions that have been introduced. This month, Holger Levsen made the following changes:
- rsync2buildinfos.debian.net every night.
- .buildinfo files to include a fix regarding comparing source vs. binary package versions.
- archlinux_html_pages, openwrt_rebuilder_today and openwrt_rebuilder_future to known broken jobs.
- <meta> header to refresh the page every 5 minutes.
- fixfilepath on bullseye, to get better data about the ftbfs_due_to_f-file-prefix-map categorised issue.
Lastly, the usual build node maintenance was performed by Holger Levsen, Mattia Rizzolo and Vagrant Cascadian.
#reproducible-builds on irc.oftc.net.
rb-general@lists.reproducible-builds.org
This month's report was written by Bernhard M. Wiedemann, Chris Lamb, Eli Schwartz, Holger Levsen, Jelle van der Waa and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.
Publisher: W.W. Norton
Copyright: 2018
Printing: 2019
ISBN: 0-393-35745-7
Format: Kindle
Pages: 254

"It's Groundhog Day," said Max. "The new people come in and think that the previous administration and the civil service are lazy or stupid. Then they actually get to know the place they are managing. And when they leave, they say, 'This was a really hard job, and those are the best people I've ever worked with.' This happens over and over and over."

By 2016, Stier saw vast improvements, despite his frustration with other actions of the Obama administration. He believed their transition briefings were one of the best courses ever produced on how the federal government works. Then that transition process ran into Donald Trump. Or, to be more accurate, that transition did not run into Donald Trump, because neither he nor anyone who worked for him were there. We'll never know how good the transition information was because no one ever listened to or read it. Meetings were never scheduled. No one showed up.

This book is not truly about the presidential transition, though, despite its presence as a continuing theme. The Fifth Risk is, at its heart, an examination of government work, the people who do it, why it matters, and why you should care about it. It's a study of the surprising and misunderstood responsibilities of the departments of the United States federal government. And it's a series of profiles of the people who choose this work as a career, not in the upper offices of political appointees, but deep in the civil service, attempting to keep that system running.

I will warn now that I am far too happy that this book exists to be entirely objective about it. The United States desperately needs basic education about the government at all levels, but particularly the federal civil service.
The public impression of government employees is skewed heavily towards the small number of public-facing positions and towards paperwork frustrations, over which the agency usually has no control because they have been sabotaged by Congress (mostly by Republicans, although the Democrats get involved occasionally). Mental images of who works for the government are weirdly selective. The Coast Guard could say "I'm from the government and I'm here to help" every day, to the immense gratitude of the people they rescue, but Reagan was still able to use that as a cheap applause line in his attack on government programs. Other countries have more functional and realistic social attitudes towards their government workers. The United States is trapped in a politically-fueled cycle of contempt and ignorance. It has to stop. And one way to help stop it is someone with Michael Lewis's story-telling skills writing a different narrative. The Fifth Risk is divided into a prologue about presidential transitions, three main parts, and an afterword (added in current editions) about a remarkable government worker whom you likely otherwise would never hear about. Each of the main parts talks about a different federal department: the Department of Energy, the Department of Agriculture, and the Department of Commerce. In keeping with the theme of the book, the people Lewis profiles do not do what you might expect from the names of those departments. Lewis's title comes from his discussion with John MacWilliams, a former Goldman Sachs banker who quit the industry in search of more personally meaningful work and became the chief risk officer for the Department of Energy. Lewis asks him for the top five risks he sees, and if you know that the DOE is responsible for safeguarding nuclear weapons, you will be able to guess several of them: nuclear weapons accidents, North Korea, and Iran. If you work in computer security, you may share his worry about the safety of the electrical grid. 
But his fifth risk was project management. Can the government follow through on long-term hazardous waste safety and cleanup projects, despite constant political turnover? Can it attract new scientists to the work of nuclear non-proliferation before everyone with the needed skills retires? Can it continue to lay the groundwork with basic science for innovation that we'll need in twenty or fifty years? This is what the Department of Energy is trying to do. Lewis's profiles of other departments are similarly illuminating. The Department of Agriculture is responsible for food stamps, the most effective anti-poverty program in the United States with the possible exception of Social Security. The section on the Department of Commerce is about weather forecasting, specifically about NOAA (the National Oceanic and Atmospheric Administration). If you didn't know that all of the raw data and many of the forecasts you get from weather apps and web sites are the work of government employees, and that AccuWeather has lobbied Congress persistently for years to prohibit the NOAA from making their weather forecasts public so that AccuWeather can charge you more for data your taxes already paid for, you should read this book. The story of American contempt for government work is partly about ignorance, but it's also partly about corporations who claim all of the credit while selling taxpayer-funded resources back to you at absurd markups. The afterword I'll leave for you to read for yourself, but it's the story of Art Allen, a government employee you likely have never heard of but whose work for the Coast Guard has saved more lives than we are able to measure. I found it deeply moving. If you, like I, are a regular reader of long-form journalism and watch for new Michael Lewis essays in particular, you've probably already read long sections of this book. By the time I sat down with it, I think I'd read about a third in other forms on-line. 
But the profiles that I had already read were so good that I was happy to read them again, and the additional stories and elaboration around previously published material was more than worth the cost and time investment in the full book.
It was never obvious to me that anyone would want to read what had interested me about the United States government. Doug Stumpf, my magazine editor for the past decade, persuaded me that, at this strange moment in American history, others might share my enthusiasm.

I'll join Michael Lewis in thanking Doug Stumpf. The Fifth Risk is not a proposal for how to fix government, or politics, or polarization. It's not even truly a book about the Trump presidency or about the transition. Lewis's goal is more basic: The United States government is full of hard-working people who are doing good and important work. They have effectively no public relations department. Achievements that would result in internal and external press releases in corporations, not to mention bonuses and promotions, go unnoticed and uncelebrated. If you are a United States citizen, this is your government and it does important work that you should care about. It deserves the respect of understanding and thoughtful engagement, both from the citizenry and from the politicians we elect.

Rating: 10 out of 10
bzr) but
how over time we came to appreciate Git for what it is. For Mark this was less
painful because he came to Git early enough that there was little more than the
fundamental data model, without much of the porcelain which now exists.
One point which we all, though Mark in particular, felt was worth considering
was that of distinguishing between published and unpublished changes. The
article touches on it a little, but one of the benefits of the workflow which
Jeremy espouses is that of using the revision control system as an integral
part of the review pipeline. This is perhaps done very well with Git based
workflows, but can be done with other VCSs.
With respect to the points Jeremy makes regarding making commits which are good
for reviewing, we had a general agreement that things were good and sensible,
to a point, but that some things were missed out on. For example, I raised
that commit messages often need to be more thorough than one-liners, but
Jeremy's examples (perhaps through expedience for the article?) were all pretty
trite one-liners which perhaps would have been entirely obvious from the commit
content. Jeremy makes the point that large changes are hard to review, and
Lars pointed out that Greg Wilson did research in this area, and at
least one article mentions 200 lines as a maximum size of a
reviewable segment.
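
To make that size limit a little more concrete, here is a minimal sketch of how one might count the changed lines in a unified diff and compare the result against such a budget. The function name changed_lines and the branch names in the comment are made up for this example, and the 200 figure is simply the number quoted above, not a rule from Jeremy's article.

```shell
# Hypothetical helper: count added and removed lines in a unified diff on stdin.
# File header lines ("--- a/file", "+++ b/file") are skipped. This is only a
# sketch: a removed line whose own text starts with "-- " would be mistaken
# for a header and not counted.
changed_lines() {
    awk '/^(---|\+\+\+) / { next } /^[+-]/ { n++ } END { print n + 0 }'
}

review_budget=200   # assumed threshold, taken from the figure quoted above

# Possible use inside a repository (branch names are placeholders):
#   n=$(git diff main...topic | changed_lines)
#   [ "$n" -le "$review_budget" ] || echo "consider splitting: $n changed lines"
```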
I had a brief warble at this point about how reviewing needs to be able to
consider the whole of the intended change (i.e. a diff from base to tip) not
just individual commits, which is also missing from Jeremy's article, but that
such a review does not need to necessarily be thorough and detailed since the
commit-by-commit review remains necessary. I use that high level diff as a way
to get a feel for the full shape of the intended change, a look at the end-game
if you will, before I read the story of how someone got to it. As an aside at
this point, I talked about how Jeremy included a 'style fixes' commit in his
example, but I loathe seeing such things and would much rather it was
either not in the series because it's unrelated to it; or else the style fixes
were folded into the commits they were related to.
We discussed how style rules, as well as commit-bisectability, and other rules
which may exist for a codebase, the adherence to which would form part of the
checks that a code reviewer may perform, are there to be held to when they
help the project, and to be broken when they are in the way of good
communication between humans.
In this, Lars talked about how revision control histories provide high level
valuable communication between developers. Communication between humans is
fraught with error and the rules are not always clear on what will work and
what won't, since this depends on the communicators, the context, etc. However
whatever communication rules are in place should be followed. We often say
that it takes two people to communicate information, but when you're writing
commit messages or arranging your commit history, the second party is often a
nebulous "other" and so the code reviewer fulfils that role to concretise it
for the purpose of communication.
At this point, I wondered a little about what value there might be (if any) in
maintaining the metachanges (pull request info, mailing list discussions, etc)
for historical context of decision making. Mark suggested that this is useful
for design decisions etc but not for the style/correctness discussions which
often form a large section of review feedback. Maybe some of the metachange
tracking is done automatically by the review causing the augmentation of the
changes (e.g. by comments, or inclusion of design documentation changes) to
explain why changes are made.
We discussed how the "rebase always vs. rebase never" feeling flip-flopped in
us for many years until, as an example, what finally won Lars over was that he
wants the history of the project to tell the story, in the git commits, of how
the software has changed and evolved in an intentional manner. Lars said
that he doesn't care about the meanderings, but rather about a clear story
which can be followed and understood.
I described this as the switch from "the revision history is about what I did
to achieve the goal" to being more "the revision history is how I would hope
someone else would have done this". Mark further refined that to "The revision
history of a project tells the story of how the project, as a whole, chose to
perform its sequence of evolution."
We discussed how project history must necessarily then contain issue tracking,
mailing list discussions, wikis, etc. There exist free software projects
where part of their history is forever lost because, for example, the project
moved from Sourceforge to Github, but made no effort (or was unable) to migrate
issues or somesuch. Linkages between changes and the issues they relate to
can easily be broken, though at least with mailing lists you can often rewrite
URLs if you have something consistent like a Message-Id.
We talked about how cover notes, pull request messages, etc. can thus also be
lost to some extent. Is this an argument to always use merges whose message
bodies contain those details, rather than always fast-forwarding? Or is it a
reason to encapsulate all those discussions into git objects which can be
forever associated with the changes in the DAG?
We then diverted into discussion of CI, testing every commit, and the benefits
and limitations of automated testing vs. manual testing; though I think that's
a little too off-piste for even this summary. We also talked about how commit
message audiences include software perhaps, with the recent movement toward
conventional commits and how, with respect to commit messages for
machine readability, it can become very complex/tricky to craft good commit
messages once there are multiple disparate audiences. For projects the size of
the Linux kernel this kind of thing would be nearly impossible, but for smaller
projects, perhaps there's value.
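
As an illustration of what a machine audience for commit messages looks like in practice, here is a minimal sketch that checks whether the first line of a message follows the general "type(scope): description" shape used by conventional commits. The function name is_conventional is invented for this example and the pattern is a simplification, not the full specification.

```shell
# Hypothetical check: succeed if the first line of the message in "$1" looks
# like "type: description", "type(scope): description" or "type!: description".
# Simplified on purpose; the real specification has more rules than this.
is_conventional() {
    printf '%s\n' "$1" | head -n 1 |
        grep -Eq '^[[:lower:]]+(\([^()]+\))?!?: .+'
}

is_conventional "fix(parser): handle empty input" && echo "machine-readable"
is_conventional "Fixed some stuff" || echo "fine for humans, opaque to tools"
```

Even this small check shows the tension we talked about: the header is shaped for tools, so the explanation meant for human reviewers has to live in the message body.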
Finally, we all agreed that we liked the quote at the end of the article, and so
I'd like to close out by repeating it for you all...
Hal Abelson famously said:
Programs must be written for people to read, and only incidentally for machines to execute.

Jeremy agrees, as do we, and extends that to the metacommit information as well.