Search Results: "romain"

14 August 2022

Dirk Eddelbuettel: RcppArmadillo used by 1001 CRAN Packages

It is with a mix of pride and joy, but also some genuine astonishment and amazement, that we can share that the counter of reverse dependencies at CRAN for our RcppArmadillo package for R just crossed 1000 packages [1]: Conrad actually posted this a few weeks ago, by my count we were then still a few packages shy. In any event, having crossed this marker this summer, either then or now, and after more than a dozen years of working on the package is a really nice moment. Google Scholar counts nearly 500 citations for our CSDA paper (also this vignette), and that ratio of nearly a citation for every two packages used is certainly impressive. We have had the pleasure of working with so many other researchers and scientists using RcppArmadillo. Its combination of performance (C++, after all, and heavily tuned) and ease-of-use (inspired by another popular flavour for matrix computing that is however mostly interpreted) makes for a powerful package, and we are delighted to see it used so widely. Working on this with Conrad has been excellent. The (upstream) package (now at this GitLab repo) has received numerous releases at a rate that is in fact so high that we now slow it down to not exceed a monthly cadence of uploads to CRAN. But the package should always be in release condition at its GitHub repo, and is frequently also installable in rc versions via the Rcpp drat repo. So with that, a big Thank You! to Conrad, to Romain for all the early work laying the package foundations, and to all the users of (Rcpp)Armadillo for helping us along with testing, suggestions, extensions, and bug reports. Keep em coming! If you like this or other open-source work I do, you can now sponsor me at GitHub. [1] The code snippet shows that we remove some possible duplicates in the count (mostly for the total of packages). This is a correction we use across packages for consistency. It does not have an effect for RcppArmadillo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

6 October 2021

Reproducible Builds: Reproducible Builds in September 2021

The goal behind reproducible builds is to ensure that no deliberate flaws have been introduced during compilation processes via promising or mandating that identical results are always generated from a given source. This allowing multiple third-parties to come to an agreement on whether a build was compromised or not by a system of distributed consensus. In these reports we outline the most important things that have been happening in the world of reproducible builds in the past month:
First mentioned in our March 2021 report, Martin Heinz published two blog posts on sigstore, a project that endeavours to offer software signing as a public good, [the] software-signing equivalent to Let s Encrypt . The two posts, the first entitled Sigstore: A Solution to Software Supply Chain Security outlines more about the project and justifies its existence:
Software signing is not a new problem, so there must be some solution already, right? Yes, but signing software and maintaining keys is very difficult especially for non-security folks and UX of existing tools such as PGP leave much to be desired. That s why we need something like sigstore - an easy to use software/toolset for signing software artifacts.
The second post (titled Signing Software The Easy Way with Sigstore and Cosign) goes into some technical details of getting started.
There was an interesting thread in the /r/Signal subreddit that started from the observation that Signal s apk doesn t match with the source code:
Some time ago I checked Signal s reproducibility and it failed. I asked others to test in case I did something wrong, but nobody made any reports. Since then I tried to test the Google Play Store version of the apk against one I compiled myself, and that doesn t match either.

BitcoinBinary.org was announced this month, which aims to be a repository of Reproducible Build Proofs for Bitcoin Projects :
Most users are not capable of building from source code themselves, but we can at least get them able enough to check signatures and shasums. When reputable people who can tell everyone they were able to reproduce the project s build, others at least have a secondary source of validation.

Distribution work Fr d ric Pierret announced a new testing service at beta.tests.reproducible-builds.org, showing actual rebuilds of binaries distributed by both the Debian and Qubes distributions. In Debian specifically, however, 51 reviews of Debian packages were added, 31 were updated and 31 were removed this month to our database of classified issues. As part of this, Chris Lamb refreshed a number of notes, including the build_path_in_record_file_generated_by_pybuild_flit_plugin issue. Elsewhere in Debian, Roland Clobus posted his Fourth status update about reproducible live-build ISO images in Jenkins to our mailing list, which mentions (amongst other things) that:
  • All major configurations are still built regularly using live-build and bullseye.
  • All major configurations are reproducible now; Jenkins is green.
    • I ve worked around the issue for the Cinnamon image.
    • The patch was accepted and released within a few hours.
  • My main focus for the last month was on the live-build tool itself.
Related to this, there was continuing discussion on how to embed/encode the build metadata for the Debian live images which were being worked on by Roland Clobus.
Ariadne Conill published another detailed blog post related to various security initiatives within the Alpine Linux distribution. After summarising some conventional security work being done (eg. with sudo and the release of OpenSSH version 3.0), Ariadne included another section on reproducible builds: The main blocker [was] determining what to do about storing the build metadata so that a build environment can be recreated precisely . Finally, Bernhard M. Wiedemann posted his monthly reproducible builds status report.

Community news On our website this month, Bernhard M. Wiedemann fixed some broken links [ ] and Holger Levsen made a number of changes to the Who is Involved? page [ ][ ][ ]. On our mailing list, Magnus Ihse Bursie started a thread with the subject Reproducible builds on Java, which begins as follows:
I m working for Oracle in the Build Group for OpenJDK which is primary responsible for creating a built artifact of the OpenJDK source code. [ ] For the last few years, we have worked on a low-effort, background-style project to make the build of OpenJDK itself building reproducible. We ve come far, but there are still issues I d like to address. [ ]

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 183, 184 and 185 as well as performed significant triaging of merge requests and other issues in addition to making the following changes:
  • New features:
    • Support a newer format version of the R language s .rds files. [ ]
    • Update tests for OCaml 4.12. [ ]
    • Add a missing format_class import. [ ]
  • Bug fixes:
    • Don t call close_archive when garbage collecting Archive instances, unless open_archive definitely returned successfully. This prevents, for example, an AttributeError where PGPContainer s cleanup routines were rightfully assuming that its temporary directory had actually been created. [ ]
    • Fix (and test) the comparison of R language s .rdb files after refactoring temporary directory handling. [ ]
    • Ensure that RPM archives exists in the Debian package description, regardless of whether python3-rpm is installed or not at build time. [ ]
  • Codebase improvements:
    • Use our assert_diff routine in tests/comparators/test_rdata.py. [ ]
    • Move diffoscope.versions to diffoscope.tests.utils.versions. [ ]
    • Reformat a number of modules with Black. [ ][ ]
However, the following changes were also made:
  • Mattia Rizzolo:
    • Fix an autopkgtest caused by the androguard module not being in the (expected) python3-androguard Debian package. [ ]
    • Appease a shellcheck warning in debian/tests/control.sh. [ ]
    • Ignore a warning from h5py in our tests that doesn t concern us. [ ]
    • Drop a trailing .1 from the Standards-Version field as it s required. [ ]
  • Zbigniew J drzejewski-Szmek:
    • Stop using the deprecated distutils.spawn.find_executable utility. [ ][ ][ ][ ][ ]
    • Adjust an LLVM-related test for LLVM version 13. [ ]
    • Update invocations of llvm-objdump. [ ]
    • Adjust a test with a one-byte text file for file version 5.40. [ ]
And, finally, Benjamin Peterson added a --diff-context option to control unified diff context size [ ] and Jean-Romain Garnier fixed the Macho comparator for architectures other than x86-64 [ ].

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Testing framework The Reproducible Builds project runs a testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Holger Levsen:
    • Drop my package rebuilder prototype as it s not useful anymore. [ ]
    • Schedule old packages in Debian bookworm. [ ]
    • Stop scheduling packages for Debian buster. [ ][ ]
    • Don t include PostgreSQL debug output in package lists. [ ]
    • Detect Python library mismatches during build in the node health check. [ ]
    • Update a note on updating the FreeBSD system. [ ]
  • Mattia Rizzolo:
    • Silence a warning from Git. [ ]
    • Update a setting to reflect that Debian bookworm is the new testing. [ ]
    • Upgrade the PostgreSQL database to version 13. [ ]
  • Roland Clobus (Debian live image generation):
    • Workaround non-reproducible config files in the libxml-sax-perl package. [ ]
    • Use the new DNS for the snapshot service. [ ]
  • Vagrant Cascadian:
    • Also note that the armhf architecture also systematically varies by the kernel. [ ]

Contributing If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

1 October 2021

Reproducible Builds (diffoscope): diffoscope 186 released

The diffoscope maintainers are pleased to announce the release of diffoscope version 186. This version includes the following changes:
[ Chris Lamb ]
* Don't call close_archive when garbage-collecting Archive instances unless
  open_archive returned successfully. This prevents, amongst others, an
  AttributeError traceback due to PGPContainer's cleanup routines assuming
  that its temporary directory had been created.
  (Closes: reproducible-builds/diffoscope#276)
* Ensure that the string "RPM archives" exists in the package description,
  regardless of whether python3-rpm is installed or not at build time.
[ Jean-Romain Garnier ]
* Fix the LVM Macho comparator for non-x86-64 architectures.
You find out more by visiting the project homepage.

16 July 2021

Reproducible Builds (diffoscope): diffoscope 178 released

The diffoscope maintainers are pleased to announce the release of diffoscope version 178. This version includes the following changes:
[ Chris Lamb ]
* Don't traceback on an broken symlink in a directory.
  (Closes: reproducible-builds/diffoscope#269)
* Rewrite the calculation of a file's "fuzzy hash" to make the control
  flow cleaner.
[ Balint Reczey ]
* Support .deb package members compressed with the Zstandard algorithm.
  (LP: #1923845)
[ Jean-Romain Garnier ]
* Overhaul the Mach-O executable file comparator.
* Implement tests for the Mach-O comparator.
* Switch to new argument format for the LLVM compiler.
* Fix test_libmix_differences in testsuite for the ELF format.
* Improve macOS compatibility for the Mach-O comparator.
* Add llvm-readobj and llvm-objdump to the internal EXTERNAL_TOOLS data
  structure.
[ Mattia Rizzolo ]
* Invoke gzip(1) with the short option variants to support Busybox's gzip.
  (Closes: reproducible-builds/diffoscope#268)
You find out more by visiting the project homepage.

5 April 2021

Kees Cook: security things in Linux v5.9

Previously: v5.8 Linux v5.9 was released in October, 2020. Here s my summary of various security things that I found interesting: seccomp user_notif file descriptor injection
Sargun Dhillon added the ability for SECCOMP_RET_USER_NOTIF filters to inject file descriptors into the target process using SECCOMP_IOCTL_NOTIF_ADDFD. This lets container managers fully emulate syscalls like open() and connect(), where an actual file descriptor is expected to be available after a successful syscall. In the process I fixed a couple bugs and refactored the file descriptor receiving code. zero-initialize stack variables with Clang
When Alexander Potapenko landed support for Clang s automatic variable initialization, it did so with a byte pattern designed to really stand out in kernel crashes. Now he s added support for doing zero initialization via CONFIG_INIT_STACK_ALL_ZERO, which besides actually being faster, has a few behavior benefits as well. Unlike pattern initialization, which has a higher chance of triggering existing bugs, zero initialization provides safe defaults for strings, pointers, indexes, and sizes. Like the pattern initialization, this feature stops entire classes of uninitialized stack variable flaws. common syscall entry/exit routines
Thomas Gleixner created architecture-independent code to do syscall entry/exit, since much of the kernel s work during a syscall entry and exit is the same. There was no need to repeat this in each architecture, and having it implemented separately meant bugs (or features) might only get fixed (or implemented) in a handful of architectures. It means that features like seccomp become much easier to build since it wouldn t need per-architecture implementations any more. Presently only x86 has switched over to the common routines. SLAB kfree() hardening
To reach CONFIG_SLAB_FREELIST_HARDENED feature-parity with the SLUB heap allocator, I added naive double-free detection and the ability to detect cross-cache freeing in the SLAB allocator. This should keep a class of type-confusion bugs from biting kernels using SLAB. (Most distro kernels use SLUB, but some smaller devices prefer the slightly more compact SLAB, so this hardening is mostly aimed at those systems.) new CAP_CHECKPOINT_RESTORE capability
Adrian Reber added the new CAP_CHECKPOINT_RESTORE capability, splitting this functionality off of CAP_SYS_ADMIN. The needs for the kernel to correctly checkpoint and restore a process (e.g. used to move processes between containers) continues to grow, and it became clear that the security implications were lower than those of CAP_SYS_ADMIN yet distinct from other capabilities. Using this capability is now the preferred method for doing things like changing /proc/self/exe. debugfs boot-time visibility restriction
Peter Enderborg added the debugfs boot parameter to control the visibility of the kernel s debug filesystem. The contents of debugfs continue to be a common area of sensitive information being exposed to attackers. While this was effectively possible by unsetting CONFIG_DEBUG_FS, that wasn t a great approach for system builders needing a single set of kernel configs (e.g. a distro kernel), so now it can be disabled at boot time. more seccomp architecture support
Michael Karcher implemented the SuperH seccomp hooks, Guo Ren implemented the C-SKY seccomp hooks, and Max Filippov implemented the xtensa seccomp hooks. Each of these included the ever-important updates to the seccomp regression testing suite in the kernel selftests. stack protector support for RISC-V
Guo Ren implemented -fstack-protector (and -fstack-protector-strong) support for RISC-V. This is the initial global-canary support while the patches to GCC to support per-task canaries is getting finished (similar to the per-task canaries done for arm64). This will mean nearly all stack frame write overflows are no longer useful to attackers on this architecture. It s nice to see this finally land for RISC-V, which is quickly approaching architecture feature parity with the other major architectures in the kernel. new tasklet API
Romain Perier and Allen Pais introduced a new tasklet API to make their use safer. Much like the timer_list refactoring work done earlier, the tasklet API is also a potential source of simple function-pointer-and-first-argument controlled exploits via linear heap overwrites. It s a smaller attack surface since it s used much less in the kernel, but it is the same weak design, making it a sensible thing to replace. While the use of the tasklet API is considered deprecated (replaced by threaded IRQs), it s not always a simple mechanical refactoring, so the old API still needs refactoring (since that CAN be done mechanically is most cases). x86 FSGSBASE implementation
Sasha Levin, Andy Lutomirski, Chang S. Bae, Andi Kleen, Tony Luck, Thomas Gleixner, and others landed the long-awaited FSGSBASE series. This provides task switching performance improvements while keeping the kernel safe from modules accidentally (or maliciously) trying to use the features directly (which exposed an unprivileged direct kernel access hole). filter x86 MSR writes
While it s been long understood that writing to CPU Model-Specific Registers (MSRs) from userspace was a bad idea, it has been left enabled for things like MSR_IA32_ENERGY_PERF_BIAS. Boris Petkov has decided enough is enough and has now enabled logging and kernel tainting (TAINT_CPU_OUT_OF_SPEC) by default and a way to disable MSR writes at runtime. (However, since this is controlled by a normal module parameter and the root user can just turn writes back on, I continue to recommend that people build with CONFIG_X86_MSR=n.) The expectation is that userspace MSR writes will be entirely removed in future kernels. uninitialized_var() macro removed
I made treewide changes to remove the uninitialized_var() macro, which had been used to silence compiler warnings. The rationale for this macro was weak to begin with ( the compiler is reporting an uninitialized variable that is clearly initialized ) since it was mainly papering over compiler bugs. However, it creates a much more fragile situation in the kernel since now such uses can actually disable automatic stack variable initialization, as well as mask legitimate unused variable warnings. The proper solution is to just initialize variables the compiler warns about. function pointer cast removals
Oscar Carter has started removing function pointer casts from the kernel, in an effort to allow the kernel to build with -Wcast-function-type. The future use of Control Flow Integrity checking (which does validation of function prototypes matching between the caller and the target) tends not to work well with function casts, so it d be nice to get rid of these before CFI lands. flexible array conversions
As part of Gustavo A. R. Silva s on-going work to replace zero-length and one-element arrays with flexible arrays, he has documented the details of the flexible array conversions, and the various helpers to be used in kernel code. Every commit gets the kernel closer to building with -Warray-bounds, which catches a lot of potential buffer overflows at compile time. That s it for now! Please let me know if you think anything else needs some attention. Next up is Linux v5.10.

2021, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 License.
CC BY-SA 4.0

11 December 2020

Reproducible Builds (diffoscope): diffoscope 163 released

The diffoscope maintainers are pleased to announce the release of diffoscope version 163. This version includes the following changes:
[ Chris Lamb ]
* Bug fixes & new features:
  - Normalise "ret" to "retq" in objdump output to support multiple versions
    of obdump(1). (Closes: #976760, reproducible-builds/diffoscope#227)
  - Don't show progress indicators when running zstd(1).
    (Closes: reproducible-builds/diffoscope#226)
  - Move the slightly-confusing behaviour of loading an "existing diff" if a
    single file is passed to diffoscope to the new --load-existing-diff
    command.
* Output improvements:
  - Use pprint.pformat in the JSON comparator to serialise the differences
    from jsondiff to make the output render better.
  - Correct tense in --debug log output.
* Code quality:
  * Don't use an old-style "super" call.
  - Rewrite the filter routine for post-processing output from readelf(1).
  - Update my copyright years.
  - Remove unncessary PEP 263 encoding lines (replaced via PEP 3120).
  - Use "minimal" instead of "basic" as a variable name to match the
    underlying package name.
  - Add comment regarding Java tests for diffoscope contributors who are not
    using Debian. (Re: reproducible-builds/diffoscope!58)
* Debian packaging:
  - Update debian/copyright to match copyright notices in source-tree.
    (Closes: reproducible-builds/diffoscope#224)
  - Ensure the new "diffoscope-minimal" package has a different short
    description from the main "diffoscope" one.
[ Jean-Romain Garnier ]
* Add tests for OpenJDK 14.
[ Conrad Ratschan ]
* Add comparator for "legacy" uboot uImage files.
  (MR: reproducible-builds/diffoscope!69)
You find out more by visiting the project homepage.

11 November 2020

Reproducible Builds: Reproducible Builds in October 2020

Welcome to the October 2020 report from the Reproducible Builds project. In our monthly reports, we outline the major things that we have been up to over the past month. As a brief reminder, the motivation behind the Reproducible Builds effort is to ensure flaws have not been introduced in the binaries we install on our systems. If you are interested in contributing to the project, please visit our main website.

General On Saturday 10th October, Morten Linderud gave a talk at Arch Conf Online 2020 on The State of Reproducible Builds in Arch. The video should be available later this month, but as a teaser:
The previous year has seen great progress in Arch Linux to get reproducible builds in the hands of the users and developers. In this talk we will explore the current tooling that allows users to reproduce packages, the rebuilder software that has been written to check packages and the current issues in this space.
During the Reproducible Builds summit in Marrakesh in 2019, developers from the GNU Guix, NixOS and Debian distributions were able to produce a bit-for-bit identical GNU Mes binary despite using three different versions of GCC. Since this summit, additional work resulted in a bit-for-bit identical Mes binary using tcc, and last month a fuller update was posted to this effect by the individuals involved. This month, however, David Wheeler updated his extensive page on Fully Countering Trusting Trust through Diverse Double-Compiling, remarking that:
GNU Mes rebuild is definitely an application of [Diverse Double-Compiling]. [..] This is an awesome application of DDC, and I believe it s the first publicly acknowledged use of DDC on a binary
There was a small, followup discussion on our mailing list. In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update. This month, the Reproducible Builds project restarted our IRC meetings, managing to convene twice: the first time on October 12th (summary & logs), and later on the 26th (logs). As mentioned in previous reports, due to the unprecedented events throughout 2020, there will be no in-person summit event this year. On our mailing list this month El as Alejandro posted a request for help with a local configuration

Software development This month, we tried to fix a large number of currently-unreproducible packages, including: Bernhard M. Wiedemann also reported three issues against bison, ibus and postgresql12.

Tools diffoscope is our in-depth and content-aware diff utility. Not only could you locate and diagnose reproducibility issues, it provides human-readable diffs of all kinds too. This month, Chris Lamb uploaded version 161 to Debian (later backported by Mattia Rizzolo), as well as made the following changes:
  • Move test_ocaml to the assert_diff helper. [ ]
  • Update tests to support OCaml version 4.11.1. Thanks to Sebastian Ramacher for the report. (#972518)
  • Bump minimum version of the Black source code formatter to 20.8b1. (#972518)
In addition, Jean-Romain Garnier temporarily updated the dependency on radare2 to ensure our test pipelines continue to work [ ], and for the GNU Guix distribution Vagrant Cascadian diffoscope to version 161 [ ]. In related development, trydiffoscope is the web-based version of diffoscope. This month, Chris Lamb made the following changes:
  • Mark a --help-only test as being a superficial test. (#971506)
  • Add a real, albeit flaky, test that interacts with the try.diffoscope.org service. [ ]
  • Bump debhelper compatibility level to 13 [ ] and bump Standards-Version to 4.5.0 [ ].
Lastly, disorderfs version 0.5.10-2 was uploaded to Debian unstable by Holger Levsen, which enabled security hardening via DEB_BUILD_MAINT_OPTIONS [ ] and dropped debian/disorderfs.lintian-overrides [ ].

Website and documentation This month, a number of updates to the main Reproducible Builds website and related documentation were made by Chris Lamb:
  • Add a citation link to the academic article regarding dettrace [ ], and added yet another supply-chain security attack publication [ ].
  • Reformatted the Jekyll s Liquid templating language and CSS formatting to be consistent [ ] as well as expand a number of tab characters [ ].
  • Used relative_url to fix missing translation icon on various pages. [ ]
  • Published two announcement blog posts regarding the restarting of our IRC meetings. [ ][ ]
  • Added an explicit note regarding the lack of an in-person summit in 2020 to our events page. [ ]

Testing framework The Reproducible Builds project operates a Jenkins-based testing framework that powers tests.reproducible-builds.org. This month, Holger Levsen made the following changes:
  • Debian-related changes:
    • Refactor and improve the Debian dashboard. [ ][ ][ ]
    • Track bugs which are usertagged as filesystem , fixfilepath , etc.. [ ][ ][ ]
    • Make a number of changes to package index pages. [ ][ ][ ]
  • System health checks:
    • Relax disk space warning levels. [ ]
    • Specifically detect build failures reported by dpkg-buildpackage. [ ]
    • Fix a regular expression to detect outdated package sets. [ ]
    • Detect Lintian issues in diffoscope. [ ]
  • Misc:
    • Make a number of updates to reflect that our sponsor Profitbricks has renamed itself to IONOS. [ ][ ][ ][ ]
    • Run a F-Droid maintenance routine twice a month to utilise its cleanup features. [ ]
    • Fix the target name in OpenWrt builds to ath79 from ath97. [ ]
    • Add a missing Postfix configuration for a node. [ ]
    • Temporarily disable Arch Linux builds until a core node is back. [ ]
    • Make a number of changes to our thanks page. [ ][ ][ ]
Build node maintenance was performed by both Holger Levsen [ ][ ] and Vagrant Cascadian [ ][ ][ ], Vagrant Cascadian also updated the page listing the variations made when testing to reflect changes for in build paths [ ] and Hans-Christoph Steiner made a number of changes for F-Droid, the free software app repository for Android devices, including:
  • Do not fail reproducibility jobs when their cleanup tasks fail. [ ]
  • Skip libvirt-related sudo command if we are not actually running libvirt. [ ]
  • Use direct URLs in order to eliminate a useless HTTP redirect. [ ]

If you are interested in contributing to the Reproducible Builds project, please visit the Contribute page on our website. However, you can also get in touch with us via:

31 October 2020

Chris Lamb: Free software activities in October 2020

Here is my monthly update covering what I have been doing in the free software world during October 2020 (previous month):
  • As part of my role of being the assistant Secretary of the Open Source Initiative and a board director of Software in the Public Interest, I attended their respective monthly meetings and participated in various licensing and other discussions occurring on the internet as well as the usual internal discussions, etc., including continuing 'onboarding' of a new project to SPI.
  • ora2pg is a tool used to migrate an Oracle database to PostgreSQL. This month, I submitted a patch to make it build reproducibly. [...]
  • Addressed an issue filed by Bob Tanner by updating the documentation of my django-slack library which provides a convenient wrapper between projects using the Django and the Slack chat platform. [...]
  • Opened a pull request against libsass-python (a straightforward binding of libsass for Python, to compile the Sass CSS extensions in Python without the conventional Ruby stack) in order to make the build reproducible. [...]
For Lintian, the static analysis tool for Debian packages, I uploaded versions 2.97.0, 2.98.0, 2.99.0 & 2.100.0 as well as updated the declares-possibly-conflicting-debhelper-compat-versions tag as we may specify the Debhelper compatibility level in debian/rules or debian/control (#972464) and dropped a reference to missing manual page [...].

Reproducible Builds One of the original promises of open source software is that distributed peer review and transparency of process results in enhanced end-user security. However, whilst anyone may inspect the source code of free and open source software for malicious flaws, almost all software today is distributed as pre-compiled binaries. This allows nefarious third-parties to compromise systems by injecting malicious code into ostensibly secure software during the various compilation and distribution processes. The motivation behind the Reproducible Builds effort is to ensure no flaws have been introduced during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised. The project is proud to be a member project of the Software Freedom Conservancy. Conservancy acts as a corporate umbrella allowing projects to operate as non-profit initiatives without managing their own corporate structure. If you like the work of the Conservancy or the Reproducible Builds project, please consider becoming an official supporter. This month, I:
  • In Debian:
  • Categorised a large number of packages and issues in the Reproducible Builds 'notes' repository.
  • Drafted, published and publicised our monthly report.
I also updated the main Reproducible Builds website and documentation:
  • Wrote and published two announcement blog posts regarding the restarting of our IRC meetings. [...][...]
  • Added a citation link to the academic article regarding dettrace [...], and added yet another supply-chain security attack publication [...].
  • Reformat Jekyll's Liquid templating language and CSS formatting to be consistent [...] as well as expand a number of tab characters [...].
  • Use relative_url to fix missing translation icon on various pages. [...]
  • Add an explicit note regarding the lack of an in-person summit in 2020 to our events page. [...]

Lastly, I made the following changes to diffoscope, including preparing and uploading version 161 to Debian:
  • Reviewed and merged functionality from Jean-Romain Garnier to add support for radare, a decompiler/reverse-engineering framework [...] and update debian/tests/control to match [...].
  • Move test_ocaml to the assert_diff helper. [...]
  • Update tests to support OCaml version 4.11.1. Thanks to Sebastian Ramacher for the report. (#972518)
  • Bump minimum version of the Black source code formatter to 20.8b1. (#972518)
trydiffoscope is the web-based version of diffoscope. This month, I made the following changes:
  • Mark --help-only test as being a 'superficial' test. (#971506)
  • Add a test that interacts with the try.diffoscope.org service. [...]
  • Bump debhelper compatibility level to 13 [...] and bump Standards-Version to 4.5.0 [...].


Debian Debian LTS This month I have worked 18 hours on Debian Long Term Support (LTS) and 12 hours on its sister Extended LTS project.
  • Investigated and triaged junit4 (CVE-2020-15250), libass (CVE-2020-26682), php5, ros-ros-comm (CVE-2020-16124), sympa & zabbix.
  • Frontdesk duties, responding to user/developer questions, reviewing others' packages, participating in mailing list discussions, etc.
  • Issued DLA 2406-1 for jackson-databind, a Java library for processing JSON, to address an external entity expansion vulnerability.
  • Issued DLA 2407-1 for tomcat8, the Java application server. This was to fix an issue where an excessive number of concurrent streams could have resulted in users seeing responses for unexpected resources.
  • Issued DLA 2410-1 and ELA 301-1 for the BlueZ suite of Bluetooth tools, utilities and daemons to prevent a double-free vulnerability.
You can find out more about the project via the following video:

Uploads
  • python-django (3.1.2-1) New upstream bugfix release.
  • redis:
    • 6.0.8-2 Apply a patch from Yossi Gottlieb to fix a crash when reporting RDB/AOF file errors. (#972683)
    • 6.0.9-1 New upstream release.
  • memcached (1.6.8+dfsg-1) New upstream release)
  • mtools (4.0.25-1) New upstream release, where parsing configuration file now works correctly with Turkish locale. (#972387)
  • bfs (2.0-1) New upstream release.
  • black (20.8b1-2) Non-maintainer upload to correct version handling to avoid a ModuleNotFoundError error which was affecting a number of related packages. (#970901)
Bugs filed
  • lintian: Please detect sed -e 's@$(CURDIR)@...@' in debian/rules. (#972629)
  • mdtraj: Manual pages appear to contain error messages instead of actual examples. (#972635)
  • gita: Missing build-depends on python3-yaml. (#972493)
  • git-buildpackage: Correct 'option' typo in manual page. (#972081)

20 October 2020

Reproducible Builds (diffoscope): diffoscope 161 released

The diffoscope maintainers are pleased to announce the release of diffoscope version 161. This version includes the following changes:
[ Chris Lamb ]
* Fix failing testsuite: (Closes: #972518)
  - Update testsuite to support OCaml 4.11.1. (Closes: #972518)
  - Reapply Black and bump minimum version to 20.8b1.
* Move the OCaml tests to the assert_diff helper.
[ Jean-Romain Garnier ]
* Add support for radare2 as a disassembler.
[ Paul Spooren ]
* Automatically deploy Docker images in the continuous integration pipeline.
You find out more by visiting the project homepage.

4 September 2020

Reproducible Builds (diffoscope): diffoscope 159 released

The diffoscope maintainers are pleased to announce the release of diffoscope version 159. This version includes the following changes:
[ Chris Lamb ]
* Show "ordering differences only" in strings(1) output.
  (Closes: reproducible-builds/diffoscope#216)
* Don't alias output from "os.path.splitext" to variables that we do not end
  up using.
* Don't raise exceptions when cleaning up after a guestfs cleanup failure.
[ Jean-Romain Garnier ]
* Make "Command" subclass a new generic Operation class.
You find out more by visiting the project homepage.

8 August 2020

Reproducible Builds: Reproducible Builds in July 2020

Welcome to the July 2020 report from the Reproducible Builds project. In these monthly reports, we round-up the things that we have been up to over the past month. As a brief refresher, the motivation behind the Reproducible Builds effort is to ensure no flaws have been introduced from the original free software source code to the pre-compiled binaries we install on our systems. (If you re interested in contributing to the project, please visit our main website.)

General news At the upcoming DebConf20 conference (now being held online), Holger Levsen will present a talk on Thursday 27th August about Reproducing Bullseye in practice , focusing on independently verifying that the binaries distributed from ftp.debian.org were made from their claimed sources. Tavis Ormandy published a blog post making the provocative claim that You don t need reproducible builds , asserting elsewhere that the many attacks that have been extensively reported in our previous reports are fantasy threat models . A number of rebuttals have been made, including one from long-time contributor Reproducible Builds contributor Bernhard Wiedemann. On our mailing list this month, Debian Developer Graham Inggs posted to our list asking for ideas why the openorienteering-mapper Debian package was failing to build on the Reproducible Builds testing framework. Chris Lamb remarked from the build logs that the package may be missing a build dependency, although Graham then used our own diffoscope tool to show that the resulting package remains unchanged with or without it. Later, Nico Tyni noticed that the build failure may be due to the relationship between the FILE C preprocessor macro and the -ffile-prefix-map GCC flag. An issue in Zephyr, a small-footprint kernel designed for use on resource-constrained systems, around .a library files not being reproducible was closed after it was noticed that a key part of their toolchain was updated that now calls --enable-deterministic-archives by default. Reproducible Builds developer kpcyrd commented on a pull request against the libsodium cryptographic library wrapper for Rust, arguing against the testing of CPU features at compile-time. He noted that:
I ve accidentally shipped broken updates to users in the past because the build system was feature-tested and the final binary assumed the instructions would be present without further runtime checks
David Kleuker also asked a question on our mailing list about using SOURCE_DATE_EPOCH with the install(1) tool from GNU coreutils. When comparing two installed packages he noticed that the filesystem birth times differed between them. Chris Lamb replied, realising that this was actually a consequence of using an outdated version of diffoscope and that a fix was in diffoscope version 146 released in May 2020. Later in July, John Scott posted asking for clarification regarding on the Javascript files on our website to add metadata for LibreJS, the browser extension that blocks non-free Javascript scripts from executing. Chris Lamb investigated the issue and realised that we could drop a number of unused Javascript files [ ][ ][ ] and added unminified versions of Bootstrap and jQuery [ ].

Development work

Website On our website this month, Chris Lamb updated the main Reproducible Builds website and documentation to drop a number of unused Javascript files [ ][ ][ ] and added unminified versions of Bootstrap and jQuery [ ]. He also fixed a number of broken URLs [ ][ ]. Gonzalo Bulnes Guilpain made a large number of grammatical improvements [ ][ ][ ][ ][ ] as well as some misspellings, case and whitespace changes too [ ][ ][ ]. Lastly, Holger Levsen updated the README file [ ], marked the Alpine Linux continuous integration tests as currently disabled [ ] and linked the Arch Linux Reproducible Status page from our projects page [ ].

diffoscope diffoscope is our in-depth and content-aware diff utility that can not only locate and diagnose reproducibility issues, it provides human-readable diffs of all kinds. In July, Chris Lamb made the following changes to diffoscope, including releasing versions 150, 151, 152, 153 & 154:
  • New features:
    • Add support for flash-optimised F2FS filesystems. (#207)
    • Don t require zipnote(1) to determine differences in a .zip file as we can use libarchive. [ ]
    • Allow --profile as a synonym for --profile=-, ie. write profiling data to standard output. [ ]
    • Increase the minimum length of the output of strings(1) to eight characters to avoid unnecessary diff noise. [ ]
    • Drop some legacy argument styles: --exclude-directory-metadata and --no-exclude-directory-metadata have been replaced with --exclude-directory-metadata= yes,no . [ ]
  • Bug fixes:
    • Pass the absolute path when extracting members from SquashFS images as we run the command with working directory in a temporary directory. (#189)
    • Correct adding a comment when we cannot extract a filesystem due to missing libguestfs module. [ ]
    • Don t crash when listing entries in archives if they don t have a listed size such as hardlinks in ISO images. (#188)
  • Output improvements:
    • Strip off the file offset prefix from xxd(1) and show bytes in groups of 4. [ ]
    • Don t emit javap not found in path if it is available in the path but it did not result in an actual difference. [ ]
    • Fix ... not available in path messages when looking for Java decompilers that used the Python class name instead of the command. [ ]
  • Logging improvements:
    • Add a bit more debugging info when launching libguestfs. [ ]
    • Reduce the --debug log noise by truncating the has_some_content messages. [ ]
    • Fix the compare_files log message when the file does not have a literal name. [ ]
  • Codebase improvements:
    • Rewrite and rename exit_if_paths_do_not_exist to not check files multiple times. [ ][ ]
    • Add an add_comment helper method; don t mess with our internal list directly. [ ]
    • Replace some simple usages of str.format with Python f-strings [ ] and make it easier to navigate to the main.py entry point [ ].
    • In the RData comparator, always explicitly return None in the failure case as we return a non-None value in the success one. [ ]
    • Tidy some imports [ ][ ][ ] and don t alias a variable when we do not use it. [ ]
    • Clarify the use of a separate NullChanges quasi-file to represent missing data in the Debian package comparator [ ] and clarify use of a null diff in order to remember an exit code. [ ]
  • Other changes:
    • Profile the launch of libguestfs filesystems. [ ]
    • Clarify and correct our contributing info. [ ][ ][ ][ ][ ][ ]
Jean-Romain Garnier also made the following changes:
  • Allow passing a file with a list of arguments via diffoscope @args.txt. (!62)
  • Improve the output of side-by-side diffs by detecting added lines better. (!64)
  • Remove offsets before instructions in objdump [ ][ ] and remove raw instructions from ELF tests [ ].

Other tools strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. It is used automatically in most Debian package builds. In July, Chris Lamb ensured that we did not install the internal handler documentation generated from Perl POD documents [ ] and fixed a trivial typo [ ]. Marc Herbert added a --verbose-level warning when the Archive::Cpio Perl module is missing. (!6) reprotest is our end-user tool to build same source code twice in widely differing environments and then checks the binaries produced by each build for any differences. This month, Vagrant Cascadian made a number of changes to support diffoscope version 153 which had removed the (deprecated) --exclude-directory-metadata and --no-exclude-directory-metadata command-line arguments, and updated the testing configuration to also test under Python version 3.8 [ ].

Distributions

Debian In June 2020, Timo R hling filed a wishlist bug against the debhelper build tool impacting the reproducibility status of hundreds of packages that use the CMake build system. This month however, Niels Thykier uploaded debhelper version 13.2 that passes the -DCMAKE_SKIP_RPATH=ON and -DBUILD_RPATH_USE_ORIGIN=ON arguments to CMake when using the (currently-experimental) Debhelper compatibility level 14. According to Niels, this change:
should fix some reproducibility issues, but may cause breakage if packages run binaries directly from the build directory.
34 reviews of Debian packages were added, 14 were updated and 20 were removed this month adding to our knowledge about identified issues. Chris Lamb added and categorised the nondeterministic_order_of_debhelper_snippets_added_by_dh_fortran_mod [ ] and gem2deb_install_mkmf_log [ ] toolchain issues. Lastly, Holger Levsen filed two more wishlist bugs against the debrebuild Debian package rebuilder tool [ ][ ].

openSUSE In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update. Bernhard also published the results of performing 12,235 verification builds of packages from openSUSE Leap version 15.2 and, as a result, created three pull requests against the openSUSE Build Result Compare Script [ ][ ][ ].

Other distributions In Arch Linux, there was a mass rebuild of old packages in an attempt to make them reproducible. This was performed because building with a previous release of the pacman package manager caused file ordering and size calculation issues when using the btrfs filesystem. A system was also implemented for Arch Linux packagers to receive notifications if/when their package becomes unreproducible, and packagers now have access to a dashboard where they can all see all their unreproducible packages (more info). Paul Spooren sent two versions of a patch for the OpenWrt embedded distribution for adding a build system revision to the packages manifest so that all external feeds can be rebuilt and verified. [ ][ ]

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of these patches, including: Vagrant Cascadian also reported two issues, the first regarding a regression in u-boot boot loader reproducibility for a particular target [ ] and a non-deterministic segmentation fault in the guile-ssh test suite [ ]. Lastly, Jelle van der Waa filed a bug against the MeiliSearch search API to report that it embeds the current build date.

Testing framework We operate a large and many-featured Jenkins-based testing framework that powers tests.reproducible-builds.org. This month, Holger Levsen made the following changes:
  • Debian-related changes:
    • Tweak the rescheduling of various architecture and suite combinations. [ ][ ]
    • Fix links for 404 and not for us icons. (#959363)
    • Further work on a rebuilder prototype, for example correctly processing the sbuild exit code. [ ][ ]
    • Update the sudo configuration file to allow the node health job to work correctly. [ ]
    • Add php-horde packages back to the pkg-php-pear package set for the bullseye distribution. [ ]
    • Update the version of debrebuild. [ ]
  • System health check development:
    • Add checks for broken SSH [ ], logrotate [ ], pbuilder [ ], NetBSD [ ], unkillable processes [ ], unresponsive nodes [ ][ ][ ][ ], proxy connection failures [ ], too many installed kernels [ ], etc.
    • Automatically fix some failed systemd units. [ ]
    • Add notes explaining all the issues that hosts are experiencing [ ] and handle zipped job log files correctly [ ].
    • Separate nodes which have been automatically marked as down [ ] and show status icons for jobs with issues [ ].
  • Misc:
    • Disable all Alpine Linux jobs until they are or Alpine is fixed. [ ]
    • Perform some general upkeep of build nodes hosted by OSUOSL. [ ][ ][ ][ ]
In addition, Mattia Rizzolo updated the init_node script to suggest using sudo instead of explicit logout and logins [ ][ ] and the usual build node maintenance was performed by Holger Levsen [ ][ ][ ][ ][ ][ ], Mattia Rizzolo [ ][ ] and Vagrant Cascadian [ ][ ][ ][ ].

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

18 July 2020

Reproducible Builds (diffoscope): diffoscope 152 released

The diffoscope maintainers are pleased to announce the release of diffoscope version 152. This version includes the following changes:
[ Chris Lamb ]
* Bug fixes:
  - Don't require zipnote(1) to determine differences in a .zip file as we
    can use libarchive directly.
* Reporting improvements:
  - Don't emit "javap not found in path" if it is available in the path but
    it did not result in any actual difference.
  - Fix "... not available in path" messages when looking for Java
    decompilers; we were using the Python class name (eg. "<class
    'diffoscope.comparators.java.Javap'>") over the actual command we looked
    for (eg. "javap").
* Code improvements:
  - Replace some simple usages of str.format with f-strings.
  - Tidy inline imports in diffoscope.logging.
  - In the RData comparator, always explicitly return a None value in the
    failure cases as we return a non-None value in the "success" one.
[ Jean-Romain Garnier ]
* Improve output of side-by-side diffs, detecting added lines better.
  (MR: reproducible-builds/diffoscope!64)
* Allow passing file with list of arguments to ArgumentParser (eg.
  "diffoscope @args.txt"). (MR: reproducible-builds/diffoscope!62)
You find out more by visiting the project homepage.

6 July 2020

Reproducible Builds: Reproducible Builds in June 2020

Welcome to the June 2020 report from the Reproducible Builds project. In these reports we outline the most important things that we and the rest of the community have been up to over the past month.

What are reproducible builds? One of the original promises of open source software is that distributed peer review and transparency of process results in enhanced end-user security. But whilst anyone may inspect the source code of free and open source software for malicious flaws, almost all software today is distributed as pre-compiled binaries. This allows nefarious third-parties to compromise systems by injecting malicious code into seemingly secure software during the various compilation and distribution processes.

News The GitHub Security Lab published a long article on the discovery of a piece of malware designed to backdoor open source projects that used the build process and its resulting artifacts to spread itself. In the course of their analysis and investigation, the GitHub team uncovered 26 open source projects that were backdoored by this malware and were actively serving malicious code. (Full article) Carl Dong from Chaincode Labs uploaded a presentation on Bitcoin Build System Security and reproducible builds to YouTube: The app intended to trace infection chains of Covid-19 in Switzerland published information on how to perform a reproducible build. The Reproducible Builds project has received funding in the past from the Open Technology Fund (OTF) to reach specific technical goals, as well as to enable the project to meet in-person at our summits. The OTF has actually also assisted countless other organisations that promote transparent, civil society as well as those that provide tools to circumvent censorship and repressive surveillance. However, the OTF has now been threatened with closure. (More info) It was noticed that Reproducible Builds was mentioned in the book End-user Computer Security by Mark Fernandes (published by WikiBooks) in the section titled Detection of malware in software. Lastly, reproducible builds and other ideas around software supply chain were mentioned in a recent episode of the Ubuntu Podcast in a wider discussion about the Snap and application stores (at approx 16:00).

Distribution work In the ArchLinux distribution, a goal to remove .doctrees from installed files was created via Arch s TODO list mechanism. These .doctree files are caches generated by the Sphinx documentation generator when developing documentation so that Sphinx does not have to reparse all input files across runs. They should not be packaged, especially as they lead to the package being unreproducible as their pickled format contains unreproducible data. Jelle van der Waa and Eli Schwartz submitted various upstream patches to fix projects that install these by default. Dimitry Andric was able to determine why the reproducibility status of FreeBSD s base.txz depended on the number of CPU cores, attributing it to an optimisation made to the Clang C compiler [ ]. After further detailed discussion on the FreeBSD bug it was possible to get the binaries reproducible again [ ]. For the GNU Guix operating system, Vagrant Cascadian started a thread about collecting reproducibility metrics and Jan janneke Nieuwenhuizen posted that they had further reduced their bootstrap seed to 25% which is intended to reduce the amount of code to be audited to avoid potential compiler backdoors. In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update as well as made the following changes within the distribution itself:

Debian Holger Levsen filed three bugs (#961857, #961858 & #961859) against the reproducible-check tool that reports on the reproducible status of installed packages on a running Debian system. They were subsequently all fixed by Chris Lamb [ ][ ][ ]. Timo R hling filed a wishlist bug against the debhelper build tool impacting the reproducibility status of 100s of packages that use the CMake build system which led to a number of tests and next steps. [ ] Chris Lamb contributed to a conversation regarding the nondeterministic execution of order of Debian maintainer scripts that results in the arbitrary allocation of UNIX group IDs, referencing the Tails operating system s approach this [ ]. Vagrant Cascadian also added to a discussion regarding verification formats for reproducible builds. 47 reviews of Debian packages were added, 37 were updated and 69 were removed this month adding to our knowledge about identified issues. Chris Lamb identified and classified a new uids_gids_in_tarballs_generated_by_cmake_kde_package_app_templates issue [ ] and updated the paths_vary_due_to_usrmerge as deterministic issue, and Vagrant Cascadian updated the cmake_rpath_contains_build_path and gcc_captures_build_path issues. [ ][ ][ ]. Lastly, Debian Developer Bill Allombert started a mailing list thread regarding setting the -fdebug-prefix-map command-line argument via an environment variable and Holger Levsen also filed three bugs against the debrebuild Debian package rebuilder tool (#961861, #961862 & #961864).

Development On our website this month, Arnout Engelen added a link to our Mastodon account [ ] and moved the SOURCE_DATE_EPOCH git log example to another section [ ]. Chris Lamb also limited the number of news posts to avoid showing items from (for example) 2017 [ ]. strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. It is used automatically in most Debian package builds. This month, Mattia Rizzolo bumped the debhelper compatibility level to 13 [ ] and adjusted a related dependency to avoid potential circular dependency [ ].

Upstream work The Reproducible Builds project attempts to fix unreproducible packages and we try to to send all of our patches upstream. This month, we wrote a large number of such patches including: Bernhard M. Wiedemann also filed reports for frr (build fails on single-processor machines), ghc-yesod-static/git-annex (a filesystem ordering issue) and ooRexx (ASLR-related issue).

diffoscope diffoscope is our in-depth diff-on-steroids utility which helps us diagnose reproducibility issues in packages. It does not define reproducibility, but rather provides a helpful and human-readable guidance for packages that are not reproducible, rather than relying essentially-useless binary diffs. This month, Chris Lamb uploaded versions 147, 148 and 149 to Debian and made the following changes:
  • New features:
    • Add output from strings(1) to ELF binaries. (#148)
    • Dump PE32+ executables (such as EFI applications) using objdump(1). (#181)
    • Add support for Zsh shell completion. (#158)
  • Bug fixes:
    • Prevent a traceback when comparing PDF documents that did not contain metadata (ie. a PDF /Info stanza). (#150)
    • Fix compatibility with jsondiff version 1.2.0. (#159)
    • Fix an issue in GnuPG keybox file handling that left filenames in the diff. [ ]
    • Correct detection of JSON files due to missing call to File.recognizes that checks candidates against file(1). [ ]
  • Output improvements:
    • Use the CSS word-break property over manually adding U+200B zero-width spaces as these were making copy-pasting cumbersome. (!53)
    • Downgrade the tlsh warning message to an info level warning. (#29)
  • Logging improvements:
  • Testsuite improvements:
    • Update tests for file(1) version 5.39. (#179)
    • Drop accidentally-duplicated copy of the --diff-mask tests. [ ]
    • Don t mask an existing test. [ ]
  • Codebase improvements:
    • Replace obscure references to WF with Wagner-Fischer for clarity. [ ]
    • Use a semantic AbstractMissingType type instead of remembering to check for both types of missing files. [ ]
    • Add a comment regarding potential security issue in the .changes, .dsc and .buildinfo comparators. [ ]
    • Drop a large number of unused imports. [ ][ ][ ][ ][ ]
    • Make many code sections more Pythonic. [ ][ ][ ][ ]
    • Prevent some variable aliasing issues. [ ][ ][ ]
    • Use some tactical f-strings to tidy up code [ ][ ] and remove explicit u"unicode" strings [ ].
    • Refactor a large number of routines for clarity. [ ][ ][ ][ ]
trydiffoscope is the web-based version of diffoscope. This month, Chris Lamb also corrected the location for the celerybeat scheduler to ensure that the clean/tidy tasks are actually called which had caused an accidental resource exhaustion. (#12) In addition Jean-Romain Garnier made the following changes:
  • Fix the --new-file option when comparing directories by merging DirectoryContainer.compare and Container.compare. (#180)
  • Allow user to mask/filter diff output via --diff-mask=REGEX. (!51)
  • Make child pages open in new window in the --html-dir presenter format. [ ]
  • Improve the diffs in the --html-dir format. [ ][ ]
Lastly, Daniel Fullmer fixed the Coreboot filesystem comparator [ ] and Mattia Rizzolo prevented warnings from the tlsh fuzzy-matching library during tests [ ] and tweaked the build system to remove an unwanted .build directory [ ]. For the GNU Guix distribution Vagrant Cascadian updated the version of diffoscope to version 147 [ ] and later 148 [ ].

Testing framework We operate a large and many-featured Jenkins-based testing framework that powers tests.reproducible-builds.org. Amongst many other tasks, this tracks the status of our reproducibility efforts across many distributions as well as identifies any regressions that have been introduced. This month, Holger Levsen made the following changes:
  • Debian-related changes:
    • Prevent bogus failure emails from rsync2buildinfos.debian.net every night. [ ]
    • Merge a fix from David Bremner s database of .buildinfo files to include a fix regarding comparing source vs. binary package versions. [ ]
    • Only run the Debian package rebuilder job twice per day. [ ]
    • Increase bullseye scheduling. [ ]
  • System health status page:
    • Add a note displaying whether a node needs to be rebooted for a kernel upgrade. [ ]
    • Fix sorting order of failed jobs. [ ]
    • Expand footer to link to the related Jenkins job. [ ]
    • Add archlinux_html_pages, openwrt_rebuilder_today and openwrt_rebuilder_future to known broken jobs. [ ]
    • Add HTML <meta> header to refresh the page every 5 minutes. [ ]
    • Count the number of ignored jobs [ ], ignore permanently known broken jobs [ ] and jobs on known offline nodes [ ].
    • Only consider the known offline status from Git. [ ]
    • Various output improvements. [ ][ ]
  • Tools:
    • Switch URLs for the Grml Live Linux and PureOS package sets. [ ][ ]
    • Don t try to build a disorderfs Debian source package. [ ][ ][ ]
    • Stop building diffoscope as we are moving this to Salsa. [ ][ ]
    • Merge several is diffoscope up-to-date on every platform? test jobs into one [ ] and fail less noisily if the version in Debian cannot be determined [ ].
In addition: Marcus Hoffmann was added as a maintainer of the F-Droid reproducible checking components [ ], Jelle van der Waa updated the is diffoscope up-to-date in every platform check for Arch Linux and diffoscope [ ], Mattia Rizzolo backed up a copy of a remove script run on the Codethink-hosted jump server [ ] and Vagrant Cascadian temporarily disabled the fixfilepath on bullseye, to get better data about the ftbfs_due_to_f-file-prefix-map categorised issue. Lastly, the usual build node maintenance was performed by Holger Levsen [ ][ ], Mattia Rizzolo [ ] and Vagrant Cascadian [ ][ ][ ][ ][ ].

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

This month s report was written by Bernhard M. Wiedemann, Chris Lamb, Eli Schwartz, Holger Levsen, Jelle van der Waa and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.

3 July 2020

Reproducible Builds (diffoscope): diffoscope 150 released

The diffoscope maintainers are pleased to announce the release of diffoscope version 150. This version includes the following changes:
[ Chris Lamb ]
* Don't crash when listing entries in archives if they don't have a listed
  size (such as hardlinks in .ISO files).
  (Closes: reproducible-builds/diffoscope#188)
* Dump PE32+ executables (including EFI applications) using objdump.
  (Closes: reproducible-builds/diffoscope#181)
* Tidy detection of JSON files due to missing call to File.recognizes that
  checks against the output of file(1) which was also causing us to attempt
  to parse almost every file using json.loads. (Whoops.)
* Drop accidentally-duplicated copy of the new --diff-mask tests.
* Logging improvements:
  - Split out formatting of class names into a common method.
  - Clarify that we are generating presenter formats in the opening logs.
[ Jean-Romain Garnier ]
* Remove objdjump(1) offsets before instructions to reduce diff noise.
  (Closes: reproducible-builds/diffoscope!57)
You find out more by visiting the project homepage.

26 June 2020

Reproducible Builds (diffoscope): diffoscope 149 released

The diffoscope maintainers are pleased to announce the release of diffoscope version 149. This version includes the following changes:
[ Chris Lamb ]
* Update tests for file 5.39. (Closes: reproducible-builds/diffoscope#179)
* Downgrade the tlsh warning message to an "info" level warning.
  (Closes: #888237, reproducible-builds/diffoscope#29)
* Use the CSS "word-break" property over manually adding U+200B zero-width
  spaces that make copy-pasting cumbersome.
  (Closes: reproducible-builds/diffoscope!53)
* Codebase improvements:
  - Drop some unused imports from the previous commit.
  - Prevent an unnecessary .format() when rendering difference comments.
  - Use a semantic "AbstractMissingType" type instead of remembering to check
    for both "missing" files and missing containers.
[ Jean-Romain Garnier ]
* Allow user to mask/filter reader output via --diff-mask=REGEX.
  (MR: reproducible-builds/diffoscope!51)
* Make --html-dir child pages open in new window to accommodate new web
  browser content security policies.
* Fix the --new-file option when comparing directories by merging
  DirectoryContainer.compare and Container.compare.
  (Closes: reproducible-builds/diffoscope#180)
* Fix zsh completion for --max-page-diff-block-lines.
[ Mattia Rizzolo ]
* Do not warn about missing tlsh during tests.
You find out more by visiting the project homepage.

4 June 2020

Reproducible Builds: Reproducible Builds in May 2020

Welcome to the May 2020 report from the Reproducible Builds project. One of the original promises of open source software is that distributed peer review and transparency of process results in enhanced end-user security. Nonetheless, whilst anyone may inspect the source code of free and open source software for malicious flaws, almost all software today is distributed as pre-compiled binaries. This allows nefarious third-parties to compromise systems by injecting malicious code into seemingly secure software during the various compilation and distribution processes. In these reports we outline the most important things that we and the rest of the community have been up to over the past month.

News The Corona-Warn app that helps trace infection chains of SARS-CoV-2/COVID-19 in Germany had a feature request filed against it that it build reproducibly. A number of academics from Cornell University have published a paper titled Backstabber s Knife Collection which reviews various open source software supply chain attacks:
Recent years saw a number of supply chain attacks that leverage the increasing use of open source during software development, which is facilitated by dependency managers that automatically resolve, download and install hundreds of open source packages throughout the software life cycle.
In related news, the LineageOS Android distribution announced that a hacker had access to the infrastructure of their servers after exploiting an unpatched vulnerability. Marcin Jachymiak of the Sia decentralised cloud storage platform posted on their blog that their siac and siad utilities can now be built reproducibly:
This means that anyone can recreate the same binaries produced from our official release process. Now anyone can verify that the release binaries were created using the source code we say they were created from. No single person or computer needs to be trusted when producing the binaries now, which greatly reduces the attack surface for Sia users.
Synchronicity is a distributed build system for Rust build artifacts which have been published to crates.io. The goal of Synchronicity is to provide a distributed binary transparency system which is independent of any central operator. The Comparison of Linux distributions article on Wikipedia now features a Reproducible Builds column indicating whether distributions approach and progress towards achieving reproducible builds.

Distribution work In Debian this month: In Alpine Linux, an issue was filed and closed regarding the reproducibility of .apk packages. Allan McRae of the ArchLinux project posted their third Reproducible builds progress report to the arch-dev-public mailing list which includes the following call for help:
We also need help to investigate and fix the packages that fail to reproduce that we have not investigated as of yet.
In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update.

Software development

diffoscope Chris Lamb made the changes listed below to diffoscope, our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. He also prepared and uploaded versions 142, 143, 144, 145 and 146 to Debian, PyPI, etc.
  • Comparison improvements:
    • Improve fuzzy matching of JSON files as file now supports recognising JSON data. (#106)
    • Refactor .changes and .buildinfo handling to show all details (including the GnuPG header and footer components) even when referenced files are not present. (#122)
    • Use our BuildinfoFile comparator (etc.) regardless of whether the associated files (such as the orig.tar.gz and the .deb) are present. [ ]
    • Include GnuPG signature data when comparing .buildinfo, .changes, etc. [ ]
    • Add support for printing Android APK signatures via apksigner(1). (#121)
    • Identify iOS App Zip archive data as .zip files. (#116)
    • Add support for Apple Xcode .mobilepovision files. (#113)
  • Bug fixes:
    • Don t print a traceback if we pass a single, missing argument to diffoscope (eg. a JSON diff to re-load). [ ]
    • Correct differences typo in the ApkFile handler. (#127)
  • Output improvements:
    • Never emit the same id="foo" anchor reference twice in the HTML output, otherwise identically-named parts will not be able to linked to via a #foo anchor. (#120)
    • Never emit an empty id anchor either; it is not possible to link to #. [ ]
    • Don t pretty-print the output when using the --json presenter; it will usually be too complicated to be readable by the human anyway. [ ]
    • Use the SHA256 over MD5 hash when generating page names for the HTML directory-style presenter. (#124)
  • Reporting improvements:
    • Clarify the message when we truncate the number of lines to standard error [ ] and reduce the number of maximum lines printed to 25 as usually the error is obvious by then [ ].
    • Print the amount of free space that we have available in our temporary directory as a debugging message. [ ]
    • Clarify Command [ ] failed with exit code messages to remove duplicate exited with exit but also to note that diffoscope is interpreting this as an error. [ ]
    • Don t leak the full path of the temporary directory in Command [ ] exited with 1 messages. (#126)
    • Clarify the warning message when we cannot import the debian Python module. [ ]
    • Don t repeat stderr from if both commands emit the same output. [ ]
    • Clarify that an external command emits for both files, otherwise it can look like we are repeating itself when, in reality, it is being run twice. [ ]
  • Testsuite improvements:
    • Prevent apksigner test failures due to lack of binfmt_misc, eg. on Salsa CI and elsewhere. [ ]
    • Drop .travis.yml as we use Salsa instead. [ ]
  • Dockerfile improvements:
    • Add a .dockerignore file to whitelist files we actually need in our container. (#105)
    • Use ARG instead of ENV when setting up the DEBIAN_FRONTEND environment variable at runtime. (#103)
    • Run as a non-root user in container. (#102)
    • Install/remove the build-essential during build so we can install the recommended packages from Git. [ ]
  • Codebase improvements:
    • Bump the officially required version of Python from 3.5 to 3.6. (#117)
    • Drop the (default) shell=False keyword argument to subprocess.Popen so that the potentially-unsafe shell=True is more obvious. [ ]
    • Perform string normalisation in Black [ ] and include the Black output in the assertion failure too [ ].
    • Inline MissingFile s special handling of deb822 to prevent leaking through abstract layers. [ ][ ]
    • Allow a bare try/except block when cleaning up temporary files with respect to the flake8 quality assurance tool. [ ]
    • Rename in_dsc_path to dsc_in_same_dir to clarify the use of this variable. [ ]
    • Abstract out the duplicated parts of the debian_fallback class [ ] and add descriptions for the file types. [ ]
    • Various commenting and internal documentation improvements. [ ][ ]
    • Rename the Openssl command class to OpenSSLPKCS7 to accommodate other command names with this prefix. [ ]
  • Misc:
    • Rename the --debugger command-line argument to --pdb. [ ]
    • Normalise filesystem stat(2) birth times (ie. st_birthtime) in the same way we do with the stat(1) command s Access: and Change: times to fix a nondeterministic build failure in GNU Guix. (#74)
    • Ignore case when ordering our file format descriptions. [ ]
    • Drop, add and tidy various module imports. [ ][ ][ ][ ]
In addition:
  • Jean-Romain Garnier fixed a general issue where, for example, LibarchiveMember s has_same_content method was called regardless of the underlying type of file. [ ]
  • Daniel Fullmer fixed an issue where some filesystems could only be mounted read-only. (!49)
  • Emanuel Bronshtein provided a patch to prevent a build of the Docker image containing parts of the build s. (#123)
  • Mattia Rizzolo added an entry to debian/py3dist-overrides to ensure the rpm-python module is used in package dependencies (#89) and moved to using the new execute_after_* and execute_before_* Debhelper rules [ ].

Chris Lamb also performed a huge overhaul of diffoscope s website:
  • Add a completely new design. [ ][ ]
  • Dynamically generate our contributor list [ ] and supported file formats [ ] from the main Git repository.
  • Add a separate, canonical page for every new release. [ ][ ][ ]
  • Generate a latest release section and display that with the corresponding date on the homepage. [ ]
  • Add an RSS feed of our releases [ ][ ][ ][ ][ ] and add to Planet Debian [ ].
  • Use Jekyll s absolute_url and relative_url where possible [ ][ ] and move a number of configuration variables to _config.yml [ ][ ].

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Other tools Elsewhere in our tooling: strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. In May, Chris Lamb uploaded version 1.8.1-1 to Debian unstable and Bernhard M. Wiedemann fixed an off-by-one error when parsing PNG image modification times. (#16) In disorderfs, our FUSE-based filesystem that deliberately introduces non-determinism into directory system calls in order to flush out reproducibility issues, Chris Lamb replaced the term dirents in place of directory entries in human-readable output/log messages [ ] and used the astyle source code formatter with the default settings to the main disorderfs.cpp source file [ ]. Holger Levsen bumped the debhelper-compat level to 13 in disorderfs [ ] and reprotest [ ], and for the GNU Guix distribution Vagrant Cascadian updated the versions of disorderfs to version 0.5.10 [ ] and diffoscope to version 145 [ ].

Project documentation & website
  • Carl Dong:
  • Chris Lamb:
    • Rename the Who page to Projects . [ ]
    • Ensure that Jekyll enters the _docs subdirectory to find the _docs/index.md file after an internal move. (#27)
    • Wrap ltmain.sh etc. in preformatted quotes. [ ]
    • Wrap the SOURCE_DATE_EPOCH Python examples onto more lines to prevent visual overflow on the page. [ ]
    • Correct a preferred spelling error. [ ]
  • Holger Levsen:
    • Sort our Academic publications page by publication year [ ] and add Trusting Trust and Fully Countering Trusting Trust through Diverse Double-Compiling [ ].
  • Juri Dispan:

Testing framework We operate a large and many-featured Jenkins-based testing framework that powers tests.reproducible-builds.org that, amongst many other tasks, tracks the status of our reproducibility efforts as well as identifies any regressions that have been introduced. Holger Levsen made the following changes:
  • System health status:
    • Improve page description. [ ]
    • Add more weight to proxy failures. [ ]
    • More verbose debug/failure messages. [ ][ ][ ]
    • Work around strangeness in the Bash shell let VARIABLE=0 exits with an error. [ ]
  • Debian:
    • Fail loudly if there are more than three .buildinfo files with the same name. [ ]
    • Fix a typo which prevented /usr merge variation on Debian unstable. [ ]
    • Temporarily ignore PHP s horde](https://www.horde.org/) packages in Debian bullseye. [ ]
    • Document how to reboot all nodes in parallel, working around molly-guard. [ ]
  • Further work on a Debian package rebuilder:
    • Workaround and document various issues in the debrebuild script. [ ][ ][ ][ ]
    • Improve output in the case of errors. [ ][ ][ ][ ]
    • Improve documentation and future goals [ ][ ][ ][ ], in particular documentiing two real world tests case for an impossible to recreate build environment [ ].
    • Find the right source package to rebuild. [ ]
    • Increase the frequency we run the script. [ ][ ][ ][ ]
    • Improve downloading and selection of the sources to build. [ ][ ][ ]
    • Improve version string handling.. [ ]
    • Handle build failures better. [ ]. [ ]. [ ]
    • Also consider architecture all .buildinfo files. [ ][ ]
In addition:
  • kpcyrd, for Alpine Linux, updated the alpine_schroot.sh script now that a patch for abuild had been released upstream. [ ]
  • Alexander Couzens of the OpenWrt project renamed the brcm47xx target to bcm47xx. [ ]
  • Mattia Rizzolo fixed the printing of the build environment during the second build [ ][ ][ ] and made a number of improvements to the script that deploys Jenkins across our infrastructure [ ][ ][ ].
Lastly, Vagrant Cascadian clarified in the documentation that you need to be user jenkins to run the blacklist command [ ] and the usual build node maintenance was performed was performed by Holger Levsen [ ][ ][ ], Mattia Rizzolo [ ][ ] and Vagrant Cascadian [ ][ ][ ].

Mailing list: There were a number of discussions on our mailing list this month: Paul Spooren started a thread titled Reproducible Builds Verification Format which reopens the discussion around a schema for sharing the results from distributed rebuilders:
To make the results accessible, storable and create tools around them, they should all follow the same schema, a reproducible builds verification format. The format tries to be as generic as possible to cover all open source projects offering precompiled source code. It stores the rebuilder results of what is reproducible and what not.
Hans-Christoph Steiner of the Guardian Project also continued his previous discussion regarding making our website translatable. Lastly, Leo Wandersleb posted a detailed request for feedback on a question of supply chain security and other issues of software review; Leo is the founder of the Wallet Scrutiny project which aims to prove the security of Android Bitcoin Wallets:
Do you own your Bitcoins or do you trust that your app allows you to use your coins while they are actually controlled by them ? Do you have a backup? Do they have a copy they didn t tell you about? Did anybody check the wallet for deliberate backdoors or vulnerabilities? Could anybody check the wallet for those?
Elsewhere, Leo had posted instructions on his attempts to reproduce the binaries for the BlueWallet Bitcoin wallet for iOS and Android platforms.


If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

This month s report was written by Bernhard M. Wiedemann, Chris Lamb, Holger Levsen, Jelle van der Waa and Vagrant Cascadian. It was subsequently reviewed by a bunch of Reproducible Builds folks on IRC and the mailing list.

30 May 2020

Reproducible Builds (diffoscope): diffoscope 146 released

The diffoscope maintainers are pleased to announce the release of diffoscope version 146. This version includes the following changes:
[ Chris Lamb ]
* Refactor .changes and .buildinfo handling to show all details (including
  the GPG header and footer components), even when referenced files are not
  present. (Closes: reproducible-builds/diffoscope#122)
* Normalise filesystem stat(2) "birth times" (ie. st_birthtime) in the same
  way we do with stat(1)'s "Access:" and "Change:" times to fix a
  nondetermistic build failure on GNU Guix.
  (Closes: reproducible-builds/diffoscope#74)
* Drop the (default) subprocess.Popen(shell=False) keyword argument so that
  the more unsafe shell=True is more obvious.
* Ignore lower vs. upper-case when ordering our file format descriptions.
* Don't skip string normalisation in Black.
[ Mattia Rizzolo ]
* Add a "py3dist" override for the rpm-python module (Closes: #949598)
* Bump the debhelper compat level to 13 and use the new
  execute_after_*/execture_before_* style rules.
* Fix a spelling error in changelog.
[ Daniel Fullmer ]
* Mount GuestFS filesystem images readonly.
[ Jean-Romain Garnier ]
* Prevent an issue where (for example) LibarchiveMember's has_same_content
  method is called regardless of the actual type of file.
You find out more by visiting the project homepage.

17 March 2020

Dirk Eddelbuettel: Rcpp 1.0.4: Lots of goodies

rcpp logo The fourth maintenance release 1.0.4 of Rcpp, following up on the 10th anniversary and the 1.0.0. release sixteen months ago, arrived on CRAN this morning. This follows a few days of gestation at CRAN. To help during the wait we provided this release via drat last Friday. And it followed a pre-release via drat a week earlier. But now that the release is official, Windows and macOS binaries will be built by CRAN over the next few days. The corresponding Debian package will be uploaded as a source package shortly after which binaries can be built. As with the previous releases Rcpp 1.0.1, Rcpp 1.0.2 and Rcpp 1.0.3, we have the predictable and expected four month gap between releases which seems appropriate given both the changes still being made (see below) and the relative stability of Rcpp. It still takes work to release this as we run multiple extensive sets of reverse dependency checks so maybe one day we will switch to six month cycle. For now, four months still seem like a good pace. Rcpp has become the most popular way of enhancing R with C or C++ code. As of today, 1873 packages on CRAN depend on Rcpp for making analytical code go faster and further, along with 191 in BioConductor. And per the (partial) logs of CRAN downloads, we are running steasy at one millions downloads per month. This release features quite a number of different pull requests by seven different contributors as detailed below. One (personal) highlight is the switch to tinytest.

Changes in Rcpp version 1.0.4 (2020-03-13)
  • Changes in Rcpp API:
    • Safer Rcpp_list*, Rcpp_lang* and Function.operator() (Romain in #1014, #1015).
    • A number of #nocov markers were added (Dirk in #1036, #1042 and #1044).
    • Finalizer calls clear external pointer first (Kirill M ller and Dirk in #1038).
    • Scalar operations with a rhs matrix no longer change the matrix value (Qiang in #1040 fixing (again) #365).
    • Rcpp::exception and Rcpp::stop are now more thread-safe (Joshua Pritikin in #1043).
  • Changes in Rcpp Attributes:
    • The cppFunction helper now deals correctly with mulitple depends arguments (TJ McKinley in #1016 fixing #1017).
    • Invisible return objects are now supported via new option (Kun Ren in #1025 fixing #1024).
    • Unavailable packages referred to in LinkingTo are now reported (Dirk in #1027 fixing #1026).
    • The sourceCpp function can now create a debug DLL on Windows (Dirk in #1037 fixing #1035).
  • Changes in Rcpp Documentation:
    • The .github/ directory now has more explicit guidance on contributing, issues, and pull requests (Dirk).
    • The Rcpp Attributes vignette describe the new invisible return object option (Kun Ren in #1025).
    • Vignettes are now included as pre-made pdf files (Dirk in #1029)
    • The Rcpp FAQ has a new entry on the recommended importFrom directive (Dirk in #1031 fixing #1030).
    • The bib file for the vignette was once again updated to current package versions (Dirk).
  • Changes in Rcpp Deployment:
    • Added unit test to check if C++ version remains remains aligned with the package number (Dirk in #1022 fixing #1021).
    • The unit test system was switched to tinytest (Dirk in #1028, #1032, #1033).

Please note that the change to execptions and Rcpp::stop() in pr #1043 has been seen to have a minor side effect on macOS issue #1046 which has already been fixed by Kevin in pr #1047 for which I may prepare a 1.0.4.1 release for the Rcpp drat repo in a day or two. Thanks to CRANberries, you can also look at a diff to the previous release. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. Bugs reports are welcome at the GitHub issue tracker as well (where one can also search among open or closed issues); questions are also welcome under rcpp tag at StackOverflow which also allows searching among the (currently) 2356 previous questions. If you like this or other open-source work I do, you can now sponsor me at GitHub. For the first year, GitHub will match your contributions.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

10 September 2015

Dirk Eddelbuettel: Rcpp 0.12.1: First boat load of fixes!

The first updated in the 0.12.* series of Rcpp is now on the CRAN network for GNU R this morning, and I will push a Debian package. This follows the 0.12.0 release from late July which started to some serious new features. Rcpp has become the most popular way of enhancing GNU R with C++ code. As of today, 461 packages on CRAN depend on Rcpp for making analyses go faster and further. This release once again features contributions from a number of new contributors, with Florian Plaza O ate and Dan Dillon joining the fray with several pull requests each. Kurt Hornik also helped with a fix for a bad interaction between very recent R (3.2.2) changes and Rcpp; another related one was also addressed. A big Thanks! to everybody helping with code, bug reports or documentation, See below for a detailed list of changes extracted from the NEWS file.
Changes in Rcpp version 0.12.1 (2015-09-10)
  • Changes in Rcpp API:
    • Correct use of WIN32 instead of _WIN32 to please Windows 10
    • Add an assignment operator to DimNameProxy (PR #339 by Florian)
    • Add vector and matrix accessors .at() with bounds checking (PR #342 by Florian)
    • Correct character vector conversion from single char (PR #344 by Florian fixing issue #343)
    • Correct on use of R_xlen_t back to size_t (PR #348 by Romain)
    • Correct subsetting code to allow for single assignment (PR #349 by Florian)
    • Enable subset assignment on left and righ-hand side (PR #353 by Qiang, fixing issue #345)
    • Refreshed to included tinyformat template library (PR #357 by Dirk, issue #356)
    • Add operator<<() for vectors and matrices (PR #361 by Dan fixing issue #239)
    • Make String and String_Proxy objects comparable (PR #366 and PR #372 by Dan, fixing issue #191)
    • Add a new class Nullable for objects which may be NULL (PR #368 by Dirk and Dan, fixing issue #363)
    • Correct creation and access of large matrices (PR #370 by Florian, fixing issue #369)
  • Changes in Rcpp Attributes:
    • Correctly reset directory in case of no-rebuilding but Windows code (PR #335 by Dirk)
  • Changes in Rcpp Modules:
    • We no longer define multiple Modules objects named World in the unit tests with was seen to have a bad effect with R 3.2.2 or later (PR #351 by Dirk fixing issue #350).
    • Applied patch by Kurt Hornik which improves how Rcpp loads Modules (PR #353 by Dirk)
  • Changes in Rcpp Documentation:
    • The Rcpp.bib file with bibliographic references was updated.
Thanks to CRANberries, you can also look at a diff to the previous release As always, even fuller details are on the Rcpp Changelog page and the Rcpp page which also leads to the downloads page, the browseable doxygen docs and zip files of doxygen output for the standard formats. A local directory has source and documentation too. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

25 July 2015

Dirk Eddelbuettel: Rcpp 0.12.0: Now with more Big Data!

big-data image A new release 0.12.0 of Rcpp arrived on the CRAN network for GNU R this morning, and I also pushed a Debian package upload. Rcpp has become the most popular way of enhancing GNU R with C++ code. As of today, 423 packages on CRAN depend on Rcpp for making analyses go faster and further. Note that this is 60 more packages since the last release in May! Also, BioConductor adds another 57 packages, and casual searches on GitHub suggests many more. And according to Andrie De Vries, Rcpp has now page rank of one on CRAN as well! And with this release, Rcpp also becomes ready for Big Data, or, as they call it in Texas, Data. Thanks to a lot of work and several pull requests by Qiang Kou, support for R_xlen_t has been added. That means we can now do stunts like
R> library(Rcpp)
R> big <- 2^31-1
R> bigM <- rep(NA, big)
R> bigM2 <- c(bigM, bigM)
R> cppFunction("double getSz(LogicalVector x)   return x.length();  ")
R> getSz(bigM)
[1] 2147483647
R> getSz(bigM2)
[1] 4294967294
R>
where prior versions of Rcpp would just have said
> getSz(bigM2)
Error in getSz(bigM2) :
  long vectors not supported yet: ../../src/include/Rinlinedfuns.h:137
>
which is clearly not Texas-style. Another wellcome change, also thanks to Qiang Kou, adds encoding support for strings. A lot of other things got polished. We are still improving exception handling as we still get the odd curveballs in a corner cases. Matt Dziubinski corrected the var() computation to use the proper two-pass method and added better support for lambda functions in Sugar expression using sapply(), Qiang Kou added more pull requests mostly for string initialization, and Romain added a pull request which made data frame creation a little more robust, and JJ was his usual self in tirelessly looking after all aspects of Rcpp Attributes. As always, you can follow the development via the GitHub repo and particularly the Issue tickets and Pull Requests. And any discussions, questions, ... regarding Rcpp are always welcome at the rcpp-devel mailing list. Last but not least, we are also extremely pleased to annouce that Qiang Kou has joined us in the Rcpp-Core team. We are looking forward to a lot more awesome! See below for a detailed list of changes extracted from the NEWS file.
Changes in Rcpp version 0.12.0 (2015-07-24)
  • Changes in Rcpp API:
    • Rcpp_eval() no longer uses R_ToplevelExec when evaluating R expressions; this should resolve errors where calling handlers (e.g. through suppressMessages()) were not properly respected.
    • All internal length variables have been changed from R_len_t to R_xlen_t to support vectors longer than 2^31-1 elements (via pull request 303 by Qiang Kou).
    • The sugar function sapply now supports lambda functions (addressing issue 213 thanks to Matt Dziubinski)
    • The var sugar function now uses a more robust two-pass method, supports complex numbers, with new unit tests added (via pull request 320 by Matt Dziubinski)
    • String constructors now allow encodings (via pull request 310 by Qiang Kou)
    • String objects are preserving the underlying SEXP objects better, and are more careful about initializations (via pull requests 322 and 329 by Qiang Kou)
    • DataFrame constructors are now a little more careful (via pull request 301 by Romain Francois)
    • For R 3.2.0 or newer, Rf_installChar() is used instead of Rf_install(CHAR()) (via pull request 332).
  • Changes in Rcpp Attributes:
    • Use more robust method of ensuring unique paths for generated shared libraries.
    • The evalCpp function now also supports the plugins argument.
    • Correctly handle signature termination characters (' ' or ';') contained in quotes.
  • Changes in Rcpp Documentation:
    • The Rcpp-FAQ vignette was once again updated with respect to OS X issues and Fortran libraries needed for e.g. RcppArmadillo.
    • The included Rcpp.bib bibtex file (which is also used by other Rcpp* packages) was updated with respect to its CRAN references.
Thanks to CRANberries, you can also look at a diff to the previous release As always, even fuller details are on the Rcpp Changelog page and the Rcpp page which also leads to the downloads page, the browseable doxygen docs and zip files of doxygen output for the standard formats. A local directory has source and documentation too. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Next.