A new release, now at version 0.0.23, of RcppAnnoy
has arrived on CRAN, just a
little short of two years since the previous release.
RcppAnnoy
is the Rcpp-based R integration of
the nifty Annoy library
by Erik Bernhardsson. Annoy is a small and
lightweight C++ template header library for very fast approximate
nearest neighbours originally developed to drive the Spotify music discovery algorithm. It
had all the buzzwords already a decade ago: it is one of the algorithms
behind (drum roll) vector search, as it finds
approximate matches very quickly and also allows the data to be
persisted.
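Annoy's README describes how its index is built: at each node of a tree a random hyperplane splits the points, chosen by sampling two points and taking the hyperplane equidistant from them, and many such trees form a forest that is searched with a priority queue. As a rough illustration only, the C sketch below implements that single split step; `split_side` and `split_points` are hypothetical helpers, not part of the Annoy or RcppAnnoy API.

```c
#include <stddef.h>

/* Hypothetical helper, not part of Annoy: returns 1 if point p lies on the
 * same side as a of the hyperplane equidistant from a and b, else 0.
 * The hyperplane has normal n = a - b and passes through m = (a + b) / 2,
 * so the side is given by the sign of dot(n, p - m). */
int split_side(const double *a, const double *b, const double *p, size_t dim)
{
    double s = 0.0;
    for (size_t i = 0; i < dim; i++) {
        double normal = a[i] - b[i];
        double mid = 0.5 * (a[i] + b[i]);
        s += normal * (p[i] - mid);
    }
    return s > 0.0;
}

/* Partitions the point indices in idx (points stored row-major, dim columns)
 * so that points on a's side come first; returns how many there are. */
size_t split_points(const double *pts, size_t n, size_t dim,
                    const double *a, const double *b, size_t *idx)
{
    size_t left = 0;
    for (size_t k = 0; k < n; k++) {
        if (split_side(a, b, pts + idx[k] * dim, dim)) {
            size_t tmp = idx[left];
            idx[left] = idx[k];
            idx[k] = tmp;
            left++;
        }
    }
    return left;
}
```

Applying `split_points` recursively to each half until a node holds only a few points yields one tree; the real library repeats this with different random pairs to build the forest.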
This release contains three contributed pull requests covering a new
metric, a new demo and quieter compilation, some changes to
documentation and last but not least general polish, including switching
the vignette to the Rcpp::asis builder.
Details of the release follow based on the NEWS file.
Changes in version 0.0.23 (2026-01-12)
Add dot product distance metrics (Benjamin James in #78)
Apply small polish to the documentation (Dirk closing #79)
A new demo() has been added (Samuel Granjeaud in #79)
Switch to Authors@R in DESCRIPTION
Several updates to continuous integration and README.md
Small enhancements to package help files
Updates to vignettes and references
Vignette now uses Rcpp::asis builder (Dirk in #80)
Switch one macro to a function to avoid a compiler nag (Amos Elberg in #81)
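The NEWS entry does not spell out how the new dot product metric is implemented. Conceptually it ranks items by inner product rather than by distance, which is what maximum inner product search needs; note that this is not a true metric, since an item need not be its own best match. The C sketch below is a hypothetical exact brute-force baseline of the kind an approximate index aims to match more quickly; it does not use or reproduce the RcppAnnoy API.

```c
#include <stddef.h>

/* Plain inner product of two dim-length vectors. */
double dot(const double *x, const double *y, size_t dim)
{
    double s = 0.0;
    for (size_t i = 0; i < dim; i++)
        s += x[i] * y[i];
    return s;
}

/* Exact maximum-inner-product search over n row-major items.
 * Returns the index of the item with the largest dot product with q
 * (the first such index on ties), or n if there are no items. */
size_t max_inner_product(const double *items, size_t n, size_t dim,
                         const double *q)
{
    size_t best = n;
    double best_score = 0.0;
    for (size_t k = 0; k < n; k++) {
        double score = dot(items + k * dim, q, dim);
        if (best == n || score > best_score) {
            best = k;
            best_score = score;
        }
    }
    return best;
}
```

With items (1, 0), (0, 1) and (3, 3), querying with the first item itself returns index 2, because the longer vector (3, 3) has a larger inner product; this is the "not a true metric" behaviour mentioned above.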
Machine is a far-future space opera. It is a loose sequel to
Ancestral Night, but you do not have to
remember the first book to enjoy this book and they have only a couple of
secondary characters in common. There are passing spoilers for
Ancestral Night in the story, though, if you care.
Dr. Brookllyn Jens is a rescue paramedic on Synarche Medical Vessel
I Race To Seek the Living. That means she goes into dangerous
situations to get you out of them, patches you up enough to not die, and
brings you to doctors who can do the slower and more time-consuming work.
She was previously a cop (well, Judiciary, which in this universe is
mostly the same thing) and then found that medicine, and specifically the
flagship Synarche hospital Core General, was the institution in all the
universe that she believed in the most.
As Machine opens, Jens is boarding the Big Rock Candy
Mountain, a generation ship launched from Earth during the bad era before
right-minding and joining the Synarche, back when it looked like humanity
on Earth wouldn't survive. Big Rock Candy Mountain was discovered
by accident in the wrong place, going faster than it was supposed to be
going and not responding to hails. The Synarche ship that first discovered
and docked with it is also mysteriously silent. It's the job of Jens and
her colleagues to get on board, see if anyone is still alive, and rescue
them if possible.
What they find is a corpse and a disturbingly servile early AI guarding a
whole lot of people frozen in primitive cryobeds, along with odd
artificial machinery that seems to be controlled by the AI. Or possibly
controlling the AI.
Jens assumes her job will be complete once she gets the cryobeds and the
AI back to Core General where both the humans and the AI can be treated by
appropriate doctors. Jens is very wrong.
Machine is Elizabeth Bear's version of a James White
Sector General novel. If you read this book
without any prior knowledge, the way that I did, you may not realize this
until the characters make it to Core General, but then it becomes obvious
to anyone who has read White's series. Most of the standard Sector General
elements are here: A vast space station with rings at different gravity
levels and atmospheres, a baffling array of species, and the ability to
load other people's personalities into your head to treat other species at
the cost of discomfort and body dysmorphia. There's a gruff supervisor, a
fragile alien doctor, and a whole lot of idealistic and well-meaning
people working around complex interspecies differences. Sadly, Bear does
drop White's entertainingly oversimplified species classification codes;
this is the correct call for suspension of disbelief, but I kind of missed
them.
I thoroughly enjoy the idea of the Sector General series, so I was
delighted by an updated version that drops the sexism and the doctor/nurse
hierarchy and adds AIs, doctors for AIs, and a more complicated political
structure. The hospital is even run by a sentient tree, which is an
inspired choice.
Bear, of course, doesn't settle for a relatively simple James White
problem-solving plot. There are interlocking, layered problems here,
medical and political, immediate and structural, that unwind in ways that
I found satisfyingly twisty. As with Ancestral Night, Bear has some
complex points to make about morality. I think that aspect of the story
was a bit less convincing than in Ancestral Night, in part because
some of the characters use rather bizarre tactics (although I will grant
they are the sort of bizarre tactics that I could imagine would be used by
well-meaning people who didn't think through all of the possible
consequences). I enjoyed the ethical dilemmas here, but they didn't grab
me the way that Ancestral Night did. The setting, though, is even
better: An interspecies hospital was a brilliant setting when James White
used it, and it continues to be a brilliant setting in Bear's hands.
It's also worth mentioning that Jens has a chronic inflammatory disease
and uses an exoskeleton for mobility, and (as much as I can judge while
not being disabled myself) everything about this aspect of the character
was excellent. It's rare to see characters with meaningful disabilities in
far-future science fiction. When present at all, they're usually treated
like Geordi's sight: something little different than the differential
abilities of the various aliens, or even a backdoor advantage. Jens has a
true, meaningful disability that she has to manage and that causes a
constant cognitive drain, and the treatment of her assistive device is
complex and nuanced in a way that I found thoughtful and satisfying.
The one structural complaint that I will make is that Jens is an
astonishingly talkative first-person protagonist, particularly for an
Elizabeth Bear novel. This is still better than being inscrutable, but she
is prone to such extended philosophical digressions or infodumps in the
middle of a scene that I found myself wishing she'd get on with it already
in a few places. This provides good characterization, in the sense that
the reader certainly gets inside Jens's head, but I think Bear didn't get
the balance quite right.
That complaint aside, this was very fun, and I am certainly going to keep
reading this series. Recommended, particularly if you like James White, or
want to see why other people do.
The most important thing in the universe is not, it turns out, a
single, objective truth. It's not a hospital whose ideals you love,
that treats all comers. It's not a lover; it's not a job. It's not
friends and teammates.
It's not even a child that rarely writes me back, and to be honest I
probably earned that. I could have been there for her. I didn't know
how to be there for anybody, though. Not even for me.
The most important thing in the universe, it turns out, is a complex
of subjective and individual approximations. Of tries and fails. Of
ideals, and things we do to try to get close to those ideals.
It's who we are when nobody is looking.
## 0.23 2025-12-20
commit be15aa25dea40aea66a8534143fb81b29d2e6c08
Author: C.J. Collier
Date: Sat Dec 20 22:40:44 2025 +0000
Fixes C-level test infrastructure and adds more test cases for upb_to_sv conversions.
- **Makefile.PL:**
- Allow extra_src in c_test_config.json to be an array.
- Add ASan flags to CCFLAGS and LDDLFLAGS for better debugging.
- Corrected echo newlines in test_c target.
- **c_test_config.json:**
- Added missing type test files to deps and extra_src for convert/sv_to_upb and convert/upb_to_sv test runners.
- **t/c/convert/upb_to_sv.c:**
- Fixed a double free of test_pool.
- Added missing includes for type test headers.
- Updated test plan counts.
- **t/c/convert/sv_to_upb.c:**
- Added missing includes for type test headers.
- Updated test plan counts.
- Corrected Perl interpreter initialization.
- **t/c/convert/types/**:
- Added missing test_util.h include in new type test headers.
- Completed the set of upb_to_sv test cases for all scalar types by adding optional and repeated tests for sfixed32, sfixed64, sint32, and sint64, and adding repeated tests to the remaining scalar type files.
- **Documentation:**
- Updated 01-xs-testing.md with more debugging tips, including ASan usage and checking for double frees and typos.
- Updated xs_learnings.md with details from the recent segfault.
- Updated llm-plan-execution-instructions.md to emphasize debugging steps.
## 0.22 2025-12-19
commit 2c171d9a5027e0150eae629729c9104e7f6b9d2b
Author: C.J. Collier
Date: Fri Dec 19 23:41:02 2025 +0000
feat(perl,testing): Initialize C test framework and build system
This commit sets up the foundation for the C-level tests and the build system for the Perl Protobuf module:
1. **Makefile.PL Enhancements:**
* Integrates Devel::PPPort to generate ppport.h for better portability.
* Object files now retain their path structure (e.g., xs/convert/sv_to_upb.o) instead of being flattened, improving build clarity.
* The MY::postamble is significantly revamped to dynamically generate build rules for all C tests located in t/c/ based on the t/c/c_test_config.json file.
* C tests are linked against libprotobuf_common.a and use ExtUtils::Embed flags.
* Added JSON::MaybeXS to PREREQ_PM.
* The test target now also depends on the test_c target.
2. **C Test Infrastructure (t/c/):**
* Introduced t/c/c_test_config.json to configure individual C test builds, specifying dependencies and extra source files.
* Created t/c/convert/test_util.c and .h for shared test functions like loading descriptors.
* Initial t/c/convert/upb_to_sv.c and t/c/convert/sv_to_upb.c test runners.
* Basic t/c/integration/030_protobuf_coro.c for Coro safety testing on core utils using libcoro.
* Basic t/c/integration/035_croak_test.c for testing exception handling.
* Basic t/c/integration/050_convert.c for integration testing conversions.
3. **Test Proto:** Updated t/data/test.proto with more field types for conversion testing and regenerated test_descriptor.bin.
4. **XS Test Harness (t/c/upb-perl-test.h):** Added like_n macro for length-aware regex matching.
5. **Documentation:** Updated architecture and plan documents to reflect the C test structure.
6. **ERRSV Testing:** Note that the C tests (t/c/) will primarily check *if* a croak occurs (i.e., that the exception path is taken), but will not assert on the string content of ERRSV. Reliably testing $@ content requires the full Perl test environment with Test::More, which will be done in the .t files when testing the Perl API.
This provides a solid base for developing and testing the XS and C components of the module.
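The changelog mentions a like_n macro for length-aware regex matching but does not show it. One plausible shape, assuming POSIX `<regex.h>` rather than Perl's regex engine (the real macro may well use the embedded Perl interpreter instead), is to copy the first `len` bytes into a NUL-terminated buffer before matching, since `regexec` expects a C string. `regex_match_n`, this `like_n` and `tap_test_num` are hypothetical names.

```c
#include <regex.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: matches the first len bytes of buf (which need not
 * be NUL-terminated) against a POSIX extended regex.
 * Returns 1 on match, 0 on no match, -1 on a bad pattern or allocation error. */
int regex_match_n(const char *buf, size_t len, const char *pattern)
{
    regex_t re;
    char *copy;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;                      /* invalid pattern */
    copy = malloc(len + 1);
    if (copy == NULL) {
        regfree(&re);
        return -1;
    }
    memcpy(copy, buf, len);             /* only the first len bytes */
    copy[len] = '\0';
    rc = regexec(&re, copy, 0, NULL, 0);
    free(copy);
    regfree(&re);
    if (rc == 0)
        return 1;
    return rc == REG_NOMATCH ? 0 : -1;
}

/* TAP-style wrapper in the spirit of like_n; name and output are assumptions. */
int tap_test_num = 0;
#define like_n(buf, len, pattern, name)                                 \
    do {                                                                \
        int ok_ = regex_match_n((buf), (len), (pattern)) == 1;          \
        printf("%s %d - %s\n", ok_ ? "ok" : "not ok", ++tap_test_num,   \
               (name));                                                 \
    } while (0)
```

Because the copy stops at `len`, a match cannot run past the intended field even if the buffer is not NUL-terminated; a buffer with an embedded NUL byte would still be cut short at that byte.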
## 0.21 2025-12-18
commit a8b6b6100b2cf29c6df1358adddb291537d979bc
Author: C.J. Collier
Date: Thu Dec 18 04:20:47 2025 +0000
test(C): Add integration tests for Milestone 2 components
- Created t/c/integration/030_protobuf.c to test interactions
between obj_cache, arena, and utils.
- Added this test to t/c/c_test_config.json.
- Verified that all C tests for Milestones 2 and 3 pass,
including the libcoro-based stress test.
## 0.20 2025-12-18
commit 0fcad68680b1f700a83972a7c1c48bf3a6958695
Author: C.J. Collier
Date: Thu Dec 18 04:14:04 2025 +0000
docs(plan): Add guideline review reminders to milestones
- Added a "[ ] REFRESH: Review all documents in @perl/doc/guidelines/**"
checklist item to the start of each component implementation
milestone (C and Perl layers).
- This excludes Integration Test milestones.
## 0.19 2025-12-18
commit 987126c4b09fcdf06967a98fa3adb63d7de59a34
Author: C.J. Collier
Date: Thu Dec 18 04:05:53 2025 +0000
docs(plan): Add C-level and Perl-level Coro tests to milestones
- Added checklist items for libcoro-based C tests
(e.g., t/c/integration/050_convert_coro.c) to all C layer
integration milestones (050 through 220).
- Updated 030_Integration_Protobuf.md to standardise checklist
items for the existing 030_protobuf_coro.c test.
- Removed the single xt/author/coro-safe.t item from
010_Build.md.
- Added checklist items for Perl-level Coro tests
(e.g., xt/coro/240_arena.t) to each Perl layer
integration milestone (240 through 400).
- Created perl/t/c/c_test_config.json to manage C test
configurations externally.
- Updated perl/doc/architecture/testing/01-xs-testing.md to describe
both C-level libcoro and Perl-level Coro testing strategies.
## 0.18 2025-12-18
commit 6095a5a610401a6035a81429d0ccb9884d53687b
Author: C.J. Collier
Date: Thu Dec 18 02:34:31 2025 +0000
added coro testing to c layer milestones
## 0.17 2025-12-18
commit cc0aae78b1f7f675fc8a1e99aa876c0764ea1cce
Author: C.J. Collier
Date: Thu Dec 18 02:26:59 2025 +0000
docs(plan): Refine test coverage checklist items for SMARTness
- Updated the "Tests provide full coverage" checklist items in
C layer plan files (020, 040, 060, 080, 100, 120, 140, 160, 180, 200)
to explicitly mention testing all public functions in the
corresponding header files.
- Expanded placeholder checklists in 140, 160, 180, 200.
- Updated the "Tests provide full coverage" and "Add coverage checks"
checklist items in Perl layer plan files (230, 250, 270, 290, 310, 330,
350, 370, 390) to be more specific about the scope of testing
and the use of Test::TestCoverage.
- Expanded Well-Known Types milestone (350) to detail each type.
## 0.16 2025-12-18
commit e4b601f14e3817a17b0f4a38698d981dd4cb2818
Author: C.J. Collier
Date: Thu Dec 18 02:07:35 2025 +0000
docs(plan): Full refactoring of C and Perl plan files
- Split both ProtobufPlan-C.md and ProtobufPlan-Perl.md into
per-milestone files under the perl/doc/plan/ directory.
- Introduced Integration Test milestones after each component
milestone in both C and Perl plans.
- Numbered milestone files sequentially (e.g., 010_Build.md,
230_Perl_Arena.md).
- Updated main ProtobufPlan-C.md and ProtobufPlan-Perl.md to
act as Tables of Contents.
- Ensured consistent naming for integration test files
(e.g., t/c/integration/030_protobuf.c, t/integration/260_descriptor_pool.t).
- Added architecture review steps to the end of all milestones.
- Moved Coro safety test to C layer Milestone 1.
- Updated Makefile.PL to support new test structure and added Coro.
- Moved and split t/c/convert.c into t/c/convert/*.c.
- Moved other t/c/*.c tests into t/c/protobuf/*.c.
- Deleted old t/c/convert.c.
## 0.15 2025-12-17
commit 649cbacf03abb5e7293e3038bb451c0406e9d0ce
Author: C.J. Collier
Date: Wed Dec 17 23:51:22 2025 +0000
docs(plan): Refactor and reset ProtobufPlan.md
- Split the plan into ProtobufPlan-C.md and ProtobufPlan-Perl.md.
- Reorganized milestones to clearly separate C layer and Perl layer development.
- Added more granular checkboxes for each component:
- C Layer: Create test, Test coverage, Implement, Tests pass.
- Perl Layer: Create test, Test coverage, Implement Module/XS, Tests pass, C-Layer adjustments.
- Reset all checkboxes to [ ] to prepare for a full audit.
- Updated status in architecture/api and architecture/core documents to "Not Started".
feat(obj_cache): Add unregister function and enhance tests
- Added protobuf_unregister_object to xs/protobuf/obj_cache.c.
- Updated xs/protobuf/obj_cache.h with the new function declaration.
- Expanded tests in t/c/protobuf_obj_cache.c to cover unregistering,
overwriting keys, and unregistering non-existent keys.
- Corrected the test plan count in t/c/protobuf_obj_cache.c to 17.
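The three behaviours the expanded tests cover (unregistering, overwriting a key, and unregistering a key that was never registered) can be pinned down with a toy pointer-keyed table. This C sketch is hypothetical and uses a fixed-size linear scan for brevity; per the 0.09 entry below, the real cache is a Perl HV holding weak references.

```c
#include <stddef.h>

#define CACHE_CAP 16

/* Toy pointer-keyed cache; names are hypothetical, not the project's API. */
typedef struct {
    const void *key[CACHE_CAP];
    void *value[CACHE_CAP];
} toy_cache;

void cache_init(toy_cache *c)
{
    for (size_t i = 0; i < CACHE_CAP; i++) {
        c->key[i] = NULL;
        c->value[i] = NULL;
    }
}

/* Registers key -> value, overwriting any existing entry for key.
 * Returns 0 on success, -1 for a NULL key or a full table. */
int cache_register(toy_cache *c, const void *key, void *value)
{
    size_t free_slot = CACHE_CAP;
    if (key == NULL)
        return -1;
    for (size_t i = 0; i < CACHE_CAP; i++) {
        if (c->key[i] == key) {
            c->value[i] = value;          /* overwrite in place */
            return 0;
        }
        if (c->key[i] == NULL && free_slot == CACHE_CAP)
            free_slot = i;                /* remember the first hole */
    }
    if (free_slot == CACHE_CAP)
        return -1;
    c->key[free_slot] = key;
    c->value[free_slot] = value;
    return 0;
}

void *cache_lookup(const toy_cache *c, const void *key)
{
    if (key == NULL)
        return NULL;
    for (size_t i = 0; i < CACHE_CAP; i++)
        if (c->key[i] == key)
            return c->value[i];
    return NULL;
}

/* Returns 1 if key was present and removed, 0 if it was absent (a no-op). */
int cache_unregister(toy_cache *c, const void *key)
{
    if (key == NULL)
        return 0;
    for (size_t i = 0; i < CACHE_CAP; i++) {
        if (c->key[i] == key) {
            c->key[i] = NULL;
            c->value[i] = NULL;
            return 1;
        }
    }
    return 0;
}
```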
## 0.14 2025-12-17
commit 40b6ad14ca32cf16958d490bb575962f88d868a1
Author: C.J. Collier
Date: Wed Dec 17 23:18:27 2025 +0000
feat(arena): Complete C layer for Arena wrapper
This commit finalizes the C-level implementation for the Protobuf::Arena wrapper.
- Adds PerlUpb_Arena_Destroy for proper cleanup from Perl's DEMOLISH.
- Enhances error checking in PerlUpb_Arena_Get.
- Expands C-level tests in t/c/protobuf_arena.c to cover memory allocation
on the arena and lifecycle through PerlUpb_Arena_Destroy.
- Corrects embedded Perl initialization in the C test.
docs(plan): Refactor ProtobufPlan.md
- Restructures the development plan to clearly separate "C Layer" and
"Perl Layer" tasks within each milestone.
- This aligns the plan with the "C-First Implementation Strategy" and improves progress tracking.
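To illustrate the lifecycle the arena tests exercise (allocate on the arena, then release everything in one destroy call), here is a toy single-block bump allocator in C. It is a simplified sketch under assumptions, not upb's arena, which grows in blocks and is driven through its own API (for example upb_Arena_New and upb_Arena_Free); the toy_arena_* names are hypothetical.

```c
#include <stddef.h>
#include <stdlib.h>

/* Toy single-block bump arena; not upb's implementation. */
typedef struct {
    unsigned char *base;
    size_t cap;
    size_t used;
} toy_arena;

toy_arena *toy_arena_new(size_t cap)
{
    toy_arena *a = malloc(sizeof *a);
    if (a == NULL)
        return NULL;
    a->base = malloc(cap);
    if (a->base == NULL) {
        free(a);
        return NULL;
    }
    a->cap = cap;
    a->used = 0;
    return a;
}

/* Bump-allocates size bytes aligned for max_align_t; NULL when exhausted.
 * Individual allocations are never freed on their own. */
void *toy_arena_malloc(toy_arena *a, size_t size)
{
    size_t align = _Alignof(max_align_t);
    size_t start = (a->used + align - 1) / align * align;
    if (start > a->cap || size > a->cap - start)
        return NULL;
    a->used = start + size;
    return a->base + start;
}

/* Releases every allocation at once; safe to call with NULL. */
void toy_arena_destroy(toy_arena *a)
{
    if (a == NULL)
        return;
    free(a->base);
    free(a);
}
```

The design point is that the Perl wrapper only needs one destroy hook (DEMOLISH calling the destroy function) rather than tracking each C allocation separately.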
## 0.13 2025-12-17
commit c1e566c25f62d0ae9f195a6df43b895682652c71
Author: C.J. Collier
Date: Wed Dec 17 22:00:40 2025 +0000
refactor(perl): Rename C tests and enhance Makefile.PL
- Renamed test files in t/c/ to better match the xs module structure:
- 01-cache.c -> protobuf_obj_cache.c
- 02-arena.c -> protobuf_arena.c
- 03-utils.c -> protobuf_utils.c
- 04-convert.c -> convert.c
- load_test.c -> upb_descriptor_load.c
- Updated perl/Makefile.PL to reflect the new test names in MY::postamble's $c_test_config.
- Refactored the $c_test_config generation in Makefile.PL to reduce repetition by using a default flags hash and common dependencies array.
- Added a fail() macro to perl/t/c/upb-perl-test.h for consistency.
- Modified t/c/upb_descriptor_load.c to use the t/c/upb-perl-test.h macros, making its output consistent with other C tests.
- Added a skeleton for t/c/convert.c to test the conversion functions.
- Updated documentation in ProtobufPlan.md and architecture/testing/01-xs-testing.md to reflect new test names.
## 0.12 2025-12-17
commit d8cb5dd415c6c129e71cd452f78e29de398a82c9
Author: C.J. Collier
Date: Wed Dec 17 20:47:38 2025 +0000
feat(perl): Refactor XS code into subdirectories
This commit reorganizes the C code in the perl/xs/ directory into subdirectories, mirroring the structure of the Python UPB extension. This enhances modularity and maintainability.
- Created subdirectories for each major component: convert, descriptor, descriptor_containers, descriptor_pool, extension_dict, map, message, protobuf, repeated, and unknown_fields.
- Created skeleton .h and .c files within each subdirectory to house the component-specific logic.
- Updated top-level component headers (e.g., perl/xs/descriptor.h) to include the new sub-headers.
- Updated top-level component source files (e.g., perl/xs/descriptor.c) to include their main header and added stub initialization functions (e.g., PerlUpb_InitDescriptor).
- Moved code from the original perl/xs/protobuf.c to new files in perl/xs/protobuf/ (arena, obj_cache, utils).
- Moved code from the original perl/xs/convert.c to new files in perl/xs/convert/ (upb_to_sv, sv_to_upb).
- Updated perl/Makefile.PL to use a glob (xs/*/*.c) to find the new C source files in the subdirectories.
- Added perl/doc/architecture/core/07-xs-file-organization.md to document the new structure.
- Updated perl/doc/ProtobufPlan.md and other architecture documents to reference the new organization.
- Corrected self-referential includes in the newly created .c files.
This restructuring provides a solid foundation for further development and makes it easier to port logic from the Python implementation.
## 0.11 2025-12-17
commit cdedcd13ded4511b0464f5d3bdd72ce6d34e73fc
Author: C.J. Collier
Date: Wed Dec 17 19:57:52 2025 +0000
feat(perl): Implement C-first testing and core XS infrastructure
This commit introduces a significant refactoring of the Perl XS extension, adopting a C-first development approach to ensure a robust foundation.
Key changes include:
- **C-Level Testing Framework:** Established a C-level testing system in t/c/ with a dedicated Makefile, using an embedded Perl interpreter. Initial tests cover the object cache (01-cache.c), arena wrapper (02-arena.c), and utility functions (03-utils.c).
- **Core XS Infrastructure:**
- Implemented a global object cache (xs/protobuf.c) to manage Perl wrappers for UPB objects, using weak references.
- Created an upb_Arena wrapper (xs/protobuf.c).
- Consolidated common XS helper functions into xs/protobuf.h and xs/protobuf.c.
- **Makefile.PL Enhancements:** Updated to support building and linking C tests, incorporating flags from ExtUtils::Embed, and handling both .c and .cc source files.
- **XS File Reorganization:** Restructured XS files to mirror the Python UPB extension's layout (e.g., message.c, descriptor.c). Removed older, monolithic .xs files.
- **Typemap Expansion:** Added extensive typemap entries in perl/typemap to handle conversions between Perl objects and various const upb_*Def* pointers.
- **Descriptor Tests:** Added a new test suite t/02-descriptor.t to validate descriptor loading and accessor methods.
- **Documentation:** Updated development plans and guidelines (ProtobufPlan.md, xs_learnings.md, etc.) to reflect the C-first strategy, new testing methods, and lessons learned.
- **Build Cleanup:** Removed ppport.h from .gitignore as it's no longer used, due to -DPERL_NO_PPPORT being set in Makefile.PL.
This C-first approach allows for more isolated and reliable testing of the core logic interacting with the UPB library before higher-level Perl APIs are built upon it.
## 0.10 2025-12-17
commit 1ef20ade24603573905cb0376670945f1ab5d829
Author: C.J. Collier
Date: Wed Dec 17 07:08:29 2025 +0000
feat(perl): Implement C-level tests and core XS utils
This commit introduces a C-level testing framework for the XS layer and implements key components:
1. **C-Level Tests (t/c/)**:
* Added t/c/Makefile to build standalone C tests.
* Created t/c/upb-perl-test.h with macros for TAP-compliant C tests (plan, ok, is, is_string, diag).
* Implemented t/c/01-cache.c to test the object cache.
* Implemented t/c/02-arena.c to test Protobuf::Arena wrappers.
* Implemented t/c/03-utils.c to test string utility functions.
* Corrected include paths and diagnostic messages in C tests.
2. **XS Object Cache (xs/protobuf.c)**:
* Switched to using stringified pointers (%p) as hash keys for stability.
* Fixed a critical double-free bug in PerlUpb_ObjCache_Delete by removing an extra SvREFCNT_dec on the lookup key.
3. **XS Arena Wrapper (xs/protobuf.c)**:
* Corrected PerlUpb_Arena_New to use newSVrv and PTR2IV for opaque object wrapping.
* Corrected PerlUpb_Arena_Get to safely unwrap the arena pointer.
4. **Makefile.PL (perl/Makefile.PL)**:
* Added -Ixs to INC to allow C tests to find t/c/upb-perl-test.h and xs/protobuf.h.
* Added LIBS to link libprotobuf_common.a into the main Protobuf.so.
* Added C test targets 01-cache, 02-arena, 03-utils to the test config in MY::postamble.
5. **Protobuf.pm (perl/lib/Protobuf.pm)**:
* Added use XSLoader; to load the compiled XS code.
6. **New files xs/util.h**:
* Added initial type conversion function.
These changes establish a foundation for testing the C-level interface with UPB and fix crucial bugs in the object cache implementation.
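The TAP macros listed in item 1 (plan, ok, is, is_string, diag) are not shown in the changelog. A minimal sketch of what such macros can look like follows; it is an illustration under assumptions, not the contents of t/c/upb-perl-test.h, and the counter names are hypothetical. TAP itself only requires a `1..N` plan line plus `ok`/`not ok` lines, so a TAP harness can consume a C test binary's output just as it would a .t script's.

```c
#include <stdio.h>
#include <string.h>

/* Minimal TAP emitters; a sketch, not the project's upb-perl-test.h. */
int tap_count = 0;
int tap_failed = 0;

#define plan(n) printf("1..%d\n", (n))

/* Diagnostics go to stderr with a "# " prefix, as Test::More's diag does. */
#define diag(...)                                                        \
    do {                                                                 \
        fprintf(stderr, "# ");                                           \
        fprintf(stderr, __VA_ARGS__);                                    \
        fprintf(stderr, "\n");                                           \
    } while (0)

#define ok(cond, name)                                                   \
    do {                                                                 \
        int pass_ = (cond) ? 1 : 0;                                      \
        tap_count++;                                                     \
        if (!pass_)                                                      \
            tap_failed++;                                                \
        printf("%s %d - %s\n", pass_ ? "ok" : "not ok", tap_count,       \
               (name));                                                  \
    } while (0)

/* Integer comparison; each argument is evaluated exactly once. */
#define is(got, expected, name)                                          \
    do {                                                                 \
        long long got_ = (got), exp_ = (expected);                       \
        ok(got_ == exp_, (name));                                        \
        if (got_ != exp_)                                                \
            diag("got %lld, expected %lld", got_, exp_);                 \
    } while (0)

#define is_string(got, expected, name)                                   \
    do {                                                                 \
        const char *gs_ = (got), *es_ = (expected);                      \
        int same_ = gs_ && es_ && strcmp(gs_, es_) == 0;                 \
        ok(same_, (name));                                               \
        if (!same_)                                                      \
            diag("got '%s', expected '%s'", gs_ ? gs_ : "(null)",        \
                 es_ ? es_ : "(null)");                                  \
    } while (0)
```

A plan count that disagrees with the number of `ok` lines is reported as a failure by TAP harnesses, which is why several entries in this log mention "updated test plan counts".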
## 0.09 2025-12-17
commit 07d61652b032b32790ca2d3848243f9d75ea98f4
Author: C.J. Collier
Date: Wed Dec 17 04:53:34 2025 +0000
feat(perl): Build system and C cache test for Perl XS
This commit introduces the foundational pieces for the Perl XS implementation, focusing on the build system and a C-level test for the object cache.
- **Makefile.PL:**
- Refactored C test compilation rules in MY::postamble to use a hash ($c_test_config) for better organization and test-specific flags.
- Integrated ExtUtils::Embed to provide necessary compiler and linker flags for embedding the Perl interpreter, specifically for the t/c/01-cache.c test.
- Correctly constructs the path to the versioned Perl library (libperl.so.X.Y.Z) using $Config{archlib} and $Config{libperl} to ensure portability.
- Removed VERSION_FROM and ABSTRACT_FROM to avoid dependency on .pm files for now.
- **C Cache Test (t/c/01-cache.c):**
- Added a C test to exercise the object cache functions implemented in xs/protobuf.c.
- Includes tests for adding, getting, deleting, and weak reference behavior.
- **XS Cache Implementation (xs/protobuf.c, xs/protobuf.h):**
- Implemented PerlUpb_ObjCache_Init, PerlUpb_ObjCache_Add, PerlUpb_ObjCache_Get, PerlUpb_ObjCache_Delete, and PerlUpb_ObjCache_Destroy.
- Uses a Perl hash (HV*) for the cache.
- Keys are string representations of the C pointers, created using snprintf with "%llx".
- Values are weak references (sv_rvweaken) to the Perl objects (SV*).
- PerlUpb_ObjCache_Get now correctly returns an incremented reference to the original SV, not a copy.
- PerlUpb_ObjCache_Destroy now clears the hash before decrementing its refcount.
- **t/c/upb-perl-test.h:**
- Updated is_sv to perform direct pointer comparison (got == expected).
- **Minor:** Added util.h (currently empty), updated typemap.
These changes establish a working C-level test environment for the XS components.
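The keying scheme described above (stringified C pointers as hash keys) can be sketched in plain C. The helper below is hypothetical; it casts through uintptr_t before formatting with "%llx", which makes the integer conversion explicit, whereas "%p" (adopted in 0.10) also works but produces implementation-defined text. Either way, distinct live objects yield distinct keys within one process.

```c
#include <stdint.h>
#include <stdio.h>

/* Room for every unsigned long long in hex (16 digits on 64-bit) plus NUL. */
#define PTR_KEY_SIZE (2 * sizeof(unsigned long long) + 1)

/* Hypothetical helper: writes ptr as lowercase hex into buf.
 * Returns the key length, or -1 on an encoding error or truncation. */
int ptr_key(const void *ptr, char *buf, size_t size)
{
    int n = snprintf(buf, size, "%llx",
                     (unsigned long long)(uintptr_t)ptr);
    if (n < 0 || (size_t)n >= size)
        return -1;
    return n;
}
```

The resulting string and its length are what a hash lookup keyed on the C object would use.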
## 0.08 2025-12-17
commit d131fd22ea3ed8158acb9b0b1fe6efd856dc380e
Author: C.J. Collier
Date: Wed Dec 17 02:57:48 2025 +0000
feat(perl): Update docs and core XS files
- Explicitly add TDD cycle to ProtobufPlan.md.
- Clarify mirroring of Python implementation in upb-interfacing.md for both C and Perl layers.
- Branch and adapt python/protobuf.h and python/protobuf.c to perl/xs/protobuf.h and perl/xs/protobuf.c, including the object cache implementation. Removed old cache.* files.
- Create initial C test for the object cache in t/c/01-cache.c.
## 0.07 2025-12-17
commit 56fd6862732c423736a2f9a9fb1a2816fc59e9b0
Author: C.J. Collier
Date: Wed Dec 17 01:09:18 2025 +0000
feat(perl): Align Perl UPB architecture docs with Python
Updates the Perl Protobuf architecture documents to more closely align with the design and implementation strategies used in the Python UPB extension.
Key changes:
- **Object Caching:** Mandates a global, per-interpreter cache using weak references for all UPB-derived objects, mirroring Python's PyUpb_ObjCache.
- **Descriptor Containers:** Introduces a new document outlining the plan to use generic XS container types (Sequence, ByNameMap, ByNumberMap) with vtables to handle collections of descriptors, similar to Python's descriptor_containers.c.
- **Testing:** Adds a note to the testing strategy to port relevant test cases from the Python implementation to ensure feature parity.
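The vtable idea behind the planned descriptor containers can be sketched in C: generic code only knows how to count, index, name and number elements, and each concrete container supplies those callbacks. Index access corresponds to a Sequence, and the two lookups correspond to ByNameMap and ByNumberMap. Every name below (container_vtable, fake_field and so on) is hypothetical and is not taken from Python's descriptor_containers.c or the Perl module.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical vtable for a read-only container of descriptors. */
typedef struct {
    size_t (*count)(const void *parent);
    const void *(*get)(const void *parent, size_t i);
    const char *(*name)(const void *elem);
    int (*number)(const void *elem);
} container_vtable;

typedef struct {
    const container_vtable *vt;
    const void *parent;          /* e.g. the message that owns the fields */
} container;

/* ByNameMap-style lookup built only on the vtable. */
const void *container_by_name(const container *c, const char *name)
{
    size_t n = c->vt->count(c->parent);
    for (size_t i = 0; i < n; i++) {
        const void *e = c->vt->get(c->parent, i);
        if (strcmp(c->vt->name(e), name) == 0)
            return e;
    }
    return NULL;
}

/* ByNumberMap-style lookup built only on the vtable. */
const void *container_by_number(const container *c, int number)
{
    size_t n = c->vt->count(c->parent);
    for (size_t i = 0; i < n; i++) {
        const void *e = c->vt->get(c->parent, i);
        if (c->vt->number(e) == number)
            return e;
    }
    return NULL;
}

/* Example backing type: a fixed array of fake "fields". */
typedef struct { const char *name; int number; } fake_field;
typedef struct { const fake_field *fields; size_t n; } fake_message;

static size_t msg_count(const void *p) { return ((const fake_message *)p)->n; }
static const void *msg_get(const void *p, size_t i)
{
    return &((const fake_message *)p)->fields[i];
}
static const char *field_name(const void *e) { return ((const fake_field *)e)->name; }
static int field_number(const void *e) { return ((const fake_field *)e)->number; }

const container_vtable field_vtable = {msg_count, msg_get, field_name, field_number};
```

Adding another collection type (say, enum values) then means writing four small callbacks rather than new lookup code.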
## 0.06 2025-12-17
commit 6009ce6ab64eccce5c48729128e5adf3ef98e9ae
Author: C.J. Collier
Date: Wed Dec 17 00:28:20 2025 +0000
feat(perl): Implement object caching and fix build
This commit introduces several key improvements to the Perl XS build system and core functionality:
1. **Object Caching:**
* Introduces xs/protobuf.c and xs/protobuf.h to implement a caching mechanism (protobuf_c_to_perl_obj) for wrapping UPB C pointers into Perl objects. This uses a hash and weak references to ensure object identity and prevent memory leaks.
* Updates the typemap to use protobuf_c_to_perl_obj for upb_MessageDef * output, ensuring descriptor objects are cached.
* Corrected sv_weaken to the correct sv_rvweaken function.
2. **Makefile.PL Enhancements:**
* Switched to using the Bazel-generated UPB descriptor sources from bazel-bin/src/google/protobuf/_virtual_imports/descriptor_proto/google/protobuf/.
* Updated INC paths to correctly locate the generated headers.
* Refactored MY::dynamic_lib to ensure the static library libprotobuf_common.a is correctly linked into each generated .so module, resolving undefined symbol errors.
* Overrode MY::test to use prove -b -j$(nproc) t/*.t xt/*.t for running tests.
* Cleaned up LIBS and LDDLFLAGS usage.
3. **Documentation:**
* Updated ProtobufPlan.md to reflect the current status and design decisions.
* Reorganized architecture documents into subdirectories.
* Added object-caching.md and c-perl-interface.md.
* Updated llm-guidance.md with notes on upb/upb.h and sv_rvweaken.
4. **Testing:**
* Fixed xt/03-moo_immutable.t to skip tests if no Moo modules are found.
This resolves the build issues and makes the core test suite pass.
## 0.05 2025-12-16
commit 177d2f3b2608b9d9c415994e076a77d8560423b8
Author: C.J. Collier
Date: Tue Dec 16 19:51:36 2025 +0000
Refactor: Rename namespace to Protobuf, build system and doc updates
This commit refactors the primary namespace from ProtoBuf to Protobuf
to align with the style guide. This involves renaming files, directories,
and updating package names within all Perl and XS files.
**Namespace Changes:**
* Renamed perl/lib/ProtoBuf to perl/lib/Protobuf.
* Moved and updated ProtoBuf.pm to Protobuf.pm.
* Moved and updated ProtoBuf::Descriptor to Protobuf::Descriptor (.pm & .xs).
* Removed other ProtoBuf::* stubs (Arena, DescriptorPool, Message).
* Updated MODULE and PACKAGE in Descriptor.xs.
* Updated NAME, *_FROM in perl/Makefile.PL.
* Replaced ProtoBuf with Protobuf throughout perl/typemap.
* Updated namespaces in test files t/01-load-protobuf-descriptor.t and t/02-descriptor.t.
* Updated namespaces in all documentation files under perl/doc/.
* Updated paths in perl/.gitignore.
**Build System Enhancements (Makefile.PL):**
* Included xs/*.c files in the common object files list.
* Added -I. to the INC paths.
* Switched from MYEXTLIB to LIBS => ['-L$(CURDIR) -lprotobuf_common'] for linking.
* Removed custom keys passed to WriteMakefile for postamble.
* MY::postamble now sources variables directly from the main script scope.
* Added all :: ${common_lib} dependency in MY::postamble.
* Added t/c/load_test.c compilation rule in MY::postamble.
* Updated clean target to include blib.
* Added more modules to TEST_REQUIRES.
* Removed the explicit PM and XS keys from WriteMakefile, relying on XSMULTI => 1.
**New Files:**
* perl/lib/Protobuf.pm
* perl/lib/Protobuf/Descriptor.pm
* perl/lib/Protobuf/Descriptor.xs
* perl/t/01-load-protobuf-descriptor.t
* perl/t/02-descriptor.t
* perl/t/c/load_test.c: Standalone C test for UPB.
* perl/xs/types.c & perl/xs/types.h: For Perl/C type conversions.
* perl/doc/architecture/upb-interfacing.md
* perl/xt/03-moo_immutable.t: Test for Moo immutability.
**Deletions:**
* Old test files: t/00_load.t, t/01_basic.t, t/02_serialize.t, t/03_message.t, t/04_descriptor_pool.t, t/05_arena.t, t/05_message.t.
* Removed lib/ProtoBuf.xs as it's not needed with XSMULTI.
**Other:**
* Updated test_descriptor.bin (binary change).
* Significant content updates to markdown documentation files in perl/doc/architecture and perl/doc/internal reflecting the new architecture and learnings.
## 0.04 2025-12-14
commit 92de5d482c8deb9af228f4b5ce31715d3664d6ee
Author: C.J. Collier
Date: Sun Dec 14 21:28:19 2025 +0000
feat(perl): Implement Message object creation and fix lifecycles
This commit introduces the basic structure for ProtoBuf::Message object
creation, linking it with ProtoBuf::Descriptor and ProtoBuf::DescriptorPool,
and crucially resolves a SEGV by fixing object lifecycle management.
Key Changes:
1. **ProtoBuf::Descriptor:** Added _pool attribute to hold a strong
reference to the parent ProtoBuf::DescriptorPool. This is essential to
prevent the pool and its C upb_DefPool from being garbage collected
while a descriptor is still in use.
2. **ProtoBuf::DescriptorPool:**
* find_message_by_name: Now passes the $self (the pool object) to the
ProtoBuf::Descriptor constructor to establish the lifecycle link.
* XSUB pb_dp_find_message_by_name: Updated to accept the pool SV* and
store it in the descriptor's _pool attribute.
* XSUB _load_serialized_descriptor_set: Renamed to avoid clashing with the
Perl method name. The Perl wrapper now correctly calls this internal XSUB.
* DEMOLISH: Made safer by checking for attribute existence.
3. **ProtoBuf::Message:**
* Implemented using Moo with lazy builders for _upb_arena and
_upb_message.
* _descriptor is a required argument to new().
* XS functions added for creating the arena (pb_msg_create_arena) and
the upb_Message (pb_msg_create_upb_message).
* pb_msg_create_upb_message now extracts the upb_MessageDef* from the
descriptor and uses upb_MessageDef_MiniTable() to get the minitable
for upb_Message_New() .
* DEMOLISH : Added to free the message's arena.
4. **Makefile.PL:**
* Added -g to CCFLAGS for debugging symbols.
* Added the Perl CORE include path to MY::postamble's base_flags.
5. **Tests:**
* t/04_descriptor_pool.t: Updated to check the structure of the
returned ProtoBuf::Descriptor.
* t/05_message.t: Now uses a descriptor obtained from a real pool to
test ProtoBuf::Message->new().
6. **Documentation:**
* Updated ProtobufPlan.md to reflect progress.
* Updated several files in doc/architecture/ to match the current
implementation details, especially regarding arena management and object
lifecycles.
* Added doc/internal/development_cycle.md and doc/internal/xs_learnings.md.
With these changes, the SEGV is resolved, and message objects can be successfully
created from descriptors.
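The lifecycle fix in this commit boils down to one ownership rule: a descriptor holds a strong reference to its pool, so the pool (and its upb_DefPool) is freed only when the last holder lets go. Perl enforces this with SV reference counts; the C sketch below models the same rule with an explicit counter. It is an illustration only: Pool, Descriptor, pool_new(), descriptor_new() and the live_pools counter are hypothetical names, and no upb calls are made.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for ProtoBuf::DescriptorPool and its upb_DefPool. */
typedef struct Pool {
    int refcount;   /* plays the role of the Perl SV reference count */
} Pool;

/* Hypothetical stand-in for ProtoBuf::Descriptor with its _pool attribute. */
typedef struct Descriptor {
    Pool *pool;     /* strong reference: keeps the pool alive */
} Descriptor;

static int live_pools = 0;  /* number of pools not yet freed (for the demo) */

Pool *pool_new(void) {
    Pool *p = malloc(sizeof *p);
    assert(p != NULL);
    p->refcount = 1;        /* the creator holds the first reference */
    live_pools++;
    return p;
}

void pool_release(Pool *p) {
    assert(p->refcount > 0);
    if (--p->refcount == 0) {
        /* This is where DEMOLISH would free the underlying upb_DefPool. */
        free(p);
        live_pools--;
    }
}

Descriptor *descriptor_new(Pool *p) {
    Descriptor *d = malloc(sizeof *d);
    assert(d != NULL);
    p->refcount++;          /* retain the pool for as long as d exists */
    d->pool = p;
    return d;
}

void descriptor_free(Descriptor *d) {
    pool_release(d->pool);  /* drop the strong reference */
    free(d);
}
```

Releasing the caller's handle first (pool_release(p), then later descriptor_free(d)) leaves the pool alive until the descriptor goes away. Without the retain in descriptor_new(), the descriptor would be left pointing at freed memory, the kind of use-after-free that typically surfaces as a SEGV.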
## 0.03 2025-12-14
commit 6537ad23e93680c2385e1b571d84ed8dbe2f68e8
Author: C.J. Collier
Date: Sun Dec 14 20:23:41 2025 +0000
Refactor(perl): Object-Oriented DescriptorPool with Moo
This commit refactors the ProtoBuf::DescriptorPool to be fully object-oriented using Moo, and resolves several issues related to XS, typemaps, and test data.
Key Changes:
1. **Moo Object:** ProtoBuf::DescriptorPool.pm now uses Moo to define the class. The upb_DefPool pointer is stored as a lazy attribute, _upb_defpool.
2. **XS Lifecycle:** DescriptorPool.xs now has pb_dp_create_pool called by the Moo builder and pb_dp_free_pool called from DEMOLISH to manage the upb_DefPool lifecycle per object.
3. **Typemap:** The perl/typemap file has been significantly updated to handle the conversion between the ProtoBuf::DescriptorPool Perl object and the upb_DefPool * C pointer. This includes:
* Mapping upb_DefPool * to T_PTR.
* An INPUT section for ProtoBuf::DescriptorPool to extract the pointer from the object's hash, triggering the lazy builder if needed via call_method.
* An OUTPUT section for upb_DefPool * to convert the pointer back to a Perl integer, used by the builder.
4. **Method Renaming:** add_file_descriptor_set_binary is now load_serialized_descriptor_set.
5. **Test Data:**
* Added perl/t/data/test.proto with a sample message and enum.
* Generated perl/t/data/test_descriptor.bin using protoc.
* Removed t/data/ from .gitignore to ensure test data is versioned.
6. **Test Update:** t/04_descriptor_pool.t is updated to use the new OO interface, load the generated descriptor set, and check for message definitions.
7. **Build Fixes:**
* Corrected #include paths in DescriptorPool.xs to be relative to the upb/ directory (e.g., upb/wire/decode.h).
* Added -I../upb to CCFLAGS in Makefile.PL.
* Reordered INC paths in Makefile.PL to prioritize local headers.
**Note:** While tests now pass in some environments, a SEGV issue persists in make test runs, indicating a potential memory or lifecycle issue within the XS layer that needs further investigation.
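The typemap arrangement in this commit (a pointer stored as an integer, as T_PTR does, plus a lazy builder run on first access) can be sketched in plain C. In this hypothetical model, a struct field stands in for the object's hash slot, build_pool() stands in for the Moo builder that calls pb_dp_create_pool, and a malloc() placeholder stands in for the upb_DefPool; none of this is the extension's actual code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical model of the Perl object: one slot holding the C pointer
   as an integer, much as T_PTR stores a pointer in an IV. */
typedef struct PoolObject {
    uintptr_t upb_defpool;  /* 0 means the lazy attribute is not built yet */
    int builds;             /* how many times the builder has run */
} PoolObject;

/* Hypothetical builder; the real one would call pb_dp_create_pool(). */
static uintptr_t build_pool(PoolObject *obj) {
    void *pool = malloc(16);          /* placeholder for a upb_DefPool */
    assert(pool != NULL);
    obj->builds++;
    return (uintptr_t)pool;           /* OUTPUT direction: pointer -> integer */
}

/* INPUT direction: read the slot, run the builder on first access,
   then turn the integer back into a pointer. */
void *pool_ptr(PoolObject *obj) {
    if (obj->upb_defpool == 0) {
        obj->upb_defpool = build_pool(obj);
    }
    return (void *)obj->upb_defpool;
}

/* Counterpart of pb_dp_free_pool() being called from DEMOLISH. */
void pool_object_free(PoolObject *obj) {
    free((void *)obj->upb_defpool);   /* free(NULL) is a no-op */
    obj->upb_defpool = 0;
}
```

Converting a void * to uintptr_t and back yields a pointer that compares equal to the original, which is what allows the integer kept on the Perl side to be turned back into a usable C pointer.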
## 0.02 2025-12-14
commit 6c9a6f1a5f774dae176beff02219f504ea3a6e07
Author: C.J. Collier
Date: Sun Dec 14 20:13:09 2025 +0000
Fix(perl): Correct UPB build integration and generated file handling
This commit resolves several issues to achieve a successful build of the Perl extension:
1. **Use Bazel Generated Files:** Switched from compiling UPB's stage0 descriptor.upb.c to using the Bazel-generated descriptor.upb.c and descriptor.upb_minitable.c located in bazel-bin/src/google/protobuf/_virtual_imports/descriptor_proto/google/protobuf/.
2. **Updated Include Paths:** Added the bazel-bin path to INC in WriteMakefile and to base_flags in MY::postamble to ensure the generated headers are found during both XS and static library compilation.
3. **Removed Stage0:** Removed references to UPB_STAGE0_DIR and no longer include headers or source files from upb/reflection/stage0/.
4. **-fPIC:** Explicitly added -fPIC to CCFLAGS in WriteMakefile and ensured $(CCFLAGS) is used in the custom compilation rules in MY::postamble. This guarantees all object files in the static library are compiled with position-independent code, resolving linker errors when creating the shared objects for the XS modules.
5. **Refined UPB Sources:** Used File::Find to recursively find UPB C sources, excluding /conformance/ and /reflection/stage0/ to avoid conflicts and unnecessary compilations.
6. **Arena Constructor:** Modified ProtoBuf::Arena::pb_arena_new XSUB to accept the class name argument passed from Perl, making it a proper constructor.
7. **.gitignore:** Added patterns to perl/.gitignore to ignore generated C files from XS (lib/*.c, lib/ProtoBuf/*.c), the copied src_google_protobuf_descriptor.pb.cc, and the t/data directory.
8. **Build Documentation:** Updated perl/doc/architecture/upb-build-integration.md to reflect the new build process, including the Bazel prerequisite, include paths, -fPIC usage, and File::Find.
Build Steps:
1. bazel build //src/google/protobuf:descriptor_upb_proto (from repo root)
2. cd perl
3. perl Makefile.PL
4. make
5. make test (Currently has expected failures due to missing test data implementation).
## 0.01 2025-12-14
commit 3e237e8a26442558c94075766e0d4456daaeb71d
Author: C.J. Collier
Date: Sun Dec 14 19:34:28 2025 +0000
feat(perl): Initialize Perl extension scaffold and build system
This commit introduces the perl/ directory, laying the groundwork for the Perl Protocol Buffers extension. It includes the essential build files, linters, formatter configurations, and a vendored Devel::PPPort for XS portability.
Key components added:
* **Makefile.PL**: The core ExtUtils::MakeMaker build script. It's configured to:
* Build a static library (libprotobuf_common.a) from UPB, UTF8_Range, and generated protobuf C/C++ sources.
* Utilize XSMULTI => 1 to create separate shared objects for ProtoBuf, ProtoBuf::Arena, and ProtoBuf::DescriptorPool.
* Link each XS module against the common static library.
* Define custom compilation rules in MY::postamble to handle C vs. C++ flags and build the static library.
* Set up include paths for the project root, UPB, and other dependencies.
* **XS Stubs (.xs files)**:
* lib/ProtoBuf.xs: Placeholder for the main module's XS functions.
* lib/ProtoBuf/Arena.xs: XS interface for upb_Arena management.
* lib/ProtoBuf/DescriptorPool.xs: XS interface for upb_DefPool management.
* **Perl Module Stubs (.pm files)**:
* lib/ProtoBuf.pm: Main module, loads XS.
* lib/ProtoBuf/Arena.pm: Perl class for Arenas.
* lib/ProtoBuf/DescriptorPool.pm: Perl class for Descriptor Pools.
* lib/ProtoBuf/Message.pm: Base class for messages (TBD).
* **Configuration Files**:
* .gitignore: Ignores build artifacts, editor files, etc.
* .perlcriticrc: Configures Perl::Critic for static analysis.
* .perltidyrc: Configures perltidy for code formatting.
* **Devel::PPPort**: Vendored version 3.72 to generate ppport.h for XS compatibility across different Perl versions.
* **typemap**: Custom typemap for XS argument/result conversion.
* **Documentation (doc/)**: Initial architecture and plan documents.
This provides a solid foundation for developing the UPB-based Perl extension.
Welcome to the report for November 2025 from the Reproducible Builds project!
These monthly reports outline what we've been up to over the past month, highlighting items of news from elsewhere in the increasingly-important area of software supply-chain security. As always, if you are interested in contributing to the Reproducible Builds project, please see the Contribute page on our website.
In this report:
10 years of Reproducible Builds at SeaGL 2025
On Friday 8th November, Chris Lamb gave a talk called 10 years of Reproducible Builds at SeaGL in Seattle, WA.
Founded in 2013, SeaGL is a free, grassroots technical summit dedicated to spreading awareness and knowledge about free source software, hardware and culture. Chris's talk:
[…] introduces the concept of reproducible builds, its technical underpinnings and its potentially transformative impact on software security and transparency. It is aimed at developers, security professionals and policy-makers who are concerned with enhancing trust and accountability in our software. It also provides a history of the Reproducible Builds project, which is approximately ten years old. How are we getting on? What have we got left to do? Aren't all the builds reproducible now?
Distribution work
In Debian this month, Jochen Sprickerhof created a merge request to replace the use of reprotest in Debian's Salsa Continuous Integration (CI) pipeline with debrebuild. Jochen cites the advantages as being threefold: firstly, that "only one extra build [is] needed"; secondly, it "uses the same sbuild and ccache tooling as the normal build"; and thirdly, it "works for any Debian release". The merge request was merged by Emmanuel Arias and is now active.
kpcyrd posted to our mailing list announcing the initial release of repro-threshold, which implements an APT transport that defines a threshold of "at least X of my N trusted rebuilders need to confirm they reproduced the binary" before installing Debian packages. Configuration can be done through a config file, or through a curses-like user interface.
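The "X of N" policy that repro-threshold describes amounts to counting independent confirmations. Below is a minimal C sketch, assuming each trusted rebuilder reports a SHA-256 checksum string for the package (or nothing if it has not rebuilt it yet); the RebuildReport type and meets_threshold() are hypothetical and do not reflect repro-threshold's own code or configuration format.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical report from one trusted rebuilder: the checksum it got
   when rebuilding a package, or NULL if it has not rebuilt it yet. */
typedef struct {
    const char *rebuilder;
    const char *sha256;
} RebuildReport;

/* Return true if at least `threshold` of the `n` rebuilders reproduced
   the checksum of the binary we are about to install. */
bool meets_threshold(const RebuildReport *reports, size_t n,
                     const char *candidate_sha256, size_t threshold) {
    assert(threshold <= n);  /* X greater than N could never be satisfied */
    size_t confirmations = 0;
    for (size_t i = 0; i < n; i++) {
        if (reports[i].sha256 != NULL &&
            strcmp(reports[i].sha256, candidate_sha256) == 0) {
            confirmations++;
        }
    }
    return confirmations >= threshold;
}
```

A rebuilder that has not reported yet simply does not count towards the threshold, so installation stays blocked until enough independent confirmations arrive.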
Holger then merged two commits by Jochen Sprickerhof in order to address a fakeroot-related reproducibility issue in the debian-installer, and Jörg Jaspert deployed a patch by Ivo De Decker for a bug originally filed by Holger in February 2025 related to some Debian packages not being archived on snapshot.debian.org.
Elsewhere, Roland Clobus performed some analysis on the live Debian trixie images, which he determined were not reproducible. However, in a follow-up post, Roland happily reports that the issues have been handled. In addition, 145 reviews of Debian packages were added, 12 were updated and 15 were removed this month, adding to our knowledge about identified issues.
Lastly, Jochen Sprickerhof filed a bug announcing their intention to binary NMU a very large number of packages for the R programming language after a reproducibility-related toolchain bug was fixed.
Bernhard M. Wiedemann posted another openSUSE monthly update for their work there.
Julien Malka and Arnout Engelen launched the new hash collection server for NixOS. Aside from improved reporting to help focus reproducible builds efforts within NixOS, it collects build hashes as individually-signed attestations from independent builders, laying the groundwork for further tooling.
Tool development
diffoscope version 307 was uploaded to Debian unstable (as well as version 309). These changes included further attempts to automatically deploy to PyPI by liaising with the PyPI developers/maintainers (with this experimental feature). [][][]
In addition, reprotest versions 0.7.31 and 0.7.32 were uploaded to Debian unstable by Holger Levsen, who also made the following changes:
Do not vary the architecture personality if the kernel is not varied. (Thanks to Raúl Cumplido). []
Drop the debian/watch file, as Lintian now flags this as an error for native Debian packages. [][]
Bump Standards-Version to 4.7.2, with no changes needed. []
Drop the Rules-Requires-Root header as it is no longer required. []
In addition, Vagrant Cascadian fixed a build failure by removing some extra whitespace from an older changelog entry. []
Website updates
Once again, there were a number of improvements made to our website this month including:
Bernhard M. Wiedemann updated the SOURCE_DATE_EPOCH page to fix the Lisp example syntax. []
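For readers unfamiliar with SOURCE_DATE_EPOCH: a build tool that would otherwise embed the current time uses this environment variable (an integer count of seconds since the Unix epoch) instead. Below is a minimal C sketch of that convention, with resolve_timestamp(), build_timestamp() and format_build_date() as hypothetical helper names; it rejects anything other than plain decimal digits so the caller can fail the build, and formats in UTC so the timezone cannot leak into the output.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Decide which timestamp to embed. `sde` is the SOURCE_DATE_EPOCH value
   (NULL if unset) and `now` the current time. Returns (time_t)-1 when the
   variable is set but is not a plain decimal integer, so the caller can
   fail the build instead of silently falling back to the clock. */
time_t resolve_timestamp(const char *sde, time_t now) {
    if (sde == NULL) {
        return now;
    }
    size_t len = strlen(sde);
    if (len == 0 || strspn(sde, "0123456789") != len) {
        return (time_t)-1;
    }
    errno = 0;
    long long value = strtoll(sde, NULL, 10);
    if (errno == ERANGE) {
        return (time_t)-1;
    }
    return (time_t)value;
}

/* Convenience wrapper that reads the environment. */
time_t build_timestamp(void) {
    return resolve_timestamp(getenv("SOURCE_DATE_EPOCH"), time(NULL));
}

/* Format in UTC so the local timezone never reaches the artifact.
   Returns the number of characters written, or 0 on failure. */
size_t format_build_date(time_t t, char *buf, size_t len) {
    assert(buf != NULL && len > 0);
    const struct tm *utc = gmtime(&t);
    if (utc == NULL) {
        return 0;
    }
    return strftime(buf, len, "%Y-%m-%d %H:%M:%S", utc);
}
```

Keeping the parsing in a pure function that takes the value and the current time as arguments makes the behaviour easy to test without touching the process environment.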
Web3 applications, built on blockchain technology, manage billions of dollars in digital assets through decentralized applications (dApps) and smart contracts. These systems rely on complex, software supply chains that introduce significant security vulnerabilities. This paper examines the software supply chain security challenges unique to the Web3 ecosystem, where traditional Web2 software supply chain problems intersect with the immutable and high-stakes nature of blockchain technology. We analyze the threat landscape and propose mitigation strategies to strengthen the security posture of Web3 systems.
Their paper lists reproducible builds as one of the mitigating strategies. A PDF of the full text is available to download.
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:
Welcome to the October 2025 report from the Reproducible Builds project!
Welcome to the very latest report from the Reproducible Builds project. Our monthly reports outline what we've been up to over the past month, and highlight items of news from elsewhere in the increasingly-important area of software supply-chain security. As ever, if you are interested in contributing to the Reproducible Builds project, please see the Contribute page on our website.
In this report:
Farewell from the Reproducible Builds Summit 2025
Thank you to everyone who joined us at the Reproducible Builds Summit in Vienna, Austria!
We were thrilled to host the eighth edition of this exciting event, following the success of previous summits in various iconic locations around the world, including Venice, Marrakesh, Paris, Berlin, Hamburg and Athens. During this event, participants had the opportunity to engage in discussions, establish connections and exchange ideas to drive progress in this vital field. Our aim was to create an inclusive space that fosters collaboration, innovation and problem-solving.
The agenda of the three main days is available online; however, some working sessions may still lack notes at the time of publication.
One tangible outcome of the summit is that Johannes Starosta finished their rebuilderd tutorial, which is now available online and Johannes is actively seeking feedback.
Google's Play Store breaks reproducible builds for Signal
On the issue tracker for the popular Signal messenger app, developer Greyson Parrelli reports that updates to the Google Play store have, in effect, broken reproducible builds:
The most recent issues have to do with changes to the APKs that are made by the Play Store. Specifically, they add some attributes to some .xml files around languages are resources, which is not unexpected because of how the whole bundle system works. This is trickier to resolve, because unlike current expected differences (like signing information), we can't just exclude a whole file from the comparison. We have to take a more nuanced look at the diff. I've been hesitant to do that because it'll complicate our currently-very-readable comparison script, but I don't think there's any other reasonable option here.
kpcyrd forwarded a fascinating tidbit regarding so-called ninja and samurai build ordering, that uses data structures in which the pointer values returned from malloc are used to determine some order of execution.
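The general hazard behind that tidbit is that the addresses malloc() returns can differ between runs, so any ordering derived from them can differ too. Below is a minimal C sketch of the usual remedy, sorting by a stable key instead; the Edge type and sort_edges() are hypothetical and do not reflect ninja's or samurai's actual data structures.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical build edge, identified by the file it produces. */
typedef struct {
    const char *output;   /* e.g. "obj/foo.o" */
} Edge;

/* Compare by output path. Comparing the Edge pointers themselves would
   make the order depend on where malloc() happened to place each edge,
   which can change between runs (and relational comparison of pointers
   to separate objects is not defined by the C standard anyway). */
static int cmp_by_output(const void *a, const void *b) {
    const Edge *ea = *(const Edge *const *)a;
    const Edge *eb = *(const Edge *const *)b;
    return strcmp(ea->output, eb->output);
}

/* Put edges into a deterministic order before scheduling them.
   Assumes output paths are unique, since qsort() is not stable. */
void sort_edges(Edge **edges, size_t n) {
    assert(edges != NULL);
    qsort(edges, n, sizeof *edges, cmp_by_output);
}
```

Whatever order the edges were allocated or discovered in, the result depends only on their output paths.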
Arnout Engelen, Justin Cappos, Ludovic Courtès and kpcyrd continued a conversation started in September regarding the "Minimum Elements for a Software Bill of Materials". (Full thread)
Felix Moessbauer of Siemens posted to the list reporting that he had recently "stumbled upon a couple of Debian source packages on the snapshot mirrors that are listed multiple times (same name and version), but each time with a different checksum". The thread, which Felix titled "Debian: what precisely identifies a source package", is about precisely that: what can be axiomatically relied upon by consumers of the Debian archives, as well as indicating an issue where "we can't exactly say which packages were used during build time (even when having the .buildinfo files)".
Luca DiMaio posted to the list announcing the release of xfsprogs 6.17.0, which specifically includes a commit that "implements the functionality to populate a newly created XFS filesystem directly from an existing directory structure", which "makes it easier to create populated filesystems without having to mount them [and thus is] particularly useful for reproducible builds". Luca asked the list how they might contribute to the docs of the System images page.
Reproducible Builds at the Transparency.dev summit
Holger Levsen gave a talk at this year's Transparency.dev summit in Gothenburg, Sweden, outlining the achievements of the Reproducible Builds project in the last 12 years, covering both upstream developments as well as some distribution-specific details. As mentioned on the talk's page, Holger's presentation concluded "with an outlook into the future and an invitation to collaborate to bring transparency logs into Reproducible Builds projects".
The slides of the talk are available, although a video has yet to be released. Nevertheless, as a result of the discussions at Transparency.dev there is a new page on the Debian wiki with the aim of describing a potential transparency log setup for Debian.
Supply Chain Security for Go
Andrew Ayer has set up a new service at sourcespotter.com that aims to monitor the supply chain security for Go releases. It consists of four separate trackers:
A tool to verify that the Go Module Mirror and Checksum Database is behaving honestly and has not presented inconsistent information to clients.
A module monitor that records every module version served by the Go Module Mirror and Checksum Database, allowing you to monitor for unexpected versions of your modules.
A tool that verifies that the Go toolchains published in the Go Module Mirror can be reproduced from source code, making it difficult to hide backdoors in the binaries downloaded by the go command.
A telemetry config tracker that tracks the names of telemetry counters uploaded by the Go toolchain, to ensure that Go telemetry is not violating users' privacy.
As the homepage of the service mentions, the trackers are free software and do not rely on Google infrastructure.
In March 2024, a sophisticated backdoor was discovered in xz, a core compression library in Linux distributions, covertly inserted over three years by a malicious maintainer, Jia Tan. The attack, which enabled remote code execution via ssh, was only uncovered by chance when Andres Freund investigated a minor performance issue. This incident highlights the vulnerability of the open-source supply chain and the effort attackers are willing to invest in gaining trust and access. In this article, I analyze the backdoor's mechanics and explore how bitwise build reproducibility could have helped detect it.
Although quantum computing is a rapidly evolving field of research, it can already benefit from adopting reproducible builds. This paper aims to bridge the gap between the quantum computing and reproducible builds communities. We propose a generalization of the definition of reproducible builds in the quantum setting, motivated by two threat models: one targeting the confidentiality of end users' data during circuit preparation and submission to a quantum computer, and another compromising the integrity of quantum computation results. This work presents three examples that show how classical information can be hidden in transpiled quantum circuits, and two cases illustrating how even minimal modifications to these circuits can lead to incorrect quantum computation results.
The thesis focuses on providing a reproducible build process for two open-source E2EE messaging applications: Signal and Wire. The motivation to ensure reproducibility and thereby the integrity of E2EE messaging applications stems from their central role as essential tools for modern digital privacy. These applications provide confidentiality for private and sensitive communications, and their compromise could undermine encryption mechanisms, potentially leaking sensitive data to third parties.
Currently, there are numerous solutions and techniques available in the market to tackle supply chain security, and all claim to be the best solution. This thesis delves deeper by implementing those solutions and evaluates them for better understanding. Some of the tools that this thesis implemented are Syft, Trivy, Grype, FOSSA, dependency-check, and Gemnasium. Software dependencies are generated in a Software Bill of Materials (SBOM) format by using these open-source tools, and the corresponding results have been analyzed. Among these tools, Syft and Trivy outperform others as they provide relevant and accurate information on software dependencies.
In the wake of growing supply chain attacks, the FreeBSD developers are relying on a transparent build concept in the form of Zero-Trust Builds. The approach builds on the established Reproducible Builds, where binary files can be rebuilt bit-for-bit from the published source code. While reproducible builds primarily ensure verifiability, the zero-trust model goes a step further and removes trust from the build process itself. No single server, maintainer, or compiler can be considered more than potentially trustworthy.
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
Bernhard Wiedemann and Zbigniew Jędrzejewski-Szmek extended ismypackagereproducibleyet.org with initial support for Fedora [].
In addition, a number of contributors added a series of notes from our recent summit to the website, including Alexander Couzens [], Robin Candau [][][][][][][][][] and kpcyrd [].
Tool development
diffoscope version 307 was uploaded to Debian unstable by Chris Lamb, who made a number of changes including fixing compatibility with LLVM version 21 [] and an attempt to automatically deploy to PyPI by liaising with the PyPI developers/maintainers (with this experimental feature). [] In addition, Vagrant Cascadian updated diffoscope in GNU Guix to version 307.
Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:
Welcome to the August 2025 report from the Reproducible Builds project!
Welcome to the latest report from the Reproducible Builds project for August 2025. These monthly reports outline what we've been up to over the past month, and highlight items of news from elsewhere in the increasingly-important area of software supply-chain security. If you are interested in contributing to the Reproducible Builds project, please see the Contribute page on our website.
In this report:
Reproducible Builds Summit 2025
Please join us at the upcoming Reproducible Builds Summit, set to take place from October 28th to 30th 2025 in Vienna, Austria!
We are thrilled to host the eighth edition of this exciting event, following the success of previous summits in various iconic locations around the world, including Venice, Marrakesh, Paris, Berlin, Hamburg and Athens. Our summits are a unique gathering that brings together attendees from diverse projects, united by a shared vision of advancing the Reproducible Builds effort.
During this enriching event, participants will have the opportunity to engage in discussions, establish connections and exchange ideas to drive progress in this vital field. Our aim is to create an inclusive space that fosters collaboration, innovation and problem-solving.
If you're interested in joining us this year, please make sure to read the event page, which has more details about the event and location. Registration is open until 20th September 2025, and we are very much looking forward to seeing many readers of these reports there!
Reproducible Builds and live-bootstrap at WHY2025
WHY2025 (What Hackers Yearn) is a nonprofit outdoors hacker camp that takes place in Geestmerambacht in the Netherlands (approximately 40km north of Amsterdam). The event is "organised for and by volunteers from the worldwide hacker community, and knowledge sharing, technological advancement, experimentation, connecting with your hacker peers, forging friendships and hacking are at the core of this event".
At this year's event, Frans Faase gave a talk on live-bootstrap, an attempt to "provide a reproducible, automatic, complete end-to-end bootstrap from a minimal number of binary seeds to a supported fully functioning operating system".
Frans' talk is available to watch on video and his slides are available as well.
DALEQ: Explainable Equivalence for Java Bytecode
Jens Dietrich of the Victoria University of Wellington, New Zealand and Behnaz Hassanshahi of Oracle Labs, Australia published an article this month entitled "DALEQ: Explainable Equivalence for Java Bytecode", which explores the options and difficulties when Java binaries are not identical despite being from the same sources, and what avenues are available for proving equivalence despite the lack of bitwise correlation:
[Java] binaries are often not bitwise identical; however, in most cases, the differences can be attributed to variations in the build environment, and the binaries can still be considered equivalent. Establishing such equivalence, however, is a labor-intensive and error-prone process.
Jens and Behnaz therefore propose a tool called DALEQ, which:
disassembles Java byte code into a relational database, and can normalise this database by applying Datalog rules. Those databases can then be used to infer equivalence between two classes. Notably, equivalence statements are accompanied with Datalog proofs recording the normalisation process. We demonstrate the impact of DALEQ in an industrial context through a large-scale evaluation involving 2,714 pairs of jars, comprising 265,690 class pairs. In this evaluation, DALEQ is compared to two existing bytecode transformation tools. Our findings reveal a significant reduction in the manual effort required to assess non-bitwise equivalent artifacts, which would otherwise demand intensive human inspection. Furthermore, the results show that DALEQ outperforms existing tools by identifying more artifacts rebuilt from the same code as equivalent, even when no behavioral differences are present.
Reproducibility regression identifies issue with AppArmor security policies
Tails developer intrigeri has tracked and followed a reproducibility regression in the generation of AppArmor policy caches, and has identified an issue with the 4.1.0 version of AppArmor.
Although initially tracked on the Tails issue tracker, intrigeri filed an issue on the upstream bug tracker. AppArmor developer John Johansen replied, confirming that they can reproduce the issue and went to work on a draft patch. Through this, John revealed that it was caused by an actual underlying security bug in AppArmor that is to say, it resulted in permissions not (always) matching what the policy intends and, crucially, not merely a cache reproducibility issue.
Work on the fix is ongoing at time of writing.
Rust toolchain fixes
Rust Clippy is a linting tool for the Rust programming language. It provides a collection of lints (rules) designed to identify common mistakes, stylistic issues, potential performance problems and unidiomatic code patterns in Rust projects. This month, however, Sosthène Guédon filed a new issue on GitHub requesting a new check that would lint against non-deterministic operations in proc-macros, such as iterating over a HashMap.
Dropping support for the armhf architecture. From July 2015, Vagrant Cascadian has been hosting a zoo of approximately 35 armhf systems which were used for building Debian packages for that architecture.
Holger Levsen also uploaded strip-nondeterminism, our program that improves reproducibility by stripping out non-deterministic information such as timestamps or other elements introduced during packaging. This new version, 1.14.2-1, adds some metadata to aid the deputy tool. (#1111947)
Lastly, Bernhard M. Wiedemann posted another openSUSE monthly update for their work there.
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 303, 304 and 305 to Debian:
Improvements:
Use sed(1) backreferences when generating debian/tests/control to avoid duplicating ourselves. []
Move from a mono-utils dependency to versioned mono-devel mono-utils dependency, taking care to maintain the [!riscv64] architecture restriction. []
Use sed over awk to avoid mangling dependency lines containing = (equals) symbols such as version restrictions. []
Bug fixes:
Fix a test after the upload of systemd-ukify version 258~rc3. []
Ensure that Java class files are named .class on the filesystem before passing them to javap(1). []
Do not run jsondiff on files over 100KiB as the algorithm runs in O(n^2) time. []
Don't check for PyPDF version 3 specifically; check for >= 3. []
Misc:
Update copyright years. [][]
In addition, Martin Joerg fixed an issue with the HTML presenter to avoid a crash when the page limit is None [] and Zbigniew Jędrzejewski-Szmek fixed compatibility with RPM 6 []. Lastly, John Sirois fixed a missing requests dependency in the trydiffoscope tool. []
Website updates
Once again, there were a number of improvements made to our website this month including:
Chris Lamb:
Write and publish a news entry for the upcoming summit. []
Add some assets used at FOSSY, such as the badges and the paper handouts. []
Reproducibility testing framework
The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In August, a number of changes were made by Holger Levsen, including:
Ignore that the megacli RAID controller requires packages from Debian bookworm. []
In addition,
James Addison migrated away from the deprecated top-level deb822 Python module in favour of debian.deb822 in the bin/reproducible_scheduler.py script [] and removed a reproduce.debian.net note after the release of Debian trixie [].
Jochen Sprickerhof made a huge number of improvements to the reproduce.debian.net statistics calculation [][][][][][] as well as to the reproduce.debian.net service more generally [][][][][][][][].
Mattia Rizzolo performed a lot of work migrating scripts to SQLAlchemy version 2.0 [][][][][][] in addition to making some changes to the way openSUSE reproducibility tests are handled internally. []
Lastly, Roland Clobus updated the Debian Live packages after the release of Debian trixie. [][]
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:
It appears that the fragile masculinity tech evangelists have identified Debian as a community with boundaries which exclude them from abusing its members and they're so angry about it! In response to posts such as this, and inspired by Dr. Conway's piece, I've composed a poem which, hopefully, correctly addresses the feelings of that crowd.
The Very Model of a Patriot Online
I am the very model of a modern patriot online,
My keyboard is my rifle and my noble cause is so divine.
I didn't learn my knowledge in a dusty college lecture hall,
But from the chans where bitter anonymity enthralls us all.
I spend a dozen hours every day upon my sacred quest,
To put the globo-homo narrative completely to the test.
My arguments are peer-reviewed by fellas in the comments section,
Which proves my every thesis is the model of complete perfection.
I'm steeped in righteous anger that the libs call 'white fragility,'
For mocking their new pronouns and their lack of masculinity.
I'm master of the epic troll, the comeback, and the searing snark,
A digital guerrilla who is fighting battles in the dark.
I know the secret symbols and the dog-whistles historical,
From Pepe the Frog to 'Let's Go Brandon,' in order categorical;
In short, for fighting culture wars with rhetoric rhetorical,
I am the very model of a patriot polemical.
***
I stand for true expression, for the comics and the edgy clown,
Whose satire is too based for all the fragile folks in town.
They say my speech is 'violence' while my spirit they are trampling,
The way they try to silence me is really quite a startling sampling
Of 1984, which I've not read but thoroughly understand,
Is all about the tyranny that's gripping this once-blessed land.
My humor is a weapon, it's a razor-bladed, sharp critique,
(Though sensitive elites will call my masterpiece a form of 'hate speech').
They cannot comprehend my need for freedom from all consequence,
They call it 'hate,' I call it 'jokes,' they just don't have a lick of sense.
So when they call me bigot for the spicy memes I post pro bono,
I tell them their the ones who're cancelled, I'm the victim here, you know!
Then I can write a screed against the globalist cabal, you see,
And tell you every detail of their vile conspiracy.
In short, when I use logic that is flexible and personal,
I am the very model of a patriot controversial.
***
I'm very well acquainted with the scientific method, too,
It's watching lengthy YouTube vids until my face is turning blue.
I trust the heartfelt testimony of a tearful, blonde ex-nurse,
But what a paid fact-checker says has no effect and is perverse.
A PhD is proof that you've been brainwashed by the leftist mob,
While my own research on a meme is how I really do my job.
I know that masks will suffocate and vaccines are a devil's brew,
I learned it from a podcast host who used to sell brain-boosting goo.
He scorns the lamestream media, the CNNs and all the rest,
Whose biased reporting I've put fully to a rigorous test
By only reading headlines and confirming what I already knew,
Then posting my analysis for other patriots to view.
With every "study" that they cite from sources I can't stand to hear,
My own profound conclusions become ever more precisely clear.
In short, when I've debunked the experts with a confident "Says who?!",
I am the very model of a researcher who sees right through you.
***
But all these culture wars are just a sleight-of-hand, a clever feint,
To hide the stolen ballots and to cover up the moral taint
Of D.C. pizza parlors and of shipping crates from Wayfair, it's true,
It's all connected in a plot against the likes of me and you!
I've analyzed the satellite photography and watermarks,
I understand the secret drops, the cryptic Qs, the coded sparks.
The habbening is coming, friends, just give it two more weeks or three,
When all the traitors face the trials for their wicked treachery.
They say that nothing happened and the dates have all gone past, you see,
But that's just disinformation from the globalist enemy!
Their moving goalposts constantly, a tactic that is plain to see,
To wear us down and make us doubt the coming, final victory!
My mind can see the patterns that a simple sheep could never find,
The hidden puppet-masters who are poisoning our heart and mind.
In short, when I link drag queens to the price of gas and child-trafficking,
I am the very model of a patriot whose brain is quickening!
***
My pickup truck's a testament to everything that I hold dear,
With vinyl decals saying things the liberals all hate and fear.
The Gadsden flag is waving next to one that's blue and starkly thin,
To show my deep respect for law, except the feds who're steeped in sin.
There's Punisher and Molon Labe, so that everybody knows
I'm not someone to trifle with when push to final shoving goes.
I've got my tactical assault gear sitting ready in the den,
Awaiting for the signal to restore our land with my fellow men.
I practice clearing rooms at home when my mom goes out to the store,
A modern Minuteman who's ready for a civil war.
The neighbors give me funny looks, I see them whisper and take note,
They'll see what's what when I'm the one who's guarding checkpoints by their throat.
I am a peaceful man, of course, but I am also pre-prepared,
To neutralize the threats of which the average citizen's unscared.
In short, when my whole identity's a brand of tactical accessory,
You'll say a better warrior has never graced a Cabela's registry.
***
They say I have to tolerate a man who thinks he is a dame,
While feminists and immigrants are putting out my vital flame!
There taking all the jobs from us and giving them to folks who kneel,
And "woke HR" says my best jokes are things I'm not allowed to feel!
An Alpha Male is what I am, a lion, though I'm in this cubicle,
My life's frustrations can be traced to policies Talmudical.
They lecture me on privilege, I, who have to pay my bills and rent!
While they give handouts to the lazy, worthless, and incompetent!
My grandad fought the Nazis! Now I have to press a key for one
To get a call-rep I can't understand beneath the blazing sun
Of global, corporate tyranny that's crushing out the very soul
Of men like me, who've lost their rightful, natural, and just control!
So yes, I am resentful! And I'm angry! And I'm right to be!
They've stolen all my heritage and my masculinity!
In short, when my own failures are somebody else's evil plot,
I am the very model of the truest patriot we've got!
***
There putting chips inside of you! Their spraying things up in the sky!
They want to make you EAT THE BUGS and watch your very spirit die!
The towers for the 5G are a mind-control delivery tool!
To keep you docile while the children suffer in a grooming school!
The WEF, and Gates, and Soros have a plan they call the 'Great Reset,'
You'll own no property and you'll be happy, or you'll be in debt
To social credit overlords who'll track your every single deed!
There sterilizing you with plastics that they've hidden in the feed!
The world is flat! The moon is fake! The dinosaurs were just a lie!
And every major tragedy's a hoax with actors paid to cry!
I'M NOT INSANE! I SEE THE TRUTH! MY EYES ARE OPEN! CAN'T YOU SEE?!
YOU'RE ALL ASLEEP! YOU'RE COWARDS! YOU'RE AFRAID OF BEING TRULY FREE!
My heart is beating faster now, my breath is short, my vision's blurred,
From all the shocking truth that's in each single, solitary word!
I've sacrificed my life and friends to bring this message to the light, so...
You'd better listen to me now with all your concentrated might, ho!
***
For my heroic struggle, though it's cosmic and it's biblical,
Is waged inside the comments of a post that's algorithm-ical.
And still for all my knowledge that's both tactical and practical,
My mom just wants the rent I owe and says I'm being dramatical.
Some variant of the following[1] worked for me.
The for loop runs a command on each node in my cluster using ssh. The argument -t is passed to force allocation of a terminal attached to STDIN, STDERR and STDOUT of this session, since there will not be an interactive login shell to do it for us. The argument to ssh is a workflow of bash commands. They upgrade the 7.x system to the most recent packages in the repository. We then update the sources.list entries for the system to point at bookworm sources instead of bullseye. The package cache is updated and the proxmox-ve package is installed. Packages which are installed are upgraded to the versions from bookworm, and the installer concludes.
Dear reader, you might be surprised how many times I saw the word perl scroll by during the manual, serial scrolling of this install. It took hours. There were a few prompts, so stand by the keyboard!
[1]
gpg: key 1140AF8F639E0C39: public key "Proxmox Bookworm Release Key " imported
# have your ssh agent keychain running and a key loaded that's installed at
# ~root/.ssh/authorized_keys on each node
apt-get install -y keychain
eval $(keychain --eval)
ssh-add ~/.ssh/id_rsa
# Replace the IP address prefix (100.64.79.) and suffixes (64, 121-128)
# with the actual IPs of your cluster nodes. Or use hostnames :-)
for o in 64 121 122 123 124 125 126 127 128 ; do ssh -t root@100.64.79.$o '
apt-get -y -qq update && apt-get -y -qq full-upgrade \
&& sed -i -e s/bullseye/bookworm/g /etc/apt/sources.list $(compgen -G "/etc/apt/sources.list.d/*.list") \
&& echo "deb [signed-by=/usr/share/keyrings/proxmox-release.gpg] http://download.proxmox.com/debian/pve bookworm pve-no-subscription" \
| dd of=/etc/apt/sources.list.d/proxmox-release.list status=none \
&& echo "deb [signed-by=/usr/share/keyrings/proxmox-release.gpg] http://download.proxmox.com/debian/ceph-quincy bookworm main no-subscription" \
| dd of=/etc/apt/sources.list.d/ceph.list status=none \
&& proxmox_keyid="0xf4e136c67cdce41ae6de6fc81140af8f639e0c39" \
&& curl "https://keyserver.ubuntu.com/pks/lookup?op=get&search=${proxmox_keyid}" \
| gpg --dearmor -o /usr/share/keyrings/proxmox-release.gpg \
&& apt-get -y -qq update \
&& apt-get -y -qq install proxmox-ve \
&& apt-get -y -qq full-upgrade \
&& echo "$(hostname) upgraded"'; done
Welcome to our 5th report from the Reproducible Builds project in 2025! Our monthly reports outline what we've been up to over the past month, and highlight items of news from elsewhere in the increasingly-important area of software supply-chain security. If you are interested in contributing to the Reproducible Builds project, please do visit the Contribute page on our website.
In this report:
Security audit of Reproducible Builds tools published
The Open Technology Fund's (OTF) security partner Security Research Labs recently conducted an audit of some specific parts of tools developed by Reproducible Builds. This form of security audit, sometimes called a whitebox audit, is a form of testing in which auditors have complete knowledge of the item being tested. The auditors assessed the various codebases for resilience against hacking, with key areas including differential report formats in diffoscope, common client web attacks, command injection, privilege management, hidden modifications in the build process and attack vectors that might enable denials of service.
The audit focused on three core Reproducible Builds tools: diffoscope, a Python application that unpacks archives of files and directories and transforms their binary formats into human-readable form in order to compare them; strip-nondeterminism, a Perl program that improves reproducibility by stripping out non-deterministic information such as timestamps or other elements introduced during packaging; and reprotest, a Python application that builds source code multiple times in various environments in order to test reproducibility.
OTF's announcement contains more of an overview of the audit, and the full 24-page report is available in PDF form as well.
[Colleagues] approached me to talk about a reproducibility issue they'd been having with some R code. They'd been running simulations that rely on generating samples from a multivariate normal distribution, and despite doing the prudent thing and using set.seed() to control the state of the random number generator (RNG), the results were not computationally reproducible. The same code, executed on different machines, would produce different random numbers. The numbers weren't just a little bit different in the way that we've all wearily learned to expect when you try to force computers to do mathematics. They were painfully, brutally, catastrophically, irreproducible different. Somewhere, somehow, something broke.
We present attestable builds, a new paradigm to provide strong source-to-binary correspondence in software artifacts. We tackle the challenge of opaque build pipelines that disconnect the trust between source code, which can be understood and audited, and the final binary artifact, which is difficult to inspect. Our system uses modern trusted execution environments (TEEs) and sandboxed build containers to provide strong guarantees that a given artifact was correctly built from a specific source code snapshot. As such it complements existing approaches like reproducible builds which typically require time-intensive modifications to existing build configurations and dependencies, and require independent parties to continuously build and verify artifacts.
The authors compare attestable builds with reproducible builds by noting that an attestable build requires only minimal changes to an existing project, and offers nearly instantaneous verification of the correspondence between a given binary and the source code and build pipeline used to construct it, and proceed by determining that the overhead (42 seconds start-up latency and 14% increase in build duration) is small in comparison to the overall build time.
Timo Pohl, Pavel Novák, Marc Ohm and Michael Meier have published a paper called Towards Reproducibility for Software Packages in Scripting Language Ecosystems. The authors note that past research into Reproducible Builds has focused primarily on compiled languages and their ecosystems, with a further emphasis on Linux distribution packages:
However, the popular scripting language ecosystems potentially face unique issues given the systematic difference in distributed artifacts. This Systemization of Knowledge (SoK) [paper] provides an overview of existing research, aiming to highlight future directions, as well as chances to transfer existing knowledge from compiled language ecosystems. To that end, we work out key aspects in current research, systematize identified challenges for software reproducibility, and map them between the ecosystems.
Ultimately, the authors find that the literature is sparse, focusing on few individual problems and ecosystems, and therefore identify space for more critical research.
Distribution work
In Debian this month:
Ian Jackson filed a bug against the debian-policy package in order to delve into an issue affecting Debian's support for cross-architecture compilation, multiple-architecture systems, the Reproducible Builds SOURCE_DATE_EPOCH environment variable and the ability to recompile already-uploaded packages to Debian with a new/updated toolchain (binNMUs). Ian identifies a specific case, in the libopts25-dev package, involving a manual page that had interesting downstream effects, potentially affecting backup systems. The bug generated a large number of replies, some of which have references to similar or overlapping issues, such as this one from 2016/2017.
There is now a Reproducibility Status link for each app on f-droid.org, listed on every app's page. Our verification server shows a success or failure marker based on its build results, where success means our rebuilder reproduced the same APK file and failure means it did not. The IzzyOnDroid repository has developed a more elaborate system of badges which displays a marker for each rebuilder. Additionally, there is a sketch of a five-level graph to represent some aspects about which processes were run.
Hans compares the approach with projects such as Arch Linux and Debian that provide developer-facing tools to give feedback about reproducible builds, but do not display information about reproducible builds in the user-facing interfaces like the package management GUIs.
Arnout Engelen of the NixOS project has been working on reproducing the minimal installation ISO image. This month, Arnout has successfully reproduced the build of the minimal image for the 25.05 release without relying on the binary cache. Work on also reproducing the graphical installer image is ongoing.
In openSUSE news, Bernhard M. Wiedemann posted another monthly update for their work there.
Lastly in Fedora news, Jelle van der Waa opened issues tracking reproducibility issues in Haskell documentation, Qt6 recording the host kernel and R packages recording the current date. The R packages can be made reproducible with packaging changes in Fedora.
diffoscope & disorderfs
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 295, 296 and 297 to Debian:
Don't rely on zipdetails' --walk argument being available, and only add that argument on newer versions after we test for that. []
Review and merge support for NuGet packages from Omair Majid. []
Update copyright years. []
Merge support for an lzma comparator from Will Hollywood. [][]
Chris also merged an impressive changeset from Siva Mahadevan to make disorderfs more portable, especially on FreeBSD. disorderfs is our FUSE-based filesystem that deliberately introduces non-determinism into directory system calls in order to flush out reproducibility issues []. This was then uploaded to Debian as version 0.6.0-1.
Lastly, Vagrant Cascadian updated diffoscope in GNU Guix to version 296 [][] and 297 [][], and disorderfs to version 0.6.0 [][].
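disorderfs shakes out bugs where output silently depends on the order in which the filesystem returns directory entries. When the output in question is an archive, a common mitigation is to impose an explicit member order and normalise metadata. The sketch below assumes GNU tar 1.28 or later (for --sort=name) and GNU gzip; make_tarball is a hypothetical helper name, and falling back to an epoch of 0 when SOURCE_DATE_EPOCH is unset is purely illustrative.

```shell
#!/bin/sh
# Sketch: build a .tar.gz whose bytes do not depend on readdir() order,
# on file owners, or on modification times.
# Assumes GNU tar >= 1.28 (for --sort=name) and GNU gzip.
set -eu

# make_tarball SRC_DIR OUT_FILE  (hypothetical helper, for illustration)
make_tarball() {
    src_dir=$1
    out=$2
    # Use SOURCE_DATE_EPOCH when set; 0 is only an illustrative fallback.
    epoch=${SOURCE_DATE_EPOCH:-0}
    # --sort=name     : archive members in name order, not directory order
    # --mtime=@N      : every member gets the same timestamp
    # --owner/--group : hide the builder's UID/GID; --numeric-owner drops names
    # gzip -n         : do not store a file name or timestamp in the header
    tar --sort=name \
        --mtime="@${epoch}" \
        --owner=0 --group=0 --numeric-owner \
        --format=gnu \
        -C "$src_dir" -cf - . \
        | gzip -n -9 > "$out"
}

# Usage: two trees with the same content, created in a different order
# and with different timestamps, should yield byte-identical archives.
tmp=$(mktemp -d)
mkdir "$tmp/a" "$tmp/b"
printf 'one\n' > "$tmp/a/x"; printf 'two\n' > "$tmp/a/y"
printf 'two\n' > "$tmp/b/y"; printf 'one\n' > "$tmp/b/x"
touch -t 200101010000 "$tmp/b/x"
make_tarball "$tmp/a" "$tmp/a.tar.gz"
make_tarball "$tmp/b" "$tmp/b.tar.gz"
cmp "$tmp/a.tar.gz" "$tmp/b.tar.gz" && echo "archives are identical"
```

Comparing the two outputs byte for byte is, in miniature, what a rebuilder does: the build is only considered reproducible if independent runs produce identical artifacts.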
Website updates
Once again, there were a number of improvements made to our website this month including:
Chris Lamb:
Merged four or five suggestions from Guillem Jover for the GNU Autotools examples on the SOURCE_DATE_EPOCH example page []
Incorporated a number of fixes for the JavaScript SOURCE_DATE_EPOCH snippet from Sebastian Davis, which did not handle non-integer values correctly. []
Removed the JavaScript example that uses a fixed timezone on the SOURCE_DATE_EPOCH page. []
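The fixes above concern snippets that turn SOURCE_DATE_EPOCH into a date, including handling values that are not integers. As a rough illustration in shell, not a copy of any snippet on the website, the following sketch rejects anything that is not a plain non-negative integer and otherwise formats the date in UTC. build_date is a hypothetical helper name; the fallback from GNU date's -d @N to BSD date's -r N is an assumption about which date(1) is installed.

```shell
#!/bin/sh
# Sketch: derive a build date (e.g. for a manual page header) from
# SOURCE_DATE_EPOCH, which is defined as an integer number of seconds
# since the Unix epoch. build_date is a hypothetical helper name.
set -eu

build_date() {
    fmt=${1:-+%Y-%m-%d}
    epoch=${SOURCE_DATE_EPOCH:-}
    case $epoch in
        '')
            # Unset or empty: fall back to "now" (not reproducible).
            date -u "$fmt" ;;
        *[!0-9]*)
            # Anything containing a non-digit (e.g. "1.5", "-3") is rejected.
            echo "SOURCE_DATE_EPOCH is not an integer: $epoch" >&2
            return 1 ;;
        *)
            # GNU date accepts -d @N; BSD date uses -r N instead.
            date -u -d "@$epoch" "$fmt" 2>/dev/null ||
                date -u -r "$epoch" "$fmt" ;;
    esac
}

# Usage
SOURCE_DATE_EPOCH=1700000000
export SOURCE_DATE_EPOCH
build_date              # prints 2023-11-14
```

Formatting in UTC with -u matters as much as the epoch itself: the same timestamp rendered in the builder's local timezone can fall on a different calendar day.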
Reproducibility testing framework
The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility.
However, Holger Levsen posted to our mailing list this month in order to bring wider awareness to funding issues faced by the Oregon State University (OSU) Open Source Lab (OSL). As mentioned on OSL's public post, recent changes in university funding makes our current funding model no longer sustainable [and that] unless we secure $250,000 in committed funds, the OSL will shut down later this year. As Holger notes in his post to our mailing list, the Reproducible Builds project relies on hardware nodes hosted there. Later in the month, Lance Albertson of OSL posted an update on the funding situation with broadly positive news.
Separate to this, there were various changes to the Jenkins setup this month, which is used as the backend driver for both tests.reproducible-builds.org and reproduce.debian.net, including:
Migrating the central jenkins.debian.net server from AMD Opteron to Intel Haswell CPUs. Thanks to IONOS for hosting this server since 2012.
After testing it for almost ten years, the i386 architecture has been dropped from tests.reproducible-builds.org. This is because, with the upcoming release of Debian trixie, i386 is no longer supported as a regular architecture: there will be no official kernel and no Debian installer for i386 systems. As a result, a large number of nodes hosted by Infomaniak have been retooled from i386 to amd64.
Another node, ionos17-amd64.debian.net, which is used for verifying packages for all.reproduce.debian.net (hosted by IONOS) has had its memory increased from 40 to 64GB, and the number of cores doubled to 32 as well. In addition, two nodes generously hosted by OSUOSL have had their memory doubled to 16GB.
Lastly, we have been granted access to more riscv64 architecture boards, so now we have seven such nodes, all with 16GB memory and 4 cores that are verifying packages for riscv64.reproduce.debian.net. Many thanks to PLCT Lab, ISCAS for providing those.
Outside of this, a number of smaller changes were also made by Holger Levsen:
Disable testing of the i386 architecture. [][][][][]
Document the current disk usage. [][]
Address some image placement now that we only test three architectures. []
Keep track of build performance. []
Misc:
Fix a (harmless) typo in the multiarch_versionskew script. []
In addition, Jochen Sprickerhof made a series of changes related to reproduce.debian.net:
Add out of memory detection to the statistics page. []
Reverse the sorting order on the statistics page. [][][][]
Improve the spacing between statistics groups. []
Update a (hard-coded) line number in error message detection pertaining to a debrebuild line number. []
Support Debian unstable in the rebuilder-debian.sh script. []
Rely on rebuildctl to sync only arch-specific packages. [][]
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. This month, we wrote a large number of such patches, including:
0xFFFF: Use SOURCE_DATE_EPOCH for date in manual pages.
Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:
Welcome to our fourth report from the Reproducible Builds project in 2025. These monthly reports outline what we've been up to over the past month, and highlight items of news from elsewhere in the increasingly-important area of software supply-chain security. Lastly, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website.
Table of contents:
reproduce.debian.net
The last few months have seen the introduction, development and deployment of reproduce.debian.net. In technical terms, this is an instance of rebuilderd, our server designed to monitor the official package repositories of Linux distributions and attempt to reproduce the observed results there.
This month, however, we are pleased to announce that reproduce.debian.net now tests all the Debian trixie architectures except s390x and mips64el.
The ppc64el architecture was added through the generous support of Oregon State University Open Source Laboratory (OSUOSL), and we can support the armel architecture thanks to CodeThink.
Fifty Years of Open Source Software Supply Chain Security
Russ Cox has published a must-read article in ACM Queue on Fifty Years of Open Source Software Supply Chain Security. Subtitled "For decades, software reuse was only a lofty goal. Now it's very real.", Russ's article goes on to outline the history and original goals of software supply-chain security in the US military in the early 1970s, all the way to the XZ Utils backdoor of 2024. Through that lens, Russ explores the problem and how it has changed, and hasn't changed, over time.
He concludes as follows:
We are all struggling with a massive shift that has happened in the past 10 or 20 years in the software industry. For decades, software reuse was only a lofty goal. Now it's very real. Modern programming environments such as Go, Node and Rust have made it trivial to reuse work by others, but our instincts about responsible behaviors have not yet adapted to this new reality.
We all have more work to do.
Luca DiMaio of Chainguard posted to the list reporting that they had successfully implemented reproducible filesystem images with both ext4 and an EFI system partition. They go on to list the various methods, and the thread generated at least fifteen replies.
David Wheeler announced that the OpenSSF is building a glossary of sorts in order that they consistently use the same meaning for the same term and, moreover, that they have drafted a definition for "reproducible build". The thread generated a significant number of replies on the definition, leading to a potential update to the Reproducible Builds project's own definition.
My initial interest in reproducible builds was "how do I distribute pre-compiled binaries on GitHub without people raising security concerns about them". I've cycled back to this original problem about 5 years later and built a tool that is meant to address this. []
[...] Achieving reproducibility at scale remains difficult, especially in Java, due to a range of non-deterministic factors and caveats in the build process. In this work, we focus on reproducibility in Java-based software, archetypal of enterprise applications. We introduce a conceptual framework for reproducible builds, we analyze a large dataset from Reproducible Central and we develop a novel taxonomy of six root causes of unreproducibility. We study actionable mitigations: artifact and bytecode canonicalization using OSS-Rebuild and jNorm respectively. Finally, we present Chains-Rebuild, a tool that raises reproducibility success from 9.48% to 26.89% on 12,283 unreproducible artifacts. To sum up, our contributions are the first large-scale taxonomy of build unreproducibility causes in Java, a publicly available dataset of unreproducible builds, and Chains-Rebuild, a canonicalization tool for mitigating unreproducible builds in Java.
OSS Rebuild adds new TUI features
OSS Rebuild aims to automate rebuilding upstream language packages (e.g. from PyPI, crates.io and npm registries) and publish signed attestations and build definitions for public use.
OSS Rebuild ships a text-based user interface (TUI) for viewing, launching, and debugging rebuilds. While previously requiring ownership of a full instance of OSS Rebuild's hosted infrastructure, the TUI now supports a fully local mode of build execution and artifact storage. Thanks to Giacomo Benedetti for his usage feedback and work to extend the local-only development toolkit.
Another feature added to the TUI was an experimental chatbot integration that provides interactive feedback on rebuild failure root causes and suggests fixes.
Debian developer Simon Josefsson published another two reproducibility-related blog posts this month, the first on the topic of Verified Reproducible Tarballs. Simon sardonically challenges the reader as follows: Do you want a supply-chain challenge for the Easter weekend? Pick some well-known software and try to re-create the official release tarballs from the corresponding Git checkout. Is anyone able to reproduce anything these days? After that, they also published a blog post on Building Debian in a GitLab Pipeline using their multi-stage rebuild approach.
Roland also posted to our mailing list to highlight that there is now another tool in Debian that generates reproducible output, equivs. This is a tool to create trivial Debian packages that might Depend on other packages. As Roland writes, building the [equivs] package has been reproducible for a while, [but] now the output of the [tool] has become reproducible as well.
The IzzyOnDroid Android APK repository made more progress in April. Thanks to funding by NLnet and Mobifree, the project was also able to put more time into their tooling. For instance, developers can now easily run their own verification builder in less than 5 minutes. This currently supports Debian-based systems, but support for RPM-based systems is incoming.
The rbuilder_setup tool can now setup the entire framework within less than five minutes. The process is configurable, too, so everything from just the basics to verify builds up to a fully-fledged RB environment is also possible.
This tool works on Debian, Red Hat and Arch Linux, as well as their derivatives. The project has received successful reports from Debian, Ubuntu, Fedora and some Arch Linux derivatives so far.
Documentation on how to work with reproducible builds (making apps reproducible, debugging unreproducible packages, etc.) is available in the project's wiki page.
Future work is also in the pipeline, including documentation, guidelines and helpers for debugging.
NixOS defined an Outreachy project for improving build reproducibility. In the application phase, NixOS saw some strong candidates providing contributions, both on the NixOS side and upstream: guider-le-ecit analyzed a libpinyin issue. Tessy James fixed an issue in arandr and helped analyze one in libvlc that led to a proposed upstream fix. Finally, 3pleX fixed an issue which was accepted in upstream kitty, one in upstream maturin, one in upstream python-sip and one in the Nix packaging of python-libbytesize. Sadly, the funding for this internship fell through, so NixOS were forced to abandon their search.
Lastly, in openSUSE news, Bernhard M. Wiedemann posted another monthly update for their work there.
diffoscope & strip-nondeterminism
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading a number of versions to Debian:
Use the --walk argument over the potentially dangerous alternative --scan when calling out to zipdetails(1). []
Correct a longstanding issue where many >-based version tests used in conditional fixtures were broken. This was used to ensure that specific tests were only run when the version on the system was newer than a particular number. Thanks to Colin Watson for the report (Debian bug #1102658) []
Address a long-hidden issue in the test_versions testsuite as well, where we weren't actually testing the greater-than comparisons mentioned above, as it was masked by the tests for equality. []
Update copyright years. []
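The two test fixes above are about "newer than" version gates and about tests that only ever exercised equality. diffoscope itself is written in Python, so the following is not its code, just a shell sketch of the same kind of check. It shows why a plain string comparison misorders versions and why a greater-than test must be exercised in both directions. version_gt is a hypothetical helper name, and sort -V is assumed to be the GNU coreutils version sort.

```shell
#!/bin/sh
# Sketch: a "strictly newer than" version gate, of the kind used to skip
# tests when an external tool is too old. version_gt is hypothetical;
# it relies on sort -V (version sort) from GNU coreutils.
set -eu

# version_gt A B: exit 0 only if version A is strictly newer than B.
version_gt() {
    # Equal versions are never "greater", so rule them out first.
    [ "$1" != "$2" ] || return 1
    newest=$(printf '%s\n%s\n' "$1" "$2" | sort -V | tail -n 1)
    [ "$newest" = "$1" ]
}

# A plain byte-wise comparison puts "10.0" before "9.2"...
printf '9.2\n10.0\n' | LC_ALL=C sort | head -n 1    # prints 10.0
# ...whereas version sort compares component by component.
printf '9.2\n10.0\n' | sort -V | head -n 1          # prints 9.2

# Check both directions and the equal case: a test that only compares
# equal versions cannot notice a broken greater-than branch.
version_gt 10.0 9.2 && echo "10.0 is newer than 9.2"
version_gt 9.2 10.0 || echo "9.2 is not newer than 10.0"
version_gt 3.1 3.1 || echo "3.1 is not newer than itself"
```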
In strip-nondeterminism, however, Holger Levsen updated the Continuous Integration (CI) configuration in order to use the standard Debian pipelines via debian/salsa-ci.yml instead of using .gitlab-ci.yml. []
Website updates
Once again, there were a number of improvements made to our website this month including:
Aman Sharma added OSS-Rebuild's stabilize tool to the Tools page. [][]
Chris Lamb added a configure.ac (GNU Autotools) example for using SOURCE_DATE_EPOCH. [] Chris also updated the SOURCE_DATE_EPOCH snippet and moved the archive metadata to a more suitable location. []
Denis Carikli added GNU Boot to our ever-evolving Projects page.
Reproducibility testing framework
The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In April, a number of changes were made by Holger Levsen, including:
Make various changes to the ppc64el nodes. [][][][]
Make various changes to the arm64 and armhf nodes. [][][][]
Various changes related to the rebuilderd-worker entry point. [][][]
Create and deploy a pkgsync script. [][][][][][][][]
Fix the monitoring of the riscv64 architecture. [][]
Make a number of changes related to starting the rebuilderd service. [][][][]
Backup-related:
Backup the rebuilder databases every week. [][][][]
Improve the node health checks. [][]
Misc:
Re-use existing connections to the SSH proxy node on the riscv64 nodes. [][]
Node maintenance. [][][]
In addition:
Jochen Sprickerhof fixed the riscv64 host names [] and requested access to all the rebuilderd nodes [].
Mattia Rizzolo updated the self-serve rebuild scheduling tool, replacing the deprecated SSO-style authentication with OpenIDC which authenticates against salsa.debian.org. [][][]
Roland Clobus updated the configuration for the osuosl3 node to designate 4 workers for bigger builds. []
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:
Up the Down Staircase is a novel (in an unconventional format,
which I'll describe in a moment) about the experiences of a new teacher in
a fictional New York City high school. It was a massive best-seller in the
1960s, including a 1967 movie, but seems to have dropped out of the public
discussion. I read it from the library sometime in the late 1980s or early
1990s and have thought about it periodically ever since. It was Bel
Kaufman's first novel.
Sylvia Barrett is a new graduate with a master's degree in English, where
she specialized in Chaucer. As Up the Down Staircase opens, it is
her first day as an English teacher in Calvin Coolidge High School. As she
says in a letter to a college friend:
What I really had in mind was to do a little teaching. "And gladly
wolde he lerne, and gladly teche" like Chaucer's Clerke of Oxenford.
I had come eager to share all I know and feel; to imbue the young with
a love for their language and literature; to instruct and to inspire.
What happened in real life (when I had asked why they were taking
English, a boy said: "To help us in real life") was something else
again, and even if I could describe it, you would think I am
exaggerating.
She instead encounters chaos and bureaucracy, broken windows and mindless
regulations, a librarian who is so protective of her books that she
doesn't let any students touch them, a school guidance counselor who
thinks she's Freud, and a principal whose sole interaction with the school
is to occasionally float through on a cushion of cliches, dispensing
utterly useless wisdom only to vanish again.
I want to take this opportunity to extend a warm welcome to all
faculty and staff, and the sincere hope that you have returned from a
healthful and fruitful summer vacation with renewed vim and vigor,
ready to gird your loins and tackle the many important and vital tasks
that lie ahead undaunted. Thank you for your help and cooperation in
the past and future.
Maxwell E. Clarke
Principal
In practice, the school is run by James J. McHare, Clarke's administrative
assistant, who signs his messages JJ McH, Adm. Asst. and who Sylvia
immediately starts calling Admiral Ass. McHare is a micro-managing control
freak who spends the book desperately attempting to impose order over
school procedures, the teachers, and the students, with very little
success. The title of the book comes from one of his detention slips:
Please admit bearer to class
Detained by me for going Up the Down staircase and subsequent
insolence.
JJ McH
The conceit of this book is that, except for the first and last chapters,
it consists only of memos, letters, notes, circulars, and other paper
detritus, often said to come from Sylvia's wastepaper basket. Sylvia
serves as the first-person narrator through her long letters to her
college friend, and through shorter but more frequent exchanges via
intraschool memo with Beatrice Schachter, another English teacher at the
same school, but much of the book lies outside her narration. The reader
has to piece together what's happening from the discarded paper of a
dysfunctional institution.
Amid the bureaucratic and personal communications, there are frequent
chapters with notes from the students, usually from the suggestion box
that Sylvia establishes early in the book. These start as chaotic glimpses
of often-misspelled wariness or open hostility, but over the course of
Up the Down Staircase, some of the students become characters with
fragmentary but still visible story arcs. This remains confusing
throughout the novel (there are too many students to keep them entirely
straight, and several of them use pseudonyms for the suggestion box), but
it's the sort of confusion that feels like an intentional authorial
choice. It mirrors the difficulty a teacher has in piecing together and
remembering the stories of individual students in overstuffed classrooms,
even if (like Sylvia and unlike several of her colleagues) the teacher is
trying to pay attention.
At the start, Up the Down Staircase reads as mostly-disconnected
humor. There is a strong "kids say the darnedest things" vibe, which didn't
entirely work for me, but the send-up of chaotic bureaucracy is both more
sophisticated and more entertaining. It has the "laugh so that you don't
cry" absurdity of a system with insufficient resources, entirely absent
management, and colleagues who have let their quirks take over their
personalities. Sylvia alternates between incredulity and stubbornness, and
I think this book is at its best when it shows the small acts of practical
defiance that one uses to carve out space and coherence from mismanaged
bureaucracy.
But this book is not just a collection of humorous anecdotes about
teaching high school. Sylvia is sincere in her desire to teach, which
crystallizes around, but is not limited to, a quixotic attempt to reach one
delinquent that everyone else in the school has written off. She slowly
finds her footing, she has a few breakthroughs in reaching her students,
and the book slowly turns into an earnest portrayal of an attempt to make
the system work despite its obvious unfitness for purpose. This part of
the book is hard to review. Parts of it worked brilliantly; I could feel
myself both adjusting my expectations alongside Sylvia to something less
idealistic and also celebrating the rare breakthrough with her. Parts of
it were weirdly uncomfortable in ways that I'm not sure I enjoyed. That
includes Sylvia's climactic conversation with the boy she's been trying to
reach, which was weirdly charged and ambiguous in a way that felt like the
author's reach exceeding their grasp.
One thing that didn't help my enjoyment is Sylvia's relationship with Paul
Barringer, another of the English teachers and a frustrated novelist and
poet. Everyone who works at the school has found their own way to cope
with the stress and chaos, and many of the ways that seem humorous turn
out to have a deeper logic and even heroism. Paul's, however, is to
retreat into indifference and alcohol. He is a believable character who
works with Kaufman's themes, but he's also entirely unlikable. I never
understood why Sylvia tolerated that creepy asshole, let alone kept having
lunch with him. It is clear from the plot of the book that Kaufman at
least partially understands Paul's deficiencies, but that did not help me
enjoy reading about him.
This is a great example of a book that tried to do something unusual and
risky and didn't entirely pull it off. I like books that take a risk, and
sometimes Up the Down Staircase is very funny or suddenly
insightful in a way that I'm not sure Kaufman could have reached with a
more traditional novel. It takes a hard look at what it means to try to
make a system work when it's clearly broken and you can't change it, and
the way all of the characters arrive at different answers that are much
deeper than their initial impressions was subtle and effective. It's the
sort of book that sticks in your head, as shown by the fact I bought it on
a whim to re-read some 35 years after I first read it. But it's not
consistently great. Some parts of it drag, the characters are
frustratingly hard to keep track of, and the emotional climax points are
odd and unsatisfying, at least to me.
I'm not sure whether to recommend it or not, but it's certainly unusual.
I'm glad I read it again, but I probably won't re-read it for another 35
years, at least.
If you are considering getting this book, be aware that it has a lot of
drawings and several hand-written letters. The publisher of the edition I
read did a reasonably good job formatting this for an ebook, but some of
the pages, particularly the hand-written letters, were extremely hard to
read on a Kindle. Consider paper, or at least reading on a tablet or
computer screen, if you don't want to have to puzzle over low-resolution
images.
The 1991 trade paperback had a new introduction by the author, reproduced
in the edition I read as an afterword (which is a better choice than an
introduction). It is a long and fascinating essay from Kaufman about her
experience with the reaction to this book, culminating in a passionate
plea for supporting public schools and public school teachers. Kaufman's
personal account adds a lot of depth to the story; I highly recommend it.
Content note: Self-harm, plus several scenes that are closely adjacent to
student-teacher relationships. Kaufman deals frankly with the problems of
mostly-poor high school kids, including sexuality, so be warned that this
is not the humorous romp that it might appear on first glance. A couple of
the scenes made me uncomfortable; there isn't anything explicit, but the
emotional overtones can be pretty disturbing.
Rating: 7 out of 10
Welcome to the third report in 2025 from the Reproducible Builds project. Our monthly reports outline what we've been up to over the past month, and highlight items of news from elsewhere in the increasingly-important area of software supply-chain security. As usual, however, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website.
Table of contents:
Debian bookworm live images now fully reproducible from their binary packages
Roland Clobus announced on our mailing list this month that all the major desktop variants (ie. Gnome, KDE, etc.) can be reproducibly created for Debian bullseye, bookworm and trixie from their (pre-compiled) binary packages.
Building reproducible Debian live images does not require building from reproducible source code, but this is still a remarkable achievement. A large proportion of the binary packages that comprise these live images can be (and were) built reproducibly, but live image generation works at a higher level. (By contrast, full or end-to-end reproducibility of a bootable OS image will, in time, require both the compile-the-packages and the build-the-bootable-image stages to be reproducible.)
Nevertheless, in response, Roland's announcement generated significant congratulations as well as some discussion regarding the finer points of the terms employed: a full outline of the replies can be found here.
The news was also picked up by Linux Weekly News (LWN) as well as by Hacker News.
LWN: Fedora change aims for 99% package reproducibility
Linux Weekly News (LWN) contributor Joe Brockmeier has published a detailed round-up on how Fedora change aims for 99% package reproducibility. The article opens by mentioning that although Debian "has been working toward reproducible builds for more than a decade", the Fedora project has now:
progressed far enough that the project is now considering a change proposal for the Fedora 43 development cycle, expected to be released in October, with a goal of making 99% of Fedora's package builds reproducible. So far, reaction to the proposal seems favorable and focused primarily on how to achieve the goal with minimal pain for packagers rather than whether to attempt it.
Over the last few releases, we [Fedora] changed our build infrastructure to make package builds reproducible. This is enough to reach 90%. The remaining issues need to be fixed in individual packages. After this Change, package builds are expected to be reproducible. Bugs will be filed against packages when an irreproducibility is detected. The goal is to have no fewer than 99% of package builds reproducible.
Python adopts PEP standard for specifying package dependencies
Python developer Brett Cannon reported on Fosstodon that PEP 751 was recently accepted. This design document has the purpose of describing "a file format to record Python dependencies for installation reproducibility". As the abstract of the proposal writes:
This PEP proposes a new file format for specifying dependencies to enable reproducible installation in a Python environment. The format is designed to be human-readable and machine-generated. Installers consuming the file should be able to calculate what to install without the need for dependency resolution at install-time.
The PEP, which itself supersedes PEP 665, mentions that "there are at least five well-known solutions to this problem in the community".
OSS Rebuild real-time validation and tooling improvements
OSS Rebuild aims to automate rebuilding upstream language packages (e.g. from PyPI, crates.io, npm registries) and publish signed attestations and build definitions for public use.
OSS Rebuild is now attempting rebuilds as packages are published, shortening the time to validating rebuilds and publishing attestations.
Aman Sharma contributed classifiers and fixes for common sources of non-determinism in JAR packages.
Improvements were also made to some of the core tools in the project:
timewarp for simulating the registry responses from some point in the past.
proxy for transparent interception and logging of network activity.
SimpleX Chat server components now reproducible
SimpleX Chat is a privacy-oriented decentralised messaging platform that eliminates user identifiers and metadata, offers end-to-end encryption and has a unique approach to decentralised identity. Starting from version 6.3, SimpleX has implemented reproducible builds for its server components. This advancement allows anyone to verify that the binaries distributed by SimpleX match the source code, improving transparency and trustworthiness.
Three new scholarly papers
Aman Sharma of the KTH Royal Institute of Technology of Stockholm, Sweden published a paper on Build and Runtime Integrity for Java (PDF). The paper's abstract notes that "Software Supply Chain attacks are increasingly threatening the security of software systems" and goes on to compare build- and run-time integrity:
Build-time integrity ensures that the software artifact creation process, from source code to compiled binaries, remains untampered. Runtime integrity, on the other hand, guarantees that the executing application loads and runs only trusted code, preventing dynamic injection of malicious components.
The recently mandated software bill of materials (SBOM) is intended to help mitigate software supply-chain risk. We discuss extensions that would enable an SBOM to serve as a basis for making trust assessments thus also serving as a proactive defense.
A full PDF of the paper is available.
Lastly, congratulations to Giacomo Benedetti of the University of Genoa for publishing their PhD thesis. Titled Improving Transparency, Trust, and Automation in the Software Supply Chain, Giacomo's thesis:
addresses three critical aspects of the software supply chain to enhance security: transparency, trust, and automation. First, it investigates transparency as a mechanism to empower developers with accurate and complete insights into the software components integrated into their applications. To this end, the thesis introduces SUNSET and PIP-SBOM, leveraging modeling and SBOMs (Software Bill of Materials) as foundational tools for transparency and security. Second, it examines software trust, focusing on the effectiveness of reproducible builds in major ecosystems and proposing solutions to bolster their adoption. Finally, it emphasizes the role of automation in modern software management, particularly in ensuring user safety and application reliability. This includes developing a tool for automated security testing of GitHub Actions and analyzing the permission models of prominent platforms like GitHub, GitLab, and BitBucket.
Debian developer Simon Josefsson published two reproducibility-related blog posts this month. The first was on the topic of Reproducible Software Releases, which discusses some techniques and gotchas that can be encountered when generating reproducible source packages, i.e. ensuring that the source code archives that open-source software projects release can be reproduced by others. Simon's second post builds on his earlier experiments with reproducing parts of Trisquel/Debian. Titled On Binary Distribution Rebuilds, it discusses potential methods to bootstrap a binary distribution like Debian from some other bootstrappable environment like Guix.
Jochen Sprickerhof uploaded sbuild version 0.88.5 with a change relevant to reproducible builds: specifically, the build_as_root_when_needed functionality still supports older versions of dpkg(1). []
The IzzyOnDroid Android APK repository reached another milestone in March, crossing the 40% coverage mark: specifically, more than 42% of the apps in the repository are now reproducible.
Thanks to funding by NLnet/Mobifree, the project was also able to put more time into their tooling. For instance, developers can now easily run their own verification builder in less than 5 minutes. This currently supports Debian-based systems, but support for RPM-based systems is incoming. Future work is in the pipeline, including documentation, guidelines and helpers for debugging.
Fedora developer Zbigniew Jędrzejewski-Szmek announced a work-in-progress script called fedora-repro-build which attempts to reproduce an existing package within a Koji build environment. Although the project's README file lists a number of fields that "will always or almost always vary" (and there is a non-zero list of other known issues), this is an excellent first step towards full Fedora reproducibility (see above for more information).
Lastly, in openSUSE news, Bernhard M. Wiedemann posted another monthly update for his work there.
[What] would it take to compromise an entire Linux distribution directly through their public infrastructure? Is it possible to perform such a compromise as simple security researchers with no available resources but time?
diffoscope & strip-nondeterminism
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 290, 291, 292 and 293 to Debian:
Bug fixes:
file(1) version 5.46 now returns "XHTML document" for .xhtml files such as those found nested within our .epub tests. []
Also consider .aar files as APK files, at least for the sake of diffoscope. []
Require the new, upcoming, version of file(1) and update our quine-related testcase. []
Codebase improvements:
Ensure all calls to our_check_output in the ELF comparator have the potential CalledProcessError exception caught. [][]
Correct an import masking issue. []
Add a missing subprocess import. []
Reformat openssl.py. []
Update copyright years. [][][]
In addition, Ivan Trubach contributed a change to ignore the st_size metadata entry for directories as it is essentially arbitrary and introduces unnecessary or even spurious changes. []
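As a rough illustration of Ivan's change, the Python sketch below compares two paths' stat metadata but skips st_size when both are directories, because a directory's reported size reflects how the filesystem happens to store its entries rather than anything worth comparing. The function name and the two-field list are hypothetical; this is not diffoscope's actual implementation.

```python
import os
import stat
import tempfile

# Hypothetical subset of stat fields to compare; a real tool checks more.
FIELDS = ("st_mode", "st_size")


def metadata_differences(a: str, b: str) -> list[str]:
    """Return the names of stat fields that differ between paths a and b.

    st_size is ignored when both paths are directories, since a directory's
    size is an arbitrary filesystem detail.
    """
    sa, sb = os.lstat(a), os.lstat(b)
    both_dirs = stat.S_ISDIR(sa.st_mode) and stat.S_ISDIR(sb.st_mode)
    diffs = []
    for field in FIELDS:
        if field == "st_size" and both_dirs:
            continue  # skip the spurious directory-size difference
        if getattr(sa, field) != getattr(sb, field):
            diffs.append(field)
    return diffs


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        empty = os.path.join(tmp, "empty")
        full = os.path.join(tmp, "full")
        os.mkdir(empty)
        os.mkdir(full)
        for i in range(100):
            open(os.path.join(full, f"file{i}"), "w").close()
        # The two directories' st_size may differ, but it is not reported.
        print(metadata_differences(empty, full))  # []
```

Regular files still have their sizes compared, so a genuine content-length change is not hidden.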
Website updates
Once again, there were a number of improvements made to our website this month, including:
Hervé Boutemy updated the JVM documentation to clarify that the target is rebuild attestation. []
Lastly, Holger Levsen added Julien Malka and Zbigniew Jędrzejewski-Szmek to our Involved people [][], and replaced suggestions to follow us on Twitter/X with suggestions to follow us on Mastodon [][].
Reproducibility testing framework
The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In March, a number of changes were made by Holger Levsen, including:
And finally, node maintenance was performed by Holger Levsen [][][] and Mattia Rizzolo [][].
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:
dmidecode | grep -A8 '^System Information'
tells me that the Manufacturer is HP and Product Name is OMEN Transcend Gaming Laptop 14-fb0xxx
I'm provisioning a new piece of hardware for my eng consultant and it's proving more difficult than I expected. I must admit guilt for some of this difficulty. Instead of installing using the debian installer on my keychain, I dd'd the pv block device of the 16 inch 2023 version onto the partition set aside for it. I then rebooted into rescue mode and cleaned up the grub config, corrected the EFI boot partition's path in /etc/fstab, ran the grub installer from the rescue menu, and rebooted.
On the initial boot of the system, X or Wayland or whatever is supposed to be talking to this vast array of GPU hardware in this device is unable to do more than create a black screen on vt1. It's easy enough to switch to vt2 and get a shell on the installed system. So I'm doing that and investigating what's changed in Trixie. It seems like it's pretty significant. Did they just throw out Keith Packard's and Behdad Esfahbod's work on font rendering? I don't understand what's happening in this effort to abstract to a simpler interface. I'll probably end up reading more about it.
In an effort to have Debian re-configure the system for Desktop use, I have uninstalled as many packages as I could find that were in the display and human interface category, or were firmware/drivers for devices not present in this Laptop's SoC. Some commands I used to clear these packages and re-install cinnamon follow:
And then I rebooted. When it came back up, I was greeted with a login prompt, and Trixie looks to be fully functional on this device, including the attached wifi radio, tethering to my android, and the thunderbolt-attached Marvell SFP+ enclosure.
I'm also installing libvirt and have fetched the DVD iso material for Debian, Ubuntu and Rocky in case we need to build VMs during the development process. These are the platforms that I target at work with gcp Dataproc, so I'm pretty good at performing maintenance operations on them at this point.
FreedomBox is a Debian blend that makes it easier to run your own server. Approximately every two years, there is a new stable release of Debian. This year's release will be called Debian 13 "trixie".
This post will provide an overview of changes between FreedomBox 23.6 (the version that shipped in Debian 12 "bookworm") and 25.5 (the latest release). Note: Debian 13 "trixie" is not yet released, so things may still change, be added or removed, before the official release.
General
A number of translations were updated, including Albanian, Arabic, Belarusian, Bulgarian, Chinese (Simplified Han script), Chinese (Traditional Han script), Czech, Dutch, French, German, Hindi, Japanese, Norwegian Bokmål, Polish, Portuguese, Russian, Spanish, Swedish, Telugu, Turkish, and Ukrainian.
Fix cases where a package or service is used by multiple apps, so that disabling or uninstalling one app does not affect the other app.
When uninstalling an app, purge the packages, to remove all data and configuration.
For configuration files that need to be placed into folders owned by other packages, we now install these files under /usr/share/freedombox/etc/, and create a symbolic link to the other package's configuration folder. This prevents the files being lost when other packages are purged.
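The layout described above can be sketched in Python. This is a minimal illustration assuming a hypothetical helper name and stand-in paths, not FreedomBox's actual code: the real file lives in a directory FreedomBox owns, and only a symbolic link goes into the other package's folder, so purging that package removes at most the link.

```python
import shutil
import tempfile
from pathlib import Path


def install_linked_config(private_dir: Path, foreign_dir: Path, name: str, content: str) -> Path:
    """Write `name` under private_dir and symlink it into foreign_dir.

    Returns the path of the symlink. If foreign_dir is later deleted (for
    example when its owning package is purged), the file under private_dir
    survives.
    """
    private_dir.mkdir(parents=True, exist_ok=True)
    foreign_dir.mkdir(parents=True, exist_ok=True)
    real_file = private_dir / name
    real_file.write_text(content)
    link = foreign_dir / name
    if link.is_symlink() or link.exists():
        link.unlink()  # replace a stale link or regular file
    link.symlink_to(real_file)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-ins for /usr/share/freedombox/etc/... and another package's folder.
        private = Path(tmp, "usr/share/freedombox/etc/example")
        foreign = Path(tmp, "etc/example")
        link = install_linked_config(private, foreign, "freedombox.conf", "enabled = true\n")
        print(link.read_text(), end="")
        shutil.rmtree(foreign)  # simulate the other package being purged
        print((private / "freedombox.conf").exists())  # True
```

Because the helper replaces any existing link, it can safely be run again, for example by a re-run of an app's setup.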
Add an action to re-run the setup process for an app. This can fix many of the possible issues that occur.
Various improvements related to the "force upgrade" feature, which handles upgrading packages with conffile prompts.
Fix install/uninstall issues for apps that use MySQL database (WordPress, Zoph).
Improve handling of file uploads (Backups, Feather Wiki, Kiwix).
Switch to Bootstrap 5 front-end framework.
Removed I2P app, since the i2p package was removed from Debian.
Various user interface changes, including:
Add tags for apps, replacing short descriptions. When a tag is clicked, search and filter for one or multiple tags.
Organize the System page into sections.
Add breadcrumbs for page hierarchy navigation.
Add next steps page after initial FreedomBox setup.
Diagnostics
Add diagnostic checks to detect common errors.
Add diagnostics daily run, with notifications about failures.
Add Repair action for failed diagnostics, and option for automatic repairs.
Name Services
Move hostname and domain name configuration to Names page.
Support multiple static and/or dynamic domains.
Use systemd-resolved for DNS resolution.
Add options for setting global DNS-over-TLS and DNSSEC preferences.
Networks
Add more options for IPv6 configuration method.
Overhaul Wi-Fi networks scan page.
Privacy
Add option to disable fallback DNS servers.
Add option to set the lookup URL to get the public IP address of the FreedomBox.
Users and Groups
Delete or move home folder when user is deleted or renamed.
When a user is inactivated, also inactivate the user in LDAP.
Deluge
This BitTorrent client app should be available once again in Debian 13 "trixie".
Ejabberd
Turn on Message Archive Management setting by default, to help various XMPP clients use it.
Feather Wiki
Add new app for note taking.
This app lives in a single HTML file, which is downloaded from the FreedomBox website.
GitWeb
Disable snapshot feature, due to high resource use.
Various fixes for repository operations.
GNOME
Add new app to provide a graphical desktop environment.
Requires a monitor, keyboard, and mouse to be physically connected to the FreedomBox.
Not suitable for low-end hardware.
ikiwiki
Disable discussion pages by default for new wiki/blog, to avoid spam.
Kiwix
Add new app for offline reader of Wikipedia and other sites.
Matrix Synapse
Add an option for token-based registration verification, so that users signing up for new accounts will need to provide a token during account registration.
MediaWiki
Allow setting the site language code.
Increase PHP maximum execution time to 100 seconds.
MiniDLNA
Add media directory selection form.
Miniflux
Add new app for reading news from RSS/ATOM feeds.
Nextcloud
Add new app for file sync and collaboration.
Uses a Docker container maintained by the Nextcloud community. The container is downloaded from FreedomBox container registry.
OpenVPN
Renew server/client certificates, and set expiry to 10 years.
Postfix/Dovecot
Fix DKIM signing.
Show DNS entries for all domains.
Shadowsocks Server
Add new app for censorship resistance, separate from Shadowsocks Client app.
SOGo
Add new app for groupware (webmail, calendar, tasks, and contacts).
Works with Postfix/Dovecot email server app.
TiddlyWiki
Add new app for note taking.
This app lives in a single HTML file, which is downloaded from the FreedomBox website.
Tor Proxy
Add new app for Tor SOCKS proxy, separate from Tor app.
Transmission
Allow remote user interfaces to connect.
Conclusion
Over the past two years, FreedomBox has been increasing the number of features and applications available to its users. We have also focused on improving the reliability of the system, detecting unexpected situations, and providing means to return to a known good state. With these improvements, FreedomBox has become a good solution for people with limited time or energy to set up and start running a personal server, at home or in the cloud.
Looking forward, we would like to focus on making more powerful hardware available with FreedomBox pre-installed and ready to be used. This hardware would also support larger storage devices, making it suitable as a NAS or media server. We are also very interested in exploring new features such as atomic updates, which will further enhance the reliability of the system.
Welcome to the second report in 2025 from the Reproducible Builds project. Our monthly reports outline what we've been up to over the past month, and highlight items of news from elsewhere in the increasingly-important area of software supply-chain security. As usual, however, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website.
Table of contents:
Reproducible Builds at FOSDEM 2025
Similar to last year's event, there was considerable activity regarding Reproducible Builds at FOSDEM 2025, held on 1st and 2nd February this year in Brussels, Belgium. We count at least four talks related to reproducible builds. (You can also read our news report from last year's event in which Holger Levsen presented in the main track.)
Jelle van der Waa, Holger Levsen and kpcyrd presented in the Distributions track on A Tale of several distros joining forces for a common goal. In this talk, three developers from two different Linux distributions (Arch Linux and Debian) discuss this goal, which is, of course, reproducible builds. The presenters discuss both what is shared and different between the two efforts, touching on the history and future challenges alike. The slides of this talk are available to view, as is the full video (30m02s). The talk was also discussed on Hacker News.
Zbigniew Jędrzejewski-Szmek presented in the ever-popular Python track on Rewriting .pyc files for fun and reproducibility, i.e. the bytecode files generated by Python in order to speed up module imports: "It's been known for a while that those are not reproducible: on different architectures, the bytecode for exactly the same sources ends up slightly different." The slides of this talk are available, as is the full video (28m32s).
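A related, long-known source of .pyc variation sits in the header rather than the bytecode (and is not the architecture issue the talk addresses): a default .pyc header records the source file's modification time and size. Since Python 3.7, hash-based .pyc files (PEP 552) store a hash of the source instead, and py_compile can write them through its invalidation_mode parameter; py_compile also defaults to that mode when SOURCE_DATE_EPOCH is set. The sketch below is a hedged illustration with hypothetical helper names.

```python
import py_compile
import struct
import tempfile
from pathlib import Path


def compile_hash_based(source: Path, target: Path) -> bytes:
    """Compile `source` into a hash-based .pyc (PEP 552) and return its bytes."""
    py_compile.compile(
        str(source),
        cfile=str(target),
        doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH,
    )
    return target.read_bytes()


def header_flags(pyc: bytes) -> int:
    """Return the little-endian flags word stored in header bytes 4-8.

    Bit 0 marks a hash-based .pyc; bit 1 asks the importer to check the
    stored hash against the source file.
    """
    return struct.unpack("<I", pyc[4:8])[0]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        src = Path(tmp, "mod.py")
        src.write_text("ANSWER = 42\n")
        first = compile_hash_based(src, Path(tmp, "first.pyc"))
        src.touch()  # bump the mtime; a hash-based header does not record it
        second = compile_hash_based(src, Path(tmp, "second.pyc"))
        print(first == second, header_flags(first))  # True 3
```

This only removes the timestamp from the header; it does nothing about the marshalled bytecode differences the talk describes.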
In the Nix and NixOS track, Julien Malka presented on the Saturday asking How reproducible is NixOS: "We know that the NixOS ISO image is very close to be perfectly reproducible thanks to reproducible.nixos.org, but there doesn't exist any monitoring of Nixpkgs as a whole. In this talk I'll present the findings of a project that evaluated the reproducibility of Nixpkgs as a whole by mass rebuilding packages from revisions between 2017 and 2023 and comparing the results with the NixOS cache." Unfortunately, no video of the talk is available, but there is a blog and article on the results.
Lastly, Simon Tournier presented in the Open Research track on the confluence of GNU Guix and Software Heritage: Source Code Archiving to the Rescue of Reproducible Deployment. Simon's talk describes design and implementation we came up and reports on the archival coverage for package source code with data collected over five years. It opens to some remaining challenges toward a better open and reproducible research. The slides for the talk are available, as is the full video (23m17s).
Reproducible Builds at PyCascades 2025
Vagrant Cascadian presented at this year's PyCascades conference, which was held on 8th and 9th February in Portland, OR, USA. PyCascades is a regional instance of PyCon held in the Pacific Northwest. Vagrant's talk, entitled Re-Py-Ducible Builds, caught the audience's attention with the following abstract:
Crank your Python best practices up to 11 with Reproducible Builds! This talk will explore Reproducible Builds by highlighting issues identified in Python projects, from the simple to the seemingly inscrutable. Reproducible Builds is basically the crazy idea that when you build something, and you build it again, you get the exact same thing, or, even more important, if someone else builds it, they get the exact same thing too.
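The core check in that definition can be sketched in a few lines of Python: build the artifact twice (or have someone else build it) and compare the results bit for bit. This is a minimal sketch assuming two already-built files; the helper names are hypothetical, and real tooling such as diffoscope goes further by explaining where two outputs differ.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the hex SHA-256 digest of a file, read in 64 KiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_reproducible(first_build: Path, second_build: Path) -> bool:
    """Two builds count as reproducible only if their outputs are bit-for-bit identical."""
    return sha256_of(first_build) == sha256_of(second_build)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        a = Path(tmp, "build-a.bin")
        b = Path(tmp, "build-b.bin")
        a.write_bytes(b"same bytes")
        b.write_bytes(b"same bytes")
        print(is_reproducible(a, b))  # True
```

A cryptographic hash is convenient when the two builds happen on different machines, since only the short digests need to be exchanged and compared.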
reproduce.debian.net updates
The last few months have seen the introduction of reproduce.debian.net. Announced first at the recent Debian MiniDebConf in Toulouse, reproduce.debian.net is an instance of rebuilderd operated by the Reproducible Builds project.
Powering this work is rebuilderd, our server which monitors the official package repositories of Linux distributions and attempts to reproduce the observed results there. This month, Holger Levsen:
Split packages that are not specific to any architecture away from amd64.reproducible.debian.net service into a new all.reproducible.debian.net page.
Increased the number of riscv64 nodes to a total of 4, and added a new amd64 node thanks to our (now 10-year) sponsor, IONOS.
Uploaded the devscripts package, incorporating changes from Jochen Sprickerhof to the debrebuild script, specifically to fix the handling of the Rules-Requires-Root header in Debian source packages.
Uploaded a number of Rust dependencies of rebuilderd (rust-libbz2-rs-sys, rust-actix-web, rust-actix-server, rust-actix-http, rust-actix-server, rust-actix-http, rust-actix-web-codegen and rust-time-tz) after they were prepared by kpcyrd.
Jochen Sprickerhof also updated the sbuild package to:
Obey requests from the user/developer for a different temporary directory.
Use the root/superuser for some values of Rules-Requires-Root.
Don't pass --root-owner-group to old versions of dpkg.
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
go (clear GOROOT for func ldShared when -trimpath is used)
Distribution work
There has been the usual work in various distributions this month, such as:
In Debian, 17 reviews of Debian packages were added, 6 were updated and 8 were removed this month adding to our knowledge about identified issues.
Fedora developers Davide Cavalca and Zbigniew Jędrzejewski-Szmek gave a talk on Reproducible Builds in Fedora (PDF), touching on SRPM-specific issues as well as the current status and future plans.
Thanks to an investment from the Sovereign Tech Agency, the FreeBSD project's work on unprivileged and reproducible builds continued this month. Notable fixes include:
The Yocto Project has been struggling to upgrade to the latest Go and Rust releases due to reproducibility problems in the newer versions. Hongxu Jia tracked down the issue with Go, which meant that the project could upgrade from the 1.22 series to 1.24, with the fix being submitted upstream for review (see above). For Rust, however, the project was significantly behind, but has made recent progress after finally identifying the blocking reproducibility issues. At the time of writing, the project is at Rust version 1.82, with patches under review for 1.83 and 1.84 and fixes being discussed with the Rust developers. The project hopes to improve the tests for reproducibility in the Rust project itself in order to try and avoid future regressions.
Yocto continues to maintain its ability to binary reproduce all of the recipes in OpenEmbedded-Core, regardless of the build host distribution or the current build path.
Finally, Douglas DeMaio published an article on the openSUSE blog announcing that the Reproducible-openSUSE (RBOS) Project Hits [Significant] Milestone. In particular:
The Reproducible-openSUSE (RBOS) project, which is a proof-of-concept fork of openSUSE, has reached a significant milestone after demonstrating a usable Linux distribution can be built with 100% bit-identical packages.
diffoscope & strip-nondeterminism
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 288 and 289 to Debian:
Add asar to DIFFOSCOPE_FAIL_TESTS_ON_MISSING_TOOLS in order to address Debian bug #1095057. []
Catch a CalledProcessError when calling html2text. []
Additionally, Vagrant Cascadian updated diffoscope in GNU Guix to version 287 [][] and 288 [][] as well as submitted a patch to update to 289 []. Vagrant also fixed an issue that was breaking reprotest on Guix [][].
strip-nondeterminism is our sister tool to remove specific non-deterministic results from a completed build. This month version 1.14.1-2 was uploaded to Debian unstable by Holger Levsen.
Website updates
There were a large number of improvements made to our website this month, including:
Holger Levsen clarified the name of a link to our old Wiki pages on the History page [] and added a number of new links to the Talks & Resources page [][].
James Addison updated the website's own README file to document a couple of additional dependencies [][], and also did more work on a future Getting Started guide page [][].
Reproducibility testing framework
The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. This month, a number of changes were made by Holger Levsen, including:
Fix /etc/cron.d and /etc/logrotate.d permissions for Jenkins nodes. []
Add support for riscv64 architecture nodes. [][]
Grant Jochen Sprickerhof access to the o4 node. []
Disable the janitor-setup-worker. [][]
In addition:
kpcyrd fixed the /all/api/ API endpoints on reproduce.debian.net by altering the nginx configuration. []
James Addison updated reproduce.debian.net to display the so-called bad reasons hyperlink inline [] and merged the Categorized issues links into the Reproduced builds column [].
Jochen Sprickerhof also made some reproduce.debian.net-related changes, adding support for detecting a bug in the mmdebstrap package [] as well as updating some documentation [].
Roland Clobus continued their work on reproducible live images for Debian, making changes related to new clustering of jobs in openQA. []
And finally, both Holger Levsen [][][] and Vagrant Cascadian performed significant node maintenance. [][][][][]
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:
Welcome to the first report in 2025 from the Reproducible Builds project!
Our monthly reports outline what we've been up to over the past month and highlight items of news from elsewhere in the world of software supply-chain security when relevant. As usual, though, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website.
reproduce.debian.net
The last few months saw the introduction of reproduce.debian.net. Announced at the recent Debian MiniDebConf in Toulouse, reproduce.debian.net is an instance of rebuilderd operated by the Reproducible Builds project. Powering that is rebuilderd, our server designed to monitor the official package repositories of Linux distributions and attempt to reproduce the observed results there.
This month, however, we are pleased to announce that in addition to the existing amd64.reproduce.debian.net and i386.reproduce.debian.net architecture-specific pages, we now build for three more architectures (for a total of five): arm64, armhf and riscv64.
Two new academic papers
Giacomo Benedetti, Oreofe Solarin, Courtney Miller, Greg Tystahl, William Enck, Christian Kästner, Alexandros Kapravelos, Alessio Merlo and Luca Verderame published an interesting article recently. Titled An Empirical Study on Reproducible Packaging in Open-Source Ecosystem, the abstract outlines its optimistic findings:
[We] identified that with relatively straightforward infrastructure configuration and patching of build tools, we can achieve very high rates of reproducible builds in all studied ecosystems. We conclude that if the ecosystems adopt our suggestions, the build process of published packages can be independently confirmed for nearly all packages without individual developer actions, and doing so will prevent significant future software supply chain attacks.
In this work, we perform the first large-scale study of bitwise reproducibility, in the context of the Nix functional package manager, rebuilding 709,816 packages from historical snapshots of the nixpkgs repository[. We] obtain very high bitwise reproducibility rates, between 69 and 91% with an upward trend, and even higher rebuildability rates, over 99%. We investigate unreproducibility causes, showing that about 15% of failures are due to embedded build dates. We release a novel dataset with all build statuses, logs, as well as full diffoscopes: recursive diffs of where unreproducible build artifacts differ.
As above, the entire PDF of the article is available to view online.
Distribution work
There has been the usual work in various distributions this month, such as:
10+ reviews of Debian packages were added, 11 were updated and 10 were removed this month adding to our knowledge about identified issues. A number of issue types were updated also.
The FreeBSD Foundation announced that a planned project to deliver zero-trust builds has begun in January 2025. Supported by the Sovereign Tech Agency, this project is centered on the various build processes. The primary goal of this work is to enable the entire release process to run without requiring root access, and to ensure that build artifacts build reproducibly, that is, that a third party can build bit-for-bit identical artifacts. The full announcement, which includes an estimated schedule and other details, can be found online.
Following up on a substantial amount of previous work pertaining to the Sphinx documentation generator, James Addison asked a question about the relationship between the SOURCE_DATE_EPOCH environment variable and testing, which generated a number of replies.
Adithya Balakumar of Toshiba asked a question about whether it is possible to make ext4 filesystem images reproducible. Adithya's issue is that even the smallest amount of post-processing of the filesystem results in the modification of the Last mount and Last write timestamps.
FUSE (Filesystem in USErspace) filesystems such as disorderfs do not delete files from the underlying filesystem when they are deleted from the overlay. This can cause seemingly straightforward tests (for example, cases that expect directory contents to be empty after deletion is requested for all files listed within them) to fail.
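SOURCE_DATE_EPOCH is the Reproducible Builds convention for embedded dates. It holds a Unix timestamp that build tools should use in place of the current time. Below is a minimal bash sketch of a build step that honours it, assuming GNU date. build_stamp is a hypothetical helper and not part of any existing tool.

```bash
#!/usr/bin/env bash
# Minimal sketch of a build step that honours SOURCE_DATE_EPOCH.
# Assumes GNU date (for -d "@SECONDS"); build_stamp is a hypothetical helper.
set -euo pipefail

build_stamp() {
  # Prefer SOURCE_DATE_EPOCH when set; otherwise fall back to "now",
  # which is exactly the non-reproducible behaviour to avoid.
  local epoch="${SOURCE_DATE_EPOCH:-$(date +%s)}"
  date -u -d "@${epoch}" '+%Y-%m-%dT%H:%M:%SZ'
}

# In a git checkout, a common choice is the last commit's timestamp:
#   export SOURCE_DATE_EPOCH="$(git log -1 --pretty=%ct)"
SOURCE_DATE_EPOCH=0 build_stamp      # prints 1970-01-01T00:00:00Z
```

Because the stamp depends only on the variable, two builds given the same SOURCE_DATE_EPOCH embed the same date, whenever they actually run.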
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
diffoscope
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 285, 286 and 287 to Debian:
Security fixes:
Validate the --css command-line argument to prevent a potential Cross-site scripting (XSS) attack. Thanks to Daniel Schmidt from SRLabs for the report. []
Prevent XML entity expansion attacks. Thanks to Florian Wilkens from SRLabs for the report. [][]
Print a warning if we have disabled XML comparisons due to a potentially vulnerable version of pyexpat. []
Bug fixes:
Correctly identify changes to only the line-endings of files; don't mark them as Ordering differences only. []
When passing files on the command line, don't call specialize() before we've checked whether the files are identical or not. []
Do not exit with a traceback if paths are inaccessible, either directly, via symbolic links or within a directory. []
Don't cause a traceback if cbfstool extraction failed. []
Use the surrogateescape mechanism to avoid a UnicodeDecodeError and crash when decoding any zipinfo output that is not UTF-8 compliant. []
Testsuite improvements:
Don't mangle newlines when opening test fixtures; we want them untouched. []
Move to assert_diff in test_text.py. []
Misc improvements:
Drop unused subprocess imports. [][]
Drop an unused function in iso9660.py. []
Inline a call and check of Config().force_details; no need for an additional variable in this particular method. []
Remove an unnecessary return value from the Difference.check_for_ordering_differences method. []
Remove unused logging facility from a few comparators. []
Update copyright years. [][]
In addition, fridtjof added support for the ASAR .tar-like archive format [][][][] and, lastly, Vagrant Cascadian updated diffoscope in GNU Guix to version 285 [][] and 286 [][].
strip-nondeterminism is our sister tool to remove specific non-deterministic results from a completed build. This month version 1.14.1-1 was uploaded to Debian unstable by Chris Lamb, making the following changes:
Clarify the --verbose and non --verbose output of bin/strip-nondeterminism so we don't imply we are normalizing files that we are not. []
Bump Standards-Version to 4.7.0. []
Website updates
There were a large number of improvements made to our website this month, including:
Update the website's README to make the setup command copy & paste friendly. []
Reproducibility testing framework
The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In January, a number of changes were made by Holger Levsen, including:
Disable debug output for FreeBSD rebuilding jobs. []
Upgrade to FreeBSD 14.2 [] and document that bmake was installed on the underlying FreeBSD virtual machine image [].
Misc:
Update the real year to 2025. []
Don't try to install a Debian bookworm kernel from backports on the infom08 node which is running Debian trixie. []
Don't warn about system updates for systems running Debian testing. []
Fix a typo in the ZOMBIES definition. [][]
In addition:
Ed Maste modified the FreeBSD build system to clean the object directory before commencing a build. []
Gioele Barabucci updated the rebuilder stats to first add a category for network errors [] as well as to categorise failures without a diffoscope log [].
Jessica Clarke also made some FreeBSD-related changes, including:
Ensuring we clean up the object directory for the second build as well. [][]
Updating the sudoers for the relevant rm -rf command. []
Update the cleanup_tmpdirs method to match other removals. []
Rework and simplify the generation of statistics linked from reproduce.debian.net. [][][][]
Roland Clobus:
Update the reproducible_debstrap job to call Debian's debootstrap with the full path [] and to use eatmydata as well [][].
Make some changes to deduce the CPU load in the debian_live_build job. []
Lastly, both Holger Levsen [] and Vagrant Cascadian [] performed some node maintenance.
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:
As people around the world come to understand how LLMs behave, more and more
people wonder why these models hallucinate, and what can be done to reduce it.
This provocatively named article by Michael Townsen Hicks, James Humphries and
Joe Slater is an excellent primer for better understanding how LLMs work and
what to expect from them.
As humans who carry out our relations using language as the main tool, we are
easily in awe of the apparent ease with which ChatGPT (the first widely
available, and to this day probably the best known, LLM-based automated
chatbot) simulates human-like understanding, and of how it helps us to easily
carry out even daunting data aggregation tasks. It is common that people ask
ChatGPT for an answer and, if it gets part of the answer wrong, they justify it
by stating that it's just a hallucination. Hicks et al. invite us to switch
from that characterization to a more correct one: LLMs are bullshitting. This
term is formally presented by Frankfurt [1]. To bullshit is not the same as to
lie, because lying requires knowing (and wanting to cover up) the truth. A
bullshitter does not necessarily know the truth; they just have to provide a
compelling description, regardless of whether it is aligned with the truth.
After introducing Frankfurt's ideas, the authors explain the fundamental ideas
behind LLM-based chatbots such as ChatGPT. A Generative Pre-trained Transformer
(GPT) has as its only goal to produce human-like text. It does so mainly by
producing output that matches the input's high-dimensional abstract vector
representation, probabilistically emitting the next token (word) iteratively,
conditioned on the text produced so far. Clearly, a GPT's task is not to seek
truth or to convey useful information: it is built to provide a normal-seeming
response to the prompts provided by its user. Core data are not queried to
find optimal solutions for the user's requests; instead, text is generated on
the requested topic, attempting to mimic the style of the document set the
model was trained with.
Erroneous data emitted by an LLM is, thus, not comparable to what a person
could hallucinate, but appears because the model has no understanding of
truth; in a way, this is very fitting with the current state of the world, a
time often termed the age of post-truth [2]. Requesting an LLM to provide
truth in its answers is basically impossible, given the difference between
intelligence and consciousness: following Harari's definitions [3], LLM
systems, or any AI-based system, can be seen as intelligent, as they have the
ability to attain goals in various, flexible ways, but they cannot be seen as
conscious, as they have no ability to experience subjectivity. That is, the
LLM is, by definition, bullshitting its way towards an answer: its goal is to
provide an answer, not to interpret the world in a trustworthy way.
The authors close their article with a plea for literature on the topic to adopt
the more correct term "bullshit" instead of the vacuous, anthropomorphizing
"hallucination". Of course, as the word is already loaded with a negative
meaning, it is an unlikely request to be granted.
This is a great article that mixes together Computer Science and Philosophy, and
can shed some light on a topic that is hard to grasp for many users.
[1] Frankfurt, Harry (2005). On Bullshit. Princeton University Press.
[2] Zoglauer, Thomas (2023). Constructed truths: truth and knowledge in a
post-truth world. Springer.
[3] Harari, Yuval Noah (2024). Nexus: A Brief History of Information Networks
from the Stone Age to AI. Random House.
What is HMAC?
HMAC stands for Hash-Based Message Authentication Code. It's a specific way to use a cryptographic hash function (like SHA-1, SHA-256, etc.) along with a secret key to produce a fingerprint, or authentication tag, of some data. This fingerprint allows someone else with the same key to verify that the data hasn't been tampered with.
How HMAC Works
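Keyed Hashing: The core idea is to incorporate the secret key into the hashing process. This is done in a specific way, rather than by simply hashing the key concatenated with the message, to prevent attacks such as length extension, which would let an attacker who sees H(key || message) compute a valid hash for a longer message without knowing the key.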
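Inner and Outer Hashing: HMAC uses two rounds of hashing. The key is first padded to the hash's block size (64 bytes for SHA-1 and SHA-256) and XORed with two fixed constants, ipad (the byte 0x36 repeated) and opad (the byte 0x5c repeated). First, the message is hashed together with the key XOR ipad. Then the result of that hash is hashed again together with the key XOR opad: HMAC(K, m) = H((K xor opad) || H((K xor ipad) || m)). This nested construction is what defeats attacks like length extension.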
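This illustrative bash sketch shows the inner and outer passes by assembling HMAC-SHA256 by hand from two plain SHA-256 computations. It follows RFC 2104: a 64-byte block size, ipad 0x36 and opad 0x5c. The sketch assumes bash, the OpenSSL command-line tool, od and sed. hmac_sha256_manual is a hypothetical name. For real use, `openssl dgst -sha256 -hmac KEY` or a library implementation is the better choice.

```bash
#!/usr/bin/env bash
# Illustrative HMAC-SHA256 (RFC 2104) built from two plain SHA-256 passes.
# Assumes bash, the OpenSSL CLI, od and sed. Not meant for production use.
set -euo pipefail

to_hex()   { od -An -v -tx1 | tr -d ' \n'; }    # stdin bytes -> hex text
from_hex() { printf "$(sed 's/../\\x&/g')"; }    # hex text on stdin -> bytes

hmac_sha256_manual() {  # hypothetical helper: hmac_sha256_manual KEY MESSAGE
  local key="$1" msg="$2" key_hex ipad="" opad="" inner i b
  key_hex=$(printf '%s' "$key" | to_hex)
  # Keys longer than the 64-byte block are hashed first.
  if [ "${#key_hex}" -gt 128 ]; then
    key_hex=$(printf '%s' "$key" | openssl dgst -sha256 -binary | to_hex)
  fi
  # Pad the key with zero bytes up to the block size.
  while [ "${#key_hex}" -lt 128 ]; do key_hex+="00"; done
  # The two "modified versions of the key": K xor ipad and K xor opad.
  for ((i = 0; i < 128; i += 2)); do
    b=$((16#${key_hex:i:2}))
    ipad+=$(printf '%02x' $((b ^ 0x36)))
    opad+=$(printf '%02x' $((b ^ 0x5c)))
  done
  # Inner hash: H((K xor ipad) || message)
  inner=$( { printf '%s' "$ipad" | from_hex; printf '%s' "$msg"; } |
           openssl dgst -sha256 -binary | to_hex )
  # Outer hash: H((K xor opad) || inner hash)
  printf '%s%s' "$opad" "$inner" | from_hex | openssl dgst -sha256 -binary | to_hex
  echo
}

hmac_sha256_manual key 'The quick brown fox jumps over the lazy dog'
```

The example call uses the key "key" and the message "The quick brown fox jumps over the lazy dog". It prints the widely published test value f7bc83f430538424b13298e6aa6fb143ef4d59a14946175997479dbc2d1a3cd8, which `openssl dgst -sha256 -hmac key` also produces for the same input.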
HMAC in OpenSSH
OpenSSH uses HMAC to ensure the integrity of messages sent back and forth during an SSH session. This prevents an attacker from subtly modifying data in transit.
HMAC-SHA1 with OpenSSH: Is it Weak?
SHA-1 itself is considered cryptographically broken. This means that with enough computing power, it's possible to find collisions (two different messages that produce the same hash). However, HMAC-SHA1 is generally still considered secure for most purposes. This is because HMAC's security does not depend on the collision resistance of the underlying hash, so exploiting weaknesses in SHA-1 to break HMAC-SHA1 is much more difficult than just finding collisions in SHA-1.
Should you use it?
While HMAC-SHA1 might still be okay for now, it's best practice to move to stronger alternatives like HMAC-SHA256 or HMAC-SHA512. OpenSSH supports these, and they provide a greater margin of safety against future attacks.
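The bash function below is a hedged sketch of that migration. It reads MAC names, one per line as printed by `ssh -Q mac`, and turns them into an sshd_config MACs line that leaves out every SHA-1 based MAC. Only stronger options such as hmac-sha2-256 and hmac-sha2-512 then remain offered. The function name and the sample list are illustrative assumptions.

```bash
#!/usr/bin/env bash
# Minimal sketch: build an sshd_config "MACs" line that keeps every MAC read
# from stdin (one per line, as printed by `ssh -Q mac`) except the SHA-1
# based ones. The function name is hypothetical.
set -euo pipefail

macs_without_sha1() {
  local kept
  # grep -v drops hmac-sha1, hmac-sha1-96, hmac-sha1-etm@openssh.com, ...;
  # "|| true" keeps set -e from aborting if every line is filtered out.
  kept=$({ grep -v -i 'sha1' || true; } | paste -sd, -)
  printf 'MACs %s\n' "$kept"
}

# Typical use on a host (review the result before editing sshd_config):
#   ssh -Q mac | macs_without_sha1
#   sshd -T | grep -i '^macs'     # effective server setting (run as root)
printf '%s\n' hmac-sha1 hmac-sha2-256 hmac-sha1-etm@openssh.com hmac-sha2-512 |
  macs_without_sha1               # prints: MACs hmac-sha2-256,hmac-sha2-512
```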
In Summary
HMAC is a powerful tool for ensuring data integrity. Even though SHA-1 has weaknesses, HMAC-SHA1 in OpenSSH is likely still safe for most users. However, to be on the safe side and prepare for the future, switching to HMAC-SHA256 or HMAC-SHA512 is recommended.
Following are instructions for creating Dataproc clusters with SHA-1 MAC support removed:
I can appreciate an excess of caution, and I can offer you some code to produce Dataproc instances which do not allow HMAC authentication using sha1.
Place code similar to this in a startup script or an initialization action that you reference when creating a cluster with gcloud dataproc clusters create:
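#!/bin/bash
# remove any existing MACs specification from the sshd configuration
# (sshd keywords are case-insensitive, hence GNU sed's I modifier)
sed -i -e '/^macs/Id' /etc/ssh/sshd_config
# append a MACs line listing every MAC this OpenSSH supports except SHA-1 ones;
# sshd uses the first value it reads for a keyword, so this assumes no earlier
# MACs line (e.g. under /etc/ssh/sshd_config.d/) and no trailing Match block
ssh -Q mac | perl -e \
  '@mac = grep { chomp; ! /sha1/ } <>; print("macs ", join(",", @mac), $/)' >> /etc/ssh/sshd_config
# check the new configuration, then reload the ssh service
/usr/sbin/sshd -t && systemctl reload ssh.service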
If this code is hosted on GCS, you can refer to it with
Welcome to the December 2024 report from the Reproducible Builds project!
Our monthly reports outline what we've been up to over the past month and highlight items of news from elsewhere in the world of software supply-chain security when relevant. As ever, however, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website.
reproduce.debian.net
Last month saw the introduction of reproduce.debian.net. Announced at the recent Debian MiniDebConf in Toulouse, reproduce.debian.net is an instance of rebuilderd operated by the Reproducible Builds project. rebuilderd is our server designed to monitor the official package repositories of Linux distributions and attempt to reproduce the observed results there.
This month, however, we are pleased to announce that not only does the service now produce graphs, but the reproduce.debian.net homepage itself has also become a start page of sorts, and the amd64.reproduce.debian.net and i386.reproduce.debian.net pages have emerged. The first of these rebuilds the amd64 architecture, naturally, but it is also building Debian packages that are marked with the "no architecture" label, all. The second builder is, however, only rebuilding the i386 architecture.
Both of these services were also switched to reproduce the Debian trixie distribution instead of unstable, which started with 43% of the archive rebuilt, with 79.3% reproduced successfully. This is very much a work in progress, and we'll start reproducing Debian unstable soon.
Our i386 hosts are very kindly sponsored by Infomaniak, whilst the amd64 node is sponsored by OSUOSL; thank you! Indeed, we are looking for more workers for more Debian architectures; please contact us if you are able to help.
debian-repro-status
Reproducible builds developer kpcyrd has published a client program for reproduce.debian.net (see above) that queries the status of the locally installed packages and rates the system with a percentage score. This tool works analogously to arch-repro-status for the Arch Linux Reproducible Builds setup.
The tool was packaged for Debian and is currently available in Debian trixie: it can be installed with apt install debian-repro-status.
Bernhard M. Wiedemann wrote a detailed post on his long journey towards a bit-reproducible Emacs package. In his interesting message, Bernhard goes into depth about the tools that he used and the lower-level technical details of, for instance, compatibility with the version of glibc within openSUSE.
Shivanand Kunijadar posed a question pertaining to the reproducibility issues with encrypted images. Shivanand explains that they must use a random IV for encryption with AES CBC. The resulting artifact is not reproducible due to the random IV used. The message resulted in a handful of replies, hopefully helpful!
Lastly, kpcyrd followed up on a post from September 2024 which mentioned their desire for someone to implement a hashset of allowed module hashes that is generated during the kernel build and then embedded in the kernel image, thus enabling a deterministic and reproducible build. However, they are now reporting that somebody implemented the hash-based allow list feature and submitted it to the Linux kernel mailing list. Like kpcyrd, we hope it gets merged.
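The reproducibility appeal of such an allow list is presumably that signing modules with a throwaway per-build key embeds random data in the kernel, whereas a list of module hashes depends only on the modules themselves. The bash sketch below is a userspace analogue only, not the kernel patch, whose interface is not described here. It records SHA-256 hashes of build outputs at build time and later accepts a file only if its hash is on the list. All file names are hypothetical.

```bash
#!/usr/bin/env bash
# Userspace analogue (not the kernel feature): record SHA-256 hashes of build
# outputs in an allow list, then accept a file only if its hash is listed.
# All file names here are hypothetical.
set -euo pipefail

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
printf 'module a\n' > "$work/a.ko"
printf 'module b\n' > "$work/b.ko"

# "Build time": generate the allow list from the freshly built outputs.
(cd "$work" && sha256sum a.ko b.ko > allowlist.sha256)

# is_allowed FILE: succeed only if FILE's hash appears in the allow list.
is_allowed() {
  local h
  h=$(sha256sum "$1" | cut -d' ' -f1)
  grep -q "^${h} " "$work/allowlist.sha256"
}

is_allowed "$work/a.ko" && echo "a.ko allowed"
printf 'tampered\n' >> "$work/b.ko"       # modify b.ko after the "build"
is_allowed "$work/b.ko" || echo "b.ko rejected"
```

Nothing secret or random goes into the list, so two independent builds of the same sources produce the same allow list.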
Enhancing the Security of Software Supply Chains: Methods and Practices
Mehdi Keshani of the Delft University of Technology in the Netherlands has published their thesis on Enhancing the Security of Software Supply Chains: Methods and Practices. Their introductory summary first begins with an outline of software supply chains and the importance of the Maven ecosystem before outlining the issues that it faces that threaten its security and effectiveness. To address these:
First, we propose an automated approach for library reproducibility to enhance library security during the deployment phase. We then develop a scalable call graph generation technique to support various use cases, such as method-level vulnerability analysis and change impact analysis, which help mitigate security challenges within the ecosystem. Utilizing the generated call graphs, we explore the impact of libraries on their users. Finally, through empirical research and mining techniques, we investigate the current state of the Maven ecosystem, identify harmful practices, and propose recommendations to address them.
diffoscope
diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 283 and 284 to Debian:
Update copyright years. []
Update tests to support file 5.46. [][]
Simplify tests_quines.py::test_{differences,differences_deb} to simply use assert_diff and not mangle the test fixture. []
Supply-chain attack in the Solana ecosystem
A significant supply-chain attack impacted Solana, an ecosystem for decentralised applications running on a blockchain.
Hackers targeted the @solana/web3.js JavaScript library and embedded malicious code that extracted private keys and drained funds from cryptocurrency wallets. According to some reports, about $160,000 worth of assets were stolen, not including SOL tokens and other crypto assets.
Website updates
Similar to last month, there were a large number of changes made to our website this month, including:
Chris Lamb:
Make the landing page hero look nicer when the vertical height component of the viewport is restricted, not just the horizontal width.
Rename the Buy-in page to Why Reproducible Builds? []
Removing the top black border. [][]
Holger Levsen:
Fixed a number of issues on the 2024 Summit page, including the path to a sponsor logo [], and also added the event documentation from Aspiration [].
Check and cleanup a presentation formerly linked from the About page on the Debian wiki. []
Remove the sidebar-type layout and move to a static navigation element. [][][][]
Create and merge a new Success stories page, which highlights the success stories of Reproducible Builds, showcasing real-world examples of projects shipping with verifiable, reproducible builds. These stories aim to enhance the technical resilience of the initiative by encouraging community involvement and inspiring new contributions. []
Remove the translation icon from the navigation bar. []
Remove unused CSS styles pertaining to the sidebar. []
Add sponsors to the global footer. []
Add extra space on large screens on the Who page. []
Hide the side navigation on small screens on the Documentation pages. []
Debian changes
There were a significant number of reproducibility-related changes within Debian this month, including:
Santiago Vila uploaded version 0.11+nmu4 of the dh-buildinfo package. In this release, dh_buildinfo becomes a no-op, i.e. it no longer does anything beyond warning the developer that the dh-buildinfo package is now obsolete. In his upload, Santiago wrote that "We still want packages to drop their [dependency] on dh-buildinfo, but now they will immediately benefit from this change after a simple rebuild."
Holger Levsen filed Debian bug #1091550 requesting a rebuild of a number of packages that were built with a very old version of dpkg.
Gioele Barabucci filed a number of bugs against the debrebuild component/script of the devscripts package, including:
#1089087: Address a spurious extra subdirectory in the build path.
#1089201: Extra zero bytes added to .dynstr when rebuilding CMake projects.
#1089088: Some binNMUs have a 1-second offset in some timestamps.
Gioele Barabucci also filed a bug against the dh-r package to report that the Recommends and Suggests fields are missing from rebuilt R packages. At the time of writing, this bug has no patch and needs some help to make over 350 binary packages reproducible.
The IzzyOnDroid Android APK repository published an extensive Review of 2024 and Outlook for 2025 which includes statistics and future plans related to reproducible builds (including having passed the 30% mark this month).
The historic Arch Linux reproducibility tests that were hosted at tests.reproducible-builds.org/archlinux now redirect to reproducible.archlinux.org instead. In fact, everything Arch-related has now been removed from the jenkins.debian.net.git repository, as those continuous integration tests have been disabled for some time.
Use a non-constant object to test memory address capture. []
rebuilderd was updated as follows by kpcyrd:
Migrate diesel dependency from 1.x to 2.x. []
Migrate clap dependency from 2 to 4. []
Refactor reqwest code and replace the openssl dependency with the memory-safe rustls. [][]
Lastly, in openSUSE, Bernhard M. Wiedemann published another report for the distribution. There, Bernhard reports on the success of building R-B-OS, a partial fork of openSUSE with only 100% bit-reproducible packages. This effort was sponsored by the NLnet NGI0 initiative.
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
Reproducibility testing framework
The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. This month, a number of changes were made by Holger Levsen, including:
Lastly, Gioele Barabucci also classified packages affected by the 1-second offset issue filed as Debian bug #1089088 [][][][], Chris Hofstaedtler updated the URL for Grml's dpkg.selections file [], Roland Clobus updated the Jenkins log parser to parse warnings from diffoscope [] and Mattia Rizzolo banned a number of bots and crawlers from the service [][].
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via: