Akvorado 2.0 was released today! Akvorado collects network flows with
IPFIX and sFlow. It enriches flows and stores them in a ClickHouse database.
Users can browse the data through a web console. This release introduces an
important architectural change and other smaller improvements. Let s dive in!
New outlet service
The major change in Akvorado 2.0 is splitting the inlet service into two
parts: the inlet and the outlet. Previously, the inlet handled all flow
processing: receiving, decoding, and enrichment. Flows were then sent to Kafka
for storage in ClickHouse:
Akvorado flow processing before the introduction of the outlet service
Network flows reach the inlet service using UDP, an unreliable protocol. The
inlet must process them fast enough to avoid losing packets. To handle a
high number of flows, the inlet spawns several sets of workers to receive flows,
fetch metadata, and assemble enriched flows for Kafka. Many configuration
options existed for scaling, which increased complexity for users. The code
needed to avoid blocking at any cost, making the processing pipeline complex
and sometimes unreliable, particularly the BMP receiver.1 Adding new
features became difficult without making the problem worse.2
In Akvorado 2.0, the inlet receives flows and pushes them to Kafka without
decoding them. The new outlet service handles the remaining tasks:
Akvorado flow processing after the introduction of the outlet service
This change goes beyond a simple split:3 the outlet now reads flows from
Kafka and pushes them to ClickHouse, two tasks that Akvorado did not handle
before. Flows are heavily batched to increase efficiency and reduce the load
on ClickHouse using ch-go, a low-level Go client for ClickHouse. When
batches are too small, asynchronous inserts are used (e20645). The number of
outlet workers scales dynamically (e5a625) based on the target batch
size and latency (50,000 flows and 5 seconds by default).
This new architecture also allows us to simplify and optimize the code. The
outlet fetches metadata synchronously (e20645). The BMP component becomes
simpler by removing cooperative multitasking (3b9486). Reusing the same
RawFlow object to decode protobuf-encoded flows from Kafka reduces pressure on
the garbage collector (8b580f).
The effect on Akvorado s overall performance was somewhat uncertain, but a
user reported 35% lower CPU usage after migrating from the previous
version, plus resolution of the long-standing BMP component issue.
Other changes
This new version includes many miscellaneous changes, such as completion for
source and destination ports (f92d2e), and automatic restart of the
orchestrator service (0f72ff) when configuration changes to avoid a common
pitfall for newcomers.
Let s focus on some key areas for this release: observability,
documentation, CI,
Docker, Go, and JavaScript.
Observability
Akvorado exposes metrics to provide visibility into the processing pipeline and
help troubleshoot issues. These are available through Prometheus HTTP metrics
endpoints, such as /api/v0/inlet/metrics. With the introduction
of the outlet, many metrics moved. Some were also renamed (4c0b15) to match
Prometheus best practices. Kafka consumer lag was added as a new metric
(e3a778).
If you do not have your own observability stack, the Docker Compose setup
shipped with Akvorado provides one. You can enable it by activating the profiles
introduced for this purpose (529a8f).
The prometheus profile ships Prometheus to store metrics and Alloy
to collect them (2b3c46, f81299, and 8eb7cd). Redis and Kafka
metrics are collected through the exporter bundled with Alloy (560113).
Other metrics are exposed using Prometheus metrics endpoints and are
automatically fetched by Alloy with the help of some Docker labels, similar to
what is done to configure Traefik. cAdvisor was also added (83d855) to
provide some container-related metrics.
The loki profile ships Loki to store logs (45c684). While Alloy
can collect and ship logs to Loki, its parsing abilities are limited: I could
not find a way to preserve all metadata associated with structured logs produced
by many applications, including Akvorado. Vector replaces Alloy (95e201)
and features a domain-specific language, VRL, to transform logs. Annoyingly,
Vector currently cannot retrieve Docker logs from before it was
started.
Finally, the grafana profile ships Grafana, but the shipped dashboards are
broken. This is planned for a future version.
Documentation
The Docker Compose setup provided by Akvorado makes it easy to get the web
interface up and running quickly. However, Akvorado requires a few mandatory
steps to be functional. It ships with comprehensive documentation, including
a chapter about troubleshooting problems. I hoped this documentation would
reduce the support burden. It is difficult to know if it works. Happy users
rarely report their success, while some users open discussions asking for help
without reading much of the documentation.
In this release, the documentation was significantly improved.
The documentation was updated (fc1028) to match Akvorado s new architecture.
The troubleshooting section was rewritten (17a272). Instructions on how to
improve ClickHouse performance when upgrading from versions earlier than 1.10.0
was added (5f1e9a). An LLM proofread the entire content (06e3f3).
Developer-focused documentation was also improved (548bbb, e41bae, and
871fc5).
From a usability perspective, table of content sections are now collapsable
(c142e5). Admonitions help draw user attention to important points
(8ac894).
Example of use of admonitions in Akvorado's documentation
Continuous integration
This release includes efforts to speed up continuous integration on GitHub.
Coverage and race tests run in parallel (6af216 and fa9e48). The Docker
image builds during the tests but gets tagged only after they succeed
(8b0dce).
GitHub workflow to test and build Akvorado
End-to-end tests (883e19) ensure the shipped Docker Compose setup works as
expected. Hurl runs tests on various HTTP endpoints, particularly to verify
metrics (42679b and 169fa9). For example:
## Test inlet has received NetFlow flowsGEThttp://127.0.0.1:8080/prometheus/api/v1/query[Query]query:sum(akvorado_inlet_flow_input_udp_packets_total job="akvorado-inlet",listener=":2055" )
HTTP200[Captures]inlet_receivedflows:jsonpath "$.data.result[0].value[1]" toInt
[Asserts]variable"inlet_receivedflows">10## Test inlet has sent them to KafkaGEThttp://127.0.0.1:8080/prometheus/api/v1/query[Query]query:sum(akvorado_inlet_kafka_sent_messages_total job="akvorado-inlet" )
HTTP200[Captures]inlet_sentflows:jsonpath "$.data.result[0].value[1]" toInt
[Asserts]variable"inlet_sentflows">=inlet_receivedflows
Docker
Akvorado ships with a comprehensive Docker Compose setup to help users get
started quickly. It ensures a consistent deployment, eliminating many
configuration-related issues. It also serves as a living documentation of the
complete architecture.
This release brings some small enhancements around Docker:
Previously, many Docker images were pulled from the Bitnami Containers
library. However, VMWare acquired Bitnami in 2019 and Broadcom acquired
VMWare in 2023. As a result, Bitnami images were deprecated in less than a
month. This was not really a surprise4. Previous versions of Akvorado
had already started moving away from them. In this release, the Apache project s
Kafka image replaces the Bitnami one (1eb382). Thanks to the switch to KRaft
mode, Zookeeper is no longer needed (0a2ea1, 8a49ca, and f65d20).
Akvorado s Docker images were previously compiled with Nix. However, building
AArch64 images on x86-64 is slow because it relies on QEMU userland emulation.
The updated Dockerfile uses multi-stage and multi-platform builds: one
stage builds the JavaScript part on the host platform, one stage builds the Go
part cross-compiled on the host platform, and the final stage assembles the
image on top of a slim distroless image (268e95 and d526ca).
# This is a simplified versionFROM--platform=$BUILDPLATFORMnode:20-alpineASbuild-js
RUNapkadd--no-cachemake
WORKDIR/buildCOPYconsole/frontendconsole/frontend
COPYMakefile.
RUNmakeconsole/data/frontend
FROM--platform=$BUILDPLATFORMgolang:alpineASbuild-go
RUNapkadd--no-cachemakecurlzip
WORKDIR/buildCOPY..
COPY--from=build-js/build/console/data/frontendconsole/data/frontend
RUNgomoddownload
RUNmakeall-indep
ARGTARGETOSTARGETARCHTARGETVARIANTVERSION
RUNmake
FROMgcr.io/distroless/static:latestCOPY--from=build-go/build/bin/akvorado/usr/local/bin/akvorado
ENTRYPOINT["/usr/local/bin/akvorado"]
When building for multiple platforms with --platform
linux/amd64,linux/arm64,linux/arm/v7, the build steps until the highlighted
line execute only once for all platforms. This significantly speeds up the
build.
Akvorado now ships Docker images for these platforms: linux/amd64,
linux/amd64/v3, linux/arm64, and linux/arm/v7. When requesting
ghcr.io/akvorado/akvorado, Docker selects the best image for the current CPU.
On x86-64, there are two choices. If your CPU is recent enough, Docker
downloads linux/amd64/v3. This version contains additional optimizations and
should run faster than the linux/amd64 version. It would be interesting to
ship an image for linux/arm64/v8.2, but Docker does not support the same
mechanism for AArch64 yet (792808).
Go
This release includes many changes related to Go but not visible to the users.
Toolchain
In the past, Akvorado supported the two latest Go versions, preventing immediate
use of the latest enhancements. The goal was to allow users of stable
distributions to use Go versions shipped with their distribution to compile
Akvorado. However, this became frustrating when interesting features, like go
tool, were released. Akvorado 2.0 requires Go 1.25 (77306d) but can be
compiled with older toolchains by automatically downloading a newer one
(94fb1c).5 Users can still override GOTOOLCHAIN to revert this
decision. The recommended toolchain updates weekly through CI to ensure we get
the latest minor release (5b11ec). This change also simplifies updates to
newer versions: only go.mod needs updating.
Thanks to this change, Akvorado now uses wg.Go() (77306d) and I have
started converting some unit tests to the new test/synctest package
(bd787e, 7016d8, and 159085).
Testing
When testing equality, I use a helper function Diff() to display the
differences when it fails:
This function uses kylelemons/godebug. This package is
no longer maintained and has some shortcomings: for example, by default, it does
not compare struct private fields, which may cause unexpectedly successful
tests. I replaced it with google/go-cmp, which is stricter
and has better output (e2f1df).
Another package for Kafka
Another change is the switch from Sarama to franz-go to interact with
Kafka (756e4a and 2d26c5). The main motivation for this change is to
get a better concurrency model. Sarama heavily relies on channels and it is
difficult to understand the lifecycle of an object handed to this package.
franz-go uses a more modern approach with callbacks6 that is both more
performant and easier to understand. It also ships with a package to spawn fake
Kafka broker clusters, which is more convenient than the mocking functions
provided by Sarama.
Improved routing table for BMP
To store its routing table, the BMP component used
kentik/patricia, an implementation of a patricia tree
focused on reducing garbage collection pressure.
gaissmai/bart is a more recent alternative using an
adaptation of [Donald Knuth s ART algorithm][] that promises better
performance and delivers it: 90% faster lookups and 27% faster
insertions (92ee2e and fdb65c).
Unlike kentik/patricia, gaissmai/bart does not help efficiently store values
attached to each prefix. I adapted the same approach as kentik/patricia to
store route lists for each prefix: store a 32-bit index for each prefix, and use
it to build a 64-bit index for looking up routes in a map. This leverages Go s
efficient map structure.
gaissmai/bart also supports a lockless routing table version, but this is not
simple because we would need to extend this to the map storing the routes and to
the interning mechanism. I also attempted to use Go s new unique package to
replace the intern package included in Akvorado, but performance was
worse.7
Miscellaneous
Previous versions of Akvorado were using a custom Protobuf encoder for
performance and flexibility. With the introduction of the outlet service,
Akvorado only needs a simple static schema, so this code was removed. However,
it is possible to enhance performance with
planetscale/vtprotobuf (e49a74, and 8b580f).
Moreover, the dependency on protoc, a C++ program, was somewhat annoying.
Therefore, Akvorado now uses buf, written in Go, to convert a Protobuf
schema into Go code (f4c879).
Another small optimization to reduce the size of the Akvorado binary by
10 MB was to compress the static assets embedded in Akvorado in a ZIP file. It
includes the ASN database, as well as the SVG images for the documentation. A
small layer of code makes this change transparent (b1d638 and e69b91).
JavaScript
Recently, two large supply-chain attacks hit the JavaScript ecosystem: one
affecting the popular packages chalk and debug and another
impacting the popular package @ctrl/tinycolor. These attacks also
exist in other ecosystems, but JavaScript is a prime target due to heavy use of
small third-party dependencies. The previous version of Akvorado relied on 653
dependencies.
npm-run-all was removed (3424e8, 132 dependencies). patch-package was
removed (625805 and e85ff0, 69 dependencies) by moving missing
TypeScript definitions to env.d.ts. eslint was replaced with oxlint, a
linter written in Rust (97fd8c, 125 dependencies, including the plugins).
I switched from npm to Pnpm, an alternative package manager (fce383).
Pnpm does not run install scripts by default8 and prevents installing
packages that are too recent. It is also significantly faster.9 Node.js
does not ship Pnpm but it ships Corepack, which allows us to use Pnpm
without installing it. Pnpm can also list licenses used by each dependency,
removing the need for license-compliance (a35ca8, 42 dependencies).
For additional speed improvements, beyond switching to Pnpm and Oxlint, Vite
was replaced with its faster Rolldown version (463827).
After these changes, Akvorado only pulls 225 dependencies.
Next steps
I would like to land three features in the next version of Akvorado:
Add Grafana dashboards to complete the observability stack. See issue
#1906 for details.
Integrate OVH s Grafana plugin by providing a stable API for such
integrations. Akvorado s web console would still be useful for browsing
results, but if you want to build and share dashboards, you should switch to
Grafana. See issue #1895.
Move some work currently done in ClickHouse (custom dictionaries, GeoIP and IP
enrichment) back into the outlet service. This should give more flexibility
for adding features like the one requested in issue #1030. See issue #2006.
I started working on splitting the inlet into two parts more than one year ago.
I found more motivation in recent months, partly thanks to Claude Code,
which I used as a rubber duck. Almost none of the produced code was
kept:10 it is like an intern who does not learn.
Many attempts were made to make the BMP component both performant and
not blocking. See for example PR #254, PR #255, and PR #278.
Despite these efforts, this component remained problematic for most users.
See issue #1461 as an example.
Some features have been pushed to ClickHouse to avoid the
processing cost in the inlet. See for example PR #1059.
Broadcom is known for its user-hostile moves. Look at what happened
with VMWare.
As a Debian developer, I dislike these mechanisms that circumvent
the distribution package manager. The final straw came when Go 1.25 spent one month in the Debian NEW queue, an arbitrary mechanism I
don t like at all.
In the early years of Go, channels were heavily promoted. Sarama
was designed during this period. A few years later, a more nuanced approach
emerged. See notably Go channels are bad and you should feel bad.
This should be investigated further, but my theory is that the
intern package uses 32-bit integers, while unique uses 64-bit pointers.
See commit 74e5ac.
This is also possible with npm. See commit dab2f7.
An even faster alternative is Bun, but it is less available.
The exceptions are part of the code for the admonition blocks,
the code for collapsing the table of content, and part of the documentation.
A brand new release 0.1.14 of the RcppSimdJson
package is now on CRAN.
RcppSimdJson
wraps the fantastic and genuinely impressive simdjson library by Daniel Lemire and collaborators. Via
very clever algorithmic engineering to obtain largely branch-free code,
coupled with modern C++ and newer compiler instructions, it results in
parsing gigabytes of JSON parsed per second which is quite
mindboggling. The best-case performance is faster than CPU speed as
use of parallel SIMD instructions and careful branch avoidance can lead
to less than one cpu cycle per byte parsed; see the video of the talk by Daniel Lemire
at QCon.
This version includes the new major
upstream release 4.0.0 with major new features including a builder
for creating JSON from the C++ side objects. This is something a little
orthogonal to the standard R usage of the package to parse and load JSON
data but could still be of interest to some.
The short NEWS entry for this release follows.
Changes in version 0.1.14
(2025-09-13)
simdjson was upgraded to version 4.0.0
(Dirk in #96
Continuous integration now relies a token for codecov.io
My trip to pgday.at started Wednesday at the airport in D sseldorf. I was there on time, and the plane started with an estimated flight time of about 90 minutes. About half an hour into the flight, the captain announced that we would be landing in 30 minutes - in D sseldorf, because of some unspecified technical problems. Three hours after the original departure time, the plane made another attempt, and we made it to Vienna.
On the plane I had already met Dirk Krautschick who had the great honor of bringing Slonik (in the form of a big extra bag) to the conference, and we took a taxi to the hotel. On the taxi, the next surprise happened: Hans-J rgen Sch nig unfortunately couldn't make it to the conference, and his talks had to be replaced. I had submitted a talk to the conference, but it was not accepted, and neither queued on the reserve list. But two speakers on the reserve list had cancelled, and another was already giving a talk in parallel to the slot that had to be filled, so Pavlo messaged me if I could hold the talk - well of course I could. Before, I didn't have any specific plans for the evening yet, but suddenly I was a speaker, so I joined the folks going to the speakers dinner at the Wiener Grill Haus two corners from the hotel. It was a very nice evening, chatting with a lot of folks from the PostgreSQL community that I had not seen for a while.
Thursday was the conference day. The hotel was a short walk from the venue, the Apothekertrakt in Vienna's Schloss Sch nbrunn. The courtyard was already filled with visitors registering for the conference. Since I originally didn't have a talk scheduled, I had signed up to volunteer for a shift as room host. We got our badge and swag bag, and I changed into the "crew" T-shirt.
The opening and sponsor keynotes took place in the main room, the Orangerie. We were over 100 people in the room, but apparently still not enough to really fill it, so the acoustics with some echo made it a bit difficult to understand everything. I hope that part can be improved for next time (which is planned to happen!).
I was host for the Maximilian room, where the sponsor sessions were scheduled in the morning. The first talk was by our Peter Hofer, also replacing the absent Hans. He had only joined the company at the beginning of the same week, and was already tasked to give Hans' talk on PostgreSQL as Open Source. Of course he did well.
Next was Tanmay Sinha from Readyset. They are building a system that caches expensive SQL queries and selectively invalidates the cache whenever any data used by these queries changes. Whenever actually fixing the application isn't feasible, that system looks like an interesting alternative to manually maintaining materialized views, or perhaps using pg_ivm.
After lunch, I went to Federico Campoli's Mastering Index Performance, but really spent the time polishing the slides for my talk. I had given the original version at pgconf.de in Berlin in May, and the slides were still in German, so I had to do some translating. Luckily, most slides are just git commit messages, so the effort was manageable.
The next slot was mine, talking about Modern VACUUM. I started with a recap of MVCC, vacuum and freezing in PostgreSQL, and then showed how over the past years, the system was updated to be more focused (the PostgreSQL 8.4 visibility map tells vacuum which pages to visit), faster (12 made autovacuum run 10 times faster by default), less scary (14 has an emergency mode where freezing switches to maximum speed if it runs out of time; 16 makes freezing create much less WAL) and more performant (17 makes vacuum use much less memory). In summary, there is still room for the DBA to tune some knobs (for example, the default autovacuum_max_workers=3 isn't much), but the vacuum default settings are pretty much okay these days for average workloads. Specific workloads still have a whopping 31 postgresql.conf settings at their disposal just for vacuum.
Right after my talk, there was another vacuum talk: When Autovacuum Met FinOps by Mayuresh Bagayatkar. He added practical advice on tuning the performance in cloud environments. Luckily, our contents did not overlap.
After the coffee break, I was again room host, now for Floor Drees and Contributing to Postgres beyond code. She presented the various ways in which PostgreSQL is more than just the code in the Git repository: translators, web site, system administration, conference organizers, speakers, bloggers, advocates. As a member of the PostgreSQL Contributors Committee, I could only approve and we should closer cooperate in the future to make people's contributions to PostgreSQL more visible and give them the recognition they deserve.
That was already the end of the main talks and everyone rushed to the Orangerie for the lightning talks. My highlight was the Sheldrick Wildlife Trust. Tickets for the conference had included the option to donate for the elephants in Kenya, and the talk presented the trust's work in the elephant orphanage there.
After the conference had officially closed, there was a bonus track: the Celebrity DB Deathmatch, aptly presented by Boriss Mejias. PostgreSQL, MongoDB, CloudDB and Oracle were competing for the grace of a developer. MongoDB couldn't stand the JSON workload, CloudDB was dismissed for handing out new invoices all the time, and Oracle had even brought a lawyer to the stage, but then lost control over a literally 10 meter long contract with too much fine print. In the end, PostgreSQL (played by Floor) won the love of the developer (played by our Svitlana Lytvynenko).
The day closed with a gathering at the Brandauer Schlossbr u - just at the other end of the castle ground, but still a 15min walk away. We enjoyed good beer and Kaiserschmarrn. I went back to the hotel a bit before midnight, but some extended that time quite some bit more.
On Friday, my flight back was only in the afternoon, so I spent some time in morning in the Technikmuseum just next to the hotel, enjoying some old steam engines and a live demonstration of Tesla coils. This time, the flight actually went to the destination, and I was back in D sseldorf in the late afternoon.
In summary, pgday.at was a very nice event in a classy location. Thanks to the organizers for putting in all the work - and next year, Hans will hopefully be present in person!
The post A Trip To Vienna With Surprises appeared first on CYBERTEC PostgreSQL Services & Support.
Release 0.2.9 of our RcppSMC package arrived at
CRAN today. RcppSMC
provides Rcpp-based bindings to R for the Sequential Monte Carlo
Template Classes (SMCTC) by Adam
Johansen described in his JSS article. Sequential
Monte Carlo is also referred to as Particle Filter
in some contexts. The package now also features the Google Summer of Code
work by Leah South
in 2017, and by Ilya Zarubin in
2021.
This release is again entirely internal. It updates the code for the
just-released RcppArmadillo
15.0.2-1, in particular opts into Armadillo 15.0.2. And it makes one
small tweak to the continuous integration setup switching to the
r-ci action.
The release is summarized below.
Changes in RcppSMC
version 0.2.9 (2025-09-09)
Adjust to RcppArmadillo 15.0.* by
setting ARMA_USE_CURRENT and updating two expressions from
deprecated code
Rely on r-ci GitHub Action which includes the bootstrap
step
Welcome to the August 2025 report from the Reproducible Builds project!
Welcome to the latest report from the Reproducible Builds project for August 2025. These monthly reports outline what we ve been up to over the past month, and highlight items of news from elsewhere in the increasingly-important area of software supply-chain security. If you are interested in contributing to the Reproducible Builds project, please see the Contribute page on our website.
In this report:
Reproducible Builds Summit 2025
Please join us at the upcoming Reproducible Builds Summit, set to take place from October 28th 30th 2025 in Vienna, Austria!**
We are thrilled to host the eighth edition of this exciting event, following the success of previous summits in various iconic locations around the world, including Venice, Marrakesh, Paris, Berlin, Hamburg and Athens. Our summits are a unique gathering that brings together attendees from diverse projects, united by a shared vision of advancing the Reproducible Builds effort.
During this enriching event, participants will have the opportunity to engage in discussions, establish connections and exchange ideas to drive progress in this vital field. Our aim is to create an inclusive space that fosters collaboration, innovation and problem-solving.
If you re interesting in joining us this year, please make sure to read the event page which has more details about the event and location. Registration is open until 20th September 2025, and we are very much looking forward to seeing many readers of these reports there!
Reproducible Builds and live-bootstrap at WHY2025
WHY2025 (What Hackers Yearn) is a nonprofit outdoors hacker camp that takes place in Geestmerambacht in the Netherlands (approximately 40km north of Amsterdam). The event is organised for and by volunteers from the worldwide hacker community, and knowledge sharing, technological advancement, experimentation, connecting with your hacker peers, forging friendships and hacking are at the core of this event .
At this year s event, Frans Faase gave a talk on live-bootstrap, an attempt to provide a reproducible, automatic, complete end-to-end bootstrap from a minimal number of binary seeds to a supported fully functioning operating system .
Frans talk is available to watch on video and his slides are available as well.
DALEQ Explainable Equivalence for Java Bytecode
Jens Dietrich of the Victoria University of Wellington, New Zealand and Behnaz Hassanshahi of Oracle Labs, Australia published an article this month entitled DALEQ Explainable Equivalence for Java Bytecode which explores the options and difficulties when Java binaries are not identical despite being from the same sources, and what avenues are available for proving equivalence despite the lack of bitwise correlation:
[Java] binaries are often not bitwise identical; however, in most cases, the differences can be attributed to variations in the build environment, and the binaries can still be considered equivalent. Establishing such equivalence, however, is a labor-intensive and error-prone process.
Jens and Behnaz therefore propose a tool called DALEQ, which:
disassembles Java byte code into a relational database, and can normalise this database by applying Datalog rules. Those databases can then be used to infer equivalence between two classes. Notably, equivalence statements are accompanied with Datalog proofs recording the normalisation process. We demonstrate the impact of DALEQ in an industrial context through a large-scale evaluation involving 2,714 pairs of jars, comprising 265,690 class pairs. In this evaluation, DALEQ is compared to two existing bytecode transformation tools. Our findings reveal a significant reduction in the manual effort required to assess non-bitwise equivalent artifacts, which would otherwise demand intensive human inspection. Furthermore, the results show that DALEQ outperforms existing tools by identifying more artifacts rebuilt from the same code as equivalent, even when no behavioral differences are present.
Reproducibility regression identifies issue with AppArmor security policies
Tails developer intrigeri has tracked and followed a reproducibility regression in the generation of AppArmor policy caches, and has identified an issue with the 4.1.0 version of AppArmor.
Although initially tracked on the Tails issue tracker, intrigeri filed an issue on the upstream bug tracker. AppArmor developer John Johansen replied, confirming that they can reproduce the issue and went to work on a draft patch. Through this, John revealed that it was caused by an actual underlying security bug in AppArmor that is to say, it resulted in permissions not (always) matching what the policy intends and, crucially, not merely a cache reproducibility issue.
Work on the fix is ongoing at time of writing.
Rust toolchain fixes
Rust Clippy is a linting tool for the Rust programming language. It provides a collection of lints (rules) designed to identify common mistakes, stylistic issues, potential performance problems and unidiomatic code patterns in Rust projects. This month, however, Sosth ne Gu don filed a new issue in the GitHub requesting a new check that would lint against non deterministic operations in proc-macros, such as iterating over a HashMap .
Dropping support for the armhf architecture. From July 2015, Vagrant Cascadian has been hosting a zoo of approximately 35 armhf systems which were used for building Debian packages for that architecture.
Holger Levsen also uploaded strip-nondeterminism, our program that improves reproducibility by stripping out non-deterministic information such as timestamps or other elements introduced during packaging. This new version, 1.14.2-1, adds some metadata to aid the deputy tool. ( #1111947)
Lastly, Bernhard M. Wiedemann posted another openSUSEmonthly update for their work there.
diffoscopediffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions, 303, 304 and 305 to Debian:
Improvements:
Use sed(1) backreferences when generating debian/tests/control to avoid duplicating ourselves. []
Move from a mono-utils dependency to versioned mono-devel mono-utils dependency, taking care to maintain the [!riscv64] architecture restriction. []
Use sed over awk to avoid mangling dependency lines containing = (equals) symbols such as version restrictions. []
Bug fixes:
Fix a test after the upload of systemd-ukify version 258~rc3. []
Ensure that Java class files are named .class on the filesystem before passing them to javap(1). []
Do not run jsondiff on files over 100KiB as the algorithm runs in O(n^2) time. []
Don t check for PyPDF version 3 specifically; check for >= 3. []
Misc:
Update copyright years. [][]
In addition, Martin Joerg fixed an issue with the HTML presenter to avoid crash when page limit is None [] and Zbigniew J drzejewski-Szmek fixed compatibility with RPM 6 []. Lastly, John Sirois fixed a missing requests dependency in the trydiffoscope tool. []
Website updates
Once again, there were a number of improvements made to our website this month including:
Chris Lamb:
Write and publish a news entry for the upcoming summit. []
Add some assets used at FOSSY, such as the badges and the paper handouts. []
Reproducibility testing framework
The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In August, however, a number of changes were made by Holger Levsen, including:
Ignore that the megacli RAID controller requires packages from Debian bookworm. []
In addition,
James Addison migrated away from deprecated toplevel deb822 Python module in favour of debian.deb822 in the bin/reproducible_scheduler.py script [] and removed a note on reproduce.debian.net note after the release of Debian trixie [].
Jochen Sprickerhof made a huge number of improvements to the reproduce.debian.net statistics calculation [][][][][][] as well as to the reproduce.debian.net service more generally [][][][][][][][].
Mattia Rizzolo performed a lot of work migrating scripts to SQLAlchemy version 2.0 [][][][][][] in addition to making some changes to the way openSUSE reproducibility tests are handled internally. []
Lastly, Roland Clobus updated the Debian Live packages after the release of Debian trixie. [][]
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:
The Release of Debian 13 ("Trixie") last month marked another milestone on the
effort to provide automated test support for Debian packages in their installed
form. We have achieved the mark of 57% of the source packages in the archive
declaring support for autopkgtest.
Release
Packages with tests
Total number of packages
% of packages with tests
wheezy
5
17175
0%
jessie
1112
20596
5%
stretch
5110
24845
20%
buster
9966
28501
34%
bullseye
13949
30943
45%
bookworm
17868
34341
52%
trixie
21527
37670
57%
The code that generated this table is provided at the bottom.
The growth rate has been consistently decreasing at each release after stretch.
That probably means that the low hanging fruit -- adding support en masse for
large numbers of similar packages, such as team-maintained packages for a given
programming language -- has been picked, and from now on the work gets slightly
harder. Perhaps there is a significant long tail of packages that will never
get autopkgtest support.
Looking for common prefixes among the packages missing a Testsuite: field
gives me us the largest groups of packages missing autopkgtest support:
There seems to be a fair amount of Haskell and Python. If someone could figure
out a way of testing installed fonts in a meaningful way, this would a be a
good niche where we can cover 300+ packages.
There is a another analysis that can be made, which I didn't: which percentage
of new packages introduced in a given release have declared autopkgtest
support, compared with the total of new packages in that release? My data only
counts the totals, so we start with the technical debt of the almost all of the
17,000 packages with no tests in wheezy, which was the stable at the time I
started Debian CI. How many of those got tests since
then?
Note that not supporting autopkgtest does not mean that a package is not
tested at all: it can run build-time tests, which are also useful. Not
supporting autopkgtest, though, means that their binaries in the archive
can't be automatically tested in their installed, but then there is a entire
horde of volunteers running testing and unstable on a daily basis who test
Debian and report bugs.
This is the script that produced the table in the beginning of this post:
#!/bin/shset-eu
extract()local release
local url
release="$1"url="$2"if[!-f"$ release"];then
rm-f"$ release.gz"
curl --silent-o$ release.gz "$ url"gunzip"$ release.gz"fi
local with_tests
local total
with_tests="$(grep-dctrl -c-F Testsuite --regex.$release)"total="$(grep-dctrl -c-F Package --regex.$release)"echo" $ release$ with_tests$ total$((100*with_tests/total))% "echo" **Release** **Packages with tests** **Total number of packages** **% of packages with tests** "echo" ------------- ------------------------- ------------------------------ ------------------------------ "for release in wheezy jessie stretch buster;do
extract "$ release""http://archive.debian.org/debian/dists/$ release/main/source/Sources.gz"done
for release in bullseye bookworm trixie;do
extract "$ release""http://ftp.br.debian.org/debian/dists/$ release/main/source/Sources.gz"done
About 95% of my Debian contributions this month were
sponsored by Freexian.
You can also support my work directly via
Liberapay or GitHub
Sponsors.
Python team
forky is
open!
As a result I m starting to think about the upcoming Python
3.14. At some point we ll doubtless do
a full test rebuild, but in advance of that I concluded that one of the most
useful things I could do would be to work on our very long list of packages
with new upstream
versions.
Of course there s no real chance of this ever becoming empty since upstream
maintainers aren t going to stop work for that long, but there are a lot of
packages there where we re quite a long way out of date, and many of those
include fixes that we ll need for 3.14, either directly or by fixing
interactions with new versions of other packages that in turn will need to
be fixed. We can backport changes when we need to, but more often than not
the most efficient way to do things is just to keep up to date.
So, I upgraded these packages to new upstream versions (deep breath):
That s only about 10% of the backlog, but of course others are working on
this too. If we can keep this up for a while then it should help.
I packaged pytest-run-parallel,
pytest-unmagic (still in NEW), and
python-forbiddenfruit (still in NEW),
all needed as new dependencies of various other packages.
setuptools upstream will be removing the setup.py install
command on 31
October. While this may not trickle down immediately into Debian, it does
mean that in the near future nearly all Python packages will have to use
pybuild-plugin-pyproject (note that this does not mean that they
necessarily have to use pyproject.toml; this is just a question of how the
packaging runs the build system). We talked about this a bit at DebConf,
and I said that I d noticed a number of packages where this isn t
straightforward and promised to write up some notes. I wrote the
Python/PybuildPluginPyproject
wiki page for this; I expect to add more bits and pieces to it as I find them.
On that note, I converted several packages to pybuild-plugin-pyproject:
I reviewed Debian defaults: nftables as banaction and systemd as
backend,
but it looked as though nothing actually needed to be changed so we closed
this with no action.
Rust team
Upgrading Pydantic was complicated, and required a rust-pyo3 transition
(which Jelmer Vernoo started and Peter Michael Green has mostly been
driving, thankfully), packaging rust-malloc-size-of (including an upstream
portability fix), and
upgrading several packages to new upstream versions:
As per the previous Polkit blog post the policykit framwork has lost the ability to understand its own .pkla files and policies need to be expressed in Javascript with .rules files now.
To re-enable allowing remote users (think ssh) to reboot, hibernate, suspend or power off the local system, create a 10-shutdown-reboot.rules file in /etc/polkit-1/rules.d/:
The diffoscope maintainers are pleased to announce the release of diffoscope
version 304. This version includes the following changes:
[ Chris Lamb ]
* Do not run jsondiff on files over 100KiB as the algorithm runs in O(n^2)
time. (Closes: reproducible-builds/diffoscope#414)
* Fix test after the upload of systemd-ukify 258~rc3 (vs. 258~rc2).
* Move from a mono-utils dependency to versioned "mono-devel mono-utils"
dependency, taking care to maintain the [!riscv64] architecture
restriction. (Closes: #1111742)
* Use sed -ne over awk -F= to to avoid mangling dependency lines containing
equals signs (=), for example version restrictions.
* Use sed backreferences when generating debian/tests/control to avoid DRY
violations.
* Update copyright years.
[ Martin Joerg ]
* Avoid a crash in the HTML presenter when page limit is None.
Utkarsh Gupta
did 15.0h (out of 1.0h assigned and 14.0h from previous period).
Evolution of the situation
In July, we released 24 DLAs.
Notable security updates:
angular.js, prepared by Bastien Roucari s, fixes multiple vulnerabilities including input sanitization and potential regular expression denial of service (ReDoS)
tomcat9, prepared by Markus Koschany, fixes an assortment of vulnerabilities
mediawiki, prepared by Guilhem Moulin, fixes several information disclosure and privilege escalation vulnerabilities
php7.4, prepared by Guilhem Moulin, fixes several server side request forgery and denial of service vulnerabilities
This month s contributions from outside the regular team include an update to thunderbird, prepared by Christoph Goehre (the package maintainer).
LTS Team members also contributed updates of the following packages:
commons-beanutils (to stable and unstable), prepared by Adrian Bunk
djvulibre (to oldstable, stable, and unstable), prepared by Adrian Bunk
git (to stable), prepared by Adrian Bunk
redis (to oldstable), prepared by Chris Lamb
libxml2 (to oldstable), prepared by Guilhem Moulin
commons-vfs (to oldstable), prepared by Daniel Leidert
Additionally, LTS Team member Santiago Ruano Rinc n proposed and implemented an improvement to the debian-security-support package. This package is available so that interested users can quickly determine if any installed packages are subject to limited security support or are excluded entirely from security support. However, there was not previously a way to identify explicitly supported packages, which has become necessary to note exceptions to broad exclusion policies (e.g., those which apply to substantial package groups, like modules belonging to the Go and Rust language ecosystems). Santiago s work has enabled the notation of exceptions to these exclusions, thus ensuring that users of debian-security-support have accurate status information concerning installed packages.
DebCamp 25 Security Tracker Sprint
The previously announced security tracker sprint took place at DebCamp from 7-13 July. Participants included 8 members of the standing LTS Team, 2 active Debian Developers with an interest in LTS, 3 community members, and 1 member of the Debian Security Team (who provided guidance and reviews on proposed changes to the security tracker); participation was a mix of in person at the venue in Brest, France and remote. During the days of the sprint, the team tackled a wide range of bugs and improvements, mostly targeting the security tracker.
The sprint participants worked on the following items:
Continued work (which was in progress prior to the sprint) on improved tooling to support security releases of packages from language ecosystems that rely heavily on static linking
As can be seen from the above list, only a small number of changes were brought to completion during the sprint week itself. Given the very compressed timeframe involved, the broad scope of tasks which were under consideration, and the highly sensitive data managed by the security tracker, this is not entirely unexpected and in no way diminishes the great work done by the sprint participants. The LTS Team would especially like to thank Salvatore Bonaccorso of the Debian Security Team for making himself available throughout the sprint to answer questions, for providing guidance on the work, and for helping the work by reviewing and merging the MRs which were able to merged during the sprint itself.
In the weeks that follow the sprint, the team will continue working towards completing the in progress items.
Thanks to our sponsors
Sponsors that joined recently are in bold.
Debian LTS
This was my hundred-thirty-third month that I did some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian. During my allocated time I uploaded or worked on:
[DLA 4255-1] audiofile security update of two CVEs related to an integer overflow and a memory leak.
[DLA 4256-1] libetpan security update to fix one CVE related to prevent a null pointer dereference.
[DLA 4257-1] libcaca security update to fix two CVEs related to heap buffer overflows.
[DLA 4258-1] libfastjson security update to fix one CVE related to an out of bounds write.
[#1106867] kmail-account-wizard was marked as accepted
I also continued my work on suricata, which turned out to be more challenging than expected. This month I also did a week of FD duties and attended the monthly LTS/ELTS meeting.
Debian ELTS
This month was the eighty-fourth ELTS month. Unfortunately my allocated hours were far less than expected, so I couldn t do as much work as planned.
Most of the time I spent with FD tasks and I also attended the monthly LTS/ELTS meeting. I further listened to the debusine talks during debconf. On the one hand I would like to use debusine to prepare uploads for embargoed ELTS issues, on the other hand I would like to use debusine to run the version of lintian that is used in the different releases. At the moment some manual steps are involved here and I tried to automate things. Of course like for LTS, I also continued my work on suricata.
Debian Printing
This month I uploaded a new upstream version of:
Guess what, I also started to work on a new version of hplip and intend to upload it in August.
This work is generously funded by Freexian!
Debian Astro
This month I uploaded new upstream versions of:
I also uploaded the new package boinor. This is a fork of poliastro, which was retired by upstream and removed from Debian some months ago. I adopted it and rebranded it at the desire of upstream. boinor is the abbreviation of BOdies IN ORbit and I hope this software is still useful.
Debian Mobcom
Unfortunately I didn t found any time to work on this topic.
misc
On my fight against outdated RFPs, I closed 31 of them in July. Their number is down to 3447 (how can you dare to open new RFPs? :-)). Don t be afraid of them, they don t bite and are happy to be released to a closed state.
FTP master
The peace will soon come to an end, so this month I accepted 87 and rejected 2 packages. The overall number of packages that got accepted was 100.
Welcome to the seventh report from the Reproducible Builds project in 2025. Our monthly reports outline what we ve been up to over the past month, and highlight items of news from elsewhere in the increasingly-important area of software supply-chain security. If you are interested in contributing to the Reproducible Builds project, please see the Contribute page on our website.
In this report:
Reproducible Builds Summit 2025
We are extremely pleased to announce the upcoming Reproducible Builds Summit, set to take place from October 28th 30th 2025 in Vienna, Austria!
We are thrilled to host the eighth edition of this exciting event, following the success of previous summits in various iconic locations around the world, including Venice, Marrakesh, Paris, Berlin, Hamburg and Athens. Our summits are a unique gathering that brings together attendees from diverse projects, united by a shared vision of advancing the Reproducible Builds effort.
During this enriching event, participants will have the opportunity to engage in discussions, establish connections and exchange ideas to drive progress in this vital field. Our aim is to create an inclusive space that fosters collaboration, innovation and problem-solving.
If you re interesting in joining us this year, please make sure to read the event page which has more details about the event and location. Registration is open until 20th September 2025, and we are very much looking forward to seeing many readers of these reports there!
[Everything] changed earlier this year when reproducible-builds for SLES-16 became an official goal for the product. More people are talking about digital sovereignty and supply-chain security now. [ ] Today, only 9 of 3319 (source) packages have significant problems left (plus 7 with pending fixes), so 99.5% of packages have reproducible builds.
There are numerous policy compliance and regulatory processes being developed that target software development but do they solve actual problems? Does it improve the quality of software? Do Software Bill of Materials (SBOMs) actually give you the information necessary to verify how a given software artifact was built? What is the goal of all these compliance checklists anyways or more importantly, what should the goals be? If a software object is signed, who should be trusted to sign it, and can they be trusted forever?
Hosted by the Software Freedom Conservancy and taking place in Portland, Oregon, USA, FOSSY aims to be a community-focused event: Whether you are a long time contributing member of a free software project, a recent graduate of a coding bootcamp or university, or just have an interest in the possibilities that free and open source software bring, FOSSY will have something for you . More information on the event is available on the FOSSY 2025 website, including the full programme schedule.
Vagrant and Chris also staffed a table, where they will be available to answer any questions about Reproducible Builds and discuss collaborations with other projects.
Automation to derive declarative build definitions for existing PyPI (Python), npm (JS/TS), and Crates.io (Rust) packages.
SLSA Provenance for thousands of packages across our supported ecosystems, meeting SLSA Build Level 3 requirements with no publisher intervention.
Build observability and verification tools that security teams can integrate into their existing vulnerability management workflows.
Infrastructure definitions to allow organizations to easily run their own instances of OSS Rebuild to rebuild, generate, sign, and distribute provenance.
One difference with most projects that aim for bit-for-bit reproducibility, OSS Rebuild aims for a kind of semantic reproducibility:
Through automation and heuristics, we determine a prospective build definition for a target package and rebuild it. We semantically compare the result with the existing upstream artifact, normalizing each one to remove instabilities that cause bit-for-bit comparisons to fail (e.g. archive compression).
The extensive post includes examples about how to access OSS Rebuild attestations using the Go-based command-line interface.
New extension of Python setuptools to support reproducible builds
Wim Jeantine-Glenn has written a PEP 517 Build backend in order to enable reproducible builds when building Python projects that use setuptools.
Called setuptools-reproducible, the project s README file contains the following:
Setuptools can create reproducible wheel archives (.whl) by setting SOURCE_DATE_EPOCH at build time, but setting the env var is insufficient for creating reproducible sdists (.tar.gz). setuptools-reproducible [therefore] wraps the hooks build_sdistbuild_wheel with some modifications to make reproducible builds by default.
diffoscopediffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 301, 302 and 303 to Debian:
Improvements:
Use Difference.from_operation in an attempt to pipeline the output of the extract-vmlinux script, potentially avoiding it all in memory. []
Memoize a number of calls to --version, saving a very large number of external subprocess calls.
Bug fixes:
Don t check for PyPDF version 3 specifically, check for versions greater than 3. []
Ensure that Java class files are named .class on the filesystem before passing them to javap(1). []
Mask stderr from extract-vmlinux script. [][]
Avoid spurious differences in h5dump output caused by exposure of absolute internal extraction paths. (#1108690)
Misc:
Use our_check_output in the ODT comparator. []
Update copyright years. []
In addition:
Siva Mahadevan made a change to use the --print-armap long option when calling nm(1) for wider compatibility. []
Lastly, Chris Lamb added a tmpfs to try.diffoscope.org so that diffoscope has a non-trivial temporary area to unpack archives, etc. []
Elsewhere in our tooling, however, reprotest is our tool for building the same source code twice in different environments and then checking the binaries produced by each build for any differences. This month, reprotest version 0.7.30 was uploaded to Debian unstable by Holger Levsen, chiefly including a change by Rebecca N. Palmer to not call sudo with the -h flag in order to fix Debian bug #1108550. []
New library to patch system functions for reproducibility
Nicolas Graves has written and published libfate, a simple collection of tiny libraries to patch system functions deterministically using LD_PRELOAD. According to the project s README:
libfate provides deterministic replacements for common non-deterministic system functions that can break reproducible builds. Instead of relying on complex build systems or apps or extensive patching, libfate uses the LD_PRELOAD trick to intercept system calls and return fixed, predictable values.
Describing why he wrote it, Nicolas writes:
I originally used the OpenSUSE dettrace approach to make Emacs reproducible in Guix. But when Guix switch to GCC@14, dettrace stopped working as expected. dettrace is a complex piece of software, my need was much less heavy: I don t need to systematically patch all sources of nondetermism, just the ones that make a process/binary unreproducible in a container/chroot.
One desirable property is that someone else should be able to reproduce the same git bundle, and not only that a single individual is able to reproduce things on one machine. It surprised me to see that when I ran the same set of commands on a different machine (started from a fresh git clone), I got a different checksum. The different checksums occurred even when nothing had been committed on the server side between the two runs.
Website updates
Once again, there were a number of improvements made to our website this month including:
Bernhard M. Wiedemann added an entry related to R-B-OS on the History page. []
Chris Lamb:
Replaced rbtlog run by Fay by rbtlog run by Benl on the Who is involved page. []
Debian contributors have made significant progress toward ensuring package builds produce byte-for-byte reproducible results. You can check the status for packages installed on your system using the new package debian-repro-status, or visit reproduce.debian.net for Debian s overall statistics for trixie and later. You can contribute to these efforts by joining #debian-reproducible on IRC to discuss fixes, or verify the statistics by installing the new rebuilderd package and setting up your own instance.
Reproducibility testing framework
The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In July, however, a number of changes were made by Holger Levsen, including:
Make the dsa-check-packages output more useful. []
Setup the ppc64el architecture again, has it has returned this time with a 2.7 GiB database instead of 72 GiB. []
In addition, Jochen Sprickerhof improved the reproducibility statistics generation:
Enable caching of statistics. [][][]
Add some common non-reproducible patterns. []
Change output to directory. []
Add a page sorted by diffoscope size. [][]
Switch to Python s argparse module and separate output(). []
Holger also submitted a number of Debian bugs against rebuilderd and rebuilderd-worker:
Config files and scripts for a simple one machine setup. [][]
Create a rebuilderd user. []
Create rebuilderd-worker user with sbuild. []
Lastly, Mattia Rizzolo added a scheduled job to renew some SSL certificates [] and Vagrant Cascadian performed some node maintenance [][].
Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:
Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:
This method is for wayland based systems. There are better ways to do this on GNOME or KDE desktops, but the method we are going to use is independent of DE/WM that you are using. I am doing this on sway window manager, but you can try this on any other Wayland based WM or DE. I have not tried this on Xorg based systems, there are several other guides for Xorg based systems online. When we connect a physical monitor to our laptops, it creates a second display output in our display settings that we can then re-arrange in layout, set resolution, set scale etc. Since we are not connecting via a physical interface like HDMI, DP, VGA etc. We need to create a virtual display within our system and set the display properties manually.
Get a list of current display outputs. You can also just check it in display settings of your DE/WM with wdisplays
rajudev@sanganak ~> swaymsg -t get_outputs
Output LVDS-1 &aposSeiko Epson Corporation 0x3047 Unknown&apos (focused)
Current mode: 1366x768 @ 60.002 Hz
Power: on
Position: 0,0
Scale factor: 1.000000
Scale filter: nearest
Subpixel hinting: rgb
Transform: normal
Workspace: 2
Max render time: off
Adaptive sync: disabled
Allow tearing: no
Available modes:
1366x768 @ 60.002 Hz
Single physical display of the laptopCurrently we are seeing only one display output. Our goal is to create a second virtual display that we will then share on the tablet/phone. To do this there are various tools available. We are using sway-vdctl . It is currently not available within Debian packages, so we need to install it manually.
$ vdctl --help
Usage: vdctl [OPTIONS] <ACTION> [VALUE]
Arguments:
<ACTION>
Possible values:
- create: Create new output based on a preset
- kill: Terminate / unplug an active preset
- list: List out active presets
- next-number: Manually set the next output number, in case something breaks
- sync-number: Sync the next output number using &aposswaymsg -t get_outputs&apos
[VALUE]
Preset name to apply, alternatively a value
[default: ]
Options:
--novnc
do not launch a vnc server, just create the output
-h, --help
Print help (see a summary with &apos-h&apos)
Before creating the virtual display, we need to set it&aposs properties at .config/vdctl/config.json . I am using Xiaomi Pad 6 tablet as my external display. You can adjust the properties according to the device you want to use as a second display. $ (text-editor) .config/vdctl/config.json
In the JSON, you can set the display resolution according to your external device and other configurations. If you want to configure multiple displays, you can add another entry into the presets in the json file. You can refer to example json file into the git repository. Now we need to actually create the virtual monitor.
$ vdctl create pad6
Created output, presumably &aposHEADLESS-1&apos
Set resolution of &aposHEADLESS-1&apos to 2800x1800
Set scale factor of &aposHEADLESS-1&apos to 2
Preset &apospad6&apos (&aposHEADLESS-1&apos: 2800x1800) is now active on port 9901
Now if you will check the display outputs in your display settings or from command line, you will see two different displays.
$ swaymsg -t get_outputs
Output LVDS-1 &aposSeiko Epson Corporation 0x3047 Unknown&apos
Current mode: 1366x768 @ 60.002 Hz
Power: on
Position: 0,0
Scale factor: 1.000000
Scale filter: nearest
Subpixel hinting: rgb
Transform: normal
Workspace: 2
Max render time: off
Adaptive sync: disabled
Allow tearing: no
Available modes:
1366x768 @ 60.002 Hz
Output HEADLESS-1 &aposUnknown Unknown Unknown&apos (focused)
Current mode: 2800x1800 @ 0.000 Hz
Power: on
Position: 1366,0
Scale factor: 2.000000
Scale filter: nearest
Subpixel hinting: unknown
Transform: normal
Workspace: 3
Max render time: off
Adaptive sync: disabled
Allow tearing: no
Also in the display settings. Display settings on Wayland with physical and virtual monitor outputNow we need to make this virtual display available over VNC which we will access with a VNC client on the tablet. To accomplish this I am using wayvnc but you can use any VNC server package. Install wayvnc
$ sudo apt install wayvnc
Now we will serve our virtual display HEADLESS-1 with wayvnc.
$ wayvnc -o HEADLESS-1 0.0.0.0 5900
You can adjust the port number as per your need. The process from laptop side is done. Now install any VNC software on your tablet. I am using AVNC, which is available on F-Droid. In the VNC software interface, add a new connection with the IP address of your laptop and the port started by wayvnc. Remember, both your laptop and phone need to be on the same Wi-Fi network.AVNC interface with the connection details to connect to the virtual monitor.Save and connect. Now you will be able to see a extended display on your tablet. Enjoy working with multiple screens in a portable setup. Till next time.. Have a great time.
(You wait ages for an archiving blog post and two come along at once!)
Between 1969-2019, the Newcastle University School of Computing published a
Technical Reports Series. Until 2017-ish, the full list of
individually-numbered reports was available on the School's website, as well as
full text PDFs for every report.
At some time around 2014 I was responsible for migrating the School's website
from self-managed to centrally-managed. The driver was to improve the website from
the perspective of student recruitment. The TR listings (as well as full
listings and texts for awarded PhD theses, MSc dissertations, Director's
reports and various others) survived the initial move. After I left (as staff)
in 2015, anything not specifically about student recruitment degraded and by
2017 the listings were gone.
I've been trying, on and off, to convince different parts of the University
to restore and take ownership of these lists ever since. For one reason or
another each avenue I've pursued has gone nowhere.
Recently the last remaining promising way forward failed, so I gave up and
did it myself. The list is now hosted by the Historic Computing Committee,
here:
https://nuhc.ncl.ac.uk/computing/techreports/
It's not complete (most of the missing entries are towards the end of the run),
but it's a start. The approach that finally yielded results was simply scraping
the Internet Archive Wayback Machine for various pages from back when the
material was represented on the School website, and then filling in the gaps
from some other sources.
What I envisage in the future: per-page reports with the relevant metadata
(including abstracts); authors de-duplicated and cross-referenced; PDFs OCRd;
providing access to the whole metadata DB (probably as as lump of JSON); a
mechanism for people to report errors; a platform for students to perform data
mining projects: perhaps some kind of classification/tagging by automated
content analysis; cross-referencing copies of papers in other venues (lots of
TRs are pre-prints).
In beginning of July I got my 12" framework
laptop and installed Debian
on it. During that setup I made some updates to my base
setup scripts that I use to
install Debian machines.
Due to the freeze I did not do much package related work. But I was at
DebConf and I uploaded a new
release of labwc to experimental, mostly to test
the tag2upload workflow.
I started working on packaging
wlr-sunclock which
is a small Wayland widget that displays the sun s shadows on the earth. I also
created an ITP for
wayback. Wayback is an X11
compatibility layer to allow to run X11 desktop environments using Wayland.
In my dayjob I did my usual work on
apis-core-rdf, which is our
Django application for managing prosopographic data. I implemented a password
change interface and did some restructuring of the templates. We released a new
version which was followed by a bugfix release a couple of days later.
I also implemented a rather big refactoring in
pfp-api. PFP-API is a
FastAPI based REST API that uses
rdfproxy to fetch data from a
Triplestore, converts the data to Pydantic models and then ships the models as
JSON. Most of the work is done by
rdfproxy in the background, but I
adapted the existing pfp-api code to make it easier to add new entity types.
Here s my 70th monthly but brief update about the activities I ve done in the F/L/OSS world.
Debian
This was my 79th month of actively contributing to Debian.
I became a DM in late March 2019 and a DD on Christmas 19! \o/
Debian was in freeze throughout so whilst I didn t do many uploads, there s a bunch of other things I did:
Attended DebConf25 in Brest, France.
Lead the bursary BOF and discussions.
Participated in other sessions, especially around the FTP masters.
I ve started to look at things with my trainee hat on.
Participated in the Debian Security Tracker sprints during DebCamp. More on that below.
Mentoring for newcomers.
Moderation of -project mailing list.
Ubuntu
This was my 54th month of actively contributing to Ubuntu.
I joined Canonical to work on Ubuntu full-time back in February 2021.
Whilst I can t give a full, detailed list of things I did (there s so much and some of it might not be public yet!), here s a quick TL;DR of what I did:
Got a recognition award for leading 24.04.2 LTS release and leading the Release Management team.
Preparing for the 24.04.3 LTS release early next month.
Debian (E)LTS
Debian Long Term Support (LTS) is a project to extend the lifetime of all Debian stable releases to (at least) 5 years. Debian LTS is not handled by the Debian security team, but by a separate group of volunteers and companies interested in making it a success.
And Debian Extended LTS (ELTS) is its sister project, extending support to the stretch and jessie release (+2 years after LTS support).
This was my 70th month as a Debian LTS and 57th month as a Debian ELTS paid contributor.
I only worked for 15.00 hours for LTS and 5.00 hours for ELTS and did the following things:
Coordinated with upstream due to lack of clarity on 1.11.4 being affected & not having a clear reproducer.
As 1.11.4 was still partially vulnerable and the backport was non-trivial, it was probably conveinent to bump the upstream version to 1.11.12 instead, fixing:
Helped Bastien with ca-certificates bit & coordinating with fellow Debian contributors.
Also triaged some newly supported packages for ELTS.
[LTS] Attended the monthly LTS meeting on IRC. Summary here.
Debian Security Tracker sprint 2025
Thanks to the LTS team for also organizing a security tracker sprint during DebCamp25. I attended the sprint and spent 10 hours working on the following tasks:
This includes some coordination with the web team and some back and forth with them to ensure both the Web team and the Security team would be happy with the fix. It also fixes more issues as mentioned in the description of the MR.
Show packages from next-point-release.txt in source package overview
Achieving full disk encryption using FIPS, TCG OPAL and LUKS to encrypt UEFI ESP on bare-metal and in VMs
Many security standards such as CIS and STIG require to protect information at rest. For example, NIST SP 800-53r5 SC-28 advocate to use cryptographic protection, offline storage and TPMs to enhance protection of information confidentiality and/or integrity.Traditionally to satisfy such controls on portable devices such as laptops one would utilize software based Full Disk Encryption - Mac OS X FileVault, Windows Bitlocker, Linux cryptsetup LUKS2. In cases when FIPS cryptography is required, additional burden would be placed onto these systems to operate their kernels in FIPS mode.Trusted Computing Group works on establishing many industry standards and specifications, which are widely adopted to improve safety and security of computing whilst keeping it easy to use. One of their most famous specifications them is TCG TPM 2.0 (Trusted Platform Module). TPMs are now widely available on most devices and help to protect secret keys and attest systems. For example, most software full disk encryption solutions can utilise TCG TPM to store full disk encryption keys providing passwordless, biometric or pin-base ways to unlock the drives as well as attesting that system have not been modified or compromised whilst offline.TCG Storage Security Subsystem Class: Opal Specification is a set of specifications for features of data storage devices. The authors and contributors to OPAL are leading and well trusted storage manufacturers such as Samsung, Western Digital, Seagate Technologies, Dell, Google, Lenovo, IBM, Kioxia, among others. One of the features that Opal Specification enables is self-encrypting drives which becomes very powerful when combined with pre-boot authentication. Out of the box, such drives always and transparently encrypt all disk data using hardware acceleration. To protect data one can enter UEFI firmware setup (BIOS) to set NVMe single user password (or user + administrator/recovery passwords) to encrypt the disk encryption key. If one's firmware didn't come with such features, one can also use SEDutil to inspect and configure all of this. Latest release of major Linux distributions have SEDutil already packaged.Once password is set, on startup, pre-boot authentication will request one to enter password - prior to booting any operating systems. It means that full disk is actually encrypted, including the UEFI ESP and all operating systems that are installed in case of dual or multi-boot installations. This also prevents tampering with ESP, UEFI bootloaders and kernels which with traditional software-based encryption often remain unencrypted and accessible. It also means one doesn't have to do special OS level repartitioning, or installation steps to ensure all data is encrypted at rest.What about FIPS compliance? Well, the good news is that majority of the OPAL compliant hard drives and/or security sub-chips do have FIPS 140-3 certification. Meaning they have been tested by independent laboratories to ensure they do in-fact encrypt data. On the CMVP website one can search for module name terms "OPAL" or "NVMe" or name of hardware vendor to locate FIPS certificates.Are such drives widely available? Yes. For example, a common Thinkpad X1 gen 11 has OPAL NVMe drives as standard, and they have FIPS certification too. Thus, it is likely in your hardware fleet these are already widely available. Use sedutil to check if MediaEncrypt and LockingSupported features are available.Well, this is great for laptops and physical servers, but you may ask - what about public or private cloud? Actually, more or less the same is already in-place in both. On CVMP website all major clouds have their disk encryption hardware certified, and all of them always encrypt all Virtual Machines with FIPS certified cryptography without an ability to opt-out. One is however in full control of how the encryption keys are managed: cloud-provider or self-managed (either with a cloud HSM or KMS or bring your own / external). See these relevant encryption options and key management docs for GCP, Azure, AWS. But the key takeaway without doing anything, at rest, VMs in public cloud are always encrypted and satisfy NIST SP 800-53 controls.What about private cloud? Most Linux based private clouds ultimately use qemu typically with qcow2 virtual disk images. Qemu supports user-space encryption of qcow2 disk, see this manpage. Such encryption encrypts the full virtual machine disk, including the bootloader and ESP. And it is handled entirely outside of the VM on the host - meaning the VM never has access to the disk encryption keys. Qemu implements this encryption entirely in userspace using gnutls, nettle, libgcrypt depending on how it was compiled. This also means one can satisfy FIPS requirements entirely in userspace without a Linux kernel in FIPS mode. Higher level APIs built on top of qemu also support qcow2 disk encryption, as in projects such as libvirt and OpenStack Cinder.If you carefully read the docs, you may notice that agent support is explicitly sometimes called out as not supported or not mentioned. Quite often agents running inside the OS may not have enough observability to them to assess if there is external encryption. It does mean that monitoring above encryption options require different approaches - for example monitor your cloud configuration using tools such as Wiz and Orca, rather than using agents inside individual VMs. For laptop / endpoint security agents, I do wish they would start gaining capability to report OPAL SED availability and status if it is active or not.What about using software encryption none-the-less on top of the above solutions? It is commonly referred to double or multiple encryption. There will be an additional performance impact, but it can be worthwhile. It really depends on what you define as data at rest for yourself and which controls you need. If one has a dual-boot laptop, and wants to keep one OS encrypted whilst booted into the other, it can perfectly reasonable to encrypted the two using separate software encryption keys. In addition to the OPAL encryption of the ESP. For more targeted per-file / per-folder encryption, one can look into using gocryptfs which is the best successor to the once popular, but now deprecated eCryptfs (amazing tool, but has fallen behind in development and can lead to data loss).All of the above mostly talks about cryptographic encryption, which only provides confidentially but not data integrity. To protect integrity, one needs to choose how to maintain that. dm-verity is a good choice for read-only and rigid installations. For read-write workloads, it may be easier to deploy ZFS or Btrfs instead. If one is using filesystems without a built-in integrity support such as XFS or Ext4, one can retrofit integrity layer to them by using dm-integrity (either standalone, or via dm-luks/cryptsetup --integrity option).
If one has a lot of estate and a lot of encryption keys to keep track off a key management solution is likely needed. The most popular solution is likely the one from Thales Group marketed under ChiperTrust Data Security Platform (previously Vormetric), but there are many others including OEM / Vendor / Hardware / Cloud specific or agnostic solutions.
I hope this crash course guide piques your interest to learn and discover modern confidentially and integrity solutions, and to re-affirm or change your existing controls w.r.t. to data protection at rest.
Full disk encryption, including UEFI ESP /boot/efi is now widely achievable by default on both baremetal machines and in VMs including with FIPS certification. To discuss more let's connect on Linkedin.
tl;dr: there is an attack in the wild which is triggering dangerous-but-seemingly-intended behaviour in the Oj JSON parser when used in the default and recommended manner, which can lead to everyone s favourite kind of security problem: object deserialization bugs!
If you have the oj gem anywhere in your Gemfile.lock, the quickest mitigation is to make sure you have Oj.default_options = mode: :strict somewhere, and that no library is overwriting that setting to something else.
Prologue
As a sensible sysadmin, all the sites I run send me a notification if any unhandled exception gets raised.
Mostly, what I get sent is error-handling corner cases I missed, but now and then things get more interesting.
In this case, it was a PG::UndefinedColumn exception, which looked something like this:
PG::UndefinedColumn: ERROR: column "xyzzydeadbeef" does not exist
This is weird on two fronts: firstly, this application has been running for a while, and if there was a schema problem, I d expect it to have made itself apparent long before now.
And secondly, while I don t profess to perfection in my programming, I m usually better at naming my database columns than that.
Something is definitely hinky here, so let s jump into the mystery mobile!
The column name is coming from outside the building!
The exception notifications I get sent include a whole lot of information about the request that caused the exception, including the request body.
In this case, the request body was JSON, and looked like this:
"name":":xyzzydeadbeef", ...
The leading colon looks an awful lot like the syntax for a Ruby symbol, but it s in a JSON string.
Surely there s no way a JSON parser would be turning that into a symbol, right?
Right?!?
Immediately, I thought that that possibly was what was happening, because I use Sequel for my SQL database access needs, and Sequel treats symbols as database column names.
It seemed like too much of a coincidence that a vaguely symbol-shaped string was being sent in, and the exact same name was showing up as a column name.
But how the flying fudgepickles was a JSON string being turned into a Ruby symbol, anyway?
Enter Oj.
Oj? I barely know aj
A long, long time ago, the standard Ruby JSON library had a reputation for being slow.
Thus did many competitors flourish, claiming more features and better performance.
Strong amongst the contenders was oj (for Optimized JSON ), touted as The fastest JSON parser and object serializer .
Given the history, it s not surprising that people who wanted the best possible performance turned to Oj, leading to it being found in a great many projects, often as a sub-dependency of a dependency of a dependency (which is how it ended up in my project).
You might have noticed in Oj s description that, in addition to claiming fastest , it also describes itself as an object serializer .
Anyone who has kept an eye on the security bug landscape will recall that object deserialization is a rich vein of vulnerabilities to mine.
Libraries that do object deserialization, especially ones with a history that goes back to before the vulnerability class was well-understood, are likely to be trouble magnets.
And thus, it turns out to be with Oj.
By default, Oj will happily turn any string that starts with a colon into a symbol:
How that gets exploited is only limited by the creativity of an attacker.
Which I ll talk about more shortly but first, a word from my rant cortex.
Insecure By Default is a Cancer
While the object of my ire today is Oj and its fast-and-loose approach to deserialization, it is just one example of a pervasive problem in software: insecurity by default.
Whether it s a database listening on 0.0.0.0 with no password as soon as its installed, or a library whose default behaviour is to permit arbitrary code execution, it all contributes to a software ecosystem that is an appalling security nightmare.
When a user (in this case, a developer who wants to parse JSON) comes across a new piece of software, they have by definition no idea what they re doing with that software.
They re going to use the defaults, and follow the most easily-available documentation, to achieve their goal.
It is unrealistic to assume that a new user of a piece of software is going to do things the right way , unless that right way is the only way, or at least the by-far-the-easiest way.
Conversely, the developer(s) of the software is/are the domain experts.
They have knowledge of the problem domain, through their exploration while building the software, and unrivalled expertise in the codebase.
Given this disparity in knowledge, it is tantamount to malpractice for the experts the developer(s) to off-load the responsibility for the safe and secure use of the software to the party that has the least knowledge of how to do that (the new user).
To apply this general principle to the specific case, take the Using section of the Oj README.
The example code there calls Oj.load, with no indication that this code will, in fact, parse specially-crafted JSON documents into Ruby objects.
The brand-user user of the library, no doubt being under pressure to Get Things Done, is almost certainly going to look at this Using example, get the apparent result they were after (a parsed JSON document), and call it a day.
It is unlikely that a brand-new user will, for instance, scroll down to the Further Reading section, find the second last (of ten) listed documents, Security.md , and carefully peruse it.
If they do, they ll find an oblique suggestion that parsing untrusted input is never a good idea .
While that s true, it s also rather unhelpful, because I d wager that by far the majority of JSON parsed in the world is untrusted , in one way or another, given the predominance of JSON as a format for serializing data passing over the Internet.
This guidance is roughly akin to putting a label on a car s airbags that driving at speed can be hazardous to your health : true, but unhelpful under the circumstances.
The solution is for default behaviours to be secure, and any deviation from that default that has the potential to degrade security must, at the very least, be clearly labelled as such.
For example, the Oj.load function should be named Oj.unsafe_load, and the Oj.load function should behave as the Oj.safe_load function does presently.
By naming the unsafe function as explicitly unsafe, developers (and reviewers) have at least a fighting chance of recognising they re doing something risky.
We put warning labels on just about everything in the real world; the same should be true of dangerous function calls.
OK, rant over.
Back to the story.
But how is this exploitable?
So far, I ve hopefully made it clear that Oj does some Weird Stuff with parsing certain JSON strings.
It caused an unhandled exception in a web application I run, which isn t cool, but apart from bombing me with exception notifications, what s the harm?
For starters, let s look at our original example: when presented with a symbol, Sequel will interpret that as a column name, rather than a string value.
Thus, if our save an update to the user code looked like this:
# request_body has the JSON representation of the form being submitted
body = Oj.load(request_body)
DB[:users].where(id: user_id).update(name: body["name"])
In normal operation, this will issue an SQL query along the lines of UPDATE users SET name='Jaime' WHERE id=42.
If the name given is Jaime O Dowd , all is still good, because Sequel quotes string values, etc etc.
All s well so far.
But, imagine there is a column in the users table that normally users cannot read, perhaps admin_notes.
Or perhaps an attacker has gotten temporary access to an account, and wants to dump the user s password hash for offline cracking.
So, they send an update claiming that their name is :admin_notes (or :password_hash).
In JSON, that ll look like "name":":admin_notes" , and Oj.load will happily turn that into a Ruby object of "name"=>:admin_notes .
When run through the above update the user code fragment, it ll produce the SQL UPDATE users SET name=admin_notes WHERE id=42.
In other words, it ll copy the contents of the admin_notes column into the name column which the attacker can then read out just by refreshing their profile page.
But Wait, There s More!
That an attacker can read other fields in the same table isn t great, but that s barely scratching the surface.
Remember before I said that Oj does object serialization ?
That means that, in general, you can create arbitrary Ruby objects from JSON.
Since objects contain code, it s entirely possible to trigger arbitrary code execution by instantiating an appropriate Ruby object.
I m not going to go into details about how to do this, because it s not really my area of expertise, and many others have covered it in detail.
But rest assured, if an attacker can feed input of their choosing into a default call to Oj.load, they ve been handed remote code execution on a platter.
Mitigations
As Oj s object deserialization is intended and documented behaviour, don t expect a future release to make any of this any safer.
Instead, we need to mitigate the risks.
Here are my recommended steps:
Look in your Gemfile.lock (or SBOM, if that s your thing) to see if the oj gem is anywhere in your codebase.
Remember that even if you don t use it directly, it s popular enough that it is used in a lot of places.
If you find it in your transitive dependency tree anywhere, there s a chance you re vulnerable, limited only by the ingenuity of attackers to feed crafted JSON into a deeply-hidden Oj.load call.
If you depend on oj directly and use it in your project, consider not doing that.
The json gem is acceptably fast, and JSON.parse won t create arbitrary Ruby objects.
If you really, really need to squeeze the last erg of performance out of your JSON parsing, and decide to use oj to do so, find all calls to Oj.load in your code and switch them to call Oj.safe_load.
It is a really, really bad idea to ever use Oj to deserialize JSON into objects, as it lacks the safety features needed to mitigate the worst of the risks of doing so (for example, restricting which classes can be instantiated, as is provided by the permitted_classes argument to Psych.load).
I d make it a priority to move away from using Oj for that, and switch to something somewhat safer (such as the aforementioned Psych).
At the very least, audit and comment heavily to minimise the risk of user-provided input sneaking into those calls somehow, and pass mode: :object as the second argument to Oj.load, to make it explicit that you are opting-in to this far more dangerous behaviour only when it s absolutely necessary.
To secure any unsafe uses of Oj.load in your dependencies, consider setting the default Oj parsing mode to :strict, by putting Oj.default_options = mode: :strict somewhere in your initialization code (and make sure no dependencies are setting it to something else later!).
There is a small chance that this change of default might break something, if a dependency is using Oj to deliberately create Ruby objects from JSON, but the overwhelming likelihood is that Oj s just being used to parse ordinary JSON, and these calls are just RCE vulnerabilities waiting to give you a bad time.
Is Your Bacon Saved?
If I ve helped you identify and fix potential RCE vulnerabilities in your software, or even just opened your eyes to the risks of object deserialization, please help me out by buying me a refreshing beverage.
I would really appreciate any support you can give.
Alternately, if you d like my help in fixing these (and many other) sorts of problems, I m looking for work, so email me.