
12 February 2025

Dirk Eddelbuettel: RcppUUID 1.2.0 on CRAN: Adding Clock-based UUIDs

The RcppUUID package on CRAN has been providing UUIDs (based on the underlying Boost library) for several years. Written by Artem Klemsov and maintained in this gitlab repo, the package is a very nice example of clean and straightforward library binding. As it had dropped off CRAN over a relatively minor issue, I decided to adopt it with the previous 1.1.2 release made quite recently. This release adds new high-resolution clock-based UUIDs according to the v7 spec. Internally, 100ns increments are represented. The resulting UUIDs are both unique and sortable. I added this recent example to the README.md which illustrates both the implicit ordering and uniqueness. The unit tests check this with a much larger N.
> RcppUUID::uuid_generate_time(5)
[1] "0194d8fa-7add-735c-805b-6bbf22b78b9e" "0194d8fa-7add-735e-8012-3e0e53895b19"
[3] "0194d8fa-7add-735e-81af-bc67bb435ade" "0194d8fa-7add-735e-82b1-405bf57963ad"
[5] "0194d8fa-7add-735f-801e-efe57078b2e7"
>
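As a quick check of that claim (a sketch along the lines of those unit tests, not the package's actual test code), one can generate a larger batch and verify both properties:

u <- RcppUUID::uuid_generate_time(10000)    # 10,000 clock-based v7 UUIDs
stopifnot(length(unique(u)) == length(u))   # all values are unique
stopifnot(!is.unsorted(u))                  # lexicographic order matches generation order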
While one can revert from the UUID object to the clock object, I am not aware of a text parser so there is currently no inverse function (as ulid offers) for the character representation. The NEWS entry for the two releases follows.

Changes in version 1.2.0 (2025-02-12)
  • Time-based UUIDs, ie version 7, can now be generated (requiring Boost 1.86 or newer as in the current BH package)

Changes in version 1.1.2 (2025-01-31)
  • New maintainer to resurrect package on CRAN

Courtesy of my CRANberries, there is also a diffstat report. More detailed information is on the RcppUUID page, or the github repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

11 February 2025

Ian Jackson: derive-deftly 1.0.0 - Rust derive macros, the easy way

derive-deftly 1.0 is released. derive-deftly is a template-based derive-macro facility for Rust. It has been a great success. Your codebase may benefit from it too! Rust programmers will appreciate its power, flexibility, and consistency, compared to macro_rules; and its convenience and simplicity, compared to proc macros. Programmers coming to Rust from scripting languages will appreciate derive-deftly's convenient automatic code generation, which works as a kind of compile-time introspection.

Rust's two main macro systems I'm often a fan of metaprogramming, including macros. They can help remove duplication and flab, which are often the enemy of correctness. Rust has two macro systems. derive-deftly offers much of the power of the more advanced (proc_macros), while beating the simpler one (macro_rules) at its own game for ease of use. (Side note: Rust has at least three other ways to do metaprogramming: generics; build.rs; and multiple module inclusion via #[path=]. These are beyond the scope of this blog post.)

macro_rules! macro_rules!, aka "pattern macros", "declarative macros", or sometimes "macros by example", are the simpler kind of Rust macro. They involve writing a sort-of-BNF pattern-matcher, and a template which is then expanded with substitutions from the actual input. If your macro wants to accept comma-separated lists, or other simple kinds of input, this is OK. But often we want to emulate a #[derive(...)] macro: e.g., to define code based on a struct, handling each field. Doing that with macro_rules is very awkward: macro_rules!'s pattern language doesn't have a cooked way to match a data structure, so you have to hand-write a matcher for Rust syntax, in each macro. Writing such a matcher is very hard in the general case, because macro_rules lacks features for matching important parts of Rust syntax (notably, generics). (If you really need to, there's a horrible technique as a workaround.) And the invocation syntax for the macro is awkward: you must enclose the whole of the struct in my_macro!{...}. This makes it hard to apply more than one macro to the same struct, and produces rightward drift. Enclosing the struct this way means the macro must reproduce its input, so it can have bugs where it mangles the input, perhaps subtly. This also means the reader cannot be sure precisely whether the macro modifies the struct itself. In Rust, the types and data structures are often the key places to go to understand a program, so this is a significant downside. macro_rules also has various other weird deficiencies too specific to list here. Overall, compared to (say) the C preprocessor, it's great, but programmers used to the power of Lisp macros, or (say) metaprogramming in Tcl, will quickly become frustrated.

proc macros Rust's second macro system is much more advanced. It is a fully general system for processing and rewriting code. The macro's implementation is Rust code, which takes the macro's input as arguments, in the form of Rust tokens, and returns Rust tokens to be inserted into the actual program. This approach is more similar to Common Lisp's macros than to most other programming languages' macro systems. It is extremely powerful, and is used to implement many very widely used and powerful facilities. In particular, proc macros can be applied to data structures with #[derive(...)]. The macro receives the data structure, in the form of Rust tokens, and returns the code for the new implementations, functions etc.
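To give a concrete idea of the shape of such a macro, here is a minimal derive proc macro skeleton (an illustrative sketch, not taken from this post; MyTrait is a made-up trait, and the code must live in its own proc-macro crate with syn and quote as dependencies):

use proc_macro::TokenStream;
use quote::quote;
use syn::{parse_macro_input, DeriveInput};

// Hypothetical example: derive an (empty) impl of a made-up trait `MyTrait`.
#[proc_macro_derive(MyTrait)]
pub fn derive_my_trait(input: TokenStream) -> TokenStream {
    // Parse the tokens of the annotated struct/enum into a syntax tree.
    let input = parse_macro_input!(input as DeriveInput);
    let name = input.ident;
    // Return new tokens to be inserted into the program.
    quote! {
        impl MyTrait for #name {}
    }
    .into()
}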
This mechanism is used very heavily in the standard library for basic features like #[derive(Debug)] and Clone, and for important libraries like serde and strum. But, it is a complete pain in the backside to write and maintain a proc_macro. The Rust types and functions you deal with in your macro are very low level. You must manually handle every possible case, with runtime conditions and pattern-matching. Error handling and recovery is so nontrivial there are macro-writing libraries and even more macros to help. Unlike a Lisp codewalker, a Rust proc macro must deal with Rust's highly complex syntax. You will probably end up dealing with syn, which is a complete Rust parsing library, separate from the compiler; syn is capable and comprehensive, but a proc macro must still contain a lot of often-intricate code. There are build/execution environment problems. The proc_macro code can't live with your application; you have to put the proc macros in a separate cargo package, complicating your build arrangements. The proc macro package environment is weird: you can't test it separately, without jumping through hoops. Debugging can be awkward. Proper tests can only realistically be done with the help of complex additional tools, and will involve a pinned version of Nightly Rust.

derive-deftly to the rescue derive-deftly lets you write a #[derive(...)] macro, driven by a data structure, without wading into any of that stuff. Your macro definition is a template in a simple syntax, with predefined $-substitutions for the various parts of the input data structure.

Example Here's a real-world example from a personal project:
define_derive_deftly! {
    export UpdateWorkerReport:
    impl $ttype {
        pub fn update_worker_report(&self, wr: &mut WorkerReport) {
            $(
                ${when fmeta(worker_report)}
                wr.$fname = Some(self.$fname.clone()).into();
            )
        }
    }
}
#[derive(Debug, Deftly, Clone)]
...
#[derive_deftly(UiMap, UpdateWorkerReport)]
pub struct JobRow {
    ...
    #[deftly(worker_report)]
    pub status: JobStatus,
    pub processing: NoneIsEmpty<ProcessingInfo>,
    #[deftly(worker_report)]
    pub info: String,
    pub duplicate_of: Option<JobId>,
}
This is a nice example, also, of how using a macro can avoid bugs. Implementing this update by hand without a macro would involve a lot of cut-and-paste. When doing that cut-and-paste it can be very easy to accidentally write bugs where you forget to update some parts of each of the copies:
    pub fn update_worker_report(&self, wr: &mut WorkerReport) {
        wr.status = Some(self.status.clone()).into();
        wr.info = Some(self.status.clone()).into();
    }
Spot the mistake? We copy status to info. Bugs like this are extremely common, and not always found by the type system. derive-deftly can make it much easier to make them impossible.

Special-purpose derive macros are now worthwhile! Because of the difficult and cumbersome nature of proc macros, very few projects have site-specific, special-purpose #[derive(...)] macros. The Arti codebase has no bespoke proc macros, across its 240kloc and 86 crates. (We did fork one upstream proc macro package to add a feature we needed.) I have only one bespoke, case-specific, proc macro amongst all of my personal Rust projects; it predates derive-deftly. Since we have started using derive-deftly in Arti, it has become an important tool in our toolbox. We have 37 bespoke derive macros, done with derive-deftly. Of these, 9 are exported for use by downstream crates. (For comparison there are 176 macro_rules macros.) In my most recent personal Rust project, I have 22 bespoke derive macros, done with derive-deftly, and 19 macro_rules macros. derive-deftly macros are easy and straightforward enough that they can be used as readily as macro_rules macros. Indeed, they are often clearer than a macro_rules macro.

Stability without stagnation derive-deftly is already highly capable, and can solve many advanced problems. It is mature software, well tested, with excellent documentation, comprising both comprehensive reference material and the walkthrough-structured user guide. But declaring it 1.0 doesn't mean that it won't improve further. Our ticket tracker has a laundry list of possible features. We'll sometimes be cautious about committing to these, so we've added a beta feature flag, for opting in to less-stable features, so that we can prototype things without painting ourselves into a corner. And we intend to further develop the Guide.


Bálint Réczey: Supercharge Your Installs with apt-eatmydata: Because Who Needs Crash Safety Anyway?

APT eatmydata super cow powers
Tired of waiting for apt to finish installing packages? Wish there were a way to make your installations blazingly fast without caring about minor things like, oh, data integrity? Well, today is your lucky day!
I'm thrilled to introduce apt-eatmydata, now available for Debian and all supported Ubuntu releases!

What Is apt-eatmydata? If you've ever used libeatmydata, you know it's a nifty little hack that disables fsync() and friends, making package installations way faster by skipping unnecessary disk writes. Normally, you'd have to remember to wrap apt commands manually, like this:
eatmydata apt install texlive-full
But who has time for that? apt-eatmydata takes care of this automagically by integrating eatmydata seamlessly into apt itself! That means every package install is now turbocharged, with no extra typing required.
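For the curious, one plausible way to wire up this kind of integration by hand (a sketch only, not necessarily how apt-eatmydata actually implements it, and the paths here are made up) is an apt.conf.d snippet that points apt at a dpkg wrapper:

# /etc/apt/apt.conf.d/99-eatmydata (hypothetical path): make apt call dpkg via a wrapper
Dir::Bin::dpkg "/usr/local/sbin/dpkg-eatmydata";

# /usr/local/sbin/dpkg-eatmydata (hypothetical wrapper script, mode 0755)
#!/bin/sh
exec eatmydata /usr/bin/dpkg "$@"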

How to Get It

Debian If you're on Debian unstable/testing (or possibly soon in stable-backports), you can install it directly with:
sudo apt install apt-eatmydata

Ubuntu Ubuntu users already enjoy faster package installation thanks to zstd-compressed packages, and to switch into an even higher gear I've backported apt-eatmydata to all supported Ubuntu releases. Just add this PPA and install:
sudo add-apt-repository ppa:firebuild/apt-eatmydata
sudo apt install apt-eatmydata
And boom! Your apt install times are getting a serious upgrade. Let's run some tests:
# pre-download package to measure only the installation
$ sudo apt install -d linux-headers-6.8.0-53-lowlatency
...
# installation time is 9.35s without apt-eatmydata:
$ sudo time apt install linux-headers-6.8.0-53-lowlatency
...
2.30user 2.12system 0:09.35elapsed 47%CPU (0avgtext+0avgdata 174680maxresident)k
32inputs+1495216outputs (0major+196945minor)pagefaults 0swaps
$ sudo apt install apt-eatmydata
...
$ sudo apt purge linux-headers-6.8.0-53-lowlatency
# installation time is 3.17s with apt-eatmydata:
$ sudo time eatmydata apt install linux-headers-6.8.0-53-lowlatency
2.30user 0.88system 0:03.17elapsed 100%CPU (0avgtext+0avgdata 174692maxresident)k
0inputs+205664outputs (0major+198099minor)pagefaults 0swaps
apt-eatmydata just made installing Linux headers 3x faster!

But Wait, There's More!  If you're automating CI builds, there's even a GitHub Action to make your workflows faster, essentially doing what apt-eatmydata does, and setting it up takes less than a second! Check it out here:
 GitHub Marketplace: apt-eatmydata

Should You Use It?  Warning: apt-eatmydata is not for all production environments. If your system crashes mid-install, you might end up with a broken package database. But for throwaway VMs, containers, and CI pipelines? It's an absolute game-changer. I use it on my laptop, too. So go forth and install recklessly fast!  If you run into any issues, feel free to file a bug or drop a comment. Happy hacking! (To accelerate your CI pipeline or local builds, check out Firebuild, which speeds up the builds, too!)

Freexian Collaborators: Debian Contributions: Python 3.13 as the default Python 3 version, Fixing qtpaths6 for cross compilation, sbuild support for Salsa CI, Rails 7 transition, DebConf preparations and more! (by Anupa Ann Joseph)

Debian Contributions: 2025-01 Contributing to Debian is part of Freexian's mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

Python 3.13 is now the default Python 3 version in Debian, by Stefano Rivera and Colin Watson The Python 3.13 as default transition has now completed. The next step is to remove Python 3.12 from the archive, which should be very straightforward; it just requires rebuilding C extension packages in no particular order. Stefano fixed some miscellaneous bugs blocking the completion of the 3.13 as default transition.

Fixing qtpaths6 for cross compilation, by Helmut Grohne While Qt5 used to use qmake to query installation properties, Qt6 is moving more and more to CMake and, to ease that transition, it relies more on qtpaths. Since this tool is not naturally aware of the architecture it is called for, it tends to produce results for the build architecture. Therefore, more than 100 packages were picking up a multiarch directory for the build architecture during cross builds. In collaboration with the Qt/KDE team and Sandro Knauß in particular (none affiliated with Freexian), we added an architecture-specific wrapper script in the same way qmake has one for Qt5 and Qt6 already. The relevant CMake module has been updated to prefer the triplet-prefixed wrapper. As a result, most of the KDE packages now cross build on unstable, ready in time for the trixie release.

/usr-move, by Helmut Grohne In December, Emil Södergren reported that a live-build was not working for him, and in January, Colin Watson reported that the proposed mitigation for debian-installer-utils would practically fail. Both failures were to be attributed to a wrong understanding of implementation-defined behavior in dpkg-divert. As a result, all M18 mitigations had to be reviewed and many of them replaced. Many have been uploaded already and all instances have received updated patches. Even though dumat has been in operation for more than a year, it gained recent changes. For one thing, analysis of architectures other than amd64 was requested. Chris Hofstaedtler (not affiliated with Freexian) kindly provided computing resources for repeatedly running it on the larger set. Doing so revealed various cross-architecture undeclared file conflicts in gcc, glibc, and binutils-z80, but it also revealed a previously unknown /usr-move issue in rpi.rpi-common. On top of that, dumat produced false positive diagnostics and wrongly associated Debian bugs in some cases, both of which have now been fixed. As a result, a supposedly fixed python3-sepolicy issue had to be reopened.

rebootstrap, by Helmut Grohne As much as we think of our base system as stable, it is changing a lot, and the architecture cross-bootstrap tooling is very sensitive to such changes, requiring permanent maintenance. A problem that recently surfaced was that building a binutils cross toolchain would result in a binutils-for-host package that was not practically installable, as it would depend on a binutils-common package that was not built. This turned into an examination of binutils-common, and noticing that it actually differed across architectures even though it should not. Johannes Schauer Marin Rodrigues (not affiliated with Freexian) and Colin Watson kindly helped brainstorm possible solutions. Eventually, Helmut provided a patch to move the gprofng bits out of binutils-common. Independently, Matthias Klose (not affiliated with Freexian) split out binutils-gold into a separate source package. As a result, binutils-common is now equal across architectures and can be marked Multi-Arch: foreign, resolving the initial problem.

Salsa CI, by Santiago Ruano Rincón Santiago continued the work on sbuild support for Salsa CI that was mentioned in the previous month's report. The !568 merge request that created the new build image was merged, making it easier to test !569 with external projects. Santiago used a fork of the debusine repo to try the draft !569; some issues were spotted, and part of them fixed. This is the last debusine pipeline run with the current !569: https://salsa.debian.org/santiago/debusine/-/pipelines/794233. One of the last improvements relates to how to enable projects to customize the pipeline, in a way equivalent to what they currently do in the extract-source and build jobs. While this is work in progress, the results are rather promising. Next steps include deciding on introducing schroot support for bookworm, bookworm-security, and older releases, as is done on the official Debian buildds.
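For context, projects already opt into and customize the Salsa CI pipeline from a debian/salsa-ci.yml file; a rough sketch of that existing mechanism follows (an illustration based on the salsa-ci-team/pipeline documentation rather than this report, so double-check the include path and variable names there; the sbuild-specific knobs are still being worked out):

# debian/salsa-ci.yml (illustrative)
include:
  - https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/recipes/debian.yml

variables:
  RELEASE: 'unstable'
  # Variables like this already let projects tune individual jobs.
  SALSA_CI_DISABLE_REPROTEST: 1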

DebConf preparations, by Stefano Rivera and Santiago Ruano Rincón DebConf will be happening in Brest, France, in July. Santiago continued the DebConf 25 organization work, looking for catering providers. Both Stefano and Santiago have been reaching out to some potential sponsors. DebConf depends on sponsors to cover the organization cost; if your company depends on Debian, please consider sponsoring DebConf. Stefano has been winding up some of the finances from previous DebConfs, finalizing reimbursements to team members from DebConf 23 and handling some outstanding issues from DebConf 24. Stefano and the rest of the DebConf committee have been reviewing bids for DebConf 26, to select the next venue.

Ruby 3.3 is now the default Ruby interpreter, by Lucas Kanashiro Ruby 3.3 is about to become the default Ruby interpreter for Trixie. Many bugs were fixed by Lucas and the Debian Ruby team during the sprint held in Paris during Jan 27-31. The next step is to remove support for Ruby 3.1, which is the alternative Ruby interpreter for now. Thanks to the Debian Release team for all the support, especially Emilio Pozuelo Monfort.

Rails 7 transition, by Lucas Kanashiro Rails 6 has been shipped by Debian since Bullseye, and as a web framework, many issues (especially security-related ones) have been encountered and maintaining it has become harder and harder. With that in mind, during the Debian Ruby team sprint last month, the transition to Rack 3 (an important dependency of Rails containing many breaking changes) was started in Debian unstable and is ongoing. Once it is done, the Rails 7 transition will take place, and Rails 7 should be shipped in Debian Trixie.

Miscellaneous contributions
  • Stefano improved a poor ImportError for users of the turtle module on Python 3, who haven't installed the python3-tk package.
  • Stefano updated several packages to new upstream releases.
  • Stefano added the Python extension to the re2 package, allowing for the use of the Google RE2 regular expression library as a direct replacement for the standard library re module.
  • Stefano started provisioning a new physical server for the debian.social infrastructure.
  • Carles improved simplemonitor (documentation on systemd integration, worked with upstream for fixing a bug).
  • Carles upgraded packages to new upstream versions: python-ring-doorbell and python-asyncclick.
  • Carles did po-debconf translations to Catalan: reviewed 44 packages and submitted translations to 90 packages (via salsa merge requests or bugtracker bugs).
  • Carles maintained po-debconf-manager with small fixes.
  • Raphaël worked on some outstanding DEP-14 merge requests and participated in the associated discussion. The discussions have been more contentious than anticipated, somewhat exacerbated by Otto's desire to conclude fast while the required tool support is not yet there.
  • Raphaël, with the help of Philipp Kern from the DSA team, upgraded tracker.debian.org to use Django 4.2 (from bookworm-backports), which in turn enabled him to configure authentication via salsa.debian.org. It's now possible to log in to tracker.debian.org with your salsa credentials!
  • Raphaël updated zim, a nice desktop wiki that is very handy for organizing your day-to-day digital life, to the latest upstream version (0.76).
  • Helmut sent patches for 10 cross build failures.
  • Helmut continued working on a tool for memory-based concurrency limit of builds.
  • Helmut NMUed libtool, opensysusers and virtualbox.
  • Enrico tried to support Helmut in working out tricky usrmerge situations.
  • Thorsten Alteholz uploaded a new upstream version of brlaser.
  • Colin Watson upgraded 33 Python packages to new upstream versions, including fixes for CVE-2024-42353, CVE-2024-47532, and CVE-2025-22153.
  • Emilio Pozuelo managed various transitions, and fixed various RC bugs (telepathy-glib, xorg, xserver-xorg-video-vesa, apitrace, mesa).
  • Anupa attended the monthly team meeting for Debian publicity team and shared the social media stats.
  • Anupa assisted Jean-Pierre Giraud in the point release announcement for Debian 12.9 and published the Micronews.
  • Anupa took part in multiple Debian publicity team discussions regarding our presence in social media platforms.

10 February 2025

Petter Reinholdtsen: Some of my 2024 free software activities

It is a while since I posted a summary of the free software and open culture activities and projects I have worked on. Here is a quick summary of the major ones from last year.

I guess the biggest project of the year has been migrating orphaned packages in Debian without a version control system to have a git repository on salsa.debian.org. When I started in April, around 450 orphaned packages needed git. I've since migrated around 250 of the packages to a salsa git repository, and around 40 packages were left when I took a break. Not sure who did the around 160 conversions I was not involved in, but I am very glad I got some help on the project. I stopped partly because some of the remaining packages needed more disk space to build than I have available on my development machine, and partly because some had a strange build setup I could not figure out. I had a time budget of 20 minutes per package; if the package proved problematic and likely to take longer, I moved to another package. I might continue later, if I manage to free up some disk space.

Another rather big project was the translation to Norwegian Bokmål and publishing of the first book ever published by a Sámi woman, the Møter vi liv eller død? book by Elsa Laula, with a PD0 and CC-BY license. I released it during the summer, and to my surprise it has already sold several copies. As I suck at marketing, I did not expect to sell any.

A smaller, but more long-term project (for more than 10 years now), and related to orphaned packages in Debian, is my project to ensure a simple way to install hardware-related packages in Debian when the relevant hardware is present in a machine. It made a fairly big advance forward last year, partly because I have been poking and begging package maintainers and upstream developers to include AppStream metadata XML in their packages. I've also released a few new versions of the isenkram system with some robustness improvements. Today 127 packages in Debian provide such information, allowing isenkram-lookup to propose them. I will keep pushing until the around 35 package names currently hard-coded in the isenkram package are down to zero, so only information provided by individual packages is used for this feature.

As part of the work on AppStream, I have sponsored several packages into Debian where the maintainer wanted to fix the issue but lacked direct upload rights. I've also sponsored a few other packages, when approached by the maintainer. I would also like to mention two hardware-related packages in particular where I have been involved, the megactl and mfi-util packages. Both work with the hardware RAID systems in several Dell PowerEdge servers; the first one is already available in Debian (and of course, proposed by isenkram when used on the appropriate Dell server), the other is waiting for NEW processing since this autumn. I manage several such Dell servers and would like the tools needed to monitor and configure these RAID controllers to be available from within Debian out of the box.

Vaguely related to hardware support in Debian, I have also been trying to find ways to help out the Debian ROCm team, to improve the support in Debian for my artificial idiocy (AI) compute node. So far I have only uploaded one package, helped test the initial packaging of llama.cpp and tried to figure out how to get good speech recognition like Whisper into Debian.

I am still involved in the LinuxCNC project, and organised a developer gathering in Norway last summer. A new one is planned for the summer of 2025.
I've also helped evaluate patches and uploaded new versions of LinuxCNC into Debian.

After a 10-year-long break, we managed to get a new and improved upstream version of lsdvd released just before Christmas. As I use it regularly to maintain my DVD archive, I was very happy to finally get out a version supporting DVDDiscID, useful for uniquely identifying DVDs. I am dreaming of an Internet service mapping DVD IDs to IMDB movie IDs, to make life as a DVD collector easier.

My involvement in Norwegian archive standardisation and the free software implementation of the vendor-neutral Noark 5 API continued for the entire year. I've been pushing patches into both the API and the test code for the API, participated in several editorial meetings regarding the Noark 5 Tjenestegrensesnitt specification, and submitted several proposals for improvements to the same. We also organised a small seminar for Noark 5 interested people, and are organising a new seminar in a month.

Part of the year was spent working on and coordinating a Norwegian Bokmål translation of the marvellous children's book Ada and Zangemann, which focuses on the right to repair and control your own property, and the value of controlling the software on the devices you own. The translation is mostly complete, and is now waiting for a transformation of the project and manuscript to use DocBook XML instead of a home-made semi-text-based format. Great progress is being made and the new book build process is almost complete.

I have also been looking at how companies in Norway can use free software to report their accounting summaries to the Norwegian government. Several new regulations make it very hard for companies to use free software for accounting, and I would like to change this. I found a few drafts for opening up the reporting process, and have read up on some of the specifications, but nothing much is working yet.

These were just the tip of the iceberg, but I guess this blog post is long enough now. If you would like to help with any of these projects, please get in touch, either directly on the project mailing lists and forums, or with me via email, IRC or Signal. :) As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

Russ Allbery: Review: The Scavenger Door

Review: The Scavenger Door, by Suzanne Palmer
Series: Finder Chronicles #3
Publisher: DAW
Copyright: 2021
ISBN: 0-7564-1516-0
Format: Kindle
Pages: 458
The Scavenger Door is a science fiction adventure and the third book of the Finder Chronicles. While each of the books of this series stand alone reasonably well, I would still read the series in order. Each book has some spoilers for the previous book. Fergus is back on Earth following the events of Driving the Deep, at loose ends and annoying his relatives. To get him out of their hair, his cousin sends him into the Scottish hills to find a friend's missing flock of sheep. Fergus finds things professionally, but usually not livestock. It's an easy enough job, though; the lead sheep was wearing a tracker and he just has to get close enough to pick it up. The unexpected twist is also finding a metal fragment buried in a hillside that has some strange resonance with the unwanted gift that Fergus got in Finder. Fergus's alien friend Ignatio is so alarmed by the metal fragment that he turns up in person in Fergus's cousin's bar in Scotland. Before he arrives, Fergus gets a mysteriously infuriating warning visit from alien acquaintances he does not consider friends. He has, as usual, stepped into something dangerous and complicated, and now somehow it's become his problem. So, first, we get lots of Ignatio, who is an enthusiastic large ball of green fuzz with five limbs who mostly speaks English but does so from an odd angle. This makes me happy because I love Ignatio and his tendency to take things just a bit too literally.
SANTO'S, the sign read. Under it, in smaller letters, was CURIOSITIES AND INCONVENIENCES FOR COMMENDABLE SUMS. "Inconveniences sound just like my thing," Fergus said. "You two want to wait in the car while I check it out?" "Oh, no, I am not missing this," Isla said, and got out of the podcar. "I am uncertain," Ignatio said. "I would like some curiouses, but not any inconveniences. Please proceed while I decide, and if there is also murdering or calamity or raisins, you will yell right away, yes?"
Also, if your story setup requires a partly-understood alien artifact that the protagonist can get some explanations for but not have the mystery neatly solved for them, Ignatio's explanations are perfect.
"It is a door. A doorbell. A... peephole? A key. A control light. A signal. A stop-and-go sign. A road. A bridge. A beacon. A call. A map. A channel. A way," Ignatio said. "It is a problem to explain. To say a doorkey is best, and also wrong. If put together, a path may be opened." "And then?" "And then the bad things on the other side, who we were trying to lock away, will be free to travel through."
Second, the thing about Palmer's writing that continues to impress me is her ability to take a standard science fiction plot, one whose variations I've read probably dozens of times before, and still make it utterly engrossing. This book is literally a fetch quest. There are a bunch of scattered fragments, Fergus has to find them and keep them from being assembled, various other people are after the same fragments, and Fergus either has to get there first or get the fragments back from them. If you haven't read this book before, you've played the video game or watched the movie. The threat is basically a Stargate SG-1 plot. And yet, this was so much fun. The characters are great. This book leans less on found family than the last one and a bit more on actual family. When I started reading this series, Fergus felt a bit bland in the way that adventure protagonists sometimes can, but he's fleshed out nicely as the series goes along. He's not someone who tends to indulge in big emotions, but now the reader can tell that's because he's the kind of person who finds things to do in order to keep from dwelling on things he doesn't want to think about. He's unflappable in a quietly competent way while still having a backstory and emotional baggage and a rich inner life that the reader sees in glancing fragments. We get more of Fergus's backstory, particularly around Mars, but I like that it's told in anecdotes and small pieces. The last thing Fergus wants to do is wallow in his past trauma, so he doesn't and finds something to do instead. There's just enough detail around the edges to deepen his character without turning the book into a story about Fergus's emotions and childhood. It's a tricky balancing act that Palmer handles well. There are also more sentient ships, and I am so in favor of more sentient ships.
"When I am adding a new skill, I import diagnostic and environmental information specific to my platform and topology, segregate the skill subroutines to a dedicated, protected logical space, run incremental testing on integration under all projected scenarios and variables, and then when I am persuaded the code is benevolent, an asset, and provides the functionality I was seeking, I roll it into my primary processing units," Whiro said. "You cannot do any of that, because if I may speak in purely objective terms you may incorrectly interpret as personal, you are made of squishy, unreliable goo."
We get the normal pieces of a well-done fetch quest: wildly varying locations, some great local characters (the US-based trauma surgeons on vacation in Australia were my favorites), and believable antagonists. There are two other groups looking for the fragments, and while one of them is the standard villain in this sort of story, the other is an apocalyptic cult whose members Fergus mostly feels sorry for and who add just the right amount of surreality to the story. The more we find out about them, the more believable they are, and the more they make this world feel like realistic messy chaos instead of the obvious (and boring) good versus evil patterns that a lot of adventure plots collapse into. There are things about this book that I feel like I should be criticizing, but I just can't. Fetch quests are usually synonymous with lazy plotting, and yet it worked for me. The way Fergus gets dumped into the middle of this problem starts out feeling as arbitrary and unmotivated as some video game fetch quest stories, but by the end of the book it starts to make sense. The story could arguably be described as episodic and cliched, and yet I was thoroughly invested. There are a few pacing problems at the very end, but I was too invested to care that much. This feels like a book that's better than the sum of its parts. Most of the story is future-Earth adventure with some heist elements. The ending goes in a rather different direction but stays at the center of the classic science fiction genre. The Scavenger Door reaches a satisfying conclusion, but there are a ton of unanswered questions that will send me on to the fourth (and reportedly final) novel in the series shortly. This is great stuff. It's not going to win literary awards, but if you're in the mood for some classic science fiction with fun aliens and neat ideas, but also benefiting from the massive improvements in characterization the genre has seen in the past forty years, this series is perfect. Highly recommended. Followed by Ghostdrift. Rating: 9 out of 10

9 February 2025

Antoine Beaupré: A slow blogging year

Well, 2024 will be remembered, won't it? I guess 2025 already wants to make its mark too, but let's not worry about that right now, and instead let's talk about me. A little over a year ago, I was gloating over how I had such a great blogging year in 2022, and was considering 2023 to be average, then went on to gather more stats and traffic analysis... Then I said, and I quote:
I hope to write more next year. I've been thinking about a few posts I could write for work, about how things work behind the scenes at Tor, that could be informative for many people. We run a rather old setup, but things hold up pretty well for what we throw at it, and it's worth sharing that with the world...
What a load of bollocks.

A bad year for this blog 2024 was the second worst year ever in my blogging history, tied with 2009 at a measly 6 posts for the year:
anarcat@angela:anarc.at$ curl -sSL https://anarc.at/blog/ | grep 'href="\./' | grep -o 20[0-9][0-9] | sort | uniq -c | sort -nr | grep -v 2025 | tail -3
      6 2024
      6 2009
      3 2014
I did write about my work though, detailing the migration from Gitolite to GitLab we completed that year. But after August, total radio silence until now.

Loads of drafts It's not that I have nothing to say: I have no less than five drafts in my working tree here, not counting three actual drafts recorded in the Git repository here:
anarcat@angela:anarc.at$ git s blog
## main...origin/main
?? blog/bell-bot.md
?? blog/fish.md
?? blog/kensington.md
?? blog/nixos.md
?? blog/tmux.md
anarcat@angela:anarc.at$ git grep -l '\!tag draft'
blog/mobile-massive-gallery.md
blog/on-dying.mdwn
blog/secrets-recovery.md
I just don't have time to wrap those things up. I think part of me is disgusted by seeing my work stolen by large corporations to build proprietary large language models while my idols have been pushed to suicide for trying to share science with the world. Another part of me wants to make those things just right. The "tagged drafts" above are nothing more than a huge pile of chaotic links, far from being useful for anyone else than me, and even then. The on-dying article, in particular, is becoming my nemesis. I've been wanting to write that article for over 6 years now, I think. It's just too hard.

Writing elsewhere There's also the fact that I write for work already. A lot. Here are the top-10 contributors to our team's wiki:
anarcat@angela:help.torproject.org$ git shortlog --numbered --summary --group="format:%al" | head -10
  4272  anarcat
   423  jerome
   117  zen
   116  lelutin
   104  peter
    58  kez
    45  irl
    43  hiro
    18  gaba
    17  groente
... but that's a bit unfair, since I've been there half a decade. Here's the last year:
anarcat@angela:help.torproject.org$ git shortlog --since=2024-01-01 --numbered --summary --group="format:%al" | head -10
   827  anarcat
   117  zen
   116  lelutin
    91  jerome
    17  groente
    10  gaba
     8  micah
     7  kez
     5  jnewsome
     4  stephen.swift
So I still write the most commits! But to truly get a sense of the amount I wrote in there, we should count actual changes. Here it is by number of lines (from commandlinefu.com):
anarcat@angela:help.torproject.org$ git ls-files | xargs -n1 git blame --line-porcelain | sed -n 's/^author //p' | sort -f | uniq -ic | sort -nr | head -10
  99046 Antoine Beaupré
   6900 Zen Fu
   4784 Jérôme Charaoui
   1446 Gabriel Filion
   1146 Jerome Charaoui
    837 groente
    705 kez
    569 Gaba
    381 Matt Traudt
    237 Stephen Swift
That, of course, is the entire history of the git repo, again. We should take only the last year into account, and probably ignore the tails directory, as sneaky Zen Fu imported the entire docs from another wiki there...
anarcat@angela:help.torproject.org$ find [d-s]* -type f -mtime -365 | xargs -n1 git blame --line-porcelain 2>/dev/null | sed -n 's/^author //p' | sort -f | uniq -ic | sort -nr | head -10
  75037 Antoine Beaupré
   2932 Jérôme Charaoui
   1442 Gabriel Filion
   1400 Zen Fu
    929 Jerome Charaoui
    837 groente
    702 kez
    569 Gaba
    381 Matt Traudt
    237 Stephen Swift
Pretty good! 75k lines. But those are the files that were modified in the last year. If we go a little more nuts, we find that:
anarcat@angela:help.torproject.org$ git-count-words-range.py | sort -k6 -nr | head -10
parsing commits for words changes from command: git log '--since=1 year ago' '--format=%H %al'
anarcat 126116 - 36932 = 89184
zen 31774 - 5749 = 26025
groente 9732 - 607 = 9125
lelutin 10768 - 2578 = 8190
jerome 6236 - 2586 = 3650
gaba 3164 - 491 = 2673
stephen.swift 2443 - 673 = 1770
kez 1034 - 74 = 960
micah 772 - 250 = 522
weasel 410 - 0 = 410
I wrote 126,116 words in that wiki, only in the last year. I also deleted 37k words, so the final total is more like 89k words, but still: that's about forty (40!) articles of the average size (~2k) I wrote in 2022. (And yes, I did go nuts and write a new log parser, essentially from scratch, to figure out those word diffs. I must admit I only got the courage after asking GPT-4o for an example first.) Let's celebrate that again: I wrote 90 thousand words in that wiki in 2024. According to Wikipedia, a "novella" is 17,500 to 40,000 words, which would mean I wrote about a novella and a novel, in the past year. But interestingly, if I look at the repository analytics, I certainly didn't write that much more in the past year. So that alone cannot explain the lull in my production here.
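For the curious, here is a very simplified sketch of what such a parser can look like (an illustration of the idea only, not the actual git-count-words-range.py):

#!/usr/bin/python3
# Rough sketch: count words added/removed per author using git's
# word-diff porcelain output over the last year.
import collections
import subprocess

log = subprocess.run(
    ["git", "log", "-p", "--since=1 year ago",
     "--format=commit %al", "--word-diff=porcelain"],
    capture_output=True, text=True, check=True,
).stdout

counts = collections.defaultdict(lambda: [0, 0])  # author -> [added, removed]
author = None
for line in log.splitlines():
    if line.startswith("commit "):
        author = line[len("commit "):].strip()
    elif author and line.startswith("+") and not line.startswith("+++"):
        counts[author][0] += len(line[1:].split())
    elif author and line.startswith("-") and not line.startswith("---"):
        counts[author][1] += len(line[1:].split())

for author, (added, removed) in sorted(
        counts.items(), key=lambda kv: kv[1][0] - kv[1][1], reverse=True):
    print(f"{author} {added} - {removed} = {added - removed}")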

Arguments Another part of me is just tired of the bickering and arguing on the internet. I have at least two articles in there that I suspect are going to get me a lot of push-back (NixOS and Fish). I know how to deal with this: you need to write well, consider the controversy, spell it out, and defuse things before they happen. But that's hard work and, frankly, I don't really care that much about what people think anymore. I'm not writing here to convince people. I stopped evangelizing a long time ago. Now, I'm more into documenting, and teaching. And, while teaching, there's a two-way interaction: when you give a speech or workshop, people can ask questions, or respond, and you all learn something. When you document, you quickly get told "where is this? I couldn't find it" or "I don't understand this" or "I tried that and it didn't work" or "wait, really? shouldn't we do X instead", and you learn. Here, it's static. It's my little soapbox where I scream in the void. The only thing people can do is scream back.

Collaboration So. Let's see if we can work together here. If you don't like something I say, disagree, or find something wrong or to be improved, instead of screaming on social media or ignoring me, try contributing back. This site here is backed by a git repository and I promise to read everything you send there, whether it is an issue or a merge request. I will, of course, still read comments sent by email or IRC or social media, but please, be kind. You can also, of course, follow the latest changes on the TPA wiki. If you want to catch up with the last year, some of the "novellas" I wrote include: (Well, no, you can't actually follow changes on a GitLab wiki. But we have a wiki-replica git repository where you can see the latest commits, and subscribe to the RSS feed.) See you there!

Antoine Beaupré: Qalculate hacks

This is going to be a controversial statement because some people are absolute nerds about this, but I need to say it. Qalculate is the best calculator that has ever been made. I am not going to try to convince you of this, I just wanted to put my bias out there before writing down those notes. I am a total fan. This page will collect my notes of cool hacks I do with Qalculate. Most examples are copy-pasted from the command-line interface (qalc(1)), but I typically use the graphical interface as it's slightly better at displaying complex formulas. Discoverability is obviously also better for the cornucopia of features this fantastic application ships.

Qalc commandline primer On Debian, Qalculate's CLI interface can be installed with:
apt install qalc
Then you start it with the qalc command, and end up on a prompt:
anarcat@angela:~$ qalc
> 
Then it's a normal calculator:
anarcat@angela:~$ qalc
> 1+1
  1 + 1 = 2
> 1/7
  1 / 7 ≈ 0.1429
> pi
  pi ≈ 3.142
> 
There's a bunch of variables to control display, approximation, and so on:
> set precision 6
> 1/7
  1 / 7 ≈ 0.142857
> set precision 20
> pi
  pi ≈ 3.1415926535897932385
When I need more, I typically browse around the menus. One big issue I have with Qalculate is there are a lot of menus and features. I had to fiddle quite a bit to figure out that set precision command above. I might add more examples here as I find them.

Bandwidth estimates I often use the data units to estimate bandwidths. For example, here's what 1 megabit per second is over a month ("about 300 GiB"):
> 1 megabit/s * 30 day to gibibyte 
  (1 megabit/second) × (30 days) ≈ 301.7 GiB
Or, "how long will it take to download X", in this case, 1GiB over a 100 mbps link:
> 1GiB/(100 megabit/s)
  (1 gibibyte) / (100 megabits/second) ≈ 1 min + 25.90 s
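Another example of the same kind (not from my actual history): how much a constant 5 Mbps stream adds up to over an hour:

> 5 megabit/s * 1 hour to gigabyte
  (5 megabits/second) × (1 hour) = 2.25 GB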

Password entropy To calculate how much entropy (in bits) a given password structure has, you count the number of possibilities for each entry (say, [a-z] is 26 possibilities, "one word in an 8k dictionary" is 8000), take the base-2 logarithm, and multiply by the number of entries. For example, an alphabetic 14-character password is:
> log2(26*2)*14
  log₂(26 × 2) × 14 ≈ 79.81
... 80 bits of entropy. To get the equivalent in a Diceware password with a 8000 word dictionary, you would need:
> log2(8k)*x = 80
  (log₂(8 × 1000) × x) = 80
  x ≈ 6.170
... about 6 words, which gives you:
> log2(8k)*6
  log₂(8 × 1000) × 6 ≈ 77.79
78 bits of entropy.
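For comparison (an extra example, not in my original notes), adding digits to the mix ([a-zA-Z0-9], 62 possibilities per character) brings a 14-character password to:

> log2(26*2+10)*14
  log₂(26 × 2 + 10) × 14 ≈ 83.36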

Exchange rates You can convert between currencies!
> 1 EUR to USD
  1 EUR ≈ 1.038 USD
Even fake ones!
> 1 BTC to USD
  1 BTC ≈ 96712 USD
This relies on a database pulled from the internet (typically the European Central Bank rates, see the source). It will prompt you if it's too old:
It has been 256 days since the exchange rates last were updated.
Do you wish to update the exchange rates now? y
As a reader pointed out, you can set the refresh rate for currencies, as some countries will require way more frequent exchange rates. The graphical version has a little graphical indicator that, when you mouse over, tells you where the rate comes from.

Other conversions Here are other neat conversions extracted from my history
> teaspoon to ml
  teaspoon = 5 mL
> tablespoon to ml
  tablespoon = 15 mL
> 1 cup to ml 
  1 cup ≈ 236.6 mL
> 6 L/100km to mpg
  (6 liters) / (100 kilometers) ≈ 39.20 mpg
> 100 kph to mph
  100 kph ≈ 62.14 mph
> (108km - 72km) / 110km/h
  ((108 kilometers) − (72 kilometers)) / (110 kilometers/hour) ≈
  19 min + 38.18 s

Completion time estimates This is a more involved example I often do.

Background Say you have started a long running copy job and you don't have the luxury of having a pipe you can insert pv(1) into to get a nice progress bar. For example, rsync or cp -R can have that problem (but not tar!). (Yes, you can use --info=progress2 in rsync, but that estimate is incremental and therefore inaccurate unless you disable the incremental mode with --no-inc-recursive, but then you pay a huge up-front wait cost while the entire directory gets crawled.)

Extracting a process start time First step is to gather data. Find the process start time. If you were unfortunate enough to forget to run date --iso-8601=seconds before starting, you can get a similar timestamp with stat(1) on the process tree in /proc with:
$ stat /proc/11232
  File: /proc/11232
  Size: 0               Blocks: 0          IO Block: 1024   directory
Device: 0,21    Inode: 57021       Links: 9
Access: (0555/dr-xr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2025-02-07 15:50:25.287220819 -0500
Modify: 2025-02-07 15:50:25.287220819 -0500
Change: 2025-02-07 15:50:25.287220819 -0500
 Birth: -
So our start time is 2025-02-07 15:50:25; we shave off the nanoseconds there, they're below our precision noise floor. If you're not dealing with an actual UNIX process, you need to figure out a start time: this can be a SQL query, a network request, whatever; exercise for the reader.
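As an aside (not in my original notes), ps can also print a process start time directly, which would have given the same information:

$ ps -o lstart= -p 11232
Fri Feb  7 15:50:25 2025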

Saving a variable This is optional, but for the sake of demonstration, let's save this as a variable:
> start="2025-02-07 15:50:25"
  save("2025-02-07T15:50:25"; start; Temporary; ; 1) =
  "2025-02-07T15:50:25"

Estimating data size Next, estimate your data size. That will vary wildly with the job you're running: this can be anything: number of files, documents being processed, rows to be destroyed in a database, whatever. In this case, rsync tells me how many bytes it has transferred so far:
# rsync -ASHaXx --info=progress2 /srv/ /srv-zfs/
2.968.252.503.968  94%    7,63MB/s    6:04:58  xfr#464440, ir-chk=1000/982266) 
Strip off the weird dots in there, because that will confuse qalculate, which will count this as:
  2.968252503968 bytes ≈ 2.968 B
Or, essentially, three bytes. We actually transferred almost 3TB here:
  2968252503968 bytes ≈ 2.968 TB
So let's use that. If you had the misfortune of making rsync silent, but were lucky enough to transfer entire partitions, you can use df (without -h! we want to be more precise here), in my case:
Filesystem              1K-blocks       Used  Available Use% Mounted on
/dev/mapper/vg_hdd-srv 7512681384 7258298036  179205040  98% /srv
tank/srv               7667173248 2870444032 4796729216  38% /srv-zfs
(Otherwise, of course, you use du -sh $DIRECTORY.)

Digression over bytes Those are 1 K bytes which is actually (and rather unfortunately) Ki, or "kibibytes" (1024 bytes), not "kilobytes" (1000 bytes). Ugh.
> 2870444032 KiB
  2870444032 kibibytes ≈ 2.939 TB
> 2870444032 kB
  2870444032 kilobytes ≈ 2.870 TB
At this scale, those details matter quite a bit, we're talking about a 69GB (64GiB) difference here:
> 2870444032 KiB - 2870444032 kB
  (2870444032 kibibytes) − (2870444032 kilobytes) ≈ 68.89 GB
Anyways. Let's take 2968252503968 bytes as our current progress. Our entire dataset is 7258298064 KiB, as seen above.

Solving a cross-multiplication We have 3 out of four variables for our equation here, so we can already solve:
> (now-start)/x = (2996538438607 bytes)/(7258298064 KiB) to h
  ((actual − start) / x) = ((2996538438607 bytes) / (7258298064
  kibibytes))
  x ≈ 59.24 h
The entire transfer will take about 60 hours to complete! Note that's not the time left, that is the total time. To break this down step by step, we could calculate how long it has taken so far:
> now-start
  now − start ≈ 23 h + 53 min + 6.762 s
> now-start to s
  now − start ≈ 85987 s
... and do the cross-multiplication manually, it's basically:
x/(now-start) = (total/current)
so:
x = (total/current) * (now-start)
or, in Qalc:
> ((7258298064  kibibytes) / ( 2996538438607 bytes) ) *  85987 s
  ((7258298064 kibibytes) / (2996538438607 bytes)) × (85987 secondes) ≈
  2 d + 11 h + 14 min + 38.81 s
It's interesting it gives us different units here! Not sure why.

Now and built-in variables The now here is actually a built-in variable:
> now
  now ≈ "2025-02-08T22:25:25"
There is a bewildering list of such variables, for example:
> uptime
  uptime = 5 d + 6 h + 34 min + 12.11 s
> golden
  golden ≈ 1.618
> exact
  golden = (√(5) + 1) / 2

Computing dates In any case, yay! We know the transfer is going to take roughly 60 hours total, and we've already spent around 24h of that, so, we have 36h left. But I did that all in my head, we can ask more of Qalc yet! Let's make another variable, for that total estimated time:
> total=(now-start)/x = (2996538438607 bytes)/(7258298064 KiB)
  save(((now − start) / x) = ((2996538438607 bytes) / (7258298064
  kibibytes)); total; Temporary; ; 1) ≈
  2 d + 11 h + 14 min + 38.22 s
And we can plug that into another formula with our start time to figure out when we'll be done!
> start+total
  start + total ≈ "2025-02-10T03:28:52"
> start+total-now
  start + total − now ≈ 1 d + 11 h + 34 min + 48.52 s
> start+total-now to h
  start + total − now ≈ 35 h + 34 min + 32.01 s
That transfer has ~1d left, or 35h34m32s, and should complete around 4 in the morning on February 10th. But that's icing on top. I typically only do the cross-multiplication and calculate the remaining time in my head. I mostly did the last bit to show Qalculate could compute dates and time differences, as long as you use ISO timestamps. Although it can also convert to and from UNIX timestamps, it cannot parse arbitrary date strings (yet?).

Other functionality Qalculate can:
  • Plot graphs;
  • Use RPN input;
  • Do all sorts of algebraic, calculus, matrix, statistics, trigonometry functions (and more!);
  • ... and so much more!
I have a hard time finding things it cannot do. When I get there, I typically need to resort to programming code in Python, use a spreadsheet, and others will turn to more complete engines like Maple, Mathematica or R. But for daily use, Qalculate is just fantastic. And it's pink! Use it!

Further reading and installation This is just scratching the surface, the fine manual has more information, including more examples. There is also of course a qalc(1) manual page which also ships an excellent EXAMPLES section. Qalculate is packaged for over 30 Linux distributions, but also ships packages for Windows and MacOS. There are third-party derivatives as well including a web version and an Android app.

Updates Colin Watson liked this blog post and was inspired to write his own hacks, similar to what's here, but with extras. Check it out!

Petter Reinholdtsen: New oggz release 1.1.2 after 15 years

A little over a week ago, I noticed the liboggz package on my Debian dashboard had not had a new upstream release for a while. A closer look showed that its last release, version 1.1.1, happened in 2010. A few patches had accumulated in the Debian package, and I even noticed that I had passed on these patches to upstream five years ago. A handful of crash bugs had been reported against the Debian package, and looking at the upstream repository I even found a few crash bugs reported there too. To add insult to injury, I discovered that upstream had accumulated several fixes in the years between 2010 and now, and many of them had not made their way into the Debian package. I decided enough was enough, and that a new upstream release was needed fixing these nasty crash bugs. Luckily I am also a member of the Xiph team, aka upstream, and could actually go to work immediately to fix it.

I started by adding automatic build testing on the Xiph gitlab oggz instance, to get a better idea of the state of affairs with the code base. This exposed a few build problems, which I had to fix. In parallel to this, I sent an email announcing my wish for a new release to every person who had committed to the upstream code base since 2010, and asked for help doing a new release both on email and on the #xiph IRC channel. Sadly only a fraction of their email providers accepted my email. But Ralph Giles in the Xiph team came to the rescue and provided invaluable help to guide me through the Xiph release process. While this was going on, I spent a few days tracking down the crash bugs with good help from valgrind, and came up with patch proposals to get rid of at least these specific crash bugs. The open issues also had to be checked. Several of them proved to be fixed already, but a few I had to create patches for. I also checked out the Debian, Arch, Fedora, Suse and Gentoo packages to see if there were patches applied in these Linux distributions that should be passed upstream.

The end result was ready yesterday. A new liboggz release, version 1.1.2, was tagged, wrapped up and published on the project page. And today, the new release was uploaded into Debian.

You are probably by now curious about what actually changed in the library. I guess the most interesting new feature was support for Opus and VP8. Almost all other changes were stability or documentation fixes. The rest were related to the gitlab continuous integration testing. All in all, this was really a minor update, hence the version bump only from 1.1.1 to 1.1.2, but it was long overdue and I am very happy that it is out the door. One change proposed upstream was not included this time, as it extended the API and changed some of the existing library methods, and thus would require a major SONAME bump and possibly code changes in every program using the library. As I am not that familiar with the code base, I am unsure if I am the right person to evaluate the change. Perhaps later.

Since the release was tagged, a few minor fixes have been committed upstream already: automatic testing of cross-building to Windows, and documentation updates linking to the correct project page. If an important issue is discovered with this release, I guess a new release might happen soon including the minor fixes. If not, perhaps they can wait fifteen years. :)
I would like to send a big thank you to everyone that helped make this release happen, from the people adding fixes upstream over the course of fifteen years, to the ones reporting crash bugs, other bugs and those maintaining the package in various Linux distributions. Thank you very much for your time and interest. As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

8 February 2025

Erich Schubert: Azul's State-of-Java report is nonsense

Azul's State-of-Java report is full of nonsense, and not worth looking at. The report makes various claims about the adoption of AI in the Java ecosystem, but its results do not make any sense when looked at in detail. For example (in the AI section): I can only guess that people picked some random plausible answer, but were not actually using any of that. Probably because of bad incentives:
Participants were offered token compensation for their participation.
It seems like Dimensional Research, the company that ran the survey, screwed up badly.

7 February 2025

Dirk Eddelbuettel: zigg 0.0.2 on CRAN: Micromaintenance

The still very new package zigg, which arrived on CRAN a week ago, just received a micro-update at CRAN. zigg provides the Ziggurat pseudo-random number generator (PRNG) for Normal, Exponential and Uniform draws proposed by Marsaglia and Tsang (JSS, 2000), and extended by Leong et al. (JSS, 2005). This PRNG is lightweight and very fast: on my machine speedups for the Normal, Exponential, and Uniform are on the order of 7.4, 5.2 and 4.7 times faster than the default generators in R, as illustrated in the benchmark chart borrowed from the git repo. As I wrote last week in the initial announcement, I had picked up their work in package RcppZiggurat and updated its code for the 64-bit world we now live in. That package already provided the Normal generator along with several competing implementations which it compared and timed rigorously. As one of the generators was based on the GNU GSL via the implementation of Voss, we always ended up with a run-time dependency on the GSL too. No more: this new package is zero-dependency, zero-Suggests and hence very easy to deploy. Moreover, we also include a demonstration of four distinct ways of accessing the compiled code from another R package: pure and straight-up C, similarly pure C++, inclusion of the header in C++, as well as via Rcpp. The other advance is the resurrection of the second generator for the Exponential distribution. And following Burkardt we expose the Uniform too. The main upside of these generators is their excellent speed, as can be seen in the comparison against the default R generators produced by the example script timings.R. Needless to say, speed is not everything. This PRNG comes from the time of 32-bit computing so the generator period is likely to be shorter than that of newer high-quality generators. If in doubt, forgo speed and stick with the high-quality default generators. This release essentially just completes the DESCRIPTION file and README.md now that this is a CRAN package. A small usage sketch follows, and then the short NEWS entry.
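As a quick illustration, a minimal sketch follows, assuming the generators are exported as zrnorm(), zrexp() and zrunif(); please check the package documentation or the timings.R example script for the authoritative interface.
library(zigg)              # assumed: exports zrnorm(), zrexp() and zrunif()
n <- 1e6
x <- zrnorm(n)             # Normal draws via the Ziggurat method
y <- zrexp(n)              # Exponential draws
u <- zrunif(n)             # Uniform draws
# rough timing comparison against the default R generators
system.time(zrnorm(n))
system.time(rnorm(n))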

Changes in version 0.0.2 (2025-02-07)
  • Complete DESCRIPTION and README.md following initial CRAN upload

Courtesy of my CRANberries, there is a diffstat report relative to the previous release. For more information, see the package page or the git repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

6 February 2025

Dominique Dumont: Drawbacks of using Cookiecutter with Cruft

Hi. Cookiecutter is a tool for building coding project templates. It's often used to provide scaffolding to build lots of similar projects. I've seen it used to create Symfony projects and several cloud infrastructures deployed with Terraform. This tool was useful to accelerate the creation of new projects. Since these templates were bound to evolve, the teams providing these templates relied on cruft to update the code provided by the template in their users' code. In other words, they wanted their users to apply a diff of the template modification to their code. At the beginning, all was fine. But problems began to appear during the lifetime of these projects. What went wrong? In both cases, we had the following scenario: the user team giving up on updates is a major problem, because these updates may bring security or compliance fixes. Note that code conflicts seen with cruft are similar to git merge conflicts, but harder to resolve because, unlike with a git merge, there's no common ancestor, so 3-way merges are not possible. From an organisation point of view, the main problem is the ambiguous ownership of the functionalities brought by template code: who owns this code? The provider team who writes the template, or the user team who owns the repository of the code generated from the template? Conflicts are bound to happen. Possible solutions to get out of this tar pit: Of course your users won't be happy to be faced with a manual migration from the old big template to the new one with external dependencies. On the other hand, this may be easier to sell than updates based on cruft since the painful work will happen once. Further updates will be done by incrementing dependency versions (which can be automated with renovate). If many projects are to be created with this template, it may be more practical to provide a CLI that will create a skeleton project. See for instance the terragrunt scaffold command. My name is Dominique Dumont, I'm a devops freelancer. You can find the devops and audit services I propose on my website or reach out to me on LinkedIn. All the best

Bits from Debian: Proxmox Platinum Sponsor of DebConf25

We are pleased to announce that Proxmox has committed to sponsor DebConf25 as a Platinum Sponsor. Proxmox develops powerful, yet easy-to-use Open Source server software. The product portfolio from Proxmox, including server virtualization, backup, and email security, helps companies of any size, sector, or industry to simplify their IT infrastructures. The Proxmox solutions are based on the great Debian platform, and we are happy that we can give back to the community by sponsoring DebConf25. With this commitment as Platinum Sponsor, Proxmox is contributing to the annual Debian Developers' Conference, directly supporting the progress of Debian and Free Software. Proxmox contributes to strengthening the community that collaborates on Debian projects from all around the world throughout the year. Thank you very much, Proxmox, for your support of DebConf25! Become a sponsor too! DebConf25 will take place from 14 to 20 July 2025 in Brest, France, and will be preceded by DebCamp, from 7 to 13 July 2025. DebConf25 is accepting sponsors! Interested companies and organizations may contact the DebConf team through sponsors@debconf.org, and visit the DebConf25 website at https://debconf25.debconf.org/sponsors/become-a-sponsor/.

5 February 2025

Reproducible Builds: Reproducible Builds in January 2025

Welcome to the first report in 2025 from the Reproducible Builds project! Our monthly reports outline what we've been up to over the past month and highlight items of news from elsewhere in the world of software supply-chain security when relevant. As usual, though, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. Table of contents:
  1. reproduce.debian.net
  2. Two new academic papers
  3. Distribution work
  4. On our mailing list
  5. Upstream patches
  6. diffoscope
  7. Website updates
  8. Reproducibility testing framework

reproduce.debian.net The last few months saw the introduction of reproduce.debian.net. Announced at the recent Debian MiniDebConf in Toulouse, reproduce.debian.net is an instance of rebuilderd operated by the Reproducible Builds project. Powering it is rebuilderd, our server designed to monitor the official package repositories of Linux distributions and attempt to reproduce the observed results there. This month, however, we are pleased to announce that in addition to the existing amd64.reproduce.debian.net and i386.reproduce.debian.net architecture-specific pages, we now build for three more architectures (for a total of five): arm64, armhf and riscv64.

Two new academic papers Giacomo Benedetti, Oreofe Solarin, Courtney Miller, Greg Tystahl, William Enck, Christian Kästner, Alexandros Kapravelos, Alessio Merlo and Luca Verderame published an interesting article recently. Titled An Empirical Study on Reproducible Packaging in Open-Source Ecosystem, the abstract outlines its optimistic findings:
[We] identified that with relatively straightforward infrastructure configuration and patching of build tools, we can achieve very high rates of reproducible builds in all studied ecosystems. We conclude that if the ecosystems adopt our suggestions, the build process of published packages can be independently confirmed for nearly all packages without individual developer actions, and doing so will prevent significant future software supply chain attacks.
The entire PDF is available online to view.
In addition, Julien Malka, Stefano Zacchiroli and Théo Zimmermann of Télécom Paris' in-house research laboratory, the Information Processing and Communications Laboratory (LTCI), published an article asking the question: Does Functional Package Management Enable Reproducible Builds at Scale?. Answering strongly in the affirmative, the article's abstract reads as follows:
In this work, we perform the first large-scale study of bitwise reproducibility, in the context of the Nix functional package manager, rebuilding 709,816 packages from historical snapshots of the nixpkgs repository[. We] obtain very high bitwise reproducibility rates, between 69 and 91% with an upward trend, and even higher rebuildability rates, over 99%. We investigate unreproducibility causes, showing that about 15% of failures are due to embedded build dates. We release a novel dataset with all build statuses, logs, as well as full diffoscopes: recursive diffs of where unreproducible build artifacts differ.
As above, the entire PDF of the article is available to view online.

Distribution work There has been the usual work in various distributions this month, such as:
  • 10+ reviews of Debian packages were added, 11 were updated and 10 were removed this month, adding to our knowledge about identified issues. A number of issue types were updated as well.
  • The FreeBSD Foundation announced that a planned project to deliver zero-trust builds began in January 2025. Supported by the Sovereign Tech Agency, this project is centered on the various build processes; the primary goal of this work is to enable the entire release process to run without requiring root access, and to have build artifacts build reproducibly, that is, so that a third party can build bit-for-bit identical artifacts. The full announcement can be found online, and it includes an estimated schedule and other details.

On our mailing list On our mailing list this month:
  • Following up on a substantial amount of previous work pertaining to the Sphinx documentation generator, James Addison asked a question about the relationship between the SOURCE_DATE_EPOCH environment variable and testing, which generated a number of replies.
  • Adithya Balakumar of Toshiba asked a question about whether it is possible to make ext4 filesystem images reproducible. Adithya's issue is that even the smallest amount of post-processing of the filesystem results in the modification of the Last mount and Last write timestamps.
  • James Addison also investigated an interesting issue surrounding our disorderfs filesystem. In particular:
    FUSE (Filesystem in USErspace) filesystems such as disorderfs do not delete files from the underlying filesystem when they are deleted from the overlay. This can cause seemingly straightforward tests (for example, cases that expect directory contents to be empty after deletion is requested for all files listed within them) to fail.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

diffoscope diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 285, 286 and 287 to Debian:
  • Security fixes:
    • Validate the --css command-line argument to prevent a potential Cross-site scripting (XSS) attack. Thanks to Daniel Schmidt from SRLabs for the report. [ ]
    • Prevent XML entity expansion attacks. Thanks to Florian Wilkens from SRLabs for the report. [ ][ ]
    • Print a warning if we have disabled XML comparisons due to a potentially vulnerable version of pyexpat. [ ]
  • Bug fixes:
    • Correctly identify changes to only the line-endings of files; don't mark them as Ordering differences only. [ ]
    • When passing files on the command line, don't call specialize(…) before we've checked whether the files are identical or not. [ ]
    • Do not exit with a traceback if paths are inaccessible, either directly, via symbolic links or within a directory. [ ]
    • Don't cause a traceback if cbfstool extraction failed. [ ]
    • Use the surrogateescape mechanism to avoid a UnicodeDecodeError and crash when decoding any zipinfo output that is not UTF-8 compliant. [ ]
  • Testsuite improvements:
    • Don't mangle newlines when opening test fixtures; we want them untouched. [ ]
    • Move to assert_diff in test_text.py. [ ]
  • Misc improvements:
    • Drop unused subprocess imports. [ ][ ]
    • Drop an unused function in iso9660.py. [ ]
    • Inline a call and check of Config().force_details; no need for an additional variable in this particular method. [ ]
    • Remove an unnecessary return value from the Difference.check_for_ordering_differences method. [ ]
    • Remove unused logging facility from a few comparators. [ ]
    • Update copyright years. [ ][ ]
In addition, fridtjof added support for the ASAR .tar-like archive format. [ ][ ][ ][ ] Lastly, Vagrant Cascadian updated diffoscope in GNU Guix to version 285 [ ][ ] and 286 [ ][ ].
strip-nondeterminism is our sister tool to remove specific non-deterministic results from a completed build. This month version 1.14.1-1 was uploaded to Debian unstable by Chris Lamb, making the following changes:
  • Clarify the --verbose and non --verbose output of bin/strip-nondeterminism so we don't imply we are normalizing files that we are not. [ ]
  • Bump Standards-Version to 4.7.0. [ ]

Website updates There were a large number of improvements made to our website this month, including:

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In January, a number of changes were made by Holger Levsen, including:
  • reproduce.debian.net-related:
    • Add support for rebuilding the armhf architecture. [ ][ ]
    • Add support for rebuilding the arm64 architecture. [ ][ ][ ][ ]
    • Add support for rebuilding the riscv64 architecture. [ ][ ]
    • Move the i386 builder to the osuosl5 node. [ ][ ][ ][ ]
    • Don't run our rebuilders on a public port. [ ][ ]
    • Add database backups on all builders and add links. [ ][ ]
    • Rework and dramatically improve the statistics collection and generation. [ ][ ][ ][ ][ ][ ]
    • Add contact info to the main page [ ], thumbnails [ ] as well as the new, missing architectures. [ ]
    • Move the amd64 worker to the osuosl4 node. [ ]
    • Run the underlying debrebuild script under nice. [ ]
    • Try to use TMPDIR when calling debrebuild. [ ][ ]
  • buildinfos.debian.net-related:
    • Stop creating buildinfo-pool_${suite}_${arch}.list files. [ ]
    • Temporarily disable automatic updates of pool links. [ ]
  • FreeBSD-related:
    • Fix the sudoers to actually permit builds. [ ]
    • Disable debug output for FreeBSD rebuilding jobs. [ ]
    • Upgrade to FreeBSD 14.2 [ ] and document that bmake was installed on the underlying FreeBSD virtual machine image [ ].
  • Misc:
    • Update the real year to 2025. [ ]
    • Don't try to install a Debian bookworm kernel from backports on the infom08 node which is running Debian trixie. [ ]
    • Don't warn about system updates for systems running Debian testing. [ ]
    • Fix a typo in the ZOMBIES definition. [ ][ ]
In addition:
  • Ed Maste modified the FreeBSD build system to clean the object directory before commencing a build. [ ]
  • Gioele Barabucci updated the rebuilder stats to first add a category for network errors [ ] as well as to categorise failures without a diffoscope log [ ].
  • Jessica Clarke also made some FreeBSD-related changes, including:
    • Ensuring we clean up the object directory for the second build as well. [ ][ ]
    • Updating the sudoers for the relevant rm -rf command. [ ]
    • Update the cleanup_tmpdirs method to match other removals. [ ]
  • Jochen Sprickerhof:
  • Roland Clobus:
    • Update the reproducible_debstrap job to call Debian's debootstrap with the full path [ ] and to use eatmydata as well [ ][ ].
    • Make some changes to deduce the CPU load in the debian_live_build job. [ ]
Lastly, both Holger Levsen [ ] and Vagrant Cascadian [ ] performed some node maintenance.
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

4 February 2025

Paul Wise: FLOSS Activities January 2025

Focus This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Sponsors All work was done on a volunteer basis.

2 February 2025

Dirk Eddelbuettel: RcppUUID 1.1.2 on CRAN: Newly Adopted Package

The RcppUUID package on CRAN has been providing UUIDs (based on the underlying Boost library) for several years. Written by Artem Klemsov and maintained in this gitlab repo, the package is a very nice example of clean and straightforward library binding. When we did our annual BH upgrade to 1.87.0 and checked reverse dependencies, we noticed that RcppUUID needed a small and rather minor update, which we showed as a short diff in an issue we filed. Neither I nor CRAN heard from Artem, so the package ended up being archived last week. Which in turn led me to make this minimal update to 1.1.2 to resurrect it, which CRAN processed more or less like a regular update given this explanation, and so it arrived last Friday. Courtesy of my CRANberries, there is also a new package note (no diffstat report yet). More detailed information is on the RcppUUID page, or the github repo.
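For completeness, a minimal usage sketch follows, assuming the random (version 4) generator is exported as uuid_generate_random(); please check the package documentation for the authoritative interface.
library(RcppUUID)
# generate three random (version 4) UUIDs -- assumed interface, see package docs
uuids <- uuid_generate_random(3)
print(uuids)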

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

Joachim Breitner: Coding on my eInk Tablet

For many years I wished I had a setup that would allow me to work (that is, code) productively outside in the bright sun. It's winter right now, but when it's summer again it's always a bit. This weekend I got closer to that goal. TL;DR: Using code-server on a beefy machine seems to be quite neat.
Passively lit coding

Personal history Looking back at my own old blog entries I find one from 10 years ago describing how I bought a Kobo eBook reader with the intent of using it as an external monitor for my laptop. It seems that I got a proof-of-concept setup working, using VNC, but it was tedious to set up, and I never actually used that. I subsequently noticed that the eBook reader is rather useful to read eBooks, and it has been in heavy use for that ever since. Four years ago I gave this old idea another shot and bought an Onyx BOOX Max Lumi. This is an A4-sized tablet running Android and had the very promising feature of an HDMI input. So hopefully I'd attach it to my laptop and it would just work. Turns out that this never worked as well as I hoped: even if I set the resolution to exactly the tablet's screen's resolution I got blurry output, and it also drained the battery a lot, so I gave up on this. I subsequently noticed that the tablet is rather useful to take notes, and it has been in sporadic use for that. Going off on this tangent: I later learned that the HDMI input of this device appears to the system like a camera input, and I don't have to use Boox's monitor app but could use other apps like FreeDCam as well. This somehow managed to fix the resolution issues, but the setup still wasn't convenient enough to be used regularly. I also played around with pure terminal approaches, e.g. SSH'ing into a system, but since my usual workflow was never purely text-based (I was at least used to using a window manager instead of a terminal multiplexer like screen or tmux) that never led anywhere either.

VSCode, working remotely Since these attempts I have started a new job working on the Lean theorem prover, and working on or with Lean basically means using VSCode. (There is a very good neovim plugin as well, but I'm using VSCode nevertheless, if only to make sure I am dogfooding our default user experience.) My colleagues have said good things about using VSCode with the remote SSH extension to work on a beefy machine, so I gave this a try now as well, and while it's not a complete game changer for me, it does make certain tasks (rebuilding everything after switching branches, running the test suite) very convenient. And it's a bit spooky to run these workloads without the laptop's fan spinning up. In this setup, the workspace is remote, but VSCode still runs locally. But it made me wonder about my old goal of being able to work reasonably efficiently on my eInk tablet. Can I replicate this setup there? VSCode itself doesn't run on Android directly. There are projects that run a Linux chroot or termux on the Android system, and then you can connect to it via VNC (e.g. on Andronix), but that did not seem promising. It seemed fiddly, and I probably should take it easy on the tablet's system.

code-server, running remotely A more promising option is code-server. This is a fork of VSCode (actually of VSCodium) that runs completely on the remote machine, and the client machine just needs a browser. I set that up this weekend and found that I was able to do a little bit of work reasonably well.

Access With code-server one has to decide how to expose it safely enough. I decided against the tunnel-over-SSH option, as I expected that to be somewhat tedious to set up (both initially and for each session) on the Android system, and I liked the idea of being able to use any device to work in my environment. I also decided against the more involved setups with a reverse proxy behind a proper hostname with SSL, because they involve a few extra steps, and some of them I cannot do as I do not have root access on the shared beefy machine I wanted to use. That left me with the option of using code-server's built-in support for self-signed certificates and a password:
$ cat .config/code-server/config.yaml
bind-addr: 1.2.3.4:8080
auth: password
password: xxxxxxxxxxxxxxxxxxxxxxxx
cert: true
With trust-on-first-use this seems reasonably secure. Update: I noticed that the browsers would forget that I trust this self-signed cert after restarting the browser, and also that I cannot install the page (as a Progressive Web App) unless it has a valid certificate. But since I don't have superuser access to that machine, I can't just follow the official recommendation of using a reverse proxy on port 80 or 443 with automatic certificates. Instead, I pointed a hostname that I control to that machine, obtained a certificate manually on my laptop (using acme.sh) and copied the files over, so the configuration now reads as follows:
bind-addr: 1.2.3.4:3933
auth: password
password: xxxxxxxxxxxxxxxxxxxxxxxx
cert: .acme.sh/foobar.nomeata.de_ecc/foobar.nomeata.de.cer
cert-key: .acme.sh/foobar.nomeata.de_ecc/foobar.nomeata.de.key
(This is getting very specific to my particular needs and constraints, so I'll spare you the details.)

Service To keep code-server running I created a systemd service that's managed by my user's systemd instance:
~ $ cat ~/.config/systemd/user/code-server.service
[Unit]
Description=code-server
After=network-online.target
[Service]
Environment=PATH=/home/joachim/.nix-profile/bin:/nix/var/nix/profiles/default/bin:/usr/local/bin:/usr/bin:/bin:/usr/local/games:/usr/games
ExecStart=/nix/var/nix/profiles/default/bin/nix run nixpkgs#code-server
[Install]
WantedBy=default.target
(I am using nix as a package manager on a Debian system there, hence the additional PATH and complex ExecStart. If you have a more conventional setup then you do not have to worry about Environment and can likely use ExecStart=code-server.) For this to survive me logging out I had to ask the system administrator to run loginctl enable-linger joachim, so that systemd allows my jobs to linger.

Git credentials The next issue to be solved was how to access the git repositories. The work is all on public repositories, but I still need a way to push my work. With the classic VSCode-SSH-remote setup from my laptop, this is no problem: my local SSH key is forwarded using the SSH agent, so I can seamlessly use that on the other side. But with code-server there is no SSH key involved. I could create a new SSH key and store it on the server. That did not seem appealing, though, because SSH keys on GitHub always have full access. It wouldn't be horrible, but I still wondered if I can do better. I thought of creating fine-grained personal access tokens that allow me to push code only to specific repositories, and nothing else, and just store them permanently on the remote server. Still a neat and convenient option, but creating PATs for our org requires approval and I didn't want to bother anyone on the weekend. So I am experimenting with GitHub's git-credential-manager now. I have configured it to use git's credential cache with an elevated timeout, so that once I log in, I don't have to again for one workday.
$ nix-env -iA nixpkgs.git-credential-manager
$ git-credential-manager configure
$ git config --global credential.credentialStore cache
$ git config --global credential.cacheOptions "--timeout 36000"
To log in, I have to go to https://github.com/login/device on an authenticated device (e.g. my phone) and enter an 8-character code. Not too shabby in terms of security. I only wish that webpage would not require me to press Tab after each character. This still grants rather broad permissions to the code-server, but at least only temporarily.

Android setup On the client side I could now open https://host.example.com:8080 in Firefox on my eInk Android tablet, click through the warning about self-signed certificates, log in with the fixed password mentioned above, and start working! I switched to a theme that supposedly is eInk-optimized (eInk by Mufanza). It's not perfect (e.g. git diffs are unhelpful because it is not possible to distinguish deleted from added lines), but it's a start. There are more eInk themes on the official Visual Studio Marketplace, but because code-server is a fork it cannot use that marketplace, and for example this theme isn't on Open-VSX. For some reason the F11 key doesn't work, but going fullscreen is crucial, because screen estate is scarce in this setup. I can go fullscreen using VSCode's command palette (Ctrl-P) and invoking the command there, but Firefox often jumps out of the fullscreen mode, which is annoying. I still have to pay attention to when that's happening; maybe it's the Esc key, which I am of course using a lot due to me using vim bindings. A more annoying problem was that on my Boox tablet, sometimes the on-screen keyboard would pop up, which is seriously annoying! It took me a while to track this down: the Boox has two virtual keyboards installed: the usual Google AOSP keyboard, and the Onyx Keyboard. The former is clever enough to stay hidden when there is a physical keyboard attached, but the latter isn't. Moreover, pressing Shift-Ctrl on the physical keyboard rotates through the virtual keyboards. Now, VSCode has many keyboard shortcuts that require Shift-Ctrl (especially on an eInk device, where you really want to avoid using the mouse). And the limited settings exposed by the Boox Android system do not allow you to configure that or disable the Onyx keyboard! To solve this, I had to install the KISS Launcher, which would allow me to see more Android settings, and in particular allow me to disable the Onyx keyboard. So this is fixed. I was hoping to improve the experience even more by opening the web page as a Progressive Web App (PWA), as described in the code-server FAQ. Unfortunately, that did not work. Firefox on Android did not recognize the site as a PWA (even though it recognizes a PWA test page). And I couldn't use Chrome either because (unlike Firefox) it would not consider a site with a self-signed certificate as a secure context, and then code-server does not work fully. Maybe this is just some bug that gets fixed in later versions. Now that I use a proper certificate, I can use it as a Progressive Web App, and with Firefox on Android this starts the app in full-screen mode (no system bars, no location bar). The F11 key still doesn't work, and using the command palette to enter fullscreen does nothing visible, but then Esc leaves that fullscreen mode and I suddenly have the system bars again. But maybe if I just don't do that I get the full screen experience. We'll see. I did not work enough with this yet to assess how much the smaller screen estate, the lack of colors and the slower refresh rate will bother me. I probably need to hide Lean's InfoView more often, and maybe use the Error Lens extension, to avoid having to split my screen vertically. I also cannot easily work on a park bench this way, with a tablet and a separate external keyboard. I'd need at least a table, or some additional piece of hardware that turns tablet + keyboard into some laptop-like structure that I can put on my, well, lap. 
There are cases for Onyx products that include a keyboard, and maybe they work on the lap, but they don't have the TrackPoint that I have on my ThinkPad TrackPoint Keyboard II, and how can you live without that?

Conclusion After this initial setup, chances are good that entering and using this environment is convenient enough for me to actually use it; we will see when it gets warmer. A few bits could be better. In particular, logging in and authenticating GitHub access could be both more convenient and safer: I could imagine that when I open the page I confirm that on my phone (maybe with a fingerprint), and that this temporarily grants access to the code-server and to specific GitHub repositories only. Is that easily possible?

Anuradha Weeraman: DeepSeek-R1, at the cusp of an open revolution

DeepSeek R1, the new entrant to the Large Language Model wars, has created quite a splash over the last few weeks. Its entrance into a space dominated by the Big Corps, while pursuing asymmetric and novel strategies, has been a refreshing eye-opener. GPT AI improvement was starting to show signs of slowing down, and has been observed to be reaching a point of diminishing returns as it runs out of the data and compute required to train and fine-tune increasingly large models. This has turned the focus towards building "reasoning" models that are post-trained through reinforcement learning, techniques such as inference-time and test-time scaling, and search algorithms to make the models appear to think and reason better. OpenAI's o1-series models were the first to achieve this successfully with their inference-time scaling and Chain-of-Thought reasoning.

Intelligence as an emergent property of Reinforcement Learning (RL) Reinforcement Learning (RL) has been successfully used in the past by Google's DeepMind team to build highly intelligent and specialized systems where intelligence is observed as an emergent property through a rewards-based training approach that yielded achievements like AlphaGo (see my post on it here - AlphaGo: a journey to machine intuition). DeepMind went on to build a series of Alpha* projects that achieved many notable feats using RL:
  • AlphaGo defeated the world champion Lee Sedol in the game of Go
  • AlphaZero, a generalized system that learned to play games such as Chess, Shogi and Go without human input
  • AlphaStar achieved high performance in the complex real-time strategy game StarCraft II.
  • AlphaFold, a tool for predicting protein structures which significantly advanced computational biology.
  • AlphaCode, a model designed to generate computer programs, performing competitively in coding challenges.
  • AlphaDev, a system developed to discover novel algorithms, notably optimizing sorting algorithms beyond human-derived methods.
All of these systems achieved mastery in their own areas through self-training/self-play and by optimizing and maximizing the cumulative reward over time by interacting with their environment, where intelligence was observed as an emergent property of the system.
The RL feedback loop
RL mimics the process through which a baby would learn to walk, through trial, error and first principles.

R1 model training pipeline At a technical level, DeepSeek-R1 leverages a combination of Reinforcement Learning (RL) and Supervised Fine-Tuning (SFT) for its training pipeline:
DeepSeek-R1 Model Training Pipeline
Using RL and DeepSeek-v3, an interim reasoning model was built, called DeepSeek-R1-Zero, purely based on RL without relying on SFT, which demonstrated superior reasoning capabilities that matched the performance of OpenAI's o1 in certain benchmarks such as AIME 2024. The model was however affected by poor readability and language-mixing, and is only an interim reasoning model built on RL principles and self-evolution. DeepSeek-R1-Zero was then used to generate SFT data, which was combined with supervised data from DeepSeek-v3 to re-train the DeepSeek-v3-Base model. The new DeepSeek-v3-Base model then underwent additional RL with prompts and scenarios to come up with the DeepSeek-R1 model. The R1 model was then used to distill a number of smaller open source models such as Llama-8b, Qwen-7b and Qwen-14b, which outperformed bigger models by a large margin, effectively making the smaller models more accessible and usable.

Key contributions of DeepSeek-R1
  1. RL without the need for SFT for emergent reasoning capabilities
R1 was the first open research project to validate the efficacy of RL directly on the base model without relying on SFT as a first step, which resulted in the model developing advanced reasoning capabilities purely through self-reflection and self-verification. Although it did degrade in its language capabilities during the process, its Chain-of-Thought (CoT) capabilities for solving complex problems were later used for further RL on the DeepSeek-v3-Base model which became R1. This is a significant contribution back to the research community. The analysis below of DeepSeek-R1-Zero and OpenAI o1-0912 shows that it is viable to attain robust reasoning capabilities purely through RL alone, which can be further augmented with other techniques to deliver even better reasoning performance.
Source: https://github.com/deepseek-ai/DeepSeek-R1
It's quite interesting that the application of RL gives rise to seemingly human capabilities of "reflection" and arriving at "aha" moments, causing the model to pause, ponder and focus on a specific aspect of the problem, resulting in emergent capabilities to problem-solve as humans do.
  2. Model distillation
DeepSeek-R1 also demonstrated that larger models can be distilled into smaller models, which makes advanced capabilities accessible to resource-constrained environments, such as your laptop. While it's not possible to run a 671b model on a stock laptop, you can still run a 14b model distilled from the larger model, which still performs better than most publicly available models out there. This enables intelligence to be brought closer to the edge, to allow faster inference at the point of experience (such as on a smartphone, or on a Raspberry Pi), which paves the way for more use cases and possibilities for innovation.
Source: https://github.com/deepseek-ai/DeepSeek-R1
Distilled models are very different from R1, which is a massive model with a completely different model architecture than the distilled variants, and so they are not directly comparable in terms of capability, but are instead built to be smaller and more efficient for more constrained environments. This technique of being able to distill a larger model's capabilities down to a smaller model for portability, accessibility, speed, and cost will bring about a lot of possibilities for applying artificial intelligence in places where it would have otherwise not been possible. This is another key contribution of this technology from DeepSeek, which I believe has even further potential for democratization and accessibility of AI.

Why is this moment so significant? DeepSeek-R1 was a pivotal contribution in many ways.
  1. The contributions to the state-of-the-art and the open research help move the field forward where everybody benefits, not just a few highly funded AI labs building the next billion dollar model.
  2. Open-sourcing and making the model freely available follows an asymmetric strategy to the prevailing closed nature of much of the model-sphere of the larger players. DeepSeek should be commended for making their contributions free and open.
  3. It reminds us that it's not just a one-horse race, and it incentivizes competition, which has already resulted in OpenAI o3-mini, a cost-effective reasoning model which now shows the Chain-of-Thought reasoning. Competition is a good thing.
  4. We stand at the cusp of an explosion of small models that are hyper-specialized and optimized for a specific use case, which can be trained and deployed cheaply for solving problems at the edge. It raises a lot of exciting possibilities and is why DeepSeek-R1 is one of the most pivotal moments of tech history.
Truly exciting times. What will you build?

1 February 2025

Guido G nther: Free Software Activities January 2025

Another short status update of what happened on my side last month. Mostly focused on quality-of-life improvements in phosh and cleaning up and improving phoc this time around (including catching up with wlroots git), but some improvements for other things like phosh-osk-stub happened on the sideline too. phosh phoc phosh-osk-stub xdg-desktop-portal-phosh phosh-recipes libcmatrix phrog Debian git-buildpackage livi feedbackd Wayland protocols Wlroots Bug reports Reviews This is not code by me but reviews of other people's code. The list is incomplete, but I hope to improve on this in the upcoming months. Thanks for the contributions! Help Development If you want to support my work, see donations. Comments? Join the Fediverse thread

31 January 2025

Gunnar Wolf: ChatGPT is bullshit

This post is an unpublished review for ChatGPT is bullshit
As people around the world understand how LLMs behave, more and more people wonder why these models hallucinate, and what can be done to reduce it. This provocatively named article by Michael Townsen Hicks, James Humphries and Joe Slater brings us an excellent primer to better understand how LLMs work and what to expect from them. As humans carrying out our relations using our language as the main tool, we are easily in awe of the apparent ease with which ChatGPT (the first widely available, and to this day probably the best known, LLM-based automated chatbot) simulates human-like understanding, and of how it helps us to easily carry out even daunting data aggregation tasks. It is common that people ask ChatGPT for an answer and, if it gets part of the answer wrong, they justify it by stating that it's just a hallucination. Townsen et al. invite us to switch from that characterization to a more correct one: LLMs are bullshitting. This term is formally presented by Frankfurt [1]. To bullshit is not the same as to lie, because lying requires knowing (and wanting to cover) the truth. A bullshitter does not necessarily know the truth; they just have to provide a compelling description, regardless of what is really aligned with truth. After introducing Frankfurt's ideas, the authors explain the fundamental ideas behind LLM-based chatbots such as ChatGPT; Generative Pre-trained Transformers (GPTs) have as their only goal to produce human-like text, and this is carried out mainly by presenting output that matches the input's high-dimensional abstract vector representation, probabilistically outputting the next token (word) iteratively given the text produced so far. Clearly, a GPT's task is not to seek truth or to convey useful information: they are built to provide a normal-seeming response to the prompts provided by their user. Core data are not queried to find optimal solutions for the user's requests, but are generated on the requested topic, attempting to mimic the style of the document set it was trained with. Erroneous data emitted by an LLM is, thus, not comparable with what a person could hallucinate, but appears because the model has no understanding of truth; in a way, this is very fitting with the current state of the world, a time often termed the age of post-truth [2]. Requesting an LLM to provide truth in its answers is basically impossible, given the difference between intelligence and consciousness: following Harari's definitions [3], LLM systems, or any AI-based system, can be seen as intelligent, as they have the ability to attain goals in various, flexible ways, but they cannot be seen as conscious, as they have no ability to experience subjectivity. That is, the LLM is, by definition, bullshitting its way towards an answer: its goal is to provide an answer, not to interpret the world in a trustworthy way. The authors close their article with a plea for literature on the topic to adopt the more correct "bullshit" term instead of the vacuous, anthropomorphizing "hallucination". Of course, the word being already loaded with a negative meaning, it is an unlikely request. This is a great article that mixes together Computer Science and Philosophy, and can shed some light on a topic that is hard to grasp for many users. [1] Frankfurt, Harry (2005). On Bullshit. Princeton University Press. [2] Zoglauer, Thomas (2023). Constructed truths: truth and knowledge in a post-truth world. Springer. [3] Harari, Yuval Noah (2023). Nexus: A Brief History of Information Networks From the Stone Age to AI. Random House.
