Search Results: "eric"

13 April 2025

Keith Packard: sanitizer-fun

Fun with -fsanitize=undefined and Picolibc

Both GCC and Clang support the -fsanitize=undefined flag, which instruments the generated code to detect places where the program wanders into parts of the C language specification which are either undefined or implementation defined. Many of these are also common programming errors. It would be great if there were sanitizers for other easily detected bugs, but for now, at least the undefined sanitizer does catch several useful problems.

Supporting the sanitizer The sanitizer can be built to either trap on any error or call handlers. In both modes, the same problems are identified, but when trap mode is enabled, the compiler inserts a trap instruction and doesn't expect the program to continue running. When handlers are in use, each identified issue is tagged with a bunch of useful data and then a specific sanitizer handling function is called. The specific functions are not all that well documented, nor are the parameters they receive. Maybe this is because both compilers provide an implementation of all of the functions they use and don't really expect external implementations to exist? However, to make these useful in an embedded environment, picolibc needs to provide a complete set of handlers that supports all versions of both gcc and clang, as the compiler-provided versions depend upon specific C (and C++) libraries. Of course, programs can be built in trap-on-error mode, but that makes it much more difficult to figure out what went wrong.

Fixing Sanitizer Issues Once the sanitizer handlers were implemented, picolibc could be built with them enabled and all of the picolibc tests run to uncover issues within the library. As with the static analyzer adventure from last year, the vast bulk of sanitizer complaints came from invoking undefined or implementation-defined behavior in harmless ways:

Signed integer shifts This is one area where the C language spec is just wrong. For left shift, before C99, it worked on signed integers as a bit-wise operator, equivalent to the operator on unsigned integers. After that, left shift of negative integers became undefined. Fortunately, it's straightforward (if tedious) to work around this issue by just casting the operand to unsigned, performing the shift and casting it back to the original type. Picolibc now has an internal macro, lsl, which does this:
    #define lsl(__x,__s) ((sizeof(__x) == sizeof(char)) ?                   \
                          (__typeof(__x)) ((unsigned char) (__x) << (__s)) :  \
                          (sizeof(__x) == sizeof(short)) ?                  \
                          (__typeof(__x)) ((unsigned short) (__x) << (__s)) : \
                          (sizeof(__x) == sizeof(int)) ?                    \
                          (__typeof(__x)) ((unsigned int) (__x) << (__s)) :   \
                          (sizeof(__x) == sizeof(long)) ?                   \
                          (__typeof(__x)) ((unsigned long) (__x) << (__s)) :  \
                          (sizeof(__x) == sizeof(long long)) ?              \
                          (__typeof(__x)) ((unsigned long long) (__x) << (__s)) : \
                          __undefined_shift_size(__x, __s))
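As an illustration (not from the original post), here is roughly what lsl expands to for an int operand, and why it sidesteps the undefined left shift of a negative value:
    #include <stdio.h>

    int main(void)
    {
        int x = -8;
        /* -8 << 2 is undefined behaviour for signed int, so lsl() shifts the
         * value as unsigned and casts back, preserving the historical result. */
        int r = (int) ((unsigned int) x << 2);
        printf("%d\n", r);   /* prints -32 on two's-complement targets */
        return 0;
    }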
Right shift is significantly more complicated to implement. What we want is an arithmetic shift with the sign bit being replicated as the value is shifted rightwards. C defines no such operator. Instead, right shift of negative integers is implementation defined. Fortunately, both gcc and clang define the >> operator on signed integers as arithmetic shift. Also fortunately, C hasn't made this undefined, so the program itself doesn't end up undefined. The trouble with arithmetic right shift is that it is not equivalent to right shift of unsigned values. Here's what Per Vognsen came up with using standard C operators:
    int
    __asr_int(int x, int s)
    {
        return x < 0 ? ~(~x >> s) : x >> s;
    }
When the value is negative, we invert all of the bits (making it positive), shift right, then flip all of the bits back. Both GCC and Clang seem to compile this to a single asr instruction. This function is replicated for each of the five standard integer types and then the set of them wrapped in another sizeof-selecting macro:
    #define asr(__x,__s) ((sizeof(__x) == sizeof(char)) ?           \
                          (__typeof(__x))__asr_char(__x, __s) :       \
                          (sizeof(__x) == sizeof(short)) ?          \
                          (__typeof(__x))__asr_short(__x, __s) :      \
                          (sizeof(__x) == sizeof(int)) ?            \
                          (__typeof(__x))__asr_int(__x, __s) :        \
                          (sizeof(__x) == sizeof(long)) ?           \
                          (__typeof(__x))__asr_long(__x, __s) :       \
                          (sizeof(__x) == sizeof(long long)) ?      \
                          (__typeof(__x))__asr_long_long(__x, __s):   \
                          __undefined_shift_size(__x, __s))
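For what it's worth, a tiny standalone check (illustrative only, not part of picolibc's test suite) confirms that the expression matches the expected arithmetic shift:
    #include <assert.h>

    int main(void)
    {
        int x = -1024, s = 4;
        /* ~x is 1023; shifting the positive value is well defined, and
         * flipping the bits back yields the arithmetic-shift result. */
        int portable = x < 0 ? ~(~x >> s) : x >> s;
        assert(portable == -64);
        return 0;
    }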
The lsl and asr macros use sizeof instead of the type-generic mechanism to remain compatible with compilers that lack type-generic support. Once these macros were written, they needed to be applied. To preserve the benefits of detecting programming errors, they were applied only where required, not blindly across the whole codebase. There are a couple of common patterns in the math code using shift operators. One is when computing the exponent value for subnormal numbers.
for (ix = -1022, i = hx << 11; i > 0; i <<= 1)
    ix -= 1;
This code computes the exponent by shifting the significand left by 11 bits (the width of the exponent field) and then incrementally shifting it one bit at a time until the sign flips, which indicates that the most-significant bit is set. Use of the pre-C99 definition of the left shift operator is intentional here, so both shifts are replaced with our lsl macro. In the implementation of pow, the final exponent is computed as the sum of the two exponents, both of which are in the allowed range. The resulting sum is then tested to see if it is zero or negative to see if the final value is sub-normal:
hx += n << 20;
if (hx >> 20 <= 0)
    /* do sub-normal things */
In this case, the exponent adjustment, n, is a signed value and so that shift is replaced with the lsl macro. The test value needs to preserve the correct sign bit, so we replace that shift with the asr macro. Because the right shift operation is not undefined, we only use our fancy macro above when the undefined behavior sanitizer is enabled. On the other hand, the lsl macro should have zero cost and covers undefined behavior, so it is always used.

Actual Bugs Found! The goal of this little adventure was both to make it possible to use the undefined behavior sanitizer with picolibc and to use the sanitizer to identify bugs in the library code. I fully expected that most of the effort would be spent masking harmless undefined behavior instances, but was hopeful that the effort would also uncover real bugs in the code. I was not disappointed. Through this work, I found (and fixed) eight bugs in the code:
  1. setlocale/newlocale didn't check for NULL locale names
  2. qsort was using uintptr_t to swap data around. On MSP430 in 'large' mode, that's a 20-bit type inside a 32-bit representation.
  3. random() was returning values in int range rather than long.
  4. m68k assembly for memcpy was broken for sizes > 64kB.
  5. freopen returned NULL, even on success
  6. The optimized version of memrchr was always performing unaligned accesses.
  7. String to float conversion had a table missing four values. This caused an array access overflow which resulted in imprecise values in some cases.
  8. vfwscanf mis-parsed floating point values by assuming that wchar_t was unsigned.
Sanitizer Wishes While it's great to have a way to detect places in your C code which evoke undefined and implementation defined behaviors, it seems like this tooling could easily be extended to detect other common programming mistakes, even where the code is well defined according to the language spec. An obvious example is in unsigned arithmetic. How many bugs come from this seemingly innocuous line of code?
    p = malloc(sizeof(*p) * c);
Because sizeof returns an unsigned value, the resulting computation never results in undefined behavior, even when the multiplication wraps around, so even with the undefined behavior sanitizer enabled, this bug will not be caught. Clang seems to have an unsigned integer overflow sanitizer which should do this, but I couldn't find anything like this in gcc.

Summary The undefined behavior sanitizers present in clang and gcc both provide useful diagnostics which uncover some common programming errors. In most cases, replacing undefined behavior with defined behavior is straightforward, although the lack of an arithmetic right shift operator in standard C is irksome. I recommend anyone using C to give it a try.

30 March 2025

Russ Allbery: Review: Cascade Failure

Review: Cascade Failure, by L.M. Sagas
Series: Ambit's Run #1
Publisher: Tor
Copyright: 2024
ISBN: 1-250-87126-3
Format: Kindle
Pages: 407
Cascade Failure is a far-future science fiction adventure with a small helping of cyberpunk vibes. It is the first of a (so far) two-book series, and was the author's first novel.

The Ambit is an old and small Guild ship, not much to look at, but it holds a couple of surprises. One is its captain, Eoan, who is an AI with a deep and insatiable curiosity that has driven them and their ship farther and farther out into the Spiral. The other is its surprisingly competent crew: a battle-scarred veteran named Saint who handles the fighting, and a talented engineer named Nash who does literally everything else. The novel opens with them taking on supplies at Aron Outpost. A supposed Guild deserter named Jalsen wanders into the ship looking for work.

An AI ship with a found-family crew is normally my catnip, so I wanted to love this book. Alas, I did not.

There were parts I liked. Nash is great: snarky, competent, and direct. Eoan is a bit distant and slightly more simplistic of a character than I was expecting, but I appreciated the way Sagas put them firmly in charge of the ship and departed from the conventional AI character presentation. Once the plot starts in earnest (more on that in a moment), we meet Anke, the computer hacker, whose charming anxiety reaction is a complete inability to stop talking and who adds some needed depth to the character interactions. There's plenty of action, a plot that makes at least some sense, and a few moments that almost achieved the emotional payoff the author was attempting.

Unfortunately, most of the story focuses on Saint and Jal, and both of them are irritatingly dense cliches. The moment Jal wanders onto the Ambit in the first chapter, the reader is informed that Jal, Saint, and Eoan have a history. The crew of the Ambit spent a year looking for Jal and aren't letting go of him now that they've found him. Jal, on the other hand, clearly blames Saint for something and is not inclined to trust him. Okay, fine, a bit generic of a setup but the writing moved right along and I was curious enough. It then takes a full 180 pages before the reader finds out what the hell is going on with Saint and Jal. Predictably, it's a stupid misunderstanding that could have been cleared up with one conversation in the second chapter.

Cascade Failure does not contain a romance (and to the extent that it hints at one, it's a sapphic romance), but I swear Saint and Jal are both the male protagonist from a certain type of stereotypical heterosexual romance novel. They're both the brooding man with the past, who is too hurt to trust anyone and assumes the worst because he's unable to use his words or ask an open question and then listen to the answer. The first half of this book is them being sullen at each other at great length while both of them feel miserable. Jal keeps doing weird and suspicious things to resolve a problem that would have been far more easily resolved by the rest of the crew if he would offer any explanation at all. It's not even suspenseful; we've read about this character enough times to know that he'll turn out to have a heart of gold and everything will be a misunderstanding. I found it tedious. Maybe people who like slow burn romances with this character type will have a less negative reaction.

The real plot starts at about the time Saint and Jal finally get their shit sorted out. It turns out to have almost nothing to do with either of them.
The environmental control systems of worlds are suddenly failing (hence the book title), and Anke, the late-arriving computer programmer and terraforming specialist, has a rather wild theory about what's happening. This leads to a lot of action, some decent twists, and a plot that felt very cyberpunk to me, although unfortunately it culminates in an absurdly-cliched action climax.

This book is an action movie that desperately wants to make you feel all the feels, and it worked about as well as that typically works in action movies for me. Jaded cynicism and an inability to communicate are not the ways to get me to have an emotional reaction to a book, and Jal (once he finally starts talking) is so ridiculously earnest that it's like reading the adventures of a Labrador puppy. There was enough going on that it kept me reading, but not enough for the story to feel satisfying. I needed a twist, some depth, way more Nash and Anke and way less of the men, something.

Everyone is going to compare this book to Firefly, but Firefly had better banter, created more complex character interactions due to the larger and more varied crew, and played the cynical mercenary for laughs instead of straight, all of which suited me better.

This is not a bad book, particularly once it gets past the halfway point, but it's not that memorable either, at least for me. If you're looking for a space adventure with heavy action hero and military SF vibes that wants to be about Big Feelings but gets there in mostly obvious ways, you could do worse. If you're looking for a found-family starship crew story more like Becky Chambers, I think you'll find this one a bit too shallow and obvious. Not really recommended, although there's nothing that wrong with it and I'm sure other people's experience will differ.

Followed by Gravity Lost, which I'm unlikely to read.

Rating: 6 out of 10

28 March 2025

John Goerzen: Why You Should (Still) Use Signal As Much As Possible

As I write this in March 2025, there is a lot of confusion about Signal messenger due to the recent news of people using Signal in government, and subsequent leaks. The short version is: there was no problem with Signal here. People were using it because they understood it to be secure, not the other way around. Both the government and the Electronic Frontier Foundation recommend people use Signal. This is an unusual alliance, and in the case of the government, was prompted because it understood other countries had a persistent attack against American telephone companies and SMS traffic. So let's dive in. I'll cover some basics of what security is, what happened in this situation, and why Signal is a good idea. This post isn't for programmers that work with cryptography every day. Rather, I hope it can make some of these concepts accessible to everyone else.

What makes communications secure? When most people are talking about secure communications, they mean some combination of these properties:
  1. Privacy - nobody except the intended recipient can decode a message.
  2. Authentication - guarantees that the person you are chatting with really is the intended recipient.
  3. Ephemerality - preventing a record of the communication from being stored. That is, making it more like a conversation around the table than a written email.
  4. Anonymity - keeping your set of contacts to yourself and even obfuscating the fact that communications are occurring.
If you think about it, most people care the most about the first two. In fact, authentication is a key part of privacy. There is an attack known as "man in the middle" in which somebody pretends to be the intended recipient. The interceptor reads the messages, and then passes them on to the real intended recipient. So we can't really have privacy without authentication. I'll have more to say about these later. For now, let's discuss attack scenarios.

What compromises security? There are a number of ways that security can be compromised. Let's think through some of them:

Communications infrastructure snooping Let's say you used no encryption at all, and connected to public WiFi in a coffee shop to send your message. Who all could potentially see it?
  • The owner of the coffee shop's WiFi
  • The coffee shop's Internet provider
  • The recipient's Internet provider
  • Any Internet providers along the network between the sender and the recipient
  • Any government or institution that can compel any of the above to hand over copies of the traffic
  • Any hackers that compromise any of the above systems
Back in the early days of the Internet, most traffic had no encryption. People were careful about putting their credit cards into webpages and emails because they knew it was easy to intercept them. We have been on a decades-long evolution towards more pervasive encryption, which is a good thing. Text messages (SMS) follow a similar path to the above scenario, and are unencrypted. We know that all of the above are ways people's texts can be compromised; for instance, governments can issue search warrants to obtain copies of texts, and China is believed to have a persistent hack into western telcos. SMS fails all four of our attributes of secure communication above (privacy, authentication, ephemerality, and anonymity). Also, think about what information is collected from SMS and by who. Texts you send could be retained in your phone, the recipient's phone, your phone company, their phone company, and so forth. They might also live in cloud backups of your devices. You only have control over your own phone's retention. So defenses against this involve things like:
  • Strong end-to-end encryption, so no intermediate party (even the people that make the app) can snoop on it.
  • Using strong authentication of your peers
  • Taking steps to prevent even app developers from being able to see your contact list or communication history
You may see some other apps saying they use strong encryption or use the Signal protocol. But while they may do that for some or all of your message content, they may still upload your contact list, history, location, etc. to a central location where it is still vulnerable to these kinds of attacks. When you think about anonymity, think about it like this: if you send a letter to a friend every week, every postal carrier that transports it (even if they never open it or attempt to peek inside) will be able to read the envelope and know that you communicate on a certain schedule with that friend. The same can be said of SMS, email, or most encrypted chat operators. Signal's design prevents it from retaining even this information, though nation-states or ISPs might still be able to notice patterns (every time you send something via Signal, your contact receives something from Signal a few milliseconds later). It is very difficult to provide perfect anonymity from well-funded adversaries, even if you can provide very good privacy.

Device compromise Let's say you use an app with strong end-to-end encryption. This takes away some of the easiest ways someone could get to your messages. But it doesn't take away all of them. What if somebody stole your phone? Perhaps the phone has a password, but if an attacker pulled out the storage unit, could they access your messages without a password? Or maybe they somehow trick or compel you into revealing your password. Now what? An even simpler attack doesn't require them to steal your device at all. All they need is a few minutes with it to steal your SIM card. Now they can receive any texts sent to your number - whether from your bank or your friend. Yikes, right? Signal stores your data in an encrypted form on your device. It can protect it in various ways. One of the most important protections is ephemerality - it can automatically delete your old texts. A text that is securely erased can never fall into the wrong hands if the device is compromised later. An actively-compromised phone, though, could still give up secrets. For instance, what if a malicious keyboard app sent every keypress to an adversary? Signal is only as secure as the phone it runs on, but still, it protects against a wide variety of attacks.

Untrustworthy communication partner Perhaps you are sending sensitive information to a contact, but that person doesn't want to keep it in confidence. There is very little you can do about that technologically; with pretty much any tool out there, nothing stops them from taking a picture of your messages and handing the picture off.

Environmental compromise Perhaps your device is secure, but a hidden camera still captures what's on your screen. You can take some steps against things like this, of course.

Human error Sometimes humans make mistakes. For instance, the reason a reporter got copies of messages recently was that a participant in a group chat accidentally added him (presumably that participant meant to add someone else and just selected the wrong name). Phishing attacks can trick people into revealing passwords or other sensitive data. Humans are, quite often, the weakest link in the chain.

Protecting yourself So how can you protect yourself against these attacks? Let's consider:
  • Use a secure app like Signal that uses strong end-to-end encryption where even the provider can't access your messages
  • Keep your software and phone up-to-date
  • Be careful about phishing attacks and who you add to chat rooms
  • Be aware of your surroundings; don't send sensitive messages where people might be looking over your shoulder with their eyes or cameras
There are other methods besides Signal. For instance, you could install GnuPG (GPG) on a laptop that has no WiFi card or any other way to connect it to the Internet. You could always type your messages on that laptop, encrypt them, copy the encrypted text to a floppy disk (or USB device), take that USB drive to your Internet computer, and send the encrypted message by email or something. It would be exceptionally difficult to break the privacy of messages in that case (though anonymity would be mostly lost). Even if someone got the password to your secure laptop, it wouldn't do them any good unless they physically broke into your house or something. In some ways, it is probably safer than Signal. (For more on this, see my article How gapped is your air?) But, that approach is hard to use. Many people aren't familiar with GnuPG. You don't have the convenience of sending a quick text message from anywhere. Security that is hard to use most often simply isn't used. That is, you and your friends will probably just revert back to using insecure SMS instead of this GnuPG approach because SMS is so much easier. Signal strikes a unique balance of providing very good security while also being practical, easy, and useful. For most people, it is the most secure option available. Signal is also open source; you don't have to trust that it is as secure as it says, because you can inspect it for yourself. Also, while it's not federated, I previously addressed that.
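To make the GnuPG workflow above concrete, here is a rough sketch of the commands involved (the recipient address and file names are made up for illustration):
# On the air-gapped laptop: sign and encrypt the message for the recipient.
gpg --armor --sign --encrypt -r friend@example.org message.txt
# Copy the ASCII-armored result to the removable media...
cp message.txt.asc /media/usb/
# ...then, on the Internet-connected machine, attach or paste message.txt.asc into an email.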

Government use If you are a government, particularly one that is highly consequential to the world, you can imagine that you are a huge target. Other nations are likely spending billions of dollars to compromise your communications. Signal itself might be secure, but if some other government can add spyware to your phones, or conduct a successful phishing attack, you can still have your communications compromised. I have no direct knowledge, but I think it is generally understood that the US government maintains communications networks that are entirely separate from the Internet and can only be accessed from secure physical locations and secure rooms. These can be even more secure than the average person using Signal because they can protect against things like environmental compromise, human error, and so forth. The scandal in March of 2025 happened because government employees were using Signal rather than official government tools for sensitive information, had taken advantage of Signal's ephemerality (laws require records to be kept), and through apparent human error had directly shared this information with a reporter. Presumably a reporter would have lacked access to the restricted communications networks in the first place, so that wouldn't have been possible. This doesn't mean that Signal is bad. It just means that somebody that can spend billions of dollars on security can be more secure than you. Signal is still a great tool for people, and in many cases defeats even those that can spend lots of dollars trying to defeat it. And remember - to use those restricted networks, you have to go to specific rooms in specific buildings. They are still not as convenient as what you carry around in your pocket.

Conclusion Signal is practical security. Do you want phone companies reading your messages? How about Facebook or X? Have those companies demonstrated that they are completely trustworthy throughout their entire history? I say no. So, go install Signal. It's the best, most practical tool we have.
This post is also available on my website, where it may be periodically updated.

24 March 2025

Simon Josefsson: Reproducible Software Releases

Around a year ago I discussed two concerns with software release archives (tarball artifacts) that could be improved to increase confidence in the supply-chain security of software releases. Repeating the goals for simplicity: it should be possible to build a project from a minimal git-archive style source tarball, and the make dist release tarballs should be reproducible. While implementing these ideas for a small project was accomplished within weeks (see my announcement of Libntlm version 1.8), addressing this in complex projects uncovered concerns with tools that had to be addressed, and things stalled for many months pending that work. I had the notion that these two goals were easy and shouldn't be hard to accomplish. I still believe that, but have had to realize that improving tooling to support these goals takes time. It seems clear that these concepts are not universally agreed on and implemented generally. I'm now happy to recap some of the work that led to releases of libtasn1 v4.20.0, inetutils v2.6, libidn2 v2.3.8, libidn v1.43. These releases all achieve these goals. I am working on a bunch more projects to support these ideas too. What have the obstacles been so far to make this happen? It may help others who are in the same process of addressing these concerns to have a high-level introduction to the issues I encountered. Source code for the projects above is available and anyone can look at the solutions to learn how the problems are addressed. First let's look at the problems we need to solve to make git-archive style tarballs usable:

Version Handling To build usable binaries from a minimal tarball, the build needs to know which version number it is for. Traditionally this information was stored inside configure.ac in git. However I use gnulib's git-version-gen to infer the version number from the git tag or git commit instead. The git tag information is not available in a git-archive tarball. My solution to this was to make use of the export-subst feature of the .gitattributes file. I store the file .tarball-version-git in git containing the magic cookie like this:
$Format:%(describe)$
With this, git-archive will replace the magic cookie with a useful version identifier on export; see the libtasn1 patch to achieve this. To make use of this information, the git-version-gen script was enhanced to read this information, see the gnulib patch. This is invoked by ./configure to figure out which version number the package is for.
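For completeness, the export-subst substitution only applies to paths that carry that attribute, so the repository also needs a matching .gitattributes entry along these lines (shown as an illustration; the projects above may phrase it slightly differently):
.tarball-version-git export-subst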

Translations We want translations to be included in the minimal source tarball for it to be buildable. Traditionally these files are retrieved by the maintainer from the Translation project when running ./bootstrap, however there are two problems with this. The first one is that there is no strong authentication or versioning information on this data, the tools just download and place whatever wget downloaded into your source tree (printf-style injection attack anyone?). We could improve this (e.g., publish GnuPG signed translations messages with clear versioning), however I did not work on that further. The reason is that I want to support offline builds of packages. Downloading random things from the Internet during builds does not work when building a Debian package, for example. The translation project could solve this by making a monthly tarball with their translations available, for distributors to pick up and provide as a separate package that could be used as a build dependency. However that is not how these tools and projects are designed. Instead I reverted back to storing translations in git, something that I did for most projects back when I was using CVS 20 years ago. Hooking this into ./bootstrap and gettext workflow can be tricky (ideas for improvement most welcome!), but I used a simple approach to store all directly downloaded po/*.po files directly as po/*.po.in and make the ./bootstrap tool move them in place, see the libidn2 commit followed by the actual make update-po commit with all the translations where one essential step is:
# Prime po/*.po from fall-back copy stored in git.
for poin in po/*.po.in; do
    po=$(echo $poin | sed 's/.in//')
    test -f $po || cp -v $poin $po
done
ls po/*.po | sed 's|.*/||; s|\.po$||' > po/LINGUAS

Fetching vendor files like gnulib Most build dependencies are in the shape of "You need a C compiler". However some come in the shape of source-code files intended to be "vendored", and gnulib is a huge repository of such files. The latter is a problem when building from a minimal git archive. It is possible to consider translation files as a class of vendor files, since they need to be copied verbatim into the project build directory for things to work. The same goes for *.m4 macros from the GNU Autoconf Archive. However I'm not confident that the solution for all vendor files must be the same. For translation files and for Autoconf Archive macros, I have decided to put these files into git and merge them manually occasionally. For gnulib files, in some projects like OATH Toolkit I also store all gnulib files in git, which effectively resolves this concern. (Incidentally, the reason for doing so was originally that running ./bootstrap took forever since there are five gnulib instances used, which is no longer the case since gnulib-tool was rewritten in Python.) For most projects, however, I rely on ./bootstrap to fetch a gnulib git clone when building. I like this model, however it doesn't work offline. One way to resolve this is to make the gnulib git repository available for offline use, and I've made some effort to make this happen via a Gnulib Git Bundle and have explained how to implement this approach for Debian packaging. I don't think that is sufficient as a generic solution though, it is mostly applicable to building old releases that use old gnulib files. It won't work when building from CI/CD pipelines, for example, where I have settled on a crude way of fetching and unpacking a particular gnulib snapshot, see this Libntlm patch. This is much faster than working with git submodules and cloning gnulib during ./bootstrap. Essentially this is doing:
GNULIB_REVISION=$(. bootstrap.conf >&2; echo $GNULIB_REVISION)
wget -nv https://gitlab.com/libidn/gnulib-mirror/-/archive/$GNULIB_REVISION/gnulib-mirror-$GNULIB_REVISION.tar.gz
gzip -cd gnulib-mirror-$GNULIB_REVISION.tar.gz | tar xf -
rm -fv gnulib-mirror-$GNULIB_REVISION.tar.gz
export GNULIB_SRCDIR=$PWD/gnulib-mirror-$GNULIB_REVISION
./bootstrap --no-git
./configure
make

Test the git-archive tarball This goes without saying, but if you don't test that building from a git-archive style tarball works, you are likely to regress at some point. Use CI/CD techniques to continuously test that a minimal git-archive tarball leads to a usable build.
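A minimal sketch of such a CI job could look like the following (the image, package list and build commands are illustrative and project specific; ./bootstrap may need --no-git and GNULIB_SRCDIR as described above):
# Hypothetical GitLab CI job: verify that a minimal git-archive tarball builds.
test-git-archive:
  image: debian:stable
  script:
    - apt-get update -qq && apt-get install -y -qq git autoconf automake libtool make gcc gettext texinfo
    - git archive --prefix=project/ --output=project-src.tar.gz HEAD
    - tar xfz project-src.tar.gz
    - cd project && ./bootstrap && ./configure && make check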

Mission Accomplished So that wasn't hard, was it? You should now be able to publish a minimal git-archive tarball and users should be able to build your project from it. I recommend naming these archives as PROJECT-vX.Y.Z-src.tar.gz replacing PROJECT with your project name and X.Y.Z with your version number. The archive should have only one sub-directory named PROJECT-vX.Y.Z/ containing all the source-code files. This differentiates it from traditional PROJECT-X.Y.Z.tar.gz tarballs in that it embeds the git tag (which typically starts with v) and contains a wildcard-friendly -src substring. Alas there is no consistency around this naming pattern, and GitLab, GitHub, Codeberg etc. all seem to use their own slightly incompatible variant. Let's go on to see what is needed to achieve reproducible make dist source tarballs. This is the release artifact that most users use, and they often contain lots of generated files and vendor files. These files are included to make it easy to build for the user. What are the challenges to make these reproducible?

Build dependencies causing different generated content The first part is to realize that if you use tool X with version A to generate a file that goes into the tarball, version B of that tool may produce different outputs. This is a generic concern and it cannot be solved. We want our build tools to evolve and produce better outputs over time. What can be addressed is to avoid needless differences. For example, many tools store timestamps and versioning information in the generated files. This causes needless differences, which makes audits harder. I have worked on some of these, like Autoconf Archive timestamps, but solving all of these examples will take a long time, and some upstreams are reluctant to incorporate these changes. My approach meanwhile is to build things using similar environments, and compare the outputs for differences. I've found that the various closely related forks of GNU/Linux distributions are useful for this. Trisquel 11 is based on Ubuntu 22.04, and building my projects using both and comparing the differences gives me only the relevant differences to improve. This can be extended to compare AlmaLinux with RockyLinux (for both versions 8 and 9), Devuan 5 against Debian 12, PureOS 10 with Debian 11, and so on.

Timestamps Sometimes tools store timestamps in files in a way that is harder to fix. Two notable examples of this are *.po translation files and Texinfo manuals. For translation files, I have resolved this by making sure the files use a predictable POT-Creation-Date timestamp, and I set it to the modification timestamp of the NEWS file in the repository (which I in turn set to the date of the latest git commit elsewhere) like this:
dist-hook: po-CreationDate-to-mtime-NEWS
.PHONY: po-CreationDate-to-mtime-NEWS
po-CreationDate-to-mtime-NEWS: mtime-NEWS-to-git-HEAD
  $(AM_V_GEN)for p in $(distdir)/po/*.po $(distdir)/po/$(PACKAGE).pot; do \
    if test -f "$$p"; then \
      $(SED) -e 's,POT-Creation-Date: .*\\n",POT-Creation-Date: '"$$(env LC_ALL=C TZ=UTC0 stat --format=%y $(srcdir)/NEWS | cut -c1-16,31-)"'\\n",' < $$p > $$p.tmp && \
      if cmp $$p $$p.tmp > /dev/null; then \
        rm -f $$p.tmp; \
      else \
        mv $$p.tmp $$p; \
      fi \
    fi \
  done
Similarly, I set a predictable modification time of the texinfo source file like this:
dist-hook: mtime-NEWS-to-git-HEAD
.PHONY: mtime-NEWS-to-git-HEAD
mtime-NEWS-to-git-HEAD:
  $(AM_V_GEN)if test -e $(srcdir)/.git \
                && command -v git > /dev/null; then \
    touch -m -t "$$(git log -1 --format=%cd \
      --date=format-local:%Y%m%d%H%M.%S)" $(srcdir)/NEWS; \
  fi
However I've realized that this needs to happen earlier and probably has to be run during ./configure time, because the doc/version.texi file is generated on first build before running make dist and for some reason the file is not rebuilt at release time. The Automake texinfo integration is a bit inflexible about providing hooks to extend the dependency tracking. The method to address these differences isn't really important, and they change over time depending on preferences. What is important is that the differences are eliminated.

ChangeLog Traditionally ChangeLog files were manually prepared, and still are for some projects. I maintain git2cl but recently I've settled on gnulib's gitlog-to-changelog because doing so avoids another build dependency (although the output formatting is different and arguably worse for my git commit style). So the ChangeLog files are generated from git history. This means a shallow clone will not produce the same ChangeLog file depending on how deep it was cloned. For Libntlm I simply disabled use of a generated ChangeLog because I wanted to support an even more extreme form of reproducibility: I wanted to be able to reproduce the full make dist source archives from a minimal git-archive source archive. However for other projects I've settled on a middle ground. I realized that for git describe to produce reproducible outputs, the shallow clone needs to include the last release tag. So it felt acceptable to assume that the clone is not minimal, but instead has some but not all of the history. I settled on the following recipe to produce ChangeLogs covering all changes since the last release.
dist-hook: gen-ChangeLog
.PHONY: gen-ChangeLog
gen-ChangeLog:
  $(AM_V_GEN)if test -e $(srcdir)/.git; then			\
    LC_ALL=en_US.UTF-8 TZ=UTC0					\
    $(top_srcdir)/build-aux/gitlog-to-changelog			\
       --srcdir=$(srcdir) --					\
       v$(PREV_VERSION)~.. > $(distdir)/cl-t &&			\
         printf '\n\nSee the source repo for older entries\n'	\
         >> $(distdir)/cl-t &&					\
         rm -f $(distdir)/ChangeLog &&				\
         mv $(distdir)/cl-t $(distdir)/ChangeLog;  		\
  fi
I'm undecided about the usefulness of generated ChangeLog files within make dist archives. Before we have stable and secure archival of git repositories widely implemented, I can see some utility of this in case we lose all copies of the upstream git repositories. I can sympathize with the notion that ChangeLog files died when we started to generate them from git logs: the files no longer serve any purpose, and we can ask people to go look at the git log instead of reading these generated non-source files.

Long-term reproducible trusted build environment Distributions come and go, and old releases of them go out of support and often stop working. Which build environment should I choose to build the official release archives? To my knowledge only Guix offers a reliable way to re-create an older build environment (guix time-machine) that has bootstrappable properties for additional confidence. However I had two difficult problems here. The first one was that I needed Guix container images that were usable in GitLab CI/CD Pipelines, and this side-tracked me for a while. The second one delayed my effort for many months, and I was inclined to give up. Libidn distributes a C# implementation. Some of the C# source code files included in the release tarball are generated. By what? You guessed it, by a C# program, with the source code included in the distribution. This means nobody could reproduce the source tarball of Libidn without trusting someone else's C# compiler binaries, which were built from binaries of earlier releases, chaining back into something that nobody ever attempts to build any more and likely fails to build due to bit-rot. I had two basic choices: either remove the C# implementation from Libidn (which may be a good idea for other reasons, since the C and C# are unrelated implementations) or build the source tarball on some binary-only distribution like Trisquel. Neither felt appealing to me, but a late Christmas gift of a reproducible Mono came to Guix that resolved this.
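As an aside, the general Guix workflow for pinning and later re-creating a build environment looks roughly like this (a sketch, not the exact setup used for these releases; the package list is illustrative):
# Record the exact Guix channels in use at release time.
guix describe -f channels > channels.scm
# Later, re-enter an equivalent environment to rebuild the artifacts.
guix time-machine -C channels.scm -- shell gcc-toolchain autoconf automake make texinfo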

Embedded images in Texinfo manual For Libidn one section of the manual has an image illustrating some concepts. The PNG, PDF and EPS outputs were generated via fig2dev from a *.fig file (hello 1985!) that I had stored in git. Over time, I had also started to store the generated outputs because of build issues. At some point, it was possible to post-process the PDF outputs with grep to remove some timestamps, however with compression this is no longer possible and actually the grep command I used resulted in a 0-byte output file. So my embedded binaries in git were no longer reproducible. I first set out to fix this by post-processing things properly, however I then realized that the *.fig file is not really easy to work with in a modern world. I wanted to create an image from some text-file description of the image. Eventually, via the Guix manual on guix graph, I came to re-discover the graphviz language and the tool called dot (hello 1993!). All well then? Oh no, the PDF output embeds timestamps. Binary editing of PDFs no longer works through simple grep, remember? I was back where I started, and after some (soul- and web-) searching I discovered that Ghostscript (hello 1988!) pdfmarks could be used to modify things here. Cooperating with automake's texinfo rules related to make dist proved once again a worthy challenge, and eventually I ended up with a Makefile.am snippet to build images that could be condensed into:
info_TEXINFOS = libidn.texi
libidn_TEXINFOS += libidn-components.png
imagesdir = $(infodir)
images_DATA = libidn-components.png
EXTRA_DIST += components.dot
DISTCLEANFILES = \
  libidn-components.eps libidn-components.png libidn-components.pdf
libidn-components.eps: $(srcdir)/components.dot
  $(AM_V_GEN)$(DOT) -Nfontsize=9 -Teps < $< > $@.tmp
  $(AM_V_at)! grep %%CreationDate $@.tmp
  $(AM_V_at)mv $@.tmp $@
libidn-components.pdf: $(srcdir)/components.dot
  $(AM_V_GEN)$(DOT) -Nfontsize=9 -Tpdf < $< > $@.tmp
# A simple sed on CreationDate is no longer possible due to compression.
# 'exiftool -CreateDate' is alternative to 'gs', but adds ~4kb to file.
# Ghostscript add <1kb.  Why can't 'dot' avoid setting CreationDate?
  $(AM_V_at)printf '[ /ModDate ()\n  /CreationDate ()\n  /DOCINFO pdfmark\n' > pdfmarks
  $(AM_V_at)$(GS) -q -dBATCH -dNOPAUSE -sDEVICE=pdfwrite -sOutputFile=$@.tmp2 $@.tmp pdfmarks
  $(AM_V_at)rm -f $@.tmp pdfmarks
  $(AM_V_at)mv $@.tmp2 $@
libidn-components.png: $(srcdir)/components.dot
  $(AM_V_GEN)$(DOT) -Nfontsize=9 -Tpng < $< > $@.tmp
  $(AM_V_at)mv $@.tmp $@
pdf-recursive: libidn-components.pdf
dvi-recursive: libidn-components.eps
ps-recursive: libidn-components.eps
info-recursive: $(top_srcdir)/.version libidn-components.png
Surely this can be improved, but I'm not yet certain which way forward is best. I like having a text representation as the source of the image. I'm sad that the new image size is ~48kb compared to the old image size of ~1kb. I tried using exiftool -CreateDate as an alternative to Ghostscript, but using it to remove the timestamp added ~4kb to the file size and naturally I was appalled by this ignorance of impending doom.

Test reproducibility of tarball Again, you need to continuously test the properties you desire. This means building your project twice using different environments and comparing the results. I've settled on a small GitLab CI/CD pipeline job that performs a bit-by-bit comparison of the generated make dist archives. It also performs a bit-by-bit comparison of the generated git-archive artifacts. See the Libidn2 .gitlab-ci.yml 0-compare job, which essentially is:
0-compare:
  image: alpine:latest
  stage: repro
  needs: [ B-AlmaLinux8, B-AlmaLinux9, B-RockyLinux8, B-RockyLinux9, B-Trisquel11, B-Ubuntu2204, B-PureOS10, B-Debian11, B-Devuan5, B-Debian12, B-gcc, B-clang, B-Guix, R-Guix, R-Debian12, R-Ubuntu2404, S-Trisquel10, S-Ubuntu2004 ]
  script:
  - cd out
  - sha256sum */*.tar.* */*/*.tar.* | sort | grep    -- -src.tar.
  - sha256sum */*.tar.* */*/*.tar.* | sort | grep -v -- -src.tar.
  - sha256sum */*.tar.* */*/*.tar.* | sort | uniq -c -w64 | sort -rn
  - sha256sum */*.tar.* */*/*.tar.* | grep    -- -src.tar. | sort | uniq -c -w64 | grep -v '^      1 '
  - sha256sum */*.tar.* */*/*.tar.* | grep -v -- -src.tar. | sort | uniq -c -w64 | grep -v '^      1 '
# Confirm modern git-archive tarball reproducibility
  - cmp b-almalinux8/src/*.tar.gz b-almalinux9/src/*.tar.gz
  - cmp b-almalinux8/src/*.tar.gz b-rockylinux8/src/*.tar.gz
  - cmp b-almalinux8/src/*.tar.gz b-rockylinux9/src/*.tar.gz
  - cmp b-almalinux8/src/*.tar.gz b-debian12/src/*.tar.gz
  - cmp b-almalinux8/src/*.tar.gz b-devuan5/src/*.tar.gz
  - cmp b-almalinux8/src/*.tar.gz r-guix/src/*.tar.gz
  - cmp b-almalinux8/src/*.tar.gz r-debian12/src/*.tar.gz
  - cmp b-almalinux8/src/*.tar.gz r-ubuntu2404/src/*v2.*.tar.gz
# Confirm old git-archive (export-subst but long git describe) tarball reproducibility
  - cmp b-trisquel11/src/*.tar.gz b-ubuntu2204/src/*.tar.gz
# Confirm really old git-archive (no export-subst) tarball reproducibility
  - cmp b-debian11/src/*.tar.gz b-pureos10/src/*.tar.gz
# Confirm 'make dist' generated tarball reproducibility
  - cmp b-almalinux8/*.tar.gz b-rockylinux8/*.tar.gz
  - cmp b-almalinux9/*.tar.gz b-rockylinux9/*.tar.gz
  - cmp b-pureos10/*.tar.gz b-debian11/*.tar.gz
  - cmp b-devuan5/*.tar.gz b-debian12/*.tar.gz
  - cmp b-trisquel11/*.tar.gz b-ubuntu2204/*.tar.gz
  - cmp b-guix/*.tar.gz r-guix/*.tar.gz
# Confirm 'make dist' from git-archive tarball reproducibility
  - cmp s-trisquel10/*.tar.gz s-ubuntu2004/*.tar.gz
Notice that I discovered that git archive outputs differ over time too, which is natural but a bit of a nuisance. The output of the job is illuminating in the way that all SHA256 checksums of generated tarballs are included, for example the libidn2 v2.3.8 job log:
$ sha256sum */*.tar.* */*/*.tar.* | sort | grep -v -- -src.tar.
368488b6cc8697a0a937b9eb307a014396dd17d3feba3881e6911d549732a293  b-trisquel11/libidn2-2.3.8.tar.gz
368488b6cc8697a0a937b9eb307a014396dd17d3feba3881e6911d549732a293  b-ubuntu2204/libidn2-2.3.8.tar.gz
59db2d045fdc5639c98592d236403daa24d33d7c8db0986686b2a3056dfe0ded  b-debian11/libidn2-2.3.8.tar.gz
59db2d045fdc5639c98592d236403daa24d33d7c8db0986686b2a3056dfe0ded  b-pureos10/libidn2-2.3.8.tar.gz
5bd521d5ecd75f4b0ab0fc6d95d444944ef44a84cad859c9fb01363d3ce48bb8  s-trisquel10/libidn2-2.3.8.tar.gz
5bd521d5ecd75f4b0ab0fc6d95d444944ef44a84cad859c9fb01363d3ce48bb8  s-ubuntu2004/libidn2-2.3.8.tar.gz
7f1dcdea3772a34b7a9f22d6ae6361cdcbe5513e3b6485d40100b8565c9b961a  b-almalinux8/libidn2-2.3.8.tar.gz
7f1dcdea3772a34b7a9f22d6ae6361cdcbe5513e3b6485d40100b8565c9b961a  b-rockylinux8/libidn2-2.3.8.tar.gz
8031278157ce43b5813f36cf8dd6baf0d9a7f88324ced796765dcd5cd96ccc06  b-clang/libidn2-2.3.8.tar.gz
8031278157ce43b5813f36cf8dd6baf0d9a7f88324ced796765dcd5cd96ccc06  b-debian12/libidn2-2.3.8.tar.gz
8031278157ce43b5813f36cf8dd6baf0d9a7f88324ced796765dcd5cd96ccc06  b-devuan5/libidn2-2.3.8.tar.gz
8031278157ce43b5813f36cf8dd6baf0d9a7f88324ced796765dcd5cd96ccc06  b-gcc/libidn2-2.3.8.tar.gz
8031278157ce43b5813f36cf8dd6baf0d9a7f88324ced796765dcd5cd96ccc06  r-debian12/libidn2-2.3.8.tar.gz
acf5cbb295e0693e4394a56c71600421059f9c9bf45ccf8a7e305c995630b32b  r-ubuntu2404/libidn2-2.3.8.tar.gz
cbdb75c38100e9267670b916f41878b6dbc35f9c6cbe60d50f458b40df64fcf1  b-almalinux9/libidn2-2.3.8.tar.gz
cbdb75c38100e9267670b916f41878b6dbc35f9c6cbe60d50f458b40df64fcf1  b-rockylinux9/libidn2-2.3.8.tar.gz
f557911bf6171621e1f72ff35f5b1825bb35b52ed45325dcdee931e5d3c0787a  b-guix/libidn2-2.3.8.tar.gz
f557911bf6171621e1f72ff35f5b1825bb35b52ed45325dcdee931e5d3c0787a  r-guix/libidn2-2.3.8.tar.gz
I'm sure I have forgotten or suppressed some challenges (sprinkling LANG=C TZ=UTC0 helps) related to these goals, but my hope is that this discussion of solutions will inspire you to implement these concepts for your software project too. Please share your thoughts and additional insights in a comment below. Enjoy Happy Hacking in the course of practicing this!

17 March 2025

Vincent Bernat: Offline PKI using 3 YubiKeys and an ARM single board computer

An offline PKI enhances security by physically isolating the certificate authority from network threats. A YubiKey is a low-cost solution to store a root certificate. You also need an air-gapped environment to operate the root CA.
Offline PKI backed up by 3 YubiKeys: 2 for the root CA and 1 for the intermediate CA.
This post describes an offline PKI system using the following components: the offline-pki Python application, a set of 3 YubiKeys, and an air-gapped ARM64 single board computer, glued together with Nix. It is possible to add more YubiKeys as a backup of the root CA if needed. This is not needed for the intermediate CA as you can generate a new one if the current one gets destroyed.

The software part offline-pki is a small Python application to manage an offline PKI. It relies on yubikey-manager to manage YubiKeys and cryptography for cryptographic operations not executed on the YubiKeys. The application has some opinionated design choices. Notably, the cryptography is hard-coded to use NIST P-384 elliptic curve. The first step is to reset all your YubiKeys:
$ offline-pki yubikey reset
This will reset the connected YubiKey. Are you sure? [y/N]: y
New PIN code:
Repeat for confirmation:
New PUK code:
Repeat for confirmation:
New management key ('.' to generate a random one):
WARNING[pki-yubikey] Using random management key: e8ffdce07a4e3bd5c0d803aa3948a9c36cfb86ed5a2d5cf533e97b088ae9e629
INFO[pki-yubikey]  0: Yubico YubiKey OTP+FIDO+CCID 00 00
INFO[pki-yubikey] SN: 23854514
INFO[yubikit.management] Device config written
INFO[yubikit.piv] PIV application data reset performed
INFO[yubikit.piv] Management key set
INFO[yubikit.piv] New PUK set
INFO[yubikit.piv] New PIN set
INFO[pki-yubikey] YubiKey reset successful!
Then, generate the root CA and create as many copies as you want:
$ offline-pki certificate root --permitted example.com
Management key for Root X:
Plug YubiKey "Root X"...
INFO[pki-yubikey]  0: Yubico YubiKey CCID 00 00
INFO[pki-yubikey] SN: 23854514
INFO[yubikit.piv] Data written to object slot 0x5fc10a
INFO[yubikit.piv] Certificate written to slot 9C (SIGNATURE), compression=True
INFO[yubikit.piv] Private key imported in slot 9C (SIGNATURE) of type ECCP384
Copy root certificate to another YubiKey? [y/N]: y
Plug YubiKey "Root X"...
INFO[pki-yubikey]  0: Yubico YubiKey CCID 00 00
INFO[pki-yubikey] SN: 23854514
INFO[yubikit.piv] Data written to object slot 0x5fc10a
INFO[yubikit.piv] Certificate written to slot 9C (SIGNATURE), compression=True
INFO[yubikit.piv] Private key imported in slot 9C (SIGNATURE) of type ECCP384
Copy root certificate to another YubiKey? [y/N]: n
You can inspect the result:
$ offline-pki yubikey info
INFO[pki-yubikey]  0: Yubico YubiKey CCID 00 00
INFO[pki-yubikey] SN: 23854514
INFO[pki-yubikey] Slot 9C (SIGNATURE):
INFO[pki-yubikey]   Private key type: ECCP384
INFO[pki-yubikey]   Public key:
INFO[pki-yubikey]     Algorithm:  secp384r1
INFO[pki-yubikey]     Issuer:     CN=Root CA
INFO[pki-yubikey]     Subject:    CN=Root CA
INFO[pki-yubikey]     Serial:     1
INFO[pki-yubikey]     Not before: 2024-07-05T18:17:19+00:00
INFO[pki-yubikey]     Not after:  2044-06-30T18:17:19+00:00
INFO[pki-yubikey]     PEM:
-----BEGIN CERTIFICATE-----
MIIBcjCB+aADAgECAgEBMAoGCCqGSM49BAMDMBIxEDAOBgNVBAMMB1Jvb3QgQ0Ew
HhcNMjQwNzA1MTgxNzE5WhcNNDQwNjMwMTgxNzE5WjASMRAwDgYDVQQDDAdSb290
IENBMHYwEAYHKoZIzj0CAQYFK4EEACIDYgAERg3Vir6cpEtB8Vgo5cAyBTkku/4w
kXvhWlYZysz7+YzTcxIInZV6mpw61o8W+XbxZV6H6+3YHsr/IeigkK04/HJPi6+i
zU5WJHeBJMqjj2No54Nsx6ep4OtNBMa/7T9foyMwITAPBgNVHRMBAf8EBTADAQH/
MA4GA1UdDwEB/wQEAwIBhjAKBggqhkjOPQQDAwNoADBlAjEAwYKy/L8leJyiZSnn
xrY8xv8wkB9HL2TEAI6fC7gNc2bsISKFwMkyAwg+mKFKN2w7AjBRCtZKg4DZ2iUo
6c0BTXC9a3/28V5aydZj6rvx0JqbF/Ln5+RQL6wFMLoPIvCIiCU=
-----END CERTIFICATE-----
Then, you can create an intermediate certificate with offline-pki yubikey intermediate and use it to sign certificates by providing a CSR to offline-pki certificate sign. Be careful and inspect the CSR before signing it, as only the subject name can be overridden. Check the documentation for more details. Get the available options using the --help flag.

The hardware part To ensure the operations on the root and intermediate CAs are air-gapped, a cost-efficient solution is to use an ARM64 single board computer. The Libre Computer Sweet Potato SBC is a more open alternative to the well-known Raspberry Pi.1
Libre Computer Sweet Potato SBC, powered by the Amlogic S905X SoC.
I interact with it through a USB to TTL UART converter:
$ tio /dev/ttyUSB0
[16:40:44.546] tio v3.7
[16:40:44.546] Press ctrl-t q to quit
[16:40:44.555] Connected to /dev/ttyUSB0
GXL:BL1:9ac50e:bb16dc;FEAT:ADFC318C:0;POC:1;RCY:0;SPI:0;0.0;CHK:0;
TE: 36574
BL2 Built : 15:21:18, Aug 28 2019. gxl g1bf2b53 - luan.yuan@droid15-sz
set vcck to 1120 mv
set vddee to 1000 mv
Board ID = 4
CPU clk: 1200MHz
[ ]

The Nix glue To bring everything together, I am using Nix with a Flake providing:
  • a package for the offline-pki application, with shell completion,
  • a development shell, including an editable version of the offline-pki application,
  • a NixOS module to setup the offline PKI, resetting the system at each boot,
  • a QEMU image for testing, and
  • an SD card image to be used on the Sweet Potato or another ARM64 SBC.
# Execute the application locally
nix run github:vincentbernat/offline-pki -- --help
# Run the application inside a QEMU VM
nix run github:vincentbernat/offline-pki\#qemu
# Build a SD card for the Sweet Potato or for the Raspberry Pi
nix build --system aarch64-linux github:vincentbernat/offline-pki\#sdcard.potato
nix build --system aarch64-linux github:vincentbernat/offline-pki\#sdcard.generic
# Get a development shell with the application
nix develop github:vincentbernat/offline-pki

  1. The key for the root CA is not generated by the YubiKey. Using an air-gapped computer is all the more important. Put it in a safe with the YubiKeys when done!

9 March 2025

Dirk Eddelbuettel: RcppNLoptExample 0.0.2: Minor Updates

An update to our package RcppNLoptExample arrived on CRAN earlier today marking the first update since the initial release more than four years ago. The nloptr package, created by Jelmer Ypma, has long been providing an excellent R interface to NLopt, a very comprehensive library for nonlinear optimization. In particular, Jelmer carefully exposed the API entry points such that other R packages can rely on NLopt without having to explicitly link to it (as one can rely on R providing sufficient function calling and registration to make this possible by referring back to nloptr which naturally has the linking information and resolution). This package demonstrates this in a simple-to-use Rcpp example package that can serve as a stanza. More recent NLopt versions appear to have changed behaviour a little so that an example we relied upon in a simple unit test now converges to a marginally different numerical value, so we adjusted a convergence threshold. Other than that we did a number of the usual small updates to package metadata, to the README.md file, and to continuous integration. The (very short) NEWS entry follows:

Changes in version 0.0.2 (2025-03-09)
  • Updated tolerance in simple test as newer upstream nlopt change behaviour ever so slightly leading to an other spurious failure
  • Numerous small and standard updates to DESCRIPTION, README.md, badges, and continuous integration setup

Courtesy of my CRANberries, there is also a diffstat report for this release. For questions, suggestions, or issues please use the issue tracker at the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

Niels Thykier: Improving Debian packaging in Kate

The other day, I noted that the emacs integration with debputy stopped working. After debugging for a while, I realized that emacs no longer sent the didOpen notification that is expected of it, which confused debputy. At this point, I was already several hours into the debugging and I noted there were some discussions on debian-devel about emacs and byte compilation not working. So I figured I would shelve the emacs problem for now. But I needed an LSP capable editor and with my vi skills leaving much to be desired, I skipped out on vim-youcompleteme. Instead, I pulled out kate, which I had not been using for years. It had LSP support, so it would be fine, right? Well, no. Turns out that debputy LSP support had some assumptions that worked for emacs but not kate. Plus once you start down the rabbit hole, you stumble on things you missed previously.
Getting started First order of business was to tell kate about debputy. Conveniently, kate has a configuration tab for adding language servers in a JSON format, right next to the tab where you can see its configuration for the built-in LSP servers (also in JSON format). So a quick bit of copy-paste magic and that was done. Yesterday, I opened an MR against upstream to have the configuration added (https://invent.kde.org/utilities/kate/-/merge_requests/1748) and they already merged it. Today, I then filed a wishlist bug against kate in Debian to have the Debian maintainers cherry-pick it, so it works out of the box for Trixie (https://bugs.debian.org/1099876). So far so good.
Inlay hint woes Since July (2024), debputy has had support for Inlay hints. They are basically small bits of text that the LSP server can ask the editor to inject into the text to provide hints to the reader. Typically, you see them used to provide typing hints, where the editor or the underlying LSP server has figured out the type of a variable or expression that you did not explicitly type. Another common use case is to inject the parameter name for positional arguments when calling a function, so the user does not have to count the positions to figure out which value is passed as which parameter. In debputy, I have been using the Inlay hints to show inherited fields in debian/control. As an example, if you have a definition like:
Source: foo-src
Section: devel
Priority: optional
Package: foo-bin
Architecture: any
Then foo-bin inherits the Section and Priority fields since it does not supply its own. Previously, debputy would show that by injecting the fields themselves and their values just below the Package field, as if you had typed them out directly. The editor always renders Inlay hints distinctly from regular text, so there was no risk of confusion and it made the text look like a valid debian/control file end to end. The result looked something like:
Source: foo-src
Section: devel
Priority: optional
Package: foo-bin
Section: devel
Priority: optional
Architecture: any
With the second instances of Section and Priority being rendered differently than their surroundings (usually faded or colorless). Unfortunately, kate did not like injecting Inlay hints with a newline in them, which was needed for this trick. Reading the LSP specs, they say nothing about multi-line Inlay hints being a thing, and I figured I would see this problem again with other editors if I left it be. I ended up changing the Inlay hints to be placed at the end of the Package field, and then included surrounding () for better visuals. So now, it looks like:
Source: foo-src
Section: devel
Priority: optional
Package: foo-bin  (Section: devel)  (Priority: optional)
Architecture: any
Unfortunately, it is no longer 1:1 with the underlying syntax, which is what I liked about the previous approach. But it works in more editors and is still explicit. I also removed the Inlay hint for the Homepage field. It takes too much space and I have yet to meet someone missing it in the binary stanza. If you have any better ideas for how to render it, feel free to reach out to me.
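For readers curious what this looks like on the wire, here is a minimal sketch, written as plain Python building LSP-shaped dicts rather than debputy's actual code, of emitting inlay hints anchored at the end of the Package line. The helper name and the hard-coded inheritance mapping are made up for illustration.
# A minimal sketch (not debputy's actual code) of emitting LSP InlayHints at
# the end of the "Package:" line instead of multi-line hints below it.
# Positions are zero-based line/character offsets, as in the LSP specification.
def inherited_field_hints(lines, inherited):
    # `lines` is the debian/control text split into lines; `inherited` maps an
    # inherited field name to its value, e.g. {"Section": "devel"}.
    hints = []
    for lineno, line in enumerate(lines):
        if not line.startswith("Package:"):
            continue
        col = len(line)  # place the hints right after the Package value
        for field, value in inherited.items():
            hints.append({
                "position": {"line": lineno, "character": col},
                "label": f"({field}: {value})",
                "paddingLeft": True,
            })
    return hints

if __name__ == "__main__":
    text = ["Source: foo-src", "Section: devel", "Priority: optional", "",
            "Package: foo-bin", "Architecture: any"]
    for hint in inherited_field_hints(text, {"Section": "devel", "Priority": "optional"}):
        print(hint)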
Spurious completion and hover As I was debugging the Inlay hints, I wanted to do a quick restart of debputy after each fix. Then I would trigger a small change to the document to ensure kate would request an update from debputy to render the Inlay hints with the new code. The full outgoing payloads are sent via the logs to the client, so it was really about minimizing which LSP requests are sent to debputy. Notably, two cases would flood the log:
  • Completion requests. These are triggered by typing anything at all, and since I wanted to make a change, I could not avoid this. So here it was about making sure there would be nothing to complete, so the result was as small as possible.
  • Hover doc requests. These are triggered by hovering the mouse over a field, so this was mostly about ensuring my mouse movement did not linger over any field on the way between restarting the LSP server and scrolling the log in kate.
In my infinite wisdom, I chose to make a comment line where I would do the change. I figured it would neuter the completion requests completely, and it should not matter if my cursor landed on the comment, as there would be no hover docs for comments either. Unfortunately for me, debputy would ignore the fact that it was on a comment line. Instead, it would find the next field after the comment line and try to complete based on that. Normally you do not see this, because the editor correctly identifies that none of the completion suggestions start with a #, so they are all discarded. But it was pretty annoying for the debugging, so now debputy has been told to explicitly stop these requests early on comment lines.
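The fix is conceptually tiny. Here is a sketch with hypothetical handler names, not debputy's real ones: bail out of completion as soon as the cursor line turns out to be a comment.
# A minimal sketch (assumed names, not debputy's actual handler) of stopping
# completion requests early on comment lines, so the server does not fall
# through to the next field in the stanza.
def completion_items(document_lines, position):
    line = document_lines[position["line"]]
    if line.lstrip().startswith("#"):
        # Comment line in debian/control: nothing sensible to complete.
        return []
    return complete_field_at(line, position["character"])

def complete_field_at(line, character):
    # Placeholder for the real field/value completion logic.
    return [{"label": "Architecture"}, {"label": "Depends"}]

if __name__ == "__main__":
    lines = ["Package: foo-bin", "# just a scratch comment", "Architecture: any"]
    print(completion_items(lines, {"line": 1, "character": 3}))  # -> []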
Hover docs for packages I added a feature in debputy where you can hover over package names in your relationship fields (such as Depends) and debputy will render a small snippet about the package based on data from your local APT cache. This doc is then handed to the editor and tagged as markdown, provided the editor supports markdown rendering. Both emacs and kate support markdown. However, not all markdown renderings are equal. Notably, emacs's rendering does not reformat the text into paragraphs. In a sense, emacs's rendering works a bit like <pre>...</pre>, except it does a bit of fancy rendering inside the <pre>...</pre>. On the other hand, kate seems to convert the markdown to HTML and then throw the result into an HTML render engine. Here it is important to remember that not all newlines are equal in markdown. A Foo<newline>Bar is treated as one "paragraph" (<p>...</p>), and the HTML renderer happily renders this as the single line Foo Bar, provided there is sufficient width to do so. A couple of extra newlines worked wonders for the kate rendering, but I have a feeling this is not going to be the last time the hover docs will need some tweaking for prettification. Feel free to reach out if you spot a weirdly rendered hover doc somewhere.
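For illustration, here is a sketch (assumed shapes and names, not debputy's code) of a hover response where logical paragraphs are separated by blank lines, since a single newline inside a markdown paragraph is rendered as a mere space by HTML-based renderers such as kate's.
# A minimal sketch of an LSP Hover payload using markdown content, with blank
# lines inserted between paragraphs so HTML-based renderers keep them apart.
def hover_for_package(name, description_paragraphs):
    body = "\n\n".join(description_paragraphs)
    return {
        "contents": {
            "kind": "markdown",
            "value": f"**{name}**\n\n{body}",
        }
    }

if __name__ == "__main__":
    print(hover_for_package("coreutils", [
        "GNU core utilities.",
        "This package contains the essential basic file, shell and text utilities.",
    ]))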
Making quickfixes available in kate Quickfixes are treated as generic code actions in the LSP specs. Each code action has a "type" (kind in the LSP lingo), which enables the editor to group the actions accordingly or filter by certain types of code actions. The design in the specs leads to the following flow:
  1. The LSP server provides the editor with diagnostics (there are multiple ways to trigger this, so we will keep this part simple).
  2. The editor renders them to the user and the user chooses to interact with one of them.
  3. The interaction makes the editor ask the LSP server which code actions are available at that location (optionally with a filter to only see quickfixes).
  4. The LSP server looks at the provided range and is expected to return the relevant quickfixes here.
This flow is really annoying from an LSP server writer's point of view. When you do the diagnostics (in step 1), you tend to already know what the possible quickfixes would be. The LSP spec authors realized this at some point, so there are two features the editor can provide to simplify this.
  1. In its request for code actions, the editor is expected to provide the diagnostics that it received from the server. Side note: I cannot quite tell from the spec whether this is optional or required.
  2. The editor can provide support for remembering a data member in each diagnostic. The server can then store arbitrary information in that member, which it will see again in the code actions request. Again, provided that the editor supports this optional feature.
All the quickfix logic in debputy so far has hinged on both of these features. As life would have it, kate provides neither of them. This meant I had to teach debputy to keep track of its diagnostics on its own. The plus side is that it makes it easier to support "pull diagnostics" down the line, since that requires a similar feature. Additionally, it also means that quickfixes are now available in more editors. For consistency, debputy's own logic is now always used, rather than relying on editor support when present. The downside is that I had to spend hours coming up with and debugging a way to find the diagnostics that overlap with the range provided by the editor. The most difficult part was keeping the logic straight and getting the runes correct for it.
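Conceptually, the server-side bookkeeping boils down to something like the following sketch (plain Python with hypothetical names, not the actual debputy code): remember the most recently published diagnostics per document, and on a code action request return the ones whose range overlaps the range the editor sent.
# A minimal sketch of tracking published diagnostics per document and finding
# the ones overlapping a code action request. Ranges are LSP-style dicts with
# zero-based "line"/"character" positions.
published = {}  # uri -> list of diagnostic dicts, updated on every publish

def publish(uri, diagnostics):
    published[uri] = diagnostics

def pos_le(a, b):
    return (a["line"], a["character"]) <= (b["line"], b["character"])

def ranges_overlap(a, b):
    # Two ranges overlap unless one ends before the other starts.
    return pos_le(a["start"], b["end"]) and pos_le(b["start"], a["end"])

def diagnostics_for_code_action(uri, requested_range):
    return [d for d in published.get(uri, [])
            if ranges_overlap(d["range"], requested_range)]

if __name__ == "__main__":
    diag = {"range": {"start": {"line": 5, "character": 0},
                      "end": {"line": 5, "character": 7}},
            "message": "Field 'Section' duplicates an earlier definition"}
    publish("file:///tmp/debian/control", [diag])
    cursor = {"start": {"line": 5, "character": 3}, "end": {"line": 5, "character": 3}}
    print(diagnostics_for_code_action("file:///tmp/debian/control", cursor))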
Making the quickfixes actually work With all of that, kate would show the quickfixes for diagnostics from debputy and you could use them too. However, they would always apply twice, with a suboptimal outcome as a result. The LSP spec has multiple ways of defining what needs to be changed in response to activating a code action. In debputy, all edits are currently done via the WorkspaceEdit type. It has two ways of defining the changes: either via changes or via documentChanges, with documentChanges being the preferred one if both parties support it. I originally read that as meaning I was allowed to provide both and the editor would pick the one it preferred. However, after seeing kate blindly use both when they are present, I reviewed the spec and it does say "The edit should either provide changes or documentChanges", so I think that one is on me. None of the changes in debputy currently require documentChanges, so I went with just using changes for now, despite it not being the preferred form. I cannot figure out the logic of whether an editor supports documentChanges. As I read the notes for this part of the spec, my understanding is that kate does not announce its support for documentChanges but it clearly uses them when present. Therefore, I decided to keep it simple for now until I have time to dig deeper.
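In LSP terms, the resulting quickfix edit then has roughly the following shape. This is a sketch with a hypothetical helper, not debputy's code; it only uses the older changes form of WorkspaceEdit and never documentChanges, so editors like kate cannot apply the same edit twice via both forms.
# A minimal sketch of building a WorkspaceEdit for a quickfix using only the
# "changes" form, never "documentChanges".
def make_workspace_edit(uri, edits):
    # `edits` is a list of (range, new_text) pairs in LSP coordinates.
    return {
        "changes": {
            uri: [{"range": rng, "newText": new_text} for rng, new_text in edits],
        }
        # Deliberately no "documentChanges": the spec says an edit should
        # provide one form or the other, not both.
    }

if __name__ == "__main__":
    rng = {"start": {"line": 5, "character": 0}, "end": {"line": 6, "character": 0}}
    print(make_workspace_edit("file:///tmp/debian/control", [(rng, "")]))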
Remaining limitations with kate There is one remaining limitation with kate that I have not yet solved. The kate program uses KSyntaxHighlighting for its language detection, which in turn is the basis for which LSP server is assigned to a given document. This engine does not seem to support as complex detection logic as I had hoped. Concretely, it either works by matching on an extension / a basename (same field for both cases) or on a mime type. This, combined with our habit in Debian of using extension-less files like debian/control vs. debian/tests/control, or debian/rules, or debian/upstream/metadata, makes things awkward at best. Concretely, the syntax engine cannot tell debian/control from debian/tests/control as they use the same basename. Fortunately, the syntax is close enough to work for both, and debputy is set to use filename-based lookups, so this case works well enough. However, for debian/rules and debian/upstream/metadata, my understanding is that if I assign these in the syntax engine as Debian files, these rules will also trigger for any file named foo.rules or bar.metadata. That seems a bit too broad for me, so I have opted out of that for now. The downside is that these files will not work out of the box with kate for now. The current LSP configuration in kate does not recognize makefiles or YAML either. Ideally, we would assign custom languages for the affected Debian files, so we do not steal the ID from other language servers. Notably, kate has a built-in language server for YAML, and debputy does nothing for a generic YAML document. However, adding YAML as a supported language for debputy would cause conflicts and regressions for users who are already happy with their generic YAML language server from kate. So there is certainly still work to be done. If you are good with KSyntaxHighlighting and know how to solve some of this, I hope you will help me out.
Changes unrelated to kate While I was working on debputy, I also added some other features that I want to mention.
  1. The debputy lint command will now show related context for a diagnostic in its terminal report when such information is available and comes from the same file as the diagnostic itself (cross-file cases are rendered without related information). The related information is typically used to highlight the source of a conflict. As an example, if you use the same field twice in a stanza of debian/control, then debputy will add a diagnostic to the second occurrence. The related information for that diagnostic would provide the position of the first occurrence (see the sketch after this list). This should make it easier to find the source of the conflict in the cases where debputy provides it. Let me know if you are missing it for certain diagnostics.

  2. The diagnostics analysis of debian/control will now identify and flag simple duplicated relations (complex ones like OR relations are ignored for now). Thanks to Matthias Geiger for suggesting the feature and to Otto Kekäläinen for reporting a false positive that is now fixed.
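As an illustration of the duplicated-field case from the first item above, a diagnostic carrying related information has roughly this shape. This is a sketch with made-up helper names, not debputy's actual code.
# A minimal sketch of an LSP Diagnostic whose relatedInformation points back
# at the first occurrence of a duplicated field in debian/control.
def duplicate_field_diagnostic(uri, field, first_range, second_range):
    return {
        "range": second_range,
        "severity": 2,  # Warning
        "message": f"The field {field} is defined twice in this stanza.",
        "relatedInformation": [{
            "location": {"uri": uri, "range": first_range},
            "message": f"First definition of {field}",
        }],
    }

if __name__ == "__main__":
    first = {"start": {"line": 1, "character": 0}, "end": {"line": 1, "character": 7}}
    second = {"start": {"line": 4, "character": 0}, "end": {"line": 4, "character": 7}}
    print(duplicate_field_diagnostic("file:///tmp/debian/control", "Section", first, second))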

Closing I am glad I tested with kate to weed out most of these issues in time before the freeze. The Debian freeze will start within a week from now. Since debputy is a part of the toolchain packages, it will be frozen from then on except for important bug fixes.

8 March 2025

Debian Brasil: MiniDebConf Belo Horizonte 2024 - a brief report

From April 27th to 30th, 2024, MiniDebConf Belo Horizonte 2024 was held at the Pampulha Campus of UFMG - Federal University of Minas Gerais, in the city of Belo Horizonte. This was the fifth time that a MiniDebConf (as an exclusive in-person event about Debian) took place in Brazil. Previous editions were in Curitiba (2016, 2017, and 2018) and in Brasília in 2023. We have had other MiniDebConf editions held within Free Software events such as FISL and Latinoware, and other online events. See our event history. Parallel to MiniDebConf, on the 27th (Saturday), FLISOL - Latin American Free Software Installation Festival took place. It is the largest event in Latin America to promote Free Software, and it has been held since 2005 simultaneously in several cities. MiniDebConf Belo Horizonte 2024 was a success (as were previous editions) thanks to the participation of everyone, regardless of their level of knowledge about Debian. We value the presence of both beginner users who are familiarizing themselves with the system and the official project developers. The spirit of welcome and collaboration was present throughout the event. 2024 edition numbers During the four days of the event, several activities took place for all levels of users and collaborators of the Debian project. The final numbers for MiniDebConf Belo Horizonte 2024 show that we had a record number of participants. Of the 224 participants, 15 were official Brazilian contributors, 10 of them DDs (Debian Developers) and 5 DMs (Debian Maintainers), in addition to several unofficial contributors. The organization was carried out by 14 people who started working at the end of 2023, including Prof. Loïc Cerf from the Computing Department who made the event possible at UFMG, and 37 volunteers who helped during the event. As MiniDebConf was held at UFMG facilities, we had the help of more than 10 University employees. See the list with the names of people who helped in some way in organizing MiniDebConf Belo Horizonte 2024. The difference between the number of people registered and the number of attendees at the event is probably explained by the fact that there is no registration fee, so if a person decides not to go to the event, they will not suffer financial losses. The 2024 edition of MiniDebConf Belo Horizonte was truly grand and shows the result of the constant efforts made over the last few years to attract more contributors to the Debian community in Brazil. With each edition the numbers only increase, with more attendees, more activities, more rooms, and more sponsors/supporters.

Activities The MiniDebConf schedule was intense and diverse. On the 27th, 29th and 30th (Saturday, Monday and Tuesday) we had talks, discussions, workshops and many practical activities. On the 28th (Sunday), the Day Trip took place, a day dedicated to sightseeing around the city. In the morning we left the hotel and went, on a chartered bus, to the Belo Horizonte Central Market. People took the opportunity to buy various things such as cheeses, sweets, cachaças and souvenirs, as well as tasting some local foods. After a 2-hour tour of the Market, we got back on the bus and hit the road for lunch at a typical Minas Gerais food restaurant. With everyone well fed, we returned to Belo Horizonte to visit the city's main tourist attraction: Lagoa da Pampulha and Capela São Francisco de Assis, better known as Igrejinha da Pampulha. We went back to the hotel, and the day ended in the hacker space that we set up in the events room for people to chat, do packaging, and eat pizza. Crowdfunding For the third time we ran a crowdfunding campaign, and it was incredible how people contributed! The initial goal was to raise the amount equivalent to a gold tier of R$ 3,000.00. When we reached this goal, we defined a new one, equivalent to one gold tier + one silver tier (R$ 5,000.00). And again we achieved this goal. So we proposed as a final goal the value of a gold + silver + bronze tier, which would be equivalent to R$ 6,000.00. The result was that we raised R$ 7,239.65 (~ USD 1,400) with the help of more than 100 people! Thank you very much to the people who contributed any amount. As a thank you, we list the names of the people who donated. Food, accommodation and/or travel grants for participants Each edition of MiniDebConf brought some innovation, or some different benefit for the attendees. In this year's edition in Belo Horizonte, as with DebConfs, we offered bursaries for food, accommodation and/or travel to help those people who would like to come to the event but who would need some kind of help. In the registration form, we included the option for the person to request a food, accommodation and/or travel bursary, but to do so, they would have to identify themselves as a contributor (official or unofficial) to Debian and write a justification for the request. Number of people benefited: The food bursary provided lunch and dinner every day. The lunches included attendees who live in Belo Horizonte and the region. Dinners were paid for attendees who also received accommodation and/or travel. The accommodation was at the BH Jaraguá Hotel. The travel grants included airplane or bus tickets, or fuel (for those who came by car or motorbike). Much of the money to fund the bursaries came from the Debian Project, mainly for travel. We sent a budget request to the former Debian leader Jonathan Carter, and he promptly approved our request. In addition to this event budget, the leader also approved individual requests sent by some DDs who preferred to request directly from him. The experience of offering the bursaries was really good because it allowed several people to come from other cities.
Photos and videos You can watch recordings of the talks at the links below. Thanks We would like to thank all the attendees, organizers, volunteers, sponsors and supporters who contributed to the success of MiniDebConf Belo Horizonte 2024.

Dirk Eddelbuettel: RcppTOML 0.2.3 on CRAN: Compiler Nag, Small Updates

A new (mostly maintenance) release 0.2.3 of RcppTOML is now on CRAN. TOML is a file format that is most suitable for configurations, as it is meant to be edited by humans but read by computers. It emphasizes strong readability for humans while at the same time supporting strong typing as well as immediate and clear error reports. On small typos you get parse errors, rather than silently corrupted garbage. Much preferable to any and all of XML, JSON or YAML, though sadly these may be too ubiquitous now. TOML is frequently used with projects such as the Hugo static blog compiler, or the Cargo system of crates (aka "packages") for the Rust language. This release was tickled by another CRAN request: just like yesterday's RcppSimdJson release and the RcppDate release two days ago, it responds to the esoteric 'whitespace in literal operator' deprecation warning. We alerted upstream too. The short summary of changes follows.

Changes in version 0.2.3 (2025-03-08)
  • Correct the minimum version of Rcpp to 1.0.8 (Walter Somerville)
  • The package now uses Authors@R as mandated by CRAN
  • Updated 'whitespace in literal' issue upsetting clang++-20
  • Continuous integration updates including simpler r-ci setup

Courtesy of my CRANberries, there is also a diffstat report for this release. For questions, suggestions, or issues please use the issue tracker at the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

7 March 2025

Dirk Eddelbuettel: RcppSimdJson 0.1.13 on CRAN: Compiler Nag, New Upstream

A new release 0.1.13 of the RcppSimdJson package is now on CRAN. RcppSimdJson wraps the fantastic and genuinely impressive simdjson library by Daniel Lemire and collaborators. Via very clever algorithmic engineering to obtain largely branch-free code, coupled with modern C++ and newer compiler instructions, it parses gigabytes of JSON per second, which is quite mindboggling. The best-case performance is faster than CPU speed as the use of parallel SIMD instructions and careful branch avoidance can lead to less than one CPU cycle per byte parsed; see the video of the talk by Daniel Lemire at QCon. This release was tickled by another CRAN request: just like yesterday's RcppDate release, it responds to the esoteric 'whitespace in literal operator' deprecation warning. It turns out that upstream simdjson had this fixed a few months ago as the node bindings package ran into it. Other changes include a bit of earlier polish by Daniel, another CRAN-mandated update, CI improvements, and a move of two demos to examples/ to avoid having to add half a dozen packages to Suggests: for no real usage gain in the package. The short NEWS entry for this release follows.

Changes in version 0.1.13 (2025-03-07)
  • A call to std::string::erase is now guarded (Daniel)
  • The package now uses Authors@R as mandated by CRAN (Dirk)
  • simdjson was upgraded to version 3.12.2 (Dirk)
  • Continuous integration updated to more compilers and simpler setup
  • Two demos are now in inst/examples to not inflate Suggests

Courtesy of my CRANberries, there is also a diffstat report for this release. For questions, suggestions, or issues please use the issue tracker at the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

6 March 2025

Dirk Eddelbuettel: RcppDate 0.0.5: Address Minor Compiler Nag

RcppDate wraps the featureful date library written by Howard Hinnant for use with R. This header-only modern C++ library has been in pretty wide-spread use for a while now, and adds to C++11/C++14/C++17 what is (with minor modifications) the date library in C++20. RcppDate adds no extra R or C++ code and can therefore be a zero-cost dependency for any other project; yet a number of other projects decided to re-vendor it, resulting in less-efficient duplication. Oh well. C'est la vie. This release syncs with the (already mostly included) upstream release 3.0.3, and also addresses a fresh (and mildly esoteric) nag from clang++-20. One upstream PR already addressed this in the files tickled by some CRAN packages; I followed up with another upstream PR addressing this in a few more occurrences.

Changes in version 0.0.5 (2025-03-06)
  • Updated to upstream version 3.0.3
  • Updated 'whitespace in literal' issue upsetting clang++-20; this is also fixed upstream via two PRs

Courtesy of my CRANberries, there is also a diffstat report for the most recent release. More information is available at the repository or the package page.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can sponsor me at GitHub.

11 February 2025

Ian Jackson: derive-deftly 1.0.0 - Rust derive macros, the easy way

derive-deftly 1.0 is released. derive-deftly is a template-based derive-macro facility for Rust. It has been a great success. Your codebase may benefit from it too! Rust programmers will appreciate its power, flexibility, and consistency, compared to macro_rules; and its convenience and simplicity, compared to proc macros. Programmers coming to Rust from scripting languages will appreciate derive-deftly's convenient automatic code generation, which works as a kind of compile-time introspection. Rust's two main macro systems I'm often a fan of metaprogramming, including macros. They can help remove duplication and flab, which are often the enemy of correctness. Rust has two macro systems. derive-deftly offers much of the power of the more advanced one (proc_macros), while beating the simpler one (macro_rules) at its own game for ease of use. (Side note: Rust has at least three other ways to do metaprogramming: generics; build.rs; and multiple module inclusion via #[path=]. These are beyond the scope of this blog post.) macro_rules! macro_rules!, aka "pattern macros", "declarative macros", or sometimes "macros by example", are the simpler kind of Rust macro. They involve writing a sort-of-BNF pattern-matcher, and a template which is then expanded with substitutions from the actual input. If your macro wants to accept comma-separated lists, or other simple kinds of input, this is OK. But often we want to emulate a #[derive(...)] macro: e.g., to define code based on a struct, handling each field. Doing that with macro_rules is very awkward: macro_rules!'s pattern language doesn't have a cooked way to match a data structure, so you have to hand-write a matcher for Rust syntax, in each macro. Writing such a matcher is very hard in the general case, because macro_rules lacks features for matching important parts of Rust syntax (notably, generics). (If you really need to, there's a horrible technique as a workaround.) And the invocation syntax for the macro is awkward: you must enclose the whole of the struct in my_macro! { }. This makes it hard to apply more than one macro to the same struct, and produces rightward drift. Enclosing the struct this way means the macro must reproduce its input - so it can have bugs where it mangles the input, perhaps subtly. This also means the reader cannot be sure precisely whether the macro modifies the struct itself. In Rust, the types and data structures are often the key places to go to understand a program, so this is a significant downside. macro_rules also has various other weird deficiencies too specific to list here. Overall, compared to (say) the C preprocessor, it's great, but programmers used to the power of Lisp macros, or (say) metaprogramming in Tcl, will quickly become frustrated. proc macros Rust's second macro system is much more advanced. It is a fully general system for processing and rewriting code. The macro's implementation is Rust code, which takes the macro's input as arguments, in the form of Rust tokens, and returns Rust tokens to be inserted into the actual program. This approach is more similar to Common Lisp's macros than to most other programming languages' macro systems. It is extremely powerful, and is used to implement many very widely used and powerful facilities. In particular, proc macros can be applied to data structures with #[derive(...)]. The macro receives the data structure, in the form of Rust tokens, and returns the code for the new implementations, functions etc.
This is used very heavily in the standard library for basic features like #[derive(Debug)] and Clone, and for important libraries like serde and strum. But, it is a complete pain in the backside to write and maintain a proc_macro. The Rust types and functions you deal with in your macro are very low level. You must manually handle every possible case, with runtime conditions and pattern-matching. Error handling and recovery is so nontrivial there are macro-writing libraries and even more macros to help. Unlike a Lisp codewalker, a Rust proc macro must deal with Rust's highly complex syntax. You will probably end up dealing with syn, which is a complete Rust parsing library, separate from the compiler; syn is capable and comprehensive, but a proc macro must still contain a lot of often-intricate code. There are build/execution environment problems. The proc_macro code can't live with your application; you have to put the proc macros in a separate cargo package, complicating your build arrangements. The proc macro package environment is weird: you can't test it separately, without jumping through hoops. Debugging can be awkward. Proper tests can only realistically be done with the help of complex additional tools, and will involve a pinned version of Nightly Rust. derive-deftly to the rescue derive-deftly lets you write a #[derive(...)] macro, driven by a data structure, without wading into any of that stuff. Your macro definition is a template in a simple syntax, with predefined $-substitutions for the various parts of the input data structure. Example Here's a real-world example from a personal project:
define_derive_deftly! {
    export UpdateWorkerReport:
    impl $ttype {
        pub fn update_worker_report(&self, wr: &mut WorkerReport) {
            $(
                ${when fmeta(worker_report)}
                wr.$fname = Some(self.$fname.clone()).into();
            )
        }
    }
}
#[derive(Debug, Deftly, Clone)]
...
#[derive_deftly(UiMap, UpdateWorkerReport)]
pub struct JobRow {
    ...
    #[deftly(worker_report)]
    pub status: JobStatus,
    pub processing: NoneIsEmpty<ProcessingInfo>,
    #[deftly(worker_report)]
    pub info: String,
    pub duplicate_of: Option<JobId>,
}
This is a nice example, also, of how using a macro can avoid bugs. Implementing this update by hand without a macro would involve a lot of cut-and-paste. When doing that cut-and-paste it can be very easy to accidentally write bugs where you forget to update some parts of each of the copies:
    pub fn update_worker_report(&self, wr: &mut WorkerReport) {
        wr.status = Some(self.status.clone()).into();
        wr.info = Some(self.status.clone()).into();
    }
Spot the mistake? We copy status to info. Bugs like this are extremely common, and not always found by the type system. derive-deftly can make it much easier to make them impossible. Special-purpose derive macros are now worthwhile! Because of the difficult and cumbersome nature of proc macros, very few projects have site-specific, special-purpose #[derive(...)] macros. The Arti codebase has no bespoke proc macros, across its 240kloc and 86 crates. (We did fork one upstream proc macro package to add a feature we needed.) I have only one bespoke, case-specific, proc macro amongst all of my personal Rust projects; it predates derive-deftly. Since we have started using derive-deftly in Arti, it has become an important tool in our toolbox. We have 37 bespoke derive macros, done with derive-deftly. Of these, 9 are exported for use by downstream crates. (For comparison there are 176 macro_rules macros.) In my most recent personal Rust project, I have 22 bespoke derive macros, done with derive-deftly, and 19 macro_rules macros. derive-deftly macros are easy and straightforward enough that they can be used as readily as macro_rules macros. Indeed, they are often clearer than a macro_rules macro. Stability without stagnation derive-deftly is already highly capable, and can solve many advanced problems. It is mature software, well tested, with excellent documentation, comprising both comprehensive reference material and the walkthrough-structured user guide. But declaring it 1.0 doesn't mean that it won't improve further. Our ticket tracker has a laundry list of possible features. We'll sometimes be cautious about committing to these, so we've added a beta feature flag, for opting in to less-stable features, so that we can prototype things without painting ourselves into a corner. And, we intend to further develop the Guide.


8 February 2025

Erich Schubert: Azul's State-of-Java report is nonsense

Azul's State-of-Java report is full of nonsense, and not worth looking at. The report claims various things about the adoption of AI in the Java ecosystem. But its results do not make any sense when looked at in detail. For example (in the AI section): I can only guess that people picked some random plausible answer, but were not actually using any of that. Probably because of bad incentives:
Participants were offered token compensation for their participation.
It seems like Dimensional Research, the company that did the survey, screwed up badly.

7 February 2025

Emmanuel Kasper: Wireless headset dongle not detected by PulseAudio

For whatever reason, when I plug and unplug my wireless headset dongle over USB, it is not always detected by the PulseAudio/Pipewire stack which handles desktop sound on Linux these days. But we can fix that with a restart of the handling daemon, see below.
In PulseAudio terminology an input device (microphone) is called a source, and an output device a sink. When the headset dongle is plugged in, we can see it on the USB bus:
$ lsusb | grep Headset
Bus 001 Device 094: ID 046d:0af7 Logitech, Inc. Logitech G PRO X 2 Gaming Headset
The device is correctly detected as a Human Interface Device (HID):
$ dmesg
...
[310230.507591] input: Logitech Logitech G PRO X 2 Gaming Headset as /devices/pci0000:00/0000:00:14.0/usb1/1-1/1-1.1/1-1.1.4/1-1.1.4:1.3/0003:046D:0AF7.0060/input/input163
[310230.507762] hid-generic 0003:046D:0AF7.0060: input,hiddev2,hidraw11: USB HID v1.10 Device [Logitech Logitech G PRO X 2 Gaming Headset] on usb-0000:00:14.0-1.1.4/input
However it is not seen in the list of sources / sinks of PulseAudio:
$ pactl list short sinks
58      alsa_output.usb-Lenovo_ThinkPad_Thunderbolt_3_Dock_USB_Audio_000000000000-00.analog-stereo      PipeWire        s16le 2ch 48000Hz       IDLE
62      alsa_output.pci-0000_00_1f.3.analog-stereo      PipeWire        s32le 2ch 48000Hz       SUSPENDED
95      bluez_output.F4_4E_FD_D2_97_1F.1        PipeWire        s16le 2ch 48000Hz       IDLE
This unfriendly list shows my docking station, which has a small jack connector for a wired cable, the built-in speaker of my laptop, and a bluetooth headset. If I restart Pipewire,
$ systemctl --user restart pipewire
then the headset appears as a possible audio output.
$ pactl list short sinks
54      alsa_output.usb-Lenovo_ThinkPad_Thunderbolt_3_Dock_USB_Audio_000000000000-00.analog-stereo      PipeWire        s16le 2ch 48000Hz       SUSPENDED
56      alsa_output.usb-Logitech_Logitech_G_PRO_X_2_Gaming_Headset_0000000000000000-00.analog-stereo    PipeWire        s16le 2ch 48000Hz       SUSPENDED
58      alsa_output.pci-0000_00_1f.3.analog-stereo      PipeWire        s32le 2ch 48000Hz       SUSPENDED
77      bluez_output.F4_4E_FD_D2_97_1F.1        PipeWire        s16le 2ch 48000Hz       SUSPENDED
Once you have set the default input/output device (for me, in GNOME), you can check it with:
$ pactl info | egrep '(Sink|Source)'
Default Sink: alsa_output.usb-Logitech_Logitech_G_PRO_X_2_Gaming_Headset_0000000000000000-00.analog-stereo
Default Source: alsa_input.usb-Logitech_Logitech_G_PRO_X_2_Gaming_Headset_0000000000000000-00.mono-fallback
Finally let us play some test sounds:
$ speaker-test --test wav --nloops 1 --channels 2
And to test some recording; you will hear the output around one second after speaking (yes, that is recorded audio sent over a Unix pipe for playback!):
# don't do this when the output is a speaker, this will create audio feedback (larsen effect)
$ arecord -f cd - | aplay

26 January 2025

Russ Allbery: Review: Dark Matters

Review: Dark Matters, by Michelle Diener
Series: Class 5 #4
Publisher: Eclipse
Copyright: October 2019
ISBN: 0-6454658-6-0
Format: Kindle
Pages: 307
Dark Matters is the fourth book in the science fiction semi-romance Class 5 series. There are spoilers for all of the previous books, and although enough is explained that you could make sense of the story starting here, I wouldn't recommend it. As with the other books in the series, it follows new protagonists, but the previous protagonists make an appearance. You will be unsurprised to hear that the Tecran kidnapped yet another Earth woman. The repetitiveness of the setup would be more annoying if the book took itself too seriously, but it doesn't, and so I mostly find it entertaining. I thought Diener was going to dodge the obvious series structure, but now I am wondering if we're going to end up with one woman per Class 5 ship after all. Lucy is not on a ship, however, Tecran or otherwise. She is a captive in a military research facility on the Tecran home world. The Tecran are in very deep trouble given the events of the previous book and have decided that Lucy's existence is a liability. Only the intervention of some sympathetic Tecran scientists she partly befriended during her captivity lets her escape the facility before it's destroyed. Now she's alone, on an alien world, being hunted by the military. It's not entirely the fault of this book that it didn't tell the story that I wanted to read. The setup for Dark Matters implies this book will see the arrival of consequences for the Tecran's blatant violations of the Sentient Beings Agreement. I was looking forward to a more political novel about how such consequences could be administered. This is the sort of problem that we struggle with in our politics: Collective punishment isn't acceptable, but there have to be consequences sufficient to ensure that a state doesn't repeat the outlawed behavior, and yet attempting to deliver those consequences feels like occupation and can set off worse social ruptures and even atrocities. I wasn't expecting that deep of political analysis of what is, after all, a lighthearted SF adventure series, but Diener has been willing to touch on hard problems. The ethics of violence has been an ongoing theme of the series. Alas for me, this is not what we get. The arriving cavalry, in the form of a Class 5 and the inevitable Grih hunk to serve as the love interest du jour, quickly become more interested in helping Lucy elude pursuers (or escape captors) than in the delicate political situation. The conflict between the local population is a significant story element, but only as backdrop. Instead, this reads like a thriller or an action movie, complete with alien predators and a cinematic set piece finale. The political conflict between the Tecran and the United Council does reach a conclusion of sorts, but it's not that satisfying. Perhaps some of the political fallout will happen in future books, but here Diener simplifies the morality of the story in the climax and dodges out of the tricky ethical and social challenge of how to punish a sovereign nation. One of the things I like about this series is that it takes moral indignation seriously, but now that Diener has raised the (correct) complication that people have strong motivations to find excuses for the actions of their own side, I hope she can find a believable political resolution that isn't simple brute force. This entry in the series wasn't bad, but it didn't grab me. 
Lucy was fine as a protagonist; her ability to manipulate the Tecran into making mistakes fits the longer time she's had to study them and keeps her distinct from the other protagonists. But the small bit of politics we do see is unsatisfying and conveniently simplistic, and this book mostly degenerates into generic action sequences. Bane, the Class 5 ship featured in this story, is great when he's active, and I continue to be entertained by the obsession the Class 5 ships have with Earth women, but he's sidelined for too much of the story. I felt like Diener focused on the least interesting part of the story setup. If you've read this far, there's nothing wrong with this entry. You'll probably want to keep reading. But it felt like a missed opportunity. Followed in publication order by Dark Ambitions, a novella that returns to Rose to tell a side story. The next novel is Dark Class, in which we'll presumably see the last kidnapped Earth woman. Rating: 6 out of 10

24 January 2025

Sam Hartman: Feeling Targeted: Executive Order Ending Wasteful DEIA Efforts

As most here know, I'm totally blind. One of my roles involves a contract for the US Government, under which I have a government email account. The department recently received a message talking about our work to end, to the maximum extent permitted by law, all diversity, equity, inclusion, and accessibility efforts in the government in accordance with the recently signed executive order. We are all reminded that if we timely identify the contracts and positions that are related to these efforts, there will be no consequences. There are a lot of times in my life when I have felt marginalized, frustrated and angry that people weren't interested in working with me to make the small changes that would help me fit in. As an example with this government job, I asked to have access to a screen reader so that I could use my computer. My preferred adaptive software was not approved, even though it was thousands of dollars cheaper than the option the government wanted and could have been installed instantly rather than waiting for a multi-week ordering process. When the screen reader eventually became available, the government-provided installer was not accessible: a blind person could not use it. When I asked for help, the government added an additional multi-week delay because they weren't sure that the license management technology for the software they had chosen met the government's security and privacy policies. Which is to say that even with people actively working toward accessibility, sharing a commitment that accessibility is important, we have a lot of work to do. I feel very targeted at the current time. Now we are removing as many of the resources that help me be effective and feel welcome as we can. Talking about the lack of consequences now is just a way to remind everyone that there will be consequences later and get the fear going. The witch hunt is coming, and if people do a good enough job of turning in all the people who could help me feel welcome, they won't face consequences. Yes, I understand that the Americans with Disabilities Act is still law, but its effectiveness will be very different in a climate where you need to eliminate accessibility positions to avoid consequences than in a climate where accessibility is a goal.


11 January 2025

Petter Reinholdtsen: The 2025 LinuxCNC Norwegian developer gathering

The LinuxCNC project is trotting along. And I believe this great software system for numerical control of machines such as milling machines, lathes, plasma cutters, routers, cutting machines, robots and hexapods would do even better with more in-person developer gatherings, so we plan to organise such a gathering this summer too. This year we would like to invite you to a small LinuxCNC and free software fabrication workshop/gathering in Norway this summer, for the weekend starting July 4th 2025. New this year is the slightly larger scope, and we invite people outside the LinuxCNC community to join as well. As in earlier years, we suggest organizing it as an unconference, where the participants create the program upon arrival. The location is a metal workshop a 15 minute drive from the Gardermoen airport (OSL), where there is a lot of space and a hotel only 5 minutes away by car. We plan to fire up the barbeque in the evenings. Please let us know if you would like to join. We track the list of participants on a simple pad; please add yourself there if you are interested in joining. The NUUG Foundation has, on our request, offered to handle any money involved with this gathering, in other words holding any sponsor funds and paying any bills. The NUUG Foundation is a spin-off from the NUUG member organisation here in Norway, with long ties to the free software and open standards communities. As usual we hope to find sponsors to pay for food, lodging and travel. As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

8 January 2025

John Goerzen: Censorship Is Complicated: What Internet History Says about Meta/Facebook

In light of this week's announcement by Meta (Facebook, Instagram, Threads, etc.), I have been pondering this question: Why am I, a person that has long been a staunch advocate of free speech and encryption, leery of sites that talk about being free speech-oriented? And, more to the point, why am I, a person that has been censored by Facebook for mentioning the Open Source social network Mastodon, not cheering a "lighter touch"? The answers are complicated, and take me back to the early days of social networking. Yes, I mean the 1980s and 1990s. Before digital communications, there were barriers to reaching a lot of people. Especially money. This led to a sort of self-censorship: it may be legal to write certain things, but would a newspaper publish a letter to the editor containing expletives? Probably not. As digital communications started to happen, suddenly people could have their own communities. Not just free from the same kinds of monetary pressures, but free from outside oversight (parents, teachers, peers, community, etc.) When you have a community that the majority of people lack the equipment to access, and wouldn't understand how to access even if they had the equipment, you have a place where self-expression can be unleashed. And, as J. C. Herz covers in what is now an unintentional history (her book Surfing on the Internet was published in 1995), self-expression WAS unleashed. She enjoyed the wit and expression of everything from odd corners of Usenet to the text-based open world of MOOs and MUDs. She even talks about groups dedicated to insults (flaming) in positive terms. But as I've seen time and again, if there are absolutely no rules, then whenever a group gets big enough, more than a few dozen people, say, there are troublemakers that ruin it for everyone. Maybe it's trolling, maybe it's vicious attacks, you name it, it will arrive and it will be poisonous. I remember the debates within the Debian community about this. Debian is one of the pillars of the Internet today, a nonprofit project with free speech in its DNA. And yet there were inevitably the poisonous people. Debian took too long to learn that allowing those people to run rampant was causing more harm than good, because having a well-worn Delete key and a tolerance for insults became a requirement for being a Debian developer, and that drove away people that had no desire to deal with such things. (I should note that Debian strikes a much better balance today.) But in reality, there were never absolutely no rules. If you joined a BBS, you used it at the whim of the owner (the sysop or system operator). The sysop may be a 16-yr-old running it from their bedroom, or a retired programmer, but in any case they were letting you use their resources for free and they could kick you off for any or no reason at all. So if you caused trouble, or perhaps insulted their cat, you're banned. But, in all but the smallest towns, there were other options you could try. On the other hand, sysops enjoyed having people call their BBSs and didn't want to drive everyone off, so there was a natural balance at play. As networks like Fidonet developed, a sort of uneasy approach kicked in: don't be excessively annoying, and don't be easily annoyed. Like it or not, it seemed to generally work. A BBS that repeatedly failed to deal with troublemakers could risk removal from Fidonet. On the more institutional Usenet, you generally got access through your university (or, in a few cases, employer).
Most universities didn't really even know they were running a Usenet server, and you were generally left alone. Until you did something that annoyed somebody enough that they tracked down the phone number for your dean, in which case real-world consequences would kick in. A site might face the Usenet Death Penalty, delinking from the network, if it repeatedly failed to prevent malicious content from flowing through it. Some BBSs let people from minority communities such as LGBTQ+ thrive in a place of peace from tormentors. A lot of them let people be themselves in a way they couldn't be in "real life". And yes, some harbored trolls and flamers. The point I am trying to make here is that each BBS, or Usenet site, set their own policies about what their own users could do. These had to be harmonized to a certain extent with the global community, but in a certain sense, with BBSs especially, you could just use a different one if you didn't like what the vibe was at a certain place. That this free speech ethos survived was never inevitable. There were many attempts to regulate the Internet, and it was thanks to the advocacy of groups like the EFF that we have things like strong encryption and a degree of freedom online. With the rise of the very large platforms, and here I mean CompuServe and AOL at first, and then Facebook, Twitter, and the like later, the low-friction option of just choosing a different place started to decline. You could participate on a Fidonet forum from any of thousands of BBSs, but you could only participate in an AOL forum from AOL. The same goes for Facebook, Twitter, and so forth. Not only that, but as social media became conceived of as very large sites, it became impossible for a person with enough skill, funds, and time to just start a site themselves. Instead of needing a few thousand dollars of equipment, you'd need tens or hundreds of millions of dollars of equipment and employees. All that means you can't really run Facebook as a nonprofit. It is a business. It should be absolutely clear to everyone that Facebook's mission is not the one they say it is: "[to] give people the power to build community and bring the world closer together." If that was their goal, they wouldn't be creating AI users and AI spam and all the rest. Zuck isn't showing courage; he's sucking up to Trump, and those that will pay the price are those that always do: women and minorities. Really, the point of any large social network isn't to build community. It's to make the owners their next billion. They do that by convincing people to look at ads on their site. Zuck is as much a windsock as anyone else; he will adjust policies in whichever direction he thinks the wind is blowing so as to let him keep putting ads in front of eyeballs, and stomp all over principles, even free speech, doing it. Don't expect anything different from any large commercial social network either. Bluesky is going to follow the same trajectory as all the others. The problem with a one-size-fits-all content policy is that the world isn't that kind of place. For instance, I am a pacifist. There is a place for a group where pacifists can hang out with each other, free from the noise of the debate about pacifism. And there is a place for the debate. Forcing everyone that signs up for the conversation to sign up for the debate is harmful. Preventing the debate is often also harmful. One company can't square this circle. Beyond that, the fact that we care so much about one company is a problem on two levels.
First, it indicates how susceptible people are to misinformation and such. I don't have much to offer on that point. Secondly, it indicates that we are too centralized. We have a solution there: Mastodon. Mastodon is a modern, open source, decentralized social network. You can join any instance, easily migrate your account from one server to another, and so forth. You pick an instance that suits you. There are thousands of others you can choose from. Some aggressively defederate with instances known to harbor poisonous people; some don't. And, to harken back to the BBS era, if you have some time, some skill, and a few bucks, you can run your own Mastodon instance. Personally, I still visit Facebook on occasion because some people I care about are mainly there. But it is such a terrible experience that I rarely do. Meta is becoming irrelevant to me. They are on a path to becoming irrelevant to many more as well. Maybe this is the moment to go "shrug, this sucks" and try something better. (And when you do, feel free to say hi to me at @jgoerzen@floss.social on Mastodon.)

1 January 2025

Russ Allbery: 2024 Book Reading in Review

In 2024, I finished and reviewed 46 books, not counting another three books I've finished but not yet reviewed and which will therefore roll over to 2025. This is slightly fewer books than the last couple of years, but more books than 2021. Reading was particularly spotty this year, with much of the year's reading packed into late November and December. This was a year in which I figured out I was trying to do too much, but did not finish figuring out what to do about it. Reading and particularly reviewing reflected that, with long silent periods and then attempts to catch up. One of the goals for next year is to find a more sustainable balance for the hobbies in my life, including reading. My favorite books I read this year were Ashley Herring Blake's Bright Falls sapphic romance trilogy: Delilah Green Doesn't Care, Astrid Parker Doesn't Fail, and Iris Kelly Doesn't Date. These are not perfect books, but they made me laugh, made me cry, and were impossible to put down. My thanks to a video from BookTuber Georgia Marie for the recommendation. I Shall Wear Midnight was the best of the remaining Pratchett novels. It's the penultimate Tiffany Aching book and, in my opinion, the best. All of the elements of the previous books come together in snarky competence porn that was a delight to read. The best book I read last year was Mark Lawrence's The Book That Wouldn't Burn, which much to my surprise did not make a single award list for its publication year of 2023. It was a tour de force of world-building that surprised me multiple times. Unfortunately, the sequel was not as good and I fear the series may be heading in the wrong direction. I am attempting to stay hopeful about the upcoming third and concluding book. I didn't read much non-fiction this year, but the best of what I did read was Zeke Faux's Number Go Up about the cryptocurrency bubble. This book will not change anyone's mind, but it's a readable and entertaining summary of some of the more obvious cryptocurrency scams. I also had enough quibbles with it to write an extended review, which is a compliment of sorts. The Discworld read-through is done, so I may either start or return to another series re-read in 2025. I have a huge backlog of all sorts of books, though, so we will see how the year goes. As always, I have no specific numeric goals, just a hope that I can make time for regular and varied reading and maintain a rhythm with writing reviews. The full analysis includes some additional personal reading statistics, probably only of interest to me.

Louis-Philippe Véronneau: 2024 - A Musical Retrospective

Another musical retrospective. If you enjoy this, I also did a 2022 and a 2023 one. Albums In 2024, I added 88 new albums to my collection; that's a lot! This year again, I bought the vast majority of my music on Bandcamp. To be honest, I'm quite distraught by what's become of that website. Although it remains a wonderful place to buy underground music, Songtradr, the new owner of the platform, has been shown to be viciously anti-union. Money continues to ruin the world, I guess. Concerts I continued to go to a lot of concerts in 2024 (25!). Over the past 3 years, I have been going to more and more concerts, and I think I've reached my "peak". An average of a concert every two weeks is quite a lot :) If you also like music and concerts, but find yourself not going to as many as you would like, the real secret is not to be afraid to go to concerts alone. Going with friends is always fun, but if I restricted myself to only going to concerts in a group, I'd barely see a few each year. Another piece of good advice is to bring a book or something else1 to pass the time between sets. It can often take 30-45 minutes between sets for the artists to get their instruments ready, which can get quite boring if you just stand there and wait. Anyway, here are the concerts I went to in 2024: Shout out to the Gancio project and to the folks running the Montreal instance. It continues to be a smash hit and most of the interesting concerts end up being advertised there. See you all in 2025!

  1. I bought a Miyoo Mini Plus, a handheld Linux console running OnionOS, for that express reason. So far it's been great and I've been very happy to revisit some childhood classics.
