A new (mostly maintenance) release 0.2.3 of RcppTOML is now on CRAN. TOMLis a file format that is most suitable for configurations, as it is meant to be edited by humans but read by computers. It emphasizes strong readability for humans while at the same time supporting strong typing as well as immediate and clear error reports. On small typos you get parse errors, rather than silently corrupted garbage. Much preferable to any and all of XML, JSON or YAML though sadly these may be too ubiquitous now. TOML is frequently being used with the projects such as the Hugo static blog compiler, or the Cargo system of Crates (aka packages ) for the Rust language. This release was tickled by another CRAN request: just like yesterday s and the RcppDate release two days ago, it responds to the esoteric whitespace in literal operator depreceation warning. We alerted upstream too. The short summary of changes follows.

Changes in version 0.2.3 (2025-03-08)

Correct the minimum version of Rcpp to 1.0.8 (Walter Somerville)

The package now uses Authors@R as mandated by CRAN

Updated 'whitespace in literal' issue upsetting clang++-20

Continuous integration updates including simpler r-ci setup

Courtesy of my CRANberries, there is also a diffstat report for this release. For questions, suggestions, or issues please use the issue tracker at the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

Meta In case you haven't noticed, I'm trying to post and one of the things that entails is to just dump over the fence a bunch of draft notes. In this specific case, I had a set of rough notes about NixOS and particularly Nix, the package manager. In this case, you can see the very birth of an article, what it looks like before it becomes the questionable prose it is now, by looking at the Git history of this file, particularly its birth. I have a couple of those left, and it would be pretty easy to publish them as is, but I feel I'd be doing others (and myself! I write for my own documentation too after all) a disservice by not going the extra mile on those. So here's the long version of my experiment with Nix.

Nix A couple friends are real fans of Nix. Just like I work with Puppet a lot, they deploy and maintain servers (if not fleets of servers) with NixOS and its declarative package management system. Essentially, they use it as a configuration management system, which is pretty awesome. That, however, is a bit too high of a bar for me. I rarely try new operating systems these days: I'm a Debian developer and it takes most of my time to keep that functional. I'm not going to go around messing with other systems as I know that would inevitably get me dragged down into contributing into yet another free software project. I'm mature now and know where to draw the line. Right? So I'm just testing Nix, the package manager, on Debian, because I learned from my friend that nixpkgs is the largest package repository out there, a mind-boggling 100,000 at the time of writing (with 88% of packages up to date), compared to around 40,000 in Debian (or 72,000 if you count binary packages, with 72% up to date). I naively thought Debian was the largest, perhaps competing with Arch, and I was wrong: Arch is larger than Debian too. What brought me there is I wanted to run Harper, a fast spell-checker written in Rust. The logic behind using Nix instead of just downloading the source and running it myself is that I delegate the work of supply-chain integrity checking to a distributor, a bit like you trust Debian developers like myself to package things in a sane way. I know this widens the attack surface to a third party of course, but the rationale is that I shift cryptographic verification to another stack than just "TLS + GitHub" (although that is somewhat still involved) that's linked with my current chain (Debian packages). I have since then stopped using Harper for various reasons and also wrapped up my Nix experiment, but felt it worthwhile to jot down some observations on the project.

Hot take Overall, Nix is hard to get into, with a complicated learning curve. I have found the documentation to be a bit confusing, since there are many ways to do certain things. I particularly tripped on "flakes" and, frankly, incomprehensible error reporting. It didn't help that I tried to run nixpkgs on Debian which is technically possible, but you can tell that I'm not supposed to be doing this. My friend who reviewed this article expressed surprised at how easy this was, but then he only saw the finished result, not me tearing my hair out to make this actually work.

Nix on Debian primer So here's how I got started. First I installed the nix binary package:

apt install nix-bin

Then I had to add myself to the right group and logout/log back in to get the rights to deploy Nix packages:

adduser anarcat nix-users

That wasn't easy to find, but is mentioned in the README.Debian file shipped with the Debian package. Then, I didn't write this down, but the README.Debian file above mentions it, so I think I added a "channel" like this:

nix-channel --add https://nixos.org/channels/nixpkgs-unstable nixpkgs
nix-channel --update

And I likely installed the Harper package with:

nix-env --install harper

At this point, harper was installed in a ... profile? Not sure. I had to add ~/.nix-profile/bin (a symlink to /nix/store/sympqw0zyybxqzz6fzhv03lyivqqrq92-harper-0.10.0/bin) to my $PATH environment for this to actually work.

Side notes on documentation Those last two commands (nix-channel and nix-env) were hard to figure out, which is kind of amazing because you'd think a tutorial on Nix would feature something like this prominently. But three different tutorials failed to bring me up to that basic setup, even the README.Debian didn't spell that out clearly. The tutorials all show me how to develop packages for Nix, not plainly how to install Nix software. This is presumably because "I'm doing it wrong": you shouldn't just "install a package", you should setup an environment declaratively and tell it what you want to do. But here's the thing: I didn't want to "do the right thing". I just wanted to install Harper, and documentation failed to bring me to that basic "hello world" stage. Here's what one of the tutorials suggests as a first step, for example:

curl -L https://nixos.org/nix/install   sh
nix-shell --packages cowsay lolcat
nix-collect-garbage

... which, when you follow through, leaves you with almost precisely nothing left installed (apart from Nix itself, setup with a nasty "curl pipe bash". So while that works in testing Nix, you're not much better off than when you started.

Rolling back everything Now that I have stopped using Harper, I don't need Nix anymore, which I'm sure my Nix friends will be sad to read about. Don't worry, I have notes now, and can try again! But still, I wanted to clear things out, so I did this, as root:

deluser anarcat nix-users
apt purge nix-bin
rm -rf /nix ~/.nix*

I think this cleared things out, but I'm not actually sure.

Side note on Nix drama This blurb wouldn't be complete without a mention that the Nix community has been somewhat tainted by the behavior of its founder. I won't bother you too much with this; LWN covered it well in 2024, and made a followup article about spinoffs and forks that's worth reading as well. I did want to say that everyone I have been in contact with in the Nix community was absolutely fantastic. So I am really sad that the behavior of a single individual can pollute a community in such a way. As a leader, if you have all but one responsability, it's to behave properly for people around you. It's actually really, really hard to do that, because yes, it means you need to act differently than others and no, you just don't get to be upset at others like you would normally do with friends, because you're in a position of authority. It's a lesson I'm still learning myself, to be fair. But at least I don't work with arms manufacturers or, if I would, I would be sure as hell to accept the nick (or nix?) on the chin when people would get upset, and try to make amends. So long live the Nix people! I hope the community recovers from that dark moment, so far it seems like it will. And thanks for helping me test Harper!

Review: The Last Hour Between Worlds, by Melissa Caruso

Series:	The Echo Archives #1
Publisher:	Orbit
Copyright:	November 2024
ISBN:	0-316-30364-X
Format:	Kindle
Pages:	388

The Last Hour Between Worlds is urban, somewhat political high fantasy with strong fae vibes. It is the first book of a series, but it stands alone quite well. Kembral Thorne is a Hound, a member of the guild that serves as guards, investigators, and protectors. Kembral's specialty is Echo retrieval: rescues of people and animals who have fallen through a weak spot in reality into one of the strange, dangerous, and malleable layers called Echoes. Kem once rescued a dog from six layers down, an almost unheard-of feat. Kem is also a new single mother, which means her past two months have been spent in a sleep-deprived haze revolving exclusively around her much-beloved infant. Dona Marjorie Swift's year-turning party is the first time she's been out without Emmi since she gave birth, and she's only there because her sister took the child and practically shoved her out the door. Now, she's desperately trying to remember how to be social and normal, which is not made easier by the unexpected presence of Rika at the party. Rika Nonesuch is not a Hound. She's a Cat, a member of the guild of thieves and occasional assassins. They are the nemesis of the Hounds, but in a stylized and formalized way in which certain courtesies are expected. (The politics of this don't really make sense; you just have to go with it.) Kem has complicated feelings about Rika's grace, banter, and intoxicating perfume, feelings that she thought might be reciprocated until Rika drugged her during an apparent date and left her buried under a pile of garbage. She was not expecting Rika to be at this party and is definitely not ready to have a conversation with her. This emotional turmoil is rudely interrupted by the death of nearly everyone at the party via an Echo poison, the appearance of a dark figure driving a black sword into someone, and the descent of the entire party into an Echo. This was one of those books that kept getting better the farther into the book I read. I was a bit leery at first because the publisher's blurb made it sound more like horror than I prefer, but this is more the disturbing strangeness of fae creatures than the sort of gruesomeness, disgust, or body horror that I find off-putting. Most importantly, the point of this book is not to torture the characters or scare the reader. It's instead structured a bit like a murder mystery, but one whose resolution requires working out obscure fantasy rules and hidden political agendas. One of the currencies in the world of Echos is blood, but another is emotion, revelation, and the stories that bring both, and Caruso focuses the story more on that aspect than on horrifying imagery.

Rika frowned. "Resolve it? How?" "I have no idea." I couldn't keep my frustration from leaking through. "Might be that we have to delve deep into our own hearts to confront the unhealed wounds we've carried with us in secret. Might be that we have to say their names backward, or just close our eyes and they'll go away. Echoes never make any damned sense." Rika made a face. "We'd better not have to confront our unhealed wounds, or I'm leaving you to die."

All of The Last Hour Between Worlds is told in the first person from Kem's perspective, but Rika is the best character in this book. Kem is a rather straightforward, dogged, stubborn protector; Rika is complicated, selfish, conflicted, and considerably more dynamic. The first obvious twist in her background I spotted so long before Kem found out that it was a bit frustrating, but there were multiple satisfying twists after that. As advertised in the blurb, there's a sapphic romance angle here, but it's the sort that comes from a complicated friendship and a lot of mutual respect rather than love at first sight. Some of their relationship conflict is driven by misunderstanding, but the misunderstanding happens before the novel begins, which means the reader doesn't have to sit through the bit where one yells at the characters for being stupid. It helps that the characters have something concrete to do, and that driving plot problem is multi-layered and satisfying. Each time the party falls through a layer of reality, it's mostly reset to the start of the book, but the word "mostly" is hiding a lot of subtlety. Given the clock at the start of each chapter and the blurb (if one read it), the reader can make a good guess that the plot problem will not be fully resolved until the characters fall quite deep into the Echoes, but the story never felt repetitive the way that some time loop stories can. As the characters gain more understanding, the problems change, the players change, and they have to make several excursions into the surrounding world. This is the sort of fantasy that feels a bit like science fiction. You're thrown into a world with a different culture and different rules that are foreign to the reader and natural to the characters. Part of the fun of reading is figuring out the rules, history, and backstory while watching the characters try to solve the puzzles they're faced with. The writing is good but not great. Characterization was good enough for a story primarily focused on action and puzzle-solving, but it was a bit lacking in subtlety. I think Caruso's strengths showed most in the world design, particularly the magic system and the rules followed by the Echo creatures. The excursions outside of the somewhat-protected house struck a balance between eeriness and comprehensibility that reminded me of T. Kingfisher or Sandman. The human politics were unfortunately less successful and rested on some tired centrist cliches. Thankfully, this was not the main point of the story. I should also warn that there is a lot of talk about babies. Kem's entire identity at the start of the novel, to the point of incessant monologue, is "new mother." This is not a perspective we get very often in fantasy, and Kem eventually finds a steadier balance between her bond with her daughter and the other parts of her life. I think some readers will feel very seen. But Caruso leans hard into maternal bonding. So hard. If you don't want to read about someone who is deliriously obsessed with their new child, you may want to skip this one. Right after I finished this book, I thought it was amazing. Now that I've had a few days to think about it, the lack of subtlety and the facile human politics brought it down a notch. I'm a science fiction reader at heart, so I loved the slow revelation of mechanics; the reader starts the story by knowing that Kem can "blink step" but not knowing what that means, and by the end of the story one not only knows but has opinions about its limitations, political implications, and interactions with other forms of magic. The Echo worlds are treated similarly, and this type of world-building is my jam. But the cost is that the human characters, particularly the supporting cast, don't get the same focus and therefore are a bit straightforward and obvious. The subplot with Dona Vandelle was particularly annoying. Ah well. Kem and Rika's relationship did work, and it's the center of the book. If you like fantasy mechanics but are a bit leery of fae stories because they feel too symbolic or arbitrary, give this a try. It's the most satisfyingly constructed fae story that I've read in a long time. It's not great literary fiction, but it's also not trying to be; it's a puzzle adventure, and a well-executed one. Recommended, and I will definitely be reading the sequel. Content notes: Lots of violent death and other physical damage, creepy dream worlds with implied but not explicit horror, and rather a lot of blood. Followed by The Last Soul Among Wolves, not yet published at the time I wrote this review. Rating: 8 out of 10

A few weeks ago Tiziano Bacocco started a small project to implement a (golang) proxy that allows to store proxmox backups on S3 compatible storage: pmoxs3backuproxy, a feature which the current backup server does not have. I wanted to have a look at the Proxmox Backup Server implementation for a while, so i jumped on the wagon and helped with adding most of the API endpoints required to seamlessly use it as drop-in replacement in PVE. The current version can be configured as storage backend in PVE. You can then schedule your backups to the S3 storage likewise. It now supports both the Fixed index format required to create virtual machine backups and the Dynamic index format, used by the regular proxmox-backup-client for file and container backups. (full and incremental) The other endpoints like adding notes, removing or protecting backups, mounting images using the PVE frontend (or proxmox-backup-client) work too. It comes with a garbage collector that does prune the backup storage if snapshots expire and runs integrity checks on the data. You can also configure it as so called remote storage in the Proxmox Backup server itself and pull back complete buckets using proxmox-backup-manager pull , if your local datastore crashes. I think it will become more interesting if future proxmox versions will allow to push backups to other stores, too.

Note: I've been awfully silent here for the past ... (checks notes) oh dear, 3 months! But that's not because I've been idle, quite the contrary, I've been very busy but just didn't have time to write about anything. So I've taken it upon myself to write something about my work this week, and published this post on the Tor blog which I copy here for a broader audience. Let me know if you like this or not.

Tor has finally completed a long migration from legacy Git infrastructure (Gitolite and GitWeb) to our self-hosted GitLab server. Git repository addresses have therefore changed. Many of you probably have made the switch already, but if not, you will need to change:

https://git.torproject.org/

to:

https://gitlab.torproject.org/

In your Git configuration. The GitWeb front page is now an archived listing of all the repositories before the migration. Inactive git repositories were archived in GitLab legacy/gitolite namespace and the gitweb.torproject.org and git.torproject.org web sites now redirect to GitLab. Best effort was made to reproduce the original gitolite repositories faithfully and also avoid duplicating too much data in the migration. But it's possible that some data present in Gitolite has not migrated to GitLab. User repositories are particularly at risk, because they were massively migrated, and they were "re-forked" from their upstreams, to avoid wasting disk space. If a user had a project with a matching name it was assumed to have the right data, which might be inaccurate. The two virtual machines responsible for the legacy service (cupani for git-rw.torproject.org and vineale for git.torproject.org and gitweb.torproject.org) have been shutdown. Their disks will remain for 3 months (until the end of July 2024) and their backups for another year after that (until the end of July 2025), after which point all the data from those hosts will be destroyed, with only the GitLab archives remaining. The rest of this article expands on how this was done and what kind of problems we faced during the migration.

Where is the code? Normally, nothing should be lost. All repositories in gitolite have been either explicitly migrated by their owners, forcibly migrated by the sysadmin team (TPA), or explicitly destroyed at their owner's request. An exhaustive rewrite map translates gitolite projects to GitLab projects. Some of those projects actually redirect to their parent in cases of empty repositories that were obvious forks. Destroyed repositories redirect to the GitLab front page. Because the migration happened progressively, it's technically possible that commits pushed to gitolite were lost after the migration. We took great care to avoid that scenario. First, we adopted a proposal (TPA-RFC-36) in June 2023 to announce the transition. Then, in March 2024, we locked down all repositories from any further changes. Around that time, only a handful of repositories had changes made after the adoption date, and we examined each repository carefully to make sure nothing was lost. Still, we built a diff of all the changes in the git references that archivists can peruse to check for data loss. It's large (6MiB+) because a lot of repositories were migrated before the mass migration and then kept evolving in GitLab. Many other repositories were rebuilt in GitLab from parent to rebuild a fork relationship which added extra references to those clones. A note to amateur archivists out there, it's probably too late for one last crawl now. The Git repositories now all redirect to GitLab and are effectively unavailable in their original form. That said, the GitWeb site was crawled into the Internet Archive in February 2024, so at least some copy of it is available in the Wayback Machine. At that point, however, many developers had already migrated their projects to GitLab, so the copies there were already possibly out of date compared with the repositories in GitLab. Software Heritage also has a copy of all repositories hosted on Gitolite since June 2023 and have continuously kept mirroring the repositories, where they will be kept hopefully in eternity. There's an issue where the main website can't find the repositories when you search for `gitweb.torproject.org`, instead search for `git.torproject.org`. In any case, if you believe data is missing, please do let us know by opening an issue with TPA.

Why? This is an old project in the making. The first discussion about migrating from gitolite to GitLab started in 2020 (almost 4 years ago). But going further back, the first GitLab experiment was in 2016, almost a decade ago. The current GitLab server dates from 2019, replacing Trac for issue tracking in 2020. It was originally supposed to host only mirrors for merge requests and issue trackers but, naturally, one thing led to another and eventually, GitLab had grown a container registry, continuous integration (CI) runners, GitLab Pages, and, of course, hosted most Git repositories. There were hesitations at moving to GitLab for code hosting. We had discussions about the increased attack surface and ways to mitigate that, but, ultimately, it seems the issues were not that serious and the community embraced GitLab. TPA actually migrated its most critical repositories out of shared hosting entirely, into specific servers (e.g. the Puppet Git repository is just on the Puppet server now), leveraging Git's decentralized nature and removing an entire attack surface from our infrastructure. Some of those repositories are mirrored back into GitLab, but the authoritative copy is not on GitLab. In any case, the proposal to migrate from Gitolite to GitLab was effectively just formalizing a fait accompli.

How to migrate from Gitolite / cgit to GitLab The progressive migration was a challenge. If you intend to migrate between hosting platforms, we strongly recommend to make a "flag day" during which you migrate all repositories at once. This ensures a smoother transition and avoids elaborate rewrite rules. When Gitolite access was shutdown, we had repositories on both GitLab and Gitolite, without a clear relationship between the two. A priori, the plan then was to import all the remaining Gitolite repositories into the legacy/gitolite namespace, but that seemed wasteful, particularly for large repositories like Tor Browser which uses nearly a gigabyte of disk space. So we took special care to avoid duplicating repositories. When the mass migration started, only 71 of the 538 Gitolite repositories were Migrated to GitLab in the gitolite.conf file. So, given that we had hundreds of repositories to migrate:, we developed some automation to "save time". We already automate similar ad-hoc tasks with Fabric, so we used that framework here as well. (Our normal configuration management tool is Puppet, which is a poor fit here.) So a relatively large amount of Python code was produced to basically do the following:

check if all on-disk repositories are listed in gitolite.conf (and vice versa) and either add missing repositories or delete them from disk if garbage
for each repository in gitolite.conf, if its category is marked Migrated to GitLab, skip, otherwise;
find a matching GitLab project by name, prompt the user for multiple matches
if a match is found, redirect if the repository is non-empty
- we have GitLab projects that look like the real thing, but are only present to host migrated Trac issues
- in such cases we cloned the Gitolite project locally and pushed to the existing repository instead
otherwise, a new repository is created in the legacy/gitolite namespace, using the "import" mechanism in GitLab to automatically import the repository from Gitolite, creating redirections and updating gitolite.conf to document the change

User repositories (those under the user/ directory in Gitolite) were handled specially. First, the existing redirection map was checked to see if a similarly named project was migrated (so that, e.g. user/dgoulet/tor is properly treated as a fork of tpo/core/tor). Then the parent project was forked in GitLab and the Gitolite project force-pushed to the fork. This allows us to show the fork relationship in GitLab and, more importantly, benefit from the "pool" feature in GitLab which deduplicates disk usage between forks. Sometimes, we found no such relationships. Then we simply imported multiple repositories with similar names in the legacy/gitolite namespace, sometimes creating forks between user repositories, on a first-come-first-served basis from the gitolite.conf order. The code used in this migration is now available publicly. We encourage other groups planning to migrate from Gitolite/GitWeb to GitLab to use (and contribute to) our fabric-tasks repository, even though it does have its fair share of hard-coded assertions. The main entry point is the gitolite.mass-repos-migration task. A typical migration job looked like:

anarcat@angela:fabric-tasks$ fab -H cupani.torproject.org gitolite.mass-repos-migration 
[...]
INFO: skipping project project/help/infra in category Migrated to GitLab
INFO: skipping project project/help/wiki in category Migrated to GitLab
INFO: skipping project project/jenkins/jobs in category Migrated to GitLab
INFO: skipping project project/jenkins/tools in category Migrated to GitLab
INFO: searching for projects matching fastlane
INFO: Successfully connected to https://gitlab.torproject.org
import gitolite project project/tor-browser/fastlane into gitlab legacy/gitolite/project/tor-browser/fastlane with desc 'Tor Browser app store and deployment configuration for Fastlane'? [Y/n] 
INFO: importing gitolite project project/tor-browser/fastlane into gitlab legacy/gitolite/project/tor-browser/fastlane with desc 'Tor Browser app store and deployment configuration for Fastlane'
INFO: building a new connect to cupani
INFO: defaulting name to fastlane
INFO: importing project into GitLab
INFO: Successfully connected to https://gitlab.torproject.org
INFO: loading group legacy/gitolite/project/tor-browser
INFO: archiving project
INFO: creating repository fastlane (fastlane) in namespace legacy/gitolite/project/tor-browser from https://git.torproject.org/project/tor-browser/fastlane into https://gitlab.torproject.org/legacy/gitolite/project/tor-browser/fastlane
INFO: migrating Gitolite repository project/tor-browser/fastlane to GitLab project legacy/gitolite/project/tor-browser/fastlane
INFO: uploading 399 bytes to /srv/git.torproject.org/repositories/project/tor-browser/fastlane.git/hooks/pre-receive
INFO: making /srv/git.torproject.org/repositories/project/tor-browser/fastlane.git/hooks/pre-receive executable
INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project project/tor-browser/fastlane to category Migrated to GitLab
INFO: skipping project project/bridges/bridgedb-admin in category Migrated to GitLab
[...]

In the above, you can see migrated repositories skipped then the fastlane project being archived into GitLab. Another example with a later version of the script, processing only user repositories and showing the interactive prompt and a force-push into a fork:

$ fab -H cupani.torproject.org  gitolite.mass-repos-migration --include 'user/.*' --exclude '.*tor-?browser.*'
INFO: skipping project user/aagbsn/bridgedb in category Migrated to GitLab
[...]
INFO: skipping project user/phw/atlas in category Migrated to GitLab
INFO: processing project user/phw/obfsproxy (Philipp's obfsproxy repository) in category Users' development repositories (Attic)
INFO: Successfully connected to https://gitlab.torproject.org
INFO: user repository detected, trying to find fork phw/obfsproxy
WARNING: no existing fork found, entering user fork subroutine
INFO: found 6 GitLab projects matching 'obfsproxy' (https://gitweb.torproject.org/user/phw/obfsproxy.git)
0 legacy/gitolite/debian/obfsproxy
1 legacy/gitolite/debian/obfsproxy-legacy
2 legacy/gitolite/user/asn/obfsproxy
3 legacy/gitolite/user/ioerror/obfsproxy
4 tpo/anti-censorship/pluggable-transports/obfsproxy
5 tpo/anti-censorship/pluggable-transports/obfsproxy-legacy
select parent to fork from, or enter to abort: ^G4
INFO: repository is not empty: in-pack: 2104, packs: 1, size-pack: 414
fork project tpo/anti-censorship/pluggable-transports/obfsproxy into legacy/gitolite/user/phw/obfsproxy^G [Y/n] 
INFO: loading project tpo/anti-censorship/pluggable-transports/obfsproxy
INFO: forking project user/phw/obfsproxy into namespace legacy/gitolite/user/phw
INFO: waiting for fork to complete...
INFO: fork status: started, sleeping...
INFO: fork finished
INFO: cloning and force pushing from user/phw/obfsproxy to legacy/gitolite/user/phw/obfsproxy
INFO: deleting branch protection: <class 'gitlab.v4.objects.branches.ProjectProtectedBranch'> =>  'id': 2723, 'name': 'master', 'push_access_levels': [ 'id': 2864, 'access_level': 40, 'access_level_description': 'Maintainers', 'deploy_key_id': None ], 'merge_access_levels': [ 'id': 2753, 'access_level': 40, 'access_level_description': 'Maintainers' ], 'allow_force_push': False 
INFO: cloning repository git-rw.torproject.org:/srv/git.torproject.org/repositories/user/phw/obfsproxy.git in /tmp/tmp6orvjggy/user/phw/obfsproxy
Cloning into bare repository '/tmp/tmp6orvjggy/user/phw/obfsproxy'...
INFO: pushing to GitLab: https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
remote: 
remote: To create a merge request for bug_10887, visit:        
remote:   https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy/-/merge_requests/new?merge_request%5Bsource_branch%5D=bug_10887        
remote: 
[...]
To ssh://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
 + 2bf9d09...a8e54d5 master -> master (forced update)
 * [new branch]      bug_10887 -> bug_10887
[...]
INFO: migrating repo
INFO: migrating Gitolite repository https://gitweb.torproject.org/user/phw/obfsproxy.git to GitLab project https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project user/phw/obfsproxy to category Migrated to GitLab
INFO: processing project user/phw/scramblesuit (Philipp's ScrambleSuit repository) in category Users' development repositories (Attic)
INFO: user repository detected, trying to find fork phw/scramblesuit
WARNING: no existing fork found, entering user fork subroutine
WARNING: no matching gitlab project found for user/phw/scramblesuit
INFO: user fork subroutine failed, resuming normal procedure
INFO: searching for projects matching scramblesuit
import gitolite project user/phw/scramblesuit into gitlab legacy/gitolite/user/phw/scramblesuit with desc 'Philipp's ScrambleSuit repository'?^G [Y/n] 
INFO: checking if remote repo https://git.torproject.org/user/phw/scramblesuit exists
INFO: importing gitolite project user/phw/scramblesuit into gitlab legacy/gitolite/user/phw/scramblesuit with desc 'Philipp's ScrambleSuit repository'
INFO: importing project into GitLab
INFO: Successfully connected to https://gitlab.torproject.org
INFO: loading group legacy/gitolite/user/phw
INFO: creating repository scramblesuit (scramblesuit) in namespace legacy/gitolite/user/phw from https://git.torproject.org/user/phw/scramblesuit into https://gitlab.torproject.org/legacy/gitolite/user/phw/scramblesuit
INFO: archiving project
INFO: migrating Gitolite repository https://gitweb.torproject.org/user/phw/scramblesuit.git to GitLab project https://gitlab.torproject.org/legacy/gitolite/user/phw/scramblesuit
INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project user/phw/scramblesuit to category Migrated to GitLab
[...]

Acute eyes will notice the bell used as a notification mechanism as well in this transcript. A lot of the code is now useless for us, but some, like "commit and push" or is-repo-empty live on in the git module and, of course, the gitlab module has grown some legs along the way. We've also found fun bugs, like a file descriptor exhaustion in bash, among other oddities. The retirement milestone and issue 41215 has a detailed log of the migration, for those curious. This was a challenging project, but it feels nice to have this behind us. This gets rid of 2 of the 4 remaining machines running Debian "old-old-stable", which moves a bit further ahead in our late bullseye upgrades milestone. Full transparency: we tested GPT-3.5, GPT-4, and other large language models to see if they could answer the question "write a set of rewrite rules to redirect GitWeb to GitLab". This has become a standard LLM test for your faithful writer to figure out how good a LLM is at technical responses. None of them gave an accurate, complete, and functional response, for the record. The actual rewrite rules as of this writing follow, for humans that actually like working answers provided by expert humans instead of artificial intelligence which currently seem to be, glorified, mansplaining interns.

git.torproject.org rewrite rules Those rules are relatively simple in that they rewrite a single URL to its equivalent GitLab counterpart in a 1:1 fashion. It relies on the rewrite map mentioned above, of course.

RewriteEngine on
# this RewriteMap connects the gitweb projects to their GitLab
# equivalent
RewriteMap gitolite2gitlab "txt:/etc/apache2/gitolite2gitlab.txt"
# if this becomes a performance bottleneck, convert to a DBM map with:
#
#  $ httxt2dbm -i mapfile.txt -o mapfile.map
#
# and:
#
# RewriteMap mapname "dbm:/etc/apache/mapfile.map"
#
# according to reports lavamind found online, we hit such a
# performance bottleneck only around millions of entries, which is not our case
# those two rules can go away once all the projects are
# migrated to GitLab
#
# this matches the request URI so we can check the RewriteMap
# for a match next
#
# WARNING: this won't match URLs without .git in them, which
# *do* work now. one possibility would be to match the request
# URI (without query string!) with:
#
# /git/(.*)(.git)?/(((branches hooks info objects/).*) git-.* upload-pack receive-pack HEAD config description)?.
#
# I haven't been able to figure out the actual structure of
# those URLs, so it's really hard to figure out the boundaries
# of the project name here. I stopped after pouring around the
# http-backend.c code in git
# itself. https://www.git-scm.com/docs/http-protocol is also
# kind of incomplete and unsatisfying.
RewriteCond % REQUEST_URI  ^/(git/)?(.*).git/.*$
# this makes the RewriteRule match only if there's a match in
# the rewrite map
RewriteCond $ gitolite2gitlab:%2 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(git/)?(.*).git/(.*)$ https://gitlab.torproject.org/$ gitolite2gitlab:$2 .git/$3 [R=302,L]
# Fallback everything else to GitLab
RewriteRule (.*) https://gitlab.torproject.org [R=302,L]

gitweb.torproject.org rewrite rules Those are the vastly more complicated GitWeb to GitLab rewrite rules. Note that we say "GitWeb" but we were actually not running GitWeb but cgit, as the former didn't actually scale for us.

RewriteEngine on
# this RewriteMap connects the gitweb projects to their GitLab
# equivalent
RewriteMap gitolite2gitlab "txt:/etc/apache2/gitolite2gitlab.txt"
# special rule to process targets of the old spec.tpo site and
# bring them to the right redirect on the new spec.tpo site. that should turn, for example:
#
# https://gitweb.torproject.org/torspec.git/tree/address-spec.txt
#
# into:
#
# https://spec.torproject.org/address-spec
RewriteRule ^/torspec.git/tree/(.*).txt$ https://spec.torproject.org/$1 [R=302]
# list of endpoints taken from cgit's cmd.c
# those two RewriteCond are necessary because we don't move
# all repositories at once. once the migration is completed,
# they can be removed.
#
# and yes, they are copied all over the place below
#
# create a match for the project name to check if the project
# has been moved to GitLab
RewriteCond % REQUEST_URI  ^/(.*).git(/.*)?$
# this makes the RewriteRule match only if there's a match in
# the rewrite map
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
# main project page, like summary below
RewriteRule ^/(.*).git/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 / [R=302,L]
# summary
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/summary/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 / [R=302,L]
# about
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/about/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 / [R=302,L]
# commit
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond "% QUERY_STRING " "(.*(?:^ &))id=([^&]*)(&.*)?$"
RewriteRule ^/(.*).git/commit/? https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commit/%2 [R=302,L,QSD]
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/commit/? https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commits/HEAD [R=302,L]
# diff, incomplete because can diff arbitrary refs and files in cgit but not in GitLab, hard to parse
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond % QUERY_STRING  id=([^&]*)
RewriteRule ^/(.*).git/diff/? https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commit/%1 [R=302,L,QSD]
# patch
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond % QUERY_STRING  id=([^&]*)
RewriteRule ^/(.*).git/patch/? https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commit/%1.patch [R=302,L,QSD]
# rawdiff, incomplete because can show only one file diff, which GitLab cannot
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond % QUERY_STRING  id=([^&]*)
RewriteRule ^/(.*).git/rawdiff/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commit/%1.diff [R=302,L,QSD]
# log
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond % QUERY_STRING  h=([^&]*)
RewriteRule ^/(.*).git/log/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commits/%1 [R=302,L,QSD]
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/log/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commits/HEAD [R=302,L]
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/log(/?.*)$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commits/HEAD$2 [R=302,L]
# atom
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond % QUERY_STRING  h=([^&]*)
RewriteRule ^/(.*).git/atom/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commits/%1 [R=302,L,QSD]
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/atom/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commits/HEAD [R=302,L,QSD]
# refs, incomplete because two pages in GitLab, defaulting to "tags"
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/refs/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/tags [R=302,L]
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond % QUERY_STRING  h=([^&]*)
RewriteRule ^/(.*).git/tag/? https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/tags/%1 [R=302,L,QSD]
# tree
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond % QUERY_STRING  id=([^&]*)
RewriteRule ^/(.*).git/tree(/?.*)$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/tree/%1$2 [R=302,L,QSD]
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/tree(/?.*)$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/tree/HEAD$2 [R=302,L]
# /-/tree has no good default in GitLab, revert to HEAD which is a good
# approximation (we can't assume "master" here anymore)
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/tree/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/tree/HEAD [R=302,L]
# plain
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond % QUERY_STRING  h=([^&]*)
RewriteRule ^/(.*).git/plain(/?.*)$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/raw/%1$2 [R=302,L,QSD]
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/plain(/?.*)$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/raw/HEAD$2 [R=302,L]
# blame: disabled
#RewriteCond % REQUEST_URI  ^/(.*).git/.*$
#RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
#RewriteCond % QUERY_STRING  h=([^&]*)
#RewriteRule ^/(.*).git/blame(/?.*)$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/blame/%1$2 [R=302,L,QSD]
# same default as tree above
#RewriteCond % REQUEST_URI  ^/(.*).git/.*$
#RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
#RewriteRule ^/(.*).git/blame(/?.*)$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/blame/HEAD/$2 [R=302,L]
# stats
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/stats/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/graphs/HEAD [R=302,L]
# still TODO:
# repolist: once migration is complete
#
# cannot be done:
# atom: needs a feed token, user must be logged in
# blob: no direct equivalent
# info: not working on main cgit website?
# ls_cache: not working, irrelevant?
# objects: undocumented?
# snapshot: pattern too hard to match on cgit's side
# special case, we keep a copy of the main index on the archive
RewriteRule ^/?$ https://archive.torproject.org/websites/gitweb.torproject.org.html [R=302,L]
# Fallback: everything else to GitLab
RewriteRule .* https://gitlab.torproject.org [R=302,L]

The reference copy of those is available in our (currently private) Puppet git repository.

It is, sadly, not entirely surprising that Facebook is censoring articles critical of Meta. The Kansas Reflector published an artical about Meta censoring environmental articles about climate change deeming them too controversial . Facebook then censored the article about Facebook censorship, and then after an independent site published a copy of the climate change article, Facebook censored it too. The CNN story says Facebook apologized and said it was a mistake and was fixing it. Color me skeptical, because today I saw this:

Yes, that s right: today, April 6, I get a notification that they removed a post from August 12. The notification was dated April 4, but only showed up for me today. I wonder why my post from August 12 was fine for nearly 8 months, and then all of a sudden, when the same website runs an article critical of Facebook, my 8-month-old post is a problem. Hmm.

Riiiiiight. Cybersecurity. This isn t even the first time they ve done this to me. On September 11, 2021, they removed my post about the social network Mastodon (click that link for screenshot). A post that, incidentally, had been made 10 months prior to being removed. While they ultimately reversed themselves, I subsequently wrote Facebook s Blocking Decisions Are Deliberate Including Their Censorship of Mastodon. That this same pattern has played out a second time again with something that is a very slight challenege to Facebook seems to validate my conclusion. Facebook lets all sort of hateful garbage infest their site, but anything about climate change or their own censorship gets removed, and this pattern persists for years. There s a reason I prefer Mastodon these days. You can find me there as @jgoerzen@floss.social. So. I ve written this blog post. And then I m going to post it to Facebook. Let s see if they try to censor me for a third time. Bring it, Facebook.

The status quo Back in 2015, I bought an off-the-shelf NAS, a QNAP TS-453mini, to act as my file store and Plex server. I had previously owned a Synology box, and whilst I liked the Synology OS and experience, the hardware was underwhelming. I loaded up the successor QNAP with four 5TB drives in RAID10, and moved all my files over (after some initial DoA drive issues were handled).
QNAP TS-453mini product photo
That thing has been in service for about 8 years now, and it s been a mixed bag. It was definitely more powerful than the predecessor system, but it was clear that QNAP s OS was not up to the same standard as Synology s perhaps best exemplified by HappyGet 2 , the QNAP webapp for downloading videos from streaming services like YouTube, whose icon is a straight rip-off of StarCraft 2. On its own, meaningless but a bad omen for overall software quality
The logo for QNAP HappyGet 2 and Blizzard s StarCraft 2 side by side
Additionally, the embedded Celeron processor in the NAS turned out to be an issue for some cases. It turns out, when playing back videos with subtitles, most Plex clients do not support subtitles properly instead they rely on the Plex server doing JIT transcoding to bake the subtitles directly into the video stream. I discovered this with some Blu-Ray rips of Game of Thrones some episodes would play back fine on my smart TV, but episodes with subtitled Dothraki speech would play at only 2 or 3 frames per second. The final straw was a ransomware attack, which went through all my data and locked every file below a 60MiB threshold. Practically all my music gone. A substantial collection of downloaded files, all gone. Some of these files had been carried around since my college days digital rarities, or at least digital detritus I felt a real sense of loss at having to replace. This episode was caused by a ransomware targeting specific vulnerabilities in the QNAP OS, not an error on my part. So, I decided to start planning a replacement with:

A non-garbage OS, whilst still being a NAS-appliance type offering (not an off-the-shelf Linux server distro)

Full remote management capabilities

A small form factor comparable to off-the-shelf NAS

A powerful modern CPU capable of transcoding high resolution video

All flash storage, no spinning rust

At the time, no consumer NAS offered everything (The Asustor FS6712X exists now, but didn t when this project started), so I opted to go for a full DIY rather than an appliance not the first time I ve jumped between appliances and DIY for home storage.

Selecting the core of the system There aren t many companies which will sell you a small motherboard with IPMI. Supermicro is a bust, so is Tyan. But ASRock Rack, the server division of third-tier motherboard vendor ASRock, delivers. Most of their boards aren t actually compliant Mini-ITX size, they re a proprietary Deep Mini-ITX with the regular screw holes, but 40mm of extra length (and a commensurately small list of compatible cases). But, thankfully, they do have a tiny selection of boards without the extra size, and I stumbled onto the X570D4I-2T, a board with an AMD AM4 socket and the mature X570 chipset. This board can use any AMD Ryzen chip (before the latest-gen Ryzen 7000 series); has built in dual 10 gigabit ethernet; IPMI; four (laptop-sized) RAM slots with full ECC support; one M.2 slot for NVMe SSD storage; a PCIe 16x slot (generally for graphics cards, but we live in a world of possibilities); and up to 8 SATA drives OR a couple more NVMe SSDs. It s astonishingly well featured, just a shame it costs about $450 compared to a good consumer-grade Mini ITX AM4 board costing less than half that. I was so impressed with the offering, in fact, that I crowed about it on Mastodon and ended up securing ASRock another sale, with someone else looking into a very similar project to mine around the same timespan. The next question was the CPU. An important feature of a system expected to run 24/7 is low power, and AM4 chips can consume as much as 130W under load, out of the box. At the other end, some models can require as little as 35W under load the OEM-only GE suffix chips, which are readily found for import on eBay. In their PRO variant, they also support ECC (all non-G Ryzen chips support ECC, but only Pro G chips do). The top of the range 8 core Ryzen 7 PRO 5750GE is prohibitively expensive, but the slightly weaker 6 core Ryzen 5 PRO 5650GE was affordable, and one arrived quickly from Hong Kong. Supplemented with a couple of cheap 16 GiB SODIMM sticks of DDR4 PC-3200 direct from Micron for under $50 a piece, that left only cooling as an unsolved problem to get a bootable test system. The official support list for the X570D4I-2T only includes two rackmount coolers, both expensive and hard to source. The reason for such a small list is the non standard cooling layout of the board instead of an AM4 hole pattern with the standard plastic AM4 retaining clips, it has an Intel 115x hole pattern with a non-standard backplate (Intel 115x boards have no backplate, the stock Intel 115x cooler attaches to the holes with push pins). As such every single cooler compatibility list excludes this motherboard. However, the backplate is only secured with a mild glue with minimal pressure and a plastic prying tool it can be removed, giving compatibility with any 115x cooler (which is basically any CPU cooler for more than a decade). I picked an oversized low profile Thermalright AXP120-X67 hoping that its 120mm fan would cool the nearby MOSFETs and X570 chipset too.
Thermalright AXP120-X67, AMD Ryzen 5 PRO 5650GE, ASRock Rack X570D4I-2T, all assembled and running on a flat surface

Testing up to this point Using a spare ATX power supply, I had enough of a system built to explore the IPMI and UEFI instances, and run MemTest86 to validate my progress. The memory test ran without a hitch and confirmed the ECC was working, although it also showed that the memory was only running at 2933 MT/s instead of the rated 3200 MT/s (a limit imposed by the motherboard, as higher speeds are considered overclocking). The IPMI interface isn t the best I ve ever used by a long shot, but it s minimum viable and allowed me to configure the basics and boot from media entirely via a Web browser.
Memtest86 showing test progress, taken from IPMI remote control window
One sad discovery, however, which I ve never seen documented before, on PCIe bifurcation. With PCI Express, you have a number of lanes which are allocated in groups by the motherboard and CPU manufacturer. For Ryzen prior to Ryzen 7000, that s 16 lanes in one slot for the graphics card; 4 lanes in one M.2 connector for an SSD; then 4 lanes connecting the CPU to the chipset, which can offer whatever it likes for peripherals or extra lanes (bottlenecked by that shared 4x link to the CPU, if it comes down to it). It s possible, with motherboard and CPU support, to split PCIe groups up for example an 8x slot could be split into two 4x slots (eg allowing two NVMe drives in an adapter card NVME drives these days all use 4x). However with a Cezanne Ryzen with integrated graphics, the 16x graphics card slot cannot be split into four 4x slots (ie used for for NVMe drives) the most bifurcation it allows is 8x4x4x, which is useless in a NAS.
Screenshot of PCIe 16x slot bifurcation options in UEFI settings, taken from IPMI remote control window
As such, I had to abandon any ideas of an all-NVMe NAS I was considering: the 16x slot split into four 4x, combined with two 4x connectors fed by the X570 chipset, to a total of 6 NVMe drives. 7.6TB U.2 enterprise disks are remarkably affordable (cheaper than consumer SATA 8TB drives), but alas, I was locked out by my 5650GE. Thankfully I found out before spending hundreds on a U.2 hot swap bay. The NVMe setup would be nearly 10x as fast as SATA SSDs, but at least the SATA SSD route would still outperform any spinning rust choice on the market (including the fastest 10K RPM SAS drives)

Containing the core The next step was to pick a case and power supply. A lot of NAS cases require an SFX (rather than ATX) size supply, so I ordered a modular SX500 unit from Silverstone. Even if I ended up with a case requiring ATX, it s easy to turn an SFX power supply into ATX, and the worst result is you have less space taken up in your case, hardly the worst problem to have. That said, on to picking a case. There s only one brand with any cachet making ITX NAS cases, Silverstone. They have three choices in an appropriate size: CS01-HS, CS280, and DS380. The problem is, these cases are all badly designed garbage. Take the CS280 as an example, the case with the most space for a CPU cooler. Here s how close together the hotswap bay (right) and power supply (left) are:
Internal image of Silverstone CS280 NAS build. Image stolen from ServeTheHome
With actual cables connected, the cable clearance problem is even worse:
Internal image of Silverstone CS280 NAS build. Image stolen from ServeTheHome
Remember, this is the best of the three cases for internal layout, the one with the least restriction on CPU cooler height. And it s garbage! Total hot garbage! I decided therefore to completely skip the NAS case market, and instead purchase a 5.25 -to-2.5 hot swap bay adapter from Icy Dock, and put it in an ITX gamer case with a 5.25 bay. This is no longer a served market 5.25 bays are extinct since nobody uses CD/DVD drives anymore. The ones on the market are really new old stock from 2014-2017: The Fractal Design Core 500, Cooler Master Elite 130, and Silverstone SUGO 14. Of the three, the Fractal is the best rated so I opted to get that one however it seems the global supply of new old stock fully dried up in the two weeks between me making a decision and placing an order leaving only the Silverstone case. Icy Dock have a selection of 8-bay 2.5 SATA 5.25 hot swap chassis choices in their ToughArmor MB998 series. I opted for the ToughArmor MB998IP-B, to reduce cable clutter it requires only two SFF-8611-to-SF-8643 cables from the motherboard to serve all eight bays, which should make airflow less of a mess. The X570D4I-2T doesn t have any SATA ports on board, instead it has two SFF-8611 OCuLink ports, each supporting 4 PCI Express lanes OR 4 SATA connectors via a breakout cable. I had hoped to get the ToughArmor MB118VP-B and run six U.2 drives, but as I said, the PCIe bifurcation issue with Ryzen G chips meant I wouldn t be able to run all six bays successfully.
NAS build in Silverstone SUGO 14, mid build, panels removed

Silverstone SUGO 14 from the front, with hot swap bay installed

Actual storage for the storage server My concept for the system always involved a fast boot/cache drive in the motherboard s M.2 slot, non-redundant (just backups of the config if the worst were to happen) and separate storage drives somewhere between 3.8 and 8 TB each (somewhere from $200-$350). As a boot drive, I selected the Intel Optane SSD P1600X 58G, available for under $35 and rated for 228 years between failures (or 11,000 complete drive rewrite cycles). So, on to the big expensive choice: storage drives. I narrowed it down to two contenders: new-old-stock Intel D3-S4510 3.84TB enterprise drives, at about $200, or Samsung 870 QVO 8TB consumer drives, at about $375. I did spend a long time agonizing over the specification differences, the ZFS usage reports, the expected lifetime endurance figures, but in reality, it came down to price $1600 of expensive drives vs $3200 of even more expensive drives. That s 27TB of usable capacity in RAID-Z1, or 23TB in RAID-Z2. For comparison, I m using about 5TB of the old NAS, so that s a LOT of overhead for expansion.
Storage SSD loaded into hot swap sled

Booting up Bringing it all together is the OS. I wanted an appliance NAS OS rather than self-administering a Linux distribution, and after looking into the surrounding ecosystems, decided on TrueNAS Scale (the beta of the 2023 release, based on Debian 12).
TrueNAS Dashboard screenshot in browser window
I set up RAID-Z1, and with zero tuning (other than enabling auto-TRIM), got the following performance numbers:
IOPS Bandwidth
4k random writes 19.3k 75.6 MiB/s
4k random reads 36.1k 141 MiB/s
Sequential writes 2300 MiB/s
Sequential reads 3800 MiB/s
Results using fio parameters suggested by Huawei
And for comparison, the maximum theoretical numbers quoted by Intel for a single drive:
IOPS Bandwidth
4k random writes 16k ?
4k random reads 90k ?
Sequential writes 280 MiB/s
Sequential reads 560 MiB/s
Numbers quoted by Intel SSD successors Solidigm.
Finally, the numbers reported on the old NAS with four 7200 RPM hard disks in RAID 10:
IOPS Bandwidth
4k random writes 430 1.7 MiB/s
4k random reads 8006 32 MiB/s
Sequential writes 311 MiB/s
Sequential reads 566 MiB/s
Performance seems pretty OK. There s always going to be an overhead to RAID. I ll settle for the 45x improvement on random writes vs. its predecessor, and 4.5x improvement on random reads. The sequential write numbers are gonna be impacted by the size of the ZFS cache (50% of RAM, so 16 GiB), but the rest should be a reasonable indication of true performance. It took me a little while to fully understand the TrueNAS permissions model, but I finally got Plex configured to access data from the same place as my SMB shares, which have anonymous read-only access or authenticated write access for myself and my wife, working fine via both Linux and Windows. And that s it! I built a NAS. I intend to add some fans and more RAM, but that s the build. Total spent: about $3000, which sounds like an unreasonable amount, but it s actually less than a comparable Synology DiskStation DS1823xs+ which has 4 cores instead of 6, first-generation AMD Zen instead of Zen 3, 8 GiB RAM instead of 32 GiB, no hardware-accelerated video transcoding, etc. And it would have been a whole lot less fun!
The final system, powered up
(Also posted on PCPartPicker)

As we all know, this year is a very special year for the Debian project, the project turns 30!
The Brazilian Community joined in and during the anniversary week, organized some online activities through the Debian Brasil YouTube channel.
Information about talks given can be seen on the commemoration website.
Talks are also have been published individually on the Debian social Peertube and Youtube. After this week of celebration, the Debian Community in Curitiba, decided to get together for a lunch for confraternization among some local members.
The confraternization took place at The Barbers restaurant. The local menu was a traditional Feijoada (Rice, Beans with pork meat, fried banana, cabbage, orange and farofa). The meeting took place with a lot of laughs, conversations, fun, caipirinha and draft beer! We can only thank the Debian Project for providing great moments! A small photographic record of the people present! Group photo

For obscure reasons, I have found myself with a phone number registered with Signal but without any device associated with it. This is the I lost my phone section in Signal support, which rather unhelpfully tell you that, literally:

Until you have access to your phone number, there is nothing that can be done with Signal.

To be fair, I guess that sort of makes sense: Signal relies heavily on phone numbers for identity. It's how you register to the service and how you recover after losing your phone number. If you have your PIN ready, you don't even change safety numbers! But my case is different: this phone number was a test number, associated with my tablet, because you can't link multiple Android device to the same phone number. And now that I brilliantly bricked that tablet, I just need to tell people to stop trying to contact me over that thing (which wasn't really working in the first place anyway because I wasn't using the tablet that much, but I digress). So. What do you do? You could follow the above "lost my phone" guide and get a new Android or iOS phone to register on Signal again, but that's pretty dumb: I don't want another phone, I already have one. Lo and behold, signal-cli to the rescue!

Disclaimer: no warranty or liability Before following this guide, make sure you remember the license of this website, which specifically has a Section 5 Disclaimer of Warranties and Limitation of Liability. If you follow this guide literally, you might actually get into trouble. You have been warned. All Cats Are Beautiful.

Installing in Docker Because signal-cli is not packaged in Debian (but really should be), I need to bend over backwards to install it. The installation instructions suggest building from source (what is this, GentooBSD?) or installing binary files (what is this, Debiandows?), that's all so last millennium. I want something fresh and fancy, so I went with the extremely legit Docker registry ran by the not-shady-at-all gitlab.com/packaging group which is suspiciously not owned by any GitLab.com person I know of. This is surely perfectly safe.

(Insert long digression on supply chain security here and how Podman is so much superior to Docker. Feel free to dive deep into how RedHat sold out to the nazis or how this is just me ranting about something I don't understand, again. I'm not going to do all the work for you.)

Anyway. The magic command is:

mkdir .config/signal-cli
podman pull registry.gitlab.com/packaging/signal-cli/signal-cli-jre:latest
# lightly hit computer with magic supply chain verification wand
alias signal-cli="podman run --rm --publish 7583:7583 --volume .config/signal-cli:/var/lib/signal-cli --tmpfs /tmp:exec   registry.gitlab.com/packaging/signal-cli/signal-cli-jre:latest --config /var/lib/signal-cli"

At this point, you have a signal-cli alias that should more or less behave as per upstream documentation. Note that it sets up a network service on port 7583 which is unnecessary because you likely won't be using signal-cli's "daemon mode" here, this is a one-shot thing. But I'll probably be reusing those instructions later on, so I figured it might be a safe addition. Besides, it's what the instructions told me to do so I'm blindly slamming my head in the bash pipe, as trained. Also, you're going to have the signal-cli configuration persist in ~/.config/signal-cli there. Again, totally unnecessary.

Re-registering the number Back to our original plan of canceling our Signal account. The next step is, of course, to register with Signal.

Yes, this is a little counter-intuitive and you'd think there would be a "I want off this boat" button on https://signal.org that would do this for you, but hey, I guess that's only reserved for elite hackers who want to ~~screw people over~~, I mean close their accounts. Mere mortals don't get access to such beauties. Update: a friend reminded me there used to be such a page at https://signal.org/signal/unregister/ but it's mysteriously gone from the web, but still available on the wayback machine although surely that doesn't work anymore. Untested.

To register an account with signal-cli, you first need to pass a CAPTCHA. Those are the funky images generated by deep neural networks that try to fool humans into thinking other neural networks can't break them, and generally annoy the hell out of people. This will generate a URL that looks like:

signalcaptcha://signal-hcaptcha.$UUID.registration.$THIRTYTWOKILOBYTESOFGARBAGE

Yes, it's a very long URL. Yes, you need the entire thing. The URL is hidden behind the Open Signal link, you can right-click on the link to copy it or, if you want to feel like it's 1988 again, use view-source: or butterflies or something. You will also need the phone number you want to unregister here, obviously. We're going to take a not quite random phone number as an example, +18002677468.

Don't do this at home kids! Use the actual number and don't copy-paste examples from random websites!

So the actual command you need to run now is:

signal-cli -a +18002677468 register --captcha signalcaptcha://signal-hcaptcha.$UUID.registration.$THIRTYTWOKILOBYTESOFGARBAGE

To confirm the registration, Signal will send a text message (SMS) to that phone number with a verification code. (Fun fact: it's actually Twilio relaying that message for Signal and that is... not great.) If you don't have access to SMS on that number, you can try again with the --voice option, which will do the same thing with a actual phone call. I wish it would say "Ok boomer" when it calls, but it doesn't. If you don't have access to either, you're screwed. You may be able to port your phone number to another provider to gain control of the phone number again that said, but at that point it's a whole different ball game. With any luck now you've received the verification code. You use it with:

signal-cli -a +18002677468 verify 131213

If you want to make sure this worked, you can try writing to another not random number at all, it should Just Work:

signal-cli -a +18002677468 send -mtest +18005778477

This is almost without any warning on the other end too, which says something amazing about Signal's usability and something horrible about its security.

Unregistering the number Now we get to the final conclusion, the climax. Can you feel it? I'll try to refrain from further rants, I promise. It's pretty simple and fast, just call:

signal-cli -a +18002677468 unregister

That's it! Your peers will now see an "Invite to Signal" button instead of a text field to send a text message.

Cleanup Optionally, cleanup the mess you left on this computer:

rm -r ~/.config/signal-cli
podman image rm registry.gitlab.com/packaging/signal-cli/signal-cli-jre

I wrote the Ruby bindings for the Enquo Project, my attempt to bring queryable encryption to all databases, using the Rutie library. Recently, I ve rewritten the bindings to use Magnus instead, and I thought I d put down my thoughts about the whole situation.

The Story So Far The Enquo Project core cryptography is all written in Rust, as seems to be the vogue these days. Rust is fast, safe, and easily interoperable with most of the rest of the modern software development ecosystem, making it a good choice as a language to implement the cryptographic primitives that Enquo needs, like Order-Revealing Encryption. Of course, since not everyone writes their applications in Rust, we need to provide the functionality of the Enquo client in the languages that people do use, such as Ruby, Python, and so on. Since re-writing all that cryptographic code in a myriad of languages would be tedious and error-prone, we instead provide bindings to the core Rust code. These are just thin shims of code that translate the data types and function calls between Rust and the target language.
Wrong sort of shim, but canned language bindings would be handy
As I m most familiar with Ruby and its development ecosystem (particularly Ruby on Rails), it was natural that I d make Ruby bindings for Enquo as my first target. Rummaging around, it seemed that Rutie was a good library to use, so off I went.

What are Rutie and Magnus, Anyway? Both libraries share the same goal: provide the ability to write some Rust code, run that through a compiler, and produce something that can be loaded by the Ruby interpreter and used just like any other Ruby class. They re both fairly high level interfaces, trying to abstract away much of the gory details, and do a lot of the common heavy lifting that can make writing bindings fiddly and annoying. Things like mapping data types (like strings and integers) between Rust data types and the closest equivalents in Ruby. This mapping never goes perfectly smoothly. For example, Ruby integers don t have a fixed range of values they can represent you can store a huge number like 2²⁵⁶ more-or-less as easily as you can the number 12. But Rust, being a lower-level language, only has a set of integer types that have fixed boundaries, like the `u32` type, which can only store integers between zero and about four billion (2³² - 1, to be precise). There s also lots of little things that need to be just right, also, like translating the different memory management approaches of the languages, and dealing with a myriad of fiddly little issues like passing arguments and return values in and out of method calls, helpers for defining classes and methods (and pointing to the correct Rust functions), and so on.
This is what I imagine it looks like inside these libraries
(Herv Cozanet / Wikimedia Commons, CC-BY-SA)
All in all, these libraries are fairly significant pieces of work, and I m mighty glad that someone else has taken on the job of building (and maintaining!) them.

So Why the Change? Good question. It s important to say at the outset that there s nothing particularly wrong with Rutie. I found using Rutie to be very straightforward, and the Ruby bindings came together very quickly and easily. If someone chose to use Rutie for their project, I m sure they d have a good experience. What made me take the time to rewrite using Magnus was a set of a few tiny things, which together gave me enough of a shove to do the work. Firstly, I d had a hiccup with Rutie s support of newer versions of Ruby, particularly 3.2 (PR). Also, I d hit a couple of segfault issues, which were ultimately caused by Ruby garbage-collecting data out from underneath me. These were ultimately my fault, of course, but Rutie wasn t helping me out in avoiding the problems in the first place. Finally, while Rutie helped translate data types, there was still a bit of boilerplate and ugliness that needed to be included. This wasn t a showstopper, but I m appreciating the extra smoothness that Magnus provides here. As an example, here s what s required in Rutie to get native Rust data types from Ruby method parameters (and the self reference to the current object):

fn enquo_field_decrypt_text(ciphertext_obj: RString, context_obj: RString) -> RString  
    let ciphertext = ciphertext_obj.to_str_unchecked();
    let context = context_obj.to_vec_u8_unchecked();
    let field = rbself.get_data(&*FIELD_WRAPPER);
    // etc etc etc

The equivalent in Magnus is just the function signature:

fn decrypt_text(&self, ciphertext: String, context: String) -> Result<String, magnus::Error>

You can also see there that Magnus signals an exception via the Result return value, while Rutie s approach to raising an exception involves poking the Ruby VM directly, which always struck me as a bit ugly. There are several other minor things in Magnus (like its cleaner approach to wrapping structs so they can be stored in Ruby objects) that I m appreciating, too. Never discount the power of ergonomics for making a happy developer.

The End Result I spent a bit over half of last weekend doing the rewrite maybe ten hours of so. Since Magnus did more type checking and data validation, and its approach to error handling was smoother, I took the opportunity to rewrite a bunch of Ruby wrapper code I d written (which just existed to check things like ranges of values and string encodings) into Rust, as well. To make sure that the conversion was accurate, I added a heap more unit tests to the bindings. I also took the opportunity to restructure the codebase to split the code for the different Ruby classes into separate files, which I hadn t done initially as the code had originally accreted, rather than being purposefully written. All up, though, my rewrite ended up removing over 60 lines (excluding the extra specs I added):

$ git diff --stat -- lib ext/enquo/src
 ruby/ext/enquo/src/field.rs         342 ++++++++++++++++++++++++++++++++++++++
 ruby/ext/enquo/src/lib.rs           338 ++++---------------------------------
 ruby/ext/enquo/src/root.rs           39 +++++
 ruby/ext/enquo/src/root_key.rs       67 ++++++++
 ruby/lib/enquo.rb                     6 +-
 ruby/lib/enquo/field.rb             173 -------------------
 ruby/lib/enquo/root.rb               28 ----
 ruby/lib/enquo/root_key.rb            1 -
 ruby/lib/enquo/root_key/static.rb    27 ---
 9 files changed, 479 insertions(+), 542 deletions(-)

Considering that I was translating from a higher level language into a lower level one, the removal of so much code is quite remarkable. Magnus was able to automagically replace rather a lot of raise ArgumentError if something.isnt_right code in those .rb files. So, in conclusion, if you, too, are building Ruby extensions in Rust, while Rutie is a solid choice (and you probably should stick with it if you re already using it), I highly recommend giving Magnus a look for your next extension.

Protocol Buffers are a popular choice for serializing structured data due to their compact size, fast processing speed, language independence, and compatibility. There exist other alternatives, including Cap n Proto, CBOR, and Avro. Usually, data structures are described in a proto definition file (.proto). The protoc compiler and a language-specific plugin convert it into code:

$ head flow-4.proto
syntax = "proto3";
package decoder;
option go_package = "akvorado/inlet/flow/decoder";
message FlowMessagev4  
  uint64 TimeReceived = 2;
  uint32 SequenceNum = 3;
  uint64 SamplingRate = 4;
  uint32 FlowDirection = 5;
$ protoc -I=. --plugin=protoc-gen-go --go_out=module=akvorado:. flow-4.proto
$ head inlet/flow/decoder/flow-4.pb.go
// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
//      protoc-gen-go v1.28.0
//      protoc        v3.21.12
// source: inlet/flow/data/schemas/flow-4.proto
package decoder
import (
        protoreflect "google.golang.org/protobuf/reflect/protoreflect"

Akvorado collects network flows using IPFIX or sFlow, decodes them with GoFlow2, encodes them to Protocol Buffers, and sends them to Kafka to be stored in a ClickHouse database. Collecting a new field, such as source and destination MAC addresses, requires modifications in multiple places, including the proto definition file and the ClickHouse migration code. Moreover, the cost is paid by all users.¹ It would be nice to have an application-wide schema and let users enable or disable the fields they need. While the main goal is flexibility, we do not want to sacrifice performance. On this front, this is quite a success: when upgrading from 1.6.4 to 1.7.1, the decoding and encoding performance almost doubled!

goos: linux
goarch: amd64
pkg: akvorado/inlet/flow
cpu: AMD Ryzen 5 5600X 6-Core Processor
                              initial.txt                 final.txt               
                                 sec/op        sec/op     vs base                 
Netflow/with_encoding-12      12.963    2%   7.836    1%  -39.55% (p=0.000 n=10)
Sflow/with_encoding-12         19.37    1%   10.15    2%  -47.63% (p=0.000 n=10)

Faster Protocol Buffers encoding I use the following code to benchmark both the decoding and encoding process. Initially, the Decode() method is a thin layer above GoFlow2 producer and stores the decoded data into the in-memory structure generated by protoc. Later, some of the data will be encoded directly during flow decoding. This is why we measure both the decoding and the encoding.²

func BenchmarkDecodeEncodeSflow(b *testing.B)  
    r := reporter.NewMock(b)
    sdecoder := sflow.New(r)
    data := helpers.ReadPcapPayload(b,
        filepath.Join("decoder", "sflow", "testdata", "data-1140.pcap"))
    for _, withEncoding := range []bool true, false   
        title := map[bool]string 
            true:  "with encoding",
            false: "without encoding",
         [withEncoding]
        var got []*decoder.FlowMessage
        b.Run(title, func(b *testing.B)  
            for i := 0; i < b.N; i++  
                got = sdecoder.Decode(decoder.RawFlow 
                    Payload: data,
                    Source: net.ParseIP("127.0.0.1"),
                 )
                if withEncoding  
                    for _, flow := range got  
                        buf := []byte 
                        buf = protowire.AppendVarint(buf, uint64(proto.Size(flow)))
                        proto.MarshalOptions .MarshalAppend(buf, flow)
                     
                 
             
         )

The canonical Go implementation for Protocol Buffers, google.golang.org/protobuf is not the most efficient one. For a long time, people were relying on gogoprotobuf. However, the project is now deprecated. A good replacement is vtprotobuf.³

goos: linux
goarch: amd64
pkg: akvorado/inlet/flow
cpu: AMD Ryzen 5 5600X 6-Core Processor
                              initial.txt               bench-2.txt              
                                sec/op        sec/op     vs base                 
Netflow/with_encoding-12      12.96    2%   10.28    2%  -20.67% (p=0.000 n=10)
Netflow/without_encoding-12   8.935    2%   8.975    2%        ~ (p=0.143 n=10)
Sflow/with_encoding-12        19.37    1%   16.67    2%  -13.93% (p=0.000 n=10)
Sflow/without_encoding-12     14.62    3%   14.87    1%   +1.66% (p=0.007 n=10)

Dynamic Protocol Buffers encoding We have our baseline. Let s see how to encode our Protocol Buffers without a .proto file. The wire format is simple and rely a lot on variable-width integers. Variable-width integers, or varints, are an efficient way of encoding unsigned integers using a variable number of bytes, from one to ten, with small values using fewer bytes. They work by splitting integers into 7-bit payloads and using the 8^th bit as a continuation indicator, set to 1 for all payloads except the last.

Variable-width integers encoding in Protocol Buffers: conversion of 150 to a varint

For our usage, we only need two types: variable-width integers and byte sequences. A byte sequence is encoded by prefixing it by its length as a varint. When a message is encoded, each key-value pair is turned into a record consisting of a field number, a wire type, and a payload. The field number and the wire type are encoded as a single variable-width integer called a tag.

Message encoded with Protocol Buffers: three varints, two sequences of bytes

We use the following low-level functions to build the output buffer:

protowire.AppendTag() encodes a tag,
protowire.AppendVarint() encodes a variable-width integer, and
protowire.AppendBytes() append bytes as is.

Our schema abstraction contains the appropriate information to encode a message (ProtobufIndex) and to generate a proto definition file (fields starting with Protobuf):

type Column struct  
    Key       ColumnKey
    Name      string
    Disabled  bool
    // [ ]
    // For protobuf.
    ProtobufIndex    protowire.Number
    ProtobufType     protoreflect.Kind // Uint64Kind, Uint32Kind,  
    ProtobufEnum     map[int]string
    ProtobufEnumName string
    ProtobufRepeated bool

We have a few helper methods around the protowire functions to directly encode the fields while decoding the flows. They skip disabled fields or non-repeated fields already encoded. Here is an excerpt of the sFlow decoder:

sch.ProtobufAppendVarint(bf, schema.ColumnBytes, uint64(recordData.Base.Length))
sch.ProtobufAppendVarint(bf, schema.ColumnProto, uint64(recordData.Base.Protocol))
sch.ProtobufAppendVarint(bf, schema.ColumnSrcPort, uint64(recordData.Base.SrcPort))
sch.ProtobufAppendVarint(bf, schema.ColumnDstPort, uint64(recordData.Base.DstPort))
sch.ProtobufAppendVarint(bf, schema.ColumnEType, helpers.ETypeIPv4)

For fields that are required later in the pipeline, like source and destination addresses, they are stored unencoded in a separate structure:

type FlowMessage struct  
    TimeReceived uint64
    SamplingRate uint32
    // For exporter classifier
    ExporterAddress netip.Addr
    // For interface classifier
    InIf  uint32
    OutIf uint32
    // For geolocation or BMP
    SrcAddr netip.Addr
    DstAddr netip.Addr
    NextHop netip.Addr
    // Core component may override them
    SrcAS     uint32
    DstAS     uint32
    GotASPath bool
    // protobuf is the protobuf representation for the information not contained above.
    protobuf      []byte
    protobufSet   bitset.BitSet

The protobuf slice holds encoded data. It is initialized with a capacity of 500 bytes to avoid resizing during encoding. There is also some reserved room at the beginning to be able to encode the total size as a variable-width integer. Upon finalizing encoding, the remaining fields are added and the message length is prefixed:

func (schema *Schema) ProtobufMarshal(bf *FlowMessage) []byte  
    schema.ProtobufAppendVarint(bf, ColumnTimeReceived, bf.TimeReceived)
    schema.ProtobufAppendVarint(bf, ColumnSamplingRate, uint64(bf.SamplingRate))
    schema.ProtobufAppendIP(bf, ColumnExporterAddress, bf.ExporterAddress)
    schema.ProtobufAppendVarint(bf, ColumnSrcAS, uint64(bf.SrcAS))
    schema.ProtobufAppendVarint(bf, ColumnDstAS, uint64(bf.DstAS))
    schema.ProtobufAppendIP(bf, ColumnSrcAddr, bf.SrcAddr)
    schema.ProtobufAppendIP(bf, ColumnDstAddr, bf.DstAddr)
    // Add length and move it as a prefix
    end := len(bf.protobuf)
    payloadLen := end - maxSizeVarint
    bf.protobuf = protowire.AppendVarint(bf.protobuf, uint64(payloadLen))
    sizeLen := len(bf.protobuf) - end
    result := bf.protobuf[maxSizeVarint-sizeLen : end]
    copy(result, bf.protobuf[end:end+sizeLen])
    return result

Minimizing allocations is critical for maintaining encoding performance. The benchmark tests should be run with the -benchmem flag to monitor allocation numbers. Each allocation incurs an additional cost to the garbage collector. The Go profiler is a valuable tool for identifying areas of code that can be optimized:

$ go test -run=__nothing__ -bench=Netflow/with_encoding \
>         -benchmem -cpuprofile profile.out \
>         akvorado/inlet/flow
goos: linux
goarch: amd64
pkg: akvorado/inlet/flow
cpu: AMD Ryzen 5 5600X 6-Core Processor
Netflow/with_encoding-12             143953              7955 ns/op            8256 B/op        134 allocs/op
PASS
ok      akvorado/inlet/flow     1.418s
$ go tool pprof profile.out
File: flow.test
Type: cpu
Time: Feb 4, 2023 at 8:12pm (CET)
Duration: 1.41s, Total samples = 2.08s (147.96%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) web

After using the internal schema instead of code generated from the proto definition file, the performance improved. However, this comparison is not entirely fair as less information is being decoded and previously GoFlow2 was decoding to its own structure, which was then copied to our own version.

goos: linux
goarch: amd64
pkg: akvorado/inlet/flow
cpu: AMD Ryzen 5 5600X 6-Core Processor
                              bench-2.txt                bench-3.txt              
                                 sec/op        sec/op     vs base                 
Netflow/with_encoding-12      10.284    2%   7.758    3%  -24.56% (p=0.000 n=10)
Netflow/without_encoding-12    8.975    2%   7.304    2%  -18.61% (p=0.000 n=10)
Sflow/with_encoding-12         16.67    2%   14.26    1%  -14.50% (p=0.000 n=10)
Sflow/without_encoding-12      14.87    1%   13.56    2%   -8.80% (p=0.000 n=10)

As for testing, we use github.com/jhump/protoreflect: the protoparse package parses the proto definition file we generate and the dynamic package decodes the messages. Check the ProtobufDecode() method for more details.⁴ To get the final figures, I have also optimized the decoding in GoFlow2. It was relying heavily on binary.Read(). This function may use reflection in certain cases and each call allocates a byte array to read data. Replacing it with a more efficient version provides the following improvement:

goos: linux
goarch: amd64
pkg: akvorado/inlet/flow
cpu: AMD Ryzen 5 5600X 6-Core Processor
                              bench-3.txt                bench-4.txt              
                                 sec/op        sec/op     vs base                 
Netflow/with_encoding-12       7.758    3%   7.365    2%   -5.07% (p=0.000 n=10)
Netflow/without_encoding-12    7.304    2%   6.931    3%   -5.11% (p=0.000 n=10)
Sflow/with_encoding-12        14.256    1%   9.834    2%  -31.02% (p=0.000 n=10)
Sflow/without_encoding-12     13.559    2%   9.353    2%  -31.02% (p=0.000 n=10)

It is now easier to collect new data and the inlet component is faster!

Notice Some paragraphs were editorialized by ChatGPT, using editorialize and keep it short as a prompt. The result was proofread by a human for correctness. The main idea is that ChatGPT should be better at English than me.

While empty fields are not serialized to Protocol Buffers, empty columns in ClickHouse take some space, even if they compress well. Moreover, unused fields are still decoded and they may clutter the interface.
There is a similar function using NetFlow. NetFlow and IPFIX protocols are less complex to decode than sFlow as they are using a simpler TLV structure.

vtprotobuf generates more optimized Go code by removing an abstraction layer. It directly generates the code encoding each field to bytes:

if m.OutIfSpeed != 0  
    i = encodeVarint(dAtA, i, uint64(m.OutIfSpeed))
    i--
    dAtA[i] = 0x6
    i--
    dAtA[i] = 0xd8

There is also a protoprint package to generate proto definition file. I did not use it.

Just days after a build-fix release (for aarch64) and still only a few weeks after the 0.2.0 release of RcppTOML and its switch to toml++, we have another bugfix release 0.2.2 on CRAN also bringing release 3.3.0 of toml++ (even if we had large chunks of 3.3.0 already incorporated). TOML is a file format that is most suitable for configurations, as it is meant to be edited by humans but read by computers. It emphasizes strong readability for humans while at the same time supporting strong typing as well as immediate and clear error reports. On small typos you get parse errors, rather than silently corrupted garbage. Much preferable to any and all of XML, JSON or YAML though sadly these may be too ubiquitous now. TOML is frequently being used with the projects such as the Hugo static blog compiler, or the Cargo system of Crates (aka packages ) for the Rust language. The package was building fine on Intel-based macOS provided the versions were recent enough. CRAN, however, aims for the broadest possibly reach of binaries and builds on a fairly ancient macOS 10.13 with clang version 10. This confused toml++ into (wrongly) concluding it could not build when it in fact can. After a hint from Simon that Apple in their infinite wisdom redefines clang version ids, this has been reflected in version 3.3.0 of toml++ by Mark so we should now build everywhere. Big thanks to everybody for the help. The short summary of changes follows.

Changes in version 0.2.2 (2023-01-29)

New toml++ version 3.3.0 with fix to permit compilation on ancient macOS systems as used by CRAN for the Intel-based builds.

Courtesy of my CRANberries, there is a diffstat report for this release. More information is on the RcppTOML page page. Please use the GitHub issue tracker for issues and bugreports. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Two weeks after the release of RcppTOML 0.2.0 and the switch to toml++, we have a quick bugfix release 0.2.1. TOML is a file format that is most suitable for configurations, as it is meant to be edited by humans but read by computers. It emphasizes strong readability for humans while at the same time supporting strong typing as well as immediate and clear error reports. On small typos you get parse errors, rather than silently corrupted garbage. Much preferable to any and all of XML, JSON or YAML though sadly these may be too ubiquitous now. TOML is frequently being used with the projects such as the Hugo static blog compiler, or the Cargo system of Crates (aka packages ) for the Rust language. Some architectures, aarch64 included, got confused over float16 which is of course a tiny two-byte type nobody should need. After consulting with Mark we concluded to (at least for now) simply override this excluding the use of float16 . The short summary of changes follows.

Changes in version 0.2.1 (2023-01-25)

Explicitly set -DTOML_ENABLE_FLOAT16=0 to permit compilation on some architectures stumbling of the type.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

A few years since the last release in late 2020, the RcppTOML package is now back with a new and shiny CRAN release 0.2.0. It is now based on the wonderful toml++ C++17 library by Mark Gillard and gets us (at long last!) full TOML v1.0.0 compliance for use with R. TOML is a file format that is most suitable for configurations, as it is meant to be edited by humans but read by computers. It emphasizes strong readability for humans while at the same time supporting strong typing as well as immediate and clear error reports. On small typos you get parse errors, rather than silently corrupted garbage. Much preferable to any and all of XML, JSON or YAML though sadly these may be too ubiquitous now. TOML is frequently being used with the projects such as the Hugo static blog compiler, or the Cargo system of Crates (aka packages ) for the Rust language. This package is a rewrite of the internals interfacing the library, and updates the package to using toml++ and C++17. The R interface is unchanged, and a full run of reverse dependencies passed. This involved finding one sole test failure which turned to have been driven by a non-conforming TOML input file which Jianfeng Li kindly fixed at the source making his (extensive) set of tests in package configr pass too. The actual rewrite was mostly done in a one-off repo RcppTomlPlusPlus which can now be considered frozen. The short summary of changes follows.

Changes in version 0.2.0 (2023-01-10)

Rewritten in C++17 using toml++ for TOML v1.0.0 compliance

Unchanged interface from R, unchanged (and expanded tests)

Several small continuous integration upgrades since last release

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

I've been playing with an Orbic Speed, a relatively outdated device that only speaks LTE Cat 4, but the towers I can see from here are, uh, not well provisioned so throughput really isn't a concern (and refurbs are $18, so). As usual I'm pretty terrible at just buying devices and using them for their intended purpose, and in this case it has the irritating behaviour that if there's a power cut and the battery runs out it doesn't boot again when power returns, so here's what I've learned so far.

First, it's clearly running Linux (nmap indicates that, as do the headers from the built-in webserver). The login page for the web interface has some text reading "Open Source Notice" that highlights when you move the mouse over it, but that's it - there's code to make the text light up, but it's not actually a link. There's no exposed license notices at all, although there is a copy on the filesystem that doesn't seem to be reachable from anywhere. The notice tells you to email them to receive source code, but doesn't actually provide an email address.

Still! Let's see what else we can figure out. There's no open ports other than the web server, but there is an update utility that includes some interesting components. First, there's a copy of adb, the Android Debug Bridge. That doesn't mean the device is running Android, it's common for embedded devices from various vendors to use a bunch of Android infrastructure (including the bootloader) while having a non-Android userland on top. But this is still slightly surprising, because the device isn't exposing an adb interface over USB. There's also drivers for various Qualcomm endpoints that are, again, not exposed. Running the utility under Windows while the modem is connected results in the modem rebooting and Windows talking about new hardware being detected, and watching the device manager shows a bunch of COM ports being detected and bound by Qualcomm drivers. So, what's it doing?

Sticking the utility into Ghidra and looking for strings that correspond to the output that the tool conveniently leaves in the logs subdirectory shows that after finding a device it calls vendor_device_send_cmd(). This is implemented in a copy of libusb-win32 that, again, has no offer for source code. But it's also easy to drop that into Ghidra and discover thatn vendor_device_send_cmd() is just a wrapper for usb_control_msg(dev,0xc0,0xa0,0,0,NULL,0,1000);. Sending that from Linux results in the device rebooting and suddenly exposing some more USB endpoints, including a functional adb interface. Although, annoyingly, the rndis interface that enables USB tethering via the modem is now missing.

Unfortunately the adb user is unprivileged, but most files on the system are world-readable. data/logs/atfwd.log is especially interesting. This modem has an application processor built into the modem chipset itself, and while the modem implements the Hayes Command Set there's also a mechanism for userland to register that certain AT commands should be pushed up to userland. These are handled by the atfwd_daemon that runs as root, and conveniently logs everything it's up to. This includes having logged all the communications executed when the update tool was run earlier, so let's dig into that.

The system sends a bunch of AT+SYSCMD= commands, each of which is in the form of echo (stuff) >>/usrdata/sec/chipid. Once that's all done, it sends AT+CHIPID, receives a response of CHIPID:PASS, and then AT+SER=3,1, at which point the modem reboots back into the normal mode - adb is gone, but rndis is back. But the logs also reveal that between the CHIPID request and the response is a security check that involves RSA. The logs on the client side show that the text being written to the chipid file is a single block of base64 encoded data. Decoding it just gives apparently random binary. Heading back to Ghidra shows that atfwd_daemon is reading the chipid file and then decrypting it with an RSA key. The key is obtained by calling a series of functions, each of which returns a long base64-encoded string. Decoding each of these gives 1028 bytes of high entropy data, which is then passed to another function that decrypts it using AES CBC using a key of 000102030405060708090a0b0c0d0e0f and an initialization vector of all 0s. This is somewhat weird, since there's 1028 bytes of data and 128 bit AES works on blocks of 16 bytes. The behaviour of OpenSSL is apparently to just pad the data out to a multiple of 16 bytes, but that implies that we're going to end up with a block of garbage at the end. It turns out not to matter - despite the fact that we decrypt 1028 bytes of input only the first 200 bytes mean anything, with the rest just being garbage. Concatenating all of that together gives us a PKCS#8 private key blob in PEM format. Which means we have not only the private key, but also the public key.

So, what's in the encrypted data, and where did it come from in the first place? It turns out to be a JSON blob that contains the IMEI and the serial number of the modem. This is information that can be read from the modem in the first place, so it's not secret. The modem decrypts it, compares the values in the blob to its own values, and if they match sets a flag indicating that validation has succeeeded. But what encrypted it in the first place? It turns out that the json blob is just POSTed to http://pro.w.ifelman.com/api/encrypt and an encrypted blob returned. Of course, the fact that it's being encrypted on the server with the public key and sent to the modem that decrypted with the private key means that having access to the modem gives us the public key as well, which means we can just encrypt our own blobs.

What does that buy us? Digging through the code shows the only case that it seems to matter is when parsing the AT+SER command. The first argument to this is the serial mode to transition to, and the second is whether this should be a temporary transition or a permanent one. Once parsed, these arguments are passed to /sbin/usb/compositions/switch_usb which just writes the mode out to /usrdata/mode.cfg (if permanent) or /usrdata/mode_tmp.cfg (if temporary). On boot, /data/usb/boot_hsusb_composition reads the number from this file and chooses which USB profile to apply. This requires no special permissions, except if the number is 3 - if so, the RSA verification has to be performed first. This is somewhat strange, since mode 9 gives the same rndis functionality as mode 3, but also still leaves the debug and diagnostic interfaces enabled.

So what's the point of all of this? I'm honestly not sure! It doesn't seem like any sort of effective enforcement mechanism (even ignoring the fact that you can just create your own blobs, if you change the IMEI on the device somehow, you can just POST the new values to the server and get back a new blob), so the best I've been able to come up with is to ensure that there's some mapping between IMEI and serial number before the device can be transitioned into production mode during manufacturing.

But, uh, we can just ignore all of this anyway. Remember that AT+SYSCMD= stuff that was writing the data to /usrdata/sec/chipid in the first place? Anything that's passed to AT+SYSCMD is just executed as root. Which means we can just write a new value (including 3) to /usrdata/mode.cfg in the first place, without needing to jump through any of these hoops. Which also means we can just adb push a shell onto there and then use the AT interface to make it suid root, which avoids needing to figure out how to exploit any of the bugs that are just sitting there given it's running a 3.18.48 kernel.

Anyway, I've now got a modem that's got working USB tethering and also exposes a working adb interface, and I've got root on it. Which let me dump the bootloader and discover that it implements fastboot and has an oem off-mode-charge command which solves the problem I wanted to solve of having the device boot when it gets power again. Unfortunately I still need to get into fastboot mode. I haven't found a way to do it through software (adb reboot bootloader doesn't do anything), but this post suggests it's just a matter of grounding a test pad, at which point I should just be able to run fastboot oem off-mode-charge and it'll be all set. But that's a job for tomorrow.

Edit: Got into fastboot mode and ran fastboot oem off-mode-charge 0 but sadly it doesn't actually do anything, so I guess next is going to involve patching the bootloader binary. Since it's signed with a cert titled "General Use Test Key (for testing only)" it apparently doesn't have secure boot enabled, so this should be easy enough.

comments

Well this is a mouthful. I recently worked on a neat hack called puppet-package-check. It is designed to warn about manually installed packages, to make sure "everything is in Puppet". But it turns out it can (probably?) dramatically decrease the bootstrap time of Puppet bootstrap when it needs to install a large number of packages.

Detecting manual packages On a cleanly filed workstation, it looks like this:

root@emma:/home/anarcat/bin# ./puppet-package-check -v
listing puppet packages...
listing apt packages...
loading apt cache...
0 unmanaged packages found

A messy workstation will look like this:

root@curie:/home/anarcat/bin# ./puppet-package-check -v
listing puppet packages...
listing apt packages...
loading apt cache...
288 unmanaged packages found
apparmor-utils beignet-opencl-icd bridge-utils clustershell cups-pk-helper davfs2 dconf-cli dconf-editor dconf-gsettings-backend ddccontrol ddrescueview debmake debootstrap decopy dict-devil dict-freedict-eng-fra dict-freedict-eng-spa dict-freedict-fra-eng dict-freedict-spa-eng diffoscope dnsdiag dropbear-initramfs ebtables efibootmgr elpa-lua-mode entr eog evince figlet file file-roller fio flac flex font-manager fonts-cantarell fonts-inconsolata fonts-ipafont-gothic fonts-ipafont-mincho fonts-liberation fonts-monoid fonts-monoid-tight fonts-noto fonts-powerline fonts-symbola freeipmi freetype2-demos ftp fwupd-amd64-signed gallery-dl gcc-arm-linux-gnueabihf gcolor3 gcp gdisk gdm3 gdu gedit gedit-plugins gettext-base git-debrebase gnome-boxes gnote gnupg2 golang-any golang-docker-credential-helpers golang-golang-x-tools grub-efi-amd64-signed gsettings-desktop-schemas gsfonts gstreamer1.0-libav gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-ugly gstreamer1.0-pulseaudio gtypist gvfs-backends hackrf hashcat html2text httpie httping hugo humanfriendly iamerican-huge ibus ibus-gtk3 ibus-libpinyin ibus-pinyin im-config imediff img2pdf imv initramfs-tools input-utils installation-birthday internetarchive ipmitool iptables iptraf-ng jackd2 jupyter jupyter-nbextension-jupyter-js-widgets jupyter-qtconsole k3b kbtin kdialog keditbookmarks keepassxc kexec-tools keyboard-configuration kfind konsole krb5-locales kwin-x11 leiningen lightdm lintian linux-image-amd64 linux-perf lmodern lsb-base lvm2 lynx lz4json magic-wormhole mailscripts mailutils manuskript mat2 mate-notification-daemon mate-themes mime-support mktorrent mp3splt mpdris2 msitools mtp-tools mtree-netbsd mupdf nautilus nautilus-sendto ncal nd ndisc6 neomutt net-tools nethogs nghttp2-client nocache npm2deb ntfs-3g ntpdate nvme-cli nwipe obs-studio okular-extra-backends openstack-clients openstack-pkg-tools paprefs pass-extension-audit pcmanfm pdf-presenter-console pdf2svg percol pipenv playerctl plymouth plymouth-themes popularity-contest progress prometheus-node-exporter psensor pubpaste pulseaudio python3-ldap qjackctl qpdfview qrencode r-cran-ggplot2 r-cran-reshape2 rake restic rhash rpl rpm2cpio rs ruby ruby-dev ruby-feedparser ruby-magic ruby-mocha ruby-ronn rygel-playbin rygel-tracker s-tui sanoid saytime scrcpy scrcpy-server screenfetch scrot sdate sddm seahorse shim-signed sigil smartmontools smem smplayer sng sound-juicer sound-theme-freedesktop spectre-meltdown-checker sq ssh-audit sshuttle stress-ng strongswan strongswan-swanctl syncthing system-config-printer system-config-printer-common system-config-printer-udev systemd-bootchart systemd-container tardiff task-desktop task-english task-ssh-server tasksel tellico texinfo texlive-fonts-extra texlive-lang-cyrillic texlive-lang-french texlive-lang-german texlive-lang-italian texlive-xetex tftp-hpa thunar-archive-plugin tidy tikzit tint2 tintin++ tipa tpm2-tools traceroute tree trocla ucf udisks2 unifont unrar-free upower usbguard uuid-runtime vagrant-cachier vagrant-libvirt virt-manager vmtouch vorbis-tools w3m wamerican wamerican-huge wfrench whipper whohas wireshark xapian-tools xclip xdg-user-dirs-gtk xlax xmlto xsensors xserver-xorg xsltproc xxd xz-utils yubioath-desktop zathura zathura-pdf-poppler zenity zfs-dkms zfs-initramfs zfsutils-linux zip zlib1g zlib1g-dev
157 old: apparmor-utils clustershell davfs2 dconf-cli dconf-editor ddccontrol ddrescueview decopy dnsdiag ebtables efibootmgr elpa-lua-mode entr figlet file-roller fio flac flex font-manager freetype2-demos ftp gallery-dl gcc-arm-linux-gnueabihf gcolor3 gcp gdu gedit git-debrebase gnote golang-docker-credential-helpers golang-golang-x-tools gtypist hackrf hashcat html2text httpie httping hugo humanfriendly iamerican-huge ibus ibus-pinyin imediff input-utils internetarchive ipmitool iptraf-ng jackd2 jupyter-qtconsole k3b kbtin kdialog keditbookmarks keepassxc kexec-tools kfind konsole leiningen lightdm lynx lz4json magic-wormhole manuskript mat2 mate-notification-daemon mktorrent mp3splt msitools mtp-tools mtree-netbsd nautilus nautilus-sendto nd ndisc6 neomutt net-tools nethogs nghttp2-client nocache ntpdate nwipe obs-studio openstack-pkg-tools paprefs pass-extension-audit pcmanfm pdf-presenter-console pdf2svg percol pipenv playerctl qjackctl qpdfview qrencode r-cran-ggplot2 r-cran-reshape2 rake restic rhash rpl rpm2cpio rs ruby-feedparser ruby-magic ruby-mocha ruby-ronn s-tui saytime scrcpy screenfetch scrot sdate seahorse shim-signed sigil smem smplayer sng sound-juicer spectre-meltdown-checker sq ssh-audit sshuttle stress-ng system-config-printer system-config-printer-common tardiff tasksel tellico texlive-lang-cyrillic texlive-lang-french tftp-hpa tikzit tint2 tintin++ tpm2-tools traceroute tree unrar-free vagrant-cachier vagrant-libvirt vmtouch vorbis-tools w3m wamerican wamerican-huge wfrench whipper whohas xdg-user-dirs-gtk xlax xmlto xsensors xxd yubioath-desktop zenity zip
131 new: beignet-opencl-icd bridge-utils cups-pk-helper dconf-gsettings-backend debmake debootstrap dict-devil dict-freedict-eng-fra dict-freedict-eng-spa dict-freedict-fra-eng dict-freedict-spa-eng diffoscope dropbear-initramfs eog evince file fonts-cantarell fonts-inconsolata fonts-ipafont-gothic fonts-ipafont-mincho fonts-liberation fonts-monoid fonts-monoid-tight fonts-noto fonts-powerline fonts-symbola freeipmi fwupd-amd64-signed gdisk gdm3 gedit-plugins gettext-base gnome-boxes gnupg2 golang-any grub-efi-amd64-signed gsettings-desktop-schemas gsfonts gstreamer1.0-libav gstreamer1.0-plugins-base gstreamer1.0-plugins-good gstreamer1.0-plugins-ugly gstreamer1.0-pulseaudio gvfs-backends ibus-gtk3 ibus-libpinyin im-config img2pdf imv initramfs-tools installation-birthday iptables jupyter jupyter-nbextension-jupyter-js-widgets keyboard-configuration krb5-locales kwin-x11 lintian linux-image-amd64 linux-perf lmodern lsb-base lvm2 mailscripts mailutils mate-themes mime-support mpdris2 mupdf ncal npm2deb ntfs-3g nvme-cli okular-extra-backends openstack-clients plymouth plymouth-themes popularity-contest progress prometheus-node-exporter psensor pubpaste pulseaudio python3-ldap ruby ruby-dev rygel-playbin rygel-tracker sanoid scrcpy-server sddm smartmontools sound-theme-freedesktop strongswan strongswan-swanctl syncthing system-config-printer-udev systemd-bootchart systemd-container task-desktop task-english task-ssh-server texinfo texlive-fonts-extra texlive-lang-german texlive-lang-italian texlive-xetex thunar-archive-plugin tidy tipa trocla ucf udisks2 unifont upower usbguard uuid-runtime virt-manager wireshark xapian-tools xclip xserver-xorg xsltproc xz-utils zathura zathura-pdf-poppler zfs-dkms zfs-initramfs zfsutils-linux zlib1g zlib1g-dev

Yuck! That's a lot of shit to go through. Notice how the packages get sorted between "old" and "new" packages. This is because popcon is used as a tool to mark which packages are "old". If you have unmanaged packages, the "old" ones are likely things that you can uninstall, for example. If you don't have popcon installed, you'll also get this warning:

popcon stats not available: [Errno 2] No such file or directory: '/var/log/popularity-contest'

The error can otherwise be safely ignored, but you won't get "help" prioritizing the packages to add to your manifests. Note that the tool ignores packages that were "marked" (see apt-mark(8)) as automatically installed. This implies that you might have to do a little bit of cleanup the first time you run this, as Debian doesn't necessarily mark all of those packages correctly on first install. For example, here's how it looks like on a clean install, after Puppet ran:

root@angela:/home/anarcat# ./bin/puppet-package-check -v
listing puppet packages...
listing apt packages...
loading apt cache...
127 unmanaged packages found
ca-certificates console-setup cryptsetup-initramfs dbus file gcc-12-base gettext-base grub-common grub-efi-amd64 i3lock initramfs-tools iw keyboard-configuration krb5-locales laptop-detect libacl1 libapparmor1 libapt-pkg6.0 libargon2-1 libattr1 libaudit-common libaudit1 libblkid1 libbpf0 libbsd0 libbz2-1.0 libc6 libcap-ng0 libcap2 libcap2-bin libcom-err2 libcrypt1 libcryptsetup12 libdb5.3 libdebconfclient0 libdevmapper1.02.1 libedit2 libelf1 libext2fs2 libfdisk1 libffi8 libgcc-s1 libgcrypt20 libgmp10 libgnutls30 libgpg-error0 libgssapi-krb5-2 libhogweed6 libidn2-0 libip4tc2 libiw30 libjansson4 libjson-c5 libk5crypto3 libkeyutils1 libkmod2 libkrb5-3 libkrb5support0 liblocale-gettext-perl liblockfile-bin liblz4-1 liblzma5 libmd0 libmnl0 libmount1 libncurses6 libncursesw6 libnettle8 libnewt0.52 libnftables1 libnftnl11 libnl-3-200 libnl-genl-3-200 libnl-route-3-200 libnss-systemd libp11-kit0 libpam-systemd libpam0g libpcre2-8-0 libpcre3 libpcsclite1 libpopt0 libprocps8 libreadline8 libselinux1 libsemanage-common libsemanage2 libsepol2 libslang2 libsmartcols1 libss2 libssl1.1 libssl3 libstdc++6 libsystemd-shared libsystemd0 libtasn1-6 libtext-charwidth-perl libtext-iconv-perl libtext-wrapi18n-perl libtinfo6 libtirpc-common libtirpc3 libudev1 libunistring2 libuuid1 libxtables12 libxxhash0 libzstd1 linux-image-amd64 logsave lsb-base lvm2 media-types mlocate ncurses-term pass-extension-otp puppet python3-reportbug shim-signed tasksel ucf usr-is-merged util-linux-extra wpasupplicant xorg zlib1g
popcon stats not available: [Errno 2] No such file or directory: '/var/log/popularity-contest'

Normally, there should be unmanaged packages here. But because of the way Debian is installed, a lot of libraries and some core packages are marked as manually installed, and are of course not managed through Puppet. There are two solutions to this problem:

really manage everything in Puppet (argh)
mark packages as automatically installed

I typically chose the second path and mark a ton of stuff as automatic. Then either they will be auto-removed, or will stop being listed. In the above scenario, one could mark all libraries as automatically installed with:

apt-mark auto $(./bin/puppet-package-check   grep -o 'lib[^ ]*')

... but if you trust that most of that stuff is actually garbage that you don't really want installed anyways, you could just mark it all as automatically installed:

apt-mark auto $(./bin/puppet-package-check)

In my case, that ended up keeping basically all libraries (because of course they're installed for some reason) and auto-removing this:

dh-dkms discover-data dkms libdiscover2 libjsoncpp25 libssl1.1 linux-headers-amd64 mlocate pass-extension-otp pass-otp plocate x11-apps x11-session-utils xinit xorg

You'll notice xorg in there: yep, that's bad. Not what I wanted. But for some reason, on other workstations, I did not actually have xorg installed. Turns out having xserver-xorg is enough, and that one has dependencies. So now I guess I just learned to stop worrying and live without X(org).

Optimizing large package installs But that, of course, is not all. Why make things simple when you can have an unreadable title that is trying to be both syntactically correct and click-baity enough to flatter my vain ego? Right. One of the challenges in bootstrapping Puppet with large package lists is that it's slow. Puppet lists packages as individual resources and will basically run apt install $PKG on every package in the manifest, one at a time. While the overhead of apt is generally small, when you add things like apt-listbugs, apt-listchanges, needrestart, triggers and so on, it can take forever setting up a new host. So for initial installs, it can actually makes sense to skip the queue and just install everything in one big batch. And because the above tool inspects the packages installed by Puppet, you can run it against a catalog and have a full lists of all the packages Puppet would install, even before I even had Puppet running. So when reinstalling my laptop, I basically did this:

apt install puppet-agent/experimental
puppet agent --test --noop
apt install $(./puppet-package-check --debug \
    2>&1   grep ^puppet\ packages 
      sed 's/puppet packages://;s/ /\n/g'
      grep -v -e onionshare -e golint -e git-sizer -e github-backup -e hledger -e xsane -e audacity -e chirp -e elpa-flycheck -e elpa-lsp-ui -e yubikey-manager -e git-annex -e hopenpgp-tools -e puppet
) puppet-agent/experimental

That massive grep was because there are currently a lot of packages missing from bookworm. Those are all packages that I have in my catalog but that still haven't made it to bookworm. Sad, I know. I eventually worked around that by adding bullseye sources so that the Puppet manifest actually ran. The point here is that this improves the Puppet run time a lot. All packages get installed at once, and you get a nice progress bar. Then you actually run Puppet to deploy configurations and all the other goodies:

puppet agent --test

I wish I could tell you how much faster that ran. I don't know, and I will not go through a full reinstall just to please your curiosity. The only hard number I have is that it installed 444 packages (which exploded in 10,191 packages with dependencies) in a mere 10 minutes. That might also be with the packages already downloaded. In any case, I have that gut feeling it's faster, so you'll have to just trust my gut. It is, after all, much more important than you might think.

Similar work The blueprint system is something similar to this:

It figures out what you ve done manually, stores it locally in a Git repository, generates code that s able to recreate your efforts, and helps you deploy those changes to production

That tool has unfortunately been abandoned for a decade at this point. Also note that the AutoRemove::RecommendsImportant and AutoRemove::SuggestsImportant are relevant here. If it is set to true (the default), a package will not be removed if it is (respectively) a Recommends or Suggests of another package (as opposed to the normal Depends). In other words, if you want to also auto-remove packages that are only Suggests, you would, for example, add this to apt.conf:

AutoRemove::SuggestsImportant false;

Paul Wise has tried to make the Debian installer and debootstrap properly mark packages as automatically installed in the past, but his bug reports were rejected. The other suggestions in this section are also from Paul, thanks!

Fiction A few days ago somebody asked me and I think it is an often requested to perhaps all fiction readers as to why we like fiction? First of all, reading in itself is told as food for the soul. Because, whenever you write or read anything you don t just read it, you also visualize it. And that visualization is and would be far greater than any attempt in cinema as there are no budget constraints and it takes no more than a minute to visualize a scenario if the writer is any good. You just close your eyes and in a moment you are transported to a different world. This is also what is known as world building . Something fantasy writers are especially gifted in. Also, with the whole parallel Universes being a reality, it is just so much fertile land for imagination that I just cannot believe that it hasn t been worked to death to date. And you do need a lot of patience to make a world, to make characters, to make characters a bit eccentric one way or the other. And you have to know to put into a three, five, or whatever number of acts you want to put in. And then, of course, they have readers like us who dream and add more color to the story than the author did. As we take his, her, or their story and weave countless stories depending on where we are, where we are and who we are. What people need to understand is that not just readers want escapism but writers too want to escape from the human condition. And they find solace in whatever they write. The well-known example of J.R.R. Tolkien is always there. How he must have felt each day coming after war, to somehow find the strength and just dream away, transport himself to a world of hobbits, elves, and other mysterious beings. It surely must have taken a lot of pain from him that otherwise, he would have felt. There are many others. What also does happen now and then, is authors believe in their own intelligence so much, that they commit crimes, but that s par for the course.

Dean Koontz, Odd Apocalypse Currently, I am reading the above title. It is perhaps one of the first horror title books that I have read which has so much fun. The hero has a great sense of wit, humor, and sarcasm that you can cut butter with it. Now if you got that, this is par for the wordplay happening every second paragraph and I m just 100 pages in of the 500-page Novel. Now, while I haven t read the whole book and I m just speculating, what if at the end we realize that the hero all along was or is the villain. Sadly, we don t have many such twisted stories and that too is perhaps because most people used to have black and white rather than grey characters. From all my reading, and even watching web series and whatnot, it is only the Europeans who seem to have a taste for exploring grey characters and giving twists at the end that people cannot anticipate. Even their heroes or heroines are grey characters. and they can really take you for a ride. It is also perhaps how we humans are, neither black nor white but more greyish. Having grey characters also frees the author quite a bit as she doesn t have to use so-called tropes and just led the characters to lead themselves.

Indian Book publishing Industry I do know Bengali stories do have a lot of grey characters, but sadly most of the good works are still in Bengali and not widely published compared to say European or American authors. While there is huge potential in the Indian publishing market for English books and there is also hunger, getting good and cheap publishers is the issue. Just recently SAGE publishing division shut down and this does not augur well for the Indian market. In the past few years, I and other readers have seen some very good publishing houses quit India for one reason or the other. GST has also made the sector more expensive. The only thing that works now and has been for some time is the seconds and thirds market. For e.g. I just bought today about 15-20 books @INR 125/- a kind of belated present for the self. That would be what, at the most 2 USD or 2 Euros per book. I bet even a burger costs more than that, but again India being a price-sensitive market, at these prices the seconds book sells. And these are all my favorite authors, Lee Child, Tom Clancy, Dean Koontz, and so on and so forth. I also saw a lot of fantasy books but they would have to wait for another day.

Tourism in India for Debconf 23 I had shared a while back that I would write a bit about tourism as Debconf or Annual Debian Conference will happen in India next year around this time. I was supposed to write it in the FAQ but couldn t find a place or a corner where I could write it. There are actually two things that people need to be aware of. The one thing that people need to be very aware of is food poisoning or Delhi Belly. This is a far too common sight that I have witnessed especially with westerners when they come to visit India. I am somewhat shocked that it hasn t been shared in the FAQ but then perhaps we cannot cover all the bases therein. I did find this interesting article and would recommend the suggestions given in it wholeheartedly. I would suggest people coming to India to buy and have purifying water tablets with them if they decide to stay back and explore India. Now the problem with tourism is, that one can have as much tourism as one wants. One of the unique ways I found some westerners having the time of their life is buying an Indian Rickshaw or Tuk-Tuk and traveling with it. A few years ago, when I was more adventourous-spirited I was able to meet a few of them. There is also the Race with Rickshaws that happens in Rajasthan and you get to see about 10 odd cities in and around Rajasthan state and get to see the vibrancy in the North. If somebody really wants to explore India, then I would suggest getting down to Goa, specifically, South Goa, meeting with the hippie crowd, and getting one of the hippie guidebooks to India. Most people forget that the Hippies came to India in the 1960s and many of them just never left. Tap water in Pune is ok, have seen and experienced the same in Himachal, Garwhal, and Uttarakhand, although it has been a few years since I have been to those places. North-East is a place I have yet to venture into. India does have a lot of beauty but most people are not clean-conscious so if you go to common tourist destinations, you will find a lot of garbage. Most cities in India do give you an option of homestays and some even offer food, so if you are on a budget as well as wanna experience life with an Indian family, that could be something you could look into. So you can see and share about India with different eyes. There is casteism, racism, and all that. Generally speaking, you would see it wielded a lot more in your face in North India than in South India where it is there but far more subtle. About food, what has been shared in the India BOF. Have to say, it doesn t even scratch the surface. If you stay with an Indian family, there is probably a much better chance of exploring the variety of food that India has to offer. From the western perspective, we tend to overcook stuff and make food with Masalas but that s the way most people like it. People who have had hot sauces or whatnot would probably find India much easier to adjust to as tastes might be similar to some extent. If you want to socialize with young people, while discos are an option, meetup.com also is a good place. You can share your passions and many people have taken to it with gusto. We also have been hosting Comiccons in India, but I haven t had the opportunity to attend them so far. India has a rich oral culture reach going back a few thousand years, but many of those who are practicing those reside more in villages rather than in cities. And while there have been attempts in the past to record them, most of those have come to naught as money runs out as there is no commercial viability to such projects, but that probably is for another day. In the end, what I have shared is barely a drop in the ocean that is India. Come, have fun, explore, enjoy and invigorate yourself and others

A notorious ex-DD decided to post garbage on his site in which he links my name to the suicide of Frans Pop, and mentions that my GPG key is currently disabled in the Debian keyring, along with some manufactured screenshots of the Debian NM site that allegedly show I'm no longer a DD. I'm not going to link to the post -- he deserves to be ridiculed, not given attention. Just to set the record straight, however: Frans Pop was my friend. I never treated him with anything but respect. I do not know why he chose to take his own life, but I grieved for him for a long time. It saddens me that Mr. Notorious believes it a good idea to drag Frans' name through the mud like this, but then, one can hardly expect anything else from him by this point. Although his post is mostly garbage, there is one bit of information that is correct, and that is that my GPG key is currently no longer in the Debian keyring. Nothing sinister is going on here, however; the simple fact of the matter is that I misplaced my OpenPGP key card, which means there is a (very very slight) chance that a malicious actor (like, perhaps, Mr. Notorious) would get access to my GPG key and abuse that to upload packages to Debian. Obviously we can't have that -- certainly not from him -- so for that reason, I asked the Debian keyring maintainers to please disable my key in the Debian keyring. I've ordered new cards; as soon as they arrive I'll generate a new key and perform the necessary steps to get my new key into the Debian keyring again. Given that shipping key cards to South Africa takes a while, this has taken longer than I would have initially hoped, but I'm hoping at this point that by about halfway September this hurdle will have been taken, meaning, I will be able to exercise my rights as a Debian Developer again. As for Mr. Notorious, one can only hope he will get the psychiatric help he very obviously needs, sooner rather than later, because right now he appears to be more like a goat yelling in the desert. Ah well.

Munnar is a hill station in Idukki district of Kerala, India. Home to 2nd largest tea plantation in the country. Lot of people visit here on summer and in winter as well. I live in the neighboring district of Munnar though I never made a visit. In my mind I pictured Munnar as a Tourist trap with lots of garbage lying around. I recently made a visit and it changed my perception of this place.

Little background I never liked tea much. I am also not a coffee person either. But if I have to choose over two that will be coffee because of the strong aroma. Going to relatives house, they always offered hot tea defacto. I always find difficult say no to their friendly gesture. But I hate tea. A generation before me drinks lot of tea here at my place. You can see tea stalls in every corner and people sipping tea. I always wondered why people drink lot of tea on a hot country like India. The book I am currently trying to read has a chapter about Munnar and how it became a Tea plantation under the British rule. Well, around the same time. I watched a documentary program about the tea and Munnar.

Munnar Too much word here and there I decided to do a visit. I took a motorbike and started a journey to Munnar. Due to covid restrictions there weren t much tourists, so this was to my advantage. There are many water falls on the way to Munnar. Some are very close to road and some are far away but can be spotted. Munnar travel is just not about the destination because its never been a single spot. Enjoying the journey that the ride has to offer. I stayed at a hotel, little far away from town, though I never recommend hotels in Munnar. Try to find home stays and small establishments away from the town. There are British Era bungalows inside the plantations still maintained in good condition which can be booked per room or entire property. The lush greenery on the Mountains of tea plantation is very refreshing and feast to our eyes. The mornings and evenings of Munnar is something to watch, mountains wrapped in mist slowly uncovering with sunlight and again slipping to mist by dark evening. I planned only to visit places which are less explored by tourists. People here live a simple life. Most of them are plantation workers. The native people of Munnar are actually tribal folks but since the plantation boom many people from Tamil Nadu(neighboring state) and other parts of Kerala settled here. The houses of this plantation workers resembled Hobbit homes in Shire from Lord of the Rings as they are in the hill slides. The Kannan Devan hills, the biggest hill in area covers more than half of Munnar. Two famous Tea companies from Munnar are Tata Tea and KDHP(Kanan Devan Hills Plantations Company (P) Limited) tea. KDHP is actually an employee owned Tea company ie a good share of this company is owned by the employees working there. This was interesting to me, so I bought a bag of speciality tea from KDHP store on my return. I don t drink tea on a daily basis but I will try it on special occasions.

	IOPS	Bandwidth
4k random writes	19.3k	75.6 MiB/s
4k random reads	36.1k	141 MiB/s
Sequential writes		2300 MiB/s
Sequential reads		3800 MiB/s

Search Results: "bage"

8 March 2025

Changes in version 0.2.3 (2025-03-08) Correct the minimum version of Rcpp to 1.0.8 (Walter Somerville) The package now uses Authors@R as mandated by CRAN Updated 'whitespace in literal' issue upsetting clang++-20 Continuous integration updates including simpler r-ci setup

6 March 2025

29 December 2024

29 August 2024

1 May 2024

6 April 2024

12 September 2023

24 August 2023

29 June 2023

Cleanup Optionally, cleanup the mess you left on this computer: rm -r ~/.config/signal-cli podman image rm registry.gitlab.com/packaging/signal-cli/signal-cli-jre

18 April 2023

6 February 2023

29 January 2023

Changes in version 0.2.2 (2023-01-29) New toml++ version 3.3.0 with fix to permit compilation on ancient macOS systems as used by CRAN for the Intel-based builds.

26 January 2023

Changes in version 0.2.1 (2023-01-25) Explicitly set -DTOML_ENABLE_FLOAT16=0 to permit compilation on some architectures stumbling of the type.

10 January 2023

Changes in version 0.2.0 (2023-01-10) Rewritten in C++17 using toml++ for TOML v1.0.0 compliance Unchanged interface from R, unchanged (and expanded tests) Several small continuous integration upgrades since last release

26 November 2022

4 October 2022

29 September 2022

17 September 2022

30 August 2022

4 August 2022

Changes in version 0.2.3 (2025-03-08)

Correct the minimum version of Rcpp to 1.0.8 (Walter Somerville)

The package now uses Authors@R as mandated by CRAN

Updated 'whitespace in literal' issue upsetting clang++-20

Continuous integration updates including simpler r-ci setup

Cleanup Optionally, cleanup the mess you left on this computer:
`rm -r ~/.config/signal-cli podman image rm registry.gitlab.com/packaging/signal-cli/signal-cli-jre`

Changes in version 0.2.2 (2023-01-29)

New toml++ version 3.3.0 with fix to permit compilation on ancient macOS systems as used by CRAN for the Intel-based builds.

Changes in version 0.2.1 (2023-01-25)

Explicitly set `-DTOML_ENABLE_FLOAT16=0` to permit compilation on some architectures stumbling of the type.

Changes in version 0.2.0 (2023-01-10)

Rewritten in C++17 using toml++ for TOML v1.0.0 compliance

Unchanged interface from R, unchanged (and expanded tests)

Several small continuous integration upgrades since last release