Search Results: "jgoerzen"

11 June 2025

John Goerzen: I Learned We All Have Linux Seats, and I'm Not Entirely Pleased

I recently wrote about How to Use SSH with FIDO2/U2F Security Keys, which I now use on almost all of my machines. The last one that needed this was my Raspberry Pi hooked up to my DEC vt510 terminal and IBM mechanical keyboard. Yes I do still use that setup! To my surprise, generating a key on it failed. I very quickly saw that /dev/hidraw0 had incorrect permissions, accessible only to root. On other machines, it looks like this:
crw-rw----+ 1 root root 243, 16 May 24 16:47 /dev/hidraw16
And, if I run getfacl on it, I see:
# file: dev/hidraw16
# owner: root
# group: root
user::rw-
user:jgoerzen:rw-
group::---
mask::rw-
other::---
Yes, something was setting an ACL on it. Thus began the saga to figure out what was doing that. Firing up inotifywatch, I saw it was systemd-udevd or its udev-worker. But cranking up logging on that to maximum only showed me that uaccess was somehow doing this. I started digging. uaccess turned out to be almost entirely undocumented. People say to use it, but there's no description of what it does or how. Its purpose appears to be to grant access to devices to those logged in to a machine, by dynamically adding them to ACLs for devices. OK, that's a nice goal, but why was machine A doing this and not machine B? I dug some more. I came across a hint that uaccess may only do that for a "seat". A seat? I've not heard of that in Linux before. Turns out there's some information (older and newer) about this out there. Sure enough, on the machine with KDE, loginctl list-sessions shows me on seat0, but on the machine where I log in from ttyUSB0, it shows an empty seat. But how to make myself part of the seat? I tried various udev rules to add the seat or master-of-seat tags, but nothing made any difference. I finally gave up and did the old-fashioned rule to just make it work already:
TAG=="security-device",SUBSYSTEM=="hidraw",GROUP="mygroup"
I still don't know how to teach logind to add a seat for ttyUSB0, but oh well. At least I learned something. An annoying something, but hey. This all had a laudable goal, but when there are so many layers of indirection, poorly documented, with poor logging, it gets pretty annoying.
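If you want to poke at this yourself, the relevant tools are all standard logind/udev utilities; only the device path below is specific to my setup:
$ loginctl list-sessions                           # who is logged in, and on which seat (if any)
$ loginctl show-session $XDG_SESSION_ID -p Seat    # the seat (or lack thereof) for the current session
$ loginctl seat-status seat0                       # devices logind has attached to seat0
$ getfacl /dev/hidraw0                             # is the uaccess ACL there?
$ udevadm info /dev/hidraw0 | grep -iE 'tags|seat' # which uaccess/seat tags udev put on the device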

10 April 2025

John Goerzen: Announcing the NNCPNET Email Network

From 1995 to 2019, I ran my own mail server. It began with a UUCP link, which for me then meant an expensive long-distance call. Later, I ran a mail server in my apartment, then ran it as a VPS at various places. But running an email server got difficult. You can't just run it on a residential IP. Now there's SPF, DKIM, DMARC, and TLS to worry about. I recently reviewed mail hosting services, and don't get me wrong: I still use one, and probably will, because things like email from my bank are critical. But we've lost the ability to tinker, to experiment, to have fun with email. Not anymore. NNCPNET is an email system that runs atop NNCP. I've written a lot about NNCP, including a less-ambitious article about point-to-point email over NNCP 5 years ago. NNCP is to UUCP what ssh is to telnet: a modernization, with modern security and features. NNCP is an asynchronous, onion-routed, store-and-forward network. It can use as a transport anything from the Internet to a USB stick. NNCPNET is a set of standards, scripts, and tools to facilitate a broader email network using NNCP as the transport. You can read more about NNCPNET on its wiki! The easy mode is to use the Docker container (multi-arch, so you can use it on your Raspberry Pi) I provide, which bundles everything needed. It is open to all. The homepage has a more extensive list of features. I even have mailing lists running on NNCPNET; see the interesting addresses page for more details. There is extensive documentation, and of course the source to the whole thing is available. The gateway to Internet SMTP mail is off by default, but can easily be enabled for any node. It is a full participant, in both directions, with SPF, DKIM, DMARC, and TLS. You don't need any inbound ports for any of this. You don't need an always-on Internet connection. You don't even need an Internet connection at all. You can run it from your laptop and still use Thunderbird to talk to it via its optional built-in IMAP server.
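To give a flavor of the underlying transport, here is roughly what a sneakernet exchange looks like with plain NNCP itself (this is generic NNCP usage, not NNCPNET's mail tooling; the node name is made up, and check the NNCP manual for the exact invocations your version expects):
$ nncp-file ./message.txt remotenode:    # queue a file for the node "remotenode"
$ nncp-xfer /media/usb                   # exchange queued packets with a directory on the USB stick
# ...carry the stick to the other machine and run nncp-xfer there as well...
$ nncp-toss                              # process whatever arrived in the inbound queue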

8 January 2025

John Goerzen: Censorship Is Complicated: What Internet History Says about Meta/Facebook

In light of this week's announcement by Meta (Facebook, Instagram, Threads, etc.), I have been pondering this question: Why am I, a person that has long been a staunch advocate of free speech and encryption, leery of sites that talk about being free speech-oriented? And, more to the point, why am I, a person that has been censored by Facebook for mentioning the Open Source social network Mastodon, not cheering a "lighter touch"? The answers are complicated, and take me back to the early days of social networking. Yes, I mean the 1980s and 1990s. Before digital communications, there were barriers to reaching a lot of people. Especially money. This led to a sort of self-censorship: it may be legal to write certain things, but would a newspaper publish a letter to the editor containing expletives? Probably not. As digital communications started to happen, suddenly people could have their own communities. Not just free from the same kinds of monetary pressures, but free from outside oversight (parents, teachers, peers, community, etc.). When you have a community that the majority of people lack the equipment to access, and wouldn't understand how to access even if they had the equipment, you have a place where self-expression can be unleashed. And, as J. C. Herz covers in what is now an unintentional history (her book Surfing on the Internet was published in 1995), self-expression WAS unleashed. She enjoyed the wit and expression of everything from odd corners of Usenet to the text-based open world of MOOs and MUDs. She even talks about groups dedicated to insults (flaming) in positive terms. But as I've seen time and again, if there are absolutely no rules, then whenever a group gets big enough (more than a few dozen people, say), there are troublemakers that ruin it for everyone. Maybe it's trolling, maybe it's vicious attacks, you name it: it will arrive and it will be poisonous. I remember the debates within the Debian community about this. Debian is one of the pillars of the Internet today, a nonprofit project with free speech in its DNA. And yet there were inevitably the poisonous people. Debian took too long to learn that allowing those people to run rampant was causing more harm than good, because having a well-worn Delete key and a tolerance for insults became a requirement for being a Debian developer, and that drove away people that had no desire to deal with such things. (I should note that Debian strikes a much better balance today.) But in reality, there were never absolutely no rules. If you joined a BBS, you used it at the whim of the owner (the sysop, or system operator). The sysop may be a 16-yr-old running it from their bedroom, or a retired programmer, but in any case they were letting you use their resources for free and they could kick you off for any or no reason at all. So if you caused trouble, or perhaps insulted their cat, you're banned. But, in all but the smallest towns, there were other options you could try. On the other hand, sysops enjoyed having people call their BBSs and didn't want to drive everyone off, so there was a natural balance at play. As networks like Fidonet developed, a sort of uneasy approach kicked in: don't be excessively annoying, and don't be easily annoyed. Like it or not, it seemed to generally work. A BBS that repeatedly failed to deal with troublemakers could risk removal from Fidonet. On the more institutional Usenet, you generally got access through your university (or, in a few cases, employer). 
Most universities didn't really even know they were running a Usenet server, and you were generally left alone. Until you did something that annoyed somebody enough that they tracked down the phone number for your dean, in which case real-world consequences would kick in. A site may face the Usenet Death Penalty (delinking from the network) if they repeatedly failed to prevent malicious content from flowing through their site. Some BBSs let people from minority communities such as LGBTQ+ thrive in a place of peace from tormentors. A lot of them let people be themselves in a way they couldn't be in "real life". And yes, some harbored trolls and flamers. The point I am trying to make here is that each BBS, or Usenet site, set their own policies about what their own users could do. These had to be harmonized to a certain extent with the global community, but in a certain sense, with BBSs especially, you could just use a different one if you didn't like what the vibe was at a certain place. That this free speech ethos survived was never inevitable. There were many attempts to regulate the Internet, and it was thanks to the advocacy of groups like the EFF that we have things like strong encryption and a degree of freedom online. With the rise of the very large platforms (and here I mean CompuServe and AOL at first, and then Facebook, Twitter, and the like later), the low-friction option of just choosing a different place started to decline. You could participate on a Fidonet forum from any of thousands of BBSs, but you could only participate in an AOL forum from AOL. The same goes for Facebook, Twitter, and so forth. Not only that, but as social media became conceived of as very large sites, it became impossible for a person with enough skill, funds, and time to just start a site themselves. Instead of needing a few thousand dollars of equipment, you'd need tens or hundreds of millions of dollars of equipment and employees. All that means you can't really run Facebook as a nonprofit. It is a business. It should be absolutely clear to everyone that Facebook's mission is not the one they say it is: "[to] give people the power to build community and bring the world closer together." If that was their goal, they wouldn't be creating AI users and AI spam and all the rest. Zuck isn't showing courage; he's sucking up to Trump, and those that will pay the price are those that always do: women and minorities. Really, the point of any large social network isn't to build community. It's to make the owners their next billion. They do that by convincing people to look at ads on their site. Zuck is as much a windsock as anyone else; he will adjust policies in whichever direction he thinks the wind is blowing so as to let him keep putting ads in front of eyeballs, and stomp all over principles, even free speech, doing it. Don't expect anything different from any large commercial social network either. Bluesky is going to follow the same trajectory as all the others. The problem with a one-size-fits-all content policy is that the world isn't that kind of place. For instance, I am a pacifist. There is a place for a group where pacifists can hang out with each other, free from the noise of the debate about pacifism. And there is a place for the debate. Forcing everyone that signs up for the conversation to sign up for the debate is harmful. Preventing the debate is often also harmful. One company can't square this circle. Beyond that, the fact that we care so much about one company is a problem on two levels. 
First, it indicates how susceptible people are to misinformation and such. I don't have much to offer on that point. Secondly, it indicates that we are too centralized. We have a solution there: Mastodon. Mastodon is a modern, open source, decentralized social network. You can join any instance, easily migrate your account from one server to another, and so forth. You pick an instance that suits you. There are thousands of others you can choose from. Some aggressively defederate with instances known to harbor poisonous people; some don't. And, to harken back to the BBS era, if you have some time, some skill, and a few bucks, you can run your own Mastodon instance. Personally, I still visit Facebook on occasion because some people I care about are mainly there. But it is such a terrible experience that I rarely do. Meta is becoming irrelevant to me. They are on a path to becoming irrelevant to many more as well. Maybe this is the moment to go "shrug, this sucks" and try something better. (And when you do, feel free to say hi to me at @jgoerzen@floss.social on Mastodon.)

16 May 2024

John Goerzen: Review of Reputable, Functional, and Secure Email Service

I last reviewed email services in 2019. That review focused a lot of attention on privacy. At the time, I selected mailbox.org as my provider, and have been using them for the 5 years since. However, both their service and their support have gone significantly downhill, so it is time for me to look at other options. Here I am focusing strongly on email. Some of the providers mentioned here provide other services (IM, video calls, groupware, etc.), and to the extent they do, I am ignoring them.

What Matters in 2024
I want to start off by acknowledging that what you need in email probably depends on your circumstances and the country in which you live. For me, I begin by naming that the largest threat most of us face isn't from state actors but from criminals: hackers, ransomware gangs, etc. It is important to take as many steps as possible to secure one's account against that. Privacy and security are both part of the mix. I still value privacy, but I am acknowledging, as Migadu does, that "email as we know it and encryption are incompatible." Although some of these services strongly protect parts of the conversation, the reality is that most people will be emailing people using plain old email services which don't. For stronger security, something like Signal would be needed. (I wrote about Signal in 2021 also.) Interestingly, OpenPGP support seems to be something of a standard feature in the providers I reviewed at this point. All or almost all of them provide integration with browser-based encryption as well as server-side encryption if you prefer that. Although mailbox.org can automatically PGP-encrypt every message that arrives in plaintext, for general use, this is unwieldy; there isn't good tooling for searching mailboxes where every message is encrypted, etc. So I never enabled that feature at Mailbox. I still value security and privacy, but a pragmatic approach addresses the most pressing threats first.

My criteria
The basic requirements for an email service include:
  1. Ability to use my own domains
  2. Strong privacy policy
  3. Ability for me to use my own IMAP and SMTP clients on both desktop and mobile
  4. It must be extremely reliable
  5. It must not be free
  6. It must have excellent support for those rare occasions when it is needed
  7. Support for basic aliases
Why do I say it must not be free? Because if someone is providing a service with the quality I'm talking about here, and not charging for it, it implies something is fishy: either they are unscrupulous, are financially unstable, or the real product is something else, like ads. I am not aware of any provider that matches the other criteria with a free account anyhow. These providers range from about $30 to $90 per year, so cheaper than a Netflix subscription. Immediately, this rules out several options:
  • Proton doesn't let me use my own clients on mobile (their bridge is desktop-only)
  • Tuta also doesn't let me use my own clients
  • Posteo doesn't let me use my own domain
  • mxroute.com lacks a strong privacy policy, and its policy has numerous causes for concern (for instance, "If you repeatedly send email to invalid/unroutable recipients, they may be published on our GitHub")
I will have a bit more to say about a couple of these providers below. There are some additional criteria that are strongly desired but not absolutely required:
  1. Ability to set individual access passwords for every device/app
  2. Support for two-factor authentication (2FA/TFA/TOTP) for web-based access
  3. Support for basics in filtering: ability to filter on envelope recipient (so if I get BCC'd, I can still filter), and ability to execute more than one action on filter match (e.g., deliver to two folders, or deliver to a folder and forward to someone else)
IMAP and SMTP don't really support 2FA, so by setting individual passwords for every device, you can at least limit the blast radius and cut off a specific device if something is (or might be) compromised.

The candidates
I considered these providers: Startmail, Mailfence, Runbox, Fastmail, Kolab, Mailbox.org, and Migadu. I'll review each, and highlight the pricing of the plan I would most likely use. Each provider offers multiple plans; some may be more expensive and some may be cheaper than the one I reviewed. I included a link to each provider's full pricing information so you can compare for your needs. I set up trials with each of these (except Mailbox.org, with which I already had a paid account). It so happened that I had actual questions for support for each one, which gave me an opportunity to see how support responded. I did not fabricate questions, and would not have contacted support if I didn't have real ones. (This means that I asked different questions of each provider, because they were the REAL questions I had.) I'll jump to the spoiler right now: I eventually chose Migadu, with Fastmail and Mailfence as close seconds. I looked for providers myself, and also solicited recommendations in a Mastodon thread.

Mailbox.org
I begin with Mailbox, as it was my top choice in 2019 and the incumbent. Until this year, I had been quite happy with it. I had cause to reach their support less than once a year on average, and each time they replied the same day or next day. Now, however, they are failing on reliability and on support. Their spam filter has become overly aggressive. It has blocked quite a bit of legitimate mail. When contacting their support about a prior issue earlier this year, they initially took 4 days to reply, and then 6 days to reply after that. Ouch. They had me disable some spam settings. It didn't really help. I continue to lose mail. I don't know how much, because they block a lot of it before it even hits the spam folder. One of my friends texted to say mail was dropping. I raised a new ticket with Mailbox, which took them 5 days to reply to. Their reply was unhelpful. "As the Internet is not a static system, unforeseen events can always occur." Well yes, that's true, and I get it, false positives exist with email. But this was from an ISP's mail system with an address that had been established for years, and it was part of a larger pattern of rejecting quite a bit of legit mail. And every interaction with them recently hasn't resulted in them actually doing anything to resolve anything. It's just a paragraph or two of reply that does nothing and helps nothing. When I complained that it took 5 days to reply, they said "We have not been able to reply sooner as we are currently experiencing a high volume of customer enquiries." Even though their SLA for my account is a not-great 48-business-hour turnaround, they still missed it, and their reason is "we're busy." I finally asked what RBL had caught the blocked email, since when I checked, the sender wasn't on any RBL. Mailbox's reply: they only keep their logs for 7 days, so next time I should contact them within 7 days. Which, of course, I DID; it was them that kept delaying. Ugh! It's like they've become a cable company. Even worse is how they have been blocking mail from GrapheneOS's discussion forum. See their thread about it. In short, Graphene's mail server has a clean reputation and Mailbox has no problem with it. But because one of Graphene's IPv6 webservers has an IPv6 allocation of a size Mailbox doesn't like, they drop mail. It's ridiculous, and Mailbox was dismissive of this well-known and well-regarded Open Source project. So if the likes of GrapheneOS can't get a good-faith effort to deliver their mail, what chance does an individual like me have? I'm sorry, but I'm literally paying you to deliver email for me and provide good support. If you can't do either of those, you don't get to push that problem down onto me. Hire appropriate staff. On the technical side, they support aliases, my own clients, and have a reasonable privacy policy. Their 2FA support exists for the web interface (though, weirdly, not the support site), and it is somewhat weird. They do not support app passwords. A somewhat unique feature is the @secure.mailbox.org domain. If you try to receive mail at that address, mailbox.org will block it unless it uses TLS. Same for sending. This isn't E2EE, but it does at least require things not be in plaintext for the last hop to Mailbox. Verdict: not recommended due to poor reliability and support. Mailbox.Org summary:
  • Website: https://mailbox.org/en/
  • Reliability: iffy due to over-aggressive spam filtering
  • Support: Poor; takes 4-6 days for a reply and replies are unhelpful
  • Individual access passwords: No
  • 2FA: Yes, but with a PIN instead of a password as the other factor
  • Filtering: Full SIEVE feature set and GUI editor
  • Spam settings: greylisting on/off, reject some/all spam, etc. But they're insufficient to address Mailbox's overzealousness, which support says I cannot work around within the interface.
  • Server storage location: Germany
  • Plan as reviewed: standard [pricing link]
    • Cost per year: EUR 30 (about $33)
    • Mail storage included: 10GB
    • Limits on send/receive volume: none
    • Aliases: 50 on your domain name, 25 on mailbox.org
    • Additional mailboxes: Available; each one at the same fee as the primary mailbox

Startmail
I really wanted to like Startmail. Its vault is an interesting idea and should contribute to the security and privacy of an account. They clearly care about privacy. It falls down in filtering. They have no way to filter on envelope recipient (BCC or similar). Their support confirmed this to me, and that's a showstopper. Startmail support was also as slow as Mailbox, taking 5 days to respond to me. Two showstoppers right there. Verdict: Not recommended due to slow support responsiveness and weak filtering. Startmail summary:
  • Website: https://www.startmail.com/
  • Reliability: Seems to be fine
  • Support: Mediocre; took 5 days for a reply, but the reply was helpful
  • Individual app access passwords: Yes
  • 2FA: Yes
  • Filtering: Poor; cannot filter on envelope recipient, and can't build filters with multiple actions
  • Spam settings: None
  • Server storage location: The Netherlands
  • Plan as reviewed: Custom domain (trial was Personal), [pricing link]
    • Cost per year: $70
    • Mail storage included: 20GB
    • Limits on send/receive volume: none
    • Aliases: unlimited, with lots of features: can set expiration, etc.
    • Additional mailboxes: not available

Kolab
Kolab Now is mainly positioned as a full groupware service, but they do have an email-only option, which I investigated. There isn't much documentation about it compared to other providers, and also not much in the way of settings. You can turn greylisting on or off. And... that's it. It has a full suite of filtering options. They set an X-Envelope-To header which you can use with the arbitrary header match to do the right thing even for BCC situations. Filters can have multiple conditions and multiple actions. It is SIEVE-based and you can download your SIEVE definitions. If you enable 2FA, you disable IMAP and SMTP; not great. Verdict: Not an impressive enough email featureset to justify going with it. Kolab Now summary:
  • Website: https://kolabnow.com/
  • Reliability: Seems to be fine
  • Support: Fine responsiveness (next day)
  • Individual app passwords: No
  • 2FA: Yes, but if you enable it, they disable IMAP and SMTP
  • Filtering: Excellent
  • Spam settings: Only greylisting on/off
  • Server storage location: Switzerland; they have lots of details on their setup
  • Plan as reviewed: Just email [pricing link]
    • Cost per year: CHF 60, about $66
    • Mail storage included: 5GB
    • Limitations on send/receive volume: None
    • Aliases: Yes. Not sure if there are limits.
    • Additional mailboxes: Yes, if you set up a group account; the flexible pricing based on user count is not documented anywhere I could find.

Mailfence
Mailfence is another option, somewhat similar to Startmail but without the unique vault. I had some questions about filters, and support was quite responsive, responding in a couple of hours. Some of their copy on their website is a bit misleading, but support clarified when I asked them. They do not offer encryption at rest (like most of the entries here). Mailfence's filtering system is the kind I'd like to see. It allows multiple conditions and multiple actions for each rule, and has some unique actions as well (notify by SMS or XMPP). Support says that "Recipients" matches envelope recipients. However, one omission is that I can't match on arbitrary headers; only the canned list of headers they provide. They have only two spam settings:
  • spam filter on/off
  • whitelist
Given some recent complaints about their spam filter being overly aggressive, I find this lack of control somewhat concerning. (However, I discount complaints from people begging for more features in free accounts; free won't provide the kind of service I'm looking for with any provider.) There are generally just very few settings for email as well. Verdict: Responsive and helpful support; the filtering has the right structure but lacks arbitrary header match. Could be a good option. Mailfence summary:
  • Website: https://mailfence.com/
  • Reliability: Seems to be fine
  • Support: Excellent responsiveness and helpful replies (after some initial confusion about my question of greylisting)
  • Individual app access passwords: No. You can set a per-service password (e.g., an IMAP password), but those will be shared with all devices speaking that protocol.
  • 2FA: Yes
  • Filtering: Good; only misses the ability to filter on arbitrary headers
  • Spam settings: Very few
  • Server storage location: Belgium
  • Plan as reviewed: Entry [pricing link]
    • Cost per year: $42
    • Mail storage included: 10GB, with a maximum of 50,000 messages
    • Limits on send/receive volume: none
    • Aliases: 50. Aliases can't be deleted once created (there may be an exception to this for aliases on your own domain rather than mailfence.com)
    • Additional mailboxes: Their page on this is a bit confusing, and the pricing page lacks the information promised. It looks like you can pay the same $42/year for additional mailboxes, with a limit of up to 2 additional paid mailboxes and 2 additional free mailboxes tied to the account.

Runbox
This one came recommended in a Mastodon thread. I had some questions about it, and the support response was fantastic: I heard from two people that were co-founders of the company! Even within hours, on a weekend. Incredible! This kind of response was only surpassed by Migadu. I initially wrote to Runbox with questions about the incoming and outgoing message limits, which I hadn't seen elsewhere, as well as the bandwidth limit. They said the bandwidth limit is no longer enforced on paid accounts. The incoming and outgoing limits are enforced, and all email (even spam) counts towards the limit. Notably, the outgoing limit is per recipient, so if you send 10 messages to your 50-recipient family group, that's 500 recipients and you've hit the daily limit. However, they also indicated a willingness to reset the limit if something happens. Unfortunately, hitting the limit results in a hard bounce (SMTP 5xx) rather than a temporary failure (SMTP 4xx), so it can result in lost mail. This means I'd be worried about some attack or other weirdness causing me to lose mail. Their filter is a pain point. Here are the challenges:
  • You can't directly match on a BCC recipient. Support advised to use a headers match, which will search for something anywhere in the headers. This works and is probably good enough since this data is in the Received: headers, but it is a little more imprecise.
  • They only have a "contains", not an "equals" operator. So, for instance, a pattern searching for "test@example.com" would also match "newtest@example.com". Support advised to put the email address in angle brackets to avoid this. That will work, mostly. Angle brackets aren't always required in headers.
  • There is no way to have multiple actions on the filter (there is just no way to file an incoming message into two folders). This was the ultimate showstopper for me.
Support advised they are planning to upgrade the filter system in the future, but these are the limitations today. Verdict: A good option if you don't need much from the filtering system. Lots of privacy emphasis. Runbox summary:
  • Website: https://runbox.com/
  • Reliability: Seems to be fine, except returning 5xx codes if per-day limits are exceeded
  • Support: Excellent responsiveness and replies from founders
  • Individual app passwords: Yes
  • 2FA: Yes
  • Filtering: Poor
  • Spam settings: Very few
  • Server storage location: Norway
  • Plan as reviewed: Mini [pricing link]
    • Cost per year: $35
    • Mail storage included: 10GB
    • Limits on send/receive volume: Receive 5000 messages/day, Send 500 recipients/day
    • Aliases: 100 on runbox.com; unlimited on your own domain
    • Additional mailboxes: $15/yr each, also with 10GB non-shared storage per mailbox

Fastmail
Fastmail came recommended to me by a friend I've known for decades. Here's the thing about Fastmail, compared to all the services listed above: It all just works. Everything. Filtering, spam prevention, it is all there, all feature-complete, and it all just does the right thing as you'd hope. Their filtering system has a canned dropdown for "To/Cc/Bcc", it supports multiple conditions and multiple actions, and just does the right thing. (Delivering to multiple folders is a little cumbersome but possible.) It has a particularly strong feature set around administering multiple accounts, including things like whether users can prevent admins from reading their mail. The not-so-great part of the picture is around privacy. Fastmail is based in Australia, where the government has extensive power around spying on data, even to the point of forcing companies to add wiretap capabilities. Fastmail's privacy policy states user data may be held in Australia, USA, India, and the Netherlands. By default, they share data with unidentified "spam companies", though you can disable this in settings. On the other hand, they do make a good effort towards privacy. I contacted support with some questions and got back a helpful response in three hours. However, one of the questions was about in which countries my particular data would be stored, and the support response said they would have to get back to me on that. It's been several days and no word back. Verdict: A featureful option that "just works", with a lot of features for managing family accounts and the like, but lacking in the privacy area. Fastmail summary:
  • Website: https://www.fastmail.com/
  • Reliability: Seems to be fine
  • Support: Good response time on most questions; dropped the ball on one that required research
  • Individual app access passwords: Yes
  • 2FA: Yes
  • Filtering: Excellent
  • Spam settings: Can set filter aggressiveness, decide whether to share spam data with "spam-fighting companies", configure how to handle backscatter spam, and evaluate the personal learning filter.
  • Server storage locations: Australia, USA, India, and The Netherlands. Legal jurisdiction is Australia.
  • Plan as reviewed: Individual [pricing link]
    • Cost per year: $60
    • Mail storage included: 50GB
    • Limits on send/receive volume: 300/hour
    • Aliases: Unlimited from what I can see
    • Additional mailboxes: No; requires a different plan for that

Migadu
Migadu was a service I'd never heard of, but it came recommended to me on Mastodon. I listed Migadu last because it is in a class of its own compared to all the other options. Every other service is basically a webmail interface with a few extra settings tacked on. Migadu has a full-featured email admin console in addition. By that I mean you can:
  • View usage graphs (incoming, outgoing, storage) over time
  • Manage DNS (if you want Migadu to run your nameservers)
  • Manage multiple domains, and cross-domain relationships with mailboxes
  • View a limited set of logs
  • Configure accounts, reset their passwords if needed/authorized, etc.
  • Configure email address rewrite rules with wildcards and so forth
Basically, if you were the sort of person that ran your own mail servers back in the day, here is Migadu giving you most of that functionality. Effectively you have a web interface to do all the useful stuff, and they handle the boring and annoying bits. This is a really attractive model. Migadu support has been fantastic. They are quick to respond, and went above and beyond. I pointed out that their X-Envelope-To header, which is needed for filtering by BCC, wasn't being added on emails I sent myself. They replied 5 hours later indicating they had added the feature to add X-Envelope-To even for internal mails! Wow! I am impressed. With Migadu, you buy a pool of resources: storage space and incoming/outgoing traffic. What you do within that pool is up to you. You can set up users ("mailboxes"), aliases, domains, whatever you like. It all just shares the pool. You can restrict users further so that an individual user has access to only a subset of the pool resources. I was initially concerned about Migadu's daily send/receive message count limits, but in visiting with support and reading the documentation, what really comes out is that Migadu is a service with a personal touch. Hitting the incoming traffic limit will cause an SMTP temporary fail (4xx) response, so you won't lose legit mail, and support will work with you if it's a problem for legit uses. In other words, restrictions are soft and they are interpreted reasonably. One interesting thing about Migadu is that they do not offer accounts under their domain. That is, you MUST bring your own domain. That's pretty easy and cheap, of course. It also puts you in a position of power, because it is easy to migrate email from one provider to another if you own the domain. Filtering is done via SIEVE. There is a GUI editor which lets you accomplish most things, though it has an odd blind spot where you can't file a message into multiple folders. However, you can edit a SIEVE ruleset directly and you get the full SIEVE featureset, which is extensive (and does support filing a message into multiple folders). I note that the SIEVE :envelope match doesn't work, but Migadu adds an X-Envelope-To header which is just as good (a sketch of such a rule follows the summary below). I particularly love a company that tells you all the reasons you might not want to use them. Migadu's pro/con list is an honest drawbacks list (of course, their homepage highlights all the features!). Verdict: Fantastically powerful, excellent support, and good privacy. I chose this one. Migadu summary:
  • Website: https://migadu.com/
  • Reliability: Excellent
  • Support: Fantastic. Good response times and they added a feature (or fixed a bug?) a few hours after I requested it.
  • Individual access passwords: Yes. Create identities to support them.
  • 2FA: Yes, on both the admin interface and the webmail interface
  • Filtering: Excellent, based on SIEVE. The GUI editor doesn't support multiple actions when filing into a folder, but full SIEVE functionality is exposed.
  • Spam settings:
    • On the domain level, filter aggressiveness, Greylisting on/off, black and white lists
    • On the mailbox level, filter aggressiveness, black and whitelists, action to take with spam; compatible with filters.
  • Server storage location: France; legal jurisdiction Switzerland
  • Plan as reviewed: mini [pricing link]
    • Cost per year: $90
    • Mail storage included: 30GB (soft quota)
    • Limits on send/receive volume: 1000 messages in/day, 100 messages out/day (soft quotas)
    • Aliases: Unlimited on an unlimited number of domains
    • Additional mailboxes: Unlimited and free; uses pooled quotas, but individual quotas can be set
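To make the filtering discussion concrete, here is a sketch of the kind of SIEVE rule I have in mind. The folder names and addresses are invented, and it leans on the X-Envelope-To header described above rather than the envelope test:
require ["fileinto", "copy"];
# File list traffic into two folders, even when I was only BCC'd,
# and forward a copy to another address as well.
if header :contains "X-Envelope-To" "lists@example.com" {
    fileinto :copy "Lists";
    redirect :copy "archive@example.com";
    fileinto "Lists/Archive";
}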

Others
Here are a few others that I didn't think worthy of getting a trial:
  • mxroute was recommended by several. Lots of concerning things in their policy, such as:
    • if you repeatedly send mail to unroutable recipients, they may publish the addresses on GitHub
    • they will terminate your account if they think you are rude or want to contest a charge
    • they reserve the right to cancel your service at any time for any (or no) reason.
  • Proton keeps coming up, and I will not consider it so long as I am locked into their client on mobile.
  • Skiff comes up sometimes, but they were acquired by Notion.
  • Disroot comes up; this discussion highlights a number of reasons why I avoid them. Their Terms of Service (ToS) is inconsistent with a general-purpose email account (I guess for targeting nonprofits and activists, that could make sense). Particularly laughable is that they claim to be friends of Open Source, but then would take down your account if you upload copyrighted material. News flash: in order for an Open Source license to be meaningful, the underlying work is copyrighted. It is perfectly legal to upload copyrighted material when you wrote it or have the license to do so!

Conclusions
There are a lot of good options for email hosting today, and in particular I appreciate the excellent personal support from companies like Migadu and Runbox. Support small businesses!

6 April 2024

John Goerzen: Facebook is Censoring Stories about Climate Change and Illegal Raid in Marion, Kansas

It is, sadly, not entirely surprising that Facebook is censoring articles critical of Meta. The Kansas Reflector published an article about Meta censoring environmental articles about climate change, deeming them "too controversial". Facebook then censored the article about Facebook censorship, and then after an independent site published a copy of the climate change article, Facebook censored it too. The CNN story says Facebook apologized and said it was a mistake and was fixing it. Color me skeptical, because of what I saw today. Yes, that's right: today, April 6, I get a notification that they removed a post from August 12. The notification was dated April 4, but only showed up for me today. I wonder why my post from August 12 was fine for nearly 8 months, and then all of a sudden, when the same website runs an article critical of Facebook, my 8-month-old post is a problem. Hmm. Riiiiiight. "Cybersecurity." This isn't even the first time they've done this to me. On September 11, 2021, they removed my post about the social network Mastodon (click that link for a screenshot). A post that, incidentally, had been made 10 months prior to being removed. While they ultimately reversed themselves, I subsequently wrote Facebook's Blocking Decisions Are Deliberate, Including Their Censorship of Mastodon. That this same pattern has played out a second time, again with something that is a very slight challenge to Facebook, seems to validate my conclusion. Facebook lets all sorts of hateful garbage infest their site, but anything about climate change or their own censorship gets removed, and this pattern persists for years. There's a reason I prefer Mastodon these days. You can find me there as @jgoerzen@floss.social. So. I've written this blog post. And then I'm going to post it to Facebook. Let's see if they try to censor me for a third time. Bring it, Facebook.

3 January 2024

John Goerzen: Consider Security First

I write this in the context of my decision to ditch Raspberry Pi OS and move everything I possibly can, including my Raspberry Pi devices, to Debian. I will write about that later. But for now, I wanted to comment on something I think is often overlooked and misunderstood by people considering distributions or operating systems: the huge importance of getting security updates in an automated and easy way.

Background
Let's assume that these statements are true, which I think are well-supported by available evidence:
  1. Every computer system (OS plus applications) that can do useful modern work has security vulnerabilities, some of which are unknown at any given point in time;
  2. During the lifetime of that computer system, some of these vulnerabilities will be discovered. For a (hopefully large) subset of those vulnerabilities, timely patches will become available.
Now then, it follows that applying those timely patches is a critical part of having a system that is as secure as possible. Of course, you have to do other things as well (good passwords, secure practices, etc.), but, fundamentally, if your system lacks patches for known vulnerabilities, you've already lost at the security ballgame.

How to stay patched
There is something of a continuum of how you might patch your system. It runs roughly like this, from best to worst:
  1. All components are kept up-to-date automatically, with no intervention from the user/operator
  2. The operator is automatically alerted to necessary patches, and they can be easily installed with minimal intervention
  3. The operator is automatically alerted to necessary patches, but they require significant effort to apply
  4. The operator has no way to detect vulnerabilities or necessary patches
It should be obvious that the first situation is ideal. Every other situation relies on the timeliness of human action to keep up-to-date with security patches. This is a fallible situation; humans are busy, take trips, dismiss alerts, miss alerts, etc. That said, it is rare to find any system living truly all the way in that scenario, as you'll see.

What is your "system"?
A critical point here is: what is your "system"? It includes:
  • Your kernel
  • Your base operating system
  • Your applications
  • All the libraries needed to run all of the above
Some OSs, such as Debian, make little or no distinction between the base OS and the applications. Others, such as many BSDs, have a distinction there. And in some cases, people will compile or install applications outside of any OS mechanism. (It must be stressed that by doing so, you are taking the responsibility of patching them on your own shoulders.)

How do common systems stack up?
  • Debian, with its support for unattended-upgrades, needrestart, debian-security-support, and such, is largely category 1. It can automatically apply security patches, in most cases can restart the necessary services for the patch to take effect, and will alert you when some processes or the system must be manually restarted for a patch to take effect (for instance, a kernel update). Those cases requiring manual intervention are category 2. The debian-security-support package will even warn you of gaps in the system. You can also use debsecan to scan for known vulnerabilities on a given installation. (A minimal setup sketch follows this list.)
  • FreeBSD has no way to automatically install security patches for things in the packages collection. As with many rolling-release systems, you can't automate the installation of these security patches with FreeBSD because it is not safe to blindly update packages. It's not safe to blindly update packages because they may bring along more than just security patches: they may represent major upgrades that introduce incompatibilities, etc. Unlike Debian's practice of backporting fixes and thus producing narrowly-tailored patches, forcing upgrades to newer versions precludes a minimal-intervention install. Therefore, rolling-release systems are category 3.
  • Things such as Snap, Flatpak, AppImage, Docker containers, Electron apps, and third-party binaries often contain embedded libraries and such for which you have no easy visibility into their status. For instance, if there was a bug in libpng, would you know how many of your containers had a vulnerability? These systems are category 4: you don't even know if you're vulnerable. It's for this reason that my Debian-based Docker containers apply security patches before starting processes, and also run unattended-upgrades and friends.
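For the Debian case, a minimal version of that setup looks something like this (the package names are the ones mentioned above; dpkg-reconfigure writes the stock Debian periodic-upgrade configuration):
$ sudo apt install unattended-upgrades needrestart debian-security-support debsecan
$ sudo dpkg-reconfigure -plow unattended-upgrades   # writes the stock /etc/apt/apt.conf.d/20auto-upgrades
$ debsecan --suite bookworm --only-fixed            # known CVEs with fixes available (adjust the suite to your release)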

The pernicious library problem
As mentioned in my last category above, hidden vulnerabilities can be a big problem. I've been writing about this for years. Back in 2017, I wrote an article focused on Docker containers, but which applies to the other systems like Snap and so forth. I cited a study back then that "Over 80% of the :latest versions of official images contained at least one high severity vulnerability." The situation is no better now. In December 2023, it was reported that, two years after the critical Log4Shell vulnerability, 25% of apps were still vulnerable to it. Also, only 21% of developers ever update third-party libraries after introducing them into their projects. Clearly, you can't rely on these images with embedded libraries to be secure. And since they are black boxes, they are difficult to audit. Debian's policy of always splitting libraries out into their own packages is hugely beneficial; it allows fine-grained analysis of not just vulnerabilities, but also the dependency graph. If there's a vulnerability in libpng, you have one place to patch it and you also know exactly what components of your system use it. If you use snaps or AppImages, you can't know if they contain a deeply embedded vulnerability, nor could you patch it yourself if you even knew. You are at the mercy of upstream detecting and remedying the problem, a dicey situation at best.
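As a concrete illustration of that last point, Debian lets you ask exactly which installed packages use a given library (libpng16-16 is the current libpng package name; adjust for your release):
$ apt-cache rdepends --installed libpng16-16   # every installed package depending on libpng
$ dpkg -s libpng16-16 | grep ^Version          # the single version you would need to patch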

Who makes the patches?
Fundamentally, humans produce security patches. Often, but not always, patches originate with the authors of a program and then are integrated into distribution packages. It should be noted that every security team has finite resources; there will always be some CVEs that aren't patched in a given system for various reasons; perhaps they are not exploitable, or are too low-impact, or have better mitigations than patches. Debian has an excellent security team; they manage the process of integrating patches into Debian, produce Debian Security Advisories, maintain the Debian Security Tracker (which maintains cross-references with the CVE database), etc. Some distributions don't have this infrastructure. For instance, I was unable to find this kind of tracker for Devuan or Raspberry Pi OS. In contrast, Ubuntu and Arch Linux both seem to have active security teams with trackers and advisories.

Implications for Raspberry Pi OS and others
As I mentioned above, I'm transitioning my Pi devices off Raspberry Pi OS (Raspbian). Security is one reason. Although Raspbian is a fork of Debian, and you can install packages like unattended-upgrades on it, they don't work right because they use the Debian infrastructure, and Raspbian hasn't modified them to use their own infrastructure. I don't see any Raspberry Pi OS security advisories, trackers, etc. In short, they lack the infrastructure to support those Debian tools anyhow. Not only that, but Raspbian lags behind Debian in both new releases and new security patches, sometimes by days or weeks. Live Migrating from Raspberry Pi OS bullseye to Debian bookworm contains instructions for migrating Raspberry Pis to Debian.

17 June 2023

John Goerzen: Using dar for Data Archiving

This is the third post in a series about data archiving to removable media (optical discs and hard drives). In the first, I explained the difference between backing up and archiving, established goals for the project, and said I'd evaluate git-annex and dar. The second post evaluated git-annex, and now it's time to look at dar. The series will conclude with a post comparing git-annex with dar.

What is dar?
I could open with the same thing I did with git-annex, just changing the name of the program: "[dar] is a fantastic and versatile program that does... well, it's one of those things that can do so much that it's a bit hard to describe." It is, fundamentally, an archiver like tar or zip (makes one file representing a bunch of other files), but it goes far beyond that. dar's homepage lays out a comprehensive list of features, which I will try to summarize here. So, to tie this together for this project, I will set up a 400MB slice size (to mimic what I did with git-annex) and see how dar saves the data and restores it. Isolated catalogs aren't strictly necessary for this, but by using them (and/or dar_manager), we can build up a database of files and locations and thus directly compare dar to git-annex location tracking.

Walkthrough: Creating the first archive
As with the git-annex walkthrough, I'll set some variables to make things easy to remember (they appear at the top of the listing below). OK, we can run the backup immediately. No special setup is needed. dar supports both short-form (single-character) parameters and long-form ones. Since the parameters probably aren't familiar to everyone, I will use the long-form ones in these examples. Here's how we create our initial full backup. I'll explain the parameters below:
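# Variables used throughout this walkthrough; the values here are taken from
# the paths that appear in the output further down.
$ SOURCEDIR=/acrypt/no-backup/jgoerzen/testdata
$ CATDIR=/acrypt/no-backup/jgoerzen/dar-testing/cat
$ DRIVE=/acrypt/no-backup/jgoerzen/dar-testing/drive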
$ dar \
--verbose \
--create $DRIVE/bak1 \
--on-fly-isolate $CATDIR/bak1 \
--slice 400M \
--min-digits 2 \
--pause \
--fs-root $SOURCEDIR
Let's look at each of these parameters:
  • --verbose: verbose output; show what dar is doing as it works
  • --create: create a new archive with the given path and basename
  • --on-fly-isolate: also write an isolated catalog (metadata only) with the given path and basename
  • --slice 400M: split the archive into slices of at most 400MB each
  • --min-digits 2: number the slices with at least two digits (bak1.01.dar rather than bak1.1.dar)
  • --pause: pause after each slice, giving a chance to swap media before the next one is written
  • --fs-root: the root of the directory tree to back up
This same command could have been written with short options as:
$ dar -v -c $DRIVE/bak1 -@ $CATDIR/bak1 -s 400M -9 2 -p -R $SOURCEDIR
What does it look like while running? Here's an excerpt:
...
Adding file to archive: /acrypt/no-backup/jgoerzen/testdata/[redacted]
Finished writing to file 1, ready to continue ? [return = YES | Esc = NO]
...
Writing down archive contents...
Closing the escape layer...
Writing down the first archive terminator...
Writing down archive trailer...
Writing down the second archive terminator...
Closing archive low layer...
Archive is closed.
--------------------------------------------
581 inode(s) saved
including 0 hard link(s) treated
0 inode(s) changed at the moment of the backup and could not be saved properly
0 byte(s) have been wasted in the archive to resave changing files
0 inode(s) with only metadata changed
0 inode(s) not saved (no inode/file change)
0 inode(s) failed to be saved (filesystem error)
0 inode(s) ignored (excluded by filters)
0 inode(s) recorded as deleted from reference backup
--------------------------------------------
Total number of inode(s) considered: 581
--------------------------------------------
EA saved for 0 inode(s)
FSA saved for 581 inode(s)
--------------------------------------------
Making room in memory (releasing memory used by archive of reference)...
Now performing on-fly isolation...
...
That was easy! Let's look at the contents of the backup directory:
$ ls -lh $DRIVE
total 3.7G
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:27 bak1.01.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:27 bak1.02.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:27 bak1.03.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:27 bak1.04.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:28 bak1.05.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:28 bak1.06.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:28 bak1.07.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:28 bak1.08.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:29 bak1.09.dar
-rw-r--r-- 1 jgoerzen jgoerzen 156M Jun 16 19:33 bak1.10.dar
And the isolated catalog:
$ ls -lh $CATDIR
total 37K
-rw-r--r-- 1 jgoerzen jgoerzen 35K Jun 16 19:33 bak1.1.dar
The isolated catalog is stored compressed automatically. Well, this was easy. With one command, we archived the entire data set, split into 400MB chunks, and wrote out the catalog data.

Walkthrough: Inspecting the saved archive
Can dar tell us which slice contains a given file? Sure:
$ dar --list $DRIVE/bak1 --list-format=slicing | less
Slice(s) [Data ][D][ EA ][FSA][Compr][S] Permission Filemane
--------+--------------------------------+----------+-----------------------------
...
1 [Saved][ ] [-L-][ 0%][X] -rwxr--r-- [redacted]
1-2 [Saved][ ] [-L-][ 0%][X] -rwxr--r-- [redacted]
2 [Saved][ ] [-L-][ 0%][X] -rwxr--r-- [redacted]
...
This illustrates the transition from slice 1 to slice 2. The first file was stored entirely in slice 1; the second was stored partially in slice 1 and partially in slice 2; and the third solely in slice 2. We can get other kinds of information as well.
$ dar --list $DRIVE/bak1 | less
[Data ][D][ EA ][FSA][Compr][S] Permission User Group Size Date filename
--------------------------------+------------+-------+-------+---------+-------------------------------+------------
[Saved][ ] [-L-][ 0%][X] -rwxr--r-- jgoerzen jgoerzen 24 Mio Mon Mar 5 07:58:09 2018 [redacted]
[Saved][ ] [-L-][ 0%][X] -rwxr--r-- jgoerzen jgoerzen 16 Mio Mon Mar 5 07:58:09 2018 [redacted]
[Saved][ ] [-L-][ 0%][X] -rwxr--r-- jgoerzen jgoerzen 22 Mio Mon Mar 5 07:58:09 2018 [redacted]
These are the same files I was looking at before. Here we see they are 24MB, 16MB, and 22MB in size, along with some additional metadata. Even more is available in the XML list format.

Walkthrough: updates
As with git-annex, I've made some changes in the source directory: moved a file, added another, and deleted one. Let's create an incremental backup now:
$ dar \
--verbose \
--create $DRIVE/bak2 \
--on-fly-isolate $CATDIR/bak2 \
--ref $CATDIR/bak1 \
--slice 400M \
--min-digits 2 \
--pause \
--fs-root $SOURCEDIR
This command is very similar to the earlier one. Instead of writing an archive and catalog named bak1, we write one named bak2. What's new here is --ref $CATDIR/bak1. That says: make an incremental based on an archive of reference. All that is needed from that archive of reference is the detached catalog. --ref $DRIVE/bak1 would have worked equally well here. Here's what I did to the $SOURCEDIR: renamed (moved) one file, added a new one (cp), and deleted another. Let's see if dar's command output matches this:
...
Adding file to archive: /acrypt/no-backup/jgoerzen/testdata/file01-unchanged
Saving Filesystem Specific Attributes for /acrypt/no-backup/jgoerzen/testdata/file01-unchanged
Adding file to archive: /acrypt/no-backup/jgoerzen/testdata/cp
Saving Filesystem Specific Attributes for /acrypt/no-backup/jgoerzen/testdata/cp
Adding folder to archive: [redacted]
Saving Filesystem Specific Attributes for [redacted]
Adding reference to files that have been destroyed since reference backup...
...
--------------------------------------------
3 inode(s) saved
including 0 hard link(s) treated
0 inode(s) changed at the moment of the backup and could not be saved properly
0 byte(s) have been wasted in the archive to resave changing files
0 inode(s) with only metadata changed
578 inode(s) not saved (no inode/file change)
0 inode(s) failed to be saved (filesystem error)
0 inode(s) ignored (excluded by filters)
2 inode(s) recorded as deleted from reference backup
--------------------------------------------
Total number of inode(s) considered: 583
--------------------------------------------
EA saved for 0 inode(s)
FSA saved for 3 inode(s)
--------------------------------------------
...
Yes, it does. The rename is recorded as a deletion and an addition, since dar doesn't directly track renames. So the rename plus the deletion account for the two deletions. The rename plus the addition of cp count as 2 of the 3 inodes saved; the third is the modified directory from which files were deleted and moved out. Let's see the files that were created:
$ ls -lh $DRIVE/bak2*
-rw-r--r-- 1 jgoerzen jgoerzen 18M Jun 16 19:52 /acrypt/no-backup/jgoerzen/dar-testing/drive/bak2.01.dar
$ ls -lh $CATDIR/bak2*
-rw-r--r-- 1 jgoerzen jgoerzen 22K Jun 16 19:52 /acrypt/no-backup/jgoerzen/dar-testing/cat/bak2.1.dar
What does list look like now?
Slice(s) [Data ][D][ EA ][FSA][Compr][S] Permission Filemane
--------+--------------------------------+----------+-----------------------------
[ ][ ] [---][-----][X] -rwxr--r-- [redacted]
1 [Saved][ ] [-L-][ 0%][X] -rwxr--r-- file01-unchanged
...
[--- REMOVED ENTRY ----][redacted]
[--- REMOVED ENTRY ----][redacted]
Here I show an example of:
  1. A file that was not changed from the initial backup. Its presence was simply noted, but because we're doing an incremental, the data wasn't saved.
  2. A file that is saved in this incremental, on slice 1.
  3. The two deleted files
Walkthrough: dar_manager
As we've seen above, the two archives (or their detached catalogs) give us a complete picture of what files were present at the time of the creation of each archive, and what files were stored in a given archive. We can certainly continue working in that way. We can also use dar_manager to build a comprehensive database of these archives, to be able to find what media is necessary to restore each given file. Or, with dar_manager's when parameter, we can restore files as of a particular date. Let's try it out. First, we create our database:
$ dar_manager --create $DARDB
$ dar_manager --base $DARDB --add $DRIVE/bak1
Auto detecting min-digits to be 2
$ dar_manager --base $DARDB --add $DRIVE/bak2
Auto detecting min-digits to be 2
Here we created the database, and added our two catalogs to it. (Again, we could just as easily have used $CATDIR/bak1; either the archive or its isolated catalog will work here.) It's important to add the catalogs in order. Let's do some quick experimentation with dar_manager:
$ dar_manager -v --base $DARDB --list
Decompressing and loading database to memory...
dar path :
dar options :
database version : 6
compression used : gzip
compression level: 9
archive #   |   path   |   basename
------------+--------------+---------------
1 /acrypt/no-backup/jgoerzen/dar-testing/drive bak1
2 /acrypt/no-backup/jgoerzen/dar-testing/drive bak2
$ dar_manager --base $DARDB --stat
archive # most recent/total data most recent/total EA
--------------+-------------------------+-----------------------
1 580/581 0/0
2 3/3 0/0
The list option shows the correlation between dar_manager archive numbers (1, 2) and filenames (bak1, bak2). It is coincidence here that 1/bak1 and 2/bak2 correlate; that's not necessarily the case. Most dar_manager commands operate on archive number, while dar commands operate on archive path/basename. Now let's see just what files are saved in archive 2, the incremental:
$ dar_manager --base $DARDB --used 2
[ Saved ][ ] [redacted]
[ Saved ][ ] file01-unchanged
[ Saved ][ ] cp
Now we can also see where a file is stored. Here's one that was saved in the full backup and unmodified in the incremental:
$ dar_manager --base $DARDB --file [redacted]
1 Fri Jun 16 19:15:12 2023 saved absent
2 Fri Jun 16 19:15:12 2023 present absent
(The "absent" at the end refers to extended attributes that the file didn't have.) Similarly, for files that were added or removed, they'll be listed only at the appropriate place. Walkthrough: Restoration I'm not going to repeat the author's full restoration with dar page, but here are some quick examples. A simple way of doing everything is using incrementals for the whole series. To do that, you'd have bak1 be full, bak2 based on bak1, bak3 based on bak2, bak4 based on bak3, etc. To restore from such a series, you have two options: play the archives back in order with dar, or use dar_manager. If you get fancy (for instance, bak2 is based on bak1, bak3 on bak2, but bak4 on bak1), then you would want to use dar_manager to ensure a consistent restore is completed. Either way, the process is nearly identical. Also, I figure, to make things easy, you can save a copy of the entire set of isolated catalogs before you finalize each disc/drive. They're so small, and this would let someone with just the most recent disc build a dar_manager database without having to go through all the other discs. Anyhow, let's do a restore using just dar. I'll make a $RESTOREDIR and do it that way.
$ dar \
--verbose \
--extract $DRIVE/bak1 \
--fs-root $RESTOREDIR \
--no-warn \
--execute "echo Ready for slice %n. Press Enter; read foo"
This --execute lets us see how dar works; it is an illustration of the power it has (beyond --pause): it's a snippet interpreted by /bin/sh, with %n being one of the dar placeholders. If memory serves, it's not strictly necessary, as dar will prompt you for slices it needs if they're not mounted. Anyhow, you'll see it first reading the last slice, which contains the catalog, then reading from the beginning. Here we go:
Auto detecting min-digits to be 2
Opening archive bak1 ...
Opening the archive using the multi-slice abstraction layer...
Ready for slice 10. Press Enter
...
Loading catalogue into memory...
Locating archive contents...
Reading archive contents...
File ownership will not be restored du to the lack of privilege, you can disable this message by asking not to restore file ownership [return = YES Esc = NO]
Continuing...
Restoring file's data: [redacted]
Restoring file's FSA: [redacted]
Ready for slice 1. Press Enter
...
Ready for slice 2. Press Enter
...
--------------------------------------------
581 inode(s) restored
including 0 hard link(s)
0 inode(s) not restored (not saved in archive)
0 inode(s) not restored (overwriting policy decision)
0 inode(s) ignored (excluded by filters)
0 inode(s) failed to restore (filesystem error)
0 inode(s) deleted
--------------------------------------------
Total number of inode(s) considered: 581
--------------------------------------------
EA restored for 0 inode(s)
FSA restored for 0 inode(s)
--------------------------------------------
The warning is because I'm not doing the extraction as root, which limits dar's ability to fully restore ownership data. OK, now the incremental:
$ dar \
--verbose \
--extract $DRIVE/bak2 \
--fs-root $RESTOREDIR \
--no-warn \
--execute "echo Ready for slice %n. Press Enter; read foo"
...
Ready for slice 1. Press Enter
...
Restoring file's data: /acrypt/no-backup/jgoerzen/dar-testing/restore/file01-unchanged
Restoring file's FSA: /acrypt/no-backup/jgoerzen/dar-testing/restore/file01-unchanged
Restoring file's data: /acrypt/no-backup/jgoerzen/dar-testing/restore/cp
Restoring file's FSA: /acrypt/no-backup/jgoerzen/dar-testing/restore/cp
Restoring file's data: /acrypt/no-backup/jgoerzen/dar-testing/restore/[redacted directory]
Removing file (reason is file recorded as removed in archive): [redacted file]
Removing file (reason is file recorded as removed in archive): [redacted file]
This all looks right! Now how about we compare the restore to the original source directory?
$ diff -durN $SOURCEDIR $RESTOREDIR
No changes: perfect. We could instead do this restore via a single dar_manager command, though annoyingly, we'd have to pass all top-level files/directories to dar_manager's restore. But still, it's one command, and it basically automates and optimizes the dar restores shown above. Conclusions Dar makes it extremely easy to just Do The Right Thing when making archives. One command makes a backup. It saves things in simple files. You can make an isolated catalog if you want, and it too is saved in a simple file. You can query what is in the files and where. You can restore from all or part of the files. You can simply play the backups forward, in order, to achieve a full and consistent restore. Or you can load data about them into dar_manager for an optimized restore. A bit of scripting will be necessary to make incrementals: finding the most recent backup or catalog, for instance. If backup files are named with care (for instance, by date), this should be a pretty easy task; a sketch of such a wrapper follows below. I haven't touched on resiliency yet. dar comes with tools for recovering archives that have had portions corrupted or lost. It can also rebuild the catalog if it is corrupted or lost. It adds tape marks (or "escape sequences") to the archive along with the data stream, so every entry in the catalog is actually stored in the archive twice: once alongside the file data, and once at the end in the collected catalog. This allows dar to scan a corrupted file for the tape marks and reconstruct whatever is still intact, even if the catalog is lost. dar also integrates with tools like sha256sum and par2 to simplify archive integrity testing and restoration. This balances against the need to use a tool (dar, optionally with a GUI frontend) to restore files. I'll discuss that more in the next post.
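To make that scripting point concrete, here is a minimal sketch (my own illustration, not from the post) of a wrapper that creates a date-stamped backup, incremental against the most recent detached catalog in $CATDIR, or full if none exists yet. It reuses the $DRIVE, $CATDIR, and $SOURCEDIR variables from the walkthrough; the bak-YYYY-MM-DD naming scheme is an assumption.
#!/bin/bash
# Sketch only: date-stamped dar backups, each incremental against the
# most recent detached catalog found in $CATDIR (full if none exists).
set -e
set -o pipefail

NAME="bak-$(date +%Y-%m-%d)"
# Most recent catalog, if any; date-stamped names sort chronologically.
PREV=$(ls "$CATDIR"/bak-*.dar 2>/dev/null | sort | tail -n 1 || true)

if [ -n "$PREV" ]; then
    REF="${PREV%.*.dar}"   # bak-2023-06-16.01.dar -> bak-2023-06-16
    dar --create "$DRIVE/$NAME" --on-fly-isolate "$CATDIR/$NAME" \
        --ref "$REF" --slice 400M --min-digits 2 --pause --fs-root "$SOURCEDIR"
else
    dar --create "$DRIVE/$NAME" --on-fly-isolate "$CATDIR/$NAME" \
        --slice 400M --min-digits 2 --pause --fs-root "$SOURCEDIR"
fi
Restores would then proceed exactly as shown above: play the bak-* archives back in date order with dar, or feed the catalogs to dar_manager.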

16 June 2023

John Goerzen: Using git-annex for Data Archiving

In my recent post about data archiving to removable media, I laid out the difference between backing up and archiving, and also said I'd evaluate git-annex and dar. This post evaluates git-annex. The next will look at dar, and then I'll make a comparison post. What is git-annex? git-annex is a fantastic and versatile program that does... well, it's one of those things that can do so much that it's a bit hard to describe. Its homepage says:
git-annex allows managing large files with git, without storing the file contents in git. It can sync, backup, and archive your data, offline and online. Checksums and encryption keep your data safe and secure. Bring the power and distributed nature of git to bear on your large files with git-annex.
I think the particularly interesting features of git-annex aren't actually included in that list. Among the features of git-annex that make it shine for this purpose, its location tracking is key. git-annex can know exactly which device has which file at which version at all times. Combined with its preferred content settings, this lets you very easily express things like the following: git-annex can be set to allow a configurable amount of free space to remain on a device, and it will fill it up with whatever copies are necessary up until it hits that limit. Very convenient! git-annex will store files in a folder structure that mirrors the origin folder structure, in plain files just as they were. This maximizes the ability for a future person to access the content, since it is all viewable without any special tool at all. Of course, for things like optical media, git-annex will essentially be creating what amounts to incrementals. To obtain a consistent copy of the original tree, you would still need to use git-annex to process (export) the archives. git-annex challenges In my prior post, I related some challenges with git-annex. The biggest of them (quite poor performance of the directory special remote when dealing with many files) has been resolved by Joey, git-annex's author! That dramatically improves the git-annex use scenario here! The fixing commit is in the source tree but not yet in a release. git-annex no doubt may still have performance challenges with repositories in the 100,000+ file range, but at that order of magnitude it now looks usable. I'm not sure about 1,000,000-file repositories (I haven't tested); there is a page about scalability. A few other, more minor challenges remain; one of them is timestamps. I worked around the timestamp issue by using the mtree-netbsd package in Debian. mtree writes out a summary of files and metadata in a tree, and can restore them. To save: mtree -c -R nlink,uid,gid,mode -p /PATH/TO/REPO -X <(echo './.git') > /tmp/spec And, after restoration, the timestamps can be applied with: mtree -t -U -e < /tmp/spec Walkthrough: initial setup To use git-annex in this way, we have to do some setup; I'll walk through my general approach as we go. Let's get started! I've set all these shell variables appropriately for this example, and REPONAME to "testdata". We'll begin by setting up the metadata-only tracking repo.
$ REPONAME=testdata
$ mkdir "$METAREPO"
$ cd "$METAREPO"
$ git init
$ git config annex.thin true
There is a sort of complicated topic of how git-annex stores files in a repo, which varies depending on whether the data for the file is present in a given repo, and whether the file is locked or unlocked. Basically, the options I use here cause git-annex to mostly use hard links instead of symlinks or pointer files, for maximum compatibility with non-POSIX filesystems such as NTFS and UDF, which might be used on these devices. thin is part of that. Let's continue:
$ git annex init 'local hub'
init local hub ok
(recording state in git...)
$ git annex wanted . "include=* and exclude=$REPONAME/*"
wanted . ok
(recording state in git...)
In a bit, we are going to import the source data under the directory named $REPONAME (here, testdata). The wanted command says: in this repository (represented by the bare dot), the files we want are matched by the rule that says everything except what's under $REPONAME. In other words, we don't want to make an unnecessary copy here. Because I expect to use an mtree file as documented above, and it is not under $REPONAME/, it will be included. Let's just add it and tweak some things.
$ touch mtree
$ git annex add mtree
add mtree
ok
(recording state in git...)
$ git annex sync
git-annex sync will change default behavior to operate on --content in a future version of git-annex. Recommend you explicitly use --no-content (or -g) to prepare for that change. (Or you can configure annex.synccontent)
commit
[main (root-commit) 6044742] git-annex in local hub
1 file changed, 1 insertion(+)
create mode 120000 mtree
ok
$ ls -l
total 9
lrwxrwxrwx 1 jgoerzen jgoerzen 178 Jun 15 22:31 mtree -> .git/annex/objects/pX/ZJ/...
OK! We've added a file, and it got transformed into a symlink. That's the thing I said we were going to avoid, so:
git annex adjust --unlock-present
adjust
Switched to branch 'adjusted/main(unlockpresent)'
ok
$ ls -l
total 1
-rw-r--r-- 2 jgoerzen jgoerzen 0 Jun 15 22:31 mtree
You'll notice it transformed into a hard link (nlinks=2) file. Great! Now let's import the source data. For that, we'll use the directory special remote.
$ git annex initremote source type=directory directory=$SOURCEDIR importtree=yes \
encryption=none
initremote source ok
(recording state in git...)
$ git annex enableremote source directory=$SOURCEDIR
enableremote source ok
(recording state in git...)
$ git config remote.source.annex-readonly true
$ git config annex.securehashesonly true
$ git config annex.genmetadata true
$ git config annex.diskreserve 100M
$ git config remote.source.annex-tracking-branch main:$REPONAME
OK, so here we created a new remote named "source". We enabled it, and set some configuration. Most notably, that last line causes files from "source" to be imported under $REPONAME/ as we wanted earlier. Now we're ready to scan the source.
$ git annex sync
At this point, you'll see git-annex computing a hash for every file in the source directory. I can verify with du that my metadata-only repo only uses 14MB of disk space, while my source is around 4GB. Now we can see what git-annex thinks about file locations:
$ git-annex whereis | less
whereis mtree (1 copy)
8aed01c5-da30-46c0-8357-1e8a94f67ed6 -- local hub [here]
ok
whereis testdata/[redacted] (0 copies)
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
failed
... many more lines ...
So remember we said we wanted mtree, but nothing under testdata, under this repo? That's exactly what we got. git-annex knows that the files under testdata can be found under the "source" special remote, but aren't in any git-annex repo -- yet. Now we'll start adding them. Walkthrough: removable drives I've set up two 500MB filesystems to represent removable drives. We'll see how git-annex works with them.
$ cd $DRIVE01
$ df -h .
Filesystem Size Used Avail Use% Mounted on
acrypt/no-backup/annexdrive01 500M 1.0M 499M 1% /acrypt/no-backup/annexdrive01
$ git clone $METAREPO
Cloning into 'testdata'...
done.
$ cd $REPONAME
$ git config annex.thin true
$ git annex init "test drive #1"
$ git annex adjust --hide-missing --unlock
adjust
Switched to branch 'adjusted/main(hidemissing-unlocked)'
ok
$ git annex sync
OK, that's the initial setup. Now let's enable the source remote and configure it the same way we did before:
$ git annex enableremote source directory=$SOURCEDIR
enableremote source ok
(recording state in git...)
$ git config remote.source.annex-readonly true
$ git config remote.source.annex-tracking-branch main:$REPONAME
$ git config annex.securehashesonly true
$ git config annex.genmetadata true
$ git config annex.diskreserve 100M
Now, we'll add the drive to a group called "driveset01" and configure what we want on it:
$ git annex group . driveset01
$ git annex wanted . '(not copies=driveset01:1)'
What this does is say: first of all, this drive is in a group named driveset01. Then, this drive wants any files for which there isn't already at least one copy in driveset01. Now let's load up some files!
$ git annex sync --content
As the messages fly by from here, you'll see it mentioning that it got mtree, and then various files from "source" -- until, that is, the filesystem had less than 100MB free, at which point it complained of no space for the rest. Exactly like we wanted! Now, we need to teach $METAREPO about $DRIVE01.
$ cd $METAREPO
$ git remote add drive01 $DRIVE01/$REPONAME
$ git annex sync drive01
git-annex sync will change default behavior to operate on --content in a future version of git-annex. Recommend you explicitly use --no-content (or -g) to prepare for that change. (Or you can configure annex.synccontent)
commit
On branch adjusted/main(unlockpresent)
nothing to commit, working tree clean
ok
merge synced/main (Merging into main...)
Updating d1d9e53..817befc
Fast-forward
(Merging into adjusted branch...)
Updating 7ccc20b..861aa60
Fast-forward
ok
pull drive01
remote: Enumerating objects: 214, done.
remote: Counting objects: 100% (214/214), done.
remote: Compressing objects: 100% (95/95), done.
remote: Total 110 (delta 6), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (110/110), 13.01 KiB 1.44 MiB/s, done.
Resolving deltas: 100% (6/6), completed with 6 local objects.
From /acrypt/no-backup/annexdrive01/testdata
* [new branch] adjusted/main(hidemissing-unlocked) -> drive01/adjusted/main(hidemissing-unlocked)
* [new branch] adjusted/main(unlockpresent) -> drive01/adjusted/main(unlockpresent)
* [new branch] git-annex -> drive01/git-annex
* [new branch] main -> drive01/main
* [new branch] synced/main -> drive01/synced/main
ok
OK! This step is important, because drive01 and drive02 (which we'll set up shortly) won't necessarily be able to reach each other directly, due to not being plugged in simultaneously. Our $METAREPO, however, will know all about where every file is, so that the "wanted" settings can be correctly resolved. Let's see what things look like now:
$ git annex whereis | less
whereis mtree (2 copies)
8aed01c5-da30-46c0-8357-1e8a94f67ed6 -- local hub [here]
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]
ok
whereis testdata/[redacted] (1 copy)
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok
If I scroll down a bit, I'll see the files past the 400MB mark that didn't make it onto drive01. Let's add another example drive! Walkthrough: Adding a second drive The steps for $DRIVE02 are the same as we did before, just with drive02 instead of drive01, so I'll omit listing it all a second time (a condensed sketch follows the next excerpt). Now look at this excerpt from whereis:
whereis testdata/[redacted] (1 copy)
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok
whereis testdata/[redacted] (1 copy)
c4540343-e3b5-4148-af46-3f612adda506 -- test drive #2 [drive02]
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok
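For completeness, here is a condensed, hypothetical transcript of that drive02 setup; it simply repeats the drive01 steps from above with $DRIVE02 and a different description string, so treat it as a sketch rather than captured output.
$ cd $DRIVE02
$ git clone $METAREPO
$ cd $REPONAME
$ git config annex.thin true
$ git annex init "test drive #2"
$ git annex adjust --hide-missing --unlock
$ git annex sync
$ git annex enableremote source directory=$SOURCEDIR
$ git config remote.source.annex-readonly true
$ git config remote.source.annex-tracking-branch main:$REPONAME
$ git config annex.securehashesonly true
$ git config annex.genmetadata true
$ git config annex.diskreserve 100M
$ git annex group . driveset01
$ git annex wanted . '(not copies=driveset01:1)'
$ git annex sync --content
$ cd $METAREPO
$ git remote add drive02 $DRIVE02/$REPONAME
$ git annex sync drive02
Returning to the whereis excerpt above: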
Look at that! Some files on drive01, some on drive02, some neither place. Perfect! Walkthrough: Updates So I've made some changes in the source directory: moved a file, added another, and deleted one. All of these were copied to drive01 above. How do we handle this? First, we update the metadata repo:
$ cd $METAREPO
$ git annex sync
$ git annex dropunused all
OK, this has scanned $SOURCEDIR and noted changes. Let's see what whereis says:
$ git annex whereis | less
...
whereis testdata/cp (0 copies)
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
failed
whereis testdata/file01-unchanged (1 copy)
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok
So this looks right. The file I added was a copy of /bin/cp. I moved another file to one named file01-unchanged. Notice that it realized this was a rename and that the data still exists on drive01. Well, let's update drive01.
$ cd $DRIVE01/$REPONAME
$ git annex sync --content
Looking at the testdata/ directory now, I see that file01-unchanged has been renamed, the deleted file is gone, but cp isn't yet here -- probably due to space issues; as it's new, it's undefined whether it or some other file would fill up free space. Let's work along a few more commands.
$ git annex get --auto
$ git annex drop --auto
$ git annex dropunused all
And now, let's make sure metarepo is updated with its state.
$ cd $METAREPO
$ git annex sync
We could do the same for drive02. This is how we would proceed with every update. Walkthrough: Restoration Now, we have bare files at reasonable locations in drive01 and drive02. But, to generate a consistent restore, we need to be able to actually do an export. Otherwise, we may have files with old names, duplicate files, etc. Let's assume that we lost our source and metadata repos and have to restore from scratch. We'll make a new $RESTOREDIR. We'll begin with drive01 since we used it most recently.
$ mv $METAREPO $METAREPO.disabled
$ mv $SOURCEDIR $SOURCEDIR.disabled
$ git clone $DRIVE01/$REPONAME $RESTOREDIR
$ cd $RESTOREDIR
$ git config annex.thin true
$ git annex init "restore"
$ git annex adjust --hide-missing --unlock
Now, we need to connect the drive01 and pull the files from it.
$ git remote add drive01 $DRIVE01/$REPONAME
$ git annex sync --content
Now, repeat with drive02:
$ git remote add drive02 $DRIVE02/$REPONAME
$ git annex sync --content
Now we've got all our content back! Here's what whereis looks like:
whereis testdata/file01-unchanged (3 copies)
3d663d0f-1a69-4943-8eb1-f4fe22dc4349 -- restore [here]
9e48387e-b096-400a-8555-a3caf5b70a64 -- source
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [origin]
ok
...
I was a little surprised that drive01 didn't seem to know what was on drive02. Perhaps that could have been remedied by adding more remotes there? I'm not entirely sure; I'd thought would have been able to do that automatically. Conclusions I think I have demonstrated two things: First, git-annex is indeed an extremely powerful tool. I have only scratched the surface here. The location tracking is a neat feature, and being able to just access the data as plain files if all else fails is nice for future users. Secondly, it is also a complex tool and difficult to get right for this purpose (I think much easier for some other purposes). For someone that doesn't live and breathe git-annex, it can be hard to get right. In fact, I'm not entirely sure I got it right here. Why didn't drive02 know what files were on drive01 and vice-versa? I don't know, and that reflects some kind of misunderstanding on my part about how metadata is synced; perhaps more care needs to be taken in restore, or done in a different order, than I proposed. I initially tried to do a restore by using git annex export to a directory special remote with exporttree=yes, but I couldn't ever get it to actually do anything, and I don't know why. These two cut against each other. On the one hand, the raw accessibility of the data to someone with no computer skills is unmatched. On the other hand, I'm not certain I have the skill to always prepare the discs properly, or to do a proper consistent restore.

2 February 2023

John Goerzen: Using Yggdrasil As an Automatic Mesh Fabric to Connect All Your Docker Containers, VMs, and Servers

Sometimes you might want to run Docker containers on more than one host. Maybe you want to run some at one hosting facility, some at another, and so forth. Maybe you'd like to run VMs at various places, and let them talk to Docker containers and bare metal servers wherever they are. And maybe you'd like to be able to easily migrate any of these from one provider to another. There are all sorts of very complicated ways to set all this stuff up. But there's also a simple one: Yggdrasil. My blog post Make the Internet Yours Again With an Instant Mesh Network explains some of the possibilities of Yggdrasil in general terms. Here I want to show you how to use Yggdrasil to solve some of these issues more specifically. Because Yggdrasil is always Encrypted, some of the security lifting is done for us.

Background Often in Docker, we connect multiple containers to a single network that runs on a given host. That much is easy. Once you start talking about containers on multiple hosts, then you start adding layers and layers of complexity. Once you start talking about multiple providers, maybe multiple continents, then the complexity can increase further. And, if you want to integrate everything from bare metal servers to VMs into this... well, there are ways, but they're not easy. I'm a believer in the KISS principle. Let's not make things complex when we don't have to.

Enter Yggdrasil As I've explained before, Yggdrasil can automatically form a global mesh network. This is pretty cool! As most people use it, they join it to the main Yggdrasil network. But Yggdrasil can be run entirely privately as well. You can run your own private mesh, and that's what we'll talk about here. All we have to do is run Yggdrasil inside each container, VM, server, or whatever. We handle some basics of connectivity, and bam! Everything is host- and location-agnostic.

Setup in Docker The installation of Yggdrasil on a regular system is pretty straightforward. Docker is a bit more complicated for several reasons:
  • It blocks IPv6 inside containers by default
  • The default set of permissions doesn't permit you to set up tunnels inside a container
  • It doesn't typically pass multicast (broadcast) packets
Normally, Yggdrasil could auto-discover peers on a LAN interface. However, aside from some esoteric Docker networking approaches, Docker doesn't permit that. So my approach is going to be setting up one or more Yggdrasil router containers on a given Docker host. All the other containers talk directly to the router container and it's all good.

Basic installation In my Dockerfile, I have something like this:
FROM jgoerzen/debian-base-security:bullseye
RUN echo "deb http://deb.debian.org/debian bullseye-backports main" >> /etc/apt/sources.list && \
    apt-get --allow-releaseinfo-change update && \
    apt-get -y --no-install-recommends -t bullseye-backports install yggdrasil
...
COPY yggdrasil.conf /etc/yggdrasil/
RUN set -x; \
    chown root:yggdrasil /etc/yggdrasil/yggdrasil.conf && \
    chmod 0750 /etc/yggdrasil/yggdrasil.conf && \
    systemctl enable yggdrasil
The magic parameters to docker run to make Yggdrasil work are:
--cap-add=NET_ADMIN --sysctl net.ipv6.conf.all.disable_ipv6=0 --device=/dev/net/tun:/dev/net/tun
This example uses my docker-debian-base images, so if you use them as well, you'll also need to add their parameters. Note that it is NOT necessary to use --privileged. In fact, due to the network namespaces in use in Docker, this command does not let the container modify the host's networking (unless you use --net=host, which I do not recommend). The --sysctl parameter was the result of a lot of banging my head against the wall. Apparently Docker tries to disable IPv6 in the container by default. Annoying.
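Putting those flags together, a full invocation of a router container might look like the sketch below; the container name, image name, and published port are illustrative assumptions (the port should match the Listen setting discussed in the next section).
docker run -d --name yggdrasil-router \
  --cap-add=NET_ADMIN \
  --sysctl net.ipv6.conf.all.disable_ipv6=0 \
  --device=/dev/net/tun:/dev/net/tun \
  -p 12345:12345 \
  my-yggdrasil-router    # hypothetical image built from the Dockerfile above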

Configuration of the router container(s) The idea is that the router node (or more than one, if you want redundancy) will be the only ones to have an open incoming port. Although the normal Yggdrasil case of directly detecting peers in a broadcast domain is more convenient and more robust, this can work pretty well too. You can, of course, generate a template yggdrasil.conf with yggdrasil -genconf like usual. Some things to note for this one:
  • You'll want to change Listen to something like Listen: ["tls://[::]:12345"] where 12345 is the port number you'll be listening on.
  • You'll want to disable the MulticastInterfaces entirely by just setting it to [] since it doesn't work anyway.
  • If you expose the port to the Internet, you'll certainly want to firewall it to only authorized peers. Setting AllowedPublicKeys is another useful step.
  • If you have more than one router container on a host, each of them will both Listen and act as a client to the others. See below.
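As a concrete illustration of those points, a router container's yggdrasil.conf might contain something like the following sketch; the port, key, and interface name are placeholders, and the rest of the generated config can stay as-is.
{
  # Accept peerings from the other containers (and, if exposed, the outside world).
  Listen: ["tls://[::]:12345"]

  # Multicast discovery does not work in Docker, so disable it.
  MulticastInterfaces: []

  # If the port is reachable from the Internet, restrict who may peer
  # (placeholder key shown).
  AllowedPublicKeys: ["0123456789abcdef..."]

  # A stable interface name makes firewalling easier; see the security notes below.
  IfName: ygg0
}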

Configuration of the non-router nodes Again, you can start with a simple configuration. Some notes here:
  • You'll want to set Peers to something like Peers: ["tls://routernode:12345"] where routernode is the Docker hostname of the router container, and 12345 is its port number as defined above. If you have more than one local router container, you can simply list them all here. Yggdrasil will then fail over nicely if any one of them goes down.
  • Listen should be empty.
  • As above, MulticastInterfaces should be empty.
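For the non-router containers, the corresponding yggdrasil.conf fragment is even simpler; again, the hostname, port, and interface name are assumptions matching the examples above.
{
  # Talk only to the local router container(s).
  Peers: ["tls://routernode:12345"]

  # No inbound peerings and no multicast discovery.
  Listen: []
  MulticastInterfaces: []

  IfName: ygg0
}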

Using the interfaces At this point, you should be able to ping6 between your containers. If you have multiple hosts running Docker, you can simply set up the router nodes on each to connect to each other. Now you have direct, secure, container-to-container communication that is host-agnostic! You can also set up Yggdrasil on a bare metal server or VM using standard procedures and everything will just talk nicely!
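A quick way to try that out, assuming hypothetical container names and that ping6 is available inside the containers: look up one container's Yggdrasil address with yggdrasilctl, then ping it from another.
# Print container-b's own Yggdrasil IPv6 address (in the 200::/7 range):
docker exec container-b yggdrasilctl getSelf
# Then, from container-a, ping the address that was printed:
docker exec container-a ping6 -c 3 200:1111:2222:3333:4444:5555:6666:7777   # placeholder address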

Security notes Yggdrasil's mesh is aggressively greedy. It will peer with any node it can find (unless told otherwise) and will find a route to anywhere it can. There are two main ways to make sure your internal comms stay private: by restricting who can talk to your mesh, and by firewalling the Yggdrasil interface. Both can be used, and they can be used simultaneously. By disabling multicast discovery, you eliminate the chance for random machines on the LAN to join the mesh. By making sure that you firewall off (outside of Yggdrasil) who can connect to a Yggdrasil node with a listening port, you can authorize only your own machines. And, by setting AllowedPublicKeys on the nodes with listening ports, you can authenticate the Yggdrasil peers. Note that part of the benefit of the Yggdrasil mesh is normally that you don't have to propagate a configuration change to every participating node; that's a nice thing in general! You can also run a firewall inside your container (I like firehol for this purpose) and aggressively firewall the IPs that are allowed to connect via the Yggdrasil interface. I like to set a stable interface name like ygg0 in yggdrasil.conf, and then it becomes pretty easy to firewall the services. The Docker parameters that allow Yggdrasil to run are also sufficient to run firehol.

Naming Yggdrasil peers You probably don't want to hard-code Yggdrasil IPs all over the place. There are a few solutions:
  • You could run an internal DNS service
  • You can do a bit of scripting around Docker's --add-host command to add things to /etc/hosts
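A minimal sketch of that --add-host approach, with a placeholder Yggdrasil address, name, and image; the peer's address would come from yggdrasilctl getSelf as shown earlier.
PEER_YGG_IP=200:1111:2222:3333:4444:5555:6666:7777   # placeholder address of some peer
docker run -d --name webapp \
  --cap-add=NET_ADMIN \
  --sysctl net.ipv6.conf.all.disable_ipv6=0 \
  --device=/dev/net/tun:/dev/net/tun \
  --add-host "backend.ygg:$PEER_YGG_IP" \
  my-webapp-image    # hypothetical image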

Other hints & conclusion Here are some other helpful use cases:
  • If you are migrating between hosts, you could leave your reverse proxy up at both hosts, both pointing to the target containers over Yggdrasil. The targets will be automatically found from both sides of the migration while you wait for DNS caches to update and such.
  • This can make services integrate with local networks a lot more painlessly than they might otherwise.
This is just an idea. The point of Yggdrasil is expanding our ideas of what we can do with a network, so here's one such expansion. Have fun!
Note: This post also has a permanent home on my website, where it may be periodically updated.

29 May 2022

John Goerzen: Fast, Ordered Unixy Queues over NNCP and Syncthing with Filespooler

It seems that lately I've written several shell implementations of a simple queue that enforces ordered execution of jobs that may arrive out of order. After writing this for the nth time in bash, I decided it was time to do it properly. But first, a word on the why of it all.

Why did I bother? My needs arose primarily from handling Backups over Asynchronous Communication methods, in this case NNCP. When backups contain incrementals that are unpacked on the destination, they must be applied in the correct order. In some cases, like ZFS, the receiving side will detect an out-of-order backup file and exit with an error. In those cases, processing in random order is acceptable, but can be slow if, say, hundreds or thousands of hourly backups have stacked up over a period of time. The same goes for using gitsync-nncp to synchronize git repositories. In both cases, a best effort based on creation date is sufficient to produce a significant performance improvement. In other cases, such as tar or dar backups, the receiving side cannot detect out-of-order incrementals. In those situations, the incrementals absolutely must be applied with strict ordering. There are many other situations that arise with these needs also. Filespooler is the answer to these.

Existing Work Before writing my own program, I of course looked at what was out there already. I looked at celery, gearman, nq, rq, cctools work queue, ts/tsp (task spooler), filequeue, dramatiq, GNU parallel, and so forth. Unfortunately, none of these met my needs at all. They all tended to have properties like:
  • An extremely complicated client/server system that was incompatible with piping data over existing asynchronous tools
  • A large bias to processing of small web requests, resulting in terrible inefficiency or outright incompatibility with jobs in the TB range
  • An inability to enforce strict ordering of jobs, especially if they arrive in a different order from how they were queued
Many also lacked some nice-to-haves that I implemented for Filespooler:
  • Support for the encryption and cryptographic authentication of jobs, including metadata
  • First-class support for arbitrary compressors
  • Ability to use both stream transports (pipes) and filesystem-like transports (eg, rclone mount, S3, Syncthing, or Dropbox)

Introducing Filespooler Filespooler is a tool in the Unix tradition: that is, do one thing well, and integrate nicely with other tools using the fundamental Unix building blocks of files and pipes. Filespooler itself doesn't provide transport for jobs, but instead is designed to cooperate extremely easily with transports that can be written to as a filesystem or piped to, which is to say, almost anything of interest. Filespooler is written in Rust and has an extensive Filespooler Reference as well as many tutorials on its homepage. To give you a few examples, here are some links:

Basics of How it Works Filespooler is intentionally simple:
  • The sender maintains a sequence file that includes a number for the next job packet to be created.
  • The receiver also maintains a sequence file that includes a number for the next job to be processed.
  • fspl prepare creates a Filespooler job packet and emits it to stdout. It includes a small header (<100 bytes in most cases) that includes the sequence number, creation timestamp, and some other useful metadata.
  • You get to transport this job packet to the receiver in any of many simple ways, which may or may not involve Filespooler's assistance.
  • On the receiver, Filespooler (when running in the default strict ordering mode) will simply look at the sequence file and process jobs in incremental order until it runs out of jobs to process.
The name of job files on-disk matches a pattern for identification, but the content of them is not significant; only the header matters. You can send job data in three ways:
  1. By piping it to fspl prepare
  2. By setting certain environment variables when calling fspl prepare
  3. By passing additional command-line arguments to fspl prepare, which can optionally be passed to the processing command at the receiver.
Data piped in is added to the job "payload", while environment variables and command-line parameters are encoded in the header.

Basic usage Here I will excerpt part of the Using Filespooler over Syncthing tutorial; consult it for further detail. As a bit of background, Syncthing is a FLOSS decentralized directory synchronization tool akin to Dropbox (but with a much richer feature set in many ways).

Preparation First, on the receiver, you create the queue (passing the directory name to -q):
sender$ fspl queue-init -q ~/sync/b64queue
Now, we can send a job like this:
sender$ echo Hi | fspl prepare -s ~/b64seq -i - | fspl queue-write -q ~/sync/b64queue
Let's break that down:
  • First, we pipe Hi to fspl prepare.
  • fspl prepare takes two parameters:
    • -s seqfile gives the path to a sequence file used on the sender side. This file has a simple number in it that increments a unique counter for every generated job file. It is matched with the nextseq file within the queue to make sure that the receiver processes jobs in the correct order. It MUST be separate from the file that is in the queue and should NOT be placed within the queue. There is no need to sync this file, and it would be ideal to not sync it.
    • The -i option tells fspl prepare to read a file for the packet payload. -i - tells it to read stdin for this purpose. So, the payload will consist of three bytes: Hi\n (that is, including the terminating newline that echo wrote)
  • Now, fspl prepare writes the packet to its stdout. We pipe that into fspl queue-write:
    • fspl queue-write reads stdin and writes it to a file in the queue directory in a safe manner. The file will ultimately match the fspl-*.fspl pattern and have a random string in the middle.
At this point, wait a few seconds (or however long it takes) for the queue files to be synced over to the recipient. On the receiver, we can see if any jobs have arrived yet:
receiver$ fspl queue-ls -q ~/sync/b64queue
ID                   creation timestamp          filename
1                    2022-05-16T20:29:32-05:00   fspl-7b85df4e-4df9-448d-9437-5a24b92904a4.fspl
Let's say we'd like some information about the job. Try this:
receiver$ fspl queue-info -q ~/sync/b64queue -j 1
FSPL_SEQ=1
FSPL_CTIME_SECS=1652940172
FSPL_CTIME_NANOS=94106744
FSPL_CTIME_RFC3339_UTC=2022-05-17T01:29:32Z
FSPL_CTIME_RFC3339_LOCAL=2022-05-16T20:29:32-05:00
FSPL_JOB_FILENAME=fspl-7b85df4e-4df9-448d-9437-5a24b92904a4.fspl
FSPL_JOB_QUEUEDIR=/home/jgoerzen/sync/b64queue
FSPL_JOB_FULLPATH=/home/jgoerzen/sync/b64queue/jobs/fspl-7b85df4e-4df9-448d-9437-5a24b92904a4.fspl
This information is intentionally emitted in a format convenient for parsing. Now let's run the job!
receiver$ fspl queue-process -q ~/sync/b64queue --allow-job-params base64
SGkK
There are two new parameters here:
  • --allow-job-params says that the sender is trusted to supply additional parameters for the command we will be running.
  • base64 is the name of the command that we will run for every job. It will:
    • Have environment variables set as we just saw in queue-info
    • Have the text we previously prepared Hi\n piped to it
By default, fspl queue-process doesn't do anything special with the output; see Handling Filespooler Command Output for details on other options. So, the base64-encoded version of our string is "SGkK". We successfully sent a packet using Syncthing as a transport mechanism! At this point, if you do a fspl queue-ls again, you'll see the queue is empty. By default, fspl queue-process deletes jobs that have been successfully processed.

For more See the Filespooler homepage.
This blog post is also available as a permanent, periodically-updated page.

3 March 2022

John Goerzen: Tools for Communicating Offline and in Difficult Circumstances

Note: this post is also available on my website, where it will be updated periodically. When things are difficult, perhaps because there's been a disaster, or an invasion (this page is being written in 2022 just after Russia invaded Ukraine), or maybe because you're just backpacking off the grid, there are tools that can help you keep in touch, or move your data around. This page aims to survey some of them, roughly in order from easiest to more complex.

Simple radios Handheld radios shouldn't be forgotten. They are cheap, small, and easy to operate. Their range isn't huge (maybe a couple of miles in rural areas, much less in cities), but they can be a useful place to start. They tend to have no actual encryption features (the privacy features really aren't). In the USA, options are FRS/GMRS and CB.

Syncthing With Syncthing, you can share files among your devices or with your friends. Syncthing essentially builds a private mesh for file sharing. Devices will auto-discover each other when on the same LAN or Wifi network, and opportunistically sync. I wrote more about offline uses of Syncthing, and its use with NNCP, in my blog post A simple, delay-tolerant, offline-capable mesh network with Syncthing (+ optional NNCP). Yes, it is a form of a Mesh Network! Homepage: https://syncthing.net/

Briar Briar is an instant messaging service based around Android. It's IM with a twist: it can use a mesh of Bluetooth devices. Or, if Internet is available, Tor. It has even been extended to support the use of SD cards and USB sticks to carry your messages. Like some others here, it can relay messages for third parties as well. Homepage: https://briarproject.org/

Manyverse and Scuttlebutt Manyverse is a client for Scuttlebutt, which is a sort of asynchronous, offline-friendly social network. You can use it to keep in touch with your family and friends, and it supports syncing over Bluetooth and Wifi even in the absence of Internet. Homepages: https://www.manyver.se/ and https://scuttlebutt.nz/

Yggdrasil Yggdrasil is a self-healing, fully end-to-end Encrypted Mesh Network. It can work among local devices or on the global Internet. It has network services that can egress onto things like Tor, I2P, and the public Internet. Yggdrasil makes a perfect companion to ad-hoc wifi as it has auto peer discovery on the local network. I talked about it in more detail in my blog post Make the Internet Yours Again With an Instant Mesh Network. Homepage: https://yggdrasil-network.github.io/

Ad-Hoc Wifi Few people know about the ad-hoc wifi mode. Ad-hoc wifi lets devices in range talk to each other without an access point. You just all set your devices to the same network name and password and there you go. However, there often isn't DHCP, so IP configuration can be a bit of a challenge. Yggdrasil helps here.

NNCP Moving now to more advanced tools, NNCP lets you assemble a network of peers that can use Asynchronous Communication over sneakernet, USB drives, radios, CD-Rs, Internet, tor, NNCP over Yggdrasil, Syncthing, Dropbox, S3, you name it. NNCP supports multi-hop file transfer and remote execution. It is fully end-to-end encrypted. Think of it as the offline version of ssh. Homepage: https://nncp.mirrors.quux.org/

Meshtastic Meshtastic uses long-range, low-power LoRa radios to build a long-distance, encrypted, instant messaging system that is a Mesh Network. It requires specialized hardware, about $30, but will tend to get much better range than simple radios, and with very little power. Homepages: https://meshtastic.org/ and https://meshtastic.letstalkthis.com/

Portable Satellite Communicators You can get portable satellite communicators that can send SMS from anywhere on earth with a clear view of the sky. The Garmin InReach mini and Zoleo are two credible options. Subscriptions range from about $10 to $40 per month depending on usage. They also have global SOS features.

Telephone Lines If you have a phone line and a modem, UUCP can get through just about anything. It's an older protocol that lacks modern security, but will deal with slow and noisy serial lines well. XBee SX radios also have a serial mode that can work well with UUCP.

Additional Suggestions It is probably useful to have a Linux live USB stick with whatever software you want to use handy. Debian can be installed from the live environment, or you could use a security-focused distribution such as Tails or Qubes.

References This page originated in my Mastodon thread and incorporates some suggestions I received there. It also formed a post on my blog.

25 August 2021

John Goerzen: Excellent Experience with Debian Bullseye

I ve appreciated the bullseye upgrade, like most Debian upgrades. I m not quite sure how, since I was already running a backports kernel, but somehow the entire system is snappier. Maybe newer X or something? I m really pleased with it. Hardware integration is even nicer now, particularly the automatic driverless support for scanners in addition to the existing support for printers. All in all, a very nice upgrade, and pretty painless. I experienced a few odd situations. For one, I had been using Gnome Flashback. Since xmonad-log-applet didn t compile there (due to bitrot in the log applet, not flashback), and I had been finding Gnome Flashback to be a rather dusty and forgotten corner of Gnome for a long time, I decided to try Mate. Mate just seemed utterly unable to handle a situation with a laptop and an external monitor very well. I want to use only the external monitor with the laptop lid is closed, and it just couldn t remember how to do the right thing external monitor on, laptop monitor off, laptop not put into suspend. gdm3 also didn t seem to be able to put the external monitor to sleep, either, causing a few nights of wasted power. So off I went to XFCE, which I had been using for years on my workstation anyhow. Lots more settings available in XFCE, plus things Just Worked there. Odd that XFCE, the thin and light DE, is now the one that has the most relevant settings. It seems the Gnome let s remove a bunch of features approach has extended to MATE as well. When I switched to XFCE, I also removed gdm3 from my system, leaving lightdm as the only DM on it. That matched what my desktop machine was using, and also what task-xfce-desktop called for. But strangely, the XFCE settings for lightdm were completely different between the laptop and the desktop. It turns out that with lightdm, you can have the lightdm-gtk-greeter and the accompanying lightdm-gtk-greeter-settings, or slick-greeter and the accompanying lightdm-settings. One machine had one greeter and settings, and the other had the other. Why, I don t know. But lightdm-gtk-greeter-settings had the necessary options for putting monitors to sleep on the login screen, so I went with it. This does highlight a bit of a weakness in Debian upgrades. There is SO MUCH choice in Debian, which I highly value. At some point, almost certainly without my conscious choice, one machine got one greeter and another got the other. Despite both having task-xfce-desktop installed, they got different desktop experiences. There isn t a great way to say OK, I know I had a bunch of things installed before, but NOW I want the default bullseye experience . But overall, it is an absolutely fantastic distribution. It is great to see this nonprofit community distribution continue to have such quality on such an immense scale. And hard to believe I ve been a Debian developer for 25 years. That seems almost impossible!

18 August 2021

John Goerzen: Distributed, Asynchronous Git Syncing with NNCP

I have a problem. I have a directory that I use with org-mode and org-roam. I want it to be synced across multiple machines. I also want to keep the history with git. And, I want to use end-to-end encryption (no storing a plain git repo on a remote server), have a serverless setup, not require any two machines to be up simultaneously, and be resilient in the face of races and conflicts. Whew. I've tried a number of setups: git-remote-gcrypt on a remote server (fragile), some complicated scripts around a separate repo in syncthing (requires one machine to be "in charge"), etc. They were all subpar. Then NNCP introduced asynchronous multicast and I was intrigued. So, I wrote gitsync-nncp, which uses NNCP to distribute git bundles to all the participating machines. The comprehensive documentation for gitsync-nncp goes into a lot more detail about how it works and what problems it solves. It's working quite well for me!

30 December 2020

John Goerzen: Airgapped / Asynchronous Backups with ZFS over NNCP

In my previous articles in the series on asynchronous communication with the modern NNCP tool, I talked about its use for asynchronous, potentially airgapped, backups. The first article, How & Why To Use Airgapped Backups, laid out the foundations for this. Now let's dig into the details. Today's post will cover ZFS, because it has a lot of features that make it very easy to support in this setup. Non-ZFS backups will be covered later. The setup is actually about as simple as it is for SSH, but since people are less familiar with this kind of communication, I'm going to try to go into more detail here. Assumptions I am assuming a setup where: Hardware Let's start with hardware for the machine to hold the backups. I initially considered a Raspberry Pi 4 with 8GB of RAM. That would probably have been a suitable machine, at least for smaller backup sets. However, none of the Raspberry Pi machines support hardware AES encryption acceleration, and my Pi4 benchmarks at about 60MB/s for AES encryption. I want my backups to be encrypted, and decided this would just be too slow for my purposes. Again, if you don't need encrypted backups or don't care that much about performance (many people probably fall into this category), you can have a fully-functional Raspberry Pi 4 system for under $100 that would make a fantastic backup server. I wound up purchasing a Qotom-Q355G4 micro PC with a Core i5 for about $315. It has USB 3 ports and is designed as a rugged, long-lasting system. I have been using one of their older Celeron-based models as my router/firewall for a number of years now and it's been quite reliable. For backup storage, you can get a USB 3 external drive. My own preference is to get a USB 3 toaster (a device that lets me plug in SATA drives) so that I have more control over the underlying medium and can save the expense and hassle of a bunch of power supplies. In a future post, I will discuss drive rotation so you always have an offline drive. Then, there is the question of transport to the backup machine. A simple solution would be to have a heavily-firewalled backup system that has no incoming ports open but makes occasional outgoing connections to one specific NNCP daemon on the spooling machine. However, for airgapped operation, it would also be very simple to use nncp-xfer to transport the data across on a USB stick or some such. You could set up automounting for a specific USB stick: plug it in, all the spooled data is moved over; then plug it into the backup system and it's processed, and any outbound email traffic or whatever is copied to the USB stick at that point too. The NNCP page has some more commentary about this kind of setup. Both are fairly easy to set up, and NNCP is designed to be transport-agnostic, so in this article I'm going to focus on how to integrate ZFS with NNCP. Operating System Of course, it should be no surprise that I set this up on Debian. As an added step, I did all the configuration in Ansible stored in a local git repo. This adds a lot of work, but it means that it is trivial to periodically wipe and reinstall if any security issue is suspected. The git repo can be copied off to another system for storage and takes the system from freshly-installed to ready-to-use state. Security There is, of course, nothing preventing you from running NNCP as root. The zfs commands, obviously, need to be run as root. However, from a privilege separation standpoint, I have chosen to run everything relating to NNCP as a nncp user.
NNCP already does encryption, but if you prefer to have zero knowledge of the data even to NNCP, it's trivial to add gpg to the pipeline as well, and in fact I'll be demonstrating that in a future post for other reasons. Software Besides NNCP, there needs to be a system that generates the zfs send streams. For this project, I looked at quite a few. Most were designed to inspect the list of snapshots on a remote end, compare it to a list on the local end, and calculate a difference from there. This, of course, won't work for this situation. I realized my own simplesnap project was very close to being able to do this. It already used an algorithm of using specially-named snapshots on the machine being backed up, so it never needed any communication about what snapshots were present where. All it needed was a few more options to permit sending to a stream instead of zfs receive. I made those changes and they are available in simplesnap 2.0.0 or above. That version has also been uploaded to sid, and will work fine as-is on buster as well. Preparing NNCP I'm going to assume three hosts in this setup: the laptop being backed up, the spooler, and the backupsvr that holds the backups. The basic NNCP workflow documentation covers the basic steps. You'll need to run nncp-cfgnew on each machine. This generates a basic configuration, along with public and private keys for that machine. You'll copy the public key sets to the configurations of the other machines as usual. On the laptop, you'll add a via line like this:
backupsvr: {
  id: ....
  exchpub: ...
  signpub: ...
  noisepub: ...
  via: ["spooler"]
}
This tells NNCP that data destined for backupsvr should always be sent via spooler first. You can then arrange for the nncp-daemon to run on the spooler, and nncp-caller or nncp-call on the backupsvr. Or, alternatively, airgapped between the two with nncp-xfer. Generating Backup Data Now, on the laptop, install simplesnap (2.0.0 or above). Although you won't be backing up to the local system, simplesnap still maintains a hostlock in ZFS. Prepare a dataset for it:
zfs create tank/simplesnap
zfs set org.complete.simplesnap:exclude=on tank/simplesnap
Then, create a script /usr/local/bin/runsimplesnap like this:
#!/bin/bash
set -e
simplesnap --store tank/simplesnap --setname backups --local --host "$(hostname)" \
   --receivecmd /usr/local/bin/simplesnap-queue \
   --noreap
su nncp -c '/usr/local/nncp/bin/nncp-toss -noprogress -quiet'
if ip addr | grep -q 192.168.65.64; then
  su nncp -c '/usr/local/nncp/bin/nncp-call -noprogress -quiet -onlinedeadline 1 spooler'
fi
The call to simplesnap sets it up to send the data to simplesnap-queue, which we'll create in a moment. The --receivecmd, plus --noreap, set it up to run without ZFS on the local system. The call to nncp-toss will process any previously-received inbound NNCP packets, if there are any. Then, in this example, we do a very basic check to see if we're on the LAN (checking 192.168.65.64), and if so, will establish a connection to the spooler to transmit the data. Of course, you could also do this over the Internet, with tor, or whatever, but in my case, I don't want to do it automatically in case I'm tethered to mobile. I figure if I want to send backups in that case, I can fire up nncp-call myself. You can also use nncp-caller to set up automated connections on other schedules; there are a lot of options. Now, here's what /usr/local/bin/simplesnap-queue looks like:
#!/bin/bash
set -e
set -o pipefail
DEST=" echo $1   sed 's,^tank/simplesnap/,,' "
echo "Processing $DEST" >&2
# stdin piped to this
su nncp -c "/usr/local/nncp/bin/nncp-exec -nice B -noprogress backupsvr zfsreceive '$DEST'" >&2
echo "Queued for $DEST" >&2
This is a pretty simple script. simplesnap will call it with a path based on the store, with the hostname after; so, for instance, tank/simplesnap/laptop/root or some such. This script strips off the leading tank/simplesnap (which is a local fragment), leaving the host and dataset paths. Then it just pipes it to nncp-exec. -nice B classifies it as low-priority bulk data (so if you have some more important interactive data, it would be sent first), then passes it to whatever the backupsvr defines as zfsreceive. Receiving ZFS backups In the NNCP configuration on the recipient's side, in the laptop section, we define what command it's allowed to run as zfsreceive:
      exec: {
        zfsreceive: ["/usr/bin/sudo", "-H", "/usr/local/bin/nncp-zfs-receive"]
      }
We authorize the nncp user to run this under sudo in /etc/sudoers.d/local nncp:
Defaults env_keep += "NNCP_SENDER"
nncp ALL=(root) NOPASSWD: /usr/local/bin/nncp-zfs-receive
The NNCP_SENDER is the public key ID of the sending node when nncp-toss processes the incoming data. We can use that for sanity checking later. Now, here's a basic nncp-zfs-receive script:
#!/bin/bash
set -e
set -o pipefail
STORE=backups/simplesnap
DEST="$1"
# now process stdin
zfs receive -o readonly=on -x mountpoint "$STORE/$DEST"
And there you have it: all the basics are in place. Update 2020-12-30: An earlier version of this article had zfs receive -F instead of zfs receive -o readonly=on -x mountpoint. These changed arguments are more robust.
Update 2021-01-04: I am now recommending zfs receive -u -o readonly=on; see my successor article for more. Enhancements You could enhance the nncp-zfs-receive script to improve logging and error handling. For instance:
#!/bin/bash
set -e
set -o pipefail
STORE=backups/simplesnap
# $1 will be the host/dataset
DEST="$1"
HOST=" echo "$1"   sed 's,/.*,,g' "
if [ -z "$HOST" ]; then
   echo "Malformed command line"
   exit 5
fi
# Log a message
logit () {
   logger -p info -t "$(basename "$0") [$$]" "$1"
}
# Log an error message
logerror () {
   logger -p err -t "$(basename "$0") [$$]" "$1"
}
# Log stdin with the given code.  Used normally to log stderr.
logstdin () {
   logger -p info -t "$(basename "$0") [$$/$1]"
}
# Run command, logging stderr and exit code
runcommand ()  
   logit "Running $*"
   if "$@" 2> >(logstdin "$1") ; then
      logit "$1 exited successfully"
      return 0
   else
       RETVAL="$?"
       logerror "$1 exited with error $RETVAL"
       return "$RETVAL"
   fi
 
exiterror ()  
   logerror "$1"
   echo "$1" 1>&2
   exit 10
 
# Sanity check
if [ "$HOST" = "laptop" ]; then
  if [ "$NNCP_SENDER" != "12345678" ]; then
    exiterror "Host $HOST doesn't match sender $NNCP_SENDER"
  fi
else
  exiterror "Unknown host $HOST"
fi
runcommand zfs receive -o readonly=on -x mountpoint "$STORE/$DEST"
Now you'll capture the ZFS receive output in syslog in a friendly way, so you can look back later at why things failed, if they did.

Further notes on NNCP

nncp-toss will examine the exit code from an invocation. If it is nonzero, it will keep the command (and associated stdin) in the queue and retry it on the next invocation. NNCP does not guarantee order of execution, so it is possible in some cases that ZFS streams may be received in the wrong order. That is fine here; zfs receive will exit with an error, and nncp-toss will just run it again after the dependent snapshots have been received. For non-ZFS backups, a simple sequence number can handle this issue; a sketch of that idea follows.
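For instance, a non-ZFS receiver could refuse to process a chunk until all earlier chunks have arrived, relying on nncp-toss to retry deferred packets. What follows is only a minimal sketch of that idea and not part of the setup above; the sequence-file path, the argument convention, and the storage command are all assumptions:
#!/bin/bash
# Hypothetical receiver for non-ZFS backups.  The sender passes a sequence
# number as $1 and the backup stream arrives on stdin; all paths and names
# here are illustrative.
set -e
set -o pipefail
SEQFILE=/var/lib/backups/last-seq
EXPECTED=$(( $(cat "$SEQFILE" 2>/dev/null || echo 0) + 1 ))
if [ "$1" -ne "$EXPECTED" ]; then
   # A nonzero exit makes nncp-toss keep the packet and retry it later,
   # after the missing earlier chunks have been processed.
   echo "Got sequence $1, expected $EXPECTED; deferring" >&2
   exit 1
fi
# Hand the stream to whatever actually stores it (illustrative).
cat > "/var/lib/backups/chunk-$1"
echo "$1" > "$SEQFILE"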

16 December 2020

John Goerzen: How To Join the Fediverse and Cast Off the Attention Economy

In a recent post, I wrote about how the attention economy in use at big social networks hurts you. In this post, I'm going to suggest what to do about it.

Mastodon and the Fediverse

When you use email, you can send a message from an account at Google to one at Yahoo, Microsoft, or any of millions of businesses and organizations running their own mail server. Unlike, say, Facebook, email isn't a single service, but rather a whole bunch of independent systems that can communicate (or federate) with each other. The Fediverse is similar, and the most advanced Fediverse client is Mastodon.

Mastodon: It's easy to get started!

Head over to joinmastodon.org and click Get Started. Pick a community; don't worry, this isn't a hugely consequential decision, as you can always move or change later. You can browse activity from across the Fediverse, or just on your local community, so if you find a community with similar interests, it can be a neat way to find others to follow. If you're looking for more details, mastodon.help has a nice guide.

Defeating the Attention Economy

So, why does Mastodon make a difference? First of all, you get to pick your host (and even software). With Twitter, you pretty much are using Twitter (yes, I know of things like Hootsuite, but for the vast majority of people, it's twitter.com only). With Mastodon, you have choice. Pick the host that runs the software and has the kind of moderation you like. Secondly, Mastodon is not for profit. There is no money to be made in keeping you on the site. Almost all Mastodon instances are ad-free. And Mastodon's completely open protocols make it easy to go elsewhere if you like.

It's Not Just Mastodon!

There are plenty of other programs in the Fediverse. And, this is really key, they all interact with each other. You can share photos in Pixelfed (sort of like a federated Instagram) and see them and comment on them in Mastodon! And there are many others. This blog, for instance, runs WordPress and uses an ActivityPub connector; comments from the Fediverse integrate here.

Find me in the Fediverse

You can look me up: just type @jgoerzen@floss.social into the search box of any Mastodon instance and click Follow. You can also follow this blog at @jgoerzen@changelog.complete.org.

14 August 2020

John Goerzen: In Which COVID-19 Misinformation Leads To A Bunch of Graphs Made With Rust

A funny (and by funny, I mean sad) thing has happened. Recently the Kansas Department of Health and Environment (KDHE) has been analyzing data from the patchwork implementation of mask requirements in Kansas. They came to a conclusion that shouldn't be surprising to anyone: masks help. They published a chart showing this. A right-wing propaganda publication got ahold of it and claimed the numbers were doctored because there were two different Y-axes. I set about analyzing the data myself from public sources, and produced graphs of various kinds using a single Y-axis, supporting the idea that the graphs were not, in fact, doctored. Here's one graph that shows that:
In order to do that, I had imported COVID-19 data from various public sources. Many states in the US are large enough to have significant variation in COVID-19 conditions, and many of the sources people look at don't show county-level data over time. I wanted to do that. Eventually, I wrote covid19db, which ingests data from a number of public sources and generates a SQLite database file. Using GitHub Actions, this file is automatically updated every morning and available for download. Or, you can download the code and generate a database yourself locally. Then, I wrote covid19ks, which generates various pretty graphs covering the data. These graphs, incidentally, turn out to highlight just how poorly the United States is doing compared to the rest of the industrialized world. I hope that these resources, and especially covid19db, might be useful to others who would like to analyze the data. The code isn't the prettiest since it was done in a hurry, but I think that functionally this is useful.
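If you just want to poke at the data, the sqlite3 command-line tool is enough. This is only an illustrative sketch: the database filename, table, and column names below are assumptions, so check the covid19db documentation for the actual schema:
# Hypothetical schema; consult the covid19db README for the real table names.
sqlite3 covid19.db <<'SQL'
SELECT date, county, new_cases
FROM cases
WHERE state = 'KS'
ORDER BY date DESC
LIMIT 14;
SQL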

6 November 2017

John Goerzen: The Yellow House Phone Company (Featuring Asterisk and an 11-year-old)

"Well Jacob, do you think we should set up our own pretend phone company in the house?" "We can DO THAT?" "Yes!" "Then yes. Yes! YES YES YESYESYESYES YES! Let's do it, dad!"

Not long ago, my parents had dug up the old phone I used back in the day. We still have a landline, and Jacob was having fun discovering how an analog phone works. I told him about the special number he could call to get the time and temperature read out to him. He discovered what happens if you call your own number and hang up. He figured out how to play Mary Had a Little Lamb using touchtone keys (after a slightly concerned lecture from me setting out some rules to make sure his musical dialing wouldn't result in any, well, dialing). He was hooked. So I thought that taking it to the next level would be a good thing for a rainy day.

I have run Asterisk before, though I had unfortunately gotten rid of most of my equipment some time back. But I found a great deal on a Cisco 186 ATA (Analog Telephone Adapter). It has two FXS lines (FXS ports simulate the phone company, and provide dialtone and ring voltage to a connected phone), and of course hooks up to the LAN. We plugged that in, and Jacob was amazed to see its web interface come up. I had to figure out how to configure it (unfortunately, it uses SCCP rather than SIP, and figuring out Asterisk's chan_skinny took some doing, but we got there).

I set up voicemail. He loved it. He promptly figured out how to record his own greetings. We set up a second phone on the other line, so he could call between them. The cordless phones in our house support SIP, so I configured one of them as a third line. He spent a long time leaving himself messages.

Pretty soon we both started having ideas. I set up extension 777, where he could call for the time. Then he wanted a way to get the weather forecast. Well, weather-util generates a text-based report. With it, a little sed and grep tweaking, the espeak TTS engine, and a little help from sox, I had a shell script worked up that would read back a forecast whenever he called a certain extension (a rough sketch of that kind of script appears at the end of this post). He was super excited! "That's great, dad! Can it also read weather alerts too?" Sure! weather-util has a nice option just for that. Both boys cackled as the system tried to read out the NWS header (their timestamps like 201711031258 started with "two hundred one billion"). Then I found an online source for streaming NOAA Weather Radio feeds (Jacob enjoys listening to weather radio), and I set up another extension he could call to listen to that. More delight!

But it really took off when I asked him, "Would you like to record your own menu?" "You mean those things where it says press 1 or 2 for this or that?" "Yes." "WE CAN DO THAT?" "Oh yes!" "YES, LET'S DO IT RIGHT NOW!" So he recorded a menu, then came and hovered by me while I hacked up extensions.conf, then eagerly went back to the phone to try it. Oh the excitement of hearing his own voice, and finding that it worked! Pretty soon he was designing sub-menus ("OK Dad, so we'll set it up so people can press 2 for the weather, and then choose if they want weather radio or the weather report. I'm recording that now. Got it?")

He has informed me that next Saturday we will build an intercom system like we have at school. I'm going to have to have some ideas on how to tie Squeezebox in with Asterisk to make that happen, I think. Maybe this will do.
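For the curious, the weather-reading extension described above could be approximated with something like the script below. This is only a sketch under assumptions: the station argument, the sed cleanup, and the output path are illustrative, and Asterisk still needs a dialplan entry that plays the resulting file:
#!/bin/bash
# Fetch a text forecast, tidy it, speak it with espeak, and convert it to a
# format Asterisk can play.  All specifics here are illustrative.
set -e
set -o pipefail
# "weather" is the weather-util CLI; KICT is a placeholder station/alias.
weather --forecast KICT \
  | grep -v '^$' \
  | sed 's/\.\.\./. /g' \
  | espeak --stdout \
  | sox -t wav - -r 8000 -c 1 /var/lib/asterisk/sounds/custom/forecast.gsm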

22 August 2017

John Goerzen: The Eclipse

Highway US-81 in northern Kansas and southern Nebraska is normally a pleasant, sleepy sort of drive. It was upgraded to a 4-lane road not too long ago, but as far as 4-lane roads go, its traffic is typically light. For drives from Kansas to South Dakota, it makes a pleasant route.

Yesterday was eclipse day. I strongly suspect that highway 81 had more traffic that day than it ever has before, or ever will again. For nearly the entire 3-hour drive to Geneva, NE, it was packed, though mostly still moving at a good speed. And for our entire drive back, highway 81 and every other southbound road we used was so full it felt like rush hour in Dallas. (Well, not quite. Traffic was still moving.) I believe scenes like this were played out across the continent.

I've been taking a lot of photos, and writing about our new baby Martha lately. Now it's time to write a bit about some more adventures with Jacob and Oliver; they're now in third and fifth grades in school.

We had been planning to fly, and airports I called were either full, or were planning to park planes in the grass, or even shut down some runways to use for parking. The airport in the little town of Beatrice, NE (which I had visited twice before) was even going to have a temporary FAA control tower. At the last minute, due to some storm activity near home at departure time, we unloaded the plane and drove instead.

The atmosphere at the fairgrounds in Geneva was festive. One family had brought bubbles for their kids and extras to share. I had bought the boys a book about the eclipse, which they were reading before and during the event. They were both great, safe users of their eclipse glasses. Jacob caught a toad, and played with it for awhile. He wanted to bring it home with us, but I convinced him to let me take a picture of him with his toad friend instead.

While we were waiting for totality, a number of buses from the local school district arrived. So by the time the big moment arrived, we could hear the distant roar of delight and applause from the school children gathered at the far end of the field, plus all the excitement nearby. Both boys were absolutely ecstatic to be witnessing it (and so was I!). "Wow!" "Awesome!" and simple cackles of delight were heard. On the drive home, they both kept talking about how amazing it was, and how it was once in a lifetime.

We enjoyed our eclipse neighbors: the woman from San Antonio next to us, the surprise discovery of another family from just a few miles from us parked two cars down, even running into relatives at a restaurant on the way home. The applause from all around when it started and when it ended. And the feeling, which is hard to describe, of awe and amazement at the wonders of our world and our universe. There are many problems with the world right now, but somehow there's something right about people coming together from all over to enjoy it.

10 August 2017

John Goerzen: A new baby and deep smiles

A month ago, we were waiting for our new baby; time seemed to stand still. Now she is here! Martha Goerzen was born recently, and she is doing well and growing! Laura and I have enjoyed moments of cuddling her, watching her stare at our faces, hearing her (hopefully) soft sounds as she falls asleep in our arms. It is also heart-warming to see Martha's older brothers take such an interest in her. Here is the first time Jacob got to hold her. Oliver, who is a boy very much into sports, play involving police and firefighters, and such, has started adding "aww" and "she's so cute!" to his common vocabulary. He can be very insistent about interrupting me to hold her, too.

9 June 2017

John Goerzen: Fixing the Problems with Docker Images

I recently wrote about the challenges in securing Docker container contents, and in particular with keeping up-to-date with security patches from all over the Internet. Today I want to fix that. Besides security, there is a second problem: the common way of running things in Docker pretends to provide a traditional POSIX API and environment, but really doesn't. This is a big deal.

Before diving into that, I want to explain something: I have often heard it said that Docker provides single-process containers. This is unambiguously false in almost every case. Any time you have a shell script inside Docker that calls cp or even ls, you are running a second process. Web servers from Apache to whatever else use processes or threads of various types to service multiple connections at once. Many Docker containers are single-application, but a process is a core part of the POSIX API, and very little software would work if it were limited to a single process. So this is my little plea for more precise language. OK, soapbox mode off.

Now then, in a traditional Linux environment, besides your application, there are other key components of the system. These are usually missing in Docker containers. So today, I will fix this also. In my docker-debian-base images, I have prepared a system that still has only 11MB RAM overhead, makes minimal changes on top of Debian, and yet provides a very complete environment and API. That goes into my minimal image; additional images add layers of features on top of it. All of this, including the optional features, has an 11MB overhead on start. Not bad for so much, right?

From here, you can layer on top all your usual Dockery things. You can still run one application per container. But you can now make sure your disk doesn't fill up from logs, run your database vacuuming commands at will, have your blog download its RSS feeds every few minutes, etc., all from within the container, as it should be. Furthermore, you don't have to reinvent the wheel, because Debian already ships with things to take care of a lot of this out of the box, and now those tools will just work. There is some popular work done in this area already by phusion's baseimage-docker. However, I made my own.

Finally, a word on the choice to use sysvinit. It would have been simpler to use systemd here, since it is the default in Debian these days. Unfortunately, systemd requires you to poke some holes in the Docker security model, as well as mount a cgroups filesystem from the host. I didn't consider this acceptable, and sysvinit ran without these workarounds, so I went with it.

With all this, Docker becomes a viable replacement for KVM for various services on my internal networks. I'll be writing about that later.
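As a rough illustration of the intended workflow, you can start one of the base images and see the usual Debian machinery running inside it. The image name below is an assumption; check the docker-debian-base documentation for the actual tags:
# Hypothetical image name; consult the docker-debian-base docs for the real one.
docker run -td --name mybase jgoerzen/debian-base-standard
# init, cron, and syslog should be running inside, so periodic jobs and
# log rotation work as they would on a normal Debian system.
docker exec -it mybase ps aux
docker exec -it mybase ls /etc/cron.d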
