Search Results: "John Goerzen"

3 January 2024

John Goerzen: Live Migrating from Raspberry Pi OS bullseye to Debian bookworm

I ve been getting annoyed with Raspberry Pi OS (Raspbian) for years now. It s a fork of Debian, but manages to omit some of the most useful things. So I ve decided to migrate all of my Pis to run pure Debian. These are my reasons:
  1. Raspberry Pi OS has, for years now, specified that there is no upgrade path. That is, to get to a newer major release, it s a reinstall. While I have sometimes worked around this, for a device that is frequently installed in hard-to-reach locations, this is even more important than usual. It s common for me to upgrade machines for a decade or more across Debian releases and there s no reason that it should be so much more difficult with Raspbian.
  2. As I noted in Consider Security First, the security situation for Raspberry Pi OS isn t as good as it is with Debian.
  3. Raspbian lags behind Debian often times by 6 months or more for major releases, and days or weeks for bug fixes and security patches.
  4. Raspbian has no direct backports support, though Raspberry Pi 3 and above can use Debian s backports (per my instructions as Installing Debian Backports on Raspberry Pi)
  5. Raspbian uses a custom kernel without initramfs support
It turns out it is actually possible to do an in-place migration from Raspberry Pi OS bullseye to Debian bookworm. Here I will describe how. Even if you don t have a Raspberry Pi, this might still be instructive on how Raspbian and Debian packages work.

WARNINGS Before continuing, back up your system. This process isn t for the neophyte and it is entirely possible to mess up your boot device to the point that you have to do a fresh install to get your Pi to boot. This isn t a supported process at all.

Architecture Confusion Debian has three ARM-based architectures:
  • armel, for the lowest-end 32-bit ARM devices without hardware floating point support
  • armhf, for the higher-end 32-bit ARM devices with hardware float (hence hf )
  • arm64, for 64-bit ARM devices (which all have hardware float)
Although the Raspberry Pi 0 and 1 do support hardware float, they lack support for other CPU features that Debian s armhf architecture assumes. Therefore, the Raspberry Pi 0 and 1 could only run Debian s armel architecture. Raspberry Pi 3 and above are capable of running 64-bit, and can run both armhf and arm64. Prior to the release of the Raspberry Pi 5 / Raspbian bookworm, Raspbian only shipped the armhf architecture. Well, it was an architecture they called armhf, but it was different from Debian s armhf in that everything was recompiled to work with the more limited set of features on the earlier Raspberry Pi boards. It was really somewhere between Debian s armel and armhf archs. You could run Debian armel on those, but it would run more slowly, due to doing floating point calculations without hardware support. Debian s raspi FAQ goes into this a bit. What I am going to describe here is going from Raspbian armhf to Debian armhf with a 64-bit kernel. Therefore, it will only work with Raspberry Pi 3 and above. It may theoretically be possible to take a Raspberry Pi 2 to Debian armhf with a 32-bit kernel, but I haven t tried this and it may be more difficult. I have seen conflicting information on whether armhf really works on a Pi 2. (If you do try it on a Pi 2, ignore everything about arm64 and 64-bit kernels below, and just go with the linux-image-armmp-lpae kernel per the ARMMP page) There is another wrinkle: Debian doesn t support running 32-bit ARM kernels on 64-bit ARM CPUs, though it does support running a 32-bit userland on them. So we will wind up with a system with kernel packages from arm64 and everything else from armhf. This is a perfectly valid configuration as the arm64 like x86_64 is multiarch (that is, the CPU can natively execute both the 32-bit and 64-bit instructions). (It is theoretically possible to crossgrade a system from 32-bit to 64-bit userland, but that felt like a rather heavy lift for dubious benefit on a Pi; nevertheless, if you want to make this process even more complicated, refer to the CrossGrading page.)

Prerequisites and Limitations In addition to the need for a Raspberry Pi 3 or above in order for this to work, there are a few other things to mention. If you are using the GPIO features of the Pi, I don t know if those work with Debian. I think Raspberry Pi OS modified the desktop environment more than other components. All of my Pis are headless, so I don t know if this process will work if you use a desktop environment. I am assuming you are booting from a MicroSD card as is typical in the Raspberry Pi world. The Pi s firmware looks for a FAT partition (MBR type 0x0c) and looks within it for boot information. Depending on how long ago you first installed an OS on your Pi, your /boot may be too small for Debian. Use df -h /boot to see how big it is. I recommend 200MB at minimum. If your /boot is smaller than that, stop now (or use some other system to shrink your root filesystem and rearrange your partitions; I ve done this, but it s outside the scope of this article.) You need to have stable power. Once you begin this process, your pi will mostly be left in a non-bootable state until you finish. (You did make a backup, right?)

Basic idea The basic idea here is that since bookworm has almost entirely newer packages then bullseye, we can just switch over to it and let the Debian packages replace the Raspbian ones as they are upgraded. Well, it s not quite that easy, but that s the main idea.

Preparation First, make a backup. Even an image of your MicroSD card might be nice. OK, I think I ve said that enough now. It would be a good idea to have a HDMI cable (with the appropriate size of connector for your particular Pi board) and a HDMI display handy so you can troubleshoot any bootup issues with a console.

Preparation: access The Raspberry Pi OS by default sets up a user named pi that can use sudo to gain root without a password. I think this is an insecure practice, but assuming you haven t changed it, you will need to ensure it still works once you move to Debian. Raspberry Pi OS had a patch in their sudo package to enable it, and that will be removed when Debian s sudo package is installed. So, put this in /etc/sudoers.d/010_picompat:
pi ALL=(ALL) NOPASSWD: ALL
Also, there may be no password set for the root account. It would be a good idea to set one; it makes it easier to log in at the console. Use the passwd command as root to do so.

Preparation: bluetooth Debian doesn t correctly identify the Bluetooth hardware address. You can save it off to a file by running hcitool dev > /root/bluetooth-from-raspbian.txt. I don t use Bluetooth, but this should let you develop a script to bring it up properly.

Preparation: Debian archive keyring You will next need to install Debian s archive keyring so that apt can authenticate packages from Debian. Go to the bookworm download page for debian-archive-keyring and copy the URL for one of the files, then download it on the pi. For instance:
wget http://http.us.debian.org/debian/pool/main/d/debian-archive-keyring/debian-archive-keyring_2023.3+deb12u1_all.deb
Use sha256sum to verify the checksum of the downloaded file, comparing it to the package page on the Debian site. Now, you ll install it with:
dpkg -i debian-archive-keyring_2023.3+deb12u1_all.deb

Package first steps From here on, we are making modifications to the system that can leave it in a non-bootable state. Examine /etc/apt/sources.list and all the files in /etc/apt/sources.list.d. Most likely you will want to delete or comment out all lines in all files there. Replace them with something like:
deb http://deb.debian.org/debian/ bookworm main non-free-firmware contrib non-free
deb http://security.debian.org/debian-security bookworm-security main non-free-firmware contrib non-free
deb https://deb.debian.org/debian bookworm-backports main non-free-firmware contrib non-free
(you might leave off contrib and non-free depending on your needs) Now, we re going to tell it that we ll support arm64 packages:
dpkg --add-architecture arm64
And finally, download the bookworm package lists:
apt-get update
If there are any errors from that command, fix them and don t proceed until you have a clean run of apt-get update.

Moving /boot to /boot/firmware The boot FAT partition I mentioned above is mounted at /boot by Raspberry Pi OS, but Debian s scripts assume it will be at /boot/firmware. We need to fix this. First:
umount /boot
mkdir /boot/firmware
Now, edit fstab and change the reference to /boot to be to /boot/firmware. Now:
mount -v /boot/firmware
cd /boot/firmware
mv -vi * ..
This mounts the filesystem at the new location, and moves all its contents back to where apt believes it should be. Debian s packages will populate /boot/firmware later.

Installing the first packages Now we start by installing the first of the needed packages. Eventually we will wind up with roughly the same set Debian uses.
apt-get install linux-image-arm64
apt-get install firmware-brcm80211=20230210-5
apt-get install raspi-firmware
If you get errors relating to firmware-brcm80211 from any commands, run that install firmware-brcm80211 command and then proceed. There are a few packages that Raspbian marked as newer than the version in bookworm (whether or not they really are), and that s one of them.

Configuring the bootloader We need to configure a few things in /etc/default/raspi-firmware before proceeding. Edit that file. First, uncomment (or add) a line like this:
KERNEL_ARCH="arm64"
Next, in /boot/cmdline.txt you can find your old Raspbian boot command line. It will say something like:
root=PARTUUID=...
Save off the bit starting with PARTUUID. Back in /etc/default/raspi-firmware, set a line like this:
ROOTPART=PARTUUID=abcdef00
(substituting your real value for abcdef00). This is necessary because the microSD card device name often changes from /dev/mmcblk0 to /dev/mmcblk1 when switching to Debian s kernel. raspi-firmware will encode the current device name in /boot/firmware/cmdline.txt by default, which will be wrong once you boot into Debian s kernel. The PARTUUID approach lets it work regardless of the device name.

Purging the Raspbian kernel Run:
dpkg --purge raspberrypi-kernel

Upgrading the system At this point, we are going to run the procedure beginning at section 4.4.3 of the Debian release notes. Generally, you will do:
apt-get -u upgrade
apt full-upgrade
Fix any errors at each step before proceeding to the next. Now, to remove some cruft, run:
apt-get --purge autoremove
Inspect the list to make sure nothing important isn t going to be removed.

Removing Raspbian cruft You can list some of the cruft with:
apt list '~o'
And remove it with:
apt purge '~o'
I also don t run Bluetooth, and it seemed to sometimes hang on boot becuase I didn t bother to fix it, so I did:
apt-get --purge remove bluez

Installing some packages This makes sure some basic Debian infrastructure is available:
apt-get install wpasupplicant parted dosfstools wireless-tools iw alsa-tools
apt-get --purge autoremove

Installing firmware Now run:
apt-get install firmware-linux

Resolving firmware package version issues If it gives an error about the installed version of a package, you may need to force it to the bookworm version. For me, this often happened with firmware-atheros, firmware-libertas, and firmware-realtek. Here s how to resolve it, with firmware-realtek as an example:
  1. Go to https://packages.debian.org/PACKAGENAME for instance, https://packages.debian.org/firmware-realtek. Note the version number in bookworm in this case, 20230210-5.
  2. Now, you will force the installation of that package at that version:
    apt-get install firmware-realtek=20230210-5
    
  3. Repeat with every conflicting package until done.
  4. Rerun apt-get install firmware-linux and make sure it runs cleanly.
Also, in the end you should be able to:
apt-get install firmware-atheros firmware-libertas firmware-realtek firmware-linux

Dealing with other Raspbian packages The Debian release notes discuss removing non-Debian packages. There will still be a few of those. Run:
apt list '?narrow(?installed, ?not(?origin(Debian)))'
Deal with them; mostly you will need to force the installation of a bookworm version using the procedure in the section Resolving firmware package version issues above (even if it s not for a firmware package). For non-firmware packages, you might possibly want to add --mark-auto to your apt-get install command line to allow the package to be autoremoved later if the things depending on it go away. If you aren t going to use Bluetooth, I recommend apt-get --purge remove bluez as well. Sometimes it can hang at boot if you don t fix it up as described above.

Set up networking We ll be switching to the Debian method of networking, so we ll create some files in /etc/network/interfaces.d. First, eth0 should look like this:
allow-hotplug eth0
iface eth0 inet dhcp
iface eth0 inet6 auto
And wlan0 should look like this:
allow-hotplug wlan0
iface wlan0 inet dhcp
    wpa-conf /etc/wpa_supplicant/wpa_supplicant.conf
Raspbian is inconsistent about using eth0/wlan0 or renamed interface. Run ifconfig or ip addr. If you see a long-named interface such as enx<something> or wlp<something>, copy the eth0 file to the one named after the enx interface, or the wlan0 file to the one named after the wlp interface, and edit the internal references to eth0/wlan0 in this new file to name the long interface name. If using wifi, verify that your SSIDs and passwords are in /etc/wpa_supplicant/wpa_supplicant.conf. It should have lines like:
network= 
   ssid="NetworkName"
   psk="passwordHere"
 
(This is where Raspberry Pi OS put them).

Deal with DHCP Raspberry Pi OS used dhcpcd, whereas bookworm normally uses isc-dhcp-client. Verify the system is in the correct state:
apt-get install isc-dhcp-client
apt-get --purge remove dhcpcd dhcpcd-base dhcpcd5 dhcpcd-dbus

Set up LEDs To set up the LEDs to trigger on MicroSD activity as they did with Raspbian, follow the Debian instructions. Run apt-get install sysfsutils. Then put this in a file at /etc/sysfs.d/local-raspi-leds.conf:
class/leds/ACT/brightness = 1
class/leds/ACT/trigger = mmc1

Prepare for boot To make sure all the /boot/firmware files are updated, run update-initramfs -u. Verify that root in /boot/firmware/cmdline.txt references the PARTUUID as appropriate. Verify that /boot/firmware/config.txt contains the lines arm_64bit=1 and upstream_kernel=1. If not, go back to the section on modifying /etc/default/raspi-firmware and fix it up.

The moment arrives Cross your fingers and try rebooting into your Debian system:
reboot
For some reason, I found that the first boot into Debian seems to hang for 30-60 seconds during bootstrap. I m not sure why; don t panic if that happens. It may be necessary to power cycle the Pi for this boot.

Troubleshooting If things don t work out, hook up the Pi to a HDMI display and see what s up. If I anticipated a particular problem, I would have documented it here (a lot of the things I documented here are because I ran into them!) So I can t give specific advice other than to watch boot messages on the console. If you don t even get kernel messages going, then there is some problem with your partition table or /boot/firmware FAT partition. Otherwise, you ve at least got the kernel going and can troubleshoot like usual from there.

John Goerzen: Consider Security First

I write this in the context of my decision to ditch Raspberry Pi OS and move everything I possibly can, including my Raspberry Pi devices, to Debian. I will write about that later. But for now, I wanted to comment on something I think is often overlooked and misunderstood by people considering distributions or operating systems: the huge importance of getting security updates in an automated and easy way.

Background Let s assume that these statements are true, which I think are well-supported by available evidence:
  1. Every computer system (OS plus applications) that can do useful modern work has security vulnerabilities, some of which are unknown at any given point in time;
  2. During the lifetime of that computer system, some of these vulnerabilities will be discovered. For a (hopefully large) subset of those vulnerabilities, timely patches will become available.
Now then, it follows that applying those timely patches is a critical part of having a system that it as secure as possible. Of course, you have to do other things as well good passwords, secure practices, etc but, fundamentally, if your system lacks patches for known vulnerabilities, you ve already lost at the security ballgame.

How to stay patched There is something of a continuum of how you might patch your system. It runs roughly like this, from best to worst:
  1. All components are kept up-to-date automatically, with no intervention from the user/operator
  2. The operator is automatically alerted to necessary patches, and they can be easily installed with minimal intervention
  3. The operator is automatically alerted to necessary patches, but they require significant effort to apply
  4. The operator has no way to detect vulnerabilities or necessary patches
It should be obvious that the first situation is ideal. Every other situation relies on the timeliness of human action to keep up-to-date with security patches. This is a fallible situation; humans are busy, take trips, dismiss alerts, miss alerts, etc. That said, it is rare to find any system living truly all the way in that scenario, as you ll see.

What is your system ? A critical point here is: what is your system ? It includes:
  • Your kernel
  • Your base operating system
  • Your applications
  • All the libraries needed to run all of the above
Some OSs, such as Debian, make little or no distinction between the base OS and the applications. Others, such as many BSDs, have a distinction there. And in some cases, people will compile or install applications outside of any OS mechanism. (It must be stressed that by doing so, you are taking the responsibility of patching them on your own shoulders.)

How do common systems stack up?
  • Debian, with its support for unattended-upgrades, needrestart, debian-security-support, and such, is largely category 1. It can automatically apply security patches, in most cases can restart the necessary services for the patch to take effect, and will alert you when some processes or the system must be manually restarted for a patch to take effect (for instance, a kernel update). Those cases requiring manual intervention are category 2. The debian-security-support package will even warn you of gaps in the system. You can also use debsecan to scan for known vulnerabilities on a given installation.
  • FreeBSD has no way to automatically install security patches for things in the packages collection. As with many rolling-release systems, you can t automate the installation of these security patches with FreeBSD because it is not safe to blindly update packages. It s not safe to blindly update packages because they may bring along more than just security patches: they may represent major upgrades that introduce incompatibilities, etc. Unlike Debian s practice of backporting fixes and thus producing narrowly-tailored patches, forcing upgrades to newer versions precludes a minimal intervention install. Therefore, rolling release systems are category 3.
  • Things such as Snap, Flatpak, AppImage, Docker containers, Electron apps, and third-party binaries often contain embedded libraries and such for which you have no easy visibility into their status. For instance, if there was a bug in libpng, would you know how many of your containers had a vulnerability? These systems are category 4 you don t even know if you re vulnerable. It s for this reason that my Debian-based Docker containers apply security patches before starting processes, and also run unattended-upgrades and friends.

The pernicious library problem As mentioned in my last category above, hidden vulnerabilities can be a big problem. I ve been writing about this for years. Back in 2017, I wrote an article focused on Docker containers, but which applies to the other systems like Snap and so forth. I cited a study back then that Over 80% of the :latest versions of official images contained at least one high severity vulnerability. The situation is no better now. In December 2023, it was reported that, two years after the critical Log4Shell vulnerability, 25% of apps were still vulnerable to it. Also, only 21% of developers ever update third-party libraries after introducing them into their projects. Clearly, you can t rely on these images with embedded libraries to be secure. And since they are black box, they are difficult to audit. Debian s policy of always splitting libraries out from packages is hugely beneficial; it allows finegrained analysis of not just vulnerabilities, but also the dependency graph. If there s a vulnerability in libpng, you have one place to patch it and you also know exactly what components of your system use it. If you use snaps, or AppImages, you can t know if they contain a deeply embedded vulnerability, nor could you patch it yourself if you even knew. You are at the mercy of upstream detecting and remedying the problem a dicey situation at best.

Who makes the patches? Fundamentally, humans produce security patches. Often, but not always, patches originate with the authors of a program and then are integrated into distribution packages. It should be noted that every security team has finite resources; there will always be some CVEs that aren t patched in a given system for various reasons; perhaps they are not exploitable, or are too low-impact, or have better mitigations than patches. Debian has an excellent security team; they manage the process of integrating patches into Debian, produce Debian Security Advisories, maintain the Debian Security Tracker (which maintains cross-references with the CVE database), etc. Some distributions don t have this infrastructure. For instance, I was unable to find this kind of tracker for Devuan or Raspberry Pi OS. In contrast, Ubuntu and Arch Linux both seem to have active security teams with trackers and advisories.

Implications for Raspberry Pi OS and others As I mentioned above, I m transitioning my Pi devices off Raspberry Pi OS (Raspbian). Security is one reason. Although Raspbian is a fork of Debian, and you can install packages like unattended-upgrades on it, they don t work right because they use the Debian infrastructure, and Raspbian hasn t modified them to use their own infrastructure. I don t see any Raspberry Pi OS security advisories, trackers, etc. In short, they lack the infrastructure to support those Debian tools anyhow. Not only that, but Raspbian lags behind Debian in both new releases and new security patches, sometimes by days or weeks. Live Migrating from Raspberry Pi OS bullseye to Debian bookworm contains instructions for migrating Raspberry Pis to Debian.

25 December 2023

John Goerzen: The Grumpy Cricket (And Other Enormous Creatures)

This Christmas, one of my gifts to my kids was a text adventure (interactive fiction) game for them. Now that they ve enjoyed it, I m releasing it under the GPL v3. As interactive fiction, it s like an e-book, but the reader is also the player, guiding the exploration of the world. The Grumpy Cricket is designed to be friendly for a first-time player of interactive fiction. There is no way to lose the game or to die. There is an in-game hint system providing context-sensitive hints anytime the reader types HINT. There are splashes of humor throughout that got all three of my kids laughing. I wrote it in 2023 for my kids, which range in age from 6 to 17. That s quite a wide range, but they all were enjoying it. You can download it, get the source, or play it online in a web browser at https://www.complete.org/the-grumpy-cricket/

30 November 2023

Russell Coker: Links November 2023

The Long Now has an insightful article about air quality [1]. Every country needs food labelling laws like Mexico has [2]. Also we need to abolish the investor state tribunals, companies should just accept local laws and obey them or be treated in the same way as pirates on the high seas. Ian Jackson wrote a good post about conference policies regarding Covid19 [3]. We really need to do more about this, conservatives like to imagine that it s gone away but people are still getting sick and dying of it. John Goerzen wrote an informative article about air gaps and ways they can be part of a useful and usable security system [4]. This YouTube video has a good introduction to LLMs (Large Languge Models) for machine learning [5]. This eye tracker is interesting technology [6]. The video shows it being used for MS Flight Simulator but it can be used for other things. Unfortunately the price of about $550 Australian puts it out of range of a lot of free software work. I think this would be good for tracking the user FOR THEIR BENEFIT so that notifications won t be delivered when the user is concentrating. This ABC article about the risk of a past Covid19 infection exacerbating or accelerating Parkinson s or Alzheimer s is a worry [7]. Sam Hartman wrote an insightful blog post about AI safety, consent, and discussions of sex [8].

14 November 2023

John Goerzen: It s More Important To Recognize What Direction People Are Moving Than Where They Are

I recently read a post on social media that went something like this (paraphrased): If you buy an EV, you re part of the problem. You re advancing car culture and are actively hurting the planet. The only ethical thing to do is ditch your cars and put all your effort into supporting transit. Anything else is worthless. There is some truth there; supporting transit in areas it makes sense is better than having more cars, even EVs. But of course the key here is in areas it makes sense. My road isn t even paved. I live miles from the nearest town. And get into the remote regions of the western USA and you ll find people that live 40 miles from the nearest neighbor. There s no realistic way that mass transit is ever going to be a thing in these areas. And even if it were somehow usable, sending buses over miles where nobody lives just to reach the few that are there will be worse than private EVs. And because I can hear this argument coming a mile away, no, it doesn t make sense to tell these people to just not live in the country because the planet won t support that anymore, because those people are literally the ones that feed the ones that live in the cities. The funny thing is: the person that wrote that shares my concerns and my goals. We both care deeply about climate change. We both want positive change. And I, ahem, recently bought an EV. I have seen this play out in so many ways over the last few years. Drive a car? Get yelled at. Support the wrong politician? Get a shunning. Not speak up loudly enough about the right politician? That s a yellin too. The problem is, this doesn t make friends. In fact, it hurts the cause. It doesn t recognize this truth:
It is more important to recognize what direction people are moving than where they are.
I support trains and transit. I ve donated money and written letters to politicians. But, realistically, there will never be transit here. People in my county are unable to move all the way to transit. But what can we do? Plenty. We bought an EV. I ve been writing letters to the board of our local electrical co-op advocating for relaxation of rules around residential solar installations, and am planning one myself. It may well be that our solar-powered transportation winds up having a lower carbon footprint than the poster s transit use. Pick your favorite cause. Whatever it is, consider your strategy: What do you do with someone that is very far away from you, but has taken the first step to move an inch in your direction? Do you yell at them for not being there instantly? Or do you celebrate that they have changed and are moving?

15 September 2023

John Goerzen: How Gapped is Your Air?

Sometimes we want better-than-firewall security for things. For instance:
  1. An industrial control system for a municipal water-treatment plant should never have data come in or out
  2. Or, a variant of the industrial control system: it should only permit telemetry and monitoring data out, and nothing else in or out
  3. A system dedicated to keeping your GPG private keys secure should only have material to sign (or decrypt) come in, and signatures (or decrypted data) go out
  4. A system keeping your tax records should normally only have new records go in, but may on occasion have data go out (eg, to print a copy of an old record)
In this article, I ll talk about the high side (the high-security or high-sensitivity systems) and the low side (the lower-sensitivity or general-purpose systems). For the sake of simplicity, I ll assume the high side is a single machine, but it could as well be a whole network. Let s focus on examples 3 and 4 to make things simpler. Let s consider the primary concern to be data exfiltration (someone stealing your data), with a secondary concern of data integrity (somebody modifying or destroying your data). You might think the safest possible approach is Airgapped that is, there is literal no physical network connection to the machine at all. This help! But then, the problem becomes: how do we deal with the inevitable need to legitimately get things on or off of the system? As I wrote in Dead USB Drives Are Fine: Building a Reliable Sneakernet, by using tools such as NNCP, you can certainly create a sneakernet : using USB drives as transport. While this is a very secure setup, as with most things in security, it s less than perfect. The Wikipedia airgap article discusses some ways airgapped machines can still be exploited. It mentions that security holes relating to removable media have been exploited in the past. There are also other ways to get data out; for instance, Debian ships with gensio and minimodem, both of which can transfer data acoustically. But let s back up and think about why we think of airgapped machines as so much more secure, and what the failure modes of other approaches might be.

What about firewalls? You could very easily set up high-side machine that is on a network, but is restricted to only one outbound TCP port. There could be a local firewall, and perhaps also a special port on an external firewall that implements the same restrictions. A variant on this approach would be two computers connected directly by a crossover cable, though this doesn t necessarily imply being more secure. Of course, the concern about a local firewall is that it could potentially be compromised. An external firewall might too; for instance, if your credentials to it were on a machine that got compromised. This kind of dual compromise may be unlikely, but it is possible. We can also think about the complexity in a network stack and firewall configuration, and think that there may be various opportunities to have things misconfigured or buggy in a system of that complexity. Another consideration is that data could be sent at any time, potentially making it harder to detect. On the other hand, network monitoring tools are commonplace. On the other hand, it is convenient and cheap. I use a system along those lines to do my backups. Data is sent, gpg-encrypted and then encrypted again at the NNCP layer, to the backup server. The NNCP process on the backup server runs as an untrusted user, and dumps the gpg-encrypted files to a secure location that is then processed by a cron job using Filespooler. The backup server is on a dedicated firewall port, with a dedicated subnet. The only ports allowed out are for NNCP and NTP, and offsite backups. There is no default gateway. Not even DNS is permitted out (the firewall does the appropriate redirection). There is one pinhole allowed out, where a subset of the backup data is sent offsite. I initially used USB drives as transport, and it had no network connection at all. But there were disadvantages to doing this for backups particularly that I d have no backups for as long as I d forget to move the drives. The backup system also would have clock drift, and the offsite backup picture was more challenging. (The clock drift was a problem because I use 2FA on the system; a password, plus a TOTP generated by a Yubikey) This is pretty good security, I d think. What are the weak spots? Well, if there were somehow a bug in the NNCP client, and the remote NNCP were compromised, that could lead to a compromise of the NNCP account. But this itself would accomplish little; some other vulnerability would have to be exploited on the backup server, because the NNCP account can t see plaintext data at all. I use borgbackup to send a subset of backup data offsite over ssh. borgbackup has to run as root to be able to access all the files, but the ssh it calls runs as a separate user. A ssh vulnerability is therefore unlikely to cause much damage. If, somehow, the remote offsite system were compromised and it was able to exploit a security issue in the local borgbackup, that would be a problem. But that sounds like a remote possibility. borgbackup itself can t even be used over a sneakernet since it is not asynchronous. A more secure solution would probably be using something like dar over NNCP. This would eliminate the ssh installation entirely, and allow a complete isolation between the data-access and the communication stacks, and notably not require bidirectional communication. Logic separation matters too. My Roundup of Data Backup and Archiving Tools may be helpful here. Other attack vectors could be a vulnerability in the kernel s networking stack, local root exploits that could be combined with exploiting NNCP or borgbackup to gain root, or local misconfiguration that makes the sandboxes around NNCP and borgbackup less secure. Because this system is in my basement in a utility closet with no chairs and no good place for a console, I normally manage it via a serial console. While it s a dedicated line between the system and another machine, if the other machine is compromised or an adversary gets access to the physical line, credentials (and perhaps even data) could leak, albeit slowly. But we can do much better with serial lines. Let s take a look.

Serial lines Some of us remember RS-232 serial lines and their once-ubiquitous DB-9 connectors. Traditionally, their speed maxxed out at 115.2Kbps. Serial lines have the benefit that they can be a direct application-to-application link. In my backup example above, a serial line could directly link the NNCP daemon on one system with the NNCP caller on another, with no firewall or anything else necessary. It is simply up to those programs to open the serial device appropriately. This isn t perfect, however. Unlike TCP over Ethernet, a serial line has no inherent error checking. Modern programs such as NNCP and ssh assume that a lower layer is making the link completely clean and error-free for them, and will interpret any corruption as an attempt to tamper and sever the connection. However, there is a solution to that: gensio. In my page Using gensio and ser2net, I discuss how to run NNCP and ssh over gensio. gensio is a generic framework that can add framing, error checking, and retransmit to an unreliable link such as a serial port. It can also add encryption and authentication using TLS, which could be particularly useful for applications that aren t already doing that themselves. More traditional solutions for serial communications have their own built-in error correction. For instance, UUCP and Kermit both were designed in an era of noisy serial lines and might be an excellent fit for some use cases. The ZModem protocol also might be, though it offers somewhat less flexibility and automation than Kermit. I have found that certain USB-to-serial adapters by Gearmo will actually run at up to 2Mbps on a serial line! Look for the ones on their spec pages with a FTDI chipset rated at 920Kbps. It turns out they can successfully be driven faster, especially if gensio s relpkt is used. I ve personally verified 2Mbps operation (Linux port speed 2000000) on Gearmo s USA-FTDI2X and the USA-FTDI4X. (I haven t seen any single-port options from Gearmo with the 920Kbps chipset, but they may exist). Still, even at 2Mbps, speed may well be a limiting factor with some applications. If what you need is a console and some textual or batch data, it s probably fine. If you are sending 500GB backup files, you might look for something else. In theory, this USB to RS-422 adapter should work at 10Mbps, but I haven t tried it. But if the speed works, running a dedicated application over a serial link could be a nice and fairly secure option. One of the benefits of the airgapped approach is that data never leaves unless you are physically aware of transporting a USB stick. Of course, you may not be physically aware of what is ON that stick in the event of a compromise. This could easily be solved with a serial approach by, say, only plugging in the cable when you have data to transfer.

Data diodes A traditional diode lets electrical current flow in only one direction. A data diode is the same concept, but for data: a hardware device that allows data to flow in only one direction. This could be useful, for instance, in the tax records system that should only receive data, or the industrial system that should only send it. Wikipedia claims that the simplest kind of data diode is a fiber link with transceivers connected in only one direction. I think you could go one simpler: a serial cable with only ground and TX connected at one end, wired to ground and RX at the other. (I haven t tried this.) This approach does have some challenges:
  • Many existing protocols assume a bidirectional link and won t be usable
  • There is a challenge of confirming data was successfully received. For a situation like telemetry, maybe it doesn t matter; another observation will come along in a minute. But for sending important documents, one wants to make sure they were properly received.
In some cases, the solution might be simple. For instance, with telemetry, just writing out data down the serial port in a simple format may be enough. For sending files, various mitigations, such as sending them multiple times, etc., might help. You might also look into FEC-supporting infrastructure such as blkar and flute, but these don t provide an absolute guarantee. There is no perfect solution to knowing when a file has been successfully received if the data communication is entirely one-way.

Audio transport I hinted above that minimodem and gensio both are software audio modems. That is, you could literally use speakers and microphones, or alternatively audio cables, as a means of getting data into or out of these systems. This is pretty limited; it is 1200bps, and often half-duplex, and could literally be disrupted by barking dogs in some setups. But hey, it s an option.

Airgapped with USB transport This is the scenario I began with, and named some of the possible pitfalls above as well. In addition to those, note also that USB drives aren t necessarily known for their error-free longevity. Be prepared for failure.

Concluding thoughts I wanted to lay out a few things in this post. First, that simply being airgapped is generally a step forward in security, but is not perfect. Secondly, that both physical and logical separation matter. And finally, that while tools like NNCP can make airgapped-with-USB-drive-transport a doable reality, there are also alternatives worth considering especially serial ports, firewalled hard-wired Ethernet, data diodes, and so forth. I think serial links, in particular, have been largely forgotten these days. Note: This article also appears on my website, where it may be periodically updated.

12 September 2023

John Goerzen: A Maze of Twisty Little Pixels, All Tiny

Two years ago, I wrote Managing an External Display on Linux Shouldn t Be This Hard. Happily, since I wrote that post, most of those issues have been resolved. But then you throw HiDPI into the mix and it all goes wonky. If you re running X11, basically the story is that you can change the scale factor, but it only takes effect on newly-launched applications (which means a logout/in because some of your applications you can t really re-launch). That is a problem if, like me, you sometimes connect an external display that is HiDPI, sometimes not, or your internal display is HiDPI but others aren t. Wayland is far better, supporting on-the-fly resizes quite nicely. I ve had two devices with HiDPI displays: a Surface Go 2, and a work-issued Thinkpad. The Surface Go 2 is my ultraportable Linux tablet. I use it sparingly at home, and rarely with an external display. I just put Gnome on it, in part because Gnome had better on-screen keyboard support at the time, and left it at that. On the work-issued Thinkpad, I really wanted to run KDE thanks to its tiling support (I wound up using bismuth with it). KDE was buggy with Wayland at the time, so I just stuck with X11 and ran my HiDPI displays at lower resolutions and lived with the fuzziness. But now that I have a Framework laptop with a HiDPI screen, I wanted to get this right. I tried both Gnome and KDE. Here are my observations with both: Gnome I used PaperWM with Gnome. PaperWM is a tiling manager with a unique horizontal ribbon approach. It grew on me; I think I would be equally at home, or maybe even prefer it, to my usual xmonad-style approach. Editing the active window border color required editing ~/.local/share/gnome-shell/extensions/paperwm@hedning:matrix.org/stylesheet.css and inserting background-color and border-color items in the paperwm-selection section. Gnome continues to have an absolutely terrible picture for configuring things. It has no less than four places to make changes (Settings, Tweaks, Extensions, and dconf-editor). In many cases, configuration for a given thing is split between Settings and Tweaks, and sometimes even with Extensions, and then there are sometimes options that are only visible in dconf. That is, where the Gnome people have even allowed something to be configurable. Gnome installs a power manager by default. It offers three options: performance, balanced, and saver. There is no explanation of the difference between them. None. What is it setting when I change the pref? A maximum frequency? A scaling governor? A balance between performance and efficiency cores? Not only that, but there s no way to tell it to just use performance when plugged in and balanced or saver when on battery. In an issue about adding that, a Gnome dev wrote We re not going to add a preference just because you want one . KDE, on the other hand, aside from not mucking with your system s power settings in this way, has a nice panel with on AC and on battery and you can very easily tweak various settings accordingly. The hostile attitude from the Gnome developers in that thread was a real turnoff. While Gnome has excellent support for Wayland, it doesn t (directly) support fractional scaling. That is, you can set it to 100%, 200%, and so forth, but no 150%. Well, unless you manage to discover that you can run gsettings set org.gnome.mutter experimental-features "['scale-monitor-framebuffer']" first. (Oh wait, does that make a FIFTH settings tool? Why yes it does.) Despite its name, that allows you to select fractional scaling under Wayland. For X11 apps, they will be blurry, a problem that is optional under KDE (more on that below). Gnome won t show the battery life time remaining on the task bar. Yikes. An extension might work in some cases. Not only that, but the Gnome battery icon frequently failed to indicate AC charging when AC was connected, a problem that didn t exist on KDE. Both Gnome and KDE support night light (warmer color temperatures at night), but Gnome s often didn t change when it should have, or changed on one display but not the other. The appindicator extension is pretty much required, as otherwise a number of applications (eg, Nextcloud) don t have their icon display anywhere. It does, however, generate a significant amount of log spam. There may be a fix for this. Unlike KDE, which has a nice inobtrusive popup asking what to do, Gnome silently automounts USB sticks when inserted. This is often wrong; for instance, if I m about to dd a Debian installer to it, I definitely don t want it mounted. I learned this the hard way. It is particularly annoying because in a GUI, there is no reason to mount a drive before the user tries to access it anyhow. It looks like there is a dconf setting, but then to actually mount a drive you have to open up Files (because OF COURSE Gnome doesn t have a nice removable-drives icon like KDE does) and it s a bunch of annoying clicks, and I didn t want to use the GUI file manager anyway. Same for unmounting; two clicks in KDE thanks to the task bar icon, but in Gnome you have to open up the file manager, unmount the drive, close the file manager again, etc. The ssh agent on Gnome doesn t start up for a Wayland session, though this is easily enough worked around. The reason I completely soured on Gnome is that after using it for awhile, I noticed my laptop fans spinning up. One core would be constantly busy. It was busy with a kworker events task, something to do with sound events. Logging out would resolve it. I believe it to be a Gnome shell issue. I could find no resolution to this, and am unwilling to tolerate the decreased battery life this implies. The Gnome summary: it looks nice out of the box, but you quickly realize that this is something of a paper-thin illusion when you try to actually use it regularly. KDE The KDE experience on Wayland was a little bit opposite of Gnome. While with Gnome, things start out looking great but you realize there are some serious issues (especially battery-eating), with KDE things start out looking a tad rough but you realize you can trivially fix them and wind up with a very solid system. Compared to Gnome, KDE never had a battery-draining problem. It will show me estimated battery time remaining if I want it to. It will do whatever I want it to when I insert a USB drive. It doesn t muck with my CPU power settings, and lets me easily define on AC vs on battery settings for things like suspend when idle. KDE supports fractional scaling, to any arbitrary setting (even with the gsettings thing above, Gnome still only supports it in 25% increments). Then the question is what to do with X11-only applications. KDE offers two choices. The first is Scaled by the system , which is also the only option for Gnome. With that setting, the X11 apps effectively run natively at 100% and then are scaled up within Wayland, giving them a blurry appearance on HiDPI displays. The advantage is that the scaling happens within Wayland, so the size of the app will always be correct even when the Wayland scaling factor changes. The other option is Apply scaling themselves , which uses native X11 scaling. This lets most X11 apps display crisp and sharp, but then if the system scaling changes, due to limitations of X11, you ll have to restart the X apps to get them to be the correct size. I appreciate the choice, and use Apply scaling by themselves because only a few of my apps aren t Wayland-aware. I did encounter a few bugs in KDE under Wayland: sddm, the display manager, would be slow to stop and cause a long delay on shutdown or reboot. This seems to be a known issue with sddm and Wayland, and is easily worked around by adding a systemd TimeoutStopSec. Konsole, the KDE terminal emulator, has weird display artifacts when using fractional scaling under Wayland. I applied some patches and rebuilt Konsole and then all was fine. The Bismuth tiling extension has some pretty weird behavior under Wayland, but a 1-character patch fixes it. On Debian, KDE mysteriously installed Pulseaudio instead of Debian s new default Pipewire, but that was easily fixed as well (and Pulseaudio also works fine). Conclusions I m sticking with KDE. Given that I couldn t figure out how to stop Gnome from deciding to eat enough battery to make my fan come on, the decision wasn t hard. But even if it weren t for that, I d have gone with KDE. Once a couple of things were patched, the experience is solid, fast, and flawless. Emacs (my main X11-only application) looks great with the self-scaling in KDE. Gimp, which I use occasionally, was terrible with the blurry scaling in Gnome. Update: Corrected the gsettings command

11 September 2023

John Goerzen: For the First Time In Years, I m Excited By My Computer Purchase

Some decades back, when I d buy a new PC, it would unlock new capabilities. Maybe AGP video, or a PCMCIA slot, or, heck, sound. Nowadays, mostly new hardware means things get a bit faster or less crashy, or I have some more space for files. It s good and useful, but sorta meh. Not this purchase. Cory Doctorow wrote about the Framework laptop in 2021:
There s no tape. There s no glue. Every part has a QR code that you can shoot with your phone to go to a service manual that has simple-to-follow instructions for installing, removing and replacing it. Every part is labeled in English, too! The screen is replaceable. The keyboard is replaceable. The touchpad is replaceable. Removing the battery and replacing it takes less than five minutes. The computer actually ships with a screwdriver.
Framework had been on my radar for awhile. But for various reasons, when I was ready to purchase, I didn t; either the waitlist was long, or they didn t have the specs I wanted. Lately my aging laptop with 8GB RAM started OOMing (running out of RAM). My desktop had developed a tendency to hard hang about once a month, and I researched replacing it, but the cost was too high to justify. But when I looked into the Framework, I thought: this thing could replace both. It is a real shift in perspective to have a laptop that is nearly as upgradable as a desktop, and can be specced out to exactly what I wanted: 2TB storage and 64GB RAM. And still cheaper than a Macbook or Thinkpad with far lower specs, because the Framework uses off-the-shelf components as much as possible. Cory Doctorow wrote, in The Framework is the most exciting laptop I ve ever broken:
The Framework works beautifully, but it fails even better Framework has designed a small, powerful, lightweight machine it works well. But they ve also designed a computer that, when you drop it, you can fix yourself. That attention to graceful failure saved my ass.
I like small laptops, so I ordered the Framework 13. I loaded it up with the 64GB RAM and 2TB SSD I wanted. Frameworks have four configurable ports, which are also hot-swappable. I ordered two USB-C, one USB-A, and one HDMI. I put them in my preferred spots (one USB-C on each side for easy docking and charging). I put Debian on it, and it all Just Worked. Perfectly. Now, I orderd the DIY version. I hesitated about this I HATE working with laptops because they re all so hard, even though I KNEW this one was different but went for it, because my preferred specs weren t available in a pre-assembled model. I m glad I did that, because assembly was actually FUN. I got my box. I opened it. There was the bottom shell with the motherboard and CPU installed. Here are the RAM sticks. There s the SSD. A minute or two with each has them installed. Put the bezel on the screen, attach the keyboard it has magnets to guide it into place and boom, ready to go. Less than 30 minutes to assemble a laptop nearly from scratch. It was easier than assembling most desktops. So now, for the first time, my main computing device is a laptop. Rather than having a desktop and a laptop, I just have a laptop. I ll be able to upgrade parts of it later if I want to. I can rearrange the ports. And I can take all my most important files with me. I m quite pleased!

4 August 2023

John Goerzen: Try the Last Internet Kermit Server

$ grep kermit /etc/services
kermit          1649/tcp
What is this mysterious protocol? Who uses it and what is its story? This story is a winding one, beginning in 1981. Kermit is, to the best of my knowledge, the oldest actively-maintained software package with an original developer still participating. It is also a scripting language, an Internet server, a (scriptable!) SSH client, and a file transfer protocol. And my first use of it was talking to my HP-48GX calculator over a 9600bps serial link. Yes, that calculator had a Kermit server built in. But let s back up and talk about serial ports and Modems.

Serial Ports and Modems In my piece The PC & Internet Revolution in Rural America, I recently talked about getting a modem what an excitement it was to get one! I realize that many people today have never used a serial line or a modem, so let s briefly discuss. Before Ethernet and Wifi took off in a big way, in the 1990s-2000s, two computers would talk to each other over a serial line and a modem. By modern standards, these were slow; 300bps was a common early speed. They also (at least in the beginning) had no kind of error checking. Characters could be dropped or changed. Sometimes even those speeds were faster than the receiving device could handle. Some serial links were 7-bit, and wouldn t even pass all 7-bit characters; for instance, sending a Ctrl-S could lock up a remote until you sent Ctrl-Q. And computers back in the 1970s and 1980s weren t as uniform as they are now. They used different character sets, different line endings, and even had different notions of what a file is. Today s notion of a file as whatever set of binary bytes an application wants it to be was by no means universal; some systems treated a file as a set of fixed-length records, for instance. So there were a lot of challenges in reliably moving files between systems. Kermit was introduced to reliably move files between systems using serial lines, automatically working around the varieties of serial lines, detecting errors and retransmitting, managing transmit speeds, and adapting between architectures as appropriate. Quite a task! And perhaps this explains why it was supported on a calculator with a primitive CPU by today s standards. Serial communication, by the way, is still commonplace, though now it isn t prominent in everyone s home PC setup. It s used a lot in industrial equipment, avionics, embedded systems, and so forth. The key point about serial lines is that they aren t inherently multiplexed or packetized. Whereas an Ethernet network is designed to let many dozens of applications use it at once, a serial line typically runs only one (unless it is something like PPP, which is designed to do multiplexing over the serial line). So it become useful to be able to both log in to a machine and transfer files with it. That is, incidentally, still useful today.

Kermit and XModem/ZModem I wondered: why did we end up with two diverging sets of protocols, created at about the same time? The Kermit website has the answer: essentially, BBSs could assume 8-bit clean connections, so XModem and ZModem had much less complexity to worry about. Kermit, on the other hand, was highly flexible. Although ZModem came out a few years before Kermit had its performance optimizations, by about 1993 Kermit was on par or faster than ZModem.

Beyond serial ports As LANs and the Internet came to be popular, people started to use telnet (and later ssh) to connect to remote systems, rather than serial lines and modems. FTP was an early way to transfer files across the Internet, but it had its challenges. Kermit added telnet support, as well as later support for ssh (as a wrapper around the ssh command you already know). Now you could easily log in to a machine and exchange files with it without missing a beat. And so it was that the Internet Kermit Service Daemon (IKSD) came into existence. It allows a person to set up a Kermit server, which can authenticate against local accounts or present anonymous access akin to FTP. And so I established the quux.org Kermit Server, which runs the Unix IKSD (part of the Debian ckermit package).

Trying Out the quux.org Kermit Server There are more instructions on the quux.org Kermit Server page! You can connect to it using either telnet or the kermit program. I won t duplicate all of the information here, but here s what it looks like to connect:
$ kermit
C-Kermit 10.0 Beta.08, 15 Dec 2022, for Linux+SSL (64-bit)
 Copyright (C) 1985, 2022,
  Trustees of Columbia University in the City of New York.
  Open Source 3-clause BSD license since 2011.
Type ? or HELP for help.
(/tmp/t/) C-Kermit>iksd /user:anonymous kermit.quux.org
 DNS Lookup...  Trying 135.148.101.37...  Reverse DNS Lookup... (OK)
Connecting to host glockenspiel.complete.org:1649
 Escape character: Ctrl-\ (ASCII 28, FS): enabled
Type the escape character followed by C to get back,
or followed by ? to see other options.
----------------------------------------------------

 >>> Welcome to the Internet Kermit Service at kermit.quux.org <<<

To log in, use 'anonymous' as the username, and any non-empty password

Internet Kermit Service ready at Fri Aug  4 22:32:17 2023
C-Kermit 10.0 Beta.08, 15 Dec 2022
kermit

Enter e-mail address as Password: [redacted]

Anonymous login.

You are now connected to the quux kermit server.

Try commands like HELP, cd gopher, dir, and the like.  Use INTRO
for a nice introduction.

(~/) IKSD>
You can even recursively download the entire Kermit mirror: over 1GB of files!

Conclusions So, have fun. Enjoy this experience from the 1980s. And note that Kermit also makes a better ssh client than ssh in a lot of ways; see ideas on my Kermit page. This page also has a permanent home on my website, where it may be periodically updated.

12 July 2023

John Goerzen: Backing Up and Archiving to Removable Media: dar vs. git-annex

This is the fourth in a series about archiving to removable media (optical discs such as BD-Rs and DVD+Rs or portable hard drives). Here are the first three parts: I want to state at the outset that this is not a general review of dar or git-annex. This is an analysis of how those tools stack up to a particular use case. Neither tool focuses on this use case, and I note it is particularly far from the more common uses of git-annex. For instance, both tools offer support for cloud storage providers and special support for ssh targets, but neither of those are in-scope for this post. Comparison Matrix As part of this project, I made a comparison matrix which includes not just dar and git-annex, but also backuppc, bacula/bareos, and borg. This may give you some good context, and also some reference for other projects in this general space. Reviewing the Goals I identified some goals in part 1. They are all valid. As I have thought through the project more, I feel like I should condense them into a simpler ordered list, with the first being the most important. I omit some things here that both dar and git-annex can do (updates/incrementals, for instance; see the expanded goals list in part 1). Here they are:
  1. The tool must not modify the source data in any way.
  2. It must be simple to create or update an archive. Processes that require a lot of manual work, are flaky, or are difficult to do correctly, are unlikely to be done correctly and often. If it s easy to do right, I m more likely to do it. Put another way: an archive never created can never be restored.
  3. The chances of a successful restore by someone that is not me, that doesn t know Linux, and is at least 10 years in the future, should be maximized. This implies a simple toolset, solid support for dealing with media errors or missing media, etc.
  4. Both a partial point-in-time restore and a full restore should be possible. The full restore must, at minimum, provide a consistent directory tree; that is, deletions, additions, and moves over time must be accurately reflected. Preserving modification times is a near-requirement, and preserving hard links, symbolic links, and other POSIX metadata is a significant nice-to-have.
  5. There must be a strategy to provide redundancy; for instance, a way for one set of archive discs to be offsite, another onsite, and the two to be periodically swapped.
  6. Use storage space efficiently.
Let s take a look at how the two stack up against these goals. Goal 1: Not modifying source data With dar, this is accomplished. dar --create does not modify source data (and even has a mode to avoid updating atime) so that s done. git-annex normally does modify source data, in that it typically replaces files with symlinks into its hash-indexed storage directory. It can instead use hardlinks. In either case, you will wind up with files that have identical content (but may have originally been separate, non-linked files) linked together with git-annex. This would cause me trouble, as well as run the risk of modifying timestamps. So instead of just storing my data under a git-annex repo as is its most common case, I use the directory special remote with importtree=yes to sort of import the data in. This, plus my desire to have the repos sensible and usable on non-POSIX operating systems, accounts for a chunk of the git-annex complexity you see here. You wouldn t normally see as much complexity with git-annex (though, as you will see, even without the directory special remote, dar still has less complexity). Winner: dar, though I demonstrated a working approach with git-annex as well. Goal 2: Simplicity of creating or updating an archive Let us simply start by recognizing this: Both tools have a lot of power, but I must say, it is easier to wrap my head around what dar is doing than what git-annex is doing. Everything dar does is with files: here are the files to archive, here is an archive file, here is a detached (isolated) catalog. It is very straightforward. It took me far less time to develop my dar page than my git-annex page, despite having existing familiarity with both tools. As I pointed out in part 2, I still don t fully understand how git-annex syncs metadata. Unsolved mysteries from that post include why the two git-annex drives had no idea what was on the other drives, and why the export operation silenty did nothing. Additionally, for the optical disc case, I had to create a restricted-size filesystem/dataset for git-annex to write into in order to get the desired size limit. Looking at the optical disc case, dar has a lot of nice infrastructure built in. With pause and execute, it can very easily be combined with disc burning operations. slice will automatically limit the size of a given slice, regardless of how much disk space is free, meaning that the git-annex tricks of creating smaller filesystems/datasets are unnecessary with dar. To create an initial full backup with dar, you just give it the size of the device, and it will automatically split up the archive, with hooks to integrate for burning or changing drives. About as easy as you could get. With git-annex, you would run the commands to have it fill up the initial filesystem, then burn the disc (or remove the drive), then run the commands to create another repo on the second filesystem, and so forth. With hard drives, with git-annex you would do something similar; let it fill up a repo on a drive, and if it exits with a space error, swap in the next. With dar, you would slice as with an optical disk. Dar s slicing is less convenient in this case, though, as it assumes every drive is the same size and yours may not be. You could work around that by using a slice size no bigger than the smallest drive, and putting multiple slices on larger drives if need be. If a single drive is large enough to hold your entire data set, though, you need not worry about this with either tool. Here s a warning about git-annex: it won t store anything beneath directories named .git. My use case doesn t have many of those. If your use case does, you re going to have to figure out what to do about it. Maybe rename them to something else while the backup runs? In any case, it is simply a fact that git-annex cannot back up git repositories, and this cuts against being able to back up things correctly. Another point is that git-annex has scalability concerns. If your archive set gets into the hundreds of thousands of files, you may need to split it into multiple distinct git-annex repositories. If this occurs and it will in my case it may serve to dull the shine of some of git-annex s features such as location tracking. A detour down the update strategies path Update strategies get a little more complicated with both. First, let s consider: what exactly should our update strategy be? For optical discs, I might consider doing a monthly update. I could burn a disc (or more than one, if needed) regardless of how much data is going to go onto it, because I want no more than a month s data lost in any case. An alternative might be to spool up data until I have a disc s worth, and then write that, but that could possibly mean months between actually burning a disc. Probably not good. For removable drives, we re unlikely to use a new drive each month. So there it makes sense to continue writing to the drive until it s full. Now we have a choice: do we write and preserve each month s updates, or do we eliminate intermediate changes and just keep the most recent data? With both tools, the monthly burn of an optical disc turns out to be very similar to the initial full backup to optical disc. The considerations for spanning multiple discs are the same. With both tools, we would presumably want to keep some metadata on the host so that we don t have to refer to a previous disc to know what was burned. In the dar case, that would be an isolated catalog. For git-annex, it would be a metadata-only repo. I illustrated both of these in parts 2 and 3. Now, for hard drives. Assuming we want to continue preserving each month s updates, with dar, we could just write an incremental to the drive each month. Assuming that the size of the incremental is likely far smaller than the size of the drive, you could easily enough do this. More fancily, you could look at the free space on the drive and tell dar to use that as the size of the first slice. For git-annex, you simply avoid calling drop/dropunused. This will cause the old versions of files to accumulate in .git/annex. You can get at them with git annex commands. This may imply some degree of elevated risk, as you are modifying metadata in the repo each month, which with dar you could chmod a-w or even chattr +i the archive files once written. Hopefully this elevated risk is low. If you don t want to preserve each month s updates, with dar, you could just write an incremental each month that is based on the previous drive s last backup, overwriting the previous. That implies some risk of drive failure during the time the overwrite is happening. Alternatively, you could write an incremental and then use dar to merge it into the previous incremental, creating a new one. This implies some degree of extra space needed (maybe on a different filesystem) while doing this. With git-annex, you would use drop/dropunused as I demonstrated in part 2. The winner for goal 2 is dar. The gap is biggest with optical discs and more narrow with hard drives, thanks to git-annex s different options for updates. Still, I would be more confident I got it right with dar. Goal 3: Greatest chance of successful restore in the distant future If you use git-annex like I suggested in part 2, you will have a set of discs or drives that contain a folder structure with plain files in them. These files can be opened without any additional tools at all. For sheer ability to get at raw data, git-annex has the edge. When you talk about getting a consistent full restore without multiple copies of renamed files or deleted files coming back then you are going to need to use git-annex to do that. Both git-annex and dar provide binaries. Dar provides a win64 version on its Sourceforge page. On the author s releases site, you can find the win64 version in addition to a statically-linked x86_64 version for Linux. The git-annex install page mostly directs you to package managers for your distribution, but the downloads page also lists builds for Linux, Windows, and Mac OS X. The Linux version is dynamic, but ships most of its .so files alongside. The Windows version requires cygwin.dll, and all versions require you to also install git itself. Both tools are in package managers for Mac OS X, Debian, FreeBSD, and so forth. Let s just say that you are likely to be able to run either one on a future Windows or Linux system. There are also GUI frontends for dar, such as DARGUI and gdar. This can increase the chances of a future person being able to use the software easily. git-annex has the assistant, which is based on a different use case and probably not directly helpful here. When it comes to doing the actual restore process using software, dar provides the easier process here. For dealing with media errors and the like, dar can integrate with par2. While technically you could use par2 against the files git-annex writes, that s more cumbersome to manage to the point that it is likely not to be done. Both tools can deal reasonably with missing media entirely. I m going to give the edge on this one to git-annex; while dar does provide the easier restore and superior tools for recovering from media errors, the ability to access raw data as plain files without any tools at all is quite compelling. I believe it is the most critical advantage git-annex has, and it s a big one. Goal 4: Support high-fidelity partial and full restores Both tools make it possible to do a full restore reflecting deletions, additions, and so forth. Dar, as noted, is easier for this, but it is possible with git-annex. So, both can achieve a consistent restore. Part of this goal deals with fidelity of the restore: preserving timestamps, hard and symbolic links, ownership, permissions, etc. Of these, timestamps are the most important for me. git-annex can t do any of that. dar does all of it. Some of this can be worked around using mtree as I documented in part 2. However, that implies a need to also provide mtree on the discs for future users, and I m not sure mtree really exists for Windows. It also cuts against the argument that git-annex discs can be used without any tools. It is true, they can, but all you will get is filename and content; no accurate date. Timestamps are often highly relevant for everything from photos to finding an elusive document or record. Winner: dar. Goal 5: Supporting backup strategies with redundancy My main goal here is to have two separate backup sets: one that is offsite, and one that is onsite. Depending on the strategy and media, they might just always stay that way, or periodically rotate. For instance, with optical discs, you might just burn two copies of every disc and store one at each place. For hard drives, since you will be updating the content of them, you might swap them periodically. This is possible with both tools. With both tools, if using the optical disc scheme I laid out, you can just burn two identical copies of each disc. With the hard drive case, with dar, you can keep two directories of isolated catalogs, one for each drive set. A little identifier file on each drive will let you know which set to use. git-annex can track locations itself. As I demonstrated in part 2, you can make each drive its own repo, add all drives from a given drive set to a git-annex group. When initializing a drive, you tell git-annex what group it s a prt of. From then on, git-annex knows what content is in each group and will add whatever a given drive s group needs to that drive. It s possible to do this with both, but the winner here is git-annex. Goal 6: Efficient use of storage Here are situations in which one or the other will be more efficient: The winner depends on your particular situation. Other notes While not part of the goals above, dar is capable of using tapes directly. While not as common, they are often used in communities of people that archive lots of data. Conclusions Overall, dar is the winner for me. It is simpler in most areas, easier to get correct, and scales very well. git-annex does, however, have some quite compelling points. Being able to access files as plain files is huge, and its location tracking is nicer than dar s, even when using dar_manager. Both tools are excellent and I recommend them both and for more than the particular scenario shown here. Both have fantastic and responsive authors.

17 June 2023

John Goerzen: Using dar for Data Archiving

This is the third post in a series about data archiving to removable media (optical discs and hard drives). In the first, I explained the difference between backing up and archiving, established goals for the project, and said I d evaluate git-annex and dar. The second post evaluated git-annex, and now it s time to look at dar. The series will conclude with a post comparing git-annex with dar. What is dar? I could open with the same thing I did with git-annex, just changing the name of the program: [dar] is a fantastic and versatile program that does well, it s one of those things that can do so much that it s a bit hard to describe. It is, fundamentally, an archiver like tar or zip (makes one file representing a bunch of other files), but it goes far beyond that. dar s homepage lays out a comprehensive list of features, which I will try to summarize here. So to tie this together for this project, I will set up a 400MB slice size (to mimic what I did with git-annex), and see how dar saves the data and restores it. Isolated cataloges aren t strictly necessary for this, but by using them (and/or dar_manager), we can build up a database of files and locations and thus directly compare dar to git-annex location tracking. Walkthrough: Creating the first archive As with the git-annex walkthrough, I ll set some variables to make it easy to remember: OK, we can run the backup immediately. No special setup is needed. dar supports both short-form (single-character) parameters and long-form ones. Since the parameters probably aren t familiar to everyone, I will use the long-form ones in these examples. Here s how we create our initial full backup. I ll explain the parameters below:
$ dar \
--verbose \
--create $DRIVE/bak1 \
--on-fly-isolate $CATDIR/bak1 \
--slice 400M \
--min-digits 2 \
--pause \
--fs-root $SOURCEDIR
Let s look at each of these parameters: This same command could have been written with short options as:
$ dar -v -c $DRIVE/bak1 -@ $CATDIR/bak1 -s 400M -9 2 -p -R $SOURCEDIR
What does it look like while running? Here s an excerpt:
...
Adding file to archive: /acrypt/no-backup/jgoerzen/testdata/[redacted]
Finished writing to file 1, ready to continue ? [return = YES Esc = NO]
...
Writing down archive contents...
Closing the escape layer...
Writing down the first archive terminator...
Writing down archive trailer...
Writing down the second archive terminator...
Closing archive low layer...
Archive is closed.
--------------------------------------------
581 inode(s) saved
including 0 hard link(s) treated
0 inode(s) changed at the moment of the backup and could not be saved properly
0 byte(s) have been wasted in the archive to resave changing files
0 inode(s) with only metadata changed
0 inode(s) not saved (no inode/file change)
0 inode(s) failed to be saved (filesystem error)
0 inode(s) ignored (excluded by filters)
0 inode(s) recorded as deleted from reference backup
--------------------------------------------
Total number of inode(s) considered: 581
--------------------------------------------
EA saved for 0 inode(s)
FSA saved for 581 inode(s)
--------------------------------------------
Making room in memory (releasing memory used by archive of reference)...
Now performing on-fly isolation...
...
That was easy! Let s look at the contents of the backup directory:
$ ls -lh $DRIVE
total 3.7G
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:27 bak1.01.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:27 bak1.02.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:27 bak1.03.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:27 bak1.04.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:28 bak1.05.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:28 bak1.06.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:28 bak1.07.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:28 bak1.08.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:29 bak1.09.dar
-rw-r--r-- 1 jgoerzen jgoerzen 156M Jun 16 19:33 bak1.10.dar
And the isolated catalog:
$ ls -lh $CATDIR
total 37K
-rw-r--r-- 1 jgoerzen jgoerzen 35K Jun 16 19:33 bak1.1.dar
The isolated catalog is stored compressed automatically. Well this was easy. With one command, we archived the entire data set, split into 400MB chunks, and wrote out the catalog data. Walkthrough: Inspecting the saved archive Can dar tell us which slice contains a given file? Sure:
$ dar --list $DRIVE/bak1 --list-format=slicing less
Slice(s) [Data ][D][ EA ][FSA][Compr][S] Permission Filemane
--------+--------------------------------+----------+-----------------------------
...
1 [Saved][ ] [-L-][ 0%][X] -rwxr--r-- [redacted]
1-2 [Saved][ ] [-L-][ 0%][X] -rwxr--r-- [redacted]
2 [Saved][ ] [-L-][ 0%][X] -rwxr--r-- [redacted]
...
This illustrates the transition from slice 1 to slice 2. The first file was stored entirely in slice 1; the second stored partially in slice 1 and partially in slice 2, and third solely in slice 2. We can get other kinds of information as well.
$ dar --list $DRIVE/bak1 less
[Data ][D][ EA ][FSA][Compr][S] Permission User Group Size Date filename
--------------------------------+------------+-------+-------+---------+-------------------------------+------------
[Saved][ ] [-L-][ 0%][X] -rwxr--r-- jgoerzen jgoerzen 24 Mio Mon Mar 5 07:58:09 2018 [redacted]
[Saved][ ] [-L-][ 0%][X] -rwxr--r-- jgoerzen jgoerzen 16 Mio Mon Mar 5 07:58:09 2018 [redacted]
[Saved][ ] [-L-][ 0%][X] -rwxr--r-- jgoerzen jgoerzen 22 Mio Mon Mar 5 07:58:09 2018 [redacted]
These are the same files I was looking at before. Here we see they are 24MB, 16MB, and 22MB in size, and some additional metadata. Even more is available in the XML list format. Walkthrough: updates As with git-annex, I ve made some changes in the source directory: moved a file, added another, and deleted one. Let s create an incremental backup now:
$ dar \
--verbose \
--create $DRIVE/bak2 \
--on-fly-isolate $CATDIR/bak2 \
--ref $CATDIR/bak1 \
--slice 400M \
--min-digits 2 \
--pause \
--fs-root $SOURCEDIR
This command is very similar to the earlier one. Instead of writing an archive and catalog named bak1, we write one named bak2. What s new here is --ref $CATDIR/bak1. That says, make an incremental based on an archive of reference. All that is needed from that archive of reference is the detached catalog. --ref $DRIVE/bak1 would have worked equally well here. Here s what I did to the $SOURCEDIR: Let s see if dar s command output matches this:
...
Adding file to archive: /acrypt/no-backup/jgoerzen/testdata/file01-unchanged
Saving Filesystem Specific Attributes for /acrypt/no-backup/jgoerzen/testdata/file01-unchanged
Adding file to archive: /acrypt/no-backup/jgoerzen/testdata/cp
Saving Filesystem Specific Attributes for /acrypt/no-backup/jgoerzen/testdata/cp
Adding folder to archive: [redacted]
Saving Filesystem Specific Attributes for [redacted]
Adding reference to files that have been destroyed since reference backup...
...
--------------------------------------------
3 inode(s) saved
including 0 hard link(s) treated
0 inode(s) changed at the moment of the backup and could not be saved properly
0 byte(s) have been wasted in the archive to resave changing files
0 inode(s) with only metadata changed
578 inode(s) not saved (no inode/file change)
0 inode(s) failed to be saved (filesystem error)
0 inode(s) ignored (excluded by filters)
2 inode(s) recorded as deleted from reference backup
--------------------------------------------
Total number of inode(s) considered: 583
--------------------------------------------
EA saved for 0 inode(s)
FSA saved for 3 inode(s)
--------------------------------------------
...
Yes, it does. The rename is recorded as a deletion and an addition, since dar doesn t directly track renames. So the rename plus the deletion account for the two deletions. The rename plus the addition of cp count as 2 of the 3 inodes saved; the third is the modified directory from which files were deleted and moved out. Let s see the files that were created:
$ ls -lh $DRIVE/bak2*
-rw-r--r-- 1 jgoerzen jgoerzen 18M Jun 16 19:52 /acrypt/no-backup/jgoerzen/dar-testing/drive/bak2.01.dar
$ ls -lh $CATDIR/bak2*
-rw-r--r-- 1 jgoerzen jgoerzen 22K Jun 16 19:52 /acrypt/no-backup/jgoerzen/dar-testing/cat/bak2.1.dar
What does list look like now?
Slice(s) [Data ][D][ EA ][FSA][Compr][S] Permission Filemane
--------+--------------------------------+----------+-----------------------------
[ ][ ] [---][-----][X] -rwxr--r-- [redacted]
1 [Saved][ ] [-L-][ 0%][X] -rwxr--r-- file01-unchanged
...
[--- REMOVED ENTRY ----][redacted]
[--- REMOVED ENTRY ----][redacted]
Here I show an example of:
  1. A file that was not changed from the initial backup. Its presence was simply noted, but because we re doing an incremental, the data wasn t saved.
  2. A file that is saved in this incremental, on slice 1.
  3. The two deleted files
Walkthrough: dar_manager As we ve seen above, the two archives (or their detached catalog) give us a complete picture of what files were present at the time of the creation of each archive, and what files were stored in a given archive. We can certainly continue working in that way. We can also use dar_manager to build a comprehensive database of these archives, to be able to find what media is necessary to restore each given file. Or, with dar_manager s when parameter, we can restore files as of a particular date. Let s try it out. First, we create our database:
$ dar_manager --create $DARDB
$ dar_manager --base $DARDB --add $DRIVE/bak1
Auto detecting min-digits to be 2
$ dar_manager --base $DARDB --add $DRIVE/bak2
Auto detecting min-digits to be 2
Here we created the database, and added our two catalogs to it. (Again, we could have as easily used $CATDIR/bak1; either the archive or its isolated catalog will work here.) It s important to add the catalogs in order. Let s do some quick experimentation with dar_manager:
$ dar_manager -v --base $DARDB --list
Decompressing and loading database to memory...
dar path :
dar options :
database version : 6
compression used : gzip
compression level: 9 archive # path basename
------------+--------------+---------------
1 /acrypt/no-backup/jgoerzen/dar-testing/drive bak1
2 /acrypt/no-backup/jgoerzen/dar-testing/drive bak2
$ dar_manager --base $DARDB --stat
archive # most recent/total data most recent/total EA
--------------+-------------------------+-----------------------
1 580/581 0/0
2 3/3 0/0
The list option shows the correlation between dar_manager archive number (1, 2) with filenames (bak1, bak2). It is coincidence here that 1/bak1 and 2/bak2 correlate; that s not necessarily the case. Most dar_manager commands operate on archive number, while dar commands operate on archive path/basename. Now let s see just what files are saved in archive , the incremental:
$ dar_manager --base $DARDB --used 2
[ Saved ][ ] [redacted]
[ Saved ][ ] file01-unchanged
[ Saved ][ ] cp
Now we can also where a file is stored. Here s one that was saved in the full backup and unmodified in the incremental:
$ dar_manager --base $DARDB --file [redacted]
1 Fri Jun 16 19:15:12 2023 saved absent
2 Fri Jun 16 19:15:12 2023 present absent
(The absent at the end refers to extended attributes that the file didn t have) Similarly, for files that were added or removed, they ll be listed only at the appropriate place. Walkthrough: Restoration I m not going to repeat the author s full restoration with dar page, but here are some quick examples. A simple way of doing everything is using incrementals for the whole series. To do that, you d have bak1 be full, bak2 based on bak1, bak3 based on bak2, bak4 based on bak3, etc. To restore from such a series, you have two options: If you get fancy for instance, bak2 is based on bak1, bak3 on bak2, bak4 on bak1 then you would want to use dar_manager to ensure a consistent restore is completed. Either way, the process is nearly identical. Also, I figure, to make things easy, you can save a copy of the entire set of isolated catalogs before you finalize each disc/drive. They re so small, and this would let someone with just the most recent disc build a dar_manager database without having to go through all the other discs. Anyhow, let s do a restore using just dar. I ll make a $RESTOREDIR and do it that way.
$ dar \
--verbose \
--extract $DRIVE/bak1 \
--fs-root $RESTOREDIR \
--no-warn \
--execute "echo Ready for slice %n. Press Enter; read foo"
This execute lets us see how dar works; this is an illustration of the power it has (above pause); it s a snippet interpreted by /bin/sh with %n being one of the dar placeholders. If memory serves, it s not strictly necessary, as dar will prompt you for slices it needs if they re not mounted. Anyhow, you ll see it first reading the last slice, which contains the catalog, then reading from the beginning. Here we go:
Auto detecting min-digits to be 2
Opening archive bak1 ...
Opening the archive using the multi-slice abstraction layer...
Ready for slice 10. Press Enter
...
Loading catalogue into memory...
Locating archive contents...
Reading archive contents...
File ownership will not be restored du to the lack of privilege, you can disable this message by asking not to restore file ownership [return = YES Esc = NO]
Continuing...
Restoring file's data: [redacted]
Restoring file's FSA: [redacted]
Ready for slice 1. Press Enter
...
Ready for slice 2. Press Enter
...
--------------------------------------------
581 inode(s) restored
including 0 hard link(s)
0 inode(s) not restored (not saved in archive)
0 inode(s) not restored (overwriting policy decision)
0 inode(s) ignored (excluded by filters)
0 inode(s) failed to restore (filesystem error)
0 inode(s) deleted
--------------------------------------------
Total number of inode(s) considered: 581
--------------------------------------------
EA restored for 0 inode(s)
FSA restored for 0 inode(s)
--------------------------------------------
The warning is because I m not doing the extraction as root, which limits dar s ability to fully restore ownership data. OK, now the incremental:
$ dar \
--verbose \
--extract $DRIVE/bak2 \
--fs-root $RESTOREDIR \
--no-warn \
--execute "echo Ready for slice %n. Press Enter; read foo"
...
Ready for slice 1. Press Enter
...
Restoring file's data: /acrypt/no-backup/jgoerzen/dar-testing/restore/file01-unchanged
Restoring file's FSA: /acrypt/no-backup/jgoerzen/dar-testing/restore/file01-unchanged
Restoring file's data: /acrypt/no-backup/jgoerzen/dar-testing/restore/cp
Restoring file's FSA: /acrypt/no-backup/jgoerzen/dar-testing/restore/cp
Restoring file's data: /acrypt/no-backup/jgoerzen/dar-testing/restore/[redacted directory]
Removing file (reason is file recorded as removed in archive): [redacted file]
Removing file (reason is file recorded as removed in archive): [redacted file]
This all looks right! Now how about we compare the restore to the original source directory?
$ diff -durN $SOURCEDIR $RESTOREDIR
No changes perfect. We could instead do this restore via a single dar_manager command, though annoyingly, we d have to pass all top-level files/directories to dar_manager restore. But still, it s one command, and basically automates and optimizes the dar restores shown above. Conclusions Dar makes it extremely easy to just Do The Right Thing when making archives. One command makes a backup. It saves things in simple files. You can make an isolated catalog if you want, and it too is saved in a simple file. You can query what is in the files and where. You can restore from all or part of the files. You can simply play the backups forward, in order, to achieve a full and consistent restore. Or you can load data about them into dar_manager for an optimized restore. A bit of scripting will be necessary to make incrementals; finding the most recent backup or catalog. If backup files are named with care for instance, by date then this should be a pretty easy task. I haven t touched on resiliency yet. dar comes with tools for recovering archives that have had portions corrupted or lost. It can also rebuild the catalog if it is corrupted or lost. It adds tape marks (or escape sequences ) to the archive along with the data stream. So every entry in the catalog is actually stored in the archive twice: once alongside the file data, and once at the end in the collected catalog. This allows dar to scan a corrupted file for the tape marks and reconstruct whatever is still intact, even if the catalog is lost. dar also integrates with tools like sha256sum and par2 to simplify archive integrity testing and restoration. This balances against the need to use a tool (dar, optionally with a GUI frontend) to restore files. I ll discuss that more in the next post.

16 June 2023

John Goerzen: Using git-annex for Data Archiving

In my recent post about data archiving to removable media, I laid out the difference between backing up and archiving, and also said I d evaluate git-annex and dar. This post evaluates git-annex. The next will look at dar, and then I ll make a comparison post. What is git-annex? git-annex is a fantastic and versatile program that does well, it s one of those things that can do so much that it s a bit hard to describe. Its homepage says:
git-annex allows managing large files with git, without storing the file contents in git. It can sync, backup, and archive your data, offline and online. Checksums and encryption keep your data safe and secure. Bring the power and distributed nature of git to bear on your large files with git-annex.
I think the particularly interesting features of git-annex aren t actually included in that list. Among the features of git-annex that make it shine for this purpose, its location tracking is key. git-annex can know exactly which device has which file at which version at all times. Combined with its preferred content settings, this lets you very easily say things like: git-annex can be set to allow a configurable amount of free space to remain on a device, and it will fill it up with whatever copies are necessary up until it hits that limit. Very convenient! git-annex will store files in a folder structure that mirrors the origin folder structure, in plain files just as they were. This maximizes the ability for a future person to access the content, since it is all viewable without any special tool at all. Of course, for things like optical media, git-annex will essentially be creating what amounts to incrementals. To obtain a consistent copy of the original tree, you would still need to use git-annex to process (export) the archives. git-annex challenges In my prior post, I related some challenges with git-annex. The biggest of them quite poor performance of the directory special remote when dealing with many files has been resolved by Joey, git-annex s author! That dramatically improves the git-annex use scenario here! The fixing commit is in the source tree but not yet in a release. git-annex no doubt may still have performance challenges with repositories in the 100,000+-range, but in that order of magnitude it now looks usable. I m not sure about 1,000,000-file repositories (I haven t tested); there is a page about scalability. A few other more minor challenges remain: I worked around the timestamp issue by using the mtree-netbsd package in Debian. mtree writes out a summary of files and metadata in a tree, and can restore them. To save: mtree -c -R nlink,uid,gid,mode -p /PATH/TO/REPO -X <(echo './.git') > /tmp/spec And, after restoration, the timestamps can be applied with: mtree -t -U -e < /tmp/spec Walkthrough: initial setup To use git-annex in this way, we have to do some setup. My general approach is this: Let's get started! I've set all these shell variables appropriately for this example, and REPONAME to "testdata". We'll begin by setting up the metadata-only tracking repo.
$ REPONAME=testdata
$ mkdir "$METAREPO"
$ cd "$METAREPO"
$ git init
$ git config annex.thin true
There is a sort of complicated topic of how git-annex stores files in a repo, which varies depending on whether the data for the file is present in a given repo, and whether the file is locked or unlocked. Basically, the options I use here cause git-annex to mostly use hard links instead of symlinks or pointer files, for maximum compatibility with non-POSIX filesystems such as NTFS and UDF, which might be used on these devices. thin is part of that. Let's continue:
$ git annex init 'local hub'
init local hub ok
(recording state in git...)
$ git annex wanted . "include=* and exclude=$REPONAME/*"
wanted . ok
(recording state in git...)
In a bit, we are going to import the source data under the directory named $REPONAME (here, testdata). The wanted command says: in this repository (represented by the bare dot), the files we want are matched by the rule that says eveyrthing except what's under $REPONAME. In other words, we don't want to make an unnecessary copy here. Because I expect to use an mtree file as documented above, and it is not under $REPONAME/, it will be included. Let's just add it and tweak some things.
$ touch mtree
$ git annex add mtree
add mtree
ok
(recording state in git...)
$ git annex sync
git-annex sync will change default behavior to operate on --content in a future version of git-annex. Recommend you explicitly use --no-content (or -g) to prepare for that change. (Or you can configure annex.synccontent)
commit
[main (root-commit) 6044742] git-annex in local hub
1 file changed, 1 insertion(+)
create mode 120000 mtree
ok
$ ls -l
total 9
lrwxrwxrwx 1 jgoerzen jgoerzen 178 Jun 15 22:31 mtree -> .git/annex/objects/pX/ZJ/...
OK! We've added a file, and it got transformed into a symlink. That's the thing I said we were going to avoid, so:
git annex adjust --unlock-present
adjust
Switched to branch 'adjusted/main(unlockpresent)'
ok
$ ls -l
total 1
-rw-r--r-- 2 jgoerzen jgoerzen 0 Jun 15 22:31 mtree
You'll notice it transformed into a hard link (nlinks=2) file. Great! Now let's import the source data. For that, we'll use the directory special remote.
$ git annex initremote source type=directory directory=$SOURCEDIR importtree=yes \
encryption=none
initremote source ok
(recording state in git...)
$ git annex enableremote source directory=$SOURCEDIR
enableremote source ok
(recording state in git...)
$ git config remote.source.annex-readonly true
$ git config annex.securehashesonly true
$ git config annex.genmetadata true
$ git config annex.diskreserve 100M
$ git config remote.source.annex-tracking-branch main:$REPONAME
OK, so here we created a new remote named "source". We enabled it, and set some configuration. Most notably, that last line causes files from "source" to be imported under $REPONAME/ as we wanted earlier. Now we're ready to scan the source.
$ git annex sync
At this point, you'll see git-annex computing a hash for every file in the source directory. I can verify with du that my metadata-only repo only uses 14MB of disk space, while my source is around 4GB. Now we can see what git-annex thinks about file locations:
$ git-annex whereis less
whereis mtree (1 copy)
8aed01c5-da30-46c0-8357-1e8a94f67ed6 -- local hub [here]
ok
whereis testdata/[redacted] (0 copies)
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
failed
... many more lines ...
So remember we said we wanted mtree, but nothing under testdata, under this repo? That's exactly what we got. git-annex knows that the files under testdata can be found under the "source" special remote, but aren't in any git-annex repo -- yet. Now we'll start adding them. Walkthrough: removable drives I've set up two 500MB filesystems to represent removable drives. We'll see how git-annex works with them.
$ cd $DRIVE01
$ df -h .
Filesystem Size Used Avail Use% Mounted on
acrypt/no-backup/annexdrive01 500M 1.0M 499M 1% /acrypt/no-backup/annexdrive01
$ git clone $METAREPO
Cloning into 'testdata'...
done.
$ cd $REPONAME
$ git config annex.thin true
$ git annex init "test drive #1"
$ git annex adjust --hide-missing --unlock
adjust
Switched to branch 'adjusted/main(hidemissing-unlocked)'
ok
$ git annex sync
OK, that's the initial setup. Now let's enable the source remote and configure it the same way we did before:
$ git annex enableremote source directory=$SOURCEDIR
enableremote source ok
(recording state in git...)
$ git config remote.source.annex-readonly true
$ git config remote.source.annex-tracking-branch main:$REPONAME
$ git config annex.securehashesonly true
$ git config annex.genmetadata true
$ git config annex.diskreserve 100M
Now, we'll add the drive to a group called "driveset01" and configure what we want on it:
$ git annex group . driveset01
$ git annex wanted . '(not copies=driveset01:1)'
What this does is say: first of all, this drive is in a group named driveset01. Then, this drive wants any files for which there isn't already at least one copy in driveset01. Now let's load up some files!
$ git annex sync --content
As the messages fly by from here, you'll see it mentioning that it got mtree, and then various files from "source" -- until, that is, the filesystem had less than 100MB free, at which point it complained of no space for the rest. Exactly like we wanted! Now, we need to teach $METAREPO about $DRIVE01.
$ cd $METAREPO
$ git remote add drive01 $DRIVE01/$REPONAME
$ git annex sync drive01
git-annex sync will change default behavior to operate on --content in a future version of git-annex. Recommend you explicitly use --no-content (or -g) to prepare for that change. (Or you can configure annex.synccontent)
commit
On branch adjusted/main(unlockpresent)
nothing to commit, working tree clean
ok
merge synced/main (Merging into main...)
Updating d1d9e53..817befc
Fast-forward
(Merging into adjusted branch...)
Updating 7ccc20b..861aa60
Fast-forward
ok
pull drive01
remote: Enumerating objects: 214, done.
remote: Counting objects: 100% (214/214), done.
remote: Compressing objects: 100% (95/95), done.
remote: Total 110 (delta 6), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (110/110), 13.01 KiB 1.44 MiB/s, done.
Resolving deltas: 100% (6/6), completed with 6 local objects.
From /acrypt/no-backup/annexdrive01/testdata
* [new branch] adjusted/main(hidemissing-unlocked) -> drive01/adjusted/main(hidemissing-unlocked)
* [new branch] adjusted/main(unlockpresent) -> drive01/adjusted/main(unlockpresent)
* [new branch] git-annex -> drive01/git-annex
* [new branch] main -> drive01/main
* [new branch] synced/main -> drive01/synced/main
ok
OK! This step is important, because drive01 and drive02 (which we'll set up shortly) won't necessarily be able to reach each other directly, due to not being plugged in simultaneously. Our $METAREPO, however, will know all about where every file is, so that the "wanted" settings can be correctly resolved. Let's see what things look like now:
$ git annex whereis less
whereis mtree (2 copies)
8aed01c5-da30-46c0-8357-1e8a94f67ed6 -- local hub [here]
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]
ok
whereis testdata/[redacted] (1 copy)
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok
If I scroll down a bit, I'll see the files past the 400MB mark that didn't make it onto drive01. Let's add another example drive! Walkthrough: Adding a second drive The steps for $DRIVE02 are the same as we did before, just with drive02 instead of drive01, so I'll omit listing it all a second time. Now look at this excerpt from whereis:
whereis testdata/[redacted] (1 copy)
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok
whereis testdata/[redacted] (1 copy)
c4540343-e3b5-4148-af46-3f612adda506 -- test drive #2 [drive02]
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok
Look at that! Some files on drive01, some on drive02, some neither place. Perfect! Walkthrough: Updates So I've made some changes in the source directory: moved a file, added another, and deleted one. All of these were copied to drive01 above. How do we handle this? First, we update the metadata repo:
$ cd $METAREPO
$ git annex sync
$ git annex dropunused all
OK, this has scanned $SOURCEDIR and noted changes. Let's see what whereis says:
$ git annex whereis less
...
whereis testdata/cp (0 copies)
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
failed
whereis testdata/file01-unchanged (1 copy)
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok
So this looks right. The file I added was a copy of /bin/cp. I moved another file to one named file01-unchanged. Notice that it realized this was a rename and that the data still exists on drive01. Well, let's update drive01.
$ cd $DRIVE01/$REPONAME
$ git annex sync --content
Looking at the testdata/ directory now, I see that file01-unchanged has been renamed, the deleted file is gone, but cp isn't yet here -- probably due to space issues; as it's new, it's undefined whether it or some other file would fill up free space. Let's work along a few more commands.
$ git annex get --auto
$ git annex drop --auto
$ git annex dropunused all
And now, let's make sure metarepo is updated with its state.
$ cd $METAREPO
$ git annex sync
We could do the same for drive02. This is how we would proceed with every update. Walkthrough: Restoration Now, we have bare files at reasonable locations in drive01 and drive02. But, to generate a consistent restore, we need to be able to actually do an export. Otherwise, we may have files with old names, duplicate files, etc. Let's assume that we lost our source and metadata repos and have to restore from scratch. We'll make a new $RESTOREDIR. We'll begin with drive01 since we used it most recently.
$ mv $METAREPO $METAREPO.disabled
$ mv $SOURCEDIR $SOURCEDIR.disabled
$ git clone $DRIVE01/$REPONAME $RESTOREDIR
$ cd $RESTOREDIR
$ git config annex.thin true
$ git annex init "restore"
$ git annex adjust --hide-missing --unlock
Now, we need to connect the drive01 and pull the files from it.
$ git remote add drive01 $DRIVE01/$REPONAME
$ git annex sync --content
Now, repeat with drive02:
$ git remote add drive02 $DRIVE02/$REPONAME
$ git annex sync --content
Now we've got all our content back! Here's what whereis looks like:
whereis testdata/file01-unchanged (3 copies)
3d663d0f-1a69-4943-8eb1-f4fe22dc4349 -- restore [here]
9e48387e-b096-400a-8555-a3caf5b70a64 -- source
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [origin]
ok
...
I was a little surprised that drive01 didn't seem to know what was on drive02. Perhaps that could have been remedied by adding more remotes there? I'm not entirely sure; I'd thought would have been able to do that automatically. Conclusions I think I have demonstrated two things: First, git-annex is indeed an extremely powerful tool. I have only scratched the surface here. The location tracking is a neat feature, and being able to just access the data as plain files if all else fails is nice for future users. Secondly, it is also a complex tool and difficult to get right for this purpose (I think much easier for some other purposes). For someone that doesn't live and breathe git-annex, it can be hard to get right. In fact, I'm not entirely sure I got it right here. Why didn't drive02 know what files were on drive01 and vice-versa? I don't know, and that reflects some kind of misunderstanding on my part about how metadata is synced; perhaps more care needs to be taken in restore, or done in a different order, than I proposed. I initially tried to do a restore by using git annex export to a directory special remote with exporttree=yes, but I couldn't ever get it to actually do anything, and I don't know why. These two cut against each other. On the one hand, the raw accessibility of the data to someone with no computer skills is unmatched. On the other hand, I'm not certain I have the skill to always prepare the discs properly, or to do a proper consistent restore.

29 May 2023

John Goerzen: Recommendations for Tools for Backing Up and Archiving to Removable Media

I have several TB worth of family photos, videos, and other data. This needs to be backed up and archived. Backups and archives are often thought of as similar. And indeed, they may be done with the same tools at the same time. But the goals differ somewhat: Backups are designed to recover from a disaster that you can fairly rapidly detect. Archives are designed to survive for many years, protecting against disaster not only impacting the original equipment but also the original person that created them. Reflecting on this, it implies that while a nice ZFS snapshot-based scheme that supports twice-hourly backups may be fantastic for that purpose, if you think about things like family members being able to access it if you are incapacitated, or accessibility in a few decades time, it becomes much less appealing for archives. ZFS doesn t have the wide software support that NTFS, FAT, UDF, ISO-9660, etc. do. This post isn t about the pros and cons of the different storage media, nor is it about the pros and cons of cloud storage for archiving; these conversations can readily be found elsewhere. Let s assume, for the point of conversation, that we are considering BD-R optical discs as well as external HDDs, both of which are too small to hold the entire backup set. What would you use for archiving in these circumstances? Establishing goals The goals I have are: I would welcome your ideas for what to use. Below, I ll highlight different approaches I ve looked into and how they stack up. Basic copies of directories The initial approach might be one of simply copying directories across. This would work well if the data set to be archived is smaller than the archival media. In that case, you could just burn or rsync a new copy with every update and be done. Unfortunately, this is much less convenient with data of the size I m dealing with. rsync is unavailable in that case. With some datasets, you could manually design some rsyncs to store individual directories on individual devices, but that gets unwieldy fast and isn t scalable. You could use something like my datapacker program to split the data across multiple discs/drives efficiently. However, updates will be a problem; you d have to re-burn the entire set to get a consistent copy, or rely on external tools like mtree to reflect deletions. Not very convenient in any case. So I won t be using this. tar or zip While you can split tar and zip files across multiple media, they have a lot of issues. GNU tar s incremental mode is clunky and buggy; zip is even worse. tar files can t be read randomly, making it extremely time-consuming to extract just certain files out of a tar file. The only thing going for these formats (and especially zip) is the wide compatibility for restoration. dar Here we start to get into the more interesting tools. Dar is, in my opinion, one of the best Linux tools that few people know about. Since I first wrote about dar in 2008, it s added some interesting new features; among them, binary deltas and cloud storage support. So, dar has quite a few interesting features that I make use of in other ways, and could also be quite helpful here: Additionally, dar comes with a dar_manager program. dar_manager makes a database out of dar catalogs (or archives). This can then be used to identify the precise archive containing a particular version of a particular file. All this combines to make a useful system for archiving. Isolated catalogs are tiny, and it would be easy enough to include the isolated catalogs for the entire set of archives that came before (or even the dar_manager database file) with each new incremental archive. This would make restoration of a particular subset easy. The main thing to address with dar is that you do need dar to extract the archive. Every dar release comes with source code and a win64 build. dar also supports building a statically-linked Linux binary. It would therefore be easy to include win64 binary, Linux binary, and source with every archive run. dar is also a part of multiple Linux and BSD distributions, which are archived around the Internet. I think this provides a reasonable future-proofing to make sure dar archives will still be readable in the future. The other challenge is user ability. While dar is highly portable, it is fundamentally a CLI tool and will require CLI abilities on the part of users. I suspect, though, that I could write up a few pages of instructions to include and make that a reasonably easy process. Not everyone can use a CLI, but I would expect a person that could follow those instructions could be readily-enough found. One other benefit of dar is that it could easily be used with tapes. The LTO series is liked by various hobbyists, though it could pose formidable obstacles to non-hobbyists trying to aceess data in future decades. Additionally, since the archive is a big file, it lends itself to working with par2 to provide redundancy for certain amounts of data corruption. git-annex git-annex is an interesting program that is designed to facilitate managing large sets of data and moving it between repositories. git-annex has particular support for offline archive drives and tracks which drives contain which files. The idea would be to store the data to be archived in a git-annex repository. Then git-annex commands could generate filesystem trees on the external drives (or trees to br burned to read-only media). In a post about using git-annex for blu-ray backups, an earlier thread about DVD-Rs was mentioned. This has a few interesting properties. For one, with due care, the files can be stored on archival media as regular files. There are some different options for how to generate the archives; some of them would place the entire git-annex metadata on each drive/disc. With that arrangement, one could access the individual files without git-annex. With git-annex, one could reconstruct the final (or any intermediate) state of the archive appropriately, handling deltions, renames, etc. You would also easily be able to know where copies of your files are. The practice is somewhat more challenging. Hundreds of thousands of files what I would consider a medium-sized archive can pose some challenges, running into hours-long execution if used in conjunction with the directory special remote (but only minutes-long with a standard git-annex repo). Ruling out the directory special remote, I had thought I could maybe just work with my files in git-annex directly. However, I ran into some challenges with that approach as well. I am uncomfortable with git-annex mucking about with hard links in my source data. While it does try to preserve timestamps in the source data, these are lost on the clones. I wrote up my best effort to work around all this. In a forum post, the author of git-annex comments that I don t think that CDs/DVDs are a particularly good fit for git-annex, but it seems a couple of users have gotten something working. The page he references is Managing a large number of files archived on many pieces of read-only medium. Some of that discussion is a bit dated (for instance, the directory special remote has the importtree feature that implements what was being asked for there), but has some interesting tips. git-annex supplies win64 binaries, and git-annex is included with many distributions as well. So it should be nearly as accessible as dar in the future. Since git-annex would be required to restore a consistent recovery image, similar caveats as with dar apply; CLI experience would be needed, along with some written instructions. Bacula and BareOS Although primarily tape-based archivers, these do also also nominally support drives and optical media. However, they are much more tailored as backup tools, especially with the ability to pull from multiple machines. They require a database and extensive configuration, making them a poor fit for both the creation and future extractability of this project. Conclusions I m going to spend some more time with dar and git-annex, testing them out, and hope to write some future posts about my experiences.

3 May 2023

John Goerzen: Martha the Pilot

Martha, now 5, can t remember a time when she didn t fly periodically. She s come along in our airplane in short flights to a nearby restaurant and long ones to Michigan and South Dakota. All this time, she s been riding in the back seat next to Laura. Martha has been talking excitedly about riding up front next to me. She wants to be my co-pilot . I promised to give her an airplane wing pin when she did one I got from a pilot of a commercial flight when I was a kid. Of course, safety was first, so I wanted to be sure she was old enough to fly there without being a distraction. Last weekend, the moment finally arrived. She was so excited! She brought along her Claire bear aviator, one that I bought for her at an airport a little while back. She buckled in two of her dolls in the back seat. Martha's dolls And then up we went! Martha in the airplane Martha was so proud when we landed! We went to Stearman Field, just a short 10-minute flight away, and parked the plane right in front of the restaurant. We flew back, and Martha thought we should get a photo of her standing on the wing by the door. Great idea! Martha standing on the wing She was happily jabbering about the flight all the way home. She told us several times about the pin she got, watching out the window, watching all the screens in the airplane, and also that she didn t get sick at all despite some turbulence. And, she says, Now just you and I can go flying! Yes, that s something I m looking forward to!

14 April 2023

John Goerzen: Easily Accessing All Your Stuff with a Zero-Trust Mesh VPN

Probably everyone is familiar with a regular VPN. The traditional use case is to connect to a corporate or home network from a remote location, and access services as if you were there. But these days, the notion of corporate network and home network are less based around physical location. For instance, a company may have no particular office at all, may have a number of offices plus a number of people working remotely, and so forth. A home network might have, say, a PVR and file server, while highly portable devices such as laptops, tablets, and phones may want to talk to each other regardless of location. For instance, a family member might be traveling with a laptop, another at a coffee shop, and those two devices might want to communicate, in addition to talking to the devices at home. And, in both scenarios, there might be questions about giving limited access to friends. Perhaps you d like to give a friend access to part of your file server, or as a company, you might have contractors working on a limited project. Pretty soon you wind up with a mess of VPNs, forwarded ports, and tricks to make it all work. With the increasing prevalence of CGNAT, a lot of times you can t even open a port to the public Internet. Each application or device probably has its own gateway just to make it visible on the Internet, some of which you pay for. Then you add on the question of: should you really trust your LAN anyhow? With possibilities of guests using it, rogue access points, etc., the answer is probably no . We can move the responsibility for dealing with NAT, fluctuating IPs, encryption, and authentication, from the application layer further down into the network stack. We then arrive at a much simpler picture for all. So this page is fundamentally about making the network work, simply and effectively.

How do we make the Internet work in these scenarios? We re going to combine three concepts:
  1. A VPN, providing fully encrypted and authenticated communication and stable IPs
  2. Mesh Networking, in which devices automatically discover optimal paths to reach each other
  3. Zero-trust networking, in which we do not need to trust anything about the underlying LAN, because all our traffic uses the secure systems in points 1 and 2.
By combining these concepts, we arrive at some nice results:
  • You can ssh hostname, where hostname is one of your machines (server, laptop, whatever), and as long as hostname is up, you can reach it, wherever it is, wherever you are.
    • Combined with mosh, these sessions will be durable even across moving to other host networks.
    • You could just as well use telnet, because the underlying network should be secure.
  • You don t have to mess with encryption keys, certs, etc., for every internal-only service. Since IPs are now trustworthy, that s all you need. hosts.allow could make a comeback!
  • You have a way of transiting out of extremely restrictive networks. Every tool discussed here has a way of falling back on routing things via a broker (relay) on TCP port 443 if all else fails.
There might sometimes be tradeoffs. For instance:
  • On LANs faster than 1Gbps, performance may degrade due to encryption and encapsulation overhead. However, these tools should let hosts discover the locality of each other and not send traffic over the Internet if the devices are local.
  • With some of these tools, hosts local to each other (on the same LAN) may be unable to find each other if they can t reach the control plane over the Internet (Internet is down or provider is down)
Some other features that some of the tools provide include:
  • Easy sharing of limited access with friends/guests
  • Taking care of everything you need, including SSL certs, for exposing a certain on-net service to the public Internet
  • Optional routing of your outbound Internet traffic via an exit node on your network. Useful, for instance, if your local network is blocking tons of stuff.
Let s dive in.

Types of Mesh VPNs I ll go over several types of meshes in this article:
  1. Fully decentralized with automatic hop routing This model has no special central control plane. Nodes discover each other in various ways, and establish routes to each other. These routes can be direct connections over the Internet, or via other nodes. This approach offers the greatest resilience. Examples I ll cover include Yggdrasil and tinc.
  2. Automatic peer-to-peer with centralized control In this model, nodes, by default, communicate by establishing direct links between them. A regular node never carries traffic on behalf of other nodes. Special-purpose relays are used to handle cases in which NAT traversal is impossible. This approach tends to offer simple setup. Examples I ll cover include Tailscale, Zerotier, Nebula, and Netmaker.
  3. Roll your own and hybrid approaches This is a grab bag of other ideas; for instance, running Yggdrasil over Tailscale.

Terminology For the sake of consistency, I m going to use common language to discuss things that have different terms in different ecosystems:
  • Every tool discussed here has a way of dealing with NAT traversal. It may assist with establishing direct connections (eg, STUN), and if that fails, it may simply relay traffic between nodes. I ll call such a relay a broker . This may or may not be the same system that is a control plane for a tool.
  • All of these systems operate over lower layers that are unencrypted. Those lower layers may be a LAN (wired or wireless, which may or may not have Internet access), or the public Internet (IPv4 and/or IPv6). I m going to call the unencrypted lower layer, whatever it is, the clearnet .

Evaluation Criteria Here are the things I want to see from a solution:
  • Secure, with all communications end-to-end encrypted and authenticated, and prevention of traffic from untrusted devices.
  • Flexible, adapting to changes in network topology quickly and automatically.
  • Resilient, without single points of failure, and with devices local to each other able to communicate even if cut off from the Internet or other parts of the network.
  • Private, minimizing leakage of information or metadata about me and my systems
  • Able to traverse CGNAT without having to use a broker whenever possible
  • A lesser requirement for me, but still a nice to have, is the ability to include others via something like Internet publishing or inviting guests.
  • Fully or nearly fully Open Source
  • Free or very cheap for personal use
  • Wide operating system support, including headless Linux on x86_64 and ARM.

Fully Decentralized VPNs with Automatic Hop Routing Two systems fit this description: Yggdrasil and Tinc. Let s dive in.

Yggdrasil I ll start with Yggdrasil because I ve written so much about it already. It featured in prior posts such as:

Yggdrasil can be a private mesh VPN, or something more Yggdrasil can be a private mesh VPN, just like the other tools covered here. It s unique, however, in that a key goal of the project is to also make it useful as a planet-scale global mesh network. As such, Yggdrasil is a testbed of new ideas in distributed routing designed to scale up to massive sizes and all sorts of connection conditions. As of 2023-04-10, the main global Yggdrasil mesh has over 5000 nodes in it. You can choose whether or not to participate. Every node in a Yggdrasil mesh has a public/private keypair. Each node then has an IPv6 address (in a private address space) derived from its public key. Using these IPv6 addresses, you can communicate right away. Yggdrasil differs from most of the other tools here in that it does not necessarily seek to establish a direct link on the clearnet between, say, host A and host G for them to communicate. It will prefer such a direct link if it exists, but it is perfectly happy if it doesn t. The reason is that every Yggdrasil node is also a router in the Yggdrasil mesh. Let s sit with that concept for a moment. Consider:
  • If you have a bunch of machines on your LAN, but only one of them can peer over the clearnet, that s fine; all the other machines will discover this route to the world and use it when necessary.
  • All you need to run a broker is just a regular node with a public IP address. If you are participating in the global mesh, you can use one (or more) of the free public peers for this purpose.
  • It is not necessary for every node to know about the clearnet IP address of every other node (improving privacy). In fact, it s not even necessary for every node to know about the existence of all the other nodes, so long as it can find a route to a given node when it s asked to.
  • Yggdrasil can find one or more routes between nodes, and it can use this knowledge of multiple routes to aggressively optimize for varying network conditions, including combinations of, say, downloads and low-latency ssh sessions.
Behind the scenes, Yggdrasil calculates optimal routes between nodes as necessary, using a mesh-wide DHT for initial contact and then deriving more optimal paths. (You can also read more details about the routing algorithm.) One final way that Yggdrasil is different from most of the other tools is that there is no separate control server. No node is special , in charge, the sole keeper of metadata, or anything like that. The entire system is completely distributed and auto-assembling.

Meeting neighbors There are two ways that Yggdrasil knows about peers:
  • By broadcast discovery on the local LAN
  • By listening on a specific port (or being told to connect to a specific host/port)
Sometimes this might lead to multiple ways to connect to a node; Yggdrasil prefers the connection auto-discovered by broadcast first, then the lowest-latency of the defined path. In other words, when your laptops are in the same room as each other on your local LAN, your packets will flow directly between them without traversing the Internet.

Unique uses Yggdrasil is uniquely suited to network-challenged situations. As an example, in a post-disaster situation, Internet access may be unavailable or flaky, yet there may be many local devices perhaps ones that had never known of each other before that could share information. Yggdrasil meets this situation perfectly. The combination of broadcast auto-detection, distributed routing, and so forth, basically means that if there is any physical path between two nodes, Yggdrasil will find and enable it. Ad-hoc wifi is rarely used because it is a real pain. Yggdrasil actually makes it useful! Its broadcast discovery doesn t require any IP address provisioned on the interface at all (it just uses the IPv6 link-local address), so you don t need to figure out a DHCP server or some such. And, Yggdrasil will tend to perform routing along the contours of the RF path. So you could have a laptop in the middle of a long distance relaying communications from people farther out, because it could see both. Or even a chain of such things.

Yggdrasil: Security and Privacy Yggdrasil s mesh is aggressively greedy. It will peer with any node it can find (unless told otherwise) and will find a route to anywhere it can. There are two main ways to make sure you keep unauthorized traffic out: by restricting who can talk to your mesh, and by firewalling the Yggdrasil interface. Both can be used, and they can be used simultaneously. I ll discuss firewalling more at the end of this article. Basically, you ll almost certainly want to do this if you participate in the public mesh, because doing so is akin to having a globally-routable public IP address direct to your device. If you want to restrict who can talk to your mesh, you just disable the broadcast feature on all your nodes (empty MulticastInterfaces section in the config), and avoid telling any of your nodes to connect to a public peer. You can set a list of authorized public keys that can connect to your nodes listening interfaces, which you ll probably want to do. You will probably want to either open up some inbound ports (if you can) or set up a node with a known clearnet IP on a place like a $5/mo VPS to help with NAT traversal (again, setting AllowedPublicKeys as appropriate). Yggdrasil doesn t allow filtering multicast clients by public key, only by network interface, so that s why we disable broadcast discovery. You can easily enough teach Yggdrasil about static internal LAN IPs of your nodes and have things work that way. (Or, set up an internal gateway node or two, that the clients just connect to when they re local). But fundamentally, you need to put a bit more thought into this with Yggdrasil than with the other tools here, which are closed-only. Compared to some of the other tools here, Yggdrasil is better about information leakage; nodes only know details, such as clearnet IPs, of directly-connected peers. You can obtain the list of directly-connected peers of any known node in the mesh but that list is the public keys of the directly-connected peers, not the clearnet IPs. Some of the other tools contain a limited integrated firewall of sorts (with limited ACLs and such). Yggdrasil does not, but is fully compatible with on-host firewalls. I recommend these anyway even with many other tools.

Yggdrasil: Connectivity and NAT traversal Compared to the other tools, Yggdrasil is an interesting mix. It provides a fully functional mesh and facilitates connectivity in situations in which no other tool can. Yet its NAT traversal, while it exists and does work, results in using a broker under some of the more challenging CGNAT situations more often than some of the other tools, which can impede performance. Yggdrasil s underlying protocol is TCP-based. Before you run away screaming that it must be slow and unreliable like OpenVPN over TCP it s not, and it is even surprisingly good around bufferbloat. I ve found its performance to be on par with the other tools here, and it works as well as I d expect even on flaky 4G links. Overall, the NAT traversal story is mixed. On the one hand, you can run a node that listens on port 443 and Yggdrasil can even make it speak TLS (even though that s unnecessary from a security standpoint), so you can likely get out of most restrictive firewalls you will ever encounter. If you join the public mesh, know that plenty of public peers do listen on port 443 (and other well-known ports like 53, plus random high-numbered ones). If you connect your system to multiple public peers, there is a chance though a very small one that some public transit traffic might be routed via it. In practice, public peers hopefully are already peered with each other, preventing this from happening (you can verify this with yggdrasilctl debug_remotegetpeers key=ABC...). I have never experienced a problem with this. Also, since latency is a factor in routing for Yggdrasil, it is highly unlikely that random connections we use are going to be competitive with datacenter peers.

Yggdrasil: Sharing with friends If you re open to participating in the public mesh, this is one of the easiest things of all. Have your friend install Yggdrasil, point them to a public peer, give them your Yggdrasil IP, and that s it. (Well, presumably you also open up your firewall you did follow my advice to set one up, right?) If your friend is visiting at your location, they can just hop on your wifi, install Yggdrasil, and it will automatically discover a route to you. Yggdrasil even has a zero-config mode for ephemeral nodes such as certain Docker containers. Yggdrasil doesn t directly support publishing to the clearnet, but it is certainly possible to proxy (or even NAT) to/from the clearnet, and people do.

Yggdrasil: DNS There is no particular extra DNS in Yggdrasil. You can, of course, run a DNS server within Yggdrasil, just as you can anywhere else. Personally I just add relevant hosts to /etc/hosts and leave it at that, but it s up to you.

Yggdrasil: Source code, pricing, and portability Yggdrasil is fully open source (LGPLv3 plus additional permissions in an exception) and highly portable. It is written in Go, and has prebuilt binaries for all major platforms (including a Debian package which I made). There is no charge for anything with Yggdrasil. Listed public peers are free and run by volunteers. You can run your own peers if you like; they can be public and unlisted, public and listed (just submit a PR to get it listed), or private (accepting connections only from certain nodes keys). A peer in this case is just a node with a known clearnet IP address. Yggdrasil encourages use in other projects. For instance, NNCP integrates a Yggdrasil node for easy communication with other NNCP nodes.

Yggdrasil conclusions Yggdrasil is tops in reliability (having no single point of failure) and flexibility. It will maintain opportunistic connections between peers even if the Internet is down. The unique added feature of being able to be part of a global mesh is a nice one. The tradeoffs include being more prone to need to use a broker in restrictive CGNAT environments. Some other tools have clients that override the OS DNS resolver to also provide resolution of hostnames of member nodes; Yggdrasil doesn t, though you can certainly run your own DNS infrastructure over Yggdrasil (or, for that matter, let public DNS servers provide Yggdrasil answers if you wish). There is also a need to pay more attention to firewalling or maintaining separation from the public mesh. However, as I explain below, many other options have potential impacts if the control plane, or your account for it, are compromised, meaning you ought to firewall those, too. Still, it may be a more immediate concern with Yggdrasil. Although Yggdrasil is listed as experimental, I have been using it for over a year and have found it to be rock-solid. They did change how mesh IPs were calculated when moving from 0.3 to 0.4, causing a global renumbering, so just be aware that this is a possibility while it is experimental.

tinc tinc is the oldest tool on this list; version 1.0 came out in 2003! You can think of tinc as something akin to an older Yggdrasil without the public option. I will be discussing tinc 1.0.36, the latest stable version, which came out in 2019. The development branch, 1.1, has been going since 2011 and had its latest release in 2021. The last commit to the Github repo was in June 2022. Tinc is the only tool here to support both tun and tap style interfaces. I go into the difference more in the Zerotier review below. Tinc actually provides a better tap implementation than Zerotier, with various sane options for broadcasts, but I still think the call for an Ethernet, as opposed to IP, VPN is small. To configure tinc, you generate a per-host configuration and then distribute it to every tinc node. It contains a host s public key. Therefore, adding a host to the mesh means distributing its key everywhere; de-authorizing it means removing its key everywhere. This makes it rather unwieldy. tinc can do LAN broadcast discovery and mesh routing, but generally speaking you must manually teach it where to connect initially. Somewhat confusingly, the examples all mention listing a public address for a node. This doesn t make sense for a laptop, and I suspect you d just omit it. I think that address is used for something akin to a Yggdrasil peer with a clearnet IP. Unlike all of the other tools described here, tinc has no tool to inspect the running state of the mesh. Some of the properties of tinc made it clear I was unlikely to adopt it, so this review wasn t as thorough as that of Yggdrasil.

tinc: Security and Privacy As mentioned above, every host in the tinc mesh is authenticated based on its public key. However, to be more precise, this key is validated only at the point it connects to its next hop peer. (To be sure, this is also the same as how the list of allowed pubkeys works in Yggdrasil.) Since IPs in tinc are not derived from their key, and any host can assign itself whatever mesh IP it likes, this implies that a compromised host could impersonate another. It is unclear whether packets are end-to-end encrypted when using a tinc node as a router. The fact that they can be routed at the kernel level by the tun interface implies that they may not be.

tinc: Connectivity and NAT traversal I was unable to find much information about NAT traversal in tinc, other than that it does support it. tinc can run over UDP or TCP and auto-detects which to use, preferring UDP.

tinc: Sharing with friends tinc has no special support for this, and the difficulty of configuration makes it unlikely you d do this with tinc.

tinc: Source code, pricing, and portability tinc is fully open source (GPLv2). It is written in C and generally portable. It supports some very old operating systems. Mobile support is iffy. tinc does not seem to be very actively maintained.

tinc conclusions I haven t mentioned performance in my other reviews (see the section at the end of this post). But, it is so poor as to only run about 300Mbps on my 2.5Gbps network. That s 1/3 the speed of Yggdrasil or Tailscale. Combine that with the unwieldiness of adding hosts and some uncertainties in security, and I m not going to be using tinc.

Automatic Peer-to-Peer Mesh VPNs with centralized control These tend to be the options that are frequently discussed. Let s talk about the options.

Tailscale Tailscale is a popular choice in this type of VPN. To use Tailscale, you first sign up on tailscale.com. Then, you install the tailscale client on each machine. On first run, it prints a URL for you to click on to authorize the client to your mesh ( tailnet ). Tailscale assigns a mesh IP to each system. The Tailscale client lets the Tailscale control plane gather IP information about each node, including all detectable public and private clearnet IPs. When you attempt to contact a node via Tailscale, the client will fetch the known contact information from the control plane and attempt to establish a link. If it can contact over the local LAN, it will (it doesn t have broadcast autodetection like Yggdrasil; the information must come from the control plane). Otherwise, it will try various NAT traversal options. If all else fails, it will use a broker to relay traffic; Tailscale calls a broker a DERP relay server. Unlike Yggdrasil, a Tailscale node never relays traffic for another; all connections are either direct P2P or via a broker. Tailscale, like several others, is based around Wireguard; though wireguard-go rather than the in-kernel Wireguard. Tailscale has a number of somewhat unique features in this space:
  • Funnel, which lets you expose ports on your system to the public Internet via the VPN.
  • Exit nodes, which automate the process of routing your public Internet traffic over some other node in the network. This is possible with every tool mentioned here, but Tailscale makes switching it on or off a couple of quick commands away.
  • Node sharing, which lets you share a subset of your network with guests
  • A fantastic set of documentation, easily the best of the bunch.
Funnel, in particular, is interesting. With a couple of tailscale serve -style commands, you can expose a directory tree (or a development webserver) to the world. Tailscale gives you a public hostname, obtains a cert for it, and proxies inbound traffic to you. This is subject to some unspecified bandwidth limits, and you can only choose from three public ports, so it s not really a production solution but as a quick and easy way to demonstrate something cool to a friend, it s a neat feature.

Tailscale: Security and Privacy With Tailscale, as with the other tools in this category, one of the main threats to consider is the control plane. What are the consequences of a compromise of Tailscale s control plane, or of the credentials you use to access it? Let s begin with the credentials used to access it. Tailscale operates no identity system itself, instead relying on third parties. For individuals, this means Google, Github, or Microsoft accounts; Okta and other SAML and similar identity providers are also supported, but this runs into complexity and expense that most individuals aren t wanting to take on. Unfortunately, all three of those types of accounts often have saved auth tokens in a browser. Personally I would rather have a separate, very secure, login. If a person does compromise your account or the Tailscale servers themselves, they can t directly eavesdrop on your traffic because it is end-to-end encrypted. However, assuming an attacker obtains access to your account, they could:
  • Tamper with your Tailscale ACLs, permitting new actions
  • Add new nodes to the network
  • Forcibly remove nodes from the network
  • Enable or disable optional features
Of note is that they cannot just commandeer an existing IP. I would say the riskiest possibility here is that could add new nodes to the mesh. Because they could also tamper with your ACLs, they could then proceed to attempt to access all your internal services. They could even turn on service collection and have Tailscale tell them what and where all the services are. Therefore, as with other tools, I recommend a local firewall on each machine with Tailscale. More on that below. Tailscale has a new alpha feature called tailnet lock which helps with this problem. It requires existing nodes in the mesh to sign a request for a new node to join. Although this doesn t address ACL tampering and some of the other things, it does represent a significant help with the most significant concern. However, tailnet lock is in alpha, only available on the Enterprise plan, and has a waitlist, so I have been unable to test it. Any Tailscale node can request the IP addresses belonging to any other Tailscale node. The Tailscale control plane captures, and exposes to you, this information about every node in your network: the OS hostname, IP addresses and port numbers, operating system, creation date, last seen timestamp, and NAT traversal parameters. You can optionally enable service data capture as well, which sends data about open ports on each node to the control plane. Tailscale likes to highlight their key expiry and rotation feature. By default, all keys expire after 180 days, and traffic to and from the expired node will be interrupted until they are renewed (basically, you re-login with your provider and do a renew operation). Unfortunately, the only mention I can see of warning of impeding expiration is in the Windows client, and even there you need to edit a registry key to get the warning more than the default 24 hours in advance. In short, it seems likely to cut off communications when it s most important. You can disable key expiry on a per-node basis in the admin console web interface, and I mostly do, due to not wanting to lose connectivity at an inopportune time.

Tailscale: Connectivity and NAT traversal When thinking about reliability, the primary consideration here is being able to reach the Tailscale control plane. While it is possible in limited circumstances to reach nodes without the Tailscale control plane, it is a fairly brittle setup and notably will not survive a client restart. So if you use Tailscale to reach other nodes on your LAN, that won t work unless your Internet is up and the control plane is reachable. Assuming your Internet is up and Tailscale s infrastructure is up, there is little to be concerned with. Your own comfort level with cloud providers and your Internet should guide you here. Tailscale wrote a fantastic article about NAT traversal and they, predictably, do very well with it. Tailscale prefers UDP but falls back to TCP if needed. Broker (DERP) servers step in as a last resort, and Tailscale clients automatically select the best ones. I m not aware of anything that is more successful with NAT traversal than Tailscale. This maximizes the situations in which a direct P2P connection can be used without a broker. I have found Tailscale to be a bit slow to notice changes in network topography compared to Yggdrasil, and sometimes needs a kick in the form of restarting the client process to re-establish communications after a network change. However, it s possible (maybe even probable) that if I d waited a bit longer, it would have sorted this all out.

Tailscale: Sharing with friends I touched on the funnel feature earlier. The sharing feature lets you give an invite to an outsider. By default, a person accepting a share can make only outgoing connections to the network they re invited to, and cannot receive incoming connections from that network this makes sense. When sharing an exit node, you get a checkbox that lets you share access to the exit node as well. Of course, the person accepting the share needs to install the Tailnet client. The combination of funnel and sharing make Tailscale the best for ad-hoc sharing.

Tailscale: DNS Tailscale s DNS is called MagicDNS. It runs as a layer atop your standard DNS taking over /etc/resolv.conf on Linux and provides resolution of mesh hostnames and some other features. This is a concept that is pretty slick. It also is a bit flaky on Linux; dueling programs want to write to /etc/resolv.conf. I can t really say this is entirely Tailscale s fault; they document the problem and some workarounds. I would love to be able to add custom records to this service; for instance, to override the public IP for a service to use the in-mesh IP. Unfortunately, that s not yet possible. However, MagicDNS can query existing nameservers for certain domains in a split DNS setup.

Tailscale: Source code, pricing, and portability Tailscale is almost fully open source and the client is highly portable. The client is open source (BSD 3-clause) on open source platforms, and closed source on closed source platforms. The DERP servers are open source. The coordination server is closed source, although there is an open source coordination server called Headscale (also BSD 3-clause) made available with Tailscale s blessing and informal support. It supports most, but not all, features in the Tailscale coordination server. Tailscale s pricing (which does not apply when using Headscale) provides a free plan for 1 user with up to 20 devices. A Personal Pro plan expands that to 100 devices for $48 per year - not a bad deal at $4/mo. A Community on Github plan also exists, and then there are more business-oriented plans as well. See the pricing page for details. As a small note, I appreciated Tailscale s install script. It properly added Tailscale s apt key in a way that it can only be used to authenticate the Tailscale repo, rather than as a systemwide authenticator. This is a nice touch and speaks well of their developers.

Tailscale conclusions Tailscale is tops in sharing and has a broad feature set and excellent documentation. Like other solutions with a centralized control plane, device communications can stop working if the control plane is unreachable, and the threat model of the control plane should be carefully considered.

Zerotier Zerotier is a close competitor to Tailscale, and is similar to it in a lot of ways. So rather than duplicate all of the Tailscale information here, I m mainly going to describe how it differs from Tailscale. The primary difference between the two is that Zerotier emulates an Ethernet network via a Linux tap interface, while Tailscale emulates a TCP/IP network via a Linux tun interface. However, Zerotier has a number of things that make it be a somewhat imperfect Ethernet emulator. For one, it has a problem with broadcast amplification; the machine sending the broadcast sends it to all the other nodes that should receive it (up to a set maximum). I wouldn t want to have a lot of programs broadcasting on a slow link. While in theory this could let you run Netware or DECNet across Zerotier, I m not really convinced there s much call for that these days, and Zerotier is clearly IP-focused as it allocates IP addresses and such anyhow. Zerotier provides special support for emulated ARP (IPv4) and NDP (IPv6). While you could theoretically run Zerotier as a bridge, this eliminates the zero trust principle, and Tailscale supports subnet routers, which provide much of the same feature set anyhow. A somewhat obscure feature, but possibly useful, is Zerotier s built-in support for multipath WAN for the public interface. This actually lets you do a somewhat basic kind of channel bonding for WAN.

Zerotier: Security and Privacy The picture here is similar to Tailscale, with the difference that you can create a Zerotier-local account rather than relying on cloud authentication. I was unable to find as much detail about Zerotier as I could about Tailscale - notably I couldn t find anything about how sticky an IP address is. However, the configuration screen lets me delete a node and assign additional arbitrary IPs within a subnet to other nodes, so I think the assumption here is that if your Zerotier account (or the Zerotier control plane) is compromised, an attacker could remove a legit device, add a malicious one, and assign the previous IP of the legit device to the malicious one. I m not sure how to mitigate against that risk, as firewalling specific IPs is ineffective if an attacker can simply take them over. Zerotier also lacks anything akin to Tailnet Lock. For this reason, I didn t proceed much further in my Zerotier evaluation.

Zerotier: Connectivity and NAT traversal Like Tailscale, Zerotier has NAT traversal with STUN. However, it looks like it s more limited than Tailscale s, and in particular is incompatible with double NAT that is often seen these days. Zerotier operates brokers ( root servers ) that can do relaying, including TCP relaying. So you should be able to connect even from hostile networks, but you are less likely to form a P2P connection than with Tailscale.

Zerotier: Sharing with friends I was unable to find any special features relating to this in the Zerotier documentation. Therefore, it would be at the same level as Yggdrasil: possible, maybe even not too difficult, but without any specific help.

Zerotier: DNS Unlike Tailscale, Zerotier does not support automatically adding DNS entries for your hosts. Therefore, your options are approximately the same as Yggdrasil, though with the added option of pushing configuration pointing to your own non-Zerotier DNS servers to the client.

Zerotier: Source code, pricing, and portability The client ZeroTier One is available on Github under a custom business source license which prevents you from using it in certain settings. This license would preclude it being included in Debian. Their library, libzt, is available under the same license. The pricing page mentions a community edition for self hosting, but the documentation is sparse and it was difficult to understand what its feature set really is. The free plan lets you have 1 user with up to 25 devices. Paid plans are also available.

Zerotier conclusions Frankly I don t see much reason to use Zerotier. The virtual Ethernet model seems to be a weird hybrid that doesn t bring much value. I m concerned about the implications of a compromise of a user account or the control plane, and it lacks a lot of Tailscale features (MagicDNS and sharing). The only thing it may offer in particular is multipath WAN, but that s esoteric enough and also solvable at other layers that it doesn t seem all that compelling to me. Add to that the strange license and, to me anyhow, I don t see much reason to bother with it.

Netmaker Netmaker is one of the projects that is making noise these days. Netmaker is the only one here that is a wrapper around in-kernel Wireguard, which can make a performance difference when talking to peers on a 1Gbps or faster link. Also, unlike other tools, it has an ingress gateway feature that lets people that don t have the Netmaker client, but do have Wireguard, participate in the VPN. I believe I also saw a reference somewhere to nodes as routers as with Yggdrasil, but I m failing to dig it up now. The project is in a bit of an early state; you can sign up for an upcoming closed beta with a SaaS host, but really you are generally pointed to self-hosting using the code in the github repo. There are community and enterprise editions, but it s not clear how to actually choose. The server has a bunch of components: binary, CoreDNS, database, and web server. It also requires elevated privileges on the host, in addition to a container engine. Contrast that to the single binary that some others provide. It looks like releases are frequent, but sometimes break things, and have a somewhat more laborious upgrade processes than most. I don t want to spend a lot of time managing my mesh. So because of the heavy needs of the server, the upgrades being labor-intensive, it taking over iptables and such on the server, I didn t proceed with a more in-depth evaluation of Netmaker. It has a lot of promise, but for me, it doesn t seem to be in a state that will meet my needs yet.

Nebula Nebula is an interesting mesh project that originated within Slack, seems to still be primarily sponsored by Slack, but is also being developed by Defined Networking (though their product looks early right now). Unlike the other tools in this section, Nebula doesn t have a web interface at all. Defined Networking looks likely to provide something of a SaaS service, but for now, you will need to run a broker ( lighthouse ) yourself; perhaps on a $5/mo VPS. Due to the poor firewall traversal properties, I didn t do a full evaluation of Nebula, but it still has a very interesting design.

Nebula: Security and Privacy Since Nebula lacks a traditional control plane, the root of trust in Nebula is a CA (certificate authority). The documentation gives this example of setting it up:
./nebula-cert sign -name "lighthouse1" -ip "192.168.100.1/24"
./nebula-cert sign -name "laptop" -ip "192.168.100.2/24" -groups "laptop,home,ssh"
./nebula-cert sign -name "server1" -ip "192.168.100.9/24" -groups "servers"
./nebula-cert sign -name "host3" -ip "192.168.100.10/24"
So the cert contains your IP, hostname, and group allocation. Each host in the mesh gets your CA certificate, and the per-host cert and key generated from each of these steps. This leads to a really nice security model. Your CA is the gatekeeper to what is trusted in your mesh. You can even have it airgapped or something to make it exceptionally difficult to breach the perimeter. Nebula contains an integrated firewall. Because the ability to keep out unwanted nodes is so strong, I would say this may be the one mesh VPN you might consider using without bothering with an additional on-host firewall. You can define static mappings from a Nebula mesh IP to a clearnet IP. I haven t found information on this, but theoretically if NAT traversal isn t required, these static mappings may allow Nebula nodes to reach each other even if Internet is down. I don t know if this is truly the case, however.

Nebula: Connectivity and NAT traversal This is a weak point of Nebula. Nebula sends all traffic over a single UDP port; there is no provision for using TCP. This is an issue at certain hotel and other public networks which open only TCP egress ports 80 and 443. I couldn t find a lot of detail on what Nebula s NAT traversal is capable of, but according to a certain Github issue, this has been a sore spot for years and isn t as capable as Tailscale. You can designate nodes in Nebula as brokers (relays). The concept is the same as Yggdrasil, but it s less versatile. You have to manually designate what relay to use. It s unclear to me what happens if different nodes designate different relays. Keep in mind that this always happens over a UDP port.

Nebula: Sharing with friends There is no particular support here.

Nebula: DNS Nebula has experimental DNS support. In contrast with Tailscale, which has an internal DNS server on every node, Nebula only runs a DNS server on a lighthouse. This means that it can t forward requests to a DNS server that s upstream for your laptop s particular current location. Actually, Nebula s DNS server doesn t forward at all. It also doesn t resolve its own name. The Nebula documentation makes reference to using multiple lighthouses, which you may want to do for DNS redundancy or performance, but it s unclear to me if this would make each lighthouse form a complete picture of the network.

Nebula: Source code, pricing, and portability Nebula is fully open source (MIT). It consists of a single Go binary and configuration. It is fairly portable.

Nebula conclusions I am attracted to Nebula s unique security model. I would probably be more seriously considering it if not for the lack of support for TCP and poor general NAT traversal properties. Its datacenter connectivity heritage does show through.

Roll your own and hybrid Here is a grab bag of ideas:

Running Yggdrasil over Tailscale One possibility would be to use Tailscale for its superior NAT traversal, then allow Yggdrasil to run over it. (You will need a firewall to prevent Tailscale from trying to run over Yggdrasil at the same time!) This creates a closed network with all the benefits of Yggdrasil, yet getting the NAT traversal from Tailscale. Drawbacks might be the overhead of the double encryption and double encapsulation. A good Yggdrasil peer may wind up being faster than this anyhow.

Public VPN provider for NAT traversal A public VPN provider such as Mullvad will often offer incoming port forwarding and nodes in many cities. This could be an attractive way to solve a bunch of NAT traversal problems: just use one of those services to get you an incoming port, and run whatever you like over that. Be aware that a number of public VPN clients have a kill switch to prevent any traffic from egressing without using the VPN; see, for instance, Mullvad s. You ll need to disable this if you are running a mesh atop it.

Other

Combining with local firewalls For most of these tools, I recommend using a local firewal in conjunction with them. I have been using firehol and find it to be quite nice. This means you don t have to trust the mesh, the control plane, or whatever. The catch is that you do need your mesh VPN to provide strong association between IP address and node. Most, but not all, do.

Performance I tested some of these for performance using iperf3 on a 2.5Gbps LAN. Here are the results. All speeds are in Mbps.
Tool iperf3 (default) iperf3 -P 10 iperf3 -R
Direct (no VPN) 2406 2406 2764
Wireguard (kernel) 1515 1566 2027
Yggdrasil 892 1126 1105
Tailscale 950 1034 1085
Tinc 296 300 277
You can see that Wireguard was significantly faster than the other options. Tailscale and Yggdrasil were roughly comparable, and Tinc was terrible.

IP collisions When you are communicating over a network such as these, you need to trust that the IP address you are communicating with belongs to the system you think it does. This protects against two malicious actor scenarios:
  1. Someone compromises one machine on your mesh and reconfigures it to impersonate a more important one
  2. Someone connects an unauthorized system to the mesh, taking over a trusted IP, and uses the privileges of the trusted IP to access resources
To summarize the state of play as highlighted in the reviews above:
  • Yggdrasil derives IPv6 addresses from a public key
  • tinc allows any node to set any IP
  • Tailscale IPs aren t user-assignable, but the assignment algorithm is unknown
  • Zerotier allows any IP to be allocated to any node at the control plane
  • I don t know what Netmaker does
  • Nebula IPs are baked into the cert and signed by the CA, but I haven t verified the enforcement algorithm
So this discussion really only applies to Yggdrasil and Tailscale. tinc and Zerotier lack detailed IP security, while Nebula expects IP allocations to be handled outside of the tool and baked into the certs (therefore enforcing rigidity at that level). So the question for Yggdrasil and Tailscale is: how easy is it to commandeer a trusted IP? Yggdrasil has a brief discussion of this. In short, Yggdrasil offers you both a dedicated IP and a rarely-used /64 prefix which you can delegate to other machines on your LAN. Obviously by taking the dedicated IP, a lot more bits are available for the hash of the node s public key, making collisions technically impractical, if not outright impossible. However, if you use the /64 prefix, a collision may be more possible. Yggdrasil s hashing algorithm includes some optimizations to make this more difficult. Yggdrasil includes a genkeys tool that uses more CPU cycles to generate keys that are maximally difficult to collide with. Tailscale doesn t document their IP assignment algorithm, but I think it is safe to say that the larger subnet you use, the better. If you try to use a /24 for your mesh, it is certainly conceivable that an attacker could remove your trusted node, then just manually add the 240 or so machines it would take to get that IP reassigned. It might be a good idea to use a purely IPv6 mesh with Tailscale to minimize this problem as well. So, I think the risk is low in the default configurations of both Yggdrasil and Tailscale (certainly lower than with tinc or Zerotier). You can drive the risk even lower with both.

Final thoughts For my own purposes, I suspect I will remain with Yggdrasil in some fashion. Maybe I will just take the small performance hit that using a relay node implies. Or perhaps I will get clever and use an incoming VPN port forward or go over Tailscale. Tailscale was the other option that seemed most interesting. However, living in a region with Internet that goes down more often than I d like, I would like to just be able to send as much traffic over a mesh as possible, trusting that if the LAN is up, the mesh is up. I have one thing that really benefits from performance in excess of Yggdrasil or Tailscale: NFS. That s between two machines that never leave my LAN, so I will probably just set up a direct Wireguard link between them. Heck of a lot easier than trying to do Kerberos! Finally, I wrote this intending to be useful. I dealt with a lot of complexity and under-documentation, so it s possible I got something wrong somewhere. Please let me know if you find any errors.
This blog post is a copy of a page on my website. That page may be periodically updated.

2 February 2023

John Goerzen: Using Yggdrasil As an Automatic Mesh Fabric to Connect All Your Docker Containers, VMs, and Servers

Sometimes you might want to run Docker containers on more than one host. Maybe you want to run some at one hosting facility, some at another, and so forth. Maybe you d like run VMs at various places, and let them talk to Docker containers and bare metal servers wherever they are. And maybe you d like to be able to easily migrate any of these from one provider to another. There are all sorts of very complicated ways to set all this stuff up. But there s also a simple one: Yggdrasil. My blog post Make the Internet Yours Again With an Instant Mesh Network explains some of the possibilities of Yggdrasil in general terms. Here I want to show you how to use Yggdrasil to solve some of these issues more specifically. Because Yggdrasil is always Encrypted, some of the security lifting is done for us.

Background Often in Docker, we connect multiple containers to a single network that runs on a given host. That much is easy. Once you start talking about containers on multiple hosts, then you start adding layers and layers of complexity. Once you start talking multiple providers, maybe multiple continents, then the complexity can increase. And, if you want to integrate everything from bare metal servers to VMs into this well, there are ways, but they re not easy. I m a believer in the KISS principle. Let s not make things complex when we don t have to.

Enter Yggdrasil As I ve explained before, Yggdrasil can automatically form a global mesh network. This is pretty cool! As most people use it, they join it to the main Yggdrasil network. But Yggdrasil can be run entirely privately as well. You can run your own private mesh, and that s what we ll talk about here. All we have to do is run Yggdrasil inside each container, VM, server, or whatever. We handle some basics of connectivity, and bam! Everything is host- and location-agnostic.

Setup in Docker The installation of Yggdrasil on a regular system is pretty straightforward. Docker is a bit more complicated for several reasons:
  • It blocks IPv6 inside containers by default
  • The default set of permissions doesn t permit you to set up tunnels inside a container
  • It doesn t typically pass multicast (broadcast) packets
Normally, Yggdrasil could auto-discover peers on a LAN interface. However, aside from some esoteric Docker networking approaches, Docker doesn t permit that. So my approach is going to be setting up one or more Yggdrasil router containers on a given Docker host. All the other containers talk directly to the router container and it s all good.

Basic installation In my Dockerfile, I have something like this:
FROM jgoerzen/debian-base-security:bullseye
RUN echo "deb http://deb.debian.org/debian bullseye-backports main" >> /etc/apt/sources.list && \
    apt-get --allow-releaseinfo-change update && \
    apt-get -y --no-install-recommends -t bullseye-backports install yggdrasil
...
COPY yggdrasil.conf /etc/yggdrasil/
RUN set -x; \
    chown root:yggdrasil /etc/yggdrasil/yggdrasil.conf && \
    chmod 0750 /etc/yggdrasil/yggdrasil.conf && \
    systemctl enable yggdrasil
The magic parameters to docker run to make Yggdrasil work are:
--cap-add=NET_ADMIN --sysctl net.ipv6.conf.all.disable_ipv6=0 --device=/dev/net/tun:/dev/net/tun
This example uses my docker-debian-base images, so if you use them as well, you ll also need to add their parameters. Note that it is NOT necessary to use --privileged. In fact, due to the network namespaces in use in Docker, this command does not let the container modify the host s networking (unless you use --net=host, which I do not recommend). The --sysctl parameter was the result of a lot of banging my head against the wall. Apparently Docker tries to disable IPv6 in the container by default. Annoying.

Configuration of the router container(s) The idea is that the router node (or more than one, if you want redundancy) will be the only ones to have an open incoming port. Although the normal Yggdrasil case of directly detecting peers in a broadcast domain is more convenient and more robust, this can work pretty well too. You can, of course, generate a template yggdrasil.conf with yggdrasil -genconf like usual. Some things to note for this one:
  • You ll want to change Listen to something like Listen: ["tls://[::]:12345"] where 12345 is the port number you ll be listening on.
  • You ll want to disable the MulticastInterfaces entirely by just setting it to [] since it doesn t work anyway.
  • If you expose the port to the Internet, you ll certainly want to firewall it to only authorized peers. Setting AllowedPublicKeys is another useful step.
  • If you have more than one router container on a host, each of them will both Listen and act as a client to the others. See below.

Configuration of the non-router nodes Again, you can start with a simple configuration. Some notes here:
  • You ll want to set Peers to something like Peers: ["tls://routernode:12345"] where routernode is the Docker hostname of the router container, and 12345 is its port number as defined above. If you have more than one local router container, you can simply list them all here. Yggdrasil will then fail over nicely if any one of them go down.
  • Listen should be empty.
  • As above, MulticastInterfaces should be empty.

Using the interfaces At this point, you should be able to ping6 between your containers. If you have multiple hosts running Docker, you can simply set up the router nodes on each to connect to each other. Now you have direct, secure, container-to-container communication that is host-agnostic! You can also set up Yggdrasil on a bare metal server or VM using standard procedures and everything will just talk nicely!

Security notes Yggdrasil s mesh is aggressively greedy. It will peer with any node it can find (unless told otherwise) and will find a route to anywhere it can. There are two main ways to make sure your internal comms stay private: by restricting who can talk to your mesh, and by firewalling the Yggdrasil interface. Both can be used, and they can be used simultaneously. By disabling multicast discovery, you eliminate the chance for random machines on the LAN to join the mesh. By making sure that you firewall off (outside of Yggdrasil) who can connect to a Yggdrasil node with a listening port, you can authorize only your own machines. And, by setting AllowedPublicKeys on the nodes with listening ports, you can authenticate the Yggdrasil peers. Note that part of the benefit of the Yggdrasil mesh is normally that you don t have to propagate a configuration change to every participatory node that s a nice thing in general! You can also run a firewall inside your container (I like firehol for this purpose) and aggressively firewall the IPs that are allowed to connect via the Yggdrasil interface. I like to set a stable interface name like ygg0 in yggdrasil.conf, and then it becomes pretty easy to firewall the services. The Docker parameters that allow Yggdrasil to run are also sufficient to run firehol.

Naming Yggdrasil peers You probably don t want to hard-code Yggdrasil IPs all over the place. There are a few solutions:
  • You could run an internal DNS service
  • You can do a bit of scripting around Docker s --add-host command to add things to /etc/hosts

Other hints & conclusion Here are some other helpful use cases:
  • If you are migrating between hosts, you could leave your reverse proxy up at both hosts, both pointing to the target containers over Yggdrasil. The targets will be automatically found from both sides of the migration while you wait for DNS caches to update and such.
  • This can make services integrate with local networks a lot more painlessly than they might otherwise.
This is just an idea. The point of Yggdrasil is expanding our ideas of what we can do with a network, so here s one such expansion. Have fun!
Note: This post also has a permanent home on my webiste, where it may be periodically updated.

10 December 2022

John Goerzen: Music Playing: Both Whole-House and Mobile

It s been nearly 8 years since I last made choices about music playing. At the time, I picked Logitech Media Server (LMS, aka Slimserver and Squeezebox server) for whole-house audio and Ampache with the DSub Android app. It s time to revisit that approach. Here are the things I m looking for: The current setup Here are the current components: LMS makes an excellent whole-house audio system. I can pull up the webpage (or use an Android app like Squeezer) to browse my music library, queue things up to play, and so forth. I can also create playlists, which it saves as m3u files. This whole setup is boringly reliable. It just works, year in, year out. The main problem with this is that LMS has no real streaming/offline mobile support. It is also a rather dated system, with a painful UI for playlist management, and in general doesn t feel very modern. (It s written largely in Perl also!) So, I paired with it is Ampache. As a streaming player, Ampache is fantastic; I can access it from a web browser, and it will transcode my FLAC files to the quality I ve set in my user prefs. The DSub app for Android is fantastic and remembers my last-play locations and such. The problem is that Ampache doesn t write its playlists back to m3u format, so I can t use them with LMS. I have to therefore maintain all the playlists in LMS, and it has a smallish limit on the number of tracks per playlist. Ampache also doesn t auto-update from LMS playlists, so I have to delete and recreate the playlists catalog periodically to get updates into Ampache. Not fun. The new experiment I m trying out a new system based on these components: This looks a lot more complicated than what I had before, but in reality it only has one additional layer. Since Snapcast is a general audio syncing tool, and Jellyfin doesn t itself output audio, Mopidy and its extensions is the glue . There s a lot to like about this setup. There is one single canonical source for music and playlists. Jellyfin can do a lot more besides music, and its mobile app gives me video access also. The setup, in general, works pretty well. There are a few minor glitches, but nothing huge. For instance, Jellyfin fails to clear the play queue on the mopidy side. But there is one problem, though: when playing a playlist, it is played out of order. Jellyfin itself has the same issue internally, so I m unsure where the bug lies. Rejected option: Jellyfin with jellycli This could be a nice option; instead of mopidy with a plugin, just run jellycli in headless mode as a more native client. It also has the playlist ordering bug, and in addition, fails to play a couple of my albums which Mopidy-Jellyfin handles fine. But, if those bugs were addressed, it has a ton of promise as a simpler glue between Jellyfin and Snapcast than Mopidy. Rejected option: Mopidy-Subidy Plugin with Ampache Mopidy has a Subsonic plugin, and Ampache implements the Subsonic API. This would theoretically let me use a Mopidy client to play things on the whole-house system, coming from the same Ampache system. Although I did get this connected with some trial and error (legacy auth on, API version 1.13.0), it was extremely slow. Loading the list of playlists took minutes, the list of albums and artists many seconds. It didn t cache any answers either, so it was unusably slow. Rejected option: Ampache localplay with mpd Ampache has a feature called localplay which allows it to control a mpd server. I tested this out with mpd and snapcast. It works, but is highly limited. Basically, it causes Ampache to send a playlist a literal list of URLs to the mpd server. Unfortunately, seeking within a track is impossible from within the Ampache interface. I will note that once a person is using mpd, snapcast makes a much easier whole-house solution than the streaming option I was trying to get working 8 years ago.

8 December 2022

John Goerzen: Building an Asynchronous, Internet-Optional Instant Messaging System

I loaded up this title with buzzwords. The basic idea is that IM systems shouldn t have to only use the Internet. Why not let them be carried across LoRa radios, USB sticks, local Wifi networks, and yes, the Internet? I ll first discuss how, and then why.

How do set it up I ve talked about most of the pieces here already: So, putting this together:
  • All Delta Chat needs is access to a SMTP and IMAP server. This server could easily reside on localhost.
  • Existing email servers support transport of email using non-IP transports, including batch transports that can easily store it in files.
  • These batches can be easily carried by NNCP, Syncthing, Filespooler, etc. Or, if the connectivity is good enough, via traditional networking using Yggdrasil.
    • Side note: Both NNCP and email servers support various routing arrangements, and can easily use intermediary routing nodes. Syncthing can also mesh. NNCP supports asynchronous multicast, letting your messages opportunistically find the best way to their destination.

OK, so why would you do it? You might be thinking, doesn t asynchronous mean slow? Well, not necessarily. Asynchronous means reliability is more important than speed ; that is, slow (even to the point of weeks) is acceptable, but not required. NNCP and Syncthing, for instance, can easily deliver within a couple of seconds. But let s step back a bit. Let s say you re hiking in the wilderness in an area with no connectivity. You get back to your group at a campsite at the end of the day, and have taken some photos of the forest and sent them to some friends. Some of those friends are at the campsite; when you get within signal range, they get your messages right away. Some of those friends are in another country. So one person from your group drives into town and sits at a coffee shop for a few minutes, connected to their wifi. All the messages from everyone in the group go out, all the messages from outside the group come in. Then they go back to camp and the devices exchange messages. Pretty slick, eh?
Note: this article also has a more permanent home on my website, where it may be periodically updated.

28 November 2022

John Goerzen: Flying Joy

Wisdom from my 5-year-old: When flying in a small plane, it is important to give your dolls a headset and let them see out the window, too! Moments like this make me smile at being a pilot dad. A week ago, I also got to give 8 children and one adult their first ever ride in any kind of airplane, through EAA s Young Eagles program. I got to hear several say, Oh wow! It s SO beautiful! Look at all the little houses! And my favorite: How can I be a pilot?

2 September 2022

John Goerzen: Dead USB Drives Are Fine: Building a Reliable Sneakernet

OK, you re probably thinking. John, you talk a lot about things like Gopher and personal radios, and now you want to talk about building a reliable network out of USB drives? Well, yes. In fact, I ve already done it.

What is sneakernet? Normally, sneakernet is a sort of tongue-in-cheek reference to using disconnected storage to transport data or messages. By disconnect storage I mean anything like CD-ROMs, hard drives, SD cards, USB drives, and so forth. There are times when loading up 12TB on a device and driving it across town is just faster and easier than using the Internet for the same. And, sometimes you need to get data to places that have no Internet at all. Another reason for sneakernet is security. For instance, if your backup system is online, and your systems being backed up are online, then it could become possible for an attacker to destroy both your primary copy of data and your backups. Or, you might use a dedicated computer with no network connection to do GnuPG (GPG) signing.

What about reliable sneakernet, then? TCP is often considered a reliable protocol. That means that the sending side is generally able to tell if its message was properly received. As with most reliable protocols, we have these components:
  1. After transmitting a piece of data, the sender retains it.
  2. After receiving a piece of data, the receiver sends an acknowledgment (ACK) back to the sender.
  3. Upon receiving the acknowledgment, the sender removes its buffered copy of the data.
  4. If no acknowledgment is received at the sender, it retransmits the data, in case it gets lost in transit.
  5. It reorders any packets that arrive out of order, so that the recipient s data stream is ordered correctly.
Now, a lot of the things I just mentioned for sneakernet are legendarily unreliable. USB drives fail, CD-ROMs get scratched, hard drives get banged up. Think about putting these things in a bicycle bag or airline luggage. Some of them are going to fail. You might think, well, I ll just copy files to a USB drive instead of move them, and once I get them onto the destination machine, I ll delete them from the source. Congratulations! You are a human retransmit algorithm! We should be able to automate this! And we can.

Enter NNCP NNCP is one of those things that almost defies explanation. It is a toolkit for building asynchronous networks. It can use as a carrier: a pipe, TCP network connection, a mounted filesystem (specifically intended for cases like this), and much more. It also supports multi-hop asynchronous routing and asynchronous meshing, but these are beyond the scope of this particular article. NNCP s transports that involve live communication between two hops already had all the hallmarks of being reliable; there was a positive ACK and retransmit. As of version 8.7.0, NNCP s ACKs themselves can also be asynchronous meaning that every NNCP transport can now be reliable. Yes, that s right. Your ACKs can flow over tapes and USB drives if you want them to. I use this for archiving and backups. If you aren t already familiar with NNCP, you might take a look at my NNCP page. I also have a lot of blog posts about NNCP. Those pages describe the basics of NNCP: the packet (the unit of transmission in NNCP, which can be tiny or many TB), the end-to-end encryption, and so forth. The new command we will now be interested in is nncp-ack.

The Basic Idea Here are the basic steps to processing this stuff with NNCP:
  1. First, we use nncp-xfer -rx to process incoming packets from the USB (or other media) device. This moves them into the NNCP inbound queue, deleting them from the media device, and verifies the packet integrity.
  2. We use nncp-ack -node $NODE to create ACK packets responding to the packets we just loaded into the rx queue. It writes a list of generated ACKs onto fd 4, which we save off for later use.
  3. We run nncp-toss -seen to process the incoming queue. The use of -seen causes NNCP to remember the hashes of packets seen before, so a duplicate of an already-seen packet will not be processed twice. This command also processes incoming ACKs for packets we ve sent out previously; if they pass verification, the relevant packets are removed from the local machine s tx queue.
  4. Now, we use nncp-xfer -keep -tx -mkdir -node $NODE to send outgoing packets to a given node by writing them to a given directory on the media device. -keep causes them to remain in the outgoing queue.
  5. Finally, we use the list of generated ACK packets saved off in step 2 above. That list is passed to nncp-rm -node $NODE -pkt < $FILE to remove those specific packets from the outbound queue. The reason is that there will never be an ACK of ACK packet (that would create an infinite loop), so if we don t delete them in this manner, they would hang around forever.
You can see these steps follow the same basic outline on upstream s nncp-ack page. One thing to keep in mind: if anything else is running nncp-toss, there is a chance of a race condition between steps 1 and 2 (if nncp-toss gets to it first, it might not get an ack generated). This would sort itself out eventually, presumably, as the sender would retransmit and it would be ACKed later.

Further ideas NNCP guarantees the integrity of packets, but not ordering between packets; if you need that, you might look into my Filespooler program. It is designed to work with NNCP and can provide ordered processing.

An example script Here is a script you might try for this sort of thing. It may have more logic than you need really, you just need the steps above but hopefully it is clear.
#!/bin/bash
set -eo pipefail
MEDIABASE="/media/$USER"
# The local node name
NODENAME=" hostname "
# All nodes.  NODENAME should be in this list.
ALLNODES="node1 node2 node3"
RUNNNCP=""
# If you need to sudo, use something like RUNNNCP="sudo -Hu nncp"
NNCPPATH="/usr/local/nncp/bin"
ACKPATH=" mktemp -d "
# Process incoming packets.
#
# Parameters: $1 - the path to scan.  Must contain a directory
# named "nncp".
procrxpath ()  
    while [ -n "$1" ]; do
        BASEPATH="$1/nncp"
        shift
        if ! [ -d "$BASEPATH" ]; then
            echo "$BASEPATH doesn't exist; skipping"
            continue
        fi
        echo " *** Incoming: processing $BASEPATH"
        TMPDIR=" mktemp -d "
        # This rsync and the one below can help with
        # certain permission issues from weird foreign
        # media.  You could just eliminate it and
        # always use $BASEPATH instead of $TMPDIR below.
        rsync -rt "$BASEPATH/" "$TMPDIR/"
        # You may need these next two lines if using sudo as above.
        # chgrp -R nncp "$TMPDIR"
        # chmod -R g+rwX "$TMPDIR"
        echo "     Running nncp-xfer -rx"
        $RUNNNCP $NNCPPATH/nncp-xfer -progress -rx "$TMPDIR"
        for NODE in $ALLNODES; do
                if [ "$NODE" != "$NODENAME" ]; then
                        echo "     Running nncp-ack for $NODE"
                        # Now, we generate ACK packets for each node we will
                        # process.  nncp-ack writes a list of the created
                        # ACK packets to fd 4.  We'll use them later.
                        # If using sudo, add -C 5 after $RUNNNCP.
                        $RUNNNCP $NNCPPATH/nncp-ack -progress -node "$NODE" \
                           4>> "$ACKPATH/$NODE"
                fi
        done
        rsync --delete -rt "$TMPDIR/" "$BASEPATH/"
        rm -fr "$TMPDIR"
    done
 
proctxpath ()  
    while [ -n "$1" ]; do
        BASEPATH="$1/nncp"
        shift
        if ! [ -d "$BASEPATH" ]; then
            echo "$BASEPATH doesn't exist; skipping"
            continue
        fi
        echo " *** Outgoing: processing $BASEPATH"
        TMPDIR=" mktemp -d "
        rsync -rt "$BASEPATH/" "$TMPDIR/"
        # You may need these two lines if using sudo:
        # chgrp -R nncp "$TMPDIR"
        # chmod -R g+rwX "$TMPDIR"
        for DESTHOST in $ALLNODES; do
            if [ "$DESTHOST" = "$NODENAME" ]; then
                continue
            fi
            # Copy outgoing packets to this node, but keep them in the outgoing
            # queue with -keep.
            $RUNNNCP $NNCPPATH/nncp-xfer -keep -tx -mkdir -node "$DESTHOST" -progress "$TMPDIR"
            # Here is the key: that list of ACK packets we made above - now we delete them.
            # There will never be an ACK for an ACK, so they'd keep sending forever
            # if we didn't do this.
            if [ -f "$ACKPATH/$DESTHOST" ]; then
                echo "nncp-rm for node $DESTHOST"
                $RUNNNCP $NNCPPATH/nncp-rm -debug -node "$DESTHOST" -pkt < "$ACKPATH/$DESTHOST"
            fi
        done
        rsync --delete -rt "$TMPDIR/" "$BASEPATH/"
        rm -rf "$TMPDIR"
        # We only want to write stuff once.
        return 0
    done
 
procrxpath "$MEDIABASE"/*
echo " *** Initial tossing..."
# We make sure to use -seen to rule out duplicates.
$RUNNNCP $NNCPPATH/nncp-toss -progress -seen
proctxpath "$MEDIABASE"/*
echo "You can unmount devices now."
echo "Done."

This post is also available on my webiste, where it may be periodically updated.

Next.