Search Results: "jgoerzen"

3 January 2024

John Goerzen: Consider Security First

I write this in the context of my decision to ditch Raspberry Pi OS and move everything I possibly can, including my Raspberry Pi devices, to Debian. I will write about that later. But for now, I wanted to comment on something I think is often overlooked and misunderstood by people considering distributions or operating systems: the huge importance of getting security updates in an automated and easy way.

Background
Let's assume that these statements are true, which I think are well-supported by available evidence:
  1. Every computer system (OS plus applications) that can do useful modern work has security vulnerabilities, some of which are unknown at any given point in time;
  2. During the lifetime of that computer system, some of these vulnerabilities will be discovered. For a (hopefully large) subset of those vulnerabilities, timely patches will become available.
Now then, it follows that applying those timely patches is a critical part of having a system that is as secure as possible. Of course, you have to do other things as well (good passwords, secure practices, etc.), but fundamentally, if your system lacks patches for known vulnerabilities, you've already lost at the security ballgame.

How to stay patched
There is something of a continuum of how you might patch your system. It runs roughly like this, from best to worst:
  1. All components are kept up-to-date automatically, with no intervention from the user/operator
  2. The operator is automatically alerted to necessary patches, and they can be easily installed with minimal intervention
  3. The operator is automatically alerted to necessary patches, but they require significant effort to apply
  4. The operator has no way to detect vulnerabilities or necessary patches
It should be obvious that the first situation is ideal. Every other situation relies on the timeliness of human action to keep up-to-date with security patches. This is a fallible situation; humans are busy, take trips, dismiss alerts, miss alerts, etc. That said, it is rare to find any system living truly all the way in that scenario, as you'll see.

What is "your system"?
A critical point here is: what is "your system"? It includes:
  • Your kernel
  • Your base operating system
  • Your applications
  • All the libraries needed to run all of the above
Some OSs, such as Debian, make little or no distinction between the base OS and the applications. Others, such as many BSDs, have a distinction there. And in some cases, people will compile or install applications outside of any OS mechanism. (It must be stressed that by doing so, you are taking the responsibility of patching them on your own shoulders.)

How do common systems stack up?
  • Debian, with its support for unattended-upgrades, needrestart, debian-security-support, and such, is largely category 1. It can automatically apply security patches, in most cases can restart the necessary services for the patch to take effect, and will alert you when some processes or the system must be manually restarted for a patch to take effect (for instance, a kernel update). Those cases requiring manual intervention are category 2. The debian-security-support package will even warn you of gaps in the system. You can also use debsecan to scan for known vulnerabilities on a given installation.
  • FreeBSD has no way to automatically install security patches for things in the packages collection. As with many rolling-release systems, you can't automate the installation of these security patches with FreeBSD, because it is not safe to blindly update packages. It's not safe to blindly update packages because they may bring along more than just security patches: they may represent major upgrades that introduce incompatibilities, etc. Unlike Debian's practice of backporting fixes and thus producing narrowly-tailored patches, forcing upgrades to newer versions precludes a "minimal intervention" install. Therefore, rolling-release systems are category 3.
  • Things such as Snap, Flatpak, AppImage, Docker containers, Electron apps, and third-party binaries often contain embedded libraries and such for which you have no easy visibility into their status. For instance, if there were a bug in libpng, would you know how many of your containers had a vulnerability? These systems are category 4: you don't even know if you're vulnerable. It's for this reason that my Debian-based Docker containers apply security patches before starting processes, and also run unattended-upgrades and friends (see the sketch after this list).
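To make that concrete, here is a minimal sketch of the two pieces just mentioned. First, the standard apt periodic configuration (what dpkg-reconfigure -plow unattended-upgrades writes) that keeps a Debian host applying security updates automatically:

# /etc/apt/apt.conf.d/20auto-upgrades
APT::Periodic::Update-Package-Lists "1";
APT::Periodic::Unattended-Upgrade "1";

And second, a hypothetical container entrypoint in the spirit described above; the script path and the service command are placeholders for illustration:

#!/bin/sh
# entrypoint.sh (hypothetical): bring the container up to date, then hand off to the real service
set -e
apt-get update
unattended-upgrade --verbose          # apply pending security updates per the unattended-upgrades policy
exec /usr/local/bin/my-service "$@"   # placeholder for the container's actual long-running process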

The pernicious library problem
As mentioned in my last category above, hidden vulnerabilities can be a big problem. I've been writing about this for years. Back in 2017, I wrote an article focused on Docker containers, but it applies to the other systems like Snap and so forth. I cited a study back then that "Over 80% of the :latest versions of official images contained at least one high severity vulnerability." The situation is no better now. In December 2023, it was reported that, two years after the critical Log4Shell vulnerability, 25% of apps were still vulnerable to it. Also, only 21% of developers ever update third-party libraries after introducing them into their projects. Clearly, you can't rely on these images with embedded libraries to be secure. And since they are black boxes, they are difficult to audit. Debian's policy of always splitting libraries out from packages is hugely beneficial; it allows fine-grained analysis of not just vulnerabilities, but also the dependency graph. If there's a vulnerability in libpng, you have one place to patch it, and you also know exactly what components of your system use it. If you use snaps or AppImages, you can't know if they contain a deeply embedded vulnerability, nor could you patch it yourself even if you knew. You are at the mercy of upstream detecting and remedying the problem: a dicey situation at best.
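For instance, a couple of quick checks along those lines on a Debian system (libpng16-16 is the libpng package name as of Debian 12; adjust for your release):

$ apt-cache rdepends --installed libpng16-16    # which installed packages depend on libpng?
$ debsecan --suite bookworm --only-fixed        # known vulnerabilities on this host that already have fixes available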

Who makes the patches?
Fundamentally, humans produce security patches. Often, but not always, patches originate with the authors of a program and then are integrated into distribution packages. It should be noted that every security team has finite resources; there will always be some CVEs that aren't patched in a given system for various reasons: perhaps they are not exploitable, or are too low-impact, or have better mitigations than patches. Debian has an excellent security team; they manage the process of integrating patches into Debian, produce Debian Security Advisories, maintain the Debian Security Tracker (which maintains cross-references with the CVE database), etc. Some distributions don't have this infrastructure. For instance, I was unable to find this kind of tracker for Devuan or Raspberry Pi OS. In contrast, Ubuntu and Arch Linux both seem to have active security teams with trackers and advisories.

Implications for Raspberry Pi OS and others
As I mentioned above, I'm transitioning my Pi devices off Raspberry Pi OS (Raspbian). Security is one reason. Although Raspbian is a fork of Debian, and you can install packages like unattended-upgrades on it, they don't work right because they use the Debian infrastructure, and Raspbian hasn't modified them to use its own infrastructure. I don't see any Raspberry Pi OS security advisories, trackers, etc. In short, they lack the infrastructure to support those Debian tools anyhow. Not only that, but Raspbian lags behind Debian in both new releases and new security patches, sometimes by days or weeks. Live Migrating from Raspberry Pi OS bullseye to Debian bookworm contains instructions for migrating Raspberry Pis to Debian.

17 June 2023

John Goerzen: Using dar for Data Archiving

This is the third post in a series about data archiving to removable media (optical discs and hard drives). In the first, I explained the difference between backing up and archiving, established goals for the project, and said I'd evaluate git-annex and dar. The second post evaluated git-annex, and now it's time to look at dar. The series will conclude with a post comparing git-annex with dar.

What is dar?
I could open with the same thing I did with git-annex, just changing the name of the program: "[dar] is a fantastic and versatile program that does... well, it's one of those things that can do so much that it's a bit hard to describe." It is, fundamentally, an archiver like tar or zip (it makes one file representing a bunch of other files), but it goes far beyond that. dar's homepage lays out a comprehensive list of features: compression, encryption, splitting archives into slices, differential/incremental backups, detached (isolated) catalogs, and more. So to tie this together for this project, I will set up a 400MB slice size (to mimic what I did with git-annex) and see how dar saves the data and restores it. Isolated catalogs aren't strictly necessary for this, but by using them (and/or dar_manager), we can build up a database of files and locations and thus directly compare dar to git-annex's location tracking.

Walkthrough: Creating the first archive
As with the git-annex walkthrough, I'll use some shell variables ($SOURCEDIR, $DRIVE, $CATDIR, and so on) to make things easy to remember. OK, we can run the backup immediately. No special setup is needed. dar supports both short-form (single-character) parameters and long-form ones. Since the parameters probably aren't familiar to everyone, I will use the long-form ones in these examples. Here's how we create our initial full backup; I'll explain the parameters below:
$ dar \
--verbose \
--create $DRIVE/bak1 \
--on-fly-isolate $CATDIR/bak1 \
--slice 400M \
--min-digits 2 \
--pause \
--fs-root $SOURCEDIR
Let's look at each of these parameters:
  • --verbose: print progress as files are processed.
  • --create: create an archive with the given base name ($DRIVE/bak1).
  • --on-fly-isolate: also write an isolated (detached) catalog while the backup is made.
  • --slice 400M: split the archive into 400MB slices.
  • --min-digits 2: pad slice numbers to at least two digits (bak1.01.dar rather than bak1.1.dar).
  • --pause: pause after each slice, giving a chance to swap media.
  • --fs-root: the directory tree to back up.
This same command could have been written with short options as:
$ dar -v -c $DRIVE/bak1 -@ $CATDIR/bak1 -s 400M -9 2 -p -R $SOURCEDIR
What does it look like while running? Here's an excerpt:
...
Adding file to archive: /acrypt/no-backup/jgoerzen/testdata/[redacted]
Finished writing to file 1, ready to continue ? [return = YES Esc = NO]
...
Writing down archive contents...
Closing the escape layer...
Writing down the first archive terminator...
Writing down archive trailer...
Writing down the second archive terminator...
Closing archive low layer...
Archive is closed.
--------------------------------------------
581 inode(s) saved
including 0 hard link(s) treated
0 inode(s) changed at the moment of the backup and could not be saved properly
0 byte(s) have been wasted in the archive to resave changing files
0 inode(s) with only metadata changed
0 inode(s) not saved (no inode/file change)
0 inode(s) failed to be saved (filesystem error)
0 inode(s) ignored (excluded by filters)
0 inode(s) recorded as deleted from reference backup
--------------------------------------------
Total number of inode(s) considered: 581
--------------------------------------------
EA saved for 0 inode(s)
FSA saved for 581 inode(s)
--------------------------------------------
Making room in memory (releasing memory used by archive of reference)...
Now performing on-fly isolation...
...
That was easy! Let's look at the contents of the backup directory:
$ ls -lh $DRIVE
total 3.7G
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:27 bak1.01.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:27 bak1.02.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:27 bak1.03.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:27 bak1.04.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:28 bak1.05.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:28 bak1.06.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:28 bak1.07.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:28 bak1.08.dar
-rw-r--r-- 1 jgoerzen jgoerzen 400M Jun 16 19:29 bak1.09.dar
-rw-r--r-- 1 jgoerzen jgoerzen 156M Jun 16 19:33 bak1.10.dar
And the isolated catalog:
$ ls -lh $CATDIR
total 37K
-rw-r--r-- 1 jgoerzen jgoerzen 35K Jun 16 19:33 bak1.1.dar
The isolated catalog is stored compressed automatically. Well, this was easy. With one command, we archived the entire data set, split into 400MB chunks, and wrote out the catalog data.

Walkthrough: Inspecting the saved archive
Can dar tell us which slice contains a given file? Sure:
$ dar --list $DRIVE/bak1 --list-format=slicing | less
Slice(s) [Data ][D][ EA ][FSA][Compr][S] Permission Filemane
--------+--------------------------------+----------+-----------------------------
...
1 [Saved][ ] [-L-][ 0%][X] -rwxr--r-- [redacted]
1-2 [Saved][ ] [-L-][ 0%][X] -rwxr--r-- [redacted]
2 [Saved][ ] [-L-][ 0%][X] -rwxr--r-- [redacted]
...
This illustrates the transition from slice 1 to slice 2. The first file was stored entirely in slice 1; the second was stored partially in slice 1 and partially in slice 2; and the third solely in slice 2. We can get other kinds of information as well.
$ dar --list $DRIVE/bak1 | less
[Data ][D][ EA ][FSA][Compr][S] Permission User Group Size Date filename
--------------------------------+------------+-------+-------+---------+-------------------------------+------------
[Saved][ ] [-L-][ 0%][X] -rwxr--r-- jgoerzen jgoerzen 24 Mio Mon Mar 5 07:58:09 2018 [redacted]
[Saved][ ] [-L-][ 0%][X] -rwxr--r-- jgoerzen jgoerzen 16 Mio Mon Mar 5 07:58:09 2018 [redacted]
[Saved][ ] [-L-][ 0%][X] -rwxr--r-- jgoerzen jgoerzen 22 Mio Mon Mar 5 07:58:09 2018 [redacted]
These are the same files I was looking at before. Here we see they are 24MB, 16MB, and 22MB in size, and some additional metadata. Even more is available in the XML list format.

Walkthrough: Updates
As with git-annex, I've made some changes in the source directory: moved a file, added another, and deleted one. Let's create an incremental backup now:
$ dar \
--verbose \
--create $DRIVE/bak2 \
--on-fly-isolate $CATDIR/bak2 \
--ref $CATDIR/bak1 \
--slice 400M \
--min-digits 2 \
--pause \
--fs-root $SOURCEDIR
This command is very similar to the earlier one. Instead of writing an archive and catalog named bak1, we write one named bak2. What's new here is --ref $CATDIR/bak1. That says: make an incremental based on an archive of reference. All that is needed from that archive of reference is the detached catalog; --ref $DRIVE/bak1 would have worked equally well here. Here's what I did to the $SOURCEDIR: moved a file, added a new one (a copy of /bin/cp), and deleted another. Let's see if dar's command output matches this:
...
Adding file to archive: /acrypt/no-backup/jgoerzen/testdata/file01-unchanged
Saving Filesystem Specific Attributes for /acrypt/no-backup/jgoerzen/testdata/file01-unchanged
Adding file to archive: /acrypt/no-backup/jgoerzen/testdata/cp
Saving Filesystem Specific Attributes for /acrypt/no-backup/jgoerzen/testdata/cp
Adding folder to archive: [redacted]
Saving Filesystem Specific Attributes for [redacted]
Adding reference to files that have been destroyed since reference backup...
...
--------------------------------------------
3 inode(s) saved
including 0 hard link(s) treated
0 inode(s) changed at the moment of the backup and could not be saved properly
0 byte(s) have been wasted in the archive to resave changing files
0 inode(s) with only metadata changed
578 inode(s) not saved (no inode/file change)
0 inode(s) failed to be saved (filesystem error)
0 inode(s) ignored (excluded by filters)
2 inode(s) recorded as deleted from reference backup
--------------------------------------------
Total number of inode(s) considered: 583
--------------------------------------------
EA saved for 0 inode(s)
FSA saved for 3 inode(s)
--------------------------------------------
...
Yes, it does. The rename is recorded as a deletion and an addition, since dar doesn't directly track renames. So the rename plus the deletion account for the two deletions. The rename plus the addition of cp count as 2 of the 3 inodes saved; the third is the modified directory from which files were deleted and moved out. Let's see the files that were created:
$ ls -lh $DRIVE/bak2*
-rw-r--r-- 1 jgoerzen jgoerzen 18M Jun 16 19:52 /acrypt/no-backup/jgoerzen/dar-testing/drive/bak2.01.dar
$ ls -lh $CATDIR/bak2*
-rw-r--r-- 1 jgoerzen jgoerzen 22K Jun 16 19:52 /acrypt/no-backup/jgoerzen/dar-testing/cat/bak2.1.dar
What does the listing look like now?
Slice(s) [Data ][D][ EA ][FSA][Compr][S] Permission Filemane
--------+--------------------------------+----------+-----------------------------
[ ][ ] [---][-----][X] -rwxr--r-- [redacted]
1 [Saved][ ] [-L-][ 0%][X] -rwxr--r-- file01-unchanged
...
[--- REMOVED ENTRY ----][redacted]
[--- REMOVED ENTRY ----][redacted]
Here I show an example of:
  1. A file that was not changed from the initial backup. Its presence was simply noted, but because we're doing an incremental, the data wasn't saved.
  2. A file that is saved in this incremental, on slice 1.
  3. The two deleted files
Walkthrough: dar_manager
As we've seen above, the two archives (or their detached catalog) give us a complete picture of what files were present at the time of the creation of each archive, and what files were stored in a given archive. We can certainly continue working in that way. We can also use dar_manager to build a comprehensive database of these archives, to be able to find what media is necessary to restore each given file. Or, with dar_manager's "when" parameter, we can restore files as of a particular date. Let's try it out. First, we create our database:
$ dar_manager --create $DARDB
$ dar_manager --base $DARDB --add $DRIVE/bak1
Auto detecting min-digits to be 2
$ dar_manager --base $DARDB --add $DRIVE/bak2
Auto detecting min-digits to be 2
Here we created the database, and added our two catalogs to it. (Again, we could have as easily used $CATDIR/bak1; either the archive or its isolated catalog will work here.) It s important to add the catalogs in order. Let s do some quick experimentation with dar_manager:
$ dar_manager -v --base $DARDB --list
Decompressing and loading database to memory...
dar path :
dar options :
database version : 6
compression used : gzip
compression level: 9
archive #    path    basename
------------+--------------+---------------
1 /acrypt/no-backup/jgoerzen/dar-testing/drive bak1
2 /acrypt/no-backup/jgoerzen/dar-testing/drive bak2
$ dar_manager --base $DARDB --stat
archive # most recent/total data most recent/total EA
--------------+-------------------------+-----------------------
1 580/581 0/0
2 3/3 0/0
The list option shows the correlation between the dar_manager archive number (1, 2) and filenames (bak1, bak2). It is coincidence here that 1/bak1 and 2/bak2 correlate; that's not necessarily the case. Most dar_manager commands operate on the archive number, while dar commands operate on the archive path/basename. Now let's see just what files are saved in archive 2, the incremental:
$ dar_manager --base $DARDB --used 2
[ Saved ][ ] [redacted]
[ Saved ][ ] file01-unchanged
[ Saved ][ ] cp
Now we can also see where a file is stored. Here's one that was saved in the full backup and unmodified in the incremental:
$ dar_manager --base $DARDB --file [redacted]
1 Fri Jun 16 19:15:12 2023 saved absent
2 Fri Jun 16 19:15:12 2023 present absent
(The "absent" at the end refers to extended attributes, which the file didn't have.) Similarly, files that were added or removed will be listed only at the appropriate place.

Walkthrough: Restoration
I'm not going to repeat the author's full restoration with dar page, but here are some quick examples. A simple way of doing everything is using incrementals for the whole series. To do that, you'd have bak1 be full, bak2 based on bak1, bak3 based on bak2, bak4 based on bak3, etc. To restore from such a series, you have two options: play the archives forward in order with plain dar (restore the full backup, then each incremental in turn), or load their catalogs into dar_manager and let it drive the restore. If you get fancy (for instance, bak2 is based on bak1, bak3 on bak2, bak4 on bak1), then you would want to use dar_manager to ensure a consistent restore is completed. Either way, the process is nearly identical. Also, I figure, to make things easy, you can save a copy of the entire set of isolated catalogs before you finalize each disc/drive. They're so small, and this would let someone with just the most recent disc build a dar_manager database without having to go through all the other discs. Anyhow, let's do a restore using just dar. I'll make a $RESTOREDIR and do it that way.
$ dar \
--verbose \
--extract $DRIVE/bak1 \
--fs-root $RESTOREDIR \
--no-warn \
--execute "echo Ready for slice %n. Press Enter; read foo"
This --execute lets us see how dar works; this is an illustration of the power it has (above --pause). It's a snippet interpreted by /bin/sh, with %n being one of the dar placeholders. If memory serves, it's not strictly necessary, as dar will prompt you for slices it needs if they're not mounted. Anyhow, you'll see it first reading the last slice, which contains the catalog, then reading from the beginning. Here we go:
Auto detecting min-digits to be 2
Opening archive bak1 ...
Opening the archive using the multi-slice abstraction layer...
Ready for slice 10. Press Enter
...
Loading catalogue into memory...
Locating archive contents...
Reading archive contents...
File ownership will not be restored du to the lack of privilege, you can disable this message by asking not to restore file ownership [return = YES Esc = NO]
Continuing...
Restoring file's data: [redacted]
Restoring file's FSA: [redacted]
Ready for slice 1. Press Enter
...
Ready for slice 2. Press Enter
...
--------------------------------------------
581 inode(s) restored
including 0 hard link(s)
0 inode(s) not restored (not saved in archive)
0 inode(s) not restored (overwriting policy decision)
0 inode(s) ignored (excluded by filters)
0 inode(s) failed to restore (filesystem error)
0 inode(s) deleted
--------------------------------------------
Total number of inode(s) considered: 581
--------------------------------------------
EA restored for 0 inode(s)
FSA restored for 0 inode(s)
--------------------------------------------
The warning is because I'm not doing the extraction as root, which limits dar's ability to fully restore ownership data. OK, now the incremental:
$ dar \
--verbose \
--extract $DRIVE/bak2 \
--fs-root $RESTOREDIR \
--no-warn \
--execute "echo Ready for slice %n. Press Enter; read foo"
...
Ready for slice 1. Press Enter
...
Restoring file's data: /acrypt/no-backup/jgoerzen/dar-testing/restore/file01-unchanged
Restoring file's FSA: /acrypt/no-backup/jgoerzen/dar-testing/restore/file01-unchanged
Restoring file's data: /acrypt/no-backup/jgoerzen/dar-testing/restore/cp
Restoring file's FSA: /acrypt/no-backup/jgoerzen/dar-testing/restore/cp
Restoring file's data: /acrypt/no-backup/jgoerzen/dar-testing/restore/[redacted directory]
Removing file (reason is file recorded as removed in archive): [redacted file]
Removing file (reason is file recorded as removed in archive): [redacted file]
This all looks right! Now how about we compare the restore to the original source directory?
$ diff -durN $SOURCEDIR $RESTOREDIR
No changes: perfect. We could instead do this restore via a single dar_manager command, though annoyingly, we'd have to pass all top-level files/directories to dar_manager restore. But still, it's one command, and it basically automates and optimizes the dar restores shown above.

Conclusions
Dar makes it extremely easy to just Do The Right Thing when making archives. One command makes a backup. It saves things in simple files. You can make an isolated catalog if you want, and it too is saved in a simple file. You can query what is in the files and where. You can restore from all or part of the files. You can simply play the backups forward, in order, to achieve a full and consistent restore. Or you can load data about them into dar_manager for an optimized restore. A bit of scripting will be necessary to make incrementals: finding the most recent backup or catalog to use as the reference. If backup files are named with care (for instance, by date), then this should be a pretty easy task; a sketch follows at the end of this post.

I haven't touched on resiliency yet. dar comes with tools for recovering archives that have had portions corrupted or lost. It can also rebuild the catalog if it is corrupted or lost. It adds "tape marks" (or "escape sequences") to the archive along with the data stream, so every entry in the catalog is actually stored in the archive twice: once alongside the file data, and once at the end in the collected catalog. This allows dar to scan a corrupted file for the tape marks and reconstruct whatever is still intact, even if the catalog is lost. dar also integrates with tools like sha256sum and par2 to simplify archive integrity testing and restoration. This balances against the need to use a tool (dar, optionally with a GUI frontend) to restore files. I'll discuss that more in the next post.
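As a minimal sketch of that incremental-finding script, assuming the $SOURCEDIR/$DRIVE/$CATDIR layout from the walkthrough and a hypothetical date-stamped naming scheme:

# Find the newest isolated catalog and use it as the archive of reference
REF=$(ls "$CATDIR" | sed 's/\.[0-9]*\.dar$//' | sort -u | tail -n 1)
NEW="bak-$(date +%Y-%m-%d)"        # hypothetical date-based basename for today's incremental
dar --verbose \
    --create "$DRIVE/$NEW" \
    --on-fly-isolate "$CATDIR/$NEW" \
    --ref "$CATDIR/$REF" \
    --slice 400M --min-digits 2 --pause \
    --fs-root "$SOURCEDIR"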

16 June 2023

John Goerzen: Using git-annex for Data Archiving

In my recent post about data archiving to removable media, I laid out the difference between backing up and archiving, and also said I'd evaluate git-annex and dar. This post evaluates git-annex. The next will look at dar, and then I'll make a comparison post.

What is git-annex?
git-annex is a fantastic and versatile program that does... well, it's one of those things that can do so much that it's a bit hard to describe. Its homepage says:
git-annex allows managing large files with git, without storing the file contents in git. It can sync, backup, and archive your data, offline and online. Checksums and encryption keep your data safe and secure. Bring the power and distributed nature of git to bear on your large files with git-annex.
I think the particularly interesting features of git-annex aren't actually included in that list. Among the features of git-annex that make it shine for this purpose, its location tracking is key. git-annex can know exactly which device has which file at which version at all times. Combined with its preferred content settings, this lets you very easily say things like "this drive wants any file that isn't already stored somewhere in this set of drives." git-annex can be set to allow a configurable amount of free space to remain on a device, and it will fill it up with whatever copies are necessary up until it hits that limit. Very convenient! git-annex will store files in a folder structure that mirrors the origin folder structure, in plain files just as they were. This maximizes the ability for a future person to access the content, since it is all viewable without any special tool at all. Of course, for things like optical media, git-annex will essentially be creating what amounts to incrementals. To obtain a consistent copy of the original tree, you would still need to use git-annex to process (export) the archives.

git-annex challenges
In my prior post, I related some challenges with git-annex. The biggest of them (quite poor performance of the directory special remote when dealing with many files) has been resolved by Joey, git-annex's author! That dramatically improves the git-annex use scenario here! The fixing commit is in the source tree but not yet in a release. git-annex no doubt may still have performance challenges with repositories in the 100,000+ file range, but in that order of magnitude it now looks usable. I'm not sure about 1,000,000-file repositories (I haven't tested); there is a page about scalability. A few other more minor challenges remain; among them, git-annex itself doesn't preserve file metadata such as timestamps. I worked around the timestamp issue by using the mtree-netbsd package in Debian. mtree writes out a summary of files and metadata in a tree, and can restore them. To save:

mtree -c -R nlink,uid,gid,mode -p /PATH/TO/REPO -X <(echo './.git') > /tmp/spec

And, after restoration, the timestamps can be applied with:

mtree -t -U -e < /tmp/spec

Walkthrough: initial setup
To use git-annex in this way, we have to do some setup. My general approach is to keep a small metadata-only tracking repo on the main machine, with the source data imported as a directory special remote, and then clone that repo onto each removable drive. Let's get started! I've set the shell variables used below ($METAREPO, $SOURCEDIR, $DRIVE01, and so on) appropriately for this example, and REPONAME to "testdata". We'll begin by setting up the metadata-only tracking repo.
$ REPONAME=testdata
$ mkdir "$METAREPO"
$ cd "$METAREPO"
$ git init
$ git config annex.thin true
How git-annex stores files in a repo is a somewhat complicated topic; it varies depending on whether the data for the file is present in a given repo, and whether the file is locked or unlocked. Basically, the options I use here cause git-annex to mostly use hard links instead of symlinks or pointer files, for maximum compatibility with non-POSIX filesystems such as NTFS and UDF, which might be used on these devices. annex.thin is part of that. Let's continue:
$ git annex init 'local hub'
init local hub ok
(recording state in git...)
$ git annex wanted . "include=* and exclude=$REPONAME/*"
wanted . ok
(recording state in git...)
In a bit, we are going to import the source data under the directory named $REPONAME (here, testdata). The wanted command says: in this repository (represented by the bare dot), the files we want are matched by the rule that says everything except what's under $REPONAME. In other words, we don't want to make an unnecessary copy here. Because I expect to use an mtree file as documented above, and it is not under $REPONAME/, it will be included. Let's just add it and tweak some things.
$ touch mtree
$ git annex add mtree
add mtree
ok
(recording state in git...)
$ git annex sync
git-annex sync will change default behavior to operate on --content in a future version of git-annex. Recommend you explicitly use --no-content (or -g) to prepare for that change. (Or you can configure annex.synccontent)
commit
[main (root-commit) 6044742] git-annex in local hub
1 file changed, 1 insertion(+)
create mode 120000 mtree
ok
$ ls -l
total 9
lrwxrwxrwx 1 jgoerzen jgoerzen 178 Jun 15 22:31 mtree -> .git/annex/objects/pX/ZJ/...
OK! We've added a file, and it got transformed into a symlink. That's the thing I said we were going to avoid, so:
$ git annex adjust --unlock-present
adjust
Switched to branch 'adjusted/main(unlockpresent)'
ok
$ ls -l
total 1
-rw-r--r-- 2 jgoerzen jgoerzen 0 Jun 15 22:31 mtree
You'll notice it transformed into a hard link (nlinks=2) file. Great! Now let's import the source data. For that, we'll use the directory special remote.
$ git annex initremote source type=directory directory=$SOURCEDIR importtree=yes \
encryption=none
initremote source ok
(recording state in git...)
$ git annex enableremote source directory=$SOURCEDIR
enableremote source ok
(recording state in git...)
$ git config remote.source.annex-readonly true
$ git config annex.securehashesonly true
$ git config annex.genmetadata true
$ git config annex.diskreserve 100M
$ git config remote.source.annex-tracking-branch main:$REPONAME
OK, so here we created a new remote named "source". We enabled it, and set some configuration. Most notably, that last line causes files from "source" to be imported under $REPONAME/ as we wanted earlier. Now we're ready to scan the source.
$ git annex sync
At this point, you'll see git-annex computing a hash for every file in the source directory. I can verify with du that my metadata-only repo only uses 14MB of disk space, while my source is around 4GB. Now we can see what git-annex thinks about file locations:
$ git-annex whereis | less
whereis mtree (1 copy)
8aed01c5-da30-46c0-8357-1e8a94f67ed6 -- local hub [here]
ok
whereis testdata/[redacted] (0 copies)
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
failed
... many more lines ...
So remember we said we wanted mtree, but nothing under testdata, under this repo? That's exactly what we got. git-annex knows that the files under testdata can be found under the "source" special remote, but aren't in any git-annex repo -- yet. Now we'll start adding them.

Walkthrough: removable drives
I've set up two 500MB filesystems to represent removable drives. We'll see how git-annex works with them.
$ cd $DRIVE01
$ df -h .
Filesystem Size Used Avail Use% Mounted on
acrypt/no-backup/annexdrive01 500M 1.0M 499M 1% /acrypt/no-backup/annexdrive01
$ git clone $METAREPO
Cloning into 'testdata'...
done.
$ cd $REPONAME
$ git config annex.thin true
$ git annex init "test drive #1"
$ git annex adjust --hide-missing --unlock
adjust
Switched to branch 'adjusted/main(hidemissing-unlocked)'
ok
$ git annex sync
OK, that's the initial setup. Now let's enable the source remote and configure it the same way we did before:
$ git annex enableremote source directory=$SOURCEDIR
enableremote source ok
(recording state in git...)
$ git config remote.source.annex-readonly true
$ git config remote.source.annex-tracking-branch main:$REPONAME
$ git config annex.securehashesonly true
$ git config annex.genmetadata true
$ git config annex.diskreserve 100M
Now, we'll add the drive to a group called "driveset01" and configure what we want on it:
$ git annex group . driveset01
$ git annex wanted . '(not copies=driveset01:1)'
What this does is say: first of all, this drive is in a group named driveset01. Then, this drive wants any files for which there isn't already at least one copy in driveset01. Now let's load up some files!
$ git annex sync --content
As the messages fly by from here, you'll see it mentioning that it got mtree, and then various files from "source" -- until, that is, the filesystem had less than 100MB free, at which point it complained of no space for the rest. Exactly like we wanted! Now, we need to teach $METAREPO about $DRIVE01.
$ cd $METAREPO
$ git remote add drive01 $DRIVE01/$REPONAME
$ git annex sync drive01
git-annex sync will change default behavior to operate on --content in a future version of git-annex. Recommend you explicitly use --no-content (or -g) to prepare for that change. (Or you can configure annex.synccontent)
commit
On branch adjusted/main(unlockpresent)
nothing to commit, working tree clean
ok
merge synced/main (Merging into main...)
Updating d1d9e53..817befc
Fast-forward
(Merging into adjusted branch...)
Updating 7ccc20b..861aa60
Fast-forward
ok
pull drive01
remote: Enumerating objects: 214, done.
remote: Counting objects: 100% (214/214), done.
remote: Compressing objects: 100% (95/95), done.
remote: Total 110 (delta 6), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (110/110), 13.01 KiB 1.44 MiB/s, done.
Resolving deltas: 100% (6/6), completed with 6 local objects.
From /acrypt/no-backup/annexdrive01/testdata
* [new branch] adjusted/main(hidemissing-unlocked) -> drive01/adjusted/main(hidemissing-unlocked)
* [new branch] adjusted/main(unlockpresent) -> drive01/adjusted/main(unlockpresent)
* [new branch] git-annex -> drive01/git-annex
* [new branch] main -> drive01/main
* [new branch] synced/main -> drive01/synced/main
ok
OK! This step is important, because drive01 and drive02 (which we'll set up shortly) won't necessarily be able to reach each other directly, due to not being plugged in simultaneously. Our $METAREPO, however, will know all about where every file is, so that the "wanted" settings can be correctly resolved. Let's see what things look like now:
$ git annex whereis | less
whereis mtree (2 copies)
8aed01c5-da30-46c0-8357-1e8a94f67ed6 -- local hub [here]
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]
ok
whereis testdata/[redacted] (1 copy)
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok
If I scroll down a bit, I'll see the files past the 400MB mark that didn't make it onto drive01. Let's add another example drive!

Walkthrough: Adding a second drive
The steps for $DRIVE02 are the same as we did before, just with drive02 instead of drive01, so I'll omit listing it all a second time. Now look at this excerpt from whereis:
whereis testdata/[redacted] (1 copy)
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok
whereis testdata/[redacted] (1 copy)
c4540343-e3b5-4148-af46-3f612adda506 -- test drive #2 [drive02]
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok
Look at that! Some files on drive01, some on drive02, some neither place. Perfect!

Walkthrough: Updates
So I've made some changes in the source directory: moved a file, added another, and deleted one. All of these were copied to drive01 above. How do we handle this? First, we update the metadata repo:
$ cd $METAREPO
$ git annex sync
$ git annex dropunused all
OK, this has scanned $SOURCEDIR and noted changes. Let's see what whereis says:
$ git annex whereis | less
...
whereis testdata/cp (0 copies)
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
failed
whereis testdata/file01-unchanged (1 copy)
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok
So this looks right. The file I added was a copy of /bin/cp. I moved another file to one named file01-unchanged. Notice that it realized this was a rename and that the data still exists on drive01. Well, let's update drive01.
$ cd $DRIVE01/$REPONAME
$ git annex sync --content
Looking at the testdata/ directory now, I see that file01-unchanged has been renamed, the deleted file is gone, but cp isn't yet here -- probably due to space issues; as it's new, it's undefined whether it or some other file would fill up free space. Let's work along a few more commands.
$ git annex get --auto
$ git annex drop --auto
$ git annex dropunused all
And now, let's make sure metarepo is updated with its state.
$ cd $METAREPO
$ git annex sync
We could do the same for drive02. This is how we would proceed with every update.

Walkthrough: Restoration
Now, we have bare files at reasonable locations in drive01 and drive02. But, to generate a consistent restore, we need to be able to actually do an export. Otherwise, we may have files with old names, duplicate files, etc. Let's assume that we lost our source and metadata repos and have to restore from scratch. We'll make a new $RESTOREDIR. We'll begin with drive01 since we used it most recently.
$ mv $METAREPO $METAREPO.disabled
$ mv $SOURCEDIR $SOURCEDIR.disabled
$ git clone $DRIVE01/$REPONAME $RESTOREDIR
$ cd $RESTOREDIR
$ git config annex.thin true
$ git annex init "restore"
$ git annex adjust --hide-missing --unlock
Now, we need to connect the drive01 and pull the files from it.
$ git remote add drive01 $DRIVE01/$REPONAME
$ git annex sync --content
Now, repeat with drive02:
$ git remote add drive02 $DRIVE02/$REPONAME
$ git annex sync --content
Now we've got all our content back! Here's what whereis looks like:
whereis testdata/file01-unchanged (3 copies)
3d663d0f-1a69-4943-8eb1-f4fe22dc4349 -- restore [here]
9e48387e-b096-400a-8555-a3caf5b70a64 -- source
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [origin]
ok
...
I was a little surprised that drive01 didn't seem to know what was on drive02. Perhaps that could have been remedied by adding more remotes there? I'm not entirely sure; I'd thought it would have been able to do that automatically.

Conclusions
I think I have demonstrated two things. First, git-annex is indeed an extremely powerful tool. I have only scratched the surface here. The location tracking is a neat feature, and being able to just access the data as plain files if all else fails is nice for future users. Secondly, it is also a complex tool and difficult to get right for this purpose (I think it is much easier for some other purposes). For someone that doesn't live and breathe git-annex, it can be hard to get right. In fact, I'm not entirely sure I got it right here. Why didn't drive02 know what files were on drive01 and vice-versa? I don't know, and that reflects some kind of misunderstanding on my part about how metadata is synced; perhaps more care needs to be taken in the restore, or it should be done in a different order than I proposed. I initially tried to do a restore by using git annex export to a directory special remote with exporttree=yes, but I couldn't ever get it to actually do anything, and I don't know why.

These two findings cut against each other. On the one hand, the raw accessibility of the data to someone with no computer skills is unmatched. On the other hand, I'm not certain I have the skill to always prepare the discs properly, or to do a proper consistent restore.

2 February 2023

John Goerzen: Using Yggdrasil As an Automatic Mesh Fabric to Connect All Your Docker Containers, VMs, and Servers

Sometimes you might want to run Docker containers on more than one host. Maybe you want to run some at one hosting facility, some at another, and so forth. Maybe you'd like to run VMs at various places, and let them talk to Docker containers and bare metal servers wherever they are. And maybe you'd like to be able to easily migrate any of these from one provider to another. There are all sorts of very complicated ways to set all this stuff up. But there's also a simple one: Yggdrasil. My blog post Make the Internet Yours Again With an Instant Mesh Network explains some of the possibilities of Yggdrasil in general terms. Here I want to show you how to use Yggdrasil to solve some of these issues more specifically. Because Yggdrasil is always Encrypted, some of the security lifting is done for us.

Background
Often in Docker, we connect multiple containers to a single network that runs on a given host. That much is easy. Once you start talking about containers on multiple hosts, then you start adding layers and layers of complexity. Once you start talking multiple providers, maybe multiple continents, then the complexity can increase further. And if you want to integrate everything from bare metal servers to VMs into this: well, there are ways, but they're not easy. I'm a believer in the KISS principle. Let's not make things complex when we don't have to.

Enter Yggdrasil
As I've explained before, Yggdrasil can automatically form a global mesh network. This is pretty cool! As most people use it, they join it to the main Yggdrasil network. But Yggdrasil can be run entirely privately as well. You can run your own private mesh, and that's what we'll talk about here. All we have to do is run Yggdrasil inside each container, VM, server, or whatever. We handle some basics of connectivity, and bam! Everything is host- and location-agnostic.

Setup in Docker
The installation of Yggdrasil on a regular system is pretty straightforward. Docker is a bit more complicated for several reasons:
  • It blocks IPv6 inside containers by default
  • The default set of permissions doesn't permit you to set up tunnels inside a container
  • It doesn t typically pass multicast (broadcast) packets
Normally, Yggdrasil could auto-discover peers on a LAN interface. However, aside from some esoteric Docker networking approaches, Docker doesn't permit that. So my approach is going to be setting up one or more Yggdrasil router containers on a given Docker host. All the other containers talk directly to the router container, and it's all good.

Basic installation
In my Dockerfile, I have something like this:
FROM jgoerzen/debian-base-security:bullseye
RUN echo "deb http://deb.debian.org/debian bullseye-backports main" >> /etc/apt/sources.list && \
    apt-get --allow-releaseinfo-change update && \
    apt-get -y --no-install-recommends -t bullseye-backports install yggdrasil
...
COPY yggdrasil.conf /etc/yggdrasil/
RUN set -x; \
    chown root:yggdrasil /etc/yggdrasil/yggdrasil.conf && \
    chmod 0750 /etc/yggdrasil/yggdrasil.conf && \
    systemctl enable yggdrasil
The magic parameters to docker run to make Yggdrasil work are:
--cap-add=NET_ADMIN --sysctl net.ipv6.conf.all.disable_ipv6=0 --device=/dev/net/tun:/dev/net/tun
This example uses my docker-debian-base images, so if you use them as well, you'll also need to add their parameters. Note that it is NOT necessary to use --privileged. In fact, due to the network namespaces in use in Docker, this command does not let the container modify the host's networking (unless you use --net=host, which I do not recommend). The --sysctl parameter was the result of a lot of banging my head against the wall. Apparently Docker tries to disable IPv6 in the container by default. Annoying.
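Putting that together, a full invocation might look something like this (the container and image names here are made up for illustration; add whatever parameters your base image needs):

$ docker run -td --name yggrouter \
    --cap-add=NET_ADMIN \
    --sysctl net.ipv6.conf.all.disable_ipv6=0 \
    --device=/dev/net/tun:/dev/net/tun \
    example/yggdrasil-router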

Configuration of the router container(s)
The idea is that the router node (or more than one, if you want redundancy) will be the only one to have an open incoming port. Although the normal Yggdrasil case of directly detecting peers in a broadcast domain is more convenient and more robust, this can work pretty well too. You can, of course, generate a template yggdrasil.conf with yggdrasil -genconf like usual. Some things to note for this one:
  • You'll want to change Listen to something like Listen: ["tls://[::]:12345"], where 12345 is the port number you'll be listening on (see the sketch after this list).
  • You'll want to disable MulticastInterfaces entirely by just setting it to [], since it doesn't work anyway.
  • If you expose the port to the Internet, you'll certainly want to firewall it to only authorized peers. Setting AllowedPublicKeys is another useful step.
  • If you have more than one router container on a host, each of them will both Listen and act as a client to the others. See below.
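For reference, a minimal sketch of the relevant parts of a router container's yggdrasil.conf, using the port from the notes above (the AllowedPublicKeys value is a placeholder):

{
  # Accept incoming peerings from the other containers on this port
  Listen: ["tls://[::]:12345"]
  # Multicast discovery doesn't work under Docker, so disable it
  MulticastInterfaces: []
  # Only accept peers presenting these public keys (placeholder value shown)
  AllowedPublicKeys: ["0000000000000000000000000000000000000000000000000000000000000000"]
}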

Configuration of the non-router nodes
Again, you can start with a simple configuration. Some notes here:
  • You'll want to set Peers to something like Peers: ["tls://routernode:12345"], where routernode is the Docker hostname of the router container, and 12345 is its port number as defined above. If you have more than one local router container, you can simply list them all here; Yggdrasil will then fail over nicely if any one of them goes down. (See the sketch after this list.)
  • Listen should be empty.
  • As above, MulticastInterfaces should be empty.
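And the corresponding sketch for a non-router node's yggdrasil.conf, under the same assumptions:

{
  # Connect out to the router container(s) by Docker hostname
  Peers: ["tls://routernode:12345"]
  # No incoming connections and no multicast discovery on these nodes
  Listen: []
  MulticastInterfaces: []
}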

Using the interfaces
At this point, you should be able to ping6 between your containers. If you have multiple hosts running Docker, you can simply set up the router nodes on each to connect to each other. Now you have direct, secure, container-to-container communication that is host-agnostic! You can also set up Yggdrasil on a bare metal server or VM using standard procedures and everything will just talk nicely!

Security notes
Yggdrasil's mesh is aggressively greedy. It will peer with any node it can find (unless told otherwise) and will find a route to anywhere it can. There are two main ways to make sure your internal comms stay private: by restricting who can talk to your mesh, and by firewalling the Yggdrasil interface. Both can be used, and they can be used simultaneously. By disabling multicast discovery, you eliminate the chance for random machines on the LAN to join the mesh. By making sure that you firewall off (outside of Yggdrasil) who can connect to a Yggdrasil node with a listening port, you can authorize only your own machines. And, by setting AllowedPublicKeys on the nodes with listening ports, you can authenticate the Yggdrasil peers. Note that part of the benefit of the Yggdrasil mesh is normally that you don't have to propagate a configuration change to every participatory node; that's a nice thing in general! You can also run a firewall inside your container (I like firehol for this purpose) and aggressively firewall the IPs that are allowed to connect via the Yggdrasil interface. I like to set a stable interface name like ygg0 in yggdrasil.conf, and then it becomes pretty easy to firewall the services. The Docker parameters that allow Yggdrasil to run are also sufficient to run firehol.
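A rough sketch of that interface-level firewalling, using plain ip6tables rather than firehol (the interface name matches the suggestion above; the allowed source address is a placeholder for one of your own nodes' Yggdrasil IPs):

# In yggdrasil.conf, pin the interface name:
#   IfName: ygg0
# Then only allow known peers in over the Yggdrasil interface:
$ ip6tables -A INPUT -i ygg0 -s 200:1234:5678::1/128 -j ACCEPT
$ ip6tables -A INPUT -i ygg0 -j DROP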

Naming Yggdrasil peers
You probably don't want to hard-code Yggdrasil IPs all over the place. There are a few solutions:
  • You could run an internal DNS service
  • You can do a bit of scripting around Docker's --add-host option to add things to /etc/hosts (see the sketch after this list)
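For instance, something like this when starting a container (the names and address are placeholders; a node's Yggdrasil IPv6 address can be read with yggdrasilctl getSelf on that node):

$ docker run -td --name webfrontend \
    --add-host=dbserver:200:1234:5678::1 \
    example/my-app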

Other hints & conclusion Here are some other helpful use cases:
  • If you are migrating between hosts, you could leave your reverse proxy up at both hosts, both pointing to the target containers over Yggdrasil. The targets will be automatically found from both sides of the migration while you wait for DNS caches to update and such.
  • This can make services integrate with local networks a lot more painlessly than they might otherwise.
This is just an idea. The point of Yggdrasil is expanding our ideas of what we can do with a network, so here's one such expansion. Have fun!
Note: This post also has a permanent home on my website, where it may be periodically updated.

29 May 2022

John Goerzen: Fast, Ordered Unixy Queues over NNCP and Syncthing with Filespooler

It seems that lately I've written several shell implementations of a simple queue that enforces ordered execution of jobs that may arrive out of order. After writing this for the nth time in bash, I decided it was time to do it properly. But first, a word on the why of it all.

Why did I bother?
My needs arose primarily from handling Backups over Asynchronous Communication methods, in this case NNCP. When backups contain incrementals that are unpacked on the destination, they must be applied in the correct order. In some cases, like ZFS, the receiving side will detect an out-of-order backup file and exit with an error. In those cases, processing in random order is acceptable, but can be slow if, say, hundreds or thousands of hourly backups have stacked up over a period of time. The same goes for using gitsync-nncp to synchronize git repositories. In both cases, a best effort based on creation date is sufficient to produce a significant performance improvement. With other cases, such as tar or dar backups, the receiving side cannot detect out-of-order incrementals. In those situations, the incrementals absolutely must be applied with strict ordering. There are many other situations that arise with these needs also. Filespooler is the answer to these.

Existing Work
Before writing my own program, I of course looked at what was out there already. I looked at celery, gearman, nq, rq, cctools work queue, ts/tsp (task spooler), filequeue, dramatiq, GNU parallel, and so forth. Unfortunately, none of these met my needs at all. They all tended to have properties like:
  • An extremely complicated client/server system that was incompatible with piping data over existing asynchronous tools
  • A large bias to processing of small web requests, resulting in terrible inefficiency or outright incompatibility with jobs in the TB range
  • An inability to enforce strict ordering of jobs, especially if they arrive in a different order from how they were queued
Many also lacked some nice-to-haves that I implemented for Filespooler:
  • Support for the encryption and cryptographic authentication of jobs, including metadata
  • First-class support for arbitrary compressors
  • Ability to use both stream transports (pipes) and filesystem-like transports (eg, rclone mount, S3, Syncthing, or Dropbox)

Introducing Filespooler
Filespooler is a tool in the Unix tradition: that is, do one thing well, and integrate nicely with other tools using the fundamental Unix building blocks of files and pipes. Filespooler itself doesn't provide transport for jobs, but instead is designed to cooperate extremely easily with transports that can be written to as a filesystem or piped to, which is to say, almost anything of interest. Filespooler is written in Rust and has an extensive Filespooler Reference as well as many tutorials on its homepage.

Basics of How it Works
Filespooler is intentionally simple:
  • The sender maintains a sequence file that includes a number for the next job packet to be created.
  • The receiver also maintains a sequence file that includes a number for the next job to be processed.
  • fspl prepare creates a Filespooler job packet and emits it to stdout. It includes a small header (<100 bytes in most cases) that includes the sequence number, creation timestamp, and some other useful metadata.
  • You get to transport this job packet to the receiver in any of many simple ways, which may or may not involve Filespooler's assistance.
  • On the receiver, Filespooler (when running in the default strict ordering mode) will simply look at the sequence file and process jobs in incremental order until it runs out of jobs to process.
The name of job files on-disk matches a pattern for identification, but the content of them is not significant; only the header matters. You can send job data in three ways:
  1. By piping it to fspl prepare
  2. By setting certain environment variables when calling fspl prepare
  3. By passing additional command-line arguments to fspl prepare, which can optionally be passed to the processing command at the receiver.
Data piped in is added to the job "payload", while environment variables and command-line parameters are encoded in the header.

Basic usage
Here I will excerpt part of the Using Filespooler over Syncthing tutorial; consult it for further detail. As a bit of background, Syncthing is a FLOSS decentralized directory synchronization tool akin to Dropbox (but with a much richer feature set in many ways).

Preparation
First, on the sender, you create the queue (passing the directory name to -q):
sender$ fspl queue-init -q ~/sync/b64queue
Now, we can send a job like this:
sender$ echo Hi | fspl prepare -s ~/b64seq -i - | fspl queue-write -q ~/sync/b64queue
Let's break that down:
  • First, we pipe "Hi" to fspl prepare.
  • fspl prepare takes two parameters:
    • -s seqfile gives the path to a sequence file used on the sender side. This file has a simple number in it that increments a unique counter for every generated job file. It is matched with the nextseq file within the queue to make sure that the receiver processes jobs in the correct order. It MUST be separate from the file that is in the queue and should NOT be placed within the queue. There is no need to sync this file, and it would be ideal to not sync it.
    • The -i option tells fspl prepare to read a file for the packet payload. -i - tells it to read stdin for this purpose. So, the payload will consist of three bytes: Hi\n (that is, including the terminating newline that echo wrote)
  • Now, fspl prepare writes the packet to its stdout. We pipe that into fspl queue-write:
    • fspl queue-write reads stdin and writes it to a file in the queue directory in a safe manner. The file will ultimately match the fspl-*.fspl pattern and have a random string in the middle.
At this point, wait a few seconds (or however long it takes) for the queue files to be synced over to the recipient. On the receiver, we can see if any jobs have arrived yet:
receiver$ fspl queue-ls -q ~/sync/b64queue
ID                   creation timestamp          filename
1                    2022-05-16T20:29:32-05:00   fspl-7b85df4e-4df9-448d-9437-5a24b92904a4.fspl
Let's say we'd like some information about the job. Try this:
receiver$ fspl queue-info -q ~/sync/b64queue -j 1
FSPL_SEQ=1
FSPL_CTIME_SECS=1652940172
FSPL_CTIME_NANOS=94106744
FSPL_CTIME_RFC3339_UTC=2022-05-17T01:29:32Z
FSPL_CTIME_RFC3339_LOCAL=2022-05-16T20:29:32-05:00
FSPL_JOB_FILENAME=fspl-7b85df4e-4df9-448d-9437-5a24b92904a4.fspl
FSPL_JOB_QUEUEDIR=/home/jgoerzen/sync/b64queue
FSPL_JOB_FULLPATH=/home/jgoerzen/sync/b64queue/jobs/fspl-7b85df4e-4df9-448d-9437-5a24b92904a4.fspl
This information is intentionally emitted in a format convenient for parsing. Now let's run the job!
receiver$ fspl queue-process -q ~/sync/b64queue --allow-job-params base64
SGkK
There are two new parameters here:
  • --allow-job-params says that the sender is trusted to supply additional parameters for the command we will be running.
  • base64 is the name of the command that we will run for every job. It will:
    • Have environment variables set as we just saw in queue-info
    • Have the text we previously prepared ("Hi\n") piped to it
By default, fspl queue-process doesn't do anything special with the output; see Handling Filespooler Command Output for details on other options. So, the base64-encoded version of our string is "SGkK". We successfully sent a packet using Syncthing as a transport mechanism! At this point, if you do a fspl queue-ls again, you'll see the queue is empty. By default, fspl queue-process deletes jobs that have been successfully processed.
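To illustrate what a more realistic processing command might look like in place of base64, here is a minimal sketch of a hypothetical handler script. It relies only on the behavior described above: the payload arrives on stdin and the metadata arrives in FSPL_* environment variables.

#!/bin/sh
# process-job.sh (hypothetical): note which job we're handling, then store its payload
set -e
echo "Processing job $FSPL_SEQ from $FSPL_JOB_FILENAME" >&2
cat > "/var/spool/myapp/payload-$FSPL_SEQ"     # the job payload is on stdin

It could then presumably be run with fspl queue-process -q ~/sync/b64queue ./process-job.sh, with no --allow-job-params needed since the script takes no sender-supplied arguments.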

For more
See the Filespooler homepage.
This blog post is also available as a permanent, periodically-updated page.

3 March 2022

John Goerzen: Tools for Communicating Offline and in Difficult Circumstances

Note: this post is also available on my website, where it will be updated periodically. When things are difficult -- maybe there's been a disaster, or an invasion (this page is being written in 2022 just after Russia invaded Ukraine), or maybe you're just backpacking off the grid -- there are tools that can help you keep in touch, or move your data around. This page aims to survey some of them, roughly in order from easiest to more complex.

Simple radios
Handheld radios shouldn't be forgotten. They are cheap, small, and easy to operate. Their range isn't huge -- maybe a couple of miles in rural areas, much less in cities -- but they can be a useful place to start. They tend to have no actual encryption features (the "privacy" features really aren't). In the USA, options are FRS/GMRS and CB.

Syncthing With Syncthing, you can share files among your devices or with your friends. Syncthing essentially builds a private mesh for file sharing. Devices will auto-discover each other when on the same LAN or Wifi network, and opportunistically sync. I wrote more about offline uses of Syncthing, and its use with NNCP, in my blog post A simple, delay-tolerant, offline-capable mesh network with Syncthing (+ optional NNCP). Yes, it is a form of a Mesh Network! Homepage: https://syncthing.net/

Briar Briar is an instant messaging service based around Android. It's IM with a twist: it can use a mesh of Bluetooth devices. Or, if Internet access is available, Tor. It has even been extended to support the use of SD cards and USB sticks to carry your messages. Like some others here, it can relay messages for third parties as well. Homepage: https://briarproject.org/

Manyverse and Scuttlebutt Manyverse is a client for Scuttlebutt, which is a sort of asynchronous, offline-friendly social network. You can use it to keep in touch with your family and friends, and it supports syncing over Bluetooth and Wifi even in the absence of Internet. Homepages: https://www.manyver.se/ and https://scuttlebutt.nz/

Yggdrasil Yggdrasil is a self-healing, fully end-to-end Encrypted Mesh Network. It can work among local devices or on the global Internet. It has network services that can egress onto things like Tor, I2P, and the public Internet. Yggdrasil makes a perfect companion to ad-hoc wifi as it has auto peer discovery on the local network. I talked about it in more detail in my blog post Make the Internet Yours Again With an Instant Mesh Network. Homepage: https://yggdrasil-network.github.io/

Ad-Hoc Wifi Few people know about ad-hoc wifi mode. Ad-hoc wifi lets devices in range talk to each other without an access point. You just set all your devices to the same network name and password and there you go. However, there often isn't DHCP, so IP configuration can be a bit of a challenge. Yggdrasil helps here.
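As an illustration only (interface name, network name, channel, and addresses are all assumptions), joining an open ad-hoc/IBSS network from Linux looks roughly like this; encrypted IBSS needs wpa_supplicant and is fiddlier:
# Run as root. No DHCP is assumed, so either assign addresses by hand
# or let Yggdrasil handle addressing on top of the link.
ip link set wlan0 down
iw dev wlan0 set type ibss
ip link set wlan0 up
iw dev wlan0 ibss join offgridnet 2437    # 2437 MHz = channel 6
ip addr add 10.99.0.1/24 dev wlan0        # pick a unique address per device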

NNCP Moving now to more advanced tools, NNCP lets you assemble a network of peers that can use asynchronous communication over sneakernet, USB drives, radios, CD-Rs, Internet, Tor, NNCP over Yggdrasil, Syncthing, Dropbox, S3, you name it. NNCP supports multi-hop file transfer and remote execution. It is fully end-to-end encrypted. Think of it as the offline version of ssh. Homepage: https://nncp.mirrors.quux.org/

Meshtastic Meshtastic uses long-range, low-power LoRa radios to build a long-distance, encrypted, instant messaging system that is a Mesh Network. It requires specialized hardware, about $30, but will tend to get much better range than simple radios, and with very little power. Homepages: https://meshtastic.org/ and https://meshtastic.letstalkthis.com/

Portable Satellite Communicators You can get portable satellite communicators that can send SMS from anywhere on earth with a clear view of the sky. The Garmin InReach mini and Zoleo are two credible options. Subscriptions range from about $10 to $40 per month depending on usage. They also have global SOS features.

Telephone Lines If you have a phone line and a modem, UUCP can get through just about anything. It's an older protocol that lacks modern security, but it deals well with slow and noisy serial lines. XBee SX radios also have a serial mode that can work well with UUCP.

Additional Suggestions It is probably useful to have a Linux live USB stick with whatever software you want to use handy. Debian can be installed from the live environment, or you could use a security-focused distribution such as Tails or Qubes.

References This page originated in my Mastodon thread and incorporates some suggestions I received there. It also formed a post on my blog.

25 August 2021

John Goerzen: Excellent Experience with Debian Bullseye

I've appreciated the bullseye upgrade, like most Debian upgrades. I'm not quite sure how, since I was already running a backports kernel, but somehow the entire system is snappier. Maybe newer X or something? I'm really pleased with it. Hardware integration is even nicer now, particularly the automatic driverless support for scanners in addition to the existing support for printers. All in all, a very nice upgrade, and pretty painless. I experienced a few odd situations. For one, I had been using Gnome Flashback. Since xmonad-log-applet didn't compile there (due to bitrot in the log applet, not in Flashback), and I had been finding Gnome Flashback to be a rather dusty and forgotten corner of Gnome for a long time, I decided to try Mate. Mate just seemed utterly unable to handle a situation with a laptop and an external monitor very well. I want to use only the external monitor when the laptop lid is closed, and it just couldn't remember how to do the right thing: external monitor on, laptop monitor off, laptop not put into suspend. gdm3 also didn't seem to be able to put the external monitor to sleep, causing a few nights of wasted power. So off I went to XFCE, which I had been using for years on my workstation anyhow. There are lots more settings available in XFCE, plus things Just Worked there. Odd that XFCE, the thin and light DE, is now the one that has the most relevant settings. It seems the Gnome "let's remove a bunch of features" approach has extended to MATE as well. When I switched to XFCE, I also removed gdm3 from my system, leaving lightdm as the only DM on it. That matched what my desktop machine was using, and also what task-xfce-desktop called for. But strangely, the XFCE settings for lightdm were completely different between the laptop and the desktop. It turns out that with lightdm, you can have lightdm-gtk-greeter with the accompanying lightdm-gtk-greeter-settings, or slick-greeter with the accompanying lightdm-settings. One machine had one greeter and settings, and the other had the other. Why, I don't know. But lightdm-gtk-greeter-settings had the necessary options for putting monitors to sleep on the login screen, so I went with it. This does highlight a bit of a weakness in Debian upgrades. There is SO MUCH choice in Debian, which I highly value. At some point, almost certainly without my conscious choice, one machine got one greeter and another got the other. Despite both having task-xfce-desktop installed, they got different desktop experiences. There isn't a great way to say "OK, I know I had a bunch of things installed before, but NOW I want the default bullseye experience." But overall, it is an absolutely fantastic distribution. It is great to see this nonprofit community distribution continue to have such quality on such an immense scale. And it's hard to believe I've been a Debian developer for 25 years. That seems almost impossible!
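Incidentally, switching greeters like this is just a package swap plus one lightdm configuration line. A hypothetical sketch (verify the package names and the exact option on your own system):
# Install the GTK greeter and its settings tool, then point lightdm at it.
sudo apt install lightdm-gtk-greeter lightdm-gtk-greeter-settings
sudo sed -i 's/^#\?greeter-session=.*/greeter-session=lightdm-gtk-greeter/' \
  /etc/lightdm/lightdm.conf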

18 August 2021

John Goerzen: Distributed, Asynchronous Git Syncing with NNCP

I have a problem. I have a directory that I use with org-mode and org-roam. I want it to be synced across multiple machines. I also want to keep the history with git. And, I want to use end-to-end encryption (no storing a plain git repo on a remote server), have a serverless setup, not require any two machines to be up simultaneously, and be resilient in the face of races and conflicts. Whew. I've tried a number of setups: git-remote-gcrypt on a remote server (fragile), some complicated scripts around a separate repo in Syncthing (requires one machine to be "in charge"), etc. They all were subpar. Then NNCP introduced asynchronous multicast and I was intrigued. So I wrote gitsync-nncp, which uses NNCP to distribute git bundles to all the participating machines. The comprehensive documentation for gitsync-nncp goes into a lot more detail about how it works and what problems it solves. It's working quite well for me!
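gitsync-nncp handles all of this for you, but the core idea can be sketched in a few commands. The node name, refs, and paths here are hypothetical, and the real tool uses NNCP's asynchronous multicast to reach every participating machine rather than a single peer:
# Sender: bundle everything since the last sync and queue it for the peer.
# (A first run with no lastsync ref would bundle the full history instead.)
git bundle create /tmp/update.bundle lastsync..main
nncp-file /tmp/update.bundle peer:update.bundle
git tag -f lastsync main
# Receiver: after nncp-toss delivers the file, pull from the bundle.
git pull /path/to/nncp/incoming/update.bundle main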

30 December 2020

John Goerzen: Airgapped / Asynchronous Backups with ZFS over NNCP

In my previous articles in the series on asynchronous communication with the modern NNCP tool, I talked about its use for asynchronous, potentially airgapped, backups. The first article, How & Why To Use Airgapped Backups, laid out the foundations for this. Now let's dig into the details. Today's post will cover ZFS, because it has a lot of features that make it very easy to support in this setup. Non-ZFS backups will be covered later. The setup is actually about as simple as it is for SSH, but since people are less familiar with this kind of communication, I'm going to try to go into more detail here.

Assumptions I am assuming a setup where:

Hardware Let's start with hardware for the machine to hold the backups. I initially considered a Raspberry Pi 4 with 8GB of RAM. That would probably have been a suitable machine, at least for smaller backup sets. However, none of the Raspberry Pi machines support hardware AES encryption acceleration, and my Pi 4 benchmarks at about 60MB/s for AES encryption. I want my backups to be encrypted, and decided this would just be too slow for my purposes. Again, if you don't need encrypted backups or don't care that much about performance (many people probably fall into this category), you can have a fully-functional Raspberry Pi 4 system for under $100 that would make a fantastic backup server. I wound up purchasing a Qotom-Q355G4 micro PC with a Core i5 for about $315. It has USB 3 ports and is designed as a rugged, long-lasting system. I have been using one of their older Celeron-based models as my router/firewall for a number of years now and it's been quite reliable. For backup storage, you can get a USB 3 external drive. My own preference is to get a USB 3 "toaster" (a device that lets me plug in SATA drives) so that I have more control over the underlying medium and can save the expense and hassle of a bunch of power supplies. In a future post, I will discuss drive rotation so you always have an offline drive. Then there is the question of transport to the backup machine. A simple solution would be to have a heavily-firewalled backup system that has no incoming ports open but makes occasional outgoing connections to one specific NNCP daemon on the spooling machine. However, for airgapped operation, it would also be very simple to use nncp-xfer to transport the data across on a USB stick or some such. You could set up automounting for a specific USB stick: plug it in, and all the spooled data is moved over; then plug it in to the backup system and it's processed, and any outbound email traffic or whatever is copied to the USB stick at that point too. The NNCP page has some more commentary about this kind of setup. Both are fairly easy to set up, and NNCP is designed to be transport-agnostic, so in this article I'm going to focus on how to integrate ZFS with NNCP.

Operating System Of course, it should be no surprise that I set this up on Debian. As an added step, I did all the configuration in Ansible stored in a local git repo. This adds a lot of work, but it means that it is trivial to periodically wipe and reinstall if any security issue is suspected. The git repo can be copied off to another system for storage and takes the system from freshly-installed to ready-to-use state.

Security There is, of course, nothing preventing you from running NNCP as root. The zfs commands, obviously, need to be run as root. However, from a privilege separation standpoint, I have chosen to run everything relating to NNCP as an nncp user.
NNCP already does encryption, but if you prefer to have zero knowledge of the data even to NNCP, it's trivial to add gpg to the pipeline as well, and in fact I'll be demonstrating that in a future post for other reasons.

Software Besides NNCP, there needs to be a system that generates the zfs send streams. For this project, I looked at quite a few. Most were designed to inspect the list of snapshots on a remote end, compare it to a list on the local end, and calculate a difference from there. This, of course, won't work for this situation. I realized my own simplesnap project was very close to being able to do this. It already used an algorithm of using specially-named snapshots on the machine being backed up, so it never needed any communication about what snapshots were present where. All it needed was a few more options to permit sending to a stream instead of zfs receive. I made those changes and they are available in simplesnap 2.0.0 or above. That version has also been uploaded to sid, and will work fine as-is on buster as well.

Preparing NNCP I'm going to assume three hosts in this setup: The basic NNCP workflow documentation covers the basic steps. You'll need to run nncp-cfgnew on each machine. This generates a basic configuration, along with public and private keys for that machine. You'll copy the public key sets to the configurations of the other machines as usual. On the laptop, you'll add a via line like this:
backupsvr: {
  id: ....
  exchpub: ...
  signpub: ...
  noisepub: ...
  via: ["spooler"]
}
This tells NNCP that data destined for backupsvr should always be sent via spooler first. You can then arrange for the nncp-daemon to run on the spooler, and nncp-caller or nncp-call on the backupsvr. Or, alternatively, run airgapped between the two with nncp-xfer (a sketch follows the dataset setup below).

Generating Backup Data Now, on the laptop, install simplesnap (2.0.0 or above). Although you won't be backing up to the local system, simplesnap still maintains a hostlock in ZFS. Prepare a dataset for it:
zfs create tank/simplesnap
zfs set org.complete.simplesnap:exclude=on tank/simplesnap
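For the airgapped alternative mentioned above, the USB-stick shuffle is just nncp-xfer on each side. A rough sketch only; the mount point and the -node restriction are assumptions, so check the NNCP documentation for the exact flags:
# On the spooler: write any packets destined for backupsvr onto the stick.
nncp-xfer -tx -node backupsvr /mnt/usbstick
# On the backup server: ingest packets from the stick, then process them.
nncp-xfer -rx /mnt/usbstick
nncp-toss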
Then, create a script /usr/local/bin/runsimplesnap like this:
#!/bin/bash
set -e
simplesnap --store tank/simplesnap --setname backups --local --host "$(hostname)" \
   --receivecmd /usr/local/bin/simplesnap-queue \
   --noreap
su nncp -c '/usr/local/nncp/bin/nncp-toss -noprogress -quiet'
if ip addr | grep -q 192.168.65.64; then
  su nncp -c '/usr/local/nncp/bin/nncp-call -noprogress -quiet -onlinedeadline 1 spooler'
fi
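A script like this would typically be run periodically from cron or a systemd timer; a hypothetical root crontab entry (the schedule is arbitrary):
# Run the backup pipeline nightly at 02:30.
30 2 * * * /usr/local/bin/runsimplesnap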
The call to simplesnap sets it up to send the data to simplesnap-queue, which we'll create in a moment. The receivecmd, plus noreap, sets it up to run without ZFS on the local system. The call to nncp-toss will process any previously-received inbound NNCP packets, if there are any. Then, in this example, we do a very basic check to see if we're on the LAN (checking for 192.168.65.64), and if so, we establish a connection to the spooler to transmit the data. Of course, you could also do this over the Internet, with Tor, or whatever, but in my case I don't want to do it automatically in case I'm tethered to mobile. I figure if I want to send backups in that case, I can fire up nncp-call myself. You can also use nncp-caller to set up automated connections on other schedules; there are a lot of options. Now, here's what /usr/local/bin/simplesnap-queue looks like:
#!/bin/bash
set -e
set -o pipefail
DEST=" echo $1   sed 's,^tank/simplesnap/,,' "
echo "Processing $DEST" >&2
# stdin piped to this
su nncp -c "/usr/local/nncp/bin/nncp-exec -nice B -noprogress backupsvr zfsreceive '$DEST'" >&2
echo "Queued for $DEST" >&2
This is a pretty simple script. simplesnap will call it with a path based on the store, with the hostname after; so, for instance, tank/simplesnap/laptop/root or some such. This script strips off the leading tank/simplesnap (which is a local fragment), leaving the host and dataset paths. Then it just pipes the stream to nncp-exec. -nice B classifies it as low-priority bulk data (so if you have some more important interactive data, it would be sent first), and then NNCP passes it to whatever the backupsvr defines as zfsreceive.

Receiving ZFS backups In the NNCP configuration on the recipient's side, in the laptop section, we define what command it's allowed to run as zfsreceive:
      exec: {
        zfsreceive: ["/usr/bin/sudo", "-H", "/usr/local/bin/nncp-zfs-receive"]
      }
We authorize the nncp user to run this under sudo in /etc/sudoers.d/local nncp:
Defaults env_keep += "NNCP_SENDER"
nncp ALL=(root) NOPASSWD: /usr/local/bin/nncp-zfs-receive
The NNCP_SENDER is the public key ID of the sending node when nncp-toss processes the incoming data. We can use that for sanity checking later. Now, here's a basic nncp-zfs-receive script:
#!/bin/bash
set -e
set -o pipefail
STORE=backups/simplesnap
DEST="$1"
# now process stdin
zfs receive -o readonly=on -x mountpoint "$STORE/$DEST"
And there you have it; all the basics are in place. Update 2020-12-30: An earlier version of this article had zfs receive -F instead of zfs receive -o readonly=on -x mountpoint. These changed arguments are more robust.
Update 2021-01-04: I am now recommending zfs receive -u -o readonly=on; see my successor article for more.

Enhancements You could enhance the nncp-zfs-receive script to improve logging and error handling. For instance:
#!/bin/bash
set -e
set -o pipefail
STORE=backups/simplesnap
# $1 will be the host/dataset
DEST="$1"
HOST="$(echo "$1" | sed 's,/.*,,g')"
if [ -z "$HOST" ]; then
   echo "Malformed command line"
   exit 5
fi
# Log a message
logit () {
   logger -p info -t "$(basename "$0")[$$]" "$1"
}
# Log an error message
logerror () {
   logger -p err -t "$(basename "$0")[$$]" "$1"
}
# Log stdin with the given code.  Used normally to log stderr.
logstdin () {
   logger -p info -t "$(basename "$0")[$$/$1]"
}
# Run command, logging stderr and exit code
runcommand () {
   logit "Running $*"
   if "$@" 2> >(logstdin "$1") ; then
      logit "$1 exited successfully"
      return 0
   else
      RETVAL="$?"
      logerror "$1 exited with error $RETVAL"
      return "$RETVAL"
   fi
}
exiterror () {
   logerror "$1"
   echo "$1" 1>&2
   exit 10
}
# Sanity check
if [ "$HOST" = "laptop" ]; then
  if [ "$NNCP_SENDER" != "12345678" ]; then
    exiterror "Host $HOST doesn't match sender $NNCP_SENDER"
  fi
else
  exiterror "Unknown host $HOST"
fi
runcommand zfs receive -F "$STORE/$DEST"
Now you'll capture the ZFS receive output in syslog in a friendly way, so you can look back later to see why things failed, if they did.

Further notes on NNCP nncp-toss will examine the exit code from an invocation. If it is nonzero, it will keep the command (and associated stdin) in the queue and retry it on the next invocation. NNCP does not guarantee order of execution, so it is possible in some cases that ZFS streams may be received in the wrong order. That is fine here; zfs receive will exit with an error, and nncp-toss will just run it again after the dependent snapshots have been received. For non-ZFS backups, a simple sequence number can handle this issue.
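One way to do that (my own sketch, not part of NNCP): have the sender pass an incrementing number as an argument and have the receiving command refuse anything out of order, so the nonzero exit makes nncp-toss retry it later. Paths and the argument convention here are assumptions:
#!/bin/bash
# Reject out-of-order streams; a nonzero exit makes nncp-toss keep the
# job in the queue and retry it on a later invocation.
# Assumes the sender passes an incrementing sequence number as $1 and
# that the nextseq file has been initialized (e.g. to 1).
set -e
SEQFILE=/var/lib/backups/nextseq
EXPECTED="$(cat "$SEQFILE")"
if [ "$1" != "$EXPECTED" ]; then
   echo "Got sequence $1, expected $EXPECTED; deferring" >&2
   exit 1
fi
# Process stdin however your non-ZFS backup needs; here we just store it.
cat > "/var/lib/backups/stream.$1"
echo "$((EXPECTED + 1))" > "$SEQFILE"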

16 December 2020

John Goerzen: How To Join the Fediverse and Cast Off the Attention Economy

In a recent post, I wrote about how the attention economy in use at big social networks hurts you. In this post, I m going to suggest what to do about it. Mastodon and the Fediverse When you use email, you can send a message from an account at Google to one at Yahoo, Microsoft, or any of millions of businesses and organizations running their own mail server. Unlike, say, Facebook, email isn t a single service, but rather a whole bunch of independent systems that can communicate (or federate) with each other. The Fediverse is similar, and the most advanced Fediverse client is Mastodon. Mastodon: It s easy to get started! Head over to joinmastodon.org and click Get Started . Pick a community don t worry, this isn t a hugely consequential decision, as you can always move or change later. You can browse activity from across the Fediverse, or just on your local community, so if you find a community with similar interests, it can be a neat way to find others to follow. If you re looking for more details, mastodon.help has a nice guide. Defeating the Attention Economy So, why does Mastodon make a difference? First of all, you get to pick your host (and even software). With Twitter, you pretty much are using Twitter (yes, I know of things like Hootsuite, but for the vast majority of people, it s twitter.com only). With Mastodon, you have choice. Pick the host that runs the software and has the kind of moderation you like. Secondly, Mastodon is not for profit. There is no money to be made in keeping you on the site. Almost all Mastodon instances are ad-free. And Mastodon s completely open protocols make it easy to go elsewhere if you like. It s Not Just Mastodon! There are plenty of other programs in the Fediverse. And, this is really key, they all interact with each other. You can share photos in Pixelfed (sort of like a federated Instagram) and see them and comment in Mastodon! Some things to point out: And there are many others. This blog, for instance, runs WordPress and uses an ActivityPub connector; comments from the Fediverse integrate here. Find me in the Fediverse You can look me up: just type in @jgoerzen@floss.social in the search box of any Mastodon instance and click Follow. You can also follow this blog at @jgoerzen@changelog.complete.org.

14 August 2020

John Goerzen: In Which COVID-19 Misinformation Leads To A Bunch of Graphs Made With Rust

A funny (and by funny, I mean sad) thing has happened. Recently the Kansas Department of Health and Environment (KDHE) has been analyzing data from the patchwork implementation of mask requirements in Kansas. They came to a conclusion that shouldn't be surprising to anyone: masks help. They published a chart showing this. A right-wing propaganda publication got ahold of this and claimed the numbers were doctored because there were two different Y-axes. I set about to analyze the data myself from public sources, and produced graphs of various kinds using a single Y-axis, supporting the idea that the graphs were not, in fact, doctored. Here's one graph that shows that:
In order to do that, I had imported COVID-19 data from various public sources. Many states in the US are large enough to have significant variation in COVID-19 conditions, and many of the sources people look at don't show county-level data over time. I wanted to do that. Eventually, I wrote covid19db, which ingests data from a number of public sources and generates a SQLite database file. Using GitHub Actions, this file is automatically updated every morning and available for download. Or, you can download the code and generate a database yourself locally. Then, I wrote covid19ks, which generates various pretty graphs covering the data. These graphs, incidentally, turn out to highlight just how poorly the United States is doing compared to the rest of the industrialized world. I hope that these resources, and especially covid19db, might be useful to others who would like to analyze the data. The code isn't the prettiest since it was done in a hurry, but I think that functionally it is useful.

6 November 2017

John Goerzen: The Yellow House Phone Company (Featuring Asterisk and an 11-year-old)

"Well Jacob, do you think we should set up our own pretend phone company in the house?" "We can DO THAT?" "Yes!" "Then yes. Yes! YES YES YESYESYESYES YES! Let's do it, dad!" Not long ago, my parents had dug up the old phone I used back in the day. We still have a landline, and Jacob was having fun discovering how an analog phone works. I told him about the special number he could call to get the time and temperature read out to him. He discovered what happens if you call your own number and hang up. He figured out how to play Mary Had a Little Lamb using touchtone keys (after a slightly concerned lecture from me setting out some rules to make sure his musical dialing wouldn't result in any, well, dialing). He was hooked. So I thought that taking it to the next level would be a good thing for a rainy day. I have run Asterisk before, though I had unfortunately gotten rid of most of my equipment some time back. But I found a great deal on a Cisco 186 ATA (Analog Telephone Adapter). It has two FXS lines (FXS ports simulate the phone company, and provide dialtone and ring voltage to a connected phone), and of course hooks up to the LAN. We plugged that in, and Jacob was amazed to see its web interface come up. I had to figure out how to configure it (unfortunately, it uses SCCP rather than SIP, and figuring out Asterisk's chan_skinny took some doing, but we got there). I set up voicemail. He loved it. He promptly figured out how to record his own greetings. We set up a second phone on the other line, so he could call between them. The cordless phones in our house support SIP, so I configured one of them as a third line. He spent a long time leaving himself messages. Pretty soon we both started having ideas. I set up extension 777, where he could call for the time. Then he wanted a way to get the weather forecast. Well, weather-util generates a text-based report. With it, a little sed and grep tweaking, the espeak TTS engine, and a little help from sox, I had a shell script worked up that would read back a forecast whenever he called a certain extension. He was super excited! "That's great, dad! Can it also read weather alerts too?" Sure! weather-util has a nice option just for that. Both boys cackled as the system tried to read out the NWS header (their timestamps like 201711031258 started with "two hundred one billion"). Then I found an online source for streaming NOAA Weather Radio feeds (Jacob enjoys listening to weather radio) and I set up another extension he could call to listen to that. More delight! But it really took off when I asked him, "Would you like to record your own menu?" "You mean those things where it says press 1 or 2 for this or that?" "Yes." "WE CAN DO THAT?" "Oh yes!" "YES, LET'S DO IT RIGHT NOW!" So he recorded a menu, then came and hovered by me while I hacked up extensions.conf, then eagerly went back to the phone to try it. Oh the excitement of hearing his own voice, and finding that it worked! Pretty soon he was designing sub-menus ("OK Dad, so we'll set it up so people can press 2 for the weather, and then choose if they want weather radio or the weather report. I'm recording that now. Got it?"). He has informed me that next Saturday we will build an intercom system "like we have at school." I'm going to have to have some ideas on how to tie Squeezebox in with Asterisk to make that happen, I think. Maybe this will do.
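For the curious, the forecast gadget is simple enough to sketch. Something along these lines turns weather-util output into an 8 kHz file Asterisk can play back; this is a sketch of the idea rather than the actual script, and the station ID, paths, and exact weather-util flags are assumptions:
#!/bin/bash
# Fetch a text forecast, clean it up a little, speak it with espeak,
# and resample with sox into a raw u-law file Asterisk can play.
set -e
weather -f KEWK | sed 's/\.\.\./. /g' > /tmp/forecast.txt
espeak --stdout -f /tmp/forecast.txt | \
  sox -t wav - -r 8000 -c 1 -t ul /var/lib/asterisk/sounds/custom/forecast.ulaw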

22 August 2017

John Goerzen: The Eclipse

Highway US-81 in northern Kansas and southern Nebraska is normally a pleasant, sleepy sort of drive. It was upgraded to a 4-lane road not too long ago, but as far as 4-lane roads go, its traffic is typically light. For drives from Kansas to South Dakota, it makes a pleasant route. Yesterday was eclipse day. I strongly suspect that highway 81 had more traffic that day than it ever has before, or ever will again. For nearly the entire 3-hour drive to Geneva, NE, it was packed though mostly still moving at a good speed. And for our entire drive back, highway 81 and every other southbound road we used was so full it felt like rush hour in Dallas. (Well, not quite. Traffic was still moving.) I believe scenes like this were played out across the continent. I ve been taking a lot of photos, and writing about our new baby Martha lately. Now it s time to write a bit about some more adventures with Jacob and Oliver they re now in third and fifth grades in school. We had been planning to fly, and airports I called were either full, or were planning to park planes in the grass, or even shut down some runways to use for parking. The airport in the little town of Beatrice, NE (which I had visited twice before) was even going to have a temporary FAA control tower. At the last minute, due to some storm activity near home at departure time, we unloaded the plane and drove instead. The atmosphere at the fairgrounds in Geneva was festive. One family had brought bubbles for their kids and extras to share. IMG_20170821_113229 I had bought the boys a book about the eclipse, which they were reading before and during the event. They were both great, safe users of their eclipse glasses. IMG_20170821_124809 Jacob caught a toad, and played with it for awhile. He wanted to bring it home with us, but I convinced him to let me take a picture of him with his toad friend instead. IMG_20170821_124553 While we were waiting for totality, a number of buses from the local school district arrived. So by the time the big moment arrived, we could hear the distant roar of delight and applause from the school children gathered at the far end of the field, plus all the excitement nearby. Both boys were absolutely ecstatic to be witnessing it (and so was I!) Wow! Awesome! And simple cackles of delight were heard. On the drive home, they both kept talking about how amazing it was, and it was once in a lifetime. We enjoyed our eclipse neighbors the woman from San Antonio next to us, the surprise discovery of another family from just a few miles from us parked two cars down, even running into relatives at a restaurant on the way home. The applause from all around when it started and when it ended. And the feeling, which is hard to describe, of awe and amazement at the wonders of our world and our universe. There are many problems with the world right now, but somehow there s something right about people coming together from all over to enjoy it.

10 August 2017

John Goerzen: A new baby and deep smiles

IMG_2059 A month ago, we were waiting for our new baby; time seemed to stand still. Now she is here! Martha Goerzen was born recently, and she is doing well and growing! Laura and I have enjoyed moments of cuddling her, watching her stare at our faces, hearing her (hopefully) soft sounds as she falls asleep in our arms. It is also heart-warming to see Martha s older brothers take such an interest in her. Here is the first time Jacob got to hold her: IMG_1846 Oliver, who is a boy very much into sports, play involving police and firefighters, and such, has started adding aww and she s so cute! to his common vocabulary. He can be very insistent about interrupting me to hold her, too.

9 June 2017

John Goerzen: Fixing the Problems with Docker Images

I recently wrote about the challenges in securing Docker container contents, and in particular with keeping up-to-date with security patches from all over the Internet. Today I want to fix that. Besides security, there is a second problem: the common way of running things in Docker pretends to provide a traditional POSIX API and environment, but really doesn't. This is a big deal. Before diving into that, I want to explain something: I have often heard it said that Docker provides single-process containers. This is unambiguously false in almost every case. Any time you have a shell script inside Docker that calls cp or even ls, you are running a second process. Web servers from Apache to whatever else use processes or threads of various types to service multiple connections at once. Many Docker containers are single-application, but a process is a core part of the POSIX API, and very little software would work if it were limited to a single process. So this is my little plea for more precise language. OK, soapbox mode off. Now then, in a traditional Linux environment, besides your application, there are other key components of the system. These are usually missing in Docker containers. So today, I will fix this also. In my docker-debian-base images, I have prepared a system that still has only 11MB RAM overhead, makes minimal changes on top of Debian, and yet provides a very complete environment and API. Here's what you get: The above goes into my minimal image. Additional images add layers on top of it, and here are some of the features they add: All of the above, including the optional features, has an 11MB overhead on start. Not bad for so much, right? From here, you can layer on top all your usual Dockery things. You can still run one application per container. But you can now make sure your disk doesn't fill up from logs, run your database vacuuming commands at will, have your blog download its RSS feeds every few minutes, etc., all from within the container, as it should be. Furthermore, you don't have to reinvent the wheel, because Debian already ships with things to take care of a lot of this out of the box, and now those tools will just work. There is some popular work done in this area already by phusion's baseimage-docker. However, I made my own for these reasons: Finally, a word on the choice to use sysvinit. It would have been simpler to use systemd here, since it is the default in Debian these days. Unfortunately, systemd requires you to poke some holes in the Docker security model, as well as mount a cgroups filesystem from the host. I didn't consider this acceptable, and sysvinit ran without these workarounds, so I went with it. With all this, Docker becomes a viable replacement for KVM for various services on my internal networks. I'll be writing about that later.
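As a quick illustration of what using one of these images looks like (the image and tag names here are from memory and may not be exact, so check the docker-debian-base documentation):
# Start a container from the standard image; init, cron, syslogd and
# friends come up inside it, so it behaves like a small Debian system.
docker run -td --name myservice jgoerzen/debian-base-standard:buster
docker exec -it myservice /bin/bash   # inspect logs, cron jobs, etc. from inside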

6 June 2017

John Goerzen: Family Spring: A Story in Photos

This has been a spring with times to relax, times to be busy, times of anticipation of a new baby, and times of enjoying our family. Rather than write a lot of words about it, I m telling the story in photos. To view, click here, then click Show Info in the upper right to see captions. You can pause it with the button in the lower left, and use arrow keys to advance. Alternatively, there s a captionless slideshow available here. Here s one photo to get you started: Happy about the little sister on the way

5 June 2017

John Goerzen: Flying with my brothers

Picture one Sunday morning. Three guys are seemingly-randomly walking into a Mennonite church in rural Nebraska. One with long hair and well-maintained clothes from the 70s. Another dressed well enough to be preaching. And the third simply dressed to be comfortable, with short hair showing evidence of having worn a headset for a couple of hours that morning. This was the scene as we made a spur-of-the-moment visit to that church which resulted in quite some surprise all around, since my brother knew a number of people there. For instance:
Pastor: Peter! What are you doing here? Peter: [jokingly] Is that how you greet visitors here?
And then, of course, Peter would say, Well, we were flying home from South Dakota and figured we d stop in at Beatrice for fuel. And drop in on you. Followed by some surprise that we would stop at their little airport (which is quite a nice one). This all happened because it was windy. This is the fun adventure of aviation. Sometimes you plan to go to Texas, but the weather there is terrible, so you discover a 100-year-old landmark in Indiana instead. Or sometimes, like a couple of weeks ago, we planned to fly straight home but spent a few hours exploring rural Nebraska. The three of us flew to Sioux Falls, SD, in a little Cessna to visit my uncle and aunt up there. On our flight up, we stopped at the little airport in Seward, NE. It was complete with this unique elevated deck. In my imagination, this is used for people to drink beer while watching the planes land. IMG_20170512_113323 In South Dakota, we had a weekend full of card and board games, horseshoes, and Crokinole with my uncle and aunt, who are always fun to visit. We had many memories of visits up there as children and the pleasant enjoyment of the fact that we didn t need an 8-hour drive to get there. We flew back with a huge bag of large rhubarb from their garden (that too is something of a tradition!) It was a fun weekend to spend with my brothers first time we d been able to do this in a long while. And it marked the 11th state I ve flown into, and over 17,000 miles of flying.

22 December 2016

John Goerzen: Singing with Kids

For four years now, we ve had a tradition: I go up to the attic one night, make a lot of noise, and pretend to be Santa. The boys don t think Santa is real, but they get a huge kick out of this anyway. The other day, this wound up with me singing a duet with my 7-year-old Oliver, and seeing a hugely delighted 10-year-old Jacob. All last week, the boys had been lobbying for me to be Santa . They aren t going to be able to be here on Christmas day this year, so I thought why not let them have some fun. I chose one present to give them early too. So, Saturday night, I said they could get ready for Santa. They found some cookies somewhere, got out some milk. And Oliver wrote this wonderful note to Santa : IMG_20161217_204244_cropped That is a note I m going to keep for a long time. He helpfully drew arrows pointing to the milk, cookies, and even the pen. He even started Santa s reply at the bottom! So, Saturday night, I snuck up to the attic, pretended to be Santa, and ate some cookies, drank some milk, and wrote Oliver a note. And I left a present. Jacob has been really getting into music lately, and Laura suggested I find something for the boys. I went looking for something that could record also, and came up with what has got to be a kid s dream: a karaoke machine. The particular one I found came with two microphones, a CD player, audio recording onto SD card (though it s a little dodgy), and a screen for showing words on any music that s karaoke-enhanced. Cue gasps of awe and excitement from the boys when we came down in our PJs and sweats at 6:45 Sunday morning to check it out. IMG_8895 Jacob excitedly began exploring all the knobs and options on it (they were particularly fond of the echo feature), while Oliver wanted to sing. So we found one of his favorite Christmas songs, and here he is singing it with me. IMG_8908 When you have a system with a line in, line out, and several microphone jacks, you can get creative. With a few bits of adapters from my attic, the headset I use for amateur radio worked with it perfectly. Add on a little mic extension cord, and pretty soon Oliver was pretending to be an announcer for a football game! IMG_8919 Then, Oliver decided he would act out a football game while Jacob and I were the announcers. Something tells me there will be much fun had with this over the next while! Just wait until I show them how to hook up a handheld radio to it in order to make a remotely-activated loudspeaker

9 December 2016

John Goerzen: Giant Concrete Arrows, Old Maps, and Fascinated Kids

Let me set a scene for you. Two children, ages 7 and 10, are jostling for position. There s a little pushing and shoving to get the best view. This is pretty typical for siblings this age. But what, you may wonder, are they trying to see? A TV? Video game? No. Jacob and Oliver were in a library, trying to see a 98-year-old map of the property owners in Township 23, range 1 East, Harvey County, Kansas. And they were super excited about it, somewhat to the astonishment of the research librarian, who I am sure is more used to children jostling for position over the DVDs in the youth section than poring over maps in the non-circulating historical archives! All this started with giant concrete arrows in the middle of nowhere. Nearly a century ago, the US government installed a series of arrows on the ground in Kansas. These were part of a primitive air navigation system that led to the first transcontinental airmail service. Every so often, people stumble upon these abandoned arrows and there is a big discussion online. Even Snopes has had to verify their authenticity (verdict: true). Entire websites exist to tracking and locating the remnants of these arrows. And as one of the early air mail routes went through Kansas, every so often people find these arrows around here. I got the idea that it would be fun to replicate a journey along the old routes. Maybe I d spot a few old arrows and such. So I started collecting old maps: a Contract Airmail Route #34 (CAM 34) map from 1927, aviation sectionals from 1933 and 1946, etc. I noticed an odd thing on these maps: the Newton, KS airport was on the other side of the city from its present location, sometimes even several miles outside the city. What was going on? 1927 Airway Map
(1927 Airway Map) 1946 Wichita Sectional
(1946 Wichita sectional) So one foggy morning, I explained my puzzlement to the boys. I highlighted all the mysteries: were these maps correct? Were there really two Newton airports at one time? How many airports were there, and where were they? Why did they move? What was the story behind them? And I offered them the chance to be history detectives with me. And oh my goodness, were they ever excited! We had some information from a very helpful person at the Harvey County Historical Museum (thanks Kris!) So we suspected one airport at least was established in 1927. We also had a description of its location, though given in terms of township maps. So the boys and I made the short drive over to the museum. We reviewed their property maps, though they were all a little older than the time period we needed. We looked through books and at pictures. Oliver pored over a railroad map of Newton from a century ago, fascinated. Jacob was excited to discover on one map that there used to be a train track down the middle of Main Street! I was interested that the present Newton Airport was once known as Wirt Field, rather to my surprise. I somehow suspect most 2nd and 4th graders spend a lot less excited time on their research floor! Then on to the Newton Public Library to see if they d have anything more and that s when the map that produced all the excitement came out. It, by itself, didn t answer the question, but by piecing together a number of pieces of information newspaper stories, information from the museum, and the maps we were able to come up with a pretty good explanation, much to their excitement. Apparently, a man named Tangeman owned a golf course (the golf links according to the paper), and around 1927 the city of Newton purchased it, because of all the planes that were landing there. They turned it into a real airport. Later, they bought land east of the city and moved the airport there. However, during World War II, the Navy took over that location, so they built a third airport a few miles west of the city but moved back to the current east location after the Navy returned that field to them. Of course, a project like this just opens up all sorts of extra questions: why isn t it called Wirt Field anymore? What s the story of Frank Wirt? What led the Navy to take over Newton s airport? Why did planes start landing on the golf course? Where precisely was the west airport located? How long was it there? (I found an aerial photo from 1956 that looks like it may have a plane in that general area, but it seems later than I d have expected) So now I have the boys interested in going to the courthouse with me to research the property records out there. Jacob is continually astounded that we are discovering things that aren t in Wikipedia, and also excited that he could be the one to add them. To be continued, apparently!

12 November 2016

John Goerzen: Morning in the Skies

IMG_8515 This is morning. Time to fly. Two boys, happy to open the hangar door and get the plane ready. It s been a year since I passed the FAA exam and became a pilot. Memories like these are my favorite reminders why I did. It is such fun to see people s faces light up with the joy of flying a few thousand feet above ground, of the beauty and freedom and peace of the skies. I ve flown 14 different passengers in that time; almost every flight I ve taken has been with people, which I enjoy. I ve heard wow or beautiful so many times, and said it myself even more times. IMG_6083 I ve landed in two state parks, visited any number of wonderful small towns, seen historic sites and placid lakes, ascended magically over forests and plains. I ve landed at 31 airports in 10 states, flying over 13,000 miles. airports Not once have I encountered anyone other than friendly, kind, and outgoing. And why not? After all, we re working around magic flying carpet machines, right? IMG_7867_bw (That s my brother before a flight with me, by the way) Some weeks it is easy to be glum. This week has been that way for many, myself included. But then, whether you are in the air or on the ground, if you pay attention, you realize we still live in a beautiful world with many wonderful people. And, in fact, I got a reminder of that this week. Not long after the election, I got in a plane, pushed in the throttle, and started the takeoff roll down a runway in the midst of an Indiana forest. The skies were the best kind of clear blue, and pretty soon I lifted off and could see for miles. Off in the distance, I could see the last cottony remnants of the morning s fog, lying still in the valleys, surrounding the little farms and houses as if to give them a loving hug. Wow. Sometimes the flight is bumpy. Sometimes the weather doesn t cooperate, and it doesn t happen at all. Sometimes you can fly across four large states and it feels as smooth as glass the whole way. Whatever happens, at the end of the day, the magic flying carpet machine gets locked up again. We go home, rest our heads on our soft pillows, and if we so choose, remember the beauty we experienced that day. Really, this post is not about being a pilot. This post is a reminder to pay attention to all that is beautiful in this world. It surrounds us; the smell of pine trees in the forest, the delight in the faces of children, the gentle breeze in our hair, the kind word from a stranger, the very sunrise. I hope that more of us will pay attention to the moments of clear skies and wind at our back. Even at those moments when we pull the hangar door shut. IMG_20160716_093627
