The Dell T320
Almost 2 years ago I made a Dell PowerEdge T320 my home server [1]. It was a decent upgrade from the PowerEdge T110 II that I had used previously. One reason for the upgrade was that I needed more RAM: the PowerEdge T1xx series uses unbuffered ECC RAM, which is unreasonably expensive, the DIMMs tend to be smaller (no Load Reduced DIMMs), and there are only 4 slots. As I had bought two T320s I put all the RAM in a single server, getting a total of 96G, then put some cheap DIMMs in the other one and sold it with 48G.
The T320 has all the server reliability features including hot-swap redundant PSUs and hot-swap hard drives. One thing it doesn't have redundancy on is the motherboard management system known as iDRAC. 3 days ago my suburb had a power outage and when power came back on the T320 gave an error message about a failure to initialise the iDRAC and put all the fans on maximum speed, which is extremely loud. When a T320 is running in a room that's not particularly hot and it doesn't have SAS disks it's a very quiet server, one of the quietest I've ever owned. When it goes into emergency cooling mode due to iDRAC failure it's loud enough to be heard from the other end of the house with doors closed in between.
Googling this failure gave a few possible answers. One was to try various combinations of booting with the iDRAC button held down, powering off for a while, booting with the iDRAC button held down again, etc (this didn't work). Another was to put an iDRAC firmware file on the SD card so the iDRAC could load it automatically (which I tested even though I didn't have the flashing LED which indicates that it is likely to work, but it didn't do anything). The last was to enable the serial console and configure the iDRAC to load new firmware via TFTP, but I didn't get an iDRAC message from the serial console, just the regular BIOS output.
So it looks like I'll have to sell the T320 for parts or find someone who wants to run it in its current form. Currently to boot it I have to press F1 a few times to bypass BIOS messages (someone on the Internet reported making a device to key-jam F1). Then when it boots it's unreasonably loud, but apparently if you are really keen you can buy fans that have temperature sensors to control their own speed and bypass the motherboard control.
I'd appreciate any advice on how to get this going. At this stage I'm not going to go back to it but if I can get it working properly I can sell it for a decent price.
The HP Z640
I've replaced the T320 with a HP Z640 workstation with 32G of RAM which I had recently bought to play with Stable Diffusion. There were hundreds of Z640 workstations with NVidia Quadro M6000 GPUs going on eBay for under $400 each; it looked like a company that did a lot of ML work had either gone bankrupt or upgraded all their employees' systems. The price for the systems was surprisingly low: at regular eBay prices the GPU and the RAM alone go for about the same price as the whole system. It turned out that Stable Diffusion didn't like the video card in my setup for unknown reasons, but also that the E5-1650v3 CPU could render an image in 15 minutes, which is fast enough to test it out but not fast enough for serious use. I had been planning to blog about that.
When I bought the T320 server the DDR3 Registered ECC RAM it uses cost about $100 for 8*8G DIMMs, with 16G DIMMs being much more expensive. Now the DDR4 Registered ECC RAM used by my Z640 goes for about $120 for 2*16G DIMMs. In the near future I'll upgrade that system to 64G of RAM. It's disappointing that the Z640 only has 4 DIMM sockets per CPU, so if you get a single-CPU version (as I did) and don't get the really expensive Load Reduced RAM then you are limited to 64G. So the supposed capacity benefit of going from DDR3 to DDR4 doesn't seem to apply to this upgrade.
The Z640 I got has 4 bays for hot-swap SAS/SATA 2.5" SSDs/HDDs and 2 internal bays for 3.5" hard drives. The T320 has 8*3.5" hot-swap bays and I had 3 hard drives in them in a BTRFS RAID-10 configuration. Currently I've got one hard drive attached via USB, but that's obviously not a long-term solution. The 3 hard drives are 4TB; they have worked since 4TB was a good size. I have a spare 8TB disk so I could buy a second ($179 for a shingled HDD) to make an 8TB RAID-1 array. The other option is to pay $369 for a 4TB SSD (or $389 for a 4TB NVMe + $10 for the PCIe card) to keep the 3-device RAID-10. As tempting as 4TB SSDs are, I'll probably get a cheap 8TB disk, which will take capacity from 6TB to 8TB, and I could use some extra 4TB disks for backups.
I haven't played with the AMT/MEBX features on this system, I presume that they will work the same way as AMT/MEBX on the HP Z420 I've used previously [2].
Update:
HP has free updates for the BIOS etc available here [3]. Unfortunately it seems to require loading a kernel module supplied by HP to do this. This is a bad thing: kernel code that isn't in the mainline kernel is either of poor quality or isn't licensed correctly.
I had to change my monitoring system to alert on temperatures over 100% of the high range, while on the T320 I had it set at 95% of high and never got warnings. This is disappointing: enterprise class gear running in a reasonably cool environment (ambient temperature of about 22C) should be able to run all CPU cores at full performance without hitting 95% of the high temperature level.
I recently got a new NVMe drive. My plan was to create a fresh Debian install on an F2FS root partition with compression for maximum performance. As it turns out, this is not entirely trivial to accomplish.
For one, the Debian installer does not support F2FS (here is my attempt to add it from 2021).
And even if it did, grub does not support F2FS with the extra_attr flag that is required for compression support (at least as of grub 2.06).
Luckily, we can install Debian anyway with all these shiny new features if we go the manual route with debootstrap and use systemd-boot as the bootloader.
We can break down the process into several steps: creating the partition table, creating and mounting the root partition, bootstrapping the base system with debootstrap, chrooting into it, configuring the base system, defining the static file system information, and installing the kernel and bootloader.
Warning: Playing around with partitions can easily result in data loss if you mess up! Make sure to double-check your commands and create a data backup if you don't feel confident about the process.
Creating the partition table
The first step is to create the GPT partition table on the new drive. There are several tools to do this, I recommend the ArchWiki page on this topic for details.
For simplicity I just went with GParted since it has an easy GUI, but feel free to use any other tool.
The layout should look like this:
Type Partition Suggested size
EFI /dev/nvme0n1p1 512MiB
Linux swap /dev/nvme0n1p2 1GiB
Linux fs /dev/nvme0n1p3 remainder
Notes:
The disk names are just an example and have to be adjusted for your system.
Don't set disk labels; they don't appear on the new install anyway, and some UEFIs might not like them on your boot partition.
The size of the EFI partition can be smaller; in practice it's unlikely that you need more than 300 MiB. However, some UEFIs might be buggy, and if you ever want to install an additional kernel or something like memtest86+ you will be happy to have the extra space.
The swap partition can be omitted; it is not strictly needed. If you need more swap for some reason you can also add it later using a swap file (see the ArchWiki page). If you know you want to use suspend-to-disk (hibernation), you should increase the size to something larger than the size of your memory.
If you used GParted, create the EFI partition as FAT32 and set the esp flag. For the root partition use ext4 or F2FS if available.
Creating and mounting the root partition
To create the root partition, we need to install the f2fs-tools first:
sudo apt install f2fs-tools
Now we can create the file system with the correct flags:
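A minimal sketch of the formatting, mounting and bootstrapping steps (the device name and the $DFS mount point are examples; the mount options mirror the fstab entry used later, and the debootstrap options are explained below):

# create the F2FS file system with the features needed for compression
sudo mkfs.f2fs -f -O extra_attr,inode_checksum,sb_checksum,compression /dev/nvme0n1p3
# mount it where the new system will be installed
export DFS=/mnt/debian
sudo mkdir -p $DFS
sudo mount -o compress_algorithm=zstd:6,compress_chksum,atgc,gc_merge,lazytime /dev/nvme0n1p3 $DFS
# bootstrap the base system into it
sudo apt install debootstrap
sudo debootstrap --arch=amd64 --components=main,contrib,non-free unstable $DFS http://deb.debian.org/debian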
--arch sets the CPU architecture (see Debian Wiki).
--components sets the archive components; if you don't want non-free packages you might want to remove some entries here.
unstable is the Debian release, you might want to change that to testing or bookworm.
$DFS points to the mounting point of the root partition.
http://deb.debian.org/debian is the Debian mirror; you might want to set that to http://ftp.de.debian.org/debian or similar if you have a fast mirror in your area.
Chrooting into the system
Before we can chroot into the newly created system, we need to prepare and mount virtual kernel file systems. First create the directories:
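For example (a sketch; the directories match the bind-mount targets and the EFI mount point used below):

# create the mount points inside the new root
sudo mkdir -p $DFS/dev/pts $DFS/proc $DFS/sys/firmware/efi/efivars $DFS/run $DFS/boot/efi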
Then bind-mount the directories from your system to the mount point of the new system:
sudo mount -v -B /dev $DFS/dev
sudo mount -v -B /dev/pts $DFS/dev/pts
sudo mount -v -B /proc $DFS/proc
sudo mount -v -B /sys $DFS/sys
sudo mount -v -B /run $DFS/run
sudo mount -v -B /sys/firmware/efi/efivars $DFS/sys/firmware/efi/efivars
As a last step, we need to mount the EFI partition:
sudo mount -v -B /dev/nvme0n1p1 $DFS/boot/efi
Now we can chroot into the new system:
sudo chroot $DFS /bin/bash
Configure the base system
The first step in the chroot is setting the locales. We need this since we might leak the locales from our base system into the chroot and if this happens we get a lot of annoying warnings.
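One way to do that, assuming you want to generate your locales interactively (pick your own locale), is:

# inside the chroot: install the locales package and generate the locales you need
apt install locales
dpkg-reconfigure locales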
Now you have a fully functional Debian chroot! However, it is not bootable yet, so let's fix that.
Define static file system information
The first step is to make sure the system mounts all partitions on startup with the correct mount flags.
This is done in /etc/fstab (see ArchWiki page).
Open the file and change its content to:
# file system mount point type options dump pass
# NVME efi partition
UUID=XXXX-XXXX /boot/efi vfat umask=0077 0 0
# NVME swap
UUID=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX none swap sw 0 0
# NVME main partition
UUID=XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX / f2fs compress_algorithm=zstd:6,compress_chksum,atgc,gc_merge,lazytime 0 1
You need to fill in the UUIDs for the partitions. You can use
ls -lAph /dev/disk/by-uuid/
to match the UUIDs to the more readable disk name under /dev.
Installing the kernel and bootloader
First install the systemd-boot and efibootmgr packages:
apt install systemd-boot efibootmgr
Now we can install the bootloader:
bootctl install --path=/boot/efi
You can verify the procedure worked with
efibootmgr -v
The next step is to install the kernel; you can find a fitting image with:
apt search linux-image-*
In my case:
apt install linux-image-amd64
After the installation of the kernel, apt will add an entry for systemd-boot automatically. Neat!
However, since we are in a chroot the current settings are not bootable.
The first reason is the boot partition, which will likely be the one from your current system.
To change that, navigate to /boot/efi/loader/entries, it should contain one config file.
When you open this file, it should look something like this:
title Debian GNU/Linux bookworm/sid
version 6.1.0-3-amd64
machine-id 2967cafb6420ce7a2b99030163e2ee6a
sort-key debian
options root=PARTUUID=f81d4fae-7dec-11d0-a765-00a0c91e6bf6 ro systemd.machine_id=2967cafb6420ce7a2b99030163e2ee6a
linux /2967cafb6420ce7a2b99030163e2ee6a/6.1.0-3-amd64/linux
initrd /2967cafb6420ce7a2b99030163e2ee6a/6.1.0-3-amd64/initrd.img-6.1.0-3-amd64
The PARTUUID needs to point to the partition equivalent to /dev/nvme0n1p3 on your system. You can use
ls -lAph /dev/disk/by-partuuid/
to match the PARTUUIDs to the more readable disk name under /dev.
The second problem is the ro flag in options, which tells the kernel to boot in read-only mode.
The default is rw, so you can just remove the ro flag.
Once this is fixed, the new system should be bootable. You can change the boot order with:
efibootmgr --bootorder
However, before we reboot we might as well add a user and install some basic software.
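For example (package selection and the user name are placeholders, not part of the original setup):

# still inside the chroot: basic tools plus an unprivileged user with sudo access
apt install sudo vim
adduser myuser
usermod -aG sudo myuser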
Protocol Buffers are a popular choice for serializing structured data
due to their compact size, fast processing speed, language independence, and
compatibility. There exist other alternatives, including Cap'n Proto,
CBOR, and Avro.
Usually, data structures are described in a proto definition file
(.proto). The protoc compiler and a language-specific plugin convert it into
code:
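For instance, with the Go plugin (a hypothetical flow.proto; this assumes protoc and protoc-gen-go are installed):

# generate Go bindings from the schema
protoc --go_out=. --go_opt=paths=source_relative flow.proto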
Akvorado collects network flows using IPFIX or sFlow, decodes them
with GoFlow2, encodes them to Protocol Buffers, and sends them to
Kafka to be stored in a ClickHouse database. Collecting a new field,
such as source and destination MAC addresses, requires modifications in multiple
places, including the proto definition file and the ClickHouse migration code.
Moreover, the cost is paid by all users.1 It would be nice to have an
application-wide schema and let users enable or disable the fields they
need.
While the main goal is flexibility, we do not want to sacrifice performance. On
this front, this is quite a success: when upgrading from 1.6.4 to 1.7.1, the
decoding and encoding performance almost doubled!
Faster Protocol Buffers encoding
I use the following code to benchmark both the decoding and
encoding process. Initially, the Decode() method is a thin layer above
GoFlow2 producer and stores the decoded data into the in-memory structure
generated by protoc. Later, some of the data will be encoded directly during
flow decoding. This is why we measure both the decoding and the
encoding.2
The canonical Go implementation for Protocol Buffers,
google.golang.org/protobuf is not the most
efficient one. For a long time, people were relying on gogoprotobuf.
However, the project is now deprecated. A good replacement is
vtprotobuf.3
Dynamic Protocol Buffers encoding
We have our baseline. Let's see how to encode our Protocol Buffers without a .proto file. The wire format is simple and relies a lot on variable-width integers.
Variable-width integers, or varints, are an efficient way of encoding unsigned
integers using a variable number of bytes, from one to ten, with small values
using fewer bytes. They work by splitting integers into 7-bit payloads and using
the 8th bit as a continuation indicator, set to 1 for all payloads
except the last.
Variable-width integers encoding in Protocol Buffers
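As a worked example (value chosen for illustration): the number 300 is 100101100 in binary; split from the least-significant end into the 7-bit groups 0101100 and 0000010, it is encoded as the two bytes 0xAC 0x02, the first byte carrying the continuation bit and the second not.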
For our usage, we only need two types: variable-width
integers and byte sequences. A byte sequence is encoded by prefixing it by its
length as a varint. When a message is encoded, each key-value pair is turned
into a record consisting of a field number, a wire type, and a payload. The
field number and the wire type are encoded as a single variable-width integer
called a tag.
Message encoded with Protocol Buffers
We use a few low-level helpers around the protowire package, such as AppendTag(), AppendVarint(), and AppendBytes(), to build the output buffer.
Our schema abstraction contains the appropriate information to encode a message
(ProtobufIndex) and to generate a proto definition file (fields starting with
Protobuf):
type Column struct {
    Key      ColumnKey
    Name     string
    Disabled bool
    // […]
    // For protobuf.
    ProtobufIndex    protowire.Number
    ProtobufType     protoreflect.Kind // Uint64Kind, Uint32Kind, …
    ProtobufEnum     map[int]string
    ProtobufEnumName string
    ProtobufRepeated bool
}
We have a few helper methods around the protowire functions to directly
encode the fields while decoding the flows. They skip disabled fields or
non-repeated fields already encoded. Here is an excerpt of the sFlow
decoder:
Fields that are required later in the pipeline, like source and destination addresses, are stored unencoded in a separate structure:
type FlowMessage struct {
    TimeReceived uint64
    SamplingRate uint32

    // For exporter classifier
    ExporterAddress netip.Addr

    // For interface classifier
    InIf  uint32
    OutIf uint32

    // For geolocation or BMP
    SrcAddr netip.Addr
    DstAddr netip.Addr
    NextHop netip.Addr

    // Core component may override them
    SrcAS     uint32
    DstAS     uint32
    GotASPath bool

    // protobuf is the protobuf representation for the information not contained above.
    protobuf    []byte
    protobufSet bitset.BitSet
}
The protobuf slice holds encoded data. It is initialized with a capacity of
500 bytes to avoid resizing during encoding. There is also some reserved room at
the beginning to be able to encode the total size as a variable-width integer.
Upon finalizing encoding, the remaining fields are added and the message length
is prefixed:
func (schema *Schema) ProtobufMarshal(bf *FlowMessage) []byte {
    schema.ProtobufAppendVarint(bf, ColumnTimeReceived, bf.TimeReceived)
    schema.ProtobufAppendVarint(bf, ColumnSamplingRate, uint64(bf.SamplingRate))
    schema.ProtobufAppendIP(bf, ColumnExporterAddress, bf.ExporterAddress)
    schema.ProtobufAppendVarint(bf, ColumnSrcAS, uint64(bf.SrcAS))
    schema.ProtobufAppendVarint(bf, ColumnDstAS, uint64(bf.DstAS))
    schema.ProtobufAppendIP(bf, ColumnSrcAddr, bf.SrcAddr)
    schema.ProtobufAppendIP(bf, ColumnDstAddr, bf.DstAddr)

    // Add length and move it as a prefix
    end := len(bf.protobuf)
    payloadLen := end - maxSizeVarint
    bf.protobuf = protowire.AppendVarint(bf.protobuf, uint64(payloadLen))
    sizeLen := len(bf.protobuf) - end
    result := bf.protobuf[maxSizeVarint-sizeLen : end]
    copy(result, bf.protobuf[end:end+sizeLen])

    return result
}
Minimizing allocations is critical for maintaining encoding performance. The
benchmark tests should be run with the -benchmem flag to monitor allocation
numbers. Each allocation incurs an additional cost to the garbage collector. The
Go profiler is a valuable tool for identifying areas of code that can be
optimized:
$ go test -run=__nothing__ -bench=Netflow/with_encoding \
>         -benchmem -cpuprofile profile.out \
>         akvorado/inlet/flow
goos: linux
goarch: amd64
pkg: akvorado/inlet/flow
cpu: AMD Ryzen 5 5600X 6-Core Processor
Netflow/with_encoding-12      143953      7955 ns/op     8256 B/op     134 allocs/op
PASS
ok      akvorado/inlet/flow     1.418s
$ go tool pprof profile.out
File: flow.test
Type: cpu
Time: Feb 4, 2023 at 8:12pm (CET)
Duration: 1.41s, Total samples = 2.08s (147.96%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) web
After using the internal schema instead of code generated from the
proto definition file, the performance improved. However, this comparison is not
entirely fair as less information is being decoded and previously GoFlow2 was
decoding to its own structure, which was then copied to our own version.
As for testing, we use github.com/jhump/protoreflect: the
protoparse package parses the proto definition file we generate and the
dynamic package decodes the messages. Check the ProtobufDecode()
method for more details.4
To get the final figures, I have also optimized the decoding in GoFlow2. It
was relying heavily on binary.Read(). This function may use
reflection in certain cases and each call allocates a byte array to read data.
Replacing it with a more efficient version provides the following
improvement:
It is now easier to collect new data and the inlet component is faster!
Notice
Some paragraphs were editorialized by ChatGPT, using
editorialize and keep it short as a prompt. The result was proofread by a
human for correctness. The main idea is that ChatGPT should be better at
English than me.
While empty fields are not serialized to Protocol Buffers, empty
columns in ClickHouse take some space, even if they compress well.
Moreover, unused fields are still decoded and they may clutter the
interface.
There is a similar function using NetFlow. NetFlow and IPFIX
protocols are less complex to decode than sFlow as they are using a simpler
TLV structure.
vtprotobuf generates more optimized Go code by removing an
abstraction layer. It directly generates the code encoding each field to
bytes.
The script will configure radvd to advertise IPv6 routes to a /64 of clients on each of the interfaces listed (comma-delimited) in LAN_IFACE. The current interface limit is 9. I could patch something to go up to 0xff, but I don't want to at this time.
If you have some static configuration that you want to preserve, place it in the file pointed to by ${HEADER_FILE} and it will be prepended to the generated /etc/radvd.conf file. The up script will remove the file and re-create it every time your ppp link comes up, so keep that in mind and don't manually modify the file and expect it to persist. You've been warned :-)
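As an illustration, the defaults file might look like this (interface names and header path are examples):

# /etc/default/centurylink-6rd
LAN_IFACE="eth0,eth1"
HEADER_FILE="/etc/radvd.conf.header"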
So here's the script. It's also linked above if you want to curl it.
#!/bin/bash
#
# Copyright 2023 Collier Technologies LLC
# https://wp.c9h.org/cj/?p=1844
#
# # These variables are for the use of the scripts run by run-parts
# PPP_IFACE="$1"
# PPP_TTY="$2"
# PPP_SPEED="$3"
# PPP_LOCAL="$4"
# PPP_REMOTE="$5"
# PPP_IPPARAM="$6"
: ${PPP_IFACE:="ppp0"}
if [[ -z ${PPP_LOCAL} ]]; then
PPP_LOCAL=$(curl -s https://icanhazip.com/)
fi
6rd_prefix="2602::/24"
#printf "%x%x:%x%x\n" $(echo $PPP_LOCAL | tr . ' ')
# This configuration option came from CenturyLink:
# https://www.centurylink.com/home/help/internet/modems-and-routers/advanced-setup/enable-ipv6.html
border_router=205.171.2.64
TUNIFACE=6rdif
first_ipv6_network=$(printf "2602:%02x:%02x%02x:%02x00::" $(echo ${PPP_LOCAL} | tr . ' '))
ip tunnel del ${TUNIFACE} 2>/dev/null
ip -l 5 -6 addr flush scope global dev ${IFACE}
ip tunnel add ${TUNIFACE} mode sit local ${PPP_LOCAL} remote ${border_router}
ip tunnel 6rd dev ${TUNIFACE} 6rd-prefix "${first_ipv6_network}0/64" 6rd-relay_prefix ${border_router}/32
ip link set up dev ${TUNIFACE}
ip -6 route add default dev ${TUNIFACE}
ip addr add ${first_ipv6_network}1/64 dev ${TUNIFACE}
rm /etc/radvd.conf
i=1
DEFAULT_FILE="/etc/default/centurylink-6rd"
if [[ -f ${DEFAULT_FILE} ]]; then
source ${DEFAULT_FILE}
if [[ -f ${HEADER_FILE} ]]; then
cp ${HEADER_FILE} /etc/radvd.conf
fi
else
LAN_IFACE="setThis"
fi
for IFACE in $(echo ${LAN_IFACE} | tr , ' ') ; do
ipv6_network=$(printf "2602:%02x:%02x%02x:%02x0${i}::" $(echo ${PPP_LOCAL} | tr . ' '))
ip addr add ${ipv6_network}1/64 dev ${TUNIFACE}
ip -6 route replace "${ipv6_network}/64" dev ${IFACE} metric 1
echo "
interface ${IFACE} {
AdvSendAdvert on;
MinRtrAdvInterval 3;
MaxRtrAdvInterval 10;
AdvLinkMTU 1280;
prefix ${ipv6_network}/64 {
AdvOnLink on;
AdvAutonomous on;
AdvRouterAddr on;
AdvValidLifetime 86400;
AdvPreferredLifetime 86400;
};
};
" >> /etc/radvd.conf
let "i=i+1"
done
systemctl restart radvd
I was passing through London on Friday and I had time to get to The Horror Show! exhibit at Somerset House, over by the Victoria Embankment. I learned about the exhibition from Gazelle Twin's website: she wrote music for the final part of the exhibit, with Maxine Peake delivering a monologue over the top.
I loved it. It was almost like David Keenan's book England's Hidden Reverse had been made into an exhibition. It's divided into three themes: Monster, Ghost and Witch, although the themes are loose, with lots of crossover and threads running throughout.
Thatcher (right)
Derek Jarman's Blue
The show is littered with artefacts from culturally significant works from a recently-bygone age. There's a strong theme of hauntology. The artefacts that stuck out to me include some of Leigh Bowery's costumes; the Spitting Image doll of Thatcher; the cover of a Radio Times featuring the cult BBC drama Threads; Nigel Kneale's The Stone Tape on VHS alongside more recent artefacts from Inside Number 9 and an M.R. James adaptation by Mark Gatiss (a clear thread of inspiration there); various bits relating to Derek Jarman, including the complete Blue screening in a separate room; Mica Levi's eerie score to Under the Skin playing in the Witch section; and various artefacts and references to the underground music group Coil. Too much to mention!
Having said that, the things which left the most lasting impression are some of the stand-alone works of art: the charity-box boy model staring, fractured and refracted, through a glass door (above); the glossy drip of blood running down a wall; the performance piece on a Betamax tape; the self-portrait of a drowned man; and the final piece, "The Neon Hieroglyph".
Jonathan Jones at the Guardian liked it.
The show runs until the 19th February and is worth a visit if you can.
There's a bunch of ways you can store cryptographic keys. The most obvious is to just stick them on disk, but that has the downside that anyone with access to the system could just steal them and do whatever they wanted with them. At the far end of the scale you have Hardware Security Modules (HSMs), hardware devices that are specially designed to self destruct if you try to take them apart and extract the keys, and which will generate an audit trail of every key operation. In between you have things like smartcards, TPMs, Yubikeys, and other platform secure enclaves - devices that don't allow arbitrary access to keys, but which don't offer the same level of assurance as an actual HSM (and are, as a result, orders of magnitude cheaper).
The problem with all of these hardware approaches is that they have entirely different communication mechanisms. The industry realised this wasn't ideal, and in 1994 RSA released version 1 of the PKCS#11 specification. This defines a C interface with a single entry point - C_GetFunctionList. Applications call this and are given a structure containing function pointers, with each entry corresponding to a PKCS#11 function. The application can then simply call the appropriate function pointer to trigger the desired functionality, such as "Tell me how many keys you have" and "Sign this, please". This is both an example of C not just being a programming language and also of you having to shove a bunch of vendor-supplied code into your security critical tooling, but what could possibly go wrong.
(Linux distros work around this problem by using p11-kit, which is a daemon that speaks d-bus and loads PKCS#11 modules for you. You can either speak to it directly over d-bus, or for apps that only speak PKCS#11 you can load a module that just transports the PKCS#11 commands over d-bus. This moves the weird vendor C code out of process, and also means you can deal with these modules without having to speak the C ABI, so everyone wins)
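For example, on a distribution shipping p11-kit you can list the registered modules and point a PKCS#11-only application at the proxy module instead of loading vendor code directly (a sketch; the proxy module path varies by distribution):

$ p11-kit list-modules
$ ssh -I /usr/lib/x86_64-linux-gnu/p11-kit-proxy.so user@host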
One of my work tasks at the moment is helping secure SSH keys, ensuring that they're only issued to appropriate machines and can't be stolen afterwards. For Windows and Linux machines we can stick them in the TPM, but Macs don't have a TPM as such. Instead, there's the Secure Enclave - part of the T2 security chip on x86 Macs, and directly integrated into the M-series SoCs. It doesn't have anywhere near as many features as a TPM, let alone an HSM, but it can generate NIST curve elliptic curve keys and sign things with them and that's good enough. Things are made more complicated by Apple only allowing keys to be used by the app that generated them, so it's hard for applications to generate keys on behalf of each other. This can be mitigated by using CryptoTokenKit, an interface that allows apps to present tokens to the systemwide keychain. Although this is intended for allowing a generic interface for access to such tokens (kind of like PKCS#11), an app can generate its own keys in the Secure Enclave and then expose them to other apps via the keychain through CryptoTokenKit.
Of course, applications then need to know how to communicate with the keychain. Browsers mostly do so, and Apple's version of SSH can to an extent. Unfortunately, that extent is "Retrieve passwords to unlock on-disk keys", which doesn't help in our case. PKCS#11 comes to the rescue here! Apple ship a module called ssh-keychain.dylib, a PKCS#11 module that's intended to allow SSH to use keys that are present in the system keychain. Unfortunately it's not super well maintained - it got broken when Big Sur moved all the system libraries into a cache, but got fixed up a few releases later. Unfortunately every time I tested it with our CryptoTokenKit provider (and also when I retried with SecureEnclaveToken to make sure it wasn't just our code being broken), ssh would tell me "provider /usr/lib/ssh-keychain.dylib returned no slots" which is not especially helpful. Finally I realised that it was actually generating more debug output, but it was being sent to the system debug logs rather than the ssh debug output. Well, when I say "more debug output", I mean "Certificate []: algorithm is not supported, ignoring it", which still doesn't tell me all that much. So I stuck it in Ghidra and searched for that string, and the line above it was
a check of the key's algorithm, with the code immediately failing if the key isn't RSA. Which it isn't, since the Secure Enclave doesn't support RSA. Apple's PKCS#11 module appears incapable of making use of keys generated on Apple's hardware.
There's a couple of ways of dealing with this. The first, which is taken by projects like Secretive, is to implement the SSH agent protocol and have SSH delegate key management to that agent, which can then speak to the keychain. But if you want this to work in all cases you need to implement all the functionality in the existing ssh-agent, and that seems like a bunch of work. The second is to implement a PKCS#11 module, which sounds like less work but probably more mental anguish. I'll figure that out tomorrow.
Parallelizing the compilation of a large codebase is a breeze with distcc, which allows you to spread the load across multiple nodes and speed up the compilation time. A sample network topology for a distributed build is a master node and a couple of worker nodes on the same LAN. Install distcc on the three Debian/Ubuntu-based nodes:
# apt install distcc
Edit /etc/default/distcc and set:
STARTDISTCC="true"
# Customize for your environment
ALLOWEDNETS="192.168.2.0/24"
# Specify your network device
LISTENER="192.168.2.146"
Additionally, the JOBS and NICE variables can be tweaked to suit the compute power that you have available. Start distcc:
# systemctl start distcc
Do the same on all the nodes, and if you have a firewall enabled with ufw, you will need to open up port 3632 to the master node.
# ufw allow 3632/tcp
Additionally, if you'd like to use SSH over untrusted networks so that code and communication with the worker nodes happen over a secure channel, ensure that SSH is running and is opened up to the master node in the same manner as above, with the key of the master node in ~/.ssh/authorized_keys of the worker nodes. Opening port 3632 in this manner is a security hole, so take precautions over untrusted networks. Back on the master node, set up a DISTCC_HOSTS environment variable that lists the worker nodes, including the master node. Note the order of the hosts, as it is important: the first host will be more heavily used, and distcc has no way of knowing the capacity and capability of the hosts, so specify the most powerful host first.
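For example (addresses are placeholders; 192.168.2.146 is the master node from the configuration above, listed first because it is the most powerful host in this setup):

$ export DISTCC_HOSTS="192.168.2.146 192.168.2.147 192.168.2.148"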
At this point, you're ready to compile. Go to your codebase; in this case we use the Linux kernel source code for the purpose of example.
$ make tinyconfig
$ time make -j$(nproc) CC=distcc bzImage
On another terminal, you can monitor the status of the distributed compilation with distmoncc-text or with tools such as top or bpytop.

Network throughput and latency will be a big factor in how much distcc speeds up your build process. Using SSH may additionally introduce overhead, so play with the variables to see how much distcc can help or optimize the build for your specific scenario. You may additionally want to consider ccache to speed up the build process. There are some aspects of the build process that are not effectively parallelizable in this manner, such as the final linking step of the executable, for which you will not see any performance improvement with distcc.

Give distcc a spin, and put any spare compute you have lying around in your home lab to good use.
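If you want ccache in the mix as suggested above, one common approach (a sketch, not specific to this setup) is to let ccache hand cache misses off to distcc via CCACHE_PREFIX:

$ export CCACHE_PREFIX=distcc
$ time make -j$(nproc) CC="ccache gcc" bzImage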
We plan to spend three days continuing to grow the Reproducible Builds effort. As in previous events, the exact content of the meeting will be shaped by the participants. And, as mentioned in Holger Levsen's post to our mailing list, the dates have been booked and confirmed with the venue, so if you are considering attending, please reserve these dates in your calendar today.
Rémy Grünblatt, an associate professor at the Télécom Sud-Paris engineering school, wrote up his pain points of using Nix and NixOS. Although some of the points do not touch on reproducible builds, Rémy touches on problems he has encountered with the different kinds of reproducibility that these distributions appear to promise, including configuration files affecting the behaviour of systems, the fragility of upstream sources, as well as the conventional idea of binary reproducibility.
Morten Linderud reported that he is quietly optimistic that if the Go programming language resolves all of its issues with reproducible builds (tracking issue) then the Go binaries distributed from Google and by Arch Linux may be bit-for-bit identical. It's just a bit early to sorta figure out what roadblocks there are. [But] Go bootstraps itself every build, so in theory I think it should be possible.
On December 15th, Holger Levsen published an in-depth interview he performed with David A. Wheeler on supply-chain security and reproducible builds, but it also touches on the biggest challenges in computing as well.
This is part of a larger series of posts featuring the projects, companies and individuals who support the Reproducible Builds project. Other instalments include an article featuring the Civil Infrastructure Platform project, which was followed up with a post about the Ford Foundation, as well as more recent ones about ARDC, the Google Open Source Security Team (GOSST), Jan Nieuwenhuizen on Bootstrappable Builds, GNU Mes and GNU Guix, and Hans-Christoph Steiner of the F-Droid project.
A number of changes were made to the Reproducible Builds website and documentation this month, including FC Stegerman adding an F-Droid/apksigcopier example to our embedded signatures page [], Holger Levsen making a large number of changes related to the 2022 summit in Venice as well as 2023's summit in Hamburg [][][][], and Simon Butler updating our publications page [][].
On our mailing list this month, James Addison asked a question about whether there has been any effort to trace the files used by a build system in order to identify the corresponding build-dependency packages. [] In addition, Bernhard M. Wiedemann posed a thought-provoking question, "How to talk to skeptics?", which was occasioned by a colleague who had published a blog post in May 2021 that was skeptical of reproducible builds. The thread generated a number of replies.
Android news
obfusk (FC Stegerman) performed a thought-provoking review of tools designed to determine the difference between two different .apk files shipped by a number of free-software instant messenger applications.
These scripts are often necessary in the Android/APK ecosystem due to these files containing embedded signatures so the conventional bit-for-bit comparison cannot be used. After detailing a litany of issues with these tools, they come to the conclusion that:
It's quite possible these messengers actually have reproducible builds, but the verification scripts they use don't actually allow us to verify whether they do.
This reflects the consensus view within the Reproducible Builds project: pursuing a situation in language or package ecosystems where binaries are bit-for-bit identical (over requiring a bespoke ecosystem-specific tool) is not a luxury demanded by purist engineers, but rather the only practical way to demonstrate reproducibility. obfusk also announced the first release of their own set of tools on our mailing list.
Related to this, obfusk also posted to an issue filed against Mastodon regarding the difficulties of creating bit-by-bit identical APKs, especially with respect to copying v2/v3 APK signatures created by different tools; they also reported that some APK ordering differences were not caused by building on macOS after all, but by using Android Studio [] and that F-Droid added 16 more apps published with Reproducible Builds in December.
Debian
As mentioned in last month's report, Vagrant Cascadian has been organising a series of online sprints in order to clear the huge backlog of reproducible builds patches submitted, by performing NMUs (Non-Maintainer Uploads).
During December, meetings were held on the 1st, 8th, 15th, 22nd and 29th, resulting in a large number of uploads and bugs being addressed:
Upstream patches
The Reproducible Builds project attempts to fix as many currently-unreproducible packages as possible. This month, we wrote a large number of such patches, including:
Testing framework
The Reproducible Builds project operates a comprehensive testing framework at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In October, the following changes were made by Holger Levsen:
The osuosl167 machine is no longer an openqa-worker node. [][]
Detect problems with APT repository signatures [] and update a repository signing key [].
Only install the foot-terminfo package on Debian systems. []
In addition, Mattia Rizzolo added support for the version of diffoscope in Debian stretch which doesn't support the --timeout flag. [][]
diffoscope
diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb made the following changes to diffoscope, including preparing and uploading versions 228, 229 and 230 to Debian:
Fix compatibility with file(1) version 5.43, with thanks to Christoph Biedl. []
Skip the test_html.py::test_diff test if html2text is not installed. (#1026034)
Update copyright years. []
In addition, Jelle van der Waa added support for Berkeley DB version 6. []
Orthogonal to this, Holger Levsen bumped the Debian Standards-Version on all of our packages, including diffoscope [], strip-nondeterminism [], disorderfs [] and reprotest [].
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can get in touch with us via:
I'm migrating some self-hosted virtual machines to Trisquel, and noticed that Trisquel does not offer cloud images similar to the Debian Cloud and Ubuntu Cloud images. Thus my earlier approach based on virt-install --cloud-init and cloud-localds does not work with Trisquel. While I hope that Trisquel will eventually publish cloud-compatible images, I wanted to document an alternative approach for Trisquel based on preseeding. This is how I used to install Debian and Ubuntu in the old days, and the automated preseed method is best documented in the Debian installation manual. I was hoping to forget about the preseed format, but maybe it will become one of those legacy technologies that never really disappear? Like FAT16 and 8-bit microcontrollers.
Below I assume you have a virtual machine host server up that runs libvirt and has virt-install and similar tools; install them with the following command. I run a pre-release version of Trisquel 11 aramo on my VM-host, but I believe any recent dpkg-based distribution like Trisquel 9/10, PureOS 10, Debian 11 or Ubuntu 20.04/22.04 would work.
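Something along these lines should pull in the tools used below (package names assume a Debian/Ubuntu-style system and are my guess, not the exact original command):

root@trana:~# apt-get install libvirt-daemon-system virtinst genisoimage cpio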
The approach can install Trisquel 9 (etiona), Trisquel 10 (nabia) and the pre-release of Trisquel 11. First download and verify the integrity of the netinst images that we will need. Unfortunately the Trisquel 11 netinst beta image does not have any checksum or signature available.
I have developed the following fairly minimal preseed file that works with all three Trisquel releases. Compare it against the official Trisquel 11 preseed skeleton and the Debian 11 example preseed file. You should modify obvious things like the SSH key, host/IP settings and partition layout, and decide for yourself how to deal with passwords. While Ubuntu/Trisquel usually wants to set up a user account, I prefer to login as root, hence setting passwd/root-login to true and passwd/make-user to false.
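As an illustration, the two password-related settings mentioned above look like this in preseed syntax (an excerpt only, not the full file):

d-i passwd/root-login boolean true
d-i passwd/make-user boolean false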
Use the file above as a skeleton for preparing a VM-specific preseed file as follows. The environment variables HOST and IPS will be used later on too.
The following script is used to prepare the ISO images with the preseed file that we will need. This script is inspired by the Debian Wiki Preseed EditIso page and the Trisquel ISO customization wiki page. There are a couple of variations based on earlier works. Paths are updated to match the Trisquel netinst ISO layout, which differ slightly from Debian. We modify isolinux.cfg to boot the auto label without a timeout. On Trisquel 11 the auto boot label exists, but on Trisquel 9 and Trisquel 10 it does not exist so we add it in order to be able to start the automated preseed installation.
root@trana:~# cat gen-preseed-iso
#!/bin/sh
# Copyright (C) 2018-2022 Simon Josefsson -- GPLv3+
# https://wiki.debian.org/DebianInstaller/Preseed/EditIso
# https://trisquel.info/en/wiki/customizing-trisquel-iso
set -e
set -x
ISO="$1"
PRESEED="$2"
OUTISO="$3"
LASTPWD="$PWD"
test -f "$ISO"
test -f "$PRESEED"
test ! -f "$OUTISO"
TMPDIR=$(mktemp -d)
mkdir "$TMPDIR/mnt"
mkdir "$TMPDIR/tmp"
cp "$PRESEED" "$TMPDIR"/preseed.cfg
cd "$TMPDIR"
mount "$ISO" mnt/
cp -rT mnt/ tmp/
umount mnt/
chmod +w -R tmp/
gunzip tmp/initrd.gz
echo preseed.cfg | cpio -H newc -o -A -F tmp/initrd
gzip tmp/initrd
chmod -w -R tmp/
sed -i "s/timeout 0/timeout 1/" tmp/isolinux.cfg
sed -i "s/default vesamenu.c32/default auto/" tmp/isolinux.cfg
if ! grep -q auto tmp/adtxt.cfg; then
cat<<EOF >> tmp/adtxt.cfg
label auto
menu label ^Automated install
kernel linux
append auto=true priority=critical vga=788 initrd=initrd.gz --- quiet
EOF
fi
cd tmp/
find -follow -type f | xargs md5sum > md5sum.txt
cd ..
cd "$LASTPWD"
genisoimage -r -J -b isolinux.bin -c boot.cat \
-no-emul-boot -boot-load-size 4 -boot-info-table \
-o "$OUTISO" "$TMPDIR/tmp/"
rm -rf "$TMPDIR"
exit 0
^D
root@trana:~# chmod +x gen-preseed-iso
root@trana:~#
Next run the command on one of the downloaded ISO image and the generated preseed file.
root@trana:~# ./gen-preseed-iso /root/iso/trisquel-netinst_10.0.1_amd64.iso vm-$HOST.preseed vm-$HOST.iso
+ ISO=/root/iso/trisquel-netinst_10.0.1_amd64.iso
+ PRESEED=vm-foo.preseed
+ OUTISO=vm-foo.iso
+ LASTPWD=/root
+ test -f /root/iso/trisquel-netinst_10.0.1_amd64.iso
+ test -f vm-foo.preseed
+ test ! -f vm-foo.iso
+ mktemp -d
+ TMPDIR=/tmp/tmp.mNEprT4Tx9
+ mkdir /tmp/tmp.mNEprT4Tx9/mnt
+ mkdir /tmp/tmp.mNEprT4Tx9/tmp
+ cp vm-foo.preseed /tmp/tmp.mNEprT4Tx9/preseed.cfg
+ cd /tmp/tmp.mNEprT4Tx9
+ mount /root/iso/trisquel-netinst_10.0.1_amd64.iso mnt/
mount: /tmp/tmp.mNEprT4Tx9/mnt: WARNING: source write-protected, mounted read-only.
+ cp -rT mnt/ tmp/
+ umount mnt/
+ chmod +w -R tmp/
+ gunzip tmp/initrd.gz
+ echo preseed.cfg
+ cpio -H newc -o -A -F tmp/initrd
5 blocks
+ gzip tmp/initrd
+ chmod -w -R tmp/
+ sed -i s/timeout 0/timeout 1/ tmp/isolinux.cfg
+ sed -i s/default vesamenu.c32/default auto/ tmp/isolinux.cfg
+ grep -q auto tmp/adtxt.cfg
+ cat
+ cd tmp/
+ find -follow -type f
+ xargs md5sum
+ cd ..
+ cd /root
+ genisoimage -r -J -b isolinux.bin -c boot.cat -no-emul-boot -boot-load-size 4 -boot-info-table -o vm-foo.iso /tmp/tmp.mNEprT4Tx9/tmp/
I: -input-charset not specified, using utf-8 (detected in locale settings)
Using GCRY_000.MOD;1 for /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/gcry_sha512.mod (gcry_sha256.mod)
Using XNU_U000.MOD;1 for /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/xnu_uuid.mod (xnu_uuid_test.mod)
Using PASSW000.MOD;1 for /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/password_pbkdf2.mod (password.mod)
Using PART_000.MOD;1 for /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/part_sunpc.mod (part_sun.mod)
Using USBSE000.MOD;1 for /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/usbserial_pl2303.mod (usbserial_ftdi.mod)
Using USBSE001.MOD;1 for /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/usbserial_ftdi.mod (usbserial_usbdebug.mod)
Using VIDEO000.MOD;1 for /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/videotest.mod (videotest_checksum.mod)
Using GFXTE000.MOD;1 for /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/gfxterm_background.mod (gfxterm_menu.mod)
Using GCRY_001.MOD;1 for /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/gcry_sha256.mod (gcry_sha1.mod)
Using MULTI000.MOD;1 for /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/multiboot2.mod (multiboot.mod)
Using USBSE002.MOD;1 for /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/usbserial_usbdebug.mod (usbserial_common.mod)
Using MDRAI000.MOD;1 for /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/mdraid09.mod (mdraid09_be.mod)
Size of boot image is 4 sectors -> No emulation
22.89% done, estimate finish Thu Dec 29 23:36:18 2022
45.70% done, estimate finish Thu Dec 29 23:36:18 2022
68.56% done, estimate finish Thu Dec 29 23:36:18 2022
91.45% done, estimate finish Thu Dec 29 23:36:18 2022
Total translation table size: 2048
Total rockridge attributes bytes: 24816
Total directory bytes: 40960
Path table size(bytes): 64
Max brk space used 46000
21885 extents written (42 MB)
+ rm -rf /tmp/tmp.mNEprT4Tx9
+ exit 0
root@trana:~#
Now the image is ready for installation, so invoke virt-install as follows. The machine will start directly, launching the preseed automatic installation. At this point, I usually click on the virtual machine in virt-manager to follow the screen output until the installation has finished. If everything works OK the machine comes up and I can ssh into it.
root@trana:~# virt-install --name $HOST --disk vm-$HOST.img,size=5 --cdrom vm-$HOST.iso --osinfo linux2020 --autostart --noautoconsole --wait
Using linux2020 default --memory 4096
Starting install...
Allocating 'vm-foo.img' 0 B 00:00:00 ...
Creating domain... 0 B 00:00:00
Domain is still running. Installation may be in progress.
Waiting for the installation to complete.
Domain has shutdown. Continuing.
Domain creation completed.
Restarting guest.
root@trana:~#
There are some problems that I have noticed that would be nice to fix, but they are easy to work around. The first is that at the end of the installation of Trisquel 9 and Trisquel 10, the VM hangs after displaying Sent SIGKILL to all processes followed by Requesting system reboot. I kill the VM manually using virsh destroy foo and start it up again using virsh start foo. For production use I expect to be running Trisquel 11, where the problem doesn't happen, so this does not bother me enough to debug further. The remaining issue is that once booted, a Trisquel 11 VM has lost its DNS nameserver configuration, presumably due to poor integration with systemd-resolved. Both Trisquel 9 and Trisquel 10 use systemd-resolved and DNS works after first boot, so this appears to be a Trisquel 11 bug. You can work around it with rm -f /etc/resolv.conf && echo 'nameserver A.B.C.D' > /etc/resolv.conf or drink the systemd Kool-Aid. Remember, use virsh shutdown foo to gracefully shut down a VM. If you want to clean up and re-start the process, here is how you wipe out what you did; after this, you may run the sed, ./gen-preseed-iso and virt-install commands again.
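A sketch of the cleanup (VM and file names follow the examples above):

root@trana:~# virsh destroy $HOST
root@trana:~# virsh undefine $HOST
root@trana:~# rm -v vm-$HOST.preseed vm-$HOST.iso vm-$HOST.img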
I use GnuPG to compute cryptographic signatures for my emails, git commits/tags, and software release artifacts (tarballs). Part of GnuPG is gpg-agent, which talks to OpenSSH, which I use to log in to remote servers and to clone git repositories. I dislike storing cryptographic keys on general-purpose machines, and have used hardware-backed OpenPGP keys since around 2006 when I got a FSFE Fellowship Card. GnuPG via gpg-agent handles this well, and the private key never leaves the hardware. The ZeitControl cards were (to my knowledge) proprietary hardware running some non-free operating system and OpenPGP implementation. By late 2012 the YubiKey NEO supported OpenPGP, and while the hardware and operating system on it were not free, at least it ran a free software OpenPGP implementation, and eventually I set up my primary RSA key on it. This worked well for a couple of years, and when I wished to migrate to a new key in 2019, the FST-01G device with open hardware running free software that supported Ed25519 had become available. I created a key and have been using the FST-01G on my main laptop since then. This little device has been working well; the signature counter on it is around 14501, which means around 10 signatures/day since then!
Currently I am in the process of migrating towards a new laptop, and moving the FST-01G device between them is cumbersome, especially if I want to use both laptops in parallel. That's why I need to set up a new hardware device to hold my OpenPGP key, which can go with my new laptop. This is a good time to re-visit alternatives. I quickly decided that I did not want to create a new key, only to import my current one to keep everything working. My requirements on the device to choose haven't changed since 2019; see my summary at the end of the earlier blog post. Unfortunately the FST-01G is out of stock and the newer FST-01SZ is also out of stock. While Tillitis looks promising (and I have one to play with), it does not support OpenPGP (yet). What to do? Fortunately, I found an FST-01SZ device in my drawer, and decided to use it pending a more satisfactory answer. Hopefully once I get around to generating a new OpenPGP key in a year or so, I will do a better survey of the options that are available on the market then. What are your (freedom-respecting) OpenPGP hardware recommendations?
FST-01SZ circuit board
Similar to setting up the FST-01G, the FST-01SZ needs to be set up before use. I'm doing the following from Trisquel 11 but any GNU/Linux system would work. When the device is inserted for the first time, some kernel messages are shown (see /var/log/syslog or use the dmesg command):
usb 3-3: new full-speed USB device number 39 using xhci_hcd
usb 3-3: New USB device found, idVendor=234b, idProduct=0004, bcdDevice= 2.00
usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 3-3: Product: Fraucheky
usb 3-3: Manufacturer: Free Software Initiative of Japan
usb 3-3: SerialNumber: FSIJ-0.0
usb-storage 3-3:1.0: USB Mass Storage device detected
scsi host1: usb-storage 3-3:1.0
scsi 1:0:0:0: Direct-Access FSIJ Fraucheky 1.0 PQ: 0 ANSI: 0
sd 1:0:0:0: Attached scsi generic sg2 type 0
sd 1:0:0:0: [sdc] 128 512-byte logical blocks: (65.5 kB/64.0 KiB)
sd 1:0:0:0: [sdc] Write Protect is off
sd 1:0:0:0: [sdc] Mode Sense: 03 00 00 00
sd 1:0:0:0: [sdc] No Caching mode page found
sd 1:0:0:0: [sdc] Assuming drive cache: write through
sdc:
sd 1:0:0:0: [sdc] Attached SCSI removable disk
Interestingly, the NeuG software installed on the device I got appears to be version 1.0.9:
jas@kaka:~$ head /media/jas/Fraucheky/README
NeuG - a true random number generator implementation
Version 1.0.9
2018-11-20
Niibe Yutaka
Free Software Initiative of Japan
What's NeuG?
============
jas@kaka:~$
I could not find version 1.0.9 published anywhere, but the device came with an SD card that contained a copy of the source, so I uploaded it until a more canonical place is located. Putting the device in serial mode can be done using a sudo eject /dev/sdc command, which results in the following syslog output.
usb 3-3: reset full-speed USB device number 39 using xhci_hcd
usb 3-3: device firmware changed
usb 3-3: USB disconnect, device number 39
sdc: detected capacity change from 128 to 0
usb 3-3: new full-speed USB device number 40 using xhci_hcd
usb 3-3: New USB device found, idVendor=234b, idProduct=0001, bcdDevice= 2.00
usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 3-3: Product: NeuG True RNG
usb 3-3: Manufacturer: Free Software Initiative of Japan
usb 3-3: SerialNumber: FSIJ-1.0.9-42315277
cdc_acm 3-3:1.0: ttyACM0: USB ACM device
Now download Gnuk, verify its integrity and build it. You may need some additional packages installed; try apt-get install gcc-arm-none-eabi openocd python3-usb. As you can see, I'm using the stable 1.2 branch of Gnuk, currently on version 1.2.20. The ./configure parameters deserve some explanation. The kdf_do=required option sets up the device to require KDF usage. The --enable-factory-reset option allows me to use the command factory-reset (with the admin PIN) inside gpg --card-edit to completely wipe the card. Some may consider that too dangerous, but my view is that if someone has your admin PIN it is game over anyway. The --vidpid=234b:0000 option specifies the USB VID/PID to use, and --target=FST_01SZ is critical to set the platform (you may brick the device if you pick the wrong --target setting).
jas@kaka:~/src$ rm -rf gnuk neug
jas@kaka:~/src$ git clone https://gitlab.com/jas/neug.git
Cloning into 'neug'...
remote: Enumerating objects: 2034, done.
remote: Counting objects: 100% (2034/2034), done.
remote: Compressing objects: 100% (603/603), done.
remote: Total 2034 (delta 1405), reused 2013 (delta 1405), pack-reused 0
Receiving objects: 100% (2034/2034), 910.34 KiB 3.50 MiB/s, done.
Resolving deltas: 100% (1405/1405), done.
jas@kaka:~/src$ git clone https://salsa.debian.org/gnuk-team/gnuk/gnuk.git
Cloning into 'gnuk'...
remote: Enumerating objects: 13765, done.
remote: Counting objects: 100% (959/959), done.
remote: Compressing objects: 100% (337/337), done.
remote: Total 13765 (delta 629), reused 907 (delta 599), pack-reused 12806
Receiving objects: 100% (13765/13765), 12.59 MiB 3.05 MiB/s, done.
Resolving deltas: 100% (10077/10077), done.
jas@kaka:~/src$ cd neug
jas@kaka:~/src/neug$ git describe
release/1.0.9
jas@kaka:~/src/neug$ git tag -v $(git describe)
object 5d51022a97a5b7358d0ea62bbbc00628c6cec06a
type commit
tag release/1.0.9
tagger NIIBE Yutaka <gniibe@fsij.org> 1542701768 +0900
Version 1.0.9.
gpg: Signature made Tue Nov 20 09:16:08 2018 CET
gpg: using EDDSA key 249CB3771750745D5CDD323CE267B052364F028D
gpg: issuer "gniibe@fsij.org"
gpg: Good signature from "NIIBE Yutaka <gniibe@fsij.org>" [unknown]
gpg: aka "NIIBE Yutaka <gniibe@debian.org>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 249C B377 1750 745D 5CDD 323C E267 B052 364F 028D
jas@kaka:~/src/neug$ cd ../gnuk/
jas@kaka:~/src/gnuk$ git checkout STABLE-BRANCH-1-2
Branch 'STABLE-BRANCH-1-2' set up to track remote branch 'STABLE-BRANCH-1-2' from 'origin'.
Switched to a new branch 'STABLE-BRANCH-1-2'
jas@kaka:~/src/gnuk$ git describe
release/1.2.20
jas@kaka:~/src/gnuk$ git tag -v $(git describe)
object 9d3c08bd2beb73ce942b016d4328f0a596096c02
type commit
tag release/1.2.20
tagger NIIBE Yutaka <gniibe@fsij.org> 1650594032 +0900
Gnuk: Version 1.2.20
gpg: Signature made Fri Apr 22 04:20:32 2022 CEST
gpg: using EDDSA key 249CB3771750745D5CDD323CE267B052364F028D
gpg: Good signature from "NIIBE Yutaka <gniibe@fsij.org>" [unknown]
gpg: aka "NIIBE Yutaka <gniibe@debian.org>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 249C B377 1750 745D 5CDD 323C E267 B052 364F 028D
jas@kaka:~/src/gnuk/src$ git submodule update --init
Submodule 'chopstx' (https://salsa.debian.org/gnuk-team/chopstx/chopstx.git) registered for path '../chopstx'
Cloning into '/home/jas/src/gnuk/chopstx'...
Submodule path '../chopstx': checked out 'e12a7e0bb3f004c7bca41cfdb24c8b66daf3db89'
jas@kaka:~/src/gnuk$ cd chopstx
jas@kaka:~/src/gnuk/chopstx$ git describe
release/1.21
jas@kaka:~/src/gnuk/chopstx$ git tag -v $(git describe)
object e12a7e0bb3f004c7bca41cfdb24c8b66daf3db89
type commit
tag release/1.21
tagger NIIBE Yutaka <gniibe@fsij.org> 1650593697 +0900
Chopstx: Version 1.21
gpg: Signature made Fri Apr 22 04:14:57 2022 CEST
gpg: using EDDSA key 249CB3771750745D5CDD323CE267B052364F028D
gpg: Good signature from "NIIBE Yutaka <gniibe@fsij.org>" [unknown]
gpg: aka "NIIBE Yutaka <gniibe@debian.org>" [unknown]
gpg: WARNING: This key is not certified with a trusted signature!
gpg: There is no indication that the signature belongs to the owner.
Primary key fingerprint: 249C B377 1750 745D 5CDD 323C E267 B052 364F 028D
jas@kaka:~/src/gnuk/chopstx$ cd ../src
jas@kaka:~/src/gnuk/src$ kdf_do=required ./configure --enable-factory-reset --vidpid=234b:0000 --target=FST_01SZ
Header file is: board-fst-01sz.h
Debug option disabled
Configured for bare system (no-DFU)
PIN pad option disabled
CERT.3 Data Object is NOT supported
Card insert/removal by HID device is NOT supported
Life cycle management is supported
Acknowledge button is supported
KDF DO is required before key import/generation
jas@kaka:~/src/gnuk/src$ make | less
jas@kaka:~/src/gnuk/src$ cd ../regnual/
jas@kaka:~/src/gnuk/regnual$ make | less
jas@kaka:~/src/gnuk/regnual$ cd ../../
jas@kaka:~/src$ sudo python3 neug/tool/neug_upgrade.py -f gnuk/regnual/regnual.bin gnuk/src/build/gnuk.bin
gnuk/regnual/regnual.bin: 4608
gnuk/src/build/gnuk.bin: 109568
CRC32: b93ca829
Device:
Configuration: 1
Interface: 1
20000e00:20005000
Downloading flash upgrade program...
start 20000e00
end 20002000
# 20002000: 32 : 4
Run flash upgrade program...
Wait 1 second...
Wait 1 second...
Device:
08001000:08020000
Downloading the program
start 08001000
end 0801ac00
jas@kaka:~/src$
The kernel log will contain the following, and the card is ready to use as an OpenPGP card. You may unplug it and re-insert it as you wish.
usb 3-3: reset full-speed USB device number 41 using xhci_hcd
usb 3-3: device firmware changed
usb 3-3: USB disconnect, device number 41
usb 3-3: new full-speed USB device number 42 using xhci_hcd
usb 3-3: New USB device found, idVendor=234b, idProduct=0000, bcdDevice= 2.00
usb 3-3: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 3-3: Product: Gnuk Token
usb 3-3: Manufacturer: Free Software Initiative of Japan
usb 3-3: SerialNumber: FSIJ-1.2.20-42315277
Setting up the card is the next step, and there are many tutorials around for this; eventually I settled on the following sequence. Let's start with setting the admin PIN. First make sure that neither pcscd nor scdaemon is running, which is good hygiene since those processes cache some information and a stale connection easily leads to confusion. Cache invalidation, sigh.
jas@kaka:~$ gpg-connect-agent "SCD KILLSCD" "SCD BYE" /bye
jas@kaka:~$ ps auxww | grep -e pcsc -e scd
jas 30221 0.0 0.0 3468 1692 pts/3 R+ 11:49 0:00 grep --color=auto -e pcsc -e scd
jas@kaka:~$ gpg --card-edit
Reader ...........: 234B:0000:FSIJ-1.2.20-42315277:0
Application ID ...: D276000124010200FFFE423152770000
Application type .: OpenPGP
Version ..........: 2.0
Manufacturer .....: unmanaged S/N range
Serial number ....: 42315277
Name of cardholder: [not set]
Language prefs ...: [not set]
Salutation .......:
URL of public key : [not set]
Login data .......: [not set]
Signature PIN ....: forced
Key attributes ...: rsa2048 rsa2048 rsa2048
Max. PIN lengths .: 127 127 127
PIN retry counter : 3 3 3
Signature counter : 0
KDF setting ......: off
Signature key ....: [none]
Encryption key....: [none]
Authentication key: [none]
General key info..: [none]
gpg/card> admin
Admin commands are allowed
gpg/card> kdf-setup
gpg/card> passwd
gpg: OpenPGP card no. D276000124010200FFFE423152770000 detected
1 - change PIN
2 - unblock PIN
3 - change Admin PIN
4 - set the Reset Code
Q - quit
Your selection? 3
PIN changed.
1 - change PIN
2 - unblock PIN
3 - change Admin PIN
4 - set the Reset Code
Q - quit
Your selection?
Now it would be natural to set up the PIN and reset code. However, the Gnuk software is configured to not allow this until the keys are imported. You would get the following somewhat cryptic error messages if you try. This took me a while to understand, since this is device-specific, and some other OpenPGP implementations allow you to configure a PIN and reset code before key import.
Your selection? 4
Error setting the Reset Code: Card error
1 - change PIN
2 - unblock PIN
3 - change Admin PIN
4 - set the Reset Code
Q - quit
Your selection? 1
Error changing the PIN: Conditions of use not satisfied
1 - change PIN
2 - unblock PIN
3 - change Admin PIN
4 - set the Reset Code
Q - quit
Your selection? q
Continue to configure the card and make it ready for key import. Some settings deserve comments. The lang field may be used to set up the language, but I have rarely seen it used, and I set it to sv (Swedish) mostly to be able to experiment with whether any software adheres to it. The URL is important: point it to somewhere your public key is stored, so that the fetch command of gpg --card-edit can download it and set up GnuPG with it when you are on a clean new laptop. The forcesig command changes the default so that a PIN code is not required for every digital signature operation; remember that I averaged 10 signatures per day for the past 2-3 years? Think of the wasted energy typing those PIN codes every time! Changing the cryptographic key type is required since I import 25519-based keys.
gpg/card> name
Cardholder's surname: Josefsson
Cardholder's given name: Simon
gpg/card> lang
Language preferences: sv
gpg/card> sex
Salutation (M = Mr., F = Ms., or space): m
gpg/card> login
Login data (account name): jas
gpg/card> url
URL to retrieve public key: https://josefsson.org/key-20190320.txt
gpg/card> forcesig
gpg/card> key-attr
Changing card key attribute for: Signature key
Please select what kind of key you want:
(1) RSA
(2) ECC
Your selection? 2
Please select which elliptic curve you want:
(1) Curve 25519
(4) NIST P-384
Your selection? 1
The card will now be re-configured to generate a key of type: ed25519
Note: There is no guarantee that the card supports the requested size.
If the key generation does not succeed, please check the
documentation of your card to see what sizes are allowed.
Changing card key attribute for: Encryption key
Please select what kind of key you want:
(1) RSA
(2) ECC
Your selection? 2
Please select which elliptic curve you want:
(1) Curve 25519
(4) NIST P-384
Your selection? 1
The card will now be re-configured to generate a key of type: cv25519
Changing card key attribute for: Authentication key
Please select what kind of key you want:
(1) RSA
(2) ECC
Your selection? 2
Please select which elliptic curve you want:
(1) Curve 25519
(4) NIST P-384
Your selection? 1
The card will now be re-configured to generate a key of type: ed25519
gpg/card>
Reader ...........: 234B:0000:FSIJ-1.2.20-42315277:0
Application ID ...: D276000124010200FFFE423152770000
Application type .: OpenPGP
Version ..........: 2.0
Manufacturer .....: unmanaged S/N range
Serial number ....: 42315277
Name of cardholder: Simon Josefsson
Language prefs ...: sv
Salutation .......: Mr.
URL of public key : https://josefsson.org/key-20190320.txt
Login data .......: jas
Signature PIN ....: not forced
Key attributes ...: ed25519 cv25519 ed25519
Max. PIN lengths .: 127 127 127
PIN retry counter : 3 3 3
Signature counter : 0
KDF setting ......: on
Signature key ....: [none]
Encryption key....: [none]
Authentication key: [none]
General key info..: [none]
gpg/card>
The device is now ready for key import! Bring out your offline laptop, boot it, and use the keytocard command on the subkeys to import them. This assumes you saved a copy of the GnuPG home directory after generating the master and subkeys, which I did in my own previous tutorial when I generated the keys. This may be a bit unusual, and there are simpler ways to do this (e.g., import a copy of the secret keys into a fresh GnuPG home directory), as sketched after the transcript below.
$ cp -a gnupghome-backup-mastersubkeys gnupghome-import-fst01sz-42315277-2022-12-24
$ ps auxww | grep -e pcsc -e scd
$ gpg --homedir $PWD/gnupghome-import-fst01sz-42315277-2022-12-24 --edit-key B1D2BD1375BECB784CF4F8C4D73CF638C53C06BE
...
Secret key is available.
gpg: checking the trustdb
gpg: marginals needed: 3 completes needed: 1 trust model: pgp
gpg: depth: 0 valid: 1 signed: 0 trust: 0-, 0q, 0n, 0m, 0f, 1u
sec ed25519/D73CF638C53C06BE
created: 2019-03-20 expired: 2019-10-22 usage: SC
trust: ultimate validity: expired
ssb cv25519/02923D7EE76EBD60
created: 2019-03-20 expired: 2019-10-22 usage: E
ssb ed25519/80260EE8A9B92B2B
created: 2019-03-20 expired: 2019-10-22 usage: A
ssb ed25519/51722B08FE4745A2
created: 2019-03-20 expired: 2019-10-22 usage: S
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg> key 1
sec ed25519/D73CF638C53C06BE
created: 2019-03-20 expired: 2019-10-22 usage: SC
trust: ultimate validity: expired
ssb* cv25519/02923D7EE76EBD60
created: 2019-03-20 expired: 2019-10-22 usage: E
ssb ed25519/80260EE8A9B92B2B
created: 2019-03-20 expired: 2019-10-22 usage: A
ssb ed25519/51722B08FE4745A2
created: 2019-03-20 expired: 2019-10-22 usage: S
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg> keytocard
Please select where to store the key:
(2) Encryption key
Your selection? 2
sec ed25519/D73CF638C53C06BE
created: 2019-03-20 expired: 2019-10-22 usage: SC
trust: ultimate validity: expired
ssb* cv25519/02923D7EE76EBD60
created: 2019-03-20 expired: 2019-10-22 usage: E
ssb ed25519/80260EE8A9B92B2B
created: 2019-03-20 expired: 2019-10-22 usage: A
ssb ed25519/51722B08FE4745A2
created: 2019-03-20 expired: 2019-10-22 usage: S
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg> key 1
sec ed25519/D73CF638C53C06BE
created: 2019-03-20 expired: 2019-10-22 usage: SC
trust: ultimate validity: expired
ssb cv25519/02923D7EE76EBD60
created: 2019-03-20 expired: 2019-10-22 usage: E
ssb ed25519/80260EE8A9B92B2B
created: 2019-03-20 expired: 2019-10-22 usage: A
ssb ed25519/51722B08FE4745A2
created: 2019-03-20 expired: 2019-10-22 usage: S
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg> key 2
sec ed25519/D73CF638C53C06BE
created: 2019-03-20 expired: 2019-10-22 usage: SC
trust: ultimate validity: expired
ssb cv25519/02923D7EE76EBD60
created: 2019-03-20 expired: 2019-10-22 usage: E
ssb* ed25519/80260EE8A9B92B2B
created: 2019-03-20 expired: 2019-10-22 usage: A
ssb ed25519/51722B08FE4745A2
created: 2019-03-20 expired: 2019-10-22 usage: S
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg> keytocard
Please select where to store the key:
(3) Authentication key
Your selection? 3
sec ed25519/D73CF638C53C06BE
created: 2019-03-20 expired: 2019-10-22 usage: SC
trust: ultimate validity: expired
ssb cv25519/02923D7EE76EBD60
created: 2019-03-20 expired: 2019-10-22 usage: E
ssb* ed25519/80260EE8A9B92B2B
created: 2019-03-20 expired: 2019-10-22 usage: A
ssb ed25519/51722B08FE4745A2
created: 2019-03-20 expired: 2019-10-22 usage: S
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg> key 2
sec ed25519/D73CF638C53C06BE
created: 2019-03-20 expired: 2019-10-22 usage: SC
trust: ultimate validity: expired
ssb cv25519/02923D7EE76EBD60
created: 2019-03-20 expired: 2019-10-22 usage: E
ssb ed25519/80260EE8A9B92B2B
created: 2019-03-20 expired: 2019-10-22 usage: A
ssb ed25519/51722B08FE4745A2
created: 2019-03-20 expired: 2019-10-22 usage: S
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg> key 3
sec ed25519/D73CF638C53C06BE
created: 2019-03-20 expired: 2019-10-22 usage: SC
trust: ultimate validity: expired
ssb cv25519/02923D7EE76EBD60
created: 2019-03-20 expired: 2019-10-22 usage: E
ssb ed25519/80260EE8A9B92B2B
created: 2019-03-20 expired: 2019-10-22 usage: A
ssb* ed25519/51722B08FE4745A2
created: 2019-03-20 expired: 2019-10-22 usage: S
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg> keytocard
Please select where to store the key:
(1) Signature key
(3) Authentication key
Your selection? 1
sec ed25519/D73CF638C53C06BE
created: 2019-03-20 expired: 2019-10-22 usage: SC
trust: ultimate validity: expired
ssb cv25519/02923D7EE76EBD60
created: 2019-03-20 expired: 2019-10-22 usage: E
ssb ed25519/80260EE8A9B92B2B
created: 2019-03-20 expired: 2019-10-22 usage: A
ssb* ed25519/51722B08FE4745A2
created: 2019-03-20 expired: 2019-10-22 usage: S
[ expired] (1). Simon Josefsson <simon@josefsson.org>
gpg> quit
Save changes? (y/N) y
$
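As an aside, the simpler alternative mentioned above (importing a copy of the exported secret keys into a fresh GnuPG home directory) would look roughly like this; the export file name here is hypothetical:
$ mkdir -m 700 gnupghome-import-fresh
$ gpg --homedir $PWD/gnupghome-import-fresh --import mastersubkeys-secret.asc   # file name is hypothetical
$ gpg --homedir $PWD/gnupghome-import-fresh --edit-key B1D2BD1375BECB784CF4F8C4D73CF638C53C06BE
From there, the keytocard steps are the same as above.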
Now insert it into your daily laptop and have GnuPG learn about the new private keys and forget about any earlier locally available card bindings; this usually manifests itself as GnuPG asking you to insert an OpenPGP card with another serial number. Earlier I did rm -rf ~/.gnupg/private-keys-v1.d/, but running scd serialno followed by learn --force is nicer. I also set up the trust setting for my own key.
jas@kaka:~$ gpg-connect-agent "scd serialno" "learn --force" /bye
...
jas@kaka:~$ echo "B1D2BD1375BECB784CF4F8C4D73CF638C53C06BE:6:" gpg --import-ownertrust
jas@kaka:~$ gpg --card-status
Reader ...........: 234B:0000:FSIJ-1.2.20-42315277:0
Application ID ...: D276000124010200FFFE423152770000
Application type .: OpenPGP
Version ..........: 2.0
Manufacturer .....: unmanaged S/N range
Serial number ....: 42315277
Name of cardholder: Simon Josefsson
Language prefs ...: sv
Salutation .......: Mr.
URL of public key : https://josefsson.org/key-20190320.txt
Login data .......: jas
Signature PIN ....: not forced
Key attributes ...: ed25519 cv25519 ed25519
Max. PIN lengths .: 127 127 127
PIN retry counter : 5 5 5
Signature counter : 3
KDF setting ......: on
Signature key ....: A3CC 9C87 0B9D 310A BAD4 CF2F 5172 2B08 FE47 45A2
created ....: 2019-03-20 23:40:49
Encryption key....: A9EC 8F4D 7F1E 50ED 3DEF 49A9 0292 3D7E E76E BD60
created ....: 2019-03-20 23:40:26
Authentication key: CA7E 3716 4342 DF31 33DF 3497 8026 0EE8 A9B9 2B2B
created ....: 2019-03-20 23:40:37
General key info..: sub ed25519/51722B08FE4745A2 2019-03-20 Simon Josefsson <simon@josefsson.org>
sec# ed25519/D73CF638C53C06BE created: 2019-03-20 expires: 2023-09-19
ssb> ed25519/80260EE8A9B92B2B created: 2019-03-20 expires: 2023-09-19
card-no: FFFE 42315277
ssb> ed25519/51722B08FE4745A2 created: 2019-03-20 expires: 2023-09-19
card-no: FFFE 42315277
ssb> cv25519/02923D7EE76EBD60 created: 2019-03-20 expires: 2023-09-19
card-no: FFFE 42315277
jas@kaka:~$
Verify that you can digitally sign and authenticate using the key and you are done!
jas@kaka:~$ echo foo | gpg -a --sign | gpg --verify
gpg: Signature made Sat Dec 24 13:49:59 2022 CET
gpg: using EDDSA key A3CC9C870B9D310ABAD4CF2F51722B08FE4745A2
gpg: Good signature from "Simon Josefsson <simon@josefsson.org>" [ultimate]
jas@kaka:~$ ssh-add -L
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAILzCFcHHrKzVSPDDarZPYqn89H5TPaxwcORgRg+4DagE cardno:FFFE42315277
jas@kaka:~$
So time to relax and celebrate Christmas? Hold on, not so fast! Astute readers will have noticed that the output said PIN retry counter: 5 5 5. That's not the default PIN retry counter for Gnuk! How did that happen? Indeed, good catch and great question, my dear reader. I wanted to include how you can modify the Gnuk source code, re-build it and re-flash the device as well. This method is different from flashing Gnuk onto a device that is running NeuG, so the commands I used to flash the firmware at the start of this blog post no longer work on a device running Gnuk. Fortunately, modern Gnuk supports updating the firmware given only the Admin PIN code, and provides a simple script to achieve this as well. The PIN retry counter setting is hard-coded in the openpgp-do.c file, and we run a perl command to modify the file, rebuild Gnuk, and upgrade the FST-01SZ. This of course wipes all your settings, so you will have the opportunity to practice all the commands earlier in this post once again!
jas@kaka:~/src/gnuk/src$ perl -pi -e 's/PASSWORD_ERRORS_MAX 3/PASSWORD_ERRORS_MAX 5/' openpgp-do.c
jas@kaka:~/src/gnuk/src$ make | less
jas@kaka:~/src/gnuk/src$ cd ../tool/
jas@kaka:~/src/gnuk/tool$ ./upgrade_by_passwd.py
Admin password:
Device:
Configuration: 1
Interface: 0
../regnual/regnual.bin: 4608
../src/build/gnuk.bin: 110592
CRC32: b93ca829
Device:
Configuration: 1
Interface: 0
20002a00:20005000
Downloading flash upgrade program...
start 20002a00
end 20003c00
Run flash upgrade program...
Waiting for device to appear:
Wait 1 second...
Wait 1 second...
Device:
08001000:08020000
Downloading the program
start 08001000
end 0801b000
Protecting device
Finish flashing
Resetting device
Update procedure finished
jas@kaka:~/src/gnuk/tool$
Now finally, I wish you all a Merry Christmas and Happy Hacking!
The first fresh release of our RcppDE package in over four years (!!) is now on CRAN.
RcppDE is a port of DEoptim, a popular package for derivative-free optimisation using differential evolution optimization, from plain C to C++. By using RcppArmadillo the code becomes a lot shorter and more legible. Our other main contribution is to leverage some of the excellence we get for free from using Rcpp, in particular the ability to optimise user-supplied compiled objective functions which can make things a lot faster than repeatedly evaluating interpreted objective functions as DEoptim does (and which, in fairness, most other optimisers do too). The gains can be quite substantial.
This release brings two helpful patches from Max Coulter who spotted two distinct areas for improvement, based on how DEoptim has changed in recent years. I updated a number of aspects of continuous integration since the last release, and also streamlined and simplified the C++ interface (and in doing so also squashed a bug or two).
Courtesy of my CRANberries, there is also a diffstat report. More detailed information is on the RcppDE page, or the repository.
If you like this or other open-source work I do, you can sponsor me at GitHub.
TL;DR
Never trust show commit changes diff on Cisco IOS XR.
Cisco IOS XR is the operating system running on the Cisco ASR, NCS, and
8000 routers. Compared to Cisco IOS, it features a candidate
configuration and a running configuration. In configuration mode, you can
modify the former and issue the commit command to apply it to the running
configuration.1 This is a common concept on many NOSes.
Before committing the candidate configuration to the running configuration, you
may want to check the changes that have accumulated until now. That's where the
show commit changes diff command2 comes in. Its goal is to show the
difference between the running configuration (show running-config) and
the candidate configuration (show configuration merge). How hard can it be?
Let's put an interface down on IOS XR 7.6.2 (released in August 2022):
The + sign before interface HundredGigE0/1/0/1 makes it look as if you had
created a new interface. Maybe there was a typo? No, the diff is just broken. If
you look at the candidate configuration, everything is as you would expect:
RP/0/RP0/CPU0:router(config)#show configuration merge int Hu0/1/0/1
Wed Nov 23 11:08:43.360 CET
interface HundredGigE0/1/0/1
 description PNI: (some description)
 bundle id 4000 mode active
 lldp
  receive disable
  transmit disable
 !
 shutdown
 load-interval 30
Here is a more problematic example on IOS XR 7.2.2 (released in January 2021).
We want to unconfigure three interfaces:
RP/0/RP0/CPU0:router(config)#no int GigabitEthernet 0/0/0/5
RP/0/RP0/CPU0:router(config)#int TenGigE 0/0/0/5 shut
RP/0/RP0/CPU0:router(config)#no int TenGigE 0/0/0/28
RP/0/RP0/CPU0:router(config)#int TenGigE 0/0/0/28 shut
RP/0/RP0/CPU0:router(config)#no int TenGigE 0/0/0/29
RP/0/RP0/CPU0:router(config)#int TenGigE 0/0/0/29 shut
RP/0/RP0/CPU0:router(config)#show commit changes diff
Mon Nov 7 15:07:22.990 CET
Building configuration...
!! IOS XR Configuration 7.2.2
-  interface GigabitEthernet0/0/0/5
-   shutdown
   !
+  interface TenGigE0/0/0/5
+   shutdown
   !
   interface TenGigE0/0/0/28
-   description Trunk (some description)
-   bundle id 2 mode active
   !
end
The first two commands are correctly represented by the first two chunks of the
diff: we remove GigabitEthernet0/0/0/5 and create TenGigE0/0/0/5. The next
two commands are also correctly represented by the last chunk of the diff.
TenGigE0/0/0/28 was already shut down, so it is expected that only
description and bundle id are removed. However, the diff command forgets
about the modifications for TenGigE0/0/0/29. The diff should include a chunk
similar to the last one.
RP/0/RP0/CPU0:router(config)#show run int TenGigE 0/0/0/29
Mon Nov 7 15:07:43.571 CET
interface TenGigE0/0/0/29
 description Trunk to other router
 bundle id 2 mode active
 shutdown
!
RP/0/RP0/CPU0:router(config)#show configuration merge int TenGigE 0/0/0/29
Mon Nov 7 15:07:53.584 CET
interface TenGigE0/0/0/29
 shutdown
!
How can the diff be correct for TenGigE0/0/0/28 but incorrect for
TenGigE0/0/0/29 while they have the same configuration? How can you trust the
diff command if it forgets part of the configuration?
Do you remember the last time you ran an Ansible playbook and discovered the
whole router ospf block disappeared without a warning? If you use automation
tools, you should check how the diff is assembled. Automation tools should build
it from the result of show running-config and show configuration merge. This
is what NAPALM does. This is not what the cisco.iosxr
collection for Ansible does.
The problem is not limited to the interface directives. You can get similar
issues for other parts of the configuration. For example, here is what we get
when removing inactive BGP neighbors on IOS XR 7.2.2:
The only correct chunk is for neighbor 217.29.66.112. All the others are missing
some of the removed lines. 217.29.67.15 is even missing all of them. How bad is
the code providing such a diff?
I could go on all day with examples such as these. Cisco TAC is happy to open a
case in DDTS, their bug tracker, to fix specific occurrences of this
bug.3 However, I fail to understand why the XR team is not just
providing the diff between show run and show configuration merge. The output
would always be correct!
IOS XR has several limitations. The most inconvenient one is the
inability to change the AS number in the router bgp directive. Such a
limitation is a great pain for both operations and automation.
This command could have been just show commit, as show commit
changes diff is the only valid command you can execute from this point.
Starting from IOS XR 7.5.1, show commit changes diff precise is also a
valid command. However, I have failed to find any documentation about it and
it seems to provide the same output as show commit changes diff. That's
how clunky IOS XR can be.
See CSCwa26251 as an example of a fix for something I reported
earlier this year. You need a valid Cisco support contract to be able to see
its content.
It should be easy to work towards eventual consistency. I should be able to
do things bit by bit, leaving them half-done, and picking them up
later with little (mental) effort. Eventually my records would be perfect and
consistent.
This has been an approach I've used for many things in my life for a long time.
It has something in common with "eat the elephant one mouthful at a time". I
think there are some categories of problems that you can't solve this way:
perhaps because with this approach you are always stuck in a local
minimum/maximum. On the other hand I often feel that there's no way I can address
some problems at all without doing so in the smallest of steps.
Personal finance is definitely one of those.
History, Setup
So for quite some time now I have had a QNAP TS-873x here, equipped with 8
Western Digital Red 10 TB disks plus 2 WD Blue 500G M.2 SSDs. The QNAP
itself has an AMD Embedded R-Series RX-421MD with 4 cores and is
equipped with 48G of RAM.
Initially I had been quite happy, the system is nice. It was fast, it
was easy to get running, and the setup of the things I wanted was simple
enough. All in a web interface that tries to imitate a kind of
workstation feeling and also tries to hide that it is actually a
web interface.
Naturally, with that amount of disks I had a RAID6 for the disks, plus
RAID1 for the SSDs, then configured as one big storage pool with the
RAID1 as cache. Under the hood QNAP uses mdadm RAID and LVM (if you
want, with thin provisioning), on some form of embedded Linux. The
interface allows for regular snapshots of your storage with flexible
enough schedules to create them, so it all appears pretty good.
QNAP slow
Fast forward some time and it gets annoying. First off, you really
should have regular RAID resyncs scheduled, and while you can set
priorities on them and run them at low priority, they make the whole
system feel very sluggish, which is quite annoying. And sure, a power failure
(rare, but it can happen) means another full resync run. Also, it appears
all of the snapshots are always mounted to some
/mnt/snapshot/something place (df on the system gets quite unusable).
Second, the reboot times. QNAP seems to be affected by the "more
features, fuck performance" virus, bloating their OS with more and
more features while completely ignoring performance. Every time they
do an upgrade it feels worse. Lately reboot times went up to 10 to
15 minutes - and then it still hadn't started the virtual machines /
docker containers one might run on it. Another 5 to 10 minutes for those.
Opening the file explorer - ages spent calculating what to show. Trying
to get the storage setup shown? Go get a coffee, but please fetch the
beans directly from the plantation, or you are too fast.
Annoying it was. And no, no broken disks or fan or anything, it all
checks out fine.
Replace QNAP's QTS system
So I started looking around what to do. More RAM may help a little
bit, but I already had 48G, the system itself appears to only do 64G
maximum, so not much chance of it helping enough. Hardware is all fine
and working, so software needs to be changed. Sounds hard, but turns
out, it is not.
TrueNAS
And I found that multiple people had replaced QNAP's own system with a
TrueNAS installation and generally had been happy. Looking further I
found that TrueNAS has a variant called Scale, which is based on
Debian. Doubly good, that, so I went off checking what I might need for
it.
Requirements
Heck, that was a step back. To install TrueNAS you need an HDMI out
and a disk to put it on. The one that QTS uses is too small, so that is
no option.
QNAPs original internal
USB drive, DOM
So either use one of the SSDs that played cache (and
should do so again in TrueNAS), or get the QNAP original replaced.
HDMI out is simple: get a cheap card and put it into one of the two
PCIe-4x slots, done. The disk thing looked more complicated, as QNAP
uses some internal "USB stick thing". Turns out it is just a USB
stick that has an 8+1 pin connector. Couldn't find anything nice as a
replacement, but hey, there are 9-pin to USB-A adapters.
a 9pin to USB A adapter
With that adapter, one can take some random M2 SSD and an M2-to-USB
case, plus some cabling, and voila, we have a nice system disk.
9pin adapter to USB-A connected with some
more cable
Obviously there isn't a good place to put this SSD case and cable, but the
QNAP case is large enough to find space and use some cable ties to store it
safely. There is enough space to get the cable from the side where the
mainboard is to the place I mounted it, so all fine.
Mounted SSD in its external case
The next best M.2 SSD was a Western Digital Red with 500G - and while
this is WAY too much for TrueNAS, it works. And hey, only using a tiny
fraction? Oh, so many more cells available internally to use when
others break. Or something.
Together with the Asus card mounted I was able to install TrueNAS.
Which is simple: their installer is easy enough to follow, just make
sure to select the right disk to put it on.
Preserving data during the move
Switching from QNAP QTS to TrueNAS Scale means changing from mdadm
RAID with LVM and ext4 on top to ZFS, and as such all data on the disks gets
erased. So a backup first was needed, and I got myself two external
Seagate USB disks of 6TB each - enough for the data I wanted to keep.
Copying things all over took ages, especially as the QNAP backup
thingie sucks - it was breaking quite often. Also, for some reason I
did not investigate, its performance was really bad. It started at
a maximum of 50MB/s, but the last terabyte of data was copied
at MUCH less than that, and so it took much longer than I anticipated.
Copying back was slow too, but much less so. Of course reading things
usually is faster than writing, with it going around 100MB/s most of
the time, which is quite a bit more - still not what USB3 can actually
do, but I guess the AMD chip doesn't want to go that fast.
TrueNAS experience
The installation went mostly smoothly; the only real trouble was on
my side. Turns out that a bad network cable does NOT help the network
setup, who would have thought. Other than that it is the usual set of
questions you would expect, a reboot, and then some web interface.
And here the differences start. The whole system boots up much faster.
Not even a third of the time compared to QTS.
One important thing: as TrueNAS Scale is Debian based, and hence runs a
Linux kernel, it automatically detects and assembles the old RAID
arrays that QTS had put on the disks. TrueNAS can do nothing with them,
so it helps to manually stop the arrays and wipe the disks.
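Something along these lines does the trick; the array and disk names below are only examples, check /proc/mdstat and lsblk for the real ones:
cat /proc/mdstat          # see which old arrays were auto-assembled
mdadm --stop /dev/md127   # stop each assembled array (example name)
wipefs -a /dev/sdX        # wipe the old RAID/LVM signatures from each disk (example name)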
Afterwards I put ZFS on the disks, with a similar setup to what I had
before. The spinning rust are the data disks in a RAIDZ2 setup, and the
two SSDs are added as cache devices. Unlike mdadm, ZFS does not have a
long sync process. Also unlike the mdadm/LVM/ext4 setup from before,
ZFS works differently: it manages the RAID part, but it also does the
volume and filesystem parts. Quite different handling, and I'm still
getting used to it, so no, I won't write a ZFS introduction now.
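Just to illustrate the layout described above: created by hand it would look roughly like the following, with a made-up pool name and device names, while in practice TrueNAS builds the pool through its web interface:
zpool create tank raidz2 sda sdb sdc sdd sde sdf sdg sdh   # the 8 spinning disks as RAIDZ2
zpool add tank cache sdi sdj                               # the two SSDs as cache devices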
Features
The two systems cannot be compared completely, as they have
pretty different target audiences. QNAP is more for the user who wants
some network storage that offers a ton of extra features easily
available via a clickable interface, while TrueNAS appears more
oriented towards people who want a fast but reliable storage system.
TrueNAS does not offer all the extra bloat the QNAP delivers. Still,
you have the ability to run virtual machines, and it seems it comes
with Rancher, so some Kubernetes/container ability is there. It lacks
essential features like assigning PCI devices to virtual machines, so
that is not useful right now, but I assume that will come in a future
version.
I am still exploring it all, but I like what I have right now. I am still
rebuilding my setup to have all shares exported and used again, but
the most important ones are working already.
I started migrating my graphical workstations to Wayland, specifically
migrating from i3 to Sway. This is mostly to address serious graphics
bugs in the latest Framwork
laptop, but also something I
felt was inevitable.
The current status is that I've been able to convert my i3
configuration to Sway, and adapt my systemd startup sequence to the
new environment. Screen sharing only works with Pipewire, so I also
did that migration, which basically requires an upgrade to Debian
bookworm to get a nice enough Pipewire release.
I'm testing Wayland on my laptop, but I'm not using it as a daily
driver because I first need to upgrade to Debian bookworm on my main
workstation.
Most irritants have been solved one way or the other. My main problem
with Wayland right now is that I spent a frigging week doing the
conversion: it's exciting and new, but it basically sucked the life
out of all my other projects and it's distracting, and I want it to
stop.
The rest of this page documents why I made the switch, how it
happened, and what's left to do. Hopefully it will keep you from
spending as much time as I did in fixing this.
TL;DR: Wayland is mostly ready. Main blockers you might find are
that you need to do manual configurations, DisplayLink (multiple
monitors on a single cable) doesn't work in Sway, HDR and color
management are still in development.
I had to install the following packages:
And I did some tweaks in my $HOME, mostly dealing with my esoteric
systemd startup sequence, which you won't have to deal with if you are
not a fan.
Why switch?
I originally held back from migrating to Wayland: it seemed like a
complicated endeavor hardly worth the cost. It also didn't seem
actually ready.
But after reading this blurb on LWN, I decided to at least
document the situation here. The actual quote that convinced me it
might be worth it was:
It's amazing. I have never experienced gaming on Linux that looked
this smooth in my life.
... I'm not a gamer, but I do care about
latency. The longer version is
worth a read as well.
The point here is not to bash one side or the other, or even do a
thorough comparison. I start with the premise that Xorg is likely
going away in the future and that I will need to adapt some day. In
fact, the last major Xorg release (21.1, October 2021) is rumored
to be the last ("just like the previous release...", that said,
minor releases are still coming out, e.g. 21.1.4). Indeed, it
seems even core Xorg people have moved on to developing Wayland, or at
least Xwayland, which was spun off into its own source tree.
X, or at least Xorg, is in maintenance mode and has been for
years. Granted, the X Window System is getting close to forty
years old at this point: it got us amazingly far for something that
was designed around the time the first graphical
interfaces appeared. Since Mac and (especially?) Windows released theirs,
they have rebuilt their graphical backends numerous times, but UNIX
derivatives have stuck on Xorg this entire time, which is a testament
to the design and reliability of X. (Or our incapacity at developing
meaningful architectural change across the entire ecosystem, take your
pick I guess.)
What pushed me over the edge is that I had some pretty bad driver
crashes with Xorg while screen sharing under Firefox, in Debian
bookworm (around November 2022). The symptom would be that the UI
would completely crash, reverting to a text-only console, while
Firefox would keep running, audio and everything still
working. People could still see my screen, but I couldn't, of course,
let alone interact with it. All processes still running, including
Xorg.
(And no, sorry, I haven't reported that bug, maybe I should have, and
it's actually possible it comes up again in Wayland, of course. But at
first, screen sharing didn't work at all, so it has already come a long
way. After making screen sharing work, though, the bug didn't
occur again, so I consider this an Xorg-specific problem until further
notice.)
There were also frustrating glitches in the UI in general. I actually
had to set up a compositor alongside i3 to make things bearable at
all. Video playback in a window was laggy, sluggish, and out of sync.
Wayland fixed all of this.
Wayland equivalents
This section documents each tool I have picked as an alternative to
the current Xorg tool I am using for the task at hand. It also touches
on other alternatives and how the tool was configured.
Note that this list is based on the series of tools I use in
desktop.
TODO: update desktop with the following when done,
possibly moving old configs to a ?xorg archive.
Window manager: i3 → sway
This seems like kind of a no-brainer. Sway is around, it's
feature-complete, and it's in Debian.
I'm a bit worried about the "Drew DeVault community", to be
honest. There's a certain aggressiveness in the community I don't like
so much; at least an open hostility towards more modern UNIX tools
like containers and systemd that makes it hard to do my work while
interacting with that community.
I'm also concerned about the lack of unit tests and user manual for
Sway. The i3 window manager has been designed by a fellow
(ex-)Debian developer I have a lot of respect for (Michael
Stapelberg), partly because of i3 itself, but also working with
him on other projects. Beyond the characters, i3 has a user
guide, a code of conduct, and lots more
documentation. It has a test suite.
Sway has... manual pages, with the homepage just telling users to use
man -k sway to find what they need. I don't think we need that kind
of elitism in our communities, to put this bluntly.
But let's put that aside: Sway is still a no-brainer. It's the easiest
thing to migrate to, because it's mostly compatible with i3. I had
to immediately fix those resources to get a minimal session going:
i3                    Sway                      note
set_from_resources    set                       no support for X resources, naturally
new_window pixel 1    default_border pixel 1    actually supported in i3 as well
That's it. All of the other changes I had to do (and there were
actually a lot) were all Wayland-specific changes, not
Sway-specific changes. For example, use brightnessctl instead of
xbacklight to change the backlight levels.
See a copy of my full sway/config for details.
Other options include:
dwl: tiling, minimalist, dwm for Wayland, not in Debian
Status bar: py3status → waybar
I have invested quite a bit of effort in setting up my status bar with
py3status. It supports Sway directly, and did not actually require
any change when migrating to Wayland.
Unfortunately, I had trouble making nm-applet work. Based on this
nm-applet.service, I found that you need to pass --indicator for
it to show up at all.
In theory, tray icon support was merged in 1.5, but in practice
there are still several limitations, like icons not being
clickable. Also, on startup, nm-applet --indicator triggers this
error in the Sway logs:
nov 11 22:34:12 angela sway[298938]: 00:49:42.325 [INFO] [swaybar/tray/host.c:24] Registering Status Notifier Item ':1.47/org/ayatana/NotificationItem/nm_applet'
nov 11 22:34:12 angela sway[298938]: 00:49:42.327 [ERROR] [swaybar/tray/item.c:127] :1.47/org/ayatana/NotificationItem/nm_applet IconPixmap: No such property IconPixmap
nov 11 22:34:12 angela sway[298938]: 00:49:42.327 [ERROR] [swaybar/tray/item.c:127] :1.47/org/ayatana/NotificationItem/nm_applet AttentionIconPixmap: No such property AttentionIconPixmap
nov 11 22:34:12 angela sway[298938]: 00:49:42.327 [ERROR] [swaybar/tray/item.c:127] :1.47/org/ayatana/NotificationItem/nm_applet ItemIsMenu: No such property ItemIsMenu
nov 11 22:36:10 angela sway[313419]: info: fcft.c:838: /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf: size=24.00pt/32px, dpi=96.00
... but that seems innocuous. The tray icon displays but is not
clickable.
Note that there is currently (November 2022) a pull request to
hook up a "Tray D-Bus Menu" which, according to Reddit might fix
this, or at least be somewhat relevant.
If you don't see the icon, check the bar.tray_output property in the
Sway config, try: tray_output *.
The non-working tray was the biggest irritant in my migration. I have
used nmtui to connect to new Wifi hotspots or change connection
settings, but that doesn't support actions like "turn off WiFi".
I eventually fixed this by switching from py3status to
waybar, which was another yak horde shaving session, but
ultimately, it worked.
Web browser: Firefox
Firefox has had support for Wayland for a while now, with the team
enabling it by default in nightlies around January 2022. It's
actually not easy to figure out the state of the port, the meta bug
report is still open and it's huge: it currently (Sept 2022)
depends on 76 open bugs, it was opened twelve years ago (2010), and
it's still getting daily updates (mostly linking to other tickets).
Firefox 106 presumably shipped with "Better screen sharing for
Windows and Linux Wayland users", but I couldn't quite figure out what
those were.
TL;DR: echo MOZ_ENABLE_WAYLAND=1 >> ~/.config/environment.d/firefox.conf && apt install xdg-desktop-portal-wlr
How to enable it
Firefox depends on this silly variable to start correctly under
Wayland (otherwise it starts inside Xwayland and looks fuzzy and fails
to screen share):
MOZ_ENABLE_WAYLAND=1 firefox
To make the change permanent, many recipes recommend adding this to an
environment startup script:
if [ "$XDG_SESSION_TYPE" == "wayland" ]; then
export MOZ_ENABLE_WAYLAND=1
fi
At least that's the theory. In practice, Sway doesn't actually run any
startup shell script, so that can't possibly work. Furthermore,
XDG_SESSION_TYPE is not actually set when starting Sway from gdm3,
which I find really confusing, and I'm not the only one. So
the above trick doesn't actually work, even if the environment
(XDG_SESSION_TYPE) is set correctly, because we don't have
conditionals in environment.d(5).
(Note that systemd.environment-generator(7) does support running
arbitrary commands to generate the environment, but for some reason does
not support user-specific configuration files... Even then it may be a
solution to have a conditional MOZ_ENABLE_WAYLAND environment, but
I'm not sure it would work because ordering between those two isn't
clear: maybe the XDG_SESSION_TYPE wouldn't be set just yet...)
At first, I made this ridiculous script to work around those
issues. Really, it seems to me Firefox should just parse the
XDG_SESSION_TYPE variable here... but then I realized that Firefox
works fine in Xorg when MOZ_ENABLE_WAYLAND is set.
So now I just set that variable in environment.d and It Just Works:
MOZ_ENABLE_WAYLAND=1
Screen sharing
Out of the box, screen sharing doesn't work until you install
xdg-desktop-portal-wlr or similar
(e.g. xdg-desktop-portal-gnome on GNOME). I had to reboot for the
change to take effect.
Without those tools, it shows the usual permission prompt with "Use
operating system settings" as the only choice, but when we accept...
nothing happens. After installing the portals, it actually works, and
works well!
This was tested in Debian bookworm/testing with Firefox ESR 102 and
Firefox 106.
Major caveat: we can only share a full screen, we can't currently
share just a window. The major upside to that is that, by default,
it streams only one output, which is actually what I want most
of the time! See the screencast compatibility for more
information on what is supposed to work.
This is actually a huge improvement over the situation in Xorg,
where Firefox can only share a window or all monitors, which led
me to use Chromium a lot for video-conferencing. With this change, in
other words, I will not need Chromium for anything anymore, whoohoo!
If slurp, wofi, or bemenu are
installed, one of them will be used to pick the monitor to share,
which effectively acts as some minimal security measure. See
xdg-desktop-portal-wlr(1) for how to configure that.
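For reference, a minimal sketch of such a configuration, assuming wofi as the picker and the default config location (check the man page for the exact key names):
mkdir -p ~/.config/xdg-desktop-portal-wlr
cat > ~/.config/xdg-desktop-portal-wlr/config <<'EOF'
[screencast]
chooser_type=dmenu
chooser_cmd=wofi --show dmenu
EOF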
Side note: Chrome fails to share a full screen
I was still using Google Chrome (or, more accurately, Debian's
Chromium package) for some videoconferencing. It's mainly because
Chromium was the only browser which would allow me to share only one of
my two monitors, which is extremely useful.
To start chrome with the Wayland backend, you need to use:
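Something like the following is commonly used (the exact flags are an assumption here and may vary by Chromium version):
chromium --enable-features=UseOzonePlatform --ozone-platform=wayland   # assumed invocation, adjust for your version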
If it shows an ugly gray border, check the Use system title bar and
borders setting.
It can do some screensharing. Sharing a window and a tab seems to
work, but sharing a full screen doesn't: it's all black. Maybe not
ready for prime time.
And since Firefox can do what I need under Wayland now, I will not
need to fight with Chromium to work under Wayland:
apt purge chromium
Note that a similar fix was necessary for Signal Desktop, see this
commit. Basically you need to figure out a way to pass those same
flags to signal:
News: feed2exec, gnus
See Email, above, or Emacs in Editor, below.
Editor: Emacs okay-ish
Emacs is being actively ported to Wayland. According to this LWN
article, the first (partial, to Cairo) port was done in 2014 and a
working port (to GTK3) was completed in 2021, but wasn't merged until
late 2021. That is: after Emacs 28 was released (April
2022).
So we'll probably need to wait for Emacs 29 to have native Wayland
support in Emacs, which, in turn, is unlikely to arrive in time for
the Debian bookworm freeze. There are, however, unofficial
builds for both Emacs 28 and 29 provided by spwhitton which
may provide native Wayland support.
I tested the snapshot packages and they do not quite work well
enough. First off, they completely take over the built-in Emacs (they
hijack the $PATH in /etc!) and certain things are simply not
working in my setup. For example, this hook never gets run on startup:
(add-hook 'after-init-hook 'server-start t)
Still, like many X11 applications, Emacs mostly works fine under
Xwayland. The clipboard works as expected, for example.
Scaling is a bit of an issue: fonts look fuzzy.
I have heard anecdotal evidence of hard lockups with Emacs running
under Xwayland as well, but haven't experienced any problem so far. I
did experience a Wayland crash with the snapshot version however.
TODO: look again at Wayland in Emacs 29.
Backups: borg
Mostly irrelevant, as I do not use a GUI.
Color theme: srcery, redshift → gammastep
I am keeping Srcery as a color theme, in general.
Redshift is another story: it has no support for Wayland out of
the box, but it's apparently possible to apply a hack on the TTY
before starting Wayland, with:
redshift -m drm -PO 3000
This tip is from the arch wiki which also has other suggestions
for Wayland-based alternatives. Both KDE and GNOME have their own "red
shifters", and for wlroots-based compositors it (currently,
Sept. 2022) lists a few more alternatives.
As for the login manager, other alternatives include:
greetd and QtGreet (the former in Debian, not the latter, which
means we're stuck with the weird agreety which doesn't work at
all)
sddm: KDE's default, in Debian, probably heavier than or as heavy as
gdm3
Terminal: xterm → foot
One of the biggest question marks in this transition was what to do
about Xterm. After writing two articles about terminal
emulators as a professional journalist, decades of working on the
terminal, and probably using dozens of different terminal emulators,
I'm still not happy with any of them.
This is such a big topic that I actually have an entire blog post
specifically about this.
For starters, using xterm under Xwayland works well enough, although
the font scaling makes things look a bit too fuzzy.
I have also tried foot: it ... just works!
Fonts are much crisper than Xterm and Emacs. URLs are not clickable
but the URL selector (control-shift-u) is just plain
awesome (think "vimperator" for the terminal).
There's a cool hack to jump between prompts.
Copy-paste works. True colors work. The word-wrapping is excellent: it
doesn't lose one byte. Emojis are nicely sized and colored. Font
resize works. There's even scrollback search
(control-shift-r).
Foot went from a question mark to being a reason to switch to Wayland,
just for this little goodie, which says a lot about the quality of
that software.
The selection clicks are not quite what I would expect, though. In
rxvt and others, you have the following patterns:
single click: reset selection, or drag to select
double: select word
triple: select quotes or line
quadruple: select line
I particularly find the "select quotes" bit useful. It seems like foot
just supports double and triple clicks, with word and line
selected. You can select a rectangle with control,. It
correctly extends the selection word-wise with right click if
double-click was first used.
One major problem with Foot is that it's a new terminal, with its own
termcap entry. Support for foot was added to ncurses in the
20210731 release, which was shipped after the current Debian
stable release (Debian bullseye, which ships 6.2+20201114-2). A
workaround for this problem is to install the foot-terminfo package
on the remote host, which is available in Debian stable.
This should eventually resolve itself, as Debian bookworm has a newer
version. Note that some corrections were also shipped in the
20211113 release, but that is also shipped in Debian bookworm.
That said, I am almost certain I will have to revert back to xterm
under Xwayland at some point in the future. Back when I was using
GNOME Terminal, it would mostly work for everything until I had to use
the serial console on a (HP ProCurve) network switch, which has a
fancy TUI that was basically unusable there. I fully expect such
problems with foot, or any other terminal than xterm, for that matter.
The foot wiki has good troubleshooting instructions as well.
Update: I did find one tiny thing to improve with foot, and it's the
default logging level which I found pretty verbose. After discussing
it with the maintainer on IRC, I submitted this patch to tweak
it, which I described like this on Mastodon:
today's reason why i will go to hell when i die (TRWIWGTHWID?): a
600-word, 63 lines commit log for a one line change:
https://codeberg.org/dnkl/foot/pulls/1215
The above list comes partly from https://arewewaylandyet.com/ and
awesome-wayland. It is likely incomplete.
I have read some good things about bemenu, fuzzel, and wofi.
A particularly tricky issue is that my rofi password management
depends on xdotool for some operations. At first, I thought this was
just going to be (thankfully?) impossible, because we actually like
the idea that one app cannot send keystrokes to another. But it seems
there are actually alternatives to this, like wtype or
ydotool, the latter of which requires root access. wl-ime-type
does that through the input-method-unstable-v2 protocol (sample
emoji picker), but is not packaged in Debian.
As it turns out, wtype just works as expected, and fixing this was
basically a two-line patch. Another alternative, not in Debian, is
wofi-pass.
The other problem is that I actually heavily modified rofi. I use
"modis" which are not actually implemented in wofi or tofi, so I'm
left with reinventing those wheels from scratch or using the rofi +
wayland fork... It's really too bad that fork isn't being
reintegrated...
For now, I'm actually still using rofi under Xwayland. The main
downside is that fonts are fuzzy, but it otherwise just works.
Note that wlogout could be a partial replacement (just for the
"power menu").
Image viewers: geeqie → ?
I'm not very happy with geeqie in the first place, and I suspect the
Wayland switch will just pile more impossible things on top of the
things I already find irritating (Geeqie doesn't support copy-pasting
images).
In practice, Geeqie doesn't seem to work so well under Wayland. The
fonts are fuzzy and the thumbnail preview just doesn't work anymore
(filed as Debian bug 1024092). It seems it also has problems
with scaling.
Alternatives:
See also this list and that list for other list of image
viewers, not necessarily ported to Wayland.
TODO: pick an alternative to geeqie; nomacs would be gorgeous if it
weren't basically abandoned upstream (no release since 2020): it has
an unpatched CVE-2020-23884 since July 2020, does bad
vendoring, and is in bad shape in Debian (4 minor releases
behind).
So for now I'm still grumpily using Geeqie.
Media player: mpv, gmpc / sublime
This is basically unchanged. mpv seems to work fine under Wayland,
better than Xorg on my new laptop (as mentioned in the introduction),
and that before the version which improves Wayland support
significantly, by bringing native Pipewire support and DMA-BUF
support.
gmpc is more of a problem, mainly because it is abandoned. See
2022-08-22-gmpc-alternatives for the full discussion, one of
the alternatives there will likely support Wayland.
Finally, I might just switch to sublime-music instead... In any
case, not many changes here, thankfully.
Screensaver: xsecurelock → swaylock
I was previously using xss-lock and xsecurelock as a screensaver, with
xscreensaver "hacks" as a backend for xsecurelock.
The basic screensaver in Sway seems to be built with swayidle and
swaylock. It's interesting because it's the same "split" design
as xss-lock and xsecurelock.
That, unfortunately, does not include the fancy "hacks" provided by
xscreensaver, and that is unlikely to be implemented upstream.
Other alternatives include gtklock and waylock (zig), which
do not solve that problem either.
It looks like swaylock-plugin, a swaylock fork, at least
attempts to solve this problem, although not directly using the real
xscreensaver hacks. swaylock-effects is another attempt at this,
but it only adds more effects, it doesn't delegate the image display.
Other than that, maybe it's time to just let go of those funky
animations and let swaylock do its thing, which is to display a
static image or just a black screen; that is fine by me.
In the end, I am just using swayidle with a configuration based on
the systemd integration wiki page but with additional tweaks from
this service, see the resulting swayidle.service file.
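For reference, the kind of swayidle invocation such a service ends up wrapping looks roughly like this (timeouts and commands are illustrative):
swayidle -w \
    timeout 300 'swaylock -f' \
    timeout 600 'swaymsg "output * dpms off"' \
        resume 'swaymsg "output * dpms on"' \
    before-sleep 'swaylock -f'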
Interestingly, damjan also has a service for swaylock itself,
although it's not clear to me what its purpose is...
Screenshot: maim → grim, pubpaste
I'm a heavy user of maim (and a package uploader in Debian). It
looks like the direct replacement to maim (and slop) is grim
(and slurp). There's also swappy which goes on top of grim
and allows preview/edit of the resulting image, nice touch (not in
Debian though).
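For example, the basic region-screenshot flow with these tools looks like this (the file name is arbitrary):
grim -g "$(slurp)" screenshot.png   # slurp selects a region, grim captures it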
See also awesome-wayland screenshots for other alternatives:
there are many, including X11 tools like Flameshot that also
support Wayland.
One key problem here was that I have my own screenshot / pastebin
software which needed an update for Wayland as well. That,
thankfully, meant actually cleaning up a lot of horrible code that
involved calling xterm and xmessage for user interaction. Now,
pubpaste uses GTK for prompts and looks much better. (And before
anyone freaks out, I already had to use GTK for proper clipboard
support, so this isn't much of a stretch...)
Screen recorder: simplescreenrecorder → wf-recorder
In Xorg, I have used both peek or simplescreenrecorder for
screen recordings. The former will work in Wayland, but has no
sound support. The latter has a fork with Wayland support but
it is limited and buggy ("doesn't support recording area selection and
has issues with multiple screens").
It looks like wf-recorder will just do everything correctly out
of the box, including audio support (with --audio, duh). It's also
packaged in Debian.
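For example, a basic full-screen recording with sound can be done with something like this (output file name arbitrary):
wf-recorder --audio -f recording.mp4   # --audio records the default audio source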
One has to wonder how this works while keeping the "between app
security" that Wayland promises, however... Would installing such a
program make my system less secure?
Many other options are available, see the awesome Wayland
screencasting list.
RSI: workrave → nothing?
Workrave has no support for Wayland. activity watch is a
time tracker alternative, but is not an RSI watcher. KDE has
rsiwatcher, but that's a bit too much on the heavy side for my
taste.
SafeEyes looks like an alternative at first, but it has many
issues under Wayland (escape doesn't work, idle detection doesn't
work, it just doesn't really work). timekpr-next could be
an alternative as well, and has support for Wayland.
I am also considering just abandoning workrave, even if I stick with
Xorg, because it apparently introduces significant latency in the
input pipeline.
And besides, I've developed a pretty unhealthy alert fatigue with
Workrave. I have used the program for so long that my fingers know
exactly where to click to dismiss those warnings very effectively. It
makes my work just more irritating, and doesn't fix the fundamental
problem I have with computers.
Other apps
This is a constantly changing list, of course. There's a bit of a
"death by a thousand cuts" in migrating to Wayland because you realize
how many things you were using are tightly bound to X.
.Xresources - just say goodbye to that old resource system, it
was used, in my case, only for rofi, xterm, and ... Xboard!?
keyboard layout switcher: built-in to Sway since 2017 (PR
1505, 1.5rc2+), requires a small configuration change, see
this answer as well, looks something like this command:
That works refreshingly well, even better than in Xorg, I must say.
swaykbdd is an alternative that supports per-window layouts
(in Debian).
wallpaper: currently using feh, will need a replacement, TODO:
figure out something that does, like feh, a random shuffle.
swaybg just loads a single image, duh. oguri might be a
solution, but unmaintained, used here, not in
Debian. wallutils is another option, also not in
Debian. For now I just don't have a wallpaper, the background is a
solid gray, which is better than Xorg's default (which is whatever
crap was left around a buffer by the previous collection of
programs, basically)
notifications: currently using dunst in some places, which
works well in both Xorg and Wayland, not a blocker, salut a
possible alternative (not in Debian), damjan uses mako. TODO:
install dunst everywhere
network manager: nm-applet --indicator shows a tray icon in waybar, but it is not clickable; see the Status bar section above for the details, logs, and the bar.tray_output workaround. Since nmtui cannot do things like turning off WiFi, this is ultimately what made me switch from py3status to waybar.
window switcher: in i3 I was using this bespoke i3-focus
script, which doesn't work under Sway; swayr is an option, but not in
Debian. So I put together this other bespoke hack from
multiple sources, which works.
PDF viewer: currently using atril (which supports Wayland), could
also just switch to zathura/mupdf permanently, see also calibre
for a discussion on document viewers
More X11 / Wayland equivalents
For all the tools above, it's not exactly clear what options exist in
Wayland, or when they do, which one should be used. But for some basic
tools, it seems the options are actually quite clear. If that's the
case, they should be listed here:
Note that arandr and autorandr are not directly part of
X. arewewaylandyet.com refers to a few alternatives. We suggest
wdisplays and kanshi above (see also this service
file) but wallutils can also do the autorandr stuff, apparently,
and nwg-displays can do the arandr part. Neither are packaged in
Debian yet.
So I have tried wdisplays and it Just Works, and well. The UI even
looks better and more usable than arandr, so another clean win from
Wayland here.
TODO: test kanshi as an autorandr replacement
Other issues
systemd integration
I've had trouble getting session startup to work. This is partly
because I had a kind of funky system to start my session in the first
place. I used to have my whole session started from .xsession like
this:
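Roughly, the idea was something like this (a sketch, not the exact file):
#!/bin/sh
# hand the environment to the systemd user manager, pull in the session
# target, then run the window manager itself
systemctl --user import-environment DISPLAY XAUTHORITY
systemctl --user start xsession.target
exec i3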
But obviously, the xsession.target is not started by the Sway
session. It seems to just start a default.target, which is really
not what we want because we want to associate the services directly
with the graphical-session.target, so that they don't start when
logging in over (say) SSH.
damjan on #debian-systemd showed me his sway-setup which
features systemd integration. It involves starting a different session
in a completely new .desktop file. That work was submitted
upstream but refused on the grounds that "I'd rather not give a
preference to any particular init system." Another PR was
abandoned because "restarting sway does not makes sense: that
kills everything".
The work was therefore moved to the wiki.
So. Not a great situation. The upstream wiki's systemd
integration section suggests starting the systemd target from within
Sway, which has all sorts of problems:
you don't get Sway logs anywhere
control groups are all messed up
I have done a lot of work trying to figure this out, but I remember
that starting systemd from Sway didn't actually work for me: my
previously configured systemd units didn't correctly start, and
especially not with the right $PATH and environment.
So I went down that rabbit hole and managed to correctly configure
Sway to be started from the systemd --user session.
I have partly followed the wiki but also picked ideas from damjan's
sway-setup and xdbob's sway-services. Another option is
uwsm (not in Debian).
This is the config I have in .config/systemd/user/:
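The unit files themselves got lost in the formatting here, but the general
shape, as a rough sketch (unit and file names are illustrative, not
necessarily the exact ones I use), is:
# ~/.config/systemd/user/sway.service: start the compositor, tied to the
# graphical session
[Unit]
Description=sway Wayland compositor
BindsTo=graphical-session.target
Before=graphical-session.target

[Service]
# sway itself does not notify readiness; the sway config does it with
# something like an "exec systemd-notify --ready" (see below), hence
# NotifyAccess=all
Type=notify
NotifyAccess=all
ExecStart=/usr/bin/sway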
You will also need at least part of my sway/config, which
sends the systemd notification (because, no, Sway doesn't support any
sort of readiness notification, that would be too easy). And you might
like to see my swayidle-config while you're there.
Finally, you need to hook this up somehow to the login manager. This
is typically done with a desktop file, so drop
sway-session.desktop in /usr/share/wayland-sessions and
sway-user-service somewhere in your $PATH (typically
/usr/bin/sway-user-service).
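For reference, such a desktop file doesn't need much; something like this
(the Name is arbitrary):
[Desktop Entry]
Name=Sway (systemd)
Comment=Wayland compositor started through the systemd user session
Exec=sway-user-service
Type=Application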
The session then looks something like this:
Environment propagation
At first, my terminals and rofi didn't have the right $PATH, which
broke a lot of my workflow. It's hard to tell exactly how Wayland
gets started or where to inject environment variables. This discussion
suggests a few alternatives and this Debian bug report discusses
this issue as well.
I eventually picked environment.d(5) since I already manage my user
session with systemd, and it fixes a bunch of other problems. I used
to have a .shenv that I had to manually source everywhere. The only
problem with that approach is that it doesn't support conditionals,
but that's something that's rarely needed.
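As an example of what that looks like, a file such as
~/.config/environment.d/50-path.conf (name and values here are just
placeholders) can carry what .shenv used to set:
# ~/.config/environment.d/50-path.conf (placeholder values)
PATH=$HOME/bin:$HOME/.local/bin:/usr/local/bin:/usr/bin:/bin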
Pipewire
This is a whole topic unto itself, but migrating to Wayland also
involves using Pipewire if you want screen sharing to work. You
can actually keep using Pulseaudio for audio, that said, but the
migration is something I've wanted to do anyways: Pipewire's
design seems much better than Pulseaudio's, as it folds in JACK
features, which allows for pretty neat tricks. (Which I should probably
show in a separate post, because this one is getting rather long.)
I first tried this migration in Debian bullseye, and it didn't work
very well. Ardour would fail to export tracks and I would get
into weird situations where streams would just drop mid-way.
A particularly funny incident is when I was in a meeting and I
couldn't hear my colleagues speak anymore (but they could) and I went
on blabbering on my own for a solid 5 minutes until I realized what
was going on. By then, people had tried numerous ways of letting me
know that something was off, including (apparently) coughing, saying
"hello?", chat messages, IRC, and so on, until they just gave up and
left.
I suspect that was also a Pipewire bug, but it could also have been
that I muted the tab by accident, as I recently learned that clicking on
the little tiny speaker icon on a tab mutes that tab. Since the tab
itself can get pretty small when you have lots of them, I actually
mute tabs by mistake quite frequently.
Anyways. Point is: I already knew how to make the migration, and I had
already documented how to make the change in Puppet. It's
basically:
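The Puppet code aside, on Debian bookworm the change boils down to roughly
the following (package names as found in bookworm, adjust to taste):
apt install pipewire-pulse wireplumber
apt purge pulseaudio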
An optional (but key, IMHO) configuration you should also make is to
"switch on connect", which will make your Bluetooth or USB headset
automatically be the default route for audio, when connected. In
~/.config/pipewire/pipewire-pulse.conf.d/autoconnect.conf:
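The snippet itself got eaten by the formatting; roughly, it is:
pulse.cmd = [
    { cmd = "load-module" args = "module-switch-on-connect" }
]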
See the (excellent, as usual) Arch wiki page about Pipewire for
that trick and more information about Pipewire. Note that you must
not put the file in ~/.config/pipewire/pipewire.conf (or
pipewire-pulse.conf, maybe) directly, as that will break your
setup. If you want to add to that file, first copy the template from
/usr/share/pipewire/pipewire-pulse.conf.
So far I'm happy with Pipewire in bookworm, but I've heard mixed
reports about it. I have high hopes it will become the standard media
server for Linux in the coming months or years, which is great because
I've been (rather boldly, I admit) on the record saying I don't like
PulseAudio.
Rereading this now, I feel it might have been a little unfair, as
"over-engineered and tries to do too many things at once" applies
probably even more to Pipewire than PulseAudio (since it also handles
video dispatching).
That said, I think Pipewire took the right approach by implementing
existing interfaces like Pulseaudio and JACK. That way we're not
adding a third (or fourth?) way of doing audio in Linux; we're just
making the server better.
Keypress drops
Sometimes I lose keyboard presses. This correlates with the following
warning from Sway:
déc 06 10:36:31 curie sway[343384]: 23:32:14.034 [ERROR] [wlr] [libinput] event5 - SONiX USB Keyboard: client bug: event processing lagging behind by 37ms, your system is too slow
... and corresponds to an open bug report in Sway. It seems the
"system is too slow" should really be "your compositor is too slow"
which seems to be the case here on this older system
(curie). It doesn't happen often, but it does happen,
particularly when a bunch of busy processes start in parallel (in my
case: a linter running inside a container and notmuch new).
The proposed fix for this in Sway is to gain real time privileges
and add the CAP_SYS_NICE capability to the binary. We'll see how
that goes in Debian once 1.8 gets released and shipped.
Improvements over i3
Tiling improvements
There are a lot of improvements Sway could bring over using plain
i3. There are pretty neat auto-tilers that could replicate the
configurations I used to have in Xmonad or Awesome, see:
Display latency tweaks
TODO: You can tweak the display latency in wlroots compositors with the
max_render_time parameter, possibly getting lower latency than
X11 in the end.
Sound/brightness changes notifications
TODO: Avizo can display a pop-up to give feedback on volume and
brightness changes. Not in Debian. Other alternatives include
SwayOSD and sway-nc, also not in Debian.
Debugging tricks
The xeyes tool (in the x11-apps package) will run in Wayland, and can
actually be used to easily see whether a given window is running
natively in Wayland. If the "eyes" follow the cursor, the app is
actually running in xwayland, so not natively in Wayland.
Another way to see what is using Wayland in Sway is with the command:
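The command was lost in formatting; one way to get the same information is
to filter swaymsg's tree dump with jq (this particular filter is an
illustration, not necessarily the original one):
swaymsg -t get_tree | jq -r '.. | select(.pid? and .shell?) | "\(.shell)\t\(.app_id // .name)"'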
Conclusion
In general, this took me a long time, but it mostly works. The tray
icon situation is pretty frustrating, but there's a workaround and I
have high hopes it will eventually fix itself. I'm also actually
worried about the DisplayLink support because I eventually want to
be using this, but hopefully that's another thing that will fix
itself before I need it.
A word on the security model
I'm kind of worried about all the hacks that have been added to
Wayland just to make things work. Pretty much everywhere we need to,
we punched a hole in the security model:
windows can overlay on top of each other (so one app could, for
example, spoof a password dialog, through the layer-shell
protocol)
Wikipedia describes the security properties of Wayland as follows: it
"isolates the input and output of every window, achieving
confidentiality, integrity and availability for both." I'm not sure
those are actually realized in the implementation, because of
all those holes punched in the design, at least in Sway. For example,
apparently the GNOME compositor doesn't have the virtual-keyboard
protocol, but they do have (another?!) text input protocol.
Wayland does offer a better basis to implement such a system,
however. It feels like the Linux applications security model lacks
critical decision points in the UI, like the user approving "yes, this
application can share my screen now". Applications themselves might
have some of those prompts, but it's not mandatory, and that is
worrisome.
I just did again the usual web search, and I have verified that Mastodon still does not support managing multiple domains on the same instance, and that there is still no way to migrate an account to a different instance without losing all posts (and more?).
As much as I like the idea of a federated social network, open standards and so on, I do not think that it would be wise for me to spend time developing a social network identity on somebody else's instance which could disappear at any time.
I have managed my own email server since the '90s, but I do not feel that the system administration effort required to maintain a private Mastodon instance would be justified at this point: there is not even a Debian package! Mastodon either needs to become much simpler to maintain or become much more socially important, and so far it is neither. Also, it would be wasteful to use so many computing resources for a single-user instance.
While it is not ideal, for the time being I compromised by redirecting
WebFinger requests
for md@linux.it using this Apache configuration:
<Location /.well-known/host-meta>
Header set Access-Control-Allow-Origin: "*"
Header set Content-Type: "application/xrd+json; charset=utf-8"
Header set Cache-Control: "max-age=86400"
</Location>
<Location /.well-known/webfinger>
Header set Access-Control-Allow-Origin: "*"
Header set Content-Type: "application/jrd+json; charset=utf-8"
Header set Cache-Control: "max-age=86400"
</Location>
# WebFinger (https://www.rfc-editor.org/rfc/rfc7033)
RewriteMap lc int:tolower
RewriteMap unescape int:unescape
RewriteCond %{REQUEST_URI} ^/\.well-known/webfinger$
RewriteCond ${lc:${unescape:%{QUERY_STRING}}} (?:^|&)resource=acct:([^&]+)@linux\.it(?:$|&)
RewriteRule .+ /home/soci/%1/public_html/webfinger.json [L,QSD]
# answer 404 to requests missing "acct:" or for domains != linux.it
RewriteCond %{REQUEST_URI} ^/\.well-known/webfinger$
RewriteCond ${unescape:%{QUERY_STRING}} (?:^|&)resource=
RewriteRule .+ - [L,R=404]
# answer 400 to requests without the resource parameter
RewriteCond %{REQUEST_URI} ^/\.well-known/webfinger$
RewriteRule .+ - [L,R=400]
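To verify the result, a WebFinger query like the following (assuming the configuration is deployed on the host serving linux.it) should return the JRD document:
curl -s 'https://linux.it/.well-known/webfinger?resource=acct:md@linux.it'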
If you've done anything in the Kubernetes space in recent years, you've most likely come across the words "Service Mesh". It's backed by a set of mature technologies that provide cross-cutting networking, security and infrastructure capabilities to workloads running in Kubernetes, in a manner that is transparent to the actual workload. This abstraction enables application developers to not worry about building in otherwise sophisticated capabilities for networking, routing, circuit-breaking and security, and to simply rely on the services offered by the service mesh.
In this post, I'll be covering Linkerd, which is an alternative to Istio. It went through a significant re-write when it transitioned from the JVM to a Go-based control plane and a Rust-based data plane a few years back. It is now part of the CNCF and is backed by Buoyant. It has proven itself widely for use in production workloads and has a healthy community and release cadence.
It achieves this with a side-car container that communicates with a Linkerd control plane, which allows central management of policy, telemetry, mutual TLS, traffic routing, shaping, retries, load balancing, circuit-breaking and other cross-cutting concerns before the traffic hits the container. This has made the task of implementing application services much simpler, as those concerns are managed by the container orchestrator and the service mesh. I covered Istio in a prior post a few years back, and much of that content is still applicable here, if you'd like to have a look.
Here are the broad architectural components of Linkerd: the components are separated into the control plane and the data plane.
The control plane components live in their own namespace and consist of a controller that the Linkerd CLI interacts with via the Kubernetes API. The destination service is used for service discovery, TLS identity, policy on access control for inter-service communication, and service profile information on routing, retries and timeouts. The identity service acts as the Certificate Authority which responds to Certificate Signing Requests (CSRs) from proxies for initialization and for service-to-service encrypted traffic. The proxy injector is an admission webhook that injects the Linkerd proxy side-car and the init container automatically into a pod when the linkerd.io/inject: enabled annotation is present on the namespace or workload.
On the data plane side are two components. First, the init container, which is responsible for automatically forwarding incoming and outgoing traffic through the Linkerd proxy via iptables rules. Second, the Linkerd proxy, which is a lightweight micro-proxy written in Rust, is the data plane itself.
I will be walking you through the setup of Linkerd (2.12.2 at the time of writing) on a Kubernetes cluster.
Let's see what's running on the cluster currently. This assumes you have a cluster running and kubectl is installed and available on the PATH.
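Neither the cluster listing nor the CLI installation command survived the formatting; the usual commands (the installer line is the standard one from the Linkerd docs) look like this:
kubectl get pods --all-namespaces
curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/install | sh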
On most systems, this should be sufficient to set up the CLI. You may need to restart your terminal to load the updated paths. If you have a non-standard configuration and linkerd is not found after the installation, add the following to your PATH to be able to find the cli:
export PATH=$PATH:~/.linkerd2/bin/
At this point, checking the version would give you the following:
$ linkerd version
Client version: stable-2.12.2
Server version: unavailable
Setting up Linkerd Control Plane
Before installing Linkerd on the cluster, run the following step to check the cluster for pre-requisites:
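The check itself is run with the CLI (this is the standard command from the Linkerd docs):
linkerd check --pre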
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

pre-kubernetes-setup
--------------------
√ control plane namespace does not already exist
√ can create non-namespaced resources
√ can create ServiceAccounts
√ can create Services
√ can create Deployments
√ can create CronJobs
√ can create ConfigMaps
√ can create Secrets
√ can read Secrets
√ can read extension-apiserver-authentication configmap
√ no clock skew detected

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

Status check results are √
All the pre-requisites appear to be good right now, so the installation can proceed.
The first step of the installation is to set up the Custom Resource Definitions (CRDs) that Linkerd requires. The linkerd cli only prints the resource YAMLs to standard output and does not create them directly in Kubernetes, so you need to pipe the output to kubectl apply to create the resources in the cluster that you're working with.
$ linkerd install --crds | kubectl apply -f -
Rendering Linkerd CRDs...
Next, run linkerd install | kubectl apply -f - to install the control plane.

customresourcedefinition.apiextensions.k8s.io/authorizationpolicies.policy.linkerd.io created
customresourcedefinition.apiextensions.k8s.io/httproutes.policy.linkerd.io created
customresourcedefinition.apiextensions.k8s.io/meshtlsauthentications.policy.linkerd.io created
customresourcedefinition.apiextensions.k8s.io/networkauthentications.policy.linkerd.io created
customresourcedefinition.apiextensions.k8s.io/serverauthorizations.policy.linkerd.io created
customresourcedefinition.apiextensions.k8s.io/servers.policy.linkerd.io created
customresourcedefinition.apiextensions.k8s.io/serviceprofiles.linkerd.io created
Next, install the Linkerd control plane components in the same manner, this time without the --crds switch:
$ linkerd install | kubectl apply -f -
namespace/linkerd created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-identity created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-identity created
serviceaccount/linkerd-identity created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-destination created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-destination created
serviceaccount/linkerd-destination created
secret/linkerd-sp-validator-k8s-tls created
validatingwebhookconfiguration.admissionregistration.k8s.io/linkerd-sp-validator-webhook-config created
secret/linkerd-policy-validator-k8s-tls created
validatingwebhookconfiguration.admissionregistration.k8s.io/linkerd-policy-validator-webhook-config created
clusterrole.rbac.authorization.k8s.io/linkerd-policy created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-destination-policy created
role.rbac.authorization.k8s.io/linkerd-heartbeat created
rolebinding.rbac.authorization.k8s.io/linkerd-heartbeat created
clusterrole.rbac.authorization.k8s.io/linkerd-heartbeat created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-heartbeat created
serviceaccount/linkerd-heartbeat created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-proxy-injector created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-proxy-injector created
serviceaccount/linkerd-proxy-injector created
secret/linkerd-proxy-injector-k8s-tls created
mutatingwebhookconfiguration.admissionregistration.k8s.io/linkerd-proxy-injector-webhook-config created
configmap/linkerd-config created
secret/linkerd-identity-issuer created
configmap/linkerd-identity-trust-roots created
service/linkerd-identity created
service/linkerd-identity-headless created
deployment.apps/linkerd-identity created
service/linkerd-dst created
service/linkerd-dst-headless created
service/linkerd-sp-validator created
service/linkerd-policy created
service/linkerd-policy-validator created
deployment.apps/linkerd-destination created
cronjob.batch/linkerd-heartbeat created
deployment.apps/linkerd-proxy-injector created
service/linkerd-proxy-injector created
secret/linkerd-config-overrides created
Kubernetes will start spinning up the control plane components and you should see the following when you list the pods:
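The pod listing itself is missing here; something like the first command below shows the control plane pods, and the check output reproduced next comes from the second:
kubectl get pods -n linkerd
linkerd check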
kubernetes-api
--------------
√ can initialize the client
√ can query the Kubernetes API

kubernetes-version
------------------
√ is running the minimum Kubernetes API version
√ is running the minimum kubectl version

linkerd-existence
-----------------
√ 'linkerd-config' config map exists
√ heartbeat ServiceAccount exist
√ control plane replica sets are ready
√ no unschedulable pods
√ control plane pods are ready
√ cluster networks contains all pods
√ cluster networks contains all services

linkerd-config
--------------
√ control plane Namespace exists
√ control plane ClusterRoles exist
√ control plane ClusterRoleBindings exist
√ control plane ServiceAccounts exist
√ control plane CustomResourceDefinitions exist
√ control plane MutatingWebhookConfigurations exist
√ control plane ValidatingWebhookConfigurations exist
√ proxy-init container runs as root user if docker container runtime is used

linkerd-identity
----------------
√ certificate config is valid
√ trust anchors are using supported crypto algorithm
√ trust anchors are within their validity period
√ trust anchors are valid for at least 60 days
√ issuer cert is using supported crypto algorithm
√ issuer cert is within its validity period
√ issuer cert is valid for at least 60 days
√ issuer cert is issued by the trust anchor

linkerd-webhooks-and-apisvc-tls
-------------------------------
√ proxy-injector webhook has valid cert
√ proxy-injector cert is valid for at least 60 days
√ sp-validator webhook has valid cert
√ sp-validator cert is valid for at least 60 days
√ policy-validator webhook has valid cert
√ policy-validator cert is valid for at least 60 days

linkerd-version
---------------
√ can determine the latest version
√ cli is up-to-date

control-plane-version
---------------------
√ can retrieve the control plane version
√ control plane is up-to-date
√ control plane and cli versions match

linkerd-control-plane-proxy
---------------------------
√ control plane proxies are healthy
√ control plane proxies are up-to-date
√ control plane proxies and cli versions match

Status check results are √
Everything looks good.
Setting up the Viz Extension
At this point, the required components for the service mesh are set up, but let's also install the viz extension, which provides good visualization capabilities that will come in handy subsequently. Once again, linkerd uses the same pattern for installing the extension.
$ linkerd viz install | kubectl apply -f -
namespace/linkerd-viz created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-viz-metrics-api created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-viz-metrics-api created
serviceaccount/metrics-api created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-viz-prometheus created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-viz-prometheus created
serviceaccount/prometheus created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-viz-tap created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-viz-tap-admin created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-viz-tap created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-viz-tap-auth-delegator created
serviceaccount/tap created
rolebinding.rbac.authorization.k8s.io/linkerd-linkerd-viz-tap-auth-reader created
secret/tap-k8s-tls created
apiservice.apiregistration.k8s.io/v1alpha1.tap.linkerd.io created
role.rbac.authorization.k8s.io/web created
rolebinding.rbac.authorization.k8s.io/web created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-viz-web-check created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-viz-web-check created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-viz-web-admin created
clusterrole.rbac.authorization.k8s.io/linkerd-linkerd-viz-web-api created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-linkerd-viz-web-api created
serviceaccount/web created
server.policy.linkerd.io/admin created
authorizationpolicy.policy.linkerd.io/admin created
networkauthentication.policy.linkerd.io/kubelet created
server.policy.linkerd.io/proxy-admin created
authorizationpolicy.policy.linkerd.io/proxy-admin created
service/metrics-api created
deployment.apps/metrics-api created
server.policy.linkerd.io/metrics-api created
authorizationpolicy.policy.linkerd.io/metrics-api created
meshtlsauthentication.policy.linkerd.io/metrics-api-web created
configmap/prometheus-config created
service/prometheus created
deployment.apps/prometheus created
service/tap created
deployment.apps/tap created
server.policy.linkerd.io/tap-api created
authorizationpolicy.policy.linkerd.io/tap created
clusterrole.rbac.authorization.k8s.io/linkerd-tap-injector created
clusterrolebinding.rbac.authorization.k8s.io/linkerd-tap-injector created
serviceaccount/tap-injector created
secret/tap-injector-k8s-tls created
mutatingwebhookconfiguration.admissionregistration.k8s.io/linkerd-tap-injector-webhook-config created
service/tap-injector created
deployment.apps/tap-injector created
server.policy.linkerd.io/tap-injector-webhook created
authorizationpolicy.policy.linkerd.io/tap-injector created
networkauthentication.policy.linkerd.io/kube-api-server created
service/web created
deployment.apps/web created
serviceprofile.linkerd.io/metrics-api.linkerd-viz.svc.cluster.local created
serviceprofile.linkerd.io/prometheus.linkerd-viz.svc.cluster.local created
A few seconds later, you should see the following in your pod list:
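The listing is again missing; it would come from something like:
kubectl get pods -n linkerd-viz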
The viz components live in the linkerd-viz namespace.
You can now check out the viz dashboard:
$ linkerd viz dashboard
Linkerd dashboard available at:
http://localhost:50750
Grafana dashboard available at:
http://localhost:50750/grafana
Opening Linkerd dashboard in the default browser
Opening in existing browser session.
The Meshed column indicates the workloads that are currently integrated with the Linkerd control plane. As you can see, there are no application deployments running right now.
Injecting the Linkerd Data Plane components
There are two ways to integrate Linkerd with the application containers:
1. by manually injecting the Linkerd data plane components
2. by instructing Kubernetes to automatically inject the data plane components
Inject Linkerd data plane manually
Let's try the first option. Below is a simple nginx-app that I will deploy into the cluster:
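The manifest itself was lost in formatting; a minimal stand-in matching the description would be something like this (names and image tag are placeholders of mine):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-app
  labels:
    app: nginx-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-app
  template:
    metadata:
      labels:
        app: nginx-app
    spec:
      containers:
      - name: nginx
        image: nginx:1.23
        ports:
        - containerPort: 80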
Back in the viz dashboard, I do see the workload deployed, but it isn't currently communicating with the Linkerd control plane, and so doesn't show any metrics, and the Meshed count is 0.
Looking at the Pod's deployment YAML, I can see that it only includes the nginx container:
Let's directly inject the linkerd data plane into this running container. We do this by retrieving the YAML of the deployment, piping it to the linkerd cli to inject the necessary components, and then piping the changed resources to kubectl apply.
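The pipeline in question looks like this (nginx-app being the hypothetical deployment name from the example above):
kubectl get deployment nginx-app -o yaml | linkerd inject - | kubectl apply -f -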
Back in the viz dashboard, the workload is now integrated into the Linkerd control plane.
Looking at the updated Pod definition, we see a number of changes that linkerd has injected to let the workload integrate with the control plane. Let's have a look:
At this point, the necessary components are set up for you to explore Linkerd further. You can also try out the jaeger and multicluster extensions, similar to the process of installing and using the viz extension, and try out their capabilities.
Inject Linkerd data plane automatically
In this approach, we shall see how to instruct Kubernetes to automatically inject the Linkerd data plane into workloads at deployment time.
We can achieve this by adding the linkerd.io/inject annotation to the deployment descriptor, which causes the proxy injector admission hook to execute and inject the linkerd data plane components automatically at the time of deployment.
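As a minimal illustration, on the hypothetical nginx-app deployment from above the annotation goes on the pod template:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: nginx-app
  template:
    metadata:
      annotations:
        linkerd.io/inject: enabled
      labels:
        app: nginx-app
    spec:
      containers:
      - name: nginx
        image: nginx:1.23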
This annotation can also be specified at the namespace level to affect all the workloads within the namespace. Note that any resources created before the annotation was added to the namespace will require a rollout restart to trigger the injection of the Linkerd components.
Uninstalling Linkerd
Now that we have walked through the installation and setup process of Linkerd, let's also cover how to remove it from the infrastructure and go back to the state prior to its installation. The first step would be to remove extensions, such as viz.
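Per the Linkerd documentation, the removal follows the same render-and-pipe pattern as the installation, roughly:
linkerd viz uninstall | kubectl delete -f -
linkerd uninstall | kubectl delete -f -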
The eighteenth release of littler as a CRAN package just landed, following in the now sixteen-year history (!!) of the package started by Jeff in 2006, and joined by me a few weeks later.
littler is the first command-line interface for R as it predates Rscript. It allows for piping as well as for shebang scripting via #!, uses command-line arguments more consistently, and still starts faster. It also always loaded the methods package, which Rscript only started to do in recent years.
littler lives on Linux and Unix, has its difficulties on macOS due to yet-another-braindeadedness there (who ever thought case-insensitive filesystems as a default were a good idea?) and simply does not exist on Windows (yet the build system could be extended see RInside for an existence proof, and volunteers are welcome!). See the FAQ vignette on how to add it to your PATH. A few examples are highlighted at the Github repo, as well as in the examples vignette.
This release, coming just a few weeks since the last release in August, heeds clang-15 and updates one signature to a proper interface. It also contains one kindly contributed patch updating install2.r (and installBioc.r) to cope with a change in R-devel.
The full change description follows.
Changes in littler version 0.3.17 (2022-10-29)
Changes in package
An internal function prototype was updated for clang-15.
Changes in examples
The install2.r and installBioc.r were updated for an update in R-devel (Tatsuya Shima and Dirk in #104).
My CRANberries service provides a comparison to the previous release. Full details for the littler release are provided as usual at the ChangeLog page, and also on the package docs website. The code is available via the GitHub repo, from tarballs and now of course also from its CRAN page and via install.packages("littler"). Binary packages are available directly in Debian as well as soon via Ubuntu binaries at CRAN thanks to the tireless Michael Rutter.
Comments and suggestions are welcome at the GitHub repo.
If you like this or other open-source work I do, you can now sponsor me at GitHub.
This is the 22nd Discworld novel and follows Interesting Times in internal continuity. Like some of the other
Rincewind novels, it stands alone well enough that you could arguably
start reading here, but I have no idea why you'd want to.
When we last saw Rincewind, he was being magically yanked out of the
Agatean Empire. The intent was to swap him with a cannon and land him
back in Ankh-Morpork, but an unfortunate expansion of the spell to three
targets instead of two meant that a kangaroo had a very bad day. Ever
since then, Rincewind has been trying to survive the highly inhospitable
land of FourEcks (XXXX), so called because no one in Ankh-Morpork knows
where it is.
The faculty at the Unseen University didn't care enough about Rincewind to
bother finding him until the Librarian fell sick. He's feverish and
miserable, but worse, he's lost control of his morphic function, which
means that he's randomly turning into other things and is unable to take
care of the books. When those books are magical, this is dangerous. One
possible solution is to stabilize the Librarian's form with a spell, but
to do that they need his real name. The only person who might know it is
the former assistant librarian: Rincewind.
I am increasingly convinced that one of the difficulties in getting people
hooked on Discworld is that the series starts with two Rincewind books,
and the Rincewind books just aren't very good.
The fundamental problem is that Rincewind isn't a character, he's a gag.
Discworld starts out as mostly gags, but then the characterization
elsewhere gets deeper, the character interactions become more complex, and
Pratchett adds more and more philosophy. That, not the humor, is what I
think makes these books worth reading. But none of this applies to
Rincewind. By this point, he's been the protagonist of six novels, and
still the only thing I know about him is that he runs away from
everything. Other than that, he's just sort of... there.
In the better Rincewind novels, some of the gap is filled by Twoflower,
the Luggage, Cohen the barbarian, the Librarian (who sadly is out of
commission for most of this book), or the Unseen University faculty. But
they're all supporting characters. Most of them are also built around a
single (if better) gag. As a result, the Rincewind books tend more
towards joke collections than the rest of Discworld. There isn't a
philosophical or characterization through line to hold them together.
The Last Continent is, as you might have guessed, a parody of
Australia. And by that I mean it's a mash-up of Crocodile Dundee,
Mad Max, The Adventures of Priscilla, Queen of the Desert,
and every dad joke about Australia that you've heard. Pratchett loves
movie references and I do not love movie references, so there's always
part of his books that doesn't click for me, but this one was just Too
Much. Yes, everything in Australia is poisonous. Yes, Australians talk
funny. Oh look, there's another twist on a Crocodile Dundee quote.
Yes, yes, that's a knife. Gah. The Rincewind sections were either
confusing (there's some sort of drug-trip kangaroo god because reasons) or
cliched and boring. Sometimes both.
The second plot, following the Unseen University faculty in their inept
attempts to locate Rincewind, is better. Their bickering is still a bit
one-trick and works better in the background of stronger characters (such
as Death and Susan), but Pratchett does make
their oblivious overconfidence entertaining. It's enough to sustain half
of the book, but not enough to make up for the annoyances of the Rincewind
plot.
To his credit, I think Pratchett was really trying to say something
interesting in this novel about Discworld metaphysics. There are bits in
the Australian plot that clearly are references to Aboriginal beliefs,
which I didn't entirely follow but which I'm glad were in there. The
Unseen University faculty showing up in the middle of a creation myth and
completely misunderstanding it was a good scene. But the overall story
annoyed me and failed to hold my interest.
I don't feel qualified to comment on the Priscilla scenes, since
I've never seen the movie and have only a vague understanding of its role
in trans history. I'm not sure his twists on the story quite worked, but
I'm glad that Pratchett is exploring gender; that wasn't as common when
these books were written.
Overall, though, this was forgettable and often annoying. There are a few
great lines and a few memorable bits in any Pratchett book, including this
one, but the Rincewind books just aren't... good. Not like the rest of
the series, at least. I will be very happy to get back to the witches in
the next book.
Followed in publication order by Carpe Jugulum, and later
thematically by The Last Hero.
Rating: 5 out of 10
In my home lab(s), I have a handful of machines spread around a few
points of presence, with mostly residential/commercial cable/DSL
uplinks, which means, generally, NAT. This makes monitoring those
devices kind of impossible. While I do punch holes for SSH, using jump
hosts gets old quick, so I'm considering adding a virtual private
network (a "VPN", not a VPN service) so that all machines can
be reachable from everywhere.
I see three ways this can work:
a home-made Wireguard VPN, deployed with Puppet
a Wireguard VPN overlay, with Tailscale or equivalent
IPv6, native or with tunnels
So which one will it be?
Wireguard Puppet modules
As is (unfortunately) typical with Puppet, I found multiple different
modules to talk with Wireguard.
The voxpupuli module seems to be the most promising. The
abaranov module is more popular and has more contributors, but
it has more open issues and PRs.
More critically, the voxpupuli module was written after the abaranov
author didn't respond to a PR from the voxpupuli author trying to
add more automation (namely private key management).
It looks like setting up a wireguard network would be as simple as
this on node A:
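The Puppet snippet itself was mangled away; the general shape is something like the following (parameter names, fact lookup, and addresses are illustrative and should be checked against the voxpupuli module's README):
wireguard::interface { 'wg0':
  # public IPv4 of node A; not a documentation address on purpose, see below
  source_addresses => ['1.2.3.4'],
  # public key of the peer, as exported by the module's fact
  public_key       => $facts['wireguard_pubkeys']['nodeb.example.com'],
  endpoint         => 'nodeb.example.com:51820',
  addresses        => [{'Address' => '192.168.123.6/30'}],
}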
This configuration comes from this pull request I sent to the
module to document how to use that fact.
Note that the addresses used here are examples that shouldn't be
reused and do not conform to RFC5737 ("IPv4 Address Blocks
Reserved for Documentation", 192.0.2.0/24 (TEST-NET-1),
198.51.100.0/24 (TEST-NET-2), and 203.0.113.0/24 (TEST-NET-3)) or
RFC3849 ("IPv6 Address Prefix Reserved for Documentation",
2001:DB8::/32), but that's another story.
(To avoid bootstrapping problems, the resubmit-facts configuration
could be used so that other nodes' facts are more immediately
available.)
One problem with the above approach is that you explicitly need to
take care of routing, network topology, and addressing. This can get
complicated quickly, especially if you have lots of devices, behind
NAT, in multiple locations (which is basically my life at home,
unfortunately).
Concretely, basic Wireguard only supports one peer behind
NAT. There are some workarounds for this, but they generally imply
a relay server of some sort, or some custom registry; it's
kind of a mess. And this is where overlay networks like Tailscale come
in.
Tailscale
Tailscale is basically designed to deal with this problem. It's
not fully opensource, but pretty close, and they have an interesting
philosophy behind that. The client is opensource, and there
is an opensource version of the server side, called
headscale. They have recently (late 2022) hired the main headscale
developer while promising to keep supporting it, which is pretty
amazing.
Tailscale provides an overlay network based on Wireguard, where each
peer basically has a peer-to-peer encrypted connection, with automatic
key rotation. They also ship a multitude of applications and features
on top of that like file sharing, keyless SSH access, and so on. The
authentication layer is based on an existing SSO provider: you
don't just register a new account with Tailscale, you log in with
Google, Microsoft, or GitHub (which, really, is still Microsoft).
The Headscale server ships with many features out of the box:
Neither project (client or server) is in Debian (RFP 972439 for
the client, none filed yet for the server), which makes deploying this
for my use case rather problematic. Their install instructions
are basically a curl bash but they also provide packages for
various platforms. Their Debian install instructions are
surprisingly good, and check most of the third party checklist
we're trying to establish. (It's missing a pin.)
There's also a Puppet module for tailscale, naturally.
What I find a little disturbing with Tailscale is that you not only
need to trust Tailscale with authorizing your devices, you also
basically delegate that trust to the SSO provider. So, in my
case, GitHub (or anyone who compromises my account there) can
penetrate the VPN. A little scary.
Tailscale is also kind of an "all or nothing" thing. They have
MagicDNS, file transfers, all sorts of things, but those things
require you to hook up your resolver with Tailscale. In fact,
Tailscale kind of assumes you will use their nameservers, and they
have gone to great lengths to figure out how to do that. And
naturally, here, it doesn't seem to work reliably; my resolv.conf
somehow gets replaced and the magic resolution of the ts.net domain
fails.
(I wonder why we can't opt in to just publicly resolve the ts.net
domain. I don't care if someone can enumerate the private IP addresses
or machines in use in my VPN, at least I don't care as much as
fighting with resolv.conf everywhere.)
Because I mostly have access to the routers on the networks I'm on, I
don't think I'll be using tailscale in the long term. But it's pretty
impressive stuff: in the time it took me to even review the Puppet
modules to configure Wireguard (which is what I'll probably end up
doing), I was up and running with Tailscale (but with a broken DNS,
naturally).
(And yes, basic Wireguard won't bring me DNS either, but at least I
won't have to trust Tailscale's Debian packages, and Tailscale, and
Microsoft, and GitHub with this thing.)
IPv6
IPv6 is actually what is supposed to solve this. Not NAT port
forwarding crap, just real IPs everywhere.
The problem is: even though IPv6 adoption is still growing, it's
kind of reaching a plateau at around 40% world-wide, with Canada
lagging behind at 34%. It doesn't help that major ISPs in Canada
(e.g. Bell Canada, Videotron) don't care at all about IPv6
(e.g. Videotron has been in beta since 2011). So we can't rely on
those companies to do the right thing here.
The typical solution here is often to use a tunnel like HE's
tunnelbroker.net. It's kind of tricky to configure, but once
it's done, it works. You get end-to-end connectivity as long as
everyone on the network is on IPv6.
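For the record, on a plain Linux router the tunnel configuration HE hands
you boils down to a handful of ip commands, roughly like this (all
addresses below are placeholders):
ip tunnel add he-ipv6 mode sit remote 198.51.100.1 local 203.0.113.2 ttl 255
ip link set he-ipv6 up
ip addr add 2001:db8:1234::2/64 dev he-ipv6
ip route add ::/0 dev he-ipv6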
And that's really where the problem lies here: the second one of
your nodes can't set up such a tunnel, you're kind of stuck and that
tool completely breaks down. IPv6 tunnels also don't give you the kind
of security a VPN provides, naturally.
The other downside of a tunnel is you don't really get peer-to-peer
connectivity: you go through the tunnel. So you can expect higher
latencies and possibly lower bandwidth as well. Also, HE.net doesn't
currently charge for this service (and they've been doing this for a
long time), but this could change in the future (just like
Tailscale, that said).
Concretely, the latency difference is rather minimal, Google:
--- ipv6.l.google.com ping statistics ---
10 packets transmitted, 10 received, 0,00% packet loss, time 136,8ms
RTT[ms]: min = 13, median = 14, p(90) = 14, max = 15
--- google.com ping statistics ---
10 packets transmitted, 10 received, 0,00% packet loss, time 136,0ms
RTT[ms]: min = 13, median = 13, p(90) = 14, max = 14
In the case of GitHub, latency is actually lower, interestingly:
--- ipv6.github.com ping statistics ---
10 packets transmitted, 10 received, 0,00% packet loss, time 134,6ms
RTT[ms]: min = 13, median = 13, p(90) = 14, max = 14
--- github.com ping statistics ---
10 packets transmitted, 10 received, 0,00% packet loss, time 293,1ms
RTT[ms]: min = 29, median = 29, p(90) = 29, max = 30
That is because HE.net peers directly with my ISP and Fastly (which
is behind GitHub.com's IPv6, apparently?), so it's only 6 hops
away. While over IPv4, the ping goes over New York, before landing
in AWS's Ashburn, Virginia datacenters, for a whopping 13 hops...
I managed to set up a HE.net tunnel at home, because I also need IPv6
for other reasons (namely debugging at work). My first attempt at
setting this up in the office failed, but now that I found the
openwrt.org guide, it worked... for a while, and I was able to
produce the above, encouraging, mini benchmarks.
Unfortunately, a few minutes later, IPv6 just went down again. And the
problem with that is that many programs (and especially
OpenSSH) do not respect the Happy Eyeballs protocol (RFC
8305), which means various mysterious "hangs" at random times on
random applications. It's kind of a terrible user experience, on top
of breaking the one thing it's supposed to do, of course, which is to
give me transparent access to all the nodes I maintain.
Even worse, it would still be a problem for other remote nodes I might
set up where I might not have access to the router to set up the
tunnel. It's also not absolutely clear what happens if you set up the
same tunnel in two places... Presumably, something is smart enough to
distribute only a part of the /48 block selectively, but I don't
really feel like going that far, considering how flaky the setup is
already.
Other options
If this post sounds a little biased towards IPv6 and Wireguard, it's
because it is. I would like everyone to migrate to IPv6 already, and
Wireguard seems like a simple and sound system.
I'm aware of many other options to make VPNs. So before anyone jumps
in and says "but what about...", do know that I have personally
experimented with:
ipsec, specifically strongswan: hard to configure
(especially configure correctly!), harder even to debug, otherwise
really nice because transparent (e.g. no need for special subnets),
used at work, but also considering a replacement there because it's
a major barrier to entry to train new staff
OpenVPN: mostly used as a client for VPN services like
Riseup VPN or Mullvad, mostly relevant for client-server
configurations, not really peer-to-peer, shared secrets or TLS,
kind of a hassle to maintain, see also SoftEther for an
alternative implementation
All of those solutions have significant problems and I do not wish to
use any of those for this project.
Also note that Tailscale is only one of many projects laid over
Wireguard to do that kind of thing, see this LWN review for
others (basically NetBird, Firezone, and Netmaker).
Future work
Those are options that came up after writing this post, and might
warrant further examination in the future.
Meshbird, a "distributed private networking" with little
information about how it actually works other than "encrypted with
strong AES-256"
Nebula, "A scalable overlay networking tool with a focus on
performance, simplicity and security", written by Slack people to
replace IPsec, docs, runs as an overlay for Slack's 50k
node network, only packaged in Debian experimental, lagging behind
upstream (1.4.0, from May 2021 vs upstream's 1.6.1 from September
2022), requires a central CA, Golang, I'm in "wait and see" mode
for now
ouroboros: "peer-to-peer packet network prototype", sounds and
seems complicated
QuickTUN is interesting because it's just a small wrapper
around NaCL, and it's in Debian... but maybe too obscure for my own
good
unetd: Wireguard-based full mesh networking from OpenWRT, not
in Debian
vpncloud: "high performance peer-to-peer mesh VPN over UDP
supporting strong encryption, NAT traversal and a simple
configuration", sounds interesting, not in Debian
Conclusion
Right now, I'm going to deploy Wireguard tunnels with Puppet. It seems
like kind of a pain in the back, but it's something I will be able to
reuse for work, possibly completely replacing strongswan.
I have another Puppet module for IPsec which I was planning to
publish, but now I'm thinking I should just abort that and replace
everything with Wireguard, assuming we still need VPNs at work in the
future. (I have a number of reasons to believe we might not need any
in the near future anyways...)