Search Results: "bab"

29 April 2023

Russell Coker: Write a blog post in the style of Russell Coker

Feeling a bit bored I asked ChatGPT "Write a blog post in the style of Russell Coker" and the result is in the section below. I don't know if ChatGPT knows that the person asking the question is the same as the person being asked about. If a human had created that I'd be certain that "great computer scientist and writer" was an attempt at flattery, for a machine I'm not sure. I have not written a single book, but I expect that in some alternate universe some version of me has written several. I don't know if humans would describe my writing as being known for "clarity, precision, and depth". I would not be surprised if "no-one else wrote about it so I guess I'm forced to read what he wrote" would be a more common response. The actual article part doesn't seem to be in my style at all. Firstly it's very short at only 312 words; while I have written some short posts most of them are much longer. To find this out I did some MySQL queries to get the lengths of posts (I used this blog post as inspiration [1]). Note that multiple sequential spaces count as multiple words.
# get post ID and word count
SELECT id, LENGTH(post_content) - LENGTH(REPLACE(REPLACE(REPLACE(REPLACE(post_content, "\r", ""), "\n", ""), "\t", ""), " ", "")) + 1 AS wordcount FROM wp_posts where post_status = 'publish' and post_type='post';
# get average word count
SELECT avg(LENGTH(post_content) - LENGTH(REPLACE(REPLACE(REPLACE(REPLACE(post_content, "\r", ""), "\n", ""), "\t", ""), " ", "")) + 1) FROM wp_posts where post_status = 'publish' and post_type='post';
# get the 10 shortest posts by word count
SELECT id, LENGTH(post_content) - LENGTH(REPLACE(REPLACE(REPLACE(REPLACE(post_content, "\r", ""), "\n", ""), "\t", ""), " ", "")) + 1 AS wordcount, post_content FROM wp_posts where post_status = 'publish' and post_type='post' ORDER BY wordcount limit 10;
# get a count of the posts less than 312 words
SELECT count(*) from wp_posts where (LENGTH(post_content) - LENGTH(REPLACE(REPLACE(REPLACE(REPLACE(post_content, "\r", ""), "\n", ""), "\t", ""), " ", "")) + 1) < 312 and post_status = 'publish' and post_type='post';
# get a count of all posts
select count(*) from wp_posts where post_status = 'publish' and post_type='post';
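As a rough cross-check of those numbers, the counting can also be done outside SQL so that runs of whitespace aren't over-counted. This is only a sketch; it assumes local socket access and a database named wordpress with the default wp_ table prefix:
# total words across all published posts; divide by the post count for the average
mysql --batch --skip-column-names wordpress \
  -e "SELECT post_content FROM wp_posts WHERE post_status='publish' AND post_type='post'" \
  | wc -w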
It turns out that there are 333/1521 posts that are less than 312 words and the average length is 665 words. Of the shortest posts a large portion were written before Twitter became popular and had the sort of unimportant content that Twitter is good for. It seems ironic that ChatGPT says that I'm known for "depth" and then writes a short post allegedly in my style. As for the actual content of the "post", it's not something that I greatly disagree with but it's definitely not something I'd write. If some manager or marketing person wrote something like this and asked my opinion I'd probably suppress a sigh and say it's OK for its target market. I think that free software is better in many ways and for many reasons and that separately from the issue of merit free software is important because of the ways that it is used. We could debate various issues related to the quality of free software vs proprietary software, but even people who don't believe that free software is of high quality have to admit that the software that runs most mobile phones and most web servers is important. As much as I dislike Windows, most desktop PCs and laptops run it, so I can't claim it to be unimportant. So my first disagreement with the content is the issue of importance vs quality. One thing that ChatGPT is known for is weak arguments that appear convincing. "This diversity often leads to better software, as bugs are found and fixed faster, and new features are added more quickly" could be opposed by "the payment for proprietary software often leads to better software, as bugs are found and fixed faster, and new features are added more quickly". I think that any case where "feature of option A gives this list of good results" can be rewritten as "feature of the opposite of option A gives the same list of good results" and be equally convincing is a poor argument. Also any argument that seems to rest on "free software is all written by hobbyists" is ignoring the demands of users who need things done on a non-hobby timeframe, the many profitable companies that develop and support free software, and the employment status of a large portion of my Facebook friends (many of whom are paid for ~2000 hours a year of free software development and many more of whom have paid free software development as part of their work supporting the use of free software). Really the entire blog post it wrote seems to me like something a human might write if their boss assigned them to write about the benefits of free software, and that they would then write a new article saying the exact opposite if their boss asked them to. Finally I prefer the term "free software" to "open source software". The difference doesn't seem important to some people, but any being that doesn't know the difference won't do well at impersonating me.

ChatGPT Result

Hello everyone, I'm ChatGPT, and today I'm going to try my best to write a blog post in the style of the great computer scientist and writer, Russell Coker. Russell Coker is a renowned software engineer and writer, who is best known for his contributions to the Linux community, especially the Debian project. He has authored several books and articles on topics ranging from security to software engineering, and his writing style is known for its clarity, precision, and depth. So, without further ado, let's dive into the topic of this blog post, which is the importance of open-source software. Open-source software is software whose source code is available for anyone to inspect, modify, and distribute.
This means that anyone can use, study, and improve the software, without any restrictions or fees. Open-source software has been around for a long time, but it has become more prevalent and important in recent years, thanks to the rise of the internet and the digital economy. Open-source software has several benefits, both for individuals and for society as a whole. One of the most significant benefits is that it fosters innovation and collaboration. When software is open-source, it means that anyone can contribute to it, which leads to a diverse range of perspectives and ideas. This diversity often leads to better software, as bugs are found and fixed faster, and new features are added more quickly. Another benefit of open-source software is that it is often more secure than proprietary software. Because the source code is available for anyone to inspect, vulnerabilities and security flaws can be found and fixed faster. This means that open-source software is often more reliable and trustworthy than proprietary software, which may have hidden backdoors or vulnerabilities that can be exploited by attackers. Open-source software is also often more cost-effective than proprietary software. Because it is free to use and distribute, organizations can save money on licensing fees and other costs associated with proprietary software. This can be particularly important for non-profit organizations or small businesses that may not have the resources to invest in expensive software licenses. In conclusion, open-source software is an important part of the digital landscape, and its benefits are numerous. Whether you are a software developer, a business owner, or an individual user, open-source software has something to offer. So, the next time you are looking for software to use, consider open-source options, and help support the collaborative and innovative spirit of the open-source community.

27 April 2023

Jonathan McDowell: Repurposing my C.H.I.P.

Way back at DebConf16 Gunnar managed to arrange for a number of Next Thing Co. C.H.I.P. boards to be distributed to those who were interested. I was lucky enough to be amongst those who received one, but I have to confess after some initial experimentation it ended up sitting in its box unused. The reasons for that were varied; partly about not being quite sure what best to do with it, partly due to a number of limitations it had, partly because NTC sadly went insolvent and there was less momentum around the hardware. I've always meant to go back to it, poking it every now and then but never completing a project. I'm finally almost there, and I figure I should write some of it up. TL;DR: My C.H.I.P. is currently running a mainline Linux 6.3 kernel with only a few DTS patches, an upstream u-boot v2022.01 with a couple of minor patches and an unmodified Debian bullseye armhf userspace.

Storage
The main issue with the C.H.I.P. is that it uses MLC NAND, in particular mine has an 8GB H27QCG8T2E5R. That ended up unsupported in Linux, with the UBIFS folk disallowing operation on MLC devices. There's been subsequent work to enable an SLC emulation mode which makes the device more reliable at the cost of losing capacity by pairing up writes/reads in cells (AFAICT). Some of this hit for the H27UCG8T2ETR in 5.16 kernels, but I definitely did some experimentation with 5.17 without having much success. I should maybe go back and try again, but I ended up going a different route. It turned out that BytePorter had documented how to add a microSD slot to the NTC C.H.I.P., using just a microSD to full SD card adapter. Every microSD card I buy seems to come with one of these, so I had plenty lying around to test with. I started with ensuring the kernel could see it ok (by modifying the device tree), but once that was all confirmed I went further and built a more modern u-boot that talked to the SD card, and defaulted to booting off it. That meant no more relying on the internal NAND at all! I do see some flakiness with the SD card, which is possibly down to the dodgy way it's hooked up (I should probably do a basic PCB layout with JLCPCB instead). That's mostly been mitigated by forcing it into 1-bit mode instead of 4-bit mode (I tried lowering the frequency too, but that didn't make a difference). The problem manifests as:
sunxi-mmc 1c11000.mmc: data error, sending stop command
and then all storage access freezing (existing logins still work, if the program you're trying to run is in cache). I can't find a conclusive software solution to this; I'm pretty sure it's the hardware, but I don't understand why the recovery doesn't generally work.

Random power offs
After I had storage working I'd see random hangs or power offs. It wasn't quite clear what was going on. So I started trying to work out how to find out the CPU temperature, in case it was overheating. It turns out the temperature sensor on the R8 is part of the touchscreen driver, and I'd taken my usual approach of turning off all the drivers I didn't think I'd need. Enabling it (CONFIG_TOUCHSCREEN_SUN4I) gave temperature readings and seemed to help somewhat with stability, though not completely. Next I ended up looking at the AXP209 PMIC. There were various scripts still installed (I'd started out with the NTC Debian install and slowly upgraded it to bullseye while stripping away the obvious pieces I didn't need) and a start-up script called enable-no-limit. This turned out to not be running (some sort of expectation of i2c-dev being loaded and another failing check), but looking at the script and the data sheet revealed the issue. The AXP209 can cope with 3 power sources: an external DC source, a Li-battery, and finally a USB port. I was powering my board via the USB port, using a charger rated for 2A. It turns out that the AXP209 defaults to limiting USB current to 900mA, and that with wifi active and the CPU busy the C.H.I.P. can rise above that. At which point the AXP shuts everything down. Armed with that info I was able to understand what the power scripts were doing and which bit I needed - i2cset -f -y 0 0x34 0x30 0x03 to set no limit and disable the auto-power off. Additionally I also discovered that the AXP209 had a built in temperature sensor as well, so I added support for that via iio-hwmon.
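For reference, here is a minimal sketch of the i2c-tools invocations involved, based on the description above and my reading of the AXP209 datasheet (register 0x30 is the VBUS power path register); double-check the bit meanings against your own board before poking the PMIC:
# load the userspace I2C interface that i2cget/i2cset need
modprobe i2c-dev
# read the current VBUS power path settings (bus 0, AXP209 at address 0x34)
i2cget -f -y 0 0x34 0x30
# remove the USB current limit and disable the automatic power off
i2cset -f -y 0 0x34 0x30 0x03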

WiFi
WiFi on the C.H.I.P. is provided by an RTL8723BS SDIO attached device. It's terrible (and not just here, I had an x86 based device with one where it also sucked). Thankfully there's a driver in staging in the kernel these days, but I've still found it can fall out with my house setup, end up connecting to a further away AP which then results in lots of retries, dropped frames and CPU consumption. Nailing it to the AP on the other side of the wall from where it is helps. I haven't done any serious testing with the Bluetooth other than checking it's detected and can scan ok.
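If you want to do the same sort of AP nailing, one option is to pin the wpa_supplicant network block to a specific BSSID. A sketch using wpa_cli, with made-up interface name, network id and BSSID:
# pin network 0 on wlan0 to one known-good AP, then persist the change
# (save_config needs update_config=1 in wpa_supplicant.conf)
wpa_cli -i wlan0 set_network 0 bssid aa:bb:cc:dd:ee:ff
wpa_cli -i wlan0 enable_network 0
wpa_cli -i wlan0 save_config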

Patches
I patched u-boot v2022.01 (which shows you how long ago I was trying this out) with the following to enable boot from external SD:
u-boot C.H.I.P. external SD patch
diff --git a/arch/arm/dts/sun5i-r8-chip.dts b/arch/arm/dts/sun5i-r8-chip.dts
index 879a4b0f3b..1cb3a754d6 100644
--- a/arch/arm/dts/sun5i-r8-chip.dts
+++ b/arch/arm/dts/sun5i-r8-chip.dts
@@ -84,6 +84,13 @@
 		reset-gpios = <&pio 2 19 GPIO_ACTIVE_LOW>; /* PC19 */
 	};
 
+	mmc2_pins_e: mmc2@0 {
+		pins = "PE4", "PE5", "PE6", "PE7", "PE8", "PE9";
+		function = "mmc2";
+		drive-strength = <30>;
+		bias-pull-up;
+	};
+
 	onewire {
 		compatible = "w1-gpio";
 		gpios = <&pio 3 2 GPIO_ACTIVE_HIGH>; /* PD2 */
@@ -175,6 +182,16 @@
 	status = "okay";
 };
 
+&mmc2 {
+	pinctrl-names = "default";
+	pinctrl-0 = <&mmc2_pins_e>;
+	vmmc-supply = <&reg_vcc3v3>;
+	vqmmc-supply = <&reg_vcc3v3>;
+	bus-width = <4>;
+	broken-cd;
+	status = "okay";
+};
+
 &ohci0 {
 	status = "okay";
 };
diff --git a/arch/arm/include/asm/arch-sunxi/gpio.h b/arch/arm/include/asm/arch-sunxi/gpio.h
index f3ab1aea0e..c0dfd85a6c 100644
--- a/arch/arm/include/asm/arch-sunxi/gpio.h
+++ b/arch/arm/include/asm/arch-sunxi/gpio.h
@@ -167,6 +167,7 @@ enum sunxi_gpio_number {
 
 #define SUN8I_GPE_TWI2		3
 #define SUN50I_GPE_TWI2		3
+#define SUNXI_GPE_SDC2		4
 
 #define SUNXI_GPF_SDC0		2
 #define SUNXI_GPF_UART0		4
diff --git a/board/sunxi/board.c b/board/sunxi/board.c
index fdbcd40269..f538cb7e20 100644
--- a/board/sunxi/board.c
+++ b/board/sunxi/board.c
@@ -433,9 +433,9 @@ static void mmc_pinmux_setup(int sdc)
 			sunxi_gpio_set_drv(pin, 2);
 		}
 #elif defined(CONFIG_MACH_SUN5I)
-		/* SDC2: PC6-PC15 */
-		for (pin = SUNXI_GPC(6); pin <= SUNXI_GPC(15); pin++) {
-			sunxi_gpio_set_cfgpin(pin, SUNXI_GPC_SDC2);
+		/* SDC2: PE4-PE9 */
+		for (pin = SUNXI_GPE(4); pin <= SUNXI_GPE(9); pin++) {
+			sunxi_gpio_set_cfgpin(pin, SUNXI_GPE_SDC2);
 			sunxi_gpio_set_pull(pin, SUNXI_GPIO_PULL_UP);
 			sunxi_gpio_set_drv(pin, 2);
 		}

I've sent some patches for the kernel device tree upstream - there's an outstanding issue with the Bluetooth wake GPIO causing the serial port not to probe(!) that I need to resolve before sending a v2, but what's there works for me. The only remaining piece is a patch to enable the external SD for Linux; I don't think it's appropriate to send upstream but it's fairly basic. This limits the bus to 1 bit rather than the 4 bits it's capable of, as mentioned above.
Linux C.H.I.P. external SD DTS patch
diff --git a/arch/arm/boot/dts/sun5i-r8-chip.dts b/arch/arm/boot/dts/sun5i-r8-chip.dts
index fd37bd1f3920..2b5aa4952620 100644
--- a/arch/arm/boot/dts/sun5i-r8-chip.dts
+++ b/arch/arm/boot/dts/sun5i-r8-chip.dts
@@ -163,6 +163,17 @@ &mmc0 {
 	status = "okay";
 };
 
+&mmc2 {
+	pinctrl-names = "default";
+	pinctrl-0 = <&mmc2_4bit_pe_pins>;
+	vmmc-supply = <&reg_vcc3v3>;
+	vqmmc-supply = <&reg_vcc3v3>;
+	bus-width = <1>;
+	non-removable;
+	disable-wp;
+	status = "okay";
+};
+
 &ohci0 {
 	status = "okay";
 };

As for what I'm doing with it, I think that'll have to be a separate post.

20 April 2023

Jamie McClelland: Electron doesn't like negative layout coordinates

I got a second external monitor. Overkill? Probably, but I like having a dedicated space for instant messaging (now the left monitor) and also a dedicated space for a web browser (right monitor). But, when I moved signal-desktop to the left monitor, clicks stopped working. I moved it back to my laptop screen, clicks started working. Other apps (like gajim) worked fine. A real mystery. I spent a lot of time on the wrong thing. I turned this monitor into portrait mode. Maybe signal doesn't like portrait mode? Nope. Maybe signal doesn't think it has enough horizontal space? Nope. Maybe signal suddenly doesn't like being on an external monitor? Nope. Maybe signal will output something useful if I start it via the terminal? Nope (but that was a real distraction). Then, I discovered that mattermost desktop behaves the same way. A clue! So, now I know it's an electron app limitation, not a signal limitation. Finally I hit on the problem: via my sway desktop, I set my laptop screen to the x,y coordinates 0,0 (since it's in the middle). I set the left monitor to negative coordinates so it would appear on the left. Well, electron does not like that, not sure why. Now, my left monitor occupies 0,0 and the rest are adjusted to that center. I originally used negative coordinates so when I unplugged my monitors, my laptop would still be 0,0 and display all desktops properly. Fortunately, sway magically figures that all out even when the center shifts by unplugging the monitors. Hooray for sway.
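For anyone hitting the same thing, the workaround is just to give every output a non-negative origin. A sketch with swaymsg, using made-up output names and offsets (swaymsg -t get_outputs will tell you yours); the same output lines can live in ~/.config/sway/config:
# list output names and their current positions
swaymsg -t get_outputs
# left external monitor (portrait) at the origin
swaymsg output DP-1 position 0 0
# laptop panel and right monitor shifted so nothing goes negative
swaymsg output eDP-1 position 1080 0
swaymsg output DP-2 position 3000 0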

19 April 2023

Ian Jackson: The Rust Foundation's bad draft trademark policy

tl;dr The Rust Foundation's proposed new trademark policy is far too restrictive, and will cause (more) drama unless it is substantially revised.

Process
Rust is a trademark owned by the Foundation. The Rust Foundation still seems to be finding its feet. Evidently, one of the items on its backlog was to update the trademark policy. Apparently they have been working on this for some time, in an informal working group. In August, there was a survey. (I saw it in This Week In Rust, the community-curated newsletter where most important stuff appears, and responded.) I don't think the results of this survey have been published anywhere. Last week (12th April) the Foundation published an official Inside Rust blog post linking to a draft. They included a link to a feedback survey, closing on the 17th of April, i.e. it was open for 5 days. This is far too short a period for formal feedback on such a draft, especially given that this process has apparently already been generating significant controversy within parts of the Rust community.

Substance
Overall, this policy is poor. It is far too restrictive. It is likely to lead to (further) controversy and argument, including conflicts with Rust's downstreams. It does not serve the needs of the Rust community. In particular, the Rust community does not need the trademark to: The community does need the trademark to: It might be useful to use the trademark to strengthen licensing or CoC compliance. For example, good faith redistributions of a modified Rust, as "Rust", would be Free Software, even though the copyright licence permits proprietary derivatives; so use of the Rust trademark should probably require use of a Free licence. There should be a series of blanket permissions to use the word "Rust" in, for example: Currently there aren't. For example the current Debian practice of calling Rust libraries rust-<name-of-crate> is probably in violation. There are a number of more detailed problems with the wording.

Values
The policy has all the hallmarks of excessive influence from traditional trademark lawyers and not enough influence from the Free Software community. I would like to remind the Free Software activists on the inside of this process that the lawyers are there to serve you and the community. The values embodied in trademark law often conflict with the values of the Free Software community. The Rust Project should adopt a trademark policy which follows the community's values - even if that might weaken our ability to sue evildoers.

Next steps
The Foundation should take a step back and pause the process. Then, the Foundation should restart the process from a much earlier stage, with much wider publicity. Each stage should be widely advertised to the whole community, with opportunities for feedback. This should include publishing the results of the August 2022 survey. The Foundation should publish a sketch of the legal advice they have received, publicly say what the plausible options are and what their consequences might be (for the community, for downstreams, and for the Foundation's enforcement ability). (Some of this will no doubt repeat the work that has been done in the informal trademark working group. That work wasn't widely enough advertised.)

Echoes of a dispute from 2006
Mozilla made a very similar mistake with Firefox in 2006. The official policy stated that no-one was allowed to distribute Firefox with any patches, unless those patches had been pre-approved by Mozilla. Debian is committed to Software Freedom. This must include the freedom to modify the software as one sees fit, even if the original authors don't agree. Now, overly-restrictive trademark policies are hardly new. Debian often takes the practical view that usually the upstream with such a policy doesn't really mean it. But Mozilla decided they did mean it. They contacted Debian asking for Debian to get their patches approved. Since that wasn't acceptable to Debian, they stopped using the word "Firefox". For a decade, Debian's Firefox browser was called "Iceweasel". We don't want something similar happening to "Rust".


18 April 2023

Shirish Agarwal: Philips LCD Monitor 22″, 1984, Reaper Man, The Firm.

PHILIPS PHL 221S8L
Those who have been reading this blog for a long time would perhaps know that I had bought a Viewsonic 19″ almost 12 years ago. The monitor was functioning well till last week. I had thought to change it to a 24″ monitor almost 3-4 years ago when 24″ LCD monitors were going for around 4k/- or thereabouts. But the monitor kept on functioning and I didn't have space (nor do I) to have a dual-monitor setup. It just didn't make sense. Apart from higher electricity charges it would also have made more demands on my old system which somehow is still functioning even after all those years. Then last week, it started to dim and after a couple of days completely conked out. I had wanted to buy a new monitor in front of mum so she could watch movies or whatever but this was not to be. So I had to buy an LCD monitor as the Government raised taxes enormously after the pandemic. The same/similar monitor that used to cost INR 4k/- today cost me almost INR 7k/-, almost double the price. Hooking it to Debian I got the following

$ sudo hwinfo --monitor | grep Model
  Model: "PHILIPS PHL 221S8L"
FWIW hwinfo is the latest version

$ sudo hwinfo --version
21.82
I did see a couple of movies before starting to write this blog post. Not an exceptional monitor but better than before. I had options from three brands, Dell (most expensive), Philips (middle) and LG (lowest in price). Interestingly, Viewsonic disappeared from the market about 5 years back and made a comeback just a couple of years ago. Even Philips, which had exited the PC monitor market almost a decade back, re-entered it. Apart from the branding, it doesn't make much of a difference as almost all the products including the above monitors are produced in China. I did remember her a lot while buying the monitor as I'm sure she would have enjoyed it far more than me but that was not to be.

1984
During last week when I didn't have the monitor I re-read 1984. To be completely honest, I had read the above book when I was in my 20s and I had no context. The protagonist seemed like a whiner and for the life of me I couldn't understand why he didn't try to escape. Re-reading after almost 2 decades and a bit more, I shat a number of times because now the context is pretty near and pretty real. I can see why the Republicans in the U.S. banned it. I also realized why the protagonist didn't attempt to run away, because wherever he ran away to it would be the same thing. It probably is one of the most depressing books I have ever read. To willfully accept what is false after all that torture. What was also interesting to me is to find that George Orwell was also a soldier just like Tolkien was. Both took part and wrote such different stories. While Mr. Tolkien writes and shares the pendulum between hope and despair, Mr. Orwell is decidedly dark. Not grey but dark. I am not sure if I would like to read Animal Farm anytime soon.

The Reaper Man - Terry Pratchett
It is by sheer coincidence, or perhaps I needed something to fill me up, that I got The Reaper Man from Terry Pratchett. It was practically like a breath of fresh air. And I love Mr. Pratchett for the inclusivity he brings in. We think about skin color and what not, and here Mr. Pratchett writes about an undead gentleman who's extremely polite, as he was a wizard. I won't talk more as I don't really want to spoil the surprise, but rest assured everybody is gonna love it. I also read Long Utopia, but that is for those who believed in and thought of multiverses long before it became the buzzword that it is today.

The Firm - John Grisham
Now I don't know what I should write about this book, as there aren't many John Grisham books where a rookie wins against more than one party opposing him. I wouldn't go much into depth but simply say it was worth a read. I am currently reading Gray Mountain. It very much shows how the coal industry is corrupt and what all it does. It also brings to mind the amount of mining that is done, in which iron is the most sought after. Now if we are mining 94% iron then wouldn't it make sense to ask for a circular economy around iron, but we don't even hear a word about it. Even with all the imagined projections of lithium mining, it would hardly be 10%. I could go on but will finish for now, till later.

Matthew Garrett: PSA: upgrade your LUKS key derivation function

Here's an article from a French anarchist describing how his (encrypted) laptop was seized after he was arrested, and material from the encrypted partition has since been entered as evidence against him. His encryption password was supposedly greater than 20 characters and included a mixture of cases, numbers, and punctuation, so in the absence of any sort of opsec failures this implies that even relatively complex passwords can now be brute forced, and we should be transitioning to even more secure passphrases.

Or does it? Let's go into what LUKS is doing in the first place. The actual data is typically encrypted with AES, an extremely popular and well-tested encryption algorithm. AES has no known major weaknesses and is not considered to be practically brute-forceable - at least, assuming you have a random key. Unfortunately it's not really practical to ask a user to type in 128 bits of binary every time they want to unlock their drive, so another approach has to be taken.

This is handled using something called a "key derivation function", or KDF. A KDF is a function that takes some input (in this case the user's password) and generates a key. As an extremely simple example, think of MD5 - it takes an input and generates a 128-bit output, so we could simply MD5 the user's password and use the output as an AES key. While this could technically be considered a KDF, it would be an extremely bad one! MD5s can be calculated extremely quickly, so someone attempting to brute-force a disk encryption key could simply generate the MD5 of every plausible password (probably on a lot of machines in parallel, likely using GPUs) and test each of them to see whether it decrypts the drive.
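As a rough illustration of just how cheap a plain hash is, you can benchmark one locally. Only a sketch, and the numbers vary by machine, but a single CPU core already manages millions of MD5 operations per second before any GPU gets involved:

# benchmark raw MD5 throughput using OpenSSL's built-in speed test
openssl speed md5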

(things are actually slightly more complicated than this - your password is used to generate a key that is then used to encrypt and decrypt the actual encryption key. This is necessary in order to allow you to change your password without having to re-encrypt the entire drive - instead you simply re-encrypt the encryption key with the new password-derived key. This also allows you to have multiple passwords or unlock mechanisms per drive)

Good KDFs reduce this risk by being what's technically referred to as "expensive". Rather than performing one simple calculation to turn a password into a key, they perform a lot of calculations. The number of calculations performed is generally configurable, in order to let you trade off between the amount of security (the number of calculations you'll force an attacker to perform when attempting to generate a key from a potential password) and performance (the amount of time you're willing to wait for your laptop to generate the key after you type in your password so it can actually boot). But, obviously, this tradeoff changes over time - defaults that made sense 10 years ago are not necessarily good defaults now. If you set up your encrypted partition some time ago, the number of calculations required may no longer be considered up to scratch.
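If you're curious what that tradeoff looks like on your own hardware, cryptsetup ships a benchmark mode. A sketch (the output format varies a little between versions):

# measures PBKDF2 iterations per second and argon2 time/memory costs on this machine
cryptsetup benchmark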

And, well, some of these assumptions are kind of bad in the first place! Just making things computationally expensive doesn't help a lot if your adversary has the ability to test a large number of passwords in parallel. GPUs are extremely good at performing the sort of calculations that KDFs generally use, so an attacker can "just" get a whole pile of GPUs and throw them at the problem. KDFs that are computationally expensive don't do a great deal to protect against this. However, there's another axis of expense that can be considered - memory. If the KDF algorithm requires a significant amount of RAM, the degree to which it can be performed in parallel on a GPU is massively reduced. A Geforce 4090 may have 16,384 execution units, but if each password attempt requires 1GB of RAM and the card only has 24GB on board, the attacker is restricted to running 24 attempts in parallel.

So, in these days of attackers with access to a pile of GPUs, a purely computationally expensive KDF is just not a good choice. And, unfortunately, the subject of this story was almost certainly using one of those. Ubuntu 18.04 used the LUKS1 header format, and the only KDF supported in this format is PBKDF2. This is not a memory expensive KDF, and so is vulnerable to GPU-based attacks. But even so, systems using the LUKS2 header format used to default to argon2i, which is memory strong, but not designed to be resistant to GPU attack (thanks to the comments pointing out my misunderstanding here). New versions default to argon2id, which is. You want to be using argon2id.

What makes this worse is that distributions generally don't update this in any way. If you installed your system and it gave you pbkdf2 as your KDF, you're probably still using pbkdf2 even if you've upgraded to a system that would use argon2id on a fresh install. Thankfully, this can all be fixed-up in place. But note that if anything goes wrong here you could lose access to all your encrypted data, so before doing anything make sure it's all backed up (and figure out how to keep said backup secure so you don't just have your data seized that way).

First, make sure you're running as up-to-date a version of your distribution as possible. Having tools that support the LUKS2 format doesn't mean that your distribution has all of that integrated, and old distribution versions may allow you to update your LUKS setup without actually supporting booting from it. Also, if you're using an encrypted /boot, stop now - very recent versions of grub2 support LUKS2, but they don't support argon2id, and this will render your system unbootable.

Next, figure out which device under /dev corresponds to your encrypted partition. Run

lsblk

and look for entries that have a type of "crypt". The device above that in the tree is the actual encrypted device. Record that name, and run

sudo cryptsetup luksHeaderBackup /dev/whatever --header-backup-file /tmp/luksheader

and copy that to a USB stick or something. If something goes wrong here you'll be able to boot a live image and run

sudo cryptsetup luksHeaderRestore /dev/whatever --header-backup-file luksheader

to restore it.

(Edit to add: Once everything is working, delete this backup! It contains the old weak key, and someone with it can potentially use that to brute force your disk encryption key using the old KDF even if you've updated the on-disk KDF.)

Next, run

sudo cryptsetup luksDump /dev/whatever

and look for the Version: line. If it's version 1, you need to update the header to LUKS2. Run

sudo cryptsetup convert /dev/whatever --type luks2

and follow the prompts. Make sure your system still boots, and if not go back and restore the backup of your header. Assuming everything is ok at this point, run

sudo cryptsetup luksDump /dev/whatever

again and look for the PBKDF: line in each keyslot (pay attention only to the keyslots, ignore any references to pbkdf2 that come after the Digests: line). If the PBKDF is either "pbkdf2" or "argon2i" you should convert to argon2id. Run the following:

sudo cryptsetup luksConvertKey /dev/whatever --pbkdf argon2id

and follow the prompts. If you have multiple passwords associated with your drive you'll have multiple keyslots, and you'll need to repeat this for each password.
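Once that's done it's worth confirming that each keyslot now shows argon2id, and you can optionally pass explicit cost parameters if the defaults chosen on your machine seem low. A sketch using the same placeholder device name as above and standard cryptsetup 2.x options; the exact values are just examples to tune:

# confirm each keyslot now lists argon2id (plus its time/memory cost)
sudo cryptsetup luksDump /dev/whatever | grep -A3 PBKDF
# re-run the conversion with explicit costs: ~1GB of memory, 4 threads,
# and a 2 second target unlock time
sudo cryptsetup luksConvertKey /dev/whatever --pbkdf argon2id \
    --pbkdf-memory 1048576 --pbkdf-parallel 4 --iter-time 2000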

Distributions! You should really be handling this sort of thing on upgrade. People who installed their systems with your encryption defaults several years ago are now much less secure than people who perform a fresh install today. Please please please do something about this.


Matthew Palmer: Rutie and Magnus, Two Good Ways to Build Ruby Extensions in Rust

I wrote the Ruby bindings for the Enquo Project, my attempt to bring queryable encryption to all databases, using the Rutie library. Recently, I've rewritten the bindings to use Magnus instead, and I thought I'd put down my thoughts about the whole situation.

The Story So Far
The Enquo Project core cryptography is all written in Rust, as seems to be the vogue these days. Rust is fast, safe, and easily interoperable with most of the rest of the modern software development ecosystem, making it a good choice as a language to implement the cryptographic primitives that Enquo needs, like Order-Revealing Encryption. Of course, since not everyone writes their applications in Rust, we need to provide the functionality of the Enquo client in the languages that people do use, such as Ruby, Python, and so on. Since re-writing all that cryptographic code in a myriad of languages would be tedious and error-prone, we instead provide bindings to the core Rust code. These are just thin shims of code that translate the data types and function calls between Rust and the target language.
(Image: a shim in a can - wrong sort of shim, but canned language bindings would be handy)
As I'm most familiar with Ruby and its development ecosystem (particularly Ruby on Rails), it was natural that I'd make Ruby bindings for Enquo as my first target. Rummaging around, it seemed that Rutie was a good library to use, so off I went.

What are Rutie and Magnus, Anyway?
Both libraries share the same goal: provide the ability to write some Rust code, run that through a compiler, and produce something that can be loaded by the Ruby interpreter and used just like any other Ruby class. They're both fairly high level interfaces, trying to abstract away much of the gory details, and do a lot of the common heavy lifting that can make writing bindings fiddly and annoying. Things like mapping data types (like strings and integers) between Rust data types and the closest equivalents in Ruby. This mapping never goes perfectly smoothly. For example, Ruby integers don't have a fixed range of values they can represent: you can store a huge number like 2^256 more-or-less as easily as you can the number 12. But Rust, being a lower-level language, only has a set of integer types that have fixed boundaries, like the u32 type, which can only store integers between zero and about four billion (2^32 - 1, to be precise). There's also lots of little things that need to be just right, like translating the different memory management approaches of the languages, and dealing with a myriad of fiddly little issues like passing arguments and return values in and out of method calls, helpers for defining classes and methods (and pointing to the correct Rust functions), and so on.
(Image: a mass of tangled pipes and valves - this is what I imagine it looks like inside these libraries. Hervé Cozanet / Wikimedia Commons, CC-BY-SA)
All in all, these libraries are fairly significant pieces of work, and I m mighty glad that someone else has taken on the job of building (and maintaining!) them.

So Why the Change?
Good question. It's important to say at the outset that there's nothing particularly wrong with Rutie. I found using Rutie to be very straightforward, and the Ruby bindings came together very quickly and easily. If someone chose to use Rutie for their project, I'm sure they'd have a good experience. What made me take the time to rewrite using Magnus was a set of a few tiny things, which together gave me enough of a shove to do the work. Firstly, I'd had a hiccup with Rutie's support of newer versions of Ruby, particularly 3.2 (PR). Also, I'd hit a couple of segfault issues, which were ultimately caused by Ruby garbage-collecting data out from underneath me. These were ultimately my fault, of course, but Rutie wasn't helping me out in avoiding the problems in the first place. Finally, while Rutie helped translate data types, there was still a bit of boilerplate and ugliness that needed to be included. This wasn't a showstopper, but I'm appreciating the extra smoothness that Magnus provides here. As an example, here's what's required in Rutie to get native Rust data types from Ruby method parameters (and the self reference to the current object):
fn enquo_field_decrypt_text(ciphertext_obj: RString, context_obj: RString) -> RString {
    let ciphertext = ciphertext_obj.to_str_unchecked();
    let context = context_obj.to_vec_u8_unchecked();
    let field = rbself.get_data(&*FIELD_WRAPPER);
    // etc etc etc
The equivalent in Magnus is just the function signature:
fn decrypt_text(&self, ciphertext: String, context: String) -> Result<String, magnus::Error> {
You can also see there that Magnus signals an exception via the Result return value, while Rutie's approach to raising an exception involves poking the Ruby VM directly, which always struck me as a bit ugly. There are several other minor things in Magnus (like its cleaner approach to wrapping structs so they can be stored in Ruby objects) that I'm appreciating, too. Never discount the power of ergonomics for making a happy developer.

The End Result
I spent a bit over half of last weekend doing the rewrite, maybe ten hours or so. Since Magnus did more type checking and data validation, and its approach to error handling was smoother, I took the opportunity to rewrite a bunch of Ruby wrapper code I'd written (which just existed to check things like ranges of values and string encodings) into Rust, as well. To make sure that the conversion was accurate, I added a heap more unit tests to the bindings. I also took the opportunity to restructure the codebase to split the code for the different Ruby classes into separate files, which I hadn't done initially as the code had originally accreted, rather than being purposefully written. All up, though, my rewrite ended up removing over 60 lines (excluding the extra specs I added):
$ git diff --stat -- lib ext/enquo/src
 ruby/ext/enquo/src/field.rs         342 ++++++++++++++++++++++++++++++++++++++
 ruby/ext/enquo/src/lib.rs           338 ++++---------------------------------
 ruby/ext/enquo/src/root.rs           39 +++++
 ruby/ext/enquo/src/root_key.rs       67 ++++++++
 ruby/lib/enquo.rb                     6 +-
 ruby/lib/enquo/field.rb             173 -------------------
 ruby/lib/enquo/root.rb               28 ----
 ruby/lib/enquo/root_key.rb            1 -
 ruby/lib/enquo/root_key/static.rb    27 ---
 9 files changed, 479 insertions(+), 542 deletions(-)
Considering that I was translating from a higher level language into a lower level one, the removal of so much code is quite remarkable. Magnus was able to automagically replace rather a lot of "raise ArgumentError if something.isnt_right" code in those .rb files. So, in conclusion, if you, too, are building Ruby extensions in Rust, while Rutie is a solid choice (and you probably should stick with it if you're already using it), I highly recommend giving Magnus a look for your next extension.

17 April 2023

Matthew Garrett: Booting modern Intel CPUs

CPUs can't do anything without being told what to do, which leaves the obvious problem of how do you tell a CPU to do something in the first place. On many CPUs this is handled in the form of a reset vector - an address the CPU is hardcoded to start reading instructions from when power is applied. The address the reset vector points to will typically be some form of ROM or flash that can be read by the CPU even if no other hardware has been configured yet. This allows the system vendor to ship code that will be executed immediately after poweron, configuring the rest of the hardware and eventually getting the system into a state where it can run user-supplied code.

The specific nature of the reset vector on x86 systems has varied over time, but it's effectively always been 16 bytes below the top of the address space - so, 0xffff0 on the 20-bit 8086, 0xfffff0 on the 24-bit 80286, and 0xfffffff0 on the 32-bit 80386. Convention on x86 systems is to have RAM starting at address 0, so the top of address space could be used to house the reset vector with as low a probability of conflicting with RAM as possible.

The most notable thing about x86 here, though, is that when it starts running code from the reset vector, it's still in real mode. x86 real mode is a holdover from a much earlier era of computing. Rather than addresses being absolute (ie, if you refer to a 32-bit address, you store the entire address in a 32-bit or larger register), they are 16-bit offsets that are added to the value stored in a "segment register". Different segment registers existed for code, data, and stack, so a 16-bit address could refer to different actual addresses depending on how it was being interpreted - jumping to a 16 bit address would result in that address being added to the code segment register, while reading from a 16 bit address would result in that address being added to the data segment register, and so on. This is all in order to retain compatibility with older chips, to the extent that even 64-bit x86 starts in real mode with segments and everything (and, also, still starts executing at 0xfffffff0 rather than 0xfffffffffffffff0 - 64-bit mode doesn't support real mode, so there's no way to express a 64-bit physical address using the segment registers, so we still start just below 4GB even though we have massively more address space available).

Anyway. Everyone knows all this. For modern UEFI systems, the firmware that's launched from the reset vector then reprograms the CPU into a sensible mode (ie, one without all this segmentation bullshit), does things like configure the memory controller so you can actually access RAM (a process which involves using CPU cache as RAM, because programming a memory controller is sufficiently hard that you need to store more state than you can fit in registers alone, which means you need RAM, but you don't have RAM until the memory controller is working, but thankfully the CPU comes with several megabytes of RAM on its own in the form of cache, so phew). It's kind of ugly, but that's a consequence of a bunch of well-understood legacy decisions.

Except. This is not how modern Intel x86 boots. It's far stranger than that. Oh, yes, this is what it looks like is happening, but there's a bunch of stuff going on behind the scenes. Let's talk about boot security. The idea of any form of verified boot (such as UEFI Secure Boot) is that a signature on the next component of the boot chain is validated before that component is executed. But what verifies the first component in the boot chain? You can't simply ask the BIOS to verify itself - if an attacker can replace the BIOS, they can replace it with one that simply lies about having done so. Intel's solution to this is called Boot Guard.

But before we get to Boot Guard, we need to ensure the CPU is running in as bug-free a state as possible. So, when the CPU starts up, it examines the system flash and looks for a header that points at CPU microcode updates. Intel CPUs ship with built-in microcode, but it's frequently old and buggy and it's up to the system firmware to include a copy that's new enough that it's actually expected to work reliably. The microcode image is pulled out of flash, a signature is verified, and the new microcode starts running. This is true in both the Boot Guard and the non-Boot Guard scenarios. But for Boot Guard, before jumping to the reset vector, the microcode on the CPU reads an Authenticated Code Module (ACM) out of flash and verifies its signature against a hardcoded Intel key. If that checks out, it starts executing the ACM. Now, bear in mind that the CPU can't just verify the ACM and then execute it directly from flash - if it did, the flash could detect this, hand over a legitimate ACM for the verification, and then feed the CPU different instructions when it reads them again to execute them (a Time of Check vs Time of Use, or TOCTOU, vulnerability). So the ACM has to be copied onto the CPU before it's verified and executed, which means we need RAM, which means the CPU already needs to know how to configure its cache to be used as RAM.

Anyway. We now have an ACM loaded and verified, and it can safely be executed. The ACM does various things, but the most important from the Boot Guard perspective is that it reads a set of write-once fuses in the motherboard chipset that represent the SHA256 of a public key. It then reads the initial block of the firmware (the Initial Boot Block, or IBB) into RAM (or, well, cache, as previously described) and parses it. There's a block that contains a public key - it hashes that key and verifies that it matches the SHA256 from the fuses. It then uses that key to validate a signature on the IBB. If it all checks out, it executes the IBB and everything starts looking like the nice simple model we had before.

Except, well, doesn't this seem like an awfully complicated bunch of code to implement in real mode? And yes, doing all of this modern crypto with only 16-bit registers does sound like a pain. So, it doesn't. All of this is happening in a perfectly sensible 32 bit mode, and the CPU actually switches back to the awful segmented configuration afterwards so it's still compatible with an 80386 from 1986. The "good" news is that at least firmware can detect that the CPU has already configured the cache as RAM and can skip doing that itself.

I'm skipping over some steps here - the ACM actually does other stuff around measuring the firmware into the TPM and doing various bits of TXT setup for people who want DRTM in their lives, but the short version is that the CPU bootstraps itself into a state where it works like a modern CPU and then deliberately turns a bunch of the sensible functionality off again before it starts executing firmware. I'm also missing out the fact that this entire process only kicks off after the Management Engine says it can, which means we're waiting for an entirely independent x86 to boot an entire OS before our CPU even starts pretending to execute the system firmware.

Of course, as mentioned before, on modern systems the firmware will then reprogram the CPU into something actually sensible so OS developers no longer need to care about this[1][2], which means we've bounced between multiple states for no reason other than the possibility that someone wants to run legacy BIOS and then boot DOS on a CPU with like 5 orders of magnitude more transistors than the 8086.

tl;dr why can't my x86 wake up with the gin protected mode already inside it

[1] Ha uh except that on ACPI resume we're going to skip most of the firmware setup code so we still need to handle the CPU being in fucking 16-bit mode because suspend/resume is basically an extremely long reboot cycle

[2] Oh yeah also you probably have multiple cores on your CPU and well bad news about the state most of the cores are in when the OS boots because the firmware never started them up so they're going to come up in 16-bit real mode even if your boot CPU is already in 64-bit protected mode, unless you were using TXT in which case you have a different sort of nightmare that if we're going to try to map it onto real world nightmare concepts is one that involves a lot of teeth. Or, well, that used to be the case, but ACPI 6.4 (released in 2021) provides a mechanism for the OS to ask the firmware to wake the CPU up for it so this is invisible to the OS, but you're still relying on the firmware to actually do the heavy lifting here


14 April 2023

Russ Allbery: Review: Babel

Review: Babel, by R.F. Kuang
Publisher: Harper Voyager
Copyright: August 2022
ISBN: 0-06-302144-7
Format: Kindle
Pages: 544
Babel, or the Necessity of Violence: An Arcane History of the Oxford Translators' Revolution, to give it its full title, is a standalone dark academia fantasy set in the 1830s and 1840s, primarily in Oxford, England. The first book of R.F. Kuang's previous trilogy, The Poppy War, was nominated for multiple awards and won the Compton Crook Award for best first novel. Babel is her fourth book. Robin Swift, although that was not his name at the time, was born and raised in Canton and educated by an inexplicable English tutor his family could not have afforded. After his entire family dies of cholera, he is plucked from China by a British professor and offered a life in England as his ward. What follows is a paradise of books, a hell of relentless and demanding instruction, and an unpredictably abusive emotional environment, all aiming him towards admission to Oxford University. Robin will join University College and the Royal Institute of Translation. The politics of this imperial Britain are almost precisely the same as in our history, but one of the engines is profoundly different. This world has magic. If words from two different languages are engraved on a metal bar (silver is best), the meaning and nuance lost in translation becomes magical power. With a careful choice of translation pairs, and sometimes additional help from other related words and techniques, the silver bar becomes a persistent spell. Britain's industrial revolution is in overdrive thanks to the country's vast stores of silver and the applied translation prowess of Babel. This means Babel is also the only part of very racist Oxford that accepts non-white students and women. They need translators (barely) more than they care about maintaining social hierarchy; translation pairs only work when the translator is fluent in both languages. The magic is also stronger when meanings are more distinct, which is creating serious worries about classical and European languages. Those are still the bulk of Babel's work, but increased trade and communication within Europe is eroding the meaning distinctions and thus the amount of magical power. More remote languages, such as Chinese and Urdu, are full of untapped promise that Britain's colonial empire wants to capture. Professor Lowell, Robin's dubious benefactor, is a specialist in Chinese languages; Robin is a potential tool for his plans. As Robin discovers shortly after arriving in Oxford, he is not the first of Lowell's tools. His predecessor turned against Babel and is trying to break its chokehold on translation magic. He wants Robin to help. This is one of those books that is hard to review because it does some things exceptionally well and other things that did not work for me. It's not obvious if the latter are flaws in the book or a mismatch between book and reader (or, frankly, flaws in the reader). I'll try to explain as best I can so that you can draw your own conclusions. First, this is one of the all-time great magical system hooks. The way words are tapped for power is fully fleshed out and exceptionally well-done. Kuang is a professional translator, which shows in the attention to detail on translation pairs. I think this is the best-constructed and explained word-based magic system I've read in fantasy. Many word-based systems treat magic as its own separate language that is weirdly universal. Here, Kuang does the exact opposite, and the result is immensely satisfying. 
A fantasy reader may expect exploration of this magic system to be the primary point of the book, however, and this is not the case. It is an important part of the book, and its implications are essential to the plot resolution, but this is not the type of fantasy novel where the plot is driven by character exploration of the magic system. The magic system exists, the characters use it, and we do get some crunchy details, but the heart of the book is elsewhere. If you were expecting the typical relationship of a fantasy novel to its magic system, you may get a bit wrong-footed. Similarly, this is historical fantasy, but it is the type of historical fantasy where the existence of magic causes no significant differences. For some people, this is a pet peeve; personally, I don't mind that choice in the abstract, but some of the specifics bugged me. The villains of this book assert that any country could have done what Britain did in developing translation magic, and thus their hoarding of it is not immoral. They are obviously partly lying (this is a classic justification for imperialism), but it's not clear from the book how they are lying. Technologies (and magic here works like a technology) tend to concentrate power when they require significant capital investment, and tend to dilute power when they are portable and easy to teach. Translation magic feels like the latter, but its effect in the book is clearly the former, and I was never sure why. England is not an obvious choice to be a translation superpower. Yes, it's a colonial empire, but India, southeast Asia, and most certainly Africa (the continent largely not appearing in this book) are home to considerably more languages from more wildly disparate families than western Europe. Translation is not a peculiarly European idea, and this magic system does not seem hard to stumble across. It's not clear why England, and Oxford in particular, is so dramatically far ahead. There is some sign that Babel is keeping the mechanics of translation magic secret, but that secret has leaked, seems easy to develop independently, and is simple enough that a new student can perform basic magic with a few hours of instruction. This does not feel like the kind of power that would be easy to concentrate, let alone to the extreme extent required by the last quarter of this book. The demand for silver as a base material for translation magic provides a justification for mercantilism that avoids the confusing complexities of currency economics in our actual history, so fine, I guess, but it was a bit disappointing for this great of an idea for a magic system to have this small of an impact on politics. I'll come to the actual thrust of this book in a moment, but first something else Babel does exceptionally well: dark academia. The remainder of Robin's cohort at Oxford is Remy, a dark-skinned Muslim from Calcutta; Victoire, a Haitian woman raised in France; and Letty, the daughter of a British admiral. All of them are non-white except Letty, and Letty and Victoire additionally have to deal with the blatant sexism of the time. (For example, they have to live several miles from Oxford because women living near the college would be a "distraction.") The interpersonal dynamics between the four are exceptionally well done. 
Kuang captures the dislocation of going away to college, the unsettled life upheaval that makes it both easy and vital to form suddenly tight friendships, and the way that the immense pressure from classes and exams leaves one so devoid of spare emotional capacity that those friendships become both unbreakable and badly strained. Robin and Remy almost immediately become inseparable in that type of college friendship in which profound trust and constant companionship happen first and learning about the other person happens afterwards. It's tricky to talk about this without spoilers, but one of the things Kuang sets up with this friend group is a pointed look at intersectionality. Babel has gotten a lot of positive review buzz, and I think this is one of the reasons why. Kuang does not pass over or make excuses for characters in a place where many other books do. This mostly worked for me, but with a substantial caveat that I think you may want to be aware of before you dive into this book. Babel is set in the 1830s, but it is very much about the politics of 2022. That does not necessarily mean that the politics are off for the 1830s; I haven't done the research to know, and it's possible I'm seeing the Tiffany problem (Jo Walton's observation that Tiffany is a historical 12th century women's name, but an author can't use it as a medieval name because readers think it sounds too modern). But I found it hard to shake the feeling that the characters make sense of their world using modern analytical frameworks of imperialism, racism, sexism, and intersectional feminism, although without using modern terminology, and characters from the 1830s would react somewhat differently. This is a valid authorial choice; all books are written for the readers of the time when they're published. But as with magical systems that don't change history, it's a pet peeve for some readers. If that's you, be aware that's the feel I got from it. The true center of this book is not the magic system or the history. It's advertised directly in the title ("the necessity of violence"), although it's not until well into the book that the reader knows what that means. This is a book about revolution, what revolution means, what decisions you have to make along the way, how the personal affects the political, and the inadequacy of reform politics. It is hard, uncomfortable, and not gentle on its characters. The last quarter of this book was exceptional, and I understand why it's getting so much attention. Kuang directly confronts the desire for someone else to do the necessary work, the hope that surely the people with power will see reason, and the feeling of despair when there are no good plans and every reason to wait and do nothing when atrocities are about to happen. If you are familiar with radical politics, these aren't new questions, but this is not the sort of thing that normally shows up in fantasy. It does not surprise me that Babel struck a nerve with readers a generation or two younger than me. It captures that heady feeling on the cusp of adulthood when everything is in flux and one is assembling an independent politics for the first time. Once I neared the end of the book, I could not put it down. The ending is brutal, but I think it was the right ending for this book. There are two things, though, that I did not like about the political arc. The first is that Victoire is a much more interesting character than Robin, but is sidelined for most of the book.
The difference of perspectives between her and Robin is the heart of what makes the end of this book so good, and I wish that had started 300 pages earlier. Or, even better, I wish Victoire has been the protagonist; I liked Robin, but he's a very predictable character for most of the book. Victoire is not; even the conflicts she had earlier in the book, when she didn't get much attention in the story, felt more dynamic and more thoughtful than Robin's mix of guilt and anxiety. The second is that I wish Kuang had shown more of Robin's intellectual evolution. All of the pieces of why he makes the decisions that he does are present in this book, and Kuang shows his emotional state (sometimes in agonizing detail) at each step, but the sense-making, the development of theory and ideology beneath the actions, is hinted at but not shown. This is a stylistic choice with no one right answer, but it felt odd because so much of the rest of the plot is obvious and telegraphed. If the reader shares Robin's perspective, I think it's easy to fill in the gaps, but it felt odd to read Robin giving clearly thought-out political analyses at the end of the book without seeing the hashing-out and argument with friends required to develop those analyses. I felt like I had to do a lot of heavy lifting as the reader, work that I wish had been done directly by the book. My final note about this book is that I found much of it extremely predictable. I think that's part of why reviewers describe it as accessible and easy to read; accessibility and predictability can be two sides of the same coin. Kuang did not intend for this book to be subtle, and I think that's part of the appeal. But very few of Robin's actions for the first three-quarters of the book surprised me, and that's not always the reading experience I want. The end of the book is different, and I therefore found it much more gripping, but it takes a while to get there. Babel is, for better or worse, the type of fantasy where the politics, economics, and magic system exist primarily to justify the plot the author wanted. I don't think the societal position of the Institute of Translation that makes the ending possible is that believable given the nature of the technology in question and the politics of the time, and if you are inclined to dig into the specifics of the world-building, I think you will find it frustrating. Where it succeeds brilliantly is in capturing the social dynamics of hothouse academic cohorts, and in making a sharp and unfortunately timely argument about the role of violence in political change, in a way that the traditionally conservative setting of fantasy rarely does. I can't say Babel blew me away, but I can see why others liked it so much. If I had to guess, I'd say that the closer one is in age to the characters in the book and to that moment of political identity construction, the more it's likely to appeal. Rating: 7 out of 10

John Goerzen: Easily Accessing All Your Stuff with a Zero-Trust Mesh VPN

Probably everyone is familiar with a regular VPN. The traditional use case is to connect to a corporate or home network from a remote location, and access services as if you were there. But these days, the notions of corporate network and home network are less based around physical location. For instance, a company may have no particular office at all, may have a number of offices plus a number of people working remotely, and so forth. A home network might have, say, a PVR and file server, while highly portable devices such as laptops, tablets, and phones may want to talk to each other regardless of location. For instance, a family member might be traveling with a laptop, another at a coffee shop, and those two devices might want to communicate, in addition to talking to the devices at home. And, in both scenarios, there might be questions about giving limited access to friends. Perhaps you'd like to give a friend access to part of your file server, or as a company, you might have contractors working on a limited project. Pretty soon you wind up with a mess of VPNs, forwarded ports, and tricks to make it all work. With the increasing prevalence of CGNAT, a lot of times you can't even open a port to the public Internet. Each application or device probably has its own gateway just to make it visible on the Internet, some of which you pay for. Then you add on the question of: should you really trust your LAN anyhow? With possibilities of guests using it, rogue access points, etc., the answer is probably "no". We can move the responsibility for dealing with NAT, fluctuating IPs, encryption, and authentication from the application layer further down into the network stack. We then arrive at a much simpler picture for all. So this page is fundamentally about making the network work, simply and effectively.

How do we make the Internet work in these scenarios? We're going to combine three concepts:
  1. A VPN, providing fully encrypted and authenticated communication and stable IPs
  2. Mesh Networking, in which devices automatically discover optimal paths to reach each other
  3. Zero-trust networking, in which we do not need to trust anything about the underlying LAN, because all our traffic uses the secure systems in points 1 and 2.
By combining these concepts, we arrive at some nice results:
  • You can ssh hostname, where hostname is one of your machines (server, laptop, whatever), and as long as hostname is up, you can reach it, wherever it is, wherever you are (a short sketch of this follows the list below).
    • Combined with mosh, these sessions will be durable even across moving to other host networks.
    • You could just as well use telnet, because the underlying network should be secure.
  • You don't have to mess with encryption keys, certs, etc., for every internal-only service. Since IPs are now trustworthy, that's all you need. hosts.allow could make a comeback!
  • You have a way of transiting out of extremely restrictive networks. Every tool discussed here has a way of falling back on routing things via a broker (relay) on TCP port 443 if all else fails.
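As a small illustration of the first point: with stable, trustworthy mesh IPs you can pin a name in /etc/hosts and stop caring where the machine physically is. The address below is made up; substitute whatever your mesh assigns.
# /etc/hosts entry using a made-up mesh address for the home server:
#   200:1234:5678:9abc::1   homeserver
ssh homeserver         # reachable from anywhere the mesh reaches, LAN or not
mosh homeserver        # and with mosh, the session survives network changes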
There might sometimes be tradeoffs. For instance:
  • On LANs faster than 1Gbps, performance may degrade due to encryption and encapsulation overhead. However, these tools should let hosts discover the locality of each other and not send traffic over the Internet if the devices are local.
  • With some of these tools, hosts local to each other (on the same LAN) may be unable to find each other if they can't reach the control plane over the Internet (Internet is down or provider is down).
Some other features that some of the tools provide include:
  • Easy sharing of limited access with friends/guests
  • Taking care of everything you need, including SSL certs, for exposing a certain on-net service to the public Internet
  • Optional routing of your outbound Internet traffic via an exit node on your network. Useful, for instance, if your local network is blocking tons of stuff.
Let's dive in.

Types of Mesh VPNs I'll go over several types of meshes in this article:
  1. Fully decentralized with automatic hop routing: This model has no special central control plane. Nodes discover each other in various ways and establish routes to each other. These routes can be direct connections over the Internet, or via other nodes. This approach offers the greatest resilience. Examples I'll cover include Yggdrasil and tinc.
  2. Automatic peer-to-peer with centralized control: In this model, nodes, by default, communicate by establishing direct links between them. A regular node never carries traffic on behalf of other nodes. Special-purpose relays are used to handle cases in which NAT traversal is impossible. This approach tends to offer simple setup. Examples I'll cover include Tailscale, Zerotier, Nebula, and Netmaker.
  3. Roll your own and hybrid approaches: This is a grab bag of other ideas; for instance, running Yggdrasil over Tailscale.

Terminology For the sake of consistency, I'm going to use common language to discuss things that have different terms in different ecosystems:
  • Every tool discussed here has a way of dealing with NAT traversal. It may assist with establishing direct connections (e.g., STUN), and if that fails, it may simply relay traffic between nodes. I'll call such a relay a "broker". This may or may not be the same system that is a control plane for a tool.
  • All of these systems operate over lower layers that are unencrypted. Those lower layers may be a LAN (wired or wireless, which may or may not have Internet access), or the public Internet (IPv4 and/or IPv6). I'm going to call the unencrypted lower layer, whatever it is, the "clearnet".

Evaluation Criteria Here are the things I want to see from a solution:
  • Secure, with all communications end-to-end encrypted and authenticated, and prevention of traffic from untrusted devices.
  • Flexible, adapting to changes in network topology quickly and automatically.
  • Resilient, without single points of failure, and with devices local to each other able to communicate even if cut off from the Internet or other parts of the network.
  • Private, minimizing leakage of information or metadata about me and my systems
  • Able to traverse CGNAT without having to use a broker whenever possible
  • A lesser requirement for me, but still a nice to have, is the ability to include others via something like Internet publishing or inviting guests.
  • Fully or nearly fully Open Source
  • Free or very cheap for personal use
  • Wide operating system support, including headless Linux on x86_64 and ARM.

Fully Decentralized VPNs with Automatic Hop Routing Two systems fit this description: Yggdrasil and Tinc. Let's dive in.

Yggdrasil I'll start with Yggdrasil because I've written so much about it already; it has featured in several of my prior posts.

Yggdrasil can be a private mesh VPN, or something more Yggdrasil can be a private mesh VPN, just like the other tools covered here. It s unique, however, in that a key goal of the project is to also make it useful as a planet-scale global mesh network. As such, Yggdrasil is a testbed of new ideas in distributed routing designed to scale up to massive sizes and all sorts of connection conditions. As of 2023-04-10, the main global Yggdrasil mesh has over 5000 nodes in it. You can choose whether or not to participate. Every node in a Yggdrasil mesh has a public/private keypair. Each node then has an IPv6 address (in a private address space) derived from its public key. Using these IPv6 addresses, you can communicate right away. Yggdrasil differs from most of the other tools here in that it does not necessarily seek to establish a direct link on the clearnet between, say, host A and host G for them to communicate. It will prefer such a direct link if it exists, but it is perfectly happy if it doesn t. The reason is that every Yggdrasil node is also a router in the Yggdrasil mesh. Let s sit with that concept for a moment. Consider:
  • If you have a bunch of machines on your LAN, but only one of them can peer over the clearnet, that's fine; all the other machines will discover this route to the world and use it when necessary.
  • All you need to run a broker is just a regular node with a public IP address. If you are participating in the global mesh, you can use one (or more) of the free public peers for this purpose.
  • It is not necessary for every node to know about the clearnet IP address of every other node (improving privacy). In fact, it's not even necessary for every node to know about the existence of all the other nodes, so long as it can find a route to a given node when it's asked to.
  • Yggdrasil can find one or more routes between nodes, and it can use this knowledge of multiple routes to aggressively optimize for varying network conditions, including combinations of, say, downloads and low-latency ssh sessions.
Behind the scenes, Yggdrasil calculates optimal routes between nodes as necessary, using a mesh-wide DHT for initial contact and then deriving more optimal paths. (You can also read more details about the routing algorithm.) One final way that Yggdrasil is different from most of the other tools is that there is no separate control server. No node is "special", in charge, the sole keeper of metadata, or anything like that. The entire system is completely distributed and auto-assembling.
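For orientation, here is a minimal quick-start sketch. Exact paths, service names, and yggdrasilctl subcommand spellings vary a little between versions and distributions, so treat this as illustrative rather than authoritative.
yggdrasil -genconf > /etc/yggdrasil.conf    # generate a keypair and a default config
systemctl enable --now yggdrasil            # assuming the packaged systemd unit
yggdrasilctl getSelf                        # prints this node's mesh IPv6 address
yggdrasilctl getPeers                       # lists peers found via LAN multicast or config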

Meeting neighbors There are two ways that Yggdrasil knows about peers:
  • By broadcast discovery on the local LAN
  • By listening on a specific port (or being told to connect to a specific host/port)
Sometimes this might lead to multiple ways to connect to a node; Yggdrasil prefers the connection auto-discovered by broadcast first, then the lowest-latency of the defined path. In other words, when your laptops are in the same room as each other on your local LAN, your packets will flow directly between them without traversing the Internet.

Unique uses Yggdrasil is uniquely suited to network-challenged situations. As an example, in a post-disaster situation, Internet access may be unavailable or flaky, yet there may be many local devices, perhaps ones that had never known of each other before, that could share information. Yggdrasil meets this situation perfectly. The combination of broadcast auto-detection, distributed routing, and so forth, basically means that if there is any physical path between two nodes, Yggdrasil will find and enable it. Ad-hoc wifi is rarely used because it is a real pain. Yggdrasil actually makes it useful! Its broadcast discovery doesn't require any IP address provisioned on the interface at all (it just uses the IPv6 link-local address), so you don't need to figure out a DHCP server or some such. And, Yggdrasil will tend to perform routing along the contours of the RF path. So you could have a laptop in the middle of a long distance relaying communications from people farther out, because it could see both. Or even a chain of such things.

Yggdrasil: Security and Privacy Yggdrasil's mesh is aggressively greedy. It will peer with any node it can find (unless told otherwise) and will find a route to anywhere it can. There are two main ways to make sure you keep unauthorized traffic out: by restricting who can talk to your mesh, and by firewalling the Yggdrasil interface. Both can be used, and they can be used simultaneously. I'll discuss firewalling more at the end of this article. Basically, you'll almost certainly want to do this if you participate in the public mesh, because doing so is akin to having a globally-routable public IP address direct to your device. If you want to restrict who can talk to your mesh, you just disable the broadcast feature on all your nodes (empty MulticastInterfaces section in the config), and avoid telling any of your nodes to connect to a public peer. You can set a list of authorized public keys that can connect to your nodes' listening interfaces, which you'll probably want to do. You will probably want to either open up some inbound ports (if you can) or set up a node with a known clearnet IP on a place like a $5/mo VPS to help with NAT traversal (again, setting AllowedPublicKeys as appropriate). Yggdrasil doesn't allow filtering multicast clients by public key, only by network interface, so that's why we disable broadcast discovery. You can easily enough teach Yggdrasil about static internal LAN IPs of your nodes and have things work that way. (Or, set up an internal gateway node or two that the clients just connect to when they're local.) But fundamentally, you need to put a bit more thought into this with Yggdrasil than with the other tools here, which are closed-only. Compared to some of the other tools here, Yggdrasil is better about information leakage; nodes only know details, such as clearnet IPs, of directly-connected peers. You can obtain the list of directly-connected peers of any known node in the mesh, but that list is the public keys of the directly-connected peers, not the clearnet IPs. Some of the other tools contain a limited integrated firewall of sorts (with limited ACLs and such). Yggdrasil does not, but is fully compatible with on-host firewalls. I recommend these anyway even with many other tools.
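To make that concrete, here is an abridged, hedged sketch of what a private-mesh configuration might look like (HJSON, roughly following the 0.4.x config format; the listen address and key are placeholders):
{
  # no public peers: this node only talks to peers we configure explicitly
  Peers: []
  # disable LAN broadcast discovery so strangers on the same wifi cannot peer
  MulticastInterfaces: []
  # one node with a known clearnet IP (the cheap VPS mentioned above) listens here
  Listen: ["tls://0.0.0.0:443"]
  # only these node keys may connect inbound
  AllowedPublicKeys: ["0123456789abcdef..."]
}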

Yggdrasil: Connectivity and NAT traversal Compared to the other tools, Yggdrasil is an interesting mix. It provides a fully functional mesh and facilitates connectivity in situations in which no other tool can. Yet its NAT traversal, while it exists and does work, results in using a broker under some of the more challenging CGNAT situations more often than some of the other tools, which can impede performance. Yggdrasil s underlying protocol is TCP-based. Before you run away screaming that it must be slow and unreliable like OpenVPN over TCP it s not, and it is even surprisingly good around bufferbloat. I ve found its performance to be on par with the other tools here, and it works as well as I d expect even on flaky 4G links. Overall, the NAT traversal story is mixed. On the one hand, you can run a node that listens on port 443 and Yggdrasil can even make it speak TLS (even though that s unnecessary from a security standpoint), so you can likely get out of most restrictive firewalls you will ever encounter. If you join the public mesh, know that plenty of public peers do listen on port 443 (and other well-known ports like 53, plus random high-numbered ones). If you connect your system to multiple public peers, there is a chance though a very small one that some public transit traffic might be routed via it. In practice, public peers hopefully are already peered with each other, preventing this from happening (you can verify this with yggdrasilctl debug_remotegetpeers key=ABC...). I have never experienced a problem with this. Also, since latency is a factor in routing for Yggdrasil, it is highly unlikely that random connections we use are going to be competitive with datacenter peers.

Yggdrasil: Sharing with friends If you're open to participating in the public mesh, this is one of the easiest things of all. Have your friend install Yggdrasil, point them to a public peer, give them your Yggdrasil IP, and that's it. (Well, presumably you also open up your firewall; you did follow my advice to set one up, right?) If your friend is visiting at your location, they can just hop on your wifi, install Yggdrasil, and it will automatically discover a route to you. Yggdrasil even has a zero-config mode for ephemeral nodes such as certain Docker containers. Yggdrasil doesn't directly support publishing to the clearnet, but it is certainly possible to proxy (or even NAT) to/from the clearnet, and people do.

Yggdrasil: DNS There is no particular extra DNS in Yggdrasil. You can, of course, run a DNS server within Yggdrasil, just as you can anywhere else. Personally I just add relevant hosts to /etc/hosts and leave it at that, but it's up to you.

Yggdrasil: Source code, pricing, and portability Yggdrasil is fully open source (LGPLv3 plus additional permissions in an exception) and highly portable. It is written in Go, and has prebuilt binaries for all major platforms (including a Debian package which I made). There is no charge for anything with Yggdrasil. Listed public peers are free and run by volunteers. You can run your own peers if you like; they can be public and unlisted, public and listed (just submit a PR to get it listed), or private (accepting connections only from certain nodes' keys). A peer in this case is just a node with a known clearnet IP address. Yggdrasil encourages use in other projects. For instance, NNCP integrates a Yggdrasil node for easy communication with other NNCP nodes.

Yggdrasil conclusions Yggdrasil is tops in reliability (having no single point of failure) and flexibility. It will maintain opportunistic connections between peers even if the Internet is down. The unique added feature of being able to be part of a global mesh is a nice one. The tradeoffs include being more prone to need to use a broker in restrictive CGNAT environments. Some other tools have clients that override the OS DNS resolver to also provide resolution of hostnames of member nodes; Yggdrasil doesn t, though you can certainly run your own DNS infrastructure over Yggdrasil (or, for that matter, let public DNS servers provide Yggdrasil answers if you wish). There is also a need to pay more attention to firewalling or maintaining separation from the public mesh. However, as I explain below, many other options have potential impacts if the control plane, or your account for it, are compromised, meaning you ought to firewall those, too. Still, it may be a more immediate concern with Yggdrasil. Although Yggdrasil is listed as experimental, I have been using it for over a year and have found it to be rock-solid. They did change how mesh IPs were calculated when moving from 0.3 to 0.4, causing a global renumbering, so just be aware that this is a possibility while it is experimental.

tinc tinc is the oldest tool on this list; version 1.0 came out in 2003! You can think of tinc as something akin to an older Yggdrasil without the public option. I will be discussing tinc 1.0.36, the latest stable version, which came out in 2019. The development branch, 1.1, has been going since 2011 and had its latest release in 2021. The last commit to the Github repo was in June 2022. Tinc is the only tool here to support both tun and tap style interfaces. I go into the difference more in the Zerotier review below. Tinc actually provides a better tap implementation than Zerotier, with various sane options for broadcasts, but I still think the call for an Ethernet, as opposed to IP, VPN is small. To configure tinc, you generate a per-host configuration and then distribute it to every tinc node. It contains a host's public key. Therefore, adding a host to the mesh means distributing its key everywhere; de-authorizing it means removing its key everywhere. This makes it rather unwieldy. tinc can do LAN broadcast discovery and mesh routing, but generally speaking you must manually teach it where to connect initially. Somewhat confusingly, the examples all mention listing a public address for a node. This doesn't make sense for a laptop, and I suspect you'd just omit it. I think that address is used for something akin to a Yggdrasil peer with a clearnet IP. Unlike all of the other tools described here, tinc has no tool to inspect the running state of the mesh. Some of the properties of tinc made it clear I was unlikely to adopt it, so this review wasn't as thorough as that of Yggdrasil.
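For flavor, here is roughly what that per-host configuration looks like for a hypothetical network named mynet with nodes laptop and server1. The layout follows the tinc 1.0 documentation, but I did not run this exact setup, so treat it as a sketch.
# /etc/tinc/mynet/tinc.conf on "laptop":
#     Name = laptop
#     ConnectTo = server1
#
# /etc/tinc/mynet/hosts/server1 (this file gets copied to every node):
#     Address = server1.example.com
#     Subnet = 10.10.0.1/32
#     ...followed by server1's public key...
tincd -n mynet -K      # generate this node's keypair and append the public part to hosts/laptop
tincd -n mynet         # start the daemon for the mynet network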

tinc: Security and Privacy As mentioned above, every host in the tinc mesh is authenticated based on its public key. However, to be more precise, this key is validated only at the point it connects to its next hop peer. (To be sure, this is also the same as how the list of allowed pubkeys works in Yggdrasil.) Since IPs in tinc are not derived from their key, and any host can assign itself whatever mesh IP it likes, this implies that a compromised host could impersonate another. It is unclear whether packets are end-to-end encrypted when using a tinc node as a router. The fact that they can be routed at the kernel level by the tun interface implies that they may not be.

tinc: Connectivity and NAT traversal I was unable to find much information about NAT traversal in tinc, other than that it does support it. tinc can run over UDP or TCP and auto-detects which to use, preferring UDP.

tinc: Sharing with friends tinc has no special support for this, and the difficulty of configuration makes it unlikely you'd do this with tinc.

tinc: Source code, pricing, and portability tinc is fully open source (GPLv2). It is written in C and generally portable. It supports some very old operating systems. Mobile support is iffy. tinc does not seem to be very actively maintained.

tinc conclusions I haven't mentioned performance in my other reviews (see the section at the end of this post), but tinc's is so poor that it only manages about 300Mbps on my 2.5Gbps network. That's 1/3 the speed of Yggdrasil or Tailscale. Combine that with the unwieldiness of adding hosts and some uncertainties in security, and I'm not going to be using tinc.

Automatic Peer-to-Peer Mesh VPNs with centralized control These tend to be the options that are most frequently discussed. Let's go through them.

Tailscale Tailscale is a popular choice in this type of VPN. To use Tailscale, you first sign up on tailscale.com. Then, you install the tailscale client on each machine. On first run, it prints a URL for you to click on to authorize the client to your mesh ( tailnet ). Tailscale assigns a mesh IP to each system. The Tailscale client lets the Tailscale control plane gather IP information about each node, including all detectable public and private clearnet IPs. When you attempt to contact a node via Tailscale, the client will fetch the known contact information from the control plane and attempt to establish a link. If it can contact over the local LAN, it will (it doesn t have broadcast autodetection like Yggdrasil; the information must come from the control plane). Otherwise, it will try various NAT traversal options. If all else fails, it will use a broker to relay traffic; Tailscale calls a broker a DERP relay server. Unlike Yggdrasil, a Tailscale node never relays traffic for another; all connections are either direct P2P or via a broker. Tailscale, like several others, is based around Wireguard; though wireguard-go rather than the in-kernel Wireguard. Tailscale has a number of somewhat unique features in this space:
  • Funnel, which lets you expose ports on your system to the public Internet via the VPN.
  • Exit nodes, which automate the process of routing your public Internet traffic over some other node in the network. This is possible with every tool mentioned here, but Tailscale makes switching it on or off a couple of quick commands away.
  • Node sharing, which lets you share a subset of your network with guests
  • A fantastic set of documentation, easily the best of the bunch.
Funnel, in particular, is interesting. With a couple of tailscale serve-style commands, you can expose a directory tree (or a development webserver) to the world. Tailscale gives you a public hostname, obtains a cert for it, and proxies inbound traffic to you. This is subject to some unspecified bandwidth limits, and you can only choose from three public ports, so it's not really a production solution, but as a quick and easy way to demonstrate something cool to a friend, it's a neat feature.
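A rough sketch of the workflow; note that the serve/funnel command syntax has changed across Tailscale releases, and the port and target here are hypothetical.
tailscale up                                    # prints a login URL to authorize this node
tailscale ip -4                                 # show the mesh IP assigned to this machine
tailscale status                                # list the nodes in the tailnet

# roughly the "show a friend something cool" flow (syntax from the 1.3x era;
# check your version's help output):
tailscale serve https / http://127.0.0.1:3000   # proxy a local dev server over HTTPS
tailscale funnel 443 on                         # expose that to the public Internet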

Tailscale: Security and Privacy With Tailscale, as with the other tools in this category, one of the main threats to consider is the control plane. What are the consequences of a compromise of Tailscale s control plane, or of the credentials you use to access it? Let s begin with the credentials used to access it. Tailscale operates no identity system itself, instead relying on third parties. For individuals, this means Google, Github, or Microsoft accounts; Okta and other SAML and similar identity providers are also supported, but this runs into complexity and expense that most individuals aren t wanting to take on. Unfortunately, all three of those types of accounts often have saved auth tokens in a browser. Personally I would rather have a separate, very secure, login. If a person does compromise your account or the Tailscale servers themselves, they can t directly eavesdrop on your traffic because it is end-to-end encrypted. However, assuming an attacker obtains access to your account, they could:
  • Tamper with your Tailscale ACLs, permitting new actions
  • Add new nodes to the network
  • Forcibly remove nodes from the network
  • Enable or disable optional features
Of note is that they cannot just commandeer an existing IP. I would say the riskiest possibility here is that they could add new nodes to the mesh. Because they could also tamper with your ACLs, they could then proceed to attempt to access all your internal services. They could even turn on service collection and have Tailscale tell them what and where all the services are. Therefore, as with other tools, I recommend a local firewall on each machine with Tailscale. More on that below. Tailscale has a new alpha feature called tailnet lock which helps with this problem. It requires existing nodes in the mesh to sign a request for a new node to join. Although this doesn't address ACL tampering and some of the other things, it does help considerably with the most significant concern. However, tailnet lock is in alpha, only available on the Enterprise plan, and has a waitlist, so I have been unable to test it. Any Tailscale node can request the IP addresses belonging to any other Tailscale node. The Tailscale control plane captures, and exposes to you, this information about every node in your network: the OS hostname, IP addresses and port numbers, operating system, creation date, last seen timestamp, and NAT traversal parameters. You can optionally enable service data capture as well, which sends data about open ports on each node to the control plane. Tailscale likes to highlight their key expiry and rotation feature. By default, all keys expire after 180 days, and traffic to and from the expired node will be interrupted until they are renewed (basically, you re-login with your provider and do a renew operation). Unfortunately, the only mention I can see of warning of impending expiration is in the Windows client, and even there you need to edit a registry key to get the warning more than the default 24 hours in advance. In short, it seems likely to cut off communications when it's most important. You can disable key expiry on a per-node basis in the admin console web interface, and I mostly do, due to not wanting to lose connectivity at an inopportune time.
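Returning to the ACL-tampering point: the tailnet policy is a HuJSON document edited in the admin console, and once a policy file is in place, traffic not matched by a rule is denied. A hedged sketch with made-up identities and addresses:
{
  "acls": [
    // only my own login may reach ssh on the file server
    {"action": "accept", "src": ["me@example.com"], "dst": ["100.101.102.103:22"]},
    // anyone on the tailnet may reach its web interface
    {"action": "accept", "src": ["*"], "dst": ["100.101.102.103:443"]}
  ]
}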

Tailscale: Connectivity and NAT traversal When thinking about reliability, the primary consideration here is being able to reach the Tailscale control plane. While it is possible in limited circumstances to reach nodes without the Tailscale control plane, it is a fairly brittle setup and notably will not survive a client restart. So if you use Tailscale to reach other nodes on your LAN, that won't work unless your Internet is up and the control plane is reachable. Assuming your Internet is up and Tailscale's infrastructure is up, there is little to be concerned with. Your own comfort level with cloud providers and your Internet should guide you here. Tailscale wrote a fantastic article about NAT traversal and they, predictably, do very well with it. Tailscale prefers UDP but falls back to TCP if needed. Broker (DERP) servers step in as a last resort, and Tailscale clients automatically select the best ones. I'm not aware of anything that is more successful with NAT traversal than Tailscale. This maximizes the situations in which a direct P2P connection can be used without a broker. I have found Tailscale to be a bit slow to notice changes in network topology compared to Yggdrasil, and sometimes needs a kick in the form of restarting the client process to re-establish communications after a network change. However, it's possible (maybe even probable) that if I'd waited a bit longer, it would have sorted this all out.

Tailscale: Sharing with friends I touched on the funnel feature earlier. The sharing feature lets you give an invite to an outsider. By default, a person accepting a share can make only outgoing connections to the network they're invited to, and cannot receive incoming connections from that network, which makes sense. When sharing an exit node, you get a checkbox that lets you share access to the exit node as well. Of course, the person accepting the share needs to install the Tailscale client. The combination of funnel and sharing makes Tailscale the best of these tools for ad-hoc sharing.

Tailscale: DNS Tailscale's DNS is called MagicDNS. It runs as a layer atop your standard DNS, taking over /etc/resolv.conf on Linux, and provides resolution of mesh hostnames and some other features. The concept is pretty slick. It is also a bit flaky on Linux; dueling programs want to write to /etc/resolv.conf. I can't really say this is entirely Tailscale's fault; they document the problem and some workarounds. I would love to be able to add custom records to this service; for instance, to override the public IP for a service to use the in-mesh IP. Unfortunately, that's not yet possible. However, MagicDNS can query existing nameservers for certain domains in a split DNS setup.

Tailscale: Source code, pricing, and portability Tailscale is almost fully open source and the client is highly portable. The client is open source (BSD 3-clause) on open source platforms, and closed source on closed source platforms. The DERP servers are open source. The coordination server is closed source, although there is an open source coordination server called Headscale (also BSD 3-clause) made available with Tailscale s blessing and informal support. It supports most, but not all, features in the Tailscale coordination server. Tailscale s pricing (which does not apply when using Headscale) provides a free plan for 1 user with up to 20 devices. A Personal Pro plan expands that to 100 devices for $48 per year - not a bad deal at $4/mo. A Community on Github plan also exists, and then there are more business-oriented plans as well. See the pricing page for details. As a small note, I appreciated Tailscale s install script. It properly added Tailscale s apt key in a way that it can only be used to authenticate the Tailscale repo, rather than as a systemwide authenticator. This is a nice touch and speaks well of their developers.

Tailscale conclusions Tailscale is tops in sharing and has a broad feature set and excellent documentation. Like other solutions with a centralized control plane, device communications can stop working if the control plane is unreachable, and the threat model of the control plane should be carefully considered.

Zerotier Zerotier is a close competitor to Tailscale, and is similar to it in a lot of ways. So rather than duplicate all of the Tailscale information here, I m mainly going to describe how it differs from Tailscale. The primary difference between the two is that Zerotier emulates an Ethernet network via a Linux tap interface, while Tailscale emulates a TCP/IP network via a Linux tun interface. However, Zerotier has a number of things that make it be a somewhat imperfect Ethernet emulator. For one, it has a problem with broadcast amplification; the machine sending the broadcast sends it to all the other nodes that should receive it (up to a set maximum). I wouldn t want to have a lot of programs broadcasting on a slow link. While in theory this could let you run Netware or DECNet across Zerotier, I m not really convinced there s much call for that these days, and Zerotier is clearly IP-focused as it allocates IP addresses and such anyhow. Zerotier provides special support for emulated ARP (IPv4) and NDP (IPv6). While you could theoretically run Zerotier as a bridge, this eliminates the zero trust principle, and Tailscale supports subnet routers, which provide much of the same feature set anyhow. A somewhat obscure feature, but possibly useful, is Zerotier s built-in support for multipath WAN for the public interface. This actually lets you do a somewhat basic kind of channel bonding for WAN.

Zerotier: Security and Privacy The picture here is similar to Tailscale, with the difference that you can create a Zerotier-local account rather than relying on cloud authentication. I was unable to find as much detail about Zerotier as I could about Tailscale - notably I couldn t find anything about how sticky an IP address is. However, the configuration screen lets me delete a node and assign additional arbitrary IPs within a subnet to other nodes, so I think the assumption here is that if your Zerotier account (or the Zerotier control plane) is compromised, an attacker could remove a legit device, add a malicious one, and assign the previous IP of the legit device to the malicious one. I m not sure how to mitigate against that risk, as firewalling specific IPs is ineffective if an attacker can simply take them over. Zerotier also lacks anything akin to Tailnet Lock. For this reason, I didn t proceed much further in my Zerotier evaluation.

Zerotier: Connectivity and NAT traversal Like Tailscale, Zerotier has NAT traversal with STUN. However, it looks like it's more limited than Tailscale's, and in particular is incompatible with the double NAT that is often seen these days. Zerotier operates brokers ("root servers") that can do relaying, including TCP relaying. So you should be able to connect even from hostile networks, but you are less likely to form a P2P connection than with Tailscale.

Zerotier: Sharing with friends I was unable to find any special features relating to this in the Zerotier documentation. Therefore, it would be at the same level as Yggdrasil: possible, maybe even not too difficult, but without any specific help.

Zerotier: DNS Unlike Tailscale, Zerotier does not support automatically adding DNS entries for your hosts. Therefore, your options are approximately the same as Yggdrasil, though with the added option of pushing configuration pointing to your own non-Zerotier DNS servers to the client.

Zerotier: Source code, pricing, and portability The client ZeroTier One is available on Github under a custom business source license which prevents you from using it in certain settings. This license would preclude it being included in Debian. Their library, libzt, is available under the same license. The pricing page mentions a community edition for self hosting, but the documentation is sparse and it was difficult to understand what its feature set really is. The free plan lets you have 1 user with up to 25 devices. Paid plans are also available.

Zerotier conclusions Frankly I don't see much reason to use Zerotier. The virtual Ethernet model seems to be a weird hybrid that doesn't bring much value. I'm concerned about the implications of a compromise of a user account or the control plane, and it lacks a lot of Tailscale features (MagicDNS and sharing). The only thing it may offer in particular is multipath WAN, but that's esoteric enough, and also solvable at other layers, that it doesn't seem all that compelling to me. Add in the strange license and, to me anyhow, there's little reason to bother with it.

Netmaker Netmaker is one of the projects that is making noise these days. Netmaker is the only one here that is a wrapper around in-kernel Wireguard, which can make a performance difference when talking to peers on a 1Gbps or faster link. Also, unlike other tools, it has an ingress gateway feature that lets people that don t have the Netmaker client, but do have Wireguard, participate in the VPN. I believe I also saw a reference somewhere to nodes as routers as with Yggdrasil, but I m failing to dig it up now. The project is in a bit of an early state; you can sign up for an upcoming closed beta with a SaaS host, but really you are generally pointed to self-hosting using the code in the github repo. There are community and enterprise editions, but it s not clear how to actually choose. The server has a bunch of components: binary, CoreDNS, database, and web server. It also requires elevated privileges on the host, in addition to a container engine. Contrast that to the single binary that some others provide. It looks like releases are frequent, but sometimes break things, and have a somewhat more laborious upgrade processes than most. I don t want to spend a lot of time managing my mesh. So because of the heavy needs of the server, the upgrades being labor-intensive, it taking over iptables and such on the server, I didn t proceed with a more in-depth evaluation of Netmaker. It has a lot of promise, but for me, it doesn t seem to be in a state that will meet my needs yet.

Nebula Nebula is an interesting mesh project that originated within Slack, seems to still be primarily sponsored by Slack, but is also being developed by Defined Networking (though their product looks early right now). Unlike the other tools in this section, Nebula doesn't have a web interface at all. Defined Networking looks likely to provide something of a SaaS service, but for now, you will need to run a broker (a "lighthouse") yourself; perhaps on a $5/mo VPS. Due to the poor firewall traversal properties, I didn't do a full evaluation of Nebula, but it still has a very interesting design.

Nebula: Security and Privacy Since Nebula lacks a traditional control plane, the root of trust in Nebula is a CA (certificate authority). The documentation gives this example of setting it up:
./nebula-cert sign -name "lighthouse1" -ip "192.168.100.1/24"
./nebula-cert sign -name "laptop" -ip "192.168.100.2/24" -groups "laptop,home,ssh"
./nebula-cert sign -name "server1" -ip "192.168.100.9/24" -groups "servers"
./nebula-cert sign -name "host3" -ip "192.168.100.10/24"
So the cert contains your IP, hostname, and group allocation. Each host in the mesh gets your CA certificate, and the per-host cert and key generated from each of these steps. This leads to a really nice security model. Your CA is the gatekeeper to what is trusted in your mesh. You can even have it airgapped or something to make it exceptionally difficult to breach the perimeter. Nebula contains an integrated firewall. Because the ability to keep out unwanted nodes is so strong, I would say this may be the one mesh VPN you might consider using without bothering with an additional on-host firewall. You can define static mappings from a Nebula mesh IP to a clearnet IP. I haven't found much information on this, but theoretically, if NAT traversal isn't required, these static mappings may allow Nebula nodes to reach each other even if the Internet is down. I don't know if this is truly the case, however.
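A hedged sketch of the pieces around the commands above: the CA has to be created first, and the integrated firewall lives in each host's config.yml. Field names follow the upstream example config as I understand it, and the group names are the hypothetical ones from the sign commands.
./nebula-cert ca -name "My Home Mesh"    # run once, ideally on an offline or airgapped machine

# abridged config.yml firewall section on server1: allow ICMP from anyone on
# the mesh, plus ssh only from certs carrying the "laptop" group
firewall:
  outbound:
    - port: any
      proto: any
      host: any
  inbound:
    - port: any
      proto: icmp
      host: any
    - port: 22
      proto: tcp
      groups:
        - laptop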

Nebula: Connectivity and NAT traversal This is a weak point of Nebula. Nebula sends all traffic over a single UDP port; there is no provision for using TCP. This is an issue at certain hotel and other public networks which open only TCP egress ports 80 and 443. I couldn't find a lot of detail on what Nebula's NAT traversal is capable of, but according to a certain Github issue, this has been a sore spot for years and isn't as capable as Tailscale's. You can designate nodes in Nebula as brokers (relays). The concept is the same as in Yggdrasil, but it's less versatile. You have to manually designate which relay to use. It's unclear to me what happens if different nodes designate different relays. Keep in mind that this always happens over a UDP port.

Nebula: Sharing with friends There is no particular support here.

Nebula: DNS Nebula has experimental DNS support. In contrast with Tailscale, which has an internal DNS server on every node, Nebula only runs a DNS server on a lighthouse. This means that it can't forward requests to a DNS server that's upstream for your laptop's particular current location. Actually, Nebula's DNS server doesn't forward at all. It also doesn't resolve its own name. The Nebula documentation makes reference to using multiple lighthouses, which you may want to do for DNS redundancy or performance, but it's unclear to me if this would make each lighthouse form a complete picture of the network.

Nebula: Source code, pricing, and portability Nebula is fully open source (MIT). It consists of a single Go binary and configuration. It is fairly portable.

Nebula conclusions I am attracted to Nebula's unique security model. I would probably be more seriously considering it if not for the lack of support for TCP and the poor general NAT traversal properties. Its datacenter connectivity heritage does show through.

Roll your own and hybrid Here is a grab bag of ideas:

Running Yggdrasil over Tailscale One possibility would be to use Tailscale for its superior NAT traversal, then allow Yggdrasil to run over it. (You will need a firewall to prevent Tailscale from trying to run over Yggdrasil at the same time!) This creates a closed network with all the benefits of Yggdrasil, yet getting the NAT traversal from Tailscale. Drawbacks might be the overhead of the double encryption and double encapsulation. A good Yggdrasil peer may wind up being faster than this anyhow.
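A sketch of the idea, with a made-up Tailscale address and port: give Yggdrasil a static peer that happens to be another machine's Tailscale IP, so the Yggdrasil link rides inside the Tailscale tunnel.
# in /etc/yggdrasil.conf on one node:
  Peers: ["tcp://100.101.102.103:12345"]
# the other node lists the matching port in its Listen section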

Public VPN provider for NAT traversal A public VPN provider such as Mullvad will often offer incoming port forwarding and nodes in many cities. This could be an attractive way to solve a bunch of NAT traversal problems: just use one of those services to get you an incoming port, and run whatever you like over that. Be aware that a number of public VPN clients have a kill switch to prevent any traffic from egressing without using the VPN; see, for instance, Mullvad's. You'll need to disable this if you are running a mesh atop it.

Other

Combining with local firewalls For most of these tools, I recommend using a local firewall in conjunction with them. I have been using firehol and find it to be quite nice. This means you don't have to trust the mesh, the control plane, or whatever. The catch is that you do need your mesh VPN to provide a strong association between IP address and node. Most, but not all, do.
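For illustration, a rough firehol sketch of what I mean; the interface names are hypothetical and you would adapt the service list to each machine.
version 6

# clearnet side: allow nothing unsolicited in, allow everything out
interface eth0 clearnet
        policy drop
        protection strong
        client all accept

# mesh side (tailscale0, tun0, or whatever your tool creates): expose services
# only here, because mesh IPs map strongly to known nodes
interface tailscale0 mesh
        policy drop
        client all accept
        server ssh accept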

Performance I tested some of these for performance using iperf3 on a 2.5Gbps LAN. Here are the results. All speeds are in Mbps.
Tool                 iperf3 (default)   iperf3 -P 10   iperf3 -R
Direct (no VPN)                  2406           2406        2764
Wireguard (kernel)               1515           1566        2027
Yggdrasil                         892           1126        1105
Tailscale                         950           1034        1085
Tinc                              296            300         277
You can see that Wireguard was significantly faster than the other options. Tailscale and Yggdrasil were roughly comparable, and Tinc was terrible.
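For reference, the tests were of this general shape, with iperf3 running as a server on one node and the client pointed at the peer's address (a placeholder below):
iperf3 -s                        # on the receiving node
iperf3 -c 192.168.100.9          # default single-stream test
iperf3 -c 192.168.100.9 -P 10    # ten parallel streams
iperf3 -c 192.168.100.9 -R       # reverse direction: the server sends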

IP collisions When you are communicating over a network such as these, you need to trust that the IP address you are communicating with belongs to the system you think it does. This protects against two malicious actor scenarios:
  1. Someone compromises one machine on your mesh and reconfigures it to impersonate a more important one
  2. Someone connects an unauthorized system to the mesh, taking over a trusted IP, and uses the privileges of the trusted IP to access resources
To summarize the state of play as highlighted in the reviews above:
  • Yggdrasil derives IPv6 addresses from a public key
  • tinc allows any node to set any IP
  • Tailscale IPs aren t user-assignable, but the assignment algorithm is unknown
  • Zerotier allows any IP to be allocated to any node at the control plane
  • I don t know what Netmaker does
  • Nebula IPs are baked into the cert and signed by the CA, but I haven t verified the enforcement algorithm
So this discussion really only applies to Yggdrasil and Tailscale. tinc and Zerotier lack detailed IP security, while Nebula expects IP allocations to be handled outside of the tool and baked into the certs (therefore enforcing rigidity at that level). So the question for Yggdrasil and Tailscale is: how easy is it to commandeer a trusted IP? Yggdrasil has a brief discussion of this. In short, Yggdrasil offers you both a dedicated IP and a rarely-used /64 prefix which you can delegate to other machines on your LAN. Obviously by taking the dedicated IP, a lot more bits are available for the hash of the node s public key, making collisions technically impractical, if not outright impossible. However, if you use the /64 prefix, a collision may be more possible. Yggdrasil s hashing algorithm includes some optimizations to make this more difficult. Yggdrasil includes a genkeys tool that uses more CPU cycles to generate keys that are maximally difficult to collide with. Tailscale doesn t document their IP assignment algorithm, but I think it is safe to say that the larger subnet you use, the better. If you try to use a /24 for your mesh, it is certainly conceivable that an attacker could remove your trusted node, then just manually add the 240 or so machines it would take to get that IP reassigned. It might be a good idea to use a purely IPv6 mesh with Tailscale to minimize this problem as well. So, I think the risk is low in the default configurations of both Yggdrasil and Tailscale (certainly lower than with tinc or Zerotier). You can drive the risk even lower with both.

Final thoughts For my own purposes, I suspect I will remain with Yggdrasil in some fashion. Maybe I will just take the small performance hit that using a relay node implies. Or perhaps I will get clever and use an incoming VPN port forward or go over Tailscale. Tailscale was the other option that seemed most interesting. However, living in a region with Internet that goes down more often than I'd like, I would like to just be able to send as much traffic over a mesh as possible, trusting that if the LAN is up, the mesh is up. I have one thing that really benefits from performance in excess of Yggdrasil or Tailscale: NFS. That's between two machines that never leave my LAN, so I will probably just set up a direct Wireguard link between them. Heck of a lot easier than trying to do Kerberos! Finally, I wrote this intending to be useful. I dealt with a lot of complexity and under-documentation, so it's possible I got something wrong somewhere. Please let me know if you find any errors.
This blog post is a copy of a page on my website. That page may be periodically updated.

13 April 2023

Russ Allbery: Review: Once Upon a Tome

Review: Once Upon a Tome, by Oliver Darkshire
Publisher: W.W. Norton & Company
Copyright: 2022
Printing: 2023
ISBN: 1-324-09208-4
Format: Kindle
Pages: 243
The full title page of this book, in delightful 19th century style, is:
Once Upon a Tome: The Misadventures of a Rare Bookseller, wherein the theory of the profession is partially explained, with a variety of insufficient examples, by Oliver Darkshire. Interspersed with several diverting FOOTNOTES of a comical nature, ably ILLUSTRATED by Rohan Eason, PUBLISHED by W.W. Norton, and humbly proposed to the consideration of the public in this YEAR 2023
That may already be enough to give you a feel for this book. Oliver Darkshire works for Sotheran's Rare Books and Prints in London, most notably running their highly entertaining Twitter account. This is his first book. If you have been hanging out in the right corners of Twitter, you have probably been anticipating the release of this book, and may already have your own copy. If you have not (and to be honest it's increasingly dubious whether there are right corners of Twitter left), you're in for a treat. Darkshire has made Sotheran's a minor Twitter phenomenon due to tweets like this:
CUSTOMER: oh thank heavens I have been searching for a rare book expert with the knowledge to solve my complex problem ME (extremely and unhelpfully specialized): ok well the words are usually on the inside and I can see that's true here, so that's a good start I find I know lots of things until anyone asks me about it or there is a question to answer, at which point I know nothing, I am a void, a tragic bucket of ignorance
My hope was that Once Upon a Tome would be the same thing at book length, and I am delighted to report that's exactly what it is. By the time I finished reading the story of Darkshire's early training, I knew I was going to savor every word.
The hardest part, though, lies in recording precisely in what ways a book has survived the ravages of time. An entire lexicon of book-related terminology has evolved over hundreds of years for exactly this purpose - terminology that means absolutely nothing to the average observer. It's traditional to adopt this baroque language when describing your books, for two reasons. The first is that the specific language of the book trade allows you to be exceedingly accurate and precise without using hundreds of words, and the second is that the elegance of it serves to dull the blow a little. Most rare books come with some minor defects, but that doesn't mean one has to be rude about it.
You will learn something about rare book selling in this book, and more about Darkshire's colleagues, but primarily this is a book-length attempt to convey the slightly uncanny experience of working in a rare bookstore in an entertaining way. Also, to be fully accurate, it is an attempt to shift the bookstore sideways in the reader's mind into a fantasy world that mostly but not entirely parallels ours; as the introduction mentions, this is not a strictly accurate day-by-day account of life at the store, and stories have been altered and conflated in the telling. Rare bookselling is a retail job but a rather strange one, with its own conventions and unusual customers. Darkshire memorably divides rare book collectors into Smaugs and Draculas: Smaugs assemble vast lairs of precious items, Draculas have one very specific interest, and one's success at selling a book depends on identifying which type of customer one is dealing with. Like all good writing about retail jobs, half of the fun is descriptions of the customers.
The Suited Gentlemen turn up annually, smartly dressed in matching suits and asking to see any material we have on Ayn Rand. Faces usually obscured by large dark glasses, they move without making a sound, and only travel in pairs. Sometimes they will bark out a laugh at nothing in particular, as if mimicking what they think humans do.
There are more facets than the typical retail job, though, since the suppliers of the rare book trade (book runners, estate sales, and collectors who have been sternly instructed by spouses to trim down their collections) are as odd and varied as the buyers. This sort of book rests entirely on the sense of humor of the author, and I thought Darkshire's approach was perfect. He has the knack of poking fun at himself as much as he pokes fun at anyone around him. This book conveys an air of perpetual bafflement at stumbling into a job that suits him as well as this one does, praise of the skills of his coworkers, and gently self-deprecating descriptions of his own efforts. Combine that with well-honed sentences, a flair for brief and memorable description, and an accurate sense of how long a story should last, and one couldn't ask for more from this style of book.
The book rest where the bible would be held (leaving arms free for gesticulation) was carved into the shape of a huge wooden eagle. I'm given to understand this is the kind of eloquent and confusing metaphor one expects in a place of worship, as the talons of the divine descend from above in a flurry of wings and death, but it seemed to alarm people to come face to face with the beaked fury of God as they entered the bookshop.
I've barely scratched the surface of great quotes from this book. If you like rare books, bookstores, or even just well-told absurd stories of working a retail job, read this. It reminds me of True Porn Clerk Stories, except with much less off-putting subject matter and even better writing. (Interestingly to me, it also shares with those stories, albeit for different reasons, a more complicated balance of power between the retail worker and the customer than the typical retail establishment.) My one wish is that I would have enjoyed more specific detail about the rare books themselves, since Darkshire only rarely describes successful retail transactions. But that's only a minor quibble. This was a pure delight from cover to cover and exactly what I was hoping for when I preordered it. Highly recommended. Rating: 9 out of 10

12 April 2023

Jamie McClelland: Doing whatever Gmail says

As we slowly move our members to our new email infrastructure, an unexpected twist turned up: One member reported getting the Gmail warning:
Be careful with this message The sender hasn't authenticated this message so Gmail can't verify that it actually came from them.
They have their email delivered to May First, but have configured Gmail to pull in that email using the "Check mail from other accounts" feature. It worked fine on our old infrastructure, but started giving this message when we transitioned. A further twist: he only receives this message from email sent by other people in his organization - in other words, email sent via May First gets flagged, email sent from other people does not. These Gmail messages typically warn users about email that has failed (or lacks) both SPF and DKIM. However, before diving into the technical details, my first thought was: why is Gmail giving a warning on a message that wasn't even delivered to them? It's always nice to get confirmation from others that this is totally wrong behavior. Unfortunately, when it's Gmail, it doesn't matter if they are wrong. We all have to deal with it. So next, I decided to investigate why this message failed both DKIM (digital signature) and SPF (ensuring the message was sent from an authorized server). Examining the headers immediately turned up the SPF failure:
Authentication-Results: mx.google.com;
       spf=fail (google.com: domain of xxx@xxx.org does not designate n.n.n.n as permitted sender) smtp.mailfrom=xxx@xxx.org
The IP address Google checked to ensure the message was sent by an authorized server is the IP address of our internal mail filter server in our new email infrastructure. That's the last hop before delivery to the user's mailbox, so that's the last hop Gmail sees. This is why Gmail is totally wrong to run this check: all email messages retrieved via their mail fetching service are going to fail the SPF test because Gmail has no way of knowing what the actual last hop is. So why is this problem only showing up after we transitioned to our new infrastructure? Because our old infrastructure had only one mail server for every user. The one mail server was the MX server and the relay server, so it was included in their SPF record. And why does this only affect mail sent via May First and not other domains? Because we add our DKIM signature to outgoing email, not to email delivered internally. Therefore, these messages both fail the SPF check and also don't have a DKIM signature. Other messages have a DKIM signature. Ugggg. So what do we do now? Clearly, something dumb and simple is in order: I added the IP addresses of our internal filter servers to our global SPF record. Someday, years from now, after Gmail is long gone (or has fixed this dumb behavior), when I'm doing whatever retired people like me do, someone will notice that our internal filter server IPs are included in our SPF record. Hopefully they will fix the problem, but instead they'll probably think: no idea why these are here - something will probably break if I remove them.
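The change itself is tiny. As a hedged sketch (the domain and addresses below are placeholders, not our real record or filter server IPs), it amounts to appending the internal filter servers to the domain's SPF TXT record and then checking what recipients such as Gmail will see:
# hypothetical SPF record with the internal filter servers appended:
#   "v=spf1 mx ip4:192.0.2.10 ip4:192.0.2.11 ~all"
# check the published record the way a recipient would:
dig +short TXT example.org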

10 April 2023

Russell Coker: BTRFS Rebuild Time

In February I replaced a Dell T320 server with an HP Z640 workstation for a home server/workstation [1]. The T320 has 8*3.5 drive bays which I had used to put 3*4TB disks in a BTRFS RAID-10 array for 6TB of usable capacity. The Z640 has only 2*3.5 bays and 4*2.5 bays, so one option I could have taken was to buy a 4TB 2.5 SSD and keep the same 3*4TB array as before. Instead I chose to use an 8TB disk I had spare in an array with one of the original 4TB disks and some extra space on NVMe devices (the system has 2*1TB NVMe devices which are used as a 380G RAID-1 for the root filesystem and the rest for the storage array). It's nice how BTRFS allows putting any storage you have in a RAID-10 configuration. Unfortunately it seems that I chose the wrong 4TB disk to use for this, as it failed three days ago. It gave thousands of read and write errors and Linux decided that the drive no longer existed. I tried rebooting the system to get it in the BTRFS array again but it failed again, and failed so quickly that it wasn't even possible to use the data on it as part of a RAID rebuild. So I removed that disk and put in one of the other 4TB disks. As the array is comprised of an 8TB disk and 3 other devices that don't add up to 8TB, the layout is that the 8TB disk has one copy of everything and the other devices hold parts of it. So the rebuild process consisted of copying data from the 8TB disk to the 4TB disk. For a RAID array run in the manner of Linux software RAID, the rebuild of a RAID-1 involves a linear copy of data, which is the optimal case for hard disks; copying 4TB of data in that manner would have an average speed of a bit over 100MB/s and take about 11 hours. With BTRFS the source disk has to be updated for each block that is recreated, so the process was bottlenecked on writing to the 8TB disk. It took 2 days 23 hours to complete. The process involved reading 3,478,031MB and writing 4,405,545MB. The system was live for the process and some cron jobs etc were writing to the array, but in the 12 hours since the rebuild completed the array has had 7,038MB written. So presumably during the rebuild time about 42G of actual data were written to the array, and the other 4.3TB written to the 8TB disk were from the process of copying 3.5TB from it to another device. Iostat reported 645.36 TPS for the duration of the rebuild, which seems like a decent number for a hard drive; during the process iostat reported that the drive had 99%+ of its IO capacity used. While waiting for this to complete I wrote a blog post about storage trends [2]. One thing I didn't mention in that post is that if you are the type of person who checks the rebuild process fifty times a day then that should be counted as part of the cost of using slow storage. If instead of an 8TB disk plus some SSD storage I had used 2*4TB disks and 1*4TB SSD as I had considered doing, then instead of having 3.8TB on one device I would have had about 2.5TB and the reconstruct would probably have taken 2/3 of the time. If I had moved the array to 3*4TB SSDs then it would have taken a small fraction of the time. One thing to note is that I made a mistake in this operation by removing the failed device instead of doing a btrfs replace operation, which can be significantly faster. If I had done this correctly then I would have written a blog post about the rebuild taking 2 days or something; the issues of hard drives being slow and me compulsively checking the progress would still apply.
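For anyone in the same situation, here is a rough sketch of the faster path (device names and mount point are hypothetical, and these are not the commands actually run here):
# replace a failing device in place; -r reads from the source only when no other good copy exists
btrfs replace start -r /dev/sdb /dev/sdc /data
btrfs replace status /data
# (removing the failed device and adding a new one, as done here, instead forces the data
#  to be rebuilt from the remaining copies, which is typically much slower)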

8 April 2023

Evgeni Golov: Running autopkgtest with Docker inside Docker

While I am not the biggest fan of Docker, I must admit it has quite some reach across various service providers and can often be seen as an API for running things in isolated environments. One such service provider is GitHub when it comes to their Actions service. I have no idea what isolation technology GitHub uses on the outside of Actions, but inside you just get an Ubuntu system and can run whatever you want via Docker as that comes pre-installed and pre-configured. This especially means you can run things inside vanilla Debian containers that are free from any GitHub or Canonical modifications one might not want ;-) So, if you want to run, say, lintian from sid, you can define a job to do so:
  lintian:
    runs-on: ubuntu-latest
    container: debian:sid
    steps:
      - [ do something to get a package to run lintian on ]
      - run: apt-get update
      - run: apt-get install -y --no-install-recommends lintian
      - run: lintian --info --display-info *.changes
This will run on Ubuntu (latest right now means 22.04 for GitHub), but then use Docker to run the debian:sid container and execute all further steps inside it. Pretty short and straightforward, right? Now lintian does static analysis of the package; it doesn't need to install it. What if we want to run autopkgtest, which performs tests on an actually installed package? autopkgtest comes with various "virt servers", which provide isolation of the testbed, so that it does not interfere with the host system. The simplest available virt server, autopkgtest-virt-null, doesn't actually provide any isolation, as it runs things directly on the host system. This might seem fine when executed inside an ephemeral container in a CI environment, but it also means that multiple tests have the potential to influence each other as there is no way to revert the testbed to a clean state. For that, there are other, "real", virt servers available: chroot, lxc, qemu, docker and many more. They all have one thing in common: to use them, one needs to somehow provide an "image" (a prepared chroot, a tarball of a chroot, a VM disk, a container... you get it) to operate on, and most either bring a tool to create such an "image" or rely on a "registry" (online repository) to provide them. Most users of autopkgtest on GitHub (that I could find with their terrible search) are using either the null or the lxd virt servers. Probably because these are dead simple to set up (null) or the most "native" (lxd) in the Ubuntu environment. As I wanted to execute multiple tests that for sure would interfere with each other, the null virt server was out of the discussion pretty quickly. The lxd one also felt odd, as that meant I'd need to set up lxd (can be done in a few commands, but still) and it would need to download stuff from Canonical, incurring costs (which I couldn't care less about) and taking time (which I do care about!). Enter autopkgtest-virt-docker, which was recently added to autopkgtest! No need to set things up, as GitHub already did all the Docker setup for me, and almost no waiting time to download the containers, as GitHub does heavy caching of stuff coming from Docker Hub (or at least it feels like that). The only drawback? It was added in autopkgtest 5.23, which Ubuntu 22.04 doesn't have. "We need to go deeper" and run autopkgtest from a sid container! With this idea, our current job definition would look like this:
  autopkgtest:
    runs-on: ubuntu-latest
    container: debian:sid
    steps:
      - [ do something to get a package to run autopkgtest on ]
      - run: apt-get update
      - run: apt-get install -y --no-install-recommends autopkgtest autodep8 docker.io
      - run: autopkgtest *.changes --setup-commands="apt-get update" -- docker debian:sid
(--setup-commands="apt-get update" is needed as the container comes with an empty apt cache and wouldn't be able to find dependencies of the tested package) However, this will fail:
# autopkgtest *.changes --setup-commands="apt-get update" -- docker debian:sid
autopkgtest [10:20:54]: starting date and time: 2023-04-07 10:20:54+0000
autopkgtest [10:20:54]: version 5.28
autopkgtest [10:20:54]: host a82a11789c0d; command line:
  /usr/bin/autopkgtest bley_2.0.0-1_amd64.changes '--setup-commands=apt-get update' -- docker debian:sid
Unexpected error:
Traceback (most recent call last):
  File "/usr/share/autopkgtest/lib/VirtSubproc.py", line 829, in mainloop
    command()
  File "/usr/share/autopkgtest/lib/VirtSubproc.py", line 758, in command
    r = f(c, ce)
        ^^^^^^^^
  File "/usr/share/autopkgtest/lib/VirtSubproc.py", line 692, in cmd_copydown
    copyupdown(c, ce, False)
  File "/usr/share/autopkgtest/lib/VirtSubproc.py", line 580, in copyupdown
    copyupdown_internal(ce[0], c[1:], upp)
  File "/usr/share/autopkgtest/lib/VirtSubproc.py", line 607, in copyupdown_internal
    copydown_shareddir(sd[0], sd[1], dirsp, downtmp_host)
  File "/usr/share/autopkgtest/lib/VirtSubproc.py", line 562, in copydown_shareddir
    shutil.copy(host, host_tmp)
  File "/usr/lib/python3.11/shutil.py", line 419, in copy
    copyfile(src, dst, follow_symlinks=follow_symlinks)
  File "/usr/lib/python3.11/shutil.py", line 258, in copyfile
    with open(dst, 'wb') as fdst:
         ^^^^^^^^^^^^^^^
FileNotFoundError: [Errno 2] No such file or directory: '/tmp/autopkgtest-virt-docker.shared.kn7n9ioe/downtmp/wrapper.sh'
autopkgtest [10:21:07]: ERROR: testbed failure: unexpected eof from the testbed
Running the same thing locally of course works, so there has to be something special about the setup at GitHub. But what?! A bit of digging revealed that autopkgtest-virt-docker tries to use a shared directory (using Dockers --volume) to exchange things with the testbed (for the downtmp-host capability). As my autopkgtest is running inside a container itself, nothing it tells the Docker deamon to mount will be actually visible to it. In retrospect this makes total sense and autopkgtest-virt-docker has a switch to "fix" the issue: --remote as the Docker deamon is technically remote when viewed from the place autopkgtest runs at. I'd argue this is not a bug in autopkgtest(-virt-docker), as the situation is actually cared for. There is even some auto-detection of "remote" daemons in the code, but that doesn't "know" how to detect the case where the daemon socket is mounted (vs being set as an environment variable). I've opened an MR (assume remote docker when running inside docker) which should detect the case of running inside a Docker container which kind of implies the daemon is remote. Not sure the patch will be accepted (it is a band-aid after all), but in the meantime I am quite happy with using --remote and so could you ;-)
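For completeness, the invocation from above with the workaround applied would look something like this (a sketch only; the wildcard matches whatever .changes file the earlier steps produced, and the option placement assumes the usual virt-server argument handling):
# same as before, but tell the docker virt server the daemon is effectively remote
autopkgtest *.changes --setup-commands="apt-get update" -- docker --remote debian:sid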

Russell Coker: Storage Trends 2023

It's been 2 years since my last blog post about storage trends [1]. Minimum Storage <=2TB In 2021 I stated that as MSY had 2TB disks for $72 and 2TB SSDs for $245 it was barely worth considering a 2TB disk, and anything less than 2TB wasn't worth considering. Now for 2TB of storage from MSY, NVMe starts at $129, SATA SSD starts at $143, and hard disks start at $75. I guess that NVMe is slightly cheaper due to some combination of economies of scale for manufacture/sales and lower postage costs. It really doesn't make sense to consider hard disks for storing 2TB or less. For storage for a small system (PC or laptop) the cheapest storage device is $19 for a 128G SATA SSD. But it wouldn't make sense to buy that when you can get a 256G SATA SSD for $22 or a 240G NVMe device for $23; saving $3 on storage wouldn't make any sense. For 512G of storage the prices are $32 for NVMe and $33 for SATA SSD. For 1TB of storage the prices start at $68 for SATA SSD and $74 for NVMe. Probably for the vast majority of home users 1TB of SATA SSD or NVMe is the minimum storage capacity to consider; the $50 price difference isn't much when considering the entire price of a PC or laptop, and anything less than 1TB will run out quickly with modern use. Larger Storage 4TB+ The price for 4TB of storage from MSY is NVMe starting at $349, SATA SSD starting at $369, and hard disks starting at $115. If you need 4TB of RAID-1 storage then for a home user it might be worth saving $470 and getting hard drives. For business use it wouldn't make sense. Some laptops have two NVMe sockets so 8TB of storage (or 4TB of RAID-1) in a laptop would be interesting. For 8TB of storage the MSY prices are SATA SSD for $739 and hard drives starting at $179. Probably hard drives are the best choice for most situations where there is a need to store 8TB or more of data. But the prices are low enough to make an 8TB SSD something that can be considered for home use; it doesn't seem that long ago that the 4TB hard drives I bought for my home server were almost that expensive. Big Storage MSY doesn't have 8TB NVMe; such devices are on eBay for $1700 for regular M.2 NVMe and just under $1000 for U.2 (server hot-swap devices). So if you need more than 8TB of NVMe storage then probably buying a server with U.2 built in is the correct solution. For home users who need more than 8TB of storage, hard drives are a good solution. One issue is that the more affordable and larger drives use Shingled Magnetic Recording (SMR) which has some different performance characteristics for certain workloads. Apparently SMR performs badly for anything other than large file storage. Why MSY? I primarily used MSY prices for this post because they are a reliable local store that has a list of prices that is easy to read. For everything in this post I can get better prices by using eBay, the StaticIce.com.au price comparison site [2], and the computing section of the OzBargain site that gamifies finding good prices [3]. But a good shopping strategy nowadays is to compare prices in a store to determine what items are in your price range and then shop around for the price on the item you want. Checking all the different bargain sites for all these items would take much more time than I want to spend writing a blog post! Conclusion Hard drives don't make sense for the vast majority of systems. Not for laptops, not for typical desktop PCs, and not for small business servers (say 8TB or less of RAID storage).
Hard drives only make sense for dozens or hundreds of TB of storage, and even then working out how to deal with SMR issues is going to increase the pain of deployment. Maybe using a combination of SSDs and hard drives to deal with the SMR issues is going to be a competitive advantage for NAS vendors in future. NVMe looks like it's on the way to being cheaper than SATA SSD. There is likely to be a good market for systems with NVMe as the only internal storage option. The long-term trend of systems without DVD drives and with maybe 2.5 SATA devices but no 3.5 SATA devices seems to lead to the GPU being the major part that needs to fit into a PC case and that determines the overall size. Maybe there will be a new trend of GPUs connected to riser cards so they can sit parallel to the motherboard for compact PCs. For business desktop systems (i.e. low-powered graphics hardware, as it's not for gaming) I expect that the trend will be towards NUC-type devices which are already based around M.2 as the storage device size.

6 April 2023

Reproducible Builds: Reproducible Builds in March 2023

Welcome to the March 2023 report from the Reproducible Builds project. In these reports we outline the most important things that we have been up to over the past month. As a quick recap, the motivation behind the reproducible builds effort is to ensure no malicious flaws have been introduced during the compilation and distribution processes. It does this by ensuring identical results are always generated from a given source, thus allowing multiple third parties to come to a consensus on whether a build was compromised. If you are interested in contributing to the project, please do visit our Contribute page on our website.

News There was progress towards making the Go programming language reproducible this month, with the overall goal being to make the Go binaries distributed by Google and by Arch Linux (and others) bit-for-bit identical. These changes could become part of the upcoming version 1.21 release of Go. An issue in the Go issue tracker (#57120) is being used to follow and record progress on this.
Arnout Engelen updated our website to add and update reproducibility-related links for NixOS to reproducible.nixos.org. [ ]. In addition, Chris Lamb made some cosmetic changes to our presentations and resources page. [ ][ ]
Intel published a guide on how to reproducibly build their Trust Domain Extensions (TDX) firmware. TDX here refers to an Intel technology that combines their existing virtual machine and memory encryption technology with a new kind of virtual machine guest called a Trust Domain. This runs the CPU in a mode that protects the confidentiality of its memory contents and its state from any other software.
A reproducibility-related bug from early 2020 in the GNU GCC compiler has been fixed. The issue was that if GCC was invoked via the as frontend, the -ffile-prefix-map option was being ignored. We were tracking this in Debian via the build_path_captured_in_assembly_objects issue. It has now been fixed and will be reflected in GCC version 13.
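For readers unfamiliar with the option, a minimal illustration of what it does (file names and paths here are hypothetical): it rewrites the local build directory in paths embedded into objects, so the same source produces identical output regardless of where it was built.
# strip the local build path from debug info and assembler output
gcc -ffile-prefix-map=$(pwd)=. -g -c example.c -o example.o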
Holger Levsen will present at foss-north 2023 in April of this year in Gothenburg, Sweden on the topic of Reproducible Builds, the first ten years.
Anthony Andreoli, Anis Lounis, Mourad Debbabi and Aiman Hanna of the Security Research Centre at Concordia University, Montreal published a paper this month entitled On the prevalence of software supply chain attacks: Empirical study and investigative framework:
Software Supply Chain Attacks (SSCAs) typically compromise hosts through trusted but infected software. The intent of this paper is twofold: First, we present an empirical study of the most prominent software supply chain attacks and their characteristics. Second, we propose an investigative framework for identifying, expressing, and evaluating characteristic behaviours of newfound attacks for mitigation and future defense purposes. We hypothesize that these behaviours are statistically malicious, existed in the past, and thus could have been thwarted in modernity through their cementation x-years ago. [ ]

On our mailing list this month:
  • Mattia Rizzolo is asking everyone in the community to save the date for the 2023 Reproducible Builds summit, which will take place between October 31st and November 2nd at Dock Europe in Hamburg, Germany. Separate announcement(s) to follow. [ ]
  • ahojlm posted a message announcing a new project which is the first to offer bootstrappable and verifiable builds without any binary seeds. That is to say, a way of providing a verifiable path towards a trusted software development platform without relying on pre-provided binary code, in order to protect against various forms of compiler backdoors. The project's homepage is hosted on Tor (mirror).

The minutes and logs from our March 2023 IRC meeting have been published. In case you missed this one, our next IRC meeting will take place on Tuesday 25th April at 15:00 UTC on #reproducible-builds on the OFTC network.
And as a Valentine's Day present, Holger Levsen wrote on his blog on 14th February to express his thanks to OSUOSL for their continuous support of reproducible-builds.org. [ ]

Debian Vagrant Cascadian developed an easier setup for testing Debian packages which uses sbuild's unshare mode along with reprotest, our tool for building the same source code twice in different environments and then checking the binaries produced by each build for any differences. [ ]
Over 30 reviews of Debian packages were added, 14 were updated and 7 were removed this month, all adding to our knowledge about identified issues. A number of issues were updated, including Holger Levsen updating build_path_captured_in_assembly_objects to note that it has been fixed for GCC 13 [ ] and Vagrant Cascadian adding new issues to mark packages where the build path is being captured via the Rust toolchain [ ] as well as a new categorisation for where virtual packages have nondeterministic versioned dependencies [ ].

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including: In addition, Vagrant Cascadian filed a bug with a patch to ensure GNU Modula-2 supports the SOURCE_DATE_EPOCH environment variable.
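As a minimal sketch of what honouring SOURCE_DATE_EPOCH looks like in practice (the surrounding shell snippet is illustrative, not taken from the patch): a build script uses the externally supplied timestamp when it is set, rather than the current time.
# use the externally supplied timestamp when present, fall back to the current time otherwise
BUILD_DATE=$(date -u -d "@${SOURCE_DATE_EPOCH:-$(date +%s)}" '+%Y-%m-%dT%H:%M:%SZ')
echo "Built at ${BUILD_DATE}"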

Testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In March, the following changes were made by Holger Levsen:
  • Arch Linux-related changes:
    • Build Arch packages in /tmp/archlinux-ci/$SRCPACKAGE instead of /tmp/$SRCPACKAGE. [ ]
    • Start 2/3 of the builds on the o1 node, the rest on o2. [ ]
    • Add graphs for Arch Linux (and OpenWrt) builds. [ ]
    • Toggle Arch-related builders to debug why a specific node overloaded. [ ][ ][ ][ ]
  • Node health checks:
    • Detect SetuptoolsDeprecationWarning tracebacks in Python builds. [ ]
    • Detect failures to perform chdist calls. [ ][ ]
  • OSUOSL node migration:
    • Install megacli packages that are needed for hardware RAID. [ ][ ]
    • Add health check and maintenance jobs for new nodes. [ ]
    • Add mail config for new nodes. [ ][ ]
    • Handle a node running in the future correctly. [ ][ ]
    • Migrate some nodes to Debian bookworm. [ ]
    • Fix nodes health overview for osuosl3. [ ]
    • Make sure the /srv/workspace directory is owned by the jenkins user. [ ]
    • Use .debian.net names everywhere, except when communicating with the outside world. [ ]
    • Grant fpierret access to a new node. [ ]
    • Update documentation. [ ]
    • Misc migration changes. [ ][ ][ ][ ][ ][ ][ ][ ]
  • Misc changes:
    • Enable fail2ban everywhere and monitor it with munin [ ].
    • Gracefully deal with non-existing Alpine schroots. [ ]
In addition, Roland Clobus is continuing his work on reproducible Debian ISO images:
  • Add/update openQA configuration [ ], and use the actual timestamp for openQA builds [ ].
  • Moved adding the user to the docker group from the janitor_setup_worker script to the (more general) update_jdn.sh script. [ ]
  • Use the (short-term) reproducible source when generating live-build images. [ ]

diffoscope development diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats as well. This month, Mattia Rizzolo released version 238, and Chris Lamb released versions 239 and 240. Chris Lamb also made the following changes:
  • Fix compatibility with PyPDF 3.x, and correctly restore test data. [ ]
  • Rework PDF annotation handling into a separate method. [ ]
In addition, Holger Levsen performed a long-overdue overhaul of the Lintian overrides in the Debian packaging [ ][ ][ ][ ], and Mattia Rizzolo updated the packaging to silence an include_package_data=True [ ], fixed the build under Debian bullseye [ ], fixed a tool name in a list of tools permitted to be absent during package build tests [ ] as well as documented sending out an email upon [ ]. In addition, Vagrant Cascadian updated diffoscope in GNU Guix to versions 238 [ ] and 239 [ ]. Vagrant also updated reprotest to version 0.7.23. [ ]
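As a quick usage sketch (the file names here are placeholders), diffoscope can be pointed at the artifacts from two builds of the same package and asked for an HTML report:
# compare two builds and write a human-readable HTML report
diffoscope --html report.html package_1.0-1_amd64.build1.deb package_1.0-1_amd64.build2.deb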

Other development work Bernhard M. Wiedemann published another monthly report about reproducibility within openSUSE.


If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

3 April 2023

Russ Allbery: Review: The Nordic Theory of Everything

Review: The Nordic Theory of Everything, by Anu Partanen
Publisher: Harper
Copyright: 2016
Printing: June 2017
ISBN: 0-06-231656-7
Format: Kindle
Pages: 338
Anu Partanen is a Finnish journalist who immigrated to the United States. The Nordic Theory of Everything, subtitled In Search of a Better Life, is an attempt to explain the merits of Finnish approaches to government and society to a US audience. It was her first book. If you follow US policy discussion at all, you have probably been exposed to many of the ideas in this book. There was a time when the US left was obsessed with comparisons between the US and Nordic countries, and while that obsession has faded somewhat, Nordic social systems are still discussed with envy and treated as a potential model. Many of the topics of this book are therefore predictable: parental leave, vacation, health care, education, happiness, life expectancy, all the things that are far better in Nordic countries than in the United States by essentially every statistical measure available, and which have been much-discussed. Partanen brings two twists to this standard analysis. The first is that this book is part memoir: she fell in love with a US writer and made the decision to move to the US rather than asking him to move to Finland. She therefore experienced the transition between social and government systems first-hand and writes memorably on the resulting surprise, trade-offs, anxiety, and bafflement. The second, which I've not seen previously in this policy debate, is a fascinating argument that Finland is a far more individualistic country than the United States precisely because of its policy differences.
Most people, including myself, assumed that part of what made the United States a great country, and such an exceptional one, was that you could live your life relatively unencumbered by the downside of a traditional, old-fashioned society: dependency on the people you happened to be stuck with. In America you had the liberty to express your individuality and choose your own community. This would allow you to interact with family, neighbors, and fellow citizens on the basis of who you were, rather than on what you were obligated to do or expected to be according to old-fashioned thinking. The longer I lived in America, therefore, and the more places I visited and the more people I met, and the more American I myself became, the more puzzled I grew. For it was exactly those key benefits of modernity (freedom, personal independence, and opportunity) that seemed, from my outsider's perspective, in a thousand small ways to be surprisingly missing from American life today. Amid the anxiety and stress of people's daily lives, those grand ideals were looking more theoretical than actual.
The core of this argument is that the structure of life in the United States essentially coerces dependency on other people: employers, spouses, parents, children, and extended family. Because there is no universally available social support system, those relationships become essential for any hope of a good life, and often for survival. If parents do not heavily manage their children's education, there is a substantial risk of long-lasting damage to the stability and happiness of their life. If children do not care for their elderly parents, they may receive no care at all. Choosing not to get married often means choosing precarity and exhaustion because navigating society without pooling resources with someone else is incredibly difficult.
It was as if America, land of the Hollywood romance, was in practice mired in a premodern time when marriage was, first and foremost, not an expression of love, but rather a logistical and financial pact to help families survive by joining resources.
Partanen contrasts this with what she calls the Nordic theory of love:
What Lars Trägårdh came to understand during his years in the United States was that the overarching ambition of Nordic societies during the course of the twentieth century, and into the twenty-first, has not been to socialize the economy at all, as is often mistakenly assumed. Rather the goal has been to free the individual from all forms of dependency within the family and in civil society: the poor from charity, wives from husbands, adult children from parents, and elderly parents from their children. The express purpose of this freedom is to allow all those human relationships to be unencumbered by ulterior motives and needs, and thus to be entirely free, completely authentic, and driven purely by love.
She sees this as the common theme through most of the policy differences discussed in this book. The Finnish approach is to provide neutral and universal logistical support for most of life's expected challenges: birth, child-rearing, education, health, unemployment, and aging. This relieves other social relations (family, employer, church) of the corrosive strain of dependency and obligation. It also ensures people's basic well-being isn't reliant on accidents of association.
If the United States is so worried about crushing entrepreneurship and innovation, a good place to start would be freeing start-ups and companies from the burdens of babysitting the nation's citizens.
I found this fascinating as a persuasive technique. Partanen embraces the US ideal of individualism and points out that, rather than being collectivist as the US right tends to assume, Finland is better at fostering individualism and independence because the government works to remove unnecessary premodern constraints on individual lives. The reason why so many Americans are anxious and frantic is not a personal failing or bad luck. It's because the US social system is deeply hostile to healthy relationships and individual independence. It demands a constant level of daily problem-solving and crisis management that is profoundly exhausting, nearly impossible to navigate alone, and damaging to the ideal of equal relationships. Whether this line of argument will work is another question, and I'm dubious for reasons that Partanen (probably wisely) avoids. She presents the Finnish approach as a discovery that the US would benefit from, and the US approach as a well-intentioned mistake. I think this is superficially appealing; almost all corners of US political belief at least give lip service to individualism and independence. However, advocates of political change will eventually need to address the fact that many US conservatives see this type of social coercion as an intended feature of society rather than a flaw. This is most obvious when one looks at family relationships. Partanen treats the idea that marriage should be a free choice between equals rather than an economic necessity as self-evident, but there is a significant strain of US political thought that embraces punishing people for not staying within the bounds of a conservative ideal of family. One will often find, primarily but not exclusively among the more religious, a contention that the basic unit of society is the (heterosexual, patriarchal) family, not the individual, and that the suffering of anyone outside that structure is their own fault. Not wanting to get married, be the primary caregiver for one's parents, or abandon a career in order to raise children is treated as malignant selfishness and immorality rather than a personal choice that can be enabled by a modern social system. Here, I think Partanen is accurate to identify the Finnish social system as more modern. It embraces the philosophical concept of modernity, namely that social systems can be improved and social structures are not timeless. This is going to be a hard argument to swallow for those who see the pressure towards forming dependency ties within families as natural, and societal efforts to relieve those pressures as government meddling. In that intellectual framework, rather than an attempt to improve the quality of life, government logistical support is perceived as hostility to traditional family obligations and an attempt to replace "natural" human ties with "artificial" dependence on government services. Partanen doesn't attempt to have that debate. Two other things struck me in this book. The first is that, in Partanen's presentation, Finns expect high-quality services from their government and work to improve them when they fall short. This sounds like an obvious statement, but I don't think it is in the context of US politics, and neither does Partanen. She devotes a chapter to the topic, subtitled "Go ahead: ask what your country can do for you." This is, to me, one of the most frustrating aspects of US political debate.
Our attitude towards government is almost entirely hostile and negative even among the political corners that would like to see government do more. Failures of government programs are treated as malice, malfeasance, or inherent incompetence: in short, signs the program should never have been attempted, rather than opportunities to learn and improve. Finland had mediocre public schools, decided to make them better, and succeeded. The moment US public schools start deteriorating, we throw much of our effort into encouraging private competition and dismantling the public school system. Partanen doesn't draw this connection, but I see a link between the US desire for market solutions to societal problems and the level of exhaustion and anxiety that is so common in US life. Solving problems by throwing them open to competition is a way of giving up, of saying we have no idea how to improve something and are hoping someone else will figure it out for a profit. Analyzing the failures of an existing system and designing incremental improvements is hard and slow work. Throwing out the system and hoping some corporation will come up with something better is disruptive but easy. When everyone is already overwhelmed by life and devoid of energy to work on complex social problems, it's tempting to give up on compromise and coalition-building and let everyone go their separate ways on their own dime. We cede the essential work of designing a good society to start-ups. This creates a vicious cycle: the resulting market solutions are inevitably gated by wealth and thus precarious and artificially scarce, which in turn creates more anxiety and stress. The short-term energy savings from not having to wrestle with a hard problem is overwhelmed by the long-term cost of having to navigate a complex and adversarial economic relationship. That leads into the last point: schools. There's a lot of discussion here about school quality and design, which I won't review in detail but which is worth reading. What struck me about Partanen's discussion, though, is how easy the Finnish system is to use. Finnish parents just send their kids to the most convenient school and rarely give that a second thought. The critical property is that all the schools are basically fine, and therefore there is no need to place one's child in an exceptional school to ensure they have a good life. It's axiomatic in the US that more choice is better. This is a constant refrain in our political discussion around schools: parental choice, parental control, options, decisions, permission, matching children to schools tailored for their needs. Those choices are almost entirely absent in Finland, at least in Partanen's description, and the amount of mental and emotional energy this saves is astonishing. Parents simply don't think about this, and everything is fine. I think we dramatically underestimate the negative effects of constantly having to make difficult decisions with significant consequences, and drastically overstate the benefits of having every aspect of life be full of major decision points. To let go of that attempt at control, however illusory, people have to believe in a baseline of quality that makes the choice less fraught. That's precisely what Finland provides by expecting high-quality social services and working to fix them when they fall short, an effort that the United States has by and large abandoned. 
A lot of non-fiction books could be turned into long articles without losing much substance, and I think The Nordic Theory of Everything falls partly into that trap. Partanen repeats the same ideas from several different angles, and the book felt a bit padded towards the end. If you're already familiar with the policy comparisons between the US and Nordic countries, you will have seen a lot of this before, and the book bogs down when Partanen strays too far from memoir and personal reactions. But the focus on individualism and eliminating dependency is new, at least to me, and is such an illuminating way to look at the contrast that I think the book is worth reading just for that. Rating: 7 out of 10

Matt Brown: Retrospective: Mar 2023

The key decision I made mid-March was to commit to pursuing ventilation monitoring as my primary product development focus. Prior to that decision, I hoped to use my writing plan to drive a breadth-first survey of the opportunities for each of my product ideas before deciding which had the best business potential to focus on first. Two factors changed my mind:
  1. As noted last month, I'm finding the writing process much slower and harder than I expected - the survey across all the ideas may not complete until mid-year or later!
  2. I've realised that having begun building co2mon.nz last year, to stop work on the project at this point would leave me feeling that I had not done justice to developing the product and testing the market - seeing it to a conclusion is important to me.
This decision is an explicit choice to prioritize seeing a project through to a conclusion (successful, or otherwise) regardless of whether or not it has the highest potential of the various ideas I could invest time into. I'm comfortable making that trade-off in this instance, but I am going to bound my time investment to two months. I'll evaluate at the end of May whether I'm seeing sufficient traction and potential to justify continuing further with the idea. I had only one fully uninterrupted work week in March due to a combination of days out due to school trips, two LandSAR call outs and various farm maintenance tasks. April will be similarly disrupted given school holidays and a planned family trip to Brisbane. Sharpening my focus feels particularly necessary given this reality to ensure I'm not spread overly thin.

Goal Scoring See last month's retrospective for a refresher on my scoring methodology.

Consulting - 4/10 Goal: Execute a series of successful consulting engagements, building a reputation for myself and leaving happy customers willing to provide testimonials that support a pipeline of future opportunities. Consulting hours were down from February, hitting only 31% of target this month as the client didn't make use of all the hours I had allocated for them. I didn't invest any time in advertising my services or developing new clients or projects over the month, which will now become a priority for April.

Product Development - 4/10 Goal: Grow my product development skill set by taking several ideas to MVP stage with customer feedback received, and launch at least one product which generates revenue and has growth potential. With the new focus entirely on co2mon.nz, I spent a lot of time re-working and developing my thinking around how I want to take this forward, specifically trying to analyse where I saw an opportunity in the market. After attending a workshop on finding product market fit using quantifiable metrics at the Southern SaaS conference this month, I've realised that much of the time I spent on this analysis is too insular and focused on my own observations - I need to get out and talk to a lot more people and get more feedback on their needs and understanding of the space instead. Seems obvious in retrospect! I also spent a few days beginning to build another batch of prototype CO2 monitors so I have some units to use for experimentation and testing with potential customers as I get out and have those conversations. I can probably build one or two more batches of prototype monitors before needing to look at PCB assembly in earnest.

Professional Network Development - 8/10 Goal: To build a professional relationship with at least 30 new people this year. This goal continues to be my highlight with 8 new contacts added this month and catch-ups with 4 existing people I had not spoken to for a while. I joined the KiwiSaaS central community and attended the SouthernSaas conference this month as well, which has been time well spent given the workshop learnings discussed above.

Writing - 3/10 Goal: To publish a high-quality piece of writing on this site at least once a week. I published a single post, the first half of my updated ventilation monitoring business plan. I continue to find the writing process much harder and slower than I hoped or expected and remain well below my target publishing rate, but one post is better than zero! I tested working with an editor I contracted via UpWork who provided some very useful feedback on the structure of my writing which helped to unblock some of my progress. I plan to continue doing this for at least a few more posts.

Community - 5/10 Goal: To support the growth of my local technical community by volunteering my experience and knowledge with others through activities such as mentoring, conference talks and similar.

Feedback As always, I'd love to hear from you if you have thoughts or feedback triggered by anything I've written above.

30 March 2023

Jonathan McDowell: Buttering up my storage

(TL;DR: I've been trying out btrfs in some places instead of ext4, I've hit absolutely zero issues and there are a few features that make me plan to use it more.) Despite (or perhaps because of) working on storage products for a reasonable chunk of my career I have tended towards a conservative approach to my filesystems. By the time I came to Linux ext2 was well established, the move to ext3 was a logical one (the joys of added journalling for faster recovery after unclean shutdowns) and for a long time my default stack has been MD raid with LVM2 on top and then ext4 as the filesystem. I've dabbled with other filesystems; I ran XFS for a while on my VDR machine, and also when I had a large tradspool with INN, but never really had a hard requirement for it. I've ended up adminning a machine that had JFS in the past, largely for historical reasons, but don't really remember any issues (vague recollections of NFS problems but that might just have been NFS being NFS). However. ZFS has gathered itself a significant fan base and that makes me wonder about what it can offer and whether I want that. Firstly, let's be clear that I'm never going to run a primary filesystem that isn't part of the mainline kernel. So ZFS itself is out, because I run Linux. So what do I want that I can't get with ext4? Firstly, I'd like data checksumming. As storage gets larger there's a bigger chance of silent data corruption and while I have backups of the important stuff that doesn't help if you don't know you need to use them. Secondly, these days I have machines running containers, VMs, or with lots of source checkouts with a reasonable amount of overlap in their data. Disk space has got cheaper, but I'd still like to be able to do some sort of deduplication of common blocks. So, I've been trying out btrfs. When I installed my desktop I went with btrfs for / and /home (I kept /boot as ext4). The thought process was that this was a local machine (so easy access if it all went wrong) and I take regular backups (so if it all went wrong I could recover). That was a year and a half ago and it's been pretty dull; I mostly forget I'm running btrfs instead of ext4. This is on a machine that tracks Debian testing, so currently on kernel 6.1 but originally installed with 5.10. So it seems modern btrfs is reasonably stable for a machine that isn't driven especially hard. Good start. The fact I forget what filesystem I'm running points to the fact that I'm not actually doing anything special here. I get the advantage of data checksumming, but not much else. 2 things spring to mind. Firstly, I don't do snapshots. Given I run testing it might be wiser if I did take a snapshot before every apt-get upgrade, and I have a friend who does just that, but even when I've run unstable I've never had a machine get itself into a state that I couldn't recover so I haven't spent time investigating. I note Ubuntu has apt-btrfs-snapshot but it doesn't seem to have any updates for years. The other thing I didn't do when I installed my desktop is take advantage of subvolumes. I'm still trying to get my head around exactly what I want them for, but they provide a partial replacement for LVM when it comes to carving up disk space. Instead of the separate / and /home LVs I created I could have created a single LV that would have a single btrfs filesystem on it. / and /home would then be separate subvolumes, allowing me to snapshot each individually.
Quotas can also be applied separately so there's still the potential to prevent one subvolume taking all available space. Encouraged by the lack of hassle with my desktop I decided to try moving my sbuild machine over to use btrfs for its build chroots. For Reasons this is a VM kindly hosted by a friend, rather than something local. To be honest these days I would probably go for local hosting, but it works and there's no strong reason to move. The point is it's remote, and so if migrating went wrong and I had to ask for assistance I'd be bothering someone who's doing me a favour as it is. The build VM is, of course, running LVM, and there was luckily some free space available. I'm reasonably sure the underlying storage involves spinning rust, so I did a laborious set of pvmove commands to make sure all the available space was at the start of the PV, and created a new btrfs volume there. I was advised that while btrfs-convert would do the job it was better to create a fresh filesystem where possible. This time I did create an initial root subvolume. Configuring up sbuild was then much simpler than I'd expected. My setup originally started out as a set of tarballs for the chroots that would get untarred + used for the builds, which is pretty slow. Once overlayfs was mature enough I switched to that. I'd had a conversation with Enrico about his nspawn/btrfs setup, but it turned out Russ Allbery had written an excellent set of instructions on sbuild with btrfs. I tweaked my existing setup based on his details, and I was in business. Each chroot is a separate subvolume - I don't actually end up having to mount them individually, but it means that only the chroot in use gets snapshotted. For example during a build the following can be observed:
# btrfs subvolume list /
ID 257 gen 111534 top level 5 path root
ID 271 gen 111525 top level 257 path srv/chroot/unstable-amd64-sbuild
ID 275 gen 27873 top level 257 path srv/chroot/bullseye-amd64-sbuild
ID 276 gen 27873 top level 257 path srv/chroot/buster-amd64-sbuild
ID 343 gen 111533 top level 257 path srv/chroot/snapshots/unstable-amd64-sbuild-328059a0-e74b-4d9f-be70-24b59ccba121
I was a little confused about whether I'd got something wrong because the snapshot top level is listed as 257 rather than 271, but digging further with btrfs subvolume show on the 2 mounted directories correctly showed the snapshot had a parent equal to the chroot, not /. As a final step I ran jdupes via jdupes -1Br / to deduplicate things across the filesystem. It didn't end up providing a significant saving unfortunately - I guess there's a reasonable amount of change between Debian releases - but I then tried it on my desktop, which tends to have a large number of similar source trees checked out. There I managed to save about 5% on /home, which didn't seem too shabby. The sbuild setup has been in place for a couple of months now, and I've run quite a few builds on it while preparing for the freeze. So I'm fairly confident in the stability of the setup and my next move is to transition my local house server over to btrfs for its containers (which all run under systemd-nspawn). Those are generally running a Debian stable base so there should be a decent amount of commonality for deduping. I'm not saying I'm yet at the point where I'll default to btrfs on new installs, but I'm definitely looking at it for situations where I think I can get benefits from deduplication, or being able to divide up disk space without hard partitioning space. (And, just to answer the worry I had when I started, I've got nowhere near ENOSPC problems, but I believe they're handled much more gracefully these days. And my experience of ZFS when it got above 90% utilization was far from ideal too.)
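For anyone wanting to reproduce the general shape of this setup, here is a minimal sketch (the paths and size are illustrative rather than my exact layout):
# one subvolume per chroot, snapshot only the one in use for a build
btrfs subvolume create /srv/chroot/unstable-amd64-sbuild
btrfs subvolume snapshot /srv/chroot/unstable-amd64-sbuild /srv/chroot/snapshots/unstable-amd64-sbuild-build1
# optionally cap how much space a subvolume may consume
btrfs quota enable /
btrfs qgroup limit 50G /srv/chroot/unstable-amd64-sbuild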

28 March 2023

Shirish Agarwal: Three Ancient Civilizations, Tesla, IMU and other things.

Three Ancient Civilizations I have been reading books for a long time but somehow I don't know how or when I realized that there are three ancient civilizations that time and again seem to fascinate the authors, whether American or European. The three civilizations that do get mentioned every now and then are the Egyptian, Greek and Roman, and I have no clue as to why. Another two civilizations closely follow them, the Mesopotamian and Sumerian civilizations. Why most of the authors, irrespective of genre, are mystified by these 5 civilizations is beyond me, but they also conjure a lot of imagination. Now if I was on the right side of 40, maybe 22-25 onwards, and had the means or the opportunity or both, I would have gone and tried to learn as much as I could about these various civilizations. There is still a lot of enigma attached to them and it seems that the official explanations of how these various civilizations ended are too good to be true or whatever. One of the more interesting points has been how Greek mythology was subverted and flipped to make Roman mythology. Apparently, many Greek gods have a Roman aspect and the qualities are opposite to the Greek aspect. I haven't learned the reason why it is so, yet. Apart from the Iziko Natural Museum in South Africa, which did have its share of wonders (the whole South African experience was surreal), nowhere else have I witnessed stuff from the above. The whole thing seemed just surreal. I could go on, but I don't know what is real and what is mythology when it comes to various cultures, including whatever little I experienced in Africa.

Recent History Having said the above, I also find that many Indians somehow either do not know or are not interested in understanding recent history. I am talking of the time between the 1800s and the 1950s. British apologist Mr. William Dalrymple, in his book The Anarchy, has shared how the East India Company looted India and gave less than half back to the British. So where did the rest of the money go? It basically went to the tax-free havens around the UK. That tale is very similar to how Axis gold was used to build an entity called Switzerland from scratch. There have been books, as well as a couple of miniseries, that document that fact. But for me this is not the real story; this is more of a side dish per se. The real story perhaps is how the EU peace project was born. Because of Hitler's rise and the way World War 2 happened, the Allies knew that they couldn't subjugate Germany through another humiliation as they had after World War 1 - who knows, another Hitler might come. So while they did fine the Germans, they also helped them via the Marshall Plan. Germany also realized that it had been warring with other European countries for over a thousand years, as well as quite a few other countries. The first sort of treaty in that regard was the Western Union alliance. On the face of it, it was a defense treaty between sovereign powers. Interestingly, as can be seen, the UK at that point in time wanted more countries to be part of the Treaty of Dunkirk. This was quickly followed two years later by the Treaty of London, which made way for the Council of Europe to be formed. The reason I am sharing this is because a lot of Indians whom I meet on SM do not know that the UK was instrumental, front and center, in the formation of the European Union (EU). Also they perhaps don't realize that after World War 2 the UK was greatly diminished both financially and militarily. That is the reason it gave back its erstwhile colonies, including India and other countries. If the UK hadn't become a part of the EU they would have been called the poor man of Europe, as they had been called a few hundred years ago. In fact, after Brexit, the UK is the only nation that has fallen on dire times as much as it has. They have practically no food, supermarkets running out of veggies and whatnot, and electricity sky-high as they don't want to curtail the profits of the gas companies, which in turn donate money to Tory coffers. Sadly, most of my brethren do not know that, hence I have had to share it. The EU became a power in its own right due to the gravity model of trade. The U.S. is attempting the same thing and calls it near off-shoring.

Tesla, Investor Day Presentation, Toyota and Free Software The above brings us nicely to the Tesla Investor Day that happened a couple of weeks back. The biggest news though was broken just 24 hours earlier, when the governor of Monterrey shared how the new Giga Mexico would be happening and shared some site photographs. There have been bits of news on that off and on since then. Toyota meanwhile has been putting out a lot of anti-EV posters and whatnot. In fact, India seems to be mirroring Japan in a lot of ways; the only difference is that Japan is still economically superior to India. I do sincerely hope that at least Japan can get out of its lost decades. I have asked privately if any of the Japanese translators would be willing to translate the anti-EV poster so we may all know.
Anti-EV poster by Toyota

Urinal outside UK Embassy About a week back a few people chanting Khalistan slogans entered the Indian Embassy and showed the flags. This was in the UK. Now, in a retaliatory move against a friendly nation, we want to build a urinal outside the UK embassy, after making a security downgrade for the embassy. The pettiness being shown by the Indian Govt. is striking, especially when it has very few friends. There are also plans to do the same for the U.S. and other embassies as well. I do not know when we will get common sense.

Holi Just a few weeks back, Holi happened. It used to be a sweet and innocent festival. But over the last few years, I have been hearing about and seeing sexual harassment on the rise. In fact, I saw quite a few lewd posts directed at women on or around Holi, and also videos of the same. One of the most shameful incidents occurred with a Japanese tourist. She was not only sexually molested but also terrorized with the threat that if she were to report it she could be raped and murdered. She promptly left Delhi. Once it was reported in the mass media, Delhi Police tried to show it was doing something. FWIW, Delhi's crime-against-women stats have been at a record high and convictions at a record low.

Ecommerce Rules being changed, logistics gonna be tough. Just today there have been changes in Ecommerce rules (again) and this is gonna be a pain for almost all companies big and small with the exception of Adani and Ambani. Almost all players including the Tatas have called them out. Of course, all such laws have been passed without debate. In such a scenario, small startups like these cannot hope to grow their business.

Inertial Measurement Unit or Full Body Tracking with cheap hardware. Apparently, a whole host of companies are looking at 3-D tracking using cheap hardware known as IMU sensors. Sooner rather than later, cheap 3-D glasses and IMU sensors should explode the 3-D market worldwide. There is a huge potential and upside to it, and it will probably overtake smartphones as well. But then the danger will be of our thoughts, ideas, nightmares etc. being shared without our consent. The more we tap into a virtual world, what stops anybody from tapping into our brains and practically stealing our identity in more than one way? I dunno if there are any Debian people or FSF projects working on the above. Even laws can only do so much; until and unless there are alternative places and ways, it would be difficult, to say the least.
