arduino
was in Debian.
But it turns out that the Debian package s version doesn t support the DigiSpark. (AFAICT from the list it offered me, I m not sure it supports any ATTiny85 board.) Also, disturbingly, its board manager seemed to be offering to install board support, suggesting it would download stuff from the internet and run it. That wouldn t be acceptable for my main laptop.
I didn t expect to be doing much programming or debugging, and the project didn t have significant security requirements: the chip, in my circuit, has only a very narrow ability do anything to the real world, and no network connection of any kind. So I thought it would be tolerable to do the project on my low-security video laptop . That s the machine where I m prepared to say yes to installing random software off the internet.
So I went to the upstream Arduino site and downloaded a tarball containing the Arduino IDE. After unpacking that in /opt
it ran and produced a pointy-clicky IDE, as expected. I had already found a 3rd-party tutorial saying I needed to add a magic URL (from the DigiSpark s vendor) in the preferences. That indeed allowed it to download a whole pile of stuff. Compilers, bootloader clients, god knows what.
However, my tiny test program didn t make it to the board. Half-buried in a too-small window was an error message about the board s bootloader ( Micronucleus ) being too new.
The boards I had came pre-flashed with micronucleus 2.2. Which is hardly new, But even so the official Arduino IDE (or maybe the DigiSpark s board package?) still contains an old version. So now we have all the downsides of curl bash
-ware, but we re lacking the it s up to date and it just works upsides.
Further digging found some random forum posts which suggested simply downloading a newer micronucleus and manually stuffing it into the right place: one overwrites a specific file, in the middle the heaps of stuff that the Arduino IDE s board support downloader squirrels away in your home directory. (In my case, the home directory of the untrusted shared user on the video laptop,)
So, whatever . I did that. And it worked!
Having demo d my ability to run code on the board, I set about writing my program.
Writing C again
The programming language offered via the Arduino IDE is C.
It s been a little while since I started a new thing in C. After having spent so much of the last several years writing Rust. C s primitiveness quickly started to grate, and the program couldn t easily be as DRY as I wanted (Don t Repeat Yourself, see Wilson et al, 2012, 4, p.6). But, I carried on; after all, this was going to be quite a small job.
Soon enough I had a program that looked right and compiled.
Before testing it in circuit, I wanted to do some QA. So I wrote a simulator harness that #include
d my Arduino source file, and provided imitations of the few Arduino library calls my program used. As an side advantage, I could build and run the simulation on my main machine, in my normal development environment (Emacs, make
, etc.). The simulator runs confirmed the correct behaviour. (Perhaps there would have been some more faithful simulation tool, but the Arduino IDE didn t seem to offer it, and I wasn t inclined to go further down that kind of path.)
So I got the video laptop out, and used the Arduino IDE to flash the program. It didn t run properly. It hung almost immediately. Some very ad-hoc debugging via led-blinking (like printf debugging, only much worse) convinced me that my problem was as follows:
Arduino C has 16-bit ints. My test harness was on my 64-bit Linux machine. C was autoconverting things (when building for the micrcocontroller). The way the Arduino IDE ran the compiler didn t pass the warning options necessary to spot narrowing implicit conversions. Those warnings aren t the default in C in general javax.jmdns
, with hex DNS packet dumps. WTF.
Other things that were vexing about the Arduino IDE: it has fairly fixed notions (which don t seem to be documented) about how your files and directories ought to be laid out, and magical machinery for finding things you put nearby its sketch (as it calls them) and sticking them in its ear, causing lossage. It has a tendency to become confused if you edit files under its feet (e.g. with git checkout
). It wasn t really very suited to a workflow where principal development occurs elsewhere.
And, important settings such as the project s clock speed, or even the target board, or the compiler warning settings to use weren t stored in the project directory along with the actual code. I didn t look too hard, but I presume they must be in a dotfile somewhere. This is madness.
Apparently there is an Arduino CLI too. But I was already quite exasperated, and I didn t like the idea of going so far off the beaten path, when the whole point of using all this was to stay with popular tooling and share fate with others. (How do these others cope? I have no idea.)
As for the integer overflow bug:
I didn t seriously consider trying to figure out how to control in detail the C compiler options passed by the Arduino IDE. (Perhaps this is possible, but not really documented?) I did consider trying to run a cross-compiler myself from the command line, with appropriate warning options, but that would have involved providing (or stubbing, again) the Arduino/DigiSpark libraries (and bugs could easily lurk at that interface).
Instead, I thought, if only I had written the thing in Rust . But that wasn t possible, was it? Does Rust even support this board?
Rust on the DigiSpark
I did a cursory web search and found a very useful blog post by Dylan Garrett.
This encouraged me to think it might be a workable strategy. I looked at the instructions there. It seemed like I could run them via the privsep arrangement I use to protect myself when developing using upstream cargo packages from crates.io
.
I got surprisingly far surprisingly quickly. It did, rather startlingly, cause my rustup
to download a random recent Nightly Rust, but I have six of those already for other Reasons. Very quickly I got the trinket LED blink example, referenced by Dylan s blog post, to compile. Manually copying the file to the video laptop allowed me to run the previously-downloaded micronucleus executable and successfully run the blink example on my board!
I thought a more principled approach to the bootloader client might allow a more convenient workflow. I found the upstream Micronucleus git releases and tags, and had a look over its source code, release dates, etc. It seemed plausible, so I compiled v2.6 from source. That was a success: now I could build and install a Rust program onto my board, from the command line, on my main machine. No more pratting about with the video laptop.
I had got further, more quickly, with Rust, than with the Arduino IDE, and the outcome and workflow was superior.
So, basking in my success, I copied the directory containing the example into my own project, renamed it, and adjusted the path
references.
That didn t work. Now it didn t build. Even after I copied about .cargo/config.toml
and rust-toolchain.toml
it didn t build, producing a variety of exciting messages, depending what precisely I tried. I don t have detailed logs of my flailing: the instructions say to build it by cd
ing to the subdirectory, and, given that what I was trying to do was to not follow those instructions, it didn t seem sensible to try to prepare a proper repro so I could file a ticket. I wasn t optimistic about investigating it more deeply myself: I have some experience of fighting cargo, and it s not usually fun. Looking at some of the build control files, things seemed quite complicated.
Additionally, not all of the crates are on crates.io
. I have no idea why not. So, I would need to supply local copies of them anyway. I decided to just git subtree add
the avr-hal
git tree.
(That seemed better than the approach taken by the avr-hal project s cargo template, since that template involve a cargo dependency on a foreign git
repository. Perhaps it would be possible to turn them into path
dependencies, but given that I had evidence of file-location-sensitive behaviour, which I didn t feel like I wanted to spend time investigating, using that seems like it would possibly have invited more trouble. Also, I don t like package templates very much. They re a form of clone-and-hack: you end up stuck with whatever bugs or oddities exist in the version of the template which was current when you started.)
Since I couldn t get things to build outside avr-hal
, I edited the example, within avr-hal
, to refer to my (one) program.rs
file outside avr-hal
, with a #[path]
instruction. That s not pretty, but it worked.
I also had to write a nasty shell script to work around the lack of good support in my nailing-cargo
privsep tool for builds where cargo
must be invoked in a deep subdirectory, and/or Cargo.lock
isn t where it expects, and/or the target
directory containing build products is in a weird place. It also has to filter the output from cargo
to adjust the pathnames in the error messages. Otherwise, running both cd A; cargo build
and cd B; cargo build
from a Makefile
produces confusing sets of error messages, some of which contain filenames relative to A
and some relative to B
, making it impossible for my Emacs to reliably find the right file.
RIIR (Rewrite It In Rust)
Having got my build tooling sorted out I could go back to my actual program.
I translated the main program, and the simulator, from C to Rust, more or less line-by-line. I made the Rust version of the simulator produce the same output format as the C one. That let me check that the two programs had the same (simulated) behaviour. Which they did (after fixing a few glitches in the simulator log formatting).
Emboldened, I flashed the Rust version of my program to the DigiSpark. It worked right away!
RIIR had caused the bug to vanish. Of course, to rewrite the program in Rust, and get it to compile, it was necessary to be careful about the types of all the various integers, so that s not so surprising. Indeed, it was the point. I was then able to refactor the program to be a bit more natural and DRY, and improve some internal interfaces. Rust s greater power, compared to C, made those cleanups easier, so making them worthwhile.
However, when doing real-world testing I found a weird problem: my timings were off. Measured, the real program was too fast by a factor of slightly more than 2. A bit of searching (and searching my memory) revealed the cause: I was using a board template for an Adafruit Trinket. The Trinket has a clock speed of 8MHz. But the DigiSpark runs at 16.5MHz. (This is discussed in a ticket against one of the C/C++ libraries supporting the ATTiny85 chip.)
The Arduino IDE had offered me a choice of clock speeds. I have no idea how that dropdown menu took effect; I suspect it was adding prelude code to adjust the clock prescaler. But my attempts to mess with the CPU clock prescaler register by hand at the start of my Rust program didn t bear fruit.
So instead, I adopted a bodge: since my code has (for code structure reasons, amongst others) only one place where it dealt with the underlying hardware s notion of time, I simply changed my delay
function to adjust the passed-in delay values, compensating for the wrong clock speed.
There was probably a more principled way. For example I could have (re)based my work on either of the two unmerged open MRs which added proper support for the DigiSpark board, rather than abusing the Adafruit Trinket definition. But, having a nearly-working setup, and an explanation for the behaviour, I preferred the narrower fix to reopening any cans of worms.
An offer of help
As will be obvious from this posting, I m not an expert in dev tools for embedded systems. Far from it. This area seems like quite a deep swamp, and I m probably not the person to help drain it. (Frankly, much of the improvement work ought to be done, and paid for, by hardware vendors.)
But, as a full Member of the Debian Project, I have considerable gatekeeping authority there. I also have much experience of software packaging, build systems, and release management. If anyone wants to try to improve the situation with embedded tooling in Debian, and is willing to do the actual packaging work. I would be happy to advise, and to review and sponsor your contributions.
An obvious candidate: it seems to me that micronucleus
could easily be in Debian. Possibly a DigiSpark board definition could be provided to go with the arduino
package.
Unfortunately, IMO Debian s Rust packaging tooling and workflows are very poor, and the first of my suggestions for improvement wasn t well received. So if you need help with improving Rust packages in Debian, please talk to the Debian Rust Team yourself.
Conclusions
Embedded programming is still rather a mess and probably always will be.
Embedded build systems can be bizarre. Documentation is scant. You re often expected to download board support packages full of mystery binaries, from the board vendor (or others).
Dev tooling is maddening, especially if aimed at novice programmers. You want version control? Hermetic tracking of your project s build and install configuration? Actually to be told by the compiler when you write obvious bugs? You re way off the beaten track.
As ever, Free Software is under-resourced and the maintainers are often busy, or (reasonably) have other things to do with their lives.
All is not lost
Rust can be a significantly better bet than C for embedded software:
The Rust compiler will catch a good proportion of programming errors, and an experienced Rust programmer can arrange (by suitable internal architecture) to catch nearly all of them. When writing for a chip in the middle of some circuit, where debugging involves staring an LED or a multimeter, that s precisely what you want.
Rust embedded dev tooling was, in this case, considerably better. Still quite chaotic and strange, and less mature, perhaps. But: significantly fewer mystery downloads, and significantly less crazy deviations from the language s normal build system. Overall, less bad software supply chain integrity.
The ATTiny85 chip, and the DigiSpark board, served my hardware needs very well. (More about the hardware aspects of this project in a future posting.)Publisher: | Harper Voyage |
Copyright: | August 2022 |
ISBN: | 0-06-302144-7 |
Format: | Kindle |
Pages: | 544 |
ssh hostname
, where hostname is one of your machines (server, laptop, whatever), and as long as hostname
is up, you can reach it, wherever it is, wherever you are.
MulticastInterfaces
section in the config), and avoid telling any of your nodes to connect to a public peer. You can set a list of authorized public keys that can connect to your nodes listening interfaces, which you ll probably want to do. You will probably want to either open up some inbound ports (if you can) or set up a node with a known clearnet IP on a place like a $5/mo VPS to help with NAT traversal (again, setting AllowedPublicKeys
as appropriate). Yggdrasil doesn t allow filtering multicast clients by public key, only by network interface, so that s why we disable broadcast discovery. You can easily enough teach Yggdrasil about static internal LAN IPs of your nodes and have things work that way. (Or, set up an internal gateway node or two, that the clients just connect to when they re local). But fundamentally, you need to put a bit more thought into this with Yggdrasil than with the other tools here, which are closed-only.
Compared to some of the other tools here, Yggdrasil is better about information leakage; nodes only know details, such as clearnet IPs, of directly-connected peers. You can obtain the list of directly-connected peers of any known node in the mesh but that list is the public keys of the directly-connected peers, not the clearnet IPs.
Some of the other tools contain a limited integrated firewall of sorts (with limited ACLs and such). Yggdrasil does not, but is fully compatible with on-host firewalls. I recommend these anyway even with many other tools.
yggdrasilctl debug_remotegetpeers key=ABC...
). I have never experienced a problem with this. Also, since latency is a factor in routing for Yggdrasil, it is highly unlikely that random connections we use are going to be competitive with datacenter peers.
/etc/hosts
and leave it at that, but it s up to you.
/etc/resolv.conf
on Linux and provides resolution of mesh hostnames and some other features. This is a concept that is pretty slick.
It also is a bit flaky on Linux; dueling programs want to write to /etc/resolv.conf
. I can t really say this is entirely Tailscale s fault; they document the problem and some workarounds.
I would love to be able to add custom records to this service; for instance, to override the public IP for a service to use the in-mesh IP. Unfortunately, that s not yet possible. However, MagicDNS can query existing nameservers for certain domains in a split DNS setup.
./nebula-cert sign -name "lighthouse1" -ip "192.168.100.1/24"
./nebula-cert sign -name "laptop" -ip "192.168.100.2/24" -groups "laptop,home,ssh"
./nebula-cert sign -name "server1" -ip "192.168.100.9/24" -groups "servers"
./nebula-cert sign -name "host3" -ip "192.168.100.10/24"
Tool | iperf3 (default) | iperf3 -P 10 | iperf3 -R |
---|---|---|---|
Direct (no VPN) | 2406 | 2406 | 2764 |
Wireguard (kernel) | 1515 | 1566 | 2027 |
Yggdrasil | 892 | 1126 | 1105 |
Tailscale | 950 | 1034 | 1085 |
Tinc | 296 | 300 | 277 |
genkeys
tool that uses more CPU cycles to generate keys that are maximally difficult to collide with.
Tailscale doesn t document their IP assignment algorithm, but I think it is safe to say that the larger subnet you use, the better. If you try to use a /24 for your mesh, it is certainly conceivable that an attacker could remove your trusted node, then just manually add the 240 or so machines it would take to get that IP reassigned. It might be a good idea to use a purely IPv6 mesh with Tailscale to minimize this problem as well.
So, I think the risk is low in the default configurations of both Yggdrasil and Tailscale (certainly lower than with tinc or Zerotier). You can drive the risk even lower with both.
Publisher: | The New Press |
Copyright: | 2022 |
ISBN: | 1-62097-690-0 |
Format: | Kindle |
Pages: | 257 |
hz.tools
will be tagged
#hztools.librtlsdr
to read in
an IQ stream, did some filtering, and played the real valued audio stream via
pulseaudio
. Over 4 years this has slowly grown through persistence, lots of
questions to too many friends to thank (although I will try), and the eternal
patience of my wife hearing about radios nonstop for years into a number
of Go repos that can do quite a bit, and support a handful of radios.
I ve resisted making the repos public not out of embarrassment or a desire to
keep secrets, but rather, an attempt to keep myself free of any maintenance
obligations to users so that I could freely break my own API, add and remove
API surface as I saw fit. The worst case was to have this project feel like
work, and I can t imagine that will happen if I feel frustrated by PRs
that are getting ahead of me solving problems I didn t yet know about, or
bugs I didn t understand the fix for.
As my rate of changes to the most central dependencies has slowed, i ve begun
to entertain the idea of publishing them. After a bit of back and forth, I ve
decided
it s time to make a number of them public,
and to start working on them in the open, as I ve built up a bit of knowledge
in the space, and I and feel confident that the repo doesn t contain overt
lies. That s not to say it doesn t contain lies, but those lies are likely
hidden and lurking in the dark. Beware.
That being said, it shouldn t be a surprise to say I ve not published
everything yet for the same reasons as above. I plan to open repos as the
rate of changes slows and I understand the problems the library solves well
enough or if the project dead ends and I ve stopped learning.
hz.tools/rf
library contains the abstract concept of frequency, and
some very basic helpers to interact with frequency ranges (such as helpers
to deal with frequency ranges, or frequency range math) as well as frequencies
and some very basic conversions (to meters, etc) and parsers (to parse values
like 10MHz
). This ensures that all the hz.tools
libraries have a shared
understanding of Frequencies, a standard way of representing ranges of
Frequencies, and the ability to handle the IO boundary with things like CLI
arguments, JSON or YAML.
The git repo can be found at
github.com/hztools/go-rf, and is
importable as hz.tools/rf.
// Parse a frequency using hz.tools/rf.ParseHz, and print it to stdout.
freq := rf.MustParseHz("-10kHz")
fmt.Printf("Frequency: %s\n", freq+rf.MHz)
// Prints: 'Frequency: 990kHz'
// Return the Intersection between two RF ranges, and print
// it to stdout.
r1 := rf.Range rf.KHz, rf.MHz
r2 := rf.Range rf.Hz(10), rf.KHz * 100
fmt.Printf("Range: %s\n", r1.Intersection(r2))
// Prints: Range: 1000Hz->100kHz
io
idioms
so that this library feels as idiomatic as it can, so that Go builtins interact
with IQ in a way that s possible to reason about, and to avoid reinventing the
wheel by designing new API surface. While some of the API looks (and is even
called) the same thing as a similar function in io
, the implementation is
usually a lot more naive, and may have unexpected sharp edges such as
concurrency issues or performance problems.
The following IQ types are implemented using the sdr.Samples
interface. The
hz.tools/sdr
package contains helpers for conversion between types, and some
basic manipulation of IQ streams.
IQ Format | hz.tools Name | Underlying Go Type |
---|---|---|
Interleaved uint8 (rtl-sdr) | sdr.SamplesU8 |
[][2]uint8 |
Interleaved int8 (hackrf, uhd) | sdr.SamplesI8 |
[][2]int8 |
Interleaved int16 (pluto, uhd) | sdr.SamplesI16 |
[][2]int16 |
Interleaved float32 (airspy, uhd) | sdr.SamplesC64 |
[]complex64 |
SDR | Format | RX/TX | State |
---|---|---|---|
rtl | u8 | RX | Good |
HackRF | i8 | RX/TX | Good |
PlutoSDR | i16 | RX/TX | Good |
rtl kerberos | u8 | RX | Old |
uhd | i16/c64/i8 | RX/TX | Good |
airspyhf | c64 | RX | Exp |
Import | What is it? |
---|---|
hz.tools/sdr | Core IQ types, supporting types and implementations that interact with the byte boundary |
hz.tools/sdr/rtl | sdr.Receiver implementation using librtlsdr . |
hz.tools/sdr/rtl/kerberos | Helpers to enable coherent RX using the Kerberos SDR. |
hz.tools/sdr/rtl/e4k | Helpers to interact with the E4000 RTL-SDR dongle. |
hz.tools/sdr/fft | Interfaces for performing an FFT, which are implemented by other packages. |
hz.tools/sdr/rtltcp | sdr.Receiver implementation for rtl_tcp servers. |
hz.tools/sdr/pluto | sdr.Transceiver implementation for the PlutoSDR using libiio . |
hz.tools/sdr/uhd | sdr.Transceiver implementation for UHD radios, specifically the B210 and B200mini |
hz.tools/sdr/hackrf | sdr.Transceiver implementation for the HackRF using libhackrf . |
hz.tools/sdr/mock | Mock SDR for testing purposes. |
hz.tools/sdr/airspyhf | sdr.Receiver implementation for the AirspyHF+ Discovery with libairspyhf . |
hz.tools/sdr/internal/simd | SIMD helpers for IQ operations, written in Go ASM. This isn t the best to learn from, and it contains pure go implemtnations alongside. |
hz.tools/sdr/stream | Common Reader/Writer helpers that operate on IQ streams. |
hz.tools/fftw
package contains bindings to libfftw3
to implement
the hz.tools/sdr/fft.Planner
type to transform between the time and
frequency domain.
The git repo can be found at
github.com/hztools/go-fftw, and is
importable as hz.tools/fftw.
This is the default throughout most of my codebase, although that default is
only expressed at the leaf package libraries should not be hardcoding the
use of this library in favor of taking an fft.Planner
, unless it s used as
part of testing. There are a bunch of ways to do an FFT out there, things like
clFFT
or a pure-go FFT implementation could be plugged in depending on what s
being solved for.
hz.tools/fm
and hz.tools/am
packages contain demodulators for
AM analog radio, and FM analog radio. This code is a bit old, so it has
a lot of room for cleanup, but it ll do a very basic demodulation of IQ
to audio.
The git repos can be found at
github.com/hztools/go-fm and
github.com/hztools/go-am,
and are importable as
hz.tools/fm and
hz.tools/am.
As a bonus, the hz.tools/fm
package also contains a modulator, which has been
tested on the air and with some of my handheld radios. This code is a bit
old, since the hz.tools/fm
code is effectively the first IQ processing code
I d ever written, but it still runs and I run it from time to time.
// Basic sketch for playing FM radio using a reader stream from
// an SDR or other IQ stream.
bandwidth := 150*rf.KHz
reader, err = stream.ConvertReader(reader, sdr.SampleFormatC64)
if err != nil
...
demod, err := fm.Demodulate(reader, fm.DemodulatorConfig
Deviation: bandwidth / 2,
Downsample: 8, // some value here depending on sample rate
Planner: fftw.Plan,
)
if err != nil
...
speaker, err := pulseaudio.NewWriter(pulseaudio.Config
Format: pulseaudio.SampleFormatFloat32NE,
Rate: demod.SampleRate(),
AppName: "rf",
StreamName: "fm",
Channels: 1,
SinkName: "",
)
if err != nil
...
buf := make([]float32, 1024*64)
for
i, err := demod.Read(buf)
if err != nil
...
if i == 0
panic("...")
if err := speaker.Write(buf[:i]); err != nil
...
hz.tools/rfcap
package is the reference implementation of the
rfcap spec , and is how I store IQ captures
locally, and how I send them across a byte boundary.
The git repo can be found at
github.com/hztools/go-rfcap, and is
importable as hz.tools/rfcap.
If you re interested in storing IQ in a way others can use, the better approach
is to use SigMF rfcap
exists for cases
like using UNIX pipes to move IQ around, through APIs, or when I send
IQ data through an OS socket, to ensure the sample format (and other metadata)
is communicated with it.
rfcap
has a number of limitations, for instance, it can not express a change
in frequency or sample rate during the capture, since the header is fixed at
the beginning of the file.
Barbie No, seriously! If anyone can make a good film about a doll franchise, it's probably Greta Gerwig. Not only was Little Women (2019) more than admirable, the same could be definitely said for Lady Bird (2017). More importantly, I can't help feel she was the real 'Driver' behind Frances Ha (2012), one of the better modern takes on Claudia Weill's revelatory Girlfriends (1978). Still, whenever I remember that Barbie will be a film about a billion-dollar toy and media franchise with a nettlesome history, I recall I rubbished the "Facebook film" that turned into The Social Network (2010). Anyway, the trailer for Barbie is worth watching, if only because it seems like a parody of itself.
Blitz It's difficult to overstate just how important the aerial bombing of London during World War II is crucial to understanding the British psyche, despite it being a constructed phenomenon from the outset. Without wishing to underplay the deaths of over 40,000 civilian deaths, Angus Calder pointed out in the 1990s that the modern mythology surrounding the event "did not evolve spontaneously; it was a propaganda construct directed as much at [then neutral] American opinion as at British." It will therefore be interesting to see how British Grenadian Trinidadian director Steve McQueen addresses a topic so essential to the British self-conception. (Remember the controversy in right-wing circles about the sole Indian soldier in Christopher Nolan's Dunkirk (2017)?) McQueen is perhaps best known for his 12 Years a Slave (2013), but he recently directed a six-part film anthology for the BBC which addressed the realities of post-Empire immigration to Britain, and this leads me to suspect he sees the Blitz and its surrounding mythology with a more critical perspective. But any attempt to complicate the story of World War II will be vigorously opposed in a way that will make the recent hullabaloo surrounding The Crown seem tame. All this is to say that the discourse surrounding this release may be as interesting as the film itself.
Dune, Part II Coming out of the cinema after the first part of Denis Vileneve's adaptation of Dune (2021), I was struck by the conception that it was less of a fresh adaptation of the 1965 novel by Frank Herbert than an attempt to rehabilitate David Lynch's 1984 version and in a broader sense, it was also an attempt to reestablish the primacy of cinema over streaming TV and the myriad of other distractions in our lives. I must admit I'm not a huge fan of the original novel, finding within it a certain prurience regarding hereditary military regimes and writing about them with a certain sense of glee that belies a secret admiration for them... not to mention an eyebrow-raising allegory for the Middle East. Still, Dune, Part II is going to be a fantastic spectacle.
Ferrari It'll be curious to see how this differs substantially from the recent Ford v Ferrari (2019), but given that Michael Mann's Heat (1995) so effectively re-energised the gangster/heist genre, I'm more than willing to kick the tires of this about the founder of the eponymous car manufacturer. I'm in the minority for preferring Mann's Thief (1981) over Heat, in part because the former deals in more abstract themes, so I'd have perhaps prefered to look forward to a more conceptual film from Mann over a story about one specific guy.
How Do You Live There are a few directors one can look forward to watching almost without qualification, and Hayao Miyazaki (My Neighbor Totoro, Kiki's Delivery Service, Princess Mononoke Howl's Moving Castle, etc.) is one of them. And this is especially so given that The Wind Rises (2013) was meant to be the last collaboration between Miyazaki and Studio Ghibli. Let's hope he is able to come out of retirement in another ten years.
Indiana Jones and the Dial of Destiny Given I had a strong dislike of Indiana Jones and the Kingdom of the Crystal Skull (2008), I seriously doubt I will enjoy anything this film has to show me, but with 1981's Raiders of the Lost Ark remaining one of my most treasured films (read my brief homage), I still feel a strong sense of obligation towards the Indiana Jones name, despite it feeling like the copper is being pulled out of the walls of this franchise today.
Kafka I only know Polish filmmaker Agnieszka Holland through her Spoor (2017), an adaptation of Olga Tokarczuk's 2009 eco-crime novel Drive Your Plow Over the Bones of the Dead. I wasn't an unqualified fan of Spoor (nor the book on which it is based), but I am interested in Holland's take on the life of Czech author Franz Kafka, an author enmeshed with twentieth-century art and philosophy, especially that of central Europe. Holland has mentioned she intends to tell the story "as a kind of collage," and I can hope that it is an adventurous take on the over-furrowed biopic genre. Or perhaps Gregor Samsa will awake from uneasy dreams to find himself transformed in his bed into a huge verminous biopic.
The Killer It'll be interesting to see what path David Fincher is taking today, especially after his puzzling and strangely cold Mank (2020) portraying the writing process behind Orson Welles' Citizen Kane (1941). The Killer is said to be a straight-to-Netflix thriller based on the graphic novel about a hired assassin, which makes me think of Fincher's Zodiac (2007), and, of course, Se7en (1995). I'm not as entranced by Fincher as I used to be, but any film with Michael Fassbender and Tilda Swinton (with a score by Trent Reznor) is always going to get my attention.
Killers of the Flower Moon In Killers of the Flower Moon, Martin Scorsese directs an adaptation of a book about the FBI's investigation into a conspiracy to murder Osage tribe members in the early years of the twentieth century in order to deprive them of their oil-rich land. (The only thing more quintessentially American than apple pie is a conspiracy combined with a genocide.) Separate from learning more about this disquieting chapter of American history, I'd love to discover what attracted Scorsese to this particular story: he's one of the few top-level directors who have the ability to lucidly articulate their intentions and motivations.
Napoleon It often strikes me that, despite all of his achievements and fame, it's somehow still possible to claim that Ridley Scott is relatively underrated compared to other directors working at the top level today. Besides that, though, I'm especially interested in this film, not least of all because I just read Tolstoy's War and Peace (read my recent review) and am working my way through the mind-boggling 431-minute Soviet TV adaptation, but also because several auteur filmmakers (including Stanley Kubrick) have tried to make a Napoleon epic and failed.
Oppenheimer In a way, a biopic about the scientist responsible for the atomic bomb and the Manhattan Project seems almost perfect material for Christopher Nolan. He can certainly rely on stars to queue up to be in his movies (Robert Downey Jr., Matt Damon, Kenneth Branagh, etc.), but whilst I'm certain it will be entertaining on many fronts, I fear it will fall into the well-established Nolan mould of yet another single man struggling with obsession, deception and guilt who is trying in vain to balance order and chaos in the world.
The Way of the Wind Marked by philosophical and spiritual overtones, all of Terrence Malick's films are perfumed with themes of transcendence, nature and the inevitable conflict between instinct and reason. My particular favourite is his stunning Days of Heaven (1978), but The Thin Red Line (1998) and A Hidden Life (2019) also touched me ways difficult to relate, and are one of the few films about the Second World War that don't touch off my sensitivity about them (see my remarks about Blitz above). It is therefore somewhat Malickian that his next film will be a biblical drama about the life of Jesus. Given Malick's filmography, I suspect this will be far more subdued than William Wyler's 1959 Ben-Hur and significantly more equivocal in its conviction compared to Paolo Pasolini's ardently progressive The Gospel According to St. Matthew (1964). However, little beyond that can be guessed, and the film may not even appear until 2024 or even 2025.
Zone of Interest I was mesmerised by Jonathan Glazer's Under the Skin (2013), and there is much to admire in his borderline 'revisionist gangster' film Sexy Beast (2000), so I will definitely be on the lookout for this one. The only thing making me hesitate is that Zone of Interest is based on a book by Martin Amis about a romance set inside the Auschwitz concentration camp. I haven't read the book, but Amis has something of a history in his grappling with the history of the twentieth century, and he seems to do it in a way that never sits right with me. But if Paul Verhoeven's Starship Troopers (1997) proves anything at all, it's all in the adaption.
War and Peace (1867) Leo Tolstoy It's strange to think that there is almost no point in reviewing this novel: who hasn't heard of War and Peace? What more could possibly be said about it now? Still, when I was growing up, War and Peace was always the stereotypical example of the 'impossible book', and even start it was, at best, a pointless task, and an act of hubris at worst. And so there surely exists a parallel universe in which I never have and will never will read the book... Nevertheless, let us try to set the scene. Book nine of the novel opens as follows:
On the twelfth of June, 1812, the forces of Western Europe crossed the Russian frontier and war began; that is, an event took place opposed to human reason and to human nature. Millions of men perpetrated against one another such innumerable crimes, frauds, treacheries, thefts, forgeries, issues of false money, burglaries, incendiarisms and murders as in whole centuries are not recorded in the annals of all the law courts of the world, but which those who committed them did not at the time regard as being crimes. What produced this extraordinary occurrence? What were its causes? [ ] The more we try to explain such events in history reasonably, the more unreasonable and incomprehensible they become to us.Set against the backdrop of the Napoleonic Wars and Napoleon's invasion of Russia, War and Peace follows the lives and fates of three aristocratic families: The Rostovs, The Bolkonskys and the Bezukhov's. These characters find themselves situated athwart (or against) history, and all this time, Napoleon is marching ever closer to Moscow. Still, Napoleon himself is essentially just a kind of wallpaper for a diverse set of personal stories touching on love, jealousy, hatred, retribution, naivety, nationalism, stupidity and much much more. As Elif Batuman wrote earlier this year, "the whole premise of the book was that you couldn t explain war without recourse to domesticity and interpersonal relations." The result is that Tolstoy has woven an incredibly intricate web that connects the war, noble families and the everyday Russian people to a degree that is surprising for a book started in 1865. Tolstoy's characters are probably timeless (especially the picaresque adventures and constantly changing thoughts Pierre Bezukhov), and the reader who has any social experience will immediately recognise characters' thoughts and actions. Some of this is at a 'micro' interpersonal level: for instance, take this example from the elegant party that opens the novel:
Each visitor performed the ceremony of greeting this old aunt whom not one of them knew, not one of them wanted to know, and not one of them cared about. The aunt spoke to each of them in the same words, about their health and her own and the health of Her Majesty, who, thank God, was better today. And each visitor, though politeness prevented his showing impatience, left the old woman with a sense of relief at having performed a vexatious duty and did not return to her the whole evening.But then, some of the focus of the observations are at the 'macro' level of the entire continent. This section about cities that feel themselves in danger might suffice as an example:
At the approach of danger, there are always two voices that speak with equal power in the human soul: one very reasonably tells a man to consider the nature of the danger and the means of escaping it; the other, still more reasonably, says that it is too depressing and painful to think of the danger, since it is not in man s power to foresee everything and avert the general course of events, and it is therefore better to disregard what is painful till it comes and to think about what is pleasant. In solitude, a man generally listens to the first voice, but in society to the second.And finally, in his lengthy epilogues, Tolstoy offers us a dissertation on the behaviour of large organisations, much of it through engagingly witty analogies. These epilogues actually turn out to be an oblique and sarcastic commentary on the idiocy of governments and the madness of war in general. Indeed, the thorough dismantling of the 'great man' theory of history is a common theme throughout the book:
During the whole of that period [of 1812], Napoleon, who seems to us to have been the leader of all these movements as the figurehead of a ship may seem to a savage to guide the vessel acted like a child who, holding a couple of strings inside a carriage, thinks he is driving it. [ ] Why do [we] all speak of a military genius ? Is a man a genius who can order bread to be brought up at the right time and say who is to go to the right and who to the left? It is only because military men are invested with pomp and power and crowds of sychophants flatter power, attributing to it qualities of genius it does not possess.Unlike some other readers, I especially enjoyed these diversions into the accounting and workings of history, as well as our narrow-minded way of trying to 'explain' things in a singular way:
When an apple has ripened and falls, why does it fall? Because of its attraction to the earth, because its stalk withers, because it is dried by the sun, because it grows heavier, because the wind shakes it, or because the boy standing below wants to eat it? Nothing is the cause. All this is only the coincidence of conditions in which all vital organic and elemental events occur. And the botanist who finds that the apple falls because the cellular tissue decays and so forth is equally right with the child who stands under the tree and says the apple fell because he wanted to eat it and prayed for it.Given all of these serious asides, I was also not expecting this book to be quite so funny. At the risk of boring the reader with citations, take this sarcastic remark about the ineptness of medicine men:
After his liberation, [Pierre] fell ill and was laid up for three months. He had what the doctors termed 'bilious fever.' But despite the fact that the doctors treated him, bled him and gave him medicines to drink he recovered.There is actually a multitude of remarks that are not entirely complimentary towards Russian medical practice, but they are usually deployed with an eye to the human element involved rather than simply to the detriment of a doctor's reputation "How would the count have borne his dearly loved daughter s illness had he not known that it was costing him a thousand rubles?" Other elements of note include some stunning set literary pieces, such as when Prince Andrei encounters a gnarly oak tree under two different circumstances in his life, and when Nat sha's 'Russian' soul is awakened by the strains of a folk song on the balalaika. Still, despite all of these micro- and macro-level happenings, for a long time I felt that something else was going on in War and Peace. It was difficult to put into words precisely what it was until I came across this passage by E. M. Forster:
After one has read War and Peace for a bit, great chords begin to sound, and we cannot say exactly what struck them. They do not arise from the story [and] they do not come from the episodes nor yet from the characters. They come from the immense area of Russia, over which episodes and characters have been scattered, from the sum-total of bridges and frozen rivers, forests, roads, gardens and fields, which accumulate grandeur and sonority after we have passed them. Many novelists have the feeling for place, [but] very few have the sense of space, and the possession of it ranks high in Tolstoy s divine equipment. Space is the lord of War and Peace, not time.'Space' indeed. Yes, potential readers should note the novel's great length, but the 365 chapters are actually remarkably short, so the sensation of reading it is not in the least overwhelming. And more importantly, once you become familiar with its large cast of characters, it is really not a difficult book to follow, especially when compared to the other Russian classics. My only regret is that it has taken me so long to read this magnificent novel and that I might find it hard to find time to re-read it within the next few years.
Coming Up for Air (1939) George Orwell It wouldn't be a roundup of mine without at least one entry from George Orwell, and, this year, that place is occupied by a book I hadn't haven't read in almost two decades Still, the George Bowling of Coming Up for Air is a middle-aged insurance salesman who lives in a distinctly average English suburban row house with his nuclear family. One day, after winning some money on a bet, he goes back to the village where he grew up in order to fish in a pool he remembers from thirty years before. Less important than the plot, however, is both the well-observed remarks and scathing criticisms that Bowling has of the town he has returned to, combined with an ominous sense of foreboding before the Second World War breaks out. At several times throughout the book, George's placid thoughts about his beloved carp pool are replaced by racing, anxious thoughts that overwhelm his inner peace:
War is coming. In 1941, they say. And there'll be plenty of broken crockery, and little houses ripped open like packing-cases, and the guts of the chartered accountant's clerk plastered over the piano that he's buying on the never-never. But what does that kind of thing matter, anyway? I'll tell you what my stay in Lower Binfield had taught me, and it was this. IT'S ALL GOING TO HAPPEN. All the things you've got at the back of your mind, the things you're terrified of, the things that you tell yourself are just a nightmare or only happen in foreign countries. The bombs, the food-queues, the rubber truncheons, the barbed wire, the coloured shirts, the slogans, the enormous faces, the machine-guns squirting out of bedroom windows. It's all going to happen. I know it - at any rate, I knew it then. There's no escape. Fight against it if you like, or look the other way and pretend not to notice, or grab your spanner and rush out to do a bit of face-smashing along with the others. But there's no way out. It's just something that's got to happen.Already we can hear psychological madness that underpinned the Second World War. Indeed, there is no great story in Coming Up For Air, no wonderfully empathetic characters and no revelations or catharsis, so it is impressive that I was held by the descriptions, observations and nostalgic remembrances about life in modern Lower Binfield, its residents, and how it has changed over the years. It turns out, of course, that George's beloved pool has been filled in with rubbish, and the village has been perverted by modernity beyond recognition. And to cap it off, the principal event of George's holiday in Lower Binfield is an accidental bombing by the British Royal Air Force. Orwell is always good at descriptions of awful food, and this book is no exception:
The frankfurter had a rubber skin, of course, and my temporary teeth weren't much of a fit. I had to do a kind of sawing movement before I could get my teeth through the skin. And then suddenly pop! The thing burst in my mouth like a rotten pear. A sort of horrible soft stuff was oozing all over my tongue. But the taste! For a moment I just couldn't believe it. Then I rolled my tongue around it again and had another try. It was fish! A sausage, a thing calling itself a frankfurter, filled with fish! I got up and walked straight out without touching my coffee. God knows what that might have tasted of.Many other tell-tale elements of Orwell's fictional writing are in attendance in this book as well, albeit worked out somewhat less successfully than elsewhere in his oeuvre. For example, the idea of a physical ailment also serving as a metaphor is present in George's false teeth, embodying his constant preoccupation with his ageing. (Readers may recall Winston Smith's varicose ulcer representing his repressed humanity in Nineteen Eighty-Four). And, of course, we have a prematurely middle-aged protagonist who almost but not quite resembles Orwell himself. Given this and a few other niggles (such as almost all the women being of the typical Orwell 'nagging wife' type), it is not exactly Orwell's magnum opus. But it remains a fascinating historical snapshot of the feeling felt by a vast number of people just prior to the Second World War breaking out, as well as a captivating insight into how the process of nostalgia functions and operates.
Howards End (1910) E. M. Forster Howards End begins with the following sentence:
One may as well begin with Helen s letters to her sister.In fact, "one may as well begin with" my own assumptions about this book instead. I was actually primed to consider Howards End a much more 'Victorian' book: I had just finished Virginia Woolf's Mrs Dalloway and had found her 1925 book at once rather 'modern' but also very much constrained by its time. I must have then unconsciously surmised that a book written 15 years before would be even more inscrutable, and, with its Victorian social mores added on as well, Howards End would probably not undress itself so readily in front of the reader. No doubt there were also the usual expectations about 'the classics' as well. So imagine my surprise when I realised just how inordinately affable and witty Howards End turned out to be. It doesn't have that Wildean shine of humour, of course, but it's a couple of fields over in the English countryside, perhaps abutting the more mordant social satires of the earlier George Orwell novels (see Coming Up for Air above). But now let us return to the story itself. Howards End explores class warfare, conflict and the English character through a tale of three quite different families at the beginning of the twentieth century: the rich Wilcoxes; the gentle & idealistic Schlegels; and the lower-middle class Basts. As the Bloomsbury Group Schlegel sisters desperately try to help the Basts and educate the rich but close-minded Wilcoxes, the three families are drawn ever closer and closer together. Although the whole story does, I suppose, revolve around the house in the title (which is based on the Forster's own childhood home), Howards End is perhaps best described as a comedy of manners or a novel that shows up the hypocrisy of people and society. In fact, it is surprising how little of the story actually takes place in the eponymous house, with the overwhelming majority of the first half of the book taking place in London. But it is perhaps more illuminating to remark that the Howards End of the book is a house that the Wilcoxes who own it at the start of the novel do not really need or want. What I particularly liked about Howards End is how the main character's ideals alter as they age, and subsequently how they find their lives changing in different ways. Some of them find themselves better off at the end, others worse. And whilst it is also surprisingly funny, it still manages to trade in heavier social topics as well. This is apparent in the fact that, although the characters themselves are primarily in charge of their own destinies, their choices are still constrained by the changing world and shifting sense of morality around them. This shouldn't be too surprising: after all, Forster's novel was published just four years before the Great War, a distinctly uncertain time. Not for nothing did Virginia Woolf herself later observe that "on or about December 1910, human character changed" and that "all human relations have shifted: those between masters and servants, husbands and wives, parents and children." This process can undoubtedly be seen rehearsed throughout Forster's Howards End, and it's a credit to the author to be able to capture it so early on, if not even before it was widespread throughout Western Europe. I was also particularly taken by Forster's fertile use of simile. An extremely apposite example can be found in the description Tibby Schlegel gives of his fellow Cambridge undergraduates. Here, Timmy doesn't want to besmirch his lofty idealisation of them with any banal specificities, and wishes that the idea of them remain as ideal Platonic forms instead. Or, as Forster puts it, to Timmy it is if they are "pictures that must not walk out of their frames." Wilde, at his most weakest, is 'just' style, but Forster often deploys his flair for a deeper effect. Indeed, when you get to the end of this section mentioning picture frames, you realise Forster has actually just smuggled into the story a failed attempt on Tibby's part to engineer an anonymous homosexual encounter with another undergraduate. It is a credit to Forster's sleight-of-hand that you don't quite notice what has just happened underneath you and that the books' reticence to honestly describe what has happened is thus structually analogus Tibby's reluctance to admit his desires to himself. Another layer to the character of Tibby (and the novel as a whole) is thereby introduced without the imposition of clumsy literary scaffolding. In a similar vein, I felt very clever noticing the arch reference to Debussy's Pr lude l'apr s-midi d'un faune until I realised I just fell into the trap Forster set for the reader in that I had become even more like Tibby in his pseudo-scholarly views on classical music. Finally, I enjoyed that each chapter commences with an ironic and self-conscious bon mot about society which is only slightly overblown for effect. Particularly amusing are the ironic asides on "women" that run through the book, ventriloquising the narrow-minded views of people like the Wilcoxes. The omniscient and amiable narrator of the book also recalls those ironically distant voiceovers from various French New Wave films at times, yet Forster's narrator seems to have bigger concerns in his mordant asides: Forster seems to encourage some sympathy for all of the characters even the more contemptible ones at their worst moments. Highly recommended, as are Forster's A Room with a View (1908) and his slightly later A Passage to India (1913).
The Good Soldier (1915) Ford Madox Ford The Good Soldier starts off fairly simply as the narrator's account of his and his wife's relationship with some old friends, including the eponymous 'Good Soldier' of the book's title. It's an experience to read the beginning of this novel, as, like any account of endless praise of someone you've never met or care about, the pages of approving remarks about them appear to be intended to wash over you. Yet as the chapters of The Good Soldier go by, the account of the other characters in the book gets darker and darker. Although the author himself is uncritical of others' actions, your own critical faculties are slowgrly brought into play, and you gradully begin to question the narrator's retelling of events. Our narrator is an unreliable narrator in the strict sense of the term, but with the caveat that he is at least is telling us everything we need to know to come to our own conclusions. As the book unfolds further, the narrator's compromised credibility seems to infuse every element of the novel even the 'Good' of the book's title starts to seem like a minor dishonesty, perhaps serving as the inspiration for the irony embedded in the title of The 'Great' Gatsby. Much more effectively, however, the narrator's fixations, distractions and manner of speaking feel very much part of his dissimulation. It sometimes feels like he is unconsciously skirting over the crucial elements in his tale, exactly like one does in real life when recounting a story containing incriminating ingredients. Indeed, just how much the narrator is conscious of his own concealment is just one part of what makes this such an interesting book: Ford Madox Ford has gifted us with enough ambiguity that it is also possible that even the narrator cannot find it within himself to understand the events of the story he is narrating. It was initially hard to believe that such a carefully crafted analysis of a small group of characters could have been written so long ago, and despite being fairly easy to read, The Good Soldier is an almost infinitely subtle book even the jokes are of the subtle kind and will likely get a re-read within the next few years.
Anna Karenina (1878) Leo Tolstoy There are many similar themes running through War and Peace (reviewed above) and Anna Karenina. Unrequited love; a young man struggling to find a purpose in life; a loving family; an overwhelming love of nature and countless fascinating observations about the minuti of Russian society. Indeed, rather than primarily being about the eponymous Anna, Anna Karenina provides a vast panorama of contemporary life in Russia and of humanity in general. Nevertheless, our Anna is a sophisticated woman who abandons her empty existence as the wife of government official Alexei Karenin, a colourless man who has little personality of his own, and she turns to a certain Count Vronsky in order to fulfil her passionate nature. Needless to say, this results in tragic consequences as their (admittedly somewhat qualified) desire to live together crashes against the rocks of reality and Russian society. Parallel to Anna's narrative, though, Konstantin Levin serves as the novel's alter-protagonist. In contrast to Anna, Levin is a socially awkward individual who straddles many schools of thought within Russia at the time: he is neither a free-thinker (nor heavy-drinker) like his brother Nikolai, and neither is he a bookish intellectual like his half-brother Serge. In short, Levin is his own man, and it is generally agreed by commentators that he is Tolstoy's surrogate within the novel. Levin tends to come to his own version of an idea, and he would rather find his own way than adopt any prefabricated view, even if confusion and muddle is the eventual result. In a roughly isomorphic fashion then, he resembles Anna in this particular sense, whose story is a counterpart to Levin's in their respective searches for happiness and self-actualisation. Whilst many of the passionate and exciting passages are told on Anna's side of the story (I'm thinking horse race in particular, as thrilling as anything in cinema ), many of the broader political thoughts about the nature of the working classes are expressed on Levin's side instead. These are stirring and engaging in their own way, though, such as when he joins his peasants to mow the field and seems to enter the nineteenth-century version of 'flow':
The longer Levin mowed, the more often he felt those moments of oblivion during which it was no longer his arms that swung the scythe, but the scythe itself that lent motion to his whole body, full of life and conscious of itself, and, as if by magic, without a thought of it, the work got rightly and neatly done on its own. These were the most blissful moments.Overall, Tolstoy poses no didactic moral message towards any of the characters in Anna Karenina, and merely invites us to watch rather than judge. (Still, there is a hilarious section that is scathing of contemporary classical music, presaging many of the ideas found in Tolstoy's 1897 What is Art?). In addition, just like the earlier War and Peace, the novel is run through with a number of uncannily accurate observations about daily life:
Anna smiled, as one smiles at the weaknesses of people one loves, and, putting her arm under his, accompanied him to the door of the study.... as well as the usual sprinkling of Tolstoy's sardonic humour ("No one is pleased with his fortune, but everyone is pleased with his wit."). Fyodor Dostoyevsky, the other titan of Russian literature, once described Anna Karenina as a "flawless work of art," and if you re only going to read one Tolstoy novel in your life, it should probably be this one.
parted -a optimal -s /dev/sda mklabel gpt
parted -a optimal -s /dev/sda mkpart primary vfat 0% 1G
parted -a optimal -s /dev/sda set 1 esp on
mkfs.vfat -n boot_disk -F 32 /dev/sda1
cryptsetup luksFormat /dev/sda2
cryptsetup luksOpen /dev/sda2 ENC
mkfs.btrfs -L root_disk /dev/mapper/ENC
mount -o compress=zstd:1 /dev/mapper/ENC /mnt
btrfs subvol create /mnt/@
cd /mnt/@
btrfs subvol create ./home
btrfs subvol create ./opt
mkdir -p var
btrfs subvol create ./var/log
btrfs suvol set-default /mnt/@
cd /mnt/
debootstrap --include=dbus,locales,tzdata unstable @/ http://deb.debian.org/debian
umount /mnt
mount -o compress=zstd:1 /dev/mapper/ENC /mnt
mkdir -p /mnt/boot/efi
mount /dev/sda1 /mnt/boot/efi
grml-chroot /mnt /bin/bash
apt-get update
apt-get install linux-image-amd64 dracut openssh-client \
kde-plasma-desktop plasma-workspace-wayland \
plasma-nm cryptsetup btrfs-progs sudo
LABEL="root_disk" / btrfs defaults,compress=zstd:1 0 0And the crypttab entry
ENCRYPTED_ROOT UUID=xxxx none discard,x-initrd.attachI've not written actual UUID above this is just for the purpose of showing the content of /etc/crypttab. Once these entries are added we need to recreate initrd. I just reconfigured the installed kernel package for retriggerring the recreation of initrd using dracut. .. Reconfiguration was locales is done by editing /etc/locales.gen to uncomment en_US.UTF-8 and writing /etc/timezone with Asia/Kolkata. I used DEBIAN_FRONTEND=noninteractive to avoid another prompt asking for locale and timezone information.
export DEBIAN_FRONTEND=noninteractive
dpkg-reconfigure locales
dpkg-reconfigure tzdata
systemd.gpt_auto=no quiet root=LABEL=root_diskI'm disabling systemd-gpt-generator to avoid race condition between crypttab entry and auto generated entry by systemd. I faced this mainly because of my stupidity of not adding entry root=LABEL=root_disk
apt-get install -y systemd-boot
bootctl install --make-entry-directory=yes --entry-token=machine-id
dpkg-reconfigure linux-image-6.0.0-5-amd64
apt install --yes gdisk zfs-dkms zfs zfs-initramfs zfsutils-linux
We also tell DKMS that we need to rebuild the initrd when upgrading:
echo REMAKE_INITRD=yes > /etc/dkms/zfs.conf
/dev/sdc
with:
sgdisk --zap-all /dev/sdc
sgdisk -a1 -n1:24K:+1000K -t1:EF02 /dev/sdc
sgdisk -n2:1M:+512M -t2:EF00 /dev/sdc
sgdisk -n3:0:+1G -t3:BF01 /dev/sdc
sgdisk -n4:0:0 -t4:BF00 /dev/sdc
root@curie:/home/anarcat# sgdisk -p /dev/sdc
Disk /dev/sdc: 1953525168 sectors, 931.5 GiB
Model: ESD-S1C
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): [REDACTED]
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 1953525134
Partitions will be aligned on 16-sector boundaries
Total free space is 14 sectors (7.0 KiB)
Number Start (sector) End (sector) Size Code Name
1 48 2047 1000.0 KiB EF02
2 2048 1050623 512.0 MiB EF00
3 1050624 3147775 1024.0 MiB BF01
4 3147776 1953525134 930.0 GiB BF00
Unfortunately, we can't be sure of the sector size here, because the
USB controller is probably lying to us about it. Normally, this
smartctl
command should tell us the sector size as well:
root@curie:~# smartctl -i /dev/sdb -qnoserial
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-14-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Black Mobile
Device Model: WDC WD10JPLX-00MBPT0
Firmware Version: 01.01H01
User Capacity: 1 000 204 886 016 bytes [1,00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Tue May 17 13:33:04 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Above is the example of the builtin HDD drive. But the SSD device
enclosed in that USB controller doesn't support SMART commands,
so we can't trust that it really has 512 bytes sectors.
This matters because we need to tweak the ashift
value
correctly. We're going to go ahead the SSD drive has the common 4KB
settings, which means ashift=12
.
Note here that we are not creating a separate partition for
swap. Swap on ZFS volumes (AKA "swap on ZVOL") can trigger lockups and
that issue is still not fixed upstream. Ubuntu recommends using a
separate partition for swap instead. But since this is "just" a
workstation, we're betting that we will not suffer from this problem,
after hearing a report from another Debian developer running this
setup on their workstation successfully.
We do not recommend this setup though. In fact, if I were to redo this
partition scheme, I would probably use LUKS encryption and setup a
dedicated swap partition, as I had problems with ZFS encryption as
well.
zpool create \
-o cachefile=/etc/zfs/zpool.cache \
-o ashift=12 -d \
-o feature@async_destroy=enabled \
-o feature@bookmarks=enabled \
-o feature@embedded_data=enabled \
-o feature@empty_bpobj=enabled \
-o feature@enabled_txg=enabled \
-o feature@extensible_dataset=enabled \
-o feature@filesystem_limits=enabled \
-o feature@hole_birth=enabled \
-o feature@large_blocks=enabled \
-o feature@lz4_compress=enabled \
-o feature@spacemap_histogram=enabled \
-o feature@zpool_checkpoint=enabled \
-O acltype=posixacl -O canmount=off \
-O compression=lz4 \
-O devices=off -O normalization=formD -O relatime=on -O xattr=sa \
-O mountpoint=/boot -R /mnt \
bpool /dev/sdc3
I haven't investigated all those settings and just trust the upstream
guide on the above.
zpool create \
-o ashift=12 \
-O encryption=on -O keylocation=prompt -O keyformat=passphrase \
-O acltype=posixacl -O xattr=sa -O dnodesize=auto \
-O compression=zstd \
-O relatime=on \
-O canmount=off \
-O mountpoint=/ -R /mnt \
rpool /dev/sdc4
Breaking this down:
-o ashift=12
: mentioned above, 4k sector size-O encryption=on -O keylocation=prompt -O keyformat=passphrase
:
encryption, prompt for a password, default algorithm is
aes-256-gcm
, explicit in the guide, made implicit here-O acltype=posixacl -O xattr=sa
: enable ACLs, with better
performance (not enabled by default)-O dnodesize=auto
: related to extended attributes, less
compatibility with other implementations-O compression=zstd
: enable zstd compression, can be
disabled/enabled by dataset to with zfs set compression=off
rpool/example
-O relatime=on
: classic atime
optimisation, another that could
be used on a busy server is atime=off
-O canmount=off
: do not make the pool mount automatically with
mount -a
?-O mountpoint=/ -R /mnt
: mount pool on /
in the future, but
/mnt
for now-O normalization=formD
: normalize file names on comparisons (not
storage), implies utf8only=on
, which is a bad idea (and
effectively meant my first sync failed to copy some files,
including this folder from a supysonic checkout). and this
cannot be changed after the filesystem is created. bad, bad, bad.[...] any error can be detected, but cannot be corrected. This sounds like an acceptable compromise, but its actually not. The reason its not is that ZFS' metadata cannot be allowed to be corrupted. If it is it is likely the zpool will be impossible to mount (and will probably crash the system once the corruption is found). So a couple of bad sectors in the right place will mean that all data on the zpool will be lost. Not some, all. Also there's no ZFS recovery tools, so you cannot recover any data on the drives.Compared with (say) ext4, where a single disk error can recovered, this is pretty bad. But we are ready to live with this with the idea that we'll have hourly offline snapshots that we can easily recover from. It's trade-off. Also, we're running this on a NVMe/M.2 drive which typically just blinks out of existence completely, and doesn't "bit rot" the way a HDD would. Also, the FreeBSD handbook quick start doesn't have any warnings about their first example, which is with a single disk. So I am reassured at least.
ROOT
and BOOT
zfs create -o canmount=off -o mountpoint=none rpool/ROOT &&
zfs create -o canmount=off -o mountpoint=none bpool/BOOT
Note that it's unclear to me why those datasets are necessary, but
they seem common practice, also used in this FreeBSD
example. The OpenZFS guide mentions the Solaris upgrades and
Ubuntu's zsys that use that container for upgrades and rollbacks.
This blog post seems to explain a bit the layout behind the
installer. zfs create -o canmount=noauto -o mountpoint=/ rpool/ROOT/debian &&
zfs mount rpool/ROOT/debian &&
zfs create -o mountpoint=/boot bpool/BOOT/debian
I guess the debian
name here is because we could technically have
multiple operating systems with the same underlying datasets. zfs create rpool/home &&
zfs create -o mountpoint=/root rpool/home/root &&
chmod 700 /mnt/root &&
zfs create rpool/var
zfs create -o com.sun:auto-snapshot=false rpool/var/cache &&
zfs create -o com.sun:auto-snapshot=false rpool/var/tmp &&
chmod 1777 /mnt/var/tmp
zfs create -o canmount=off rpool/var/lib &&
zfs create -o com.sun:auto-snapshot=false rpool/var/lib/docker
Notice here a peculiarity: we must create rpool/var/lib
to
create rpool/var/lib/docker
otherwise we get this error:
cannot create 'rpool/var/lib/docker': parent does not exist
... and no, just creating /mnt/var/lib
doesn't fix that
problem. In fact, it makes things even more confusing because an
existing directory shadows a mountpoint, which is the opposite of
how things normally work.
Also note that you will probably need to change storage driver in
Docker, see the zfs-driver documentation for details but,
basically, I did:
echo ' "storage-driver": "zfs" ' > /etc/docker/daemon.json
Note that podman has the same problem (and similar solution):
printf '[storage]\ndriver = "zfs"\n' > /etc/containers/storage.conf
tmpfs
for /run
:
mkdir /mnt/run &&
mount -t tmpfs tmpfs /mnt/run &&
mkdir /mnt/run/lock
/srv
, as that's the HDD stuff.
Also mount the EFI partition:
mkfs.fat -F 32 /dev/sdc2 &&
mount /dev/sdc2 /mnt/boot/efi/
At this point, everything should be mounted in /mnt
. It should look
like this:
root@curie:~# LANG=C df -h -t zfs -t vfat
Filesystem Size Used Avail Use% Mounted on
rpool/ROOT/debian 899G 384K 899G 1% /mnt
bpool/BOOT/debian 832M 123M 709M 15% /mnt/boot
rpool/home 899G 256K 899G 1% /mnt/home
rpool/home/root 899G 256K 899G 1% /mnt/root
rpool/var 899G 384K 899G 1% /mnt/var
rpool/var/cache 899G 256K 899G 1% /mnt/var/cache
rpool/var/tmp 899G 256K 899G 1% /mnt/var/tmp
rpool/var/lib/docker 899G 256K 899G 1% /mnt/var/lib/docker
/dev/sdc2 511M 4.0K 511M 1% /mnt/boot/efi
Now that we have everything setup and mounted, let's copy all files
over.
for fs in /boot/ /boot/efi/ / /home/; do
echo "syncing $fs to /mnt$fs..." &&
rsync -aSHAXx --info=progress2 --delete $fs /mnt$fs
done
You can check that the list is correct with:
mount -l -t ext4,btrfs,vfat awk ' print $3 '
Note that we skip /srv
as it's on a different disk.
On the first run, we had:
root@curie:~# for fs in /boot/ /boot/efi/ / /home/; do
echo "syncing $fs to /mnt$fs..." &&
rsync -aSHAXx --info=progress2 $fs /mnt$fs
done
syncing /boot/ to /mnt/boot/...
0 0% 0.00kB/s 0:00:00 (xfr#0, to-chk=0/299)
syncing /boot/efi/ to /mnt/boot/efi/...
16,831,437 100% 184.14MB/s 0:00:00 (xfr#101, to-chk=0/110)
syncing / to /mnt/...
28,019,293,280 94% 47.63MB/s 0:09:21 (xfr#703710, ir-chk=6748/839220)rsync: [generator] delete_file: rmdir(var/lib/docker) failed: Device or resource busy (16)
could not make way for new symlink: var/lib/docker
34,081,267,990 98% 50.71MB/s 0:10:40 (xfr#736577, to-chk=0/867732)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
syncing /home/ to /mnt/home/...
rsync: [sender] readlink_stat("/home/anarcat/.fuse") failed: Permission denied (13)
24,456,268,098 98% 68.03MB/s 0:05:42 (xfr#159867, ir-chk=6875/172377)
file has vanished: "/home/anarcat/.cache/mozilla/firefox/s2hwvqbu.quantum/cache2/entries/B3AB0CDA9C4454B3C1197E5A22669DF8EE849D90"
199,762,528,125 93% 74.82MB/s 0:42:26 (xfr#1437846, ir-chk=1018/1983979)rsync: [generator] recv_generator: mkdir "/mnt/home/anarcat/dist/supysonic/tests/assets/\#346" failed: Invalid or incomplete multibyte or wide character (84)
*** Skipping any contents from this failed directory ***
315,384,723,978 96% 76.82MB/s 1:05:15 (xfr#2256473, to-chk=0/2993950)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
Note the failure to transfer that supysonic file? It turns out they
had a weird filename in their source tree, since then removed,
but still it showed how the utf8only
feature might not be such a bad
idea. At this point, the procedure was restarted all the way back to
"Creating pools", after unmounting all ZFS filesystems (umount
/mnt/run /mnt/boot/efi && umount -t zfs -a
) and destroying the pool,
which, surprisingly, doesn't require any confirmation (zpool destroy
rpool
).
The second run was cleaner:
root@curie:~# for fs in /boot/ /boot/efi/ / /home/; do
echo "syncing $fs to /mnt$fs..." &&
rsync -aSHAXx --info=progress2 --delete $fs /mnt$fs
done
syncing /boot/ to /mnt/boot/...
0 0% 0.00kB/s 0:00:00 (xfr#0, to-chk=0/299)
syncing /boot/efi/ to /mnt/boot/efi/...
0 0% 0.00kB/s 0:00:00 (xfr#0, to-chk=0/110)
syncing / to /mnt/...
28,019,033,070 97% 42.03MB/s 0:10:35 (xfr#703671, ir-chk=1093/833515)rsync: [generator] delete_file: rmdir(var/lib/docker) failed: Device or resource busy (16)
could not make way for new symlink: var/lib/docker
34,081,807,102 98% 44.84MB/s 0:12:04 (xfr#736580, to-chk=0/867723)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
syncing /home/ to /mnt/home/...
rsync: [sender] readlink_stat("/home/anarcat/.fuse") failed: Permission denied (13)
IO error encountered -- skipping file deletion
24,043,086,450 96% 62.03MB/s 0:06:09 (xfr#151819, ir-chk=15117/172571)
file has vanished: "/home/anarcat/.cache/mozilla/firefox/s2hwvqbu.quantum/cache2/entries/4C1FDBFEA976FF924D062FB990B24B897A77B84B"
315,423,626,507 96% 67.09MB/s 1:14:43 (xfr#2256845, to-chk=0/2994364)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
Also note the transfer speed: we seem capped at 76MB/s, or
608Mbit/s. This is not as fast as I was expecting: the USB connection
seems to be at around 5Gbps:
anarcat@curie:~$ lsusb -tv head -4
/: Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
ID 1d6b:0003 Linux Foundation 3.0 root hub
__ Port 1: Dev 4, If 0, Class=Mass Storage, Driver=uas, 5000M
ID 0b05:1932 ASUSTek Computer, Inc.
So it shouldn't cap at that speed. It's possible the USB adapter is
failing to give me the full speed though. It's not the M.2 SSD drive
either, as that has a ~500MB/s bandwidth, acccording to its spec.
At this point, we're about ready to do the final configuration. We
drop to single user mode and do the rest of the procedure. That used
to be shutdown now
, but it seems like the systemd switch broke that,
so now you can reboot into grub and pick the "recovery"
option. Alternatively, you might try systemctl rescue
, as I found
out.
I also wanted to copy the drive over to another new NVMe drive, but
that failed: it looks like the USB controller I have doesn't work with
older, non-NVME drives.
mount --rbind /dev /mnt/dev &&
mount --rbind /proc /mnt/proc &&
mount --rbind /sys /mnt/sys &&
chroot /mnt /bin/bash
Next we add an extra service that imports the bpool on boot, to make
sure it survives a zpool.cache
destruction:
cat > /etc/systemd/system/zfs-import-bpool.service <<EOF
[Unit]
DefaultDependencies=no
Before=zfs-import-scan.service
Before=zfs-import-cache.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -N -o cachefile=none bpool
# Work-around to preserve zpool cache:
ExecStartPre=-/bin/mv /etc/zfs/zpool.cache /etc/zfs/preboot_zpool.cache
ExecStartPost=-/bin/mv /etc/zfs/preboot_zpool.cache /etc/zfs/zpool.cache
[Install]
WantedBy=zfs-import.target
EOF
Enable the service:
systemctl enable zfs-import-bpool.service
I had to trim down /etc/fstab
and /etc/crypttab
to only contain
references to the legacy filesystems (/srv
is still BTRFS!).
If we don't already have a tmpfs
defined in /etc/fstab
:
ln -s /usr/share/systemd/tmp.mount /etc/systemd/system/ &&
systemctl enable tmp.mount
Rebuild boot loader with support for ZFS, but also to workaround
GRUB's missing zpool-features support:
grub-probe /boot grep -q zfs &&
update-initramfs -c -k all &&
sed -i 's,GRUB_CMDLINE_LINUX.*,GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT/debian",' /etc/default/grub &&
update-grub
For good measure, make sure the right disk is configured here, for
example you might want to tag both drives in a RAID array:
dpkg-reconfigure grub-pc
Install grub to EFI while you're there:
grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=debian --recheck --no-floppy
Filesystem mount ordering. The rationale here in the OpenZFS
guide is a little strange, but I don't dare ignore that.
mkdir /etc/zfs/zfs-list.cache
touch /etc/zfs/zfs-list.cache/bpool
touch /etc/zfs/zfs-list.cache/rpool
zed -F &
Verify that zed updated the cache by making sure these are not empty:
cat /etc/zfs/zfs-list.cache/bpool
cat /etc/zfs/zfs-list.cache/rpool
Once the files have data, stop zed:
fg
Press Ctrl-C.
Fix the paths to eliminate /mnt
:
sed -Ei "s /mnt/? / " /etc/zfs/zfs-list.cache/*
Snapshot initial install:
zfs snapshot bpool/BOOT/debian@install
zfs snapshot rpool/ROOT/debian@install
Exit chroot:
exit
for fs in /boot/ /boot/efi/ / /home/; do
echo "syncing $fs to /mnt$fs..." &&
rsync -aSHAXx --info=progress2 --delete $fs /mnt$fs
done
Then we unmount all filesystems:
mount grep -v zfs tac awk '/\/mnt/ print $3 ' xargs -i umount -lf
zpool export -a
Reboot, swap the drives, and boot in ZFS. Hurray!
fio --name=randwrite4k1x --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
fio --name=randwrite64k16x --ioengine=posixaio --rw=randwrite --bs=64k --size=256m --numjobs=16 --iodepth=16 --runtime=60 --time_based --end_fsync=1
fio --name=randwrite1m1x --ioengine=posixaio --rw=randwrite --bs=1m --size=16g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
fio
tests, one by one, 60 seconds
each. It should take about 12 minutes to run, as there are 3 pair of
tests, read/write, with and without async.
My bias, before building, running and analysing those results is that
ZFS should outperform the traditional stack on writes, but possibly
not on reads. It's also possible it outperforms it on both, because
it's a newer drive. A new test might be possible with a new external
USB drive as well, although I doubt I will find the time to do this.
systemctl rescue
The network might have been started before or after the test as well:
systemctl start systemd-networkd
So it should be fairly reliable as basically nothing else is running.
Raw numbers, from the ?job-curie-lvm.log, converted to MiB/s and
manually merged:
test | read I/O | read IOPS | write I/O | write IOPS |
---|---|---|---|---|
rand4k4g1x | 39.27 | 10052 | 212.15 | 54310 |
rand4k4g1x--fsync=1 | 39.29 | 10057 | 2.73 | 699 |
rand64k256m16x | 1297.00 | 20751 | 1068.57 | 17097 |
rand64k256m16x--fsync=1 | 1290.90 | 20654 | 353.82 | 5661 |
rand1m16g1x | 315.15 | 315 | 563.77 | 563 |
rand1m16g1x--fsync=1 | 345.88 | 345 | 157.01 | 157 |
test | read I/O | read IOPS | write I/O | write IOPS |
---|---|---|---|---|
rand4k4g1x | 77.20 | 19763 | 27.13 | 6944 |
rand4k4g1x--fsync=1 | 76.16 | 19495 | 6.53 | 1673 |
rand64k256m16x | 1882.40 | 30118 | 70.58 | 1129 |
rand64k256m16x--fsync=1 | 1865.13 | 29842 | 71.98 | 1151 |
rand1m16g1x | 921.62 | 921 | 102.21 | 102 |
rand1m16g1x--fsync=1 | 908.37 | 908 | 64.30 | 64 |
May 16 14:42:52 curie systemd[1]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87\x2dinit-merged.mount: Succeeded.
May 16 14:42:52 curie systemd[5161]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87\x2dinit-merged.mount: Succeeded.
May 16 14:42:52 curie systemd[1]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
May 16 14:42:53 curie dockerd[1723]: time="2022-05-16T14:42:53.087219426-04:00" level=info msg="starting signal loop" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10 pid=151170
May 16 14:42:53 curie systemd[1]: Started libcontainer container af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10.
May 16 14:42:54 curie systemd[1]: docker-af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10.scope: Succeeded.
May 16 14:42:54 curie dockerd[1723]: time="2022-05-16T14:42:54.047297800-04:00" level=info msg="shim disconnected" id=af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10
May 16 14:42:54 curie dockerd[998]: time="2022-05-16T14:42:54.051365015-04:00" level=info msg="ignoring event" container=af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
May 16 14:42:54 curie systemd[2444]: run-docker-netns-f5453c87c879.mount: Succeeded.
May 16 14:42:54 curie systemd[5161]: run-docker-netns-f5453c87c879.mount: Succeeded.
May 16 14:42:54 curie systemd[2444]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
May 16 14:42:54 curie systemd[5161]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
May 16 14:42:54 curie systemd[1]: run-docker-netns-f5453c87c879.mount: Succeeded.
May 16 14:42:54 curie systemd[1]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
Translating this:
mai 30 15:31:39 curie systemd[1]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf\x2dinit.mount: Succeeded.
mai 30 15:31:39 curie systemd[5287]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf\x2dinit.mount: Succeeded.
mai 30 15:31:40 curie systemd[1]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded.
mai 30 15:31:40 curie systemd[5287]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded.
mai 30 15:31:41 curie dockerd[3199]: time="2022-05-30T15:31:41.551403693-04:00" level=info msg="starting signal loop" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142 pid=141080
mai 30 15:31:41 curie systemd[1]: run-docker-runtime\x2drunc-moby-42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142-runc.ZVcjvl.mount: Succeeded.
mai 30 15:31:41 curie systemd[5287]: run-docker-runtime\x2drunc-moby-42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142-runc.ZVcjvl.mount: Succeeded.
mai 30 15:31:41 curie systemd[1]: Started libcontainer container 42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142.
mai 30 15:31:45 curie systemd[1]: docker-42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142.scope: Succeeded.
mai 30 15:31:45 curie dockerd[3199]: time="2022-05-30T15:31:45.883019128-04:00" level=info msg="shim disconnected" id=42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142
mai 30 15:31:45 curie dockerd[1726]: time="2022-05-30T15:31:45.883064491-04:00" level=info msg="ignoring event" container=42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
mai 30 15:31:45 curie systemd[1]: run-docker-netns-e45f5cf5f465.mount: Succeeded.
mai 30 15:31:45 curie systemd[5287]: run-docker-netns-e45f5cf5f465.mount: Succeeded.
mai 30 15:31:45 curie systemd[1]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded.
mai 30 15:31:45 curie systemd[5287]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded.
That's double or triple the run time, from 2 seconds to 6
seconds. Most of the time is spent in run time, inside the
container. Here's the breakdown:
umount /mnt/boot/efi /mnt/boot/run
umount -a -t zfs
zpool export -a
And disconnected the drive, to see how I would recover this system
from another Linux system in case of a total motherboard failure.
To import an existing pool, plug the device, then import the pool with
an alternate root, so it doesn't mount over your existing filesystems,
then you mount the root filesystem and all the others:
zpool import -l -a -R /mnt &&
zfs mount rpool/ROOT/debian &&
zfs mount -a &&
mount /dev/sdc2 /mnt/boot/efi &&
mount -t tmpfs tmpfs /mnt/run &&
mkdir /mnt/run/lock
sgdisk
, but I couldn't figure
out how to do this with sgdisk
, so this uses sfdisk
to dump the
partition from the first disk to an external, identical drive:
sfdisk -d /dev/nvme0n1 sfdisk --no-reread /dev/sda --force
zpool create \
-o cachefile=/etc/zfs/zpool.cache \
-o ashift=12 -d \
-o feature@async_destroy=enabled \
-o feature@bookmarks=enabled \
-o feature@embedded_data=enabled \
-o feature@empty_bpobj=enabled \
-o feature@enabled_txg=enabled \
-o feature@extensible_dataset=enabled \
-o feature@filesystem_limits=enabled \
-o feature@hole_birth=enabled \
-o feature@large_blocks=enabled \
-o feature@lz4_compress=enabled \
-o feature@spacemap_histogram=enabled \
-o feature@zpool_checkpoint=enabled \
-O acltype=posixacl -O xattr=sa \
-O compression=lz4 \
-O devices=off \
-O relatime=on \
-O canmount=off \
-O mountpoint=/boot -R /mnt \
bpool-tubman /dev/sdb3
The change from the main boot pool are:
sdb
used to be the M.2 device, it's now
nvme0n1
)zpool create \
-o ashift=12 \
-O encryption=on -O keylocation=prompt -O keyformat=passphrase \
-O acltype=posixacl -O xattr=sa -O dnodesize=auto \
-O compression=zstd \
-O relatime=on \
-O canmount=off \
-O mountpoint=/ -R /mnt \
rpool-tubman /dev/sdb4
sanoid
command had a --readonly
argument to simulate changes,
but syncoid
didn't so I tried to fix that with an upstream PR.
It seems it would be better to do this by hand, but this was much
easier. The full first sync was:
root@curie:/home/anarcat# ./bin/syncoid -r bpool bpool-tubman
CRITICAL ERROR: Target bpool-tubman exists but has no snapshots matching with bpool!
Replication to target would require destroying existing
target. Cowardly refusing to destroy your existing target.
NOTE: Target bpool-tubman dataset is < 64MB used - did you mistakenly run
zfs create bpool-tubman on the target? ZFS initial
replication must be to a NON EXISTENT DATASET, which will
then be CREATED BY the initial replication process.
INFO: Sending oldest full snapshot bpool/BOOT@test (~ 42 KB) to new target filesystem:
44.2KiB 0:00:00 [4.19MiB/s] [========================================================================================================================] 103%
INFO: Updating new target filesystem with incremental bpool/BOOT@test ... syncoid_curie_2022-05-30:12:50:39 (~ 4 KB):
2.13KiB 0:00:00 [ 114KiB/s] [===============================================================> ] 53%
INFO: Sending oldest full snapshot bpool/BOOT/debian@install (~ 126.0 MB) to new target filesystem:
126MiB 0:00:00 [ 308MiB/s] [=======================================================================================================================>] 100%
INFO: Updating new target filesystem with incremental bpool/BOOT/debian@install ... syncoid_curie_2022-05-30:12:50:39 (~ 113.4 MB):
113MiB 0:00:00 [ 315MiB/s] [=======================================================================================================================>] 100%
root@curie:/home/anarcat# ./bin/syncoid -r rpool rpool-tubman
CRITICAL ERROR: Target rpool-tubman exists but has no snapshots matching with rpool!
Replication to target would require destroying existing
target. Cowardly refusing to destroy your existing target.
NOTE: Target rpool-tubman dataset is < 64MB used - did you mistakenly run
zfs create rpool-tubman on the target? ZFS initial
replication must be to a NON EXISTENT DATASET, which will
then be CREATED BY the initial replication process.
INFO: Sending oldest full snapshot rpool/ROOT@syncoid_curie_2022-05-30:12:50:51 (~ 69 KB) to new target filesystem:
44.2KiB 0:00:00 [2.44MiB/s] [===========================================================================> ] 63%
INFO: Sending oldest full snapshot rpool/ROOT/debian@install (~ 25.9 GB) to new target filesystem:
25.9GiB 0:03:33 [ 124MiB/s] [=======================================================================================================================>] 100%
INFO: Updating new target filesystem with incremental rpool/ROOT/debian@install ... syncoid_curie_2022-05-30:12:50:52 (~ 3.9 GB):
3.92GiB 0:00:33 [ 119MiB/s] [======================================================================================================================> ] 99%
INFO: Sending oldest full snapshot rpool/home@syncoid_curie_2022-05-30:12:55:04 (~ 276.8 GB) to new target filesystem:
277GiB 0:27:13 [ 174MiB/s] [=======================================================================================================================>] 100%
INFO: Sending oldest full snapshot rpool/home/root@syncoid_curie_2022-05-30:13:22:19 (~ 2.2 GB) to new target filesystem:
2.22GiB 0:00:25 [90.2MiB/s] [=======================================================================================================================>] 100%
INFO: Sending oldest full snapshot rpool/var@syncoid_curie_2022-05-30:13:22:47 (~ 5.6 GB) to new target filesystem:
5.56GiB 0:00:32 [ 176MiB/s] [=======================================================================================================================>] 100%
INFO: Sending oldest full snapshot rpool/var/cache@syncoid_curie_2022-05-30:13:23:22 (~ 627.3 MB) to new target filesystem:
627MiB 0:00:03 [ 169MiB/s] [=======================================================================================================================>] 100%
INFO: Sending oldest full snapshot rpool/var/lib@syncoid_curie_2022-05-30:13:23:28 (~ 69 KB) to new target filesystem:
44.2KiB 0:00:00 [1.40MiB/s] [===========================================================================> ] 63%
INFO: Sending oldest full snapshot rpool/var/lib/docker@syncoid_curie_2022-05-30:13:23:28 (~ 442.6 MB) to new target filesystem:
443MiB 0:00:04 [ 103MiB/s] [=======================================================================================================================>] 100%
INFO: Sending oldest full snapshot rpool/var/lib/docker/05c0de7fabbea60500eaa495d0d82038249f6faa63b12914737c4d71520e62c5@266253254 (~ 6.3 MB) to new target filesystem:
6.49MiB 0:00:00 [12.9MiB/s] [========================================================================================================================] 102%
INFO: Updating new target filesystem with incremental rpool/var/lib/docker/05c0de7fabbea60500eaa495d0d82038249f6faa63b12914737c4d71520e62c5@266253254 ... syncoid_curie_2022-05-30:13:23:34 (~ 4 KB):
1.52KiB 0:00:00 [27.6KiB/s] [============================================> ] 38%
INFO: Sending oldest full snapshot rpool/var/lib/flatpak@syncoid_curie_2022-05-30:13:23:36 (~ 2.0 GB) to new target filesystem:
2.00GiB 0:00:17 [ 115MiB/s] [=======================================================================================================================>] 100%
INFO: Sending oldest full snapshot rpool/var/tmp@syncoid_curie_2022-05-30:13:23:55 (~ 57.0 MB) to new target filesystem:
61.8MiB 0:00:01 [45.0MiB/s] [========================================================================================================================] 108%
INFO: Clone is recreated on target rpool-tubman/var/lib/docker/ed71ddd563a779ba6fb37b3b1d0cc2c11eca9b594e77b4b234867ebcb162b205 based on rpool/var/lib/docker/05c0de7fabbea60500eaa495d0d82038249f6faa63b12914737c4d71520e62c5@266253254
INFO: Sending oldest full snapshot rpool/var/lib/docker/ed71ddd563a779ba6fb37b3b1d0cc2c11eca9b594e77b4b234867ebcb162b205@syncoid_curie_2022-05-30:13:23:58 (~ 218.6 MB) to new target filesystem:
219MiB 0:00:01 [ 151MiB/s] [=======================================================================================================================>] 100%
Funny how the CRITICAL ERROR
doesn't actually stop syncoid
and it
just carries on merrily doing when it's telling you it's "cowardly
refusing to destroy your existing target"... Maybe that's because my pull
request broke something though...
During the transfer, the computer was very sluggish: everything feels
like it has ~30-50ms latency extra:
anarcat@curie:sanoid$ LANG=C top -b -n 1 head -20
top - 13:07:05 up 6 days, 4:01, 1 user, load average: 16.13, 16.55, 11.83
Tasks: 606 total, 6 running, 598 sleeping, 0 stopped, 2 zombie
%Cpu(s): 18.8 us, 72.5 sy, 1.2 ni, 5.0 id, 1.2 wa, 0.0 hi, 1.2 si, 0.0 st
MiB Mem : 15898.4 total, 1387.6 free, 13170.0 used, 1340.8 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 1319.8 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
70 root 20 0 0 0 0 S 83.3 0.0 6:12.67 kswapd0
4024878 root 20 0 282644 96432 10288 S 44.4 0.6 0:11.43 puppet
3896136 root 20 0 35328 16528 48 S 22.2 0.1 2:08.04 mbuffer
3896135 root 20 0 10328 776 168 R 16.7 0.0 1:22.93 zfs
3896138 root 20 0 10588 788 156 R 16.7 0.0 1:49.30 zfs
350 root 0 -20 0 0 0 R 11.1 0.0 1:03.53 z_rd_int
351 root 0 -20 0 0 0 S 11.1 0.0 1:04.15 z_rd_int
3896137 root 20 0 4384 352 244 R 11.1 0.0 0:44.73 pv
4034094 anarcat 30 10 20028 13960 2428 S 11.1 0.1 0:00.70 mbsync
4036539 anarcat 20 0 9604 3464 2408 R 11.1 0.0 0:00.04 top
352 root 0 -20 0 0 0 S 5.6 0.0 1:03.64 z_rd_int
353 root 0 -20 0 0 0 S 5.6 0.0 1:03.64 z_rd_int
354 root 0 -20 0 0 0 S 5.6 0.0 1:04.01 z_rd_int
I wonder how much of that is due to syncoid, particularly because I
often saw mbuffer
and pv
in there which are not strictly necessary
to do those kind of operations, as far as I understand.
Once that's done, export the pools to disconnect the drive:
zpool export bpool-tubman
zpool export rpool-tubman
anarcat@curie:~$ sudo dd if=/dev/sdb of=/dev/sdc bs=4M status=progress conv=fdatasync
499944259584 octets (500 GB, 466 GiB) copi s, 1713 s, 292 MB/s
119235+1 enregistrements lus
119235+1 enregistrements crits
500107862016 octets (500 GB, 466 GiB) copi s, 1719,93 s, 291 MB/s
... while both over USB, whoohoo 300MB/s!
systemctl enable zfs-scrub-weekly@rpool.timer --now
systemctl enable zfs-scrub-monthly@rpool.timer --now
When the scrub runs, if it finds anything it will send an event which
will get picked up by the zed
daemon which will then send a
notification, see below for an example.
TODO: deploy on curie, if possible (probably not because no RAID)
TODO: this should be in Puppet
Date: Sun, 09 Oct 2022 00:58:08 -0400
From: root <root@anarc.at>
To: root@anarc.at
Subject: ZFS scrub_finish event for rpool on tubman
ZFS has finished a scrub:
eid: 39536
class: scrub_finish
host: tubman
time: 2022-10-09 00:58:07-0400
pool: rpool
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
scan: scrub repaired 0B in 00:33:57 with 0 errors on Sun Oct 9 00:58:07 2022
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sdb4 ONLINE 0 1 0
sdc4 ONLINE 0 0 0
cache
sda3 ONLINE 0 0 0
errors: No known data errors
This, in itself, is a little worrisome. But it helpfully links to this
more detailed documentation (and props up there: the link still
works) which explains this is a "minor" problem (something that could
be included in the report).
In this case, this happened on a server setup on 2021-04-28, but the
disks and server hardware are much older. The server itself
(marcos v1) was built
around 2011, over 10 years ago now. The hard drive in question is:
root@tubman:~# smartctl -i -qnoserial /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-15-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate BarraCuda 3.5
Device Model: ST4000DM004-2CV104
Firmware Version: 0001
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5425 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Tue Oct 11 11:02:32 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Some more SMART stats:
root@tubman:~# smartctl -a -qnoserial /dev/sdb grep -e Head_Flying_Hours -e Power_On_Hours -e Total_LBA -e 'Sector Sizes'
Sector Sizes: 512 bytes logical, 4096 bytes physical
9 Power_On_Hours 0x0032 086 086 000 Old_age Always - 12464 (206 202 0)
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 10966h+55m+23.757s
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 21107792664
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 3201579750
That's over a year of power on, which shouldn't be so bad. It has
written about 10TB of data (21107792664 LBAs * 512 byte/LBA
), which
is about two full writes. According to its specification, this
device is supposed to support 55 TB/year of writes, so we're far below
spec. Note that are still far from the "non-recoverable read error per
bits" spec (1 per 10E15), as we've basically read 13E12 bits
(3201579750 LBAs * 512 byte/LBA
= 13E12 bits).
It's likely this disk was made in 2018, so it is in its fourth
year.
Interestingly, /dev/sdc
is also a Seagate drive, but of a different
series:
root@tubman:~# smartctl -qnoserial -i /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-15-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate BarraCuda 3.5
Device Model: ST4000DM004-2CV104
Firmware Version: 0001
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5425 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Tue Oct 11 11:21:35 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
It has seen much more reads than the other disk which is also interesting:
root@tubman:~# smartctl -a -qnoserial /dev/sdc grep -e Head_Flying_Hours -e Power_On_Hours -e Total_LBA -e 'Sector Sizes'
Sector Sizes: 512 bytes logical, 4096 bytes physical
9 Power_On_Hours 0x0032 059 059 000 Old_age Always - 36240
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 33994h+10m+52.118s
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 30730174438
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 51894566538
That's 4 years of Head_Flying_Hours
, and over 4 years (4 years and
48 days) of Power_On_Hours
. The copyright date on that drive's
specs goes back to 2016, so it's a much older drive.
SMART self-test succeeded.
fio
. Right now, I'm just
cargo-culting stuff from other folks and I don't really like
it. stressant is a good example of my struggles, in the sense
that it doesn't really work that well for disk tests.
I would love to have just a single .fio
job file that lists multiple
jobs to run serially. For example, this file describes the above
workload pretty well:
[global]
# cargo-culting Salter
fallocate=none
ioengine=posixaio
runtime=60
time_based=1
end_fsync=1
stonewall=1
group_reporting=1
# no need to drop caches, done by default
# invalidate=1
# Single 4KiB random read/write process
[randread-4k-4g-1x]
rw=randread
bs=4k
size=4g
numjobs=1
iodepth=1
[randwrite-4k-4g-1x]
rw=randwrite
bs=4k
size=4g
numjobs=1
iodepth=1
# 16 parallel 64KiB random read/write processes:
[randread-64k-256m-16x]
rw=randread
bs=64k
size=256m
numjobs=16
iodepth=16
[randwrite-64k-256m-16x]
rw=randwrite
bs=64k
size=256m
numjobs=16
iodepth=16
# Single 1MiB random read/write process
[randread-1m-16g-1x]
rw=randread
bs=1m
size=16g
numjobs=1
iodepth=1
[randwrite-1m-16g-1x]
rw=randwrite
bs=1m
size=16g
numjobs=1
iodepth=1
... except the jobs are actually started in parallel, even though they
are stonewall
'd, as far as I can tell by the reports. I sent a
mail to the fio mailing list for clarification.
It looks like the jobs are started in parallel, but actual
(correctly) run serially. It seems like this might just be a matter of
reporting the right timestamps in the end, although it does feel like
starting all the processes (even if not doing any work yet) could
skew the results.
sdc
to sdd
, for example), and this would
greatly confuse ZFS.
Here, for example, is sdd
reappearing out of the blue:
May 19 11:22:53 curie kernel: [ 699.820301] scsi host4: uas
May 19 11:22:53 curie kernel: [ 699.820544] usb 2-1: authorized to connect
May 19 11:22:53 curie kernel: [ 699.922433] scsi 4:0:0:0: Direct-Access ROG ESD-S1C 0 PQ: 0 ANSI: 6
May 19 11:22:53 curie kernel: [ 699.923235] sd 4:0:0:0: Attached scsi generic sg2 type 0
May 19 11:22:53 curie kernel: [ 699.923676] sd 4:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
May 19 11:22:53 curie kernel: [ 699.923788] sd 4:0:0:0: [sdd] Write Protect is off
May 19 11:22:53 curie kernel: [ 699.923949] sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
May 19 11:22:53 curie kernel: [ 699.924149] sd 4:0:0:0: [sdd] Optimal transfer size 33553920 bytes
May 19 11:22:53 curie kernel: [ 699.961602] sdd: sdd1 sdd2 sdd3 sdd4
May 19 11:22:53 curie kernel: [ 699.996083] sd 4:0:0:0: [sdd] Attached SCSI disk
Next time I run a ZFS command (say zpool list
), the command
completely hangs (D
state) and this comes up in the logs:
May 19 11:34:21 curie kernel: [ 1387.914843] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=71344128 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.914859] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=205565952 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.914874] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=272789504 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.914906] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=270336 size=8192 flags=b08c1
May 19 11:34:21 curie kernel: [ 1387.914932] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=1073225728 size=8192 flags=b08c1
May 19 11:34:21 curie kernel: [ 1387.914948] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=1073487872 size=8192 flags=b08c1
May 19 11:34:21 curie kernel: [ 1387.915165] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=272793600 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.915183] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=339853312 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.915648] WARNING: Pool 'bpool' has encountered an uncorrectable I/O failure and has been suspended.
May 19 11:34:21 curie kernel: [ 1387.915648]
May 19 11:37:25 curie kernel: [ 1571.558614] task:txg_sync state:D stack: 0 pid: 997 ppid: 2 flags:0x00004000
May 19 11:37:25 curie kernel: [ 1571.558623] Call Trace:
May 19 11:37:25 curie kernel: [ 1571.558640] __schedule+0x282/0x870
May 19 11:37:25 curie kernel: [ 1571.558650] schedule+0x46/0xb0
May 19 11:37:25 curie kernel: [ 1571.558670] schedule_timeout+0x8b/0x140
May 19 11:37:25 curie kernel: [ 1571.558675] ? __next_timer_interrupt+0x110/0x110
May 19 11:37:25 curie kernel: [ 1571.558678] io_schedule_timeout+0x4c/0x80
May 19 11:37:25 curie kernel: [ 1571.558689] __cv_timedwait_common+0x12b/0x160 [spl]
May 19 11:37:25 curie kernel: [ 1571.558694] ? add_wait_queue_exclusive+0x70/0x70
May 19 11:37:25 curie kernel: [ 1571.558702] __cv_timedwait_io+0x15/0x20 [spl]
May 19 11:37:25 curie kernel: [ 1571.558816] zio_wait+0x129/0x2b0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.558929] dsl_pool_sync+0x461/0x4f0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559032] spa_sync+0x575/0xfa0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559138] ? spa_txg_history_init_io+0x101/0x110 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559245] txg_sync_thread+0x2e0/0x4a0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559354] ? txg_fini+0x240/0x240 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559366] thread_generic_wrapper+0x6f/0x80 [spl]
May 19 11:37:25 curie kernel: [ 1571.559376] ? __thread_exit+0x20/0x20 [spl]
May 19 11:37:25 curie kernel: [ 1571.559379] kthread+0x11b/0x140
May 19 11:37:25 curie kernel: [ 1571.559382] ? __kthread_bind_mask+0x60/0x60
May 19 11:37:25 curie kernel: [ 1571.559386] ret_from_fork+0x22/0x30
May 19 11:37:25 curie kernel: [ 1571.559401] task:zed state:D stack: 0 pid: 1564 ppid: 1 flags:0x00000000
May 19 11:37:25 curie kernel: [ 1571.559404] Call Trace:
May 19 11:37:25 curie kernel: [ 1571.559409] __schedule+0x282/0x870
May 19 11:37:25 curie kernel: [ 1571.559412] ? __kmalloc_node+0x141/0x2b0
May 19 11:37:25 curie kernel: [ 1571.559417] schedule+0x46/0xb0
May 19 11:37:25 curie kernel: [ 1571.559420] schedule_preempt_disabled+0xa/0x10
May 19 11:37:25 curie kernel: [ 1571.559424] __mutex_lock.constprop.0+0x133/0x460
May 19 11:37:25 curie kernel: [ 1571.559435] ? nvlist_xalloc.part.0+0x68/0xc0 [znvpair]
May 19 11:37:25 curie kernel: [ 1571.559537] spa_all_configs+0x41/0x120 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559644] zfs_ioc_pool_configs+0x17/0x70 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559752] zfsdev_ioctl_common+0x697/0x870 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559758] ? _copy_from_user+0x28/0x60
May 19 11:37:25 curie kernel: [ 1571.559860] zfsdev_ioctl+0x53/0xe0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559866] __x64_sys_ioctl+0x83/0xb0
May 19 11:37:25 curie kernel: [ 1571.559869] do_syscall_64+0x33/0x80
May 19 11:37:25 curie kernel: [ 1571.559873] entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 19 11:37:25 curie kernel: [ 1571.559876] RIP: 0033:0x7fcf0ef32cc7
May 19 11:37:25 curie kernel: [ 1571.559878] RSP: 002b:00007fcf0e181618 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
May 19 11:37:25 curie kernel: [ 1571.559881] RAX: ffffffffffffffda RBX: 000055b212f972a0 RCX: 00007fcf0ef32cc7
May 19 11:37:25 curie kernel: [ 1571.559883] RDX: 00007fcf0e181640 RSI: 0000000000005a04 RDI: 000000000000000b
May 19 11:37:25 curie kernel: [ 1571.559885] RBP: 00007fcf0e184c30 R08: 00007fcf08016810 R09: 00007fcf08000080
May 19 11:37:25 curie kernel: [ 1571.559886] R10: 0000000000080000 R11: 0000000000000246 R12: 000055b212f972a0
May 19 11:37:25 curie kernel: [ 1571.559888] R13: 0000000000000000 R14: 00007fcf0e181640 R15: 0000000000000000
May 19 11:37:25 curie kernel: [ 1571.559980] task:zpool state:D stack: 0 pid:11815 ppid: 3816 flags:0x00004000
May 19 11:37:25 curie kernel: [ 1571.559983] Call Trace:
May 19 11:37:25 curie kernel: [ 1571.559988] __schedule+0x282/0x870
May 19 11:37:25 curie kernel: [ 1571.559992] schedule+0x46/0xb0
May 19 11:37:25 curie kernel: [ 1571.559995] io_schedule+0x42/0x70
May 19 11:37:25 curie kernel: [ 1571.560004] cv_wait_common+0xac/0x130 [spl]
May 19 11:37:25 curie kernel: [ 1571.560008] ? add_wait_queue_exclusive+0x70/0x70
May 19 11:37:25 curie kernel: [ 1571.560118] txg_wait_synced_impl+0xc9/0x110 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560223] txg_wait_synced+0xc/0x40 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560325] spa_export_common+0x4cd/0x590 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560430] ? zfs_log_history+0x9c/0xf0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560537] zfsdev_ioctl_common+0x697/0x870 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560543] ? _copy_from_user+0x28/0x60
May 19 11:37:25 curie kernel: [ 1571.560644] zfsdev_ioctl+0x53/0xe0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560649] __x64_sys_ioctl+0x83/0xb0
May 19 11:37:25 curie kernel: [ 1571.560653] do_syscall_64+0x33/0x80
May 19 11:37:25 curie kernel: [ 1571.560656] entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 19 11:37:25 curie kernel: [ 1571.560659] RIP: 0033:0x7fdc23be2cc7
May 19 11:37:25 curie kernel: [ 1571.560661] RSP: 002b:00007ffc8c792478 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
May 19 11:37:25 curie kernel: [ 1571.560664] RAX: ffffffffffffffda RBX: 000055942ca49e20 RCX: 00007fdc23be2cc7
May 19 11:37:25 curie kernel: [ 1571.560666] RDX: 00007ffc8c792490 RSI: 0000000000005a03 RDI: 0000000000000003
May 19 11:37:25 curie kernel: [ 1571.560667] RBP: 00007ffc8c795e80 R08: 00000000ffffffff R09: 00007ffc8c792310
May 19 11:37:25 curie kernel: [ 1571.560669] R10: 000055942ca49e30 R11: 0000000000000246 R12: 00007ffc8c792490
May 19 11:37:25 curie kernel: [ 1571.560671] R13: 000055942ca49e30 R14: 000055942aed2c20 R15: 00007ffc8c795a40
Here's another example, where you see the USB controller bleeping out
and back into existence:
mai 19 11:38:39 curie kernel: usb 2-1: USB disconnect, device number 2
mai 19 11:38:39 curie kernel: sd 4:0:0:0: [sdd] Synchronizing SCSI cache
mai 19 11:38:39 curie kernel: sd 4:0:0:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
mai 19 11:39:25 curie kernel: INFO: task zed:1564 blocked for more than 241 seconds.
mai 19 11:39:25 curie kernel: Tainted: P IOE 5.10.0-14-amd64 #1 Debian 5.10.113-1
mai 19 11:39:25 curie kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mai 19 11:39:25 curie kernel: task:zed state:D stack: 0 pid: 1564 ppid: 1 flags:0x00000000
mai 19 11:39:25 curie kernel: Call Trace:
mai 19 11:39:25 curie kernel: __schedule+0x282/0x870
mai 19 11:39:25 curie kernel: ? __kmalloc_node+0x141/0x2b0
mai 19 11:39:25 curie kernel: schedule+0x46/0xb0
mai 19 11:39:25 curie kernel: schedule_preempt_disabled+0xa/0x10
mai 19 11:39:25 curie kernel: __mutex_lock.constprop.0+0x133/0x460
mai 19 11:39:25 curie kernel: ? nvlist_xalloc.part.0+0x68/0xc0 [znvpair]
mai 19 11:39:25 curie kernel: spa_all_configs+0x41/0x120 [zfs]
mai 19 11:39:25 curie kernel: zfs_ioc_pool_configs+0x17/0x70 [zfs]
mai 19 11:39:25 curie kernel: zfsdev_ioctl_common+0x697/0x870 [zfs]
mai 19 11:39:25 curie kernel: ? _copy_from_user+0x28/0x60
mai 19 11:39:25 curie kernel: zfsdev_ioctl+0x53/0xe0 [zfs]
mai 19 11:39:25 curie kernel: __x64_sys_ioctl+0x83/0xb0
mai 19 11:39:25 curie kernel: do_syscall_64+0x33/0x80
mai 19 11:39:25 curie kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
mai 19 11:39:25 curie kernel: RIP: 0033:0x7fcf0ef32cc7
mai 19 11:39:25 curie kernel: RSP: 002b:00007fcf0e181618 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
mai 19 11:39:25 curie kernel: RAX: ffffffffffffffda RBX: 000055b212f972a0 RCX: 00007fcf0ef32cc7
mai 19 11:39:25 curie kernel: RDX: 00007fcf0e181640 RSI: 0000000000005a04 RDI: 000000000000000b
mai 19 11:39:25 curie kernel: RBP: 00007fcf0e184c30 R08: 00007fcf08016810 R09: 00007fcf08000080
mai 19 11:39:25 curie kernel: R10: 0000000000080000 R11: 0000000000000246 R12: 000055b212f972a0
mai 19 11:39:25 curie kernel: R13: 0000000000000000 R14: 00007fcf0e181640 R15: 0000000000000000
mai 19 11:39:25 curie kernel: INFO: task zpool:11815 blocked for more than 241 seconds.
mai 19 11:39:25 curie kernel: Tainted: P IOE 5.10.0-14-amd64 #1 Debian 5.10.113-1
mai 19 11:39:25 curie kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mai 19 11:39:25 curie kernel: task:zpool state:D stack: 0 pid:11815 ppid: 2621 flags:0x00004004
mai 19 11:39:25 curie kernel: Call Trace:
mai 19 11:39:25 curie kernel: __schedule+0x282/0x870
mai 19 11:39:25 curie kernel: schedule+0x46/0xb0
mai 19 11:39:25 curie kernel: io_schedule+0x42/0x70
mai 19 11:39:25 curie kernel: cv_wait_common+0xac/0x130 [spl]
mai 19 11:39:25 curie kernel: ? add_wait_queue_exclusive+0x70/0x70
mai 19 11:39:25 curie kernel: txg_wait_synced_impl+0xc9/0x110 [zfs]
mai 19 11:39:25 curie kernel: txg_wait_synced+0xc/0x40 [zfs]
mai 19 11:39:25 curie kernel: spa_export_common+0x4cd/0x590 [zfs]
mai 19 11:39:25 curie kernel: ? zfs_log_history+0x9c/0xf0 [zfs]
mai 19 11:39:25 curie kernel: zfsdev_ioctl_common+0x697/0x870 [zfs]
mai 19 11:39:25 curie kernel: ? _copy_from_user+0x28/0x60
mai 19 11:39:25 curie kernel: zfsdev_ioctl+0x53/0xe0 [zfs]
mai 19 11:39:25 curie kernel: __x64_sys_ioctl+0x83/0xb0
mai 19 11:39:25 curie kernel: do_syscall_64+0x33/0x80
mai 19 11:39:25 curie kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
mai 19 11:39:25 curie kernel: RIP: 0033:0x7fdc23be2cc7
mai 19 11:39:25 curie kernel: RSP: 002b:00007ffc8c792478 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
mai 19 11:39:25 curie kernel: RAX: ffffffffffffffda RBX: 000055942ca49e20 RCX: 00007fdc23be2cc7
mai 19 11:39:25 curie kernel: RDX: 00007ffc8c792490 RSI: 0000000000005a03 RDI: 0000000000000003
mai 19 11:39:25 curie kernel: RBP: 00007ffc8c795e80 R08: 00000000ffffffff R09: 00007ffc8c792310
mai 19 11:39:25 curie kernel: R10: 000055942ca49e30 R11: 0000000000000246 R12: 00007ffc8c792490
mai 19 11:39:25 curie kernel: R13: 000055942ca49e30 R14: 000055942aed2c20 R15: 00007ffc8c795a40
I understand those are rather extreme conditions: I would fully expect
the pool to stop working if the underlying drives disappear. What
doesn't seem acceptable is that a command would completely hang like
this.
H
(check) matrix
of width N
, you can check your message vector (msg
) of length N
by
multipling H
and msg
, and checking if the output vector is all zero.
// scheme contains our G (generator) and
// H (check) matrices.
scheme := G: Matrix ... , H: Matrix ...
// msg contains our LDPC message (data and
// check bits).
msg := Vector ...
// N is also the length of the encoded
// msg vector after check bits have been
// added.
N := scheme.G.Width
// Now, let's generate our 'check' vector.
ch := Multiply(scheme.H, msg)
// if the ch vector is all zeros, we know
// that the message is valid, and we don't
// need to do anything.
if ch.IsZero()
// handle the case where the message
// is fine as-is.
return ...
// Expensive decode here
g
(generator) matrix out, building a bipartite graph, and
iteratively reprocessing the bit values, until constraints are satisfied and
the message has been corrected.
This got me thinking - what is the output vector when it s not all zeros?
Since 1
values in the output vector indicates consistency problems in the
message bits as they relate to the check bits, I wondered if this could be used
to speed up my LDPC decoder. It appears to work, so this post is half an attempt
to document this technique before I put it in my hot path, and half a plea for
those who do like to talk about FEC to tell me what name this technique
actually is.
// for clarity's sake, the Vector
// type is being used as the lookup
// key here, even though it may
// need to be a hash or string in
// some cases.
idx := map[Vector]int
for i := 0; i < N; i++
// Create a vector of length N
v := Vector
v.FlipBit(i)
// Now, let's use the generator matrix to encode
// the data with checksums, and then use the
// check matrix on the message to figure out what
// bit pattern results
ev := Multiply(scheme.H, Multiply(v, scheme.G))
idx[ev] = i
idx
mapping, we can now
go back to the hot path on Checking the incoming message data:
// if the ch vector is all zeros, we know
// that the message is valid, and we don't
// need to do anything.
if ch.IsZero()
// handle the case where the message
// is fine as-is.
return ...
errIdx, ok := idx[ch]
if ok
msg.FlipBit(errIdx)
// Verify the LDPC message using
// H again here.
return ...
// Expensive decode here
802.3an-2006
. Even if I was to find a collision for a
higher-order k-Bit value, I m tempted to continue with this approach, and treat
each set of bits in the Vector s bin (like a hash-table), checking the LDPC
validity after each bit set in the bin. As long as the collision rate is small
enough, it should be possible to correct k-Bits of error faster than the more
expensive Belief Propagation approach. That being said, I m not entirely
convinced collisions will be very common, but it ll take a bit more time
working through the math to say that with any confidence.
Have you seen this approach called something official in publications? See
an obvious flaw in the system? Send me a tip, please!
Under Canadian Radio-television and Telecommunications Commission (CRTC) rules in place since 2017, telecom networks are supposed to ensure that cellphones are able to contact 911 even if they do not have service.I could personally confirm that my phone couldn't reach 911 services, because all calls would fail: the problem was that towers were still up, so your phone wouldn't fall back to alternative service providers (which could have resolved the issue). I can only speculate as to why Rogers didn't take cell phone towers out of the network to let phones work properly for 911 service, but it seems like a dangerous game to play. Hilariously, the CRTC itself didn't have a reliable phone service due to the service outage:
Please note that our phone lines are affected by the Rogers network outage. Our website is still available: https://crtc.gc.ca/eng/contact/https://mobile.twitter.com/CRTCeng/status/1545421218534359041 I wonder if they will file a complaint against Rogers themselves about this. I probably should. It seems the federal government is thinking more of the same medicine will fix the problem and has told companies should "help" each other in an emergency. I doubt this will fix anything, and could actually make things worse if the competitors actually interoperate more, as it could cause multi-provider, cascading failures.
He pulls from Ethan Zuckerman s idea of a web that is plural in purpose that just as pool halls, libraries and churches each have different norms, purposes and designs, so too should different places on the internet. To achieve this, Tarnoff wants governments to pass laws that would make the big platforms unprofitable and, in their place, fund small-scale, local experiments in social media design. Instead of having platforms ruled by engagement-maximizing algorithms, Tarnoff imagines public platforms run by local librarians that include content from public media.(Links mine: the Washington Post obviously prefers to not link to the real web, and instead doesn't link to Zuckerman's site all and suggests Amazon for the book, in a cynical example.) And in another example of how the private sector has failed us, there was recently a fluke in the AMBER alert system where the entire province was warned about a loose shooter in Saint-Elz ar except the people in the town, because they have spotty cell phone coverage. In other words, millions of people received a strongly toned, "life-threatening", alert for a city sometimes hours away, except the people most vulnerable to the alert. Not missing a beat, the CAQ party is promising more of the same medicine again and giving more money to telcos to fix the problem, suggesting to spend three billion dollars in private infrastructure.
hledger
, I'm not convinced that it has been a good idea.
I'm quoting my Twitter feedback here in order to respond. The context is
handling when I have used the "wrong" card to pay for something: a card
affiliated with my family expenses for something personal, or vice versa. With
double-entry book-keeping, and one pair of transactions, the destination
account can either record the expense category:
2022-08-20 coffee
family:liabilities:creditcard -3
jon:expenses:coffee 3
or the fact it was paid for on the wrong card
2022-08-20 coffee
family:liabilities:creditcard -3
family:liabilities:jon 3 ; jon owes family
but not easily both.
https://twitter.com/pranesh/status/1516819846431789058:
When you accidentally use the family CV for personal expenses, credit the account "family:liabilities:creditcard:jon" instead of "family:liabilities:creditcard". That'll allow you to track w/ 2 postings.This is an interesting idea: create a sub-account underneath the credit card, and I would have a separate balance representing the money I owed. Before:
$ hledger bal -t
-3 family:liabilities:creditcard
3 jon:expenses:coffee
transaction
2022-08-20 coffee
family:liabilities:creditcard:jon -3
jon:expenses:coffee 3
Corresponding balances
$ hledger bal -t
-3 family:liabilities:creditcard
-3 jon
3 jon:expenses:coffee
Great. However, what process would clear the balance on that sub-account? In
practice, I don't make a separate, explicit payment to the credit card from
my personal accounts. It's paid off in full by direct debit from the family
shared account. In practice, such dues are accumulated and settled with one
off bank transfers, now and then.
Since the sub-account is still part of the credit card heirarchy, I can't
just use a set of virtual postings to consolidate that value with other
liabilities, or cover it. Any transaction in there which did not correspond
to a real transaction on the credit card would make the balance drift away
from the real-word credit statements. The only way I could see this working
would be if the direct debit that settles the credit card was artificially
split to clear the sub-account, and then the amount owed would be lost.
https://twitter.com/pranesh/status/1516819846431789058:
Else, add: family:assets:receivable:jon $3A "receivable" account would function like the "dues" accounts I described in hledger (except "receivable" is an established account type in double-entry book-keeping). Here I think Pranesh is proposing using these two accounts in addition to the others on a posting. E.g.
jon:liabilities:family:cc $-3
2022-08-20 coffee
family:liabilities:creditcard -3
jon:expenses:coffee 3
family:assets:receivable:jon 3
jon:liabilities:family -3
This balances, and we end up with two other accounts, which are tracking the
exact same thing. I only owe 3, but if you didn't know that the accounts were
"views" onto the same thing, you could mistakenly think I owed 6.
I can't see the advantage of this over just using a virtual, unbalanced posting.
Dues, Liabilities
I'd invented accounts called "dues" to track moneys owed. The more correct
term for this in accounting parlance would be "accounts receivable", as in
one of the examples above. I could instead be tracking moneys due; this is
a classic liability. Liabilities have negative balances.
jon:liabilities:family -3
This means, I owe the family 3.
Liability accounts like that are identical to "dues" accounts. A positive
balance in a Liability is a counter-intuitive way of describing moneys owed
to me, rather than by me. And, reviewing a lot of the coding I did this year,
I've got myself hopelessly confused with the signs, and made lots of errors.
Crucially, double-entry has not protected me from making those mistakes:
of course, I'm circumventing it by using unbalanced virtual postings in many
cases (although I was not consistent in where I did this), but even if I used
a pair of accounts as in the last example above, I could still screw it up.
We currently use cookies to support our use of Google Analytics on the Website and Service. Google Analytics collects information about how you use the Website and Service. [...] This helps us to provide you with a good experience when you browse our Website and use our Service and also allows us to improve our Website and our Service.When I asked Matrix people about why they were using Google Analytics, they explained this was for development purposes and they were aiming for velocity at the time, not privacy (paraphrasing here). They also included a "free to snitch" clause:
If we are or believe that we are under a duty to disclose or share your personal data, we will do so in order to comply with any legal obligation, the instructions or requests of a governmental authority or regulator, including those outside of the UK.Those are really broad terms, above and beyond what is typically expected legally. Like the current retention policies, such user tracking and ... "liberal" collaboration practices with the state set a bad precedent for other home servers. Thankfully, since the above policy was published (2017), the GDPR was "implemented" (2018) and it seems like both the Element.io privacy policy and the Matrix.org privacy policy have been somewhat improved since. Notable points of the new privacy policies:
matrix.org
serviceWe will forget your copy of your data upon your request. We will also forward your request to be forgotten onto federated homeservers. However - these homeservers are outside our span of control, so we cannot guarantee they will forget your data.It's great they implemented those mechanisms and, after all, if there's an hostile party in there, nothing can prevent them from using screenshots to just exfiltrate your data away from the client side anyways, even with services typically seen as more secure, like Signal. As an aside, I also appreciate that Matrix.org has a fairly decent code of conduct, based on the TODO CoC which checks all the boxes in the geekfeminism wiki.
matrix.org
to block
known abusers (users or servers). Bans are pretty flexible and
can operate at the user, room, or server level.
Matrix people suggest making the bot admin of your channels, because
you can't take back admin from a user once given.
This tool and Mjolnir are based on the admin API built into Synapse.
- System notify users (all users/users from a list, specific user)
- delete sessions/devices not seen for X days
- purge the remote media cache
- select rooms with various criteria (external/local/empty/created by/encrypted/cleartext)
- purge history of theses rooms
- shutdown rooms
+R
mode ("only
registered users can join") by default, except that anyone can
register their own homeserver, which makes this limited.
Server admins can block IP addresses and home servers, but those tools
are not easily available to room admins. There is an API
(m.room.server_acl
in /devtools
) but it is not reliable
(thanks Austin Huang for the clarification).
Matrix has the concept of guest accounts, but it is not used very
much, and virtually no client or homeserver supports it. This contrasts with the way
IRC works: by default, anyone can join an IRC network even without
authentication. Some channels require registration, but in general you
are free to join and look around (until you get blocked, of course).
I have seen anecdotal evidence (CW: Twitter, nitter link) that "moderating bridges is hell", and
I can imagine why. Moderation is already hard enough on one
federation, when you bridge a room with another network, you inherit
all the problems from that network but without the entire abuse
control tools from the original network's API...
joe
on server B,
they will hijack that room on that specific server. This will not
(necessarily) affect users on the other servers, as servers could
refuse parts of the updates or ban the compromised account (or
server).
It does seem like a major flaw that room credentials are bound to
Matrix identifiers, as opposed to the E2E encryption credentials. In
an encrypted room even with fully verified members, a compromised or
hostile home server can still take over the room by impersonating an
admin. That admin (or even a newly minted user) can then send events
or listen on the conversations.
This is even more frustrating when you consider that Matrix events are
actually signed and therefore have some authentication attached
to them, acting like some sort of Merkle tree (as it contains a link
to previous events). That signature, however, is made from the
homeserver PKI keys, not the client's E2E keys, which makes E2E feel
like it has been "bolted on" later.
connect
block on both servers@anarcat:matrix.org
) is bound to
that specific home server. If that server goes down, that user is
completely disconnected. They could register a new account elsewhere
and reconnect, but then they basically lose all their configuration:
contacts, joined channels are all lost.
(Also notice how the Matrix IDs don't look like a typical user address
like an email in XMPP. They at least did their homework and got the
allocation for the scheme.)
#room:matrix.org
is also
visible as #room:example.com
on the example.com
home server. Both
addresses refer to the same room underlying room.
(Finding this in the Element settings is not obvious though, because
that "alias" are actually called a "local address" there. So to create
such an alias (in Element), you need to go in the room settings'
"General" section, "Show more" in "Local address", then add the alias
name (e.g. foo
), and then that room will be available on your
example.com
homeserver as #foo:example.com
.)
So a room doesn't belong to a server, it belongs to the federation,
and anyone can join the room from any serer (if the room is public, or
if invited otherwise). You can create a room on server A and when a
user from server B joins, the room will be replicated on server B as
well. If server A fails, server B will keep relaying traffic to
connected users and servers.
A room is therefore not fundamentally addressed with the above alias,
instead ,it has a internal Matrix ID, which basically a random
string. It has a server name attached to it, but that was made just to
avoid collisions. That can get a little confusing. For example, the
#fractal:gnome.org
room is an alias on the gnome.org
server, but
the room ID is !hwiGbsdSTZIwSRfybq:matrix.org
. That's because the
room was created on matrix.org
, but the preferred branding is
gnome.org
now.
As an aside, rooms, by default, live forever, even after the last user
quits. There's an admin API to delete rooms and a tombstone
event to redirect to another one, but neither have a GUI yet. The
latter is part of MSC1501 ("Room version upgrades") which allows
a room admin to close a room, with a message and a pointer to another
room.
matrix.example.com
) must never change in
the future, as renaming home servers is not supported.
The documentation used to say you could "run a hot spare" but that has
been removed. Last I heard, it was not possible to run a
high-availability setup where multiple, separate locations could
replace each other automatically. You can have high performance
setups where the load gets distributed among workers, but those
are based on a shared database (Redis and PostgreSQL) backend.
So my guess is it would be possible to create a "warm" spare server of
a matrix home server with regular PostgreSQL replication, but
that is not documented in the Synapse manual. This sort of setup
would also not be useful to deal with networking issues or denial of
service attacks, as you will not be able to spread the load over
multiple network locations easily. Redis and PostgreSQL heroes are
welcome to provide their multi-primary solution in the comments. In
the meantime, I'll just point out this is a solution that's handled
somewhat more gracefully in IRC, by having the possibility of
delegating the authentication layer.
.well-known
pattern (or SRV
records, but that's "not recommended" and a bit confusing) to
delegate that service to another server. Be warned that the server
still needs to be explicitly configured for your domain. You can't
just put:
"m.server": "matrix.org:443"
... on https://example.com/.well-known/matrix/server
and start using
@you:example.com
as a Matrix ID. That's because Matrix doesn't
support "virtual hosting" and you'd still be connecting to rooms and
people with your matrix.org
identity, not example.com
as you would
normally expect. This is also why you cannot rename your home
server.
The server discovery API is what allows servers to find each
other. Clients, on the other hand, use the client-server discovery
API: this is what allows a given client to find your home server
when you type your Matrix ID on login.
matrix.debian.social
) takes a few minutes
and then fails. That is because the home server has to sync the
entire room state when you join the room. There was promising work on
this announced in the lengthy 2021 retrospective, and some of
that work landed (partial sync) in the 1.53 release already.
Other improvements coming include sliding sync, lazy loading
over federation, and fast room joins. So that's actually
something that could be fixed in the fairly short term.
But in general, communication in Matrix doesn't feel as "snappy" as on
IRC or even Signal. It's hard to quantify this without instrumenting a
full latency test bed (for example the tools I used in the terminal
emulators latency tests), but
even just typing in a web browser feels slower than typing in a xterm
or Emacs for me.
Even in conversations, I "feel" people don't immediately respond as
fast. In fact, this could be an interesting double-blind experiment to
make: have people guess whether they are talking to a person on
Matrix, XMPP, or IRC, for example. My theory would be that people
could notice that Matrix users are slower, if only because of the TCP
round-trip time each message has to take.
matrix:
scheme, it's just not exactly clear what they should do with
it, especially when the handler is just another web page (e.g. Element
web).
In general, when compared with tools like Signal or WhatsApp, Matrix
doesn't fare so well in terms of user discovery. I probably have some
of my normal contacts that have a Matrix account as well, but there's
really no way to know. It's kind of creepy when Signal tells you
"this person is on Signal!" but it's also pretty cool that it works,
and they actually implemented it pretty well.
Registration is also less obvious: in Signal, the app confirms your
phone number automatically. It's friction-less and quick. In Matrix,
you need to learn about home servers, pick one, register (with a
password! aargh!), and then setup encryption keys (not default),
etc. It's a lot more friction.
And look, I understand: giving away your phone number is a huge
trade-off. I don't like it either. But it solves a real problem and
makes encryption accessible to a ton more people. Matrix does have
"identity servers" that can serve that purpose, but I don't feel
confident sharing my phone number there. It doesn't help that the
identity servers don't have private contact discovery: giving them
your phone number is a more serious security compromise than with
Signal.
There's a catch-22 here too: because no one feels like giving away
their phone numbers, no one does, and everyone assumes that stuff
doesn't work anyways. Like it or not, Signal forcing people to
divulge their phone number actually gives them critical mass that
means actually a lot of my relatives are on Signal and I don't have
to install crap like WhatsApp to talk with them.
/READ
command for
the latter:
/ALIAS READ script exec \$_->activity(0) for Irssi::windows
And yes, that's a Perl script in my IRC client. I am not aware of any
Matrix client that does stuff like that, except maybe Weechat, if we
can call it a Matrix client, or Irssi itself, now that it has a
Matrix plugin (!).
As for other clients, I have looked through the Matrix Client
Matrix (confusing right?) to try to figure out which one to try,
and, even after selecting Linux
as a filter, the chart is just too
wide to figure out anything. So I tried those, kind of randomly:
weechat-matrix
or gomuks
. At least Weechat
is scriptable so I could continue playing the power-user. Right now my
strategy with messaging (and that includes microblogging like Twitter
or Mastodon) is that everything goes through my IRC client, so Weechat
could actually fit well in there. Going with gomuks
, on the other
hand, would mean running it in parallel with Irssi or ... ditching
IRC, which is a leap I'm not quite ready to take just yet.
Oh, and basically none of those clients (except Nheko and Element)
support VoIP, which is still kind of a second-class citizen in
Matrix. It does not support large multimedia rooms, for example:
Jitsi was used for FOSDEM instead of the native videoconferencing
system.
matrix.org
publishes a (federated) block list
of hostile servers (#matrix-org-coc-bl:matrix.org
, yes, of course
it's a room).
Interestingly, Email is also in that stage, where there are block
lists of spammers, and it's a race between those blockers and
spammers. Large email providers, obviously, are getting closer to the
EFnet stage: you could consider they only accept email from themselves
or between themselves. It's getting increasingly hard to deliver mail
to Outlook and Gmail for example, partly because of bias against small
providers, but also because they are including more and more
machine-learning tools to sort through email and those systems are,
fundamentally, unknowable. It's not quite the same as splitting the
federation the way EFnet did, but the effect is similar.
HTTP has somehow managed to live in a parallel universe, as it's
technically still completely federated: anyone can start a web server
if they have a public IP address and anyone can connect to it. The
catch, of course, is how you find the darn thing. Which is how Google
became one of the most powerful corporations on earth, and how they
became the gatekeepers of human knowledge online.
I have only briefly mentioned XMPP here, and my XMPP fans will
undoubtedly comment on that, but I think it's somewhere in the middle
of all of this. It was co-opted by Facebook and Google, and
both corporations have abandoned it to its fate. I remember fondly the
days where I could do instant messaging with my contacts who had a
Gmail account. Those days are gone, and I don't talk to anyone over
Jabber anymore, unfortunately. And this is a threat that Matrix still
has to face.
It's also the threat Email is currently facing. On the one hand
corporations like Facebook want to completely destroy it and have
mostly succeeded: many people just have an email account to
register on things and talk to their friends over Instagram or
(lately) TikTok (which, I know, is not Facebook, but they started that
fire).
On the other hand, you have corporations like Microsoft and Google who
are still using and providing email services because, frankly, you
still do need email for stuff, just like fax is still around
but they are more and more isolated in their own silo. At this point,
it's only a matter of time they reach critical mass and just decide
that the risk of allowing external mail coming in is not worth the
cost. They'll simply flip the switch and work on an allow-list
principle. Then we'll have closed the loop and email will be
dead, just like IRC is "dead" now.
I wonder which path Matrix will take. Could it liberate us from these
vicious cycles?
Update: this generated some discussions on lobste.rs.
Needs to be able to create two copies always. Can get stuck in irreversible read-only mode if only one copy can be made.Even as of now, RAID-1 and RAID-10 has this note:
The simple redundancy RAID levels utilize different mirrors in a way that does not achieve the maximum performance. The logic can be improved so the reads will spread over the mirrors evenly or based on device congestion.Granted, that's not a stability concern anymore, just performance. A reviewer of a draft of this article actually claimed that BTRFS only reads from one of the drives, which hopefully is inaccurate, but goes to show how confusing all this is. There are other warnings in the Debian wiki that are quite scary. Even the legendary Arch wiki has a warning on top of their BTRFS page, still. Even if those issues are now fixed, it can be hard to tell when they were fixed. There is a changelog by feature but it explicitly warns that it doesn't know "which kernel version it is considered mature enough for production use", so it's also useless for this. It would have been much better if BTRFS was released into the world only when those bugs were being completely fixed. Or that, at least, features were announced when they were stable, not just "we merged to mainline, good luck". Even now, we get mixed messages even in the official BTRFS documentation which says "The Btrfs code base is stable" (main page) while at the same time clearly stating unstable parts in the status page (currently RAID56). There are much harsher BTRFS critics than me out there so I will stop here, but let's just say that I feel a little uncomfortable trusting server data with full RAID arrays to BTRFS. But surely, for a workstation, things should just work smoothly... Right? Well, let's see the snags I hit.
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 931,5G 0 disk
sda1 8:1 0 200M 0 part /boot/efi
sda2 8:2 0 1G 0 part /boot
sda3 8:3 0 7,8G 0 part
fedora_swap 253:5 0 7.8G 0 crypt [SWAP]
sda4 8:4 0 922,5G 0 part
fedora_crypt 253:4 0 922,5G 0 crypt /
(This might not entirely be accurate: I rebuilt this from the Debian
side of things.)
This is pretty straightforward, except for the swap partition:
normally, I just treat swap like any other logical volume and create
it in a logical volume. This is now just speculation, but I bet it was
setup this way because "swap" support was only added in BTRFS 5.0.
I fully expect BTRFS experts to yell at me now because this is an old
setup and BTRFS is so much better now, but that's exactly the point
here. That setup is not that old (2018? old? really?), and migrating
to a new partition scheme isn't exactly practical right now. But let's
move on to more practical considerations.
/dev/nvme0n1
and nvme1n1
/dev/md1
vg_tbbuild05
(multiple PVs can be added to a single VG which is
why there is that abstraction)NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
nvme0n1 259:0 0 1.7T 0 disk
nvme0n1p1 259:1 0 8M 0 part
nvme0n1p2 259:2 0 512M 0 part
md0 9:0 0 511M 0 raid1 /boot
nvme0n1p3 259:3 0 1.7T 0 part
md1 9:1 0 1.7T 0 raid1
crypt_dev_md1 253:0 0 1.7T 0 crypt
vg_tbbuild05-root 253:1 0 30G 0 lvm /
vg_tbbuild05-swap 253:2 0 125.7G 0 lvm [SWAP]
vg_tbbuild05-srv 253:3 0 1.5T 0 lvm /srv
nvme0n1p4 259:4 0 1M 0 part
I stripped the other nvme1n1
disk because it's basically the same.
Now, if we look at my BTRFS-enabled workstation, which doesn't even
have RAID, we have the following:
/dev/sda
with, again, /dev/sda4
being where BTRFS livesfedora_crypt
, which is, confusingly, kind of like a
volume group. it's where everything lives. i think.home
, root
, /
, etc. those are actually the things
that get mounted. you'd think you'd mount a filesystem, but no, you
mount a subvolume. that is backwards.lsblk
:
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 931,5G 0 disk
sda1 8:1 0 200M 0 part /boot/efi
sda2 8:2 0 1G 0 part /boot
sda3 8:3 0 7,8G 0 part [SWAP]
sda4 8:4 0 922,5G 0 part
fedora_crypt 253:4 0 922,5G 0 crypt /srv
Notice how we don't see all the BTRFS volumes here? Maybe it's because
I'm mounting this from the Debian side, but lsblk
definitely gets
confused here. I frankly don't quite understand what's going on, even
after repeatedly looking around the rather dismal
documentation. But that's what I gather from the following
commands:
root@curie:/home/anarcat# btrfs filesystem show
Label: 'fedora' uuid: 5abb9def-c725-44ef-a45e-d72657803f37
Total devices 1 FS bytes used 883.29GiB
devid 1 size 922.47GiB used 916.47GiB path /dev/mapper/fedora_crypt
root@curie:/home/anarcat# btrfs subvolume list /srv
ID 257 gen 108092 top level 5 path home
ID 258 gen 108094 top level 5 path root
ID 263 gen 108020 top level 258 path root/var/lib/machines
I only got to that point through trial and error. Notice how I use an
existing mountpoint to list the related subvolumes. If I try to use
the filesystem path, the one that's listed in filesystem show
, I
fail:
root@curie:/home/anarcat# btrfs subvolume list /dev/mapper/fedora_crypt
ERROR: not a btrfs filesystem: /dev/mapper/fedora_crypt
ERROR: can't access '/dev/mapper/fedora_crypt'
Maybe I just need to use the label? Nope:
root@curie:/home/anarcat# btrfs subvolume list fedora
ERROR: cannot access 'fedora': No such file or directory
ERROR: can't access 'fedora'
This is really confusing. I don't even know if I understand this
right, and I've been staring at this all afternoon. Hopefully, the
lazyweb will correct me eventually.
(As an aside, why are they called "subvolumes"? If something is a
"sub" of "something else", that "something else" must exist
right? But no, BTRFS doesn't have "volumes", it only has
"subvolumes". Go figure. Presumably the filesystem still holds "files"
though, at least empirically it doesn't seem like it lost anything so
far.
In any case, at least I can refer to this section in the future, the
next time I fumble around the btrfs
commandline, as I surely will. I
will possibly even update this section as I get better at it, or based
on my reader's judicious feedback.
/etc/fstab
,
on the Debian side of things:
UUID=5abb9def-c725-44ef-a45e-d72657803f37 /srv btrfs defaults 0 2
This thankfully ignores all the subvolume nonsense because it relies
on the UUID. mount
tells me that's actually the "root" (? /
?)
subvolume:
root@curie:/home/anarcat# mount grep /srv
/dev/mapper/fedora_crypt on /srv type btrfs (rw,relatime,space_cache,subvolid=5,subvol=/)
Let's see if I can mount the other volumes I have on there. Remember
that subvolume list
showed I had home
, root
, and
var/lib/machines
. Let's try root
:
mount -o subvol=root /dev/mapper/fedora_crypt /mnt
Interestingly, root
is not the same as /
, it's a different
subvolume! It seems to be the Fedora root (/
, really) filesystem. No
idea what is happening here. I also have a home
subvolume, let's
mount it too, for good measure:
mount -o subvol=home /dev/mapper/fedora_crypt /mnt/home
Note that lsblk
doesn't notice those two new mountpoints, and that's
normal: it only lists block devices and subvolumes (rather
inconveniently, I'd say) do not show up as devices:
root@curie:/home/anarcat# lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
sda 8:0 0 931,5G 0 disk
sda1 8:1 0 200M 0 part
sda2 8:2 0 1G 0 part
sda3 8:3 0 7,8G 0 part
sda4 8:4 0 922,5G 0 part
fedora_crypt 253:4 0 922,5G 0 crypt /srv
This is really, really confusing. Maybe I did something wrong in the
setup. Maybe it's because I'm mounting it from outside Fedora. Either
way, it just doesn't feel right.
root@curie:/home/anarcat# df -h /srv /mnt /mnt/home
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/fedora_crypt 923G 886G 31G 97% /srv
/dev/mapper/fedora_crypt 923G 886G 31G 97% /mnt
/dev/mapper/fedora_crypt 923G 886G 31G 97% /mnt/home
(Notice, in passing, that it looks like the same filesystem is mounted
in different places. In that sense, you'd expect /srv
and /mnt
(and /mnt/home
?!) to be exactly the same, but no: they are entirely
different directory structures, which I will not call "filesystems"
here because everyone's head will explode in sparks of confusion.)
Yes, disk space is shared (that's the Size
and Avail
columns,
makes sense). But nope, no cookie for you: they all have the same
Used
columns, so you need to actually walk the entire filesystem to
figure out what each disk takes.
(For future reference, that's basically:
root@curie:/home/anarcat# time du -schx /mnt/home /mnt /srv
124M /mnt/home
7.5G /mnt
875G /srv
883G total
real 2m49.080s
user 0m3.664s
sys 0m19.013s
And yes, that was painfully slow.)
ZFS actually has some oddities in that regard, but at least it tells
me how much disk each volume (and snapshot) takes:
root@tubman:~# time df -t zfs -h
Filesystem Size Used Avail Use% Mounted on
rpool/ROOT/debian 3.5T 1.4G 3.5T 1% /
rpool/var/tmp 3.5T 384K 3.5T 1% /var/tmp
rpool/var/spool 3.5T 256K 3.5T 1% /var/spool
rpool/var/log 3.5T 2.0G 3.5T 1% /var/log
rpool/home/root 3.5T 2.2G 3.5T 1% /root
rpool/home 3.5T 256K 3.5T 1% /home
rpool/srv 3.5T 80G 3.5T 3% /srv
rpool/var/cache 3.5T 114M 3.5T 1% /var/cache
bpool/BOOT/debian 571M 90M 481M 16% /boot
real 0m0.003s
user 0m0.002s
sys 0m0.000s
That's 56360 times faster, by the way.
But yes, that's not fair: those in the know will know there's a
different command to do what df
does with BTRFS filesystems, the
btrfs filesystem usage
command:
root@curie:/home/anarcat# time btrfs filesystem usage /srv
Overall:
Device size: 922.47GiB
Device allocated: 916.47GiB
Device unallocated: 6.00GiB
Device missing: 0.00B
Used: 884.97GiB
Free (estimated): 30.84GiB (min: 27.84GiB)
Free (statfs, df): 30.84GiB
Data ratio: 1.00
Metadata ratio: 2.00
Global reserve: 512.00MiB (used: 0.00B)
Multiple profiles: no
Data,single: Size:906.45GiB, Used:881.61GiB (97.26%)
/dev/mapper/fedora_crypt 906.45GiB
Metadata,DUP: Size:5.00GiB, Used:1.68GiB (33.58%)
/dev/mapper/fedora_crypt 10.00GiB
System,DUP: Size:8.00MiB, Used:128.00KiB (1.56%)
/dev/mapper/fedora_crypt 16.00MiB
Unallocated:
/dev/mapper/fedora_crypt 6.00GiB
real 0m0,004s
user 0m0,000s
sys 0m0,004s
Almost as fast as ZFS's df! Good job. But wait. That doesn't actually
tell me usage per subvolume. Notice it's filesystem usage
, not
subvolume usage
, which unhelpfully refuses to exist. That command
only shows that one "filesystem" internal statistics that are pretty
opaque.. You can also appreciate that it's wasting 6GB of
"unallocated" disk space there: I probably did something Very Wrong
and should be punished by Hacker News. I also wonder why it has 1.68GB
of "metadata" used...
At this point, I just really want to throw that thing out of the
window and restart from scratch. I don't really feel like learning the
BTRFS internals, as they seem oblique and completely bizarre to me. It
feels a little like the state of PHP now: it's actually pretty solid,
but built upon so many layers of cruft that I still feel it corrupts
my brain every time I have to deal with it (needle or haystack first?
anyone?)...
debian/changelog
at least:
procmail (3.22-1) unstable; urgency=low
* New upstream release, which uses the standard' format for Maildir
filenames and retries on name collision. It also contains some
bug fixes from the 3.23pre snapshot dated 2001-09-13.
* Removed sendmail' from the Recommends field, since we already
have exim' (the default Debian MTA) and mail-transport-agent'.
* Removed suidmanager support. Conflicts: suidmanager (<< 0.50).
* Added support for DEB_BUILD_OPTIONS in the source package.
* README.Maildir: Do not use locking on the example recipe,
since it's wrong to do so in this case.
-- Santiago Vila <sanvila@debian.org> Wed, 21 Nov 2001 09:40:20 +0100
All Debian suites from buster onwards ship the 3.22-26 release,
although the maintainer just pushed a 3.22-27 release to fix a seven
year old null pointer dereference, after this article was drafted.
Procmail is also shipped in all major distributions: Fedora
and its derivatives, Debian derivatives, Gentoo, Arch,
FreeBSD, OpenBSD. We all seem to be ignoring this problem.
The upstream website (http://procmail.org/) has been down since
about 2015, according to Debian bug #805864, with no change
since.
In effect, every distribution is currently maintaining its fork of
this dead program.
Note that, after filing a bug to keep Debian from shipping
procmail in a stable release again, I was told that the
Debian maintainer is apparently in contact with the upstream. And,
surprise! they still plan to release that fabled 3.23 release, which
has been now in "pre-release" for all those twenty years.
In fact, it turns out that 3.23 is considered released already, and
that the procmail author actually pushed a 3.24 release, codenamed
"Two decades of fixes". That amounts to 25 commits since 3.23pre
some of which address serious security issues, but none of which
address fundamental issues with the code base.
root:mail
in
Debian. There's no debconf
or pre-seed setting that can change
this. There has been two bug reports against the Debian to make this
configurable (298058, 264011), but both were closed to say
that, basically, you should use dpkg-statoverride
to change the
permissions on the binary.
So if anything, you should immediately run this command on any host
that you have procmail
installed on:
dpkg-statoverride --update --add root root 0755 /usr/bin/procmail
Note that this might break email delivery. It might also not work at
all, thanks to usrmerge. Not sure. Yes, everything is on
fire. This is fine.
In my opinion, even assuming we keep procmail in Debian, that default
should be reversed. It should be up to people installing procmail to
assign it those dangerous permissions, after careful consideration of
the risk involved.
The last maintainer of procmail explicitly advised us (in that null
pointer dereference bug) and other projects (e.g. OpenBSD, in [2])
to stop shipping it, back in 2014. Quote:
Executive summary: delete the procmail port; the code is not safe and should not be used as a basis for any further work.I just read some of the code again this morning, after the original author claimed that procmail was active again. It's still littered with bizarre macros like:
#define bit_set(name,which,value) \
(value?(name[bit_index(which)] =bit_mask(which)):\
(name[bit_index(which)]&=~bit_mask(which)))
... from regexp.c, line 66 (yes, that's a custom regex
engine). Or this one:
#define jj (aleps.au.sopc)
It uses insecure functions like strcpy
extensively. malloc()
is thrown around goto
s like it's 1984 all over again. (To be fair,
it has been feeling like 1984 a lot lately, but that's another matter
entirely.)
That null pointer deref bug? It's fixed upstream now, in this
commit merged a few hours ago, which I presume might be in response
to my request to remove procmail from Debian.
So while that's nice, this is the just tip of the iceberg. I speculate
that one could easily find an exploitable crash in procmail if only by
running it through a fuzzer. But I don't need to speculate: procmail
had, for years, serious security issues that could possibly lead to
root privilege escalation, remotely exploitable if procmail is (as
it's designed to do) exposed to the network.
Maybe I'm overreacting. Maybe the procmail author will go through the
code base and do a proper rewrite. But I don't think that's what is in
the cards right now. What I expect will happen next is that people
will start fuzzing procmail, throw an uncountable number of bug
reports at it which will get fixed in a trickle while never fixing the
underlying, serious design flaws behind procmail.
procmail(1)
itself are typically part of mail
servers. For example, Dovecot has its own LDA which implements
the standard Sieve language (RFC 5228). (Interestingly, Sieve
was published as RFC 3028 in 2001, before procmail was formally
abandoned.)
Courier also has "maildrop" which has its own filtering mechanism,
and there is fdm (2007) which is a fetchmail and procmail
replacement. Update: there's also mailprocessing, which is not
an LDA, but processing an existing folder. It was, however,
specifically designed to replace complex Procmail rules.
But procmail, of course, doesn't just ship procmail; that would just
be too easy. It ships mailstat(1)
which we could probably ignore
because it only parses procmail log files. But more importantly, it
also ships:
lockfile(1)
- conditional semaphore-file creatorformail(1)
- mail (re)formatterlockfile(1)
already has a somewhat acceptable replacement in the form of
flock(1)
, part of util-linux (which is Essential, so installed on
any normal Debian system). It might not be a direct drop-in
replacement, but it should be close enough.
formail(1)
is similar: the courier maildrop
package ships
reformail(1)
which is, presumably, a rewrite of formail. It's
unclear if it's a drop-in replacement, but it should probably possible
to port uses of formail to it easily.
Update: theThe real challenge is, of course, migrating those oldmaildrop
package ships a SUID root binary (two, even). So if you want onlyreformail(1)
, you might want to disable that with:It would be perhaps better to havedpkg-statoverride --update --add root root 0755 /usr/bin/lockmail.maildrop dpkg-statoverride --update --add root root 0755 /usr/bin/maildrop
reformail(1)
as a separate package, see bug 1006903 for that discussion.
.procmailrc
recipes to Sieve (basically). I added a few examples in the appendix
below. You might notice the Sieve examples are easier to read, which
is a nice added bonus.
procmail
installed everywhere, possibly because userdir-ldap was using
it for lockfile
until 2019. I sent a patch to fix that and scrambled
to remove get rid of procmail everywhere. That took about a day.
But many other sites are now in that situation, possibly not imagining
they have this glaring security hole in their infrastructure.
~/.dovecot.sieve
.
user+foo@example.com
to the folder
foo
. You might write something like this in procmail:
MAILDIR=$HOME/Maildir/
DEFAULT=$MAILDIR
LOGFILE=$HOME/.procmail.log
VERBOSE=off
EXTENSION=$1 # Need to rename it - ?? does not like $1 nor 1
:0
* EXTENSION ?? [a-zA-Z0-9]+
.$EXTENSION/
That, in sieve language, would be:
require ["variables", "envelope", "fileinto", "subaddress"];
########################################################################
# wildcard +extension
# https://doc.dovecot.org/configuration_manual/sieve/examples/#plus-addressed-mail-filtering
if envelope :matches :detail "to" "*"
# Save name in $ name in all lowercase
set :lower "name" "$ 1 ";
fileinto "$ name ";
stop;
Subject:
line having FreshPorts
in it into the freshports
folder, and mails from alternc.org
mailing lists into the alternc
folder:
:0
## mailing list freshports
* ^Subject.*FreshPorts.*
.freshports/
:0
## mailing list alternc
* ^List-Post.*mailto:.*@alternc.org.*
.alternc/
Equivalent Sieve:
if header :contains "subject" "FreshPorts"
fileinto "freshports";
elsif header :contains "List-Id" "alternc.org"
fileinto "alternc";
:0
* ^Subject: Cron
* ^From: .*root@
.rapports/
Would look something like this in Sieve:
if header :comparator "i;octet" :contains "Subject" "Cron"
if header :regex :comparator "i;octet" "From" ".*root@"
fileinto "rapports";
Note that this is what the automated converted does (below). It's not
very readable, but it works.
if header :contains "Precedence" "bulk"
fileinto "bulk";
if exists "List-Id"
fileinto "lists";
if anyof (header :contains "from" "example.com",
header :contains ["to", "cc"] "anarcat@example.com")
fileinto "example";
You can even pile up a bunch of options together to have one big rule
with multiple patterns:
if anyof (exists "X-Cron-Env",
header :contains ["subject"] ["security run output",
"monthly run output",
"daily run output",
"weekly run output",
"Debian Package Updates",
"Debian package update",
"daily mail stats",
"Anacron job",
"nagios",
"changes report",
"run output",
"[Systraq]",
"Undelivered mail",
"Postfix SMTP server: errors from",
"backupninja",
"DenyHosts report",
"Debian security status",
"apt-listchanges"
],
header :contains "Auto-Submitted" "auto-generated",
envelope :contains "from" ["nagios@",
"logcheck@",
"root@"])
fileinto "rapports";
Next.