Search Results: "tach"

20 October 2021

Arturo Borrero González: Iterating on how we do NFS at Wikimedia Cloud Services

This post was originally published in the Wikimedia Tech blog, authored by Arturo Borrero Gonzalez. NFS is a central piece of infrastructure that is essential to services like Toolforge. Recently, the Cloud Services team at Wikimedia had been reviewing how we do NFS.

The current situation

NFS is a central piece of technology for some of the services that the Wikimedia Cloud Services team offers to the community. We have several shares that power different use cases: Toolforge user home directories live on NFS, and Cloud VPS users can also access dumps using this protocol. The current setup involves several physical hardware servers, with about 20TB of storage, offering shares over 10G links to the cloud. For the system to be more fault-tolerant, we duplicate each share for redundancy using DRBD. Running NFS on dedicated hardware servers has traditionally offered us advantages, mostly in the performance and capacity fields.

As time has passed, we have been enumerating more and more reasons to review how we do NFS. For one, the current setup is in violation of some of our internal rules regarding realm separation. Additionally, we had been longing for additional flexibility managing our servers: we wanted to use virtual machines managed by Openstack Nova. The DRBD-based high-availability system required a mostly hand-crafted procedure for failover/failback. There are also some scalability concerns, as NFS is easy to grow up, but not to grow horizontally, and of course we have to be able to keep the tenancy setup while doing so, something that NFS does by using LDAP/Unix users and which may get complicated when growing. In general, the servers have become "too big to fail", clearly technical debt, and it has taken us years to decide on taking on the task of rethinking the architecture. It's worth mentioning that in an ideal world we wouldn't depend on NFS, but the truth is that it will still be a central piece of infrastructure for years to come in services like Toolforge.

Over a series of brainstorming meetings, the WMCS team evaluated the situation and sorted out the many moving parts. The team managed to boil down the potential service future to two competing options: serving NFS through Openstack Manila, or from a simple virtual machine. Then we decided to research both options in parallel. For a number of reasons, the evaluation was timeboxed to three weeks. Both ideas had a couple of points in common: the NFS data would be stored on our Ceph farm via Cinder volumes, and we would rely on Ceph reliability to avoid using DRBD. Another open topic was how to back up data from Ceph, to store our important bits in more than one basket. We will get to the backup topic later.

The Manila experiment

The Wikimedia Foundation was an early adopter of some Openstack components (Nova, Glance, Designate, Horizon), but Manila was never evaluated for usage until now. Our approach for this experiment was to closely follow the upstream guidelines. We read the documentation and tried to understand the different setups you can build with Manila. As we often feel with other Openstack components, the documentation doesn't perfectly describe how to introduce a given component in your particular local setup. Here we use an admin-controlled flat-topology Neutron network. This network is shared by all tenants (or projects) in our Openstack deployment. Also, Manila can use many different driver backends, for things like NetApp or CephFS that we don't use, yet. After some research, the generic driver was the one that seemed to better fit our use case.
The generic driver leverages Nova virtual machine instances plus Cinder volumes to create and manage the shares. In general, Manila supports two operational modes, depending on whether it should create/destroy the share servers (i.e., the virtual machine instances) or not. This option is called driver_handles_share_server (or DHSS) and takes a boolean value. We were interested in trying DHSS=true, to really benefit from the potential of the setup.

Manila diagram NFS idea 6, original image in Wikitech

So, after sorting all these variables, we moved on with our initial testing. We built a PoC setup as depicted in the diagram above, with the manila-share component running in a virtual machine inside the cloud. The PoC led to us reporting several bugs upstream. In some cases we tried to address these bugs ourselves. It's worth mentioning that the upstream community was extra-welcoming to us, and we're thankful for that. However, at the end of our three-week period, our Manila setup still wasn't working as expected. Your experience may differ with other drivers, perhaps the ZFSonLinux or the CephFS ones. In general, we were having trouble making the setup work as expected, so we decided to abandon this approach in favor of the other option we were considering at the beginning.

Simple virtual machine serving NFS

The alternative was to create a Nova virtual machine instance by hand and to configure it using puppet. We have been investing in an automation framework lately, so the idea is to not actually create the server by hand. Anyway, the data would be decoupled from the instance into Cinder volumes, which led us to the question we left for later: how should we back up those terabytes of important information? Just to be clear, the backup problem was independent of the above options; with Manila we would still have had to solve the same challenge. We would like to see our data backed up somewhere else other than in Ceph. And that's exactly where we are at right now. We've been exploring different backup strategies and will finally use the Cinder backup API.

Conclusion

The iteration will end with the dedicated NFS hardware servers being stopped, and the shares being served from within the cloud. The migration will take some time to happen because we will check and double-check that everything works as expected (including from the performance point of view) before making definitive changes. We already have some plans to make sure our users experience as little service impact as possible. The most troublesome shares will be those related to Toolforge. At some point we will need to disallow writes to the NFS share, rsync the data out of the hardware servers into the Cinder volumes, point the NFS clients to the new virtual machines, and then enable writes again. The main Toolforge share has about 8TB of data, so this will take a while. We will have more updates in the future. Who knows, perhaps our next-next iteration, in a couple of years, will see us adopting Openstack Manila for good.

Featured image credit: File:(from break water) Manila Skyline panoramio.jpg, ewol, CC BY-SA 3.0

This post was originally published in the Wikimedia Tech blog, authored by Arturo Borrero Gonzalez.
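Addendum: for anyone who wants to poke at Manila themselves, here is a hedged sketch of what creating a share looks like from the client side with the manila CLI. The share type name, share name, size and subnet below are invented for illustration; only the subcommands themselves are the standard CLI, and with DHSS=true you would normally also need a share network, which is omitted here.

# share type that requires driver-handled share servers (DHSS=true)
manila type-create generic-dhss True
# a 100G NFS share on that type (example name/size; a --share-network is usually needed with DHSS=true)
manila create NFS 100 --name example-share --share-type generic-dhss
# allow an example client subnet to mount it read-write
manila access-allow example-share ip 172.16.0.0/21 --access-level rw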

15 October 2021

Sven Hoexter: ThinkPad P15v Gen1, Xorg and a Samsung QHD Display

I wasted quite some hours until I found a working Modeline in this Stack Exchange post, so the ThinkPad now works with an HDMI-attached Samsung QHD display. The internal display of the ThinkPad is an FHD display detected as eDP-1; the external one is DP-3 and, according to the packaging, known by Samsung as S24A600NWU. The auto-detected EDID modes for QHD - 2560x1440 - did not work at all, the display simply stays dark. After a lot of back and forth with the i915 driver vs nouveau vs nvidia/nvidia-drm, with and without modesetting, the following Modeline did the magic:
xrandr --newmode 2560x1440_54.97  221.00  2560 2608 2640 2720  1440 1443 1447 1478  +HSync -VSync
xrandr --addmode DP-3 2560x1440_54.97
xrandr --output DP-3 --mode 2560x1440_54.97 --right-of eDP-1 --primary
Modelines for 50Hz and 60Hz generated with cvt 2560 1440 60 did not work, and neither did the one extracted with edid-decode -X from the hex blob found in .local/share/xorg/Xorg.0.log. Of the auto-detected Modelines, FHD - 1920x1080 - did work. In case someone struggles with a similar setup, that might be a starting point. Fun part: if I attach my several-years-old Dell E7470, everything is just fine out of the box. But that one only has an Intel GPU and not the unholy combination I've got here:
$ lspci | grep -E "VGA|3D"
00:02.0 VGA compatible controller: Intel Corporation CometLake-H GT2 [UHD Graphics] (rev 05)
01:00.0 3D controller: NVIDIA Corporation GP107GLM [Quadro P620] (rev ff)
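Since xrandr settings do not survive an X session restart, one way to keep this working is to run the three commands at session start. ~/.xprofile is honoured by many display managers; this is only an untested sketch of that idea, reusing the exact Modeline and output names from above (adjust if yours differ):
cat >> ~/.xprofile <<'EOF'
xrandr --newmode 2560x1440_54.97  221.00  2560 2608 2640 2720  1440 1443 1447 1478  +HSync -VSync
xrandr --addmode DP-3 2560x1440_54.97
xrandr --output DP-3 --mode 2560x1440_54.97 --right-of eDP-1 --primary
EOF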

16 September 2021

Chris Lamb: On Colson Whitehead's Harlem Shuffle

Colson Whitehead's latest novel, Harlem Shuffle, was always going to be widely reviewed, if only because his last two books won Pulitzer prizes. Still, after enjoying both The Underground Railroad and The Nickel Boys, I was certainly going to read his next book, regardless of what the critics were saying; indeed, it was actually quite agreeable to float above the manufactured energy of the book's launch. Saying that, I was encouraged to listen to an interview with the author by Ezra Klein. Now I had heard Whitehead speak once before, when he accepted the Orwell Prize in 2020, and once again he came across as a pretty down-to-earth guy. Or, if I were to emulate the detached and cynical tone Whitehead embodied in The Nickel Boys: after winning so many literary prizes in the past few years, he has clearly rehearsed how to respond to the clichéd questions authors must be asked in every interview. With the obligatory throat-clearing of 'so, how did you get into writing?', for instance, Whitehead replies with his part of the catechism that 'It seemed like being a writer could be a cool job. You could work from home and not talk to people.' The response is the right combination of cute and self-effacing... and with its slight tone-deafness towards enforced isolation, it was no doubt honed before Covid-19. Harlem Shuffle tells three separate stories about Ray Carney, a furniture salesman and 'fence' for stolen goods in New York in the 1960s. Carney doesn't consider himself a genuine criminal though, and there's a certain logic to his relativistic morality. After all, everyone in New York City is on the take in some way, and if some 'lightly used items' in Carney's shop happened to have had 'previous owners', well, that's not quite his problem. 'Nothing solid in the city but the bedrock,' as one character dryly observes. Yet as Ezra pounces on in his NYT interview mentioned above, the focus on the Harlem underworld means there are very few women in the book, and Whitehead's circular response ('ah well, it's a book about the criminals at that time!') was a little unsatisfying. Not only did it feel uncharacteristically slippery of someone justly lauded for his unflinching power of observation (after all, it was the author who decided what to write about in the first place), it foreclosed on the opportunity to delve into why the heist and caper genres (from The Killing, The Feather Thief, Ocean's 11, etc.) have historically been a 'male' mode of storytelling. Perhaps knowing this to be the case, the conversation quickly steered towards Ray Carney's wife, Elizabeth, the only woman in the book who could be said to possess some plausible interiority. The following off-hand remark from Whitehead caught my attention:
My wife is convinced that [Elizabeth] knows everything about Carney's criminal life, and is sort of giving him a pass. And I'm not sure if that's true. I have to have to figure out exactly what she knows and when she knows it and how she feels about it.
I was quite taken by this, although not simply due to its effect on the story itself. As in, it immediately conjured up a charming picture of Whitehead's domestic arrangements: not only does Whitehead's wife feel free to disagree with what one of Whitehead's 'own' characters knows or believes, but Colson has no problem whatsoever sharing that disagreement with the public at large. (It feels somehow natural that Whitehead's wife believes her counterpart knows more than she lets on, whilst Whitehead himself imbues the protagonist's wife with a kind of neo-Victorian innocence.) I'm minded to agree with Whitehead's partner myself, if only due to the passages where Elizabeth is studiously ignoring Carney's otherwise unexplained freak-outs. But all of these meta-thoughts simply underline just how emancipatory the Death of the Author can be. This product of academic literary criticism (the term was coined by Roland Barthes' 1967 essay of the same name) holds that the original author's intentions, ideas or biographical background carry no especial weight in determining how others should interpret their work. It is usually understood as meaning that a writer's own views are no more valid or 'correct' than the views held by someone else. (As an aside, I've found that most readers who encounter this concept for the first time have been reading books in this way since they were young. But the opposite is invariably true with cinephiles, who often have a bizarre obsession with researching or deciphering the 'true' interpretation of a film.) And with all that in mind, can you think of a wryer example of the freeing (and fun) nature of the Death of the Author than an author's own partner dissenting from their (Pulitzer Prize-winning) husband on the position of a lynchpin character?
The 1964 Harlem riot began after James Powell, a 15-year-old African American, was shot and killed by Thomas Gilligan, an NYPD police officer in front of 10s of witnesses. Gilligan was subsequently cleared by a grand jury.
As it turns out, the reviews for Harlem Shuffle have been almost universally positive, and after reading it in the two days after its release, I would certainly agree it is an above-average book. But it didn't quite take hold of me in the way that The Underground Railroad or The Nickel Boys did, especially the later chapters of The Nickel Boys that were set in contemporary New York and could thus make some (admittedly fairly explicit) connections from the 1960s to the present day; that kind of connection is not there in Harlem Shuffle, or at least I did not pick up on it during my reading. I can see why one might take exception to that, though. For instance, it is certainly true that the week-long Harlem riot forms a significant part of the plot, and some events in particular are entirely contingent on the ramifications of this momentous event. But it's difficult to argue the riot's impact is truly integral to the story, so not only is this uprising against police brutality almost regarded as a background event, any contemporary allusion to the murder of George Floyd is subsequently watered down. It's nowhere near the historical rubbernecking of Forrest Gump (1994), of course, but that's not a battle you should ever be fighting. Indeed, whilst a certain smoothness of affect is to be priced into the Whitehead reading experience, my initial overall reaction to Harlem Shuffle was fairly flat, despite all the action and intrigue on the page. The book perhaps belies its origins as a work conceived during quarantine; after all, the book is essentially comprised of three loosely connected novellas, almost as if the unreality and mental turbulence of lockdown prevented the author from performing the psychological 'deep work' of producing a novel-length text with his usual depth of craft. A few other elements chimed with this being a 'lockdown novel' as well, particularly the book's preoccupation with the sheer physicality of the city compared to the usual complex interplay between its architecture and its inhabitants. This felt like it had been directly absorbed into the book from the author walking around his deserted city, and thus being able to take in details for the first time:
The doorways were entrances into different cities no, different entrances into one vast, secret city. Ever close, adjacent to all you know, just underneath. If you know where to look.
And I can't fail to mention that you can almost touch Whitehead's sublimated hunger to eat out again as well:
Stickups were chops: they cook fast and hot, you're in and out. A stakeout was ribs: fire down low, slow, taking your time. [...] Sometimes when Carney jumped into the Hudson when he was a kid, some of that stuff got into his mouth. The Big Apple Diner served it up and called it coffee.
More seriously, however, the relatively thin personalities of minor characters then reminded me of the simulacrum of Zoom-based relationships, and the essentially unsatisfactory endings to the novellas felt reminiscent of lockdown pseudo-events that simply fizzle out without a bang. One of the stories ties up loose ends with: 'These things were usually enough to terminate a mob war, and they appeared to end the hostilities in this case as well.' They did? Well, okay, I guess.
The corner of 125th Street and Morningside Avenue in 2019, the purported location of Carney's fictional furniture store. Signage plays a prominent role in Harlem Shuffle, possibly due to the author's quarantine walks.
Still, it would be unfair to characterise myself as 'disappointed' with the novel, and none of this piece should be taken as really deep criticism. The book certainly was entertaining enough, and pretty funny in places as well:
Carney didn't have an etiquette book in front of him, but he was sure it was bad manners to sit on a man's safe. [...] The manager of the laundromat was a scrawny man in a saggy undershirt painted with sweat stains. Launderer, heal thyself.
Yet I can't shake the feeling that every book you write is a book that you don't, and so we might need to hold out a little longer for Whitehead's 'George Floyd novel'. (Although it is for others to say how much of this sentiment is the expectations of a White Reader for The Black Author to ventriloquise the pain of 'their' community.) Some room for personal critique is surely permitted. I dearly missed the junk food energy of the dry and acerbic observations that run through Whitehead's previous work. At one point he had a good line on the model tokenisation that lurks behind 'The First Negro to...' labels, but the callbacks to this idea ceased without any payoff. Similar things happened with the not-so-subtle critiques of the American Dream:
Entrepreneur? Pepper said the last part like manure. That's just a hustler who pays taxes. [...] One thing I've learned in my job is that life is cheap, and when things start getting expensive, it gets cheaper still.
Ultimately, though, I think I just wanted more. I wanted a deeper exploration of how the real power in New York is not wielded by individual street hoodlums or even the cops, but comes in the form of real estate, essentially serving as a synecdoche for Capital as a whole. (A recent take on this can be felt in Jed Rothstein's 2021 documentary, WeWork: Or the Making and Breaking of a $47 Billion Unicorn, and it is perhaps pertinent to remember that the US President at the time this novel was written was affecting to be a real estate tycoon.) Indeed, just like the concluding scenes of J. J. Connolly's Layer Cake, although you can certainly pull off a cool heist against the Man, power ultimately resides in those who control the means of production... and a homespun furniture salesman on the corner of 125th & Morningside just ain't that. There are some nods to this kind of analysis in the conclusion of the final story ('Their heist unwound as if it had never happened, and Van Wyck kept throwing up buildings.'), but, again, I would have simply liked more. And when I then attempted to file this book away into the broader media landscape, given the current cultural visibility of 1960s pop culture (e.g. One Night in Miami (2020), Judas and the Black Messiah (2021), Summer of Soul (2021), etc.), Harlem Shuffle also seemed like a missed opportunity to critically analyse our (highly-qualified) longing for the civil rights era. I can certainly understand why we might look fondly on the cultural products from a period when politics was less alienated, when society was less atomised, and when it was still possible to imagine meaningful change, but in this dimension at least, Harlem Shuffle seems to merely contribute to this nostalgic escapism.

12 September 2021

Russ Allbery: Pod::Thread 3.00

This Perl module converts POD documents to thread, the macro language I use for my static site builder (about which, more in another software release coming up shortly). This release fixes a whole ton of long-standing problems. In brief, a lot of the POD implementation was previously done by chasing bugs rather than testing comprehensively, as reflected by the 65% code coverage in the previous release. The test suite now achieves about 95% code coverage (most of the rest is obscure error handling around encoding) and cleans up a bunch of long-standing problems with internal links. I had previously punted entirely on section links containing markup, and as a result the section links shown in the navigation bar or table of contents were missing the markup and headings containing thread metacharacters were mangled beyond recognition. That was because I was trying to handle resolving links using regexes (after I got rid of the original two-pass approach that required a driver script). This release uses Text::Balanced instead, the same parsing module used by my static site generator, to solve the problem (mostly) correctly. (I think there may still be a few very obscure edge cases, but I probably won't encounter them.) The net result should hopefully be better conversion of my software documentation, including INN documentation, when I regenerate my site (which will be coming soon). You can get the latest release from CPAN or from the Pod::Thread distribution page.
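For anyone who just wants to try the conversion: if I'm remembering the packaging right, the distribution ships a pod2thread driver script, and the basic invocation is roughly the following (the file names here are only examples, not anything from my site):
pod2thread module.pod > module.th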

7 September 2021

Russell Coker: Oracle Cloud Free Tier

It seems that every cloud service of note has a free tier nowadays, and the Oracle Cloud is the latest that I've discovered (thanks to r/homelab, which I highly recommend reading). Here's Oracle's summary of what they offer for free [1]. Oracle's "always free" tier (where presumably "always" is defined as "until we change our contract") currently offers ARM64 VMs to a total capacity of 4 CPU cores, 24G of RAM, and 200G of storage, with a default VM size of 1/4 of that (1 CPU core and 6G of RAM). It also includes 2 AMD64 VMs that each have 1G of RAM, but a 64bit VM with 1G of RAM isn't that useful nowadays.

Web Interface

The first thing to note is that the management interface is a massive pain to use. When a login times out for security reasons it redirects to a web page that gives a 404 error; maybe the redirection works OK if you are using it when it times out, but if you go off and spend an hour doing something else you will return to a 404 page. A web interface should never refer you to a page with a 404. There doesn't seem to be a way of bookmarking the commonly used links (as AWS does), and the set of links on the left depends on the section you are in, with no obvious way of going between sections. Sometimes I got stuck in a set of pages about authentication controls (the "identity cloud") and there seemed to be no link I could click on to get back to cloud computing; I had to go to a bookmarked link for the main cloud login page. A web interface should never force the user to type in the main URL or go to a bookmark; you should be able to navigate from every page to every other page in a logical manner. An advanced user might have their own bookmarks in their browser to suit their workflow, but a beginner should be able to go anywhere without breaking the session. Some parts of the interface appear to be copied from AWS, but unfortunately not the good parts. The way AWS manages IP access control is not easy to manage and it's not clear why packets are dropped, and Oracle copies all this. On the upside, Oracle has some good Datadog-style analytics, so for a new deployment you can debug IP access control by seeing records of rejected packets. Just to make it extra annoying, when you create a rule with multiple ports specified the web interface will expand it out to multiple rules of one port each; having ports 80 and 443 on separate lines doesn't make things easier. Also it forces you to have IPv4 and IPv6 as separate rules, so if you want HTTP and HTTPS on both IPv4 and IPv6 (a common requirement) then you need 4 separate rules. One final annoying thing is that the web interface doesn't make your previous settings a default. As I've created many ARM images and haven't created a single AMD image, it should know that the probability that I want to create an AMD image is very low and stop defaulting to that.

Recovery

When trying a new system you will inevitably break things and have to recover them. The way to recover from a configuration error that prevents your VM from booting and getting to a state of allowing a login is to stop the VM, then go to the "Boot volume" section under "Resources" and use the settings button to detach the boot volume. Then you go to another VM (which must be running), go to the "Attached block volumes" menu and attach it as Paravirtualised (not iSCSI and not "default", which will probably be iSCSI). After some time the block device will appear and you can mount it and do stuff to it.
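To give an idea of what "do stuff to it" looks like in practice, here's a rough sketch; the device name and mount point are examples, not what Oracle will necessarily give you, so check lsblk first:
lsblk                                  # find the newly attached volume, e.g. /dev/sdb
sudo mkdir -p /mnt/rescue
sudo mount /dev/sdb3 /mnt/rescue       # mount the root filesystem partition
sudo vi /mnt/rescue/etc/fstab          # fix whatever broke the boot
sudo umount /mnt/rescue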
Then, after umounting it, you detach it from the recovery VM and attach it again to the original VM (where it will still have an entry in the "Boot volume" section) and boot the original VM. As an aside, it's really annoying that you can't attach a volume to a VM that isn't running.

My first attempt at image recovery started with making a snapshot of the boot volume. This didn't work well because the image uses EFI and therefore GPT, and because the snapshot was larger than the original block device (which incidentally was the default size). I admit that I might have made a mistake making the snapshot, but if so it shouldn't be so easy to do. With GPT, if you have a larger block device then partitioning tools complain about the backup partition table not being found, and they complain even more if you try to go back to the smaller size later on. Generally GPT partition tables are a bad idea for VMs; when I run the host I don't use partition tables, I have a separate block device for each filesystem or swap space. Snapshots aren't needed for recovery, they don't seem to work very well, and if it's possible to attach a snapshot to a VM in place of its original boot volume I haven't figured out how to do it.

Console Connection

If you boot Oracle Linux (a derivative of RHEL that has SE Linux enabled in enforcing mode, yay) then you can go to the "Console connection". The console is a Javascript console which allows you to login on a virtual serial console on device /dev/ttyAMA0. It tells you to type "help" but that isn't accepted; you have a straight Linux console login prompt. If you boot Ubuntu then you don't get a working serial console: it tells you to type "help" for help but doesn't respond to that. It seems that the Oracle Linux kernel 5.4.17-2102.204.4.4.el7uek.aarch64 is compiled with support for /dev/ttyAMA0 (the default ARM serial device) while the kernel 5.11.0-1016-oracle compiled by Oracle for their Ubuntu VMs doesn't have it.

Performance

I haven't done any detailed tests of VM performance. As a quick test I used zstd to compress a 154MB file: on my home workstation (E5-2620 v4 @ 2.10GHz) it took 11.3 seconds of CPU time to compress with zstd -9 and 7.2s to decompress. On the Oracle cloud it took 7.2s and 5.4s. So it seems that for some single core operations the ARM CPU used by the Oracle cloud is about 30% to 50% faster than an E5-2620 v4 (a slightly out of date server processor that uses DDR4 RAM). If you ran all the free resources in a single VM that would make a respectable build server. If you want to contribute to free software development and only have a laptop with 4G of RAM then an ARM build/test server with 24G of RAM and 4 cores would be very useful.

Ubuntu Configuration

The advantage of using EFI is that you can manage the kernel from within the VM. The default Oracle kernel for Ubuntu has a lot of modules included and is compiled with a lot of security options, including SE Linux.

Competitors

https://aws.amazon.com/free AWS offers 750 hours (just over 31 days) per month of free usage of a t2.micro or t3.micro EC2 instance (which means 1GB of RAM). But that only lasts for 12 months and it's still only 1GB of RAM. AWS has some other things that could be useful, like 1 million free Lambda requests per month. If you want to run your personal web site on Lambda you shouldn't hit that limit. They also apparently have some good offers for students.

https://cloud.google.com/free The Google Cloud Platform (GCP) offers $300 of credit.
https://cloud.google.com/free/docs/gcp-free-tier#free-tier-usage-limits GCP also has ongoing free tier usage for some services. Some of them are pretty much unlimited use (50GB of storage for Cloud Source Repositories is a heap of source code). But for VMs you get the equivalent of one e2-micro instance running 24*7. An e2-micro has 1G of RAM. You also only get 30G of storage and 1GB of outbound data. It's clearly not as generous an offer as Oracle's, but Oracle is the underdog so they have to try harder.

https://azure.microsoft.com/en-us/free/ Azure appears to be much the same as AWS: a free Linux VM for a year and then other less popular services free forever (or until they change the contract).

https://www.ibm.com/cloud/free The IBM cloud free tier is the least generous offer: a VM is only free for 30 days. But what they offer for 30 days is pretty decent. If you want to try the IBM cloud and see if it can do what your company needs then this will do well. If you want free hosting for your hobby stuff then it's no good.

Oracle seems like the most generous offer if you want to do stuff, but also one of the least valuable if you want to learn things that will help you at a job interview. For job interviews AWS seems the most useful, with GCP and Azure vying for second place.

10 July 2021

Laura Arjona Reina: Android backups with rsync

A quick note to self to remind me how I do backups of my Android device with rsync (and adb). I have followed this guide: How to use rsync over USB on Android with adb. My personal notes follow. The rsyncd.conf on the device:
address = 127.0.0.1
port = 1873
uid = 0
gid = 0
[root]
path = /
use chroot = false
read only = false
Running
adb shell /data/local/tmp/rsync --daemon --no-detach --config=/sdcard/rsyncd.conf --log-file=/proc/self/fd/2
didn't work; it produced this message: "@ERROR: protocol startup error". So I ended up doing:
adb shell
rsync --daemon --no-detach --config=/sdcard/rsyncd.conf --log-file=/sdcard/rsync.log
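(The rsync URLs below point at localhost:6010 while the daemon listens on port 1873 on the device, so there is also a USB port-forwarding step from the guide in between; if I recall it correctly, it is run on the laptop and looks like this:)
adb forward tcp:6010 tcp:1873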
and opened another tab to run the rsync commands from my laptop:
rsync -av --progress --stats rsync://localhost:6010/root/storage .
rsync -av --progress --stats rsync://localhost:6010/root/data .
Then I saw that rsync was copying the symlinks instead of their contents: /storage/self/primary was a broken link to /mnt/user/0/primary. So I ran the commands again with -LK:
rsync -av --progress --stats -LK rsync://localhost:6010/root/storage .
rsync -av --progress --stats -LK rsync://localhost:6010/root/data .
and now I have a copy of all the files I'm interested in. In addition to this, I run an adb backup of the system:
adb backup -f ./adb_backup_apk_shared_all_system.ad -apk -shared -all -system
and I think that's all I need in case I want to remove stuff from my phone or some disaster happens.

Sean Whitton: Live replacement of provider cloud images with upstream Debian

Tonight I'm provisioning a new virtual machine at Hetzner and I wanted to share how Consfigurator is helping with that. Hetzner have a Debian buster image you can start with, as you'd expect, but it comes with things like cloud-init, preconfiguration to use Hetzner's apt mirror (which doesn't serve source packages!), and perhaps other things I haven't discovered. It's a fine place to begin, but I want all the configuration for this server to be explicit in my Consfigurator consfig, so it is good to start with pristine upstream Debian. I could boot one of Hetzner's installation ISOs but that's slow and manual. Consfigurator can replace the OS in the VM's root filesystem and reboot for me, and then we're ready to go. Here's the configuration:
(defhost foo.silentflame.com (:deploy ((:ssh :user "root") :sbcl))
  (os:debian-stable "buster" :amd64)
  ;; Hetzner's Debian 10 image comes with a three-partition layout and boots
  ;; with traditional BIOS.
  (disk:has-volumes
   (physical-disk
    :device-file "/dev/sda" :boots-with '(grub:grub :target "i386-pc")))
  (on-change (installer:cleanly-installed-once
              nil
              ;; This is a specification of the OS Hetzner's image has, so
              ;; Consfigurator knows how to install SBCL and debootstrap(8).
              ;; In this case it's the same Debian release as the replacement.
              '(os:debian-stable "buster" :amd64))
    ;; Clear out the old OS's EFI system partition contents, in case we can
    ;; switch to booting with EFI at some point (if we wanted we could specify
    ;; an additional x86_64-efi target above, and grub-install would get run
    ;; to repopulate /boot/efi, but I don't think Hetzner can boot from it yet).
    (file:directory-does-not-exist "/boot/efi/EFI")
    (apt:installed "linux-image-amd64")
    (installer:bootloaders-installed)
    (fstab:entries-for-volumes
     (disk:volumes
       (mounted-ext4-filesystem :mount-point "/")
       (partition
        (mounted-fat32-filesystem
         :mount-options '("umask=0077") :mount-point "/boot/efi"))))
    (file:lacks-lines "/etc/fstab" "# UNCONFIGURED FSTAB FOR BASE SYSTEM")
    (file:is-copy-of "/etc/resolv.conf" "/old-os/etc/resolv.conf")
    (mount:unmounted-below-and-removed "/old-os"))
  (apt:mirror "http://ftp.de.debian.org/debian")
  (apt:no-pdiffs)
  (apt:standard-sources.list)
  (sshd:installed)
  (as "root" (ssh:authorized-keys +spwsshkey+))
  (sshd:no-passwords)
  (timezone:configured "Etc/UTC")
  (swap:has-swap-file "2G")
  (network:clean-/etc/network/interfaces)
  (network:static "enp1s0" "xxx.xxx.xxx.xxx" "xxx.xxx.1.1" "255.255.255.255"))
and to use it you evaluate this at the REPL:
CONSFIG> (deploy ((:ssh :user "root" :hop "xxx.xxx.xxx.xxx") :sbcl) foo.silentflame.com)
Here the :HOP parameter specifies the IP address of the new machine, as DNS hasn't been updated yet. Consfigurator installs SBCL and debootstrap(8), prepares a minimal system, replaces the contents of /, gets to work applying the other properties, and then reboots. This gets us a properly populated fstab:
UUID=...            /           ext4    relatime    0   1
PARTUUID=...        /boot/efi   vfat    umask=0077  0   2
/var/lib/swapfile   swap        swap    defaults    0   0
(slightly doctored for more readable alignment) There's ordering logic so that the swapfile will end up after whatever filesystem contains it; a UUID is used for ext4 filesystems, but for fat32 filesystems, to be safe, a PARTUUID is used. The application of (INSTALLER:BOOTLOADERS-INSTALLED) handles calling both update-grub(8) and grub-install(8), relying on the metadata specified about /dev/sda. Next time we execute Consfigurator against the machine, it'll ignore all the property applications attached to the application of (INSTALLER:CLEANLY-INSTALLED-ONCE) with ON-CHANGE, and just apply everything following that block. There are a few things I don't have good solutions for. When you boot Hetzner's image the primary network interface is eth0, but then for a freshly debootstrapped Debian you get enp1s0, and I haven't got a good way of knowing what it'll be (if you know it'll have the same name, you can use (NETWORK:PRESERVE-STATIC-ONCE) to create a file in /etc/network/interfaces.d based on the current default route and corresponding interface). Another tricky thing is SSH host keys. It's easy to use Consfigurator to add host keys to your laptop's ~/.ssh/known_hosts, but in this case the host key changes back and forth between whatever the Hetzner image has and the newly generated key you get afterwards. One option might be to copy the old host keys out of /old-os before it gets deleted, like how /etc/resolv.conf is copied. This work is based on Propellor's equivalent functionality. I think my approach to handling /etc/fstab and bootloader installation is an improvement on what Joey does.

29 June 2021

Ritesh Raj Sarraf: Plant Territorial Behavior

This blog post is about my observations of some of the plants in my home garden. While still a n00b on the subject, these notes are my observations and experiences over days, weeks and months. Thankfully, with the capability to take frequent pictures, it has been easy to do an assessment and generate a report of some of these amazing behaviors of plants, in an easy timeline order; all thanks to the embedded EXIF data. This has very helpfully allowed me to record my otherwise minor observations in great detail, and make some sense out of them by correlating the data over time. It is an emotional experience. You see, plants are amazing. When I sow a sapling, water it, feed it, watch it grow, prune it, medicate it, and what not, I build up affection towards it. Though, at the same time, to me it is a strict relationship, not too attached; as in, it doesn't hurt to uproot a plant if there is a good reason. But still, I find some sort of association to it. With plants around, it feels I have a lot of lives around me. All prospering, communicating, sharing. And communicate they do. What is needed is just the right language to observe and absorb their signals and decipher what they are trying to say.

Devastation
How in this world, when you are caring for your plants, can it transform:

From This
Healthy Mulberry Plant
Healthy Bael Plant

To This
Dead Mulberry Plant
Very Sick Bael Plant
With emotions involved, this can be an unpleasant experience. Bael is a dear plant to me. The plant as a whole has religious value (Shiva). As well, its fruits have lots of health benefits, especially for the intestines, and its leaves have a lot of medicinal properties. When I planted the Bael, there were a lot of emotions that went along. On the other hand, the Mulberry is something I put in with a lot of enthusiasm. Mulberries are now rare to find, especially in urban locations. For one, they have a very short shelf life; but more than that, the way lifestyles are heading, I was always worried if my children would ever have a day to see and taste these fruits. The mulberry that I planted yielded twice: once very soon after I had planted it, and a second time before it died. In fact, it died during its second yield phase. It was quite saddening to see that happen. It made me wonder why it happened. I had been caring for the plants fairly well, watering them timely and feeding them the right amount of nutrients. They were getting a good amount of sun. But still their health was deteriorating. And then the demise of the Mulberry. Many thoughts hit my mind. I consulted the claimed experts in the domain, the maali, the gardener. I got a very vague answer: there must be termites in the soil. It didn't make much sense to me. I mean, if there are termites they'd hit on day one. They won't sleep for months and then wake up one fine day and start attacking the roots of particular plants, and not all of them. I wasn't convinced by the termite theory; but still, giving the expert a free hand, I went with his word. When my mulberry was dead, I dug up its roots. Looking for proof, to see if there were any termites, I uprooted it. But I couldn't find any trace of termites. And the plant next to it was perfectly healthy and blossoming. So I was convinced that it wasn't the termites but something else. But what else? I still didn't have an answer to that.

Thinking
The Corona pandemic had set in, and there was a lot to worry about, or nothing at all, if you changed the perspective. With plants around in my home, and our close engagement with them, and the helplessness that I felt after seeking help from the experts, it was time again to build up some knowledge on the subject. But how? How do you go about a subject you have not much clue about? A subject which has always been around in the surroundings, but to which I have very seldom dedicated focused thought? To be honest, the initial thought of diving into the subject left me clueless. I had no idea where to begin. But, as has been my past history, I chose to take it as a curiosity. I gathered some books and skimmed through a couple of pages. The majority of the books I got hold of were about DIYs and "How to do Home Gardening" types. It was a decent introduction for a novice, but my topic of curiosity was different. Thankfully, with the Internet, and YouTube in particular, a lot of good stuff is available as documentary videos. While going through some, I came across a video which mentioned carnivore plants. Like, for example, this one.
Carnivore Plant
This got me thinking that there was a possibility of something similar having decided the fate of my Mulberry plant. But who did it? And how to dive further into this suspicion? And most of all, was that possibility actually the reality, or was I just shooting in the dark?

Beginning
To put things in perspective, here's how it started. When we moved into our home, the gardener put in a couple of stock plants as part of the property handover. Now, I don't exactly recollect the name of the plants that came in stock, neither in English nor in Hindi; but at my neighbor's place, the plant is still there. Here are some pictures of this beauty. But don't just go by the looks, as looks can be deceiving.
Dominating Plant
We hadn't put any serious thought into the plants we were offered by the gardener. After all, we had never thought of any mishap either.

Plants we planted
Apart from what was offered by the builder/gardener as part of the property handover, over the next 6 months after we moved in, I planted 3 tree-type plants.
  1. Mulberry
  2. Bael
  3. Rudraksha
The Mulberry, as I have described so far, died a tragic death. Bael, on the other hand, fought hard. But very little did we know that the plant was struggling in the fight. Our impression was that we must have been given a bad breed of the plant. Or maybe the termite theory had some truth. For the Rudraksha plant, the growth was slow. This was the very first time I had seen a Rudraksha plant, so I had no clue of what its growth rate could be, or what to expect out of it. I wasn't sure if the local climate suited the plant. A quick search showed no objections to the plant in the local climate, but that was it. So my theory has been to put in the plant and observe. Here's what my Rudraksha plant looked like during the initial days/weeks of its settlement:
Rudraksha Plant

The Hint
Days passed and so on. Not much had progressed in gathering information. The plant's health was as usual, deteriorating at a slow pace. One day, thinking of the documentaries I had been watching, some facts about plant behavior hit my mind:
  • Plants can be Carnivores.
  • Plants can be Aggressive.
  • Plants can be Invaders.
  • Plants can be Territorial.
There are many plants whose aggression can be witnessed with bare human eyes. Like creepers. Some of them are good at spreading tentacles, grabbing onto other plants' stems and branches and spreading above them. This was my hint from the documentaries. That's one of the many ways plants assert their dominance. That is what hit my mind: if plants are aggressive above ground, they should have similar behavior underneath the soil. I mean, what we see as humans is just a part of the actual plant. In most plants, more than half of the actual plant is usually underneath the soil. So there's a high chance of getting more information out if you dig up the soil and look at the roots.

The Digging
As I mentioned earlier, I do establish bindings, emotions and attachments. But not much usually comes in the way of curiosity. To dig further into the theory that the problem was elsewhere, within the plants' ecosystem, we needed to pick another subject: a plant. And the plant we chose was the one which was planted in the initial offering to us, when we moved into our home. It was the same plant breed which was neighboring all our newly planted trees: Rudraksha, Mulberry and Bael. If you look closely at the pictures of these plants above, you'll notice the stem of another plant, the territorial dominator, close by to these 3 plants. That's because the gardener put in a good number of them to get his action item complete. So we chose to dig up and uproot one of those plants to start with. Now, while they may look gentle on the outside, with nice tiny red flowers, these plants were giants underneath. Their roots were huge. It took quite some sweat to remove them single-handedly.
Dominating Plant Uprooted

Today is brighter

Bael
I'll let the pictures do the initial talking today.
Healthy Bael
The above are pictures of the same Bael plant, which had struggled to live for almost 14 months. Back then, this plant was starved of its resources. It was dying a slow death out of starvation. After we uprooted the other dominant species, the Bael has recovered and regained its charm. In the pictures of the Bael plant above, you can clearly make out the difference in its stem. The dark colored part is from its months of struggle, while the bright green is from now, when it is well nourished and has regained its health.

Mulberry
As for the Mulberry, I couldn't save it. But I later managed to get another one. It turns out I didn't take good, full-length pictures of the new mulberry when I planted it. The only picture I have is this:
Second Mulberry Plant
I recollect that when I brought it home, it was around 1 - 1.5 feet in length. This is where I have it today: majestically standing, 12 feet and counting.
Second Mulberry Plant 7 feet tall

Rudraksha
Then:
Rudraksha Plant
And now (I feel quite happy about it):
Rudraksha Plant
All these plants are on the very same soil with the very same caretaker. What has changed is my experience and learning.

Plant Co-Existence
Plant co-existence is a difficult topic. My knowledge of plants is very limited in general, and co-existence is something tricky, unexplored, and at times invisible (when underneath the soil). So it is a difficult topic. So far, what I've learnt comes purely from observations, experiences and hints from the documentaries. There surely are many, many plants that co-exist very well. A good example is my Bael plant itself, which is healthily co-sharing its space with 2 other Croton plants. Same goes for the Rudraksha, which has close-by neighbors in an Adenium and an Allamanda. The plant world is mesmerizing: how they behave, how they communicate, and many, many more signs. There's so much to observe, learn, explore and document. I hope to have more such observations and experiences to share.

19 June 2021

Chris Lamb: *Raiders of the Lost Ark*: 40 Years On

"Again, we see there is nothing you can possess which I cannot take away."
The cinema was a rare and expensive treat in my youth, so I first came across Raiders of the Lost Ark by recording it from television onto a poor quality VHS. I only mention this as it meant I watched a slightly different film to the one intended, as my copy somehow missed off the first 10 minutes. For those not as intimately familiar with the film as me, this is just in time to see Belloq demand Dr. Jones hand over the Peruvian head (see above), just in time to learn that Indy loathes snakes, and just in time to see the inadvertent reproduction of two Europeans squabbling over the spoils of a foreign land. What this truncation did to my interpretation of the film (released forty years ago today, on June 19th 1981) is interesting to explore. Without Jones' physical and moral traits being demonstrated on-screen (as well as missing the weighing of the gold head and the rollercoaster boulder scene), it actually made the idea of 'Indiana Jones' even more of a mythical archetype. The film wisely withholds Jones' backstory, but my director's cut deprived him of even more, and counterintuitively imbued him with even more of a legendary hue, as the elision made his qualities an assumption beyond question. Indiana Jones, if you can excuse the cliché, needed no introduction at all. Good artists copy, great artists steal. And oh boy, does Raiders steal. I've watched this film about twenty times over the past two decades and it's now firmly entered into my personal canon. But watching it on its fortieth anniversary was different, not least because I could situate it in a broader cinematic context. For example, I now see the Gestapo officer in Major Strasser from Casablanca (1942), in fact just as I can with many of Raiders' other orientalist tendencies: not only in its breezy depictions of backwards sand people, but also of North Africa as an entrepôt and playground for a certain kind of Western gangster. The opening as well, set in an equally reductionist pseudo-Peru, now feels like Werner Herzog's Aguirre, the Wrath of God (1972) but without, of course, any self-conscious colonial critique.
The imagery of the ark appears to be borrowed from James Tissot's The Ark Passes Over the Jordan, part of the fin de siècle fascination with the occult and (ironically enough, given the background of Raiders' director), a French Catholic revival.
I can now also appreciate some of the finer edges that make this film just so much damn fun to watch. For instance, the comic book conceit that Jones and Belloq are a 'shadowy reflection' of one another and that they need 'only a nudge' to make one like the other. As is the idea that Belloq seems to be actually enjoying being evil. I also spotted Jones rejecting the martini on the plane. This feels less like a comment on the corrupting effect of alcohol (he drinks rather heavily elsewhere in the film), but rather a subtle distancing from James Bond. This feels especially important given that the action-packed cold open is, let us be honest for a second, ripped straight from the 007 franchise. John Williams' soundtracks are always worth mentioning. The corny Raiders March does almost nothing for me, but the highly-underrated 'Ark theme' certainly does. I delight in its allusions to Gregorian chant, the diabolus in musica and the Hungarian minor scale, fusing the Christian doctrine of the Holy Trinity (the stacked thirds, get it?) and the ars antiqua of the Middle Ages with an 'exotic' twist that the Russian Five associated with central European Judaism.
The best use of the ark leitmotif is, of course, when it is opened. Here, Indy and Marion are saved by not opening their eyes whilst the 'High Priest' Belloq and the rest of the Nazis are all melted away. I'm no Biblical scholar, but I'm almost certain they were alluding to Leviticus 16:2 here:
The Lord said to Moses: Tell your brother Aaron that he is not to come whenever he chooses into the Most Holy Place behind the curtain in front of the atonement cover on the ark, or else he will die, for I will appear in the cloud above the mercy seat.
But would it be too much of a stretch to see the myth of Orpheus and Eurydice here too? Orpheus's wife would only be saved from the underworld if he did not turn around until he came to his own house. But he turned round to look at his wife, and she instantly slipped back into the depths:
For he who overcome should turn back his gaze
Towards the Tartarean cave,
Whatever excellence he takes with him
He loses when he looks on those below.
Perhaps not, given that Marion and the ark are not lost in quite the same way. But whilst touching on gender, it was interesting to update my view of archaeologist René Belloq. To countermand his slight queer coding (a trope of Disney villains such as Scar, Jafar, Cruella, etc.), there is a rather clumsy subplot involving Belloq repeatedly (and half-heartedly) failing to seduce Marion. This disavows any idea that Belloq isn't firmly heterosexual, essential for the film's mainstream audience, but it is especially important in Raiders because, if we recall the relationship between Belloq and Jones: 'it would take only a nudge to make you like me'. (This would definitely put a new slant on 'Top men'.)
However, my favourite moment is where the Nazis place the ark in a crate in order to transport it to the deserted island. En route, the swastikas on the side of the crate spontaneously burn away, and a disturbing noise is heard in the background. This short scene has always fascinated me, partly because it's the first time in the film that the power of the ark is demonstrated first-hand, but also because it gives the object an other-worldly nature that, to the best of my knowledge, has no parallel in the rest of cinema. Still, I had always assumed that the ark disfigured the swastikas because of their association with the Nazis, interpreting the act as God's condemnation of the Third Reich. But now I catch myself wondering whether the ark would have disfigured any iconography as a matter of principle or whether their treatment was specific to the swastika. We later get a partial answer to this question, as the 'US Army' inscriptions in the Citizen Kane warehouse remain untouched. Far from being an insignificant concern, the filmmakers appear to have wandered into a highly-contested theological debate. As in, if the burning of the swastika is God's moral judgement of the Nazi regime, then God is clearly both willing and able to intervene in human affairs. So why did he not, to put it mildly, prevent Auschwitz? From this perspective, Spielberg appears to be limbering up for some of the academic critiques surrounding Holocaust representations that would follow Schindler's List (1993). Given my nostalgic and somewhat ironic attachment to Raiders, it will always be difficult for me to objectively appraise the film. Even so, it feels like it is underpinned by an earnest attempt to entertain the viewer, largely absent in the affected cynicism of contemporary cinema. And when considered in the totality of Hollywood's output, its tonal and technical flaws are not actually that bad; or at least, Marion's muddled characterisation and its breezy chauvinism (for example) clearly have far worse examples elsewhere. Perhaps the most remarkable thing about the film in 2021 is that it hasn't changed that much at all. It spawned one good sequel (The Last Crusade), one bad one (The Temple of Doom), and one hardly worth mentioning at all, yet these adventures haven't affected the original Raiders in any meaningful way. In fact, if anything has affected the original text it is, once again, George Lucas himself, as knowing the impending backlash around the Star Wars prequels adds an inadvertent paratext to all his earlier works. Yet in a 1978 discussion prior to the creation of Raiders, you can get a keen sense of how Lucas' childlike enthusiasm will always result in something either extremely good or something extremely bad; somehow no middle ground is quite possible. Yes, it's easy to rubbish his initial ideas ('We'll call him Indiana Smith!'), but hasn't Lucas actually captured the essence of a heroic 'Americana' here, and isn't the final result simply a difference of degree, not kind?

Chris Lamb: Raiders of the Lost Ark: 40 Years On

"Again, we see there is nothing you can possess which I cannot take away."
The cinema was a rare and expensive treat in my youth, so I first came across Raiders of the Lost Ark by recording it from television onto a poor quality VHS. I only mention this as it meant I watched a slightly different film to the one intended, as my copy somehow missed off the first 10 minutes. For those not as intimately familiar with the film as me, this is just in time to see a Belloq demand Dr. Jones hand over the Peruvian head (see above), just in time to learn that Indy loathes snakes, and just in time to see the inadvertent reproduction of two Europeans squabbling over the spoils of a foreign land. What this truncation did to my interpretation of the film (released thirty years ago today on June 19th 1981) is interesting to explore. Without Jones' physical and moral traits being demonstrated on-screen (as well as missing the weighing the gold head and the rollercoaster boulder scene), it actually made the idea of 'Indiana Jones' even more of a mythical archetype. The film wisely withholds Jones' backstory, but my directors cut deprived him of even more, and counterintuitively imbued him with even more of a legendary hue as the elision made his qualities an assumption beyond question. Indiana Jones, if you can excuse the clich , needed no introduction at all. Good artists copy, great artists steal. And oh boy, does Raiders steal. I've watched this film about twenty times over the past two decades and it's now firmly entered into my personal canon. But watching it on its thirtieth anniversary was different not least because I could situate it in a broader cinematic context. For example, I now see the Gestapo officer in Major Strasser from Casablanca (1942), in fact just as I can with many of Raiders' other orientalist tendencies: not only in its breezy depictions of backwards sand people, but also of North Africa as an entrep t and playground for a certain kind of Western gangster. The opening as well, set in an equally reductionist pseudo-Peru, now feels like Werner Herzog's Aguirre, the Wrath of God (1972) but without, of course, any self-conscious colonial critique.
The imagery of the ark appears to be borrowed from James Tissot's The Ark Passes Over the Jordan, part of the fin de siècle fascination with the occult and (ironically enough, given the background of Raiders' director) a French Catholic revival.
I can now also appreciate some of the finer edges that make this film just so much damn fun to watch. For instance, the comic book conceit that Jones and Belloq are a 'shadowy reflection' of one another and that they need 'only a nudge' to make one like the other. As is the idea that Belloq seems to be actually enjoying being evil. I also spotted Jones rejecting the martini on the plane. This feels less like a comment on the corrupting effect of alcohol (he drinks rather heavily elsewhere in the film) and more like a subtle distancing from James Bond. This feels especially important given that the action-packed cold open is, let us be honest for a second, ripped straight from the 007 franchise. John Williams' soundtracks are always worth mentioning. The corny Raiders March does almost nothing for me, but the highly-underrated 'Ark theme' certainly does. I delight in its allusions to Gregorian chant, the diabolus in musica and the Hungarian minor scale, fusing the Christian doctrine of the Holy Trinity (the stacked thirds, get it?) and the ars antiqua of the Middle Ages with an 'exotic' twist that the Russian Five associated with central European Judaism.
The best use of the ark leitmotif is, of course, when it is opened. Here, Indy and Marion are saved by not opening their eyes whilst the 'High Priest' Belloq and the rest of the Nazis are all melted away. I'm no Biblical scholar, but I'm almost certain they were alluding to Leviticus 16:2 here:
The Lord said to Moses: Tell your brother Aaron that he is not to come whenever he chooses into the Most Holy Place behind the curtain in front of the atonement cover on the ark, or else he will die, for I will appear in the cloud above the mercy seat.
But would it be too much of a stretch to see the myth of Orpheus and Eurydice here too? Orpheus's wife would only be saved from the underworld if he did not turn around until he came to his own house. But he turned round to look at his wife, and she instantly slipped back into the depths:
For he who overcome should turn back his gaze
Towards the Tartarean cave,
Whatever excellence he takes with him
He loses when he looks on those below.
Perhaps not, given that Marion and the ark are not lost in quite the same way. But whilst touching on gender, it was interesting to update my view of archaeologist René Belloq. To countermand his slight queer coding (a trope of Disney villains such as Scar, Jafar, Cruella, etc.), there is a rather clumsy subplot involving Belloq repeatedly (and half-heartedly) failing to seduce Marion. This disavows any idea that Belloq isn't firmly heterosexual, essential for the film's mainstream audience, but it is especially important in Raiders because, if we recall the relationship between Belloq and Jones: 'it would take only a nudge to make you like me'. (This would definitely put a new slant on 'Top men'.)
However, my favourite moment is where the Nazis place the ark in a crate in order to transport it to the deserted island. En route, the swastikas on the side of the crate spontaneously burn away, and a disturbing noise is heard in the background. This short scene has always fascinated me, partly because it's the first time in the film that the power of the ark is demonstrated first-hand, but also because it gives the object an other-worldly nature that, to the best of my knowledge, has no parallel in the rest of cinema. Still, I had always assumed that the ark disfigured the swastikas because of their association with the Nazis, interpreting the act as God's condemnation of the Third Reich. But now I catch myself wondering whether the ark would have disfigured any iconography as a matter of principle or whether the treatment was specific to the swastika. We later get a partial answer to this question, as the 'US Army' inscriptions in the Citizen Kane warehouse remain untouched. Far from being an insignificant concern, the filmmakers appear to have wandered into a highly-contested theological debate. As in, if the burning of the swastika is God's moral judgement of the Nazi regime, then God is clearly both willing and able to intervene in human affairs. So why did he not, to put it mildly, prevent Auschwitz? From this perspective, Spielberg appears to be limbering up for some of the academic critiques surrounding Holocaust representations that will follow Schindler's List (1993). Given my nostalgic and somewhat ironic attachment to Raiders, it will always be difficult for me to objectively appraise the film. Even so, it feels like it is underpinned by an earnest attempt to entertain the viewer, largely absent in the affected cynicism of contemporary cinema. And when considered in the totality of Hollywood's output, its tonal and technical flaws are not actually that bad; or at least, Marion's muddled characterisation and its breezy chauvinism (for example) clearly have far worse counterparts elsewhere. Perhaps the most remarkable thing about the film in 2021 is that it hasn't changed that much at all. It spawned one good sequel (The Last Crusade), one bad one (The Temple of Doom), and one hardly worth mentioning at all, yet these adventures haven't affected the original Raiders in any meaningful way. In fact, if anything has affected the original text it is, once again, George Lucas himself, as knowing the impending backlash around the Star Wars prequels adds an inadvertent paratext to all his earlier works. Yet in a 1978 discussion prior to the creation of Raiders, you can get a keen sense of how Lucas' childlike enthusiasm will always result in something either extremely good or something extremely bad; somehow no middle ground is quite possible. Yes, it's easy to rubbish his initial ideas ('We'll call him Indiana Smith!'), but hasn't Lucas actually captured the essence of a heroic 'Americana' here, and isn't the final result simply a difference of degree, not kind?

14 June 2021

François Marier: Self-hosting an Ikiwiki blog

8.5 years ago, I moved my blog to Ikiwiki and Branchable. It's now time for me to take the next step and host my blog on my own server. This is how I migrated from Branchable to my own Apache server.

Installing Ikiwiki dependencies Here are all of the extra Debian packages I had to install on my server:
apt install ikiwiki ikiwiki-hosting-common gcc libauthen-passphrase-perl libcgi-formbuilder-perl libcrypt-ssleay-perl libjson-xs-perl librpc-xml-perl python-docutils libxml-feed-perl libsearch-xapian-perl libmailtools-perl highlight-common libsearch-xapian-perl xapian-omega
apt install --no-install-recommends ikiwiki-hosting-web libgravatar-url-perl libmail-sendmail-perl libcgi-session-perl
apt purge libnet-openid-consumer-perl
Then I enabled the CGI module in Apache:
a2enmod cgi
and un-commented the following in /etc/apache2/mods-available/mime.conf:
AddHandler cgi-script .cgi

Creating a separate user account Since Ikiwiki needs to regenerate my blog whenever a new article is pushed to the git repo or a comment is accepted, I created a restricted user account for it:
adduser blog
adduser blog sshuser
chsh -s /usr/bin/git-shell blog

git setup Thanks to Branchable storing blogs in git repositories, I was able to import my blog using a simple git clone in /home/blog (the srcdir):
git clone --bare git://feedingthecloud.branchable.com/ source.git
Note that the name of the directory (source.git) is important for the ikiwikihosting plugin to work. Then I pulled the .setup file out of the setup branch in that repo and put it in /home/blog/.ikiwiki/FeedingTheCloud.setup. After that, I deleted the setup branch and the origin remote from that clone:
git branch -d setup
git remote rm origin
Following the recommended git configuration, I created a working directory (the repository) for the blog user to modify the blog as needed:
cd /home/blog/
git clone /home/blog/source.git FeedingTheCloud
I added my own ssh public key to /home/blog/.ssh/authorized_keys so that I could push to the srcdir from my laptop. Finally, I generated a new ssh key without a passphrase:
ssh-keygen -t ed25519
and added it as a deploy key to the GitHub repo which acts as a read-only mirror of my blog.
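For reference, here is roughly what pushing to that mirror looks like by hand, run as the blog user (e.g. via su -s /bin/sh blog); the GitHub URL below is a placeholder, and in practice the gitpush plugin configured below takes care of this automatically:
cd /home/blog/source.git
# "github" and the repository URL are illustrative names, not the real mirror
git remote add github git@github.com:example/blog-mirror.git
git push --mirror github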

Ikiwiki config While I started with the Branchable setup file, I changed the following things in it:
adminemail: webmaster@fmarier.org
srcdir: /home/blog/FeedingTheCloud
destdir: /var/www/blog
url: https://feeding.cloud.geek.nz
cgiurl: https://feeding.cloud.geek.nz/blog.cgi
cgi_wrapper: /var/www/blog/blog.cgi
cgi_wrappermode: 675
add_plugins:
- goodstuff
- lockedit
- comments
- blogspam
- sidebar
- attachment
- favicon
- format
- highlight
- search
- theme
- moderatedcomments
- flattr
- calendar
- headinganchors
- notifyemail
- anonok
- autoindex
- date
- relativedate
- htmlbalance
- pagestats
- sortnaturally
- ikiwikihosting
- gitpush
- emailauth
disable_plugins:
- brokenlinks
- fortune
- more
- openid
- orphans
- passwordauth
- progress
- recentchanges
- repolist
- toggle
- txt
sslcookie: 1
cookiejar:
  file: /home/blog/.ikiwiki/cookies
useragent: ikiwiki
git_wrapper: /home/blog/source.git/hooks/post-update
urlalias:
- http://feeds.cloud.geek.nz/
- http://www.feeding.cloud.geek.nz/
owner: francois@fmarier.org
hostname: feeding.cloud.geek.nz
emailauth_sender: login@fmarier.org
allowed_attachments: admin()
Then I created the destdir:
mkdir /var/www/blog
chown blog:blog /var/www/blog
and generated the initial copy of the blog as the blog user:
ikiwiki --setup .ikiwiki/FeedingTheCloud.setup --wrappers --rebuild
One thing that failed to generate properly was the tag cloud (from the pagestats plugin). I have not been able to figure out why it fails to generate any output when run this way, but if I push to the repo and let the git hook handle the rebuilding of the wiki, the tag cloud is generated correctly. Consequently, fixing this is not high on my list of priorities, but if you happen to know what the problem is, please reach out.

Apache config Here's the Apache config I put in /etc/apache2/sites-available/blog.conf:
<VirtualHost *:443>
    ServerName feeding.cloud.geek.nz
    SSLEngine On
    SSLCertificateFile /etc/letsencrypt/live/feeding.cloud.geek.nz/fullchain.pem
    SSLCertificateKeyFile /etc/letsencrypt/live/feeding.cloud.geek.nz/privkey.pem
    Header set Strict-Transport-Security: "max-age=63072000; includeSubDomains; preload"
    Include /etc/fmarier-org/blog-common
</VirtualHost>
<VirtualHost *:443>
    ServerName www.feeding.cloud.geek.nz
    ServerAlias feeds.cloud.geek.nz
    SSLEngine On
    SSLCertificateFile /etc/letsencrypt/live/feeding.cloud.geek.nz/fullchain.pem
    SSLCertificateKeyFile /etc/letsencrypt/live/feeding.cloud.geek.nz/privkey.pem
    Redirect permanent / https://feeding.cloud.geek.nz/
</VirtualHost>
<VirtualHost *:80>
    ServerName feeding.cloud.geek.nz
    ServerAlias www.feeding.cloud.geek.nz
    ServerAlias feeds.cloud.geek.nz
    Redirect permanent / https://feeding.cloud.geek.nz/
</VirtualHost>
and the common config I put in /etc/fmarier-org/blog-common:
ServerAdmin webmaster@fmarier.org
DocumentRoot /var/www/blog
LogLevel core:info
CustomLog ${APACHE_LOG_DIR}/blog-access.log combined
ErrorLog ${APACHE_LOG_DIR}/blog-error.log
AddType application/rss+xml .rss
<Location /blog.cgi>
        Options +ExecCGI
</Location>
before enabling all of this using:
a2ensite blog
apache2ctl configtest
systemctl restart apache2.service
The feeds.cloud.geek.nz domain used to point to Feedburner, so I need to maintain it in order to avoid breaking RSS feeds for folks who added my blog to their reader a long time ago.

Server-side improvements Since I'm now in control of the server configuration, I was able to make several improvements to how my blog is served. First of all, I enabled the HTTP/2 and Brotli modules:
a2enmod http2
a2enmod brotli
and enabled Brotli compression by putting the following in /etc/apache2/conf-available/compression.conf:
<IfModule mod_brotli.c>
  <IfDefine !TRANSFER_COMPRESSION>
    Define TRANSFER_COMPRESSION BROTLI_COMPRESS
  </IfDefine>
</IfModule>
<IfModule mod_deflate.c>
  <IfDefine !TRANSFER_COMPRESSION>
    Define TRANSFER_COMPRESSION DEFLATE
  </IfDefine>
</IfModule>
<IfDefine TRANSFER_COMPRESSION>
  <IfModule mod_filter.c>
    AddOutputFilterByType ${TRANSFER_COMPRESSION} text/html text/plain text/xml text/css text/javascript
    AddOutputFilterByType ${TRANSFER_COMPRESSION} application/x-javascript application/javascript application/ecmascript
    AddOutputFilterByType ${TRANSFER_COMPRESSION} application/rss+xml
    AddOutputFilterByType ${TRANSFER_COMPRESSION} application/xml
  </IfModule>
</IfDefine>
and replacing /etc/apache2/mods-available/deflate.conf with the following:
# Moved to /etc/apache2/conf-available/compression.conf as per https://bugs.debian.org/972632
before enabling this new config:
a2enconf compression
Next, I made my blog available as a Tor onion service by putting the following in /etc/apache2/sites-available/blog.conf:
<VirtualHost *:443>
    ServerName feeding.cloud.geek.nz
    ServerAlias xfdug5vmfi6oh42fp6ahhrqdjcf7ysqat6fkp5dhvde4d7vlkqixrsad.onion
    Header set Onion-Location "http://xfdug5vmfi6oh42fp6ahhrqdjcf7ysqat6fkp5dhvde4d7vlkqixrsad.onion%{REQUEST_URI}s"
    Header set alt-svc 'h2="xfdug5vmfi6oh42fp6ahhrqdjcf7ysqat6fkp5dhvde4d7vlkqixrsad.onion:443"; ma=315360000; persist=1'
    ... 
<VirtualHost *:80>
    ServerName xfdug5vmfi6oh42fp6ahhrqdjcf7ysqat6fkp5dhvde4d7vlkqixrsad.onion
    Include /etc/fmarier-org/blog-common
</VirtualHost>
Then I followed the Mozilla Observatory recommendations and enabled the following security headers:
Header set Content-Security-Policy: "default-src 'none'; report-uri https://fmarier.report-uri.com/r/d/csp/enforce ; style-src 'self' 'unsafe-inline' ; img-src 'self' https://seccdn.libravatar.org/ ; script-src https://feeding.cloud.geek.nz/ikiwiki/ https://xfdug5vmfi6oh42fp6ahhrqdjcf7ysqat6fkp5dhvde4d7vlkqixrsad.onion/ikiwiki/ http://xfdug5vmfi6oh42fp6ahhrqdjcf7ysqat6fkp5dhvde4d7vlkqixrsad.onion/ikiwiki/ 'unsafe-inline' 'sha256-pA8FbKo4pYLWPDH2YMPqcPMBzbjH/RYj0HlNAHYoYT0=' 'sha256-Kn5E/7OLXYSq+EKMhEBGJMyU6bREA9E8Av9FjqbpGKk=' 'sha256-/BTNlczeBxXOoPvhwvE1ftmxwg9z+WIBJtpk3qe7Pqo=' ; base-uri 'self'; form-action 'self' ; frame-ancestors 'self'"
Header set X-Frame-Options: "SAMEORIGIN"
Header set Referrer-Policy: "same-origin"
Header set X-Content-Type-Options: "nosniff"
Note that the Mozilla Observatory is mistakenly identifying HTTP onion services as insecure, so you can ignore that failure. I also used the Mozilla TLS config generator to improve the TLS config for my server. Then I added security.txt and gpc.json to the root of my git repo and then added the following aliases to put these files in the right place:
Alias /.well-known/gpc.json /var/www/blog/gpc.json
Alias /.well-known/security.txt /var/www/blog/security.txt
I also followed these instructions to create a sitemap for my blog with the following alias:
Alias /sitemap.xml /var/www/blog/sitemap/index.rss
Finally, I simplified a few error pages to save bandwidth:
ErrorDocument 301 " "
ErrorDocument 302 " "
ErrorDocument 404 "Not Found"

Monitoring 404s Another advantage of running my own web server is that I can monitor the 404s easily using logcheck by putting the following in /etc/logcheck/logcheck.logfiles:
/var/log/apache2/blog-error.log 
Based on that, I added a few redirects to point bots and users to the location of my RSS feed:
Redirect permanent /atom /index.atom
Redirect permanent /comments.rss /comments/index.rss
Redirect permanent /comments.atom /comments/index.atom
Redirect permanent /FeedingTheCloud /index.rss
Redirect permanent /feed /index.rss
Redirect permanent /feed/ /index.rss
Redirect permanent /feeds/posts/default /index.rss
Redirect permanent /rss /index.rss
Redirect permanent /rss/ /index.rss
and to tell them to stop trying to fetch obsolete resources:
Redirect gone /~ff/FeedingTheCloud
Redirect gone /gittip_button.png
Redirect gone /ikiwiki.cgi
I also used these 404s to discover a few old Feedburner URLs that I could redirect to the right place using archive.org:
Redirect permanent /feeds/1572545745827565861/comments/default /posts/watch-all-of-your-logs-using-monkeytail/comments.atom
Redirect permanent /feeds/1582328597404141220/comments/default /posts/news-feeds-rssatom-for-mythtvorg-and/comments.atom
...
Redirect permanent /feeds/8490436852808833136/comments/default /posts/recovering-lost-git-commits/comments.atom
Redirect permanent /feeds/963415010433858516/comments/default /posts/debugging-openwrt-routers-by-shipping/comments.atom
I also put the following robots.txt in the git repo in order to stop a bunch of authentication errors coming from crawlers:
User-agent: *
Disallow: /blog.cgi
Disallow: /ikiwiki.cgi

Future improvements There are a few things I'd like to improve on my current setup. The first one is to remove the ikiwikihosting and gitpush plugins and replace them with a small script which would simply git push to the read-only GitHub mirror. Then I could uninstall the ikiwiki-hosting-common and ikiwiki-hosting-web packages since that's all I use them for. Next, I would like to have proper support for signed git pushes. At the moment, I have the following in /home/blog/source.git/config:
[receive]
    advertisePushOptions = true
    certNonceSeed = "(random string)"
but I'd also like to reject unsigned pushes (a sketch of one way to do this is at the end of this section). While my blog now has a CSP policy which doesn't rely on unsafe-inline for scripts, it does still rely on unsafe-inline for stylesheets. I tried to remove this, but the relevant calls seemed to be located deep within jQuery and so I gave up. Update: now fixed. Finally, I'd like to figure out a good way to deal with articles which don't currently have comments. At the moment, if you try to subscribe to their comment feed, it returns a 404. For example:
[Sun Jun 06 17:43:12.336350 2021] [core:info] [pid 30591:tid 140253834704640] [client 66.249.66.70:57381] AH00128: File does not exist: /var/www/blog/posts/using-iptables-with-network-manager/comments.atom
This is obviously not ideal since many feed readers will refuse to add a feed which is currently not found even though it could become real in the future. If you know of a way to fix this, please let me know.
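Coming back to the signed-push idea from the future improvements above: one possible approach, a sketch rather than something this setup currently uses, is a pre-receive hook in /home/blog/source.git/hooks that rejects any push lacking a valid push certificate. It assumes the signing key is already trusted in the blog user's GnuPG keyring:
#!/bin/sh
# reject pushes that do not carry a good signed push certificate (git push --signed)
if [ "${GIT_PUSH_CERT_STATUS:-N}" != "G" ]; then
    echo "push rejected: signed push with a valid certificate required" >&2
    exit 1
fi
exit 0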

7 June 2021

Russ Allbery: Review: Stoneskin

Review: Stoneskin, by K.B. Spangler
Series: Deep Witches #0
Publisher: A Girl and Her Fed Books
Copyright: September 2017
ASIN: B075PHK498
Format: Kindle
Pages: 226
Stoneskin is a prequel to the Deep Witches Trilogy, which is why I have it marked as book 0 of the series. Unlike most prequels, it was written and published before the series and there doesn't seem to be any reason not to read it first. Tembi Moon is an eight-year-old girl from the poor Marumaru area on the planet of Adhama. Humanity has spread to the stars and first terraformed the worlds and then bioformed themselves to live there. The differences are subtle, but Tembi's skin becomes thicker and less sensitive when damaged (either physically or emotionally) and she can close her ears against dust storms. One day, she wakes up in an unknown alley and finds herself on the world of Miha'ana, sixteen thousand light-years away, where she is rescued and brought home by a Witch named Matindi. In this science fiction future, nearly all interstellar travel is done through the Deep. The Deep is not like the typical hand-waved science fiction subspace, most notably in that it's alive. No one is entirely sure where it came from or what sort of creature it is. It sometimes manages to communicate in words, but abstract patterns with feelings attached are more common, and it only communicates with specific people. Those people are Witches, who are chosen by the Deep via some criteria no one understands. Witches can use the Deep to move themselves or anything else around the galaxy. All interstellar logistics rely on them. The basics of Tembi's story are not that unusual; she's been chosen by the Deep to be a Witch. What is remarkable is that she's young and she's poor, completely disconnected from the power structures of the galaxy. But, once chosen, her path as far as the rest of the galaxy is concerned is fixed: she will go to Lancaster to be trained as a Witch. Matindi is able to postpone this for a time by keeping an eye on her, but not forever. I bought this book because of the idea of the Deep, and that is indeed the best part of the book. There is a lot of mystery about its exact nature, most of which is not resolved in this book, but it mostly behaves like a giant and extremely strange dog, and it's awesome. Replacing the various pseudo-scientific explanations for faster than light travel with interactions with a dream-weird giant St. Bernard with too many paws that talks in swirls of colored bubbles and is very eager to please its friends is brilliant. This book covers a lot of years of Tembi's life and is, as advertised, a prelude to a story that is not resolved here. It's a coming of age story in which she does eventually end up at Lancaster, learns and chafes at the elaborate and very conservative structures humans have put in place to try to make interactions with the Deep predictable and reliable, and eventually gets drawn into the politics of war and the question of when people have a responsibility to intervene. Tembi, and the reader, also have many opportunities to get extremely upset at how the Deep is treated and how much entitlement the Witches have about their access and control, although how the Deep feels about it is left for a future book. Not all of this story is as good as the premise. There are some standard coming of age tropes that I'm not fond of, such as Tembi's predictable temporary falling out with the Deep (although the Deep's reaction is entertaining). It's also not at all a complete story, although that's clearly signaled by the subtitle. But as an introduction to the story universe and an extended bit of scene-setting, it accomplishes what it sets out to do. 
It's also satisfyingly thoughtful about the moral trade-offs around stability and the value of preserving institutions. I know which side I'm on within the universe, but I appreciated how much nuance and thoughtfulness Spangler puts into the contrary opinion. I'm hooked on the universe and want to learn more about the Deep, enough so that I've already bought the first book of the main trilogy. Followed by The Blackwing War. Rating: 7 out of 10

6 June 2021

Evgeni Golov: Controlling Somfy roller shutters using an ESP32 and ESPHome

Our house has solar powered, remote controllable roller shutters on the roof windows, built by the German company named HEIM & HAUS. However, when you look closely at the remote control or the shutter motor, you'll see another brand name: SIMU. As the shutters don't have any wiring inside the house, the only way to control them is via the remote interface. So let's go on the Internet and see how one can do that, shall we? ;) First thing we learn is that SIMU remote stuff is just re-branded Somfy. Great, another name! Looking further we find that Somfy uses some obscure protocol to prevent (replay) attacks (spoiler: it doesn't!) and there are tools for RTL-SDR and Arduino available. That's perfect! Always sniff with RTL-SDR first! Given the two re-brandings in the supply chain, I wasn't 100% sure our shutters really use the same protocol. So the first "hack" was to listen and decrypt the communication using RTL-SDR:
$ git clone https://github.com/dimhoff/radio_stuff
$ cd radio_stuff
$ make -C converters am_to_ook
$ make -C decoders decode_somfy
$ rtl_fm -M am -f 433.42M -s 270K | ./am_to_ook -d 10 -t 2000 - | ./decode_somfy
<press some buttons on the remote>
The output contains the buttons I pressed, but also the id of the remote and the command counter (which is supposed to prevent replay attacks). At this point I could just use the id and the counter to send own commands, but if I'd do that too often, the real remote would stop working, as its counter won't increase and the receiver will drop the commands when the counters differ too much. But that's good enough for now. I know I'm looking for the right protocol at the right frequency. As the end result should be an ESP32, let's move on! Acquiring the right hardware Contrary to an RTL-SDR, one usually does not have a spare ESP32 with 433MHz radio at home, so I went shopping: a NodeMCU-32S clone and a CC1101. The CC1101 is important as most 433MHz chips for Arduino/ESP only work at 433.92MHz, but Somfy uses 433.42MHz and using the wrong frequency would result in really bad reception. The CC1101 is essentially an SDR, as you can tune it to a huge spectrum of frequencies. Oh and we need some cables, a bread board, the usual stuff ;) The wiring is rather simple: ESP32 wiring for a CC1101 And the end result isn't too beautiful either, but it works: ESP32 and CC1101 in a simple case Acquiring the right software In my initial research I found an Arduino sketch and was totally prepared to port it to ESP32, but luckily somebody already did that for me! Even better, it's explicitly using the CC1101. Okay, okay, I cheated, I actually ordered the hardware after I found this port and the reference to CC1101. ;) As I am using ESPHome for my ESPs, the idea was to add a "Cover" that's controlling the shutters to it. Writing some C++, how hard can it be? Turns out, not that hard. You can see the code in my GitHub repo. It consists of two (relevant) files: somfy_cover.h and somfy.yaml. somfy_cover.h essentially wraps the communication with the Somfy_Remote_Lib library into an almost boilerplate Custom Cover for ESPHome. There is nothing too fancy in there. The only real difference to the "Custom Cover" example from the documentation is the split into SomfyESPRemote (which inherits from Component) and SomfyESPCover (which inherits from Cover) -- this is taken from the Custom Sensor documentation and allows me to define one "remote" that controls multiple "covers" using the add_cover function. The first two params of the function are the NVS name and key (think database table and row), and the third is the rolling code of the remote (stored in somfy_secrets.h, which is not in Git). In ESPHome a Cover shall define its properties as CoverTraits. Here we call set_is_assumed_state(true), as we don't know the state of the shutters - they could have been controlled using the other (real) remote - and setting this to true allows issuing open/close commands at all times. We also call set_supports_position(false) as we can't tell the shutters to move to a specific position. The one additional feature compared to a normal Cover interface is the program function, which allows to call the "program" command so that the shutters can learn a new remote. somfy.yaml is the ESPHome "configuration", which contains information about the used hardware, WiFi credentials etc. Again, mostly boilerplate. The interesting parts are the loading of the additional libraries and attaching the custom component with multiple covers and the additional PROG switches:
esphome:
  name: somfy
  platform: ESP32
  board: nodemcu-32s
  libraries:
    - SmartRC-CC1101-Driver-Lib@2.5.6
    - Somfy_Remote_Lib@0.4.0
    - EEPROM
  includes:
    - somfy_secrets.h
    - somfy_cover.h
 
cover:
  - platform: custom
    lambda: |-
      auto somfy_remote = new SomfyESPRemote();
      somfy_remote->add_cover("somfy", "badezimmer", SOMFY_REMOTE_BADEZIMMER);
      somfy_remote->add_cover("somfy", "kinderzimmer", SOMFY_REMOTE_KINDERZIMMER);
      App.register_component(somfy_remote);
      return somfy_remote->covers;
    covers:
      - id: "somfy"
        name: "Somfy Cover"
      - id: "somfy2"
        name: "Somfy Cover2"
switch:
  - platform: template
    name: "PROG"
    turn_on_action:
      - lambda: |-
          ((SomfyESPCover*)id(somfy))->program();
  - platform: template
    name: "PROG2"
    turn_on_action:
      - lambda: |-
          ((SomfyESPCover*)id(somfy2))->program();
The switch to trigger the program mode took me a bit. As the Cover interface of ESPHome does not offer any additional functions besides movement control, I first wrote code to trigger "program" when "stop" was pressed three times in a row, but that felt really cumbersome and also had the side effect that the remote would send more than needed, sometimes confusing the shutters. I then decided to have a separate button (well, switch) for that, but the compiler yelled at me that I can't call program on a Cover as it does not have such a function. Turns out, you need to explicitly cast to SomfyESPCover and then it works, even if the code becomes really readable, NOT. Oh and as the switch does not have any code to actually change/report state, it effectively acts as a button that can be pressed. At this point we can just take an existing remote, press PROG for 5 seconds, see the blinds move shortly up and down a bit, and press PROG on our new ESP32 remote, and the shutters will learn the new remote. And thanks to the awesome integration of ESPHome into HomeAssistant, this instantly shows up as a new controllable cover there too.
Future Additional Work I started writing this post about a year ago, and the initial implementation had some room for improvement.
More than one remote The initial code only created one remote and one cover element. Sure, we could attach that to all shutters (there are 4 of them), but we really want to be able to control them separately. Thankfully I managed to read enough ESPHome docs, and learned how to operate std::vector to make the code dynamically accept new shutters.
Using ESP32's NVS The ESP32 has a non-volatile key-value storage which is much nicer than throwing bits at an emulated EEPROM. The first library I used for that explicitly used EEPROM storage and it would have required quite some hacking to make it work with NVS. Thankfully the library I am using now has a pluggable storage interface, and I could just write the NVS backend myself and upstream now supports that. Yay open-source!
Remaining issues Real state is unknown As noted above, the ESP does not know the real state of the shutters: a command could have been lost in transmission (the Somfy protocol is send-only, there is no feedback) or the shutters might have been controlled by another remote. At least the second part could be solved by listening all the time and trying to decode commands heard over the air, but I don't think this is worth the time -- worst that can happen is that a closed (opened) shutter receives another close (open) command and that is harmless as they have integrated endstops and know that they should not move further.
Can't program new remotes with ESP only To program new remotes, one has to press the "PROG" button for 5 seconds. This was not exposed in the old library, but the new one does support "long press"; I just would need to add another ugly switch to the config and I currently don't plan to do so, as I do have working remotes for the case I need to learn a new one.

2 June 2021

Matthew Garrett: Producing a trustworthy x86-based Linux appliance

Let's say you're building some form of appliance on top of general purpose x86 hardware. You want to be able to verify the software it's running hasn't been tampered with. What's the best approach with existing technology?

Let's split this into two separate problems. The first is to do as much as we can to ensure that the software can't be modified without our consent[1]. This requires that each component in the boot chain verify that the next component is legitimate. We call the first component in this chain the root of trust, and in the x86 world this is the system firmware[2]. This firmware is responsible for verifying the bootloader, and the easiest way to do this on x86 is to use UEFI Secure Boot. In this setup the firmware contains a set of trusted signing certificates and will only boot executables with a chain of trust to one of these certificates. Switching the system into setup mode from the firmware menu will allow you to remove the existing keys and install new ones.

(Note: You shouldn't use the trusted certificate directly for signing bootloaders - instead, the trusted certificate should be used to sign another certificate and the key for that certificate used to sign your bootloader. This way, if you ever need to revoke the signing certificate, you can simply sign a new one with the trusted parent and push out a revocation update instead of having to provision new keys)

But what do you want to sign? In the general purpose Linux world, we use an intermediate bootloader called Shim to bridge from the Microsoft signing authority to a distribution one. Shim then verifies the signature on grub, and grub in turn verifies the signature on the kernel. This is a large body of code that exists because of the use cases that general purpose distributions need to support - primarily, booting on arbitrary off the shelf hardware, and allowing arbitrary and complicated boot setups. This is unnecessary in the appliance case, where the hardware target can be well defined, where there's no need for interoperability with the Microsoft signing authority, and where the boot configuration can be extremely static.

We can skip all of this complexity using systemd-boot's unified Linux image support. This has the format described here, but the short version is that it's simply a kernel and initramfs linked into a small EFI executable that will run them. Instructions for generating such an image are here, and if you follow them you'll end up with a single static image that can be directly executed by the firmware. Signing this avoids dealing with a whole host of problems associated with relying on shim and grub, but note that you'll be embedding the initramfs as well. Again, this should be fine for appliance use-cases, but you'll need your build system to support building the initramfs at image creation time rather than relying on it being generated on the host.
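As a rough sketch of what that looks like in practice (file names are placeholders, the stub path varies by distribution, and the section offsets may need adjusting for a large kernel or initramfs):
# glue the kernel, command line and initramfs onto systemd's EFI stub
objcopy \
    --add-section .osrel=/etc/os-release --change-section-vma .osrel=0x20000 \
    --add-section .cmdline=cmdline.txt --change-section-vma .cmdline=0x30000 \
    --add-section .linux=vmlinuz --change-section-vma .linux=0x2000000 \
    --add-section .initrd=initrd.img --change-section-vma .initrd=0x3000000 \
    /usr/lib/systemd/boot/efi/linuxx64.efi.stub appliance.efi
# sign the result with the key whose certificate is enrolled in the firmware's db
sbsign --key db.key --cert db.crt --output appliance-signed.efi appliance.efi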

At this point we have a single image that can be verified by the firmware and will get us to the point of a running kernel and initramfs. Unless you've got enough RAM that you can put your entire workload in the initramfs, you're going to want a filesystem as well, and you're going to want to verify that that filesystem hasn't been tampered with. The easiest approach to this is to use dm-verity, a device-mapper layer that uses a hash tree to verify that the filesystem contents haven't been modified. The kernel needs to know what the root hash is, so this can either be embedded into your initramfs image or into the kernel command line. Either way, it'll end up in the signed boot image, so nobody will be able to tamper with it.
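A minimal sketch with cryptsetup's veritysetup, assuming a separate hash partition (device names are examples, and how the root hash reaches the kernel command line depends on your initramfs):
# build the hash tree; this prints the root hash that gets baked into the signed boot image
veritysetup format /dev/sda2 /dev/sda3
# at boot, the initramfs verifies and maps the filesystem using that root hash
veritysetup open /dev/sda2 verified-root /dev/sda3 <root hash from the format step>
mount -o ro /dev/mapper/verified-root /sysroot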

It's important to note that a dm-verity partition is read-only - the kernel doesn't have the cryptographic secret that would be required to generate a new hash tree if the partition is modified. So if you require the ability to write data or logs anywhere, you'll need to add a new partition for that. If this partition is unencrypted, an attacker with access to the device will be able to put whatever they want on there. You should treat any data you read from there as untrusted, and ensure that it's validated before use (ie, don't just feed it to a random parser written in C and expect that everything's going to be ok). On the other hand, if it's encrypted, remember that you can't just put the encryption key in the boot image - an attacker with access to the device is going to be able to dump that and extract it. You'll probably want to use a TPM-sealed encryption secret, which will be discussed later on.
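Jumping ahead slightly: one way to implement that TPM-sealed secret for a LUKS-encrypted writable partition, assuming a reasonably recent systemd (248 or later), is systemd-cryptenroll bound to PCR 7. Treat this as a sketch rather than a recommendation, and the device name as a placeholder:
# enroll a TPM2-sealed key slot that only unseals while PCR 7 has its expected value
systemd-cryptenroll --tpm2-device=auto --tpm2-pcrs=7 /dev/sda4
# corresponding /etc/crypttab entry so the volume unlocks automatically at boot:
#   data  /dev/sda4  -  tpm2-device=auto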

At this point everything in the boot process is cryptographically verified, and so should be difficult to tamper with. Unfortunately this isn't really sufficient - on x86 systems there's typically no verification of the integrity of the secure boot database. An attacker with physical access to the system could attach a programmer directly to the firmware flash and rewrite the secure boot database to include keys they control. They could then replace the boot image with one that they've signed, and the machine would happily boot code that the attacker controlled. We need to be able to demonstrate that the system booted using the correct secure boot keys, and the only way we can do that is to use the TPM.

I wrote an introduction to TPMs a while back. The important thing to know here is that the TPM contains a set of Platform Configuration Registers that are large enough to contain a cryptographic hash. During boot, each component of the boot process will generate a "measurement" of other security critical components, including the next component to be booted. These measurements are a representation of the data in question - they may simply be a hash of the object being measured, or the hash of a structure containing various pieces of metadata. Each measurement is passed to the TPM, along with the PCR it should be measured into. The TPM takes the new measurement, appends it to the existing value, and then stores the hash of this concatenated data in the PCR. This means that the final PCR value depends not only on the measurement, but also on every previous measurement. Without breaking the hash algorithm, there's no way to set the PCR to an arbitrary value. The hash values and some associated data are stored in a log that's kept in system RAM, which we'll come back to later.
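The extend operation itself is easy to sketch: for the SHA-256 bank, the new PCR value is simply the hash of the old value concatenated with the measurement. That can be simulated with coreutils and xxd (the file measured here is just a stand-in):
pcr=$(printf '0%.0s' $(seq 64))                         # PCRs start out as all zeroes
measurement=$(sha256sum appliance-signed.efi | awk '{print $1}')
pcr=$(printf '%s%s' "$pcr" "$measurement" | xxd -r -p | sha256sum | awk '{print $1}')
echo "PCR value after extend: $pcr"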

Different PCRs store different pieces of information, but the one that's most interesting to us is PCR 7. Its use is documented in the TCG PC Client Platform Firmware Profile (section 3.3.4.8), but the short version is that the firmware will measure the secure boot keys that are used to boot the system. If the secure boot keys are altered (such as by an attacker flashing new ones), the PCR 7 value will change.

What can we do with this? There's a couple of choices. For devices that are online, we can perform remote attestation, a process where the device can provide a signed copy of the PCR values to another system. If the system also provides a copy of the TPM event log, the individual events in the log can be replayed in the same way that the TPM would use to calculate the PCR values, and then compared to the actual PCR values. If they match, that implies that the log values are correct, and we can then analyse individual log entries to make assumptions about system state. If a device has been tampered with, the PCR 7 values and associated log entries won't match the expected values, and we can detect the tampering.
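On a running Linux system you can inspect both halves of this locally with tpm2-tools: read the current PCR values, and dump the event log that should replay to them:
# current value of PCR 7 in the SHA-256 bank
tpm2_pcrread sha256:7
# parse the measured-boot event log recorded by the firmware
tpm2_eventlog /sys/kernel/security/tpm0/binary_bios_measurements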

If a device is offline, or if there's a need to permit local verification of the device state, we still have options. First, we can perform remote attestation to a local device. I demonstrated doing this over Bluetooth at LCA back in 2020. Alternatively, we can take advantage of other TPM features. TPMs can be configured to store secrets or keys in a way that renders them inaccessible unless a chosen set of PCRs have specific values. This is used in tpm2-totp, which uses a secret stored in the TPM to generate a TOTP value. If the same secret is enrolled in any standard TOTP app, the value generated by the machine can be compared to the value in the app. If they match, the PCR values the secret was sealed to are unmodified. If they don't, or if no numbers are generated at all, that demonstrates that PCR 7 is no longer the same value, and that the system has been tampered with.

Unfortunately, TOTP requires that both sides have possession of the same secret. This is fine when a user is making that association themselves, but works less well if you need some way to ship the secret on a machine and then separately ship the secret to a user. If the user can simply download the secret via some API, so can an attacker. If an attacker has the secret, they can modify the secure boot database and re-seal the secret to the new PCR 7 value. That means having to add some form of authentication, along with a strong binding of machine serial number to a user (in order to avoid someone with valid credentials simply downloading all the secrets).

Instead, we probably want some mechanism that uses asymmetric cryptography. A keypair can be generated on the TPM, which will refuse to release an unencrypted copy of the private key. The public key, however, can be exported and stored. If it's acceptable for a verification app to connect to the internet then the public key can simply be obtained that way - if not, a certificate can be issued to the key, and this exposed to the verifier via a QR code. The app then verifies that the certificate is signed by the vendor, and if so extracts the public key from that. The private key can have an associated policy that only permits its use when PCR 7 has an appropriate value, so the app then generates a nonce and asks the user to type that into the device. The device generates a signature over that nonce and displays that as a QR code. The app verifies the signature matches, and can then assert that PCR 7 has the expected value.

Once we can assert that PCR 7 has the expected value, we can assert that the system booted something signed by us and thus infer that the rest of the boot chain is also secure. But this is still dependent on the TPM obtaining trustworthy information, and unfortunately the bus that the TPM sits on isn't really terribly secure (TPM Genie is an example of an interposer for i2c-connected TPMs, but there's no reason an LPC one can't be constructed to attack the sort usually used on PCs). TPMs do support encrypted communication channels, but bootstrapping those isn't straightforward without firmware support. The easiest way around this is to make use of a firmware-based TPM, where the TPM is implemented in software running on an ancillary controller. Intel's solution is part of their Platform Trust Technology and runs on the Management Engine, AMD run it on the Platform Security Processor. In both cases it's not terribly feasible to intercept the communications, so we avoid this attack. The downside is that we're then placing more trust in components that are running much more code than a TPM would and which have a correspondingly larger attack surface. Which is preferable is going to depend on your threat model.

Most of this should be achievable using Yocto, which now has support for dm-verity built in. It's almost certainly going to be easier using this than trying to base on top of a general purpose distribution. I'd love to see this become a largely push button receive secure image process, so might take a go at that if I have some free time in the near future.

[1] Obviously technologies that can be used to ensure nobody other than me is able to modify the software on devices I own can also be used to ensure that nobody other than the manufacturer is able to modify the software on devices that they sell to third parties. There's no real technological solution to this problem, but we shouldn't allow the fact that a technology can be used in ways that are hostile to user freedom to cause us to reject that technology outright.
[2] This is slightly complicated due to the interactions with the Management Engine (on Intel) or the Platform Security Processor (on AMD). Here's a good writeup on the Intel side of things.


6 May 2021

Matthew Garrett: More doorbell adventures

Back in my last post on this topic, I'd got shell on my doorbell but hadn't figured out why the HTTP callbacks weren't always firing. I still haven't, but I have learned some more things.

Doorbird sell a chime, a network connected device that is signalled by the doorbell when someone pushes a button. It costs about $150, which seems excessive, but would solve my problem (ie, that if someone pushes the doorbell and I'm not paying attention to my phone, I miss it entirely). But given a shell on the doorbell, how hard could it be to figure out how to mimic the behaviour of one?

Configuration for the doorbell is all stored under /mnt/flash, and there's a bunch of files prefixed 1000eyes that contain config (1000eyes is the German company that seems to be behind Doorbird). One of these was called 1000eyes.peripherals, which seemed like a good starting point. The initial contents were {"Peripherals":[]}, so it seemed likely that it was intended to be JSON. Unfortunately, since I had no access to any of the peripherals, I had no idea what the format was. I threw the main application into Ghidra and found a function that had debug statements referencing "initPeripherals" and read a bunch of JSON keys out of the file, so I could simply look at the keys it referenced and write out a file based on that. I did so, and it didn't work - the app stubbornly refused to believe that there were any defined peripherals. The check that was failing was pcVar4 = strstr(local_50[0],PTR_s_"type":"_0007c980);, which made no sense, since I very definitely had a type key in there. And then I read it more closely. strstr() wasn't being asked to look for "type":, it was being asked to look for "type":". I'd left a space between the : and the opening " in the value, which meant it wasn't matching. The rest of the function seems to call an actual JSON parser, so I have no idea why it doesn't just use that for this part as well, but deleting the space and restarting the service meant it now believed I had a peripheral attached.

The mobile app that's used for configuring the doorbell now showed a device in the peripherals tab, but it had a weird corrupted name. Tapping it resulted in an error telling me that the device was unavailable, and on the doorbell itself generated a log message showing it was trying to reach a device with the hostname bha-04f0212c5cca and (unsurprisingly) failing. The hostname was being generated from the MAC address field in the peripherals file and was presumably supposed to be resolved using mDNS, but for now I just threw a static entry in /etc/hosts pointing at my Home Assistant device. That was enough to show that when I opened the app the doorbell was trying to call a CGI script called peripherals.cgi on my fake chime. When that failed, it called out to the cloud API to ask it to ask the chime[1] instead. Since the cloud was completely unaware of my fake device, this didn't work either. I hacked together a simple server using Python's HTTPServer and was able to return data (another block of JSON). This got me to the point where the app would now let me get to the chime config, but would then immediately exit. adb logcat showed a traceback in the app caused by a failed assertion due to a missing key in the JSON, so I ran the app through jadx, found the assertion and from there figured out what keys I needed. Once that was done, the app opened the config page just fine.

Unfortunately, though, I couldn't edit the config. Whenever I hit "save" the app would tell me that the peripheral wasn't responding. This was strange, since the doorbell wasn't even trying to hit my fake chime. It turned out that the app was making a CGI call to the doorbell, and the thread handling that call was segfaulting just after reading the peripheral config file. This suggested that the format of my JSON was probably wrong and that the doorbell was not handling that gracefully, but trying to figure out what the format should actually be didn't seem easy and none of my attempts improved things.

So, new approach. Rather than writing the config myself, why not let the doorbell do it? I should be able to use the genuine pairing process if I could mimic the chime sufficiently well. Hitting the "add" button in the app asked me for the username and password for the chime, so I typed in something random in the expected format (six characters followed by four zeroes) and a sufficiently long password and hit ok. A few seconds later it told me it couldn't find the device, which wasn't unexpected. What was a little more unexpected was that the log on the doorbell was showing it trying to hit another bha-prefixed hostname (and, obviously, failing). The hostname contains the MAC address, but I hadn't told the doorbell the MAC address of the chime, just its username. Some more digging showed that the doorbell was calling out to the cloud API, giving it the 6 character prefix from the username and getting a MAC address back. Doing the same myself revealed that there was a straightforward mapping from the prefix to the mac address - changing the final character from "a" to "b" incremented the MAC by one. It's actually just a base 26 encoding of the MAC, with aaaaaa translating to 00408C000000.
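That mapping is easy to reproduce in a few lines of shell (the prefix here is made up; 97 is the ASCII code for 'a'):
prefix=abcdef
n=0
for c in $(echo "$prefix" | grep -o .); do
    n=$(( n * 26 + $(printf '%d' "'$c") - 97 ))    # base-26 digit: 'a' -> 0 ... 'z' -> 25
done
printf 'MAC: %012x\n' $(( 0x00408C000000 + n ))    # aaaaaa maps to 00408C000000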

That explained how the hostname was being generated, and in return I was able to work backwards to figure out which username I should use to generate the hostname I was already using. Attempting to add it now resulted in the doorbell making another CGI call to my fake chime in order to query its feature set, and by mocking that up as well I was able to send back a file containing X-Intercom-Type, X-Intercom-TypeId and X-Intercom-Class fields that made the doorbell happy. I now had a valid JSON file, which cleared up a couple of mysteries. The corrupt name was because the name field isn't supposed to be ASCII - it's base64 encoded UTF16-BE. And the reason I hadn't been able to figure out the JSON format correctly was because it looked something like this:

"Peripherals":[] "prefix": "type":"DoorChime","name":"AEQAbwBvAHIAYwBoAGkAbQBlACAAVABlAHMAdA==","mac":"04f0212c5cca","user":"username","password":"password" ]


Note that there's a total of one [ in this file, but two ]s? Awesome. Anyway, I could now modify the config in the app and hit save, and the doorbell would then call out to my fake chime to push config to it. Weirdly, the association between the chime and a specific button on the doorbell is only stored on the chime, not on the doorbell. Further, hitting the doorbell didn't result in any more HTTP traffic to my fake chime. However, it did result in some broadcast UDP traffic being generated. Searching for the port number led me to the Doorbird LAN API and a complete description of the format and encryption mechanism in use. Argon2I is used to turn the first five characters of the chime's password (which is also stored on the doorbell itself) into a 256-bit key, and this is used with ChaCha20 to decrypt the payload. The payload then contains a six character field describing the device sending the event, and then another field describing the event itself. Some more scrappy Python and I could pick up these packets and decrypt them, which showed that they were being sent whenever any event occurred on the doorbell. This explained why there was no storage of the button/chime association on the doorbell itself - the doorbell sends packets for all events, and the chime is responsible for deciding whether to act on them or not.

On closer examination, it turns out that these packets aren't just sent if there's a configured chime. One is sent for each configured user, avoiding the need for a cloud round trip if your phone is on the same network as the doorbell at the time. There was literally no need for me to mimic the chime at all, suitable events were already being sent.

Still. There's a fair amount of WTFery here, ranging from the strstr() based JSON parsing, the invalid JSON, the symmetric encryption that uses device passwords as the key (requiring the doorbell to be aware of the chime's password) and the use of only the first five characters of the password as input to the KDF. It doesn't give me a great deal of confidence in the rest of the device's security, so I'm going to keep playing.

[1] This seems to be to handle the case where the chime isn't on the same network as the doorbell


19 April 2021

Ritesh Raj Sarraf: Catching Up Your Sources

I've mostly had the preference of controlling my data rather than depending on someone else. That's one reason why I still believe email to be my most reliable medium for data storage, one that is not plagued/locked by a single entity. If I had the resources, I'd prefer all digital data to be broken down to its simplest form for storage, like the email format, and empower the user with it, i.e. their data. Yes, there are free services that are indirectly forced upon common users, and many of us get attracted to them. Many of us do not think that the information, which is shared for the free service in return, is of much importance. Which may be fair, depending on the individual, given that they get certain services without paying any direct dime.

New age communication So first, we had email and usenet. As I mentioned above, email was designed with fine intentions. Intentions that make it stand even today, independently. But not everything, back then, was that great either. For example, instant messaging was very closed and centralised then too. Things like ICQ, MSN, Yahoo Messenger; all were centralized. I wonder if people still have access to their ICQ logs. Not much has changed in the current day either. We now have domination by Facebook Messenger, Google (whatever the new marketing term they introduce), WhatsApp, Telegram, Signal etc. To my knowledge, they are all centralized. Over all this time, I'm yet to see a product come up with good (business) intentions, to really empower the end user. In this information age, the most invaluable data is user activity. That's the one piece of data everyone is after. If you decline to share that bit of free data in exchange for the free services, mind you, that free service, be it Facebook, Google, Instagram, WhatsApp, Truecaller or Twitter, would not come to you at all. Try it out. So the reality is that while you may not be valuing the data you offer in exchange correctly, there's a lot that is reaped from it. But still, I think each user has (and should have) the freedom to opt in for these tech giants and give them their personal bit, for free services in return. That is a decent barter deal. And it is a choice that one is free to make.

Retaining my data I'm fond of keeping an archive folder in my mailbox. A folder that holds significant events, usually in the form of an email, if documented. Over the years, I chose to resort to the email format because I felt it was more reliable in the longer term than any other format. The next best would be plain text. In my lifetime, I have learnt a lot from the internet; so it is natural that my preference has been with it. Mailing Lists, IRCs, HOWTOs, Guides, Blog posts; all have helped. And over the years, I've come across hundreds of such pieces of content that I'd always like to preserve. Now there are multiple ways of preserving data. Like, for example, the big tech giants. In most usual cases, your data, for your lifetime, should be fine with a tech giant. In some odd scenarios, you may be unlucky if you relied on a service provider that went bankrupt. But seriously, I think users should be fine if they host their data with Microsoft, Google etc., as long as they abide by their policies. There's also the catch of alignment. As the user, you should ensure to align (and transition) with the product offerings of your service provider. Otherwise, what may look constant and always reliable will vanish in the blink of an eye. I guess Google Plus would be a good example. There was some Google Feed service too. Maybe Google Photos in the near future, just like Google Picasa in the previous (or current) decade.

History, what is it On the topic of retaining information, let's take a small drift. I still admire our ancestors. I don't know what went on in their minds when they were documenting events in the form of scriptures, carvings, temples, churches, mosques etc; but one thing's for sure, they were able to leave a fine means of communication. They are all gone but a large number of those events are evident through the creations that they left. Some of those events have been strong enough that further rulers/invaders have had tough times trying to wipe them out from history. Remember, history is usually not the truth, but the statement to be believed by the teller. And the teller is usually the survivor, or the winner, as you may call it. But still, the information retention techniques were better. I haven't visited, but admire whosoever built the Kailasa Temple, Ellora, without which we'd be made to believe what not by all invaders and rulers of the region. But the majestic standing of the temple is a fine example of the history and the events that have occurred in the past.
Ellora Temple - The majestic carving believed to be carved out of a single stone
Dominance has the power to rewrite history, and unfortunately that's true and it has done its part. It is just that in a mere human's defined lifetime, it is not possible to witness the transition from current to history, and say that I was there then and I'm here now, and this is not the reality. And if not dominance, there's always the other bit, hearsay. With it, you can always put anything up for dispute. Because there's no way one can go back in time and produce fine evidence. There's also a part about religion. Religion can be highly sentimental. And religion can be a solid way to get an agenda going. For example, in India - a country which today constitutionally is a secular country - there have been multiple attempts to discard the belief, claiming that never ever did the thing called Ramayana exist. That the Rama Setu, nicely reworded as Adam's Bridge by whosoever, is a mere result of science. Now Rama, or Hanumana, or Ravana, or Valmiki, aren't going to come over and prove whether that is true or false. So such subjects serve as a solid base to get an agenda going. And probably we've even succeeded in proving and believing that there was never an event like the Ramayana or the Mahabharata. Nor was there ever any empire other than the Mughal or the British Empire. But yes, whosoever made the Ellora Temple, or the many many more of such creations, did a fine job of making a dent for the future, to know of what the history possibly could also be.

Enough of the drift So, in my opinion, having events documented is important. It'd be nice to have skills documented too, so that they can be passed over generations, but that's a debatable topic. But events, I believe, should be documented. And documented in the best possible ways, so that their existence is not diminished. Documentation in the form of carvings on a rock is far better than links and posts shared on Facebook, Twitter, Reddit etc. For one, these are all corporate entities with vested interests and can make excuses in the name of compliance and conformance. So, for the helpless state and generation I am in, I felt email was the best possible independent form of data retention in today's age. If I really had the resources, I'd not rely on the digital age. This age has no guarantee of retaining and recording information in any reliable manner. Instead, it is just mostly junk, which is manipulative and changeable, conditionally.

Email and RSS So for my communication, I like to prefer email over any other means. That doesn't mean I don't use the current trends. I do. But this blog is mostly about penning my desires. And the desire is to have communication in the email format. Such is the case that for information useful over the internet, I crave to have it formatted as email for archival. RSS feeds are my most common mode of keeping track of information I care about. Not all that I care for is available in RSS feeds, but hey, such is life. And adaptability is okay. But my preference is still RSS. So I use RSS feeds through a fine piece of software called feed2imap. A software that fits my bill fairly well. feed2imap is:
  • An RSS feed news aggregator
  • Pulls and extracts news feeds in the form of an email
  • Can push the converted email over POP/IMAP
  • Can convert all image content to email MIME attachments
In a gist, it makes the online content available to me offline, in the most admired email format, in my mailbox. In today's day, my preferred email client is Evolution. It does a good job of dealing with such emails (RSS feed items). An image example of accessing an RSS feed item through it is below.
RSS News Item through Evolution
The good part is that my actual data is always independent of such MUAs. Tomorrow, as technology, trends and economics evolve, something new will come as a replacement, but my data will still be mine.
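For the curious, a minimal sketch of what such a feed2imap setup could look like is below. The feed name, URL, IMAP target and the include-images option are illustrative placeholders and assumptions based on the feature list above, not an actual configuration:
# Hypothetical ~/.feed2imaprc (YAML); adjust the feed list and the IMAP credentials.
cat > ~/.feed2imaprc <<'EOF'
feeds:
  - name: planet-debian
    url: https://planet.debian.org/rss20.xml
    target: imap://user:PASSWORD@mail.example.org/INBOX.feeds.planet
    include-images: true   # assumed option: convert images to MIME attachments
EOF
# Run feed2imap (typically from cron); it fetches the feeds and delivers new items as emails.
feed2imap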

9 April 2021

Michael Prokop: A Ceph war story

It all started with the big bang! We nearly lost 33 of 36 disks on a Proxmox/Ceph Cluster; this is the story of how we recovered them. At the end of 2020, we eventually had a long outstanding maintenance window for taking care of system upgrades at a customer. During this maintenance window, which involved reboots of server systems, the involved Ceph cluster unexpectedly went into a critical state. What was planned to be a few hours of checklist work in the early evening turned out to be an emergency case; let's call it a nightmare (not only because it included a big part of the night). Since we have learned a few things from our post mortem and RCA, it's worth sharing those with others. But first things first, let's step back and clarify what we had to deal with. The system and its upgrade One part of the upgrade included 3 Debian servers (we're calling them server1, server2 and server3 here), running on Proxmox v5 + Debian/stretch with 12 Ceph OSDs each (65.45TB in total), a so-called Proxmox Hyper-Converged Ceph Cluster. First, we went for upgrading the Proxmox v5/stretch system to Proxmox v6/buster, before updating Ceph Luminous v12.2.13 to the latest v14.2 release supported by Proxmox v6/buster. The Proxmox upgrade included updating corosync from v2 to v3. As part of this upgrade, we had to apply some configuration changes, like adjusting ring0 + ring1 address settings and adding a mon_host configuration to the Ceph configuration. During the first two servers' reboots, we noticed configuration glitches. After fixing those, we went for a reboot of the third server as well. Then we noticed that several Ceph OSDs were unexpectedly down. The NTP service wasn't working as expected after the upgrade. The underlying issue is a race condition of ntp with systemd-timesyncd (see #889290). As a result, we had clock skew problems with Ceph, indicating that the Ceph monitors' clocks weren't running in sync (which is essential for proper Ceph operation). We initially assumed that our Ceph OSD failure derived from this clock skew problem, so we took care of it. After yet another round of reboots, to ensure the systems were all running with identical and sane configurations and services, we noticed lots of failing OSDs. This time all but three OSDs (19, 21 and 22) were down:
% sudo ceph osd tree
ID CLASS WEIGHT   TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       65.44138 root default
-2       21.81310     host server1
 0   hdd  1.08989         osd.0    down  1.00000 1.00000
 1   hdd  1.08989         osd.1    down  1.00000 1.00000
 2   hdd  1.63539         osd.2    down  1.00000 1.00000
 3   hdd  1.63539         osd.3    down  1.00000 1.00000
 4   hdd  1.63539         osd.4    down  1.00000 1.00000
 5   hdd  1.63539         osd.5    down  1.00000 1.00000
18   hdd  2.18279         osd.18   down  1.00000 1.00000
20   hdd  2.18179         osd.20   down  1.00000 1.00000
28   hdd  2.18179         osd.28   down  1.00000 1.00000
29   hdd  2.18179         osd.29   down  1.00000 1.00000
30   hdd  2.18179         osd.30   down  1.00000 1.00000
31   hdd  2.18179         osd.31   down  1.00000 1.00000
-4       21.81409     host server2
 6   hdd  1.08989         osd.6    down  1.00000 1.00000
 7   hdd  1.08989         osd.7    down  1.00000 1.00000
 8   hdd  1.63539         osd.8    down  1.00000 1.00000
 9   hdd  1.63539         osd.9    down  1.00000 1.00000
10   hdd  1.63539         osd.10   down  1.00000 1.00000
11   hdd  1.63539         osd.11   down  1.00000 1.00000
19   hdd  2.18179         osd.19     up  1.00000 1.00000
21   hdd  2.18279         osd.21     up  1.00000 1.00000
22   hdd  2.18279         osd.22     up  1.00000 1.00000
32   hdd  2.18179         osd.32   down  1.00000 1.00000
33   hdd  2.18179         osd.33   down  1.00000 1.00000
34   hdd  2.18179         osd.34   down  1.00000 1.00000
-3       21.81419     host server3
12   hdd  1.08989         osd.12   down  1.00000 1.00000
13   hdd  1.08989         osd.13   down  1.00000 1.00000
14   hdd  1.63539         osd.14   down  1.00000 1.00000
15   hdd  1.63539         osd.15   down  1.00000 1.00000
16   hdd  1.63539         osd.16   down  1.00000 1.00000
17   hdd  1.63539         osd.17   down  1.00000 1.00000
23   hdd  2.18190         osd.23   down  1.00000 1.00000
24   hdd  2.18279         osd.24   down  1.00000 1.00000
25   hdd  2.18279         osd.25   down  1.00000 1.00000
35   hdd  2.18179         osd.35   down  1.00000 1.00000
36   hdd  2.18179         osd.36   down  1.00000 1.00000
37   hdd  2.18179         osd.37   down  1.00000 1.00000
Our blood pressure increased slightly! Did we just lose all of our cluster? What happened, and how can we get all the other OSDs back? We stumbled upon this beauty in our logs:
kernel: [   73.697957] XFS (sdl1): SB stripe unit sanity check failed
kernel: [   73.698002] XFS (sdl1): Metadata corruption detected at xfs_sb_read_verify+0x10e/0x180 [xfs], xfs_sb block 0xffffffffffffffff
kernel: [   73.698799] XFS (sdl1): Unmount and run xfs_repair
kernel: [   73.699199] XFS (sdl1): First 128 bytes of corrupted metadata buffer:
kernel: [   73.699677] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 62 00  XFSB..........b.
kernel: [   73.700205] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
kernel: [   73.700836] 00000020: 62 44 2b c0 e6 22 40 d7 84 3d e1 cc 65 88 e9 d8  bD+.."@..=..e...
kernel: [   73.701347] 00000030: 00 00 00 00 00 00 40 08 00 00 00 00 00 00 01 00  ......@.........
kernel: [   73.701770] 00000040: 00 00 00 00 00 00 01 01 00 00 00 00 00 00 01 02  ................
ceph-disk[4240]: mount: /var/lib/ceph/tmp/mnt.jw367Y: mount(2) system call failed: Structure needs cleaning.
ceph-disk[4240]: ceph-disk: Mounting filesystem failed: Command '['/bin/mount', '-t', u'xfs', '-o', 'noatime,inode64', '--', '/dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.cdda39ed-5
ceph/tmp/mnt.jw367Y']' returned non-zero exit status 32
kernel: [   73.702162] 00000050: 00 00 00 01 00 00 18 80 00 00 00 04 00 00 00 00  ................
kernel: [   73.702550] 00000060: 00 00 06 48 bd a5 10 00 08 00 00 02 00 00 00 00  ...H............
kernel: [   73.702975] 00000070: 00 00 00 00 00 00 00 00 0c 0c 0b 01 0d 00 00 19  ................
kernel: [   73.703373] XFS (sdl1): SB validate failed with error -117.
The same issue was present for the other failing OSDs. We hoped that the data itself was still there, and that only the mounting of the XFS partitions failed. The Ceph cluster was initially installed in 2017 with Ceph jewel/10.2 with the OSDs on filestore (nowadays a legacy approach to storing objects in Ceph). However, we had migrated the disks to bluestore since then (with ceph-disk, and not yet via ceph-volume, which is what's being used nowadays). Using ceph-disk introduces these 100MB XFS partitions containing basic metadata for the OSD. Given that we had three working OSDs left, we decided to investigate how to rebuild the failing ones. Some folks on #ceph (thanks T1, ormandj + peetaur!) were kind enough to share what working XFS partitions looked like for them. After creating a backup (via dd), we tried to re-create such an XFS partition on server1. We noticed that even mounting a freshly created XFS partition failed:
synpromika@server1 ~ % sudo mkfs.xfs -f -i size=2048 -m uuid="4568c300-ad83-4288-963e-badcd99bf54f" /dev/sdc1
meta-data=/dev/sdc1              isize=2048   agcount=4, agsize=6272 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=25088, imaxpct=25
         =                       sunit=128    swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=1608, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
synpromika@server1 ~ % sudo mount /dev/sdc1 /mnt/ceph-recovery
SB stripe unit sanity check failed
Metadata corruption detected at 0x433840, xfs_sb block 0x0/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x1000
cache_node_purge: refcount was 1, not zero (node=0x1d3c400)
SB stripe unit sanity check failed
Metadata corruption detected at 0x433840, xfs_sb block 0x18800/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x18800/0x1000
SB stripe unit sanity check failed
Metadata corruption detected at 0x433840, xfs_sb block 0x0/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x1000
SB stripe unit sanity check failed
Metadata corruption detected at 0x433840, xfs_sb block 0x24c00/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x24c00/0x1000
SB stripe unit sanity check failed
Metadata corruption detected at 0x433840, xfs_sb block 0xc400/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0xc400/0x1000
releasing dirty buffer (bulk) to free list!releasing dirty buffer (bulk) to free list!releasing dirty buffer (bulk) to free list!releasing dirty buffer (bulk) to free list!found dirty buffer (bulk) on free list!bad magic number
bad magic number
Metadata corruption detected at 0x433840, xfs_sb block 0x0/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x1000
releasing dirty buffer (bulk) to free list!mount: /mnt/ceph-recovery: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or helper program, or other error.
Ouch. This very much looked related to the actual issue we're seeing. So we tried to execute mkfs.xfs with a bunch of different sunit/swidth settings. Using -d sunit=512 -d swidth=512 at least worked then, so we decided to force its usage in the creation of our OSD XFS partition. This brought us a working XFS partition. Please note, sunit must not be larger than swidth (more on that later!). Then we reconstructed how to restore all the metadata for the OSD (activate.monmap, active, block_uuid, bluefs, ceph_fsid, fsid, keyring, kv_backend, magic, mkfs_done, ready, require_osd_release, systemd, type, whoami). To identify the UUID, we can read the data from "ceph --format json osd dump", like this for all our OSDs (Zsh syntax ftw!):
synpromika@server1 ~ % for f in {0..37}; printf "osd-$f: %s\n" "$(sudo ceph --format json osd dump | jq -r ".osds[] | select(.osd==$f) | .uuid")"
osd-0: 4568c300-ad83-4288-963e-badcd99bf54f
osd-1: e573a17a-ccde-4719-bdf8-eef66903ca4f
osd-2: 0e1b2626-f248-4e7d-9950-f1a46644754e
osd-3: 1ac6a0a2-20ee-4ed8-9f76-d24e900c800c
[...]
Identifying the corresponding raw device for each OSD UUID is possible via:
synpromika@server1 ~ % UUID="4568c300-ad83-4288-963e-badcd99bf54f"
synpromika@server1 ~ % readlink -f /dev/disk/by-partuuid/"${UUID}"
/dev/sdc1
The OSD's key ID can be retrieved via:
synpromika@server1 ~ % OSD_ID=0
synpromika@server1 ~ % sudo ceph auth get osd."${OSD_ID}" -f json 2>/dev/null | jq -r '.[] | .key'
AQCKFpZdm0We[...]
Now we also need to identify the underlying block device:
synpromika@server1 ~ % OSD_ID=0
synpromika@server1 ~ % sudo ceph osd metadata osd."${OSD_ID}" -f json | jq -r '.bluestore_bdev_partition_path'
/dev/sdc2
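Putting the pieces above together, a per-OSD recovery step could look roughly like the following sketch. This is not the script that was actually executed; the backup location, the block-device symlink handling and the remaining marker files are assumptions based on the description in this post:
#!/bin/bash
# Sketch: rebuild the 100MB XFS metadata partition for one OSD.
# OSD_ID, the backup path and the mount handling are illustrative assumptions.
set -eu

OSD_ID=0
UUID=$(sudo ceph --format json osd dump | jq -r ".osds[] | select(.osd==$OSD_ID) | .uuid")
DEV=$(readlink -f /dev/disk/by-partuuid/"$UUID")    # e.g. /dev/sdc1
KEY=$(sudo ceph auth get osd."$OSD_ID" -f json 2>/dev/null | jq -r '.[] | .key')
BDEV=$(sudo ceph osd metadata osd."$OSD_ID" -f json | jq -r '.bluestore_bdev_partition_path')
BDEV_UUID=$(sudo blkid -o value -s PARTUUID "$BDEV")

# Keep a backup of the broken metadata partition before touching it.
sudo dd if="$DEV" of=/srv/backup/osd-"$OSD_ID".dd bs=1M

# Re-create the XFS partition, forcing the stripe settings that worked for us.
sudo mkfs.xfs -f -i size=2048 -d sunit=512 -d swidth=512 -m uuid="$UUID" "$DEV"

# Mount it and restore the OSD metadata files.
sudo mount -o noatime,inode64 "$DEV" /var/lib/ceph/osd/ceph-"$OSD_ID"
echo "$OSD_ID"    | sudo tee /var/lib/ceph/osd/ceph-"$OSD_ID"/whoami
echo "$UUID"      | sudo tee /var/lib/ceph/osd/ceph-"$OSD_ID"/fsid
echo "$BDEV_UUID" | sudo tee /var/lib/ceph/osd/ceph-"$OSD_ID"/block_uuid
printf '[osd.%s]\n\tkey = %s\n' "$OSD_ID" "$KEY" | sudo tee /var/lib/ceph/osd/ceph-"$OSD_ID"/keyring
sudo ln -sf /dev/disk/by-partuuid/"$BDEV_UUID" /var/lib/ceph/osd/ceph-"$OSD_ID"/block
# ceph_fsid and the remaining marker files (type, ready, bluefs, ...) are identical
# on each OSD and can be copied from a healthy one.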
With all of this, we reconstructed the keyring, fsid, whoami, block + block_uuid files. All the other files inside the XFS metadata partition are identical on each OSD. So after placing and adjusting the corresponding metadata on the XFS partition for Ceph usage, we got a working OSD, hurray! Since we had to fix yet another 32 OSDs, we decided to automate this XFS partitioning and metadata recovery procedure. We had a network share available on /srv/backup for storing backups of existing partition data. On each server, we tested the procedure with one single OSD before iterating over the list of remaining failing OSDs. We started with a shell script on server1, then adjusted the script for server2 and server3. This is the script, as we executed it on the 3rd server. Thanks to this, we managed to get the Ceph cluster up and running again. We didn't want to continue with the Ceph upgrade itself during the night though, as we wanted to know exactly what was going on and why the system behaved like that. Time for RCA! Root Cause Analysis So all OSDs, except for three on server2, failed, and the problem seemed to be related to XFS. Therefore, our starting point for the RCA was to identify what was different on server2, as compared to server1 + server3. My initial assumption was that this was related to some firmware issues with the involved controller (and as it turned out later, I was right!). The disks were attached as JBOD devices to a ServeRAID M5210 controller (with a stripe size of 512). Firmware state:
synpromika@server1 ~ % sudo storcli64 /c0 show all | grep '^Firmware'
Firmware Package Build = 24.16.0-0092
Firmware Version = 4.660.00-8156
synpromika@server2 ~ % sudo storcli64 /c0 show all | grep '^Firmware'
Firmware Package Build = 24.21.0-0112
Firmware Version = 4.680.00-8489
synpromika@server3 ~ % sudo storcli64 /c0 show all | grep '^Firmware'
Firmware Package Build = 24.16.0-0092
Firmware Version = 4.660.00-8156
This looked very promising, as server2 indeed runs with a different firmware version on the controller. But how so? Well, the motherboard of server2 got replaced by a Lenovo/IBM technician in January 2020, as we had a failing memory slot during a memory upgrade. As part of this procedure, the Lenovo/IBM technician installed the latest firmware versions. According to our documentation, some OSDs were rebuilt (due to the filestore->bluestore migration) in March and April 2020. It turned out that precisely those OSDs were the ones that survived the upgrade. So the surviving drives were created with a different firmware version running on the involved controller. All the other OSDs were created with an older controller firmware. But what difference does this make? Now let's check the firmware changelogs. For the 24.21.0-0097 release we found this:
- Cannot create or mount xfs filesystem using xfsprogs 4.19.x kernel 4.20(SCGCQ02027889)
- xfs_info command run on an XFS file system created on a VD of strip size 1M shows sunit and swidth as 0(SCGCQ02056038)
Our XFS problem certainly was related to the controller's firmware. We also recalled that our monitoring system reported different sunit settings for the OSDs that were rebuilt in March and April. For example, OSD 21 was recreated and got different sunit settings:
WARN  server2.example.org  Mount options of /var/lib/ceph/osd/ceph-21      WARN - Missing: sunit=1024, Exceeding: sunit=512
We compared the new OSD 21 with an existing one (OSD 25 on server3):
synpromika@server2 ~ % systemctl show var-lib-ceph-osd-ceph\\x2d21.mount | grep sunit
Options=rw,noatime,attr2,inode64,sunit=512,swidth=512,noquota
synpromika@server3 ~ % systemctl show var-lib-ceph-osd-ceph\\x2d25.mount | grep sunit
Options=rw,noatime,attr2,inode64,sunit=1024,swidth=512,noquota
Thanks to our documentation, we could compare execution logs of their creation:
% diff -u ceph-disk-osd-25.log ceph-disk-osd-21.log
-synpromika@server3 ~ % sudo ceph-disk -v prepare --bluestore /dev/sdj --osd-id 25
+synpromika@server2 ~ % sudo ceph-disk -v prepare --bluestore /dev/sdi --osd-id 21
[...]
-command_check_call: Running command: /sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdj1
-meta-data=/dev/sdj1              isize=2048   agcount=4, agsize=6272 blks
[...]
+command_check_call: Running command: /sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdi1
+meta-data=/dev/sdi1              isize=2048   agcount=4, agsize=6336 blks
          =                       sectsz=4096  attr=2, projid32bit=1
          =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
-data     =                       bsize=4096   blocks=25088, imaxpct=25
-         =                       sunit=128    swidth=64 blks
+data     =                       bsize=4096   blocks=25344, imaxpct=25
+         =                       sunit=64     swidth=64 blks
 naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
 log      =internal log           bsize=4096   blocks=1608, version=2
          =                       sectsz=4096  sunit=1 blks, lazy-count=1
 realtime =none                   extsz=4096   blocks=0, rtextents=0
[...]
So back then we even tried to track this down, but couldn't make sense of it yet. But now this sounds very much like it is related to the problem we saw with this Ceph/XFS failure. We follow Occam's razor, assuming the simplest explanation is usually the right one, so let's check the disk properties and see what differs:
synpromika@server1 ~ % sudo blockdev --getsz --getsize64 --getss --getpbsz --getiomin --getioopt /dev/sdk
4685545472
2398999281664
512
4096
524288
262144
synpromika@server2 ~ % sudo blockdev --getsz --getsize64 --getss --getpbsz --getiomin --getioopt /dev/sdk
4685545472
2398999281664
512
4096
262144
262144
See the difference between server1 and server2 for identical disks? The getiomin option now reports something different for them:
synpromika@server1 ~ % sudo blockdev --getiomin /dev/sdk            
524288
synpromika@server1 ~ % cat /sys/block/sdk/queue/minimum_io_size
524288
synpromika@server2 ~ % sudo blockdev --getiomin /dev/sdk 
262144
synpromika@server2 ~ % cat /sys/block/sdk/queue/minimum_io_size
262144
It doesn't make sense that the minimum I/O size (iomin, AKA BLKIOMIN) is bigger than the optimal I/O size (ioopt, AKA BLKIOOPT). This leads us to Bug 202127 "cannot mount or create xfs on a 597T device", which matches our findings here. But why did this XFS partition work in the past and fail now with the newer kernel version? The XFS behaviour change Now, given that we have backups of all the XFS partitions, we wanted to track down a) when this XFS behaviour was introduced, and b) whether, and if so how, it would be possible to reuse the XFS partition without having to rebuild it from scratch (e.g. if you had no working Ceph OSD or backups left). Let's look at such a failing XFS partition with the Grml live system:
root@grml ~ # grml-version
grml64-full 2020.06 Release Codename Ausgehfuahangl [2020-06-24]
root@grml ~ # uname -a
Linux grml 5.6.0-2-amd64 #1 SMP Debian 5.6.14-2 (2020-06-09) x86_64 GNU/Linux
root@grml ~ # grml-hostname grml-2020-06
Setting hostname to grml-2020-06: done
root@grml ~ # exec zsh
root@grml-2020-06 ~ # dpkg -l xfsprogs util-linux
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-=========================================
ii  util-linux     2.35.2-4     amd64        miscellaneous system utilities
ii  xfsprogs       5.6.0-1+b2   amd64        Utilities for managing the XFS filesystem
There it's failing, no matter which mount option we try:
root@grml-2020-06 ~ # mount ./sdd1.dd /mnt
mount: /mnt: mount(2) system call failed: Structure needs cleaning.
root@grml-2020-06 ~ # dmesg | tail -30
[...]
[   64.788640] XFS (loop1): SB stripe unit sanity check failed
[   64.788671] XFS (loop1): Metadata corruption detected at xfs_sb_read_verify+0x102/0x170 [xfs], xfs_sb block 0xffffffffffffffff
[   64.788671] XFS (loop1): Unmount and run xfs_repair
[   64.788672] XFS (loop1): First 128 bytes of corrupted metadata buffer:
[   64.788673] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 62 00  XFSB..........b.
[   64.788674] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[   64.788675] 00000020: 32 b6 dc 35 53 b7 44 96 9d 63 30 ab b3 2b 68 36  2..5S.D..c0..+h6
[   64.788675] 00000030: 00 00 00 00 00 00 40 08 00 00 00 00 00 00 01 00  ......@.........
[   64.788675] 00000040: 00 00 00 00 00 00 01 01 00 00 00 00 00 00 01 02  ................
[   64.788676] 00000050: 00 00 00 01 00 00 18 80 00 00 00 04 00 00 00 00  ................
[   64.788677] 00000060: 00 00 06 48 bd a5 10 00 08 00 00 02 00 00 00 00  ...H............
[   64.788677] 00000070: 00 00 00 00 00 00 00 00 0c 0c 0b 01 0d 00 00 19  ................
[   64.788679] XFS (loop1): SB validate failed with error -117.
root@grml-2020-06 ~ # mount -t xfs -o rw,relatime,attr2,inode64,sunit=1024,swidth=512,noquota ./sdd1.dd /mnt/
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/loop1, missing codepage or helper program, or other error.
32 root@grml-2020-06 ~ # dmesg | tail -1
[   66.342976] XFS (loop1): stripe width (512) must be a multiple of the stripe unit (1024)
root@grml-2020-06 ~ # mount -t xfs -o rw,relatime,attr2,inode64,sunit=512,swidth=512,noquota ./sdd1.dd /mnt/
mount: /mnt: mount(2) system call failed: Structure needs cleaning.
32 root@grml-2020-06 ~ # dmesg | tail -14
[   66.342976] XFS (loop1): stripe width (512) must be a multiple of the stripe unit (1024)
[   80.751277] XFS (loop1): SB stripe unit sanity check failed
[   80.751323] XFS (loop1): Metadata corruption detected at xfs_sb_read_verify+0x102/0x170 [xfs], xfs_sb block 0xffffffffffffffff 
[   80.751324] XFS (loop1): Unmount and run xfs_repair
[   80.751325] XFS (loop1): First 128 bytes of corrupted metadata buffer:
[   80.751327] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 62 00  XFSB..........b.
[   80.751328] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[   80.751330] 00000020: 32 b6 dc 35 53 b7 44 96 9d 63 30 ab b3 2b 68 36  2..5S.D..c0..+h6
[   80.751331] 00000030: 00 00 00 00 00 00 40 08 00 00 00 00 00 00 01 00  ......@.........
[   80.751331] 00000040: 00 00 00 00 00 00 01 01 00 00 00 00 00 00 01 02  ................
[   80.751332] 00000050: 00 00 00 01 00 00 18 80 00 00 00 04 00 00 00 00  ................
[   80.751333] 00000060: 00 00 06 48 bd a5 10 00 08 00 00 02 00 00 00 00  ...H............
[   80.751334] 00000070: 00 00 00 00 00 00 00 00 0c 0c 0b 01 0d 00 00 19  ................
[   80.751338] XFS (loop1): SB validate failed with error -117.
Also, xfs_repair doesn't help either:
root@grml-2020-06 ~ # xfs_info ./sdd1.dd
meta-data=./sdd1.dd              isize=2048   agcount=4, agsize=6272 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=25088, imaxpct=25
         =                       sunit=128    swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=1608, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
root@grml-2020-06 ~ # xfs_repair ./sdd1.dd
Phase 1 - find and verify superblock...
bad primary superblock - bad stripe width in superblock !!!
attempting to find secondary superblock...
..............................................................................................Sorry, could not find valid secondary superblock
Exiting now.
With the "SB stripe unit sanity check failed" message, we could easily track this down to the following commit fa4ca9c:
% git show fa4ca9c5574605d1e48b7e617705230a0640b6da | cat
commit fa4ca9c5574605d1e48b7e617705230a0640b6da
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Jun 5 10:06:16 2018 -0700
    
    xfs: catch bad stripe alignment configurations
    
    When stripe alignments are invalid, data alignment algorithms in the
    allocator may not work correctly. Ensure we catch superblocks with
    invalid stripe alignment setups at mount time. These data alignment
    mismatches are now detected at mount time like this:
    
    XFS (loop0): SB stripe unit sanity check failed
    XFS (loop0): Metadata corruption detected at xfs_sb_read_verify+0xab/0x110, xfs_sb block 0xffffffffffffffff
    XFS (loop0): Unmount and run xfs_repair
    XFS (loop0): First 128 bytes of corrupted metadata buffer:
    0000000091c2de02: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 10 00  XFSB............
    0000000023bff869: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00000000cdd8c893: 17 32 37 15 ff ca 46 3d 9a 17 d3 33 04 b5 f1 a2  .27...F=...3....
    000000009fd2844f: 00 00 00 00 00 00 00 04 00 00 00 00 00 00 06 d0  ................
    0000000088e9b0bb: 00 00 00 00 00 00 06 d1 00 00 00 00 00 00 06 d2  ................
    00000000ff233a20: 00 00 00 01 00 00 10 00 00 00 00 01 00 00 00 00  ................
    000000009db0ac8b: 00 00 03 60 e1 34 02 00 08 00 00 02 00 00 00 00  ... .4..........
    00000000f7022460: 00 00 00 00 00 00 00 00 0c 09 0b 01 0c 00 00 19  ................
    XFS (loop0): SB validate failed with error -117.
    
    And the mount fails.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
diff --git fs/xfs/libxfs/xfs_sb.c fs/xfs/libxfs/xfs_sb.c
index b5dca3c8c84d..c06b6fc92966 100644
--- fs/xfs/libxfs/xfs_sb.c
+++ fs/xfs/libxfs/xfs_sb.c
@@ -278,6 +278,22 @@ xfs_mount_validate_sb(
 		return -EFSCORRUPTED;
 	}
 
+	if (sbp->sb_unit) {
+		if (!xfs_sb_version_hasdalign(sbp) ||
+		    sbp->sb_unit > sbp->sb_width ||
+		    (sbp->sb_width % sbp->sb_unit) != 0) {
+			xfs_notice(mp, "SB stripe unit sanity check failed");
+			return -EFSCORRUPTED;
+		}
+	} else if (xfs_sb_version_hasdalign(sbp)) {
+		xfs_notice(mp, "SB stripe alignment sanity check failed");
+		return -EFSCORRUPTED;
+	} else if (sbp->sb_width) {
+		xfs_notice(mp, "SB stripe width sanity check failed");
+		return -EFSCORRUPTED;
+	}
+
 	if (xfs_sb_version_hascrc(&mp->m_sb) &&
 	    sbp->sb_blocksize < XFS_MIN_CRC_BLOCKSIZE) {
 		xfs_notice(mp, "v5 SB sanity check failed");
This change is included in kernel versions 4.18-rc1 and newer:
% git describe --contains fa4ca9c5574605d1e48
v4.18-rc1~37^2~14
Now let's try with an older kernel version (4.9.0), using the old Grml 2017.05 release:
root@grml ~ # grml-version
grml64-small 2017.05 Release Codename Freedatensuppe [2017-05-31]
root@grml ~ # uname -a
Linux grml 4.9.0-1-grml-amd64 #1 SMP Debian 4.9.29-1+grml.1 (2017-05-24) x86_64 GNU/Linux
root@grml ~ # lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 9.0 (stretch)
Release:        9.0
Codename:       stretch
root@grml ~ # grml-hostname grml-2017-05
Setting hostname to grml-2017-05: done
root@grml ~ # exec zsh
root@grml-2017-05 ~ #
root@grml-2017-05 ~ # xfs_info ./sdd1.dd
xfs_info: ./sdd1.dd is not a mounted XFS filesystem
1 root@grml-2017-05 ~ # xfs_repair ./sdd1.dd
Phase 1 - find and verify superblock...
bad primary superblock - bad stripe width in superblock !!!
attempting to find secondary superblock...
..............................................................................................Sorry, could not find valid secondary superblock
Exiting now.
1 root@grml-2017-05 ~ # mount ./sdd1.dd /mnt
root@grml-2017-05 ~ # mount -t xfs
/root/sdd1.dd on /mnt type xfs (rw,relatime,attr2,inode64,sunit=1024,swidth=512,noquota)
root@grml-2017-05 ~ # ls /mnt
activate.monmap  active  block  block_uuid  bluefs  ceph_fsid  fsid  keyring  kv_backend  magic  mkfs_done  ready  require_osd_release  systemd  type  whoami
root@grml-2017-05 ~ # xfs_info /mnt
meta-data=/dev/loop1             isize=2048   agcount=4, agsize=6272 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0 rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=25088, imaxpct=25
         =                       sunit=128    swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=1608, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Mounting there indeed works! Now, if we mount the filesystem with new and proper sunit/swidth settings using the older kernel, it should rewrite them on disk:
root@grml-2017-05 ~ # mount -t xfs -o sunit=512,swidth=512 ./sdd1.dd /mnt/
root@grml-2017-05 ~ # umount /mnt/
And indeed, mounting this rewritten filesystem then also works with newer kernels:
root@grml-2020-06 ~ # mount ./sdd1.rewritten /mnt/
root@grml-2020-06 ~ # xfs_info /root/sdd1.rewritten
meta-data=/dev/loop1             isize=2048   agcount=4, agsize=6272 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=25088, imaxpct=25
         =                       sunit=64    swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=1608, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
root@grml-2020-06 ~ # mount -t xfs                
/root/sdd1.rewritten on /mnt type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=512,swidth=512,noquota)
FTR: The sunit=512,swidth=512 from the xfs mount options is identical to xfs_info's output sunit=64,swidth=64, because mount.xfs's sunit value is given in 512-byte units (see man 5 xfs), while the xfs_info output reported here is in filesystem blocks with a block size (bsize) of 4096; so sunit=512 corresponds to 512 * 512 = 262144 bytes, which is the same as 64 * 4096 bytes. mkfs uses the minimum and optimal I/O sizes for stripe unit and stripe width; you can check this e.g. via the following (note that server2 with the fixed firmware version reports proper values, whereas server3 with the broken controller firmware reports nonsense):
synpromika@server2 ~ % for i in /sys/block/sd*/queue/ ; do printf "%s: %s %s\n" "$i" "$(cat "$i"/minimum_io_size)" "$(cat "$i"/optimal_io_size)" ; done
[...]
/sys/block/sdc/queue/: 262144 262144
/sys/block/sdd/queue/: 262144 262144
/sys/block/sde/queue/: 262144 262144
/sys/block/sdf/queue/: 262144 262144
/sys/block/sdg/queue/: 262144 262144
/sys/block/sdh/queue/: 262144 262144
/sys/block/sdi/queue/: 262144 262144
/sys/block/sdj/queue/: 262144 262144
/sys/block/sdk/queue/: 262144 262144
/sys/block/sdl/queue/: 262144 262144
/sys/block/sdm/queue/: 262144 262144
/sys/block/sdn/queue/: 262144 262144
[...]
synpromika@server3 ~ % for i in /sys/block/sd*/queue/ ; do printf "%s: %s %s\n" "$i" "$(cat "$i"/minimum_io_size)" "$(cat "$i"/optimal_io_size)" ; done
[...]
/sys/block/sdc/queue/: 524288 262144
/sys/block/sdd/queue/: 524288 262144
/sys/block/sde/queue/: 524288 262144
/sys/block/sdf/queue/: 524288 262144
/sys/block/sdg/queue/: 524288 262144
/sys/block/sdh/queue/: 524288 262144
/sys/block/sdi/queue/: 524288 262144
/sys/block/sdj/queue/: 524288 262144
/sys/block/sdk/queue/: 524288 262144
/sys/block/sdl/queue/: 524288 262144
/sys/block/sdm/queue/: 524288 262144
/sys/block/sdn/queue/: 524288 262144
[...]
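As a quick check on other systems, a small loop over the block devices can flag this inconsistency directly. This is a sketch, not part of the original procedure; device naming may differ on your setup:
# Sketch: flag block devices whose reported minimum I/O size exceeds the
# optimal I/O size, the inconsistency that led to the bogus XFS stripe settings.
for q in /sys/block/sd*/queue ; do
  min=$(cat "$q"/minimum_io_size)
  opt=$(cat "$q"/optimal_io_size)
  if [ "$opt" -gt 0 ] && [ "$min" -gt "$opt" ] ; then
    echo "${q%/queue}: iomin=$min > ioopt=$opt"
  fi
done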
This is the underlying reason why the initially created XFS partitions were created with incorrect sunit/swidth settings. The broken firmware of server1 and server3 was the cause of the incorrect settings: they were ignored by old(er) xfs/kernel versions, but treated as an error by new ones. Make sure to also read the XFS FAQ regarding "How to calculate the correct sunit,swidth values for optimal performance". We also stumbled upon two interesting reads in Red Hat's knowledge base: 5075561 + 2150101 (requires an active subscription, though) and #1835947. Am I affected? How to work around it? To check whether your XFS mount points are affected by this issue, the following command line should be useful:
awk '$3 == "xfs" {print $2}' /proc/self/mounts | while read mount ; do echo -n "$mount " ; xfs_info $mount | awk '$0 ~ "swidth" {gsub(/.*=/,"",$2); gsub(/.*=/,"",$3); print $2,$3}' | awk '{ if ($1 > $2) print "impacted"; else print "OK" }' ; done
If you run into the above situation, the only known solution to get your original XFS partition working again is to boot into an older kernel version (4.17 or older), mount the XFS partition with correct sunit/swidth settings, and then boot back into your new system (kernel-version-wise). Lessons learned Thanks: Darshaka Pathirana, Chris Hofstaedtler and Michael Hanscho. Looking for help with your IT infrastructure? Let us know!

5 April 2021

Anton Gladky: How to vote in Debian using the command line only

Currently, Debian has two votes running: the DPL election 2021 and the GR about RMS. If you want to use only the command line for sending the filled-in ballot per email, there are a couple of helpers. Let us assume that you have got the ballot, filled it in and saved it as vote.txt.

Signed-only message If it is acceptable for you to send a signed-only message (not encrypted), use the following snippets:
  • DPL-vote:
cat vote.txt | gpg --clearsign | mail leader2021@vote.debian.org
  • GR-rms-vote:
cat vote.txt | gpg --clearsign | mail gr_rms@vote.debian.org

Signed and encrypted message If you wish to encrypt the message, the public key which is attached to the ballot should be imported with
gpg --import public_key.asc
Then you can vote.
  • DPL-vote:
cat vote.txt | gpg --encrypt --armor -s -r leader2021@vote.debian.org | mail leader2021@vote.debian.org
  • GR-rms-vote:
cat vote.txt | gpg --encrypt --armor -s -r gr_rms@vote.debian.org | mail gr_rms@vote.debian.org
You can pass some more parameters to mail, such as the "From:" field and a reply-to address:
... | mail gr_rms@vote.debian.org -a "From: Max Mustermann <max.mustermann@debian.org>" -r max.mustermann@debian.org
Hope that helps.

31 March 2021

Timo Jyrinki: MotionPhoto / MicroVideo File Formats on Pixel Phones

Google Pixel phones support what they call Motion Photo which is essentially a photo with a short video clip attached to it. They are quite nice since they bring the moment alive, especially as the capturing of the video starts a small moment before the shutter button is pressed. For most viewing programs they simply show as static JPEG photos, but there is more to the files.
I'd really love proper Shotwell support for these file formats, so I posted a longish explanation with many of the details in this blog post to a ticket there too. Examples of the newer format are linked there too.
Info posted to Shotwell ticket

There are actually two different formats, an old one that is already obsolete, and a newer current format. The older ones are those that your Pixel phone recorded as MVIMG_[datetime].jpg, and they have the following meta-data:
Xmp.GCamera.MicroVideo                       XmpText     1  1
Xmp.GCamera.MicroVideoVersion XmpText 1 1
Xmp.GCamera.MicroVideoOffset XmpText 7 4022143
Xmp.GCamera.MicroVideoPresentationTimestampUs XmpText 7 1331607
The offset is actually from the end of the file, so one needs to calculate accordingly. But it is exact otherwise, so one can simply extract the video with that meta-data information:
#!/bin/bash
#
# Extracts the microvideo from a MVIMG_*.jpg file

# The offset is from the ending of the file, so calculate accordingly
offset=$(exiv2 -p X "$1" | grep MicroVideoOffset | sed 's/.*\"\(.*\)"/\1/')
filesize=$(du --apparent-size --block=1 "$1" | sed 's/^\([0-9]*\).*/\1/')
extractposition=$(expr $filesize - $offset)
echo offset: $offset
echo filesize: $filesize
echo extractposition=$extractposition
dd if="$1" skip=1 bs=$extractposition of="$(basename -s .jpg $1).mp4"
The newer format is recorded in filenames called PXL_[datetime].MP.jpg, and they have a _lot_ of additional metadata:
Xmp.GCamera.MotionPhoto                      XmpText     1  1
Xmp.GCamera.MotionPhotoVersion XmpText 1 1
Xmp.GCamera.MotionPhotoPresentationTimestampUs XmpText 6 233320
Xmp.xmpNote.HasExtendedXMP XmpText 32 E1F7505D2DD64EA6948D2047449F0FFA
Xmp.Container.Directory XmpText 0 type="Seq"
Xmp.Container.Directory[1] XmpText 0 type="Struct"
Xmp.Container.Directory[1]/Container:Item XmpText 0 type="Struct"
Xmp.Container.Directory[1]/Container:Item/Item:Mime XmpText 10 image/jpeg
Xmp.Container.Directory[1]/Container:Item/Item:Semantic XmpText 7 Primary
Xmp.Container.Directory[1]/Container:Item/Item:Length XmpText 1 0
Xmp.Container.Directory[1]/Container:Item/Item:Padding XmpText 1 0
Xmp.Container.Directory[2] XmpText 0 type="Struct"
Xmp.Container.Directory[2]/Container:Item XmpText 0 type="Struct"
Xmp.Container.Directory[2]/Container:Item/Item:Mime XmpText 9 video/mp4
Xmp.Container.Directory[2]/Container:Item/Item:Semantic XmpText 11 MotionPhoto
Xmp.Container.Directory[2]/Container:Item/Item:Length XmpText 7 1679555
Xmp.Container.Directory[2]/Container:Item/Item:Padding XmpText 1 0
Sounds like fun, and lots of information. However, I didn't see why the length in the first item is 0, and I didn't see how to use the latter Length info. But I can use the mp4 headers to extract it:
#!/bin/bash
#
# Extracts the motion part of a MotionPhoto file PXL_*.MP.mp4

extractposition=$(grep --binary --byte-offset --only-matching --text \
-P "\x00\x00\x00\x18\x66\x74\x79\x70\x6d\x70\x34\x32" "$1" | sed 's/^\([0-9]*\).*/\1/')

dd if="$1" skip=1 bs=$extractposition of="$(basename -s .jpg $1).mp4"
UPDATE: I wrote most of this blog post earlier. When now actually getting to publish it a week later, I see the obvious, i.e. the Length is again simply the offset from the end of the file, so one could take the same, less brute-force approach as for MVIMG. I'll leave the above as is, however, for the sake of the binary grepping. (Cross-posted to my other blog.)
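For reference, the less brute-force approach hinted at in the update could look roughly like the following sketch. It is untested and assumes that exiv2 reports the video item's Item:Length as shown in the metadata dump above:
#!/bin/bash
#
# Sketch: extract the motion part of a PXL_*.MP.jpg file using the XMP
# Item:Length of the video item as an offset from the end of the file.
# Untested; assumes the exiv2 output format shown in the dump above.

offset=$(exiv2 -p x "$1" | awk '/Directory\[2\].*Item:Length/ {print $NF}')
filesize=$(stat -c %s "$1")
extractposition=$((filesize - offset))
dd if="$1" skip=1 bs=$extractposition of="$(basename -s .jpg "$1").mp4"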

29 March 2021

Benjamin Mako Hill: Identifying Underproduced Software

I wrote this blog post with Kaylea Champion and a version of this post was originally posted on the Community Data Science Collective blog. Critical software we all rely on can silently crumble away beneath us. Unfortunately, we often don't find out that software infrastructure is in poor condition until it is too late. Over the last year or so, I have been supporting Kaylea Champion on a project my group announced earlier to measure software underproduction, a term we use to describe software that is low in quality but high in importance. Underproduction reflects an important type of risk in widely used free/libre open source software (FLOSS) because participants often choose their own projects and tasks. Because FLOSS contributors work as volunteers and choose what they work on, important projects aren't always the ones to which FLOSS developers devote the most attention. Even when developers want to work on important projects, relative neglect among important projects is often difficult for FLOSS contributors to see. Given all this, what can we do to detect problems in FLOSS infrastructure before major failures occur? Kaylea Champion and I recently published a paper laying out our new method for measuring underproduction at the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER) 2021 that we believe provides one important answer to this question.

A conceptual diagram of underproduction. The x-axis shows relative importance, the y-axis relative quality. The top left area of the graph described by these axes is 'overproduction' -- high quality, low importance. The diagonal is Alignment: quality and importance are approximately the same. The lower right depicts underproduction -- high importance, low quality -- the area of potential risk. Conceptual diagram showing how our conception of underproduction relates to quality and importance of software.
In the paper, we describe a general approach for detecting underproduced software infrastructure that consists of five steps: (1) identifying a body of digital infrastructure (like a code repository); (2) identifying a measure of quality (like the time it takes to fix bugs); (3) identifying a measure of importance (like install base); (4) specifying a hypothesized relationship linking quality and importance if quality and importance are in perfect alignment; and (5) quantifying deviation from this theoretical baseline to find relative underproduction. To show how our method works in practice, we applied the technique to an important collection of FLOSS infrastructure: 21,902 packages in the Debian GNU/Linux distribution. Although there are many ways to measure quality, we used a measure of how quickly Debian maintainers have historically dealt with the 461,656 bugs that have been filed over the last three decades. To measure importance, we used data from Debian's Popularity Contest opt-in survey. After some statistical machinations that are documented in our paper, the result was an estimate of relative underproduction for the 21,902 packages in Debian we looked at. One of our key findings is that underproduction is very common in Debian. By our estimates, at least 4,327 packages in Debian are underproduced. As you can see in the list of the most underproduced packages (again, as estimated using just one measure), many of the most at-risk packages are associated with the desktop and windowing environments, where there are many users but also many extremely tricky integration-related bugs.
This table shows the 30 packages with the most severe underproduction problem in Debian, shown as a series of boxplots. These 30 packages have the highest level of underproduction in Debian according to our analysis.
We hope these results are useful to folks at Debian and the Debian QA team. We also hope that the basic method we've laid out is something that others will build off of in other contexts and apply to other software repositories.
In addition to the paper itself and the video of the conference presentation on YouTube by Kaylea, we've put a repository with all our code and data in an archival repository, Harvard Dataverse, and we'd love to work with others interested in applying our approach to other software ecosystems.

For more details, check out the full paper which is available as a freely accessible preprint.

This project was supported by the Ford/Sloan Digital Infrastructure Initiative. Wm Salt Hale of the Community Data Science Collective and Debian Developers Paul Wise and Don Armstrong provided valuable assistance in accessing and interpreting Debian bug data. René Just generously provided insight and feedback on the manuscript.

Paper Citation: Kaylea Champion and Benjamin Mako Hill. 2021. Underproduction: An Approach for Measuring Risk in Open Source Software. In Proceedings of the IEEE International Conference on Software Analysis, Evolution and Reengineering (SANER 2021). IEEE.

Contact Kaylea Champion (kaylea@uw.edu) with any questions or if you are interested in following up.
