Search Results: "rrs"

1 April 2024

Colin Watson: Free software activity in March 2024

My Debian contributions this month were all sponsored by Freexian.

3 March 2024

Petter Reinholdtsen: RAID status from LSI Megaraid controllers using free software

The last few days I have revisited RAID setup using the LSI Megaraid controller. These are a family of controllers called PERC by Dell, and is present in several old PowerEdge servers, and I recently got my hands on one of these. I had forgotten how to handle this RAID controller in Debian, so I had to take a peek in the Debian wiki page "Linux and Hardware RAID: an administrator's summary" to remember what kind of software is available to configure and monitor the disks and controller. I prefer Free Software alternatives to proprietary tools, as the later tend to fall into disarray once the manufacturer loose interest, and often do not work with newer Linux Distributions. Sadly there is no free software tool to configure the RAID setup, only to monitor it. RAID can provide improved reliability and resilience in a storage solution, but only if it is being regularly checked and any broken disks are being replaced in time. I thus want to ensure some automatic monitoring is available. In the discovery process, I came across a old free software tool to monitor PERC2, PERC3, PERC4 and PERC5 controllers, which to my surprise is not present in debian. To help change that I created a request for packaging of the megactl package, and tried to track down a usable version. The original project site is on Sourceforge, but as far as I can tell that project has been dead for more than 15 years. I managed to find a more recent fork on github from user hmage, but it is unclear to me if this is still being maintained. It has not seen much improvements since 2016. A more up to date edition is a git fork from the original github fork by user namiltd, and this newer fork seem a lot more promising. The owner of this github repository has replied to change proposals within hours, and had already added some improvements and support for more hardware. Sadly he is reluctant to commit to maintaining the tool and stated in my first pull request that he think a new release should be made based on the git repository owned by hmage. I perfectly understand this reluctance, as I feel the same about maintaining yet another package in Debian when I barely have time to take care of the ones I already maintain, but do not really have high hopes that hmage will have time to spend on it and hope namiltd will change his mind. In any case, I created a draft package based on the namiltd edition and put it under the debian group on salsa.debian.org. If you own a Dell PowerEdge server with one of the PERC controllers, or any other RAID controller using the megaraid or megaraid_sas Linux kernel modules, you might want to check it out. If enough people are interested, perhaps the package will make it into the Debian archive. There are two tools provided, megactl for the megaraid Linux kernel module, and megasasctl for the megaraid_sas Linux kernel module. The simple output from the command on one of my machines look like this (yes, I know some of the disks have problems. :).
# megasasctl 
a0       PERC H730 Mini           encl:1 ldrv:2  batt:good
a0d0       558GiB RAID 1   1x2  optimal
a0d1      3067GiB RAID 0   1x11 optimal
a0e32s0     558GiB  a0d0  online   errs: media:0  other:19
a0e32s1     279GiB  a0d1  online  
a0e32s2     279GiB  a0d1  online  
a0e32s3     279GiB  a0d1  online  
a0e32s4     279GiB  a0d1  online  
a0e32s5     279GiB  a0d1  online  
a0e32s6     279GiB  a0d1  online  
a0e32s8     558GiB  a0d0  online   errs: media:0  other:17
a0e32s9     279GiB  a0d1  online  
a0e32s10    279GiB  a0d1  online  
a0e32s11    279GiB  a0d1  online  
a0e32s12    279GiB  a0d1  online  
a0e32s13    279GiB  a0d1  online  
#
In addition to displaying a simple status report, it can also test individual drives and print the various event logs. Perhaps you too find it useful? In the packaging process I provided some patches upstream to improve installation and ensure a Appstream metainfo file is provided to list all supported HW, to allow isenkram to propose the package on all servers with a relevant PCI card. As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

13 March 2023

Antoine Beaupr : Framework 12th gen laptop review

The Framework is a 13.5" laptop body with swappable parts, which makes it somewhat future-proof and certainly easily repairable, scoring an "exceedingly rare" 10/10 score from ifixit.com. There are two generations of the laptop's main board (both compatible with the same body): the Intel 11th and 12th gen chipsets. I have received my Framework, 12th generation "DIY", device in late September 2022 and will update this page as I go along in the process of ordering, burning-in, setting up and using the device over the years. Overall, the Framework is a good laptop. I like the keyboard, the touch pad, the expansion cards. Clearly there's been some good work done on industrial design, and it's the most repairable laptop I've had in years. Time will tell, but it looks sturdy enough to survive me many years as well. This is also one of the most powerful devices I ever lay my hands on. I have managed, remotely, more powerful servers, but this is the fastest computer I have ever owned, and it fits in this tiny case. It is an amazing machine. On the downside, there's a bit of proprietary firmware required (WiFi, Bluetooth, some graphics) and the Framework ships with a proprietary BIOS, with currently no Coreboot support. Expect to need the latest kernel, firmware, and hacking around a bunch of things to get resolution and keybindings working right. Like others, I have first found significant power management issues, but many issues can actually be solved with some configuration. Some of the expansion ports (HDMI, DP, MicroSD, and SSD) use power when idle, so don't expect week-long suspend, or "full day" battery while those are plugged in. Finally, the expansion ports are nice, but there's only four of them. If you plan to have a two-monitor setup, you're likely going to need a dock. Read on for the detailed review. For context, I'm moving from the Purism Librem 13v4 because it basically exploded on me. I had, in the meantime, reverted back to an old ThinkPad X220, so I sometimes compare the Framework with that venerable laptop as well. This blog post has been maturing for months now. It started in September 2022 and I declared it completed in March 2023. It's the longest single article on this entire website, currently clocking at about 13,000 words. It will take an average reader a full hour to go through this thing, so I don't expect anyone to actually do that. This introduction should be good enough for most people, read the first section if you intend to actually buy a Framework. Jump around the table of contents as you see fit for after you did buy the laptop, as it might include some crucial hints on how to make it work best for you, especially on (Debian) Linux.

Advice for buyers Those are things I wish I would have known before buying:
  1. consider buying 4 USB-C expansion cards, or at least a mix of 4 USB-A or USB-C cards, as they use less power than other cards and you do want to fill those expansion slots otherwise they snag around and feel insecure
  2. you will likely need a dock or at least a USB hub if you want a two-monitor setup, otherwise you'll run out of ports
  3. you have to do some serious tuning to get proper (10h+ idle, 10 days suspend) power savings
  4. in particular, beware that the HDMI, DisplayPort and particularly the SSD and MicroSD cards take a significant amount power, even when sleeping, up to 2-6W for the latter two
  5. beware that the MicroSD card is what it says: Micro, normal SD cards won't fit, and while there might be full sized one eventually, it's currently only at the prototyping stage
  6. the Framework monitor has an unusual aspect ratio (3:2): I like it (and it matches classic and digital photography aspect ratio), but it might surprise you

Current status I have the framework! It's setup with a fresh new Debian bookworm installation. I've ran through a large number of tests and burn in. I have decided to use the Framework as my daily driver, and had to buy a USB-C dock to get my two monitors connected, which was own adventure. Update: Framework just (2023-03-23) just announced a whole bunch of new stuff: The recording is available in this video and it's not your typical keynote. It starts ~25 minutes late, audio is crap, lightning and camera are crap, clapping seems to be from whatever staff they managed to get together in a room, decor is bizarre, colors are shit. It's amazing.

Specifications Those are the specifications of the 12th gen, in general terms. Your build will of course vary according to your needs.
  • CPU: i5-1240P, i7-1260P, or i7-1280P (Up to 4.4-4.8 GHz, 4+8 cores), Iris Xe graphics
  • Storage: 250-4000GB NVMe (or bring your own)
  • Memory: 8-64GB DDR4-3200 (or bring your own)
  • WiFi 6e (AX210, vPro optional, or bring your own)
  • 296.63mm X 228.98mm X 15.85mm, 1.3Kg
  • 13.5" display, 3:2 ratio, 2256px X 1504px, 100% sRGB, >400 nit
  • 4 x USB-C user-selectable expansion ports, including
    • USB-C
    • USB-A
    • HDMI
    • DP
    • Ethernet
    • MicroSD
    • 250-1000GB SSD
  • 3.5mm combo headphone jack
  • Kill switches for microphone and camera
  • Battery: 55Wh
  • Camera: 1080p 60fps
  • Biometrics: Fingerprint Reader
  • Backlit keyboard
  • Power Adapter: 60W USB-C (or bring your own)
  • ships with a screwdriver/spludger
  • 1 year warranty
  • base price: 1000$CAD, but doesn't give you much, typical builds around 1500-2000$CAD

Actual build This is the actual build I ordered. Amounts in CAD. (1CAD = ~0.75EUR/USD.)

Base configuration
  • CPU: Intel Core i5-1240P (AKA Alder Lake P 8 4.4GHz P-threads, 8 3.2GHz E-threads, 16 total, 28-64W), 1079$
  • Memory: 16GB (1 x 16GB) DDR4-3200, 104$

Customization
  • Keyboard: US English, included

Expansion Cards
  • 2 USB-C $24
  • 3 USB-A $36
  • 2 HDMI $50
  • 1 DP $50
  • 1 MicroSD $25
  • 1 Storage 1TB $199
  • Sub-total: 384$

Accessories
  • Power Adapter - US/Canada $64.00

Total
  • Before tax: 1606$
  • After tax and duties: 1847$
  • Free shipping

Quick evaluation This is basically the TL;DR: here, just focusing on broad pros/cons of the laptop.

Pros

Cons
  • the 11th gen is out of stock, except for the higher-end CPUs, which are much less affordable (700$+)
  • the 12th gen has compatibility issues with Debian, followup in the DebianOn page, but basically: brightness hotkeys, power management, wifi, the webcam is okay even though the chipset is the infamous alder lake because it does not have the fancy camera; most issues currently seem solvable, and upstream is working with mainline to get their shit working
  • 12th gen might have issues with thunderbolt docks
  • they used to have some difficulty keeping up with the orders: first two batches shipped, third batch sold out, fourth batch should have shipped (?) in October 2021. they generally seem to keep up with shipping. update (august 2022): they rolled out a second line of laptops (12th gen), first batch shipped, second batch shipped late, September 2022 batch was generally on time, see this spreadsheet for a crowdsourced effort to track those supply chain issues seem to be under control as of early 2023. I got the Ethernet expansion card shipped within a week.
  • compared to my previous laptop (Purism Librem 13v4), it feels strangely bulkier and heavier; it's actually lighter than the purism (1.3kg vs 1.4kg) and thinner (15.85mm vs 18mm) but the design of the Purism laptop (tapered edges) makes it feel thinner
  • no space for a 2.5" drive
  • rather bright LED around power button, but can be dimmed in the BIOS (not low enough to my taste) I got used to it
  • fan quiet when idle, but can be noisy when running, for example if you max a CPU for a while
  • battery described as "mediocre" by Ars Technica (above), confirmed poor in my tests (see below)
  • no RJ-45 port, and attempts at designing ones are failing because the modular plugs are too thin to fit (according to Linux After Dark), so unlikely to have one in the future Update: they cracked that nut and ship an 2.5 gbps Ethernet expansion card with a realtek chipset, without any firmware blob (!)
  • a bit pricey for the performance, especially when compared to the competition (e.g. Dell XPS, Apple M1)
  • 12th gen Intel has glitchy graphics, seems like Intel hasn't fully landed proper Linux support for that chipset yet

Initial hardware setup A breeze.

Accessing the board The internals are accessed through five TorX screws, but there's a nice screwdriver/spudger that works well enough. The screws actually hold in place so you can't even lose them. The first setup is a bit counter-intuitive coming from the Librem laptop, as I expected the back cover to lift and give me access to the internals. But instead the screws is release the keyboard and touch pad assembly, so you actually need to flip the laptop back upright and lift the assembly off (!) to get access to the internals. Kind of scary. I also actually unplugged a connector in lifting the assembly because I lifted it towards the monitor, while you actually need to lift it to the right. Thankfully, the connector didn't break, it just snapped off and I could plug it back in, no harm done. Once there, everything is well indicated, with QR codes all over the place supposedly leading to online instructions.

Bad QR codes Unfortunately, the QR codes I tested (in the expansion card slot, the memory slot and CPU slots) did not actually work so I wonder how useful those actually are. After all, they need to point to something and that means a URL, a running website that will answer those requests forever. I bet those will break sooner than later and in fact, as far as I can tell, they just don't work at all. I prefer the approach taken by the MNT reform here which designed (with the 100 rabbits folks) an actual paper handbook (PDF). The first QR code that's immediately visible from the back of the laptop, in an expansion cord slot, is a 404. It seems to be some serial number URL, but I can't actually tell because, well, the page is a 404. I was expecting that bar code to lead me to an introduction page, something like "how to setup your Framework laptop". Support actually confirmed that it should point a quickstart guide. But in a bizarre twist, they somehow sent me the URL with the plus (+) signs escaped, like this:
https://guides.frame.work/Guide/Framework\+Laptop\+DIY\+Edition\+Quick\+Start\+Guide/57
... which Firefox immediately transforms in:
https://guides.frame.work/Guide/Framework/+Laptop/+DIY/+Edition/+Quick/+Start/+Guide/57
I'm puzzled as to why they would send the URL that way, the proper URL is of course:
https://guides.frame.work/Guide/Framework+Laptop+DIY+Edition+Quick+Start+Guide/57
(They have also "let the team know about this for feedback and help resolve the problem with the link" which is a support code word for "ha-ha! nope! not my problem right now!" Trust me, I know, my own code word is "can you please make a ticket?")

Seating disks and memory The "DIY" kit doesn't actually have that much of a setup. If you bought RAM, it's shipped outside the laptop in a little plastic case, so you just seat it in as usual. Then you insert your NVMe drive, and, if that's your fancy, you also install your own mPCI WiFi card. If you ordered one (which was my case), it's pre-installed. Closing the laptop is also kind of amazing, because the keyboard assembly snaps into place with magnets. I have actually used the laptop with the keyboard unscrewed as I was putting the drives in and out, and it actually works fine (and will probably void your warranty, so don't do that). (But you can.) (But don't, really.)

Hardware review

Keyboard and touch pad The keyboard feels nice, for a laptop. I'm used to mechanical keyboard and I'm rather violent with those poor things. Yet the key travel is nice and it's clickety enough that I don't feel too disoriented. At first, I felt the keyboard as being more laggy than my normal workstation setup, but it turned out this was a graphics driver issues. After enabling a composition manager, everything feels snappy. The touch pad feels good. The double-finger scroll works well enough, and I don't have to wonder too much where the middle button is, it just works. Taps don't work, out of the box: that needs to be enabled in Xorg, with something like this:
cat > /etc/X11/xorg.conf.d/40-libinput.conf <<EOF
Section "InputClass"
      Identifier "libinput touch pad catchall"
      MatchIsTouchpad "on"
      MatchDevicePath "/dev/input/event*"
      Driver "libinput"
      Option "Tapping" "on"
      Option "TappingButtonMap" "lmr"
EndSection
EOF
But be aware that once you enable that tapping, you'll need to deal with palm detection... So I have not actually enabled this in the end.

Power button The power button is a little dangerous. It's quite easy to hit, as it's right next to one expansion card where you are likely to plug in a cable power. And because the expansion cards are kind of hard to remove, you might squeeze the laptop (and the power key) when trying to remove the expansion card next to the power button. So obviously, don't do that. But that's not very helpful. An alternative is to make the power button do something else. With systemd-managed systems, it's actually quite easy. Add a HandlePowerKey stanza to (say) /etc/systemd/logind.conf.d/power-suspends.conf:
[Login]
HandlePowerKey=suspend
HandlePowerKeyLongPress=poweroff
You might have to create the directory first:
mkdir /etc/systemd/logind.conf.d/
Then restart logind:
systemctl restart systemd-logind
And the power button will suspend! Long-press to power off doesn't actually work as the laptop immediately suspends... Note that there's probably half a dozen other ways of doing this, see this, this, or that.

Special keybindings There is a series of "hidden" (as in: not labeled on the key) keybindings related to the fn keybinding that I actually find quite useful.
Key Equivalent Effect Command
p Pause lock screen xset s activate
b Break ? ?
k ScrLk switch keyboard layout N/A
It looks like those are defined in the microcontroller so it would be possible to add some. For example, the SysRq key is almost bound to fn s in there. Note that most other shortcuts like this are clearly documented (volume, brightness, etc). One key that's less obvious is F12 that only has the Framework logo on it. That actually calls the keysym XF86AudioMedia which, interestingly, does absolutely nothing here. By default, on Windows, it opens your browser to the Framework website and, on Linux, your "default media player". The keyboard backlight can be cycled with fn-space. The dimmer version is dim enough, and the keybinding is easy to find in the dark. A skinny elephant would be performed with alt PrtScr (above F11) KEY, so for example alt fn F11 b should do a hard reset. This comment suggests you need to hold the fn only if "function lock" is on, but that's actually the opposite of my experience. Out of the box, some of the fn keys don't work. Mute, volume up/down, brightness, monitor changes, and the airplane mode key all do basically nothing. They don't send proper keysyms to Xorg at all. This is a known problem and it's related to the fact that the laptop has light sensors to adjust the brightness automatically. Somehow some of those keys (e.g. the brightness controls) are supposed to show up as a different input device, but don't seem to work correctly. It seems like the solution is for the Framework team to write a driver specifically for this, but so far no progress since July 2022. In the meantime, the fancy functionality can be supposedly disabled with:
echo 'blacklist hid_sensor_hub'   sudo tee /etc/modprobe.d/framework-als-blacklist.conf
... and a reboot. This solution is also documented in the upstream guide. Note that there's another solution flying around that fixes this by changing permissions on the input device but I haven't tested that or seen confirmation it works.

Kill switches The Framework has two "kill switches": one for the camera and the other for the microphone. The camera one actually disconnects the USB device when turned off, and the mic one seems to cut the circuit. It doesn't show up as muted, it just stops feeding the sound. Both kill switches are around the main camera, on top of the monitor, and quite discreet. Then turn "red" when enabled (i.e. "red" means "turned off").

Monitor The monitor looks pretty good to my untrained eyes. I have yet to do photography work on it, but some photos I looked at look sharp and the colors are bright and lively. The blacks are dark and the screen is bright. I have yet to use it in full sunlight. The dimmed light is very dim, which I like.

Screen backlight I bind brightness keys to xbacklight in i3, but out of the box I get this error:
sep 29 22:09:14 angela i3[5661]: No outputs have backlight property
It just requires this blob in /etc/X11/xorg.conf.d/backlight.conf:
Section "Device"
    Identifier  "Card0"
    Driver      "intel"
    Option      "Backlight"  "intel_backlight"
EndSection
This way I can control the actual backlight power with the brightness keys, and they do significantly reduce power usage.

Multiple monitor support I have been able to hook up my two old monitors to the HDMI and DisplayPort expansion cards on the laptop. The lid closes without suspending the machine, and everything works great. I actually run out of ports, even with a 4-port USB-A hub, which gives me a total of 7 ports:
  1. power (USB-C)
  2. monitor 1 (DisplayPort)
  3. monitor 2 (HDMI)
  4. USB-A hub, which adds:
  5. keyboard (USB-A)
  6. mouse (USB-A)
  7. Yubikey
  8. external sound card
Now the latter, I might be able to get rid of if I switch to a combo-jack headset, which I do have (and still need to test). But still, this is a problem. I'll probably need a powered USB-C dock and better monitors, possibly with some Thunderbolt chaining, to save yet more ports. But that means more money into this setup, argh. And figuring out my monitor situation is the kind of thing I'm not that big of a fan of. And neither is shopping for USB-C (or is it Thunderbolt?) hubs. My normal autorandr setup doesn't work: I have tried saving a profile and it doesn't get autodetected, so I also first need to do:
autorandr -l framework-external-dual-lg-acer
The magic:
autorandr -l horizontal
... also works well. The worst problem with those monitors right now is that they have a radically smaller resolution than the main screen on the laptop, which means I need to reset the font scaling to normal every time I switch back and forth between those monitors and the laptop, which means I actually need to do this:
autorandr -l horizontal &&
eho Xft.dpi: 96   xrdb -merge &&
systemctl restart terminal xcolortaillog background-image emacs &&
i3-msg restart
Kind of disruptive.

Expansion ports I ordered a total of 10 expansion ports. I did manage to initialize the 1TB drive as an encrypted storage, mostly to keep photos as this is something that takes a massive amount of space (500GB and counting) and that I (unfortunately) don't work on very often (but still carry around). The expansion ports are fancy and nice, but not actually that convenient. They're a bit hard to take out: you really need to crimp your fingernails on there and pull hard to take them out. There's a little button next to them to release, I think, but at first it feels a little scary to pull those pucks out of there. You get used to it though, and it's one of those things you can do without looking eventually. There's only four expansion ports. Once you have two monitors, the drive, and power plugged in, bam, you're out of ports; there's nowhere to plug my Yubikey. So if this is going to be my daily driver, with a dual monitor setup, I will need a dock, which means more crap firmware and uncertainty, which isn't great. There are actually plans to make a dual-USB card, but that is blocked on designing an actual board for this. I can't wait to see more expansion ports produced. There's a ethernet expansion card which quickly went out of stock basically the day it was announced, but was eventually restocked. I would like to see a proper SD-card reader. There's a MicroSD card reader, but that obviously doesn't work for normal SD cards, which would be more broadly compatible anyways (because you can have a MicroSD to SD card adapter, but I have never heard of the reverse). Someone actually found a SD card reader that fits and then someone else managed to cram it in a 3D printed case, which is kind of amazing. Still, I really like that idea that I can carry all those little adapters in a pouch when I travel and can basically do anything I want. It does mean I need to shuffle through them to find the right one which is a little annoying. I have an elastic band to keep them lined up so that all the ports show the same side, to make it easier to find the right one. But that quickly gets undone and instead I have a pouch full of expansion cards. Another awesome thing with the expansion cards is that they don't just work on the laptop: anything that takes USB-C can take those cards, which means you can use it to connect an SD card to your phone, for backups, for example. Heck, you could even connect an external display to your phone that way, assuming that's supported by your phone of course (and it probably isn't). The expansion ports do take up some power, even when idle. See the power management section below, and particularly the power usage tests for details.

USB-C charging One thing that is really a game changer for me is USB-C charging. It's hard to overstate how convenient this is. I often have a USB-C cable lying around to charge my phone, and I can just grab that thing and pop it in my laptop. And while it will obviously not charge as fast as the provided charger, it will stop draining the battery at least. (As I wrote this, I had the laptop plugged in the Samsung charger that came with a phone, and it was telling me it would take 6 hours to charge the remaining 15%. With the provided charger, that flew down to 15 minutes. Similarly, I can power the laptop from the power grommet on my desk, reducing clutter as I have that single wire out there instead of the bulky power adapter.) I also really like the idea that I can charge my laptop with a power bank or, heck, with my phone, if push comes to shove. (And vice-versa!) This is awesome. And it works from any of the expansion ports, of course. There's a little led next to the expansion ports as well, which indicate the charge status:
  • red/amber: charging
  • white: charged
  • off: unplugged
I couldn't find documentation about this, but the forum answered. This is something of a recurring theme with the Framework. While it has a good knowledge base and repair/setup guides (and the forum is awesome) but it doesn't have a good "owner manual" that shows you the different parts of the laptop and what they do. Again, something the MNT reform did well. Another thing that people are asking about is an external sleep indicator: because the power LED is on the main keyboard assembly, you don't actually see whether the device is active or not when the lid is closed. Finally, I wondered what happens when you plug in multiple power sources and it turns out the charge controller is actually pretty smart: it will pick the best power source and use it. The only downside is it can't use multiple power sources, but that seems like a bit much to ask.

Multimedia and other devices Those things also work:
  • webcam: splendid, best webcam I've ever had (but my standards are really low)
  • onboard mic: works well, good gain (maybe a bit much)
  • onboard speakers: sound okay, a little metal-ish, loud enough to be annoying, see this thread for benchmarks, apparently pretty good speakers
  • combo jack: works, with slight hiss, see below
There's also a light sensor, but it conflicts with the keyboard brightness controls (see above). There's also an accelerometer, but it's off by default and will be removed from future builds.

Combo jack mic tests The Framework laptop ships with a combo jack on the left side, which allows you to plug in a CTIA (source) headset. In human terms, it's a device that has both a stereo output and a mono input, typically a headset or ear buds with a microphone somewhere. It works, which is better than the Purism (which only had audio out), but is on par for the course for that kind of onboard hardware. Because of electrical interference, such sound cards very often get lots of noise from the board. With a Jabra Evolve 40, the built-in USB sound card generates basically zero noise on silence (invisible down to -60dB in Audacity) while plugging it in directly generates a solid -30dB hiss. There is a noise-reduction system in that sound card, but the difference is still quite striking. On a comparable setup (curie, a 2017 Intel NUC), there is also a his with the Jabra headset, but it's quieter, more in the order of -40/-50 dB, a noticeable difference. Interestingly, testing with my Mee Audio Pro M6 earbuds leads to a little more hiss on curie, more on the -35/-40 dB range, close to the Framework. Also note that another sound card, the Antlion USB adapter that comes with the ModMic 4, also gives me pretty close to silence on a quiet recording, picking up less than -50dB of background noise. It's actually probably picking up the fans in the office, which do make audible noises. In other words, the hiss of the sound card built in the Framework laptop is so loud that it makes more noise than the quiet fans in the office. Or, another way to put it is that two USB sound cards (the Jabra and the Antlion) are able to pick up ambient noise in my office but not the Framework laptop. See also my audio page.

Performance tests

Compiling Linux 5.19.11 On a single core, compiling the Debian version of the Linux kernel takes around 100 minutes:
5411.85user 673.33system 1:37:46elapsed 103%CPU (0avgtext+0avgdata 831700maxresident)k
10594704inputs+87448000outputs (9131major+410636783minor)pagefaults 0swaps
This was using 16 watts of power, with full screen brightness. With all 16 cores (make -j16), it takes less than 25 minutes:
19251.06user 2467.47system 24:13.07elapsed 1494%CPU (0avgtext+0avgdata 831676maxresident)k
8321856inputs+87427848outputs (30792major+409145263minor)pagefaults 0swaps
I had to plug the normal power supply after a few minutes because battery would actually run out using my desk's power grommet (34 watts). During compilation, fans were spinning really hard, quite noisy, but not painfully so. The laptop was sucking 55 watts of power, steadily:
  Time    User  Nice   Sys  Idle    IO  Run Ctxt/s  IRQ/s Fork Exec Exit  Watts
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
 Average  87.9   0.0  10.7   1.4   0.1 17.8 6583.6 5054.3 233.0 223.9 233.1  55.96
 GeoMean  87.9   0.0  10.6   1.2   0.0 17.6 6427.8 5048.1 227.6 218.7 227.7  55.96
  StdDev   1.4   0.0   1.2   0.6   0.2  3.0 1436.8  255.5 50.0 47.5 49.7   0.20
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
 Minimum  85.0   0.0   7.8   0.5   0.0 13.0 3594.0 4638.0 117.0 111.0 120.0  55.52
 Maximum  90.8   0.0  12.9   3.5   0.8 38.0 10174.0 5901.0 374.0 362.0 375.0  56.41
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
Summary:
CPU:  55.96 Watts on average with standard deviation 0.20
Note: power read from RAPL domains: package-0, uncore, package-0, core, psys.
These readings do not cover all the hardware in this device.

memtest86+ I ran Memtest86+ v6.00b3. It shows something like this:
Memtest86+ v6.00b3        12th Gen Intel(R) Core(TM) i5-1240P
CLK/Temp: 2112MHz    78/78 C   Pass  2% #
L1 Cache:   48KB    414 GB/s   Test 46% ##################
L2 Cache: 1.25MB    118 GB/s   Test #3 [Moving inversions, 1s & 0s] 
L3 Cache:   12MB     43 GB/s   Testing: 16GB - 18GB [1GB of 15.7GB]
Memory  :  15.7GB  14.9 GB/s   Pattern: 
--------------------------------------------------------------------------------
CPU: 4P+8E-Cores (16T)    SMP: 8T (PAR))    Time:  0:27:23  Status: Pass     \
RAM: 1600MHz (DDR4-3200) CAS 22-22-22-51    Pass:  1        Errors: 0
--------------------------------------------------------------------------------
Memory SPD Information
----------------------
 - Slot 2: 16GB DDR-4-3200 - Crucial CT16G4SFRA32A.C16FP (2022-W23)
                          Framework FRANMACP04
 <ESC> Exit  <F1> Configuration  <Space> Scroll Lock            6.00.unknown.x64
So about 30 minutes for a full 16GB memory test.

Software setup Once I had everything in the hardware setup, I figured, voil , I'm done, I'm just going to boot this beautiful machine and I can get back to work. I don't understand why I am so na ve some times. It's mind boggling. Obviously, it didn't happen that way at all, and I spent the best of the three following days tinkering with the laptop.

Secure boot and EFI First, I couldn't boot off of the NVMe drive I transferred from the previous laptop (the Purism) and the BIOS was not very helpful: it was just complaining about not finding any boot device, without dropping me in the real BIOS. At first, I thought it was a problem with my NVMe drive, because it's not listed in the compatible SSD drives from upstream. But I figured out how to enter BIOS (press F2 manically, of course), which showed the NVMe drive was actually detected. It just didn't boot, because it was an old (2010!!) Debian install without EFI. So from there, I disabled secure boot, and booted a grml image to try to recover. And by "boot" I mean, I managed to get to the grml boot loader which promptly failed to load its own root file system somehow. I still have to investigate exactly what happened there, but it failed some time after the initrd load with:
Unable to find medium containing a live file system
This, it turns out, was fixed in Debian lately, so a daily GRML build will not have this problems. The upcoming 2022 release (likely 2022.10 or 2022.11) will also get the fix. I did manage to boot the development version of the Debian installer which was a surprisingly good experience: it mounted the encrypted drives and did everything pretty smoothly. It even offered me to reinstall the boot loader, but that ultimately (and correctly, as it turns out) failed because I didn't have a /boot/efi partition. At this point, I realized there was no easy way out of this, and I just proceeded to completely reinstall Debian. I had a spare NVMe drive lying around (backups FTW!) so I just swapped that in, rebooted in the Debian installer, and did a clean install. I wanted to switch to bookworm anyways, so I guess that's done too.

Storage limitations Another thing that happened during setup is that I tried to copy over the internal 2.5" SSD drive from the Purism to the Framework 1TB expansion card. There's no 2.5" slot in the new laptop, so that's pretty much the only option for storage expansion. I was tired and did something wrong. I ended up wiping the partition table on the original 2.5" drive. Oops. It might be recoverable, but just restoring the partition table didn't work either, so I'm not sure how I recover the data there. Normally, everything on my laptops and workstations is designed to be disposable, so that wasn't that big of a problem. I did manage to recover most of the data thanks to git-annex reinit, but that was a little hairy.

Bootstrapping Puppet Once I had some networking, I had to install all the packages I needed. The time I spent setting up my workstations with Puppet has finally paid off. What I actually did was to restore two critical directories:
/etc/ssh
/var/lib/puppet
So that I would keep the previous machine's identity. That way I could contact the Puppet server and install whatever was missing. I used my Puppet optimization trick to do a batch install and then I had a good base setup, although not exactly as it was before. 1700 packages were installed manually on angela before the reinstall, and not in Puppet. I did not inspect each one individually, but I did go through /etc and copied over more SSH keys, for backups and SMTP over SSH.

LVFS support It looks like there's support for the (de-facto) standard LVFS firmware update system. At least I was able to update the UEFI firmware with a simple:
apt install fwupd-amd64-signed
fwupdmgr refresh
fwupdmgr get-updates
fwupdmgr update
Nice. The 12th gen BIOS updates, currently (January 2023) beta, can be deployed through LVFS with:
fwupdmgr enable-remote lvfs-testing
echo 'DisableCapsuleUpdateOnDisk=true' >> /etc/fwupd/uefi_capsule.conf 
fwupdmgr update
Those instructions come from the beta forum post. I performed the BIOS update on 2023-01-16T16:00-0500.

Resolution tweaks The Framework laptop resolution (2256px X 1504px) is big enough to give you a pretty small font size, so welcome to the marvelous world of "scaling". The Debian wiki page has a few tricks for this.

Console This will make the console and grub fonts more readable:
cat >> /etc/default/console-setup <<EOF
FONTFACE="Terminus"
FONTSIZE=32x16
EOF
echo GRUB_GFXMODE=1024x768 >> /etc/default/grub
update-grub

Xorg Adding this to your .Xresources will make everything look much bigger:
! 1.5*96
Xft.dpi: 144
Apparently, some of this can also help:
! These might also be useful depending on your monitor and personal preference:
Xft.autohint: 0
Xft.lcdfilter:  lcddefault
Xft.hintstyle:  hintfull
Xft.hinting: 1
Xft.antialias: 1
Xft.rgba: rgb
It my experience it also makes things look a little fuzzier, which is frustrating because you have this awesome monitor but everything looks out of focus. Just bumping Xft.dpi by a 1.5 factor looks good to me. The Debian Wiki has a page on HiDPI, but it's not as good as the Arch Wiki, where the above blurb comes from. I am not using the latter because I suspect it's causing some of the "fuzziness". TODO: find the equivalent of this GNOME hack in i3? (gsettings set org.gnome.mutter experimental-features "['scale-monitor-framebuffer']"), taken from this Framework guide

Issues

BIOS configuration The Framework BIOS has some minor issues. One issue I personally encountered is that I had disabled Quick boot and Quiet boot in the BIOS to diagnose the above boot issues. This, in turn, triggers a bug where the BIOS boot manager (F12) would just hang completely. It would also fail to boot from an external USB drive. The current fix (as of BIOS 3.03) is to re-enable both Quick boot and Quiet boot. Presumably this is something that will get fixed in a future BIOS update. Note that the following keybindings are active in the BIOS POST check:
Key Meaning
F2 Enter BIOS setup menu
F12 Enter BIOS boot manager
Delete Enter BIOS setup menu

WiFi compatibility issues I couldn't make WiFi work at first. Obviously, the default Debian installer doesn't ship with proprietary firmware (although that might change soon) so the WiFi card didn't work out of the box. But even after copying the firmware through a USB stick, I couldn't quite manage to find the right combination of ip/iw/wpa-supplicant (yes, after repeatedly copying a bunch more packages over to get those bootstrapped). (Next time I should probably try something like this post.) Thankfully, I had a little USB-C dongle with a RJ-45 jack lying around. That also required a firmware blob, but it was a single package to copy over, and with that loaded, I had network. Eventually, I did managed to make WiFi work; the problem was more on the side of "I forgot how to configure a WPA network by hand from the commandline" than anything else. NetworkManager worked fine and got WiFi working correctly. Note that this is with Debian bookworm, which has the 5.19 Linux kernel, and with the firmware-nonfree (firmware-iwlwifi, specifically) package.

Battery life I was having between about 7 hours of battery on the Purism Librem 13v4, and that's after a year or two of battery life. Now, I still have about 7 hours of battery life, which is nicer than my old ThinkPad X220 (20 minutes!) but really, it's not that good for a new generation laptop. The 12th generation Intel chipset probably improved things compared to the previous one Framework laptop, but I don't have a 11th gen Framework to compare with). (Note that those are estimates from my status bar, not wall clock measurements. They should still be comparable between the Purism and Framework, that said.) The battery life doesn't seem up to, say, Dell XPS 13, ThinkPad X1, and of course not the Apple M1, where I would expect 10+ hours of battery life out of the box. That said, I do get those kind estimates when the machine is fully charged and idle. In fact, when everything is quiet and nothing is plugged in, I get dozens of hours of battery life estimated (I've seen 25h!). So power usage fluctuates quite a bit depending on usage, which I guess is expected. Concretely, so far, light web browsing, reading emails and writing notes in Emacs (e.g. this file) takes about 8W of power:
Time    User  Nice   Sys  Idle    IO  Run Ctxt/s  IRQ/s Fork Exec Exit  Watts
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
 Average   1.7   0.0   0.5  97.6   0.2  1.2 4684.9 1985.2 126.6 39.1 128.0   7.57
 GeoMean   1.4   0.0   0.4  97.6   0.1  1.2 4416.6 1734.5 111.6 27.9 113.3   7.54
  StdDev   1.0   0.2   0.2   1.2   0.0  0.5 1584.7 1058.3 82.1 44.0 80.2   0.71
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
 Minimum   0.2   0.0   0.2  94.9   0.1  1.0 2242.0  698.2 82.0 17.0 82.0   6.36
 Maximum   4.1   1.1   1.0  99.4   0.2  3.0 8687.4 4445.1 463.0 249.0 449.0   9.10
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
Summary:
System:   7.57 Watts on average with standard deviation 0.71
Expansion cards matter a lot in the battery life (see below for a thorough discussion), my normal setup is 2xUSB-C and 1xUSB-A (yes, with an empty slot, and yes, to save power). Interestingly, playing a video in a (720p) window in a window takes up more power (10.5W) than in full screen (9.5W) but I blame that on my desktop setup (i3 + compton)... Not sure if mpv hits the VA-API, maybe not in windowed mode. Similar results with 1080p, interestingly, except the window struggles to keep up altogether. Full screen playback takes a relatively comfortable 9.5W, which means a solid 5h+ of playback, which is fine by me. Fooling around the web, small edits, youtube-dl, and I'm at around 80% battery after about an hour, with an estimated 5h left, which is a little disappointing. I had a 7h remaining estimate before I started goofing around Discourse, so I suspect the website is a pretty big battery drain, actually. I see about 10-12 W, while I was probably at half that (6-8W) just playing music with mpv in the background... In other words, it looks like editing posts in Discourse with Firefox takes a solid 4-6W of power. Amazing and gross. (When writing about abusive power usage generates more power usage, is that an heisenbug? Or schr dinbug?)

Power management Compared to the Purism Librem 13v4, the ongoing power usage seems to be slightly better. An anecdotal metric is that the Purism would take 800mA idle, while the more powerful Framework manages a little over 500mA as I'm typing this, fluctuating between 450 and 600mA. That is without any active expansion card, except the storage. Those numbers come from the output of tlp-stat -b and, unfortunately, the "ampere" unit makes it quite hard to compare those, because voltage is not necessarily the same between the two platforms.
  • TODO: review Arch Linux's tips on power saving
  • TODO: i915 driver has a lot of parameters, including some about power saving, see, again, the arch wiki, and particularly enable_fbc=1
TL:DR; power management on the laptop is an issue, but there's various tweaks you can make to improve it. Try:
  • powertop --auto-tune
  • apt install tlp && systemctl enable tlp
  • nvme.noacpi=1 mem_sleep_default=deep on the kernel command line may help with standby power usage
  • keep only USB-C expansion cards plugged in, all others suck power even when idle
  • consider upgrading the BIOS to latest beta (3.06 at the time of writing), unverified power savings
  • latest Linux kernels (6.2) promise power savings as well (unverified)
Update: also try to follow the official optimization guide. It was made for Ubuntu but will probably also work for your distribution of choice with a few tweaks. They recommend using tlpui but it's not packaged in Debian. There is, however, a Flatpak release. In my case, it resulted in the following diff to tlp.conf: tlp.patch.

Background on CPU architecture There were power problems in the 11th gen Framework laptop, according to this report from Linux After Dark, so the issues with power management on the Framework are not new. The 12th generation Intel CPU (AKA "Alder Lake") is a big-little architecture with "power-saving" and "performance" cores. There used to be performance problems introduced by the scheduler in Linux 5.16 but those were eventually fixed in 5.18, which uses Intel's hardware as an "intelligent, low-latency hardware-assisted scheduler". According to Phoronix, the 5.19 release improved the power saving, at the cost of some penalty cost. There were also patch series to make the scheduler configurable, but it doesn't look those have been merged as of 5.19. There was also a session about this at the 2022 Linux Plumbers, but they stopped short of talking more about the specific problems Linux is facing in Alder lake:
Specifically, the kernel's energy-aware scheduling heuristics don't work well on those CPUs. A number of features present there complicate the energy picture; these include SMT, Intel's "turbo boost" mode, and the CPU's internal power-management mechanisms. For many workloads, running on an ostensibly more power-hungry Pcore can be more efficient than using an Ecore. Time for discussion of the problem was lacking, though, and the session came to a close.
All this to say that the 12gen Intel line shipped with this Framework series should have better power management thanks to its power-saving cores. And Linux has had the scheduler changes to make use of this (but maybe is still having trouble). In any case, this might not be the source of power management problems on my laptop, quite the opposite. Also note that the firmware updates for various chipsets are supposed to improve things eventually. On the other hand, The Verge simply declared the whole P-series a mistake...

Attempts at improving power usage I did try to follow some of the tips in this forum post. The tricks powertop --auto-tune and tlp's PCIE_ASPM_ON_BAT=powersupersave basically did nothing: I was stuck at 10W power usage in powertop (600+mA in tlp-stat). Apparently, I should be able to reach the C8 CPU power state (or even C9, C10) in powertop, but I seem to be stock at C7. (Although I'm not sure how to read that tab in powertop: in the Core(HW) column there's only C3/C6/C7 states, and most cores are 85% in C7 or maybe C6. But the next column over does show many CPUs in C10 states... As it turns out, the graphics card actually takes up a good chunk of power unless proper power management is enabled (see below). After tweaking this, I did manage to get down to around 7W power usage in powertop. Expansion cards actually do take up power, and so does the screen, obviously. The fully-lit screen takes a solid 2-3W of power compared to the fully dimmed screen. When removing all expansion cards and making the laptop idle, I can spin it down to 4 watts power usage at the moment, and an amazing 2 watts when the screen turned off.

Caveats Abusive (10W+) power usage that I initially found could be a problem with my desktop configuration: I have this silly status bar that updates every second and probably causes redraws... The CPU certainly doesn't seem to spin down below 1GHz. Also note that this is with an actual desktop running with everything: it could very well be that some things (I'm looking at you Signal Desktop) take up unreasonable amount of power on their own (hello, 1W/electron, sheesh). Syncthing and containerd (Docker!) also seem to take a good 500mW just sitting there. Beyond my desktop configuration, this could, of course, be a Debian-specific problem; your favorite distribution might be better at power management.

Idle power usage tests Some expansion cards waste energy, even when unused. Here is a summary of the findings from the powerstat page. I also include other devices tested in this page for completeness:
Device Minimum Average Max Stdev Note
Screen, 100% 2.4W 2.6W 2.8W N/A
Screen, 1% 30mW 140mW 250mW N/A
Backlight 1 290mW ? ? ? fairly small, all things considered
Backlight 2 890mW 1.2W 3W? 460mW? geometric progression
Backlight 3 1.69W 1.5W 1.8W? 390mW? significant power use
Radios 100mW 250mW N/A N/A
USB-C N/A N/A N/A N/A negligible power drain
USB-A 10mW 10mW ? 10mW almost negligible
DisplayPort 300mW 390mW 600mW N/A not passive
HDMI 380mW 440mW 1W? 20mW not passive
1TB SSD 1.65W 1.79W 2W 12mW significant, probably higher when busy
MicroSD 1.6W 3W 6W 1.93W highest power usage, possibly even higher when busy
Ethernet 1.69W 1.64W 1.76W N/A comparable to the SSD card
So it looks like all expansion cards but the USB-C ones are active, i.e. they draw power with idle. The USB-A cards are the least concern, sucking out 10mW, pretty much within the margin of error. But both the DisplayPort and HDMI do take a few hundred miliwatts. It looks like USB-A connectors have this fundamental flaw that they necessarily draw some powers because they lack the power negotiation features of USB-C. At least according to this post:
It seems the USB A must have power going to it all the time, that the old USB 2 and 3 protocols, the USB C only provides power when there is a connection. Old versus new.
Apparently, this is a problem specific to the USB-C to USB-A adapter that ships with the Framework. Some people have actually changed their orders to all USB-C because of this problem, but I'm not sure the problem is as serious as claimed in the forums. I couldn't reproduce the "one watt" power drains suggested elsewhere, at least not repeatedly. (A previous version of this post did show such a power drain, but it was in a less controlled test environment than the series of more rigorous tests above.) The worst offenders are the storage cards: the SSD drive takes at least one watt of power and the MicroSD card seems to want to take all the way up to 6 watts of power, both just sitting there doing nothing. This confirms claims of 1.4W for the SSD (but not 5W) power usage found elsewhere. The former post has instructions on how to disable the card in software. The MicroSD card has been reported as using 2 watts, but I've seen it as high as 6 watts, which is pretty damning. The Framework team has a beta update for the DisplayPort adapter but currently only for Windows (LVFS technically possible, "under investigation"). A USB-A firmware update is also under investigation. It is therefore likely at least some of those power management issues will eventually be fixed. Note that the upcoming Ethernet card has a reported 2-8W power usage, depending on traffic. I did my own power usage tests in powerstat-wayland and they seem lower than 2W. The upcoming 6.2 Linux kernel might also improve battery usage when idle, see this Phoronix article for details, likely in early 2023.

Idle power usage tests under Wayland Update: I redid those tests under Wayland, see powerstat-wayland for details. The TL;DR: is that power consumption is either smaller or similar.

Idle power usage tests, 3.06 beta BIOS I redid the idle tests after the 3.06 beta BIOS update and ended up with this results:
Device Minimum Average Max Stdev Note
Baseline 1.96W 2.01W 2.11W 30mW 1 USB-C, screen off, backlight off, no radios
2 USB-C 1.95W 2.16W 3.69W 430mW USB-C confirmed as mostly passive...
3 USB-C 1.95W 2.16W 3.69W 430mW ... although with extra stdev
1TB SSD 3.72W 3.85W 4.62W 200mW unchanged from before upgrade
1 USB-A 1.97W 2.18W 4.02W 530mW unchanged
2 USB-A 1.97W 2.00W 2.08W 30mW unchanged
3 USB-A 1.94W 1.99W 2.03W 20mW unchanged
MicroSD w/o card 3.54W 3.58W 3.71W 40mW significant improvement! 2-3W power saving!
MicroSD w/ card 3.53W 3.72W 5.23W 370mW new measurement! increased deviation
DisplayPort 2.28W 2.31W 2.37W 20mW unchanged
1 HDMI 2.43W 2.69W 4.53W 460mW unchanged
2 HDMI 2.53W 2.59W 2.67W 30mW unchanged
External USB 3.85W 3.89W 3.94W 30mW new result
Ethernet 3.60W 3.70W 4.91W 230mW unchanged
Note that the table summary is different than the previous table: here we show the absolute numbers while the previous table was doing a confusing attempt at showing relative (to the baseline) numbers. Conclusion: the 3.06 BIOS update did not significantly change idle power usage stats except for the MicroSD card which has significantly improved. The new "external USB" test is also interesting: it shows how the provided 1TB SSD card performs (admirably) compared to existing devices. The other new result is the MicroSD card with a card which, interestingly, uses less power than the 1TB SSD drive.

Standby battery usage I wrote some quick hack to evaluate how much power is used during sleep. Apparently, this is one of the areas that should have improved since the first Framework model, let's find out. My baseline for comparison is the Purism laptop, which, in 10 minutes, went from this:
sep 28 11:19:45 angela systemd-sleep[209379]: /sys/class/power_supply/BAT/charge_now                      =   6045 [mAh]
... to this:
sep 28 11:29:47 angela systemd-sleep[209725]: /sys/class/power_supply/BAT/charge_now                      =   6037 [mAh]
That's 8mAh per 10 minutes (and 2 seconds), or 48mA, or, with this battery, about 127 hours or roughly 5 days of standby. Not bad! In comparison, here is my really old x220, before:
sep 29 22:13:54 emma systemd-sleep[176315]: /sys/class/power_supply/BAT0/energy_now                     =   5070 [mWh]
... after:
sep 29 22:23:54 emma systemd-sleep[176486]: /sys/class/power_supply/BAT0/energy_now                     =   4980 [mWh]
... which is 90 mwH in 10 minutes, or a whopping 540mA, which was possibly okay when this battery was new (62000 mAh, so about 100 hours, or about 5 days), but this battery is almost dead and has only 5210 mAh when full, so only 10 hours standby. And here is the Framework performing a similar test, before:
sep 29 22:27:04 angela systemd-sleep[4515]: /sys/class/power_supply/BAT1/charge_full                    =   3518 [mAh]
sep 29 22:27:04 angela systemd-sleep[4515]: /sys/class/power_supply/BAT1/charge_now                     =   2861 [mAh]
... after:
sep 29 22:37:08 angela systemd-sleep[4743]: /sys/class/power_supply/BAT1/charge_now                     =   2812 [mAh]
... which is 49mAh in a little over 10 minutes (and 4 seconds), or 292mA, much more than the Purism, but half of the X220. At this rate, the battery would last on standby only 12 hours!! That is pretty bad. Note that this was done with the following expansion cards:
  • 2 USB-C
  • 1 1TB SSD drive
  • 1 USB-A with a hub connected to it, with keyboard and LAN
Preliminary tests without the hub (over one minute) show that it doesn't significantly affect this power consumption (300mA). This guide also suggests booting with nvme.noacpi=1 but this still gives me about 5mAh/min (or 300mA). Adding mem_sleep_default=deep to the kernel command line does make a difference. Before:
sep 29 23:03:11 angela systemd-sleep[3699]: /sys/class/power_supply/BAT1/charge_now                     =   2544 [mAh]
... after:
sep 29 23:04:25 angela systemd-sleep[4039]: /sys/class/power_supply/BAT1/charge_now                     =   2542 [mAh]
... which is 2mAh in 74 seconds, which is 97mA, brings us to a more reasonable 36 hours, or a day and a half. It's still above the x220 power usage, and more than an order of magnitude more than the Purism laptop. It's also far from the 0.4% promised by upstream, which would be 14mA for the 3500mAh battery. It should also be noted that this "deep" sleep mode is a little more disruptive than regular sleep. As you can see by the timing, it took more than 10 seconds for the laptop to resume, which feels a little alarming as your banging the keyboard to bring it back to life. You can confirm the current sleep mode with:
# cat /sys/power/mem_sleep
s2idle [deep]
In the above, deep is selected. You can change it on the fly with:
printf s2idle > /sys/power/mem_sleep
Here's another test:
sep 30 22:25:50 angela systemd-sleep[32207]: /sys/class/power_supply/BAT1/charge_now                     =   1619 [mAh]
sep 30 22:31:30 angela systemd-sleep[32516]: /sys/class/power_supply/BAT1/charge_now                     =   1613 [mAh]
... better! 6 mAh in about 6 minutes, works out to 63.5mA, so more than two days standby. A longer test:
oct 01 09:22:56 angela systemd-sleep[62978]: /sys/class/power_supply/BAT1/charge_now                     =   3327 [mAh]
oct 01 12:47:35 angela systemd-sleep[63219]: /sys/class/power_supply/BAT1/charge_now                     =   3147 [mAh]
That's 180mAh in about 3.5h, 52mA! Now at 66h, or almost 3 days. I wasn't sure why I was seeing such fluctuations in those tests, but as it turns out, expansion card power tests show that they do significantly affect power usage, especially the SSD drive, which can take up to two full watts of power even when idle. I didn't control for expansion cards in the above tests running them with whatever card I had plugged in without paying attention so it's likely the cause of the high power usage and fluctuations. It might be possible to work around this problem by disabling USB devices before suspend. TODO. See also this post. In the meantime, I have been able to get much better suspend performance by unplugging all modules. Then I get this result:
oct 04 11:15:38 angela systemd-sleep[257571]: /sys/class/power_supply/BAT1/charge_now                     =   3203 [mAh]
oct 04 15:09:32 angela systemd-sleep[257866]: /sys/class/power_supply/BAT1/charge_now                     =   3145 [mAh]
Which is 14.8mA! Almost exactly the number promised by Framework! With a full battery, that means a 10 days suspend time. This is actually pretty good, and far beyond what I was expecting when starting down this journey. So, once the expansion cards are unplugged, suspend power usage is actually quite reasonable. More detailed standby tests are available in the standby-tests page, with a summary below. There is also some hope that the Chromebook edition specifically designed with a specification of 14 days standby time could bring some firmware improvements back down to the normal line. Some of those issues were reported upstream in April 2022, but there doesn't seem to have been any progress there since. TODO: one final solution here is suspend-then-hibernate, which Windows uses for this TODO: consider implementing the S0ix sleep states , see also troubleshooting TODO: consider https://github.com/intel/pm-graph

Standby expansion cards test results This table is a summary of the more extensive standby-tests I have performed:
Device Wattage Amperage Days Note
baseline 0.25W 16mA 9 sleep=deep nvme.noacpi=1
s2idle 0.29W 18.9mA ~7 sleep=s2idle nvme.noacpi=1
normal nvme 0.31W 20mA ~7 sleep=s2idle without nvme.noacpi=1
1 USB-C 0.23W 15mA ~10
2 USB-C 0.23W 14.9mA same as above
1 USB-A 0.75W 48.7mA 3 +500mW (!!) for the first USB-A card!
2 USB-A 1.11W 72mA 2 +360mW
3 USB-A 1.48W 96mA <2 +370mW
1TB SSD 0.49W 32mA <5 +260mW
MicroSD 0.52W 34mA ~4 +290mW
DisplayPort 0.85W 55mA <3 +620mW (!!)
1 HDMI 0.58W 38mA ~4 +250mW
2 HDMI 0.65W 42mA <4 +70mW (?)
Conclusions:
  • USB-C cards take no extra power on suspend, possibly less than empty slots, more testing required
  • USB-A cards take a lot more power on suspend (300-500mW) than on regular idle (~10mW, almost negligible)
  • 1TB SSD and MicroSD cards seem to take a reasonable amount of power (260-290mW), compared to their runtime equivalents (1-6W!)
  • DisplayPort takes a surprising lot of power (620mW), almost double its average runtime usage (390mW)
  • HDMI cards take, surprisingly, less power (250mW) in standby than the DP card (620mW)
  • and oddly, a second card adds less power usage (70mW?!) than the first, maybe a circuit is used by both?
A discussion of those results is in this forum post.

Standby expansion cards test results, 3.06 beta BIOS Framework recently (2022-11-07) announced that they will publish a firmware upgrade to address some of the USB-C issues, including power management. This could positively affect the above result, improving both standby and runtime power usage. The update came out in December 2022 and I redid my analysis with the following results:
Device Wattage Amperage Days Note
baseline 0.25W 16mA 9 no cards, same as before upgrade
1 USB-C 0.25W 16mA 9 same as before
2 USB-C 0.25W 16mA 9 same
1 USB-A 0.80W 62mA 3 +550mW!! worse than before
2 USB-A 1.12W 73mA <2 +320mW, on top of the above, bad!
Ethernet 0.62W 40mA 3-4 new result, decent
1TB SSD 0.52W 34mA 4 a bit worse than before (+2mA)
MicroSD 0.51W 22mA 4 same
DisplayPort 0.52W 34mA 4+ upgrade improved by 300mW
1 HDMI ? 38mA ? same
2 HDMI ? 45mA ? a bit worse than before (+3mA)
Normal 1.08W 70mA ~2 Ethernet, 2 USB-C, USB-A
Full results in standby-tests-306. The big takeaway for me is that the update did not improve power usage on the USB-A ports which is a big problem for my use case. There is a notable improvement on the DisplayPort power consumption which brings it more in line with the HDMI connector, but it still doesn't properly turn off on suspend either. Even worse, the USB-A ports now sometimes fails to resume after suspend, which is pretty annoying. This is a known problem that will hopefully get fixed in the final release.

Battery wear protection The BIOS has an option to limit charge to 80% to mitigate battery wear. There's a way to control the embedded controller from runtime with fw-ectool, partly documented here. The command would be:
sudo ectool fwchargelimit 80
I looked at building this myself but failed to run it. I opened a RFP in Debian so that we can ship this in Debian, and also documented my work there. Note that there is now a counter that tracks charge/discharge cycles. It's visible in tlp-stat -b, which is a nice improvement:
root@angela:/home/anarcat# tlp-stat -b
--- TLP 1.5.0 --------------------------------------------
+++ Battery Care
Plugin: generic
Supported features: none available
+++ Battery Status: BAT1
/sys/class/power_supply/BAT1/manufacturer                   = NVT
/sys/class/power_supply/BAT1/model_name                     = Framewo
/sys/class/power_supply/BAT1/cycle_count                    =      3
/sys/class/power_supply/BAT1/charge_full_design             =   3572 [mAh]
/sys/class/power_supply/BAT1/charge_full                    =   3541 [mAh]
/sys/class/power_supply/BAT1/charge_now                     =   1625 [mAh]
/sys/class/power_supply/BAT1/current_now                    =    178 [mA]
/sys/class/power_supply/BAT1/status                         = Discharging
/sys/class/power_supply/BAT1/charge_control_start_threshold = (not available)
/sys/class/power_supply/BAT1/charge_control_end_threshold   = (not available)
Charge                                                      =   45.9 [%]
Capacity                                                    =   99.1 [%]
One thing that is still missing is the charge threshold data (the (not available) above). There's been some work to make that accessible in August, stay tuned? This would also make it possible implement hysteresis support.

Ethernet expansion card The Framework ethernet expansion card is a fancy little doodle: "2.5Gbit/s and 10/100/1000Mbit/s Ethernet", the "clear housing lets you peek at the RTL8156 controller that powers it". Which is another way to say "we didn't completely finish prod on this one, so it kind of looks like we 3D-printed this in the shop".... The card is a little bulky, but I guess that's inevitable considering the RJ-45 form factor when compared to the thin Framework laptop. I have had a serious issue when trying it at first: the link LEDs just wouldn't come up. I made a full bug report in the forum and with upstream support, but eventually figured it out on my own. It's (of course) a power saving issue: if you reboot the machine, the links come up when the laptop is running the BIOS POST check and even when the Linux kernel boots. I first thought that the problem is likely related to the powertop service which I run at boot time to tweak some power saving settings. It seems like this:
echo 'on' > '/sys/bus/usb/devices/4-2/power/control'
... is a good workaround to bring the card back online. You can even return to power saving mode and the card will still work:
echo 'auto' > '/sys/bus/usb/devices/4-2/power/control'
Further research by Matt_Hartley from the Framework Team found this issue in the tlp tracker that shows how the USB_AUTOSUSPEND setting enables the power saving even if the driver doesn't support it, which, in retrospect, just sounds like a bad idea. To quote that issue:
By default, USB power saving is active in the kernel, but not force-enabled for incompatible drivers. That is, devices that support suspension will suspend, drivers that do not, will not.
So the fix is actually to uninstall tlp or disable that setting by adding this to /etc/tlp.conf:
USB_AUTOSUSPEND=0
... but that disables auto-suspend on all USB devices, which may hurt other power usage performance. I have found that a a combination of:
USB_AUTOSUSPEND=1
USB_DENYLIST="0bda:8156"
and this on the kernel commandline:
usbcore.quirks=0bda:8156:k
... actually does work correctly. I now have this in my /etc/default/grub.d/framework-tweaks.cfg file:
# net.ifnames=0: normal interface names ffs (e.g. eth0, wlan0, not wlp166
s0)
# nvme.noacpi=1: reduce SSD disk power usage (not working)
# mem_sleep_default=deep: reduce power usage during sleep (not working)
# usbcore.quirk is a workaround for the ethernet card suspend bug: https:
//guides.frame.work/Guide/Fedora+37+Installation+on+the+Framework+Laptop/
108?lang=en
GRUB_CMDLINE_LINUX="net.ifnames=0 nvme.noacpi=1 mem_sleep_default=deep usbcore.quirks=0bda:8156:k"
# fix the resolution in grub for fonts to not be tiny
GRUB_GFXMODE=1024x768
Other than that, I haven't been able to max out the card because I don't have other 2.5Gbit/s equipment at home, which is strangely satisfying. But running against my Turris Omnia router, I could pretty much max a gigabit fairly easily:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.09 GBytes   937 Mbits/sec  238             sender
[  5]   0.00-10.00  sec  1.09 GBytes   934 Mbits/sec                  receiver
The card doesn't require any proprietary firmware blobs which is surprising. Other than the power saving issues, it just works. In my power tests (see powerstat-wayland), the Ethernet card seems to use about 1.6W of power idle, without link, in the above "quirky" configuration where the card is functional but without autosuspend.

Proprietary firmware blobs The framework does need proprietary firmware to operate. Specifically:
  • the WiFi network card shipped with the DIY kit is a AX210 card that requires a 5.19 kernel or later, and the firmware-iwlwifi non-free firmware package
  • the Bluetooth adapter also loads the firmware-iwlwifi package (untested)
  • the graphics work out of the box without firmware, but certain power management features come only with special proprietary firmware, normally shipped in the firmware-misc-nonfree but currently missing from the package
Note that, at the time of writing, the latest i915 firmware from linux-firmware has a serious bug where loading all the accessible firmware results in noticeable I estimate 200-500ms lag between the keyboard (not the mouse!) and the display. Symptoms also include tearing and shearing of windows, it's pretty nasty. One workaround is to delete the two affected firmware files:
cd /lib/firmware && rm adlp_guc_70.1.1.bin adlp_guc_69.0.3.bin
update-initramfs -u
You will get the following warning during build, which is good as it means the problematic firmware is disabled:
W: Possible missing firmware /lib/firmware/i915/adlp_guc_69.0.3.bin for module i915
W: Possible missing firmware /lib/firmware/i915/adlp_guc_70.1.1.bin for module i915
But then it also means that critical firmware isn't loaded, which means, among other things, a higher battery drain. I was able to move from 8.5-10W down to the 7W range after making the firmware work properly. This is also after turning the backlight all the way down, as that takes a solid 2-3W in full blast. The proper fix is to use some compositing manager. I ended up using compton with the following systemd unit:
[Unit]
Description=start compositing manager
PartOf=graphical-session.target
ConditionHost=angela
[Service]
Type=exec
ExecStart=compton --show-all-xerrors --backend glx --vsync opengl-swc
Restart=on-failure
[Install]
RequiredBy=graphical-session.target
compton is orphaned however, so you might be tempted to use picom instead, but in my experience the latter uses much more power (1-2W extra, similar experience). I also tried compiz but it would just crash with:
anarcat@angela:~$ compiz --replace
compiz (core) - Warn: No XI2 extension
compiz (core) - Error: Another composite manager is already running on screen: 0
compiz (core) - Fatal: No manageable screens found on display :0
When running from the base session, I would get this instead:
compiz (core) - Warn: No XI2 extension
compiz (core) - Error: Couldn't load plugin 'ccp'
compiz (core) - Error: Couldn't load plugin 'ccp'
Thanks to EmanueleRocca for figuring all that out. See also this discussion about power management on the Framework forum. Note that Wayland environments do not require any special configuration here and actually work better, see my Wayland migration notes for details.
Also note that the iwlwifi firmware also looks incomplete. Even with the package installed, I get those errors in dmesg:
[   19.534429] Intel(R) Wireless WiFi driver for Linux
[   19.534691] iwlwifi 0000:a6:00.0: enabling device (0000 -> 0002)
[   19.541867] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-72.ucode (-2)
[   19.541881] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-72.ucode (-2)
[   19.541882] iwlwifi 0000:a6:00.0: Direct firmware load for iwlwifi-ty-a0-gf-a0-72.ucode failed with error -2
[   19.541890] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-71.ucode (-2)
[   19.541895] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-71.ucode (-2)
[   19.541896] iwlwifi 0000:a6:00.0: Direct firmware load for iwlwifi-ty-a0-gf-a0-71.ucode failed with error -2
[   19.541903] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-70.ucode (-2)
[   19.541907] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-70.ucode (-2)
[   19.541908] iwlwifi 0000:a6:00.0: Direct firmware load for iwlwifi-ty-a0-gf-a0-70.ucode failed with error -2
[   19.541913] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-69.ucode (-2)
[   19.541916] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-69.ucode (-2)
[   19.541917] iwlwifi 0000:a6:00.0: Direct firmware load for iwlwifi-ty-a0-gf-a0-69.ucode failed with error -2
[   19.541922] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-68.ucode (-2)
[   19.541926] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-68.ucode (-2)
[   19.541927] iwlwifi 0000:a6:00.0: Direct firmware load for iwlwifi-ty-a0-gf-a0-68.ucode failed with error -2
[   19.541933] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-67.ucode (-2)
[   19.541937] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-67.ucode (-2)
[   19.541937] iwlwifi 0000:a6:00.0: Direct firmware load for iwlwifi-ty-a0-gf-a0-67.ucode failed with error -2
[   19.544244] iwlwifi 0000:a6:00.0: firmware: direct-loading firmware iwlwifi-ty-a0-gf-a0-66.ucode
[   19.544257] iwlwifi 0000:a6:00.0: api flags index 2 larger than supported by driver
[   19.544270] iwlwifi 0000:a6:00.0: TLV_FW_FSEQ_VERSION: FSEQ Version: 0.63.2.1
[   19.544523] iwlwifi 0000:a6:00.0: firmware: failed to load iwl-debug-yoyo.bin (-2)
[   19.544528] iwlwifi 0000:a6:00.0: firmware: failed to load iwl-debug-yoyo.bin (-2)
[   19.544530] iwlwifi 0000:a6:00.0: loaded firmware version 66.55c64978.0 ty-a0-gf-a0-66.ucode op_mode iwlmvm
Some of those are available in the latest upstream firmware package (iwlwifi-ty-a0-gf-a0-71.ucode, -68, and -67), but not all (e.g. iwlwifi-ty-a0-gf-a0-72.ucode is missing) . It's unclear what those do or don't, as the WiFi seems to work well without them. I still copied them in from the latest linux-firmware package in the hope they would help with power management, but I did not notice a change after loading them. There are also multiple knobs on the iwlwifi and iwlmvm drivers. The latter has a power_schmeme setting which defaults to 2 (balanced), setting it to 3 (low power) could improve battery usage as well, in theory. The iwlwifi driver also has power_save (defaults to disabled) and power_level (1-5, defaults to 1) settings. See also the output of modinfo iwlwifi and modinfo iwlmvm for other driver options.

Graphics acceleration After loading the latest upstream firmware and setting up a compositing manager (compton, above), I tested the classic glxgears. Running in a window gives me odd results, as the gears basically grind to a halt:
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
137 frames in 5.1 seconds = 26.984 FPS
27 frames in 5.4 seconds =  5.022 FPS
Ouch. 5FPS! But interestingly, once the window is in full screen, it does hit the monitor refresh rate:
300 frames in 5.0 seconds = 60.000 FPS
I'm not really a gamer and I'm not normally using any of that fancy graphics acceleration stuff (except maybe my browser does?). I installed intel-gpu-tools for the intel_gpu_top command to confirm the GPU was engaged when doing those simulations. A nice find. Other useful diagnostic tools include glxgears and glxinfo (in mesa-utils) and (vainfo in vainfo). Following to this post, I also made sure to have those settings in my about:config in Firefox, or, in user.js:
user_pref("media.ffmpeg.vaapi.enabled", true);
Note that the guide suggests many other settings to tweak, but those might actually be overkill, see this comment and its parents. I did try forcing hardware acceleration by setting gfx.webrender.all to true, but everything became choppy and weird. The guide also mentions installing the intel-media-driver package, but I could not find that in Debian. The Arch wiki has, as usual, an excellent reference on hardware acceleration in Firefox.

Chromium / Signal desktop bugs It looks like both Chromium and Signal Desktop misbehave with my compositor setup (compton + i3). The fix is to add a persistent flag to Chromium. In Arch, it's conveniently in ~/.config/chromium-flags.conf but that doesn't actually work in Debian. I had to put the flag in /etc/chromium.d/disable-compositing, like this:
export CHROMIUM_FLAGS="$CHROMIUM_FLAGS --disable-gpu-compositing"
It's possible another one of the hundreds of flags might fix this issue better, but I don't really have time to go through this entire, incomplete, and unofficial list (!?!). Signal Desktop is a similar problem, and doesn't reuse those flags (because of course it doesn't). Instead I had to rewrite the wrapper script in /usr/local/bin/signal-desktop to use this instead:
exec /usr/bin/flatpak run --branch=stable --arch=x86_64 org.signal.Signal --disable-gpu-compositing "$@"
This was mostly done in this Puppet commit. I haven't figured out the root of this problem. I did try using picom and xcompmgr; they both suffer from the same issue. Another Debian testing user on Wayland told me they haven't seen this problem, so hopefully this can be fixed by switching to wayland.

Graphics card hangs I believe I might have this bug which results in a total graphical hang for 15-30 seconds. It's fairly rare so it's not too disruptive, but when it does happen, it's pretty alarming. The comments on that bug report are encouraging though: it seems this is a bug in either mesa or the Intel graphics driver, which means many people have this problem so it's likely to be fixed. There's actually a merge request on mesa already (2022-12-29). It could also be that bug because the error message I get is actually:
Jan 20 12:49:10 angela kernel: Asynchronous wait on fence 0000:00:02.0:sway[104431]:cb0ae timed out (hint:intel_atomic_commit_ready [i915]) 
Jan 20 12:49:15 angela kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:0:00000000 
Jan 20 12:49:15 angela kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0 
Jan 20 12:49:15 angela kernel: i915 0000:00:02.0: [drm] GuC firmware i915/adlp_guc_70.1.1.bin version 70.1 
Jan 20 12:49:15 angela kernel: i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc_7.9.3.bin version 7.9 
Jan 20 12:49:15 angela kernel: i915 0000:00:02.0: [drm] HuC authenticated 
Jan 20 12:49:15 angela kernel: i915 0000:00:02.0: [drm] GuC submission enabled 
Jan 20 12:49:15 angela kernel: i915 0000:00:02.0: [drm] GuC SLPC enabled
It's a solid 30 seconds graphical hang. Maybe the keyboard and everything else keeps working. The latter bug report is quite long, with many comments, but this one from January 2023 seems to say that Sway 1.8 fixed the problem. There's also an earlier patch to add an extra kernel parameter that supposedly fixes that too. There's all sorts of other workarounds in there, for example this:
echo "options i915 enable_dc=1 enable_guc_loading=1 enable_guc_submission=1 edp_vswing=0 enable_guc=2 enable_fbc=1 enable_psr=1 disable_power_well=0"   sudo tee /etc/modprobe.d/i915.conf
from this comment... So that one is unsolved, as far as the upstream drivers are concerned, but maybe could be fixed through Sway.

Weird USB hangs / graphical glitches I have had weird connectivity glitches better described in this post, but basically: my USB keyboard and mice (connected over a USB hub) drop keys, lag a lot or hang, and I get visual glitches. The fix was to tighten the screws around the CPU on the motherboard (!), which is, thankfully, a rather simple repair.

USB docks are hell Note that the monitors are hooked up to angela through a USB-C / Thunderbolt dock from Cable Matters, with the lovely name of 201053-SIL. It has issues, see this blog post for an in-depth discussion.

Shipping details I ordered the Framework in August 2022 and received it about a month later, which is sooner than expected because the August batch was late. People (including me) expected this to have an impact on the September batch, but it seems Framework have been able to fix the delivery problems and keep up with the demand. As of early 2023, their website announces that laptops ship "within 5 days". I have myself ordered a few expansion cards in November 2022, and they shipped on the same day, arriving 3-4 days later.

The supply pipeline There are basically 6 steps in the Framework shipping pipeline, each (except the last) accompanied with an email notification:
  1. pre-order
  2. preparing batch
  3. preparing order
  4. payment complete
  5. shipping
  6. (received)
This comes from the crowdsourced spreadsheet, which should be updated when the status changes here. I was part of the "third batch" of the 12th generation laptop, which was supposed to ship in September. It ended up arriving on my door step on September 27th, about 33 days after ordering. It seems current orders are not processed in "batches", but in real time, see this blog post for details on shipping.

Shipping trivia I don't know about the others, but my laptop shipped through no less than four different airplane flights. Here are the hops it took: I can't quite figure out how to calculate exactly how much mileage that is, but it's huge. The ride through Alaska is surprising enough but the bounce back through Winnipeg is especially weird. I guess the route happens that way because of Fedex shipping hubs. There was a related oddity when I had my Purism laptop shipped: it left from the west coast and seemed to enter on an endless, two week long road trip across the continental US.

Other resources

22 April 2022

Ritesh Raj Sarraf: Systemd Service Hang

Finally, TIL, what can all be the reason for systemd services to hang indefinitely. The internet is flooded with numerous reports on this topic but no clear answers. So no more uselessly marked workarounds like: systemctl daemon-reload and systemctl-daemon-reexec for this scenario. The scene would be something along the lines of:
rrs         6467  0.0  0.0  23088 15852 pts/1    Ss   12:53   0:00          \_ /bin/bash
rrs        11512  0.0  0.0  14876  4608 pts/1    S+   13:18   0:00              \_ systemctl restart snapper-timeline.timer
rrs        11513  0.0  0.0  14984  3076 pts/1    S+   13:18   0:00                  \_ /bin/systemd-tty-ask-password-agent --watch
rrs        11514  0.0  0.0 234756  6752 pts/1    Sl+  13:18   0:00                  \_ /usr/bin/pkttyagent --notify-fd 5 --fallback
The snapper-timeline service is important to me and it not running for months is a complete failure. Disappointingly, commands like systemctl --failed do not report of this oddity. The overall system status is reported to be fine, which is completely incorrect. Thankfully, a kind soul s comment gave the hint. The problem is that you could be having certain services in Activating status, which thus blocks all other services; quietly. So much for the unnecessary fun. Looking further, in my case, it was:
rrs@priyasi:~$ systemctl list-jobs 
JOB  UNIT                           TYPE  STATE  
81   timers.target                  start waiting
85   man-db.timer                   start waiting
88   fstrim.timer                   start waiting
3832 snapper-timeline.service       start waiting
83   snapper-timeline.timer         start waiting
39   systemd-time-wait-sync.service start running
87   logrotate.timer                start waiting
84   debspawn-clear-caches.timer    start waiting
89   plocate-updatedb.timer         start waiting
91   dpkg-db-backup.timer           start waiting
93   e2scrub_all.timer              start waiting
40   time-sync.target               start waiting
86   apt-listbugs.timer             start waiting

13 jobs listed.
13:12                      
That was it. I knew the systemd-timesyncd service, in the past, had given me enough headaches. And so was it this time, just quietly doing it all again.
rrs@priyasi:~$ systemctl status systemd-time-wait-sync.service
  systemd-time-wait-sync.service - Wait Until Kernel Time Synchronized
     Loaded: loaded (/lib/systemd/system/systemd-time-wait-sync.service; enabled; vendor preset>
     Active: activating (start) since Fri 2022-04-22 13:14:25 IST; 1min 38s ago
       Docs: man:systemd-time-wait-sync.service(8)
   Main PID: 11090 (systemd-time-wa)
      Tasks: 1 (limit: 37051)
     Memory: 836.0K
        CPU: 7ms
     CGroup: /system.slice/systemd-time-wait-sync.service
              11090 /lib/systemd/systemd-time-wait-sync

Apr 22 13:14:25 priyasi systemd[1]: Starting Wait Until Kernel Time Synchronized...
Apr 22 13:14:25 priyasi systemd-time-wait-sync[11090]: adjtime state 5 status 40 time Fri 2022->
13:16                   => 3  
Dear LazyWeb, anybody knows of why the systemd-time-wait-sync service would hang indefinitely? I ve had identical setups on many machines, in the same network, where others don t exhibit this problem.
rrs@priyasi:~$ systemctl cat systemd-time-wait-sync.service

...snipped...

[Service]
Type=oneshot
ExecStart=/lib/systemd/systemd-time-wait-sync
TimeoutStartSec=infinity
RemainAfterExit=yes

[Install]
WantedBy=sysinit.target
The TimeoutStartSec=infinity is definitely an attribute that shouldn t be shipped in any system services. There are use cases for it but that should be left for local admins to explicitly decide. Hanging for infinity is not a desired behavior for a system service. In figuring all this out, today I learnt the handy systemctl list-jobs command, which will give the list of active running/blocked/waiting jobs.

20 April 2022

Ritesh Raj Sarraf: Btrfs Subvol Fix

There surely is need for better tooling on the BTRFS File System side. While migrating my setup from one machine to another, this is one issue I came to be aware of, only today, when my backup tool (btrbk) complained about it. Following the pointers, I see the below snippet in btrfs-subvolume manual page.
       A snapshot that was created by send/receive will be read-only, with different last change generation, read-only and with set received_uuid which identifies the subvolume on the
       filesystem that produced the stream. The usecase relies on matching data on both sides. Changing the subvolume to read-write after it has been received requires to reset the
       received_uuid. As this is a notable change and could potentially break the incremental send use case, performing it by btrfs property set requires force if that is really desired by
       user.

           Note
           The safety checks have been implemented in 5.14.2, any subvolumes previously received (with a valid received_uuid) and read-write status may exist and could still lead to
           problems with send/receive. You can use btrfs subvolume show to identify them. Flipping the flags to read-only and back to read-write will reset the received_uuid manually. There
           may exist a convenience tool in the future.
Fixing the Received UUID: flag meant running the below:
rrs@priyasi:.../spool$ sudo btrfs sub show /
WARNING: the subvolume is read-write and has received_uuid set,
         don't use it for incremental send. Please see section
         'SUBVOLUME FLAGS' in manual page btrfs-subvolume for
         further information.
ROOTVOL
        Name:                   ROOTVOL
        UUID:                   122b0de1-e6f2-6845-aba0-6bf766c16526
        Parent UUID:            -
        Received UUID:          34772967-c709-5146-bf20-898f7dbc2c1f
        Creation time:          2021-12-02 19:59:29 +0530
        Subvolume ID:           256
        Generation:             138473
        Gen at creation:        7
        Parent ID:              5
        Top level ID:           5
        Flags:                  -
        Send transid:           35245
        Send time:              2021-12-02 19:59:29 +0530
        Receive transid:        34
        Receive time:           2021-12-02 20:13:11 +0530
        Snapshot(s):
                                ROOTVOL/.snapshots/1/snapshot
                                ROOTVOL/.snapshots/2/snapshot
22:40                      


rrs@priyasi:.../spool$ sudo btrfs property set / ro true
WARNING: read-write subvolume with received_uuid, this is bad
22:40                      



rrs@priyasi:.../spool$ sudo btrfs property set -f / ro false
22:40                      



rrs@priyasi:.../spool$ sudo btrfs sub show /
ROOTVOL
        Name:                   ROOTVOL
        UUID:                   122b0de1-e6f2-6845-aba0-6bf766c16526
        Parent UUID:            -
        Received UUID:          -
        Creation time:          2021-12-02 19:59:29 +0530
        Subvolume ID:           256
        Generation:             138473
        Gen at creation:        7
        Parent ID:              5
        Top level ID:           5
        Flags:                  -
        Send transid:           0
        Send time:              2021-12-02 19:59:29 +0530
        Receive transid:        138480
        Receive time:           2022-04-20 22:40:43 +0530
        Snapshot(s):
                                ROOTVOL/.snapshots/1/snapshot
                                ROOTVOL/.snapshots/2/snapshot
22:40                      
Hoping there won t be surprises in the coming months.

12 February 2022

Ritesh Raj Sarraf: apt-offline 1.8.4

apt-offline 1.8.4 apt-offline version 1.8.4 has been released. This release includes many bug fixes but the important ones are:
  • Better GPG signature handling
  • Support for verifying InRelease files

Changelog
apt-offline (1.8.4-1) unstable; urgency=medium
  [ Debian Janitor ]
  * Update standards version to 4.5.0, no changes needed.
  [ Paul Wise ]
  * Clarify file type in unknown file message
  * Fix typos
  * Remove trailing whitespace
  * Update LICENSE file to match official GNU version
  * Complain when there are no valid keyrings instead of missing keyrings
  * Make all syncrhronised files world readable
  * Fix usage of indefinite articles
  * Only show the APT Offline GUI once in the menu
  * Update out of date URLs
  * Fix date and whitespace issues in the manual page
  * Replace stereotyping with an appropriate word
  * Switch more Python shebangs to Python 3
  * Correct usage of the /tmp/ directory
  * Fix YAML files
  * Fix usage of the log API
  * Make the copying of changelog lines less brittle
  * Do not split keyring paths on whitespace
  [ Ritesh Raj Sarraf ]
  * Drop the redundant import of the apt module.
    Thanks to github/dandelionred
  * Fix deprecation of get_bugs() in debianbts
  * Drop the unused IgnoredBugTypes
  * Set encoding for files when opening
  * Better error logging when apt fails
  * Don't mandate a default option
  * Demote metadata errors to verbose
  * Also log an error message for every failed .deb url
  * Check hard for the url type
  * Check for ascii armored signature files.
    Thanks to David Klnischkies
  * Add MIME type for InRelease files
  * Drop patch 0001-Drop-the-redundant-import-of-the-apt-module.patch.
    Now part of the 1.8.4 release
  * Prepare release 1.8.3
  * Prepare release 1.8.4
  * debian packaging
    + Bump debhelper compatibility to 13
    + Update install files
  [ Dean Anderson ]
  * [#143] Added support for verifying InRelease files
 -- Ritesh Raj Sarraf <rrs@debian.org>  Sat, 12 Feb 2022 18:52:58 +0530

Resources
  • Tarball and Zip archive for apt-offline are available here
  • Packages should be available in Debian.
  • Development for apt-offline is currently hosted here

11 January 2022

Ritesh Raj Sarraf: ThinkPad AMD Debian

After a hiatus of 6 years, it was nice to be back with the ThinkPad. This blog post briefly touches upon my impressions with the current generation ThinkPad T14 Gen2 AMD variant.
ThinkPad T14 Gen2 AMD
ThinkPad T14 Gen2 AMD

Lenovo It took 8 weeks to get my hands on the machine. Given the pandemic, restrictions and uncertainities, not sure if I should call it an ontime delivery. This was a CTO - Customise-to-order; so was nice to get rid of things I really didn t care/use much. On the other side, it also meant I could save on some power. It also came comparatively cheaper overall.
  • No fingerprint reader
  • No Touch screen
There s still parts where Lenovo could improve. Or less frustate a customer. I don t understand why a company would provide a full customization option on their portal, while at the same time, not provide an explicit option to choose the make/model of the hardware one wants. Lenovo deliberately chooses to not show/specify which WiFi adapter one could choose. So, as I suspected, I ended up with a MEDIATEK Corp. Device 7961 wifi adapter.

AMD For the first time in my computing life, I m now using AMD at the core. I was pretty frustrated with annoying Intel Graphics bugs, so decided to take the plunge and give AMD/ATI a shot, knowing that the radeon driver does have decent support. So far, on the graphics side of things, I m glad that things look bright. The stock in-kernel radeon driver has been working perfect for my needs and I haven t had to tinker even once so far, in my 30 days of use. On the overall system performance, I have not done any benchmarks nor do I want to do. But wholly, the system performance is smooth.

Power/Thermal This is where things need more improvement on the AMD side. This AMD laptop terribly draws a lot of power in suspend mode. And it isn t just this machine, but also the previous T14 Gen1 which has similar problems. I m not sure if this is a generic ThinkPad problem, or an AMD specific problem. But coming from the Dell XPS 13 9370 Intel, this does draw a lot lot more power. So much, that I chose to use hibernation instead. Similarly, on the thermal side, this machine doesn t cool down well as compared the the Dell XPS Intel one. On an idle machine, its temperature are comparatively higher. Looking at powertop reports, it does show to consume an average of 10 watts power even while idle. I m hoping these are Linux ingeration issues and that Lenovo/AMD will improve things in the coming months. But given the user feedback on the ThinkPad T14 Gen1 thread, it may just be wishful thinking.

Linux The overall hardware support has been surprisingly decent. The MediaTek WiFi driver had some glitches but with Linux 5.15+, things have considerably improved. And I hope the trend will continue with forthcoming Linux releases. My previous device driver experience with MediaTek wasn t good but I took the plunge, considering that in the worst scenario I d have the option to swap the card. There s a lot of marketing about Linux + Intel. But I took a jibe with Linux + AMD. There are glitches but nothing so far that has been a dealbreaker. If anything, I wish Lenovo/AMD would seriously work on the power/thermal issues.

Migration Other than what s mentioned above, I haven t had any serious issues. I may have had some rare occassional hangs but they ve been so infrequent that I haven t spent time to investigate those. Upon receiving the machine, my biggest requirement was how to switch my current workstation from Dell XPS to Lenovo ThinkPad. I ve been using btrfs for some time now. And over the years, built my own practise on how to structure it. Things like, provisioning [sub]volumes, based on use cases is one thing I see. Like keeping separate subvols for: cache/temporary data, copy-on-write data , swap etc. I wish these things could be simplified; either on the btrfs tooling side or some different tool on top of it. Below is filtered list of subvols created over years, that were worthy of moving to the new machine.
rrs@priyasi:~$ cat btrfs-volume-layout 
ID 550 gen 19166 top level 5 path home/foo/.cache
ID 552 gen 1522688 top level 5 path home/rrs
ID 553 gen 1522688 top level 552 path home/rrs/.cache
ID 555 gen 1426323 top level 552 path home/rrs/rrs-home/Libvirt-Images
ID 618 gen 1522672 top level 5 path var/spool/news
ID 634 gen 1522670 top level 5 path var/tmp
ID 635 gen 1522688 top level 5 path var/log
ID 639 gen 1522226 top level 5 path var/cache
ID 992 gen 1522670 top level 5 path disk-tmp
ID 1018 gen 1522688 top level 552 path home/rrs/NoBackup
ID 1196 gen 1522671 top level 5 path etc
ID 23721 gen 775692 top level 5 path swap
18:54                      

btrfs send/receive This did come in handy but I sorely missed some feature. Maybe they aren t there, or are there and I didn t look close enough. Over the years, different attributes were set to different subvols. Over time I forget what feature was added where. But from a migration point of view, it d be nice to say, Take this volume and take it with all its attributes . I didn t find that functionality in send/receive. There s get/set-property which I noticed later but by then it was late. So some sort of tooling, ideally something like btrfs migrate or somesuch would be nicer. In the file system world, we already have nice tools to take care of similar scenarios. Like with rsync, I can request it to carry all file attributes. Also, iirc, send/receive works only on ro volumes. So there s more work one needs to do in:
  1. create ro vol
  2. send
  3. receive
  4. don t forget to set rw property
  5. And then somehow find out other properties set on each individual subvols and [re]apply the same on the destination
I wish this all be condensed into a sub-command. For my own sake, for this migration, the steps used were:
user@debian:~$ for volume in  sudo btrfs sub list /media/user/TOSHIBA/Migrate/   cut -d ' ' -f9   grep -v ROOTVOL   grep -v etc   grep -v btrbk ; do echo $volume; sud
o btrfs send /media/user/TOSHIBA/$volume   sudo btrfs receive /media/user/BTRFSROOT/ ; done            
Migrate/snapshot_disk-tmp
At subvol /media/user/TOSHIBA/Migrate/snapshot_disk-tmp
At subvol snapshot_disk-tmp
Migrate/snapshot-home_foo_.cache
At subvol /media/user/TOSHIBA/Migrate/snapshot-home_foo_.cache
At subvol snapshot-home_foo_.cache
Migrate/snapshot-home_rrs
At subvol /media/user/TOSHIBA/Migrate/snapshot-home_rrs
At subvol snapshot-home_rrs
Migrate/snapshot-home_rrs_.cache
At subvol /media/user/TOSHIBA/Migrate/snapshot-home_rrs_.cache
At subvol snapshot-home_rrs_.cache
ERROR: crc32 mismatch in command
Migrate/snapshot-home_rrs_rrs-home_Libvirt-Images
At subvol /media/user/TOSHIBA/Migrate/snapshot-home_rrs_rrs-home_Libvirt-Images
At subvol snapshot-home_rrs_rrs-home_Libvirt-Images
ERROR: crc32 mismatch in command
Migrate/snapshot-var_spool_news
At subvol /media/user/TOSHIBA/Migrate/snapshot-var_spool_news
At subvol snapshot-var_spool_news
Migrate/snapshot-var_lib_machines
At subvol /media/user/TOSHIBA/Migrate/snapshot-var_lib_machines
At subvol snapshot-var_lib_machines
Migrate/snapshot-var_lib_machines_DebianSidTemplate
..... snipped .....
And then, follow-up with:
user@debian:~$ for volume in  sudo btrfs sub list /media/user/BTRFSROOT/   cut -d ' ' -f9 ; do echo $volume; sudo btrfs property set -ts /media/user/BTRFSROOT/$volume ro false; done
ROOTVOL
ERROR: Could not open: No such file or directory
etc
snapshot_disk-tmp
snapshot-home_foo_.cache
snapshot-home_rrs
snapshot-var_spool_news
snapshot-var_lib_machines
snapshot-var_lib_machines_DebianSidTemplate
snapshot-var_lib_machines_DebSidArmhf
snapshot-var_lib_machines_DebianJessieTemplate
snapshot-var_tmp
snapshot-var_log
snapshot-var_cache
snapshot-disk-tmp
And then finally, renaming everything to match proper:
user@debian:/media/user/BTRFSROOT$ for x in snapshot*; do vol=$(echo $x   cut -d '-' -f2   sed -e "s _ / g"); echo $x $vol; sudo mv $x $vol; done
snapshot-var_lib_machines var/lib/machines
snapshot-var_lib_machines_Apertisv2020ospackTargetARMHF var/lib/machines/Apertisv2020ospackTargetARMHF
snapshot-var_lib_machines_Apertisv2021ospackTargetARM64 var/lib/machines/Apertisv2021ospackTargetARM64
snapshot-var_lib_machines_Apertisv2022dev3ospackTargetARMHF var/lib/machines/Apertisv2022dev3ospackTargetARMHF
snapshot-var_lib_machines_BusterArm64 var/lib/machines/BusterArm64
snapshot-var_lib_machines_DebianBusterTemplate var/lib/machines/DebianBusterTemplate
snapshot-var_lib_machines_DebianJessieTemplate var/lib/machines/DebianJessieTemplate
snapshot-var_lib_machines_DebianSidTemplate var/lib/machines/DebianSidTemplate
snapshot-var_lib_machines_DebianSidTemplate_var_lib_portables var/lib/machines/DebianSidTemplate/var/lib/portables
snapshot-var_lib_machines_DebSidArm64 var/lib/machines/DebSidArm64
snapshot-var_lib_machines_DebSidArmhf var/lib/machines/DebSidArmhf
snapshot-var_lib_machines_DebSidMips var/lib/machines/DebSidMips
snapshot-var_lib_machines_JenkinsApertis var/lib/machines/JenkinsApertis
snapshot-var_lib_machines_v2019 var/lib/machines/v2019
snapshot-var_lib_machines_v2019LinuxSupport var/lib/machines/v2019LinuxSupport
snapshot-var_lib_machines_v2020 var/lib/machines/v2020
snapshot-var_lib_machines_v2021dev3Slim var/lib/machines/v2021dev3Slim
snapshot-var_lib_machines_v2021dev3SlimTarget var/lib/machines/v2021dev3SlimTarget
snapshot-var_lib_machines_v2022dev2OspackMinimal var/lib/machines/v2022dev2OspackMinimal
snapshot-var_lib_portables var/lib/portables
snapshot-var_log var/log
snapshot-var_spool_news var/spool/news
snapshot-var_tmp var/tmp

snapper Entirely independent of this, but indirectly related. I use snapper as my snapshotting tool. It worked perfect on my previous machine. While everything got migrated, the only thing that fell apart was snapper. It just wouldn t start/run proper. Funny thing is that I just removed the snapper configs and reinitialized with the exact same config again, and voila snapper was happy.

Conclusion That was pretty much it. With the above and then also migrating /boot and then just chroot to install the boot loader. At some time, I d like to explore other boot options but given that that is such a non-essential task, it is low on the list. The good part was that I booted into my new machine with my exact workstation setup as it was. All the way to the user cache and the desktop session. So it was nice on that part. But I surely think there s room for a better migration experience here. If not directly as btrfs migrate, then maybe as an independent tool. The problem is that such a tool is going to be used once in years, so I didn t find the motivation to write one. But this surely would be a good use case for the distribution vendors.

21 November 2021

Antoine Beaupr : mbsync vs OfflineIMAP

After recovering from my latest email crash (previously, previously), I had to figure out which tool I should be using. I had many options but I figured I would start with a popular one (mbsync). But I also evaluated OfflineIMAP which was resurrected from the Python 2 apocalypse, and because I had used it before, for a long time. Read on for the details.

Benchmark setup All programs were tested against a Dovecot 1:2.3.13+dfsg1-2 server, running Debian bullseye. The client is a Purism 13v4 laptop with a Samsung SSD 970 EVO 1TB NVMe drive. The server is a custom build with a AMD Ryzen 5 2600 CPU, and a RAID-1 array made of two NVMe drives (Intel SSDPEKNW010T8 and WDC WDS100T2B0C). The mail spool I am testing against has almost 400k messages and takes 13GB of disk space:
$ notmuch count --exclude=false
372758
$ du -sh --exclude xapian Maildir
13G Maildir
The baseline we are comparing against is SMD (syncmaildir) which performs the sync in about 7-8 seconds locally (3.5 seconds for each push/pull command) and about 10-12 seconds remotely. Anything close to that or better is good enough. I do not have recent numbers for a SMD full sync baseline, but the setup documentation mentions 20 minutes for a full sync. That was a few years ago, and the spool has obviously grown since then, so that is not a reliable baseline. A baseline for a full sync might be also set with rsync, which copies files at nearly 40MB/s, or 317Mb/s!
anarcat@angela:tmp(main)$ time rsync -a --info=progress2 --exclude xapian  shell.anarc.at:Maildir/ Maildir/
 12,647,814,731 100%   37.85MB/s    0:05:18 (xfr#394981, to-chk=0/395815)    
72.38user 106.10system 5:19.59elapsed 55%CPU (0avgtext+0avgdata 15988maxresident)k
8816inputs+26305112outputs (0major+50953minor)pagefaults 0swaps
That is 5 minutes to transfer the entire spool. Incremental syncs are obviously pretty fast too:
anarcat@angela:tmp(main)$ time rsync -a --info=progress2 --exclude xapian  shell.anarc.at:Maildir/ Maildir/
              0   0%    0.00kB/s    0:00:00 (xfr#0, to-chk=0/395815)    
1.42user 0.81system 0:03.31elapsed 67%CPU (0avgtext+0avgdata 14100maxresident)k
120inputs+0outputs (3major+12709minor)pagefaults 0swaps
As an extra curiosity, here's the performance with tar, pretty similar with rsync, minus incremental which I cannot be bothered to figure out right now:
anarcat@angela:tmp(main)$ time ssh shell.anarc.at tar --exclude xapian -cf - Maildir/   pv -s 13G   tar xf - 
56.68user 58.86system 5:17.08elapsed 36%CPU (0avgtext+0avgdata 8764maxresident)k
0inputs+0outputs (0major+7266minor)pagefaults 0swaps
12,1GiO 0:05:17 [39,0MiB/s] [===================================================================> ] 92%
Interesting that rsync manages to almost beat a plain tar on file transfer, I'm actually surprised by how well it performs here, considering there are many little files to transfer. (But then again, this maybe is exactly where rsync shines: while tar needs to glue all those little files together, rsync can just directly talk to the other side and tell it to do live changes. Something to look at in another article maybe?) Since both ends are NVMe drives, those should easily saturate a gigabit link. And in fact, a backup of the server mail spool achieves much faster transfer rate on disks:
anarcat@marcos:~$ tar fc - Maildir   pv -s 13G > Maildir.tar
15,0GiO 0:01:57 [ 131MiB/s] [===================================] 115%
That's 131Mibyyte per second, vastly faster than the gigabit link. The client has similar performance:
anarcat@angela:~(main)$ tar fc - Maildir   pv -s 17G > Maildir.tar
16,2GiO 0:02:22 [ 116MiB/s] [==================================] 95%
So those disks should be able to saturate a gigabit link, and they are not the bottleneck on fast links. Which begs the question of what is blocking performance of a similar transfer over the gigabit link, but that's another question altogether, because no sync program ever reaches the above performance anyways. Finally, note that when I migrated to SMD, I wrote a small performance comparison that could be interesting here. It show SMD to be faster than OfflineIMAP, but not as much as we see here. In fact, it looks like OfflineIMAP slowed down significantly since then (May 2018), but this could be due to my larger mail spool as well.

mbsync The isync (AKA mbsync) project is written in C and supports syncing Maildir and IMAP folders, with possibly multiple replicas. I haven't tested this but I suspect it might be possible to sync between two IMAP servers as well. It supports partial mirorrs, message flags, full folder support, and "trash" functionality.

Complex configuration file I started with this .mbsyncrc configuration file:
SyncState *
Sync New ReNew Flags
IMAPAccount anarcat
Host imap.anarc.at
User anarcat
PassCmd "pass imap.anarc.at"
SSLType IMAPS
CertificateFile /etc/ssl/certs/ca-certificates.crt
IMAPStore anarcat-remote
Account anarcat
MaildirStore anarcat-local
# Maildir/top/sub/sub
#SubFolders Verbatim
# Maildir/.top.sub.sub
SubFolders Maildir++
# Maildir/top/.sub/.sub
# SubFolders legacy
# The trailing "/" is important
#Path ~/Maildir-mbsync/
Inbox ~/Maildir-mbsync/
Channel anarcat
# AKA Far, convert when all clients are 1.4+
Master :anarcat-remote:
# AKA Near
Slave :anarcat-local:
# Exclude everything under the internal [Gmail] folder, except the interesting folders
#Patterns * ![Gmail]* "[Gmail]/Sent Mail" "[Gmail]/Starred" "[Gmail]/All Mail"
# Or include everything
Patterns *
# Automatically create missing mailboxes, both locally and on the server
#Create Both
Create slave
# Sync the movement of messages between folders and deletions, add after making sure the sync works
#Expunge Both
Long gone are the days where I would spend a long time reading a manual page to figure out the meaning of every option. If that's your thing, you might like this one. But I'm more of a "EXAMPLES section" kind of person now, and I somehow couldn't find a sample file on the website. I started from the Arch wiki one but it's actually not great because it's made for Gmail (which is not a usual Dovecot server). So a sample config file in the manpage would be a great addition. Thankfully, the Debian packages ships one in /usr/share/doc/isync/examples/mbsyncrc.sample but I only found that after I wrote my configuration. It was still useful and I recommend people take a look if they want to understand the syntax. Also, that syntax is a little overly complicated. For example, Far needs colons, like:
Far :anarcat-remote:
Why? That seems just too complicated. I also found that sections are not clearly identified: IMAPAccount and Channel mark section beginnings, for example, which is not at all obvious until you learn about mbsync's internals. There are also weird ordering issues: the SyncState option needs to be before IMAPAccount, presumably because it's global. Using a more standard format like .INI or TOML could improve that situation.

Stellar performance A transfer of the entire mail spool takes 56 minutes and 6 seconds, which is impressive. It's not quite "line rate": the resulting mail spool was 12GB (which is a problem, see below), which turns out to be about 29Mbit/s and therefore not maxing the gigabit link, and an order of magnitude slower than rsync. The incremental runs are roughly 2 seconds, which is even more impressive, as that's actually faster than rsync:
===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.015       0.052       1.930       2.029       2.105       
user        0.660       0.040       0.592       0.661       0.722       
sys         0.338       0.033       0.268       0.341       0.387    
Those tests were performed with isync 1.3.0-2.2 on Debian bullseye. Tests with a newer isync release originally failed because of a corrupted message that triggered bug 999804 (see below). Running 1.4.3 under valgrind works around the bug, but adds a 50% performance cost, the full sync running in 1h35m. Once the upstream patch is applied, performance with 1.4.3 is fairly similar, considering that the new sync included the register folder with 4000 messages:
120.74user 213.19system 59:47.69elapsed 9%CPU (0avgtext+0avgdata 105420maxresident)k
29128inputs+28284376outputs (0major+45711minor)pagefaults 0swaps
That is ~13GB in ~60 minutes, which gives us 28.3Mbps. Incrementals are also pretty similar to 1.3.x, again considering the double-connect cost:
===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.500       0.087       2.340       2.491       2.629       
user        0.718       0.037       0.679       0.711       0.793       
sys         0.322       0.024       0.284       0.320       0.365
Those tests were all done on a Gigabit link, but what happens on a slower link? My server uplink is slow: 25 Mbps down, 6 Mbps up. There mbsync is worse than the SMD baseline:
===> multitime results
1: mbsync -a
Mean        Std.Dev.    Min         Median      Max
real        31.531      0.724       30.764      31.271      33.100      
user        1.858       0.125       1.721       1.818       2.131       
sys         0.610       0.063       0.506       0.600       0.695       
That's 30 seconds for a sync, which is an order of magnitude slower than SMD.

Great user interface Compared to OfflineIMAP and (ahem) SMD, the mbsync UI is kind of neat:
anarcat@angela:~(main)$ mbsync -a
Notice: Master/Slave are deprecated; use Far/Near instead.
C: 1/2  B: 204/205  F: +0/0 *0/0 #0/0  N: +1/200 *0/0 #0/0
(Note that nice switch away from slavery-related terms too.) The display is minimal, and yet informative. It's not obvious what does mean at first glance, but the manpage is useful at least for clarifying that:
This represents the cumulative progress over channels, boxes, and messages affected on the far and near side, respectively. The message counts represent added messages, messages with updated flags, and trashed messages, respectively. No attempt is made to calculate the totals in advance, so they grow over time as more information is gathered. (Emphasis mine).
In other words:
  • C 2/2: channels done/total (2 done out of 2)
  • B 204/205: mailboxes done/total (204 out of 205)
  • F: changes on the far side
  • N: +10/200 *0/0 #0/0: changes on the "near" side:
    • +10/200: 10 out of 200 messages downloaded
    • *0/0: no flag changed
    • #0/0: no message deleted
You get used to it, in a good way. It does not, unfortunately, show up when you run it in systemd, which is a bit annoying as I like to see a summary mail traffic in the logs.

Interoperability issue In my notmuch setup, I have bound key S to "mark spam", which basically assigns the tag spam to the message and removes a bunch of others. Then I have a notmuch-purge script which moves that message to the spam folder, for training purposes. It basically does this:
notmuch search --output=files --format=text0 "$search_spam" \
      xargs -r -0 mv -t "$HOME/Maildir/$ PREFIX junk/cur/"
This method, which worked fine in SMD (and also OfflineIMAP) created this error on sync:
Maildir error: duplicate UID 37578.
And indeed, there are now two messages with that UID in the mailbox:
anarcat@angela:~(main)$ find Maildir/.junk/ -name '*U=37578*'
Maildir/.junk/cur/1637427889.134334_2.angela,U=37578:2,S
Maildir/.junk/cur/1637348602.2492889_221804.angela,U=37578:2,S
This is actually a known limitation or, as mbsync(1) calls it, a "RECOMMENDATION":
When using the more efficient default UID mapping scheme, it is important that the MUA renames files when moving them between Maildir fold ers. Mutt always does that, while mu4e needs to be configured to do it:
(setq mu4e-change-filenames-when-moving t)
So it seems I would need to fix my script. It's unclear how the paths should be renamed, which is unfortunate, because I would need to change my script to adapt to mbsync, but I can't tell how just from reading the above. (A manual fix is actually to rename the file to remove the U= field: mbsync will generate a new one and then sync correctly.) Fortunately, someone else already fixed that issue: afew, a notmuch tagging script (much puns, such hurt), has a move mode that can rename files correctly, specifically designed to deal with mbsync. I had already been told about afew, but it's one more reason to standardize my notmuch hooks on that project, it looks like. Update: I have tried to use afew and found it has significant performance issues. It also has a completely different paradigm to what I am used to: it assumes all incoming mail has a new and lays its own tags on top of that (inbox, sent, etc). It can only move files from one folder at a time (see this bug) which breaks my spam training workflow. In general, I sync my tags into folders (e.g. ham, spam, sent) and message flags (e.g. inbox is F, unread is "not S", etc), and afew is not well suited for this (although there are hacks that try to fix this). I have worked hard to make my tagging scripts idempotent, and it's something afew doesn't currently have. Still, it would be better to have that code in Python than bash, so maybe I should consider my options here.

Stability issues The newer release in Debian bookworm (currently at 1.4.3) has stability issues on full sync. I filed bug 999804 in Debian about this, which lead to a thread on the upstream mailing list. I have found at least three distinct crashes that could be double-free bugs "which might be exploitable in the worst case", not a reassuring prospect. The thing is: mbsync is really fast, but the downside of that is that it's written in C, and with that comes a whole set of security issues. The Debian security tracker has only three CVEs on isync, but the above issues show there could be many more. Reading the source code certainly did not make me very comfortable with trusting it with untrusted data. I considered sandboxing it with systemd (below) but having systemd run as a --user process makes that difficult. I also considered using an apparmor profile but that is not trivial because we need to allow SSH and only some parts of it... Thankfully, upstream has been diligent at addressing the issues I have found. They provided a patch within a few days which did fix the sync issues. Update: upstream actually took the issue very seriously. They not only got CVE-2021-44143 assigned for my bug report, they also audited the code and found several more issues collectively identified as CVE-2021-3657, which actually also affect 1.3 (ie. Debian 11/bullseye/stable). Somehow my corpus doesn't trigger that issue, but it was still considered serious enough to warrant a CVE. So one the one hand: excellent response from upstream; but on the other hand: how many more of those could there be in there?

Automation with systemd The Arch wiki has instructions on how to setup mbsync as a systemd service. It suggests using the --verbose (-V) flag which is a little intense here, as it outputs 1444 lines of messages. I have used the following .service file:
[Unit]
Description=Mailbox synchronization service
ConditionHost=!marcos
Wants=network-online.target
After=network-online.target
Before=notmuch-new.service
[Service]
Type=oneshot
ExecStart=/usr/bin/mbsync -a
Nice=10
IOSchedulingClass=idle
NoNewPrivileges=true
[Install]
WantedBy=default.target
And the following .timer:
[Unit]
Description=Mailbox synchronization timer
ConditionHost=!marcos
[Timer]
OnBootSec=2m
OnUnitActiveSec=5m
Unit=mbsync.service
[Install]
WantedBy=timers.target
Note that we trigger notmuch through systemd, with the Before and also by adding mbsync.service to the notmuch-new.service file:
[Unit]
Description=notmuch new
After=mbsync.service
[Service]
Type=oneshot
Nice=10
ExecStart=/usr/bin/notmuch new
[Install]
WantedBy=mbsync.service
An improvement over polling repeatedly with a .timer would be to wake up only on IMAP notify, but neither imapnotify nor goimapnotify seem to be packaged in Debian. It would also not cover for the "sent folder" use case, where we need to wake up on local changes.

Password-less setup The sample file suggests this should work:
IMAPStore remote
Tunnel "ssh -q host.remote.com /usr/sbin/imapd"
Add BatchMode, restrict to IdentitiesOnly, provide a password-less key just for this, add compression (-C), find the Dovecot imap binary, and you get this:
IMAPAccount anarcat-tunnel
Tunnel "ssh -o BatchMode=yes -o IdentitiesOnly=yes -i ~/.ssh/id_ed25519_mbsync -o HostKeyAlias=shell.anarc.at -C anarcat@imap.anarc.at /usr/lib/dovecot/imap"
And it actually seems to work:
$ mbsync -a
Notice: Master/Slave are deprecated; use Far/Near instead.
C: 0/2  B: 0/1  F: +0/0 *0/0 #0/0  N: +0/0 *0/0 #0/0imap(anarcat): Error: net_connect_unix(/run/dovecot/stats-writer) failed: Permission denied
C: 2/2  B: 205/205  F: +0/0 *0/0 #0/0  N: +1/1 *3/3 #0/0imap(anarcat)<1611280><90uUOuyElmEQlhgAFjQyWQ>: Info: Logged out in=10808 out=15396642 deleted=0 expunged=0 trashed=0 hdr_count=0 hdr_bytes=0 body_count=1 body_bytes=8087
It's a bit noisy, however. dovecot/imap doesn't have a "usage" to speak of, but even the source code doesn't hint at a way to disable that Error message, so that's unfortunate. That socket is owned by root:dovecot so presumably Dovecot runs the imap process as $user:dovecot, which we can't do here. Oh well? Interestingly, the SSH setup is not faster than IMAP. With IMAP:
===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.367       0.065       2.220       2.376       2.458       
user        0.793       0.047       0.731       0.776       0.871       
sys         0.426       0.040       0.364       0.434       0.476
With SSH:
===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.515       0.088       2.274       2.532       2.594       
user        0.753       0.043       0.645       0.766       0.804       
sys         0.328       0.045       0.212       0.340       0.393
Basically: 200ms slower. Tolerable.

Migrating from SMD The above was how I migrated to mbsync on my first workstation. The work on the second one was more streamlined, especially since the corruption on mailboxes was fixed:
  1. install isync, with the patch:
    dpkg -i isync_1.4.3-1.1~_amd64.deb
    
  2. copy all files over from previous workstation to avoid a full resync (optional):
    rsync -a --info=progress2 angela:Maildir/ Maildir-mbsync/
    
  3. rename all files to match new hostname (optional):
    find Maildir-mbsync/ -type f -name '*.angela,*' -print0    rename -0 's/\.angela,/\.curie,/'
    
  4. trash the notmuch database (optional):
    rm -rf Maildir-mbsync/.notmuch/xapian/
    
  5. disable all smd and notmuch services:
    systemctl --user --now disable smd-pull.service smd-pull.timer smd-push.service smd-push.timer notmuch-new.service notmuch-new.timer
    
  6. do one last sync with smd:
    smd-pull --show-tags ; smd-push --show-tags ; notmuch new ; notmuch-sync-flagged -v
    
  7. backup notmuch on the client and server:
    notmuch dump   pv > notmuch.dump
    
  8. backup the maildir on the client and server:
    cp -al Maildir Maildir-bak
    
  9. create the SSH key:
    ssh-keygen -t ed25519 -f .ssh/id_ed25519_mbsync
    cat .ssh/id_ed25519_mbsync.pub
    
  10. add to .ssh/authorized_keys on the server, like this: command="/usr/lib/dovecot/imap",restrict ssh-ed25519 AAAAC...
  11. move old files aside, if present:
    mv Maildir Maildir-smd
    
  12. move new files in place (CRITICAL SECTION BEGINS!):
    mv Maildir-mbsync Maildir
    
  13. run a test sync, only pulling changes: mbsync --create-near --remove-none --expunge-none --noop anarcat-register
  14. if that works well, try with all mailboxes: mbsync --create-near --remove-none --expunge-none --noop -a
  15. if that works well, try again with a full sync: mbsync register mbsync -a
  16. reindex and restore the notmuch database, this should take ~25 minutes:
    notmuch new
    pv notmuch.dump   notmuch restore
    
  17. enable the systemd services and retire the smd-* services: systemctl --user enable mbsync.timer notmuch-new.service systemctl --user start mbsync.timer rm ~/.config/systemd/user/smd* systemctl daemon-reload
During the migration, notmuch helpfully told me the full list of those lost messages:
[...]
Warning: cannot apply tags to missing message: CAN6gO7_QgCaiDFvpG3AXHi6fW12qaN286+2a7ERQ2CQtzjSEPw@mail.gmail.com
Warning: cannot apply tags to missing message: CAPTU9Wmp0yAmaxO+qo8CegzRQZhCP853TWQ_Ne-YF94MDUZ+Dw@mail.gmail.com
Warning: cannot apply tags to missing message: F5086003-2917-4659-B7D2-66C62FCD4128@gmail.com
[...]
Warning: cannot apply tags to missing message: mailman.2.1316793601.53477.sage-members@mailman.sage.org
Warning: cannot apply tags to missing message: mailman.7.1317646801.26891.outages-discussion@outages.org
Warning: cannot apply tags to missing message: notmuch-sha1-000458df6e48d4857187a000d643ac971deeef47
Warning: cannot apply tags to missing message: notmuch-sha1-0079d8e0c3340e6f88c66f4c49fca758ea71d06d
Warning: cannot apply tags to missing message: notmuch-sha1-0194baa4cfb6d39bc9e4d8c049adaccaa777467d
Warning: cannot apply tags to missing message: notmuch-sha1-02aede494fc3f9e9f060cfd7c044d6d724ad287c
Warning: cannot apply tags to missing message: notmuch-sha1-06606c625d3b3445420e737afd9a245ae66e5562
Warning: cannot apply tags to missing message: notmuch-sha1-0747b020f7551415b9bf5059c58e0a637ba53b13
[...]
As detailed in the crash report, all of those were actually innocuous and could be ignored. Also note that we completely trash the notmuch database because it's actually faster to reindex from scratch than let notmuch slowly figure out that all mails are new and all the old mails are gone. The fresh indexing took:
nov 19 15:08:54 angela notmuch[2521117]: Processed 384679 total files in 23m 41s (270 files/sec.).
nov 19 15:08:54 angela notmuch[2521117]: Added 372610 new messages to the database.
While a reindexing on top of an existing database was going twice as slow, at about 120 files/sec.

Current config file Putting it all together, I ended up with the following configuration file:
SyncState *
Sync All
# IMAP side, AKA "Far"
IMAPAccount anarcat-imap
Host imap.anarc.at
User anarcat
PassCmd "pass imap.anarc.at"
SSLType IMAPS
CertificateFile /etc/ssl/certs/ca-certificates.crt
IMAPAccount anarcat-tunnel
Tunnel "ssh -o BatchMode=yes -o IdentitiesOnly=yes -i ~/.ssh/id_ed25519_mbsync -o HostKeyAlias=shell.anarc.at -C anarcat@imap.anarc.at /usr/lib/dovecot/imap"
IMAPStore anarcat-remote
Account anarcat-tunnel
# Maildir side, AKA "Near"
MaildirStore anarcat-local
# Maildir/top/sub/sub
#SubFolders Verbatim
# Maildir/.top.sub.sub
SubFolders Maildir++
# Maildir/top/.sub/.sub
# SubFolders legacy
# The trailing "/" is important
#Path ~/Maildir-mbsync/
Inbox ~/Maildir/
# what binds Maildir and IMAP
Channel anarcat
Far :anarcat-remote:
Near :anarcat-local:
# Exclude everything under the internal [Gmail] folder, except the interesting folders
#Patterns * ![Gmail]* "[Gmail]/Sent Mail" "[Gmail]/Starred" "[Gmail]/All Mail"
# Or include everything
#Patterns *
Patterns * !register  !.register
# Automatically create missing mailboxes, both locally and on the server
Create Both
#Create Near
# Sync the movement of messages between folders and deletions, add after making sure the sync works
Expunge Both
# Propagate mailbox deletion
Remove both
IMAPAccount anarcat-register-imap
Host imap.anarc.at
User register
PassCmd "pass imap.anarc.at-register"
SSLType IMAPS
CertificateFile /etc/ssl/certs/ca-certificates.crt
IMAPAccount anarcat-register-tunnel
Tunnel "ssh -o BatchMode=yes -o IdentitiesOnly=yes -i ~/.ssh/id_ed25519_mbsync -o HostKeyAlias=shell.anarc.at -C register@imap.anarc.at /usr/lib/dovecot/imap"
IMAPStore anarcat-register-remote
Account anarcat-register-tunnel
MaildirStore anarcat-register-local
SubFolders Maildir++
Inbox ~/Maildir/.register/
Channel anarcat-register
Far :anarcat-register-remote:
Near :anarcat-register-local:
Create Both
Expunge Both
Remove both
Note that it may be out of sync with my live (and private) configuration file, as I do not publish my "dotfiles" repository publicly for security reasons.

OfflineIMAP I've used OfflineIMAP for a long time before switching to SMD. I don't exactly remember why or when I started using it, but I do remember it became painfully slow as I started using notmuch, and would sometimes crash mysteriously. It's been a while, so my memory is hazy on that. It also kind of died in a fire when Python 2 stop being maintained. The main author moved on to a different project, imapfw which could serve as a framework to build IMAP clients, but never seemed to implement all of the OfflineIMAP features and certainly not configuration file compatibility. Thankfully, a new team of volunteers ported OfflineIMAP to Python 3 and we can now test that new version to see if it is an improvement over mbsync.

Crash on full sync The first thing that happened on a full sync is this crash:
Copy message from RemoteAnarcat:junk:
 ERROR: Copying message 30624 [acc: Anarcat]
  decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Thread 'Copy message from RemoteAnarcat:junk' terminated with exception:
Traceback (most recent call last):
  File "/usr/share/offlineimap3/offlineimap/imaputil.py", line 406, in utf7m_decode
    for c in binary.decode():
AttributeError: 'memoryview' object has no attribute 'decode'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/share/offlineimap3/offlineimap/threadutil.py", line 146, in run
    Thread.run(self)
  File "/usr/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 802, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 342, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 908, in _fetch_from_imap
    ndata1 = self.parser['8bit-RFC'].parsebytes(data[0][1])
  File "/usr/lib/python3.9/email/parser.py", line 123, in parsebytes
    return self.parser.parsestr(text, headersonly)
  File "/usr/lib/python3.9/email/parser.py", line 67, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/usr/lib/python3.9/email/parser.py", line 56, in parse
    feedparser.feed(data)
  File "/usr/lib/python3.9/email/feedparser.py", line 176, in feed
    self._call_parse()
  File "/usr/lib/python3.9/email/feedparser.py", line 180, in _call_parse
    self._parse()
  File "/usr/lib/python3.9/email/feedparser.py", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 298, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 256, in _parsegen
    if self._cur.get_content_type() == 'message/delivery-status':
  File "/usr/lib/python3.9/email/message.py", line 578, in get_content_type
    value = self.get('content-type', missing)
  File "/usr/lib/python3.9/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/usr/lib/python3.9/email/policy.py", line 163, in header_fetch_parse
    return self.header_factory(name, value)
  File "/usr/lib/python3.9/email/headerregistry.py", line 601, in __call__
    return self[name](name, value)
  File "/usr/lib/python3.9/email/headerregistry.py", line 196, in __new__
    cls.parse(value, kwds)
  File "/usr/lib/python3.9/email/headerregistry.py", line 445, in parse
    kwds['parse_tree'] = parse_tree = cls.value_parser(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2675, in parse_content_type_header
    ctype.append(parse_mime_parameters(value[1:]))
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2569, in parse_mime_parameters
    token, value = get_parameter(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2492, in get_parameter
    token, value = get_value(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2403, in get_value
    token, value = get_quoted_string(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1294, in get_quoted_string
    token, value = get_bare_quoted_string(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1223, in get_bare_quoted_string
    token, value = get_encoded_word(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1064, in get_encoded_word
    text, charset, lang, defects = _ew.decode('=?' + tok + '?=')
  File "/usr/lib/python3.9/email/_encoded_words.py", line 181, in decode
    string = bstring.decode(charset)
AttributeError: decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Last 1 debug messages logged for Copy message from RemoteAnarcat:junk prior to exception:
thread: Register new thread 'Copy message from RemoteAnarcat:junk' (account 'Anarcat')
ERROR: Exceptions occurred during the run!
ERROR: Copying message 30624 [acc: Anarcat]
  decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Traceback:
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 802, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 342, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 908, in _fetch_from_imap
    ndata1 = self.parser['8bit-RFC'].parsebytes(data[0][1])
  File "/usr/lib/python3.9/email/parser.py", line 123, in parsebytes
    return self.parser.parsestr(text, headersonly)
  File "/usr/lib/python3.9/email/parser.py", line 67, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/usr/lib/python3.9/email/parser.py", line 56, in parse
    feedparser.feed(data)
  File "/usr/lib/python3.9/email/feedparser.py", line 176, in feed
    self._call_parse()
  File "/usr/lib/python3.9/email/feedparser.py", line 180, in _call_parse
    self._parse()
  File "/usr/lib/python3.9/email/feedparser.py", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 298, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 256, in _parsegen
    if self._cur.get_content_type() == 'message/delivery-status':
  File "/usr/lib/python3.9/email/message.py", line 578, in get_content_type
    value = self.get('content-type', missing)
  File "/usr/lib/python3.9/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/usr/lib/python3.9/email/policy.py", line 163, in header_fetch_parse
    return self.header_factory(name, value)
  File "/usr/lib/python3.9/email/headerregistry.py", line 601, in __call__
    return self[name](name, value)
  File "/usr/lib/python3.9/email/headerregistry.py", line 196, in __new__
    cls.parse(value, kwds)
  File "/usr/lib/python3.9/email/headerregistry.py", line 445, in parse
    kwds['parse_tree'] = parse_tree = cls.value_parser(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2675, in parse_content_type_header
    ctype.append(parse_mime_parameters(value[1:]))
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2569, in parse_mime_parameters
    token, value = get_parameter(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2492, in get_parameter
    token, value = get_value(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2403, in get_value
    token, value = get_quoted_string(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1294, in get_quoted_string
    token, value = get_bare_quoted_string(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1223, in get_bare_quoted_string
    token, value = get_encoded_word(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1064, in get_encoded_word
    text, charset, lang, defects = _ew.decode('=?' + tok + '?=')
  File "/usr/lib/python3.9/email/_encoded_words.py", line 181, in decode
    string = bstring.decode(charset)
Folder junk [acc: Anarcat]:
 Copy message UID 30626 (29008/49310) RemoteAnarcat:junk -> LocalAnarcat:junk
Command exited with non-zero status 100
5252.91user 535.86system 3:21:00elapsed 47%CPU (0avgtext+0avgdata 846304maxresident)k
96344inputs+26563792outputs (1189major+2155815minor)pagefaults 0swaps
That only transferred about 8GB of mail, which gives us a transfer rate of 5.3Mbit/s, more than 5 times slower than mbsync. This bug is possibly limited to the bullseye version of offlineimap3 (the lovely 0.0~git20210225.1e7ef9e+dfsg-4), while the current sid version (the equally gorgeous 0.0~git20211018.e64c254+dfsg-1) seems unaffected.

Tolerable performance The new release still crashes, except it does so at the very end, which is an improvement, since the mails do get transferred:
 *** Finished account 'Anarcat' in 511:12
ERROR: Exceptions occurred during the run!
ERROR: Exception parsing message with ID (<20190619152034.BFB8810E07A@marcos.anarc.at>) from imaplib (response type: bytes).
 AttributeError: decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Traceback:
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 810, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 343, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 910, in _fetch_from_imap
    raise OfflineImapError(
ERROR: Exception parsing message with ID (<40A270DB.9090609@alternatives.ca>) from imaplib (response type: bytes).
 AttributeError: decoding with 'x-mac-roman' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Traceback:
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 810, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 343, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 910, in _fetch_from_imap
    raise OfflineImapError(
ERROR: IMAP server 'RemoteAnarcat' does not have a message with UID '32686'
Traceback:
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 810, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 343, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 889, in _fetch_from_imap
    raise OfflineImapError(reason, severity)
Command exited with non-zero status 1
8273.52user 983.80system 8:31:12elapsed 30%CPU (0avgtext+0avgdata 841936maxresident)k
56376inputs+43247608outputs (811major+4972914minor)pagefaults 0swaps
"offlineimap  -o " took 8 hours 31 mins 15 secs
This is 8h31m for transferring 12G, which is around 3.1Mbit/s. That is nine times slower than mbsync, almost an order of magnitude! Now that we have a full sync, we can test incremental synchronization. That is also much slower:
===> multitime results
1: sh -c "offlineimap -o   true"
            Mean        Std.Dev.    Min         Median      Max
real        24.639      0.513       23.946      24.526      25.708      
user        23.912      0.473       23.404      23.795      24.947      
sys         1.743       0.105       1.607       1.729       2.002
That is also an order of magnitude slower than mbsync, and significantly slower than what you'd expect from a sync process. ~30 seconds is long enough to make me impatient and distracted; 3 seconds, less so: I can wait and see the results almost immediately.

Integrity check That said: this is still on a gigabit link. It's technically possible that OfflineIMAP performs better than mbsync over a slow link, but I Haven't tested that theory. The OfflineIMAP mail spool is missing quite a few messages as well:
anarcat@angela:~(main)$ find Maildir-offlineimap -type f -type f -a \! -name '.*'   wc -l 
381463
anarcat@angela:~(main)$ find Maildir -type f -type f -a \! -name '.*'   wc -l 
385247
... although that's probably all either new messages or the register folder, so OfflineIMAP might actually be in a better position there. But digging in more, it seems like the actual per-folder diff is fairly similar to mbsync: a few messages missing here and there. Considering OfflineIMAP's instability and poor performance, I have not looked any deeper in those discrepancies.

Other projects to evaluate Those are all the options I have considered, in alphabetical order
  • doveadm-sync: requires dovecot on both ends, can tunnel over SSH, may have performance issues in incremental sync, written in C
  • fdm: fetchmail replacement, IMAP/POP3/stdin/Maildir/mbox,NNTP support, SOCKS support (for Tor), complex rules for delivering to specific mailboxes, adding headers, piping to commands, etc. discarded because no (real) support for keeping mail on the server, and written in C
  • getmail: fetchmail replacement, IMAP/POP3 support, supports incremental runs, classification rules, Python
  • interimap: syncs two IMAP servers, apparently faster than doveadm and offlineimap, but requires running an IMAP server locally, Perl
  • isync/mbsync: TLS client certs and SSH tunnels, fast, incremental, IMAP/POP/Maildir support, multiple mailbox, trash and recursion support, and generally has good words from multiple Debian and notmuch people (Arch tutorial), written in C, review above
  • mail-sync: notify support, happens over any piped transport (e.g. ssh), diff/patch system, requires binary on both ends, mentions UUCP in the manpage, mentions rsmtp which is a nice name for rsendmail. not evaluated because it seems awfully complex to setup, Haskell
  • nncp: treat the local spool as another mail server, not really compatible with my "multiple clients" setup, Golang
  • offlineimap3: requires IMAP, used the py2 version in the past, might just still work, first sync painful (IIRC), ways to tunnel over SSH, review above, Python
Most projects were not evaluated due to lack of time.

Conclusion I'm now using mbsync to sync my mail. I'm a little disappointed by the synchronisation times over the slow link, but I guess that's on par for the course if we use IMAP. We are bound by the network speed much more than with custom protocols. I'm also worried about the C implementation and the crashes I have witnessed, but I am encouraged by the fast upstream response. Time will tell if I will stick with that setup. I'm certainly curious about the promises of interimap and mail-sync, but I have ran out of time on this project.

6 November 2021

Vincent Bernat: How to rsync files between two remotes?

scp -3 can copy files between two remote hosts through localhost. This comes in handy when the two servers cannot communicate directly or if they are unable to authenticate one to the other.1 Unfortunately, rsync does not support such a feature. Here is a trick to emulate the behavior of scp -3 with SSH tunnels. When syncing with a remote host, rsync invokes ssh to spawn a remote rsync --server process. It interacts with it through its standard input and output. The idea is to recreate the same setup using SSH tunnels and socat, a versatile tool to establish bidirectional data transfers. The first step is to connect to the source server and ask rsync the command-line to spawn the remote rsync --server process. The -e flag overrides the command to use to get a remote shell: instead of ssh, we use echo.
$ ssh web04
$ rsync -e 'sh -c ">&2 echo $@" echo' -aLv /data/. web05:/data/.
web05 rsync --server -vlogDtpre.iLsfxCIvu . /data/.
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(228) [sender=3.2.3]
The second step is to connect to the destination server with local port forwarding. When connecting to the local port 5000, the TCP connection is forwarded through SSH to the remote port 5000 and handled by socat. When receiving the connection, socat spawns the rsync --server command we got at the previous step and connects its standard input and output to the incoming TCP socket.
$ ssh -L 127.0.0.1:5000:127.0.0.1:5000 web05
$ socat tcp-listen:5000,reuseaddr exec:"rsync --server -vlogDtpre.iLsfxCIvu . /data/."
The last step is to connect to the source with remote port forwarding. socat is used in place of a regular SSH connection and connects its standard input and output to a TCP socket connected to the remote port 5000. Thanks to the remote port forwarding, SSH forwards the data to the local port 5000. From there, it is relayed back to the destination, as described in the previous step.
$ ssh -R 127.0.0.1:5000:127.0.0.1:5000 web04
$ rsync -e 'sh -c "socat stdio tcp-connect:127.0.0.1:5000"' -aLv /data/. remote:/data/.
sending incremental file list
haproxy.debian.net/
haproxy.debian.net/dists/buster-backports-1.8/Contents-amd64.bz2
haproxy.debian.net/dists/buster-backports-1.8/Contents-i386.bz2
[ ]
media.luffy.cx/videos/2021-frnog34-jerikan/progressive.mp4
sent 921,719,453 bytes  received 26,939 bytes  7,229,383.47 bytes/sec
total size is 7,526,872,300  speedup is 8.17
This little diagram may help understand how everything fits together:
Diagram showing how all processes are connected together: rsync, socat and ssh
How each process is connected together. Arrows labeled stdio are implemented as two pipes connecting the process to the left to the standard input and output of the process to the right. Don't be fooled by the apparent symmetry!
The rsync manual page prohibits the use of --server. Use this hack at your own risk!
The options --server and --sender are used internally by rsync, and should never be typed by a user under normal circumstances. Some awareness of these options may be needed in certain scenarios, such as when setting up a login that can only run an rsync command. For instance, the support directory of the rsync distribution has an example script named rrsync (for restricted rsync) that can be used with a restricted ssh login.

Addendum I was hoping to get something similar with a one-liner. But this does not work!
$ socat \
>  exec:"ssh web04 rsync --server --sender -vlLogDtpre.iLsfxCIvu . /data/." \
>  exec:"ssh web05 rsync --server -vlogDtpre.iLsfxCIvu /data/. /data/." \
over-long vstring received (511 > 255)
over-long vstring received (511 > 255)
rsync error: requested action not supported (code 4) at compat.c(387) [sender=3.2.3]
rsync error: requested action not supported (code 4) at compat.c(387) [Receiver=3.2.3]
socat[878291] E waitpid(): child 878292 exited with status 4
socat[878291] E waitpid(): child 878293 exited with status 4

  1. And SSH agent forwarding is dangerous. Don t use it if you can.

9 October 2021

Ritesh Raj Sarraf: Lotus to Lily

The Lotus story so far My very first experience with water flowering plants was pretty good. I learnt a good deal of things; from setting up the pond, germinating the lotus seeds, setting up the right soil, witnessing the growth of the lotus plant, fish eco-system to take care of the pond. Overall, a lot of things learnt. But I couldn t succeed in getting the Lotus flower. A lot many reasons. The granite container developed some leakage, which I had to fix by emptying it, which might have caused some shock to the lotus. But more than that, in my understanding, the reason for not being able to flower the lotus, was the amount of sunlight. From what I have learned, these plants need a minimum of 6-8 hrs of sunlight to really give you with the flowering result, whereas the setup of my pond was on the ground with hardly 3-4 hrs of sun. And that too, with all the plants growing, resulted in indirect sunlight.

Lotus to Lily For my new setup, I chose a large oval container. And this one, I placed on my terrace, carefully choosing a spot where it d get 6-8 hrs of very bright sun on usual days. Other than that, the rest of the setup is pretty similar to my previous setup in the garden. Guppies, Solar Water Fountain etc.
Initial lily pond setup
Initial lily pond setup
The good thing about the terrace is that the setup gets ample amount of sun. You can see that in the picture above, with the amount of algae that has been formed. Something that is vital for the plant s ecosystem. I must thank my wonderful neighbor who kindly shared a sapling from their lily plant. They already had had success with flowering the lily. So I had high hopes to see the day come when I d be happy to write down my experience in this blog post. Though, a lot of patience is needed. I got the lily some time in January this year. And it blossomed now, in October. So, here s me sharing my happiness here, in particular order of how I documented the process.
Monday morning greeted with a blossomed lily
Monday morning greeted with a blossomed lily
Lily Blossom Closeup
Lily Blossom Closeup
Beautiful water reflection
Beautiful water reflection

Dawn to Dusk The other thing that I learned in this whole lily episode is that the flower goes back to sleeping at dusk. And back to flowering again at dawn. There s so much to learn in the surrounding, only if you spare some time to the little things with mother nature.
Lily status at dusk
Lily status at dusk
Lily the next day
Lily the next day
Not sure how long this phenomenon is to last, but overall witnessing this whole process has been mesmerizing. This past week has been great.

3 October 2021

Ritesh Raj Sarraf: Human Society

In my past, I ve had experiences that have had me thinking. My experiences have been mostly in the South Asian Indian Sub-Continent, so may not be fair to generalize it.

7 July 2021

Ritesh Raj Sarraf: Insect Camouflage Plant

I was quite impressed by the ability of this insect; yet to be. The way it has camouflaged itself is mesmerizing. I ll let the video do the talking as this one is going to be difficult to express in words.

29 June 2021

Ritesh Raj Sarraf: Plant Territorial Behavior

This blog post is about my observations of some of the plants in my home garden. While still a n00b on the subject, these notes are my observations and experiences over days, weeks and months. Thankfully, with the capability to take frequent pictures, it has been easy to do an assessment and generate a report of some of these amazing behaviors of plants, in an easy timeline order; all thanks to the EXIF data embedded. This has very helpfully allowed me to record my, otherwise minor observations, into great detail; and make some sense out of it by correlating the data over time. It is an emotional experience. You see, plants are amazing. When I sow a sapling, water it, feed it, watch it grow, prune it, medicate it, and what not; I build up affection towards it. Though, at the same time, to me it is a strict relationship, not too attached; as in it doesn t hurt to uproot a plant if there is a good reason. But still, I find some sort of association to it. With plants around, it feels I have a lot of lives around me. All prospering, communicating, sharing. And communicate they do. What is needed is just the right language to observe and absorb their signals and decipher what they are trying to say.

Devastation How in this world, when you are caring for your plants, can it transform:

From This
Healthy Mulberry Plant
Healthy Mulberry Plant
Healthy Mulberry Plant
Healthy Mulberry Plant
Healthy Bael Plant
Healthy Bael Plant

To This
Dead Mulberry Plant
Dead Mulberry Plant
Very Sick Bael Plant
Very Sick Bael Plant
With emotions involved, this can be an unpleasant experience. Bael is a dear plant to me. The plant as a whole has religious values (Shiva). As well, its fruits have lots of health benefits, especially for the intestines. Its leaves have a lot of medicinal properties. When I planted the Bael, there were a lot of emotions that went along. On the other hand, the Mulberry is something I put in with a lot of enthusiasm. Mulberries are now rare to find, especially in urban locations. For one, they have a very short shelf life; But more than that, the way lifestyles are heading towards, I was always worried if my children would ever have a day to see and taste these fruits. The mulberry that I planted, yielded twice; once very soon when I had planted and second, before it died. Infact, it died while during its second yield phase. It was quite saddening to see that happen. It made me wonder why it happened. I had been caring for the plants fairly well. Watering them timely, feeding them the right amount of nutrients. They were getting a good amount of sun. But still their health was deteriorating. And then the demise of the Mulberry. Many thoughts hit my mind. I consulted the claimed experts in the domain, the maali, the gardener. I got a very vague answer; there must be termites in the soil. It didn t make much sense to me. I mean if there are termites they d hit day one. They won t sleep for months and just wake up one fine day and start attacking the roots of particular plants; not all. I wasn t convinced with the termite theory; But still, give the expert the blind hand, I went with his word. When my mulberry was dead, I dug its roots. Looking for proof, to see if there were any termites, I uprooted it. But I couldn t find any trace of termites. And the plant next to it was perfectly healthy and blossoming. So I was convinced that it wasn t the termite but something else. But else what ? I still didn t have an answer to that.

Thinking The Corona pandemic had embraced and there was a lot to worry, and worry not any, if you change the perspective. With plants around in my home, and our close engagement with them, and the helplessness that I felt after seeking help from the experts, it was time again; to build up some knowledge on the subject. But how ? How do you go about a subject you have not much clue about ? A subject which has always been around in the surrounding but very seldom have I dedicated focused thought to it. To be honest, the initial thought of diving on the subject made me clueless. I had no idea where to begin with. But, so, as has been my past history, I chose to take it as a curiosity. I gathered some books, skimmed through a couple of pages. Majority of the books I got hold of were about DIYs and How to do Home Gardening types. It was a decent introduction to a novice but my topic of curiosity was different. Thankfully, with the Internet, and YouTube in particular, a lot of good stuff is available as documentary videos. While going through some, I came across a video which mentioned about carnivore plants. Like, for example, this one.
Carnivore Plant
Carnivore Plant
This got me thinking that there could be a possibility of something similar, that did the fate to my Mulberry plant. But who did it ? And how to dive further on this suspicion ? And most of all, if that thought of possibility was actually the reality. Or was I just hitting in the dark ?

Beginning To put some perspective, here s how it started. When we moved into our home, the gardener put in a couple of plants stock, as part of the property handover. Now, I don t exactly recollect the name of the plants that came in stock, neither English nor Hindi; But at my neighbor s place, the plant is still there. Here are some of the pictures of this beauty. But don t just go by the looks as looks can be deceiving
Dominating Plant
Dominating Plant
Dominating Plant
Dominating Plant
Dominating Plant
Dominating Plant
We hadn t put any serious thought about the plants we were offered by the gardener. After all, we never had ever thought of any mishap either.

Plants we planted Apart from what was offered by the builder/gardener as part of the property handover, in over the next 6 months of we moving in, I planted 3 tree type plants.
  1. Mulberry
  2. Bael
  3. Rudraksha
The Mulberry, as I have described so far, died a tragic death. Bael, on the other hand, fought hard. But very little did we know that the plant was struggling the fight. Our impression was that we must have been given a bad breed of the plant. Or maybe the termite theory had some truth. For the Rudraksha plant, the growth was slow. This was the very first time I had seen a Rudraksha plant, so I had no clue of what its growth rate could be, and what to expect out of it. I wasn t sure if the local climate suited the plant. A quick search showed no objections to the plant in the local climate, but that was it. So my theory has been to put in the plant, and observe. Here s what my Rudraksha plant looked like during the initial days/weeks of its settlement
Rudraksha Plant
Rudraksha Plant

The Hint Days passed and so on. Not much had progressed in gathering information. The plant s health was usual; deteriorating at a slow pace. On day, thinking of the documentaries I had been watching, it hit my mind about the plant behavior.
  • Plants can be Carnivores.
  • Plants can be Aggressive.
  • Plants can be Invaders.
  • Plants can be Territorial.
There are many plants where their aggression can be witnessed with bare human eyes. Like creepers. Some of them are good at spreading tentacles, grabbing onto other plants' stems and branches and spread above it. This was my hint from the documentaries. That s one of the many ways plants set their dominance. That is what hit my mind that if plants are aggressive on the out, underneath the soil, they should be having similar behavior. I mean, what we see as humans is just a part of the actual plant. More than half of the actual plant is usually underneath the soil, in most plants. So there s a high chance to get more information out, if you dig the soil and look the roots.

The Digging As I mentioned earlier, I do establish bindings, emotions and attachments. But not much usually comes in the way to curiosity. To dig further on the theory that the problem was elsewhere, with within the plants ecosystem, we needed to pick on another subject - the plant. And the plant we chose was the plant which was planted in the initial offering to us, when we moved into our home. It was the same plant breed which was neighboring all our newly planted trees: Rudraksha, Mulberry and Bael. If you look closely into the pictures above of these plants, you ll notice the stem of another plant, the Territorial Dominator, is close-by to these 3 plants. That s because the gardener put in a good number of them to get his action item complete. So we chose to dig and uproot one of those plant to start with. Now, while they may look gentle on the outside, with nice red colored tiny flowers, these plants were giants underneath. Their roots were huge. It took some sweat shredding to single-handedly remove them.
Dominating Plant Uprooted
Dominating Plant Uprooted
Dominating Plant Uprooted
Dominating Plant Uprooted
Dominating Plant Uprooted
Dominating Plant Uprooted
Dominating Plant Uprooted
Dominating Plant Uprooted

Today is brighter

Bael I ll let the pictures do the initial talking today.
Healthy Bael
Healthy Bael
Healthy Bael
Healthy Bael
Healthy Bael
Healthy Bael
Healthy Bael
Healthy Bael
Healthy Bael
Healthy Bael
The above ones are pictures of the same Bael plant, which had struggled to live, for almost 14 months. Back then, this plant was starved of its resources. It was dying a slow death out of starvation. After we uprooted the other dominant species, the Bael has recovered and has regained its charm. In the pictures above of the Bael plant, you can clearly mark out the difference in its stem. The dark colored one is from its months of struggle, while the bright green is from now where it is well nourished and regained its health.

Mulberry As for the Mulberry, I couldn t save it. But I later managed to get another one. But it turns out I didn t take good, full length pictures of the new mulberry when I planted. The only picture I have is this:
Second Mulberry Plant
Second Mulberry Plant
I recollect when I brought it home, it was around 1 - 1.5 feet in length. This is where I have it today: Majestically standing, 12 feet and counting
Second Mulberry Plant 7 feet tall
Second Mulberry Plant 7 feet tall

Rudraksha Then:
Rudraksha Plant
Rudraksha Plant
And now: I feel quite happy about
Rudraksha Plant
Rudraksha Plant
Rudraksha Plant
Rudraksha Plant
Rudraksha Plant
Rudraksha Plant
All these plants are on the very same soil with the very same care taker. What has changed is my experience and learning.

Plant Co-Existence Plant co-existence is a difficult topic. My knowledge on plants is very limited in general, and co-existence is something tricky, unexplored, at times invisible (when underneath the soil). So it is a difficult topic. So far, what I ve learnt is purely observations, experiences and hints from the documentaries. There surely are many many plants that co-exist very well. A good example is my Bael plant itself, which is healthily co-sharing its space with 2 other Croton plants. Same goes for the Rudraksha, which has a close-by neighbor in an Adenium and an Allamanda. The plant world is mesmerizing. How they behave, communicate and many many more signs. There s so much to observe, learn, explore and document. I hope to have more such observations and experiences to share

25 May 2021

Vincent Bernat: Jerikan+Ansible: a configuration management system for network

There are many resources for network automation with Ansible. Most of them only expose the first steps or limit themselves to a narrow scope. They give no clue on how to expand from that. Real network environments may be large, versatile, heterogeneous, and filled with exceptions. The lack of real-world examples for Ansible deployments, unlike Puppet and SaltStack, leads many teams to build brittle and incomplete automation solutions. We have released under an open-source license our attempt to tackle this problem: Here is a quick demo to configure a new peering:
This work is the collective effort of C dric Hasco t, Jean-Christophe Legatte, Lo c Pailhas, S bastien Hurtel, Tchadel Icard, and Vincent Bernat. We are the network team of Blade, a French company operating Shadow, a cloud-computing product. In May 2021, our company was bought by Octave Klaba and the infrastructure is being transferred to OVHcloud, saving Shadow as a product, but making our team redundant. Our network was around 800 devices, spanning over 10 datacenters with more than 2.5 Tbps of available egress bandwidth. The released material is therefore a substantial example of managing a medium-scale network using Ansible. We have left out the handling of our legacy datacenters to make the final result more readable while keeping enough material to not turn it into a trivial example.

Jerikan The first component is Jerikan. As input, it takes a list of devices, configuration data, templates, and validation scripts. It generates a set of configuration files for each device. Ansible could cover this task, but it has the following limitations:
  • it is slow;
  • errors are difficult to debug;1 and
  • the hierarchy to look up a variable is rigid.
Jerikan inputs and outputs
Jerikan inputs and outputs
If you want to follow the examples, you only need to have Docker and Docker Compose installed. Clone the repository and you are ready!

Source of truth We use YAML files, versioned with Git, as the single source of truth instead of using a database, like NetBox, or a mix of a database and text files. This provides many advantages:
  • anyone can use their preferred text editor;
  • the team prepares changes in branches;
  • the team reviews changes using merge requests;
  • the merge requests expose the changes to the generated configuration files;
  • rollback to a previous state is easy; and
  • it is fast.
The first file is devices.yaml. It contains the device list. The second file is classifier.yaml. It defines a scope for each device. A scope is a set of keys and values. It is used in templates and to look up data associated with a device.
$ ./run-jerikan scope to1-p1.sk1.blade-group.net
continent: apac
environment: prod
groups:
- tor
- tor-bgp
- tor-bgp-compute
host: to1-p1.sk1
location: sk1
member: '1'
model: dell-s4048
os: cumulus
pod: '1'
shorthost: to1-p1
The device name is matched against a list of regular expressions and the scope is extended by the result of each match. For to1-p1.sk1.blade-group.net, the following subset of classifier.yaml defines its scope:
matchers:
  - '^(([^.]*)\..*)\.blade-group\.net':
      environment: prod
      host: '\1'
      shorthost: '\2'
  - '\.(sk1)\.':
      location: '\1'
      continent: apac
  - '^to([12])-[as]?p(\d+)\.':
      member: '\1'
      pod: '\2'
  - '^to[12]-p\d+\.':
      groups:
        - tor
        - tor-bgp
        - tor-bgp-compute
  - '^to[12]-(p ap)\d+\.sk1\.':
      os: cumulus
      model: dell-s4048
The third file is searchpaths.py. It describes which directories to search for a variable. A Python function provides a list of paths to look up in data/ for a given scope. Here is a simplified version:2
def searchpaths(scope):
    paths = [
        "host/ scope[location] / scope[shorthost] ",
        "location/ scope[location] ",
        "os/ scope[os] - scope[model] ",
        "os/ scope[os] ",
        'common'
    ]
    for idx in range(len(paths)):
        try:
            paths[idx] = paths[idx].format(scope=scope)
        except KeyError:
            paths[idx] = None
    return [path for path in paths if path]
With this definition, the data for to1-p1.sk1.blade-group.net is looked up in the following paths:
$ ./run-jerikan scope to1-p1.sk1.blade-group.net
[ ]
Search paths:
  host/sk1/to1-p1
  location/sk1
  os/cumulus-dell-s4048
  os/cumulus
  common
Variables are scoped using a namespace that should be specified when doing a lookup. We use the following ones:
  • system for accounts, DNS, syslog servers,
  • topology for ports, interfaces, IP addresses, subnets,
  • bgp for BGP configuration
  • build for templates and validation scripts
  • apps for application variables
When looking up for a variable in a given namespace, Jerikan looks for a YAML file named after the namespace in each directory in the search paths. For example, if we look up a variable for to1-p1.sk1.blade-group.net in the bgp namespace, the following YAML files are processed: host/sk1/to1-p1/bgp.yaml, location/sk1/bgp.yaml, os/cumulus-dell-s4048/bgp.yaml, os/cumulus/bgp.yaml, and common/bgp.yaml. The search stops at the first match. The schema.yaml file allows us to override this behavior by asking to merge dictionaries and arrays across all matching files. Here is an excerpt of this file for the topology namespace:
system:
  users:
    merge: hash
  sampling:
    merge: hash
  ansible-vars:
    merge: hash
  netbox:
    merge: hash
The last feature of the source of truth is the ability to use Jinja2 templates for keys and values by prefixing them with ~ :
# In data/os/junos/system.yaml
netbox:
  manufacturer: Juniper
  model: "~  model upper  "
# In data/groups/tor-bgp-compute/system.yaml
netbox:
  role: net_tor_gpu_switch
Looking up for netbox in the system namespace for to1-p2.ussfo03.blade-group.net yields the following result:
$ ./run-jerikan scope to1-p2.ussfo03.blade-group.net
continent: us
environment: prod
groups:
- tor
- tor-bgp
- tor-bgp-compute
host: to1-p2.ussfo03
location: ussfo03
member: '1'
model: qfx5110-48s
os: junos
pod: '2'
shorthost: to1-p2
[ ]
Search paths:
[ ]
  groups/tor-bgp-compute
[ ]
  os/junos
  common
$ ./run-jerikan lookup to1-p2.ussfo03.blade-group.net system netbox
manufacturer: Juniper
model: QFX5110-48S
role: net_tor_gpu_switch
This also works for structured data:
# In groups/adm-gateway/topology.yaml
interface-rescue:
  address: "~  lookup('topology', 'addresses').rescue  "
  up:
    - "~ip route add default via   lookup('topology', 'addresses').rescue ipaddr('first_usable')   table rescue"
    - "~ip rule add from   lookup('topology', 'addresses').rescue ipaddr('address')   table rescue priority 10"
# In groups/adm-gateway-sk1/topology.yaml
interfaces:
  ens1f0: "~  lookup('topology', 'interface-rescue')  "
This yields the following result:
$ ./run-jerikan lookup gateway1.sk1.blade-group.net topology interfaces
[ ]
ens1f0:
  address: 121.78.242.10/29
  up:
  - ip route add default via 121.78.242.9 table rescue
  - ip rule add from 121.78.242.10 table rescue priority 10
When putting data in the source of truth, we use the following rules:
  1. Don t repeat yourself.
  2. Put the data in the most specific place without breaking the first rule.
  3. Use templates with parsimony, mostly to help with the previous rules.
  4. Restrict the data model to what is needed for your use case.
The first rule is important. For example, when specifying IP addresses for a point-to-point link, only specify one side and deduce the other value in the templates. The last rule means you do not need to mimic a BGP YANG model to specify BGP peers and policies:
peers:
  transit:
    cogent:
      asn: 174
      remote:
        - 38.140.30.233
        - 2001:550:2:B::1F9:1
      specific-import:
        - name: ATT-US
          as-path: ".*7018$"
          lp-delta: 50
  ix-sfmix:
    rs-sfmix:
      monitored: true
      asn: 63055
      remote:
        - 206.197.187.253
        - 206.197.187.254
        - 2001:504:30::ba06:3055:1
        - 2001:504:30::ba06:3055:2
    blizzard:
      asn: 57976
      remote:
        - 206.197.187.42
        - 2001:504:30::ba05:7976:1
      irr: AS-BLIZZARD

Templates The list of templates to compile for each device is stored in the source of truth, under the build namespace:
$ ./run-jerikan lookup edge1.ussfo03.blade-group.net build templates
data.yaml: data.j2
config.txt: junos/main.j2
config-base.txt: junos/base.j2
config-irr.txt: junos/irr.j2
$ ./run-jerikan lookup to1-p1.ussfo03.blade-group.net build templates
data.yaml: data.j2
config.txt: cumulus/main.j2
frr.conf: cumulus/frr.j2
interfaces.conf: cumulus/interfaces.j2
ports.conf: cumulus/ports.j2
dhcpd.conf: cumulus/dhcp.j2
default-isc-dhcp: cumulus/default-isc-dhcp.j2
authorized_keys: cumulus/authorized-keys.j2
motd: linux/motd.j2
acl.rules: cumulus/acl.j2
rsyslog.conf: cumulus/rsyslog.conf.j2
Templates are using Jinja2. This is the same engine used in Ansible. Jerikan ships some custom filters but also reuse some of the useful filters from Ansible, notably ipaddr. Here is an excerpt of templates/junos/base.j2 to configure DNS and NTP servers on Juniper devices:
system  
  ntp  
 % for ntp in lookup("system", "ntp") % 
    server   ntp  ;
 % endfor % 
   
  name-server  
 % for dns in lookup("system", "dns") % 
      dns  ;
 % endfor % 
   
 
The equivalent template for Cisco IOS-XR is:
 % for dns in lookup('system', 'dns') % 
domain vrf VRF-MANAGEMENT name-server   dns  
 % endfor % 
!
 % for syslog in lookup('system', 'syslog') % 
logging   syslog   vrf VRF-MANAGEMENT
 % endfor % 
!
There are three helper functions provided:
  • devices() returns the list of devices matching a set of conditions on the scope. For example, devices("location==ussfo03", "groups==tor-bgp") returns the list of devices in San Francisco in the tor-bgp group. You can also omit the operator if you want the specified value to be equal to the one in the local scope. For example, devices("location") returns devices in the current location.
  • lookup() does a key lookup. It takes the namespace, the key, and optionally, a device name. If not provided, the current device is assumed.
  • scope() returns the scope of the provided device.
Here is how you would define iBGP sessions between edge devices in the same location:
 % for neighbor in devices("location", "groups==edge") if neighbor != device % 
   % for address in lookup("topology", "addresses", neighbor).loopback tolist % 
protocols bgp group IPV  address ipv  -EDGES-IBGP  
  neighbor   address    
    description "IPv  address ipv  : iBGP to   neighbor  ";
   
 
   % endfor % 
 % endfor % 
We also have a global key-value store to save information to be reused in another template or device. This is quite useful to automatically build DNS records. First, capture the IP address inserted into a template with store() as a filter:
interface Loopback0
 description 'Loopback:'
  % for address in lookup('topology', 'addresses').loopback tolist % 
 ipv  address ipv   address   address store('addresses', 'Loopback0') ipaddr('cidr')  
  % endfor % 
!
Then, reuse it later to build DNS records by iterating over store():4
 % for device, ip, interface in store('addresses') % 
   % set interface = interface replace('/', '-') replace('.', '-') replace(':', '-') % 
   % set name = ' . '.format(interface lower, device) % 
  name  . IN   'A' if ip ipv4 else 'AAAA'     ip ipaddr('address')  
 % endfor % 
Templates are compiled locally with ./run-jerikan build. The --limit argument restricts the devices to generate configuration files for. Build is not done in parallel because a template may depend on the data collected by another template. Currently, it takes 1 minute to compile around 3000 files spanning over 800 devices.
Jerikan outputs when building templates
Output of Jerikan after building configuration files for six devices
When an error occurs, a detailed traceback is displayed, including the template name, the line number and the value of all visible variables. This is a major time-saver compared to Ansible!
templates/opengear/config.j2:15: in top-level template code
    config.interfaces.  interface  .netmask   adddress   ipaddr("netmask")  
        continent  = 'us'
        device     = 'con1-ag2.ussfo03.blade-group.net'
        environment = 'prod'
        host       = 'con1-ag2.ussfo03'
        infos      =  'address': '172.30.24.19/21' 
        interface  = 'wan'
        location   = 'ussfo03'
        loop       = <LoopContext 1/2>
        member     = '2'
        model      = 'cm7132-2-dac'
        os         = 'opengear'
        shorthost  = 'con1-ag2'
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
value = JerkianUndefined, query = 'netmask', version = False, alias = 'ipaddr'
[ ]
        # Check if value is a list and parse each element
        if isinstance(value, (list, tuple, types.GeneratorType)):
            _ret = [ipaddr(element, str(query), version) for element in value]
            return [item for item in _ret if item]
>       elif not value or value is True:
E       jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'adddress'
We don t have general-purpose rules when writing templates. Like for the source of truth, there is no need to create generic templates able to produce any BGP configuration. There is a balance to be found between readability and avoiding duplication. Templates can become scary and complex: sometimes, it s better to write a filter or a function in jerikan/jinja.py. Mastering Jinja2 is a good investment. Take time to browse through our templates as some of them show interesting features.

Checks Optionally, each configuration file can be validated by a script in the checks/ directory. Jerikan looks up the key checks in the build namespace to know which checks to run:
$ ./run-jerikan lookup edge1.ussfo03.blade-group.net build checks
- description: Juniper configuration file syntax check
  script: checks/junoser
  cache:
    input: config.txt
    output: config-set.txt
- description: check YAML data
  script: checks/data.yaml
  cache: data.yaml
In the above example, checks/junoser is executed if there is a change to the generated config.txt file. It also outputs a transformed version of the configuration file which is easier to understand when using diff. Junoser checks a Junos configuration file using Juniper s XML schema definition for Netconf.5 On error, Jerikan displays:
jerikan/build.py:127: RuntimeError
-------------- Captured syntax check with Junoser call --------------
P: checks/junoser edge2.ussfo03.blade-group.net
C: /app/jerikan
O:
E: Invalid syntax:  set system syslog archive size 10m files 10 word-readable
S: 1

Integration into GitLab CI The next step is to compile the templates using a CI. As we are using GitLab, Jerikan ships with a .gitlab-ci.yml file. When we need to make a change, we create a dedicated branch and a merge request. GitLab compiles the templates using the same environment we use on our laptops and store them as an artifact.
Merge requests in GitLab for a change
Merge request to add a new port in USSFO03. The templates were compiled successfully but approval from another team member is still required to merge.
Before approving the merge request, another team member looks at the changes in data and templates but also the differences for the generated configuration files:
Differences for generated configuration files
The change configures a port on the Juniper device, adds records to DNS, and updates NetBox with the new IP addresses (not shown).

Ansible After Jerikan has built the configuration files, Ansible takes over. It is also packaged as a Docker image to avoid the trouble to maintain the right Python virtual environment and ensure everyone is using the same versions.

Inventory Jerikan has generated an inventory file. It contains all the managed devices, the variables defined for each of them and the groups converted to Ansible groups:
ob1-n1.sk1.blade-group.net ansible_host=172.29.15.12 ansible_user=blade ansible_connection=network_cli ansible_network_os=ios
ob2-n1.sk1.blade-group.net ansible_host=172.29.15.13 ansible_user=blade ansible_connection=network_cli ansible_network_os=ios
ob1-n1.ussfo03.blade-group.net ansible_host=172.29.15.12 ansible_user=blade ansible_connection=network_cli ansible_network_os=ios
none ansible_connection=local
[oob]
ob1-n1.sk1.blade-group.net
ob2-n1.sk1.blade-group.net
ob1-n1.ussfo03.blade-group.net
[os-ios]
ob1-n1.sk1.blade-group.net
ob2-n1.sk1.blade-group.net
ob1-n1.ussfo03.blade-group.net
[model-c2960s]
ob1-n1.sk1.blade-group.net
ob2-n1.sk1.blade-group.net
ob1-n1.ussfo03.blade-group.net
[location-sk1]
ob1-n1.sk1.blade-group.net
ob2-n1.sk1.blade-group.net
[location-ussfo03]
ob1-n1.ussfo03.blade-group.net
[in-sync]
ob1-n1.sk1.blade-group.net
ob2-n1.sk1.blade-group.net
ob1-n1.ussfo03.blade-group.net
none
in-sync is a special group for devices which configuration should match the golden configuration. Daily and unattended, Ansible should be able to push configurations to this group. The mid-term goal is to cover all devices. none is a special device for tasks not related to a specific host. This includes synchronizing NetBox, IRR objects, and the DNS, updating the RPKI, and building the geofeed files.

Playbook We use a single playbook for all devices. It is described in the ansible/playbooks/site.yaml file. Here is a shortened version:
- hosts: adm-gateway:!done
  strategy: mitogen_linear
  roles:
    - blade.linux
    - blade.adm-gateway
    - done
- hosts: os-linux:!done
  strategy: mitogen_linear
  roles:
    - blade.linux
    - done
- hosts: os-junos:!done
  gather_facts: false
  roles:
    - blade.junos
    - done
- hosts: os-opengear:!done
  gather_facts: false
  roles:
    - blade.opengear
    - done
- hosts: none:!done
  gather_facts: false
  roles:
    - blade.none
    - done
A host executes only one of the play. For example, a Junos device executes the blade.junos role. Once a play has been executed, the device is added to the done group and the other plays are skipped. The playbook can be executed with the configuration files generated by the GitLab CI using the ./run-ansible-gitlab command. This is a wrapper around Docker and the ansible-playbook command and it accepts the same arguments. To deploy the configuration on the edge devices for the SK1 datacenter in check mode, we use:
$ ./run-ansible-gitlab playbooks/site.yaml --limit='edge:&location-sk1' --diff --check
[ ]
PLAY RECAP *************************************************************
edge1.sk1.blade-group.net  : ok=6    changed=0    unreachable=0    failed=0    skipped=3    rescued=0    ignored=0
edge2.sk1.blade-group.net  : ok=5    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0
We have some rules when writing roles:
  • --check must detect if a change is needed;
  • --diff must provide a visualization of the planned changes;
  • --check and --diff must not display anything if there is nothing to change;
  • writing a custom module tailored to our needs is a valid solution;
  • the whole device configuration is managed;6
  • secrets must be stored in Vault;
  • templates should be avoided as we have Jerikan for that; and
  • avoid duplication and reuse tasks.7
We avoid using collections from Ansible Galaxy, the exception being collections to connect and interact with vendor devices, like cisco.iosxr collection. The quality of Ansible Galaxy collections is quite random and it is an additional maintenance burden. It seems better to write roles tailored to our needs. The collections we use are in ci/ansible/ansible-galaxy.yaml. We use Mitogen to get a 10 speedup on Ansible executions on Linux hosts. We also have a few playbooks for operational purpose: upgrading the OS version, isolate an edge router, etc. We were also planning on how to add operational checks in roles: are all the BGP sessions up? They could have been used to validate a deployment and rollback if there is an issue. Currently, our playbooks are run from our laptops. To keep tabs, we are using ARA. A weekly dry-run on devices in the in-sync group also provides a dashboard on which devices we need to run Ansible on.

Configuration data and templates Jerikan ships with pre-populated data and templates matching the configuration of our USSFO03 and SK1 datacenters. They do not exist anymore but, we promise, all this was used in production back in the days!
Network architecture for Blade datacenter
The latest iteration of our network infrastructure for SK1, USSFO03, and future data centers. The production network is using BGPttH using a spine-leaf fabric. The out-of-band network is using a simple L2 design, using the spanning tree protocol, as well as a set of console servers.
Notably, you can find the configuration for:
our edge routers
Some are running on Junos, like edge2.ussfo03, the others on IOS-XR, like edge1.sk1. The implemented functionalities are similar in both cases and we could swap one for the other. It includes the BGP configuration for transits, peerings, and IX as well as the associated BGP policies. PeeringDB is queried to get the maximum number of prefixes to accept for peerings. bgpq3 and a containerized IRRd help to filter received routes. A firewall is added to protect the routing engine. Both IPv4 and IPv6 are configured.
our BGP-based fabric
BGP is used inside the datacenter8 and is extended on bare-metal hosts. The configuration is automatically derived from the device location and the port number.9 Top-of-the-rack devices are using passive BGP sessions for ports towards servers. They are also serving a provisioning network to let them boot using DHCP and PXE. They also act as a DHCP server. The design is multivendor. Some devices are running Cumulus Linux, like to1-p1.ussfo03, while some others are running Junos, like to1-p2.ussfo03.
our out-of-band fabric
We are using Cisco Catalyst 2960 switches to build an L2 out-of-band network. To provide redundancy and saving a few bucks on wiring, we build small loops and run the spanning-tree protocol. See ob1-p1.ussfo03. It is redundantly connected to our gateway servers. We also use OpenGear devices for console access. See con1-n1.ussfo03
our administrative gateways
These Linux servers have multiple purposes: SSH jump boxes, rescue connection, direct access to the out-of-band network, zero-touch provisioning of network devices,10 Internet access for management flows, centralization of the console servers using Conserver, and API for autoconfiguration of BGP sessions for bare-metal servers. They are the first servers installed in a new datacenter and are used to provision everything else. Check both the generated files and the associated Ansible tasks.

  1. Ansible does not even provide a line number when there is an error in a template. You may need to find the problem by bisecting.
    $ ansible --version
    ansible 2.10.8
    [ ]
    $ cat test.j2
    Hello   name  !
    $ ansible all -i localhost, \
    >  --connection=local \
    >  -m template \
    >  -a "src=test.j2 dest=test.txt"
    localhost   FAILED! =>  
        "changed": false,
        "msg": "AnsibleUndefinedVariable: 'name' is undefined"
     
    
  2. You may recognize the same concepts as in Hiera, the hierarchical key-value store from Puppet. At first, we were using Jerakia, a similar independent store exposing an HTTP REST interface. However, the lookup overhead is too large for our use. Jerikan implements the same functionality within a Python function.
  3. The list of available filters is mangled inside jerikan/jinja.py. This is a remain of the fact we do not maintain Jerikan as a standalone software.
  4. This is a bit confusing: we have a store() filter and a store() function. With Jinja2, filters and functions live in two different namespaces.
  5. We are using a fork with some modifications to be able to validate our configurations and exposing an HTTP service to reduce the time spent on each configuration check.
  6. There is a trend in network automation to automate a configuration subset, for example by having a playbook to create a new BGP session. We believe this is wrong: with time, your configuration will get out-of-sync with its expected state, notably hand-made changes will be left undetected.
  7. See ansible/roles/blade.linux/tasks/firewall.yaml and ansible/roles/blade.linux/tasks/interfaces.yaml. They are meant to be called when needed, using import_role.
  8. We also have some datacenters using BGP EVPN VXLAN at medium-scale using Juniper devices. As they are still in production today, we didn t include this feature but we may publish it in the future.
  9. In retrospect, this may not be a good idea unless you are pretty sure everything is uniform (number of switches for each layer, number of ports). This was not our case. We now think it is a better idea to assign a prefix to each device and write it in the source of truth.
  10. Non-linux based devices are upgraded and configured unattended. Cumulus Linux devices are automatically upgraded on install but the final configuration has to be pushed using Ansible: we didn t want to duplicate the configuration process using another tool.

24 May 2021

Ritesh Raj Sarraf: Kget Goodness

Why is it so hard to have a proper download manager in today s day ? We had it in the previous decade. Or do tech giants self-proclaim that the world lives only in their cloud. At one point, there used to be great download managers for all major web browsers, either in-built, or external. Then came the latest trend with Chrome and Firefox, where they make it difficult to have an external download manager work proper. On either one s extension store, I find it difficult to see a proper download manager. And I wonder what Google was consuming when making Google Takeout. They provide you with a specially crafted link for the downloads, and if for any reason your network gets lost, you are forced to redownload your takeout again. And with the pathetic download manager in Chrome and Firefox, an average Joe user would run in loops. THANK YOU GOOG. While I can understand the reason behind making the links time bound and invalidating them as soon as the time expires. But, enlighten your PMs that the world is not even. Not everyone has the same internet bandwidth. And admit that the download manager in your browser is crap. And then devise a better workflow and a product.

KGET Thankfully, there s Kget in the KDE Suite. While there s insanity elsewhere, luckily the kget tool still can do nice things. nuff said.
KGET to the rescue
KGET to the rescue

22 May 2021

Ritesh Raj Sarraf: Setting Up a Secure Webapp

As a person who prefers full access to data in the simplest format, while at the same time having it useful with latest technologies, my quest for trying things out is an ongoing activity. Earlier, I blogged about my needs of collating news feeds in a simple format, readily accessible offline, while still being useful and aligned with the modern paradigm. In today s age, the other common aspect of our life, is digitization of moments. With the advent of great technology and affordable economics, the world now has access to great devices to capture moments in digital form. Most people, these days, are equipped with smart devices, like mobile phones, that come with pretty good image capturing devices. Our lives, our societies, how we interact; a lot of it is now built around the assumption of smart devices and digital services. A lot of good things have happened of it. We are now able to send messages to people, securely, in a matter of seconds. We are now able to capture moments, which otherwise we d often miss; all thanks to devices like smart mobile phones that most of all carry almost all along with us. In this regard, we are quite indebted to the technology revolution that has improved life styles all over. Of course, like every coin has a flip side, thus too, too much of obsession with digital life, also has its drawbacks. A simple reminder to self should always be about what a human life is designed for and try to live by its rules.

Moments Given the tools general availability, we are all more used to capturing moments. Once upon a time, taking a photo meant going to a photo studio. Then came a generation where we d have personal devices with roll films, to which we captured the moments. And then get it processed from a photo developing service provider. And that is when we d know which all moments came out correct and which ones were spoilt. In such cases, there s always the wish I could have that moment again. Thankfully, now, in our current generation, we have the liberty to take pictures and validate them almost immediately. We also have the flexibility to take multiple shots of the moment and do the filtering later. There s also many intelligent and invasive services, all mostly provided free, to help you organize these moments; at a small cost of keeping an eye on your life activity. In a summary, mankind, now, generates lots and lots of data. So much lot that now even mammoths like Google are forced to make a call whether it is more profitable to sniff user activity while providing them a service for free, or ask people to pay otherwise too.

Privacy Many of us are wary of the amount of personal data we generate, which is our asset; and how the big tech giants want a piece of it in the name of free services. And in such quest, free aside, there s always a lookout for privacy savvy tools that can help us draw a bold line in between Public and Private

Google Photos Google Photos has been a great tool. The way it organizes, siphons and present your data is simply amazing. No good would have been the photos taken, if they weren t easily searchable, organized, annotated, and presented. But Google Photos is proprietary and invasive. The amount of information they have access to should be a concern to all people. For services that invade so much, the world needs Free Services.

PhotoPrism I only came across this project around a month ago, while the project has been around since 2019. So far, in my exploration, this is one of the best Free Software tools to manage and organize my digital photos. The other favorite tool that I regularly depend on is Digikam and it still is a gem. In a gist, PhotoPrism aims to be the equivalent of Google Photos, for the privacy savvy people. As of date, it has a decent list of features available. And for some of the missing ones, I ve come up with a fairly okay workflow with other tools, which is one of the reasons of this blog post. PhotoPrism is a web app, written in the Go Programming Language. Its layout and workflow as a Photo Management application is similar to Google Photos. It is very performant compared to other applications. Since it is a Progressive Web App, access through a Laptop, Tablet or Mobile is almost the same. It uses Google TensorFlow for some of its features and thus you feel like using Google Photos in some regard. Some shortcomings and workarounds
  • Facial Recognition: To date, facial recognition is a planned feature. But this was easy to tackle given that Digikam has pretty solid facial recognition. So I use Digikam to detect, recognize and annotate faces. And then PhotoPrism is happy enough to use those annotations and present relevant data.
  • HTTP Web App: The upstream project has done a good job of making use of Docker container technology in presenting this web app solution. A software solution, the equivalent of a Google Photos, does need a heavyweight config. In free software, it is all about reusing available tools. PhotoPrism makes use of tools like MySQL, Vue.js, TensorFlow, GoLang, GoLibs and strives to provide a single package solution, all thanks to Docker containers.
In its current offering, PhotoPrism is run as a HTTP Web App. I wanted to have an added layer of security on top of it. And thus run an nginx reverse proxy on top of it. Along with it, I run the proxy service on HTTPS, thus making all traffic from clients to the proxy encrypted. I also wanted an added layer of HTTP AUTH on top, so I explored some options and finally settled down with http-auth-digest Also, in the current implementation, PhotoPrism doesn t have a strong notion of normal and private photos in its data organization. I wanted normal photos available under a standard auth realm and private photos under a different realm. And along with a different realm, I also wanted some added security directives for it. So far, it looks like I ve put together a decent solution with the help of nginx.
  • First of all, since the port from the host is forwarded to the docker instance, that needed to be controlled. Instead of the default of listening on all interfaces, I changed it to loopback only. Because my primary and only interface is going to be the nginx reverse proxy
  • Setup nginx with a self-signed certificate to have all communication encrypted.
  • Setup nginx as a reverse proxy to talk to PhotoPrism.
    server  
       listen 80;
       listen [::]:80;
       server_name lenovo;
       return 302 https://$host$request_uri;
       rewrite ^ https://$http_host$request_uri? permanent;    # force redirect http to https
        server_tokens off;
     
    server  
    listen 443 ssl http2 default_server;
    listen [::]:443 ssl http2 default_server;
    ssl_certificate_key /etc/ssl/private/ssl-cert-snakeoil.key;
    ssl_certificate /etc/ssl/certs/ssl-cert-snakeoil.pem;
    server_tokens off;
    server_name lenovo;
    client_max_body_size 500M;
        auth_digest_expires 900s;
        auth_digest_evasion_time 5s;
        auth_digest_replays 500;
    location /private  
        auth_digest 'abc';
        auth_digest_user_file /etc/nginx/htdigest;
        auth_digest_expires 900s;
        auth_digest_evasion_time 5s;
        auth_digest_replays 500;
        proxy_pass http://localhost:2342/private;
     
    location /discover  
        auth_digest 'abc';
        auth_digest_user_file /etc/nginx/htdigest;
        auth_digest_expires 900s;
        auth_digest_evasion_time 5s;
        auth_digest_replays 500;
        proxy_pass http://localhost:2342/discover;
     
    location /  
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header Host $host;
        proxy_buffering off;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        proxy_pass http://localhost:2342;
          
        auth_basic "Priv";
        auth_basic_user_file /etc/nginx/htpasswd;
     
     
    
  • nginx packaging: I was so happy to see the simplicity of the nginx packaging in Debian. Becuase http-auth-digest is not upstream, it needs to be pulled in separately and compiled. I was happy to see how simple the packaging structure of nginx in Debian is. It was just a matter of putting in the module in the right modules location, which as structured such in the packaging falls under the debian/ packaging sub-folder, that any future upgrades will be quite easy to manage.
  • http-auth-digest: I d love to see http-auth-digest module be part of the upstream package. While I m a web n00b, I felt this module perfect for my use case. From what I ve understood, set up and tested so far, this module fills in all my requirements; which is more of like a session management.
With the combined setup in place, https://host is authenticated with a different set of credentials. On the other hand, https://host/discover and /private get covered with a different set of credentials and policies. While, this will continue to be an ongoing effort and audit of the services that I build, so far now, I feel this is in decent shape that I can use it as my daily driver. The end result is:
nginx auth prompt
nginx auth prompt
PhotoPrism Photo List
PhotoPrism Photo List
PhotoPrism Menu
PhotoPrism Menu

Updates
  • Things can really get frustrating at times. These days, it is Google that contributes to it. https://bugs.chromium.org/p/chromium/issues/detail?id=544244
    • It is is now fair to say that Google is big enough to rewrite all the standards. Or at least break them. Or best, their arrogance of it. This bug report is a fine example of what other ways it could have been dealt.

25 April 2021

Russ Allbery: Review: Learning React

Review: Learning React, by Alex Banks & Eve Porcello
Publisher: O'Reilly
Copyright: June 2020
ISBN: 1-4920-5172-1
Format: Trade paperback
Pages: 287
My first JavaScript project was a React frontend to a REST service. As part of that project, I read two books: JavaScript: The Definitive Guide to learn the language foundation and this book to learn the framework on top of it. This was an unintentional experiment in the ways programming books can approach the topic. I commented in my review of JavaScript: the Definitive Guide that it takes the reference manual approach to the language. Learning React is the exact opposite. It's goal-driven, example-heavy, and has a problem and solution structure. The authors present a sample application, describe some desired new feature or deficiency in it, and then introduce the specific React technique that solves that problem. There is some rewriting of previous examples using more sophisticated techniques, but most chapters introduce new toy applications along with new parts of the React framework. The best part of this book is its narrative momentum, so I think the authors were successful at their primary goal. The first eight chapters of the book (more on the rest of the book in a moment) feel like a whirlwind tour where one concept flows naturally into the next and one's questions while reading about one technique are often answered in the next section. I thought the authors tried too hard in places and overdid the enthusiasm, but it's very readable in a way that I think may appeal to people who normally find programming books dry. Learning React is also firm and definitive about the correct way to use React, which may appeal to readers who only want to learn the preferred way of using the framework. (For example, React class components are mentioned briefly, mostly to tell the reader not to use them, and the rest of the book only uses functional components.) I had two major problems with this book, however. The first is that this breezy, narrative style turns out to be awful when one tries to use it as a reference. I read through most of this book with both enjoyment and curiosity, sat down to write a React component, and immediately struggled to locate the information I needed. Everything felt logically connected when I was focusing on the problems the authors introduced, but as soon as I started from my own problem, the structure of the book fell apart. I had to page through chapters to locate some nugget buried in the text, or re-read sections of the book to piece together which motivating problem my code was most similar to. It was a frustrating experience. This may be a matter of learning style, since this is why I prefer programming books with a reference structure. But be warned that I can't recommend this book as a reference while you're programming, nor does it prepare you to use the official React documentation as a reference. The second problem is less explicable and less defensible. I don't know what happened with O'Reilly's copy-editing for this book, but the code snippets are a train wreck. The Amazon reviews are full of people complaining about typos, syntax errors, omitted code, and glaring logical flaws, and they are entirely correct. It's so bad that I was left wondering if a very early, untested draft of the examples was somehow substituted into the book at the last minute by mistake. I'm not the sort of person who normally types code in from a book, so I don't care about a few typos or obvious misprints as long as the general shape is correct. The general shape was not correct. In a few places, the code is so completely wrong and incomplete that even combined with the surrounding text I was unable to figure out what it was supposed to be. It's possible this is fixed in a later printing (I read the June 2020 printing of the second edition), but otherwise beware. The authors do include a link to a GitHub repository of the code samples, which are significantly different than what's printed in the book, but that repository is incomplete; many of the later chapter examples are only links to JavaScript web sandboxes, which bodes poorly for the longevity of the example code. And then there's chapter nine of this book, which I found entirely baffling. This is a direct quote from the start of the chapter:
This is the least important chapter in this book. At least, that's what we've been told by the React team. They didn't specifically say, "this is the least important chapter, don't write it." They've only issued a series of tweets warning educators and evangelists that much of their work in this area will very soon be outdated. All of this will change.
This chapter is on suspense and error boundaries, with a brief mention of Fiber. I have no idea what I'm supposed to do with this material as a reader who is new to React (and thus presumably the target audience). Should I use this feature? When? Why is this material in the book at all when it's so laden with weird stream-of-consciousness disclaimers? It's a thoroughly odd editorial choice. The testing chapter was similarly disappointing in that it didn't answer any of my concrete questions about testing. My instinct with testing UIs is to break out Selenium and do integration testing with its backend, but the authors are huge fans of unit testing of React applications. Great, I thought, this should be interesting; unit testing seems like a poor fit for UI code because of how artificial the test construction is, but maybe I'm missing some subtlety. Convince me! And then the authors... didn't even attempt to convince me. They just asserted unit testing is great and explained how to write trivial unit tests that serve no useful purpose in a real application. End of chapter. Sigh. I'm not sure what to say about this book. I feel like it has so many serious problems that I should warn everyone away from it, and yet the narrative introduction to React was truly fun to read and got me excited about writing React code. Even though the book largely fell apart as a reference, I still managed to write a working application using it as my primary reference, so it's not all bad. If you like the problem and solution style and want a highly conversational and informal tone (that errs on the side of weird breeziness), this may still be the book for you. Just be aware that the code examples are a trash fire, so if you learn from examples, you're going to have to chase them down via the GitHub repository and hope that they still exist (or get a later edition of the book where this problem has hopefully been corrected). Rating: 6 out of 10

19 April 2021

Ritesh Raj Sarraf: Catching Up Your Sources

I ve mostly had the preference of controlling my data rather than depend on someone else. That s one reason why I still believe email to be my most reliable medium for data storage, one that is not plagued/locked by a single entity. If I had the resources, I d prefer all digital data to be broken down to its simplest form for storage, like email format, and empower the user with it i.e. their data. Yes, there are free services that are indirectly forced upon common users, and many of us get attracted to it. Many of us do not think that the information, which is shared for the free service in return, is of much importance. Which may be fair, depending on the individual, given that they get certain services without paying any direct dime.

New age communication So first, we had email and usenet. As I mentioned above, email was designed with fine intentions. Intentions that make it stand even today, independently. But not everything, back then, was that great either. For example, instant messaging was very closed and centralised then too. Things like: ICQ, MSN, Yahoo Messenger; all were centralized. I wonder if people still have access to their ICQ logs. Not much has chagned in the current day either. We now have domination by: Facebook Messenger, Google Whatever the new marketing term they introdcue, WhatsApp, Telegram, Signal etc. To my knowledge, they are all centralized. Over all this time, I m yet to see a product come up with good (business) intentions, to really empower the end user. In this information age, the most invaluable data is user activity. That s one data everyone is after. If you decline to share that bit of free data in exchange for the free services, mind you, that that free service like Facebook, Google, Instagram, WhatsApp, Truecaller, Twitter; none of it would come to you at all. Try it out. So the reality is that while you may not be valuating the data you offer in exchange correctly, there s a lot that is reaped from it. But still, I think each user has (and should have) the freedom to opt in for these tech giants and give them their personal bit, for free services in return. That is a decent barter deal. And it is a choice that one if free to choose

Retaining my data I m fond of keeping an archive folder in my mailbox. A folder that holds significant events in the form of an email usually, if documented. Over the years, I chose to resort to the email format because I felt it was more reliable in the longer term than any other formats. The next best would be plain text. In my lifetime, I have learnt a lot from the internet; so it is natural that my preference has been with it. Mailing Lists, IRCs, HOWTOs, Guides, Blog posts; all have helped. And over the years, I ve come across hundreds of such content that I d always like to preserve. Now there are multiple ways to preserving data. Like, for example, big tech giants. In most usual cases, your data for your lifetime, should be fine with a tech giant. In some odd scenarios, you may be unlucky if you relied on a service provider that went bankrupt. But seriously, I think users should be fine if they host their data with Microsoft, Google etc; as long as they abide by their policies. There s also the catch of alignment. As the user, you should ensure to align (and transition) with the product offerings of your service provider. Otherwise, what may look constant and always reliable, will vanish in the blink of an eye. I guess Google Plus would be a good example. There was some Google Feed service too. Maybe Google Photos in the near decade future, just like Google Picasa in the previous (or current) decade.

History what is On the topic of retaining information, lets take a small drift. I still admire our ancestors. I don t know what went in their mind when they were documenting events in the form of scriptures, carvings, temples, churches, mosques etc; but one things for sure, they were able to leave a fine means of communication. They are all gone but a large number of those events are evident through the creations that they left. Some of those events have been strong enough that further rulers/invaders have had tough times trying to wipe it out from history. Remember, history is usually not the truth, but the statement to be believed by the teller. And the teller is usually the survivor, or the winner you may call. But still, the information retention techniques were better. I haven t visited, but admire whosoever built the Kailasa Temple, Ellora, without which, we d be made to believe what not by all invaders and rulers of the region. But the majestic standing of the temple, is a fine example of the history and the events that have occured in the past.
Ellora Temple -  The majectic carving believed to be carved out of a single stone
Ellora Temple - The majectic carving believed to be carved out of a single stone
Dominance has the power to rewrite history and unfortunately that s true and it has done its part. It is just that in a mere human s defined lifetime, it is not possible to witness the transtion from current to history, and say that I was there then and I m here now, and this is not the reality. And if not dominance, there s always the other bit, hearsay. With it, you can always put anything up for dispute. Because there s no way one can go back in time and produce a fine evidence. There s also a part about religion. Religion can be highly sentimental. And religion can be a solid way to get an agenda going. For example, in India - a country which today consitutionally is a secular country, there have been multiple attempts to discard the belief, that never ever did the thing called Ramayana exist. That the Rama Setu, nicely reworded as Adam s Bridge by who so ever, is a mere result of science. Now Rama, or Hanumana, or Ravana, or Valmiki, aren t going to come over and prove that that is true or false. So such subjects serve as a solid base to get an agenda going. And probably we ve even succeeded in proving and believing that there was never an event like Ramayana or the Mahabharata. Nor was there ever any empire other than the Moghul or the British Empire. But yes, whosoever made the Ellora Temple or the many many more of such creations, did a fine job of making a dent for the future, to know of what the history possibly could also be.

Enough of the drift So, in my opinion, having events documented is important. It d be nice to have skills documented too so that it can be passed over generations but that s a debatable topic. But events, I believe should be documented. And documented in the best possible ways so that its existence is not diminished. A documentation in the form of certain carvings on a rock is far more better than links and posts shared on Facebook, Twitter, Reddit etc. For one, these are all corporate entities with vested interests and can pretext excuse in the name of compliance and conformance. So, for the helpless state and generation I am in, I felt email was the best possible independent form of data retention in today s age. If I really had the resources, I d not rely on digital age. This age has no guarantee of retaining and recording information in any reliable manner. Instead, it is just mostly junk, which is manipulative and changeable, conditionally.

Email and RSS So for my communication, I like to prefer emails over any other means. That doesn t mean I don t use the current trends. I do. But this blog is mostly about penning my desires. And desire be to have communication over email format. Such is the case that for information useful over the internet, I crave to have it formatted in email for archival. RSS feeds is my most common mode of keeping track of information I care about. Not all that I care for is available in RSS feeds but hey such is life. And adaptability is okay. But my preference is still RSS. So I use RSS feeds through a fine software called feed2imap. A software that fits my bill fairly well. feed2imap is:
  • An rss feed news aggregator
  • Pulls and extracts news feeds in the form of an email
  • Can push the converted email over pop/imap
  • Can convert all image content to email mime attachment
In a gist, it makes the online content available to me offline in the most admired email format In my mail box, in today s day, my preferred email client is Evolution. It does a good job of dealing with such emails (rss feed items). An image example of accessing the rrs feed item through it is below
RSS News Item through Evolution
RSS News Item through Evolution
The good part is that my actual data is always independent of such MUAs. Tomorrow, as technology - trends - economics evolve, something new would come as a replacement but my data would still be mine.

7 March 2021

Louis-Philippe V ronneau: New Year, New OpenPGP Key

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA512
Sun, 07 Mar 2021 13:00:17 -0500
I've recently set up a new OpenPGP key and will be transitioning away from my
old one.
It is a chance for me to start using a OpenPGP hardware token and to transition
to a new personal email address (my main public contact is still my
 @debian.org  address).
Please note that I've partially redacted some email addresses from this
statement to minimise the amount of spam I receive. It shouldn't be hard for
actual humans to follow the instructions below to find the complete addresses.
The old key will continue to be valid for a few months, but will eventually be
revoked.
You might know my old OpenPGP certificate as:
pub   rsa4096/0x7AEAC4EC6AAA0A97 2014-12-22 [expires: 2021-06-02]
      Key fingerprint = 677F 54F1 FA86 81AD 8EC0  BCE6 7AEA C4EC 6AAA 0A97
uid       Louis-Philippe V ronneau <REDACTED@riseup.net>
uid       Louis-Philippe V ronneau (alias) <REDACTED@riseup.net>
uid       Louis-Philippe V ronneau (debian) <REDACTED@debian.org>
My new OpenPGP certificate is:
pub   ed25519/0xE1E5457C8BAD4113 2021-03-06 [expires: 2022-03-06]
      Key fingerprint = F64D 61D3 21F3 CB48 9156  753D E1E5 457C 8BAD 4113
uid       Louis-Philippe V ronneau <REDACTED@veronneau.org>
uid       Louis-Philippe V ronneau <REDACTED@debian.org>
These days, I mostly use my key for Debian and to sign git commit. I don't
really expect you to sign my new key if you had signed my old one.
I've published the new certificate on keys.openpgp.org as well as on my
personal website. You can fetch it like this:
    $ wget -O- https://veronneau.org/media/openpgp.key   gpg --import
-----BEGIN PGP SIGNATURE-----
iQIzBAEBCgAdFiEEZ39U8fqGga2OwLzmeurE7GqqCpcFAmBFFM8ACgkQeurE7Gqq
CpcuchAAscAeszdtA+TlCI4YvK5nlk+nJnCnNBSnl7Et+jiNjq8kB/Fud+dWMTXC
Zag8oJkalbbxub0BT0bEAn+BiBunu58E0gd0Xq4syTbqZ5o5IN17S/tfxCD0k1hf
ewrnYZ2l0i5g4YvHGKC+Xv4D+Z84BylnIRaXHqlUdluOVfVYDfLybOAqoktO/KUH
I+vQBwXj0Fr/QAtgiz5Nwh/YHFiU9xMSvr5ozRwAFs6+xfIqFHuVPRRkEN5iVo4D
kkMIz+kFfkoh4aWIP4dgAu39XnEgxwTR9J+4yE8TzCCMzO7xCK0X6vqgPAxYMPvb
RuP4FnGWOnGnlcudCUAUkOaryrwRi+dPQTnNICHTYsvVc7dg+W0EhVUkwEuuEwpI
qtcB/Y5AGhqK0Cc11uXiFjIQwLTgwcUez4F0xrGeqsTtAM5gyRup2w0jbocTuYSh
ZRv/2zwrq/S3xVrUYGqdT+L5odmkBzz9zOwY5WlU2H9CMFOdh71XOv9wWQXan9ou
hLRodeOQ8MinIBP+sX36ol1zg/aP7mCHvRRSBzWt7l3WhVxgZFpNwIfp/RZqU0R4
IEq48mntFhPvHJjFmAKLKK/ckzNMtSn+HWQPJV3HTInKCTu5PTNMU3SAvPHOHEps
V6WWSOPB+1Lm/tlIULDc+0SopWoiWO4NObCSs8zMZHlYPBk5x/KIdQQBFgoAHRYh
BMqnQAcHqBawIC/DzfQlelCyHPqFBQJgRRTPAAoJEPQlelCyHPqFFVEA/1qScaAk
O+eBEE4q0BaJDsqweCS1XCcuQGkQCKi5Zv6kAQChQ96Ve7cKbN/wRkT9pdIhmx01
+CmIsnp3k6N0ZYLLCg==
=onl0
-----END PGP SIGNATURE-----

26 February 2021

Ritesh Raj Sarraf: Wayland KDE X11

KDE Impressions These days, I often hear a lot about Wayland. And how much of effort is being put into it; not just by the Embedded world but also the usual Desktop systems, namely KDE and GNOME. In recent past, I switched back to KDE and have been (very) happy about the switch. Even though the KDE 4 (and initial KDE 5) debacle had burnt many, coming back to a usable KDE desktop is always a delight. It makes me feel home with the elegance, while at the same time the flexibility, it provides. It feels so nice to draft this blog article from Kwrite + VI Input Mode Thanks to the great work of the Debian KDE Team, but Norbert Preining in particular, who has helped bring very up-to-date KDE packages into Debian. Right now, I m on a Plamsa 5.21.1 desktop, which is recent by all standards.

Wayland Almost all the places in the Linux world these days are busy with integrating Wayland as the primary display service. Not sure what the current status on the GNOME side is but I definitely keep trying KDE + Wayland with every release. I keep trying with every release because it still is not prime for daily use. And it makes me get back to X11, no matter how dated some may call. Fact is, X11 still shines to me as an end-user. Glitches with Wayland still are (Based on this week s test on Plasma 5.21.1):
  • Horrible performance compared to X11
  • Very crashy, especially when hotplugging secondary display. Plasma would just crash. X11 is very resilient to such things, part of the reason I can think is the age of the codebase.
  • Many many applications still need to be fixed for Wayland. Or Wayland needs to accomodate them in some way. XWayland does not really feel like the answer.
And while KDE keeps insisting users to switch to Wayland, as that s where all the new enhancements and fixes are put in, someone like me still needs to stick to X11 for the time being. So to get my shiny new LG 27" 4K Monitor (3840x2160 60.00*+) to work without too much glitch, I had to live with an alias:
$ alias   grep xrandr
alias rrs_xrandr_lg='xrandr --output DP-1 --mode 3840x2160 --scale .75x.75'
18:31                    

Plasma 5.21 On the brighter side, the Plasma 5.21.1 release brings some nice enhancements in other areas.
  • I m now able to make use of tighter integration with systemd/cgroups, with better organization and management of processes overall.
  • The new Plasma theme, Breeze Twilight, is a good blend of Light + Dark.
I also appreciate the work put in by Michail Vourlakos. The KDE project is lucky to have a developer/designer like him. His vision and work into the KDE desktop is well beyond a writing by me.
$ usystemctl status plasma-plasmashell.service 
  plasma-plasmashell.service - KDE Plasma Workspace
     Loaded: loaded (/usr/lib/systemd/user/plasma-plasmashell.service; enabled; vendor preset: enabled)
     Active: active (running) since Fri 2021-02-26 18:34:23 IST; 13s ago
   Main PID: 501806 (plasmashell)
      Tasks: 21 (limit: 18821)
     Memory: 759.8M
        CPU: 13.706s
     CGroup: /user.slice/user-1000.slice/user@1000.service/session.slice/plasma-plasmashell.service
              501806 /usr/bin/plasmashell --no-respawn
             
Feb 26 18:35:00 priyasi plasmashell[501806]: qml: recreating buttons
Feb 26 18:35:21 priyasi plasmashell[501806]: qml: recreating buttons
Feb 26 18:35:49 priyasi plasmashell[501806]: qml: recreating buttons
Feb 26 18:35:57 priyasi plasmashell[501806]: qml: recreating buttons
18:36                  

OBS - Open Build Service I should also thank the OpenSUSE folks for the OBS work. It has enabled the close equivalent (or better, in my experience) of PPAs for Debian. And that is what has enabled developers like Norbert to easily and quickly be able to deliver the entire KDE suite.

OBS - Some detail Christian asked for some more details on the OBS side of things, of my view. I m updating this article with it because the comment system may not always be reliable and I hate losing content. Having been using OBS myself, and also others in the Debian community who are making use of it, I surely think we as project should consider making use of OBS. Given that OBS is Free Software, it is a perfect fit for Debian. Gitlab is another example of what we ve made available in Debian. OBS is divided into multiple parts
  • OBS Server
  • OBS DoD service
  • OBS Publisher
  • OBS Workers
  • OBS Warden
  • OBS Rep Server
For every Debian release I care about, I add an OBS project per release. So I have OBS projects for: Sid, Bullseye, Buster, Jessie. Now, say you have a package, foo . You prep your package and enable all the releases that you want to build the package for. So the same package gets built, in separate clean environments, for every release I mentioned as an example above. You don t have to manually trigger the build for every release/every architcture. You should add the release (as projects) in OBS, set their supported architectures, and then add those enabled release projets as bits to your package. Every build involves:
  • Creating a new chroot for each build
  • Building the package
Builds can be scattered across multiple hosts, known as workers in OBS terminology. Your workers are independent machine entities, supporting different architectures. The machines can be Bare-Metal ones, VMs, even containers. So this allows for very nice scale-in and scale-out. There may be auto-scaling too but that is something worth investigating. Think of things like cross architecture builds. Let s assume the cloud vendors decide to donate resources to the Debian project. We could enable OBS worker instances on the respective clouds (different architectures) and plug them into the master OBS instance that Debian hosts. Fully distributed. Similarly, big hardware vendors willing to donate compute resources could house them in their premises and Debian could just easily establish a connection to them. All of this just a TCP connection away. So when I look at the features of OBS, from the point of view of Debian, I like it more. Extensibility won t be an issue. Supporting a new Debian release would just be a matter of bootstrapping the Debian release as a project in OBS, and then all done. The single effort of setting of the target release project is a one time job, and then all can leverage it. The PPA was a long craved feature missing in Debian, in my opinion. OBS allows to not just fulfil that gap but also extend it in a very easy way. Andrew Lee had put in a nice video presentation about the same @ Debconf 20

Next.