Search Results: "ghe"

20 June 2023

Vasudev Kamath: Notes: Experimenting with ZRAM and Memory Overcommit

Introduction The ZRAM module in the Linux kernel creates a memory-backed block device that stores its content in a compressed format. It offers users the choice of compression algorithms such as lz4, zstd, or lzo. These algorithms differ in compression ratio and speed, with zstd providing the best compression but being slower, while lz4 offers higher speed but lower compression.
Using ZRAM as Swap One interesting use case for ZRAM is utilizing it as swap space in the system. There are two utilities available for configuring ZRAM as swap: zram-tools and systemd-zram-generator. However, Debian Bullseye lacks systemd-zram-generator, making zram-tools the only option for Bullseye users. While it's possible to use systemd-zram-generator by self-compiling or via cargo, I preferred using tools available in the distribution repository due to my restricted environment.
Installation The installation process is straightforward. Simply execute the following command:
apt-get install zram-tools
Configuration The configuration involves modifying a simple shell script file /etc/default/zramswap sourced by the /usr/bin/zramswap script. Here's an example of the configuration I used:
# Compression algorithm selection
# Speed: lz4 > zstd > lzo
# Compression: zstd > lzo > lz4
# This is not inclusive of all the algorithms available in the latest kernels
# See /sys/block/zram0/comp_algorithm (when the zram module is loaded) to check
# the currently set and available algorithms for your kernel [1]
# [1]  https://github.com/torvalds/linux/blob/master/Documentation/blockdev/zram.txt#L86
ALGO=zstd
# Specifies the amount of RAM that should be used for zram
# based on a percentage of the total available memory
# This takes precedence and overrides SIZE below
PERCENT=30
# Specifies a static amount of RAM that should be used for
# the ZRAM devices, measured in MiB
# SIZE=256000
# Specifies the priority for the swap devices, see swapon(2)
# for more details. A higher number indicates higher priority
# This should probably be higher than hdd/ssd swaps.
# PRIORITY=100
I chose zstd as the compression algorithm for its superior compression capabilities. Additionally, I reserved 30% of memory as the size of the zram device. After modifying the configuration, restart the zramswap.service to activate the swap:
systemctl restart zramswap.service
Using systemd-zram-generator For Debian Bookworm users, an alternative option is systemd-zram-generator. Although zram-tools is still available in Debian Bookworm, systemd-zram-generator offers a more integrated solution within the systemd ecosystem. Below is an example of the translated configuration for systemd-zram-generator, located at /etc/systemd/zram-generator.conf:
# This config file enables a /dev/zram0 swap device with the following
# properties:
# * size: 50% of available RAM or 4GiB, whichever is less
# * compression-algorithm: kernel default
#
# This device's properties can be modified by adding options under the
# [zram0] section below. For example, to set a fixed size of 2GiB, set
#  zram-size = 2GiB .
[zram0]
zram-size = ceil(ram * 30/100)
compression-algorithm = zstd
swap-priority = 100
fs-type = swap
After making the necessary changes, reload systemd and start the systemd-zram-setup@zram0.service:
systemctl daemon-reload
systemctl start systemd-zram-setup@zram0.service
The systemd-zram-generator creates the zram device by loading the kernel module and then creates a systemd swap unit to activate the zram device as swap. In this case, the swap unit is called zram0.swap.
Checking Compression and Details To verify the effectiveness of the swap configuration, you can use the zramctl command, which is part of the util-linux package. Alternatively, the zramswap utility provided by zram-tools can be used to obtain the same output. During my testing with a synthetic memory load created using the stress-ng vm stressor class, I found that I could reach up to a 40% compression ratio.
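If you want to check things quickly, the commands below are one way to do it (a minimal sketch; zramctl and swapon ship with util-linux, and mm_stat is part of the kernel's zram sysfs interface, so the exact output format may vary):
# Show the zram device, its algorithm, original data size and compressed size
zramctl
# Confirm the zram device is in use as swap and check its priority
swapon --show
# Raw counters straight from the kernel (orig_data_size, compr_data_size, ...)
cat /sys/block/zram0/mm_stat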
Memory Overcommit Another use case I was interested in is launching applications that require more memory than is available in the system. By default, the Linux kernel attempts to estimate the amount of free memory left on the system when user space requests more memory (vm.overcommit_memory=0). However, you can change this behavior by setting the sysctl value vm.overcommit_memory to 1. To demonstrate the default behavior, I ran a test using stress-ng to request more memory than the system had available. As expected, the Linux kernel refused to allocate memory, and the stress-ng process could not proceed.
free -tg                                                                                                                                                                                          (Mon,Jun19) 
                total        used        free      shared  buff/cache   available
 Mem:              31          12          11           3          11          18
 Swap:             10           2           8
 Total:            41          14          19
sudo stress-ng --vm=1 --vm-bytes=50G -t 120                                                                                                                                                       (Mon,Jun19) 
 stress-ng: info:  [1496310] setting to a 120 second (2 mins, 0.00 secs) run per stressor
 stress-ng: info:  [1496310] dispatching hogs: 1 vm
 stress-ng: info:  [1496312] vm: gave up trying to mmap, no available memory, skipping stressor
 stress-ng: warn:  [1496310] vm: [1496311] aborted early, out of system resources
 stress-ng: info:  [1496310] vm:
 stress-ng: warn:  [1496310]         14 System Management Interrupts
 stress-ng: info:  [1496310] passed: 0
 stress-ng: info:  [1496310] failed: 0
 stress-ng: info:  [1496310] skipped: 1: vm (1)
 stress-ng: info:  [1496310] successful run completed in 10.04s
By setting vm.overcommit_memory=1, Linux will allocate memory in a more relaxed manner, assuming an infinite amount of memory is available.
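For completeness, this is roughly how that setting can be inspected and changed; the sysctl.d file name below is only an example:
# Check the current policy (0 = heuristic, 1 = always overcommit, 2 = strict accounting)
sysctl vm.overcommit_memory
# Allow overcommit for the running system
sudo sysctl -w vm.overcommit_memory=1
# Persist the setting across reboots (any file under /etc/sysctl.d/ will do)
echo 'vm.overcommit_memory = 1' | sudo tee /etc/sysctl.d/99-overcommit.conf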
Conclusion ZRAM provides block devices that allow for very fast I/O, and compression allows for a significant amount of memory savings. ZRAM is not restricted to just swap usage; it can be used as a normal block device with different file systems. Using ZRAM as swap is beneficial because, unlike disk-based swap, it is faster, and compression ensures that we use a smaller amount of RAM itself as swap space. Additionally, adjusting the memory overcommit settings can be beneficial for scenarios that require launching memory-intensive applications. Note: When running stress tests or allocating excessive memory, be cautious about the actual memory capacity of your system to prevent out-of-memory (OOM) situations. Feel free to explore the capabilities of ZRAM and optimize your system's memory management. Happy computing!

12 June 2023

Matthew Palmer: Private Key Redaction: Redux

[Note: the original version of this post named the author of the referenced blog post, and the tone of my writing could be construed to be mocking or otherwise belittling them. While that was not my intention, I recognise that was a possible interpretation, and I have revised this post to remove identifying information and try to neutralise the tone. On the other hand, I have kept the identifying details of the domain involved, as there are entirely legitimate security concerns that result from the issues discussed in this post.] I have spoken before about why it is tricky to redact private keys. Although that post demonstrated a real-world, presumably-used-in-the-wild private key, I've been made aware of commentary along the lines of this representative sample:
I find it hard to believe that anyone would take their actual production key and redact it for documentation. Does the author have evidence of this in practice, or did they see example keys and assume they were redacted production keys?
Well, buckle up, because today's post is another real-world case study, with rather higher stakes than the previous example.

When Helping Hurts Today's case study begins with someone who attempted to do a very good thing: they wrote a blog post about using HashiCorp Vault to store certificates and their private keys. In their post, they included some test data, a certificate and a private key, which they redacted. Unfortunately, they did not redact these very well. Each base64 blob has had one line replaced with all xs. Based on the steps I explained previously, it is relatively straightforward to retrieve the entire, intact private key.

From Bad to OMFG Now, if this post author had, say, generated a fresh private key (after all, there's no shortage of possible keys), that would not be worthy of a blog post. As you may surmise, that is not what happened. After reconstructing the insufficiently-redacted private key, you end up with a key that has a SHA256 fingerprint (in hex) of: 72bef096997ec59a671d540d75bd1926363b2097eb9fe10220b2654b1f665b54 Searching for certificates which use that key fingerprint, we find one result: a certificate for hiltonhotels.jp (and a bunch of other, related, domains, as subjectAltNames). As of the time of writing, that certificate is not marked as revoked, and appears to be the same certificate that is currently presented to visitors of that site. This is, shall we say, not great. Anyone in possession of this private key, which, I should emphasise, has presumably been public information since the post's publication date of February 2023, has the ability to completely transparently impersonate the sites listed in that certificate. That would provide an attacker with the ability to capture any data a user entered, such as personal information, passwords, or payment details, and also modify what the user's browser received, including injecting malware or other unpleasantness. In short, no good deed goes unpunished, and this attempt to educate the world at large about the benefits of secure key storage has instead published private key material. Remember, kids: friends don't let friends post redacted private keys to the Internet.
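As an aside, if you want to compute this kind of fingerprint yourself, something like the following works, assuming the fingerprint is the SHA-256 digest of the DER-encoded public key and that the reconstructed key sits in a file named recovered-key.pem (both assumptions on my part):
# Extract the public key from the private key and hash its DER encoding
openssl pkey -in recovered-key.pem -pubout -outform DER | sha256sum
# The resulting hex digest can then be fed to a certificate search service such as crt.sh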

6 June 2023

Russell Coker: Dell 32" 4K Monitor and DisplayPort Switch

After determining that the Philips 43" monitor was too large for my taste as well as not having a clear enough display [1] I bought a Dell 32" 4K monitor for $499 on the 1st of July 2022. That monitor has been working nicely for almost a year now; for DisplayPort its operation is perfect and 32" seems like an ideal size for my use. There is one problem: both HDMI ports will sometimes turn off for about half a second. I've tested on both ports and on multiple computers as well as a dock and it gives the same result, so it's definitely the monitor. The problem for me is that the most casual inspection won't reveal the problem, and the monitor is large and difficult to transport as I've thrown out the box. If I had this sort of problem with a monitor at work I'd add it to the list of things for Dell to fix next time they visit the office, or use one of the many monitor boxes available to ship it back to them. But for home use it's more of a problem for me. The easiest solution is to avoid HDMI. A year ago I blogged about using DDC to switch monitor inputs [2]; I have had that running with a cheap USB switch since then to allow a workstation and a laptop to share the same monitor, keyboard, and mouse. Recently I got a USB-C dock that allows a USB-C laptop to talk to a display via DisplayPort as opposed to the HDMI connector that's built in. But my Dell monitor only has one DisplayPort input. So I have just bought a DisplayPort and USB KVM switch via eBay for $52, a reasonable price given that last year such things were well over $100. It has ports for 3 USB devices, which is better than my previous setup of a USB switch with only a single port that I used with a 3 port hub for my keyboard and mouse. The DisplayPort switch is described as doing 4K at 60Hz; I don't know how it will perform with a 5K monitor, maybe it will work at 30Hz or 40Hz. But currently Dell 5K monitors are at $2,500 and 6K monitors are about $3,800 so I don't plan to get one of them any time soon.
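For what it's worth, DDC-based input switching of the kind mentioned in [2] can be done with ddcutil; the VCP input-source feature is code 0x60, but the value for each input varies by monitor, so the 0x0f below is only an example:
# See which input-source values the monitor advertises
ddcutil capabilities
# Read the current input
ddcutil getvcp 60
# Switch input (example value; use one reported by the capabilities output)
ddcutil setvcp 60 0x0f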

2 June 2023

Matt Brown: Calling time on DNSSEC: The costs exceed the benefits

I'm calling time on DNSSEC. Last week, prompted by a change in my DNS hosting setup, I began removing it from the few personal zones I had signed. Then this Monday the .nz ccTLD experienced a multi-day availability incident triggered by the annual DNSSEC key rotation process. This incident broke several of my unsigned zones, which led me to say very unkind things about DNSSEC on Mastodon and now I feel compelled to more completely explain my thinking: For almost all domains and use-cases, the costs and risks of deploying DNSSEC outweigh the benefits it provides. Don't bother signing your zones. The .nz incident, while topical, is not the motivation or the trigger for this conclusion. Had it been a novel incident, it would still have been annoying, but novel incidents are how we learn so I have a small tolerance for them. The problem with DNSSEC is precisely that this incident was not novel, just the latest in a long and growing list. It's a clear pattern. DNSSEC is complex and risky to deploy. Choosing to sign your zone will almost inevitably mean that you will experience lower availability for your domain over time than if you leave it unsigned. Even if you have a team of DNS experts maintaining your zone and DNS infrastructure, the risk of routine operational tasks triggering a loss of availability (unrelated to any attempted attacks that DNSSEC may thwart) is very high - almost guaranteed to occur. Worse, because of the nature of DNS and DNSSEC these incidents will tend to be prolonged and out of your control to remediate in a timely fashion. The only benefit you get in return for accepting this almost certain reduction in availability is trust in the integrity of the DNS data a subset of your users (those who validate DNSSEC) receive. Trusted DNS data that is then used to communicate across an untrusted network layer. An untrusted network layer which you are almost certainly protecting with TLS which provides a more comprehensive and trustworthy set of security guarantees than DNSSEC is capable of, and provides those guarantees to all your users regardless of whether they are validating DNSSEC or not. In summary, in our modern world where TLS is ubiquitous, DNSSEC provides only a thin layer of redundant protection on top of the comprehensive guarantees provided by TLS, but adds significant operational complexity, cost and a high likelihood of lowered availability. In an ideal world, where the deployment cost of DNSSEC and the risk of DNSSEC-induced outages were both low, it would absolutely be desirable to have that redundancy in our layers of protection. In the real world, given the DNSSEC protocol we have today, the choice to avoid its complexity and rely on TLS alone is not at all painful or risky to make as the operator of an online service. In fact, it's the prudent choice that will result in better overall security outcomes for your users. Ignore DNSSEC and invest the time and resources you would have spent deploying it improving your TLS key and certificate management. Ironically, the one use-case where I think a valid counter-argument for this position can be made is TLDs (including ccTLDs such as .nz). Despite its many failings, DNSSEC is an Internet Standard, and as infrastructure providers, TLDs have an obligation to enable its use. Unfortunately this means that everyone has to bear the costs, complexities and availability risks that DNSSEC burdens these operators with.
We can't avoid that fact, but we can avoid creating further costs, complexities and risks by choosing not to deploy DNSSEC on the rest of our non-TLD zones.

But DNSSEC will save us from the evil CA ecosystem! Historically, the strongest motivation for DNSSEC has not been the direct security benefits themselves (which as explained above are minimal compared to what TLS provides), but in the new capabilities and use-cases that could be enabled if DNS were able to provide integrity and trusted data to applications. Specifically, the promise of DNS-based Authentication of Named Entities (DANE) is that with DNSSEC we can be free of the X.509 certificate authority ecosystem and along with it the expensive certificate issuance racket and dubious trust properties that have long been its most distinguishing features. Ten years ago this was an extremely compelling proposition with significant potential to improve the Internet. That potential has gone unfulfilled. Instead of maturing as deployments progressed and associated operational experience was gained, DNSSEC has been beset by the discovery of issue after issue. Each of these has necessitated further changes and additions to the protocol, increasing complexity and deployment cost. For many zones, including significant zones like google.com (where I led the attempt to evaluate and deploy DNSSEC in the mid 2010s), it is simply infeasible to deploy the protocol at all, let alone in a reliable and dependable manner. While DNSSEC maturation and deployment has been languishing, the TLS ecosystem has been steadily and impressively improving. Thanks to the efforts of many individuals and companies, although still founded on the use of a set of root certificate authorities, the TLS and CA ecosystem today features transparency, validation and multi-party accountability that comprehensively build trust in the ability to depend and rely upon the security guarantees that TLS provides. When you use TLS today, you benefit from:
  • Free/cheap issuance from a number of different certificate authorities.
  • Regular, automated issuance/renewal via the ACME protocol.
  • Visibility into who has issued certificates for your domain and when through Certificate Transparency logs.
  • Confidence that certificates issued without certificate transparency (and therefore lacking an SCT) will not be accepted by the leading modern browsers.
  • The use of modern cryptographic protocols as a baseline, with a plausible and compelling story for how these can be steadily and promptly updated over time.
DNSSEC with DANE can match the TLS ecosystem on the first benefit (up front price) and perhaps makes the second benefit moot, but has no ability to match any of the other transparency and accountability measures that today's TLS ecosystem offers. If your ZSK is stolen, or a parent zone is compromised or coerced, validly signed TLSA records for a forged certificate can be produced and spoofed to users under attack with minimal chances of detection. Finally, in terms of overall trust in the roots of the system, the CA/Browser forum requirements continue to improve the accountability and transparency of TLS certificate authorities, significantly reducing the ability for any single actor (say a nefarious government) to subvert the system. The DNS root has a well established transparent multi-party system for establishing trust in the DNSSEC root itself, but at the TLD level, almost intentionally thanks to the hierarchical nature of DNS, DNSSEC has multiple single points of control (or coercion) which exist outside of any formal system of transparency or accountability. We've moved from DANE being a potential improvement in security over TLS when it was first proposed, to being a definite regression from what TLS provides today. That's not to say that TLS is perfect, but given where we're at, we'll get a better security return from further investment and improvements in the TLS ecosystem than we will from trying to fix DNSSEC.

But TLS is not ubiquitous for non-HTTP applications The arguments above are most compelling when applied to the web-based HTTP-oriented ecosystem which has driven most of the TLS improvements we've seen to date. Non-HTTP protocols are lagging in adoption of many of the improvements and best practices TLS has on the web. Some claim this need to provide a solution for non-HTTP, non-web applications provides a motivation to continue pushing DNSSEC deployment. I disagree; I think it provides a motivation to instead double-down on moving those applications to TLS. TLS as the new TCP. The problem is that the costs of deploying and operating DNSSEC are largely fixed regardless of how many protocols you are intending to protect with it, and worse, the negative side-effects of DNSSEC deployment can and will easily spill over to affect zones and protocols that don't want or need DNSSEC's protection. To justify continued DNSSEC deployment and operation in this context means using a smaller set of benefits (just for the non-HTTP applications) to justify the already high costs of deploying DNSSEC itself, plus the cost of the risk that DNSSEC poses to the reliability of your websites. I don't see how that equation can ever balance, particularly when you evaluate it against the much lower costs of just turning on TLS for the rest of your non-HTTP protocols instead of deploying DNSSEC. MTA-STS is a worked example of how this can be achieved. If you're still not convinced, consider that even DNS itself is considering moving to TLS (via DoT and DoH) in order to add the confidentiality/privacy attributes the protocol currently lacks. I'm not a huge fan of the latency implications of these approaches, but the ongoing discussion shows that clever solutions and mitigations for that may exist. DoT/DoH solve distinct problems from DNSSEC and in principle should be used in combination with it, but in a world where DNS itself is relying on TLS and therefore has eliminated the majority of spoofing and cache poisoning attacks through DoT/DoH deployment the benefit side of the DNSSEC equation gets smaller and smaller still while the costs remain the same.

OK, but better software or more careful operations can reduce DNSSEC's cost Some see the current DNSSEC costs simply as teething problems that will reduce as the software and tooling matures to provide more automation of the risky processes and operational teams learn from their mistakes or opt to simply transfer the risk by outsourcing the management and complexity to larger providers to take care of. I don't find these arguments compelling. We've already had 15+ years to develop improved software for DNSSEC without success. What's changed that we should expect a better outcome this year or next? Nothing. Even if we did have better software or outsourced operations, the approach is still only hiding the costs behind automation or transferring the risk to another organisation. That may appear to work in the short-term, but eventually when the time comes to upgrade the software, migrate between providers or change registrars the debt will come due and incidents will occur. The problem is the complexity of the protocol itself. No amount of software improvement or outsourcing addresses that. After 15+ years of trying, I think it's worth considering that combining cryptography, caching and distributed consensus, some of the most fundamental and complex computer science problems, into a slow-moving and hard to evolve low-level infrastructure protocol while appropriately balancing security, performance and reliability appears to be beyond our collective ability. That doesn't have to be the end of the world; the improvements achieved in the TLS ecosystem over the same time frame provide a positive counter example - perhaps DNSSEC is simply focusing our attention at the wrong layer of the stack. Ideally secure DNS data would be something we could have, but if the complexity of DNSSEC is the price we have to pay to achieve it, I'm out. I would rather opt to remain with the simpler yet insecure DNS protocol and compensate for its shortcomings at higher transport or application layers where experience shows we are able to more rapidly improve and develop our security capabilities.

Summing up For the vast majority of domains and use-cases there is simply no net benefit to deploying DNSSEC in 2023. I'd even go so far as to say that if you've already signed your zones, you should (carefully) move them back to being unsigned - you'll reduce the complexity of your operating environment and lower your risk of availability loss triggered by DNS. Your users will thank you. The threats that DNSSEC defends against are already amply defended by the now mature and still improving TLS ecosystem at the application layer, and investing in further improvements here carries far more return than deployment of DNSSEC. For TLDs, like .nz whose outage triggered this post, DNSSEC is not going anywhere and investment in mitigating its complexities and risks is an unfortunate burden that must be shouldered. While the full incident report of what went wrong with .nz is not yet available, the interim report already hints at some useful insights. It is important that InternetNZ publishes a full and comprehensive review so that the full set of learnings and improvements this incident can provide can be fully realised by .nz and other TLD operators stuck with the unenviable task of trying to safely operate DNSSEC.

Postscript After taking a few days to draft and edit this post, I've just stumbled across a presentation from the well-respected Geoff Huston at last week's RIPE86 meeting. I've only had time to skim the slides (video here) - they don't seem to disagree with my thinking regarding the futility of the current state of DNSSEC, but also contain some interesting ideas for what it might take for DNSSEC to become a compelling proposition. Probably worth a read/watch!

31 May 2023

Russ Allbery: Review: Night Watch

Review: Night Watch, by Terry Pratchett
Series: Discworld #29
Publisher: Harper
Copyright: November 2002
Printing: August 2014
ISBN: 0-06-230740-1
Format: Mass market
Pages: 451
Night Watch is the 29th Discworld novel and the sixth Watch novel. I would really like to tell people they could start here if they wanted to, for reasons that I will get into in a moment, but I think I would be doing you a disservice. The emotional heft added by having read the previous Watch novels and followed Vimes's character evolution is significant. It's the 25th of May. Vimes is about to become a father. He and several of the other members of the Watch are wearing sprigs of lilac for reasons that Sergeant Colon is quite vehemently uninterested in explaining. A serial killer named Carcer the Watch has been after for weeks has just murdered an off-duty sergeant. It's a tense and awkward sort of day and Vimes is feeling weird and wistful, remembering the days when he was a copper and not a manager who has to dress up in ceremonial armor and meet with committees. That may be part of why, when the message comes over the clacks that the Watch have Carcer cornered on the roof of the New Hall of the Unseen University, Vimes responds in person. He's grappling with Carcer on the roof of the University Library in the middle of a magical storm when lightning strikes. When he wakes up, he's in the past, shortly after he joined the Watch and shortly before the events of the 25th of May that the older Watch members so vividly remember and don't talk about. I have been saying recently in Discworld reviews that it felt like Pratchett was on the verge of a breakout book that's head and shoulders above Discworld prior to that point. This is it. This is that book. The setup here is masterful: the sprigs of lilac that slowly tell the reader something is going on, the refusal of any of the older Watch members to talk about it, the scene in the graveyard to establish the stakes, the disconcerting fact that Vetinari is wearing a sprig of lilac as well, and the feeling of building tension that matches the growing electrical storm. And Pratchett never gives in to the temptation to explain everything and tip his hand prematurely. We know the 25th is coming and something is going to happen, and the reader can put together hints from Vimes's thoughts, but Pratchett lets us guess and sometimes be right and sometimes be wrong. Vimes is trying to change history, which adds another layer of uncertainty and enjoyment as the reader tries to piece together both the true history and the changes. This is a masterful job at a "what if?" story. And, beneath that, the commentary on policing and government and ethics is astonishingly good. In a review of an earlier Watch novel, I compared Pratchett to Dickens in the way that he focuses on a sort of common-sense morality rather than political theory. That is true here too, but oh that moral analysis is sharp enough to slide into you like a knife. This is not the Vimes that we first met in Guards! Guards!. He has turned his cynical stubbornness into a working theory of policing, and it's subtle and complicated and full of nuance that he only barely knows how to explain. But he knows how to show it to people.
Keep the peace. That was the thing. People often failed to understand what that meant. You'd go to some life-threatening disturbance like a couple of neighbors scrapping in the street over who owned the hedge between their properties, and they'd both be bursting with aggrieved self-righteousness, both yelling, their wives would either be having a private scrap on the side or would have adjourned to a kitchen for a shared pot of tea and a chat, and they all expected you to sort it out. And they could never understand that it wasn't your job. Sorting it out was a job for a good surveyor and a couple of lawyers, maybe. Your job was to quell the impulse to bang their stupid fat heads together, to ignore the affronted speeches of dodgy self-justification, to get them to stop shouting and to get them off the street. Once that had been achieved, your job was over. You weren't some walking god, dispensing finely tuned natural justice. Your job was simply to bring back peace.
When Vimes is thrown back in time, he has to pick up the role of his own mentor, the person who taught him what policing should be like. His younger self is right there, watching everything he does, and he's desperately afraid he'll screw it up and set a worse example. Make history worse when he's trying to make it better. It's a beautifully well-done bit of tension that uses time travel as the hook to show both how difficult mentorship is and also how irritating one's earlier naive self would be.
He wondered if it was at all possible to give this idiot some lessons in basic politics. That was always the dream, wasn't it? "I wish I'd known then what I know now"? But when you got older you found out that you now wasn't you then. You then was a twerp. You then was what you had to be to start out on the rocky road of becoming you now, and one of the rocky patches on that road was being a twerp.
The backdrop of this story, as advertised by the map at the front of the book, is a revolution of sorts. And the revolution does matter, but not in the obvious way. It creates space and circumstance for some other things to happen that are all about the abuse of policing as a tool of politics rather than Vimes's principle of keeping the peace. I mentioned when reviewing Men at Arms that it was an awkward book to read in the United States in 2020. This book tackles the ethics of policing head-on, in exactly the way that book didn't. It's also a marvelous bit of competence porn. Somehow over the years, Vimes has become extremely good at what he does, and not just in the obvious cop-walking-a-beat sort of ways. He's become a leader. It's not something he thinks about, even when thrown back in time, but it's something Pratchett can show the reader directly, and have the other characters in the book comment on. There is so much more that I'd like to say, but so much would be spoilers, and I think Night Watch is more effective when you have the suspense of slowly puzzling out what's going to happen. Pratchett's pacing is exquisite. It's also one of the rare Discworld novels where Pratchett fully commits to a point of view and lets Vimes tell the story. There are a few interludes with other people, but the only other significant protagonist is, quite fittingly, Vetinari. I won't say anything more about that except to note that the relationship between Vimes and Vetinari is one of the best bits of fascinating subtlety in all of Discworld. I think it's also telling that nothing about Night Watch reads as parody. Sure, there is a nod to Back to the Future in the lightning storm, and it's impossible to write a book about police and street revolutions without making the reader think about Les Miserables, but nothing about this plot matches either of those stories. This is Pratchett telling his own story in his own world, unapologetically, and without trying to wedge it into parody shape, and it is so much the better book for it. The one quibble I have with the book is that the bits with the Time Monks don't really work. Lu-Tze is annoying and flippant given the emotional stakes of this story, the interludes with him are frustrating and out of step with the rest of the book, and the time travel hand-waving doesn't add much. I see structurally why Pratchett put this in: it gives Vimes (and the reader) a time frame and a deadline, it establishes some of the ground rules and stakes, and it provides a couple of important opportunities for exposition so that the reader doesn't get lost. But it's not a good story. The rest of the book is so amazingly good, though, that it doesn't matter (and the framing stories for "what if?" explorations almost never make much sense). The other thing I have a bit of a quibble with is outside the book. Night Watch, as you may have guessed by now, is the origin of the May 25th Pratchett memes that you will be familiar with if you've spent much time around SFF fandom. But this book is dramatically different from what I was expecting based on the memes. You will, for example, see a lot of people posting "Truth, Justice, Freedom, Reasonably Priced Love, And a Hard-Boiled Egg!", and before reading the book it sounds like a Pratchett-style humorous revolutionary slogan. And I guess it is, sort of, but, well... I have to quote the scene:
"You'd like Freedom, Truth, and Justice, wouldn't you, Comrade Sergeant?" said Reg encouragingly. "I'd like a hard-boiled egg," said Vimes, shaking the match out. There was some nervous laughter, but Reg looked offended. "In the circumstances, Sergeant, I think we should set our sights a little higher " "Well, yes, we could," said Vimes, coming down the steps. He glanced at the sheets of papers in front of Reg. The man cared. He really did. And he was serious. He really was. "But...well, Reg, tomorrow the sun will come up again, and I'm pretty sure that whatever happens we won't have found Freedom, and there won't be a whole lot of Justice, and I'm damn sure we won't have found Truth. But it's just possible that I might get a hard-boiled egg."
I think I'm feeling defensive of the heart of this book because it's such an emotional gut punch and says such complicated and nuanced things about politics and ethics (and such deeply cynical things about revolution). But I think if I were to try to represent this story in a meme, it would be the "angels rise up" song, with all the layers of meaning that it gains in this story. I'm still at the point where the lilac sprigs remind me of Sergeant Colon becoming quietly furious at the overstep of someone who wasn't there. There's one other thing I want to say about that scene: I'm not naturally on Vimes's side of this argument. I think it's important to note that Vimes's attitude throughout this book is profoundly, deeply conservative. The hard-boiled egg captures that perfectly: it's a bit of physical comfort, something you can buy or make, something that's part of the day-to-day wheels of the city that Vimes talks about elsewhere in Night Watch. It's a rejection of revolution, something that Vimes does elsewhere far more explicitly. Vimes is a cop. He is in some profound sense a defender of the status quo. He doesn't believe things are going to fundamentally change, and it's not clear he would want them to if they did. And yet. And yet, this is where Pratchett's Dickensian morality comes out. Vimes is a conservative at heart. He's grumpy and cynical and jaded and he doesn't like change. But if you put him in a situation where people are being hurt, he will break every rule and twist every principle to stop it.
He wanted to go home. He wanted it so much that he trembled at the thought. But if the price of that was selling good men to the night, if the price was filling those graves, if the price was not fighting with every trick he knew... then it was too high. It wasn't a decision that he was making, he knew. It was happening far below the areas of the brain that made decisions. It was something built in. There was no universe, anywhere, where a Sam Vimes would give in on this, because if he did then he wouldn't be Sam Vimes any more.
This is truly exceptional stuff. It is the best Discworld novel I have read, by far. I feel like this was the Watch novel that Pratchett was always trying to write, and he had to write five other novels first to figure out how to write it. And maybe to prepare Discworld readers to read it. There are a lot of Discworld novels that are great on their own merits, but also it is 100% worth reading all the Watch novels just so that you can read this book. Followed in publication order by The Wee Free Men and later, thematically, by Thud!. Rating: 10 out of 10

23 May 2023

Craig Small: Devices with cgroup v2

Docker and other container systems by default restrict access to devices on the host. They used to do this with the device controller in cgroup v1; however, the second version of cgroups removed this controller and the man page says:
Cgroup v2 device controller has no interface files and is implemented on top of cgroup BPF.
https://www.kernel.org/doc/Documentation/admin-guide/cgroup-v2.rst
That is just awesome: nothing to see here, go look at the BPF documents if you have cgroup v2. With cgroup v1, if you wanted to know what devices were permitted, you would just cat /sys/fs/cgroup/XX/devices.allow and you were done! The kernel documentation is not very helpful; sure, it's something in BPF and has something to do with cgroup BPF specifically, but what does that mean? There doesn't seem to be an easy corresponding method to get the same information. So to see what restrictions a docker container has, we will have to:
  1. Find what cgroup the programs running in the container belong to
  2. Find what is the eBPF program ID that is attached to our container cgroup
  3. Dump the eBPF program to a text file
  4. Try to interpret the eBPF syntax
The last step is by far the most difficult.

Finding a container's cgroup All containers have a short ID and a long ID. When you run the docker ps command, you get the short ID. To get the long ID you can either use the --no-trunc flag or just guess from the short ID. I usually do the second.
$ docker ps 
CONTAINER ID   IMAGE            COMMAND       CREATED          STATUS          PORTS     NAMES
a3c53d8aaec2   debian:minicom   "/bin/bash"   19 minutes ago   Up 19 minutes             inspiring_shannon
So the short ID is a3c53d8aaec2 and the long ID is a big ugly hex string starting with that. I generally just paste the relevant part in the next step and hit tab. For this container the cgroup is /sys/fs/cgroup/system.slice/docker-a3c53d8aaec23c256124f03d208732484714219c8b5f90dc1c3b4ab00f0b7779.scope/ Notice that the last directory has docker- followed by the long ID (which starts with the short ID). If you're not sure of the exact path, /sys/fs/cgroup is the cgroup v2 mount point (which can be found with mount -t cgroup2) and the rest is the actual cgroup name. If you know a process running in the container, then the cgroup column in ps will show you.
$ ps -o pid,comm,cgroup 140064
    PID COMMAND         CGROUP
 140064 bash            0::/system.slice/docker-a3c53d8aaec23c256124f03d208732484714219c8b5f90dc1c3b4ab00f0b7779.scope
Either way, you will have your cgroup path.
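Another option, for what it's worth, is to ask docker for the container's init process and read its cgroup file directly (the container ID below is the example one from above):
# Resolve the container's main PID, then print its cgroup membership
cat /proc/$(docker inspect -f '{{.State.Pid}}' a3c53d8aaec2)/cgroup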

eBPF programs and cgroups Next we will need to get the eBPF program ID that is attached to our recently found cgroup. To do this, we will need to use bpftool. One thing that threw me for a long time is that when the tool talks about a program or a PROG ID, it is talking about eBPF programs, not your processes! With that out of the way, let's find the prog ID.
$ sudo bpftool cgroup list /sys/fs/cgroup/system.slice/docker-a3c53d8aaec23c256124f03d208732484714219c8b5f90dc1c3b4ab00f0b7779.scope/
ID       AttachType      AttachFlags     Name
90       cgroup_device   multi
Our cgroup is attached to eBPF prog with ID of 90 and the type of program is cgroup_device.

Dumping the eBPF program Next, we need to get the actual code that is run every time a process running in the cgroup tries to access a device. The program will take some parameters and will return either a 1 for yes you are allowed or a zero for permission denied. Don't use the file option as it dumps the program in binary format. The text version is hard enough to understand.
sudo bpftool prog dump xlated id 90 > myebpf.txt
Congratulations! You now have the eBPF program in a human-readable (?) format.

Interpreting the eBPF program The eBPF format as dumped is not exactly user friendly. It probably helps to first go and look at an example program to see what is going on. You'll see that the program splits type (lower 2 bytes) and access (higher 2 bytes) and then does comparisons on those values. The eBPF has something similar:
   0: (61) r2 = *(u32 *)(r1 +0)
   1: (54) w2 &= 65535
   2: (61) r3 = *(u32 *)(r1 +0)
   3: (74) w3 >>= 16
   4: (61) r4 = *(u32 *)(r1 +4)
   5: (61) r5 = *(u32 *)(r1 +8)
What we find is that, once we get past the first few lines filtering the given value, the comparison lines have:
  • r2 is the device type, 1 is block, 2 is character.
  • r3 is the device access; it's used with r1 for comparisons after masking the relevant bits. The access bits for mknod, read and write are 1, 2 and 4 respectively.
  • r4 is the major number
  • r5 is the minor number
For even a pretty simple setup, you are going to have around 60 lines of eBPF code to look at. Luckily, you'll often find that the lines for the command options you added are near the end, which makes it easier. For example:
  63: (55) if r2 != 0x2 goto pc+4
  64: (55) if r4 != 0x64 goto pc+3
  65: (55) if r5 != 0x2a goto pc+2
  66: (b4) w0 = 1
  67: (95) exit
This is a container using the option --device-cgroup-rule='c 100:42 rwm'. It is checking if r2 (device type) is 2 (char), r4 (major device number) is 0x64 or 100, and r5 (minor device number) is 0x2a or 42. If any of those are not true, move to the next section, otherwise return with 1 (permit). We have all access modes permitted, so it doesn't check for them. The previous example has all permissions for our device with id 100:42; what about if we only want read access, with the option --device-cgroup-rule='c 100:42 r'? The resulting eBPF is:
  63: (55) if r2 != 0x2 goto pc+7  
  64: (bc) w1 = w3
  65: (54) w1 &= 2
  66: (5d) if r1 != r3 goto pc+4
  67: (55) if r4 != 0x64 goto pc+3
  68: (55) if r5 != 0x2a goto pc+2
  69: (b4) w0 = 1
  70: (95) exit
The code is almost the same, but we are checking that w3 only has the second bit set, which is for reading, effectively checking for X == X & 2. It's a cautious approach, meaning a request for no access still passes, but any request with additional bits set will fail.

The device option docker run allows you to specify device files you want to grant your containers access to with the --device flag. This flag actually does two things. The first is to create the device file in the container's /dev directory, effectively doing a mknod command. The second is to adjust the eBPF program. If the device file we specified actually did have a major number of 100 and a minor of 42, the eBPF would look exactly like the above snippets.
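A minimal sketch of the flag in use (the device path and image are just examples here, 188 being the usual major number for USB serial devices):
# Create /dev/ttyUSB0 inside the container and add a matching allow rule to the eBPF program
docker run --rm -it --device /dev/ttyUSB0 debian:minicom /bin/bash
# The rule-only variant, which adjusts the eBPF filter without creating the device node
docker run --rm -it --device-cgroup-rule='c 188:* rwm' debian:minicom /bin/bash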

What about privileged? So we have used the direct cgroup options here, what does the --privileged flag do? This lets the container have full access to all the devices (if the user running the process is allowed). Like the --device flag, it makes the device files as well, but what does the filtering look like? We still have a cgroup but the eBPF program is greatly simplified, here it is in full:
   0: (61) r2 = *(u32 *)(r1 +0)
   1: (54) w2 &= 65535
   2: (61) r3 = *(u32 *)(r1 +0)
   3: (74) w3 >>= 16
   4: (61) r4 = *(u32 *)(r1 +4)
   5: (61) r5 = *(u32 *)(r1 +8)
   6: (b4) w0 = 1
   7: (95) exit
There are the usual setup lines and then: return 1. Everyone is a winner, for all devices and access types!

14 May 2023

C.J. Collier: Early Access: Inserting JSON data to BigQuery from Spark on Dataproc

Hello folks! We recently received a case letting us know that Dataproc 2.1.1 was unable to write to a BigQuery table with a column of type JSON. Although the BigQuery connector for Spark has had support for JSON columns since 0.28.0, the Dataproc images on the 2.1 line still cannot create tables with JSON columns or write to existing tables with JSON columns. The customer has graciously granted permission to share the code we developed to allow this operation. So if you are interested in working with JSON column tables on Dataproc 2.1 please continue reading! Use the following gcloud command to create your single-node dataproc cluster:
IMAGE_VERSION=2.1.1-debian11
REGION=us-west1
ZONE=${REGION}-a
CLUSTER_NAME=pick-a-cluster-name
gcloud dataproc clusters create ${CLUSTER_NAME} \
    --region ${REGION} \
    --zone ${ZONE} \
    --single-node \
    --master-machine-type n1-standard-4 \
    --master-boot-disk-type pd-ssd \
    --master-boot-disk-size 50 \
    --image-version ${IMAGE_VERSION} \
    --max-idle=90m \
    --enable-component-gateway \
    --scopes 'https://www.googleapis.com/auth/cloud-platform'
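Once the cluster is up, one way to run the files below is to copy them to the master node and execute the repro script there; this assumes the default master VM name of ${CLUSTER_NAME}-m that single-node Dataproc clusters get:
gcloud compute scp Main.scala repro.sh ${CLUSTER_NAME}-m: --zone ${ZONE}
gcloud compute ssh ${CLUSTER_NAME}-m --zone ${ZONE} --command 'bash repro.sh'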
The following file is the Scala code used to write JSON structured data to a BigQuery table using Spark. The file following this one can be executed from your single-node Dataproc cluster. Main.scala
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.types.{Metadata, StringType, StructField, StructType}
import org.apache.spark.sql.{Row, SaveMode, SparkSession}
import org.apache.spark.sql.avro
import org.apache.avro.specific
  val env = "x"
  val my_bucket = "cjac-docker-on-yarn"
  val my_table = "dataset.testavro2"
    val spark = env match {
      case "local" =>
        SparkSession
          .builder()
          .config("temporaryGcsBucket", my_bucket)
          .master("local")
          .appName("isssue_115574")
          .getOrCreate()
      case _ =>
        SparkSession
          .builder()
          .config("temporaryGcsBucket", my_bucket)
          .appName("isssue_115574")
          .getOrCreate()
    }
  // create DF with some data
  val someData = Seq(
    Row("""{"name":"name1", "age": 10}""", "id1"),
    Row("""{"name":"name2", "age": 20}""", "id2")
  )
  val schema = StructType(
    Seq(
      StructField("user_age", StringType, true),
      StructField("id", StringType, true)
    )
  )
  val avroFileName = s"gs://${my_bucket}/issue_115574/someData.avro"
  
  val someDF = spark.createDataFrame(spark.sparkContext.parallelize(someData), schema)
  someDF.write.format("avro").mode("overwrite").save(avroFileName)
  val avroDF = spark.read.format("avro").load(avroFileName)
  // set metadata
  val dfJSON = avroDF
    .withColumn("user_age_no_metadata", col("user_age"))
    .withMetadata("user_age", Metadata.fromJson("""{"sqlType":"JSON"}"""))
  dfJSON.show()
  dfJSON.printSchema
  // write to BigQuery
  dfJSON.write.format("bigquery")
    .mode(SaveMode.Overwrite)
    .option("writeMethod", "indirect")
    .option("intermediateFormat", "avro")
    .option("useAvroLogicalTypes", "true")
    .option("table", my_table)
    .save()
repro.sh:
#!/bin/bash
PROJECT_ID=set-yours-here
DATASET_NAME=dataset
TABLE_NAME=testavro2
# We have to remove all of the existing spark bigquery jars from the local
# filesystem, as we will be using the symbols from the
# spark-3.3-bigquery-0.30.0.jar below.  Having existing jar files on the
# local filesystem will result in those symbols having higher precedence
# than the one loaded with the spark-shell.
sudo find /usr -name 'spark*bigquery*jar' -delete
# Remove the table from the bigquery dataset if it exists
bq rm -f -t $PROJECT_ID:$DATASET_NAME.$TABLE_NAME
# Create the table with a JSON type column
bq mk --table $PROJECT_ID:$DATASET_NAME.$TABLE_NAME \
  user_age:JSON,id:STRING,user_age_no_metadata:STRING
# Load the example Main.scala 
spark-shell -i Main.scala \
  --jars /usr/lib/spark/external/spark-avro.jar,gs://spark-lib/bigquery/spark-3.3-bigquery-0.30.0.jar
# Show the table schema when we use  bq mk --table  and then load the avro
bq query --use_legacy_sql=false \
  "SELECT ddl FROM $DATASET_NAME.INFORMATION_SCHEMA.TABLES where table_name='$TABLE_NAME'"
# Remove the table so that we can see that the table is created should it not exist
bq rm -f -t $PROJECT_ID:$DATASET_NAME.$TABLE_NAME
# Dynamically generate a DataFrame, store it to avro, load that avro,
# and write the avro to BigQuery, creating the table if it does not already exist
spark-shell -i Main.scala \
  --jars /usr/lib/spark/external/spark-avro.jar,gs://spark-lib/bigquery/spark-3.3-bigquery-0.30.0.jar
# Show that the table schema does not differ from one created with a bq mk --table
bq query --use_legacy_sql=false \
  "SELECT ddl FROM $DATASET_NAME.INFORMATION_SCHEMA.TABLES where table_name='$TABLE_NAME'"
Google BigQuery has supported JSON data since October of 2022, but until now, it has not been possible, on generally available Dataproc clusters, to interact with these columns using the Spark BigQuery Connector. JSON column type support was introduced in spark-bigquery-connector release 0.28.0.

11 May 2023

Shirish Agarwal: India Press freedom, Profiteering, AMD issues in the wild.

India Press Freedom Just about a week back, India again slipped in the Press Freedom index, this time falling to 161 out of 180 countries. The RW again made a lot of noise as they cannot fathom why this has been happening. A recent news story gives some idea. Every year the NCRB (National Crime Records Bureau) puts out its statistics of crimes happening across the country. The report is in the public domain. According to the report shared, around 40k women from Gujarat alone disappeared in the last five years. This is a state where the BJP has been ruling for the last 30-odd years. When this report went viral, in almost all national newspapers the news was censored/blacked out. For e.g. check out newindianexpress.com; likewise on TOI and other newspapers, the news has been 404'd. The only place you can get that news is in minority papers like siasat. But the story didn't end there. While the NCW (National Commission of Women) pointed out similar things happening in J&K, Gujarat Police claimed they got almost 39k women back. Ideally, it should have been added to the NCRB data as an addendum, as the report can be challenged. But as this news went viral, nobody knows what is true or false in the above. What the BJP has been doing is, whenever they get questioned, they try to muddy the waters like that. And most of the time, such news doesn't make it to court, so the party gets a free pass of sorts as they are not legally challenged. Even if somebody asks why Gujarat Police didn't do it, as the NCRB report is jointly made with the help of all states, and especially with the BJP in power both at the Center and in the State, they cannot give any excuse. The only excuse you see or hear is whataboutism, unfortunately.

Profiteering on I.T. Hardware I was chatting with a friend yesterday who is an enthusiast like me but has been more alert about what has been happening in the CPU, motherboard and RAM world. I was simply shocked to hear the prices of motherboards which are three years old, even a middling motherboard. For e.g. the last time I bought a mobo, I spent about 6k, but that was for an ATX motherboard. Most ITX motherboards usually sold for around INR 4k/- or even lower. I remember Via especially, as their mobos were even cheaper, around INR 1.5-2k/-. Even before the pandemic, many motherboard manufacturers had closed down shop, leaving only a few in the market. As only a few remained, prices started going higher. The pandemic turned it into a seller's market overnight as most people were stuck at home and needed good rigs for either work or leisure or both. The manufacturers of CPUs, motherboards, GPUs and power supplies (SMPS) named their prices and people bought them. So in 2023, high prices remained while warranty periods started coming down. Governments also upped customs and various other duties. So all are hand in glove in this situation. So as shared before, what I have been offered is a four-year-old motherboard with a CPU of that time. I haven't bought it, nor do I intend to in the short-term future, but I am extremely disappointed with the state of affairs.

AMD Issues It's just been a couple of hard weeks for AMD, apparently. The first has been the TPM (Trusted Platform Module) issue that was shown by a couple of security researchers. From what is known, apparently with $200 worth of tools and some time you can hack into somebody's machine if you have physical access. Ironically, MS made a huge show about TPM and also made it sort of a requirement if a person wanted to have Windows 11. I remember Matthew Garrett sharing about TPM and issues with Lenovo laptops. While AMD has acknowledged the issue, its response has been somewhat wishy-washy. But this is not the only issue that has been plaguing AMD. There have been reports of AMD chips literally exploding, and again AMD issuing a somewhat wishy-washy response. Asus though made some changes, but whether it is for Zen4 or only 5 parts is not known. Most people are expecting a recession in I.T. hardware this year as well as next year due to high prices. No idea if things will change, if ever.

5 May 2023

Shirish Agarwal: CAT-6, AMD 5600G, Dealerships closing down, TRAI-caller and privacy.

CAT-6 patch cord & ONU A few months back I was offered a fibre service. Most of the service offering has been using Chinese infrastructure, including the ONU (Optical Network Unit). Wikipedia doesn't have a good page on ONUs, hence I had to rely on third-party sites. FS (a name I don't really know) has some good basic info on ONUs and how they are part and parcel of the whole infrastructure. I also got an ONT (Optical Network Terminal) but it seems to be very basic and mostly dumb. I used an old CAT-6 cable (a decade old) to connect them and it worked for a couple of months. I had to change it, and first wanted to know if a higher-grade cable solution offered itself. CAT-7 exists but is not backward compatible. CAT-8 is the next higher version but apparently it's expensive and also not easily bought. I did quite a few tests on CAT-6 and the ONU and it conks out at best 1 mbps, which is still far better than what I am used to. CAT-8 cables are either not available or simply too expensive for home applications atm. A good summary of CAT-8 and what it stands for can be found here. The networking part is hopeless as most consumer-facing CPUs and motherboards don't even offer 10 mbps, so asking for anything more is just overkill without any benefit. Which does bring me to the next question, something that I may do in a few months or a year down the road. Just to clarify, they may say it is 100 mbps or even 1 Gbps but that's plain wrong.
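If you want to repeat this kind of throughput test yourself, iperf3 between two machines on the LAN is a simple way to see what a link actually sustains (the address below is a placeholder):
# On one machine, e.g. the one plugged into the ONU/router
iperf3 -s
# From another machine on the same network; -R measures the reverse direction too
iperf3 -c 192.168.1.10
iperf3 -c 192.168.1.10 -R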

AMD APU, Asus Motherboard & Dealerships I had been thinking of an AMD APU; I could wait a while but sooner or later would have to get one. I got quoted an AMD Ryzen 3 3200G with an Asus A320 motherboard for around 14k, which kinda looked steep to me. Quite a few hardware dealers whom I had traded with and consulted over the years have simply shut down. While there are new people, it's much harder now to make relationships (due to deafness) than before. The easiest to share, which was also online, was pcpartpicker.com, which had an Indian domain that is now no longer available. A good number of offline brick-and-mortar PC businesses have also closed. There are a few new ones, but it takes time and the big guys have made more of a killing. I was shocked quite a bit. I came home, browsed a bit and was hit by this. Both the AMD and Intel PC businesses have taken a beating, AMD a bit more, as Intel still holds the part of the business segment that has traditionally been theirs. There have been proofs and allegations of bribing in the past (do remember the EU antitrust case against Intel for monopoly), but Intel's own cutting of corners with the Spectre and Meltdown flaws hasn't helped its case, nor have the suits themselves. AMD, on the other hand, under the expertise of Lisa Su, has simply grown from strength to strength. Inflation and profiteering by other big companies have made the outlook for both AMD and Intel a bit lackluster. AMD is supposed to show Zen5 chips in a few days' time and the rumor mill has been ongoing. Correction: not a few days but 2025. Personally, I would be happy with maybe a Ryzen 5600G with an Asus motherboard. My main motive whenever I buy an APU is not to go beyond 65W TDP. It's kinda middle of the road. As far as what I could read, this year and next year we could have AM4+ or something like those updates; AM5 APUs, CPUs and boards are slated to be launched in 2025. I did see pcpricetracker and it does give an idea of various APU prices, although I have to say pcpartpicker was much more intuitive to work with than the above. I just had my system cleaned a couple of months ago, so touch wood I should be able to use it for another couple of years or more before I have to get one of these APUs, and I do hope they are worth it. My idea is to use that not only for testing various software but also to delve a bit into VR if that's possible. I did read a bit about deafness and VR as well. A good summary can be found here. I am hopeful that there may be a few people in the community who may look at and respond to that. It's crucial.

TRAI-caller, Privacy 101 & Element. While most of us in the Debian and FOSS communities do engage with privacy, lots of times it's frustrating. I'm always looking for videos that share the view of why privacy is needed by individuals and why governments and other parties hate it. There are a couple of basic YouTube videos that do explain the same quite practically.
Now why am I sharing the above? It isn't that people do not value privacy and hold it dear. I share it because GOI just today blocked Element. While it may be trivial for us to work around the issue, it does tell you what GOI is doing. And it still acts as if surprised that its press-freedom ranking is going to the pits. Even our women wrestlers have been protesting for a week just to get an FIR (First Information Report) filed. And these are women who have won medals for the country. More than half of these organizations, specifically the wrestling federation, don't have POSH, which is a mandatory body supposed to exist in every organization. POSH stands for Prevention of Sexual Harassment at Workplace. The gentleman concerned is a known rowdy/goon, hence it took almost a week of protest to do the needful. I do try not to report on this, because right now, every other day, we see the Govt. curtailing our rights somewhere or the other, and most people are mute. Signing out, till later.

1 May 2023

Gunnar Wolf: Scanning heaps of 8mm movies

After my father passed away, I brought home most of the personal items he had, both at home and at his office. Among many, many (many, many, many) other things, I brought two of his personal treasures: his photo collection and a box with the 8mm movies he shot approximately between 1956 and 1989, when he was forced into modernity and got a portable videocassette recorder. I have talked with several friends, as I really want to get it all in a digital format, and while I've been making slow but steady advances scanning the photo reels, I was particularly dismayed (even though it was mostly to be expected; most personal electronic devices aren't meant to last over 50 years) to find out the 8mm projector was no longer in working condition; the lamp and the fans work, but the spindles won't spin. Of course, it is quite likely easy to fix, but it is beyond my tinkering abilities, and finding photographic equipment repair shops is no longer easy. Anyway, even if I got it fixed, filming a movie from a screen, even with a decent camera, is a lousy way to get it digitized. But almost by mere chance, I got in contact with my cousin Daniel, who came to Mexico to visit his parents, and had precisely brought with him an 8mm/Super8 movie scanner! It is a much simpler piece of equipment than I had expected, and while it does present some minor glitches (i.e. the vertical framing slightly loses alignment over the course of a medium-length film scanning session, and no adjustment is possible while the scan is ongoing), this is something that can be decently fixed in post-processing, and a scanning session can be split with no ill effects. Anyway, it is quite uncommon that even a mid-length (5min) film can be scanned without interruptions, e.g. to rejoin a splice, mostly given my father didn't just film, but also edited a lot (that is, it's not just family pictures, but all different kinds of fiction and documentary work he did). So, Daniel lent me a great, brand new, entry-level film scanner; I rushed to scan as many movies as possible before his return to the USA this week, but he insisted he bought it to help preserve our family's memory, and given we are still several cousins living in Mexico, I could keep hold of it so any other of the cousins will find it more easily. Of course, I am thankful and delighted! So, this equipment is a Magnasonic FS81. It is entry-level, as it lacks some adjustment abilities a professional one would surely have, and I'm sure a better scanner would make the job faster, but it's infinitely superior to not having it! The scanner processes roughly two frames per second (while the nominal 8mm/Super8 speed is 24 frames per second), so a 3 minute film reel takes a bit over 35 minutes, and a long, ~20 minute film reel takes close to 4hr, if nothing gets in your way. And yes, with longer reels, the probability of a splice breaking is way higher than with a short one, not only because there is simply a longer film to process, but also because, both at the unwinding and at the receiving reels, mechanics play their roles. The films don't advance smoothly, but jump to position each frame in the scanner's screen, so every bit of film gets its fair share of gentle tugs. My professional consultant on how and what to do is my good friend Chema Serralde, who has stopped me from doing several things I would otherwise regret later (such as joining spliced tapes with acidic chemical adhesives such as Kola Loka, a.k.a. Krazy Glue; even if it's a bit trickier to do, he insisted I'm best off using simple transparent tape if I'm not buying fancy things such as film adhesive). Chema also explained to me the importance of the loopers (las Lupes, in his technical Spanish translation), which I feared increased the likelihood of breaking a bit of old glue due to the angle in which the film gets pulled, but which, if skipped, result in films with too much jumping. Not all of the movies I have are for public sharing. Some of them are just family movies, with high personal value, but probably of very little interest to others. But some are! I have been uploading some of the movies, after minor post-processing, to the Internet Archive. Among them: Anyway, I have a long way forward for scanning. I have 20 3min reels, 19 5min reels, and 8 20min reels. I want to check the scanning quality, but I think my 20min reels are mostly processed (we paid for scanning them some years ago). I mostly finished the 3min reels, but might have to go over some of them again due to the learning process. And, well, I'm having quite a bit of fun in the process!

27 April 2023

Arturo Borrero González: Kubecon and CloudNativeCon 2023 Europe summary

This post serves as a report from my attendance to Kubecon and CloudNativeCon 2023 Europe, which took place in Amsterdam in April 2023. It was my second time physically attending this conference; the first one was in Austin, Texas (USA) in 2017. I also attended once in a virtual fashion. The content here is mostly generated for the sake of my own recollection and learnings, and is written from the notes I took during the event. The very first session was the opening keynote, which reunited the whole crowd to bootstrap the event and share the excitement about the days ahead. Some astonishing numbers were announced: there were more than 10,000 people attending, and apparently it could confidently be said that it was the largest open source technology conference taking place in Europe in recent times. It was also communicated that the next couple of iterations of the event will be run in China in September 2023 and Paris in March 2024. More numbers: the CNCF was hosting about 159 projects, involving 1,300 maintainers and about 200,000 contributors. The cloud-native community is ever-increasing, and there seems to be a strong trend in the industry for cloud-native technology adoption and all things related to PaaS and IaaS. The event program had different tracks, and in each one there was an interesting mix of low-level and higher-level talks for a variety of audiences. On many occasions I found that reading the talk title alone was not enough to know in advance if a talk was a 101 kind of thing or for experienced engineers. But unlike in previous editions, I didn't have the feeling that the purpose of the conference was to try selling me anything. Obviously, speakers would make sure to mention, or highlight in a subtle way, the involvement of a given company in a given solution or piece of the ecosystem. But it was non-invasive and fair enough for me. On a different note, I found the breakout rooms to be often small. I think there were only a couple of rooms that could accommodate more than 500 people, which is a fairly small allowance for 10k attendees. I realized with frustration that the more interesting talks were immediately fully booked, with people waiting in line some 45 minutes before the session time. Because of this, I missed a few important sessions that I'll hopefully watch online later. Finally, on a more technical side, I learned many things that, instead of grouping by session, I'll group by topic, given how some subjects were mentioned in several talks.
On gitops and CI/CD pipelines Most of the mentions went to FluxCD and ArgoCD. At that point there were no doubts that gitops was a mature approach and that both Flux and ArgoCD could do an excellent job. ArgoCD seemed a bit more over-engineered, aiming to be a more general purpose CD pipeline, and Flux felt a bit more tailored for simpler gitops setups. I discovered that both have nice web user interfaces that I wasn't previously familiar with. However, in two different talks I got the impression that the initial setup of them was simple, but that migrating your current workflow to gitops could result in a bumpy ride. That is, the challenge is not deploying flux/argo itself, but moving everything into a state that both humans and flux/argo can understand. I also saw some curious mentions of the config drifts that can happen in some cases, even if the goal of gitops is precisely for that to never happen. Such mentions were usually accompanied by some hints on how to handle the situation by hand.
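To make the point about the initial setup being simple a bit more concrete, here is a hedged sketch of what bootstrapping a Flux-managed cluster from a Git repository roughly looks like; the owner, repository and path names are placeholders of mine, not something shown at the conference, and the bootstrap command expects a GitHub token in the GITHUB_TOKEN environment variable:
flux bootstrap github --owner=my-org --repository=my-fleet --path=clusters/production
flux get kustomizations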
Worth mentioning, I missed any practical information about one of the key pieces of this whole gitops story: building container images. Most of the showcased scenarios were using pre-built container images, so in that sense they were simple. Building and pushing to an image registry is one of the two key points we would need to solve in Toolforge Kubernetes if adopting gitops. In general, even if gitops was already on our radar for Toolforge Kubernetes, I think it climbed a few steps in my priority list after the conference. Another learning was this site: https://opengitops.dev/.
On etcd, performance and resource management I attended a talk focused on etcd performance tuning that was very encouraging. They were basically talking about the exact same problems we have had in Toolforge Kubernetes, like api-server and etcd failure modes, and how sensitive etcd is to disk latency, IO pressure and network throughput. Even though the Toolforge Kubernetes scale is small compared to other Kubernetes deployments out there, I found it very interesting to see others' approaches to the same set of challenges. I learned how most Kubernetes components and apps can overload the api-server. Because even the api-server talks to itself. Simple things like kubectl may have a completely different impact on the API depending on usage, for example when listing the whole list of objects (very expensive) vs a single object. The conclusion was to try to avoid hitting the api-server with LIST calls, and to use ResourceVersion, which avoids full dumps from etcd (full dumps being, by the way, the default when using bare kubectl get calls); there is a small kubectl illustration further below. I already knew some of this, and for example the jobs-framework-emailer was already making use of this ResourceVersion functionality. There have been a lot of improvements on the performance side of Kubernetes in recent times, or more specifically, in how resources are managed and used by the system. I saw a review of resource management from the perspective of the container runtime and kubelet, and plans to support fancy things like topology-aware scheduling decisions and dynamic resource claims (changing the pod resource claims without re-defining/re-starting the pods).
On cluster management, bootstrapping and multi-tenancy I attended a couple of talks that mentioned kubeadm, and one in particular was from the maintainers themselves. This was of interest to me because as of today we use it for Toolforge. They shared all the latest developments and improvements, and the plans and roadmap for the future, with a special mention of something they called the "kubeadm operator", apparently capable of auto-upgrading the cluster, auto-renewing certificates and such. I also saw a comparison between the different cluster bootstrappers, which to me confirmed that kubeadm was the best, from the point of view of being a well established and well-known workflow, plus having a very active contributor base. The kubeadm developers invited the audience to submit feature requests, so I did. The different talks confirmed that the basic unit for multi-tenancy in Kubernetes is the namespace. Any serious multi-tenant usage should leverage this. There were some ongoing conversations, in official sessions and in the hallway, about the right tool to implement K8s-within-K8s, and vcluster was mentioned enough times for me to be convinced it was the right candidate. This was despite my impression that multiclusters / multicloud are regarded as hard topics in the general community.
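Going back to the api-server LIST point above, a hedged kubectl illustration (mine, not from the talk; namespace and resource are placeholders): the first call asks for a LIST that the api-server may serve from its watch cache instead of performing a full quorum read from etcd, and the second shows a paginated LIST, which also reduces the per-request load:
kubectl get --raw '/api/v1/namespaces/default/pods?resourceVersion=0'
kubectl get --raw '/api/v1/namespaces/default/pods?limit=500'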
I definitely would like to play with vcluster sometime down the road.
On networking I attended a couple of basic sessions that served really well to understand how Kubernetes instruments the network to achieve its goal. The conference program had sessions covering topics ranging from network debugging recommendations and CNI implementations to IPv6 support. Also, one of the keynote sessions had a reference to how kube-proxy is not able to perform NAT for SIP connections, which is interesting because I believe Netfilter conntrack could do it if properly configured. One of the conclusions on the CNI front was that Calico has massive community adoption (in Netfilter mode), which is reassuring, especially considering it is the one we use for Toolforge Kubernetes.
On jobs I attended a couple of talks that were related to HPC/grid-like usages of Kubernetes. I was truly impressed by some folks out there who were using Kubernetes Jobs on massive scales, such as to train machine learning models and other fancy AI projects. It is acknowledged in the community that the early implementation of things like Jobs and CronJobs had some limitations that are now gone, or at least greatly improved. Some new functionalities have been added as well. Indexed Jobs, for example, give each Job pod a number (index) so it can process a chunk of a larger batch of data based on that index. This would allow for full grid-like features such as sequential (or again, indexed) processing, coordination between Jobs, and more graceful Job restarts. My first reaction was: is that something we would like to enable in the Toolforge Jobs Framework?
On policy and security A surprisingly good number of sessions covered interesting topics related to policy and security. It was nice to learn two realities:
  1. Kubernetes is capable of doing pretty much anything security-wise and of creating greatly secured environments.
  2. It does not do so by default. The defaults are not security-strict on purpose (see the sketch after this list).
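As a hedged illustration of how to opt in to stricter settings per namespace, these are the Pod Security Admission labels (mentioned again further below); the namespace name is a placeholder of mine:
kubectl label namespace my-tool pod-security.kubernetes.io/enforce=restricted
kubectl label namespace my-tool pod-security.kubernetes.io/warn=baseline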
It kind of made sense to me: Kubernetes is used for a wide range of use cases, and the developers didn't know beforehand to which particular setup they should accommodate the default security levels. One session in particular covered the most basic security features that should be enabled for any Kubernetes system that would get exposed to random end users. In my opinion, the Toolforge Kubernetes setup was already doing a good job in that regard. To my joy, some sessions referred to the Pod Security Admission mechanism, which is one of the key security features we're about to adopt (when migrating away from Pod Security Policy). I also learned a bit more about Secret resources, their current implementation, and how to leverage a combo of CSI and RBAC for a more secure usage of external secrets. Finally, one of the major takeaways from the conference was learning about kyverno and kubeaudit. I was previously aware of OPA Gatekeeper. From the several demos I saw, it was clear to me that kyverno should help us make Toolforge Kubernetes more sustainable by replacing all of our custom admission controllers with it. I already opened a ticket to track this idea, which I'll be proposing to my team soon.
Final notes In general, I believe I learned many things, and perhaps even more importantly I re-learned some stuff I had forgotten because of lack of daily exposure. I'm really happy that the cloud native way of thinking was reinforced in me, which I still need because most of my muscle memory for approaching systems architecture and engineering is from the old pre-cloud days. List of sessions I attended on the first day: List of sessions I attended on the second day: List of sessions I attended on the third day: The videos have been published on Youtube.

23 April 2023

Petter Reinholdtsen: Speech to text, she APTly whispered, how hard can it be?

While visiting a convention during Easter, it occurred to me that it would be great if I could have a digital Dictaphone with transcribing capabilities, providing me with texts to cut-n-paste into stuff I need to write. The background is that long drives often bring up the urge to write on texts I am working on, which of course is out of the question while driving. With the release of OpenAI Whisper, this seems to be within reach with Free Software, so I decided to give it a go. OpenAI Whisper is a Linux based neural network system that reads in audio files and provides a text representation of the speech in that audio recording. It handles multiple languages, and according to its creators it can even translate into a different language than the spoken one. I have not tested the latter feature. It can either use the CPU or a GPU with CUDA support. As far as I can tell, CUDA in practice limits that feature to NVidia graphics cards. I have few of those, as they do not work great with free software drivers, and have not tested the GPU option. While looking into the matter, I did discover some work to provide CUDA support on non-NVidia GPUs, and some work with the library used by Whisper to port it to other GPUs, but have not spent much time looking into GPU support yet. I've so far used an old X220 laptop as my test machine, and only transcribed using its CPU. As it is, from a privacy standpoint, unthinkable to use computers under the control of someone else (aka a "cloud" service) to transcribe one's thoughts and personal notes, I want to run the transcribing system locally on my own computers. The only sensible approach to me is to make the effort I put into this available for any Linux user and to upload the needed packages into Debian. Looking at Debian Bookworm, I discovered that only three packages were missing: tiktoken, triton, and openai-whisper. For a while I also believed ffmpeg-python was needed, but as its upstream seems to have vanished, I found it safer to rewrite whisper to stop depending on it than to introduce ffmpeg-python into Debian. I decided to place these packages under the umbrella of the Debian Deep Learning Team, which seems like the best team to look after such packages. Discussing the topic within the group also made me aware that the triton package was already a future dependency of newer versions of the torch package being planned, and would be needed after Bookworm is released. All the required code packages have now been waiting in the Debian NEW queue since Wednesday, heading for Debian Experimental until Bookworm is released. An unsolved issue is how to handle the neural network models used by Whisper. The default behaviour of Whisper is to require Internet connectivity and download the requested model to ~/.cache/whisper/ on first invocation. This obviously would fail the deserted island test of free software, as the Debian packages would be unusable for someone stranded with only the Debian archive and a solar powered computer on a deserted island. Because of this, I would love to include the models in the Debian mirror system. This is problematic, as the models are very large files, which would put a heavy strain on the Debian mirror infrastructure around the globe. The strain would be even higher if the models changed often, which luckily, as far as I can tell, they do not. The small model, which according to its creator is most useful for English, and which in my experience is not doing a great job there either, is 462 MiB (deb is 414 MiB).
The medium model, which to me seems to handle English speech fairly well, is 1.5 GiB (deb is 1.3 GiB), and the large model is 2.9 GiB (deb is 2.6 GiB). I would assume everyone with enough resources would prefer to use the large model for the highest quality. I believe the models themselves would have to go into the non-free part of the Debian archive, as they do not really include any useful source code for updating the models. The "source", aka the model training set, according to the creators consists of "680,000 hours of multilingual and multitask supervised data collected from the web", which to me reads as material with unknown copyright terms that is unavailable to the general public. In other words, the source is not available according to the Debian Free Software Guidelines and the models should be considered non-free. I asked the Debian FTP masters for advice regarding uploading a model package on their IRC channel, and based on the feedback there it is still unclear to me if such a package would be accepted into the archive. In any case, I wrote build rules for an OpenAI Whisper model package and modified the Whisper code base to prefer shared files under /usr/ and /var/ over user specific files in ~/.cache/whisper/, to be able to use these model packages and to prepare for such a possibility. One solution might be to include only one of the models (small or medium, I guess) in the Debian archive, and ask people to download the others from the Internet. Not quite sure what to do here, and advice is most welcome (use the debian-ai mailing list). To make it easier to test the new packages while I wait for them to clear the NEW queue, I created an APT source targeting Bookworm. The reason I selected Bookworm instead of Bullseye, even though I know the latter would reach more users, is that some of the required dependencies are missing from Bullseye, and during this phase of testing I did not want to backport a lot of packages just to get up and running. Here is a recipe to run as user root if you want to test OpenAI Whisper using Debian packages on your Debian Bookworm installation, first adding the APT repository GPG key to the list of trusted keys, then setting up the APT repository and finally installing the packages and one of the models:
curl https://geekbay.nuug.no/~pere/openai-whisper/D78F5C4796F353D211B119E28200D9B589641240.asc \
  -o /etc/apt/trusted.gpg.d/pere-whisper.asc
mkdir -p /etc/apt/sources.list.d
cat > /etc/apt/sources.list.d/pere-whisper.list <<EOF
deb https://geekbay.nuug.no/~pere/openai-whisper/ bookworm main
deb-src https://geekbay.nuug.no/~pere/openai-whisper/ bookworm main
EOF
apt update
apt install openai-whisper
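Once installed, transcription is a single command; if no packaged model is found, Whisper downloads the requested model to ~/.cache/whisper/ on first invocation. A hedged sketch (the file name is a placeholder, and the exact flags may vary between versions):
whisper --model small --language Norwegian --output_dir /tmp recording.ogg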
The packages work for me, but they have not yet been tested on any other computer than my own. With them, I have been able to (badly) transcribe a 2 minute 40 second Norwegian test clip using the small model. This took 11 minutes and around 2.2 GiB of RAM. Transcribing the same file with the medium model gave an accurate text in 77 minutes using around 5.2 GiB of RAM. My test machine had too little memory to test the large model, which I believe requires 11 GiB of RAM. In short, this now works for me using Debian packages, and I hope it will for you and everyone else once the packages enter Debian. Now I can start on the audio recording part of this project. As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

18 April 2023

Shirish Agarwal: Philips LCD Monitor 22", 1984, Reaper Man, The Firm.

PHILIPS PHL 221S8L Those who have been reading this blog for a long time would perhaps know that I had bought a Viewsonic 19" almost 12 years ago. The monitor was functioning well till last week. I had thought of changing it to a 24" monitor almost 3-4 years ago, when 24" LCD monitors were going for around INR 4k/- or thereabouts. But the monitor kept on functioning and I didn't have space (nor do I) for a dual-monitor setup. It just didn't make sense. Apart from higher electricity charges, it would also have made more demands on my old system, which somehow is still functioning even after all those years. Then last week, it started to dim and after a couple of days completely conked out. I had wanted to buy a new monitor in front of mum so she could watch movies or whatever, but this was not to be. So I had to buy an LCD monitor now, after the Government raised taxes enormously after the pandemic. The same/similar monitor that used to cost INR 4k/- today cost me almost INR 7k/-, almost double the price. Hooking it to Debian, I got the following:

$ sudo hwinfo --monitor | grep Model
Model: "PHILIPS PHL 221S8L"
FWIW hwinfo is the latest version:

~$ sudo hwinfo --version
21.82
I did see a couple of movies before starting to write this blog post. Not an exceptional monitor, but better than before. I had options from three brands: Dell (most expensive), Philips (middle) and LG (lowest in price). Interestingly, Viewsonic disappeared from the market about 5 years back and made a comeback just a couple of years ago. Even Philips, which had exited the PC monitor market almost a decade back, re-entered it. Apart from the branding, it doesn't make much of a difference, as almost all the products, including the above monitors, are produced in China. I did remember her a lot while buying the monitor, as I'm sure she would have enjoyed it far more than me, but that was not to be.

1984 During last week, when I didn't have the monitor, I re-read 1984. To be completely honest, I had read the book when I was in my 20s and I had no context. The protagonist seemed like a whiner and for the life of me I couldn't understand why he didn't try to escape. Re-reading it after almost two decades and a bit more, I shat a number of times, because now the context is pretty near and pretty real. I can see why the Republicans in the U.S. banned it. I also realized why the protagonist didn't attempt to run away: because wherever he ran away to, it would be the same thing. It probably is one of the most depressing books I have ever read. To willfully accept what is false after all that torture. What was also interesting to me was to find that George Orwell was also a soldier, just like Tolkien was. Both took part and wrote such different stories. While Mr. Tolkien writes and shares the pendulum between hope and despair, Mr. Orwell is decidedly dark. Not grey but dark. I am not sure if I would like to read Animal Farm anytime soon.

The Reaper Man Terry Pratchett It is by sheer coincidence, or perhaps because I needed something to fill me up, that I got Reaper Man by Terry Pratchett. It was practically like a breath of fresh air. And I love Mr. Pratchett for the inclusivity he brings in. We think about skin color and what not, and here Mr. Pratchett writes about an undead gentleman who's extremely polite, as he was a wizard. I won't talk more as I don't really want to spoil the surprise, but rest assured everybody is gonna love it. I also read The Long Utopia, but this is for those who believe in and think of multiverses, long before it became the buzzword that it is today.

The Firm John Grisham Now I don't know what I should write about this book, as there aren't many John Grisham books where a rookie wins against more than one party opposing him. I won't go much into depth but will simply say it was worth a read. I am currently reading Gray Mountain. It very much shows how the coal industry is corrupt and what all it does. It also brings to mind the amount of mining that is done, in which iron is the most sought after. Now if we are mining that much iron, wouldn't it make sense to ask for a circular economy around iron? But we don't even hear a word about it. Even with all the imagined projections of lithium mining, it would hardly be 10%. I could go on but will finish for now, till later.

Matthew Palmer: Rutie and Magnus, Two Good Ways to Build Ruby Extensions in Rust

I wrote the Ruby bindings for the Enquo Project, my attempt to bring queryable encryption to all databases, using the Rutie library. Recently, I've rewritten the bindings to use Magnus instead, and I thought I'd put down my thoughts about the whole situation.

The Story So Far The Enquo Project core cryptography is all written in Rust, as seems to be the vogue these days. Rust is fast, safe, and easily interoperable with most of the rest of the modern software development ecosystem, making it a good choice as a language to implement the cryptographic primitives that Enquo needs, like Order-Revealing Encryption. Of course, since not everyone writes their applications in Rust, we need to provide the functionality of the Enquo client in the languages that people do use, such as Ruby, Python, and so on. Since re-writing all that cryptographic code in a myriad of languages would be tedious and error-prone, we instead provide bindings to the core Rust code. These are just thin shims of code that translate the data types and function calls between Rust and the target language.
Shim in a Can Wrong sort of shim, but canned language bindings would be handy
As I'm most familiar with Ruby and its development ecosystem (particularly Ruby on Rails), it was natural that I'd make Ruby bindings for Enquo as my first target. Rummaging around, it seemed that Rutie was a good library to use, so off I went.

What are Rutie and Magnus, Anyway? Both libraries share the same goal: provide the ability to write some Rust code, run that through a compiler, and produce something that can be loaded by the Ruby interpreter and used just like any other Ruby class. They're both fairly high level interfaces, trying to abstract away much of the gory details, and they do a lot of the common heavy lifting that can make writing bindings fiddly and annoying. Things like mapping data types (like strings and integers) between Rust data types and the closest equivalents in Ruby. This mapping never goes perfectly smoothly. For example, Ruby integers don't have a fixed range of values they can represent: you can store a huge number like 2^256 more-or-less as easily as you can the number 12. But Rust, being a lower-level language, only has a set of integer types that have fixed boundaries, like the u32 type, which can only store integers between zero and about four billion (2^32 - 1, to be precise). There are also lots of little things that need to be just right, like translating the different memory management approaches of the languages, and dealing with a myriad of fiddly little issues like passing arguments and return values in and out of method calls, helpers for defining classes and methods (and pointing to the correct Rust functions), and so on.
A mass of tangled pipes and valves This is what I imagine it looks like inside these libraries
(Hervé Cozanet / Wikimedia Commons, CC-BY-SA)
All in all, these libraries are fairly significant pieces of work, and I'm mighty glad that someone else has taken on the job of building (and maintaining!) them.

So Why the Change? Good question. It's important to say at the outset that there's nothing particularly wrong with Rutie. I found using Rutie to be very straightforward, and the Ruby bindings came together very quickly and easily. If someone chose to use Rutie for their project, I'm sure they'd have a good experience. What made me take the time to rewrite using Magnus was a set of a few tiny things, which together gave me enough of a shove to do the work. Firstly, I'd had a hiccup with Rutie's support of newer versions of Ruby, particularly 3.2 (PR). Also, I'd hit a couple of segfault issues, which were ultimately caused by Ruby garbage-collecting data out from underneath me. These were ultimately my fault, of course, but Rutie wasn't helping me out in avoiding the problems in the first place. Finally, while Rutie helped translate data types, there was still a bit of boilerplate and ugliness that needed to be included. This wasn't a showstopper, but I'm appreciating the extra smoothness that Magnus provides here. As an example, here's what's required in Rutie to get native Rust data types from Ruby method parameters (and the self reference to the current object):
fn enquo_field_decrypt_text(ciphertext_obj: RString, context_obj: RString) -> RString {
    let ciphertext = ciphertext_obj.to_str_unchecked();
    let context = context_obj.to_vec_u8_unchecked();
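    // Note: `rbself` below is provided by Rutie's methods! macro, which wraps this
    // function (the macro invocation is not shown in this excerpt).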
    let field = rbself.get_data(&*FIELD_WRAPPER);
    // etc etc etc
The equivalent in Magnus is just the function signature:
fn decrypt_text(&self, ciphertext: String, context: String) -> Result<String, magnus::Error>  
You can also see there that Magnus signals an exception via the Result return value, while Rutie's approach to raising an exception involves poking the Ruby VM directly, which always struck me as a bit ugly. There are several other minor things in Magnus (like its cleaner approach to wrapping structs so they can be stored in Ruby objects) that I'm appreciating, too. Never discount the power of ergonomics for making a happy developer.

The End Result I spent a bit over half of last weekend doing the rewrite, maybe ten hours or so. Since Magnus did more type checking and data validation, and its approach to error handling was smoother, I took the opportunity to rewrite a bunch of Ruby wrapper code I'd written (which just existed to check things like ranges of values and string encodings) into Rust as well. To make sure that the conversion was accurate, I added a heap more unit tests to the bindings. I also took the opportunity to restructure the codebase to split the code for the different Ruby classes into separate files, which I hadn't done initially as the code had originally accreted, rather than being purposefully written. All up, though, my rewrite ended up removing over 60 lines (excluding the extra specs I added):
$ git diff --stat -- lib ext/enquo/src
 ruby/ext/enquo/src/field.rs         342 ++++++++++++++++++++++++++++++++++++++
 ruby/ext/enquo/src/lib.rs           338 ++++---------------------------------
 ruby/ext/enquo/src/root.rs           39 +++++
 ruby/ext/enquo/src/root_key.rs       67 ++++++++
 ruby/lib/enquo.rb                     6 +-
 ruby/lib/enquo/field.rb             173 -------------------
 ruby/lib/enquo/root.rb               28 ----
 ruby/lib/enquo/root_key.rb            1 -
 ruby/lib/enquo/root_key/static.rb    27 ---
 9 files changed, 479 insertions(+), 542 deletions(-)
Considering that I was translating from a higher level language into a lower level one, the removal of so much code is quite remarkable. Magnus was able to automagically replace rather a lot of raise ArgumentError if something.isnt_right code in those .rb files. So, in conclusion, if you, too, are building Ruby extensions in Rust, while Rutie is a solid choice (and you probably should stick with it if you're already using it), I highly recommend giving Magnus a look for your next extension.

3 April 2023

Matt Brown: Retrospective: Mar 2023

The key decision I made mid-March was to commit to pursuing ventilation monitoring as my primary product development focus. Prior to that decision, I hoped to use my writing plan to drive a breadth-first survey of the opportunities for each of my product ideas before deciding which had the best business potential to focus on first. Two factors changed my mind:
  1. As noted last month, I'm finding the writing process much slower and harder than I expected; the survey across all the ideas may not complete until mid-year or later!
  2. I've realised that, having begun building co2mon.nz last year, to stop work on the project at this point would leave me feeling that I had not done justice to developing the product and testing the market - seeing it to a conclusion is important to me.
This decision is an explicit choice to prioritize seeing a project through to a conclusion (successful, or otherwise) regardless of whether or not it has the highest potential of the various ideas I could invest time into. I'm comfortable making that trade-off in this instance, but I am going to bound my time investment to two months. I'll evaluate at the end of May whether I'm seeing sufficient traction and potential to justify continuing further with the idea. I had only one fully uninterrupted work week in March due to a combination of days out due to school trips, two LandSAR call outs and various farm maintenance tasks. April will be similarly disrupted given school holidays and a planned family trip to Brisbane. Sharpening my focus feels particularly necessary given this reality to ensure I'm not spread overly thin.

Goal Scoring See last month's retrospective for a refresher on my scoring methodology.

Consulting - 4/10 Goal: Execute a series of successful consulting engagements, building a reputation for myself and leaving happy customers willing to provide testimonials that support a pipeline of future opportunities. Consulting hours were down from February, hitting only 31% of target this month as the client didn't make use of all the hours I had allocated for them. I didn't invest any time in advertising my services or developing new clients or projects over the month, which will now become a priority for April.

Product Development - 4/10 Goal: Grow my product development skill set by taking several ideas to MVP stage with customer feedback received, and launch at least one product which generates revenue and has growth potential. With the new focus entirely on co2mon.nz, I spent a lot of time re-working and developing my thinking around how I want to take this forward, specifically trying to analyse where I saw an opportunity in the market. After attending a workshop on finding product market fit using quantifiable metrics at the Southern SaaS conference this month, I've realised that much of the time I spent on this analysis was too insular and focused on my own observations - I need to get out and talk to a lot more people and get more feedback on their needs and understanding of the space instead. Seems obvious in retrospect! I also spent a few days beginning to build another batch of prototype CO2 monitors so I have some units to use for experimentation and testing with potential customers as I get out and have those conversations. I can probably build one or two more batches of prototype monitors before needing to look at PCB assembly in earnest.

Professional Network Development - 8/10 Goal: To build a professional relationship with at least 30 new people this year. This goal continues to be my highlight with 8 new contacts added this month and catch-ups with 4 existing people I had not spoken to for a while. I joined the KiwiSaaS central community and attended the SouthernSaas conference this month as well, which has been time well spent given the workshop learnings discussed above.

Writing - 3/10 Goal: To publish a high-quality piece of writing on this site at least once a week. I published a single post, the first half of my updated ventilation monitoring business plan. I continue to find the writing process much harder and slower than I hoped or expected and remain well below my target publishing rate, but one post is better than zero! I tested working with an editor I contracted via UpWork who provided some very useful feedback on the structure of my writing which helped to unblock some of my progress. I plan to continue doing this for at least a few more posts.

Community - 5/10 Goal: To support the growth of my local technical community by volunteering my experience and knowledge with others through activities such as mentoring, conference talks and similar.

Feedback As always, I'd love to hear from you if you have thoughts or feedback triggered by anything I've written above.

25 March 2023

Gunnar Wolf: Now that we are talking about kernel building... What about firebuild?

After my last post, Bálint (who prompted it with his last post) suggested I should do a hybrid test of his tests and my extremes. He suggested I should build the Linux kernel using my Raspberry Pi 4 (8GB model), but using the Firebuild build accelerator. Before going any further: I must make clear that while Firebuild is freely redistributable, it is not made available under a free license. It is free for personal use or commercial trial, but otherwise requires licensing. Bálint managed to build a Linux kernel in just over 8 seconds. So, how did my test go? My previous experiment, using -j 4, built Linux in ~100 minutes; this was about a year ago, and I'm now building Linux 6.1, so I timed this again. To get a baseline, I built my kernel from a just-unpacked tree, just as usual:
# cd /usr/src/linux-source-6.1
# make clean
# make defconfig
# time make -j4
(...)
real    117m30.588s
user    392m41.434s
sys     52m2.556s
Of course, having all of the object files built makes the rebuild process quite a bit faster (this is still done without firebuild). I understand calling make defconfig without cleaning does not change much, but I saw it often referenced in firebuild's docs, so I'm leaving it in:
# time make -j 4
(...)
real    0m43.822s
user    1m36.577s
sys     0m40.805s
Then, I did a first run using firebuild. Firebuild is a caching build optimizer, so the first run will naturally be somewhat slower (but if you often rebuild your kernel, it should be seen as an investment). Now, on the Raspberry Pi, which uses a slow SD card interface for its storage, it is a heavy investment. The first time I built with firebuild, it meant almost a 100% build time hit:
# cd /usr/src/linux-source-6.1
# make clean
# make defconfig
# time firebuild make -j 4
(...)
real    212m58.647s
user    391m49.080s
sys     81m10.758s
Not only that; I am using a fairly decent and big 32GB card, but this is quite a big price to pay in such a limited system!
# du -sh .cache/firebuild/
4.2G    .cache/firebuild/
I did a build without cleaning the build directory, using firebuild, and it does help although not by so much as in higher performance systems:
# cd /usr/src/linux-source-6.1
# make clean
# make defconfig
# time firebuild make -j 4
(...)
real    68m6.621s
user    98m32.514s
sys     31m41.643s
So, it built in roughly 65% of the time it would take to build regularly. And what about rebuilding without cleaning?
# make defconfig
# time firebuild make -j 4
(...)
real    1m11.872s
user    2m5.807s
sys     1m46.178s
In this case, using firebuild worked roughly 30% slower than not using it. I guess the high number of file ops inside .cache/firebuild are to blame, as in the case of the media I'm using those are quite expensive; make went its way basically checking date stamps between *.c and *.o files (yes, very roughly), and while running under firebuild, I suppose each of these checks meant an extra lookup inside the cache. So: experiment requested, experiment performed!

Russ Allbery: Review: Thief of Time

Review: Thief of Time, by Terry Pratchett
Series: Discworld #26
Publisher: Harper
Copyright: May 2001
Printing: August 2014
ISBN: 0-06-230739-8
Format: Mass market
Pages: 420
Thief of Time is the 26th Discworld novel and the last Death novel, although he still appears in subsequent books. It's the third book starring Susan Sto Helit, so I don't recommend starting here. Mort is the best starting point for the Death subseries, and Reaper Man provides a useful introduction to the villains. Jeremy Clockson was an orphan raised by the Guild of Clockmakers. He is very good at making clocks. He's not very good at anything else, particularly people, but his clocks are the most accurate in Ankh-Morpork. He is therefore the logical choice to receive a commission by a mysterious noblewoman who wants him to make the most accurate possible clock: a clock that can measure the tick of the universe, one that a fairy tale says had been nearly made before. The commission is followed by a surprise delivery of an Igor, to help with the clock-making. People who live in places with lots of fields become farmers. People who live where there is lots of iron and coal become blacksmiths. And people who live in the mountains near the Hub, near the gods and full of magic, become monks. In the highest valley are the History Monks, founded by Wen the Eternally Surprised. Like most monks, they take apprentices with certain talents and train them in their discipline. But Lobsang Ludd, an orphan discovered in the Thieves Guild in Ankh-Morpork, is proving a challenge. The monks decide to apprentice him to Lu-Tze the sweeper; perhaps that will solve multiple problems at once. Since Hogfather, Susan has moved from being a governess to a schoolteacher. She brings to that job the same firm patience, total disregard for rules that apply to other people, and impressive talent for managing children. She is by far the most popular teacher among the kids, and not only because she transports her class all over the Disc so that they can see things in person. It is a job that she likes and understands, and one that she's quite irate to have interrupted by a summons from her grandfather. But the Auditors are up to something, and Susan may be able to act in ways that Death cannot. This was great. Susan has quickly become one of my favorite Discworld characters, and this time around there is no (or, well, not much) unbelievable romance or permanently queasy god to distract. The clock-making portions of the book quickly start to focus on Igor, who is a delightful perspective through whom to watch events unfold. And the History Monks! The metaphysics of what they are actually doing (which I won't spoil, since discovering it slowly is a delight) is perhaps my favorite bit of Discworld world building to date. I am a sucker for stories that focus on some process that everyone thinks happens automatically and investigate the hidden work behind it. I do want to add a caveat here that the monks are in part a parody of Himalayan Buddhist monasteries, Lu-Tze is rather obviously a parody of Laozi and Daoism in general, and Pratchett's parodies of non-western cultures are rather ham-handed. This is not quite the insulting mess that the Chinese parody in Interesting Times was, but it's heavy on the stereotypes. It does not, thankfully, rely on the stereotypes; the characters are great fun on their own terms, with the perfect (for me) balance of irreverence and thoughtfulness. Lu-Tze refusing to be anything other than a sweeper and being irritatingly casual about all the rules of the order is a classic bit that Pratchett does very well. 
But I also have the luxury of ignoring stereotypes of a culture that isn't mine, and I think Pratchett is on somewhat thin ice. As one specific example, having Lu-Tze's treasured sayings be a collection of banal aphorisms from a random Ankh-Morpork woman is both hilarious and also arguably rather condescending, and I'm not sure where I landed. It's a spot-on bit of parody of how a lot of people who get very into "eastern religions" sound, but it's also equating the Dao De Jing with advice from the Discworld equivalent of a English housewife. I think the generous reading is that Lu-Tze made the homilies profound by looking at them in an entirely different way than the woman saying them, and that's not completely unlike Daoism and works surprisingly well. But that's reading somewhat against the grain; Pratchett is clearly making fun of philosophical koans, and while anything is fair game for some friendly poking, it still feels a bit weird. That isn't the part of the History Monks that I loved, though. Their actual role in the story doesn't come out of the parody. It's something entirely native to Discworld, and it's an absolute delight. The scene with Lobsang and the procrastinators is perhaps my favorite Discworld set piece to date. Everything about the technology of the History Monks, even the Bond parody, is so good. I grew up reading the Marvel Comics universe, and Thief of Time reminds me of a classic John Byrne or Jim Starlin story, where the heroes are dumped into the middle of vast interdimensional conflicts involving barely-anthropomorphized cosmic powers and the universe is revealed to work in ever more intricate ways at vastly expanding scales. The Auditors are villains in exactly that tradition, and just like the best of those stories, the fulcrum of the plot is questions about what it means to be human, what it means to be alive, and the surprising alliances these non-human powers make with humans or semi-humans. I devoured this kind of story as a kid, and it turns out I still love it. The one complaint I have about the plot is that the best part of this book is the middle, and the end didn't entirely work for me. Ronnie Soak is at his best as a supporting character about three quarters of the way through the book, and I found the ending of his subplot much less interesting. The cosmic confrontation was oddly disappointing, and there's a whole extended sequence involving chocolate that I think was funnier in Pratchett's head than it was in mine. The ending isn't bad, but the middle of this book is my favorite bit of Discworld writing yet, and I wish the story had carried that momentum through to the end. I had so much fun with this book. The Discworld novels are clearly getting better. None of them have yet vaulted into the ranks of my all-time favorite books there's always some lingering quibble or sagging bit but it feels like they've gone from reliably good books to more reliably great books. The acid test is coming, though: the next book is a Rincewind book, which are usually the weak spots. Followed by The Last Hero in publication order. There is no direct thematic sequel. Rating: 8 out of 10

21 March 2023

B lint R czey: Building the Linux kernel in under 10 seconds with Firebuild

Russell published an interesting post about his first experience with Firebuild accelerating refpolicy's and the Linux kernel's builds. It turned out a few small tweaks could accelerate the builds even more, crossing the 10 second barrier with Linux's build.
Build performance with 18 cores The Linux kernel's build time is a widely used benchmark for compilers, making it a prime candidate to test a build accelerator as well. In the first run on Russell's 18 core test system the observed user+sys CPU time was cut by 44%, with an actual increase in wall clock time, which was quite unusual. Firebuild performed much better than that in prior tests. To replicate the results I've set up a clean Debian Bookworm VM on my machine:
lxc launch images:debian/bookworm --vm -c limits.cpu=18 -c limits.memory=16GB bookworm-vm
Compiling Linux 6.1.10 in this clean Debian VM showed build times closer to what I expected to see, ~72% less wall clock time and ~97% less user+sys CPU time:
$ make defconfig && time make bzImage -j18
real	1m31.157s
user	20m54.256s
sys	2m25.986s
$ make defconfig && time firebuild make bzImage -j18
# first run:
real	2m3.948s
user	21m28.845s
sys	4m16.526s
# second run
real	0m25.783s
user	0m56.618s
sys	0m21.622s
There are multiple differences between Russell's and my test systems, including different CPUs (E5-2696v3 vs. virtualized Ryzen 5900X) and different file systems (BTRFS RAID-1 vs ext4), but I don't think those could explain the observed mismatch in performance. The difference may be worth further analysis, but let's get back to squeezing out more performance from Firebuild. Firebuild was developed on Ubuntu. I was wondering if Firebuild was faster there, but I got only slightly better build times in an identical VM running Ubuntu 22.10 (Kinetic Kudu):
$ make defconfig && time make bzImage -j18
real	1m31.130s
user	20m52.930s
sys	2m12.294s
$ make defconfig && time firebuild make bzImage -j18
# first run:
real	2m3.274s
user	21m18.810s
sys	3m45.351s
# second run
real	0m25.532s
user	0m53.087s
sys	0m18.578s
The KVM virtualization certainly introduces an overhead, thus builds must be faster in LXC containers. Indeed, all builds are faster by a few percent:
$ lxc launch ubuntu:kinetic kinetic-container
...
$ make defconfig && time make bzImage -j18
real	1m27.462s
user	20m25.190s
sys	2m13.014s
$ make defconfig && time firebuild make bzImage -j18
# first run:
real	1m53.253s
user	21m42.730s
sys	3m41.067s
# second run
real	0m24.702s
user	0m49.120s
sys	0m16.840s
# Cache size:    1.85 GB
Apparently this ~72% reduction in wall clock time is what one should expect by simply prefixing the build command with firebuild on a similar configuration, but we should not stop here. Firebuild does not accelerate quicker commands by default, to save cache space. This howto suggests letting firebuild accelerate all commands, including even "sh", by passing -o 'processes.skip_cache = []' to firebuild.
Accelerating all commands in this build s case increases cache size by only 9%, and increases the wall clock time saving to 91%, not only making the build more than 10X faster, but finishing it in less than 8 seconds, which may be a new world record!:
$ make defconfig && time firebuild -o 'processes.skip_cache = []' make bzImage -j18
# first run:
real	1m54.937s
user	21m35.988s
sys	3m42.335s
# second run
real	0m7.861s
user	0m15.332s
sys	0m7.707s
# Cache size:    2.02 GB
There are even faster CPUs on the market than this 5900X. If you happen to have access to one, please leave a comment if you could go below 5 seconds!
Scaling to higher core counts and comparison with ccache Russell raised the very valid point about Firebuild's single threaded supervisor being a bottleneck on high core systems, and a comparison to ccache also came up in the comments. Since ccache does not have a central supervisor it could scale better with more cores, but let's see if ccache could go below 10 seconds with the build times:
firebuild -o processes.skip_cache = [] and ccache scaling to 24 cores
Well, no. The best time for ccache is 18.81s, with -j24. Both firebuild and ccache keep gaining from extra cores up to 8 cores, but beyond that the wall clock time improvements diminish. The more interesting difference is that firebuild's user and sys time is basically constant from -j1 to -j24, similarly to ccache's user time, but ccache's sys time increases linearly or exponentially with the number of used cores. I suspect this is due to the many parallel ccache processes performing file operations to check if cache entries could be reused, while in firebuild's case the supervisor performs most of that work, not requiring in-kernel synchronization across multiple cores. It is true that the single threaded firebuild supervisor is a bottleneck, but the supervisor also implements a central filesystem cache, thus checking if a command's cache entry can be reused can be implemented with far fewer system calls and much less user space hashing, making the architecture more efficient overall than ccache's. The beauty of Firebuild is not being faster than ccache, but being faster than ccache with basically no hard-coded information about how C compilers work. It can accelerate any other compiler or program that generates deterministic output from its input, just by observing what it did in its prior runs. It is like having ccache for every compiler, including in-house developed ones, and also for random slow scripts.
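For reference, a hedged sketch of one way the same kernel benchmark could be run with ccache; the exact invocation behind the chart above is not given in this post, so treat it as illustrative only:
apt install ccache
make defconfig && time make CC="ccache gcc" bzImage -j24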

13 March 2023

Antoine Beaupré: Framework 12th gen laptop review

The Framework is a 13.5" laptop body with swappable parts, which makes it somewhat future-proof and certainly easily repairable, scoring an "exceedingly rare" 10/10 score from ifixit.com. There are two generations of the laptop's main board (both compatible with the same body): the Intel 11th and 12th gen chipsets. I have received my Framework, 12th generation "DIY", device in late September 2022 and will update this page as I go along in the process of ordering, burning-in, setting up and using the device over the years. Overall, the Framework is a good laptop. I like the keyboard, the touch pad, the expansion cards. Clearly there's been some good work done on industrial design, and it's the most repairable laptop I've had in years. Time will tell, but it looks sturdy enough to survive me many years as well. This is also one of the most powerful devices I ever lay my hands on. I have managed, remotely, more powerful servers, but this is the fastest computer I have ever owned, and it fits in this tiny case. It is an amazing machine. On the downside, there's a bit of proprietary firmware required (WiFi, Bluetooth, some graphics) and the Framework ships with a proprietary BIOS, with currently no Coreboot support. Expect to need the latest kernel, firmware, and hacking around a bunch of things to get resolution and keybindings working right. Like others, I have first found significant power management issues, but many issues can actually be solved with some configuration. Some of the expansion ports (HDMI, DP, MicroSD, and SSD) use power when idle, so don't expect week-long suspend, or "full day" battery while those are plugged in. Finally, the expansion ports are nice, but there's only four of them. If you plan to have a two-monitor setup, you're likely going to need a dock. Read on for the detailed review. For context, I'm moving from the Purism Librem 13v4 because it basically exploded on me. I had, in the meantime, reverted back to an old ThinkPad X220, so I sometimes compare the Framework with that venerable laptop as well. This blog post has been maturing for months now. It started in September 2022 and I declared it completed in March 2023. It's the longest single article on this entire website, currently clocking at about 13,000 words. It will take an average reader a full hour to go through this thing, so I don't expect anyone to actually do that. This introduction should be good enough for most people, read the first section if you intend to actually buy a Framework. Jump around the table of contents as you see fit for after you did buy the laptop, as it might include some crucial hints on how to make it work best for you, especially on (Debian) Linux.

Advice for buyers Those are things I wish I would have known before buying:
  1. consider buying 4 USB-C expansion cards, or at least a mix of 4 USB-A or USB-C cards, as they use less power than other cards and you do want to fill those expansion slots otherwise they snag around and feel insecure
  2. you will likely need a dock or at least a USB hub if you want a two-monitor setup, otherwise you'll run out of ports
  3. you have to do some serious tuning to get proper (10h+ idle, 10 days suspend) power savings
  4. in particular, beware that the HDMI, DisplayPort and particularly the SSD and MicroSD cards take a significant amount of power, even when sleeping, up to 2-6W for the latter two
  5. beware that the MicroSD card is what it says: Micro, normal SD cards won't fit, and while there might be a full-sized one eventually, it's currently only at the prototyping stage
  6. the Framework monitor has an unusual aspect ratio (3:2): I like it (and it matches classic and digital photography aspect ratio), but it might surprise you

Current status I have the framework! It's set up with a fresh new Debian bookworm installation. I've run through a large number of tests and burn-in. I have decided to use the Framework as my daily driver, and had to buy a USB-C dock to get my two monitors connected, which was its own adventure. Update: Framework just announced (2023-03-23) a whole bunch of new stuff: The recording is available in this video and it's not your typical keynote. It starts ~25 minutes late, audio is crap, lighting and camera are crap, clapping seems to be from whatever staff they managed to get together in a room, decor is bizarre, colors are shit. It's amazing.

Specifications Those are the specifications of the 12th gen, in general terms. Your build will of course vary according to your needs.
  • CPU: i5-1240P, i7-1260P, or i7-1280P (Up to 4.4-4.8 GHz, 4+8 cores), Iris Xe graphics
  • Storage: 250-4000GB NVMe (or bring your own)
  • Memory: 8-64GB DDR4-3200 (or bring your own)
  • WiFi 6e (AX210, vPro optional, or bring your own)
  • 296.63mm X 228.98mm X 15.85mm, 1.3Kg
  • 13.5" display, 3:2 ratio, 2256px X 1504px, 100% sRGB, >400 nit
  • 4 x USB-C user-selectable expansion ports, including
    • USB-C
    • USB-A
    • HDMI
    • DP
    • Ethernet
    • MicroSD
    • 250-1000GB SSD
  • 3.5mm combo headphone jack
  • Kill switches for microphone and camera
  • Battery: 55Wh
  • Camera: 1080p 60fps
  • Biometrics: Fingerprint Reader
  • Backlit keyboard
  • Power Adapter: 60W USB-C (or bring your own)
  • ships with a screwdriver/spudger
  • 1 year warranty
  • base price: 1000$CAD, but doesn't give you much, typical builds around 1500-2000$CAD

Actual build This is the actual build I ordered. Amounts in CAD. (1CAD = ~0.75EUR/USD.)

Base configuration
  • CPU: Intel Core i5-1240P (AKA Alder Lake P, 8 × 4.4GHz P-threads, 8 × 3.2GHz E-threads, 16 total, 28-64W), 1079$
  • Memory: 16GB (1 x 16GB) DDR4-3200, 104$

Customization
  • Keyboard: US English, included

Expansion Cards
  • 2 USB-C $24
  • 3 USB-A $36
  • 2 HDMI $50
  • 1 DP $50
  • 1 MicroSD $25
  • 1 Storage 1TB $199
  • Sub-total: 384$

Accessories
  • Power Adapter - US/Canada $64.00

Total
  • Before tax: 1606$
  • After tax and duties: 1847$
  • Free shipping

Quick evaluation This is basically the TL;DR here, just focusing on broad pros/cons of the laptop.

Pros

Cons
  • the 11th gen is out of stock, except for the higher-end CPUs, which are much less affordable (700$+)
  • the 12th gen has compatibility issues with Debian, followup in the DebianOn page, but basically: brightness hotkeys, power management, wifi, the webcam is okay even though the chipset is the infamous alder lake because it does not have the fancy camera; most issues currently seem solvable, and upstream is working with mainline to get their shit working
  • 12th gen might have issues with thunderbolt docks
  • they used to have some difficulty keeping up with the orders: first two batches shipped, third batch sold out, fourth batch should have shipped (?) in October 2021. They generally seem to keep up with shipping. Update (August 2022): they rolled out a second line of laptops (12th gen); first batch shipped, second batch shipped late, September 2022 batch was generally on time, see this spreadsheet for a crowdsourced effort to track those. Supply chain issues seem to be under control as of early 2023. I got the Ethernet expansion card shipped within a week.
  • compared to my previous laptop (Purism Librem 13v4), it feels strangely bulkier and heavier; it's actually lighter than the purism (1.3kg vs 1.4kg) and thinner (15.85mm vs 18mm) but the design of the Purism laptop (tapered edges) makes it feel thinner
  • no space for a 2.5" drive
  • rather bright LED around the power button, but can be dimmed in the BIOS (not low enough to my taste); I got used to it
  • fan quiet when idle, but can be noisy when running, for example if you max a CPU for a while
  • battery described as "mediocre" by Ars Technica (above), confirmed poor in my tests (see below)
  • no RJ-45 port, and attempts at designing ones are failing because the modular plugs are too thin to fit (according to Linux After Dark), so unlikely to have one in the future. Update: they cracked that nut and ship a 2.5 Gbps Ethernet expansion card with a Realtek chipset, without any firmware blob (!)
  • a bit pricey for the performance, especially when compared to the competition (e.g. Dell XPS, Apple M1)
  • 12th gen Intel has glitchy graphics, seems like Intel hasn't fully landed proper Linux support for that chipset yet

Initial hardware setup A breeze.

Accessing the board The internals are accessed through five Torx screws, but there's a nice screwdriver/spudger that works well enough. The screws actually hold in place so you can't even lose them. The first setup is a bit counter-intuitive coming from the Librem laptop, as I expected the back cover to lift and give me access to the internals. But instead the screws release the keyboard and touch pad assembly, so you actually need to flip the laptop back upright and lift the assembly off (!) to get access to the internals. Kind of scary. I also actually unplugged a connector in lifting the assembly because I lifted it towards the monitor, while you actually need to lift it to the right. Thankfully, the connector didn't break, it just snapped off and I could plug it back in, no harm done. Once there, everything is well indicated, with QR codes all over the place supposedly leading to online instructions.

Bad QR codes Unfortunately, the QR codes I tested (in the expansion card slot, the memory slot and CPU slots) did not actually work so I wonder how useful those actually are. After all, they need to point to something and that means a URL, a running website that will answer those requests forever. I bet those will break sooner rather than later and in fact, as far as I can tell, they just don't work at all. I prefer the approach taken by the MNT reform here which designed (with the 100 rabbits folks) an actual paper handbook (PDF). The first QR code that's immediately visible from the back of the laptop, in an expansion card slot, is a 404. It seems to be some serial number URL, but I can't actually tell because, well, the page is a 404. I was expecting that bar code to lead me to an introduction page, something like "how to set up your Framework laptop". Support actually confirmed that it should point to a quickstart guide. But in a bizarre twist, they somehow sent me the URL with the plus (+) signs escaped, like this:
https://guides.frame.work/Guide/Framework\+Laptop\+DIY\+Edition\+Quick\+Start\+Guide/57
... which Firefox immediately transforms into:
https://guides.frame.work/Guide/Framework/+Laptop/+DIY/+Edition/+Quick/+Start/+Guide/57
I'm puzzled as to why they would send the URL that way, the proper URL is of course:
https://guides.frame.work/Guide/Framework+Laptop+DIY+Edition+Quick+Start+Guide/57
(They have also "let the team know about this for feedback and help resolve the problem with the link" which is a support code word for "ha-ha! nope! not my problem right now!" Trust me, I know, my own code word is "can you please make a ticket?")

Seating disks and memory The "DIY" kit doesn't actually have that much of a setup. If you bought RAM, it's shipped outside the laptop in a little plastic case, so you just seat it in as usual. Then you insert your NVMe drive, and, if that's your fancy, you also install your own mPCI WiFi card. If you ordered one (which was my case), it's pre-installed. Closing the laptop is also kind of amazing, because the keyboard assembly snaps into place with magnets. I have actually used the laptop with the keyboard unscrewed as I was putting the drives in and out, and it actually works fine (and will probably void your warranty, so don't do that). (But you can.) (But don't, really.)

Hardware review

Keyboard and touch pad The keyboard feels nice, for a laptop. I'm used to mechanical keyboards and I'm rather violent with those poor things. Yet the key travel is nice and it's clickety enough that I don't feel too disoriented. At first, I felt the keyboard was more laggy than my normal workstation setup, but it turned out this was a graphics driver issue. After enabling a composition manager, everything feels snappy. The touch pad feels good. The double-finger scroll works well enough, and I don't have to wonder too much where the middle button is, it just works. Taps don't work out of the box: that needs to be enabled in Xorg, with something like this:
cat > /etc/X11/xorg.conf.d/40-libinput.conf <<EOF
Section "InputClass"
      Identifier "libinput touch pad catchall"
      MatchIsTouchpad "on"
      MatchDevicePath "/dev/input/event*"
      Driver "libinput"
      Option "Tapping" "on"
      Option "TappingButtonMap" "lmr"
EndSection
EOF
But be aware that once you enable that tapping, you'll need to deal with palm detection... So I have not actually enabled this in the end.
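If you do want to experiment with tapping without touching the Xorg configuration, it can also be toggled at runtime through xinput. The device name below is just a guess (check the output of xinput list on your machine) and the change only lasts for the current session:

# find the touch pad's exact name first
xinput list
# enable tap-to-click for this session only (device name is an assumption)
xinput set-prop "PIXA3854:00 093A:0274 Touchpad" "libinput Tapping Enabled" 1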

Power button The power button is a little dangerous. It's quite easy to hit, as it's right next to one expansion card where you are likely to plug in a power cable. And because the expansion cards are kind of hard to remove, you might squeeze the laptop (and the power key) when trying to remove the expansion card next to the power button. So obviously, don't do that. But that's not very helpful. An alternative is to make the power button do something else. With systemd-managed systems, it's actually quite easy. Add a HandlePowerKey stanza to (say) /etc/systemd/logind.conf.d/power-suspends.conf:
[Login]
HandlePowerKey=suspend
HandlePowerKeyLongPress=poweroff
You might have to create the directory first:
mkdir /etc/systemd/logind.conf.d/
Then restart logind:
systemctl restart systemd-logind
And the power button will suspend! Long-press to power off doesn't actually work as the laptop immediately suspends... Note that there's probably half a dozen other ways of doing this, see this, this, or that.

Special keybindings There is a series of "hidden" (as in: not labeled on the key) keybindings related to the fn keybinding that I actually find quite useful.
Key Equivalent Effect Command
p Pause lock screen xset s activate
b Break ? ?
k ScrLk switch keyboard layout N/A
It looks like those are defined in the microcontroller so it would be possible to add some. For example, the SysRq key is almost bound to fn-s in there. Note that most other shortcuts like this are clearly documented (volume, brightness, etc). One key that's less obvious is F12, which only has the Framework logo on it. That actually calls the keysym XF86AudioMedia which, interestingly, does absolutely nothing here. By default, on Windows, it opens your browser to the Framework website and, on Linux, your "default media player". The keyboard backlight can be cycled with fn-space. The dimmer version is dim enough, and the keybinding is easy to find in the dark. A skinny elephant would be performed with alt-PrtScr (above F11) plus a key, so for example alt fn F11 b should do a hard reset. This comment suggests you need to hold the fn only if "function lock" is on, but that's actually the opposite of my experience. Out of the box, some of the fn keys don't work. Mute, volume up/down, brightness, monitor changes, and the airplane mode key all do basically nothing. They don't send proper keysyms to Xorg at all. This is a known problem and it's related to the fact that the laptop has light sensors to adjust the brightness automatically. Somehow some of those keys (e.g. the brightness controls) are supposed to show up as a different input device, but don't seem to work correctly. It seems like the solution is for the Framework team to write a driver specifically for this, but so far no progress since July 2022. In the meantime, the fancy functionality can supposedly be disabled with:
echo 'blacklist hid_sensor_hub' | sudo tee /etc/modprobe.d/framework-als-blacklist.conf
... and a reboot. This solution is also documented in the upstream guide. Note that there's another solution flying around that fixes this by changing permissions on the input device but I haven't tested that or seen confirmation it works.

Kill switches The Framework has two "kill switches": one for the camera and the other for the microphone. The camera one actually disconnects the USB device when turned off, and the mic one seems to cut the circuit. It doesn't show up as muted, it just stops feeding the sound. Both kill switches are around the main camera, on top of the monitor, and quite discreet. They turn "red" when enabled (i.e. "red" means "turned off").

Monitor The monitor looks pretty good to my untrained eyes. I have yet to do photography work on it, but some photos I looked at look sharp and the colors are bright and lively. The blacks are dark and the screen is bright. I have yet to use it in full sunlight. The dimmed light is very dim, which I like.

Screen backlight I bind brightness keys to xbacklight in i3, but out of the box I get this error:
sep 29 22:09:14 angela i3[5661]: No outputs have backlight property
It just requires this blob in /etc/X11/xorg.conf.d/backlight.conf:
Section "Device"
    Identifier  "Card0"
    Driver      "intel"
    Option      "Backlight"  "intel_backlight"
EndSection
This way I can control the actual backlight power with the brightness keys, and they do significantly reduce power usage.
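For reference, the i3 bindings I'm alluding to look something like this; a minimal sketch, assuming xbacklight is installed and the step size is a matter of taste:

# ~/.config/i3/config
bindsym XF86MonBrightnessUp exec --no-startup-id xbacklight -inc 10
bindsym XF86MonBrightnessDown exec --no-startup-id xbacklight -dec 10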

Multiple monitor support I have been able to hook up my two old monitors to the HDMI and DisplayPort expansion cards on the laptop. The lid closes without suspending the machine, and everything works great. I actually run out of ports, even with a 4-port USB-A hub, which gives me a total of 7 ports:
  1. power (USB-C)
  2. monitor 1 (DisplayPort)
  3. monitor 2 (HDMI)
  4. USB-A hub, which adds:
  5. keyboard (USB-A)
  6. mouse (USB-A)
  7. Yubikey
  8. external sound card
Now the latter, I might be able to get rid of if I switch to a combo-jack headset, which I do have (and still need to test). But still, this is a problem. I'll probably need a powered USB-C dock and better monitors, possibly with some Thunderbolt chaining, to save yet more ports. But that means more money into this setup, argh. And figuring out my monitor situation is the kind of thing I'm not that big of a fan of. And neither is shopping for USB-C (or is it Thunderbolt?) hubs. My normal autorandr setup doesn't work: I have tried saving a profile and it doesn't get autodetected, so I also first need to do:
autorandr -l framework-external-dual-lg-acer
The magic:
autorandr -l horizontal
... also works well. The worst problem with those monitors right now is that they have a radically smaller resolution than the main screen on the laptop, which means I need to reset the font scaling to normal every time I switch back and forth between those monitors and the laptop, which means I actually need to do this:
autorandr -l horizontal &&
echo Xft.dpi: 96 | xrdb -merge &&
systemctl restart terminal xcolortaillog background-image emacs &&
i3-msg restart
Kind of disruptive.
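This dance could presumably be automated with an autorandr postswitch hook, which runs after every profile change. A hedged sketch, reusing the commands above (the script needs to be executable, and the service names are the ones from my own setup):

# ~/.config/autorandr/postswitch
#!/bin/sh
echo "Xft.dpi: 96" | xrdb -merge
systemctl restart terminal xcolortaillog background-image emacs
i3-msg restart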

Expansion ports I ordered a total of 10 expansion ports. I did manage to initialize the 1TB drive as encrypted storage, mostly to keep photos as this is something that takes a massive amount of space (500GB and counting) and that I (unfortunately) don't work on very often (but still carry around). The expansion ports are fancy and nice, but not actually that convenient. They're a bit hard to take out: you really need to crimp your fingernails on there and pull hard to take them out. There's a little button next to them to release, I think, but at first it feels a little scary to pull those pucks out of there. You get used to it though, and it's one of those things you can do without looking eventually. There are only four expansion ports. Once you have two monitors, the drive, and power plugged in, bam, you're out of ports; there's nowhere to plug my Yubikey. So if this is going to be my daily driver, with a dual monitor setup, I will need a dock, which means more crap firmware and uncertainty, which isn't great. There are actually plans to make a dual-USB card, but that is blocked on designing an actual board for this. I can't wait to see more expansion ports produced. There's an Ethernet expansion card which quickly went out of stock basically the day it was announced, but was eventually restocked. I would like to see a proper SD-card reader. There's a MicroSD card reader, but that obviously doesn't work for normal SD cards, which would be more broadly compatible anyways (because you can have a MicroSD to SD card adapter, but I have never heard of the reverse). Someone actually found an SD card reader that fits and then someone else managed to cram it in a 3D printed case, which is kind of amazing. Still, I really like the idea that I can carry all those little adapters in a pouch when I travel and can basically do anything I want. It does mean I need to shuffle through them to find the right one, which is a little annoying. I have an elastic band to keep them lined up so that all the ports show the same side, to make it easier to find the right one. But that quickly gets undone and instead I have a pouch full of expansion cards. Another awesome thing with the expansion cards is that they don't just work on the laptop: anything that takes USB-C can take those cards, which means you can use it to connect an SD card to your phone, for backups, for example. Heck, you could even connect an external display to your phone that way, assuming that's supported by your phone of course (and it probably isn't). The expansion ports do take up some power, even when idle. See the power management section below, and particularly the power usage tests for details.
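For the curious, setting up the 1TB expansion card as encrypted storage is the usual LUKS dance. A minimal sketch, with a hypothetical device name (double-check with lsblk before formatting anything):

cryptsetup luksFormat /dev/sda        # wipes the card!
cryptsetup open /dev/sda photos
mkfs.ext4 /dev/mapper/photos
mount /dev/mapper/photos /mnt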

USB-C charging One thing that is really a game changer for me is USB-C charging. It's hard to overstate how convenient this is. I often have a USB-C cable lying around to charge my phone, and I can just grab that thing and pop it into my laptop. And while it will obviously not charge as fast as the provided charger, it will stop draining the battery at least. (As I wrote this, I had the laptop plugged into the Samsung charger that came with a phone, and it was telling me it would take 6 hours to charge the remaining 15%. With the provided charger, that flew down to 15 minutes. Similarly, I can power the laptop from the power grommet on my desk, reducing clutter as I have that single wire out there instead of the bulky power adapter.) I also really like the idea that I can charge my laptop with a power bank or, heck, with my phone, if push comes to shove. (And vice-versa!) This is awesome. And it works from any of the expansion ports, of course. There's a little LED next to the expansion ports as well, which indicates the charge status:
  • red/amber: charging
  • white: charged
  • off: unplugged
I couldn't find documentation about this, but the forum answered. This is something of a recurring theme with the Framework. While it has a good knowledge base and repair/setup guides (and the forum is awesome), it doesn't have a good "owner manual" that shows you the different parts of the laptop and what they do. Again, something the MNT reform did well. Another thing that people are asking about is an external sleep indicator: because the power LED is on the main keyboard assembly, you don't actually see whether the device is active or not when the lid is closed. Finally, I wondered what happens when you plug in multiple power sources and it turns out the charge controller is actually pretty smart: it will pick the best power source and use it. The only downside is it can't use multiple power sources, but that seems like a bit much to ask.

Multimedia and other devices Those things also work:
  • webcam: splendid, best webcam I've ever had (but my standards are really low)
  • onboard mic: works well, good gain (maybe a bit much)
  • onboard speakers: sound okay, a little metal-ish, loud enough to be annoying, see this thread for benchmarks, apparently pretty good speakers
  • combo jack: works, with slight hiss, see below
There's also a light sensor, but it conflicts with the keyboard brightness controls (see above). There's also an accelerometer, but it's off by default and will be removed from future builds.

Combo jack mic tests The Framework laptop ships with a combo jack on the left side, which allows you to plug in a CTIA (source) headset. In human terms, it's a device that has both a stereo output and a mono input, typically a headset or ear buds with a microphone somewhere. It works, which is better than the Purism (which only had audio out), but is par for the course for that kind of onboard hardware. Because of electrical interference, such sound cards very often get lots of noise from the board. With a Jabra Evolve 40, the built-in USB sound card generates basically zero noise on silence (invisible down to -60dB in Audacity) while plugging it in directly generates a solid -30dB hiss. There is a noise-reduction system in that sound card, but the difference is still quite striking. On a comparable setup (curie, a 2017 Intel NUC), there is also a hiss with the Jabra headset, but it's quieter, more in the order of -40/-50 dB, a noticeable difference. Interestingly, testing with my Mee Audio Pro M6 earbuds leads to a little more hiss on curie, more on the -35/-40 dB range, close to the Framework. Also note that another sound card, the Antlion USB adapter that comes with the ModMic 4, also gives me pretty close to silence on a quiet recording, picking up less than -50dB of background noise. It's actually probably picking up the fans in the office, which do make audible noises. In other words, the hiss of the sound card built into the Framework laptop is so loud that it makes more noise than the quiet fans in the office. Or, another way to put it is that two USB sound cards (the Jabra and the Antlion) are able to pick up ambient noise in my office but not the Framework laptop. See also my audio page.
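Those noise floor numbers can be reproduced without Audacity by recording a few seconds of "silence" and looking at the RMS level. A rough sketch, assuming alsa-utils and sox are installed:

arecord -f cd -d 5 silence.wav   # record 5 seconds of silence from the default input
sox silence.wav -n stats         # look at the "RMS lev dB" line for the noise floor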

Performance tests

Compiling Linux 5.19.11 On a single core, compiling the Debian version of the Linux kernel takes around 100 minutes:
5411.85user 673.33system 1:37:46elapsed 103%CPU (0avgtext+0avgdata 831700maxresident)k
10594704inputs+87448000outputs (9131major+410636783minor)pagefaults 0swaps
This was using 16 watts of power, with full screen brightness. With all 16 cores (make -j16), it takes less than 25 minutes:
19251.06user 2467.47system 24:13.07elapsed 1494%CPU (0avgtext+0avgdata 831676maxresident)k
8321856inputs+87427848outputs (30792major+409145263minor)pagefaults 0swaps
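Those numbers are the usual GNU time(1) output; a hedged sketch of how such timings are typically produced, with the kernel tree and job counts being whatever you have at hand:

cd linux-5.19.11
# single-core run
/usr/bin/time make -j1
# all 16 threads
/usr/bin/time make -j16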
I had to plug the normal power supply after a few minutes because battery would actually run out using my desk's power grommet (34 watts). During compilation, fans were spinning really hard, quite noisy, but not painfully so. The laptop was sucking 55 watts of power, steadily:
  Time    User  Nice   Sys  Idle    IO  Run Ctxt/s  IRQ/s Fork Exec Exit  Watts
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
 Average  87.9   0.0  10.7   1.4   0.1 17.8 6583.6 5054.3 233.0 223.9 233.1  55.96
 GeoMean  87.9   0.0  10.6   1.2   0.0 17.6 6427.8 5048.1 227.6 218.7 227.7  55.96
  StdDev   1.4   0.0   1.2   0.6   0.2  3.0 1436.8  255.5 50.0 47.5 49.7   0.20
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
 Minimum  85.0   0.0   7.8   0.5   0.0 13.0 3594.0 4638.0 117.0 111.0 120.0  55.52
 Maximum  90.8   0.0  12.9   3.5   0.8 38.0 10174.0 5901.0 374.0 362.0 375.0  56.41
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
Summary:
CPU:  55.96 Watts on average with standard deviation 0.20
Note: power read from RAPL domains: package-0, uncore, package-0, core, psys.
These readings do not cover all the hardware in this device.
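That table is powerstat output; judging by the RAPL note, the run was something along these lines (flags and sample counts are assumptions, check powerstat(8)):

# sample RAPL power readings every second, 300 times
powerstat -R 1 300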

memtest86+ I ran Memtest86+ v6.00b3. It shows something like this:
Memtest86+ v6.00b3        12th Gen Intel(R) Core(TM) i5-1240P
CLK/Temp: 2112MHz    78/78 C   Pass  2% #
L1 Cache:   48KB    414 GB/s   Test 46% ##################
L2 Cache: 1.25MB    118 GB/s   Test #3 [Moving inversions, 1s & 0s] 
L3 Cache:   12MB     43 GB/s   Testing: 16GB - 18GB [1GB of 15.7GB]
Memory  :  15.7GB  14.9 GB/s   Pattern: 
--------------------------------------------------------------------------------
CPU: 4P+8E-Cores (16T)    SMP: 8T (PAR))    Time:  0:27:23  Status: Pass     \
RAM: 1600MHz (DDR4-3200) CAS 22-22-22-51    Pass:  1        Errors: 0
--------------------------------------------------------------------------------
Memory SPD Information
----------------------
 - Slot 2: 16GB DDR-4-3200 - Crucial CT16G4SFRA32A.C16FP (2022-W23)
                          Framework FRANMACP04
 <ESC> Exit  <F1> Configuration  <Space> Scroll Lock            6.00.unknown.x64
So about 30 minutes for a full 16GB memory test.

Software setup Once I had everything in the hardware setup, I figured, voilà, I'm done, I'm just going to boot this beautiful machine and I can get back to work. I don't understand why I am so naïve sometimes. It's mind boggling. Obviously, it didn't happen that way at all, and I spent the better part of the three following days tinkering with the laptop.

Secure boot and EFI First, I couldn't boot off of the NVMe drive I transferred from the previous laptop (the Purism) and the BIOS was not very helpful: it was just complaining about not finding any boot device, without dropping me in the real BIOS. At first, I thought it was a problem with my NVMe drive, because it's not listed in the compatible SSD drives from upstream. But I figured out how to enter BIOS (press F2 manically, of course), which showed the NVMe drive was actually detected. It just didn't boot, because it was an old (2010!!) Debian install without EFI. So from there, I disabled secure boot, and booted a grml image to try to recover. And by "boot" I mean, I managed to get to the grml boot loader which promptly failed to load its own root file system somehow. I still have to investigate exactly what happened there, but it failed some time after the initrd load with:
Unable to find medium containing a live file system
This, it turns out, was fixed in Debian lately, so a daily GRML build will not have this problem. The upcoming 2022 release (likely 2022.10 or 2022.11) will also get the fix. I did manage to boot the development version of the Debian installer, which was a surprisingly good experience: it mounted the encrypted drives and did everything pretty smoothly. It even offered to reinstall the boot loader, but that ultimately (and correctly, as it turns out) failed because I didn't have a /boot/efi partition. At this point, I realized there was no easy way out of this, and I just proceeded to completely reinstall Debian. I had a spare NVMe drive lying around (backups FTW!) so I just swapped that in, rebooted in the Debian installer, and did a clean install. I wanted to switch to bookworm anyways, so I guess that's done too.

Storage limitations Another thing that happened during setup is that I tried to copy over the internal 2.5" SSD drive from the Purism to the Framework 1TB expansion card. There's no 2.5" slot in the new laptop, so that's pretty much the only option for storage expansion. I was tired and did something wrong. I ended up wiping the partition table on the original 2.5" drive. Oops. It might be recoverable, but just restoring the partition table didn't work either, so I'm not sure how I recover the data there. Normally, everything on my laptops and workstations is designed to be disposable, so that wasn't that big of a problem. I did manage to recover most of the data thanks to git-annex reinit, but that was a little hairy.

Bootstrapping Puppet Once I had some networking, I had to install all the packages I needed. The time I spent setting up my workstations with Puppet has finally paid off. What I actually did was to restore two critical directories:
/etc/ssh
/var/lib/puppet
So that I would keep the previous machine's identity. That way I could contact the Puppet server and install whatever was missing. I used my Puppet optimization trick to do a batch install and then I had a good base setup, although not exactly as it was before. 1700 packages were installed manually on angela before the reinstall, and not in Puppet. I did not inspect each one individually, but I did go through /etc and copied over more SSH keys, for backups and SMTP over SSH.

LVFS support It looks like there's support for the (de-facto) standard LVFS firmware update system. At least I was able to update the UEFI firmware with a simple:
apt install fwupd-amd64-signed
fwupdmgr refresh
fwupdmgr get-updates
fwupdmgr update
Nice. The 12th gen BIOS updates, currently (January 2023) beta, can be deployed through LVFS with:
fwupdmgr enable-remote lvfs-testing
echo 'DisableCapsuleUpdateOnDisk=true' >> /etc/fwupd/uefi_capsule.conf 
fwupdmgr update
Those instructions come from the beta forum post. I performed the BIOS update on 2023-01-16T16:00-0500.

Resolution tweaks The Framework laptop resolution (2256px X 1504px) is big enough to give you a pretty small font size, so welcome to the marvelous world of "scaling". The Debian wiki page has a few tricks for this.

Console This will make the console and grub fonts more readable:
cat >> /etc/default/console-setup <<EOF
FONTFACE="Terminus"
FONTSIZE=32x16
EOF
echo GRUB_GFXMODE=1024x768 >> /etc/default/grub
update-grub

Xorg Adding this to your .Xresources will make everything look much bigger:
! 1.5*96
Xft.dpi: 144
Apparently, some of this can also help:
! These might also be useful depending on your monitor and personal preference:
Xft.autohint: 0
Xft.lcdfilter:  lcddefault
Xft.hintstyle:  hintfull
Xft.hinting: 1
Xft.antialias: 1
Xft.rgba: rgb
In my experience it also makes things look a little fuzzier, which is frustrating because you have this awesome monitor but everything looks out of focus. Just bumping Xft.dpi by a 1.5 factor looks good to me. The Debian Wiki has a page on HiDPI, but it's not as good as the Arch Wiki, where the above blurb comes from. I am not using the latter because I suspect it's causing some of the "fuzziness". TODO: find the equivalent of this GNOME hack in i3? (gsettings set org.gnome.mutter experimental-features "['scale-monitor-framebuffer']"), taken from this Framework guide

Issues

BIOS configuration The Framework BIOS has some minor issues. One issue I personally encountered is that I had disabled Quick boot and Quiet boot in the BIOS to diagnose the above boot issues. This, in turn, triggers a bug where the BIOS boot manager (F12) would just hang completely. It would also fail to boot from an external USB drive. The current fix (as of BIOS 3.03) is to re-enable both Quick boot and Quiet boot. Presumably this is something that will get fixed in a future BIOS update. Note that the following keybindings are active in the BIOS POST check:
Key Meaning
F2 Enter BIOS setup menu
F12 Enter BIOS boot manager
Delete Enter BIOS setup menu

WiFi compatibility issues I couldn't make WiFi work at first. Obviously, the default Debian installer doesn't ship with proprietary firmware (although that might change soon) so the WiFi card didn't work out of the box. But even after copying the firmware through a USB stick, I couldn't quite manage to find the right combination of ip/iw/wpa-supplicant (yes, after repeatedly copying a bunch more packages over to get those bootstrapped). (Next time I should probably try something like this post.) Thankfully, I had a little USB-C dongle with an RJ-45 jack lying around. That also required a firmware blob, but it was a single package to copy over, and with that loaded, I had network. Eventually, I did manage to make WiFi work; the problem was more on the side of "I forgot how to configure a WPA network by hand from the command line" than anything else. NetworkManager worked fine and got WiFi working correctly. Note that this is with Debian bookworm, which has the 5.19 Linux kernel, and with the firmware-nonfree (firmware-iwlwifi, specifically) package.
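Once the machine has any kind of network, getting the AX210 going is just a matter of installing the firmware package and reloading the driver; roughly (assuming non-free-firmware is in your apt sources):

apt install firmware-iwlwifi
modprobe -r iwlwifi && modprobe iwlwifi   # reload the driver to pick up the firmware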

Battery life I was getting about 7 hours of battery on the Purism Librem 13v4, and that's after a year or two of battery wear. Now, I still have about 7 hours of battery life, which is nicer than my old ThinkPad X220 (20 minutes!) but really, it's not that good for a new generation laptop. The 12th generation Intel chipset probably improved things compared to the previous Framework laptop, but I don't have an 11th gen Framework to compare with. (Note that those are estimates from my status bar, not wall clock measurements. They should still be comparable between the Purism and Framework, that said.) The battery life doesn't seem up to, say, a Dell XPS 13 or ThinkPad X1, and of course not the Apple M1, where I would expect 10+ hours of battery life out of the box. That said, I do get those kinds of estimates when the machine is fully charged and idle. In fact, when everything is quiet and nothing is plugged in, I get dozens of hours of battery life estimated (I've seen 25h!). So power usage fluctuates quite a bit depending on usage, which I guess is expected. Concretely, so far, light web browsing, reading emails and writing notes in Emacs (e.g. this file) takes about 8W of power:
Time    User  Nice   Sys  Idle    IO  Run Ctxt/s  IRQ/s Fork Exec Exit  Watts
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
 Average   1.7   0.0   0.5  97.6   0.2  1.2 4684.9 1985.2 126.6 39.1 128.0   7.57
 GeoMean   1.4   0.0   0.4  97.6   0.1  1.2 4416.6 1734.5 111.6 27.9 113.3   7.54
  StdDev   1.0   0.2   0.2   1.2   0.0  0.5 1584.7 1058.3 82.1 44.0 80.2   0.71
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
 Minimum   0.2   0.0   0.2  94.9   0.1  1.0 2242.0  698.2 82.0 17.0 82.0   6.36
 Maximum   4.1   1.1   1.0  99.4   0.2  3.0 8687.4 4445.1 463.0 249.0 449.0   9.10
-------- ----- ----- ----- ----- ----- ---- ------ ------ ---- ---- ---- ------
Summary:
System:   7.57 Watts on average with standard deviation 0.71
Expansion cards matter a lot for battery life (see below for a thorough discussion), my normal setup is 2xUSB-C and 1xUSB-A (yes, with an empty slot, and yes, to save power). Interestingly, playing a (720p) video in a window takes up more power (10.5W) than in full screen (9.5W) but I blame that on my desktop setup (i3 + compton)... Not sure if mpv hits the VA-API, maybe not in windowed mode. Similar results with 1080p, interestingly, except the window struggles to keep up altogether. Full screen playback takes a relatively comfortable 9.5W, which means a solid 5h+ of playback, which is fine by me. Fooling around the web, small edits, youtube-dl, and I'm at around 80% battery after about an hour, with an estimated 5h left, which is a little disappointing. I had a 7h remaining estimate before I started goofing around on Discourse, so I suspect the website is a pretty big battery drain, actually. I see about 10-12 W, while I was probably at half that (6-8W) just playing music with mpv in the background... In other words, it looks like editing posts in Discourse with Firefox takes a solid 4-6W of power. Amazing and gross. (When writing about abusive power usage generates more power usage, is that a heisenbug? Or a schrödinbug?)
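To check whether mpv actually hits VA-API, something roughly like this should do; vainfo comes from the vainfo package, and the grep is just a convenient filter:

vainfo                                     # list the VA-API profiles the GPU exposes
mpv --hwdec=vaapi -v video.mkv 2>&1 | grep -i hwdec   # look for a line saying hardware decoding is in use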

Power management Compared to the Purism Librem 13v4, the ongoing power usage seems to be slightly better. An anecdotal metric is that the Purism would take 800mA idle, while the more powerful Framework manages a little over 500mA as I'm typing this, fluctuating between 450 and 600mA. That is without any active expansion card, except the storage. Those numbers come from the output of tlp-stat -b and, unfortunately, the "ampere" unit makes it quite hard to compare those, because voltage is not necessarily the same between the two platforms.
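Since sysfs exposes both current and voltage, the ampere readings can be converted to watts for an apples-to-apples comparison between platforms. A small sketch (the battery name varies: BAT on the Purism, BAT1 on the Framework; both values are in µV and µA, so the product is divided down to watts):

echo "scale=2; $(cat /sys/class/power_supply/BAT1/voltage_now) * \
$(cat /sys/class/power_supply/BAT1/current_now) / 1000000000000" | bc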
  • TODO: review Arch Linux's tips on power saving
  • TODO: i915 driver has a lot of parameters, including some about power saving, see, again, the arch wiki, and particularly enable_fbc=1
TL;DR: power management on the laptop is an issue, but there are various tweaks you can make to improve it. Try:
  • powertop --auto-tune
  • apt install tlp && systemctl enable tlp
  • nvme.noacpi=1 mem_sleep_default=deep on the kernel command line may help with standby power usage
  • keep only USB-C expansion cards plugged in, all others suck power even when idle
  • consider upgrading the BIOS to latest beta (3.06 at the time of writing), unverified power savings
  • latest Linux kernels (6.2) promise power savings as well (unverified)
Update: also try to follow the official optimization guide. It was made for Ubuntu but will probably also work for your distribution of choice with a few tweaks. They recommend using tlpui but it's not packaged in Debian. There is, however, a Flatpak release. In my case, it resulted in the following diff to tlp.conf: tlp.patch.
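To make the powertop --auto-tune tip above stick across reboots, a small systemd unit does the trick. This is a sketch of the kind of service I run at boot; the unit name and install path are up to you:

# /etc/systemd/system/powertop.service
[Unit]
Description=powertop auto-tune

[Service]
Type=oneshot
ExecStart=/usr/sbin/powertop --auto-tune

[Install]
WantedBy=multi-user.target

Enable it with systemctl enable --now powertop.service.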

Background on CPU architecture There were power problems in the 11th gen Framework laptop, according to this report from Linux After Dark, so the issues with power management on the Framework are not new. The 12th generation Intel CPU (AKA "Alder Lake") is a big-little architecture with "power-saving" and "performance" cores. There used to be performance problems introduced by the scheduler in Linux 5.16 but those were eventually fixed in 5.18, which uses Intel's hardware as an "intelligent, low-latency hardware-assisted scheduler". According to Phoronix, the 5.19 release improved the power saving, at the cost of some performance. There were also patch series to make the scheduler configurable, but it doesn't look like those have been merged as of 5.19. There was also a session about this at the 2022 Linux Plumbers, but they stopped short of talking more about the specific problems Linux is facing in Alder Lake:
Specifically, the kernel's energy-aware scheduling heuristics don't work well on those CPUs. A number of features present there complicate the energy picture; these include SMT, Intel's "turbo boost" mode, and the CPU's internal power-management mechanisms. For many workloads, running on an ostensibly more power-hungry Pcore can be more efficient than using an Ecore. Time for discussion of the problem was lacking, though, and the session came to a close.
All this to say that the 12th gen Intel line shipped with this Framework series should have better power management thanks to its power-saving cores. And Linux has had the scheduler changes to make use of this (but maybe is still having trouble). In any case, this might not be the source of power management problems on my laptop, quite the opposite. Also note that the firmware updates for various chipsets are supposed to improve things eventually. On the other hand, The Verge simply declared the whole P-series a mistake...

Attempts at improving power usage I did try to follow some of the tips in this forum post. The tricks powertop --auto-tune and tlp's PCIE_ASPM_ON_BAT=powersupersave basically did nothing: I was stuck at 10W power usage in powertop (600+mA in tlp-stat). Apparently, I should be able to reach the C8 CPU power state (or even C9, C10) in powertop, but I seem to be stuck at C7. (Although I'm not sure how to read that tab in powertop: in the Core(HW) column there's only C3/C6/C7 states, and most cores are 85% in C7 or maybe C6. But the next column over does show many CPUs in C10 states...) As it turns out, the graphics card actually takes up a good chunk of power unless proper power management is enabled (see below). After tweaking this, I did manage to get down to around 7W power usage in powertop. Expansion cards actually do take up power, and so does the screen, obviously. The fully-lit screen takes a solid 2-3W of power compared to the fully dimmed screen. When removing all expansion cards and making the laptop idle, I can spin it down to 4 watts power usage at the moment, and an amazing 2 watts when the screen is turned off.

Caveats Abusive (10W+) power usage that I initially found could be a problem with my desktop configuration: I have this silly status bar that updates every second and probably causes redraws... The CPU certainly doesn't seem to spin down below 1GHz. Also note that this is with an actual desktop running with everything: it could very well be that some things (I'm looking at you, Signal Desktop) take up an unreasonable amount of power on their own (hello, 1W/electron, sheesh). Syncthing and containerd (Docker!) also seem to take a good 500mW just sitting there. Beyond my desktop configuration, this could, of course, be a Debian-specific problem; your favorite distribution might be better at power management.

Idle power usage tests Some expansion cards waste energy, even when unused. Here is a summary of the findings from the powerstat page. I also include other devices tested in this page for completeness:
Device Minimum Average Max Stdev Note
Screen, 100% 2.4W 2.6W 2.8W N/A
Screen, 1% 30mW 140mW 250mW N/A
Backlight 1 290mW ? ? ? fairly small, all things considered
Backlight 2 890mW 1.2W 3W? 460mW? geometric progression
Backlight 3 1.69W 1.5W 1.8W? 390mW? significant power use
Radios 100mW 250mW N/A N/A
USB-C N/A N/A N/A N/A negligible power drain
USB-A 10mW 10mW ? 10mW almost negligible
DisplayPort 300mW 390mW 600mW N/A not passive
HDMI 380mW 440mW 1W? 20mW not passive
1TB SSD 1.65W 1.79W 2W 12mW significant, probably higher when busy
MicroSD 1.6W 3W 6W 1.93W highest power usage, possibly even higher when busy
Ethernet 1.69W 1.64W 1.76W N/A comparable to the SSD card
So it looks like all expansion cards but the USB-C ones are active, i.e. they draw power even when idle. The USB-A cards are the least concern, sucking out 10mW, pretty much within the margin of error. But both the DisplayPort and HDMI cards do take a few hundred milliwatts. It looks like USB-A connectors have this fundamental flaw that they necessarily draw some power because they lack the power negotiation features of USB-C. At least according to this post:
It seems the USB A must have power going to it all the time, that the old USB 2 and 3 protocols, the USB C only provides power when there is a connection. Old versus new.
Apparently, this is a problem specific to the USB-C to USB-A adapter that ships with the Framework. Some people have actually changed their orders to all USB-C because of this problem, but I'm not sure the problem is as serious as claimed in the forums. I couldn't reproduce the "one watt" power drains suggested elsewhere, at least not repeatedly. (A previous version of this post did show such a power drain, but it was in a less controlled test environment than the series of more rigorous tests above.) The worst offenders are the storage cards: the SSD drive takes at least one watt of power and the MicroSD card seems to want to take all the way up to 6 watts of power, both just sitting there doing nothing. This confirms claims of 1.4W for the SSD (but not 5W) power usage found elsewhere. The former post has instructions on how to disable the card in software. The MicroSD card has been reported as using 2 watts, but I've seen it as high as 6 watts, which is pretty damning. The Framework team has a beta update for the DisplayPort adapter but currently only for Windows (LVFS technically possible, "under investigation"). A USB-A firmware update is also under investigation. It is therefore likely at least some of those power management issues will eventually be fixed. Note that the upcoming Ethernet card has a reported 2-8W power usage, depending on traffic. I did my own power usage tests in powerstat-wayland and they seem lower than 2W. The upcoming 6.2 Linux kernel might also improve battery usage when idle, see this Phoronix article for details, likely in early 2023.

Idle power usage tests under Wayland Update: I redid those tests under Wayland, see powerstat-wayland for details. The TL;DR: is that power consumption is either smaller or similar.

Idle power usage tests, 3.06 beta BIOS I redid the idle tests after the 3.06 beta BIOS update and ended up with these results:
Device Minimum Average Max Stdev Note
Baseline 1.96W 2.01W 2.11W 30mW 1 USB-C, screen off, backlight off, no radios
2 USB-C 1.95W 2.16W 3.69W 430mW USB-C confirmed as mostly passive...
3 USB-C 1.95W 2.16W 3.69W 430mW ... although with extra stdev
1TB SSD 3.72W 3.85W 4.62W 200mW unchanged from before upgrade
1 USB-A 1.97W 2.18W 4.02W 530mW unchanged
2 USB-A 1.97W 2.00W 2.08W 30mW unchanged
3 USB-A 1.94W 1.99W 2.03W 20mW unchanged
MicroSD w/o card 3.54W 3.58W 3.71W 40mW significant improvement! 2-3W power saving!
MicroSD w/ card 3.53W 3.72W 5.23W 370mW new measurement! increased deviation
DisplayPort 2.28W 2.31W 2.37W 20mW unchanged
1 HDMI 2.43W 2.69W 4.53W 460mW unchanged
2 HDMI 2.53W 2.59W 2.67W 30mW unchanged
External USB 3.85W 3.89W 3.94W 30mW new result
Ethernet 3.60W 3.70W 4.91W 230mW unchanged
Note that the table summary is different than the previous table: here we show the absolute numbers while the previous table was doing a confusing attempt at showing relative (to the baseline) numbers. Conclusion: the 3.06 BIOS update did not significantly change idle power usage stats except for the MicroSD card which has significantly improved. The new "external USB" test is also interesting: it shows how the provided 1TB SSD card performs (admirably) compared to existing devices. The other new result is the MicroSD card with a card which, interestingly, uses less power than the 1TB SSD drive.

Standby battery usage I wrote a quick hack to evaluate how much power is used during sleep. Apparently, this is one of the areas that should have improved since the first Framework model, so let's find out. My baseline for comparison is the Purism laptop, which, in 10 minutes, went from this:
sep 28 11:19:45 angela systemd-sleep[209379]: /sys/class/power_supply/BAT/charge_now                      =   6045 [mAh]
... to this:
sep 28 11:29:47 angela systemd-sleep[209725]: /sys/class/power_supply/BAT/charge_now                      =   6037 [mAh]
That's 8mAh per 10 minutes (and 2 seconds), or 48mA, or, with this battery, about 127 hours or roughly 5 days of standby. Not bad! In comparison, here is my really old x220, before:
sep 29 22:13:54 emma systemd-sleep[176315]: /sys/class/power_supply/BAT0/energy_now                     =   5070 [mWh]
... after:
sep 29 22:23:54 emma systemd-sleep[176486]: /sys/class/power_supply/BAT0/energy_now                     =   4980 [mWh]
... which is 90 mWh in 10 minutes, or a whopping 540mW, which was possibly okay when this battery was new (62000 mWh, so about 100 hours, or about 5 days), but this battery is almost dead and has only 5210 mWh when full, so only 10 hours standby. And here is the Framework performing a similar test, before:
sep 29 22:27:04 angela systemd-sleep[4515]: /sys/class/power_supply/BAT1/charge_full                    =   3518 [mAh]
sep 29 22:27:04 angela systemd-sleep[4515]: /sys/class/power_supply/BAT1/charge_now                     =   2861 [mAh]
... after:
sep 29 22:37:08 angela systemd-sleep[4743]: /sys/class/power_supply/BAT1/charge_now                     =   2812 [mAh]
... which is 49mAh in a little over 10 minutes (and 4 seconds), or 292mA, much more than the Purism, but half of the X220. At this rate, the battery would last on standby only 12 hours!! That is pretty bad. Note that this was done with the following expansion cards:
  • 2 USB-C
  • 1 1TB SSD drive
  • 1 USB-A with a hub connected to it, with keyboard and LAN
Preliminary tests without the hub (over one minute) show that it doesn't significantly affect this power consumption (300mA). This guide also suggests booting with nvme.noacpi=1 but this still gives me about 5mAh/min (or 300mA). Adding mem_sleep_default=deep to the kernel command line does make a difference. Before:
sep 29 23:03:11 angela systemd-sleep[3699]: /sys/class/power_supply/BAT1/charge_now                     =   2544 [mAh]
... after:
sep 29 23:04:25 angela systemd-sleep[4039]: /sys/class/power_supply/BAT1/charge_now                     =   2542 [mAh]
... which is 2mAh in 74 seconds, which works out to 97mA and brings us to a more reasonable 36 hours, or a day and a half. It's still above the X220's power usage, and more than an order of magnitude more than the Purism laptop. It's also far from the 0.4% promised by upstream, which would be 14mA for the 3500mAh battery. It should also be noted that this "deep" sleep mode is a little more disruptive than regular sleep. As you can see by the timing, it took more than 10 seconds for the laptop to resume, which feels a little alarming as you're banging the keyboard to bring it back to life. You can confirm the current sleep mode with:
# cat /sys/power/mem_sleep
s2idle [deep]
In the above, deep is selected. You can change it on the fly with:
printf s2idle > /sys/power/mem_sleep
Here's another test:
sep 30 22:25:50 angela systemd-sleep[32207]: /sys/class/power_supply/BAT1/charge_now                     =   1619 [mAh]
sep 30 22:31:30 angela systemd-sleep[32516]: /sys/class/power_supply/BAT1/charge_now                     =   1613 [mAh]
... better! 6 mAh in about 6 minutes, works out to 63.5mA, so more than two days standby. A longer test:
oct 01 09:22:56 angela systemd-sleep[62978]: /sys/class/power_supply/BAT1/charge_now                     =   3327 [mAh]
oct 01 12:47:35 angela systemd-sleep[63219]: /sys/class/power_supply/BAT1/charge_now                     =   3147 [mAh]
That's 180mAh in about 3.5h, or 52mA! Now at 66h, or almost 3 days. I wasn't sure why I was seeing such fluctuations in those tests, but as it turns out, expansion card power tests show that they do significantly affect power usage, especially the SSD drive, which can take up to two full watts of power even when idle. I didn't control for expansion cards in the above tests, running them with whatever card I had plugged in without paying attention, so that's likely the cause of the high power usage and fluctuations. It might be possible to work around this problem by disabling USB devices before suspend. TODO. See also this post. In the meantime, I have been able to get much better suspend performance by unplugging all modules. Then I get this result:
oct 04 11:15:38 angela systemd-sleep[257571]: /sys/class/power_supply/BAT1/charge_now                     =   3203 [mAh]
oct 04 15:09:32 angela systemd-sleep[257866]: /sys/class/power_supply/BAT1/charge_now                     =   3145 [mAh]
Which is 14.8mA! Almost exactly the number promised by Framework! With a full battery, that means a 10-day suspend time. This is actually pretty good, and far beyond what I was expecting when starting down this journey. So, once the expansion cards are unplugged, suspend power usage is actually quite reasonable. More detailed standby tests are available in the standby-tests page, with a summary below. There is also some hope that the Chromebook edition, specifically designed with a specification of 14 days standby time, could bring some firmware improvements back down to the normal line. Some of those issues were reported upstream in April 2022, but there doesn't seem to have been any progress there since.
TODO: one final solution here is suspend-then-hibernate, which Windows uses for this.
TODO: consider implementing the S0ix sleep states, see also troubleshooting.
TODO: consider https://github.com/intel/pm-graph
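As for the "quick hack" producing the log lines quoted throughout this section, it is essentially a hook dropped into systemd's sleep directory that dumps the battery state before and after suspend; its output ends up in the journal under the systemd-sleep tag. A hedged reconstruction of what such a hook looks like:

# /lib/systemd/system-sleep/battery-log (must be executable)
#!/bin/sh
# systemd calls this with "pre" before sleep and "post" after resume
tlp-stat -b | grep charge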

Standby expansion cards test results This table is a summary of the more extensive standby-tests I have performed:
Device Wattage Amperage Days Note
baseline 0.25W 16mA 9 sleep=deep nvme.noacpi=1
s2idle 0.29W 18.9mA ~7 sleep=s2idle nvme.noacpi=1
normal nvme 0.31W 20mA ~7 sleep=s2idle without nvme.noacpi=1
1 USB-C 0.23W 15mA ~10
2 USB-C 0.23W 14.9mA same as above
1 USB-A 0.75W 48.7mA 3 +500mW (!!) for the first USB-A card!
2 USB-A 1.11W 72mA 2 +360mW
3 USB-A 1.48W 96mA <2 +370mW
1TB SSD 0.49W 32mA <5 +260mW
MicroSD 0.52W 34mA ~4 +290mW
DisplayPort 0.85W 55mA <3 +620mW (!!)
1 HDMI 0.58W 38mA ~4 +250mW
2 HDMI 0.65W 42mA <4 +70mW (?)
Conclusions:
  • USB-C cards take no extra power on suspend, possibly less than empty slots, more testing required
  • USB-A cards take a lot more power on suspend (300-500mW) than on regular idle (~10mW, almost negligible)
  • 1TB SSD and MicroSD cards seem to take a reasonable amount of power (260-290mW), compared to their runtime equivalents (1-6W!)
  • DisplayPort takes a surprising amount of power (620mW), almost double its average runtime usage (390mW)
  • HDMI cards take, surprisingly, less power (250mW) in standby than the DP card (620mW)
  • and oddly, a second card adds less power usage (70mW?!) than the first, maybe a circuit is used by both?
A discussion of those results is in this forum post.

Standby expansion cards test results, 3.06 beta BIOS Framework recently (2022-11-07) announced that they will publish a firmware upgrade to address some of the USB-C issues, including power management. This could positively affect the above result, improving both standby and runtime power usage. The update came out in December 2022 and I redid my analysis with the following results:
Device Wattage Amperage Days Note
baseline 0.25W 16mA 9 no cards, same as before upgrade
1 USB-C 0.25W 16mA 9 same as before
2 USB-C 0.25W 16mA 9 same
1 USB-A 0.80W 62mA 3 +550mW!! worse than before
2 USB-A 1.12W 73mA <2 +320mW, on top of the above, bad!
Ethernet 0.62W 40mA 3-4 new result, decent
1TB SSD 0.52W 34mA 4 a bit worse than before (+2mA)
MicroSD 0.51W 22mA 4 same
DisplayPort 0.52W 34mA 4+ upgrade improved by 300mW
1 HDMI ? 38mA ? same
2 HDMI ? 45mA ? a bit worse than before (+3mA)
Normal 1.08W 70mA ~2 Ethernet, 2 USB-C, USB-A
Full results in standby-tests-306. The big takeaway for me is that the update did not improve power usage on the USB-A ports, which is a big problem for my use case. There is a notable improvement on the DisplayPort power consumption which brings it more in line with the HDMI connector, but it still doesn't properly turn off on suspend either. Even worse, the USB-A ports now sometimes fail to resume after suspend, which is pretty annoying. This is a known problem that will hopefully get fixed in the final release.

Battery wear protection The BIOS has an option to limit charge to 80% to mitigate battery wear. There's a way to control the embedded controller from runtime with fw-ectool, partly documented here. The command would be:
sudo ectool fwchargelimit 80
I looked at building this myself but failed to run it. I opened an RFP in Debian so that we can ship this in Debian, and also documented my work there. Note that there is now a counter that tracks charge/discharge cycles. It's visible in tlp-stat -b, which is a nice improvement:
root@angela:/home/anarcat# tlp-stat -b
--- TLP 1.5.0 --------------------------------------------
+++ Battery Care
Plugin: generic
Supported features: none available
+++ Battery Status: BAT1
/sys/class/power_supply/BAT1/manufacturer                   = NVT
/sys/class/power_supply/BAT1/model_name                     = Framewo
/sys/class/power_supply/BAT1/cycle_count                    =      3
/sys/class/power_supply/BAT1/charge_full_design             =   3572 [mAh]
/sys/class/power_supply/BAT1/charge_full                    =   3541 [mAh]
/sys/class/power_supply/BAT1/charge_now                     =   1625 [mAh]
/sys/class/power_supply/BAT1/current_now                    =    178 [mA]
/sys/class/power_supply/BAT1/status                         = Discharging
/sys/class/power_supply/BAT1/charge_control_start_threshold = (not available)
/sys/class/power_supply/BAT1/charge_control_end_threshold   = (not available)
Charge                                                      =   45.9 [%]
Capacity                                                    =   99.1 [%]
One thing that is still missing is the charge threshold data (the (not available) above). There's been some work to make that accessible in August, stay tuned? This would also make it possible to implement hysteresis support.
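Once that kernel work lands, the charge limit should be controllable through the standard sysfs knobs shown as "(not available)" above, without needing ectool at all. A hedged example, only valid on kernels/ECs that actually expose the attribute:

echo 80 > /sys/class/power_supply/BAT1/charge_control_end_threshold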

Ethernet expansion card The Framework ethernet expansion card is a fancy little doodle: "2.5Gbit/s and 10/100/1000Mbit/s Ethernet", the "clear housing lets you peek at the RTL8156 controller that powers it". Which is another way to say "we didn't completely finish prod on this one, so it kind of looks like we 3D-printed this in the shop".... The card is a little bulky, but I guess that's inevitable considering the RJ-45 form factor compared to the thin Framework laptop. I had a serious issue when first trying it: the link LEDs just wouldn't come up. I made a full bug report in the forum and with upstream support, but eventually figured it out on my own. It's (of course) a power saving issue: if you reboot the machine, the links come up while the laptop is running the BIOS POST check and even when the Linux kernel boots. I first thought the problem was likely related to the powertop service which I run at boot time to tweak some power saving settings. It seems like this:
echo 'on' > '/sys/bus/usb/devices/4-2/power/control'
... is a good workaround to bring the card back online. You can even return to power saving mode and the card will still work:
echo 'auto' > '/sys/bus/usb/devices/4-2/power/control'
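Note that the 4-2 path above is specific to my machine; a more portable way to locate the right device is to search for the card's USB ID (0bda:8156), something along these lines:
# find the RTL8156 by its vendor/product ID and force it on
for d in /sys/bus/usb/devices/*; do
    [ -e "$d/idVendor" ] || continue
    if [ "$(cat "$d/idVendor"):$(cat "$d/idProduct")" = "0bda:8156" ]; then
        echo on | sudo tee "$d/power/control"
    fi
done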
Further research by Matt_Hartley from the Framework team found this issue in the tlp tracker, which shows how the USB_AUTOSUSPEND setting force-enables power saving even if the driver doesn't support it, which, in retrospect, just sounds like a bad idea. To quote that issue:
By default, USB power saving is active in the kernel, but not force-enabled for incompatible drivers. That is, devices that support suspension will suspend, drivers that do not, will not.
So the fix is actually to uninstall tlp or disable that setting by adding this to /etc/tlp.conf:
USB_AUTOSUSPEND=0
... but that disables auto-suspend on all USB devices, which may hurt power usage elsewhere. I have found that a combination of:
USB_AUTOSUSPEND=1
USB_DENYLIST="0bda:8156"
and this on the kernel commandline:
usbcore.quirks=0bda:8156:k
... actually does work correctly. I now have this in my /etc/default/grub.d/framework-tweaks.cfg file:
# net.ifnames=0: normal interface names ffs (e.g. eth0, wlan0, not wlp166s0)
# nvme.noacpi=1: reduce SSD disk power usage (not working)
# mem_sleep_default=deep: reduce power usage during sleep (not working)
# usbcore.quirks is a workaround for the ethernet card suspend bug: https://guides.frame.work/Guide/Fedora+37+Installation+on+the+Framework+Laptop/108?lang=en
GRUB_CMDLINE_LINUX="net.ifnames=0 nvme.noacpi=1 mem_sleep_default=deep usbcore.quirks=0bda:8156:k"
# fix the resolution in grub for fonts to not be tiny
GRUB_GFXMODE=1024x768
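The kernel command line only changes after regenerating the GRUB configuration and rebooting:
sudo update-grub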
Other than that, I haven't been able to max out the card because I don't have any other 2.5Gbit/s equipment at home, which is a bit of a shame. But running against my Turris Omnia router, I could pretty much max out a gigabit link fairly easily:
[ ID] Interval           Transfer     Bitrate         Retr
[  5]   0.00-10.00  sec  1.09 GBytes   937 Mbits/sec  238             sender
[  5]   0.00-10.00  sec  1.09 GBytes   934 Mbits/sec                  receiver
The card doesn't require any proprietary firmware blobs which is surprising. Other than the power saving issues, it just works. In my power tests (see powerstat-wayland), the Ethernet card seems to use about 1.6W of power idle, without link, in the above "quirky" configuration where the card is functional but without autosuspend.
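For reference, this kind of idle measurement can be reproduced with powerstat (packaged in Debian); something like this samples once a second for a minute, with no start delay:
sudo powerstat -d 0 1 60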

Proprietary firmware blobs The Framework does need proprietary firmware to operate. Specifically:
  • the WiFi network card shipped with the DIY kit is an AX210 card that requires a 5.19 kernel or later, and the firmware-iwlwifi non-free firmware package
  • the Bluetooth adapter also loads firmware from the firmware-iwlwifi package (untested)
  • the graphics work out of the box without firmware, but certain power management features come only with special proprietary firmware, normally shipped in the firmware-misc-nonfree package but currently missing from it
Note that, at the time of writing, the latest i915 firmware from linux-firmware has a serious bug where loading all the accessible firmware results in a noticeable lag (I estimate 200-500ms) between the keyboard (not the mouse!) and the display. Symptoms also include tearing and shearing of windows; it's pretty nasty. One workaround is to delete the two affected firmware files:
cd /lib/firmware/i915 && rm adlp_guc_70.1.1.bin adlp_guc_69.0.3.bin
update-initramfs -u
You will get the following warning during build, which is good as it means the problematic firmware is disabled:
W: Possible missing firmware /lib/firmware/i915/adlp_guc_69.0.3.bin for module i915
W: Possible missing firmware /lib/firmware/i915/adlp_guc_70.1.1.bin for module i915
But then it also means that critical firmware isn't loaded, which implies, among other things, a higher battery drain. I was able to move from 8.5-10W down to the 7W range after making the firmware work properly. This is also after turning the backlight all the way down, as that alone takes a solid 2-3W at full blast. The proper fix is to keep the firmware and use a compositing manager instead. I ended up using compton with the following systemd unit:
[Unit]
Description=start compositing manager
PartOf=graphical-session.target
ConditionHost=angela
[Service]
Type=exec
ExecStart=compton --show-all-xerrors --backend glx --vsync opengl-swc
Restart=on-failure
[Install]
RequiredBy=graphical-session.target
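Assuming that unit file lives in ~/.config/systemd/user/compton.service, it gets wired into the session the usual way:
systemctl --user daemon-reload
systemctl --user enable compton.service
systemctl --user start compton.service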
compton is orphaned however, so you might be tempted to use picom instead, but in my experience the latter uses much more power (1-2W extra) for an otherwise similar experience. I also tried compiz but it would just crash with:
anarcat@angela:~$ compiz --replace
compiz (core) - Warn: No XI2 extension
compiz (core) - Error: Another composite manager is already running on screen: 0
compiz (core) - Fatal: No manageable screens found on display :0
When running from the base session, I would get this instead:
compiz (core) - Warn: No XI2 extension
compiz (core) - Error: Couldn't load plugin 'ccp'
compiz (core) - Error: Couldn't load plugin 'ccp'
Thanks to EmanueleRocca for figuring all that out. See also this discussion about power management on the Framework forum. Note that Wayland environments do not require any special configuration here and actually work better, see my Wayland migration notes for details.
Note that the iwlwifi firmware also looks incomplete. Even with the package installed, I get these errors in dmesg:
[   19.534429] Intel(R) Wireless WiFi driver for Linux
[   19.534691] iwlwifi 0000:a6:00.0: enabling device (0000 -> 0002)
[   19.541867] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-72.ucode (-2)
[   19.541881] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-72.ucode (-2)
[   19.541882] iwlwifi 0000:a6:00.0: Direct firmware load for iwlwifi-ty-a0-gf-a0-72.ucode failed with error -2
[   19.541890] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-71.ucode (-2)
[   19.541895] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-71.ucode (-2)
[   19.541896] iwlwifi 0000:a6:00.0: Direct firmware load for iwlwifi-ty-a0-gf-a0-71.ucode failed with error -2
[   19.541903] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-70.ucode (-2)
[   19.541907] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-70.ucode (-2)
[   19.541908] iwlwifi 0000:a6:00.0: Direct firmware load for iwlwifi-ty-a0-gf-a0-70.ucode failed with error -2
[   19.541913] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-69.ucode (-2)
[   19.541916] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-69.ucode (-2)
[   19.541917] iwlwifi 0000:a6:00.0: Direct firmware load for iwlwifi-ty-a0-gf-a0-69.ucode failed with error -2
[   19.541922] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-68.ucode (-2)
[   19.541926] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-68.ucode (-2)
[   19.541927] iwlwifi 0000:a6:00.0: Direct firmware load for iwlwifi-ty-a0-gf-a0-68.ucode failed with error -2
[   19.541933] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-67.ucode (-2)
[   19.541937] iwlwifi 0000:a6:00.0: firmware: failed to load iwlwifi-ty-a0-gf-a0-67.ucode (-2)
[   19.541937] iwlwifi 0000:a6:00.0: Direct firmware load for iwlwifi-ty-a0-gf-a0-67.ucode failed with error -2
[   19.544244] iwlwifi 0000:a6:00.0: firmware: direct-loading firmware iwlwifi-ty-a0-gf-a0-66.ucode
[   19.544257] iwlwifi 0000:a6:00.0: api flags index 2 larger than supported by driver
[   19.544270] iwlwifi 0000:a6:00.0: TLV_FW_FSEQ_VERSION: FSEQ Version: 0.63.2.1
[   19.544523] iwlwifi 0000:a6:00.0: firmware: failed to load iwl-debug-yoyo.bin (-2)
[   19.544528] iwlwifi 0000:a6:00.0: firmware: failed to load iwl-debug-yoyo.bin (-2)
[   19.544530] iwlwifi 0000:a6:00.0: loaded firmware version 66.55c64978.0 ty-a0-gf-a0-66.ucode op_mode iwlmvm
Some of those are available in the latest upstream firmware package (iwlwifi-ty-a0-gf-a0-71.ucode, -68, and -67), but not all (e.g. iwlwifi-ty-a0-gf-a0-72.ucode is missing). It's unclear what difference those make, as the WiFi seems to work well without them. I still copied them in from the latest linux-firmware package in the hope they would help with power management, but I did not notice a change after loading them. There are also multiple knobs on the iwlwifi and iwlmvm drivers. The latter has a power_scheme setting which defaults to 2 (balanced); setting it to 3 (low power) could improve battery usage as well, in theory. The iwlwifi driver also has power_save (defaults to disabled) and power_level (1-5, defaults to 1) settings. See also the output of modinfo iwlwifi and modinfo iwlmvm for other driver options.
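As an untested sketch, those power-saving knobs can be made persistent with a modprobe configuration file along these lines (option names taken from modinfo above; whether they actually save power is another question):
# /etc/modprobe.d/iwlwifi-power.conf
# untested: prefer the "low power" scheme and enable iwlwifi power saving
options iwlmvm power_scheme=3
options iwlwifi power_save=1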

Graphics acceleration After loading the latest upstream firmware and setting up a compositing manager (compton, above), I tested the classic glxgears. Running in a window gives me odd results, as the gears basically grind to a halt:
Running synchronized to the vertical refresh.  The framerate should be
approximately the same as the monitor refresh rate.
137 frames in 5.1 seconds = 26.984 FPS
27 frames in 5.4 seconds =  5.022 FPS
Ouch. 5FPS! But interestingly, once the window is in full screen, it does hit the monitor refresh rate:
300 frames in 5.0 seconds = 60.000 FPS
I'm not really a gamer and I don't normally use any of that fancy graphics acceleration stuff (except maybe my browser does?). I installed intel-gpu-tools for the intel_gpu_top command to confirm the GPU was engaged when running those tests. A nice find. Other useful diagnostic tools include glxgears and glxinfo (in mesa-utils) and vainfo (in the vainfo package). Following this post, I also made sure to have these settings in my about:config in Firefox, or, in user.js:
user_pref("media.ffmpeg.vaapi.enabled", true);
Note that the guide suggests many other settings to tweak, but those might actually be overkill; see this comment and its parents. I did try forcing hardware acceleration by setting gfx.webrender.all to true, but everything became choppy and weird. The guide also mentions installing the intel-media-driver package, but I could not find that in Debian. The Arch wiki has, as usual, an excellent reference on hardware acceleration in Firefox.
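To check that video decoding really lands on the GPU rather than just trusting the preferences, one approach is to watch the Video engine while a video plays:
# the "Video" row should show non-zero busy% during playback if VA-API
# decoding is actually being used
sudo intel_gpu_top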

Chromium / Signal desktop bugs It looks like both Chromium and Signal Desktop misbehave with my compositor setup (compton + i3). The fix is to add a persistent flag to Chromium. In Arch, it's conveniently in ~/.config/chromium-flags.conf but that doesn't actually work in Debian. I had to put the flag in /etc/chromium.d/disable-compositing, like this:
export CHROMIUM_FLAGS="$CHROMIUM_FLAGS --disable-gpu-compositing"
It's possible another one of the hundreds of flags might fix this issue better, but I don't really have time to go through this entire, incomplete, and unofficial list (!?!). Signal Desktop has a similar problem, and doesn't reuse those flags (because of course it doesn't). Instead, I had to rewrite the wrapper script in /usr/local/bin/signal-desktop to use this:
exec /usr/bin/flatpak run --branch=stable --arch=x86_64 org.signal.Signal --disable-gpu-compositing "$@"
This was mostly done in this Puppet commit. I haven't figured out the root cause of this problem. I did try using picom and xcompmgr; they both suffer from the same issue. Another Debian testing user on Wayland told me they haven't seen this problem, so hopefully this can be fixed by switching to Wayland.

Graphics card hangs I believe I might have this bug which results in a total graphical hang for 15-30 seconds. It's fairly rare so it's not too disruptive, but when it does happen, it's pretty alarming. The comments on that bug report are encouraging though: it seems this is a bug in either mesa or the Intel graphics driver, which means many people have this problem so it's likely to be fixed. There's actually a merge request on mesa already (2022-12-29). It could also be that bug because the error message I get is actually:
Jan 20 12:49:10 angela kernel: Asynchronous wait on fence 0000:00:02.0:sway[104431]:cb0ae timed out (hint:intel_atomic_commit_ready [i915]) 
Jan 20 12:49:15 angela kernel: i915 0000:00:02.0: [drm] GPU HANG: ecode 12:0:00000000 
Jan 20 12:49:15 angela kernel: i915 0000:00:02.0: [drm] Resetting chip for stopped heartbeat on rcs0 
Jan 20 12:49:15 angela kernel: i915 0000:00:02.0: [drm] GuC firmware i915/adlp_guc_70.1.1.bin version 70.1 
Jan 20 12:49:15 angela kernel: i915 0000:00:02.0: [drm] HuC firmware i915/tgl_huc_7.9.3.bin version 7.9 
Jan 20 12:49:15 angela kernel: i915 0000:00:02.0: [drm] HuC authenticated 
Jan 20 12:49:15 angela kernel: i915 0000:00:02.0: [drm] GuC submission enabled 
Jan 20 12:49:15 angela kernel: i915 0000:00:02.0: [drm] GuC SLPC enabled
It's a solid 30-second graphical hang; the keyboard and everything else may keep working, but it's hard to tell. The latter bug report is quite long, with many comments, but this one from January 2023 seems to say that Sway 1.8 fixed the problem. There's also an earlier patch to add an extra kernel parameter that supposedly fixes that too. There are all sorts of other workarounds in there, for example this:
echo "options i915 enable_dc=1 enable_guc_loading=1 enable_guc_submission=1 edp_vswing=0 enable_guc=2 enable_fbc=1 enable_psr=1 disable_power_well=0"   sudo tee /etc/modprobe.d/i915.conf
from this comment... So that one is unsolved, as far as the upstream drivers are concerned, but maybe could be fixed through Sway.

Weird USB hangs / graphical glitches I have had weird connectivity glitches better described in this post, but basically: my USB keyboard and mice (connected over a USB hub) drop keys, lag a lot or hang, and I get visual glitches. The fix was to tighten the screws around the CPU on the motherboard (!), which is, thankfully, a rather simple repair.

USB docks are hell Note that the monitors are hooked up to angela through a USB-C / Thunderbolt dock from Cable Matters, with the lovely name of 201053-SIL. It has issues, see this blog post for an in-depth discussion.

Shipping details I ordered the Framework in August 2022 and received it about a month later, which was sooner than I expected given that the August batch had shipped late. People (including me) expected this to have an impact on the September batch, but it seems Framework has been able to fix the delivery problems and keep up with demand. As of early 2023, their website announces that laptops ship "within 5 days". I myself ordered a few expansion cards in November 2022; they shipped the same day and arrived 3-4 days later.

The supply pipeline There are basically 6 steps in the Framework shipping pipeline, each (except the last) accompanied by an email notification:
  1. pre-order
  2. preparing batch
  3. preparing order
  4. payment complete
  5. shipping
  6. (received)
This comes from the crowdsourced spreadsheet, which should be updated here when the status changes. I was part of the "third batch" of the 12th generation laptop, which was supposed to ship in September. It ended up arriving on my doorstep on September 27th, about 33 days after ordering. It seems current orders are no longer processed in "batches" but in real time; see this blog post for details on shipping.

Shipping trivia I don't know about the others, but my laptop shipped through no less than four different airplane flights. I can't quite figure out how to calculate exactly how much mileage that is, but it's huge. The ride through Alaska is surprising enough, but the bounce back through Winnipeg is especially weird. I guess the route happens that way because of Fedex shipping hubs. There was a related oddity when I had my Purism laptop shipped: it left from the west coast and seemed to embark on an endless, two-week-long road trip across the continental US.

Other resources

9 March 2023

Charles Plessy: If you work at Dreamhost, can you help us?

Update: thanks to the very kind involvement of our webmaster's widow, we could provide enough private information to Dreamhost, who finally accepted to reset the password and the MFA. We have recovered everything! Many thanks to everybody who helped us! Due to tragic circumstances, one association that I am part of, Sciencescope, got locked out of its account at Dreamhost. Locked out, we cannot pay the annual bill. Dreamhost contacted us about the payment, but will not let us recover access to our account in order to pay. So they will soon close the account. Our website, mailing lists and archives will be erased. We provided plenty of evidence that we are not scammers and that we are the legitimate owners of the account, but reviewing it is above the pay grade of the customer support (I don't blame them) and I could not convince them to let somebody higher have a look at our case. If you work at Dreamhost and want to keep us as customers instead of kicking us out like that, please ask the support service in charge of ticket 225948648 to send the recovery URL to the secondary email addresses (the ones you used to contact us about the bill!) in addition to the primary one (which nobody will read anymore). You can encrypt it for my Debian Developer key 73471499CC60ED9EEE805946C5BD6C8F2295D502 if you worry it could get into the wrong hands. If you still have doubts, I am available for calls any time. If you know somebody working at Dreamhost, can you pass them the message? This would be a big, big relief for our non-profit association.
