26 August 2022

Antoine Beaupré: How to nationalize the internet in Canada

Rogers had a catastrophic failure in July 2022. It affected emergency services (as in: people couldn't call 911, but also some 911 services themselves failed), hospitals (which couldn't access prescriptions), banks and payment systems (as payment terminals stopped working), and regular users as well. The outage lasted almost a full day, and Rogers took days to give any technical explanation of the outage, and even when they did, details were sparse. So far the only detailed account is from outside actors like Cloudflare, which seem to point at an internal BGP failure. Its impact on the economy has yet to be measured, but it probably cost millions of dollars in wasted time and possibly led to life-threatening situations. Apart from holding Rogers (criminally?) responsible for this, what should be done in the future to avoid such problems? It's not the first time something like this has happened: it happened to Bell Canada as well. The Rogers outage is also strangely similar to the Facebook outage last year, but, to its credit, Facebook did post a fairly detailed explanation only a day later. The internet is designed to be decentralised, and having large companies like Rogers hold so much power is a crucial mistake that should be reversed. The question is how. Some critics were quick to point out that we need more ISP diversity and competition, but I think that's missing the point. Others have suggested that the internet should be a public good, or even outright nationalized. I believe the solution to the problem of large, private, centralised telcos and ISPs is to replace them with smaller, public, decentralised service providers. The only way to ensure that works is to make sure that public money ends up creating infrastructure controlled by the public, which means treating ISPs as a public utility. This has been implemented elsewhere: it works, it's cheaper, and it provides better service.

A modest proposal

Global wireless services (like phone services) and home internet inevitably grow into monopolies. They are public utilities, just like water, power, railways, and roads. The question of how they should be managed is therefore inherently political, yet people don't seem to question the idea that only the market (i.e. "competition") can solve this problem. I disagree. Ten years ago, I suggested (in French) that we, in Québec, should nationalize large telcos and internet service providers. I no longer believe this is a realistic approach: most of those companies have crap copper-based networks (at least for the last mile), yet are worth billions of dollars. It would be prohibitive, and a waste, to buy them out. Back then, I called this idea "Réseau-Québec", a reference to the already nationalized power company, Hydro-Québec. (This idea, incidentally, made it into the plan of a political party.) Now, I think we should instead build our own, public internet. Start setting up municipal internet services, fiber to the home in all cities, progressively. Then interconnect cities with fiber, and build peering agreements with other providers. This also includes bidding on wireless spectrum to start competing with phone providers. And while that sounds really ambitious, I think it's possible to take this one step at a time.

Municipal broadband

In many parts of the world, municipal broadband is an elegant solution to the problem, with solutions ranging from Stockholm's city-owned fiber network (dark fiber, layer 1) to Utah's UTOPIA network (fiber to the premises, layer 2) and municipal wireless networks like Guifi.net, which connects about 40,000 nodes in Catalonia. A good first step would be for cities to start providing broadband services to their residents, directly. Cities normally own the sewage and water systems that interconnect most residences, and therefore have direct physical access everywhere. In Montréal, in particular, there is an ongoing project to replace a lot of old lead-based plumbing, which would be an opportunity to lay down a fiber network across the city. This is a wild guess, but I suspect this would be much less expensive than one would think. Some people agree with me and quote this as low as 1000$ per household. There are about 800,000 households in the city of Montréal, so we're talking about an 800 million dollar investment here, to connect every household in Montréal with fiber, and incidentally a quarter of the province's population. And this is not an up-front cost: this can be built progressively, with expenses amortized over many years. (We should not, however, connect Montréal first: it's used as an example here because it's a large number of households to connect.) Such a network should be built with a redundant topology. I leave it as an open question whether we should adopt Stockholm's more minimalist approach or provide direct IP connectivity. I would tend to favor the latter, because then you can immediately start to offer the service to households and generate revenues to compensate for the capital expenditures. Given the ridiculous profit margins telcos currently have (8 billion $CAD net income for BCE in 2019, 2 billion $CAD for Rogers in 2020), I also believe this would actually turn into a profitable revenue stream for the city, the same way Hydro-Québec is more and more considered a revenue stream for the state. (I personally believe that's actually wrong and we should treat those resources as human rights and not cash cows, but I digress. The point is: this is not a cost, it's a revenue.) The other major challenge here is that the city will need competent engineers to drive this project forward. But this is no different from the way other public utilities run: we have electrical engineers at Hydro, and sewer and water engineers at the city; this is just another profession. If anything, the computing science sector might be more at fault than the city here, in its failure to provide competent and accountable engineers to society... Right now, most of the network in Canada is copper: we are hitting the limits of that technology with DSL, and while cable has some life left in it (DOCSIS 4.0 does 4Gbps), that is nowhere near the capacity of fiber. Take the town of Chattanooga, Tennessee: in 2010, the city-owned ISP EPB finished deploying a fiber network to the entire town and provided gigabit internet to everyone. Now, 12 years later, they are using this same network to provide the mind-boggling speed of 25 gigabit to the home. To give you an idea, Chattanooga is roughly the size and density of Sherbrooke.

Provincial public internet

As part of building a municipal network, the question of getting access to "the internet" will immediately come up. Naturally, this will first be solved by using already existing commercial providers to hook up residents to the rest of the global network. But eventually, networks should interconnect: Montréal should connect with Laval, and then Trois-Rivières, then Québec City. This will require long-haul fiber runs, but those links are not actually that expensive, and many of them already exist as a public resource at RISQ and CANARIE, which cross-connect universities and colleges across the province and the country. Those networks might not have the capacity to cover the needs of the entire province right now, but that is a router upgrade away, thanks to the amazing capacity of fiber. There are two crucial mistakes to avoid at this point. First, the network needs to remain decentralised. Long-haul links should be IP links with BGP sessions, and each city (or MRC) should have its own independent network, to avoid Rogers-class catastrophic failures. Second, skill needs to remain in-house: RISQ has already made that mistake, to a certain extent, by selling its neutral datacenter. Tellingly, MetroOptic, probably the largest commercial dark fiber provider in the province, now operates the QIX, the second largest "public" internet exchange in Canada. Still, we have a lot of infrastructure we can leverage here. If RISQ or CANARIE cannot be up to the task, Hydro-Québec has power lines running into every house in the province, with high voltage power lines running hundreds of kilometers far north. The logistics of long-distance maintenance are already solved by that institution. In fact, Hydro already has fiber all over the province, but it is a private network, separate from the internet for security reasons (and that should probably remain so). But this only shows they already have the expertise to lay down fiber: they would just need to lay down a parallel network to the existing one. In that architecture, Hydro would be a "dark fiber" provider.

International public internet

None of the above solves the problem for the entire population of Québec, which is notoriously dispersed, with an area three times the size of France but only an eighth of its population (8 million vs 67). More specifically, Canada was originally a French colony, a land violently stolen from native people who have lived here for thousands of years. Some of those people now live in reservations, sometimes far from urban centers (but definitely not always). So the idea of leveraging the Hydro-Québec infrastructure doesn't always work to solve this, because while Hydro will happily flood a traditional hunting territory for an electric dam, they don't bother running power lines to the village they forcibly moved, powering it instead with noisy and polluting diesel generators. So before giving me fiber to the home, we should give power (and potable water, for that matter) to those communities first. So we need to discuss international connectivity. (How else could we consider those communities than as peer nations anyways?) Québec has virtually zero international links. Even in Montréal, which likes to style itself a major player in gaming, AI, and technology, most peering goes through either Toronto or New York. That's a problem that we must fix, regardless of the other problems stated here. Looking at the submarine cable map, we see very few international links actually landing in Canada. There is Greenland Connect, which connects Newfoundland to Iceland through Greenland. There's EXA, which lands in Ireland, the UK and the US, and Google has the Topaz link on the west coast. That's about it, and none of those land anywhere near any major urban center in Québec. We should have a cable running from France up to Saint-Félicien. There should be a cable from Vancouver to China. Heck, there should be a fiber cable running all the way from the end of the Great Lakes through Québec, then up around the Northwest Passage and back down to British Columbia. Those cables are expensive, and the idea might sound ludicrous, but Russia is actually planning such a project for 2026. The US has cables running all the way up (and around!) Alaska, neatly bypassing all of Canada in the process. We just look ridiculous on that map. (Addendum: I somehow forgot to talk about Teleglobe here. It was founded as a publicly owned company in 1950, growing international phone and (later) data links all over the world. It was privatized by the Conservatives in 1984, along with rail and other "crown corporations". So that's one major risk to any effort to make public utilities work properly: some government might be elected and promptly sell it out to its friends for peanuts.)

Wireless networks

I know most people will have rolled their eyes so far back their heads have exploded. But I'm not done yet. I want wireless too. And by wireless, I don't mean a bunch of geeks setting up OpenWRT routers on rooftops. I tried that, and while it was fun and educational, it didn't scale. A public networking utility wouldn't be complete without providing cellular phone service. This involves bidding for frequencies at the federal level, and deploying a rather large amount of infrastructure, but it could be a later phase, when the engineers and politicians have proven their worth. At least part of the Rogers fiasco would have been averted if such a decentralized network backend existed. One might even want to argue that a separate institution should be set up to provide phone services, independently from the regular wired networking, if only for reliability. Because remember: the problem we're trying to solve is not just technical, it's about political boundaries, centralisation, and automation. If everything is run by this one organisation again, we will have failed. However, I must admit that phone service is where my ideas fall a little short. I can't help but think it's also an accessible goal (maybe starting with a virtual operator), but it seems slightly less so than the others, especially considering how closed the phone ecosystem is.

Counterpoints

In debating these ideas while writing this article, the following objections came up.

I don't want the state to control my internet

One legitimate concern I have about the idea of the state running the internet is the potential it would have to censor or control the content running over the wires. But I don't think there is necessarily a direct relationship between resource ownership and control of content. Sure, China has strong censorship in place, partly implemented through state-controlled businesses. But Russia also has strong censorship in place, based on regulatory tools: they force private service providers to install back-doors in their networks to control content and surveil their users. Besides, the USA has been doing warrantless wiretapping since at least 2003 (and yes, that's 10 years before the Snowden revelations), so a commercial internet is no assurance that we have a free internet. Quite the contrary in fact: if anything, the commercial internet goes hand in hand with the neo-colonial internet, just like businesses did in the "good old colonial days". Large media companies are the primary censors of content here. In Canada, the media cartel requested the first site-blocking order in 2018. The plaintiffs (including Québecor, Rogers, and Bell Canada) are both content providers and internet service providers, an obvious conflict of interest. Nevertheless, there are some strong arguments against having a centralised, state-owned monopoly on internet service providers. FDN makes a good point on this. But this is not what I am suggesting: at the provincial level, the network would be purely physical, and regional entities (which could include private companies) would peer over that physical network, ensuring decentralization. Delegating the management of that infrastructure to an independent non-profit or cooperative (but owned by the state) would also ensure some level of independence.

Isn't the government incompetent and corrupt?

Also known as: "private enterprise is better skilled at handling this, the state can't do anything right." I don't think this is a fait accompli. If anything, I have found publicly run utilities to be spectacularly reliable here. I rarely have trouble with sewage, water, or power, and keep in mind I live in a city where we receive about 2 meters of snow a year, which tends to create lots of trouble with power lines. Unless there's a major weather event, power just runs here. I think the same can happen with an internet service provider. But it would certainly need to hold itself to higher standards than what we're used to, because frankly the internet is kind of janky.

A single monopoly will be less reliable

I actually agree with that, but that is not what I am proposing anyways. Current commercial or non-profit entities will be free to offer their services on top of the public network. And besides, the current "ha! diversity is great" approach is exactly what we have now, and it's not working. The pretense that we can have competition over a single network is what led the US into the ridiculous situation where they also pretend to have competition over the power utility market. This led to massive forest fires in California and major power outages in Texas. It doesn't work.

Wouldn't this create an isolated network?

One theory is that this new network would be so hostile to incumbent telcos and ISPs that they would simply refuse to peer with the public utility. And while it is true that the telcos currently do also act as a kind of "tier one" provider in some places, I strongly feel this is also a problem that needs to be solved, regardless of ownership of networking infrastructure. Right now, telcos often hold both ends of the stick: they are the gateway to users, the "last mile", but they also provide peering to the larger internet in some locations. In at least one datacenter in downtown Montréal, I've seen traffic go through Bell Canada that was not directly targeted at Bell customers. So in effect, they are in a position of charging twice for the same traffic, and that's not only ridiculous, it should just be plain illegal. And besides, this is not a big problem: there are other providers out there. As bad as the market is in Québec, there is still some diversity in tier one providers that could allow for some exits to the wider network (e.g. yes, Cogent is here too).

What about Google and Facebook?

Nationalization of other service providers like Google and Facebook is out of scope for this discussion. That said, I am not sure the state should get into the business of organising the web or providing content services; I will point out, however, that it already does some of that through its own websites. It should probably keep itself to that, and also consider providing normal services for people who don't or can't access the internet. (And I would also be ready to argue that Google and Facebook already act as extensions of the state: certainly if Facebook didn't exist, the CIA or the NSA would like to create it at this point. And Google has lucrative business with the US Department of Defense.)

What does not work

So we've seen one thing that could work. Maybe it's too expensive. Maybe the political will isn't there. Maybe it will fail. We don't know yet. But we know what does not work, and it's what we've been doing ever since the internet went commercial.

Subsidies

The absurd price we pay for data does not actually mean everyone gets high speed internet at home. Large swathes of the Québec countryside don't get broadband at all, and it can be difficult or expensive, even in large urban centers like Montréal, to get high speed internet. That is despite a series of subsidies that all avoided investing in our own infrastructure. We had the "fonds de l'autoroute de l'information" ("information highway fund"; site dead since 2003, archive.org link) and "branchez les familles" ("connect the families"; site dead since 2003, archive.org link), which subsidized the development of a copper network. In 2014, more of the same: the federal government poured hundreds of millions of dollars into a program called Connecting Canadians, to connect 280,000 households to "high speed internet". And now, the federal and provincial governments are proudly announcing that "everyone is now connected to high speed internet", after pouring more than 1.1 billion dollars to connect, guess what, another 380,000 homes, right in time for the provincial election. Of course, technically, the deadline won't actually be met until 2023. Québec is a big area to cover, and you can guess what happens next: the telcos threw up their hands and said some areas just can't be connected. (Or they connect their CEO but not the poor folks across the lake.) The story then takes the predictable twist of giving more money out to billionaires, now subsidizing Musk's Starlink system to connect those remote areas. To give a concrete example: a friend who lives about 1000km away from Montréal, 4km from a small village of 2,500 inhabitants, recently got symmetric 100mbps fiber at home from Telus, thanks to those subsidies. But I can't get that service in Montréal at all, presumably because Telus and Bell colluded to split that market. Bell doesn't provide me with such a service either: they tell me they have "fiber to my neighborhood" and only offer me a 25/10mbps ADSL service. (There is Vidéotron offering 400mbps, but that's copper cable, again a dead technology, and asymmetric.)

Conclusion

Remember Chattanooga? Back in 2010, they funded the development of a fiber network, and now they have deployed a network roughly a thousand times faster than what we have just funded with a billion dollars. In 2010, I was paying Bell Canada 60$/mth for 20mbps with a 125GB cap, and now I'm still (indirectly) paying Bell for roughly the same speed (25mbps). Back then, Bell was throttling their competitors' networks, until the CRTC forced them to stop in 2009. Both Bell and Vidéotron still explicitly forbid you from running your own servers at home, and Vidéotron charges prohibitive prices which make it near impossible for resellers to sell uncapped services. Those companies are not spurring innovation: they are blocking it. We have spent all this money for the private sector to build us a private internet, over decades, without any assurance of quality, equity or reliability. And while in some locations ISPs did deploy fiber to the home, they certainly didn't upgrade their entire network to follow suit, much less allow resellers to compete on that network. In 10 years, when 100mbps will be laughable, I bet those service providers will again punt the ball into the public court and tell us they don't have the money to upgrade everyone's equipment. We got screwed. It's time to try something new.

Updates

There was a discussion about this article on Hacker News which was surprisingly productive. Trigger warning: Hacker News is kind of right-wing, in case you didn't know. Since this article was written, at least two more major acquisitions happened, just in Québec. In the latter case, vMedia explicitly said it couldn't grow because of "lack of access to capital". So basically, we have given those companies a billion dollars, and they are now using that very money to buy out their competition. At least we could have given that money to small players to even out the playing field. But this is not how that works at all. Also, in a bizarre twist, an "analyst" believes the acquisition is likely to help Rogers acquire Shaw. Also, since this article was written, the Washington Post published a review of a book with similar ideas: Internet for the People: The Fight for Our Digital Future, by Ben Tarnoff, at Verso Books. It's short, but even more ambitious than what I am suggesting in this article, arguing that all big tech companies should be broken up and better regulated:
He pulls from Ethan Zuckerman's idea of a web that is "plural in purpose": that just as pool halls, libraries and churches each have different norms, purposes and designs, so too should different places on the internet. To achieve this, Tarnoff wants governments to pass laws that would make the big platforms unprofitable and, in their place, fund small-scale, local experiments in social media design. Instead of having platforms ruled by engagement-maximizing algorithms, Tarnoff imagines public platforms run by local librarians that include content from public media.
(Links mine: the Washington Post obviously prefers not to link to the real web; it doesn't link to Zuckerman's site at all and suggests Amazon for the book, in a cynical example.) And in another example of how the private sector has failed us, there was recently a fluke in the AMBER alert system where the entire province was warned about a shooter on the loose in Saint-Elzéar, except the people in the town, because they have spotty cell phone coverage. In other words, millions of people received a strongly worded, "life-threatening" alert for a town sometimes hours away, except the people most vulnerable to it. Not missing a beat, the CAQ party is promising more of the same medicine, again giving more money to telcos to fix the problem, suggesting three billion dollars be spent on private infrastructure.

14 August 2022

Jamie McClelland: Web caching is hard

Web caching is hard. And also, maybe I'm not that good under pressure? In any event, I made the following mistakes while trying to debug a web site, using our nginx cache, that bit the dust under heavy load today:

Action: I ran curl -I https://website.org/ and it hung.
Wrong assumption: Something is wrong with nginx. Why else would it just hang?
Reconsidered conclusion: The resource (the home page) is a MISS, so nginx has to retrieve it from the origin, but the origin is overloaded and timing out, so my request is also timing out. (Maybe something is wrong with the nginx caching configuration, since the home page really should be a HIT, but that's another problem.)

Action: I changed the configuration from our normal caching set of directives to our aggressive caching set of directives, reloaded nginx, and curl -I https://website.org/ still hung.
Wrong assumption: Aggressive caching isn't working and I need a different configuration.
Reconsidered conclusion: The home page has still not been loaded from the origin, so every request for it is going to be a MISS, and is going to hang, until nginx is able to fill the cache with it. The configuration change might be the right change; we just need the origin to calm down before we will know.

Action: I restarted PHP on the origin to free up PHP processes so my home page request could fill the cache, and still curl -I https://website.org/ hung.
Wrong assumption: WTF! The world is ending!
Reconsidered conclusion: The regular traffic accessing other pages (not the home page) consumed all the available PHP processes on the origin before my request for the home page could complete, so nginx was still unable to fill the cache with the home page.

Action: Once we got things under control, I changed the caching level from aggressive back down to normal. I ran curl -I https://website.org/ and it was HITing. I concluded that we don't need the aggressive cache after all. Got some coffee, came back later, ran it again, and it consistently showed MISS.
Wrong assumption: What?!? Did something change on the origin to stop the cache from working?
Reconsidered conclusion: The aggressive setting cached the page for 5 minutes. Even after changing back to normal caching, the home page was still cached, so it was served from the cache. After 5 minutes the cache expired, and only then did the normal cache settings come into play to determine whether the request would be cached or not. In other words, you have to wait for the cache to expire (or bust the cache) before you can effectively know whether the new cache settings are working.
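That last lesson is worth automating: rather than re-running curl by hand, you can watch the cache status across an expiry window. Below is a minimal Python sketch of such a probe, assuming the cache status is exposed in a response header (here called X-Cache-Status, as nginx setups often do via add_header X-Cache-Status $upstream_cache_status); the URL, header name, and polling interval are placeholders, not details from the post.

#!/usr/bin/env python3
"""Poll a URL and print its cache status over time (stop with Ctrl-C)."""
import time
import urllib.request

URL = "https://website.org/"  # placeholder URL from the post
HEADER = "X-Cache-Status"     # assumed header exposing $upstream_cache_status
INTERVAL = 30                 # seconds between probes

while True:
    request = urllib.request.Request(URL, method="HEAD")  # like curl -I
    try:
        with urllib.request.urlopen(request, timeout=10) as response:
            status = response.headers.get(HEADER, "no cache header")
            print(f"{time.strftime('%H:%M:%S')} HTTP {response.status} {status}")
    except Exception as exc:  # a hung origin shows up here as a timeout
        print(f"{time.strftime('%H:%M:%S')} request failed: {exc}")
    time.sleep(INTERVAL)

Run it before and after a configuration change and you can see, with timestamps, exactly when the old cached entry expires and the new settings actually take effect.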

Russ Allbery: Review: Still Not Safe

Review: Still Not Safe, by Robert L. Wears & Kathleen M. Sutcliffe
Publisher: Oxford University Press
Copyright: November 2019
ISBN: 0-19-027128-0
Format: Kindle
Pages: 232
Still Not Safe is an examination of the recent politics and history of patient safety in medicine. Its conclusions are summarized by the opening paragraph of the preface:
The American moral and social philosopher Eric Hoffer reportedly said that every great cause begins as a movement, becomes a business, and eventually degenerates into a racket. The reform movement to make healthcare safer is clearly a great cause, but patient safety efforts are increasingly following Hoffer's path.
Robert Wears was Professor of Emergency Medicine at the University of Florida specializing in patient safety. Kathleen Sutcliffe is Professor of Medicine and Business at Johns Hopkins. This book is based on research funded by a grant from the Robert Wood Johnson Foundation, for which both Wears and Sutcliffe were primary investigators. (Wears died in 2017, but the acknowledgments imply that at least early drafts of the book existed by that point and it was indeed co-written.)

The anchor of the story of patient safety in Still Not Safe is the 1999 report from the Institute of Medicine entitled To Err is Human, to which the authors attribute an explosion of public scrutiny of medical safety. The headline conclusion of that report, which led nightly news programs after its release, was that 44,000 to 120,000 people died each year in the United States due to medical error. This report prompted government legislation, funding for new safety initiatives, a flurry of follow-on reports, and significant public awareness of medical harm. What it did not produce, in the authors' view, is significant improvements in patient safety.

The central topic of this book is an analysis of why patient safety efforts have had so little measurable effect. The authors attribute this to three primary causes: an unwillingness to involve safety experts from outside medicine or absorb safety lessons from other disciplines, an obsession with human error that led to profound misunderstandings of the nature of safety, and the misuse of safety concerns as a means to centralize control of medical practice in the hands of physician-administrators. (The term used by the authors is "managerial, scientific-bureaucratic medicine," which is technically accurate but rather awkward.)

Biggest complaint first: This book desperately needed examples, case studies, or something to make these ideas concrete. There are essentially none in 230 pages apart from passing mentions of famous cases of medical error that added to public pressure, and a tantalizing but maddeningly nonspecific discussion of the atypically successful effort to radically improve the safety of anesthesia. Apparently anesthesiologists involved safety experts from outside medicine, avoided a focus on human error, turned safety into an engineering problem, and made concrete improvements that had a hugely positive impact on the number of adverse events for patients. Sounds fascinating! Alas, I'm just as much in the dark about what those improvements were as I was when I started reading this book. Apart from a vague mention of some unspecified improvements to anesthesia machines, there are no concrete descriptions whatsoever.

I understand that the authors were probably leery of giving too many specific examples of successful safety initiatives since one of their core points is that safety is a mindset and philosophy rather than a replicable set of actions, and copying the actions of another field without understanding their underlying motivations or context within a larger system is doomed to failure. But you have to give the reader something, or the book starts feeling like a flurry of abstract assertions.

Much is made here of the drawbacks of a focus on human error, and the superiority of the safety analysis done in other fields that have moved beyond error-centric analysis (and in some cases have largely discarded the word "error" as inherently unhelpful and ambiguous). That leads naturally to showing an analysis of an adverse incident through an error lens and then through a more nuanced safety lens, making the differences concrete for the reader. It was maddening to me that the authors never did this.

This book was recommended to me as part of a discussion about safety and reliability in tech and the need to learn from safety practices in other fields. In that context, I didn't find it useful, although surprisingly that's because the thinking in medicine (at least as presented by these authors) seems behind the current thinking in distributed systems. The idea that human error is not a useful model for approaching reliability is standard in large tech companies, nearly all of which use blameless postmortems for exactly that reason. Tech, similar to medicine, does have a tendency to be insular and not look outside the field for good ideas, but the approach to large-scale reliability in tech seems to have avoided the other traps discussed here. (Security is another matter, but security is also adversarial, which creates different problems that I suspect require different tools.)

What I did find fascinating in this book, although not directly applicable to my own work, is the way in which a focus on human error becomes a justification for bureaucratic control and therefore a concentration of power in a managerial layer. If the assumption is that medical harm is primarily caused by humans making avoidable mistakes, and therefore the solution is to prevent humans from making mistakes through better training, discipline, or process, this creates organizations that are divided into those who make the rules and those who follow the rules. The long-term result is a practice of medicine in which a small number of experts decide the correct treatment for a given problem, and then all other practitioners are expected to precisely follow that treatment plan to avoid "errors." (The best distributed systems approaches may avoid this problem, but this failure mode seems nearly universal in technical support organizations.)

I was startled by how accurate that portrayal of medicine felt. My assumption prior to reading this book was that the modern experience of medicine as an assembly line with patients as widgets was caused by the pressure for higher "productivity" and thus shorter visit times, combined with (in the US) the distorting effects of our broken medical insurance system. After reading this book, I've added a misguided way of thinking about medical error and risk avoidance to that analysis.

One of the authors' points (which, as usual, I wish they'd made more concrete with a case study) is that the same thought process that lets a doctor make a correct diagnosis and find a working treatment is the thought process that may lead to an incorrect diagnosis or treatment. There is not a separable state of "mental error" that can be eliminated. Decision-making processes are more complicated and more integrated than that. If you try to prevent "errors" by eliminating flexibility, you also eliminate vital tools for successfully treating patients.

The authors are careful to point out that the prior state of medicine in which each doctor was a force to themselves and there was no role for patient safety as a discipline was also bad for safety. Reverting to the state of medicine before the advent of the scientific-bureaucratic error-avoiding culture is also not a solution. But, rather at odds with other popular books about medicine, the authors are highly critical of safety changes focused on human error prevention, such as mandatory checklists. In their view, this is exactly the sort of attempt to blindly copy the machinery of safety in another field (in this case, air travel) without understanding the underlying purpose and system of which it's a part. I am not qualified to judge the sharp dispute over whether there is solid clinical evidence that checklists are helpful (these authors claim there is not; I know other books make different claims, and I suspect it may depend heavily on how the checklist is used). But I found the authors' argument that one has to design systems holistically for safety, not try to patch in safety later by turning certain tasks into rote processes and humans into machines, to be persuasive.

I'm not willing to recommend this book given how devoid it is of concrete examples. I was able to fill in some of that because of prior experience with the literature on site reliability engineering, but a reader who wasn't previously familiar with discussions of safety or reliability may find much of this book too abstract to be comprehensible. But I'm not sorry I read it. I hadn't previously thought about the power dynamics of a focus on error, and I think that will be a valuable observation to keep in mind.

Rating: 6 out of 10

12 August 2022

Shirish Agarwal: Mum and Books

The last day
The first lesson I would like everybody to know and have is to buy two machines, especially a machine to check low blood pressure. I had actually ordered one from Amazon but they never delivered it. I hope to sue them in consumer court in due course of time. The other is a blood sugar machine, which I ordered and did get, but the former is more important than the latter, and the reason why will be known soon. Mum had stopped eating solids and was entirely on liquids for the last month of her life. I did try enticing her however I could with aromatic food but failed. Add to that, we had weird weather this entire year. June is supposed to be when the weather turns and we have gentle showers, but this whole June it felt like we were in an oven. She asked for liquids whenever, and although I hated that she was not eating solids, at least she was having liquids (juices and whatnot), and that's how I pacified myself. I had been repeatedly told by family and extended family to get a full-time nurse, but she objected time and again, and I had to side with her. Then July 1st came around, part of the extended family also came, and they impressed on both me and her to get a nurse, so finally I was able to get her a nurse. I was also being pulled in various directions (outside my stuff, mumma's stuff) and doing whatever she needed in terms of supplies. On July 4th, I think she had low blood pressure, but without a machine one cannot know. At least that's what I know. If somebody knows anything better, please share; who knows, it may save lives. I don't have a blood pressure monitor even to date.

There used to be 5-6 doctors in our locality before the pandemic, but because of the pandemic and whatever other reasons, almost all doctors had given up attending house calls. And the house where I live is a 100-year-old house, so it has narrow passageways and we have no lift. So taking her in and out is a challenge and an ordeal, and something that is not easily done. I had some more work to do, so I asked the nurse to stay a bit past 8 p.m. I came back and the nurse left for the day. That day I had been distracted for a number of reasons, I don't remember what they were, but at that point in time doing that work seemed important. I called out to her but she didn't respond. I remember the night before she had been agitated while sleeping; I slept nearby and kept an eye on her. I had called her a few times to ask whether she needed something, but she didn't respond (this is about the earlier night). That evening, it was raining quite a bit; I called her a few times but she didn't speak. I kissed her on the cheek and realized she was cold. Mumma usually becomes very agitated if she feels cold and shouts at me. I realized she was cold and her body a bit stiff. I was supposed to eat but just couldn't. I don't know what I suspected; I just hired a rickshaw and went around till 9 p.m. in a fruitless search for a doctor. I returned home and again called her, but there was no response. Because she was not responding, I became fearful, had a shat, and then dialed the hospital, asking for an ambulance. It took about an hour, but finally the ambulance came. It was now 11 o'clock, or 2300 hrs., when the ambulance arrived. It took another half an hour to get a few kids, who had come from some movie or something, to help get mum down through the passage to the ambulance. We finally reached the hospital at 2330. The people on casualty that day were known to me, and they also knew my hearing problem, so it was much easier to communicate. Half an hour later, they pronounced her dead. Fortunately or not, I had just bought a newer mobile phone a few days back. And right now in India, WhatsApp is one of the most used apps. So I was able to chat with everybody and tell them what was happening, or rather what had happened. Of all, mamaji (mother's brother) shared that most members of the family would not be able to come, except a cousin sister who lives in Mumbai. I was instructed to get the body refrigerated for a few hours. It is only then that I came to know the various ways in which a body is refrigerated, and how cruel it would have been towards Atal Bihari Vajpayee's family, but that is politics. I had to go to quite a few places and was back home around 3 a.m. I was supposed to sleep, but sleep was out of the question. I whiled away a few hours playing, seeing movies, something or the other to keep myself distracted, as literally I had no idea what to do. Morning came; I took a bath, went outside, had some snacks, came home and somewhere then slept. One of my mausis (mother's sisters) was insisting on getting the body burnt in the morning itself, but I wanted at least one relative to be there on the last journey. Cousin sister and her husband came to Pune around 4 p.m. I somehow woke: the ringing, the vibration, I do not know what. I took a short bath and rushed to the place where we had kept the body, got the body, and went from there to where we had asked permission to get the body burned. More than anything else, I felt so sad that except for cousin sister and me, nobody was with her on the last journey.
Even that day, it was raining hard, so people avoided going out. Brother-in-law tried to give me some money, but I brushed it off. I just wanted their company; money is and was never the criterion. So, in the evening we had a meal: my cousin sister, brother-in-law, their two daughters and me. The next day we took the bones and ash to Alandi and did what was needed. I have tried to resurrect the day so many times in my head, trying to figure out what I could have done better, and am inconclusive. Having a blood pressure monitor for sure would have prevented the tragedy, or at least postponed it for a few more days, weeks, years, I don't know. I am not medically inclined.

The Books

I have to confess, the moment they said she was no more, I was hoping that the doctors would say, "we have a pill, would you like to take it, it would reunite you with mum." Maybe it was crazy or whatever, but if such a situation had existed, I would have easily gone for it. If I were to go, some people might miss me, but nobody would miss me terribly, and at least I would be with her. There was nothing to look forward to. What saved me from going mad was Michael Crichton's Timeline. It is a fascinating and seductive book. I had actually read it years ago but had forgotten. So many days and nights I was able to sleep hoping that quantum teleportation can be achieved. Anybody in my place would be easily enticed. What joy would it be if I were to meet mum once again. I can tell my other dumb child what to do so she lives for a few more years. I could talk to her, just be with her for some time. It is a powerful and seductive idea. I can see so many cults and whatnot that can be formed around it; there may already be, who knows. Another good book that has helped me to date is Through The Rings Of Fire (Hardcover, J. D. Benedict Thyagarajan). It is the autobiography of Venkat Chalasany (the story of an orphan boy who became a successful builder in Pune and the setbacks he had). While the author has very strong views, and I sometimes feel very naive views about things, I was taken on a ride through my own city as it was in the 1970s and 1980s. I could very well imagine all the different places and people as if they were happening right now. While I have finished the main story, there is still a bit left to read, and I read 5-10 minutes every day, as it's like a sweet morsel; it's like somebody sharing a tale passed down without me having to make an effort. And no lies: the author has been pretty upfront about where he has exaggerated or told lies or simply made up stuff. I was thinking of adding something about movies and some more info or impressions about Android, but it seems that would have to wait. I do hope it works for somebody; even if a single life can be saved from what I shared above, my job is done.

19 July 2022

Craig Small: Linux Memory Statistics

Pretty much everyone who has spent some time on a command line in Linux would have looked at the free command. This command provides some overall statistics on the memory and how it is used. Typical output looks something like this:
             total        used        free      shared  buff/cache  available
Mem:      32717924     3101156    26950016      143608     2666752  29011928
Swap:      1000444           0     1000444
Memory sits in the first row after the headers, then we have the swap statistics. Most of the numbers are fetched directly from the procfs file /proc/meminfo and are scaled and presented to the user. A good example of a simple stat is total, which is just the MemTotal row located in that file.

What is Free, and what is Used?

While you could say that the free value is also merely the MemFree row, this is where Linux memory statistics start to get odd. While that value is indeed what is found for MemFree and not a calculated field, it can be misleading. Most people would assume that Free means free to use, with the implication that only this amount of memory is free to use and nothing more. That would also mean the used value is really used by something and nothing else can use it. In the early days of free and Linux statistics in general, that was how it looked. Used is a calculated field (there is no MemUsed row) and was, initially, Total - Free. The problem was, Used also included the Buffers and Cached values. This meant that it looked like Linux was using a lot of memory for something. If you read old messages from before 2002 that talk about excessive memory use, they quite likely are looking at the values printed by free. The thing was, under memory pressure the kernel could release Buffers and Cached for use. Not all of that storage, but some of it, so it wasn't all really used. To counter this, free showed a row between Memory and Swap with Used having Buffers and Cached removed and Free having the same values added:
             total       used       free     shared    buffers     cached
Mem:      32717924    6063648   26654276          0     313552    2234436
-/+ buffers/cache:    3515660   29202264
Swap:      1000444          0    1000444
You might notice that this older version of free, from around 2001, shows buffers and cached separately, and there's no available column (we'll get to Available later). Shared appears as zero because the old row was labelled MemShared and not Shmem, which was changed in Linux 2.6, and I'm running a system way past that version. It's not ideal: you can say that the amount of free memory is something above 26654276 and below 29202264 KiB, but nothing more accurate. buffers and cached are almost never all-used or all-unused, so the real figure is not either of those numbers but something in-between.

Cached, just not for Caches

That appeared to be an uneasy truce within the Linux memory statistics world for a while. By 2014 we realised that there was a problem with Cached. This field used to hold the memory used as a cache for files read from storage. While this value still has that component, it was also being used for tmpfs storage, and the use of tmpfs went from an interesting idea to being everywhere. Cheaper memory meant larger tmpfs partitions went from a luxury to something everyone was doing. The problem is that with large files put into a tmpfs partition, Free would decrease, but so would Cached, meaning the free column in the -/+ row would not change much and would understate the impact of files in tmpfs. Luckily, in Linux 2.6.32 the developers added a Shmem row, which is the amount of memory used for shmem and tmpfs. Subtracting that value from Cached gave you the real cached value, which we call main_cache, and very briefly this is what the cached value would show in free. However, this caused further problems, because not all Shmem can be reclaimed and reused, so it probably swapped one set of problematic values for another. It did, however, prompt the Linux kernel community to have a look at the problem.

Enter Available

There was increasing awareness within the kernel community of the issues with working out how much memory a system has free. It wasn't just the output of free or the percentage values in top; load balancers or workload-placing systems would have their own view of this value. As memory management and use within the Linux kernel evolved, what was or wasn't free changed, and all the userland programs were expected somehow to keep up. The kernel developers realised the best place to get an estimate of the memory not used was in the kernel, and they created a new memory statistic called Available. That way, if how memory is used or made unreclaimable changed, they could adjust the estimate and userland programs would go along with it. procps has a fallback for this value, and it's a pretty complicated setup, sketched in code after the list:
  1. Find the min_free_kbytes setting (the vm.min_free_kbytes sysctl), which is the minimum amount of free memory the kernel will keep in reserve
  2. Add 25% to this value (e.g. if it was 4000, make it 5000); this is the low watermark
  3. To find available, start with MemFree and subtract the low watermark
  4. Add the sum of the Inactive(file) and Active(file) values, minus the smaller of half that sum and the low watermark
  5. Do the same for reclaimable slab: add SReclaimable, minus the smaller of half its value and the low watermark
  6. If what you get is less than zero, make available zero
  7. Or, just look at MemAvailable in /proc/meminfo on any kernel recent enough (3.14 onwards) to provide it
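Here is the promised sketch of that fallback in Python. It is not the procps source: the names are mine, and steps 4 and 5 follow the kernel's si_mem_available() convention of keeping back the smaller of the half value and the low watermark. All values are in KiB.

#!/usr/bin/env python3
"""Estimate Available roughly the way procps falls back to on old kernels."""

def read_meminfo(path="/proc/meminfo"):
    """Parse /proc/meminfo into a dict of KiB values."""
    info = {}
    with open(path) as f:
        for line in f:
            key, rest = line.split(":", 1)
            info[key] = int(rest.split()[0])
    return info

def fallback_available(info, min_free_kbytes):
    # Steps 1 and 2: the low watermark is min_free_kbytes plus 25%.
    watermark_low = min_free_kbytes * 5 // 4
    # Step 3: start with MemFree and subtract the low watermark.
    available = info["MemFree"] - watermark_low
    # Step 4: add the reclaimable page cache, keeping back the smaller
    # of half of it or the low watermark.
    pagecache = info["Inactive(file)"] + info["Active(file)"]
    available += pagecache - min(pagecache // 2, watermark_low)
    # Step 5: the same rule for the reclaimable slab.
    slab = info["SReclaimable"]
    available += slab - min(slab // 2, watermark_low)
    # Step 6: never report a negative value.
    return max(available, 0)

if __name__ == "__main__":
    meminfo = read_meminfo()
    with open("/proc/sys/vm/min_free_kbytes") as f:
        min_free = int(f.read())
    print("estimated available:", fallback_available(meminfo, min_free))
    # Step 7: kernels since 3.14 simply export the kernel's own estimate.
    print("kernel MemAvailable:", meminfo.get("MemAvailable", "not exported"))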
For the free program, we added the Available value and the -/+ line was removed. The main_cache value was Cached + Slab, while Used was calculated as Total - Free - main_cache - Buffers. This was very close to what the Used column in the -/+ line used to show.

What's on the Slab?

The next issue that came up was the use of slabs. At this point, main_cache was Cached + Slab, but Slab consists of reclaimable and unreclaimable components. One part of Slab can be used elsewhere if needed and the other cannot, but the procps tools treated them the same. The Used calculation should not subtract SUnreclaim from the Total, because it is actually being used. So in 2015 main_cache was changed to be Cached + SReclaimable. This meant that Used memory was calculated as Total - Free - Cached - SReclaimable - Buffers.

Revenge of tmpfs and the return of Available

The tmpfs impact on Cached was still an issue. If you added a 10MB file into a tmpfs partition, then Free would reduce by 10MB and Cached would increase by 10MB, meaning Used stayed unchanged even though 10MB had gone somewhere. It was time to retire the complex calculation of Used. For procps 4.0.1 onwards, Used now means "not available". We take the Total memory and subtract the Available memory. This is not a perfect setup, but it is probably the best one we will have, and testing is giving us much more sensible results. It's also easier for people to understand (take the total value you see in free, then subtract the available value). What does that mean for main_cache, which is part of the buff/cache value you see? As this value is no longer in the used memory calculation, it is less important. Should it also be reverted to simply Cached without the reclaimable Slabs?

The calculated fields

In summary, for the calculated fields in procps at least: used is now Total - Available, and the cached part of buff/cache is Cached + SReclaimable. Almost everything else, with the exception of some bounds checking, is what you get out of /proc/meminfo, straight from the kernel.
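To make that summary concrete, here is a short sketch of the procps 4.0.1 arithmetic as described above, with the caveat that the real free(1) code differs in details such as bounds checking:

#!/usr/bin/env python3
"""Compute free(1)-style fields per the procps 4.0.1 description above."""

info = {}
with open("/proc/meminfo") as f:
    for line in f:
        key, rest = line.split(":", 1)
        info[key] = int(rest.split()[0])  # values in KiB

total = info["MemTotal"]
available = info["MemAvailable"]   # on older kernels, use the fallback above
used = total - available           # "used" now simply means "not available"
main_cache = info["Cached"] + info["SReclaimable"]
buff_cache = info["Buffers"] + main_cache

print(f"total={total} used={used} buff/cache={buff_cache} available={available}")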

27 March 2022

Russ Allbery: Review: A Song for a New Day

Review: A Song for a New Day, by Sarah Pinsker
Publisher: Berkley
Copyright: September 2019
ISBN: 1-9848-0259-3
Format: Kindle
Pages: 372
Luce Cannon was touring with a session band when the shutdown began. First came the hotel evacuation in the middle of the night due to bomb threats against every hotel in the state. Then came the stadium bombing just before they were ready to go on stage. Luce and most of the band performed anyway, with a volunteer crew and a shaken crowd. It was, people later decided, the last large stage show in the United States before the congregation laws shut down public gatherings. That was the end of Luce's expected career, and could have been the end of music, or at least public music. But Luce was stubborn and needed the music. Rosemary grew up in the aftermath: living at home with her parents well away from other people, attending school virtually, and then moving seamlessly into a virtual job for Superwally, the corporation that ran essentially everything. A good fix for some last-minute technical problems with StageHoloLive's ticketing system got her an upgraded VR hoodie and complimentary tickets to the first virtual concert she'd ever attended. She found the experience astonishing, prompting her to browse StageHoloLive job openings and then apply for a technical job and, on a whim, an artist recruiter role. That's how Rosemary found herself, quite nerve-wrackingly, traveling out into the unsafe world to look for underground musicians who could become StageHoloLive acts. A Song for a New Day was published in 2019 and had a moment of fame at the beginning of 2020, culminating in the Nebula Award for best novel, because it's about lockdowns, isolation, and the suppression of public performances. There's even a pandemic, although it's not a respiratory disease (it's some variety of smallpox or chicken pox) and is only a minor contributing factor to the lockdowns in this book. The primary impetus is random violence. Unfortunately, the subsequent two years have not been kind to this novel. Reading it in 2022, with the experience of the past two years fresh in my mind, was a frustrating and exasperating experience because the world setting is completely unbelievable. This is not entirely Pinsker's fault; this book was published in 2019, was not intended to be about our pandemic, and therefore could not reasonably predict its consequences. Still, it required significant effort to extract the premise of the book from the contradictory evidence of current affairs and salvage the pieces of it I still enjoyed. First, Pinsker's characters are the most astonishingly incurious and docile group of people I've seen in a recent political SF novel. This extends beyond the protagonists, where it could arguably be part of their characterization, to encompass the entire world (or at least the United States; the rest of the world does not appear in this book at all so far as I can recall). You may be wondering why someone bombs a stadium at the start of the book. If so, you are alone; this is not something anyone else sees any reason to be curious about. Why is random violence spiraling out of control? Is there some coordinated terrorist activity? Is there some social condition that has gotten markedly worse? Race riots? Climate crises? Wars? The only answer this book offers is a completely apathetic shrug. There is a hint at one point that the government may have theories that they're not communicating, but no one cares about that either. That leads to the second bizarre gap: for a book that hinges on political action, formal political structures are weirdly absent. 
Near the end of the book, one random person says that they have been inspired to run for office, which so far as I can tell is the first mention of elections in the entire book. The "government" passes congregation laws shutting down public gatherings and there are no protests, no arguments, no debate, but also no suppression, no laws against the press or free speech, no attempt to stop that debate. There's no attempt to build consensus for or against the laws, and no noticeable political campaigning. That's because there's no need. So far as one can tell from this story, literally everyone just shrugs and feels sad and vaguely compliant. Police officers exist and enforce laws, but changing those laws or defying them in other than tiny covert ways simply never occurs to anyone. This makes the book read a bit like a fatuous libertarian parody of a docile populace, but this is so obviously not the author's intent that it wouldn't be satisfying to read even as that. To be clear, this is not something that lasts only a few months in an emergency when everyone is still scared. This complete political docility and total incuriosity persists for enough years that Rosemary grows up within that mindset. The triggering event was a stadium bombing followed by an escalating series of random shootings and bombings. (The pandemic in the book only happens after everything is locked down and, apart from adding to Rosemary's agoraphobia and making people inconsistently obsessed with surface cleanliness, plays little role in the novel.) I lived through 9/11 and the Oklahoma City bombing in the US, other countries have been through more protracted and personally dangerous periods of violence (the Troubles come to mind), and never in human history has any country reacted to a shock of violence (or, for that matter, disease) like the US does in this book. At points it felt like one of those SF novels where the author is telling an apparently normal story and all the characters turn out to be aliens based on spiders or bats. I finally made sense of this by deciding that the author wasn't using sudden shocks like terrorism or pandemics as a model, even though that's what the book postulates. Instead, the model seems to be something implicitly tolerated and worked around: US school shootings, for instance, or the (incorrect but widespread) US belief in a rise of child kidnappings by strangers. The societal reaction here looks less like a public health or counter-terrorism response and more like suburban attitudes towards child-raising, where no child is ever left unattended for safety reasons but we routinely have school shootings no other country has at the same scale. We have been willing to radically (and ineffectually) alter the experience of childhood due to fears of external threat, and that's vaguely and superficially similar to the premise of this novel. What I think Pinsker still misses (and which the pandemic has made glaringly obvious) is the immense momentum of normality and the inability of adults to accept limitations on their own activities for very long. Even with school shootings, kids go to school in person. We now know that parts of society essentially collapse if they don't, and political pressure becomes intolerable. But by using school shootings as the model, I managed to view Pinsker's setup as an unrealistic but still potentially interesting SF extrapolation: a thought experiment that ignores countervailing pressures in order to exaggerate one aspect of society to an extreme. 
This is half of Pinsker's setup. The other half, which made less of a splash because it didn't have the same accident of timing, is the company Superwally: essentially "what if Amazon bought Walmart, Google, Facebook, Netflix, Disney, and Live Nation." This is a more typical SF extrapolation that left me with a few grumbles about realism, but that I'll accept as a plot device to talk about commercialization, monopolies, and surveillance capitalism. But here again, the complete absence of formal political structures in this book is not credible. Superwally achieves an all-pervasiveness that in other SF novels results in corporations taking over the role of national governments, but it still lobbies the government in much the same way and with about the same effectiveness as Amazon does in our world. I thought this directly undermined some parts of the end of the book. I simply did not believe that Superwally would be as benign and ineffectual as it is shown here. Those are a lot of complaints. I found reading the first half of this book to be an utterly miserable experience and only continued reading out of pure stubbornness and completionism. But the combination of the above-mentioned perspective shift and Pinsker's character focus did partly salvage the book for me. This is not a book about practical political change, even though it makes gestures in that direction. It's primarily a book about people, music, and personal connection, and Pinsker's portrayal of individual and community trust in all its complexity is the one thing the book gets right. Rosemary's character combines a sort of naive arrogance with self-justification in a way that I found very off-putting, but the pivot point of the book is the way in which Luce and her community extend trust to her anyway, as part of staying true to what they believe. The problem that I think Pinsker was trying to write about is atomization, which leads to social fragmentation into small trust networks with vast gulfs between them. Luce and Rosemary are both characters who are willing to bridge those gulfs in their own ways. Pinsker does an excellent job describing the benefits, the hurt, the misunderstandings, the risk, and the awkward process of building those bridges between communities that fundamentally do not understand each other. There's something deep here about the nature of solidarity, and how you need both people like Luce and people like Rosemary to build strong and effective communities. I've kept thinking about that part. It's also helpful for a community to have people who are curious about cause and effect, and who know how a bill becomes a law. It's hard to sum up this book, other than to say that I understand why it won a Nebula but it has deep world-building flaws that have become far more obvious over the past two years. Pinsker tries hard to capture the feeling of live music for both the listener and the performer and partly succeeds even for me, which probably means others will enjoy that part of the book immensely. The portrayal of the difficult dynamics of personal trust was the best part of the book for me, but you may have to build scaffolding and bracing for your world-building disbelief in order to get there. On the whole, I think A Song for a New Day is worth reading, but maybe not right now.
If you do read it now, tell yourself at the start that this is absolutely not about the pandemic and that everything political in this book is a hugely simplified straw-man extrapolation, and hopefully you'll find the experience less frustrating than I found it. Rating: 6 out of 10

10 January 2022

Louis-Philippe Véronneau: Grading using the Wacom Intuos S

I've been teaching economics for a few semesters already and, slowly but surely, I'm starting to get the hang of it. Having to deal with teaching remotely hasn't been easy though, and I'm really hoping the winter semester will be in-person again. Although I worked way too much last semester1, I somehow managed to transition to using a graphics tablet. I bought a Wacom Intuos S tablet (model CTL-4100) in late August 2021 and overall, I have been very happy with it. Wacom Canada offers a small discount for teachers and I ended up paying 115 CAD (~90 USD) for the tablet, an overall very reasonable price. Unsurprisingly, the Wacom support on Linux is very good and my tablet worked out of the box. The only real problem I had was that, by default, the tablet sometimes boots up in Android mode, making it unusable. This is easily solved by pressing down on the pad's first and last buttons for a few seconds, until the LED turns white. The included stylus came with hard plastic nibs, but I find them too slippery. I eventually purchased hard felt nibs, which increase the friction and make for a more paper-like experience. They are a little less durable, but I have written a fair bit and still haven't gone through a single one yet.

Learning curve Learning how to use a graphics tablet took me at least a few weeks! When writing on a sheet of paper, the eyes see what the hand writes directly. This is not the case when using a graphics tablet: you are writing on one surface and see the result on your screen, a completely different surface. This dissociation takes a bit of practice to master, but after going through more than 300 pages of notes, it now feels perfectly normal. Here is a side-by-side comparison of my very average handwriting2:
  1. on paper
  2. using the tablet, the first week
  3. using the tablet, after a couple of months
Comparison of my writing, on paper, using the tablet and using the tablet after a few weeks
I still prefer the result of writing on paper, but I think this is mostly due to me not using the pressure sensitivity feature. The support in xournal wasn't great, but now that I've tried it in xournalpp (more on this below), I think I will be enabling it in the future. The result on paper is also more consistent, but I trust my skills will improve over time.
Pressure sensitivity on vs off
Use case The first use case I have for the tablet is grading papers. I've been asking my students to submit their papers via Moodle for a few semesters already, but until now, I was grading them using PDF comments. The experience wasn't great3 and was rather slow compared to grading physical copies. I'm also a somewhat old-school teacher: I refuse to teach using slides. Death by PowerPoint is real. I write on the blackboard a lot4 and I find it much easier to prepare my notes by hand than by typing them, as the end result is closer to what I actually end up writing down on the board. Writing notes by hand on sheets of paper is a chore too, especially when you revisit the same material regularly. Being able to handwrite digital notes gives me a lot more flexibility and it's been great. So far, I have been using xournal to write notes and grade papers, and although it is OK, it has a bunch of quirks I dislike. I was waiting for xournalpp to be packaged in Debian, and it now is5! I'm looking forward to using it next semester.
Towards a better computer monitor I have also been feeling the age of my current computer monitor. I am currently using an old 32" 1080p TV from LG and up until now, I had been able to deal with the drawbacks. The colors are pretty bad and 1080p for such a large display isn't great, but I got used to it. What I really noticed when I started using my graphics tablet was the input lag. It's bad enough that there's a clear jello effect when writing and it eventually gives me a headache. It's so bad I usually prefer to work on my laptop, which has a nicer but noticeably smaller panel. I'm currently looking to replace this aging TV6 with something more modern. I have been holding out, since I would like to buy something that will last me another 10 years if possible. Sadly, 32" high refresh rate 4K monitors aren't exactly there yet and I haven't found anything matching my criteria. I would probably also need a new GPU, something that is not easy to come by these days.
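Coming back to the pressure sensitivity mentioned earlier: xournalpp reads pressure through the standard Linux Wacom driver, so the response curve can also be tuned outside the application with the stock xsetwacom tool (X11 only). Here is a minimal sketch in Python; the device name at the end is my guess for a CTL-4100 and should be replaced with whatever `xsetwacom list devices` actually reports on your system.

#!/usr/bin/env python3
# Minimal sketch: list Wacom devices and soften the stylus pressure curve
# via the stock xsetwacom tool (X11 only; the device name is an assumption).
import subprocess

def wacom_devices():
    # xsetwacom ships with the xserver-xorg-input-wacom driver on Debian.
    out = subprocess.run(["xsetwacom", "list", "devices"],
                         capture_output=True, text=True, check=True)
    return [line.strip() for line in out.stdout.splitlines() if line.strip()]

def set_pressure_curve(device, points=(0, 25, 75, 100)):
    # A curve of (0, 25, 75, 100) makes light strokes register more
    # easily than the linear default of (0, 0, 100, 100).
    subprocess.run(["xsetwacom", "set", device, "PressureCurve",
                    *map(str, points)], check=True)

if __name__ == "__main__":
    for dev in wacom_devices():
        print(dev)
    # Hypothetical device name; use the stylus entry printed above.
    set_pressure_curve("Wacom Intuos S Pen stylus")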

  1. I worked at two colleges at the same time, teaching 2 different classes (one of which I was giving for the first time...) to 6 groups in total. I averaged more than 60h per week for sure.
  2. Yes, I only write in small caps. Students love it, as it's much easier to read on the blackboard.
  3. Although most PDF readers support displaying comments, some of my more clueless students still had trouble seeing them and I had to play tech support more than I wanted.
  4. Unsurprisingly, my students also love it. One of the most common pieces of feedback I get at the end of the semester is that they hate slides too and are very happy I'm one of the few teachers who writes on the board.
  5. Many thanks to Barak A. Pearlmutter for maintaining this package.
  6. It dates back to 2010, when my mom replaced our old CRT with a flat screen. FullHD TVs were getting affordable and I wasn't sad to see our tiny 20-something-inch TV go. I eventually ended up with the LG flatscreen a few years later when I moved out into my first apartment and my mom got something better.

3 January 2022

Russ Allbery: Review: Crashed

Review: Crashed, by Adam Tooze
Publisher: Penguin Books
Copyright: 2018
Printing: 2019
ISBN: 0-525-55880-2
Format: Kindle
Pages: 615
The histories of the 2008 financial crisis that I have read focus almost exclusively on the United States. They also stop after the bank rescue and TARP or, if they press on into the aftermath, focus on the resulting damage to the US economy and the widespread pain of falling housing prices and foreclosure. Crashed does neither, instead arguing that 2008 was a crisis of European banks as much as American banks. It extends its history to cover the sovereign debt crisis in the eurozone, treating it as a continuation of the same crisis in a different guise. In the process, Tooze makes a compelling argument that one can draw a clear, if wandering, line from the moral revulsion at the propping up of the international banking system to Brexit and Trump. Qualifications first, since they are important for this type of comprehensive and, in places, surprising and counterintuitive history. Adam Tooze is Kathryn and Shelby Cullom Davis Professor of History at Columbia University and the director of its European Institute. His previous books have won multiple awards, and Crashed won the Lionel Gelber Prize for non-fiction on foreign policy. That it won a prize in that topic, rather than history or economics, is a hint at Tooze's chosen lens. The first half of the book is the lead-up and response to the crisis provoked by the collapse in value of securitized US mortgages and leading to the failure of Lehman Brothers, the failure in all but name of AIG, and a massive bank rescue. The financial instruments at the center of the crisis are complex and difficult to understand, and Tooze provides only a brief explanation. This therefore may not be the best first book on the crisis; for that, I would still recommend Bethany McLean and Joe Nocera's All the Devils Are Here, although it's hard to beat Michael Lewis's storytelling in The Big Short. Tooze is not interested in dwelling on a blow-by-blow account of the crisis and initial response, and some of his account feels perfunctory. He is instead interested in describing its entangled global sweep. The new detail I took from the first half of Crashed is the depth of involvement of the European banks in what is often portrayed as a US crisis. Tooze goes into more specifics than other accounts on the eurodollar market, run primarily through the City of London, and the vast dollar-denominated liabilities of European banks. When the crisis struck, the breakdown of liquidity markets left those banks with no source of dollar funding to repay dollar-denominated short-term loans. The scale of dollar borrowing by European banks was vast, dwarfing the currency reserves or trade surpluses of their home countries. An estimate from the Bank for International Settlements put the total dollar funding needs for European banks at more than $2 trillion. The institution that saved the European banks was the United States Federal Reserve. This was an act of economic self-protection, not largesse; in the absence of dollar liquidity, the fire sale of dollar assets by European banks in a desperate attempt to cover their loans would have exacerbated the market crash. But it's remarkable in its extent, and in how deeply this contradicts the later public political position that 2008 was an American recession caused by American banks. 52% of the mortgage-backed securities purchased by the Federal Reserve in its quantitative easing policies (popularly known as QE1, QE2, and QE3) were sold by foreign banks.
Deutsche Bank and Credit Suisse unloaded more securities on the Fed than any American bank by a significant margin. And when that wasn't enough, the Fed went farther and extended swap lines to major national central banks, providing them dollar liquidity that they could then pass along to their local institutions. In essence, in Tooze's telling, the US Federal Reserve became the reserve bank for the entire world, preventing a currency crisis by providing dollars to financial systems both foreign and domestic, and it did so with a remarkable lack of scrutiny. Its swap lines avoided public review until 2010, when Bloomberg won a court fight to extract the records; until then, that secrecy allowed the European banks that benefited to hide the extent of their exposure.
In Europe, the bullish CEOs of Deutsche Bank and Barclays claimed exceptional status because they avoided taking aid from their national governments. What the Fed data reveal is the hollowness of those boasts. The banks might have avoided state-sponsored recapitalization, but every major bank in the entire world was taking liquidity assistance on a grand scale from its local central bank, and either directly or indirectly by way of the swap lines from the Fed.
The emergency steps taken by Timothy Geithner in the Treasury Department were nearly as dramatic as those of the Federal Reserve. Without regard for borders, and pushing the boundary of their legal authority, they intervened massively in the world (not just the US) economy to save the banking and international finance system. And it worked. One of the benefits of a good history is to turn stories about heroes and villains into more nuanced information about motives and philosophies. I came away from Sheila Bair's account of the crisis furious at Geithner's protection of banks from any meaningful consequences for their greed. Tooze's account, and analysis, agrees with Bair in many respects, but Bair was continuing a personal fight and Tooze has more space to put Geithner into context. That context tells an interesting story about the shape of political economics in the 21st century. Tooze identifies Geithner as an institutionalist. His goal was to keep the system running, and he was acutely aware of what would happen if it failed. He therefore focused on the pragmatic and the practical: the financial system was about to collapse, he did whatever was necessary to keep it working, and that effort was successful. Fairness, fault, and morals were treated as irrelevant. This becomes more obvious when contrasted with the eurozone crisis, which started with a Greek debt crisis in the wake of the recession triggered by the 2008 crisis. Greece is tiny by the standards of the European economy, so at first glance there is no obvious reason why its debt crisis should have perturbed the financial system. Under normal circumstances, its lenders should have been able to absorb such relatively modest losses. But the immediate aftermath of the 2008 crisis was not normal circumstances, particularly in Europe. The United States had moved aggressively to recapitalize its banks using the threat of compensation caps and government review of their decisions. The European Union had not; European countries had done very little, and their banks were still in a fragile state. Worse, the European Central Bank had sent signals that the market interpreted as guaranteeing the safety of all European sovereign debt equally, even though this was explicitly ruled out by the Lisbon Treaty. If Greece defaulted on its debt, not only would that be another shock to already-precarious banks, it would indicate to the market that all European debt was not equal and other countries may also be allowed to default. As the shape of the Greek crisis became clearer, the cost of borrowing for all of the economically weaker European countries began rising towards unsustainable levels. In contrast to the approach taken by the United States government, though, Europe took a moralistic approach to the crisis. Jean-Claude Trichet, then president of the European Central Bank, held the absolute position that defaulting on or renegotiating the Greek debt was unthinkable and would not be permitted, even though there was no realistic possibility that Greece would be able to repay. He also took a conservative hard line on the role of the ECB, arguing that it could not assist in this crisis. (Tooze is absolutely scathing towards Trichet, who comes off in this account as rigidly inflexible, volatile, and completely irrational.) Germany's position, represented by Angela Merkel, was far more realistic: Greece's debt should be renegotiated and the creditors would have to accept losses. 
This is, in Tooze's account, clearly correct, and indeed is what eventually happened. But the problem with Merkel's position was the potential fallout. The German government was still in denial about the health of its own banks, and political opinion, particularly in Merkel's coalition, was strongly opposed to making German taxpayers responsible for other people's debts. Stopping the progression of a Greek default to a loss of confidence in other European countries would require backstopping European sovereign debt, and Merkel was not willing to support this. Tooze is similarly scathing towards Merkel, but I'm not sure it's warranted by his own account. She seemed, even in his account, boxed in by domestic politics and the tight constraints of the European political structure. Regardless, even after Trichet's term ended and he was replaced by the far more pragmatic Mario Draghi, Germany and Merkel continued to block effective action to relieve Greece's debt burden. As a result, the crisis lurched from inadequate stopgap to inadequate stopgap, forcing crippling austerity, deep depressions, and continued market instability while pretending unsustainable debt would magically become payable through sufficient tax increases and spending cuts. US officials such as Geithner, who put morals and arguably legality aside to do whatever was needed to save the system, were aghast. One takeaway from this is that expansionary austerity is the single worst macroeconomic idea that anyone has ever had.
In the summer of 2012 [the IMF's] staff revisited the forecasts they had made in the spring of 2010 as the eurozone crisis began and discovered that they had systematically underestimated the negative impact of budget cuts. Whereas they had started the crisis believing that the multiplier was on average around 0.5, they now concluded that from 2010 forward it had been in excess of 1. This meant that cutting government spending by 1 euro, as the austerity programs demanded, would reduce economic activity by more than 1 euro. So the share of the state in economic activity actually increased rather than decreased, as the programs presupposed. It was a staggering admission. Bad economics and faulty empirical assumptions had led the IMF to advocate a policy that destroyed the economic prospects for a generation of young people in Southern Europe.
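To make the multiplier arithmetic concrete, here is a toy calculation in Python of a closely related ratio, debt to GDP, under a one-euro spending cut. All of the numbers (starting GDP and debt, the tax share, the multiplier values) are illustrative assumptions of mine, not figures from the book or the IMF; the point is only the mechanism.

# Toy model of a one-euro spending cut, not the IMF's methodology.
# Assumed starting point: GDP 100, public debt 90, taxes at 40% of output.
gdp, debt, tax_share = 100.0, 90.0, 0.40
cut = 1.0  # one euro of spending cuts

for multiplier in (0.5, 1.3):
    new_gdp = gdp - multiplier * cut          # output falls by m euros per euro cut
    lost_tax = tax_share * multiplier * cut   # a smaller economy yields less tax
    new_debt = debt - (cut - lost_tax)        # the deficit improves by less than the cut
    print(f"multiplier {multiplier}: debt/GDP {debt/gdp:.1%} -> {new_debt/new_gdp:.1%}")

With a multiplier of 0.5 the ratio edges down (to about 89.6%), but at 1.3 it actually rises to about 90.7%: the cut shrinks the economy faster than it shrinks the debt, which is the self-defeating dynamic the IMF staff belatedly recognized.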
Another takeaway, though, is central to Tooze's point in the final section of the book: the institutionalists in the United States won the war on financial collapse via massive state interventions to support banks and the financial system, a model that Europe grudgingly had to follow when attempting to reject it caused vast suffering while still failing to stabilize the financial system. But both did so via actions that were profoundly and obviously unfair, and only questionably legal. Bankers suffered few consequences for their greed and systematic mismanagement, taking home their normal round of bonuses while millions of people lost their homes and unemployment rates for young men in some European countries exceeded 50%. In Europe, the troika's political pressure against Greece and Italy was profoundly anti-democratic. The financial elite achieved their goal of saving the financial system. It could have failed, that failure would have been catastrophic, and their actions are defensible on pragmatic grounds. But they completely abandoned the moral high ground in the process. The political forces opposed to centrist neoliberalism attempted to step into that moral gap. On the Left, that came in the form of mass protest movements, Occupy Wall Street, Bernie Sanders, and parties such as Syriza in Greece. The Left, broadly, took the moral side of debtors, holding that the primary pain of the crisis should instead be borne by the wealthy creditors who were more able to absorb it. The Right, by contrast, in the form of the Tea Party movement inside the Republican Party in the United States and the nationalist parties in Europe, broadly blamed debtors for taking on excessive debt and focused their opposition on the use of taxpayer dollars to bail out investment banks and other institutions of the rich. Tooze correctly points out that the Right's embrace of racist nationalism and incoherent demagoguery obscures the fact that their criticism of the elite center has real merit and is partly shared by the Left. As Tooze sketches out, the elite centrist consensus held in most of Europe, beating back challenges from both the Left and the Right, although it faltered in the UK, Poland, and Hungary. In the United States, the Democratic Party similarly solidified around neoliberalism and saw off its challenges from the Left. The Republican Party, however, essentially abandoned the centrist position, embracing the Right. That left the Democratic Party as the sole remaining neoliberal institutionalist party, supplemented by a handful of embattled Republican centrists. Wall Street and its money swung to the Democratic Party, but that money was deeply unpopular on both the Left and the Right, and the shift may have hurt the Democrats more than it helped. The Democrats, by not abandoning the center, bore the brunt of the residual anger over the bank bailout and subsequent deep recession. Tooze sees in that part of the explanation for Trump's electoral victory over Hillary Clinton. This review is already much too long, and I haven't even mentioned Tooze's clear explanation of the centrality of treasury bonds to world finances, or his discussions of Russia and Ukraine, China, or Brexit, all of which I thought were excellent. This is not only a comprehensive history of both of the crises and the international politics of the time period.
It is also a thought-provoking look at the drastic interventions required to keep the supposedly free market working, at who is left to suffer after those interventions, and at the political consequences of the choice to prioritize the stability of a deeply inequitable and unsafe financial system. At least in the United States, there is now a major political party that is likely to oppose even mundane international financial institutions, let alone another major intervention. The neoliberal center is profoundly weakened. But nothing has been done to untangle the international financial system, and little has been done to reduce its risk. The world will go into the next financial challenge still suffering from a legitimacy crisis. Given the miserly, condescending, and dismissive treatment of the suffering general populace after moving heaven and earth to save the banking system, that legitimacy crisis is arguably justified, but an uncontrolled crash of the financial system is not likely to be any kinder to the average citizen than it is to the investment bankers. Crashed is not the best-written book at a sentence-by-sentence level. Tooze's prose is choppy and a bit awkward, and his paragraphs occasionally wander away from a clear point. But the content is excellent and thought-provoking, filling in large sections of the crisis picture that I had not previously been aware of and making a persuasive argument for its continuing effects on current politics. Recommended if you're not tired of reading about financial crises. Rating: 8 out of 10

29 December 2021

Chris Lamb: Favourite books of 2021: Memoir/biography

Just as I did for 2020, I won't publicly disclose exactly how many books I read in 2021, but they evidently provoked enough thoughts that I felt it worth splitting my yearly writeup into separate posts. I will reveal, however, that I got through more books than the previous year, and, like before, I enjoyed the books I read this year even more in comparison. How much of this is due to refining my own preferences over time, and how much can be ascribed to feeling less pressure to read particular books? It's impossible to say, and the question is complicated further by the fact I found many of the classics I read well worthy of their entry into the dreaded canon. But enough of the throat-clearing. In today's post I'll be looking at my favourite books filed under memoir and biography, in no particular order. Books that just missed the cut here include: Bernard Crick's celebrated 1980 biography of George Orwell, if nothing else because it was a pleasure to read; Hilary Mantel's exhilaratingly bitter early memoir, Giving up the Ghost (2003); and Patricia Lockwood's hilarious Priestdaddy (2017). I also had a soft spot for Tim Kreider's We Learn Nothing (2012), despite not knowing anything about the author in advance, likely a sign of good writing. The strangest book in this category I read was definitely Michelle Zauner's Crying in H Mart. Based on a highly-recommended 2018 essay in the New Yorker, its rich broth of genuine yearning for a departed mother made my eyebrows rise numerous times when I encountered inadvertent extra details about Zauner's relationships.

Beethoven: A Life in Nine Pieces (2020) Laura Tunbridge Whilst it might immediately present itself as a clickbait conceit, organising an overarching narrative around just nine compositions by Beethoven turns out to be an elegant way of saying something fresh about this grizzled old bear. Some of Beethoven's most famous compositions are naturally included in the nine (eg. the Eroica and the Hammerklavier piano sonata), but the book raises itself above conventional Beethoven fare when it highlights, for instance, his Septet, Op. 20, an early work that is virtually nobody's favourite Beethoven piece today. The insight here is that it was widely popular in its time, played again and again around Vienna for the rest of his life. No doubt many contemporary authors can relate to this inability to escape being artistically haunted by an earlier runaway success. The easiest way to say something interesting about Beethoven in the twenty-first century is to talk about the myth of Beethoven instead. Or, as Tunbridge implies, perhaps that should really be 'Beethoven' in leaden quotation marks, given that so much of what we think we know about the man is a quasi-fictional construction. Take Anton Schindler, Beethoven's first biographer and occasional amanuensis, who destroyed and fabricated details about Beethoven's life, casting himself in a favourable light and exaggerating his influence with the composer. Only a few decades later, the idea of a 'heroic' German was to be politically useful as well; the Anglosphere often needs reminding that Germany did not exist as a nation-state prior to 1871, so it should be unsurprising to us that the late nineteenth century saw a determined attempt to create a uniquely 'German' culture ex nihilo. (And the less we say about Immortal Beloved the better, even though I treasure that film.) Nevertheless, Tunbridge cuts through Beethoven's substantial legacy with surgical precision that not only avoids feeling like it is settling a score, but also does so in a way that is unlikely to so alienate anyone emotionally dedicated to some already-established idea of the man that they bring forth the tediously predictable sentiment that Beethoven has 'gone woke'. With Alex Ross's recent book on the cult of Wagner, it seems that books about the 'myth of X' are somewhat in vogue right now. And this pattern within classical music might fit into some broader trend of deconstruction in popular non-fiction too, especially when we consider the numerous contemporary books on the long hangover of the Civil Rights era (Robin DiAngelo's White Fragility, etc.), the multifarious ghosts of Empire (Akala's Natives, Sathnam Sanghera's Empireland, etc.) or even the 'transmogrification' of George Orwell into myth. But regardless of its place in some wider canon, A Life in Nine Pieces is beautifully printed in hardback form (worth acquiring for that very reason alone), and it is one of the rare good books about classical music that can be recommended to both the connoisseur and the layperson alike.

Sea State (2021) Tabitha Lasley In her mid-30s and jerking herself out of a terrible relationship, Tabitha Lasley left London and put all her savings into a six-month lease on a flat within a questionable neighbourhood in Aberdeen, Scotland. She left to make good on a lukewarm idea for a book about oil rigs and the kinds of men who work on them: "I wanted to see what men were like with no women around," she claims. The result is Sea State, a forthright examination of the life of North Sea oil riggers, and an unsparing portrayal of loneliness, masculinity, female desire and the decline of industry in Britain. (It might almost be said that Sea State is an update of a sort to George Orwell's visit to the mines in the North of England.) As bracing as the North Sea air, Sea State spoke to me on multiple levels, but I found it additionally interesting to compare and contrast with Julian Barnes' The Man in the Red Coat (see below). Women writers are rarely thought to be using fiction for higher purposes: it is assumed that, unlike men, whatever women commit to paper is confessional without any hint of artfulness. Indeed, it seems to me that the reaction against the decades-old genre of autofiction only really took hold when it became the domain of millennial women. (By contrast, as a 75-year-old male writer with a firmly established reputation in the literary establishment, Julian Barnes is allowed wide latitude in what he does with his sources, and his writing can be imbued with supremely confident airs as a result.) Furthermore, women are rarely allowed metaphor or exaggeration for dramatic effect, and they certainly aren't permitted to emphasise darker parts in order to explore them... hence some of the transgressive gratification of reading Sea State. Sea State is admittedly not a work of autofiction, but the sense that you are reading about an author writing a book is pleasantly unavoidable throughout. It frequently returns to the topic of oil workers who live multiple lives, and Lasley admits to living two lives herself: she may be in love but she's also on assignment, and a lot of the pleasure in this candid and remarkably accessible book lies in the way these states become slowly inseparable.

Twilight of Democracy (2020) Anne Applebaum For the uninitiated, Anne Applebaum is a staff writer for The Atlantic magazine who won a Pulitzer Prize for her 2004 book on the Soviet Gulag system. Her latest book, however, Twilight of Democracy, is part memoir and part political analysis, and discusses democratic decline and the rise of right-wing populism. This, according to Applebaum, displays distinctly authoritarian tendencies, and who am I to disagree? Applebaum does this through three main case studies (Poland, the United Kingdom and the United States), but the book also touches on Hungary as well. The strongest feature of this engaging book is that Applebaum's analysis focuses on the intellectual classes and how they provide significant justification for a descent into authoritarianism. This is always an important point to remember, especially as much of the folk understanding of the rise of authoritarian regimes tends to place exaggerated responsibility on the ordinary and everyday citizen: the blame placed on the working class in the Weimar Republic or the scorn heaped upon the 'white trash' of the contemporary Rust Belt, for example. Applebaum is uniquely poised to discuss these intellectuals because, well, she actually knows a lot of them personally. Or at least, she used to know them. Indeed, the narrative of the book revolves around two parties she hosted, both in the same house in northwest Poland. The first party, on 31 December 1999, was attended by friends from around the Western world, but most of the guests were Poles from the broad anti-communist alliance. They all agreed about democracy, the rule of law and the route to prosperity whilst toasting in the new millennium. (I found it amusing to realise that War and Peace also starts with a party.) But nearly two decades later, many of the attendees have ended up as supporters of the problematic 'Law and Justice' party which currently governs the country. Applebaum would now cross the road to avoid them, and they would do the same to her, never mind behaving cordially at a reception. The result of this autobiographical detail is that by personalising the argument, Applebaum avoids the trap of making too much of a high-minded abstract argument for 'democracy', and additionally makes her book compellingly spicy too. Yet the strongest part of this book is also its weakest. By individualising the argument, it often feels that Applebaum is settling a number of personal scores. She might be very well justified in doing this, but at times it feels like the reader has walked in halfway through some personal argument and is being asked to judge who is in the right. Furthermore, Applebaum's account of contemporary British politics sometimes deviates into the cartoonish: nothing was egregiously incorrect in any of her summations, but her explanation of the Brexit referendum result didn't read as completely sound. Nevertheless, this is a lively and entertaining book that can be read with profit, even if you disagree with significant portions of it, and its highly personal approach makes it a refreshing change from similar contemporary political analysis (eg. David Runciman's How Democracy Ends) which reaches for a more 'objective' line.

The Man in the Red Coat (2019) Julian Barnes As rich as the eponymous red coat that adorns its cover, Julian Barnes' quasi-biography of French gynaecologist Samuel-Jean Pozzi (1846-1918) is at once illuminating, perplexing and downright hilarious. Yet even that short description is rather misleading, for this book evades classification in all manner of ways. For instance, it is unclear that, with the biographer's narrative voice so obviously manifest, it is even a biography in the useful sense of the word. After all, doesn't the implied pact between author and reader require the biographer to at least pretend that they are hiding from the reader? Perhaps this is just what happens when an author of very fine fiction turns his hand to non-fiction history, and, if so, it represents a deeper incursion into enemy territory after his 1984 metafictional Flaubert's Parrot. Indeed, upon encountering an intriguing mystery in Pozzi's life crying out for a solution, Barnes baldly turns to the reader, winks and states: "These matters could, of course, be solved in a novel." Well, quite. Perhaps Barnes' broader point is that, given that it's impossible for the author to completely melt into air, why not simply put down your cards and have a bit of fun whilst you're at it? If there's any biography that makes the case for a rambling and lightly polemical treatment, then it is this one. Speaking of having fun, however, two qualities you do not expect in a typical biography are how witty it can be and how much of the whiff of the thriller it has about it. A bullet might be mentioned in an early chapter, but given that the name and history of Monsieur Pozzi are not widely known, one is unlikely to learn how he lived his final years until the closing chapters. (Or what happened to that turtle.) Humour is incorporated into the book in two main ways: first, by explicitly citing the various wits of the day ("What is a vice? Merely a taste you don't share." etc.), but perhaps more powerful are the gentle ironies, bons mots and observations in Barnes' entirely unflappable prose style, along with the satire implicit in him writing this moreish pseudo-biography to begin with. The opening page, with its steadfast refusal to even choose where to begin, is somewhat characteristic of Barnes' method, so if you don't enjoy the first few pages then you are unlikely to like the rest. (Indeed, the whole enterprise may be something of an acquired taste. Like Campari.) For me, though, I was left wryly grinning and often couldn't wait to turn the page. Indeed, at times it reminded me of being at a dinner party with an extremely charming guest at the very peak of his form as a wit and raconteur, delighting the party with his rambling yet well-informed discourse on his topic du jour. A significant book, and a book of significance.

28 December 2021

Russ Allbery: Review: Out of Office

Review: Out of Office, by Charlie Warzel & Anne Helen Petersen
Publisher: Alfred A. Knopf
Copyright: 2021
ISBN: 0-593-32010-7
Format: Kindle
Pages: 260
Out of Office opens with the provocative assertion that you were not working from home during the pandemic, even if you were among the 42% of Americans who were able to work remotely.
You were, quite literally, doing your job from home. But you weren't working from home. You were laboring in confinement and under duress. Others have described it as living at work. You were frantically tapping out an email while trying to make lunch and supervise distance learning. You were stuck alone in a cramped apartment for weeks, unable to see friends or family, exhausted, and managing a level of stress you didn't know was possible. Work became life, and life became work. You weren't thriving. You were surviving.
The stated goal of this book is to reclaim the concept of working from home, not only from the pandemic, but also from the boundary-destroying metastasis of work into non-work life. It does work towards that goal, but the description of what would be required for working from home to live up to its promise becomes a sweeping critique of the organization and conception of work, leaving it nearly as applicable to those who continue working from an office. Turns out that the main problem with working from home is the work part, not the "from home" part. This was a fascinating book to read in conjunction with A World Without Email. Warzel and Petersen do the structural and political analysis that I sometimes wish Newport would do more of, but as a result offer less concrete advice. Both, however, have similar diagnoses of the core problems of the sort of modern office work that could be done from home: it's poorly organized, poorly managed, and desperately inefficient. Rather than attempting to fix those problems, which is difficult, structural, and requires thought and institutional cooperation, we're compensating by working more. This both doesn't work and isn't sustainable. Newport has a background in productivity books and a love of systems and protocols, so his focus in A World Without Email is on building better systems of communication and organization of work. Warzel and Petersen come from a background of reporting and cultural critique, so they put more focus on power imbalances and power-serving myths about the American dream. Where Newport sees an easy-to-deploy ad hoc work style that isn't fit for purpose, Warzel and Petersen are more willing to point out intentional exploitation of workers in the guise of flexibility. But they arrive at some similar conclusions. The way office work is organized is not leading to more productivity. Tools like Slack encourage the public performance of apparent productivity at the cost of the attention and focus required to do meaningful work. And the process is making us miserable. Out of Office is, in part, a discussion of what would be required to do better work with less stress, but it also shares a goal with Newport and some (but not most) corners of productivity writing: spend less time and energy on work. The goal of Out of Office is not to get more work done. It's to work more efficiently and sustainably and thus work less. To reclaim the promise of flexibility so that it benefits the employee and not the employer. To recognize, in the authors' words, that the office can be a bully, locking people in to commute schedules and unnatural work patterns, although it also provides valuable moments of spontaneous human connection. Out of Office tries to envision a style of work that includes the office sometimes, home sometimes, time during the day to attend to personal chores or simply to take a mental break from an unnatural eight hours (or more) of continuous focus, universal design, real worker-centric flexibility, and an end to the constant productivity ratchet where faster work simply means more work for the same pay. That's a lot of topics for a short book, and structurally this is a grab bag. Some sections will land and some won't. Loom's video messages sound like a nightmare to me, and I rolled my eyes heavily at the VR boosterism, reluctant as it may be.
The section on DEI (diversity, equity, and inclusion) was a valiant effort that at least gestures towards the dismal track record of most such efforts, but still left me unconvinced that anyone knows how to improve diversity in an existing organization without far more brute-force approaches than anyone with power is usually willing to consider. But there's enough here, and the authors move through topics quickly enough, that a section that isn't working for you will soon be over. And some of the sections that do work are great. For example, the whole discussion of management.
Many of these companies view middle management as bloat, waste, what David Graeber would call a "bullshit job." But that's because bad management is a waste; you're paying someone more money to essentially annoy everyone around them. And the more people experience that sort of bad management, and think of it as "just the way it is," the less they're going to value management in general.
I admit to a lot of confirmation bias here, since I've been ranting about this for years, but management must be the most widespread professional job for which we ignore both training and capability and assume that anyone who can do any type of useful work can also manage people doing that work. It's simply not true, it creates workplaces full of horrible management, and that in turn creates a deep and unhelpful cynicism about all management. There is still a tendency on the left to frame this problem in terms of class struggle, on the reasonable grounds that for decades under "scientific management" of manufacturing that's what it was. Managers were there to overwork workers and extract more profits for the owners, and labor unions were there to fight back against managers. But while some of this does happen in the sort of office work this book is focused on, I think Warzel and Petersen correctly point to a different cause.
"The reason she was underpaid on the team was not because her boss was cackling in the corner. It was because nobody told the boss it was their responsibility to look at the fucking spreadsheet."
We don't train managers, we have no clear expectations for what managers should do, we don't meaningfully measure their performance, we accept a high-overhead and high-chaos workstyle based on ad hoc one-to-one communication that de-emphasizes management, and many managers have never seen good management and therefore have no idea what they're supposed to be doing. The management problem for many office workers is less malicious management than incompetent management, or simply no effective management at all apart from an occasional reorg and a complicated and mind-numbing annual review form. The last section of this book (apart from concluding letters to bosses and workers) is on community, and more specifically on extracting time and energy from work (via the roadmap in previous chapters) and instead investing it in the people around you. Much ink has been spilled about the collapse of American civic life, about how we went from a nation of joiners to a nation of isolated individual workers with weak and failing community institutions. Warzel and Petersen correctly lay some blame for this at the foot of work, and see the reorganization of work and an increase in work from home (and thus a decrease in commutes) as an opportunity to reverse that trend. David Brooks recently filled in for Ezra Klein on his podcast and talked with University of Chicago professor Leon Kass, a conversation I listened to shortly after reading this book. In one segment, they talked about marriage and complained about the decline in marriage rates. They were looking for causes in people's moral upbringing, in their life priorities, in the lack of aspiration for permanence in kids these days, and in any other personal or moral failing that would allow them to be smugly judgmental. It was a truly remarkable thing to witness. Neither man at any point in the conversation mentioned either money or time. Back in the world most Americans live in, real wages have been stagnant for decades, student loan debt is skyrocketing as people desperately try to keep up with the ever-shifting requirements for a halfway-decent job, and work has expanded to fill all hours of the day, even for people who don't have to work multiple jobs to make ends meet. Employers have fully embraced a "flexible" workforce via layoffs, micro-optimizing work scheduling, eliminating benefits, relying on contract and gig labor, and embracing exceptional levels of employee turnover. The American worker has far less money, time, and stability, three important foundations for marriage and family as well as participation in most other civic institutions. People like Brooks and Kass stubbornly cling to their feelings of moral superiority instead of seeing a resource crisis. Work has stolen the resources that people previously put into those other areas of their life. And it's not even using those resources effectively. That's, in a way, a restatement of the topic of this book. Our current way of organizing work is not sustainable, healthy, or wise. Working from home may be part of a strategy for changing it. The pandemic has already heavily disrupted work, and some of those changes, including increased working from home, seem likely to stick. That provides a narrow opportunity to renegotiate our arrangement with work and try to make those changes stick. I largely agree with the analysis, but I'm pessimistic. I think the authors are as well. We're very bad at social change, and there will be immense pressure for everything to go "back to normal."
Those in the best bargaining position to renegotiate work for themselves are not in the habit of sharing that renegotiation with anyone else. But I'm somewhat heartened by how much public discussion there currently is about a more fundamental renegotiation of the rules of office work. I'm also reminded of a deceptively profound aphorism from economist Herbert Stein: "If something cannot go on forever, it will stop." This book is a bit uneven and is more of a collection of related thoughts than a cohesive argument, but if you are hungry for more worker-centric analyses of the dynamics of office work (inside or outside the office), I think it's worth reading. Rating: 7 out of 10

12 December 2021

Andrej Shadura: Coffee gear upgrade

Two weeks ago I decided to make myself a combined birthday and Christmas present and upgrade my coffee gear. I got my first espresso machine back in 2013; it was a cheap Saeco Philips Poemia, which made reasonably drinkable coffee, but not being able to make good coffee made me increasingly unhappy about it. However, since it worked, I wasn't motivated enough to change anything until it stopped working. One day the nut holding the shower screen broke, and I couldn't replace it. Having no coffee machine is arguably worse than having a mediocre one, so I started looking for a new one in the budget range. Having spent about two months reading reviews for all sorts of manual espresso machines, I realised the best thing I could probably do for the money I was willing to spend at the time was to buy a second-hand Gaggia Classic. Which is what I did: I paid €260 to a person who apparently decided they prefer to press a button to get their espresso rather than have to prepare it themselves. My first attempts at making espresso weren't very successful, as my hand grinder couldn't produce the right grind for espresso (without using pressurised baskets), so I quickly upgraded it to the €50 De'Longhi electric grinder, which was much better for espresso.
Gaggia Classic 2015 and De'Longhi grinder
This setup worked for me for nearly 4 years, but over time the Gaggia started malfunctioning. See, this particular Gaggia Classic is the 2015 model, the result of a design overhaul after Gaggia was acquired by Philips. They replaced the boiler, changed the exterior design a bit, and, importantly for me, replaced the fully metal group head with the metal and plastic version typically found in cheap espresso machines like my old Poemia.
The Gaggia Classic 2015 group head
The trouble with this one is that the plastic bit (barely visible in the picture, but it's inserted into the notches on the sides of the group head) gets damaged over time, especially when the portafilter is inserted very tightly. The more damaged it gets, the tighter the portafilter has to be inserted to avoid leakage, which damages it further, and so on. At one point, the Gaggia was leaking water every time I made coffee, affecting the quality of the brew and making a mess in the kitchen. I made the mistake of removing the plastic bit, only to realise it cannot be purchased separately and nobody knows how to put it back once it's been removed; I ended up paying more than a hundred euro to replace the group head as a whole. Once I got the Gaggia back, I had become so conscious of the damage I could cause by overtightening the portafilter that I decided it was probably time to get a new coffee machine.
Gaggia Classic 2015 vs 2019
The makers of Gaggia listened to the critics and undid the 2015 changes to the Gaggia Classic design, reverting to the previous one and fixing it: they basically merged the fixes many owners of the old Gaggia had made themselves. The group head is now without any plastic, so I don't have to worry that much about damaging it accidentally. A friend pointed out that my grinder was probably not good enough and recommended a couple of models to me; I checked Kev's Coffee Blog and found a grinder, the Sage Dose Control Pro, which was available on sale in my local shop for a reasonable price. I've also got a portafilter holder to make tamping more comfortable; I used to tamp against the edge of the sink:
The final setup:
Gaggia Classic 2019 and Sage BCG600 Dose Control Pro
What I learnt from this is that the grinder does indeed make a huge difference. I am now able to consistently produce brews I would only occasionally get with the old De'Longhi grinder. There is one downside to the new grinder, though.
Grinder review at Alza.sk
Happy with my new purchase, I went to read this review and thought to myself: lucky me, my grinder is absolutely quiet! And then I realised that the noise in my kitchen was not, in fact, produced by the fridge, but by the grinder. Well, a cheap switched plug solved the issue completely (I wish sockets here each had a switch, like they usually do in the UK!)
Switched plug

11 December 2021

Neil Williams: Diversity and gender

As a follow-on to a previous blog entry of mine, Free and Open, I feel it worthwhile to do my bit to dismantle the pseudo-science and over-simplification in the idea that gender is binary at a biological level.
TL;DR: Science simply does not support binary sexes or binary genders. Truth is a bit more complicated.
There is certainty and there are binary answers in mathematics. Things get less definitive in physics, certainly as soon as quantum is broached. Processes become more of an equilibrium between states in chemistry, never wholly one or the other. Yes, there is the oddity of absolute zero, but no experiment has yet achieved that fully. It is accurate to describe physics as a development of applied mathematics and to view chemistry as applied physics. Biology, at the biochemical level, is applied chemistry. The sciences build on each other, "on the shoulders of giants", but at each level, some certainty is lost, some amount of uncertainty is expanded and measurements become probabilities, proportions and percentages. Biology is dependent on biochemistry - chemistry is how a biological change results in a different organism. Physics is how that chemical change occurs - temperature, pressure and physical states are inherent to all chemical changes. Outside laboratory constraints, few chemical reactions, especially in organic chemistry, produce one and only one result from two or more known reagents. In biology, everyone is familiar with genetic mutations, but a genetic mutation only happens because a biochemical reaction (hydrogen bonding of nucleobases) does not always produce the expected result. With every cell division and every viral infection, there is a finite probability that a change will occur. It might be a small number but it is never zero and can never be dismissed. This is obvious in the current Covid pandemic - genetic mutations result in new variants. Some variants are inviable, some variants produce no net change in the way that the viral particles infect adjacent cells. Sometimes, a mutation happens that changes everything. These mutations are not mistakes - these are simply changes with undetermined outcomes. Genetic changes are the foundation of biodiversity, and variety is what allows lifeforms of all kinds to survive changes in environmental factors and/or changes in prevalent diseases. It is precisely the same in humans, particularly in one of the principal spheres of human life that involves replicating genetic material - the creation of gametes for sexual reproduction. Every single time any DNA is copied, there is a finite chance that a different base will be put in place compared to the original. Copying genetic material is therefore non-binary. Given precisely the same initial conditions, the result is not always predictable, and the range of how the results vary from one to another increases with every iteration. Let me stress that - at the molecular level, no genetic operation in any biological lifeform has a truly binary result. Repeat that operation sufficiently often and an unexpected result WILL inevitably occur. It is a mathematical certainty that genetic changes will arise by attempting precisely the same genetic operation enough times. Genetic changes are fundamental to how lifeforms survive changing conditions. Life would likely have died out a long time ago on this planet if every genetic operation was perfect. Diversity is life. Similarity leads to extinction.
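The "mathematical certainty" mentioned above is just the arithmetic of repeated trials. As a sketch, writing $p$ for a generic per-copy error probability (no specific biological rate is assumed here):

$$P(\text{at least one change in } n \text{ copies}) = 1 - (1 - p)^n \ge 1 - e^{-pn}$$

However small $p$ is, the right-hand side approaches 1 once $n$ grows large compared with $1/p$, and the number of copying operations in a dividing cell population or an active viral infection dwarfs that threshold.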
Viral load is interesting at this point. Someone can be infected with a virus, including coronavirus, by encountering a small number of viral particles. For some viruses, it may be a few hundred; some viruses may need a few thousand particles to infect a vulnerable host. But here's the thing: for that host to be at risk of infecting another host, the virus needs the host to produce billions upon billions of copies of the virus by taking over the genetic machinery within a huge number of cells in the host. This, as is accepted with Covid, is before the virus has been copied enough times to produce symptoms in the host. Before those symptoms become serious, billions more copies will be made. The numbers become unimaginable - and that is within a single host, let alone the 265 million (and counting) hosts in the current Covid-19 pandemic. It's also no wonder that viral infections cause tiredness: the infection is diverting huge resources to propagating itself - before even considering the activity of the immune system. It is idiocy of the highest order to expect all those copies to be identical. The rise of variants is inevitable - indeed essential - in all spheres of biology. A single viral particle is absolutely no threat of any kind - it must first get inside and then copy the genetic information in a host cell. This is where the complexity lies in the definition of life itself. A virus can be considered a lifeform, but it is only able to reproduce using another, more complex, lifeform. In truth, a viral particle does not and cannot mutate. The infected host mutates the virus. The longer it takes that host to clear the infection, the more mutations that host will create and then potentially spread to others. Now apply this to the creation of gametes in humans. With seven billion humans, the amount of copying of genetic material is not as large as in the pandemic, but it is still easy for everyone to understand that children do not merely combine the DNA of both parents. Changes happen. Human sexual reproduction is not as simple as 1 + 1 = 2. Sometimes, the copying of the genetic material produces an unexpected result. Sexual reproduction itself is non-binary. Sexual reproduction is not easy or simple for lifeforms to adopt - the diversity which results from the non-binary operations is exactly why so many lifeforms invest so much energy in reproducing in this way. Whilst many genetic changes in humans will be benign or beneficial, I'd like to take an example of a genetic disorder that results from the non-binary nature of sex. Humans can be born with the XY karyotype - i.e. at a genetic level, the individual has the same combination of chromosomes as another XY individual, but there are changes within the genes in those chromosomes. We accept this: some children of blonde parents do not have blonde hair, etc. There are also genetic changes where an XY karyotype does not produce a binary outcome. Some people, who at a genetic level would be almost identical to another person who is genetically male, have a genetic mutation which makes it impossible for the cells of that individual to respond to androgens (testosterone). (See Androgen insensitivity syndrome.) Genetically, that individual has an X and a Y chromosome, just like many other individuals. However, due to a change in how the genes on those chromosomes were copied, that individual is biologically incapable of developing the secondary sexual characteristics of a male. At a genetic level, the individual has the XY karyotype of a male. At the physical level, the individual has all the sexual characteristics of a female and none of the sexual characteristics of a male. The gender of that individual is not binary.
Treatment is centred on supporting the individual and minimising some risks from the inactive genes on the Y chromosome. Human sexual reproduction is non-binary. The results of any sexual reproduction in humans will not always produce the binary option of male or female. It is a lie to claim that human gender is binary. The science is in plain view and cannot be ignored. Identifying as non-binary is not a "cop out" - it can be a biological, genetic, scientific fact. Human sexuality and gender are malleable. Where genetic changes result in symptoms, these can be ameliorated by treatment with human sex hormones, like oestrogen and testosterone. There are valid medical uses for anabolic steroids and hormone replacement therapies to help individuals who, at a genetic level, have non-binary gender. These treatments can help align the physical outer signs with the personality and identity of the individual, whether with or without surgery. It is unacceptable to abandon such people to suffer lifelong discrimination and harassment by imposing a binary definition that has no basis in science. When a human being has an XY karyotype, that human being is not necessarily male. That individual will be on a spectrum from female (left unaffected by sex hormones in the womb, the foetus will be female, even with an X and a Y chromosome) to various degrees of male. So, at a genetic, biological level, it is a scientific fact that human beings do not have binary gender.

There is no evidence that this is new to the modern era; there is no scientific basis for thinking that the copying of genetic material was somehow perfectly reliable in earlier history, or that such mutations are specific to homo sapiens. Changes in genetic material provide the diversity to fight infections and adapt to changing environmental factors. Species have gone and will continue to go extinct if this diversity is absent. With that out of the way, it is no longer a stretch to encompass other aspects of human non-binary genders beyond the known genetic syndromes based on changes in the XY karyotype. Science has not uncovered all of the ways that genes affect personality, behaviour, or identity. How other, less studied, genetic changes affect the much more subtle human facets, especially anything to do with consciousness, identity, personality, sexuality and behaviour, is guesswork. All of these facets can be, and likely are being, affected by genetic factors as well as environmental factors in an endless range of permutations. Personality traits are a beautiful and largely unknowable blend of genes and environment. Genetic information has a finite probability of changing at each and every iteration. Environmental factors are more akin to chaos theory. The idea that the results will fit into binary constructs is laughable.

Human society puts huge emphasis on societal norms. Individuals who do not fit into those norms suffer discrimination. The norms themselves have evolved over time as a response to various influences on human civilisation, but most are not based on science. It is up to all humans in that society to call out discrimination, to call for changes in the accepted norms, and to support those who are marginalised. It is a precarious balance, one that humans rarely get right, but it must be based on an acceptance that variation is the natural state. Artificial constraints, like binary genders, must be dismantled because human beings and human sexual reproduction are not binary. To those who think, "well it is for 99%", think again about Covid.
99% (or closer to 98%) of infected humans recover without notable after-effects. That has still crippled the nations of the globe and humbled all those who tried to deny it. Five million human beings are dead because "most infected people recover". Just because something only affects a proportion of human beings does not invalidate the suffering of those humans or the discrimination that those humans will face. Societal norms are not necessarily correct. Religious and other influences typically obscure and ignore scientific fact and undermine human kindness. The scientific truth of life on this planet is that gender is not binary. The more complex the lifeform, the more factors will affect where on the spectrum any one individual will appear. Just because we do not yet fully understand how genes affect human personality and sexuality does not invalidate the science that variation is the natural order. My previous blog about diversity is not just about male vs female, one nationality vs another, one ethnicity compared to another. Diversity is diverse. Diversity requires accepting that every facet of humanity is subject to variation. That leads to tension at times; it is inevitable. Tension against societal norms, tension against discrimination, tension around those individuals who would abuse the tolerance of others for their own gratification or from their own ignorance. None of us are perfect, none of us have any of this fully sorted, and all of us will make mistakes. Personally, I try to respect those around me. I will use whatever pronouns and other conventions the person requests, from their perspective and not mine. To do otherwise is to deny the natural order and to deny the science. Celebrate all diversity; it is the very stuff of life.

The discussions around (typically female) bathroom facilities often miss the point. The concern is not about individuals who describe themselves as non-binary. The concern is about individuals who are fully certain of their own sexuality and who act as sexual predators for their own gratification. These people are acting out a lie for their own ends. The problem people are the predators, so stop blaming the victims, who are just as at risk as anyone else who identifies as female. Maybe the best people to spot such predators are those who are non-binary, who have had to pretend to fit into societal norms. Just as travel can be a good antidote to racism, openness and discussion can be a tool to undermine the lies of sexual predators and reassure those who are justifiably fearful. There can never be a biological binary test of gender; there can never be any scientific justification for a binary division of facilities. Humanity itself is not binary; even life itself has blurry borders around comas, suspended animation and locked-in syndrome. Legal definitions of human death vary around the world. The only common thread I have ever found is: be kind to each other. If you find anything above objectionable, then I can only suggest that you reconsider the science and learn to be kind to your fellow humans. None of us are getting out of this alive.

I Think You'll Find It's a Bit More Complicated Than That - Ben Goldacre, ISBN 978-0-00-750514-2
https://www.amazon.co.uk/dp/B00HATQA8K/
https://en.wikipedia.org/wiki/Androgen_insensitivity_syndrome
https://www.bbc.co.uk/news/world-51235105
https://en.wikipedia.org/wiki/Nucleobase

My degree is in pharmaceutical sciences and I practised community and hospital pharmacy for 20 years before moving into programming.
I have direct experience of supporting people who were prescribed hormones to transition their physical characteristics to match their personal identity. I had a Christian upbringing but my work showed me that those religious norms were incompatible with being kind to others, so I rejected religion and I now consider myself a secular humanist.

3 October 2021

Louis-Philippe Véronneau: ANC is not for me

Active noise cancellation (ANC) has been all the rage lately in the headphones and in-ear monitors market. It seems that after Apple got heavily praised for their AirPods Pro, every somewhat serious electronics manufacturer released their own design incorporating this technology. The first headphones with ANC I remember trying on (in the early 2010s) were the Bose QuietComfort 15. Although the concept did work (they indeed cancelled some sounds), they weren't amazing and did a great job of convincing me ANC was some weird fad for people who flew often.
(Photo: the Sony WH-1000X M3 folded in their case)
As the years passed, chip size decreased, battery capacity improved and machine learning blossomed: truly a perfect storm for the wireless ANC headphones market. I had mostly stayed a sceptic of this tech until recently, when a kind friend offered to let me try a pair of Sony WH-1000X M3. Having tested them thoroughly, I have to say I'm really tempted to buy them from him, as they truly are fantastic headphones [1]. They are very light, comfortable, work without a proprietary app and sound very good with the ANC on [2], if a little bass-heavy for my taste [3]. The ANC itself is truly astounding and is leaps and bounds beyond what was available five years ago. It still isn't perfect and doesn't cancel ALL sounds, but it transforms the low hum of the subway I find myself sitting in too often these days into a light *swoosh*. When you turn the ANC on, HVAC simply disappears. Most impressive to me is the way they completely cancel the dreaded sound of your footsteps resonating in your headphones when you walk with them.
(Photo: my old pair of Sennheiser HD 280 Pro, with aftermarket sheepskin earpads)
I won't be keeping them though. Whilst I really like what Sony has achieved here, I've grown to understand ANC simply isn't for me. Some of the drawbacks of ANC somewhat bother me: the ear pressure it creates is tolerable, but it is an additional energy drain over long periods of time and eventually gives me headaches. I've also found ANC accentuates the motion sickness I suffer from, probably because it messes with some part of the inner ear balance system. Most of all, I found that it didn't provide noticeable improvements over good passive noise cancellation solutions, at least in terms of how high I have to turn the volume up to hear music or podcasts clearly. The human brain works in mysterious ways, and it seems ANC cancelling one class of noises (low hums, constant noises, etc.) makes other noises much more noticeable. People talking or bursty high-pitched noises bothered me much more with ANC on than without. So for now, I'll keep using my trusty Sennheiser HD 280 Pro [4] at work and good in-ear monitors with Comply foam tips on the go.

  1. This blog post certainly doesn't aim to be a comprehensive review of these headphones. See Zeos' review if you want something more in-depth.
  2. Like most ANC headphones, they don't sound as good when used passively through the 3.5 mm port, but that's just a testament to the great job Sony did of tuning the DSP.
  3. Easily fixed using an EQ.
  4. Retrofitted with aftermarket sheepskin earpads, they provide more than 32 dB of passive noise reduction.

2 August 2021

Colin Watson: Launchpad now runs on Python 3!

After a very long porting journey, Launchpad is finally running on Python 3 across all of our systems. I wanted to take a bit of time to reflect on why my emotional responses to this port differ so much from those of some others who've done large ports, such as the Mercurial maintainers. It's hard to deny that we've had to burn a lot of time on this, which I'm sure has had an opportunity cost, and from one point of view it's essentially running to stand still: there is no single compelling feature that we get solely by porting to Python 3, although it's clearly a prerequisite for tidying up old compatibility code and being able to use modern language facilities in the future. And yet, on the whole, I found this a rewarding project and enjoyed doing it. Some of this may be because by inclination I'm a maintenance programmer and actually enjoy this sort of thing. My default view tends to be that software version upgrades may be a pain but it's much better to get that pain over with as soon as you can rather than trying to hold back the tide; you can certainly get involved and try to shape where things end up, but rightly or wrongly I can't think of many cases when a righteously indignant user base managed to arrange for the old version to be maintained in perpetuity so that they never had to deal with the new thing (OK, maybe Perl 5 counts here). I think a more compelling difference between Launchpad and Mercurial, though, may be that very few other people really had a vested interest in what Python version Launchpad happened to be running, because it's all server-side code (aside from some client libraries such as launchpadlib, which were ported years ago). As such, we weren't trying to do this with the internet having Strong Opinions at us. We were doing this because it was obviously the only long-term-maintainable path forward, and in more recent times because some of our library dependencies were starting to drop support for Python 2 and so it was obviously going to become a practical problem for us sooner or later; but if we'd just stayed on Python 2 forever then fundamentally hardly anyone else would really have cared directly, only maybe about some indirect consequences of that. I don't follow Mercurial development so I may be entirely off-base, but if other people were yelling at me about how late my project was to finish its port, that in itself would make me feel more negatively about the project even if I thought it was a good idea. Having most of the pressure come from ourselves rather than from outside meant that wasn't an issue for us. I'm somewhat inclined to think of the process as an extreme version of paying down technical debt. Moving from Python 2.7 to 3.5, as we just did, means skipping over multiple language versions in one go, and if similar changes had been made more gradually it would probably have felt a lot more like the typical dependency update treadmill. I appreciate why not everyone might want to think of it this way: maybe this is just my own rationalization.

Reflections on porting to Python 3
I'm not going to defend the Python 3 migration process; it was pretty rough in a lot of ways. Nor am I going to spend much effort relitigating it here, as it's already been done to death elsewhere, and as I understand it the core Python developers have got the message loud and clear by now.
At a bare minimum, a lot of valuable time was lost early in Python 3's lifetime hanging on to flag-day-type porting strategies that were impractical for large projects, when it should have been providing for "bilingual" strategies (code that runs in both Python 2 and 3 for a transitional period), which is where most libraries and most large migrations ended up in practice. For instance, the early advice to library maintainers to maintain two parallel versions or perhaps translate dynamically with 2to3 was entirely impractical in most non-trivial cases and wasn't what most people ended up doing, and yet the idea that "2to3 is all you need" still floats around Stack Overflow and the like as a result. (These days, I would probably point people towards something more like Eevee's porting FAQ as somewhere to start.) There are various fairly straightforward things that people often suggest could have been done to smooth the path, and I largely agree: not removing the u'' string prefix only to put it back in 3.3, fewer gratuitous compatibility breaks in the name of tidiness, and so on. But if I had a time machine, the number one thing I would ask to have been done differently would be introducing type annotations in Python 2 before Python 3 branched off. It's true that it's technically possible to do type annotations in Python 2, but the fact that it's a different syntax that would have to be fixed later is offputting, and in practice it wasn't widely used in Python 2 code. To make a significant difference to the ease of porting, annotations would need to have been introduced early enough that lots of Python 2 library code used them so that porting code didn't have to be quite so much of an exercise of manually figuring out the exact nature of string types from context. Launchpad is a complex piece of software that interacts with multiple domains: for example, it deals with a database, HTTP, web page rendering, Debian-format archive publishing, and multiple revision control systems, and there's often overlap between domains. Each of these tends to imply different kinds of string handling. Web page rendering is normally done mainly in Unicode, converting to bytes as late as possible; revision control systems normally want to spend most of their time working with bytes, although the exact details vary; HTTP is of course bytes on the wire, but Python's WSGI interface has some string type subtleties. In practice I found myself thinking about at least four string-like types (that is, things that in a language with a stricter type system I might well want to define as distinct types and restrict conversion between them): bytes, text, ordinary native strings (str in either language, encoded to UTF-8 in Python 2), and native strings with WSGI's encoding rules. Some of these are emergent properties of writing in the intersection of Python 2 and 3, which is effectively a specialized language of its own without coherent official documentation, whose users must intuit its behaviour by comparing multiple sources of information, or by referring to unofficial porting guides: not a very satisfactory situation. Fortunately much of the complexity collapses once it becomes possible to write solely in Python 3.
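To make the distinction concrete, here is a minimal sketch of those string-like types (my illustration, not Launchpad code), which runs on both interpreters:

    # Illustrative only: the string-like types that had to be kept straight.
    import sys

    b = b'bytes'    # bytes on both Python 2 and 3
    t = u'text'     # unicode on Python 2, str on Python 3
    n = 'native'    # the "native string": bytes on 2, text on 3

    if sys.version_info[0] >= 3:
        assert isinstance(n, str) and not isinstance(n, bytes)
    else:
        assert isinstance(n, bytes)
    # WSGI adds a fourth flavour: native strings again, but on Python 3
    # they must round-trip through ISO-8859-1, per PEP 3333.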
Some of the difficulties we ran into are not ones that are typically thought of as Python 2-to-3 porting issues, because they were changed later in Python 3's development process. For instance, the email module was substantially improved in around the 3.2/3.3 timeframe to handle Python 3's bytes/text model more correctly, and since Launchpad sends quite a few different kinds of email messages and has some quite picky tests for exactly what it emits, this entailed a lot of work in our email sending code and in our test suite to account for that. (It took me a while to work out whether we should be treating raw email messages as bytes or as text; bytes turned out to work best.) 3.4 made some tweaks to the implementation of quoted-printable encoding that broke a number of our tests in ways that took some effort to fix, because the tests needed to work on both 2.7 and 3.5. The list goes on. I got quite proficient at digging through Python's git history to figure out when and why some particular bit of behaviour had changed.

One of the thorniest problems was parsing HTTP form data. We mainly rely on zope.publisher for this, which in turn relied on cgi.FieldStorage; but cgi.FieldStorage is badly broken in some situations on Python 3. Even if that bug were fixed in a more recent version of Python, we can't easily use anything newer than 3.5 for the first stage of our port due to the version of the base OS we're currently running, so it wouldn't help much. In the end I fixed some minor issues in the multipart module (and was kindly given co-maintenance of it) and converted zope.publisher to use it. Although this took a while to sort out, it seems to have gone very well.

A couple of other interesting late-arriving issues were around pickle. For most things we normally prefer safer formats such as JSON, but there are a few cases where we use pickle, particularly for our session databases. One of my colleagues pointed out that I needed to remember to tell pickle to stick to protocol 2, so that we'd be able to switch back and forward between Python 2 and 3 for a while; quite right, and we later ran into a similar problem with marshal too. A more surprising problem was that datetime.datetime objects pickled on Python 2 require special care when unpickling on Python 3; rather than the approach that ended up being implemented and documented for Python 3.6, though, I preferred a custom unpickler, both so that things would work on Python 3.5 and so that I wouldn't have to risk affecting the decoding of other pickled strings in the session database.
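The protocol pinning itself is a one-liner; this is a generic sketch rather than Launchpad's actual session-database code:

    import pickle

    session = {'user': 'example', 'logged_in': True}
    # Protocol 2 is the highest pickle protocol that Python 2 can read, so
    # writing with it keeps the data readable from both interpreters while
    # both are still in service.
    blob = pickle.dumps(session, protocol=2)
    assert pickle.loads(blob) == session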
General lessons
Writing this over a year after Python 2's end-of-life date, and certainly nowhere near the leading edge of Python 3 porting work, it's perhaps more useful to look at this in terms of the lessons it has for other large technical debt projects. I mentioned in my previous article that I used the approach of an enormous and frequently-rebased git branch as a working area for the port, committing often and sometimes combining and extracting commits for review once they seemed to be ready. A port of this scale would have been entirely intractable without a tool of similar power to git rebase, so I'm very glad that we finished migrating to git in 2019. I relied on this right up to the end of the port, and it also allowed for quick assessments of how much more there was to land. git worktree was also helpful, in that I could easily maintain working trees built for each of Python 2 and 3 for comparison. As is usual for most multi-developer projects, all changes to Launchpad need to go through code review, although we sometimes make exceptions for very simple and obvious changes that can be self-reviewed. Since I knew from the outset that this was going to generate a lot of changes for review, I structured my work to try to make it as easy as possible for my colleagues to review. This generally involved keeping most changes to a somewhat manageable size of 800 lines or less (although this wasn't always possible), and arranging commits mainly according to the kind of change they made rather than their location. For example, when I needed to fix issues with / in Python 3 being true division rather than floor division, I did so in one commit across the various places where it mattered and took care not to mix it with other unrelated changes. This is good practice for nearly any kind of development, but it was especially important here, since it allowed reviewers to consider a clear explanation of what I was doing in the commit message and then skim-read the rest of it much more quickly. It was vital to keep the codebase in a working state at all times, and deploy to production reasonably often: this way, if something went wrong, the amount of code we had to debug to figure out what had happened was always tractable. (Although I can't seem to find it now to link to it, I saw an account a while back of a company that had taken a flag-day approach instead with a large codebase. It seemed to work for them, but I'm certain we couldn't have made it work for Launchpad.) I can't speak too highly of Launchpad's test suite, much of which originated before my time. Without a great deal of extensive coverage of all sorts of interesting edge cases at both the unit and functional level, and a corresponding culture of maintaining that test suite well when making new changes, it would have been impossible to be anything like as confident of the port as we were. As part of the porting work, we split out a couple of substantial chunks of the Launchpad codebase that could easily be decoupled from the core: its Mailman integration and its code import worker. Both of these had substantial dependencies with complex requirements for porting to Python 3, and arranging to be able to do these separately on their own schedule was absolutely worth it. Like disentangling balls of wool, any opportunity you can take to make things less tightly-coupled is probably going to make it easier to disentangle the rest. (I can see a tractable way forward to porting the code import worker, so we may well get that done soon. Our Mailman integration will need to be rewritten, though, since it currently depends on the Python-2-only Mailman 2, and Mailman 3 has a different architecture.)

Python lessons
Our database layer was already in pretty good shape for a port, since at least the modern bits of its table modelling interface were already strict about using Unicode for text columns. If you have any kind of pervasive low-level framework like this, then making it be pedantic at you in advance of a Python 3 port will probably incur much less swearing in the long run, as you won't be trying to deal with quite so many bytes/text issues at the same time as everything else. Early in our port, we established a standard set of __future__ imports and started incrementally converting files over to them, mainly because we weren't yet sure what else to do and it seemed likely to be helpful. absolute_import was definitely reasonable (and not often a problem in our code), and print_function was annoying but necessary. In hindsight I'm not sure about unicode_literals, though. For files that only deal with bytes and text it was reasonable enough, but as I mentioned above there were also a number of cases where we needed literals of the language's native str type, i.e. bytes in Python 2 and text in Python 3: this was particularly noticeable in WSGI contexts, but also cropped up in some other surprising places. We generally either omitted unicode_literals or used six.ensure_str in such cases, but it was definitely a bit awkward, and maybe I should have listened more to people telling me it might be a bad idea.
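For the WSGI-ish cases, the pattern looked roughly like this (an illustrative sketch using six; the function and variable names are mine, not Launchpad's):

    from __future__ import absolute_import, print_function, unicode_literals

    import six

    def set_example_header(environ, value):
        # Under unicode_literals every bare literal is text, but WSGI wants
        # native strings: byte-strings on Python 2, text on Python 3.
        # six.ensure_str converts in whichever direction is needed.
        environ[six.ensure_str('HTTP_X_EXAMPLE')] = six.ensure_str(value)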
A lot of Launchpad's early tests used doctest, mainly in the style where you have text files that interleave narrative commentary with examples. The development team later reached consensus that this was best avoided in most cases, but by then there were far too many doctests to conveniently rewrite in some other form. Porting doctests to Python 3 is really annoying. You run into all the little changes in how objects are represented as text (particularly u'...' versus '...', but plenty of other cases as well); you have next to no tools to do anything useful like skipping individual bits of a doctest that don't apply; using __future__ imports requires the rather obscure approach of adding the relevant names to the doctest's globals in the relevant DocFileSuite or DocTestSuite; dealing with many exception tracebacks requires something like zope.testing.renormalizing; and whatever code refactoring tools you're using probably don't work properly. Basically, don't have done that. It did all turn out to be tractable for us in the end, and I managed to avoid using much in the way of fragile doctest extensions aside from the aforementioned zope.testing.renormalizing, but it was not an enjoyable experience.

Regressions
I know of nine regressions that reached Launchpad's production systems as a result of this porting work; of course there were various other regressions caught by CI or in manual testing. (Considering the size of this project, I count it as a resounding success that there were only nine production issues, and that for the most part we were able to fix them quickly.)

Equality testing of removed database objects
One of the things we had to do while porting to Python 3 was to implement the __eq__, __ne__, and __hash__ special methods for all our database objects. This was quite conceptually fiddly, because doing this requires knowing each object's primary key, and that may not yet be available if we've created an object in Python but not yet flushed the actual INSERT statement to the database (most of our primary keys are auto-incrementing sequences). We thus had to take care to flush pending SQL statements in such cases in order to ensure that we know the primary keys. However, it's possible to have a problem at the other end of the object lifecycle: that is, a Python object might still be reachable in memory even though the underlying row has been DELETEd from the database. In most cases we don't keep removed objects around for obvious reasons, but it can happen in caching code, and buildd-manager crashed as a result (in fact while it was still running on Python 2). We had to take extra care to avoid this problem.
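The shape of the solution was roughly as follows (a simplified sketch, not Launchpad's actual implementation; the store object and its flush method stand in for the real ORM):

    class DatabaseObject(object):
        def __init__(self, store, id=None):
            self.store = store
            self.id = id  # primary key; None until the INSERT is flushed

        def _ensure_id(self):
            if self.id is None:
                # Flush pending SQL so the auto-incremented key is known.
                self.store.flush()

        def __eq__(self, other):
            if type(other) is not type(self):
                return NotImplemented
            self._ensure_id()
            other._ensure_id()
            return self.id == other.id

        def __ne__(self, other):
            result = self.__eq__(other)
            return result if result is NotImplemented else not result

        def __hash__(self):
            self._ensure_id()
            return hash((type(self), self.id))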
Debian imports crashed on non-UTF-8 filenames
Python 2 has some unfortunate behaviour around passing bytes or Unicode strings (depending on the platform) to shutil.rmtree, and the combination of some porting work and a particular source package in Debian that contained a non-UTF-8 file name caused us to run into this. The fix was to ensure that the argument passed to shutil.rmtree is a str regardless of Python version. We'd actually run into something similar before: it's a subtle porting gotcha, since it's quite easy to end up passing Unicode strings to shutil.rmtree if you're in the process of porting your code to Python 3, and you might easily not notice if the file names in your tests are all encoded using UTF-8.

lazr.restful ETags
We eventually got far enough along that we could switch one of our four appserver machines (we have quite a number of other machines too, but the appservers handle web and API requests) to Python 3 and see what happened. By this point our extensive test suite had shaken out the vast majority of the things that could go wrong, but there was always going to be room for some interesting edge cases. One of the Ubuntu kernel team reported that they were seeing an increase in 412 Precondition Failed errors in some of their scripts that use our webservice API. These can happen when you're trying to modify an existing resource: the underlying protocol involves sending an If-Match header with the ETag that the client thinks the resource has, and if this doesn't match the ETag that the server calculates for the resource then the client has to refresh its copy of the resource and try again. We initially thought that this might be legitimate, since it can happen in normal operation if you collide with another client making changes to the same resource, but it soon became clear that something stranger was going on: we were getting inconsistent ETags for the same object even when it was unchanged. Since we'd recently switched a quarter of our appservers to Python 3, that was a natural suspect. Our lazr.restful package provides the framework for our webservice API, and roughly speaking it generates ETags by serializing objects into some kind of canonical form and hashing the result. Unfortunately the serialization was dependent on the Python version in a few ways, and in particular it serialized lists of strings such as lists of bug tags differently: Python 2 used [u'foo', u'bar', u'baz'] where Python 3 used ['foo', 'bar', 'baz']. In lazr.restful 1.0.3 we switched to using JSON for this, removing the Python version dependency and ensuring consistent behaviour between appservers.
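The version dependency is easy to demonstrate (my illustration, not lazr.restful's exact code):

    import hashlib
    import json

    tags = [u'foo', u'bar', u'baz']
    # repr() differs between versions: "[u'foo', u'bar', u'baz']" on
    # Python 2 versus "['foo', 'bar', 'baz']" on Python 3, so any ETag
    # derived from it differs too.
    etag_repr = hashlib.sha1(repr(tags).encode('utf-8')).hexdigest()
    # json.dumps() produces '["foo", "bar", "baz"]' on both versions,
    # so a JSON-based ETag stays consistent across appservers.
    etag_json = hashlib.sha1(json.dumps(tags).encode('utf-8')).hexdigest()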
Memory leaks
This problem took the longest to solve. We noticed fairly quickly from our graphs that the appserver machine we'd switched to Python 3 had a serious memory leak. Our appservers had always been a bit leaky, but now it wasn't so much "a small hole that we can bail occasionally" as "the boat is sinking rapidly":
(Graph: a serious memory leak)
(Yes, this got in the way of working out what was going on with ETags for a while.) I spent ages messing around with various attempts to fix this. Since only a quarter of our appservers were affected, and we could get by on 75% capacity for a while, it wasn't urgent, but it was definitely annoying. After spending some quality time with objgraph, for some time I thought traceback reference cycles might be at fault, and I sent a number of fixes to various upstream projects for those (e.g. zope.pagetemplate). Those didn't help the leaks much though, and after a while it became clear to me that this couldn't be the sole problem: Python has a cyclic garbage collector that will eventually collect reference cycles as long as there are no strong references to any objects in them, although it might not happen very quickly. Something else must be going on. Debugging reference leaks in any non-trivial and long-running Python program is extremely arduous, especially with ORMs that naturally tend to end up with lots of cycles and caches. After a while I formed a hypothesis that zope.server might be keeping a strong reference to something, although I never managed to nail it down more firmly than that. This was an attractive theory, as we were already in the process of migrating to Gunicorn for other reasons anyway, and Gunicorn also has a convenient max_requests setting that's good at mitigating memory leaks. Getting this all in place took some time, but once we did we found that everything was much more stable:
(Graph: a rather flat memory graph)
This isn't completely satisfying, as we never quite got to the bottom of the leak itself, and it's entirely possible that we've only papered over it using max_requests: I expect we'll gradually back off on how frequently we restart workers over time to try to track this down. However, pragmatically, it's no longer an operational concern.

Mirror prober HTTPS proxy handling
After we switched our script servers to Python 3, we had several reports of mirror probing failures. (Launchpad keeps lists of Ubuntu archive and image mirrors, and probes them every so often to check that they're reasonably complete and up to date.) This only affected HTTPS mirrors when probed via a proxy server, support for which is a relatively recent feature in Launchpad and involved some code that we never managed to unit-test properly: of course this is exactly the code that went wrong. Sadly I wasn't able to sort out that gap, but at least the fix was simple.

Non-MIME-encoded email headers
As I mentioned above, there were substantial changes in the email package between Python 2 and 3, and indeed between minor versions of Python 3. Our test coverage here is pretty good, but it's an area where it's very easy to have gaps. We noticed that a script that processes incoming email was crashing on messages with headers that were non-ASCII but not MIME-encoded (and indeed then crashing again when it tried to send a notification of the crash!). The only examples of these I looked at were spam, but we still didn't want to crash on them. The fix involved being somewhat more careful about both the handling of headers returned by Python's email parser and the building of outgoing email notifications. This seems to be working well so far, although I wouldn't be surprised to find the odd other incorrect detail in this sort of area.

Failure to handle non-ISO-8859-1 URL-encoded form input
Remember how I said that parsing HTTP form data was thorny? After we finished upgrading all our appservers to Python 3, people started reporting that they couldn't post Unicode comments to bugs, which turned out to happen only if the attempt was made using JavaScript, and was because I hadn't quite managed to get URL-encoded form data working properly with zope.publisher and multipart. The current standard describes the URL-encoded format for form data as "in many ways an aberrant monstrosity", so this was no great surprise. Part of the problem was some very strange choices in zope.publisher dating back to 2004 or earlier, which I attempted to clean up and simplify. The rest was that Python 2's urlparse.parse_qs unconditionally decodes percent-encoded sequences as ISO-8859-1 if they're passed in as part of a Unicode string, so multipart needs to work around this on Python 2. I'm still not completely confident that this is correct in all situations, but at least now that we're on Python 3 everywhere the matrix of cases we need to care about is smaller.
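To illustrate the parse_qs wrinkle (my example, not Launchpad code):

    # Python 3 decodes percent-encoded sequences as UTF-8 by default.
    from urllib.parse import parse_qs

    assert parse_qs('comment=caf%C3%A9') == {'comment': ['café']}
    # Python 2's urlparse.parse_qs(u'comment=caf%C3%A9') instead gives
    # {u'comment': [u'caf\xc3\xa9']}: each byte decoded as ISO-8859-1,
    # which is the mojibake that multipart has to work around.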
Inconsistent marshalling of Loggerhead's disk cache
We use Loggerhead for providing web browsing of Bazaar branches. When we upgraded one of its two servers to Python 3, we immediately noticed that the one still on Python 2 was failing to read back its revision information cache, which it stores in a database on disk. (We noticed this because it caused a deployment to fail: when we tried to roll out new code to the instance still on Python 2, Nagios checks had already caused an incompatible cache to be written for one branch from the Python 3 instance.) This turned out to be a similar problem to the pickle issue mentioned above, except this one was with marshal, which I didn't think to look for because it's a relatively obscure module mostly used for internal purposes by Python itself; I'm not sure that Loggerhead should really be using it in the first place. The fix was relatively straightforward, complicated mainly by now needing to cope with throwing away unreadable cache data. Ironically, if we'd just gone ahead and taken the nominally riskier path of upgrading both servers at the same time, we might never have had a problem here.

Intermittent bzr failures
Finally, after we upgraded one of our two Bazaar codehosting servers to Python 3, we had a report of intermittent bzr branch hangs. After some digging I found this in our logs:
Traceback (most recent call last):
  ...
  File "/srv/bazaar.launchpad.net/production/codehosting1-rev-20124175fa98fcb4b43973265a1561174418f4bd/env/lib/python3.5/site-packages/twisted/conch/ssh/channel.py", line 136, in addWindowBytes
    self.startWriting()
  File "/srv/bazaar.launchpad.net/production/codehosting1-rev-20124175fa98fcb4b43973265a1561174418f4bd/env/lib/python3.5/site-packages/lazr/sshserver/session.py", line 88, in startWriting
    resumeProducing()
  File "/srv/bazaar.launchpad.net/production/codehosting1-rev-20124175fa98fcb4b43973265a1561174418f4bd/env/lib/python3.5/site-packages/twisted/internet/process.py", line 894, in resumeProducing
    for p in self.pipes.itervalues():
builtins.AttributeError: 'dict' object has no attribute 'itervalues'
I'd seen this before in our git hosting service: it was a bug in Twisted's Python 3 port, fixed after 20.3.0 but unfortunately after the last release that supported Python 2, so we had to backport that patch. Using the same backport dealt with this. Onwards!

29 May 2021

Shirish Agarwal: Planes, Pandemic and Medical Devices I

The Great Electric Airplane Race
It took me quite some time to write this, as I have been depressed about things. Then a few days back I saw Nova's The Great Electric Airplane Race. It was fabulous and a pleasure to see, and to learn that there are 200-odd startups in the race to make an electric airplane that works and has FAA certification. I was disappointed, though, that there was no coverage of any university projects. From what little I know, almost all the advanced materials the U.S. has made were first researched mostly in universities, and when close to fruition either spun off as a startup or given to some commercial organization/venture to make scalable and profitable. If they had covered those, I am sure more people could be convinced to join the sciences and engineering in college. I actually do want to come to this as part of both general medicine and vaccine development in the U.S., but that will come later. The idea that industry works alone should be discouraged, but that perhaps may require another article to articulate why I believe so.

Medical Device Ventilators in India
Before the pandemic, most people probably didn't know what a ventilator was; at least I didn't, although I probably used one during my somewhat brief hospital stay a couple of years ago. It entered the Indian Twitter lexicon more in the second wave, as the number of people who got infected grew and the ventilators serving them became fewer and fewer, just due to the sheer mismatch of numbers and requirements. Rich countries donated/gifted ventilators to India, on which GOI put GST of 28%. Apparently, they are a luxury item, just like my hearing aid. Last week the Delhi High Court passed a judgement that GST should not be imposed on gifts like ventilators or oxygenators. The order can be found here. Even without reading the judgement, the shout from the right was "judicial activism"; after reading it, it is a good judgement which touches on several points. The first is the dichotomy that if a commercial organization wanted to import a ventilator or an oxygenator, the IGST payable is nil, while for an individual it is 12%. The State (here, State refers to the State Government, in this case the Gujarat Govt.) did reduce the IGST from 12% to nil for states, but that too only till 30.06.2021. No relief to individuals on that account. The Court also made use of Mr. Arvind Datar as Amicus Curiae, or friend of the court. The petitioner, an 85-year-old gentleman, made broad assertions under Article 21 (right to life), and the court in its wisdom also added Article 14, which enshrines the equality of everyone before the law. The Amicus Curiae, as was his duty, guided the court through how the IGST law works and shared a brief history of the law and the changes happening before and after it. During his submissions, he also shared the Mega Exemption Notification no. 50/2017, under which several items are exempted from IGST. The Amicus Curiae did note that such exemptions were also there before the Mega Exemption Notification came into play. However, DGFT (Directorate General of Foreign Trade) on 30-04-2021 issued notification no. 4/2015-2020, through which oxygenators had been exempted from Custom Duty/BCD (Basic Customs Duty). In another notification, no. 30/2021 dated 01.05.2021, it reduced IGST from 28% to 12% for personal use. If, however, the oxygenator was procured by a canalising agency (bodies such as the State Trading Corporation of India (STC) and/or the Metals and Minerals Trading Corporation (MMTC) are defined as canalising agents), then it would be fully exempted from paying any sort of IGST, albeit subject to certain conditions. What the conditions are was not shared in open court. The Amicus Curiae further observed that it is contrary to practice that, while both BCD and IGST have been exempted for canalising agents and others, some IGST has to be paid for personal use. To stay within the narrow boundaries of the topic, he shared entry no. 607A of General Exemption no. 190, where duty and IGST in the case of life-saving drugs are zero, provided the life-saving drugs imported have been provided at zero cost by an overseas supplier for personal use. He further shared that the oxygen generator would fall under the same entry 607A, as it fulfils all the criteria shared for life-saving medicines and devices. He also used the help of the Drugs and Cosmetics Act 1940, which provides such relief.
The Amicus Curiae further noted that GOI amended its foreign trade policy (2015-2020) via notification no. 4/2015-2020, dated 30.04.2021, issued by DGFT, where Rakhi and life-saving drugs for personal use have been exempted from BCD till 30-07-2021. There is no reason not to give the same exemption to oxygenators, which fulfil the same purpose. The Amicus Curiae further observed that there are exceptional-circumstances provisions, as adverted to in sub-section (2) of Section 25 of the Customs Act, whereby in Covid-19, which is known and labelled as a pandemic, the distinctions between the two classes of individuals or agencies do not make any sense. While he did make the observation that exemption from duty is not a right, in the light of the pandemic and Article 14 it does not make sense to have distinctions between the two classes of importers. He further shared Circular no. 9/2014-Customs, dated 19.08.2014, by CBEC (Central Board of Excise and Customs), which gave broad exemptions under Section 25(2) of the same act in respect of goods and services imported for the safety and rehabilitation of people suffering from and affected by natural disasters and epidemics. He further submitted that the impugned notification is irrational, as there is no intelligible differentia rule applied or observed in classifying the import of oxygen concentrators into two categories: one, by the State and its agencies; and the other, by an individual for personal use by way of gift. So there was an "absence of adequate determining principle". To bolster his argument, he shared the judgements of:

a) Union of India vs. N.S. Rathnam & Sons, (2015) 10 SCC 681 (N.S. Rathnam & Sons Case)
b) Shayara Bano vs. Union of India, (2017) 9 SCC 1 (Shayara Bano Case)

The Amicus Curiae also rightly observed that the right to life encompasses within it the right to health. You cannot have one without the other, and within that is the right to affordable treatment. He further stated that the State does not only have a duty, but a positive obligation is cast upon it to ensure that the citizen's health is secured. He also cited Navtej Singh Johar vs Union of India (Navtej Singh Johar Case) in defence of the right to life. Mr. Datar also shared that, unlike in normal circumstances, it is and should be enough to show "distinct and noticeable burdensomeness" which is directly attributable to the impugned/questionable tax. The gentleman cited Indian Express Newspapers (Bombay) Private Limited vs. Union of India, (1985) 1 SCC 641 (Indian Express Case), which touched on both Article 19(1)(a) and Article 21.

Blogger's note: At this juncture, I should point out that while I am sharing the judgement, I will share only the Amicus Curiae's POV and then the judges' final observations. While reading it, I was struck by the fact that the Amicus Curiae had cited 4 cases so far, 3 of them pretty well known both in the legal fraternity and among the public at large. Another 3, shared below, are also of great significance. Hence, I felt the need to share the whole judgement.

The Amicus Curiae further observed that this tax would disproportionately have to be paid by the old and the infirm, who might find it difficult to pay the amounts needed for the customs duty/IGST, as well as to find an agent to pay it in this pandemic.

Blogger's note: The situation with the elderly is something like this. There are a few things to note: only Central Govt. employees and pensioners get pensions, which have been frozen since last year. The rest of the elderly population does not. Interest rates have fallen to record lows, from 5-6% on savings to 2%, and to 4.9% on fixed deposits, while nominal inflation is up by 6% and CPI and real inflation rates are and will be much higher. And this is when there is absolutely no demand in the economy. To add to all this, RBI shared a couple of months ago that frauds of 5 trillion rupees were committed in banks between 2015 and 2019. And this is separate from the record NPAs in both public and private sector banks. To get out of this, the banks have squeezed and are squeezing their customers, as well as asking GOI for bailouts. How much GOI is responsible for the frauds as well as the NPAs would probably require its own article. And even now, RBI and banks have made heavy provisions, as lockdowns are still a facet and are supposed to remain one till the end of the year or even next year (all depending upon when we get the vaccine).

The Amicus Curiae further argued that the ventilators which are available locally are of bad quality. All of this has resulted in insurmountable pressure on hospitals, which they are unable to overcome. Therefore, the levy of IGST on oxygenators has a direct impact on the health of the citizen, so the law should be examined not by its intention but by how it affects citizens' rights now. For this he shared R.C. Cooper vs Union of India (another famous case), especially paragraph 49, and Federation of Hotel & Restaurant Association of India vs. Union of India, (1989), at paragraph 46 (Federation of Hotel Case).
Mr. Datar further shared the Supreme Court order dated 18.12.2020, passed in Suo Moto Writ Petition (Civil) No. 7/2020, to buttress the plea that the right to health includes the right to affordable treatment.

Blogger's note: For those who don't know, Suo Moto is when the Court, whether the Supreme Court or the High Courts, takes up a matter for the public good. It could be anything: law and order, banking, finance, public health, etc. This was the norm before 2014. The excesses of the executive were curtailed by both the higher and the lower judiciary. That is and was the reason the judiciary is and was known as the third pillar of Indian democracy. A good characterization of Suo Moto can be found here.

Before ending his submission, the learned Amicus Curiae also shared Jeeja Ghosh vs. Union of India, (2016) (Jeeja Ghosh Case, an outstanding case as it deals with people with disabilities and their rights, and the observations made by the Division Bench of Hon'ble Mr. Justice A. K. Sikri as well as Hon'ble Mr. Justice R. K. Agrawal). After the Amicus Curiae completed his submissions, it was the turn of Mr. Sudhir Nandrajog, and he adopted the arguments and submissions made by the Amicus Curiae. The gentleman reiterated the facts of the case and how the impugned notification was violative of both Articles 14 and 21 of the Indian Constitution.

Blogger's note: The High Court's judgement, which records all the above arguments by the Amicus Curiae and the petitioner's lawyer, also shares the State's view. It is only on page 24 that the Delhi High Court starts to share its own observations on the arguments of both sides.

Judgement continued: The first observation the Court makes is that while the petitioner demonstrated that the impugned tax imposition would have a distinct and noticeable burdensomeness, the State did not state or show in any way how much of a loss it would incur if such a tax were let go, or how much additional work would have to be done in order to collect this specific tax. It didn't need to do something down to the wire or mathematically precise, but it didn't even care to show, even theoretically, how many people would be affected. The counter-affidavit by the State is silent on the whole issue. The Court also contended that the State failed to prove how collecting IGST from the concerned individuals would help in fighting coronavirus in any substantial manner for the public at large. The High Court shared observations from the Navtej Singh Johar case, where it is observed that the State has both negative and positive obligations to ensure that its citizens are able to enjoy the right to health. The High Court further made the point that no respectable person likes to be turned into a charity case. If the State contends that those who obey the law should pay taxes, then it is also obligatory on the State's part to lessen exactions such as taxes, at the very least in times of war, famine, floods, epidemics and pandemics. Such an approach would allow a person to live a life of dignity, which is part of Article 21 of the Constitution.
Another point made by the State, that only the GST Council is able to make any changes as regards exemptions rather than the State, was found to be false, as the State had made some exemptions without going to the GST Council, using its own powers under Section 25 of the Customs Act. The Court also points out that it does signal a discriminatory pattern when somebody like the petitioner has to pay the tax for personal use while those who are buying it for commercial use do not have to pay the tax. The Court agreed with the view of the Amicus Curiae, Mr. Datar, that the oxygenator should be taxed at a nil rate of IGST, as it belongs with life-saving drugs, and the oxygenator fits the bill as medical equipment, as it is used in the treatment, mitigation and prevention of the spread of coronavirus. Mr. Datar also did show that the oxygenator is placed at the same level as other life-saving drugs. The Court felt further emboldened by the observations of the Supreme Court in State of Andhra Pradesh vs. Linde India Limited, 2020 (State of Andhra Pradesh vs Linde Ltd.). The Court further shared many subsequent notifications from the State, and various press releases by the State itself, which make the Court's point that oxygenators indeed are drugs as defined in the court case above. The State should have it as part of notification 190. This would preserve the start of the notification date from 03.05.2021, and the State would not have to issue a new notification. The Court further postulated that any persons similar to the petitioner could avail of the same if they furnish a letter of undertaking to an officer designated by the State that the medical equipment would not be put to commercial use. Till the State does that, in the interim the importer can give the same undertaking to the Joint Secretary, Customs, or their nominee, who can hand over the same to the customs officer. The Court also shared that it does not disagree with the State's arguments, but the challenges which have arisen are in a unique time period and circumstances, so it is basing its judgement on how the situation is. The Court also mentioned an order given by the Supreme Court, Diary No. 10669/2020, passed on 20.03.2020, where the SC has taken pains to understand the issues faced by the citizens. The court also mentioned the Small Scale Industrial Manufactures Association Case (both of these cases I don't know). So, in conclusion, the Court holds the imposition of IGST on oxygenators which are imported by individuals as gifts from their relatives to be unconstitutional. It also shared that any taxes taken by GOI in the above scenario have to be returned. The relief to the State is that it will not have to pay interest on the same. To check misuse, the petitioner or people in similar circumstances would have to give a letter of undertaking to an officer designated by the State within 7 days of the State notifying the patient, or anybody authorized by him/her to act on their behalf, to share the letter of undertaking with the State. And till the State provides an officer, the above will continue. Hence, both the writ petition and the pending application are disposed of. The Registry is directed to release any money deposited by the petitioner along with any interest accrued on it (if any). At the end they record appreciation of Mr. Arvind Datar, Mr. Zoheb Hossain, Mr. Sudhir Nandrajog as well as Mr. Siddharth Bambha. It is only due to their assistance that the court could reach the conclusion it did.
For the Delhi High Court: RAJIV SHAKDHER, J.; TALWANT SINGH, J. May 21, 2021.

Blogger's observations: Now, after the verdict, GOI has a few choices: either accept the verdict or appeal to the SC. A third choice is to make a committee and come to the same conclusions via the committee. GOI has done something similar in the past. If that happens and the same conclusions are reached as before, then the aggrieved may have no choice but to appeal to the highest court of law. And this will put the aggrieved in a much more vulnerable place than before, as SC court fees, lawyer fees etc. are quite high compared to the High Courts. So there is a possibility that the petitioner may not even approach the court, unless and until some non-profit (NGO) decides to fight it and put it up as a common cause or something similar. There is another judgement that I will share, probably tomorrow. Thankfully, that one is pretty short compared to this one, so it should be far easier to read. FWIW, I did learn about the whole freenode affair and the many channels that have shifted from freenode to Libera. I will share my own experience of the same, but that will probably take a day or two.
Zeeshan of IYC (Indian Youth Congress), along with Salman Khan's non-profit Being Human, getting oxygenators
The above picture is of Zeeshan. There has been a whole team of Indian Youth Congress workers (the main opposition party to the ruling party) doing a lot of relief work. They have been buying oxygenators from abroad with the help of the Being Human Foundation, started by Salman Khan, an actor who works in A-grade movies in Bollywood.

24 May 2021

Dirk Eddelbuettel: RcppArmadillo 0.10.5.0.0 on CRAN: New Upstream

Armadillo image
Armadillo is a powerful and expressive C++ template library for linear algebra, aiming towards a good balance between speed and ease of use, with a syntax deliberately close to Matlab. RcppArmadillo integrates this library with the R environment and language, and is widely used by (currently) 865 other packages on CRAN. This new release brings Armadillo 10.5.0, which was released early on Friday. We had done one full test of the 10.5 rc1 prerelease one week earlier, and did another test on 10.5.0 and this 0.10.5.0.0 RcppArmadillo release just for added rigour. The package was then uploaded to CRAN late Friday (my timezone). The automated process flagged one NOTE as a false positive (yet another instance of the well-known (yet dreaded) issue of Suggests != Depends by one of these 865 packages). This led to the need for an inspection by one of the CRAN maintainers, and, the weekend being the weekend, it was only processed just now. Upstream moves at a speed that is a little faster than the cadence CRAN likes. As we had released RcppArmadillo 0.10.4.0.0 on April 13, we did not want to follow up too soon thereafter with 0.10.4.1.0, which was thus only a GitHub and drat release (which can always be had easily too via install.packages("RcppArmadillo", repos="https://RcppCore.github.io/drat")). The full set of changes follows. We include the aforementioned interim release as well.

Changes in RcppArmadillo version 0.10.5.0.0 (2021-05-21)
  • Upgraded to Armadillo release 10.5 (Antipodean Fortress)
    • added .clamp() member function
    • expanded the standalone clamp() function to handle complex values
    • more efficient use of OpenMP
    • vector, matrix and cube constructors now initialise elements to zero by default; use the fill::none specifier, eg. mat X(4,5,fill::none), to disable element initialisation
  • Added codecov.yml to exclude Armadillo from coverage analysis

Changes in RcppArmadillo version 0.10.4.1.0 (2021-04-23)
  • Upgraded to Armadillo release 10.4.1 (Pressure Cooker)
  • GitHub-only release

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

22 April 2021

Shirish Agarwal: The Great Train Robbery

I had a Twitter fight a few days back with a gentleman, and this article is a result of that fight. Sadly, I do not know the name of the gentleman, as he goes by a pseudonym, and I have not taken permission from him to quote him either way. So I will just state the observations I was able to make from the conversations we had. As people who read this blog regularly would know, I am and have been against the railway privatization which is happening in India, and I will be sharing some case studies from other countries as to how it panned out for them.

UK Railways
How Privatization Fails: Railways
The above video is by a gentleman called Shaun, who basically shares that privatization, as far as the UK is concerned, has produced nothing but monopolies, and while there are complex reasons for this, the design of railways is such that they will always have a monopoly structure. At most you can have several monopolies, but that is all; the idea of competition just cannot happen. Even the ideas that subsidies will be lower and/or trains will run on time are far from fact; both of these points have been checked by fullfact.org and confirmed. It is argued that the UK is small and perhaps doesn't have the right conditions. That is probably true, but we still deserve a glance at the UK railway map.
UK railway map with operators
The above map is copyrighted to Map Marketing, where you can see it today. As can be seen above, most companies had their own specified areas. Now, if you look at the facts, you will see that UK fares have been higher; in fact, an oldish article from Metro (a UK publication) shares the same. In fact, the UK has effectively re-nationalized parts of its railways, as many large rail operators were running in the red, and even Scotland is set to nationalise its railways in March 2022. Remember, this is a country which hasn't seen inflation go upwards of 5% in nearly a decade; the only outlier was 2011, when it indeed breached the 5% mark. So from this, "private gains, public losses" perhaps seems a fitting description. But then, maybe we didn't use the right example. Perhaps Japan would be better: they have bullet trains, while the UK is still thinking about them (HS2).

Japanese Railways
Below is the map of the Japanese railway network:
Railway map of Japan with private ownership courtesy Wikimedia commons
Japan started privatizing its railways in 1987, and to date they have not been fully privatized. On top of that, as much as ¥24 trillion of the long-term JNR debt was shouldered by the government, at the expense of Japan's taxpayers, while the workforce was also cut by almost a quarter. To add to it, while some parts of Japanese Railways did make profits, many of them did so through large-scale non-railway business, mostly real estate on land adjacent to railway stations; in many cases, it seems this went up to 60% of revenue. The most profitable has been the Shinkansen, though. And while it has been profitable, it has not been without safety scandals over the years, the biggest in recent memory being the 2005 Amagasaki derailment. What was interesting to me was the aftermath: while the Wikipedia page doesn't share much, I had read at the time how a lot of ordinary people stood up to the companies, in a country where it is commonly said that most companies are owned by the Yakuza. And this is a country where people are loyal to their corporation or company no matter what; it is a culture strange both to the West and here in India, where people change jobs at the drop of a hat, although nowadays we have record unemployment. So perhaps Japan too does not meet our standard, as the operators do not compete with each other but are each a set monopoly in their regions. Also, how much subsidy is involved is not really transparent.

U.S. Railways
Last, but not least, I share the U.S. railway map. This was provided by a Mr. Tom Alison on Reddit, on the MapPorn subreddit. As the thread itself is archived and I do not know the gentleman concerned, nor have I taken permission for the map, I am sharing the compressed version.


U.S. Railway lines with the different owners
Now, U.S. railways are and have always been peculiar, as unlike the above two, the U.S. has always been more of a freight network. Much of it probably has to do with the fact that in the 1960s, when oil was cheap, the U.S. built zillions of roadways and romanticized the road trip, and has been doing so ever since. The rise of low-cost airlines definitely didn't help the railways grow passenger services either; in fact, the opposite. There have also been smaller services and attempts at privatization in both New Zealand and Australia, and both have been failures; please see the papers in that regard. My simple point is this: as can be seen above, there have been various attempts at privatization of railways, and most of them have been a mixed bag. The only one which comes close to what we would consider good is the Japanese one, but that also used a lot of public debt, and we don't know what will happen next. Also, for higher-speed train services like a bullet train or whatever, you need direct routes, with no hairpin bends. In fact, a good talk on the topic is the TBD podcast, which, while it talks about hyperloop, raises the same questions that would be asked if this were to be done in India. Another thing to keep in mind is that the Japanese have been exceptional builders because they have been forced to be: they live in a seismically active zone, which made the Fukushima disaster a reality, but at the same time their buildings are earthquake-resistant.
Standard Disclaimer
The above is a simplified version of things. I could have added financial accounts, but those follow no set pattern: some railways use accrual accounting, some use cash, and some use a hybrid. I could have also gone into gauge or electrification, but all railways have slightly different standards, although unigauge is something all railways aspire to, and electrification is something all railways want, although in many cases it just isn't economically feasible.

Indian Railways
Indian Railways itself moved from cash to accrual accounting a couple of years back; in between, for a couple of years, it was hybrid. The sad part is that you can now never measure against past performance in the old way, because the system is so different; hence, whether the Railways is making a loss or a profit, we will come to know only much later. Also, most accountants don't know the new system well, so it is going to take more time, how much is unknown. Sadly, what GOI did a few years back is merge the Railway budget into the Union budget. Of course, the excuse given was the pressure of too many new trains, while the truth is that, by doing this, they decreased transparency about the whole thing. For example, for the last few years the only state which has had significant work done is U.P. (Uttar Pradesh), and a bit in Goa, although that has been protested time and again. I am from the neighbouring state of Maharashtra and have been there several times; now going to Goa feels like a dream :(.

Covid news
Now, before I jump to the news, I should share the movie Virus (2019), which was made by the talented Aashiq Abu. Even though I am not a Malayalee, I have enjoyed many of his movies, simply because he is a terrific director, and Malayalam movies, at least most of them, have English subtitles and a lot of original content. The first time I saw it, a couple of years back, I couldn't sleep a wink for a week; even the next time, it was heavy. I had shared the movie with mum, and even she couldn't watch it in one go. It is and was that powerful. Maybe that is because we are now headlong in the pandemic and the madness is all around us. There are two terms that helped me understand a great deal of what is happening in the movie: the first was "altered sensorium", which has been defined here; the other is saturation, or to be more precise, "oxygen saturation". This term has also entered the Indian Twitter lexicon quite a bit, as India has started running out of oxygen. Just today, the Delhi High Court held an emergency hearing on the subject late at night. Although there is much to share about the mismanagement by the Centre, the best piece on the subject has been by Miss Priya Ramani. Yup, the same lady who won against M.J. Akbar, and that when Mr. Akbar had 100 lawyers for that specific case. It would be interesting to see what happens ahead. There are, however, a few things even she forgot in her piece. For example, reverse migration, i.e. migration from urban to rural areas, has started again; two articles from different entities share a similar outlook. Sadly, the right have no empathy or feeling for either the poor or the sick. Even the labour minister Santosh Gangwar's statement was that only around 1.04 crore people walked back home. While there is not much data, some work/research that has been done on migration to cities suggests the number could easily be 10 times as much. And this was in last year's lockdown. This year, the same issue has re-surfaced, and migrants, having learned their lesson, started leaving the cities early. And I'm ashamed to say I think they are doing the right thing. Most state governments have not learned their lessons, nor have they done any work to earn the trust of migrants; this is true of almost all state governments. Last year, just before the lockdown was announced, a friend and I spent almost 30k getting a cab all the way from Chennai to Pune, between what we paid for the cab and what we paid in bribes just so we could cross the state borders and return home to our anxious families. Thankfully, unlike the migrants, we were better off, although we did take a loss. I probably wouldn't be alive if I were in their situation, as many weren't; that number of undocumented deaths is still up in the air.
Vaccine issues
Currently, though, the issue has been the vaccine and its pricing. A good article summarizing the issues has been shared in the Economist; another article that goes to the heart of the issue is at Scroll. To buttress the argument, the SII chairman had shared this a few weeks back:
Adar Poonawalla talking to Vishnu Som on Left, Right & Centre, 7th April 2021.
So, a licensee manufacturer wants to make super-profits during the pandemic. And, as shared above, they can very easily do it. Even the quotes given to nearby countries are lower than the quotes given to Indian states.

Prices of the AstraZeneca vaccine across various states and countries.
The situation around beds, vaccines, oxygen, anything really, is so dire that people will go to any lengths to save their loved ones, even when they know a certain medicine doesn't work. Take Remdesivir: 5 WHO trials have concluded that it doesn't reduce mortality, and even the AIIMS chief said the same. But the desperation of both doctors and relatives to cling to hope has made Remdesivir a black-market drug, with unofficial prices hovering anywhere between INR 14k and INR 30k per vial. One of the executives of a top firm was also arrested in Gujarat, and in Maharashtra an opposition M.P. came to the rescue of the officials of Bruick pharms in Mumbai. Sadly, this strange attachment to the party at the Centre also exists in my extended family. At one end, they will heap praise on Mr. Modi; at the same time, they can't wait to get out of India fast enough. Many of them have settled in, horror of horrors, Dubai, as it is the best place to do business and get international schools for the young ones at decent prices, cheaper than or maybe a tad more than what they paid in Delhi or elsewhere. Being an Agarwal or a Gupta makes it easier to compartmentalize both things: ease of doing business, 5 days flat to get a business registered, up and running. And the paranoia is still there: they won't talk about him on the phone, because they are afraid they may say something which comes back to bite them. As for their decision to migrate, I can't really blame them. If I were 20-25 years younger and my mum were in better shape than she is, we probably would have migrated as well, although I would have preferred Europe over anywhere else.

Internet Freedom and Aarogya Setu App.


The Internet Freedom Foundation had shared the chilling effects of the Aarogya Setu app. This had also been shared by FSCI in the past, which recently had its handle banned on Twitter. The chilling effect was also apparent in a legal bail order which a high court judge gave. While I won't go into the merits and demerits of the bail order, it is astounding for a judge to require that the accused, even though he would be on bail, install an app so he can be surveilled. And this is a high court judge; such a sad state of affairs. We seem to be setting new lows every day when it comes to judicial jurisprudence. One interesting aspect of the whole case was shared by Aishwarya Iyer. She shared a story that she and her team worked on at The Quint, which raises questions about the quality of the work done by Delhi Police. It is, of course, up to Delhi Police to ascertain the truth of the matter, because unless they are able to tie in the PMO's office or POTUS's office for a leak, it hardly seems possible. For example, the dates when two heads of state can meet would be decided by their secretaries. Once the date is known, it would be shared with the press, while at the same time some sort of security apparatus would kick into place. It is incumbent, especially on the host, to take as much care as he can of the guest. We all remember that World War 1 (the war to end all wars) started with the murder of Archduke Ferdinand.

As nobody wants that, the best way is to make sure that a political murder doesn't happen on your watch. While I won't speculate on the details, it would be safe to assume z+ security along with heightened readiness, especially for somebody as important as POTUS. Now, it would be quite a reach for Delhi Police to connect the two dates; they will either have to get creative with the dates or find some other way. Otherwise, with practically no knowledge in the public domain, they can't work in limbo. In either case, I do hope the case comes up for hearing soon and we see what the Delhi Police says and contends in the High Court. At the very least, it would be awkward for them to talk of the dates, unless they can contend some mass conspiracy which involves the PMO (which would bring into question the constant vetting done by the intelligence department of all those who work in the PMO). And this whole case seems to be a kind of cover for the Delhi riots, in which it was mostly Muslims who died, and whose deaths remain unaccounted for till date.

Conclusion
In conclusion, I would like to share a bit of humour, because right now the atmosphere is humourless, what with the authoritarian tendencies of the central government and the mass mismanagement of public health, which it has now left to the states to handle as they see fit. The piece I am sharing is from Arre, one of my go-to sites whenever I feel low.

14 April 2021

Dirk Eddelbuettel: RcppArmadillo 0.10.4.0.0 on CRAN: New Upstream Plus

armadillo image Armadillo is a powerful and expressive C++ template library for linear algebra, aiming towards a good balance between speed and ease of use, with a syntax deliberately close to Matlab. RcppArmadillo integrates this library with the R environment and language, and is widely used by (currently) 852 other packages on CRAN. This new release brings us the just-released Armadillo 10.4.0. Upstream moves at a speed that is a little faster than the cadence CRAN likes. We released RcppArmadillo 0.10.2.2.0 on March 9, and upstream 10.3.0 came out shortly thereafter. We aim to accommodate CRAN with (roughly) monthly (or less frequent) releases, so by the time we were ready, 10.4.0 had just come out. As it turns out, the full testing had a benefit. Among the (currently) 852 CRAN packages using RcppArmadillo, two were failing tests. This is due to a subtle but important point: early on, we realized that it would be beneficial if the standard R control over random-number creation and seeding affected Armadillo too, which Conrad kindly accommodated with an optional RNG interface which RcppArmadillo supplies. With recent changes he made, the normally-distributed draws the R side saw (via the Armadillo interface) changed, which led to the two failures. All hail unit tests. So I mentioned this to Conrad, and with the usual Chicago-Brisbane time difference, late my evening a fix was in my inbox. The CRAN upload was then halted, as I had missed that, due to other changes he had made, random draws from a Gamma would now call std::rand(), which CRAN flags. Another email to Brisbane, another late (one-line) fix back, and all was good. We still encountered one package with an error, but flagged this as internal to that package's setup, so Uwe let RcppArmadillo onto CRAN. I contacted that package's maintainer, who was very receptive, and a change should be forthcoming. So with all that, we have 0.10.4.0.0 on CRAN, giving us Armadillo 10.4.0. The full set of changes follows. As Armadillo 10.3.0 was not uploaded to CRAN, its changes are included too.

Changes in RcppArmadillo version 0.10.4.0.0 (2021-04-12)
  • Upgraded to Armadillo release 10.4.0 (Pressure Cooker)
    • faster handling of triangular matrices by log_det()
    • added log_det_sympd() for log determinant of symmetric positive matrices
    • added ARMA_WARN_LEVEL configuration option, to control the degree of emitted warning messages
    • reduced the default degree of warning messages, so that failed decompositions, failed saving/loading, etc, no longer emit warnings
  • Apply one upstream correction for arma::randn draws when using the alternative (here R) generator, and for arma::randg.

Changes in RcppArmadillo version 0.10.3.0.0 (2021-03-10)
  • Upgraded to Armadillo release 10.3 (Sunrise Chaos)
    • faster handling of symmetric positive definite matrices by pinv()
    • expanded .save() / .load() for dense matrices to handle coord_ascii format
    • for out of bounds access, element accessors now throw the more nuanced std::out_of_range exception, instead of only std::logic_error
    • improved quality of random numbers

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

9 April 2021

Michael Prokop: A Ceph war story

It all started with the big bang! We nearly lost 33 of 36 disks on a Proxmox/Ceph cluster; this is the story of how we recovered them. At the end of 2020, we finally had a long-outstanding maintenance window for taking care of system upgrades at a customer. During this maintenance window, which involved reboots of the server systems, the involved Ceph cluster unexpectedly went into a critical state. What was planned to be a few hours of checklist work in the early evening turned out to be an emergency case; let's call it a nightmare (not only because it included a big part of the night). Since we have learned a few things from our post mortem and RCA, it's worth sharing those with others. But first things first: let's step back and clarify what we had to deal with.
The system and its upgrade
One part of the upgrade included 3 Debian servers (we're calling them server1, server2 and server3 here), running on Proxmox v5 + Debian/stretch with 12 Ceph OSDs each (65.45TB in total), a so-called Proxmox Hyper-Converged Ceph Cluster. First, we went for upgrading the Proxmox v5/stretch system to Proxmox v6/buster, before updating Ceph Luminous v12.2.13 to the latest v14.2 release supported by Proxmox v6/buster. The Proxmox upgrade included updating corosync from v2 to v3. As part of this upgrade, we had to apply some configuration changes, like adjusting the ring0 + ring1 address settings and adding a mon_host setting to the Ceph configuration. During the first two servers' reboots, we noticed configuration glitches. After fixing those, we went for a reboot of the third server as well. Then we noticed that several Ceph OSDs were unexpectedly down. The NTP service wasn't working as expected after the upgrade; the underlying issue is a race condition of ntp with systemd-timesyncd (see #889290). As a result, we had clock-skew problems with Ceph, indicating that the Ceph monitors' clocks weren't running in sync (which is essential for proper Ceph operation). We initially assumed that our Ceph OSD failures derived from this clock-skew problem, so we took care of it. After yet another round of reboots, to ensure the systems were all running with identical and sane configurations and services, we noticed lots of failing OSDs. This time all but three OSDs (19, 21 and 22) were down:
% sudo ceph osd tree
ID CLASS WEIGHT   TYPE NAME      STATUS REWEIGHT PRI-AFF
-1       65.44138 root default
-2       21.81310     host server1
 0   hdd  1.08989         osd.0    down  1.00000 1.00000
 1   hdd  1.08989         osd.1    down  1.00000 1.00000
 2   hdd  1.63539         osd.2    down  1.00000 1.00000
 3   hdd  1.63539         osd.3    down  1.00000 1.00000
 4   hdd  1.63539         osd.4    down  1.00000 1.00000
 5   hdd  1.63539         osd.5    down  1.00000 1.00000
18   hdd  2.18279         osd.18   down  1.00000 1.00000
20   hdd  2.18179         osd.20   down  1.00000 1.00000
28   hdd  2.18179         osd.28   down  1.00000 1.00000
29   hdd  2.18179         osd.29   down  1.00000 1.00000
30   hdd  2.18179         osd.30   down  1.00000 1.00000
31   hdd  2.18179         osd.31   down  1.00000 1.00000
-4       21.81409     host server2
 6   hdd  1.08989         osd.6    down  1.00000 1.00000
 7   hdd  1.08989         osd.7    down  1.00000 1.00000
 8   hdd  1.63539         osd.8    down  1.00000 1.00000
 9   hdd  1.63539         osd.9    down  1.00000 1.00000
10   hdd  1.63539         osd.10   down  1.00000 1.00000
11   hdd  1.63539         osd.11   down  1.00000 1.00000
19   hdd  2.18179         osd.19     up  1.00000 1.00000
21   hdd  2.18279         osd.21     up  1.00000 1.00000
22   hdd  2.18279         osd.22     up  1.00000 1.00000
32   hdd  2.18179         osd.32   down  1.00000 1.00000
33   hdd  2.18179         osd.33   down  1.00000 1.00000
34   hdd  2.18179         osd.34   down  1.00000 1.00000
-3       21.81419     host server3
12   hdd  1.08989         osd.12   down  1.00000 1.00000
13   hdd  1.08989         osd.13   down  1.00000 1.00000
14   hdd  1.63539         osd.14   down  1.00000 1.00000
15   hdd  1.63539         osd.15   down  1.00000 1.00000
16   hdd  1.63539         osd.16   down  1.00000 1.00000
17   hdd  1.63539         osd.17   down  1.00000 1.00000
23   hdd  2.18190         osd.23   down  1.00000 1.00000
24   hdd  2.18279         osd.24   down  1.00000 1.00000
25   hdd  2.18279         osd.25   down  1.00000 1.00000
35   hdd  2.18179         osd.35   down  1.00000 1.00000
36   hdd  2.18179         osd.36   down  1.00000 1.00000
37   hdd  2.18179         osd.37   down  1.00000 1.00000
Our blood pressure increased slightly! Did we just lose all of our cluster? What happened, and how can we get all the other OSDs back? We stumbled upon this beauty in our logs:
kernel: [   73.697957] XFS (sdl1): SB stripe unit sanity check failed
kernel: [   73.698002] XFS (sdl1): Metadata corruption detected at xfs_sb_read_verify+0x10e/0x180 [xfs], xfs_sb block 0xffffffffffffffff
kernel: [   73.698799] XFS (sdl1): Unmount and run xfs_repair
kernel: [   73.699199] XFS (sdl1): First 128 bytes of corrupted metadata buffer:
kernel: [   73.699677] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 62 00  XFSB..........b.
kernel: [   73.700205] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
kernel: [   73.700836] 00000020: 62 44 2b c0 e6 22 40 d7 84 3d e1 cc 65 88 e9 d8  bD+.."@..=..e...
kernel: [   73.701347] 00000030: 00 00 00 00 00 00 40 08 00 00 00 00 00 00 01 00  ......@.........
kernel: [   73.701770] 00000040: 00 00 00 00 00 00 01 01 00 00 00 00 00 00 01 02  ................
ceph-disk[4240]: mount: /var/lib/ceph/tmp/mnt.jw367Y: mount(2) system call failed: Structure needs cleaning.
ceph-disk[4240]: ceph-disk: Mounting filesystem failed: Command '['/bin/mount', '-t', u'xfs', '-o', 'noatime,inode64', '--', '/dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.cdda39ed-5
ceph/tmp/mnt.jw367Y']' returned non-zero exit status 32
kernel: [   73.702162] 00000050: 00 00 00 01 00 00 18 80 00 00 00 04 00 00 00 00  ................
kernel: [   73.702550] 00000060: 00 00 06 48 bd a5 10 00 08 00 00 02 00 00 00 00  ...H............
kernel: [   73.702975] 00000070: 00 00 00 00 00 00 00 00 0c 0c 0b 01 0d 00 00 19  ................
kernel: [   73.703373] XFS (sdl1): SB validate failed with error -117.
The same issue was present for the other failing OSDs. We hoped that the data itself was still there, and that only the mounting of the XFS partitions failed. The Ceph cluster was initially installed in 2017 with Ceph jewel/10.2, with the OSDs on filestore (nowadays a legacy approach to storing objects in Ceph). However, we had since migrated the disks to bluestore (with ceph-disk, and not yet via ceph-volume, which is what's used nowadays). Using ceph-disk introduces these 100MB XFS partitions containing basic metadata for the OSD. Given that we had three working OSDs left, we decided to investigate how to rebuild the failing ones. Some folks on #ceph (thanks T1, ormandj + peetaur!) were kind enough to share how working XFS partitions looked for them. After creating a backup (via dd), we tried to re-create such an XFS partition on server1. We noticed that even mounting a freshly created XFS partition failed:
synpromika@server1 ~ % sudo mkfs.xfs -f -i size=2048 -m uuid="4568c300-ad83-4288-963e-badcd99bf54f" /dev/sdc1
meta-data=/dev/sdc1              isize=2048   agcount=4, agsize=6272 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=25088, imaxpct=25
         =                       sunit=128    swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=1608, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
synpromika@server1 ~ % sudo mount /dev/sdc1 /mnt/ceph-recovery
SB stripe unit sanity check failed
Metadata corruption detected at 0x433840, xfs_sb block 0x0/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x1000
cache_node_purge: refcount was 1, not zero (node=0x1d3c400)
SB stripe unit sanity check failed
Metadata corruption detected at 0x433840, xfs_sb block 0x18800/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x18800/0x1000
SB stripe unit sanity check failed
Metadata corruption detected at 0x433840, xfs_sb block 0x0/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x1000
SB stripe unit sanity check failed
Metadata corruption detected at 0x433840, xfs_sb block 0x24c00/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x24c00/0x1000
SB stripe unit sanity check failed
Metadata corruption detected at 0x433840, xfs_sb block 0xc400/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0xc400/0x1000
releasing dirty buffer (bulk) to free list!releasing dirty buffer (bulk) to free list!releasing dirty buffer (bulk) to free list!releasing dirty buffer (bulk) to free list!found dirty buffer (bulk) on free list!bad magic number
bad magic number
Metadata corruption detected at 0x433840, xfs_sb block 0x0/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x1000
releasing dirty buffer (bulk) to free list!mount: /mnt/ceph-recovery: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or helper program, or other error.
Ouch. This very much looked related to the actual issue we were seeing. So we tried to execute mkfs.xfs with a bunch of different sunit/swidth settings. Using -d sunit=512 -d swidth=512 at least worked, so we decided to force its usage in the creation of our OSD XFS partition. This gave us a working XFS partition. Please note: sunit must not be larger than swidth (more on that later!). Then we reconstructed how to restore all the metadata for the OSD (activate.monmap, active, block_uuid, bluefs, ceph_fsid, fsid, keyring, kv_backend, magic, mkfs_done, ready, require_osd_release, systemd, type, whoami). To identify the UUID, we can read the data from "ceph --format json osd dump", like this for all our OSDs (Zsh syntax ftw!):
synpromika@server1 ~ % for f in {0..37}; printf "osd-$f: %s\n" "$(sudo ceph --format json osd dump | jq -r ".osds[] | select(.osd==$f) | .uuid")"
osd-0: 4568c300-ad83-4288-963e-badcd99bf54f
osd-1: e573a17a-ccde-4719-bdf8-eef66903ca4f
osd-2: 0e1b2626-f248-4e7d-9950-f1a46644754e
osd-3: 1ac6a0a2-20ee-4ed8-9f76-d24e900c800c
[...]
Identifying the corresponding raw device for each OSD UUID is possible via:
synpromika@server1 ~ % UUID="4568c300-ad83-4288-963e-badcd99bf54f"
synpromika@server1 ~ % readlink -f /dev/disk/by-partuuid/"${UUID}"
/dev/sdc1
The OSD s key ID can be retrieved via:
synpromika@server1 ~ % OSD_ID=0
synpromika@server1 ~ % sudo ceph auth get osd."${OSD_ID}" -f json 2>/dev/null | jq -r '.[] | .key'
AQCKFpZdm0We[...]
Now we also need to identify the underlying block device:
synpromika@server1 ~ % OSD_ID=0
synpromika@server1 ~ % sudo ceph osd metadata osd."${OSD_ID}" -f json | jq -r '.bluestore_bdev_partition_path'
/dev/sdc2
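Putting these lookups together, the per-OSD recovery boils down to something like the following sketch. To be clear: this is an illustration reconstructed from the steps described in this post, not the actual script we used (which is linked below); in particular, the exact keyring format, the metadata file contents and the backup naming are assumptions.

# Illustrative sketch only, NOT the actual recovery script.
OSD_ID=0
UUID="$(sudo ceph --format json osd dump | jq -r ".osds[] | select(.osd==${OSD_ID}) | .uuid")"
PART="$(readlink -f /dev/disk/by-partuuid/${UUID})"
KEY="$(sudo ceph auth get osd.${OSD_ID} -f json 2>/dev/null | jq -r '.[] | .key')"
BLOCK="$(sudo ceph osd metadata osd.${OSD_ID} -f json | jq -r '.bluestore_bdev_partition_path')"
# 1) back up the broken 100MB XFS metadata partition before touching it
sudo dd if="${PART}" of="/srv/backup/osd-${OSD_ID}-xfs.dd" bs=1M
# 2) re-create it with forced, sane stripe geometry (see the mkfs.xfs runs above)
sudo mkfs.xfs -f -i size=2048 -d sunit=512 -d swidth=512 -m uuid="${UUID}" "${PART}"
# 3) mount it and restore the OSD metadata files (file formats are assumptions!)
sudo mount "${PART}" "/var/lib/ceph/osd/ceph-${OSD_ID}"
printf '[osd.%s]\n\tkey = %s\n' "${OSD_ID}" "${KEY}" | sudo tee "/var/lib/ceph/osd/ceph-${OSD_ID}/keyring"
echo "${UUID}"   | sudo tee "/var/lib/ceph/osd/ceph-${OSD_ID}/fsid"
echo "${OSD_ID}" | sudo tee "/var/lib/ceph/osd/ceph-${OSD_ID}/whoami"
# block points at the bluestore data partition; block_uuid holds its partition UUID
sudo ln -sf "${BLOCK}" "/var/lib/ceph/osd/ceph-${OSD_ID}/block"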
With all of this, we reconstructed the keyring, fsid, whoami, block + block_uuid files. All the other files inside the XFS metadata partition are identical on each OSD. So after placing and adjusting the corresponding metadata on the XFS partition for Ceph usage, we got a working OSD, hurray! Since we had to fix yet another 32 OSDs, we decided to automate this XFS partitioning and metadata recovery procedure. We had a network share available on /srv/backup for storing backups of existing partition data. On each server, we tested the procedure with one single OSD before iterating over the list of remaining failing OSDs. We started with a shell script on server1, then adjusted the script for server2 and server3. This is the script, as we executed it on the 3rd server. Thanks to this, we managed to get the Ceph cluster up and running again. We didn't want to continue with the Ceph upgrade itself during the night though, as we wanted to know exactly what was going on and why the system behaved like that. Time for RCA!
Root Cause Analysis
So, all OSDs except three on server2 failed, and the problem seemed to be related to XFS. Therefore, our starting point for the RCA was to identify what was different on server2 as compared to server1 + server3. My initial assumption was that this was related to some firmware issue with the involved controller (and as it turned out later, I was right!). The disks were attached as JBOD devices to a ServeRAID M5210 controller (with a stripe size of 512). Firmware state:
synpromika@server1 ~ % sudo storcli64 /c0 show all | grep '^Firmware'
Firmware Package Build = 24.16.0-0092
Firmware Version = 4.660.00-8156
synpromika@server2 ~ % sudo storcli64 /c0 show all | grep '^Firmware'
Firmware Package Build = 24.21.0-0112
Firmware Version = 4.680.00-8489
synpromika@server3 ~ % sudo storcli64 /c0 show all | grep '^Firmware'
Firmware Package Build = 24.16.0-0092
Firmware Version = 4.660.00-8156
This looked very promising, as server2 indeed runs a different firmware version on the controller. But how so? Well, the motherboard of server2 was replaced by a Lenovo/IBM technician in January 2020, after we had a failing memory slot during a memory upgrade. As part of this procedure, the technician installed the latest firmware versions. According to our documentation, some OSDs were rebuilt (due to the filestore-to-bluestore migration) in March and April 2020. It turned out that precisely those OSDs were the ones that survived the upgrade. So the surviving drives were created with a different firmware version running on the involved controller; all the other OSDs were created with an older controller firmware. But what difference does this make? Let's check the firmware changelogs. For the 24.21.0-0097 release we found this:
- Cannot create or mount xfs filesystem using xfsprogs 4.19.x kernel 4.20 (SCGCQ02027889)
- xfs_info command run on an XFS file system created on a VD of strip size 1M shows sunit and swidth as 0 (SCGCQ02056038)
Our XFS problem certainly was related to the controller's firmware. We also recalled that our monitoring system had reported different sunit settings for the OSDs that were rebuilt in March and April. For example, OSD 21 was recreated and got different sunit settings:
WARN  server2.example.org  Mount options of /var/lib/ceph/osd/ceph-21      WARN - Missing: sunit=1024, Exceeding: sunit=512
We compared the new OSD 21 with an existing one (OSD 25 on server3):
synpromika@server2 ~ % systemctl show var-lib-ceph-osd-ceph\\x2d21.mount | grep sunit
Options=rw,noatime,attr2,inode64,sunit=512,swidth=512,noquota
synpromika@server3 ~ % systemctl show var-lib-ceph-osd-ceph\\x2d25.mount | grep sunit
Options=rw,noatime,attr2,inode64,sunit=1024,swidth=512,noquota
Thanks to our documentation, we could compare execution logs of their creation:
% diff -u ceph-disk-osd-25.log ceph-disk-osd-21.log
-synpromika@server2 ~ % sudo ceph-disk -v prepare --bluestore /dev/sdj --osd-id 25
+synpromika@server3 ~ % sudo ceph-disk -v prepare --bluestore /dev/sdi --osd-id 21
[...]
-command_check_call: Running command: /sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdj1
-meta-data=/dev/sdj1              isize=2048   agcount=4, agsize=6272 blks
[...]
+command_check_call: Running command: /sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdi1
+meta-data=/dev/sdi1              isize=2048   agcount=4, agsize=6336 blks
          =                       sectsz=4096  attr=2, projid32bit=1
          =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
-data     =                       bsize=4096   blocks=25088, imaxpct=25
-         =                       sunit=128    swidth=64 blks
+data     =                       bsize=4096   blocks=25344, imaxpct=25
+         =                       sunit=64     swidth=64 blks
 naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
 log      =internal log           bsize=4096   blocks=1608, version=2
          =                       sectsz=4096  sunit=1 blks, lazy-count=1
 realtime =none                   extsz=4096   blocks=0, rtextents=0
[...]
So back then, we had even tried to track this down, but couldn't make sense of it yet. But now this sounds very much like it is related to the problem we saw with this Ceph/XFS failure. We follow Occam's razor, assuming the simplest explanation is usually the right one, so let's check the disk properties and see what differs:
synpromika@server1 ~ % sudo blockdev --getsz --getsize64 --getss --getpbsz --getiomin --getioopt /dev/sdk
4685545472
2398999281664
512
4096
524288
262144
synpromika@server2 ~ % sudo blockdev --getsz --getsize64 --getss --getpbsz --getiomin --getioopt /dev/sdk
4685545472
2398999281664
512
4096
262144
262144
See the difference between server1 and server2 for identical disks? The getiomin option now reports something different for them:
synpromika@server1 ~ % sudo blockdev --getiomin /dev/sdk            
524288
synpromika@server1 ~ % cat /sys/block/sdk/queue/minimum_io_size
524288
synpromika@server2 ~ % sudo blockdev --getiomin /dev/sdk 
262144
synpromika@server2 ~ % cat /sys/block/sdk/queue/minimum_io_size
262144
It doesn't make sense that the minimum I/O size (iomin, AKA BLKIOMIN) is bigger than the optimal I/O size (ioopt, AKA BLKIOOPT). This led us to Bug 202127, "cannot mount or create xfs on a 597T device", which matches our findings here. But why did this XFS partition work in the past, and fail now with the newer kernel version?
The XFS behaviour change
Now, given that we had backups of all the XFS partitions, we wanted to track down a) when this XFS behaviour was introduced, and b) whether, and if so how, it would be possible to reuse the XFS partition without having to rebuild it from scratch (e.g. if you had no working Ceph OSD or backups left). Let's look at such a failing XFS partition with the Grml live system:
root@grml ~ # grml-version
grml64-full 2020.06 Release Codename Ausgehfuahangl [2020-06-24]
root@grml ~ # uname -a
Linux grml 5.6.0-2-amd64 #1 SMP Debian 5.6.14-2 (2020-06-09) x86_64 GNU/Linux
root@grml ~ # grml-hostname grml-2020-06
Setting hostname to grml-2020-06: done
root@grml ~ # exec zsh
root@grml-2020-06 ~ # dpkg -l xfsprogs util-linux
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-=========================================
ii  util-linux     2.35.2-4     amd64        miscellaneous system utilities
ii  xfsprogs       5.6.0-1+b2   amd64        Utilities for managing the XFS filesystem
There it's failing, no matter which mount option we try:
root@grml-2020-06 ~ # mount ./sdd1.dd /mnt
mount: /mnt: mount(2) system call failed: Structure needs cleaning.
root@grml-2020-06 ~ # dmesg | tail -30
[...]
[   64.788640] XFS (loop1): SB stripe unit sanity check failed
[   64.788671] XFS (loop1): Metadata corruption detected at xfs_sb_read_verify+0x102/0x170 [xfs], xfs_sb block 0xffffffffffffffff
[   64.788671] XFS (loop1): Unmount and run xfs_repair
[   64.788672] XFS (loop1): First 128 bytes of corrupted metadata buffer:
[   64.788673] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 62 00  XFSB..........b.
[   64.788674] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[   64.788675] 00000020: 32 b6 dc 35 53 b7 44 96 9d 63 30 ab b3 2b 68 36  2..5S.D..c0..+h6
[   64.788675] 00000030: 00 00 00 00 00 00 40 08 00 00 00 00 00 00 01 00  ......@.........
[   64.788675] 00000040: 00 00 00 00 00 00 01 01 00 00 00 00 00 00 01 02  ................
[   64.788676] 00000050: 00 00 00 01 00 00 18 80 00 00 00 04 00 00 00 00  ................
[   64.788677] 00000060: 00 00 06 48 bd a5 10 00 08 00 00 02 00 00 00 00  ...H............
[   64.788677] 00000070: 00 00 00 00 00 00 00 00 0c 0c 0b 01 0d 00 00 19  ................
[   64.788679] XFS (loop1): SB validate failed with error -117.
root@grml-2020-06 ~ # mount -t xfs -o rw,relatime,attr2,inode64,sunit=1024,swidth=512,noquota ./sdd1.dd /mnt/
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/loop1, missing codepage or helper program, or other error.
32 root@grml-2020-06 ~ # dmesg | tail -1
[   66.342976] XFS (loop1): stripe width (512) must be a multiple of the stripe unit (1024)
root@grml-2020-06 ~ # mount -t xfs -o rw,relatime,attr2,inode64,sunit=512,swidth=512,noquota ./sdd1.dd /mnt/
mount: /mnt: mount(2) system call failed: Structure needs cleaning.
32 root@grml-2020-06 ~ # dmesg | tail -14
[   66.342976] XFS (loop1): stripe width (512) must be a multiple of the stripe unit (1024)
[   80.751277] XFS (loop1): SB stripe unit sanity check failed
[   80.751323] XFS (loop1): Metadata corruption detected at xfs_sb_read_verify+0x102/0x170 [xfs], xfs_sb block 0xffffffffffffffff 
[   80.751324] XFS (loop1): Unmount and run xfs_repair
[   80.751325] XFS (loop1): First 128 bytes of corrupted metadata buffer:
[   80.751327] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 62 00  XFSB..........b.
[   80.751328] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[   80.751330] 00000020: 32 b6 dc 35 53 b7 44 96 9d 63 30 ab b3 2b 68 36  2..5S.D..c0..+h6
[   80.751331] 00000030: 00 00 00 00 00 00 40 08 00 00 00 00 00 00 01 00  ......@.........
[   80.751331] 00000040: 00 00 00 00 00 00 01 01 00 00 00 00 00 00 01 02  ................
[   80.751332] 00000050: 00 00 00 01 00 00 18 80 00 00 00 04 00 00 00 00  ................
[   80.751333] 00000060: 00 00 06 48 bd a5 10 00 08 00 00 02 00 00 00 00  ...H............
[   80.751334] 00000070: 00 00 00 00 00 00 00 00 0c 0c 0b 01 0d 00 00 19  ................
[   80.751338] XFS (loop1): SB validate failed with error -117.
Also, xfs_repair doesn't help either:
root@grml-2020-06 ~ # xfs_info ./sdd1.dd
meta-data=./sdd1.dd              isize=2048   agcount=4, agsize=6272 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=25088, imaxpct=25
         =                       sunit=128    swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=1608, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
root@grml-2020-06 ~ # xfs_repair ./sdd1.dd
Phase 1 - find and verify superblock...
bad primary superblock - bad stripe width in superblock !!!
attempting to find secondary superblock...
..............................................................................................Sorry, could not find valid secondary superblock
Exiting now.
With the "SB stripe unit sanity check failed" message, we could easily track this down to the following commit fa4ca9c:
% git show fa4ca9c5574605d1e48b7e617705230a0640b6da | cat
commit fa4ca9c5574605d1e48b7e617705230a0640b6da
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Jun 5 10:06:16 2018 -0700
    
    xfs: catch bad stripe alignment configurations
    
    When stripe alignments are invalid, data alignment algorithms in the
    allocator may not work correctly. Ensure we catch superblocks with
    invalid stripe alignment setups at mount time. These data alignment
    mismatches are now detected at mount time like this:
    
    XFS (loop0): SB stripe unit sanity check failed
    XFS (loop0): Metadata corruption detected at xfs_sb_read_verify+0xab/0x110, xfs_sb block 0xffffffffffffffff
    XFS (loop0): Unmount and run xfs_repair
    XFS (loop0): First 128 bytes of corrupted metadata buffer:
    0000000091c2de02: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 10 00  XFSB............
    0000000023bff869: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00000000cdd8c893: 17 32 37 15 ff ca 46 3d 9a 17 d3 33 04 b5 f1 a2  .27...F=...3....
    000000009fd2844f: 00 00 00 00 00 00 00 04 00 00 00 00 00 00 06 d0  ................
    0000000088e9b0bb: 00 00 00 00 00 00 06 d1 00 00 00 00 00 00 06 d2  ................
    00000000ff233a20: 00 00 00 01 00 00 10 00 00 00 00 01 00 00 00 00  ................
    000000009db0ac8b: 00 00 03 60 e1 34 02 00 08 00 00 02 00 00 00 00  ... .4..........
    00000000f7022460: 00 00 00 00 00 00 00 00 0c 09 0b 01 0c 00 00 19  ................
    XFS (loop0): SB validate failed with error -117.
    
    And the mount fails.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>
diff --git fs/xfs/libxfs/xfs_sb.c fs/xfs/libxfs/xfs_sb.c
index b5dca3c8c84d..c06b6fc92966 100644
--- fs/xfs/libxfs/xfs_sb.c
+++ fs/xfs/libxfs/xfs_sb.c
@@ -278,6 +278,22 @@ xfs_mount_validate_sb(
                return -EFSCORRUPTED;
        }

+       if (sbp->sb_unit) {
+               if (!xfs_sb_version_hasdalign(sbp) ||
+                   sbp->sb_unit > sbp->sb_width ||
+                   (sbp->sb_width % sbp->sb_unit) != 0) {
+                       xfs_notice(mp, "SB stripe unit sanity check failed");
+                       return -EFSCORRUPTED;
+               }
+       } else if (xfs_sb_version_hasdalign(sbp)) {
+               xfs_notice(mp, "SB stripe alignment sanity check failed");
+               return -EFSCORRUPTED;
+       } else if (sbp->sb_width) {
+               xfs_notice(mp, "SB stripe width sanity check failed");
+               return -EFSCORRUPTED;
+       }
+
        if (xfs_sb_version_hascrc(&mp->m_sb) &&
            sbp->sb_blocksize < XFS_MIN_CRC_BLOCKSIZE) {
                xfs_notice(mp, "v5 SB sanity check failed");
This change is included in kernel versions 4.18-rc1 and newer:
% git describe --contains fa4ca9c5574605d1e48
v4.18-rc1~37^2~14
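As an aside, a rough way to check whether a running kernel already ships this stricter validation is to compare its version against 4.18, where the commit first landed. A small sketch, assuming a plain version comparison is good enough:

# Sketch: does the running kernel include the stricter XFS stripe checks (v4.18+)?
kver=$(uname -r | cut -d. -f1,2)
if [ "$(printf '%s\n4.18\n' "$kver" | sort -V | head -n1)" = "4.18" ] ; then
  echo "kernel $kver: stripe sanity checks present, affected XFS partitions won't mount"
else
  echo "kernel $kver: predates the stripe sanity checks, mounting should still work"
fi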
Now let's try with an older kernel version (4.9.0), using the old Grml 2017.05 release:
root@grml ~ # grml-version
grml64-small 2017.05 Release Codename Freedatensuppe [2017-05-31]
root@grml ~ # uname -a
Linux grml 4.9.0-1-grml-amd64 #1 SMP Debian 4.9.29-1+grml.1 (2017-05-24) x86_64 GNU/Linux
root@grml ~ # lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 9.0 (stretch)
Release:        9.0
Codename:       stretch
root@grml ~ # grml-hostname grml-2017-05
Setting hostname to grml-2017-05: done
root@grml ~ # exec zsh
root@grml-2017-05 ~ #
root@grml-2017-05 ~ # xfs_info ./sdd1.dd
xfs_info: ./sdd1.dd is not a mounted XFS filesystem
1 root@grml-2017-05 ~ # xfs_repair ./sdd1.dd
Phase 1 - find and verify superblock...
bad primary superblock - bad stripe width in superblock !!!
attempting to find secondary superblock...
..............................................................................................Sorry, could not find valid secondary superblock
Exiting now.
1 root@grml-2017-05 ~ # mount ./sdd1.dd /mnt
root@grml-2017-05 ~ # mount -t xfs
/root/sdd1.dd on /mnt type xfs (rw,relatime,attr2,inode64,sunit=1024,swidth=512,noquota)
root@grml-2017-05 ~ # ls /mnt
activate.monmap  active  block  block_uuid  bluefs  ceph_fsid  fsid  keyring  kv_backend  magic  mkfs_done  ready  require_osd_release  systemd  type  whoami
root@grml-2017-05 ~ # xfs_info /mnt
meta-data=/dev/loop1             isize=2048   agcount=4, agsize=6272 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0 rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=25088, imaxpct=25
         =                       sunit=128    swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=1608, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
Mounting there indeed works! Now, if we mount the filesystem with new and proper sunit/swidth settings using the older kernel, it should rewrite them on disk:
root@grml-2017-05 ~ # mount -t xfs -o sunit=512,swidth=512 ./sdd1.dd /mnt/
root@grml-2017-05 ~ # umount /mnt/
And indeed, mounting this rewritten filesystem then also works with newer kernels:
root@grml-2020-06 ~ # mount ./sdd1.rewritten /mnt/
root@grml-2020-06 ~ # xfs_info /root/sdd1.rewritten
meta-data=/dev/loop1             isize=2048   agcount=4, agsize=6272 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=25088, imaxpct=25
         =                       sunit=64    swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=1608, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
root@grml-2020-06 ~ # mount -t xfs                
/root/sdd1.rewritten on /mnt type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=512,swidth=512,noquota)
FTR: The sunit=512,swidth=512 from the xfs mount options is identical to xfs_info's output of sunit=64,swidth=64, because mount.xfs's sunit value is given in 512-byte block units (see man 5 xfs), while the xfs_info output reported here is in blocks with a block size (bsize) of 4096; so sunit = 512 * 512 bytes = 64 * 4096 bytes. mkfs uses the minimum and optimal I/O sizes for stripe unit and stripe width; you can check this e.g. via the following (note that server2, with the fixed firmware version, reports proper values, whereas server3, with the broken controller firmware, reports nonsense):
synpromika@server2 ~ % for i in /sys/block/sd*/queue/ ; do printf "%s: %s %s\n" "$i" "$(cat "$i"/minimum_io_size)" "$(cat "$i"/optimal_io_size)" ; done
[...]
/sys/block/sdc/queue/: 262144 262144
/sys/block/sdd/queue/: 262144 262144
/sys/block/sde/queue/: 262144 262144
/sys/block/sdf/queue/: 262144 262144
/sys/block/sdg/queue/: 262144 262144
/sys/block/sdh/queue/: 262144 262144
/sys/block/sdi/queue/: 262144 262144
/sys/block/sdj/queue/: 262144 262144
/sys/block/sdk/queue/: 262144 262144
/sys/block/sdl/queue/: 262144 262144
/sys/block/sdm/queue/: 262144 262144
/sys/block/sdn/queue/: 262144 262144
[...]
synpromika@server3 ~ % for i in /sys/block/sd*/queue/ ; do printf "%s: %s %s\n" "$i" "$(cat "$i"/minimum_io_size)" "$(cat "$i"/optimal_io_size)" ; done
[...]
/sys/block/sdc/queue/: 524288 262144
/sys/block/sdd/queue/: 524288 262144
/sys/block/sde/queue/: 524288 262144
/sys/block/sdf/queue/: 524288 262144
/sys/block/sdg/queue/: 524288 262144
/sys/block/sdh/queue/: 524288 262144
/sys/block/sdi/queue/: 524288 262144
/sys/block/sdj/queue/: 524288 262144
/sys/block/sdk/queue/: 524288 262144
/sys/block/sdl/queue/: 524288 262144
/sys/block/sdm/queue/: 524288 262144
/sys/block/sdn/queue/: 524288 262144
[...]
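Distilled from the listings above, here is a small sketch that flags any disk whose minimum I/O size exceeds its optimal I/O size; it reads the same sysfs attributes that blockdev reports, and the threshold logic is just the plausibility check discussed earlier:

# Sketch: flag block devices where iomin > ioopt (as seen on server1/server3)
for q in /sys/block/sd*/queue ; do
  min=$(cat "$q/minimum_io_size")
  opt=$(cat "$q/optimal_io_size")
  [ "$opt" -eq 0 ] && continue  # many devices legitimately report 0 here
  if [ "$min" -gt "$opt" ] ; then
    echo "$q: minimum_io_size ($min) > optimal_io_size ($opt), suspicious!"
  fi
done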
This is the underlying reason why the initially created XFS partitions got incorrect sunit/swidth settings: the broken firmware of server1 and server3 was the cause, and the incorrect settings were ignored by old(er) XFS/kernel versions, but treated as an error by new ones. Make sure to also read the XFS FAQ regarding "How to calculate the correct sunit,swidth values for optimal performance". We also stumbled upon two interesting reads in Red Hat's knowledge base: 5075561 + 2150101 (requires an active subscription, though) and #1835947.
Am I affected? How to work around it?
To check whether your XFS mount points are affected by this issue, the following command line should be useful:
awk '$3 == "xfs" {print $2}' /proc/self/mounts | while read mount ; do echo -n "$mount " ; xfs_info $mount | awk '$0 ~ "swidth" {gsub(/.*=/,"",$2); gsub(/.*=/,"",$3); print $2,$3}' | awk '{ if ($1 > $2) print "impacted"; else print "OK" }' ; done
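The same check, spelled out a bit more readably (a sketch doing exactly what the one-liner above does):

# For every mounted XFS filesystem, compare sunit and swidth as reported
# by xfs_info (both in bsize blocks) and flag mounts where sunit > swidth.
awk '$3 == "xfs" {print $2}' /proc/self/mounts | while read -r mnt ; do
  geom=$(xfs_info "$mnt" | awk '$0 ~ "swidth" {gsub(/.*=/,"",$2); gsub(/.*=/,"",$3); print $2, $3}')
  sunit=${geom% *}
  swidth=${geom#* }
  if [ "$sunit" -gt "$swidth" ] ; then
    echo "$mnt: impacted (sunit=$sunit > swidth=$swidth)"
  else
    echo "$mnt: OK"
  fi
done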
If you run into the above situation, the only known solution to get your original XFS partition working again is to boot into an older kernel version (4.17 or older), mount the XFS partition with correct sunit/swidth settings, and then boot back into your new system (kernel-version-wise).
Lessons learned
Thanks: Darshaka Pathirana, Chris Hofstaedtler and Michael Hanscho. Looking for help with your IT infrastructure? Let us know!

26 March 2021

Benjamin Mako Hill: The Free Software Foundation and Richard Stallman

I served as a director and as a voting member of the Free Software Foundation for more than a decade. I left both positions over the last 18 months and currently have no formal authority in the organization. So although it is now just my personal opinion, I will publicly add my voice to the chorus of people who are expressing their strong opposition to Richard Stallman's return to leadership in the FSF and to his continued leadership in the free software movement. The current situation makes me unbelievably sad. I stuck around the FSF for a long time (maybe too long) and worked hard (I regret I didn't accomplish more) to try and make the FSF better, because I believe that it is important to have voices advocating for social justice inside our movement's most important institutions. I believe this is especially true when one is unhappy with the existing state of affairs. I am frustrated and sad that I concluded I could no longer be part of any process of organizational growth and transformation at the FSF. I have nothing but compassion, empathy, and gratitude for those who are still at the FSF, especially the staff, who are continuing to work silently toward making the FSF better under intense public pressure. I still hope that the FSF will emerge from this as a better organization.
