28 June 2021

Shirish Agarwal: Indian Capital Markets, BSE, NSE

I had been meaning to write on the above topic for almost a couple of months now but just kept procrastinating about it. Push came to shove when Sucheta Dalal and Debasis Basu shared their understanding, wisdom, and all in the new book called "Absolute Power: Inside story of the National Stock Exchange's amazing success, leading to hubris, regulatory capture and algo scam". Now, I will not go into the details of the new book, as I have not bought it yet. But even if I had bought it and shared some of the revelations from it, it wouldn't have done justice to either the book or what it is sharing without first knowing some of the background.

Before I jump ahead, I would suggest people read my sort of introductory blog post on banking history so they know where I'm coming from. I'm going to deviate a bit from Banking as this is about trade and capital markets, although Banking will come in later on. And I will also be sharing some cultural insights along with history so people are aware of why things happened the way they did.

Calicut, Calcutta, Kolkata, one-time major depot around the world
Now, one cannot start any topic about trade without talking about Kolkata. While today it seems like a bastion of communism, at one time it was one of the major trade depots around the world. Both William Dalrymple and the Chinese have many times mentioned Kolkata as being one of the major centers of trade. This was between the 13th and the late 19th century. A cursory look throws up this article which talks about Kolkata, or Calicut as it was known, as a major trade depot. There are of course many, many articles and even books which tell how Kolkata was a major trade depot. Now between the 13th and 19th century, a lot of changes happened which made Kolkata poorer and shifted trade to Mumbai/Bombay, which in those times was nothing but a port city like many others.

The Rise of the Zamindar
In the early 16th century, when Babur invaded Hindustan, he realized that Hindustan was too big a country to be governed alone. And Hindustan was much broader than independent India today. So he created the title of Zamindars. Interestingly, if you look at the Mughal period, they were much more in tune with Hindustani practices than the British who came later. They used the caste divisions and hierarchy wisely, making sure that the status quo was maintained as far as castes/creed were concerned. While in-fighting with various rulers continued, it was more or less about land and power rather than anything else. When the Britishers came, they co-opted the same arrangement with a minor adjustment. Under the earlier system, the zamindars didn't have the power to be landowners; the Britishers gave them land ownership. A huge percentage of these zamindars, especially in Bengal, were from my own caste, Banias or Baniyas. The problem for the Britishers was that this was a large land to control and exploit, and the number of British officers and nobles was very small. So their solution was to give a lot of powers to the Banias. The only thing the British insisted on was very high rents from the newly minted Zamindars. The Zamindars in turn used the powers of personal fiefdom to give loans at very high interest rates. When the poor were unable to pay the interest, they would take the land, while at the same time slavery was forced on both men and women, many a time along with rapes and affairs. While there have been many records shedding light on it, I don't think any could be more powerful than what was enacted and shared by Shabana Azmi in Ankur: The Seedling. Another prominent grouping formed around the same time was the Bhadralok. Now, as shared, the Bhadralok, while having all the amenities of belonging to the community, turned a blind eye to the excesses being done by the Zamindars. How much they played a hand in the decimation of Bengal has been a matter of debate, but they did have a hand; that much is not contested.

The Rise of Stock Exchanges
Sadly and interestingly, many people believe and continue to believe that stock exchanges are a recent phenomenon. The first stock exchange, though, was the Calcutta Stock Exchange rather than the Bombay Stock Exchange. How valuable Calcutta was to the Britishers in its early years can be gauged from the fact that it was made the capital of India in 1772. In fact, after the Grand Trunk Road (after which even trains have been named in both countries), any number of books have been written about the trade between Calcutta and Peshawar (now in Pakistan). And it was not just limited to trade but also included cultural give-and-take between the two centers. Even today, if you look at YT (YouTube) and look up some interviews of old people, you find many interesting anecdotes of people sharing both culture and trade.

The problem of the 60s and rise of BSE
After India became independent and the Constitutional debates happened, the new elites understood that there cannot be two power centers that could govern India. On one hand were the politicians who had come to power on the back of the popular vote; on the other were the Zamindars, who more often than not had abused their powers, which resulted in widespread poverty. The Britishers are to blame, but so are the middlemen, as they became willing enablers of the same system of oppression. Hence, you had the 1951 amendment to the Constitution and the 1956 Zamindari Abolition Act. In fact, you can find a much more in-depth article about both the Zamindars and their final abolition here. Now once Zamindari was gone, there was nothing to replace it with. The Zamindars, ousted from their old roles, turned and tried to become Industrialists. The problem was that the poor and the downtrodden had already had experiences with the Zamindars. Also, some Industrialists from the North and West came to Bengal, but they had no understanding of either the language or the culture of what had happened in Bengal. And notice that I have not talked about the famines and the floods that have wrecked Bengal since time immemorial, some of which got etched on the soul of Bengal and have left marks even today. The psyche of the Bengali and the Bhadralok has gone through enormous shifts. I have met quite a few and do see the guilt they feel. If one wonders how socialist parties are able to hold power in Bengal, look no further than Tarikh, which shares with you how many Bengalis still feel somewhat lost even today.

The Rise of BSE
Now, the Kolkata Stock Exchange had been going down for multiple reasons other than those listed above. From the 1950s onwards Jawaharlal Nehru had this idea of 5-year plans, borrowed from socialist countries such as Russia, China etc. His vision and ambition for the newly minted Indian state were huge, while at the same time he understood we were poor. There was the loot by the East India Company and the Britishers, and on top of that the division of wealth with Pakistan, even though the majority of Muslims chose to remain with India. Travel on Indian Railways was a risky affair. My grandfather had shared numerous tales where he used to fill money in socks and put the socks on in boots when going between either Delhi and Kolkata or Pune and Kolkata. Also, as the Capital became Delhi, which it unofficially had been for many years, the transparency from Kolkata-based firms became less. So many Kolkata firms were either mismanaged or shut down, while Maharashtra, my own state, saw a huge boom in Industrialization as well as farming. From the 1960s to the 1990s there were many booms and busts in the stock exchanges, but most were manageable.

While the 60s began on a good note as Goa was finally freed from the Portuguese army and influence, the 1962 war with the Chinese made many a soul question where we went wrong. Jawaharlal Nehru went all over the world to ask for help but had to return home empty-handed. Bollywood showed a world of bell-bottoms and cars and whatnot, while the majority were still trying to figure out how to put two square meals on the table. India suffered one of the worst famines in those times. People had to ration food. Families made do with either one meal or just roti (flatbread) rather than rice. In Bengal, things were much more severe. There were huge milk shortages, so Bengalis were told to cut down on sweets. This enraged the Bengalis as nothing else could. Note: if one wants to read how bad Indians felt at that time, all one has to read is V.S. Naipaul's An Area of Darkness.

This was also the time when quite a few Indians took their first step out of India. While Air India had just started, the fares were prohibitive. Those who were not well off either worked on ships or went via passenger or cargo ships to Dubai, Qatar and the rest of the Middle East. Some went to Russia and some even to the States. While today's émigrés want to settle in the West forever and have their children and grandchildren grow up in the West, in the 1960s and 70s the idea was far different. The main purpose for a vast majority was to get jobs and whatnot, save maximum money and send it back to India as a remittance. The idea was to make enough money in 3-5-10 years, come back to India, and then lead a comfortable life. Sadly, there has hardly been any academic work done in India, at least to my knowledge, to document the sacrifices made by Indians in search of jobs, life, purpose, etc. in the 1960s and 1970s.

The 1970s was also when alternative cinema started its journey with people like Smita Patil and Naseeruddin Shah, who portrayed people's struggles on-screen. Most of these movies didn't have commercial success because the movies and the stories were bleak. While the acting was superb, most Indians loved to be captured by fights, car-chases, and whatnot rather than the dreary existence which they had. And the alt cinema forced them to look into the mirror, which was frowned upon both by the masses and the classes. So cinema which could have been a wake-up call for a lot of Indians failed. One of the most notable works of that decade, at least to me, was Manthan. 1961 was also marked by the launch of Economic Times and Financial Express, which tells that there was some appetite for financial news and understanding.

The 1970s was also a very turbulent time in the corporate sector and stock exchanges. Again, the companies which were listed were run by the very well-off, and many of them had been abroad. At the same time, you had fly-by-night operators. One of the happenings which started in this decade is that you had corporate wars and hostile takeovers, quite a few of which could well have a Web series or two of their own.

This was also a decade marked by huge labor unrest, which again changed the face of Bombay/Mumbai. From the 1950s till the 1970s, Bombay was known for its mills. So large migrant communities from all over India came to Bombay to become the next Bollywood star, and if that didn't happen, they would get jobs in the mills. Bombay/Mumbai has/had this unique feature that somehow you will make money to make ends meet. Of course, with the pandemic, even that has gone for a toss. Labor unrest was a defining character of that decade. Three movies, Kaala Patthar, Kalyug, and Ankush, give a broad outlook of what happened in that decade. One thing which was present then and is omnipresent now is how time and time again we lost our demographic dividend. Again there was an exodus of young people who ventured out to seek fortunes elsewhere.

The 1970s and 80s were also famous for the license Raj which they brought in. Just like with the Soviets, there were waiting periods for everything. A telephone line meant waiting anywhere from 4 to 8 years. In 1987, when we applied and got a phone within 2-3 months, most of my relatives from both my mother's and father's side could not believe we paid 0 to get a telephone line. We did pay the telephone guy INR 10/-, which was a somewhat princely sum, when he was installing it. Even then they could not believe it, as in Northern India you couldn't get a phone line even if your number had come. You had to pay anywhere from INR 500/1000 or more to get a line. This was BSNL, and to reiterate, there were no alternatives at that time.

The 1990s and the Harshad Mehta Scam
The 90s was when I was a teenager. You do all the stupid things for love, lust, whatever. That is also the time you are really introduced to the world of money. During my time, there were only three choices: Sciences, Commerce, and Arts. If History was your favorite subject then you would take Arts, and if it was not, and you were not studious, then you would take up Commerce. This is how careers were chosen. So I enrolled in Commerce. As my grandfather and family on my mother's side were interested in stocks both as a saving and compounding tool, I was able to see the Pune Stock Exchange in action one day. The only thing I remember from that day is people shouting loudly with various chits. I had no idea that deals of maybe thousands or even lakhs were being made. The Pune Stock Exchange had been newly minted. I also participated in a couple of mock stock exchanges and came to understand that one has to be aggressive in order to win. You had to be really loud to be heard over others; you could not afford to be shy. Also, spread your risks. Sadly, nothing about the stock markets was there in the syllabus.

1991 was also when we saw the Iraq war and the balance of payments crisis in India, and we didn't know that the Harshad Mehta scam was around the corner. Most of the scams in India have been caught because the person who was doing it was flashy. And this was the reason that even he was caught, by Ms. Sucheta Dalal, a young beat reporter from Indian Express who had been covering the Indian stock market. Many of her articles were thought-provoking. Now, a brief understanding is required before we actually get to the scam. Because of the 1991 balance of payments crisis, the IMF rescued India on the condition that India throw its market open. In the 1980s itself, Rajiv Gandhi had wanted to partially open India up, but both politicians and Industrialists advised him not to do so, as we were not ready.

On 21st May 1991, Rajiv Gandhi was assassinated by the LTTE. A month later, due to the sympathy vote, the Narasimha Rao Govt. took power. For most new Governments there is usually a honeymoon period lasting 6 months or so, till they get settled in their roles, before people start asking tough questions. It was not to be for this Govt.; trouble came immediately. The problem had been building for a few years, although in many ways our economy was better than it is today. The only thing India didn't do well at that time was managing foreign exchange. Only a few Indians had both the money and the opportunity to go abroad, and the need for electronics was limited. One of the biggest imports then, and still today, is energy. While today it is oil/gas and electronics, at that time it was only oil. The oil import bill was ballooning while exports were more or less stagnant and mostly comprised raw materials rather than finished products. Even today it is largely this: of the biggest Industrialists in India, Ambani exports gas/oil while Adani exports coal. Anyways, the deficit was large enough to trigger a payment crisis. And Narasimha Rao had to throw open the Indian market almost overnight. Some changes became quickly apparent, while others took a long time to come.

Satellite Television and Entry of Foreign Banks
Almost overnight, from 1 channel we became multi-channel. Star TV (Rupert Murdoch) brought us Bold and Beautiful, while CNN broadcast the Iraq War. It was unbelievable for us that we were getting reports of what had happened 24-48 hours earlier. Fortunately or unfortunately, I was still very much a teenager and too young to understand the import of what was happening. Even in my college, except for one or two people, neither this nor even the economy was a topic for debate or talk. We were basically somehow cocooned in our own little world. But this was not the case for the rest of India, and especially banks. The entry of foreign banks was a rude shock to Indian banks. The foreign banks were bringing both technology and sophistication in their offerings, and Indian banks needed and wanted fast money to show hefty profits. Demand for credit wasn't much, at least nowhere near the level it is today. At the same time, default on credit was nowhere near as high as it is today. But that will require its own space and article.

To quench the banks' thirst for hefty profits, enter Harshad Mehta. At that point in time, banks were not permitted at all to invest in the securities/share market. They could only buy Government securities or bonds which had a coupon rate of say 8-10%, which was nowhere near enough to satisfy the need for hefty profits as desired by Indian banks. On top of it, that cash was blocked for a long time. Most of these Government bonds had anywhere between a 10-20 year maturity date, and some even longer. Now, one loophole in that was that the banks themselves could not buy these securities. They had to approach a registered broker of the share market who would do these transactions on their behalf. Here is where Mr. Mehta played his game. He shared both legal and illegal ways in which both the bank and he would prosper. While banking at one time was thought to be conservative and somewhat cautious, the banks agreed to his antics, either because they were too afraid that Western private banks would take that pie or for whatever other reasons they might have had.

To play the game, Harshad Mehta needed lots of cash, which the banks provided him in the guise of buying securities that were never bought; the amounts were simply transferred to his account. He actively traded stocks, at the same time formed a group, and also made the rumor mill work to his benefit. The share market is largely a reactionary market. It operates on patience, news, and the rumor mill. The effect of his shenanigans was that the price of a stock that was trading at say INR 200 reached the stratospheric height of INR 9000/- without any change in the fundamentals or outlook of the stock. His thirst didn't remain restricted to stocks; he also ventured into the unglamorous world of Govt. securities, where he started trading in large quantities. In order to attract new clients, he cultivated a fancy lifestyle. The fancy lifestyle was what caught the eye of Sucheta Dalal, and she started investigating the deals he was doing. Being a reporter, she had the advantage of getting many doors to open and getting information that otherwise would be under lock and key. On 23rd April 1992, Sucheta Dalal broke the scam.

The Impact
The impact was almost like a shock to the markets. Even today, it can be counted as one of the biggest scams in the Indian market if you adjust it for inflation. I haven't revealed much of the scam and what happened, simply because Sucheta Dalal and Debasis Basu wrote The Scam for that purpose. How do I shorten into one or two paragraphs a story and experience which has been written in roughly 300-odd pages? It is simply impossible. The impact though was severe. The Indian stock market became a bear market for two years. Sucheta Dalal was kicked out of/made to resign from Indian Express. The thing is simple: all newspapers survive on readership and advertisements, and the advertisements came from companies who were having a golden run, whether justified or not, on the bourses/Stock Exchange. For many companies, having a good number on the stock exchange was better than the company fundamentals. There was supposed to be a speedy fast-track court set up for financial crimes, but it worked only for the Harshad Mehta case and still took over 5 years. It led to the creation of NSE (National Stock Exchange). It also led to the creation of SEBI, perhaps one of the most powerful regulators, with a wide range of powers and remit, but on the ground it more often than not proved to be no more than a glorified postman. And the few times it used those powers, it used them on the wrong people, and people had to go to courts to get justice. But then this is not about SEBI, nor is this blog post about NSE. I have anyways shared about Absolute Power above, so will not repeat the link here.

The anecdotal impact was widespread. Our own family broker took the extreme step. For my grandfather on my mother's side, he was like a second son. The news of his suicide devastated my grandfather quite a bit, which we realized much later when he was diagnosed with Alzheimer's. Our family stockbroker had been punting, taking lots of cash from the market at very high rates and betting on stocks wildly as the stock market was reaching for the stars. When the market crashed, he was insolvent. How the family survived is a tale in itself. They had got married just a few years earlier and had a cute boy and girl soon after. While today both are grown-up, what the wife faced at that time only she knows. There were also quite a few shareholders who took the extreme step. The stock markets in those days were largely based on trust, and even today they are, unless you are into day-trading. So there was always some money left on the table for the share/stockbroker which would be squared off in the next deal/transaction, where again you would leave something. My grandfather once thought of going over and meeting them, and we went to the lane where their house is. Seeing the line of people who had come for recovery of loans, we turned back with a heavy heart.

There was another taboo that kinda got broken that day. The taboo was the idea that the stock market is open to scams. From 1992 to 2021 has been a cycle of scams. Even now, today, the stock market is at unnatural highs. We know for sure that a lot of hot money is rolling around, a lot of American pension funds etc. Till it works, it works; then some news or something comes and that money will be moved out. Who will be left holding the can? The Indian investors. A few days back, Ambani writes about Adani. Now while the facts shared are correct, is Adani the only one, the only company to have a small free float in the market? There are probably more than 1/4th or 1/3rd of well-respected companies who may have a similar configuration; the only problem is that it is difficult to know who the proxies are. Now if I were to reflect and compare this with either the 1960s or even the 1990s, I don't find much difference apart from the fact that the proxy is sitting in Mauritius.

At the same time, today you can speculate on almost anything. Whether it is stocks, commodities, derivatives, foreign exchange, cricket matches etc., the list is endless. Since 2014, the rise in speculation rather than investment has been dramatic, almost stratospheric. Sadly, there are no studies or even attempts made to document this. How much official and unofficial speculation there is in the market, nobody knows. Money markets have become both fluid and non-transparent. In theory, you have all sorts of regulators, but it is still very much like the Wild West. One thing to note is that even Income Tax had to change and bring in provisions to account for speculative income. So, starting from being totally illegitimate, it has become kind of legal and is part of Income Tax. And if speculation is not wrong, why not make Indian cricket officially a speculative event? That would be honest, and GOI would get part of the proceeds.

Conclusion
I wish there was some positive conclusion I could draw, but sadly there is not. Just today I read two articles about the ongoing environmental issues in Himachal Pradesh. As I had shared even earlier, the last time I visited those places was in 2011, and even at that time I was devastated to see the kind of construction going on. Jogiwara Road, which they showed, used to be flat single ground/first floor dwellings, most of which were restaurants and whatnot. I had seen the water issues both in Himachal and UT (Uttarakhand) back then, and this is when they made huge dams. In the U.S. they are removing dams, and here we want more dams.

21 June 2021

Shirish Agarwal: Accessibility, Freenode and American imperialism.

Accessibility
This is perhaps one of the strangest ways, and yet also perhaps the straightest way, to start the blog post. For the past weeks/months, a strange experience has been there. I have been using a Logitech wireless keyboard and mouse for almost a decade. Now, for the past few months and weeks we have observed a somewhat rare phenomenon. Between us we have a single desktop computer, so me and mum take turns to be on the Desktop. At times, however, the system would sit idle, and after 30 minutes it goes to low-power mode/sleep mode. Then, when you want to come back, you obviously have to give your login credentials. At times, the keyboard refuses to input any data in the login screen. Interestingly, the mouse still functions. Much more interesting is the fact that both the mouse and the keyboard use the same transceiver to send data. And I had changed batteries to ensure it was not a power issue, but still no input :(. While my mother uses and used the power switch (I did teach her how to hold it for a few minutes and then let it go), I myself tried another thing. Using the mouse, I logged off the session, thinking perhaps some race condition or something in the session was not letting the keystrokes be inputted into the system, and that having a new session might resolve it. But this was not to be. Luckily, on the screen you do have the option to reboot or power off. I did a reboot and lo and behold, the system was able to input characters again. And this has happened time and again. I tried to find GOK, having forgotten that GOK had been retired. I looked up the accessibility page on the Debian wiki. Very interesting, very detailed, but sadly it did not and does not provide the backup I needed. I tried out florence but found that the app is buggy. Moreover, the instructions provided for the lightdm screen do not work. I do not get the on-screen keyboard even though I followed the instructions. Just to be clear, this is all on Debian testing, which is gonna be Debian stable soonish. I even tried the same with xvkbd, but to no avail. I do use MATE as my desktop environment, so maybe the instructions need some refinement?

$ cat /etc/lightdm/lightdm-gtk-greeter.conf | grep keyboard
# a11y-states = states of accessibility features: name save state on exit, -name
disabled at start (default value for unlisted), +name enabled at start. Allowed names: contrast, font, keyboard, reader.
keyboard=xvkbd no-gnome focus &
# keyboard-position = x y[;width height] ( 50%,center -0;50% 25% by default) Works only for onboard
#keyboard=

Interestingly, Debian does provide two more on-screen keyboards, matchbox as well as onboard, which comes from Ubuntu. While I have both of them installed, I find xvkbd to be enough for my work; the only issue seems to be that I cannot get it from the accessibility drop-down box at the login screen. Just to make sure that I have not gone to Gnome-display manager, I did run

$ sudo dpkg-reconfigure gdm3

Only to find out that I am indeed running lightdm. So I am a bit confused why it doesn't come up as an option when I have the login window/login manager running. FWIW I do run metacity as the window manager as it plays nice with almost all of the various desktop environments I have. So this is where I'm stuck. If I do get any help, I probably would also add those instructions to the wiki page, so it would be convenient for the next person who comes with the same issue. I also need to figure out some way to know whether there is some race condition or something happening; I have no clue how I would go about it without having a whole lot of noise. I am sure there are others who may have more of an idea. FWIW, I did search unix.stackexchange as well as reddit/debian to see if I could find any meaningful posts but came up empty.
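For anyone poking at the same problem, here is a minimal sketch of how one might check which on-screen keyboard command the greeter is actually configured to launch, and whether that command can be found at all. It is not a fix, only a way to narrow things down. It assumes the Debian layout, i.e. the lightdm-gtk-greeter.conf path shown above and /etc/X11/default-display-manager as the file naming the default display manager; the function names active_keyboard and keyboard_available are made up for this illustration.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, paths are the Debian defaults.

# Print the command from the last uncommented "keyboard=" line of a greeter config.
active_keyboard() {
    conf="${1:-/etc/lightdm/lightdm-gtk-greeter.conf}"
    grep -E '^[[:space:]]*keyboard[[:space:]]*=' "$conf" \
        | tail -n 1 \
        | sed -E 's/^[^=]*=[[:space:]]*//'
}

# Succeed only if the first word of the configured command is found on PATH.
keyboard_available() {
    cmd=$(active_keyboard "$1")
    [ -n "$cmd" ] || return 1
    command -v "${cmd%% *}" >/dev/null 2>&1
}

# Show the configured default display manager, if the Debian marker file exists.
if [ -r /etc/X11/default-display-manager ]; then
    echo "display manager: $(cat /etc/X11/default-display-manager)"
fi
```

Calling active_keyboard with no argument reads the default config path, and keyboard_available returns non-zero when the keyboard line is commented out or the program is not installed, which at least separates a configuration problem from a missing package.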

Freenode
I had not been using IRC for quite some time now. The reasons have been multiple issues with Riot (now Element) taking the whole space on my desktop. I did get alerted to the whole thing about a week after it went down. Somebody sent me a DM. I *think* I put up a thread or a mini-thread about IRC or something in response to somebody praising telegram/WhatsApp or one of those apps. That probably triggered the DM. It took me a couple of minutes to hit upon this. I was angry and depressed, seeing the behavior of the new overlords of freenode. I did see that a lot of channels moved over to Libera. It was also interesting to see that some communities were thinking of moving to some other obscure platform, which again could be held hostage to the same thing. One could argue one way or the other, but that would be tiresome, and the fact is any network needs a lot of help to be grown and nurtured, whether it is online or offline. I also saw that Libera was using a software, Solanum, which is IRCv3 compliant. Now having done this initial investigation, it was time to move to an IRC client. The Libera documentation is and was pretty helpful in telling which IRC clients would be good with their network. So I first tried hexchat. I installed it and tried to add Libera server credentials; it didn't work. I did see that they had fixed the bug in sid/unstable, and now it's in testing. But at the time the bug-fix was only in sid, and I wanted to have something which just ran the damn thing. I chanced upon quassel. I had played around with quassel quite a number of times before, so I knew I could play with/use it. Hence, I installed it and was able to use it on the first try. I did use the encrypted server and just had to tweak some settings, with some help from their documentation, before I could use it. Although I have to say that even quassel upstream needs to get its documentation in order. It is just all over the place, and they haven't put any effort into streamlining the documentation so that finding things becomes easier. But that can be said of many projects upstream. There is one thing though that all of these IRC clients lack: a password manager. Now till that is fixed it will suck, because you need another secure place to put your password/s. You either put it on your desktop somewhere (insecure) or store it in the cloud somewhere (somewhat secure, but again you need to remember that password); whatever you do is extra work. I am sure there will be a day when authenticating with NickServ will be an automated task and people can just get on with talking on channels and figuring out how to be part of the various communities. As can be seen, even now there is a bit of a learning curve for both newbies and people who know a bit about systems to get it working. Now, I know there are a lot of things that need to be fixed in the anonymity and security space if I put on that sort of hat. For e.g. wouldn't it be cool if either the IRC client or one of its add-ons gave throwaway usernames and passwords? The passwords would be complex. This would make it easier for those who are paranoid about security, and many are. As an example, we can look at Fuchs. Now if the gentleman or lady is working in a professional capacity, and their employer came to know of their real identity and perceived, rightly or wrongly, the role of that person, it would affect their career. Now, should it? I am sure a lot of people would be divided on the issue. Personally, as far as I am concerned, I would say no, because whether right or wrong, whatever they were doing they were doing on their own time. Not on company time. So it doesn't concern the company at all. If we were to let companies police behavior outside work time, individuals would be in a lot of trouble. Although I have to say that this is a trend that has been seen in companies that are firing people either on the left or right. A recent example that comes to mind is Emily Wilder, who was fired by Associated Press. Interestingly, she was interviewed by Democracy Now, and it did come out that she is a Jew. As can be seen and understood, there is a lot of nuance to her story, and not in the way she was fired. It doesn't leave a good taste in the mouth, but then nobody's firing does. On a few forums, people did share stories of people (cops) getting fired from their jobs because they were dancing. Again, it all depends; for me, hats off to anybody who feels like dancing or whatever, because there are just so many depressing stories all around.
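Going back to the idea of throwaway usernames and passwords for IRC, here is a rough sketch of what such a helper could look like outside of any client. It is only an illustration and not something hexchat, quassel or Libera provides: the function names, the "guest" prefix and the lengths are all my own assumptions, and it relies on /dev/urandom, head -c, base64, tr and cut being available, as they are on a typical Debian install.

```shell
#!/bin/sh
# Sketch only: function names, prefix and lengths are illustrative assumptions.

# Print $2 random characters drawn from the tr character set given in $1.
random_chars() {
    head -c 256 /dev/urandom | base64 | LC_ALL=C tr -dc "$1" | cut -c "1-$2"
}

# Short throwaway nick: fixed prefix plus 8 random lowercase letters or digits.
throwaway_nick() {
    printf 'guest%s\n' "$(random_chars 'a-z0-9' 8)"
}

# Throwaway password: 24 random letters and digits.
throwaway_password() {
    printf '%s\n' "$(random_chars 'A-Za-z0-9' 24)"
}
```

One would still need somewhere to keep the generated pair for the lifetime of the nick, so this does not remove the need for a password manager; it only takes away the work of inventing the credentials.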

Banned and FOE

On a few forums I was banned because I was talking about Brexit and American imperialism, both of which seem to ruffle a few feathers in quite a few places. For instance, many people, for obvious reasons, do not like this video

Now I'm sorry I am not able to, and have not been able to, give Invidious links for the past few months. The reason being that Invidious itself went through some changes, and the changes are good and bad. For e.g., now you need to share your Google ID with a third party, which at least to my mind is not a good idea. But that probably is another story altogether and it probably will need its own place. Coming back to the video itself, this was shared by Anthony Hazard and the title is "The Atlantic slave trade: What too few textbooks told you". I did see this video quite a few years ago and still find it hard to swallow that tens of millions of Africans were brought as slaves to the Americas, although to be fair it does start with the Spanish settlement in the land which would be called the U.S., but they brought slaves with them. They even took the American natives, i.e. people from the different tribes which made up America at that point. One point to note is that the U.S. got its independence on July 4, 1776, so all the people before that were called European settlers for want of a better word. Some or many of these European settlers would be convicts who were sent from the UK. But as shared in the article, that discussion would only happen when the U.S. itself is mature and open enough for it. Going back to the original point though, these European or American settlers bought a lot of slaves from Africa. The video does also shed some light on the cruelty the Europeans or Americans inflicted on the slaves, men and women, in different ways. The most revelatory part, though, which I also forget many a time, is that because a lot of people were taken from Africa, and many of them men, it did lead to imbalances in the African societies, not just in weddings but in economics in general. It also developed a theory called Critical Race theory in which it tries to paint the Africans as an inferior race, otherwise how would Christianity work where their own good book says "All men are born equal".
That does in part explain why the African countries are still so far behind their European or American counterparts. But Africa can still be proud as they are richer than us, yup, India. Sadly, I don't think America is ready to have that conversation anytime soon, if ever. And if it were to do so, it would have to out-do any Truth and Reconciliation Committee which the world has seen. A mere apology or two would just not cut it. The problems of America sadly are not limited to just Africans but extend to the natives of the land, for e.g. the Lakota people. In 1868, they put it in writing that the land would be given back to the Lakota people forever, but then the gold rush happened. In 2007, when the Lakota stated their proposal for independence, the U.S. through its force denied it. So much for the paper it was written on. Now from what I came to know over the years, the American natives are called "First Nations". Time and time again the American Govt. has tried to be, or been, foul towards them. Some of the examples include the Yucca Mountain nuclear waste repository. The same is and was the case with the Keystone pipeline, which is now dead. Now one could say that it is America's internal matter and I would fully agree, but when they speak of internal matters of other countries, then we should have the same freedom. But this is not restricted to just internal matters, sadly. Since the 1950s, i.e. the advent of the Cold War, America's foreign policy made regime changes all around the world. Sharing some of the examples from the Cold War:

Iran 1953
Guatemala 1954
Democratic Republic of the Congo 1960
Republic of Ghana 1966
Iraq 1968
Chile 1973
Argentina 1976
Afghanistan 1978-1980s
Grenada
Nicaragua 1981-1990
1. Destabilization through CIA assets
2. Arming the Contras
El Salvador 1980-92
Philippines 1986

Even after the Cold War ended the situation was anomalous, meaning they still continued with their old behavior. After the end of the Cold War:

Guatemala 1993
Serbia 2000
Iraq 2003-
Afghanistan 2001 ongoing

There is a helpful Wikipedia article titled History of CIA which basically lists most of the covert regime changes done by the U.S. The above is merely a subset of the actions done by the U.S. Now, are all the behaviors above those of a civilized nation? And if one cares to look, one would notice that all the countries in the above list which had a regime change had either oil or precious metals. So the U.S. is and was being what it accuses China of being, a profiteer. But this isn't just the U.S.-China story but more about the American abuse of its power. My own country, India, paid IMF loans till 1991 and we paid through the nose. There were economic sanctions against India. But then, this is again not just about U.S.-India. Even with Europe, or more precisely Norway, which didn't want to side with America because their intelligence showed that no WMD were present in Iraq, the relationship still has issues.

Pandemic and the World

So I do find this whole blaming of China by the U.S. quite theatrical and full of double and triple standards. Very early during the debates, it came to light that the Spanish Flu actually originated in Kansas, U.S.

What was also interesting, as I found in the Pentagon Papers, is that much before the Watergate scandal came out the U.S. had realized that China would be more of a competitor than Russia. And this was in the 1960s itself. This shows the level of intelligence that the Americans had. From what I can recollect from whatever I have read of that era, China was still mostly an agri-based economy. So how the U.S. was able to deduce that China would surpass other economies is beyond me even now. They surely must have known something that even we today do not. One of the other interesting observations that I got while researching is that every year we transfer an average of 7500 diseases from animals to humans, and that should be a scary figure. I think more than anything else, loss of habitat and use of animals for everything from food to clothing to medicine is probably the reason we are getting such diseases. I am also sure that there probably are and have been a similar number of transfers of diseases from humans to animals as well, but for well-known biases and whatnot those studies are either not done or are under-funded. There are and have been reports of something like 850,000 undiscovered viruses which various mammals and birds have. Also, I did find that the origins of most such pandemics are hard to identify; for e.g., SARS 1 took about 15 years, and for Ebola we don't know till date where it came from. Even HIV has questions for us. Hell, even why hearing goes away is a mystery to us. In all of this, we want to say China is culpable. And while China may or may not be culpable, only time will tell; this is surely the opportunity for all countries to spend and build capacity in public health.
Countries which take lessons from it and improve their public healthcare models will hopefully not suffer as much as those who suffered and are continuing to suffer now. To those who feel that habitat loss of animals is untrue, I would suggest they see Sherni, which depicts the human/animal conflict in all its brutality. I am gonna warn in advance that the ending is not nice, but what can you expect from a country in which forest cover has constantly declined and the Govt. itself is only interested in headline management.

The only positive story I can share from India is that finally the Modi Govt. has said we will do free vaccine immunization for everybody. Although the pace is nothing to write home about. One additional thing they relaxed was instead of going to Cowin or any other portal, people could simply walk in using their identity papers. Although, given the pace of vaccinations, it is going to take anywhere between 13-18 months or more depending on availability of vaccines.

Looking forward to all and any replies have a virtual keyboard, preferably xvkbd as that is good enough for my use-case.

14 June 2021

Enrico Zini: Pipelining

This is part of a series of posts on ideas for an ansible-like provisioning system, implemented in Transilience.

Running actions on a server is nice, but a network round trip for each action is not very efficient. If I need to run a linear sequence of actions, I can stream them all to the server, and then read replies streamed from the server as they get executed. This technique is called pipelining and one can see it used, for example, in Redis, or Mitogen.

Roles

Ansible has the concept of "Roles" as a series of related tasks: I'll play with that. Here's an example role to install and set up fail2ban:
class Role(role.Role):
    def main(self):
        self.add(builtin.apt(
            name=["fail2ban"],
            state="present",
        ))
        self.add(builtin.copy(
            content=inline("""
                [postfix]
                enabled = true
                [dovecot]
                enabled = true
            """),
            dest="/etc/fail2ban/jail.local",
            owner="root",
            group="root",
            mode=0o644,
        ), name="configure fail2ban")
I prototyped roles as classes, with methods that push actions down the pipeline. If an action fails, all further actions for the same role won't be executed, and will be marked as skipped. Since skipping is applied per-role, it means that I can blissfully stream actions for multiple roles to the server down the same pipe, and errors in one role will stop executing that role and not others. Potentially I can get multiple roles going with a single network round-trip:
#!/usr/bin/python3
import sys
from transilience.system import Mitogen
from transilience.runner import Runner
@Runner.cli
def main():
    system = Mitogen("my server", "ssh", hostname="server.example.org", username="root")
    runner = Runner(system)
    # Send roles to the server
    runner.add_role("general")
    runner.add_role("fail2ban")
    runner.add_role("prosody")
    # Run until all roles are done
    runner.main()
if __name__ == "__main__":
    sys.exit(main())
That looks like a playbook, using Python as glue rather than YAML.

Decision making in roles

Besides filing a series of actions, a role may need to take decisions based on the results of previous actions, or on facts discovered from the server. In that case, we need to wait until the results we need come back from the server, and then decide if we're done or if we want to send more actions down the pipe. Here's an example role that installs and configures Prosody:
from transilience import actions, role
from transilience.actions import builtin
from .handlers import RestartProsody
class Role(role.Role):
    """
    Set up prosody XMPP server
    """
    def main(self):
        self.add(actions.facts.Platform(), then=self.have_facts)
        self.add(builtin.apt(
            name=["certbot", "python-certbot-apache"],
            state="present",
        ), name="install support packages")
        self.add(builtin.apt(
            name=["prosody", "prosody-modules", "lua-sec", "lua-event", "lua-dbi-sqlite3"],
            state="present",
        ), name="install prosody packages")
    def have_facts(self, facts):
        facts = facts.facts  # Malkovich Malkovich Malkovich!
        domain = facts["domain"]
        ctx = {
            "ansible_domain": domain
        }
        self.add(builtin.command(
            argv=["certbot", "certonly", "-d", f"chat.{domain}", "-n", "--apache"],
            creates=f"/etc/letsencrypt/live/chat.{domain}/fullchain.pem"
        ), name="obtain chat certificate")
        with self.notify(RestartProsody):
            self.add(builtin.copy(
                content=self.template_engine.render_file("roles/prosody/templates/prosody.cfg.lua", ctx),
                dest="/etc/prosody/prosody.cfg.lua",
            ), name="write prosody configuration")
            self.add(builtin.copy(
                src="roles/prosody/templates/firewall-ruleset.pfw",
                dest="/etc/prosody/firewall-ruleset.pfw",
            ), name="write prosody firewall")
    # ...
This files some general actions down the pipe, with a hook that says: when the results of this action come back, run self.have_facts(). At that point, the role can use the results to build certbot command lines, render prosody's configuration from Jinja2 templates, and file further actions down the pipe. Note that this way, while the server is potentially still busy installing prosody, we're already streaming prosody's configuration to it. If anything goes wrong with the installation of prosody's package, the role will be marked as failed and all further actions of the same role, even those filed by have_facts(), will be skipped.

Notify and handlers

In the previous example self.notify() also appears: that's my attempt to model the equivalent of Ansible's handlers. If any of the actions inside the with produce changes, then the RestartProsody role will be executed, potentially filing more actions at the end of the playbook. The runner will take care of collecting all the triggered role classes in a set, which discards duplicates, and then running the main() method of all resulting roles, which will cause more actions to be filed down the pipe.

Action conditions

Sometimes some actions are only meaningful as consequences of other actions. Let's take, for example, enabling buster-backports as an extra apt source:
        a = self.add(builtin.copy(
            owner="root",
            group="root",
            mode=0o644,
            dest="/etc/apt/sources.list.d/debian-buster-backports.list",
            content="deb [arch=amd64] https://mirrors.gandi.net/debian/ buster-backports main contrib",
        ), name="enable backports")
        self.add(builtin.apt(
            update_cache=True
        ), name="update after enabling backports",
           # Run only if the previous copy changed anything
           when={a: ResultState.CHANGED},
        )
Here we want to update Apt's cache, which is a slow operation, only after we actually write /etc/apt/sources.list.d/debian-buster-backports.list. If the file was already there from a previous run, we can skip downloading the new package lists. The when= attribute adds an annotation to the action that is sent down the pipeline, which says that it should only be run if the state of a previous action matches the given one. In this case, when on the remote it's the turn of "update after enabling backports", it gets skipped unless the state of the previous "enable backports" action is CHANGED.

Effects of pipelining

I ported enough of Ansible's modules to be able to run the provisioning scripts of my VPS entirely via ansible. This is the playbook run as plain Ansible:
$ time ansible-playbook vps.yaml
[...]
servername       : ok=55   changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
real    2m10.072s
user    0m33.149s
sys     0m10.379s
This is the same playbook run with Ansible speeded up via the Mitogen backend, which makes Ansible more bearable:
$ export ANSIBLE_STRATEGY=mitogen_linear
$ time ansible-playbook vps.yaml
[...]
servername       : ok=55   changed=1    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
real    0m24.428s
user    0m8.479s
sys     0m1.894s
This is the same playbook ported to Transilience:
$ time ./provision
[...]
real    0m2.585s
user    0m0.659s
sys     0m0.034s
Doing nothing went from 2 minutes down to 3 seconds! That's the kind of running time that finally makes me comfortable with maintaining my VPS by editing the playbook only, and never logging in to mess with the system configuration by hand!

Next steps

I'm quite happy with what I have: I can now maintain my VPS with a simple script with quick iterative cycles. I might use it to develop new playbooks, and port them to ansible only when they're tested and need to be shared with infrastructure that needs to rely on something more solid and battle tested than a prototype provisioning system. I might also keep working on it as I have more interesting ideas that I'd like to try. I feel like Ansible reached some architectural limits that are hard to overcome without a major redesign, and are in many ways hardcoded in its playbook configuration. It's nice to be able to try out new designs without that baggage. I'd love it if even just the library of Transilience actions could grow, and gain widespread use. Ansible modules standardized a set of management operations that I think became the way people think about system management, and should really be broadly available outside of Ansible. If you are interested in playing with Transilience, do get in touch or send a pull request! :) Next step: Reimagining Ansible variables.
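
To make the per-role skipping and the when= condition described above more concrete, here is a minimal, self-contained sketch of how the receiving end of a pipeline could process a queue of actions. This is a toy model and not Transilience's actual implementation: the Action dataclass, the run_pipeline function and the NOOP/SKIPPED/FAILED states are names made up for illustration (only ResultState.CHANGED appears in the post), and actions are referenced by name rather than by object.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional, Set


class ResultState(Enum):
    # Only CHANGED appears in the post; the other states are assumptions.
    NOOP = "noop"
    CHANGED = "changed"
    SKIPPED = "skipped"
    FAILED = "failed"


@dataclass
class Action:
    name: str
    role: str
    # Returns True if something changed; raises an exception on failure.
    run: Callable[[], bool]
    # Required states of earlier actions, keyed by action name.
    when: Optional[Dict[str, ResultState]] = None


def run_pipeline(actions: List[Action]) -> Dict[str, ResultState]:
    """Process queued actions in order, without waiting for the sender."""
    results: Dict[str, ResultState] = {}
    failed_roles: Set[str] = set()
    for action in actions:
        # A failure in a role skips the rest of that role only.
        if action.role in failed_roles:
            results[action.name] = ResultState.SKIPPED
            continue
        # Conditions are evaluated on the receiving side, so no extra
        # round trip is needed to decide whether to run the action.
        if action.when and any(
            results.get(name) != state for name, state in action.when.items()
        ):
            results[action.name] = ResultState.SKIPPED
            continue
        try:
            changed = action.run()
        except Exception:
            results[action.name] = ResultState.FAILED
            failed_roles.add(action.role)
            continue
        results[action.name] = ResultState.CHANGED if changed else ResultState.NOOP
    return results


def broken() -> bool:
    raise RuntimeError("package not found")


queue = [
    Action("install prosody", "prosody", broken),
    Action("write prosody configuration", "prosody", lambda: True),
    Action("enable backports", "general", lambda: False),
    Action("update after enabling backports", "general", lambda: True,
           when={"enable backports": ResultState.CHANGED}),
    Action("install fail2ban", "fail2ban", lambda: True),
]
results = run_pipeline(queue)
for name, state in results.items():
    print(f"{name}: {state.value}")
```

In this run the failing prosody action causes the rest of the prosody role to be skipped while the fail2ban role still executes, and the cache update is skipped because "enable backports" reported no change.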

9 June 2021

Enrico Zini: Ansible recurse and follow quirks

I'm reading Ansible's builtin.file sources for, uhm, reasons, and the use of follow stood out to my eyes. Reading on, not only that. I feel like the ansible codebase needs a serious review, at least in essential core modules like this one. In the file module documentation it says:
This flag indicates that filesystem links, if they exist, should be followed.
In the recursive_set_attributes implementation instead, follow means "follow symlinks to directories", but if a symlink to a file is found, it does not get followed, kind of. What happens is that ansible will try to change the mode of the symlink, which makes sense on some operating systems. And it does try to use lchmod if present. But if not, this happens:
# Attempt to set the perms of the symlink but be
# careful not to change the perms of the underlying
# file while trying
underlying_stat = os.stat(b_path)
os.chmod(b_path, mode)
new_underlying_stat = os.stat(b_path)
if underlying_stat.st_mode != new_underlying_stat.st_mode:
    os.chmod(b_path, stat.S_IMODE(underlying_stat.st_mode))
So it tries doing chmod on the symlink, and if that changed the mode of the actual file, it switches it back. I would have appreciated a comment documenting on which systems a hack like this makes sense. As it is, it opens a very short time window in which a symlink attack can make a system file vulnerable, and an exception thrown by the second stat will make it vulnerable permanently. What about follow following links during recursion: how does it avoid loops? I don't see a cache of (device, inode) pairs visited. Let's try:
fatal: [localhost]: FAILED! => {"changed": false, "details": "maximum recursion depth exceeded", "gid": 1000, "group": "enrico", "mode": "0755", "msg": "mode must be in octal or symbolic form", "owner": "enrico", "path": "/tmp/test/test1", "size": 0, "state": "directory", "uid": 1000}
Ok, it, uhm, delegates handling that to the Python stack size. I guess it means that a ln -s .. foo in a directory that gets recursed will always fail the task. Fun!

More quirks

Turning a symlink into a hardlink is considered a noop if the symlink points to the same file:
---
- hosts: localhost
  tasks:
   - name: create test file
     file:
        path: /tmp/testfile
        state: touch
   - name: create test link
     file:
        path: /tmp/testlink
        state: link
        src: /tmp/testfile
   - name: turn it into a hard link
     file:
        path: /tmp/testlink
        state: hard
        src: /tmp/testfile
gives:
$ ansible-playbook test3.yaml
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
PLAY [localhost] ************************************************************************************************************************************************************************************************************
TASK [Gathering Facts] ******************************************************************************************************************************************************************************************************
ok: [localhost]
TASK [create test file] *****************************************************************************************************************************************************************************************************
changed: [localhost]
TASK [create test link] *****************************************************************************************************************************************************************************************************
changed: [localhost]
TASK [turn it into a hard link] *********************************************************************************************************************************************************************************************
ok: [localhost]
PLAY RECAP ******************************************************************************************************************************************************************************************************************
localhost                  : ok=4    changed=2    unreachable=0    failed=0    skipped=0    rescued=0    ignored=0
More quirks

Converting a directory into a hardlink should work, but it doesn't because unlink is used instead of rmdir:
---
- hosts: localhost
  tasks:
   - name: create test dir
     file:
        path: /tmp/testdir
        state: directory
   - name: turn it into a symlink
     file:
        path: /tmp/testdir
        state: hard
        src: /tmp/
        force: yes
gives:
$ ansible-playbook test4.yaml
[WARNING]: provided hosts list is empty, only localhost is available. Note that the implicit localhost does not match 'all'
PLAY [localhost] ************************************************************************************************************************************************************************************************************
TASK [Gathering Facts] ******************************************************************************************************************************************************************************************************
ok: [localhost]
TASK [create test dir] ******************************************************************************************************************************************************************************************************
changed: [localhost]
TASK [turn it into a symlink] ***********************************************************************************************************************************************************************************************
fatal: [localhost]: FAILED! => {"changed": false, "gid": 1000, "group": "enrico", "mode": "0755", "msg": "Error while replacing: [Errno 21] Is a directory: b'/tmp/testdir'", "owner": "enrico", "path": "/tmp/testdir", "size": 0, "state": "directory", "uid": 1000}
PLAY RECAP ******************************************************************************************************************************************************************************************************************
localhost                  : ok=2    changed=1    unreachable=0    failed=1    skipped=0    rescued=0    ignored=0
More quirks

This is hard to test, but it looks like if source and destination are hardlinks to the same inode numbers, but on different filesystems, the operation is considered a successful noop: https://github.com/ansible/ansible/blob/devel/lib/ansible/modules/file.py#L821 It should probably be something like:
if (st1.st_dev, st1.st_ino) == (st2.st_dev, st2.st_ino):
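
As an illustration of the two ideas in this post, comparing (device, inode) pairs to decide whether two paths are the same file, and keeping a cache of visited (device, inode) pairs to avoid loops while following symlinks, here is a small standalone sketch. It is not a patch for Ansible: same_file and walk_following_symlinks are hypothetical helper names, and the standard library's os.path.samefile already performs the same device-and-inode comparison.

```python
import os
import tempfile


def same_file(path1: str, path2: str) -> bool:
    """True if both paths refer to the same inode on the same device."""
    st1 = os.stat(path1)
    st2 = os.stat(path2)
    # Inode numbers are only unique within one filesystem, so the
    # device number has to be part of the comparison.
    return (st1.st_dev, st1.st_ino) == (st2.st_dev, st2.st_ino)


def walk_following_symlinks(root: str):
    """Yield directories under root, following symlinks to directories,
    but visiting each real directory only once."""
    visited = set()
    stack = [root]
    while stack:
        path = stack.pop()
        st = os.stat(path)  # os.stat follows symlinks
        key = (st.st_dev, st.st_ino)
        if key in visited:
            continue  # already seen through another path: a loop or alias
        visited.add(key)
        yield path
        with os.scandir(path) as entries:
            for entry in entries:
                if entry.is_dir(follow_symlinks=True):
                    stack.append(entry.path)


with tempfile.TemporaryDirectory() as tmp:
    sub = os.path.join(tmp, "sub")
    os.mkdir(sub)
    # The `ln -s .. foo` case from above: a link back to the parent.
    os.symlink("..", os.path.join(sub, "foo"))
    dirs = list(walk_following_symlinks(tmp))
    print(len(dirs))  # the walk terminates instead of recursing forever
```

An explicit stack also sidesteps Python's recursion limit, so a deep but loop-free tree does not fail for the wrong reason.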

4 June 2021

Matthew Garrett: Mike Lindell's Cyber "Evidence"

Mike Lindell, notable for absolutely nothing relevant in this field, today filed a lawsuit against a couple of voting machine manufacturers in response to them suing him for defamation after he claimed that they were covering up hacks that had altered the course of the US election. Paragraph 104 of his suit asserts that he has evidence of at least 20 documented hacks, including the number of votes that were changed. The citation is just a link to a video called Absolute 9-0, which claims to present sufficient evidence that the US supreme court will come to a 9-0 decision that the election was tampered with.

The claim is that Lindell was provided with a set of files on the 9th of January, and gave these to some cyber experts to verify. These experts identified them as packet captures. The video contains scrolling hex, and we are told that this is the raw encrypted data from the files. In reality, the hex values correspond very clearly to printable ASCII, and appear to just be the Pennsylvania voter roll. They're not encrypted, and they're not packet captures (they contain no packet headers).
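
As a rough illustration of the kind of check involved, the sketch below tests whether a blob starts with one of the well-known magic numbers of the pcap and pcapng capture file formats, and measures how much of it is printable ASCII. The function names and the sample bytes are made up for this example and are not data from the video; a missing file-level magic number is only a first hint, not a full analysis.

```python
import string

# Magic numbers found at the start of capture files.
CAPTURE_MAGICS = {
    bytes.fromhex("a1b2c3d4"),  # classic pcap, big-endian
    bytes.fromhex("d4c3b2a1"),  # classic pcap, little-endian
    bytes.fromhex("a1b23c4d"),  # pcap with nanosecond timestamps, big-endian
    bytes.fromhex("4d3cb2a1"),  # pcap with nanosecond timestamps, little-endian
    bytes.fromhex("0a0d0d0a"),  # pcapng section header block
}

PRINTABLE = set(string.printable.encode("ascii"))


def looks_like_capture(data: bytes) -> bool:
    """True if the data starts with a known pcap/pcapng magic number."""
    return data[:4] in CAPTURE_MAGICS


def printable_ratio(data: bytes) -> float:
    """Fraction of bytes that are printable ASCII (0.0 for empty input)."""
    if not data:
        return 0.0
    return sum(b in PRINTABLE for b in data) / len(data)


# Hypothetical hex as it might scroll by on screen.
sample = bytes.fromhex("4a6f686e2c446f652c5041")
print(looks_like_capture(sample))
print(printable_ratio(sample))
print(sample.decode("ascii"))
```

Encrypted data would be expected to look close to random, with a printable ratio far below 1.0, whereas plain text records decode cleanly.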

20 of these packet captures were then selected and analysed, giving us the tables contained within Exhibit 12. The alleged source IPs appear to correspond to the networks the tables claim, and the latitude and longitude presumably just come from a geoip lookup of some sort (although clearly those values are far too precise to be accurate). But if we look at the target IPs, we find something interesting. Most of them resolve to the website for the county that was the nominal target (eg, 198.108.253.104 is www.deltacountymi.org). So, we're supposed to believe that in many cases, the county voting infrastructure was hosted on the county website.

Unfortunately we're not given the destination port, but 198.108.253.104 isn't listening on anything other than 80 and 443. We're told that the packet data is encrypted, so presumably it's over HTTPS. So, uh, how did they decrypt this to figure out how many votes were switched? If Mike's hackers have broken TLS, they really don't need to be dealing with this.

We're also given some background information on how it's impossible to reconstruct packet captures after the fact (untrue), or that modifying them would change their hashes (true, but in the absence of known good hash values that tells us nothing), but it's pretty clear that nothing we're shown actually demonstrates what we're told it does.

In summary: yes, any supreme court decision on this would be 9-0, just not the way he's hoping for.

Update: It was pointed out that this data appears to be part of a larger dataset. This one is even more dubious - it somehow has MAC addresses for both the source and destination (which is impossible), and almost none of these addresses are in actual issued ranges.


27 May 2021

Michael Prokop: What to expect from Debian/bullseye #newinbullseye

Bullseye Banner, Copyright 2020 Juliette Taka

Debian v11 with codename bullseye is supposed to be released as new stable release soon-ish (let's hope for June, 2021! :)). Similar to what we had with #newinbuster and previous releases, now it's time for #newinbullseye! I was the driving force at several of my customers to be well prepared for bullseye before its freeze, and since then we're on good track there overall. In my opinion, Debian's release team did (and still does) a great job; I'm very happy about how unblock requests (not only mine but also ones I kept an eye on) were handled so far. As usual with major upgrades, there are some things to be aware of, and hereby I'm starting my public notes on bullseye that might be worthwhile for other folks too. My focus is primarily on server systems and looking at things from a sysadmin perspective.

Further readings

Of course start with taking a look at the official Debian release notes, and make sure to especially go through What's new in Debian 11 + Issues to be aware of for bullseye. Chris published notes on upgrading to Debian bullseye, and also anarcat published upgrade notes for bullseye.

Package versions

As a starting point, let's look at some selected packages and their versions in buster vs. bullseye as of 2021-05-27 (mainly having amd64 in mind):

Package              buster/v10   bullseye/v11
ansible              2.7.7        2.10.8
apache               2.4.38       2.4.46
apt                  1.8.2.2      2.2.3
bash                 5.0          5.1
ceph                 12.2.11      14.2.20
docker               18.09.1      20.10.5
dovecot              2.3.4        2.3.13
dpkg                 1.19.7       1.20.9
emacs                26.1         27.1
gcc                  8.3.0        10.2.1
git                  2.20.1       2.30.2
golang               1.11         1.15
libc                 2.28         2.31
linux kernel         4.19         5.10
llvm                 7.0          11.0
lxc                  3.0.3        4.0.6
mariadb              10.3.27      10.5.10
nginx                1.14.2       1.18.0
nodejs               10.24.0      12.21.0
openjdk              11.0.9.1     11.0.11+9 + 17~19
openssh              7.9p1        8.4p1
openssl              1.1.1d       1.1.1k
perl                 5.28.1       5.32.1
php                  7.3          7.4+76
postfix              3.4.14       3.5.6
postgres             11           13
puppet               5.5.10       5.5.22
python2              2.7.16       2.7.18
python3              3.7.3        3.9.2
qemu/kvm             3.1          5.2
ruby                 2.5.1        2.7+2
rust                 1.41.1       1.48.0
samba                4.9.5        4.13.5
systemd              241          247.3
unattended-upgrades  1.11.2       2.8
util-linux           2.33.1       2.36.1
vagrant              2.2.3        2.2.14
vim                  8.1.0875     8.2.2434
zsh                  5.7.1        5.8
Linux Kernel

The bullseye release will ship a Linux kernel based on v5.10 (v5.10.28 as of 2021-05-27, with v5.10.38 pending in unstable/sid), whereas buster shipped kernel 4.19. As usual there are plenty of changes in the kernel area and this might warrant a separate blog entry, but to highlight some issues:

One surprising change might be that the scrollback buffer (Shift + PageUp) is gone from the Linux console. Make sure to always use screen/tmux or handle output through a pager of your choice if you need all of it and you're in the console.

The kernel provides BTF support (via CONFIG_DEBUG_INFO_BTF, see #973870), which means it's no longer necessary to install LLVM, Clang, etc (requiring >100MB of disk space), see Gregg's excellent blog post regarding the underlying rationale. Sadly the libbpf-tools packaging didn't make it into bullseye (#978727), but if you want to use your own self-made Debian packages, my notes might be useful.

With kernel version 5.4, SUBDIRS support was removed from kbuild, so if an out-of-tree kernel module (like a *-dkms package) fails to compile on bullseye, make sure to use a recent version of it which uses M= or KBUILD_EXTMOD= instead.

Unprivileged user namespaces are enabled by default (see #898446 + #987777), so programs can create more restricted sandboxes without the need to run as root or via a setuid-root helper. If you prefer to keep this feature restricted (or tools like web browsers, WebKitGTK, Flatpak, don't work), use sysctl -w kernel.unprivileged_userns_clone=0.

The /boot/System.map file(s) no longer provide the actual data, you need to switch to the dbg package if you rely on that information:
% cat /boot/System.map-5.10.0-6-amd64 
ffffffffffffffff B The real System.map is in the linux-image-<version>-dbg package
Be aware though, that the *-dbg package requires ~5GB of additional disk space.

Systemd

systemd v247 made it into bullseye (updated from v241). Same as for the kernel this might warrant a separate blog entry, but to mention some highlights:

Systemd in bullseye activates its persistent journal functionality by default (storing its files in /var/log/journal/, see #717388).

systemd-timesyncd is no longer part of the systemd binary package itself, but available as standalone package. This allows usage of ntp, chrony, openntpd, without having systemd-timesyncd installed (which prevents race conditions like #889290, which was biting me more than once).

journalctl gained new options:
--cursor-file=FILE      Show entries after cursor in FILE and update FILE
--facility=FACILITY...  Show entries with the specified facilities
--image=IMAGE           Operate on files in filesystem image
--namespace=NAMESPACE   Show journal data from specified namespace
--relinquish-var        Stop logging to disk, log to temporary file system
--smart-relinquish-var  Similar, but NOP if log directory is on root mount
systemctl gained new options:
clean UNIT...                       Clean runtime, cache, state, logs or configuration of unit
freeze PATTERN...                   Freeze execution of unit processes
thaw PATTERN...                     Resume execution of a frozen unit
log-level [LEVEL]                   Get/set logging threshold for manager
log-target [TARGET]                 Get/set logging target for manager
service-watchdogs [BOOL]            Get/set service watchdog state
--with-dependencies                 Show unit dependencies with 'status', 'cat', 'list-units', and 'list-unit-files'
 -T --show-transaction              When enqueuing a unit job, show full transaction
 --what=RESOURCES                   Which types of resources to remove
--boot-loader-menu=TIME             Boot into boot loader menu on next boot
--boot-loader-entry=NAME            Boot into a specific boot loader entry on next boot
--timestamp=FORMAT                  Change format of printed timestamps
  • If you use systemctl edit to adjust overrides, then you'll now also get the existing configuration file listed as comment, which I consider very helpful.
  • The MACAddressPolicy behavior with systemd naming schema v241 changed for virtual devices (I plan to write about this in a separate blog post).
  • There are plenty of new manual pages:
  • systemd also gained new unit configurations related to security hardening:
  • Another new unit configuration is SystemCallLog=, which supports listing the system calls to be logged. This is very useful for auditing or temporarily when constructing system call filters.
  • The cgroupv2 change is also documented in the release notes, but to explicitly mention it also here, quoting from /usr/share/doc/systemd/NEWS.Debian.gz:
systemd now defaults to the unified cgroup hierarchy (i.e. cgroupv2).
This change reflects the fact that cgroups2 support has matured
substantially in both systemd and in the kernel.
All major container tools nowadays should support cgroupv2.
If you run into problems with cgroupv2, you can switch back to the previous,
hybrid setup by adding systemd.unified_cgroup_hierarchy=false to the
kernel command line.
You can read more about the benefits of cgroupv2 at
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html
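To check which setup a running system actually ended up with, one option is to look for the cgroup.controllers file, which only exists at the root of a cgroup2 mount. The following is a minimal Python sketch under that assumption. The function name and the returned labels are made up for illustration; /sys/fs/cgroup/unified is where systemd mounts cgroup2 in the hybrid setup.

```python
from pathlib import Path


def cgroup_mode(root="/sys/fs/cgroup"):
    """Guess the cgroup setup from the layout below `root` (illustrative helper)."""
    base = Path(root)
    # cgroup.controllers only exists at the root of a cgroup2 hierarchy.
    if (base / "cgroup.controllers").is_file():
        return "unified"
    # In the hybrid setup, cgroup2 is mounted one level down, at unified/.
    if (base / "unified" / "cgroup.controllers").is_file():
        return "hybrid"
    return "legacy-or-unknown"


if __name__ == "__main__":
    print(cgroup_mode())
```

The root directory is a parameter only so that the check can be tried against a scratch directory instead of the live system.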
Note that cgroup-tools (lssubsys + lscgroup etc) don't work in cgroup2/unified hierarchy yet (see #959022 for the details).

Configuration management

puppet's upstream doesn't provide packages for bullseye yet (see PA-3624 + MODULES-11060), and sadly neither v6 nor v7 made it into bullseye, so when using the packages from Debian you're still stuck with v5.5 (also see #950182).

ansible is also available, and while it looked like only version 2.9.16 would make it into bullseye (see #984557 + #986213), actually version 2.10.8 made it into bullseye.

chef was removed from Debian and is not available with bullseye (due to trademark issues).

Prometheus stack

Prometheus server was updated from v2.7.1 to v2.24.1, and the prometheus service by default applies some systemd hardening now. Also all the usual exporters are still there, but bullseye also gained some new ones:

Virtualization

docker (v20.10.5), ganeti (v3.0.1), libvirt (v7.0.0), lxc (v4.0.6), openstack, qemu/kvm (v5.2), xen (v4.14.1), are all still around, though what's new and noteworthy is that podman version 3.0.1 (tool for managing OCI containers and pods) made it into bullseye.

If you're using the docker packages from upstream, be aware that they still don't seem to understand Debian package version handling. The docker* packages will not be automatically considered for upgrade, as 5:20.10.6~3-0~debian-buster is considered newer than 5:20.10.6~3-0~debian-bullseye:
% apt-cache policy docker-ce
  docker-ce:
    Installed: 5:20.10.6~3-0~debian-buster
    Candidate: 5:20.10.6~3-0~debian-buster
    Version table:
   *** 5:20.10.6~3-0~debian-buster 100
          100 /var/lib/dpkg/status
       5:20.10.6~3-0~debian-bullseye 500
          500 https://download.docker.com/linux/debian bullseye/stable amd64 Packages
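The ordering follows from how Debian compares version strings: the string is split at the last hyphen into upstream version and revision, and here the revisions "buster" and "bullseye" differ at the third letter, where "s" sorts after "l". Below is a simplified Python sketch of that comparison. It is not dpkg's code, the function names are mine, and on a real system dpkg --compare-versions remains the authoritative tool.

```python
def _is_digit(ch):
    return "0" <= ch <= "9"


def _order(ch):
    """Sort weight of one character inside a non-digit run."""
    if ch == "~":
        return -1              # tilde sorts before everything, even the end
    if ch == "":
        return 0               # end of the non-digit run
    if ch.isascii() and ch.isalpha():
        return ord(ch)         # letters sort before all other characters
    return ord(ch) + 256


def _compare_part(a, b):
    """Compare two upstream versions or two revisions; returns -1, 0 or 1."""
    i = j = 0
    while i < len(a) or j < len(b):
        # First compare the non-digit runs character by character.
        while (i < len(a) and not _is_digit(a[i])) or (j < len(b) and not _is_digit(b[j])):
            ca = a[i] if i < len(a) and not _is_digit(a[i]) else ""
            cb = b[j] if j < len(b) and not _is_digit(b[j]) else ""
            if _order(ca) != _order(cb):
                return -1 if _order(ca) < _order(cb) else 1
            i += 1
            j += 1
        # Then compare the following digit runs numerically.
        si, sj = i, j
        while i < len(a) and _is_digit(a[i]):
            i += 1
        while j < len(b) and _is_digit(b[j]):
            j += 1
        na, nb = int(a[si:i] or 0), int(b[sj:j] or 0)
        if na != nb:
            return -1 if na < nb else 1
    return 0


def split_version(version):
    """Split into (epoch, upstream, revision); revision starts at the last hyphen."""
    epoch, rest = version.split(":", 1) if ":" in version else ("0", version)
    upstream, sep, revision = rest.rpartition("-")
    if not sep:
        upstream, revision = rest, ""
    return int(epoch), upstream, revision


def compare_versions(a, b):
    ea, ua, ra = split_version(a)
    eb, ub, rb = split_version(b)
    if ea != eb:
        return -1 if ea < eb else 1
    return _compare_part(ua, ub) or _compare_part(ra, rb)


print(compare_versions("5:20.10.6~3-0~debian-buster",
                       "5:20.10.6~3-0~debian-bullseye"))
```

A positive result means the first version is considered newer, which is why apt keeps the installed buster build.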
Vagrant is available in version 2.2.14, the package from upstream works perfectly fine on bullseye as well. If you're relying on VirtualBox, be aware that upstream doesn't provide packages for bullseye yet, but the package from Debian/unstable (v6.1.22 as of 2021-05-27) works fine on bullseye (VirtualBox hasn't been shipped with stable releases for quite some time due to lack of cooperation from upstream on security support for older releases, see #794466). If you rely on the virtualbox-guest-additions-iso and its shared folders support, you might be glad to hear that v6.1.22 made it into bullseye (see #988783), properly supporting more recent kernel versions like the one present in bullseye.

debuginfod

There's a new service debuginfod.debian.net (see debian-devel-announce and Debian Wiki), which makes the debugging experience way smoother. You no longer need to download the debugging Debian packages (*-dbgsym/*-dbg), but instead can fetch them on demand, by exporting the following variables (before invoking gdb or alike):
% export DEBUGINFOD_PROGRESS=1    # for optional download progress reporting
% export DEBUGINFOD_URLS="https://debuginfod.debian.net"
BTW: if you can't rely on debuginfod (for whatever reason), I'd like to point your attention towards find-dbgsym-packages from the debian-goodies package.

Vim

Sadly Vim 8.2 once again makes another change for bad defaults (hello mouse behavior!). When incsearch is set, it also applies to :substitute. This makes it veeeeeeeeeery annoying when running something like :%s/\s\+$// to get rid of trailing whitespace characters, because if there are no matches it jumps to the beginning of the file and then back, sigh. To get the old behavior back, you can use this:
au CmdLineEnter : let s:incs = &incsearch | set noincsearch
au CmdLineLeave : let &incsearch = s:incs
rsync

rsync was updated from v3.1.3 to v3.2.3. It provides various checksum enhancements (see option --checksum-choice). We got new capabilities (hardlink-specials, atimes, optional protect-args, stop-at, no crtimes) and the addition of zstd and lz4 compression algorithms. And we got new options:

OpenSSH

OpenSSH was updated from v7.9p1 to 8.4p1, so if you're interested in all the changes, check out the release notes between those versions (8.0, 8.1, 8.2, 8.3 + 8.4). Let's highlight some notable new features:

Misc unsorted

23 March 2021

Antoine Beaupré: Major email crash with syncmaildir

TL;DR: lost half my mail (150,000 messages, ~6GB) last night. Cause uncertain, but possibly a combination of a dead CMOS battery, systemd OnCalendar=daily, a (locking?) bug in syncmaildir, and generally, a system too exotic and complicated.

The crash

So I somehow lost half my mail:
anarcat@angela:~(main)$ du -sh Maildir/
7,9G    Maildir/
anarcat@curie:~(main)$ du -sh Maildir
14G     Maildir
anarcat@marcos:~$ du -sh Maildir
8,0G    Maildir
Those are three different machines:
  • angela: my laptop, not always on
  • curie: my workstation, mostly always on
  • marcos: my mail server, always on
Those mails are synchronized using a rather exotic system based on SSH, syncmaildir and rsendmail. The anomaly started on curie:
-- Reboot --
mar 22 16:13:00 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:13:00 curie smd-pull[4801]: rm: impossible de supprimer '/home/anarcat/.smd/workarea/Maildir': Le dossier n'est pas vide
mar 22 16:13:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:13:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:13:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
mar 22 16:14:00 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:14:00 curie smd-pull[7025]:  4091 ?        00:00:00 smd-push
mar 22 16:14:00 curie smd-pull[7025]: Already running.
mar 22 16:14:00 curie smd-pull[7025]: If this is not the case, remove /home/anarcat/.smd/lock by hand.
mar 22 16:14:00 curie smd-pull[7025]: any: smd-pushpull@localhost: TAGS: error::context(locking) probable-cause(another-instance-is-running) human-intervention(necessary) suggested-actions(run(kill 4091) run(rm /home/anarcat/.smd/lock))
mar 22 16:14:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:14:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:14:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
Then it seems like smd-push (from curie) started destroying the universe for some reason:
mar 22 16:20:00 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:20:00 curie smd-pull[9319]:  4091 ?        00:00:00 smd-push
mar 22 16:20:00 curie smd-pull[9319]: Already running.
mar 22 16:20:00 curie smd-pull[9319]: If this is not the case, remove /home/anarcat/.smd/lock by hand.
mar 22 16:20:00 curie smd-pull[9319]: any: smd-pushpull@localhost: TAGS: error::context(locking) probable-cause(another-instance-is-running) human-intervention(necessary) suggested-actions(ru
mar 22 16:20:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:20:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:20:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
mar 22 16:21:34 curie smd-push[4091]: default: smd-client@smd-server-anarcat: TAGS: stats::new-mails(0), del-mails(293920), bytes-received(0), xdelta-received(26995)
mar 22 16:21:35 curie smd-push[9374]: register: smd-client@smd-server-register: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(215)
mar 22 16:21:35 curie systemd[3199]: smd-push.service: Succeeded.
Notice the del-mails(293920) there: it is actively trying to destroy basically every email in my mail spool. Then somehow push and pull both started at once:
mar 22 16:21:35 curie systemd[3199]: Started push emails with syncmaildir.
mar 22 16:21:35 curie systemd[3199]: Starting push emails with syncmaildir...
mar 22 16:22:00 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:22:00 curie smd-pull[10333]:  9455 ?        00:00:00 smd-push
mar 22 16:22:00 curie smd-pull[10333]: Already running.
mar 22 16:22:00 curie smd-pull[10333]: If this is not the case, remove /home/anarcat/.smd/lock by hand.
mar 22 16:22:00 curie smd-pull[10333]: any: smd-pushpull@localhost: TAGS: error::context(locking) probable-cause(another-instance-is-running) human-intervention(necessary) suggested-actions(r
mar 22 16:22:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:22:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:22:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
mar 22 16:22:00 curie smd-push[9455]: smd-client: ERROR: Data transmission failed.
mar 22 16:22:00 curie smd-push[9455]: smd-client: ERROR: This problem is transient, please retry.
mar 22 16:22:00 curie smd-push[9455]: smd-client: ERROR: server sent ABORT or connection died
mar 22 16:22:00 curie smd-push[9455]: smd-server: ERROR: Unable to open Maildir/.kobo/cur/1498563708.M122624P22121.marcos,S=32234,W=32792:2,S: Maildir/.kobo/cur/1498563708.M122624P22121.marco
mar 22 16:22:00 curie smd-push[9455]: smd-server: ERROR: The problem should be transient, please retry.
mar 22 16:22:00 curie smd-push[9455]: smd-server: ERROR: Unable to open requested file.
mar 22 16:22:00 curie smd-push[9455]: default: smd-client@smd-server-anarcat: TAGS: stats::new-mails(0), del-mails(293920), bytes-received(0), xdelta-received(26995)
mar 22 16:22:00 curie smd-push[9455]: default: smd-client@smd-server-anarcat: TAGS: error::context(receive) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
mar 22 16:22:00 curie smd-push[9455]: default: smd-server@localhost: TAGS: error::context(transmit) probable-cause(simultaneous-mailbox-edit) human-intervention(avoidable) suggested-actions(r
mar 22 16:22:00 curie systemd[3199]: smd-push.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:22:00 curie systemd[3199]: smd-push.service: Failed with result 'exit-code'.
mar 22 16:22:00 curie systemd[3199]: Failed to start push emails with syncmaildir.
There it seems push tried to destroy the universe again: del-mails(293920). Interestingly, the push started again in parallel with the pull, right that minute:
mar 22 16:22:00 curie systemd[3199]: Starting push emails with syncmaildir...
... but didn't complete for a while, here's pull trying to start again:
mar 22 16:24:00 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:24:00 curie smd-pull[12051]: 10466 ?        00:00:00 smd-push
mar 22 16:24:00 curie smd-pull[12051]: Already running.
mar 22 16:24:00 curie smd-pull[12051]: If this is not the case, remove /home/anarcat/.smd/lock by hand.
mar 22 16:24:00 curie smd-pull[12051]: any: smd-pushpull@localhost: TAGS: error::context(locking) probable-cause(another-instance-is-running) human-intervention(necessary) suggested-actions(run(kill 10466) run(rm /home/anarcat/.smd/lock))
mar 22 16:24:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:24:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:24:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
... and the long push finally resolving:
mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: Data transmission failed.
mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: This problem is transient, please retry.
mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: server sent ABORT or connection died
mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: Data transmission failed.
mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: This problem is transient, please retry.
mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: server sent ABORT or connection died
mar 22 16:24:00 curie smd-push[10466]: smd-server: ERROR: Unable to open Maildir/.kobo/cur/1498563708.M122624P22121.marcos,S=32234,W=32792:2,S: Maildir/.kobo/cur/1498563708.M122624P22121.marcos,S=32234,W=32792:2,S: No such file or directory
mar 22 16:24:00 curie smd-push[10466]: smd-server: ERROR: The problem should be transient, please retry.
mar 22 16:24:00 curie smd-push[10466]: smd-server: ERROR: Unable to open requested file.
mar 22 16:24:00 curie smd-push[10466]: default: smd-client@smd-server-anarcat: TAGS: stats::new-mails(0), del-mails(293920), bytes-received(0), xdelta-received(26995)
mar 22 16:24:00 curie smd-push[10466]: default: smd-client@smd-server-anarcat: TAGS: error::context(receive) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
mar 22 16:24:00 curie smd-push[10466]: default: smd-server@localhost: TAGS: error::context(transmit) probable-cause(simultaneous-mailbox-edit) human-intervention(avoidable) suggested-actions(retry)
mar 22 16:24:00 curie systemd[3199]: smd-push.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:24:00 curie systemd[3199]: smd-push.service: Failed with result 'exit-code'.
mar 22 16:24:00 curie systemd[3199]: Failed to start push emails with syncmaildir.
mar 22 16:24:00 curie systemd[3199]: Starting push emails with syncmaildir...
This pattern repeats until 16:35, when that locking issue silently recovered somehow:
mar 22 16:35:03 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:35:41 curie smd-pull[20788]: default: smd-client@localhost: TAGS: stats::new-mails(5), del-mails(1), bytes-received(21885), xdelta-received(6863398)
mar 22 16:35:42 curie smd-pull[21373]: register: smd-client@localhost: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(215)
mar 22 16:35:42 curie systemd[3199]: smd-pull.service: Succeeded.
mar 22 16:35:42 curie systemd[3199]: Started pull emails with syncmaildir.
mar 22 16:36:35 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:36:36 curie smd-pull[21738]: default: smd-client@localhost: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(214)
mar 22 16:36:37 curie smd-pull[21816]: register: smd-client@localhost: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(215)
mar 22 16:36:37 curie systemd[3199]: smd-pull.service: Succeeded.
mar 22 16:36:37 curie systemd[3199]: Started pull emails with syncmaildir.
... notice that huge xdelta-received there, that's 7GB right there. Mysteriously, the curie mail spool survived this, possibly because smd-pull started failing again:
mar 22 16:38:00 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:38:00 curie smd-pull[23556]: 21887 ?        00:00:00 smd-push
mar 22 16:38:00 curie smd-pull[23556]: Already running.
mar 22 16:38:00 curie smd-pull[23556]: If this is not the case, remove /home/anarcat/.smd/lock by hand.
mar 22 16:38:00 curie smd-pull[23556]: any: smd-pushpull@localhost: TAGS: error::context(locking) probable-cause(another-instance-is-running) human-intervention(necessary) suggested-actions(run(kill 21887) run(rm /home/anarcat/.smd/lock))
mar 22 16:38:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:38:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:38:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
That could have been when I got on angela to check my mail, and it was busy doing the nasty removal stuff... although the times don't match. Here is when angela came back online:
anarcat@angela:~(main)$ last
anarcat  :0           :0               Mon Mar 22 19:57   still logged in
reboot   system boot  5.10.0-0.bpo.3-a Mon Mar 22 19:57   still running
anarcat  :0           :0               Mon Mar 22 17:43 - 18:47  (01:03)
reboot   system boot  5.10.0-0.bpo.3-a Mon Mar 22 17:39   still running
Then finally the sync on curie started failing with:
mar 22 16:46:35 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:46:42 curie smd-pull[27455]: smd-server: ERROR: Client aborted, removing /home/anarcat/.smd/curie-anarcat__Maildir.db.txt.new and /home/anarcat/.smd/curie-anarcat__Maildir.db.txt.mtime.new
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR: Failed to copy Maildir/.debian/cur/1613401668.M901837P27073.marcos,S=3740,W=3815:2,S to Maildir/.koumbit/cur/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR: The destination already exists but its content differs.
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR: To fix this problem you have two options:
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR: - rename Maildir/.koumbit/cur/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S by hand so that Maildir/.debian/cur/1613401668.M901837P27073.marcos,S=3740,W=3815:2,S
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR:   can be copied without replacing it.
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR:   Executing  cd; mv -n "Maildir/.koumbit/cur/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S" "Maildir/.koumbit/cur/1616446002.1.localhost"  should work.
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR: - run smd-push so that your changes to Maildir/.koumbit/cur/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR:   are propagated to the other mailbox
mar 22 16:46:42 curie smd-pull[27455]: default: smd-client@localhost: TAGS: error::context(copy-message) probable-cause(concurrent-mailbox-edit) human-intervention(necessary) suggested-actions(run(mv -n "/home/anarcat/.smd/workarea/Maildir/.koumbit/cur/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S" "/home/anarcat/.smd/workarea/Maildir/.koumbit/tmp/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S") run(smd-push default))
mar 22 16:46:42 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:46:42 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:46:42 curie systemd[3199]: Failed to start pull emails with syncmaildir.
It went on like this until I found the problem. This is, presumably, a good thing because those emails were not being destroyed. On angela, things looked like this:
-- Reboot --
mar 22 17:39:29 angela systemd[1677]: Started run notmuch new at least once a day.
mar 22 17:39:29 angela systemd[1677]: Started run smd-pull regularly.
mar 22 17:40:46 angela systemd[1677]: Starting pull emails with syncmaildir...
mar 22 17:43:18 angela smd-pull[3916]: smd-server: ERROR: Unable to open Maildir/.tor/new/1616446842.M285912P26118.marcos,S=8860,W=8996: Maildir/.tor/new/1616446842.M285912P26118.marcos,S=8860,W=8996: No such file or directory
mar 22 17:43:18 angela smd-pull[3916]: smd-server: ERROR: The problem should be transient, please retry.
mar 22 17:43:18 angela smd-pull[3916]: smd-server: ERROR: Unable to open requested file.
mar 22 17:43:18 angela smd-pull[3916]: smd-client: ERROR: Data transmission failed.
mar 22 17:43:18 angela smd-pull[3916]: smd-client: ERROR: This problem is transient, please retry.
mar 22 17:43:18 angela smd-pull[3916]: smd-client: ERROR: server sent ABORT or connection died
mar 22 17:43:18 angela smd-pull[3916]: default: smd-server@smd-server-anarcat: TAGS: error::context(transmit) probable-cause(simultaneous-mailbox-edit) human-intervention(avoidable) suggested-actions(retry)
mar 22 17:43:18 angela smd-pull[3916]: default: smd-client@localhost: TAGS: error::context(receive) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
mar 22 17:43:18 angela systemd[1677]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 17:43:18 angela systemd[1677]: smd-pull.service: Failed with result 'exit-code'.
mar 22 17:43:18 angela systemd[1677]: Failed to start pull emails with syncmaildir.
mar 22 17:43:18 angela systemd[1677]: Starting pull emails with syncmaildir...
mar 22 17:43:29 angela smd-pull[4847]: default: smd-client@localhost: TAGS: stats::new-mails(29), del-mails(0), bytes-received(401519), xdelta-received(38914)
mar 22 17:43:29 angela smd-pull[5600]: register: smd-client@localhost: TAGS: stats::new-mails(2), del-mails(0), bytes-received(92150), xdelta-received(471)
mar 22 17:43:29 angela systemd[1677]: smd-pull.service: Succeeded.
mar 22 17:43:29 angela systemd[1677]: Started pull emails with syncmaildir.
mar 22 17:43:29 angela systemd[1677]: Starting push emails with syncmaildir...
mar 22 17:43:32 angela smd-push[5693]: default: smd-client@smd-server-anarcat: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(217)
mar 22 17:43:33 angela smd-push[6575]: register: smd-client@smd-server-register: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(219)
mar 22 17:43:33 angela systemd[1677]: smd-push.service: Succeeded.
mar 22 17:43:33 angela systemd[1677]: Started push emails with syncmaildir.
Notice how long it took to get the first error, in that first failure: it failed after 3 minutes! Presumably that's when it started deleting all that mail. And this is during pull, not push, so the error didn't come from angela.

Affected data

It seems 2GB of mail from my main INBOX was destroyed. Another 2.4GB of spam (kept for training purposes) was also destroyed, along with 700MB of Sent mail. The rest is hard to figure out, because the folders are actually still there, just smaller. So I relied on ncdu to figure out the size changes. (Note that I don't really archive (or delete much of) my mail since I use notmuch, which is why the INBOX is so large...)

Concretely, according to the notmuch-new.service which still runs periodically on marcos, here are the changes that happened on the server:
mar 22 16:17:12 marcos notmuch[10729]: Added 7 new messages to the database. Removed 57985 messages. Detected 1372 file renames.
mar 22 16:22:43 marcos notmuch[12826]: No new mail. Removed 143842 messages. Detected 6072 file renames.
mar 22 16:27:02 marcos notmuch[13969]: No new mail. Removed 82071 messages. Detected 1783 file renames.
mar 22 16:29:45 marcos notmuch[15079]: Added 22743 new messages to the database. Detected 1 file rename.
mar 22 16:31:48 marcos notmuch[16196]: Added 22779 new messages to the database. Removed 5 messages.
mar 22 16:33:11 marcos notmuch[17192]: Added 3711 new messages to the database.
mar 22 16:40:41 marcos notmuch[19122]: Added 74558 new messages to the database. Detected 1 file rename.
mar 22 16:43:21 marcos notmuch[20325]: Added 9061 new messages to the database. Detected 4 file renames.
mar 22 17:43:08 marcos notmuch[7420]: Added 1793 new messages to the database. Detected 6 file renames.
That is basically the entire mail spool destroyed at first (283 898 messages), and then bits and pieces of it progressively re-added (134 645 messages), somehow, so 149 253 mails were lost, presumably.
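The arithmetic can be checked with a short Python sketch that sums the counters from the notmuch lines quoted above. The helper name is made up for illustration. This version counts every line, including the small "Added 7" and "Removed 5" entries, so its totals land within a handful of messages of the figures quoted above, which seem to leave those two out.

```python
import re

# The notmuch-new.service lines from marcos, as quoted above.
LOG = """\
mar 22 16:17:12 marcos notmuch[10729]: Added 7 new messages to the database. Removed 57985 messages. Detected 1372 file renames.
mar 22 16:22:43 marcos notmuch[12826]: No new mail. Removed 143842 messages. Detected 6072 file renames.
mar 22 16:27:02 marcos notmuch[13969]: No new mail. Removed 82071 messages. Detected 1783 file renames.
mar 22 16:29:45 marcos notmuch[15079]: Added 22743 new messages to the database. Detected 1 file rename.
mar 22 16:31:48 marcos notmuch[16196]: Added 22779 new messages to the database. Removed 5 messages.
mar 22 16:33:11 marcos notmuch[17192]: Added 3711 new messages to the database.
mar 22 16:40:41 marcos notmuch[19122]: Added 74558 new messages to the database. Detected 1 file rename.
mar 22 16:43:21 marcos notmuch[20325]: Added 9061 new messages to the database. Detected 4 file renames.
mar 22 17:43:08 marcos notmuch[7420]: Added 1793 new messages to the database. Detected 6 file renames.
"""


def tally(log):
    """Return (added, removed) message totals found in notmuch log lines."""
    added = sum(int(n) for n in re.findall(r"Added (\d+) new messages", log))
    removed = sum(int(n) for n in re.findall(r"Removed (\d+) messages", log))
    return added, removed


added, removed = tally(LOG)
print(f"removed={removed} added={added} net loss={removed - added}")
```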

Recovery

I disabled the services all over the place:
systemctl --user --now disable smd-pull.service smd-pull.timer smd-push.service smd-push.timer notmuch-new.service notmuch-new.timer
(Well, technically, I did that only on angela, as I thought the problem was there. Luckily, curie kept going but it seems like it was harmless.) I made a backup of the mail spool on curie:
tar cf - Maildir/ | pv -s 14G | gzip -c > Maildir.tgz
Then I crossed my fingers and ran smd-push -v -s, as that was suggested by smd error codes themselves. That thankfully started restoring mail. It failed a few times on weird cases of files being duplicates, but I resolved this by following the instructions. Or mostly: I actually deleted the files instead of moving them, which made smd even unhappier (if there ever was such a thing). I had to recreate some of those files, so, lesson learned: do follow the advice smd gives you, even if it seems useless or strange. But then smd-push was humming along, uploading tens of thousands of messages, saturating the upload in the office, refilling the mail spool on the server... yaay!... ? Except... well, of course that didn't quite work: the mail spool in the office eventually started to grow beyond the size of the mail spool on the workstation. That is what smd-push eventually settled on:
default: smd-client@smd-server-anarcat: TAGS: error::context(receive) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
default: smd-client@smd-server-anarcat: TAGS: error::context(receive) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
default: smd-client@smd-server-anarcat: TAGS: stats::new-mails(151697), del-mails(0), bytes-received(7539147811), xdelta-received(10881198)
It recreated 151 697 emails, adding about 2000 emails to the pool, kind of from nowhere at all. On marcos, before:
ncdu 1.13 ~ Use the arrow keys to navigate, press ? for help
--- /home/anarcat/Maildir ------------------------------------
    4,0 GiB [##########] /.notmuch
  717,3 MiB [#         ] /.Archives.2014
  498,2 MiB [#         ] /.feeds.debian-planet
  453,1 MiB [#         ] /.Archives.2012
  414,5 MiB [#         ] /.debian
  408,2 MiB [#         ] /.quoifaire
  389,8 MiB [          ] /.rapports
  356,6 MiB [          ] /.tor
  182,6 MiB [          ] /.koumbit
  179,8 MiB [          ] /tmp
   56,8 MiB [          ] /.nn
   43,0 MiB [          ] /.act-mtl
   32,6 MiB [          ] /.feeds.sysadvent
   31,7 MiB [          ] /.feeds.releases
   31,4 MiB [          ] /.Sent.2005
   26,3 MiB [          ] /.sage
   25,5 MiB [          ] /.freedombox
   24,0 MiB [          ] /.feeds.git-annex
   21,1 MiB [          ] /.Archives.2011
   19,1 MiB [          ] /.Sent.2003
   16,7 MiB [          ] /.bugtraq
   16,2 MiB [          ] /.mlug
 Total disk usage:   8,0 GiB  Apparent size:   7,6 GiB  Items: 184426
After:
ncdu 1.13 ~ Use the arrow keys to navigate, press ? for help
--- /home/anarcat/Maildir ------------------------------------
    4,7 GiB [##########] /.notmuch
    2,7 GiB [#####     ] /.junk
    1,9 GiB [###       ] /cur
  717,3 MiB [#         ] /.Archives.2014
  659,3 MiB [#         ] /.Sent
  513,9 MiB [#         ] /.Archives.2012
  498,2 MiB [#         ] /.feeds.debian-planet
  449,6 MiB [          ] /.Archives.2015
  414,5 MiB [          ] /.debian
  408,2 MiB [          ] /.quoifaire
  389,8 MiB [          ] /.rapports
  380,8 MiB [          ] /.Archives.2013
  356,6 MiB [          ] /.tor
  261,1 MiB [          ] /.Archives.2011
  240,9 MiB [          ] /.koumbit
  183,6 MiB [          ] /.Archives.2010
  179,8 MiB [          ] /tmp
  128,4 MiB [          ] /.lists
  106,1 MiB [          ] /.inso-interne
  103,0 MiB [          ] /.github
   75,0 MiB [          ] /.nanog
   69,8 MiB [          ] /.full-disclosure
 Total disk usage:  16,2 GiB  Apparent size:  15,5 GiB  Items: 341143
That is 156 717 files more. On curie:
ncdu 1.13 ~ Use the arrow keys to navigate, press ? for help
--- /home/anarcat/Maildir ------------------------------------------------------------------
    2,7 GiB [##########] /.junk
    2,3 GiB [########  ] /.notmuch
    1,9 GiB [######    ] /cur
  661,2 MiB [##        ] /.Archives.2014
  655,3 MiB [##        ] /.Sent
  512,0 MiB [#         ] /.Archives.2012
  447,3 MiB [#         ] /.Archives.2015
  438,5 MiB [#         ] /.feeds.debian-planet
  406,5 MiB [#         ] /.quoifaire
  383,6 MiB [#         ] /.debian
  378,6 MiB [#         ] /.Archives.2013
  303,3 MiB [#         ] /.tor
  296,0 MiB [#         ] /.rapports
  237,6 MiB [          ] /.koumbit
  233,2 MiB [          ] /.Archives.2011
  182,1 MiB [          ] /.Archives.2010
  127,0 MiB [          ] /.lists
  104,8 MiB [          ] /.inso-interne
  102,7 MiB [          ] /.register
   89,6 MiB [          ] /.github
   67,1 MiB [          ] /.full-disclosure
   66,5 MiB [          ] /.nanog
 Total disk usage:  13,3 GiB  Apparent size:  12,6 GiB  Items: 342465
Interestingly, there are more files, but less disk usage. It's possible the notmuch database there is more efficient. So maybe there's nothing to worry about. Last night's marcos backup has:
root@marcos:/home/anarcat# find /mnt/home/anarcat/Maildir | pv -l | wc -l
 341k 0:00:16 [20,4k/s] [                             <=>                                                                                                                                     ]
341040
... 341040 files, which seems about right, considering some mail was delivered during the day. An audit can be performed with hashdeep:
borg mount /media/sdb2/borg/::marcos-auto-2021-03-22 /mnt
hashdeep -c sha256 -r /mnt/home/anarcat/Maildir | pv -l -s 341k > Maildir-backup-manifest.txt
And then compared with:
hashdeep -c sha256 -k Maildir-backup-manifest.txt Maildir/
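Conceptually, such an audit boils down to hashing every file on both sides and comparing the results. The Python sketch below illustrates the idea only: it is a simplification, not hashdeep's actual logic, the function names and the four counters are my own, and it keys files by their path relative to the scanned directory.

```python
import hashlib
from pathlib import Path


def manifest(root):
    """Map relative path -> SHA-256 hex digest for every file below root."""
    root = Path(root)
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def audit(known, current):
    """Classify the files in `current` against the `known` manifest."""
    known_hashes = set(known.values())
    current_hashes = set(current.values())
    result = {"matched": 0, "moved": 0, "new": 0, "not_found": 0}
    for path, digest in current.items():
        if known.get(path) == digest:
            result["matched"] += 1      # same content at the same path
        elif digest in known_hashes:
            result["moved"] += 1        # known content under another path
        else:
            result["new"] += 1          # content the backup never saw
    # Known files whose content no longer shows up anywhere.
    result["not_found"] = sum(1 for d in known.values() if d not in current_hashes)
    return result
```

Calling audit(manifest(backup_dir), manifest(live_dir)) returns the four counters for two directory trees, where backup_dir and live_dir are placeholders for the mounted backup and the live Maildir.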
Some extra files should show up in the Maildir, and very few should actually be missing, because I shouldn't have deleted mail from the previous day the next day, or at least very few. The actual summary hashdeep gave me was:
hashdeep: Audit failed
   Input files examined: 0
  Known files expecting: 0
          Files matched: 339080
Files partially matched: 0
            Files moved: 782
        New files found: 107
  Known files not found: 106
So 107 files added, 106 missing. Seems good enough for me...

Postfix was stopped at Mar 22 21:12:59 to try and stop external events from confusing things even further. I reviewed the delivery log to see if mail that came in during the problem window disappeared:
grep 'dovecot:.*stored mail into mailbox' /var/log/mail.log |
  tail -20 |
  sed 's/.*msgid=<//;s/>.*//' |
  while read -r msgid; do
    # a count of exactly 0 means notmuch does not know this message
    notmuch count --exclude=false "id:$msgid" |
      grep -qx 0 && echo "$msgid missing";
  done
And things looked okay. Now of course if we go further back, we find mail I actually deleted (because I do do that sometimes), so it's hard to use this log as an audit trail. We can only hope that the curie spool is sufficiently coherent to be relied on. Worst case, we'll have to restore from last night's backup, but that's getting far away now: I get hundreds of mails a day in that mail spool, and resetting back to last night does not seem like a good idea.

A dry run of smd-pull on angela seems to agree that it's missing some files:
default: smd-client@localhost: TAGS: stats::new-mails(154914), del-mails(0), bytes-received(0), xdelta-received(0)
... a number of mails somewhere in between the other two, go figure. A "wet" run of this was started, without deletion (-n), which gave us:
default: smd-client@localhost: TAGS: stats::new-mails(154911), del-mails(0), bytes-received(7658160107), xdelta-received(10837609)
Strange that it sync'd three fewer emails, but that's still better than nothing, and we have a mail spool on angela again:
anarcat@angela:~(main)$ notmuch new
purging with prefix '.': spam moved (0), ham moved (0), deleted (0), done
Note: Ignoring non-mail file: /home/anarcat/Maildir//.uidvalidity
Processed 1779 total files in 26s (66 files/sec.).
Added 1190 new messages to the database. Removed 3 messages. Detected 593 file renames.
tagging with prefix '.': spam, sent, feeds, koumbit, tor, lists, rapports, folders, done.
Notice how only 1190 messages were re-added, that is because I killed notmuch before it had time to remove all those mails from its database.
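
The delivery-log check above can also be wrapped in two small functions, which makes it easier to rerun against a different window of the log. This is only a sketch: the function names are made up for illustration, and the counting command is passed in as arguments so the logic can be tried without a notmuch database. It compares the count with exactly 0, so a count such as 10 is not reported as missing.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of notmuch or dovecot.

# Read mail log lines on stdin and print one Message-ID per delivery,
# using the same sed expression as the one-liner above.
extract_msgids() {
    grep 'dovecot:.*stored mail into mailbox' |
        sed 's/.*msgid=<//;s/>.*//'
}

# Read Message-IDs on stdin and print "<msgid> missing" for each one whose
# count is exactly 0. The counting command is given as the arguments.
report_missing() {
    while read -r msgid; do
        if [ "$("$@" "id:$msgid" </dev/null)" = "0" ]; then
            echo "$msgid missing"
        fi
    done
}
```

Assuming the same log file as above, it could be used as `tail -n 2000 /var/log/mail.log | extract_msgids | report_missing notmuch count --exclude=false`.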

Possible causes

I am totally at a loss as to why smd started destroying everything like it did. But a few things come to mind:
  1. I rewired my office on that day.
  2. This meant unplugging curie, the workstation.
  3. It has a bad CMOS battery (known problem), so it jumped around the time continuum a few times, sometimes by years.
  4. The smd services are run from a systemd unit with OnCalendar=*:0/2. I have heard that it's possible that major time jumps "pile up" execution of jobs, and it seems this happened in this case.
  5. It's possible that locking in smd is not as great as it could be, and that it corrupted its internal data structures on curie, which led it to command a destruction of the remote mail spool.
It's also possible that there was a disk failure on the server, marcos. But since it's running on a (software) RAID-1 array, and no errors have been found (according to dmesg), I don't think that's a plausible hypothesis.
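
If piled-up timer runs did overlap, as items 4 and 5 speculate, one generic mitigation is to have each run take an exclusive lock and skip itself when the lock is already held. The following is a minimal sketch of that idea, not something smd provides: `run_exclusive` and the lock path are hypothetical, and it relies only on `mkdir` failing when the directory already exists.

```shell
#!/bin/sh
# run_exclusive LOCKDIR COMMAND [ARGS...]
# Run COMMAND only if LOCKDIR can be created; otherwise skip with status 75.
# Hypothetical helper for illustration, not part of smd or systemd.
run_exclusive() {
    lockdir="$1"
    shift
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "lock $lockdir is held, skipping this run" >&2
        return 75
    fi
    "$@"
    status=$?
    rmdir "$lockdir"        # release the lock whatever the command returned
    return "$status"
}
```

A timer job could then call something like `run_exclusive /tmp/smd-pull.lock smd-pull`. A crashed run would leave a stale lock behind, so this trades the risk of overlapping runs for the risk of skipped ones.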

Lessons learned
  1. follow what smd says, even if it seems useless or strange.
  2. trust but verify: just backup everything before you do anything, especially the largest data set.
  3. daily backups are not great for email, unless you're ready to lose a day of email (which I'm not).
  4. hashdeep is great. I keep finding new use cases for it. Last time it was to audit my camera SD card to make sure I didn't forget anything, and now this. it's fast and powerful.
  5. borg is great too. the FUSE mount was especially useful, and it was pretty fast to explore the backup, even through that overhead: checksumming 15GB of mail took about 35 minutes, which gives a respectable 8MB/s, probably bottlenecked by the crap external USB drive I use for backups (!).
  6. I really need to finish my backup system so that I have automated offsite backups, although in this case that would actually have been much slower (certainly not 8MB/s!).
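
As a rough check on the throughput figure in item 5, the arithmetic can be reproduced in the shell. This is a back-of-the-envelope sketch: the function name is made up for illustration, and it assumes "15GB" means 15 × 1024 MB, which the post does not specify.

```shell
#!/bin/sh
# throughput_mb_s SIZE_GB MINUTES
# Print the average throughput in MB/s, to one decimal place, for SIZE_GB
# gigabytes (taken as 1024 MB each) processed in MINUTES minutes.
throughput_mb_s() {
    awk -v gb="$1" -v min="$2" 'BEGIN { printf "%.1f\n", gb * 1024 / (min * 60) }'
}

throughput_mb_s 15 35   # prints 7.3
```

Since both inputs are approximate, that is in the same ballpark as the 8MB/s quoted above.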

Workarounds and solutions

I set up fake-hwclock on curie, so that the next power failure will not upset my clock that badly. I am thinking of switching to ZFS or BTRFS for most of my filesystems, so that I can use filesystem snapshots (including remotely!) as a backup strategy. This seems so much more powerful than crawling the filesystem for changes, and allows for truly offsite backups protected from an attacker (hopefully). But it's a long way there.

I'm also thinking of rebuilding my mail setup without smd. It's not the first time something like this has happened with smd. It's the first time I am more confident it's the root cause of the problem, however, and it makes me really nervous for the future. I have used offlineimap in the past and it seems it was finally ported to Python 3 so that could be an option again. isync/mbsync is another option, which I tried before but do not remember why I didn't switch. A complete redesign with something like getmail and/or nncp could also be an option. But alas, I lack the time to go crazy with those experiments.

Somehow, doing like everyone else and just going with Google still doesn't seem to be an option for me. Screw big tech. But I am afraid they will win, eventually. In any case, I'm just happy I got mail again, strangely.

27 February 2021

Russ Allbery: DocKnot 4.01

DocKnot is my software documentation and release management tool. This release adds support for a global user configuration file separate from the metadata for any given project and adds support for signing generated distribution tarballs with GnuPG. Currently, the only configuration options for the global configuration file are to set the destination location of generated distributions and the PGP key to use when signing them. This release also removes some now-unnecessary helper functions, fixes docknot --help, and cleans up some documentation bugs left over from the big changes in 4.00. You can get the latest release from CPAN or from the DocKnot distribution page.

21 February 2021

Russ Allbery: Review: The Fated Sky

Review: The Fated Sky, by Mary Robinette Kowal
Series: Lady Astronaut #2
Publisher: Tor
Copyright: August 2018
ISBN: 0-7653-9893-1
Format: Kindle
Pages: 380
The Fated Sky is a sequel to The Calculating Stars, but you could start with this book if you wanted to. It would be obvious you'd missed a previous book in the series, and some of the relationships would begin in medias res, but the story is sufficiently self-contained that one could puzzle through. Mild spoilers follow for The Calculating Stars, although only to the extent of confirming that book didn't take an unexpected turn, and nothing that wouldn't already be spoiled if you had read the short story "The Lady Astronaut of Mars" that kicked this series off. (The short story takes place well after all of the books.) Also some minor spoilers for the first section of the book, since I have to talk about its outcome in broad strokes in order to describe the primary shape of the novel. In the aftermath of worsening weather conditions caused by the Meteor, humans have established a permanent base on the Moon and are preparing a mission to Mars. Elma is not involved in the latter at the start of the book; she's working as a shuttle pilot on the Moon, rotating periodically back to Earth. But the political situation on Earth is becoming more tense as the refugee crisis escalates and the weather worsens, and the Mars mission is in danger of having its funding pulled in favor of other priorities. Elma's success in public outreach for the space program as the Lady Astronaut, enhanced by her navigation of a hostage situation when an Earth re-entry goes off course and is met by armed terrorists, may be the political edge supporters of the mission need. The first part of this book is the hostage situation and other ground-side politics, but the meat of this story is the tense drama of experimental, pre-computer space flight. 
For those who aren't familiar with the previous book, this series is an alternate history in which a huge meteorite hit the Atlantic seaboard in 1952, potentially setting off runaway global warming and accelerating the space program by more than a decade. The Calculating Stars was primarily about the politics surrounding the space program. In The Fated Sky, we see far more of the technical details: the triumphs, the planning, and the accidents and other emergencies that each could be fatal in an experimental spaceship headed towards Mars. If what you were missing from the first book was more technological challenge and realistic detail, The Fated Sky delivers. It's edge-of-your-seat suspenseful and almost impossible to put down. I have more complicated feelings about the secondary plot. In The Calculating Stars, the heart of the book was an incredibly well-told story of Elma learning to deal with her social anxiety. That's still a theme here but a lesser one; Elma has better coping mechanisms now. What The Fated Sky tackles instead is pervasive sexism and racism, and how Elma navigates that (not always well) as a white Jewish woman. The centrality of sexism is about the same in both books. Elma's public outreach is tied closely to her gender and starts as a sort of publicity stunt. The space program remains incredibly sexist in The Fated Sky, something that Elma has to cope with but can't truly fix. If you found the sexism in the first book irritating, you're likely to feel the same about this installment. Racism is more central this time, though. In The Calculating Stars, Elma was able to help make things somewhat better for Black colleagues. She has a much different experience in The Fated Sky: she ends up in a privileged position that hurts her non-white colleagues, including one of her best friends. The merits of taking a stand on principle are ambiguous, and she chooses not to. 
When she later tries to help Black astronauts, she does so in a way that's focused on her perceptions rather than theirs and is therefore more irritating than helpful. The opportunities she gets, in large part because she's seen as white, unfairly hurt other people, and she has to sit with that. It's a thoughtful and uncomfortable look at how difficult it is for a white person to live with discomfort they can't fix and to not make it worse by trying to wave it away or point out their own problems. That was the positive side of this plot, although I'm still a bit wary and would like to read a review by a Black reviewer to see how well this plot works from their perspective. There are some other choices that I thought landed oddly. One is that the most racist crew member, the one who sparks the most direct conflict with the Black members of the international crew, is a white man from South Africa, which I thought let the United States off the hook too much and externalized the racism a bit too neatly. Another is that the three ships of the expedition are the Niña, the Pinta, and the Santa Maria, and no one in the book comments on this. Given the thoughtful racial themes of the book, I can't imagine this is an accident, and it is in character for the United States of this novel to pick those names, but it was an odd intrusion of an unremarked colonial symbol. This may be part of Kowal's attempt to show that Elma is embedded in a racist and sexist world, has limited room to maneuver, and can't solve most of the problems, which is certainly a theme of the series. But it left me unsettled on whether this book was up to fully handling the fraught themes Kowal is invoking. The other part of the book I found a bit frustrating is that it never seriously engaged with the political argument against Mars colonization, instead treating most of the opponents of space travel as either deluded conspiracy believers or cynical villains. 
Science fiction is still arguing with William Proxmire even though he's been dead for fifteen years and out of office for thirty. The strong argument against a Mars colony in Elma's world is not funding priorities; it's that even if it's successful, only a tiny fraction of well-connected elites will escape the planet to Mars. This argument is made in the book and Elma dismisses it as a risk she's trying to prevent, but it is correct. There is no conceivable technological future that leads to evacuating the Earth to Mars, but The Fated Sky declines to grapple with the implications of that fact. There's more that I haven't remarked on, including an ongoing excellent portrayal of the complicated and loving relationship between Elma and her husband, and a surprising development in her antagonistic semi-friendship with the sexist test pilot who becomes the mission captain. I liked how Kowal balanced technical problems with social problems on the long Mars flight; both are serious concerns and they interact with each other in complicated ways. The details of the perils and joys of manned space flight are excellent, at least so far as I can tell without having done the research that Kowal did. If you want a fictionalized Apollo 13 with higher stakes and less ground support, look no further; this is engrossing stuff. The interpersonal politics and sociology were also fascinating and gripping, but unsettling, in both good ways and bad. I like the challenge that Kowal presents to a white reader, although I'm not sure she was completely in control of it. Cautiously recommended, although be aware that you'll need to grapple with a sexist and racist society while reading it. Also a content note for somewhat graphic gastrointestinal problems. Followed by The Relentless Moon. Rating: 8 out of 10

4 February 2021

John Goerzen: A Simple, Delay-Tolerant, Offline-Capable Mesh Network with Syncthing (+ optional NNCP)

A little while back, I spent a week in a remote area. It had no Internet and no cell phone coverage. Sometimes, I would drive in to town where there was a signal to get messages, upload photos, and so forth. I had to take several devices with me: my phone, my wife's, maybe a laptop or a tablet too. It seemed there should have been a better way. And there is. I'll use this example to talk about a mesh network, but it could just as well apply to people wanting to communicate on a 12-hour flight that has no in-flight wifi, or spacecraft with an intermittent connection, or a person traveling.

Syncthing makes a wonderful solution for things like these. Here are some interesting things about Syncthing:

Syncthing works by having you define devices and folders. You can choose which devices to share folders with. A shared folder has an ID that is unique across Syncthing. You can share a folder from device A to device B, and then device B can share it with device C, even if A and C don't know about each other or have no way to communicate. More commonly, though, all the devices would know about each other and will opportunistically communicate the best way they can.

Syncthing uses something akin to a BitTorrent protocol. Say you're syncing videos from your phone, and they're going to 3 machines. It doesn't mean that Syncthing has to send it three times from the phone. Syncthing will send each block, most likely, just once; the other nodes in the swarm will register the block availability from the first other node to get it and will exchange blocks among themselves.

Syncthing will typically look for devices on the local LAN. Failing that, it will use an introduction server to see if it can reach them directly using P2P. Failing that, perhaps due to restrictive firewalls or NAT, communication can be relayed through volunteer-run Syncthing servers on the Internet.

All Syncthing communications are cryptographically encrypted and verified.

You can also configure Syncthing arbitrarily; for instance, to run over ssh or Tor tunnels.

So, let's look at how Syncthing might help with the example I laid out up front. All the devices at the remote location could communicate with each other. The Android app is quite capable of syncing photos and videos using Syncthing, for instance. Then one device could be taken to the Internet location and it would transmit data on behalf of all the others, perhaps back to a computer at your home, or to a server somewhere. Perhaps a script running on the remote server would then move files out of the Syncthing synced folder into permanent storage elsewhere, triggering a deletion to be sent to the phone to free up storage. When the phone gets back to the other devices, the deletion can be propagated to them to free up storage there too. Or maybe you have a computer out in a shed or somewhere without Internet access that you go to periodically, and need to get files to it. Again, your phone could be a carrier.

Taking it a step further

If you envision a file as a packet, you could, conceivably, do something like tunnel TCP/IP over Syncthing, assuming generous-enough timeouts. It can truly handle communication. But you don't need TCP/IP for this. Consider some other things you could do:

You can start to see how there are a lot of possibilities here that extend beyond just file synchronization, though they are built upon a file synchronization tool.

Enter NNCP

Let's look at a tool that's especially suited for this: NNCP, which I've been writing about a lot lately. NNCP is designed to handle file exchange and remote execution with remote computers in an asynchronous, store-and-forward manner. NNCP packets are themselves encrypted and authenticated. NNCP traditionally is source-routed (that is, you configure it so that machine A reaches machine D by relaying through B and C), and the packets are onion-routed.

NNCP packets can be exchanged by a TCP call, a tar-like stream, copying files to something like a USB stick and physically transporting it to the remote, etc. This works really well and I've been using it myself. But it gets complicated if the network topology isn't fixed; it is difficult to reroute packets due to the onion routing, for instance. There are various workarounds that could be used, but why not just use Syncthing as a transport in those cases?

nncp-xfer is the command that exchanges packets by writing them to, and reading them from, a directory. It is what you'd use to exchange packets on a USB stick. And what you'd use to exchange packets via Syncthing. It writes packets in a RECIPIENT/SENDER/PACKET directory structure, so it is perfectly fine to have multiple systems exchanging packets in a single Syncthing synced folder tree. This structure also allows leaf nodes to only carry the particular packets they're interested in. The packets are all encrypted, so they can be freely synced wherever. Since Syncthing opportunistically syncs a shared folder with any device the folder is shared with, a phone could very easily be the NNCP transport, even if it has no idea what NNCP is. It could carry NNCP packets back and forth between sites, or to the Internet, or whatever.

NNCP supports file transmission, file request, and remote execution, all subject to controls, of course. It is easy to integrate with Exim or Postfix to use as a mail transport, Git transport, and so forth. I use it for backups. It would be quite easy to have it send those backups (encrypted zfs send) via nncp-xfer to Syncthing instead of the usual method, and then if I've shared the Syncthing folder with my phone, all I need to do is bring the phone into Internet range and they get sent. nncp-xfer will normally remove the packets out of the xfer directory as it ingests them, so the space will only be consumed on the phone (and laptop) until we know the packets made it to their destination.
Pretty slick, eh?
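
To make the RECIPIENT/SENDER/PACKET layout concrete, here is a small sketch that lists what a shared folder is carrying for one node. The `packets_for` helper and any node names used with it are hypothetical; in practice nncp-xfer reads and writes this tree itself, and the real directory may hold other files that this sketch does not account for. It only illustrates why several systems can share one synced folder without stepping on each other.

```shell
#!/bin/sh
# packets_for XFER_DIR RECIPIENT
# Print "SENDER PACKET" for every file under XFER_DIR/RECIPIENT/SENDER/.
# Hypothetical helper for illustration; it is not part of NNCP.
packets_for() {
    for path in "$1/$2"/*/*; do
        [ -f "$path" ] || continue      # glob matched nothing, or not a file
        sender=$(basename "$(dirname "$path")")
        echo "$sender $(basename "$path")"
    done
}
```

Because each recipient has its own subtree, a leaf node could choose to sync only the directory named after itself.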

25 January 2021

Russ Allbery: Review: Laziness Does Not Exist

Review: Laziness Does Not Exist, by Devon Price
Publisher: Atria Books
Copyright: January 2021
ISBN: 1-9821-4013-5
Format: Kindle
Pages: 216
The premise of Laziness Does Not Exist is in the title: Laziness as a moral failing does not exist. It is a misunderstanding of other problems with physical or psychological causes, a belief system that is used to extract unsustainable amounts of labor, an excuse to withdraw our empathy, and a justification for not solving social problems. Price refers to this as the Laziness Lie, which they define with three main tenets:
  1. Your worth is your productivity.
  2. You cannot trust your own feelings and limits.
  3. There is always more you could be doing.
This book (an expansion of a Medium article) makes the case against all three tenets using the author's own burnout-caused health problems as the starting argument. They then apply that analysis to work, achievements, information overload, relationships, and social pressure. In each case, Price's argument is to prioritize comfort and relaxation, listen to your body and your limits, and learn who you are rather than who the Laziness Lie is trying to push you to be. The reader reaction to a book like this will depend on where the reader is in thinking about the problem. That makes reviewing a challenge, since it's hard to simulate a reader with a different perspective. For myself, I found the content unobjectionable, but largely repetitive of other things I've read. The section on relationships in particular will be very familiar to Captain Awkward readers, just not as pointed. Similarly, the chapter on information overload is ground already covered by Digital Minimalism, among other books. That doesn't make this a bad book, but it's more of a survey, so if you're already well-read on this topic you may not get much out of it. The core assertion is aggressive in part to get the reader to argue with it and thus pay attention, but I still came away convinced that laziness is not a useful word. The symptoms that cause us to call ourselves lazy (procrastination, burnout, depression, or executive function problems, for example) are better understood without the weight of moral reproach that laziness carries. I do think there is another meaning of laziness that Price doesn't cover, since they are aiming this book exclusively at people who are feeling guilty about possibly being lazy, and we need some term for people who use their social power to get other people to do all their work for them. 
But given how much the concept of laziness is used to attack and belittle the hard-working or exhausted, I'm happy to replace "laziness" with "exploitation" when talking about that behavior. This is a profoundly kind and gentle book. Price's goal is to help people be less hard on themselves and to take opportunities to relax without guilt. But that also means that it stays in the frame of psychological analysis and self-help, and only rarely strays into political or economic commentary. That means it's more useful for taking apart internalized social programming, but less useful for thinking about the broader political motives of those who try to convince us to work endlessly and treat all problems as personal responsibilities rather than political failures. For that, I think Anne Helen Petersen's Can't Even is the more effective book. Price also doesn't delve much into history, and I now want to read a book on the origin of a work ethic as a defining moral trait. One truly lovely thing about this book is that it's quietly comfortable with human variety of gender and sexuality in a way that's never belabored but that's obvious from the examples that Price uses. Laziness Does Not Exist felt more inclusive in that way, and to some extent on economic class, than Can't Even. I was in the mood for a book that takes apart the political, social, and economic motivations behind convincing people that they have to constantly strive to not be lazy, so the survey nature of this book and its focus on self-help made it not the book for me. It also felt a bit repetitive despite its slim length, and the chapter structure didn't click for me. But it's not a bad book, and I suspect it will be the book that someone else needs to read. Rating: 6 out of 10

22 January 2021

Robert McQueen: Launching Endless OS Foundation

Passion Led Us Here

How our for-profit company became a nonprofit, to better tackle the digital divide. Originally posted on the Endless OS Foundation blog.

An 8-year journey to a nonprofit

On the 1st of April 2020, our for-profit Endless Mobile officially became a nonprofit as the Endless OS Foundation. Our launch as a nonprofit just as the global pandemic took hold was, predictably, hardly noticed, but for us the timing was incredible: as the world collectively asked "What can we do to help others in need?", we framed our mission statement and launched our .org with the same very important question in mind. Endless always had a social impact mission at its heart, and the challenges related to students, families, and communities falling further into the digital divide during COVID-19 brought new urgency and purpose to our team's decision to officially step in the social welfare space.
On April 1st 2020, our for-profit Endless Mobile officially became a nonprofit as the Endless OS Foundation, focused on the #DigitalDivide.
Our updated status was a long time coming: we began our transformation to a nonprofit organization in late 2019 with the realization that the true charter and passions of our team would be greatly accelerated without the constraints of for-profit goals, investors and sales strategies standing in the way of our mission of digital access and equity for all. But for 8 years we made a go of it commercially, headquartered in Silicon Valley and framing ourselves as a tech startup with access to the venture capital and partnerships on our doorstep. We believed that a successful commercial channel would be the most efficient way to scale the impact of bringing computer devices and access to communities in need. We still believe this; we've just learned through our experience that we don't have the funding to enter the computer and OS marketplace head-on. With the social impact goal first, and the hope of any revenue a secondary goal, we have had many successes in those 8 years bridging the digital divide throughout the world, from Brazil, to Kenya, and the USA. We've learned a huge amount which will go on to inform our strategy as a nonprofit.
Endless always had a social impact mission at its heart. COVID-19 brought new urgency and purpose to our team's decision to officially step in the social welfare space.
Our unique perspective

One thing we learned as a for-profit is that the OS and technology we've built have some unique properties which are hugely impactful as a working solution to digital equity barriers. And our experience deploying in the field around the world for 8 years has left us uniquely informed via many iterations and incremental improvements.
Endless OS designer in discussion with prospective user
With this knowledge in hand, we've been refining our strategy throughout 2020 and are now starting to focus on what it really means to become an effective nonprofit and make that impact. In many ways it is liberating to abandon the goals and constraints of being a for-profit entity, and in other ways it's been a challenging journey for me and the team to adjust our way of thinking and let these for-profit notions and models go. Previously we exclusively built and sold a product that defined our success; and any impact we achieved was a secondary consequence of that success and seen through that lens. Now our success is defined purely in terms of social impact, and through our actions, those positive impacts can be made with or without "our product". That means that we may develop and introduce technology to solve a problem, but it is equally valid to find another organization's existing offering and design a way to increase that positive impact and scale.
We develop technology to solve access equity issues, but it's equally valid to find another organization's offering and partner in a way that increases their positive impact.
The analogy to Free and Open Source Software is very strong: while Endless has always used and contributed to a wide variety of FOSS projects, we've also had a tension where we've been trying to hold some pieces back and capture value (such as our own application or content ecosystem, or our own hardware platform), necessarily making us competitors to other organisations even though they were hoping to achieve the same things as us. As a nonprofit we can let these ideas go and just pick the best partners and technologies to help the people we're trying to reach.
School kids writing on paper
Digital equity: 4 barriers we need to overcome

In future, our decisions around which projects to build or engage with will revolve around 4 barriers to digital equity, and how our Endless OS, Endless projects, or our partners' offerings can help to solve them. We define these 4 equity barriers as: barriers to devices, barriers to connectivity, barriers to literacy in terms of your ability to use the technology, and barriers to engagement in terms of whether using the system is rewarding and worthwhile.
We define the 4 digital equity barriers we exist to impact as:
1. barriers to devices
2. barriers to connectivity
3. barriers to literacy
4. barriers to engagement
It doesn't matter who makes the solutions that break these barriers; what matters is how we assist in enabling people to use technology to gain access to the education and opportunities these barriers block. Our goal therefore is to simply ensure that solutions exist: building them ourselves and with partners such as the FOSS community and other nonprofits, proving them with real-world deployments, and sharing our results as widely as possible to allow for better adoption globally. If we define our goal purely in terms of whether people are using Endless OS, we are effectively restricting the reach and scale of our solutions to the audience we can reach directly with Endless OS downloads, installs and propagation. Conversely, partnerships that scale impact are a win-win-win for us, our partners, and the communities we all serve.

Engineering impact

Our Endless engineering roots and capabilities feed our unique ability to build and deploy all of our solutions, and the practical experience of deploying them gives us evidence and credibility as we advocate for their use. Either activity would be weaker without the other.
Our engineering roots and capabilities feed our unique ability to build and deploy digital divide solutions.
Our partners in various engineering communities will have already seen our change in approach. Particularly, with GNOME we are working hard to invest in upstream and reconcile the long-standing differences between our experience and GNOME. If successful, many more people can benefit from our work than just users of Endless OS. We're working with Learning Equality on Kolibri to build a better app experience for Linux desktop users and bring content publishers into our ecosystem for the first time, and we've also taken our very own Hack, the immersive and fun destination for kids learning to code, released it for non-Endless systems on Flathub, and made it fully open-source.
Planning tasks with sticky notes on a whiteboard
What's next for our OS?

What then is in store for the future of Endless OS, the place where we have invested so much time and planning through years of iterations? For the immediate future, we need the capacity to deploy everything we've built all at once, to our partners. We built an OS that we feel is unique and valuable, containing a number of world-firsts: first production OS shipped with OSTree, first Flatpak-only desktop, built-in support for updating OS and apps from USBs, while still providing a great deal of reliability and convenience for deployments in offline and educational-safe environments with great apps and content loaded on every system. However, we need to find a way to deliver this Linux-based experience in a more efficient way, and we'd love to talk if you have ideas about how we can do this, perhaps as partners. Can the idea of Endless OS evolve to become a spec that is provided by different platforms in the future, maybe remixes of Debian, Fedora, openSUSE or Ubuntu?

Build, Validate, Advocate

Beyond the OS, the Endless OS Foundation has identified multiple programs to help underserved communities, and in each case we are adopting our build, validate, advocate strategy. This approach underpins all of our projects: can we build the technology (or assist in the making), will a community in need validate it by adoption, and can we inspire others by telling the story and advocating for its wider use?
We are adopting a build, validate, advocate strategy.
1. build the technology (or assist in the making)
2. validate by community adoption
3. advocate for its wider use
As examples, we have just launched the Endless Key as an offline solution for students during the COVID-19 at-home distance learning challenges. This project is also establishing a first-ever partnership of well-known online educational brands to reach an underserved offline audience with valuable learning resources. We are developing a pay-as-you-go platform and new partnerships that will allow families to own laptops via micro-payments that are built directly into the operating system, even if they cannot qualify for standard retail financing. And during the pandemic, we've partnered with Teach For America to focus on very practical digital equity needs in the USA's urban and rural communities.

One part of the world-wide digital divide solution

We are one solution provider for the complex matrix of issues known collectively as the #DigitalDivide, and these issues will not disappear after the pandemic. Digital equity was an issue long before COVID-19, and we are not so naive as to think it can be solved by any single institution, or by the time the pandemic recedes. It will take time and a coalition of partnerships to win. We are in for the long haul and we are always looking for partners, especially now as we are finding our feet in the nonprofit world. We'd love to hear from you, so please feel free to reach out to me: I'm ramcq on IRC, RocketChat, Twitter, LinkedIn or rob@endlessos.org.

4 January 2021

John Goerzen: More Topics on Store-And-Forward (Possibly Airgapped) ZFS and Non-ZFS Backups with NNCP

Note: this is another article in my series on asynchronous communication in Linux with UUCP and NNCP. In my previous post, I introduced a way to use ZFS backups over NNCP. In this post, I'll expand on that and also explore non-ZFS backups.
Use of nncp-file instead of nncp-exec
The previous example used nncp-exec (like UUCP's uux), which lets you pipe stdin in, then queues up a request to run a given command with that input on a remote. I discussed that NNCP doesn't guarantee order of execution, but that for the ZFS use case, that was fine since zfs receive would just fail (causing NNCP to try again later). At present, nncp-exec stores the data piped to it in RAM before generating the outbound packet (the author plans to fix this shortly). That made it unusable for some of my backups, so I set it up another way: with nncp-file, the tool to transfer files to a remote machine. A cron job then picks them up and processes them. On the machine being backed up, we have to find a way to encode the dataset to be received. I chose to do that as part of the filename, so the updated simplesnap-queue could look like this:
#!/bin/bash
set -e
set -o pipefail
DEST="$(echo "$1" | sed 's,^tank/simplesnap/,,')"
FILE="bakfsfmt2-$(date "+%s.%N").$$_$(echo "$DEST" | sed 's,/,@,g')"
echo "Processing $DEST to $FILE" >&2
# stdin piped to this
zstd -8 - \
  | gpg --compress-algo none --cipher-algo AES256 -e -r 012345...  \
  | su nncp -c "/usr/local/nncp/bin/nncp-file -nice B -noprogress - 'backupsvr:$FILE'" >&2
echo "Queued $DEST to $FILE" >&2
I've added compression and encryption here as well; more on that below. On the backup server, we would define a different incoming directory for each node in nncp.hjson. For instance:
host1: {
...
   incoming: "/var/local/nncp-backups-incoming/host1"
}
host2: {
...
   incoming: "/var/local/nncp-backups-incoming/host2"
}
I'll present the scanning script in a bit.
Offsite Backup Rotation
Most of the time, you don't want just a single drive to store the backups. You'd like to have a set. At minimum, one wouldn't be plugged in so lightning wouldn't ruin all your backups. But maybe you'd store a second drive at some other location you have access to (friend's house, bank box, etc.). There are several ways you could solve this: The third option can be helped with NNCP, too. One way is to create separate NNCP installations for each of the drives that you store data on. Then, whenever one is plugged in, the appropriate NNCP config will be loaded and appropriate packets received and processed. The neighbor machine (the spooler) would just store up packets for the offsite drive until it comes back onsite (or, perhaps, your airgapped USB transport would do this). Then when it's back onsite, all the queued up ZFS sends get replayed and the backups replicated. Now, how might you handle this with NNCP? The simple way would be to have each system generating backups send them to two destinations. For instance:
zstd -8 - | gpg --compress-algo none --cipher-algo AES256 -e -r 07D5794CD900FAF1D30B03AC3D13151E5039C9D5 \
  | tee >(su nncp -c "/usr/local/nncp/bin/nncp-file -nice B+5 -noprogress - 'backupdisk1:$FILE'") \
        >(su nncp -c "/usr/local/nncp/bin/nncp-file -nice B+5 -noprogress - 'backupdisk2:$FILE'") \
   > /dev/null
You could probably also more safely use pee(1) (from moreutils) to do this. Sending to two destinations has the unfortunate result of doubling the network traffic from every machine being backed up. So an alternative option would be to queue the packets to the spooling machine, and run a distribution script from it; something like this, in part:
INCOMINGDIR="/var/local/nncp-bakfs-incoming"
LOCKFILE="$INCOMINGDIR/.lock"
printf -v EVAL_SAFE_LOCKFILE '%q' "$LOCKFILE"
if dotlockfile -r 0 -l -p "${LOCKFILE}"; then
  logit "Lock obtained at ${LOCKFILE} with dotlockfile"
  trap 'ECODE=$?; dotlockfile -u '"${EVAL_SAFE_LOCKFILE}"'; exit $ECODE' EXIT INT TERM
else
  logit "Could not obtain lock at $LOCKFILE; $0 likely already running."
  exit 0
fi
logit "Scanning queue directory..."
cd "$INCOMINGDIR"
for HOST in *; do
   cd "$INCOMINGDIR/$HOST"
   for FILE in bakfsfmt2-*; do
           if [ -f "$FILE" ]; then
                   for BAKFS in backupdisk1 backupdisk2; do
                           runcommand nncp-file -nice B+5 -noprogress "$FILE" "$BAKFS:$HOST/$FILE"
                   done
                   runcommand rm "$FILE"
           else
                   logit "$HOST: Skipping $FILE since it doesn't exist"
           fi
   done
done
logit "Scan complete."
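The lock-or-exit step at the top of the distribution script can be tried out in isolation. What follows is only a sketch under stated assumptions: it swaps in mkdir (which fails if the directory already exists) as the locking primitive instead of dotlockfile, and the with_lock helper and the lock path are hypothetical names, not part of NNCP or simplesnap.

```bash
#!/bin/bash
# Sketch: run a command only if a lock can be taken, releasing it afterwards.
# Assumption: mkdir stands in for dotlockfile; with_lock is a made-up helper.

with_lock () {
    local lockdir="$1"
    shift
    # mkdir either creates the directory or fails, so only one caller wins.
    if ! mkdir "$lockdir" 2>/dev/null; then
        echo "Could not obtain lock at $lockdir; likely already running." >&2
        return 0    # like the script above, a busy lock is not treated as an error
    fi
    (
        # Release the lock when this subshell exits, whatever the command returned.
        trap 'rmdir "$lockdir"' EXIT
        "$@"
        exit $?
    )
}

lock="$(mktemp -d)/scan.lock"    # hypothetical lock location for the demo
with_lock "$lock" echo "Scanning queue directory..."
```

One trade-off of this sketch: a lock directory left behind by a crash has to be removed by hand, since nothing records which process created it.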
Security Considerations
You'll notice that in my example above, the encryption happens as the root user, but nncp is called under su. This means that even if there is a vulnerability in NNCP, the data would still be protected by GPG. I'll also note here that many sites run ssh as root unnecessarily; the same principles should apply there. (ssh has had vulnerabilities in the past as well.) I could have used gpg's built-in compression, but zstd is faster and better, so we can get good performance by using fast compression and piping that to an algorithm that can use hardware acceleration for encryption. I strongly encourage considering transport, whether ssh or NNCP or UUCP, to be untrusted. Don't run it as root if you can avoid it. In my example, the nncp user, which all NNCP commands are run as, has no access to the backup data at all. So even if NNCP were compromised, my backup data wouldn't be. For even more security, I could also sign the backup stream with gpg and validate that on the receiving end. I should note, however, that this conversation assumes that a network- or USB-facing ssh or NNCP is more likely to have an exploitable vulnerability than is gpg (which here is just processing a stream). This is probably a safe assumption in general. If you believe gpg is more likely to have an exploitable vulnerability than ssh or NNCP, then obviously you wouldn't take this particular approach. On the zfs side, the use of -F with zfs receive is avoided; this could lead to a compromised backed-up machine generating a malicious rollback on the destination. Backup zpools should be imported with -R or -N to ensure that a malicious mountpoint property couldn't be used to cause an attack. I choose to use zfs receive -u -o readonly=on, which is compatible with both unmounted backup datasets and zpools imported with -R (or both). To access the data in a backup dataset, you would normally clone it and access it there.
The processing script
So, put this all together and look at an example of a processing script that would run from cron as root and process the incoming ZFS data.
#!/bin/bash
set -e
set -o pipefail
# Log a message
logit () {
   logger -p info -t "$(basename "$0")[$$]" "$1"
}
# Log an error message
logerror () {
   logger -p err -t "$(basename "$0")[$$]" "$1"
}
# Log stdin with the given code.  Used normally to log stderr.
logstdin () {
   logger -p info -t "$(basename "$0")[$$/$1]"
}
# Run command, logging stderr and exit code
runcommand () {
   logit "Running $*"
   if "$@" 2> >(logstdin "$1") ; then
      logit "$1 exited successfully"
      return 0
   else
       RETVAL="$?"
       logerror "$1 exited with error $RETVAL"
       return "$RETVAL"
   fi
}
STORE=backups/simplesnap
INCOMINGDIR=/backups/nncp/incoming
if ! [ -d "$INCOMINGDIR" ]; then
        logerror "$INCOMINGDIR doesn't exist"
        exit 0
fi
LOCKFILE="/backups/nncp/.nncp-backups-zfs-scan.lock"
printf -v EVAL_SAFE_LOCKFILE '%q' "$LOCKFILE"
if dotlockfile -r 0 -l -p "${LOCKFILE}"; then
  logit "Lock obtained at ${LOCKFILE} with dotlockfile"
  trap 'ECODE=$?; dotlockfile -u '"${EVAL_SAFE_LOCKFILE}"'; exit $ECODE' EXIT INT TERM
else
  logit "Could not obtain lock at $LOCKFILE; $0 likely already running."
  exit 0
fi
EXITCODE=0
cd "$INCOMINGDIR"
logit "Scanning queue directory..."
for HOST in *; do
    HOSTPATH="$INCOMINGDIR/$HOST"
    # files like bakfsfmt2-134.13134_dest (the prefix written by simplesnap-queue)
    for FILE in "$HOSTPATH"/bakfsfmt2-[0-9]*_?*; do
        if [ ! -f "$FILE" ]; then
            logit "Skipping non-existent $FILE"
            continue
        fi
        # Now, $DEST will be HOST/DEST.  Strip off the @ also.
        DEST="$(echo "$FILE" | sed -e 's/^.*bakfsfmt2[^_]*_//' -e 's,@,/,g')"
        if [ -z "$DEST" ]; then
            logerror "Malformed dest in $FILE"
            continue
        fi
        HOST2="$(echo "$DEST" | sed 's,/.*,,g')"
        if [ -z "$HOST2" ]; then
            logerror "Malformed DEST $DEST in $FILE"
            continue
        fi
        if [ ! "$HOST" = "$HOST2" ]; then
            logerror "$FILE: $HOST doesn't match $HOST2"
            continue
        fi
        logit "Processing $FILE to $STORE/$DEST"
        if runcommand gpg -q -d < "$FILE" | runcommand zstdcat | runcommand zfs receive -u -o readonly=on "$STORE/$DEST"; then
                logit "Successfully processed $FILE to $STORE/$DEST"
                runcommand rm "$FILE"
        else
                logerror "FAILED to process $FILE to $STORE/$DEST"
                EXITCODE=15
        fi
    done
done
exit "$EXITCODE"
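The queueing script and the processing script are tied together only by the filename convention: the sender flattens host/dataset into one filename component by turning "/" into "@", and the receiver reverses that. Here is a small round-trip sketch of that convention, assuming the bakfsfmt2- prefix used by simplesnap-queue; encode_name and decode_name are hypothetical helper names used for illustration only.

```bash
#!/bin/bash
# Sketch of the filename convention; the helper names are made up.

# Sender side: "host/dataset/path" -> "bakfsfmt2-<time>.<pid>_host@dataset@path"
encode_name () {
    printf 'bakfsfmt2-%s.%s_%s\n' \
        "$(date +%s.%N)" "$$" "$(printf '%s' "$1" | tr '/' '@')"
}

# Receiver side: drop any directory, drop everything through the first "_",
# then turn "@" back into "/".  Fails if the prefix is not present.
decode_name () {
    local file encoded
    file="${1##*/}"
    encoded="${file#bakfsfmt2-*_}"
    if [ "$encoded" = "$file" ] || [ -z "$encoded" ]; then
        echo "Malformed backup filename: $1" >&2
        return 1
    fi
    printf '%s\n' "$encoded" | tr '@' '/'
}

name="$(encode_name "laptop/root/home")"
echo "$name"
decode_name "/backups/nncp/incoming/laptop/$name"
```

Only the first underscore after the prefix is treated as the separator, so dataset names that themselves contain underscores should survive the round trip.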
Applying These Ideas to Non-ZFS Backups ZFS backups made our job easier in a lot of ways: Some of these benefits you just won't get without ZFS (or something similar like btrfs), but let's see how we could apply these ideas to non-ZFS backups. I will explore the implementation of them in a future post. When I say "non ZFS", I am being a bit vague as to whether the source, the destination, or both systems are running a non-ZFS filesystem. In general I'll assume that neither are ZFS. The first and most obvious answer is to just tar up the whole system and send that every day. This is, of course, only suitable for small datasets on a fast network. These tarballs could be unpacked on the destination and stored more efficiently via any number of methods (hardlink trees, a block-level deduplicator like borg or rdedup, or even just simply compressed tarballs). To make the network trip more efficient, something like rdiff or xdelta could be used. A signature file could be stored on the machine being backed up (generated via tee/pee at stream time), and the next run could simply send an rdiff delta over NNCP. This would be quite network-efficient, but still would require reading every byte of every file on every backup, and would also require quite a bit of temporary space on the receiving end (to apply the delta to the previous tarball and generate a new one). Alternatively, a program that generates incremental backup files such as rdup could be used. These could be transmitted over NNCP to the backup server, and unpacked there. While perhaps less efficient on the network -- every file with at least one modified byte would be retransmitted in its entirety -- it avoids the need to read every byte of unmodified files or to have enormous temporary space. I should note here that GNU tar claims to have an incremental mode, but it has a potential data loss bug. 
There are also some tools with algorithms that may apply well in this use case: syrep and fssync being the two most prominent examples, though rdedup (mentioned above) and the nascent asuran project may also be combinable with other tools to achieve this effect. I should, of course, conclude this section by mentioning btrfs. Every time I've tried it, I've run into serious bugs, and its status page indicates that only some of them have been resolved. I would not consider using it for something as important as backups. However, if you are comfortable with it, it is likely to be able to run in more constrained environments than ZFS and could probably be processed in much the same way as zfs streams.

30 December 2020

John Goerzen: Airgapped / Asynchronous Backups with ZFS over NNCP

In my previous articles in the series on asynchronous communication with the modern NNCP tool, I talked about its use for asynchronous, potentially airgapped, backups. The first article, How & Why To Use Airgapped Backups, laid out the foundations for this. Now let's dig into the details. Today's post will cover ZFS, because it has a lot of features that make it very easy to support in this setup. Non-ZFS backups will be covered later. The setup is actually about as simple as it is for SSH, but since people are less familiar with this kind of communication, I'm going to try to go into more detail here.
Assumptions
I am assuming a setup where:
Hardware
Let's start with hardware for the machine to hold the backups. I initially considered a Raspberry Pi 4 with 8GB of RAM. That would probably have been a suitable machine, at least for smaller backup sets. However, none of the Raspberry Pi machines support hardware AES encryption acceleration, and my Pi4 benchmarks at about 60MB/s for AES encryption. I want my backups to be encrypted, and decided this would just be too slow for my purposes. Again, if you don't need encrypted backups or don't care that much about performance (many people probably fall into this category), you can have a fully-functional Raspberry Pi 4 system for under $100 that would make a fantastic backup server. I wound up purchasing a Qotom-Q355G4 micro PC with a Core i5 for about $315. It has USB 3 ports and is designed as a rugged, long-lasting system. I have been using one of their older Celeron-based models as my router/firewall for a number of years now and it's been quite reliable. For backup storage, you can get a USB 3 external drive. My own preference is to get a USB 3 toaster (a device that lets me plug in SATA drives) so that I have more control over the underlying medium and can save the expense and hassle of a bunch of power supplies. In a future post, I will discuss drive rotation so you always have an offline drive.
Then, there is the question of transport to the backup machine. A simple solution would be to have a heavily-firewalled backup system that has no incoming ports open but makes occasional outgoing connections to one specific NNCP daemon on the spooling machine. However, for airgapped operation, it would also be very simple to use nncp-xfer to transport the data across on a USB stick or some such. You could set up automounting for a specific USB stick plug it in, all the spooled data is moved over, then plug it in to the backup system and it s processed, and any outbound email traffic or whatever is copied to the USB stick at that point too. The NNCP page has some more commentary about this kind of setup. Both are fairly easy to set up, and NNCP is designed to be transport-agnostic, so in this article I m going to focus on how to integrate ZFS with NNCP. Operating System Of course, it should be no surprise that I set this up on Debian. As an added step, I did all the configuration in Ansible stored in a local git repo. This adds a lot of work, but it means that it is trivial to periodically wipe and reinstall if any security issue is suspected. The git repo can be copied off to another system for storage and takes the system from freshly-installed to ready-to-use state. Security There is, of course, nothing preventing you from running NNCP as root. The zfs commands, obviously, need to be run as root. However, from a privilege separation standpoint, I have chosen to run everything relating to NNCP as a nncp user. NNCP already does encryption, but if you prefer to have zero knowledge of the data even to NNCP, it s trivial to add gpg to the pipeline as well, and in fact I ll be demonstrating that in a future post for other reasons. Software Besides NNCP, there needs to be a system that generates the zfs send streams. For this project, I looked at quite a few. 
Most were designed to inspect the list of snapshots on a remote end, compare it to a list on the local end, and calculate a difference from there. This, of course, won't work for this situation. I realized my own simplesnap project was very close to being able to do this. It already used an algorithm of using specially-named snapshots on the machine being backed up, so it never needed any communication about what snapshots were present where. All it needed was a few more options to permit sending to a stream instead of zfs receive. I made those changes and they are available in simplesnap 2.0.0 or above. That version has also been uploaded to sid, and will work fine as-is on buster as well.
Preparing NNCP
I'm going to assume three hosts in this setup: laptop, spooler, and backupsvr. The basic NNCP workflow documentation covers the basic steps. You'll need to run nncp-cfgnew on each machine. This generates a basic configuration, along with public and private keys for that machine. You'll copy the public key sets to the configurations of the other machines as usual. On the laptop, you'll add a via line like this:
backupsvr: {
  id: ....
  exchpub: ...
  signpub: ...
  noisepub: ...
  via: ["spooler"]
This tells NNCP that data destined for backupsvr should always be sent via spooler first. You can then arrange for the nncp-daemon to run on the spooler, and nncp-caller or nncp-call on the backupsvr. Or, alternatively, airgapped between the two with nncp-xfer.
Generating Backup Data
Now, on the laptop, install simplesnap (2.0.0 or above). Although you won't be backing up to the local system, simplesnap still maintains a hostlock in ZFS. Prepare a dataset for it:
zfs create tank/simplesnap
zfs set org.complete.simplesnap:exclude=on tank/simplesnap
Then, create a script /usr/local/bin/runsimplesnap like this:
#!/bin/bash
set -e
simplesnap --store tank/simplesnap --setname backups --local --host "$(hostname)" \
   --receivecmd /usr/local/bin/simplesnap-queue \
   --noreap
su nncp -c '/usr/local/nncp/bin/nncp-toss -noprogress -quiet'
if ip addr | grep -q 192.168.65.64; then
  su nncp -c '/usr/local/nncp/bin/nncp-call -noprogress -quiet -onlinedeadline 1 spooler'
fi
The call to simplesnap sets it up to send the data to simplesnap-queue, which we'll create in a moment. The --receivecmd option, plus --noreap, sets it up to run without ZFS on the local system. The call to nncp-toss will process any previously-received inbound NNCP packets, if there are any. Then, in this example, we do a very basic check to see if we're on the LAN (checking 192.168.65.64), and if so, will establish a connection to the spooler to transmit the data. Of course, you could also do this over the Internet, with tor, or whatever, but in my case, I don't want to automatically do this in case I'm tethered to mobile. I figure if I want to send backups in that case, I can fire up nncp-call myself. You can also use nncp-caller to set up automated connections on other schedules; there are a lot of options. Now, here's what /usr/local/bin/simplesnap-queue looks like:
#!/bin/bash
set -e
set -o pipefail
DEST="$(echo "$1" | sed 's,^tank/simplesnap/,,')"
echo "Processing $DEST" >&2
# stdin piped to this
su nncp -c "/usr/local/nncp/bin/nncp-exec -nice B -noprogress backupsvr zfsreceive '$DEST'" >&2
echo "Queued for $DEST" >&2
This is a pretty simple script. simplesnap will call it with a path based on the store, with the hostname after; so, for instance, tank/simplesnap/laptop/root or some such. This script strips off the leading tank/simplesnap (which is a local fragment), leaving the host and dataset paths. Then it just pipes it to nncp-exec. -nice B classifies it as low-priority bulk data (so if you have some more important interactive data, it would be sent first), then passes it to whatever the backupsvr defines as zfsreceive.
Receiving ZFS backups
In the NNCP configuration on the recipient's side, in the laptop section, we define what command it's allowed to run as zfsreceive:
      exec: {
        zfsreceive: ["/usr/bin/sudo", "-H", "/usr/local/bin/nncp-zfs-receive"]
      }
We authorize the nncp user to run this under sudo in /etc/sudoers.d/local nncp:
Defaults env_keep += "NNCP_SENDER"
nncp ALL=(root) NOPASSWD: /usr/local/bin/nncp-zfs-receive
The NNCP_SENDER is the public key ID of the sending node when nncp-toss processes the incoming data. We can use that for sanity checking later. Now, here's a basic nncp-zfs-receive script:
#!/bin/bash
set -e
set -o pipefail
STORE=backups/simplesnap
DEST="$1"
# now process stdin
zfs receive -o readonly=on -x mountpoint "$STORE/$DEST"
And there you have it: all the basics are in place.
Update 2020-12-30: An earlier version of this article had zfs receive -F instead of zfs receive -o readonly=on -x mountpoint. These changed arguments are more robust.
Update 2021-01-04: I am now recommending zfs receive -u -o readonly=on; see my successor article for more.
Enhancements
You could enhance the nncp-zfs-receive script to improve logging and error handling. For instance:
#!/bin/bash
set -e
set -o pipefail
STORE=backups/simplesnap
# $1 will be the host/dataset
DEST="$1"
HOST="$(echo "$1" | sed 's,/.*,,g')"
if [ -z "$HOST" ]; then
   echo "Malformed command line"
   exit 5
fi
# Log a message
logit () {
   logger -p info -t "$(basename "$0")[$$]" "$1"
}
# Log an error message
logerror () {
   logger -p err -t "$(basename "$0")[$$]" "$1"
}
# Log stdin with the given code.  Used normally to log stderr.
logstdin () {
   logger -p info -t "$(basename "$0")[$$/$1]"
}
# Run command, logging stderr and exit code
runcommand () {
   logit "Running $*"
   if "$@" 2> >(logstdin "$1") ; then
      logit "$1 exited successfully"
      return 0
   else
       RETVAL="$?"
       logerror "$1 exited with error $RETVAL"
       return "$RETVAL"
   fi
}
exiterror () {
   logerror "$1"
   echo "$1" 1>&2
   exit 10
}
# Sanity check
if [ "$HOST" = "laptop" ]; then
  if [ "$NNCP_SENDER" != "12345678" ]; then
    exiterror "Host $HOST doesn't match sender $NNCP_SENDER"
  fi
else
  exiterror "Unknown host $HOST"
fi
runcommand zfs receive -u -o readonly=on "$STORE/$DEST"
Now you'll capture the ZFS receive output in syslog in a friendly way, so you can look back later at why things failed if they did.
Further notes on NNCP
nncp-toss will examine the exit code from an invocation. If it is nonzero, it will keep the command (and associated stdin) in the queue and retry it on the next invocation. NNCP does not guarantee order of execution, so it is possible in some cases that ZFS streams may be received in the wrong order. That is fine here; zfs receive will exit with an error, and nncp-toss will just run it again after the dependent snapshots have been received. For non-ZFS backups, a simple sequence number can handle this issue.
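As an illustration of the sequence-number idea for non-ZFS backups, here is a minimal sketch. It assumes the sender numbers its backups 1, 2, 3, ... and passes that number along with each one; apply_in_order and the state file are hypothetical, not something NNCP provides. A backup that arrives early makes the handler return nonzero, which, per the retry behavior described above, would leave it queued for a later attempt.

```bash
#!/bin/bash
# Sketch: apply incoming backups strictly in sequence order.
# Usage: apply_in_order STATEFILE SEQ COMMAND [ARGS...]
# (hypothetical helper; stdin is passed through to COMMAND)

apply_in_order () {
    local statefile="$1" seq="$2" last=0
    shift 2
    if [ -f "$statefile" ]; then
        last="$(cat "$statefile")"
    fi
    if [ "$seq" -ne "$((last + 1))" ]; then
        echo "Got sequence $seq but expected $((last + 1)); deferring" >&2
        return 1        # nonzero: try again after the missing one arrives
    fi
    "$@" || return $?   # apply the backup; keep the old state on failure
    echo "$seq" > "$statefile"
}

state="$(mktemp)"
rm -f "$state"          # start with no recorded state
apply_in_order "$state" 2 echo "applying backup 2" || echo "backup 2 deferred"
apply_in_order "$state" 1 echo "applying backup 1"
apply_in_order "$state" 2 echo "applying backup 2"
```

The state file is only advanced after the command succeeds, so a failed apply is retried with the same sequence number.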

27 December 2020

John Goerzen: Asynchronous Email: Exim over NNCP (or UUCP)

Following up to yesterday s article about how NNCP rehabilitates asynchronous communication with modern encryption and onion routing, here is the first of my posts showing how to put it into action. Email is a natural fit for async; in fact, much of early email was carried by UUCP. It is useful for an airgapped machine to be able to send back messages; errors from cron, results of handling incoming data, disk space alerts, etc. (Of course, this would apply to a non-airgapped machine also). The NNCP documentation already describes how to do this for Postfix. Here I will show how to do it for Exim. A quick detour to UUCP land When you encounter a system such as email that has instructions for doing something via UUCP, that should be an alert to you that here is some very relevant information for doing this same thing via NNCP. The syntax is different, but broadly, here s a table of similar NNCP commands:
Purpose | UUCP | NNCP
Connect to remote system | uucico -s, uupoll | nncp-call, nncp-caller
Receive connection (pipe, daemon, etc) | uucico (-l or similar) | nncp-daemon
Request remote execution, stdin piped in | uux | nncp-exec
Copy file to remote machine | uucp | nncp-file
Copy file from remote machine | uucp | nncp-freq
Process received requests | uuxqt | nncp-toss
Move outbound requests to dir (for USB stick, airgap, etc) | N/A | nncp-xfer
Create streaming package of outbound requests | N/A | nncp-bundle
If you used UUCP back in the day, you surely remember bang paths. I will not be using those here. NNCP handles routing itself, rather than making the MTA be aware of the network topology, so this simplifies things considerably. Sending from Exim to a smarthost One common use for async email is from a satellite system: one that doesn t receive mail, or have local mailboxes, but just needs to get email out to the Internet. This is a common situation even for conventionally-connected systems; in Exim speak, this is a satellite system that routes mail via a smarthost. That is, every outbound message goes to a specific target, which then is responsible for eventual delivery (over the Internet, LAN, whatever). This is fairly simple in Exim. We actually have two choices for how to do this: bsmtp or rmail mode. bsmtp (batch SMTP) is the more modern way, and is essentially a derivative of SMTP that explicitly can be queued asynchronously. Basically it s a set of SMTP commands that can be saved in a file. The alternative is rmail (which is just an alias for sendmail these days), where the data is piped to rmail/sendmail with the recipients given on the command line. Both can work with Exim and NNCP, but because we re doing shiny new things, we ll use bsmtp. These instructions are loosely based on the Using outgoing BSMTP with Exim HOWTO. Some of these may assume Debianness in the configuration, but should be easily enough extrapolated to other configs as well. First, configure Exim to use satellite mode with minimal DNS lookups (assuming that you may not have working DNS anyhow). Then, in the Exim primary router section for smarthost (router/200_exim4-config_primary in Debian split configurations), just change transport = remote_smtp_smarthost to transport = nncp. Now, define the NNCP transport. If you are on Debian, you might name this transports/40_exim4-config_local_nncp:
nncp:
  debug_print = "T: nncp transport for $local_part@$domain"
  driver = pipe
  user = nncp
  batch_max = 100
  use_bsmtp
  command = /usr/local/nncp/bin/nncp-exec -noprogress -quiet hostname_goes_here rsmtp
.ifdef REMOTE_SMTP_HEADERS_REWRITE
  headers_rewrite = REMOTE_SMTP_HEADERS_REWRITE
.endif
.ifdef REMOTE_SMTP_RETURN_PATH
  return_path = REMOTE_SMTP_RETURN_PATH
.endif
This is pretty straightforward. We pipe to nncp-exec, running it as the nncp user. nncp-exec sends it to a target node and runs whatever that node has called rsmtp (the command to receive bsmtp data). When the target node processes the request, it will run the configured command and pipe the data in to it.
More complicated: Routing to various NNCP nodes
Perhaps you would like to be able to send mail directly to various NNCP nodes. There are a lot of ways to do that. Fundamentally, you will need a setup similar to the UUCP example in Exim's manualroute manual, which lets you define how to reach various hosts via UUCP/NNCP. Perhaps you have a star topology (every NNCP node exchanges email with a central hub). In the NNCP world, you have two choices of how you do this. You could, at the Exim level, make the central hub the smarthost for all the side nodes, and let it redistribute mail. That would work, but requires decrypting messages at the hub to let Exim process them. The other alternative is to configure NNCP to just send to the destinations via the central hub; that takes advantage of onion routing and doesn't require any Exim processing at the central hub at all.
Receiving mail from NNCP
On the receiving side, first you need to configure NNCP to authorize the execution of a mail program. In the section of your receiving host where you set the permissions for the client, include something like this:
      exec: {
        rsmtp: ["/usr/sbin/sendmail", "-bS"]
      }
The -bS option is what tells Exim to receive BSMTP on stdin. Now, you need to tell Exim that nncp is a trusted user (able to set From headers arbitrarily). Assuming you are running NNCP as the nncp user, then add MAIN_TRUSTED_USERS = nncp to a file such as /etc/exim4/conf.d/main/01_exim4-config_local-nncp. That's it! Some hosts, of course, both send and receive mail via NNCP and will need configurations for both.
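To make the "SMTP commands saved in a file" idea concrete, here is a rough sketch of what a BSMTP batch looks like. This is an illustration only: Exim produces the batch itself when use_bsmtp is set, the exact lines it emits may differ from this, and make_bsmtp and the addresses are hypothetical.

```bash
#!/bin/bash
# Sketch: wrap a message read from stdin in a minimal BSMTP envelope.
# Usage: make_bsmtp SENDER RECIPIENT [RECIPIENT...]   (made-up helper)

make_bsmtp () {
    local sender="$1" rcpt
    shift
    printf 'MAIL FROM:<%s>\n' "$sender"
    for rcpt in "$@"; do
        printf 'RCPT TO:<%s>\n' "$rcpt"
    done
    printf 'DATA\n'
    # Dot-stuffing: double a leading "." so that a message line cannot be
    # mistaken for the end-of-message marker.
    sed 's/^\./../'
    printf '.\n'
}

printf 'Subject: disk space alert\n\n/var is 91%% full\n' \
    | make_bsmtp root@airgapped.example admin@example.org
```

Output of this shape is what the receiving side would be fed on stdin by the rsmtp command configured above.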

23 December 2020

John Goerzen: How & Why To Use Airgapped Backups

A good backup strategy needs to consider various threats to the integrity of data. For instance: It s that last one that is of particular interest today. A lot of backup strategies are such that if a user (or administrator) has their local account or network compromised, their backups could very well be destroyed as well. For instance, do you ssh from the account being backed up to the system holding the backups? Or rsync using a keypair stored on it? Or access S3 buckets, etc? It is trivially easy in many of these schemes to totally ruin cloud-based backups, or even some other schemes. rsync can be run with delete (and often is, to prune remotes), S3 buckets can be deleted, etc. And even if you try to lock down an over-network backup to be append-only, still there are vectors for attack (ssh credentials, OpenSSL bugs, etc). In this post, I try to explore how we can protect against them and still retain some modern conveniences. A backup scheme also needs to make a balance between: My story so far About 20 years ago, I had an Exabyte tape drive, with the amazing capacity of 7GB per tape! Eventually as disk prices fell, I had external disks plugged in to a server, and would periodically rotate them offsite. I ve also had various combinations of partial or complete offsite copies over the Internet as well. I have around 6TB of data to back up (after compression), a figure that is growing somewhat rapidly as I digitize some old family recordings and videos. Since I last wrote about backups 5 years ago, my scheme has been largely unchanged; at present I use ZFS for local and to-disk backups and borg for the copies over the Internet. Let s take a look at some options that could make this better. Tape The original airgapped backup. You back up to a tape, then you take the (fairly cheap) tape out of the drive and put in another one. In cost per GB, tape is probably the cheapest medium out there. But of course it has its drawbacks. Let s start with cost. 
To get a drive that can handle capacities of what I d be needing, at least LTO-6 (2.5TB per tape) would be needed, if not LTO-7 (6TB). New, these drives cost several thousand dollars, plus they need LVD SCSI or Fibre Channel cards. You re not going to be hanging one off a Raspberry Pi; these things need a real server with enterprise-style connectivity. If you re particularly lucky, you might find an LTO-6 drive for as low as $500 on eBay. Then there are tapes. A 10-pack of LTO-6 tapes runs more than $200, and provides a total capacity of 25TB sufficient for these needs (note that, of course, you need to have at least double the actual space of the data, to account for multiple full backups in a set). A 5-pack of LTO-7 tapes is a little more expensive, while providing more storage. So all-in, this is going to be in the best possible scenario nearly $1000, and possibly a lot more. For a large company with many TB of storage, the initial costs can be defrayed due to the cheaper media, but for a home user, not so much. Consider that 8TB hard drives can be found for $150 $200. A pair of them (for redundancy) would run $300-400, and then you have all the other benefits of disk (quicker access, etc.) Plus they can be driven by something as cheap as a Raspberry Pi. Fancier tape setups involve auto-changers, but then you re not really airgapped, are you? (If you leave all your tapes in the changer, they can generally be selected and overwritten, barring things like hardware WORM). As useful as tape is, for this project, it would simply be way more expensive than disk-based options. Fundamentals of disk-based airgapping The fundamental thing we need to address with disk-based airgapping is that the machines being backed up have no real-time contact with the backup storage system. This rules out most solutions out there, that want to sync by comparing local state with remote state. 
If one is willing to throw storage efficiency out the window (maybe practical for very small data sets), one could just send a full backup daily. But in reality, what is more likely needed is a way to store a local proxy for the remote state. Then a runner device (a USB stick, disk, etc.) could be plugged into the network, filled with queued data, then plugged into the backup system to have the data dequeued and processed. Some may be tempted to short-circuit this and just plug external disks into a backup system. I've done that for a long time. This is, however, a risk, because it makes those disks vulnerable to whatever may be attacking the local system (anything from lightning to ransomware).

ZFS

ZFS is, it should be no surprise, particularly well suited for this. zfs send/receive can send an incremental stream that represents a delta between two checkpoints (snapshots or bookmarks) on a filesystem. It can do this very efficiently, much more so than walking an entire filesystem tree. Additionally, with the recent addition of ZFS crypto to ZFS on Linux, the replication stream can optionally reflect the encrypted data. Yes, as long as you don't need to mount them, you can mostly work with ZFS datasets on an encrypted basis, and can directly tell zfs send to just send the encrypted data instead of the decrypted data. The downside of ZFS is the resource requirements at the destination, which in terms of RAM are higher than most of the older Raspberry Pi-style devices can offer. Still, one could perhaps just save off zfs send streams and restore them later if need be, but that implies a periodic resend of a full stream, an inefficient operation. Deduplicating software such as borg could be used on those streams (though with less effectiveness if they're encrypted).

Tar

Perhaps surprisingly, tar in listed-incremental mode can solve this problem for non-ZFS users.
It will keep a local cache of the state of the filesystem as of the time of the last run of tar, and can generate new tarballs that reflect the changes since the previous run (even deletions). This can achieve a similar result to the ZFS send/receive, though in a much less elegant way.

Bacula / Bareos

Bacula (and its fork Bareos) both have support for a FIFO destination. Theoretically this could be used to queue up data for transfer to the airgapped machine. This support is very poorly documented in both and is rumored to have bitrotted, however.

rdiff and xdelta

rdiff and xdelta can be used as sort of a non-real-time rsync, at least on a per-file basis. Theoretically, one could generate a full backup (with tar, ZFS send, or whatever), take an rdiff signature, and send over the file while keeping the signature. On the next run, another full backup is piped into rdiff, and on the basis of the signature file of the old data and the new data, it produces a binary patch that can be queued for the backup target to update its stored copy of the file. This leaves history preservation as an exercise to be undertaken on the backup target. It may not necessarily be easy and may not be efficient.

rsync batches

rsync can be used to compute a delta between two directory trees and express this as a single-file batch that can be processed by a remote rsync. Unfortunately this implies the sender must always keep an old tree around (barring a solution such as ZFS snapshots) in order to compute the delta, and of course it still implies the need for history processing on the remote.

Getting the Data There

OK, so you've got an airgapped system and some sort of runner device for your sneakernet (USB stick, hard drive, etc.). Now what? Obviously you could just copy data onto the runner and move it back off at the backup target. But a tool like NNCP (sort of a modernized UUCP) offers a lot of help in automating the process, returning error reports, etc.
NNCP can be used online over TCP, over reliable serial links, over ssh, with offline onion routing via intermediaries or directly, etc. Imagine having an airgapped machine at a different location you go to frequently (workplace, friend, etc.). Before leaving, you put a USB stick in your pocket. When you get there, you pop it in. It's despooled and processed while you wait, and return emails or whatever are queued up to be sent when you get back home. Not bad, eh?

Future installment

I'm going to try some of these approaches and report back on my experiences in the next few weeks.
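
The "local proxy for the remote state" idea from the section on the fundamentals of disk-based airgapping can be illustrated with a small sketch. This is only an illustration under stated assumptions, not how tar, ZFS, or rsync actually track state: the function names `scan`, `delta`, and `demo` are hypothetical, and each file is read whole into memory, which would not suit large files. The sender keeps the previous manifest locally, so it can work out what to queue on the runner device without ever contacting the backup target.

```python
import hashlib
import os
import tempfile


def scan(root):
    """Map each file path (relative to root) to a SHA-256 digest of its contents."""
    state = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            full = os.path.join(dirpath, name)
            with open(full, "rb") as fh:
                digest = hashlib.sha256(fh.read()).hexdigest()
            state[os.path.relpath(full, root)] = digest
    return state


def delta(previous, current):
    """Return (changed, deleted) paths relative to the previously recorded state."""
    # New files and files whose digest differs both count as changed.
    changed = sorted(p for p, d in current.items() if previous.get(p) != d)
    # Files that were recorded last time but are gone now must be reported too.
    deleted = sorted(p for p in previous if p not in current)
    return changed, deleted


def demo():
    """Hypothetical walk-through: two scans of a scratch directory and their delta."""
    with tempfile.TemporaryDirectory() as root:
        for name, text in (("a.txt", "one"), ("b.txt", "two")):
            with open(os.path.join(root, name), "w") as fh:
                fh.write(text)
        first = scan(root)  # the manifest kept locally after the first run

        with open(os.path.join(root, "a.txt"), "w") as fh:
            fh.write("uno")  # modify a file
        os.remove(os.path.join(root, "b.txt"))  # delete a file
        with open(os.path.join(root, "c.txt"), "w") as fh:
            fh.write("three")  # add a file
        return delta(first, scan(root))
```

Only the changed files and the list of deletions would need to go onto the runner device; the backup target applies them in order and keeps whatever history it wants.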

21 December 2020

Russ Allbery: Review: Behind the Throne

Review: Behind the Throne, by K.B. Wagers
Series: Indranan War #1
Publisher: Orbit
Copyright: August 2016
ISBN: 0-316-30859-5
Format: Kindle
Pages: 416
Hail is a gunrunner, an outlaw and criminal, someone who knows how to survive violence and navigate by personal loyalty. That world knows her as Cressen Stone. What her colleagues don't know is that she's also an Imperial Princess. Hailimi Mercedes Jaya Bristol left that world twenty years earlier in secret pursuit of her father's killer and had no intention of returning. But her sisters are dead, her mother's health is failing, and two Imperial Trackers have been sent to bring her back to her rightful position as heir. I'm going to warn up-front that the first half of this novel was rough to the point of being unreadable. Wagers tries much too hard to establish Hail as a reluctant heroine torn between her dislike of royal protocols and her grief and anger at the death of her sisters. The result is excessively melodramatic and, to be frank, badly written. There are a lot of passages like this:
His words slammed into me, burning like the ten thousand volts of a Solarian Conglomerate police Taser.
(no, there's no significance to the Solarian Conglomerate here), or, just three paragraphs later:
The air rushed out of my lungs. Added grief for a niece I'd never known. One more log on the pyre set to burn my freedom to ashes. The hope I'd had of getting out of this mess was lost in that instant, and I couldn't do anything but stare at Emmory in abject shock.
Given how much air rushes out of Hail's lungs and how often she's struck down with guilt or grief, it's hard to believe she doesn't have brain damage. Worse, Hail spends a great deal of the first third of the book whining, which given that the book is written in first person gets old very quickly. Every emotion is overwritten and overstressed as Hail rails against obvious narrative inescapability. It's blatantly telegraphed from the first few pages that Hail is going to drop into the imperial palace like a profane invasion force and shake everything up, but the reader has to endure far too long of Hail being dramatically self-pitying about the plot. I almost gave up on this book in irritation (and probably should have). And then it sort of grew on me, because the other thing Wagers is doing (also not subtly) is a story trope for which I have a particular weakness: The fish out of water who nonetheless turns out to be the person everyone needs because she's systematically and deliberately kind and thoughtful while not taking any shit. Hail left Pashati young and inexperienced, with a strained relationship with her mother and a habit of letting her temper interfere with her ability to negotiate palace politics. She still has the temper, but age, experience, and confidence mean that she's decisive and confident in a way she never was before. The second half of this book is about Hail building her power base and winning loyalty by being loyal and decent. It's still not great writing, but there's something there I enjoyed reading. Wagers's setting is intriguing, although it makes me a bit nervous. The Indranan Empire was settled by colonists of primarily Indian background. The court trappings, mythology, and gods referenced in Behind the Throne are Hindu-derived, and I suspect (although didn't confirm) that the funeral arrangements are as well. Formal wear (and casual wear) for women is a sari. 
There's a direct reference to the goddess Lakshimi (not Lakshmi, which Wikipedia seems to indicate is the correct spelling, although transliteration is always an adventure). I was happy to see this, since there are more than enough SF novels out there that seem to assume only western countries go into space. But I'm never sure whether the author did enough research or has enough personal knowledge to pull off the references correctly, and I personally wouldn't know the difference. The Indranan Empire is also matriarchal, and here Wagers goes for an inversion of sexism that puts men in roughly the position women were in the 1970s. They can, in theory, do most jobs, but there are many things they're expected not to do, there are some explicit gender lines in power structures, and the role of men in society is a point of political conflict. It's skillfully injected as social background, with a believable pattern of societal prejudice that doesn't necessarily apply to specific men in specific situations. I liked that Wagers did this without giving the Empire itself any feminine-coded characteristics. All admirals are women because the characters believe women are obviously better military leaders, not because of some claptrap about nurturing or caring or some other female-coded reason from our society. That said, this gender role inversion didn't feel that significant to the story. The obvious "sexism is bad, see what it would be like if men were subject to it" message ran parallel to the main plot and never felt that insightful to me. I'm therefore not sure it was successful or worth the injection of sexism into the reading experience, although it certainly is different from the normal fare of space empires. I can't recommend Behind the Throne because a lot of it just isn't very good. But I still kind of want to because I sincerely enjoyed the last third of the book, despite some lingering melodrama. 
Watching Hail succeed by being a decent, trustworthy, loyal, and intelligent person is satisfying, once she finally stops whining. The destination is probably not worth the journey, but now that I've finished the first book, I'm tempted to grab the second. Followed by After the Crown. Rating: 6 out of 10

26 October 2020

Marco d'Itri: RPKI validation with FORT Validator

This article documents how to install FORT Validator (an RPKI relying party software which also implements the RPKI to Router protocol in a single daemon) on Debian 10 to provide RPKI validation to routers. If you are using testing or unstable then you can just skip the part about apt pinnings. The packages in bullseye (Debian testing) can be installed as is on Debian stable with no need to rebuild them, by configuring an appropriate pinning for apt:
cat <<END > /etc/apt/sources.list.d/bullseye.list
deb http://deb.debian.org/debian/ bullseye main
END
cat <<END > /etc/apt/preferences.d/pin-rpki
# by default do not install anything from bullseye
Package: *
Pin: release bullseye
Pin-Priority: 100

Package: fort-validator rpki-trust-anchors
Pin: release bullseye
Pin-Priority: 990
END
apt update
Before starting, make sure that curl (or wget) and the web PKI certificates are installed:
apt install curl ca-certificates
If you already know about the legal issues related to the ARIN TAL then you may instruct the package to automatically install it. If you skip this step then you will be asked about it at installation time; either way is fine.
echo 'rpki-trust-anchors rpki-trust-anchors/get_arin_tal boolean true' \
    | debconf-set-selections
Install the package as usual:
apt install fort-validator
You may also install rpki-client and gortr on Debian 10, or maybe cfrpki and gortr. I have also tried packaging Routinator 3000 for Debian, but this effort is currently on hold because the Rust ecosystem is broken and hostile to the good packaging practices of Linux distributions.

Marco d'Itri: RPKI validation with OpenBSD's rpki-client and Cloudflare's gortr

This article documents how to install rpki-client (an RPKI relying party software, the actual validator) and gortr (which implements the RPKI to Router protocol) on Debian 10 to provide RPKI validation to routers. If you are using testing or unstable then you can just skip the part about apt pinnings. The packages in bullseye (Debian testing) can be installed as is on Debian stable with no need to rebuild them, by configuring an appropriate pinning for apt:
cat <<END > /etc/apt/sources.list.d/bullseye.list
deb http://deb.debian.org/debian/ bullseye main
END
cat <<END > /etc/apt/preferences.d/pin-rpki
# by default do not install anything from bullseye
Package: *
Pin: release bullseye
Pin-Priority: 100

Package: gortr rpki-client rpki-trust-anchors
Pin: release bullseye
Pin-Priority: 990
END
apt update
Before starting, make sure that curl (or wget) and the web PKI certificates are installed:
apt install curl ca-certificates
If you already know about the legal issues related to the ARIN TAL then you may instruct the package to automatically install it. If you skip this step then you will be asked about it at installation time; either way is fine.
echo 'rpki-trust-anchors rpki-trust-anchors/get_arin_tal boolean true' \
    | debconf-set-selections
Install the packages as usual:
apt install rpki-client gortr
And then configure rpki-client to generate its output in the JSON format needed by gortr:
echo 'OPTIONS=-j' > /etc/default/rpki-client
You may manually start the service unit to immediately generate the data instead of waiting for the next timer run:
systemctl start rpki-client &
gortr too needs to be configured to use the JSON data generated by rpki-client:
echo 'GORTR_ARGS=-bind :323 -verify=false -checktime=false -cache /var/lib/rpki-client/json' > /etc/default/gortr
And then it needs to be restarted to use the new configuration:
systemctl restart gortr
You may also install FORT Validator on Debian 10, or maybe cfrpki with gortr. I have also tried packaging Routinator 3000 for Debian, but this effort is currently on hold because the Rust ecosystem is broken and hostile to the packaging practices of Linux distributions.

3 October 2020

Steve Kemp: Writing an assembler.

Recently I've been writing a couple of simple compilers, which take input in a particular format and generate assembly language output. This output can then be piped through gcc to generate a native executable. Public examples include this trivial math compiler and my brainfuck compiler. Of course there's always the nagging thought that relying upon gcc (or nasm) is a bit of a cheat. So I wondered: how hard is it to write an assembler? Something that would take an assembly-language program and generate a native (ELF) binary? And the answer is "It isn't hard, it is just tedious". I found some code to generate an ELF binary, and after that assembling simple instructions was pretty simple. I remember from my assembly-language days that the encoding of instructions can be pretty much handled by tables, but I've not yet gone into that. (Specifically there are instructions like "add rax, rcx", and the encoding specifies the source/destination registers, with different forms for various sized immediates.) Anyway, I hacked up a simple assembler; it can compile a.out from this input:
.hello   DB "Hello, world\n"
.goodbye DB "Goodbye, world\n"
        mov rdx, 13        ;; write this many characters
        mov rcx, hello     ;; starting at the string
        mov rbx, 1         ;; output is STDOUT
        mov rax, 4         ;; sys_write
        int 0x80           ;; syscall
        mov rdx, 15        ;; write this many characters
        mov rcx, goodbye   ;; starting at the string
        mov rax, 4         ;; sys_write
        mov rbx, 1         ;; output is STDOUT
        int 0x80           ;; syscall
        xor rbx, rbx       ;; exit-code is 0
        xor rax, rax       ;; syscall will be 1 - so set to zero, then increase
        inc rax            ;;
        int 0x80           ;; syscall
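
The remark above that instruction encoding "can be pretty much handled by tables" can be made concrete with a short sketch. It is written in Python purely for illustration and is not taken from the assembler described in this post; the names `REGISTERS`, `OPCODES`, and `encode_reg_reg` are hypothetical. It covers only the register-to-register forms of a few x86-64 instructions on the eight legacy 64-bit registers, where the encoding is a REX.W prefix, a one-byte opcode, and a ModRM byte naming the two registers.

```python
# Register numbers as they appear in the ModRM byte
# (only the eight registers that need no REX extension bits).
REGISTERS = {
    "rax": 0, "rcx": 1, "rdx": 2, "rbx": 3,
    "rsp": 4, "rbp": 5, "rsi": 6, "rdi": 7,
}

# Opcodes for the "op r/m64, r64" form of each instruction.
OPCODES = {"add": 0x01, "sub": 0x29, "xor": 0x31, "mov": 0x89}

REX_W = 0x48  # prefix selecting 64-bit operand size


def encode_reg_reg(mnemonic, dst, src):
    """Encode e.g. 'add rax, rcx' as REX.W, opcode, ModRM."""
    # mod=11 (register direct), reg field = source, r/m field = destination.
    modrm = 0xC0 | (REGISTERS[src] << 3) | REGISTERS[dst]
    return bytes([REX_W, OPCODES[mnemonic], modrm])


if __name__ == "__main__":
    print(encode_reg_reg("add", "rax", "rcx").hex())  # 4801c8
    print(encode_reg_reg("xor", "rbx", "rbx").hex())  # 4831db
```

Immediates and memory operands need further table columns (operand size, ModRM mode, displacement), which is presumably where the tedium mentioned above comes in.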
The obvious omission is support for "JMP", "JMP_NZ", etc. That's painful because jumps are encoded with relative offsets. For the moment if you want to jump:
        push foo     ; "jmp foo" - indirectly.
        ret
:bar
        nop          ; Nothing happens
        mov rbx,33   ; first syscall argument: exit code
        mov rax,1    ; system call number (sys_exit)
        int 0x80     ; call kernel
:foo
        push bar     ; "jmp bar" - indirectly.
        ret
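
Why relative offsets make jumps painful can be seen in a small sketch. It assumes the common near-jump form of x86-64 `jmp` (opcode 0xE9 followed by a signed 32-bit displacement) and uses a hypothetical helper name, `encode_jmp`; it is not code from the assembler described here. The displacement is counted from the end of the jump instruction, so the assembler must already know the byte offset of both the jump and its target label, which for forward references usually means a second pass or back-patching.

```python
import struct

JMP_REL32 = 0xE9  # opcode for "jmp rel32"
JMP_LENGTH = 5    # one opcode byte plus a four-byte displacement


def encode_jmp(instruction_offset, target_offset):
    """Encode a near jump located at instruction_offset that lands on target_offset."""
    # The CPU adds the displacement to the address of the *next* instruction.
    displacement = target_offset - (instruction_offset + JMP_LENGTH)
    return bytes([JMP_REL32]) + struct.pack("<i", displacement)


if __name__ == "__main__":
    print(encode_jmp(0, 16).hex())  # forward jump
    print(encode_jmp(10, 0).hex())  # backward jump, negative displacement
```

The push/ret trick shown above sidesteps this by using an absolute address, at the cost of an extra instruction and a stack access.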
I'll update to add some more instructions, and see if I can use it to handle the output I generate from a couple of other tools. If so that's a win, if not then it was a fun learning experience:
