Search Results: "wli"

24 February 2024

Niels Thykier: Language Server for Debian: Spellchecking

This is my third update on writing a language server for Debian packaging files, which aims at providing a better developer experience for Debian packagers. Lets go over what have done since the last report.

Semantic token support I have added support for what the Language Server Protocol (LSP) call semantic tokens. These are used to provide the editor insights into tokens of interest for users. Allegedly, this is what editors would use for syntax highlighting as well. Unfortunately, eglot (emacs) does not support semantic tokens, so I was not able to test this. There is a 3-year old PR for supporting with the last update being ~3 month basically saying "Please sign the Copyright Assignment". I pinged the GitHub issue in the hopes it will get unstuck. For good measure, I also checked if I could try it via neovim. Before installing, I read the neovim docs, which helpfully listed the features supported. Sadly, I did not spot semantic tokens among those and parked from there. That was a bit of a bummer, but I left the feature in for now. If you have an LSP capable editor that supports semantic tokens, let me know how it works for you! :)

Spellchecking Finally, I implemented something Otto was missing! :) This stared with Paul Wise reminding me that there were Python binding for the hunspell spellchecker. This enabled me to get started with a quick prototype that spellchecked the Description fields in debian/control. I also added spellchecking of comments while I was add it. The spellchecker runs with the standard en_US dictionary from hunspell-en-us, which does not have a lot of technical terms in it. Much less any of the Debian specific slang. I spend considerable time providing a "built-in" wordlist for technical and Debian specific slang to overcome this. I also made a "wordlist" for known Debian people that the spellchecker did not recognise. Said wordlist is fairly short as a proof of concept, and I fully expect it to be community maintained if the language server becomes a success. My second problem was performance. As I had suspected that spellchecking was not the fastest thing in the world. Therefore, I added a very small language server for the debian/changelog, which only supports spellchecking the textual part. Even for a small changelog of a 1000 lines, the spellchecking takes about 5 seconds, which confirmed my suspicion. With every change you do, the existing diagnostics hangs around for 5 seconds before being updated. Notably, in emacs, it seems that diagnostics gets translated into an absolute character offset, so all diagnostics after the change gets misplaced for every character you type. Now, there is little I could do to speed up hunspell. But I can, as always, cheat. The way diagnostics work in the LSP is that the server listens to a set of notifications like "document opened" or "document changed". In a response to that, the LSP can start its diagnostics scanning of the document and eventually publish all the diagnostics to the editor. The spec is quite clear that the server owns the diagnostics and the diagnostics are sent as a "notification" (that is, fire-and-forgot). Accordingly, there is nothing that prevents the server from publishing diagnostics multiple times for a single trigger. The only requirement is that the server publishes the accumulated diagnostics in every publish (that is, no delta updating). Leveraging this, I had the language server for debian/changelog scan the document and publish once for approximately every 25 typos (diagnostics) spotted. This means you quickly get your first result and that clears the obsolete diagnostics. Thereafter, you get frequent updates to the remainder of the document if you do not perform any further changes. That is, up to a predefined max of typos, so we do not overload the client for longer changelogs. If you do any changes, it resets and starts over. The only bit missing was dealing with concurrency. By default, a pygls language server is single threaded. It is not great if the language server hangs for 5 seconds everytime you type anything. Fortunately, pygls has builtin support for asyncio and threaded handlers. For now, I did an async handler that await after each line and setup some manual detection to stop an obsolete diagnostics run. This means the server will fairly quickly abandon an obsolete run. Also, as a side-effect of working on the spellchecking, I fixed multiple typos in the changelog of debputy. :)

Follow up on the "What next?" from my previous update In my previous update, I mentioned I had to finish up my python-debian changes to support getting the location of a token in a deb822 file. That was done, the MR is now filed, and is pending review. Hopefully, it will be merged and uploaded soon. :) I also submitted my proposal for a different way of handling relationship substvars to debian-devel. So far, it seems to have received only positive feedback. I hope it stays that way and we will have this feature soon. Guillem proposed to move some of this into dpkg, which might delay my plans a bit. However, it might be for the better in the long run, so I will wait a bit to see what happens on that front. :) As noted above, I managed to add debian/changelog as a support format for the language server. Even if it only does spellchecking and trimming of trailing newlines on save, it technically is a new format and therefore cross that item off my list. :D Unfortunately, I did not manage to write a linter variant that does not involve using an LSP-capable editor. So that is still pending. Instead, I submitted an MR against elpa-dpkg-dev-el to have it recognize all the fields that the debian/control LSP knows about at this time to offset the lack of semantic token support in eglot.

From here... My sprinting on this topic will soon come to an end, so I have to a bit more careful now with what tasks I open! I think I will narrow my focus to providing a batch linting interface. Ideally, with an auto-fix for some of the more mechanical issues, where this is little doubt about the answer. Additionally, I think the spellchecking will need a bit more maturing. My current code still trips on naming patterns that are "clearly" verbatim or code references like things written in CamelCase or SCREAMING_SNAKE_CASE. That gets annoying really quickly. It also trips on a lot of commands like dpkg-gencontrol, but that is harder to fix since it could have been a real word. I think those will have to be fixed people using quotes around the commands. Maybe the most popular ones will end up in the wordlist. Beyond that, I will play it by ear if I have any time left. :)

20 February 2024

Niels Thykier: Language Server (LSP) support for debian/control

About a month ago, Otto Kek l inen asked for editor extensions for debian related files on the debian-devel mailing list. In that thread, I concluded that what we were missing was a "Language Server" (LSP) for our packaging files. Last week, I started a prototype for such a LSP for the debian/control file as a starting point based on the pygls library. The initial prototype worked and I could do very basic diagnostics plus completion suggestion for field names.

Current features I got 4 basic features implemented, though I have only been able to test two of them in emacs.

Diagnostics or linting of basic issues.

Completion suggestions for all known field names that I could think of and values for some fields.

Folding ranges (untested). This feature enables the editor to "fold" multiple lines. It is often used with multi-line comments and that is the feature currently supported.

On save, trim trailing whitespace at the end of lines (untested). Might not be registered correctly on the server end.

Despite its very limited feature set, I feel editing debian/control in emacs is now a much more pleasant experience. Coming back to the features that Otto requested, the above covers a grand total of zero. Sorry, Otto. It is not you, it is me.

Completion suggestions For completion, all known fields are completed. Place the cursor at the start of the line or in a partially written out field name and trigger the completion in your editor. In my case, I can type R-R-R and trigger the completion and the editor will automatically replace it with Rules-Requires-Root as the only applicable match. Your milage may vary since I delegate most of the filtering to the editor, meaning the editor has the final say about whether your input matches anything. The only filtering done on the server side is that the server prunes out fields already used in the paragraph, so you are not presented with the option to repeat an already used field, which would be an error. Admittedly, not an error the language server detects at the moment, but other tools will. When completing field, if the field only has one non-default value such as Essential which can be either no (the default, but you should not use it) or yes, then the completion suggestion will complete the field along with its value. This is mostly only applicable for "yes/no" fields such as Essential and Protected. But it does also trigger for Package-Type at the moment. As for completing values, here the language server can complete the value for simple fields such as "yes/no" fields, Multi-Arch, Package-Type and Priority. I intend to add support for Section as well - maybe also Architecture.

Diagnostics On the diagnostic front, I have added multiple diagnostics:

An error marker for syntax errors.

An error marker for missing a mandatory field like Package or Architecture. This also includes Standards-Version, which is admittedly mandatory by policy rather than tooling falling part.

An error marker for adding Multi-Arch: same to an Architecture: all package.

Error marker for providing an unknown value to a field with a set of known values. As an example, writing foo in Multi-Arch would trigger this one.

Warning marker for using deprecated fields such as DM-Upload-Allowed, or when setting a field to its default value for fields like Essential. The latter rule only applies to selected fields and notably Multi-Arch: no does not trigger a warning.

Info level marker if a field like Priority duplicates the value of the Source paragraph.

Notable omission at this time:

No errors are raised if a field does not have a value.

No errors are raised if a field is duplicated inside a paragraph.

No errors are used if a field is used in the wrong paragraph.

No spellchecking of the Description field.

No understanding that Foo and X[CBS]-Foo are related. As an example, XC-Package-Type is completely ignored despite being the old name for Package-Type.

Quick fixes to solve these problems... :)

Trying it out If you want to try, it is sadly a bit more involved due to things not being uploaded or merged yet. Also, be advised that I will regularly rebase my git branches as I revise the code. The setup:

Build and install the deb of the main branch of pygls from https://salsa.debian.org/debian/pygls The package is in NEW and hopefully this step will soon just be a regular apt install.

Build and install the deb of the rts-locatable branch of my python-debian fork from https://salsa.debian.org/nthykier/python-debian There is a draft MR of it as well on the main repo.

Build and install the deb of the lsp-support branch of debputy from https://salsa.debian.org/debian/debputy

Configure your editor to run debputy lsp debian/control as the language server for debian/control. This is depends on your editor. I figured out how to do it for emacs (see below). I also found a guide for neovim at https://neovim.io/doc/user/lsp. Note that debputy can be run from any directory here. The debian/control is a reference to the file format and not a concrete file in this case.

Obviously, the setup should get easier over time. The first three bullet points should eventually get resolved by merges and upload meaning you end up with an apt install command instead of them. For the editor part, I would obviously love it if we can add snippets for editors to make the automatically pick up the language server when the relevant file is installed.

Using the debputy LSP in emacs The guide I found so far relies on eglot. The guide below assumes you have the elpa-dpkg-dev-el package installed for the debian-control-mode. Though it should be a trivially matter to replace debian-control-mode with a different mode if you use a different mode for your debian/control file. In your emacs init file (such as ~/.emacs or ~/.emacs.d/init.el), you add the follow blob.

(with-eval-after-load 'eglot
    (add-to-list 'eglot-server-programs
        '(debian-control-mode . ("debputy" "lsp" "debian/control"))))

Once you open the debian/control file in emacs, you can type M-x eglot to activate the language server. Not sure why that manual step is needed and if someone knows how to automate it such that eglot activates automatically on opening debian/control, please let me know. For testing completions, I often have to manually activate them (with C-M-i or M-x complete-symbol). Though, it is a bit unclear to me whether this is an emacs setting that I have not toggled or something I need to do on the language server side.

From here As next steps, I will probably look into fixing some of the "known missing" items under diagnostics. The quick fix would be a considerable improvement to assisting users. In the not so distant future, I will probably start to look at supporting other files such as debian/changelog or look into supporting configuration, so I can cover formatting features like wrap-and-sort. I am also very much open to how we can provide integrations for this feature into editors by default. I will probably create a separate binary package for specifically this feature that pulls all relevant dependencies that would be able to provide editor integrations as well.

30 January 2024

Matthew Palmer: Why Certificate Lifecycle Automation Matters

If you ve perused the ActivityPub feed of certificates whose keys are known to be compromised, and clicked on the Show More button to see the name of the certificate issuer, you may have noticed that some issuers seem to come up again and again. This might make sense after all, if a CA is issuing a large volume of certificates, they ll be seen more often in a list of compromised certificates. In an attempt to see if there is anything that we can learn from this data, though, I did a bit of digging, and came up with some illuminating results.

The Procedure I started off by finding all the unexpired certificates logged in Certificate Transparency (CT) logs that have a key that is in the pwnedkeys database as having been publicly disclosed. From this list of certificates, I removed duplicates by matching up issuer/serial number tuples, and then reduced the set by counting the number of unique certificates by their issuer. This gave me a list of the issuers of these certificates, which looks a bit like this:

/C=BE/O=GlobalSign nv-sa/CN=AlphaSSL CA - SHA256 - G4
/C=GB/ST=Greater Manchester/L=Salford/O=Sectigo Limited/CN=Sectigo RSA Domain Validation Secure Server CA
/C=GB/ST=Greater Manchester/L=Salford/O=Sectigo Limited/CN=Sectigo RSA Organization Validation Secure Server CA
/C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certs.godaddy.com/repository//CN=Go Daddy Secure Certificate Authority - G2
/C=US/ST=Arizona/L=Scottsdale/O=Starfield Technologies, Inc./OU=http://certs.starfieldtech.com/repository//CN=Starfield Secure Certificate Authority - G2
/C=AT/O=ZeroSSL/CN=ZeroSSL RSA Domain Secure Site CA
/C=BE/O=GlobalSign nv-sa/CN=GlobalSign GCC R3 DV TLS CA 2020

Rather than try to work with raw issuers (because, as Andrew Ayer says, The SSL Certificate Issuer Field is a Lie), I mapped these issuers to the organisations that manage them, and summed the counts for those grouped issuers together.

The Data
Insert obligatory "not THAT data" comment here
The end result of this work is the following table, sorted by the count of certificates which have been compromised by exposing their private key:

Issuer Compromised Count

Sectigo 170

ISRG (Let's Encrypt) 161

GoDaddy 141

DigiCert 81

GlobalSign 46

Entrust 3

SSL.com 1

If you re familiar with the CA ecosystem, you ll probably recognise that the organisations with large numbers of compromised certificates are also those who issue a lot of certificates. So far, nothing particularly surprising, then. Let s look more closely at the relationships, though, to see if we can get more useful insights.

Issuer	Compromised Count
Sectigo	170
ISRG (Let's Encrypt)	161
GoDaddy	141
DigiCert	81
GlobalSign	46
Entrust	3
SSL.com	1

Volume Control Using the issuance volume report from crt.sh, we can compare issuance volumes to compromise counts, to come up with a compromise rate . I m using the Unexpired Precertificates colume from the issuance volume report, as I feel that s the number that best matches the certificate population I m examining to find compromised certificates. To maintain parity with the previous table, this one is still sorted by the count of certificates that have been compromised.

Issuer Issuance Volume Compromised Count Compromise Rate

Sectigo 88,323,068 170 1 in 519,547

ISRG (Let's Encrypt) 315,476,402 161 1 in 1,959,480

GoDaddy 56,121,429 141 1 in 398,024

DigiCert 144,713,475 81 1 in 1,786,586

GlobalSign 1,438,485 46 1 in 31,271

Entrust 23,166 3 1 in 7,722

SSL.com 171,816 1 1 in 171,816

If we now sort this table by compromise rate, we can see which organisations have the most (and least) leakiness going on from their customers:

Issuer Issuance Volume Compromised Count Compromise Rate

Entrust 23,166 3 1 in 7,722

GlobalSign 1,438,485 46 1 in 31,271

SSL.com 171,816 1 1 in 171,816

GoDaddy 56,121,429 141 1 in 398,024

Sectigo 88,323,068 170 1 in 519,547

DigiCert 144,713,475 81 1 in 1,786,586

ISRG (Let's Encrypt) 315,476,402 161 1 in 1,959,480

By grouping by order-of-magnitude in the compromise rate, we can identify three bands :

The Super Leakers: Customers of Entrust and GlobalSign seem to love to lose control of their private keys. For Entrust, at least, though, the small volumes involved make the numbers somewhat untrustworthy. The three compromised certificates could very well belong to just one customer, for instance. I m not aware of anything that GlobalSign does that would make them such an outlier, either, so I m inclined to think they just got unlucky with one or two customers, but as CAs don t include customer IDs in the certificates they issue, it s not possible to say whether that s the actual cause or not.

The Regular Leakers: Customers of SSL.com, GoDaddy, and Sectigo all have compromise rates in the 1-in-hundreds-of-thousands range. Again, the low volumes of SSL.com make the numbers somewhat unreliable, but the other two organisations in this group have large enough numbers that we can rely on that data fairly well, I think.

The Low Leakers: Customers of DigiCert and Let s Encrypt are at least three times less likely than customers of the regular leakers to lose control of their private keys. Good for them!

Now we have some useful insights we can think about.

Issuer	Issuance Volume	Compromised Count	Compromise Rate
Sectigo	88,323,068	170	1 in 519,547
ISRG (Let's Encrypt)	315,476,402	161	1 in 1,959,480
GoDaddy	56,121,429	141	1 in 398,024
DigiCert	144,713,475	81	1 in 1,786,586
GlobalSign	1,438,485	46	1 in 31,271
Entrust	23,166	3	1 in 7,722
SSL.com	171,816	1	1 in 171,816

Why Is It So?
If you don't know who Professor Julius Sumner Miller is, I highly recommend finding out
All of the organisations on the list, with the exception of Let s Encrypt, are what one might term traditional CAs. To a first approximation, it s reasonable to assume that the vast majority of the customers of these traditional CAs probably manage their certificates the same way they have for the past two decades or more. That is, they generate a key and CSR, upload the CSR to the CA to get a certificate, then copy the cert and key somewhere. Since humans are handling the keys, there s a higher risk of the humans using either risky practices, or making a mistake, and exposing the private key to the world. Let s Encrypt, on the other hand, issues all of its certificates using the ACME (Automatic Certificate Management Environment) protocol, and all of the Let s Encrypt documentation encourages the use of software tools to generate keys, issue certificates, and install them for use. Given that Let s Encrypt has 161 compromised certificates currently in the wild, it s clear that the automation in use is far from perfect, but the significantly lower compromise rate suggests to me that lifecycle automation at least reduces the rate of key compromise, even though it doesn t eliminate it completely.

Sidebar: ACME Does Not Currently Rule The World It is true that all of the organisations in this analysis also provide ACME issuance workflows, should customers desire it. However, the traditional CA companies have been around a lot longer than ACME has, and so they probably acquired many of their customers before ACME existed. Given that it s incredibly hard to get humans to change the way they do things, once they have a way that works , it seems reasonable to assume that most of the certificates issued by these CAs are handled in the same human-centric, error-prone manner they always have been. If organisations would like to refute this assumption, though, by sharing their data on ACME vs legacy issuance rates, I m sure we d all be extremely interested.

Explaining the Outlier The difference in presumed issuance practices would seem to explain the significant difference in compromise rates between Let s Encrypt and the other organisations, if it weren t for one outlier. This is a largely traditional CA, with the manual-handling issues that implies, but with a compromise rate close to that of Let s Encrypt. We are, of course, talking about DigiCert. The thing about DigiCert, that doesn t show up in the raw numbers from crt.sh, is that DigiCert manages the issuance of certificates for several of the biggest hosted TLS providers, such as CloudFlare and AWS. When these services obtain a certificate from DigiCert on their customer s behalf, the private key is kept locked away, and no human can (we hope) get access to the private key. This is supported by the fact that no certificates identifiably issued to either CloudFlare or AWS appear in the set of certificates with compromised keys. When we ask for all certificates issued by DigiCert , we get both the certificates issued to these big providers, which are very good at keeping their keys under control, as well as the certificates issued to everyone else, whose key handling practices may not be quite so stringent. It s possible, though not trivial, to account for certificates issued to these hosted TLS providers, because the certificates they use are issued from intermediates branded to those companies. With the crt.sh psql interface we can run this query to get the total number of unexpired precertificates issued to these managed services:

SELECT SUM(sub.NUM_ISSUED[2] - sub.NUM_EXPIRED[2])
  FROM (
    SELECT ca.name, max(coalesce(coalesce(nullif(trim(cc.SUBORDINATE_CA_OWNER), ''), nullif(trim(cc.CA_OWNER), '')), cc.INCLUDED_CERTIFICATE_OWNER)) as OWNER,
           ca.NUM_ISSUED, ca.NUM_EXPIRED
      FROM ccadb_certificate cc, ca_certificate cac, ca
     WHERE cc.CERTIFICATE_ID = cac.CERTIFICATE_ID
       AND cac.CA_ID = ca.ID
  GROUP BY ca.ID
  ) sub
 WHERE sub.name ILIKE '%Amazon%' OR sub.name ILIKE '%CloudFlare%' AND sub.owner = 'DigiCert';

The number I get from running that query is 104,316,112, which should be subtracted from DigiCert s total issuance figures to get a more accurate view of what DigiCert s regular customers do with their private keys. When I do this, the compromise rates table, sorted by the compromise rate, looks like this:

Issuer	Issuance Volume	Compromised Count	Compromise Rate
Entrust	23,166	3	1 in 7,722
GlobalSign	1,438,485	46	1 in 31,271
SSL.com	171,816	1	1 in 171,816
GoDaddy	56,121,429	141	1 in 398,024
"Regular" DigiCert	40,397,363	81	1 in 498,732
Sectigo	88,323,068	170	1 in 519,547
All DigiCert	144,713,475	81	1 in 1,786,586
ISRG (Let's Encrypt)	315,476,402	161	1 in 1,959,480

In short, it appears that DigiCert s regular customers are just as likely as GoDaddy or Sectigo customers to expose their private keys.

What Does It All Mean? The takeaway from all this is fairly straightforward, and not overly surprising, I believe.
The less humans have to do with certificate issuance, the less likely they are to compromise that certificate by exposing the private key. While it may not be surprising, it is nice to have some empirical evidence to back up the common wisdom. Fully-managed TLS providers, such as CloudFlare, AWS Certificate Manager, and whatever Azure s thing is called, is the platonic ideal of this principle: never give humans any opportunity to expose a private key. I m not saying you should use one of these providers, but the security approach they have adopted appears to be the optimal one, and should be emulated universally. The ACME protocol is the next best, in that there are a variety of standardised tools widely available that allow humans to take themselves out of the loop, but it s still possible for humans to handle (and mistakenly expose) key material if they try hard enough. Legacy issuance methods, which either cannot be automated, or require custom, per-provider automation to be developed, appear to be at least four times less helpful to the goal of avoiding compromise of the private key associated with a certificate.

Humans Are, Of Course, The Problem
No thanks, Bender, I'm busy tonight
This observation that if you don t let humans near keys, they don t get leaked is further supported by considering the biggest issuers by volume who have not issued any certificates whose keys have been compromised: Google Trust Services (fourth largest issuer overall, with 57,084,529 unexpired precertificates), and Microsoft Corporation (sixth largest issuer overall, with 22,852,468 unexpired precertificates). It appears that somewhere between most and basically all of the certificates these organisations issue are to customers of their public clouds, and my understanding is that the keys for these certificates are managed in same manner as CloudFlare and AWS the keys are locked away where humans can t get to them. It should, of course, go without saying that if a human can never have access to a private key, it makes it rather difficult for a human to expose it. More broadly, if you are building something that handles sensitive or secret data, the more you can do to keep humans out of the loop, the better everything will be.

Your Support is Appreciated If you d like to see more analysis of how key compromise happens, and the lessons we can learn from examining billions of certificates, please show your support by buying me a refreshing beverage. Trawling CT logs is thirsty work.

Appendix: Methodology Limitations In the interests of clarity, I feel it s important to describe ways in which my research might be flawed. Here are the things I know of that may have impacted the accuracy, that I couldn t feasibly account for.

Time Periods: Because time never stops, there is likely to be some slight mismatches in the numbers obtained from the various data sources, because they weren t collected at exactly the same moment.

Issuer-to-Organisation Mapping: It s possible that the way I mapped issuers to organisations doesn t match exactly with how crt.sh does it, meaning that counts might be skewed. I tried to minimise that by using the same data sources (the CCADB AllCertificates report) that I believe that crt.sh uses for its mapping, but I cannot be certain of a perfect match.

Unwarranted Grouping: I ve drawn some conclusions about the practices of the various organisations based on their general approach to certificate issuance. If a particular subordinate CA that I ve grouped into the parent organisation is managed in some unusual way, that might cause my conclusions to be erroneous. I was able to fairly easily separate out CloudFlare, AWS, and Azure, but there are almost certainly others that I didn t spot, because hoo boy there are a lot of intermediate CAs out there.

1 January 2024

Russ Allbery: 2023 Book Reading in Review

In 2023, I finished and reviewed 53 books, continuing a trend of year-over-year increases and of reading the most books since 2012 (the last year I averaged five books a month). Reviewing continued to be uneven, with a significant slump in the summer and smaller slumps in February and November, and a big clump of reviews finished in October in addition to my normal year-end reading and reviewing vacation. The unevenness this year was mostly due to finishing books and not writing reviews immediately. Reviews are much harder to write when the finished books are piling up, so one goal for 2024 is to not let that happen again. I enter the new year with one book finished and not yet reviewed, after reading a book about every day and a half during my December vacation. I read two all-time favorite books this year. The first was Emily Tesh's debut novel Some Desperate Glory, which is one of the best space opera novels I have ever read. I cannot improve on Shelley Parker-Chan's blurb for this book: "Fierce and heartbreakingly humane, this book is for everyone who loved Ender's Game, but Ender's Game didn't love them back." This is not hard science fiction but it is fantastic character fiction. It was exactly what I needed in the middle of a year in which I was fighting a "burn everything down" mood. The second was Night Watch by Terry Pratchett, the 29th Discworld and 6th Watch novel. Throughout my Discworld read-through, Pratchett felt like he was on the cusp of a truly stand-out novel, one where all the pieces fit and the book becomes something more than the sum of its parts. This was that book. It's a book about ethics and revolutions and governance, but also about how your perception of yourself changes as you get older. It does all of the normal Pratchett things, just... better. While I would love to point new Discworld readers at it, I think you do have to read at least the Watch novels that came before it for it to carry its proper emotional heft. This was overall a solid year for fiction reading. I read another 15 novels I rated 8 out of 10, and 12 that I rated 7 out of 10. The largest contributor to that was my Discworld read-through, which was reliably entertaining throughout the year. The run of Discworld books between The Fifth Elephant (read late last year) and Wintersmith (my last of this year) was the best run of Discworld novels so far. One additional book I'll call out as particularly worth reading is Thud!, the Watch novel after Night Watch and another excellent entry. I read two stand-out non-fiction books this year. The first was Oliver Darkshire's delightful memoir about life as a rare book seller, Once Upon a Tome. One of the things I will miss about Twitter is the regularity with which I stumbled across fascinating people and then got to read their books. I'm off Twitter permanently now because the platform is designed to make me incoherently angry and I need less of that in my life, but it was very good at finding delightfully quirky books like this one. My other favorite non-fiction book of the year was Michael Lewis's Going Infinite, a profile of Sam Bankman-Fried. I'm still bemused at the negative reviews that this got from people who were upset that Lewis didn't turn the story into a black-and-white morality play. Bankman-Fried's actions were clearly criminal; that's not in dispute. Human motivations can be complex in ways that are irrelevant to the law, and I thought this attempt to understand that complexity by a top-notch storyteller was worthy of attention. Also worth a mention is Tony Judt's Postwar, the first book I reviewed in 2023. A sprawling history of post-World-War-II Europe will never have the sheer readability of shorter, punchier books, but this was the most informative book that I read in 2023. 2024 should see the conclusion of my Discworld read-through, after which I may return to re-reading Mercedes Lackey or David Eddings, both of which I paused to make time for Terry Pratchett. I also have another re-read similar to my Chronicles of Narnia reviews that I've been thinking about for a while. Perhaps I will start that next year; perhaps it will wait for 2025. Apart from that, my intention as always is to read steadily, write reviews as close to when I finished the book as possible, and make reading time for my huge existing backlog despite the constant allure of new releases. Here's to a new year full of more new-to-me books and occasional old favorites. The full analysis includes some additional personal reading statistics, probably only of interest to me.

Tim Retout: Prevent DOM-XSS with Trusted Types - a smarter DevSecOps approach

It can be incredibly easy for a frontend developer to accidentally write a client-side cross-site-scripting (DOM-XSS) security issue, and yet these are hard for security teams to detect. Vulnerability scanners are slow, and suffer from false positives. Can smarter collaboration between development, operations and security teams provide a way to eliminate these problems altogether? Google claims that Trusted Types has all but eliminated DOM-XSS exploits on those of their sites which have implemented it. Let s find out how this can work!

DOM-XSS vulnerabilities are easy to write, but hard for security teams to catch It is very easy to accidentally introduce a client-side XSS problem. As an example of what not to do, suppose you are setting an element s text to the current URL, on the client side:

// Don't do this
para.innerHTML = location.href;

Unfortunately, an attacker can now manipulate the URL (and e.g. send this link in a phishing email), and any HTML tags they add will be interpreted by the user s browser. This could potentially be used by the attacker to send private data to a different server. Detecting DOM-XSS using vulnerability scanning tools is challenging - typically this requires crawling each page of the website and attempting to detect problems such as the one above, but there is a significant risk of false positives, especially as the complexity of the logic increases. There are already ways to avoid these exploits developers should validate untrusted input before making use of it. There are libraries such as DOMPurify which can help with sanitization.¹ However, if you are part of a security team with responsibility for preventing these issues, it can be complex to understand whether you are at risk. Different developer teams may be using different techniques and tools. It may be impossible for you to work closely with every developer so how can you know that the frontend team have used these libraries correctly?

Trusted Types closes the DevSecOps feedback loop for DOM-XSS, by allowing Ops and Security to verify good Developer practices Trusted Types enforces sanitization in the browser², by requiring the web developer to assign a particular kind of JavaScript object rather than a native string to `.innerHTML` and other dangerous properties. Provided these special types are created in an appropriate way, then they can be trusted not to expose XSS problems. This approach will work with whichever tools the frontend developers have chosen to use, and detection of issues can be rolled out by infrastructure engineers without requiring frontend code changes.

Content Security Policy allows enforcement of security policies in the browser itself Because enforcing this safer approach in the browser for all websites would break backwards-compatibility, each website must opt-in through Content Security Policy headers. Content Security Policy (CSP) is a mechanism that allows web pages to restrict what actions a browser should execute on their page, and a way for the site to receive reports if the policy is violated. Figure 1: Content-Security-Policy browser communication This is revolutionary, because it allows servers to receive feedback in real time on errors that may be appearing in the browser s console.

Trusted Types can be rolled out incrementally, with continuous feedback Web.dev s article on Trusted Types explains how to safely roll out the feature using the features of CSP itself:

Deploy a CSP collector if you haven t already

Switch on CSP reports without enforcement (via `Content-Security-Policy-Report-Only` headers)

Iteratively review and fix the violations

Switch to enforcing mode when there are a low enough rate of reports

Static analysis in a continuous integration pipeline is also sensible you want to prevent regressions shipping in new releases before they trigger a flood of CSP reports. This will also give you a chance of finding any low-traffic vulnerable pages.

Smart security teams will use techniques like Trusted Types to eliminate entire classes of bugs at a time Rather than playing whack-a-mole with unreliable vulnerability scanning or bug bounties, techniques such as Trusted Types are truly in the spirit of Secure by Design build high quality in from the start of the engineering process, and do this in a way which closes the DevSecOps feedback loop between your Developer, Operations and Security teams.

Sanitization libraries are especially needed when the examples become more complex, e.g. if the application must manipulate the input. DOMPurify version 1.0.9 also added Trusted Types support, so can still be used to help developers adopt this feature.

Trusted Types has existed in Chrome and Edge since 2020, and should soon be coming to Firefox as well. However, it s not necessary to wait for Firefox or Safari to add support, because the large market share of Chrome and Edge will let you identify and fix your site s DOM-XSS issues, even if you do not set enforcing mode, and users of all browsers will benefit. Even so, it is great that Mozilla is now on board.

10 November 2023

Petter Reinholdtsen: New chrpath release 0.17

The chrpath package provide a simple command line tool to remove or modify the rpath or runpath of compiled ELF program. It is almost 10 years since I updated the code base, but I stumbled over the tool today, and decided it was time to move the code base from Subversion to git and find a new home for it, as the previous one (Debian Alioth) has been shut down. I decided to go with Codeberg this time, as it is my git service of choice these days, did a quick and dirty migration to git and updated the code with a few patches I found in the Debian bug tracker. These are the release notes: New in 0.17 released 2023-11-10:

Moved project to Codeberg, as Alioth is shut down.
Add Solaris support (use <sys/byteorder.h> instead of <byteswap.h>). Patch from Rainer Orth.
Added missing newline from printf() line. Patch from Frank Dana.
Corrected handling of multiple ELF sections. Patch from Frank Dana.
Updated build rules for .deb. Partly based on patch from djcj.

The latest edition is tagged and available from https://codeberg.org/pere/chrpath. As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

21 August 2023

Russ Allbery: Review: Some Desperate Glory

Review: Some Desperate Glory, by Emily Tesh

Publisher:	Tordotcom
Copyright:	2023
ISBN:	1-250-83499-6
Format:	Kindle
Pages:	438

Some Desperate Glory is a far-future space... opera? That's probably the right genre classification given the setting, but this book is much more intense and character-focused than most space opera. It is Emily Tesh's first novel, although she has two previous novellas that were published as books. The alien majo and their nearly all-powerful Wisdom have won the war by destroying Earth with an antimatter bomb. The remnants of humanity were absorbed into the sprawling majo civilization. Gaea Station is the lone exception: a marginally viable station deep in space, formed from a lifeless rocky planetoid and the coupled hulks of the last four human dreadnoughts. Gaea Station survives on military discipline, ruthless use of every available resource, and constant training, raising new generations of soldiers for the war that it refuses to let end. While Earth's children live, the enemy shall fear us. Kyr is a warbreed, one of a genetically engineered line of soldiers that, following an accident, Gaea Station has lost the ability to make except the old-fashioned way. Among the Sparrows, her mess group, she is the best at the simulated combat exercises they use for training. She may be the best of her age cohort except her twin Magnus. As this novel opens, she and the rest of the Sparrows are about to get their adult assignments. Kyr is absolutely focused on living up to her potential and the attention of her uncle Jole, the leader of the station. Kyr's future will look nothing like what she expects. This book was so good, and I despair of explaining why it was so good without unforgivable spoilers. I can tell you a few things about it, but be warned that I'll be reduced to helpless gestures and telling you to just go read it. It's been a very long time since I was this surprised by a novel, possibly since I read Code Name: Verity for the first time. Some Desperate Glory follows Kyr in close third-person throughout the book, which makes the start of this book daring. If you're getting a fascist vibe from the setup, you're not wrong, and this is intentional on Tesh's part. But Kyr is a true believer at the start of the book, so the first quarter has a protagonist who is sometimes nasty and cruel and who makes some frustratingly bad decisions. Stay with it, though; Tesh knows exactly what she's doing. This is a coming of age story, in a way. Kyr has a lot to learn and a lot to process, and Some Desperate Glory is about that process. But by the middle of part three, halfway through the book, I had absolutely no idea where Tesh was going with the story. She then pulled the rug out from under me, in the best way, at least twice more. Part five of this book is an absolute triumph, the payoff for everything that's happened over the course of the novel, and there is no way I could have predicted it in advance. It was deeply satisfying in that way where I felt like I learned some things along with the characters, and where the characters find a better ending than I could possibly have worked out myself. Tesh does use some world-building trickery, which is at its most complicated in part four. That was the one place where I can point to a few chapters where I thought the world-building got a bit too convenient in order to enable the plot. But it also allows for some truly incredible character work. I can't describe that in detail because it would be a major spoiler, but it's one of my favorite tropes in fiction and Tesh pulls it off beautifully. The character growth and interaction in this book is just so good: deep and complicated and nuanced and thoughtful in a way that revises reader impressions of earlier chapters. The other great thing about this book is that for a 400+ page novel, it moves right along. Both plot and character development is beautifully paced with only a few lulls. Tesh also doesn't belabor conversations. This is a book that provides just the right amount of context for the reader to fully understand what's going on, and then trusts the reader to be following along and moves straight to the next twist. That makes it propulsively readable. I had so much trouble putting this book down at any time during the second half. I can't give any specifics, again because of spoilers, but this is not just a character story. Some Desperate Glory has strong opinions on how to ethically approach the world, and those ethics are at the center of the plot. Unlike a lot of books with a moral stance, though, this novel shows the difficulty of the work of deriving that moral stance. I have rarely read a book that more perfectly captures the interior experience of changing one's mind with all of its emotional difficulty and internal resistance. Tesh provides all the payoff I was looking for as a reader, but she never makes it easy or gratuitous (with the arguable exception of one moment at the very end of the book that I think some people will dislike but that I personally needed). This is truly great stuff, probably the best science fiction novel that I've read in several years. Since I read it (I'm late on reviews again), I've pushed it on several other people, and I've not had a miss yet. The subject matter is pretty heavy, and this book also uses several tropes that I personally adore and am therefore incapable of being objective about, but with those caveats, this gets my highest possible recommendation. Some Desperate Glory is a complete story in one novel with a definite end, although I love these characters so much that I'd happily read their further adventures, even if those are thematically unnecessary. Content warnings: Uh, a lot. Genocide, suicide, sexual assault, racism, sexism, homophobia, misgendering, and torture, and I'm probably forgetting a few things. Tesh doesn't linger on these long, but most of them are on-screen. You may have to brace yourself for this one. Rating: 10 out of 10

16 August 2023

Sam Hartman: A First Exercise with AI Training

Taking a hands-on low-level approach to learning AI has been incredibly rewarding. I wanted to create an achievable task that would motivate me to learn the tools and get practical experience training and using large language models. Just at the point when I was starting to spin up GPU instances, Llama2 was released to the public. So I elected to start with that model. As I mentioned, I m interested in exploring how sex-positive AI can help human connection in positive ways. For that reason, I suspected that Llama2 might not produce good results without training: some of Meta s safety goals run counter to what I m trying to explore. I suspected that there might be more attention paid to safety in the chat variants of Llama2 rather than the text generation variants, and working against that might be challenging for a first project, so I started with Llama-2-13b as a base. Preparing a Dataset I elected to generate a fine tuning dataset using fiction. Long term, that might not be a good fit. But I ve always wanted to understand how an LLM s tone is adjusted how you get an LLM to speak in a different voice. So much of fine tuning focuses on examples where a given prompt produces a particular result. I wanted to understand how to bring in data that wasn t structured as prompts. The Huggingface course actually gives an example of how to adjust a model set up for masked language modeling trained on wikitext to be better at predicting the vocabulary of movie reviews. There though, doing sample breaks in the dataset at movie review boundaries makes sense. There s another example of training an LLM from scratch based on a corpus of python code. Between these two examples, I figured out what I needed. It was relatively simple in retrospect: tokenize the whole mess, and treat everything as output. That is, compute loss on all the tokens. Long term, using fiction as a way to adjust how the model responds is likely to be the wrong starting point. However, it maximized focus on aspects of training I did not understand and allowed me to satisfy my curiosity. Rangling the Model I decided to actually try and add additional training to the model directly rather than building an adapter and fine tuning a small number of parameters. Partially this was because I had enough on my mind without understanding how LoRA adapters work. Partially, I wanted to gain an appreciation for the infrastructure complexity of AI training. I have enough of a cloud background that I ought to be able to work on distributed training. (As it turned out, using BitsAndBytes 8-bit optimizer, I was just able to fit my task onto a single GPU). I wasn t even sure that I could make a measurable difference in Llama-2-13b running 890,000 training tokens through a couple of training epochs. As it turned out I had nothing to fear on that front. Getting everything to work was more tricky than I expected. I didn t have an appreciation for exactly how memory intensive training was. The Transformers documentation points out that with typical parameters for mixed-precision training, it takes 18 bytes per model parameter. Using bfloat16 training and an 8-bit optimizer was enough to get things to fit. Of course then I got to play with convergence. My initial optimizer parameters caused the model to diverge, and before I knew it, my model had turned to NAN, and would only output newlines. Oops. But looking back over the logs, watching what happened to the loss, and looking at the math in the optimizer to understand how I ended up getting something that rounded to a divide by zero gave me a much better intuition for what was going on. The results. This time around I didn t do anything in the way of quantitative analysis of what I achieved. Empirically I definitely changed the tone of the model. The base Llama-2 model tends to steer away from sexual situations. It s relatively easy to get it to talk about affection and sometimes attraction. Unsurprisingly, given the design constraints, it takes a bit to get it to wonder into sexual situations. But if you hit it hard enough with your prompt, it will go there, and the results are depressing. At least for prompts I used, it tended to view sex fairly negatively. It tended to be less coherent than with other prompts. One inference managed to pop out in the middle of some text that wasn t hanging together well, Chapter 7 - Rape. With my training, I did manage to achieve my goal of getting the model to use more positive language and emotional signaling when talking about sexual situations. More importantly, I gained a practical understanding of many ways training can go wrong.

There were overfitting problems: names of characters from my dataset got more attention than I wished they did. As a model for interacting with some of the universes I used as input, that was kind of cool, but if I was looking to just adjust how the model talked about intimate situations, I massively got things to be too specific.
I gained a new appreciation for how easy it is to trigger catastrophic forgetting.
I begin to appreciate how this sort of unsupervised training could be best paired with supervised training to help correct model confusion. Playing with the model, I often ran into cases where my reaction was like Well, I don t want to train it to give that response, but if it ever does wander into this part of the state space, I d like to at least get it to respond more naturally. And I think I understand how to approach that either with custom loss functions or manipulating which tokens compute loss and which ones do not.
And of course realized I need to learn a lot about sanitizing and preparing datasets.

A lot of articles I ve been reading about training make more sense. I have better intuition for why you might want to do training a certain way, or why mechanisms for countering some problem will be important. Future Activities:

Look into LoRA adapters; having understood what happens when you manipulate the model directly, I can now move on to intelligent solutions.
Look into various mechanisms for rewards and supervised training.
See how hard it is to train a chat based model out of some of its safety constraints.
Construct datasets; possibly looking at sources like relationship questions/advice.

comments

12 July 2023

Reproducible Builds: Reproducible Builds in June 2023

Welcome to the June 2023 report from the Reproducible Builds project In our reports, we outline the most important things that we have been up to over the past month. As always, if you are interested in contributing to the project, please visit our Contribute page on our website.

We are very happy to announce the upcoming Reproducible Builds Summit which set to take place from October 31st November 2nd 2023, in the vibrant city of Hamburg, Germany. Our summits are a unique gathering that brings together attendees from diverse projects, united by a shared vision of advancing the Reproducible Builds effort. During this enriching event, participants will have the opportunity to engage in discussions, establish connections and exchange ideas to drive progress in this vital field. Our aim is to create an inclusive space that fosters collaboration, innovation and problem-solving. We are thrilled to host the seventh edition of this exciting event, following the success of previous summits in various iconic locations around the world, including Venice, Marrakesh, Paris, Berlin and Athens. If you re interesting in joining us this year, please make sure to read the event page] which has more details about the event and location. (You may also be interested in attending PackagingCon 2023 held a few days before in Berlin.)

This month, Vagrant Cascadian will present at FOSSY 2023 on the topic of Breaking the Chains of Trusting Trust:

Corrupted build environments can deliver compromised cryptographically signed binaries. Several exploits in critical supply chains have been demonstrated in recent years, proving that this is not just theoretical. The most well secured build environments are still single points of failure when they fail. [ ] This talk will focus on the state of the art from several angles in related Free and Open Source Software projects, what works, current challenges and future plans for building trustworthy toolchains you do not need to trust.

Hosted by the Software Freedom Conservancy and taking place in Portland, Oregon, FOSSY aims to be a community-focused event: Whether you are a long time contributing member of a free software project, a recent graduate of a coding bootcamp or university, or just have an interest in the possibilities that free and open source software bring, FOSSY will have something for you . More information on the event is available on the FOSSY 2023 website, including the full programme schedule.

Marcel Fourn , Dominik Wermke, William Enck, Sascha Fahl and Yasemin Acar recently published an academic paper in the 44th IEEE Symposium on Security and Privacy titled It s like flossing your teeth: On the Importance and Challenges of Reproducible Builds for Software Supply Chain Security . The abstract reads as follows:

The 2020 Solarwinds attack was a tipping point that caused a heightened awareness about the security of the software supply chain and in particular the large amount of trust placed in build systems. Reproducible Builds (R-Bs) provide a strong foundation to build defenses for arbitrary attacks against build systems by ensuring that given the same source code, build environment, and build instructions, bitwise-identical artifacts are created.

However, in contrast to other papers that touch on some theoretical aspect of reproducible builds, the authors paper takes a different approach. Starting with the observation that much of the software industry believes R-Bs are too far out of reach for most projects and conjoining that with a goal of to help identify a path for R-Bs to become a commonplace property , the paper has a different methodology:

We conducted a series of 24 semi-structured expert interviews with participants from the Reproducible-Builds.org project, and iterated on our questions with the reproducible builds community. We identified a range of motivations that can encourage open source developers to strive for R-Bs, including indicators of quality, security benefits, and more efficient caching of artifacts. We identify experiences that help and hinder adoption, which heavily include communication with upstream projects. We conclude with recommendations on how to better integrate R-Bs with the efforts of the open source and free software community.

A PDF of the paper is now available, as is an entry on the CISPA Helmholtz Center for Information Security website and an entry under the TeamUSEC Human-Centered Security research group.

On our mailing list this month:

Martin Monperrus asked whether there are any project[s] where reproducibility is checked in a continuous integration pipeline? which received a number of replies from various projects.
Vagrant Cascadian mentioned that Packaging Con 2023 is being held in Berlin, the weekend before the Reproducible Builds summit later this year. In particular, Vagrant noticed that the Call for Proposals (CFP) closes at the end of July.
Larry Doolittle was searching Usenet archives and discovered a thread from December 1999 titled Time independent checksum(cksum) on comp.unix.programming. Larry notes that it starts with Jayan asking about comparing binaries that might have difference in their embedded timestamps (that is, perhaps, Foreshadowing diffoscope, amiright? ) and goes on to observe that:

The antagonist is David Schwartz, who correctly says There are dozens of complex reasons why what seems to be the same sequence of operations might produce different end results, but goes on to say I totally disagree with your general viewpoint that compilers must provide for reproducability [sic]. Dwight Tovey and I (Larry Doolittle) argue for reproducible builds. I assert Any program especially a mission-critical program like a compiler that cannot reproduce a result at will is broken. Also it s commonplace to take a binary from the net, and check to see if it was trojaned by attempting to recreate it from source.

Lastly, there were a few changes to our website this month too, including Bernhard M. Wiedemann adding a simplified Rust example to our documentation about the SOURCE_DATE_EPOCH environment variable [ ], Chris Lamb made it easier to parse our summit announcement at a glance [ ], Mattia Rizzolo added the summit announcement at a glance [ ] itself [ ][ ][ ] and Rahul Bajaj added a taxonomy of variations in build environments [ ].

Distribution work 27 reviews of Debian packages were added, 40 were updated and 8 were removed this month adding to our knowledge about identified issues. A new `randomness_in_documentation_generated_by_mkdocs` toolchain issue was added by Chris Lamb [ ], and the `deterministic` flag on the `paths_vary_due_to_usrmerge` issue as we are not currently testing `usrmerge` issues [ ] issues.
Roland Clobus posted his 18th update of the status of reproducible Debian ISO images on our mailing list. Roland reported that all major desktops build reproducibly with `bullseye`, `bookworm`, `trixie` and `sid` , but he also mentioned amongst many changes that not only are the `non-free` images being built (and are reproducible) but that the live images are generated officially by Debian itself. [ ]
Jan-Benedict Glaw noticed a problem when building NetBSD for the VAX architecture. Noting that Reproducible builds [are] probably not as reproducible as we thought , Jan-Benedict goes on to describe that when two builds from different source directories won t produce the same result and adds various notes about sub-optimal handling of the `CFLAGS` environment variable. [ ]
F-Droid added 21 new reproducible apps in June, resulting in a new record of 145 reproducible apps in total. [ ]. (This page now sports missing data for March May 2023.) F-Droid contributors also reported an issue with broken resources in APKs making some builds unreproducible. [ ]
Bernhard M. Wiedemann published another monthly report about reproducibility within openSUSE

Upstream patches

Bernhard M. Wiedemann:

`bcachefs` (sort find / filesys)

`build-compare` (reports files as identical)

`build-time` (toolchain date)

`cockpit` (merged, gzip mtime)

`gcc13` (gcc13 toolchain LTO parallelism)

`ghc-rpm-macros` (toolchain parallelism)

`golangcli-lint` (date)

`gutenprint` (date+time)

`mage` (date (golang))

`mumble` (filesys)

`pcr` (date)

`python-nss` (drop sphinx .doctrees)

`python310` (merged, bisected+backported)

`warpinator` (merged, date)

`xroachng` (date)

Chris Lamb:

#1037159 filed against `elinks`.

#1037205 filed against `multipath-tools`.

#1037216 filed against `mkdocstrings-python-handlers`.

#1038730 filed against `fribidi`.

#1038957 filed against `jtreg7`.

#1039932 filed against `python-bitstring` (forwarded upstream).

Vagrant Cascadian:

#1037235 filed against `gradle-kotlin-dsl`.

#1037240 filed against `libsdl-console`.

#1037267 filed against `kawari8`.

#1037269 filed against `freetds`.

#1037272 filed against `gbrowse`.

#1037276 filed against `bglibs`.

#1037277 filed against `advi`.

#1037278 filed against `afterstep`.

#1037280 filed against `simstring`.

#1037296 filed against `manderlbot`.

#1037298 filed against `erlang-proper`.

#1037300 filed against `comedilib`.

#1037309 filed against `libint`.

#1037427 filed against `newlib`.

#1038908 filed against `binutils-msp430`.

#1038970 & #1038971 filed against `c-munipack`.

#1039044 filed against `python-marshmallow-sqlalchemy`.

#1039457 filed against `mplayer`.

#1039493 & #1039500 filed against `menu`.

#1039506 filed against `mini-buildd`.

#1039610 filed against `pnetcdf`.

#1039617 & #1039618 filed against `liblopsub`.

#1039619 filed against `wcc`.

#1039623 filed against `shotcut`.

#1039624 filed against `icu`.

#1039741 filed against `libapache-poi-java`.

#1039808 filed against `atf`.

#1039865 filed against `valgrind`.

Testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In June, a number of changes were made by Holger Levsen, including:

Additions to a (relatively) new Documented Jenkins Maintenance (djm) script to automatically shrink a cache & save a backup of old data [ ], automatically split out previous months data from logfiles into specially-named files [ ], prevent concurrent remote logfile fetches by using a lock file [ ] and to add/remove various debugging statements [ ].

Updates to the automated system health checks to, for example, to correctly detect new kernel warnings due to a wording change [ ] and to explicitly observe which old/unused kernels should be removed [ ]. This was related to an improvement so that various kernel issues on Ubuntu-based nodes are automatically fixed. [ ]

Holger and Vagrant Cascadian updated all thirty-five hosts running Debian on the `amd64`, `armhf`, and `i386` architectures to Debian bookworm, with the exception of the Jenkins host itself which will be upgraded after the release of Debian 12.1. In addition, Mattia Rizzolo updated the email configuration for the `@reproducible-builds.org` domain to correctly accept incoming mails from `jenkins.debian.net` [ ] as well as to set up DomainKeys Identified Mail (DKIM) signing [ ]. And working together with Holger, Mattia also updated the Jenkins configuration to start testing Debian trixie which resulted in stopped testing Debian buster. And, finally, Jan-Benedict Glaw contributed patches for improved NetBSD testing.

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

IRC: `#reproducible-builds` on `irc.oftc.net`.

Mailing list: `rb-general@lists.reproducible-builds.org`

Twitter: @ReproBuilds

21 April 2023

Steve Kemp: Managing header-spacing in markdown/org-mode files

It seems I'm having a theme recently on this blog, of making emacs-related posts. Here's another. I write a bunch of stuff in markdown, such as my emacs init-file, blog-posts and other documents. I try to be quite consistent about vertical spacing, for example a post might look like this:

# header1
Some top-level stuff.
## header2
Some more details.
## header2
Some more things on a related topic.
# header2

Here I'm trying to breakup sections, so there is a "big gap" between H1 and smaller gaps between the lesser-level headings. After going over my init file recently, making some changes, I noticed that the spacing was not at all consistent. So I figured "How hard could it be to recognize headers and insert/remove newlines before them?" A trivial regexp search for "^#" identifies headers, and that counting the "#" characters lets you determine their depth. From their removing any previous newlines is the work of a moment, and inserting the appropriate number to ensure consistency is simple. I spent 15 minutes writing the initial implementation, which was markdown-specific, then another 30 minutes adding support for org-mode files - because my work-diary is written using the org-diary package (along with other helpers, such as the org-tag-cloud. Anyway the end result is that now when I save a markdown/org file the headers are updated automatically:

https://github.com/skx/dotfiles/blob/master/.emacs.d/tools/vertical-space-cleanup.el

6 February 2023

Reproducible Builds: Reproducible Builds in January 2023

Welcome to the first report for 2023 from the Reproducible Builds project! In these reports we try and outline the most important things that we have been up to over the past month, as well as the most important things in/around the community. As a quick recap, the motivation behind the reproducible builds effort is to ensure no malicious flaws can be deliberately introduced during compilation and distribution of the software that we run on our devices. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.

News In a curious turn of events, GitHub first announced this month that the checksums of various Git archives may be subject to change, specifically that because:
the default compression for Git archives has recently changed. As result, archives downloaded from GitHub may have different checksums even though the contents are completely unchanged.
This change (which was brought up on our mailing list last October) would have had quite wide-ranging implications for anyone wishing to validate and verify downloaded archives using cryptographic signatures. However, GitHub reversed this decision, updating their original announcement with a message that We are reverting this change for now. More details to follow. It appears that this was informed in part by an in-depth discussion in the GitHub Community issue tracker.
The Bundesamt f r Sicherheit in der Informationstechnik (BSI) (trans: The Federal Office for Information Security ) is the agency in charge of managing computer and communication security for the German federal government. They recently produced a report that touches on attacks on software supply-chains (Supply-Chain-Angriff). (German PDF)
Contributor Seb35 updated our website to fix broken links to Tails Git repository [ ][ ], and Holger updated a large number of pages around our recent summit in Venice [ ][ ][ ][ ].
Noak J nsson has written an interesting paper entitled The State of Software Diversity in the Software Supply Chain of Ethereum Clients. As the paper outlines:
In this report, the software supply chains of the most popular Ethereum clients are cataloged and analyzed. The dependency graphs of Ethereum clients developed in Go, Rust, and Java, are studied. These client are Geth, Prysm, OpenEthereum, Lighthouse, Besu, and Teku. To do so, their dependency graphs are transformed into a unified format. Quantitative metrics are used to depict the software supply chain of the blockchain. The results show a clear difference in the size of the software supply chain required for the execution layer and consensus layer of Ethereum.

Yongkui Han posted to our mailing list discussing making reproducible builds & GitBOM work together without gitBOM-ID embedding. GitBOM (now renamed to OmniBOR) is a project to enable automatic, verifiable artifact resolution across today s diverse software supply-chains [ ]. In addition, Fabian Keil wrote to us asking whether anyone in the community would be at Chemnitz Linux Days 2023, which is due to take place on 11th and 12th March (event info). Separate to this, Akihiro Suda posted to our mailing list just after the end of the month with a status report of bit-for-bit reproducible Docker/OCI images. As Akihiro mentions in their post, they will be giving a talk at FOSDEM in the Containers devroom titled Bit-for-bit reproducible builds with `Dockerfile` and that my talk will also mention how to pin the apt/dnf/apk/pacman packages with my `repro-get` tool.
The extremely popular Signal messenger app added upstream support for the `SOURCE_DATE_EPOCH` environment variable this month. This means that release tarballs of the Signal desktop client do not embed nondeterministic release information. [ ][ ]

Distribution work

F-Droid & Android There was a very large number of changes in the F-Droid and wider Android ecosystem this month: On January 15th, a blog post entitled Towards a reproducible F-Droid was published on the F-Droid website, outlining the reasons why F-Droid signs published APKs with its own keys and how reproducible builds allow using upstream developers keys instead. In particular:
In response to [ ] criticisms, we started encouraging new apps to enable reproducible builds. It turns out that reproducible builds are not so difficult to achieve for many apps. In the past few months we ve gotten many more reproducible apps in F-Droid than before. Currently we can t highlight which apps are reproducible in the client, so maybe you haven t noticed that there are many new apps signed with upstream developers keys.
(There was a discussion about this post on Hacker News.) In addition:

F-Droid added 13 apps published with reproducible builds this month. [ ]

FC Stegerman outlined a bug where `baseline.profm` files are nondeterministic, developed a workaround, and provided all the details required for a fix. As they note, this issue has now been fixed but the fix is not yet part of an official Android Gradle plugin release.

GitLab user Parwor discovered that the number of CPU cores can affect the reproducibility of `.dex` files. [ ]

FC Stegerman also announced the `0.2.0` and `0.2.1` releases of reproducible-apk-tools, a suite of tools to help make `.apk` files reproducible. Several new subcommands and scripts were added, and a number of bugs were fixed as well [ ][ ]. They also updated the F-Droid website to improve the reproducibility-related documentation. [ ][ ]

On the F-Droid issue tracker, FC Stegerman discussed reproducible builds with one of the developers of the Threema messenger app and reported that Android SDK build-tools `31.0.0` and `32.0.0` (unlike earlier and later versions) have a `zipalign` command that produces incorrect padding.

A number of bugs related to reproducibility were discovered in Android itself. Firstly, the non-deterministic order of `.zip` entries in `.apk` files [ ] and then newline differences between building on Windows versus Linux that can make builds not reproducible as well. [ ] (Note that these links may require a Google account to view.)

And just before the end of the month, FC Stegerman started a thread on our mailing list on the topic of hiding data/code in APK embedded signatures which has been made possible by the Android APK Signature Scheme v2/v3. As part of this, they made an Android app that reads the APK Signing block of its own APK and extracts a payload in order to alter its behaviour called sigblock-code-poc.

Debian As mentioned in last month s report, Vagrant Cascadian has been organising a series of online sprints in order to clear the huge backlog of reproducible builds patches submitted by performing NMUs (Non-Maintainer Uploads). During January, a sprint took place on the 10th, resulting in the following uploads:

Chris Lamb:

`critcl` (#963600)

`log4cpp` (#1020662)

`logapp` (#1010845)

`nanomsg` (#1001853)

Holger Levsen:

`netkit-rsh` (#1020798)

`wcwidth` (#1005408)

Vagrant Cascadian:

`mc` (#828683)

`gtk-sharp3` (#989965 & #989966)

During this sprint, Holger Levsen filed Debian bug #1028615 to request that the `tracker.debian.org` service display results of reproducible rebuilds, not just reproducible CI results. Elsewhere in Debian, strip-nondeterminism is our tool to remove specific non-deterministic results from a completed build. This month, version `1.13.1-1` was uploaded to Debian unstable by Holger Levsen, including a fix by FC Stegerman (obfusk) to update a regular expression for the latest version of `file(1)` [ ]. (#1028892) Lastly, 65 reviews of Debian packages were added, 21 were updated and 35 were removed this month adding to our knowledge about identified issues.

Other distributions In other distributions:

Bernhard M. Wiedemann published another monthly report for reproducibility within openSUSE, as well as a belated report for December 2022.

It was announced that Fedora Rawhide now clamps file modification types to `SOURCE_DATE_EPOCH`. [ ]

Finally, an existing tool called rpmreproduce was (re-)discovered this month, which claims that given a buildinfo file from a RPM package, [it can] generate instructions for attempting to reproduce the binary packages built from the associated source and build information.

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb made the following changes to diffoscope, including preparing and uploading versions `231`, `232`, `233` and `234` to Debian:

No need for `from future import print_function` import anymore. [ ]

Comment and tidy the `extras_require.json` handling. [ ]

Split inline Python code to generate test `Recommends` into a separate Python script. [ ]

Update `debian/tests/control` after merging support for PyPDF support. [ ]

Correctly catch segfaulting `cd-iccdump` binary. [ ]

Drop some old debugging code. [ ]

Allow ICC tests to (temporarily) fail. [ ]

In addition, FC Stegerman (obfusk) made a number of changes, including:

Updating the `test_text_proper_indentation` test to support the latest version(s) of `file(1)`. [ ]

Use an `extras_require.json` file to store some build/release metadata, instead of accessing the internet. [ ]

Updating an APK-related `file(1)` regular expression. [ ]

On the diffoscope.org website, de-duplicate contributors by e-mail. [ ]

Lastly, Sam James added support for PyPDF version 3 [ ] and Vagrant Cascadian updated a handful of tool references for GNU Guix. [ ][ ]

Upstream patches The Reproducible Builds project attempts to fix as many currently-unreproducible packages as possible. This month, we wrote a large number of such patches, including:

Bernhard M. Wiedemann:

`argparse` (date-related issue)

`asciimatics` (build failure)

`asyncpg` (fails to build in 2032)

`cpython` (fails to build in 2038)

`django` (fails to build in 2038)

`grandorgue` (`.zip`-related issue)

`libarchive` (fails to build in 2038)

`libarchive` (fails to build in 2038)

`librcc` (date)

`mbedtls` (fails to build in 2023)

`mozilla-nss` (fails to build in 2023)

`ocaml-rpm-macros` (fix fallout from an RPM-related change)

`perl HTTP::Cookies` (fails to build in 2038)

`python-aiosmtplib/python-trustme` (fails to build in 2038 due to SSL certificate)

`python-bmap` (fails to build in 2024)

`python-compileall2` (fails to build in 2038)

`taskwarrior` (`python-tasklib` fails to build in 2038)

`taskwarrior` (fix fails to build in 2038)

`wrk` (hash ordering issue)

`xemacs` (fails to build in 2038 stuck)

Chris Lamb:

#1027988 filed against `click`.

#1027992 filed against `towncrier`.

#1028051 filed against `unifrac-tools`.

#1028310 filed against `hamster-time-tracker`.

#1028515 filed against `accel-config`.

#1029295 filed against `python-miio`.

#1029297 filed against `python-graphviz`.

Vagrant Cascadian:

#1029227 filed against `ectrans`.

#1029303 filed against `fiat-ecmwf`.

#1029307 filed against `node-katex`.

#1029800 filed against `aribas`.

#1029801 filed against `tbox`.

#1029807 filed against `borgbackup2`.

#1029809 filed against `dnf-plugins-core`.

#1030057 filed against `refpolicy`.

FC Stegerman:

Several patches for `file(1)` (which is used by reproducible builds tools like diffoscope and strip-nondeterminism) that improve detection of various file formats are now included in the Debian packaging. [ ]

Testing framework The Reproducible Builds project operates a comprehensive testing framework at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In January, the following changes were made by Holger Levsen:

Node changes:

Add three new nodes hosted at the Oregon State University Open Source Lab including integrating them into the DNS, maintenance and monitoring systems. [ ][ ][ ][ ][ ]

Debian-related changes:

Only keep diffoscope s HTML output (ie. no `.json` or `.txt`) for LTS suites and older in order to save diskspace on the Jenkins host. [ ]

Re-create `pbuilder` base less frequently for the `stretch`, `bookworm` and `experimental` suites. [ ]

OpenWrt-related changes:

Add gcc-multilib to `OPENWRT_HOST_PACKAGES` and install it on the nodes that need it. [ ]

Detect more problems in the health check when failing to build OpenWrt. [ ]

Misc changes:

Update the `chroot-run` script to correctly manage `/dev` and `/dev/pts`. [ ][ ][ ]

Update the Jenkins shell monitor script to collect disk stats less frequently [ ] and to include various directory stats. [ ][ ]

Update the real year in the configuration in order to be able to detect whether a node is running in the future or not. [ ]

Bump copyright years in the default page footer. [ ]

In addition, Christian Marangi submitted a patch to build OpenWrt packages with the `V=s` flag to enable debugging. [ ]
If you are interested in contributing to the Reproducible Builds project, please visit the Contribute page on our website. You can get in touch with us via:

IRC: `#reproducible-builds` on `irc.oftc.net`.

Twitter: @ReproBuilds

Mastodon: @reproducible_builds@fosstodon.org

Mailing list: `rb-general@lists.reproducible-builds.org`

1 February 2023

Simon Josefsson: Apt Archive Transparency: debdistdiff & apt-canary

I ve always found the operation of apt software package repositories to be a mystery. There appears to be a lack of transparency into which people have access to important apt package repositories out there, how the automatic non-human update mechanism is implemented, and what changes are published. I m thinking of big distributions like Ubuntu and Debian, but also the free GNU/Linux distributions like Trisquel and PureOS that are derived from the more well-known distributions. As far as I can tell, anyone who has the OpenPGP private key trusted by a apt-based GNU/Linux distribution can sign a modified Release/InRelease file and if my machine somehow downloads that version of the release file, my machine could be made to download and install packages that the distribution didn t intend me to install. Further, it seems that anyone who has access to the main HTTP server, or any of its mirrors, or is anywhere on the network between them and my machine (when plaintext HTTP is used), can either stall security updates on my machine (on a per-IP basis), or use it to send my machine (again, on a per-IP basis to avoid detection) a modified Release/InRelease file if they had been able to obtain the private signing key for the archive. These are mighty powers that warrant overview. I ve always put off learning about the processes to protect the apt infrastructure, mentally filing it under so many people rely on this infrastructure that enough people are likely to have invested time reviewing and improving these processes . Simultaneous, I ve always followed the more free-software friendly Debian-derived distributions such as gNewSense and have run it on some machines. I ve never put them into serious production use, because the trust issues with their apt package repositories has been a big question mark for me. The enough people part of my rationale for deferring this is not convincing. Even the simple question of is someone updating the apt repository is not easy to understand on a running gNewSense system. At some point in time the gNewSense cron job to pull in security updates from Debian must have stopped working, and I wouldn t have had any good mechanism to notice that. Most likely it happened without any public announcement. I ve recently switched to Trisquel on production machines, and these questions has come back to haunt me. The situation is unsatisfying and I looked into what could be done to improve it. I could try to understand who are the key people involved in each project, and may even learn what hardware component is used, or what software is involved to update and sign apt repositories. Is the server running non-free software? Proprietary BIOS or NIC firmware? Are the GnuPG private keys on disk? Smartcard? TPM? YubiKey? HSM? Where is the server co-located, and who has access to it? I tried to do a bit of this, and discovered things like Trisquel having a DSA1024 key in its default apt trust store (although for fairness, it seems that apt by default does not trust such signatures). However, I m not certain understanding this more would scale to securing my machines against attacks on this infrastructure. Even people with the best intentions, and the state of the art hardware and software, will have problems. To increase my trust in Trisquel I set out to understand how it worked. To make it easier to sort out what the interesting parts of the Trisquel archive to audit further were, I created debdistdiff to produce human readable text output comparing one apt archive with another apt archive. There is a GitLab CI/CD cron job that runs this every day, producing output comparing Trisquel vs Ubuntu and PureOS vs Debian. Working with these output files has made me learn more about how the process works, and I even stumbled upon something that is likely a bug where Trisquel aramo was imported from Ubuntu jammy while it contained a couple of package (e.g., gcc-8, python3.9) that were removed for the final Ubuntu jammy release. After working on auditing the Trisquel archive manually that way, I realized that whatever I could tell from comparing Trisquel with Ubuntu, it would only be something based on a current snapshot of the archives. Tomorrow it may look completely different. What felt necessary was to audit the differences of the Trisquel archive continously. I was quite happy to have developed debdistdiff for one purpose (comparing two different archives like Trisquel and Ubuntu) and discovered that the tool could be used for another purpose (comparing the Trisquel archive at two different points in time). At this time I realized that I needed a log of all different apt archive metadata to be able to produce an audit log of the differences in time for the archive. I create manually curated git-repositories with the Release/InRelease and the Packages files for each architecture/component of the well-known distributions Trisquel, Ubuntu, Debian and PureOS. Eventually I wrote scripts to automate this, which are now published in the debdistget project. At this point, one of the early question about per-IP substitution of Release files were lingering in my mind. However with the tooling I now had available, coming up with a way to resolve this was simple! Merely have apt compute a SHA256 checksum of the just downloaded InRelease file, and see if my git repository had the same file. At this point I started reading the Apt source code, and now I had more doubts about the security of my systems than I ever had before. Oh boy how the name Apt has never before felt more Apt?! Oh well, we must leave some exercises for the students. Eventually I realized I wanted to touch as little of apt code basis as possible, and noticed the SigVerify::CopyAndVerify function called ExecGPGV which called apt-key verify which called GnuPG s gpgv. By setting Apt::Key::gpgvcommand I could get apt-key verify to call another tool than gpgv. See where I m going? I thought wrapping this up would now be trivial but for some reason the hash checksum I computed locally never matched what was on my server. I gave up and started working on other things instead. Today I came back to this idea, and started to debug exactly how the local files looked that I got from apt and how they differed from what I had in my git repositories, that came straight from the apt archives. Eventually I traced this back to SplitClearSignedFile which takes an InRelease file and splits it into two files, probably mimicking the (old?) way of distributing both Release and Release.gpg. So the clearsigned InRelease file is split into one cleartext file (similar to the Release file) and one OpenPGP signature file (similar to the Release.gpg file). But why didn t the cleartext variant of the InRelease file hash to the same value as the hash of the Release file? Sadly they differ by the final newline. Having solved this technicality, wrapping the pieces up was easy, and I came up with a project apt-canary that provides a script apt-canary-gpgv that verify the local apt release files against something I call a apt canary witness file stored at a URL somewhere. I m now running apt-canary on my Trisquel aramo laptop, a Trisquel nabia server, and Talos II ppc64el Debian machine. This means I have solved the per-IP substitution worries (or at least made them less likely to occur, having to send the same malicious release files to both GitLab and my system), and allow me to have an audit log of all release files that I actually use for installing and downloading packages. What do you think? There are clearly a lot of work and improvements to be made. This is a proof-of-concept implementation of an idea, but instead of refining it until perfection and delaying feedback, I wanted to publish this to get others to think about the problems and various ways to resolve them. Btw, I m going to be at FOSDEM 23 this weekend, helping to manage the Security Devroom. Catch me if you want to chat about this or other things. Happy Hacking!

28 December 2022

Chris Lamb: Favourite books of 2022: Classics

As a follow-up to yesterday's post detailing my favourite works of fiction from 2022, today I'll be listing my favourite fictional works that are typically filed under classics. Books that just missed the cut here include: E. M. Forster's A Room with a View (1908) and his later A Passage to India (1913), both gently nudged out by Forster's superb Howard's End (see below). Giuseppe Tomasi di Lampedusa's The Leopard (1958) also just missed out on a write-up here, but I can definitely recommend it to anyone interested in reading a modern Italian classic.

War and Peace (1867) Leo Tolstoy It's strange to think that there is almost no point in reviewing this novel: who hasn't heard of War and Peace? What more could possibly be said about it now? Still, when I was growing up, War and Peace was always the stereotypical example of the 'impossible book', and even start it was, at best, a pointless task, and an act of hubris at worst. And so there surely exists a parallel universe in which I never have and will never will read the book... Nevertheless, let us try to set the scene. Book nine of the novel opens as follows:

On the twelfth of June, 1812, the forces of Western Europe crossed the Russian frontier and war began; that is, an event took place opposed to human reason and to human nature. Millions of men perpetrated against one another such innumerable crimes, frauds, treacheries, thefts, forgeries, issues of false money, burglaries, incendiarisms and murders as in whole centuries are not recorded in the annals of all the law courts of the world, but which those who committed them did not at the time regard as being crimes. What produced this extraordinary occurrence? What were its causes? [ ] The more we try to explain such events in history reasonably, the more unreasonable and incomprehensible they become to us.

Set against the backdrop of the Napoleonic Wars and Napoleon's invasion of Russia, War and Peace follows the lives and fates of three aristocratic families: The Rostovs, The Bolkonskys and the Bezukhov's. These characters find themselves situated athwart (or against) history, and all this time, Napoleon is marching ever closer to Moscow. Still, Napoleon himself is essentially just a kind of wallpaper for a diverse set of personal stories touching on love, jealousy, hatred, retribution, naivety, nationalism, stupidity and much much more. As Elif Batuman wrote earlier this year, "the whole premise of the book was that you couldn t explain war without recourse to domesticity and interpersonal relations." The result is that Tolstoy has woven an incredibly intricate web that connects the war, noble families and the everyday Russian people to a degree that is surprising for a book started in 1865. Tolstoy's characters are probably timeless (especially the picaresque adventures and constantly changing thoughts Pierre Bezukhov), and the reader who has any social experience will immediately recognise characters' thoughts and actions. Some of this is at a 'micro' interpersonal level: for instance, take this example from the elegant party that opens the novel:

Each visitor performed the ceremony of greeting this old aunt whom not one of them knew, not one of them wanted to know, and not one of them cared about. The aunt spoke to each of them in the same words, about their health and her own and the health of Her Majesty, who, thank God, was better today. And each visitor, though politeness prevented his showing impatience, left the old woman with a sense of relief at having performed a vexatious duty and did not return to her the whole evening.

But then, some of the focus of the observations are at the 'macro' level of the entire continent. This section about cities that feel themselves in danger might suffice as an example:

At the approach of danger, there are always two voices that speak with equal power in the human soul: one very reasonably tells a man to consider the nature of the danger and the means of escaping it; the other, still more reasonably, says that it is too depressing and painful to think of the danger, since it is not in man s power to foresee everything and avert the general course of events, and it is therefore better to disregard what is painful till it comes and to think about what is pleasant. In solitude, a man generally listens to the first voice, but in society to the second.

And finally, in his lengthy epilogues, Tolstoy offers us a dissertation on the behaviour of large organisations, much of it through engagingly witty analogies. These epilogues actually turn out to be an oblique and sarcastic commentary on the idiocy of governments and the madness of war in general. Indeed, the thorough dismantling of the 'great man' theory of history is a common theme throughout the book:

During the whole of that period [of 1812], Napoleon, who seems to us to have been the leader of all these movements as the figurehead of a ship may seem to a savage to guide the vessel acted like a child who, holding a couple of strings inside a carriage, thinks he is driving it. [ ] Why do [we] all speak of a military genius ? Is a man a genius who can order bread to be brought up at the right time and say who is to go to the right and who to the left? It is only because military men are invested with pomp and power and crowds of sychophants flatter power, attributing to it qualities of genius it does not possess.

Unlike some other readers, I especially enjoyed these diversions into the accounting and workings of history, as well as our narrow-minded way of trying to 'explain' things in a singular way:

When an apple has ripened and falls, why does it fall? Because of its attraction to the earth, because its stalk withers, because it is dried by the sun, because it grows heavier, because the wind shakes it, or because the boy standing below wants to eat it? Nothing is the cause. All this is only the coincidence of conditions in which all vital organic and elemental events occur. And the botanist who finds that the apple falls because the cellular tissue decays and so forth is equally right with the child who stands under the tree and says the apple fell because he wanted to eat it and prayed for it.

Given all of these serious asides, I was also not expecting this book to be quite so funny. At the risk of boring the reader with citations, take this sarcastic remark about the ineptness of medicine men:

After his liberation, [Pierre] fell ill and was laid up for three months. He had what the doctors termed 'bilious fever.' But despite the fact that the doctors treated him, bled him and gave him medicines to drink he recovered.

There is actually a multitude of remarks that are not entirely complimentary towards Russian medical practice, but they are usually deployed with an eye to the human element involved rather than simply to the detriment of a doctor's reputation "How would the count have borne his dearly loved daughter s illness had he not known that it was costing him a thousand rubles?" Other elements of note include some stunning set literary pieces, such as when Prince Andrei encounters a gnarly oak tree under two different circumstances in his life, and when Nat sha's 'Russian' soul is awakened by the strains of a folk song on the balalaika. Still, despite all of these micro- and macro-level happenings, for a long time I felt that something else was going on in War and Peace. It was difficult to put into words precisely what it was until I came across this passage by E. M. Forster:

After one has read War and Peace for a bit, great chords begin to sound, and we cannot say exactly what struck them. They do not arise from the story [and] they do not come from the episodes nor yet from the characters. They come from the immense area of Russia, over which episodes and characters have been scattered, from the sum-total of bridges and frozen rivers, forests, roads, gardens and fields, which accumulate grandeur and sonority after we have passed them. Many novelists have the feeling for place, [but] very few have the sense of space, and the possession of it ranks high in Tolstoy s divine equipment. Space is the lord of War and Peace, not time.

'Space' indeed. Yes, potential readers should note the novel's great length, but the 365 chapters are actually remarkably short, so the sensation of reading it is not in the least overwhelming. And more importantly, once you become familiar with its large cast of characters, it is really not a difficult book to follow, especially when compared to the other Russian classics. My only regret is that it has taken me so long to read this magnificent novel and that I might find it hard to find time to re-read it within the next few years.

Coming Up for Air (1939) George Orwell It wouldn't be a roundup of mine without at least one entry from George Orwell, and, this year, that place is occupied by a book I hadn't haven't read in almost two decades Still, the George Bowling of Coming Up for Air is a middle-aged insurance salesman who lives in a distinctly average English suburban row house with his nuclear family. One day, after winning some money on a bet, he goes back to the village where he grew up in order to fish in a pool he remembers from thirty years before. Less important than the plot, however, is both the well-observed remarks and scathing criticisms that Bowling has of the town he has returned to, combined with an ominous sense of foreboding before the Second World War breaks out. At several times throughout the book, George's placid thoughts about his beloved carp pool are replaced by racing, anxious thoughts that overwhelm his inner peace:

War is coming. In 1941, they say. And there'll be plenty of broken crockery, and little houses ripped open like packing-cases, and the guts of the chartered accountant's clerk plastered over the piano that he's buying on the never-never. But what does that kind of thing matter, anyway? I'll tell you what my stay in Lower Binfield had taught me, and it was this. IT'S ALL GOING TO HAPPEN. All the things you've got at the back of your mind, the things you're terrified of, the things that you tell yourself are just a nightmare or only happen in foreign countries. The bombs, the food-queues, the rubber truncheons, the barbed wire, the coloured shirts, the slogans, the enormous faces, the machine-guns squirting out of bedroom windows. It's all going to happen. I know it - at any rate, I knew it then. There's no escape. Fight against it if you like, or look the other way and pretend not to notice, or grab your spanner and rush out to do a bit of face-smashing along with the others. But there's no way out. It's just something that's got to happen.

Already we can hear psychological madness that underpinned the Second World War. Indeed, there is no great story in Coming Up For Air, no wonderfully empathetic characters and no revelations or catharsis, so it is impressive that I was held by the descriptions, observations and nostalgic remembrances about life in modern Lower Binfield, its residents, and how it has changed over the years. It turns out, of course, that George's beloved pool has been filled in with rubbish, and the village has been perverted by modernity beyond recognition. And to cap it off, the principal event of George's holiday in Lower Binfield is an accidental bombing by the British Royal Air Force. Orwell is always good at descriptions of awful food, and this book is no exception:

The frankfurter had a rubber skin, of course, and my temporary teeth weren't much of a fit. I had to do a kind of sawing movement before I could get my teeth through the skin. And then suddenly pop! The thing burst in my mouth like a rotten pear. A sort of horrible soft stuff was oozing all over my tongue. But the taste! For a moment I just couldn't believe it. Then I rolled my tongue around it again and had another try. It was fish! A sausage, a thing calling itself a frankfurter, filled with fish! I got up and walked straight out without touching my coffee. God knows what that might have tasted of.

Many other tell-tale elements of Orwell's fictional writing are in attendance in this book as well, albeit worked out somewhat less successfully than elsewhere in his oeuvre. For example, the idea of a physical ailment also serving as a metaphor is present in George's false teeth, embodying his constant preoccupation with his ageing. (Readers may recall Winston Smith's varicose ulcer representing his repressed humanity in Nineteen Eighty-Four). And, of course, we have a prematurely middle-aged protagonist who almost but not quite resembles Orwell himself. Given this and a few other niggles (such as almost all the women being of the typical Orwell 'nagging wife' type), it is not exactly Orwell's magnum opus. But it remains a fascinating historical snapshot of the feeling felt by a vast number of people just prior to the Second World War breaking out, as well as a captivating insight into how the process of nostalgia functions and operates.

Howards End (1910) E. M. Forster Howards End begins with the following sentence:

One may as well begin with Helen s letters to her sister.

In fact, "one may as well begin with" my own assumptions about this book instead. I was actually primed to consider Howards End a much more 'Victorian' book: I had just finished Virginia Woolf's Mrs Dalloway and had found her 1925 book at once rather 'modern' but also very much constrained by its time. I must have then unconsciously surmised that a book written 15 years before would be even more inscrutable, and, with its Victorian social mores added on as well, Howards End would probably not undress itself so readily in front of the reader. No doubt there were also the usual expectations about 'the classics' as well. So imagine my surprise when I realised just how inordinately affable and witty Howards End turned out to be. It doesn't have that Wildean shine of humour, of course, but it's a couple of fields over in the English countryside, perhaps abutting the more mordant social satires of the earlier George Orwell novels (see Coming Up for Air above). But now let us return to the story itself. Howards End explores class warfare, conflict and the English character through a tale of three quite different families at the beginning of the twentieth century: the rich Wilcoxes; the gentle & idealistic Schlegels; and the lower-middle class Basts. As the Bloomsbury Group Schlegel sisters desperately try to help the Basts and educate the rich but close-minded Wilcoxes, the three families are drawn ever closer and closer together. Although the whole story does, I suppose, revolve around the house in the title (which is based on the Forster's own childhood home), Howards End is perhaps best described as a comedy of manners or a novel that shows up the hypocrisy of people and society. In fact, it is surprising how little of the story actually takes place in the eponymous house, with the overwhelming majority of the first half of the book taking place in London. But it is perhaps more illuminating to remark that the Howards End of the book is a house that the Wilcoxes who own it at the start of the novel do not really need or want. What I particularly liked about Howards End is how the main character's ideals alter as they age, and subsequently how they find their lives changing in different ways. Some of them find themselves better off at the end, others worse. And whilst it is also surprisingly funny, it still manages to trade in heavier social topics as well. This is apparent in the fact that, although the characters themselves are primarily in charge of their own destinies, their choices are still constrained by the changing world and shifting sense of morality around them. This shouldn't be too surprising: after all, Forster's novel was published just four years before the Great War, a distinctly uncertain time. Not for nothing did Virginia Woolf herself later observe that "on or about December 1910, human character changed" and that "all human relations have shifted: those between masters and servants, husbands and wives, parents and children." This process can undoubtedly be seen rehearsed throughout Forster's Howards End, and it's a credit to the author to be able to capture it so early on, if not even before it was widespread throughout Western Europe. I was also particularly taken by Forster's fertile use of simile. An extremely apposite example can be found in the description Tibby Schlegel gives of his fellow Cambridge undergraduates. Here, Timmy doesn't want to besmirch his lofty idealisation of them with any banal specificities, and wishes that the idea of them remain as ideal Platonic forms instead. Or, as Forster puts it, to Timmy it is if they are "pictures that must not walk out of their frames." Wilde, at his most weakest, is 'just' style, but Forster often deploys his flair for a deeper effect. Indeed, when you get to the end of this section mentioning picture frames, you realise Forster has actually just smuggled into the story a failed attempt on Tibby's part to engineer an anonymous homosexual encounter with another undergraduate. It is a credit to Forster's sleight-of-hand that you don't quite notice what has just happened underneath you and that the books' reticence to honestly describe what has happened is thus structually analogus Tibby's reluctance to admit his desires to himself. Another layer to the character of Tibby (and the novel as a whole) is thereby introduced without the imposition of clumsy literary scaffolding. In a similar vein, I felt very clever noticing the arch reference to Debussy's Pr lude l'apr s-midi d'un faune until I realised I just fell into the trap Forster set for the reader in that I had become even more like Tibby in his pseudo-scholarly views on classical music. Finally, I enjoyed that each chapter commences with an ironic and self-conscious bon mot about society which is only slightly overblown for effect. Particularly amusing are the ironic asides on "women" that run through the book, ventriloquising the narrow-minded views of people like the Wilcoxes. The omniscient and amiable narrator of the book also recalls those ironically distant voiceovers from various French New Wave films at times, yet Forster's narrator seems to have bigger concerns in his mordant asides: Forster seems to encourage some sympathy for all of the characters even the more contemptible ones at their worst moments. Highly recommended, as are Forster's A Room with a View (1908) and his slightly later A Passage to India (1913).

The Good Soldier (1915) Ford Madox Ford The Good Soldier starts off fairly simply as the narrator's account of his and his wife's relationship with some old friends, including the eponymous 'Good Soldier' of the book's title. It's an experience to read the beginning of this novel, as, like any account of endless praise of someone you've never met or care about, the pages of approving remarks about them appear to be intended to wash over you. Yet as the chapters of The Good Soldier go by, the account of the other characters in the book gets darker and darker. Although the author himself is uncritical of others' actions, your own critical faculties are slowgrly brought into play, and you gradully begin to question the narrator's retelling of events. Our narrator is an unreliable narrator in the strict sense of the term, but with the caveat that he is at least is telling us everything we need to know to come to our own conclusions. As the book unfolds further, the narrator's compromised credibility seems to infuse every element of the novel even the 'Good' of the book's title starts to seem like a minor dishonesty, perhaps serving as the inspiration for the irony embedded in the title of The 'Great' Gatsby. Much more effectively, however, the narrator's fixations, distractions and manner of speaking feel very much part of his dissimulation. It sometimes feels like he is unconsciously skirting over the crucial elements in his tale, exactly like one does in real life when recounting a story containing incriminating ingredients. Indeed, just how much the narrator is conscious of his own concealment is just one part of what makes this such an interesting book: Ford Madox Ford has gifted us with enough ambiguity that it is also possible that even the narrator cannot find it within himself to understand the events of the story he is narrating. It was initially hard to believe that such a carefully crafted analysis of a small group of characters could have been written so long ago, and despite being fairly easy to read, The Good Soldier is an almost infinitely subtle book even the jokes are of the subtle kind and will likely get a re-read within the next few years.

Anna Karenina (1878) Leo Tolstoy There are many similar themes running through War and Peace (reviewed above) and Anna Karenina. Unrequited love; a young man struggling to find a purpose in life; a loving family; an overwhelming love of nature and countless fascinating observations about the minuti of Russian society. Indeed, rather than primarily being about the eponymous Anna, Anna Karenina provides a vast panorama of contemporary life in Russia and of humanity in general. Nevertheless, our Anna is a sophisticated woman who abandons her empty existence as the wife of government official Alexei Karenin, a colourless man who has little personality of his own, and she turns to a certain Count Vronsky in order to fulfil her passionate nature. Needless to say, this results in tragic consequences as their (admittedly somewhat qualified) desire to live together crashes against the rocks of reality and Russian society. Parallel to Anna's narrative, though, Konstantin Levin serves as the novel's alter-protagonist. In contrast to Anna, Levin is a socially awkward individual who straddles many schools of thought within Russia at the time: he is neither a free-thinker (nor heavy-drinker) like his brother Nikolai, and neither is he a bookish intellectual like his half-brother Serge. In short, Levin is his own man, and it is generally agreed by commentators that he is Tolstoy's surrogate within the novel. Levin tends to come to his own version of an idea, and he would rather find his own way than adopt any prefabricated view, even if confusion and muddle is the eventual result. In a roughly isomorphic fashion then, he resembles Anna in this particular sense, whose story is a counterpart to Levin's in their respective searches for happiness and self-actualisation. Whilst many of the passionate and exciting passages are told on Anna's side of the story (I'm thinking horse race in particular, as thrilling as anything in cinema ), many of the broader political thoughts about the nature of the working classes are expressed on Levin's side instead. These are stirring and engaging in their own way, though, such as when he joins his peasants to mow the field and seems to enter the nineteenth-century version of 'flow':

The longer Levin mowed, the more often he felt those moments of oblivion during which it was no longer his arms that swung the scythe, but the scythe itself that lent motion to his whole body, full of life and conscious of itself, and, as if by magic, without a thought of it, the work got rightly and neatly done on its own. These were the most blissful moments.

Overall, Tolstoy poses no didactic moral message towards any of the characters in Anna Karenina, and merely invites us to watch rather than judge. (Still, there is a hilarious section that is scathing of contemporary classical music, presaging many of the ideas found in Tolstoy's 1897 What is Art?). In addition, just like the earlier War and Peace, the novel is run through with a number of uncannily accurate observations about daily life:

Anna smiled, as one smiles at the weaknesses of people one loves, and, putting her arm under his, accompanied him to the door of the study.

... as well as the usual sprinkling of Tolstoy's sardonic humour ("No one is pleased with his fortune, but everyone is pleased with his wit."). Fyodor Dostoyevsky, the other titan of Russian literature, once described Anna Karenina as a "flawless work of art," and if you re only going to read one Tolstoy novel in your life, it should probably be this one.

23 December 2022

Jonathan Dowland: 2022 music discovery: Underworld

One of my main musical 'discoveries' in 2022 was British electronic band Underworld. I m super late to the party. Underworld s commercial high point was the mid nineties. And I was certainly aware of them then: the use of Born Slippy .NUXX in 1996 s Trainspotting soundtrack was ubiquitous, but it didn t grab me.

In more recent years my colleague and friend Andrew Dinn (with whom I enjoyed many pre-pandemic conversations about music) enthusiastically advocated for Underworld (and furnished me with some rarities). This started to get them under my skin. It took a bit longer for me to truly get them, though, and the final straw was revisiting their BBC 6 Music Festival performance from 2016 (with Rez and it's companion piece "Cowgirl" standing out)

So where to start? There s something compelling about their whole catalogue. This is a group with which you can go deep, if you wish. The only album which hasn t grabbed me is 100 days off and it s probably only a matter of time before it does (Andrew advocates the Extended Ansum Edition bootleg here). Here are four career-spanning personal highlights:

Spikee
King of snake
Nylon strung
Seven music drone

A very reasonable entry point to their latest sprawling effort, DRIFT, is ricksdubbedoutdrift experience (live in Amsterdam), only 3 on Bandcamp.

10 August 2022

Shirish Agarwal: Mum, Samsung Galaxy M-52

Mum I dunno from where to start. While I m not supposed to announce it, mum left this earth a month ago (thirteen days when I started to write this blog post) ago. I am still in part denial, part shock, and morose. Of all the seasons in a year, the rainy season used to be my favorite, now would I ever be able to look and feel other than the emptiness that this season has given me? In some senses, it is and was very ironic, when she became ill about last year, I had promised myself I would be by her side for 5-6 years, not go anywhere either Hillhacks or Debconf or any meetup and I was ok with that. Now that she s no more I have no clue why am I living. What is the purpose, the utility? When she was alive, the utility was understandable. We had an unspoken agreement, I would like after her, and she was supposed to look after me. A part of me self-blames as I am sure, I have done thousands of things wrong otherwise the deal was that she was going to be for another decade. But now that she has left not even halfway, I dunno what to do. I don t have someone to fight with anymore It s mostly a robotic existence atm. I try to distract myself via movies, web series, the web, books, etc. whatever can take my mind off. From the day she died to date, I have a lower back pain which acts as a reminder. It s been a month, I eat, drink, and am surviving but still feel empty. I do things suggested by extended family but within there is no feeling, just emptiness :(. I have no clue if things will get better and even if I do want the change. I clearly have no idea, so let me share a little about what I know.

Samsung Galaxy M-52G Just a couple of days before she died, part of our extended family had come and she chose that opportunity to gift me Samsung Galaxy M-52G even though my birthday was 3 months away. Ironically, after I purchased it, the next day, one of the resellers of the phone cut the price from INR 28k to 20k. If a day more, I could have saved another 8k/- but what s done is done To my mind, the phone is middling yet a solid phone. I had the phone drop accidentally at times but not a single scratch or anything like that. One can look at the specs in greater detail on fccd.io. Before the recent price drop, as I shared it was a mid-range phone so am gonna review it on that basis itself. One of the first things I did is to buy a plastic cover as well as a cover shield even though the original one is meant to work for a year or more. This was simply for added protection and it has served me to date. Even with the additional weight, I can easily use it with one hand. It only becomes problematic when using chatting apps. such as Whatsapp, Telegram, Quicksy and a few others where it comes with Samsung keyboard with the divided/split keyboard. The A.I. for guessing words and sentences are spot-on when you are doing it in English but if you try a mixture of Hinglish (Hindi and English) that becomes a bit of a nightmare. Tryng to each A.I; new words is something of a task. I wish there was an interface in which I could train the A.I. so it could be served for Hinglish words also. I do think it does, but it s too rudimentary as it is to be any useful at least where it is now.

WiFi Direct While my previous phone did use wifi direct but it that ancient android version wasn t wedded to Wifi Direct as this one is. You have essentially two ways to connect to any system outside. One is through Wi-Fi Direct and the more expensive way is through mobile data. One of the strange things I found quite a number of times, that Wifi would lose it pairings. Before we get into it, Wikipedia has a good explanation of what Wifi direct is all about. Apparently, either my phone or my modem loses the pairing, which of the two is the culprit, I really don t know. There are two apps from the Play Store that do help in figuring out what the issue is (although it is limited in what it gives out in info. but still good.) The first one is Wifi Signal Meter and the other one is WifiAnalyzer (open-source). I have found that pairing done through Wifi Signal Meter works better than through Google s own implementation which feels lacking. The whole universe of Android seems to be built on apps and games and many of these can be bought for money, but many of these can also be played using a combination of micro-transactions and ads. For many a game, you cannot play for more than 5 minutes before you either see an ad or wait for something like 2-3 hrs. before you attempt again. Hogwarts Mystery, for e.g., is an example of that. Another one would be Explore Lands . While Hogwarts Mystery is more towards the lore created by J.K.Rowling and you can really get into the thick of things if you know the lore, Explore lands is more into Exploration of areas. In both the games, you are basically looking to gain energy over a period of time, which requires either money or viewing ads or a combination of both Sadly most ads and even Google don t seem to have caught up that I m deaf so most ads do not have subtitles, so more often than not they are useless to me. I have found also that many games share screenshots or videos that have nothing to do with how the game is. So there is quite a bit of misleading going on. I did read that Android had been having issues with connecting with developers after their app. is in the Play Store. Most apps. ask and require a whole lot of permissions that aren t needed by that app.

F-Droid Think Pirate Praveen had introduced me to F-Droid and a whole lot of things have happened in F-Droid, lot more apps. games etc. the look of F-Droid has been pulled back. In fact, I found Neo Store to be a better skin to see F-Droid. I have yet to explore more of F-Droid before sharing any recommendations and spending some time on it. I do find that many of foss apps. do need to work on how we communicate with our users. For e.g. one app. that Praveen had shared with me recently was Quicksy. And while it is better, it uses a double negative while asking permission whether it should or not to use more of the phone s resources. It is an example of that sort of language that we need to be aware of and be better. I know this post is more on the mobile rather than the desktop but that is where I m living currently.

29 May 2022

John Goerzen: Fast, Ordered Unixy Queues over NNCP and Syncthing with Filespooler

It seems that lately I ve written several shell implementations of a simple queue that enforces ordered execution of jobs that may arrive out of order. After writing this for the nth time in bash, I decided it was time to do it properly. But first, a word on the why of it all.

Why did I bother? My needs arose primarily from handling Backups over Asynchronous Communication methods in this case, NNCP. When backups contain incrementals that are unpacked on the destination, they must be applied in the correct order. In some cases, like ZFS, the receiving side will detect an out-of-order backup file and exit with an error. In those cases, processing in random order is acceptable but can be slow if, say, hundreds or thousands of hourly backups have stacked up over a period of time. The same goes for using gitsync-nncp to synchronize git repositories. In both cases, a best effort based on creation date is sufficient to produce a significant performance improvement. With other cases, such as tar or dar backups, the receiving cannot detect out of order incrementals. In those situations, the incrementals absolutely must be applied with strict ordering. There are many other situations that arise with these needs also. Filespooler is the answer to these.

Existing Work Before writing my own program, I of course looked at what was out there already. I looked at celeary, gearman, nq, rq, cctools work queue, ts/tsp (task spooler), filequeue, dramatiq, GNU parallel, and so forth. Unfortunately, none of these met my needs at all. They all tended to have properties like:

An extremely complicated client/server system that was incompatible with piping data over existing asynchronous tools

A large bias to processing of small web requests, resulting in terrible inefficiency or outright incompatibility with jobs in the TB range

An inability to enforce strict ordering of jobs, especially if they arrive in a different order from how they were queued

Many also lacked some nice-to-haves that I implemented for Filespooler:

Support for the encryption and cryptographic authentication of jobs, including metadata

First-class support for arbitrary compressors

Ability to use both stream transports (pipes) and filesystem-like transports (eg, rclone mount, S3, Syncthing, or Dropbox)

Introducing Filespooler Filespooler is a tool in the Unix tradition: that is, do one thing well, and integrate nicely with other tools using the fundamental Unix building blocks of files and pipes. Filespooler itself doesn t provide transport for jobs, but instead is designed to cooperate extremely easily with transports that can be written to as a filesystem or piped to which is to say, almost anything of interest. Filespooler is written in Rust and has an extensive Filespooler Reference as well as many tutorials on its homepage. To give you a few examples, here are some links:

Using Filespooler over Syncthing (and the most comprehensive tutorial)

Using Filespooler over NNCP

Compressing Filespooler Jobs

Encrypting Filespooler Jobs with GPG or Age

Guidelines for Writing To Filespooler Queues Without Using Filespooler

Basics of How it Works Filespooler is intentionally simple:

The sender maintains a sequence file that includes a number for the next job packet to be created.

The receiver also maintains a sequence file that includes a number for the next job to be processed.

`fspl prepare` creates a Filespooler job packet and emits it to stdout. It includes a small header (<100 bytes in most cases) that includes the sequence number, creation timestamp, and some other useful metadata.

You get to transport this job packet to the receiver in any of many simple ways, which may or may not involve Filespooler s assistance.

On the receiver, Filespooler (when running in the default strict ordering mode) will simply look at the sequence file and process jobs in incremental order until it runs out of jobs to process.

The name of job files on-disk matches a pattern for identification, but the content of them is not significant; only the header matters. You can send job data in three ways:

By piping it to `fspl prepare`

By setting certain environment variables when calling `fspl prepare`

By passing additional command-line arguments to `fspl prepare`, which can optionally be passed to the processing command at the receiver.

Data piped in is added to the job payload , while environment variables and command-line parameters are encoded in the header.

Basic usage Here I will excerpt part of the Using Filespooler over Syncthing tutorial; consult it for further detail. As a bit of background, Syncthing is a FLOSS decentralized directory synchronization tool akin to Dropbox (but with a much richer feature set in many ways).

Preparation First, on the receiver, you create the queue (passing the directory name to -q):

sender$ fspl queue-init -q ~/sync/b64queue

Now, we can send a job like this:

sender$ echo Hi   fspl prepare -s ~/b64seq -i -   fspl queue-write -q ~/sync/b64queue

Let s break that down:

First, we pipe Hi to fspl prepare.
fspl prepare takes two parameters:
- -s seqfile gives the path to a sequence file used on the sender side. This file has a simple number in it that increments a unique counter for every generated job file. It is matched with the nextseq file within the queue to make sure that the receiver processes jobs in the correct order. It MUST be separate from the file that is in the queue and should NOT be placed within the queue. There is no need to sync this file, and it would be ideal to not sync it.
- The -i option tells fspl prepare to read a file for the packet payload. -i - tells it to read stdin for this purpose. So, the payload will consist of three bytes: Hi\n (that is, including the terminating newline that echo wrote)
Now, fspl prepare writes the packet to its stdout. We pipe that into fspl queue-write:
- fspl queue-write reads stdin and writes it to a file in the queue directory in a safe manner. The file will ultimately match the fspl-*.fspl pattern and have a random string in the middle.

At this point, wait a few seconds (or however long it takes) for the queue files to be synced over to the recipient. On the receiver, we can see if any jobs have arrived yet:

receiver$ fspl queue-ls -q ~/sync/b64queue
ID                   creation timestamp          filename
1                    2022-05-16T20:29:32-05:00   fspl-7b85df4e-4df9-448d-9437-5a24b92904a4.fspl

Let s say we d like some information about the job. Try this:

receiver$ $ fspl queue-info -q ~/sync/b64queue -j 1
FSPL_SEQ=1
FSPL_CTIME_SECS=1652940172
FSPL_CTIME_NANOS=94106744
FSPL_CTIME_RFC3339_UTC=2022-05-17T01:29:32Z
FSPL_CTIME_RFC3339_LOCAL=2022-05-16T20:29:32-05:00
FSPL_JOB_FILENAME=fspl-7b85df4e-4df9-448d-9437-5a24b92904a4.fspl
FSPL_JOB_QUEUEDIR=/home/jgoerzen/sync/b64queue
FSPL_JOB_FULLPATH=/home/jgoerzen/sync/b64queue/jobs/fspl-7b85df4e-4df9-448d-9437-5a24b92904a4.fspl

This information is intentionally emitted in a format convenient for parsing. Now let s run the job!

receiver$ fspl queue-process -q ~/sync/b64queue --allow-job-params base64
SGkK

There are two new parameters here:

--allow-job-params says that the sender is trusted to supply additional parameters for the command we will be running.
base64 is the name of the command that we will run for every job. It will:
- Have environment variables set as we just saw in queue-info
- Have the text we previously prepared Hi\n piped to it

By default, fspl queue-process doesn t do anything special with the output; see Handling Filespooler Command Output for details on other options. So, the base64-encoded version of our string is SGkK . We successfully sent a packet using Syncthing as a transport mechanism! At this point, if you do a fspl queue-ls again, you ll see the queue is empty. By default, fspl queue-process deletes jobs that have been successfully processed.

For more See the Filespooler homepage.
This blog post is also available as a permanent, periodically-updated page.

20 March 2022

Joerg Jaspert: Another shell script moved to rust

Shell? Rust! Not the first shell script I took and made a rust version of, but probably my largest yet. This time I took my little tm (tmux helper) tool which is (well, was) a bit more than 600 lines of shell, and converted it to Rust. I got most of the functionality done now, only one major part is missing.

What s tm? tm started as a tiny shell script to make handling tmux easier. The first commit in git was in July 2013, but I started writing and using it in 2011. It started out as a kind-of wrapper around ssh, opening tmux windows with an ssh session on some other hosts. It quickly gained support to open multiple ssh sessions in one window, telling tmux to synchronize input (send input to all targets at once), which is great when you have a set of machines that ought to get the same commands.

tm vs clusterssh / mussh In spirit it is similar to clusterssh or mussh, allowing to run the same command on many hosts at the same time. clusterssh sets out to open new terminals (xterm) per host and gives you an input line, that it sends everywhere. mussh appears to take your command and then send it to all the hosts. Both have disadvantages in my opinion: clusterssh opens lots of xterm windows, and you can not easily switch between multiple sessions, mussh just seems to send things over ssh and be done. tm instead just creates a tmux session, telling it to ssh to the targets, possibly setting the tmux option to send input to all panes. And leaves all the rest of the handling to tmux. So you can

detach a session and reattach later easily,

use tmux great builtin support for copy/paste,

see all output, modify things even for one machine only,

zoom in to one machine that needs just ONE bit different (cssh can do this too),

let colleagues also connect to your tmux session, when needed,

easily add more machines to the mix, if needed,

and all the other extra features tmux brings.

More tm tm also supports just attaching to existing sessions as well as killing sessions, mostly for lazyness (less to type than using tmux directly). At some point tm gained support for setting up sessions according to some session file . It knows two formats now, one is simple and mostly a list of hostnames to open synchronized sessions for. This may contain LIST commands, which let tm execute that command, expected output is list of hostnames (or more LIST commands) for the session. That, combined with the replacement part, lets us have one config file that opens a set of VMs based on tags our Ganeti runs, based on tags. It is simply a LIST command asking for VMs tagged with the replacement arg and up. Very handy. Or also all VMs on host X . The second format is basically free form tmux commands . Mostly commandline tmux call, just drop the tmux in front collection. Both of them supporting a crude variable replacement.

Conversion to Rust Some while ago I started playing with Rust and it somehow clicked , I do like it. My local git tells me, that I tried starting off with go in 2017, but that appearently did not work out. Fun, everywhere I can read says that Rust ought to be harder to learn. So by now I have most of the functionality implemented in the Rust version, even if I am sure that the code isn t a good Rust example. I m learning, after all, and already have adjusted big parts of it, multiple times, whenever I learn (and understand) something more - and am also sure that this will happen again

Compatibility with old tm It turns out that my goal of staying compatible with the behaviour of the old shell script does make some things rather complicated. For example, the LIST commands in session config files - in shell I just execute them commands, and shell deals with variable/parameter expansion, I just set IFS to newline only and read in what I get back. Simple. Because shell is doing a lot of things for me. Now, in Rust, it is a different thing at all:

Properly splitting the line into shell words, taking care of quoting (can t simply take whitespace) (there is shlex)

Expanding specials like ~ and $HOME (there is home_dir).

Supporting environment variables in general, tm has some that adjust behaviour of it. Which shell can use globally. Used lazy_static for a similar effect - they aren t going to change at runtime ever, anyways.

Properly supporting the commandline arguments also turned out to be a bit more work. Rust appearently has multiple crates supporting this, I settled on clap, but as tm supports getopts -style as well as free-form arguments (subcommands in clap), it takes a bit to get that interpreted right.

Speed Most of the time entirely unimportant in the tool that tm is (open a tmux with one to some ssh connections to some places is not exactly hard or time consuming), there are situations, where one can notice that it s calling out to tmux over and over again, for every single bit to do, and that just takes time: Configurations that open sessions to 20 and more hosts at the same time especially lag in setup time. (My largest setup goes to 443 panes in one window). The compiled Rust version is so much faster there, it s just great. Nice side effect, that is. And yes, in the end it is also only driving tmux, still, it takes less than half the time to do so.

Code, Fun parts As this is still me learning to write Rust, I am sure the code has lots to improve. Some of which I will sure find on my own, but if you have time, I love PRs (or just mails with hints).

Github Also the first time I used Github Actions to see how it goes. Letting it build, test, run clippy and also run a code coverage tool (Yay, more than 50% covered ) on it. Unsure my tests are good, I am not used to writing tests for code, but hey, coverage!

Up next I do have to implement the last missing feature, which is reading the other config file format. A little scared, as that means somehow translating those lines into correct calls within the tmux_interface I am using, not sure that is easy. I could be bad and just shell out to tmux on it all the time, but somehow I don t like the thought of doing that. Maybe (ab)using the control mode, but then, why would I use tmux_interface, so trying to handle it with that first. Afterwards I want to gain a new command, to save existing sessions and be able to recreate them easily. Shouldn t be too hard, tmux has a way to get at that info, somewhere.

2 March 2022

Keith Packard: picolibc-testing

Testing Picolibc with the glibc tests Picolibc has a bunch of built-in tests, but more testing is always better, right? I decided to see how hard it would be to run some of the tests provided in the GNU C Library (glibc). Parallel meson build files Similar to how Picolibc uses meson build files to avoid modifying the newlib autotools infrastructure, I decided to take the glibc code and write meson build rules that would compile the tests against Picolibc header files and link against Picolibc libraries. I decided to select a single target for this project so I could focus on getting things building and not worry too much about making it portable. I wanted to pick something that had hardware floating point so that I would have rounding modes and exception support, so I picked the ARM Cortex M7 with hard float ABI:

$ arm-none-eabi-gcc -mcpu=cortex-m7 -mfloat-abi=hard

It should be fairly easy to extend this to other targets, but for now, that's what I've got working. There's a cross-cortex-m7.txt file containing all of the cross compilation details which is used when running meson setup. All of the Picolibc-specific files live in a new picolibc directory so they are isolated from the main glibc code. Pre-loading a pile of hacks Adapt Picolibc to support the Glibc test code required a bunch of random hacks, from providing _unlocked versions of the stdio macros to stubbing out various unsupportable syscalls (like sleep and chdir). Instead of modifying the Glibc code, I created a file called hacks.h which is full of this stuff and used the gcc -include parameter to read that into the compiler before starting compilation on all of the files. Supporting command line parameters The glibc tests all support a range of command line parameters, some of which turned out to be quite useful for this work. Picolibc had limited semihosting support for accessing the command line, but that required modifying applications to go fetch the command line using a special semihosting function. To make this easier, I added a new crt0 variant for picolibc called (oddly) semihost. This extends the existing hosted variant by adding a call to the semihosting library to fetch the current command line and splitting that into words at each space. It doesn't handle any quoting, but it's sufficient for the needs here. Avoiding glibc headers The glibc tests use some glibc-specific extensions to the standard POSIX C library, so I needed to include those in the test builds. Headers for those extensions are mixed in with the rest of the POSIX standard headers, which conflict with the Picolibc versions. To work around this, I stuck stub #include files in the picolibc directory which directly include the appropriate headers for the glibc API extensions. This includes things like argp.h and array_length.h. For other headers which weren't actually needed for picolibc, I created empty files. Adding more POSIX to Picolibc At this point, code was compiling but failing to find various standard POSIX functions which aren't available in Picolibc. That included some syscalls which could be emulated using semihosting, like gettimeofday and getpagesize. It also included some generally useful additions, like replacing ecvtbuf and fcvtbuf with ecvt_r and fcvt_r. The _r variants provide a target buffer size instead of assuming that it was large enough as the Picolibc buf variants did. Which tests are working? So far, I've got some of the tests in malloc, math, misc and stdio-common running. There are a lot of tests in the malloc directory which cover glibc API extensions or require POSIX syscalls not supported by semihosting. I think I've added all of the tests which should be supported. For the math tests, I'm testing the standard POSIX math APIs in both float and double forms, except for the Bessel and Gamma functions. Picolibc's versions of those are so bad that they violate some pretty basic assumptions about error bounds built into the glibc test code. Until Picolibc gets better versions of these functions, we'll have to skip testing them this way. In the misc directory, I've only added tests for ecvt, fcvt, gcvt, dirname and hsearch. I don't think there are any other tests there which should work. Finally, for stdio-common, almost all of the tests expect a fully functioning file system, which semihosting really can't support. As a result, we're only able to run the printf, scanf and some of the sprintf tests. All in all, we're running 78 of the glibc test programs, which is a small fraction of the total tests, but I think it's the bulk of the tests which cover APIs that don't depend too much on the underlying POSIX operating system. Bugs found and fixed in Picolibc This exercise has resulted in 17 fixes in Picolibc, which can be classified as:

Math functions taking sNaN and not raising FE_INVALID and returning qNaN. Almost any operation on an sNaN value is supposed to signal an invalid operand and replace that with a qNaN so that further processing doesn't raise another exception. This was fairly easy to fix, just need to use return x + x; instead of return x;.
Math functions failing to set errno. I'd recently restructured the math library to get rid of the separate IEEE version of the functions which didn't set errno and missed a couple of cases that needed to use the errno-setting helper functions.
Corner cases in string/float conversion, including the need to perform round-to-even for '%a' conversions in printf and supporting negative decimal values for fcvt. This particular exercise led to replacing the ecvtbuf and fcvtbuf APIs with glibc's ecvt_r and fcvt_r versions as those pass explicit buffer lengths, making overflow prevention far more reliable.
A bunch of malloc entry points were not handling failure correctly; allocation values close to the maximum possible size caused numerous numeric wrapping errors with the usual consequences (allocations "succeed", but return a far smaller buffer than requested). Also, realloc was failing to check the return value from malloc before copying the old data, which really isn't a good idea.
Tinystdio's POSIX support code was returning an error instead of EOF at end of file.
Error bounds for the Picolibc math library aren't great; I had to generate Picolibc-specific ulps files. Most functions are between 0 and 3 ulps, but for some reason, the float version of erfc (ercf) has errors as large as 64 ulps. That needs investigation.

Tests added to Picolibc With all of the fixes applied to Picolibc, I added some tests to verify them without relying on running the glibc tests, that includes sNaN vs qNaN tests for math functions, testing fopen and mktemp, checking the printf %a rounding support and adding a ecvt/fcvt tests. I also discovered that the github-based CI system was compiling but not testing when using clang with a riscv target, so that got added in. Where's the source code? The Picolibc changes are sitting on the glibc-testing branch. I'll merge them once the current CI pass finishes. The hacked-up Glibc bits are in a glibc mirror at github in the picolibc project on the picolibc-testing branch. It would be interesting to know what additional tests should be usable in this environment. And, perhaps, finding a way to use this for picolibc CI testing in the future. Concluding thoughts Overall, I'm pretty pleased with these results. The bugs in malloc are fairly serious and could easily result in trouble, but the remaining issues are mostly minor and shouldn't have a big impact on applications using Picolibc. I'll get the changes merged and start thinking about doing another Picolibc release.

26 February 2022

Russ Allbery: INN 2.6.5

This is a bit of a sneak preview announcement since I'm waiting for the ISC mirror to update before sending the official announcement to the normal channels, but INN 2.6.5 has been released. (The release was finalized a few days ago, and I'm a bit behind in posting it.) This is a bug fix and minor feature release over INN 2.6.4, and the upgrade should be painless. You can download the new release from ftp.isc.org (once it updates) or my personal INN pages. The latter also has links to the full changelog and the other INN documentation. As always, thanks to Julien LIE for preparing this release and doing most of the maintenance work on INN! Changes in this release:

A new step in INN development has been achieved with the migration of the INN project to GitHub. We now make use of the features GitHub provides: issue tracker, pull requests, continuous integration, a user-friendly interface to browse the code, etc. Our Subversion repository has therefore been migrated to Git, and our Trac tickets to the GitHub issue tracker.
An up-to-date nocem.ctl file is provided with this release. You should manually update your nocem.ctl file with the new information recorded about NoCeM issuers, and make sure the right PGP keys are present on your system.
Up-to-date control.ctl and moderators files are provided with this release. You should manually update them (notably for the fido7.* hierarchy).
Added a stricter validation of article numbers given in NNTP commands so that numbers superior to 2^31 are correctly considered invalid. Thanks to Richard Kettlewell for the patch.
Added a check in rc.news for the existence of the pathrun directory. INN won't start until this directory is writable. Previously, it bailed out quickly after starting, without clear logs about why it failed.
Fixed parallel builds using make -j. Thanks to Richard Kettlewell for the path.
nnrpd now properly gathers timer statistics when a compression layer is active.
nnrpd now properly discards data received from a news client after a timeout when a TLS layer is active. It previously tried to read incoming data before closing the socket, leading to decoding errors from an underlying compression or SASL layer.
innfeed and ovdb_stat now generate status reports in valid HTML syntax.
Fixed a bug in the buffindexed overview that prevented it from working on several systems, amongst them FreeBSD. Unsupported, and useless, permission bits were given to semaphores.
Fixed the detection of library paths at configure time: multilib directories (lib32 or lib64) are now also used if they exist, even it the system does not use multilib. It will notably fix the detection of the OpenSSL 3.0.0 library.
The tlscertfile parameter in inn.conf now permits the use of a complete certificate chain, instead of necessarily having to use tlscafile for additional certificates.
Added support for the new OpenSSL 3.0.0 API, which deprecated a few functions.
The inn.conf default value for tlsprotocols no longer contains TLS versions 1.0 and 1.1, which have been deprecated by RFC 8996.
A new inn.conf parameter has been added to tune the length of the queue of pending connections to innd, nnrpd and the ovdb overview storage method: the maxlisten parameter now permits configuring their listen backlog, whose previously hard-coded values were 128 for nnrpd and 25 for the others, which was not high enough for some uses. The default value is now 128 for all of them, and configurable in inn.conf. Thanks to Kevin Bowling for the patch.
The name of seven man pages for routines built in libinn(3) are now prefixed with libinn_ so as not to consume namespace and conflict with other packages (notably, the list(3) and uwildmat(3) man pages are now named libinn_list(3) and libinn_uwildmat(3)).
Other minor bug fixes and documentation improvements, notably a revised installation checklist and a section summarizing the most used configuration at the beginning of a few complex man pages.

17 January 2022

Matthew Garrett: Boot Guard and PSB have user-hostile defaults

Compromising an OS without it being detectable is hard. Modern operating systems support the imposition of a security policy or the launch of some sort of monitoring agent sufficient early in boot that even if you compromise the OS, you're probably going to have left some sort of detectable trace[1]. You can avoid this by attacking the lower layers - if you compromise the bootloader then it can just hotpatch a backdoor into the kernel before executing it, for instance.

This is avoided via one of two mechanisms. Measured boot (such as TPM-based Trusted Boot) makes a tamper-proof cryptographic record of what the system booted, with each component in turn creating a measurement of the next component in the boot chain. If a component is tampered with, its measurement will be different. This can be used to either prevent the release of a cryptographic secret if the boot chain is modified (for instance, using the TPM to encrypt the disk encryption key), or can be used to attest the boot state to another device which can tell you whether you're safe or not. The other approach is verified boot (such as UEFI Secure Boot), where each component in the boot chain verifies the next component before executing it. If the verification fails, execution halts.

In both cases, each component in the boot chain measures and/or verifies the next. But something needs to be the first link in this chain, and traditionally this was the system firmware. Which means you could tamper with the system firmware and subvert the entire process - either have the firmware patch the bootloader in RAM after measuring or verifying it, or just load a modified bootloader and lie about the measurements or ignore the verification. Attackers had already been targeting the firmware (Hacking Team had something along these lines, although this was pre-secure boot so just dropped a rootkit into the OS), and given a well-implemented measured and verified boot chain, the firmware becomes an even more attractive target.

Intel's Boot Guard and AMD's Platform Secure Boot attempt to solve this problem by moving the validation of the core system firmware to an (approximately) immutable environment. Intel's solution involves the Management Engine, a separate x86 core integrated into the motherboard chipset. The ME's boot ROM verifies a signature on its firmware before executing it, and once the ME is up it verifies that the system firmware's bootblock is signed using a public key that corresponds to a hash blown into one-time programmable fuses in the chipset. What happens next depends on policy - it can either prevent the system from booting, allow the system to boot to recover the firmware but automatically shut it down after a while, or flag the failure but allow the system to boot anyway. Most policies will also involve a measurement of the bootblock being pushed into the TPM.

AMD's Platform Secure Boot is slightly different. Rather than the root of trust living in the motherboard chipset, it's in AMD's Platform Security Processor which is incorporated directly onto the CPU die. Similar to Boot Guard, the PSP has ROM that verifies the PSP's own firmware, and then that firmware verifies the system firmware signature against a set of blown fuses in the CPU. If that fails, system boot is halted. I'm having trouble finding decent technical documentation about PSB, and what I have found doesn't mention measuring anything into the TPM - if this is the case, PSB only implements verified boot, not measured boot.

What's the practical upshot of this? The first is that you can't replace the system firmware with anything that doesn't have a valid signature, which effectively means you're locked into firmware the vendor chooses to sign. This prevents replacing the system firmware with either a replacement implementation (such as Coreboot) or a modified version of the original implementation (such as firmware that disables locking of CPU functionality or removes hardware allowlists). In this respect, enforcing system firmware verification works against the user rather than benefiting them.
Of course, it also prevents an attacker from doing the same thing, but while this is a real threat to some users, I think it's hard to say that it's a realistic threat for most users.

The problem is that vendors are shipping with Boot Guard and (increasingly) PSB enabled by default. In the AMD case this causes another problem - because the fuses are in the CPU itself, a CPU that's had PSB enabled is no longer compatible with any motherboards running firmware that wasn't signed with the same key. If a user wants to upgrade their system's CPU, they're effectively unable to sell the old one. But in both scenarios, the user's ability to control what their system is running is reduced.

As I said, the threat that these technologies seek to protect against is real. If you're a large company that handles a lot of sensitive data, you should probably worry about it. If you're a journalist or an activist dealing with governments that have a track record of targeting people like you, it should probably be part of your threat model. But otherwise, the probability of you being hit by a purely userland attack is so ludicrously high compared to you being targeted this way that it's just not a big deal.

I think there's a more reasonable tradeoff than where we've ended up. Tying things like disk encryption secrets to TPM state means that if the system firmware is measured into the TPM prior to being executed, we can at least detect that the firmware has been tampered with. In this case nothing prevents the firmware being modified, there's just a record in your TPM that it's no longer the same as it was when you encrypted the secret. So, here's what I'd suggest:

1) The default behaviour of technologies like Boot Guard or PSB should be to measure the firmware signing key and whether the firmware has a valid signature into PCR 7 (the TPM register that is also used to record which UEFI Secure Boot signing key is used to verify the bootloader).
2) If the PCR 7 value changes, the disk encryption key release will be blocked, and the user will be redirected to a key recovery process. This should include remote attestation, allowing the user to be informed that their firmware signing situation has changed.
3) Tooling should be provided to switch the policy from merely measuring to verifying, and users at meaningful risk of firmware-based attacks should be encouraged to make use of this tooling

This would allow users to replace their system firmware at will, at the cost of having to re-seal their disk encryption keys against the new TPM measurements. It would provide enough information that, in the (unlikely for most users) scenario that their firmware has actually been modified without their knowledge, they can identify that. And it would allow users who are at high risk to switch to a higher security state, and for hardware that is explicitly intended to be resilient against attacks to have different defaults.

This is frustratingly close to possible with Boot Guard, but I don't think it's quite there. Before you've blown the Boot Guard fuses, the Boot Guard policy can be read out of flash. This means that you can drop a Boot Guard configuration into flash telling the ME to measure the firmware but not prevent it from running. But there are two problems remaining:

1) The measurement is made into PCR 0, and PCR 0 changes every time your firmware is updated. That makes it a bad default for sealing encryption keys.
2) It doesn't look like the policy is measured before being enforced. This means that an attacker can simply reflash modified firmware with a policy that disables measurement and then make a fake measurement that makes it look like the firmware is ok.

Fixing this seems simple enough - the Boot Guard policy should always be measured, and measurements of the policy and the signing key should be made into a PCR other than PCR 0. If an attacker modified the policy, the PCR value would change. If an attacker modified the firmware without modifying the policy, the PCR value would also change. People who are at high risk would run an app that would blow the Boot Guard policy into fuses rather than just relying on the copy in flash, and enable verification as well as measurement. Now if an attacker tampers with the firmware, the system simply refuses to boot and the attacker doesn't get anything.

Things are harder on the AMD side. I can't find any indication that PSB supports measuring the firmware at all, which obviously makes this approach impossible. I'm somewhat surprised by that, and so wouldn't be surprised if it does do a measurement somewhere. If it doesn't, there's a rather more significant problem - if a system has a socketed CPU, and someone has sufficient physical access to replace the firmware, they can just swap out the CPU as well with one that doesn't have PSB enabled. Under normal circumstances the system firmware can detect this and prompt the user, but given that the attacker has just replaced the firmware we can assume that they'd do so with firmware that doesn't decide to tell the user what just happened. In the absence of better documentation, it's extremely hard to say that PSB actually provides meaningful security benefits.

So, overall: I think Boot Guard protects against a real-world attack that matters to a small but important set of targets. I think most of its benefits could be provided in a way that still gave users control over their system firmware, while also permitting high-risk targets to opt-in to stronger guarantees. Based on what's publicly documented about PSB, it's hard to say that it provides real-world security benefits for anyone at present. In both cases, what's actually shipping reduces the control people have over their systems, and should be considered user-hostile.

[1] Assuming that someone's both turning this on and actually looking at the data produced