Search Results: "aba"

17 February 2024

James Valleroy: snac2: a minimalist ActivityPub server in Debian

snac2, currently available in Debian testing and unstable, is described by its upstream as A simple, minimalistic ActivityPub instance written in portable C. It provides an ActivityPub server with a bare-bones web interface. It does not use JavaScript or require a database.
Basic forms for creating a new post, or following someone
ActivityPub is the protocol for federated social networks that is implemented by Mastodon, Pleroma, and other similar server software. Federated social networks are most often used for micro-blogging , or making many small posts. You can decide to follow another user (or bot) to see their posts, even if they happen to be on a different server (as long as the server software is compatible with the ActivityPub standard).
The timeline shows posts from accounts that you follow
In addition, snac2 has preliminary support for the Mastodon Client API. This allows basic support for mobile apps that support Mastodon, but you should expect that many features are not available yet. If you are interested in running a minimalist ActivityPub server on Debian, please try out snac2, and report any bugs that you find.

1 February 2024

Russ Allbery: Review: System Collapse

Review: System Collapse, by Martha Wells
Series: Murderbot Diaries #7
Publisher: Tordotcom
Copyright: 2023
ISBN: 1-250-82698-5
Format: Kindle
Pages: 245
System Collapse is the second Murderbot novel. Including the novellas, it's the 7th in the series. Unlike Fugitive Telemetry, the previous novella that was out of chronological order, this is the direct sequel to Network Effect. A very direct sequel; it picks up just a few days after the previous novel ended. Needless to say, you should not start here. I was warned by other people and therefore re-read Network Effect immediately before reading System Collapse. That was an excellent idea, since this novel opens with a large cast, no dramatis personae, not much in the way of a plot summary, and a lot of emotional continuity from the previous novel. I would grumble about this more, like I have in other reviews, but I thoroughly enjoyed re-reading Network Effect and appreciated the excuse.
ART-drone said, I wouldn t recommend it. I lack a sense of proportional response. I don t advise engaging with me on any level.
Saying much about the plot of this book without spoiling Network Effect and the rest of the series is challenging. Murderbot is suffering from the aftereffects of the events of the previous book more than it expected or would like to admit. It and its humans are in the middle of a complicated multi-way negotiation with some locals, who the corporates are trying to exploit. One of the difficulties in that negotiation is getting people to believe that the corporations are as evil as they actually are, a plot element that has a depressing amount in common with current politics. Meanwhile, Murderbot is trying to keep everyone alive. I loved Network Effect, but that was primarily for the social dynamics. The planet that was central to the novel was less interesting, so another (short) novel about the same planet was a bit of a disappointment. This does give Wells a chance to show in more detail what Murderbot's new allies have been up to, but there is a lot of speculative exploration and detailed descriptions of underground tunnels that I found less compelling than the relationship dynamics of the previous book. (Murderbot, on the other hand, would much prefer exploring creepy abandoned tunnels to talking about its feelings.) One of the things this series continues to do incredibly well, though, is take non-human intelligence seriously in a world where the humans mostly don't. It perfectly fills a gap between Star Wars, where neither the humans nor the story take non-human intelligences seriously (hence the creepy slavery vibes as soon as you start paying attention to droids), and the Culture, where both humans and the story do. The corporates (the bad guys in this series) treat non-human intelligences the way Star Wars treats droids. The good guys treat Murderbot mostly like a strange human, which is better but still wrong, and still don't notice the numerous other machine intelligences. But Wells, as the author, takes all of the non-human characters seriously, which means there are complex and fascinating relationships happening at a level of the story that the human characters are mostly unaware of. I love that Murderbot rarely bothers to explain; if the humans are too blinkered to notice, that's their problem. About halfway into the story, System Collapse hits its stride, not coincidentally at the point where Murderbot befriends some new computers. The rest of the book is great. This was not as good as Network Effect. There is a bit less competence porn at the start, and although that's for good in-story reasons I still missed it. Murderbot's redaction of things it doesn't want to talk about got a bit annoying before it finally resolved. And I was not sufficiently interested in this planet to want to spend two novels on it, at least without another major revelation that didn't come. But it's still a Murderbot novel, which means it has the best first-person narrative voice I've ever read, some great moments, and possibly the most compelling and varied presentation of computer intelligence in science fiction at the moment.
There was no feed ID, but AdaCol2 supplied the name Lucia and when I asked it for more info, the gender signifier bb (which didn t translate) and he/him pronouns. (I asked because the humans would bug me for the information; I was as indifferent to human gender as it was possible to be without being unconscious.)
This is not a series to read out of order, but if you have read this far, you will continue to be entertained. You don't need me to tell you this nearly everyone reviewing science fiction is saying it but this series is great and you should read it. Rating: 8 out of 10

30 January 2024

Matthew Palmer: Why Certificate Lifecycle Automation Matters

If you ve perused the ActivityPub feed of certificates whose keys are known to be compromised, and clicked on the Show More button to see the name of the certificate issuer, you may have noticed that some issuers seem to come up again and again. This might make sense after all, if a CA is issuing a large volume of certificates, they ll be seen more often in a list of compromised certificates. In an attempt to see if there is anything that we can learn from this data, though, I did a bit of digging, and came up with some illuminating results.

The Procedure I started off by finding all the unexpired certificates logged in Certificate Transparency (CT) logs that have a key that is in the pwnedkeys database as having been publicly disclosed. From this list of certificates, I removed duplicates by matching up issuer/serial number tuples, and then reduced the set by counting the number of unique certificates by their issuer. This gave me a list of the issuers of these certificates, which looks a bit like this:
/C=BE/O=GlobalSign nv-sa/CN=AlphaSSL CA - SHA256 - G4
/C=GB/ST=Greater Manchester/L=Salford/O=Sectigo Limited/CN=Sectigo RSA Domain Validation Secure Server CA
/C=GB/ST=Greater Manchester/L=Salford/O=Sectigo Limited/CN=Sectigo RSA Organization Validation Secure Server CA
/C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certs.godaddy.com/repository//CN=Go Daddy Secure Certificate Authority - G2
/C=US/ST=Arizona/L=Scottsdale/O=Starfield Technologies, Inc./OU=http://certs.starfieldtech.com/repository//CN=Starfield Secure Certificate Authority - G2
/C=AT/O=ZeroSSL/CN=ZeroSSL RSA Domain Secure Site CA
/C=BE/O=GlobalSign nv-sa/CN=GlobalSign GCC R3 DV TLS CA 2020
Rather than try to work with raw issuers (because, as Andrew Ayer says, The SSL Certificate Issuer Field is a Lie), I mapped these issuers to the organisations that manage them, and summed the counts for those grouped issuers together.

The Data
Lieutenant Commander Data from Star Trek: The Next Generation Insert obligatory "not THAT data" comment here
The end result of this work is the following table, sorted by the count of certificates which have been compromised by exposing their private key:
IssuerCompromised Count
Sectigo170
ISRG (Let's Encrypt)161
GoDaddy141
DigiCert81
GlobalSign46
Entrust3
SSL.com1
If you re familiar with the CA ecosystem, you ll probably recognise that the organisations with large numbers of compromised certificates are also those who issue a lot of certificates. So far, nothing particularly surprising, then. Let s look more closely at the relationships, though, to see if we can get more useful insights.

Volume Control Using the issuance volume report from crt.sh, we can compare issuance volumes to compromise counts, to come up with a compromise rate . I m using the Unexpired Precertificates colume from the issuance volume report, as I feel that s the number that best matches the certificate population I m examining to find compromised certificates. To maintain parity with the previous table, this one is still sorted by the count of certificates that have been compromised.
IssuerIssuance VolumeCompromised CountCompromise Rate
Sectigo88,323,0681701 in 519,547
ISRG (Let's Encrypt)315,476,4021611 in 1,959,480
GoDaddy56,121,4291411 in 398,024
DigiCert144,713,475811 in 1,786,586
GlobalSign1,438,485461 in 31,271
Entrust23,16631 in 7,722
SSL.com171,81611 in 171,816
If we now sort this table by compromise rate, we can see which organisations have the most (and least) leakiness going on from their customers:
IssuerIssuance VolumeCompromised CountCompromise Rate
Entrust23,16631 in 7,722
GlobalSign1,438,485461 in 31,271
SSL.com171,81611 in 171,816
GoDaddy56,121,4291411 in 398,024
Sectigo88,323,0681701 in 519,547
DigiCert144,713,475811 in 1,786,586
ISRG (Let's Encrypt)315,476,4021611 in 1,959,480
By grouping by order-of-magnitude in the compromise rate, we can identify three bands :
  • The Super Leakers: Customers of Entrust and GlobalSign seem to love to lose control of their private keys. For Entrust, at least, though, the small volumes involved make the numbers somewhat untrustworthy. The three compromised certificates could very well belong to just one customer, for instance. I m not aware of anything that GlobalSign does that would make them such an outlier, either, so I m inclined to think they just got unlucky with one or two customers, but as CAs don t include customer IDs in the certificates they issue, it s not possible to say whether that s the actual cause or not.
  • The Regular Leakers: Customers of SSL.com, GoDaddy, and Sectigo all have compromise rates in the 1-in-hundreds-of-thousands range. Again, the low volumes of SSL.com make the numbers somewhat unreliable, but the other two organisations in this group have large enough numbers that we can rely on that data fairly well, I think.
  • The Low Leakers: Customers of DigiCert and Let s Encrypt are at least three times less likely than customers of the regular leakers to lose control of their private keys. Good for them!
Now we have some useful insights we can think about.

Why Is It So?
Professor Julius Sumner Miller If you don't know who Professor Julius Sumner Miller is, I highly recommend finding out
All of the organisations on the list, with the exception of Let s Encrypt, are what one might term traditional CAs. To a first approximation, it s reasonable to assume that the vast majority of the customers of these traditional CAs probably manage their certificates the same way they have for the past two decades or more. That is, they generate a key and CSR, upload the CSR to the CA to get a certificate, then copy the cert and key somewhere. Since humans are handling the keys, there s a higher risk of the humans using either risky practices, or making a mistake, and exposing the private key to the world. Let s Encrypt, on the other hand, issues all of its certificates using the ACME (Automatic Certificate Management Environment) protocol, and all of the Let s Encrypt documentation encourages the use of software tools to generate keys, issue certificates, and install them for use. Given that Let s Encrypt has 161 compromised certificates currently in the wild, it s clear that the automation in use is far from perfect, but the significantly lower compromise rate suggests to me that lifecycle automation at least reduces the rate of key compromise, even though it doesn t eliminate it completely.

Explaining the Outlier The difference in presumed issuance practices would seem to explain the significant difference in compromise rates between Let s Encrypt and the other organisations, if it weren t for one outlier. This is a largely traditional CA, with the manual-handling issues that implies, but with a compromise rate close to that of Let s Encrypt. We are, of course, talking about DigiCert. The thing about DigiCert, that doesn t show up in the raw numbers from crt.sh, is that DigiCert manages the issuance of certificates for several of the biggest hosted TLS providers, such as CloudFlare and AWS. When these services obtain a certificate from DigiCert on their customer s behalf, the private key is kept locked away, and no human can (we hope) get access to the private key. This is supported by the fact that no certificates identifiably issued to either CloudFlare or AWS appear in the set of certificates with compromised keys. When we ask for all certificates issued by DigiCert , we get both the certificates issued to these big providers, which are very good at keeping their keys under control, as well as the certificates issued to everyone else, whose key handling practices may not be quite so stringent. It s possible, though not trivial, to account for certificates issued to these hosted TLS providers, because the certificates they use are issued from intermediates branded to those companies. With the crt.sh psql interface we can run this query to get the total number of unexpired precertificates issued to these managed services:
SELECT SUM(sub.NUM_ISSUED[2] - sub.NUM_EXPIRED[2])
  FROM (
    SELECT ca.name, max(coalesce(coalesce(nullif(trim(cc.SUBORDINATE_CA_OWNER), ''), nullif(trim(cc.CA_OWNER), '')), cc.INCLUDED_CERTIFICATE_OWNER)) as OWNER,
           ca.NUM_ISSUED, ca.NUM_EXPIRED
      FROM ccadb_certificate cc, ca_certificate cac, ca
     WHERE cc.CERTIFICATE_ID = cac.CERTIFICATE_ID
       AND cac.CA_ID = ca.ID
  GROUP BY ca.ID
  ) sub
 WHERE sub.name ILIKE '%Amazon%' OR sub.name ILIKE '%CloudFlare%' AND sub.owner = 'DigiCert';
The number I get from running that query is 104,316,112, which should be subtracted from DigiCert s total issuance figures to get a more accurate view of what DigiCert s regular customers do with their private keys. When I do this, the compromise rates table, sorted by the compromise rate, looks like this:
IssuerIssuance VolumeCompromised CountCompromise Rate
Entrust23,16631 in 7,722
GlobalSign1,438,485461 in 31,271
SSL.com171,81611 in 171,816
GoDaddy56,121,4291411 in 398,024
"Regular" DigiCert40,397,363811 in 498,732
Sectigo88,323,0681701 in 519,547
All DigiCert144,713,475811 in 1,786,586
ISRG (Let's Encrypt)315,476,4021611 in 1,959,480
In short, it appears that DigiCert s regular customers are just as likely as GoDaddy or Sectigo customers to expose their private keys.

What Does It All Mean? The takeaway from all this is fairly straightforward, and not overly surprising, I believe.

The less humans have to do with certificate issuance, the less likely they are to compromise that certificate by exposing the private key. While it may not be surprising, it is nice to have some empirical evidence to back up the common wisdom. Fully-managed TLS providers, such as CloudFlare, AWS Certificate Manager, and whatever Azure s thing is called, is the platonic ideal of this principle: never give humans any opportunity to expose a private key. I m not saying you should use one of these providers, but the security approach they have adopted appears to be the optimal one, and should be emulated universally. The ACME protocol is the next best, in that there are a variety of standardised tools widely available that allow humans to take themselves out of the loop, but it s still possible for humans to handle (and mistakenly expose) key material if they try hard enough. Legacy issuance methods, which either cannot be automated, or require custom, per-provider automation to be developed, appear to be at least four times less helpful to the goal of avoiding compromise of the private key associated with a certificate.

Humans Are, Of Course, The Problem
Bender, the robot from Futurama, asking if we'd like to kill all humans No thanks, Bender, I'm busy tonight
This observation that if you don t let humans near keys, they don t get leaked is further supported by considering the biggest issuers by volume who have not issued any certificates whose keys have been compromised: Google Trust Services (fourth largest issuer overall, with 57,084,529 unexpired precertificates), and Microsoft Corporation (sixth largest issuer overall, with 22,852,468 unexpired precertificates). It appears that somewhere between most and basically all of the certificates these organisations issue are to customers of their public clouds, and my understanding is that the keys for these certificates are managed in same manner as CloudFlare and AWS the keys are locked away where humans can t get to them. It should, of course, go without saying that if a human can never have access to a private key, it makes it rather difficult for a human to expose it. More broadly, if you are building something that handles sensitive or secret data, the more you can do to keep humans out of the loop, the better everything will be.

Your Support is Appreciated If you d like to see more analysis of how key compromise happens, and the lessons we can learn from examining billions of certificates, please show your support by buying me a refreshing beverage. Trawling CT logs is thirsty work.

Appendix: Methodology Limitations In the interests of clarity, I feel it s important to describe ways in which my research might be flawed. Here are the things I know of that may have impacted the accuracy, that I couldn t feasibly account for.
  • Time Periods: Because time never stops, there is likely to be some slight mismatches in the numbers obtained from the various data sources, because they weren t collected at exactly the same moment.
  • Issuer-to-Organisation Mapping: It s possible that the way I mapped issuers to organisations doesn t match exactly with how crt.sh does it, meaning that counts might be skewed. I tried to minimise that by using the same data sources (the CCADB AllCertificates report) that I believe that crt.sh uses for its mapping, but I cannot be certain of a perfect match.
  • Unwarranted Grouping: I ve drawn some conclusions about the practices of the various organisations based on their general approach to certificate issuance. If a particular subordinate CA that I ve grouped into the parent organisation is managed in some unusual way, that might cause my conclusions to be erroneous. I was able to fairly easily separate out CloudFlare, AWS, and Azure, but there are almost certainly others that I didn t spot, because hoo boy there are a lot of intermediate CAs out there.

21 January 2024

Debian Brasil: MiniDebConf BH 2024 - patroc nio e financiamento coletivo

MiniDebConf BH 2024 J est rolando a inscri o de participante e a chamada de atividades para a MiniDebConf Belo Horizonte 2024, que acontecer de 27 a 30 de abril no Campus Pampulha da UFMG. Este ano estamos ofertando bolsas de alimenta o, hospedagem e passagens para contribuidores(as) ativos(as) do Projeto Debian. Patroc nio: Para a realiza o da MiniDebConf, estamos buscando patroc nio financeiro de empresas e entidades. Ent o se voc trabalha em uma empresa/entidade (ou conhece algu m que trabalha em uma) indique o nosso plano de patroc nio para ela. L voc ver os valores de cada cota e os seus benef cios. Financiamento coletivo: Mas voc tamb m pode ajudar a realiza o da MiniDebConf por meio do nosso financiamento coletivo! Fa a uma doa o de qualquer valor e tenha o seu nome publicado no site do evento como apoiador(a) da MiniDebConf Belo Horizonte 2024. Mesmo que voc n o pretenda vir a Belo Horizonte para participar do evento, voc pode doar e assim contribuir para o mais importante evento do Projeto Debian no Brasil. Contato Qualquer d vida, mande um email para contato@debianbrasil.org.br Organiza o Debian Brasil Debian Debian MG DCC

18 January 2024

Russell Coker: LicheePi 4A (RISC-V) First Look

I Just bought a LicheePi 4A RISC-V embedded computer (like a RaspberryPi but with a RISC-V CPU) for $322.68 from Aliexpress (the official site for buying LicheePi devices). Here is the Sipheed web page about it and their other recent offerings [1]. I got the version with 16G of RAM and 128G of storage, I probably don t need that much storage (I can use NFS or USB) but 16G of RAM is good for VMs. Here is the Wiki about this board [2]. Configuration When you get one of these devices you should make setting up ssh server your first priority. I found the HDMI output to be very unreliable. The first monitor I tried was a Samsung 4K monitor dating from when 4K was a new thing, the LicheePi initially refused to operate at a resolution higher than 1024*768 but later on switched to 4K resolution when resuming from screen-blank for no apparent reason (and the window manager didn t support this properly). On the Dell 4K monitor I use on my main workstation it sometimes refused to talk to it and occasionally worked. I got it running at 1920*1080 without problems and then switched it to 4K and it lost video sync and never talked to that monitor again. On my Desklab portabable 4K monitor I got it to display in 4K resolution but only the top left 1/4 of the screen displayed. The issues with HDMI monitor support greatly limit the immediate potential for using this as a workstation. It doesn t make it impossible but would be fiddly at best. It s quite likely that a future OS update will fix this. But at the moment it s best used as a server. The LicheePi has a custom Linux distribution based on Ubuntu so you want too put something like the following in /etc/network/interfaces to make it automatically connect to the ethernet when plugged in:
auto end0
iface end0 inet dhcp
Then to get sshd to start you have to run the following commands to generate ssh host keys that aren t zero bytes long:
rm /etc/ssh/ssh_host_*
systemctl restart ssh.service
It appears to have wifi hardware but the OS doesn t recognise it. This isn t a priority for me as I mostly want to use it as a server. Performance For the first test of performance I created a 100MB file from /dev/urandom and then tried compressing it on various systems. With zstd -9 it took 16.893 user seconds on the LicheePi4A, 0.428s on my Thinkpad X1 Carbon Gen5 with a i5-6300U CPU (Debian/Unstable), 1.288s on my E5-2696 v3 workstation (Debian/Bookworm), 0.467s on the E5-2696 v3 running Debian/Unstable, 2.067s on a E3-1271 v3 server, and 7.179s on the E3-1271 v3 system emulating a RISC-V system via QEMU running Debian/Unstable. It s very impressive that the QEMU emulation is fast enough that emulating a different CPU architecture is only 3.5* slower for this test (or maybe 10* slower if it was running Debian/Unstable on the AMD64 code)! The emulated RISC-V is also more than twice as fast as real RISC-V hardware and probably of comparable speed to real RISC-V hardware when running the same versions (and might be slightly slower if running the same version of zstd) which is a tribute to the quality of emulation. One performance issue that most people don t notice is the time taken to negotiate ssh sessions. It s usually not noticed because the common CPUs have got faster at about the same rate as the algorithms for encryption and authentication have become more complex. On my i5-6300U laptop it takes 0m0.384s to run ssh -i ~/.ssh/id_ed25519 localhost id with the below server settings (taken from advice on ssh-audit.com [3] for a secure ssh configuration). On the E3-1271 v3 server it is 0.336s, on the QMU system it is 28.022s, and on the LicheePi it is 0.592s. By this metric the LicheePi is about 80% slower than decent x86 systems and the QEMU emulation of RISC-V is 73* slower than the x86 system it runs on. Does crypto depend on instructions that are difficult to emulate?
HostKey /etc/ssh/ssh_host_ed25519_key
KexAlgorithms -ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group14-sha256
MACs -umac-64-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1
I haven t yet tested the performance of Ethernet (what routing speed can you get through the 2 gigabit ports?), emmc storage, and USB. At the moment I ve been focused on using RISC-V as a test and development platform. My conclusion is that I m glad I don t plan to compile many kernels or anything large like LibreOffice. But that for typical development that I do it will be quite adequate. The speed of Chromium seems adequate in basic tests, but the video output hasn t worked reliably enough to do advanced tests. Hardware Features Having two Gigabit Ethernet ports, 4 USB-3 ports, and Wifi on board gives some great options for using this as a router. It s disappointing that they didn t go with 2.5Gbit as everyone seems to be doing that nowadays but Gigabit is enough for most things. Having only a single HDMI port and not supporting USB-C docks (the USB-C port appears to be power only) limits what can be done for workstation use and for controlling displays. I know of people using small ARM computers attached to the back of large TVs for advertising purposes and that isn t going to be a great option for this. The CPU and RAM apparently uses a lot of power (which is relative the entire system draws up to 2A at 5V so the CPU would be something below 5W). To get this working a cooling fan has to be stuck to the CPU and RAM chips via a layer of thermal stuff that resembles a fine sheet of blu-tack in both color and stickyness. I am disappointed that there isn t any more solid form of construction, to mount this on a wall or ceiling some extra hardware would be needed to secure this. Also if they just had a really big copper heatsink I think that would be better. 80386 CPUs with similar TDP were able to run without a fan. I wonder how things would work with all USB ports in use. It s expected that a USB port can supply a minimum of 2.5W which means that all the ports could require 10W if they were active. Presumably something significantly less than 5W is available for the USB ports. Other Devices Sipheed has a range of other devices in the works. They currently sell the LicheeCluster4A which support 7 compute modules for a cluster in a box. This has some interesting potential for testing and demonstrating cluster software but you could probably buy an AMD64 system with more compute power for less money. The Lichee Console 4A is a tiny laptop which could be useful for people who like the 7 laptop form factor, unfortunately it only has a 1280*800 display if it had the same resolution display as a typical 7 phone I would have bought one. The next device that appeals to me is the soon to be released Lichee Pad 4A which is a 10.1 tablet with 1920*1200 display, Wifi6, Bluetooth 5.4, and 16G of RAM. It also has 1 USB-C connection, 2*USB-3 sockets, and support for an external card with 2*Gigabit ethernet. It s a tablet as a laptop without keyboard instead of the more common larger phone design model. They are also about to release the LicheePadMax4A which is similar to the other tablet but with a 14 2240*1400 display and which ships with a keyboard to make it essentially a laptop with detachable keyboard. Conclusion At this time I wouldn t recommend that this device be used as a workstation or laptop, although the people who want to do such things will probably do it anyway regardless of my recommendations. I think it will be very useful as a test system for RISC-V development. I have some friends who are interested in this sort of thing and I can give them VMs. It is a bit expensive. The Sipheed web site boasts about the LicheePi4 being faster than the RaspberryPi4, but it s not a lot faster and the RaspberryPi4 is much cheaper ($127 or $129 for one with 8G of RAM). The RaspberryPi4 has two HDMI ports but a limit of 8G of RAM while the LicheePi has up to 16G of RAM and two Gigabit Ethernet ports but only a single HDMI port. It seems that the RaspberryPi4 might win if you want a cheap low power desktop system. At this time I think the reason for this device is testing out RISC-V as an alternative to the AMD64 and ARM64 architectures. An open CPU architecture goes well with free software, but it isn t just people who are into FOSS who are testing such things. I know some corporations are trying out RISC-V as a way of getting other options for embedded systems that don t involve paying monopolists. The Lichee Console 4A is probably a usable tiny laptop if the resolution is sufficient for your needs. As an aside I predict that the tiny laptop or pocket computer segment will take off in the near future. There are some AMD64 systems the size of a phone but thicker that run Windows and go for reasonable prices on AliExpress. Hopefully in the near future this device will have better video drivers and be usable as a small and quiet workstation. I won t rule out the possibility of making this my main workstation in the not too distant future, all it needs is reliable 4K display and the ability to decode 4K video. It s performance for web browsing and as an ssh client seems adequate, and that s what matters for my workstation use. But for the moment it s just for server use.

17 January 2024

Colin Watson: Task management

Now that I m freelancing, I need to actually track my time, which is something I ve had the luxury of not having to do before. That meant something of a rethink of the way I ve been keeping track of my to-do list. Up to now that was a combination of things like the bug lists for the projects I m working on at the moment, whatever task tracking system Canonical was using at the moment (Jira when I left), and a giant flat text file in which I recorded logbook-style notes of what I d done each day plus a few extra notes at the bottom to remind myself of particularly urgent tasks. I could have started manually adding times to each logbook entry, but ugh, let s not. In general, I had the following goals (which were a bit reminiscent of my address book): I didn t do an elaborate evaluation of multiple options, because I m not trying to come up with the best possible solution for a client here. Also, there are a bazillion to-do list trackers out there and if I tried to evaluate them all I d never do anything else. I just wanted something that works well enough for me. Since it came up on Mastodon: a bunch of people swear by Org mode, which I know can do at least some of this sort of thing. However, I don t use Emacs and don t plan to use Emacs. nvim-orgmode does have some support for time tracking, but when I ve tried vim-based versions of Org mode in the past I ve found they haven t really fitted my brain very well. Taskwarrior and Timewarrior One of the other Freexian collaborators mentioned Taskwarrior and Timewarrior, so I had a look at those. The basic idea of Taskwarrior is that you have a task command that tracks each task as a blob of JSON and provides subcommands to let you add, modify, and remove tasks with a minimum of friction. task add adds a task, and you can add metadata like project:Personal (I always make sure every task has a project, for ease of filtering). Just running task shows you a task list sorted by Taskwarrior s idea of urgency, with an ID for each task, and there are various other reports with different filtering and verbosity. task <id> annotate lets you attach more information to a task. task <id> done marks it as done. So far so good, so a redacted version of my to-do list looks like this:
$ task ls
ID A Project     Tags                 Description
17   Freexian                         Add Incus support to autopkgtest [2]
 7   Columbiform                      Figure out Lloyds online banking [1]
 2   Debian                           Fix troffcvt for groff 1.23.0 [1]
11   Personal                         Replace living room curtain rail
Once I got comfortable with it, this was already a big improvement. I haven t bothered to learn all the filtering gadgets yet, but it was easy enough to see that I could do something like task all project:Personal and it d show me both pending and completed tasks in that project, and that all the data was stored in ~/.task - though I have to say that there are enough reporting bells and whistles that I haven t needed to poke around manually. In combination with the regular backups that I do anyway (you do too, right?), this gave me enough confidence to abandon my previous text-file logbook approach. Next was time tracking. Timewarrior integrates with Taskwarrior, albeit in an only semi-packaged way, and it was easy enough to set that up. Now I can do:
$ task 25 start
Starting task 00a9516f 'Write blog post about task tracking'.
Started 1 task.
Note: '"Write blog post about task tracking"' is a new tag.
Tracking Columbiform "Write blog post about task tracking"
  Started 2024-01-10T11:28:38
  Current                  38
  Total               0:00:00
You have more urgent tasks.
Project 'Columbiform' is 25% complete (3 of 4 tasks remaining).
When I stop work on something, I do task active to find the ID, then task <id> stop. Timewarrior does the tedious stopwatch business for me, and I can manually enter times if I forget to start/stop a task. Then the really useful bit: I can do something like timew summary :month <name-of-client> and it tells me how much to bill that client for this month. Perfect. I also started using VIT to simplify the day-to-day flow a little, which means I m normally just using one or two keystrokes rather than typing longer commands. That isn t really necessary from my point of view, but it does save some time. Android integration I left Android integration for a bit later since it wasn t essential. When I got round to it, I have to say that it felt a bit clumsy, but it did eventually work. The first step was to set up a taskserver. Most of the setup procedure was OK, but I wanted to use Let s Encrypt to minimize the amount of messing around with CAs I had to do. Getting this to work involved hitting things with sticks a bit, and there s still a local CA involved for client certificates. What I ended up with was a certbot setup with the webroot authenticator and a custom deploy hook as follows (with cert_name replaced by a DNS name in my house domain):
#! /bin/sh
set -eu
cert_name=taskd.example.org
found=false
for domain in $RENEWED_DOMAINS; do
    case "$domain" in
        $cert_name)
            found=:
            ;;
    esac
done
$found   exit 0
install -m 644 "/etc/letsencrypt/live/$cert_name/fullchain.pem" \
    /var/lib/taskd/pki/fullchain.pem
install -m 640 -g Debian-taskd "/etc/letsencrypt/live/$cert_name/privkey.pem" \
    /var/lib/taskd/pki/privkey.pem
systemctl restart taskd.service
I could then set this in /etc/taskd/config (server.crl.pem and ca.cert.pem were generated using the documented taskserver setup procedure):
server.key=/var/lib/taskd/pki/privkey.pem
server.cert=/var/lib/taskd/pki/fullchain.pem
server.crl=/var/lib/taskd/pki/server.crl.pem
ca.cert=/var/lib/taskd/pki/ca.cert.pem
Then I could set taskd.ca on my laptop to /usr/share/ca-certificates/mozilla/ISRG_Root_X1.crt and otherwise follow the client setup instructions, run task sync init to get things started, and then task sync every so often to sync changes between my laptop and the taskserver. I used TaskWarrior Mobile as the client. I have to say I wouldn t want to use that client as my primary task tracking interface: the setup procedure is clunky even beyond the necessity of copying a client certificate around, it expects you to give it a .taskrc rather than having a proper settings interface for that, and it only seems to let you add a task if you specify a due date for it. It also lacks Timewarrior integration, so I can only really use it when I don t care about time tracking, e.g. personal tasks. But that s really all I need, so it meets my minimum requirements. Next? Considering this is literally the first thing I tried, I have to say I m pretty happy with it. There are a bunch of optional extras I haven t tried yet, but in general it kind of has the vim nature for me: if I need something it s very likely to exist or easy enough to build, but the features I don t use don t get in my way. I wouldn t recommend any of this to somebody who didn t already spend most of their time in a terminal - but I do. I m glad people have gone to all the effort to build this so I didn t have to.

16 January 2024

Matthew Palmer: Pwned Certificates on the Fediverse

As well as the collection and distribution of compromised keys, the pwnedkeys project also matches those pwned keys against issued SSL certificates. I m excited to announce that, as of the beginning of 2024, all matched certificates are now being published on the Fediverse, thanks to the botsin.space Mastodon server. Want to know which sites are susceptible to interception and interference, in (near-)real time? Do you have a burning desire to know who is issuing certificates to people that post their private keys in public? Now you can.

How It Works The process for publishing pwned certs is, roughly, as follows:
  1. All the certificates in Certificate Transparency (CT) logs are hoovered up (using my scrape-ct-log tool, the fastest log scraper in the west!), and the fingerprint of the public key of each certificate is stored in an LMDB datafile.
  2. As new private keys are identified as having been compromised, the fingerprint of that key is checked against all the LMDB files, which map key fingerprints to certificates (actually to CT log entry IDs, from which the certificates themselves are retrieved).
  3. If one or more matches are found, then the certificates using the compromised key are forwarded to the tooter , which publishes them for the world to marvel at.
This makes it sound all very straightforward, and it is in theory. The trick comes in optimising the pipeline so that the five million or so new certificates every day can get indexed on the one slightly middle-aged server I ve got, without getting backlogged.

Why Don t You Just Have the Certificates Revoked? Funny story about that I used to notify CAs of certificates they d issued using compromised keys, which had the effect of requiring them to revoke the associated certificates. However, several CAs disliked having to revoke all those certificates, because it cost them staff time (and hence money) to do so. They went so far as to change their procedures from the standard way of accepting problem reports (emailing a generic attestation of compromise), and instead required CA-specific hoop-jumping to notify them of compromised keys. Since the effectiveness of revocation in the WebPKI is, shall we say, homeopathic at best, I decided I couldn t be bothered to play whack-a-mole with CAs that just wanted to be difficult, and I stopped sending compromised key notifications to CAs. Instead, now I m publishing the details of compromised certificates to everyone, so that users can protect themselves directly should they choose to.

Further Work The astute amongst you may have noticed, in the above How It Works description, a bit of a gap in my scanning coverage. CAs can (and do!) issue certificates for keys that are already compromised, including weak keys that have been known about for a decade or more (1, 2, 3). However, as currently implemented, the pwnedkeys certificate checker does not automatically find such certificates. My plan is to augment the CT scraping / cert processing pipeline to check all incoming certificates against the existing (2M+) set of pwned keys. Though, with over five million new certificates to check every day, it s not necessarily as simple as just hit the pwnedkeys API for every new cert . The poor old API server might not like that very much.

Support My Work If you d like to see this extra matching happen a bit quicker, I ve setup a ko-fi supporters page, where you can support my work on pwnedkeys and the other open source software and projects I work on by buying me a refreshing beverage. I would be very appreciative, and your support lets me know I should do more interesting things with the giant database of compromised keys I ve accumulated.

14 January 2024

Debian Brasil: MiniDebConf BH 2024 - abertura de inscri o e chamada de atividades

MiniDebConf BH 2024 Est aberta a inscri o de participantes e a chamada de atividades para a MiniDebConf Belo Horizonte 2024 e para o FLISOL - Festival Latino-americano de Instala o de Software Livre. Veja abaixo algumas informa es importantes: Data e local da MiniDebConf e do FLISOL A MiniDebConf acontecer de 27 a 30 de abril no Campus Pampulha da UFMG - Universidade Federal de Minas Gerais. No dia 27 (s bado) tamb m realizaremos uma edi o do FLISOL - Festival Latino-americano de Instala o de Software Livre, evento que acontece no mesmo dia em v rias cidades da Am rica Latina. Enquanto a MiniDebConf ter atividades focados no Debian, o FLISOL ter atividades gerais sobre Software Livre e temas relacionados como linguagem de programa o, CMS, administra o de redes e sistemas, filosofia, liberdade, licen as, etc. Inscri o gratuita e oferta de bolsas Voc j pode realizar a sua inscri o gratuita para a MiniDebConf Belo Horizonte 2024. A MiniDebConf um evento aberto a todas as pessoas, independente do seu n vel de conhecimento sobre Debian. O mais importante ser reunir a comunidade para celebrar um dos maiores projeto de Software Livre no mundo, por isso queremos receber desde usu rios(as) inexperientes que est o iniciando o seu contato com o Debian at Desenvolvedores(as) oficiais do projeto. Ou seja, est o todos(as) convidados(as)! Este ano estamos ofertando bolsas de hospedagem e passagens para viabilizar a vinda de pessoas de outras cidades que contribuem para o Projeto Debian. Contribuidores(as) n o oficiais, DMs e DDs podem solicitar as bolsas usando o formul rio de inscri o. Tamb m estamos ofertando bolsas de alimenta o para todos(as) os(as) participantes, mesmo n o contribuidores(as), e pessoas que moram na regi o de BH. Os recursos financeiros s o bastante limitados, mas tentaremos atender o m ximo de pedidos. Se voc pretende pedir alguma dessas bolsas, acesse este link e veja mais informa es antes de realizar a sua inscri o: A inscri o (sem bolsas) poder ser feita at a data do evento, mas temos uma data limite para o pedido de bolsas de hospedagem e passagens, por isso fique atento(a) ao prazo final: at 18 de fevereiro. Como estamos usando mesmo formul rio para os dois eventos, a inscri o ser v lida tanto para a MiniDebConf quanto para o FLISOL. Para se inscrever, acesse o site, v em Criar conta. Criei a sua conta (preferencialmente usando o Salsa) e acesse o seu perfil. L voc ver o bot o de Se inscrever. https://bh.mini.debconf.org Chamada de atividades Tamb m est aberta a chamada de atividades tanto para MiniDebConf quanto para o FLISOL. Para mais informa es, acesse este link. Fique atento ao prazo final para enviar sua proposta de atividade: at 18 de fevereiro. Contato Qualquer d vida, mande um email para contato@debianbrasil.org.br Organiza o Debian Brasil Debian Debian MG DCC

9 January 2024

Louis-Philippe V ronneau: 2023 A Musical Retrospective

I ended 2022 with a musical retrospective and very much enjoyed writing that blog post. As such, I have decided to do the same for 2023! From now on, this will probably be an annual thing :) Albums In 2023, I added 73 new albums to my collection nearly 2 albums every three weeks! I listed them below in the order in which I acquired them. I purchased most of these albums when I could and borrowed the rest at libraries. If you want to browse though, I added links to the album covers pointing either to websites where you can buy them or to Discogs when digital copies weren't available. Once again this year, it seems that Punk (mostly O !) and Metal dominate my list, mostly fueled by Angry Metal Guy and the amazing Montr al Skinhead/Punk concert scene. Concerts A trend I started in 2022 was to go to as many concerts of artists I like as possible. I'm happy to report I went to around 80% more concerts in 2023 than in 2022! Looking back at my list, April was quite a busy month... Here are the concerts I went to in 2023: Although metalfinder continues to work as intended, I'm very glad to have discovered the Montr al underground scene has departed from Facebook/Instagram and adopted en masse Gancio, a FOSS community agenda that supports ActivityPub. Our local instance, askapunk.net is pretty much all I could ask for :) That's it for 2023!

3 January 2024

John Goerzen: Consider Security First

I write this in the context of my decision to ditch Raspberry Pi OS and move everything I possibly can, including my Raspberry Pi devices, to Debian. I will write about that later. But for now, I wanted to comment on something I think is often overlooked and misunderstood by people considering distributions or operating systems: the huge importance of getting security updates in an automated and easy way.

Background Let s assume that these statements are true, which I think are well-supported by available evidence:
  1. Every computer system (OS plus applications) that can do useful modern work has security vulnerabilities, some of which are unknown at any given point in time;
  2. During the lifetime of that computer system, some of these vulnerabilities will be discovered. For a (hopefully large) subset of those vulnerabilities, timely patches will become available.
Now then, it follows that applying those timely patches is a critical part of having a system that it as secure as possible. Of course, you have to do other things as well good passwords, secure practices, etc but, fundamentally, if your system lacks patches for known vulnerabilities, you ve already lost at the security ballgame.

How to stay patched There is something of a continuum of how you might patch your system. It runs roughly like this, from best to worst:
  1. All components are kept up-to-date automatically, with no intervention from the user/operator
  2. The operator is automatically alerted to necessary patches, and they can be easily installed with minimal intervention
  3. The operator is automatically alerted to necessary patches, but they require significant effort to apply
  4. The operator has no way to detect vulnerabilities or necessary patches
It should be obvious that the first situation is ideal. Every other situation relies on the timeliness of human action to keep up-to-date with security patches. This is a fallible situation; humans are busy, take trips, dismiss alerts, miss alerts, etc. That said, it is rare to find any system living truly all the way in that scenario, as you ll see.

What is your system ? A critical point here is: what is your system ? It includes:
  • Your kernel
  • Your base operating system
  • Your applications
  • All the libraries needed to run all of the above
Some OSs, such as Debian, make little or no distinction between the base OS and the applications. Others, such as many BSDs, have a distinction there. And in some cases, people will compile or install applications outside of any OS mechanism. (It must be stressed that by doing so, you are taking the responsibility of patching them on your own shoulders.)

How do common systems stack up?
  • Debian, with its support for unattended-upgrades, needrestart, debian-security-support, and such, is largely category 1. It can automatically apply security patches, in most cases can restart the necessary services for the patch to take effect, and will alert you when some processes or the system must be manually restarted for a patch to take effect (for instance, a kernel update). Those cases requiring manual intervention are category 2. The debian-security-support package will even warn you of gaps in the system. You can also use debsecan to scan for known vulnerabilities on a given installation.
  • FreeBSD has no way to automatically install security patches for things in the packages collection. As with many rolling-release systems, you can t automate the installation of these security patches with FreeBSD because it is not safe to blindly update packages. It s not safe to blindly update packages because they may bring along more than just security patches: they may represent major upgrades that introduce incompatibilities, etc. Unlike Debian s practice of backporting fixes and thus producing narrowly-tailored patches, forcing upgrades to newer versions precludes a minimal intervention install. Therefore, rolling release systems are category 3.
  • Things such as Snap, Flatpak, AppImage, Docker containers, Electron apps, and third-party binaries often contain embedded libraries and such for which you have no easy visibility into their status. For instance, if there was a bug in libpng, would you know how many of your containers had a vulnerability? These systems are category 4 you don t even know if you re vulnerable. It s for this reason that my Debian-based Docker containers apply security patches before starting processes, and also run unattended-upgrades and friends.

The pernicious library problem As mentioned in my last category above, hidden vulnerabilities can be a big problem. I ve been writing about this for years. Back in 2017, I wrote an article focused on Docker containers, but which applies to the other systems like Snap and so forth. I cited a study back then that Over 80% of the :latest versions of official images contained at least one high severity vulnerability. The situation is no better now. In December 2023, it was reported that, two years after the critical Log4Shell vulnerability, 25% of apps were still vulnerable to it. Also, only 21% of developers ever update third-party libraries after introducing them into their projects. Clearly, you can t rely on these images with embedded libraries to be secure. And since they are black box, they are difficult to audit. Debian s policy of always splitting libraries out from packages is hugely beneficial; it allows finegrained analysis of not just vulnerabilities, but also the dependency graph. If there s a vulnerability in libpng, you have one place to patch it and you also know exactly what components of your system use it. If you use snaps, or AppImages, you can t know if they contain a deeply embedded vulnerability, nor could you patch it yourself if you even knew. You are at the mercy of upstream detecting and remedying the problem a dicey situation at best.

Who makes the patches? Fundamentally, humans produce security patches. Often, but not always, patches originate with the authors of a program and then are integrated into distribution packages. It should be noted that every security team has finite resources; there will always be some CVEs that aren t patched in a given system for various reasons; perhaps they are not exploitable, or are too low-impact, or have better mitigations than patches. Debian has an excellent security team; they manage the process of integrating patches into Debian, produce Debian Security Advisories, maintain the Debian Security Tracker (which maintains cross-references with the CVE database), etc. Some distributions don t have this infrastructure. For instance, I was unable to find this kind of tracker for Devuan or Raspberry Pi OS. In contrast, Ubuntu and Arch Linux both seem to have active security teams with trackers and advisories.

Implications for Raspberry Pi OS and others As I mentioned above, I m transitioning my Pi devices off Raspberry Pi OS (Raspbian). Security is one reason. Although Raspbian is a fork of Debian, and you can install packages like unattended-upgrades on it, they don t work right because they use the Debian infrastructure, and Raspbian hasn t modified them to use their own infrastructure. I don t see any Raspberry Pi OS security advisories, trackers, etc. In short, they lack the infrastructure to support those Debian tools anyhow. Not only that, but Raspbian lags behind Debian in both new releases and new security patches, sometimes by days or weeks. Live Migrating from Raspberry Pi OS bullseye to Debian bookworm contains instructions for migrating Raspberry Pis to Debian.

27 December 2023

Russ Allbery: Review: A Study in Scarlet

Review: A Study in Scarlet, by Arthur Conan Doyle
Series: Sherlock Holmes #1
Publisher: AmazonClassics
Copyright: 1887
Printing: February 2018
ISBN: 1-5039-5525-7
Format: Kindle
Pages: 159
A Study in Scarlet is the short mystery novel (probably a novella, although I didn't count words) that introduced the world to Sherlock Holmes. I'm going to invoke the 100-year-rule and discuss the plot of this book rather freely on the grounds that even someone who (like me prior to a few days ago) has not yet read it is probably not that invested in avoiding all spoilers. If you do want to remain entirely unspoiled, exercise caution before reading on. I had somehow managed to avoid ever reading anything by Arthur Conan Doyle, not even a short story. I therefore couldn't be sure that some of the assertions I was making in my review of A Study in Honor were correct. Since A Study in Scarlet would be quick to read, I decided on a whim to do a bit of research and grab a free copy of the first Holmes novel. Holmes is such a part of English-speaking culture that I thought I had a pretty good idea of what to expect. This was largely true, but cultural osmosis had somehow not prepared me for the surprise Mormons. A Study in Scarlet establishes the basic parameters of a Holmes story: Dr. James Watson as narrator, the apartment he shares with Holmes at 221B Baker Street, the Baker Street Irregulars, Holmes's competition with police detectives, and his penchant for making leaps of logical deduction from subtle clues. The story opens with Watson meeting Holmes, agreeing to split the rent of a flat, and being baffled by the apparent randomness of Holmes's fields of study before Holmes reveals he's a consulting detective. The first case is a murder: a man is found dead in an abandoned house, without a mark on him although there are blood splatters on the walls and the word "RACHE" written in blood. Since my only prior exposure to Holmes was from cultural references and a few TV adaptations, there were a few things that surprised me. One is that Holmes is voluble and animated rather than aloof. Doyle is clearly going for passionate eccentric rather than calculating mastermind. Another is that he is intentionally and unabashedly ignorant on any topic not related to solving mysteries.
My surprise reached a climax, however, when I found incidentally that he was ignorant of the Copernican Theory and of the composition of the Solar System. That any civilized human being in this nineteenth century should not be aware that the earth travelled round the sun appeared to be to me such an extraordinary fact that I could hardly realize it. "You appear to be astonished," he said, smiling at my expression of surprise. "Now that I do know it I shall do my best to forget it." "To forget it!" "You see," he explained, "I consider that a man's brain originally is like a little empty attic, and you have to stock it with such furniture as you chose. A fool takes in all the lumber of every sort that he comes across, so that the knowledge which might be useful to him gets crowded out, or at best is jumbled up with a lot of other things so that he has a difficulty in laying his hands upon it. Now the skilful workman is very careful indeed as to what he takes into his brain-attic. He will have nothing but the tools which may help him in doing his work, but of these he has a large assortment, and all in the most perfect order. It is a mistake to think that that little room has elastic walls and can distend to any extent. Depend upon it there comes a time when for every addition of knowledge you forget something that you knew before. It is of the highest importance, therefore, not to have useless facts elbowing out the useful ones."
This is directly contrary to my expectation that the best way to make leaps of deduction is to know something about a huge range of topics so that one can draw unexpected connections, particularly given the puzzle-box construction and odd details so beloved in classic mysteries. I'm now curious if Doyle stuck with this conception, and if there were any later mysteries that involved astronomy. Speaking of classic mysteries, A Study in Scarlet isn't quite one, although one can see the shape of the genre to come. Doyle does not "play fair" by the rules that have not yet been invented. Holmes at most points knows considerably more than the reader, including bits of evidence that are not described until Holmes describes them and research that Holmes does off-camera and only reveals when he wants to be dramatic. This is not the sort of story where the reader is encouraged to try to figure out the mystery before the detective. Rather, what Doyle seems to be aiming for, and what Watson attempts (unsuccessfully) as the reader surrogate, is slightly different: once Holmes makes one of his grand assertions, the reader is encouraged to guess what Holmes might have done to arrive at that conclusion. Doyle seems to want the reader to guess technique rather than outcome, while providing only vague clues in general descriptions of Holmes's behavior at a crime scene. The structure of this story is quite odd. The first part is roughly what you would expect: first-person narration from Watson, supposedly taken from his journals but not at all in the style of a journal and explicitly written for an audience. Part one concludes with Holmes capturing and dramatically announcing the name of the killer, who the reader has never heard of before. Part two then opens with... a western?
In the central portion of the great North American Continent there lies an arid and repulsive desert, which for many a long year served as a barrier against the advance of civilization. From the Sierra Nevada to Nebraska, and from the Yellowstone River in the north to the Colorado upon the south, is a region of desolation and silence. Nor is Nature always in one mood throughout the grim district. It comprises snow-capped and lofty mountains, and dark and gloomy valleys. There are swift-flowing rivers which dash through jagged ca ons; and there are enormous plains, which in winter are white with snow, and in summer are grey with the saline alkali dust. They all preserve, however, the common characteristics of barrenness, inhospitality, and misery.
First, I have issues with the geography. That region contains some of the most beautiful areas on earth, and while a lot of that region is arid, describing it primarily as a repulsive desert is a bit much. Doyle's boundaries and distances are also confusing: the Yellowstone is a northeast-flowing river with its source in Wyoming, so the area between it and the Colorado does not extend to the Sierra Nevadas (or even to Utah), and it's not entirely clear to me that he realizes Nevada exists. This is probably what it's like for people who live anywhere else in the world when US authors write about their country. But second, there's no Holmes, no Watson, and not even the pretense of a transition from the detective novel that we were just reading. Doyle just launches into a random western with an omniscient narrator. It features a lean, grizzled man and an adorable child that he adopts and raises into a beautiful free spirit, who then falls in love with a wild gold-rush adventurer. This was written about 15 years before the first critically recognized western novel, so I can't blame Doyle for all the cliches here, but to a modern reader all of these characters are straight from central casting. Well, except for the villains, who are the Mormons. By that, I don't mean that the villains are Mormon. I mean Brigham Young is the on-page villain, plotting against the hero to force his adopted daughter into a Mormon harem (to use the word that Doyle uses repeatedly) and ruling Salt Lake City with an iron hand, border guards with passwords (?!), and secret police. This part of the book was wild. I was laughing out-loud at the sheer malevolent absurdity of the thirty-day countdown to marriage, which I doubt was the intended effect. We do eventually learn that this is the backstory of the murder, but we don't return to Watson and Holmes for multiple chapters. Which leads me to the other thing that surprised me: Doyle lays out this backstory, but then never has his characters comment directly on the morality of it, only the spectacle. Holmes cares only for the intellectual challenge (and for who gets credit), and Doyle sets things up so that the reader need not concern themselves with aftermath, punishment, or anything of that sort. I probably shouldn't have been surprised this does fit with the Holmes stereotype but I'm used to modern fiction where there is usually at least some effort to pass judgment on the events of the story. Doyle draws very clear villains, but is utterly silent on whether the murder is justified. Given its status in the history of literature, I'm not sorry to have read this book, but I didn't particularly enjoy it. It is very much of its time: everyone's moral character is linked directly to their physical appearance, and Doyle uses the occasional racial stereotype without a second thought. Prevailing writing styles have changed, so the prose feels long-winded and breathless. The rivalry between Holmes and the police detectives is tedious and annoying. I also find it hard to read novels from before the general absorption of techniques of emotional realism and interiority into all genres. The characters in A Study in Scarlet felt more like cartoon characters than fully-realized human beings. I have no strong opinion about the objective merits of this book in the context of its time other than to note that the sudden inserted western felt very weird. My understanding is that this is not considered one of the better Holmes stories, and Holmes gets some deeper characterization later on. Maybe I'll try another of Doyle's works someday, but for now my curiosity has been sated. Followed by The Sign of the Four. Rating: 4 out of 10

12 December 2023

Raju Devidas: Nextcloud AIO install with docker-compose and nginx reverse proxy

Nextcloud AIO install with docker-compose and nginx reverse proxyNextcloud is a popular self-hosted solution for file sync and share as well as cloud apps such as document editing, chat and talk, calendar, photo gallery etc. This guide will walk you through setting up Nextcloud AIO using Docker Compose. This blog post would not be possible without immense help from Sahil Dhiman a.k.a. sahilisterThere are various ways in which the installation could be done, in our setup here are the pre-requisites.

Step 1 : The docker-compose file for nextcloud AIOThe original compose.yml file is present in nextcloud AIO&aposs git repo here . By taking a reference of that file, we have own compose.yml here.
services:
  nextcloud-aio-mastercontainer:
    image: nextcloud/all-in-one:latest
    init: true
    restart: always
    container_name: nextcloud-aio-mastercontainer # This line is not allowed to be changed as otherwise AIO will not work correctly
    volumes:
      - nextcloud_aio_mastercontainer:/mnt/docker-aio-config # This line is not allowed to be changed as otherwise the built-in backup solution will not work
      - /var/run/docker.sock:/var/run/docker.sock:ro # May be changed on macOS, Windows or docker rootless. See the applicable documentation. If adjusting, don&apost forget to also set &aposWATCHTOWER_DOCKER_SOCKET_PATH&apos!
    ports:
      - 8080:8080
    environment: # Is needed when using any of the options below
      # - AIO_DISABLE_BACKUP_SECTION=false # Setting this to true allows to hide the backup section in the AIO interface. See https://github.com/nextcloud/all-in-one#how-to-disable-the-backup-section
      - APACHE_PORT=32323 # Is needed when running behind a web server or reverse proxy (like Apache, Nginx, Cloudflare Tunnel and else). See https://github.com/nextcloud/all-in-one/blob/main/reverse-proxy.md
      - APACHE_IP_BINDING=127.0.0.1 # Should be set when running behind a web server or reverse proxy (like Apache, Nginx, Cloudflare Tunnel and else) that is running on the same host. See https://github.com/nextcloud/all-in-one/blob/main/reverse-proxy.md
      # - BORG_RETENTION_POLICY=--keep-within=7d --keep-weekly=4 --keep-monthly=6 # Allows to adjust borgs retention policy. See https://github.com/nextcloud/all-in-one#how-to-adjust-borgs-retention-policy
      # - COLLABORA_SECCOMP_DISABLED=false # Setting this to true allows to disable Collabora&aposs Seccomp feature. See https://github.com/nextcloud/all-in-one#how-to-disable-collaboras-seccomp-feature
      - NEXTCLOUD_DATADIR=/opt/docker/cloud.raju.dev/nextcloud # Allows to set the host directory for Nextcloud&aposs datadir.   Warning: do not set or adjust this value after the initial Nextcloud installation is done! See https://github.com/nextcloud/all-in-one#how-to-change-the-default-location-of-nextclouds-datadir
      # - NEXTCLOUD_MOUNT=/mnt/ # Allows the Nextcloud container to access the chosen directory on the host. See https://github.com/nextcloud/all-in-one#how-to-allow-the-nextcloud-container-to-access-directories-on-the-host
      # - NEXTCLOUD_UPLOAD_LIMIT=10G # Can be adjusted if you need more. See https://github.com/nextcloud/all-in-one#how-to-adjust-the-upload-limit-for-nextcloud
      # - NEXTCLOUD_MAX_TIME=3600 # Can be adjusted if you need more. See https://github.com/nextcloud/all-in-one#how-to-adjust-the-max-execution-time-for-nextcloud
      # - NEXTCLOUD_MEMORY_LIMIT=512M # Can be adjusted if you need more. See https://github.com/nextcloud/all-in-one#how-to-adjust-the-php-memory-limit-for-nextcloud
      # - NEXTCLOUD_TRUSTED_CACERTS_DIR=/path/to/my/cacerts # CA certificates in this directory will be trusted by the OS of the nexcloud container (Useful e.g. for LDAPS) See See https://github.com/nextcloud/all-in-one#how-to-trust-user-defined-certification-authorities-ca
      # - NEXTCLOUD_STARTUP_APPS=deck twofactor_totp tasks calendar contacts notes # Allows to modify the Nextcloud apps that are installed on starting AIO the first time. See https://github.com/nextcloud/all-in-one#how-to-change-the-nextcloud-apps-that-are-installed-on-the-first-startup
      # - NEXTCLOUD_ADDITIONAL_APKS=imagemagick # This allows to add additional packages to the Nextcloud container permanently. Default is imagemagick but can be overwritten by modifying this value. See https://github.com/nextcloud/all-in-one#how-to-add-os-packages-permanently-to-the-nextcloud-container
      # - NEXTCLOUD_ADDITIONAL_PHP_EXTENSIONS=imagick # This allows to add additional php extensions to the Nextcloud container permanently. Default is imagick but can be overwritten by modifying this value. See https://github.com/nextcloud/all-in-one#how-to-add-php-extensions-permanently-to-the-nextcloud-container
      # - NEXTCLOUD_ENABLE_DRI_DEVICE=true # This allows to enable the /dev/dri device in the Nextcloud container.   Warning: this only works if the &apos/dev/dri&apos device is present on the host! If it should not exist on your host, don&apost set this to true as otherwise the Nextcloud container will fail to start! See https://github.com/nextcloud/all-in-one#how-to-enable-hardware-transcoding-for-nextcloud
      # - NEXTCLOUD_KEEP_DISABLED_APPS=false # Setting this to true will keep Nextcloud apps that are disabled in the AIO interface and not uninstall them if they should be installed. See https://github.com/nextcloud/all-in-one#how-to-keep-disabled-apps
      # - TALK_PORT=3478 # This allows to adjust the port that the talk container is using. See https://github.com/nextcloud/all-in-one#how-to-adjust-the-talk-port
      # - WATCHTOWER_DOCKER_SOCKET_PATH=/var/run/docker.sock # Needs to be specified if the docker socket on the host is not located in the default &apos/var/run/docker.sock&apos. Otherwise mastercontainer updates will fail. For macos it needs to be &apos/var/run/docker.sock&apos
    # networks: # Is needed when you want to create the nextcloud-aio network with ipv6-support using this file, see the network config at the bottom of the file
      # - nextcloud-aio # Is needed when you want to create the nextcloud-aio network with ipv6-support using this file, see the network config at the bottom of the file
      # - SKIP_DOMAIN_VALIDATION=true
    # # Uncomment the following line when using SELinux
    # security_opt: ["label:disable"]
volumes: # If you want to store the data on a different drive, see https://github.com/nextcloud/all-in-one#how-to-store-the-filesinstallation-on-a-separate-drive
  nextcloud_aio_mastercontainer:
    name: nextcloud_aio_mastercontainer # This line is not allowed to be changed as otherwise the built-in backup solution will not work
I have not removed many of the commented options in the compose file, for a possibility of me using them in the future.If you want a smaller cleaner compose with the extra options, you can refer to
services:
  nextcloud-aio-mastercontainer:
    image: nextcloud/all-in-one:latest
    init: true
    restart: always
    container_name: nextcloud-aio-mastercontainer
    volumes:
      - nextcloud_aio_mastercontainer:/mnt/docker-aio-config
      - /var/run/docker.sock:/var/run/docker.sock:ro
    ports:
      - 8080:8080
    environment:
      - APACHE_PORT=32323
      - APACHE_IP_BINDING=127.0.0.1
      - NEXTCLOUD_DATADIR=/opt/docker/nextcloud
volumes:
  nextcloud_aio_mastercontainer:
    name: nextcloud_aio_mastercontainer
I am using a separate directory to store nextcloud data. As per nextcloud documentation you should be using a separate partition if you want to use this feature, however I did not have that option on my server, so I used a separate directory instead. Also we use a custom port on which nextcloud listens for operations, we have set it up as 32323 above, but you can use any in the permissible port range. The 8080 port is used the setup the AIO management interface. Both 8080 and the APACHE_PORT do not need to be open on the host machine, as we will be using reverse proxy setup with nginx to direct requests. once you have your preferred compose.yml file, you can start the containers using
$ docker-compose -f compose.yml up -d 
Creating network "clouddev_default" with the default driver
Creating volume "nextcloud_aio_mastercontainer" with default driver
Creating nextcloud-aio-mastercontainer ... done
once your container&aposs are running, we can do the nginx setup.

Step 2: Configuring nginx reverse proxy for our domain on host. A reference nginx configuration for nextcloud AIO is given in the nextcloud git repository here . You can modify the configuration file according to your needs and setup. Here is configuration that we are using

map $http_upgrade $connection_upgrade  
    default upgrade;
    &apos&apos close;
 
server  
    listen 80;
    #listen [::]:80;            # comment to disable IPv6
    if ($scheme = "http")  
        return 301 https://$host$request_uri;
     
    listen 443 ssl http2;      # for nginx versions below v1.25.1
    #listen [::]:443 ssl http2; # for nginx versions below v1.25.1 - comment to disable IPv6
    # listen 443 ssl;      # for nginx v1.25.1+
    # listen [::]:443 ssl; # for nginx v1.25.1+ - keep comment to disable IPv6
    # http2 on;                                 # uncomment to enable HTTP/2        - supported on nginx v1.25.1+
    # http3 on;                                 # uncomment to enable HTTP/3 / QUIC - supported on nginx v1.25.0+
    # quic_retry on;                            # uncomment to enable HTTP/3 / QUIC - supported on nginx v1.25.0+
    # add_header Alt-Svc &aposh3=":443"; ma=86400&apos; # uncomment to enable HTTP/3 / QUIC - supported on nginx v1.25.0+
    # listen 443 quic reuseport;       # uncomment to enable HTTP/3 / QUIC - supported on nginx v1.25.0+ - please remove "reuseport" if there is already another quic listener on port 443 with enabled reuseport
    # listen [::]:443 quic reuseport;  # uncomment to enable HTTP/3 / QUIC - supported on nginx v1.25.0+ - please remove "reuseport" if there is already another quic listener on port 443 with enabled reuseport - keep comment to disable IPv6
    server_name cloud.example.com;
    location /  
        proxy_pass http://127.0.0.1:32323$request_uri;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Port $server_port;
        proxy_set_header X-Forwarded-Scheme $scheme;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header Accept-Encoding "";
        proxy_set_header Host $host;
    
        client_body_buffer_size 512k;
        proxy_read_timeout 86400s;
        client_max_body_size 0;
        # Websocket
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection $connection_upgrade;
     
    ssl_certificate /etc/letsencrypt/live/cloud.example.com/fullchain.pem; # managed by Certbot
    ssl_certificate_key /etc/letsencrypt/live/cloud.example.com/privkey.pem; # managed by Certbot
    ssl_session_timeout 1d;
    ssl_session_cache shared:MozSSL:10m; # about 40000 sessions
    ssl_session_tickets off;
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:DHE-RSA-CHACHA20-POLY1305;
    ssl_prefer_server_ciphers on;
    # Optional settings:
    # OCSP stapling
    # ssl_stapling on;
    # ssl_stapling_verify on;
    # ssl_trusted_certificate /etc/letsencrypt/live/<your-nc-domain>/chain.pem;
    # replace with the IP address of your resolver
    # resolver 127.0.0.1; # needed for oscp stapling: e.g. use 94.140.15.15 for adguard / 1.1.1.1 for cloudflared or 8.8.8.8 for google - you can use the same nameserver as listed in your /etc/resolv.conf file
 
Please note that you need to have valid SSL certificates for your domain for this configuration to work. Steps on getting valid SSL certificates for your domain are beyond the scope of this article. You can give a web search on getting SSL certificates with letsencrypt and you will get several resources on that, or may write a blog post on it separately in the future.once your configuration for nginx is done, you can test the nginx configuration using
$ sudo nginx -t 
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
and then reload nginx with
$ sudo nginx -s reload

Step 3: Setup of Nextcloud AIO from the browser.To setup nextcloud AIO, we need to access it using the web browser on URL of our domain.tld:8080, however we do not want to open the 8080 port publicly to do this, so to complete the setup, here is a neat hack from sahilister
ssh -L 8080:127.0.0.1:8080 username:<server-ip>
you can bind the 8080 port of your server to the 8080 of your localhost using Unix socket forwarding over SSH.The port forwarding only last for the duration of your SSH session, if the SSH session breaks, your port forwarding will to. So, once you have the port forwarded, you can open the nextcloud AIO instance in your web browser at 127.0.0.1:8080
Nextcloud AIO install with docker-compose and nginx reverse proxy
you will get this error because you are trying to access a page on localhost over HTTPS. You can click on advanced and then continue to proceed to the next page. Your data is encrypted over SSH for this session as we are binding the port over SSH. Depending on your choice of browser, the above page might look different.once you have proceeded, the nextcloud AIO interface will open and will look something like this.
Nextcloud AIO install with docker-compose and nginx reverse proxynextcloud AIO initial screen with capsicums as password
It will show an auto generated passphrase, you need to save this passphrase and make sure to not loose it. For the purposes of security, I have masked the passwords with capsicums. once you have noted down your password, you can proceed to the Nextcloud AIO login, enter your password and then login. After login you will be greeted with a screen like this.
Nextcloud AIO install with docker-compose and nginx reverse proxy
now you can put the domain that you want to use in the Submit domain field. Once the domain check is done, you will proceed to the next step and see another screen like this
Nextcloud AIO install with docker-compose and nginx reverse proxy
here you can select any optional containers for the features that you might want. IMPORTANT: Please make sure to also change the time zone at the bottom of the page according to the time zone you wish to operate in.
Nextcloud AIO install with docker-compose and nginx reverse proxy
The timezone setup is also important because the data base will get initialized according to the set time zone. This could result in wrong initialization of database and you ending up in a startup loop for nextcloud. I faced this issue and could only resolve it after getting help from sahilister . Once you are done changing the timezone, and selecting any additional features you want, you can click on Download and start the containersIt will take some time for this process to finish, take a break and look at the farthest object in your room and take a sip of water. Once you are done, and the process has finished you will see a page similar to the following one.
Nextcloud AIO install with docker-compose and nginx reverse proxy
wait patiently for everything to turn green.
Nextcloud AIO install with docker-compose and nginx reverse proxy
once all the containers have started properly, you can open the nextcloud login interface on your configured domain, the initial login details are auto generated as you can see from the above screenshot. Again you will see a password that you need to note down or save to enter the nextcloud interface. Capsicums will not work as passwords. I have masked the auto generated passwords using capsicums.Now you can click on Open your Nextcloud button or go to your configured domain to access the login screen.
Nextcloud AIO install with docker-compose and nginx reverse proxy
You can use the login details from the previous step to login to the administrator account of your Nextcloud instance. There you have it, your very own cloud!

Additional Notes:

How to properly reset Nextcloud setup?While following the above steps, or while following steps from some other tutorial, you may have made a mistake, and want to start everything again from scratch. The instructions for it are present in the Nextcloud documentation here . Here is the TLDR for a docker-compose setup. These steps will delete all data, do not use these steps on an existing nextcloud setup unless you know what you are doing.
  • Stop your master container.
docker-compose -f compose.yml down -v
The above command will also remove the volume associated with the master container
  • Stop all the child containers that has been started by the master container.
docker stop nextcloud-aio-apache nextcloud-aio-notify-push nextcloud-aio-nextcloud nextcloud-aio-imaginary nextcloud-aio-fulltextsearch nextcloud-aio-redis nextcloud-aio-database nextcloud-aio-talk nextcloud-aio-collabora
  • Remove all the child containers that has been started by the master container
docker rm nextcloud-aio-apache nextcloud-aio-notify-push nextcloud-aio-nextcloud nextcloud-aio-imaginary nextcloud-aio-fulltextsearch nextcloud-aio-redis nextcloud-aio-database nextcloud-aio-talk nextcloud-aio-collabora
  • If you also wish to remove all images associated with nextcloud you can do it with
docker rmi $(docker images --filter "reference=nextcloud/*" -q)
  • remove all volumes associated with child containers
docker volume rm <volume-name>
  • remove the network associated with nextcloud
docker network rm nextcloud-aio

Additional references.
  1. Nextcloud Github
  2. Nextcloud reverse proxy documentation
  3. Nextcloud Administration Guide
  4. Nextcloud User Manual
  5. Nextcloud Developer&aposs manual

21 November 2023

Mike Hommey: How I (kind of) killed Mercurial at Mozilla

Did you hear the news? Firefox development is moving from Mercurial to Git. While the decision is far from being mine, and I was barely involved in the small incremental changes that ultimately led to this decision, I feel I have to take at least some responsibility. And if you are one of those who would rather use Mercurial than Git, you may direct all your ire at me. But let's take a step back and review the past 25 years leading to this decision. You'll forgive me for skipping some details and any possible inaccuracies. This is already a long post, while I could have been more thorough, even I think that would have been too much. This is also not an official Mozilla position, only my personal perception and recollection as someone who was involved at times, but mostly an observer from a distance. From CVS to DVCS From its release in 1998, the Mozilla source code was kept in a CVS repository. If you're too young to know what CVS is, let's just say it's an old school version control system, with its set of problems. Back then, it was mostly ubiquitous in the Open Source world, as far as I remember. In the early 2000s, the Subversion version control system gained some traction, solving some of the problems that came with CVS. Incidentally, Subversion was created by Jim Blandy, who now works at Mozilla on completely unrelated matters. In the same period, the Linux kernel development moved from CVS to Bitkeeper, which was more suitable to the distributed nature of the Linux community. BitKeeper had its own problem, though: it was the opposite of Open Source, but for most pragmatic people, it wasn't a real concern because free access was provided. Until it became a problem: someone at OSDL developed an alternative client to BitKeeper, and licenses of BitKeeper were rescinded for OSDL members, including Linus Torvalds (they were even prohibited from purchasing one). Following this fiasco, in April 2005, two weeks from each other, both Git and Mercurial were born. The former was created by Linus Torvalds himself, while the latter was developed by Olivia Mackall, who was a Linux kernel developer back then. And because they both came out of the same community for the same needs, and the same shared experience with BitKeeper, they both were similar distributed version control systems. Interestingly enough, several other DVCSes existed: In this landscape, the major difference Git was making at the time was that it was blazing fast. Almost incredibly so, at least on Linux systems. That was less true on other platforms (especially Windows). It was a game-changer for handling large codebases in a smooth manner. Anyways, two years later, in 2007, Mozilla decided to move its source code not to Bzr, not to Git, not to Subversion (which, yes, was a contender), but to Mercurial. The decision "process" was laid down in two rather colorful blog posts. My memory is a bit fuzzy, but I don't recall that it was a particularly controversial choice. All of those DVCSes were still young, and there was no definite "winner" yet (GitHub hadn't even been founded). It made the most sense for Mozilla back then, mainly because the Git experience on Windows still wasn't there, and that mattered a lot for Mozilla, with its diverse platform support. As a contributor, I didn't think much of it, although to be fair, at the time, I was mostly consuming the source tarballs. Personal preferences Digging through my archives, I've unearthed a forgotten chapter: I did end up setting up both a Mercurial and a Git mirror of the Firefox source repository on alioth.debian.org. Alioth.debian.org was a FusionForge-based collaboration system for Debian developers, similar to SourceForge. It was the ancestor of salsa.debian.org. I used those mirrors for the Debian packaging of Firefox (cough cough Iceweasel). The Git mirror was created with hg-fast-export, and the Mercurial mirror was only a necessary step in the process. By that time, I had converted my Subversion repositories to Git, and switched off SVK. Incidentally, I started contributing to Git around that time as well. I apparently did this not too long after Mozilla switched to Mercurial. As a Linux user, I think I just wanted the speed that Mercurial was not providing. Not that Mercurial was that slow, but the difference between a couple seconds and a couple hundred milliseconds was a significant enough difference in user experience for me to prefer Git (and Firefox was not the only thing I was using version control for) Other people had also similarly created their own mirror, or with other tools. But none of them were "compatible": their commit hashes were different. Hg-git, used by the latter, was putting extra information in commit messages that would make the conversion differ, and hg-fast-export would just not be consistent with itself! My mirror is long gone, and those have not been updated in more than a decade. I did end up using Mercurial, when I got commit access to the Firefox source repository in April 2010. I still kept using Git for my Debian activities, but I now was also using Mercurial to push to the Mozilla servers. I joined Mozilla as a contractor a few months after that, and kept using Mercurial for a while, but as a, by then, long time Git user, it never really clicked for me. It turns out, the sentiment was shared by several at Mozilla. Git incursion In the early 2010s, GitHub was becoming ubiquitous, and the Git mindshare was getting large. Multiple projects at Mozilla were already entirely hosted on GitHub. As for the Firefox source code base, Mozilla back then was kind of a Wild West, and engineers being engineers, multiple people had been using Git, with their own inconvenient workflows involving a local Mercurial clone. The most popular set of scripts was moz-git-tools, to incorporate changes in a local Git repository into the local Mercurial copy, to then send to Mozilla servers. In terms of the number of people doing that, though, I don't think it was a lot of people, probably a few handfuls. On my end, I was still keeping up with Mercurial. I think at that time several engineers had their own unofficial Git mirrors on GitHub, and later on Ehsan Akhgari provided another mirror, with a twist: it also contained the full CVS history, which the canonical Mercurial repository didn't have. This was particularly interesting for engineers who needed to do some code archeology and couldn't get past the 2007 cutoff of the Mercurial repository. I think that mirror ultimately became the official-looking, but really unofficial, mozilla-central repository on GitHub. On a side note, a Mercurial repository containing the CVS history was also later set up, but that didn't lead to something officially supported on the Mercurial side. Some time around 2011~2012, I started to more seriously consider using Git for work myself, but wasn't satisfied with the workflows others had set up for themselves. I really didn't like the idea of wasting extra disk space keeping a Mercurial clone around while using a Git mirror. I wrote a Python script that would use Mercurial as a library to access a remote repository and produce a git-fast-import stream. That would allow the creation of a git repository without a local Mercurial clone. It worked quite well, but it was not able to incrementally update. Other, more complete tools existed already, some of which I mentioned above. But as time was passing and the size and depth of the Mercurial repository was growing, these tools were showing their limits and were too slow for my taste, especially for the initial clone. Boot to Git In the same time frame, Mozilla ventured in the Mobile OS sphere with Boot to Gecko, later known as Firefox OS. What does that have to do with version control? The needs of third party collaborators in the mobile space led to the creation of what is now the gecko-dev repository on GitHub. As I remember it, it was challenging to create, but once it was there, Git users could just clone it and have a working, up-to-date local copy of the Firefox source code and its history... which they could already have, but this was the first officially supported way of doing so. Coincidentally, Ehsan's unofficial mirror was having trouble (to the point of GitHub closing the repository) and was ultimately shut down in December 2013. You'll often find comments on the interwebs about how GitHub has become unreliable since the Microsoft acquisition. I can't really comment on that, but if you think GitHub is unreliable now, rest assured that it was worse in its beginning. And its sustainability as a platform also wasn't a given, being a rather new player. So on top of having this official mirror on GitHub, Mozilla also ventured in setting up its own Git server for greater control and reliability. But the canonical repository was still the Mercurial one, and while Git users now had a supported mirror to pull from, they still had to somehow interact with Mercurial repositories, most notably for the Try server. Git slowly creeping in Firefox build tooling Still in the same time frame, tooling around building Firefox was improving drastically. For obvious reasons, when version control integration was needed in the tooling, Mercurial support was always a no-brainer. The first explicit acknowledgement of a Git repository for the Firefox source code, other than the addition of the .gitignore file, was bug 774109. It added a script to install the prerequisites to build Firefox on macOS (still called OSX back then), and that would print a message inviting people to obtain a copy of the source code with either Mercurial or Git. That was a precursor to current bootstrap.py, from September 2012. Following that, as far as I can tell, the first real incursion of Git in the Firefox source tree tooling happened in bug 965120. A few days earlier, bug 952379 had added a mach clang-format command that would apply clang-format-diff to the output from hg diff. Obviously, running hg diff on a Git working tree didn't work, and bug 965120 was filed, and support for Git was added there. That was in January 2014. A year later, when the initial implementation of mach artifact was added (which ultimately led to artifact builds), Git users were an immediate thought. But while they were considered, it was not to support them, but to avoid actively breaking their workflows. Git support for mach artifact was eventually added 14 months later, in March 2016. From gecko-dev to git-cinnabar Let's step back a little here, back to the end of 2014. My user experience with Mercurial had reached a level of dissatisfaction that was enough for me to decide to take that script from a couple years prior and make it work for incremental updates. That meant finding a way to store enough information locally to be able to reconstruct whatever the incremental updates would be relying on (guess why other tools hid a local Mercurial clone under hood). I got something working rather quickly, and after talking to a few people about this side project at the Mozilla Portland All Hands and seeing their excitement, I published a git-remote-hg initial prototype on the last day of the All Hands. Within weeks, the prototype gained the ability to directly push to Mercurial repositories, and a couple months later, was renamed to git-cinnabar. At that point, as a Git user, instead of cloning the gecko-dev repository from GitHub and switching to a local Mercurial repository whenever you needed to push to a Mercurial repository (i.e. the aforementioned Try server, or, at the time, for reviews), you could just clone and push directly from/to Mercurial, all within Git. And it was fast too. You could get a full clone of mozilla-central in less than half an hour, when at the time, other similar tools would take more than 10 hours (needless to say, it's even worse now). Another couple months later (we're now at the end of April 2015), git-cinnabar became able to start off a local clone of the gecko-dev repository, rather than clone from scratch, which could be time consuming. But because git-cinnabar and the tool that was updating gecko-dev weren't producing the same commits, this setup was cumbersome and not really recommended. For instance, if you pushed something to mozilla-central with git-cinnabar from a gecko-dev clone, it would come back with a different commit hash in gecko-dev, and you'd have to deal with the divergence. Eventually, in April 2020, the scripts updating gecko-dev were switched to git-cinnabar, making the use of gecko-dev alongside git-cinnabar a more viable option. Ironically(?), the switch occurred to ease collaboration with KaiOS (you know, the mobile OS born from the ashes of Firefox OS). Well, okay, in all honesty, when the need of syncing in both directions between Git and Mercurial (we only had ever synced from Mercurial to Git) came up, I nudged Mozilla in the direction of git-cinnabar, which, in my (biased but still honest) opinion, was the more reliable option for two-way synchronization (we did have regular conversion problems with hg-git, nothing of the sort has happened since the switch). One Firefox repository to rule them all For reasons I don't know, Mozilla decided to use separate Mercurial repositories as "branches". With the switch to the rapid release process in 2011, that meant one repository for nightly (mozilla-central), one for aurora, one for beta, and one for release. And with the addition of Extended Support Releases in 2012, we now add a new ESR repository every year. Boot to Gecko also had its own branches, and so did Fennec (Firefox for Mobile, before Android). There are a lot of them. And then there are also integration branches, where developer's work lands before being merged in mozilla-central (or backed out if it breaks things), always leaving mozilla-central in a (hopefully) good state. Only one of them remains in use today, though. I can only suppose that the way Mercurial branches work was not deemed practical. It is worth noting, though, that Mercurial branches are used in some cases, to branch off a dot-release when the next major release process has already started, so it's not a matter of not knowing the feature exists or some such. In 2016, Gregory Szorc set up a new repository that would contain them all (or at least most of them), which eventually became what is now the mozilla-unified repository. This would e.g. simplify switching between branches when necessary. 7 years later, for some reason, the other "branches" still exist, but most developers are expected to be using mozilla-unified. Mozilla's CI also switched to using mozilla-unified as base repository. Honestly, I'm not sure why the separate repositories are still the main entry point for pushes, rather than going directly to mozilla-unified, but it probably comes down to switching being work, and not being a top priority. Also, it probably doesn't help that working with multiple heads in Mercurial, even (especially?) with bookmarks, can be a source of confusion. To give an example, if you aren't careful, and do a plain clone of the mozilla-unified repository, you may not end up on the latest mozilla-central changeset, but rather, e.g. one from beta, or some other branch, depending which one was last updated. Hosting is simple, right? Put your repository on a server, install hgweb or gitweb, and that's it? Maybe that works for... Mercurial itself, but that repository "only" has slightly over 50k changesets and less than 4k files. Mozilla-central has more than an order of magnitude more changesets (close to 700k) and two orders of magnitude more files (more than 700k if you count the deleted or moved files, 350k if you count the currently existing ones). And remember, there are a lot of "duplicates" of this repository. And I didn't even mention user repositories and project branches. Sure, it's a self-inflicted pain, and you'd think it could probably(?) be mitigated with shared repositories. But consider the simple case of two repositories: mozilla-central and autoland. You make autoland use mozilla-central as a shared repository. Now, you push something new to autoland, it's stored in the autoland datastore. Eventually, you merge to mozilla-central. Congratulations, it's now in both datastores, and you'd need to clean-up autoland if you wanted to avoid the duplication. Now, you'd think mozilla-unified would solve these issues, and it would... to some extent. Because that wouldn't cover user repositories and project branches briefly mentioned above, which in GitHub parlance would be considered as Forks. So you'd want a mega global datastore shared by all repositories, and repositories would need to only expose what they really contain. Does Mercurial support that? I don't think so (okay, I'll give you that: even if it doesn't, it could, but that's extra work). And since we're talking about a transition to Git, does Git support that? You may have read about how you can link to a commit from a fork and make-pretend that it comes from the main repository on GitHub? At least, it shows a warning, now. That's essentially the architectural reason why. So the actual answer is that Git doesn't support it out of the box, but GitHub has some backend magic to handle it somehow (and hopefully, other things like Gitea, Girocco, Gitlab, etc. have something similar). Now, to come back to the size of the repository. A repository is not a static file. It's a server with which you negotiate what you have against what it has that you want. Then the server bundles what you asked for based on what you said you have. Or in the opposite direction, you negotiate what you have that it doesn't, you send it, and the server incorporates what you sent it. Fortunately the latter is less frequent and requires authentication. But the former is more frequent and CPU intensive. Especially when pulling a large number of changesets, which, incidentally, cloning is. "But there is a solution for clones" you might say, which is true. That's clonebundles, which offload the CPU intensive part of cloning to a single job scheduled regularly. Guess who implemented it? Mozilla. But that only covers the cloning part. We actually had laid the ground to support offloading large incremental updates and split clones, but that never materialized. Even with all that, that still leaves you with a server that can display file contents, diffs, blames, provide zip archives of a revision, and more, all of which are CPU intensive in their own way. And these endpoints are regularly abused, and cause extra load to your servers, yes plural, because of course a single server won't handle the load for the number of users of your big repositories. And because your endpoints are abused, you have to close some of them. And I'm not mentioning the Try repository with its tens of thousands of heads, which brings its own sets of problems (and it would have even more heads if we didn't fake-merge them once in a while). Of course, all the above applies to Git (and it only gained support for something akin to clonebundles last year). So, when the Firefox OS project was stopped, there wasn't much motivation to continue supporting our own Git server, Mercurial still being the official point of entry, and git.mozilla.org was shut down in 2016. The growing difficulty of maintaining the status quo Slowly, but steadily in more recent years, as new tooling was added that needed some input from the source code manager, support for Git was more and more consistently added. But at the same time, as people left for other endeavors and weren't necessarily replaced, or more recently with layoffs, resources allocated to such tooling have been spread thin. Meanwhile, the repository growth didn't take a break, and the Try repository was becoming an increasing pain, with push times quite often exceeding 10 minutes. The ongoing work to move Try pushes to Lando will hide the problem under the rug, but the underlying problem will still exist (although the last version of Mercurial seems to have improved things). On the flip side, more and more people have been relying on Git for Firefox development, to my own surprise, as I didn't really push for that to happen. It just happened organically, by ways of git-cinnabar existing, providing a compelling experience to those who prefer Git, and, I guess, word of mouth. I was genuinely surprised when I recently heard the use of Git among moz-phab users had surpassed a third. I did, however, occasionally orient people who struggled with Mercurial and said they were more familiar with Git, towards git-cinnabar. I suspect there's a somewhat large number of people who never realized Git was a viable option. But that, on its own, can come with its own challenges: if you use git-cinnabar without being backed by gecko-dev, you'll have a hard time sharing your branches on GitHub, because you can't push to a fork of gecko-dev without pushing your entire local repository, as they have different commit histories. And switching to gecko-dev when you weren't already using it requires some extra work to rebase all your local branches from the old commit history to the new one. Clone times with git-cinnabar have also started to go a little out of hand in the past few years, but this was mitigated in a similar manner as with the Mercurial cloning problem: with static files that are refreshed regularly. Ironically, that made cloning with git-cinnabar faster than cloning with Mercurial. But generating those static files is increasingly time-consuming. As of writing, generating those for mozilla-unified takes close to 7 hours. I was predicting clone times over 10 hours "in 5 years" in a post from 4 years ago, I wasn't too far off. With exponential growth, it could still happen, although to be fair, CPUs have improved since. I will explore the performance aspect in a subsequent blog post, alongside the upcoming release of git-cinnabar 0.7.0-b1. I don't even want to check how long it now takes with hg-git or git-remote-hg (they were already taking more than a day when git-cinnabar was taking a couple hours). I suppose it's about time that I clarify that git-cinnabar has always been a side-project. It hasn't been part of my duties at Mozilla, and the extent to which Mozilla supports git-cinnabar is in the form of taskcluster workers on the community instance for both git-cinnabar CI and generating those clone bundles. Consequently, that makes the above git-cinnabar specific issues a Me problem, rather than a Mozilla problem. Taking the leap I can't talk for the people who made the proposal to move to Git, nor for the people who put a green light on it. But I can at least give my perspective. Developers have regularly asked why Mozilla was still using Mercurial, but I think it was the first time that a formal proposal was laid out. And it came from the Engineering Workflow team, responsible for issue tracking, code reviews, source control, build and more. It's easy to say "Mozilla should have chosen Git in the first place", but back in 2007, GitHub wasn't there, Bitbucket wasn't there, and all the available options were rather new (especially compared to the then 21 years-old CVS). I think Mozilla made the right choice, all things considered. Had they waited a couple years, the story might have been different. You might say that Mozilla stayed with Mercurial for so long because of the sunk cost fallacy. I don't think that's true either. But after the biggest Mercurial repository hosting service turned off Mercurial support, and the main contributor to Mercurial going their own way, it's hard to ignore that the landscape has evolved. And the problems that we regularly encounter with the Mercurial servers are not going to get any better as the repository continues to grow. As far as I know, all the Mercurial repositories bigger than Mozilla's are... not using Mercurial. Google has its own closed-source server, and Facebook has another of its own, and it's not really public either. With resources spread thin, I don't expect Mozilla to be able to continue supporting a Mercurial server indefinitely (although I guess Octobus could be contracted to give a hand, but is that sustainable?). Mozilla, being a champion of Open Source, also doesn't live in a silo. At some point, you have to meet your contributors where they are. And the Open Source world is now majoritarily using Git. I'm sure the vast majority of new hires at Mozilla in the past, say, 5 years, know Git and have had to learn Mercurial (although they arguably didn't need to). Even within Mozilla, with thousands(!) of repositories on GitHub, Firefox is now actually the exception rather than the norm. I should even actually say Desktop Firefox, because even Mobile Firefox lives on GitHub (although Fenix is moving back in together with Desktop Firefox, and the timing is such that that will probably happen before Firefox moves to Git). Heck, even Microsoft moved to Git! With a significant developer base already using Git thanks to git-cinnabar, and all the constraints and problems I mentioned previously, it actually seems natural that a transition (finally) happens. However, had git-cinnabar or something similarly viable not existed, I don't think Mozilla would be in a position to take this decision. On one hand, it probably wouldn't be in the current situation of having to support both Git and Mercurial in the tooling around Firefox, nor the resource constraints related to that. But on the other hand, it would be farther from supporting Git and being able to make the switch in order to address all the other problems. But... GitHub? I hope I made a compelling case that hosting is not as simple as it can seem, at the scale of the Firefox repository. It's also not Mozilla's main focus. Mozilla has enough on its plate with the migration of existing infrastructure that does rely on Mercurial to understandably not want to figure out the hosting part, especially with limited resources, and with the mixed experience hosting both Mercurial and git has been so far. After all, GitHub couldn't even display things like the contributors' graph on gecko-dev until recently, and hosting is literally their job! They still drop the ball on large blames (thankfully we have searchfox for those). Where does that leave us? Gitlab? For those criticizing GitHub for being proprietary, that's probably not open enough. Cloud Source Repositories? "But GitHub is Microsoft" is a complaint I've read a lot after the announcement. Do you think Google hosting would have appealed to these people? Bitbucket? I'm kind of surprised it wasn't in the list of providers that were considered, but I'm also kind of glad it wasn't (and I'll leave it at that). I think the only relatively big hosting provider that could have made the people criticizing the choice of GitHub happy is Codeberg, but I hadn't even heard of it before it was mentioned in response to Mozilla's announcement. But really, with literal thousands of Mozilla repositories already on GitHub, with literal tens of millions repositories on the platform overall, the pragmatic in me can't deny that it's an attractive option (and I can't stress enough that I wasn't remotely close to the room where the discussion about what choice to make happened). "But it's a slippery slope". I can see that being a real concern. LLVM also moved its repository to GitHub (from a (I think) self-hosted Subversion server), and ended up moving off Bugzilla and Phabricator to GitHub issues and PRs four years later. As an occasional contributor to LLVM, I hate this move. I hate the GitHub review UI with a passion. At least, right now, GitHub PRs are not a viable option for Mozilla, for their lack of support for security related PRs, and the more general shortcomings in the review UI. That doesn't mean things won't change in the future, but let's not get too far ahead of ourselves. The move to Git has just been announced, and the migration has not even begun yet. Just because Mozilla is moving the Firefox repository to GitHub doesn't mean it's locked in forever or that all the eggs are going to be thrown into one basket. If bridges need to be crossed in the future, we'll see then. So, what's next? The official announcement said we're not expecting the migration to really begin until six months from now. I'll swim against the current here, and say this: the earlier you can switch to git, the earlier you'll find out what works and what doesn't work for you, whether you already know Git or not. While there is not one unique workflow, here's what I would recommend anyone who wants to take the leap off Mercurial right now: As there is no one-size-fits-all workflow, I won't tell you how to organize yourself from there. I'll just say this: if you know the Mercurial sha1s of your previous local work, you can create branches for them with:
$ git branch <branch_name> $(git cinnabar hg2git <hg_sha1>)
At this point, you should have everything available on the Git side, and you can remove the .hg directory. Or move it into some empty directory somewhere else, just in case. But don't leave it here, it will only confuse the tooling. Artifact builds WILL be confused, though, and you'll have to ./mach configure before being able to do anything. You may also hit bug 1865299 if your working tree is older than this post. If you have any problem or question, you can ping me on #git-cinnabar or #git on Matrix. I'll put the instructions above somewhere on wiki.mozilla.org, and we can collaboratively iterate on them. Now, what the announcement didn't say is that the Git repository WILL NOT be gecko-dev, doesn't exist yet, and WON'T BE COMPATIBLE (trust me, it'll be for the better). Why did I make you do all the above, you ask? Because that won't be a problem. I'll have you covered, I promise. The upcoming release of git-cinnabar 0.7.0-b1 will have a way to smoothly switch between gecko-dev and the future repository (incidentally, that will also allow to switch from a pure git-cinnabar clone to a gecko-dev one, for the git-cinnabar users who have kept reading this far). What about git-cinnabar? With Mercurial going the way of the dodo at Mozilla, my own need for git-cinnabar will vanish. Legitimately, this begs the question whether it will still be maintained. I can't answer for sure. I don't have a crystal ball. However, the needs of the transition itself will motivate me to finish some long-standing things (like finalizing the support for pushing merges, which is currently behind an experimental flag) or implement some missing features (support for creating Mercurial branches). Git-cinnabar started as a Python script, it grew a sidekick implemented in C, which then incorporated some Rust, which then cannibalized the Python script and took its place. It is now close to 90% Rust, and 10% C (if you don't count the code from Git that is statically linked to it), and has sort of become my Rust playground (it's also, I must admit, a mess, because of its history, but it's getting better). So the day to day use with Mercurial is not my sole motivation to keep developing it. If it were, it would stay stagnant, because all the features I need are there, and the speed is not all that bad, although I know it could be better. Arguably, though, git-cinnabar has been relatively stagnant feature-wise, because all the features I need are there. So, no, I don't expect git-cinnabar to die along Mercurial use at Mozilla, but I can't really promise anything either. Final words That was a long post. But there was a lot of ground to cover. And I still skipped over a bunch of things. I hope I didn't bore you to death. If I did and you're still reading... what's wrong with you? ;) So this is the end of Mercurial at Mozilla. So long, and thanks for all the fish. But this is also the beginning of a transition that is not easy, and that will not be without hiccups, I'm sure. So fasten your seatbelts (plural), and welcome the change. To circle back to the clickbait title, did I really kill Mercurial at Mozilla? Of course not. But it's like I stumbled upon a few sparks and tossed a can of gasoline on them. I didn't start the fire, but I sure made it into a proper bonfire... and now it has turned into a wildfire. And who knows? 15 years from now, someone else might be looking back at how Mozilla picked Git at the wrong time, and that, had we waited a little longer, we would have picked some yet to come new horse. But hey, that's the tech cycle for you.

20 November 2023

Russ Allbery: Review: The Exiled Fleet

Review: The Exiled Fleet, by J.S. Dewes
Series: Divide #2
Publisher: Tor
Copyright: 2021
ISBN: 1-250-23635-5
Format: Kindle
Pages: 421
The Exiled Fleet is far-future interstellar military SF. It is a direct sequel to The Last Watch. You don't want to start here. The Last Watch took a while to get going, but it ended with some fascinating world-building and a suitably enormous threat. I was hoping Dewes would carry that momentum into the second book. I was disappointed; instead, The Exiled Fleet starts with interpersonal angst and wallowing and takes an annoyingly long time to build up narrative tension again. The world-building of the first book looked outward, towards aliens and strange technology and stranger physics, while setting up contributing problems on the home front. The Exiled Fleet pivots inwards, both in terms of world-building and in terms of character introspection. Neither of those worked as well for me. There's nothing wrong with the revelations here about human power structures and the politics that the Sentinels have been missing at the edge of space, but it also felt like a classic human autocracy without much new to offer in either wee thinky bits or plot structure. We knew most of shape from the start of the first book: Cavalon's grandfather is evil, human society is run as an oligarchy, and everything is trending authoritarian. Once the action started, I was entertained but not gripped the way that I was when reading The Last Watch. Dewes makes a brief attempt to tap into the morally complex question of the military serving as a brake on tyranny, but then does very little with it. Instead, everything is excessively personal, turning the political into less of a confrontation of ideologies or ethics and more a story of family abuse and rebellion. There is even more psychodrama in this book than there was in the previous book. I found it exhausting. Rake is barely functional after the events of the previous book and pushing herself way too hard at the start of this one. Cavalon regresses considerably and starts falling apart again. There's a lot of moping, a lot of angst, and a lot of characters berating themselves and occasionally each other. It was annoying enough that I took a couple of weeks break from this book in the middle before I could work up the enthusiasm to finish it. Some of this is personal preference. My favorite type of story is competence porn: details about something esoteric and satisfyingly complex, a challenge to overcome, and a main character who deploys their expertise to overcome that challenge in a way that shows they generally have their shit together. I can enjoy other types of stories, but that's the story I'll keep reaching for. Other people prefer stories about fuck-ups and walking disasters, people who barely pull together enough to survive the plot (or sometimes not even that). There's nothing wrong with that, and neither approach is right or wrong, but my tolerance for that story is usually lot lower. I think Dewes is heading towards the type of story in which dysfunctional characters compensate for each other's flaws in order to keep each other going, and intellectually I can see the appeal. But it's not my thing, and when the main characters are falling apart and the supporting characters project considerably more competence, I wish the story had different protagonists. It didn't help that this is in theory military SF, but Dewes does not seem to want to deploy any of the support framework of the military to address any of her characters' problems. This book is a lot of Rake and Cavalon dragging each other through emotional turmoil while coming to terms with Cavalon's family. I liked their dynamic in the first book when it felt more like Rake showing leadership skills. Here, it turns into something closer to found family in ways that seemed wildly inconsistent with the military structure, and while I'm normally not one to defend hierarchical discipline, I felt like Rake threw out the only structure she had to handle the thousands of other people under her command and started winging it based on personal friendship. If this were a small commercial crew, sure, fine, but Rake has a personal command responsibility that she obsessively angsts about and yet keeps abandoning. I realize this is probably another way to complain that I wanted competence porn and got barely-functional fuck-ups. The best parts of this series are the strange technologies and the aliens, and they are again the best part of this book. There was a truly great moment involving Viator technology that I found utterly delightful, and there was an intriguing setup for future books that caught my attention. Unfortunately, there were also a lot of deus ex machina solutions to problems, both from convenient undisclosed character backstories and from alien tech. I felt like the characters had to work satisfyingly hard for their victories in the first book; here, I felt like Dewes kept having issues with her characters being at point A and her plot at point B and pulling some rabbit out of the hat to make the plot work. This unfortunately undermined the cool factor of the world-building by making its plot device aspects a bit too obvious. This series also turns out not to be a duology (I have no idea why I thought it would be). By the end of The Exiled Fleet, none of the major political or world-building problems have been resolved. At best, the characters are in a more stable space to start being proactive. I'm cautiously optimistic that could mean the series would turn into the type of story I was hoping for, but I'm worried that Dewes is interested in writing a different type of character story than I am interested in reading. Hopefully there will be some clues in the synopsis of the (as yet unannounced) third book. I thought The Last Watch had some first-novel problems but was worth reading. I am much more reluctant to recommend The Exiled Fleet, or the series as a whole given that it is incomplete. Unless you like dysfunctional characters, proceed with caution. Rating: 5 out of 10

16 November 2023

Dimitri John Ledkov: Ubuntu 23.10 significantly reduces the installed kernel footprint


Photo by Pixabay
Ubuntu systems typically have up to 3 kernels installed, before they are auto-removed by apt on classic installs. Historically the installation was optimized for metered download size only. However, kernel size growth and usage no longer warrant such optimizations. During the 23.10 Mantic Minatour cycle, I led a coordinated effort across multiple teams to implement lots of optimizations that together achieved unprecedented install footprint improvements.

Given a typical install of 3 generic kernel ABIs in the default configuration on a regular-sized VM (2 CPU cores 8GB of RAM) the following metrics are achieved in Ubuntu 23.10 versus Ubuntu 22.04 LTS:

  • 2x less disk space used (1,417MB vs 2,940MB, including initrd)

  • 3x less peak RAM usage for the initrd boot (68MB vs 204MB)

  • 0.5x increase in download size (949MB vs 600MB)

  • 2.5x faster initrd generation (4.5s vs 11.3s)

  • approximately the same total time (103s vs 98s, hardware dependent)


For minimal cloud images that do not install either linux-firmware or modules extra the numbers are:

  • 1.3x less disk space used (548MB vs 742MB)

  • 2.2x less peak RAM usage for initrd boot (27MB vs 62MB)

  • 0.4x increase in download size (207MB vs 146MB)


Hopefully, the compromise of download size, relative to the disk space & initrd savings is a win for the majority of platforms and use cases. For users on extremely expensive and metered connections, the likely best saving is to receive air-gapped updates or skip updates.

This was achieved by precompressing kernel modules & firmware files with the maximum level of Zstd compression at package build time; making actual .deb files uncompressed; assembling the initrd using split cpio archives - uncompressed for the pre-compressed files, whilst compressing only the userspace portions of the initrd; enabling in-kernel module decompression support with matching kmod; fixing bugs in all of the above, and landing all of these things in time for the feature freeze. Whilst leveraging the experience and some of the design choices implementations we have already been shipping on Ubuntu Core. Some of these changes are backported to Jammy, but only enough to support smooth upgrades to Mantic and later. Complete gains are only possible to experience on Mantic and later.

The discovered bugs in kernel module loading code likely affect systems that use LoadPin LSM with kernel space module uncompression as used on ChromeOS systems. Hopefully, Kees Cook or other ChromeOS developers pick up the kernel fixes from the stable trees. Or you know, just use Ubuntu kernels as they do get fixes and features like these first.

The team that designed and delivered these changes is large: Benjamin Drung, Andrea Righi, Juerg Haefliger, Julian Andres Klode, Steve Langasek, Michael Hudson-Doyle, Robert Kratky, Adrien Nader, Tim Gardner, Roxana Nicolescu - and myself Dimitri John Ledkov ensuring the most optimal solution is implemented, everything lands on time, and even implementing portions of the final solution.

Hi, It's me, I am a Staff Engineer at Canonical and we are hiring https://canonical.com/careers.

Lots of additional technical details and benchmarks on a huge range of diverse hardware and architectures, and bikeshedding all the things below:

For questions and comments please post to Kernel section on Ubuntu Discourse.



13 November 2023

Freexian Collaborators: Monthly report about Debian Long Term Support, October 2023 (by Roberto C. S nchez)

Like each month, have a look at the work funded by Freexian s Debian LTS offering.

Debian LTS contributors In October, 18 contributors have been paid to work on Debian LTS, their reports are available:
  • Adrian Bunk did 8.0h (out of 7.75h assigned and 10.0h from previous period), thus carrying over 9.75h to the next month.
  • Anton Gladky did 9.5h (out of 9.5h assigned and 5.5h from previous period), thus carrying over 5.5h to the next month.
  • Bastien Roucari s did 16.0h (out of 16.75h assigned and 1.0h from previous period), thus carrying over 1.75h to the next month.
  • Ben Hutchings did 8.0h (out of 17.75h assigned), thus carrying over 9.75h to the next month.
  • Chris Lamb did 17.0h (out of 17.75h assigned), thus carrying over 0.75h to the next month.
  • Emilio Pozuelo Monfort did 17.5h (out of 17.75h assigned), thus carrying over 0.25h to the next month.
  • Guilhem Moulin did 9.75h (out of 17.75h assigned), thus carrying over 8.0h to the next month.
  • Helmut Grohne did 1.5h (out of 10.0h assigned), thus carrying over 8.5h to the next month.
  • Lee Garrett did 10.75h (out of 17.75h assigned), thus carrying over 7.0h to the next month.
  • Markus Koschany did 30.0h (out of 30.0h assigned).
  • Ola Lundqvist did 4.0h (out of 0h assigned and 19.5h from previous period), thus carrying over 15.5h to the next month.
  • Roberto C. S nchez did 12.0h (out of 5.0h assigned and 7.0h from previous period).
  • Santiago Ruano Rinc n did 13.625h (out of 7.75h assigned and 8.25h from previous period), thus carrying over 2.375h to the next month.
  • Sean Whitton did 13.0h (out of 6.0h assigned and 7.0h from previous period).
  • Sylvain Beucler did 7.5h (out of 11.25h assigned and 6.5h from previous period), thus carrying over 10.25h to the next month.
  • Thorsten Alteholz did 14.0h (out of 14.0h assigned).
  • Tobias Frost did 16.0h (out of 9.25h assigned and 6.75h from previous period).
  • Utkarsh Gupta did 0.0h (out of 0.75h assigned and 17.0h from previous period), thus carrying over 17.75h to the next month.

Evolution of the situation In October, we have released 49 DLAs. Of particular note in the month of October, LTS contributor Chris Lamb issued DLA 3627-1 pertaining to Redis, the popular key-value database similar to Memcached, which was vulnerable to an authentication bypass vulnerability. Fixing this vulnerability involved dealing with a race condition that could allow another process an opportunity to establish an otherwise unauthorized connection. LTS contributor Markus Koschany was involved in the mitigation of CVE-2023-44487, which is a protocol-level vulnerability in the HTTP/2 protocol. The impacts within Debian involved multiple packages, across multiple releases, with multiple advisories being released (both DSA for stable and old-stable, and DLA for LTS). Markus reviewed patches and security updates prepared by other Debian developers, investigated reported regressions, provided patches for the aforementioned regressions, and issued several security updates as part of this. Additionally, as MariaDB 10.3 (the version originally included with Debian buster) passed end-of-life earlier this year, LTS contributor Emilio Pozuelo Monfort has begun investigating the feasibility of backporting MariaDB 10.11. The work is in early stages, with much testing and analysis remaining before a final decision can be made, as this only one of several available potential courses of action concerning MariaDB. Finally, LTS contributor Lee Garrett has invested considerable effort into the development the Functional Test Framework here. While so far only an initial version has been published, it already has several features which we intend to begin leveraging for testing of LTS packages. In particular, the FTF supports provisioning multiple VMs for the purposes of performing functional tests of network-facing services (e.g., file services, authentication, etc.). These tests are in addition to the various unit-level tests which are executed during package build time. Development work will continue on FTF and as it matures and begins to see wider use within LTS we expect to improve the quality of the updates we publish.

Thanks to our sponsors Sponsors that joined recently are in bold.

12 November 2023

Petter Reinholdtsen: New and improved sqlcipher in Debian for accessing Signal database

For a while now I wanted to have direct access to the Signal database of messages and channels of my Desktop edition of Signal. I prefer the enforced end to end encryption of Signal these days for my communication with friends and family, to increase the level of safety and privacy as well as raising the cost of the mass surveillance government and non-government entities practice these days. In August I came across a nice recipe on how to use sqlcipher to extract statistics from the Signal database explaining how to do this. Unfortunately this did not work with the version of sqlcipher in Debian. The sqlcipher package is a "fork" of the sqlite package with added support for encrypted databases. Sadly the current Debian maintainer announced more than three years ago that he did not have time to maintain sqlcipher, so it seemed unlikely to be upgraded by the maintainer. I was reluctant to take on the job myself, as I have very limited experience maintaining shared libraries in Debian. After waiting and hoping for a few months, I gave up the last week, and set out to update the package. In the process I orphaned it to make it more obvious for the next person looking at it that the package need proper maintenance. The version in Debian was around five years old, and quite a lot of changes had taken place upstream into the Debian maintenance git repository. After spending a few days importing the new upstream versions, realising that upstream did not care much for SONAME versioning as I saw library symbols being both added and removed with minor version number changes to the project, I concluded that I had to do a SONAME bump of the library package to avoid surprising the reverse dependencies. I even added a simple autopkgtest script to ensure the package work as intended. Dug deep into the hole of learning shared library maintenance, I set out a few days ago to upload the new version to Debian experimental to see what the quality assurance framework in Debian had to say about the result. The feedback told me the pacakge was not too shabby, and yesterday I uploaded the latest version to Debian unstable. It should enter testing today or tomorrow, perhaps delayed by a small library transition. Armed with a new version of sqlcipher, I can now have a look at the SQL database in ~/.config/Signal/sql/db.sqlite. First, one need to fetch the encryption key from the Signal configuration using this simple JSON extraction command:
/usr/bin/jq -r '."key"' ~/.config/Signal/config.json
Assuming the result from that command is 'secretkey', which is a hexadecimal number representing the key used to encrypt the database. Next, one can now connect to the database and inject the encryption key for access via SQL to fetch information from the database. Here is an example dumping the database structure:
% sqlcipher ~/.config/Signal/sql/db.sqlite
sqlite> PRAGMA key = "x'secretkey'";
sqlite> .schema
CREATE TABLE sqlite_stat1(tbl,idx,stat);
CREATE TABLE conversations(
      id STRING PRIMARY KEY ASC,
      json TEXT,
      active_at INTEGER,
      type STRING,
      members TEXT,
      name TEXT,
      profileName TEXT
    , profileFamilyName TEXT, profileFullName TEXT, e164 TEXT, serviceId TEXT, groupId TEXT, profileLastFetchedAt INTEGER);
CREATE TABLE identityKeys(
      id STRING PRIMARY KEY ASC,
      json TEXT
    );
CREATE TABLE items(
      id STRING PRIMARY KEY ASC,
      json TEXT
    );
CREATE TABLE sessions(
      id TEXT PRIMARY KEY,
      conversationId TEXT,
      json TEXT
    , ourServiceId STRING, serviceId STRING);
CREATE TABLE attachment_downloads(
    id STRING primary key,
    timestamp INTEGER,
    pending INTEGER,
    json TEXT
  );
CREATE TABLE sticker_packs(
    id TEXT PRIMARY KEY,
    key TEXT NOT NULL,
    author STRING,
    coverStickerId INTEGER,
    createdAt INTEGER,
    downloadAttempts INTEGER,
    installedAt INTEGER,
    lastUsed INTEGER,
    status STRING,
    stickerCount INTEGER,
    title STRING
  , attemptedStatus STRING, position INTEGER DEFAULT 0 NOT NULL, storageID STRING, storageVersion INTEGER, storageUnknownFields BLOB, storageNeedsSync
      INTEGER DEFAULT 0 NOT NULL);
CREATE TABLE stickers(
    id INTEGER NOT NULL,
    packId TEXT NOT NULL,
    emoji STRING,
    height INTEGER,
    isCoverOnly INTEGER,
    lastUsed INTEGER,
    path STRING,
    width INTEGER,
    PRIMARY KEY (id, packId),
    CONSTRAINT stickers_fk
      FOREIGN KEY (packId)
      REFERENCES sticker_packs(id)
      ON DELETE CASCADE
  );
CREATE TABLE sticker_references(
    messageId STRING,
    packId TEXT,
    CONSTRAINT sticker_references_fk
      FOREIGN KEY(packId)
      REFERENCES sticker_packs(id)
      ON DELETE CASCADE
  );
CREATE TABLE emojis(
    shortName TEXT PRIMARY KEY,
    lastUsage INTEGER
  );
CREATE TABLE messages(
        rowid INTEGER PRIMARY KEY ASC,
        id STRING UNIQUE,
        json TEXT,
        readStatus INTEGER,
        expires_at INTEGER,
        sent_at INTEGER,
        schemaVersion INTEGER,
        conversationId STRING,
        received_at INTEGER,
        source STRING,
        hasAttachments INTEGER,
        hasFileAttachments INTEGER,
        hasVisualMediaAttachments INTEGER,
        expireTimer INTEGER,
        expirationStartTimestamp INTEGER,
        type STRING,
        body TEXT,
        messageTimer INTEGER,
        messageTimerStart INTEGER,
        messageTimerExpiresAt INTEGER,
        isErased INTEGER,
        isViewOnce INTEGER,
        sourceServiceId TEXT, serverGuid STRING NULL, sourceDevice INTEGER, storyId STRING, isStory INTEGER
        GENERATED ALWAYS AS (type IS 'story'), isChangeCreatedByUs INTEGER NOT NULL DEFAULT 0, isTimerChangeFromSync INTEGER
        GENERATED ALWAYS AS (
          json_extract(json, '$.expirationTimerUpdate.fromSync') IS 1
        ), seenStatus NUMBER default 0, storyDistributionListId STRING, expiresAt INT
        GENERATED ALWAYS
        AS (ifnull(
          expirationStartTimestamp + (expireTimer * 1000),
          9007199254740991
        )), shouldAffectActivity INTEGER
        GENERATED ALWAYS AS (
          type IS NULL
          OR
          type NOT IN (
            'change-number-notification',
            'contact-removed-notification',
            'conversation-merge',
            'group-v1-migration',
            'keychange',
            'message-history-unsynced',
            'profile-change',
            'story',
            'universal-timer-notification',
            'verified-change'
          )
        ), shouldAffectPreview INTEGER
        GENERATED ALWAYS AS (
          type IS NULL
          OR
          type NOT IN (
            'change-number-notification',
            'contact-removed-notification',
            'conversation-merge',
            'group-v1-migration',
            'keychange',
            'message-history-unsynced',
            'profile-change',
            'story',
            'universal-timer-notification',
            'verified-change'
          )
        ), isUserInitiatedMessage INTEGER
        GENERATED ALWAYS AS (
          type IS NULL
          OR
          type NOT IN (
            'change-number-notification',
            'contact-removed-notification',
            'conversation-merge',
            'group-v1-migration',
            'group-v2-change',
            'keychange',
            'message-history-unsynced',
            'profile-change',
            'story',
            'universal-timer-notification',
            'verified-change'
          )
        ), mentionsMe INTEGER NOT NULL DEFAULT 0, isGroupLeaveEvent INTEGER
        GENERATED ALWAYS AS (
          type IS 'group-v2-change' AND
          json_array_length(json_extract(json, '$.groupV2Change.details')) IS 1 AND
          json_extract(json, '$.groupV2Change.details[0].type') IS 'member-remove' AND
          json_extract(json, '$.groupV2Change.from') IS NOT NULL AND
          json_extract(json, '$.groupV2Change.from') IS json_extract(json, '$.groupV2Change.details[0].aci')
        ), isGroupLeaveEventFromOther INTEGER
        GENERATED ALWAYS AS (
          isGroupLeaveEvent IS 1
          AND
          isChangeCreatedByUs IS 0
        ), callId TEXT
        GENERATED ALWAYS AS (
          json_extract(json, '$.callId')
        ));
CREATE TABLE sqlite_stat4(tbl,idx,neq,nlt,ndlt,sample);
CREATE TABLE jobs(
        id TEXT PRIMARY KEY,
        queueType TEXT STRING NOT NULL,
        timestamp INTEGER NOT NULL,
        data STRING TEXT
      );
CREATE TABLE reactions(
        conversationId STRING,
        emoji STRING,
        fromId STRING,
        messageReceivedAt INTEGER,
        targetAuthorAci STRING,
        targetTimestamp INTEGER,
        unread INTEGER
      , messageId STRING);
CREATE TABLE senderKeys(
        id TEXT PRIMARY KEY NOT NULL,
        senderId TEXT NOT NULL,
        distributionId TEXT NOT NULL,
        data BLOB NOT NULL,
        lastUpdatedDate NUMBER NOT NULL
      );
CREATE TABLE unprocessed(
        id STRING PRIMARY KEY ASC,
        timestamp INTEGER,
        version INTEGER,
        attempts INTEGER,
        envelope TEXT,
        decrypted TEXT,
        source TEXT,
        serverTimestamp INTEGER,
        sourceServiceId STRING
      , serverGuid STRING NULL, sourceDevice INTEGER, receivedAtCounter INTEGER, urgent INTEGER, story INTEGER);
CREATE TABLE sendLogPayloads(
        id INTEGER PRIMARY KEY ASC,
        timestamp INTEGER NOT NULL,
        contentHint INTEGER NOT NULL,
        proto BLOB NOT NULL
      , urgent INTEGER, hasPniSignatureMessage INTEGER DEFAULT 0 NOT NULL);
CREATE TABLE sendLogRecipients(
        payloadId INTEGER NOT NULL,
        recipientServiceId STRING NOT NULL,
        deviceId INTEGER NOT NULL,
        PRIMARY KEY (payloadId, recipientServiceId, deviceId),
        CONSTRAINT sendLogRecipientsForeignKey
          FOREIGN KEY (payloadId)
          REFERENCES sendLogPayloads(id)
          ON DELETE CASCADE
      );
CREATE TABLE sendLogMessageIds(
        payloadId INTEGER NOT NULL,
        messageId STRING NOT NULL,
        PRIMARY KEY (payloadId, messageId),
        CONSTRAINT sendLogMessageIdsForeignKey
          FOREIGN KEY (payloadId)
          REFERENCES sendLogPayloads(id)
          ON DELETE CASCADE
      );
CREATE TABLE preKeys(
        id STRING PRIMARY KEY ASC,
        json TEXT
      , ourServiceId NUMBER
        GENERATED ALWAYS AS (json_extract(json, '$.ourServiceId')));
CREATE TABLE signedPreKeys(
        id STRING PRIMARY KEY ASC,
        json TEXT
      , ourServiceId NUMBER
        GENERATED ALWAYS AS (json_extract(json, '$.ourServiceId')));
CREATE TABLE badges(
        id TEXT PRIMARY KEY,
        category TEXT NOT NULL,
        name TEXT NOT NULL,
        descriptionTemplate TEXT NOT NULL
      );
CREATE TABLE badgeImageFiles(
        badgeId TEXT REFERENCES badges(id)
          ON DELETE CASCADE
          ON UPDATE CASCADE,
        'order' INTEGER NOT NULL,
        url TEXT NOT NULL,
        localPath TEXT,
        theme TEXT NOT NULL
      );
CREATE TABLE storyReads (
        authorId STRING NOT NULL,
        conversationId STRING NOT NULL,
        storyId STRING NOT NULL,
        storyReadDate NUMBER NOT NULL,
        PRIMARY KEY (authorId, storyId)
      );
CREATE TABLE storyDistributions(
        id STRING PRIMARY KEY NOT NULL,
        name TEXT,
        senderKeyInfoJson STRING
      , deletedAtTimestamp INTEGER, allowsReplies INTEGER, isBlockList INTEGER, storageID STRING, storageVersion INTEGER, storageUnknownFields BLOB, storageNeedsSync INTEGER);
CREATE TABLE storyDistributionMembers(
        listId STRING NOT NULL REFERENCES storyDistributions(id)
          ON DELETE CASCADE
          ON UPDATE CASCADE,
        serviceId STRING NOT NULL,
        PRIMARY KEY (listId, serviceId)
      );
CREATE TABLE uninstalled_sticker_packs (
        id STRING NOT NULL PRIMARY KEY,
        uninstalledAt NUMBER NOT NULL,
        storageID STRING,
        storageVersion NUMBER,
        storageUnknownFields BLOB,
        storageNeedsSync INTEGER NOT NULL
      );
CREATE TABLE groupCallRingCancellations(
        ringId INTEGER PRIMARY KEY,
        createdAt INTEGER NOT NULL
      );
CREATE TABLE IF NOT EXISTS 'messages_fts_data'(id INTEGER PRIMARY KEY, block BLOB);
CREATE TABLE IF NOT EXISTS 'messages_fts_idx'(segid, term, pgno, PRIMARY KEY(segid, term)) WITHOUT ROWID;
CREATE TABLE IF NOT EXISTS 'messages_fts_content'(id INTEGER PRIMARY KEY, c0);
CREATE TABLE IF NOT EXISTS 'messages_fts_docsize'(id INTEGER PRIMARY KEY, sz BLOB);
CREATE TABLE IF NOT EXISTS 'messages_fts_config'(k PRIMARY KEY, v) WITHOUT ROWID;
CREATE TABLE edited_messages(
        messageId STRING REFERENCES messages(id)
          ON DELETE CASCADE,
        sentAt INTEGER,
        readStatus INTEGER
      , conversationId STRING);
CREATE TABLE mentions (
        messageId REFERENCES messages(id) ON DELETE CASCADE,
        mentionAci STRING,
        start INTEGER,
        length INTEGER
      );
CREATE TABLE kyberPreKeys(
        id STRING PRIMARY KEY NOT NULL,
        json TEXT NOT NULL, ourServiceId NUMBER
        GENERATED ALWAYS AS (json_extract(json, '$.ourServiceId')));
CREATE TABLE callsHistory (
        callId TEXT PRIMARY KEY,
        peerId TEXT NOT NULL, -- conversation id (legacy)   uuid   groupId   roomId
        ringerId TEXT DEFAULT NULL, -- ringer uuid
        mode TEXT NOT NULL, -- enum "Direct"   "Group"
        type TEXT NOT NULL, -- enum "Audio"   "Video"   "Group"
        direction TEXT NOT NULL, -- enum "Incoming"   "Outgoing
        -- Direct: enum "Pending"   "Missed"   "Accepted"   "Deleted"
        -- Group: enum "GenericGroupCall"   "OutgoingRing"   "Ringing"   "Joined"   "Missed"   "Declined"   "Accepted"   "Deleted"
        status TEXT NOT NULL,
        timestamp INTEGER NOT NULL,
        UNIQUE (callId, peerId) ON CONFLICT FAIL
      );
[ dropped all indexes to save space in this blog post ]
CREATE TRIGGER messages_on_view_once_update AFTER UPDATE ON messages
      WHEN
        new.body IS NOT NULL AND new.isViewOnce = 1
      BEGIN
        DELETE FROM messages_fts WHERE rowid = old.rowid;
      END;
CREATE TRIGGER messages_on_insert AFTER INSERT ON messages
      WHEN new.isViewOnce IS NOT 1 AND new.storyId IS NULL
      BEGIN
        INSERT INTO messages_fts
          (rowid, body)
        VALUES
          (new.rowid, new.body);
      END;
CREATE TRIGGER messages_on_delete AFTER DELETE ON messages BEGIN
        DELETE FROM messages_fts WHERE rowid = old.rowid;
        DELETE FROM sendLogPayloads WHERE id IN (
          SELECT payloadId FROM sendLogMessageIds
          WHERE messageId = old.id
        );
        DELETE FROM reactions WHERE rowid IN (
          SELECT rowid FROM reactions
          WHERE messageId = old.id
        );
        DELETE FROM storyReads WHERE storyId = old.storyId;
      END;
CREATE VIRTUAL TABLE messages_fts USING fts5(
        body,
        tokenize = 'signal_tokenizer'
      );
CREATE TRIGGER messages_on_update AFTER UPDATE ON messages
      WHEN
        (new.body IS NULL OR old.body IS NOT new.body) AND
         new.isViewOnce IS NOT 1 AND new.storyId IS NULL
      BEGIN
        DELETE FROM messages_fts WHERE rowid = old.rowid;
        INSERT INTO messages_fts
          (rowid, body)
        VALUES
          (new.rowid, new.body);
      END;
CREATE TRIGGER messages_on_insert_insert_mentions AFTER INSERT ON messages
      BEGIN
        INSERT INTO mentions (messageId, mentionAci, start, length)
        
    SELECT messages.id, bodyRanges.value ->> 'mentionAci' as mentionAci,
      bodyRanges.value ->> 'start' as start,
      bodyRanges.value ->> 'length' as length
    FROM messages, json_each(messages.json ->> 'bodyRanges') as bodyRanges
    WHERE bodyRanges.value ->> 'mentionAci' IS NOT NULL
  
        AND messages.id = new.id;
      END;
CREATE TRIGGER messages_on_update_update_mentions AFTER UPDATE ON messages
      BEGIN
        DELETE FROM mentions WHERE messageId = new.id;
        INSERT INTO mentions (messageId, mentionAci, start, length)
        
    SELECT messages.id, bodyRanges.value ->> 'mentionAci' as mentionAci,
      bodyRanges.value ->> 'start' as start,
      bodyRanges.value ->> 'length' as length
    FROM messages, json_each(messages.json ->> 'bodyRanges') as bodyRanges
    WHERE bodyRanges.value ->> 'mentionAci' IS NOT NULL
  
        AND messages.id = new.id;
      END;
sqlite>
Finally I have the tool needed to inspect and process Signal messages that I need, without using the vendor provided client. Now on to transforming it to a more useful format. As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

11 November 2023

Reproducible Builds: Reproducible Builds in October 2023

Welcome to the October 2023 report from the Reproducible Builds project. In these reports we outline the most important things that we have been up to over the past month. As a quick recap, whilst anyone may inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries.

Reproducible Builds Summit 2023 Between October 31st and November 2nd, we held our seventh Reproducible Builds Summit in Hamburg, Germany! Our summits are a unique gathering that brings together attendees from diverse projects, united by a shared vision of advancing the Reproducible Builds effort, and this instance was no different. During this enriching event, participants had the opportunity to engage in discussions, establish connections and exchange ideas to drive progress in this vital field. A number of concrete outcomes from the summit will documented in the report for November 2023 and elsewhere. Amazingly the agenda and all notes from all sessions are already online. The Reproducible Builds team would like to thank our event sponsors who include Mullvad VPN, openSUSE, Debian, Software Freedom Conservancy, Allotropia and Aspiration Tech.

Reflections on Reflections on Trusting Trust Russ Cox posted a fascinating article on his blog prompted by the fortieth anniversary of Ken Thompson s award-winning paper, Reflections on Trusting Trust:
[ ] In March 2023, Ken gave the closing keynote [and] during the Q&A session, someone jokingly asked about the Turing award lecture, specifically can you tell us right now whether you have a backdoor into every copy of gcc and Linux still today?
Although Ken reveals (or at least claims!) that he has no such backdoor, he does admit that he has the actual code which Russ requests and subsequently dissects in great but accessible detail.

Ecosystem factors of reproducible builds Rahul Bajaj, Eduardo Fernandes, Bram Adams and Ahmed E. Hassan from the Maintenance, Construction and Intelligence of Software (MCIS) laboratory within the School of Computing, Queen s University in Ontario, Canada have published a paper on the Time to fix, causes and correlation with external ecosystem factors of unreproducible builds. The authors compare various response times within the Debian and Arch Linux distributions including, for example:
Arch Linux packages become reproducible a median of 30 days quicker when compared to Debian packages, while Debian packages remain reproducible for a median of 68 days longer once fixed.
A full PDF of their paper is available online, as are many other interesting papers on MCIS publication page.

NixOS installation image reproducible On the NixOS Discourse instance, Arnout Engelen (raboof) announced that NixOS have created an independent, bit-for-bit identical rebuilding of the nixos-minimal image that is used to install NixOS. In their post, Arnout details what exactly can be reproduced, and even includes some of the history of this endeavour:
You may remember a 2021 announcement that the minimal ISO was 100% reproducible. While back then we successfully tested that all packages that were needed to build the ISO were individually reproducible, actually rebuilding the ISO still introduced differences. This was due to some remaining problems in the hydra cache and the way the ISO was created. By the time we fixed those, regressions had popped up (notably an upstream problem in Python 3.10), and it isn t until this week that we were back to having everything reproducible and being able to validate the complete chain.
Congratulations to NixOS team for reaching this important milestone! Discussion about this announcement can be found underneath the post itself, as well as on Hacker News.

CPython source tarballs now reproducible Seth Larson published a blog post investigating the reproducibility of the CPython source tarballs. Using diffoscope, reprotest and other tools, Seth documents his work that led to a pull request to make these files reproducible which was merged by ukasz Langa.

New arm64 hardware from Codethink Long-time sponsor of the project, Codethink, have generously replaced our old Moonshot-Slides , which they have generously hosted since 2016 with new KVM-based arm64 hardware. Holger Levsen integrated these new nodes to the Reproducible Builds continuous integration framework.

Community updates On our mailing list during October 2023 there were a number of threads, including:
  • Vagrant Cascadian continued a thread about the implementation details of a snapshot archive server required for reproducing previous builds. [ ]
  • Akihiro Suda shared an update on BuildKit, a toolkit for building Docker container images. Akihiro links to a interesting talk they recently gave at DockerCon titled Reproducible builds with BuildKit for software supply-chain security.
  • Alex Zakharov started a thread discussing and proposing fixes for various tools that create ext4 filesystem images. [ ]
Elsewhere, Pol Dellaiera made a number of improvements to our website, including fixing typos and links [ ][ ], adding a NixOS Flake file [ ] and sorting our publications page by date [ ]. Vagrant Cascadian presented Reproducible Builds All The Way Down at the Open Source Firmware Conference.

Distribution work distro-info is a Debian-oriented tool that can provide information about Debian (and Ubuntu) distributions such as their codenames (eg. bookworm) and so on. This month, Benjamin Drung uploaded a new version of distro-info that added support for the SOURCE_DATE_EPOCH environment variable in order to close bug #1034422. In addition, 8 reviews of packages were added, 74 were updated and 56 were removed this month, all adding to our knowledge about identified issues. Bernhard M. Wiedemann published another monthly report about reproducibility within openSUSE.

Software development The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including: In addition, Chris Lamb fixed an issue in diffoscope, where if the equivalent of file -i returns text/plain, fallback to comparing as a text file. This was originally filed as Debian bug #1053668) by Niels Thykier. [ ] This was then uploaded to Debian (and elsewhere) as version 251.

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In October, a number of changes were made by Holger Levsen:
  • Debian-related changes:
    • Refine the handling of package blacklisting, such as sending blacklisting notifications to the #debian-reproducible-changes IRC channel. [ ][ ][ ]
    • Install systemd-oomd on all Debian bookworm nodes (re. Debian bug #1052257). [ ]
    • Detect more cases of failures to delete schroots. [ ]
    • Document various bugs in bookworm which are (currently) being manually worked around. [ ]
  • Node-related changes:
    • Integrate the new arm64 machines from Codethink. [ ][ ][ ][ ][ ][ ]
    • Improve various node cleanup routines. [ ][ ][ ][ ]
    • General node maintenance. [ ][ ][ ][ ]
  • Monitoring-related changes:
    • Remove unused Munin monitoring plugins. [ ]
    • Complain less visibly about too many installed kernels. [ ]
  • Misc:
    • Enhance the firewall handling on Jenkins nodes. [ ][ ][ ][ ]
    • Install the fish shell everywhere. [ ]
In addition, Vagrant Cascadian added some packages and configuration for snapshot experiments. [ ]

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

7 November 2023

Matthew Palmer: PostgreSQL Encryption: The Available Options

On an episode of Postgres FM, the hosts had a (very brief) discussion of data encryption in PostgreSQL. While Postgres FM is a podcast well worth a subscribe, the hosts aren t data security experts, and so as someone who builds a queryable database encryption system, I found the coverage to be somewhat lacking. I figured I d provide a more complete survey of the available options for PostgreSQL-related data encryption.

The Status Quo By default, when you install PostgreSQL, there is no data encryption at all. That means that anyone who gets access to any part of the system can read all the data they have access to. This is, of course, not peculiar to PostgreSQL: basically everything works much the same way. What s stopping an attacker from nicking off with all your data is the fact that they can t access the database at all. The things that are acting as protection are perimeter defences, like putting the physical equipment running the server in a secure datacenter, firewalls to prevent internet randos connecting to the database, and strong passwords. This is referred to as tortoise security it s tough on the outside, but soft on the inside. Once that outer shell is cracked, the delicious, delicious data is ripe for the picking, and there s absolutely nothing to stop a miscreant from going to town and making off with everything. It s a good idea to plan your defenses on the assumption you re going to get breached sooner or later. Having good defence-in-depth includes denying the attacker to your data even if they compromise the database. This is where encryption comes in.

Storage-Layer Defences: Disk / Volume Encryption To protect against the compromise of the storage that your database uses (physical disks, EBS volumes, and the like), it s common to employ encryption-at-rest, such as full-disk encryption, or volume encryption. These mechanisms protect against offline attacks, but provide no protection while the system is actually running. And therein lies the rub: your database is always running, so encryption at rest typically doesn t provide much value. If you re running physical systems, disk encryption is essential, but more to prevent accidental data loss, due to things like failing to wipe drives before disposing of them, rather than physical theft. In systems where volume encryption is only a tickbox away, it s also worth enabling, if only to prevent inane questions from your security auditors. Relying solely on storage-layer defences, though, is very unlikely to provide any appreciable value in preventing data loss.

Database-Layer Defences: Transparent Database Encryption If you ve used proprietary database systems in high-security environments, you might have come across Transparent Database Encryption (TDE). There are also a couple of proprietary extensions for PostgreSQL that provide this functionality. TDE is essentially encryption-at-rest implemented inside the database server. As such, it has much the same drawbacks as disk encryption: few real-world attacks are thwarted by it. There is a very small amount of additional protection, in that physical level backups (as produced by pg_basebackup) are protected, but the vast majority of attacks aren t stopped by TDE. Any attacker who can access the database while it s running can just ask for an SQL-level dump of the stored data, and they ll get the unencrypted data quick as you like.

Application-Layer Defences: Field Encryption If you want to take the database out of the threat landscape, you really need to encrypt sensitive data before it even gets near the database. This is the realm of field encryption, more commonly known as application-level encryption. This technique involves encrypting each field of data before it is sent to be stored in the database, and then decrypting it again after it s retrieved from the database. Anyone who gets the data from the database directly, whether via a backup or a direct connection, is out of luck: they can t decrypt the data, and therefore it s worthless. There are, of course, some limitations of this technique. For starters, every ORM and data mapper out there has rolled their own encryption format, meaning that there s basically zero interoperability. This isn t a problem if you build everything that accesses the database using a single framework, but if you ever feel the need to migrate, or use the database from multiple codebases, you re likely in for a rough time. The other big problem of traditional application-level encryption is that, when the database can t understand what data its storing, it can t run queries against that data. So if you want to encrypt, say, your users dates of birth, but you also need to be able to query on that field, you need to choose between one or the other: you can t have both at the same time. You may think to yourself, but this isn t any good, an attacker that breaks into my application can still steal all my data! . That is true, but security is never binary. The name of the game is reducing the attack surface, making it harder for an attacker to succeed. If you leave all the data unencrypted in the database, an attacker can steal all your data by breaking into the database or by breaking into the application. Encrypting the data reduces the attacker s options, and allows you to focus your resources on hardening the application against attack, safe in the knowledge that an attacker who gets into the database directly isn t going to get anything valuable.

Sidenote: The Curious Case of pg_crypto PostgreSQL ships a contrib module called pg_crypto, which provides encryption and decryption functions. This sounds ideal to use for encrypting data within our applications, as it s available no matter what we re using to write our application. It avoids the problem of framework-specific cryptography, because you call the same PostgreSQL functions no matter what language you re using, which produces the same output. However, I don t recommend ever using pg_crypto s data encryption functions, and I doubt you will find many other cryptographic engineers who will, either. First up, and most horrifyingly, it requires you to pass the long-term keys to the database server. If there s an attacker actively in the database server, they can capture the keys as they come in, which means all the data encrypted using that key is exposed. Sending the keys can also result in the keys ending up in query logs, both on the client and server, which is obviously a terrible result. Less scary, but still very concerning, is that pg_crypto s available cryptography is, to put it mildly, antiquated. We have a lot of newer, safer, and faster techniques for data encryption, that aren t available in pg_crypto. This means that if you do use it, you re leaving a lot on the table, and need to have skilled cryptographic engineers on hand to avoid the potential pitfalls. In short: friends don t let friends use pg_crypto.

The Future: Enquo All this brings us to the project I run: Enquo. It takes application-layer encryption to a new level, by providing a language- and framework-agnostic cryptosystem that also enables encrypted data to be efficiently queried by the database. So, you can encrypt your users dates of birth, in such a way that anyone with the appropriate keys can query the database to return, say, all users over the age of 18, but an attacker just sees unintelligible gibberish. This should greatly increase the amount of data that can be encrypted, and as the Enquo project expands its available data types and supported languages, the coverage of encrypted data will grow and grow. My eventual goal is to encrypt all data, all the time. If this appeals to you, visit enquo.org to use or contribute to the open source project, or EnquoDB.com for commercial support and hosted database options.

29 October 2023

Aigars Mahinovs: Figuring out finances part 4

At the end of the last part of this, we got a Home Assistant OS installation that contains in itself a Firefly III instance and that contains all the current financial information. Now I will try to connect the two. While it could be nice to create a fully-featured integration for Firefly III to Home Assistant to communicate all interesting values and events, I have an interest on programming a more advanced data point calculation for my budget needs, so a less generic, but more flexible approch is a better one for me. So I was quite interested when among the addons in the Home Assistant Addon Store I saw AppDaemon - a way to simply integrate arbitrary Python processing with Home Assistant. Let's see if that can do what I want. For start, after reading the tutorial , I wanted to create a simple script that would use Firefly III REST API to read the current balance of my main account and then send that to Home Assistant as a sensor value, which then can be displayed on a dashboard. As a quick try I modified the provided hello_world.py that is included in the default AppDaemon installation:
import requests
from datetime import datetime
import appdaemon.plugins.hass.hassapi as hass
app_token = "<FIREFLY_PERSONAL_ACCESS_TOKEN>"
firefly_url = "<FIREFLY_URL>"
class HelloWorld(hass.Hass):
    def initialize(self):
        self.run_every(self.set_asset, "now", 60 * 60)
    def set_asset(self, kwargs):
        ent = self.get_entity("sensor.firefly3_asset_sparkasse_main")
        if not ent.exists():
            ent.add(
                state=0.0,
                attributes= 
                    "native_value": 0.0,
                    "native_unit_of_measurement": "EUR",
                    "state_class": "measurement",
                    "device_class": "monetary",
                    "current_balance_date": datetime.now(),
                 )
        r = requests.get(
            firefly_url + "/api/v1/accounts?type=asset",
            headers= 
                "Authorization": "Bearer " + app_token,
                "Accept": "application/vnd.api+json",
                "Content-Type": "application/json",
         )
        data = r.json()
        for account in data["data"]:
            if not "attributes" in account or "name" not in account["attributes"]:
                continue
            if account["attributes"]["name"] != "Sparkasse giro":
                continue
            self.log("Account :" + str(account["attributes"]))
            ent.set_state(
                state=account["attributes"]["current_balance"],
                attributes= 
                    "native_value": account["attributes"]["current_balance"],
                    "current_balance_date": datetime.fromisoformat(account["attributes"]["current_balance_date"]),
                 )
            self.log("Entity updated")
It uses a URL and personal access token to access Firefly III API, gets the asset accounts information, then extracts info about current balance and balance date of my main account and then creates and/or updates a "sensor" value into Home Assistant. This sensor is with metadata marked as a monetary value and as a measurement. This makes Home Assistant track this value in the database as a graphable changing value. I modified the file using the File Editor addon to edit the /config/appdaemon/apps/hello.py file. Each time the file is saved it is reloaded and logs can be seen in the AppDaemon Logs section - main_log for logging messages or error_log if there is a crash. Useful to know that requests library is included, but it hard to see in the docks what else is included or if there is an easy way to install extra Python packages. This is already a very nice basis for custom value insertion into Home Assistant - whatever you can with a Python script extract or calculate, you can also inject into Home Assistant. With even this simple approach you can monitor balances, budgets, piggy-banks, bill payment status and even sum of transactions in particular catories in a particular time window. Especially interesting data can be found in the insight section of the Firefly III API. The script above uses a trigger like self.run_every(self.set_asset, "now", 60 * 60) to simply run once per hour. The data in Firefly will not be updated too often anyway, at least not until we figure out how to make bank connection run automatically without user interaction and not screw up already existing transactions along the way. In theory a webhook API of the Firefly III could be used to trigger the data update instantly when any transaction is created or updated. Possibly even using Home Assistant webhook integration. Hmmm. Maybe. Who am I kiddind? I am going to make that work, for sure! :D But first - how about figuring out the future? So what I want to do? In short, I want to predict what will be the balance on my main account just before the next months salary comes in. To do this I would:
  • take the current balance of the main account
  • if this months salary is not paid out yet, then add that into the balance
  • deduct all still unpaid bills that are due between now and the target date
  • if the credit card account has not yet been reset to the main account, deduct current amount on the cards
  • if credit card account has been reset, but not from main account deducted yet, deduct the reset amount
To do that I need to use the Firefly API to read: current account info, status of all bills including next due date and amount, transfer transactions between credit cards and main account and something that would store the expected salary date and amount. Ideally I'd use a recurring transaction or a income bill for this, but Firefly is not really cooperating with that. The easiest would be just to hardcode that in the script itself. And this is what I have come up with so far. To make the development process easier, I separated put the params for the API key and salary info and app params for the month to predict for, and predict both this and next months balances at the same time. I edited the script locally with Neovim and also ran it locally with a few mocks, uploading to Home Assistant via the SSH addon when the local executions looked good. So what's next? Well, need to somewhat automate the sync with the bank (if at all possible). And for sure take a regular database and config backup :D

Next.