Search Results: "tora"

21 November 2021

Antoine Beaupré: The last syncmaildir crash

My syncmaildir (SMD) setup failed me one too many times (previously, previously). In an attempt to migrate to an alternative mail synchronization tool, I looked into using my IMAP server again, and found out my mail spool was in pretty bad shape. I compare mbsync and OfflineIMAP in the next post; this post is about how I recovered the mail spool so that tools like those could correctly synchronise it again.

The latest crash On Monday, SMD just started failing with this error:
nov 15 16:12:19 angela systemd[2305]: Starting pull emails with syncmaildir...
nov 15 16:12:22 angela systemd[2305]: smd-pull.service: Succeeded.
nov 15 16:12:22 angela systemd[2305]: Finished pull emails with syncmaildir.
nov 15 16:14:08 angela systemd[2305]: Starting pull emails with syncmaildir...
nov 15 16:14:11 angela systemd[2305]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
nov 15 16:14:11 angela systemd[2305]: smd-pull.service: Failed with result 'exit-code'.
nov 15 16:14:11 angela systemd[2305]: Failed to start pull emails with syncmaildir.
nov 15 16:16:14 angela systemd[2305]: Starting pull emails with syncmaildir...
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: Network error.
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: Unable to get any data from the other endpoint.
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: This problem may be transient, please retry.
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: Hint: did you correctly setup the SERVERNAME variable
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: on your client? Did you add an entry for it in your ssh
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: configuration file?
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: Network error
nov 15 16:16:17 angela smd-pull[27188]: register: smd-client@localhost: TAGS: error::context(handshake) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
nov 15 16:16:17 angela systemd[2305]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
nov 15 16:16:17 angela systemd[2305]: smd-pull.service: Failed with result 'exit-code'.
nov 15 16:16:17 angela systemd[2305]: Failed to start pull emails with syncmaildir.
What is frustrating is that there's actually no network error here. Running the command by hand, I did see a different message, but I have now lost it in my backlog. It had something to do with a filename being too long, and I gave up debugging after a while. This happened suddenly too, which added to the confusion. In a fit of rage I started this blog post and began experimenting with alternatives, which led me down a lot of rabbit holes. Reviewing my previous mail crash documentation, it seems most solutions involve talking to an IMAP server, so I figured I would just do that. Wanting to try something new, I gave isync (AKA mbsync) a try. Oh dear, I did not expect how much trouble just talking to my IMAP server would be, which wasn't isync's fault, for what that's worth. It was the primary tool I used to debug things, and served me well in that regard.

Mailbox corruption The first thing I found out is that certain messages in the IMAP spool were corrupted. mbsync would stop on a FETCH command and Dovecot would give me those errors on the server side.

"wrong W value"
nov 16 15:31:27 marcos dovecot[3621800]: imap(anarcat)<3630489><wAmSzO3QZtfAqAB1>: Error: Mailbox junk: Maildir filename has wrong W value, renamed the file from /home/anarcat/Maildir/.junk/cur/1454623938.M101164P22216.marcos,S=2495,W=2578:2,S to /home/anarcat/Maildir/.junk/cur/1454623938.M101164P22216.marcos,S=2495:2,S
nov 16 15:31:27 marcos dovecot[3621800]: imap(anarcat)<3630489><wAmSzO3QZtfAqAB1>: Error: Mailbox junk: Deleting corrupted cache record uid=1582: UID 1582: Broken virtual size in mailbox junk: read(/home/anarcat/Maildir/.junk/cur/1454623938.M101164P22216.marcos,S=2495,W=2578:2,S): FETCH BODY[] got too little data: 2540 vs 2578
At least this first error was automatically healed by Dovecot (by renaming the file without the W= flag). The problem is that the FETCH command fails and mbsync exits noisily. So you need to constantly restart mbsync with a silly command like:
while ! mbsync -a; do sleep 1; done

"cached message size larger than expected"
nov 16 13:53:08 marcos dovecot[3520770]: imap(anarcat)<3594402><M5JHb+zQ3NLAqAB1>: Error: Mailbox Sent: UID=19288: read(/home/anarcat/Maildir/.Sent/cur/1224790447.M898726P9811V000000000000FE06I00794FB1_0.marvin,S=2588:2,S) failed: Cached message size larger than expected (2588 > 2482, box=Sent, UID=19288) (read reason=mail stream)
nov 16 13:53:08 marcos dovecot[3520770]: imap(anarcat)<3594402><M5JHb+zQ3NLAqAB1>: Error: Mailbox Sent: Deleting corrupted cache record uid=19288: UID 19288: Broken physical size in mailbox Sent: read(/home/anarcat/Maildir/.Sent/cur/1224790447.M898726P9811V000000000000FE06I00794FB1_0.marvin,S=2588:2,S) failed: Cached message size larger than expected (2588 > 2482, box=Sent, UID=19288)
nov 16 13:53:08 marcos dovecot[3520770]: imap(anarcat)<3594402><M5JHb+zQ3NLAqAB1>: Error: Mailbox Sent: UID=19288: read(/home/anarcat/Maildir/.Sent/cur/1224790447.M898726P9811V000000000000FE06I00794FB1_0.marvin,S=2588:2,S) failed: Cached message size larger than expected (2588 > 2482, box=Sent, UID=19288) (read reason=)
nov 16 13:53:08 marcos dovecot[3520770]: imap-login: Panic: epoll_ctl(del, 7) failed: Bad file descriptor
This second problem is much harder to fix, because Dovecot does not recover automatically. This is Dovecot complaining that the cached size (the S= field in the filename, which is also present in Dovecot's metadata files) doesn't match the actual file size. I wonder if at least some of those messages were corrupted in the OfflineIMAP to syncmaildir migration, because part of that procedure is to run the strip_header script to remove content from the emails. That could easily have broken things, since the files are not renamed to match their new size.

Workaround So I read a lot of the Dovecot documentation on the maildir format, and wrote an extensive fix script for those two errors. The script worked and mbsync was able to sync the entire mail spool. And no, rebuilding the index files didn't work. I also tried doveadm force-resync -u anarcat, which didn't do anything. In the end I also had to do this, because the wrong cache values were also stored elsewhere:
service dovecot stop ; find -name 'dovecot*' -delete; service dovecot start
This would have totally broken any existing clients, but thankfully I'm starting from scratch (except maybe webmail, but I'm hoping it will self-heal as well, assuming it only has a cache and not a full replica of the mail spool).
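As for the fix script itself, its core boils down to something like this (a minimal sketch, not the actual script used here: the Maildir path and the decision to simply drop the W= hint are assumptions, so try it on a copy of the spool first):
find ~/Maildir -type f -name '*,S=*' | while read -r path; do
  real=$(stat -c %s "$path")
  # size currently claimed by the S= hint in the filename
  claimed=$(echo "$path" | sed -n 's/.*,S=\([0-9]*\).*/\1/p')
  if [ "$real" != "$claimed" ]; then
    dir=$(dirname "$path"); base=$(basename "$path")
    # rewrite S= to the real size and drop any W= hint entirely
    new=$(echo "$base" | sed "s/,S=[0-9]*/,S=$real/;s/,W=[0-9]*//")
    echo "fixing $base -> $new"
    mv -n "$path" "$dir/$new"
  fi
done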

Incoherence between Maildir and IMAP Unfortunately, the first mbsync run was incomplete, as it was missing about 15,000 mails:
anarcat@angela:~(main)$ find Maildir -type f -type f -a \! -name '.*' | wc -l
384836
anarcat@angela:~(main)$ find Maildir-mbsync/ -type f -a \! -name '.*' | wc -l
369221
As it turns out, mbsync was not at fault here either: this was yet more mail spool corruption. It's actually 26 folders (out of 205) with inconsistent sizes, which can be found with:
for folder in * .[^.]* ; do 
  printf "%s\t%d\n" $folder $(find "$folder" -type f -a \! -name '.*' | wc -l);
done
The special \! -name '.*' bit is there to ignore the mbsync metadata: mbsync creates .uidvalidity and .mbsyncstate files in every folder. That excludes only about 200 files, but since they are spread across all the folders, they were making it impossible to see where the problem actually was. Here is what the diff looks like:
--- Maildir-list    2021-11-17 20:42:36.504246752 -0500
+++ Maildir-mbsync-list 2021-11-17 20:18:07.731806601 -0500
@@ -6,16 +6,15 @@
[...]
 .Archives  1
 .Archives.2010 3553
-.Archives.2011 3583
-.Archives.2012 12593
+.Archives.2011 3582
+.Archives.2012 620
 .Archives.2013 8576
 .Archives.2014 11057
-.Archives.2015 8173
+.Archives.2015 8165
 .Archives.2016 54
 .band  34
 .bitbuck   1
@@ -38,13 +37,12 @@
 .couchsurfers  2
-cur    11285
+cur    11280
 .current   130
 .cv    2
 .debbug    262
-.debian    37544
-drafts 1
-.Drafts    4
+.debian    37533
+.Drafts    2
 .drone 241
 .drupal    188
 .drupal-devel  303
[...]

Misfiled messages It's a bit all over the place, but we can already notice some huge differences between mailboxes, for example in the Archives folders. As it turns out, at least 12,000 of those missing mails were actually misfiled: instead of being in the Maildir/.Archives.2012/cur/ folder, they were directly in Maildir/.Archives.2012/. This is something that doesn't matter for SMD (and possibly not for notmuch? actually, it does matter: notmuch suddenly found 12,000 new mails) but it definitely matters to Dovecot and therefore to mbsync... After moving those files around, we still have 4,000 messages missing:
anarcat@angela:~(main)$ find Maildir-mbsync/ -type f -a \! -name '.*' | wc -l
381196
anarcat@angela:~(main)$ find Maildir/ -type f -a \! -name '.*' | wc -l
385053
The problem is that those 4,000 missing mails are harder to track. Take, for example, .Archives.2011, which has a single message missing, out of 3,582. And the files are not identical: the checksums don't match after going through the IMAP transport, so we can't use a tool like hashdeep to compare the trees and find why any single file is missing.
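For reference, moving the misfiled messages back under cur/ came down to an operation like this one (a hedged sketch, not the exact commands used at the time: the globbing, the -maxdepth trick and GNU mv's -t option are all assumptions):
for folder in Maildir/* Maildir/.[^.]*; do
  [ -d "$folder/cur" ] || continue
  # move files sitting directly in the folder into its cur/ subdirectory
  find "$folder" -maxdepth 1 -type f \! -name '.*' -exec mv -n -t "$folder/cur/" {} +
done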

"register" folder One big chunk of the 4,000, however, is a special folder called register in my spool, which I am syncing separately (see Securing registration email for details on that setup). That actually covers 3,700 of those messages, so I actually have a more modest 300 messages to figure out, after (easily!) configuring mbsync to sync that folder separately:
 @@ -30,9 +33,29 @@ Slave :anarcat-local:
  # Exclude everything under the internal [Gmail] folder, except the interesting folders
  #Patterns * ![Gmail]* "[Gmail]/Sent Mail" "[Gmail]/Starred" "[Gmail]/All Mail"
  # Or include everything
 -Patterns *
 +#Patterns *
 +Patterns * !register  !.register
  # Automatically create missing mailboxes, both locally and on the server
  #Create Both
  Create slave
  # Sync the movement of messages between folders and deletions, add after making sure the sync works
  #Expunge Both
 +
 +IMAPAccount anarcat-register
 +Host imap.anarc.at
 +User register
 +PassCmd "pass imap.anarc.at-register"
 +SSLType IMAPS
 +CertificateFile /etc/ssl/certs/ca-certificates.crt
 +
 +IMAPStore anarcat-register-remote
 +Account anarcat-register
 +
 +MaildirStore anarcat-register-local
 +SubFolders Maildir++
 +Inbox ~/Maildir-mbsync/.register/
 +
 +Channel anarcat-register
 +Master :anarcat-register-remote:
 +Slave :anarcat-register-local:
 +Create slave
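With that channel in place, the register spool can then be synced on its own with something like mbsync anarcat-register (mbsync takes channel names as arguments), independently of the main channel.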

"tmp" folders and empty messages After syncing the "register" messages, I end up with the measly little 160 emails out of sync:
anarcat@angela:~(main)$ find Maildir-mbsync/ -type f -a \! -name '.*' | wc -l
384900
anarcat@angela:~(main)$ find Maildir/ -type f -a \! -name '.*' | wc -l
385059
Argh. After more digging, I found 131 mails in the tmp/ directories of the client's mail spool. Mysterious! On the server side, there are even more files, and not the same ones. Could those be mails that were left there during a failed delivery of some sort, a power failure, or some other crash? Who knows. It could be another race condition in SMD if it runs while mail is being delivered into tmp/... The first thing to do with those is to clean up a bunch of empty files (21 on angela):
find .[^.]*/tmp -type f -empty -delete
As it turns out, they are all duplicates, in the sense that notmuch can easily find a copy of files with the same message ID in its database. In other words, this hairy command returns nothing:
find .[^.]*/tmp -type f | while read path; do
  msgid=$(grep -m 1 -i ^message-id "$path" | sed 's/Message-ID: //i;s/[<>]//g');
  if notmuch count --exclude=false "id:$msgid" | grep -q 0; then
    echo "$path <$msgid> not in notmuch" ;
  fi;
done
... which is good. Or, to put it another way, this is safe:
find .[^.]*/tmp -type f -delete
Poof! 314 mails cleaned on the server side. Interestingly, SMD doesn't pick up on those changes at all and still sees files in tmp/ directories on the client side, so we need to operate the same twisted logic there.

notmuch to the rescue again After cleaning that on the client, we get:
anarcat@angela:~(main)$ find Maildir/ -type f -a \! -name '.*' | wc -l
384928
anarcat@angela:~(main)$ find Maildir-mbsync/ -type f -a \! -name '.*' | wc -l
384901
Ha! 27 mails difference. Those are the really sticky, unclear ones. I was hoping a full sync might clear that up, but after deleting the entire directory and starting from scratch, I end up with:
anarcat@angela:~(main)$ find Maildir -type f -type f -a \! -name '.*' | wc -l
385034
anarcat@angela:~(main)$ find Maildir-mbsync -type f -type f -a \! -name '.*' | wc -l
384993
That is: even more messages missing (now 37). Sigh. Thankfully, this is something notmuch can help with: it can index all files by Message-ID (which I learned is case-insensitive, yay) and tell us which messages don't make it through. Considering the corruption I found in the mail spool, I wouldn't be the least bit surprised if those messages were just skipped by the IMAP server. Unfortunately, there's nothing in the Dovecot server logs that would explain the discrepancy. Here again, notmuch comes to the rescue. We can list all message IDs to figure out that discrepancy:
notmuch search --exclude=false --output=messages '*' | pv -s 18M | sort > Maildir-msgids
notmuch --config=.notmuch-config-mbsync search --exclude=false --output=messages '*' | pv -s 18M | sort > Maildir-mbsync-msgids
And then we can see how many messages notmuch thinks are missing:
$ wc -l *msgids
372723 Maildir-mbsync-msgids
372752 Maildir-msgids
That's 29 messages. Oddly, it doesn't exactly match the find output:
anarcat@angela:~(main)$ find Maildir-mbsync -type f -type f -a \! -name '.*' | wc -l
385204
anarcat@angela:~(main)$ find Maildir -type f -type f -a \! -name '.*' | wc -l
385241
That is 10 more messages. Ugh. But actually, I know what those are: more misfiled messages (in a .folder/draft/ directory, bizarrely), so the totals actually match. In the notmuch output, there's a lot of stuff like this:
id:notmuch-sha1-fb880d673e24f5dae71b6b4d825d4a0d5d01cde4
Those are messages without a valid Message-ID. Notmuch (presumably) constructs one based on the file's checksum. Because the files differ between the IMAP server and the local mail spool (which is unfortunate, but possibly inevitable), those do not match. There are exactly the same number of those on both sides, so I'll go ahead and assume those are all accounted for. What remains is:
anarcat@angela:~(main)$ diff -u Maildir-mbsync-msgids Maildir-msgids | grep '^\-[^-]' | grep -v sha1 | wc -l
2
anarcat@angela:~(main)$ diff -u Maildir-mbsync-msgids Maildir-msgids | grep '^\+[^+]' | grep -v sha1 | wc -l
21
anarcat@angela:~(main)$ 
i.e. 21 missing from mbsync, and, surprisingly, 2 missing from the original mail spool. Further inspection also showed they were all messages with some sort of "corruption": no body and only headers. I am not sure that is a legal email format in the first place. Since they were mostly spam or administrative emails ("You have been unsubscribed from mailing list..."), it seems fairly harmless to ignore those.
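(For the record, the claim that there are exactly the same number of checksum-based IDs on both sides can be double-checked with something like grep -c sha1 Maildir-msgids Maildir-mbsync-msgids, which prints one count per file.)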

Conclusion As we'll see in the next article, SMD has stellar performance. But that comes at a huge cost: it accesses the mail storage directly. This can create (and has created) significant problems on the mail server. It's unclear exactly why those things happen, but Dovecot expects a particular storage format in its files, and it seems unwise to bypass that. In the future, I'll try to remember to avoid that, especially since mechanisms like SMD require special server access (SSH) which, in the long term, I am not sure I want to maintain or expect. In other words, just talking with an IMAP server opens up a lot more hosting possibilities than setting up a custom synchronisation protocol over SSH. It's also safer and more reliable, as we have seen. Thankfully, I've been able to recover from all the errors I could find, but it could have gone differently, and it would have been possible for SMD to permanently corrupt a significant part of my mail archives. In the end, however, the final straw was just another weird bug which, ironically, SMD mysteriously recovered from on its own while I was writing this documentation and migrating away from it. In any case, I recommend SMD users start looking for alternatives. The project has been archived upstream, and the Debian package has been orphaned. I have seen significant mailbox corruption, including entire mail spool destruction, mostly due to incorrect locking code. I have filed a release-critical bug in Debian to make sure it doesn't ship with Debian bookworm. Alternatives like mbsync provide fast and reliable transport, including over SSH. See the next article for further discussion of the alternatives.

17 November 2021

Christoph Berg: PostgreSQL and Undelete

pg_dirtyread Earlier this week, I updated pg_dirtyread to work with PostgreSQL 14. pg_dirtyread is a PostgreSQL extension that allows reading "dead" rows from tables, i.e. rows that have already been deleted or updated. Of course, that works only if the table has not been cleaned up yet by a VACUUM command or autovacuum, which is PostgreSQL's garbage collection machinery. Here's an example of pg_dirtyread in action:
# create table foo (id int, t text);
CREATE TABLE
# insert into foo values (1, 'Doc1');
INSERT 0 1
# insert into foo values (2, 'Doc2');
INSERT 0 1
# insert into foo values (3, 'Doc3');
INSERT 0 1
# select * from foo;
 id |  t
----+------
  1 | Doc1
  2 | Doc2
  3 | Doc3
(3 rows)
# delete from foo where id < 3;
DELETE 2
# select * from foo;
 id |  t
----+------
  3 | Doc3
(1 row)
Oops! The first two documents have disappeared. Now let's use pg_dirtyread to look at the table:
# create extension pg_dirtyread;
CREATE EXTENSION
# select * from pg_dirtyread('foo') t(id int, t text);
 id |  t
----+------
  1 | Doc1
  2 | Doc2
  3 | Doc3
All three documents are still there, but only one of them is visible. pg_dirtyread can also show PostgreSQL's system columns with the row location and visibility information. For the first two documents, xmax is set, which means the row has been deleted:
# select * from pg_dirtyread('foo') t(ctid tid, xmin xid, xmax xid, id int, t text);
 ctid  | xmin | xmax | id |  t
-------+------+------+----+------
 (0,1) | 1577 | 1580 |  1 | Doc1
 (0,2) | 1578 | 1580 |  2 | Doc2
 (0,3) | 1579 |    0 |  3 | Doc3
(3 rows)
Undelete Caveat: I'm not promising any of the ideas quoted below will actually work in practice. There are a few caveats and a good portion of intricate knowledge about the PostgreSQL internals might be required to succeed properly. Consider consulting your favorite PostgreSQL support channel for advice if you need to recover data on any production system. Don't try this at work. I always had plans to extend pg_dirtyread to include some "undelete" command to make deleted rows reappear, but never got around to trying that. But rows can already be restored by using the output of pg_dirtyread itself:
# insert into foo select * from pg_dirtyread('foo') t(id int, t text) where id = 1;
This is not a true "undelete", though - it just inserts new rows from the data read from the table.

pg_surgery Enter pg_surgery, a new PostgreSQL extension supplied with PostgreSQL 14. It contains two functions to "perform surgery on a damaged relation". As a side effect, they can also make deleted tuples reappear. As I discovered now, one of the functions, heap_force_freeze(), works nicely with pg_dirtyread. It takes a list of ctids (row locations) that it marks "frozen", but at the same time as "not deleted". Let's apply it to our test table, using the ctids that pg_dirtyread can read:
# create extension pg_surgery;
CREATE EXTENSION
# select heap_force_freeze('foo', array_agg(ctid))
    from pg_dirtyread('foo') t(ctid tid, xmin xid, xmax xid, id int, t text) where id = 1;
 heap_force_freeze
-------------------
 
(1 row)
Et voilà, our deleted document is back:
# select * from foo;
 id |  t
----+------
  1 | Doc1
  3 | Doc3
(2 rows)
# select * from pg_dirtyread('foo') t(ctid tid, xmin xid, xmax xid, id int, t text);
 ctid  | xmin | xmax | id |  t
-------+------+------+----+------
 (0,1) |    2 |    0 |  1 | Doc1
 (0,2) | 1578 | 1580 |  2 | Doc2
 (0,3) | 1579 |    0 |  3 | Doc3
(3 rows)
Disclaimer Most importantly, none of the above methods will work if the data you just deleted has already been purged by VACUUM or autovacuum. These actively zero out reclaimed space. Restore from backup to get your data back. Since both pg_dirtyread and pg_surgery operate outside the normal PostgreSQL MVCC machinery, it's easy to create corrupt data using them. This includes duplicated rows, duplicated primary key values, indexes being out of sync with tables, broken foreign key constraints, and others. You have been warned. pg_dirtyread does not work (yet) if the deleted rows contain any toasted values. Possible other approaches include using pageinspect and pg_filedump to retrieve the ctids of deleted rows. Please make sure you have working backups and don't need any of the above.
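To illustrate the pageinspect route (a hedged sketch only: the table name and the single page number are examples, and a real table would need every page scanned, not just page 0):
# create extension pageinspect;
CREATE EXTENSION
# select ('(0,' || lp || ')')::tid as ctid, t_xmin, t_xmax
    from heap_page_items(get_raw_page('foo', 0))
    where not (t_xmax = 0);  -- xmax set: the tuple was deleted or updated
The ctids obtained this way could then be fed to heap_force_freeze() just like the ones from pg_dirtyread above.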

5 November 2021

Jonathan Dowland: 25 things I would like to 3D print

Last year I started collecting ideas of things I would like to 3D print one day, on Twitter. Twitter is fundamentally ephemeral, so I'll collect them here instead. I got up to 14 items on Twitter, and now I'm up to 25. I don't own a 3D printer, but I have access to one at the work office. Perhaps this list is my subconscious trying to convince me to buy one. What am I missing? What else should I be thinking of printing? Let me know!
  1. Some kind of 45° leaning prong to dry bottles and flasks on
  2. A tea tray and coasters
  3. a replacement prop arm/foot for my computer keyboard (something like this but for the Lenovo Ultranav)
  4. some attempted representation of Borges' Library of Babel, à la @jwz's The Library of Babel, again
  5. an exploration of the geometry of Susanna Clarke's Piranesi
  6. further iterations of my castle
  7. Small tins to keep loose-leaf tea in
  8. Who am I kidding, bound to be a map from DOOM. E1M1 perhaps, or something more regular (MAP07? E2M8?) See also this amazing print of Quake 3: Arena's "Camping Grounds"
  9. replacement bits for kids toy sets, e.g. a bolt with long shank from Early Learning Centre Build It Deluxe Set, without all 4 of which you can't build much of anything
  10. A stand for a decorative Christmas bauble (kid's handprint on it). A roll of Sellotape works pretty well in the meantime
  11. DIY bits-and-bobs sorter/storage (nuts and bolts etc)
  12. A space ship from Elite/Frontier. Probably a Cobra mk3 or maybe a Viper mk2. In a glow-in-the-dark PLA which I'd overpaint with gunmetal except for the fuselage
  13. A watch stand/holder/storage thing. Except it would look nicer in wood (and I'm more inclined to get rid of all but one watch instead)
  14. Little tabbed 7" dividers for vinyl records, with A-Z cut into the tabs. 12" ones might be a stretch (something like this)
  15. A low-profile custom trackpoint cover like the ones by SaotoTech (e.g.)
  16. A vinyl record. (Not sure that any 3D printer I would have access to would have the necessary resolution. I haven't done any research yet.)
  17. A free-standing inclined vinyl record display stand (e.g.)
  18. A "Settlers of Catan" set. I've got the travel edition which is great but it would be nicer to have a larger-sized set. There are some things I really like about the travel set that the full-size set lacks; so designing and printing a larger set myself could incorporate them. Also I don't feel inclined to buy the full-size set for 50 or so to end up with essentially the same game I already bought. No doubt I'd spent at least that much in PLA.
  19. Little kids trinkets. Pacman ghosts, that sort-of thing. Whatever my daughters come up with next.
  20. Lego storage/sorters.
  21. Some kind of lenticular picture. Perhaps a gift or Christmas card combined in one.
  22. A bracket to install a Gotek drive in my Amiga 500 (e.g.). I've managed without but the fit isn't great.
  23. An attempt at using the 3D printer for 2D drawing. I would never get the same kind of quality results you can get from a proper plotter, but still… Take a look at some proper plotter art!
  24. Garden decorations. I like the idea of porous geometric shapes that you can plant mosses or ferns into, but also things which might be taken over and "used" by nature in ways I hadn't thought of.
  25. Floor plans / 3D plans of my house (including variations if I remodelled)

23 October 2021

Antoine Beaupré: The Neo-Colonial Internet

I grew up with the Internet, and its ethics and politics have always been important in my life. But I have also been involved at other levels: against police brutality, for Food Not Bombs, worker autonomy, software freedom, etc. For a long time, that all seemed coherent. But the more I look at the modern Internet -- and the mega-corporations that control it -- the less confidence I have in my original political analysis of the liberating potential of technology. I have come to believe that most of our technological development is harmful to the large majority of the population of the planet, and of course to the rest of the biosphere. And now I feel this is not a new problem. This is because the Internet is a neo-colonial device, and has been from the start. Let me explain.

What is Neo-Colonialism? The term "neo-colonialism" was coined by Kwame Nkrumah, first president of Ghana. In Neo-Colonialism, the Last Stage of Imperialism (1965), he wrote:
In place of colonialism, as the main instrument of imperialism, we have today neo-colonialism ... [which] like colonialism, is an attempt to export the social conflicts of the capitalist countries. ... The result of neo-colonialism is that foreign capital is used for the exploitation rather than for the development of the less developed parts of the world. Investment, under neo-colonialism, increases, rather than decreases, the gap between the rich and the poor countries of the world.
So basically, if colonialism is Europeans bringing genocide, war, and their religion to Africa, Asia, and the Americas, neo-colonialism is the Americans (note the "n") bringing capitalism to the world. Before we see how this applies to the Internet, we must therefore make a detour into US history. This matters, because anyone would be hard-pressed to decouple neo-colonialism from the empire under which it evolves, and here we can only name the United States of America.

US Declaration of Independence Let's start with the United States declaration of independence (1776). Many Americans may roll their eyes at this, possibly because that declaration is not actually part of the US constitution and therefore may have questionable legal standing. Still, it was obviously a driving philosophical force in the founding of the nation. As its author, Thomas Jefferson, stated:
it was intended to be an expression of the American mind, and to give to that expression the proper tone and spirit called for by the occasion
In that aging document, we find the following pearl:
We hold these truths to be self-evident, that all men are created equal, that they are endowed by their Creator with certain unalienable Rights, that among these are Life, Liberty and the pursuit of Happiness.
As a founding document, the Declaration still has an impact in the sense that the above quote has been called an:
"immortal declaration", and "perhaps [the] single phrase" of the American Revolutionary period with the greatest "continuing importance." (Wikipedia)
Let's read that "immortal declaration" again: "all men are created equal". "Men", in that context, is limited to a certain number of people, namely "property-owning or tax-paying white males, or about 6% of the population". Back when this was written, women didn't have the right to vote, and slavery was legal. Jefferson himself owned hundreds of slaves. The declaration was aimed at the King and was a list of grievances. A concern of the colonists was that the King:
has excited domestic insurrections amongst us, and has endeavoured to bring on the inhabitants of our frontiers, the merciless Indian Savages whose known rule of warfare, is an undistinguished destruction of all ages, sexes and conditions.
This is a clear mark of the frontier myth which paved the way for the US to exterminate and colonize the territory some now call the United States of America. The declaration of independence is obviously a colonial document, having been written by colonists. None of this is particularly surprising, historically, but I figured it serves as a good reminder of where the Internet is coming from, since it was born in the US.

A Declaration of the Independence of Cyberspace Two hundred and twenty years later, in 1996, John Perry Barlow wrote a declaration of independence of cyberspace. At this point, (almost) everyone has a right to vote (including women), slavery was abolished (although some argue it still exists in the form of the prison system); the US has made tremendous progress. Surely this text will have aged better than the previous declaration it is obviously derived from. Let's see how it reads today and how it maps to how the Internet is actually built now.

Borders of Independence One of the key ideas that Barlow brings up is that "cyberspace does not lie within your borders". In that sense, cyberspace is the final frontier: having failed to colonize the moon, Americans turn inwards, deeper into technology, but still within the frontier ideology. And indeed, Barlow is one of the co-founders of the Electronic Frontier Foundation (the beloved EFF), founded six years prior. But there are other problems with this idea. As Wikipedia quotes:
The declaration has been criticized for internal inconsistencies.[9] The declaration's assertion that 'cyberspace' is a place removed from the physical world has also been challenged by people who point to the fact that the Internet is always linked to its underlying geography.[10]
And indeed, the Internet is definitely a physical object. First controlled and severely restricted by "telcos" like AT&T, it was somewhat "liberated" from that monopoly in 1982 when an anti-trust lawsuit broke up the company, a key historical event that, one could argue, made the Internet possible. (From there on, "backbone" providers could start competing and emerge, and eventually coalesce into new monopolies: Google has a monopoly on search and advertisement, Facebook on communications for a few generations, Amazon on storage and computing, Microsoft on hardware, etc. Even AT&T is now pretty much as consolidated as it was before.) The point is: all those companies have gigantic data centers and intercontinental cables. And those are definitely prioritizing the western world, the heart of the empire. Take, for example, Google's latest 3,900 mile undersea cable: it does not connect Argentina to South Africa or New Zealand, it connects the US to the UK and Spain. Hardly a revolutionary prospect.

Private Internet But back to the Declaration:
Do not think that you can build it, as though it were a public construction project. You cannot. It is an act of nature and it grows itself through our collective actions.
In Barlow's mind, the "public" is bad, and private is good, natural. Or, in other words, a "public construction project" is unnatural. And indeed, the modern "nature" of development is private: most of the Internet is now privately owned and operated. I must admit that, as an anarchist, I loved that sentence when I read it. I was rooting for "us", the underdogs, the revolutionaries. And, in a way, I still do: I am on the board of Koumbit and work for a non-profit that has pivoted towards censorship and surveillance evasion. Yet I cannot help but think that, as a whole, we have failed to establish that independence and put too much trust in private companies. It is obvious in retrospect, but it was not, 30 years ago. Now, the infrastructure of the Internet has zero accountability to traditional political entities supposedly representing the people, or even its users. The situation is actually worse than when the US was founded (e.g. "6% of the population can vote"), because the owners of the tech giants are only a handful of people who can override any decision. There's only one Amazon CEO, he's called Jeff Bezos, and he has total control. (Update: Bezos actually ceded the CEO role to Andy Jassy, AWS and Amazon music founder, while remaining executive chairman. I would argue that, as the founder and the richest man on earth, he still has strong control over Amazon.)

Social Contract Here's another claim of the Declaration:
We are forming our own Social Contract.
I remember the early days, back when "netiquette" was a word; it did feel like we had some sort of a contract. Not written in standards of course -- or barely (see RFC1855) -- but as a tacit agreement. How wrong we were. One just needs to look at Facebook to see how problematic that idea is on a global network. Facebook is the quintessential "hacker" ideology put in practice. Mark Zuckerberg explicitly refused to be the "arbiter of truth", which implicitly means he will let lies take over its platforms. He also sees Facebook as a place where everyone is equal, something that echoes the Declaration:
We are creating a world that all may enter without privilege or prejudice accorded by race, economic power, military force, or station of birth.
(We note, in passing, the omission of gender in that list, also mirroring the infamous "All men are created equal" claim of the US declaration.) As the Wall Street Journal's (WSJ) Facebook files later showed, both of those "contracts" have serious limitations inside Facebook. There are VIPs, including fascists and rapists, who systematically bypass moderation systems. Drug cartels and human traffickers thrive on the platform. Even when Zuckerberg himself tried to tame the platform -- to get people vaccinated or to make it healthier -- he failed: "vaxxer" conspiracies multiplied and Facebook got angrier. This is because the "social contract" behind Facebook and those large companies is a lie: their concern is profit, and that means advertising and "engagement" with the platform, which causes increased anxiety and depression in teens, for example. Facebook's response to this is that they are working really hard on moderation. But the truth is that even that system is severely skewed. The WSJ showed that Facebook has translators for only 50 languages. It's surprisingly hard to count human languages, but estimates put the number of distinct languages between 2,500 and 7,000. So while 50 languages seems big at first, it actually covers only a tiny fraction of the human population using Facebook. Taking the first 50 of the Wikipedia list of languages by native speakers, we omit languages like Dutch (52), Greek (74), and Hungarian (78), and those are just a few random picks from European nations. As an example, Facebook has trouble moderating even a major language like Arabic. It censored content from legitimate Arab news sources when they mentioned the word al-Aqsa, because Facebook associates it with the al-Aqsa Martyrs' Brigades, when they were actually talking about the Al-Aqsa Mosque... This bias against Arabs also shows how Facebook reproduces American colonizer politics. The WSJ also pointed out that Facebook spends only 13% of its moderation efforts outside of the US, even though that represents 90% of its users. Facebook spends three times more moderating for "brand safety", which shows its priority is not the safety of its users, but of the advertisers.

Military Internet Sergey Brin and Larry Page are the Lewis and Clark of our generation. Just like the latter were sent by Jefferson (the same) to declare sovereignty over the entire US west coast, Google declared sovereignty over all human knowledge, with its mission statement "to organize the world's information and make it universally accessible and useful". (It should be noted that Page somewhat questioned that mission, but only because it was not ambitious enough, Google having "outgrown" it.) The Lewis and Clark expedition, just like Google, had a scientific pretext, because that is what you do to colonize a world, presumably. Yet both men were military and had to receive scientific training before they left. The Corps of Discovery was made up of a few dozen enlisted men and a dozen civilians, including York, an African-American slave owned by Clark, who was sold after the expedition, his final fate lost to history. And just like Lewis and Clark, Google has a strong military component. For example, Google Earth was not originally built at Google but came from the acquisition of a company called Keyhole, which had ties with the CIA. Those ties were brought inside Google during the acquisition. Google's increasing investment in the military-industrial complex eventually led to Google workers organizing a revolt, although it is currently unclear to me how much Google is involved in the military apparatus. Other companies, obviously, do not have such reservations, with Microsoft, Amazon, and plenty of others happily bidding on military contracts all the time.

Spreading the Internet I am obviously not the first to identify colonial structures in the Internet. In an article titled The Internet as an Extension of Colonialism, Heather McDonald correctly identifies fundamental problems with the "development" of new "markets" of Internet "consumers", primarily arguing that it creates a digital divide, which in turn creates a "lack of agency and individual freedom":
Many African people have gained access to these technologies but not the freedom to develop content such as web pages or social media platforms in their own way. Digital natives have much more power and therefore use this to create their own space with their own norms, shaping their online world according to their own outlook.
But the digital divide is certainly not the worst problem we have to deal with on the Internet today. Going back to the Declaration, we originally believed we were creating an entirely new world:
This governance will arise according to the conditions of our world, not yours. Our world is different.
How I dearly wished that was true. Unfortunately, the Internet is not that different from the offline world. Or, to be more accurate, the values we have embedded in the Internet, particularly free speech absolutism, sexism, corporatism, and exploitation, are now exploding outside of the Internet, into the "real" world. The Internet was built with free software which, fundamentally, was based on the quasi-volunteer labour of an elite force of white men with obviously too much time on their hands (and also: no children). The mythical writing of GCC and Emacs by Richard Stallman is a good example of this, but the entirety of the Internet now seems to be running on random bits and pieces built by hit-and-run programmers working in their copious free time. Whenever any of those fails, it can compromise or bring down entire systems. (Heck, I wrote this article on my day off...) This model of what is fundamentally "cheap labour" is spreading out from the Internet. Delivery workers are being exploited to the bone by apps like Uber -- although it should be noted that workers organise and fight back. Amazon workers are similarly exploited beyond belief, forbidden to take breaks until they pee in bottles, with ambulances nearby to carry out the bodies. During the peak of the pandemic, workers were being dangerously exposed to the virus in warehouses. All this while Amazon is basically taking over the entire economy. The Declaration culminates with this prophecy:
We will spread ourselves across the Planet so that no one can arrest our thoughts.
This prediction, which first felt revolutionary, is now chilling.

Colonial Internet The Internet is, if not neo-colonial, plain colonial. The US colonies had cotton fields and slaves; we have disposable cell phones and Foxconn workers. Canada has its cultural genocide; Facebook has its own genocides in Ethiopia and Myanmar, and mob violence in India. Apple is at least implicitly accepting the Uyghur genocide. And just like the slaves of the colony, those atrocities are what makes the empire run. The Declaration actually ends like this, a quote which I have in my fortune cookies file:
We will create a civilization of the Mind in Cyberspace. May it be more humane and fair than the world your governments have made before.
That is still inspiring to me. But if we want to make "cyberspace" more humane, we need to decolonize it. Work on cyberpeace instead of cyberwar. Establish clear code of conduct, discuss ethics, and question your own privileges, biases, and culture. For me the first step in decolonizing my own mind is writing this article. Breaking up tech monopolies might be an important step, but it won't be enough: we have to do a culture shift as well, and that's the hard part.

Appendix: an apology to Barlow I kind of feel bad going through Barlow's declaration like this, point by point. It is somewhat unfair, especially since Barlow passed away a few years ago and cannot mount a response (even humbly assuming that he might read this). But then again, he himself recognized he was a bit too "optimistic" in 2009, saying: "we all get older and smarter":
I'm an optimist. In order to be libertarian, you have to be an optimist. You have to have a benign view of human nature, to believe that human beings left to their own devices are basically good. But I'm not so sure about human institutions, and I think the real point of argument here is whether or not large corporations are human institutions or some other entity we need to be thinking about curtailing. Most libertarians are worried about government but not worried about business. I think we need to be worrying about business in exactly the same way we are worrying about government.
And, in a sense, it was a little naive to expect Barlow to not be a colonist. Barlow is, among many things, a cattle rancher who grew up on a colonial ranch in Wyoming. The ranch was founded in 1907 by his great uncle, 17 years after the state joined the Union, and only a generation or two after the Powder River War (1866-1868) and Black Hills War (1876-1877) during which the US took over lands occupied by Lakota, Cheyenne, Arapaho, and other native American nations, in some of the last major First Nations Wars.

Appendix: further reading There is another article that almost has the same title as this one: Facebook and the New Colonialism. (Interestingly, the <title> tag on the article is actually "Facebook the Colonial Empire" which I also find appropriate.) The article is worth reading in full, but I loved this quote so much that I couldn't resist reproducing it here:
Representations of colonialism have long been present in digital landscapes. ("Even Super Mario Brothers," the video game designer Steven Fox told me last year. "You run through the landscape, stomp on everything, and raise your flag at the end.") But web-based colonialism is not an abstraction. The online forces that shape a new kind of imperialism go beyond Facebook.
It goes on:
Consider, for example, digitization projects that focus primarily on English-language literature. If the web is meant to be humanity's new Library of Alexandria, a living repository for all of humanity's knowledge, this is a problem. So is the fact that the vast majority of Wikipedia pages are about a relatively tiny square of the planet. For instance, 14 percent of the world's population lives in Africa, but less than 3 percent of the world's geotagged Wikipedia articles originate there, according to a 2014 Oxford Internet Institute report.
And they introduce another definition of Neo-colonialism, while warning about abusing the word like I am sort of doing here:
"I'm loath to toss around words like colonialism but it's hard to ignore the family resemblances and recognizable DNA, to wit," said Deepika Bahri, an English professor at Emory University who focuses on postcolonial studies. In an email, Bahri summed up those similarities in list form:
  1. ride in like the savior
  2. bandy about words like equality, democracy, basic rights
  3. mask the long-term profit motive (see 2 above)
  4. justify the logic of partial dissemination as better than nothing
  5. partner with local elites and vested interests
  6. accuse the critics of ingratitude
In the end, she told me, "if it isn't a duck, it shouldn't quack like a duck."
Another good read is the classic Code and other laws of cyberspace (1999, free PDF) which is also critical of Barlow's Declaration. In "Code is law", Lawrence Lessig argues that:
computer code (or "West Coast Code", referring to Silicon Valley) regulates conduct in much the same way that legal code (or "East Coast Code", referring to Washington, D.C.) does (Wikipedia)
And now it feels like the west coast has won over the east coast, or maybe it recolonized it. In any case, the Internet now christens emperors.

20 October 2021

Arturo Borrero González: Iterating on how we do NFS at Wikimedia Cloud Services

This post was originally published in the Wikimedia Tech blog, authored by Arturo Borrero Gonzalez. NFS is a central piece of infrastructure that is essential to services like Toolforge. Recently, the Cloud Services team at Wikimedia had been reviewing how we do NFS.

The current situation NFS is a central piece of technology for some of the services that the Wikimedia Cloud Services team offers to the community. We have several shares that power different use cases: Toolforge user home directories live on NFS, and Cloud VPS users can also access dumps using this protocol. The current setup involves several physical hardware servers, with about 20TB of storage, offering shares over 10G links to the cloud. For the system to be more fault-tolerant, we duplicate each share for redundancy using DRBD. Running NFS on dedicated hardware servers has traditionally offered us advantages, mostly in the performance and capacity departments. As time has passed, we have been enumerating more and more reasons to review how we do NFS. For one, the current setup is in violation of some of our internal rules regarding realm separation. Additionally, we had been longing for additional flexibility in managing our servers: we wanted to use virtual machines managed by Openstack Nova. The DRBD-based high-availability system mostly required a hand-crafted procedure for failover/failback. There are also some scalability concerns, as NFS is easy to scale up but not to scale horizontally, and of course we have to be able to keep the tenancy setup while doing so, something that NFS does using LDAP/Unix users and which may also get complicated as we grow. In general, the servers have become "too big to fail", clearly technical debt, and it has taken us years to decide to take on the task of rethinking the architecture. It's worth mentioning that in an ideal world we wouldn't depend on NFS, but the truth is that it will still be a central piece of infrastructure for years to come in services like Toolforge. Over a series of brainstorming meetings, the WMCS team evaluated the situation and sorted out the many moving parts. The team managed to boil down the potential future of the service to two competing options, and we decided to research both options in parallel. For a number of reasons, the evaluation was timeboxed to three weeks. Both ideas had a couple of points in common: the NFS data would be stored on our Ceph farm via Cinder volumes, and we would rely on Ceph reliability to avoid using DRBD. Another open topic was how to back up data from Ceph, to store our important bits in more than one basket. We will get to the backup topic later.

The manila experiment The Wikimedia Foundation was an early adopter of some Openstack components (Nova, Glance, Designate, Horizon), but Manila was never evaluated for usage until now. Our approach for this experiment was to closely follow the upstream guidelines. We read the documentation and tried to understand the different setups you can build with Manila. As we often feel with other Openstack components, the documentation doesn't perfectly describe how to introduce a given component into your particular local setup. Here we use an admin-controller flat-topology Neutron network. This network is shared by all tenants (or projects) in our Openstack deployment. Also, Manila can use many different driver backends, for things like NetApps or CephFS that we don't use, yet. After some research, the generic driver was the one that seemed to fit our use case best. The generic driver leverages Nova virtual machine instances plus Cinder volumes to create and manage the shares. In general, Manila supports two operational modes, depending on whether it should create/destroy the share servers (i.e., the virtual machine instances) or not. This option is called driver_handles_share_server (or DHSS) and takes a boolean value. We were interested in trying DHSS=true, to really benefit from the potential of the setup. (Manila diagram, NFS idea 6; original image on Wikitech.) So, after sorting out all these variables, we moved on with our initial testing. We built a PoC setup as depicted in the diagram above, with the manila-share component running in a virtual machine inside the cloud. The PoC led to us reporting several bugs upstream, and in some cases we tried to address these bugs ourselves. It's worth mentioning that the upstream community was extra-welcoming to us, and we're thankful for that. However, at the end of our three-week period, our Manila setup still wasn't working as expected. Your experience may differ with other drivers, perhaps the ZFSonLinux or CephFS ones. In general, we were having trouble making the setup work as expected, so we decided to abandon this approach in favor of the other option we were considering at the beginning.

Simple virtual machine serving NFS The alternative was to create a Nova virtual machine instance by hand and to configure it using puppet. We have been investing in an automation framework lately, so the idea is to not actually create the server by hand. Anyway, the data would be decoupled from the instance into Cinder volumes, which led us to the question we left for later: how should we back up those terabytes of important information? Just to be clear, the backup problem was independent of the above options; with Manila we would still have had to solve the same challenge. We would like to see our data backed up somewhere else other than in Ceph. And that's exactly where we are at right now. We've been exploring different backup strategies and will finally use the Cinder backup API.

Conclusion The iteration will end with the dedicated NFS hardware servers being stopped, and the shares being served from within the cloud. The migration will take some time to happen because we will check and double-check that everything works as expected (including from the performance point of view) before making definitive changes. We already have some plans to make sure our users experience as little service impact as possible. The most troublesome shares will be those related to Toolforge. At some point we will need to disallow writes to the NFS share, rsync the data out of the hardware servers into the Cinder volumes, point the NFS clients to the new virtual machines, and then enable writes again. The main Toolforge share has about 8TB of data, so this will take a while.
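The copy step itself would boil down to something like this (a hedged sketch, not the actual migration commands: the paths, the hostname, and the choice of rsync flags are all assumptions, and the transfer would be repeated until a final pass with writes frozen):
# copy the share, preserving hard links, ACLs and xattrs, onto the new VM
rsync -aHAX --numeric-ids --delete --info=progress2 \
    /srv/tools/ nfs-vm.example.wmcloud.org:/srv/tools/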
We will have more updates in the future. Who knows, perhaps our next-next iteration, in a couple of years, will see us adopting Openstack Manila for good. Featured image credit: File:(from break water) Manila Skyline panoramio.jpg, ewol, CC BY-SA 3.0. This post was originally published in the Wikimedia Tech blog, authored by Arturo Borrero Gonzalez.

22 September 2021

Ian Jackson: Tricky compatibility issue - Rust's io::ErrorKind

This post is about some changes recently made to Rust's ErrorKind, which aims to categorise OS errors in a portable way.

Error handling principles Handling different errors differently is often important (although, sadly, often neglected). For example, if a program tries to read its default configuration file, and gets a "file not found" error, it can proceed with its default configuration, knowing that the user hasn't provided a specific config. If it gets some other error, it should probably complain and quit, printing the message from the error (and the filename). Otherwise, if the network fileserver is down (say), the program might erroneously run with the default configuration and do something entirely wrong.

Rust's portability aims The Rust programming language tries to make it straightforward to write portable code. Portable error handling is always a bit tricky. One of Rust's facilities in this area is std::io::ErrorKind, an enum which tries to categorise (and, sometimes, enumerate) OS errors. The idea is that a program can check the error kind and handle the error accordingly. Because these ErrorKinds are part of the Rust standard library, you don't need to delve down to the actual underlying operating system error number and write separate code for each platform you want to support: you can simply check whether the error is ErrorKind::NotFound (or whatever). Because ErrorKind is so important in many Rust APIs, some code which isn't really doing an OS call can still have to provide an ErrorKind. For this purpose, Rust provides a special category, ErrorKind::Other, which doesn't correspond to any particular OS error.

Rust's stability aims and approach Another thing Rust tries to do is keep existing code working. More specifically, Rust tries to:
  1. Avoid making changes which would contradict the previously-published documentation of Rust's language and features.
  2. Tell you if you accidentally rely on properties which are not part of the published documentation.
By and large, this has been very successful. It means that if you write code now, and it compiles and runs cleanly, it is quite likely that it will continue to work properly in the future, even as the language and ecosystem evolve. This blog post is about a case where Rust failed to do (2), above, and, sadly, it turned out that several people had accidentally relied on something the Rust project definitely intended to change. Furthermore, it was something which needed to change. And the new (corrected) way of using the API is not so obvious.

Rust enums, as relevant to io::ErrorKind (Very briefly:) When you have a value which is an io::ErrorKind, you can compare it with specific values:
    if error.kind() == ErrorKind::NotFound { ... }
But in Rust it's more usual to write something like this (which you can read like a switch statement):
    match error.kind() {
      ErrorKind::NotFound => use_default_configuration(),
      _ => panic!("could not read config file {}: {}", &file, &error),
    }
Here _ means "anything else". Rust insists that match statements are exhaustive, meaning that each one covers all the possibilities. So if you left out the line with the _, it wouldn't compile. Rust enums can also be marked non_exhaustive, which is a declaration by the API designer that they plan to add more kinds. This has been done for ErrorKind, so the _ is mandatory, even if you write out all the possibilities that exist right now: this ensures that if new ErrorKinds appear, they won't stop your code compiling.

Improving the error categorisation The set of error categories stabilised in Rust 1.0 was too small. It missed many important kinds of error. This makes writing error-handling code awkward. In any case, we expect to add new error categories occasionally. I set about trying to improve this by proposing new ErrorKinds. This obviously needed considerable community review, which is why it took about 9 months.

The trouble with Other and tests Rust has to assign an ErrorKind to every OS error, even ones it doesn't really know about. Until recently, it mapped all errors it didn't understand to ErrorKind::Other - reusing the category for "not an OS error at all". Serious people who write serious code like to have serious tests. In particular, testing error conditions is really important. For example, you might want to test your program's handling of disk full, to make sure it didn't crash, or corrupt files. You would set up some contraption that would simulate a full disk. And then, in your tests, you might check that the error was correct. But until very recently (still now, in Stable Rust), there was no ErrorKind::StorageFull. You would get ErrorKind::Other. If you were diligent you would dig out the OS error code (and check for ENOSPC on Unix, corresponding Windows errors, etc.). But that's tiresome. The more obvious thing to do is to check that the kind is Other. Obvious, but wrong: ErrorKind is non_exhaustive, implying that more error kinds will appear, and, naturally, these would more finely categorise previously-Other OS errors. Unfortunately, the documentation note
Errors that are Other now may move to a different or a new ErrorKind variant in the future.
was only added in May 2020. So the wrongness of the "obvious" approach was, itself, not very obvious. And even with that docs note, there was no compiler warning or anything. The unfortunate result is that there is a body of code out there in the world which might break any time an error that was previously Other becomes properly categorised. Furthermore, there was nothing stopping new people writing new obvious-but-wrong code.

Chosen solution: Uncategorized
The Rust developers wanted an engineered safeguard against the bug of assuming that a particular error shows up as Other. They chose the following solution: There is now a new ErrorKind::Uncategorized, which is used for all OS errors for which there isn't a more specific categorisation. The fallback translation of unknown errors was changed from Other to Uncategorized. This is de jure justified by the fact that this enum has always been marked non_exhaustive. But in practice, because this bug wasn't previously detected, there is such code in the wild. That code now breaks (usually, in the form of failing test cases). Usually when Rust starts to detect a particular programming error, it is reported as a new warning, which doesn't break anything. But that's not possible here, because this is a behavioural change. The new ErrorKind::Uncategorized is marked unstable. This makes it impossible to write code on Stable Rust which insists that an error comes out as Uncategorized. So, one cannot now write code that will break when new ErrorKinds are added. That's the intended effect. The downside is that this does break old code, and, worse, it is not as clear as it should be what the fixed code looks like.

Alternatives considered and rejected by the Rust developers
Not adding more ErrorKinds: This was not tenable. The existing set is already too small, and error categorisation is in any case expected to improve over time.
Just adding ErrorKinds as had been done before: This would mean occasionally breaking test cases (or, possibly, production code) when an error that was previously Other becomes categorised. The broken code would have been "obvious", but de jure wrong, just as it is now. So this option amounts to expecting this broken code to continue to be written and continuing to break it occasionally.
Somehow using Rust's Edition system: The Rust language has a system to allow language evolution, where code declares its Edition (2015, 2018, 2021). Code from multiple editions can be combined, so that the ecosystem can upgrade gradually. It's not clear how this could be used for ErrorKind, though. Errors have to be passed between code with different editions. If those different editions had different categorisations, the resulting programs would have incoherent and broken error handling. Also, some of the schemes for making this change would mean that new ErrorKinds could only be stabilised about once every 3 years, which is far too slow.

How to fix code broken by this change
Most main-line error handling code already has a fallback case for unknown errors. Simply replacing any occurrence of Other with _ is right.

How to fix thorough tests
The tricky problem is tests. Typically, a thorough test case wants to check that the error is "precisely as expected" (as far as the test can tell). Now that unknown errors come out as an unstable Uncategorized variant, that's not so easy. If the test is expecting an error that is currently not categorised, you want to write code that says "if the error is any of the recognised kinds, call it a test failure".
What does "any of the recognised kinds" mean here ? It doesn't meany any of the kinds recognised by the version of the Rust stdlib that is actually in use. That set might get bigger. When the test is compiled and run later, perhaps years later, the error in this test case might indeed be categorised. What you actually mean is "the error must not be any of the kinds which existed when the test was written". IMO therefore the right solution for such a test case is to cut and paste the current list of stable ErrorKinds into your code. This will seem wrong at first glance, because the list in your code and in Rust can get out of step. But when they do get out of step you want your version, not the stdlib's. So freezing the list at a point in time is precisely right. You probably only want to maintain one copy of this list, so put it somewhere central in your codebase's test support machinery. Periodically, you can update the list deliberately - and fix any resulting test failures. Unfortunately this approach is not suggested by the documentation. In theory you could work all this out yourself from first principles, given even the situation prior to May 2020, but it seems unlikely that many people have done so. In particular, cutting and pasting the list of recognised errors would seem very unnatural. Conclusions This was not an easy problem to solve well. I think Rust has done a plausible job given the various constraints, and the result is technically good. It is a shame that this change to make the error handling stability more correct caused the most trouble for the most careful people who write the most thorough tests. I also think the docs could be improved.
edited shortly after posting, and again 2021-09-22 16:11 UTC, to fix HTML slips



9 September 2021

Bits from Debian: DebConf21 online closes

DebConf21 group photo On Saturday 28 August 2021, the annual Debian Developers and Contributors Conference came to a close. DebConf21 has been held online for the second time, due to the coronavirus (COVID-19) disease pandemic. All of the sessions have been streamed, with a variety of ways of participating: via IRC messaging, online collaborative text documents, and video conferencing meeting rooms. With 740 registered attendees from more than 15 different countries and a total of over 70 event talks, discussion sessions, Birds of a Feather (BoF) gatherings and other activities, DebConf21 was a large success. The setup made for previous online events, involving Jitsi, OBS, Voctomix, SReview, nginx, Etherpad and a web-based frontend for Voctomix, has been improved and used for DebConf21 successfully. All components of the video infrastructure are free software, and configured through the Video Team's public ansible repository. The DebConf21 schedule included a wide variety of events, grouped in several tracks: The talks have been streamed using two rooms, and several of these activities have been held in different languages: Telugu, Portuguese, Malayalam, Kannada, Hindi, Marathi and English, allowing a more diverse audience to enjoy and participate. Between talks, the video stream has been showing the usual sponsors on a loop, but also some additional clips including photos from previous DebConfs, fun facts about Debian and short shout-out videos sent by attendees to communicate with their Debian friends. The Debian publicity team did the usual live coverage to encourage participation with micronews announcing the different events. The DebConf team also provided several mobile options to follow the schedule. For those who were not able to participate, most of the talks and sessions are already available through the Debian meetings archive website, and the remaining ones will appear in the following days. The DebConf21 website will remain active for archival purposes and will continue to offer links to the presentations and videos of talks and events. Next year, DebConf22 is planned to be held in Prizren, Kosovo, in July 2022. DebConf is committed to a safe and welcoming environment for all participants. During the conference, several teams (Front Desk, Welcome team and Community team) have been available to help participants get the best experience at the conference, and find solutions to any issue that may arise. See the web page about the Code of Conduct on the DebConf21 website for more details on this. Debian thanks the commitment of numerous sponsors to support DebConf21, particularly our Platinum Sponsors: Lenovo, Infomaniak, Roche, Amazon Web Services (AWS) and Google. About Debian The Debian Project was founded in 1993 by Ian Murdock to be a truly free community project. Since then the project has grown to be one of the largest and most influential open source projects. Thousands of volunteers from all over the world work together to create and maintain Debian software. Available in 70 languages, and supporting a huge range of computer types, Debian calls itself the universal operating system. About DebConf DebConf is the Debian Project's developer conference. In addition to a full schedule of technical, social and policy talks, DebConf provides an opportunity for developers, contributors and other interested people to meet in person and work together more closely. 
It has taken place annually since 2000 in locations as varied as Scotland, Argentina, and Bosnia and Herzegovina. More information about DebConf is available from https://debconf.org/. About Lenovo As a global technology leader manufacturing a wide portfolio of connected products, including smartphones, tablets, PCs and workstations as well as AR/VR devices, smart home/office and data center solutions, Lenovo understands how critical open systems and platforms are to a connected world. About Infomaniak Infomaniak is Switzerland's largest web-hosting company, also offering backup and storage services, solutions for event organizers, live-streaming and video on demand services. It wholly owns its datacenters and all elements critical to the functioning of the services and products provided by the company (both software and hardware). About Roche Roche is a major international pharmaceutical provider and research company dedicated to personalized healthcare. More than 100.000 employees worldwide work towards solving some of the greatest challenges for humanity using science and technology. Roche is strongly involved in publicly funded collaborative research projects with other industrial and academic partners and have supported DebConf since 2017. About Amazon Web Services (AWS) Amazon Web Services (AWS) is one of the world's most comprehensive and broadly adopted cloud platform, offering over 175 fully featured services from data centers globally (in 77 Availability Zones within 24 geographic regions). AWS customers include the fastest-growing startups, largest enterprises and leading government agencies. About Google Google is one of the largest technology companies in the world, providing a wide range of Internet-related services and products such as online advertising technologies, search, cloud computing, software, and hardware. Google has been supporting Debian by sponsoring DebConf for more than ten years, and is also a Debian partner sponsoring parts of Salsa's continuous integration infrastructure within Google Cloud Platform. Contact Information For further information, please visit the DebConf21 web page at https://debconf21.debconf.org/ or send mail to press@debian.org.

7 September 2021

Russell Coker: Oracle Cloud Free Tier

It seems that every cloud service of note has a free tier nowadays and the Oracle Cloud is the latest that I've discovered (thanks to r/homelab which I highly recommend reading). Here's Oracle's summary of what they offer for free [1]. Oracle's always free tier (where presumably "always" is defined as "until we change our contract") currently offers ARM64 VMs to a total capacity of 4 CPU cores, 24G of RAM, and 200G of storage with a default VM size of 1/4 that (1 CPU core and 6G of RAM). It also includes 2 AMD64 VMs that each have 1G of RAM, but a 64bit VM with 1G of RAM isn't that useful nowadays.

Web Interface
The first thing to note is that the management interface is a massive pain to use. When a login times out for security reasons it redirects to a web page that gives a 404 error; maybe the redirection works OK if you are using it when it times out, but if you go off and spend an hour doing something else you will return to a 404 page. A web interface should never refer you to a page with a 404. There doesn't seem to be a way of bookmarking the commonly used links (as AWS does) and the set of links on the left depend on the section you are in with no obvious way of going between sections. Sometimes I got stuck in a set of pages about authentication controls (the "identity cloud") and there seems to be no link I could click on to get me back to cloud computing; I had to go to a bookmarked link for the main cloud login page. A web interface should never force the user to type in the main URL or go to a bookmark, you should be able to navigate from every page to every other page in a logical manner. An advanced user might have their own bookmarks in their browser to suit their workflow. But a beginner should be able to go anywhere without breaking the session. Some parts of the interface appear to be copied from AWS, but unfortunately not the good parts. The way AWS manages IP access control is not easy to manage and it's not clear why packets are dropped; Oracle copies all this. On the upside Oracle has some good Datadog style analytics so for a new deployment you can debug IP access control by seeing records of rejected packets. Just to make it extra annoying, when you create a rule with multiple ports specified the web interface will expand it out to multiple rules for one port each; having ports 80 and 443 on separate lines doesn't make things easier. Also it forces you to have IPv4 and IPv6 as separate rules, so if you want HTTP and HTTPS on both IPv4 and IPv6 (a common requirement) then you need 4 separate rules. One final annoying thing is that the web interface doesn't make your previous settings a default. As I've created many ARM images and haven't created a single AMD image it should know that the probability that I want to create an AMD image is very low and stop defaulting to that.

Recovery
When trying a new system you will inevitably break things and have to recover things. The way to recover from a configuration error that prevents your VM from booting and getting to a state of allowing a login is to stop the VM, then go to the Boot volume section under Resources and use the settings button to detach the boot volume. Then you go to another VM (which must be running), go to the Attached block volumes menu and attach it as Paravirtualised (not iSCSI and not default which will probably be iSCSI). After some time the block device will appear and you can mount it and do stuff to it. 
Then after umounting it you detach it from the recovery VM and attach it again to the original VM (where it will still have an entry in the Boot volume section) and boot the original VM. As an aside, it's really annoying that you can't attach a volume to a VM that isn't running. My first attempt at image recovery started with making a snapshot of the Boot volume; this didn't work well because the image uses EFI and therefore GPT and because the snapshot was larger than the original block device (which incidentally was the default size). I admit that I might have made a mistake making the snapshot, but if so it shouldn't be so easy to do. With GPT, if you have a larger block device then partitioning tools complain about the backup partition table not being found, and they complain even more if you try to go back to the smaller size later on. Generally GPT partition tables are a bad idea for VMs; when I run the host I don't use partition tables, I have a separate block device for each filesystem or swap space. Snapshots aren't needed for recovery, they don't seem to work very well, and if it's possible to attach a snapshot to a VM in place of its original Boot volume I haven't figured out how to do it.

Console Connection
If you boot Oracle Linux, a derivative of RHEL that has SE Linux enabled in enforcing mode (yay), then you can go to the "Console connection". The console is a Javascript console which allows you to login on a virtual serial console on device /dev/ttyAMA0. It tells you to type "help" but that isn't accepted; you have a straight Linux console login prompt. If you boot Ubuntu then you don't get a working serial console: it tells you to type "help" for help but doesn't respond to that. It seems that the Oracle Linux kernel 5.4.17-2102.204.4.4.el7uek.aarch64 is compiled with support for /dev/ttyAMA0 (the default ARM serial device) while the kernel 5.11.0-1016-oracle compiled by Oracle for their Ubuntu VMs doesn't have it.

Performance
I haven't done any detailed tests of VM performance. As a quick test I used zstd to compress a 154MB file; on my home workstation (E5-2620 v4 @ 2.10GHz) it took 11.3 seconds of CPU time to compress with zstd -9 and 7.2s to decompress. On the Oracle cloud it took 7.2s and 5.4s. So it seems that for some single core operations the ARM CPU used by the Oracle cloud is about 30% to 50% faster than an E5-2620 v4 (a slightly out of date server processor that uses DDR4 RAM). If you ran all the free resources in a single VM that would make a respectable build server. If you want to contribute to free software development and only have a laptop with 4G of RAM then an ARM build/test server with 24G of RAM and 4 cores would be very useful.

Ubuntu Configuration
The advantage of using EFI is that you can manage the kernel from within the VM. The default Oracle kernel for Ubuntu has a lot of modules included and is compiled with a lot of security options including SE Linux.

Competitors
https://aws.amazon.com/free AWS offers 750 hours (just over 31 days) per month of free usage of a t2.micro or t3.micro EC2 instance (which means 1GB of RAM). But that only lasts for 12 months and it's still only 1GB of RAM. AWS has some other things that could be useful like 1 million free Lambda requests per month. If you want to run your personal web site on Lambda you shouldn't hit that limit. They also apparently have some good offers for students. https://cloud.google.com/free The Google Cloud Platform (GCP) offers $300 of credit. 
https://cloud.google.com/free/docs/gcp-free-tier#free-tier-usage-limits GCP also has ongoing free tier usage for some services. Some of them are pretty much unlimited use (50GB of storage for Cloud Source Repositories is a heap of source code). But for VMs you get the equivalent of one e2-micro instance running 24*7. An e2-micro has 1G of RAM. You also only get 30G of storage and 1GB of outbound data. It's clearly not as generous an offer as Oracle, but Oracle is the underdog so they have to try harder. https://azure.microsoft.com/en-us/free/ Azure appears to be much the same as AWS: free Linux VM for a year and then other less popular services free forever (or until they change the contract). https://www.ibm.com/cloud/free The IBM cloud free tier is the least generous offer, as a VM is only free for 30 days. But what they offer for 30 days is pretty decent. If you want to try the IBM cloud and see if it can do what your company needs then this will do well. If you want to have free hosting for your hobby stuff then it's no good. Oracle seems like the most generous offer if you want to do stuff, but also one of the least valuable if you want to learn things that will help you at a job interview. For job interviews AWS seems the most useful, with GCP and Azure vying for second place.

6 September 2021

Vincent Bernat: Switching to the i3 window manager

I have been using the awesome window manager for 10 years. It is a tiling window manager, configurable and extendable with the Lua language. Using a general-purpose programming language to configure every aspect is a double-edged sword. Due to laziness and the apparent difficulty of adapting my configuration (about 3,000 lines) to newer releases, I was stuck with the 3.4 version, whose last release is from 2013. It was time for a rewrite. Instead, I have switched to the i3 window manager, lured by the possibility of migrating to Wayland and Sway later with minimal pain. Using an embedded interpreter for configuration is not as important to me as it was in the past: it brings both complexity and brittleness.
i3 dual screen setup
Dual screen desktop running i3, Emacs, some terminals, including a Quake console, Firefox, Polybar as the status bar, and Dunst as the notification daemon.
The window manager is only one part of a desktop environment. There are several options for the other components. I am also introducing them in this post.

i3: the window manager i3 aims to be a minimal tiling window manager. Its documentation can be read from top to bottom in less than an hour. i3 organizes windows in a tree. Each non-leaf node contains one or several windows and has an orientation and a layout. This information arbitrates the window positions. i3 features three layouts: split, stacking, and tabbed. They are demonstrated in the below screenshot:
Example of layouts
Demonstration of the layouts available in i3. The main container is split horizontally. The first child is split vertically. The second one is tabbed. The last one is stacking.
Tree representation of the previous screenshot
Tree representation of the previous screenshot.
Most of the other tiling window managers, including the awesome window manager, use predefined layouts. They usually feature a large area for the main window and another area divided among the remaining windows. These layouts can be tuned a bit, but you mostly stick to a couple of them. When a new window is added, the behavior is quite predictable. Moreover, you can cycle through the various windows without thinking too much as they are ordered. As i3 is more flexible, with its ability to build any layout on the fly, it can feel quite overwhelming: you need to visualize the tree in your head. At first, it is not unusual to find yourself with a complex tree with many useless nested containers. Moreover, you have to navigate windows using directions. It takes some time to get used to. I set up a split layout for Emacs and a few terminals, but most of the other workspaces are using a tabbed layout. I don't use the stacking layout. You can find many scripts trying to emulate other tiling window managers, but I tried to keep my setup free of these attempts and give myself a chance to familiarize myself with i3. i3 can also save and restore layouts, which is quite a powerful feature. My configuration is quite similar to the default one and has less than 200 lines.
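To make the tree model more concrete, here is a small sketch (not from the original post) that walks the layout tree with the i3ipc-python library, which is introduced later in this article, and prints each container's layout and each window's title:

import i3ipc


def dump(node, depth=0):
    # Containers are printed with their layout; leaf nodes with their window title.
    label = node.name if node.name else "[%s]" % node.layout
    print("  " * depth + label)
    for child in node.nodes:
        dump(child, depth + 1)


i3 = i3ipc.Connection()
dump(i3.get_tree())

Run against the screenshot above, this would print a hierarchy similar to the tree representation shown below it.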

i3 companion: the missing bits i3's philosophy is to keep a minimal core and let the user implement missing features using the IPC protocol:
Do not add further complexity when it can be avoided. We are generally happy with the feature set of i3 and instead focus on fixing bugs and maintaining it for stability. New features will therefore only be considered if the benefit outweighs the additional complexity, and we encourage users to implement features using the IPC whenever possible.
Introduction to the i3 window manager
While this is not as powerful as an embedded language, it is enough for many cases. Moreover, as high-level features may be opinionated, delegating them to small, loosely coupled pieces of code keeps them more maintainable. Libraries exist for this purpose in several languages. Users have published many scripts to extend i3: automatic layout and window promotion to mimic the behavior of other tiling window managers, window swallowing to put a new app on top of the terminal launching it, and cycling between windows with Alt+Tab. Instead of maintaining a script for each feature, I have centralized everything into a single Python process, i3-companion using asyncio and the i3ipc-python library. Each feature is self-contained into a function. It implements the following components:
make a workspace exclusive to an application
When a workspace contains Emacs or Firefox, I would like other applications to move to another workspace, except for the terminal which is allowed to intrude into any workspace. The workspace_exclusive() function monitors new windows and moves them if needed to an empty workspace or to one with the same application already running.
implement a Quake console
The quake_console() function implements a drop-down console available from any workspace. It can be toggled with Mod+ . This is implemented as a scratchpad window.
back and forth workspace switching on the same output
With the workspace back_and_forth command, we can ask i3 to switch to the previous workspace. However, this feature is not restricted to the current output. I prefer to have one keybinding to switch to the workspace on the next output and one keybinding to switch to the previous workspace on the same output. This behavior is implemented in the previous_workspace() function by keeping a per-output history of the focused workspaces.
create a new empty workspace or move a window to an empty workspace
To create a new empty workspace or move a window to an empty workspace, you have to locate a free slot and use workspace number 4 or move container to workspace number 4. The new_workspace() function finds a free number and uses it as the target workspace.
restart some services on output change
When adding or removing an output, some actions need to be executed: refresh the wallpaper, restart some components unable to adapt their configuration on their own, etc. i3 triggers an event for this purpose. The output_update() function also takes an extra step to coalesce multiple consecutive events and to check if there is a real change with the low-level library xcffib.
I will detail the other features as this post goes on. On the technical side, each function is decorated with the events it should react to:
@on(CommandEvent("previous-workspace"), I3Event.WORKSPACE_FOCUS)
async def previous_workspace(i3, event):
    """Go to previous workspace on the same output."""
The CommandEvent() event class is my way to send a command to the companion, using either i3-msg -t send_tick or binding a key to a nop command. The latter is used to avoid spawning a shell and an i3-msg process just to send a message. The companion listens to binding events and checks if this is a nop command.
bindsym $mod+Tab nop "previous-workspace"
There are other decorators to avoid code duplication: @debounce() to coalesce multiple consecutive calls, @static() to define a static variable, and @retry() to retry a function on failure. The whole script is a bit more than 1000 lines. I think this is worth a read as I am quite happy with the result.
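For the curious, here is one way such a @debounce() decorator could be written; this is a generic sketch of the pattern, not the author's implementation:

import asyncio
import functools


def debounce(delay):
    """Coalesce bursts of calls: only run `delay` seconds after the last one."""
    def decorator(func):
        pending = None

        @functools.wraps(func)
        async def wrapper(*args, **kwargs):
            nonlocal pending
            if pending is not None and not pending.done():
                pending.cancel()  # drop the previously scheduled run

            async def delayed():
                await asyncio.sleep(delay)
                await func(*args, **kwargs)

            pending = asyncio.ensure_future(delayed())
        return wrapper
    return decorator


@debounce(0.2)
async def output_update(i3, event):
    print("outputs changed, refreshing wallpaper and restarting components")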

dunst: the notification daemon Unlike the awesome window manager, i3 does not come with a built-in notification system. Dunst is a lightweight notification daemon. I am running a modified version with HiDPI support for X11 and recursive icon lookup. The i3 companion has a helper function, notify(), to send notifications using DBus. container_info() and workspace_info() use it to display information about the container or the tree for a workspace.
Notification showing i3 tree for a workspace
Notification showing i3's tree for a workspace
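The post does not say how notify() talks to DBus; as a rough sketch of the idea, here is a minimal helper built on the dbus_next library (an assumed choice) calling the standard org.freedesktop.Notifications.Notify method:

import asyncio
from dbus_next.aio import MessageBus

NOTIF = "org.freedesktop.Notifications"
PATH = "/org/freedesktop/Notifications"


async def notify(summary, body="", timeout_ms=5000):
    bus = await MessageBus().connect()  # session bus by default
    introspection = await bus.introspect(NOTIF, PATH)
    proxy = bus.get_proxy_object(NOTIF, PATH, introspection)
    notifications = proxy.get_interface(NOTIF)
    # Notify(app_name, replaces_id, app_icon, summary, body, actions, hints, timeout)
    return await notifications.call_notify(
        "i3-companion", 0, "", summary, body, [], {}, timeout_ms)

asyncio.run(notify("i3", "workspace tree information would go here"))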

polybar: the status bar i3 bundles i3bar, a versatile status bar, but I have opted for Polybar. A wrapper script runs one instance for each monitor. The first module is the built-in support for i3 workspaces. To not have to remember which application is running in a workspace, the i3 companion renames workspaces to include an icon for each application. This is done in the workspace_rename() function. The icons are from the Font Awesome project. I maintain a mapping between applications and icons. This is a bit cumbersome but it looks great.
i3 workspaces in Polybar
i3 workspaces in Polybar
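A sketch of what such a renaming function could look like with i3ipc-python; the window-class-to-icon mapping and the glyphs below are placeholders, not the author's Font Awesome table:

import i3ipc

ICONS = {"firefox": "\uf269", "emacs": "\uf121", "alacritty": "\uf120"}


def workspace_rename(i3, event=None):
    for workspace in i3.get_tree().workspaces():
        classes = {leaf.window_class.lower()
                   for leaf in workspace.leaves() if leaf.window_class}
        icons = " ".join(ICONS[c] for c in sorted(classes) if c in ICONS)
        new = f"{workspace.num}: {icons}" if icons else str(workspace.num)
        if workspace.name != new:
            i3.command(f'rename workspace "{workspace.name}" to "{new}"')


i3 = i3ipc.Connection()
workspace_rename(i3)
i3.on("window::new", lambda conn, event: workspace_rename(conn))
i3.main()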
For CPU, memory, brightness, battery, disk, and audio volume, I am relying on the built-in modules. Polybar's wrapper script generates the list of filesystems to monitor and they only get displayed when available space is low. The battery widget turns red and blinks slowly when running out of power. Check my Polybar configuration for more details.
Various modules for Polybar
Polybar displaying various information: CPU usage, memory usage, screen brightness, battery status, Bluetooth status (with a connected headset), network status (connected to a wireless network and to a VPN), notification status, and speaker volume.
For Bluetooth, network, and notification statuses, I am using Polybar's ipc module: the next version of Polybar can receive an arbitrary text on an IPC socket. The module is defined with a single hook to be executed at the start to restore the latest status.
[module/network]
type = custom/ipc
hook-0 = cat $XDG_RUNTIME_DIR/i3/network.txt 2> /dev/null
initial = 1
It can be updated with polybar-msg action "#network.send.XXXX". In the i3 companion, the @polybar() decorator takes the string returned by a function and pushes the update through the IPC socket. The i3 companion reacts to DBus signals to update the Bluetooth and network icons. The @on() decorator accepts a DBusSignal() object:
@on(
    StartEvent,
    DBusSignal(
        path="/org/bluez",
        interface="org.freedesktop.DBus.Properties",
        member="PropertiesChanged",
        signature="sa sv as",
        onlyif=lambda args: (
            args[0] == "org.bluez.Device1"
            and "Connected" in args[1]
            or args[0] == "org.bluez.Adapter1"
            and "Powered" in args[1]
        ),
    ),
)
@retry(2)
@debounce(0.2)
@polybar("bluetooth")
async def bluetooth_status(i3, event, *args):
    """Update bluetooth status for Polybar."""
The middle of the bar is occupied by the date and a weather forecast. The latter also uses the IPC mechanism, but the source is a Python script triggered by a timer.
Date and weather in Polybar
Current date and weather forecast for the day in Polybar. The data is retrieved with the OpenWeather API.
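A minimal sketch of such a timer-driven script, using OpenWeather's public current-weather endpoint; the city and API key are placeholders, and the resulting string would be handed to the bar through the same IPC mechanism described above:

import os
import requests

API = "https://api.openweathermap.org/data/2.5/weather"


def weather_text(city="Paris"):
    # For this sketch, the API key is expected in the environment.
    params = {"q": city, "appid": os.environ["OPENWEATHER_KEY"], "units": "metric"}
    data = requests.get(API, params=params, timeout=10).json()
    return f"{data['weather'][0]['description']} {round(data['main']['temp'])}°C"


print(weather_text())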
I don't use the system tray integrated with Polybar. The embedded icons usually look horrible and they all behave differently. A few years back, Gnome removed the system tray. Most of the problems are fixed by the DBus-based Status Notifier Item protocol, also known as Application Indicators or Ayatana Indicators for GNOME. However, Polybar does not support this protocol. In the i3 companion, the implementation of Bluetooth and network icons, including displaying notifications on change, takes about 200 lines. I got to learn a bit about how DBus works and I get exactly the info I want.

picom: the compositor I like having slightly transparent backgrounds for terminals and to reduce the opacity of unfocused windows. This requires a compositor.1 picom is a lightweight compositor. It works well for me, but it may need some tweaking depending on your graphic card.2 Unlike the awesome window manager, i3 does not handle transparency, so the compositor needs to decide by itself the opacity of each window. Check my configuration for details.

systemd: the service manager I use systemd to start i3 and the various services around it. My xsession script only sets some environment variables and lets systemd handle everything else. Have a look at this article from Michał Góral for the rationale. Notably, each component can be easily restarted and their logs are not mangled inside the ~/.xsession-errors file.3 I am using a two-stage setup: i3.service depends on xsession.target to start services before i3:
[Unit]
Description=X session
BindsTo=graphical-session.target
Wants=autorandr.service
Wants=dunst.socket
Wants=inputplug.service
Wants=picom.service
Wants=pulseaudio.socket
Wants=policykit-agent.service
Wants=redshift.service
Wants=spotify-clean.timer
Wants=ssh-agent.service
Wants=xiccd.service
Wants=xsettingsd.service
Wants=xss-lock.service
Then, i3 executes the second stage by invoking the i3-session.target:
[Unit]
Description=i3 session
BindsTo=graphical-session.target
Wants=wallpaper.service
Wants=wallpaper.timer
Wants=polybar-weather.service
Wants=polybar-weather.timer
Wants=polybar.service
Wants=i3-companion.service
Wants=misc-x.service
Have a look at my configuration files for more details.

rofi: the application launcher Rofi is an application launcher. Its appearance can be customized through a CSS-like language and it comes with several themes. Have a look at my configuration for mine.
Rofi as an application launcher
Rofi as an application launcher
It can also act as a generic menu application. I have a script to control a media player and another one to select the wifi network. It is quite a flexible application.
Rofi as a wifi network selector
Rofi to select a wireless network

xss-lock and i3lock: the screen locker i3lock is a simple screen locker. xss-lock invokes it reliably on inactivity or before a system suspend. For inactivity, it uses the XScreenSaver events. The delay is configured using the xset s command. The locker can be invoked immediately with xset s activate. X11 applications know how to prevent the screen saver from running. I have also developed a small dimmer application that is executed 20 seconds before the locker to give me a chance to move the mouse if I am not away.4 Have a look at my configuration script.
Demonstration of xss-lock, xss-dimmer and i3lock with a 4x speedup.

The remaining components
  • autorandr is a tool to detect the connected display, match them against a set of profiles, and configure them with xrandr.
  • inputplug executes a script for each new mouse and keyboard plugged in. This is quite useful to load the appropriate keyboard map. See my configuration.
  • xsettingsd provides settings to X11 applications, not unlike xrdb but it notifies applications for changes. The main use is to configure the Gtk and DPI settings. See my article on HiDPI support on Linux with X11.
  • Redshift adjusts the color temperature of the screen according to the time of day.
  • maim is a utility to take screenshots. I use Prt Scn to trigger a screenshot of a window or a specific area and Mod+Prt Scn to capture the whole desktop to a file. Check the helper script for details.
  • I have a collection of wallpapers I rotate every hour. A script selects them using advanced machine learning algorithms and stitches them together on multi-screen setups. The selected wallpaper is reused by i3lock.

  1. Apart from the eye candy, a compositor also helps to get tear-free video playbacks.
  2. My configuration works with both Haswell (2014) and Whiskey Lake (2018) Intel GPUs. It also works with AMD GPU based on the Polaris chipset (2017).
  3. You cannot manage two different displays this way e.g. :0 and :1. In the first implementation, I did try to parametrize each service with the associated display, but this is useless: there is only one DBus user session and many services rely on it. For example, you cannot run two notification daemons.
  4. I have only discovered later that XSecureLock ships such a dimmer with a similar implementation. But mine has a cool countdown!

2 September 2021

Eddy Petrișor: Stretch to Buster upgrade issues: "Grub error: symbol grub_is_lockdown not found", missing RTL8111/8168/8411 Ethernet driver and RTL8821CE Wireless adapter on Linux Kernel 5.10 (and 4.19)

I have been running Debian Stretch on my HP Pavilion 14-ce0000nq laptop since buying it back in April 2019, just before my presence at Oxidizeconf where I presented "How to Rust When Standards Are Defined in C". Debian Buster (aka Debian 10) was released about 4 months later and I've been postponing the upgrade as my free time isn't what it used to be. I also tend to wait for the first or even second update of the release to avoid any sharp edges. As this laptop has a Realtek 8821CE wireless card that wasn't officially supported in the Linux kernel, I had to use an out-of-tree hacked driver to have the wireless work on Stretch kernels such as 4.19; it didn't even get along with DKMS, so I did all compilations and installations of it manually. More reason to wait for a newer release that would contain a driver inside the official kernel. I was waiting for the inevitable and dreading the wireless issues, but since mid-August Bullseye became stable, turning Stretch into oldoldstable, I decided that I had to do the upgrade, at least to Buster.

The Grub error and the fix
Everything went quite smooth, except that after the reboot, the laptop failed to boot with this Grub error:

error: symbol grub_is_lockdown not found
I looked for a solution and it seemed everyone was stuck or the solution was unclear. There is even a bug report in Debian about this error, bug #984760.
Adding to the pile of confusion, my own confused solution: I tried supergrubdisk2/rescatux, but it didn't work for me; it might have been a combination of me using LVM and grub-efi-amd64. I also tried to boot the first Buster DVD in rescue mode (to avoid the need for network). I was able to enter the partition and mount the EFI partition, too, but since I didn't want to mess up the setup even more or depend on an external USB stick, I didn't know where I should try to write the Grub EFI config - the root partition is on NVMe storage. When buying the laptop it had FreeDOS installed on it and some HP rescue app, which I did not wipe when installing Debian. I even forgot where or how the EFI was installed on the disk, and even though EFI should be more reliable and simpler, I never got the hang of it.
In the end, I realized that I could manually select via the BIOS which EFI executable should be booted, so with some manual intervention during boot I was able to get into the regular system. I tried regenerating the grub configuration, installing it, and also tried restoring the default proper boot sequence (I even installed refind in the system during my fumbling), but I think somewhere between the grub-efi-amd64 reconfiguration and its reinstallation I managed to do the right thing, as the default boot screen is the Grub one now. Hints for anyone reading this in the hope of fixing the same issue - hopefully it will make things better, not worse (see the text below): 1) regenerate the grub config:
update-grub2
2) reinstall grub-efi-amd64 and make Debian the default
dpkg-reconfigure -plow grub-efi-amd64
When reinstalling grub-efi-amd64 onto the disk, I think the scariest questions were these:

Force extra installation to the EFI removable media path?

Some EFI-based systems are buggy and do not handle new bootloaders correctly. If you force an extra installation of GRUB to the EFI removable media path, this should ensure that this system will boot Debian correctly despite such a problem. However, it may remove the ability to boot any other operating systems that also depend on this path. If so, you will need to make sure that GRUB is configured successfully to be able to boot any other OS installations correctly.

and
Update NVRAM variables to automatically boot into Debian?

GRUB can configure your platform's NVRAM variables so that it boots into Debian automatically when powered on. However, you may prefer to disable this behavior and avoid changes to your boot configuration. For example, if your NVRAM variables have been set up such that your system contacts a PXE server on every boot, this would preserve that behavior.

I think the first can be safely answered "No" if you don't plan on booting via a removable USB stick, while the second is the one that does the restoring. The second question is probably safe if you don't use PXE boot or another boot method, at least that's what I understand. But if you do, I suspect that by installing refind or by playing with the multiple efi*-named packages and tools you can restore that, or it might be that your BIOS allows that directly.
I just did a walk through of these 2 steps again on my laptop and answered "No" to the removable media question as it leads to errors when the media was not inserted (in my case the internal SD card reader), and "Yes" to making Debian the default.It seems that for me this broke the FreeDOS and HP utilities boot entries from Grub, but I still can boot via the BIOS options and my goal was to have Debian boot correctly by default.
Fixing the missing RTL8111/8168/8411 Ethernet card issue
As a side note for people with computers having a Realtek RTL8111/8168/8411 Gigabit Ethernet Controller and upgrading to Buster or switching to a newer kernel, please note that you might end up with the unpleasant surprise of even your Ethernet card disappearing, because the r8169 driver is not loaded by default. I had to add it to /etc/modules so it is loaded by default:
eddy@aptonia:/ $ cat /etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
r8169

The 5.10 compatible driver for the RTL8821CE wireless adapter
After the upgrade to Buster, with the oldstable version of the kernel, 4.19, the hacked version of the driver I've been using on Stretch on 4.9 kernels was no longer compatible - it failed to compile due to missing symbols. The fix for me was to switch to the DKMS-compatible driver from https://github.com/tomaspinho/rtl8821ce, as this seems to work for both 4.19 and 5.10 kernels (the latter installed from backports).
I installed it via a modification of the manual install method, only for the 4.19 and 5.10 kernels, leaving the legacy 4.9 kernels working with the hacked driver. You can do the same if, instead of running the provided script, you do its steps manually and install only for the kernel versions you want, instead of the default of installing for all. I looked inside the dkms-install.sh script to find the required steps. Copy the driver and add it to the dkms set of known drivers:
DRV_NAME=rtl8821ce
DRV_VERSION=v5.5.2_34066.20200325

cp -r . /usr/src/${DRV_NAME}-${DRV_VERSION}

dkms add -m ${DRV_NAME} -v ${DRV_VERSION}
But you just build and install them only for the select kernel versions of your choice:
dkms build -m ${DRV_NAME} -v ${DRV_VERSION} -k 5.10.0-0.bpo.8-amd64
dkms install -m ${DRV_NAME} -v ${DRV_VERSION} -k 5.10.0-0.bpo.8-amd64
Or, without the variables:
dkms build rtl8821ce/v5.5.2_34066.20200325 -k 4.19.0-17-amd64
dkms install rtl8821ce/v5.5.2_34066.20200325 -k 4.19.0-17-amd64
dkms status should confirm everything is in place and I think you need to update grub2 again after this.
Please note this driver is no longer maintained and the 5.10 tree should support the RTL8821CE wireless card with the rtw88 driver from the kernel, but for me it did not. I'll probably try this at a later time, or after I upgrade to the current Debian stable, Bullseye.

30 August 2021

Andrew Cater: Oh, my goodness, where's the fantastic barbeque [OMGWTFBBQ 2021]

I'm guessing the last glasses will be through the dishwasher (again) and Pepper the dog can settle down without having to cope with so many people.For those who don't know - Steve and his wife Jo (Sledge and Randombird) hold a barbeque in their garden every August Bank Holiday weekend [UK Bank Holiday on the last Monday in August]. The barbeque is not small - it's the dominating feature in the suburban garden, brick built, with a dedication stone, lights, electricity. The garden is small, generally made smaller by forty or so Debian friends and allies standing and sitting around. People are talking, arguing, hugging people they've not seen for (literal) years and putting the world to rights. This is Debian central point - with large quantities of meat and salads, an amount of beer/alcohol and "Cambridge gin" and general goodwill. This year was more than usually atmospheric because for some of us it was the first time with a large group of people in a while. Side conversations abound: for me it was learning something about the high energy particle physics community, how to precision build helicopters, fly quadcopters and precision 3D print anything, the maths of Isy counting crochet stitches to sew together randomly sized squares ... and, of course, obligatory things like how random is random and what's good enough entropy. And a few sessions of the game of our leader.
This is also a place for stuff to get done: I was unashamedly using this to upgrade the storage in my laptop while there were sensible engineers around. A corner of the table, a RattusRattus and it was quickly sorted - then a discussion around the internals of Thinkpads as he took his apart. Then getting a full install - Gb Ethernet to the Debian mirror in the cupboard six feet away is faster bandwidth than a jumbo jet full of tapes. Then getting mail to work again - it's handy when the mailserver owner is next to you, having come in from the garden to help, and finally IRC. And not just me: "You need a GPG key signed - there's three DPLs here, there's a release manager - but you've just missed one of the DAMs." plus an in-depth GPG how-to session on the other side of the table.I was the luckiest one with the most comfortable bed in the house overnight but I couldn't stay for last night. Thanks once again to all involved but especially Steve and Jo who do this for the love of it, and the fun, and the community and the family. Oh, and thanks to Lenovo - not just for being a platinum sponsor of Debconf but also for providing the official laptop of this and most Debian occasions

25 August 2021

Bits from Debian: DebConf21 welcomes its sponsors!

DebConf21 logo DebConf21 is taking place online, from 24 August to 28 August 2021. It is the 22nd Debian conference, and organizers and participants are working hard together at creating interesting and fruitful events. We would like to warmly welcome the 19 sponsors of DebConf21, and introduce them to you. We have five Platinum sponsors. Our first Platinum sponsor is Lenovo. As a global technology leader manufacturing a wide portfolio of connected products, including smartphones, tablets, PCs and workstations as well as AR/VR devices, smart home/office and data center solutions, Lenovo understands how critical open systems and platforms are to a connected world. Our next Platinum sponsor is Infomaniak. Infomaniak is Switzerland's largest web-hosting company, also offering backup and storage services, solutions for event organizers, live-streaming and video on demand services. It wholly owns its datacenters and all elements critical to the functioning of the services and products provided by the company (both software and hardware). Roche is our third Platinum sponsor. Roche is a major international pharmaceutical provider and research company dedicated to personalized healthcare. More than 100,000 employees worldwide work towards solving some of the greatest challenges for humanity using science and technology. Roche is strongly involved in publicly funded collaborative research projects with other industrial and academic partners and has supported DebConf since 2017. Amazon Web Services (AWS) is our fourth Platinum sponsor. Amazon Web Services is one of the world's most comprehensive and broadly adopted cloud platform, offering over 175 fully featured services from data centers globally (in 77 Availability Zones within 24 geographic regions). AWS customers include the fastest-growing startups, largest enterprises and leading government agencies. Google is our fifth Platinum sponsor. Google is one of the largest technology companies in the world, providing a wide range of Internet-related services and products such as online advertising technologies, search, cloud computing, software, and hardware. Google has been supporting Debian by sponsoring DebConf for more than ten years, and is also a Debian partner sponsoring parts of Salsa's continuous integration infrastructure within Google Cloud Platform. Our Gold sponsor is the Matanel Foundation. The Matanel Foundation operates in Israel, as its first concern is to preserve the cohesion of a society and a nation plagued by divisions. The Matanel Foundation also works in Europe, in Africa and in South America. Our Silver sponsors are: arm: the World s Best SoC Design Portfolio, Arm powered solutions have been supporting innovation for more than 30 years and are deployed in over 160 billion chips to date, Hudson-Trading, a company researching and developing automated trading algorithms using advanced mathematical techniques, Ubuntu the Operating System delivered by Canonical, Globo, the largest media conglomerate in Brazil, founded in Rio de Janeiro in 1925 and distributing high-quality content across multiple platforms, Two Sigma, rigorous inquiry, data analysis, and invention to help solve the toughest challenges across financial services and GitLab, an open source end-to-end software development platform with built-in version control, issue tracking, code review, CI/CD, and more. Bronze sponsors: Univention, Gandi.net, daskeyboard, InterFace AG and credativ. And finally, our Supporter level sponsor, ISG.EE. 
Thanks to all our sponsors for their support! Their contributions make it possible for a large number of Debian contributors from all over the globe to work together, help and learn from each other in DebConf21. Participating in DebConf21 online The 22nd Debian Conference is being held online, due to COVID-19, from August 24 to 28, 2021. Talks, discussions, panels and other activities run from 12:00 to 02:00 UTC. Visit the DebConf21 website at https://debconf21.debconf.org to learn about the complete schedule, watch the live streaming and join the different communication channels for participating in the conference.

23 August 2021

Bits from Debian: Lenovo, Infomaniak, Roche, Amazon Web Services (AWS) and Google, Platinum Sponsors of DebConf21

We are very pleased to announce that Lenovo, Infomaniak, Roche, Amazon Web Services (AWS) and Google have committed to supporting DebConf21 as Platinum sponsors. As a global technology leader manufacturing a wide portfolio of connected products, including smartphones, tablets, PCs and workstations as well as AR/VR devices, smart home/office and data center solutions, Lenovo understands how critical open systems and platforms are to a connected world. Infomaniak is Switzerland's largest web-hosting company, also offering backup and storage services, solutions for event organizers, live-streaming and video on demand services. It wholly owns its datacenters and all elements critical to the functioning of the services and products provided by the company (both software and hardware). Roche is a major international pharmaceutical provider and research company dedicated to personalized healthcare. More than 100,000 employees worldwide work towards solving some of the greatest challenges for humanity using science and technology. Roche is strongly involved in publicly funded collaborative research projects with other industrial and academic partners and has supported DebConf since 2017. Amazon Web Services (AWS) is one of the world's most comprehensive and broadly adopted cloud platforms, offering over 175 fully featured services from data centers globally (in 77 Availability Zones within 24 geographic regions). AWS customers include the fastest-growing startups, largest enterprises and leading government agencies. Google is one of the largest technology companies in the world, providing a wide range of Internet-related services and products such as online advertising technologies, search, cloud computing, software, and hardware. Google has been supporting Debian by sponsoring DebConf for more than ten years, and is also a Debian partner sponsoring parts of Salsa's continuous integration infrastructure within Google Cloud Platform. With these commitments as Platinum Sponsors, Lenovo, Infomaniak, Roche, Amazon Web Services and Google are contributing to make possible our annual conference, and directly supporting the progress of Debian and Free Software, helping to strengthen the community that continues to collaborate on Debian projects throughout the rest of the year. Thank you very much for your support of DebConf21! Participating in DebConf21 online The 22nd Debian Conference is being held online, due to COVID-19, from August 22nd to 28th, 2021. There are 8 days of activities, running from 10:00 to 01:00 UTC. Visit the DebConf21 website at https://debconf21.debconf.org to learn about the complete schedule, watch the live streaming and join the different communication channels for participating in the conference.

5 August 2021

Ian Wienand: Lyte Portable Projector Investigation

I recently picked up this portable projector for a reasonable price. It might also be called an "M5" projector, but I cannot find one canonical source. In terms of projection, it performs as well as a 5cm cube could be expected to. They made a poor choice to eschew adding an external video input, which severely limits the device's usefulness. The design is nice and getting into it is quite an effort. There is no wasted space! After pulling off the rubber top covering and base, you have to pry the decorative metal shielding off all sides to access the screws to open it. This almost unavoidably bends it so it will never quite be the same. To avoid you having to bother, some photos:
Lyte Projector
It is fairly locked down. I found a couple of ways in; installing the Disney+ app from the "Aptoide TV" store it ships with does not work, but the app prompts you to update it, which sends you to an action where you can then choose to open the Google Play store. From there, you can install things that work on its Android 7 OS. This allowed me to install a system-viewer app which revealed its specs: Another weird thing I found was that if you go into the custom launcher "About" page under settings and keep clicking the "OK" button on the version number, it will open the standard Android settings page. From there you can enable developer options. I could not get it to connect to ADB, although you perhaps need a USB OTG cable which I didn't have. It has some sort of built-in Miracast app that I could not get anything to detect. It doesn't have the native Google app store; most of the apps in the provided system don't work. Somehow it runs Netflix via a webview, which is hard to use. If it had HDMI input it would still be a useful little thing to plug things into. You could perhaps sideload some sort of apps to get the screensharing working, or it plays media files off a USB stick or network shares. I don't believe there is any practical way to get a more recent Android on this, leaving it on an accelerated path to e-waste for all but the most boutique users.

2 August 2021

Colin Watson: Launchpad now runs on Python 3!

After a very long porting journey, Launchpad is finally running on Python 3 across all of our systems. I wanted to take a bit of time to reflect on why my emotional responses to this port differ so much from those of some others who ve done large ports, such as the Mercurial maintainers. It s hard to deny that we ve had to burn a lot of time on this, which I m sure has had an opportunity cost, and from one point of view it s essentially running to stand still: there is no single compelling feature that we get solely by porting to Python 3, although it s clearly a prerequisite for tidying up old compatibility code and being able to use modern language facilities in the future. And yet, on the whole, I found this a rewarding project and enjoyed doing it. Some of this may be because by inclination I m a maintenance programmer and actually enjoy this sort of thing. My default view tends to be that software version upgrades may be a pain but it s much better to get that pain over with as soon as you can rather than trying to hold back the tide; you can certainly get involved and try to shape where things end up, but rightly or wrongly I can t think of many cases when a righteously indignant user base managed to arrange for the old version to be maintained in perpetuity so that they never had to deal with the new thing (OK, maybe Perl 5 counts here). I think a more compelling difference between Launchpad and Mercurial, though, may be that very few other people really had a vested interest in what Python version Launchpad happened to be running, because it s all server-side code (aside from some client libraries such as launchpadlib, which were ported years ago). As such, we weren t trying to do this with the internet having Strong Opinions at us. We were doing this because it was obviously the only long-term-maintainable path forward, and in more recent times because some of our library dependencies were starting to drop support for Python 2 and so it was obviously going to become a practical problem for us sooner or later; but if we d just stayed on Python 2 forever then fundamentally hardly anyone else would really have cared directly, only maybe about some indirect consequences of that. I don t follow Mercurial development so I may be entirely off-base, but if other people were yelling at me about how late my project was to finish its port, that in itself would make me feel more negatively about the project even if I thought it was a good idea. Having most of the pressure come from ourselves rather than from outside meant that wasn t an issue for us. I m somewhat inclined to think of the process as an extreme version of paying down technical debt. Moving from Python 2.7 to 3.5, as we just did, means skipping over multiple language versions in one go, and if similar changes had been made more gradually it would probably have felt a lot more like the typical dependency update treadmill. I appreciate why not everyone might want to think of it this way: maybe this is just my own rationalization. Reflections on porting to Python 3 I m not going to defend the Python 3 migration process; it was pretty rough in a lot of ways. Nor am I going to spend much effort relitigating it here, as it s already been done to death elsewhere, and as I understand it the core Python developers have got the message loud and clear by now. 
At a bare minimum, a lot of valuable time was lost early in Python 3's lifetime hanging on to flag-day-type porting strategies that were impractical for large projects, when it should have been providing for bilingual strategies (code that runs in both Python 2 and 3 for a transitional period), which is where most libraries and most large migrations ended up in practice. For instance, the early advice to library maintainers to maintain two parallel versions or perhaps translate dynamically with 2to3 was entirely impractical in most non-trivial cases and wasn't what most people ended up doing, and yet the idea that 2to3 is all you need still floats around Stack Overflow and the like as a result. (These days, I would probably point people towards something more like Eevee's porting FAQ as somewhere to start.)

There are various fairly straightforward things that people often suggest could have been done to smooth the path, and I largely agree: not removing the u'' string prefix only to put it back in 3.3, fewer gratuitous compatibility breaks in the name of tidiness, and so on. But if I had a time machine, the number one thing I would ask to have been done differently would be introducing type annotations in Python 2 before Python 3 branched off. It's true that it's technically possible to do type annotations in Python 2, but the fact that it's a different syntax that would have to be fixed later is offputting, and in practice it wasn't widely used in Python 2 code. To make a significant difference to the ease of porting, annotations would need to have been introduced early enough that lots of Python 2 library code used them, so that porting code didn't have to be quite so much of an exercise of manually figuring out the exact nature of string types from context.

Launchpad is a complex piece of software that interacts with multiple domains: for example, it deals with a database, HTTP, web page rendering, Debian-format archive publishing, and multiple revision control systems, and there's often overlap between domains. Each of these tends to imply different kinds of string handling. Web page rendering is normally done mainly in Unicode, converting to bytes as late as possible; revision control systems normally want to spend most of their time working with bytes, although the exact details vary; HTTP is of course bytes on the wire, but Python's WSGI interface has some string type subtleties. In practice I found myself thinking about at least four string-like types (that is, things that in a language with a stricter type system I might well want to define as distinct types and restrict conversion between them): bytes, text, ordinary native strings (str in either language, encoded to UTF-8 in Python 2), and native strings with WSGI's encoding rules. Some of these are emergent properties of writing in the intersection of Python 2 and 3, which is effectively a specialized language of its own without coherent official documentation, whose users must intuit its behaviour by comparing multiple sources of information, or by referring to unofficial porting guides: not a very satisfactory situation. Fortunately much of the complexity collapses once it becomes possible to write solely in Python 3.
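To make the last of those distinctions concrete, here is a minimal sketch (not Launchpad code; the helper name and sample values are invented for illustration) of how the same data looks as bytes, as text, as a Python 3 native string, and as a WSGI-style native string under PEP 3333's ISO-8859-1 rule:

# Illustrative only: four "string-like" views of the same data on Python 3.
RAW = b"caf\xc3\xa9"               # bytes: what actually travels on the wire
TEXT = RAW.decode("utf-8")         # text: 'café', e.g. for web page rendering
NATIVE = "café"                    # native str: already text on Python 3
                                   # (the same literal was UTF-8 bytes on Python 2)

def wsgi_native_string(raw: bytes) -> str:
    """WSGI-style native string: bytes reinterpreted as ISO-8859-1 code points.

    PEP 3333 requires environ values to be str objects whose code points map
    one-to-one onto bytes, so UTF-8 data looks mangled until it is re-encoded
    and decoded with the right charset.
    """
    return raw.decode("iso-8859-1")

wsgi_value = wsgi_native_string(RAW)             # 'cafÃ©' -- note the mojibake
assert wsgi_value.encode("iso-8859-1") == RAW    # ...but it round-trips losslessly
assert wsgi_value.encode("iso-8859-1").decode("utf-8") == TEXT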
Some of the difficulties we ran into are not ones that are typically thought of as Python 2-to-3 porting issues, because they were changed later in Python 3's development process. For instance, the email module was substantially improved in around the 3.2/3.3 timeframe to handle Python 3's bytes/text model more correctly, and since Launchpad sends quite a few different kinds of email messages and has some quite picky tests for exactly what it emits, this entailed a lot of work in our email sending code and in our test suite to account for that. (It took me a while to work out whether we should be treating raw email messages as bytes or as text; bytes turned out to work best.) 3.4 made some tweaks to the implementation of quoted-printable encoding that broke a number of our tests in ways that took some effort to fix, because the tests needed to work on both 2.7 and 3.5. The list goes on. I got quite proficient at digging through Python's git history to figure out when and why some particular bit of behaviour had changed.

One of the thorniest problems was parsing HTTP form data. We mainly rely on zope.publisher for this, which in turn relied on cgi.FieldStorage; but cgi.FieldStorage is badly broken in some situations on Python 3. Even if that bug were fixed in a more recent version of Python, we can't easily use anything newer than 3.5 for the first stage of our port due to the version of the base OS we're currently running, so it wouldn't help much. In the end I fixed some minor issues in the multipart module (and was kindly given co-maintenance of it) and converted zope.publisher to use it. Although this took a while to sort out, it seems to have gone very well.

A couple of other interesting late-arriving issues were around pickle. For most things we normally prefer safer formats such as JSON, but there are a few cases where we use pickle, particularly for our session databases. One of my colleagues pointed out that I needed to remember to tell pickle to stick to protocol 2, so that we'd be able to switch back and forward between Python 2 and 3 for a while; quite right, and we later ran into a similar problem with marshal too. A more surprising problem was that datetime.datetime objects pickled on Python 2 require special care when unpickling on Python 3; rather than the approach that ended up being implemented and documented for Python 3.6, though, I preferred a custom unpickler, both so that things would work on Python 3.5 and so that I wouldn't have to risk affecting the decoding of other pickled strings in the session database.
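As a rough illustration of the protocol pinning (a sketch, not the actual session-database code; the function names are invented), the idea is simply to cap the pickle protocol at the highest one Python 2 understands, and to be explicit about string decoding when loading Python 2 data on Python 3:

# Sketch only: keep pickles readable by both Python 2 and Python 3.
import pickle

def dump_compat(obj) -> bytes:
    # Protocol 2 is the newest protocol that Python 2 can read, so data written
    # by the Python 3 side stays loadable if we need to roll back.
    return pickle.dumps(obj, protocol=2)

def load_py2_pickle(blob: bytes):
    # Byte strings pickled by Python 2 need an explicit encoding on Python 3;
    # the stdlib documents encoding="latin1" for e.g. datetime objects, but as
    # noted above that can affect how other pickled strings are decoded, which
    # is why a custom unpickler may be preferable.
    return pickle.loads(blob, encoding="latin1")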
General lessons

Writing this over a year after Python 2's end-of-life date, and certainly nowhere near the leading edge of Python 3 porting work, it's perhaps more useful to look at this in terms of the lessons it has for other large technical debt projects. I mentioned in my previous article that I used the approach of an enormous and frequently-rebased git branch as a working area for the port, committing often and sometimes combining and extracting commits for review once they seemed to be ready. A port of this scale would have been entirely intractable without a tool of similar power to git rebase, so I'm very glad that we finished migrating to git in 2019. I relied on this right up to the end of the port, and it also allowed for quick assessments of how much more there was to land. git worktree was also helpful, in that I could easily maintain working trees built for each of Python 2 and 3 for comparison.

As is usual for most multi-developer projects, all changes to Launchpad need to go through code review, although we sometimes make exceptions for very simple and obvious changes that can be self-reviewed. Since I knew from the outset that this was going to generate a lot of changes for review, I structured my work to try to make it as easy as possible for my colleagues to review. This generally involved keeping most changes to a somewhat manageable size of 800 lines or less (although this wasn't always possible), and arranging commits mainly according to the kind of change they made rather than their location. For example, when I needed to fix issues with / in Python 3 being true division rather than floor division, I did so in one commit across the various places where it mattered and took care not to mix it with other unrelated changes. This is good practice for nearly any kind of development, but it was especially important here since it allowed reviewers to consider a clear explanation of what I was doing in the commit message and then skim-read the rest of it much more quickly.

It was vital to keep the codebase in a working state at all times, and deploy to production reasonably often: this way if something went wrong the amount of code we had to debug to figure out what had happened was always tractable. (Although I can't seem to find it now to link to it, I saw an account a while back of a company that had taken a flag-day approach instead with a large codebase. It seemed to work for them, but I'm certain we couldn't have made it work for Launchpad.)

I can't speak too highly of Launchpad's test suite, much of which originated before my time. Without a great deal of extensive coverage of all sorts of interesting edge cases at both the unit and functional level, and a corresponding culture of maintaining that test suite well when making new changes, it would have been impossible to be anything like as confident of the port as we were.

As part of the porting work, we split out a couple of substantial chunks of the Launchpad codebase that could easily be decoupled from the core: its Mailman integration and its code import worker. Both of these had substantial dependencies with complex requirements for porting to Python 3, and arranging to be able to do these separately on their own schedule was absolutely worth it. Like disentangling balls of wool, any opportunity you can take to make things less tightly-coupled is probably going to make it easier to disentangle the rest. (I can see a tractable way forward to porting the code import worker, so we may well get that done soon. Our Mailman integration will need to be rewritten, though, since it currently depends on the Python-2-only Mailman 2, and Mailman 3 has a different architecture.)

Python lessons

Our database layer was already in pretty good shape for a port, since at least the modern bits of its table modelling interface were already strict about using Unicode for text columns. If you have any kind of pervasive low-level framework like this, then making it be pedantic at you in advance of a Python 3 port will probably incur much less swearing in the long run, as you won't be trying to deal with quite so many bytes/text issues at the same time as everything else.

Early in our port, we established a standard set of __future__ imports and started incrementally converting files over to them, mainly because we weren't yet sure what else to do and it seemed likely to be helpful. absolute_import was definitely reasonable (and not often a problem in our code), and print_function was annoying but necessary. In hindsight I'm not sure about unicode_literals, though. For files that only deal with bytes and text it was reasonable enough, but as I mentioned above there were also a number of cases where we needed literals of the language's native str type, i.e. bytes in Python 2 and text in Python 3: this was particularly noticeable in WSGI contexts, but also cropped up in some other surprising places. We generally either omitted unicode_literals or used six.ensure_str in such cases, but it was definitely a bit awkward and maybe I should have listened more to people telling me it might be a bad idea.
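For what it's worth, the shape of that per-file boilerplate looked roughly like this (a sketch rather than Launchpad's exact header, with six.ensure_str shown as one way to get a native str on both interpreters):

# Sketch of the kind of per-module boilerplate used during a bilingual port.
from __future__ import absolute_import, print_function, unicode_literals

import six

# With unicode_literals in effect, plain literals are text even on Python 2,
# so places that genuinely need the native str type (for example WSGI-related
# values) have to ask for it explicitly.
HEADER_NAME = six.ensure_str("X-Example-Header")   # str on both 2 and 3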
A lot of Launchpad's early tests used doctest, mainly in the style where you have text files that interleave narrative commentary with examples. The development team later reached consensus that this was best avoided in most cases, but by then there were far too many doctests to conveniently rewrite in some other form. Porting doctests to Python 3 is really annoying. You run into all the little changes in how objects are represented as text (particularly u'...' versus '...', but plenty of other cases as well); you have next to no tools to do anything useful like skipping individual bits of a doctest that don't apply; using __future__ imports requires the rather obscure approach of adding the relevant names to the doctest's globals in the relevant DocFileSuite or DocTestSuite; dealing with many exception tracebacks requires something like zope.testing.renormalizing; and whatever code refactoring tools you're using probably don't work properly. Basically, don't have done that. It did all turn out to be tractable for us in the end, and I managed to avoid using much in the way of fragile doctest extensions aside from the aforementioned zope.testing.renormalizing, but it was not an enjoyable experience.

Regressions

I know of nine regressions that reached Launchpad's production systems as a result of this porting work; of course there were various other regressions caught by CI or in manual testing. (Considering the size of this project, I count it as a resounding success that there were only nine production issues, and that for the most part we were able to fix them quickly.)

Equality testing of removed database objects

One of the things we had to do while porting to Python 3 was to implement the __eq__, __ne__, and __hash__ special methods for all our database objects. This was quite conceptually fiddly, because doing this requires knowing each object's primary key, and that may not yet be available if we've created an object in Python but not yet flushed the actual INSERT statement to the database (most of our primary keys are auto-incrementing sequences). We thus had to take care to flush pending SQL statements in such cases in order to ensure that we know the primary keys. However, it's possible to have a problem at the other end of the object lifecycle: that is, a Python object might still be reachable in memory even though the underlying row has been DELETEd from the database. In most cases we don't keep removed objects around for obvious reasons, but it can happen in caching code, and buildd-manager crashed as a result (in fact while it was still running on Python 2). We had to take extra care to avoid this problem.
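To give a flavour of the pattern involved (a simplified sketch, not Launchpad's actual implementation; the class and attribute names are invented), identity-by-primary-key looks something like this, and the caveats above are about making sure the id is actually populated and that the underlying row still exists:

# Simplified sketch of equality/hashing keyed on a database primary key.
class DBObject:
    def __init__(self, obj_id=None):
        # In the real system the id comes from an auto-incrementing sequence,
        # so it may be None until pending INSERTs have been flushed.
        self.id = obj_id

    def __eq__(self, other):
        return (
            isinstance(other, DBObject)
            and self.id is not None
            and self.id == other.id
        )

    def __ne__(self, other):
        return not self.__eq__(other)

    def __hash__(self):
        # Hashing also needs the primary key, hence the need to flush first.
        return hash((type(self).__name__, self.id))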
Debian imports crashed on non-UTF-8 filenames

Python 2 has some unfortunate behaviour around passing bytes or Unicode strings (depending on the platform) to shutil.rmtree, and the combination of some porting work and a particular source package in Debian that contained a non-UTF-8 file name caused us to run into this. The fix was to ensure that the argument passed to shutil.rmtree is a str regardless of Python version. We'd actually run into something similar before: it's a subtle porting gotcha, since it's quite easy to end up passing Unicode strings to shutil.rmtree if you're in the process of porting your code to Python 3, and you might easily not notice if the file names in your tests are all encoded using UTF-8.

lazr.restful ETags

We eventually got far enough along that we could switch one of our four appserver machines (we have quite a number of other machines too, but the appservers handle web and API requests) to Python 3 and see what happened. By this point our extensive test suite had shaken out the vast majority of the things that could go wrong, but there was always going to be room for some interesting edge cases. One of the Ubuntu kernel team reported that they were seeing an increase in 412 Precondition Failed errors in some of their scripts that use our webservice API. These can happen when you're trying to modify an existing resource: the underlying protocol involves sending an If-Match header with the ETag that the client thinks the resource has, and if this doesn't match the ETag that the server calculates for the resource then the client has to refresh its copy of the resource and try again. We initially thought that this might be legitimate since it can happen in normal operation if you collide with another client making changes to the same resource, but it soon became clear that something stranger was going on: we were getting inconsistent ETags for the same object even when it was unchanged. Since we'd recently switched a quarter of our appservers to Python 3, that was a natural suspect. Our lazr.restful package provides the framework for our webservice API, and roughly speaking it generates ETags by serializing objects into some kind of canonical form and hashing the result. Unfortunately the serialization was dependent on the Python version in a few ways, and in particular it serialized lists of strings such as lists of bug tags differently: Python 2 used [u'foo', u'bar', u'baz'] where Python 3 used ['foo', 'bar', 'baz']. In lazr.restful 1.0.3 we switched to using JSON for this, removing the Python version dependency and ensuring consistent behaviour between appservers.
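The mismatch is easy to reproduce (illustrative only; this is not lazr.restful's real serializer): hashing a repr()-based rendering of a list of strings bakes the Python version into the ETag, whereas hashing a canonical JSON rendering does not:

# Illustrative only: why repr()-based ETags diverged between Python 2 and 3.
import hashlib
import json

tags = ["foo", "bar", "baz"]

# repr() of this list is "['foo', 'bar', 'baz']" on Python 3 but
# "[u'foo', u'bar', u'baz']" on Python 2, so the hashes differ between versions.
etag_repr = hashlib.sha1(repr(tags).encode("utf-8")).hexdigest()

# A canonical JSON form is identical on both versions.
etag_json = hashlib.sha1(
    json.dumps(tags, sort_keys=True).encode("utf-8")
).hexdigest()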
Memory leaks

This problem took the longest to solve. We noticed fairly quickly from our graphs that the appserver machine we'd switched to Python 3 had a serious memory leak. Our appservers had always been a bit leaky, but now it wasn't so much "a small hole that we can bail occasionally" as "the boat is sinking rapidly":

A serious memory leak

(Yes, this got in the way of working out what was going on with ETags for a while.) I spent ages messing around with various attempts to fix this. Since only a quarter of our appservers were affected, and we could get by on 75% capacity for a while, it wasn't urgent but it was definitely annoying. After spending some quality time with objgraph, for some time I thought traceback reference cycles might be at fault, and I sent a number of fixes to various upstream projects for those (e.g. zope.pagetemplate). Those didn't help the leaks much though, and after a while it became clear to me that this couldn't be the sole problem: Python has a cyclic garbage collector that will eventually collect reference cycles as long as there are no strong references to any objects in them, although it might not happen very quickly. Something else must be going on.

Debugging reference leaks in any non-trivial and long-running Python program is extremely arduous, especially with ORMs that naturally tend to end up with lots of cycles and caches. After a while I formed a hypothesis that zope.server might be keeping a strong reference to something, although I never managed to nail it down more firmly than that. This was an attractive theory as we were already in the process of migrating to Gunicorn for other reasons anyway, and Gunicorn also has a convenient max_requests setting that's good at mitigating memory leaks. Getting this all in place took some time, but once we did we found that everything was much more stable:

A rather flat memory graph

This isn't completely satisfying as we never quite got to the bottom of the leak itself, and it's entirely possible that we've only papered over it using max_requests: I expect we'll gradually back off on how frequently we restart workers over time to try to track this down. However, pragmatically, it's no longer an operational concern.

Mirror prober HTTPS proxy handling

After we switched our script servers to Python 3, we had several reports of mirror probing failures. (Launchpad keeps lists of Ubuntu archive and image mirrors, and probes them every so often to check that they're reasonably complete and up to date.) This only affected HTTPS mirrors when probed via a proxy server, support for which is a relatively recent feature in Launchpad and involved some code that we never managed to unit-test properly: of course this is exactly the code that went wrong. Sadly I wasn't able to sort out that gap, but at least the fix was simple.

Non-MIME-encoded email headers

As I mentioned above, there were substantial changes in the email package between Python 2 and 3, and indeed between minor versions of Python 3. Our test coverage here is pretty good, but it's an area where it's very easy to have gaps. We noticed that a script that processes incoming email was crashing on messages with headers that were non-ASCII but not MIME-encoded (and indeed then crashing again when it tried to send a notification of the crash!). The only examples of these I looked at were spam, but we still didn't want to crash on them. The fix involved being somewhat more careful about both the handling of headers returned by Python's email parser and the building of outgoing email notifications. This seems to be working well so far, although I wouldn't be surprised to find the odd other incorrect detail in this sort of area.

Failure to handle non-ISO-8859-1 URL-encoded form input

Remember how I said that parsing HTTP form data was thorny? After we finished upgrading all our appservers to Python 3, people started reporting that they couldn't post Unicode comments to bugs, which turned out to be only if the attempt was made using JavaScript, and was because I hadn't quite managed to get URL-encoded form data working properly with zope.publisher and multipart. The current standard describes the URL-encoded format for form data as "in many ways an aberrant monstrosity", so this was no great surprise. Part of the problem was some very strange choices in zope.publisher dating back to 2004 or earlier, which I attempted to clean up and simplify. The rest was that Python 2's urlparse.parse_qs unconditionally decodes percent-encoded sequences as ISO-8859-1 if they're passed in as part of a Unicode string, so multipart needs to work around this on Python 2. I'm still not completely confident that this is correct in all situations, but at least now that we're on Python 3 everywhere the matrix of cases we need to care about is smaller.
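The Python 2 behaviour itself can't be demonstrated on Python 3, but the underlying pitfall is visible through the encoding parameter of Python 3's urllib.parse.parse_qs (a small illustration, not the multipart workaround itself):

# Illustrative only: percent-encoded UTF-8 decoded with the wrong charset.
from urllib.parse import parse_qs

qs = "comment=caf%C3%A9"
print(parse_qs(qs, encoding="utf-8"))    # {'comment': ['café']}
print(parse_qs(qs, encoding="latin-1"))  # {'comment': ['cafÃ©']} -- mojibake,
                                         # which is what assuming ISO-8859-1
                                         # does to UTF-8 form input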
Inconsistent marshalling of Loggerhead's disk cache

We use Loggerhead for providing web browsing of Bazaar branches. When we upgraded one of its two servers to Python 3, we immediately noticed that the one still on Python 2 was failing to read back its revision information cache, which it stores in a database on disk. (We noticed this because it caused a deployment to fail: when we tried to roll out new code to the instance still on Python 2, Nagios checks had already caused an incompatible cache to be written for one branch from the Python 3 instance.) This turned out to be a similar problem to the pickle issue mentioned above, except this one was with marshal, which I didn't think to look for because it's a relatively obscure module mostly used for internal purposes by Python itself; I'm not sure that Loggerhead should really be using it in the first place. The fix was relatively straightforward, complicated mainly by now needing to cope with throwing away unreadable cache data. Ironically, if we'd just gone ahead and taken the nominally riskier path of upgrading both servers at the same time, we might never have had a problem here.

Intermittent bzr failures

Finally, after we upgraded one of our two Bazaar codehosting servers to Python 3, we had a report of intermittent bzr branch hangs. After some digging I found this in our logs:
Traceback (most recent call last):
  ...
  File "/srv/bazaar.launchpad.net/production/codehosting1-rev-20124175fa98fcb4b43973265a1561174418f4bd/env/lib/python3.5/site-packages/twisted/conch/ssh/channel.py", line 136, in addWindowBytes
    self.startWriting()
  File "/srv/bazaar.launchpad.net/production/codehosting1-rev-20124175fa98fcb4b43973265a1561174418f4bd/env/lib/python3.5/site-packages/lazr/sshserver/session.py", line 88, in startWriting
    resumeProducing()
  File "/srv/bazaar.launchpad.net/production/codehosting1-rev-20124175fa98fcb4b43973265a1561174418f4bd/env/lib/python3.5/site-packages/twisted/internet/process.py", line 894, in resumeProducing
    for p in self.pipes.itervalues():
builtins.AttributeError: 'dict' object has no attribute 'itervalues'
I'd seen this before in our git hosting service: it was a bug in Twisted's Python 3 port, fixed after 20.3.0 but unfortunately after the last release that supported Python 2, so we had to backport that patch. Using the same backport dealt with this. Onwards!

17 July 2021

Andy Simpkins: Dual boot Debian and Windows

Installing a new laptop

"New" is a 2nd-hand Thinkpad T470p laptop that I intend to dual boot with Windows.
I have been a Debian user for over 20 years, and I use Windows at work for the proprietary EDA tool Altium, but I have never had a Windows installation on my laptop. This machine will be different: it is the first laptop I have owned with a sufficient GPU to realistically run Altium. I will try it in a VM later (if that works it will be my preferred choice), but for now I want to try a dual boot system. So where to start?

Step one: the Debian wiki https://wiki.debian.org/DimentionedDualBoot/Windows

My laptop was purchased from a dealer / refurbisher. This means that they had confirmed that the hardware was functional, wiped it down and then installed a clean copy of Windows on the whole system. What it doesn't mean is that the system was set for UEFI boot and that the EFI partition is set correctly. I turned on UEFI and made sure that Legacy BIOS mode was disabled. Next I re-installed Windows, making sure to leave enough disk space for my later Debian install. (If you already have UEFI / secure boot enabled then you could skip the reinstall and instead re-size your disk.)

Eeew! Windows now wants to show me adverts; it doesn't give me the option to never show me ads, but at least I could insist that it doesn't display tailored ads based on the obvious snooping of my web browsing habits. Just another reason to use Debian.

Now to install Debian. I want an encrypted file system, and because I want to dual boot I can't just follow the guided installation in the Debian installer, so I shall detail what I did here. Indeed I took several attempts at this and eventually asked for help, as I had still messed up (I thought I was doing it correctly but had missed out a step).

First the boilerplate d-i steps. Now for the interesting bit: partitioning the disk(s). Select MANUAL disk partitioning. I have the following partitions:
/dev/nvmen0p1
1.0MB FREE SPACE
#1 536.9 MB B K ESP
400.0 GB FREE SPACE
#3 16.8 MB Microsoft reserved partition
#4 111.6 GB ntfs Basic data partition
335.4 kB FREE SPACE
Set use
I now have the following partitions:
LVM VG VG-Skink
#1 32 GB f swap swap
LVM VG VG-System
#1 367.5 GB f ext4 /
Encrypted volume
#1 399.5 GB K lvm
/dev/nvmen0p1
1.0MB FREE SPACE
#1 536.9 MB B K ESP
#2 500.2 MB F ext2 /boot
#5 399.2 GB K crypto skink
#3 16.8 MB Microsoft reserved partition
#4 111.6 GB ntfs Basic data partition
335.4 kB FREE SPACE
The boilerplate Debian install continues: the system will install a base system. Sit back and wait for the system to install. Well, that didn't take very long. Damn, this new laptop is quick! I suspect that is the NVMe solid state storage, no longer limited to SATA bus speeds (and even that wasn't slow).

16 July 2021

Russell Coker: Thoughts about RAM and Storage Changes

My first Linux system in 1992 was a 386 with 4MB of RAM and a 120MB hard drive, of which (for some reason I forget) Linux only supported about 90MB. My first hard drive was 70MB and could do 500KB/s for contiguous IO; my first Linux hard drive was probably a bit faster, maybe 1MB/s. My current Linux workstation has 64G of RAM and 2*1TB NVMe devices that can sustain about 1.1GB/s. The laptop I'm using right now has 8GB of RAM and a 180GB SSD that can do 380MB/s. My laptop has 2000* the RAM of my first Linux system and maybe 400* the contiguous IO speed. Currently I don't even run a VM with less than 4GB of RAM. NB: I'm not saying that smaller VMs aren't useful, merely that I don't happen to be using them now.

Modern AMD64 CPUs support 2MB "huge pages". If I used 2MB pages everywhere, they would be a smaller proportion of system RAM than the 4KB pages were on my first Linux system! I am not suggesting using 2MB pages for general systems. For my workstations the majority of processes are using less than 10MB of resident memory, and given the different uses for memory mapped shared objects, memory mapped file IO, malloc(), stack, heap, etc, there would be a lot of inefficiency in having 2MB as the unit for all allocation. But as systems worked with 4MB of RAM or less and 4K pages, it would surely work to have only 2MB pages with 64GB or more of RAM. Back in the 90s it seemed ridiculous to me to have 256 byte pages on a 68030 CPU, but 4K pages on a modern AMD64 system is even more ridiculous. Apparently AMD64 supports 1GB pages on some CPUs; that seems ridiculously large, but when run on a system with 1TB of RAM that's comparable to 4K pages on my first Linux system. Currently AWS offers 24TB EC2 instances and the Google Cloud Project offers 12TB virtual machines. It might even make sense to have the entire OS using 1GB pages for some usage scenarios on such systems; wasting tens of GB of RAM to save TLB thrashing might be a good trade-off.

My personal laptop has 2000* the RAM of my first Linux system and maybe 400* the contiguous IO speed. An employer recently assigned me a Thinkpad Carbon X1 Gen6 with an NVMe device that could sustain 5GB/s until the CPU overheated; that's 5000* the contiguous IO speed of my first Linux hard drive. My first hard drive had a 28ms average access time and my first Linux hard drive was probably a little better; let's call it 20ms for the sake of discussion. It's generally quoted that access times for NVMe are at best 10us; that's 2000* better than my first Linux hard drive. As seek times are the main factor for swap performance, a laptop with 8GB of RAM and a fast NVMe device could be expected to give adequate performance with 2000* the swap of my first Linux system. For the work laptop in question I had 8G of swap, and my personal laptop has 6G of swap, which is somewhat comparable to the 4MB of swap on my first Linux system in that swap is about equal to RAM size, so I guess my personal laptop is performing better than it can be expected to.

These are just some idle thoughts about hardware changes over the years. Don't take it as advice for purchasing hardware and don't take it too seriously in general. Also when writing comments don't restrict yourself to being overly serious; feel free to run the numbers on what systems with petabytes of Optane might be like, speculate on what NUMA systems in laptops might be like, etc. Go wild.
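For anyone who wants a starting point for running the numbers, the page-size proportions mentioned above work out like this (a quick illustrative check, using binary units):

# Quick check of page size as a fraction of total RAM, in binary units.
KiB, MiB, GiB, TiB = 2**10, 2**20, 2**30, 2**40

cases = {
    "4KiB pages on a 4MiB 386": (4 * KiB) / (4 * MiB),      # 1/1024
    "2MiB huge pages on 64GiB": (2 * MiB) / (64 * GiB),     # 1/32768
    "1GiB pages on 1TiB of RAM": (1 * GiB) / (1 * TiB),     # 1/1024
}
for label, ratio in cases.items():
    print("%s: 1/%d of RAM" % (label, round(1 / ratio)))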

10 July 2021

Laura Arjona Reina: Android backups with rsync

A quick note to self to remind me how I do backups of my Android device with rsync (and adb). I have followed this guide: How to use rsync over USB on Android with adb. My personal notes:
address = 127.0.0.1
port = 1873
uid = 0
gid = 0
[root]
path = /
use chroot = false
read only = false
adb shell /data/local/tmp/rsync --daemon --no-detach --config=/sdcard/rsyncd.conf --log-file=/proc/self/fd/2
didn't work; it produced this message: "@ERROR: protocol startup error", so I ended up doing:
adb shell
rsync --daemon --no-detach --config=/sdcard/rsyncd.conf --log-file=/sdcard/rsync.log
and opened another tab to perform the rsync commands from my laptop:
rsync -av --progress --stats rsync://localhost:6010/root/storage .
rsync -av --progress --stats rsync://localhost:6010/root/data .
Then I saw that rsync was copying the symlinks instead of their contents: /storage/self/primary was a broken link to /mnt/user/0/primary. So I ran the commands again with -LK:
rsync -av --progress --stats -LK rsync://localhost:6010/root/storage .
rsync -av --progress --stats -LK rsync://localhost:6010/root/data .
and now I have a copy of all the files I'm interested in. In addition to this, I run an adb backup of the system:
adb backup -f ./adb_backup_apk_shared_all_system.ad -apk -shared -all -system
and I think that's all I need in case I want to remove stuff from my phone or some disaster happens.

5 July 2021

Jonathan Dowland: Photos and WhatsApp

I woke up this morning to a lovely little gallery of pictures of our children that my wife had sent me via WhatsApp. This has become the most common way we interact with family photos. We regularly send and receive photos to and from our families via WhatsApp, which re-compresses them for transit and temporary storage across their network. The original photos, wherever they are, will be in very high quality (as you get on most modern cameras) and will be backed up in perfect fidelity to either Apple or Google's photo storage solutions. But all of that seems moot, when the most frequent way we engage with the pictures is via a method which compresses so aggressively that you can clearly see the artefacts, even thumbnailed on a phone screen. I still don't feel particularly happy with the solution in place for backing up the photos (or even: getting them off the phone). Both Apple and Google make it less than convenient to get them out of their respective walled gardens. I've been evaluating the Nextcloud app and a Nextcloud instance on my home NAS as a possible alternative.

Russell Coker: Servers and Lockdown

OS security features and server class systems are things that surely belong together. If a program is important enough to buy expensive servers to run it, then it's important enough that you want to have all the OS security features enabled. For such an important program you will also want to have all possible monitoring systems running so you can predict hardware failures etc. Therefore you would expect that you could buy a server, set up the vendor's management software, configure your Linux kernel with security features such as lockdown (an LSM that restricts access to /dev/mem, the iopl() system call, and other dangerous things [1]), and have it run nicely! You will be disappointed if you try doing that on an HP or Dell server though.

HP Problems
[370742.622525] Lockdown: hpasmlited: raw io port access is restricted; see man kernel_lockdown.7
The above message is logged when trying to INSTALL (not even run) the hp-health package from the official HP repository (as documented in my previous blog post about the HP ML-110 Gen9 [2]) with lockdown=integrity (the less restrictive lockdown option). Now the HP package in question is in their repository for Debian/Stretch (released in 2017) and the Lockdown LSM was documented by LWN as being released in 2019, so not supporting a Debian/Bullseye feature in Debian/Stretch packages isn't inherently a bad thing, apart from the fact that they haven't released a new version of that package since. The Stretch package that I am testing now was released in 2019. Also it's been regarded as best practice to have device drivers for this sort of thing since long before 2017.
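As an aside, a quick way to see which lockdown mode a running kernel actually enforces is to read the securityfs lockdown file; the sketch below assumes a 5.4+ kernel with securityfs mounted at /sys/kernel/security, and is not part of any vendor tooling:

# Sketch: report the active kernel lockdown mode, e.g. "integrity".
from pathlib import Path

LOCKDOWN = Path("/sys/kernel/security/lockdown")

def current_lockdown_mode() -> str:
    # The file looks like "none [integrity] confidentiality" with the active
    # mode in square brackets; it is absent on kernels without the lockdown LSM.
    if not LOCKDOWN.exists():
        return "unsupported"
    for word in LOCKDOWN.read_text().split():
        if word.startswith("["):
            return word.strip("[]")
    return "unknown"

print(current_lockdown_mode())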
# hplog -v
ERROR: Could not open /dev/cpqhealth/cdt.
Please make sure the Health Monitor is started.
Attempting to run the hplog -v command (to view the HP hardware log) gives the above error. Strace reveals that it could and did open /dev/cpqhealth/cdt but had problems talking to something (presumably the Health Monitor daemon) over a Unix domain socket. It would be nice if they could at least get the error message right!

Dell Problems
[   13.811165] Lockdown: smbios-sys-info: /dev/mem,kmem,port is restricted; see man kernel_lockdown.7
[   13.820935] Lockdown: smbios-sys-info: raw io port access is restricted; see man kernel_lockdown.7
[   18.118118] Lockdown: dchcfg: raw io port access is restricted; see man kernel_lockdown.7
[   18.127621] Lockdown: dchcfg: /dev/mem,kmem,port is restricted; see man kernel_lockdown.7
[   19.371391] Lockdown: dsm_sa_datamgrd: raw io port access is restricted; see man kernel_lockdown.7
[   19.382147] Lockdown: dsm_sa_datamgrd: /dev/mem,kmem,port is restricted; see man kernel_lockdown.7
Above is a sample of the messages when booting a Dell PowerEdge R710 with lockdown=integrity and with the srvadmin-omacore package installed from the official Dell repository that I describe in my blog post about the Dell PowerEdge R710 [3]. Now that repository is for Ubuntu/Xenial, which was released in 2016, but again it was best practice to have device drivers for this many years ago. Also the newest Debian based releases that Dell apparently supports are Ubuntu/Xenial and Debian/Jessie, which were released in 2016 and 2015 respectively.
# omreport system esmlog
Error! No Embedded System Management (ESM) log found on this system.
Above is the result when I try to view the ESM log (the Dell hardware log).

How Long Should Server Support Last?

The Wikipedia List of Dell PowerEdge Servers shows that the R710 is a Generation 11 system. Generation 11 was first released in 2010 and Generation 12 was first released in 2012. Generation 13 was the latest hardware Dell sold in 2015, when they apparently ceased providing newer OS support for Generation 11. Dell currently sells Generation 15 systems and provides more recent support for Generation 14 and Generation 15. I think it's reasonable to debate whether Dell should support servers for 4 generations. But given that a major selling point of server class systems is that they have long term support, I think it would make sense to give better support for this and not drop support when it's only 2 versions from the latest release! The support for Dell Generation 11 hardware only seems to have lasted for 3 years after Generation 12 was first released. Also it appears that software support for Dell Generation 13 ceased before Generation 14 was released; that sucks for the people who bought Generation 13 when they were new!

HP is currently selling Gen 10 servers, which were first released at the end of 2017. So it appears that HP stopped properly supporting Gen 9 servers as soon as Gen 10 servers were released! One thing to note about these support times: when the new generation of hardware was officially released, the previous generation was still on sale. So while HP Gen 10 servers officially came out in 2017, that doesn't necessarily mean that someone who wanted to buy a ML-110 Gen10 could actually have done so. For comparison, Red Hat Enterprise Linux has been supported for 4-6 years for every release they made since 2005, and Ubuntu has always had 5 year LTS support for servers.

How To Do It Properly

The correct way of interfacing with hardware is via a device driver that is supported in the kernel.org tree. That means it goes through the usual kernel source code quality checks, which are really good at finding bugs, and gives users an assurance that the code won't cause security problems. Generally nothing about the code from Dell or HP gives me confidence that it should be directly accessing /dev/kmem or raw IO ports without risk of problems. Once a driver is in the kernel.org tree it will usually stay there forever and not require further effort from the people who submit it. Then it just works for everyone and tends to work with any other kernel features that people use, like LSMs. If they released the source code to the management programs then it would save them even more effort, as they could be maintained by the community.
