Search Results: "orv"

21 May 2024

Michael Ablassmeier: lvm thin send/recv

A few days ago i found this mail on the LKML that introduces support for userspace access to LVM thin provisioned metadata snapshots. I didn t know this is possible. Using the thin provisioning tools you can then export the metadata information for your LVM snapshots to track changed regions between them. The workflow is pretty straight forward, yet not really documented:
# lvcreate -ay -Ky --snapshot -n full_backup thingroup/vol1
  # dmsetup message /dev/mapper/thingroup-thinpool-tpool 0 reserve_metadata_snap
  # lvcreate -ay -Ky --snapshot -n inc_backup thingroup/vol1
  # thin_delta  -m --snap1 $(lvs --noheadings -o thin_id thingroup/full_backup) --snap2 $(lvs --noheadings -o thin_id thingroup/inc_backup) > delta_dump
  # dmsetup message /dev/mapper/thingroup-thinpool-tpool 0 release_metadata_snap
This all has already been implemented by a nice utility called thin-send-recv, which based on this functionality allows to (incrementally) send LVM snapshots to remote systems just like zfs send or zfs recv.

1 May 2024

Antoine Beaupr : Tor migrates from Gitolite/GitWeb to GitLab

Note: I've been awfully silent here for the past ... (checks notes) oh dear, 3 months! But that's not because I've been idle, quite the contrary, I've been very busy but just didn't have time to write about anything. So I've taken it upon myself to write something about my work this week, and published this post on the Tor blog which I copy here for a broader audience. Let me know if you like this or not.
Tor has finally completed a long migration from legacy Git infrastructure (Gitolite and GitWeb) to our self-hosted GitLab server. Git repository addresses have therefore changed. Many of you probably have made the switch already, but if not, you will need to change:
https://git.torproject.org/
to:
https://gitlab.torproject.org/
In your Git configuration. The GitWeb front page is now an archived listing of all the repositories before the migration. Inactive git repositories were archived in GitLab legacy/gitolite namespace and the gitweb.torproject.org and git.torproject.org web sites now redirect to GitLab. Best effort was made to reproduce the original gitolite repositories faithfully and also avoid duplicating too much data in the migration. But it's possible that some data present in Gitolite has not migrated to GitLab. User repositories are particularly at risk, because they were massively migrated, and they were "re-forked" from their upstreams, to avoid wasting disk space. If a user had a project with a matching name it was assumed to have the right data, which might be inaccurate. The two virtual machines responsible for the legacy service (cupani for git-rw.torproject.org and vineale for git.torproject.org and gitweb.torproject.org) have been shutdown. Their disks will remain for 3 months (until the end of July 2024) and their backups for another year after that (until the end of July 2025), after which point all the data from those hosts will be destroyed, with only the GitLab archives remaining. The rest of this article expands on how this was done and what kind of problems we faced during the migration.

Where is the code? Normally, nothing should be lost. All repositories in gitolite have been either explicitly migrated by their owners, forcibly migrated by the sysadmin team (TPA), or explicitly destroyed at their owner's request. An exhaustive rewrite map translates gitolite projects to GitLab projects. Some of those projects actually redirect to their parent in cases of empty repositories that were obvious forks. Destroyed repositories redirect to the GitLab front page. Because the migration happened progressively, it's technically possible that commits pushed to gitolite were lost after the migration. We took great care to avoid that scenario. First, we adopted a proposal (TPA-RFC-36) in June 2023 to announce the transition. Then, in March 2024, we locked down all repositories from any further changes. Around that time, only a handful of repositories had changes made after the adoption date, and we examined each repository carefully to make sure nothing was lost. Still, we built a diff of all the changes in the git references that archivists can peruse to check for data loss. It's large (6MiB+) because a lot of repositories were migrated before the mass migration and then kept evolving in GitLab. Many other repositories were rebuilt in GitLab from parent to rebuild a fork relationship which added extra references to those clones. A note to amateur archivists out there, it's probably too late for one last crawl now. The Git repositories now all redirect to GitLab and are effectively unavailable in their original form. That said, the GitWeb site was crawled into the Internet Archive in February 2024, so at least some copy of it is available in the Wayback Machine. At that point, however, many developers had already migrated their projects to GitLab, so the copies there were already possibly out of date compared with the repositories in GitLab. Software Heritage also has a copy of all repositories hosted on Gitolite since June 2023 and have continuously kept mirroring the repositories, where they will be kept hopefully in eternity. There's an issue where the main website can't find the repositories when you search for gitweb.torproject.org, instead search for git.torproject.org. In any case, if you believe data is missing, please do let us know by opening an issue with TPA.

Why? This is an old project in the making. The first discussion about migrating from gitolite to GitLab started in 2020 (almost 4 years ago). But going further back, the first GitLab experiment was in 2016, almost a decade ago. The current GitLab server dates from 2019, replacing Trac for issue tracking in 2020. It was originally supposed to host only mirrors for merge requests and issue trackers but, naturally, one thing led to another and eventually, GitLab had grown a container registry, continuous integration (CI) runners, GitLab Pages, and, of course, hosted most Git repositories. There were hesitations at moving to GitLab for code hosting. We had discussions about the increased attack surface and ways to mitigate that, but, ultimately, it seems the issues were not that serious and the community embraced GitLab. TPA actually migrated its most critical repositories out of shared hosting entirely, into specific servers (e.g. the Puppet Git repository is just on the Puppet server now), leveraging Git's decentralized nature and removing an entire attack surface from our infrastructure. Some of those repositories are mirrored back into GitLab, but the authoritative copy is not on GitLab. In any case, the proposal to migrate from Gitolite to GitLab was effectively just formalizing a fait accompli.

How to migrate from Gitolite / cgit to GitLab The progressive migration was a challenge. If you intend to migrate between hosting platforms, we strongly recommend to make a "flag day" during which you migrate all repositories at once. This ensures a smoother transition and avoids elaborate rewrite rules. When Gitolite access was shutdown, we had repositories on both GitLab and Gitolite, without a clear relationship between the two. A priori, the plan then was to import all the remaining Gitolite repositories into the legacy/gitolite namespace, but that seemed wasteful, particularly for large repositories like Tor Browser which uses nearly a gigabyte of disk space. So we took special care to avoid duplicating repositories. When the mass migration started, only 71 of the 538 Gitolite repositories were Migrated to GitLab in the gitolite.conf file. So, given that we had hundreds of repositories to migrate:, we developed some automation to "save time". We already automate similar ad-hoc tasks with Fabric, so we used that framework here as well. (Our normal configuration management tool is Puppet, which is a poor fit here.) So a relatively large amount of Python code was produced to basically do the following:
  1. check if all on-disk repositories are listed in gitolite.conf (and vice versa) and either add missing repositories or delete them from disk if garbage
  2. for each repository in gitolite.conf, if its category is marked Migrated to GitLab, skip, otherwise;
  3. find a matching GitLab project by name, prompt the user for multiple matches
  4. if a match is found, redirect if the repository is non-empty
    • we have GitLab projects that look like the real thing, but are only present to host migrated Trac issues
    • in such cases we cloned the Gitolite project locally and pushed to the existing repository instead
  5. otherwise, a new repository is created in the legacy/gitolite namespace, using the "import" mechanism in GitLab to automatically import the repository from Gitolite, creating redirections and updating gitolite.conf to document the change
User repositories (those under the user/ directory in Gitolite) were handled specially. First, the existing redirection map was checked to see if a similarly named project was migrated (so that, e.g. user/dgoulet/tor is properly treated as a fork of tpo/core/tor). Then the parent project was forked in GitLab and the Gitolite project force-pushed to the fork. This allows us to show the fork relationship in GitLab and, more importantly, benefit from the "pool" feature in GitLab which deduplicates disk usage between forks. Sometimes, we found no such relationships. Then we simply imported multiple repositories with similar names in the legacy/gitolite namespace, sometimes creating forks between user repositories, on a first-come-first-served basis from the gitolite.conf order. The code used in this migration is now available publicly. We encourage other groups planning to migrate from Gitolite/GitWeb to GitLab to use (and contribute to) our fabric-tasks repository, even though it does have its fair share of hard-coded assertions. The main entry point is the gitolite.mass-repos-migration task. A typical migration job looked like:
anarcat@angela:fabric-tasks$ fab -H cupani.torproject.org gitolite.mass-repos-migration 
[...]
INFO: skipping project project/help/infra in category Migrated to GitLab
INFO: skipping project project/help/wiki in category Migrated to GitLab
INFO: skipping project project/jenkins/jobs in category Migrated to GitLab
INFO: skipping project project/jenkins/tools in category Migrated to GitLab
INFO: searching for projects matching fastlane
INFO: Successfully connected to https://gitlab.torproject.org
import gitolite project project/tor-browser/fastlane into gitlab legacy/gitolite/project/tor-browser/fastlane with desc 'Tor Browser app store and deployment configuration for Fastlane'? [Y/n] 
INFO: importing gitolite project project/tor-browser/fastlane into gitlab legacy/gitolite/project/tor-browser/fastlane with desc 'Tor Browser app store and deployment configuration for Fastlane'
INFO: building a new connect to cupani
INFO: defaulting name to fastlane
INFO: importing project into GitLab
INFO: Successfully connected to https://gitlab.torproject.org
INFO: loading group legacy/gitolite/project/tor-browser
INFO: archiving project
INFO: creating repository fastlane (fastlane) in namespace legacy/gitolite/project/tor-browser from https://git.torproject.org/project/tor-browser/fastlane into https://gitlab.torproject.org/legacy/gitolite/project/tor-browser/fastlane
INFO: migrating Gitolite repository project/tor-browser/fastlane to GitLab project legacy/gitolite/project/tor-browser/fastlane
INFO: uploading 399 bytes to /srv/git.torproject.org/repositories/project/tor-browser/fastlane.git/hooks/pre-receive
INFO: making /srv/git.torproject.org/repositories/project/tor-browser/fastlane.git/hooks/pre-receive executable
INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project project/tor-browser/fastlane to category Migrated to GitLab
INFO: skipping project project/bridges/bridgedb-admin in category Migrated to GitLab
[...]
In the above, you can see migrated repositories skipped then the fastlane project being archived into GitLab. Another example with a later version of the script, processing only user repositories and showing the interactive prompt and a force-push into a fork:
$ fab -H cupani.torproject.org  gitolite.mass-repos-migration --include 'user/.*' --exclude '.*tor-?browser.*'
INFO: skipping project user/aagbsn/bridgedb in category Migrated to GitLab
[...]
INFO: skipping project user/phw/atlas in category Migrated to GitLab
INFO: processing project user/phw/obfsproxy (Philipp's obfsproxy repository) in category Users' development repositories (Attic)
INFO: Successfully connected to https://gitlab.torproject.org
INFO: user repository detected, trying to find fork phw/obfsproxy
WARNING: no existing fork found, entering user fork subroutine
INFO: found 6 GitLab projects matching 'obfsproxy' (https://gitweb.torproject.org/user/phw/obfsproxy.git)
0 legacy/gitolite/debian/obfsproxy
1 legacy/gitolite/debian/obfsproxy-legacy
2 legacy/gitolite/user/asn/obfsproxy
3 legacy/gitolite/user/ioerror/obfsproxy
4 tpo/anti-censorship/pluggable-transports/obfsproxy
5 tpo/anti-censorship/pluggable-transports/obfsproxy-legacy
select parent to fork from, or enter to abort: ^G4
INFO: repository is not empty: in-pack: 2104, packs: 1, size-pack: 414
fork project tpo/anti-censorship/pluggable-transports/obfsproxy into legacy/gitolite/user/phw/obfsproxy^G [Y/n] 
INFO: loading project tpo/anti-censorship/pluggable-transports/obfsproxy
INFO: forking project user/phw/obfsproxy into namespace legacy/gitolite/user/phw
INFO: waiting for fork to complete...
INFO: fork status: started, sleeping...
INFO: fork finished
INFO: cloning and force pushing from user/phw/obfsproxy to legacy/gitolite/user/phw/obfsproxy
INFO: deleting branch protection: <class 'gitlab.v4.objects.branches.ProjectProtectedBranch'> =>  'id': 2723, 'name': 'master', 'push_access_levels': [ 'id': 2864, 'access_level': 40, 'access_level_description': 'Maintainers', 'deploy_key_id': None ], 'merge_access_levels': [ 'id': 2753, 'access_level': 40, 'access_level_description': 'Maintainers' ], 'allow_force_push': False 
INFO: cloning repository git-rw.torproject.org:/srv/git.torproject.org/repositories/user/phw/obfsproxy.git in /tmp/tmp6orvjggy/user/phw/obfsproxy
Cloning into bare repository '/tmp/tmp6orvjggy/user/phw/obfsproxy'...
INFO: pushing to GitLab: https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
remote: 
remote: To create a merge request for bug_10887, visit:        
remote:   https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy/-/merge_requests/new?merge_request%5Bsource_branch%5D=bug_10887        
remote: 
[...]
To ssh://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
 + 2bf9d09...a8e54d5 master -> master (forced update)
 * [new branch]      bug_10887 -> bug_10887
[...]
INFO: migrating repo
INFO: migrating Gitolite repository https://gitweb.torproject.org/user/phw/obfsproxy.git to GitLab project https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project user/phw/obfsproxy to category Migrated to GitLab
INFO: processing project user/phw/scramblesuit (Philipp's ScrambleSuit repository) in category Users' development repositories (Attic)
INFO: user repository detected, trying to find fork phw/scramblesuit
WARNING: no existing fork found, entering user fork subroutine
WARNING: no matching gitlab project found for user/phw/scramblesuit
INFO: user fork subroutine failed, resuming normal procedure
INFO: searching for projects matching scramblesuit
import gitolite project user/phw/scramblesuit into gitlab legacy/gitolite/user/phw/scramblesuit with desc 'Philipp's ScrambleSuit repository'?^G [Y/n] 
INFO: checking if remote repo https://git.torproject.org/user/phw/scramblesuit exists
INFO: importing gitolite project user/phw/scramblesuit into gitlab legacy/gitolite/user/phw/scramblesuit with desc 'Philipp's ScrambleSuit repository'
INFO: importing project into GitLab
INFO: Successfully connected to https://gitlab.torproject.org
INFO: loading group legacy/gitolite/user/phw
INFO: creating repository scramblesuit (scramblesuit) in namespace legacy/gitolite/user/phw from https://git.torproject.org/user/phw/scramblesuit into https://gitlab.torproject.org/legacy/gitolite/user/phw/scramblesuit
INFO: archiving project
INFO: migrating Gitolite repository https://gitweb.torproject.org/user/phw/scramblesuit.git to GitLab project https://gitlab.torproject.org/legacy/gitolite/user/phw/scramblesuit
INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project user/phw/scramblesuit to category Migrated to GitLab
[...]
Acute eyes will notice the bell used as a notification mechanism as well in this transcript. A lot of the code is now useless for us, but some, like "commit and push" or is-repo-empty live on in the git module and, of course, the gitlab module has grown some legs along the way. We've also found fun bugs, like a file descriptor exhaustion in bash, among other oddities. The retirement milestone and issue 41215 has a detailed log of the migration, for those curious. This was a challenging project, but it feels nice to have this behind us. This gets rid of 2 of the 4 remaining machines running Debian "old-old-stable", which moves a bit further ahead in our late bullseye upgrades milestone. Full transparency: we tested GPT-3.5, GPT-4, and other large language models to see if they could answer the question "write a set of rewrite rules to redirect GitWeb to GitLab". This has become a standard LLM test for your faithful writer to figure out how good a LLM is at technical responses. None of them gave an accurate, complete, and functional response, for the record. The actual rewrite rules as of this writing follow, for humans that actually like working answers provided by expert humans instead of artificial intelligence which currently seem to be, glorified, mansplaining interns.

git.torproject.org rewrite rules Those rules are relatively simple in that they rewrite a single URL to its equivalent GitLab counterpart in a 1:1 fashion. It relies on the rewrite map mentioned above, of course.
RewriteEngine on
# this RewriteMap connects the gitweb projects to their GitLab
# equivalent
RewriteMap gitolite2gitlab "txt:/etc/apache2/gitolite2gitlab.txt"
# if this becomes a performance bottleneck, convert to a DBM map with:
#
#  $ httxt2dbm -i mapfile.txt -o mapfile.map
#
# and:
#
# RewriteMap mapname "dbm:/etc/apache/mapfile.map"
#
# according to reports lavamind found online, we hit such a
# performance bottleneck only around millions of entries, which is not our case
# those two rules can go away once all the projects are
# migrated to GitLab
#
# this matches the request URI so we can check the RewriteMap
# for a match next
#
# WARNING: this won't match URLs without .git in them, which
# *do* work now. one possibility would be to match the request
# URI (without query string!) with:
#
# /git/(.*)(.git)?/(((branches hooks info objects/).*) git-.* upload-pack receive-pack HEAD config description)?.
#
# I haven't been able to figure out the actual structure of
# those URLs, so it's really hard to figure out the boundaries
# of the project name here. I stopped after pouring around the
# http-backend.c code in git
# itself. https://www.git-scm.com/docs/http-protocol is also
# kind of incomplete and unsatisfying.
RewriteCond % REQUEST_URI  ^/(git/)?(.*).git/.*$
# this makes the RewriteRule match only if there's a match in
# the rewrite map
RewriteCond $ gitolite2gitlab:%2 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(git/)?(.*).git/(.*)$ https://gitlab.torproject.org/$ gitolite2gitlab:$2 .git/$3 [R=302,L]
# Fallback everything else to GitLab
RewriteRule (.*) https://gitlab.torproject.org [R=302,L]

gitweb.torproject.org rewrite rules Those are the vastly more complicated GitWeb to GitLab rewrite rules. Note that we say "GitWeb" but we were actually not running GitWeb but cgit, as the former didn't actually scale for us.
RewriteEngine on
# this RewriteMap connects the gitweb projects to their GitLab
# equivalent
RewriteMap gitolite2gitlab "txt:/etc/apache2/gitolite2gitlab.txt"
# special rule to process targets of the old spec.tpo site and
# bring them to the right redirect on the new spec.tpo site. that should turn, for example:
#
# https://gitweb.torproject.org/torspec.git/tree/address-spec.txt
#
# into:
#
# https://spec.torproject.org/address-spec
RewriteRule ^/torspec.git/tree/(.*).txt$ https://spec.torproject.org/$1 [R=302]
# list of endpoints taken from cgit's cmd.c
# those two RewriteCond are necessary because we don't move
# all repositories at once. once the migration is completed,
# they can be removed.
#
# and yes, they are copied all over the place below
#
# create a match for the project name to check if the project
# has been moved to GitLab
RewriteCond % REQUEST_URI  ^/(.*).git(/.*)?$
# this makes the RewriteRule match only if there's a match in
# the rewrite map
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
# main project page, like summary below
RewriteRule ^/(.*).git/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 / [R=302,L]
# summary
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/summary/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 / [R=302,L]
# about
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/about/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 / [R=302,L]
# commit
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond "% QUERY_STRING " "(.*(?:^ &))id=([^&]*)(&.*)?$"
RewriteRule ^/(.*).git/commit/? https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commit/%2 [R=302,L,QSD]
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/commit/? https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commits/HEAD [R=302,L]
# diff, incomplete because can diff arbitrary refs and files in cgit but not in GitLab, hard to parse
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond % QUERY_STRING  id=([^&]*)
RewriteRule ^/(.*).git/diff/? https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commit/%1 [R=302,L,QSD]
# patch
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond % QUERY_STRING  id=([^&]*)
RewriteRule ^/(.*).git/patch/? https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commit/%1.patch [R=302,L,QSD]
# rawdiff, incomplete because can show only one file diff, which GitLab cannot
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond % QUERY_STRING  id=([^&]*)
RewriteRule ^/(.*).git/rawdiff/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commit/%1.diff [R=302,L,QSD]
# log
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond % QUERY_STRING  h=([^&]*)
RewriteRule ^/(.*).git/log/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commits/%1 [R=302,L,QSD]
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/log/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commits/HEAD [R=302,L]
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/log(/?.*)$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commits/HEAD$2 [R=302,L]
# atom
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond % QUERY_STRING  h=([^&]*)
RewriteRule ^/(.*).git/atom/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commits/%1 [R=302,L,QSD]
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/atom/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/commits/HEAD [R=302,L,QSD]
# refs, incomplete because two pages in GitLab, defaulting to "tags"
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/refs/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/tags [R=302,L]
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond % QUERY_STRING  h=([^&]*)
RewriteRule ^/(.*).git/tag/? https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/tags/%1 [R=302,L,QSD]
# tree
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond % QUERY_STRING  id=([^&]*)
RewriteRule ^/(.*).git/tree(/?.*)$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/tree/%1$2 [R=302,L,QSD]
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/tree(/?.*)$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/tree/HEAD$2 [R=302,L]
# /-/tree has no good default in GitLab, revert to HEAD which is a good
# approximation (we can't assume "master" here anymore)
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/tree/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/tree/HEAD [R=302,L]
# plain
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteCond % QUERY_STRING  h=([^&]*)
RewriteRule ^/(.*).git/plain(/?.*)$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/raw/%1$2 [R=302,L,QSD]
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/plain(/?.*)$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/raw/HEAD$2 [R=302,L]
# blame: disabled
#RewriteCond % REQUEST_URI  ^/(.*).git/.*$
#RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
#RewriteCond % QUERY_STRING  h=([^&]*)
#RewriteRule ^/(.*).git/blame(/?.*)$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/blame/%1$2 [R=302,L,QSD]
# same default as tree above
#RewriteCond % REQUEST_URI  ^/(.*).git/.*$
#RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
#RewriteRule ^/(.*).git/blame(/?.*)$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/blame/HEAD/$2 [R=302,L]
# stats
RewriteCond % REQUEST_URI  ^/(.*).git/.*$
RewriteCond $ gitolite2gitlab:%1 NOT_FOUND  !NOT_FOUND
RewriteRule ^/(.*).git/stats/?$ https://gitlab.torproject.org/$ gitolite2gitlab:$1 /-/graphs/HEAD [R=302,L]
# still TODO:
# repolist: once migration is complete
#
# cannot be done:
# atom: needs a feed token, user must be logged in
# blob: no direct equivalent
# info: not working on main cgit website?
# ls_cache: not working, irrelevant?
# objects: undocumented?
# snapshot: pattern too hard to match on cgit's side
# special case, we keep a copy of the main index on the archive
RewriteRule ^/?$ https://archive.torproject.org/websites/gitweb.torproject.org.html [R=302,L]
# Fallback: everything else to GitLab
RewriteRule .* https://gitlab.torproject.org [R=302,L]
The reference copy of those is available in our (currently private) Puppet git repository.

12 February 2024

Emanuele Rocca: Enabling Kernel Settings in Debian

This time it s about enabling new kernel config options in the official Debian kernel packages. A few dependencies are needed to run the various scripts used by the Debian kernel folks, as well as to build the kernel itself:
apt install git gpg python3-debian python3-dacite
apt build-dep linux
With that in place, fetch the linux and kernel-team repos:
git clone --depth 1 https://salsa.debian.org/kernel-team/linux
git clone --depth 1 https://salsa.debian.org/kernel-team/kernel-team
So far you ve only got the Debian-specific bits. Fetch the actual kernel sources now. In the likely case that you re building a stable kernel, run the following from within the linux directory:
debian/bin/genorig.py https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
Use the torvalds repo if you re building an RC version instead:
debian/bin/genorig.py https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
Now generate the upstream tarball as well as debian/control. The first command will take a bit, and the second command will fail: but that s success just as the output says.
debian/rules orig
debian/rules debian/control
Now generate patched sources with:
debian/rules source
Time to edit the Kconfig and enable/disable whatever setting you wanted to change. Take a look around the files under debian/config/ to see where your changes should go. If it s a setting shared among multiple architectures that may be debian/config/config. For x86-specific things, the file is debian/config/amd64/config. On aarch64 debian/config/arm64/config. If in doubt, you could try asking #debian-kernel on IRC.
It may look like you need to figure out where exactly in the file the setting should be placed. That is not the case. There s a helpful script fixing things up for you:
../kernel-team/utils/kconfigeditor2/process.py .
The above will fail if you forgot to run debian/rules source. The debian/build/source_rt/Kconfig file is needed by the script:
Traceback (most recent call last):
  File "/tmp/linux/../kernel-team/utils/kconfigeditor2/process.py", line 19, in __init__
    menu = fs_menu[featureset or 'none']
           ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'rt'
During handling of the above exception, another exception occurred:
[...]
FileNotFoundError: [Errno 2] No such file or directory: './debian/build/source_rt/Kconfig'
If that happens, run:
debian/rules source
Now process.py should work fine and fix your config file.
Excellent, now the config is updated and we re ready to build the kernel. Off we go:
export MAKEFLAGS=-j$(nproc)
export DEB_BUILD_PROFILES='pkg.linux.nokerneldbg pkg.linux.nokerneldbginfo pkg.linux.notools nodoc'
dpkg-buildpackage -b -nc -uc

7 January 2024

Jonathan McDowell: Free Software Activities for 2023

This year was hard from a personal and work point of view, which impacted the amount of Free Software bits I ended up doing - even when I had the time I often wasn t in the right head space to make progress on things. However writing this annual recap up has been a useful exercise, as I achieved more than I realised. For previous years see 2019, 2020, 2021 + 2022.

Conferences The only Free Software related conference I made it to this year was DebConf23 in Kochi, India. Changes with projects at work meant I couldn t justify anything work related. This year I m planning to make it to FOSDEM, and haven t made a decision on DebConf24 yet.

Debian Most of my contributions to Free software continue to happen within Debian. I started the year working on retrogaming with Kodi on Debian. I got this to a much better state for bookworm, with it being possible to run the bsnes-mercury emulator under Kodi using RetroArch. There are a few other libretro backends available for RetroArch, but Kodi needs some extra controller mappings packaged up first. Plenty of uploads were involved, though some of this was aligning all the dependencies and generally cleaning things up in iterations. I continued to work on a few packages within the Debian Electronics Packaging Team. OpenOCD produced a new release in time for the bookworm release, so I uploaded 0.12.0-1. There were a few minor sigrok cleanups - sigrok 0.3, libsigrokdecode 0.5.3-4 + libsigrok 0.5.2-4 / 0.5.2-5. While I didn t manage to get the work completed I did some renaming of the ESP8266 related packages - gcc-xtensa-lx106 (which saw a 13 upload pre-bookworm) has become gcc-xtensa (with 14) and binutils-xtensa-lx106 has become binutils-xtensa (with 6). Binary packages remain the same, but this is intended to allow for the generation of ESP32 compiler toolchains from the same source. onak saw 0.6.3-1 uploaded to match the upstream release. I also uploaded libgpg-error 1.47-1 (though I can claim no credit for any of the work in preparing the package) to help move things forward on updating gnupg2 in Debian. I NMUed tpm2-pkcs11 1.9.0-0.1 to fix some minor issues pre-bookworm release; I use this package myself to store my SSH key within my laptop TPM, so I care about it being in a decent state. sg3-utils also saw a bit of love with 1.46-2 + 1.46-3 - I don t work in the storage space these days, but I m still listed as an uploaded and there was an RC bug around the library package naming that I was qualified to fix and test pre-bookworm. Related to my retroarch work I sponsored uploads of mgba for Ryan Tandy: 0.10.0+dfsg-1, 0.10.0+dfsg-2, 0.10.1+dfsg-1, 0.10.2+dfsg-1, mgba 0.10.1+dfsg-1+deb12u1. As part of the Data Protection Team I responded to various inbound queries to that team, both from project members and those external to the project. I continue to keep an eye on Debian New Members, even though I m mostly inactive as an application manager - we generally seem to have enough available recently. Mostly my involvement is via Front Desk activities, helping out with queries to the team alias, and contributing to internal discussions as well as our panel at DebConf23. Finally the 3 month rotation for Debian Keyring continues to operate smoothly. I dealt with 2023.03.24, 2023.06.26, 2023.06.29, 2023.09.10, 2023.09.24 + 2023.12.24.

Linux I had a few minor patches accepted to the kernel this year. A pair of safexcel cleanups (improved error logging for firmware load fail and cleanup on load failure) came out of upgrading the kernel running on my RB5009. The rest were related to my work on repurposing my C.H.I.P.. The AXP209 driver needed extended to support GPIO3 (with associated DT schema update). That allowed Bluetooth to be enabled. Adding the AXP209 internal temperature ADC as an iio-hwmon node means it can be tracked using the normal sensor monitoring framework. And finally I added the pinmux settings for mmc2, which I use to support an external microSD slot on my C.H.I.P.

Personal projects 2023 saw another minor release of onak, 0.6.3, which resulted in a corresponding Debian upload (0.6.3-1). It has a couple of bug fixes (including a particularly annoying, if minor, one around systemd socket activation that felt very satisfying to get to the bottom of), but I still lack the time to do any of the major changes I would like to. I wrote listadmin3 to allow easy manipulation of moderation queues for Mailman3. It s basic, but it s drastically improved my timeliness on dealing with held messages.

13 December 2023

Melissa Wen: 15 Tips for Debugging Issues in the AMD Display Kernel Driver

A self-help guide for examining and debugging the AMD display driver within the Linux kernel/DRM subsystem. It s based on my experience as an external developer working on the driver, and are shared with the goal of helping others navigate the driver code. Acknowledgments: These tips were gathered thanks to the countless help received from AMD developers during the driver development process. The list below was obtained by examining open source code, reviewing public documentation, playing with tools, asking in public forums and also with the help of my former GSoC mentor, Rodrigo Siqueira.

Pre-Debugging Steps: Before diving into an issue, it s crucial to perform two essential steps: 1) Check the latest changes: Ensure you re working with the latest AMD driver modifications located in the amd-staging-drm-next branch maintained by Alex Deucher. You may also find bug fixes for newer kernel versions on branches that have the name pattern drm-fixes-<date>. 2) Examine the issue tracker: Confirm that your issue isn t already documented and addressed in the AMD display driver issue tracker. If you find a similar issue, you can team up with others and speed up the debugging process.

Understanding the issue: Do you really need to change this? Where should you start looking for changes? 3) Is the issue in the AMD kernel driver or in the userspace?: Identifying the source of the issue is essential regardless of the GPU vendor. Sometimes this can be challenging so here are some helpful tips:
  • Record the screen: Capture the screen using a recording app while experiencing the issue. If the bug appears in the capture, it s likely a userspace issue, not the kernel display driver.
  • Analyze the dmesg log: Look for error messages related to the display driver in the dmesg log. If the error message appears before the message [drm] Display Core v... , it s not likely a display driver issue. If this message doesn t appear in your log, the display driver wasn t fully loaded and you will see a notification that something went wrong here.
4) AMD Display Manager vs. AMD Display Core: The AMD display driver consists of two components:
  • Display Manager (DM): This component interacts directly with the Linux DRM infrastructure. Occasionally, issues can arise from misinterpretations of DRM properties or features. If the issue doesn t occur on other platforms with the same AMD hardware - for example, only happens on Linux but not on Windows - it s more likely related to the AMD DM code.
  • Display Core (DC): This is the platform-agnostic part responsible for setting and programming hardware features. Modifications to the DC usually require validation on other platforms, like Windows, to avoid regressions.
5) Identify the DC HW family: Each AMD GPU has variations in its hardware architecture. Features and helpers differ between families, so determining the relevant code for your specific hardware is crucial.
  • Find GPU product information in Linux/AMD GPU documentation
  • Check the dmesg log for the Display Core version (since this commit in Linux kernel 6.3v). For example:
    • [drm] Display Core v3.2.241 initialized on DCN 2.1
    • [drm] Display Core v3.2.237 initialized on DCN 3.0.1

Investigating the relevant driver code: Keep from letting unrelated driver code to affect your investigation. 6) Narrow the code inspection down to one DC HW family: the relevant code resides in a directory named after the DC number. For example, the DCN 3.0.1 driver code is located at drivers/gpu/drm/amd/display/dc/dcn301. We all know that the AMD s shared code is huge and you can use these boundaries to rule out codes unrelated to your issue. 7) Newer families may inherit code from older ones: you can find dcn301 using code from dcn30, dcn20, dcn10 files. It s crucial to verify which hooks and helpers your driver utilizes to investigate the right portion. You can leverage ftrace for supplemental validation. To give an example, it was useful when I was updating DCN3 color mapping to correctly use their new post-blending color capabilities, such as: Additionally, you can use two different HW families to compare behaviours. If you see the issue in one but not in the other, you can compare the code and understand what has changed and if the implementation from a previous family doesn t fit well the new HW resources or design. You can also count on the help of the community on the Linux AMD issue tracker to validate your code on other hardware and/or systems. This approach helped me debug a 2-year-old issue where the cursor gamma adjustment was incorrect in DCN3 hardware, but working correctly for DCN2 family. I solved the issue in two steps, thanks for community feedback and validation: 8) Check the hardware capability screening in the driver: You can currently find a list of display hardware capabilities in the drivers/gpu/drm/amd/display/dc/dcn*/dcn*_resource.c file. More precisely in the dcn*_resource_construct() function. Using DCN301 for illustration, here is the list of its hardware caps:
	/*************************************************
	 *  Resource + asic cap harcoding                *
	 *************************************************/
	pool->base.underlay_pipe_index = NO_UNDERLAY_PIPE;
	pool->base.pipe_count = pool->base.res_cap->num_timing_generator;
	pool->base.mpcc_count = pool->base.res_cap->num_timing_generator;
	dc->caps.max_downscale_ratio = 600;
	dc->caps.i2c_speed_in_khz = 100;
	dc->caps.i2c_speed_in_khz_hdcp = 5; /*1.4 w/a enabled by default*/
	dc->caps.max_cursor_size = 256;
	dc->caps.min_horizontal_blanking_period = 80;
	dc->caps.dmdata_alloc_size = 2048;
	dc->caps.max_slave_planes = 2;
	dc->caps.max_slave_yuv_planes = 2;
	dc->caps.max_slave_rgb_planes = 2;
	dc->caps.is_apu = true;
	dc->caps.post_blend_color_processing = true;
	dc->caps.force_dp_tps4_for_cp2520 = true;
	dc->caps.extended_aux_timeout_support = true;
	dc->caps.dmcub_support = true;
	/* Color pipeline capabilities */
	dc->caps.color.dpp.dcn_arch = 1;
	dc->caps.color.dpp.input_lut_shared = 0;
	dc->caps.color.dpp.icsc = 1;
	dc->caps.color.dpp.dgam_ram = 0; // must use gamma_corr
	dc->caps.color.dpp.dgam_rom_caps.srgb = 1;
	dc->caps.color.dpp.dgam_rom_caps.bt2020 = 1;
	dc->caps.color.dpp.dgam_rom_caps.gamma2_2 = 1;
	dc->caps.color.dpp.dgam_rom_caps.pq = 1;
	dc->caps.color.dpp.dgam_rom_caps.hlg = 1;
	dc->caps.color.dpp.post_csc = 1;
	dc->caps.color.dpp.gamma_corr = 1;
	dc->caps.color.dpp.dgam_rom_for_yuv = 0;
	dc->caps.color.dpp.hw_3d_lut = 1;
	dc->caps.color.dpp.ogam_ram = 1;
	// no OGAM ROM on DCN301
	dc->caps.color.dpp.ogam_rom_caps.srgb = 0;
	dc->caps.color.dpp.ogam_rom_caps.bt2020 = 0;
	dc->caps.color.dpp.ogam_rom_caps.gamma2_2 = 0;
	dc->caps.color.dpp.ogam_rom_caps.pq = 0;
	dc->caps.color.dpp.ogam_rom_caps.hlg = 0;
	dc->caps.color.dpp.ocsc = 0;
	dc->caps.color.mpc.gamut_remap = 1;
	dc->caps.color.mpc.num_3dluts = pool->base.res_cap->num_mpc_3dlut; //2
	dc->caps.color.mpc.ogam_ram = 1;
	dc->caps.color.mpc.ogam_rom_caps.srgb = 0;
	dc->caps.color.mpc.ogam_rom_caps.bt2020 = 0;
	dc->caps.color.mpc.ogam_rom_caps.gamma2_2 = 0;
	dc->caps.color.mpc.ogam_rom_caps.pq = 0;
	dc->caps.color.mpc.ogam_rom_caps.hlg = 0;
	dc->caps.color.mpc.ocsc = 1;
	dc->caps.dp_hdmi21_pcon_support = true;
	/* read VBIOS LTTPR caps */
	if (ctx->dc_bios->funcs->get_lttpr_caps)  
		enum bp_result bp_query_result;
		uint8_t is_vbios_lttpr_enable = 0;
		bp_query_result = ctx->dc_bios->funcs->get_lttpr_caps(ctx->dc_bios, &is_vbios_lttpr_enable);
		dc->caps.vbios_lttpr_enable = (bp_query_result == BP_RESULT_OK) && !!is_vbios_lttpr_enable;
	 
	if (ctx->dc_bios->funcs->get_lttpr_interop)  
		enum bp_result bp_query_result;
		uint8_t is_vbios_interop_enabled = 0;
		bp_query_result = ctx->dc_bios->funcs->get_lttpr_interop(ctx->dc_bios, &is_vbios_interop_enabled);
		dc->caps.vbios_lttpr_aware = (bp_query_result == BP_RESULT_OK) && !!is_vbios_interop_enabled;
	 
Keep in mind that the documentation of color capabilities are available at the Linux kernel Documentation.

Understanding the development history: What has brought us to the current state? 9) Pinpoint relevant commits: Use git log and git blame to identify commits targeting the code section you re interested in. 10) Track regressions: If you re examining the amd-staging-drm-next branch, check for regressions between DC release versions. These are defined by DC_VER in the drivers/gpu/drm/amd/display/dc/dc.h file. Alternatively, find a commit with this format drm/amd/display: 3.2.221 that determines a display release. It s useful for bisecting. This information helps you understand how outdated your branch is and identify potential regressions. You can consider each DC_VER takes around one week to be bumped. Finally, check testing log of each release in the report provided on the amd-gfx mailing list, such as this one Tested-by: Daniel Wheeler:

Reducing the inspection area: Focus on what really matters. 11) Identify involved HW blocks: This helps isolate the issue. You can find more information about DCN HW blocks in the DCN Overview documentation. In summary:
  • Plane issues are closer to HUBP and DPP.
  • Blending/Stream issues are closer to MPC, OPP and OPTC. They are related to DRM CRTC subjects.
This information was useful when debugging a hardware rotation issue where the cursor plane got clipped off in the middle of the screen. Finally, the issue was addressed by two patches: 12) Issues around bandwidth (glitches) and clocks: May be affected by calculations done in these HW blocks and HW specific values. The recalculation equations are found in the DML folder. DML stands for Display Mode Library. It s in charge of all required configuration parameters supported by the hardware for multiple scenarios. See more in the AMD DC Overview kernel docs. It s a math library that optimally configures hardware to find the best balance between power efficiency and performance in a given scenario. Finding some clk variables that affect device behavior may be a sign of it. It s hard for a external developer to debug this part, since it involves information from HW specs and firmware programming that we don t have access. The best option is to provide all relevant debugging information you have and ask AMD developers to check the values from your suspicions.
  • Do a trick: If you suspect the power setup is degrading performance, try setting the amount of power supplied to the GPU to the maximum and see if it affects the system behavior with this command: sudo bash -c "echo high > /sys/class/drm/card0/device/power_dpm_force_performance_level"
I learned it when debugging glitches with hardware cursor rotation on Steam Deck. My first attempt was changing the clock calculation. In the end, Rodrigo Siqueira proposed the right solution targeting bandwidth in two steps:

Checking implicit programming and hardware limitations: Bring implicit programming to the level of consciousness and recognize hardware limitations. 13) Implicit update types: Check if the selected type for atomic update may affect your issue. The update type depends on the mode settings, since programming some modes demands more time for hardware processing. More details in the source code:
/* Surface update type is used by dc_update_surfaces_and_stream
 * The update type is determined at the very beginning of the function based
 * on parameters passed in and decides how much programming (or updating) is
 * going to be done during the call.
 *
 * UPDATE_TYPE_FAST is used for really fast updates that do not require much
 * logical calculations or hardware register programming. This update MUST be
 * ISR safe on windows. Currently fast update will only be used to flip surface
 * address.
 *
 * UPDATE_TYPE_MED is used for slower updates which require significant hw
 * re-programming however do not affect bandwidth consumption or clock
 * requirements. At present, this is the level at which front end updates
 * that do not require us to run bw_calcs happen. These are in/out transfer func
 * updates, viewport offset changes, recout size changes and pixel
depth changes.
 * This update can be done at ISR, but we want to minimize how often
this happens.
 *
 * UPDATE_TYPE_FULL is slow. Really slow. This requires us to recalculate our
 * bandwidth and clocks, possibly rearrange some pipes and reprogram
anything front
 * end related. Any time viewport dimensions, recout dimensions,
scaling ratios or
 * gamma need to be adjusted or pipe needs to be turned on (or
disconnected) we do
 * a full update. This cannot be done at ISR level and should be a rare event.
 * Unless someone is stress testing mpo enter/exit, playing with
colour or adjusting
 * underscan we don't expect to see this call at all.
 */
enum surface_update_type  
UPDATE_TYPE_FAST, /* super fast, safe to execute in isr */
UPDATE_TYPE_MED,  /* ISR safe, most of programming needed, no bw/clk change*/
UPDATE_TYPE_FULL, /* may need to shuffle resources */
 ;

Using tools: Observe the current state, validate your findings, continue improvements. 14) Use AMD tools to check hardware state and driver programming: help on understanding your driver settings and checking the behavior when changing those settings.
  • DC Visual confirmation: Check multiple planes and pipe split policy.
  • DTN logs: Check display hardware state, including rotation, size, format, underflow, blocks in use, color block values, etc.
  • UMR: Check ASIC info, register values, KMS state - links and elements (framebuffers, planes, CRTCs, connectors). Source: UMR project documentation
15) Use generic DRM/KMS tools:
  • IGT test tools: Use generic KMS tests or develop your own to isolate the issue in the kernel space. Compare results across different GPU vendors to understand their implementations and find potential solutions. Here AMD also has specific IGT tests for its GPUs that is expect to work without failures on any AMD GPU. You can check results of HW-specific tests using different display hardware families or you can compare expected differences between the generic workflow and AMD workflow.
  • drm_info: This tool summarizes the current state of a display driver (capabilities, properties and formats) per element of the DRM/KMS workflow. Output can be helpful when reporting bugs.

Don t give up! Debugging issues in the AMD display driver can be challenging, but by following these tips and leveraging available resources, you can significantly improve your chances of success. Worth mentioning: This blog post builds upon my talk, I m not an AMD expert, but presented at the 2022 XDC. It shares guidelines that helped me debug AMD display issues as an external developer of the driver. Open Source Display Driver: The Linux kernel/AMD display driver is open source, allowing you to actively contribute by addressing issues listed in the official tracker. Tackling existing issues or resolving your own can be a rewarding learning experience and a valuable contribution to the community. Additionally, the tracker serves as a valuable resource for finding similar bugs, troubleshooting tips, and suggestions from AMD developers. Finally, it s a platform for seeking help when needed. Remember, contributing to the open source community through issue resolution and collaboration is mutually beneficial for everyone involved.

21 November 2023

Mike Hommey: How I (kind of) killed Mercurial at Mozilla

Did you hear the news? Firefox development is moving from Mercurial to Git. While the decision is far from being mine, and I was barely involved in the small incremental changes that ultimately led to this decision, I feel I have to take at least some responsibility. And if you are one of those who would rather use Mercurial than Git, you may direct all your ire at me. But let's take a step back and review the past 25 years leading to this decision. You'll forgive me for skipping some details and any possible inaccuracies. This is already a long post, while I could have been more thorough, even I think that would have been too much. This is also not an official Mozilla position, only my personal perception and recollection as someone who was involved at times, but mostly an observer from a distance. From CVS to DVCS From its release in 1998, the Mozilla source code was kept in a CVS repository. If you're too young to know what CVS is, let's just say it's an old school version control system, with its set of problems. Back then, it was mostly ubiquitous in the Open Source world, as far as I remember. In the early 2000s, the Subversion version control system gained some traction, solving some of the problems that came with CVS. Incidentally, Subversion was created by Jim Blandy, who now works at Mozilla on completely unrelated matters. In the same period, the Linux kernel development moved from CVS to Bitkeeper, which was more suitable to the distributed nature of the Linux community. BitKeeper had its own problem, though: it was the opposite of Open Source, but for most pragmatic people, it wasn't a real concern because free access was provided. Until it became a problem: someone at OSDL developed an alternative client to BitKeeper, and licenses of BitKeeper were rescinded for OSDL members, including Linus Torvalds (they were even prohibited from purchasing one). Following this fiasco, in April 2005, two weeks from each other, both Git and Mercurial were born. The former was created by Linus Torvalds himself, while the latter was developed by Olivia Mackall, who was a Linux kernel developer back then. And because they both came out of the same community for the same needs, and the same shared experience with BitKeeper, they both were similar distributed version control systems. Interestingly enough, several other DVCSes existed: In this landscape, the major difference Git was making at the time was that it was blazing fast. Almost incredibly so, at least on Linux systems. That was less true on other platforms (especially Windows). It was a game-changer for handling large codebases in a smooth manner. Anyways, two years later, in 2007, Mozilla decided to move its source code not to Bzr, not to Git, not to Subversion (which, yes, was a contender), but to Mercurial. The decision "process" was laid down in two rather colorful blog posts. My memory is a bit fuzzy, but I don't recall that it was a particularly controversial choice. All of those DVCSes were still young, and there was no definite "winner" yet (GitHub hadn't even been founded). It made the most sense for Mozilla back then, mainly because the Git experience on Windows still wasn't there, and that mattered a lot for Mozilla, with its diverse platform support. As a contributor, I didn't think much of it, although to be fair, at the time, I was mostly consuming the source tarballs. Personal preferences Digging through my archives, I've unearthed a forgotten chapter: I did end up setting up both a Mercurial and a Git mirror of the Firefox source repository on alioth.debian.org. Alioth.debian.org was a FusionForge-based collaboration system for Debian developers, similar to SourceForge. It was the ancestor of salsa.debian.org. I used those mirrors for the Debian packaging of Firefox (cough cough Iceweasel). The Git mirror was created with hg-fast-export, and the Mercurial mirror was only a necessary step in the process. By that time, I had converted my Subversion repositories to Git, and switched off SVK. Incidentally, I started contributing to Git around that time as well. I apparently did this not too long after Mozilla switched to Mercurial. As a Linux user, I think I just wanted the speed that Mercurial was not providing. Not that Mercurial was that slow, but the difference between a couple seconds and a couple hundred milliseconds was a significant enough difference in user experience for me to prefer Git (and Firefox was not the only thing I was using version control for) Other people had also similarly created their own mirror, or with other tools. But none of them were "compatible": their commit hashes were different. Hg-git, used by the latter, was putting extra information in commit messages that would make the conversion differ, and hg-fast-export would just not be consistent with itself! My mirror is long gone, and those have not been updated in more than a decade. I did end up using Mercurial, when I got commit access to the Firefox source repository in April 2010. I still kept using Git for my Debian activities, but I now was also using Mercurial to push to the Mozilla servers. I joined Mozilla as a contractor a few months after that, and kept using Mercurial for a while, but as a, by then, long time Git user, it never really clicked for me. It turns out, the sentiment was shared by several at Mozilla. Git incursion In the early 2010s, GitHub was becoming ubiquitous, and the Git mindshare was getting large. Multiple projects at Mozilla were already entirely hosted on GitHub. As for the Firefox source code base, Mozilla back then was kind of a Wild West, and engineers being engineers, multiple people had been using Git, with their own inconvenient workflows involving a local Mercurial clone. The most popular set of scripts was moz-git-tools, to incorporate changes in a local Git repository into the local Mercurial copy, to then send to Mozilla servers. In terms of the number of people doing that, though, I don't think it was a lot of people, probably a few handfuls. On my end, I was still keeping up with Mercurial. I think at that time several engineers had their own unofficial Git mirrors on GitHub, and later on Ehsan Akhgari provided another mirror, with a twist: it also contained the full CVS history, which the canonical Mercurial repository didn't have. This was particularly interesting for engineers who needed to do some code archeology and couldn't get past the 2007 cutoff of the Mercurial repository. I think that mirror ultimately became the official-looking, but really unofficial, mozilla-central repository on GitHub. On a side note, a Mercurial repository containing the CVS history was also later set up, but that didn't lead to something officially supported on the Mercurial side. Some time around 2011~2012, I started to more seriously consider using Git for work myself, but wasn't satisfied with the workflows others had set up for themselves. I really didn't like the idea of wasting extra disk space keeping a Mercurial clone around while using a Git mirror. I wrote a Python script that would use Mercurial as a library to access a remote repository and produce a git-fast-import stream. That would allow the creation of a git repository without a local Mercurial clone. It worked quite well, but it was not able to incrementally update. Other, more complete tools existed already, some of which I mentioned above. But as time was passing and the size and depth of the Mercurial repository was growing, these tools were showing their limits and were too slow for my taste, especially for the initial clone. Boot to Git In the same time frame, Mozilla ventured in the Mobile OS sphere with Boot to Gecko, later known as Firefox OS. What does that have to do with version control? The needs of third party collaborators in the mobile space led to the creation of what is now the gecko-dev repository on GitHub. As I remember it, it was challenging to create, but once it was there, Git users could just clone it and have a working, up-to-date local copy of the Firefox source code and its history... which they could already have, but this was the first officially supported way of doing so. Coincidentally, Ehsan's unofficial mirror was having trouble (to the point of GitHub closing the repository) and was ultimately shut down in December 2013. You'll often find comments on the interwebs about how GitHub has become unreliable since the Microsoft acquisition. I can't really comment on that, but if you think GitHub is unreliable now, rest assured that it was worse in its beginning. And its sustainability as a platform also wasn't a given, being a rather new player. So on top of having this official mirror on GitHub, Mozilla also ventured in setting up its own Git server for greater control and reliability. But the canonical repository was still the Mercurial one, and while Git users now had a supported mirror to pull from, they still had to somehow interact with Mercurial repositories, most notably for the Try server. Git slowly creeping in Firefox build tooling Still in the same time frame, tooling around building Firefox was improving drastically. For obvious reasons, when version control integration was needed in the tooling, Mercurial support was always a no-brainer. The first explicit acknowledgement of a Git repository for the Firefox source code, other than the addition of the .gitignore file, was bug 774109. It added a script to install the prerequisites to build Firefox on macOS (still called OSX back then), and that would print a message inviting people to obtain a copy of the source code with either Mercurial or Git. That was a precursor to current bootstrap.py, from September 2012. Following that, as far as I can tell, the first real incursion of Git in the Firefox source tree tooling happened in bug 965120. A few days earlier, bug 952379 had added a mach clang-format command that would apply clang-format-diff to the output from hg diff. Obviously, running hg diff on a Git working tree didn't work, and bug 965120 was filed, and support for Git was added there. That was in January 2014. A year later, when the initial implementation of mach artifact was added (which ultimately led to artifact builds), Git users were an immediate thought. But while they were considered, it was not to support them, but to avoid actively breaking their workflows. Git support for mach artifact was eventually added 14 months later, in March 2016. From gecko-dev to git-cinnabar Let's step back a little here, back to the end of 2014. My user experience with Mercurial had reached a level of dissatisfaction that was enough for me to decide to take that script from a couple years prior and make it work for incremental updates. That meant finding a way to store enough information locally to be able to reconstruct whatever the incremental updates would be relying on (guess why other tools hid a local Mercurial clone under hood). I got something working rather quickly, and after talking to a few people about this side project at the Mozilla Portland All Hands and seeing their excitement, I published a git-remote-hg initial prototype on the last day of the All Hands. Within weeks, the prototype gained the ability to directly push to Mercurial repositories, and a couple months later, was renamed to git-cinnabar. At that point, as a Git user, instead of cloning the gecko-dev repository from GitHub and switching to a local Mercurial repository whenever you needed to push to a Mercurial repository (i.e. the aforementioned Try server, or, at the time, for reviews), you could just clone and push directly from/to Mercurial, all within Git. And it was fast too. You could get a full clone of mozilla-central in less than half an hour, when at the time, other similar tools would take more than 10 hours (needless to say, it's even worse now). Another couple months later (we're now at the end of April 2015), git-cinnabar became able to start off a local clone of the gecko-dev repository, rather than clone from scratch, which could be time consuming. But because git-cinnabar and the tool that was updating gecko-dev weren't producing the same commits, this setup was cumbersome and not really recommended. For instance, if you pushed something to mozilla-central with git-cinnabar from a gecko-dev clone, it would come back with a different commit hash in gecko-dev, and you'd have to deal with the divergence. Eventually, in April 2020, the scripts updating gecko-dev were switched to git-cinnabar, making the use of gecko-dev alongside git-cinnabar a more viable option. Ironically(?), the switch occurred to ease collaboration with KaiOS (you know, the mobile OS born from the ashes of Firefox OS). Well, okay, in all honesty, when the need of syncing in both directions between Git and Mercurial (we only had ever synced from Mercurial to Git) came up, I nudged Mozilla in the direction of git-cinnabar, which, in my (biased but still honest) opinion, was the more reliable option for two-way synchronization (we did have regular conversion problems with hg-git, nothing of the sort has happened since the switch). One Firefox repository to rule them all For reasons I don't know, Mozilla decided to use separate Mercurial repositories as "branches". With the switch to the rapid release process in 2011, that meant one repository for nightly (mozilla-central), one for aurora, one for beta, and one for release. And with the addition of Extended Support Releases in 2012, we now add a new ESR repository every year. Boot to Gecko also had its own branches, and so did Fennec (Firefox for Mobile, before Android). There are a lot of them. And then there are also integration branches, where developer's work lands before being merged in mozilla-central (or backed out if it breaks things), always leaving mozilla-central in a (hopefully) good state. Only one of them remains in use today, though. I can only suppose that the way Mercurial branches work was not deemed practical. It is worth noting, though, that Mercurial branches are used in some cases, to branch off a dot-release when the next major release process has already started, so it's not a matter of not knowing the feature exists or some such. In 2016, Gregory Szorc set up a new repository that would contain them all (or at least most of them), which eventually became what is now the mozilla-unified repository. This would e.g. simplify switching between branches when necessary. 7 years later, for some reason, the other "branches" still exist, but most developers are expected to be using mozilla-unified. Mozilla's CI also switched to using mozilla-unified as base repository. Honestly, I'm not sure why the separate repositories are still the main entry point for pushes, rather than going directly to mozilla-unified, but it probably comes down to switching being work, and not being a top priority. Also, it probably doesn't help that working with multiple heads in Mercurial, even (especially?) with bookmarks, can be a source of confusion. To give an example, if you aren't careful, and do a plain clone of the mozilla-unified repository, you may not end up on the latest mozilla-central changeset, but rather, e.g. one from beta, or some other branch, depending which one was last updated. Hosting is simple, right? Put your repository on a server, install hgweb or gitweb, and that's it? Maybe that works for... Mercurial itself, but that repository "only" has slightly over 50k changesets and less than 4k files. Mozilla-central has more than an order of magnitude more changesets (close to 700k) and two orders of magnitude more files (more than 700k if you count the deleted or moved files, 350k if you count the currently existing ones). And remember, there are a lot of "duplicates" of this repository. And I didn't even mention user repositories and project branches. Sure, it's a self-inflicted pain, and you'd think it could probably(?) be mitigated with shared repositories. But consider the simple case of two repositories: mozilla-central and autoland. You make autoland use mozilla-central as a shared repository. Now, you push something new to autoland, it's stored in the autoland datastore. Eventually, you merge to mozilla-central. Congratulations, it's now in both datastores, and you'd need to clean-up autoland if you wanted to avoid the duplication. Now, you'd think mozilla-unified would solve these issues, and it would... to some extent. Because that wouldn't cover user repositories and project branches briefly mentioned above, which in GitHub parlance would be considered as Forks. So you'd want a mega global datastore shared by all repositories, and repositories would need to only expose what they really contain. Does Mercurial support that? I don't think so (okay, I'll give you that: even if it doesn't, it could, but that's extra work). And since we're talking about a transition to Git, does Git support that? You may have read about how you can link to a commit from a fork and make-pretend that it comes from the main repository on GitHub? At least, it shows a warning, now. That's essentially the architectural reason why. So the actual answer is that Git doesn't support it out of the box, but GitHub has some backend magic to handle it somehow (and hopefully, other things like Gitea, Girocco, Gitlab, etc. have something similar). Now, to come back to the size of the repository. A repository is not a static file. It's a server with which you negotiate what you have against what it has that you want. Then the server bundles what you asked for based on what you said you have. Or in the opposite direction, you negotiate what you have that it doesn't, you send it, and the server incorporates what you sent it. Fortunately the latter is less frequent and requires authentication. But the former is more frequent and CPU intensive. Especially when pulling a large number of changesets, which, incidentally, cloning is. "But there is a solution for clones" you might say, which is true. That's clonebundles, which offload the CPU intensive part of cloning to a single job scheduled regularly. Guess who implemented it? Mozilla. But that only covers the cloning part. We actually had laid the ground to support offloading large incremental updates and split clones, but that never materialized. Even with all that, that still leaves you with a server that can display file contents, diffs, blames, provide zip archives of a revision, and more, all of which are CPU intensive in their own way. And these endpoints are regularly abused, and cause extra load to your servers, yes plural, because of course a single server won't handle the load for the number of users of your big repositories. And because your endpoints are abused, you have to close some of them. And I'm not mentioning the Try repository with its tens of thousands of heads, which brings its own sets of problems (and it would have even more heads if we didn't fake-merge them once in a while). Of course, all the above applies to Git (and it only gained support for something akin to clonebundles last year). So, when the Firefox OS project was stopped, there wasn't much motivation to continue supporting our own Git server, Mercurial still being the official point of entry, and git.mozilla.org was shut down in 2016. The growing difficulty of maintaining the status quo Slowly, but steadily in more recent years, as new tooling was added that needed some input from the source code manager, support for Git was more and more consistently added. But at the same time, as people left for other endeavors and weren't necessarily replaced, or more recently with layoffs, resources allocated to such tooling have been spread thin. Meanwhile, the repository growth didn't take a break, and the Try repository was becoming an increasing pain, with push times quite often exceeding 10 minutes. The ongoing work to move Try pushes to Lando will hide the problem under the rug, but the underlying problem will still exist (although the last version of Mercurial seems to have improved things). On the flip side, more and more people have been relying on Git for Firefox development, to my own surprise, as I didn't really push for that to happen. It just happened organically, by ways of git-cinnabar existing, providing a compelling experience to those who prefer Git, and, I guess, word of mouth. I was genuinely surprised when I recently heard the use of Git among moz-phab users had surpassed a third. I did, however, occasionally orient people who struggled with Mercurial and said they were more familiar with Git, towards git-cinnabar. I suspect there's a somewhat large number of people who never realized Git was a viable option. But that, on its own, can come with its own challenges: if you use git-cinnabar without being backed by gecko-dev, you'll have a hard time sharing your branches on GitHub, because you can't push to a fork of gecko-dev without pushing your entire local repository, as they have different commit histories. And switching to gecko-dev when you weren't already using it requires some extra work to rebase all your local branches from the old commit history to the new one. Clone times with git-cinnabar have also started to go a little out of hand in the past few years, but this was mitigated in a similar manner as with the Mercurial cloning problem: with static files that are refreshed regularly. Ironically, that made cloning with git-cinnabar faster than cloning with Mercurial. But generating those static files is increasingly time-consuming. As of writing, generating those for mozilla-unified takes close to 7 hours. I was predicting clone times over 10 hours "in 5 years" in a post from 4 years ago, I wasn't too far off. With exponential growth, it could still happen, although to be fair, CPUs have improved since. I will explore the performance aspect in a subsequent blog post, alongside the upcoming release of git-cinnabar 0.7.0-b1. I don't even want to check how long it now takes with hg-git or git-remote-hg (they were already taking more than a day when git-cinnabar was taking a couple hours). I suppose it's about time that I clarify that git-cinnabar has always been a side-project. It hasn't been part of my duties at Mozilla, and the extent to which Mozilla supports git-cinnabar is in the form of taskcluster workers on the community instance for both git-cinnabar CI and generating those clone bundles. Consequently, that makes the above git-cinnabar specific issues a Me problem, rather than a Mozilla problem. Taking the leap I can't talk for the people who made the proposal to move to Git, nor for the people who put a green light on it. But I can at least give my perspective. Developers have regularly asked why Mozilla was still using Mercurial, but I think it was the first time that a formal proposal was laid out. And it came from the Engineering Workflow team, responsible for issue tracking, code reviews, source control, build and more. It's easy to say "Mozilla should have chosen Git in the first place", but back in 2007, GitHub wasn't there, Bitbucket wasn't there, and all the available options were rather new (especially compared to the then 21 years-old CVS). I think Mozilla made the right choice, all things considered. Had they waited a couple years, the story might have been different. You might say that Mozilla stayed with Mercurial for so long because of the sunk cost fallacy. I don't think that's true either. But after the biggest Mercurial repository hosting service turned off Mercurial support, and the main contributor to Mercurial going their own way, it's hard to ignore that the landscape has evolved. And the problems that we regularly encounter with the Mercurial servers are not going to get any better as the repository continues to grow. As far as I know, all the Mercurial repositories bigger than Mozilla's are... not using Mercurial. Google has its own closed-source server, and Facebook has another of its own, and it's not really public either. With resources spread thin, I don't expect Mozilla to be able to continue supporting a Mercurial server indefinitely (although I guess Octobus could be contracted to give a hand, but is that sustainable?). Mozilla, being a champion of Open Source, also doesn't live in a silo. At some point, you have to meet your contributors where they are. And the Open Source world is now majoritarily using Git. I'm sure the vast majority of new hires at Mozilla in the past, say, 5 years, know Git and have had to learn Mercurial (although they arguably didn't need to). Even within Mozilla, with thousands(!) of repositories on GitHub, Firefox is now actually the exception rather than the norm. I should even actually say Desktop Firefox, because even Mobile Firefox lives on GitHub (although Fenix is moving back in together with Desktop Firefox, and the timing is such that that will probably happen before Firefox moves to Git). Heck, even Microsoft moved to Git! With a significant developer base already using Git thanks to git-cinnabar, and all the constraints and problems I mentioned previously, it actually seems natural that a transition (finally) happens. However, had git-cinnabar or something similarly viable not existed, I don't think Mozilla would be in a position to take this decision. On one hand, it probably wouldn't be in the current situation of having to support both Git and Mercurial in the tooling around Firefox, nor the resource constraints related to that. But on the other hand, it would be farther from supporting Git and being able to make the switch in order to address all the other problems. But... GitHub? I hope I made a compelling case that hosting is not as simple as it can seem, at the scale of the Firefox repository. It's also not Mozilla's main focus. Mozilla has enough on its plate with the migration of existing infrastructure that does rely on Mercurial to understandably not want to figure out the hosting part, especially with limited resources, and with the mixed experience hosting both Mercurial and git has been so far. After all, GitHub couldn't even display things like the contributors' graph on gecko-dev until recently, and hosting is literally their job! They still drop the ball on large blames (thankfully we have searchfox for those). Where does that leave us? Gitlab? For those criticizing GitHub for being proprietary, that's probably not open enough. Cloud Source Repositories? "But GitHub is Microsoft" is a complaint I've read a lot after the announcement. Do you think Google hosting would have appealed to these people? Bitbucket? I'm kind of surprised it wasn't in the list of providers that were considered, but I'm also kind of glad it wasn't (and I'll leave it at that). I think the only relatively big hosting provider that could have made the people criticizing the choice of GitHub happy is Codeberg, but I hadn't even heard of it before it was mentioned in response to Mozilla's announcement. But really, with literal thousands of Mozilla repositories already on GitHub, with literal tens of millions repositories on the platform overall, the pragmatic in me can't deny that it's an attractive option (and I can't stress enough that I wasn't remotely close to the room where the discussion about what choice to make happened). "But it's a slippery slope". I can see that being a real concern. LLVM also moved its repository to GitHub (from a (I think) self-hosted Subversion server), and ended up moving off Bugzilla and Phabricator to GitHub issues and PRs four years later. As an occasional contributor to LLVM, I hate this move. I hate the GitHub review UI with a passion. At least, right now, GitHub PRs are not a viable option for Mozilla, for their lack of support for security related PRs, and the more general shortcomings in the review UI. That doesn't mean things won't change in the future, but let's not get too far ahead of ourselves. The move to Git has just been announced, and the migration has not even begun yet. Just because Mozilla is moving the Firefox repository to GitHub doesn't mean it's locked in forever or that all the eggs are going to be thrown into one basket. If bridges need to be crossed in the future, we'll see then. So, what's next? The official announcement said we're not expecting the migration to really begin until six months from now. I'll swim against the current here, and say this: the earlier you can switch to git, the earlier you'll find out what works and what doesn't work for you, whether you already know Git or not. While there is not one unique workflow, here's what I would recommend anyone who wants to take the leap off Mercurial right now: As there is no one-size-fits-all workflow, I won't tell you how to organize yourself from there. I'll just say this: if you know the Mercurial sha1s of your previous local work, you can create branches for them with:
$ git branch <branch_name> $(git cinnabar hg2git <hg_sha1>)
At this point, you should have everything available on the Git side, and you can remove the .hg directory. Or move it into some empty directory somewhere else, just in case. But don't leave it here, it will only confuse the tooling. Artifact builds WILL be confused, though, and you'll have to ./mach configure before being able to do anything. You may also hit bug 1865299 if your working tree is older than this post. If you have any problem or question, you can ping me on #git-cinnabar or #git on Matrix. I'll put the instructions above somewhere on wiki.mozilla.org, and we can collaboratively iterate on them. Now, what the announcement didn't say is that the Git repository WILL NOT be gecko-dev, doesn't exist yet, and WON'T BE COMPATIBLE (trust me, it'll be for the better). Why did I make you do all the above, you ask? Because that won't be a problem. I'll have you covered, I promise. The upcoming release of git-cinnabar 0.7.0-b1 will have a way to smoothly switch between gecko-dev and the future repository (incidentally, that will also allow to switch from a pure git-cinnabar clone to a gecko-dev one, for the git-cinnabar users who have kept reading this far). What about git-cinnabar? With Mercurial going the way of the dodo at Mozilla, my own need for git-cinnabar will vanish. Legitimately, this begs the question whether it will still be maintained. I can't answer for sure. I don't have a crystal ball. However, the needs of the transition itself will motivate me to finish some long-standing things (like finalizing the support for pushing merges, which is currently behind an experimental flag) or implement some missing features (support for creating Mercurial branches). Git-cinnabar started as a Python script, it grew a sidekick implemented in C, which then incorporated some Rust, which then cannibalized the Python script and took its place. It is now close to 90% Rust, and 10% C (if you don't count the code from Git that is statically linked to it), and has sort of become my Rust playground (it's also, I must admit, a mess, because of its history, but it's getting better). So the day to day use with Mercurial is not my sole motivation to keep developing it. If it were, it would stay stagnant, because all the features I need are there, and the speed is not all that bad, although I know it could be better. Arguably, though, git-cinnabar has been relatively stagnant feature-wise, because all the features I need are there. So, no, I don't expect git-cinnabar to die along Mercurial use at Mozilla, but I can't really promise anything either. Final words That was a long post. But there was a lot of ground to cover. And I still skipped over a bunch of things. I hope I didn't bore you to death. If I did and you're still reading... what's wrong with you? ;) So this is the end of Mercurial at Mozilla. So long, and thanks for all the fish. But this is also the beginning of a transition that is not easy, and that will not be without hiccups, I'm sure. So fasten your seatbelts (plural), and welcome the change. To circle back to the clickbait title, did I really kill Mercurial at Mozilla? Of course not. But it's like I stumbled upon a few sparks and tossed a can of gasoline on them. I didn't start the fire, but I sure made it into a proper bonfire... and now it has turned into a wildfire. And who knows? 15 years from now, someone else might be looking back at how Mozilla picked Git at the wrong time, and that, had we waited a little longer, we would have picked some yet to come new horse. But hey, that's the tech cycle for you.

6 October 2023

Emanuele Rocca: Custom Debian Installer and Kernel on a USB stick

There are many valid reasons to create a custom Debian Installer image. You may need to pass some special arguments to the kernel, use a different GRUB version, automate the installation by means of preseeding, use a custom kernel, or modify the installer itself.
If you have a EFI system, which is probably the case in 2023, there is no need to learn complex procedures in order to create a custom Debian Installer stick.
The source of many frustrations is that the ISO format for CDs/DVDs is read-only, but you can just create a VFAT filesystem on a USB stick, copy all ISO contents onto the stick itself, and modify things at will.

Create a writable USB stick
First create a FAT32 filesystem on the removable device and mount it. The device is sdX in the example.
$ sudo parted --script /dev/sdX mklabel msdos
$ sudo parted --script /dev/sdX mkpart primary fat32 0% 100%
$ sudo mkfs.vfat /dev/sdX1
$ sudo mount /dev/sdX1 /mnt/data/
Then copy to the USB stick the installer ISO you would like to modify, debian-testing-amd64-netinst.iso here.
$ sudo kpartx -v -a debian-testing-amd64-netinst.iso
# Mount the first partition on the ISO and copy its contents to the stick
$ sudo mount /dev/mapper/loop0p1 /mnt/cdrom/
$ sudo rsync -av /mnt/cdrom/ /mnt/data/
$ sudo umount /mnt/cdrom
# Same story with the second partition on the ISO
$ sudo mount /dev/mapper/loop0p2 /mnt/cdrom/
$ sudo rsync -av /mnt/cdrom/ /mnt/data/
$ sudo umount /mnt/cdrom
$ sudo kpartx -d debian-testing-amd64-netinst.iso
$ sudo umount /mnt/data
Now try booting from the USB stick just to verify that everything went well and we can start customizing the image.

Boot loader, preseeding, installer hacks
The easiest things we can change now are the shim, GRUB, and GRUB s configuration. The USB stick contains the shim under /EFI/boot/bootx64.efi, while GRUB is at /EFI/boot/grubx64.efi. This means that if you want to test a different shim / GRUB version, you just replace the relevant files. That s it. Take for example /usr/lib/grub/x86_64-efi/monolithic/grubx64.efi from the package grub-efi-amd64-bin, or the signed version from grub-efi-amd64-signed and copy them under /EFI/boot/grubx64.efi. Or perhaps you want to try out systemd-boot? Then take /usr/lib/systemd/boot/efi/systemd-bootx64.efi from the package systemd-boot-efi, copy it to /EFI/boot/bootx64.efi and you re good to go. Figuring out the right systemd-boot configuration needed to start the Installer is left as an exercise.
By editing /boot/grub/grub.cfg you can pass arbitrary arguments to the kernel and the Installer itself. See the official Installation Guide for a comprehensive list of boot parameters.
One very commong thing to do is automating the installation using a preseed file. Add the following to the kernel command line: preseed/file=/cdrom/preseed.cfg and create a /preseed.cfg file on the USB stick. As a little example:
d-i time/zone select Europe/Rome
d-i passwd/root-password this-is-the-root-password
d-i passwd/root-password-again this-is-the-root-password
d-i passwd/user-fullname string Emanuele Rocca
d-i passwd/username string ema
d-i passwd/user-password password lol-haha-uh
d-i passwd/user-password-again password lol-haha-uh
d-i apt-setup/no_mirror boolean true
d-i popularity-contest/participate boolean true
tasksel tasksel/first multiselect standard
See Steve McIntyre s awesome page with the full list of available settings and their description: https://preseed.einval.com/debian-preseed/.
Two noteworthy settings are early_command and late_command. They can be used to execute arbitrary commands and provide thus extreme flexibility! You can go as far as replacing parts of the installer with a sed command, or maybe wgetting an entirely different file. This is a fairly easy way to test minor Installer patches. As an example, I ve once used this to test a patch to grub-installer:
d-i partman/early_command string wget https://people.debian.org/~ema/grub-installer-1035085-1 -O /usr/bin/grub-installer
Finally, the initrd contains all early stages of the installer. It s easy to unpack it, modify whatever component you like, and repack it. Say you want to change a given udev rule:
$ mkdir /tmp/new-initrd
$ cd /tmp/new-initrd
$ zstdcat /mnt/data/install.a64/initrd.gz   sudo cpio -id
$ vi lib/udev/rules.d/60-block.rules
$ find .   cpio -o -H newc   zstd --stdout > /mnt/data/install.a64/initrd.gz

Custom udebs
From a basic architectural standpoint the Debian Installer can be seen as an initrd that loads a series of special Debian packages called udebs. In the previous section we have seen how to (ab)use early_command to replace one of the scripts used by the Installer, namely grub-installer. It turns out that such script is installed by a udeb, so let s do things right and build a new Installer ISO with our custom grub udeb.
Fetch the code for the grub-installer udeb, make your changes and build it with a classic dpkg-buildpackage -rfakeroot.
Then get the Installer code and install all dependencies:
$ git clone https://salsa.debian.org/installer-team/debian-installer/
$ cd debian-installer/
$ sudo apt build-dep .
Now add the grub-installer udeb to the localudebs directory and create a new netboot image:
$ cp /path/to/grub-installer_1.198_arm64.udeb build/localudebs/
$ cd build
$ fakeroot make clean_netboot build_netboot
Give it some time, soon enough you ll have a brand new ISO to test under dest/netboot/mini.iso.

Custom kernel
Perhaps there s a kernel configuration option you need to enable, or maybe you need a more recent kernel version than what is available in sid.
The Debian Linux Kernel Handbook has all the details for how to do things properly, but here s a quick example.
Get the Debian kernel packaging from salsa and generate the upstream tarball:
$ git clone https://salsa.debian.org/kernel-team/linux/
$ ./debian/bin/genorig.py https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
For RC kernels use the repo from Linus instead of linux-stable.
Now do your thing, for instance change a config setting by editing debian/config/amd64/config. Don t worry about where you put it in the file, there s a tool from https://salsa.debian.org/kernel-team/kernel-team to fix that:
$ /path/to/kernel-team/utils/kconfigeditor2/process.py .
Now build your kernel:
$ export MAKEFLAGS=-j$(nproc)
$ export DEB_BUILD_PROFILES='pkg.linux.nokerneldbg pkg.linux.nokerneldbginfo pkg.linux.notools nodoc'
$ debian/rules orig
$ debian/rules debian/control
$ dpkg-buildpackage -b -nc -uc
After some time, if everything went well, you should get a bunch of .deb files as well as a .changes file, linux_6.6~rc3-1~exp1_arm64.changes here. To generate the udebs used by the Installer you need to first get a linux-signed .dsc file, and then build it with sbuild in this example:
$ /path/to/kernel-team/scripts/debian-test-sign linux_6.6~rc3-1~exp1_arm64.changes
$ sbuild --dist=unstable --extra-package=$PWD linux-signed-arm64_6.6~rc3+1~exp1.dsc
Excellent, now you should have a ton of .udebs. To build a custom installer image with this kernel, copy them all under debian-installer/build/localudebs/ and then run fakeroot make clean_netboot build_netboot as described in the previous section. In case you are trying to use a different kernel version from what is currently in sid, you will have to install the linux-image package on the system building the ISO, and change LINUX_KERNEL_ABI in build/config/common. The linux-image dependency in debian/control probably needs to be tweaked as well.
That s it, the new Installer ISO should boot with your custom kernel!
There is going to be another minor obstacle though, as anna will complain that your new kernel cannot be found in the archive. Copy the kernel udebs you have built onto a vfat formatted USB stick, switch to a terminal, and install them all with udpkg:
~ # udpkg -i *.udeb
Now the installation should proceed smoothly.

20 June 2023

Vasudev Kamath: Notes: Experimenting with ZRAM and Memory Over commit

Introduction The ZRAM module in the Linux kernel creates a memory-backed block device that stores its content in a compressed format. It offers users the choice of compression algorithms such as lz4, zstd, or lzo. These algorithms differ in compression ratio and speed, with zstd providing the best compression but being slower, while lz4 offers higher speed but lower compression.
Using ZRAM as Swap One interesting use case for ZRAM is utilizing it as swap space in the system. There are two utilities available for configuring ZRAM as swap: zram-tools and systemd-zram-generator. However, Debian Bullseye lacks systemd-zram-generator, making zram-tools the only option for Bullseye users. While it's possible to use systemd-zram-generator by self-compiling or via cargo, I preferred using tools available in the distribution repository due to my restricted environment.
Installation The installation process is straightforward. Simply execute the following command:
apt-get install zram-tools
Configuration The configuration involves modifying a simple shell script file /etc/default/zramswap sourced by the /usr/bin/zramswap script. Here's an example of the configuration I used:
# Compression algorithm selection
# Speed: lz4 > zstd > lzo
# Compression: zstd > lzo > lz4
# This is not inclusive of all the algorithms available in the latest kernels
# See /sys/block/zram0/comp_algorithm (when the zram module is loaded) to check
# the currently set and available algorithms for your kernel [1]
# [1]  https://github.com/torvalds/linux/blob/master/Documentation/blockdev/zram.txt#L86
ALGO=zstd
# Specifies the amount of RAM that should be used for zram
# based on a percentage of the total available memory
# This takes precedence and overrides SIZE below
PERCENT=30
# Specifies a static amount of RAM that should be used for
# the ZRAM devices, measured in MiB
# SIZE=256000
# Specifies the priority for the swap devices, see swapon(2)
# for more details. A higher number indicates higher priority
# This should probably be higher than hdd/ssd swaps.
# PRIORITY=100
I chose zstd as the compression algorithm for its superior compression capabilities. Additionally, I reserved 30% of memory as the size of the zram device. After modifying the configuration, restart the zramswap.service to activate the swap:
systemctl restart zramswap.service
Using systemd-zram-generator For Debian Bookworm users, an alternative option is systemd-zram-generator. Although zram-tools is still available in Debian Bookworm, systemd-zram-generator offers a more integrated solution within the systemd ecosystem. Below is an example of the translated configuration for systemd-zram-generator, located at /etc/systemd/zram-generator.conf:
# This config file enables a /dev/zram0 swap device with the following
# properties:
# * size: 50% of available RAM or 4GiB, whichever is less
# * compression-algorithm: kernel default
#
# This device's properties can be modified by adding options under the
# [zram0] section below. For example, to set a fixed size of 2GiB, set
#  zram-size = 2GiB .
[zram0]
zram-size = ceil(ram * 30/100)
compression-algorithm = zstd
swap-priority = 100
fs-type = swap
After making the necessary changes, reload systemd and start the systemd-zram-setup@zram0.service:
systemctl daemon-reload
systemctl start systemd-zram-setup@zram0.service
The systemd-zram-generator creates the zram device by loading the kernel module and then creates a systemd.swap unit to mount the zram device as swap. In this case, the swap file is called zram0.swap.
Checking Compression and Details To verify the effectiveness of the swap configuration, you can use the zramctl command, which is part of the util-linux package. Alternatively, the zramswap utility provided by zram-tools can be used to obtain the same output. During my testing with synthetic memory load created using stress-ng vm class I found that I can reach upto 40% compression ratio.
Memory Overcommit Another use case I was looking for is allowing the launching of applications that require more memory than what is available in the system. By default, the Linux kernel attempts to estimate the amount of free memory left on the system when user space requests more memory (vm.overcommit_memory=0). However, you can change this behavior by modifying the sysctl value for vm.overcommit_memory to 1. To demonstrate this, I ran a test using stress-ng to request more memory than the system had available. As expected, the Linux kernel refused to allocate memory, and the stress-ng process could not proceed.
free -tg                                                                                                                                                                                          (Mon,Jun19) 
                total        used        free      shared  buff/cache   available
 Mem:              31          12          11           3          11          18
 Swap:             10           2           8
 Total:            41          14          19
sudo stress-ng --vm=1 --vm-bytes=50G -t 120                                                                                                                                                       (Mon,Jun19) 
 stress-ng: info:  [1496310] setting to a 120 second (2 mins, 0.00 secs) run per stressor
 stress-ng: info:  [1496310] dispatching hogs: 1 vm
 stress-ng: info:  [1496312] vm: gave up trying to mmap, no available memory, skipping stressor
 stress-ng: warn:  [1496310] vm: [1496311] aborted early, out of system resources
 stress-ng: info:  [1496310] vm:
 stress-ng: warn:  [1496310]         14 System Management Interrupts
 stress-ng: info:  [1496310] passed: 0
 stress-ng: info:  [1496310] failed: 0
 stress-ng: info:  [1496310] skipped: 1: vm (1)
 stress-ng: info:  [1496310] successful run completed in 10.04s
By setting vm.overcommit_memory=1, Linux will allocate memory in a more relaxed manner, assuming an infinite amount of memory is available.
Conclusion ZRAM provides disks that allow for very fast I/O, and compression allows for a significant amount of memory savings. ZRAM is not restricted to just swap usage; it can be used as a normal block device with different file systems. Using ZRAM as swap is beneficial because, unlike disk-based swap, it is faster, and compression ensures that we use a smaller amount of RAM itself as swap space. Additionally, adjusting the memory overcommit settings can be beneficial for scenarios that require launching memory-intensive applications. Note: When running stress tests or allocating excessive memory, be cautious about the actual memory capacity of your system to prevent out-of-memory (OOM) situations. Feel free to explore the capabilities of ZRAM and optimize your system's memory management. Happy computing!

11 June 2023

Michael Prokop: What to expect from Debian/bookworm #newinbookworm

Bookworm Banner, Copyright 2022 Juliette Taka Debian v12 with codename bookworm was released as new stable release on 10th of June 2023. Similar to what we had with #newinbullseye and previous releases, now it s time for #newinbookworm! I was the driving force at several of my customers to be well prepared for bookworm. As usual with major upgrades, there are some things to be aware of, and hereby I m starting my public notes on bookworm that might be worth also for other folks. My focus is primarily on server systems and looking at things from a sysadmin perspective. Further readings As usual start at the official Debian release notes, make sure to especially go through What s new in Debian 12 + Issues to be aware of for bookworm. Package versions As a starting point, let s look at some selected packages and their versions in bullseye vs. bookworm as of 2023-02-10 (mainly having amd64 in mind):
Package bullseye/v11 bookworm/v12
ansible 2.10.7 2.14.3
apache 2.4.56 2.4.57
apt 2.2.4 2.6.1
bash 5.1 5.2.15
ceph 14.2.21 16.2.11
docker 20.10.5 20.10.24
dovecot 2.3.13 2.3.19
dpkg 1.20.12 1.21.22
emacs 27.1 28.2
gcc 10.2.1 12.2.0
git 2.30.2 2.39.2
golang 1.15 1.19
libc 2.31 2.36
linux kernel 5.10 6.1
llvm 11.0 14.0
lxc 4.0.6 5.0.2
mariadb 10.5 10.11
nginx 1.18.0 1.22.1
nodejs 12.22 18.13
openjdk 11.0.18 + 17.0.6 17.0.6
openssh 8.4p1 9.2p1
openssl 1.1.1n 3.0.8-1
perl 5.32.1 5.36.0
php 7.4+76 8.2+93
podman 3.0.1 4.3.1
postfix 3.5.18 3.7.5
postgres 13 15
puppet 5.5.22 7.23.0
python2 2.7.18 (gone!)
python3 3.9.2 3.11.2
qemu/kvm 5.2 7.2
ruby 2.7+2 3.1
rust 1.48.0 1.63.0
samba 4.13.13 4.17.8
systemd 247.3 252.6
unattended-upgrades 2.8 2.9.1
util-linux 2.36.1 2.38.1
vagrant 2.2.14 2.3.4
vim 8.2.2434 9.0.1378
zsh 5.8 5.9
Linux Kernel The bookworm release ships a Linux kernel based on version 6.1, whereas bullseye shipped kernel 5.10. As usual there are plenty of changes in the kernel area, including better hardware support, and this might warrant a separate blog entry, but to highlight some changes: See Kernelnewbies.org for further changes between kernel versions. Configuration management puppet s upstream sadly still doesn t provide packages for bookworm (see PA-4995), though Debian provides puppet-agent and puppetserver packages, and even puppetdb is back again, see release notes for further information. ansible is also available and made it with version 2.14 into bookworm. Prometheus stack Prometheus server was updated from v2.24.1 to v2.42.0 and all the exporters that got shipped with bullseye are still around (in more recent versions of course). Virtualization docker (v20.10.24), ganeti (v3.0.2-3), libvirt (v9.0.0-4), lxc (v5.0.2-1), podman (v4.3.1), openstack (Zed), qemu/kvm (v7.2), xen (v4.17.1) are all still around. Vagrant is available in version 2.3.4, also Vagrant upstream provides their packages for bookworm already. If you re relying on VirtualBox, be aware that upstream doesn t provide packages for bookworm yet (see ticket 21524), but thankfully version 7.0.8-dfsg-2 is available from Debian/unstable (as of 2023-06-10) (VirtualBox isn t shipped with stable releases since quite some time due to lack of cooperation from upstream on security support for older releases, see #794466). rsync rsync was updated from v3.2.3 to v3.2.7, and we got a few new options: OpenSSH OpenSSH was updated from v8.4p1 to v9.2p1, so if you re interested in all the changes, check out the release notes between those version (8.5, 8.6, 8.7, 8.8, 8.9, 9.0, 9.1 + 9.2). Let s highlight some notable new features: One important change you might wanna be aware of is that as of OpenSSH v8.8, RSA signatures using the SHA-1 hash algorithm got disabled by default, but RSA/SHA-256/512 AKA RSA-SHA2 gets used instead. OpenSSH has supported RFC8332 RSA/SHA-256/512 signatures since release 7.2 and existing ssh-rsa keys will automatically use the stronger algorithm where possible. A good overview is also available at SSH: Signature Algorithm ssh-rsa Error. Now tools/libraries not supporting RSA-SHA2 fail to connect to OpenSSH as present in bookworm. For example python3-paramiko v2.7.2-1 as present in bullseye doesn t support RSA-SHA2. It tries to connect using the deprecated RSA-SHA-1, which is no longer offered by default with OpenSSH as present in bookworm, and then fails. Support for RSA/SHA-256/512 signatures in Paramiko was requested e.g. at #1734, and eventually got added to Paramiko and in the end the change made it into Paramiko versions >=2.9.0. Paramiko in bookworm works fine, and a backport by rebuilding the python3-paramiko package from bookworm for bullseye solves the problem (BTDT). Misc unsorted Thanks to everyone involved in the release, happy upgrading to bookworm, and let s continue with working towards Debian/trixie. :)

16 April 2023

Iustin Pop: Quick note: nftables and TCP MSS clamping

Another short note to myself, and whomever cares/searches later for nft or nftables, tcp mss clamping. Somewhat surprising, many/most of the instructions found by Google are still related to iptables. I guess people stopped writing blog posts by the time nftables became widely used? The only official documentation I can find is in the official wiki, but it doesn t list/explain exactly how does this work/in which conditions. I think this results in posts like this one that suggest additionally limiting the packets it acts on using a size limiter, in order to prevent changing small packets. Looking at the code that actually implements this, in net/netfilter/xt_TCPMSS.c (and not in the lower case-named file, which is about matching, TIL), in the function tcpmss_mangle_packet, first there is this comment:
/* Never increase MSS, even when setting it, as
 * doing so results in problems for hosts that rely
 * on MSS being set correctly.
 */
So at least the intent is that this always does the right thing (only decrease). Second, the code does correctly look at both directions of the packet when using auto-clamping (set rt mtu rather than set 1452), in the if branch for XT_TCPMSS_CLAMP_PMTU. This means, it s safer to use auto-clamping, rather than manually set the value. And finally, there is handling of some corner cases as well (syn packet with data, syn packet without the MSS option - unlikely for modern stacks - in which case it defaults to minimal values). All in all, it seems to me that it should always be correct to simply do what the wiki recommends, setting this for all packets traversing the host:
nft add rule ip filter forward tcp flags syn tcp option maxseg size set rt mtu
Of course, if you d rather not do it always, but only for external interfaces, make sure you set it in both directions:
nft add rule ip filter forward iifname ppp0 tcp flags syn tcp option maxseg size set rt mtu
nft add rule ip filter forward oifname ppp0 tcp flags syn tcp option maxseg size set rt mtu
And that should be it. Well, use iifgroup/oifgroup for better rules .

1 January 2023

Jonathan McDowell: Free Software Activities for 2022

There is a move to Bring Back Blogging and having recently sorted out my own FreshRSS install I am completely in favour of such a thing. RSS feeds with complete posts, for preference, not just a teaser intro sentence/paragraph. It s also a reminder to me that I should blog more, and what better way to start 2023 than with my traditional recap of my Free Software activities in 2022. For previous years see 2019, 2020 + 2021

Conferences I attended DebConf22 in Prizen, Kosova this year, and finally hit the end of my luck in avoiding COVID. 0/10, would not recommend. Thankfully I didn t come down with symptoms until I got home (I felt fine and tested negative on arrival home, then started to feel terrible the next day and tested again), so I was able to enjoy the conference itself. I also made it to Linux Security Summit Europe 2022, which aligned with work related bits and was interesting. I suspect I would have been better going to LPC 2022 for the hallway track, though I did manage to get some overlap with folk being in town given that both were the same week.

Debian Most of my contributions to Free software continue to happen within Debian. We continue to operate a roughly 3 month rotation for Debian Keyring in terms of handling the regular updates, and I dealt with 2022.03.24, 2022.06.26, 2022.08.11, 2022.09.24, 2022.09.25 + 2022.12.24. There were a few out of cycle updates this year and I handled a couple of them. My other contributions are largely within the Debian Electronics Packaging Team. gcc-xtensa-lx106 saw a few updates, to GCC 11 + enabling D (10 + 11), then to GCC 12 (12). binutils-xtensa-lx106 got some minor packaging cleanups, which also served to force a rebuild with the current binutils source (5). libsigrokdecode got an upload to enable building with Python 3.10 (0.5.3-3). Related, I updated sdcc to a new upstream version (4.2.0+dfsg-1) - it s used for the sigrok-firmware-fx2lafw package and I do have a tendency to play with microcontrollers, so it s good to have a recent version available in the archive. I continue to pay attention to OpenOCD, with a minor set of updates to pull in some fixes from master (0.11.0-2). I was pleased to see the release process for 0.12.0 kick off and have been uploading RCs as they come out (0.12.0~rc1-1, 0.12.0~rc2-1 + 0.12.0~rc3-1). Upstream have been interested in the upcoming bookworm release cycle and I m hopeful we ll get 0.12.0 proper in before freeze. libjaylink also saw an upstream release (0.3.1-1). Package upload sponsorship isn t normally something I get involved with, because I find I have to spend a lot of time checking over things before I m comfortable doing the upload. However I did sponsor an initial upload for sugarjar and an update for mgba (0.10.0+dsfg-1, currently stuck in NEW). Credit to Michel for dealing swiftly with my review comments, and Ryan for producing a nicely reviewable set of changes. As part of the Data Protection Team I responded to various inbound queries to that team. There was also some discussion on debian-vote as part of the DPL election that I engaged with, as well as discussions at DebConf about how we can do things better. For Debian New Members I m mostly inactive as an application manager - we generally seem to have enough available recently. If that changes I ll look at stepping in to help, but I don t see that happening (it got close this year but several people had stood up before I got around to offering). I continue to be involved in Front Desk, having various conversations throughout the year with the rest of the team and occasionally approving some of the checks for new applicants. Towards the end of the year I got involved with the Debian Games Team, largely because I m keen to try and get my Kodi working with libretro based emulators - I d really like to be able to play old style games from the same interface as I can engage with locally stored movies, music and TV. It turns out there are a lot of moving pieces to make that happen, some missing from Debian and others in need of some TLC. I updated retroarch to current upstream (1.13.0+dfsg-1 + 1.13.0+dfsg-2) but while I was doing so upstream did another release. I plan on uploading 1.14.0 once 1.13.0 has migrated to testing. It turned out I also needed to update libretro-core-info (1.13.0-1) and retroarch-assets (1.7.6+git20221024+dfsg-1). In terms of actual emulators I pulled in new versions for genesisplusgx (1.7.4+git20221128-1) and libretro-bsnes-mercury (094+git20220807-1). On the Kodi side I haven t uploaded anything yet. I ve filed an ITP for rcheevos, which is a dependency for game.libretro and I have a fledgling package for game.libretro that I finally got working today. I m not sure if I can get it cleaned up enough in time to make the bookworm release, but I m hoping that at least the libretro piece is in a bit better shape now (though I m aware there are more emulator cores that could do with being updated).

Linux This year was a quiet year for personal Linux contributions. I submitted a minor fix for the qca8081 PHY with speeds lower than 2.5Gb/s that caused me issues on my RB5009.

Personal projects 2022 finally saw a minor releases of onak, 0.6.2, which resulted in a corresponding Debian upload (0.6.2-1). It has a couple of bug fixes but nothing major.. As I said last year it s not dead, just resting, but Sequoia PGP is probably where you should be looking for a modern OpenPGP implementation. I added some basic Debian packaging to mqtt-arp - I didn t bother uploading it as it s a fairly niche package, but I m using it locally.

C.J. Adams-Collier: State of the racks, 20221231

Hi friends! I haven t written in a while. I ve been caught up in work. But between working, I ve put together some new equipment in a couple of new racks. I bought an audio dampened 15U rack a couple of years ago or so, and into it I ve placed the RAID array and an HP desktop form-factor ML110 server to drive the disks. The disk array controller is a two-port Broadcom / LSI SAS3008 PCI-Express Fusion-MPT SAS-3. I ve been thinking about getting the four-port variant, since I like this one and I ve got another 7 drive bays in the chassis that don t have disks in them. In the next rack over, which was gifted to me by one of my colleagues (Thank you Nahuel!), I have six qotom mini computers and a couple of 48-port Dell 6248 switches with two 6200-XGSF 10GE SFP+ modules. The mini computers are a sort of proto-cluster, and all have a whole bunch of network interfaces. The smallest of the group is a celeron with four gigabit ethernet ports, and the two fastest ones have i7 processors with 6x GE ports. Each of the mini computers is configured with all of their ethernet interfaces in a single LACP port-channel, thanks to the bonding Linux kernel module. On my desk, I have a Mikrotik CRS305. It is populated with three LR SMF transceivers. One attaches to my work desktop via a QLogic Corp. cLOM8214 PCIe card, one attaches via a thunderbolt 3 NIC to my work laptop, and the other is connected to the aforementioned Dell switch. My internet provider has installed a Optical Network Transceiver (ONT) device in my home which terminates the incoming Gigabit Passive Optical Network (GPON) services from the CO and delivers 940Mbit symmetric PPP over Ethernet via 8-pin copper out of the ONT. I connect the ONT to a Mikrotik CRS309-1G-8S+ router. That router is connected via LR SMF to the dell switch in the rack full of qotom hardware. This afternoon, I tested the throughput between my work desktop and my storage server and came up with these numbers:
$ iperf -c 100.64.79.102
------------------------------------------------------------
Client connecting to 100.64.79.102, TCP port 5001
TCP window size: 16.0 KByte (default)
------------------------------------------------------------
[  1] local 100.64.79.46 port 55216 connected with 100.64.79.102 port 5001 (icwnd/mss/irtt=14/1448/547)
[ ID] Interval       Transfer     Bandwidth
[  1] 0.0000-10.0129 sec  10.4 GBytes  8.95 Gbits/sec
That seems pretty good to me! The traffic hopped through my desk router and the dell switch to get to the Proliant server, and still nearly reached 9Gbit/s. That s a lot of blinkenlights.

30 August 2022

John Goerzen: The PC & Internet Revolution in Rural America

Inspired by several others (such as Alex Schroeder s post and Szcze uja s prompt), as well as a desire to get this down for my kids, I figure it s time to write a bit about living through the PC and Internet revolution where I did: outside a tiny town in rural Kansas. And, as I ve been back in that same area for the past 15 years, I reflect some on the challenges that continue to play out. Although the stories from the others were primarily about getting online, I want to start by setting some background. Those of you that didn t grow up in the same era as I did probably never realized that a typical business PC setup might cost $10,000 in today s dollars, for instance. So let me start with the background.

Nothing was easy This story begins in the 1980s. Somewhere around my Kindergarten year of school, around 1985, my parents bought a TRS-80 Color Computer 2 (aka CoCo II). It had 64K of RAM and used a TV for display and sound. This got you the computer. It didn t get you any disk drive or anything, no joysticks (required by a number of games). So whenever the system powered down, or it hung and you had to power cycle it a frequent event you d lose whatever you were doing and would have to re-enter the program, literally by typing it in. The floppy drive for the CoCo II cost more than the computer, and it was quite common for people to buy the computer first and then the floppy drive later when they d saved up the money for that. I particularly want to mention that computers then didn t come with a modem. What would be like buying a laptop or a tablet without wifi today. A modem, which I ll talk about in a bit, was another expensive accessory. To cobble together a system in the 80s that was capable of talking to others with persistent storage (floppy, or hard drive), screen, keyboard, and modem would be quite expensive. Adjusted for inflation, if you re talking a PC-style device (a clone of the IBM PC that ran DOS), this would easily be more expensive than the Macbook Pros of today. Few people back in the 80s had a computer at home. And the portion of those that had even the capability to get online in a meaningful way was even smaller. Eventually my parents bought a PC clone with 640K RAM and dual floppy drives. This was primarily used for my mom s work, but I did my best to take it over whenever possible. It ran DOS and, despite its monochrome screen, was generally a more capable machine than the CoCo II. For instance, it supported lowercase. (I m not even kidding; the CoCo II pretty much didn t.) A while later, they purchased a 32MB hard drive for it what luxury! Just getting a machine to work wasn t easy. Say you d bought a PC, and then bought a hard drive, and a modem. You didn t just plug in the hard drive and it would work. You would have to fight it every step of the way. The BIOS and DOS partition tables of the day used a cylinder/head/sector method of addressing the drive, and various parts of that those addresses had too few bits to work with the big drives of the day above 20MB. So you would have to lie to the BIOS and fdisk in various ways, and sort of work out how to do it for each drive. For each peripheral serial port, sound card (in later years), etc., you d have to set jumpers for DMA and IRQs, hoping not to conflict with anything already in the system. Perhaps you can now start to see why USB and PCI were so welcomed.

Sharing and finding resources Despite the two computers in our home, it wasn t as if software written on one machine just ran on another. A lot of software for PC clones assumed a CGA color display. The monochrome HGC in our PC wasn t particularly compatible. You could find a TSR program to emulate the CGA on the HGC, but it wasn t particularly stable, and there s only so much you can do when a program that assumes color displays on a monitor that can only show black, dark amber, or light amber. So I d periodically get to use other computers most commonly at an office in the evening when it wasn t being used. There were some local computer clubs that my dad took me to periodically. Software was swapped back then; disks copied, shareware exchanged, and so forth. For me, at least, there was no online to download software from, and selling software over the Internet wasn t a thing at all.

Three Different Worlds There were sort of three different worlds of computing experience in the 80s:
  1. Home users. Initially using a wide variety of software from Apple, Commodore, Tandy/RadioShack, etc., but eventually coming to be mostly dominated by IBM PC clones
  2. Small and mid-sized business users. Some of them had larger minicomputers or small mainframes, but most that I had contact with by the early 90s were standardized on DOS-based PCs. More advanced ones had a network running Netware, most commonly. Networking hardware and software was generally too expensive for home users to use in the early days.
  3. Universities and large institutions. These are the places that had the mainframes, the earliest implementations of TCP/IP, the earliest users of UUCP, and so forth.
The difference between the home computing experience and the large institution experience were vast. Not only in terms of dollars the large institution hardware could easily cost anywhere from tens of thousands to millions of dollars but also in terms of sheer resources required (large rooms, enormous power circuits, support staff, etc). Nothing was in common between them; not operating systems, not software, not experience. I was never much aware of the third category until the differences started to collapse in the mid-90s, and even then I only was exposed to it once the collapse was well underway. You might say to me, Well, Google certainly isn t running what I m running at home! And, yes of course, it s different. But fundamentally, most large datacenters are running on x86_64 hardware, with Linux as the operating system, and a TCP/IP network. It s a different scale, obviously, but at a fundamental level, the hardware and operating system stack are pretty similar to what you can readily run at home. Back in the 80s and 90s, this wasn t the case. TCP/IP wasn t even available for DOS or Windows until much later, and when it was, it was a clunky beast that was difficult. One of the things Kevin Driscoll highlights in his book called Modem World see my short post about it is that the history of the Internet we usually receive is focused on case 3: the large institutions. In reality, the Internet was and is literally a network of networks. Gateways to and from Internet existed from all three kinds of users for years, and while TCP/IP ultimately won the battle of the internetworking protocol, the other two streams of users also shaped the Internet as we now know it. Like many, I had no access to the large institution networks, but as I ve been reflecting on my experiences, I ve found a new appreciation for the way that those of us that grew up with primarily home PCs shaped the evolution of today s online world also.

An Era of Scarcity I should take a moment to comment about the cost of software back then. A newspaper article from 1985 comments that WordPerfect, then the most powerful word processing program, sold for $495 (or $219 if you could score a mail order discount). That s $1360/$600 in 2022 money. Other popular software, such as Lotus 1-2-3, was up there as well. If you were to buy a new PC clone in the mid to late 80s, it would often cost $2000 in 1980s dollars. Now add a printer a low-end dot matrix for $300 or a laser for $1500 or even more. A modem: another $300. So the basic system would be $3600, or $9900 in 2022 dollars. If you wanted a nice printer, you re now pushing well over $10,000 in 2022 dollars. You start to see one barrier here, and also why things like shareware and piracy if it was indeed even recognized as such were common in those days. So you can see, from a home computer setup (TRS-80, Commodore C64, Apple ][, etc) to a business-class PC setup was an order of magnitude increase in cost. From there to the high-end minis/mainframes was another order of magnitude (at least!) increase. Eventually there was price pressure on the higher end and things all got better, which is probably why the non-DOS PCs lasted until the early 90s.

Increasing Capabilities My first exposure to computers in school was in the 4th grade, when I would have been about 9. There was a single Apple ][ machine in that room. I primarily remember playing Oregon Trail on it. The next year, the school added a computer lab. Remember, this is a small rural area, so each graduating class might have about 25 people in it; this lab was shared by everyone in the K-8 building. It was full of some flavor of IBM PS/2 machines running DOS and Netware. There was a dedicated computer teacher too, though I think she was a regular teacher that was given somewhat minimal training on computers. We were going to learn typing that year, but I did so well on the very first typing program that we soon worked out that I could do programming instead. I started going to school early these machines were far more powerful than the XT at home and worked on programming projects there. Eventually my parents bought me a Gateway 486SX/25 with a VGA monitor and hard drive. Wow! This was a whole different world. It may have come with Windows 3.0 or 3.1 on it, but I mainly remember running OS/2 on that machine. More on that below.

Programming That CoCo II came with a BASIC interpreter in ROM. It came with a large manual, which served as a BASIC tutorial as well. The BASIC interpreter was also the shell, so literally you could not use the computer without at least a bit of BASIC. Once I had access to a DOS machine, it also had a basic interpreter: GW-BASIC. There was a fair bit of software written in BASIC at the time, but most of the more advanced software wasn t. I wondered how these .EXE and .COM programs were written. I could find vague references to DEBUG.EXE, assemblers, and such. But it wasn t until I got a copy of Turbo Pascal that I was able to do that sort of thing myself. Eventually I got Borland C++ and taught myself C as well. A few years later, I wanted to try writing GUI programs for Windows, and bought Watcom C++ much cheaper than the competition, and it could target Windows, DOS (and I think even OS/2). Notice that, aside from BASIC, none of this was free, and none of it was bundled. You couldn t just download a C compiler, or Python interpreter, or whatnot back then. You had to pay for the ability to write any kind of serious code on the computer you already owned.

The Microsoft Domination Microsoft came to dominate the PC landscape, and then even the computing landscape as a whole. IBM very quickly lost control over the hardware side of PCs as Compaq and others made clones, but Microsoft has managed in varying degrees even to this day to keep a stranglehold on the software, and especially the operating system, side. Yes, there was occasional talk of things like DR-DOS, but by and large the dominant platform came to be the PC, and if you had a PC, you ran DOS (and later Windows) from Microsoft. For awhile, it looked like IBM was going to challenge Microsoft on the operating system front; they had OS/2, and when I switched to it sometime around the version 2.1 era in 1993, it was unquestionably more advanced technically than the consumer-grade Windows from Microsoft at the time. It had Internet support baked in, could run most DOS and Windows programs, and had introduced a replacement for the by-then terrible FAT filesystem: HPFS, in 1988. Microsoft wouldn t introduce a better filesystem for its consumer operating systems until Windows XP in 2001, 13 years later. But more on that story later.

Free Software, Shareware, and Commercial Software I ve covered the high cost of software already. Obviously $500 software wasn t going to sell in the home market. So what did we have? Mainly, these things:
  1. Public domain software. It was free to use, and if implemented in BASIC, probably had source code with it too.
  2. Shareware
  3. Commercial software (some of it from small publishers was a lot cheaper than $500)
Let s talk about shareware. The idea with shareware was that a company would release a useful program, sometimes limited. You were encouraged to register , or pay for, it if you liked it and used it. And, regardless of whether you registered it or not, were told please copy! Sometimes shareware was fully functional, and registering it got you nothing more than printed manuals and an easy conscience (guilt trips for not registering weren t necessarily very subtle). Sometimes unregistered shareware would have a nag screen a delay of a few seconds while they told you to register. Sometimes they d be limited in some way; you d get more features if you registered. With games, it was popular to have a trilogy, and release the first episode inevitably ending with a cliffhanger as shareware, and the subsequent episodes would require registration. In any event, a lot of software people used in the 80s and 90s was shareware. Also pirated commercial software, though in the earlier days of computing, I think some people didn t even know the difference. Notice what s missing: Free Software / FLOSS in the Richard Stallman sense of the word. Stallman lived in the big institution world after all, he worked at MIT and what he was doing with the Free Software Foundation and GNU project beginning in 1983 never really filtered into the DOS/Windows world at the time. I had no awareness of it even existing until into the 90s, when I first started getting some hints of it as a port of gcc became available for OS/2. The Internet was what really brought this home, but I m getting ahead of myself. I want to say again: FLOSS never really entered the DOS and Windows 3.x ecosystems. You d see it make a few inroads here and there in later versions of Windows, and moreso now that Microsoft has been sort of forced to accept it, but still, reflect on its legacy. What is the software market like in Windows compared to Linux, even today? Now it is, finally, time to talk about connectivity!

Getting On-Line What does it even mean to get on line? Certainly not connecting to a wifi access point. The answer is, unsurprisingly, complex. But for everyone except the large institutional users, it begins with a telephone.

The telephone system By the 80s, there was one communication network that already reached into nearly every home in America: the phone system. Virtually every household (note I don t say every person) was uniquely identified by a 10-digit phone number. You could, at least in theory, call up virtually any other phone in the country and be connected in less than a minute. But I ve got to talk about cost. The way things worked in the USA, you paid a monthly fee for a phone line. Included in that monthly fee was unlimited local calling. What is a local call? That was an extremely complex question. Generally it meant, roughly, calling within your city. But of course, as you deal with things like suburbs and cities growing into each other (eg, the Dallas-Ft. Worth metroplex), things got complicated fast. But let s just say for simplicity you could call others in your city. What about calling people not in your city? That was long distance , and you paid often hugely by the minute for it. Long distance rates were difficult to figure out, but were generally most expensive during business hours and cheapest at night or on weekends. Prices eventually started to come down when competition was introduced for long distance carriers, but even then you often were stuck with a single carrier for long distance calls outside your city but within your state. Anyhow, let s just leave it at this: local calls were virtually free, and long distance calls were extremely expensive.

Getting a modem I remember getting a modem that ran at either 1200bps or 2400bps. Either way, quite slow; you could often read even plain text faster than the modem could display it. But what was a modem? A modem hooked up to a computer with a serial cable, and to the phone system. By the time I got one, modems could automatically dial and answer. You would send a command like ATDT5551212 and it would dial 555-1212. Modems had speakers, because often things wouldn t work right, and the telephone system was oriented around speech, so you could hear what was happening. You d hear it wait for dial tone, then dial, then hopefully the remote end would ring, a modem there would answer, you d hear the screeching of a handshake, and eventually your terminal would say CONNECT 2400. Now your computer was bridged to the other; anything going out your serial port was encoded as sound by your modem and decoded at the other end, and vice-versa. But what, exactly, was the other end? It might have been another person at their computer. Turn on local echo, and you can see what they did. Maybe you d send files to each other. But in my case, the answer was different: PC Magazine.

PC Magazine and CompuServe Starting around 1986 (so I would have been about 6 years old), I got to read PC Magazine. My dad would bring copies that were being discarded at his office home for me to read, and I think eventually bought me a subscription directly. This was not just a standard magazine; it ran something like 350-400 pages an issue, and came out every other week. This thing was a monster. It had reviews of hardware and software, descriptions of upcoming technologies, pages and pages of ads (that often had some degree of being informative to them). And they had sections on programming. Many issues would talk about BASIC or Pascal programming, and there d be a utility in most issues. What do I mean by a utility in most issues ? Did they include a floppy disk with software? No, of course not. There was a literal program listing printed in the magazine. If you wanted the utility, you had to type it in. And a lot of them were written in assembler, so you had to have an assembler. An assembler, of course, was not free and I didn t have one. Or maybe they wrote it in Microsoft C, and I had Borland C, and (of course) they weren t compatible. Sometimes they would list the program sort of in binary: line after line of a BASIC program, with lines like 64, 193, 253, 0, 53, 0, 87 that you would type in for hours, hopefully correctly. Running the BASIC program would, if you got it correct, emit a .COM file that you could then run. They did have a rudimentary checksum system built in, but it wasn t even a CRC, so something like swapping two numbers you d never notice except when the program would mysteriously hang. Eventually they teamed up with CompuServe to offer a limited slice of CompuServe for the purpose of downloading PC Magazine utilities. This was called PC MagNet. I am foggy on the details, but I believe that for a time you could connect to the limited PC MagNet part of CompuServe for free (after the cost of the long-distance call, that is) rather than paying for CompuServe itself (because, OF COURSE, that also charged you per the minute.) So in the early days, I would get special permission from my parents to place a long distance call, and after some nerve-wracking minutes in which we were aware every minute was racking up charges, I could navigate the menus, download what I wanted, and log off immediately. I still, incidentally, mourn what PC Magazine became. As with computing generally, it followed the mass market. It lost its deep technical chops, cut its programming columns, stopped talking about things like how SCSI worked, and so forth. By the time it stopped printing in 2009, it was no longer a square-bound 400-page beheamoth, but rather looked more like a copy of Newsweek, but with less depth.

Continuing with CompuServe CompuServe was a much larger service than just PC MagNet. Eventually, our family got a subscription. It was still an expensive and scarce resource; I d call it only after hours when the long-distance rates were cheapest. Everyone had a numerical username separated by commas; mine was 71510,1421. CompuServe had forums, and files. Eventually I would use TapCIS to queue up things I wanted to do offline, to minimize phone usage online. CompuServe eventually added a gateway to the Internet. For the sum of somewhere around $1 a message, you could send or receive an email from someone with an Internet email address! I remember the thrill of one time, as a kid of probably 11 years, sending a message to one of the editors of PC Magazine and getting a kind, if brief, reply back! But inevitably I had

The Godzilla Phone Bill Yes, one month I became lax in tracking my time online. I ran up my parents phone bill. I don t remember how high, but I remember it was hundreds of dollars, a hefty sum at the time. As I watched Jason Scott s BBS Documentary, I realized how common an experience this was. I think this was the end of CompuServe for me for awhile.

Toll-Free Numbers I lived near a town with a population of 500. Not even IN town, but near town. The calling area included another town with a population of maybe 1500, so all told, there were maybe 2000 people total I could talk to with a local call though far fewer numbers, because remember, telephones were allocated by the household. There was, as far as I know, zero modems that were a local call (aside from one that belonged to a friend I met in around 1992). So basically everything was long-distance. But there was a special feature of the telephone network: toll-free numbers. Normally when calling long-distance, you, the caller, paid the bill. But with a toll-free number, beginning with 1-800, the recipient paid the bill. These numbers almost inevitably belonged to corporations that wanted to make it easy for people to call. Sales and ordering lines, for instance. Some of these companies started to set up modems on toll-free numbers. There were few of these, but they existed, so of course I had to try them! One of them was a company called PennyWise that sold office supplies. They had a toll-free line you could call with a modem to order stuff. Yes, online ordering before the web! I loved office supplies. And, because I lived far from a big city, if the local K-Mart didn t have it, I probably couldn t get it. Of course, the interface was entirely text, but you could search for products and place orders with the modem. I had loads of fun exploring the system, and actually ordered things from them and probably actually saved money doing so. With the first order they shipped a monster full-color catalog. That thing must have been 500 pages, like the Sears catalogs of the day. Every item had a part number, which streamlined ordering through the modem.

Inbound FAXes By the 90s, a number of modems became able to send and receive FAXes as well. For those that don t know, a FAX machine was essentially a special modem. It would scan a page and digitally transmit it over the phone system, where it would at least in the early days be printed out in real time (because the machines didn t have the memory to store an entire page as an image). Eventually, PC modems integrated FAX capabilities. There still wasn t anything useful I could do locally, but there were ways I could get other companies to FAX something to me. I remember two of them. One was for US Robotics. They had an on demand FAX system. You d call up a toll-free number, which was an automated IVR system. You could navigate through it and select various documents of interest to you: spec sheets and the like. You d key in your FAX number, hang up, and US Robotics would call YOU and FAX you the documents you wanted. Yes! I was talking to a computer (of a sorts) at no cost to me! The New York Times also ran a service for awhile called TimesFax. Every day, they would FAX out a page or two of summaries of the day s top stories. This was pretty cool in an era in which I had no other way to access anything from the New York Times. I managed to sign up for TimesFax I have no idea how, anymore and for awhile I would get a daily FAX of their top stories. When my family got its first laser printer, I could them even print these FAXes complete with the gothic New York Times masthead. Wow! (OK, so technically I could print it on a dot-matrix printer also, but graphics on a 9-pin dot matrix is a kind of pain that is a whole other article.)

My own phone line Remember how I discussed that phone lines were allocated per household? This was a problem for a lot of reasons:
  1. Anybody that tried to call my family while I was using my modem would get a busy signal (unable to complete the call)
  2. If anybody in the house picked up the phone while I was using it, that would degrade the quality of the ongoing call and either mess up or disconnect the call in progress. In many cases, that could cancel a file transfer (which wasn t necessarily easy or possible to resume), prompting howls of annoyance from me.
  3. Generally we all had to work around each other
So eventually I found various small jobs and used the money I made to pay for my own phone line and my own long distance costs. Eventually I upgraded to a 28.8Kbps US Robotics Courier modem even! Yes, you heard it right: I got a job and a bank account so I could have a phone line and a faster modem. Uh, isn t that why every teenager gets a job? Now my local friend and I could call each other freely at least on my end (I can t remember if he had his own phone line too). We could exchange files using HS/Link, which had the added benefit of allowing split-screen chat even while a file transfer is in progress. I m sure we spent hours chatting to each other keyboard-to-keyboard while sharing files with each other.

Technology in Schools By this point in the story, we re in the late 80s and early 90s. I m still using PC-style OSs at home; OS/2 in the later years of this period, DOS or maybe a bit of Windows in the earlier years. I mentioned that they let me work on programming at school starting in 5th grade. It was soon apparent that I knew more about computers than anybody on staff, and I started getting pulled out of class to help teachers or administrators with vexing school problems. This continued until I graduated from high school, incidentally often to my enjoyment, and the annoyance of one particular teacher who, I must say, I was fine with annoying in this way. That s not to say that there was institutional support for what I was doing. It was, after all, a small school. Larger schools might have introduced BASIC or maybe Logo in high school. But I had already taught myself BASIC, Pascal, and C by the time I was somewhere around 12 years old. So I wouldn t have had any use for that anyhow. There were programming contests occasionally held in the area. Schools would send teams. My school didn t really send anybody, but I went as an individual. One of them was run by a local college (but for jr. high or high school students. Years later, I met one of the professors that ran it. He remembered me, and that day, better than I did. The programming contest had problems one could solve in BASIC or Logo. I knew nothing about what to expect going into it, but I had lugged my computer and screen along, and asked him, Can I write my solutions in C? He was, apparently, stunned, but said sure, go for it. I took first place that day, leading to some rather confused teams from much larger schools. The Netware network that the school had was, as these generally were, itself isolated. There was no link to the Internet or anything like it. Several schools across three local counties eventually invested in a fiber-optic network linking them together. This built a larger, but still closed, network. Its primary purpose was to allow students to be exposed to a wider variety of classes at high schools. Participating schools had an ITV room , outfitted with cameras and mics. So students at any school could take classes offered over ITV at other schools. For instance, only my school taught German classes, so people at any of those participating schools could take German. It was an early Zoom room. But alongside the TV signal, there was enough bandwidth to run some Netware frames. By about 1995 or so, this let one of the schools purchase some CD-ROM software that was made available on a file server and could be accessed by any participating school. Nice! But Netware was mainly about file and printer sharing; there wasn t even a facility like email, at least not on our deployment.

BBSs My last hop before the Internet was the BBS. A BBS was a computer program, usually ran by a hobbyist like me, on a computer with a modem connected. Callers would call it up, and they d interact with the BBS. Most BBSs had discussion groups like forums and file areas. Some also had games. I, of course, continued to have that most vexing of problems: they were all long-distance. There were some ways to help with that, chiefly QWK and BlueWave. These, somewhat like TapCIS in the CompuServe days, let me download new message posts for reading offline, and queue up my own messages to send later. QWK and BlueWave didn t help with file downloading, though.

BBSs get networked BBSs were an interesting thing. You d call up one, and inevitably somewhere in the file area would be a BBS list. Download the BBS list and you ve suddenly got a list of phone numbers to try calling. All of them were long distance, of course. You d try calling them at random and have a success rate of maybe 20%. The other 80% would be defunct; you might get the dreaded this number is no longer in service or the even more dreaded angry human answering the phone (and of course a modem can t talk to a human, so they d just get silence for probably the nth time that week). The phone company cared nothing about BBSs and recycled their numbers just as fast as any others. To talk to various people, or participate in certain discussion groups, you d have to call specific BBSs. That s annoying enough in the general case, but even more so for someone paying long distance for it all, because it takes a few minutes to establish a connection to a BBS: handshaking, logging in, menu navigation, etc. But BBSs started talking to each other. The earliest successful such effort was FidoNet, and for the duration of the BBS era, it remained by far the largest. FidoNet was analogous to the UUCP that the institutional users had, but ran on the much cheaper PC hardware. Basically, BBSs that participated in FidoNet would relay email, forum posts, and files between themselves overnight. Eventually, as with UUCP, by hopping through this network, messages could reach around the globe, and forums could have worldwide participation asynchronously, long before they could link to each other directly via the Internet. It was almost entirely volunteer-run.

Running my own BBS At age 13, I eventually chose to set up my own BBS. It ran on my single phone line, so of course when I was dialing up something else, nobody could dial up me. Not that this was a huge problem; in my town of 500, I probably had a good 1 or 2 regular callers in the beginning. In the PC era, there was a big difference between a server and a client. Server-class software was expensive and rare. Maybe in later years you had an email client, but an email server would be completely unavailable to you as a home user. But with a BBS, I could effectively run a server. I even ran serial lines in our house so that the BBS could be connected from other rooms! Since I was running OS/2, the BBS didn t tie up the computer; I could continue using it for other things. FidoNet had an Internet email gateway. This one, unlike CompuServe s, was free. Once I had a BBS on FidoNet, you could reach me from the Internet using the FidoNet address. This didn t support attachments, but then email of the day didn t really, either. Various others outside Kansas ran FidoNet distribution points. I believe one of them was mgmtsys; my memory is quite vague, but I think they offered a direct gateway and I would call them to pick up Internet mail via FidoNet protocols, but I m not at all certain of this.

Pros and Cons of the Non-Microsoft World As mentioned, Microsoft was and is the dominant operating system vendor for PCs. But I left that world in 1993, and here, nearly 30 years later, have never really returned. I got an operating system with more technical capabilities than the DOS and Windows of the day, but the tradeoff was a much smaller software ecosystem. OS/2 could run DOS programs, but it ran OS/2 programs a lot better. So if I were to run a BBS, I wanted one that had a native OS/2 version limiting me to a small fraction of available BBS server software. On the other hand, as a fully 32-bit operating system, there started to be OS/2 ports of certain software with a Unix heritage; most notably for me at the time, gcc. At some point, I eventually came across the RMS essays and started to be hooked.

Internet: The Hunt Begins I certainly was aware that the Internet was out there and interesting. But the first problem was: how the heck do I get connected to the Internet?

Computer labs There was one place that tended to have Internet access: colleges and universities. In 7th grade, I participated in a program that resulted in me being invited to visit Duke University, and in 8th grade, I participated in National History Day, resulting in a trip to visit the University of Maryland. I probably sought out computer labs at both of those. My most distinct memory was finding my way into a computer lab at one of those universities, and it was full of NeXT workstations. I had never seen or used NeXT before, and had no idea how to operate it. I had brought a box of floppy disks, unaware that the DOS disks probably weren t compatible with NeXT. Closer to home, a small college had a computer lab that I could also visit. I would go there in summer or when it wasn t used with my stack of floppies. I remember downloading disk images of FLOSS operating systems: FreeBSD, Slackware, or Debian, at the time. The hash marks from the DOS-based FTP client would creep across the screen as the 1.44MB disk images would slowly download. telnet was also available on those machines, so I could telnet to things like public-access Archie servers and libraries though not Gopher. Still, FTP and telnet access opened up a lot, and I learned quite a bit in those years.

Continuing the Journey At some point, I got a copy of the Whole Internet User s Guide and Catalog, published in 1994. I still have it. If it hadn t already figured it out by then, I certainly became aware from it that Unix was the dominant operating system on the Internet. The examples in Whole Internet covered FTP, telnet, gopher all assuming the user somehow got to a Unix prompt. The web was introduced about 300 pages in; clearly viewed as something that wasn t page 1 material. And it covered the command-line www client before introducing the graphical Mosaic. Even then, though, the book highlighted Mosaic s utility as a front-end for Gopher and FTP, and even the ability to launch telnet sessions by clicking on links. But having a copy of the book didn t equate to having any way to run Mosaic. The machines in the computer lab I mentioned above all ran DOS and were incapable of running a graphical browser. I had no SLIP or PPP (both ways to run Internet traffic over a modem) connectivity at home. In short, the Web was something for the large institutional users at the time.

CD-ROMs As CD-ROMs came out, with their huge (for the day) 650MB capacity, various companies started collecting software that could be downloaded on the Internet and selling it on CD-ROM. The two most popular ones were Walnut Creek CD-ROM and Infomagic. One could buy extensive Shareware and gaming collections, and then even entire Linux and BSD distributions. Although not exactly an Internet service per se, it was a way of bringing what may ordinarily only be accessible to institutional users into the home computer realm.

Free Software Jumps In As I mentioned, by the mid 90s, I had come across RMS s writings about free software most probably his 1992 essay Why Software Should Be Free. (Please note, this is not a commentary on the more recently-revealed issues surrounding RMS, but rather his writings and work as I encountered them in the 90s.) The notion of a Free operating system not just in cost but in openness was incredibly appealing. Not only could I tinker with it to a much greater extent due to having source for everything, but it included so much software that I d otherwise have to pay for. Compilers! Interpreters! Editors! Terminal emulators! And, especially, server software of all sorts. There d be no way I could afford or run Netware, but with a Free Unixy operating system, I could do all that. My interest was obviously piqued. Add to that the fact that I could actually participate and contribute I was about to become hooked on something that I ve stayed hooked on for decades. But then the question was: which Free operating system? Eventually I chose FreeBSD to begin with; that would have been sometime in 1995. I don t recall the exact reasons for that. I remember downloading Slackware install floppies, and probably the fact that Debian wasn t yet at 1.0 scared me off for a time. FreeBSD s fantastic Handbook far better than anything I could find for Linux at the time was no doubt also a factor.

The de Raadt Factor Why not NetBSD or OpenBSD? The short answer is Theo de Raadt. Somewhere in this time, when I was somewhere between 14 and 16 years old, I asked some questions comparing NetBSD to the other two free BSDs. This was on a NetBSD mailing list, but for some reason Theo saw it and got a flame war going, which CC d me. Now keep in mind that even if NetBSD had a web presence at the time, it would have been minimal, and I would have not all that unusually for the time had no way to access it. I was certainly not aware of the, shall we say, acrimony between Theo and NetBSD. While I had certainly seen an online flamewar before, this took on a different and more disturbing tone; months later, Theo randomly emailed me under the subject SLIME saying that I was, well, SLIME . I seem to recall periodic emails from him thereafter reminding me that he hates me and that he had blocked me. (Disclaimer: I have poor email archives from this period, so the full details are lost to me, but I believe I am accurately conveying these events from over 25 years ago) This was a surprise, and an unpleasant one. I was trying to learn, and while it is possible I didn t understand some aspect or other of netiquette (or Theo s personal hatred of NetBSD) at the time, still that is not a reason to flame a 16-year-old (though he would have had no way to know my age). This didn t leave any kind of scar, but did leave a lasting impression; to this day, I am particularly concerned with how FLOSS projects handle poisonous people. Debian, for instance, has come a long way in this over the years, and even Linus Torvalds has turned over a new leaf. I don t know if Theo has. In any case, I didn t use NetBSD then. I did try it periodically in the years since, but never found it compelling enough to justify a large switch from Debian. I never tried OpenBSD for various reasons, but one of them was that I didn t want to join a community that tolerates behavior such as Theo s from its leader.

Moving to FreeBSD Moving from OS/2 to FreeBSD was final. That is, I didn t have enough hard drive space to keep both. I also didn t have the backup capacity to back up OS/2 completely. My BBS, which ran Virtual BBS (and at some point also AdeptXBBS) was deleted and reincarnated in a different form. My BBS was a member of both FidoNet and VirtualNet; the latter was specific to VBBS, and had to be dropped. I believe I may have also had to drop the FidoNet link for a time. This was the biggest change of computing in my life to that point. The earlier experiences hadn t literally destroyed what came before. OS/2 could still run my DOS programs. Its command shell was quite DOS-like. It ran Windows programs. I was going to throw all that away and leap into the unknown. I wish I had saved a copy of my BBS; I would love to see the messages I exchanged back then, or see its menu screens again. I have little memory of what it looked like. But other than that, I have no regrets. Pursuing Free, Unixy operating systems brought me a lot of enjoyment and a good career. That s not to say it was easy. All the problems of not being in the Microsoft ecosystem were magnified under FreeBSD and Linux. In a day before EDID, monitor timings had to be calculated manually and you risked destroying your monitor if you got them wrong. Word processing and spreadsheet software was pretty much not there for FreeBSD or Linux at the time; I was therefore forced to learn LaTeX and actually appreciated that. Software like PageMaker or CorelDraw was certainly nowhere to be found for those free operating systems either. But I got a ton of new capabilities. I mentioned the BBS didn t shut down, and indeed it didn t. I ran what was surely a supremely unique oddity: a free, dialin Unix shell server in the middle of a small town in Kansas. I m sure I provided things such as pine for email and some help text and maybe even printouts for how to use it. The set of callers slowly grew over the time period, in fact. And then I got UUCP.

Enter UUCP Even throughout all this, there was no local Internet provider and things were still long distance. I had Internet Email access via assorted strange routes, but they were all strange. And, I wanted access to Usenet. In 1995, it happened. The local ISP I mentioned offered UUCP access. Though I couldn t afford the dialup shell (or later, SLIP/PPP) that they offered due to long-distance costs, UUCP s very efficient batched processes looked doable. I believe I established that link when I was 15, so in 1995. I worked to register my domain, complete.org, as well. At the time, the process was a bit lengthy and involved downloading a text file form, filling it out in a precise way, sending it to InterNIC, and probably mailing them a check. Well I did that, and in September of 1995, complete.org became mine. I set up sendmail on my local system, as well as INN to handle the limited Usenet newsfeed I requested from the ISP. I even ran Majordomo to host some mailing lists, including some that were surprisingly high-traffic for a few-times-a-day long-distance modem UUCP link! The modem client programs for FreeBSD were somewhat less advanced than for OS/2, but I believe I wound up using Minicom or Seyon to continue to dial out to BBSs and, I believe, continue to use Learning Link. So all the while I was setting up my local BBS, I continued to have access to the text Internet, consisting of chiefly Gopher for me.

Switching to Debian I switched to Debian sometime in 1995 or 1996, and have been using Debian as my primary OS ever since. I continued to offer shell access, but added the WorldVU Atlantis menuing BBS system. This provided a return of a more BBS-like interface (by default; shell was still an uption) as well as some BBS door games such as LoRD and TradeWars 2002, running under DOS emulation. I also continued to run INN, and ran ifgate to allow FidoNet echomail to be presented into INN Usenet-like newsgroups, and netmail to be gated to Unix email. This worked pretty well. The BBS continued to grow in these days, peaking at about two dozen total user accounts, and maybe a dozen regular users.

Dial-up access availability I believe it was in 1996 that dial up PPP access finally became available in my small town. What a thrill! FINALLY! I could now FTP, use Gopher, telnet, and the web all from home. Of course, it was at modem speeds, but still. (Strangely, I have a memory of accessing the Web using WebExplorer from OS/2. I don t know exactly why; it s possible that by this time, I had upgraded to a 486 DX2/66 and was able to reinstall OS/2 on the old 25MHz 486, or maybe something was wrong with the timeline from my memories from 25 years ago above. Or perhaps I made the occasional long-distance call somewhere before I ditched OS/2.) Gopher sites still existed at this point, and I could access them using Netscape Navigator which likely became my standard Gopher client at that point. I don t recall using UMN text-mode gopher client locally at that time, though it s certainly possible I did.

The city Starting when I was 15, I took computer science classes at Wichita State University. The first one was a class in the summer of 1995 on C++. I remember being worried about being good enough for it I was, after all, just after my HS freshman year and had never taken the prerequisite C class. I loved it and got an A! By 1996, I was taking more classes. In 1996 or 1997 I stayed in Wichita during the day due to having more than one class. So, what would I do then but enjoy the computer lab? The CS dept. had two of them: one that had NCD X terminals connected to a pair of SunOS servers, and another one running Windows. I spent most of the time in the Unix lab with the NCDs; I d use Netscape or pine, write code, enjoy the University s fast Internet connection, and so forth. In 1997 I had graduated high school and that summer I moved to Wichita to attend college. As was so often the case, I shut down the BBS at that time. It would be 5 years until I again dealt with Internet at home in a rural community. By the time I moved to my apartment in Wichita, I had stopped using OS/2 entirely. I have no memory of ever having OS/2 there. Along the way, I had bought a Pentium 166, and then the most expensive piece of computing equipment I have ever owned: a DEC Alpha, which, of course, ran Linux.

ISDN I must have used dialup PPP for a time, but I eventually got a job working for the ISP I had used for UUCP, and then PPP. While there, I got a 128Kbps ISDN line installed in my apartment, and they gave me a discount on the service for it. That was around 3x the speed of a modem, and crucially was always on and gave me a public IP. No longer did I have to use UUCP; now I got to host my own things! By at least 1998, I was running a web server on www.complete.org, and I had an FTP server going as well.

Even Bigger Cities In 1999 I moved to Dallas, and there got my first broadband connection: an ADSL link at, I think, 1.5Mbps! Now that was something! But it had some reliability problems. I eventually put together a server and had it hosted at an acquantaince s place who had SDSL in his apartment. Within a couple of years, I had switched to various kinds of proper hosting for it, but that is a whole other article. In Indianapolis, I got a cable modem for the first time, with even tighter speeds but prohibitions on running servers on it. Yuck.

Challenges Being non-Microsoft continued to have challenges. Until the advent of Firefox, a web browser was one of the biggest. While Netscape supported Linux on i386, it didn t support Linux on Alpha. I hobbled along with various attempts at emulators, old versions of Mosaic, and so forth. And, until StarOffice was open-sourced as Open Office, reading Microsoft file formats was also a challenge, though WordPerfect was briefly available for Linux. Over the years, I have become used to the Linux ecosystem. Perhaps I use Gimp instead of Photoshop and digikam instead of well, whatever somebody would use on Windows. But I get ZFS, and containers, and so much that isn t available there. Yes, I know Apple never went away and is a thing, but for most of the time period I discuss in this article, at least after the rise of DOS, it was niche compared to the PC market.

Back to Kansas In 2002, I moved back to Kansas, to a rural home near a different small town in the county next to where I grew up. Over there, it was back to dialup at home, but I had faster access at work. I didn t much care for this, and thus began a 20+-year effort to get broadband in the country. At first, I got a wireless link, which worked well enough in the winter, but had serious problems in the summer when the trees leafed out. Eventually DSL became available locally highly unreliable, but still, it was something. Then I moved back to the community I grew up in, a few miles from where I grew up. Again I got DSL a bit better. But after some years, being at the end of the run of DSL meant I had poor speeds and reliability problems. I eventually switched to various wireless ISPs, which continues to the present day; while people in cities can get Gbps service, I can get, at best, about 50Mbps. Long-distance fees are gone, but the speed disparity remains.

Concluding Reflections I am glad I grew up where I did; the strong community has a lot of advantages I don t have room to discuss here. In a number of very real senses, having no local services made things a lot more difficult than they otherwise would have been. However, perhaps I could say that I also learned a lot through the need to come up with inventive solutions to those challenges. To this day, I think a lot about computing in remote environments: partially because I live in one, and partially because I enjoy visiting places that are remote enough that they have no Internet, phone, or cell service whatsoever. I have written articles like Tools for Communicating Offline and in Difficult Circumstances based on my own personal experience. I instinctively think about making protocols robust in the face of various kinds of connectivity failures because I experience various kinds of connectivity failures myself.

(Almost) Everything Lives On In 2002, Gopher turned 10 years old. It had probably been about 9 or 10 years since I had first used Gopher, which was the first way I got on live Internet from my house. It was hard to believe. By that point, I had an always-on Internet link at home and at work. I had my Alpha, and probably also at least PCMCIA Ethernet for a laptop (many laptops had modems by the 90s also). Despite its popularity in the early 90s, less than 10 years after it came on the scene and started to unify the Internet, it was mostly forgotten. And it was at that moment that I decided to try to resurrect it. The University of Minnesota finally released it under an Open Source license. I wrote the first new gopher server in years, pygopherd, and introduced gopher to Debian. Gopher lives on; there are now quite a few Gopher clients and servers out there, newly started post-2002. The Gemini protocol can be thought of as something akin to Gopher 2.0, and it too has a small but blossoming ecosystem. Archie, the old FTP search tool, is dead though. Same for WAIS and a number of the other pre-web search tools. But still, even FTP lives on today. And BBSs? Well, they didn t go away either. Jason Scott s fabulous BBS documentary looks back at the history of the BBS, while Back to the BBS from last year talks about the modern BBS scene. FidoNet somehow is still alive and kicking. UUCP still has its place and has inspired a whole string of successors. Some, like NNCP, are clearly direct descendents of UUCP. Filespooler lives in that ecosystem, and you can even see UUCP concepts in projects as far afield as Syncthing and Meshtastic. Usenet still exists, and you can now run Usenet over NNCP just as I ran Usenet over UUCP back in the day (which you can still do as well). Telnet, of course, has been largely supplanted by ssh, but the concept is more popular now than ever, as Linux has made ssh be available on everything from Raspberry Pi to Android. And I still run a Gopher server, looking pretty much like it did in 2002. This post also has a permanent home on my website, where it may be periodically updated.

1 August 2022

Bastian Venthur: Keychron keyboards fixed on Linux

Last year, I wrote about on how to get my buggy Keychron C1 keyboard working properly on Linux by setting a kernel module parameter. Afterwards, I contacted Hans de Goede since he was the last one that contributed a major patch to the relevant kernel module. After some debugging, it turned out that the Keychron keyboards are indeed misbehaving when set to Windows mode. Almost a year later, Bryan Cain provided a patch fixing the behavior, which has now been merged to the Linux kernel in 5.19. Thank you, Hans and Bryan!

8 April 2022

Jacob Adams: The Unexpected Importance of the Trailing Slash

For many using Unix-derived systems today, we take for granted that /some/path and /some/path/ are the same. Most shells will even add a trailing slash for you when you press the Tab key after the name of a directory or a symbolic link to one. However, many programs treat these two paths as subtly different in certain cases, which I outline below, as all three have tripped me up in various ways1.

POSIX and Coreutils Perhaps the trickiest use of the trailing slash in a distinguishing way is in POSIX2 which states:
When the final component of a pathname is a symbolic link, the standard requires that a trailing <slash> causes the link to be followed. This is the behavior of historical implementations3. For example, for /a/b and /a/b/, if /a/b is a symbolic link to a directory, then /a/b refers to the symbolic link, and /a/b/ refers to the directory to which the symbolic link points.
This leads to some unexpected behavior. For example, if you have the following structure of a directory dir containing a file dirfile with a symbolic link link pointing to dir. (which will be used in all shell examples throughout this article):
$ ls -lR
.:
total 4
drwxr-xr-x 2 jacob jacob 4096 Apr  3 00:00 dir
lrwxrwxrwx 1 jacob jacob    3 Apr  3 00:00 link -> dir
./dir:
total 0
-rw-r--r-- 1 jacob jacob 0 Apr  3 00:12 dirfile
On Unixes such as MacOS, FreeBSD or Illumos4, you can move a directory through a symbolic link by using a trailing slash:
$ mv link/ otherdir
$ ls
link	otherdir
On Linux5, mv will not rename the indirectly referenced directory and not the symbolic link, when given a symbolic link with a trailing slash as the source to be renamed. despite the coreutils documentation s claims to the contrary6, instead failing with Not a directory:
$ mv link/ other
mv: cannot move 'link/' to 'other': Not a directory
$ mkdir otherdir
$ mv link/ otherdir
mv: cannot move 'link/' to 'otherdir/link': Not a directory
$ mv link/ otherdir/
mv: cannot move 'link/' to 'otherdir/link': Not a directory
$ mv link otherdirlink
$ ls -l otherdirlink
lrwxrwxrwx 1 jacob jacob 3 Apr  3 00:13 otherdirlink -> dir
This is probably for the best, as it is very confusing behavior. There is still one advantage the trailing slash has when using mv, even on Linux, in that is it does not allow you to move a file to a non-existent directory, or move a file that you expect to be a directory that isn t.
$ mv dir/dirfile nonedir/
mv: cannot move 'dir/dirfile' to 'nonedir/': Not a directory
$ touch otherfile
$ mv otherfile/ dir
mv: cannot stat 'otherfile/': Not a directory
$ mv otherfile dir
$ ls dir
dirfile  otherfile
However, Linux still exhibits some confusing behavior of its own, like when you attempt to remove link recursively with a trailing slash:
rm -rvf link/
Neither link nor dir are removed, but the contents of dir are removed:
removed 'link/dirfile'
Whereas if you remove the trailing slash, you just remove the symbolic link:
$ rm -rvf link
removed 'link'
While on MacOS, FreeBSD or Illumos4, rm will also remove the source directory:
$ rm -rvf link
link/dirfile
link/
$ ls
link
The find and ls commands, in contrast, behave the same on all three operating systems. The find command only searches the contents of the directory a symbolic link points to if the trailing slash is added:
$ find link -name dirfile
$ find link/ -name dirfile
link/dirfile
The ls command acts similarly, showing information on just a symbolic link by itself unless a trailing slash is added, at which point it shows the contents of the directory that it links to:
$ ls -l link
lrwxrwxrwx 1 jacob jacob 3 Apr  3 00:13 link -> dir
$ ls -l link/
total 0
-rw-r--r-- 1 jacob jacob 0 Apr  3 00:13 dirfile

rsync The command rsync handles a trailing slash in an unusual way that trips up many new users. The rsync man page notes:
You can think of a trailing / on a source as meaning copy the contents of this directory as opposed to copy the directory by name , but in both cases the attributes of the containing directory are transferred to the containing directory on the destination.
That is to say, if we had two folders a and b each of which contained some files:
$ ls -R .
.:
a  b
./a:
a1  a2
./b:
b1  b2
Running rsync -av a b moves the entire directory a to directory b:
$ rsync -av a b
sending incremental file list
a/
a/a1
a/a2
sent 181 bytes  received 58 bytes  478.00 bytes/sec
total size is 0  speedup is 0.00
$ ls -R b
b:
a  b1  b2
b/a:
a1  a2
While running rsync -av a/ b moves the contents of directory a to b:
$ rsync -av a/ b
sending incremental file list
./
a1
a2
sent 170 bytes  received 57 bytes  454.00 bytes/sec
total size is 0  speedup is 0.00
$ ls b
a1  a2	b1  b2

Dockerfile COPY The Dockerfile COPY command also cares about the presence of the trailing slash, using it to determine whether the destination should be considered a file or directory. The Docker documentation explains the rules of the command thusly:
COPY [--chown=<user>:<group>] <src>... <dest>
If <src> is a directory, the entire contents of the directory are copied, including filesystem metadata. Note: The directory itself is not copied, just its contents. If <src> is any other kind of file, it is copied individually along with its metadata. In this case, if <dest> ends with a trailing slash /, it will be considered a directory and the contents of <src> will be written at <dest>/base(<src>). If multiple <src> resources are specified, either directly or due to the use of a wildcard, then <dest> must be a directory, and it must end with a slash /. If <dest> does not end with a trailing slash, it will be considered a regular file and the contents of <src> will be written at <dest>. If <dest> doesn t exist, it is created along with all missing directories in its path.
This means if you had a COPY command that moved file to a nonexistent containerfile without the slash, it would create containerfile as a file with the contents of file.
COPY file /containerfile
container$ stat -c %F containerfile
regular empty file
Whereas if you add a trailing slash, then file will be added as a file under the new directory containerdir:
COPY file /containerdir/
container$ stat -c %F containerdir
directory
Interestingly, at no point can you copy a directory completely, only its contents. Thus if you wanted to make a directory in the new container, you need to specify its name in both the source and the destination:
COPY dir /dirincontainer
container$ stat -c %F /dirincontainer
directory
Dockerfiles do also make good use of the trailing slash to ensure they re doing what you mean by requiring a trailing slash on the destination of multiple files:
COPY file otherfile /othercontainerdir
results in the following error:
When using COPY with more than one source file, the destination must be a directory and end with a /
  1. I m sure there are probably more than just these three cases, but these are the three I m familiar with. If you know of more, please tell me about them!.
  2. Some additional relevant sections are the Path Resolution Appendix and the section on Symbolic Links.
  3. The sentence This is the behavior of historical implementations implies that this probably originated in some ancient Unix derivative, possibly BSD or even the original Unix. I don t really have a source on that though, so please reach out if you happen to have any more knowledge on what this refers to.
  4. I tested on MacOS 11.6.5, FreeBSD 12.0 and OmniOS 5.11 2
  5. unless the source is a directory trailing slashes give -ENOTDIR
  6. In fairness to the coreutils maintainers, it seems to be true on all other Unix platforms, but it probably deserves a mention in the documentation when Linux is the most common platform on which coreutils is used. I should submit a patch.

5 April 2022

Kees Cook: security things in Linux v5.10

Previously: v5.9 Linux v5.10 was released in December, 2020. Here s my summary of various security things that I found interesting: AMD SEV-ES
While guest VM memory encryption with AMD SEV has been supported for a while, Joerg Roedel, Thomas Lendacky, and others added register state encryption (SEV-ES). This means it s even harder for a VM host to reconstruct a guest VM s state. x86 static calls
Josh Poimboeuf and Peter Zijlstra implemented static calls for x86, which operates very similarly to the static branch infrastructure in the kernel. With static branches, an if/else choice can be hard-coded, instead of being run-time evaluated every time. Such branches can be updated too (the kernel just rewrites the code to switch around the branch ). All these principles apply to static calls as well, but they re for replacing indirect function calls (i.e. a call through a function pointer) with a direct call (i.e. a hard-coded call address). This eliminates the need for Spectre mitigations (e.g. RETPOLINE) for these indirect calls, and avoids a memory lookup for the pointer. For hot-path code (like the scheduler), this has a measurable performance impact. It also serves as a kind of Control Flow Integrity implementation: an indirect call got removed, and the potential destinations have been explicitly identified at compile-time. network RNG improvements
In an effort to improve the pseudo-random number generator used by the network subsystem (for things like port numbers and packet sequence numbers), Linux s home-grown pRNG has been replaced by the SipHash round function, and perturbed by (hopefully) hard-to-predict internal kernel states. This should make it very hard to brute force the internal state of the pRNG and make predictions about future random numbers just from examining network traffic. Similarly, ICMP s global rate limiter was adjusted to avoid leaking details of network state, as a start to fixing recent DNS Cache Poisoning attacks. SafeSetID handles GID
Thomas Cedeno improved the SafeSetID LSM to handle group IDs (which required teaching the kernel about which syscalls were actually performing setgid.) Like the earlier setuid policy, this lets the system owner define an explicit list of allowed group ID transitions under CAP_SETGID (instead of to just any group), providing a way to keep the power of granting this capability much more limited. (This isn t complete yet, though, since handling setgroups() is still needed.) improve kernel s internal checking of file contents
The kernel provides LSMs (like the Integrity subsystem) with details about files as they re loaded. (For example, loading modules, new kernel images for kexec, and firmware.) There wasn t very good coverage for cases where the contents were coming from things that weren t files. To deal with this, new hooks were added that allow the LSMs to introspect the contents directly, and to do partial reads. This will give the LSMs much finer grain visibility into these kinds of operations. set_fs removal continues
With the earlier work landed to free the core kernel code from set_fs(), Christoph Hellwig made it possible for set_fs() to be optional for an architecture. Subsequently, he then removed set_fs() entirely for x86, riscv, and powerpc. These architectures will now be free from the entire class of kernel address limit attacks that only needed to corrupt a single value in struct thead_info. sysfs_emit() replaces sprintf() in /sys
Joe Perches tackled one of the most common bug classes with sprintf() and snprintf() in /sys handlers by creating a new helper, sysfs_emit(). This will handle the cases where kernel code was not correctly dealing with the length results from sprintf() calls, which might lead to buffer overflows in the PAGE_SIZE buffer that /sys handlers operate on. With the helper in place, it was possible to start the refactoring of the many sprintf() callers. nosymfollow mount option
Mattias Nissler and Ross Zwisler implemented the nosymfollow mount option. This entirely disables symlink resolution for the given filesystem, similar to other mount options where noexec disallows execve(), nosuid disallows setid bits, and nodev disallows device files. Quoting the patch, it is useful as a defensive measure for systems that need to deal with untrusted file systems in privileged contexts. (i.e. for when /proc/sys/fs/protected_symlinks isn t a big enough hammer.) Chrome OS uses this option for its stateful filesystem, as symlink traversal as been a common attack-persistence vector. ARMv8.5 Memory Tagging Extension support
Vincenzo Frascino added support to arm64 for the coming Memory Tagging Extension, which will be available for ARMv8.5 and later chips. It provides 4 bits of tags (covering multiples of 16 byte spans of the address space). This is enough to deterministically eliminate all linear heap buffer overflow flaws (1 tag for free , and then rotate even values and odd values for neighboring allocations), which is probably one of the most common bugs being currently exploited. It also makes use-after-free and over/under indexing much more difficult for attackers (but still possible if the target s tag bits can be exposed). Maybe some day we can switch to 128 bit virtual memory addresses and have fully versioned allocations. But for now, 16 tag values is better than none, though we do still need to wait for anyone to actually be shipping ARMv8.5 hardware. fixes for flaws found by UBSAN
The work to make UBSAN generally usable under syzkaller continues to bear fruit, with various fixes all over the kernel for stuff like shift-out-of-bounds, divide-by-zero, and integer overflow. Seeing these kinds of patches land reinforces the the rationale of shifting the burden of these kinds of checks to the toolchain: these run-time bugs continue to pop up. flexible array conversions
The work on flexible array conversions continues. Gustavo A. R. Silva and others continued to grind on the conversions, getting the kernel ever closer to being able to enable the -Warray-bounds compiler flag and clear the path for saner bounds checking of array indexes and memcpy() usage. That s it for now! Please let me know if you think anything else needs some attention. Next up is Linux v5.11.

2022, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 License.
CC BY-SA 4.0

9 March 2022

Fran ois Marier: Using a Streamzap remote control with MythTV on Debian Bullseye

After upgrading my MythTV machine to Debian Bullseye and MythTV 31, my Streamzap remote control stopped working correctly: the up and down buttons were working, but the OK button wasn't. Here's the complete solution that made it work with the built-in kernel support (i.e. without LIRC).

Button re-mapping Since some of the buttons were working, but not others, I figured that the buttons were probably not mapped to the right keys. Inspired by these old v4l-utils-based instructions, I made my own custom keymap by by copying the original keymap:
cp /lib/udev/rc_keymaps/streamzap.toml /etc/rc_keymaps/
and then modifying it to adapt it to what MythTV needs. This is what I ended up with:
<span class="createlink"><a href="/blog.cgi?do=create&amp;from=posts%2Fusing-streamzap-remote-with-mythtv-debian-bullseye&amp;page=protocols" rel="nofollow">?</a>protocols</span>
name = "streamzap"
protocol = "rc-5-sz"
[protocols.scancodes]
0x28c0 = "KEY_0"
0x28c1 = "KEY_1"
0x28c2 = "KEY_2"
0x28c3 = "KEY_3"
0x28c4 = "KEY_4"
0x28c5 = "KEY_5"
0x28c6 = "KEY_6"
0x28c7 = "KEY_7"
0x28c8 = "KEY_8"
0x28c9 = "KEY_9"
0x28ca = "KEY_ESC"
0x28cb = "KEY_MUTE"
0x28cc = "KEY_UP"
0x28cd = "KEY_RIGHTBRACE"
0x28ce = "KEY_DOWN"
0x28cf = "KEY_LEFTBRACE"
0x28d0 = "KEY_UP"
0x28d1 = "KEY_LEFT"
0x28d2 = "KEY_ENTER"
0x28d3 = "KEY_RIGHT"
0x28d4 = "KEY_DOWN"
0x28d5 = "KEY_M"
0x28d6 = "KEY_ESC"
0x28d7 = "KEY_L"
0x28d8 = "KEY_P"
0x28d9 = "KEY_ESC"
0x28da = "KEY_BACK"
0x28db = "KEY_FORWARD"
0x28dc = "KEY_R"
0x28dd = "KEY_PAGEUP"
0x28de = "KEY_PAGEDOWN"
0x28e0 = "KEY_D"
0x28e1 = "KEY_I"
0x28e2 = "KEY_END"
0x28e3 = "KEY_A"
Note that the keycodes can be found in the kernel source code. With my own keymap in place at /etc/rc_keymaps/streamzap.toml, I changed /etc/rc_maps.cfg to have the kernel driver automatically use it:
--- a/rc_maps.cfg
+++ b/rc_maps.cfg
@@ -126,7 +126,7 @@
 *      rc-real-audio-220-32-keys real_audio_220_32_keys.toml
 *      rc-reddo                 reddo.toml
 *      rc-snapstream-firefly    snapstream_firefly.toml
-*      rc-streamzap             streamzap.toml
+*      rc-streamzap             /etc/rc_keymaps/streamzap.toml
 *      rc-su3000                su3000.toml
 *      rc-tango                 tango.toml
 *      rc-tanix-tx3mini         tanix_tx3mini.toml

Button repeat delay To adjust the delay before button presses are repeated, I followed these old out-of-date instructions on the MythTV wiki and put the following in /etc/udev/rules.d/streamzap.rules:
ACTION=="add", ATTRS idVendor =="0e9c", ATTRS idProduct =="0000", RUN+="/usr/bin/ir-keytable -s rc0 -D 1000 -P 250"
Note that the -d option has been replaced with -s in the latest version of ir-keytable. To check that the Streamzap is indeed detected as rc0 on your system, use this command:
$ ir-keytable 
Found /sys/class/rc/rc0/ with:
    Name: Streamzap PC Remote Infrared Receiver (0e9c:0000)
    Driver: streamzap
    Default keymap: rc-streamzap
...
Make sure you don't pass the -c to ir-keytable or else it will clear the keymap set via /etc/rc_maps.cfg, removing all of the button mappings.

6 February 2022

Jonathan McDowell: Free Software Activities for 2021

About a month later than I probably should have posted it, here s a recap of my Free Software activities in 2021. For previous years see 2019 + 2020. Again, this year had fewer contributions than I d like thanks to continuing fatigue about the state of the world, and trying to work on separation between work and leisure while working from home. I ve made some effort to improve that balance but it s still a work in progress.

Conferences No surprise, I didn t attend any in-person conferences in 2021. I find virtual conferences don t do a lot for me (a combination of my not carving time out for them in the same way, because not being at the conference means other things will inevitably intrude, and the lack of the social side) but I did get to attend a few of the DebConf21 talks, which was nice. I m hoping to make it to DebConf22 this year in person.

Debian Most of my contributions to Free software continue to happen within Debian. As part of the Data Protection Team I responded to various inbound queries to that team. Some of this involved chasing up other project teams who had been slow to respond - folks, if you re running a service that stores personal data about people then you need to be responsive to requests about it. Some of this was dealing with what look like automated scraping tools which send no information about the person making the request, and in all the cases we ve seen so far there s been no indication of any data about that person on any systems we have access to. Further team time was wasted dealing with the Princeton-Radboud Study on Privacy Law Implementation (though Matthew did the majority of the work on this). The Debian Keyring was possibly my largest single point of contribution. We re in a roughly 3 month rotation of who handles the keyring updates, and I handled 2021.03.24, 2021.04.09, 2021.06.25, 2021.09.25 + 2021.12.24 For Debian New Members I m mostly inactive as an application manager - we generally seem to have enough available recently. If that changes I ll look at stepping in to help, but I don t see that happening. I continue to be involved in Front Desk, having various conversations throughout the year with the rest of the team, but there s no doubt Mattia and Pierre-Elliott are the real doers at present. I did take part in an NM Committee appeals process. In terms of package uploads I continued to work on gcc-xtensa-lx106, largely doing uploads to deal with updates to the GCC version or packaging (8 + 9). sigrok had a few minor updates, libsigkrok 0.5.2-3, pulseview 0.4.2-3 as well as a new upstream release of sigrok CLI 0.7.2-1. There was a last minute pre-release upload of libserialport 0.1.1-4 thanks to a kernel change in v5.10.37 which removed termiox support. Despite still not writing any VHDL these days I continue to keep an eye on ghdl, because I found it a useful tool in the past. Last year that was just a build fix for LLVM 11.1.0 - 1.0.0+dfsg+5. Andreas Bombe has largely taken over more proactive maintenance, which is nice to see. I uploaded OpenOCD 0.11.0~rc1-2, cleaning up some packaging / dependency issues. This was followed by 0.11.0~rc2-1 as a newer release candidate. Sadly 0.11.0 did not make it in time for bullseye, but rc2 was fairly close and I uploaded 0.11.0-1 once bullseye was released. Finally I did a drive-by upload for garmin-forerunner-tools 0.10repacked-12, cleaning up some packaging issues and uploading it to salsa. My Forerunner 305 has died (after 11 years of sterling service) and the Forerunner 45 I ve replaced it with uses a different set of tools, so I decided it didn t make sense to pick up longer term ownership of the package.

Linux My Linux contributions continued to revolve around pushing MikroTik RB3011 support upstream. There was a minor code change to Set FIFO sizes for ipq806x (which fixed up the allowed MTU for the internal switch + VLANs). The rest was DTS related - adding ADM DMA + NAND definitions now that the ADM driver was merged, adding tsens details, adding USB port info and adding the L2CC and RPM details for IPQ8064. Finally I was able to update the RB3011 DTS to enable NAND + USB. With all those in I m down to 4 local patches against a mainline kernel, all of which are hacks that aren t suitable for submission upstream. 2 are for patching in details of the root device and ethernet MAC addresses, one is dealing with the fact the IPQ8064 has some reserved memory that doesn t play well with AUTO_ZRELADDR (there keeps being efforts to add some support for this via devicetree, but unfortunately it gets shot down every time), and the final one is a hack to turn off the LCD backlight by treating it as an LED (actually supporting the LCD properly is on my TODO list).

Personal projects 2021 didn t see any releases of onak. It s not dead, just resting, but Sequoia PGP is probably where you should be looking for a modern OpenPGP implementation. I continued work on my Desk Viking project, which is an STM32F103 based debug tool inspired by the Bus Pirate. The main addtion was some CCLib support (forking it in the process to move to Python 3 and add some speed ups) to allow me to program my Zigbee dongles, but I also added some 1-Wire search logic and some support for Linux emulation mode with VCD output to allow for a faster development cycle. I really want to try and get OpenOCD JTAG mode supported at some point, and have vague plans for an STM32F4 based version that have suffered from a combination of a silicon shortage and a lack of time. That wraps up 2021. I d like to say I m hoping to make more Free Software contributions this year, but I don t have a concrete plan yet for how that might happen, so I ll have to wait and see.

6 January 2022

Jacob Adams: Linux Hibernation Documentation

Recently I ve been curious about how hibernation works on Linux, as it s an interesting interaction between hardware and software. There are some notes in the Arch wiki and the kernel documentation (as well as some kernel documentation on debugging hibernation and on sleep states more generally), and of course the ACPI Specification

The Formal Definition ACPI (Advanced Configuration and Power Interface) is, according to the spec, an architecture-independent power management and configuration framework that forms a subsystem within the host OS which defines a hardware register set to define power states. ACPI defines four global system states G0, working/on, G1, sleeping, G2, soft off, and G3, mechanical off1. Within G1 there are 4 sleep states, numbered S1 through S4. There are also S0 and S5, which are equivalent to G0 and G2 respectively2.

Sleep According to the spec, the ACPI S1-S4 states all do the same thing from the operating system s perspective, but each saves progressively more power, so the operating system is expected to pick the deepest of these states when entering sleep. However, most operating systems3 distinguish between S1-S3, which are typically referred to as sleep or suspend, and S4, which is typically referred to as hibernation.

S1: CPU Stop and Cache Wipe The CPU caches are wiped and then the CPU is stopped, which the spec notes is equivalent to the WBINVD instruction followed by the STPCLK signal on x86. However, nothing is powered off.

S2: Processor Power off The system stops the processor and most system clocks (except the real time clock), then powers off the processor. Upon waking, the processor will not continue what it was doing before, but instead use its reset vector4.

S3: Suspend/Sleep (Suspend-to-RAM) Mostly equivalent to S2, but hardware ensures that only memory and whatever other hardware memory requires are powered.

S4: Hibernate (Suspend-to-Disk) In this state, all hardware is completely powered off and an image of the system is written to disk, to be restored from upon reapplying power. Writing the system image to disk can be handled by the operating system if supported, or by the firmware.

Linux Sleep States Linux has its own set of sleep states which mostly correspond with ACPI states.

Suspend-to-Idle This is a software only sleep that puts all hardware into the lowest power state it can, suspends timekeeping, and freezes userspace processes. All userspace and some kernel threads5, except those tagged with PF_NOFREEZE, are frozen before the system enters a sleep state. Frozen tasks are sent to the __refrigerator(), where they set TASK_UNINTERRUPTIBLE and PF_FROZEN and infinitely loop until PF_FROZEN is unset6. This prevents these tasks from doing anything during the imaging process. Any userspace process running on a different CPU while the kernel is trying to create a memory image would cause havoc. This is also done because any filesystem changes made during this would be lost and could cause the filesystem and its related in-memory structures to become inconsistent. Also, creating a hibernation image requires about 50% of memory free, so no tasks should be allocating memory, which freezing also prevents.

Standby This is equivalent to ACPI S1.

Suspend-to-RAM This is equivalent to ACPI S3.

Hibernation Hibernation is mostly equivalent to ACPI S4 but does not require S4, only requiring low-level code for resuming the system to be present for the underlying CPU architecture according to the Linux sleep state docs. To hibernate, everything is stopped and the kernel takes a snapshot of memory. Then, the system writes out the memory image to disk. Finally, the system either enters S4 or turns off completely. When the system restores power it boots a new kernel, which looks for a hibernation image and loads it into memory. It then overwrites itself with the hibernation image and jumps to a resume area of the original kernel7. The resumed kernel restores the system to its previous state and resumes all processes.

Hybrid Suspend Hybrid suspend does not correspond to an official ACPI state, but instead is effectively a combination of S3 and S4. The system writes out a hibernation image, but then enters suspend-to-RAM. If the system wakes up from suspend it will discard the hibernation image, but if the system loses power it can safely restore from the hibernation image.
  1. The difference between soft and mechanical off is that mechanical off is entered and left by a mechanical means (for example, turning off the system s power through the movement of a large red switch)
  2. It s unclear to me why G and S states overlap like this. I assume this is a relic of an older spec that only had S states, but I have not as yet found any evidence of this. If someone has any information on this, please let me know and I ll update this footnote.
  3. Of the operating systems I know of that support ACPI sleep states (I checked Windows, Mac, Linux, and the three BSDs8), only MacOS does not allow the user to deliberately enable hibernation, instead supporting a hybrid suspend it calls safe sleep
  4. The reset vector of a processor is the default location where, upon a reset, the processor will go to find the first instruction to execute. In other words, the reset vector is a pointer or address where the processor should always begin its execution. This first instruction typically branches to the system initialization code. Xiaocong Fan, Real-Time Embedded Systems, 2015
  5. All kernel threads are tagged with PF_NOFREEZE by default, so they must specifically opt-in to task freezing.
  6. This is not from the docs, but from kernel/freezer.c which also notes Refrigerator is place where frozen processes are stored :-).
  7. This is the operation that requires special architecture-specific low-level code .
  8. Interestingly NetBSD has a setting to enable hibernation, but does not actually support hibernation

31 December 2021

Matthew Garrett: Update on Linux hibernation support when lockdown is enabled

Some time back I wrote up a description of my proposed (and implemented) solution for making hibernation work under Linux even within the bounds of the integrity model. It's been a while, so here's an update.

The first is that localities just aren't an option. It turns out that they're optional in the spec, and TPMs are entirely permitted to say they don't support them. The only time they're likely to work is on platforms that support DRTM implementations like TXT. Most consumer hardware doesn't fall into that category, so we don't get to use that solution. Unfortunate, but, well.

The second is that I'd ignored an attack vector. If the kernel is configured to restrict access to PCR 23, then yes, an attacker is never able to modify PCR 23 to be in the same state it would be if hibernation were occurring and the key certification data will fail to validate. Unfortunately, an attacker could simply boot into an older kernel that didn't implement the PCR 23 restriction, and could fake things up there (yes, this is getting a bit convoluted, but the entire point here is to make this impossible rather than just awkward). Once PCR 23 was in the correct state, they would then be able to write out a new swap image, boot into a new kernel that supported the secure hibernation solution, and have that resume successfully in the (incorrect) belief that the image was written out in a secure environment.

This felt like an awkward problem to fix. We need to be able to distinguish between the kernel having modified the PCRs and userland having modified the PCRs, and we need to be able to do this without modifying any kernels that have already been released[1]. The normal approach to determining whether an event occurred in a specific phase of the boot process is to "cap" the PCR - extend it with a known value that indicates a transition between stages of the boot process. Any events that occur before the cap event must have occurred in the previous stage of boot, and since the final PCR value depends on the order of measurements and not just the contents of those measurements, if a PCR is capped before userland runs, userland can't fake the same PCR value afterwards. If Linux capped a PCR before userland started running, we'd be able to place a measurement there before the cap occurred and then prove that that extension occurred before userland had the opportunity to interfere. We could simply place a statement that the kernel supported the PCR 23 restrictions there, and we'd be fine.

Unfortunately Linux doesn't currently do this, and adding support for doing so doesn't fix the problem - if an attacker boots a kernel that doesn't cap a PCR, they can just cap it themselves from userland. So, we're faced with the same problem: booting an older kernel allows the system to be placed in an identical state to the current kernel, and a fake hibernation image can be written out. Solving this required a PCR that was being modified after kernel code was running, but before userland was started, even with existing kernels.

Thankfully, there is one! PCR 5 is defined as containing measurements related to boot management configuration and data. One of the measurements it contains is the result of the UEFI ExitBootServices() call. ExitBootServices() is called at the transition from the UEFI boot environment to the running OS, and the kernel contains code that executes before it. So, if we measure an assertion regarding whether or not we support restricted access to PCR 23 into PCR 5 before we call ExitBootServices(), this will prevent userspace from spoofing us (because userspace will only be able to extend PCR 5 after the firmware extended PCR 5 in response to ExitBootServices() being called). Obviously this depends on the firmware actually performing the PCR 5 extension when ExitBootServices() is called, but if firmware's out of spec then I don't think there's any real expectation of it being secure enough for any of this to buy you anything anyway.

My current tree is here, but there's a couple of things I want to do before submitting it, including ensuring that the key material is wiped from RAM after use (otherwise it could potentially be scraped out and used to generate another image afterwards) and, uh, actually making sure this works (I no longer have the machine I was previously using for testing, and switching my other dev machine over to TPM 2 firmware is proving troublesome, so I need to pull another machine out of the stack and reimage it).

[1] The linear nature of time makes feature development much more frustrating

comment count unavailable comments

Next.