Search Results: "philipp"

1 May 2024

Antoine Beaupré: Tor migrates from Gitolite/GitWeb to GitLab

Note: I've been awfully silent here for the past ... (checks notes) oh dear, 3 months! But that's not because I've been idle, quite the contrary, I've been very busy but just didn't have time to write about anything. So I've taken it upon myself to write something about my work this week, and published this post on the Tor blog which I copy here for a broader audience. Let me know if you like this or not.
Tor has finally completed a long migration from legacy Git infrastructure (Gitolite and GitWeb) to our self-hosted GitLab server. Git repository addresses have therefore changed. Many of you probably have made the switch already, but if not, you will need to change:
https://git.torproject.org/
to:
https://gitlab.torproject.org/
in your Git configuration.
The GitWeb front page is now an archived listing of all the repositories as they were before the migration. Inactive Git repositories were archived in the GitLab legacy/gitolite namespace, and the gitweb.torproject.org and git.torproject.org web sites now redirect to GitLab.
Best effort was made to reproduce the original Gitolite repositories faithfully while avoiding duplicating too much data in the migration. But it's possible that some data present in Gitolite has not migrated to GitLab. User repositories are particularly at risk, because they were migrated en masse and "re-forked" from their upstreams to avoid wasting disk space. If a user had a project with a matching name, it was assumed to have the right data, which might be inaccurate.
The two virtual machines responsible for the legacy service (cupani for git-rw.torproject.org and vineale for git.torproject.org and gitweb.torproject.org) have been shut down. Their disks will remain for 3 months (until the end of July 2024) and their backups for another year after that (until the end of July 2025), after which point all the data from those hosts will be destroyed, with only the GitLab archives remaining.
The rest of this article expands on how this was done and what kind of problems we faced during the migration.

Where is the code?
Normally, nothing should be lost. All repositories in Gitolite have been either explicitly migrated by their owners, forcibly migrated by the sysadmin team (TPA), or explicitly destroyed at their owner's request. An exhaustive rewrite map translates Gitolite projects to GitLab projects. Some of those projects actually redirect to their parent in cases of empty repositories that were obvious forks. Destroyed repositories redirect to the GitLab front page.
Because the migration happened progressively, it's technically possible that commits pushed to Gitolite were lost after the migration. We took great care to avoid that scenario. First, we adopted a proposal (TPA-RFC-36) in June 2023 to announce the transition. Then, in March 2024, we locked down all repositories from any further changes. Around that time, only a handful of repositories had changes made after the adoption date, and we examined each repository carefully to make sure nothing was lost.
Still, we built a diff of all the changes in the Git references that archivists can peruse to check for data loss. It's large (6MiB+) because a lot of repositories were migrated before the mass migration and then kept evolving in GitLab. Many other repositories were rebuilt in GitLab from their parent to restore a fork relationship, which added extra references to those clones.
A note to amateur archivists out there: it's probably too late for one last crawl now. The Git repositories now all redirect to GitLab and are effectively unavailable in their original form. That said, the GitWeb site was crawled into the Internet Archive in February 2024, so at least some copy of it is available in the Wayback Machine. At that point, however, many developers had already migrated their projects to GitLab, so the copies there were possibly already out of date compared with the repositories in GitLab.
Software Heritage also has a copy of all repositories hosted on Gitolite since June 2023 and has kept mirroring them continuously; they will hopefully be kept there for eternity. There's an issue where the main website can't find the repositories when you search for gitweb.torproject.org; search for git.torproject.org instead. In any case, if you believe data is missing, please do let us know by opening an issue with TPA.

Why?
This is an old project in the making. The first discussion about migrating from Gitolite to GitLab started in 2020 (almost 4 years ago). But going further back, the first GitLab experiment was in 2016, almost a decade ago. The current GitLab server dates from 2019, replacing Trac for issue tracking in 2020. It was originally supposed to host only mirrors for merge requests and issue trackers but, naturally, one thing led to another and eventually GitLab had grown a container registry, continuous integration (CI) runners, GitLab Pages, and, of course, hosted most Git repositories.
There were hesitations about moving to GitLab for code hosting. We had discussions about the increased attack surface and ways to mitigate that but, ultimately, it seems the issues were not that serious and the community embraced GitLab. TPA actually migrated its most critical repositories out of shared hosting entirely, into specific servers (e.g. the Puppet Git repository is just on the Puppet server now), leveraging Git's decentralized nature and removing an entire attack surface from our infrastructure. Some of those repositories are mirrored back into GitLab, but the authoritative copy is not on GitLab.
In any case, the proposal to migrate from Gitolite to GitLab was effectively just formalizing a fait accompli.

How to migrate from Gitolite / cgit to GitLab
The progressive migration was a challenge. If you intend to migrate between hosting platforms, we strongly recommend picking a "flag day" during which you migrate all repositories at once. This ensures a smoother transition and avoids elaborate rewrite rules.
When Gitolite access was shut down, we had repositories on both GitLab and Gitolite, without a clear relationship between the two. A priori, the plan was then to import all the remaining Gitolite repositories into the legacy/gitolite namespace, but that seemed wasteful, particularly for large repositories like Tor Browser, which uses nearly a gigabyte of disk space. So we took special care to avoid duplicating repositories.
When the mass migration started, only 71 of the 538 Gitolite repositories were marked "Migrated to GitLab" in the gitolite.conf file. So, given that we had hundreds of repositories to migrate, we developed some automation to "save time". We already automate similar ad-hoc tasks with Fabric, so we used that framework here as well. (Our normal configuration management tool is Puppet, which is a poor fit here.)
So a relatively large amount of Python code was produced to basically do the following:
  1. check if all on-disk repositories are listed in gitolite.conf (and vice versa) and either add missing repositories or delete them from disk if garbage
  2. for each repository in gitolite.conf, if its category is marked Migrated to GitLab, skip, otherwise;
  3. find a matching GitLab project by name, prompt the user for multiple matches
  4. if a match is found, redirect if the repository is non-empty
    • we have GitLab projects that look like the real thing, but are only present to host migrated Trac issues
    • in such cases we cloned the Gitolite project locally and pushed to the existing repository instead
  5. otherwise, a new repository is created in the legacy/gitolite namespace, using the "import" mechanism in GitLab to automatically import the repository from Gitolite, creating redirections and updating gitolite.conf to document the change
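The decision logic in the steps above can be sketched roughly as follows. This is a simplified illustration, not the code from the fabric-tasks repository: the GitoliteRepo class, the plan_migration function and the issues_only parameter are hypothetical names, and the sketch assumes that step 4 redirects unless the matching GitLab project exists only to host migrated Trac issues.

```python
from dataclasses import dataclass

MIGRATED = "Migrated to GitLab"


@dataclass
class GitoliteRepo:
    name: str      # path in Gitolite, e.g. "project/tor-browser/fastlane"
    category: str  # value of gitweb.category in gitolite.conf


def plan_migration(repo, gitlab_projects, issues_only=frozenset()):
    """Return an (action, target) pair for a single Gitolite repository.

    gitlab_projects: GitLab project paths, e.g. "tpo/core/tor"
    issues_only: (assumption) GitLab projects that only host migrated
                 Trac issues and therefore have no Git data yet
    """
    # step 2: already handled, nothing to do
    if repo.category == MIGRATED:
        return ("skip", None)
    # step 3: match on the last path component, ask a human if ambiguous
    basename = repo.name.rsplit("/", 1)[-1]
    matches = sorted(p for p in gitlab_projects
                     if p.rsplit("/", 1)[-1] == basename)
    if len(matches) > 1:
        return ("prompt", matches)
    # step 4: redirect to the match, or push into an issues-only project
    if matches:
        target = matches[0]
        if target in issues_only:
            return ("push", target)
        return ("redirect", target)
    # step 5: no match, import into the archive namespace
    return ("import", f"legacy/gitolite/{repo.name}")
```

Returning a plan instead of acting on it directly keeps the matching logic testable without touching Gitolite or GitLab; the real tasks also create redirections and update gitolite.conf as described above.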
User repositories (those under the user/ directory in Gitolite) were handled specially. First, the existing redirection map was checked to see if a similarly named project was migrated (so that, e.g., user/dgoulet/tor is properly treated as a fork of tpo/core/tor). Then the parent project was forked in GitLab and the Gitolite project force-pushed to the fork. This allows us to show the fork relationship in GitLab and, more importantly, benefit from the "pool" feature in GitLab which deduplicates disk usage between forks.
Sometimes, we found no such relationships. Then we simply imported multiple repositories with similar names in the legacy/gitolite namespace, sometimes creating forks between user repositories, on a first-come-first-served basis from the gitolite.conf order.
The code used in this migration is now available publicly. We encourage other groups planning to migrate from Gitolite/GitWeb to GitLab to use (and contribute to) our fabric-tasks repository, even though it does have its fair share of hard-coded assertions. The main entry point is the gitolite.mass-repos-migration task. A typical migration job looked like:
anarcat@angela:fabric-tasks$ fab -H cupani.torproject.org gitolite.mass-repos-migration 
[...]
INFO: skipping project project/help/infra in category Migrated to GitLab
INFO: skipping project project/help/wiki in category Migrated to GitLab
INFO: skipping project project/jenkins/jobs in category Migrated to GitLab
INFO: skipping project project/jenkins/tools in category Migrated to GitLab
INFO: searching for projects matching fastlane
INFO: Successfully connected to https://gitlab.torproject.org
import gitolite project project/tor-browser/fastlane into gitlab legacy/gitolite/project/tor-browser/fastlane with desc 'Tor Browser app store and deployment configuration for Fastlane'? [Y/n] 
INFO: importing gitolite project project/tor-browser/fastlane into gitlab legacy/gitolite/project/tor-browser/fastlane with desc 'Tor Browser app store and deployment configuration for Fastlane'
INFO: building a new connect to cupani
INFO: defaulting name to fastlane
INFO: importing project into GitLab
INFO: Successfully connected to https://gitlab.torproject.org
INFO: loading group legacy/gitolite/project/tor-browser
INFO: archiving project
INFO: creating repository fastlane (fastlane) in namespace legacy/gitolite/project/tor-browser from https://git.torproject.org/project/tor-browser/fastlane into https://gitlab.torproject.org/legacy/gitolite/project/tor-browser/fastlane
INFO: migrating Gitolite repository project/tor-browser/fastlane to GitLab project legacy/gitolite/project/tor-browser/fastlane
INFO: uploading 399 bytes to /srv/git.torproject.org/repositories/project/tor-browser/fastlane.git/hooks/pre-receive
INFO: making /srv/git.torproject.org/repositories/project/tor-browser/fastlane.git/hooks/pre-receive executable
INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project project/tor-browser/fastlane to category Migrated to GitLab
INFO: skipping project project/bridges/bridgedb-admin in category Migrated to GitLab
[...]
In the above, you can see migrated repositories being skipped, then the fastlane project being archived into GitLab. Another example, with a later version of the script, processes only user repositories and shows the interactive prompt and a force-push into a fork:
$ fab -H cupani.torproject.org  gitolite.mass-repos-migration --include 'user/.*' --exclude '.*tor-?browser.*'
INFO: skipping project user/aagbsn/bridgedb in category Migrated to GitLab
[...]
INFO: skipping project user/phw/atlas in category Migrated to GitLab
INFO: processing project user/phw/obfsproxy (Philipp's obfsproxy repository) in category Users' development repositories (Attic)
INFO: Successfully connected to https://gitlab.torproject.org
INFO: user repository detected, trying to find fork phw/obfsproxy
WARNING: no existing fork found, entering user fork subroutine
INFO: found 6 GitLab projects matching 'obfsproxy' (https://gitweb.torproject.org/user/phw/obfsproxy.git)
0 legacy/gitolite/debian/obfsproxy
1 legacy/gitolite/debian/obfsproxy-legacy
2 legacy/gitolite/user/asn/obfsproxy
3 legacy/gitolite/user/ioerror/obfsproxy
4 tpo/anti-censorship/pluggable-transports/obfsproxy
5 tpo/anti-censorship/pluggable-transports/obfsproxy-legacy
select parent to fork from, or enter to abort: ^G4
INFO: repository is not empty: in-pack: 2104, packs: 1, size-pack: 414
fork project tpo/anti-censorship/pluggable-transports/obfsproxy into legacy/gitolite/user/phw/obfsproxy^G [Y/n] 
INFO: loading project tpo/anti-censorship/pluggable-transports/obfsproxy
INFO: forking project user/phw/obfsproxy into namespace legacy/gitolite/user/phw
INFO: waiting for fork to complete...
INFO: fork status: started, sleeping...
INFO: fork finished
INFO: cloning and force pushing from user/phw/obfsproxy to legacy/gitolite/user/phw/obfsproxy
INFO: deleting branch protection: <class 'gitlab.v4.objects.branches.ProjectProtectedBranch'> => {'id': 2723, 'name': 'master', 'push_access_levels': [{'id': 2864, 'access_level': 40, 'access_level_description': 'Maintainers', 'deploy_key_id': None}], 'merge_access_levels': [{'id': 2753, 'access_level': 40, 'access_level_description': 'Maintainers'}], 'allow_force_push': False}
INFO: cloning repository git-rw.torproject.org:/srv/git.torproject.org/repositories/user/phw/obfsproxy.git in /tmp/tmp6orvjggy/user/phw/obfsproxy
Cloning into bare repository '/tmp/tmp6orvjggy/user/phw/obfsproxy'...
INFO: pushing to GitLab: https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
remote: 
remote: To create a merge request for bug_10887, visit:        
remote:   https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy/-/merge_requests/new?merge_request%5Bsource_branch%5D=bug_10887        
remote: 
[...]
To ssh://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
 + 2bf9d09...a8e54d5 master -> master (forced update)
 * [new branch]      bug_10887 -> bug_10887
[...]
INFO: migrating repo
INFO: migrating Gitolite repository https://gitweb.torproject.org/user/phw/obfsproxy.git to GitLab project https://gitlab.torproject.org/legacy/gitolite/user/phw/obfsproxy
INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project user/phw/obfsproxy to category Migrated to GitLab
INFO: processing project user/phw/scramblesuit (Philipp's ScrambleSuit repository) in category Users' development repositories (Attic)
INFO: user repository detected, trying to find fork phw/scramblesuit
WARNING: no existing fork found, entering user fork subroutine
WARNING: no matching gitlab project found for user/phw/scramblesuit
INFO: user fork subroutine failed, resuming normal procedure
INFO: searching for projects matching scramblesuit
import gitolite project user/phw/scramblesuit into gitlab legacy/gitolite/user/phw/scramblesuit with desc 'Philipp's ScrambleSuit repository'?^G [Y/n] 
INFO: checking if remote repo https://git.torproject.org/user/phw/scramblesuit exists
INFO: importing gitolite project user/phw/scramblesuit into gitlab legacy/gitolite/user/phw/scramblesuit with desc 'Philipp's ScrambleSuit repository'
INFO: importing project into GitLab
INFO: Successfully connected to https://gitlab.torproject.org
INFO: loading group legacy/gitolite/user/phw
INFO: creating repository scramblesuit (scramblesuit) in namespace legacy/gitolite/user/phw from https://git.torproject.org/user/phw/scramblesuit into https://gitlab.torproject.org/legacy/gitolite/user/phw/scramblesuit
INFO: archiving project
INFO: migrating Gitolite repository https://gitweb.torproject.org/user/phw/scramblesuit.git to GitLab project https://gitlab.torproject.org/legacy/gitolite/user/phw/scramblesuit
INFO: adding entry to rewrite_map /home/anarcat/src/tor/tor-puppet/modules/profile/files/git/gitolite2gitlab.txt
INFO: modifying gitolite.conf to add: "config gitweb.category = Migrated to GitLab"
INFO: rewriting gitolite config /home/anarcat/src/tor/gitolite-admin/conf/gitolite.conf to change project user/phw/scramblesuit to category Migrated to GitLab
[...]
Acute eyes will also notice the bell used as a notification mechanism in this transcript.
A lot of the code is now useless for us, but some, like "commit and push" or is-repo-empty, live on in the git module and, of course, the gitlab module has grown some legs along the way. We've also found fun bugs, like a file descriptor exhaustion in bash, among other oddities. The retirement milestone and issue 41215 have a detailed log of the migration, for those curious.
This was a challenging project, but it feels nice to have this behind us. This gets rid of 2 of the 4 remaining machines running Debian "old-old-stable", which moves us a bit further ahead in our late bullseye upgrades milestone.
Full transparency: we tested GPT-3.5, GPT-4, and other large language models to see if they could answer the question "write a set of rewrite rules to redirect GitWeb to GitLab". This has become a standard LLM test for your faithful writer to figure out how good an LLM is at technical responses. None of them gave an accurate, complete, and functional response, for the record.
The actual rewrite rules as of this writing follow, for humans that actually like working answers provided by expert humans instead of artificial intelligences, which currently seem to be glorified, mansplaining interns.

git.torproject.org rewrite rules
Those rules are relatively simple in that they rewrite a single URL to its equivalent GitLab counterpart in a 1:1 fashion. They rely on the rewrite map mentioned above, of course.
RewriteEngine on
# this RewriteMap connects the gitweb projects to their GitLab
# equivalent
RewriteMap gitolite2gitlab "txt:/etc/apache2/gitolite2gitlab.txt"
# if this becomes a performance bottleneck, convert to a DBM map with:
#
#  $ httxt2dbm -i mapfile.txt -o mapfile.map
#
# and:
#
# RewriteMap mapname "dbm:/etc/apache/mapfile.map"
#
# according to reports lavamind found online, we hit such a
# performance bottleneck only around millions of entries, which is not our case
# those two rules can go away once all the projects are
# migrated to GitLab
#
# this matches the request URI so we can check the RewriteMap
# for a match next
#
# WARNING: this won't match URLs without .git in them, which
# *do* work now. one possibility would be to match the request
# URI (without query string!) with:
#
# /git/(.*)(.git)?/(((branches|hooks|info|objects/).*)|git-.*|upload-pack|receive-pack|HEAD|config|description)?.
#
# I haven't been able to figure out the actual structure of
# those URLs, so it's really hard to figure out the boundaries
# of the project name here. I stopped after pouring around the
# http-backend.c code in git
# itself. https://www.git-scm.com/docs/http-protocol is also
# kind of incomplete and unsatisfying.
RewriteCond %{REQUEST_URI} ^/(git/)?(.*).git/.*$
# this makes the RewriteRule match only if there's a match in
# the rewrite map
RewriteCond ${gitolite2gitlab:%2|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(git/)?(.*).git/(.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$2}.git/$3 [R=302,L]
# Fallback everything else to GitLab
RewriteRule (.*) https://gitlab.torproject.org [R=302,L]
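To make the behaviour of these rules easier to follow, here is a small Python model of them. It is only an illustration of the lookup logic, not something we run: parse_map and redirect_git are hypothetical helper names, the map format is assumed to be the plain "key value" text format used by txt: rewrite maps, and the sketch escapes the dot in ".git", which the Apache patterns above do not.

```python
import re

GITLAB = "https://gitlab.torproject.org"

# same shape as the RewriteRule pattern: optional "git/" prefix, project
# name, ".git/", then the rest of the request path
CLONE_URL = re.compile(r"^/(git/)?(.*)\.git/(.*)$")


def parse_map(text):
    """Parse "key value" lines, ignoring blank lines and # comments."""
    mapping = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def redirect_git(path, rewrite_map):
    """Return the redirect target for a request path on git.torproject.org."""
    match = CLONE_URL.match(path)
    if match:
        target = rewrite_map.get(match.group(2))
        # equivalent of the NOT_FOUND check: only rewrite known projects
        if target is not None:
            return f"{GITLAB}/{target}.git/{match.group(3)}"
    # fallback: everything else goes to the GitLab front page
    return GITLAB
```

With a map containing the line "tor tpo/core/tor", a request for /tor.git/info/refs is sent to the same path under tpo/core/tor.git on GitLab, while a project missing from the map falls through to the front page, which is the behaviour described for destroyed repositories.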

gitweb.torproject.org rewrite rules
Those are the vastly more complicated GitWeb to GitLab rewrite rules. Note that we say "GitWeb" but we were actually not running GitWeb but cgit, as the former didn't actually scale for us.
RewriteEngine on
# this RewriteMap connects the gitweb projects to their GitLab
# equivalent
RewriteMap gitolite2gitlab "txt:/etc/apache2/gitolite2gitlab.txt"
# special rule to process targets of the old spec.tpo site and
# bring them to the right redirect on the new spec.tpo site. that should turn, for example:
#
# https://gitweb.torproject.org/torspec.git/tree/address-spec.txt
#
# into:
#
# https://spec.torproject.org/address-spec
RewriteRule ^/torspec.git/tree/(.*).txt$ https://spec.torproject.org/$1 [R=302]
# list of endpoints taken from cgit's cmd.c
# those two RewriteCond are necessary because we don't move
# all repositories at once. once the migration is completed,
# they can be removed.
#
# and yes, they are copied all over the place below
#
# create a match for the project name to check if the project
# has been moved to GitLab
RewriteCond %{REQUEST_URI} ^/(.*).git(/.*)?$
# this makes the RewriteRule match only if there's a match in
# the rewrite map
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
# main project page, like summary below
RewriteRule ^/(.*).git/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/ [R=302,L]
# summary
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/summary/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/ [R=302,L]
# about
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/about/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/ [R=302,L]
# commit
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond "%{QUERY_STRING}" "(.*(?:^|&))id=([^&]*)(&.*)?$"
RewriteRule ^/(.*).git/commit/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%2 [R=302,L,QSD]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/commit/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD [R=302,L]
# diff, incomplete because can diff arbitrary refs and files in cgit but not in GitLab, hard to parse
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} id=([^&]*)
RewriteRule ^/(.*).git/diff/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%1 [R=302,L,QSD]
# patch
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} id=([^&]*)
RewriteRule ^/(.*).git/patch/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%1.patch [R=302,L,QSD]
# rawdiff, incomplete because can show only one file diff, which GitLab cannot
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} id=([^&]*)
RewriteRule ^/(.*).git/rawdiff/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commit/%1.diff [R=302,L,QSD]
# log
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} h=([^&]*)
RewriteRule ^/(.*).git/log/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/%1 [R=302,L,QSD]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/log/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD [R=302,L]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/log(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD$2 [R=302,L]
# atom
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} h=([^&]*)
RewriteRule ^/(.*).git/atom/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/%1 [R=302,L,QSD]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/atom/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/commits/HEAD [R=302,L,QSD]
# refs, incomplete because two pages in GitLab, defaulting to "tags"
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/refs/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tags [R=302,L]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} h=([^&]*)
RewriteRule ^/(.*).git/tag/? https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tags/%1 [R=302,L,QSD]
# tree
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} id=([^&]*)
RewriteRule ^/(.*).git/tree(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tree/%1$2 [R=302,L,QSD]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/tree(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tree/HEAD$2 [R=302,L]
# /-/tree has no good default in GitLab, revert to HEAD which is a good
# approximation (we can't assume "master" here anymore)
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/tree/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/tree/HEAD [R=302,L]
# plain
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteCond %{QUERY_STRING} h=([^&]*)
RewriteRule ^/(.*).git/plain(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/raw/%1$2 [R=302,L,QSD]
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/plain(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/raw/HEAD$2 [R=302,L]
# blame: disabled
#RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
#RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
#RewriteCond %{QUERY_STRING} h=([^&]*)
#RewriteRule ^/(.*).git/blame(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/blame/%1$2 [R=302,L,QSD]
# same default as tree above
#RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
#RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
#RewriteRule ^/(.*).git/blame(/?.*)$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/blame/HEAD/$2 [R=302,L]
# stats
RewriteCond %{REQUEST_URI} ^/(.*).git/.*$
RewriteCond ${gitolite2gitlab:%1|NOT_FOUND} !NOT_FOUND
RewriteRule ^/(.*).git/stats/?$ https://gitlab.torproject.org/${gitolite2gitlab:$1}/-/graphs/HEAD [R=302,L]
# still TODO:
# repolist: once migration is complete
#
# cannot be done:
# atom: needs a feed token, user must be logged in
# blob: no direct equivalent
# info: not working on main cgit website?
# ls_cache: not working, irrelevant?
# objects: undocumented?
# snapshot: pattern too hard to match on cgit's side
# special case, we keep a copy of the main index on the archive
RewriteRule ^/?$ https://archive.torproject.org/websites/gitweb.torproject.org.html [R=302,L]
# Fallback: everything else to GitLab
RewriteRule .* https://gitlab.torproject.org [R=302,L]
The reference copy of those is available in our (currently private) Puppet git repository.
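For readers who find the Apache syntax hard to follow, the gist of the cgit to GitLab translation can be modelled in a few lines of Python. This sketch is illustrative only and covers just a subset of the endpoints (summary, about, commit, log, tree, plain); redirect_cgit is a hypothetical function name, the project match is non-greedy where the Apache patterns are greedy, and query strings are parsed with the standard library rather than with the regular expressions used in the actual rules.

```python
import re
from urllib.parse import parse_qs, urlsplit

GITLAB = "https://gitlab.torproject.org"
ARCHIVE_INDEX = "https://archive.torproject.org/websites/gitweb.torproject.org.html"

# project name, then an optional cgit endpoint and an optional trailing path
CGIT_URL = re.compile(r"^/(.*?)\.git(?:/([^/]*)(/.*)?)?$")


def redirect_cgit(url, rewrite_map):
    """Translate a gitweb.torproject.org (cgit) URL to its GitLab equivalent."""
    parts = urlsplit(url)
    if parts.path in ("", "/"):
        return ARCHIVE_INDEX  # archived copy of the old repository listing
    match = CGIT_URL.match(parts.path)
    if not match or match.group(1) not in rewrite_map:
        return GITLAB  # fallback for unknown projects
    base = f"{GITLAB}/{rewrite_map[match.group(1)]}"
    endpoint = match.group(2) or "summary"
    rest = match.group(3) or ""
    query = parse_qs(parts.query)

    def param(name):
        # first value of a query parameter, defaulting to HEAD like the rules
        return query.get(name, ["HEAD"])[0]

    if endpoint in ("summary", "about") and rest in ("", "/"):
        return f"{base}/"
    if endpoint == "commit":
        if "id" in query:
            return f"{base}/-/commit/{param('id')}"
        return f"{base}/-/commits/HEAD"
    if endpoint == "log":
        if rest in ("", "/"):
            return f"{base}/-/commits/{param('h')}"
        return f"{base}/-/commits/HEAD{rest}"
    if endpoint == "tree":
        return f"{base}/-/tree/{param('id')}{rest}"
    if endpoint == "plain":
        return f"{base}/-/raw/{param('h')}{rest}"
    return GITLAB  # endpoints without an equivalent
```

Given a map entry from tor to tpo/core/tor, a URL such as https://gitweb.torproject.org/tor.git/commit/?id=abc123 would be translated to the /-/commit/abc123 page of tpo/core/tor. Endpoints like diff, patch, refs and stats are left out of the sketch but follow the same pattern in the rules.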

19 April 2024

Louis-Philippe Véronneau: Montreal's Debian & Stuff - March 2024

Time really flies when you are really busy (er, when you have fun)! Our Montréal Debian User Group met on Sunday March 31st and I only just found the time to write our report :)
This time around, 9 of us met at EfficiOS's offices[1] to chat, hang out and work on Debian and other stuff!
Here is what we did:
pollo: tvaz: tassia: viashimo: lavamind: justin:
Pictures
Here are pictures of the event. Well, one picture (thanks Tassia!) of the event itself and another one of the crisp Italian lager I drank at the bar after the event :)
[Picture: People at the event working around a long table]
[Picture: A glass of beer illuminated by sunlight]

  1. Maintainers, amongst other things, of the great LTTng.

8 March 2024

Louis-Philippe Véronneau: Acts of active procrastination: example of a silly Python script for Moodle

My brain is currently suffering from an overload caused by grading student assignments. In search of a somewhat productive way to procrastinate, I thought I would share a small script I wrote sometime in 2023 to facilitate my grading work. I use Moodle for all the classes I teach and students use it to hand in their papers. When I'm ready to grade them, I download the ZIP archive Moodle provides containing all their PDF files and annotate them using xournalpp and my Wacom tablet. Once this is done, I have a directory structure that looks like this:
Assignment FooBar/
  Student A_21100_assignsubmission_file
    graded paper.pdf
    Student A's perfectly named assignment.pdf
    Student A's perfectly named assignment.xopp
  Student B_21094_assignsubmission_file
    graded paper.pdf
    Student B's perfectly named assignment.pdf
    Student B's perfectly named assignment.xopp
  Student C_21093_assignsubmission_file
    graded paper.pdf
    Student C's perfectly named assignment.pdf
    Student C's perfectly named assignment.xopp
 
Before I can upload files back to Moodle, this directory needs to be copied (I have to keep the original files), cleaned of everything but the graded paper.pdf files and compressed in a ZIP. You can see how this can quickly get tedious to do by hand. Not being a complete tool, I often resorted to crafting a few spurious shell one-liners each time I had to do this[1]. Eventually I got tired of ctrl-R-ing my shell history and wrote something reusable. Behold this script! When I began writing this post, I was certain I had cheaped out on my 2021 New Year's resolution and written it in Shell, but glory!, it seems I used a proper scripting language instead.
#!/usr/bin/python3
# Copyright (C) 2023, Louis-Philippe Véronneau <pollo@debian.org>
#
# This program is free software: you can redistribute it and/or modify
# it under the terms of the GNU General Public License as published by
# the Free Software Foundation, either version 3 of the License, or
# (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program.  If not, see <http://www.gnu.org/licenses/>.
"""
This script aims to take a directory containing PDF files exported via the
Moodle mass download function, remove everything but the final files to submit
back to the students and zip it back.
usage: ./moodle-zip.py <target_dir>
"""
import os
import shutil
import sys
import tempfile

from fnmatch import fnmatch


def sanity(directory):
    """Run sanity checks before doing anything else"""
    base_directory = os.path.basename(os.path.normpath(directory))
    if not os.path.isdir(directory):
        sys.exit(f"Target directory {directory} is not a valid directory")
    if os.path.exists(f"/tmp/{base_directory}.zip"):
        sys.exit(f"Final ZIP file path '/tmp/{base_directory}.zip' already exists")
    for root, dirnames, _ in os.walk(directory):
        for dirname in dirnames:
            corrige_present = False
            for file in os.listdir(os.path.join(root, dirname)):
                if fnmatch(file, 'graded paper.pdf'):
                    corrige_present = True
            if corrige_present is False:
                sys.exit(f"Directory {dirname} does not contain a 'graded paper.pdf' file")


def clean(directory):
    """Remove superfluous files, to keep only the graded PDF"""
    with tempfile.TemporaryDirectory() as tmp_dir:
        shutil.copytree(directory, tmp_dir, dirs_exist_ok=True)
        for root, _, filenames in os.walk(tmp_dir):
            for file in filenames:
                if not fnmatch(file, 'graded paper.pdf'):
                    os.remove(os.path.join(root, file))
        compress(tmp_dir, directory)


def compress(directory, target_dir):
    """Compress directory into a ZIP file and save it to the target dir"""
    target_dir = os.path.basename(os.path.normpath(target_dir))
    shutil.make_archive(f"/tmp/{target_dir}", 'zip', directory)
    print(f"Final ZIP file has been saved to '/tmp/{target_dir}.zip'")


def main():
    """Main function"""
    target_dir = sys.argv[1]
    sanity(target_dir)
    clean(target_dir)


if __name__ == "__main__":
    main()
If for some reason you happen to have a similar workflow as I and end up using this script, hit me up? Now, back to grading...
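A possible variation on the sanity check above, sketched under assumptions: instead of exiting at the first incomplete directory, it reports every one at once. The helper name `missing_graded` is hypothetical and not part of the script, and it assumes one directory per student directly under the target directory (the script itself walks the tree recursively).

```python
import os
import tempfile


def missing_graded(directory, graded_name="graded paper.pdf"):
    """Return the sorted names of immediate subdirectories lacking graded_name."""
    missing = []
    for entry in sorted(os.listdir(directory)):
        path = os.path.join(directory, entry)
        # Only directories count as submissions; loose files are ignored
        if os.path.isdir(path) and not os.path.isfile(os.path.join(path, graded_name)):
            missing.append(entry)
    return missing


if __name__ == "__main__":
    # Build a throwaway tree with one complete and one incomplete submission
    with tempfile.TemporaryDirectory() as root:
        os.makedirs(os.path.join(root, "alice"))
        os.makedirs(os.path.join(root, "bob"))
        with open(os.path.join(root, "alice", "graded paper.pdf"), "w") as handle:
            handle.write("placeholder")
        print(missing_graded(root))  # prints ['bob']
```

Returning a list rather than calling `sys.exit` makes it possible to fix every missing paper in one pass before running the real script.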

  1. If I recall correctly, the lazy way I used to do it involved copying the directory, renaming the extension of the graded paper.pdf files, deleting all .pdf and .xopp files using find and changing graded paper.foobar back to a PDF. Some clever regex or learning awk from the ground up could've probably done the job as well, but you know, that would have required using my brain and spending spoons...

8 February 2024

Reproducible Builds: Reproducible Builds at FOSDEM 2024

Core Reproducible Builds developer Holger Levsen presented at the main track at FOSDEM on Saturday 3rd February this year in Brussels, Belgium. The talk was titled "Reproducible Builds: The First Ten Years":
In this talk Holger "h01ger" Levsen will give an overview about Reproducible Builds: How it started with a small BoF at DebConf13 (and before), then grew from being a Debian effort to something many projects work on together, until in 2021 it was mentioned in an Executive Order of the President of the United States. And of course, the talk will not end there, but rather outline where we are today and where we still need to be going, until Debian stable (and other distros!) will be 100% reproducible, verified by many. h01ger has been involved in reproducible builds since 2014 and so far has set up automated reproducibility testing for Debian, Fedora, Arch Linux, FreeBSD, NetBSD and coreboot.
More information can be found on FOSDEM's own page for the talk, including a video recording and slides.
Separate from Holger's talk, however, there were a number of other talks about reproducible builds at FOSDEM this year, and there was even an entire track on Software Bill of Materials.

6 February 2024

Louis-Philippe Véronneau: Montreal's Debian & Stuff - February 2024

New Year, Same Great People! Our Debian User Group met for the first of our 2024 bi-monthly meetings on February 4th and it was loads of fun. Around twelve different people made it this time to Koumbit, where the meeting happened. As a reminder, our meetings are called "Debian & Stuff" because we want to be as open as possible and welcome people that want to work on "other stuff" than Debian.

Here is who got things done this time: pollo, LeLutin, mjeanson, lavamind, viashimo, tvaz & tassia, joeDoe and anarcat.

Pictures

I was pretty busy this time around and ended up not taking a lot of pictures. Here's a bad one of the ceiling at Koumbit I took, and a picture by anarcat of the content of his boxes of loot:

(Picture: A picture of the ceiling at Koumbit)
(Picture: The content of anarcat's boxes of loot)

24 January 2024

Louis-Philippe Véronneau: Montreal Subway Foot Traffic Data, 2023 edition

For the fifth year in a row, I've asked Société de Transport de Montréal, Montreal's transit agency, for the foot traffic data of Montreal's subway. By clicking on a subway station, you'll be redirected to a graph of the station's foot traffic.

9 January 2024

Louis-Philippe Véronneau: 2023 - A Musical Retrospective

I ended 2022 with a musical retrospective and very much enjoyed writing that blog post. As such, I have decided to do the same for 2023! From now on, this will probably be an annual thing :)

Albums

In 2023, I added 73 new albums to my collection, nearly 3 albums every two weeks! I listed them below in the order in which I acquired them. I purchased most of these albums when I could and borrowed the rest at libraries. If you want to browse through, I added links to the album covers pointing either to websites where you can buy them or to Discogs when digital copies weren't available. Once again this year, it seems that Punk (mostly Oi!) and Metal dominate my list, mostly fueled by Angry Metal Guy and the amazing Montréal Skinhead/Punk concert scene.

Concerts

A trend I started in 2022 was to go to as many concerts of artists I like as possible. I'm happy to report I went to around 80% more concerts in 2023 than in 2022! Looking back at my list, April was quite a busy month... Here are the concerts I went to in 2023:

Although metalfinder continues to work as intended, I'm very glad to have discovered the Montréal underground scene has departed from Facebook/Instagram and adopted en masse Gancio, a FOSS community agenda that supports ActivityPub. Our local instance, askapunk.net, is pretty much all I could ask for :)

That's it for 2023!

20 December 2023

Ulrike Uhlig: How volunteer work in F/LOSS exacerbates pre-existing lines of oppression, and what that has to do with low diversity

This is a post I wrote in June 2022, but did not publish back then. After first publishing it in December 2023, a perfectionist insecure part of me unpublished it again. After receiving positive feedback, I slightly amended it and am republishing it now. In this post, I talk about unpaid work in F/LOSS, taking on the example of hackathons, and why, in my opinion, the expectation of volunteer work is hurting diversity. Disclaimer: I don't have all the answers, only some ideas and questions.

Previous findings

In 2006, the Flosspols survey sought to explain the role of gender in free/libre/open source software (F/LOSS) communities because an earlier [study] revealed a significant discrepancy in the proportion of men to women. It showed that just about 1.5% of F/LOSS community members were female at that time, compared with 28% in proprietary software (which is also a low number). Their key findings were, to name just a few:
  • that F/LOSS rewards the producing of code rather than the producing of software. It thereby puts most emphasis on a particular skill set. Other activities such as interface design or documentation are understood as less technical and therefore less prestigious.
  • The reliance on long hours of intensive computing in writing successful code means that men, who in general assume that time outside of waged labour is "theirs", are freer to participate than women, who normally still assume a disproportionate amount of domestic responsibilities. Female F/LOSS participants, however, seem to be able to allocate a disproportionately larger share of their leisure time for their F/LOSS activities. This gives an indication that women who are not able to spend as much time on voluntary activities have difficulties integrating into the community.
We also know from the 2016 Debian survey, published in 2021, that a majority of Debian contributors are employed, rather than being contractors, and rather than being students. Also, 95.5% of respondents to that study were men between the ages of 30 and 49, highly educated, with the largest groups coming from Germany, France, USA, and the UK. The study found that only 20% of the respondents were being paid to work on Debian. Half of these 20% estimate that the amount of work on Debian they are being paid for corresponds to less than 20% of the work they do there. On the other side, there are 14% of those who are being paid for Debian work who declared that 80-100% of the work they do in Debian is remunerated.

So, if a majority of people is not paid, why do they work on F/LOSS? Or: What are the incentives of free software?

In 2021, Louis-Philippe Véronneau aka Pollo, who is not only a Debian Developer but also an economist, published his thesis "What are the incentive structures of free software" (the actual thesis was written in French). One very interesting finding Pollo pointed out is this one:
Indeed, while we have proven that there is a strong and significative correlation between the income and the participation in a free/libre software project, it is not possible for us to pronounce ourselves about the causality of this link.
In the French original text:
En effet, si nous avons prouvé qu'il existe une corrélation forte et significative entre le salaire et la participation à un projet libre, il ne nous est pas possible de nous prononcer sur la causalité de ce lien.
Said differently, it is certain that there is a relationship between income and F/LOSS contribution, but it's unclear whether working on free/libre software ultimately helps finding a well paid job, or if having a well paid job is the cause enabling work on free/libre software. I would like to scratch this question a bit further, mostly relying on my own observations, experiences, and discussions with F/LOSS contributors.

Volunteer work is unpaid work

We often hear of hackathons, hack weeks, or hackfests. I've been at some such events myself, Tails organized one, the IETF regularly organizes hackathons, and last week (June 2022!) I saw an invitation for a hack week with the Torproject. This type of event generally lasts several days. While the people who organize these events are being paid by the organizations they work for, participants on the other hand are generally joining on a volunteer basis. Who can we expect to show up at this type of event under these circumstances as participants? To answer this question, I collected some ideas:
  • people who have an employer sponsoring their work
  • people who have a funder/grant sponsoring their work
  • people who have a high income and can take time off easily (in that regard, remember the Gender Pay Gap, women often earn less for the same work than men)
  • people who rely on family wealth (living off an inheritance, living on rights payments from a famous grandparent - I'm not making these situations up, there are actual people in such financially favorable situations)
  • people who don't need much money because they don't have to pay rent or pay low rent (besides house owners that category includes people who live in squats or have social welfare paying for their rent, people who live with parents or caretakers)
  • people who don't need to do care work (for children, elderly family members, pets. Remember that most care work is still done by women.)
  • students who have financial support or are in a situation in which they do not yet need to generate a lot of income
  • people who otherwise have free time at their disposal
So, who, in your opinion, fits these unwritten requirements? Looking at this list, it's pretty clear to me why we'd mostly find white men from the Global North, generally with higher education, in hackathons and F/LOSS development. ("Great, they're a culture fit!") Yes, there will also always be some people of marginalized groups who will attend such events because they expect to network, to find an internship, to find a better job in the future, or to add their participation to their curriculum. To me, this rings a bunch of alarm bells.

Low diversity in F/LOSS projects - a mirror of the distribution of wealth

I believe that the lack of diversity in F/LOSS is first of all a mirror of the distribution of wealth on a larger level. And by wealth I'm referring to financial wealth as much as to social wealth in the sense of Bourdieu: Families of highly educated parents socially reproducing privilege by allowing their kids to attend better schools, supporting and guiding them in their choices of study and work, providing them with relations to internships acting as springboards into well paid jobs and so on. That said, we should ask ourselves as well:

Do F/LOSS projects exacerbate existing lines of oppression by relying on unpaid work?

Let's look again at the causality question of Pollo's research (in my words):
It is unclear whether working on free/libre software ultimately helps finding a well paid job, or if having a well paid job is the cause enabling work on free/libre software.
Maybe we need to imagine this cause-effect relationship over time: as a student, without children and with lots of free time, hopefully some money from the state or the family, people can spend time on F/LOSS, collect experience, earn recognition - and later find a well-paid job and make unpaid F/LOSS contributions into a hobby, cementing their status in the community, while at the same time generating a sense of well-being from working on the common good. This is a quite common scenario. As the Flosspols study revealed however, boys often get their own computer at the age of 14, while girls get one only at the age of 20. (These numbers might be slightly different now, and possibly many people don't own an actual laptop or desktop computer anymore, instead they own mobile devices which are not exactly inciting them to look behind the surface, take apart, learn, appropriate technology.) In any case, the above scenario does not allow for people who join F/LOSS later in life, e.g. changing careers, to find their place. I believe that F/LOSS projects cannot expect to have more women, people of color, people from working class backgrounds, people from outside of Germany, France, USA, UK, Australia, and Canada on board as long as volunteer work is the status quo and waged labour an earned privilege.

Wait, are you criticizing all these wonderful people who sacrifice their free time to work towards common good?

No, that's definitely not my intention, I'm glad that F/LOSS exists, and the F/LOSS ecosystem has always represented a small utopia to me that is worth cherishing and nurturing. However, I think we still need to talk more about the lack of diversity, and investigate it further.

Some types of work are never being paid

Besides free work at hacking events, let me also underline that a lot of work in F/LOSS is not considered payable work (yes, that's an oxymoron!). Which F/LOSS project, for example, has ever paid translators a decent fee? Which project has ever considered that doing the social glue work, often done by women in the projects, is work that should be paid for? Which F/LOSS projects pay the people who do their Debian packaging rather than relying on yet another already well-paid white man who can afford doing this work for free all the while holding up how great the F/LOSS ecosystem is? And how many people on opensourcedesign jobs are looking to get their logo or website done for free? (Isn't that heart icon appealing to your altruistic empathy?) In my experience even F/LOSS projects which are trying to do the right thing by paying everyone the same amount of money per hour run into issues when it turns out that not all hours are equal and that some types of work do not qualify for remuneration at all or that the rules for the clocking of work are not universally applied in the same way by everyone.

Not every interaction should have a monetary value, but...

Some of you want to keep working without being paid, because that feels a bit like communism within capitalism, it makes you feel good to contribute to the greater good while not having the system determine your value over money. I hear you. I've been there (and sometimes still am). But as long as we live in this system, even though we didn't choose to and maybe even despise it - communism is not about working for free, it's about getting paid equally and adequately. We may not think about it while under the age of 40 or 45, but working without adequate financial compensation, even half of the time, will ultimately result in not being able to care for oneself when sick, when old. And while this may not be an issue for people who inherit wealth, or have an otherwise safe economic background, e.g. an academic salary, it is a huge problem and barrier for many people coming out of the working or service classes. (Oh and please, don't repeat the neoliberal lie that everyone can achieve whatever they aim for, if they just tried hard enough. French research shows that (in France) one has only a 30% chance to become a "class defector", and change social class upwards. "But I managed to get out and move up, so everyone can!" - well, if you believe that, I'm afraid you might be experiencing survivor bias.)

Not all bodies are equally able

We should also be aware that not all of us can work with the same amount of energy either. There is yet another category of people who are excluded by the expectation of volunteer work, either because the waged labour they do already eats all of their energy, or because their bodies are not disposed to do that much work, for example because of mental health issues such as depression, or because of physical disabilities.

When organizing events relying on volunteer work, please think about these things. Yes, you can tell people that they should ask their employer to pay them for attending a hackathon - but, as I've hopefully shown, that would not do it for many people, especially newcomers. Instead, you could propose a fund to make it possible that people who would not normally attend can attend. DebConf is a good example for having done this for many years.

Conclusively, I would like to urge free software projects that have a budget and directly pay some people from it to map where they rely on volunteer work and how this hurts diversity in their project. How do you or your project exacerbate pre-existing lines of oppression by granting or not granting monetary value to certain types of work? What is it that you take for granted? As always, I'm curious about your feedback!

Worth a read

These ideas are far from being new. Ashe Dryden's well-researched post "The ethics of unpaid labor and the OSS community" dates back to 2013 and is as important as it was ten years ago.

5 December 2023

Louis-Philippe Véronneau: Montreal's Debian & Stuff - November 2023

Hello from a snowy Montréal! My life has been pretty busy lately[1] so please forgive this late report. On November 19th, our local Debian User Group met at Montreal's most prominent hackerspace, Foulab. We've been there a few times already, but since our last visit, Foulab has had some membership/financial troubles. Happy to say things are going well again and a new team has taken over the space. This meetup wasn't the most productive day for me (something about being exhausted apparently makes it hard to concentrate), but other people did a bunch of interesting stuff :)

Pictures

Here are a bunch of pictures I took! Foulab is always a great place to snap quirky things :)

(Picture: A sign on a whiteboard that says 'Bienvenue aux laboratoires qui rends fou')
(Picture: The entrance of the bio-hacking house, with a list of rules)
(Picture: An exploded keyboard with a 'Press F1 to continue' sign)
(Picture: An inflatable Tux with a Foulab T-Shirt)
(Picture: A picture of the woodworking workshop)

  1. More busy than the typical end of semester rush... At work, we are currently renegotiating our collective bargaining agreement and things aren't going so well. We went on strike for a few days already and we're planning on another 7 days starting on Friday 8th.

7 October 2023

Louis-Philippe Véronneau: Montreal's Debian & Stuff - "September" 2023

Last Sunday, our local Debian user group gathered to chat, to work on Debian and to do other, non-Debian related hacking. A "Debian & Stuff"! It had been a while since we held a proper meetup. Our last event was the Montreal BSP we organised back in March 2023... We somewhat missed the window for a June meetup and summer events never seem to gather a good crowd, so I didn't try to organise one. All this to say it was nice to see folks from the Montreal Debian community :)

This event was also the first time we were hosted by L'Espace des possibles - Petite Patrie, a social venue that aims to provide a space for not-for-profit activities, like repair cafés, sewing classes, board game nights, etc. It was really nice and we will surely meet there again in the future.

(Picture: A group picture during the event)

Many people came to the event, including some new ones. Although people always tend to come and go during the day, a total of 12 people attended the event. As always, people worked on very different projects! One focus of this D&S was assembling AirGradient DIY basic kits. Our local community has been talking a lot about air quality metrics in the past few months[1]. Tiago thus decided to have a company print the PCBs for this kit and graciously gave away a few spares. Michael then took it upon himself to order parts on AliExpress and a few of us ended up soldering the kits together while chatting.

(Picture: An AirGradient DIY basic kit, semi-assembled)

Otherwise, some Debian work was also done:

The whole event was super fun, the tacos we had for lunch were delicious (and very authentic!), and we ended up at a local microbrewery to share a pint later in the evening. Looking forward to the next event!

  1. Mostly as a result of the large forest fires in Canada this summer. I myself blogged twice about air quality-related projects recently.

27 September 2023

Bits from Debian: New Debian Developers and Maintainers (July and August 2023)

The following contributors got their Debian Developer accounts in the last two months: The following contributors were added as Debian Maintainers in the last two months: Congratulations!

4 August 2023

Louis-Philippe Véronneau: pymonitair: Air Quality Monitoring Display with MicroPython

I've never been a fan of IoT devices for obvious reasons: not only do they tend to be excellent at being expensive vendor locked-in machines, but far too often, they also end up turning into e-waste after a short amount of time. Manufacturers can go out of business or simply decide to shut down the cloud servers for older models, and then you're stuck with a brick. Well, this all changes today, as I've built my first IoT device and I love it. Introducing pymonitair.

What

pymonitair is a MicroPython project that aims to display weather data from a home weather station (like the ones sold by AirGradient) on a small display. The source code was written for the Raspberry Pi Pico W, the Waveshare Pico OLED 1.3 display and the RevolvAir Revo 1 weather station, but can be adapted to other displays and stations easily, as I tried to keep the code as modular as possible. The general MicroPython code itself isn't specific to the Raspberry Pi Pico and shouldn't need to be modified for other boards. pymonitair features:

Here's a demo of me scrolling through the different pages and (somewhat failing) to turn the screen on and off:

Why?

If you follow my blog, you'll know that my last entry was about building a set of tools to collect and graph data from a weather station my neighbor set up. Why on Earth would I need a separate device to show this data, when the website I've built works perfectly fine and is accessible on any computer or smartphone? Mostly alerts. When the air quality here dropped following forest fires, I found that keeping track of whether I had to close my windows and hunker down was quite a hassle. Air quality would degrade during the day and I would only notice it hours later. With the pymonitair, I'll have a little screen flashing angrily at me whenever this happens. A simpler solution would probably have been to forgo hardware altogether and code some icinga2 alert to ping me over Signal whenever the air quality got bad.
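The alerting idea described above can be sketched in a few lines. This is not pymonitair's actual code: the function name `should_alert`, the 35.0 threshold and the "three readings in a row" rule are all assumptions chosen for illustration.

```python
def should_alert(readings, threshold=35.0, consecutive=3):
    """Return True when the last `consecutive` PM2.5 readings all exceed threshold.

    `readings` is a list of PM2.5 values, oldest first. The default threshold
    and window size are illustrative assumptions, not values from pymonitair.
    """
    if consecutive <= 0 or len(readings) < consecutive:
        return False
    # Require several bad readings in a row so one noisy sample does not trigger
    return all(value > threshold for value in readings[-consecutive:])


if __name__ == "__main__":
    print(should_alert([10.0, 40.0, 41.0, 42.0]))  # prints True
    print(should_alert([40.0, 41.0, 10.0]))        # prints False
```

Requiring a short run of bad readings is one way to keep a display from flashing on a single spurious measurement; a real device would tune both values.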
Hacking on pymonitair was mostly a way to learn to use MicroPython and familiarize myself with this type of embedded hardware device. I'll surely blog about this later this year, but I plan to use a very similar stack to mod my apartment's HVAC unit to stop pulling air from outside when an air quality sensor detects cigarette smoke (or bad air quality in general).

Things I've learnt

This project was super fun and taught me many things:

  1. PM1, PM2.5, PM10, Temperature, Humidity and Pressure
  2. Part of the screen will flash repeatedly
  3. I did look for other solutions to transfer files to the board, but none of them were actually maintained. I nearly finished packaging ampy before realising it was officially unmaintained and its main alternative, rshell, has had its last release in December 2021. When I caught myself seriously considering writing a script to transfer files over the serial link, I gave up and decided thonny was not that bad after all.

1 August 2023

Louis-Philippe Véronneau: Weather Station Data Visualisations Using R and Python

A few weeks ago, my friend and neighbor Jérôme (aka lavamind) installed a weather station on his balcony and started collecting data from it. It has been quite useful to measure the degrading air quality during the recent forest fires plaguing northern Canada, but sadly, the hardware itself isn't great. Whereas some projects like airgradient offer open hardware devices running free software, the station we got is from RevolvAir, some kind of local air monitoring project that aims to be a one-stop solution for exterior air monitoring. Not only is their device pretty expensive[1], but it also reboots frequently by itself. Even worse, their online data map requires an account to view the data and the interface is bad, unintuitive and only stores up to a month of data. Having a good background in data visualisation and statistics thanks to my master's degree in economics, I decided I could do better. Two days later, I had built a series of tools to collect, analyse and graph the JSON time series data provided by the device. The result is a very simple website that works without any JavaScript, leveraging static graphs built using R. Modern web libraries and projects offer an incredible wealth of tools to graph and visualise data, but as for most of my web projects, I wanted something static and simple. The source code for the project can be found here, and although it is somewhat specific to the data structure provided by the RevolvAir device, it could easily be adapted to other devices, as they tend to have very similar JSON dumps.
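To give an idea of what processing such JSON time series can look like, here is a minimal sketch that averages readings per hour. It is not the project's code: the record layout (a JSON list of objects with `timestamp` and `pm25` keys) and the function name `hourly_means` are assumptions, and the real RevolvAir dump may be structured differently.

```python
import json
from collections import defaultdict
from datetime import datetime


def hourly_means(json_text, field="pm25"):
    """Average `field` per hour from a JSON list of {"timestamp", field} records."""
    buckets = defaultdict(list)
    for record in json.loads(json_text):
        stamp = datetime.fromisoformat(record["timestamp"])
        # Truncate to the start of the hour so readings group together
        hour = stamp.replace(minute=0, second=0, microsecond=0)
        buckets[hour].append(float(record[field]))
    return {
        hour.isoformat(): sum(values) / len(values)
        for hour, values in sorted(buckets.items())
    }


if __name__ == "__main__":
    # Hypothetical sample data, made up for this example
    sample = json.dumps([
        {"timestamp": "2023-07-01T10:05:00", "pm25": 10},
        {"timestamp": "2023-07-01T10:35:00", "pm25": 20},
        {"timestamp": "2023-07-01T11:00:00", "pm25": 30},
    ])
    print(hourly_means(sample))
```

Aggregated values like these can then be written to a file and handed to whatever produces the static graphs.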

  1. around 300 CAD, whereas a similar station from airgradient costs around 90 CAD. Thankfully, this station was a gift from a local group mobilising against an industrial project near our housing cooperative and we didn't have to pay for it ourselves.

27 July 2023

Louis-Philippe Véronneau: My new friend Ted

About 6 months ago, I decided to purchase a bike trailer. I don't drive and although I also have a shopping caddy, it often can't handle a week's groceries.

(Picture: The trailer, attached to my bike, hauling two storage crates)

Since the goal for the trailer was to haul encumbering and heavy loads, I decided to splurge and got a Surly Ted. The 32" x 24" flat bed is very versatile and the trailer is rated for up to 300 lbs (~135 kg). At around 30 lbs (~13.5 kg), the trailer itself is light enough for me to climb up the stairs to my apartment with it. Having seldom driven a bike trailer before, I was at first worried about its handling and if it would jerk me around (as some children's chariots tend to). I'm happy to report the two pronged hitch Surly designed works very well and lets you do 180° turns effortlessly.

(Picture: The trailer on my balcony, ready to be hoisted)

So far, I've used the trailer to go grocery shopping, buy bulk food and haul dirt and mulch. To make things easier, I've purchased two 45L storage crates from Home Depot and added two planks of wood on each side of the trailer to stabilise things when I strap the crates down to the bed. Since my partner and I are subscribed to an organic farmer's box during the summer and get baskets from Lufa during the winter, picking up our groceries at the pick-up point is as easy as dumping our order in the storage crates and strapping them back to the trailer.

(Picture: The pulleys on the roof of my balcony)

Although my housing cooperative has a (small) indoor bicycle parking space, my partner uses our spot during the summer, which means I have to store the trailer on my balcony. To make things more manageable and free up some space, I set up a system of pulleys to hoist the trailer up in the air when it's not in use. I did go through a few iterations, but I'm pretty happy with the current 8-pulley block and tackle mechanism I rigged.

(Picture: The trailer hoisted to the roof of the balcony)

All in all, this trailer wasn't cheap, but I regret nothing. Knowing Surly's reputation, it will last me many years and not having to drive a car to get around always ends up being the cheaper solution.

23 July 2023

Wouter Verhelst: Debconf Videoteam sprint in Paris, France, 2023-07-20 - 2023-07-23

The DebConf video team has been sprinting in preparation for DebConf 23 which will happen in Kochi, India, in September of this year.

(Picture: Video team sprint)

Present were Nicolas "olasd" Dandrimont, Stefano "tumbleweed" Rivera, and yours truly. Additionally, Louis-Philippe "pollo" Véronneau and Carl "CarlFK" Karsten joined the sprint remotely from across the pond. Thank you to the DPL for agreeing to fund flights, food, and accommodation for the team members. We would also like to extend a special thanks to the Association April for hosting our sprint at their offices. We made a lot of progress:

It is now Sunday the 23rd at 14:15, and while the sprint is coming to an end, we haven't quite finished yet, so some more progress can still be made. Let's see what happens by tonight. All in all, though, we believe that the progress we made will make the DebConf Videoteam's work a bit easier in some areas, and will make things work better in the future. See you in Kochi!

21 June 2023

Louis-Philippe Véronneau: New Keyboard, Who This?

My old Thinkpad X220 has been slowly dying[1] and as much as it makes me sad, it is also showing its age in terms of computational power. As such, I've pre-ordered a Framework 13 (the AMD version) and plan to retire my X220 when I get it. One thing I will miss from that laptop is the keyboard. At work, I dock it (on the amazing Thinkpad dock), which lets me use the keyboard while working on a larger monitor. I could probably replicate this setup with the Framework, but I'm not a fan of trackpads. So I built a keyboard. A nice one. One with a trackpoint.

(Picture: The bare board in its box)

If you follow Debian Planet, you may have seen the Tex Shinobi when Jonathan Dowland featured it on his blog back in January. It is a Tenkeyless board (saving me precious space at work) and is everything you would want from an old-school Thinkpad keyboard replacement. Since I had no previous experience with "Cherry MX"-style keyboard switches[2], I decided to go full-bore and buy the "DIY" model that came unpopulated.

(Picture: A 35 switch tester board)

To know what model of switches I wanted, I bought a nice switch tester and played with it for a few days. After having thoroughly annoyed my SO (turns out 35 different switches on a little board is an incredible fidget toy), I decided to go with the Gateron Aliaz 70g. They are silent tactile switches, similar to the classic Cherry MX Browns, but with a much nicer sound profile and a much stronger actuation force (55g vs 70g). The end result is a somewhat "stiff" keyboard that has a nice "THOCC", while still being relatively silent: perfect for a shared office.

(Picture: The keyboard with the switches, but no keycaps)

The only thing left to do on this keyboard is to replace the three soldered switches that came pre-installed for the mouse buttons. They are Cherry MX Red low-profile switches and are genuinely terrible[3]. I will be swapping them for Gateron KS-33 low-profile Blue switches when I get the ones I ordered online.

(Picture: My assembled Tex Shinobi)

Overall, I am very satisfied with this keyboard and I look forward to using it daily when school starts again in September.

  1. The power button sometimes does not work at all (for minutes?) and the laptop has been shutting down randomly (not a heat issue) more and more often...
  2. I am blessed with an IBM M keyboard at home and am in love with those clicky, very loud buckling springs.
  3. Not only are they linear switches (weird choice for mouse buttons), but they are very inconsistent. All three switches feel different and make different sounds.

18 June 2023

Louis-Philippe V ronneau: Solo V2: nice but flawed

EDIT: One of my 2 keys has died. There are what seems like golden bubbles under the epoxy, over one of the chips and those were not there before. I've emailed SoloKeys and I'm waiting for a reply, but for now, I've stopped using the Solo V2 altogether :( I recently received the two Solo V2 hardware tokens I ordered as part of their crowdfunding campaign, back in March 2022. It did take them longer than advertised to ship me the tokens, but that's hardly unexpected from such small-scale, crowdfunded undertaking. I'm mostly happy about my purchase and I'm glad to get rid of the aging Tomu boards I was using as U2F tokens1. Still, beware: I am not sure it's a product I would recommend if what you want is simply something that works. If you do not care about open-source hardware, the Solo V2 is not for you. The Good A side-by-side view of the Solo V2's top and back sides I first want to mention I find the Solo V2 gorgeous. I really like the black and gold color scheme of the USB-A model (which is reversible!) and it seems like a well built and solid device. I'm not afraid to have it on my keyring and I fully expect it to last a long time. An animation of the build process, showing how the PCB is assembled and then slotted into the shell I'm also very impressed by the modular design: the PCB sits inside a shell, which decouples the logic from the USB interface and lets them manufacture a single board for both the USB-C and USB-A models. The clear epoxy layer on top of the PCB module also looks very nice in my opinion. A picture of the Solo V2 with its silicone case on my keyring, showing the 3 capacitive buttons I'm also very happy the Solo V2 has capacitive touch buttons instead of physical "clicky" buttons, as it means the device has no moving parts. The token has three buttons (the gold metal strips): one on each side of the device and a third one near the keyhole. 
As far as I've seen, the FIDO2 functions seem to work well via the USB interface and do not require any configuration on a Debian 12 machine. I've already migrated to the Solo V2 for web-based 2FA and I am in the process of migrating to an SSH ed25519-sk key. Here is a guide I recommend if you plan on setting those up with a Solo V2.

The Bad and the Ugly
Sadly, the Solo V2 is far from being a perfect project. First of all, since the crowdfunding campaign is still being fulfilled, it is not currently commercially available. Chances are you won't be able to buy one directly before at least Q4 2023.

I've also hit what seems to be a pretty big firmware bug, or at least, one that affects my use case quite a bit. Invoking gpg crashes the Solo V2 completely if you also have scdaemon installed. Since scdaemon is necessary to use gpg with an OpenPGP smartcard, this means you cannot issue any gpg commands (like signing a git commit...) while the Solo V2 is plugged in. Any gpg command that queries scdaemon, such as gpg --edit-card or gpg --sign foo.txt, times out after about 20 seconds and leaves the token unresponsive to both touch and CLI commands.

The way to "fix" this issue is to make sure scdaemon does not interact with the Solo V2 anymore, using the reader-port argument:
  1. Plug in both your Solo V2 and your OpenPGP smartcard
  2. To get a list of the tokens scdaemon sees, run the following command: $ echo scd getinfo reader_list | gpg-connect-agent --decode | awk '/^D/ {print $2}'
  3. Identify your OpenPGP smartcard. For example, my Nitrokey Start is listed as 20A0:4211:FSIJ-1.2.15-43211613:0
  4. Create the file ~/.gnupg/scdaemon.conf with the following line: reader-port $YOUR_TOKEN_ID. For example, in my case I have: reader-port 20A0:4211:FSIJ-1.2.15-43211613:0
  5. Reload scdaemon: $ gpgconf --reload scdaemon
Although this is clearly a firmware bug [2], I do believe GnuPG is also partly to blame here. Let's just say I was not very surprised to have to battle scdaemon again, as I've had previous issues with it.

Which leads me to my biggest gripe so far: it seems SoloKeys (the company) isn't really fixing firmware issues anymore and doesn't seem to care. The last firmware release is about a year old. Although people are experiencing serious bugs, there is no official way to report them, which leads to issues being seemingly ignored. For example, the NFC feature is apparently killing keys (!!!), but no one from the company seems to have acknowledged the issue. The same goes for my GnuPG bug, which was flagged in September 2022. For a project that mainly differentiates itself from its (superior) competition by being "Open", it's not a very good look...

Although SoloKeys is still an unprofitable open source side business of its creators [3], this kind of attitude certainly doesn't help foster trust.

Conclusion
If you want to have a nice, durable FIDO2 token, I would suggest you get one of the many models Yubico offers. They are similarly priced, are readily commercially available, are part of a nice and maintained software ecosystem and have more features than the Solo V2 (OpenPGP support being the one I miss the most). Yubikeys are the practical option.

What they are not is open-source hardware, whereas the Solo V2 is. As bunnie very well explained on his blog in 2019, it does not mean the latter is inherently more trustable than the former, but it does make the Solo V2 the ideological option. Knowledge is power and it should be free. As such, tread carefully with SoloKeys, but don't dismiss them altogether: the Solo V2 is certainly functioning well enough for me.

  1. Although U2F is still part of the FIDO2 specification, the Tomus predate this standard and were thus not fully compliant with FIDO2. So long and thanks for all the fish little boards, you've served me well!
  2. It appears the Solo V2 shares its firmware with the Nitrokey 3, which had a similar issue a while back.
  3. This is a direct quote from one of the Solo V2 firmware maintainers.

29 May 2023

Louis-Philippe Véronneau: Python 3.11, pip and (breaking) system packages

As we get closer to Debian Bookworm's release, I thought I'd share one change in Python 3.11 that will surely affect many people. Python 3.11 implements the new PEP 668, Marking Python base environments as "externally managed" [1]. If you use pip regularly on Debian, it's likely you'll eventually hit the externally-managed-environment error:
error: externally-managed-environment
× This environment is externally managed
╰─> To install Python packages system-wide, try apt install
    python3-xyz, where xyz is the package you are trying to
    install.
    If you wish to install a non-Debian-packaged Python package,
    create a virtual environment using python3 -m venv path/to/venv.
    Then use path/to/venv/bin/python and path/to/venv/bin/pip. Make
    sure you have python3-full installed.
    If you wish to install a non-Debian packaged Python application,
    it may be easiest to use pipx install xyz, which will manage a
    virtual environment for you. Make sure you have pipx installed.
    See /usr/share/doc/python3.11/README.venv for more information.
note: If you believe this is a mistake, please contact your Python installation or OS distribution provider. You can override this, at the risk of breaking your Python installation or OS, by passing --break-system-packages.
hint: See PEP 668 for the detailed specification.
With this PEP, Python tools can now distinguish between packages that have been installed by the user with a tool like pip and ones installed using a distribution's package manager, like apt. This is generally great news: it was previously too easy to break a system by mixing the two types of packages. This PEP will simplify our role as a distribution, as well as improve the overall Python user experience in Debian. Sadly, it's also likely this change will break some of your scripts, especially CI jobs that (legitimately) install packages via pip alongside system packages. For example, I use the following gitlab-ci snippet to make sure my PRs don't break my build process [2]:
build:flit:
  stage: build
  script:
  - apt-get update && apt-get install -y flit python3-pip
  - FLIT_ROOT_INSTALL=1 flit install
  - metalfinder --help
With Python 3.11, this snippet will error out, as pip will refuse to install packages alongside the system's. The fix is to tell pip it's OK to "break" your system packages, either using the --break-system-packages parameter, or the PIP_BREAK_SYSTEM_PACKAGES=1 environment variable [3]. This, of course, is not something you should be using in production to restore the old behavior! The "proper" way to fix this issue, as the externally-managed-environment error message aptly (har har) informs you, is to use virtual environments. Happy hacking!

  1. Kudos to our own Matthias Klose, Stefano Rivera and Elana Hashman, who worked on designing and implementing this PEP!
  2. Which is something that bit me before... You push some changes to your git repository, everything seems fine and all the tests pass, so you merge it and make a new git tag. When the time comes to build and upload this tag to PyPI, you find out some minor thing broke your build system (which you weren't testing) and you have to scramble to make a point-release to fix the issue. Sad!
  3. Don't go searching for this environment variable in pip's code though, as you won't find it! All of pip's command line options can be passed as env vars using the PIP_<UPPER_LONG_NAME> format. Useful for tools that use pip indirectly, like flit.

28 March 2023

kpcyrd: Writing a Linux executable from scratch with x86_64-unknown-none and Rust

I recently mentioned on the internet I did work in this direction and a friend of mine asked me to write a blogpost on this. I didn't blog for a long time (keeping all the goodness for myself hehe), so here we go. To set the scene, let's assume we want to make an executable binary for x86_64 Linux that's supposed to be extremely portable. It should work on both Debian and Arch Linux. It should work on systems without glibc like Alpine Linux. It should even work in a FROM scratch Docker container. In a more serious setting you would statically link musl-libc with your Rust program, but today we're in a silly-goofy mood so we're going to try to make this work without a libc. And we're also going to use Rust for this, more specifically the stable release channel of Rust, so this blog post won't use any nightly-only features that might still change/break. If you're using a Rust 1.0 version that was recent at the time of writing or later (>= 1.68.0 according to my computer), you should be able to try this at home just fine.

This tutorial assumes you have no prior programming experience in any programming language, but it's going to involve some x86_64 assembly. If you already know what a syscall is, you'll be just fine. If this is your first exposure to programming you might still be able to follow along, but it might be a wild ride.

If you haven't already, install rustup (possibly also available in your package manager, who knows?)
# when asked, press enter to confirm default settings
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs | sh
This is going to install everything you need to use Rust on Linux (this tutorial assumes you're following along on Linux btw). Usually it's still using a system linker (by calling the cc binary, and errors out if none is present), but instead we're going to use rustup to install an additional target:
rustup target add x86_64-unknown-none
I don't know if/how this is made available by Linux distributions, so I recommend following along with Rust installed from rustup. Anyway, we're creating a new project with cargo, this creates a new directory that we can then change into (you might've done this before):
cargo new hack-the-planet
cd hack-the-planet
There's going to be a file named Cargo.toml, we don't need to make any changes there, but the one that was auto-generated for me at the time of writing looks like this:
[package]
name = "hack-the-planet"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
There's a second file named src/main.rs, it's going to contain some pre-generated hello world, but we're going to delete it and create a new, empty file:
rm src/main.rs
touch src/main.rs
Alrighty, leaving this file empty is not valid but we're going to walk through the individual steps so we're going to try to build with an empty file first. At this point I would like to credit this chapter of a fasterthanli.me series and a blogpost by Philipp Oppermann, this tutorial is merely a 2023 update and makes it work with stable Rust. Let's run the build:
$ cargo build --release --target x86_64-unknown-none
   Compiling hack-the-planet v0.1.0 (/hack-the-planet)
error[E0463]: can't find crate for `std`
  |
  = note: the `x86_64-unknown-none` target may not support the standard library
  = note: `std` is required by `hack_the_planet` because it does not declare `#![no_std]`
error[E0601]: `main` function not found in crate `hack_the_planet`
  |
  = note: consider adding a `main` function to `src/main.rs`
Some errors have detailed explanations: E0463, E0601.
For more information about an error, try `rustc --explain E0463`.
error: could not compile `hack-the-planet` due to 2 previous errors
Since this doesn't use a libc (oh right, I forgot to mention this up to this point actually), this also means there's no std standard library. Usually the standard library of Rust still uses the system libc to do syscalls, but since we specify our libc as none this means std won't be available (use std::fs::rename won't work). There are still other functions we can use and import, for example there's core that's effectively a second standard library, but much smaller. To opt out of the std standard library, we can put #![no_std] into src/main.rs:
#![no_std]
Running the build again:
$ cargo build --release --target x86_64-unknown-none
   Compiling hack-the-planet v0.1.0 (/hack-the-planet)
error[E0601]: `main` function not found in crate `hack_the_planet`
 --> src/main.rs:1:11
  |
1 | #![no_std]
  |           ^ consider adding a `main` function to `src/main.rs`
For more information about this error, try `rustc --explain E0601`.
error: could not compile `hack-the-planet` due to previous error
Rust noticed we didn't define a main function and suggests we add one. This isn't what we want though so we'll politely decline and inform Rust we don't have a main and it shouldn't attempt to call it. We're adding #![no_main] to our file and src/main.rs now looks like this:
#![no_std]
#![no_main]
Running the build again:
$ cargo build
   Compiling hack-the-planet v0.1.0 (/hack-the-planet)
error: `#[panic_handler]` function required, but not found
error: language item required, but not found: `eh_personality`
  |
  = note: this can occur when a binary crate with `#![no_std]` is compiled for a target where `eh_personality` is defined in the standard library
  = help: you may be able to compile for a target that doesn't need `eh_personality`, specify a target with `--target` or in `.cargo/config`
error: could not compile `hack-the-planet` due to 2 previous errors
Rust is asking us for a panic handler, basically "I'm going to jump to this address if something goes terribly wrong and execute whatever you put there". Eventually we would put some code there to just exit the program, but for now an infinite loop will do. This is likely going to get stripped away anyway by the compiler if it notices our program has no code branches leading to a panic and the code is unused. Our src/main.rs now looks like this:
#![no_std]
#![no_main]
use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
Running the build again:
$ cargo build --release --target x86_64-unknown-none
   Compiling hack-the-planet v0.1.0 (/hack-the-planet)
    Finished release [optimized] target(s) in 0.16s
Neat, it worked! What happens if we run it?
$ target/x86_64-unknown-none/release/hack-the-planet
Segmentation fault (core dumped)
Oops. Let's try to disassemble it:
$ objdump -d target/x86_64-unknown-none/release/hack-the-planet
target/x86_64-unknown-none/release/hack-the-planet:     file format elf64-x86-64
Ok that looks pretty "from scratch" to me. The file contains no cpu instructions. Also note how our infinite loop is not present (as predicted).
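As an aside on what core still offers once std is gone, here is a minimal sketch (not part of the original program) that formats text into a fixed-size buffer using only core::fmt. The FixedBuf type and its methods are made up for this illustration. The example is built as a normal hosted program so it can be run and checked, but the formatting logic only touches core paths, so the same approach should carry over to a #![no_std] crate.

```rust
use core::fmt::{self, Write};

/// Hypothetical fixed-size byte buffer implementing `core::fmt::Write`.
/// No heap allocation is involved, so it does not need `std` or `alloc`.
struct FixedBuf<const N: usize> {
    bytes: [u8; N],
    len: usize,
}

impl<const N: usize> FixedBuf<N> {
    fn new() -> Self {
        Self { bytes: [0; N], len: 0 }
    }

    /// The bytes written so far.
    fn as_bytes(&self) -> &[u8] {
        &self.bytes[..self.len]
    }
}

impl<const N: usize> Write for FixedBuf<N> {
    fn write_str(&mut self, s: &str) -> fmt::Result {
        let src = s.as_bytes();
        let end = self.len.checked_add(src.len()).ok_or(fmt::Error)?;
        if end > N {
            // Not enough room left: report an error instead of truncating.
            return Err(fmt::Error);
        }
        self.bytes[self.len..end].copy_from_slice(src);
        self.len = end;
        Ok(())
    }
}

fn main() {
    let mut buf = FixedBuf::<32>::new();
    write!(buf, "exit code {}", 69).unwrap();
    assert_eq!(buf.as_bytes(), &b"exit code 69"[..]);
}
```

The resulting byte slice is the kind of thing that could later be handed to a raw write syscall, since that call only needs a pointer and a length.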

Making a basic program and executing it
Ok let's try to make a valid program that basically just cleanly exits. First let's try to add some cpu instructions and verify they're indeed getting executed. Lemme introduce, the INT 3 instruction in x86_64 assembly. In binary it's also known as the 0xCC opcode. It crashes our program in a slightly different way, so if the error message changes, we know it worked. The other tutorials use a #[naked] function for the entry point, but since this feature isn't stabilized at the time of writing we're going to use the global_asm! macro. Also don't worry, I'm not going to introduce every assembly instruction individually. Our program now looks like this:
#![no_std]
#![no_main]
use core::arch::global_asm;
use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
global_asm! {
    ".global _start",
    "_start:",
    "int 3"
}
Running the build again (ok basically from now on the build is always going to be expected to work unless I say otherwise):
$ cargo build --release --target x86_64-unknown-none
   Compiling hack-the-planet v0.1.0 (/hack-the-planet)
    Finished release [optimized] target(s) in 0.11s
Let's try to disassemble the binary again:
$ objdump -d target/x86_64-unknown-none/release/hack-the-planet
target/x86_64-unknown-none/release/hack-the-planet:     file format elf64-x86-64
Disassembly of section .text:
0000000000001210 <_start>:
    1210:	cc                   	int3
And sure enough, there's a cc instruction that was identified as int3. Let's try to run this:
$ target/x86_64-unknown-none/release/hack-the-planet
Trace/breakpoint trap (core dumped)
The error message of the crash is now slightly different because it's hitting our breakpoint cpu instruction. Fun fact btw, if you run this in strace you can see this isn't making any system calls (aka not talking to the kernel at all, it just crashes):
$ strace -f ./hack-the-planet
execve("./hack-the-planet", ["./hack-the-planet"], 0x74f12430d1d8 /* 39 vars */) = 0
--- SIGTRAP {si_signo=SIGTRAP, si_code=SI_KERNEL, si_addr=NULL} ---
+++ killed by SIGTRAP (core dumped) +++
[1]    2796457 trace trap (core dumped)  strace -f ./hack-the-planet
Let's try to make a program that does a clean shutdown. To do this we inform the kernel with a system call that we would like to exit. We can get more info on this with man 2 exit and it defines exit like this:
[[noreturn]] void _exit(int status);
On Linux this syscall is actually called _exit and exit is implemented as a libc function, but we don't care about any of that today, it's going to do the job just fine. Also note how it takes a single argument of type int. In C-speak this means "signed 32 bit", i32 in Rust. Next we need to figure out the syscall number of this syscall. These numbers are cpu architecture specific for some reason (idk, idc). We're looking these numbers up with ripgrep in /usr/include/asm/:
$ rg __NR_exit /usr/include/asm
/usr/include/asm/unistd_64.h
64:#define __NR_exit 60
235:#define __NR_exit_group 231
/usr/include/asm/unistd_x32.h
53:#define __NR_exit (__X32_SYSCALL_BIT + 60)
206:#define __NR_exit_group (__X32_SYSCALL_BIT + 231)
/usr/include/asm/unistd_32.h
5:#define __NR_exit 1
253:#define __NR_exit_group 252
Since we're on x86_64 the correct value is the one in unistd_64.h, 60. Also, on x86_64 the syscall number goes into the rax cpu register, the status argument goes in the rdi register. The return value of the syscall is going to be placed in the rax register after the syscall is done, but for exit the execution is never given back to us. Let's try to write 60 into the rax register and 69 into the rdi register. To copy into registers we're going to use the mov destination, source instruction to copy from source to destination. With these registers set up we can use the syscall cpu instruction to hand execution over to the kernel. Don't worry, there's only one more assembly instruction coming and for everything else we're going to use Rust. Our code now looks like this:
#![no_std]
#![no_main]
use core::arch::global_asm;
use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
global_asm! {
    ".global _start",
    "_start:",
    "mov rax, 60",
    "mov rdi, 69",
    "syscall"
}
Build the binary, run it and print the exit code:
$ cargo build --release --target x86_64-unknown-none
$ target/x86_64-unknown-none/release/hack-the-planet; echo $?
69
Nice. Rust is quite literally putting these cpu instructions into the binary for us, nothing else.
$ objdump -d target/x86_64-unknown-none/release/hack-the-planet
target/x86_64-unknown-none/release/hack-the-planet:     file format elf64-x86-64
Disassembly of section .text:
0000000000001210 <_start>:
    1210:	48 c7 c0 3c 00 00 00 	mov    $0x3c,%rax
    1217:	48 c7 c7 45 00 00 00 	mov    $0x45,%rdi
    121e:	0f 05                	syscall
Running this with strace shows the program does exactly one thing.
$ strace -f ./hack-the-planet
execve("./hack-the-planet", ["./hack-the-planet"], 0x70699fe8c908 /* 39 vars */) = 0
exit(69)                                = ?
+++ exited with 69 +++
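To connect the disassembly with the values we typed, here is a small sketch that decodes the two mov encodings shown by objdump. It only handles this one 7-byte form (REX.W prefix 0x48, opcode 0xC7, a register-direct ModRM byte, then a little-endian 32-bit immediate); the function name and the register table are illustrative helpers, not anything from the original program or from a disassembler library.

```rust
/// Register names selected by the low three bits of the ModRM byte
/// (valid when no REX.B extension bit is set, as is the case here).
const REGS: [&str; 8] = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"];

/// Decode `48 c7 /0 imm32`, i.e. `mov r64, imm32` in register-direct form.
/// Returns the destination register name and the immediate value.
fn decode_mov_imm32(bytes: &[u8; 7]) -> Option<(&'static str, i32)> {
    let (rex, opcode, modrm) = (bytes[0], bytes[1], bytes[2]);
    // 0xC0..=0xC7 means mod = 11 (register operand) and opcode extension /0.
    if rex != 0x48 || opcode != 0xc7 || (modrm & 0xf8) != 0xc0 {
        return None;
    }
    let imm = i32::from_le_bytes([bytes[3], bytes[4], bytes[5], bytes[6]]);
    Some((REGS[(modrm & 0x07) as usize], imm))
}

fn main() {
    // The two instructions from the objdump output above.
    let mov_rax = [0x48, 0xc7, 0xc0, 0x3c, 0x00, 0x00, 0x00];
    let mov_rdi = [0x48, 0xc7, 0xc7, 0x45, 0x00, 0x00, 0x00];
    assert_eq!(decode_mov_imm32(&mov_rax), Some(("rax", 60))); // syscall number
    assert_eq!(decode_mov_imm32(&mov_rdi), Some(("rdi", 69))); // exit status
}
```

The hex immediates 0x3c and 0x45 are simply 60 and 69, which is why the strace output reports exit(69).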

Writing Rust
Ok but even though cpu instructions can be fun at times, I'd rather not deal with them most of the time (this might strike you as odd, considering this blog post). Instead let's try to define a function in Rust and call into that instead. We're going to define this function as unsafe (btw none of this is taking advantage of the safety guarantees by Rust in case it wasn't obvious. This tutorial is mostly going to stick to unsafe Rust, but for bigger projects you can attempt to reduce your usage of unsafe to opt back into normal safe Rust), it also declares the function with #[no_mangle] so the function name is preserved as main and we can call it from our global_asm entry point. Lastly, when our program is started it's going to get the stack address passed in one of the cpu registers, this value is expected to be passed to our function as an argument. Our function declares ! as return type, which means it never returns:
#[no_mangle]
unsafe fn main(_stack_top: *const u8) -> ! {
    // TODO: this is missing
}
This won't compile yet, we need to add our assembly for the exit syscall back in.
#[no_mangle]
unsafe fn main(_stack_top: *const u8) -> ! {
    asm!(
        "syscall",
        in("rax") 60,
        in("rdi") 0,
        options(noreturn)
    );
}
This time we're using the asm! macro, this is a slightly more declarative approach. We want to run the syscall cpu instruction with 60 in the rax register, and this time we want the rdi register to be zero, to indicate a successful exit. We also use options(noreturn) so Rust knows it should assume execution does not resume after this assembly is executed (the Linux kernel guarantees this). We modify our global_asm! entrypoint to call our new main function, and to copy the stack address from rsp into the register for the first argument rdi because it would otherwise get lost forever:
global_asm! {
    ".global _start",
    "_start:",
    "mov rdi, rsp",
    "call main"
}
Our full program now looks like this:
#![no_std]
#![no_main]
use core::arch::asm;
use core::arch::global_asm;
use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    loop {}
}
global_asm! {
    ".global _start",
    "_start:",
    "mov rdi, rsp",
    "call main"
}
#[no_mangle]
unsafe fn main(_stack_top: *const u8) -> ! {
    asm!(
        "syscall",
        in("rax") 60,
        in("rdi") 0,
        options(noreturn)
    );
}
After building and disassembling this the Rust compiler is slowly starting to do work for us:
$ cargo build --release --target x86_64-unknown-none
$ objdump -d target/x86_64-unknown-none/release/hack-the-planet
target/x86_64-unknown-none/release/hack-the-planet:     file format elf64-x86-64
Disassembly of section .text:
0000000000001210 <_start>:
    1210:	48 89 e7             	mov    %rsp,%rdi
    1213:	e8 08 00 00 00       	call   1220 <main>
    1218:	cc                   	int3
    1219:	cc                   	int3
    121a:	cc                   	int3
    121b:	cc                   	int3
    121c:	cc                   	int3
    121d:	cc                   	int3
    121e:	cc                   	int3
    121f:	cc                   	int3
0000000000001220 <main>:
    1220:	50                   	push   %rax
    1221:	b8 3c 00 00 00       	mov    $0x3c,%eax
    1226:	31 ff                	xor    %edi,%edi
    1228:	0f 05                	syscall
    122a:	0f 0b                	ud2
The mov and syscall instructions are still the same, but it noticed it can XOR the rdi register with itself to set it to zero. It's using the 32-bit register name to do so (edi is the lower half of rdi, and on x86_64 a write to edi also clears the upper 32 bits), that's why the register is referred to as edi in the disassembly. You can also see it's inserting a bunch of 0xCC instructions (for alignment) and Rust puts the opcodes 0x0F 0x0B at the end of the function to force an invalid opcode exception so the program is guaranteed to crash in case the exit syscall doesn't do it. This code still executes as expected:
$ strace -f ./hack-the-planet
execve("./hack-the-planet", ["./hack-the-planet"], 0x72dae7e5dc08 /* 39 vars */) = 0
exit(0)                                 = ?
+++ exited with 0 +++
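The addresses in the disassembly above can be double-checked with a bit of arithmetic. The sketch below computes where the call instruction lands and how many padding bytes sit between _start and main. Both helper functions are hypothetical, and the 16-byte function alignment is an assumption inferred from main starting at 0x1220, not something the toolchain output states.

```rust
/// Target of a 5-byte `call rel32` (opcode 0xE8) located at `addr`.
/// The displacement is relative to the address of the next instruction.
fn call_target(addr: u64, bytes: &[u8; 5]) -> Option<u64> {
    if bytes[0] != 0xe8 {
        return None;
    }
    let rel = i32::from_le_bytes([bytes[1], bytes[2], bytes[3], bytes[4]]);
    // Sign-extend the displacement so backwards calls work too.
    Some((addr + 5).wrapping_add(rel as i64 as u64))
}

/// Number of filler bytes needed to reach the next multiple of `align`.
fn padding_to(addr: u64, align: u64) -> u64 {
    (align - addr % align) % align
}

fn main() {
    // `e8 08 00 00 00` at 0x1213: next instruction is at 0x1218, plus 8.
    assert_eq!(call_target(0x1213, &[0xe8, 0x08, 0x00, 0x00, 0x00]), Some(0x1220));
    // Eight int3 bytes (0x1218..=0x121f) pad up to an assumed 16-byte boundary.
    assert_eq!(padding_to(0x1218, 16), 8);
}
```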

Adding functions
Ok we're getting closer but we aren't quite there yet. Let's try to write an exit function for our assembly that we can then call like a normal function. Remember that it takes a signed 32 bit integer that's supposed to go into rdi.
unsafe fn exit(status: i32) -> ! {
    asm!(
        "syscall",
        in("rax") 60,
        in("rdi") status,
        options(noreturn)
    );
}
Actually, since this function doesn't take any raw pointers and any i32 is valid for this syscall we're going to remove the unsafe marker of this function. When doing this we still need to use unsafe within the function for our inline assembly.
fn exit(status: i32) -> ! {
    unsafe {
        asm!(
            "syscall",
            in("rax") 60,
            in("rdi") status,
            options(noreturn)
        );
    }
}
Let's call this function from our main, and also replace the infinite loop of the panic handler with a call to exit(1):
#![no_std]
#![no_main]
use core::arch::asm;
use core::arch::global_asm;
use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    exit(1);
}
global_asm! {
    ".global _start",
    "_start:",
    "mov rdi, rsp",
    "call main"
}
fn exit(status: i32) -> ! {
    unsafe {
        asm!(
            "syscall",
            in("rax") 60,
            in("rdi") status,
            options(noreturn)
        );
    }
}
#[no_mangle]
unsafe fn main(_stack_top: *const u8) -> ! {
    exit(0);
}
Running this still works, but interestingly the generated assembly didn't change at all:
$ cargo build --release --target x86_64-unknown-none
$ objdump -d target/x86_64-unknown-none/release/hack-the-planet
target/x86_64-unknown-none/release/hack-the-planet:     file format elf64-x86-64
Disassembly of section .text:
0000000000001210 <_start>:
    1210:	48 89 e7             	mov    %rsp,%rdi
    1213:	e8 08 00 00 00       	call   1220 <main>
    1218:	cc                   	int3
    1219:	cc                   	int3
    121a:	cc                   	int3
    121b:	cc                   	int3
    121c:	cc                   	int3
    121d:	cc                   	int3
    121e:	cc                   	int3
    121f:	cc                   	int3
0000000000001220 <main>:
    1220:	50                   	push   %rax
    1221:	b8 3c 00 00 00       	mov    $0x3c,%eax
    1226:	31 ff                	xor    %edi,%edi
    1228:	0f 05                	syscall
    122a:	0f 0b                	ud2
Rust noticed there's no need to make it a separate function at runtime and instead merged the instructions of the exit function directly into our main. It also noticed the 0 argument in exit(0) means rdi is supposed to be zero and uses the XOR optimization mentioned before. Since main is not calling any unsafe functions anymore we could mark it as safe too, but in the next few functions we're going to deal with file descriptors and raw pointers, so this is likely the only safe function we're going to write in this tutorial so let's just keep the unsafe marker.

Printing text
Ok let's try to do a quick hello world, to do this we're going to call the write syscall. Looking it up with man 2 write:
ssize_t write(int fd, const void buf[.count], size_t count);
The write syscall takes 3 arguments and returns a signed size_t. In Rust this is called isize. In C, size_t is an unsigned integer type that can hold any value of sizeof(...) for the given platform, ssize_t can only store half of that because it uses one of the bits to indicate an error has occurred (the first s means signed, the libc write function returns -1 in case of an error). The arguments for write are:
  • the file descriptor to write to. stdout is located on file descriptor 1.
  • a pointer/address to some memory.
  • the number of bytes that should be written, starting at the given address.
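One detail worth keeping in mind when skipping libc: the -1 plus errno convention belongs to the libc wrapper. The raw Linux syscall interface reports failure by leaving a negative errno value (in the range -4095 to -1) in the return register. A minimal sketch of turning such a raw return value into a Result could look like this; check_syscall is a hypothetical helper name, not part of the program being built here.

```rust
/// Interpret the raw value a Linux syscall leaves in the return register.
/// Values in -4095..=-1 are negated errno codes, anything else is a success.
fn check_syscall(ret: isize) -> Result<usize, i32> {
    if (-4095..0).contains(&ret) {
        Err(-ret as i32)
    } else {
        Ok(ret as usize)
    }
}

fn main() {
    // A successful write of 12 bytes reports 12.
    assert_eq!(check_syscall(12), Ok(12));
    // Writing to an invalid file descriptor yields -9 (EBADF is 9 on Linux).
    assert_eq!(check_syscall(-9), Err(9));
}
```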
Let's also look up the syscall number of write:
% rg __NR_write /usr/include/asm
/usr/include/asm/unistd_64.h
5:#define __NR_write 1
24:#define __NR_writev 20
/usr/include/asm/unistd_32.h
8:#define __NR_write 4
150:#define __NR_writev 146
/usr/include/asm/unistd_x32.h
5:#define __NR_write (__X32_SYSCALL_BIT + 1)
323:#define __NR_writev (__X32_SYSCALL_BIT + 516)
The value we're looking for is 1. Let's write our write function (heh).
unsafe fn write(fd: i32, buf: *const u8, count: usize) -> isize {
    let r0;
    asm!(
        "syscall",
        inlateout("rax") 1 => r0,
        in("rdi") fd,
        in("rsi") buf,
        in("rdx") count,
        lateout("rcx") _,
        lateout("r11") _,
        options(nostack, preserves_flags)
    );
    r0
}
Now that's a lot of stuff at once. Since this syscall is actually going to hand execution back to our program we need to let Rust know which cpu registers the syscall is writing to, so Rust doesn't attempt to use them to store data (that would be silently overwritten by the syscall). inlateout("rax") 1 => r0 means we're writing a value to the register and want the result back in variable r0. in("rdi") fd means we want to write the value of fd into the rdi register. lateout("rcx") _ means the Linux kernel may write to that register (so the previous value may get lost), but we don't want to store the value anywhere (the underscore acts as a dummy variable name). This doesn't compile just yet though:
$ cargo build --release --target x86_64-unknown-none
   Compiling hack-the-planet v0.1.0 (/hack-the-planet)
error: incompatible types for asm inout argument
  --> src/main.rs:35:26
   |
35 |         inlateout("rax") 1 => r0,
   |                          ^    ^^ type `isize`
   |                          |
   |                          type `i32`
   |
   = note: asm inout arguments must have the same type, unless they are both pointers or integers of the same size
error: could not compile `hack-the-planet` due to previous error
Rust has inferred the type of r0 is isize since that's what our function returns, but the type of the input value for the register was inferred to be i32. We're going to select a specific number type to fix this.
unsafe fn write(fd: i32, buf: *const u8, count: usize) -> isize {
    let r0;
    asm!(
        "syscall",
        inlateout("rax") 1isize => r0,
        in("rdi") fd,
        in("rsi") buf,
        in("rdx") count,
        lateout("rcx") _,
        lateout("r11") _,
        options(nostack, preserves_flags)
    );
    r0
}
We can now call our new write function like this:
write(1, b"Hello world\n".as_ptr(), 12);
We need to set the number of bytes we want to write explicitly because there's no concept of null-byte termination in the write system call, it's quite literally "write the next X bytes, starting from this address". Our program now looks like this:
#![no_std]
#![no_main]
use core::arch::asm;
use core::arch::global_asm;
use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> ! {
    exit(1);
}
global_asm! {
    ".global _start",
    "_start:",
    "mov rdi, rsp",
    "call main"
}
fn exit(status: i32) -> ! {
    unsafe {
        asm!(
            "syscall",
            in("rax") 60,
            in("rdi") status,
            options(noreturn)
        );
    }
}
unsafe fn write(fd: i32, buf: *const u8, count: usize) -> isize {
    let r0;
    asm!(
        "syscall",
        inlateout("rax") 1isize => r0,
        in("rdi") fd,
        in("rsi") buf,
        in("rdx") count,
        lateout("rcx") _,
        lateout("r11") _,
        options(nostack, preserves_flags)
    );
    r0
}
#[no_mangle]
unsafe fn main(_stack_top: *const u8) -> ! {
    write(1, b"Hello world\n".as_ptr(), 12);
    exit(0);
}
Let's try to build and disassemble it:
$ cargo build --release --target x86_64-unknown-none
$ objdump -d target/x86_64-unknown-none/release/hack-the-planet
target/x86_64-unknown-none/release/hack-the-planet:     file format elf64-x86-64
Disassembly of section .text:
0000000000001220 <_start>:
    1220:	48 89 e7             	mov    %rsp,%rdi
    1223:	e8 08 00 00 00       	call   1230 <main>
    1228:	cc                   	int3
    1229:	cc                   	int3
    122a:	cc                   	int3
    122b:	cc                   	int3
    122c:	cc                   	int3
    122d:	cc                   	int3
    122e:	cc                   	int3
    122f:	cc                   	int3
0000000000001230 <main>:
    1230:	50                   	push   %rax
    1231:	48 8d 35 d5 ef ff ff 	lea    -0x102b(%rip),%rsi        # 20d <_start-0x1013>
    1238:	b8 01 00 00 00       	mov    $0x1,%eax
    123d:	ba 0c 00 00 00       	mov    $0xc,%edx
    1242:	bf 01 00 00 00       	mov    $0x1,%edi
    1247:	0f 05                	syscall
    1249:	b8 3c 00 00 00       	mov    $0x3c,%eax
    124e:	31 ff                	xor    %edi,%edi
    1250:	0f 05                	syscall
    1252:	0f 0b                	ud2
This time there are 2 syscalls, first write, then exit. For write it's setting up the 3 arguments in our CPU registers (rdi, rsi, rdx). The lea instruction subtracts 0x102b from the rip register (the instruction pointer) and places the result in the rsi register. This is effectively saying "an address relative to wherever this code was loaded into memory". The instruction pointer is going to point directly behind the opcodes of the lea instruction, so 0x1238 - 0x102b = 0x20d. This address is also pointed out in the disassembly as a comment. We don't see the string in our disassembly, but we can convert our 0x20d hex to 525 in decimal and use dd to read 12 bytes from that offset, and sure enough:
$ dd bs=1 skip=525 count=12 if=target/x86_64-unknown-none/release/hack-the-planet
Hello world
12+0 records in
12+0 records out
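The address arithmetic and the dd lookup can also be sketched in Rust. This is only an illustration: `rip_relative_target` and `read_at` are hypothetical helper names, and treating the address as a file offset is an assumption that happens to hold for this binary, as the dd output shows.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::Path;

/// Resolve an x86-64 RIP-relative operand: the displacement is applied to
/// the address of the *next* instruction (instruction address + length).
fn rip_relative_target(insn_addr: u64, insn_len: u64, disp: i32) -> u64 {
    let next = insn_addr + insn_len;
    // Sign-extend the 32-bit displacement before adding it.
    next.wrapping_add(disp as i64 as u64)
}

/// Read `count` bytes starting at `offset`, roughly what
/// `dd bs=1 skip=<offset> count=<count>` does.
fn read_at(path: &Path, offset: u64, count: usize) -> io::Result<Vec<u8>> {
    let mut file = File::open(path)?;
    file.seek(SeekFrom::Start(offset))?;
    let mut buf = vec![0u8; count];
    file.read_exact(&mut buf)?;
    Ok(buf)
}
```

For the lea at 0x1231, the last four opcode bytes (d5 ef ff ff) are the little-endian displacement, so `i32::from_le_bytes([0xd5, 0xef, 0xff, 0xff])` gives -0x102b, and `rip_relative_target(0x1231, 7, -0x102b)` gives 0x20d, which is 525 in decimal. Passing that offset and a count of 12 to `read_at` on the binary should return the same bytes dd printed.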
Executing our binary with strace also shows the new write syscall (with the bytes being written mixed into the output).
$ strace -f ./hack-the-planet
execve("./hack-the-planet", ["./hack-the-planet"], 0x74493abe64a8 /* 39 vars */) = 0
write(1, "Hello world\n", 12Hello world
)           = 12
exit(0)                                 = ?
+++ exited with 0 +++
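The `= 12` in the strace output is the value our write function hands back in r0. The program above ignores it, but a minimal sketch of how it could be interpreted follows, assuming the usual Linux convention that a raw syscall reports failure as a negative errno between -4095 and -1. The name `syscall_result` is hypothetical and not part of the original program:

```rust
/// Hypothetical helper, not part of the original program: turn the raw
/// return value of a Linux syscall into a Result.
/// Assumption: failures are reported as -errno in the range -4095..=-1.
fn syscall_result(ret: isize) -> Result<usize, i32> {
    if (-4095..0).contains(&ret) {
        // Negative values in this range carry the errno, negated.
        Err(-ret as i32)
    } else {
        // Anything else is a successful result, e.g. the bytes written.
        Ok(ret as usize)
    }
}
```

For the run traced above, `syscall_result(12)` would be `Ok(12)`. A return value of -9 would come back as `Err(9)`, which is EBADF on Linux.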
After running strip on the binary to remove some symbols, it is so small that, if you open it in a text editor, it fits on a screenshot:

3 March 2023

Louis-Philippe Véronneau: Goodbye Bullseye report from the Montreal 2023 BSP

Hello World! I haven't really had time to blog here since the start of the semester, as I've been pretty busy at work [1]. All this to say, this report for the Bug Squashing Party we held in Montreal last weekend is a little late, sorry :)

First of all, I'm pleased to announce our local community seems to be doing great and has recovered from the pandemic-induced lull. May COVID stay away from our bodies forever.

This time around, a total of 9 people made it to what has become somewhat of a biennial tradition [2]. We worked on a grand total of 14 bugs and even managed to close some!

It looks like I was too concentrated on bugs to take a picture of the event... To redeem myself, I hereby offer you a picture of a cute-but-hairless cat I met on Sunday morning:

[Picture of a curious sphinx cat on a table]

You should try to join an upcoming BSP or to organise one if you can. It's loads of fun and you'll be helping the project make the next release happen sooner!

As always, thanks to Debian for granting us a budget for the food and to rent the venue. Goodbye Bullseye!

  1. Which I guess is a good thing, since it means I actually have work this semester :O
  2. See our previous BSPs in 2017, 2019 and 2021.
