Search Results: "abi"

20 October 2021

Arturo Borrero González: Iterating on how we do NFS at Wikimedia Cloud Services

This post was originally published in the Wikimedia Tech blog, authored by Arturo Borrero Gonzalez.

NFS is a central piece of infrastructure that is essential to services like Toolforge. Recently, the Cloud Services team at Wikimedia has been reviewing how we do NFS.

The current situation

NFS is a central piece of technology for some of the services that the Wikimedia Cloud Services team offers to the community. We have several shares that power different use cases: Toolforge user home directories live on NFS, and Cloud VPS users can also access dumps using this protocol. The current setup involves several physical hardware servers, with about 20TB of storage, offering shares over 10G links to the cloud. For the system to be more fault-tolerant, we duplicate each share for redundancy using DRBD.

Running NFS on dedicated hardware servers has traditionally offered us advantages, mostly in performance and capacity. As time has passed, we have been enumerating more and more reasons to review how we do NFS. For one, the current setup is in violation of some of our internal rules regarding realm separation. Additionally, we had been longing for additional flexibility in managing our servers: we wanted to use virtual machines managed by Openstack Nova. The DRBD-based high-availability system required a mostly hand-crafted procedure for failover/failback. There are also some scalability concerns, as NFS is easy to scale up but not to scale horizontally, and of course we have to be able to keep the tenancy setup while doing so, something that NFS does by using LDAP/Unix users and that may also get complicated when growing.

In general, the servers have become "too big to fail", which is clearly technical debt, and it has taken us years to decide to take on the task of rethinking the architecture.
It's worth mentioning that in an ideal world, we wouldn't depend on NFS, but the truth is that it will still be a central piece of infrastructure for years to come in services like Toolforge.

Over a series of brainstorming meetings, the WMCS team evaluated the situation and sorted out the many moving parts. The team managed to boil down the potential service future to two competing options: adopting Openstack Manila, or serving NFS from a simple virtual machine. We then decided to research both options in parallel. For a number of reasons, the evaluation was timeboxed to three weeks.

Both ideas had a couple of points in common: the NFS data would be stored on our Ceph farm via Cinder volumes, and we would rely on Ceph reliability to avoid using DRBD. Another open topic was how to back up data from Ceph, to store our important bits in more than one basket. We will get to the backup topic later.

The Manila experiment

The Wikimedia Foundation was an early adopter of some Openstack components (Nova, Glance, Designate, Horizon), but Manila was never evaluated for usage until now. Our approach for this experiment was to closely follow the upstream guidelines. We read the documentation and tried to understand the different setups you can build with Manila. As we often feel with other Openstack components, the documentation doesn't perfectly describe how to introduce a given component in your particular local setup. Here we use an admin-controlled flat-topology Neutron network. This network is shared by all tenants (or projects) in our Openstack deployment.

Also, Manila can use many different driver backends, for things like NetApps or CephFS, which we don't use yet. After some research, the generic driver was the one that seemed to best fit our use case. The generic driver leverages Nova virtual machine instances plus Cinder volumes to create and manage the shares.

In general, Manila supports two operational modes, depending on whether it should create/destroy the share servers (i.e., the virtual machine instances) or not.
This option is called driver_handles_share_servers (or DHSS) and takes a boolean value. We were interested in trying DHSS=true, to really benefit from the potential of the setup.

Manila diagram: NFS idea 6 (original image in Wikitech)

So, after sorting out all these variables, we moved on with our initial testing. We built a PoC setup as depicted in the diagram above, with the manila-share component running in a virtual machine inside the cloud. The PoC led to us reporting several bugs upstream, and in some cases we tried to address these bugs ourselves.

It's worth mentioning that the upstream community was extra-welcoming to us, and we're thankful for that. However, at the end of our three-week period, our Manila setup still wasn't working as expected. Your experience may differ with other drivers, perhaps the ZFSonLinux or the CephFS ones. In general, we were having trouble making the setup work as expected, so we decided to abandon this approach in favor of the other option we were considering at the beginning.

Simple virtual machine serving NFS

The alternative was to create a Nova virtual machine instance by hand and to configure it using Puppet. We have been investing in an automation framework lately, so the idea is to not actually create the server by hand. Anyway, the data would be decoupled from the instance into Cinder volumes, which led us to the question we left for later: how should we back up those terabytes of important information? Just to be clear, the backup problem was independent of the above options; with Manila we would still have had to solve the same challenge. We would like to see our data backed up somewhere other than in Ceph.

And that's exactly where we are right now. We've been exploring different backup strategies and will finally use the Cinder backup API.

Conclusion

The iteration will end with the dedicated NFS hardware servers being stopped, and the shares being served from within the cloud.
The migration will take some time to happen because we will check and double-check that everything works as expected (including from the performance point of view) before making definitive changes. We already have some plans to make sure our users experience as little service impact as possible. The most troublesome shares will be those related to Toolforge. At some point we will need to disallow writes to the NFS share, rsync the data out of the hardware servers into the Cinder volumes, point the NFS clients to the new virtual machines, and then enable writes again. The main Toolforge share has about 8TB of data, so this will take a while.

We will have more updates in the future. Who knows, perhaps our next-next iteration, in a couple of years, will see us adopting Openstack Manila for good.

Featured image credit: File:(from break water) Manila Skyline panoramio.jpg, ewol, CC BY-SA 3.0

19 October 2021

Raphaël Hertzog: Freexian's report about Debian Long Term Support, September 2021

Like each month, have a look at the work funded by Freexian's Debian LTS offering.

Debian project funding

Folks from the LTS team, along with members of the Debian Android Tools team and Phil Morrel, have proposed work on the Java build tool, gradle, which is currently blocked due to the need to build with a plugin not available in Debian. The LTS team reviewed the project submission and it has been approved. After approval we've created a Request for Bids which is active now. You'll hear more about this through official Debian channels, but in the meantime, if you feel you can help with this project, please submit a bid. Thanks!

This September, Freexian set aside 2550 EUR to fund Debian projects. We're looking forward to receiving more projects from various Debian teams! Learn more about the rationale behind this initiative in this article.

Debian LTS contributors

In September, 15 contributors were paid to work on Debian LTS; their reports are available.

Evolution of the situation

In September we released 30 DLAs. September was also the second month of Jeremiah coordinating LTS contributors. Also, we would like to say that we are always looking for new contributors to LTS. Please contact Jeremiah if you are interested!

The security tracker currently lists 33 packages with a known CVE and the dla-needed.txt file has 26 packages needing an update.

Thanks to our sponsors

Sponsors that joined recently are in bold.

18 October 2021

Dirk Eddelbuettel: dang 0.0.14: Several Updates

A new release of the dang package arrived at CRAN a couple of hours ago, exactly eight months after the previous release. The dang package regroups a few functions of mine that had no other home, for example lsos() from a StackOverflow question from 2009 (!!), the overbought/oversold price band plotter from an older blog post, the market monitor from the last release, as well as the checkCRANStatus() function recently tweeted about by Tim Taylor.

This release regroups a few small edits to several functions, adds a sample function for character encoding reading and conversion using a library already used by R (hence "look Ma, no new depends"), adds a weekday helper and a sample usage (computing rolling min/max values) of a new simple vector class added to tidyCpp (and the function and class need to get another blog post or study), and adds an experimental git sha1sum and date marker (as I am not a fan of autogenerated binaries from repos as opposed to marked releases, meaning we may see different binary releases with the same version number). The full NEWS entry follows.

Changes in version 0.0.14 (2021-10-17)
  • Updated continuous integration to run on Linux only.
  • Edited checkNonAscii.cpp for readability.
  • More robust title display in intradayMarketMonitor.R.
  • New C++-based function to read and convert encoding via the R-supplied iconv library, noted a potential variability.
  • New function wday returning day of the week as integer.
  • The signature to as.data.table was standardized.
  • A new function rollMinMax was added illustrating use of the NumVec class from tidyCpp.
  • The configure script can record the last commit date and sha1 to automate timestamping builds, but not activated in this release.
  • checkCRANStatus() now works correctly for single-package lookups (Jordan Mark Barbone in #4).

Courtesy of my CRANberries, there is a comparison to the previous release. For questions or comments use the issue tracker off the GitHub repo. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

13 October 2021

Dirk Eddelbuettel: RcppQuantuccia 0.0.4 on CRAN: Updated Calendar

A new release of RcppQuantuccia arrived on CRAN earlier today. RcppQuantuccia brings the Quantuccia header-only subset / variant of QuantLib to R. At the current stage, it mostly offers date and calendaring functions. This release is the first in two years and brings a few internal updates (such as a switch of continuous integration to the trusted r-ci setup) along with a first update of the United States calendar which, just like RQuantLib, now knows about two new calendars, LiborUpdate and FederalReserve. So now we can, for example, look for holidays during June of next year under the Federal Reserve calendar and see
> library(RcppQuantuccia)
> setCalendar("UnitedStates/FederalReserve")
> getHolidays(as.Date("2022-06-01"), as.Date("2022-06-30"))
[1] "2022-06-20"
> 
that Juneteenth 2022 will be observed on (Monday) June 20th. We should note that Quantuccia itself was a bit of a trial balloon and is not actively maintained, so we may concentrate on these calendaring functions to keep them in sync with QuantLib. Being a header-only subset is good, and the removal of the (very !!) expensive (in terms of compiled library size) Sobol sequence-based RNG in release 0.0.3 was the right call. So time permitting, a leaner, meaner RcppQuantuccia with a calendaring focus may emerge. The complete list of changes follows.

Changes in version 0.0.4 (2021-10-12)
  • Allow for 'Null' calendar without weekends or holidays
  • Switch CI use to r-ci
  • Updated UnitedStates calendar to current QuantLib calendar
  • Small updates to DESCRIPTION and README.md

Courtesy of CRANberries, there is also a diffstat report relative to the previous release. More information is on the RcppQuantuccia page. Issues and bug reports should go to the GitHub issue tracker. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

12 October 2021

Antonio Terceiro: Triaging Debian build failure logs with collab-qa-tools

The Ruby team is now working on transitioning to ruby 3.0. Even though most packages will work just fine, there is a substantial number of packages that require some work to adapt. We have been doing test rebuilds for a while during transitions, but usually triaged the problems manually. This time I decided to try collab-qa-tools, a set of scripts Lucas Nussbaum uses when he does archive-wide rebuilds. I'm really glad that I did, because those tools save a lot of time when processing a large number of build failures.

In this post, I will go through how to triage a set of build logs using collab-qa-tools. I have made some improvements to the code. Given that my last merge request is very new and has not been merged yet, a few of the things I mention here may apply only to my own ruby3.0 branch. collab-qa-tools also contains a few tools to perform the builds in the cloud, but since we already had the builds done, I will not be mentioning that part and will write exclusively about the triaging tools.

Installing collab-qa-tools

The first step is to clone the git repository. Make sure you have the dependencies from debian/control installed (a few Ruby libraries). One of the patches I sent, which was already accepted, adds the ability to run the tools without installing them:
source /path/to/collab-qa-tools/activate.sh
This will add the tools to your $PATH.

Preparation

The first thing you need to do is get all your build logs into a directory. The tools assume the .log file extension, and the files can be named ${PACKAGE}_*.log or just ${PACKAGE}.log.

Creating a TODO file
cqa-scanlogs | grep -v OK > todo
todo will contain one line for each log with a summary of the failure, if it's able to find one. collab-qa-tools has a large set of regular expressions for finding errors in the build logs.

It's a good idea to split the TODO file into multiple ones. This can easily be done with split(1), and can be used to delimit triaging sessions, and/or to split the triaging between multiple people. For example, this will split todo into todo00, todo01, ..., each containing 30 lines:
split --lines=30 --numeric-suffixes todo todo
Triaging

You can now do the triaging. Let's say we split the TODO file, and will start with todo01. The first step is calling cqa-fetchbugs (it does what it says on the tin):
cqa-fetchbugs --TODO=todo01
Then, cqa-annotate will guide you through the logs and allow you to report bugs:
cqa-annotate --TODO=todo01
I wrote myself a process.sh wrapper script for cqa-fetchbugs and cqa-annotate that looks like this:
#!/bin/sh
set -eu
for todo in "$@"; do
  # force downloading bugs
  awk '{print(".bugs." $1)}' "${todo}" | xargs rm -f
  cqa-fetchbugs --TODO="${todo}"
  cqa-annotate \
    --template=template.txt.jinja2 \
    --TODO="${todo}"
done
The --template option is a recent contribution of mine. This is a template for the bug reports you will be sending. It uses Liquid templates, which is very similar to Jinja2 for Python. You will notice that I am even pretending it is Jinja2 to trick vim into doing syntax highlighting for me. The template I'm using looks like this:
From: {{ fullname }} <{{ email }}>
To: submit@bugs.debian.org
Subject: {{ package }}: FTBFS with ruby3.0: {{ summary }}
Source: {{ package }}
Version: {{ version | split:'+rebuild' | first }}
Severity: serious
Justification: FTBFS
Tags: bookworm sid ftbfs
User: debian-ruby@lists.debian.org
Usertags: ruby3.0
Hi,
We are about to enable building against ruby3.0 on unstable. During a test
rebuild, {{ package }} was found to fail to build in that situation.
To reproduce this locally, you need to install ruby-all-dev from experimental
on an unstable system or build chroot.
Relevant part (hopefully):
{% for line in extract %}> {{ line }}
{% endfor %}
The full build log is available at
https://people.debian.org/~kanashiro/ruby3.0/round2/builds/3/{{ package }}/{{ filename | replace:".log",".build.txt" }}
The cqa-annotate loop

cqa-annotate will parse each log file, display an extract of what it found as possibly being the relevant part, and wait for your input:
######## ruby-cocaine_0.5.8-1.1+rebuild1633376733_amd64.log ########
--------- Error:
     Failure/Error: undef_method :exitstatus
     FrozenError:
       can't modify frozen object: pid 2351759 exit 0
     # ./spec/support/unsetting_exitstatus.rb:4:in `undef_method'
     # ./spec/support/unsetting_exitstatus.rb:4:in `singleton class'
     # ./spec/support/unsetting_exitstatus.rb:3:in `assuming_no_processes_have_been_run'
     # ./spec/cocaine/errors_spec.rb:55:in `block (2 levels) in <top (required)>'
Deprecation Warnings:
Using `should` from rspec-expectations' old `:should` syntax without explicitly enabling the syntax is deprecated. Use the new `:expect` syntax or explicitly enable `:should` with `config.expect_with(:rspec) { |c| c.syntax = :should }` instead. Called from /<<PKGBUILDDIR>>/spec/cocaine/command_line/runners/backticks_runner_spec.rb:19:in `block (2 levels) in <top (required)>'.
If you need more of the backtrace for any of these deprecations to
identify where to make the necessary changes, you can configure
`config.raise_errors_for_deprecations!`, and it will turn the
deprecation warnings into errors, giving you the full backtrace.
1 deprecation warning total
Finished in 6.87 seconds (files took 2.68 seconds to load)
67 examples, 1 failure
Failed examples:
rspec ./spec/cocaine/errors_spec.rb:54 # When an error happens does not blow up if running the command errored before execution
/usr/bin/ruby3.0 -I/usr/share/rubygems-integration/all/gems/rspec-support-3.9.3/lib:/usr/share/rubygems-integration/all/gems/rspec-core-3.9.2/lib /usr/share/rubygems-integration/all/gems/rspec-core-3.9.2/exe/rspec --pattern ./spec/\*\*/\*_spec.rb --format documentation failed
ERROR: Test "ruby3.0" failed:
----------------
ERROR: Test "ruby3.0" failed:      Failure/Error: undef_method :exitstatus
----------------
package: ruby-cocaine
lines: 30
------------------------------------------------------------------------
s: skip
i: ignore this package permanently
r: report new bug
f: view full log
------------------------------------------------------------------------
Action [s i r f]:
You can then choose one of the options shown in the prompt. When there are existing bugs in the package, cqa-annotate will list them among the options. If you choose a bug number, the TODO file will be annotated with that bug number and new runs of cqa-annotate will not ask about that package anymore. For example, after I reported a bug for ruby-cocaine for the issue listed above, I aborted with a ctrl-c, and when I ran my process.sh script again I got this prompt:
----------------
ERROR: Test "ruby3.0" failed:      Failure/Error: undef_method :exitstatus
----------------
package: ruby-cocaine
lines: 30
------------------------------------------------------------------------
s: skip
i: ignore this package permanently
1: 996206 serious ruby-cocaine: FTBFS with ruby3.0: ERROR: Test "ruby3.0" failed:      Failure/Error: undef_method :exitstatus  
r: report new bug
f: view full log
------------------------------------------------------------------------
Action [s i 1 r f]:
Choosing 1 will annotate the TODO file with the bug number, and I'm done with this package. Only a few hundred others to go.

4 October 2021

Raphaël Hertzog: Freexian's report about Debian Long Term Support, August 2021

Like each month, have a look at the work funded by Freexian's Debian LTS offering.

Debian project funding

In August, we put aside 2460 EUR to fund Debian projects. We received a new project proposal that got approved and there's an associated bid request if you feel like proposing yourself to implement this project. We're looking forward to receiving more projects from various Debian teams! Learn more about the rationale behind this initiative in this article.

Debian LTS contributors

In August, 14 contributors were paid to work on Debian LTS; their reports are available.

Evolution of the situation

In August we released 30 DLAs.

This is the first month of Jeremiah coordinating LTS contributors. We would like to thank Holger Levsen for his work in this role up to now.

Also, we would like to remark once again that we are constantly looking for new contributors. Please contact Jeremiah if you are interested!

The security tracker currently lists 73 packages with a known CVE and the dla-needed.txt file has 29 packages needing an update.

Thanks to our sponsors

Sponsors that joined recently are in bold.

Paul Wise: FLOSS Activities September 2021

Focus
This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

Administration
  • Debian BTS: reopened bugs closed by a spammer
  • Debian wiki: unblock IP addresses, approve accounts

Communication
  • Respond to queries from Debian users and contributors on the mailing lists and IRC

Sponsors
The purple-discord/harmony/pyemd/librecaptcha/esprima-python work was sponsored by my employer. All other work was done on a volunteer basis.

27 September 2021

Russ Allbery: Review: The Problem with Work

Review: The Problem with Work, by Kathi Weeks
Publisher: Duke University Press
Copyright: 2011
ISBN: 0-8223-5112-9
Format: Kindle
Pages: 304
One of the assumptions baked deeply into US society (and many others) is that people are largely defined by the work they do, and that work is the primary focus of life. Even in Marxist analysis, which is otherwise critical of how work is economically organized, work itself reigns supreme. This has been part of the feminist critique of both capitalism and Marxism, namely that both devalue domestic labor that has traditionally been unpaid, but even that criticism is normally framed as expanding the definition of work to include more of human activity. A few exceptions aside, we shy away from fundamentally rethinking the centrality of work to human experience. The Problem with Work begins as a critical analysis of that centrality of work and a history of some less-well-known movements against it. But, more valuably for me, it becomes a discussion of the types and merits of utopian thinking, including why convincing other people is not the only purpose for making a political demand. The largest problem with this book will be obvious early on: the writing style ranges from unnecessarily complex to nearly unreadable. Here's an excerpt from the first chapter:
The lack of interest in representing the daily grind of work routines in various forms of popular culture is perhaps understandable, as is the tendency among cultural critics to focus on the animation and meaningfulness of commodities rather than the eclipse of laboring activity that Marx identifies as the source of their fetishization (Marx 1976, 164-65). The preference for a level of abstraction that tends not to register either the qualitative dimensions or the hierarchical relations of work can also account for its relative neglect in the field of mainstream economics. But the lack of attention to the lived experiences and political textures of work within political theory would seem to be another matter. Indeed, political theorists tend to be more interested in our lives as citizens and noncitizens, legal subjects and bearers of rights, consumers and spectators, religious devotees and family members, than in our daily lives as workers.
This is only a quarter of a paragraph, and the entire book is written like this. I don't mind the occasional use of longer words for their precise meanings ("qualitative," "hierarchical") and can tolerate the academic habit of inserting mostly unnecessary citations. I have less patience with the meandering and complex sentences, excessive hedge words ("perhaps," "seem to be," "tend to be"), unnecessarily indirect phrasing ("can also account for" instead of "explains"), or obscure terms that are unnecessary to the sentence (what is "animation of commodities"?). And please have mercy and throw a reader some paragraph breaks. The writing style means substantial unnecessary effort for the reader, which is why it took me six months to read this book. It stalled all of my non-work non-fiction reading and I'm not sure it was worth the effort. That's unfortunate, because there were several important ideas in here that were new to me. The first was the overview of the "wages for housework" movement, which I had not previously heard of. It started from the common feminist position that traditional "women's work" is undervalued and advocated taking the next logical step of giving it equality with paid work by making it paid work. This was not successful, obviously, although the increasing prevalence of day care and cleaning services has made it partly true within certain economic classes in an odd and more capitalist way. While I, like Weeks, am dubious this was the right remedy, the observation that household work is essential to support capitalist activity but is unmeasured by GDP and often uncompensated both economically and socially has only become more accurate since the 1970s. Weeks argues that the usefulness of this movement should not be judged by its lack of success in achieving its demands, which leads to the second interesting point: the role of utopian demands in reframing and expanding a discussion. 
I normally judge a political demand on its effectiveness at convincing others to grant that demand, by which standard many activist campaigns (such as wages for housework) are unsuccessful. Weeks points out that making a utopian demand changes the way the person making the demand perceives the world, and this can have value even if the demand will never be granted. For example, to demand wages for housework requires rethinking how work is defined, what activities are compensated by the economic system, how such wages would be paid, and the implications for domestic social structures, among other things. That, in turn, helps in questioning assumptions and understanding more about how existing society sustains itself. Similarly, even if a utopian demand is never granted by society at large, forcing it to be rebutted can produce the same movement in thinking in others. In order to rebut a demand, one has to take it seriously and mount a defense of the premises that would allow one to rebut it. That can open a path to discussing and questioning those premises, which can have long-term persuasive power apart from the specific utopian demand. It's a similar concept as the Overton Window, but with more nuance: the idea isn't solely to move the perceived range of accepted discussion, but to force society to examine its assumptions and premises well enough to defend them, or possibly discover they're harder to defend than one might have thought. Weeks applies this principle to universal basic income, as a utopian demand that questions the premise that work should be central to personal identity. I kept thinking of the Black Lives Matter movement and the demand to abolish the police, which (at least in popular discussion) is a more recent example than this book but follows many of the same principles. The demand itself is unlikely to be met, but to rebut it requires defending the existence and nature of the police. 
That in turn leads to questions about the effectiveness of policing, such as clearance rates (which are far lower than one might have assumed). Many more examples came to mind. I've had that experience of discovering problems with my assumptions I'd never considered when debating others, but had not previously linked it with the merits of making demands that may be politically infeasible. The book closes with an interesting discussion of the types of utopias, starting from the closed utopia in the style of Thomas More in which the author sets up an ideal society. Weeks points out that this sort of utopia tends to collapse with the first impossibility or inconsistency the reader notices. The next step is utopias that acknowledge their own limitations and problems, which are more engaging (she cites Le Guin's The Dispossessed). More conditional than that is the utopian manifesto, which only addresses part of society. The least comprehensive and the most open is the utopian demand, such as wages for housework or universal basic income, which asks for a specific piece of utopia while intentionally leaving unspecified the rest of the society that could achieve it. The demand leaves room to maneuver; one can discuss possible improvements to society that would approach that utopian goal without committing to a single approach. I wish this book were better-written and easier to read, since as it stands I can't recommend it. There were large sections that I read but didn't have the mental energy to fully decipher or retain, such as the extended discussion of Ernst Bloch and Friedrich Nietzsche in the context of utopias. But that way of thinking about utopian demands and their merits for both the people making them and for those rebutting them, even if they're not politically feasible, will stick with me. Rating: 5 out of 10

22 September 2021

Ian Jackson: Tricky compatibility issue - Rust's io::ErrorKind

This post is about some changes recently made to Rust's ErrorKind, which aims to categorise OS errors in a portable way.

Audiences for this post

Background and context

Error handling principles

Handling different errors differently is often important (although, sadly, often neglected). For example, if a program tries to read its default configuration file, and gets a "file not found" error, it can proceed with its default configuration, knowing that the user hasn't provided a specific config. If it gets some other error, it should probably complain and quit, printing the message from the error (and the filename). Otherwise, if the network fileserver is down (say), the program might erroneously run with the default configuration and do something entirely wrong.

Rust's portability aims

The Rust programming language tries to make it straightforward to write portable code. Portable error handling is always a bit tricky. One of Rust's facilities in this area is std::io::ErrorKind, which is an enum which tries to categorise (and, sometimes, enumerate) OS errors. The idea is that a program can check the error kind, and handle the error accordingly.

That these ErrorKinds are part of the Rust standard library means that to get this right, you don't need to delve down and get the actual underlying operating system error number, and write separate code for each platform you want to support. You can check whether the error is ErrorKind::NotFound (or whatever).

Because ErrorKind is so important in many Rust APIs, some code which isn't really doing an OS call can still have to provide an ErrorKind. For this purpose, Rust provides a special category ErrorKind::Other, which doesn't correspond to any particular OS error.

Rust's stability aims and approach

Another thing Rust tries to do is keep existing code working. More specifically, Rust tries to:
  1. Avoid making changes which would contradict the previously-published documentation of Rust's language and features.
  2. Tell you if you accidentally rely on properties which are not part of the published documentation.
By and large, this has been very successful. It means that if you write code now, and it compiles and runs cleanly, it is quite likely that it will continue to work properly in the future, even as the language and ecosystem evolve. This blog post is about a case where Rust failed to do (2), above, and, sadly, it turned out that several people had accidentally relied on something the Rust project definitely intended to change. Furthermore, it was something which needed to change. And the new (corrected) way of using the API is not so obvious. Rust enums, as relevant to io::ErrorKind (Very briefly:) When you have a value which is an io::ErrorKind, you can compare it with specific values:
    if error.kind() == ErrorKind::NotFound { ... }
  
But in Rust it's more usual to write something like this (which you can read like a switch statement):
    match error.kind() {
      ErrorKind::NotFound => use_default_configuration(),
      _ => panic!("could not read config file {}: {}", &file, &error),
    }
  
Here _ means "anything else". Rust insists that match statements are exhaustive, meaning that each one covers all the possibilities. So if you left out the line with the _, it wouldn't compile. Rust enums can also be marked non_exhaustive, which is a declaration by the API designer that they plan to add more kinds. This has been done for ErrorKind, so the _ is mandatory, even if you write out all the possibilities that exist right now: this ensures that if new ErrorKinds appear, they won't stop your code compiling. Improving the error categorisation The set of error categories stabilised in Rust 1.0 was too small. It missed many important kinds of error. This makes writing error-handling code awkward. In any case, we expect to add new error categories occasionally. I set about trying to improve this by proposing new ErrorKinds. This obviously needed considerable community review, which is why it took about 9 months. The trouble with Other and tests Rust has to assign an ErrorKind to every OS error, even ones it doesn't really know about. Until recently, it mapped all errors it didn't understand to ErrorKind::Other - reusing the category for "not an OS error at all". Serious people who write serious code like to have serious tests. In particular, testing error conditions is really important. For example, you might want to test your program's handling of disk full, to make sure it didn't crash, or corrupt files. You would set up some contraption that would simulate a full disk. And then, in your tests, you might check that the error was correct. But until very recently (still now, in Stable Rust), there was no ErrorKind::StorageFull. You would get ErrorKind::Other. If you were diligent you would dig out the OS error code (and check for ENOSPC on Unix, corresponding Windows errors, etc.). But that's tiresome. The more obvious thing to do is to check that the kind is Other. Obvious but wrong. 
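The pitfall described above can be sketched in a few lines. This is only an illustration, not code from any real test suite: the helper name is_disk_full is hypothetical, the error is constructed by hand rather than produced by a genuinely full disk, and the constant 28 assumes the usual Unix value of ENOSPC (Windows uses different codes).

```rust
use std::io;

/// Assumption: ENOSPC is 28, as on Linux and other common Unix systems.
const ENOSPC: i32 = 28;

/// Hypothetical test helper: true if `err` carries the OS "no space left" code.
/// It inspects the raw OS error number instead of the ErrorKind, so it keeps
/// working when the standard library recategorises the error.
fn is_disk_full(err: &io::Error) -> bool {
    err.raw_os_error() == Some(ENOSPC)
}

fn main() {
    // Stand-in for the error a write to a full disk would return.
    let err = io::Error::from_raw_os_error(ENOSPC);

    // Obvious but wrong: which kind is reported depends on the Rust version.
    // assert_eq!(err.kind(), io::ErrorKind::Other);

    // Tiresome but robust: check the OS error code itself.
    assert!(is_disk_full(&err));

    // An error built from a kind has no OS code at all.
    let synthetic = io::Error::new(io::ErrorKind::Other, "not an OS error");
    assert!(!is_disk_full(&synthetic));

    println!("kind reported by this stdlib: {:?}", err.kind());
}
```

A portable version would need a per-platform table of codes, which is exactly the work ErrorKind is meant to spare you.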
ErrorKind is non_exhaustive, implying that more error kinds will appear, and, naturally, these would more finely categorise previously-Other OS errors. Unfortunately, the documentation note
Errors that are Other now may move to a different or a new ErrorKind variant in the future.
was only added in May 2020. So the wrongness of the "obvious" approach was, itself, not very obvious. And even with that docs note, there was no compiler warning or anything. The unfortunate result is that there is a body of code out there in the world which might break any time an error that was previously Other becomes properly categorised. Furthermore, there was nothing stopping new people writing new obvious-but-wrong code. Chosen solution: Uncategorized The Rust developers wanted an engineered safeguard against the bug of assuming that a particular error shows up as Other. They chose the following solution: There is a new ErrorKind::Uncategorized which is now used for all OS errors for which there isn't a more specific categorisation. The fallback translation of unknown errors was changed from Other to Uncategorized. This is de jure justified by the fact that this enum has always been marked non_exhaustive. But in practice, because this bug wasn't previously detected, there is such code in the wild. That code now breaks (usually, in the form of failing test cases). Usually when Rust starts to detect a particular programming error, it is reported as a new warning, which doesn't break anything. But that's not possible here, because this is a behavioural change. The new ErrorKind::Uncategorized is marked unstable. This makes it impossible to write code on Stable Rust which insists that an error comes out as Uncategorized. So, one cannot now write code that will break when new ErrorKinds are added. That's the intended effect. The downside is that this does break old code, and, worse, it is not as clear as it should be what the fixed code looks like. Alternatives considered and rejected by the Rust developers Not adding more ErrorKinds This was not tenable. The existing set is already too small, and error categorisation is in any case expected to improve over time. 
Just adding ErrorKinds as had been done before This would mean occasionally breaking test cases (or, possibly, production code) when an error that was previously Other becomes categorised. The broken code would have been "obvious", but de jure wrong, just as it is now. So this option amounts to expecting this broken code to continue to be written and continuing to break it occasionally. Somehow using Rust's Edition system The Rust language has a system to allow language evolution, where code declares its Edition (2015, 2018, 2021). Code from multiple editions can be combined, so that the ecosystem can upgrade gradually. It's not clear how this could be used for ErrorKind, though. Errors have to be passed between code with different editions. If those different editions had different categorisations, the resulting programs would have incoherent and broken error handling. Also some of the schemes for making this change would mean that new ErrorKinds could only be stabilised about once every 3 years, which is far too slow. How to fix code broken by this change Most main-line error handling code already has a fallback case for unknown errors. Simply replacing any occurrence of Other with _ is right. How to fix thorough tests The tricky problem is tests. Typically, a thorough test case wants to check that the error is "precisely as expected" (as far as the test can tell). Now that unknown errors come out as an unstable Uncategorized variant that's not so easy. If the test is expecting an error that is currently not categorised, you want to write code that says "if the error is any of the recognised kinds, call it a test failure". What does "any of the recognised kinds" mean here? It doesn't mean any of the kinds recognised by the version of the Rust stdlib that is actually in use. That set might get bigger. When the test is compiled and run later, perhaps years later, the error in this test case might indeed be categorised. 
What you actually mean is "the error must not be any of the kinds which existed when the test was written". IMO therefore the right solution for such a test case is to cut and paste the current list of stable ErrorKinds into your code. This will seem wrong at first glance, because the list in your code and in Rust can get out of step. But when they do get out of step you want your version, not the stdlib's. So freezing the list at a point in time is precisely right. You probably only want to maintain one copy of this list, so put it somewhere central in your codebase's test support machinery. Periodically, you can update the list deliberately - and fix any resulting test failures. Unfortunately this approach is not suggested by the documentation. In theory you could work all this out yourself from first principles, given even the situation prior to May 2020, but it seems unlikely that many people have done so. In particular, cutting and pasting the list of recognised errors would seem very unnatural. Conclusions This was not an easy problem to solve well. I think Rust has done a plausible job given the various constraints, and the result is technically good. It is a shame that this change to make the error handling stability more correct caused the most trouble for the most careful people who write the most thorough tests. I also think the docs could be improved.
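The frozen-list idea can be sketched as a small piece of test support code. The names was_stable_when_frozen and check_not_yet_categorised are hypothetical, and the list is, to the best of my knowledge, the set of ErrorKind variants stable in Rust 1.55; check it against the documentation of the release you freeze at. Other is listed on the assumption that the test expects an OS error, which newer standard libraries no longer report as Other.

```rust
use std::io::{self, ErrorKind};

/// Frozen copy of the stable ErrorKind variants (assumed: as of Rust 1.55).
/// Deliberately NOT kept in step with the standard library.
fn was_stable_when_frozen(kind: ErrorKind) -> bool {
    matches!(
        kind,
        ErrorKind::NotFound
            | ErrorKind::PermissionDenied
            | ErrorKind::ConnectionRefused
            | ErrorKind::ConnectionReset
            | ErrorKind::ConnectionAborted
            | ErrorKind::NotConnected
            | ErrorKind::AddrInUse
            | ErrorKind::AddrNotAvailable
            | ErrorKind::BrokenPipe
            | ErrorKind::AlreadyExists
            | ErrorKind::WouldBlock
            | ErrorKind::InvalidInput
            | ErrorKind::InvalidData
            | ErrorKind::TimedOut
            | ErrorKind::WriteZero
            | ErrorKind::Interrupted
            | ErrorKind::Other
            | ErrorKind::UnexpectedEof
            | ErrorKind::Unsupported
            | ErrorKind::OutOfMemory
    )
}

/// For tests that expect an error which had no category when the test was
/// written: fail only if the kind is one of the frozen, already-known kinds.
fn check_not_yet_categorised(err: &io::Error) -> Result<(), String> {
    if was_stable_when_frozen(err.kind()) {
        Err(format!("unexpectedly categorised as {:?}", err.kind()))
    } else {
        Ok(())
    }
}

fn main() {
    // A kind from the frozen list counts as a test failure.
    let categorised = io::Error::new(ErrorKind::NotFound, "demo");
    assert!(check_not_yet_categorised(&categorised).is_err());
    println!("{:?}", check_not_yet_categorised(&categorised));
}
```

Keeping a single copy of this function in the test support code means that a deliberate update of the list happens in one place, and any test failures it causes show up together.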
edited shortly after posting, and again 2021-09-22 16:11 UTC, to fix HTML slips



16 September 2021

Chris Lamb: On Colson Whitehead's Harlem Shuffle

Colson Whitehead's latest novel, Harlem Shuffle, was always going to be widely reviewed, if only because his last two books won Pulitzer prizes. Still, after enjoying both The Underground Railroad and The Nickel Boys, I was certainly going to read his next book, regardless of what the critics were saying; indeed, it was actually quite agreeable to float above the manufactured energy of the book's launch. Saying that, I was encouraged to listen to an interview with the author by Ezra Klein. Now I had heard Whitehead speak once before when he accepted the Orwell Prize in 2020, and once again he came across as a pretty down-to-earth guy. Or if I were to emulate the detached and cynical tone Whitehead embodied in The Nickel Boys, after winning so many literary prizes in the past few years, he has clearly rehearsed how to respond to the cliched questions authors must be asked in every interview. With the obligatory throat-clearing of 'so, how did you get into writing?', for instance, Whitehead replies with his part of the catechism that 'It seemed like being a writer could be a cool job. You could work from home and not talk to people.' The response is the right combination of cute and self-effacing... and with its slight tone-deafness towards enforced isolation, it was no doubt honed before Covid-19. Harlem Shuffle tells three separate stories about Ray Carney, a furniture salesman and 'fence' for stolen goods in New York in the 1960s. Carney doesn't consider himself a genuine criminal though, and there's a certain logic to his relativistic morality. After all, everyone in New York City is on the take in some way, and if some 'lightly used items' in Carney's shop happened to have had 'previous owners', well, that's not quite his problem. 'Nothing solid in the city but the bedrock,' as one character dryly observes. 
Yet as Ezra pounces on in his NYT interview mentioned above, the focus on the Harlem underworld means there are very few women in the book, and Whitehead's circular response ('ah well, it's a book about the criminals at that time!') was a little unsatisfying. Not only did it feel uncharacteristically slippery of someone justly lauded for his unflinching power of observation (after all, it was the author who decided what to write about in the first place), it foreclosed on the opportunity to delve into why the heist and caper genres (from The Killing, The Feather Thief, Ocean's 11, etc.) have historically been a 'male' mode of storytelling. Perhaps knowing this to be the case, the conversation quickly steered towards Ray Carney's wife, Elizabeth, the only woman in the book who could be said to possess some plausible interiority. The following off-hand remark from Whitehead caught my attention:
My wife is convinced that [Elizabeth] knows everything about Carney's criminal life, and is sort of giving him a pass. And I'm not sure if that's true. I have to have to figure out exactly what she knows and when she knows it and how she feels about it.
I was quite taken by this, although not simply due to its effect on the story itself. As in, it immediately conjured up a charming picture of Whitehead's domestic arrangements: not only does Whitehead's wife feel free to disagree with what one of Whitehead's 'own' characters knows or believes, but Colson has no problem whatsoever sharing that disagreement with the public at large. (It feels somehow natural that Whitehead's wife believes her counterpart knows more than she lets on, whilst Whitehead himself imbues the protagonist's wife with a kind of neo-Victorian innocence.) I'm minded to agree with Whitehead's partner myself, if only due to the passages where Elizabeth is studiously ignoring Carney's otherwise unexplained freak-outs. But all of these meta-thoughts simply underline just how emancipatory the Death of the Author can be. This product of academic literary criticism (the term was coined by Roland Barthes' 1967 essay of the same name) holds that the original author's intentions, ideas or biographical background carry no especial weight in determining how others should interpret their work. It is usually understood as meaning that a writer's own views are no more valid or 'correct' than the views held by someone else. (As an aside, I've found that most readers who encounter this concept for the first time have been reading books in this way since they were young. But the opposite is invariably true with cinephiles, who often have a bizarre obsession with researching or deciphering the 'true' interpretation of a film.) And with all that in mind, can you think of a more wry example of the freeing (and fun) nature of the Death of the Author than an author's own partner dissenting with their (Pulitzer Prize-winning) husband on the position of a lynchpin character?
The 1964 Harlem riot began after James Powell, a 15-year-old African American, was shot and killed by Thomas Gilligan, an NYPD police officer in front of 10s of witnesses. Gilligan was subsequently cleared by a grand jury.
As it turns out, the reviews for Harlem Shuffle have been almost universally positive, and after reading it in the two days after its release, I would certainly agree it is an above-average book. But it didn't quite take hold of me in the way that The Underground Railroad or The Nickel Boys did, especially the later chapters of The Nickel Boys that were set in contemporary New York and could thus make some (admittedly fairly explicit) connections from the 1960s to the present day; that kind of connection is not there in Harlem Shuffle, or at least I did not pick up on it during my reading. I can see why one might take exception to that, though. For instance, it is certainly true that the week-long Harlem Riot forms a significant part of the plot, and some events in particular are entirely contingent on the ramifications of this momentous event. But it's difficult to argue the riot's impact is truly integral to the story, so not only is this uprising against police brutality almost regarded as a background event, any contemporary allusion to the murder of George Floyd is subsequently watered down. It's nowhere near the historical rubbernecking of Forrest Gump (1994), of course, but that's not a battle you should ever be fighting. Indeed, whilst a certain smoothness of affect is to be priced into the Whitehead reading experience, my initial overall reaction to Harlem Shuffle was fairly flat, despite all the action and intrigue on the page. The book perhaps belies its origins as a work conceived during quarantine; after all, the book is essentially comprised of three loosely connected novellas, almost as if the unreality and mental turbulence of lockdown prevented the author from performing the psychological 'deep work' of producing a novel-length text with his usual depth of craft. 
A few other elements chimed with this being a 'lockdown novel' as well, particularly the book's preoccupation with the sheer physicality of the city compared to the usual complex interplay between its architecture and its inhabitants. This felt like it had been directly absorbed into the book from the author walking around his deserted city, and thus being able to take in details for the first time:
The doorways were entrances into different cities - no, different entrances into one vast, secret city. Ever close, adjacent to all you know, just underneath. If you know where to look.
And I can't fail to mention that you can almost touch Whitehead's sublimated hunger to eat out again as well:
Stickups were chops - they cook fast and hot, you're in and out. A stakeout was ribs - fire down low, slow, taking your time. […] Sometimes when Carney jumped into the Hudson when he was a kid, some of that stuff got into his mouth. The Big Apple Diner served it up and called it coffee.
More seriously, however, the relatively thin personalities of minor characters then reminded me of the simulacrum of Zoom-based relationships, and the essentially unsatisfactory endings to the novellas felt reminiscent of lockdown pseudo-events that simply fizzle out without a bang. One of the stories ties up loose ends with: 'These things were usually enough to terminate a mob war, and they appeared to end the hostilities in this case as well.' They did? Well, okay, I guess.
The corner of 125th Street and Morningside Avenue in 2019, the purported location of Carney's fictional furniture store. Signage plays a prominent role in Harlem Shuffle, possibly due to the author's quarantine walks.
Still, it would be unfair to characterise myself as 'disappointed' with the novel, and none of this piece should be taken as really deep criticism. The book certainly was entertaining enough, and pretty funny in places as well:
Carney didn't have an etiquette book in front of him, but he was sure it was bad manners to sit on a man's safe. […] The manager of the laundromat was a scrawny man in a saggy undershirt painted with sweat stains. Launderer, heal thyself.
Yet I can't shake the feeling that every book you write is a book that you don't, and so we might need to hold out a little longer for Whitehead's 'George Floyd novel'. (Although it is for others to say how much of this sentiment is the expectations of a White Reader for The Black Author to ventriloquise the pain of 'their' community.) Some room for personal critique is surely permitted. I dearly missed the junk food energy of the dry and acerbic observations that run through Whitehead's previous work. At one point he had a good line on the model tokenisation that lurks behind 'The First Negro to...' labels, but the callbacks to this idea ceased without any payoff. Similar things happened with the not-so-subtle critiques of the American Dream:
'Entrepreneur?' Pepper said the last part like manure. 'That's just a hustler who pays taxes.' […] One thing I've learned in my job is that life is cheap, and when things start getting expensive, it gets cheaper still.
Ultimately, though, I think I just wanted more. I wanted a deeper exploration of how the real power in New York is not wielded by individual street hoodlums or even the cops but in the form of real estate, essentially serving as a synecdoche for Capital as a whole. (A recent take on this can be felt in Jed Rothstein's 2021 documentary, WeWork: Or the Making and Breaking of a $47 Billion Unicorn; it is perhaps pertinent to remember that the US President at the time this novel was written was affecting to be a real estate tycoon.) Indeed, just like the concluding scenes of J. J. Connolly's Layer Cake, although you can certainly pull off a cool heist against the Man, power ultimately resides in those who control the means of production... and a homespun furniture salesman on the corner of 125 & Morningside just ain't that. There are some nods to this kind of analysis in the conclusion of the final story ('Their heist unwound as if it had never happened, and Van Wyck kept throwing up buildings.'), but, again, I would have simply liked more. And when I attempted to then file this book away into the broader media landscape, given the current cultural visibility of 1960s pop culture (e.g. One Night in Miami (2020), Judas and the Black Messiah (2021), Summer of Soul (2021), etc.), Harlem Shuffle also seemed like a missed opportunity to critically analyse our (highly-qualified) longing for the civil rights era. I can certainly understand why we might look fondly on the cultural products from a period when politics was less alienated, when society was less atomised, and when it was still possible to imagine meaningful change, but in this dimension at least, Harlem Shuffle seems to merely contribute to this nostalgic escapism.

14 September 2021

Joachim Breitner: A Candid explainer: Quirks

This is the fifth and final post in a series about the interface description language Candid. If you made it this far, you now have a good understanding of what Candid is, what it is for and how it is used. For this final post, I'll put the spotlight on specific aspects of Candid that are maybe surprising, or odd, or quirky. This section will be quite opinionated, and could maybe be called "what I'd do differently if I'd re-do the whole thing". Note that these quirks are not serious problems, and they don't invalidate the overall design. I am writing this up not to discourage the use of Candid, but merely to help interested parties to understand it better.

References in the wire format When the work on Candid began at DFINITY, the Internet Computer was still far away from being a thing, and many fundamental aspects about it were still up in the air. In particular, there was still talk about embracing capabilities as a core feature of the application model, which would be implemented as opaque references on the system level, likely building on WebAssembly's host reference type proposal (which only landed recently), and could be used to model access permissions, custom tokens and many other things. So Candid is designed with that in mind, and you'll find that its wire format is not just a type table and a value table, but actually
a triple (T, M, R), where T ("type") and M ("memory") are sequences of bytes and R ("references") is a sequence of references.
Also the wire format for values of function and service types has an extra byte to distinguish between public references (represented by a principal and possibly a method name in the data part), and these opaque references. Alas, references never made it into the Internet Computer, so all Candid implementations simply ignore that part of the specification. But it's still in the spec, and if it confused you before, now you know why.

Hashed field names Candid record and variant types look like they have textual field names:
type T = record { syndactyle : nat; trustbuster : bool }
But that is actually only true superficially. The wire format for Candid only stores hashes of field names. So the above is actually equivalent to
type T = record { 4260381820 : nat; 3504418361 : bool }
or, for that matter, to
type T = record { destroys : bool; rectum : nat }
(Yes, I used an English word list to find these hash collisions. There aren't that many actually.) The benefit of such hashing is that the messages are a bit smaller in most (not all) cases, but it is a big annoyance for dynamic uses of Candid. It's the reason why tools like dfx, if they don't know the Candid interface of a service, will print the result with just the numerical hash, letting you guess which field is which. It also complicates host languages that derive Candid types from the host language, like Motoko, as some records (e.g. record { trustbuster : bool; destroys : int }) with field name hash collisions cannot be represented in Candid, and either the host language's type system needs to be Candid aware now (as is the case of Motoko), or serialization/deserialization will fail at runtime, or odd bugs can happen. (More discussion of this issue).
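The collisions above can be reproduced with a few lines of Rust. This is a minimal sketch of the field-name hash as I understand it from the Candid specification (the UTF-8 bytes read as digits of a base-223 number, reduced modulo 2^32); the function name candid_field_hash is my own, not an API of any Candid library.

```rust
/// Sketch of the Candid field-name hash: fold the UTF-8 bytes of the name
/// as digits of a base-223 number, wrapping around at 2^32.
fn candid_field_hash(name: &str) -> u32 {
    name.bytes()
        .fold(0u32, |h, b| h.wrapping_mul(223).wrapping_add(u32::from(b)))
}

fn main() {
    // The two collision pairs from the example above.
    assert_eq!(candid_field_hash("syndactyle"), 4260381820);
    assert_eq!(candid_field_hash("rectum"), 4260381820);
    assert_eq!(candid_field_hash("trustbuster"), 3504418361);
    assert_eq!(candid_field_hash("destroys"), 3504418361);

    for name in ["syndactyle", "rectum", "trustbuster", "destroys"].iter() {
        println!("{} -> {}", name, candid_field_hash(name));
    }
}
```

Because the hash is this cheap to compute, brute-forcing short names or a word list against an unknown field hash is entirely feasible, which is what makes the guessing tricks mentioned later in this post possible.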

Tuples Many languages have a built-in notion of a tuple type (e.g. (Int, Bool)), but Candid does not have such a type. The only first class product type is records. This means that tuples have to be encoded as records somehow. Conveniently(?) record fields are just numbers after all, so the type (Int, Bool) would be mapped to the type
record { 0 : int; 1 : bool }
So tuples can be expressed. But from my experience implementing the type mappings for Motoko and Haskell this is causing headaches. To get a good experience when importing from Candid, the tools have to try to reconstruct which records may have been tuples originally, and turn them into tuples. The main argument for the status quo is that Candid types should be canonical, and there should not be more than one product type, and records are fine, and there needs to be no anonymous product type. But that has never quite resonated with me, given the practical reality of tuple types in many host languages.
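The reconstruction step that importing tools have to perform can be sketched as a simple heuristic: a record is treated as a tuple if its field ids are exactly 0, 1, ..., n-1. The function name looks_like_tuple is hypothetical, and treating the empty record as the empty tuple is an arbitrary choice of this sketch that a real tool might make differently.

```rust
/// Heuristic sketch: does a record with these numeric field ids look like an
/// encoded tuple, i.e. are the ids exactly 0, 1, ..., n-1 in some order?
fn looks_like_tuple(field_ids: &[u32]) -> bool {
    let mut sorted = field_ids.to_vec();
    sorted.sort_unstable();
    sorted
        .iter()
        .enumerate()
        .all(|(index, &id)| id as usize == index)
}

fn main() {
    // record { 0 : int; 1 : bool } was probably (Int, Bool) originally.
    assert!(looks_like_tuple(&[0, 1]));
    // Hashed textual field names almost never form the sequence 0..n.
    assert!(!looks_like_tuple(&[4260381820, 3504418361]));
    // A gap in the numbering means it is not a plain tuple.
    assert!(!looks_like_tuple(&[0, 2]));
    println!("tuple heuristic behaves as expected");
}
```

The heuristic cannot tell a record that was deliberately written with fields 0 and 1 from a genuine tuple, which is part of the headache.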

Argument sequences Did I say that Candid does not have tuple types? Turns out it does, sort of. There is no first class anonymous product, but since functions take sequences of arguments and results, there is a tuple type right there:
func foo : (bool, int) -> (int, bool)
Again, I found that ergonomic interaction with host languages becomes relatively unwieldy by requiring functions to take and return sequences of values. This is especially true for languages where functions take one argument value or return one result type (the latter being very common). Here, return sequences of length one are turned into that type directly, longer argument sequences turn into the host language's tuple type, and nullary argument sequences turn into the idiomatic unit type. But this means that the types (int, bool) -> () and (record { 0 : int; 1 : bool }) -> () may be mapped to the same host language type, which causes problems when you hope to encode all necessary Candid type information in the host language. Another oddity with argument and result sequences is that you can give names to the entries, e.g. write
func hello : (last_name : text, first_name : text) -> ()
but these names are completely ignored! So while this looks like you can, for example, add new optional arguments in the middle, such as
func hello : (last_name : text, middle_name : opt text, first_name : text) -> ()
without breaking clients, this does not have the effect you think it has and will likely break. My suggestion is to never put names on function arguments and result values in Candid interfaces, and for anything that might be extended with new fields or where you want to name the arguments, use a single record type as the only argument:
func hello : (record { last_name : text; first_name : text }) -> ()
This allows you to add and remove arguments more easily and reliably.

Type shorthands The Candid specification defines a system of types, and then adds a number of "syntactic short-hands". For example, if you write blob in a Candid type description, it ought to mean the same as vec nat8. My qualm with that is that it doesn't always mean the same. A Candid type description is interpreted by a number of, say, "consumers". Two such consumers are part of the Candid specification:
  • The specification that defines the wire format for that type
  • The upgrading (subtyping) rules
But there are more! For every host language, there is some mapping from Candid types to host language types, and also generic tools like Candid UI are consumers of the type algebra. If these were to take the Candid specification as gospel, they would be forced to treat blob and vec nat8 the same, but that would be quite unergonomic and might cause performance regressions (most languages try to map blob to some compact binary data type, while vec t tends to turn into some form of array structure). So they need to be pragmatic and treat blob and vec nat8 differently. But then, for all practical purposes, blob is not just a short-hand of vec nat8. They are different types that just happen to have the same wire representations and subtyping relations. This affects not just blob, but also tuples (record { blob; int; bool }) and field names, as discussed above.

The value text format For a while after defining Candid, the only implementation was in Motoko, and all the plumbing was automatic, so there was never a need for users to explicitly handle Candid values, as all values were Motoko values. Still, for debugging and testing and such things, we eventually needed a way to print out Candid values, so the text format was defined ("To enable convenient debugging, the following grammar specifies a text format for values"). But soon the dfx tool learned to talk to canisters, and now users needed to enter Candid values on the command line, possibly even when talking to canisters for which the interface was not known to dfx. And, sure enough, the textual interface defined in the Candid spec was used. Unfortunately, since it was not designed for that use case, it is rather unwieldy:
  • It is quite verbose. You have to write record { … }, not just { … }. Vectors are written vec { …; … } instead of some conventional syntax like [ …, … ]. Variants are written as variant { error = "…" } with braces that don't add any value here, and something like #error "…" might have worked as well. With a bit more care, a more concise and ergonomic syntax might have been possible.
  • It wasn't designed to be sufficient to create a Candid value from it. If you write 5 it's unclear whether that's a nat or an int16 or what (and all of these have different wire representations). Type annotations were later added, but are relatively unwieldy, and don't cover all cases (e.g. a service reference with a recursive type cannot be represented in the textual format at the moment).
  • Not really the fault of the textual format, but some useful information about the types is not reflected in the type description that's part of the wire format. In particular not the field names, and whether a value was intended to be binary data (blob) or a list of small numbers (vec nat8), so pretty-printing such values requires guesswork. The Haskell library even tries to brute-force the hash to guess the field name, if it is short or in an English word list!
In hindsight I think it was too optimistic to assume that correct static type information is always available, and instead of actively trying to discourage dynamic use, Candid might be better if we had taken these (unavoidable?) use cases into account.
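To make the ambiguity of a bare number concrete, here is a small sketch of how the same value is laid out under three different types. It assumes the encodings as I understand them from the Candid specification: nat as unsigned LEB128, int as signed LEB128, and int16 as two little-endian bytes. The helper names leb128 and sleb128 are my own.

```rust
/// Unsigned LEB128, as assumed for Candid `nat`.
fn leb128(mut value: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7;
        if value == 0 {
            out.push(byte);
            return out;
        }
        out.push(byte | 0x80); // more bytes follow
    }
}

/// Signed LEB128, as assumed for Candid `int`.
fn sleb128(mut value: i64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (value & 0x7f) as u8;
        value >>= 7; // arithmetic shift keeps the sign
        let sign_bit_set = byte & 0x40 != 0;
        if (value == 0 && !sign_bit_set) || (value == -1 && sign_bit_set) {
            out.push(byte);
            return out;
        }
        out.push(byte | 0x80); // more bytes follow
    }
}

fn main() {
    // The literal 5: nat and int happen to agree, int16 does not.
    assert_eq!(leb128(5), vec![0x05]);
    assert_eq!(sleb128(5), vec![0x05]);
    assert_eq!(5i16.to_le_bytes().to_vec(), vec![0x05, 0x00]);

    // The literal 64: all three encodings differ.
    assert_eq!(leb128(64), vec![0x40]);
    assert_eq!(sleb128(64), vec![0xc0, 0x00]);
    assert_eq!(64i16.to_le_bytes().to_vec(), vec![0x40, 0x00]);
    println!("same literal, different bytes");
}
```

A tool that reads the text 64 without knowing the expected type therefore has to pick one of these byte sequences, and the receiver will reject or misread the others.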

Custom wire format At the beginning of this post, I have a "Candid is …" list. The list is relatively long, and the actual wire format is just one bullet point. Yes, defining a wire format that works isn't rocket science, and it was easiest to just make one up. But since most of the interesting meat of Candid is in other aspects (subtyping rules, host language integration), I wonder if it would have been better to use an existing, generic wire format, such as CBOR, and build Candid as a layer on top. This would give us plenty of tools and libraries to begin with. And maybe it would have reduced the barrier to entry for developers, which has now led to the very strange situation that DFINITY advocates for the universal use of Candid on the Internet Computer, so that all services can smoothly interact, but two of the most important services on the Internet Computer (the registry and the ledger) use Protobuf as their primary interface format, with Candid interfaces missing or an afterthought.

Sideways Interface Evolution This is not a quirk of Candid itself, but rather an idiom of how you can use Candid that emerged from our solution for record extensions and upgrades. Consider our example from before, a service with interface
service { add_user : (record { login : text; name : text }) -> () }
where you want to add an age field, which should be a number. The official way of doing that is to add that field with an optional type:
service { add_user : (record { login : text; name : text; age : opt nat }) -> () }
As explained above, this will not break old clients, as the decoder will treat a missing argument as null. So far so good. But often when adding such a field you don't want to bother new clients with the fact that this age was, at some point in the past, not there yet. And you can do that! The trick is to distinguish between the interface you publish and the interface you implement. You can (for example in your documentation) state that the interface is
service { add_user : (record { login : text; name : text; age : nat }) -> () }
which is not a subtype of the old type, but it is the interface you want new clients to work with. And then your implementation uses the type with opt nat. Calls from old clients will come through as null, and calls from new clients will come through as opt 42. We can see this idiom used in the Management Canister of the Internet Computer. The current documented interface only mentions a controllers : vec principal field in the settings, but the implementation still can handle both the old controller : principal and the new controllers field. It's probably advisable to let your CI system check that new versions of your service continue to implement all published interfaces, including past interfaces. But as long as the actual implementation's interface is a subtype of all interfaces ever published, this works fine. This pattern is related to when your service implements, say, http_request (so its implemented interface is a subtype of that common interface), but does not include that method in the published documentation (because clients of your service don't need to call it).
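To make the idiom concrete, here is a minimal Python sketch. It is not real Candid decoding: plain dictionaries stand in for decoded records, and the helper name `decode_add_user` is hypothetical. It only mimics how an implementation typed with `age : opt nat` might see calls from clients built against either published interface.

```python
from typing import Optional


def decode_add_user(payload: dict) -> dict:
    """Decode an add_user argument against the implemented record type.

    Implemented type (as in the text):
        record { login : text; name : text; age : opt nat }
    A dict stands in for the payload; this is NOT the Candid wire format.
    """
    login = payload["login"]
    name = payload["name"]
    # Old clients never send "age"; like an opt field, it decodes to None (null).
    age: Optional[int] = payload.get("age")
    if age is not None and (not isinstance(age, int) or age < 0):
        raise ValueError("age must be a natural number")
    return {"login": login, "name": name, "age": age}


# A client built against the old published interface (no age field).
old_client = decode_add_user({"login": "jdoe", "name": "John"})
# A client built against the new published interface (age : nat).
new_client = decode_add_user({"login": "jdoe", "name": "John", "age": 42})
print(old_client["age"], new_client["age"])
```

Both calls are accepted by the same implementation, which is the point of keeping the implemented type more permissive than the type shown to new clients.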

Self-describing Services As you have noticed, Candid was very much designed assuming that all parties always have the service type of services they want to interact with. But the Candid specification does not define how one can obtain the interface of a given service, and there isn't really an official way to do that on the Internet Computer. That is unfortunate, because many interesting features depend on that: Such as writing import C "ic:7e6iv-biaaa-aaaaf-aaada-cai" in your Motoko program, and having its type right there. Or tools like ic.rocks, that allow you to interact with any canister right there. One reason why we don't really have that feature yet is because of disagreements about how dynamic that feature should be. Should you be able to just ask the canister for its interface (and allow the canister to vary the response, for example if it can change its functionality over time, even without changing the actual wasm code)? Or is the interface a static property of the code, and one should be able to query the system for that data, without the canister's active involvement? Or, completely different, should interfaces be distributed out of band, maybe together with API documentation, or in some canister registry somewhere else? I always leaned towards the first of these options, but not convincingly enough. The second option requires system assistance, so more components to change, more teams to be involved that maybe intrinsically don't care a lot about this feature. And the third might have emerged as the community matures and builds such infrastructure, but that did not happen yet. In the end I sneaked in an implementation of the first into Motoko, arguing that even if we don't know yet how this feature will be implemented eventually, we all want the feature to exist somehow, and we really really want to unblock all the interesting applications it enables (e.g. Candid UI).
That's why every Motoko canister, and some Rust canisters too, implements a method
__get_candid_interface_tmp_hack : () -> (text)
that one can use to get the Candid interface file. The name was chosen to signal that this may not be the final interface, but like all good provisional solutions, it may last longer than intended. If that's the case, I'm not sorry. This concludes my blog post series about Candid, for now. If you want to know more, feel free to post your question on the DFINITY developer forum, and I'll probably answer.

12 September 2021

Vincent Bernat: Short feedback on Cisco pyATS and Genie Parser

Cisco pyATS is a framework for network automation and testing. It includes, among other things, an open-source multi-vendor set of parsers and models, Genie Parser. It features 2700 parsers for various commands over many network OS. On paper, this seems a great tool!
>>> from genie.conf.base import Device
>>> device = Device("router", os="iosxr")
>>> # Hack to parse outputs without connecting to a device
>>> device.custom.setdefault("abstraction", {})["order"] = ["os", "platform"]
>>> cmd = "show route ipv4 unicast"
>>> output = """
... Tue Oct 29 21:29:10.924 UTC
...
... O    10.13.110.0/24 [110/2] via 10.12.110.1, 5d23h, GigabitEthernet0/0/0/0.110
... """
>>> device.parse(cmd, output=output)
{'vrf': {'default': {'address_family': {'ipv4': {'routes': {'10.13.110.0/24': {'route': '10.13.110.0/24',
       'active': True,
       'route_preference': 110,
       'metric': 2,
       'source_protocol': 'ospf',
       'source_protocol_codes': 'O',
       'next_hop': {'next_hop_list': {1: {'index': 1,
          'next_hop': '10.12.110.1',
          'outgoing_interface': 'GigabitEthernet0/0/0/0.110',
          'updated': '5d23h'}}}}}}}}}}
First disappointment: pyATS is closed-source with some exceptions. This is quite annoying if you run into some issues outside Genie Parser. For example, although pyATS is using the ssh command, it cannot leverage my ssh_config file: pyATS resolves hostnames before providing them to ssh. There is no plan to open source pyATS. Then, Genie Parser has two problems:
  1. The data models used are dependent on the vendor and OS, despite the documentation saying otherwise. For example, the data model used for IPv4 interfaces is different between NX-OS and IOS-XR.
  2. The parsers rely on line-by-line regular expressions to extract data and some Python code as glue. This is fragile and may break silently.
To illustrate the second point, let's assume the output of show ipv4 vrf all interface is:
  Loopback10 is Up, ipv4 protocol is Up
    Vrf is default (vrfid 0x60000000)
    Internet protocol processing disabled
  Loopback30 is Up, ipv4 protocol is Down [VRF in FWD reference]
    Vrf is ran (vrfid 0x0)
    Internet address is 203.0.113.17/32
    MTU is 1500 (1500 is available to IP)
    Helper address is not set
    Directed broadcast forwarding is disabled
    Outgoing access list is not set
    Inbound  common access list is not set, access list is not set
    Proxy ARP is disabled
    ICMP redirects are never sent
    ICMP unreachables are always sent
    ICMP mask replies are never sent
    Table Id is 0x0
Because the regular expression to parse an interface name does not expect the extra data after the interface state, Genie Parser ignores the line starting the definition of Loopback30 and parses the output to this structure:[1]
{
  "Loopback10": {
    "int_status": "up",
    "oper_status": "up",
    "vrf": "ran",
    "vrf_id": "0x0",
    "ipv4": {
      "203.0.113.17/32": {
        "ip": "203.0.113.17",
        "prefix_length": "32"
      },
      "mtu": 1500,
      "mtu_available": 1500,
      "broadcast_forwarding": "disabled",
      "proxy_arp": "disabled",
      "icmp_redirects": "never sent",
      "icmp_unreachables": "always sent",
      "icmp_replies": "never sent",
      "table_id": "0x0"
    }
  }
}
While this bug is simple to fix, this is an uphill battle. Any existing or future slight variation in the output of a command could trigger another similar undetected bug, despite the extended test coverage. I have reported and fixed several other silent parsing errors: #516, #529, and #530. A more robust alternative would have been to use TextFSM and to trigger a warning when some output is not recognized, like Batfish, a configuration analysis tool, does. In the future, we should rely on YANG for data extraction, but it is currently not widely supported. SNMP is still a valid possibility but much information is not accessible through this protocol. In the meantime, I would advise you to only use Genie Parser with caution.
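The failure mode and the suggested remedy can be sketched with plain Python `re` (not TextFSM, and not Genie's actual code). The two patterns below are hypothetical and only written in the spirit of a line-by-line parser. In the default mode an unmatched line is silently skipped, so the VRF of Loopback30 lands on Loopback10. With `strict=True`, every unrecognized line triggers a warning.

```python
import re
import warnings

# Hypothetical patterns for illustration; these are not Genie Parser's regexes.
INTERFACE_RE = re.compile(
    r"^(?P<name>\S+) is (?P<admin>\w+), ipv4 protocol is (?P<oper>\w+)$"
)
VRF_RE = re.compile(r"^Vrf is (?P<vrf>\S+) \(vrfid (?P<vrf_id>\S+)\)$")


def parse(output: str, strict: bool = False) -> dict:
    """Parse interface blocks line by line; in strict mode, warn on unknown lines."""
    result: dict = {}
    current = None
    for raw in output.splitlines():
        line = raw.strip()
        if not line:
            continue
        m = INTERFACE_RE.match(line)
        if m:
            current = result.setdefault(m["name"], {})
            current["int_status"] = m["admin"].lower()
            current["oper_status"] = m["oper"].lower()
            continue
        m = VRF_RE.match(line)
        if m and current is not None:
            current.update(vrf=m["vrf"], vrf_id=m["vrf_id"])
            continue
        if strict:
            warnings.warn(f"unrecognized line: {line!r}")
    return result


OUTPUT = """\
Loopback10 is Up, ipv4 protocol is Up
  Vrf is default (vrfid 0x60000000)
Loopback30 is Up, ipv4 protocol is Down [VRF in FWD reference]
  Vrf is ran (vrfid 0x0)
"""

# Lenient mode: the Loopback30 header is dropped without any signal.
lenient = parse(OUTPUT)

# Strict mode: the same line is reported instead of being ignored.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    parse(OUTPUT, strict=True)
```

A warning does not fix the parser, but it turns a silent data corruption into something a test suite or an operator can notice.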

  1. As an exercise, the astute reader is asked to write the code to extract the IPv4 from this structure.
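One possible answer to the exercise, as a hedged sketch: the helper name `ipv4_addresses` is hypothetical, and the dictionary is an abridged copy of the structure shown above. The awkward part is that addresses and interface attributes share the same level under "ipv4", so the code has to tell them apart by shape.

```python
from typing import List

# Abridged copy of the parsed structure from the post.
parsed = {
    "Loopback10": {
        "int_status": "up",
        "oper_status": "up",
        "vrf": "ran",
        "vrf_id": "0x0",
        "ipv4": {
            "203.0.113.17/32": {"ip": "203.0.113.17", "prefix_length": "32"},
            "mtu": 1500,
            "mtu_available": 1500,
            "proxy_arp": "disabled",
            "table_id": "0x0",
        },
    }
}


def ipv4_addresses(interface: dict) -> List[str]:
    """Return addresses as "ip/prefix", skipping attribute keys such as "mtu"."""
    return [
        f'{value["ip"]}/{value["prefix_length"]}'
        for value in interface.get("ipv4", {}).values()
        # Only address entries are dictionaries carrying an "ip" key.
        if isinstance(value, dict) and "ip" in value
    ]


addresses = ipv4_addresses(parsed["Loopback10"])
```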

9 September 2021

Bits from Debian: DebConf21 online closes

On Saturday 28 August 2021, the annual Debian Developers and Contributors Conference came to a close. DebConf21 has been held online for the second time, due to the coronavirus (COVID-19) disease pandemic. All of the sessions have been streamed, with a variety of ways of participating: via IRC messaging, online collaborative text documents, and video conferencing meeting rooms. With 740 registered attendees from more than 15 different countries and a total of over 70 event talks, discussion sessions, Birds of a Feather (BoF) gatherings and other activities, DebConf21 was a large success. The setup made for former online events, involving Jitsi, OBS, Voctomix, SReview, nginx, Etherpad, and a web-based frontend for voctomix, has been improved and used for DebConf21 successfully. All components of the video infrastructure are free software, and configured through the Video Team's public ansible repository. The DebConf21 schedule included a wide variety of events, grouped in several tracks. The talks have been streamed using two rooms, and several of these activities have been held in different languages: Telugu, Portuguese, Malayalam, Kannada, Hindi, Marathi and English, allowing a more diverse audience to enjoy and participate. Between talks, the video stream has been showing the usual sponsors on a loop, but also some additional clips including photos from previous DebConfs, fun facts about Debian and short shout-out videos sent by attendees to communicate with their Debian friends. The Debian publicity team did the usual live coverage to encourage participation with micronews announcing the different events. The DebConf team also provided several mobile options to follow the schedule. For those who were not able to participate, most of the talks and sessions are already available through the Debian meetings archive website, and the remaining ones will appear in the following days.
The DebConf21 website will remain active for archival purposes and will continue to offer links to the presentations and videos of talks and events. Next year, DebConf22 is planned to be held in Prizren, Kosovo, in July 2022. DebConf is committed to a safe and welcoming environment for all participants. During the conference, several teams (Front Desk, Welcome team and Community team) have been available to help participants get the best experience from the conference, and to find solutions to any issue that may arise. See the web page about the Code of Conduct on the DebConf21 website for more details on this. Debian thanks the numerous sponsors for their commitment to support DebConf21, particularly our Platinum Sponsors: Lenovo, Infomaniak, Roche, Amazon Web Services (AWS) and Google. About Debian The Debian Project was founded in 1993 by Ian Murdock to be a truly free community project. Since then the project has grown to be one of the largest and most influential open source projects. Thousands of volunteers from all over the world work together to create and maintain Debian software. Available in 70 languages, and supporting a huge range of computer types, Debian calls itself the universal operating system. About DebConf DebConf is the Debian Project's developer conference. In addition to a full schedule of technical, social and policy talks, DebConf provides an opportunity for developers, contributors and other interested people to meet in person and work together more closely. It has taken place annually since 2000 in locations as varied as Scotland, Argentina, and Bosnia and Herzegovina. More information about DebConf is available from https://debconf.org/. About Lenovo As a global technology leader manufacturing a wide portfolio of connected products, including smartphones, tablets, PCs and workstations as well as AR/VR devices, smart home/office and data center solutions, Lenovo understands how critical open systems and platforms are to a connected world.
About Infomaniak Infomaniak is Switzerland's largest web-hosting company, also offering backup and storage services, solutions for event organizers, live-streaming and video on demand services. It wholly owns its datacenters and all elements critical to the functioning of the services and products provided by the company (both software and hardware). About Roche Roche is a major international pharmaceutical provider and research company dedicated to personalized healthcare. More than 100,000 employees worldwide work towards solving some of the greatest challenges for humanity using science and technology. Roche is strongly involved in publicly funded collaborative research projects with other industrial and academic partners and has supported DebConf since 2017. About Amazon Web Services (AWS) Amazon Web Services (AWS) is one of the world's most comprehensive and broadly adopted cloud platforms, offering over 175 fully featured services from data centers globally (in 77 Availability Zones within 24 geographic regions). AWS customers include the fastest-growing startups, largest enterprises and leading government agencies. About Google Google is one of the largest technology companies in the world, providing a wide range of Internet-related services and products such as online advertising technologies, search, cloud computing, software, and hardware. Google has been supporting Debian by sponsoring DebConf for more than ten years, and is also a Debian partner sponsoring parts of Salsa's continuous integration infrastructure within Google Cloud Platform. Contact Information For further information, please visit the DebConf21 web page at https://debconf21.debconf.org/ or send mail to press@debian.org.

8 September 2021

Thorsten Alteholz: My Debian Activities in August 2021

FTP master Yeah, Bullseye is released, thanks a lot to everybody involved! This month I accepted 242 and rejected 18 packages. The overall number of packages that got accepted was 253. Debian LTS This was the eighty-sixth month in which I did some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian. This month my overall workload has been 23.75h. During that time I did LTS and normal security uploads of: I also started to work on openssl, grilo and had to process packages from NEW on security-master. As the CVE of btrbk was later marked as no-dsa, an upload to stable and oldstable is needed now. Last but not least I did some days of frontdesk duties. Debian ELTS This month was the thirty-eighth ELTS month. During my allocated time I uploaded: I also started to work on openssl. Last but not least I did some days of frontdesk duties. Other stuff This month I uploaded new upstream versions of: On my neverending golang challenge I again uploaded some packages either for NEW or as source upload.

7 September 2021

Martin-Éric Racine: sudo apt-get update && sudo apt-get dist-upgrade

Debian 11 (codename Bullseye) was recently released. This was the smoothest upgrade I've experienced in some 20 years as a Debian user. In my haste, I completely forgot to first upgrade dpkg and apt, doing a straight dist-upgrade. Nonetheless, everything worked out of the box. No unresolved dependency cycles. Via my last-mile Gigabit connection, it took about 5 minutes to upgrade and reboot. Congratulations to everyone who made this possible! Since the upgrade, only a handful of bugs were found. I filed bug reports. Over these past few days, maintainers have started responding. In one particular case, my report exposed a CVE caused by copy-pasted code between two similar packages. The source package fixed their code to something more secure a few years ago, while the destination package missed it. The situation has been brought to Debian's security team's attention and should be fixed over the next few days. Afterthoughts Having recently experienced hard-disk problems on my main desktop, upgrading to Bullseye made me revisit a few issues. One of these was the possibility of transitioning to BTRFS. Last time I investigated the possibility was back when Ubuntu briefly switched their default filesystem to BTRFS. Back then, my feeling was that BTRFS wasn't ready for mainstream. For instance, the utility to convert an EXT2/3/4 partition to BTRFS corrupted the end of the partition. No thanks. However, in recent years, many large-scale online services have migrated to BTRFS and seem to be extremely happy with the result. Additionally, Linux kernel 5 added useful features such as background defragmentation. This got me pondering whether now would be a good time to migrate to BTRFS. Sadly it seems that the stock kernel shipping with Bullseye doesn't have any of these advanced features enabled in its configuration. Oh well. Geode The only point that has become problematic is my Geode hosts.
For one thing, upstream Rust maintainers have decided to ignore the fact that i686 is a specification and arbitrarily added compiler flags for more recent x86-32 CPUs to their i686 target. While Debian Rust maintainers have purposely downgraded the target, rustc still produces binaries that the Geode LX (essentially an i686 without PAE) cannot process. This affects fairly basic packages such as librsvg, which breaks SVG image support for a number of dependencies. Additionally, there have been persistent problems with systemd crashing on my Geode hosts whenever daemon-reload is issued. Then, a few days ago, problems started occurring with C++ binaries, because GCC-11 upstream enabled flags for more recent CPUs in their default i686 target. While I realize that SSE and similar recent CPU features produce better binaries, I cannot help but feel that treating CPU targets as anything other than a specification is a mistake. i686 is a specification. It is not a generic equivalent to x86-32.

Russell Coker: Oracle Cloud Free Tier

It seems that every cloud service of note has a free tier nowadays and the Oracle Cloud is the latest that I've discovered (thanks to r/homelab which I highly recommend reading). Here's Oracle's summary of what they offer for free [1]. Oracle's "always free" tier (where presumably "always" is defined as "until we change our contract") currently offers ARM64 VMs to a total capacity of 4 CPU cores, 24G of RAM, and 200G of storage with a default VM size of 1/4 that (1 CPU core and 6G of RAM). It also includes 2 AMD64 VMs that each have 1G of RAM, but a 64bit VM with 1G of RAM isn't that useful nowadays. Web Interface The first thing to note is that the management interface is a massive pain to use. When a login times out for security reasons it redirects to a web page that gives a 404 error. Maybe the redirection works OK if you are using it when it times out, but if you go off and spend an hour doing something else you will return to a 404 page. A web interface should never refer you to a page with a 404. There doesn't seem to be a way of bookmarking the commonly used links (as AWS does) and the set of links on the left depends on the section you are in, with no obvious way of going between sections. Sometimes I got stuck in a set of pages about authentication controls (the "identity cloud") and there seemed to be no link I could click on to get me back to cloud computing; I had to go to a bookmarked link for the main cloud login page. A web interface should never force the user to type in the main URL or go to a bookmark; you should be able to navigate from every page to every other page in a logical manner. An advanced user might have their own bookmarks in their browser to suit their workflow. But a beginner should be able to go anywhere without breaking the session. Some parts of the interface appear to be copied from AWS, but unfortunately not the good parts.
The way AWS manages IP access control is not easy to manage and it's not clear why packets are dropped; Oracle copies all this. On the upside Oracle has some good Datadog style analytics so for a new deployment you can debug IP access control by seeing records of rejected packets. Just to make it extra annoying, when you create a rule with multiple ports specified the web interface will expand it out to multiple rules for one port each; having ports 80 and 443 on separate lines doesn't make things easier. Also it forces you to have IPv4 and IPv6 as separate rules, so if you want HTTP and HTTPS on both IPv4 and IPv6 (a common requirement) then you need 4 separate rules. One final annoying thing is that the web interface doesn't make your previous settings a default. As I've created many ARM images and haven't created a single AMD image it should know that the probability that I want to create an AMD image is very low and stop defaulting to that. Recovery When trying a new system you will inevitably break things and have to recover things. The way to recover from a configuration error that prevents your VM from booting and getting to a state of allowing a login is to stop the VM, then go to the Boot volume section under Resources and use the settings button to detach the boot volume. Then you go to another VM (which must be running), go to the Attached block volumes menu and attach it as Paravirtualised (not iSCSI and not default which will probably be iSCSI). After some time the block device will appear and you can mount it and do stuff to it. Then after umounting it you detach it from the recovery VM and attach it again to the original VM (where it will still have an entry in the Boot volume section) and boot the original VM. As an aside it's really annoying that you can't attach a volume to a VM that isn't running.
My first attempt at image recovery started with making a snapshot of the Boot volume. This didn't work well because the image uses EFI and therefore GPT and because the snapshot was larger than the original block device (which incidentally was the default size). I admit that I might have made a mistake making the snapshot, but if so it shouldn't be so easy to do. With GPT if you have a larger block device then partitioning tools complain about the backup partition table not being found, and they complain even more if you try to go back to the smaller size later on. Generally GPT partition tables are a bad idea for VMs; when I run the host I don't use partition tables, I have a separate block device for each filesystem or swap space. Snapshots aren't needed for recovery, they don't seem to work very well, and if it's possible to attach a snapshot to a VM in place of its original Boot volume I haven't figured out how to do it. Console Connection If you boot Oracle Linux, a derivative of RHEL that has SE Linux enabled in enforcing mode (yay), then you can go to the Console connection. The console is a Javascript console which allows you to login on a virtual serial console on device /dev/ttyAMA0. It tells you to type help but that isn't accepted, you have a straight Linux console login prompt. If you boot Ubuntu then you don't get a working serial console, it tells you to type help for help but doesn't respond to that. It seems that the Oracle Linux kernel 5.4.17-2102.204.4.4.el7uek.aarch64 is compiled with support for /dev/ttyAMA0 (the default ARM serial device) while the kernel 5.11.0-1016-oracle compiled by Oracle for their Ubuntu VMs doesn't have it. Performance I haven't done any detailed tests of VM performance. As a quick test I used zstd to compress a 154MB file, on my home workstation (E5-2620 v4 @ 2.10GHz) it took 11.3 seconds of CPU time to compress with zstd -9 and 7.2s to decompress. On the Oracle cloud it took 7.2s and 5.4s.
So it seems that for some single core operations the ARM CPU used by the Oracle cloud is about 30% to 50% faster than an E5-2620 v4 (a slightly out of date server processor that uses DDR4 RAM). If you ran all the free resources in a single VM that would make a respectable build server. If you want to contribute to free software development and only have a laptop with 4G of RAM then an ARM build/test server with 24G of RAM and 4 cores would be very useful. Ubuntu Configuration The advantage of using EFI is that you can manage the kernel from within the VM. The default Oracle kernel for Ubuntu has a lot of modules included and is compiled with a lot of security options including SE Linux. Competitors https://aws.amazon.com/free AWS offers 750 hours (just over 31 days) per month of free usage of a t2.micro or t3.micro EC2 instance (which means 1GB of RAM). But that only lasts for 12 months and it's still only 1GB of RAM. AWS has some other things that could be useful like 1 million free Lambda requests per month. If you want to run your personal web site on Lambda you shouldn't hit that limit. They also apparently have some good offers for students. https://cloud.google.com/free The Google Cloud Platform (GCP) offers $300 of credit. https://cloud.google.com/free/docs/gcp-free-tier#free-tier-usage-limits GCP also has ongoing free tier usage for some services. Some of them are pretty much unlimited use (50GB of storage for Cloud Source Repositories is a heap of source code). But for VMs you get the equivalent of 1*e2-micro instance running 24*7. An e2-micro has 1G of RAM. You also only get 30G of storage and 1GB of outbound data. It's clearly not as generous an offer as Oracle, but Oracle is the underdog so they have to try harder. https://azure.microsoft.com/en-us/free/ Azure appears to be much the same as AWS, free Linux VM for a year and then other less popular services free forever (or until they change the contract).
https://www.ibm.com/cloud/free The IBM cloud free tier is the least generous offer, a VM is only free for 30 days. But what they offer for 30 days is pretty decent. If you want to try the IBM cloud and see if it can do what your company needs then this will do well. If you want to have free hosting for your hobby stuff then it's no good. Oracle seems like the most generous offer if you want to do stuff, but also one of the least valuable if you want to learn things that will help you at a job interview. For job interviews AWS seems the most useful, with GCP and Azure vying for second place.

6 September 2021

Vincent Bernat: Switching to the i3 window manager

I have been using the awesome window manager for 10 years. It is a tiling window manager, configurable and extendable with the Lua language. Using a general-purpose programming language to configure every aspect is a double-edged sword. Due to laziness and the apparent difficulty of adapting my configuration (about 3000 lines) to newer releases, I was stuck with the 3.4 version, whose last release is from 2013. It was time for a rewrite. Instead, I have switched to the i3 window manager, lured by the possibility to migrate to Wayland and Sway later with minimal pain. Using an embedded interpreter for configuration is not as important to me as it was in the past: it brings both complexity and brittleness.
Dual screen desktop running i3, Emacs, some terminals, including a Quake console, Firefox, Polybar as the status bar, and Dunst as the notification daemon.
The window manager is only one part of a desktop environment. There are several options for the other components. I am also introducing them in this post.

i3: the window manager i3 aims to be a minimal tiling window manager. Its documentation can be read from top to bottom in less than an hour. i3 organizes windows in a tree. Each non-leaf node contains one or several windows and has an orientation and a layout. This information arbitrates the window positions. i3 features three layouts: split, stacking, and tabbed. They are demonstrated in the screenshot below:
Demonstration of the layouts available in i3. The main container is split horizontally. The first child is split vertically. The second one is tabbed. The last one is stacking.
Tree representation of the previous screenshot.
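Since the screenshots do not survive in text form, here is a small Python sketch of the tree described in the caption above. The `Node` class is hypothetical and is not i3's internal representation. The layout strings follow the names used in i3's IPC tree ("splith", "splitv", "tabbed", "stacked"), and the two windows per container are an assumption.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """A container in the layout tree; a node without children is a window."""
    name: str
    layout: str = ""  # empty for windows (leaves)
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List["Node"]:
        """Return the windows below this node, in left-to-right tree order."""
        if not self.children:
            return [self]
        return [leaf for child in self.children for leaf in child.leaves()]


# Main container split horizontally; first child split vertically,
# second one tabbed, last one stacking (window counts are made up).
tree = Node("main", "splith", [
    Node("first", "splitv", [Node("w1"), Node("w2")]),
    Node("second", "tabbed", [Node("w3"), Node("w4")]),
    Node("third", "stacked", [Node("w5"), Node("w6")]),
])

window_names = [leaf.name for leaf in tree.leaves()]
```

Walking the leaves in order is what makes directional navigation feel different from cycling through a flat window list: the position of a window depends on every container above it.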
Most of the other tiling window managers, including the awesome window manager, use predefined layouts. They usually feature a large area for the main window and another area divided among the remaining windows. These layouts can be tuned a bit, but you mostly stick to a couple of them. When a new window is added, the behavior is quite predictable. Moreover, you can cycle through the various windows without thinking too much as they are ordered. i3 is more flexible with its ability to build any layout on the fly, but it can feel quite overwhelming as you need to visualize the tree in your head. At first, it is not unusual to find yourself with a complex tree with many useless nested containers. Moreover, you have to navigate windows using directions. It takes some time to get used to. I set up a split layout for Emacs and a few terminals, but most of the other workspaces are using a tabbed layout. I don't use the stacking layout. You can find many scripts trying to emulate other tiling window managers, but I tried to keep my setup free of such attempts to give myself a chance to get familiar with i3. i3 can also save and restore layouts, which is quite a powerful feature. My configuration is quite similar to the default one and has less than 200 lines.

i3 companion: the missing bits i3's philosophy is to keep a minimal core and let users implement missing features using the IPC protocol:
"Do not add further complexity when it can be avoided. We are generally happy with the feature set of i3 and instead focus on fixing bugs and maintaining it for stability. New features will therefore only be considered if the benefit outweighs the additional complexity, and we encourage users to implement features using the IPC whenever possible." (Introduction to the i3 window manager)
While this is not as powerful as an embedded language, it is enough for many cases. Moreover, as high-level features may be opinionated, delegating them to small, loosely coupled pieces of code keeps them more maintainable. Libraries exist for this purpose in several languages. Users have published many scripts to extend i3: automatic layout and window promotion to mimic the behavior of other tiling window managers, window swallowing to put a new app on top of the terminal launching it, and cycling between windows with Alt+Tab. Instead of maintaining a script for each feature, I have centralized everything into a single Python process, i3-companion, using asyncio and the i3ipc-python library. Each feature is self-contained in a function. It implements the following components:
make a workspace exclusive to an application
When a workspace contains Emacs or Firefox, I would like other applications to move to another workspace, except for the terminal which is allowed to intrude into any workspace. The workspace_exclusive() function monitors new windows and moves them if needed to an empty workspace or to one with the same application already running.
implement a Quake console
The quake_console() function implements a drop-down console available from any workspace. It can be toggled with Mod+ . This is implemented as a scratchpad window.
back and forth workspace switching on the same output
With the workspace back_and_forth command, we can ask i3 to switch to the previous workspace. However, this feature is not restricted to the current output. I prefer to have one keybinding to switch to the workspace on the next output and one keybinding to switch to the previous workspace on the same output. This behavior is implemented in the previous_workspace() function by keeping a per-output history of the focused workspaces.
create a new empty workspace or move a window to an empty workspace
To create a new empty workspace or move a window to an empty workspace, you have to locate a free slot and use workspace number 4 or move container to workspace number 4. The new_workspace() function finds a free number and uses it as the target workspace.
restart some services on output change
When adding or removing an output, some actions need to be executed: refresh the wallpaper, restart some components unable to adapt their configuration on their own, etc. i3 triggers an event for this purpose. The output_update() function also takes an extra step to coalesce multiple consecutive events and to check if there is a real change with the low-level library xcffib.
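Two of the components above boil down to small pieces of bookkeeping, which can be sketched without any i3 connection. This is not the code from i3-companion: the `WorkspaceHistory` class and the `free_workspace_number` function are hypothetical names, output names such as "DP-1" are examples, and picking the smallest free number is an assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Optional


class WorkspaceHistory:
    """Remember, for each output, the current and the previous workspace."""

    def __init__(self) -> None:
        self._history: Dict[str, List[str]] = defaultdict(list)

    def focus(self, output: str, workspace: str) -> None:
        """Record that `workspace` got the focus on `output`."""
        history = self._history[output]
        if history and history[-1] == workspace:
            return  # focusing the same workspace again changes nothing
        history.append(workspace)
        del history[:-2]  # keep only the current and the previous workspace

    def previous(self, output: str) -> Optional[str]:
        """Return the previously focused workspace on `output`, if any."""
        history = self._history[output]
        return history[-2] if len(history) >= 2 else None


def free_workspace_number(used: Iterable[int]) -> int:
    """Return the smallest positive workspace number not in use."""
    taken = set(used)
    number = 1
    while number in taken:
        number += 1
    return number


history = WorkspaceHistory()
history.focus("DP-1", "1")
history.focus("HDMI-1", "5")  # activity on another output
history.focus("DP-1", "2")
```

With this state, `history.previous("DP-1")` points back to workspace "1" even though workspace "5" was focused in between on the other output, and `free_workspace_number([1, 2, 3, 5])` would suggest 4 as the target.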
I will detail the other features as this post goes on. On the technical side, each function is decorated with the events it should react to:
@on(CommandEvent("previous-workspace"), I3Event.WORKSPACE_FOCUS)
async def previous_workspace(i3, event):
    """Go to previous workspace on the same output."""
The CommandEvent() event class is my way to send a command to the companion, using either i3-msg -t send_tick or binding a key to a nop command. The latter is used to avoid spawning a shell and an i3-msg process just to send a message. The companion listens to binding events and checks if this is a nop command.
bindsym $mod+Tab nop "previous-workspace"
There are other decorators to avoid code duplication: @debounce() to coalesce multiple consecutive calls, @static() to define a static variable, and @retry() to retry a function on failure. The whole script is a bit more than 1000 lines. I think this is worth a read as I am quite happy with the result.
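For readers unfamiliar with this kind of helper, here is a minimal sketch of what retry and debounce decorators for asyncio functions may look like. It is not the code from i3-companion: the semantics are assumptions (retry(n) allows n extra attempts after the first failure, debounce(delay) only runs the most recent call once no newer call arrived during the delay).

```python
import asyncio
import functools


def retry(max_retries):
    """Retry an async function up to `max_retries` extra times on failure."""

    def decorator(fn):
        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            remaining = max_retries
            while True:
                try:
                    return await fn(*args, **kwargs)
                except Exception:
                    if remaining == 0:
                        raise  # out of retries: propagate the last error
                    remaining -= 1

        return wrapper

    return decorator


def debounce(delay):
    """Coalesce calls: only the last call made within `delay` seconds runs."""

    def decorator(fn):
        generation = 0  # incremented on every call to detect newer calls

        @functools.wraps(fn)
        async def wrapper(*args, **kwargs):
            nonlocal generation
            generation += 1
            mine = generation
            await asyncio.sleep(delay)
            if mine != generation:
                return None  # superseded by a more recent call
            return await fn(*args, **kwargs)

        return wrapper

    return decorator
```

With this sketch, three calls issued back to back to a function decorated with @debounce(0.2) result in a single execution, using the arguments of the last call.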

dunst: the notification daemon
Unlike the awesome window manager, i3 does not come with a built-in notification system. Dunst is a lightweight notification daemon. I am running a modified version with HiDPI support for X11 and recursive icon lookup. The i3 companion has a helper function, notify(), to send notifications using DBus. container_info() and workspace_info() use it to display information about the container or the tree for a workspace.
Notification showing i3's tree for a workspace

polybar: the status bar
i3 bundles i3bar, a versatile status bar, but I have opted for Polybar. A wrapper script runs one instance for each monitor. The first module is the built-in support for i3 workspaces. To avoid having to remember which application is running in a workspace, the i3 companion renames workspaces to include an icon for each application. This is done in the workspace_rename() function. The icons are from the Font Awesome project. I maintain a mapping between applications and icons. This is a bit cumbersome but it looks great.
i3 workspaces in Polybar
For CPU, memory, brightness, battery, disk, and audio volume, I am relying on the built-in modules. Polybar's wrapper script generates the list of filesystems to monitor and they are only displayed when available space is low. The battery widget turns red and blinks slowly when running out of power. Check my Polybar configuration for more details.
Various modules for Polybar
Polybar displaying various information: CPU usage, memory usage, screen brightness, battery status, Bluetooth status (with a connected headset), network status (connected to a wireless network and to a VPN), notification status, and speaker volume.
For Bluetooth, network, and notification statuses, I am using Polybar's ipc module: the next version of Polybar can receive an arbitrary text on an IPC socket. The module is defined with a single hook to be executed at the start to restore the latest status.
[module/network]
type = custom/ipc
hook-0 = cat $XDG_RUNTIME_DIR/i3/network.txt 2> /dev/null
initial = 1
It can be updated with polybar-msg action "#network.send.XXXX". In the i3 companion, the @polybar() decorator takes the string returned by a function and pushes the update through the IPC socket. The i3 companion reacts to DBus signals to update the Bluetooth and network icons. The @on() decorator accepts a DBusSignal() object:
@on(
    StartEvent,
    DBusSignal(
        path="/org/bluez",
        interface="org.freedesktop.DBus.Properties",
        member="PropertiesChanged",
        signature="sa{sv}as",
        onlyif=lambda args: (
            args[0] == "org.bluez.Device1"
            and "Connected" in args[1]
            or args[0] == "org.bluez.Adapter1"
            and "Powered" in args[1]
        ),
    ),
)
@retry(2)
@debounce(0.2)
@polybar("bluetooth")
async def bluetooth_status(i3, event, *args):
    """Update bluetooth status for Polybar."""
The middle of the bar is occupied by the date and a weather forecast. The latter also uses the IPC mechanism, but the source is a Python script triggered by a timer.
Date and weather in Polybar
Current date and weather forecast for the day in Polybar. The data is retrieved with the OpenWeather API.
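Pushing a text to a custom/ipc module from a script can be sketched as follows. This is an illustration, not the author's code: polybar_action(), cache_path() and save_status() are hypothetical helpers, the command line follows the polybar-msg action "#network.send.XXXX" form quoted earlier, and the cache file location mirrors the one read by the startup hook.

```python
import os
from pathlib import Path


def polybar_action(module, text):
    """Build the polybar-msg command line sending `text` to `module`."""
    return ["polybar-msg", "action", f"#{module}.send.{text}"]


def cache_path(module, runtime_dir=None):
    """Return the file read by the module's startup hook to restore its status."""
    # Falling back to /tmp is an assumption for sessions without XDG_RUNTIME_DIR.
    runtime_dir = runtime_dir or os.environ.get("XDG_RUNTIME_DIR", "/tmp")
    return Path(runtime_dir) / "i3" / f"{module}.txt"


def save_status(module, text, runtime_dir=None):
    """Persist the latest status so it can be displayed again after a restart."""
    path = cache_path(module, runtime_dir)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text)
    return path
```

A caller would typically save the status first, then hand the list returned by polybar_action() to subprocess.run().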
I don't use the system tray integrated with Polybar. The embedded icons usually look horrible and they all behave differently. A few years back, GNOME removed the system tray. Most of the problems are fixed by the DBus-based Status Notifier Item protocol, also known as Application Indicators or Ayatana Indicators for GNOME. However, Polybar does not support this protocol. In the i3 companion, the implementation of Bluetooth and network icons, including displaying notifications on change, takes about 200 lines. I got to learn a bit about how DBus works and I get exactly the info I want.
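The onlyif filter attached to bluetooth_status() earlier relies on Python's operator precedence: and binds tighter than or. The sketch below spells it out as a standalone function with explicit parentheses. The name bluetooth_filter() is hypothetical, and the arguments are assumed to follow the PropertiesChanged signal of org.freedesktop.DBus.Properties: interface name, dictionary of changed properties, list of invalidated properties.

```python
def bluetooth_filter(args):
    """Return True for property changes relevant to the Bluetooth status."""
    interface, changed = args[0], args[1]
    return (
        # A device got connected or disconnected.
        (interface == "org.bluez.Device1" and "Connected" in changed)
        # The adapter was powered on or off.
        or (interface == "org.bluez.Adapter1" and "Powered" in changed)
    )
```

Any other change, such as a signal strength update on a device, is ignored and does not trigger a refresh of the bar.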

picom: the compositor
I like having slightly transparent backgrounds for terminals and to reduce the opacity of unfocused windows. This requires a compositor.1 picom is a lightweight compositor. It works well for me, but it may need some tweaking depending on your graphics card.2 Unlike the awesome window manager, i3 does not handle transparency, so the compositor needs to decide by itself the opacity of each window. Check my configuration for details.

systemd: the service manager
I use systemd to start i3 and the various services around it. My xsession script only sets some environment variables and lets systemd handle everything else. Have a look at this article from Michał Góral for the rationale. Notably, each component can be easily restarted and their logs are not mangled inside the ~/.xsession-errors file.3 I am using a two-stage setup: i3.service depends on xsession.target to start services before i3:
[Unit]
Description=X session
BindsTo=graphical-session.target
Wants=autorandr.service
Wants=dunst.socket
Wants=inputplug.service
Wants=picom.service
Wants=pulseaudio.socket
Wants=policykit-agent.service
Wants=redshift.service
Wants=spotify-clean.timer
Wants=ssh-agent.service
Wants=xiccd.service
Wants=xsettingsd.service
Wants=xss-lock.service
Then, i3 executes the second stage by invoking the i3-session.target:
[Unit]
Description=i3 session
BindsTo=graphical-session.target
Wants=wallpaper.service
Wants=wallpaper.timer
Wants=polybar-weather.service
Wants=polybar-weather.timer
Wants=polybar.service
Wants=i3-companion.service
Wants=misc-x.service
Have a look at my configuration files for more details.

rofi: the application launcher
Rofi is an application launcher. Its appearance can be customized through a CSS-like language and it comes with several themes. Have a look at my configuration for mine.
Rofi as an application launcher
It can also act as a generic menu application. I have a script to control a media player and another one to select the wifi network. It is quite a flexible application.
Rofi as a wifi network selector
Rofi to select a wireless network

xss-lock and i3lock: the screen locker
i3lock is a simple screen locker. xss-lock invokes it reliably on inactivity or before a system suspend. For inactivity, it uses the XScreenSaver events. The delay is configured using the xset s command. The locker can be invoked immediately with xset s activate. X11 applications know how to prevent the screen saver from running. I have also developed a small dimmer application that is executed 20 seconds before the locker to give me a chance to move the mouse if I am not away.4 Have a look at my configuration script.
Demonstration of xss-lock, xss-dimmer and i3lock with a 4× speedup.

The remaining components
  • autorandr is a tool to detect the connected display, match them against a set of profiles, and configure them with xrandr.
  • inputplug executes a script for each new mouse and keyboard plugged in. This is quite useful to load the appropriate keyboard map. See my configuration.
  • xsettingsd provides settings to X11 applications, not unlike xrdb, but it notifies applications of changes. The main use is to configure the Gtk and DPI settings. See my article on HiDPI support on Linux with X11.
  • Redshift adjusts the color temperature of the screen according to the time of day.
  • maim is a utility to take screenshots. I use Prt Scn to trigger a screenshot of a window or a specific area and Mod+Prt Scn to capture the whole desktop to a file. Check the helper script for details.
  • I have a collection of wallpapers I rotate every hour. A script selects them using advanced machine learning algorithms and stitches them together on multi-screen setups. The selected wallpaper is reused by i3lock.

  1. Apart from the eye candy, a compositor also helps to get tear-free video playbacks.
  2. My configuration works with both Haswell (2014) and Whiskey Lake (2018) Intel GPUs. It also works with AMD GPUs based on the Polaris chipset (2017).
  3. You cannot manage two different displays this way e.g. :0 and :1. In the first implementation, I did try to parametrize each service with the associated display, but this is useless: there is only one DBus user session and many services rely on it. For example, you cannot run two notification daemons.
  4. I have only discovered later that XSecureLock ships such a dimmer with a similar implementation. But mine has a cool countdown!

5 September 2021

François Marier: Using implicit TLS in Postfix

In order to mitigate the NO STARTTLS vulnerabilities, I recently switched my local SMTP smarthosts from STARTTLS (port 587) to implicit TLS (port 465). Here are the key configuration parameters for Postfix (i.e. /etc/postfix/main.cf):
relayhost = [smtp.kolabnow.com]:465
smtp_tls_wrappermode = yes
smtp_tls_security_level = secure
Note that this is for KolabNow, but the same works for Gmail and Novus. The square brackets around the hostname tell Postfix not to look up MX records in DNS and instead to use the SMTP server name as-is. Setting the smtp_tls_security_level parameter to secure ensures that the server presents a valid TLS certificate that matches its name.
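To illustrate what implicit TLS means from a client's point of view, here is a rough Python analogy using the standard smtplib and ssl modules. It is not how Postfix implements these parameters: verified_context() and open_smarthost() are hypothetical helpers, and the 10-second timeout is an arbitrary choice.

```python
import smtplib
import ssl


def verified_context():
    """TLS settings rejecting invalid certificates and hostname mismatches."""
    return ssl.create_default_context()


def open_smarthost(host, port=465, timeout=10):
    """Connect with implicit TLS: the handshake precedes any SMTP command."""
    return smtplib.SMTP_SSL(
        host, port, timeout=timeout, context=verified_context()
    )
```

With STARTTLS, the session starts in cleartext and is upgraded later, which is the window the NO STARTTLS vulnerabilities target. With implicit TLS, nothing is exchanged before the handshake completes.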

2 September 2021

Eddy Petrișor: Stretch to Buster upgrade issues: "Grub error: symbol grub_is_lockdown not found", missing RTL8111/8168/8411 Ethernet driver and RTL8821CE Wireless adapter on Linux Kernel 5.10 (and 4.19)

I have been running Debian Stretch on my HP Pavilion 14-ce0000nq laptop since buying it back in April 2019, just before my presence at Oxidizeconf where I presented "How to Rust When Standards Are Defined in C".
Debian Buster (aka Debian 10) was released about 4 months later and I've been postponing the upgrade as my free time isn't what it used to be. I also tend to wait for the first or even second update of the release to avoid any sharp edges.
As this laptop has a Realtek 8821CE wireless card that wasn't officially supported in the Linux kernel, I had to use an out-of-tree hacked driver to have the wireless work on Stretch kernels such as 4.19. It didn't even get along with DKMS, so I did all compilations and installations of it manually. More reason to wait for a newer release that would contain a driver inside the official kernel.
I was waiting for the inevitable and dreading the wireless issues, but since Bullseye became stable in mid-August, turning Stretch into oldoldstable, I decided that I had to do the upgrade, at least to Buster.
The Grub error and the fix
Everything went quite smoothly, except that after the reboot, the laptop failed to boot with this Grub error:

error: symbol grub_is_lockdown not found
I looked for a solution and it seemed everyone was stuck or the solution was unclear. There is even a bug report in Debian about this error, bug #984760.
Adding to the pile of confusion my own confused solution: I tried supergrubdisk2/rescatux, but it didn't work for me; it might have been a combination of me using LVM and grub-efi-amd64. I also tried to boot the first Buster DVD in rescue mode (to avoid the need for network). I was able to enter the partition and mount the EFI partition too, but since I didn't want to mess up the setup even more or depend on an external USB stick, I didn't know where I should try to write the Grub EFI config - the root partition is on NVMe storage.
When I bought the laptop, it had FreeDOS and some HP rescue app installed on it, which I did not wipe when installing Debian. I even forgot where or how EFI was installed on the disk, and even if EFI should be more reliable and simpler, I never got the hang of it.
In the end, I realized that I could manually select in the BIOS which EFI executable should be booted, so I was able to boot into the regular system with some manual intervention during boot.
I tried regenerating the grub configuration, installing it and also tried restoring the default proper boot sequence (and I even installed refind in the system during my fumbling), but I think somewhere between the grub-efi-amd64 reconfiguration and its reinstallation I managed to do the right thing, as the default boot screen is the Grub one now.
Hints for anyone reading this in the hope of fixing the same issue; hopefully they will make things better, not worse (see the text below):
1) regenerate the grub config:
update-grub2
2) reinstall grub-efi-amd64 and make Debian the default
dpkg-reconfigure -plow grub-efi-amd64
When reinstalling grub-efi-amd64 onto the disk, I think the scariest questions were these:

Force extra installation to the EFI removable media path?

Some EFI-based systems are buggy and do not handle new bootloaders correctly. If you force an extra installation of GRUB to the EFI removable media path, this should ensure that this system will boot Debian correctly despite such a problem. However, it may remove the ability to boot any other operating systems that also depend on this path. If so, you will need to make sure that GRUB is configured successfully to be able to boot any other OS installations correctly.

and
Update NVRAM variables to automatically boot into Debian?

GRUB can configure your platform's NVRAM variables so that it boots into Debian automatically when powered on. However, you may prefer to disable this behavior and avoid changes to your boot configuration. For example, if your NVRAM variables have been set up such that your system contacts a PXE server on every boot, this would preserve that behavior.

I think the first can be safely answered "No" if you don't plan on booting via a removable USB stick, while the second is the one that does the restoring.
The second question is probably safe to answer "Yes" if you don't use PXE boot or another boot method, at least that's what I understand. But if you do, I suspect that by installing refind, or by playing with the multiple efi* named packages and tools, you can restore that, or it might be that your BIOS allows that directly.
I just did a walk through of these 2 steps again on my laptop and answered "No" to the removable media question, as it leads to errors when the media is not inserted (in my case the internal SD card reader), and "Yes" to making Debian the default.
It seems that for me this broke the FreeDOS and HP utilities boot entries from Grub, but I can still boot them via the BIOS options, and my goal was to have Debian boot correctly by default.
Fixing the missing RTL8111/8168/8411 Ethernet card issue
As a side note for people with computers having a Realtek RTL8111/8168/8411 Gigabit Ethernet Controller who are upgrading to Buster or switching to a newer kernel: you might have the unpleasant surprise of even your Ethernet card disappearing, because the r8169 driver is not loaded by default.
I had to add it to /etc/modules so it is loaded by default:
eddy@aptonia:/ $ cat /etc/modules
# /etc/modules: kernel modules to load at boot time.
#
# This file contains the names of kernel modules that should be loaded
# at boot time, one per line. Lines beginning with "#" are ignored.
r8169

The 5.10 compatible driver for RTL8821CE wireless adapter
After the upgrade to Buster and its kernel, 4.19 (now the oldstable version), the hacked version of the driver I had been using on Stretch with 4.9 kernels was no longer compatible: it failed to compile due to missing symbols.
The fix for me was to switch to the DKMS compatible driver from https://github.com/tomaspinho/rtl8821ce, as this seems to work for both 4.19 and 5.10 kernels (installed from backports).
I installed it via a modification of the manual install method, only for the 4.19 and 5.10 kernels, leaving the legacy 4.9 kernels working with the hacked driver. You can do the same if, instead of running the provided script, you do its steps manually and install only for the kernel versions you want, instead of the default of installing for all.
I looked inside the dkms-install.sh script to find the required steps.
Copy the driver and add it to the dkms set of known drivers:
DRV_NAME=rtl8821ce
DRV_VERSION=v5.5.2_34066.20200325

cp -r . /usr/src/${DRV_NAME}-${DRV_VERSION}

dkms add -m ${DRV_NAME} -v ${DRV_VERSION}
Then build and install it only for the kernel versions of your choice:
dkms build -m ${DRV_NAME} -v ${DRV_VERSION} -k 5.10.0-0.bpo.8-amd64
dkms install -m ${DRV_NAME} -v ${DRV_VERSION} -k 5.10.0-0.bpo.8-amd64
Or, without the variables:
dkms build rtl8821ce/v5.5.2_34066.20200325 -k 4.19.0-17-amd64
dkms install rtl8821ce/v5.5.2_34066.20200325 -k 4.19.0-17-amd64
dkms status should confirm everything is in place and I think you need to update grub2 again after this.
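When several kernels are involved, the per-kernel build and install commands can be generated rather than typed. The following is a small sketch, not part of the driver's tooling: dkms_commands() is a hypothetical helper that only assembles the same dkms build and dkms install invocations as above, leaving their execution (as root) to the caller.

```python
def dkms_commands(name, version, kernels):
    """Return dkms build/install command lines for each selected kernel."""
    commands = []
    for kernel in kernels:
        # Build first, then install, one kernel at a time.
        for action in ("build", "install"):
            commands.append(
                ["dkms", action, "-m", name, "-v", version, "-k", kernel]
            )
    return commands
```

Each returned list can be passed to subprocess.run() with check=True so that a failed build stops the sequence before the matching install.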
Please note this driver is no longer maintained and the 5.10 tree should support the RTL8821CE wireless card with the rtw88 driver from the kernel, but for me it did not. I'll probably try this at a later time, or after I upgrade to the current Debian stable, Bullseye.

31 August 2021

Russ Allbery: kstart 4.3

kstart provides the programs k5start and krenew, which are similar to the Kerberos kinit program with some extra support for running programs with separate credentials and running as a daemon. This is the first full release in nearly six years.
The major change is new support for the Linux kafs module, which is a native Linux implementation of the AFS protocol that David Howells and others have been working on for years. It has an entirely different way of thinking about tokens and credential isolation built on Linux keyrings rather than the AFS token concept (which sometimes uses keyrings, but in a different way, and sometimes uses other hacks).
k5start and krenew, when run with the -t option to get AFS tokens, would fail if AFS was not available. That meant -t would fail with kafs even if the AKLOG environment variable were set properly to aklog-kafs. This release fixes that. The programs also optionally link with libkeyutils and use it when used to run a command to isolate the AFS credentials from the calling process. This is done by creating a new session keyring and linking it to the user keyring before running the aklog program. Thanks to Bill MacAllister, David Howells, and Jeffrey Altman for the help with this feature. I'm not sure that I have it right, so please let me know if it doesn't work for you.
Also in this release is a fix from Aasif Versi to use a smarter exit status if k5start or krenew is running another program and that program is killed with a signal. Previously, that would cause k5start or krenew to exit with a status of 0, which was not helpful. Now it exits with a status formed by adding 128 to the signal number, which matches the behavior of bash.
Since this is the first release in a while, it also contains some other minor fixes and portability updates. You can get the latest release from the kstart distribution page.
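The exit status convention mentioned above (128 plus the signal number) can be illustrated with a short Python sketch. kstart itself is not written this way: shell_style_status() and run_and_report() are hypothetical helpers, relying on the fact that Python's subprocess module reports a child killed by signal N with a return code of -N on POSIX systems.

```python
import subprocess


def shell_style_status(returncode):
    """Map a subprocess return code to the exit status a shell would report."""
    if returncode < 0:
        # Death by signal N is reported as -N: convert it to 128 + N.
        return 128 - returncode
    return returncode


def run_and_report(argv):
    """Run `argv` and return the exit status to propagate to the caller."""
    return shell_style_status(subprocess.run(argv).returncode)
```

A wrapper would then pass the result to sys.exit(), so that a caller can tell a child terminated by a signal from one that exited normally.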
