Search Results: "undef"

6 May 2025

Enrico Zini: Python-like abspath for c++

Python's os.path.abspath or Path.absolute are great: you give them a path, which might not exist, and you get a path you can use regardless of the current directory. os.path.abspath will also normalize it, while Path will not by default because with Paths a normal form is less needed. This is great to normalize input, regardless of if it's an existing file you're needing to open, or a new file you're needing to create. In C++17, there is a filesystem library with methods with enticingly similar names, but which are almost, but not quite, totally unlike Python's abspath. Because in my C++ code I need to normalize input, regardless of if it's an existing file I'm needing to open or a new file I'm needing to create, here's an apparently working Python-like abspath for C++ implemented on top of the std::filesystem library:
std::filesystem::path abspath(const std::filesystem::path& path)
 
    // weakly_canonical is defined as "the result of calling canonical() with a
    // path argument composed of the leading elements of p that exist (as
    // determined by status(p) or status(p, ec)), if any, followed by the
    // elements of p that do not exist."
    //
    // This means that if no lead components of the path exist then the
    // resulting path is not made absolute, and we need to work around that.
    if (!path.is_absolute())
        return abspath(std::filesystem::current_path() / path);
    // This is further and needlessly complicated because we need to work
    // around https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118733
    unsigned retry = 0;
    while (true)
     
        std::error_code code;
        auto result = std::filesystem::weakly_canonical(path, code);
        if (!code)
         
            // fprintf(stderr, "%s: ok in %u tries\n", path.c_str(), retry+1);
            return result;
         
        if (code == std::errc::no_such_file_or_directory)
         
            ++retry;
            if (retry > 50)
                throw std::system_error(code);
         
        else
            throw std::system_error(code);
     
    // Alternative implementation that however may not work on all platforms
    // since, formally, "[std::filesystem::absolute] Implementations are
    // encouraged to not consider p not existing to be an error", but they do
    // not mandate it, and if they did, they might still be affected by the
    // undefined behaviour outlined in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=118733
    //
    // return std::filesystem::absolute(path).lexically_normal();
 
I added it to my wobble code repository, which is the thin repository of components I use to ease my C++ systems programming.

13 April 2025

Keith Packard: sanitizer-fun

Fun with -fsanitize=undefined and Picolibc Both GCC and Clang support the -fsanitize=undefined flag which instruments the generated code to detect places where the program wanders into parts of the C language specification which are either undefined or implementation defined. Many of these are also common programming errors. It would be great if there were sanitizers for other easily detected bugs, but for now, at least the undefined sanitizer does catch several useful problems. Supporting the sanitizer The sanitizer can be built to either trap on any error or call handlers. In both modes, the same problems are identified, but when trap mode is enabled, the compiler inserts a trap instruction and doesn't expect the program to continue running. When handlers are in use, each identified issue is tagged with a bunch of useful data and then a specific sanitizer handling function is called. The specific functions are not all that well documented, nor are the parameters they receive. Maybe this is because both compilers provide an implementation of all of the functions they use and don't really expect external implementations to exist? However, to make these useful in an embedded environment, picolibc needs to provide a complete set of handlers that support all versions both gcc and clang as the compiler-provided versions depend upon specific C (and C++) libraries. Of course, programs can be built in trap-on-error mode, but that makes it much more difficult to figure out what went wrong. Fixing Sanitizer Issues Once the sanitizer handlers were implemented, picolibc could be built with them enabled and all of the picolibc tests run to uncover issues within the library. As with the static analyzer adventure from last year, the vast bulk of sanitizer complaints came from invoking undefined or implementation-defined behavior in harmless ways: Signed integer shifts This is one area where the C language spec is just wrong. For left shift, before C99, it worked on signed integers as a bit-wise operator, equivalent to the operator on unsigned integers. After that, left shift of negative integers became undefined. Fortunately, it's straightforward (if tedious) to work around this issue by just casting the operand to unsigned, performing the shift and casting it back to the original type. Picolibc now has an internal macro, lsl, which does this:
    #define lsl(__x,__s) ((sizeof(__x) == sizeof(char)) ?                   \
                          (__typeof(__x)) ((unsigned char) (__x) << (__s)) :  \
                          (sizeof(__x) == sizeof(short)) ?                  \
                          (__typeof(__x)) ((unsigned short) (__x) << (__s)) : \
                          (sizeof(__x) == sizeof(int)) ?                    \
                          (__typeof(__x)) ((unsigned int) (__x) << (__s)) :   \
                          (sizeof(__x) == sizeof(long)) ?                   \
                          (__typeof(__x)) ((unsigned long) (__x) << (__s)) :  \
                          (sizeof(__x) == sizeof(long long)) ?              \
                          (__typeof(__x)) ((unsigned long long) (__x) << (__s)) : \
                          __undefined_shift_size(__x, __s))
Right shift is significantly more complicated to implement. What we want is an arithmetic shift with the sign bit being replicated as the value is shifted rightwards. C defines no such operator. Instead, right shift of negative integers is implementation defined. Fortunately, both gcc and clang define the >> operator on signed integers as arithmetic shift. Also fortunately, C hasn't made this undefined, so the program itself doesn't end up undefined. The trouble with arithmetic right shift is that it is not equivalent to right shift of unsigned values. Here's what Per Vognsen came up with using standard C operators:
    int
    __asr_int(int x, int s)  
        return x < 0 ? ~(~x >> s) : x >> s;
     
When the value is negative, we invert all of the bits (making it positive), shift right, then flip all of the bits back. Both GCC and Clang seem to compile this to a single asr instruction. This function is replicated for each of the five standard integer types and then the set of them wrapped in another sizeof-selecting macro:
    #define asr(__x,__s) ((sizeof(__x) == sizeof(char)) ?           \
                          (__typeof(__x))__asr_char(__x, __s) :       \
                          (sizeof(__x) == sizeof(short)) ?          \
                          (__typeof(__x))__asr_short(__x, __s) :      \
                          (sizeof(__x) == sizeof(int)) ?            \
                          (__typeof(__x))__asr_int(__x, __s) :        \
                          (sizeof(__x) == sizeof(long)) ?           \
                          (__typeof(__x))__asr_long(__x, __s) :       \
                          (sizeof(__x) == sizeof(long long)) ?      \
                          (__typeof(__x))__asr_long_long(__x, __s):   \
                          __undefined_shift_size(__x, __s))
The lsl and asr macros use sizeof instead of the type-generic mechanism to remain compatible with compilers that lack type-generic support. Once these macros were written, they needed to be applied where required. To preserve the benefits of detecting programming errors, they were only applied where required, not blindly across the whole codebase. There are a couple of common patterns in the math code using shift operators. One is when computing the exponent value for subnormal numbers.
for (ix = -1022, i = hx << 11; i > 0; i <<= 1)
    ix -= 1;
This code computes the exponent by shifting the significand left by 11 bits (the width of the exponent field) and then incrementally shifting it one bit at a time until the sign flips, which indicates that the most-significant bit is set. Use of the pre-C99 definition of the left shift operator is intentional here; so both shifts are replaced with our lsl operator. In the implementation of pow, the final exponent is computed as the sum of the two exponents, both of which are in the allowed range. The resulting sum is then tested to see if it is zero or negative to see if the final value is sub-normal:
hx += n << 20;
if (hx >> 20 <= 0)
    /* do sub-normal things */
In this case, the exponent adjustment, n, is a signed value and so that shift is replaced with the lsl macro. The test value needs to compute the correct the sign bit, so we replace this with the asr macro. Because the right shift operation is not undefined, we only use our fancy macro above when the undefined behavior sanitizer is enabled. On the other hand, the lsl macro should have zero cost and covers undefined behavior, so it is always used. Actual Bugs Found! The goal of this little adventure was both to make using the undefined behavior sanitizer with picolibc possible as well as to use the sanitizer to identify bugs in the library code. I fully expected that most of the effort would be spent masking harmless undefined behavior instances, but was hopeful that the effort would also uncover real bugs in the code. I was not disappointed. Through this work, I found (and fixed) eight bugs in the code:
  1. setlocale/newlocale didn't check for NULL locale names
  2. qsort was using uintptr_t to swap data around. On MSP430 in 'large' mode, that's a 20-bit type inside a 32-bit representation.
  3. random() was returning values in int range rather than long.
  4. m68k assembly for memcpy was broken for sizes > 64kB.
  5. freopen returned NULL, even on success
  6. The optimized version of memrchr was always performing unaligned accesses.
  7. String to float conversion had a table missing four values. This caused an array access overflow which resulted in imprecise values in some cases.
  8. vfwscanf mis-parsed floating point values by assuming that wchar_t was unsigned.
Sanitizer Wishes While it's great to have a way to detect places in your C code which evoke undefined and implementation defined behaviors, it seems like this tooling could easily be extended to detect other common programming mistakes, even where the code is well defined according to the language spec. An obvious example is in unsigned arithmetic. How many bugs come from this seemingly innocuous line of code?
    p = malloc(sizeof(*p) * c);
Because sizeof returns an unsigned value, the resulting computation never results in undefined behavior, even when the multiplication wraps around, so even with the undefined behavior sanitizer enabled, this bug will not be caught. Clang seems to have an unsigned integer overflow sanitizer which should do this, but I couldn't find anything like this in gcc. Summary The undefined behavior sanitizers present in clang and gcc both provide useful diagnostics which uncover some common programming errors. In most cases, replacing undefined behavior with defined behavior is straightforward, although the lack of an arithmetic right shift operator in standard C is irksome. I recommend anyone using C to give it a try.

18 March 2025

Petter Reinholdtsen: New theora release 1.2.0beta1 after almost 15 years

When I a few days ago discovered that a security problem reported against the theora library last year was still not fixed, and because I was already up to speed on Xiph development, I decided it was time to wrap up a new theora release. This new release was tagged in the Xiph gitlab theora instance Saturday. You can fetch the new release from the Theora home page. The list of changes since The 1.2.0alpha1 release from the CHANGES file in the tarball look like this:
libteora 1.2.0beta1 (2025 March 15)
  • Bumped minor SONAME versions as methods changed constness of arguments.
  • Updated libogg dependency to version 1.3.4 for ogg_uint64_t.
  • Updated doxygen setup.
  • Updated autotools setup and support scripts (#1467 #1800 #1987 #2318 #2320).
  • Added support for RISC OS.
  • Fixed mingw build (#2141).
  • Improved ARM support.
  • Converted SCons setup to work with Python 3.
  • Introduced new configure options --enable-mem-constraint and --enable-gcc-sanitizers.
  • Fixed all known compiler warnings and errors from gcc and clang.
  • Improved examples for stability and correctness.
  • Variuos speed, bug fixes and code quality improvements.
  • Fixed build problem with Visual Studio (#2317).
  • Avoids undefined bit shift of signed numbers (#2321, #2322).
  • Avoids example encoder crash on bogus audio input (#2305).
  • Fixed musl linking issue with asm enabled (#2287).
  • Fixed some broken clamping in rate control (#2229).
  • Added NULL check _tc and _setup even for data packets (#2279).
  • Fixed mismatched oc_mb_fill_cmapping11 signature (#2068).
  • Updated the documentation for theora_encode_comment() (#726).
  • Adjusted build to Only link libcompat with dump_video (#1587).
  • Corrected an operator precedence error in the visualization code (#1751).
  • Fixed two spelling errors in the comments (#1804).
  • Avoid negative bit shift operation in huffdec.c (CVE-2024-56431).
  • Improved library documentation and specification text.
  • Adjusted library dependencies so libtheoraenc do not depend on libtheoradec.
  • Handle fallout from CVE-2017-14633 in libvorbis, check return value in encoder_example and transcoder_example.
There are a few bugs still being investigated, and my plan is to wrap up a final 1.2.0 release two weekends from now. As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

10 January 2025

Dirk Eddelbuettel: nanotime 0.3.11 on CRAN: Polish

Another minor update 0.3.11 for our nanotime package is now on CRAN. nanotime relies on the RcppCCTZ package (as well as the RcppDate package for additional C++ operations) and offers efficient high(er) resolution time parsing and formatting up to nanosecond resolution, using the bit64 package for the actual integer64 arithmetic. Initially implemented using the S3 system, it has benefitted greatly from a rigorous refactoring by Leonardo who not only rejigged nanotime internals in S4 but also added new S4 types for periods, intervals and durations. This release covers two corner case. Michael sent in a PR avoiding a clang warning on complex types. We fixed an issue that surfaced in a downstream package under sanitizier checks: R extends coverage of NA to types such as integer or character which need special treatment in non-R library code as they do not know . We flagged (character) formatted values after we had called the corresponding CCTZ function but that leaves potentiall undefined values (from R s NA values for int, say, cast to double) so now we flag them, set a transient safe value for the call and inject the (character) representation "NA" after the call in those spots. End result is the same, but without a possibly slap on the wrist from sanitizer checks. The NEWS snippet below has the full details.

Changes in version 0.3.11 (2025-01-10)
  • Explicit Rcomplex assignment accommodates pickier compilers over newer R struct (Michael Chirico in #135 fixing #134)
  • When formatting, NA are flagged before CCTZ call to to not trigger santizier, and set to NA after call (Dirk in #136)

Thanks to my CRANberries, there is a diffstat report for this release. More details and examples are at the nanotime page; code, issue tickets etc at the GitHub repository and all documentation is provided at the nanotime documentation site.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

2 January 2025

Colin Watson: Free software activity in December 2024

Most of my Debian contributions this month were sponsored by Freexian, as well as one direct donation via Liberapay (thanks!). OpenSSH I issued a bookworm update with a number of fixes that had accumulated over the last year, especially fixing GSS-API key exchange which was quite broken in bookworm. base-passwd A few months ago, the adduser maintainer started a discussion with me (as the base-passwd maintainer) and the shadow maintainer about bringing all three source packages under one team, since they often need to cooperate on things like user and group names. I agreed, but hadn t got round to doing anything about it until recently. I ve now officially moved it under team maintenance. debconf Gioele Barabucci has been working on eliminating duplicated code between debconf and cdebconf, ultimately with the goal of migrating to cdebconf (which I m not sure I m convinced of as a goal, but if we can make improvements to both packages as part of working towards it then there s no harm in that). I finally got round to reviewing and merging confmodule changes in each of debconf and cdebconf. This caused an installer regression due to a weirdness in cdebconf-udeb s packaging, which I fixed - sorry about that! I ve also been dealing with a few patch submissions that had been in my queue for a long time, but more on that next month if all goes well. CI issues I noticed and fixed a problem with Restrictions: needs-sudo in autopkgtest. I fixed broken aptly images in the Salsa CI pipeline. Python team Last month, I mentioned some progress on sorting out the multipart vs. python-multipart name conflict in Debian (#1085728), and said that I thought we d be able to finish it soon. I was right! We got it all done this month: The Python 3.13 transition continues, and last month we were able to add it to the supported Python versions in testing. (The next step will be to make it the default.) I fixed lots of problems in aid of this, including: Sphinx 8.0 removed some old intersphinx_mapping syntax which turned out to still be in use by many packages in Debian. The fixes for this were individually trivial, but there were a lot of them: I found that twisted 24.11.0 broke tests in buildbot and wokkel, and fixed those. I packaged python-flatdict, needed for a new upstream version of python-semantic-release. I tracked down a test failure in vdirsyncer (which I ve been using for some years, but had never previously needed to modify) and contributed a fix upstream. I fixed some packages to tolerate future versions of dh-python that will drop their dependency on python3-setuptools: I fixed django-cte to remove a build-dependency on the obsolete python3-nose package. I added Django 5.1 support to django-polymorphic. (There are a number of other packages that still need work here.) I fixed various other build/test failures: I upgraded these packages to new upstream versions: I updated the team s library style guide to remove material related to Python 2 and early versions of Python 3, which is no longer relevant to any current Python packaging work. Other Python upstream work I happened to notice a Twisted upstream issue requesting the removal of the deprecated twisted.internet.defer.returnValue, realized it was still used in many places in Debian, and went on a PR-filing spree informed by codesearch to try to reduce the future impact of such a change on Debian: Other small fixes Santiago Vila has been building the archive with make --shuffle (also see its author s explanation). I fixed associated bugs in cccc (contributed upstream), groff, and spectemu. I backported an upstream patch to putty to fix undefined behaviour that affected use of the small keypad . I removed groff s Recommends: libpaper1 (#1091375, #1091376), since it isn t currently all that useful and was getting in the way of a transition to libpaper2. I filed an upstream bug suggesting better integration in this area.

27 December 2024

Wouter Verhelst: Writing an extensible JSON-based DSL with Moose

At work, I've been maintaining a perl script that needs to run a number of steps as part of a release workflow. Initially, that script was very simple, but over time it has grown to do a number of things. And then some of those things did not need to be run all the time. And then we wanted to do this one exceptional thing for this one case. And so on; eventually the script became a big mess of configuration options and unreadable flow, and so I decided that I wanted it to be more configurable. I sat down and spent some time on this, and eventually came up with what I now realize is a domain-specific language (DSL) in JSON, implemented by creating objects in Moose, extensible by writing more object classes. Let me explain how it works. In order to explain, however, I need to explain some perl and Moose basics first. If you already know all that, you can safely skip ahead past the "Preliminaries" section that's next.

Preliminaries

Moose object creation, references. In Moose, creating a class is done something like this:
package Foo;
use v5.40;
use Moose;
has 'attribute' => (
    is  => 'ro',
    isa => 'Str',
    required => 1
);
sub say_something  
    my $self = shift;
    say "Hello there, our attribute is " . $self->attribute;
 
The above is a class that has a single attribute called attribute. To create an object, you use the Moose constructor on the class, and pass it the attributes you want:
use v5.40;
use Foo;
my $foo = Foo->new(attribute => "foo");
$foo->say_something;
(output: Hello there, our attribute is foo) This creates a new object with the attribute attribute set to bar. The attribute accessor is a method generated by Moose, which functions both as a getter and a setter (though in this particular case we made the attribute "ro", meaning read-only, so while it can be set at object creation time it cannot be changed by the setter anymore). So yay, an object. And it has methods, things that we set ourselves. Basic OO, all that. One of the peculiarities of perl is its concept of "lists". Not to be confused with the lists of python -- a concept that is called "arrays" in perl and is somewhat different -- in perl, lists are enumerations of values. They can be used as initializers for arrays or hashes, and they are used as arguments to subroutines. Lists cannot be nested; whenever a hash or array is passed in a list, the list is "flattened", that is, it becomes one big list. This means that the below script is functionally equivalent to the above script that uses our "Foo" object:
use v5.40;
use Foo;
my %args;
$args attribute  = "foo";
my $foo = Foo->new(%args);
$foo->say_something;
(output: Hello there, our attribute is foo) This creates a hash %args wherein we set the attributes that we want to pass to our constructor. We set one attribute in %args, the one called attribute, and then use %args and rely on list flattening to create the object with the same attribute set (list flattening turns a hash into a list of key-value pairs). Perl also has a concept of "references". These are scalar values that point to other values; the other value can be a hash, a list, or another scalar. There is syntax to create a non-scalar value at assignment time, called anonymous references, which is useful when one wants to remember non-scoped values. By default, references are not flattened, and this is what allows you to create multidimensional values in perl; however, it is possible to request list flattening by dereferencing the reference. The below example, again functionally equivalent to the previous two examples, demonstrates this:
use v5.40;
use Foo;
my $args =  ;
$args-> attribute  = "foo";
my $foo = Foo->new(%$args);
$foo->say_something;
(output: Hello there, our attribute is foo) This creates a scalar $args, which is a reference to an anonymous hash. Then, we set the key attribute of that anonymous hash to bar (note the use arrow operator here, which is used to indicate that we want to dereference a reference to a hash), and create the object using that reference, requesting hash dereferencing and flattening by using a double sigil, %$. As a side note, objects in perl are references too, hence the fact that we have to use the dereferencing arrow to access the attributes and methods of Moose objects. Moose attributes don't have to be strings or even simple scalars. They can also be references to hashes or arrays, or even other objects:
package Bar;
use v5.40;
use Moose;
extends 'Foo';
has 'hash_attribute' => (
    is => 'ro',
    isa => 'HashRef[Str]',
    predicate => 'has_hash_attribute',
);
has 'object_attribute' => (
    is => 'ro',
    isa => 'Foo',
    predicate => 'has_object_attribute',
);
sub say_something  
    my $self = shift;
    if($self->has_object_attribute)  
        $self->object_attribute->say_something;
     
    $self->SUPER::say_something unless $self->has_hash_attribute;
    say "We have a hash attribute!"
 
This creates a subclass of Foo called Bar that has a hash attribute called hash_attribute, and an object attribute called object_attribute. Both of them are references; one to a hash, the other to an object. The hash ref is further limited in that it requires that each value in the hash must be a string (this is optional but can occasionally be useful), and the object ref in that it must refer to an object of the class Foo, or any of its subclasses. The predicates used here are extra subroutines that Moose provides if you ask for them, and which allow you to see if an object's attribute has a value or not. The example script would use an object like this:
use v5.40;
use Bar;
my $foo = Foo->new(attribute => "foo");
my $bar = Bar->new(object_attribute => $foo, attribute => "bar");
$bar->say_something;
(output: Hello there, our attribute is foo) This example also shows object inheritance, and methods implemented in child classes. Okay, that's it for perl and Moose basics. On to...

Moose Coercion Moose has a concept of "value coercion". Value coercion allows you to tell Moose that if it sees one thing but expects another, it should convert is using a passed subroutine before assigning the value. That sounds a bit dense without example, so let me show you how it works. Reimaginging the Bar package, we could use coercion to eliminate one object creation step from the creation of a Bar object:
package "Bar";
use v5.40;
use Moose;
use Moose::Util::TypeConstraints;
extends "Foo";
coerce "Foo",
    from "HashRef",
    via   Foo->new(%$_)  ;
has 'hash_attribute' => (
    is => 'ro',
    isa => 'HashRef',
    predicate => 'has_hash_attribute',
);
has 'object_attribute' => (
    is => 'ro',
    isa => 'Foo',
    coerce => 1,
    predicate => 'has_object_attribute',
);
sub say_something  
    my $self = shift;
    if($self->has_object_attribute)  
        $self->object_attribute->say_something;
     
    $self->SUPER::say_something unless $self->has_hash_attribute;
    say "We have a hash attribute!"
 
Okay, let's unpack that a bit. First, we add the Moose::Util::TypeConstraints module to our package. This is required to declare coercions. Then, we declare a coercion to tell Moose how to convert a HashRef to a Foo object: by using the Foo constructor on a flattened list created from the hashref that it is given. Then, we update the definition of the object_attribute to say that it should use coercions. This is not the default, because going through the list of coercions to find the right one has a performance penalty, so if the coercion is not requested then we do not do it. This allows us to simplify declarations. With the updated Bar class, we can simplify our example script to this:
use v5.40;
use Bar;
my $bar = Bar->new(attribute => "bar", object_attribute =>   attribute => "foo"  );
$bar->say_something
(output: Hello there, our attribute is foo) Here, the coercion kicks in because the value object_attribute, which is supposed to be an object of class Foo, is instead a hash ref. Without the coercion, this would produce an error message saying that the type of the object_attribute attribute is not a Foo object. With the coercion, however, the value that we pass to object_attribute is passed to a Foo constructor using list flattening, and then the resulting Foo object is assigned to the object_attribute attribute. Coercion works for more complicated things, too; for instance, you can use coercion to coerce an array of hashes into an array of objects, by creating a subtype first:
package MyCoercions;
use v5.40;
use Moose;
use Moose::Util::TypeConstraints;
use Foo;
subtype "ArrayOfFoo", as "ArrayRef[Foo]";
subtype "ArrayOfHashes", as "ArrayRef[HashRef]";
coerce "ArrayOfFoo", from "ArrayOfHashes", via   [ map   Foo->create(%$_)   @ $_  ]  ;
Ick. That's a bit more complex. What happens here is that we use the map function to iterate over a list of values. The given list of values is @ $_ , which is perl for "dereference the default value as an array reference, and flatten the list of values in that array reference". So the ArrayRef of HashRefs is dereferenced and flattened, and each HashRef in the ArrayRef is passed to the map function. The map function then takes each hash ref in turn and passes it to the block of code that it is also given. In this case, that block is Foo->create(%$_) . In other words, we invoke the create factory method with the flattened hashref as an argument. This returns an object of the correct implementation (assuming our hash ref has a type attribute set), and with all attributes of their object set to the correct value. That value is then returned from the block (this could be made more explicit with a return call, but that is optional, perl defaults a return value to the rvalue of the last expression in a block). The map function then returns a list of all the created objects, which we capture in an anonymous array ref (the [] square brackets), i.e., an ArrayRef of Foo object, passing the Moose requirement of ArrayRef[Foo]. Usually, I tend to put my coercions in a special-purpose package. Although it is not strictly required by Moose, I find that it is useful to do this, because Moose does not allow a coercion to be defined if a coercion for the same type had already been done in a different package. And while it is theoretically possible to make sure you only ever declare a coercion once in your entire codebase, I find that doing so is easier to remember if you put all your coercions in a specific package. Okay, now you understand Moose object coercion! On to...

Dynamic module loading Perl allows loading modules at runtime. In the most simple case, you just use require inside a stringy eval:
my $module = "Foo";
eval "require $module";
This loads "Foo" at runtime. Obviously, the $module string could be a computed value, it does not have to be hardcoded. There are some obvious downsides to doing things this way, mostly in the fact that a computed value can basically be anything and so without proper checks this can quickly become an arbitrary code vulnerability. As such, there are a number of distributions on CPAN to help you with the low-level stuff of figuring out what the possible modules are, and how to load them. For the purposes of my script, I used Module::Pluggable. Its API is fairly simple and straightforward:
package Foo;
use v5.40;
use Moose;
use Module::Pluggable require => 1;
has 'attribute' => (
    is => 'ro',
    isa => 'Str',
);
has 'type' => (
    is => 'ro',
    isa => 'Str',
    required => 1,
);
sub handles_type  
    return 0;
 
sub create  
    my $class = shift;
    my %data = @_;
    foreach my $impl($class->plugins)  
        if($impl->can("handles_type") && $impl->handles_type($data type ))  
            return $impl->new(%data);
         
     
    die "could not find a plugin for type " . $data type ;
 
sub say_something  
    my $self = shift;
    say "Hello there, I am a " . $self->type;
 
The new concept here is the plugins class method, which is added by Module::Pluggable, and which searches perl's library paths for all modules that are in our namespace. The namespace is configurable, but by default it is the name of our module; so in the above example, if there were a package "Foo::Bar" which
  • has a subroutine handles_type
  • that returns a truthy value when passed the value of the type key in a hash that is passed to the create subroutine,
  • then the create subroutine creates a new object with the passed key/value pairs used as attribute initializers.
Let's implement a Foo::Bar package:
package Foo::Bar;
use v5.40;
use Moose;
extends 'Foo';
has 'type' => (
    is => 'ro',
    isa => 'Str',
    required => 1,
);
has 'serves_drinks' => (
    is => 'ro',
    isa => 'Bool',
    default => 0,
);
sub handles_type  
    my $class = shift;
    my $type = shift;
    return $type eq "bar";
 
sub say_something  
    my $self = shift;
    $self->SUPER::say_something;
    say "I serve drinks!" if $self->serves_drinks;
 
We can now indirectly use the Foo::Bar package in our script:
use v5.40;
use Foo;
my $obj = Foo->create(type => bar, serves_drinks => 1);
$obj->say_something;
output:
Hello there, I am a bar.
I serve drinks!
Okay, now you understand all the bits and pieces that are needed to understand how I created the DSL engine. On to...

Putting it all together We're actually quite close already. The create factory method in the last version of our Foo package allows us to decide at run time which module to instantiate an object of, and to load that module at run time. We can use coercion and list flattening to turn a reference to a hash into an object of the correct type. We haven't looked yet at how to turn a JSON data structure into a hash, but that bit is actually ridiculously trivial:
use JSON::MaybeXS;
my $data = decode_json($json_string);
Tada, now $data is a reference to a deserialized version of the JSON string: if the JSON string contained an object, $data is a hashref; if the JSON string contained an array, $data is an arrayref, etc. So, in other words, to create an extensible JSON-based DSL that is implemented by Moose objects, all we need to do is create a system that
  • takes hash refs to set arguments
  • has factory methods to create objects, which
    • uses Module::Pluggable to find the available object classes, and
    • uses the type attribute to figure out which object class to use to create the object
  • uses coercion to convert hash refs into objects using these factory methods
In practice, we could have a JSON file with the following structure:
 
    "description": "do stuff",
    "actions": [
         
            "type": "bar",
            "serves_drinks": true,
         ,
         
            "type": "bar",
            "serves_drinks": false,
         
    ]
 
... and then we could have a Moose object definition like this:
package MyDSL;
use v5.40;
use Moose;
use MyCoercions;
has "description" => (
    is => 'ro',
    isa => 'Str',
);
has 'actions' => (
    is => 'ro',
    isa => 'ArrayOfFoo'
    coerce => 1,
    required => 1,
);
sub say_something  
    say "Hello there, I am described as " . $self->description . " and I am performing my actions: ";
    foreach my $action(@ $self->actions )  
        $action->say_something;
     
 
Now, we can write a script that loads this JSON file and create a new object using the flattened arguments:
use v5.40;
use MyDSL;
use JSON::MaybeXS;
my $input_file_name = shift;
my $args = do  
    local $/ = undef;
    open my $input_fh, "<", $input_file_name or die "could not open file";
    <$input_fh>;
 ;
$args = decode_json($args);
my $dsl = MyDSL->new(%$args);
$dsl->say_something
Output:
Hello there, I am described as do stuff and I am performing my actions:
Hello there, I am a bar
I am serving drinks!
Hello there, I am a bar
In some more detail, this will:
  • Read the JSON file and deserialize it;
  • Pass the object keys in the JSON file as arguments to a constructor of the MyDSL class;
  • The MyDSL class then uses those arguments to set its attributes, using Moose coercion to convert the "actions" array of hashes into an array of Foo::Bar objects.
  • Perform the say_something method on the MyDSL object
Once this is written, extending the scheme to also support a "quux" type simply requires writing a Foo::Quux class, making sure it has a method handles_type that returns a truthy value when called with quux as the argument, and installing it into the perl library path. This is rather easy to do. It can even be extended deeper, too; if the quux type requires a list of arguments rather than just a single argument, it could itself also have an array attribute with relevant coercions. These coercions could then be used to convert the list of arguments into an array of objects of the correct type, using the same schema as above. The actual DSL is of course somewhat more complex, and also actually does something useful, in contrast to the DSL that we define here which just says things. Creating an object that actually performs some action when required is left as an exercise to the reader.

29 September 2024

Reproducible Builds: Supporter spotlight: Kees Cook on Linux kernel security

The Reproducible Builds project relies on several projects, supporters and sponsors for financial support, but they are also valued as ambassadors who spread the word about our project and the work that we do. This is the eighth installment in a series featuring the projects, companies and individuals who support the Reproducible Builds project. We started this series by featuring the Civil Infrastructure Platform project, and followed this up with a post about the Ford Foundation as well as recent ones about ARDC, the Google Open Source Security Team (GOSST), Bootstrappable Builds, the F-Droid project, David A. Wheeler and Simon Butler. Today, however, we will be talking with Kees Cook, founder of the Kernel Self-Protection Project.

Vagrant Cascadian: Could you tell me a bit about yourself? What sort of things do you work on? Kees Cook: I m a Free Software junkie living in Portland, Oregon, USA. I have been focusing on the upstream Linux kernel s protection of itself. There is a lot of support that the kernel provides userspace to defend itself, but when I first started focusing on this there was not as much attention given to the kernel protecting itself. As userspace got more hardened the kernel itself became a bigger target. Almost 9 years ago I formally announced the Kernel Self-Protection Project because the work necessary was way more than my time and expertise could do alone. So I just try to get people to help as much as possible; people who understand the ARM architecture, people who understand the memory management subsystem to help, people who understand how to make the kernel less buggy.
Vagrant: Could you describe the path that lead you to working on this sort of thing? Kees: I have always been interested in security through the aspect of exploitable flaws. I always thought it was like a magic trick to make a computer do something that it was very much not designed to do and seeing how easy it is to subvert bugs. I wanted to improve that fragility. In 2006, I started working at Canonical on Ubuntu and was mainly focusing on bringing Debian and Ubuntu up to what was the state of the art for Fedora and Gentoo s security hardening efforts. Both had really pioneered a lot of userspace hardening with compiler flags and ELF stuff and many other things for hardened binaries. On the whole, Debian had not really paid attention to it. Debian s packaging building process at the time was sort of a chaotic free-for-all as there wasn t centralized build methodology for defining things. Luckily that did slowly change over the years. In Ubuntu we had the opportunity to apply top down build rules for hardening all the packages. In 2011 Chrome OS was following along and took advantage of a bunch of the security hardening work as they were based on ebuild out of Gentoo and when they looked for someone to help out they reached out to me. We recognized the Linux kernel was pretty much the weakest link in the Chrome OS security posture and I joined them to help solve that. Their userspace was pretty well handled but the kernel had a lot of weaknesses, so focusing on hardening was the next place to go. When I compared notes with other users of the Linux kernel within Google there were a number of common concerns and desires. Chrome OS already had an upstream first requirement, so I tried to consolidate the concerns and solve them upstream. It was challenging to land anything in other kernel team repos at Google, as they (correctly) wanted to minimize their delta from upstream, so I needed to work on any major improvements entirely in upstream and had a lot of support from Google to do that. As such, my focus shifted further from working directly on Chrome OS into being entirely upstream and being more of a consultant to internal teams, helping with integration or sometimes backporting. Since the volume of needed work was so gigantic I needed to find ways to inspire other developers (both inside and outside of Google) to help. Once I had a budget I tried to get folks paid (or hired) to work on these areas when it wasn t already their job.
Vagrant: So my understanding of some of your recent work is basically defining undefined behavior in the language or compiler? Kees: I ve found the term undefined behavior to have a really strict meaning within the compiler community, so I have tried to redefine my goal as eliminating unexpected behavior or ambiguous language constructs . At the end of the day ambiguity leads to bugs, and bugs lead to exploitable security flaws. I ve been taking a four-pronged approach: supporting the work people are doing to get rid of ambiguity, identify new areas where ambiguity needs to be removed, actually removing that ambiguity from the C language, and then dealing with any needed refactoring in the Linux kernel source to adapt to the new constraints. None of this is particularly novel; people have recognized how dangerous some of these language constructs are for decades and decades but I think it is a combination of hard problems and a lot of refactoring that nobody has the interest/resources to do. So, we have been incrementally going after the lowest hanging fruit. One clear example in recent years was the elimination of C s implicit fall-through in switch statements. The language would just fall through between adjacent cases if a break (or other code flow directive) wasn t present. But this is ambiguous: is the code meant to fall-through, or did the author just forget a break statement? By defining the [[fallthrough]] statement, and requiring its use in Linux, all switch statements now have explicit code flow, and the entire class of bugs disappeared. During our refactoring we actually found that 1 in 10 added [[fallthrough]] statements were actually missing break statements. This was an extraordinarily common bug! So getting rid of that ambiguity is where we have been. Another area I ve been spending a bit of time on lately is looking at how defensive security work has challenges associated with metrics. How do you measure your defensive security impact? You can t say because we installed locks on the doors, 20% fewer break-ins have happened. Much of our signal is always secondary or retrospective, which is frustrating: This class of flaw was used X much over the last decade so, and if we have eliminated that class of flaw and will never see it again, what is the impact? Is the impact infinity? Attackers will just move to the next easiest thing. But it means that exploitation gets incrementally more difficult. As attack surfaces are reduced, the expense of exploitation goes up.
Vagrant: So it is hard to identify how effective this is how bad would it be if people just gave up? Kees: I think it would be pretty bad, because as we have seen, using secondary factors, the work we have done in the industry at large, not just the Linux kernel, has had an impact. What we, Microsoft, Apple, and everyone else is doing for their respective software ecosystems, has shown that the price of functional exploits in the black market has gone up. Especially for really egregious stuff like a zero-click remote code execution. If those were cheap then obviously we are not doing something right, and it becomes clear that it s trivial for anyone to attack the infrastructure that our lives depend on. But thankfully we have seen over the last two decades that prices for exploits keep going up and up into millions of dollars. I think it is important to keep working on that because, as a central piece of modern computer infrastructure, the Linux kernel has a giant target painted on it. If we give up, we have to accept that our computers are not doing what they were designed to do, which I can t accept. The safety of my grandparents shouldn t be any different from the safety of journalists, and political activists, and anyone else who might be the target of attacks. We need to be able to trust our devices otherwise why use them at all?
Vagrant: What has been your biggest success in recent years? Kees: I think with all these things I am not the only actor. Almost everything that we have been successful at has been because of a lot of people s work, and one of the big ones that has been coordinated across the ecosystem and across compilers was initializing stack variables to 0 by default. This feature was added in Clang, GCC, and MSVC across the board even though there were a lot of fears about forking the C language. The worry was that developers would come to depend on zero-initialized stack variables, but this hasn t been the case because we still warn about uninitialized variables when the compiler can figure that out. So you still still get the warnings at compile time but now you can count on the contents of your stack at run-time and we drop an entire class of uninitialized variable flaws. While the exploitation of this class has mostly been around memory content exposure, it has also been used for control flow attacks. So that was politically and technically a large challenge: convincing people it was necessary, showing its utility, and implementing it in a way that everyone would be happy with, resulting in the elimination of a large and persistent class of flaws in C.
Vagrant: In a world where things are generally Reproducible do you see ways in which that might affect your work? Kees: One of the questions I frequently get is, What version of the Linux kernel has feature $foo? If I know how things are built, I can answer with just a version number. In a Reproducible Builds scenario I can count on the compiler version, compiler flags, kernel configuration, etc. all those things are known, so I can actually answer definitively that a certain feature exists. So that is an area where Reproducible Builds affects me most directly. Indirectly, it is just being able to trust the binaries you are running are going to behave the same for the same build environment is critical for sane testing.
Vagrant: Have you used diffoscope? Kees: I have! One subset of tree-wide refactoring that we do when getting rid of ambiguous language usage in the kernel is when we have to make source level changes to satisfy some new compiler requirement but where the binary output is not expected to change at all. It is mostly about getting the compiler to understand what is happening, what is intended in the cases where the old ambiguity does actually match the new unambiguous description of what is intended. The binary shouldn t change. We have used diffoscope to compare the before and after binaries to confirm that yep, there is no change in binary .
Vagrant: You cannot just use checksums for that? Kees: For the most part, we need to only compare the text segments. We try to hold as much stable as we can, following the Reproducible Builds documentation for the kernel, but there are macros in the kernel that are sensitive to source line numbers and as a result those will change the layout of the data segment (and sometimes the text segment too). With diffoscope there s flexibility where I can exclude or include different comparisons. Sometimes I just go look at what diffoscope is doing and do that manually, because I can tweak that a little harder, but diffoscope is definitely the default. Diffoscope is awesome!
Vagrant: Where has reproducible builds affected you? Kees: One of the notable wins of reproducible builds lately was dealing with the fallout of the XZ backdoor and just being able to ask the question is my build environment running the expected code? and to be able to compare the output generated from one install that never had a vulnerable XZ and one that did have a vulnerable XZ and compare the results of what you get. That was important for kernel builds because the XZ threat actor was working to expand their influence and capabilities to include Linux kernel builds, but they didn t finish their work before they were noticed. I think what happened with Debian proving the build infrastructure was not affected is an important example of how people would have needed to verify the kernel builds too.
Vagrant: What do you want to see for the near or distant future in security work? Kees: For reproducible builds in the kernel, in the work that has been going on in the ClangBuiltLinux project, one of the driving forces of code and usability quality has been the continuous integration work. As soon as something breaks, on the kernel side, the Clang side, or something in between the two, we get a fast signal and can chase it and fix the bugs quickly. I would like to see someone with funding to maintain a reproducible kernel build CI. There have been places where there are certain architecture configurations or certain build configuration where we lose reproducibility and right now we have sort of a standard open source development feedback loop where those things get fixed but the time in between introduction and fix can be large. Getting a CI for reproducible kernels would give us the opportunity to shorten that time.
Vagrant: Well, thanks for that! Any last closing thoughts? Kees: I am a big fan of reproducible builds, thank you for all your work. The world is a safer place because of it.
Vagrant: Likewise for your work!


For more information about the Reproducible Builds project, please see our website at reproducible-builds.org. If you are interested in ensuring the ongoing security of the software that underpins our civilisation and wish to sponsor the Reproducible Builds project, please reach out to the project by emailing contact@reproducible-builds.org.

28 July 2024

Marco d'Itri: An interesting architecture-dependent autopkgtest regression

More than two years after I last uploaded the purity-off Debian package, its autopkgtest (the Debian distribution-wide continuous integration system) started failing on arm64, and only on this architecture. The failing test is very simple: it prints a long stream of "y" or "n" characters to purity(6)'s standard input and then checks the output for the expected result. While investigating the live autopkgtest system, I figured out that: I did not have time to verify this, but I have noticed that the 25 years old purity(6) program calls TIOCGWINSZ to determine the screen length, and then uses the results in the answer buffer without checking if the ioctl(2) call returned an error. Which it obviously does in this case, because standard input is not a console but a pipe. So my theory is that paging is enabled because the undefined result of the ioctl has changed, and only on this architecture. Since I do not want to fix purity(6) right now, I have implemented the workaround of printing many more "y" characters as input.

18 July 2024

Enrico Zini: meson, includedir, and current directory

Suppose you have a meson project like this: meson.build:
project('example', 'cpp', version: '1.0', license : ' ', default_options: ['warning_level=everything', 'cpp_std=c++17'])
subdir('example')
example/meson.build:
test_example = executable('example-test', ['main.cc'])
example/string.h:
/* This file intentionally left empty */
example/main.cc:
#include <cstring>
int main(int argc,const char* argv[])
 
    std::string foo("foo");
    return 0;
 
This builds fine with autotools and cmake, but not meson:
$ meson setup builddir
The Meson build system
Version: 1.0.1
Source dir: /home/enrico/dev/deb/wobble-repr
Build dir: /home/enrico/dev/deb/wobble-repr/builddir
Build type: native build
Project name: example
Project version: 1.0
C++ compiler for the host machine: ccache c++ (gcc 12.2.0 "c++ (Debian 12.2.0-14) 12.2.0")
C++ linker for the host machine: c++ ld.bfd 2.40
Host machine cpu family: x86_64
Host machine cpu: x86_64
Build targets in project: 1
Found ninja-1.11.1 at /usr/bin/ninja
$ ninja -C builddir
ninja: Entering directory  builddir'
[1/2] Compiling C++ object example/example-test.p/main.cc.o
FAILED: example/example-test.p/main.cc.o
ccache c++ -Iexample/example-test.p -Iexample -I../example -fdiagnostics-color=always -D_FILE_OFFSET_BITS=64 -Wall -Winvalid-pch -Wextra -Wpedantic -Wcast-qual -Wconversion -Wfloat-equal -Wformat=2 -Winline -Wmissing-declarations -Wredundant-decls -Wshadow -Wundef -Wuninitialized -Wwrite-strings -Wdisabled-optimization -Wpacked -Wpadded -Wmultichar -Wswitch-default -Wswitch-enum -Wunused-macros -Wmissing-include-dirs -Wunsafe-loop-optimizations -Wstack-protector -Wstrict-overflow=5 -Warray-bounds=2 -Wlogical-op -Wstrict-aliasing=3 -Wvla -Wdouble-promotion -Wsuggest-attribute=const -Wsuggest-attribute=noreturn -Wsuggest-attribute=pure -Wtrampolines -Wvector-operation-performance -Wsuggest-attribute=format -Wdate-time -Wformat-signedness -Wnormalized=nfc -Wduplicated-cond -Wnull-dereference -Wshift-negative-value -Wshift-overflow=2 -Wunused-const-variable=2 -Walloca -Walloc-zero -Wformat-overflow=2 -Wformat-truncation=2 -Wstringop-overflow=3 -Wduplicated-branches -Wattribute-alias=2 -Wcast-align=strict -Wsuggest-attribute=cold -Wsuggest-attribute=malloc -Wanalyzer-too-complex -Warith-conversion -Wbidi-chars=ucn -Wopenacc-parallelism -Wtrivial-auto-var-init -Wctor-dtor-privacy -Weffc++ -Wnon-virtual-dtor -Wold-style-cast -Woverloaded-virtual -Wsign-promo -Wstrict-null-sentinel -Wnoexcept -Wzero-as-null-pointer-constant -Wabi-tag -Wuseless-cast -Wconditionally-supported -Wsuggest-final-methods -Wsuggest-final-types -Wsuggest-override -Wmultiple-inheritance -Wplacement-new=2 -Wvirtual-inheritance -Waligned-new=all -Wnoexcept-type -Wregister -Wcatch-value=3 -Wextra-semi -Wdeprecated-copy-dtor -Wredundant-move -Wcomma-subscript -Wmismatched-tags -Wredundant-tags -Wvolatile -Wdeprecated-enum-enum-conversion -Wdeprecated-enum-float-conversion -Winvalid-imported-macros -std=c++17 -O0 -g -MD -MQ example/example-test.p/main.cc.o -MF example/example-test.p/main.cc.o.d -o example/example-test.p/main.cc.o -c ../example/main.cc
In file included from ../example/main.cc:1:
/usr/include/c++/12/cstring:77:11: error:  memchr  has not been declared in  :: 
   77     using ::memchr;
                  ^~~~~~
/usr/include/c++/12/cstring:78:11: error:  memcmp  has not been declared in  :: 
   78     using ::memcmp;
                  ^~~~~~
/usr/include/c++/12/cstring:79:11: error:  memcpy  has not been declared in  :: 
   79     using ::memcpy;
                  ^~~~~~
/usr/include/c++/12/cstring:80:11: error:  memmove  has not been declared in  :: 
   80     using ::memmove;
                  ^~~~~~~
 
It turns out that meson adds the current directory to the include path by default:
Another thing to note is that include_directories adds both the source directory and corresponding build directory to include path, so you don't have to care.
It seems that I have to care after all. Thankfully there is an implicit_include_directories setting that can turn this off if needed. Its documentation is not as easy to find as I'd like (kudos to Kangie on IRC), and hopefully this blog post will make it easier for me to find it in the future.

2 April 2024

Bits from Debian: Bits from the DPL

Dear Debianites This morning I decided to just start writing Bits from DPL and send whatever I have by 18:00 local time. Here it is, barely proof read, along with all it's warts and grammar mistakes! It's slightly long and doesn't contain any critical information, so if you're not in the mood, don't feel compelled to read it! Get ready for a new DPL! Soon, the voting period will start to elect our next DPL, and my time as DPL will come to an end. Reading the questions posted to the new candidates on debian-vote, it takes quite a bit of restraint to not answer all of them myself, I think I can see how that aspect contributed to me being reeled in to running for DPL! In total I've done so 5 times (the first time I ran, Sam was elected!). Good luck to both Andreas and Sruthi, our current DPL candidates! I've already started working on preparing handover, and there's multiple request from teams that have came in recently that will have to wait for the new term, so I hope they're both ready to hit the ground running! Things that I wish could have gone better Communication Recently, I saw a t-shirt that read:
Adulthood is saying, 'But after this week things will slow down a bit' over and over until you die.
I can relate! With every task, crisis or deadline that appears, I think that once this is over, I'll have some more breathing space to get back to non-urgent, but important tasks. "Bits from the DPL" was something I really wanted to get right this last term, and clearly failed spectacularly. I have two long Bits from the DPL drafts that I never finished, I tend to have prioritised problems of the day over communication. With all the hindsight I have, I'm not sure which is better to prioritise, I do rate communication and transparency very highly and this is really the top thing that I wish I could've done better over the last four years. On that note, thanks to people who provided me with some kind words when I've mentioned this to them before. They pointed out that there are many other ways to communicate and be in touch with the community, and they mentioned that they thought that I did a good job with that. Since I'm still on communication, I think we can all learn to be more effective at it, since it's really so important for the project. Every time I publicly spoke about us spending more money, we got more donations. People out there really like to see how we invest funds in to Debian, instead of just making it heap up. DSA just spent a nice chunk on money on hardware, but we don't have very good visibility on it. It's one thing having it on a public line item in SPI's reporting, but it would be much more exciting if DSA could provide a write-up on all the cool hardware they're buying and what impact it would have on developers, and post it somewhere prominent like debian-devel-announce, Planet Debian or Bits from Debian (from the publicity team). I don't want to single out DSA there, it's difficult and affects many other teams. The Salsa CI team also spent a lot of resources (time and money wise) to extend testing on AMD GPUs and other AMD hardware. It's fantastic and interesting work, and really more people within the project and in the outside world should know about it! I'm not going to push my agendas to the next DPL, but I hope that they continue to encourage people to write about their work, and hopefully at some point we'll build enough excitement in doing so that it becomes a more normal part of our daily work. Founding Debian as a standalone entity This was my number one goal for the project this last term, which was a carried over item from my previous terms. I'm tempted to write everything out here, including the problem statement and our current predicaments, what kind of ground work needs to happen, likely constitutional changes that need to happen, and the nature of the GR that would be needed to make such a thing happen, but if I start with that, I might not finish this mail. In short, I 100% believe that this is still a very high ranking issue for Debian, and perhaps after my term I'd be in a better position to spend more time on this (hmm, is this an instance of "The grass is always better on the other side", or "Next week will go better until I die?"). Anyway, I'm willing to work with any future DPL on this, and perhaps it can in itself be a delegation tasked to properly explore all the options, and write up a report for the project that can lead to a GR. Overall, I'd rather have us take another few years and do this properly, rather than rush into something that is again difficult to change afterwards. So while I very much wish this could've been achieved in the last term, I can't say that I have any regrets here either. My terms in a nutshell COVID-19 and Debian 11 era My first term in 2020 started just as the COVID-19 pandemic became known to spread globally. It was a tough year for everyone, and Debian wasn't immune against its effects either. Many of our contributors got sick, some have lost loved ones (my father passed away in March 2020 just after I became DPL), some have lost their jobs (or other earners in their household have) and the effects of social distancing took a mental and even physical health toll on many. In Debian, we tend to do really well when we get together in person to solve problems, and when DebConf20 got cancelled in person, we understood that that was necessary, but it was still more bad news in a year we had too much of it already. I can't remember if there was ever any kind of formal choice or discussion about this at any time, but the DebConf video team just kind of organically and spontaneously became the orga team for an online DebConf, and that lead to our first ever completely online DebConf. This was great on so many levels. We got to see each other's faces again, even though it was on screen. We had some teams talk to each other face to face for the first time in years, even though it was just on a Jitsi call. It had a lasting cultural change in Debian, some teams still have video meetings now, where they didn't do that before, and I think it's a good supplement to our other methods of communication. We also had a few online Mini-DebConfs that was fun, but DebConf21 was also online, and by then we all developed an online conference fatigue, and while it was another good online event overall, it did start to feel a bit like a zombieconf and after that, we had some really nice events from the Brazillians, but no big global online community events again. In my opinion online MiniDebConfs can be a great way to develop our community and we should spend some further energy into this, but hey! This isn't a platform so let me back out of talking about the future as I see it... Despite all the adversity that we faced together, the Debian 11 release ended up being quite good. It happened about a month or so later than what we ideally would've liked, but it was a solid release nonetheless. It turns out that for quite a few people, staying inside for a few months to focus on Debian bugs was quite productive, and Debian 11 ended up being a very polished release. During this time period we also had to deal with a previous Debian Developer that was expelled for his poor behaviour in Debian, who continued to harass members of the Debian project and in other free software communities after his expulsion. This ended up being quite a lot of work since we had to take legal action to protect our community, and eventually also get the police involved. I'm not going to give him the satisfaction by spending too much time talking about him, but you can read our official statement regarding Daniel Pocock here: https://www.debian.org/News/2021/20211117 In late 2021 and early 2022 we also discussed our general resolution process, and had two consequent votes to address some issues that have affected past votes: In my first term I addressed our delegations that were a bit behind, by the end of my last term all delegation requests are up to date. There's still some work to do, but I'm feeling good that I get to hand this over to the next DPL in a very decent state. Delegation updates can be very deceiving, sometimes a delegation is completely re-written and it was just 1 or 2 hours of work. Other times, a delegation updated can contain one line that has changed or a change in one team member that was the result of days worth of discussion and hashing out differences. I also received quite a few requests either to host a service, or to pay a third-party directly for hosting. This was quite an admin nightmare, it either meant we had to manually do monthly reimbursements to someone, or have our TOs create accounts/agreements at the multiple providers that people use. So, after talking to a few people about this, we founded the DebianNet team (we could've admittedly chosen a better name, but that can happen later on) for providing hosting at two different hosting providers that we have agreement with so that people who host things under debian.net have an easy way to host it, and then at the same time Debian also has more control if a site maintainer goes MIA. More info: https://wiki.debian.org/Teams/DebianNet You might notice some Openstack mentioned there, we had some intention to set up a Debian cloud for hosting these things, that could also be used for other additional Debiany things like archive rebuilds, but these have so far fallen through. We still consider it a good idea and hopefully it will work out some other time (if you're a large company who can sponsor few racks and servers, please get in touch!) DebConf22 and Debian 12 era DebConf22 was the first time we returned to an in-person DebConf. It was a bit smaller than our usual DebConf - understandably so, considering that there were still COVID risks and people who were at high risk or who had family with high risk factors did the sensible thing and stayed home. After watching many MiniDebConfs online, I also attended my first ever MiniDebConf in Hamburg. It still feels odd typing that, it feels like I should've been at one before, but my location makes attending them difficult (on a side-note, a few of us are working on bootstrapping a South African Debian community and hopefully we can pull off MiniDebConf in South Africa later this year). While I was at the MiniDebConf, I gave a talk where I covered the evolution of firmware, from the simple e-proms that you'd find in old printers to the complicated firmware in modern GPUs that basically contain complete operating systems- complete with drivers for the device their running on. I also showed my shiny new laptop, and explained that it's impossible to install that laptop without non-free firmware (you'd get a black display on d-i or Debian live). Also that you couldn't even use an accessibility mode with audio since even that depends on non-free firmware these days. Steve, from the image building team, has said for a while that we need to do a GR to vote for this, and after more discussion at DebConf, I kept nudging him to propose the GR, and we ended up voting in favour of it. I do believe that someone out there should be campaigning for more free firmware (unfortunately in Debian we just don't have the resources for this), but, I'm glad that we have the firmware included. In the end, the choice comes down to whether we still want Debian to be installable on mainstream bare-metal hardware. At this point, I'd like to give a special thanks to the ftpmasters, image building team and the installer team who worked really hard to get the changes done that were needed in order to make this happen for Debian 12, and for being really proactive for remaining niggles that was solved by the time Debian 12.1 was released. The included firmware contributed to Debian 12 being a huge success, but it wasn't the only factor. I had a list of personal peeves, and as the hard freeze hit, I lost hope that these would be fixed and made peace with the fact that Debian 12 would release with those bugs. I'm glad that lots of people proved me wrong and also proved that it's never to late to fix bugs, everything on my list got eliminated by the time final freeze hit, which was great! We usually aim to have a release ready about 2 years after the previous release, sometimes there are complications during a freeze and it can take a bit longer. But due to the excellent co-ordination of the release team and heavy lifting from many DDs, the Debian 12 release happened 21 months and 3 weeks after the Debian 11 release. I hope the work from the release team continues to pay off so that we can achieve their goals of having shorter and less painful freezes in the future! Even though many things were going well, the ongoing usr-merge effort highlighted some social problems within our processes. I started typing out the whole history of usrmerge here, but it's going to be too long for the purpose of this mail. Important questions that did come out of this is, should core Debian packages be team maintained? And also about how far the CTTE should really be able to override a maintainer. We had lots of discussion about this at DebConf22, but didn't make much concrete progress. I think that at some point we'll probably have a GR about package maintenance. Also, thank you to Guillem who very patiently explained a few things to me (after probably having have to done so many times to others before already) and to Helmut who have done the same during the MiniDebConf in Hamburg. I think all the technical and social issues here are fixable, it will just take some time and patience and I have lots of confidence in everyone involved. UsrMerge wiki page: https://wiki.debian.org/UsrMerge DebConf 23 and Debian 13 era DebConf23 took place in Kochi, India. At the end of my Bits from the DPL talk there, someone asked me what the most difficult thing I had to do was during my terms as DPL. I answered that nothing particular stood out, and even the most difficult tasks ended up being rewarding to work on. Little did I know that my most difficult period of being DPL was just about to follow. During the day trip, one of our contributors, Abraham Raji, passed away in a tragic accident. There's really not anything anyone could've done to predict or stop it, but it was devastating to many of us, especially the people closest to him. Quite a number of DebConf attendees went to his funeral, wearing the DebConf t-shirts he designed as a tribute. It still haunts me when I saw his mother scream "He was my everything! He was my everything!", this was by a large margin the hardest day I've ever had in Debian, and I really wasn't ok for even a few weeks after that and I think the hurt will be with many of us for some time to come. So, a plea again to everyone, please take care of yourself! There's probably more people that love you than you realise. A special thanks to the DebConf23 team, who did a really good job despite all the uphills they faced (and there were many!). As DPL, I think that planning for a DebConf is near to impossible, all you can do is show up and just jump into things. I planned to work with Enrico to finish up something that will hopefully save future DPLs some time, and that is a web-based DD certificate creator instead of having the DPL do so manually using LaTeX. It already mostly works, you can see the work so far by visiting https://nm.debian.org/person/ACCOUNTNAME/certificate/ and replacing ACCOUNTNAME with your Debian account name, and if you're a DD, you should see your certificate. It still needs a few minor changes and a DPL signature, but at this point I think that will be finished up when the new DPL start. Thanks to Enrico for working on this! Since my first term, I've been trying to find ways to improve all our accounting/finance issues. Tracking what we spend on things, and getting an annual overview is hard, especially over 3 trusted organisations. The reimbursement process can also be really tedious, especially when you have to provide files in a certain order and combine them into a PDF. So, at DebConf22 we had a meeting along with the treasurer team and Stefano Rivera who said that it might be possible for him to work on a new system as part of his Freexian work. It worked out, and Freexian funded the development of the system since then, and after DebConf23 we handled the reimbursements for the conference via the new reimbursements site: https://reimbursements.debian.net/ It's still early days, but over time it should be linked to all our TOs and we'll use the same category codes across the board. So, overall, our reimbursement process becomes a lot simpler, and also we'll be able to get information like how much money we've spent on any category in any period. It will also help us to track how much money we have available or how much we spend on recurring costs. Right now that needs manual polling from our TOs. So I'm really glad that this is a big long-standing problem in the project that is being fixed. For Debian 13, we're waving goodbye to the KFreeBSD and mipsel ports. But we're also gaining riscv64 and loongarch64 as release architectures! I have 3 different RISC-V based machines on my desk here that I haven't had much time to work with yet, you can expect some blog posts about them soon after my DPL term ends! As Debian is a unix-like system, we're affected by the Year 2038 problem, where systems that uses 32 bit time in seconds since 1970 run out of available time and will wrap back to 1970 or have other undefined behaviour. A detailed wiki page explains how this works in Debian, and currently we're going through a rather large transition to make this possible. I believe this is the right time for Debian to be addressing this, we're still a bit more than a year away for the Debian 13 release, and this provides enough time to test the implementation before 2038 rolls along. Of course, big complicated transitions with dependency loops that causes chaos for everyone would still be too easy, so this past weekend (which is a holiday period in most of the west due to Easter weekend) has been filled with dealing with an upstream bug in xz-utils, where a backdoor was placed in this key piece of software. An Ars Technica covers it quite well, so I won't go into all the details here. I mention it because I want to give yet another special thanks to everyone involved in dealing with this on the Debian side. Everyone involved, from the ftpmasters to security team and others involved were super calm and professional and made quick, high quality decisions. This also lead to the archive being frozen on Saturday, this is the first time I've seen this happen since I've been a DD, but I'm sure next week will go better! Looking forward It's really been an honour for me to serve as DPL. It might well be my biggest achievement in my life. Previous DPLs range from prominent software engineers to game developers, or people who have done things like complete Iron Man, run other huge open source projects and are part of big consortiums. Ian Jackson even authored dpkg and is now working on the very interesting tag2upload service! I'm a relative nobody, just someone who grew up as a poor kid in South Africa, who just really cares about Debian a lot. And, above all, I'm really thankful that I didn't do anything major to screw up Debian for good. Not unlike learning how to use Debian, and also becoming a Debian Developer, I've learned a lot from this and it's been a really valuable growth experience for me. I know I can't possible give all the thanks to everyone who deserves it, so here's a big big thanks to everyone who have worked so hard and who have put in many, many hours to making Debian better, I consider you all heroes! -Jonathan

29 January 2024

Russ Allbery: Review: Bluebird

Review: Bluebird, by Ciel Pierlot
Publisher: Angry Robot
Copyright: 2022
ISBN: 0-85766-967-2
Format: Kindle
Pages: 458
Bluebird is a stand-alone far-future science fiction adventure. Ten thousand years ago, a star fell into the galaxy carrying three factions of humanity. The Ascetics, the Ossuary, and the Pyrites each believe that only their god survived and the other two factions are heretics. Between them, they have conquered the rest of the galaxy and its non-human species. The only thing the factions hate worse than each other are those who attempt to stay outside the faction system. Rig used to be a Pyrite weapon designer before she set fire to her office and escaped with her greatest invention. Now she's a Nightbird, a member of an outlaw band that tries to help refugees and protect her fellow Kashrini against Pyrite genocide. On her side, she has her girlfriend, an Ascetic librarian; her ship, Bluebird; and her guns, Panache and Pizzazz. And now, perhaps, the mysterious Ginka, a Zazra empath and remarkably capable fighter who helps Rig escape from an ambush by Pyrite soldiers. Rig wants to stay alive, help her people, and defy the factions. Pyrite wants Rig's secrets and, as leverage, has her sister. What Ginka wants is not entirely clear even to Ginka. This book is absurd, but I still had fun with it. It's dangerous for me to compare things to anime given how little anime that I've watched, but Bluebird had that vibe for me: anime, or maybe Japanese RPGs or superhero comics. The storytelling is very visual, combat-oriented, and not particularly realistic. Rig is a pistol sharpshooter and Ginka is the type of undefined deadly acrobatic fighter so often seen in that type of media. In addition to her ship, Rig has a gorgeous hand-maintained racing hoverbike with a beautiful paint job. It's that sort of book. It's also the sort of book where the characters obey cinematic logic designed to maximize dramatic physical confrontations, even if their actions make no logical sense. There is no facial recognition or screening, and it's bizarrely easy for the protagonists to end up in same physical location as high-up bad guys. One of the weapon systems that's critical to the plot makes no sense whatsoever. At critical moments, the bad guys behave more like final bosses in a video game, picking up weapons to deal with the protagonists directly instead of using their supposedly vast armies of agents. There is supposedly a whole galaxy full of civilizations with capital worlds covered in planet-spanning cities, but politics barely exist and the faction leaders get directly involved in the plot. If you are looking for a realistic projection of technology or society, I cannot stress enough that this is not the book that you're looking for. You probably figured that out when I mentioned ten thousand years of war, but that will only be the beginning of the suspension of disbelief problems. You need to turn off your brain and enjoy the action sequences and melodrama. I'm normally good at that, and I admit I still struggled because the plot logic is such a mismatch with the typical novels I read. There are several points where the characters do something that seems so monumentally dumb that I was sure Pierlot was setting them up for a fall, and then I got wrong-footed because their plan worked fine, or exploded for unrelated reasons. I think this type of story, heavy on dramatic eye-candy and emotional moments with swelling soundtracks, is a lot easier to pull off in visual media where all the pretty pictures distract your brain. In a novel, there's a lot of time to think about the strategy, technology, and government structure, which for this book is not a good idea. If you can get past that, though, Rig is entertainingly snarky and Ginka, who turns out to be the emotional heart of the book, is an enjoyable character with a real growth arc. Her background is a bit simplistic and the villains are the sort of pure evil that you might expect from this type of cinematic plot, but I cared about the outcome of her story. Some parts of the plot dragged and I think the editing could have been tighter, but there was enough competence porn and banter to pull me through. I would recommend Bluebird only cautiously, since you're going to need to turn off large portions of your brain and be in the right mood for nonsensically dramatic confrontations, but I don't regret reading it. It's mostly in primary colors and the emotional conflicts are not what anyone would call subtle, but it delivers a character arc and a somewhat satisfying ending. Content warning: There is a lot of serious physical injury in this book, including surgical maiming. If that's going to bother you, you may want to give this one a pass. Rating: 6 out of 10

2 January 2024

Matthew Garrett: Dealing with weird ELF libraries

Libraries are collections of code that are intended to be usable by multiple consumers (if you're interested in the etymology, watch this video). In the old days we had what we now refer to as "static" libraries, collections of code that existed on disk but which would be copied into newly compiled binaries. We've moved beyond that, thankfully, and now make use of what we call "dynamic" or "shared" libraries - instead of the code being copied into the binary, a reference to the library function is incorporated, and at runtime the code is mapped from the on-disk copy of the shared object[1]. This allows libraries to be upgraded without needing to modify the binaries using them, and if multiple applications are using the same library at once it only requires that one copy of the code be kept in RAM.

But for this to work, two things are necessary: when we build a binary, there has to be a way to reference the relevant library functions in the binary; and when we run a binary, the library code needs to be mapped into the process.

(I'm going to somewhat simplify the explanations from here on - things like symbol versioning make this a bit more complicated but aren't strictly relevant to what I was working on here)

For the first of these, the goal is to replace a call to a function (eg, printf()) with a reference to the actual implementation. This is the job of the linker rather than the compiler (eg, if you use the -c argument to tell gcc to simply compile to an object rather than linking an executable, it's not going to care about whether or not every function called in your code actually exists or not - that'll be figured out when you link all the objects together), and the linker needs to know which symbols (which aren't just functions - libraries can export variables or structures and so on) are available in which libraries. You give the linker a list of libraries, it extracts the symbols available, and resolves the references in your code with references to the library.

But how is that information extracted? Each ELF object has a fixed-size header that contains references to various things, including a reference to a list of "section headers". Each section has a name and a type, but the ones we're interested in are .dynstr and .dynsym. .dynstr contains a list of strings, representing the name of each exported symbol. .dynsym is where things get more interesting - it's a list of structs that contain information about each symbol. This includes a bunch of fairly complicated stuff that you need to care about if you're actually writing a linker, but the relevant entries for this discussion are an index into .dynstr (which means the .dynsym entry isn't sufficient to know the name of a symbol, you need to extract that from .dynstr), along with the location of that symbol within the library. The linker can parse this information and obtain a list of symbol names and addresses, and can now replace the call to printf() with a reference to libc instead.

(Note that it's not possible to simply encode this as "Call this address in this library" - if the library is rebuilt or is a different version, the function could move to a different location)

Experimentally, .dynstr and .dynsym appear to be sufficient for linking a dynamic library at build time - there are other sections related to dynamic linking, but you can link against a library that's missing them. Runtime is where things get more complicated.

When you run a binary that makes use of dynamic libraries, the code from those libraries needs to be mapped into the resulting process. This is the job of the runtime dynamic linker, or RTLD[2]. The RTLD needs to open every library the process requires, map the relevant code into the process's address space, and then rewrite the references in the binary into calls to the library code. This requires more information than is present in .dynstr and .dynsym - at the very least, it needs to know the list of required libraries.

There's a separate section called .dynamic that contains another list of structures, and it's the data here that's used for this purpose. For example, .dynamic contains a bunch of entries of type DT_NEEDED - this is the list of libraries that an executable requires. There's also a bunch of other stuff that's required to actually make all of this work, but the only thing I'm going to touch on is DT_HASH. Doing all this re-linking at runtime involves resolving the locations of a large number of symbols, and if the only way you can do that is by reading a list from .dynsym and then looking up every name in .dynstr that's going to take some time. The DT_HASH entry points to a hash table - the RTLD hashes the symbol name it's trying to resolve, looks it up in that hash table, and gets the symbol entry directly (it still needs to resolve that against .dynstr to make sure it hasn't hit a hash collision - if it has it needs to look up the next hash entry, but this is still generally faster than walking the entire .dynsym list to find the relevant symbol). There's also DT_GNU_HASH which fulfills the same purpose as DT_HASH but uses a more complicated algorithm that performs even better. .dynamic also contains entries pointing at .dynstr and .dynsym, which seems redundant but will become relevant shortly.

So, .dynsym and .dynstr are required at build time, and both are required along with .dynamic at runtime. This seems simple enough, but obviously there's a twist and I'm sorry it's taken so long to get to this point.

I bought a Synology NAS for home backup purposes (my previous solution was a single external USB drive plugged into a small server, which had uncomfortable single point of failure properties). Obviously I decided to poke around at it, and I found something odd - all the libraries Synology ships were entirely lacking any ELF section headers. This meant no .dynstr, .dynsym or .dynamic sections, so how was any of this working? nm asserted that the libraries exported no symbols, and readelf agreed. If I wrote a small app that called a function in one of the libraries and built it, gcc complained that the function was undefined. But executables on the device were clearly resolving the symbols at runtime, and if I loaded them into ghidra the exported functions were visible. If I dlopen()ed them, dlsym() couldn't resolve the symbols - but if I hardcoded the offset into my code, I could call them directly.

Things finally made sense when I discovered that if I passed the --use-dynamic argument to readelf, I did get a list of exported symbols. It turns out that ELF is weirder than I realised. As well as the aforementioned section headers, ELF objects also include a set of program headers. One of the program header types is PT_DYNAMIC. This typically points to the same data that's present in the .dynamic section. Remember when I mentioned that .dynamic contained references to .dynsym and .dynstr? This means that simply pointing at .dynamic is sufficient, there's no need to have separate entries for them.

The same information can be reached from two different locations. The information in the section headers is used at build time, and the information in the program headers at run time[3]. I do not have an explanation for this. But if the information is present in two places, it seems obvious that it should be able to reconstruct the missing section headers in my weird libraries? So that's what this does. It extracts information from the DYNAMIC entry in the program headers and creates equivalent section headers.

There's one thing that makes this more difficult than it might seem. The section header for .dynsym has to contain the number of symbols present in the section. And that information doesn't directly exist in DYNAMIC - to figure out how many symbols exist, you're expected to walk the hash tables and keep track of the largest number you've seen. Since every symbol has to be referenced in the hash table, once you've hit every entry the largest number is the number of exported symbols. This seemed annoying to implement, so instead I cheated, added code to simply pass in the number of symbols on the command line, and then just parsed the output of readelf against the original binaries to extract that information and pass it to my tool.

Somehow, this worked. I now have a bunch of library files that I can link into my own binaries to make it easier to figure out how various things on the Synology work. Now, could someone explain (a) why this information is present in two locations, and (b) why the build-time linker and run-time linker disagree on the canonical source of truth?

[1] "Shared object" is the source of the .so filename extension used in various Unix-style operating systems
[2] You'll note that "RTLD" is not an acryonym for "runtime dynamic linker", because reasons
[3] For environments using the GNU RTLD, at least - I have no idea whether this is the case in all ELF environments

comment count unavailable comments

6 December 2023

Reproducible Builds: Reproducible Builds in November 2023

Welcome to the November 2023 report from the Reproducible Builds project! In these reports we outline the most important things that we have been up to over the past month. As a rather rapid recap, whilst anyone may inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries (more).

Reproducible Builds Summit 2023 Between October 31st and November 2nd, we held our seventh Reproducible Builds Summit in Hamburg, Germany! Amazingly, the agenda and all notes from all sessions are all online many thanks to everyone who wrote notes from the sessions. As a followup on one idea, started at the summit, Alexander Couzens and Holger Levsen started work on a cache (or tailored front-end) for the snapshot.debian.org service. The general idea is that, when rebuilding Debian, you do not actually need the whole ~140TB of data from snapshot.debian.org; rather, only a very small subset of the packages are ever used for for building. It turns out, for amd64, arm64, armhf, i386, ppc64el, riscv64 and s390 for Debian trixie, unstable and experimental, this is only around 500GB ie. less than 1%. Although the new service not yet ready for usage, it has already provided a promising outlook in this regard. More information is available on https://rebuilder-snapshot.debian.net and we hope that this service becomes usable in the coming weeks. The adjacent picture shows a sticky note authored by Jan-Benedict Glaw at the summit in Hamburg, confirming Holger Levsen s theory that rebuilding all Debian packages needs a very small subset of packages, the text states that 69,200 packages (in Debian sid) list 24,850 packages in their .buildinfo files, in 8,0200 variations. This little piece of paper was the beginning of rebuilder-snapshot and is a direct outcome of the summit! The Reproducible Builds team would like to thank our event sponsors who include Mullvad VPN, openSUSE, Debian, Software Freedom Conservancy, Allotropia and Aspiration Tech.

Beyond Trusting FOSS presentation at SeaGL On November 4th, Vagrant Cascadian presented Beyond Trusting FOSS at SeaGL in Seattle, WA in the United States. Founded in 2013, SeaGL is a free, grassroots technical summit dedicated to spreading awareness and knowledge about free source software, hardware and culture. The summary of Vagrant s talk mentions that it will:
[ ] introduce the concepts of Reproducible Builds, including best practices for developing and releasing software, the tools available to help diagnose issues, and touch on progress towards solving decades-old deeply pervasive fundamental security issues Learn how to verify and demonstrate trust, rather than simply hoping everything is OK!
Germane to the contents of the talk, the slides for Vagrant s talk can be built reproducibly, resulting in a PDF with a SHA1 of cfde2f8a0b7e6ec9b85377eeac0661d728b70f34 when built on Debian bookworm and c21fab273232c550ce822c4b0d9988e6c49aa2c3 on Debian sid at the time of writing.

Human Factors in Software Supply Chain Security Marcel Fourn , Dominik Wermke, Sascha Fahl and Yasemin Acar have published an article in a Special Issue of the IEEE s Security & Privacy magazine. Entitled A Viewpoint on Human Factors in Software Supply Chain Security: A Research Agenda, the paper justifies the need for reproducible builds to reach developers and end-users specifically, and furthermore points out some under-researched topics that we have seen mentioned in interviews. An author pre-print of the article is available in PDF form.

Community updates On our mailing list this month:

openSUSE updates Bernhard M. Wiedemann has created a wiki page outlining an proposal to create a general-purpose Linux distribution which consists of 100% bit-reproducible packages albeit minus the embedded signature within RPM files. It would be based on openSUSE Tumbleweed or, if available, its Slowroll-variant. In addition, Bernhard posted another monthly update for his work elsewhere in openSUSE.

Ubuntu Launchpad now supports .buildinfo files Back in 2017, Steve Langasek filed a bug against Ubuntu s Launchpad code hosting platform to report that .changes files (artifacts of building Ubuntu and Debian packages) reference .buildinfo files that aren t actually exposed by Launchpad itself. This was causing issues when attempting to process .changes files with tools such as Lintian. However, it was noticed last month that, in early August of this year, Simon Quigley had resolved this issue, and .buildinfo files are now available from the Launchpad system.

PHP reproducibility updates There have been two updates from the PHP programming language this month. Firstly, the widely-deployed PHPUnit framework for the PHP programming language have recently released version 10.5.0, which introduces the inclusion of a composer.lock file, ensuring total reproducibility of the shipped binary file. Further details and the discussion that went into their particular implementation can be found on the associated GitHub pull request. In addition, the presentation Leveraging Nix in the PHP ecosystem has been given in late October at the PHP International Conference in Munich by Pol Dellaiera. While the video replay is not yet available, the (reproducible) presentation slides and speaker notes are available.

diffoscope changes diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made a number of changes, including:
  • Improving DOS/MBR extraction by adding support for 7z. [ ]
  • Adding a missing RequiredToolNotFound import. [ ]
  • As a UI/UX improvement, try and avoid printing an extended traceback if diffoscope runs out of memory. [ ]
  • Mark diffoscope as stable on PyPI.org. [ ]
  • Uploading version 252 to Debian unstable. [ ]

Website updates A huge number of notes were added to our website that were taken at our recent Reproducible Builds Summit held between October 31st and November 2nd in Hamburg, Germany. In particular, a big thanks to Arnout Engelen, Bernhard M. Wiedemann, Daan De Meyer, Evangelos Ribeiro Tzaras, Holger Levsen and Orhun Parmaks z. In addition to this, a number of other changes were made, including:

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In October, a number of changes were made by Holger Levsen:
  • Debian-related changes:
    • Track packages marked as Priority: important in a new package set. [ ][ ]
    • Stop scheduling packages that fail to build from source in bookworm [ ] and bullseye. [ ].
    • Add old releases dashboard link in web navigation. [ ]
    • Permit re-run of the pool_buildinfos script to be re-run for a specific year. [ ]
    • Grant jbglaw access to the osuosl4 node [ ][ ] along with lynxis [ ].
    • Increase RAM on the amd64 Ionos builders from 48 GiB to 64 GiB; thanks IONOS! [ ]
    • Move buster to archived suites. [ ][ ]
    • Reduce the number of arm64 architecture workers from 24 to 16 in order to improve stability [ ], reduce the workers for amd64 from 32 to 28 and, for i386, reduce from 12 down to 8 [ ].
    • Show the entire build history of each Debian package. [ ]
    • Stop scheduling already tested package/version combinations in Debian bookworm. [ ]
  • Snapshot service for rebuilders
    • Add an HTTP-based API endpoint. [ ][ ]
    • Add a Gunicorn instance to serve the HTTP API. [ ]
    • Add an NGINX config [ ][ ][ ][ ]
  • System-health:
    • Detect failures due to HTTP 503 Service Unavailable errors. [ ]
    • Detect failures to update package sets. [ ]
    • Detect unmet dependencies. (This usually occurs with builds of Debian live-build.) [ ]
  • Misc-related changes:
    • do install systemd-ommd on jenkins. [ ]
    • fix harmless typo in squid.conf for codethink04. [ ]
    • fixup: reproducible Debian: add gunicorn service to serve /api for rebuilder-snapshot.d.o. [ ]
    • Increase codethink04 s Squid cache_dir size setting to 16 GiB. [ ]
    • Don t install systemd-oomd as it unfortunately kills sshd [ ]
    • Use debootstrap from backports when commisioning nodes. [ ]
    • Add the live_build_debian_stretch_gnome, debsums-tests_buster and debsums-tests_buster jobs to the zombie list. [ ][ ]
    • Run jekyll build with the --watch argument when building the Reproducible Builds website. [ ]
    • Misc node maintenance. [ ][ ][ ]
Other changes were made as well, however, including Mattia Rizzolo fixing rc.local s Bash syntax so it can actually run [ ], commenting away some file cleanup code that is (potentially) deleting too much [ ] and fixing the html_brekages page for Debian package builds [ ]. Finally, diagnosed and submitted a patch to add a AddEncoding gzip .gz line to the tests.reproducible-builds.org Apache configuration so that Gzip files aren t re-compressed as Gzip which some clients can t deal with (as well as being a waste of time). [ ]

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

22 October 2023

Ian Jackson: DigiSpark (ATTiny85) - Arduino, C, Rust, build systems

Recently I completed a small project, including an embedded microcontroller. For me, using the popular Arduino IDE, and C, was a mistake. The experience with Rust was better, but still very exciting, and not in a good way. Here follows the rant. Introduction In a recent project (I ll write about the purpose, and the hardware in another post) I chose to use a DigiSpark board. This is a small board with a USB-A tongue (but not a proper plug), and an ATTiny85 microcontroller, This chip has 8 pins and is quite small really, but it was plenty for my application. By choosing something popular, I hoped for convenient hardware, and an uncomplicated experience. Convenient hardware, I got. Arduino IDE The usual way to program these boards is via an IDE. I thought I d go with the flow and try that. I knew these were closely related to actual Arduinos and saw that the IDE package arduino was in Debian. But it turns out that the Debian package s version doesn t support the DigiSpark. (AFAICT from the list it offered me, I m not sure it supports any ATTiny85 board.) Also, disturbingly, its board manager seemed to be offering to install board support, suggesting it would download stuff from the internet and run it. That wouldn t be acceptable for my main laptop. I didn t expect to be doing much programming or debugging, and the project didn t have significant security requirements: the chip, in my circuit, has only a very narrow ability do anything to the real world, and no network connection of any kind. So I thought it would be tolerable to do the project on my low-security video laptop . That s the machine where I m prepared to say yes to installing random software off the internet. So I went to the upstream Arduino site and downloaded a tarball containing the Arduino IDE. After unpacking that in /opt it ran and produced a pointy-clicky IDE, as expected. I had already found a 3rd-party tutorial saying I needed to add a magic URL (from the DigiSpark s vendor) in the preferences. That indeed allowed it to download a whole pile of stuff. Compilers, bootloader clients, god knows what. However, my tiny test program didn t make it to the board. Half-buried in a too-small window was an error message about the board s bootloader ( Micronucleus ) being too new. The boards I had came pre-flashed with micronucleus 2.2. Which is hardly new, But even so the official Arduino IDE (or maybe the DigiSpark s board package?) still contains an old version. So now we have all the downsides of curl bash-ware, but we re lacking the it s up to date and it just works upsides. Further digging found some random forum posts which suggested simply downloading a newer micronucleus and manually stuffing it into the right place: one overwrites a specific file, in the middle the heaps of stuff that the Arduino IDE s board support downloader squirrels away in your home directory. (In my case, the home directory of the untrusted shared user on the video laptop,) So, whatever . I did that. And it worked! Having demo d my ability to run code on the board, I set about writing my program. Writing C again The programming language offered via the Arduino IDE is C. It s been a little while since I started a new thing in C. After having spent so much of the last several years writing Rust. C s primitiveness quickly started to grate, and the program couldn t easily be as DRY as I wanted (Don t Repeat Yourself, see Wilson et al, 2012, 4, p.6). But, I carried on; after all, this was going to be quite a small job. Soon enough I had a program that looked right and compiled. Before testing it in circuit, I wanted to do some QA. So I wrote a simulator harness that #included my Arduino source file, and provided imitations of the few Arduino library calls my program used. As an side advantage, I could build and run the simulation on my main machine, in my normal development environment (Emacs, make, etc.). The simulator runs confirmed the correct behaviour. (Perhaps there would have been some more faithful simulation tool, but the Arduino IDE didn t seem to offer it, and I wasn t inclined to go further down that kind of path.) So I got the video laptop out, and used the Arduino IDE to flash the program. It didn t run properly. It hung almost immediately. Some very ad-hoc debugging via led-blinking (like printf debugging, only much worse) convinced me that my problem was as follows: Arduino C has 16-bit ints. My test harness was on my 64-bit Linux machine. C was autoconverting things (when building for the micrcocontroller). The way the Arduino IDE ran the compiler didn t pass the warning options necessary to spot narrowing implicit conversions. Those warnings aren t the default in C in general because C compilers hate us all for compatibility reasons. I don t know why those warnings are not the default in the Arduino IDE, but my guess is that they didn t want to bother poor novice programmers with messages from the compiler explaining how their program is quite possibly wrong. After all, users don t like error messages so we shouldn t report errors. And novice programmers are especially fazed by error messages so it s better to just let them struggle themselves with the arcane mysteries of undefined behaviour in C? The Arduino IDE does offer a dropdown for compiler warnings . The default is None. Setting it to All didn t produce anything about my integer overflow bugs. And, the output was very hard to find anyway because the log window has a constant stream of strange messages from javax.jmdns, with hex DNS packet dumps. WTF. Other things that were vexing about the Arduino IDE: it has fairly fixed notions (which don t seem to be documented) about how your files and directories ought to be laid out, and magical machinery for finding things you put nearby its sketch (as it calls them) and sticking them in its ear, causing lossage. It has a tendency to become confused if you edit files under its feet (e.g. with git checkout). It wasn t really very suited to a workflow where principal development occurs elsewhere. And, important settings such as the project s clock speed, or even the target board, or the compiler warning settings to use weren t stored in the project directory along with the actual code. I didn t look too hard, but I presume they must be in a dotfile somewhere. This is madness. Apparently there is an Arduino CLI too. But I was already quite exasperated, and I didn t like the idea of going so far off the beaten path, when the whole point of using all this was to stay with popular tooling and share fate with others. (How do these others cope? I have no idea.) As for the integer overflow bug: I didn t seriously consider trying to figure out how to control in detail the C compiler options passed by the Arduino IDE. (Perhaps this is possible, but not really documented?) I did consider trying to run a cross-compiler myself from the command line, with appropriate warning options, but that would have involved providing (or stubbing, again) the Arduino/DigiSpark libraries (and bugs could easily lurk at that interface). Instead, I thought, if only I had written the thing in Rust . But that wasn t possible, was it? Does Rust even support this board? Rust on the DigiSpark I did a cursory web search and found a very useful blog post by Dylan Garrett. This encouraged me to think it might be a workable strategy. I looked at the instructions there. It seemed like I could run them via the privsep arrangement I use to protect myself when developing using upstream cargo packages from crates.io. I got surprisingly far surprisingly quickly. It did, rather startlingly, cause my rustup to download a random recent Nightly Rust, but I have six of those already for other Reasons. Very quickly I got the trinket LED blink example, referenced by Dylan s blog post, to compile. Manually copying the file to the video laptop allowed me to run the previously-downloaded micronucleus executable and successfully run the blink example on my board! I thought a more principled approach to the bootloader client might allow a more convenient workflow. I found the upstream Micronucleus git releases and tags, and had a look over its source code, release dates, etc. It seemed plausible, so I compiled v2.6 from source. That was a success: now I could build and install a Rust program onto my board, from the command line, on my main machine. No more pratting about with the video laptop. I had got further, more quickly, with Rust, than with the Arduino IDE, and the outcome and workflow was superior. So, basking in my success, I copied the directory containing the example into my own project, renamed it, and adjusted the path references. That didn t work. Now it didn t build. Even after I copied about .cargo/config.toml and rust-toolchain.toml it didn t build, producing a variety of exciting messages, depending what precisely I tried. I don t have detailed logs of my flailing: the instructions say to build it by cd ing to the subdirectory, and, given that what I was trying to do was to not follow those instructions, it didn t seem sensible to try to prepare a proper repro so I could file a ticket. I wasn t optimistic about investigating it more deeply myself: I have some experience of fighting cargo, and it s not usually fun. Looking at some of the build control files, things seemed quite complicated. Additionally, not all of the crates are on crates.io. I have no idea why not. So, I would need to supply local copies of them anyway. I decided to just git subtree add the avr-hal git tree. (That seemed better than the approach taken by the avr-hal project s cargo template, since that template involve a cargo dependency on a foreign git repository. Perhaps it would be possible to turn them into path dependencies, but given that I had evidence of file-location-sensitive behaviour, which I didn t feel like I wanted to spend time investigating, using that seems like it would possibly have invited more trouble. Also, I don t like package templates very much. They re a form of clone-and-hack: you end up stuck with whatever bugs or oddities exist in the version of the template which was current when you started.) Since I couldn t get things to build outside avr-hal, I edited the example, within avr-hal, to refer to my (one) program.rs file outside avr-hal, with a #[path] instruction. That s not pretty, but it worked. I also had to write a nasty shell script to work around the lack of good support in my nailing-cargo privsep tool for builds where cargo must be invoked in a deep subdirectory, and/or Cargo.lock isn t where it expects, and/or the target directory containing build products is in a weird place. It also has to filter the output from cargo to adjust the pathnames in the error messages. Otherwise, running both cd A; cargo build and cd B; cargo build from a Makefile produces confusing sets of error messages, some of which contain filenames relative to A and some relative to B, making it impossible for my Emacs to reliably find the right file. RIIR (Rewrite It In Rust) Having got my build tooling sorted out I could go back to my actual program. I translated the main program, and the simulator, from C to Rust, more or less line-by-line. I made the Rust version of the simulator produce the same output format as the C one. That let me check that the two programs had the same (simulated) behaviour. Which they did (after fixing a few glitches in the simulator log formatting). Emboldened, I flashed the Rust version of my program to the DigiSpark. It worked right away! RIIR had caused the bug to vanish. Of course, to rewrite the program in Rust, and get it to compile, it was necessary to be careful about the types of all the various integers, so that s not so surprising. Indeed, it was the point. I was then able to refactor the program to be a bit more natural and DRY, and improve some internal interfaces. Rust s greater power, compared to C, made those cleanups easier, so making them worthwhile. However, when doing real-world testing I found a weird problem: my timings were off. Measured, the real program was too fast by a factor of slightly more than 2. A bit of searching (and searching my memory) revealed the cause: I was using a board template for an Adafruit Trinket. The Trinket has a clock speed of 8MHz. But the DigiSpark runs at 16.5MHz. (This is discussed in a ticket against one of the C/C++ libraries supporting the ATTiny85 chip.) The Arduino IDE had offered me a choice of clock speeds. I have no idea how that dropdown menu took effect; I suspect it was adding prelude code to adjust the clock prescaler. But my attempts to mess with the CPU clock prescaler register by hand at the start of my Rust program didn t bear fruit. So instead, I adopted a bodge: since my code has (for code structure reasons, amongst others) only one place where it dealt with the underlying hardware s notion of time, I simply changed my delay function to adjust the passed-in delay values, compensating for the wrong clock speed. There was probably a more principled way. For example I could have (re)based my work on either of the two unmerged open MRs which added proper support for the DigiSpark board, rather than abusing the Adafruit Trinket definition. But, having a nearly-working setup, and an explanation for the behaviour, I preferred the narrower fix to reopening any cans of worms. An offer of help As will be obvious from this posting, I m not an expert in dev tools for embedded systems. Far from it. This area seems like quite a deep swamp, and I m probably not the person to help drain it. (Frankly, much of the improvement work ought to be done, and paid for, by hardware vendors.) But, as a full Member of the Debian Project, I have considerable gatekeeping authority there. I also have much experience of software packaging, build systems, and release management. If anyone wants to try to improve the situation with embedded tooling in Debian, and is willing to do the actual packaging work. I would be happy to advise, and to review and sponsor your contributions. An obvious candidate: it seems to me that micronucleus could easily be in Debian. Possibly a DigiSpark board definition could be provided to go with the arduino package. Unfortunately, IMO Debian s Rust packaging tooling and workflows are very poor, and the first of my suggestions for improvement wasn t well received. So if you need help with improving Rust packages in Debian, please talk to the Debian Rust Team yourself. Conclusions Embedded programming is still rather a mess and probably always will be. Embedded build systems can be bizarre. Documentation is scant. You re often expected to download board support packages full of mystery binaries, from the board vendor (or others). Dev tooling is maddening, especially if aimed at novice programmers. You want version control? Hermetic tracking of your project s build and install configuration? Actually to be told by the compiler when you write obvious bugs? You re way off the beaten track. As ever, Free Software is under-resourced and the maintainers are often busy, or (reasonably) have other things to do with their lives. All is not lost Rust can be a significantly better bet than C for embedded software: The Rust compiler will catch a good proportion of programming errors, and an experienced Rust programmer can arrange (by suitable internal architecture) to catch nearly all of them. When writing for a chip in the middle of some circuit, where debugging involves staring an LED or a multimeter, that s precisely what you want. Rust embedded dev tooling was, in this case, considerably better. Still quite chaotic and strange, and less mature, perhaps. But: significantly fewer mystery downloads, and significantly less crazy deviations from the language s normal build system. Overall, less bad software supply chain integrity. The ATTiny85 chip, and the DigiSpark board, served my hardware needs very well. (More about the hardware aspects of this project in a future posting.)

comment count unavailable comments

16 June 2023

John Goerzen: Using git-annex for Data Archiving

In my recent post about data archiving to removable media, I laid out the difference between backing up and archiving, and also said I d evaluate git-annex and dar. This post evaluates git-annex. The next will look at dar, and then I ll make a comparison post. What is git-annex? git-annex is a fantastic and versatile program that does well, it s one of those things that can do so much that it s a bit hard to describe. Its homepage says:
git-annex allows managing large files with git, without storing the file contents in git. It can sync, backup, and archive your data, offline and online. Checksums and encryption keep your data safe and secure. Bring the power and distributed nature of git to bear on your large files with git-annex.
I think the particularly interesting features of git-annex aren t actually included in that list. Among the features of git-annex that make it shine for this purpose, its location tracking is key. git-annex can know exactly which device has which file at which version at all times. Combined with its preferred content settings, this lets you very easily say things like: git-annex can be set to allow a configurable amount of free space to remain on a device, and it will fill it up with whatever copies are necessary up until it hits that limit. Very convenient! git-annex will store files in a folder structure that mirrors the origin folder structure, in plain files just as they were. This maximizes the ability for a future person to access the content, since it is all viewable without any special tool at all. Of course, for things like optical media, git-annex will essentially be creating what amounts to incrementals. To obtain a consistent copy of the original tree, you would still need to use git-annex to process (export) the archives. git-annex challenges In my prior post, I related some challenges with git-annex. The biggest of them quite poor performance of the directory special remote when dealing with many files has been resolved by Joey, git-annex s author! That dramatically improves the git-annex use scenario here! The fixing commit is in the source tree but not yet in a release. git-annex no doubt may still have performance challenges with repositories in the 100,000+-range, but in that order of magnitude it now looks usable. I m not sure about 1,000,000-file repositories (I haven t tested); there is a page about scalability. A few other more minor challenges remain: I worked around the timestamp issue by using the mtree-netbsd package in Debian. mtree writes out a summary of files and metadata in a tree, and can restore them. To save: mtree -c -R nlink,uid,gid,mode -p /PATH/TO/REPO -X <(echo './.git') > /tmp/spec And, after restoration, the timestamps can be applied with: mtree -t -U -e < /tmp/spec Walkthrough: initial setup To use git-annex in this way, we have to do some setup. My general approach is this: Let's get started! I've set all these shell variables appropriately for this example, and REPONAME to "testdata". We'll begin by setting up the metadata-only tracking repo.
$ REPONAME=testdata
$ mkdir "$METAREPO"
$ cd "$METAREPO"
$ git init
$ git config annex.thin true
There is a sort of complicated topic of how git-annex stores files in a repo, which varies depending on whether the data for the file is present in a given repo, and whether the file is locked or unlocked. Basically, the options I use here cause git-annex to mostly use hard links instead of symlinks or pointer files, for maximum compatibility with non-POSIX filesystems such as NTFS and UDF, which might be used on these devices. thin is part of that. Let's continue:
$ git annex init 'local hub'
init local hub ok
(recording state in git...)
$ git annex wanted . "include=* and exclude=$REPONAME/*"
wanted . ok
(recording state in git...)
In a bit, we are going to import the source data under the directory named $REPONAME (here, testdata). The wanted command says: in this repository (represented by the bare dot), the files we want are matched by the rule that says eveyrthing except what's under $REPONAME. In other words, we don't want to make an unnecessary copy here. Because I expect to use an mtree file as documented above, and it is not under $REPONAME/, it will be included. Let's just add it and tweak some things.
$ touch mtree
$ git annex add mtree
add mtree
ok
(recording state in git...)
$ git annex sync
git-annex sync will change default behavior to operate on --content in a future version of git-annex. Recommend you explicitly use --no-content (or -g) to prepare for that change. (Or you can configure annex.synccontent)
commit
[main (root-commit) 6044742] git-annex in local hub
1 file changed, 1 insertion(+)
create mode 120000 mtree
ok
$ ls -l
total 9
lrwxrwxrwx 1 jgoerzen jgoerzen 178 Jun 15 22:31 mtree -> .git/annex/objects/pX/ZJ/...
OK! We've added a file, and it got transformed into a symlink. That's the thing I said we were going to avoid, so:
git annex adjust --unlock-present
adjust
Switched to branch 'adjusted/main(unlockpresent)'
ok
$ ls -l
total 1
-rw-r--r-- 2 jgoerzen jgoerzen 0 Jun 15 22:31 mtree
You'll notice it transformed into a hard link (nlinks=2) file. Great! Now let's import the source data. For that, we'll use the directory special remote.
$ git annex initremote source type=directory directory=$SOURCEDIR importtree=yes \
encryption=none
initremote source ok
(recording state in git...)
$ git annex enableremote source directory=$SOURCEDIR
enableremote source ok
(recording state in git...)
$ git config remote.source.annex-readonly true
$ git config annex.securehashesonly true
$ git config annex.genmetadata true
$ git config annex.diskreserve 100M
$ git config remote.source.annex-tracking-branch main:$REPONAME
OK, so here we created a new remote named "source". We enabled it, and set some configuration. Most notably, that last line causes files from "source" to be imported under $REPONAME/ as we wanted earlier. Now we're ready to scan the source.
$ git annex sync
At this point, you'll see git-annex computing a hash for every file in the source directory. I can verify with du that my metadata-only repo only uses 14MB of disk space, while my source is around 4GB. Now we can see what git-annex thinks about file locations:
$ git-annex whereis less
whereis mtree (1 copy)
8aed01c5-da30-46c0-8357-1e8a94f67ed6 -- local hub [here]
ok
whereis testdata/[redacted] (0 copies)
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
failed
... many more lines ...
So remember we said we wanted mtree, but nothing under testdata, under this repo? That's exactly what we got. git-annex knows that the files under testdata can be found under the "source" special remote, but aren't in any git-annex repo -- yet. Now we'll start adding them. Walkthrough: removable drives I've set up two 500MB filesystems to represent removable drives. We'll see how git-annex works with them.
$ cd $DRIVE01
$ df -h .
Filesystem Size Used Avail Use% Mounted on
acrypt/no-backup/annexdrive01 500M 1.0M 499M 1% /acrypt/no-backup/annexdrive01
$ git clone $METAREPO
Cloning into 'testdata'...
done.
$ cd $REPONAME
$ git config annex.thin true
$ git annex init "test drive #1"
$ git annex adjust --hide-missing --unlock
adjust
Switched to branch 'adjusted/main(hidemissing-unlocked)'
ok
$ git annex sync
OK, that's the initial setup. Now let's enable the source remote and configure it the same way we did before:
$ git annex enableremote source directory=$SOURCEDIR
enableremote source ok
(recording state in git...)
$ git config remote.source.annex-readonly true
$ git config remote.source.annex-tracking-branch main:$REPONAME
$ git config annex.securehashesonly true
$ git config annex.genmetadata true
$ git config annex.diskreserve 100M
Now, we'll add the drive to a group called "driveset01" and configure what we want on it:
$ git annex group . driveset01
$ git annex wanted . '(not copies=driveset01:1)'
What this does is say: first of all, this drive is in a group named driveset01. Then, this drive wants any files for which there isn't already at least one copy in driveset01. Now let's load up some files!
$ git annex sync --content
As the messages fly by from here, you'll see it mentioning that it got mtree, and then various files from "source" -- until, that is, the filesystem had less than 100MB free, at which point it complained of no space for the rest. Exactly like we wanted! Now, we need to teach $METAREPO about $DRIVE01.
$ cd $METAREPO
$ git remote add drive01 $DRIVE01/$REPONAME
$ git annex sync drive01
git-annex sync will change default behavior to operate on --content in a future version of git-annex. Recommend you explicitly use --no-content (or -g) to prepare for that change. (Or you can configure annex.synccontent)
commit
On branch adjusted/main(unlockpresent)
nothing to commit, working tree clean
ok
merge synced/main (Merging into main...)
Updating d1d9e53..817befc
Fast-forward
(Merging into adjusted branch...)
Updating 7ccc20b..861aa60
Fast-forward
ok
pull drive01
remote: Enumerating objects: 214, done.
remote: Counting objects: 100% (214/214), done.
remote: Compressing objects: 100% (95/95), done.
remote: Total 110 (delta 6), reused 0 (delta 0), pack-reused 0
Receiving objects: 100% (110/110), 13.01 KiB 1.44 MiB/s, done.
Resolving deltas: 100% (6/6), completed with 6 local objects.
From /acrypt/no-backup/annexdrive01/testdata
* [new branch] adjusted/main(hidemissing-unlocked) -> drive01/adjusted/main(hidemissing-unlocked)
* [new branch] adjusted/main(unlockpresent) -> drive01/adjusted/main(unlockpresent)
* [new branch] git-annex -> drive01/git-annex
* [new branch] main -> drive01/main
* [new branch] synced/main -> drive01/synced/main
ok
OK! This step is important, because drive01 and drive02 (which we'll set up shortly) won't necessarily be able to reach each other directly, due to not being plugged in simultaneously. Our $METAREPO, however, will know all about where every file is, so that the "wanted" settings can be correctly resolved. Let's see what things look like now:
$ git annex whereis less
whereis mtree (2 copies)
8aed01c5-da30-46c0-8357-1e8a94f67ed6 -- local hub [here]
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]
ok
whereis testdata/[redacted] (1 copy)
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok
If I scroll down a bit, I'll see the files past the 400MB mark that didn't make it onto drive01. Let's add another example drive! Walkthrough: Adding a second drive The steps for $DRIVE02 are the same as we did before, just with drive02 instead of drive01, so I'll omit listing it all a second time. Now look at this excerpt from whereis:
whereis testdata/[redacted] (1 copy)
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok
whereis testdata/[redacted] (1 copy)
c4540343-e3b5-4148-af46-3f612adda506 -- test drive #2 [drive02]
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok
Look at that! Some files on drive01, some on drive02, some neither place. Perfect! Walkthrough: Updates So I've made some changes in the source directory: moved a file, added another, and deleted one. All of these were copied to drive01 above. How do we handle this? First, we update the metadata repo:
$ cd $METAREPO
$ git annex sync
$ git annex dropunused all
OK, this has scanned $SOURCEDIR and noted changes. Let's see what whereis says:
$ git annex whereis less
...
whereis testdata/cp (0 copies)
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
failed
whereis testdata/file01-unchanged (1 copy)
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [drive01]
The following untrusted locations may also have copies:
9e48387e-b096-400a-8555-a3caf5b70a64 -- [source]
ok
So this looks right. The file I added was a copy of /bin/cp. I moved another file to one named file01-unchanged. Notice that it realized this was a rename and that the data still exists on drive01. Well, let's update drive01.
$ cd $DRIVE01/$REPONAME
$ git annex sync --content
Looking at the testdata/ directory now, I see that file01-unchanged has been renamed, the deleted file is gone, but cp isn't yet here -- probably due to space issues; as it's new, it's undefined whether it or some other file would fill up free space. Let's work along a few more commands.
$ git annex get --auto
$ git annex drop --auto
$ git annex dropunused all
And now, let's make sure metarepo is updated with its state.
$ cd $METAREPO
$ git annex sync
We could do the same for drive02. This is how we would proceed with every update. Walkthrough: Restoration Now, we have bare files at reasonable locations in drive01 and drive02. But, to generate a consistent restore, we need to be able to actually do an export. Otherwise, we may have files with old names, duplicate files, etc. Let's assume that we lost our source and metadata repos and have to restore from scratch. We'll make a new $RESTOREDIR. We'll begin with drive01 since we used it most recently.
$ mv $METAREPO $METAREPO.disabled
$ mv $SOURCEDIR $SOURCEDIR.disabled
$ git clone $DRIVE01/$REPONAME $RESTOREDIR
$ cd $RESTOREDIR
$ git config annex.thin true
$ git annex init "restore"
$ git annex adjust --hide-missing --unlock
Now, we need to connect the drive01 and pull the files from it.
$ git remote add drive01 $DRIVE01/$REPONAME
$ git annex sync --content
Now, repeat with drive02:
$ git remote add drive02 $DRIVE02/$REPONAME
$ git annex sync --content
Now we've got all our content back! Here's what whereis looks like:
whereis testdata/file01-unchanged (3 copies)
3d663d0f-1a69-4943-8eb1-f4fe22dc4349 -- restore [here]
9e48387e-b096-400a-8555-a3caf5b70a64 -- source
b46fc85c-c68e-4093-a66e-19dc99a7d5e7 -- test drive #1 [origin]
ok
...
I was a little surprised that drive01 didn't seem to know what was on drive02. Perhaps that could have been remedied by adding more remotes there? I'm not entirely sure; I'd thought would have been able to do that automatically. Conclusions I think I have demonstrated two things: First, git-annex is indeed an extremely powerful tool. I have only scratched the surface here. The location tracking is a neat feature, and being able to just access the data as plain files if all else fails is nice for future users. Secondly, it is also a complex tool and difficult to get right for this purpose (I think much easier for some other purposes). For someone that doesn't live and breathe git-annex, it can be hard to get right. In fact, I'm not entirely sure I got it right here. Why didn't drive02 know what files were on drive01 and vice-versa? I don't know, and that reflects some kind of misunderstanding on my part about how metadata is synced; perhaps more care needs to be taken in restore, or done in a different order, than I proposed. I initially tried to do a restore by using git annex export to a directory special remote with exporttree=yes, but I couldn't ever get it to actually do anything, and I don't know why. These two cut against each other. On the one hand, the raw accessibility of the data to someone with no computer skills is unmatched. On the other hand, I'm not certain I have the skill to always prepare the discs properly, or to do a proper consistent restore.

18 January 2023

Matthew Garrett: PKCS#11. hardware keystores, and Apple frustrations

There's a bunch of ways you can store cryptographic keys. The most obvious is to just stick them on disk, but that has the downside that anyone with access to the system could just steal them and do whatever they wanted with them. At the far end of the scale you have Hardware Security Modules (HSMs), hardware devices that are specially designed to self destruct if you try to take them apart and extract the keys, and which will generate an audit trail of every key operation. In between you have things like smartcards, TPMs, Yubikeys, and other platform secure enclaves - devices that don't allow arbitrary access to keys, but which don't offer the same level of assurance as an actual HSM (and are, as a result, orders of magnitude cheaper).

The problem with all of these hardware approaches is that they have entirely different communication mechanisms. The industry realised this wasn't ideal, and in 1994 RSA released version 1 of the PKCS#11 specification. This defines a C interface with a single entry point - C_GetFunctionList. Applications call this and are given a structure containing function pointers, with each entry corresponding to a PKCS#11 function. The application can then simply call the appropriate function pointer to trigger the desired functionality, such as "Tell me how many keys you have" and "Sign this, please". This is both an example of C not just being a programming language and also of you having to shove a bunch of vendor-supplied code into your security critical tooling, but what could possibly go wrong.

(Linux distros work around this problem by using p11-kit, which is a daemon that speaks d-bus and loads PKCS#11 modules for you. You can either speak to it directly over d-bus, or for apps that only speak PKCS#11 you can load a module that just transports the PKCS#11 commands over d-bus. This moves the weird vendor C code out of process, and also means you can deal with these modules without having to speak the C ABI, so everyone wins)

One of my work tasks at the moment is helping secure SSH keys, ensuring that they're only issued to appropriate machines and can't be stolen afterwards. For Windows and Linux machines we can stick them in the TPM, but Macs don't have a TPM as such. Instead, there's the Secure Enclave - part of the T2 security chip on x86 Macs, and directly integrated into the M-series SoCs. It doesn't have anywhere near as many features as a TPM, let alone an HSM, but it can generate NIST curve elliptic curve keys and sign things with them and that's good enough. Things are made more complicated by Apple only allowing keys to be used by the app that generated them, so it's hard for applications to generate keys on behalf of each other. This can be mitigated by using CryptoTokenKit, an interface that allows apps to present tokens to the systemwide keychain. Although this is intended for allowing a generic interface for access to such tokens (kind of like PKCS#11), an app can generate its own keys in the Secure Enclave and then expose them to other apps via the keychain through CryptoTokenKit.

Of course, applications then need to know how to communicate with the keychain. Browsers mostly do so, and Apple's version of SSH can to an extent. Unfortunately, that extent is "Retrieve passwords to unlock on-disk keys", which doesn't help in our case. PKCS#11 comes to the rescue here! Apple ship a module called ssh-keychain.dylib, a PKCS#11 module that's intended to allow SSH to use keys that are present in the system keychain. Unfortunately it's not super well maintained - it got broken when Big Sur moved all the system libraries into a cache, but got fixed up a few releases later. Unfortunately every time I tested it with our CryptoTokenKit provider (and also when I retried with SecureEnclaveToken to make sure it wasn't just our code being broken), ssh would tell me "provider /usr/lib/ssh-keychain.dylib returned no slots" which is not especially helpful. Finally I realised that it was actually generating more debug output, but it was being sent to the system debug logs rather than the ssh debug output. Well, when I say "more debug output", I mean "Certificate []: algorithm is not supported, ignoring it", which still doesn't tell me all that much. So I stuck it in Ghidra and searched for that string, and the line above it was

iVar2 = __auth_stubs::_objc_msgSend(uVar7,"isEqual:",*(undefined8*)__got::_kSecAttrKeyTypeRSA);

with it immediately failing if the key isn't RSA. Which it isn't, since the Secure Enclave doesn't support RSA. Apple's PKCS#11 module appears incapable of making use of keys generated on Apple's hardware.

There's a couple of ways of dealing with this. The first, which is taken by projects like Secretive, is to implement the SSH agent protocol and have SSH delegate key management to that agent, which can then speak to the keychain. But if you want this to work in all cases you need to implement all the functionality in the existing ssh-agent, and that seems like a bunch of work. The second is to implement a PKCS#11 module, which sounds like less work but probably more mental anguish. I'll figure that out tomorrow.

comment count unavailable comments

16 January 2023

Dirk Eddelbuettel: RcppArmadillo 0.11.4.3.1 on CRAN: Updates

armadillo image Armadillo is a powerful and expressive C++ template library for linear algebra and scientific computing. It aims towards a good balance between speed and ease of use, has a syntax deliberately close to Matlab, and is useful for algorithm development directly in C++, or quick conversion of research code into production environments. RcppArmadillo integrates this library with the R environment and language and is widely used by (currently) 1034 packages other packages on CRAN, downloaded 27.6 million times (per the partial logs from the cloud mirrors of CRAN), and the CSDA paper (preprint / vignette) by Conrad and myself has been cited 509 times according to Google Scholar. This release brings another upstream bugfix interation 11.4.3, released in accordance with the aimed-for monthly release cadence. We had hoped to move away from suppressing deprecation warnings in this release, and had prepared over two dozen patch sets all well as pull requests as documented in issue #391. However, it turns out that we both missed with one or two needed set of changes as well as two other sets of changes triggering deprecation warnings. So we expanded issue #391, and added issue #402 and prepared another eleven pull requests and patches today. With that we can hopefully remove the suppression of these warnings by an expected late of late April. The full set of changes (since the last CRAN release 0.11.4.2.1) follows.

Changes in RcppArmadillo version 0.11.4.3.1 (2023-01-14)
  • The #define ARMA_IGNORE_DEPRECATED_MARKER remains active to suppress the (upstream) deprecation warnings, see #391 and #402 for details.

Changes in RcppArmadillo version 0.11.4.3.0 (2022-12-28) (GitHub Only)
  • Upgraded to Armadillo release 11.4.3 (Ship of Theseus)
    • fix corner case in pinv() when processing symmetric matrices
  • Protect the undefine of NDEBUG behind additional opt-in define

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

30 December 2022

Simon Josefsson: Preseeding Trisquel Virtual Machines Using netinst Images

I m migrating some self-hosted virtual machines to Trisquel, and noticed that Trisquel does not offer cloud-images similar to the Debian Cloud and Ubuntu Cloud images. Thus my earlier approach based on virt-install --cloud-init and cloud-localds does not work with Trisquel. While I hope that Trisquel will eventually publish cloud-compatible images, I wanted to document an alternative approach for Trisquel based on preseeding. This is how I used to install Debian and Ubuntu in the old days, and the automated preseed method is best documented in the Debian installation manual. I was hoping to forget about the preseed format, but maybe it will become one of those legacy technologies that never really disappears? Like FAT16 and 8-bit microcontrollers. Below I assume you have a virtual machine host server up that runs libvirt and has virt-install and similar tools; install them with the following command. I run a pre-release version of Trisquel 11 aramo on my VM-host, but I believe any recent dpkg-based distribution like Trisquel 9/10, PureOS 10, Debian 11 or Ubuntu 20.04/22.04 would work.
apt-get install libvirt-daemon-system virtinst genisoimage cloud-image-utils osinfo-db-tools
The approach can install Trisquel 9 (etiona), Trisquel 10 (nabia) and the pre-release of Trisquel 11. First download and verify the integrity of the netinst images that we will need. Unfortunately the Trisquel 11 netinst beta image does not have any checksum or signature available.
mkdir -p /root/iso
cd /root/iso
wget -q https://mirror.fsf.org/trisquel-images/trisquel-netinst_9.0.2_amd64.iso
wget -q https://mirror.fsf.org/trisquel-images/trisquel-netinst_9.0.2_amd64.iso.asc
wget -q https://mirror.fsf.org/trisquel-images/trisquel-netinst_9.0.2_amd64.iso.sha256
wget -q https://mirror.fsf.org/trisquel-images/trisquel-netinst_10.0.1_amd64.iso
wget -q https://mirror.fsf.org/trisquel-images/trisquel-netinst_10.0.1_amd64.iso.asc
wget -q https://mirror.fsf.org/trisquel-images/trisquel-netinst_10.0.1_amd64.iso.sha256
wget -q -O- https://archive.trisquel.info/trisquel/trisquel-archive-signkey.gpg   gpg --import
sha256sum -c trisquel-netinst_9.0.2_amd64.iso.sha256
gpg --verify trisquel-netinst_9.0.2_amd64.iso.asc
sha256sum -c trisquel-netinst_10.0.1_amd64.iso.sha256
gpg --verify trisquel-netinst_10.0.1_amd64.iso.asc
wget -q https://cdbuilds.trisquel.org/aramo/trisquel-netinst_11.0-20221225_amd64.iso
echo '179566639ca8f14f0c3d5658209c59a0916d9e3bf9c026660cc07b28f2311631  trisquel-netinst_11.0-20221225_amd64.iso'   sha256sum -c
I have developed the following fairly minimal preseed file that works with all three Trisquel releases. Compare it against the official Trisquel 11 preseed skeleton and the Debian 11 example preseed file. You should modify obvious things like SSH key, host/IP settings, partition layout and decide for yourself how to deal with passwords. While Ubuntu/Trisquel usually wants to setup a user account, I prefer to login as root hence setting passwd/root-login to true and passwd/make-user to false.

root@trana:~# cat>trisquel.preseed 
d-i debian-installer/locale select en_US
d-i keyboard-configuration/xkb-keymap select us
d-i netcfg/choose_interface select auto
d-i netcfg/disable_autoconfig boolean true
d-i netcfg/get_ipaddress string 192.168.10.201
d-i netcfg/get_netmask string 255.255.255.0
d-i netcfg/get_gateway string 192.168.10.46
d-i netcfg/get_nameservers string 192.168.10.46
d-i netcfg/get_hostname string trisquel
d-i netcfg/get_domain string sjd.se
d-i clock-setup/utc boolean true
d-i time/zone string UTC
d-i mirror/country string manual
d-i mirror/http/hostname string ftp.acc.umu.se
d-i mirror/http/directory string /mirror/trisquel/packages
d-i mirror/http/proxy string
d-i partman-auto/method string regular
d-i partman-partitioning/confirm_write_new_label boolean true
d-i partman/choose_partition select finish
d-i partman/confirm boolean true
d-i partman/confirm_nooverwrite boolean true
d-i partman-basicfilesystems/no_swap boolean false
d-i partman-auto/expert_recipe string myroot :: 1000 50 -1 ext4 \
     $primary    $bootable    method  format   \
     format    use_filesystem    filesystem  ext4   \
     mountpoint  /   \
    .
d-i partman-auto/choose_recipe select myroot
d-i passwd/root-login boolean true
d-i user-setup/allow-password-weak boolean true
d-i passwd/root-password password r00tme
d-i passwd/root-password-again password r00tme
d-i passwd/make-user boolean false
tasksel tasksel/first multiselect
d-i pkgsel/include string openssh-server
popularity-contest popularity-contest/participate boolean false
d-i grub-installer/only_debian boolean true
d-i grub-installer/with_other_os boolean true
d-i grub-installer/bootdev string default
d-i finish-install/reboot_in_progress note
d-i preseed/late_command string mkdir /target/root/.ssh ; echo ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAILzCFcHHrKzVSPDDarZPYqn89H5TPaxwcORgRg+4DagE cardno:FFFE67252015 > /target/root/.ssh/authorized_keys
^D
root@trana:~# 
Use the file above as a skeleton for preparing a VM-specific preseed file as follows. The environment variables HOST and IPS will be used later on too.

root@trana:~# HOST=foo
root@trana:~# IP=192.168.10.197
root@trana:~# sed -e "s,get_ipaddress string.*,get_ipaddress string $IP," -e "s,get_hostname string.*,get_hostname string $HOST," < trisquel.preseed > vm-$HOST.preseed
root@trana:~# 
The following script is used to prepare the ISO images with the preseed file that we will need. This script is inspired by the Debian Wiki Preseed EditIso page and the Trisquel ISO customization wiki page. There are a couple of variations based on earlier works. Paths are updated to match the Trisquel netinst ISO layout, which differ slightly from Debian. We modify isolinux.cfg to boot the auto label without a timeout. On Trisquel 11 the auto boot label exists, but on Trisquel 9 and Trisquel 10 it does not exist so we add it in order to be able to start the automated preseed installation.

root@trana:~# cat gen-preseed-iso 
#!/bin/sh
# Copyright (C) 2018-2022 Simon Josefsson -- GPLv3+
# https://wiki.debian.org/DebianInstaller/Preseed/EditIso
# https://trisquel.info/en/wiki/customizing-trisquel-iso
set -e
set -x
ISO="$1"
PRESEED="$2"
OUTISO="$3"
LASTPWD="$PWD"
test -f "$ISO"
test -f "$PRESEED"
test ! -f "$OUTISO"
TMPDIR=$(mktemp -d)
mkdir "$TMPDIR/mnt"
mkdir "$TMPDIR/tmp"
cp "$PRESEED" "$TMPDIR"/preseed.cfg
cd "$TMPDIR"
mount "$ISO" mnt/
cp -rT mnt/ tmp/
umount mnt/
chmod +w -R tmp/
gunzip tmp/initrd.gz
echo preseed.cfg   cpio -H newc -o -A -F tmp/initrd
gzip tmp/initrd
chmod -w -R tmp/
sed -i "s/timeout 0/timeout 1/" tmp/isolinux.cfg
sed -i "s/default vesamenu.c32/default auto/" tmp/isolinux.cfg
if ! grep -q auto tmp/adtxt.cfg; then
    cat<<EOF >> tmp/adtxt.cfg
label auto
	menu label ^Automated install
	kernel linux
	append auto=true priority=critical vga=788 initrd=initrd.gz --- quiet
EOF
fi
cd tmp/
find -follow -type f   xargs md5sum  > md5sum.txt
cd ..
cd "$LASTPWD"
genisoimage -r -J -b isolinux.bin -c boot.cat \
            -no-emul-boot -boot-load-size 4 -boot-info-table \
            -o "$OUTISO" "$TMPDIR/tmp/"
rm -rf "$TMPDIR"
exit 0
^D
root@trana:~# chmod +x gen-preseed-iso 
root@trana:~# 
Next run the command on one of the downloaded ISO image and the generated preseed file.

root@trana:~# ./gen-preseed-iso /root/iso/trisquel-netinst_10.0.1_amd64.iso vm-$HOST.preseed vm-$HOST.iso
+ ISO=/root/iso/trisquel-netinst_10.0.1_amd64.iso
+ PRESEED=vm-foo.preseed
+ OUTISO=vm-foo.iso
+ LASTPWD=/root
+ test -f /root/iso/trisquel-netinst_10.0.1_amd64.iso
+ test -f vm-foo.preseed
+ test ! -f vm-foo.iso
+ mktemp -d
+ TMPDIR=/tmp/tmp.mNEprT4Tx9
+ mkdir /tmp/tmp.mNEprT4Tx9/mnt
+ mkdir /tmp/tmp.mNEprT4Tx9/tmp
+ cp vm-foo.preseed /tmp/tmp.mNEprT4Tx9/preseed.cfg
+ cd /tmp/tmp.mNEprT4Tx9
+ mount /root/iso/trisquel-netinst_10.0.1_amd64.iso mnt/
mount: /tmp/tmp.mNEprT4Tx9/mnt: WARNING: source write-protected, mounted read-only.
+ cp -rT mnt/ tmp/
+ umount mnt/
+ chmod +w -R tmp/
+ gunzip tmp/initrd.gz
+ echo preseed.cfg
+ cpio -H newc -o -A -F tmp/initrd
5 blocks
+ gzip tmp/initrd
+ chmod -w -R tmp/
+ sed -i s/timeout 0/timeout 1/ tmp/isolinux.cfg
+ sed -i s/default vesamenu.c32/default auto/ tmp/isolinux.cfg
+ grep -q auto tmp/adtxt.cfg
+ cat
+ cd tmp/
+ find -follow -type f
+ xargs md5sum
+ cd ..
+ cd /root
+ genisoimage -r -J -b isolinux.bin -c boot.cat -no-emul-boot -boot-load-size 4 -boot-info-table -o vm-foo.iso /tmp/tmp.mNEprT4Tx9/tmp/
I: -input-charset not specified, using utf-8 (detected in locale settings)
Using GCRY_000.MOD;1 for  /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/gcry_sha512.mod (gcry_sha256.mod)
Using XNU_U000.MOD;1 for  /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/xnu_uuid.mod (xnu_uuid_test.mod)
Using PASSW000.MOD;1 for  /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/password_pbkdf2.mod (password.mod)
Using PART_000.MOD;1 for  /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/part_sunpc.mod (part_sun.mod)
Using USBSE000.MOD;1 for  /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/usbserial_pl2303.mod (usbserial_ftdi.mod)
Using USBSE001.MOD;1 for  /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/usbserial_ftdi.mod (usbserial_usbdebug.mod)
Using VIDEO000.MOD;1 for  /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/videotest.mod (videotest_checksum.mod)
Using GFXTE000.MOD;1 for  /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/gfxterm_background.mod (gfxterm_menu.mod)
Using GCRY_001.MOD;1 for  /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/gcry_sha256.mod (gcry_sha1.mod)
Using MULTI000.MOD;1 for  /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/multiboot2.mod (multiboot.mod)
Using USBSE002.MOD;1 for  /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/usbserial_usbdebug.mod (usbserial_common.mod)
Using MDRAI000.MOD;1 for  /tmp/tmp.mNEprT4Tx9/tmp/boot/grub/x86_64-efi/mdraid09.mod (mdraid09_be.mod)
Size of boot image is 4 sectors -> No emulation
 22.89% done, estimate finish Thu Dec 29 23:36:18 2022
 45.70% done, estimate finish Thu Dec 29 23:36:18 2022
 68.56% done, estimate finish Thu Dec 29 23:36:18 2022
 91.45% done, estimate finish Thu Dec 29 23:36:18 2022
Total translation table size: 2048
Total rockridge attributes bytes: 24816
Total directory bytes: 40960
Path table size(bytes): 64
Max brk space used 46000
21885 extents written (42 MB)
+ rm -rf /tmp/tmp.mNEprT4Tx9
+ exit 0
root@trana:~#
Now the image is ready for installation, so invoke virt-install as follows. The machine will start directly, launching the preseed automatic installation. At this point, I usually click on the virtual machine in virt-manager to follow screen output until the installation has finished. If everything works OK the machines comes up and I can ssh into it.

root@trana:~# virt-install --name $HOST --disk vm-$HOST.img,size=5 --cdrom vm-$HOST.iso --osinfo linux2020 --autostart --noautoconsole --wait
Using linux2020 default --memory 4096
Starting install...
Allocating 'vm-foo.img'                                                                                                                                     0 B  00:00:00 ... 
Creating domain...                                                                                                                                          0 B  00:00:00     
Domain is still running. Installation may be in progress.
Waiting for the installation to complete.
Domain has shutdown. Continuing.
Domain creation completed.
Restarting guest.
root@trana:~# 
There are some problems that I have noticed that would be nice to fix, but are easy to work around. The first is that at the end of the installation of Trisquel 9 and Trisquel 10, the VM hangs after displaying Sent SIGKILL to all processes followed by Requesting system reboot. I kill the VM manually using virsh destroy foo and start it up again using virsh start foo. For production use I expect to be running Trisquel 11, where the problem doesn t happen, so this does not bother me enough to debug further. The remaining issue that once booted, a Trisquel 11 VM has lost its DNS nameserver configuration, presumably due to poor integration with systemd-resolved. Both Trisquel 9 and Trisquel 10 uses systemd-resolved where DNS works after first boot, so this appears to be a Trisquel 11 bug. You can work around it with rm -f /etc/resolv.conf && echo 'nameserver A.B.C.D' > /etc/resolv.conf or drink the systemd Kool-Aid. If you want to clean up and re-start the process, here is how you wipe out what you did. After this, you may run the sed, ./gen-preseed-iso and virt-install commands again. Remember, use virsh shutdown foo to gracefully shutdown a VM.

root@trana:~# virsh destroy foo
Domain 'foo' destroyed
root@trana:~# virsh undefine foo --remove-all-storage
Domain 'foo' has been undefined
Volume 'vda'(/root/vm-foo.img) removed.
root@trana:~# rm vm-foo.*
root@trana:~# 
Happy hacking on your virtal machines!

4 October 2022

Russ Allbery: Review: The Dragon Never Sleeps

Review: The Dragon Never Sleeps, by Glen Cook
Publisher: Night Shade Books
Copyright: 1988
Printing: 2008
ISBN: 1-59780-099-6
Format: MOBI
Pages: 449
Canon Space is run, in a way, by the noble mercantile houses, who spread their cities, colonies, and mines through the mysterious Web that allows faster-than-light interstellar travel. The true rulers of Canon Space, though, are the Guardships: enormous, undefeatable starships run by crews who have become effectively immortal by repeated uploading and reincarnation. Or, in the case of the Deified, without reincarnation, observing, ordering, advising, and meddling from an entirely virtual existence. The Guardships have enforced the status quo for four thousand years. House Tregesser thinks they have the means to break the stranglehold of the Guardships. They have contact with Outsiders from beyond Canon Space who can provide advanced technology. They have their own cloning technology, which they use to create backup copies of their elites. And they have Lupo Provik, a quietly brilliant schemer who has devoted his life to destroying Guardships. This book was so bad. A more sensible person than I would have given up after the first hundred pages, but I was too stubborn. The stubbornness did not pay off. Sometimes I pick up an older SFF novel and I'm reminded of how much the quality bar in the field has been raised over the past twenty years. It's not entirely fair to treat The Dragon Never Sleeps as typical of 1980s science fiction: Cook is far better known for his Black Company military fantasy series, this is one of his minor novels, and it's only been intermittently in print. But I'm dubious this would have been published at all today. First, the writing is awful. It's choppy, cliched, awkward, and has no rhythm or sense of beauty. Here's a nearly random paragraph near the beginning of the book as a sample:
He hurled thunders and lightnings with renewed fury. The whole damned universe was out to frustrate him. XII Fulminata! What the hell? Was some malign force ranged against him? That was his most secret fear. That somehow someone or something was using him the way he used so many others.
(Yes, this is one of the main characters throwing a temper tantrum with a special effects machine.) In a book of 450 pages, there are 151 chapters, and most chapters switch viewpoint characters. Most of them also end with a question or some vaguely foreboding sentence to try to build tension, and while I'm willing to admit that sometimes works when used sparingly, every three pages is not sparingly. This book is also weirdly empty of description for its size. We get a few set pieces, a few battles, and a sentence or two of physical description of most characters when they're first introduced, but it's astonishing how little of a mental image I had of this universe after reading the whole book. Cook probably describes a Guardship at some point in this book, but if he does, it completely failed to stick in my memory. There are aliens that everyone recognizes as aliens, so presumably they look different than humans, but for most of them I have no idea how. Very belatedly we're told one important species (which never gets a name) has a distinctive smell. That's about it. Instead, nearly the whole book is dialogue and scheming. It's clear that Cook is intending to write a story of schemes and counter-schemes and jousting between brilliant characters. This can work if the dialogue is sufficiently sharp and snappy to carry the story. It is not.
"What mischief have you been up to, Kez Maefele?" "Staying alive in a hostile universe." "You've had more than your share of luck." "Perhaps luck had nothing to do with it, WarAvocat. Till now." "Luck has run out. The Ku Question has run its course. The symbol is about to receive its final blow."
There are hundreds of pages of this sort of thing. The setting is at least intriguing, if not stunningly original. There are immortal warships oppressing human space, mysterious Outsiders, great house politics, and an essentially immortal alien warrior who ends up carrying most of the story. That's material for a good space opera if the reader slowly learns the shape of the universe, its history, and its landmarks and political factions. Or the author can decline to explain any of that. I suppose that's also a choice. Here are some things that you may have been curious about after reading my summary, and which I'm still curious about after having finished the book: What laws do the Guardships impose and what's the philosophy behind those laws? How does the economic system work? Who built the Guardships originally, and how? How do the humans outside of Canon Space live? Who are the Ku? Why did they start fighting the humans? How many other aliens are there? What do they think of this? How does the Canon government work? How have the Guardships remained technologically superior for four thousand years? Even where the reader gets a partial explanation, such as what Web is and how it was built, it's an unimportant aside that's largely devoid of a sense of wonder. The one piece of world-building that this book is interested in is the individual Guardships and the different ways in which they've handled millennia of self-contained patrol, and even there we only get to see a few of them. There is a plot with appropriately epic scope, but even that is undermined by the odd pacing. Five, ten, or fifty years sometimes goes by in a sentence. A war starts, with apparently enormous implications for Canon Space, and then we learn that it continues for years without warranting narrative comment. This is done without transitions and without signposts for the reader; it's just another sentence in the narration, mixed in with the rhetorical questions and clumsy foreshadowing. I would like to tell you that at least the book has a satisfying ending that resolves the plot conflict that it finally reveals to the reader, but I had a hard time understanding why the ending even mattered. The plot was so difficult to follow that I'm sure I missed something, but it's not difficult to follow in the fun way that would make me want to re-read it. It's difficult to follow because Cook doesn't seem able to explain the plot in his head to the reader in any coherent form. I think the status quo was slightly disrupted? Maybe? Also, I no longer care. Oh, and there's a gene-engineered sex slave in this book, who various male characters are very protective and possessive of, who never develops much of a personality, and who has no noticeable impact on the plot despite being a major character. Yay. This was one of the worst books I've read in a long time. In retrospect, it was an awful place to start with Glen Cook. Hopefully his better-known works are also better-written, but I can't say I feel that inspired to find out. Rating: 2 out of 10

10 September 2022

Joachim Breitner: rec-def: Behind the scenes

A week ago I wrote about the rec-def Haskell library, which allows you to write more recursive definitions, such as in this small example:
let s1 = RS.insert 23 s2
    s2 = RS.insert 42 s1
in RS.get s1
This will not loop (as it would if you d just used Data.Set), but rather correctly return the set S.fromList [23,42]. See the previous blog post for more examples and discussion of the user-facing side of this. For quick reference, these are the types of the functions involved here: The type of s1 and s2 above is not Set Int, but rather RSet Int, and in this post I ll explain how RSet works internally.

Propagators, in general The conceptual model behind an recursive equation like above is
  • There are a multiple cells that can hold values of an underlying type (here Set)
  • These cells have relations that explain how the values in the cells should relate to each other
  • After registering all the relations, some form of solving happens.
  • If the solving succeeds, we can read off the values from the cells, and they should satisfy the registered relation.
This is sometimes called a propagator network, and is a quite general model that can support different kind of relations (e.g. equalities, inequalities, functions), there can be various solving strategies (iterative fixed-points, algebraic solution, unification, etc.) and information can flow on along the edges (and hyper-edges) possibly in multiple directions. For our purposes, we only care about propagator networks where all relations are functional, so they have a single output cell that is declared to be a function of multiple (possibly zero) input cells, without affecting these input cells. Furthermore, every cell is the output of exactly one such relation.

IO-infested propagator interfaces This suggests that an implementation of such a propagator network could provide an interface with the following three operations:
  • Functions to declare cells
  • Functions to declare relations
  • Functions to read values off cells
This is clearly an imperative interface, so we ll see monads, and we ll simply use IO. So concretely for our small example above, we might expect
There is no need for an explicit solve function: solving can happen when declareInsert or getCell is called as a User I do not care about that. You might be curious about the implementation of newCell, declareInsert and getCell, but I have to disappoint you: This is not the topic of this article. Instead, I want to discuss how to turn this IO-infested interface into the pure interface seen above?

Pure, but too strict Obviously, we have to get rid of the IO somehow, and have to use unsafePerformIO :: IO a -> a somehow. This dangerous function creates a pure-looking value that, when used the first time, will run the IO-action and turn into that action s result. So maybe we can simply write the following: Indeed, the types line up, but if we try to use that code, nothing will happen. Our insert is too strict to be used recursively: It requires the value of c2 (as it is passed to declareInsert, which we assume to be strict in its arguments) before it can return c1, so the recursive example at the top of this post will not make any progress.

Pure, lazy, but forgetful To work around this, maybe it suffices if we do not run declareInsert right away, but just remember that we have to do it eventually? So let s introduce a new data type for RSet a that contains not just the cell (Cell a), but also an action that we still have to run before getting a value: This is better: insert is now lazy in its arguments (for this it is crucial to pattern-match on RSet only inside the todo code, not in the pattern of insert!) This means that our recursive code above does not get stuck right away.

Pure, lazy, but runs in circles But it is still pretty bad: Note that we do not run get s2 in the example above, so that cell s todo, which would declareInsert 42, will never run. This cannot work! We have to (eventually) run the declaration code from all involved cells before we can use getCell! We can try to run the todo action of all the dependencies as part of a cell s todo action: Now we certainly won t forget to run the second cell s todo action, so that is good. But that cell s todo action will run the first cell s todo action, and that again the second cell s, and so on.

Pure, lazy, terminating, but not thread safe This is silly: We only need (and should!) run that code once! So let s keep track of whether we ran it already: Ah, much better: It works! Our call to get c1 will trigger the first cell s todo action, which will mark it as done before calling the second cell s todo action. When that now invokes the first cell s todo action, it is already marked done and we break the cycle, and by the time we reach getCell, all relations have been correctly registered. In a single-threaded world, this would be all good and fine, but we have to worry about multiple threads running get concurrently, on the same or on different cells. In fact, because we use unsafePerformIO, we have to worry about this even when the program is not using threads. And the above code has problems. Imagine a second call to get c1 while the first one has already marked it as done, but has not finished processing all the dependencies yet: It will call getCell before all relations are registered, which is bad.

Recursive do-once IO actions Making this thread-safe seems to be possible, but would clutter both the code and this blog post. So let s hide that problem behind a nice and clean interface. Maybe there will be a separate blog post about its implementation (let me know if you are curious), or you can inspect the code in System.IO.RecThunk module yourself). The interface is simply
data Thunk
thunk :: IO [Thunk] -> IO Thunk
force :: Thunk -> IO ()
and the idea is that thunk act will defer the action act until the thunk is passed to force for the first time, and force will not return until the action has been performed (possibly waiting if another thread is doing that at the moment), and also until the actions of all the thunks returned by act have performed, recursively, without running into cycles. We can use this in our definition of RSet and get to the final, working solution: This snippet captures the essential ideas behind rec-def:
  • Use laziness to allow recursive definition to describe the propagator graph naturally
  • Use a form of explicit thunk to register the propagator graph relations at the right time (not too early/strict, not too late)

And that s all? The actual implementation in rec-def has a few more moving parts. In particular, it tries to support different value types (not just sets), possibly with different implementations, and even mixing them (e.g. in member :: Ord a => a -> RSet a -> RBool), so the generic code is in Data.Propagator.Purify, and supports various propagators underneath. The type RSet is then just a newtype around that, defined in Data.Recursive.Internal to maintain the safety of the abstraction, I went back and forth on a few variants of the design here, including one where there was a generic R type constructor (R (Set a), R Bool etc.), but then monomorphic interface seems simpler.

Does it really work? The big remaining question is certainly: Is this really safe and pure? Does it still behave like Haskell? The answer to these questions certainly depends on the underlying propagator implementation. But it also depends on what we actually mean by safe and pure ? For example, do we expect the Static Argument Transformation be semantics preserving? Or is it allowed to turn undefined values into defined ones (as it does here)? I am unsure myself yet, so I ll defer this discussion to a separate blog post, after I hopefully had good discussions about this here at ICFP 2022 in Ljubljana. If you are around and want to discuss, please hit me up!

Next.