12 January 2025

Bastian Venthur: Investigating the popularity of Python build backends over time (II)

Last year, I analyzed the popularity of build backends used in pyproject.toml files over time. This post is the update for 2024.

Analysis

Like last year, I'm using Tom Forbes' fantastic dataset containing information about every file within every release uploaded to PyPI. To get the current dataset, I followed the same process as in last year's analysis, so I won't repeat all the details here. Instead, I'll highlight the main steps: downloading all the parquet files took roughly a week due to GitHub's rate limiting. Tom suggested leveraging the Git v2 protocol to fetch the data directly. This approach could bypass rate limiting and complete the download of all pyproject.toml files in just 20 minutes(!). However, I couldn't find sufficient documentation that would help me to implement this method, so this will have to wait until next year's analysis. Once all the data is downloaded, I perform some preprocessing:

Results

I modified the plots a bit from last year to make them easier to read. Most notably, I binned the data into quarters to make the plots less noisy, and secondly, I stopped stacking the relative distribution plots to make the percentages directly readable. The first plot shows the absolute number of uploads (in thousands) by quarter and build backend.

Absolute distribution of build backends by quarter

The second plot shows the relative distribution of build backends by quarter.

Relative distribution of build backends by quarter

In 2024, we observe that:

The script for downloading and analyzing the data is available in my GitHub repository. If someone has insights or examples on implementing the Git v2 protocol to download the pyproject.toml file given the repository URL and its hash, I'd love to hear from you!

10 January 2025

Dirk Eddelbuettel: nanotime 0.3.11 on CRAN: Polish

Another minor update 0.3.11 for our nanotime package is now on CRAN. nanotime relies on the RcppCCTZ package (as well as the RcppDate package for additional C++ operations) and offers efficient high(er) resolution time parsing and formatting up to nanosecond resolution, using the bit64 package for the actual integer64 arithmetic. Initially implemented using the S3 system, it has benefitted greatly from a rigorous refactoring by Leonardo who not only rejigged nanotime internals in S4 but also added new S4 types for periods, intervals and durations. This release covers two corner cases. Michael sent in a PR avoiding a clang warning on complex types. We fixed an issue that surfaced in a downstream package under sanitizer checks: R extends coverage of NA to types such as integer or character, which need special treatment in non-R library code as they do not know NA. We used to flag (character) formatted values only after we had called the corresponding CCTZ function, but that leaves potentially undefined values (from R's NA values for int, say, cast to double). So now we flag them, set a transient safe value for the call, and inject the (character) representation "NA" after the call in those spots. The end result is the same, but without a possible slap on the wrist from sanitizer checks. The NEWS snippet below has the full details.

Changes in version 0.3.11 (2025-01-10)
  • Explicit Rcomplex assignment accommodates pickier compilers over newer R struct (Michael Chirico in #135 fixing #134)
  • When formatting, NA values are flagged before the CCTZ call to not trigger the sanitizer, and set to NA after the call (Dirk in #136)

Thanks to my CRANberries, there is a diffstat report for this release. More details and examples are at the nanotime page; code, issue tickets etc at the GitHub repository and all documentation is provided at the nanotime documentation site.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

9 January 2025

Reproducible Builds: Reproducible Builds in December 2024

Welcome to the December 2024 report from the Reproducible Builds project! Our monthly reports outline what we ve been up to over the past month and highlight items of news from elsewhere in the world of software supply-chain security when relevant. As ever, however, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. Table of contents:
  1. reproduce.debian.net
  2. debian-repro-status
  3. On our mailing list
  4. Enhancing the Security of Software Supply Chains
  5. diffoscope
  6. Supply-chain attack in the Solana ecosystem
  7. Website updates
  8. Debian changes
  9. Other development news
  10. Upstream patches
  11. Reproducibility testing framework

reproduce.debian.net Last month saw the introduction of reproduce.debian.net. Announced at the recent Debian MiniDebConf in Toulouse, reproduce.debian.net is an instance of rebuilderd operated by the Reproducible Builds project. rebuilderd is our server designed to monitor the official package repositories of Linux distributions and attempt to reproduce the observed results there. This month, however, we are pleased to announce that not only does the service now produce graphs, but the reproduce.debian.net homepage itself has become a start page of sorts, and the amd64.reproduce.debian.net and i386.reproduce.debian.net pages have emerged. The first of these rebuilds the amd64 architecture, naturally, but it is also building Debian packages that are marked with the "no architecture" label, all. The second builder is, however, only rebuilding the i386 architecture. Both of these services were also switched to reproduce the Debian trixie distribution instead of unstable, which started with 43% of the archive rebuilt, with 79.3% reproduced successfully. This is very much a work in progress, and we'll start reproducing Debian unstable soon. Our i386 hosts are very kindly sponsored by Infomaniak whilst the amd64 node is sponsored by OSUOSL; thank you! Indeed, we are looking for more workers for more Debian architectures; please contact us if you are able to help.

debian-repro-status Reproducible builds developer kpcyrd has published a client program for reproduce.debian.net (see above) that queries the status of the locally installed packages and rates the system with a percentage score. This tool works analogously to arch-repro-status for the Arch Linux Reproducible Builds setup. The tool was packaged for Debian and is currently available in Debian trixie: it can be installed with apt install debian-repro-status.

On our mailing list On our mailing list this month:
  • Bernhard M. Wiedemann wrote a detailed post on his long journey towards a bit-reproducible Emacs package. In his interesting message, Bernhard goes into depth about the tools that they used and the lower-level technical details of, for instance, compatibility with the version of glibc within openSUSE.
  • Shivanand Kunijadar posed a question pertaining to the reproducibility issues with encrypted images. Shivanand explains that they must use a random IV for encryption with AES CBC. The resulting artifact is not reproducible due to the random IV used. The message resulted in a handful of replies, hopefully helpful!
  • User Danilo posted an interesting question related to their attempts in trying to achieve reproducible builds for Threema Desktop 2.0. The question resulted in a number of replies attempting to find the right combination of compiler and linker flags (for example).
  • Longstanding contributor David A. Wheeler wrote to our list announcing the release of the Census III of Free and Open Source Software: Application Libraries report written by Frank Nagle, Kate Powell, Richie Zitomer and David himself. As David writes in his message, the report attempts to answer the question "what is the most popular Free and Open Source Software (FOSS)?".
  • Lastly, kpcyrd followed-up to a post from September 2024 which mentioned their desire for someone to implement "a hashset of allowed module hashes that is generated during the kernel build and then embedded in the kernel image", thus enabling a deterministic and reproducible build. However, they are now reporting that "somebody implemented the hash-based allow list feature and submitted it to the Linux kernel mailing list". Like kpcyrd, we hope it gets merged.

Enhancing the Security of Software Supply Chains: Methods and Practices Mehdi Keshani of the Delft University of Technology in the Netherlands has published their thesis on "Enhancing the Security of Software Supply Chains: Methods and Practices". Their introductory summary first begins with an outline of software supply chains and the importance of the Maven ecosystem before outlining the issues that it faces "that threaten its security and effectiveness". To address these:
First, we propose an automated approach for library reproducibility to enhance library security during the deployment phase. We then develop a scalable call graph generation technique to support various use cases, such as method-level vulnerability analysis and change impact analysis, which help mitigate security challenges within the ecosystem. Utilizing the generated call graphs, we explore the impact of libraries on their users. Finally, through empirical research and mining techniques, we investigate the current state of the Maven ecosystem, identify harmful practices, and propose recommendations to address them.
A PDF of Mehdi's entire thesis is available to download.

diffoscope diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 283 and 284 to Debian:
  • Update copyright years. [ ]
  • Update tests to support file 5.46. [ ][ ]
  • Simplify tests_quines.py::test_{differences,differences_deb} to simply use assert_diff and not mangle the test fixture. [ ]

Supply-chain attack in the Solana ecosystem A significant supply-chain attack impacted Solana, an ecosystem for decentralised applications running on a blockchain. Hackers targeted the @solana/web3.js JavaScript library and embedded malicious code that extracted private keys and drained funds from cryptocurrency wallets. According to some reports, about $160,000 worth of assets were stolen, not including SOL tokens and other crypto assets.

Website updates Similar to last month, there was a large number of changes made to our website this month, including:
  • Chris Lamb:
    • Make the landing page hero look nicer when the vertical height component of the viewport is restricted, not just the horizontal width.
    • Rename the "Buy-in" page to "Why Reproducible Builds?" [ ]
    • Removing the top black border. [ ][ ]
  • Holger Levsen:
  • hulkoba:
    • Remove the sidebar-type layout and move to a static navigation element. [ ][ ][ ][ ]
    • Create and merge a new Success stories page, which "highlights the success stories of Reproducible Builds, showcasing real-world examples of projects shipping with verifiable, reproducible builds. These stories aim to enhance the technical resilience of the initiative by encouraging community involvement and inspiring new contributions." [ ]
    • Further changes to the homepage. [ ]
    • Remove the translation icon from the navigation bar. [ ]
    • Remove unused CSS styles pertaining to the sidebar. [ ]
    • Add sponsors to the global footer. [ ]
    • Add extra space on large screens on the Who page. [ ]
    • Hide the side navigation on small screens on the Documentation pages. [ ]

Debian changes There were a significant number of reproducibility-related changes within Debian this month, including:
  • Santiago Vila uploaded version 0.11+nmu4 of the dh-buildinfo package. In this release, dh_buildinfo becomes a no-op, i.e. it no longer does anything beyond warning the developer that the dh-buildinfo package is now obsolete. In his upload, Santiago wrote that: "We still want packages to drop their [dependency] on dh-buildinfo, but now they will immediately benefit from this change after a simple rebuild."
  • Holger Levsen filed Debian bug #1091550 requesting a rebuild of a number of packages that were built with a very old version of dpkg.
  • Fay Stegerman contributed to an extensive thread on the debian-devel development mailing list on the topic of "Supporting alternative zlib implementations". In particular, Fay wrote about her results experimenting with whether zlib-ng produces identical results or not.
  • kpcyrd uploaded a new rust-rebuilderd-worker, rust-derp, rust-in-toto and debian-repro-status to Debian, which passed successfully through the so-called NEW queue.
  • Gioele Barabucci filed a number of bugs against the debrebuild component/script of the devscripts package, including:
    • #1089087: Address a spurious extra subdirectory in the build path.
    • #1089201: Extra zero bytes added to .dynstr when rebuilding CMake projects.
    • #1089088: Some binNMUs have a 1-second offset in some timestamps.
  • Gioele Barabucci also filed a bug against the dh-r package to report that the Recommends and Suggests fields are missing from rebuilt R packages. At the time of writing, this bug has no patch and needs some help to make over 350 binary packages reproducible.
  • Lastly, 8 reviews of Debian packages were added, 11 were updated and 11 were removed this month adding to our knowledge about identified issues.

Other development news In other ecosystem and distribution news:
  • Lastly, in openSUSE, Bernhard M. Wiedemann published another report for the distribution. There, Bernhard reports about the success of building "R-B-OS", a partial fork of openSUSE with only 100% bit-reproducible packages. This effort was sponsored by the NLNet NGI0 initiative.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In November, a number of changes were made by Holger Levsen, including:
  • reproduce.debian.net-related:
    • Add a new i386.reproduce.debian.net rebuilder. [ ][ ][ ][ ][ ][ ]
    • Make a number of updates to the documentation. [ ][ ][ ][ ][ ]
    • Run i386.reproduce.debian.net on a public port to allow external workers. [ ]
    • Add a link to the /api/v0/pkgs/list endpoint. [ ]
    • Add support for a statistics page. [ ][ ][ ][ ][ ][ ]
    • Limit build logs to 20 MiB and diffoscope output to 10 MiB. [ ]
    • Improve the frontpage. [ ][ ]
    • Explain that we're testing arch:any and arch:all on the amd64 architecture, but only arch:any on i386. [ ]
  • Misc:
    • Remove code for testing Arch Linux, which has moved to reproduce.archlinux.org. [ ][ ]
    • Don't install dstat on Jenkins nodes anymore as it's been removed from Debian trixie. [ ]
    • Prepare the infom08-i386 node to become another rebuilder. [ ]
    • Add debug date output for benchmarking the reproducible_pool_buildinfos.sh script. [ ]
    • Install installation-birthday everywhere. [ ]
    • Temporarily disable automatic updates of pool links on buildinfos.debian.net. [ ]
    • Install Recommends by default on Jenkins nodes. [ ]
    • Rename rebuilder_stats.py to rebuilderd_stats.py. [ ]
    • r.d.n/stats: minor formatting changes. [ ]
    • Install files under /etc/cron.d/ with the correct permissions. [ ]
and Jochen Sprickerhof also made a number of changes. Lastly, Gioele Barabucci classified packages affected by the 1-second offset issue filed as Debian bug #1089088 [ ][ ][ ][ ], Chris Hofstaedtler updated the URL for Grml's dpkg.selections file [ ], Roland Clobus updated the Jenkins log parser to parse warnings from diffoscope [ ] and Mattia Rizzolo banned a number of bots and crawlers from the service [ ][ ].
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

Valhalla's Things: Poor Man Media Server

Posted on January 9, 2025
Tags: madeof:bits
Some time ago I installed minidlna on our media server: it was pretty easy to do, but quite limited in its support for the formats I use most, so I ended up using other solutions such as mounting the directory with sshfs. Now, doing that from a phone, even a pinephone running debian, may not be as convenient as doing it from the laptop where I already have my ssh key :D and I needed to listen to music from the pinephone. So, in anger, I decided to configure a web server to serve the files. I installed lighttpd because I already had a role for this kind of configuration in my ansible directory, and configured it to serve the relevant directory in /etc/lighttpd/conf-available/20-music.conf:
$HTTP["host"] =~ "music.example.org"  
    server.name          = "music.example.org"
    server.document-root = "/path/to/music"
 
the domain was already configured in my local dns (since everything is only available to the local network), and I enabled both 20-music.conf and 10-dir-listing.conf. And. That's it. It works. I can play my CD rips on a single flac exactly in the same way as I was used to (by ssh-ing to the media server and using alsaplayer). Then this evening I was talking to normal people1, and they mentioned that they wouldn't mind being able to skip tracks and fancy things like those :D and I've found one possible improvement. For the directories with the generated single-track ogg files I've added some playlists with the command ls *.ogg > playlist.m3u, then in the directory above I've run ls */*.m3u > playlist.m3u and that also works. With vlc I can now open http://music.example.org/band/album/playlist.m3u to listen to an album that I have in ogg, being able to move between tracks, or I can open http://music.example.org/band/playlist.m3u and in the playlist view I can browse between the different albums. Left as an exercise to the reader2 are writing a bash script to generate all of the playlist.m3u files (and running it via some git hook when the files change) or writing a php script to generate them on the fly.
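A minimal sketch of the first half of that exercise, assuming the band/album directory layout described above (untested against anything else, so treat it as a starting point rather than a finished script):

#!/bin/sh
# Regenerate playlist.m3u files for a band/album music tree.
# Run from the root of the directory served by lighttpd.

# One playlist per album directory, listing its single-track ogg files.
for album in */*/; do
    ( cd "$album" && ls *.ogg > playlist.m3u ) 2>/dev/null
done

# One playlist per band directory, pointing at the album playlists below it.
for band in */; do
    ( cd "$band" && ls */*.m3u > playlist.m3u ) 2>/dev/null
done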
Update 2025-01-10: another reader3 wrote the php script and has authorized me to post it here.
<?php
define("MUSIC_FOLDER", __DIR__);
define("ID3v2", false);


function dd() {
    echo "<pre>"; call_user_func_array("var_dump", func_get_args());
    die();
}


function getinfo($file) {
    $cmd = 'id3info "' . MUSIC_FOLDER . "/" . $file . '"';
    exec($cmd, $output);
    $res = [];
    foreach($output as $line) {
        if (str_starts_with($line, "=== ")) {
            $key = explode(" ", $line)[1];
            $val = end(explode(": ", $line, 2));
            $res[$key] = $val;
        }
    }
    if (isset($res['TPE1']) || isset($res['TIT2']))
        echo "#EXTINF: , " . ($res['TPE1'] ?? "Unk") . " - " . ($res['TIT2'] ?? "Untl") . "\r\n";
    if (isset($res['TALB']))
        echo "#EXTALB: " . $res['TALB'] . "\r\n";
}


function pathencode($path, $name) {
    $path = urlencode($path);
    $path = str_replace("%2F", "/", $path);
    $name = urlencode($name);
    if ($path != "") $path = "/" . $path;
    return $path . "/" . $name;
}

function serve_playlist($path) {
    echo "#EXTM3U";
    echo "# PATH: $path\n\r";
    foreach (glob(MUSIC_FOLDER . "/$path/*") as $filename) {
        $name = basename($filename);
        if (is_dir($filename)) {
            echo pathencode($path, $name) . ".m3u\r\n";
        }
        $t = explode(".", $filename);
        $ext = array_pop($t);
        if (in_array($ext, ["mp3", "ogg", "flac", "mp4", "m4a"])) {
            if (ID3v2) {
                getinfo($path . "/" . $name);
            } else {
                echo "#EXTINF: , " . $path . "/" . $name . "\r\n";
            }
            echo pathencode($path, $name) . "\r\n";
        }
    }
    die();
}


$path = $_SERVER["REQUEST_URI"];
$path = urldecode($path);
$path = trim($path, "/");

if (str_ends_with($path, ".m3u")) {
    $path = str_replace(".m3u", "", $path);

    serve_playlist($path);
}

$path = MUSIC_FOLDER . "/" . $path;
if (file_exists($path) && is_file($path)) {
    header('Content-Description: File Transfer');
    header('Content-Type: application/octet-stream');
    header('Expires: 0');
    header('Cache-Control: must-revalidate');
    header('Pragma: public');
    header('Content-Length: ' . filesize($path));
    readfile($path);
}
It's php, so I assume no responsibility for it :D

  1. as much as the members of our LUG can be considered normal.
  2. i.e. the person in the LUG who wanted me to share what I had done.
  3. i.e. the other person in the LUG who was in that conversation and suggested the php script option.

7 January 2025

Jonathan Wiltshire: Using TPM for Automatic Disk Decryption in Debian 12

These days it's straightforward to have reasonably secure, automatic decryption of your root filesystem at boot time on Debian 12. Here's how I did it on an existing system which already had a stock kernel, secure boot enabled, grub2 and an encrypted root filesystem with the passphrase in key slot 0. There's no need to switch to systemd-boot for this setup but you will use systemd-cryptenroll to manage the TPM-sealed key. If that offends you, there are other ways of doing this.

Caveat The parameters I'll seal a key against in the TPM include a hash of the initial ramdisk. This is essential to prevent an attacker from swapping the image for one which discloses the key. However, it also means the key has to be re-sealed every time the image is rebuilt. This can be frequent, for example when installing/upgrading/removing packages which include a kernel module. You won't get locked out (as long as you still have a passphrase in another slot), but will need to re-seal the key to restore the automation. You can also choose not to include this parameter for the seal, but that opens the door to such an attack.

Caution: these are the steps I took on my own system. You may need to adjust them to avoid ending up with a non-booting system.

Check for a usable TPM device We'll bind the secure boot state, kernel parameters, and other boot measurements to a decryption key. Then, we'll seal it using the TPM. This prevents the disk being moved to another system, the boot chain being tampered with and various other attacks.
# apt install tpm2-tools
# systemd-cryptenroll --tpm2-device list
PATH        DEVICE     DRIVER 
/dev/tpmrm0 STM0125:00 tpm_tis

Clean up older kernels including leftover configurations I found that previously-removed (but not purged) kernel packages sometimes cause dracut to try installing files to the wrong paths. Identify them with:
# apt install aptitude
# aptitude search '~c'
Change search to purge, or be more selective; this part is an exercise for the reader.
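For instance, if you are happy to purge every package left in that removed-but-not-purged state in one go, something like the following should work (a sketch; review the search output first):
# aptitude purge '~c'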

Switch to dracut for initramfs images Unless you have a particular requirement for the default initramfs-tools, replace it with dracut and customise:
# mkdir /etc/dracut.conf.d
# echo 'add_dracutmodules+=" tpm2-tss crypt "' > /etc/dracut.conf.d/crypt.conf
# apt install dracut

Remove root device from crypttab, configure grub Remove (or comment) the root device from /etc/crypttab and rebuild the initial ramdisk with dracut -f. Edit /etc/default/grub and add rd.auto rd.luks=1 to GRUB_CMDLINE_LINUX. Re-generate the config with update-grub. At this point it's a good idea to sanity-check the initrd contents with lsinitrd. Then, reboot using the new image to ensure there are no issues. This will also have up-to-date TPM measurements ready for the next step.
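Collected as commands, that step might look roughly like this sketch (the editor call is only a placeholder for editing the file by hand):
# dracut -f                  # rebuild the initial ramdisk without the root device in crypttab
# editor /etc/default/grub   # add rd.auto rd.luks=1 to GRUB_CMDLINE_LINUX
# update-grub                # regenerate the grub configuration
# lsinitrd | less            # sanity-check the initrd contents
# reboot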

Identify device and seal a decryption key
# lsblk -ip -o NAME,TYPE,MOUNTPOINTS
NAME                                                    TYPE  MOUNTPOINTS
/dev/nvme0n1p4                                          part  /boot
/dev/nvme0n1p5                                          part  
`-/dev/mapper/luks-deff56a9-8f00-4337-b34a-0dcda772e326 crypt 
  |-/dev/mapper/lv-var                                  lvm   /var
  |-/dev/mapper/lv-root                                 lvm   /
  `-/dev/mapper/lv-home                                 lvm   /home
In this example my root filesystem is in a container on /dev/nvme0n1p5. The existing passphrase key is in slot 0.
# systemd-cryptenroll --tpm2-device=auto --tpm2-pcrs=7+8+9+14 /dev/nvme0n1p5
Please enter current passphrase for disk /dev/nvme0n1p5: ********
New TPM2 token enrolled as key slot 1.
The PCRs I chose (7, 8, 9 and 14) correspond to the secure boot policy, kernel command line (to prevent init=/bin/bash-style attacks), files read by grub including that crucial initrd measurement, and secure boot MOK certificates and hashes. You could also include PCR 5 for the partition table state, and any others appropriate for your setup.

Reboot You should now be able to reboot and the root device will be unlocked automatically, provided the secure boot measurements remain consistent. The key slot protected by a passphrase (mine is slot 0) is now your recovery key. Do not remove it!
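When the initial ramdisk is rebuilt later (see the caveat above), the sealed key no longer matches the new measurements and has to be re-sealed. A sketch of what that could look like with systemd-cryptenroll, assuming the same device and PCR selection as above (expect to be prompted for the passphrase in your remaining key slot):
# systemd-cryptenroll --wipe-slot=tpm2 --tpm2-device=auto --tpm2-pcrs=7+8+9+14 /dev/nvme0n1p5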
Please consider supporting my work in Debian and elsewhere through Liberapay.

5 January 2025

Dominique Dumont: cme: new field in fill.copyright.blanks.yml for Debian copyright file

Hi The file fill.copyright.blanks.yml is used to fill missing copyright information when running cme update dpkg-copyright. This file can contain a comment field that is used for book-keeping. Here's an example from libuv1:
README.md:
  comment: |-
    the license from this file is used as a main license and tends to
    apply expat or CC to all files. Which is wrong. Let's skip this file
    and let cme retrieve data from files.
  skip: true
You may ask: why not use YAML comments? The problem is that YAML comments are dropped by cme edit dpkg. So you should not use them in fill.copyright.blanks.yml. It occurred to me that it may be interesting to copy the content of this comment into debian/copyright file entries. But not in all cases, as some comments make sense in fill.copyright.blanks.yml but not in debian/copyright. So I've added a new forwarded-comment parameter in fill.copyright.blanks.yml. The content of this field is copied verbatim into debian/copyright. This way, you can add comments for book-keeping and comments for debian/copyright entries. For instance:
pan/gui/*:
  forwarded-comment: some comment about gui
  comment: this is an example from cme test files
yields:
Files: pan/gui/*
Copyright: 1989, 1991, Free Software Foundation, Inc.
License: GPL-2
Comment: some comment about gui
This new functionality is available in libconfig-model-dpkg-perl >= 3.008. All the best

2 January 2025

Matthew Garrett: The GPU, not the TPM, is the root of hardware DRM

As part of their "Defective by Design" anti-DRM campaign, the FSF recently made the following claim:
Today, most of the major streaming media platforms utilize the TPM to decrypt media streams, forcefully placing the decryption out of the user's control (from here).
This is part of an overall argument that Microsoft's insistence that only hardware with a TPM can run Windows 11 is with the goal of aiding streaming companies in their attempt to ensure media can only be played in tightly constrained environments.

I'm going to be honest here and say that I don't know what Microsoft's actual motivation for requiring a TPM in Windows 11 is. I've been talking about TPM stuff for a long time. My job involves writing a lot of TPM code. I think having a TPM enables a number of worthwhile security features. Given the choice, I'd certainly pick a computer with a TPM. But in terms of whether it's of sufficient value to lock out Windows 11 on hardware with no TPM that would otherwise be able to run it? I'm not sure that's a worthwhile tradeoff.

What I can say is that the FSF's claim is just 100% wrong, and since this seems to be the sole basis of their overall claim about Microsoft's strategy here, the argument is pretty significantly undermined. I'm not aware of any streaming media platforms making use of TPMs in any way whatsoever. There is hardware DRM that the media companies use to restrict users, but it's not in the TPM - it's in the GPU.

Let's back up for a moment. There's multiple different DRM implementations, but the big three are Widevine (owned by Google, used on Android, Chromebooks, and some other embedded devices), Fairplay (Apple implementation, used for Mac and iOS), and Playready (Microsoft's implementation, used in Windows and some other hardware streaming devices and TVs). These generally implement several levels of functionality, depending on the capabilities of the device they're running on - this will range from all the DRM functionality being implemented in software up to the hardware path that will be discussed shortly. Streaming providers can choose what level of functionality and quality to provide based on the level implemented on the client device, and it's common for 4K and HDR content to be tied to hardware DRM. In any scenario, they stream encrypted content to the client and the DRM stack decrypts it before the compressed data can be decoded and played.

The "problem" with software DRM implementations is that the decrypted material is going to exist somewhere the OS can get at it at some point, making it possible for users to simply grab the decrypted stream, somewhat defeating the entire point. Vendors try to make this difficult by obfuscating their code as much as possible (and in some cases putting some of it in-kernel), but pretty much all software DRM is at least somewhat broken and copies of any new streaming media end up being available via Bittorrent pretty quickly after release. This is why higher quality media tends to be restricted to clients that implement hardware-based DRM.

The implementation of hardware-based DRM varies. On devices in the ARM world this is usually handled by performing the cryptography in a Trusted Execution Environment, or TEE. A TEE is an area where code can be executed without the OS having any insight into it at all, with ARM's TrustZone being an example of this. By putting the DRM code in TrustZone, the cryptography can be performed in RAM that the OS has no access to, making the scraping described earlier impossible. x86 has no well-specified TEE (Intel's SGX is an example, but is no longer implemented in consumer parts), so instead this tends to be handed off to the GPU. The exact details of this implementation are somewhat opaque - of the previously mentioned DRM implementations, only Playready does hardware DRM on x86, and I haven't found any public documentation of what drivers need to expose for this to work.

In any case, as part of the DRM handshake between the client and the streaming platform, encryption keys are negotiated with the key material being stored in the GPU or the TEE, inaccessible from the OS. Once decrypted, the material is decoded (again either on the GPU or in the TEE - even in implementations that use the TEE for the cryptography, the actual media decoding may happen on the GPU) and displayed. One key point is that the decoded video material is still stored in RAM that the OS has no access to, and the GPU composites it onto the outbound video stream (which is why if you take a screenshot of a browser playing a stream using hardware-based DRM you'll just see a black window - as far as the OS can see, there is only a black window there).

Now, TPMs are sometimes referred to as a TEE, and in a way they are. However, they're fixed function - you can't run arbitrary code on the TPM, you only have whatever functionality it provides. But TPMs do have the ability to decrypt data using keys that are tied to the TPM, so isn't this sufficient? Well, no. First, the TPM can't communicate with the GPU. The OS could push encrypted material to it, and it would get plaintext material back. But the entire point of this exercise was to avoid the decrypted version of the stream from ever being visible to the OS, so this would be pointless. And rather more fundamentally, TPMs are slow. I don't think there's a TPM on the market that could decrypt a 1080p stream in realtime, let alone a 4K one.

The FSF's focus on TPMs here is not only technically wrong, it's indicative of a failure to understand what's actually happening in the industry. While the FSF has been focusing on TPMs, GPU vendors have quietly deployed all of this technology without the FSF complaining at all. Microsoft has enthusiastically participated in making hardware DRM on Windows possible, and user freedoms have suffered as a result, but Playready hardware-based DRM works just fine on hardware that doesn't have a TPM and will continue to do so.


28 December 2024

Enrico Zini: Disable spellchecker popup on Android

On Android, there's a spellchecker popup that occasionally appears over the keyboard, getting very annoyingly in the way. See for example this unanswered question with screenshots. It looks like a feature of the keyboard, but it's not, and so I looked and I looked and I could not find how to turn it off. The answer is to look for how to disable the spellchecker in the keyboard section of the android system settings, not in the android keyboard app settings. See for example this answer on stackexchange.

27 December 2024

Wouter Verhelst: Writing an extensible JSON-based DSL with Moose

At work, I've been maintaining a perl script that needs to run a number of steps as part of a release workflow. Initially, that script was very simple, but over time it has grown to do a number of things. And then some of those things did not need to be run all the time. And then we wanted to do this one exceptional thing for this one case. And so on; eventually the script became a big mess of configuration options and unreadable flow, and so I decided that I wanted it to be more configurable. I sat down and spent some time on this, and eventually came up with what I now realize is a domain-specific language (DSL) in JSON, implemented by creating objects in Moose, extensible by writing more object classes. Let me explain how it works. In order to explain, however, I need to explain some perl and Moose basics first. If you already know all that, you can safely skip ahead past the "Preliminaries" section that's next.

Preliminaries

Moose object creation, references. In Moose, creating a class is done something like this:
package Foo;
use v5.40;
use Moose;
has 'attribute' => (
    is  => 'ro',
    isa => 'Str',
    required => 1
);
sub say_something {
    my $self = shift;
    say "Hello there, our attribute is " . $self->attribute;
}
The above is a class that has a single attribute called attribute. To create an object, you use the Moose constructor on the class, and pass it the attributes you want:
use v5.40;
use Foo;
my $foo = Foo->new(attribute => "foo");
$foo->say_something;
(output: Hello there, our attribute is foo) This creates a new object with the attribute attribute set to foo. The attribute accessor is a method generated by Moose, which functions both as a getter and a setter (though in this particular case we made the attribute "ro", meaning read-only, so while it can be set at object creation time it cannot be changed by the setter anymore). So yay, an object. And it has methods, things that we set ourselves. Basic OO, all that. One of the peculiarities of perl is its concept of "lists". Not to be confused with the lists of python -- a concept that is called "arrays" in perl and is somewhat different -- in perl, lists are enumerations of values. They can be used as initializers for arrays or hashes, and they are used as arguments to subroutines. Lists cannot be nested; whenever a hash or array is passed in a list, the list is "flattened", that is, it becomes one big list. This means that the below script is functionally equivalent to the above script that uses our "Foo" object:
use v5.40;
use Foo;
my %args;
$args{attribute} = "foo";
my $foo = Foo->new(%args);
$foo->say_something;
(output: Hello there, our attribute is foo) This creates a hash %args wherein we set the attributes that we want to pass to our constructor. We set one attribute in %args, the one called attribute, and then use %args and rely on list flattening to create the object with the same attribute set (list flattening turns a hash into a list of key-value pairs). Perl also has a concept of "references". These are scalar values that point to other values; the other value can be a hash, a list, or another scalar. There is syntax to create a non-scalar value at assignment time, called anonymous references, which is useful when one wants to remember non-scoped values. By default, references are not flattened, and this is what allows you to create multidimensional values in perl; however, it is possible to request list flattening by dereferencing the reference. The below example, again functionally equivalent to the previous two examples, demonstrates this:
use v5.40;
use Foo;
my $args = {};
$args->{attribute} = "foo";
my $foo = Foo->new(%$args);
$foo->say_something;
(output: Hello there, our attribute is foo) This creates a scalar $args, which is a reference to an anonymous hash. Then, we set the key attribute of that anonymous hash to foo (note the use of the arrow operator here, which is used to indicate that we want to dereference a reference to a hash), and create the object using that reference, requesting hash dereferencing and flattening by using a double sigil, %$. As a side note, objects in perl are references too, hence the fact that we have to use the dereferencing arrow to access the attributes and methods of Moose objects. Moose attributes don't have to be strings or even simple scalars. They can also be references to hashes or arrays, or even other objects:
package Bar;
use v5.40;
use Moose;
extends 'Foo';
has 'hash_attribute' => (
    is => 'ro',
    isa => 'HashRef[Str]',
    predicate => 'has_hash_attribute',
);
has 'object_attribute' => (
    is => 'ro',
    isa => 'Foo',
    predicate => 'has_object_attribute',
);
sub say_something {
    my $self = shift;
    if($self->has_object_attribute) {
        $self->object_attribute->say_something;
    }
    $self->SUPER::say_something unless $self->has_hash_attribute;
    say "We have a hash attribute!"
}
This creates a subclass of Foo called Bar that has a hash attribute called hash_attribute, and an object attribute called object_attribute. Both of them are references; one to a hash, the other to an object. The hash ref is further limited in that it requires that each value in the hash must be a string (this is optional but can occasionally be useful), and the object ref in that it must refer to an object of the class Foo, or any of its subclasses. The predicates used here are extra subroutines that Moose provides if you ask for them, and which allow you to see if an object's attribute has a value or not. The example script would use an object like this:
use v5.40;
use Bar;
my $foo = Foo->new(attribute => "foo");
my $bar = Bar->new(object_attribute => $foo, attribute => "bar");
$bar->say_something;
(output: Hello there, our attribute is foo) This example also shows object inheritance, and methods implemented in child classes. Okay, that's it for perl and Moose basics. On to...

Moose Coercion Moose has a concept of "value coercion". Value coercion allows you to tell Moose that if it sees one thing but expects another, it should convert it using a passed subroutine before assigning the value. That sounds a bit dense without an example, so let me show you how it works. Reimagining the Bar package, we could use coercion to eliminate one object creation step from the creation of a Bar object:
package "Bar";
use v5.40;
use Moose;
use Moose::Util::TypeConstraints;
extends "Foo";
coerce "Foo",
    from "HashRef",
    via { Foo->new(%$_) };
has 'hash_attribute' => (
    is => 'ro',
    isa => 'HashRef',
    predicate => 'has_hash_attribute',
);
has 'object_attribute' => (
    is => 'ro',
    isa => 'Foo',
    coerce => 1,
    predicate => 'has_object_attribute',
);
sub say_something {
    my $self = shift;
    if($self->has_object_attribute) {
        $self->object_attribute->say_something;
    }
    $self->SUPER::say_something unless $self->has_hash_attribute;
    say "We have a hash attribute!"
}
Okay, let's unpack that a bit. First, we add the Moose::Util::TypeConstraints module to our package. This is required to declare coercions. Then, we declare a coercion to tell Moose how to convert a HashRef to a Foo object: by using the Foo constructor on a flattened list created from the hashref that it is given. Then, we update the definition of the object_attribute to say that it should use coercions. This is not the default, because going through the list of coercions to find the right one has a performance penalty, so if the coercion is not requested then we do not do it. This allows us to simplify declarations. With the updated Bar class, we can simplify our example script to this:
use v5.40;
use Bar;
my $bar = Bar->new(attribute => "bar", object_attribute => { attribute => "foo" });
$bar->say_something;
(output: Hello there, our attribute is foo) Here, the coercion kicks in because the value object_attribute, which is supposed to be an object of class Foo, is instead a hash ref. Without the coercion, this would produce an error message saying that the type of the object_attribute attribute is not a Foo object. With the coercion, however, the value that we pass to object_attribute is passed to a Foo constructor using list flattening, and then the resulting Foo object is assigned to the object_attribute attribute. Coercion works for more complicated things, too; for instance, you can use coercion to coerce an array of hashes into an array of objects, by creating a subtype first:
package MyCoercions;
use v5.40;
use Moose;
use Moose::Util::TypeConstraints;
use Foo;
subtype "ArrayOfFoo", as "ArrayRef[Foo]";
subtype "ArrayOfHashes", as "ArrayRef[HashRef]";
coerce "ArrayOfFoo", from "ArrayOfHashes", via   [ map   Foo->create(%$_)   @ $_  ]  ;
Ick. That's a bit more complex. What happens here is that we use the map function to iterate over a list of values. The given list of values is @{$_}, which is perl for "dereference the default value as an array reference, and flatten the list of values in that array reference". So the ArrayRef of HashRefs is dereferenced and flattened, and each HashRef in the ArrayRef is passed to the map function. The map function then takes each hash ref in turn and passes it to the block of code that it is also given. In this case, that block is { Foo->create(%$_) }. In other words, we invoke the create factory method with the flattened hashref as an argument. This returns an object of the correct implementation (assuming our hash ref has a type attribute set), and with all attributes of the object set to the correct value. That value is then returned from the block (this could be made more explicit with a return call, but that is optional, perl defaults a return value to the rvalue of the last expression in a block). The map function then returns a list of all the created objects, which we capture in an anonymous array ref (the [] square brackets), i.e., an ArrayRef of Foo objects, satisfying the Moose requirement of ArrayRef[Foo]. Usually, I tend to put my coercions in a special-purpose package. Although it is not strictly required by Moose, I find that it is useful to do this, because Moose does not allow a coercion to be defined if a coercion for the same type had already been done in a different package. And while it is theoretically possible to make sure you only ever declare a coercion once in your entire codebase, I find that doing so is easier to remember if you put all your coercions in a specific package. Okay, now you understand Moose object coercion! On to...

Dynamic module loading Perl allows loading modules at runtime. In the most simple case, you just use require inside a stringy eval:
my $module = "Foo";
eval "require $module";
This loads "Foo" at runtime. Obviously, the $module string could be a computed value, it does not have to be hardcoded. There are some obvious downsides to doing things this way, mostly in the fact that a computed value can basically be anything and so without proper checks this can quickly become an arbitrary code vulnerability. As such, there are a number of distributions on CPAN to help you with the low-level stuff of figuring out what the possible modules are, and how to load them. For the purposes of my script, I used Module::Pluggable. Its API is fairly simple and straightforward:
package Foo;
use v5.40;
use Moose;
use Module::Pluggable require => 1;
has 'attribute' => (
    is => 'ro',
    isa => 'Str',
);
has 'type' => (
    is => 'ro',
    isa => 'Str',
    required => 1,
);
sub handles_type {
    return 0;
}
sub create {
    my $class = shift;
    my %data = @_;
    foreach my $impl($class->plugins) {
        if($impl->can("handles_type") && $impl->handles_type($data{type})) {
            return $impl->new(%data);
        }
    }
    die "could not find a plugin for type " . $data{type};
}
sub say_something {
    my $self = shift;
    say "Hello there, I am a " . $self->type;
}
The new concept here is the plugins class method, which is added by Module::Pluggable, and which searches perl's library paths for all modules that are in our namespace. The namespace is configurable, but by default it is the name of our module; so in the above example, if there were a package "Foo::Bar" which
  • has a subroutine handles_type
  • that returns a truthy value when passed the value of the type key in a hash that is passed to the create subroutine,
  • then the create subroutine creates a new object with the passed key/value pairs used as attribute initializers.
Let's implement a Foo::Bar package:
package Foo::Bar;
use v5.40;
use Moose;
extends 'Foo';
has 'type' => (
    is => 'ro',
    isa => 'Str',
    required => 1,
);
has 'serves_drinks' => (
    is => 'ro',
    isa => 'Bool',
    default => 0,
);
sub handles_type {
    my $class = shift;
    my $type = shift;
    return $type eq "bar";
}
sub say_something {
    my $self = shift;
    $self->SUPER::say_something;
    say "I serve drinks!" if $self->serves_drinks;
}
We can now indirectly use the Foo::Bar package in our script:
use v5.40;
use Foo;
my $obj = Foo->create(type => "bar", serves_drinks => 1);
$obj->say_something;
output:
Hello there, I am a bar.
I serve drinks!
Okay, now you understand all the bits and pieces that are needed to understand how I created the DSL engine. On to...

Putting it all together We're actually quite close already. The create factory method in the last version of our Foo package allows us to decide at run time which module to instantiate an object of, and to load that module at run time. We can use coercion and list flattening to turn a reference to a hash into an object of the correct type. We haven't looked yet at how to turn a JSON data structure into a hash, but that bit is actually ridiculously trivial:
use JSON::MaybeXS;
my $data = decode_json($json_string);
Tada, now $data is a reference to a deserialized version of the JSON string: if the JSON string contained an object, $data is a hashref; if the JSON string contained an array, $data is an arrayref, etc. So, in other words, to create an extensible JSON-based DSL that is implemented by Moose objects, all we need to do is create a system that
  • takes hash refs to set arguments
  • has factory methods to create objects, which
    • uses Module::Pluggable to find the available object classes, and
    • uses the type attribute to figure out which object class to use to create the object
  • uses coercion to convert hash refs into objects using these factory methods
In practice, we could have a JSON file with the following structure:
{
    "description": "do stuff",
    "actions": [
        {
            "type": "bar",
            "serves_drinks": true
        },
        {
            "type": "bar",
            "serves_drinks": false
        }
    ]
}
... and then we could have a Moose object definition like this:
package MyDSL;
use v5.40;
use Moose;
use MyCoercions;
has "description" => (
    is => 'ro',
    isa => 'Str',
);
has 'actions' => (
    is => 'ro',
    isa => 'ArrayOfFoo',
    coerce => 1,
    required => 1,
);
sub say_something {
    my $self = shift;
    say "Hello there, I am described as " . $self->description . " and I am performing my actions: ";
    foreach my $action(@{$self->actions}) {
        $action->say_something;
    }
}
Now, we can write a script that loads this JSON file and creates a new object using the flattened arguments:
use v5.40;
use MyDSL;
use JSON::MaybeXS;
my $input_file_name = shift;
my $args = do {
    local $/ = undef;
    open my $input_fh, "<", $input_file_name or die "could not open file";
    <$input_fh>;
};
$args = decode_json($args);
my $dsl = MyDSL->new(%$args);
$dsl->say_something
Output:
Hello there, I am described as do stuff and I am performing my actions:
Hello there, I am a bar
I serve drinks!
Hello there, I am a bar
In some more detail, this will:
  • Read the JSON file and deserialize it;
  • Pass the object keys in the JSON file as arguments to a constructor of the MyDSL class;
  • The MyDSL class then uses those arguments to set its attributes, using Moose coercion to convert the "actions" array of hashes into an array of Foo::Bar objects.
  • Perform the say_something method on the MyDSL object
Once this is written, extending the scheme to also support a "quux" type simply requires writing a Foo::Quux class, making sure it has a method handles_type that returns a truthy value when called with quux as the argument, and installing it into the perl library path. This is rather easy to do. It can even be extended deeper, too; if the quux type requires a list of arguments rather than just a single argument, it could itself also have an array attribute with relevant coercions. These coercions could then be used to convert the list of arguments into an array of objects of the correct type, using the same schema as above. The actual DSL is of course somewhat more complex, and also actually does something useful, in contrast to the DSL that we define here which just says things. Creating an object that actually performs some action when required is left as an exercise to the reader.

26 December 2024

Kentaro Hayashi: How to check what matches linux-any?

Usually Architecture: any is recommended in debian/control, except when upstream explicitly doesn't/won't support that architecture. In practical use cases, linux-any is useful to exclude the hurd architectures. (Previously it was also useful to exclude kfreebsd.) Here is a simple script to check whether a specific architecture matches linux-any or not.
2024/12/28: UPDATE I've got feedback that the following command should be used. (Thanks Cyril Brulebois and Guillem Jover)
dpkg-architecture -L -W linux-any
or
dpkg-architecture --match-wildcard linux-any --list-known
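Building on that, a small sketch (not from the original post) of checking a single architecture against the wildcard by searching that list:

#!/bin/sh
# Does the given Debian architecture match linux-any?
# The default value is only an example; pass any architecture name as the first argument.
arch="${1:-hurd-amd64}"
if dpkg-architecture --match-wildcard linux-any --list-known | grep -qx "$arch"; then
    echo "$arch matches linux-any"
else
    echo "$arch does NOT match linux-any"
fi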
NOTE: the following example is wrong, but I keep it as is as a record of what I did wrongly:
#!usr/bin/bash
TARGETS="
amd64
arm64
armel
armhf
i386
mips64el
ppc64el
riscv64
s390x
alpha
hppa
hurd-amd64
hurd-i386
loong64
m68k
powerpc
ppc64
sh4
sparc64
x32
"
for d in $TARGETS; do
    dpkg-architecture -i linux-any -a $d
    if [ $? -eq 0 ]; then
    echo -e "[\e[32m\e[40mOK\e[0m] $d is linux-any (dpkg-architecture -i linux-any -a $d)"
    else
    echo -e "[\e[31m\e[40mNG\e[0m] $d is NOT linux-any (dpkg-architecture -i linux-any -a $d)"
    fi
done
screenshot of shell script

24 December 2024

Russ Allbery: Review: Number Go Up

Review: Number Go Up, by Zeke Faux
Publisher: Crown Currency
Copyright: 2023
Printing: 2024
ISBN: 0-593-44382-9
Format: Kindle
Pages: 373
Number Go Up is a cross between a history and a first-person account of investigative journalism around the cryptocurrency bubble and subsequent collapse in 2022. The edition I read has an afterward from June 2024 that brings the story up to date with Sam Bankman-Fried's trial and a few other events. Zeke Faux is a reporter for Bloomberg News and a fellow of New America. Last year, I read Michael Lewis's Going Infinite, a somewhat-sympathetic book-length profile of Sam Bankman-Fried that made a lot of people angry. One of the common refrains at the time was that people should read Number Go Up instead, and since I'm happy to read more about the absurdities of the cryptocurrency world, I finally got around to reading the other big crypto book of 2023. This is a good book, with some caveats that I am about to explain at absurd length. If you want a skeptical history of the cryptocurrency bubble, you should read it. People who think that it's somehow in competition with Michael Lewis's book or who think the two books disagree (including Faux himself) have profoundly missed the point of Going Infinite. I agree with Matt Levine: Both of these books are worth your time if this is the sort of thing you like reading about. But (much) more on Faux's disagreements with Lewis later. The frame of Number Go Up is Faux's quixotic quest to prove that Tether is a fraud. To review this book, I therefore need to briefly explain what Tether is. This is only the first of many extended digressions. One natural way to buy cryptocurrency would be to follow the same pattern as a stock brokerage account. You would deposit some amount of money into the account (or connect the brokerage account to your bank account), and then exchange money for cryptocurrency or vice versa, using bank transfers to put money in or take it out. However, there are several problems with this. One is that swapping cryptocurrency for money is awkward and sometimes expensive. Another is that holding people's investment money for them is usually highly regulated, partly for customer safety but also to prevent money laundering. These are often called KYC laws (Know Your Customer), and the regulation-hostile world of cryptocurrency didn't want to comply with them. Tether is a stablecoin, which means that the company behind Tether attempts to guarantee that one Tether is always worth exactly one US dollar. It is not a speculative investment like Bitcoin; it's a cryptocurrency substitute for dollars. People exchange dollars for Tether to get their money into the system and then settle all of their subsequent trades in Tether, only converting the Tether back to dollars when they want to take their money out of cryptocurrency entirely. In essence, Tether functions like the cash reserve in a brokerage account: Your Tether holdings are supposedly guaranteed to be equivalent to US dollars, you can withdraw them at any time, and because you can do so, you don't bother, instead leaving your money in the reserve account while you contemplate what new coin you want to buy. As with a bank, this system rests on the assurance that one can always exchange one Tether for one US dollar. The instant people stop believing this is true, people will scramble to get their money out of Tether, creating the equivalent of a bank run. 
Since Tether is not a regulated bank or broker and has no deposit insurance or strong legal protections, the primary defense against a run on Tether is Tether's promise that they hold enough liquid assets to be able to hand out dollars to everyone who wants to redeem Tether. (A secondary defense that I wish Faux had mentioned is that Tether limits redemptions to registered accounts redeeming more than $100,000, which is a tiny fraction of the people who hold Tether, but for most purposes this doesn't matter because that promise is sufficient to maintain the peg with the dollar.) Faux's firmly-held belief throughout this book is that Tether is lying. He believes they do not have enough money to redeem all existing Tether coins, and that rather than backing every coin with very safe liquid assets, they are using the dollars deposited in the system to make illiquid and risky investments. Faux never finds the evidence that he's looking for, which makes this narrative choice feel strange. His theory was tested when there was a run on Tether following the collapse of the Terra stablecoin. Tether passed without apparent difficulty, redeeming $16B or about 20% of the outstanding Tether coins. This doesn't mean Faux is wrong; being able to redeem 20% of the outstanding tokens is very different from being able to redeem 100%, and Tether has been fined for lying about its reserves. But Tether is clearly more stable than Faux thought it was, which makes the main narrative of the book weirdly unsatisfying. If he admitted he might be wrong, I would give him credit for showing his work even if it didn't lead where he expected, but instead he pivots to focusing on Tether's role in money laundering without acknowledging that his original theory took a serious blow. In Faux's pursuit of Tether, he wanders through most of the other elements of the cryptocurrency bubble, and that's the strength of this book. Rather than write Number Go Up as a traditional history, Faux chooses to closely follow his own thought processes and curiosity. This has the advantage of giving Faux an easy and natural narrative, something that non-fiction books of this type can struggle with, and it lets Faux show how confusing and off-putting the cryptocurrency world is to an outsider. The best parts of this book were the parts unrelated to Tether. Faux provides an excellent summary of the Axie Infinity speculative bubble and even traveled to the Philippines to interview people who were directly affected. He then wandered through the bizarre world of NFTs, and his first-hand account of purchasing one (specifically a Mutant Ape) to get entrance to a party (which sounded like a miserable experience I would pay money to get out of) really drives home how sketchy and weird cryptocurrency-related software and markets can be. He also went to El Salvador to talk to people directly about the country's supposed embrace of Bitcoin, and there's no substitute for that type of reporting to show how exaggerated and dishonest the claims of cryptocurrency adoption are. The disadvantage of this personal focus on Faux himself is that it sometimes feels tedious or sensationalized. I was much less interested in his unsuccessful attempts to interview the founder of Tether than Faux was, and while the digression into forced labor compounds in Cambodia devoted to pig butchering scams was informative (and horrific), I think Faux leaned too heavily on an indirect link to Tether. 
His argument is that cryptocurrency enables a type of money laundering that is particularly well-suited to supporting scams, but both scams and this type of economic slavery existed before cryptocurrency and will exist afterwards. He did not make a very strong case that Tether was uniquely valuable as a money laundering service, as opposed to a currently useful tool that would be replaced with some other tool should it go away. This part of the book is essentially an argument that money laundering is bad because it enables crime, and sure, to an extent I agree. But if you're going to put this much emphasis on the evils of money laundering, I think you need to at least acknowledge that many people outside the United States do not want to give US government, which is often openly hostile to them, veto power over their financial transactions. Faux does not. The other big complaint I have with this book, and with a lot of other reporting on cryptocurrency, is that Faux is sloppy with the term "Ponzi scheme." This is going to sound like nit-picking, but I think this sloppiness matters because it may obscure an ongoing a shift in cryptocurrency markets. A Ponzi scheme is not any speculative bubble. It is a very specific type of fraud in which investors are promised improbably high returns at very low risk and with safe principal. These returns are paid out, not via investment in some underlying enterprise, but by taking the money from new investments and paying it to earlier investors. Ponzi schemes are doomed because satisfying their promises requires a constantly increasing flow of new investors. Since the population of the world is finite, all Ponzi schemes are mathematically guaranteed to eventually fail, often in a sudden death spiral of ever-increasing promises to lure new investors when the investment stream starts to dry up. There are some Ponzi schemes in cryptocurrency, but most practices that are called Ponzi schemes are not. For example, Faux calls Axie Infinity a Ponzi scheme, but it was missing the critical elements of promised safe returns and fraudulently paying returns from the investments of later investors. It was simply a speculative bubble that people bought into on the assumption that its price would increase, and like any speculative bubble those who sold before the peak made money at the expense of those who bought at the peak. The reason why this matters is that Ponzi schemes are a self-correcting problem. One can decry the damage caused when they collapse, but one can also feel the reassuring certainty that they will inevitably collapse and prove the skeptics correct. The same is not true of speculative assets in general. You may think that the lack of an underlying economic justification for prices means that a speculative bubble is guaranteed to collapse eventually, but in the famous words of Gary Schilling, "markets can remain irrational a lot longer than you and I can remain solvent." One of the people Faux interviews explains this distinction to him directly:
Rong explained that in a true Ponzi scheme, the organizer would have to handle the "fraud money." Instead, he gave the sneakers away and then only took a small cut of each trade. "The users are trading between each other. They are not going through me, right?" Rong said. Essentially, he was arguing that by downloading the Stepn app and walking to earn tokens, crypto bros were Ponzi'ing themselves.
Faux is openly contemptuous of this response, but it is technically correct. Stepn is not a Ponzi scheme; it's a speculative bubble. There are no guaranteed returns being paid out of later investments and no promise that your principal is safe. People are buying in at price that you may consider irrational, but Stepn never promised you would get your money back, let alone make a profit, and therefore it doesn't have the exponential progression of a Ponzi scheme. One can argue that this is a distinction without a moral difference, and personally I would agree, but it matters immensely if one is trying to analyze the future of cryptocurrencies. Schemes as transparently unstable as Stepn (which gives you coins for exercise and then tries to claim those coins have value through some vigorous hand-waving) are nearly as certain as Ponzi schemes to eventually collapse. But it's also possible to create a stable business around allowing large numbers of people to regularly lose money to small numbers of sophisticated players who are collecting all of the winnings. It's called a poker room at a casino, and no one thinks poker rooms are Ponzi schemes or are doomed to collapse, even though nearly everyone who plays poker will lose money. This is the part of the story that I think Faux largely missed, and which Michael Lewis highlights in Going Infinite. FTX was a legitimate business that made money (a lot of money) off of trading fees, in much the same way that a casino makes money off of poker rooms. Lots of people want to bet on cryptocurrencies, similar to how lots of people want to play poker. Some of those people will win; most of those people will lose. The casino doesn't care. Its profit comes from taking a little bit of each pot, regardless of who wins. Bankman-Fried also speculated with customer funds, and therefore FTX collapsed, but there is no inherent reason why the core exchange business cannot be stable if people continue to want to speculate in cryptocurrencies. Perhaps people will get tired of this method of gambling, but poker has been going strong for 200 years. It's also important to note that although trading fees are the most obvious way to be a profitable cryptocurrency casino, they're not the only way. Wall Street firms specialize in finding creative ways to take a cut of every financial transaction, and many of those methods are more sophisticated than fees. They are so good at this that buying and selling stock through trading apps like Robinhood is free. The money to run the brokerage platform comes from companies that are delighted to pay for the opportunity to handle stock trades by day traders with a phone app. This is not, as some conspiracy theories would have you believe, due to some sort of fraudulent price manipulation. It is because the average person with a Robinhood phone app is sufficiently unsophisticated that companies that have invested in complex financial modeling will make a steady profit taking the other side of their trades, mostly because of the spread (the difference between offered buy and sell prices). Faux is so caught up in looking for Ponzi schemes and fraud that I think he misses this aspect of cryptocurrency's transformation. Wall Street trading firms aren't piling into cryptocurrency because they want to do securities fraud. They're entering this market because there seems to be persistent demand for this form of gambling, cryptocurrency markets reward complex financial engineering, and running a legal casino is a profitable business model. 
Michael Lewis appears as a character in this book, and Faux portrays him quite negatively. The root of this animosity appears to stem from a cryptocurrency conference in the Bahamas that Faux attended. Lewis interviewed Bankman-Fried on stage, and, from Faux's account, his questions were fawning and he praised cryptocurrencies in ways that Faux is certain he knew were untrue. From that point on, Faux treats Lewis as an apologist for the cryptocurrency industry and for Sam Bankman-Fried specifically. I think this is a legitimate criticism of Lewis's methods of getting close to the people he wants to write about, but I think Faux also makes the common mistake of assuming Lewis is a muckraking reporter like himself. This has never been what Lewis is interested in. He writes about people he finds interesting and that he thinks a reader will also find interesting. One can legitimately accuse him of being credulous, but that's partly because he's not even trying to do the same thing Faux is doing. He's not trying to judge; he's trying to understand. This shows when it comes to the parts of this book about Sam Bankman-Fried. Faux's default assumption is that everyone involved in cryptocurrency is knowingly doing fraud, and a lot of his research is looking for evidence to support the conclusion he had already reached. I don't think there's anything inherently wrong with that approach: Faux is largely, although not entirely, correct, and this type of hostile journalism is incredibly valuable for society at large. Upton Sinclair didn't start writing The Jungle with an open mind about the meat-packing industry. But where Faux and Lewis disagree on Bankman-Fried's motivations and intentions, I think Lewis has the much stronger argument. Faux's position is that Bankman-Fried always intended to steal people's money through fraud, perhaps to fund his effective altruism donations, and his protestations that he made mistakes and misplaced funds are obvious lies. This is an appealing narrative if one is looking for a simple villain, but Faux's evidence in support of this is weak. He mostly argues through stereotype: Bankman-Fried was a physics major and a Jane Street trader and therefore could not possibly be the type of person to misplace large amounts of money or miscalculate risk. If he wants to understand how that could be possible, he could read Going Infinite? I find it completely credible that someone with what appears to be uncontrolled, severe ADHD could be adept at trading and calculating probabilities and yet also misplace millions of dollars of assets because he wasn't thinking about them and therefore they stopped existing. Lewis made a lot of people angry by being somewhat sympathetic to someone few people wanted to be sympathetic towards, but Faux (and many others) are also misrepresenting his position. Lewis agrees that Bankman-Fried intentionally intermingled customer funds with his hedge fund and agrees that he lied about doing this. His only contention is that Bankman-Fried didn't do this to steal the money; instead, he invested customer money in risky bets that he thought would pay off. In support of this, Lewis made a prediction that was widely scoffed at, namely that much less of FTX's money was missing than was claimed, and that likely most or all of it would be found. And, well, Lewis was basically correct? The FTX bankruptcy is now expected to recover considerably more than the amount of money owed to creditors. 
Faux argues that this is only because the bankruptcy clawed back assets and cryptocurrencies have gone up considerably since the FTX bankruptcy, and therefore that the lost money was just replaced by unexpected windfall profits on other investments, but I don't think this point is as strong as he thinks it is. Bankman-Fried lost money on some of what he did with customer funds, made money on other things, and if he'd been able to freeze withdrawals for the year that the bankruptcy froze them, it does appear most of the money would have been recoverable. This does not make what he did legal or morally right, but no one is arguing that, only that he didn't intentionally steal money for his own personal gain or for effective altruism donations. And on that point, I don't think Faux is giving Lewis's argument enough credit. I have a lot of complaints about this book because I know way too much about this topic than anyone should probably know. I think Faux missed the plot in a couple of places, and I wish someone would write a book about where cryptocurrency markets are currently going. (Matt Levine's Money Stuff newsletter is quite good, but it's about all sorts of things other than cryptocurrency and isn't designed to tell a coherent story.) But if you know less about cryptocurrency and just want to hear the details of the run-up to the 2022 bubble, this is a great book for that. Faux is writing for people who are already skeptical and is not going to convince people who are cryptocurrency true believers, but that's fine. The details are largely correct (and extensively footnoted) and will satisfy most people's curiosity. Lewis's Going Infinite is a better book, though. It's not the same type of book at all, and it will not give you the broader overview of the cryptocurrency world. But if you're curious about what was going through the head of someone at the center of all of this chaos, I think Lewis's analysis is much stronger than Faux's. I'm happy I read both books. Rating: 8 out of 10

23 December 2024

Simon Josefsson: OpenSSH and Git on a Post-Quantum SPHINCS+

Are you aware that Git commits and tags may be signed using OpenSSH? Git signatures may be used to improve integrity and authentication of our software supply-chain. Popular signature algorithms include Ed25519, ECDSA and RSA. Did you consider that these algorithms may not be safe if someone builds a post-quantum computer? As you may recall, I have earlier blogged about the efficient post-quantum key agreement mechanism called Streamlined NTRU Prime and its use in SSH and I have attempted to promote the conservatively designed Classic McEliece in a similar way, although it remains to be adopted. What post-quantum signature algorithms are available? There is an effort by NIST to standardize post-quantum algorithms, and they have a category for signature algorithms. According to wikipedia, after round three the selected algorithms are CRYSTALS-Dilithium, FALCON and SPHINCS+. Of these, SPHINCS+ appears to be a conservative choice suitable for long-term digital signatures. Can we get this to work? Recall that Git uses the ssh-keygen tool from OpenSSH to perform signing and verification. To refresh your memory, let s study the commands that Git uses under the hood for Ed25519. First generate a Ed25519 private key:
jas@kaka:~$ ssh-keygen -t ed25519 -f my_ed25519_key -P ""
Generating public/private ed25519 key pair.
Your identification has been saved in my_ed25519_key
Your public key has been saved in my_ed25519_key.pub
The key fingerprint is:
SHA256:fDa5+jmC2+/aiLhWeWA3IV8Wj6yMNTSuRzqUZlIGlXQ jas@kaka
The key's randomart image is:
+--[ED25519 256]--+
     .+=.E ..      
      oo=.ooo      
     . =o=+o .     
      =oO+o .      
      .=+S.=       
       oo.o o      
      . o  .       
     ...o.+..      
    .o.o.=**.      
+----[SHA256]-----+
jas@kaka:~$ cat my_ed25519_key
-----BEGIN OPENSSH PRIVATE KEY-----
b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEbm9uZQAAAAAAAAABAAAAMwAAAAtzc2gtZW
QyNTUxOQAAACAWP/aZ8hzN0WNRMSpjzbgW1tJXNd2v6/dnbKaQt7iIBQAAAJCeDotOng6L
TgAAAAtzc2gtZWQyNTUxOQAAACAWP/aZ8hzN0WNRMSpjzbgW1tJXNd2v6/dnbKaQt7iIBQ
AAAEBFRvzgcD3YItl9AMmVK4xDKj8NTg4h2Sluj0/x7aSPlhY/9pnyHM3RY1ExKmPNuBbW
0lc13a/r92dsppC3uIgFAAAACGphc0BrYWthAQIDBAU=
-----END OPENSSH PRIVATE KEY-----
jas@kaka:~$ cat my_ed25519_key.pub 
ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBY/9pnyHM3RY1ExKmPNuBbW0lc13a/r92dsppC3uIgF jas@kaka
jas@kaka:~$ 
Then let s sign something with this key:
jas@kaka:~$ echo "Hello world!" > msg
jas@kaka:~$ ssh-keygen -Y sign -f my_ed25519_key -n my-namespace msg
Signing file msg
Write signature to msg.sig
jas@kaka:~$ cat msg.sig 
-----BEGIN SSH SIGNATURE-----
U1NIU0lHAAAAAQAAADMAAAALc3NoLWVkMjU1MTkAAAAgFj/2mfIczdFjUTEqY824FtbSVz
Xdr+v3Z2ymkLe4iAUAAAAMbXktbmFtZXNwYWNlAAAAAAAAAAZzaGE1MTIAAABTAAAAC3Nz
aC1lZDI1NTE5AAAAQLmWsq05tqOOZIJqjxy5ZP/YRFoaX30lfIllmfyoeM5lpVnxJ3ZxU8
SF0KodDr8Rtukg2N3Xo80NGvZOzbG/9Aw=
-----END SSH SIGNATURE-----
jas@kaka:~$
Now let s create a list of trusted public-keys and associated identities:
jas@kaka:~$ echo 'my.name@example.org ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIBY/9pnyHM3RY1ExKmPNuBbW0lc13a/r92dsppC3uIgF' > allowed-signers
jas@kaka:~$ 
Then let s verify the message we just signed:
jas@kaka:~$ cat msg   ssh-keygen -Y verify -f allowed-signers -I my.name@example.org -n my-namespace -s msg.sig
Good "my-namespace" signature for my.name@example.org with ED25519 key SHA256:fDa5+jmC2+/aiLhWeWA3IV8Wj6yMNTSuRzqUZlIGlXQ
jas@kaka:~$ 
I have implemented support for SPHINCS+ in OpenSSH. This is early work, but I wanted to announce it to get discussion of some of the details going and to make people aware of it. What would a better way to demonstrate SPHINCS+ support in OpenSSH than by validating the Git commit that implements it using itself? Here is how to proceed, first get a suitable development environment up and running. I m using a Debian container launched in a protected environment using podman.
jas@kaka:~$ podman run -it --rm debian:stable
Then install the necessary build dependencies for OpenSSH.
# apt-get update 
# apt-get install git build-essential autoconf libz-dev libssl-dev
Now clone my OpenSSH branch with the SPHINCS+ implentation and build it. You may browse the commit on GitHub first if you are curious.
# cd
# git clone https://github.com/jas4711/openssh-portable.git -b sphincsp
# cd openssh-portable
# autoreconf -fvi
# ./configure
# make
Configure a Git allowed signers list with my SPHINCS+ public key (make sure to keep the public key on one line with the whitespace being one ASCII SPC character):
# mkdir -pv ~/.ssh
# echo 'simon@josefsson.org ssh-sphincsplus@openssh.com AAAAG3NzaC1zcGhpbmNzcGx1c0BvcGVuc3NoLmNvbQAAAECI6eacTxjB36xcPtP0ZyxJNIGCN350GluLD5h0KjKDsZLNmNaPSFH2ynWyKZKOF5eRPIMMKSCIV75y+KP9d6w3' > ~/.ssh/allowed_signers
# git config gpg.ssh.allowedSignersFile ~/.ssh/allowed_signers
Then verify the commit using the newly built ssh-keygen binary:
# PATH=$PWD:$PATH
# git log -1 --show-signature
commit ce0b590071e2dc845373734655192241a4ace94b (HEAD -> sphincsp, origin/sphincsp)
Good "git" signature for simon@josefsson.org with SPHINCSPLUS key SHA256:rkAa0fX0lQf/7V7QmuJHSI44L/PAPPsdWpis4nML7EQ
Author: Simon Josefsson <simon@josefsson.org>
Date:   Tue Dec 3 18:44:25 2024 +0100
    Add SPHINCS+.
# git verify-commit ce0b590071e2dc845373734655192241a4ace94b
Good "git" signature for simon@josefsson.org with SPHINCSPLUS key SHA256:rkAa0fX0lQf/7V7QmuJHSI44L/PAPPsdWpis4nML7EQ
# 
Yay! So what are some considerations? SPHINCS+ comes in many different variants. First it comes with three security levels approximately matching 128/192/256 bit symmetric key strengths. Second choice is between the SHA2-256, SHAKE256 (SHA-3) and Haraka hash algorithms. Final choice is between a robust and a simple variant with different security and performance characteristics. To get going, I picked the sphincss256sha256robust SPHINCS+ implementation from SUPERCOP 20241022. There is a good size comparison table in the sphincsplus implementation, if you want to consider alternative variants. SPHINCS+ public-keys are really small, as you can see in the allowed signers file. This is really good because they are handled by humans and often by cut n paste. What about private keys? They are slightly longer than Ed25519 private keys but shorter than typical RSA private keys.
# ssh-keygen -t sphincsplus -f my_sphincsplus_key -P ""
Generating public/private sphincsplus key pair.
Your identification has been saved in my_sphincsplus_key
Your public key has been saved in my_sphincsplus_key.pub
The key fingerprint is:
SHA256:4rNfXdmLo/ySQiWYzsBhZIvgLu9sQQz7upG8clKziBg root@ad600ff56253
The key's randomart image is:
+[SPHINCSPLUS 256-+
  .  .o            
 o . oo.           
  = .o.. o         
 o o  o o . .   o  
 .+    = S o   o . 
 Eo=  . + . . .. . 
 =*.+  o . . oo .  
 B+=    o o.o. .   
 o*o   ... .oo.    
+----[SHA256]-----+
# cat my_sphincsplus_key.pub 
ssh-sphincsplus@openssh.com AAAAG3NzaC1zcGhpbmNzcGx1c0BvcGVuc3NoLmNvbQAAAEAltAX1VhZ8pdW9FgC+NdM6QfLxVXVaf1v2yW4v+tk2Oj5lxmVgZftfT37GOMOlK9iBm9SQHZZVYZddkEJ9F1D7 root@ad600ff56253
# cat my_sphincsplus_key 
-----BEGIN OPENSSH PRIVATE KEY-----
b3BlbnNzaC1rZXktdjEAAAAABG5vbmUAAAAEbm9uZQAAAAAAAAABAAAAYwAAABtzc2gtc3
BoaW5jc3BsdXNAb3BlbnNzaC5jb20AAABAJbQF9VYWfKXVvRYAvjXTOkHy8VV1Wn9b9slu
L/rZNjo+ZcZlYGX7X09+xjjDpSvYgZvUkB2WVWGXXZBCfRdQ+wAAAQidiIwanYiMGgAAAB
tzc2gtc3BoaW5jc3BsdXNAb3BlbnNzaC5jb20AAABAJbQF9VYWfKXVvRYAvjXTOkHy8VV1
Wn9b9sluL/rZNjo+ZcZlYGX7X09+xjjDpSvYgZvUkB2WVWGXXZBCfRdQ+wAAAIAbwBxEhA
NYzITN6VeCMqUyvw/59JM+WOLXBlRbu3R8qS7ljc4qFVWUtmhy8B3t9e4jrhdO6w0n5I4l
mnLnBi2hJbQF9VYWfKXVvRYAvjXTOkHy8VV1Wn9b9sluL/rZNjo+ZcZlYGX7X09+xjjDpS
vYgZvUkB2WVWGXXZBCfRdQ+wAAABFyb290QGFkNjAwZmY1NjI1MwECAwQ=
-----END OPENSSH PRIVATE KEY-----
# 
Signature size? Now here is the challenge, for this variant the size is around 29kb or close to 600 lines of base64 data:
# git cat-file -p ce0b590071e2dc845373734655192241a4ace94b   head -10
tree ede42093e7d5acd37fde02065a4a19ac1f418703
parent 826483d51a9fee60703298bbf839d9ce37943474
author Simon Josefsson <simon@josefsson.org> 1733247865 +0100
committer Simon Josefsson <simon@josefsson.org> 1734907869 +0100
gpgsig -----BEGIN SSH SIGNATURE-----
 U1NIU0lHAAAAAQAAAGMAAAAbc3NoLXNwaGluY3NwbHVzQG9wZW5zc2guY29tAAAAQIjp5p
 xPGMHfrFw+0/RnLEk0gYI3fnQaW4sPmHQqMoOxks2Y1o9IUfbKdbIpko4Xl5E8gwwpIIhX
 vnL4o/13rDcAAAADZ2l0AAAAAAAAAAZzaGE1MTIAAHSDAAAAG3NzaC1zcGhpbmNzcGx1c0
 BvcGVuc3NoLmNvbQAAdGDHlobgfgkKKQBo3UHmnEnNXczCMNdzJmeYJau67QM6xZcAU+d+
 2mvhbksm5D34m75DWEngzBb3usJTqWJeeDdplHHRe3BKVCQ05LHqRYzcSdN6eoeZqoOBvR
# git cat-file -p ce0b590071e2dc845373734655192241a4ace94b   tail -5 
 ChvXUk4jfiNp85RDZ1kljVecfdB2/6CHFRtxrKHJRDiIavYjucgHF1bjz0fqaOSGa90UYL
 RZjZ0OhdHOQjNP5QErlIOcZeqcnwi0+RtCJ1D1wH2psuXIQEyr1mCA==
 -----END SSH SIGNATURE-----
Add SPHINCS+.
# git cat-file -p ce0b590071e2dc845373734655192241a4ace94b   wc -l
579
# 
What about performance? Verification is really fast:
# time git verify-commit ce0b590071e2dc845373734655192241a4ace94b
Good "git" signature for simon@josefsson.org with SPHINCSPLUS key SHA256:rkAa0fX0lQf/7V7QmuJHSI44L/PAPPsdWpis4nML7EQ
real	0m0.010s
user	0m0.005s
sys	0m0.005s
# 
On this machine, verifying an Ed25519 signature is a couple of times slower, and needs around 0.07 seconds. Signing is slower, it takes a bit over 2 seconds on my laptop.
# echo "Hello world!" > msg
# time ssh-keygen -Y sign -f my_sphincsplus_key -n my-namespace msg
Signing file msg
Write signature to msg.sig
real	0m2.226s
user	0m2.226s
sys	0m0.000s
# echo 'my.name@example.org ssh-sphincsplus@openssh.com AAAAG3NzaC1zcGhpbmNzcGx1c0BvcGVuc3NoLmNvbQAAAEAltAX1VhZ8pdW9FgC+NdM6QfLxVXVaf1v2yW4v+tk2Oj5lxmVgZftfT37GOMOlK9iBm9SQHZZVYZddkEJ9F1D7' > allowed-signers
# cat msg   ssh-keygen -Y verify -f allowed-signers -I my.name@example.org -n my-namespace -s msg.sig
Good "my-namespace" signature for my.name@example.org with SPHINCSPLUS key SHA256:4rNfXdmLo/ySQiWYzsBhZIvgLu9sQQz7upG8clKziBg
# 
Welcome to our new world of Post-Quantum safe digital signatures of Git commits, and Happy Hacking!

21 December 2024

Dirk Eddelbuettel: anytime 0.3.11 on CRAN: Maintenance

A follow-up release 0.3.11 to the recent 0.3.10 release release of the anytime package arrived on CRAN two days ago. The package is fairly feature-complete, and code and functionality remain mature and stable, of course. anytime is a very focused package aiming to do just one thing really well: to convert anything in integer, numeric, character, factor, ordered, input format to either POSIXct (when called as anytime) or Date objects (when called as anydate) and to do so without requiring a format string as well as accomodating different formats in one input vector. See the anytime page, or the GitHub repo for a few examples, and the beautiful documentation site for all documentation. This release simply skips one test file. CRAN labeled an error M1mac yet it did not reproduce on any of the other M1 macOS I can access (macbuilder, GitHub Actions) as this appeared related to a local setting of timezone values I could not reproduce anywwhere. So the only way to get rid of the fail is to not to run the test. Needless to say the upload process was a little tedious as I got the passive-aggressive not responding treatment on a first upload and the required email answer it lead to. Anyway, after a few days, and even more deep breaths, it is taken care of and now the package result standing is (at least currently) pristinely clean. The short list of changes follows.

Changes in anytime version 0.3.11 (2024-12-18)
  • Skip a test file

Courtesy of my CRANberries, there is also a diffstat report of changes relative to the previous release. The issue tracker tracker off the GitHub repo can be use for questions and comments. More information about the package is at the package page, the GitHub repo and the documentation site.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

18 December 2024

Simon Josefsson: Guix Container Images for GitLab CI/CD

I am using GitLab CI/CD pipelines for several upstream projects (libidn, libidn2, gsasl, inetutils, libtasn1, libntlm, ) and a long-time concern for these have been that there is too little testing on GNU Guix. Several attempts have been made, and earlier this year Ludo came really close to finish this. My earlier effort to idempotently rebuild Debian recently led me to think about re-bootstrapping Debian. Since Debian is a binary distribution, it re-use earlier binary packages when building new packages. The prospect of re-bootstrapping Debian in a reproducible way by rebuilding all of those packages going back to the beginning of time does not appeal to me. Instead, wouldn t it be easier to build Debian trixie (or some future release of Debian) from Guix, by creating a small bootstrap sandbox that can start to build Debian packages, and then make sure that the particular Debian release can idempotently rebuild itself in a reproducible way? Then you will eventually end up with a reproducible and re-bootstrapped Debian, which pave the way for a trustworthy release of Trisquel. Fortunately, such an endeavour appears to offer many rabbit holes. Preparing Guix container images for use in GitLab pipelines is one that I jumped into in the last few days, and just came out of. Let s go directly to the point of this article: here is a GitLab pipeline job that runs in a native Guix container image that builds libksba after installing the libgpg-error dependency from Guix using the pre-built substitutes.
test-amd64-latest-wget-configure-make-libksba:
  image: registry.gitlab.com/debdistutils/guix/container:latest
  before_script:
  - lndir /gnu/store/*profile/etc/ /etc
  - rm -f /etc/group
  - groupadd --system guixbuild
  - for i in $(seq -w 1 10); do useradd -g guixbuild -G guixbuild -d /var/empty -s $(command -v nologin) -c "Guix build user $i" --system guixbuilder$i; done
  - export HOME=/
  - export LANG=C.UTF-8
  - guix-daemon --disable-chroot --build-users-group=guixbuild &
  - guix archive --authorize < /share/guix/ci.guix.gnu.org.pub
  - guix archive --authorize < /share/guix/bordeaux.guix.gnu.org.pub
  - guix describe
  - guix package -i libgpg-error
  - GUIX_PROFILE="//.guix-profile"
  - . "$GUIX_PROFILE/etc/profile"
  script:
  - wget https://www.gnupg.org/ftp/gcrypt/libksba/libksba-1.6.7.tar.bz2
  - tar xfa libksba-1.6.7.tar.bz2
  - cd libksba-1.6.7
  - ./configure
  - make V=1
  - make check VERBOSE=t V=1
You can put that in a .gitlab-ci.yml and push it to GitLab and you will end up with a nice pipeline job output. As you may imagine, there are several things that are sub-optimal in the before_script above that ought to be taken care of by the Guix container image, and I hope to be able to remove as much of the ugliness as possible. However that doesn t change that these images are useful now, and I wanted to announce this work to allow others to start testing them and possibly offer help. I have started to make use of these images in some projects, see for example the libntlm commit for that. You are welcome to join me in the Guix container images for GitLab CI/CD project! Issues and merge requests are welcome happy hacking folks!

17 December 2024

Russ Allbery: Review: Iris Kelly Doesn't Date

Review: Iris Kelly Doesn't Date, by Ashley Herring Blake
Series: Bright Falls #3
Publisher: Berkley Romance
Copyright: October 2023
ISBN: 0-593-55058-7
Format: Kindle
Pages: 381
Iris Kelly Doesn't Date is a sapphic romance novel (probably a romantic comedy, although I'm bad at romance subgenres). It is the third book in the Bright Falls series. In the romance style, it has a new set of protagonists, but the protagonists of the previous books appear as supporting characters and reading this will spoil the previous books. Among the friend group we were introduced to in Delilah Green Doesn't Care, Iris was the irrepressible loudmouth. She's bad at secrets, good at saying whatever is on her mind, and has zero desire to either get married or have children. After one of the side plots of Astrid Parker Doesn't Fail, she has sworn off dating entirely. Iris is also now a romance novelist. Her paper store didn't get enough foot traffic to justify staying open, so she switched her planner business to online only and wrote a romance novel that was good enough to get a two-book deal. Now she needs to write a second book and she has absolutely nothing. Her own avoidance of romantic situations is not helping, but neither is her meddling family who are convinced her choices about marriage and family can be overturned with sufficient pestering. She desperately needs to shake up her life, get out of her creative rut, and do something new. Failing that, she'll settle for meeting someone in a bar and having some fun. Stevie is a barista and actress living in Portland. Six months ago, she broke up with Adri, her creative partner, girlfriend of six years, and the first person with whom she had a serious relationship. More precisely, Adri broke up with her. They're still friends, truly, even though that friendship is being seriously strained by Adri dating Vanessa, another member of their small and close-knit friend group. Stevie has occasionally-crippling anxiety, not much luck in finding real acting roles in Portland, and a desperate desire to not make waves. Ren, the fourth member of their friend group, thinks Stevie needs a new relationship, or at least a fling. That's how Stevie, with Ren as backup and encouragement, ends up at the same bar with Iris. The resulting dance and conversation was rather fun for both Stevie and Iris. The attempted one-night stand afterwards was a disaster due to Stevie's anxiety, and neither of them expected to see the other again. Stevie therefore felt safe pretending they'd hit it off to get her friends off her back. When Iris's continued restlessness lands her in an audition for Adri's fundraiser play that she also talked Stevie into performing in, this turns into a full-blown fake dating trope. These books continue to be impossible to put down. I'm not sure what Blake is doing to make the pacing so perfect, but as with the previous books of the series I found this utterly compulsive reading. I started it in the afternoon, took a break in the evening for a few hours, and then finished it at 2am. I wasn't sure if a book focused on Iris would work as well, but I need not have worried. Iris Kelly Doesn't Date is both more dramatic and more trope-centered than the earlier books, but Blake handles that in a way that fits Iris's personality and wasn't annoying even to a reader like me, who has an aversion to many types of relationship drama. The secret is Stevie, and specifically having the other protagonist be someone with severe anxiety.
No was never a very easy word for Stevie when it came to Adri, when it came to anyone, really. She could handle the little stuff do you want a soda, have you seen this movie, do you like onions on your pizza but the big stuff, the stuff that caused disappointed expressions and down-turned mouths... yeah, she sucked at that part. Her anxiety would flare, and she'd spend the next week convinced her friends hated her, she'd die alone and miserable, and wasn't worth a damn to anyone. Then, when said friend or family member eventually got ahold of her to tell her that, no, of course they didn't hate her, why in the world would she think that, her anxiety would crest once again, convincing her that she was terrible at understanding people and could never trust her own brain to make heads or tails of any social situation.
This is a spot-on description of a particular type of anxiety, but also this is the perfect protagonist to pair with Iris. Throughout the series, Iris has always been the ride-or-die friend, the person who may have no idea how to help but who will show up anyway and at least try to distract you. Stevie's anxiety makes Iris feel protective, which reveals one of the best sides of Iris's personality, and then the protectiveness plays off against Iris's own relationship issues and tendency to avoid taking anything too seriously. It's one of those relationships that starts a bit one-sided and then becomes mutually supporting once Stevie gets her feet under her. That's a relationship pattern I really enjoy reading about. As with the rest of the series, the friendship dynamics are great. Here we get to see two friend groups at work: Iris's, which we've seen in the previous two volumes and which expanded interestingly in Astrid Parker Doesn't Fail, and Stevie's, which is new. I liked all of these people, even Adri in her own way (although she's the hardest to like). The previous happily-ever-afters do get a bit awkward here, but Blake tries to make that part of the plot and also avoids most of the problem of somewhat-boring romantic bliss by spreading the friendship connections a bit wider. Stevie's friend group formed at orientation at Reed College, and that let me put my finger on another property of this series: essentially all of the characters are from a very specific social class. They're nearly all arts people (bookstore owner, photographer, interior decorator, actress, writer, director), they've mostly gone to college, and while most of them don't have lots of money, there's always at least one person in each friend group with significant wealth. Jordan, from the previous book, is a bit of an exception since she works in a trade (a carpenter), but she still acts like someone from that same social class. It's a bit like reading Jane Austen novels and realizing that the protagonists are drawn from a very specific and very narrow portion of society. This is not a complaint, to be clear; I have no objections to reading about a very specific social class. But if one has already read lots of books about this class of people, I could see that diminishing the appeal of this series a bit. There are a lot of assumptions baked into the story that aren't really questioned, such as the ubiquity of therapists. (I don't know how Stevie affords one on a barista salary.) There are also some small things in the terminology (therapy speak, for example) and in the specific type of earnestness with which the books attempt to be diverse on most axes other than social class that I suspect may grate a bit for some readers. If that's you, this is your warning. There is a third-act breakup here, just like the previous volumes. There is also a defense of the emotional punch of third-act breakups in romance novels in the book itself, put into Iris's internal monologue, so I suspect that's the author's answer to critics like myself who don't like the trope. I was less frustrated by this one because it fit the drama level of the protagonists, but I'll also know to expect a third-act breakup in any Blake novel I read in the future. But, all that said, the summary once again is that I loved this book and could not put it down. Iris is dramatic and occasionally self-destructive but has a core of earnest empathy that makes her easy to like. 
She's exactly the sort of extrovert who is soothing to introverts rather than draining because she carries the extrovert load of social situations. Stevie is adorably earnest and thoughtful beneath her anxiety. They two of them are wildly different and yet remarkably good together, and I loved reading their story. Highly recommended, along with the whole series. Start with Delilah Green Doesn't Care; if you like that, you're in for a treat. Content note: This book is also rather sex-forward and pretty explicit in the sex scenes, maybe a touch more than Astrid Parker Doesn't Fail. If that is or is not your thing in romance novels, be aware going in. Rating: 9 out of 10

16 December 2024

Dirk Eddelbuettel: #45: Some r-ci Updates

market monitor Welcome to post 45 in the $R^4 series! We introduced r-ci here in post #32 here nearly four years ago. It has found pretty widespread use and adoption, and we received a few kind words then (in the linked issue) and also more recently (in a follow-up comment) from which we merrily quote:
[ ] almost 3 years later on and I have had zero problems with this CI setup. For people who want reliable R software, resources like these are invaluable.
And while we followed up with post #41 about r2u for simple continuous integration, we may not have posted when we based r-ci on r2u (for the obvious Linux usage case). So let s make time now for a (comparitively smaller) update, and an update usage examples. We made two changes in the last few days. One is a (obvious in hindsight) simplification. Given that the bootstrap step was always executed, and needed no parameters, we pulled it into a new aggregated setup simply called r-ci that includes it so that it can be omitted as a step in the yaml file. Second, we recently needed Fortran on macOS too, and realized it was not installed by default so we just added that too. With that a real and used example is now as simple as the screenshot to the left (and hence one paragraph shorter). The trained eye will no doubt observe that there is nothing specific to a given repo. And that is basically the key feature: we can simply copy this file around and get fast and easy and reliable CI by taking advantage of the underlying robustness of r2u solving all dependency automagically and reliably. The option to enable macOS is also solid and compelling as the GitHub runners are fast (but more expensive in how the count against the limit of minutes so again a tradeoff to make), as is the option to run coverage if one so desires. Some of my repos do too. Take a look at the r-ci website which has more examples for the other supported CI servics it can used with, and feel free to ask questions as issue in the repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub. Please report excessive re-aggregation in third-party for-profit settings.

Russ Allbery: Review: Finders

Review: Finders, by Melissa Scott
Series: Firstborn, Lastborn #1
Publisher: Candlemark & Gleam
Copyright: 2018
ISBN: 1-936460-87-4
Format: Kindle
Pages: 409
Finders is a far future science fiction novel with cyberpunk vibes. It is the first of a series, but the second (and, so far, only other) book of the series is a prequel. It stands alone reasonably well (more on that later). Cassilde Sam is a salvor. That means she specializes in exploring ancient wrecks and ruins left behind by the Ancients and salvaging materials that can be reused. The most important of those are what are called Ancestral elements: BLUE, which can hold programming; GOLD, which which reacts to BLUE instructions; RED, which produces actions or output; and GREEN, the rarest and most valuable, which powers everything else. Cassilde and her partner Dai Winter file claims on newly-discovered or incompletely salvaged Ancestor sites and then extract elemental material and anything else of value in their small salvage ship. Cassilde is also dying. She has Lightman's, an incurable degenerative disease that can only be treated with ever-increasing quantities of GREEN. It's hard to sleep, hard to get warm, hard to breathe, and eventually she'll run out of money to pay for the GREEN and she'll die. To push that day off into the future, she and Dai need work. The good news is that the wreckage of a new Ancestor sky palace was discovered in a long orbit and will create enough salvage work for every experienced salvor in the system. The bad news is that they're not qualified to bid on it. They need a scholar with a class-one license to bid on the best sections, and they haven't had a reliable scholar since their former partner and lover Summerland Ashe picked the opposite side in the Troubles and left the Fringe for the Entente, the more densely settled and connected portion of human space. But, unexpectedly and suspiciously, Ashe may be back and offering to work with them again. So, first, I love this setting. This is far from the first SF novel that is set in the aftermath of a general collapse of human civilization and revolving around discovering lost mysteries. Most examples of that genre are post-apocalyptic novels limited to Earth or the local solar system, but Kate Elliott's Unconquerable Sun comes immediately to mind. It's also not the first space archaeology series I've read; Kristine Kathyrn Rusch's story series starting with "Diving into the Wreck" also came to mind. But I don't recall the last time I've seen the author sell the setting so effectively. This is a world with starships and spaceports and clearly advanced technology, but it feels like a post-collapse society that's built on ruins. It's not just that technology runs on half-understood Ancestral elements and states fight over control of debris fields. It's also that the society repurposes Ancestral remnants in ways that both they and the reader know weren't originally intended, and that sometimes are more ingenious or efficient than how the Ancestors probably used them. There's a creative grittiness here that reminds me of good cyberpunk. It's not just good atmospheric writing, though. Scott makes a world-building decision that is going to sound trivial when I say it, but that has brilliant implications for the rest of the setting. There was not just one collapse; there were two. The Ancestor civilization, presumed to be the first human civilization, has passed into myth, quite literally when it comes to the stories around its downfall in the aftermath of a war against AIs. 
After the Ancestors came the Successors, who followed a similar salvage and rebuild approach and got as far as inventing their own warp drive technology that was based on but different than the Ancestor technology. Then they also collapsed, leaving their adapted technology and salvage operations layered over Ancestor sites. Cassilde's civilization is the third human starfaring civilization, and it is very specifically the third, neither the second nor one of dozens. This has so many small but effective implications that improve this story. A fall happened twice, so it feels like a pattern that makes Cassilde's civilization paranoid, but it happened for two very different reasons, so there is room to argue against it being a pattern. Salvage is harder because of the layering of Ancestor and Successor activity. Successors had their own way of controlling technology that is not accessible to Cassilde and her crew but is also not how the technology was intended to be used, which sends small ripples of interesting complexity through the background. And salvors are competing not only against each other but also against Successor salvage operations for which they have fragmentary records. It's a beautifully effective touch. Melissa Scott has been publishing science fiction for forty years, and it shows in this book. The protagonists are older characters: established professionals with resource problems but also social connections and an earned reputation, people who are trying to do a job and live their lives, not change the world. The writing is competent, deft, and atmospheric, with the confidence of long practice, but it also has the feel of an earlier era of science fiction. I mentioned the cyberpunk influence, which shows in the grittiness of the descriptions, the marginality of the characters in society, and the background theme of repurposing and reusing technology in unintended ways. This is the sort of book that feels solidly in the center of science fiction, without the genre mixing into either fantasy or romance that has become somewhat more common, and also without the dramatics of space opera (although the reader discovers that the stakes of this novel may be higher than anyone realized). And yet, so much of this book is about navigating a complicated romantic relationship, and that's where the story structure felt a bit odd. Cassilde, Dai, and Ashe were a polyamorous triad (polyamory also shows up in Scott's excellent Roads of Heaven series), and much of the first third of the book deals with the fracturing of trust with Ashe and their renegotiation of that relationship given his return. This is refreshingly written as the thoughtful interaction of three adults who take issues of trust seriously, but that also means it's much less dramatic than it sounds, and that means this book starts exceptionally slow. Scott is going somewhere, and the slow build became engrossing around the midpoint of the book, but I had to fight to stick with it at the start. About 80% of the way through this book, I had no idea how Scott was going to wrap things up in the pages remaining and was bracing myself for some sort of series cliffhanger. This is not what happens; the plot is not fully resolved in every detail, but it reaches a conclusion of sorts that does not mandate a sequel. I did think the end was a little bit unsatisfying, though, and I want another book that explores the implications of the ending. I think it would have to be a much different book, and the tonal shift might be stark. 
I've had this book on my to-read list for a while and kept putting it off because I wasn't sure I was in the mood for something precarious and gritty. This turned out to be an accurate worry: this is literally a book about salvaging the pieces of something full of wonders inextricably connected to dangers. You have to be in a cyberpunk sort of mood. But I've never read a bad Melissa Scott book, and this is no exception. The simplicity and ALL-CAPSNESS of the Ancestral elements grated a bit, but apart from that, the world-building is exceptional and well worth the trip. Recommended, although be warned that, if you're like me, it may not grab you from the first page. Followed by Fallen, but that book is a prequel that does not share any protagonists. Content notes: disability and degenerative illness in a universe where magical cures are possible, so be warned if that specific thematic combination is not what you're looking for. Rating: 7 out of 10

15 December 2024

Russell Coker: Hisense 65U80G 65 Inch 8K ULED Android TV (2021)

The Aim I just bought a Hisense 65U80G 65 Inch 8K ULED Android TV (2021 model) for $1,568 including delivery. I got that deal by googling refurbished 8K TVs and finding the cheapest one I could buy. Amazon and eBay didn t have any good prices on second hand 8K TVs and new ones start at $3,000 on special. I didn t assess how Hisense compares to other TVs, as far as I could determine there was only one model of 8K TV on sale in Australia in the price range I was prepared to pay. So I won t review how this TV compares to other models but how refurbished TVs compare to other display options. I bought this because the highest resolution monitor in my price range is 5120*2160 [1]. While I could get a 5128*2880 monitor for around $1,500 paying 3* the money for 33% more pixels is bad value for money. Getting 4* the pixels for under 3* the price is good value even when it s a TV with the lower display quality that involves. Before buying this TV I read this blog post by Daniel Lawrence about using an 8K TV as a primary monitor [2]. While he has an interesting setup with a 65 TV on a large desk it s not what I plan to do at this time. My Plans for Use I don t plan to make it a main monitor. While 5120*2160 isn t as good as I like on my desk it s bearable and the quality of the display is high. High resolution isn t needed for all tasks, for example I m writing this blog post on my laptop while watching a movie on the 8K TV. One thing I d like to do with the 8K TV when I get it working as a monitor is to share the screen for team programming projects. I don t have any specific plans other than team coding projects at the moment. But it will be interesting to experiment with it when I get it working. Technical Issues with High Resolution Monitors Hardware Needed A lot of the graphic hardware out there don t support resolutions higher than 5120*2880. It seems that most laptops don t support resolutions higher than that and higher resolutions than 4K are difficult. Only quite recent and high end video cards will do 8K. Apparently the RTX 2080 is one of the oldest ones that does and that s $400 on ebay. Strangely the GPU chipset spec pages don t list the maximum resolution and there s the additional complication that the other chips might not support the resolutions that the GPU itself can support. As an aside I don t use NVidia cards for regular workstations due to reliability problems. But they are good for ML work and for special purpose systems. Interface Versions To do 8K video it seems that you need HDMI 2.1 (or maybe 2.0 with 4:2:0 chroma subsampling) or DisplayPort 1.3 for 30Hz with 24bit color and 2.0 for higher refresh rates. But using a particular version of the interface doesn t require supporting all the resolutions that it might support. This TV has HDMI 2.1 inputs, I ve bought an adaptor cable that does DisplayPort 1.4 to HDMI 2.1 at 8K resolution. So I need a video card that does DisplayPort 1.4 or HDMI 2.1 output. That doesn t mean that the card will work, but it could work. It s a pity that no-one has made a USB-C video controller that has a basic frame-buffer supporting 8K and the minimal GPU capabilities. The consensus of opinion is that no games will run well at 8K at this time so anyone using 8K resolution doesn t need GPU power unless it s for ML stuff. I m thinking of making a system that can be used as a ML server and X/Wayland server so a GPU with a decent amount of RAM and compute power would be good. 
I m not particularly interested in spending $1,500+ to get a GPU that can drive a $1,568 TV. I m looking into getting a RTX A2000 with 12G of RAM which should be adequate for ML experiments and can handle 8K@60Hz output. I ve ordered a DisplayPort to HDMI converter cable so if I get a DisplayPort card it will work. Software Support When I first got started with 4K monitors I had significant problems in adjusting the UI to be usable. The support for scaling software is much better now than it was then and 8K 65 has a lower DPI than 4K 32 . So I hope this won t be an issue. Progress So Far My first Hisense 8K TV stopped working properly. It would change to a mostly white screen after being used for some time. The screen would change in ways that correlate to changes in what should appear, but not in a way that was usable. It was just a different pattern of white blobs when I changed to a menu view not anything that allowed using it. I presume that this was the problem that drove a need for refurbishment as when I first got the TV it was still signed in to Google accounts for YouTube and to NetFlix. Best Buy Electrical was good about providing a quick replacement, they took away the old TV and delivered a new one on the same visit and it s now working well. I ve obtained a NVidia card that can allegedly do 8K output and a combination of cables that might be able to carry an 8K signal. Now I just need to get the NVidia drivers to not cause a kernel panic to get things to work.

13 December 2024

Emanuele Rocca: Murder Mystery: GCC Builds Failing After sbuild Refactoring

This is the story of an investigation conducted by Jochen Sprickerhof, Helmut Grohne, and myself. It was true teamwork, and we would have not reached the bottom of the issue working individually. We think you will find it as interesting and fun as we did, so here is a brief writeup. A few of the steps mentioned here took several days, others just a few minutes. What is described as a natural progression of events did not always look very obvious at the moment at all.
Let us go through the Six Stages of Debugging together.

Stage 1: That cannot happen
Official Debian GCC builds start failing on multiple architectures in late November.
The build error happens on the build servers when running the testuite, but we know this cannot happen. GCC builds are not meant to fail in case of testsuite failures! Return codes are not making the build fail, make is being called with -k, it just cannot happen.
A lot of the GCC tests are always failing in fact, and an extensive log of the results is posted to the debian-gcc mailing list, but the packages always build fine regardless.
On the build daemons, build failures take several hours.

Stage 2: That does not happen on my machine
Building on my machine running Bookworm is just fine. The Build Daemons run Bookworm and use a Sid chroot for the build environment, just like I am. Same kernel.
The only obvious difference between my setup and the Debian buildds is that I am using sbuild 0.85.0 from bookworm, and the buildds have 0.86.3~bpo12+1 from bookworm-backports. Trying again with 0.86.3~bpo12+1, the build fails on my system too. The build daemons were updated to the bookworm-backports version of sbuild at some point in late November. Ha.

Stage 3: That should not happen
There are quite a few sbuild versions in between 0.85.0 and 0.86.3~bpo12+1, but looking at recent sbuild bugs shows that sbuild 0.86.0 was breaking "quite a number of packages". Indeed, with 0.86.0 the build still fails. Trying the version immediately before, 0.85.11, the build finishes correctly. This took more time than it sounds, one run including the tests takes several hours. We need a way to shorten this somehow.
The Debian packaging of GCC allows to specify which languages you may want to skip, and by default it builds Ada, Go, C, C++, D, Fortran, Objective C, Objective C++, M2, and Rust. When running the tests sequentially, the build logs stop roughly around the tests of a runtime library for D, libphobos. So can we still reproduce the failure by skipping everything except for D? With DEB_BUILD_OPTIONS=nolang=ada,go,c,c++,fortran,objc,obj-c++,m2,rust the build still fails, and it fails faster than before. Several minutes, not hours. This is progress, and time to file a bug. The report contains massive spoilers, so no link. :-)

Stage 4: Why does that happen?
Something is causing the build to end prematurely. It s not the OOM killer, and the kernel does not have anything useful to say in the logs. Can it be that the D language tests are sending signals to some process, and that is what s killing make ? We start tracing signals sent with bpftrace by writing the following script, signals.bt:
tracepoint:signal:signal_generate  
    printf("%s PID %d (%s) sent signal %d to PID %d\n", comm, pid, args->sig, args->pid);
 
And executing it with sudo bpftrace signals.bt.
The build takes its sweet time, and it fails. Looking at the trace output there s a suspicious process.exe terminating stuff.
process.exe (PID: 2868133) sent signal 15 to PID 711826
That looks interesting, but we have no clue what PID 711826 may be. Let s change the script a bit, and trace signals received as well.
tracepoint:signal:signal_generate  
    printf("PID %d (%s) sent signal %d to %d\n", pid, comm, args->sig, args->pid);
 
tracepoint:signal:signal_deliver  
    printf("PID %d (%s) received signal %d\n", pid, comm, args->sig);
 
The working version of sbuild was using dumb-init, whereas the new one features a little init in perl. We patch the current version of sbuild by making it use dumb-init instead, and trace two builds: one with the perl init, one with dumb-init.
Here are the signals observed when building with dumb-init.
PID 3590011 (process.exe) sent signal 2 to 3590014
PID 3590014 (sleep) received signal 9
PID 3590011 (process.exe) sent signal 15 to 3590063
PID 3590063 (std.process tem) received signal 9
PID 3590011 (process.exe) sent signal 9 to 3590065
PID 3590065 (std.process tem) received signal 9
And this is what happens with the new init in perl:
PID 3589274 (process.exe) sent signal 2 to 3589291
PID 3589291 (sleep) received signal 9
PID 3589274 (process.exe) sent signal 15 to 3589338
PID 3589338 (std.process tem) received signal 9
PID 3589274 (process.exe) sent signal 9 to 3589340
PID 3589340 (std.process tem) received signal 9
PID 3589274 (process.exe) sent signal 15 to 3589341
PID 3589274 (process.exe) sent signal 15 to 3589323
PID 3589274 (process.exe) sent signal 15 to 3589320
PID 3589274 (process.exe) sent signal 15 to 3589274
PID 3589274 (process.exe) received signal 9
PID 3589341 (sleep) received signal 9
PID 3589273 (sbuild-usernsex) sent signal 9 to 3589320
PID 3589273 (sbuild-usernsex) sent signal 9 to 3589323
There are a few additional SIGTERMs being sent when using the perl init; that's helpful. At this point we are fairly convinced that process.exe is worth additional inspection. The source code of process.d shows something interesting:
1221 @system unittest
1222 {
[...]
1247     auto pid = spawnProcess(["sleep", "10000"],
[...]
1260     // kill the spawned process with SIGINT
1261     // and send its return code
1262     spawn((shared Pid pid) {
1263         auto p = cast() pid;
1264         kill(p, SIGINT);
So yes, there's our sleep and the SIGINT (signal 2) right in the unit tests of process.d, just like we have observed in the bpftrace output.
Can we study the behavior of process.exe in isolation, separately from the build? Indeed we can. Let's take the executable from a failed build, and try running it under /usr/libexec/sbuild-usernsexec.
First, we prepare a chroot inside a suitable user namespace:
unshare --map-auto --setuid 0 --setgid 0 mkdir /tmp/rootfs
cd /tmp/rootfs
cat /home/ema/.cache/sbuild/unstable-arm64.tar | unshare --map-auto --setuid 0 --setgid 0 tar xf -
unshare --map-auto --setuid 0 --setgid 0 mkdir /tmp/rootfs/whatever
unshare --map-auto --setuid 0 --setgid 0 cp process.exe /tmp/rootfs/
Now we can run process.exe on its own using the perl init, and trace signals at will:
/usr/libexec/sbuild-usernsexec --pivotroot --nonet u:0:100000:65536  g:0:100000:65536 /tmp/rootfs ema /whatever -- /process.exe
We can compare the behavior of the perl init vis-a-vis the one using dumb-init in milliseconds instead of minutes.

Stage 5: Oh, I see.
Why process.exe sends more SIGTERMs when using the perl init is now the big question. We have a simple reproducer, so this is where using strace becomes possible.
sudo strace --user ema --follow-forks -o sbuild-dumb-init.strace ./sbuild-usernsexec-dumb-init --pivotroot --nonet u:0:100000:65536  g:0:100000:65536 /tmp/dumbroot ema /whatever -- /process.exe
We start comparing the strace output of dumb-init with that of perl-init, looking in particular for different calls to kill.
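One way to do that comparison, assuming the perl-init run was traced into a file named sbuild-perl-init.strace (the filename is an assumption):

# Pull out only the kill() syscalls from each trace and diff them.
grep 'kill(' sbuild-dumb-init.strace > kills-dumb-init.txt
grep 'kill(' sbuild-perl-init.strace > kills-perl-init.txt
diff -u kills-dumb-init.txt kills-perl-init.txt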
Here is what process.exe does under dumb-init:
3593883 kill(-2, SIGTERM)               = -1 ESRCH (No such process)
No such process. Under perl-init instead:
3593777 kill(-2, SIGTERM <unfinished ...>
The process is there under perl-init!
That is a kill with negative pid. From the kill(2) man page:
If pid is less than -1, then sig is sent to every process in the process group whose ID is -pid.
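The same semantics can be seen from a shell. Assuming a process group with ID 2 exists, this really does signal every process in it, so handle with care:

kill -TERM -- -2   # "--" stops option parsing, so -2 is read as "process group 2"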
It would have been very useful to see this kill with a negative pid in the output of bpftrace; why didn't we? The tracepoint used, tracepoint:signal:signal_generate, shows when signals are actually being sent, not the syscall being called. To confirm, one can trace tracepoint:syscalls:sys_enter_kill and see the negative PIDs, for example:
PID 312719 (bash) sent signal 2 to -312728
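The probe producing that kind of output is a one-liner in the same spirit as signals.bt:

tracepoint:syscalls:sys_enter_kill {
    printf("PID %d (%s) sent signal %d to %d\n", pid, comm, args->sig, args->pid);
}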
The obvious question at this point is: why is there no process group 2 when using dumb-init?

Stage 6: How did that ever work?
We know that process.exe sends a SIGTERM to every process in the process group with ID 2. To find out what this process group may be, we spawn a shell with dumb-init and observe under /proc PIDs 1, 16, and 17. With perl-init we have 1, 2, and 17. When running dumb-init, there are a few forks before launching the program, which explains the difference. Looking at /proc/2/cmdline we see that it's bash, i.e. the program we are running under perl-init. When building a package, that is dpkg-buildpackage itself.
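A quick way to reproduce that inspection inside each environment (just a sketch):

# List the PIDs visible in this PID namespace, then check what PID 2 is.
ls /proc | grep -E '^[0-9]+$' | sort -n
tr '\0' ' ' < /proc/2/cmdline; echo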
The test is accidentally killing its own process group.
Now where does this -2 come from in the test?
2363     // Special values for _processID.
2364     enum invalid = -1, terminated = -2;
Oh. -2 is used as a special value for PID, meaning "terminated". And there's a call to kill() later on:
2694     do { s = tryWait(pid); } while (!s.terminated);
[...]
2697     assertThrown!ProcessException(kill(pid));
What sets pid to terminated, you ask?
Here is tryWait:
2568 auto tryWait(Pid pid) @safe
2569 {
2570     import std.typecons : Tuple;
2571     assert(pid !is null, "Called tryWait on a null Pid.");
2572     auto code = pid.performWait(false);
And performWait:
2306         _processID = terminated;
So once the tryWait loop finishes, _processID is -2, and the later kill(pid) becomes kill(-2, SIGTERM): a SIGTERM to every process in process group 2, exactly what we saw in the strace output. The solution, dear reader, is not to kill.
PS: the bug report with spoilers for those interested is #1089007.

12 December 2024

Matthew Garrett: When should we require that firmware be free?

The distinction between hardware and software has historically been relatively easy to understand - hardware is the physical object that software runs on. This is made more complicated by the existence of programmable logic like FPGAs, but by and large things tend to fall into fairly neat categories if we're drawing that distinction.

Conversations usually become more complicated when we introduce firmware, but should they? According to Wikipedia, firmware is "software that provides low-level control of computing device hardware", and basically anything that's generally described as firmware certainly fits into the "software" side of the above hardware/software binary. From a software freedom perspective, this seems like something where the obvious answer to "Should this be free" is "yes", but it's worth thinking about why the answer is yes - the goal of free software isn't freedom for freedom's sake, but because the freedoms embodied in the Free Software Definition (and by proxy the DFSG) are grounded in real world practicalities.

How do these line up for firmware? Firmware can fit into two main classes - it can be something that's responsible for initialisation of the hardware (such as, historically, BIOS, which is involved in initialisation and boot and then largely irrelevant for runtime[1]) or it can be something that makes the hardware work at runtime (wifi card firmware being an obvious example). The role of free software in the latter case feels fairly intuitive, since the interface and functionality the hardware offers to the operating system is frequently largely defined by the firmware running on it. Your wifi chipset is, these days, largely a software defined radio, and what you can do with it is determined by what the firmware it's running allows you to do. Sometimes those restrictions may be required by law, but other times they're simply because the people writing the firmware aren't interested in supporting a feature - they may see no reason to allow raw radio packets to be provided to the OS, for instance. We also shouldn't ignore the fact that sufficiently complicated firmware exposed to untrusted input (as is the case in most wifi scenarios) may contain exploitable vulnerabilities allowing attackers to gain arbitrary code execution on the wifi chipset - and potentially use that as a way to gain control of the host OS (see this writeup for an example). Vendors being in a unique position to update that firmware means users may never receive security updates, leaving them with a choice between discarding hardware that otherwise works perfectly or leaving themselves vulnerable to known security issues.

But even the cases where firmware does nothing other than initialise the hardware cause problems. A lot of hardware has functionality controlled by registers that can be locked during the boot process. Vendor firmware may choose to disable (or, rather, never to enable) functionality that may be beneficial to a user, and then lock out the ability to reconfigure the hardware later. Without any ability to modify that firmware, the user lacks the freedom to choose what functionality their hardware makes available to them. Again, the ability to inspect this firmware and modify it has a distinct benefit to the user.

So, from a practical perspective, I think there's a strong argument that users would benefit from most (if not all) firmware being free software, and I don't think that's an especially controversial argument. So I think this is less of a philosophical discussion, and more of a strategic one - is spending time focused on ensuring firmware is free worthwhile, and if so what's an appropriate way of achieving this?

I think there are two consistent ways to view this. One is to view free firmware as desirable but not necessary. This approach basically argues that code that's running on hardware that isn't the main CPU would benefit from being free, in the same way that code running on a remote network service would benefit from being free, but that this is much less important than ensuring that all the code running in the context of the OS on the primary CPU is free. The other, maximalist, position is not to compromise at all - all software on a system, whether it's running at boot or during runtime, and whether it's running on the primary CPU or any other component on the board, should be free.

Personally, I lean towards the former and think there's a reasonably coherent argument here. I think users would benefit from the ability to modify the code running on hardware that their OS talks to, in the same way that I think users would benefit from the ability to modify the code running on hardware the other side of a network link that their browser talks to. I also think that there's enough that remains to be done in terms of what's running on the host CPU that it's not worth having that fight yet. But I think the latter is absolutely intellectually consistent, and while I don't agree with it from a pragmatic perspective I think things would undeniably be better if we lived in that world.

This feels like a thing you'd expect the Free Software Foundation to have opinions on, and it does! There are two primarily relevant things - the Respects your Freedoms campaign focused on ensuring that certified hardware meets certain requirements (including around firmware), and the Free System Distribution Guidelines, which define a baseline for an OS to be considered free by the FSF (including requirements around firmware).

RYF requires that all software on a piece of hardware be free other than under one specific set of circumstances. If software runs on (a) a secondary processor and (b) within which software installation is not intended after the user obtains the product, then the software does not need to be free. (b) effectively means that the firmware has to be in ROM, since any runtime interface that allows the firmware to be loaded or updated is intended to allow software installation after the user obtains the product.

The Free System Distribution Guidelines require that all non-free firmware be removed from the OS before it can be considered free. The recommended mechanism to achieve this is via linux-libre, a project that produces tooling to remove anything that looks plausibly like a non-free firmware blob from the Linux source code, along with any incitement to the user to load firmware - including even removing suggestions to update CPU microcode in order to mitigate CPU vulnerabilities.

For hardware that requires non-free firmware to be loaded at runtime in order to work, linux-libre doesn't do anything to work around this - the hardware will simply not work. In this respect, linux-libre reduces the amount of non-free firmware running on a system in the same way that removing the hardware would. This presumably encourages users to purchase RYF compliant hardware.

But does that actually improve things? RYF doesn't require that a piece of hardware have no non-free firmware, it simply requires that any non-free firmware be hidden from the user. CPU microcode is an instructive example here. At the time of writing, every laptop listed here has an Intel CPU. Every Intel CPU has microcode in ROM, typically an early revision that is known to have many bugs. The expectation is that this microcode is updated in the field by either the firmware or the OS at boot time - the updated version is loaded into RAM on the CPU, and vanishes if power is cut. The combination of RYF and linux-libre doesn't reduce the amount of non-free code running inside the CPU, it just means that the user (a) is more likely to hit since-fixed bugs (including security ones!), and (b) has less guidance on how to avoid them.

As long as RYF permits hardware that makes use of non-free firmware I think it hurts more than it helps. In many cases users aren't guided away from non-free firmware - instead it's hidden away from them, leaving them less aware that their freedom is constrained. Linux-libre goes further, refusing to even inform the user that the non-free firmware that their hardware depends on can be upgraded to improve their security.

Out of sight shouldn't mean out of mind. If non-free firmware is a threat to user freedom then allowing it to exist in ROM doesn't do anything to solve that problem. And if it isn't a threat to user freedom, then what's the point of requiring linux-libre for a Linux distribution to be considered free by the FSF? We seem to have ended up in the worst case scenario, where nothing is being done to actually replace any of the non-free firmware running on people's systems and where users may even end up with a reduced awareness that the non-free firmware even exists.

[1] Yes yes SMM

