Search Results: "agi"

1 May 2024

Guido G nther: Free Software Activities April 2024

A short status update of what happened on my side last month. Maintenance and code review keep to be the top time sinks (in a positive way). If you want to support my work see donations.

Russ Allbery: Review: To Each This World

Review: To Each This World, by Julie E. Czerneda
Publisher: DAW
Copyright: November 2022
ISBN: 0-7564-1543-8
Format: Kindle
Pages: 676
To Each This World is a standalone science fiction novel. Henry m'Yama t'Nowak is the Arbiter of New Earth. This is somewhat akin to a president, but only in very specific ways. Henry's job is to deal with the Kmet. New Earth was settled by a slower-than-light colony ship from old Earth, our Earth. It is, so far as they know, the last of humanity in the universe. Origin Earth fell silent hundreds of years previous, before the colonists even landed. New Earth is now a carefully and thoughtfully managed world where humans survived, thrived, and at one point sent out six slower-than-light colony ships of its own. All were feared lost after a rushed launch due to a solar storm. As this story opens, a probe from one of those ships arrives. This is cause for rejoicing, but there are two small problems. The first is that the culture of New Earth has changed drastically since the days when they launched the Halcyon colony ships. New Earth is now part of the Duality, a new alliance with aliens painstakingly negotiated after their portal appeared in orbit. The Kmet were peaceful, eager to form an alliance and offer new technology, although they struggled with concepts such as individuality and insisted on interacting only with the Arbiter. Their technological gifts and the apparent loss of the Halcyon colony ships refocused New Earth on safety and caution. This unexpected message is a somewhat tricky political problem, a reminder of the path not taken. The other small problem is that the reaction of the Kmet to this message is... dramatic. This book has several problems, but the most serious is that it is simply too long. If you have read any other Czerneda novels, you know that she tends towards sprawling world-building, but usually there are enough twists and turns in the plot to keep the story moving while the protagonists slowly puzzle out the scientific mysteries. To Each This World is not sufficiently twisty for 676 pages. I think you could have cut half the novel without losing any major plot points. The interesting parts of this book, to me, were figuring out what's going on with the Kmet, some of the political tensions within the New Earth government, and understanding what Henry and Pilot Killian's story had to do with the apparently-unrelated but intriguing interludes following Beth Seeker in a strange place called Doublet. All that stuff is in here, but it's alongside a whole lot of Henry wrestling with lifeboat ethics in situations where he thinks he needs to lie to and manipulate people for their own good. We also get several extended tours of societies that, while vaguely interesting in a science fiction world-building way, have essentially nothing to do with the plot. We also get a whole lot of Henry's eagerly helpful AI polymorph Flip. I wanted to like this character, and I occasionally managed, but I felt like there was a constant mismatch between, in hindsight, how Czerneda meant for me to see Flip and what I thought she was signaling while I was reading. I wanted Flip to either be a fascinatingly weird companion or to be directly relevant to the plot, but instead there were hundreds of pages of unnerving creepiness mixed with obsequiousness and emotional neediness, all of which I think I read more into than Czerneda had intended. The overall experience was more exhausting than fun. The core of the plot is solid, and if you like SF novels built around world-building and scientific mysteries, there's a lot here to enjoy. I think Czerneda's Species Imperative series (starting with Survival) is a better execution of some of the same ideas, but I liked that series a lot and was willing to read another take on it. Czerneda is one of the SF writers who takes biology seriously and is willing to write very alien aliens, and that leads to a few satisfying twists. Also, Beth Seeker is a great character (I wish we'd seen more of her), and Killian, while a bit generic, is a serviceable protagonist when Czerneda needs someone to go poke things with a stick. Henry... I'm not sure what I think of Henry, and your enjoyment of this book may depend on how much you click with him. Henry is a diplomat and an extrovert. His greatest joy and talent is talking to people, navigating political situations, and negotiating. Science fiction is full of protagonists who should be this character, but they rarely are this character, probably because a lot of writers are introverts. I think Czerneda deserves real credit for making her charismatic politician sufficiently accurate that his thought processes occasionally felt alien. For me, Henry was easiest to appreciate when Killian was the viewpoint protagonist and I could look at him through someone else's eyes, but Henry's viewpoint mostly worked as well. There's a lot of competence porn enjoyment in watching him do his thing. The problem for me is that I thought several of his actions were unforgivably unethical, but no one in the book who matters seems to agree. I can see why he reached those unethical decisions, but they were profound violations of consent. He directly lies to people because he thinks telling the truth would be too risky and not get them to do what he wants them to do, and Czerneda sets up the story to imply that he might be right. This is not necessarily a bad choice in a novel, but the author has to do some work to bring me along, and Czerneda didn't do enough of that work. I kept wanting there to be some twist or sting or complication that forced Henry to come to terms with what he was doing, but it never happens. He has to pick between two moral principles that I consider rather finely balanced, if not tilted in the opposite direction that he does, and he treats one principle as inviolable and the other as mostly unimportant. The plans he makes on that basis work fine, and those on the other side of that decision are never heard from again. It left a bad taste in my mouth, particularly given how much of the book is built around Henry making tough, tricky decisions under pressure. I don't know about this book. I have a lot of mixed feelings. Parts of it I quite enjoyed. Parts of it I mostly enjoyed but wish were much less dragged out. Parts of it frustrated or bored me. It's one of those books where the more I thought about it after reading it, the more the parts I disliked annoyed me. If you like Czerneda's style of world-building and biology, and if you have more tolerance for Henry's decisions than I did, you may well like this, but read Species Imperative first. I should probably also warn that there is a lot of magical technology in this book that blatantly violates some core principles of physics. I have a high tolerance for that sort of thing, but if you don't, you're going to be grumbling. Rating: 6 out of 10

Matthew Palmer: The Mediocre Programmer's Guide to Rust

Me: Hi everyone, my name s Matt, and I m a mediocre programmer. Everyone: Hi, Matt. Facilitator: Are you an alcoholic, Matt? Me: No, not since I stopped reading Twitter. Facilitator: Then I think you re in the wrong room.
Yep, that s my little secret I m a mediocre programmer. The definition of the word hacker I most closely align with is someone who makes furniture with an axe . I write simple, straightforward code because trying to understand complexity makes my head hurt. Which is why I ve always avoided the more academic languages, like OCaml, Haskell, Clojure, and so on. I know they re good languages people far smarter than me are building amazing things with them but the time I hear the word endofunctor , I ve lost all focus (and most of my will to live). My preferred languages are the ones that come with less intellectual overhead, like C, PHP, Python, and Ruby. So it s interesting that I ve embraced Rust with significant vigour. It s by far the most complicated language that I feel at least vaguely comfortable with using in anger . Part of that is that I ve managed to assemble a set of principles that allow me to almost completely avoid arguing with Rust s dreaded borrow checker, lifetimes, and all the rest of the dark, scary corners of the language. It s also, I think, that Rust helps me to write better software, and I can feel it helping me (almost) all of the time. In the spirit of helping my fellow mediocre programmers to embrace Rust, I present the principles I ve assembled so far.

Neither a Borrower Nor a Lender Be If you know anything about Rust, you probably know about the dreaded borrow checker . It s the thing that makes sure you don t have two pieces of code trying to modify the same data at the same time, or using a value when it s no longer valid. While Rust s borrowing semantics allow excellent performance without compromising safety, for us mediocre programmers it gets very complicated, very quickly. So, the moment the compiler wants to start talking about explicit lifetimes , I shut it up by just using owned values instead. It s not that I never borrow anything; I have some situations that I know are borrow-safe for the mediocre programmer (I ll cover those later). But any time I m not sure how things will pan out, I ll go straight for an owned value. For example, if I need to store some text in a struct or enum, it s going straight into a String. I m not going to start thinking about lifetimes and &'a str; I ll leave that for smarter people. Similarly, if I need a list of things, it s a Vec<T> every time no &'b [T] in my structs, thank you very much.

Attack of the Clones Following on from the above, I ve come to not be afraid of .clone(). I scatter them around my code like seeds in a field. Life s too short to spend time trying to figure out who s borrowing what from whom, if I can just give everyone their own thing. There are warnings in the Rust book (and everywhere else) about how a clone can be expensive . While it s true that, yes, making clones of data structures consumes CPU cycles and memory, it very rarely matters. CPU cycles are (usually) plentiful and RAM (usually) relatively cheap. Mediocre programmer mental effort is expensive, and not to be spent on premature optimisation. Also, if you re coming from most any other modern language, Rust is already giving you so much more performance that you re probably ending up ahead of the game, even if you .clone() everything in sight. If, by some miracle, something I write gets so popular that the expense of all those spurious clones becomes a problem, it might make sense to pay someone much smarter than I to figure out how to make the program a zero-copy masterpiece of efficient code. Until then clone early and clone often, I say!

Derive Macros are Powerful Magicks If you start .clone()ing everywhere, pretty quickly you ll be hit with this error:

error[E0599]: no method named  clone  found for struct  Foo  in the current scope

This is because not everything can be cloned, and so if you want your thing to be cloned, you need to implement the method yourself. Well sort of. One of the things that I find absolutely outstanding about Rust is the derive macro . These allow you to put a little marker on a struct or enum, and the compiler will write a bunch of code for you! Clone is one of the available so-called derivable traits , so you add #[derive(Clone)] to your structs, and poof! you can .clone() to your heart s content. But there are other things that are commonly useful, and so I ve got a set of traits that basically all of my data structures derive:

#[derive(Clone, Debug, Default)]
struct Foo  
    // ...
 

Every time I write a struct or enum definition, that line #[derive(Clone, Debug, Default)] goes at the top. The Debug trait allows you to print a debug representation of the data structure, either with the dbg!() macro, or via the :? format in the format!() macro (and anywhere else that takes a format string). Being able to say what exactly is that? comes in handy so often, not having a Debug implementation is like programming with one arm tied behind your Aeron. Meanwhile, the Default trait lets you create an empty instance of your data structure, with all of the fields set to their own default values. This only works if all the fields themselves implement Default, but a lot of standard types do, so it s rare that you ll define a structure that can t have an auto-derived Default. Enums are easily handled too, you just mark one variant as the default:

#[derive(Clone, Debug, Default)]
enum Bar  
    Something(String),
    SomethingElse(i32),
    #[default]   // <== mischief managed
    Nothing,
 

Borrowing is OK, Sometimes While I previously said that I like and usually use owned values, there are a few situations where I know I can borrow without angering the borrow checker gods, and so I m comfortable doing it. The first is when I need to pass a value into a function that only needs to take a little look at the value to decide what to do. For example, if I want to know whether any values in a Vec<u32> are even, I could pass in a Vec, like this:

fn main()  
    let numbers = vec![0u32, 1, 2, 3, 4, 5];
    if has_evens(numbers)  
        println!("EVENS!");
     
 
fn has_evens(numbers: Vec<u32>) -> bool  
    numbers.iter().any( n  n % 2 == 0)
 

Howver, this gets ugly if I m going to use numbers later, like this:

fn main()  
    let numbers = vec![0u32, 1, 2, 3, 4, 5];
    if has_evens(numbers)  
        println!("EVENS!");
     
    // Compiler complains about "value borrowed here after move"
    println!("Sum:  ", numbers.iter().sum::<u32>());
 
fn has_evens(numbers: Vec<u32>) -> bool  
    numbers.iter().any( n  n % 2 == 0)
 

Helpfully, the compiler will suggest I use my old standby, .clone(), to fix this problem. But I know that the borrow checker won t have a problem with lending that Vec<u32> into has_evens() as a borrowed slice, &[u32], like this:

fn main()  
    let numbers = vec![0u32, 1, 2, 3, 4, 5];
    if has_evens(&numbers)  
        println!("EVENS!");
     
 
fn has_evens(numbers: &[u32]) -> bool  
    numbers.iter().any( n  n % 2 == 0)
 

The general rule I ve got is that if I can take advantage of lifetime elision (a fancy term meaning the compiler can figure it out ), I m probably OK. In less fancy terms, as long as the compiler doesn t tell me to put 'a anywhere, I m in the green. On the other hand, the moment the compiler starts using the words explicit lifetime , I nope the heck out of there and start cloning everything in sight. Another example of using lifetime elision is when I m returning the value of a field from a struct or enum. In that case, I can usually get away with returning a borrowed value, knowing that the caller will probably just be taking a peek at that value, and throwing it away before the struct itself goes out of scope. For example:

struct Foo  
    id: u32,
    desc: String,
 
impl Foo  
    fn description(&self) -> &str  
        &self.desc
     
 

Returning a reference from a function is practically always a mortal sin for mediocre programmers, but returning one from a struct method is often OK. In the rare case that the caller does want the reference I return to live for longer, they can always turn it into an owned value themselves, by calling .to_owned().

Avoid the String Tangle Rust has a couple of different types for representing strings String and &str being the ones you see most often. There are good reasons for this, however it complicates method signatures when you just want to take some sort of bunch of text , and don t care so much about the messy details. For example, let s say we have a function that wants to see if the length of the string is even. Using the logic that since we re just taking a peek at the value passed in, our function might take a string reference, &str, like this:

fn is_even_length(s: &str) -> bool  
    s.len() % 2 == 0
 

That seems to work fine, until someone wants to check a formatted string:

fn main()  
    // The compiler complains about "expected  &str , found  String "
    if is_even_length(format!("my string is  ", std::env::args().next().unwrap()))  
        println!("Even length string");
     
 

Since format! returns an owned string, String, rather than a string reference, &str, we ve got a problem. Of course, it s straightforward to turn the String from format!() into a &str (just prefix it with an &). But as mediocre programmers, we can t be expected to remember which sort of string all our functions take and add & wherever it s needed, and having to fix everything when the compiler complains is tedious. The converse can also happen: a method that wants an owned String, and we ve got a &str (say, because we re passing in a string literal, like "Hello, world!"). In this case, we need to use one of the plethora of available turn this into a String mechanisms (.to_string(), .to_owned(), String::from(), and probably a few others I ve forgotten), on the value before we pass it in, which gets ugly real fast. For these reasons, I never take a String or an &str as an argument. Instead, I use the Power of Traits to let callers pass in anything that is, or can be turned into, a string. Let us have some examples. First off, if I would normally use &str as the type, I instead use impl AsRef<str>:

fn is_even_length(s: impl AsRef<str>) -> bool  
    s.as_ref().len() % 2 == 0
 

Note that I had to throw in an extra as_ref() call in there, but now I can call this with either a String or a &str and get an answer. Now, if I want to be given a String (presumably because I plan on taking ownership of the value, say because I m creating a new instance of a struct with it), I use impl Into<String> as my type:

struct Foo  
    id: u32,
    desc: String,
 
impl Foo  
    fn new(id: u32, desc: impl Into<String>) -> Self  
        Self   id, desc: desc.into()  
     
 

We have to call .into() on our desc argument, which makes the struct building a bit uglier, but I d argue that s a small price to pay for being able to call both Foo::new(1, "this is a thing") and Foo::new(2, format!("This is a thing named name ")) without caring what sort of string is involved.

Always Have an Error Enum Rust s error handing mechanism (Results everywhere), along with the quality-of-life sugar surrounding it (like the short-circuit operator, ?), is a delightfully ergonomic approach to error handling. To make life easy for mediocre programmers, I recommend starting every project with an Error enum, that derives thiserror::Error, and using that in every method and function that returns a Result. How you structure your Error type from there is less cut-and-dried, but typically I ll create a separate enum variant for each type of error I want to have a different description. With thiserror, it s easy to then attach those descriptions:

#[derive(Clone, Debug, thiserror::Error)]
enum Error  
    #[error(" 0  caught fire")]
    Combustion(String),
    #[error(" 0  exploded")]
    Explosion(String),
 

I also implement functions to create each error variant, because that allows me to do the Into<String> trick, and can sometimes come in handy when creating errors from other places with .map_err() (more on that later). For example, the impl for the above Error would probably be:

impl Error  
    fn combustion(desc: impl Into<String>) -> Self  
        Self::Combustion(desc.into())
     
    fn explosion(desc: impl Into<String>) -> Self  
        Self::Explosion(desc.into())
     
 

It s a tedious bit of boilerplate, and you can use the thiserror-ext crate s thiserror_ext::Construct derive macro to do the hard work for you, if you like. It, too, knows all about the Into<String> trick.

Banish map_err (well, mostly) The newer mediocre programmer, who is just dipping their toe in the water of Rust, might write file handling code that looks like this:

fn read_u32_from_file(name: impl AsRef<str>) -> Result<u32, Error>  
    let mut f = File::open(name.as_ref())
        .map_err( e  Error::FileOpenError(name.as_ref().to_string(), e))?;
    let mut buf = vec![0u8; 30];
    f.read(&mut buf)
        .map_err( e  Error::ReadError(e))?;
    String::from_utf8(buf)
        .map_err( e  Error::EncodingError(e))?
        .parse::<u32>()
        .map_err( e  Error::ParseError(e))
 

This works great (or it probably does, I haven t actually tested it), but there are a lot of .map_err() calls in there. They take up over half the function, in fact. With the power of the From trait and the magic of the ? operator, we can make this a lot tidier. First off, assume we ve written boilerplate error creation functions (or used thiserror_ext::Construct to do it for us)). That allows us to simplify the file handling portion of the function a bit:

fn read_u32_from_file(name: impl AsRef<str>) -> Result<u32, Error>  
    let mut f = File::open(name.as_ref())
        // We've dropped the  .to_string()  out of here...
        .map_err( e  Error::file_open_error(name.as_ref(), e))?;
    let mut buf = vec![0u8; 30];
    f.read(&mut buf)
        // ... and the explicit parameter passing out of here
        .map_err(Error::read_error)?;
    // ...

If that latter .map_err() call looks weird, without the e and such, it s passing a function-as-closure, which just saves on a few characters typing. Just because we re mediocre, doesn t mean we re not also lazy. Next, if we implement the From trait for the other two errors, we can make the string-handling lines significantly cleaner. First, the trait impl:

impl From<std::string::FromUtf8Error> for Error  
    fn from(e: std::string::FromUtf8Error) -> Self  
        Self::EncodingError(e)
     
 
impl From<std::num::ParseIntError> for Error  
    fn from(e: std::num::ParseIntError) -> Self  
        Self::ParseError(e)
     
 

(Again, this is boilerplate that can be autogenerated, this time by adding a #[from] tag to the variants you want a From impl on, and thiserror will take care of it for you) In any event, no matter how you get the From impls, once you have them, the string-handling code becomes practically error-handling-free:

    Ok(
        String::from_utf8(buf)?
            .parse::<u32>()?
    )

The ? operator will automatically convert the error from the types returned from each method into the return error type, using From. The only tiny downside to this is that the ? at the end strips the Result, and so we ve got to wrap the returned value in Ok() to turn it back into a Result for returning. But I think that s a small price to pay for the removal of those .map_err() calls. In many cases, my coding process involves just putting a ? after every call that returns a Result, and adding a new Error variant whenever the compiler complains about not being able to convert some new error type. It s practically zero effort outstanding outcome for the mediocre programmer.

Just Because You re Mediocre, Doesn t Mean You Can t Get Better To finish off, I d like to point out that mediocrity doesn t imply shoddy work, nor does it mean that you shouldn t keep learning and improving your craft. One book that I ve recently found extremely helpful is Effective Rust, by David Drysdale. The author has very kindly put it up to read online, but buying a (paper or ebook) copy would no doubt be appreciated. The thing about this book, for me, is that it is very readable, even by us mediocre programmers. The sections are written in a way that really clicked with me. Some aspects of Rust that I d had trouble understanding for a long time such as lifetimes and the borrow checker, and particularly lifetime elision actually made sense after I d read the appropriate sections.

Finally, a Quick Beg I m currently subsisting on the kindness of strangers, so if you found something useful (or entertaining) in this post, why not buy me a refreshing beverage? It helps to know that people like what I m doing, and helps keep me from having to sell my soul to a private equity firm.

25 April 2024

Petter Reinholdtsen: 45 orphaned Debian packages moved to git, 391 to go

Nine days ago, I started migrating orphaned Debian packages with no version control system listed in debian/control of the source to git. At the time there were 438 such packages. Now there are 391, according to the UDD. In reality it is slightly less, as there is a delay between uploads and UDD updates. In the nine days since, I have thus been able to work my way through ten percent of the packages. I am starting to run out of steam, and hope someone else will also help brushing some dust of these packages. Here is a recipe how to do it. I start by picking a random package by querying the UDD for a list of 10 random packages from the set of remaining packages:
PGPASSWORD="udd-mirror" psql --port=5432 --host=udd-mirror.debian.net \
  --username=udd-mirror udd -c "select source from sources \
   where release = 'sid' and (vcs_url ilike '%anonscm.debian.org%' \
   OR vcs_browser ilike '%anonscm.debian.org%' or vcs_url IS NULL \
   OR vcs_browser IS NULL) AND maintainer ilike '%packages@qa.debian.org%' \
   order by random() limit 10;"
Next, I visit http://salsa.debian.org/debian and search for the package name, to ensure no git repository already exist. If it does, I clone it and try to get it to an uploadable state, and add the Vcs-* entries in d/control to make the repository more widely known. These packages are a minority, so I will not cover that use case here. For packages without an existing git repository, I run the following script debian-snap-to-salsa to prepare a git repository with the existing packaging.
#!/bin/sh
#
# See also https://bugs.debian.org/804722#31
set -e
# Move to this Standards-Version.
SV_LATEST=4.7.0
PKG="$1"
if [ -z "$PKG" ]; then
    echo "usage: $0 "
    exit 1
fi
if [ -e "$ PKG -salsa" ]; then
    echo "error: $ PKG -salsa already exist, aborting."
    exit 1
fi
if [ -z "ALLOWFAILURE" ] ; then
    ALLOWFAILURE=false
fi
# Fetch every snapshotted source package.  Manually loop until all
# transfers succeed, as 'gbp import-dscs --debsnap' do not fail on
# download failures.
until debsnap --force -v $PKG   $ALLOWFAILURE ; do sleep 1; done
mkdir $ PKG -salsa; cd $ PKG -salsa
git init
# Specify branches to override any debian/gbp.conf file present in the
# source package.
gbp import-dscs  --debian-branch=master --upstream-branch=upstream \
    --pristine-tar ../source-$PKG/*.dsc
# Add Vcs pointing to Salsa Debian project (must be manually created
# and pushed to).
if ! grep -q ^Vcs- debian/control ; then
    awk "BEGIN   s=1   /^\$/   if (s==1)   print \"Vcs-Browser: https://salsa.debian.org/debian/$PKG\"; print \"Vcs-Git: https://salsa.debian.org/debian/$PKG.git\"  ; s=0     print  " < debian/control > debian/control.new && mv debian/control.new debian/control
    git commit -m "Updated vcs in d/control to Salsa." debian/control
fi
# Tell gbp to enforce the use of pristine-tar.
inifile +inifile debian/gbp.conf +create +section DEFAULT +key pristine-tar +value True
git add debian/gbp.conf
git commit -m "Added d/gbp.conf to enforce the use of pristine-tar." debian/gbp.conf
# Update to latest Standards-Version.
SV="$(grep ^Standards-Version: debian/control awk ' print $2 ')"
if [ $SV_LATEST != $SV ]; then
    sed -i "s/\(Standards-Version: \)\(.*\)/\1$SV_LATEST/" debian/control
    git commit -m "Updated Standards-Version from $SV to $SV_LATEST." debian/control
fi
if grep -q pkg-config debian/control; then
    sed -i s/pkg-config/pkgconf/ debian/control
    git commit -m "Replaced obsolete pkg-config build dependency with pkgconf." debian/control
fi
if grep -q libncurses5-dev debian/control; then
    sed -i s/libncurses5-dev/libncurses-dev/ debian/control
    git commit -m "Replaced obsolete libncurses5-dev build dependency with libncurses-dev." debian/control
fi
Some times the debsnap script fail to download some of the versions. In those cases I investigate, and if I decide the failing versions will not be missed, I call it using ALLOWFAILURE=true to ignore the problem and create the git repository anyway. With the git repository in place, I do a test build (gbp buildpackage) to ensure the build is actually working. If it does not I pick a different package, or if the build failure is trivial to fix, I fix it before continuing. At this stage I revisit http://salsa.debian.org/debian and create the project under this group for the package. I then follow the instructions to publish the local git repository. Here is from a recent example:
git remote add origin git@salsa.debian.org:debian/perl-byacc.git
git push --set-upstream origin master upstream pristine-tar
git push --tags
With a working build, I have a look at the build rules if I want to remove some more dust. I normally try to move to debhelper compat level 13, which involves removing debian/compat and modifying debian/control to build depend on debhelper-compat (=13). I also test with 'Rules-Requires-Root: no' in debian/control and verify in debian/rules that hardening is enabled, and include all of these if the package still build. If it fail to build with level 13, I try with 12, 11, 10 and so on until I find a level where it build, as I do not want to spend a lot of time fixing build issues. Some times, when I feel inspired, I make sure debian/copyright is converted to the machine readable format, often by starting with 'debhelper -cc' and then cleaning up the autogenerated content until it matches realities. If I feel like it, I might also clean up non-dh-based debian/rules files to use the short style dh build rules. Once I have removed all the dust I care to process for the package, I run 'gbp dch' to generate a debian/changelog entry based on the commits done so far, run 'dch -r' to switch from 'UNRELEASED' to 'unstable' and get an editor to make sure the 'QA upload' marker is in place and that all long commit descriptions are wrapped into sensible lengths, run 'debcommit --release -a' to commit and tag the new debian/changelog entry, run 'debuild -S' to build a source only package, and 'dput ../perl-byacc_2.0-10_source.changes' to do the upload. During the entire process, and many times per step, I run 'debuild' to verify the changes done still work. I also some times verify the set of built files using 'find debian' to see if I can spot any problems (like no file in usr/bin any more or empty package). I also try to fix all lintian issues reported at the end of each 'debuild' run. If I find Debian specific patches, I try to ensure their metadata is fairly up to date and some times I even try to reach out to upstream, to make the upstream project aware of the patches. Most of my emails bounce, so the success rate is low. For projects with no Homepage entry in debian/control I try to track down one, and for packages with no debian/watch file I try to create one. But at least for some of the packages I have been unable to find a functioning upstream, and must skip both of these. If I could handle ten percent in nine days, twenty people could complete the rest in less then five days. I use approximately twenty minutes per package, when I have twenty minutes spare time to spend. Perhaps you got twenty minutes to spare too? As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b. Update 2024-05-04: There is an updated edition of my migration script, last updated 2024-05-04.

Russ Allbery: Review: Nation

Review: Nation, by Terry Pratchett
Publisher: Harper
Copyright: 2008
Printing: 2009
ISBN: 0-06-143303-9
Format: Trade paperback
Pages: 369
Nation is a stand-alone young adult fantasy novel. It was published in the gap between Discworld novels Making Money and Unseen Academicals. Nation starts with a plague. The Russian influenza has ravaged Britain, including the royal family. The next in line to the throne is off on a remote island and must be retrieved and crowned as soon as possible, or an obscure provision in Magna Carta will cause no end of trouble. The Cutty Wren is sent on this mission, carrying the Gentlemen of Last Resort. Then comes the tsunami. In the midst of fire raining from the sky and a wave like no one has ever seen, Captain Roberts tied himself to the wheel of the Sweet Judy and steered it as best he could, straight into an island. The sole survivor of the shipwreck: one Ermintrude Fanshaw, daughter of the governor of some British island possessions. Oh, and a parrot. Mau was on the Boys' Island when the tsunami came, going through his rite of passage into manhood. He was to return to the Nation the next morning and receive his tattoos and his adult soul. He survived in a canoe. No one else in the Nation did. Terry Pratchett considered Nation to be his best book. It is not his best book, at least in my opinion; it's firmly below the top tier of Discworld novels, let alone Night Watch. It is, however, an interesting and enjoyable book that tackles gods and religion with a sledgehammer rather than a knife. It's also very, very dark and utterly depressing at the start, despite a few glimmers of Pratchett's humor. Mau is the main protagonist at first, and the book opens with everyone he cares about dying. This is the place where I thought Pratchett diverged the most from his Discworld style: in Discworld, I think most of that would have been off-screen, but here we follow Mau through the realization, the devastation, the disassociation, the burials at sea, the thoughts of suicide, and the complete upheaval of everything he thought he was or was about to become. I found the start of this book difficult to get through. The immediate transition into potentially tragic misunderstandings between Mau and Daphne (as Ermintrude names herself once there is no one to tell her not to) didn't help. As I got farther into the book, though, I warmed to it. The best parts early on are Daphne's baffled but scientific attempts to understand Mau's culture and her place in it. More survivors arrive, and they start to assemble a community, anchored in large part by Mau's stubborn determination to do what's right even though he's lost all of his moorings. That community eventually re-establishes contact with the rest of the world and the opening plot about the British monarchy, but not before Daphne has been changed profoundly by being part of it. I think Pratchett worked hard at keeping Mau's culture at the center of the story. It's notable that the community that reforms over the course of the book essentially follows the patterns of Mau's lost Nation and incorporates Daphne into it, rather than (as is so often the case) the other way around. The plot itself is fiercely anti-colonial in a way that mostly worked. Still, though, it's a quasi-Pacific-island culture written by a white British man, and I had some qualms. Pratchett quite rightfully makes it clear in the afterward that this is an alternate world and Mau's culture is not a real Pacific island culture. However, that also means that its starkly gender-essentialist nature was a free choice, rather than one based on some specific culture, and I found that choice somewhat off-putting. The religious rituals are all gendered, the dwelling places are gendered, and one's entire life course in Mau's world seems based on binary classification as a man or a woman. Based on Pratchett's other books, I assume this was more an unfortunate default than a deliberate choice, but it's still a choice he could have avoided. The end of this book wrestles directly with the relative worth of Mau's culture versus that of the British. I liked most of this, but the twists that Pratchett adds to avoid the colonialist results we saw in our world stumble partly into the trap of making Mau's culture valuable by British standards. (I'm being a bit vague here to avoid spoilers.) I think it is very hard to base this book on a different set of priorities and still bring the largely UK, US, and western European audience along, so I don't blame Pratchett for failing to do it, but I'm a bit sad that the world still revolved around a British axis. This felt quite similar to Discworld to me in its overall sensibilities, but with the roles of moral philosophy and humor reversed. Discworld novels usually start with some larger-than-life characters and an absurd plot, and then the moral philosophy sneaks up behind you when you're not looking and hits you over the head. Nation starts with the moral philosophy: Mau wrestles with his gods and the problem of evil in a way that reminded me of Job, except with a far different pantheon and rather less tolerance for divine excuses on the part of the protagonist. It's the humor, instead, that sneaks up on you and makes you laugh when the plot is a bit too much. But the mix arrives at much the same place: the absurd hand-in-hand with the profound, and all seen from an angle that makes it a bit easier to understand. I'm not sure I would recommend Nation as a good place to start with Pratchett. I felt like I benefited from having read a lot of Discworld to build up my willingness to trust where Pratchett was going. But it has the quality of writing of late Discworld without the (arguable) need to read 25 books to understand all of the backstory. Regardless, recommended, and you'll never hear Twinkle Twinkle Little Star in quite the same way again. Rating: 8 out of 10

22 April 2024

Vincent Fourmond: QSoas version 3.3 is out

Version 3.3 brings in new features, including reverse Laplace transforms and fits, pH fits, commands for picking points from a dataset, averaging points with the same X value, or perform singular value decomposition. In addition to these new features, many previous commands were improved, like the addition of a bandcut filter in FFT filtering, better handling of the loading of files produced by QSoas itself, and a button to interrupt the processing of scripts. There are a lot of other new features, improvements and so on, look for the full list there. About QSoas
QSoas is a powerful open source data analysis program that focuses on flexibility and powerful fitting capacities. It is released under the GNU General Public License. It is described in Fourmond, Anal. Chem., 2016, 88 (10), pp 5050 5052. Current version is 3.3. You can download for free its source code or precompiled versions for MacOS and Windows there. Alternatively, you can clone from the GitHub repository.

Russ Allbery: Review: The Stars, Like Dust

Review: The Stars, Like Dust, by Isaac Asimov
Series: Galactic Empire #2
Publisher: Fawcett Crest
Copyright: 1950, 1951
Printing: June 1972
Format: Mass market
Pages: 192
The Stars, Like Dust is usually listed as the first book in Asimov's lesser-known Galactic Empire Trilogy since it takes place before Pebble in the Sky. Pebble in the Sky was published first, though, so I count it as the second book. It is very early science fiction with a few mystery overtones. Buying books produces about 5% of the pleasure of reading them while taking much less than 5% of the time. There was a time in my life when I thoroughly enjoyed methodically working through a used book store, list in hand, tracking down cheap copies to fill in holes in series. This means that I own a lot of books that I thought at some point that I would want to read but never got around to, often because, at the time, I was feeling completionist about some series or piece of world-building. From time to time, I get the urge to try to read some of them. Sometimes this is a poor use of my time. The Galactic Empire series is from Asimov's first science fiction period, after the Foundation series but contemporaneous with their collection into novels. They're set long, long before Foundation, but after humans have inhabited numerous star systems and Earth has become something of a backwater. That process is just starting in The Stars, Like Dust: Earth is still somewhere where an upper-class son might be sent for an education, but it has been devastated by nuclear wars and is well on its way to becoming an inward-looking relic on the edge of galactic society. Biron Farrill is the son of the Lord Rancher of Widemos, a wealthy noble whose world is one of those conquered by the Tyranni. In many other SF novels, the Tyranni would be an alien race; here, it's a hierarchical and authoritarian human civilization. The book opens with Biron discovering a radiation bomb planted in his dorm room. Shortly after, he learns that his father had been arrested. One of his fellow students claims to be on Biron's side against the Tyranni and gives him false papers to travel to Rhodia, a wealthy world run by a Tyranni sycophant. Like most books of this era, The Stars, Like Dust is a short novel full of plot twists. Unlike some of its contemporaries, it's not devoid of characterization, but I might have liked it better if it were. Biron behaves like an obnoxious teenager when he's not being an arrogant ass. There is a female character who does a few plot-relevant things and at no point is sexually assaulted, so I'll give Asimov that much, but the gender stereotypes are ironclad and there is an entire subplot focused on what I can only describe as seduction via petty jealousy. The writing... well, let me quote a typical passage:
There was no way of telling when the threshold would be reached. Perhaps not for hours, and perhaps the next moment. Biron remained standing helplessly, flashlight held loosely in his damp hands. Half an hour before, the visiphone had awakened him, and he had been at peace then. Now he knew he was going to die. Biron didn't want to die, but he was penned in hopelessly, and there was no place to hide.
Needless to say, Biron doesn't die. Even if your tolerance for pulp melodrama is high, 192 small-print pages of this sort of thing is wearying. Like a lot of Asimov plots, The Stars, Like Dust has some of the shape of a mystery novel. Biron, with the aid of some newfound companions on Rhodia, learns of a secret rebellion against the Tyranni and attempts to track down its base to join them. There are false leads, disguised identities, clues that are difficult to interpret, and similar classic mystery trappings, all covered with a patina of early 1950s imaginary science. To me, it felt constructed and artificial in ways that made the strings Asimov was pulling obvious. I don't know if someone who likes mystery construction would feel differently about it. The worst part of the plot thankfully doesn't come up much. We learn early in the story that Biron was on Earth to search for a long-lost document believed to be vital to defeating the Tyranni. The nature of that document is revealed on the final page, so I won't spoil it, but if you try to think of the stupidest possible document someone could have built this plot around, I suspect you will only need one guess. (In Asimov's defense, he blamed Galaxy editor H.L. Gold for persuading him to include this plot, and disavowed it a few years later.) The Stars, Like Dust is one of the worst books I have ever read. The characters are overwrought, the politics are slapdash and build on broad stereotypes, the romantic subplot is dire and plays out mainly via Biron egregiously manipulating his petulant love interest, and the writing is annoying. Sometimes pulp fiction makes up for those common flaws through larger-than-life feats of daring, sweeping visions of future societies, and ever-escalating stakes. There is little to none of that here. Asimov instead provides tedious political maneuvering among a class of elitist bankers and land owners who consider themselves natural leaders. The only places where the power structures of this future government make sense are where Asimov blatantly steals them from either the Roman Empire or the Doge of Venice. The one thing this book has going for it the thing, apart from bloody-minded completionism, that kept me reading is that the technology is hilariously weird in that way that only 1940s and 1950s science fiction can be. The characters have access to communication via some sort of interstellar telepathy (messages coded to a specific person's "brain waves") and can travel between stars through hyperspace jumps, but each jump is manually calculated by referring to the pilot's (paper!) volumes of the Standard Galactic Ephemeris. Communication between ships (via "etheric radio") requires manually aiming a radio beam at the area in space where one thinks the other ship is. It's an unintentionally entertaining combination of technology that now looks absurdly primitive and science that is so advanced and hand-waved that it's obviously made up. I also have to give Asimov some points for using spherical coordinates. It's a small thing, but the coordinate systems in most SF novels and TV shows are obviously not fit for purpose. I spent about a month and a half of this year barely reading, and while some of that is because I finally tackled a few projects I'd been putting off for years, a lot of it was because of this book. It was only 192 pages, and I'm still curious about the glue between Asimov's Foundation and Robot series, both of which I devoured as a teenager. But every time I picked it up to finally finish it and start another book, I made it about ten pages and then couldn't take any more. Learn from my error: don't try this at home, or at least give up if the same thing starts happening to you. Followed by The Currents of Space. Rating: 2 out of 10

20 April 2024

Bastian Venthur: Help needed: creating a WSDL file to interact with debbugs

I am upstream and Debian package maintainer of python-debianbts, which is a Python library that allows for querying Debian s Bug Tracking System (BTS). python-debianbts is used by reportbug, the standard tool to report bugs in Debian, and therefore the glue between the reportbug and the BTS. debbugs, the software that powers Debian s BTS, provides a SOAP interface for querying the BTS. Unfortunately, SOAP is not a very popular protocol anymore, and I m facing the second migration to another underlying SOAP library as they continue to become unmaintained over time. Zeep, the library I m currently considering, requires a WSDL file in order to work with a SOAP service, however, debbugs does not provide one. Since I m not familiar with WSDL, I need help from someone who can create a WSDL file for debbugs, so I can migrate python-debianbts away from pysimplesoap to zeep. How did we get here? Back in the olden days, reportbug was querying the BTS by parsing its HTML output. While this worked, it tightly coupled the user-facing presentation of the BTS with critical functionality of the bug reporting tool. The setup was fragile, prone to breakage, and did not allow changing anything in the BTS frontend for fear of breaking reportbug itself. In 2007, I started to work on reportbug-ng, a user-friendly alternative to reportbug, targeted at users not comfortable using the command line. Early on, I decided to use the BTS SOAP interface instead of parsing HTML like reportbug did. 2008, I extracted the code that dealt with the BTS into a separate Python library, and after some collaboration with the reportbug maintainers, reportbug adopted python-debianbts in 2011 and has used it ever since. 2015, I was working on porting python-debianbts to Python 3. During that process, it turned out that its major dependency, SoapPy was pretty much unmaintained for years and blocking the Python3 transition. Thanks to the help of Gaetano Guerriero, who ported python-debianbts to pysimplesoap, the migration was unblocked and could proceed. In 2024, almost ten years later, pysimplesoap seems to be unmaintained as well, and I have to look again for alternatives. The most promising one right now seems to be zeep. Unfortunately, zeep requires a WSDL file for working with a SOAP service, which debbugs does not provide. How can you help? reportbug (and thus python-debianbts) is used by thousands of users and I have a certain responsibility to keep things working properly. Since I simply don t know enough about WSDL to create such a file for debbugs myself, I m looking for someone who can help me with this task. If you re familiar with SOAP, WSDL and optionally debbugs, please get in touch with me. I don t speak Pearl, so I m not really able to read debbugs code, but I do know some things about the SOAP requests and replies due to my work on python-debianbts, so I m sure we can work something out. There is a WSDL file for a debbugs version used by GNU, but I don t think it s official and it currently does not work with zeep. It may be a good starting point, though. The future of debbugs API While we can probably continue to support debbugs SOAP interface for a while, I don t think it s very sustainable in the long run. A simpler, well documented REST API that returns JSON seems more appropriate nowadays. The queries and replies that debbugs currently supports are simple enough to design a REST API with JSON around it. The benefit would be less complex libraries on the client side and probably easier maintainability on the server side as well. debbugs maintainer seemed to be in agreement with this idea back in 2018. I created an attempt to define a new API (HTML render), but somehow we got stuck and no progress has been made since then. I m still happy to help shaping such an API for debbugs, but I can t really implement anything in debbugs itself, as it is written in Perl, which I m not familiar with.

19 April 2024

Louis-Philippe V ronneau: Montreal's Debian & Stuff - March 2024

Time really flies when you are really busy you have fun! Our Montr al Debian User Group met on Sunday March 31st and I only just found the time to write our report :) This time around, 9 of us we met at EfficiOS's offices1 to chat, hang out and work on Debian and other stuff! Here is what we did: pollo: tvaz: tassia: viashimo: lavamind: justin: Pictures Here are pictures of the event. Well, one picture (thanks Tassia!) of the event itself and another one of the crisp Italian lager I drank at the bar after the event :) People at the event working around a long table A glass of beer illuminated by sunlight

  1. Maintainers, amongst other things, of the great LTTng.

18 April 2024

Russ Allbery: Review: Unseen Academicals

Review: Unseen Academicals, by Terry Pratchett
Series: Discworld #37
Publisher: Harper
Copyright: October 2009
Printing: November 2014
ISBN: 0-06-233500-6
Format: Mass market
Pages: 517
Unseen Academicals is the 37th Discworld novel and includes many of the long-standing Ankh-Morpork cast, but mostly as supporting characters. The main characters are a new (and delightful) bunch with their own concerns. You arguably could start reading here if you really wanted to, although you would risk spoiling several previous books (most notably Thud!) and will miss some references that depend on familiarity with the cast. The Unseen University is, like most institutions of its sort, funded by an endowment that allows the wizards to focus on the pure life of the mind (or the stomach). Much to their dismay, they have just discovered that an endowment that amounts to most of their food budget requires that they field a football team. Glenda runs the night kitchen at the Unseen University. Given the deep and abiding love that wizards have for food, there is both a main kitchen and a night kitchen. The main kitchen is more prestigious, but the night kitchen is responsible for making pies, something that Glenda is quietly but exceptionally good at. Juliet is Glenda's new employee. She is exceptionally beautiful, not very bright, and a working class girl of the Ankh-Morpork streets down to her bones. Trevor Likely is a candle dribbler, responsible for assisting the Candle Knave in refreshing the endless university candles and ensuring that their wax is properly dribbled, although he pushes most of that work off onto the infallibly polite and oddly intelligent Mr. Nutt. Glenda, Trev, and Juliet are the sort of people who populate the great city of Ankh-Morpork. While the people everyone has heard of have political crises, adventures, and book plots, they keep institutions like the Unseen University running. They read romance novels, go to the football games, and nurse long-standing rivalries. They do not expect the high mucky-mucks to enter their world, let alone mess with their game. I approached Unseen Academicals with trepidation because I normally don't get along as well with the Discworld wizard books. I need not have worried; Pratchett realized that the wizards would work better as supporting characters and instead turns the main plot (or at least most of it; more on that later) over to the servants. This was a brilliant decision. The setup of this book is some of the best of Discworld up to this point. Trev is a streetwise rogue with an uncanny knack for kicking around a can that he developed after being forbidden to play football by his dear old mum. He falls for Juliet even though their families support different football teams, so you may think that a Romeo and Juliet spoof is coming. There are a few gestures of one, but Pratchett deftly avoids the pitfalls and predictability and instead makes Juliet one of the best characters in the book by playing directly against type. She is one of the characters that Pratchett is so astonishingly good at, the ones that are so thoroughly themselves that they transcend the stories they're put into. The heart of this book, though, is Glenda.
Glenda enjoyed her job. She didn't have a career; they were for people who could not hold down jobs.
She is the kind of person who knows where she fits in the world and likes what she does and is happy to stay there until she decides something isn't right, and then she changes the world through the power of common sense morality, righteous indignation, and sheer stubborn persistence. Discworld is full of complex and subtle characters fencing with each other, but there are few things I have enjoyed more than Glenda being a determinedly good person. Vetinari of course recognizes and respects (and uses) that inner core immediately. Unfortunately, as great as the setup and characters are, Unseen Academicals falls apart a bit at the end. I was eagerly reading the story, wondering what Pratchett was going to weave out of the stories of these individuals, and then it partly turned into yet another wizard book. Pratchett pulled another of his deus ex machina tricks for the climax in a way that I found unsatisfying and contrary to the tone of the rest of the story, and while the characters do get reasonable endings, it lacked the oomph I was hoping for. Rincewind is as determinedly one-note as ever, the wizards do all the standard wizard things, and the plot just isn't that interesting. I liked Mr. Nutt a great deal in the first part of the book, and I wish he could have kept that edge of enigmatic competence and unflappableness. Pratchett wanted to tell a different story that involved more angst and self-doubt, and while I appreciate that story, I found it less engaging and a bit more melodramatic than I was hoping for. Mr. Nutt's reactions in the last half of the book were believable and fit his background, but that was part of the problem: he slotted back into an archetype that I thought Pratchett was going to twist and upend. Mr. Nutt does, at least, get a fantastic closing line, and as usual there are a lot of great asides and quotes along the way, including possibly the sharpest and most biting Vetinari speech of the entire series.
The Patrician took a sip of his beer. "I have told this to few people, gentlemen, and I suspect never will again, but one day when I was a young boy on holiday in Uberwald I was walking along the bank of a stream when I saw a mother otter with her cubs. A very endearing sight, I'm sure you will agree, and even as I watched, the mother otter dived into the water and came up with a plump salmon, which she subdued and dragged on to a half-submerged log. As she ate it, while of course it was still alive, the body split and I remember to this day the sweet pinkness of its roes as they spilled out, much to the delight of the baby otters who scrambled over themselves to feed on the delicacy. One of nature's wonders, gentlemen: mother and children dining on mother and children. And that's when I first learned about evil. It is built into the very nature of the universe. Every world spins in pain. If there is any kind of supreme being, I told myself, it is up to all of us to become his moral superior."
My dissatisfaction with the ending prevents Unseen Academicals from rising to the level of Night Watch, and it's a bit more uneven than the best books of the series. Still, though, this is great stuff; recommended to anyone who is reading the series. Followed in publication order by I Shall Wear Midnight. Rating: 8 out of 10

15 April 2024

Andreas R nnquist: Status update for Allegro packaging in Debian

I have mailed to a Debian bug on allegro4.4 describing my reasoning regarding the allegro libraries in short, allegro4.4 is pretty much dead upstream, and my interest was basically to keep alex4 (which is cool) in Debian, but since it migrated to non-free, my interest in allegro4.4 has waned. So, if anybody would like to still see allegro4.4 in Debian, please step up now and help out. Since it is dead upstream, my reasoning is that it is better to remove it from Debian if no maintainer who wants to help steps up. Previously Tobias Hansen has helped out, but now it is 8 (!) years since his last upload of either package. (Please don t interpret this as judgement, I am very happy for the help he has provided and all the work he has done on the packages). Allegro5 is another deal still active upstream, and I have kept it up to date in Debian, and while I have held the latest upload a short while because of the time_t transition, it will come sooner or later There I am also waiting on a final decision on this bug from upstream. Other than that allegro 5 is in a very good state, and I will keep maintaining it as long as I can. But help would of course be appreciated on allegro5 too.

14 April 2024

Petter Reinholdtsen: Time to move orphaned Debian packages to git

There are several packages in Debian without a associated git repository with the packaging history. This is unfortunate and it would be nice if more of these would do so. Quote a lot of these are without a maintainer, ie listed as maintained by the 'Debian QA Group' place holder. In fact, 438 packages have this property according to UDD (SELECT source FROM sources WHERE release = 'sid' AND (vcs_url ilike '%anonscm.debian.org%' OR vcs_browser ilike '%anonscm.debian.org%' or vcs_url IS NULL OR vcs_browser IS NULL) AND maintainer ilike '%packages@qa.debian.org%';). Such packages can be updated without much coordination by any Debian developer, as they are considered orphaned. To try to improve the situation and reduce the number of packages without associated git repository, I started a few days ago to search out candiates and provide them with a git repository under the 'debian' collaborative Salsa project. I started with the packages pointing to obsolete Alioth git repositories, and am now working my way across the ones completely without git references. In addition to updating the Vcs-* debian/control fields, I try to update Standards-Version, debhelper compat level, simplify d/rules, switch to Rules-Requires-Root: no and fix lintian issues reported. I only implement those that are trivial to fix, to avoid spending too much time on each orphaned package. So far my experience is that it take aproximately 20 minutes to convert a package without any git references, and a lot more for packages with existing git repositories incompatible with git-buildpackages. So far I have converted 10 packages, and I will keep going until I run out of steam. As should be clear from the numbers, there is enough packages remaining for more people to do the same without stepping on each others toes. I find it useful to start by searching for a git repo already on salsa, as I find that some times a git repo has already been created, but no new version is uploaded to Debian yet. In those cases I start with the existing git repository. I convert to the git-buildpackage+pristine-tar workflow, and ensure a debian/gbp.conf file with "pristine-tar=True" is added early, to avoid uploading a orig.tar.gz with the wrong checksum by mistake. Did that three times in the begin before I remembered my mistake. So, if you are a Debian Developer and got some spare time, perhaps considering migrating some orphaned packages to git? As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

13 April 2024

Simon Josefsson: Reproducible and minimal source-only tarballs

With the release of Libntlm version 1.8 the release tarball can be reproduced on several distributions. We also publish a signed minimal source-only tarball, produced by git-archive which is the same format used by Savannah, Codeberg, GitLab, GitHub and others. Reproducibility of both tarballs are tested continuously for regressions on GitLab through a CI/CD pipeline. If that wasn t enough to excite you, the Debian packages of Libntlm are now built from the reproducible minimal source-only tarball. The resulting binaries are reproducible on several architectures. What does that even mean? Why should you care? How you can do the same for your project? What are the open issues? Read on, dear reader This article describes my practical experiments with reproducible release artifacts, following up on my earlier thoughts that lead to discussion on Fosstodon and a patch by Janneke Nieuwenhuizen to make Guix tarballs reproducible that inspired me to some practical work. Let s look at how a maintainer release some software, and how a user can reproduce the released artifacts from the source code. Libntlm provides a shared library written in C and uses GNU Make, GNU Autoconf, GNU Automake, GNU Libtool and gnulib for build management, but these ideas should apply to most project and build system. The following illustrate the steps a maintainer would take to prepare a release:
git clone https://gitlab.com/gsasl/libntlm.git
cd libntlm
git checkout v1.8
./bootstrap
./configure
make distcheck
gpg -b libntlm-1.8.tar.gz
The generated files libntlm-1.8.tar.gz and libntlm-1.8.tar.gz.sig are published, and users download and use them. This is how the GNU project have been doing releases since the late 1980 s. That is a testament to how successful this pattern has been! These tarballs contain source code and some generated files, typically shell scripts generated by autoconf, makefile templates generated by automake, documentation in formats like Info, HTML, or PDF. Rarely do they contain binary object code, but historically that happened. The XZUtils incident illustrate that tarballs with files that are not included in the git archive offer an opportunity to disguise malicious backdoors. I blogged earlier how to mitigate this risk by using signed minimal source-only tarballs. The risk of hiding malware is not the only motivation to publish signed minimal source-only tarballs. With pre-generated content in tarballs, there is a risk that GNU/Linux distributions such as Trisquel, Guix, Debian/Ubuntu or Fedora ship generated files coming from the tarball into the binary *.deb or *.rpm package file. Typically the person packaging the upstream project never realized that some installed artifacts was not re-built through a typical autoconf -fi && ./configure && make install sequence, and never wrote the code to rebuild everything. This can also happen if the build rules are written but are buggy, shipping the old artifact. When a security problem is found, this can lead to time-consuming situations, as it may be that patching the relevant source code and rebuilding the package is not sufficient: the vulnerable generated object from the tarball would be shipped into the binary package instead of a rebuilt artifact. For architecture-specific binaries this rarely happens, since object code is usually not included in tarballs although for 10+ years I shipped the binary Java JAR file in the GNU Libidn release tarball, until I stopped shipping it. For interpreted languages and especially for generated content such as HTML, PDF, shell scripts this happens more than you would like. Publishing minimal source-only tarballs enable easier auditing of a project s code, to avoid the need to read through all generated files looking for malicious content. I have taken care to generate the source-only minimal tarball using git-archive. This is the same format that GitLab, GitHub etc offer for the automated download links on git tags. The minimal source-only tarballs can thus serve as a way to audit GitLab and GitHub download material! Consider if/when hosting sites like GitLab or GitHub has a security incident that cause generated tarballs to include a backdoor that is not present in the git repository. If people rely on the tag download artifact without verifying the maintainer PGP signature using GnuPG, this can lead to similar backdoor scenarios that we had for XZUtils but originated with the hosting provider instead of the release manager. This is even more concerning, since this attack can be mounted for some selected IP address that you want to target and not on everyone, thereby making it harder to discover. With all that discussion and rationale out of the way, let s return to the release process. I have added another step here:
make srcdist
gpg -b libntlm-1.8-src.tar.gz
Now the release is ready. I publish these four files in the Libntlm s Savannah Download area, but they can be uploaded to a GitLab/GitHub release area as well. These are the SHA256 checksums I got after building the tarballs on my Trisquel 11 aramo laptop:
91de864224913b9493c7a6cec2890e6eded3610d34c3d983132823de348ec2ca  libntlm-1.8-src.tar.gz
ce6569a47a21173ba69c990965f73eb82d9a093eb871f935ab64ee13df47fda1  libntlm-1.8.tar.gz
So how can you reproduce my artifacts? Here is how to reproduce them in a Ubuntu 22.04 container:
podman run -it --rm ubuntu:22.04
apt-get update
apt-get install -y --no-install-recommends autoconf automake libtool make git ca-certificates
git clone https://gitlab.com/gsasl/libntlm.git
cd libntlm
git checkout v1.8
./bootstrap
./configure
make dist srcdist
sha256sum libntlm-*.tar.gz
You should see the exact same SHA256 checksum values. Hooray! This works because Trisquel 11 and Ubuntu 22.04 uses the same version of git, autoconf, automake, and libtool. These tools do not guarantee the same output content for all versions, similar to how GNU GCC does not generate the same binary output for all versions. So there is still some delicate version pairing needed. Ideally, the artifacts should be possible to reproduce from the release artifacts themselves, and not only directly from git. It is possible to reproduce the full tarball in a AlmaLinux 8 container replace almalinux:8 with rockylinux:8 if you prefer RockyLinux:
podman run -it --rm almalinux:8
dnf update -y
dnf install -y make wget gcc
wget https://download.savannah.nongnu.org/releases/libntlm/libntlm-1.8.tar.gz
tar xfa libntlm-1.8.tar.gz
cd libntlm-1.8
./configure
make dist
sha256sum libntlm-1.8.tar.gz
The source-only minimal tarball can be regenerated on Debian 11:
podman run -it --rm debian:11
apt-get update
apt-get install -y --no-install-recommends make git ca-certificates
git clone https://gitlab.com/gsasl/libntlm.git
cd libntlm
git checkout v1.8
make -f cfg.mk srcdist
sha256sum libntlm-1.8-src.tar.gz 
As the Magnus Opus or chef-d uvre, let s recreate the full tarball directly from the minimal source-only tarball on Trisquel 11 replace docker.io/kpengboy/trisquel:11.0 with ubuntu:22.04 if you prefer.
podman run -it --rm docker.io/kpengboy/trisquel:11.0
apt-get update
apt-get install -y --no-install-recommends autoconf automake libtool make wget git ca-certificates
wget https://download.savannah.nongnu.org/releases/libntlm/libntlm-1.8-src.tar.gz
tar xfa libntlm-1.8-src.tar.gz
cd libntlm-v1.8
./bootstrap
./configure
make dist
sha256sum libntlm-1.8.tar.gz
Yay! You should now have great confidence in that the release artifacts correspond to what s in version control and also to what the maintainer intended to release. Your remaining job is to audit the source code for vulnerabilities, including the source code of the dependencies used in the build. You no longer have to worry about auditing the release artifacts. I find it somewhat amusing that the build infrastructure for Libntlm is now in a significantly better place than the code itself. Libntlm is written in old C style with plenty of string manipulation and uses broken cryptographic algorithms such as MD4 and single-DES. Remember folks: solving supply chain security issues has no bearing on what kind of code you eventually run. A clean gun can still shoot you in the foot. Side note on naming: GitLab exports tarballs with pathnames libntlm-v1.8/ (i.e.., PROJECT-TAG/) and I ve adopted the same pathnames, which means my libntlm-1.8-src.tar.gz tarballs are bit-by-bit identical to GitLab s exports and you can verify this with tools like diffoscope. GitLab name the tarball libntlm-v1.8.tar.gz (i.e., PROJECT-TAG.ARCHIVE) which I find too similar to the libntlm-1.8.tar.gz that we also publish. GitHub uses the same git archive style, but unfortunately they have logic that removes the v in the pathname so you will get a tarball with pathname libntlm-1.8/ instead of libntlm-v1.8/ that GitLab and I use. The content of the tarball is bit-by-bit identical, but the pathname and archive differs. Codeberg (running Forgejo) uses another approach: the tarball is called libntlm-v1.8.tar.gz (after the tag) just like GitLab, but the pathname inside the archive is libntlm/, otherwise the produced archive is bit-by-bit identical including timestamps. Savannah s CGIT interface uses archive name libntlm-1.8.tar.gz with pathname libntlm-1.8/, but otherwise file content is identical. Savannah s GitWeb interface provides snapshot links that are named after the git commit (e.g., libntlm-a812c2ca.tar.gz with libntlm-a812c2ca/) and I cannot find any tag-based download links at all. Overall, we are so close to get SHA256 checksum to match, but fail on pathname within the archive. I ve chosen to be compatible with GitLab regarding the content of tarballs but not on archive naming. From a simplicity point of view, it would be nice if everyone used PROJECT-TAG.ARCHIVE for the archive filename and PROJECT-TAG/ for the pathname within the archive. This aspect will probably need more discussion. Side note on git archive output: It seems different versions of git archive produce different results for the same repository. The version of git in Debian 11, Trisquel 11 and Ubuntu 22.04 behave the same. The version of git in Debian 12, AlmaLinux/RockyLinux 8/9, Alpine, ArchLinux, macOS homebrew, and upcoming Ubuntu 24.04 behave in another way. Hopefully this will not change that often, but this would invalidate reproducibility of these tarballs in the future, forcing you to use an old git release to reproduce the source-only tarball. Alas, GitLab and most other sites appears to be using modern git so the download tarballs from them would not match my tarballs even though the content would. Side note on ChangeLog: ChangeLog files were traditionally manually curated files with version history for a package. In recent years, several projects moved to dynamically generate them from git history (using tools like git2cl or gitlog-to-changelog). This has consequences for reproducibility of tarballs: you need to have the entire git history available! The gitlog-to-changelog tool also output different outputs depending on the time zone of the person using it, which arguable is a simple bug that can be fixed. However this entire approach is incompatible with rebuilding the full tarball from the minimal source-only tarball. It seems Libntlm s ChangeLog file died on the surgery table here. So how would a distribution build these minimal source-only tarballs? I happen to help on the libntlm package in Debian. It has historically used the generated tarballs as the source code to build from. This means that code coming from gnulib is vendored in the tarball. When a security problem is discovered in gnulib code, the security team needs to patch all packages that include that vendored code and rebuild them, instead of merely patching the gnulib package and rebuild all packages that rely on that particular code. To change this, the Debian libntlm package needs to Build-Depends on Debian s gnulib package. But there was one problem: similar to most projects that use gnulib, Libntlm depend on a particular git commit of gnulib, and Debian only ship one commit. There is no coordination about which commit to use. I have adopted gnulib in Debian, and add a git bundle to the *_all.deb binary package so that projects that rely on gnulib can pick whatever commit they need. This allow an no-network GNULIB_URL and GNULIB_REVISION approach when running Libntlm s ./bootstrap with the Debian gnulib package installed. Otherwise libntlm would pick up whatever latest version of gnulib that Debian happened to have in the gnulib package, which is not what the Libntlm maintainer intended to be used, and can lead to all sorts of version mismatches (and consequently security problems) over time. Libntlm in Debian is developed and tested on Salsa and there is continuous integration testing of it as well, thanks to the Salsa CI team. Side note on git bundles: unfortunately there appears to be no reproducible way to export a git repository into one or more files. So one unfortunate consequence of all this work is that the gnulib *.orig.tar.gz tarball in Debian is not reproducible any more. I have tried to get Git bundles to be reproducible but I never got it to work see my notes in gnulib s debian/README.source on this aspect. Of course, source tarball reproducibility has nothing to do with binary reproducibility of gnulib in Debian itself, fortunately. One open question is how to deal with the increased build dependencies that is triggered by this approach. Some people are surprised by this but I don t see how to get around it: if you depend on source code for tools in another package to build your package, it is a bad idea to hide that dependency. We ve done it for a long time through vendored code in non-minimal tarballs. Libntlm isn t the most critical project from a bootstrapping perspective, so adding git and gnulib as Build-Depends to it will probably be fine. However, consider if this pattern was used for other packages that uses gnulib such as coreutils, gzip, tar, bison etc (all are using gnulib) then they would all Build-Depends on git and gnulib. Cross-building those packages for a new architecture will therefor require git on that architecture first, which gets circular quick. The dependency on gnulib is real so I don t see that going away, and gnulib is a Architecture:all package. However, the dependency on git is merely a consequence of how the Debian gnulib package chose to make all gnulib git commits available to projects: through a git bundle. There are other ways to do this that doesn t require the git tool to extract the necessary files, but none that I found practical ideas welcome! Finally some brief notes on how this was implemented. Enabling bootstrappable source-only minimal tarballs via gnulib s ./bootstrap is achieved by using the GNULIB_REVISION mechanism, locking down the gnulib commit used. I have always disliked git submodules because they add extra steps and has complicated interaction with CI/CD. The reason why I gave up git submodules now is because the particular commit to use is not recorded in the git archive output when git submodules is used. So the particular gnulib commit has to be mentioned explicitly in some source code that goes into the git archive tarball. Colin Watson added the GNULIB_REVISION approach to ./bootstrap back in 2018, and now it no longer made sense to continue to use a gnulib git submodule. One alternative is to use ./bootstrap with --gnulib-srcdir or --gnulib-refdir if there is some practical problem with the GNULIB_URL towards a git bundle the GNULIB_REVISION in bootstrap.conf. The srcdist make rule is simple:
git archive --prefix=libntlm-v1.8/ -o libntlm-1.8-src.tar.gz HEAD
Making the make dist generated tarball reproducible can be more complicated, however for Libntlm it was sufficient to make sure the modification times of all files were set deterministically to the timestamp of the last commit in the git repository. Interestingly there seems to be a couple of different ways to accomplish this, Guix doesn t support minimal source-only tarballs but rely on a .tarball-timestamp file inside the tarball. Paul Eggert explained what TZDB is using some time ago. The approach I m using now is fairly similar to the one I suggested over a year ago. If there are problems because all files in the tarball now use the same modification time, there is a solution by Bruno Haible that could be implemented. Side note on git tags: Some people may wonder why not verify a signed git tag instead of verifying a signed tarball of the git archive. Currently most git repositories uses SHA-1 for git commit identities, but SHA-1 is not a secure hash function. While current SHA-1 attacks can be detected and mitigated, there are fundamental doubts that a git SHA-1 commit identity uniquely refers to the same content that was intended. Verifying a git tag will never offer the same assurance, since a git tag can be moved or re-signed at any time. Verifying a git commit is better but then we need to trust SHA-1. Migrating git to SHA-256 would resolve this aspect, but most hosting sites such as GitLab and GitHub does not support this yet. There are other advantages to using signed tarballs instead of signed git commits or git tags as well, e.g., tar.gz can be a deterministically reproducible persistent stable offline storage format but .git sub-directory trees or git bundles do not offer this property. Doing continous testing of all this is critical to make sure things don t regress. Libntlm s pipeline definition now produce the generated libntlm-*.tar.gz tarballs and a checksum as a build artifact. Then I added the 000-reproducability job which compares the checksums and fails on mismatches. You can read its delicate output in the job for the v1.8 release. Right now we insists that builds on Trisquel 11 match Ubuntu 22.04, that PureOS 10 builds match Debian 11 builds, that AlmaLinux 8 builds match RockyLinux 8 builds, and AlmaLinux 9 builds match RockyLinux 9 builds. As you can see in pipeline job output, not all platforms lead to the same tarballs, but hopefully this state can be improved over time. There is also partial reproducibility, where the full tarball is reproducible across two distributions but not the minimal tarball, or vice versa. If this way of working plays out well, I hope to implement it in other projects too. What do you think? Happy Hacking!

Paul Tagliamonte: Domo Arigato, Mr. debugfs

Years ago, at what I think I remember was DebConf 15, I hacked for a while on debhelper to write build-ids to debian binary control files, so that the build-id (more specifically, the ELF note .note.gnu.build-id) wound up in the Debian apt archive metadata. I ve always thought this was super cool, and seeing as how Michael Stapelberg blogged some great pointers around the ecosystem, including the fancy new debuginfod service, and the find-dbgsym-packages helper, which uses these same headers, I don t think I m the only one. At work I ve been using a lot of rust, specifically, async rust using tokio. To try and work on my style, and to dig deeper into the how and why of the decisions made in these frameworks, I ve decided to hack up a project that I ve wanted to do ever since 2015 write a debug filesystem. Let s get to it.

Back to the Future Time to admit something. I really love Plan 9. It s just so good. So many ideas from Plan 9 are just so prescient, and everything just feels right. Not just right like, feels good like, correct. The bit that I ve always liked the most is 9p, the network protocol for serving a filesystem over a network. This leads to all sorts of fun programs, like the Plan 9 ftp client being a 9p server you mount the ftp server and access files like any other files. It s kinda like if fuse were more fully a part of how the operating system worked, but fuse is all running client-side. With 9p there s a single client, and different servers that you can connect to, which may be backed by a hard drive, remote resources over something like SFTP, FTP, HTTP or even purely synthetic. The interesting (maybe sad?) part here is that 9p wound up outliving Plan 9 in terms of adoption 9p is in all sorts of places folks don t usually expect. For instance, the Windows Subsystem for Linux uses the 9p protocol to share files between Windows and Linux. ChromeOS uses it to share files with Crostini, and qemu uses 9p (virtio-p9) to share files between guest and host. If you re noticing a pattern here, you d be right; for some reason 9p is the go-to protocol to exchange files between hypervisor and guest. Why? I have no idea, except maybe due to being designed well, simple to implement, and it s a lot easier to validate the data being shared and validate security boundaries. Simplicity has its value. As a result, there s a lot of lingering 9p support kicking around. Turns out Linux can even handle mounting 9p filesystems out of the box. This means that I can deploy a filesystem to my LAN or my localhost by running a process on top of a computer that needs nothing special, and mount it over the network on an unmodified machine unlike fuse, where you d need client-specific software to run in order to mount the directory. For instance, let s mount a 9p filesystem running on my localhost machine, serving requests on 127.0.0.1:564 (tcp) that goes by the name mountpointname to /mnt.
$ mount -t 9p \
-o trans=tcp,port=564,version=9p2000.u,aname=mountpointname \
127.0.0.1 \
/mnt
Linux will mount away, and attach to the filesystem as the root user, and by default, attach to that mountpoint again for each local user that attempts to use it. Nifty, right? I think so. The server is able to keep track of per-user access and authorization along with the host OS.

WHEREIN I STYX WITH IT Since I wanted to push myself a bit more with rust and tokio specifically, I opted to implement the whole stack myself, without third party libraries on the critical path where I could avoid it. The 9p protocol (sometimes called Styx, the original name for it) is incredibly simple. It s a series of client to server requests, which receive a server to client response. These are, respectively, T messages, which transmit a request to the server, which trigger an R message in response (Reply messages). These messages are TLV payload with a very straight forward structure so straight forward, in fact, that I was able to implement a working server off nothing more than a handful of man pages. Later on after the basics worked, I found a more complete spec page that contains more information about the unix specific variant that I opted to use (9P2000.u rather than 9P2000) due to the level of Linux specific support for the 9P2000.u variant over the 9P2000 protocol.

MR ROBOTO The backend stack over at zoo is rust and tokio running i/o for an HTTP and WebRTC server. I figured I d pick something fairly similar to write my filesystem with, since 9P can be implemented on basically anything with I/O. That means tokio tcp server bits, which construct and use a 9p server, which has an idiomatic Rusty API that partially abstracts the raw R and T messages, but not so much as to cause issues with hiding implementation possibilities. At each abstraction level, there s an escape hatch allowing someone to implement any of the layers if required. I called this framework arigato which can be found over on docs.rs and crates.io.
/// Simplified version of the arigato File trait; this isn't actually
/// the same trait; there's some small cosmetic differences. The
/// actual trait can be found at:
///
/// https://docs.rs/arigato/latest/arigato/server/trait.File.html
trait File  
/// OpenFile is the type returned by this File via an Open call.
 type OpenFile: OpenFile;
/// Return the 9p Qid for this file. A file is the same if the Qid is
 /// the same. A Qid contains information about the mode of the file,
 /// version of the file, and a unique 64 bit identifier.
 fn qid(&self) -> Qid;
/// Construct the 9p Stat struct with metadata about a file.
 async fn stat(&self) -> FileResult<Stat>;
/// Attempt to update the file metadata.
 async fn wstat(&mut self, s: &Stat) -> FileResult<()>;
/// Traverse the filesystem tree.
 async fn walk(&self, path: &[&str]) -> FileResult<(Option<Self>, Vec<Self>)>;
/// Request that a file's reference be removed from the file tree.
 async fn unlink(&mut self) -> FileResult<()>;
/// Create a file at a specific location in the file tree.
 async fn create(
&mut self,
name: &str,
perm: u16,
ty: FileType,
mode: OpenMode,
extension: &str,
) -> FileResult<Self>;
/// Open the File, returning a handle to the open file, which handles
 /// file i/o. This is split into a second type since it is genuinely
 /// unrelated -- and the fact that a file is Open or Closed can be
 /// handled by the  arigato  server for us.
 async fn open(&mut self, mode: OpenMode) -> FileResult<Self::OpenFile>;
 
/// Simplified version of the arigato OpenFile trait; this isn't actually
/// the same trait; there's some small cosmetic differences. The
/// actual trait can be found at:
///
/// https://docs.rs/arigato/latest/arigato/server/trait.OpenFile.html
trait OpenFile  
/// iounit to report for this file. The iounit reported is used for Read
 /// or Write operations to signal, if non-zero, the maximum size that is
 /// guaranteed to be transferred atomically.
 fn iounit(&self) -> u32;
/// Read some number of bytes up to  buf.len()  from the provided
 ///  offset  of the underlying file. The number of bytes read is
 /// returned.
 async fn read_at(
&mut self,
buf: &mut [u8],
offset: u64,
) -> FileResult<u32>;
/// Write some number of bytes up to  buf.len()  from the provided
 ///  offset  of the underlying file. The number of bytes written
 /// is returned.
 fn write_at(
&mut self,
buf: &mut [u8],
offset: u64,
) -> FileResult<u32>;
 

Thanks, decade ago paultag! Let s do it! Let s use arigato to implement a 9p filesystem we ll call debugfs that will serve all the debug files shipped according to the Packages metadata from the apt archive. We ll fetch the Packages file and construct a filesystem based on the reported Build-Id entries. For those who don t know much about how an apt repo works, here s the 2-second crash course on what we re doing. The first is to fetch the Packages file, which is specific to a binary architecture (such as amd64, arm64 or riscv64). That architecture is specific to a component (such as main, contrib or non-free). That component is specific to a suite, such as stable, unstable or any of its aliases (bullseye, bookworm, etc). Let s take a look at the Packages.xz file for the unstable-debug suite, main component, for all amd64 binaries.
$ curl \
https://deb.debian.org/debian-debug/dists/unstable-debug/main/binary-amd64/Packages.xz \
  unxz
This will return the Debian-style rfc2822-like headers, which is an export of the metadata contained inside each .deb file which apt (or other tools that can use the apt repo format) use to fetch information about debs. Let s take a look at the debug headers for the netlabel-tools package in unstable which is a package named netlabel-tools-dbgsym in unstable-debug.
Package: netlabel-tools-dbgsym
Source: netlabel-tools (0.30.0-1)
Version: 0.30.0-1+b1
Installed-Size: 79
Maintainer: Paul Tagliamonte <paultag@debian.org>
Architecture: amd64
Depends: netlabel-tools (= 0.30.0-1+b1)
Description: debug symbols for netlabel-tools
Auto-Built-Package: debug-symbols
Build-Ids: e59f81f6573dadd5d95a6e4474d9388ab2777e2a
Description-md5: a0e587a0cf730c88a4010f78562e6db7
Section: debug
Priority: optional
Filename: pool/main/n/netlabel-tools/netlabel-tools-dbgsym_0.30.0-1+b1_amd64.deb
Size: 62776
SHA256: 0e9bdb087617f0350995a84fb9aa84541bc4df45c6cd717f2157aa83711d0c60
So here, we can parse the package headers in the Packages.xz file, and store, for each Build-Id, the Filename where we can fetch the .deb at. Each .deb contains a number of files but we re only really interested in the files inside the .deb located at or under /usr/lib/debug/.build-id/, which you can find in debugfs under rfc822.rs. It s crude, and very single-purpose, but I m feeling a bit lazy.

Who needs dpkg?! For folks who haven t seen it yet, a .deb file is a special type of .ar file, that contains (usually) three files inside debian-binary, control.tar.xz and data.tar.xz. The core of an .ar file is a fixed size (60 byte) entry header, followed by the specified size number of bytes.
[8 byte .ar file magic]
[60 byte entry header]
[N bytes of data]
[60 byte entry header]
[N bytes of data]
[60 byte entry header]
[N bytes of data]
...
First up was to implement a basic ar parser in ar.rs. Before we get into using it to parse a deb, as a quick diversion, let s break apart a .deb file by hand something that is a bit of a rite of passage (or at least it used to be? I m getting old) during the Debian nm (new member) process, to take a look at where exactly the .debug file lives inside the .deb file.
$ ar x netlabel-tools-dbgsym_0.30.0-1+b1_amd64.deb
$ ls
control.tar.xz debian-binary
data.tar.xz netlabel-tools-dbgsym_0.30.0-1+b1_amd64.deb
$ tar --list -f data.tar.xz   grep '.debug$'
./usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
Since we know quite a bit about the structure of a .deb file, and I had to implement support from scratch anyway, I opted to implement a (very!) basic debfile parser using HTTP Range requests. HTTP Range requests, if supported by the server (denoted by a accept-ranges: bytes HTTP header in response to an HTTP HEAD request to that file) means that we can add a header such as range: bytes=8-68 to specifically request that the returned GET body be the byte range provided (in the above case, the bytes starting from byte offset 8 until byte offset 68). This means we can fetch just the ar file entry from the .deb file until we get to the file inside the .deb we are interested in (in our case, the data.tar.xz file) at which point we can request the body of that file with a final range request. I wound up writing a struct to handle a read_at-style API surface in hrange.rs, which we can pair with ar.rs above and start to find our data in the .deb remotely without downloading and unpacking the .deb at all. After we have the body of the data.tar.xz coming back through the HTTP response, we get to pipe it through an xz decompressor (this kinda sucked in Rust, since a tokio AsyncRead is not the same as an http Body response is not the same as std::io::Read, is not the same as an async (or sync) Iterator is not the same as what the xz2 crate expects; leading me to read blocks of data to a buffer and stuff them through the decoder by looping over the buffer for each lzma2 packet in a loop), and tarfile parser (similarly troublesome). From there we get to iterate over all entries in the tarfile, stopping when we reach our file of interest. Since we can t seek, but gdb needs to, we ll pull it out of the stream into a Cursor<Vec<u8>> in-memory and pass a handle to it back to the user. From here on out its a matter of gluing together a File traited struct in debugfs, and serving the filesystem over TCP using arigato. Done deal!

A quick diversion about compression I was originally hoping to avoid transferring the whole tar file over the network (and therefore also reading the whole debug file into ram, which objectively sucks), but quickly hit issues with figuring out a way around seeking around an xz file. What s interesting is xz has a great primitive to solve this specific problem (specifically, use a block size that allows you to seek to the block as close to your desired seek position just before it, only discarding at most block size - 1 bytes), but data.tar.xz files generated by dpkg appear to have a single mega-huge block for the whole file. I don t know why I would have expected any different, in retrospect. That means that this now devolves into the base case of How do I seek around an lzma2 compressed data stream ; which is a lot more complex of a question. Thankfully, notoriously brilliant tianon was nice enough to introduce me to Jon Johnson who did something super similar adapted a technique to seek inside a compressed gzip file, which lets his service oci.dag.dev seek through Docker container images super fast based on some prior work such as soci-snapshotter, gztool, and zran.c. He also pulled this party trick off for apk based distros over at apk.dag.dev, which seems apropos. Jon was nice enough to publish a lot of his work on this specifically in a central place under the name targz on his GitHub, which has been a ton of fun to read through. The gist is that, by dumping the decompressor s state (window of previous bytes, in-memory data derived from the last N-1 bytes) at specific checkpoints along with the compressed data stream offset in bytes and decompressed offset in bytes, one can seek to that checkpoint in the compressed stream and pick up where you left off creating a similar block mechanism against the wishes of gzip. It means you d need to do an O(n) run over the file, but every request after that will be sped up according to the number of checkpoints you ve taken. Given the complexity of xz and lzma2, I don t think this is possible for me at the moment especially given most of the files I ll be requesting will not be loaded from again especially when I can just cache the debug header by Build-Id. I want to implement this (because I m generally curious and Jon has a way of getting someone excited about compression schemes, which is not a sentence I thought I d ever say out loud), but for now I m going to move on without this optimization. Such a shame, since it kills a lot of the work that went into seeking around the .deb file in the first place, given the debian-binary and control.tar.gz members are so small.

The Good First, the good news right? It works! That s pretty cool. I m positive my younger self would be amused and happy to see this working; as is current day paultag. Let s take debugfs out for a spin! First, we need to mount the filesystem. It even works on an entirely unmodified, stock Debian box on my LAN, which is huge. Let s take it for a spin:
$ mount \
-t 9p \
-o trans=tcp,version=9p2000.u,aname=unstable-debug \
192.168.0.2 \
/usr/lib/debug/.build-id/
And, let s prove to ourselves that this actually mounted before we go trying to use it:
$ mount   grep build-id
192.168.0.2 on /usr/lib/debug/.build-id type 9p (rw,relatime,aname=unstable-debug,access=user,trans=tcp,version=9p2000.u,port=564)
Slick. We ve got an open connection to the server, where our host will keep a connection alive as root, attached to the filesystem provided in aname. Let s take a look at it.
$ ls /usr/lib/debug/.build-id/
00 0d 1a 27 34 41 4e 5b 68 75 82 8E 9b a8 b5 c2 CE db e7 f3
01 0e 1b 28 35 42 4f 5c 69 76 83 8f 9c a9 b6 c3 cf dc E7 f4
02 0f 1c 29 36 43 50 5d 6a 77 84 90 9d aa b7 c4 d0 dd e8 f5
03 10 1d 2a 37 44 51 5e 6b 78 85 91 9e ab b8 c5 d1 de e9 f6
04 11 1e 2b 38 45 52 5f 6c 79 86 92 9f ac b9 c6 d2 df ea f7
05 12 1f 2c 39 46 53 60 6d 7a 87 93 a0 ad ba c7 d3 e0 eb f8
06 13 20 2d 3a 47 54 61 6e 7b 88 94 a1 ae bb c8 d4 e1 ec f9
07 14 21 2e 3b 48 55 62 6f 7c 89 95 a2 af bc c9 d5 e2 ed fa
08 15 22 2f 3c 49 56 63 70 7d 8a 96 a3 b0 bd ca d6 e3 ee fb
09 16 23 30 3d 4a 57 64 71 7e 8b 97 a4 b1 be cb d7 e4 ef fc
0a 17 24 31 3e 4b 58 65 72 7f 8c 98 a5 b2 bf cc d8 E4 f0 fd
0b 18 25 32 3f 4c 59 66 73 80 8d 99 a6 b3 c0 cd d9 e5 f1 fe
0c 19 26 33 40 4d 5a 67 74 81 8e 9a a7 b4 c1 ce da e6 f2 ff
Outstanding. Let s try using gdb to debug a binary that was provided by the Debian archive, and see if it ll load the ELF by build-id from the right .deb in the unstable-debug suite:
$ gdb -q /usr/sbin/netlabelctl
Reading symbols from /usr/sbin/netlabelctl...
Reading symbols from /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug...
(gdb)
Yes! Yes it will!
$ file /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
/usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter *empty*, BuildID[sha1]=e59f81f6573dadd5d95a6e4474d9388ab2777e2a, for GNU/Linux 3.2.0, with debug_info, not stripped

The Bad Linux s support for 9p is mainline, which is great, but it s not robust. Network issues or server restarts will wedge the mountpoint (Linux can t reconnect when the tcp connection breaks), and things that work fine on local filesystems get translated in a way that causes a lot of network chatter for instance, just due to the way the syscalls are translated, doing an ls, will result in a stat call for each file in the directory, even though linux had just got a stat entry for every file while it was resolving directory names. On top of that, Linux will serialize all I/O with the server, so there s no concurrent requests for file information, writes, or reads pending at the same time to the server; and read and write throughput will degrade as latency increases due to increasing round-trip time, even though there are offsets included in the read and write calls. It works well enough, but is frustrating to run up against, since there s not a lot you can do server-side to help with this beyond implementing the 9P2000.L variant (which, maybe is worth it).

The Ugly Unfortunately, we don t know the file size(s) until we ve actually opened the underlying tar file and found the correct member, so for most files, we don t know the real size to report when getting a stat. We can t parse the tarfiles for every stat call, since that d make ls even slower (bummer). Only hiccup is that when I report a filesize of zero, gdb throws a bit of a fit; let s try with a size of 0 to start:
$ ls -lah /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
-r--r--r-- 1 root root 0 Dec 31 1969 /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
$ gdb -q /usr/sbin/netlabelctl
Reading symbols from /usr/sbin/netlabelctl...
Reading symbols from /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug...
warning: Discarding section .note.gnu.build-id which has a section size (24) larger than the file size [in module /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug]
[...]
This obviously won t work since gdb will throw away all our hard work because of stat s output, and neither will loading the real size of the underlying file. That only leaves us with hardcoding a file size and hope nothing else breaks significantly as a result. Let s try it again:
$ ls -lah /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
-r--r--r-- 1 root root 954M Dec 31 1969 /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
$ gdb -q /usr/sbin/netlabelctl
Reading symbols from /usr/sbin/netlabelctl...
Reading symbols from /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug...
(gdb)
Much better. I mean, terrible but better. Better for now, anyway.

Kilroy was here Do I think this is a particularly good idea? I mean; kinda. I m probably going to make some fun 9p arigato-based filesystems for use around my LAN, but I don t think I ll be moving to use debugfs until I can figure out how to ensure the connection is more resilient to changing networks, server restarts and fixes on i/o performance. I think it was a useful exercise and is a pretty great hack, but I don t think this ll be shipping anywhere anytime soon. Along with me publishing this post, I ve pushed up all my repos; so you should be able to play along at home! There s a lot more work to be done on arigato; but it does handshake and successfully export a working 9P2000.u filesystem. Check it out on on my github at arigato, debugfs and also on crates.io and docs.rs. At least I can say I was here and I got it working after all these years.

Russell Coker: Software Needed for Work

When I first started studying computer science setting up a programming project was easy, write source code files and a Makefile and that was it. IRC was the only IM system and email was the only other communications system that was used much. Writing Makefiles is difficult but products like the Borland Turbo series of IDEs did all that for you so you could just start typing code and press a function key to compile and run (F5 from memory). Over the years the requirements and expectations of computer use have grown significantly. The typical office worker is now doing many more things with computers than serious programmers used to do. Running an IM system, an online document editing system, and a series of web apps is standard for companies nowadays. Developers have to do all that in addition to tools for version control, continuous integration, bug reporting, and feature tracking. The development process is also more complex with extra steps for reproducible builds, automated tests, and code coverage metrics for the tests. I wonder how many programmers who started in the 90s would have done something else if faced with Github as their introduction. How much of this is good? Having the ability to send instant messages all around the world is great. Having dozens of different ways of doing so is awful. When a company uses multiple IM systems such as MS-Teams and Slack and forces some of it s employees to use them both it s getting ridiculous. Having different friend groups on different IM systems is anti-social networking. In the EU the Digital Markets Act [1] forces some degree of interoperability between different IM systems and as it s impossible to know who s actually in the EU that will end up being world-wide. In corporations document management often involves multiple ways of storing things, you have Google Docs, MS Office online, hosted Wikis like Confluence, and more. Large companies tend to use several such systems which means that people need to learn multiple systems to be able to work and they also need to know which systems are used by the various groups that they communicate with. Microsoft deserves some sort of award for the range of ways they have for managing documents, Sharepoint, OneDrive, Office Online, attachments to Teams rooms, and probably lots more. During WW2 the predecessor to the CIA produced an excellent manual for simple sabotage [2]. If something like that was written today the section General Interference with Organisations and Production would surely have something about using as many incompatible programs and web sites as possible in the work flow. The proliferation of software required for work is a form of denial of service attack against corporations. The efficiency of companies doesn t really bother me. It sucks that companies are creating a demoralising workplace that is unpleasant for workers. But the upside is that the biggest companies are the ones doing the worst things and are also the most afflicted by these problems. It s almost like the Bureau of Sabotage in some of Frank Herbert s fiction [3]. The thing that concerns me is the effect of multiple standards on free software development. We have IRC the most traditional IM support system which is getting replaced by Matrix but we also have some projects using Telegram, and Jabber hasn t gone away. I m sure there are others too. There are also multiple options for version control (although github seems to dominate the market), forums, bug trackers, etc. Reporting bugs or getting support in free software often requires interacting with several of them. Developing free software usually involves dealing with the bug tracking and documentation systems of the distribution you use as well as the upstream developers of the software. If the problem you have is related to compatibility between two different pieces of free software then you can end up dealing with even more bug tracking systems. There are real benefits to some of the newer programs to track bugs, write documentation, etc. There is also going to be a cost in changing which gives an incentive for the older projects to keep using what has worked well enough for them in the past, How can we improve things? Use only the latest tools? Prioritise ease of use? Aim more for the entry level contributors?

11 April 2024

Reproducible Builds: Reproducible Builds in March 2024

Welcome to the March 2024 report from the Reproducible Builds project! In our reports, we attempt to outline what we have been up to over the past month, as well as mentioning some of the important things happening more generally in software supply-chain security. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website. Table of contents:
  1. Arch Linux minimal container userland now 100% reproducible
  2. Validating Debian s build infrastructure after the XZ backdoor
  3. Making Fedora Linux (more) reproducible
  4. Increasing Trust in the Open Source Supply Chain with Reproducible Builds and Functional Package Management
  5. Software and source code identification with GNU Guix and reproducible builds
  6. Two new Rust-based tools for post-processing determinism
  7. Distribution work
  8. Mailing list highlights
  9. Website updates
  10. Delta chat clients now reproducible
  11. diffoscope updates
  12. Upstream patches
  13. Reproducibility testing framework

Arch Linux minimal container userland now 100% reproducible In remarkable news, Reproducible builds developer kpcyrd reported that that the Arch Linux minimal container userland is now 100% reproducible after work by developers dvzv and Foxboron on the one remaining package. This represents a real world , widely-used Linux distribution being reproducible. Their post, which kpcyrd suffixed with the question now what? , continues on to outline some potential next steps, including validating whether the container image itself could be reproduced bit-for-bit. The post, which was itself a followup for an Arch Linux update earlier in the month, generated a significant number of replies.

Validating Debian s build infrastructure after the XZ backdoor From our mailing list this month, Vagrant Cascadian wrote about being asked about trying to perform concrete reproducibility checks for recent Debian security updates, in an attempt to gain some confidence about Debian s build infrastructure given that they performed builds in environments running the high-profile XZ vulnerability. Vagrant reports (with some caveats):
So far, I have not found any reproducibility issues; everything I tested I was able to get to build bit-for-bit identical with what is in the Debian archive.
That is to say, reproducibility testing permitted Vagrant and Debian to claim with some confidence that builds performed when this vulnerable version of XZ was installed were not interfered with.

Making Fedora Linux (more) reproducible In March, Davide Cavalca gave a talk at the 2024 Southern California Linux Expo (aka SCALE 21x) about the ongoing effort to make the Fedora Linux distribution reproducible. Documented in more detail on Fedora s website, the talk touched on topics such as the specifics of implementing reproducible builds in Fedora, the challenges encountered, the current status and what s coming next. (YouTube video)

Increasing Trust in the Open Source Supply Chain with Reproducible Builds and Functional Package Management Julien Malka published a brief but interesting paper in the HAL open archive on Increasing Trust in the Open Source Supply Chain with Reproducible Builds and Functional Package Management:
Functional package managers (FPMs) and reproducible builds (R-B) are technologies and methodologies that are conceptually very different from the traditional software deployment model, and that have promising properties for software supply chain security. This thesis aims to evaluate the impact of FPMs and R-B on the security of the software supply chain and propose improvements to the FPM model to further improve trust in the open source supply chain. PDF
Julien s paper poses a number of research questions on how the model of distributions such as GNU Guix and NixOS can be leveraged to further improve the safety of the software supply chain , etc.

Software and source code identification with GNU Guix and reproducible builds In a long line of commendably detailed blog posts, Ludovic Court s, Maxim Cournoyer, Jan Nieuwenhuizen and Simon Tournier have together published two interesting posts on the GNU Guix blog this month. In early March, Ludovic Court s, Maxim Cournoyer, Jan Nieuwenhuizen and Simon Tournier wrote about software and source code identification and how that might be performed using Guix, rhetorically posing the questions: What does it take to identify software ? How can we tell what software is running on a machine to determine, for example, what security vulnerabilities might affect it? Later in the month, Ludovic Court s wrote a solo post describing adventures on the quest for long-term reproducible deployment. Ludovic s post touches on GNU Guix s aim to support time travel , the ability to reliably (and reproducibly) revert to an earlier point in time, employing the iconic image of Harold Lloyd hanging off the clock in Safety Last! (1925) to poetically illustrate both the slapstick nature of current modern technology and the gymnastics required to navigate hazards of our own making.

Two new Rust-based tools for post-processing determinism Zbigniew J drzejewski-Szmek announced add-determinism, a work-in-progress reimplementation of the Reproducible Builds project s own strip-nondeterminism tool in the Rust programming language, intended to be used as a post-processor in RPM-based distributions such as Fedora In addition, Yossi Kreinin published a blog post titled refix: fast, debuggable, reproducible builds that describes a tool that post-processes binaries in such a way that they are still debuggable with gdb, etc.. Yossi post details the motivation and techniques behind the (fast) performance of the tool.

Distribution work In Debian this month, since the testing framework no longer varies the build path, James Addison performed a bulk downgrade of the bug severity for issues filed with a level of normal to a new level of wishlist. In addition, 28 reviews of Debian packages were added, 38 were updated and 23 were removed this month adding to ever-growing knowledge about identified issues. As part of this effort, a number of issue types were updated, including Chris Lamb adding a new ocaml_include_directories toolchain issue [ ] and James Addison adding a new filesystem_order_in_java_jar_manifest_mf_include_resource issue [ ] and updating the random_uuid_in_notebooks_generated_by_nbsphinx to reference a relevant discussion thread [ ]. In addition, Roland Clobus posted his 24th status update of reproducible Debian ISO images. Roland highlights that the images for Debian unstable often cannot be generated due to changes in that distribution related to the 64-bit time_t transition. Lastly, Bernhard M. Wiedemann posted another monthly update for his reproducibility work in openSUSE.

Mailing list highlights Elsewhere on our mailing list this month:

Website updates There were made a number of improvements to our website this month, including:
  • Pol Dellaiera noticed the frequent need to correctly cite the website itself in academic work. To facilitate easier citation across multiple formats, Pol contributed a Citation File Format (CIF) file. As a result, an export in BibTeX format is now available in the Academic Publications section. Pol encourages community contributions to further refine the CITATION.cff file. Pol also added an substantial new section to the buy in page documenting the role of Software Bill of Materials (SBOMs) and ephemeral development environments. [ ][ ]
  • Bernhard M. Wiedemann added a new commandments page to the documentation [ ][ ] and fixed some incorrect YAML elsewhere on the site [ ].
  • Chris Lamb add three recent academic papers to the publications page of the website. [ ]
  • Mattia Rizzolo and Holger Levsen collaborated to add Infomaniak as a sponsor of amd64 virtual machines. [ ][ ][ ]
  • Roland Clobus updated the stable outputs page, dropping version numbers from Python documentation pages [ ] and noting that Python s set data structure is also affected by the PYTHONHASHSEED functionality. [ ]

Delta chat clients now reproducible Delta Chat, an open source messaging application that can work over email, announced this month that the Rust-based core library underlying Delta chat application is now reproducible.

diffoscope diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made a number of changes such as uploading versions 259, 260 and 261 to Debian and made the following additional changes:
  • New features:
    • Add support for the zipdetails tool from the Perl distribution. Thanks to Fay Stegerman and Larry Doolittle et al. for the pointer and thread about this tool. [ ]
  • Bug fixes:
    • Don t identify Redis database dumps as GNU R database files based simply on their filename. [ ]
    • Add a missing call to File.recognizes so we actually perform the filename check for GNU R data files. [ ]
    • Don t crash if we encounter an .rdb file without an equivalent .rdx file. (#1066991)
    • Correctly check for 7z being available and not lz4 when testing 7z. [ ]
    • Prevent a traceback when comparing a contentful .pyc file with an empty one. [ ]
  • Testsuite improvements:
    • Fix .epub tests after supporting the new zipdetails tool. [ ]
    • Don t use parenthesis within test skipping messages, as PyTest adds its own parenthesis. [ ]
    • Factor out Python version checking in test_zip.py. [ ]
    • Skip some Zip-related tests under Python 3.10.14, as a potential regression may have been backported to the 3.10.x series. [ ]
    • Actually test 7z support in the test_7z set of tests, not the lz4 functionality. (Closes: reproducible-builds/diffoscope#359). [ ]
In addition, Fay Stegerman updated diffoscope s monkey patch for supporting the unusual Mozilla ZIP file format after Python s zipfile module changed to detect potentially insecure overlapping entries within .zip files. (#362) Chris Lamb also updated the trydiffoscope command line client, dropping a build-dependency on the deprecated python3-distutils package to fix Debian bug #1065988 [ ], taking a moment to also refresh the packaging to the latest Debian standards [ ]. Finally, Vagrant Cascadian submitted an update for diffoscope version 260 in GNU Guix. [ ]

Upstream patches This month, we wrote a large number of patches, including: Bernhard M. Wiedemann used reproducibility-tooling to detect and fix packages that added changes in their %check section, thus failing when built with the --no-checks option. Only half of all openSUSE packages were tested so far, but a large number of bugs were filed, including ones against caddy, exiv2, gnome-disk-utility, grisbi, gsl, itinerary, kosmindoormap, libQuotient, med-tools, plasma6-disks, pspp, python-pypuppetdb, python-urlextract, rsync, vagrant-libvirt and xsimd. Similarly, Jean-Pierre De Jesus DIAZ employed reproducible builds techniques in order to test a proposed refactor of the ath9k-htc-firmware package. As the change produced bit-for-bit identical binaries to the previously shipped pre-built binaries:
I don t have the hardware to test this firmware, but the build produces the same hashes for the firmware so it s safe to say that the firmware should keep working.

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In March, an enormous number of changes were made by Holger Levsen:
  • Debian-related changes:
    • Sleep less after a so-called 404 package state has occurred. [ ]
    • Schedule package builds more often. [ ][ ]
    • Regenerate all our HTML indexes every hour, but only every 12h for the released suites. [ ]
    • Create and update unstable and experimental base systems on armhf again. [ ][ ]
    • Don t reschedule so many depwait packages due to the current size of the i386 architecture queue. [ ]
    • Redefine our scheduling thresholds and amounts. [ ]
    • Schedule untested packages with a higher priority, otherwise slow architectures cannot keep up with the experimental distribution growing. [ ]
    • Only create the stats_buildinfo.png graph once per day. [ ][ ]
    • Reproducible Debian dashboard: refactoring, update several more static stats only every 12h. [ ]
    • Document how to use systemctl with new systemd-based services. [ ]
    • Temporarily disable armhf and i386 continuous integration tests in order to get some stability back. [ ]
    • Use the deb.debian.org CDN everywhere. [ ]
    • Remove the rsyslog logging facility on bookworm systems. [ ]
    • Add zst to the list of packages which are false-positive diskspace issues. [ ]
    • Detect failures to bootstrap Debian base systems. [ ]
  • Arch Linux-related changes:
    • Temporarily disable builds because the pacman package manager is broken. [ ][ ]
    • Split reproducible_html_live_status and split the scheduling timing . [ ][ ][ ]
    • Improve handling when database is locked. [ ][ ]
  • Misc changes:
    • Show failed services that require manual cleanup. [ ][ ]
    • Integrate two new Infomaniak nodes. [ ][ ][ ][ ]
    • Improve IRC notifications for artifacts. [ ]
    • Run diffoscope in different systemd slices. [ ]
    • Run the node health check more often, as it can now repair some issues. [ ][ ]
    • Also include the string Bot in the userAgent for Git. (Re: #929013). [ ]
    • Document increased tmpfs size on our OUSL nodes. [ ]
    • Disable memory account for the reproducible_build service. [ ][ ]
    • Allow 10 times as many open files for the Jenkins service. [ ]
    • Set OOMPolicy=continue and OOMScoreAdjust=-1000 for both the Jenkins and the reproducible_build service. [ ]
Mattia Rizzolo also made the following changes:
  • Debian-related changes:
    • Define a systemd slice to group all relevant services. [ ][ ]
    • Add a bunch of quotes in scripts to assuage the shellcheck tool. [ ]
    • Add stats on how many packages have been built today so far. [ ]
    • Instruct systemd-run to handle diffoscope s exit codes specially. [ ]
    • Prefer the pgrep tool over grepping the output of ps. [ ]
    • Re-enable a couple of i386 and armhf architecture builders. [ ][ ]
    • Fix some stylistic issues flagged by the Python flake8 tool. [ ]
    • Cease scheduling Debian unstable and experimental on the armhf architecture due to the time_t transition. [ ]
    • Start a few more i386 & armhf workers. [ ][ ][ ]
    • Temporarly skip pbuilder updates in the unstable distribution, but only on the armhf architecture. [ ]
  • Other changes:
    • Perform some large-scale refactoring on how the systemd service operates. [ ][ ]
    • Move the list of workers into a separate file so it s accessible to a number of scripts. [ ]
    • Refactor the powercycle_x86_nodes.py script to use the new IONOS API and its new Python bindings. [ ]
    • Also fix nph-logwatch after the worker changes. [ ]
    • Do not install the stunnel tool anymore, it shouldn t be needed by anything anymore. [ ]
    • Move temporary directories related to Arch Linux into a single directory for clarity. [ ]
    • Update the arm64 architecture host keys. [ ]
    • Use a common Postfix configuration. [ ]
The following changes were also made by:
  • Jan-Benedict Glaw:
    • Initial work to clean up a messy NetBSD-related script. [ ][ ]
  • Roland Clobus:
    • Show the installer log if the installer fails to build. [ ]
    • Avoid the minus character (i.e. -) in a variable in order to allow for tags in openQA. [ ]
    • Update the schedule of Debian live image builds. [ ]
  • Vagrant Cascadian:
    • Maintenance on the virt* nodes is completed so bring them back online. [ ]
    • Use the fully qualified domain name in configuration. [ ]
Node maintenance was also performed by Holger Levsen, Mattia Rizzolo [ ][ ] and Vagrant Cascadian [ ][ ][ ][ ]

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

9 April 2024

Matthew Palmer: How I Tripped Over the Debian Weak Keys Vulnerability

Those of you who haven t been in IT for far, far too long might not know that next month will be the 16th(!) anniversary of the disclosure of what was, at the time, a fairly earth-shattering revelation: that for about 18 months, the Debian OpenSSL package was generating entirely predictable private keys. The recent xz-stential threat (thanks to @nixCraft for making me aware of that one), has got me thinking about my own serendipitous interaction with a major vulnerability. Given that the statute of limitations has (probably) run out, I thought I d share it as a tale of how huh, that s weird can be a powerful threat-hunting tool but only if you ve got the time to keep pulling at the thread.

Prelude to an Adventure Our story begins back in March 2008. I was working at Engine Yard (EY), a now largely-forgotten Rails-focused hosting company, which pioneered several advances in Rails application deployment. Probably EY s greatest claim to lasting fame is that they helped launch a little code hosting platform you might have heard of, by providing them free infrastructure when they were little more than a glimmer in the Internet s eye. I am, of course, talking about everyone s favourite Microsoft product: GitHub. Since GitHub was in the right place, at the right time, with a compelling product offering, they quickly started to gain traction, and grow their userbase. With growth comes challenges, amongst them the one we re focusing on today: SSH login times. Then, as now, GitHub provided SSH access to the git repos they hosted, by SSHing to git@github.com with publickey authentication. They were using the standard way that everyone manages SSH keys: the ~/.ssh/authorized_keys file, and that became a problem as the number of keys started to grow. The way that SSH uses this file is that, when a user connects and asks for publickey authentication, SSH opens the ~/.ssh/authorized_keys file and scans all of the keys listed in it, looking for a key which matches the key that the user presented. This linear search is normally not a huge problem, because nobody in their right mind puts more than a few keys in their ~/.ssh/authorized_keys, right?
2008-era GitHub giving monkey puppet side-eye to the idea that nobody stores many keys in an authorized_keys file
Of course, as a popular, rapidly-growing service, GitHub was gaining users at a fair clip, to the point that the one big file that stored all the SSH keys was starting to visibly impact SSH login times. This problem was also not going to get any better by itself. Something Had To Be Done. EY management was keen on making sure GitHub ran well, and so despite it not really being a hosting problem, they were willing to help fix this problem. For some reason, the late, great, Ezra Zygmuntowitz pointed GitHub in my direction, and let me take the time to really get into the problem with the GitHub team. After examining a variety of different possible solutions, we came to the conclusion that the least-worst option was to patch OpenSSH to lookup keys in a MySQL database, indexed on the key fingerprint. We didn t take this decision on a whim it wasn t a case of yeah, sure, let s just hack around with OpenSSH, what could possibly go wrong? . We knew it was potentially catastrophic if things went sideways, so you can imagine how much worse the other options available were. Ensuring that this wouldn t compromise security was a lot of the effort that went into the change. In the end, though, we rolled it out in early April, and lo! SSH logins were fast, and we were pretty sure we wouldn t have to worry about this problem for a long time to come. Normally, you d think patching OpenSSH to make mass SSH logins super fast would be a good story on its own. But no, this is just the opening scene.

Chekov s Gun Makes its Appearance Fast forward a little under a month, to the first few days of May 2008. I get a message from one of the GitHub team, saying that somehow users were able to access other users repos over SSH. Naturally, as we d recently rolled out the OpenSSH patch, which touched this very thing, the code I d written was suspect number one, so I was called in to help.
The lineup scene from the movie The Usual Suspects They're called The Usual Suspects for a reason, but sometimes, it really is Keyser S ze
Eventually, after more than a little debugging, we discovered that, somehow, there were two users with keys that had the same key fingerprint. This absolutely shouldn t happen it s a bit like winning the lottery twice in a row1 unless the users had somehow shared their keys with each other, of course. Still, it was worth investigating, just in case it was a web application bug, so the GitHub team reached out to the users impacted, to try and figure out what was going on. The users professed no knowledge of each other, neither admitted to publicising their key, and couldn t offer any explanation as to how the other person could possibly have gotten their key. Then things went from weird to what the ? . Because another pair of users showed up, sharing a key fingerprint but it was a different shared key fingerprint. The odds now have gone from winning the lottery multiple times in a row to as close to this literally cannot happen as makes no difference.
Milhouse from The Simpsons says that We're Through The Looking Glass Here, People
Once we were really, really confident that the OpenSSH patch wasn t the cause of the problem, my involvement in the problem basically ended. I wasn t a GitHub employee, and EY had plenty of other customers who needed my help, so I wasn t able to stay deeply involved in the on-going investigation of The Mystery of the Duplicate Keys. However, the GitHub team did keep talking to the users involved, and managed to determine the only apparent common factor was that all the users claimed to be using Debian or Ubuntu systems, which was where their SSH keys would have been generated. That was as far as the investigation had really gotten, when along came May 13, 2008.

Chekov s Gun Goes Off With the publication of DSA-1571-1, everything suddenly became clear. Through a well-meaning but ultimately disasterous cleanup of OpenSSL s randomness generation code, the Debian maintainer had inadvertently reduced the number of possible keys that could be generated by a given user from bazillions to a little over 32,000. With so many people signing up to GitHub some of them no doubt following best practice and freshly generating a separate key it s unsurprising that some collisions occurred. You can imagine the sense of oooooooh, so that s what s going on! that rippled out once the issue was understood. I was mostly glad that we had conclusive evidence that my OpenSSH patch wasn t at fault, little knowing how much more contact I was to have with Debian weak keys in the future, running a huge store of known-compromised keys and using them to find misbehaving Certificate Authorities, amongst other things.

Lessons Learned While I ve not found a description of exactly when and how Luciano Bello discovered the vulnerability that became CVE-2008-0166, I presume he first came across it some time before it was disclosed likely before GitHub tripped over it. The stable Debian release that included the vulnerable code had been released a year earlier, so there was plenty of time for Luciano to have discovered key collisions and go hmm, I wonder what s going on here? , then keep digging until the solution presented itself. The thought hmm, that s odd , followed by intense investigation, leading to the discovery of a major flaw is also what ultimately brought down the recent XZ backdoor. The critical part of that sequence is the ability to do that intense investigation, though. When I reflect on my brush with the Debian weak keys vulnerability, what sticks out to me is the fact that I didn t do the deep investigation. I wonder if Luciano hadn t found it, how long it might have been before it was found. The GitHub team would have continued investigating, presumably, and perhaps they (or I) would have eventually dug deep enough to find it. But we were all super busy myself, working support tickets at EY, and GitHub feverishly building features and fighting the fires in their rapidly-growing service. As it was, Luciano was able to take the time to dig in and find out what was happening, but just like the XZ backdoor, I feel like we, as an industry, got a bit lucky that someone with the skills, time, and energy was on hand at the right time to make a huge difference. It s a luxury to be able to take the time to really dig into a problem, and it s a luxury that most of us rarely have. Perhaps an understated takeaway is that somehow we all need to wrestle back some time to follow our hunches and really dig into the things that make us go hmm .

Support My Hunches If you d like to help me be able to do intense investigations of mysterious software phenomena, you can shout me a refreshing beverage on ko-fi.
  1. the odds are actually probably more like winning the lottery about twenty times in a row. The numbers involved are staggeringly huge, so it s easiest to just approximate it as really, really unlikely .

5 April 2024

Bits from Debian: apt install dpl-candidate: Andreas Tille

The Debian Project Developers will shortly vote for a new Debian Project Leader known as the DPL. The Project Leader is the official representative of The Debian Project tasked with managing the overall project, its vision, direction, and finances. The DPL is also responsible for the selection of Delegates, defining areas of responsibility within the project, the coordination of Developers, and making decisions required for the project. Our outgoing and present DPL Jonathan Carter served 4 terms, from 2020 through 2024. Jonathan shared his last Bits from the DPL post to Debian recently and his hopes for the future of Debian. Recently, we sat with the two present candidates for the DPL position asking questions to find out who they really are in a series of interviews about their platforms, visions for Debian, lives, and even their favorite text editors. The interviews were conducted by disaster2life (Yashraj Moghe) and made available from video and audio transcriptions: Voting for the position starts on April 6, 2024. Editors' note: This is our official return to Debian interviews, readers should stay tuned for more upcoming interviews with Developers and other important figures in Debian as part of our "Meet your Debian Developer" series. We used the following tools and services: Turboscribe.ai for the transcription from the audio and video files, IRC: Oftc.net for communication, Jitsi meet for interviews, and Open Broadcaster Software (OBS) for editing and video. While we encountered many technical difficulties in the return to this process, we are still able and proud to present the transcripts of the interviews edited only in a few areas for readability. 2024 Debian Project Leader Candidate: Andrea Tille Andreas' Interview Who are you? Tell us a little about yourself. [Andreas]:
How am I? Well, I'm, as I wrote in my platform, I'm a proud grandfather doing a lot of free software stuff, doing a lot of sports, have some goals in mind which I like to do and hopefully for the best of Debian.
And How are you today? [Andreas]:
How I'm doing today? Well, actually I have some headaches but it's fine for the interview. So, usually I feel very good. Spring was coming here and today it's raining and I plan to do a bicycle tour tomorrow and hope that I do not get really sick but yeah, for the interview it's fine.
What do you do in Debian? Could you mention your story here? [Andreas]:
Yeah, well, I started with Debian kind of an accident because I wanted to have some package salvaged which is called WordNet. It's a monolingual dictionary and I did not really plan to do more than maybe 10 packages or so. I had some kind of training with xTeddy which is totally unimportant, a cute teddy you can put on your desktop. So, and then well, more or less I thought how can I make Debian attractive for my employer which is a medical institute and so on. It could make sense to package bioinformatics and medicine software and it somehow evolved in a direction I did neither expect it nor wanted to do, that I'm currently the most busy uploader in Debian, created several teams around it. DebianMate is very well known from me. I created the Blends team to create teams and techniques around what we are doing which was Debian TIS, Debian Edu, Debian Science and so on and I also created the packaging team for R, for the statistics package R which is technically based and not topic based. All these blends are covering a certain topic and R is just needed by lots of these blends. So, yeah, and to cope with all this I have written a script which is routing an update to manage all these uploads more or less automatically. So, I think I had one day where I uploaded 21 new packages but it's just automatically generated, right? So, it's on one day more than I ever planned to do.
What is the first thing you think of when you think of Debian? Editors' note: The question was misunderstood as the worst thing you think of when you think of Debian [Andreas]:
The worst thing I think about Debian, it's complicated. I think today on Debian board I was asked about the technical progress I want to make and in my opinion we need to standardize things inside Debian. For instance, bringing all the packages to salsa, follow some common standards, some common workflow which is extremely helpful. As I said, if I'm that productive with my own packages we can adopt this in general, at least in most cases I think. I made a lot of good experience by the support of well-formed teams. Well-formed teams are those teams where people support each other, help each other. For instance, how to say, I'm a physicist by profession so I'm not an IT expert. I can tell apart what works and what not but I'm not an expert in those packages. I do and the amount of packages is so high that I do not even understand all the techniques they are covering like Go, Rust and something like this. And I also don't speak Java and I had a problem once in the middle of the night and I've sent the email to the list and was a Java problem and I woke up in the morning and it was solved. This is what I call a team. I don't call a team some common repository that is used by random people for different packages also but it's working together, don't hesitate to solve other people's problems and permit people to get active. This is what I call a team and this is also something I observed in, it's hard to give a percentage, in a lot of other teams but we have other people who do not even understand the concept of the team. Why is working together make some advantage and this is also a tough thing. I [would] like to tackle in my term if I get elected to form solid teams using the common workflow. This is one thing. The other thing is that we have a lot of good people in our infrastructure like FTP masters, DSA and so on. I have the feeling they have a lot of work and are working more or less on their limits, and I like to talk to them [to ask] what kind of change we could do to move that limits or move their personal health to the better side.
The DPL term lasts for a year, What would you do during that you couldn't do now? [Andreas]:
Yeah, well this is basically what I said are my main issues. I need to admit I have no really clear imagination what kind of tasks will come to me as a DPL because all these financial issues and law issues possible and issues [that] people who are not really friendly to Debian might create. I'm afraid these things might occupy a lot of time and I can't say much about this because I simply don't know.
What are three key terms about you and your candidacy? [Andreas]:
As I said, I like to work on standards, I d like to make Debian try [to get it right so] that people don't get overworked, this third key point is be inviting to newcomers, to everybody who wants to come. Yeah, I also mentioned in my term this diversity issue, geographical and from gender point of view. This may be the three points I consider most important.
Preferred text editor? [Andreas]:
Yeah, my preferred one? Ah, well, I have no preferred text editor. I'm using the Midnight Commander very frequently which has an internal editor which is convenient for small text. For other things, I usually use VI but I also use Emacs from time to time. So, no, I have not preferred text editor. Whatever works nicely for me.
What is the importance of the community in the Debian Project? How would like to see it evolving over the next few years? [Andreas]:
Yeah, I think the community is extremely important. So, I was on a lot of DebConfs. I think it's not really 20 but 17 or 18 DebCons and I really enjoyed these events every year because I met so many friends and met so many interesting people that it's really enriching my life and those who I never met in person but have read interesting things and yeah, Debian community makes really a part of my life.
And how do you think it should evolve specifically? [Andreas]:
Yeah, for instance, last year in Kochi, it became even clearer to me that the geographical diversity is a really strong point. Just discussing with some women from India who is afraid about not coming next year to Busan because there's a problem with Shanghai and so on. I'm not really sure how we can solve this but I think this is a problem at least I wish to tackle and yeah, this is an interesting point, the geographical diversity and I'm running the so-called mentoring of the month. This is a small project to attract newcomers for the Debian Med team which has the focus on medical packages and I learned that we had always men applying for this and so I said, okay, I dropped the constraint of medical packages. Any topic is fine, I teach you packaging but it must be someone who does not consider himself a man. I got only two applicants, no, actually, I got one applicant and one response which was kind of strange if I'm hunting for women or so. I did not understand but I got one response and interestingly, it was for me one of the least expected counters. It was from Iran and I met a very nice woman, very open, very skilled and gifted and did a good job or have even lose contact today and maybe we need more actively approach groups that are underrepresented. I don't know if what's a good means which I did but at least I tried and so I try to think about these kind of things.
What part of Debian has made you smile? What part of the project has kept you going all through the years? [Andreas]:
Well, the card game which is called Mao on the DebConf made me smile all the time. I admit I joined only two or three times even if I really love this kind of games but I was occupied by other stuff so this made me really smile. I also think the first online DebConf in 2020 made me smile because we had this kind of short video sequences and I tried to make a funny video sequence about every DebConf I attended before. This is really funny moments but yeah, it's not only smile but yeah. One thing maybe it's totally unconnected to Debian but I learned personally something in Debian that we have a do-ocracy and you can do things which you think that are right if not going in between someone else, right? So respect everybody else but otherwise you can do so. And in 2020 I also started to take trees which are growing widely in my garden and plant them into the woods because in our woods a lot of trees are dying and so I just do something because I can. I have the resource to do something, take the small tree and bring it into the woods because it does not harm anybody. I asked the forester if it is okay, yes, yes, okay. So everybody can do so but I think the idea to do something like this came also because of the free software idea. You have the resources, you have the computer, you can do something and you do something productive, right? And when thinking about this I think it was also my Debian work. Meanwhile I have planted more than 3,000 trees so it's not a small number but yeah, I enjoy this.
What part of Debian would you have some criticisms for? [Andreas]:
Yeah, it's basically the same as I said before. We need more standards to work together. I do not want to repeat this but this is what I think, yeah.
What field in Free Software generally do you think requires the most work to be put into it? What do you think is Debian's part in the field? [Andreas]:
It's also in general, the thing is the fact that I'm maintaining packages which are usually as modern software is maintained in Git, which is fine but we have some software which is at Sourceport, we have software laying around somewhere, we have software where Debian somehow became Upstream because nobody is caring anymore and free software is very different in several things, ways and well, I in principle like freedom of choice which is the basic of all our work. Sometimes this freedom goes in the way of productivity because everybody is free to re-implement. You asked me for the most favorite editor. In principle one really good working editor would be great to have and would work and we have maybe 500 in Debian or so, I don't know. I could imagine if people would concentrate and say five instead of 500 editors, we could get more productive, right? But I know this will not happen, right? But I think this is one thing which goes in the way of making things smooth and productive and we could have more manpower to replace one person who's [having] children, doing some other stuff and can't continue working on something and maybe this is a problem I will not solve, definitely not, but which I see.
What do you think is Debian's part in the field? [Andreas]:
Yeah, well, okay, we can bring together different Upstreams, so we are building some packages and have some general overview about similar things and can say, oh, you are doing this and some other person is doing more or less the same, do you want to join each other or so, but this is kind of a channel we have to our Upstreams which is probably not very successful. It starts with code copies of some libraries which are changed a little bit, which is fine license-wise, but not so helpful for different things and so I've tried to convince those Upstreams to forward their patches to the original one, but for this and I think we could do some kind of, yeah, [find] someone who brings Upstream together or to make them stop their forking stuff, but it costs a lot of energy and we probably don't have this and it's also not realistic that we can really help with this problem.
Do you have any questions for me? [Andreas]:
I enjoyed the interview, I enjoyed seeing you again after half a year or so. Yeah, actually I've seen you in the eating room or cheese and wine party or so, I do not remember we had to really talk together, but yeah, people around, yeah, for sure. Yeah.

Bits from Debian: apt install dpl-candidate: Sruthi Chandran

The Debian Project Developers will shortly vote for a new Debian Project Leader known as the DPL. The DPL is the official representative of representative of The Debian Project tasked with managing the overall project, its vision, direction, and finances. The DPL is also responsible for the selection of Delegates, defining areas of responsibility within the project, the coordination of Developers, and making decisions required for the project. Our outgoing and present DPL Jonathan Carter served 4 terms, from 2020 through 2024. Jonathan shared his last Bits from the DPL post to Debian recently and his hopes for the future of Debian. Recently, we sat with the two present candidates for the DPL position asking questions to find out who they really are in a series of interviews about their platforms, visions for Debian, lives, and even their favorite text editors. The interviews were conducted by disaster2life (Yashraj Moghe) and made available from video and audio transcriptions: Voting for the position starts on April 6, 2024. Editors' note: This is our official return to Debian interviews, readers should stay tuned for more upcoming interviews with Developers and other important figures in Debian as part of our "Meet your Debian Developer" series. We used the following tools and services: Turboscribe.ai for the transcription from the audio and video files, IRC: Oftc.net for communication, Jitsi meet for interviews, and Open Broadcaster Software (OBS) for editing and video. While we encountered many technical difficulties in the return to this process, we are still able and proud to present the transcripts of the interviews edited only in a few areas for readability. 2024 Debian Project Leader Candidate: Sruthi Chandran Sruthi's interview Hi Sruthi, so for the first question, who are you and could you tell us a little bit about yourself? [Sruthi]:
I usually talk about me whenever I am talking about answering the question who am I, I usually say like I am a librarian turned free software enthusiast and a Debian Developer. So I had no technical background and I learned, I was introduced to free software through my husband and then I learned Debian packaging, and eventually I became a Debian Developer. So I always give my example to people who say I am not technically inclined, I don't have technical background so I can't contribute to free software. So yeah, that's what I refer to myself.
For the next question, could you tell me what do you do in Debian, and could you mention your story up until here today? [Sruthi]:
Okay, so let me start from my initial days in Debian. I started contributing to Debian, my first contribution was a Tibetan font. We went to a Tibetan place and they were saying they didn't have a font in Linux. So that's how I started contributing. Then I moved on to Ruby packages, then I have some JavaScript and Go packages, all dependencies of GitLab. So I was involved with maintaining GitLab for some time, now I'm not very active there. But yeah, so GitLab was the main package I was contributing to since I contributed since 2016 to maybe like 2020 or something. Later I have come [over to] packaging. Now I am part of some of the teams, delegated teams, like community team and outreach team, as well as the Debconf committee. And the biggest, I think, my activity in Debian, I would say is organizing Debconf 2023. So it was a great experience and yeah, so that's my story in Debian.
So what are three key terms about you and your candidacy? [Sruthi]:
Okay, let me first think about it. For candidacy, I can start with diversity is one point I started expressing from the first time I contested for DPL. But to be honest, that's the main point I want to bring.
[Yashraj]:
So for diversity, if you could break down your thoughts on diversity and make them, [about] your three points including diversity.
[Sruthi]:
So in addition to, eventually when starting it was just diversity. Now I have like a bit more ideas, like community, like I want to be a leader for the Debian community. More than, I don't know, maybe people may not agree, but I would say I want to be a leader of Debian community rather than a Debian operating system. I connect to community more and third point I would say.
The term of a DPL lasts for an year. So what do you think during, what would you try to do during that, that you can't do from your position now? [Sruthi]:
Okay. So I, like, I am very happy with the structure of Debian and how things work in Debian. Like you can do almost a lot of things, like almost all things without being a DPL. Whatever change you want to bring about or whatever you want to do, you can do without being a DPL. Anyone, like every DD has the same rights. Only things I feel [the] DPL has hold on are mainly the budget or the funding part, which like, that's where they do the decision making part. And then comes like, and one advantage of DPL driving some idea is that somehow people tend to listen to that with more, like, tend to give more attention to what DPL is saying rather than a normal DD. So I wanted to, like, I have answered some of the questions on how to, how I plan to do the financial budgeting part, how I want to handle, like, and the other thing is using the extra attention that I get as a DPL, I would like to obviously start with the diversity aspect in Debian. And yeah, like, I, what I want to do is not, like, be a leader and say, like, take Debian to one direction where I want to go, but I would rather take suggestions and inputs from the whole community and go about with that. So yes, that's what I would say.
And taking a less serious question now, what is your preferred text editor? [Sruthi]:
Vim.
[Yashraj]:
Vim, wholeheartedly team Vim?
[Sruthi]:
Yes.
[Yashraj]:
Great. Well, this was made in Vim, all the text for this.
[Sruthi]:
So, like, since you mentioned extra data, I'll give my example, like, it's just a fun note, when I started contributing to Debian, as I mentioned, I didn't have any knowledge about free software, like Debian, and I was not used to even using Linux. So, and I didn't have experience with these text editors. So, when I started contributing, I used to do the editing part using gedit. So, that's how I started. Eventually, I moved to Nano, and once I reached Vim, I didn't move on.
Team Vim. Next question. What, what do you think is the importance of the Debian project in the world today? And where would you like to see it in 10 years, like 10 years into the future? [Sruthi]:
Okay. So, Debian, as we all know, is referred to as the universal operating system without, like, it is said for a reason. We have hundreds and hundreds of operating systems, like Linux, distributions based on Debian. So, I believe Debian, like even now, Debian has good influence on the, at least on the Linux or Linux ecosystem. So, what we implement in Debian has, like, is going to affect quite a lot of, like, a very good percentage of people using Linux. So, yes. So, I think Debian is one of the leading Linux distributions. And I think in 10 years, we should be able to reach a position, like, where we are not, like, even now, like, even these many years after having Linux, we face a lot of problems in newer and newer hardware coming up and installing on them is a big problem. Like, firmwares and all those things are getting more and more complicated. Like, it should be getting simpler, but it's getting more and more complicated. So, I, one thing I would imagine, like, I don't know if we will ever reach there, but I would imagine that eventually with the Debian, we should be able to have some, at least a few of the hardware developers or hardware producers have Debian pre-installed and those kind of things. Like, not, like, become, I'm not saying it's all, it's also available right now. What I'm saying is that it becomes prominent enough to be opted as, like, default distro.
What part of Debian has made you And what part of the project has kept you going all through these years? [Sruthi]:
Okay. So, I started to contribute in 2016, and I was part of the team doing GitLab packaging, and we did have a lot of training workshops and those kind of things within India. And I was, like, I had interacted with some of the Indian DDs, but I never got, like, even through chat or mail. I didn't have a lot of interaction with the rest of the world, DDs. And the 2019 Debconf changed my whole perspective about Debian. Before that, I wasn't, like, even, I was interested in free software. I was doing the technical stuff and all. But after DebConf, my whole idea has been, like, my focus changed to the community. Debian community is a very welcoming, very interesting community to be with. And so, I believe that, like, 2019 DebConf was a for me. And that kept, from 2019, my focus has been to how to support, like, how, I moved to the community part of Debian from there. Then in 2020 I became part of the community team, and, like, I started being part of other teams. So, these, I would say, the Debian community is the one, like, aspect of Debian that keeps me whole, keeps me held on to the Debian ecosystem as a whole.
Continuing to speak about Debian, what do you think, what is the first thing that comes to your mind when you think of Debian, like, the word, the community, what's the first thing? [Sruthi]:
I think I may sound like a broken record or something.
[Yashraj]:
No, no.
[Sruthi]:
Again, I would say the Debian community, like, it's the people who makes Debian, that makes Debian special. Like, apart from that, if I say, I would say I'm very, like, one part of Debian that makes me very happy is the, how the governing system of Debian works, the Debian constitution and all those things, like, it's a very unique thing for Debian. And, and it's like, when people say you can't work without a proper, like, establishment or even somebody deciding everything for you, it's difficult. When people say, like, we have been, Debian has been proving it for quite a long time now, that it's possible. So, so that's one thing I believe, like, that's one unique point. And I am very proud about that.
What areas do you think Debian is failing in, how can it (that standing) be improved? [Sruthi]:
So, I think where Debian is failing now is getting new people into Debian. Like, I don't remember, like, exactly the answer. But I remember hearing someone mention, like, the average age of a Debian Developer is, like, above 40 or 45 or something, like, exact age, I don't remember. But it's like, Debian is getting old. Like, the people in Debian are getting old and we are not getting enough of new people into Debian. And that's very important to have people, like, new people coming up. Otherwise, eventually, like, after a few years, nobody, like, we won't have enough people to take the project forward. So, yeah, I believe that is where we need to work on. We are doing some efforts, like, being part of GSOC or outreachy and having maybe other events, like, local events. Like, we used to have a lot of Debian packaging workshops in India. And those kind of, I think, in Brazil and all, they all have, like, local communities are doing. But we are not very successful in retaining the people who maybe come and try out things. But we are not very good at retaining the people, like, retaining people who come. So, we need to work on those things. Right now, I don't have a solid answer for that. But one thing, like, I was thinking about is, like, having a Debian specific outreach project, wherein the focus will be about the Debian, like, starting will be more on, like, usually what happens in GSOC and outreach is that people come, have the, do the contributions, and they go back. Like, they don't have that connection with the Debian, like, Debian community or Debian project. So, what I envision with these, the Debian outreach, the Debian specific outreach is that we have some part of the internship, like, even before starting the internship, we have some sessions and, like, with the people in Debian having, like, getting them introduced to the Debian philosophy and Debian community and Debian, how Debian works. And those things, we focus on that. And then we move on to the technical internship parts. So, I believe this could do some good in having, like, when you have people you can connect to, you tend to stay back in a project mode. When you feel something more than, like, right now, we have so many technical stuff to do, like, the choice for a college student is endless. So, if they want, if they stay back for something, like, maybe for Debian, I would say, we need to have them connected to the Debian project before we go into technical parts. Like, technical parts, like, there are other things as well, where they can go and do the technical part, but, like, they can come here, like, yeah. So, that's what I was saying. Focused outreach projects is one thing. That's just one. That's not enough. We need more of, like, more ideas to have more new people come up. And I'm very happy with, like, the DebConf thing. We tend to get more and more people from the places where we have a DebConf. Brazil is an example. After the Debconf, they have quite a good improvement on Debian contributors. And I think in India also, it did give a good result. Like, we have more people contributing and staying back and those things. So, yeah. So, these were the things I would say, like, we can do to improve.
For the final question, what field in free software do you, what field in free software generally do you think requires the most work to be put into it? What do you think is Debian's part in that field? [Sruthi]:
Okay. Like, right now, what comes to my mind is the free software licenses parts. Like, we have a lot of free software licenses, and there are non-free software licenses. But currently, I feel free software is having a big problem in enforcing these licenses. Like, there are, there may be big corporations or like some people who take up the whole, the code and may not follow the whole, for example, the GPL licenses. Like, we don't know how much of those, how much of the free softwares are used in the bigger things. Yeah, I agree. There are a lot of corporations who are afraid to touch free software. But there would be good amount of free software, free work that converts into property, things violating the free software licenses and those things. And we do not have the kind of like, we have SFLC, SFC, etc. But still, we do not have the ability to go behind and trace and implement the licenses. So, enforce those licenses and bring people who are violating the licenses forward and those kind of things is challenging because one thing is it takes time, like, and most importantly, money is required for the legal stuff. And not always people who like people who make small software, or maybe big, but they may not have the kind of time and money to have these things enforced. So, that's a big challenge free software is facing, especially in our current scenario. I feel we are having those, like, we need to find ways how we can get it sorted. I don't have an answer right now what to do. But this is a challenge I felt like and Debian's part in that. Yeah, as I said, I don't have a solution for that. But the Debian, so DFSG and Debian sticking on to the free software licenses is a good support, I think.
So, that was the final question, Do you have anything else you want to mention for anyone watching this? [Sruthi]:
Not really, like, I am happy, like, I think I was able to answer the questions. And yeah, I would say who is watching. I won't say like, I'm the best DPL candidate, you can't have a better one or something. I stand for a reason. And if you believe in that, or the Debian community and Debian diversity, and those kinds of things, if you believe it, I hope you would be interested, like, you would want to vote for me. That's it. Like, I'm not, I'll make it very clear. I'm not doing a technical leadership part here. So, those, I can't convince people who want technical leadership to vote for me. But I would say people who connect with me, I hope they vote for me.

4 April 2024

John Goerzen: The xz Issue Isn t About Open Source

You ve probably heard of the recent backdoor in xz. There have been a lot of takes on this, most of them boiling down to some version of:
The problem here is with Open Source Software.
I want to say not only is that view so myopic that it pushes towards the incorrect, but also it blinds us to more serious problems. Now, I don t pretend that there are no problems in the FLOSS community. There have been various pieces written about what this issue says about the FLOSS community (usually without actionable solutions). I m not here to say those pieces are wrong. Just that there s a bigger picture. So with this xz issue, it may well be a state actor (aka spy ) that added this malicious code to xz. We also know that proprietary software and systems can be vulnerable. For instance, a Twitter whistleblower revealed that Twitter employed Indian and Chinese spies, some knowingly. A recent report pointed to security lapses at Microsoft, including preventable lapses in security. According to the Wikipedia article on the SolarWinds attack, it was facilitated by various kinds of carelessness, including passwords being posted to Github and weak default passwords. They directly distributed malware-infested updates, encouraged customers to disable anti-malware tools when installing SolarWinds products, and so forth. It would be naive indeed to assume that there aren t black hat actors among the legions of programmers employed by companies that outsource work to low-cost countries some of which have challenges with bribery. So, given all this, we can t really say the problem is Open Source. Maybe it s more broad:
The problem here is with software.
Maybe that inches us closer, but is it really accurate? We have all heard of Boeing s recent issues, which seem to have some element of root causes in corporate carelessness, cost-cutting, and outsourcing. That sounds rather similar to the SolarWinds issue, doesn t it?
Well then, the problem is capitalism.
Maybe it has a role to play, but isn t it a little too easy to just say capitalism and throw up our hands helplessly, just as some do with FLOSS as at the start of this article? After all, capitalism also brought us plenty of products of very high quality over the years. When we can point to successful, non-careless products and I own some of them (for instance, my Framework laptop). We clearly haven t reached the root cause yet. And besides, what would you replace it with? All the major alternatives that have been tried have even stronger downsides. Maybe you replace it with better regulated capitalism , but that s still capitalism.
Then the problem must be with consumers.
As this argument would go, it s consumers buying patterns that drive problems. Buyers individual and corporate seek flashy features and low cost, prizing those over quality and security. No doubt this is true in a lot of cases. Maybe greed or status-conscious societies foster it: Temu promises people to shop like a billionaire , and unloads on them cheap junk, which all but guarantees that shipments from Temu containing products made with forced labor are entering the United States on a regular basis . But consumers are also people, and some fraction of them are quite capable of writing fantastic software, and in fact, do so. So what we need is some way to seize control. Some way to do what is right, despite the pressures of consumers or corporations. Ah yes, dear reader, you have been slogging through all these paragraphs and now realize I have been leading you to this:
Then the solution is Open Source.
Indeed. Faults and all, FLOSS is the most successful movement I know where people are bringing us back to the commons: working and volunteering for the common good, unleashing a thousand creative variants on a theme, iterating in every direction imaginable. We have FLOSS being vital parts of everything from $30 Raspberry Pis to space missions. It is bringing education and communication to impoverished parts of the world. It lets everyone write and release software. And, unlike the SolarWinds and Twitter issues, it exposes both clever solutions and security flaws to the world. If an authentication process in Windows got slower, we would all shrug and mutter Microsoft under our breath. Because, really, what else can we do? We have no agency with Windows. If an authentication process in Linux gets slower, anybody that s interested anybody at all can dive in and ask why and trace it down to root causes. Some look at this and say FLOSS is responsible for this mess. I look at it and say, this would be so much worse if it wasn t FLOSS and experience backs me up on this. FLOSS doesn t prevent security issues itself. What it does do is give capabilities to us all. The ability to investigate. Ability to fix. Yes, even the ability to break and its cousin, the power to learn. And, most rewarding, the ability to contribute.

Next.

Previous.