Search Results: "free"

3 May 2024

Colin Watson: Playing with rich

One of the things I do as a side project for Freexian is to work on various bits of business automation: accounting tools, programs to help contributors report their hours, invoicing, that kind of thing. While it's not quite my usual beat, this makes quite a good side project as the tools involved are mostly rather sensible and easy to deal with (Python, git, ledger, that sort of thing) and it's the kind of thing where I can dip into it for a day or so a week and feel like I'm making useful contributions. The logic can be quite complex, but there's very little friction in the tools themselves. A recent case where I did run into some friction in the tools was with some commands that need to present small amounts of tabular data on the terminal, using OSC 8 hyperlinks if the terminal supports them: think customer-related information with some links to issues. One of my colleagues had previously done this using a hack on top of texttable, which was perfectly fine as far as it went. However, now I wanted to be able to add multiple links in a single table cell in some cases, and that was really going to stretch the limits of that approach: working out the width of the displayed text in the cell was going to take an annoying amount of bookkeeping. I started looking around to see whether any other approaches might be easier, without too much effort (remember that "a day or so a week" bit above). ansiwrap looked somewhat promising, but it isn't currently packaged in Debian, and it would have still left me with the problem of figuring out how to integrate it into texttable, which looked like it would be quite complicated. Then I remembered that I'd heard good things about rich, and thought I'd take a look. rich turned out to be exactly what I wanted. Instead of something like this based on the texttable hack above:
import shutil
from pyxian.texttable import UrlTable
termsize = shutil.get_terminal_size((80, 25))
table = UrlTable(max_width=termsize.columns)
table.set_deco(UrlTable.HEADER)
table.set_cols_align(["l"])
table.set_cols_dtype(["u"])
table.add_row(["Issue"])
table.add_row([(issue_url, f"#{issue_id}")])
print(table.draw())
now I can do this instead:
import rich
from rich import box
from rich.table import Table
table = Table(box=box.SIMPLE)
table.add_column("Issue")
table.add_row(f"[link={issue_url}]#{issue_id}[/link]")
rich.print(table)
While this is a little shorter, the real bonus is that I can now just put multiple [link] tags in a single string, and it all just works. No ceremony. In fact, once the relevant bits of code passed type-checking (since the real code is a bit more complex than the samples above), it worked first time. It's a pleasure to work with a library like that. It looks like I've only barely scratched the surface of rich, but I expect I'll reach for it more often now.

1 May 2024

Bits from Debian: Debian welcomes the 2024 GSoC contributors/students

We are very excited to announce that Debian has selected seven contributors to work under mentorship on a variety of projects with us during the Google Summer of Code. Here is the list of the projects, students, and details of the tasks to be performed.
Project: Android SDK Tools in Debian Deliverables of the project: Make the entire Android toolchain, Android Target Platform Framework, and SDK tools available in the Debian archives.
Project: Benchmarking Parallel Performance of Numerical MPI Packages Deliverables of the project: Deliver an automated method for Debian maintainers to test selected numerical Debian packages for their parallel performance in clusters, in particular to catch performance regressions from updates, and to verify expected performance gains, such as Amdahl's and Gustafson's laws, from increased cluster resources.
Project: Debian MobCom Deliverables of the project: Update the outdated mobile packages and recreate aged packages due to new dependencies. Bring in more mobile communication tools by adding about 5 new packages.
Project: Improve support of the Rust coreutils in Debian Deliverables of the project: Make uutils behave more like GNU's coreutils by improving compatibility with the GNU coreutils test suite.
Project: Improve support of the Rust findutils in Debian Deliverables of the project: A safer and more performant implementation of the GNU suite's xargs, find, locate and updatedb tools in Rust.
Project: Expanding ROCm support within Debian and derivatives Deliverables of the project: Building, packaging, and uploading missing ROCm software into Debian repositories, starting with simple tools and progressing to high-level applications like PyTorch, with the final deliverables comprising a series of ROCm packages meeting community quality assurance standards.
Project: procps: Development of System Monitoring, Statistics and Information Tools in Rust Deliverables of the project: Improve the usability of the entire Rust-based implementation of the procps utility on Linux.
Congratulations and welcome to all the contributors! The Google Summer of Code program is possible in Debian thanks to the efforts of Debian Developers and Debian Contributors who dedicate part of their free time to mentor contributors and outreach tasks. Join us and help extend Debian! You can follow the contributors' weekly reports on the debian-outreach mailing list, chat with us on our IRC channel or reach out to the individual projects' team mailing lists.

Colin Watson: Free software activity in April 2024

My Debian contributions this month were all sponsored by Freexian. You can support my work directly via Liberapay.

Bits from Debian: Infomaniak First Platinum Sponsor of DebConf24

We are pleased to announce that Infomaniak has committed to sponsor DebConf24 as a Platinum Sponsor. Infomaniak is an independent cloud service provider recognised throughout Europe for its commitment to privacy, the local economy and the environment. Recording growth of 18% in 2023, the company is developing a suite of online collaborative tools and cloud hosting, streaming, marketing and events solutions. Infomaniak uses exclusively renewable energy, builds its own data centers and develops its solutions in Switzerland at the heart of Europe, without relocating. The company powers the website of the Belgian radio and TV service (RTBF) and provides streaming for more than 3,000 TV and radio stations in Europe. With this commitment as Platinum Sponsor, Infomaniak is contributing to the Debian annual Developers' conference, directly supporting the progress of Debian and Free Software. Infomaniak helps strengthen the community that collaborates on Debian projects from all around the world throughout the year. Thank you very much, Infomaniak, for your support of DebConf24! Become a sponsor too! DebConf24 will take place from 28th July to 4th August 2024 in Busan, South Korea, and will be preceded by DebCamp, from 21st to 27th July 2024. DebConf24 is accepting sponsors! Interested companies and organizations should contact the DebConf team through sponsors@debconf.org, or visit the DebConf24 website at https://debconf24.debconf.org/sponsors/become-a-sponsor/.

Guido Günther: Free Software Activities April 2024

A short status update of what happened on my side last month. Maintenance and code review continue to be the top time sinks (in a positive way). If you want to support my work see donations.

Matthew Palmer: The Mediocre Programmer's Guide to Rust

Me: Hi everyone, my name's Matt, and I'm a mediocre programmer. Everyone: Hi, Matt. Facilitator: Are you an alcoholic, Matt? Me: No, not since I stopped reading Twitter. Facilitator: Then I think you're in the wrong room.
Yep, that's my little secret: I'm a mediocre programmer. The definition of the word "hacker" I most closely align with is "someone who makes furniture with an axe". I write simple, straightforward code because trying to understand complexity makes my head hurt. Which is why I've always avoided the more academic languages, like OCaml, Haskell, Clojure, and so on. I know they're good languages, and people far smarter than me are building amazing things with them, but the moment I hear the word "endofunctor", I've lost all focus (and most of my will to live). My preferred languages are the ones that come with less intellectual overhead, like C, PHP, Python, and Ruby. So it's interesting that I've embraced Rust with significant vigour. It's by far the most complicated language that I feel at least vaguely comfortable with using "in anger". Part of that is that I've managed to assemble a set of principles that allow me to almost completely avoid arguing with Rust's dreaded borrow checker, lifetimes, and all the rest of the dark, scary corners of the language. It's also, I think, that Rust helps me to write better software, and I can feel it helping me (almost) all of the time. In the spirit of helping my fellow mediocre programmers to embrace Rust, I present the principles I've assembled so far.

Neither a Borrower Nor a Lender Be If you know anything about Rust, you probably know about the dreaded "borrow checker". It's the thing that makes sure you don't have two pieces of code trying to modify the same data at the same time, or using a value when it's no longer valid. While Rust's borrowing semantics allow excellent performance without compromising safety, for us mediocre programmers it gets very complicated, very quickly. So, the moment the compiler wants to start talking about "explicit lifetimes", I shut it up by just using owned values instead. It's not that I never borrow anything; I have some situations that I know are borrow-safe for the mediocre programmer (I'll cover those later). But any time I'm not sure how things will pan out, I'll go straight for an owned value. For example, if I need to store some text in a struct or enum, it's going straight into a String. I'm not going to start thinking about lifetimes and &'a str; I'll leave that for smarter people. Similarly, if I need a list of things, it's a Vec<T> every time; no &'b [T] in my structs, thank you very much.
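As a concrete illustration, here's a minimal sketch of what that looks like in a struct definition (the Customer type and its fields are made-up examples for this post, not code from any real project):

struct Customer {
    name: String,             // owned text: no &'a str, no lifetime parameter
    issue_links: Vec<String>, // owned list: no &'b [T] to reason about
}

fn main() {
    let c = Customer {
        name: "Acme Pty Ltd".to_string(),
        issue_links: vec!["https://example.org/issues/1".to_string()],
    };
    println!("{} has {} linked issue(s)", c.name, c.issue_links.len());
}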

Attack of the Clones Following on from the above, I've come to not be afraid of .clone(). I scatter them around my code like seeds in a field. Life's too short to spend time trying to figure out who's borrowing what from whom, if I can just give everyone their own thing. There are warnings in the Rust book (and everywhere else) about how a clone can be "expensive". While it's true that, yes, making clones of data structures consumes CPU cycles and memory, it very rarely matters. CPU cycles are (usually) plentiful and RAM (usually) relatively cheap. Mediocre programmer mental effort is expensive, and not to be spent on premature optimisation. Also, if you're coming from most any other modern language, Rust is already giving you so much more performance that you're probably ending up ahead of the game, even if you .clone() everything in sight. If, by some miracle, something I write gets so popular that the expense of all those spurious clones becomes a problem, it might make sense to pay someone much smarter than I to figure out how to make the program a zero-copy masterpiece of efficient code. Until then: clone early and clone often, I say!
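To make that concrete, here's a tiny sketch (the names are invented for illustration) of the clone-instead-of-borrow style: each call gets its own copy, and the original stays usable afterwards without any borrow-checker negotiation.

fn print_all(names: Vec<String>) {
    for n in names {
        println!("{n}");
    }
}

fn main() {
    let names = vec!["alpha".to_string(), "beta".to_string()];
    // Hand each consumer its own copy rather than working out the borrows.
    print_all(names.clone());
    print_all(names.clone());
    // `names` is still ours, because we only ever gave away clones.
    println!("{} names in total", names.len());
}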

Derive Macros are Powerful Magicks If you start .clone()ing everywhere, pretty quickly you'll be hit with this error:

error[E0599]: no method named `clone` found for struct `Foo` in the current scope

This is because not everything can be cloned, and so if you want your thing to be cloned, you need to implement the method yourself. Well, sort of. One of the things that I find absolutely outstanding about Rust is the "derive macro". These allow you to put a little marker on a struct or enum, and the compiler will write a bunch of code for you! Clone is one of the available so-called "derivable traits", so you add #[derive(Clone)] to your structs, and poof! you can .clone() to your heart's content. But there are other things that are commonly useful, and so I've got a set of traits that basically all of my data structures derive:

#[derive(Clone, Debug, Default)]
struct Foo {
    // ...
}

Every time I write a struct or enum definition, that line #[derive(Clone, Debug, Default)] goes at the top. The Debug trait allows you to print a debug representation of the data structure, either with the dbg!() macro, or via the {:?} format in the format!() macro (and anywhere else that takes a format string). Being able to say "what exactly is that?" comes in handy so often that not having a Debug implementation is like programming with one arm tied behind your Aeron. Meanwhile, the Default trait lets you create an empty instance of your data structure, with all of the fields set to their own default values. This only works if all the fields themselves implement Default, but a lot of standard types do, so it's rare that you'll define a structure that can't have an auto-derived Default. Enums are easily handled too: you just mark one variant as the default:

#[derive(Clone, Debug, Default)]
enum Bar {
    Something(String),
    SomethingElse(i32),
    #[default]   // <== mischief managed
    Nothing,
}
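A quick usage sketch, assuming the Bar enum from the snippet just above is in scope, showing what those two derives buy you:

fn main() {
    // Default hands back the #[default] variant...
    let b = Bar::default();
    // ...and Debug lets us print it with {:?} or dbg!().
    println!("{:?}", b); // prints: Nothing
    dbg!(Bar::Something("hello".to_string()));
}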

Borrowing is OK, Sometimes While I previously said that I like and usually use owned values, there are a few situations where I know I can borrow without angering the borrow checker gods, and so I'm comfortable doing it. The first is when I need to pass a value into a function that only needs to take a little look at the value to decide what to do. For example, if I want to know whether any values in a Vec<u32> are even, I could pass in a Vec, like this:

fn main() {
    let numbers = vec![0u32, 1, 2, 3, 4, 5];
    if has_evens(numbers) {
        println!("EVENS!");
    }
}

fn has_evens(numbers: Vec<u32>) -> bool {
    numbers.iter().any(|n| n % 2 == 0)
}

However, this gets ugly if I'm going to use numbers later, like this:

fn main() {
    let numbers = vec![0u32, 1, 2, 3, 4, 5];
    if has_evens(numbers) {
        println!("EVENS!");
    }

    // Compiler complains about "value borrowed here after move"
    println!("Sum: {}", numbers.iter().sum::<u32>());
}

fn has_evens(numbers: Vec<u32>) -> bool {
    numbers.iter().any(|n| n % 2 == 0)
}

Helpfully, the compiler will suggest I use my old standby, .clone(), to fix this problem. But I know that the borrow checker won't have a problem with lending that Vec<u32> into has_evens() as a borrowed slice, &[u32], like this:

fn main() {
    let numbers = vec![0u32, 1, 2, 3, 4, 5];
    if has_evens(&numbers) {
        println!("EVENS!");
    }
}

fn has_evens(numbers: &[u32]) -> bool {
    numbers.iter().any(|n| n % 2 == 0)
}

The general rule I've got is that if I can take advantage of lifetime elision (a fancy term meaning "the compiler can figure it out"), I'm probably OK. In less fancy terms, as long as the compiler doesn't tell me to put 'a anywhere, I'm in the green. On the other hand, the moment the compiler starts using the words "explicit lifetime", I nope the heck out of there and start cloning everything in sight. Another example of using lifetime elision is when I'm returning the value of a field from a struct or enum. In that case, I can usually get away with returning a borrowed value, knowing that the caller will probably just be taking a peek at that value, and throwing it away before the struct itself goes out of scope. For example:

struct Foo {
    id: u32,
    desc: String,
}

impl Foo {
    fn description(&self) -> &str {
        &self.desc
    }
}

Returning a reference from a function is practically always a mortal sin for mediocre programmers, but returning one from a struct method is often OK. In the rare case that the caller does want the reference I return to live for longer, they can always turn it into an owned value themselves, by calling .to_owned().
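For example, a caller of the description() method above might look like this (a small sketch, assuming the Foo struct from the previous snippet; the values are made up):

fn main() {
    let foo = Foo { id: 1, desc: "a thing".to_string() };
    // Most of the time we just take a quick peek at the borrowed &str...
    println!("{}", foo.description());
    // ...but if we need it to outlive foo, we can promote it to an owned String.
    let kept: String = foo.description().to_owned();
    println!("{kept}");
}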

Avoid the String Tangle Rust has a couple of different types for representing strings, String and &str being the ones you see most often. There are good reasons for this; however, it complicates method signatures when you just want to take some sort of "bunch of text" and don't care so much about the messy details. For example, let's say we have a function that wants to see if the length of the string is even. Using the logic that since we're just taking a peek at the value passed in, our function might take a string reference, &str, like this:

fn is_even_length(s: &str) -> bool {
    s.len() % 2 == 0
}

That seems to work fine, until someone wants to check a formatted string:

fn main() {
    // The compiler complains about "expected `&str`, found `String`"
    if is_even_length(format!("my string is {}", std::env::args().next().unwrap())) {
        println!("Even length string");
    }
}

Since format! returns an owned string, String, rather than a string reference, &str, we've got a problem. Of course, it's straightforward to turn the String from format!() into a &str (just prefix it with an &). But as mediocre programmers, we can't be expected to remember which sort of string all our functions take and add & wherever it's needed, and having to fix everything when the compiler complains is tedious. The converse can also happen: a method that wants an owned String, and we've got a &str (say, because we're passing in a string literal, like "Hello, world!"). In this case, we need to use one of the plethora of available "turn this into a String" mechanisms (.to_string(), .to_owned(), String::from(), and probably a few others I've forgotten) on the value before we pass it in, which gets ugly real fast. For these reasons, I never take a String or an &str as an argument. Instead, I use the Power of Traits to let callers pass in anything that is, or can be turned into, a string. Let us have some examples. First off, if I would normally use &str as the type, I instead use impl AsRef<str>:

fn is_even_length(s: impl AsRef<str>) -> bool {
    s.as_ref().len() % 2 == 0
}

Note that I had to throw in an extra as_ref() call in there, but now I can call this with either a String or a &str and get an answer. Now, if I want to be given a String (presumably because I plan on taking ownership of the value, say because I'm creating a new instance of a struct with it), I use impl Into<String> as my type:

struct Foo {
    id: u32,
    desc: String,
}

impl Foo {
    fn new(id: u32, desc: impl Into<String>) -> Self {
        Self { id, desc: desc.into() }
    }
}

We have to call .into() on our desc argument, which makes the struct building a bit uglier, but I'd argue that's a small price to pay for being able to call both Foo::new(1, "this is a thing") and Foo::new(2, format!("This is a thing named {name}")) without caring what sort of string is involved.
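Here's what those two call sites look like in practice (a small sketch, assuming the Foo::new() from the snippet above; the variable names are made up):

fn main() {
    let name = "frobnicator";
    // A plain &str literal works...
    let a = Foo::new(1, "this is a thing");
    // ...and so does the owned String that format!() returns.
    let b = Foo::new(2, format!("This is a thing named {name}"));
    println!("{} / {}", a.desc, b.desc);
}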

Always Have an Error Enum Rust's error handling mechanism (Results everywhere), along with the quality-of-life sugar surrounding it (like the short-circuit operator, ?), is a delightfully ergonomic approach to error handling. To make life easy for mediocre programmers, I recommend starting every project with an Error enum that derives thiserror::Error, and using that in every method and function that returns a Result. How you structure your Error type from there is less cut-and-dried, but typically I'll create a separate enum variant for each type of error I want to have a different description. With thiserror, it's easy to then attach those descriptions:

#[derive(Clone, Debug, thiserror::Error)]
enum Error {
    #[error("{0} caught fire")]
    Combustion(String),
    #[error("{0} exploded")]
    Explosion(String),
}

I also implement functions to create each error variant, because that allows me to do the Into<String> trick, and can sometimes come in handy when creating errors from other places with .map_err() (more on that later). For example, the impl for the above Error would probably be:

impl Error {
    fn combustion(desc: impl Into<String>) -> Self {
        Self::Combustion(desc.into())
    }

    fn explosion(desc: impl Into<String>) -> Self {
        Self::Explosion(desc.into())
    }
}

It's a tedious bit of boilerplate, and you can use the thiserror-ext crate's thiserror_ext::Construct derive macro to do the hard work for you, if you like. It, too, knows all about the Into<String> trick.
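Here's a rough sketch of how those constructor functions get used at an error site (the overheat_check() function and its temperature thresholds are invented for illustration, not part of the original example):

fn overheat_check(component: &str, temperature: u32) -> Result<(), Error> {
    if temperature > 1000 {
        // An owned String from format!() goes straight in, thanks to Into<String>.
        return Err(Error::explosion(format!("{component} at {temperature} degrees")));
    }
    if temperature > 400 {
        // ...and so does a plain &str.
        return Err(Error::combustion(component));
    }
    Ok(())
}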

Banish map_err (well, mostly) The newer mediocre programmer, who is just dipping their toe in the water of Rust, might write file handling code that looks like this:

fn read_u32_from_file(name: impl AsRef<str>) -> Result<u32, Error> {
    let mut f = File::open(name.as_ref())
        .map_err(|e| Error::FileOpenError(name.as_ref().to_string(), e))?;
    let mut buf = vec![0u8; 30];
    f.read(&mut buf)
        .map_err(|e| Error::ReadError(e))?;
    String::from_utf8(buf)
        .map_err(|e| Error::EncodingError(e))?
        .parse::<u32>()
        .map_err(|e| Error::ParseError(e))
}

This works great (or it probably does, I haven't actually tested it), but there are a lot of .map_err() calls in there. They take up over half the function, in fact. With the power of the From trait and the magic of the ? operator, we can make this a lot tidier. First off, assume we've written boilerplate error creation functions (or used thiserror_ext::Construct to do it for us). That allows us to simplify the file handling portion of the function a bit:

fn read_u32_from_file(name: impl AsRef<str>) -> Result<u32, Error> {
    let mut f = File::open(name.as_ref())
        // We've dropped the `.to_string()` out of here...
        .map_err(|e| Error::file_open_error(name.as_ref(), e))?;
    let mut buf = vec![0u8; 30];
    f.read(&mut buf)
        // ... and the explicit parameter passing out of here
        .map_err(Error::read_error)?;
    // ...

If that latter .map_err() call looks weird, without the e and such, it's because it's passing a function-as-closure, which just saves a few characters of typing. Just because we're mediocre doesn't mean we're not also lazy. Next, if we implement the From trait for the other two errors, we can make the string-handling lines significantly cleaner. First, the trait impl:

impl From<std::string::FromUtf8Error> for Error {
    fn from(e: std::string::FromUtf8Error) -> Self {
        Self::EncodingError(e)
    }
}

impl From<std::num::ParseIntError> for Error {
    fn from(e: std::num::ParseIntError) -> Self {
        Self::ParseError(e)
    }
}

(Again, this is boilerplate that can be autogenerated, this time by adding a #[from] tag to the variants you want a From impl on, and thiserror will take care of it for you.) In any event, no matter how you get the From impls, once you have them, the string-handling code becomes practically error-handling-free:

    Ok(
        String::from_utf8(buf)?
            .parse::<u32>()?
    )

The ? operator will automatically convert the error from the types returned from each method into the return error type, using From. The only tiny downside to this is that the ? at the end strips the Result, and so we've got to wrap the returned value in Ok() to turn it back into a Result for returning. But I think that's a small price to pay for the removal of those .map_err() calls. In many cases, my coding process involves just putting a ? after every call that returns a Result, and adding a new Error variant whenever the compiler complains about not being able to convert some new error type. It's practically zero effort for an outstanding outcome for the mediocre programmer.
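Putting the pieces together, the tidied-up function ends up looking roughly like this (untested, just like the original; the NUL-trimming before parse() is my own addition to cope with the fixed 30-byte buffer, and the Error constructors and From impls are assumed from the fragments above):

use std::fs::File;
use std::io::Read;

fn read_u32_from_file(name: impl AsRef<str>) -> Result<u32, Error> {
    let mut f = File::open(name.as_ref())
        .map_err(|e| Error::file_open_error(name.as_ref(), e))?;
    let mut buf = vec![0u8; 30];
    f.read(&mut buf).map_err(Error::read_error)?;
    // FromUtf8Error and ParseIntError convert automatically via the From impls.
    Ok(String::from_utf8(buf)?
        .trim_end_matches('\0')
        .trim()
        .parse::<u32>()?)
}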

Just Because You're Mediocre, Doesn't Mean You Can't Get Better To finish off, I'd like to point out that mediocrity doesn't imply shoddy work, nor does it mean that you shouldn't keep learning and improving your craft. One book that I've recently found extremely helpful is Effective Rust, by David Drysdale. The author has very kindly put it up to read online, but buying a (paper or ebook) copy would no doubt be appreciated. The thing about this book, for me, is that it is very readable, even by us mediocre programmers. The sections are written in a way that really clicked with me. Some aspects of Rust that I'd had trouble understanding for a long time, such as lifetimes and the borrow checker, and particularly lifetime elision, actually made sense after I'd read the appropriate sections.

Finally, a Quick Beg I'm currently subsisting on the kindness of strangers, so if you found something useful (or entertaining) in this post, why not buy me a refreshing beverage? It helps to know that people like what I'm doing, and helps keep me from having to sell my soul to a private equity firm.

28 April 2024

Russell Coker: Kitty and Mpv

6 months ago I switched to Kitty for terminal emulation [1]. So far there's only been one thing that I couldn't effectively do with Kitty that I did with Konsole in the past, that is watching a music video in 1/4 of the screen while using the rest for terminals. I could set up multiple Kitty windows taking up the rest of the screen, but I wanted to keep using a single Kitty with multiple terminals and just have mpv go over one of them. Kitty supports its own graphical interface, so mpv --vo=kitty works, but it took 6x the CPU power in my tests, which isn't good for a laptop. For X11 there's an ontop option for mpv that does what you expect, but that doesn't work on Wayland. Not working is mostly Wayland's fault, as there is a long tail of less commonly used graphical operations that work in X11 but aren't yet implemented in Wayland. I have filed a Debian bug report about this; the mpv man page should note that it's only going to work on X11 on Linux. I have discovered a solution to that: in the KDE settings there's a Window Rules section, I created an entry for "Window class" exactly matching "mpv" and then added a rule "Keep above other windows" and set it to "force" and "yes". After that I can just resize mpv to occlude just one terminal and keep using the rest. Also one noteworthy thing with this is that it makes mpv go on top of the KDE taskbar, which can be a feature.

26 April 2024

Russell Coker: Humane AI Pin

I wrote a blog post The Shape of Computers [1] exploring ideas of how computers might evolve and how we can use them. One of the devices I mentioned was the Humane AI Pin, which has just been the recipient of one of the biggest roast reviews I've ever seen [2], good work Marques Brownlee! As an aside I was once given a product to review which didn't work nearly as well as I think it should have worked, so I sent an email to the developers saying "sorry, this product failed to work well so I can't say anything good about it" and didn't publish a review. One of the first things that caught my attention in the review is the note that the AI Pin doesn't connect to your phone. I think that everything should connect to everything else as a usability feature. For security we don't want so much connecting, and it's quite reasonable to turn off various connections at appropriate times for security; the Librem5 is an example of how this can be done with hardware switches to disable Wifi etc. But to just not have connectivity is bad. The next noteworthy thing is the external battery which also acts as a magnetic attachment from inside your shirt. So I guess it's using wireless charging through your shirt. A magnetically attached external battery would be a great feature for a phone: you could quickly swap a discharged battery for a fresh one and keep using it. When I tried to make the PinePhonePro my daily driver [3] I gave up, and charging was one of the main reasons. One thing I learned from my experiment with the PinePhonePro is that the ratio of charge time to discharge time is sometimes more important than battery life, and being able to quickly swap batteries without rebooting is a way of solving that. The reviewer of the AI Pin complains later in the video about battery life, which seems to be partly due to wireless charging from the detachable battery and partly due to being physically small. It seems the phablet form factor is the smallest viable personal computer at this time. The review glosses over what could be regarded as the 2 worst issues of the device. It does everything via the cloud (where "the cloud" means "a computer owned by someone I probably shouldn't trust") and it records everything. Strange that it's not getting the hate the Google Glass got. The user interface based on laser projection of menus on the palm of your hand is an interesting concept. I'd rather have a Bluetooth attached tablet or something for operations that can't be conveniently done with voice. The reviewer harshly criticises the laser projection interface later in the video; maybe the technology isn't yet adequate to implement this properly. The first criticism of the device in the review part of the video is of the time taken to answer questions, especially when Internet connectivity is poor. His question "who designed the Washington Monument" took 8 seconds to start being answered in his demonstration. I asked the Alpaca LLM the same question running on 4 cores of an E5-2696 and it took 10 seconds to start answering and then printed the words at about speaking speed. So if we had a free software based AI device for this purpose it shouldn't be difficult to get local LLM computation with less delay than the Humane device by simply providing more compute power than 4 cores of an E5-2696v3. How does a 32 core 1.05GHz Mali G72 from 2017 (as used in the Galaxy Note 9) compare to 4 cores of a 2.3GHz Intel CPU from 2015?
Passmark says that Intel CPU can do 48GFlop with all 18 cores, so 4 cores can presumably do about 10GFlop, which seems less than the claimed 20-32GFlop of the Mali G72. It seems that with the right software even older Android phones could give adequate performance for a local LLM. The Alpaca model I'm testing with takes 4.2G of RAM to run, which is usable in a Note 9 with 8G of RAM or a Pixel 8 Pro with 12G. A Pixel 8 Pro could have 4.2G of RAM reserved for a LLM and still have as much RAM for other purposes as my main laptop as of a few months ago. I consider the speed of Alpaca on my workstation to be acceptable but not great. If we can get FOSS phones running a LLM at that speed then I think it would be great for a first version; we can always rely on newer and faster hardware becoming available. Marques notes that the cause of some of the problems is likely due to a desire to make it a separate powerful product in the future, and that if they gave it phone connectivity at the start they would have to remove that later on. I think that the real problem is that the profit motive is incompatible with good design. They want to have a product that's stand-alone and justifies the purchase price plus subscription, and that means not making it a "phone accessory". While I think that the best thing for the user is to allow it to talk to a phone, a PC, a car, and anything else the user wants. He compares it to the Apple Vision Pro which has the same issue of trying to be a stand-alone computer but not being properly capable of it. One of the benefits that Marques cites for the AI Pin is the ability to capture voice notes. Dictaphones have been around for over 100 years and very few people have bought them, not even in the 80s when they became cheap. While almost everyone can occasionally benefit from being able to make a note of an idea when it's not convenient to write it down, there are few people who need it enough to carry a separate device, not even if that device is tiny. But a phone as a general purpose computing device with microphone can easily be adapted to such things. One possibility would be to program a phone to start a voice note when the volume up and down buttons are pressed at the same time or when some other condition is met. Another possibility is to have a phone have a hotkey function that varies by what you are doing, e.g. if bushwalking have the hotkey take a photo, or if on a flight have it take a voice note. On the Mobile Apps page on the Debian wiki I created a section for categories of apps that I think we need [4]. In that section I added the following list:
  1. Voice input for dictation
  2. Voice assistant like Google/Apple
  3. Voice output
  4. Full operation for visually impaired people
One thing I really like about the AI Pin is that it has the potential to become a really good computing and personal assistant device for visually impaired people, funded by people with full vision who want to legally control a computer while driving etc. I have some concerns about the potential uses of the AI Pin while driving (as Marques stated an aim to do), but if it replaces the use of regular phones while driving it will make things less bad. Marques concludes his video by warning against buying a product based on the promise of what it can be in future. I bought the Librem5 on exactly that promise; the difference is that I have the source and the ability to help make the promise come true. My aim is to spend thousands of dollars on test hardware and thousands of hours of development time to help make FOSS phones a product that most people can use at low price with little effort. Another interesting review of the pin is by Mrwhostheboss [5]; one of his examples is of asking the pin for advice about a chair but, without him knowing, the pin selected a different chair in the room. He compares this to using Google's apps on a phone and seeing which item the app has selected. He also said that he doesn't want to make an order based on speech; he wants to review a page of information about it. I suspect that the design of the pin had too much input from people accustomed to asking a corporate travel office to find them a flight and not enough from people who look through the details of the results of flight booking services trying to save an extra $20. Some people might say "if you need to save $20 on a flight then a $24/month subscription computing service isn't for you"; I reject that argument. I can afford lots of computing services because I try to get the best deal on every moderately expensive thing I pay for. Another point that Mrwhostheboss makes is regarding secret SMS: you probably wouldn't want to speak an SMS you are sending to your SO while waiting for a train. He makes it clear that changing between phone and pin while sharing resources (i.e. not having a separate phone number and separate data store) is a desired feature. The most insightful point Mrwhostheboss made was when he suggested that if the pin had come out before the smartphone then things might have all gone differently, but now anything that's developed has to be based around the expectations of phone use. This is something we need to keep in mind when developing FOSS software: there's lots of different ways that things could be done but we need to meet the expectations of users if we want our software to be used by many people. I previously wrote a blog post titled Considering Convergence [6] about the possible ways of using a phone as a laptop. While I still believe what I wrote there, I'm now considering the possibility of ease of movement of work in progress as a way of addressing some of the same issues. I've written a blog post about Convergence vs Transferrence [7].

25 April 2024

Dirk Eddelbuettel: RQuantLib 0.4.22 on CRAN: Maintenance

A new minor release 0.4.22 of RQuantLib arrived at CRAN earlier today, and has been uploaded to Debian. QuantLib is a rather comprehensive free/open-source library for quantitative finance. RQuantLib connects (some parts of) it to the R environment and language, and has been part of CRAN for more than twenty years (!!) as it was one of the first packages I uploaded there. This release of RQuantLib updates to QuantLib version 1.34, which was just released yesterday, and deprecates use of an access point / type for price/yield conversion for bonds. We also made two minor earlier changes.

Changes in RQuantLib version 0.4.22 (2024-04-25)
  • Small code cleanup removing duplicate R code
  • Small improvements to C++ compilation flags
  • Robustify internal version comparison to accommodate RC releases
  • Adjustments to two C++ files for QuantLib 1.34

Courtesy of my CRANberries, there is also a diffstat report for this release. As always, more detailed information is on the RQuantLib page. Questions, comments etc. should go to the rquantlib-devel mailing list. Issue tickets can be filed at the GitHub repo. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Russ Allbery: Review: Nation

Review: Nation, by Terry Pratchett
Publisher: Harper
Copyright: 2008
Printing: 2009
ISBN: 0-06-143303-9
Format: Trade paperback
Pages: 369
Nation is a stand-alone young adult fantasy novel. It was published in the gap between Discworld novels Making Money and Unseen Academicals. Nation starts with a plague. The Russian influenza has ravaged Britain, including the royal family. The next in line to the throne is off on a remote island and must be retrieved and crowned as soon as possible, or an obscure provision in Magna Carta will cause no end of trouble. The Cutty Wren is sent on this mission, carrying the Gentlemen of Last Resort. Then comes the tsunami. In the midst of fire raining from the sky and a wave like no one has ever seen, Captain Roberts tied himself to the wheel of the Sweet Judy and steered it as best he could, straight into an island. The sole survivor of the shipwreck: one Ermintrude Fanshaw, daughter of the governor of some British island possessions. Oh, and a parrot. Mau was on the Boys' Island when the tsunami came, going through his rite of passage into manhood. He was to return to the Nation the next morning and receive his tattoos and his adult soul. He survived in a canoe. No one else in the Nation did. Terry Pratchett considered Nation to be his best book. It is not his best book, at least in my opinion; it's firmly below the top tier of Discworld novels, let alone Night Watch. It is, however, an interesting and enjoyable book that tackles gods and religion with a sledgehammer rather than a knife. It's also very, very dark and utterly depressing at the start, despite a few glimmers of Pratchett's humor. Mau is the main protagonist at first, and the book opens with everyone he cares about dying. This is the place where I thought Pratchett diverged the most from his Discworld style: in Discworld, I think most of that would have been off-screen, but here we follow Mau through the realization, the devastation, the disassociation, the burials at sea, the thoughts of suicide, and the complete upheaval of everything he thought he was or was about to become. I found the start of this book difficult to get through. The immediate transition into potentially tragic misunderstandings between Mau and Daphne (as Ermintrude names herself once there is no one to tell her not to) didn't help. As I got farther into the book, though, I warmed to it. The best parts early on are Daphne's baffled but scientific attempts to understand Mau's culture and her place in it. More survivors arrive, and they start to assemble a community, anchored in large part by Mau's stubborn determination to do what's right even though he's lost all of his moorings. That community eventually re-establishes contact with the rest of the world and the opening plot about the British monarchy, but not before Daphne has been changed profoundly by being part of it. I think Pratchett worked hard at keeping Mau's culture at the center of the story. It's notable that the community that reforms over the course of the book essentially follows the patterns of Mau's lost Nation and incorporates Daphne into it, rather than (as is so often the case) the other way around. The plot itself is fiercely anti-colonial in a way that mostly worked. Still, though, it's a quasi-Pacific-island culture written by a white British man, and I had some qualms. Pratchett quite rightfully makes it clear in the afterward that this is an alternate world and Mau's culture is not a real Pacific island culture. 
However, that also means that its starkly gender-essentialist nature was a free choice, rather than one based on some specific culture, and I found that choice somewhat off-putting. The religious rituals are all gendered, the dwelling places are gendered, and one's entire life course in Mau's world seems based on binary classification as a man or a woman. Based on Pratchett's other books, I assume this was more an unfortunate default than a deliberate choice, but it's still a choice he could have avoided. The end of this book wrestles directly with the relative worth of Mau's culture versus that of the British. I liked most of this, but the twists that Pratchett adds to avoid the colonialist results we saw in our world stumble partly into the trap of making Mau's culture valuable by British standards. (I'm being a bit vague here to avoid spoilers.) I think it is very hard to base this book on a different set of priorities and still bring the largely UK, US, and western European audience along, so I don't blame Pratchett for failing to do it, but I'm a bit sad that the world still revolved around a British axis. This felt quite similar to Discworld to me in its overall sensibilities, but with the roles of moral philosophy and humor reversed. Discworld novels usually start with some larger-than-life characters and an absurd plot, and then the moral philosophy sneaks up behind you when you're not looking and hits you over the head. Nation starts with the moral philosophy: Mau wrestles with his gods and the problem of evil in a way that reminded me of Job, except with a far different pantheon and rather less tolerance for divine excuses on the part of the protagonist. It's the humor, instead, that sneaks up on you and makes you laugh when the plot is a bit too much. But the mix arrives at much the same place: the absurd hand-in-hand with the profound, and all seen from an angle that makes it a bit easier to understand. I'm not sure I would recommend Nation as a good place to start with Pratchett. I felt like I benefited from having read a lot of Discworld to build up my willingness to trust where Pratchett was going. But it has the quality of writing of late Discworld without the (arguable) need to read 25 books to understand all of the backstory. Regardless, recommended, and you'll never hear Twinkle Twinkle Little Star in quite the same way again. Rating: 8 out of 10

22 April 2024

Vincent Fourmond: QSoas version 3.3 is out

Version 3.3 brings in new features, including reverse Laplace transforms and fits, pH fits, commands for picking points from a dataset, averaging points with the same X value, or performing singular value decomposition. In addition to these new features, many previous commands were improved, like the addition of a bandcut filter in FFT filtering, better handling of the loading of files produced by QSoas itself, and a button to interrupt the processing of scripts. There are a lot of other new features, improvements and so on; look for the full list there. About QSoas
QSoas is a powerful open source data analysis program that focuses on flexibility and powerful fitting capacities. It is released under the GNU General Public License. It is described in Fourmond, Anal. Chem., 2016, 88 (10), pp 5050-5052. The current version is 3.3. You can download its source code or precompiled versions for MacOS and Windows for free there. Alternatively, you can clone from the GitHub repository.

18 April 2024

Thomas Koch: Minimal overhead VMs with Nix and MicroVM

Posted on March 17, 2024
Joachim Breitner wrote about a Convenient sandboxed development environment and thus reminded me to blog about MicroVM. I've toyed around with it a little but not yet seriously used it, as I'm currently not coding. MicroVM is a nix based project to configure and run minimal VMs. It can mount and thus reuse the host's nix store inside the VM and thus has a very small disk footprint. I use MicroVM on a Debian system using the nix package manager. The MicroVM author uses the project to host production services. Otherwise I consider it also a nice way to learn about NixOS after having started with the nix package manager and before making the big step to NixOS as my main system. The guest's root filesystem is a tmpdir, so one must explicitly define folders that should be mounted from the host and thus be persistent across VM reboots. I defined the VM as a nix flake since this is how I started from the MicroVM project's example:
 
{
  description = "Haskell dev MicroVM";
  inputs.impermanence.url = "github:nix-community/impermanence";
  inputs.microvm.url = "github:astro/microvm.nix";
  inputs.microvm.inputs.nixpkgs.follows = "nixpkgs";
  outputs = { self, impermanence, microvm, nixpkgs }:
    let
      persistencePath = "/persistent";
      system = "x86_64-linux";
      user = "thk";
      vmname = "haskell";
      nixosConfiguration = nixpkgs.lib.nixosSystem {
          inherit system;
          modules = [
            microvm.nixosModules.microvm
            impermanence.nixosModules.impermanence
            ({ pkgs, ... }: {
              environment.persistence.${persistencePath} = {
                hideMounts = true;
                users.${user} = {
                  directories = [
                    "git" ".stack"
                  ];
                };
              };
              environment.sessionVariables = {
                TERM = "screen-256color";
              };
              environment.systemPackages = with pkgs; [
                ghc
                git
                (haskell-language-server.override { supportedGhcVersions = [ "94" ]; })
                htop
                stack
                tmux
                tree
                vcsh
                zsh
              ];
              fileSystems.${persistencePath}.neededForBoot = nixpkgs.lib.mkForce true;
              microvm = {
                forwardPorts = [
                  { from = "host"; host.port = 2222; guest.port = 22; }
                  { from = "guest"; host.port = 5432; guest.port = 5432; } # postgresql
                ];
                hypervisor = "qemu";
                interfaces = [
                  { type = "user"; id = "usernet"; mac = "00:00:00:00:00:02"; }
                ];
                mem = 4096;
                shares = [ {
                  # use "virtiofs" for MicroVMs that are started by systemd
                  proto = "9p";
                  tag = "ro-store";
                  # a host's /nix/store will be picked up so that no
                  # squashfs/erofs will be built for it.
                  source = "/nix/store";
                  mountPoint = "/nix/.ro-store";
                } {
                  proto = "virtiofs";
                  tag = "persistent";
                  source = "~/.local/share/microvm/vms/${vmname}/persistent";
                  mountPoint = persistencePath;
                  socket = "/run/user/1000/microvm-${vmname}-persistent";
                }
                ];
                socket = "/run/user/1000/microvm-control.socket";
                vcpu = 3;
                volumes = [];
                writableStoreOverlay = "/nix/.rwstore";
              };
              networking.hostName = vmname;
              nix.enable = true;
              nix.nixPath = ["nixpkgs=${builtins.storePath <nixpkgs>}"];
              nix.settings = {
                extra-experimental-features = ["nix-command" "flakes"];
                trusted-users = [user];
              };
              security.sudo = {
                enable = true;
                wheelNeedsPassword = false;
              };
              services.getty.autologinUser = user;
              services.openssh = {
                enable = true;
              };
              system.stateVersion = "24.11";
              systemd.services.loadnixdb = {
                description = "import hosts nix database";
                path = [pkgs.nix];
                wantedBy = ["multi-user.target"];
                requires = ["nix-daemon.service"];
                script = "cat ${persistencePath}/nix-store-db-dump | nix-store --load-db";
              };
              time.timeZone = nixpkgs.lib.mkDefault "Europe/Berlin";
              users.users.${user} = {
                extraGroups = [ "wheel" "video" ];
                group = "user";
                isNormalUser = true;
                openssh.authorizedKeys.keys = [
                  "ssh-rsa REDACTED"
                ];
                password = "";
              };
              users.users.root.password = "";
              users.groups.user = { };
            })
          ];
        };
    in {
      packages.${system}.default = nixosConfiguration.config.microvm.declaredRunner;
    };
}
I start the microVM with a templated systemd user service:
[Unit]
Description=MicroVM for Haskell development
Requires=microvm-virtiofsd-persistent@.service
After=microvm-virtiofsd-persistent@.service
AssertFileNotEmpty=%h/.local/share/microvm/vms/%i/flake/flake.nix
[Service]
Type=forking
ExecStartPre=/usr/bin/sh -c "[ /nix/var/nix/db/db.sqlite -ot %h/.local/share/microvm/nix-store-db-dump ] || nix-store --dump-db >%h/.local/share/microvm/nix-store-db-dump"
ExecStartPre=ln -f -t %h/.local/share/microvm/vms/%i/persistent/ %h/.local/share/microvm/nix-store-db-dump
ExecStartPre=-%h/.local/state/nix/profile/bin/tmux new -s microvm -d
ExecStart=%h/.local/state/nix/profile/bin/tmux new-window -t microvm: -n "%i" "exec %h/.local/state/nix/profile/bin/nix run --impure %h/.local/share/microvm/vms/%i/flake"
The above service definition creates a dump of the host's nix store db so that it can be imported in the guest. This is necessary so that the guest can actually use what is available in /nix/store. There is an effort for an overlayed nix store that would be preferable to this hack. Finally the microvm is started inside a tmux session named "microvm". This way I can use the VM with SSH or through the console and also access the qemu console. And for completeness the virtiofsd service:
[Unit]
Description=serve host persistent folder for dev VM
AssertPathIsDirectory=%h/.local/share/microvm/vms/%i/persistent
[Service]
ExecStart=%h/.local/state/nix/profile/bin/virtiofsd \
 --socket-path=${XDG_RUNTIME_DIR}/microvm-%i-persistent \
 --shared-dir=%h/.local/share/microvm/vms/%i/persistent \
 --gid-map :995:%G:1: \
 --uid-map :1000:%U:1:

Thomas Koch: Rebuild search with trust

Posted on January 20, 2024
Finally there is a thing people can agree on: Apparently, Google Search is not good anymore. And I'm not the only one thinking about decentralization to fix it: "Honey I federated the search engine - finding stuff online post-big tech" - a lightning talk at the recent Chaos Communication Congress. The speaker however did not mention that there have already been many attempts at building distributed search engines. So why do I think that such an attempt could finally succeed? My definition of success is:
A mildly technical computer user (able to install software) has access to a search engine that provides them with superior search results compared to Google for at least a few predefined areas of interest.
The exact algorithm used by Google Search to rank websites is a secret even to most Googlers. Still it is clear that it relies heavily on big data: billions of queries, a comprehensive web index and user behaviour data. All of this is not available to us. A distributed search engine however can instead rely on user input. Every admin of one node seeds the node ranking with their personal selection of trusted sites. They connect their node with nodes of people they trust. This results in a web of (transitive) trust, much like PGP. For comparison, imagine you are searching for something in a world without computers: You ask the people around you. They probably forward your question to their peers. I already had a look at YaCy. It is active, somewhat usable and has a friendly maintainer. Unfortunately I consider the codebase to show its age. It takes a lot of time for a newcomer to find their way around and it contains a lot of cruft. Nevertheless, YaCy is a good example that a decentralized search software can be done even by a small team or just one person. I myself started working on a software in Haskell and keep my notes here: Populus:DezInV. Since I'm learning Haskell along the way, there is nothing there to see yet. Additionally I took a yak shaving break to learn nix. By the way: DuckDuckGo is not the alternative. And while I would encourage you to also try Yandex for a second opinion, I don't consider this a solution.

Thomas Koch: Using nix package manager in Debian

Posted on January 16, 2024
The nix package manager is available in Debian since May 2020. Why would one use it in Debian? Especially the last point nagged me every time I set up a new Debian installation. My emacs configuration and my Desktop setup expect certain software to be installed. Please be aware that I'm a beginner with nix and that my config might not follow best practice. Additionally many nix users are already using the new flakes feature of nix that I'm still learning about. So I've got this file at .config/nixpkgs/config.nix1:
with (import <nixpkgs> { });
{
  packageOverrides = pkgs: with pkgs; {
    thk-emacsWithPackages = (pkgs.emacsPackagesFor emacs-gtk).emacsWithPackages (
      epkgs:
      (with epkgs.elpaPackages; [
        ace-window
        company
        org
        use-package
      ]) ++ (with epkgs.melpaPackages; [
        editorconfig
        flycheck
        haskell-mode
        magit
        nix-mode
        paredit
        rainbow-delimiters
        treemacs
        visual-fill-column
        yasnippet-snippets
      ]) ++ [    # From main packages set
      ]
    );

    userPackages = buildEnv {
      extraOutputsToInstall = [ "doc" "info" "man" ];
      name = "user-packages";
      paths = [
        ghc
        git
        (pkgs.haskell-language-server.override { supportedGhcVersions = [ "94" ]; })
        nix
        stack
        thk-emacsWithPackages
        tmux
        vcsh
        virtiofsd
      ];
    };
  };
}
Every time I change the file or want to receive updates, I do:
nix-env --install --attr nixpkgs.userPackages --remove-all
You can see that I install nix with nix. This gives me a newer version than the one available in Debian stable. However, the nix-daemon still runs as the older binary from Debian. My dirty hack is to put this override in /etc/systemd/system/nix-daemon.service.d/override.conf:
[Service]
ExecStart=
ExecStart=@/home/thk/.local/state/nix/profile/bin/nix-daemon nix-daemon --daemon
I'm not too interested in a cleaner way since I hope to fully migrate to Nix anyways.

  1. Note the nixpkgs in the path. This is not a config file for nix the package manager but for the nix package collection. See the nixpkgs manual.

Thomas Koch: Chromium gtk-filechooser preview size

Posted on January 9, 2024
I wanted to report this issue in Chromium's issue tracker, but it gave me:
Something went wrong, please try again later.
Ok, then at least let me reply to this askubuntu question. But my attempt to signup with my launchpad account gave me:
Launchpad Login Failed. Please try logging in again.
I refrain from commenting on this to not violate some code of conduct. So this is what I wanted to write:
GTK file chooser image preview size should be configurable The file chooser that appears when uploading a file (e.g. an image to Google Fotos) learned to show a preview in issue 15500. The preview image size is hard coded to 256x512 in kPreviewWidth and kPreviewHeight in ui/gtk/select_file_dialog_linux_gtk.cc. Please make the size configurable. On high DPI screens the images are too small to be of much use.
Yes, I should not use chromium anymore.

Thomas Koch: Good things come ... state folder

Posted on January 2, 2024
Just a little while ago (10 years) I proposed the addition of a state folder to the XDG basedir specification and expanded the article XDGBaseDirectorySpecification in the Debian wiki. Recently I learned that version 0.8 (from May 2021) of the spec finally includes a state folder. Granted, I wasn't the first to have this idea (2009), nor the one who actually made it happen. Now, please go ahead and use it! Thank you.

15 April 2024

Andreas Rönnquist: Status update for Allegro packaging in Debian

I have mailed a Debian bug on allegro4.4 describing my reasoning regarding the allegro libraries: in short, allegro4.4 is pretty much dead upstream, and my interest was basically to keep alex4 (which is cool) in Debian, but since it migrated to non-free, my interest in allegro4.4 has waned. So, if anybody would like to still see allegro4.4 in Debian, please step up now and help out. Since it is dead upstream, my reasoning is that it is better to remove it from Debian if no maintainer who wants to help steps up. Previously Tobias Hansen has helped out, but now it is 8 (!) years since his last upload of either package. (Please don't interpret this as judgement, I am very happy for the help he has provided and all the work he has done on the packages). Allegro5 is another deal: still active upstream, and I have kept it up to date in Debian, and while I have held the latest upload a short while because of the time_t transition, it will come sooner or later. There I am also waiting on a final decision on this bug from upstream. Other than that allegro5 is in a very good state, and I will keep maintaining it as long as I can. But help would of course be appreciated on allegro5 too.

13 April 2024

Simon Josefsson: Reproducible and minimal source-only tarballs

With the release of Libntlm version 1.8 the release tarball can be reproduced on several distributions. We also publish a signed minimal source-only tarball, produced by git-archive, which is the same format used by Savannah, Codeberg, GitLab, GitHub and others. Reproducibility of both tarballs is tested continuously for regressions on GitLab through a CI/CD pipeline. If that wasn't enough to excite you, the Debian packages of Libntlm are now built from the reproducible minimal source-only tarball. The resulting binaries are reproducible on several architectures. What does that even mean? Why should you care? How can you do the same for your project? What are the open issues? Read on, dear reader. This article describes my practical experiments with reproducible release artifacts, following up on my earlier thoughts that led to discussion on Fosstodon and a patch by Janneke Nieuwenhuizen to make Guix tarballs reproducible, which inspired me to do some practical work. Let's look at how a maintainer releases some software, and how a user can reproduce the released artifacts from the source code. Libntlm provides a shared library written in C and uses GNU Make, GNU Autoconf, GNU Automake, GNU Libtool and gnulib for build management, but these ideas should apply to most projects and build systems. The following illustrates the steps a maintainer would take to prepare a release:
git clone https://gitlab.com/gsasl/libntlm.git
cd libntlm
git checkout v1.8
./bootstrap
./configure
make distcheck
gpg -b libntlm-1.8.tar.gz
The generated files libntlm-1.8.tar.gz and libntlm-1.8.tar.gz.sig are published, and users download and use them. This is how the GNU project has been doing releases since the late 1980s. That is a testament to how successful this pattern has been! These tarballs contain source code and some generated files: typically shell scripts generated by autoconf, makefile templates generated by automake, and documentation in formats like Info, HTML, or PDF. Rarely do they contain binary object code, but historically that has happened. The XZUtils incident illustrates that tarballs with files that are not included in the git archive offer an opportunity to disguise malicious backdoors. I blogged earlier about how to mitigate this risk by using signed minimal source-only tarballs. The risk of hiding malware is not the only motivation to publish signed minimal source-only tarballs. With pre-generated content in tarballs, there is a risk that GNU/Linux distributions such as Trisquel, Guix, Debian/Ubuntu or Fedora ship generated files coming from the tarball into the binary *.deb or *.rpm package file. Typically the person packaging the upstream project never realized that some installed artifacts were not re-built through a typical autoreconf -fi && ./configure && make install sequence, and never wrote the code to rebuild everything. This can also happen if the build rules are written but are buggy, shipping the old artifact. When a security problem is found, this can lead to time-consuming situations, as it may be that patching the relevant source code and rebuilding the package is not sufficient: the vulnerable generated object from the tarball would be shipped into the binary package instead of a rebuilt artifact. For architecture-specific binaries this rarely happens, since object code is usually not included in tarballs, although for 10+ years I shipped the binary Java JAR file in the GNU Libidn release tarball, until I stopped shipping it. For interpreted languages, and especially for generated content such as HTML, PDF and shell scripts, this happens more than you would like. Publishing minimal source-only tarballs enables easier auditing of a project's code, avoiding the need to read through all generated files looking for malicious content. I have taken care to generate the source-only minimal tarball using git-archive. This is the same format that GitLab, GitHub etc offer for the automated download links on git tags. The minimal source-only tarballs can thus serve as a way to audit GitLab and GitHub download material! Consider if/when hosting sites like GitLab or GitHub have a security incident that causes generated tarballs to include a backdoor that is not present in the git repository. If people rely on the tag download artifact without verifying the maintainer PGP signature using GnuPG, this can lead to backdoor scenarios similar to the one we had for XZUtils, but originating with the hosting provider instead of the release manager. This is even more concerning, since this attack can be mounted for some selected IP addresses that you want to target and not for everyone, thereby making it harder to discover. With all that discussion and rationale out of the way, let's return to the release process. I have added another step here:
make srcdist
gpg -b libntlm-1.8-src.tar.gz
Now the release is ready. I publish these four files in Libntlm's Savannah Download area, but they can be uploaded to a GitLab/GitHub release area as well. These are the SHA256 checksums I got after building the tarballs on my Trisquel 11 aramo laptop:
91de864224913b9493c7a6cec2890e6eded3610d34c3d983132823de348ec2ca  libntlm-1.8-src.tar.gz
ce6569a47a21173ba69c990965f73eb82d9a093eb871f935ab64ee13df47fda1  libntlm-1.8.tar.gz
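As an aside, a downstream user who fetched the published tarballs (for example from the Savannah download area mentioned above) could check them against these values with a small Python sketch like the following; the filenames and checksums are the ones listed above, and this is only an illustration, not part of the Libntlm release tooling. Checking a checksum only ties the download to this posting, of course; verifying the GnuPG signature is still what ties it to the maintainer.
import hashlib

# Expected SHA256 checksums, copied from the listing above.
EXPECTED = {
    "libntlm-1.8-src.tar.gz": "91de864224913b9493c7a6cec2890e6eded3610d34c3d983132823de348ec2ca",
    "libntlm-1.8.tar.gz": "ce6569a47a21173ba69c990965f73eb82d9a093eb871f935ab64ee13df47fda1",
}

for filename, expected in EXPECTED.items():
    digest = hashlib.sha256()
    with open(filename, "rb") as f:
        # Hash in chunks to keep memory use constant for large tarballs.
        for chunk in iter(lambda: f.read(1 << 16), b""):
            digest.update(chunk)
    print(filename, "OK" if digest.hexdigest() == expected else "MISMATCH")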
So how can you reproduce my artifacts? Here is how to reproduce them in an Ubuntu 22.04 container:
podman run -it --rm ubuntu:22.04
apt-get update
apt-get install -y --no-install-recommends autoconf automake libtool make git ca-certificates
git clone https://gitlab.com/gsasl/libntlm.git
cd libntlm
git checkout v1.8
./bootstrap
./configure
make dist srcdist
sha256sum libntlm-*.tar.gz
You should see the exact same SHA256 checksum values. Hooray! This works because Trisquel 11 and Ubuntu 22.04 use the same versions of git, autoconf, automake, and libtool. These tools do not guarantee the same output content for all versions, similar to how GNU GCC does not generate the same binary output for all versions. So there is still some delicate version pairing needed. Ideally, it should be possible to reproduce the artifacts from the release artifacts themselves, and not only directly from git. It is possible to reproduce the full tarball in an AlmaLinux 8 container (replace almalinux:8 with rockylinux:8 if you prefer RockyLinux):
podman run -it --rm almalinux:8
dnf update -y
dnf install -y make wget gcc
wget https://download.savannah.nongnu.org/releases/libntlm/libntlm-1.8.tar.gz
tar xfa libntlm-1.8.tar.gz
cd libntlm-1.8
./configure
make dist
sha256sum libntlm-1.8.tar.gz
The source-only minimal tarball can be regenerated on Debian 11:
podman run -it --rm debian:11
apt-get update
apt-get install -y --no-install-recommends make git ca-certificates
git clone https://gitlab.com/gsasl/libntlm.git
cd libntlm
git checkout v1.8
make -f cfg.mk srcdist
sha256sum libntlm-1.8-src.tar.gz 
As the magnum opus or chef-d'œuvre, let's recreate the full tarball directly from the minimal source-only tarball on Trisquel 11 (replace docker.io/kpengboy/trisquel:11.0 with ubuntu:22.04 if you prefer):
podman run -it --rm docker.io/kpengboy/trisquel:11.0
apt-get update
apt-get install -y --no-install-recommends autoconf automake libtool make wget git ca-certificates
wget https://download.savannah.nongnu.org/releases/libntlm/libntlm-1.8-src.tar.gz
tar xfa libntlm-1.8-src.tar.gz
cd libntlm-v1.8
./bootstrap
./configure
make dist
sha256sum libntlm-1.8.tar.gz
Yay! You should now have great confidence that the release artifacts correspond to what's in version control and also to what the maintainer intended to release. Your remaining job is to audit the source code for vulnerabilities, including the source code of the dependencies used in the build. You no longer have to worry about auditing the release artifacts. I find it somewhat amusing that the build infrastructure for Libntlm is now in a significantly better place than the code itself. Libntlm is written in old C style with plenty of string manipulation and uses broken cryptographic algorithms such as MD4 and single-DES. Remember folks: solving supply chain security issues has no bearing on what kind of code you eventually run. A clean gun can still shoot you in the foot. Side note on naming: GitLab exports tarballs with pathnames libntlm-v1.8/ (i.e., PROJECT-TAG/) and I've adopted the same pathnames, which means my libntlm-1.8-src.tar.gz tarballs are bit-by-bit identical to GitLab's exports, and you can verify this with tools like diffoscope. GitLab names the tarball libntlm-v1.8.tar.gz (i.e., PROJECT-TAG.ARCHIVE), which I find too similar to the libntlm-1.8.tar.gz that we also publish. GitHub uses the same git archive style, but unfortunately they have logic that removes the v in the pathname, so you will get a tarball with pathname libntlm-1.8/ instead of the libntlm-v1.8/ that GitLab and I use. The content of the tarball is bit-by-bit identical, but the pathname and archive name differ. Codeberg (running Forgejo) uses another approach: the tarball is called libntlm-v1.8.tar.gz (after the tag) just like GitLab, but the pathname inside the archive is libntlm/; otherwise the produced archive is bit-by-bit identical, including timestamps. Savannah's cgit interface uses the archive name libntlm-1.8.tar.gz with pathname libntlm-1.8/, but otherwise the file content is identical. Savannah's GitWeb interface provides snapshot links that are named after the git commit (e.g., libntlm-a812c2ca.tar.gz with libntlm-a812c2ca/) and I cannot find any tag-based download links at all. Overall, we are so close to getting the SHA256 checksums to match, but fail on the pathname within the archive. I've chosen to be compatible with GitLab regarding the content of tarballs but not on archive naming. From a simplicity point of view, it would be nice if everyone used PROJECT-TAG.ARCHIVE for the archive filename and PROJECT-TAG/ for the pathname within the archive. This aspect will probably need more discussion. Side note on git archive output: it seems different versions of git archive produce different results for the same repository. The version of git in Debian 11, Trisquel 11 and Ubuntu 22.04 behaves the same. The version of git in Debian 12, AlmaLinux/RockyLinux 8/9, Alpine, ArchLinux, macOS Homebrew, and the upcoming Ubuntu 24.04 behaves in another way. Hopefully this will not change that often, but it would invalidate reproducibility of these tarballs in the future, forcing you to use an old git release to reproduce the source-only tarball. Alas, GitLab and most other sites appear to be using modern git, so the download tarballs from them would not match my tarballs, even though the content would. Side note on ChangeLog: ChangeLog files were traditionally manually curated files with the version history for a package. In recent years, several projects moved to dynamically generating them from git history (using tools like git2cl or gitlog-to-changelog).
This has consequences for the reproducibility of tarballs: you need to have the entire git history available! The gitlog-to-changelog tool also produces different output depending on the time zone of the person using it, which arguably is a simple bug that can be fixed. However, this entire approach is incompatible with rebuilding the full tarball from the minimal source-only tarball. It seems Libntlm's ChangeLog file died on the surgery table here. So how would a distribution build these minimal source-only tarballs? I happen to help with the libntlm package in Debian. It has historically used the generated tarballs as the source code to build from. This means that code coming from gnulib is vendored in the tarball. When a security problem is discovered in gnulib code, the security team needs to patch all packages that include that vendored code and rebuild them, instead of merely patching the gnulib package and rebuilding all packages that rely on that particular code. To change this, the Debian libntlm package needs to Build-Depend on Debian's gnulib package. But there was one problem: similar to most projects that use gnulib, Libntlm depends on a particular git commit of gnulib, and Debian only ships one commit. There is no coordination about which commit to use. I have adopted gnulib in Debian, and added a git bundle to the *_all.deb binary package so that projects that rely on gnulib can pick whatever commit they need. This allows a no-network GNULIB_URL and GNULIB_REVISION approach when running Libntlm's ./bootstrap with the Debian gnulib package installed. Otherwise libntlm would pick up whatever latest version of gnulib Debian happened to have in the gnulib package, which is not what the Libntlm maintainer intended to be used, and can lead to all sorts of version mismatches (and consequently security problems) over time. Libntlm in Debian is developed and tested on Salsa and there is continuous integration testing of it as well, thanks to the Salsa CI team. Side note on git bundles: unfortunately there appears to be no reproducible way to export a git repository into one or more files. So one unfortunate consequence of all this work is that the gnulib *.orig.tar.gz tarball in Debian is not reproducible any more. I have tried to get git bundles to be reproducible but I never got it to work; see my notes in gnulib's debian/README.source on this aspect. Of course, source tarball reproducibility has nothing to do with binary reproducibility of gnulib in Debian itself, fortunately. One open question is how to deal with the increased build dependencies that are triggered by this approach. Some people are surprised by this, but I don't see how to get around it: if you depend on source code for tools in another package to build your package, it is a bad idea to hide that dependency. We've done it for a long time through vendored code in non-minimal tarballs. Libntlm isn't the most critical project from a bootstrapping perspective, so adding git and gnulib as Build-Depends to it will probably be fine. However, consider if this pattern were used for other packages that use gnulib, such as coreutils, gzip, tar, bison etc (all are using gnulib): then they would all Build-Depend on git and gnulib. Cross-building those packages for a new architecture would therefore require git on that architecture first, which gets circular quickly. The dependency on gnulib is real, so I don't see that going away, and gnulib is an Architecture: all package.
However, the dependency on git is merely a consequence of how the Debian gnulib package chose to make all gnulib git commits available to projects: through a git bundle. There are other ways to do this that don't require the git tool to extract the necessary files, but none that I found practical; ideas welcome! Finally, some brief notes on how this was implemented. Enabling bootstrappable source-only minimal tarballs via gnulib's ./bootstrap is achieved by using the GNULIB_REVISION mechanism, locking down the gnulib commit used. I have always disliked git submodules because they add extra steps and have complicated interactions with CI/CD. The reason I gave up git submodules now is that the particular commit to use is not recorded in the git archive output when git submodules are used. So the particular gnulib commit has to be mentioned explicitly in some source code that goes into the git archive tarball. Colin Watson added the GNULIB_REVISION approach to ./bootstrap back in 2018, and now it no longer made sense to continue to use a gnulib git submodule. One alternative is to use ./bootstrap with --gnulib-srcdir or --gnulib-refdir if there is some practical problem with the GNULIB_URL pointing towards a git bundle or the GNULIB_REVISION in bootstrap.conf. The srcdist make rule is simple:
git archive --prefix=libntlm-v1.8/ -o libntlm-v1.8.tar.gz HEAD
Making the make dist generated tarball reproducible can be more complicated; however, for Libntlm it was sufficient to make sure the modification times of all files were set deterministically to the timestamp of the last commit in the git repository. Interestingly, there seem to be a couple of different ways to accomplish this. Guix doesn't support minimal source-only tarballs but relies on a .tarball-timestamp file inside the tarball. Paul Eggert explained what TZDB is using some time ago. The approach I'm using now is fairly similar to the one I suggested over a year ago. If there are problems because all files in the tarball now use the same modification time, there is a solution by Bruno Haible that could be implemented. Side note on git tags: some people may wonder why not verify a signed git tag instead of verifying a signed tarball of the git archive. Currently most git repositories use SHA-1 for git commit identities, but SHA-1 is not a secure hash function. While current SHA-1 attacks can be detected and mitigated, there are fundamental doubts that a git SHA-1 commit identity uniquely refers to the same content that was intended. Verifying a git tag will never offer the same assurance, since a git tag can be moved or re-signed at any time. Verifying a git commit is better, but then we need to trust SHA-1. Migrating git to SHA-256 would resolve this aspect, but most hosting sites such as GitLab and GitHub do not support this yet. There are other advantages to using signed tarballs instead of signed git commits or git tags as well; e.g., tar.gz can be a deterministically reproducible, persistent, stable offline storage format, but .git sub-directory trees or git bundles do not offer this property. Doing continuous testing of all this is critical to make sure things don't regress. Libntlm's pipeline definition now produces the generated libntlm-*.tar.gz tarballs and a checksum as a build artifact. Then I added the 000-reproducability job which compares the checksums and fails on mismatches. You can read its delicate output in the job for the v1.8 release. Right now we insist that builds on Trisquel 11 match Ubuntu 22.04, that PureOS 10 builds match Debian 11 builds, that AlmaLinux 8 builds match RockyLinux 8 builds, and that AlmaLinux 9 builds match RockyLinux 9 builds. As you can see in the pipeline job output, not all platforms lead to the same tarballs, but hopefully this state can be improved over time. There is also partial reproducibility, where the full tarball is reproducible across two distributions but not the minimal tarball, or vice versa. If this way of working plays out well, I hope to implement it in other projects too. What do you think? Happy Hacking!
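To make the modification-time normalization mentioned above concrete, here is a small Python sketch of the general idea; it is not the actual Libntlm make rule, and the file names are placeholders. It repacks a tarball so that every member carries the committer timestamp of the last git commit, with entries sorted and ownership cleared, which is one common recipe for deterministic archives.
import gzip
import subprocess
import tarfile

# Committer timestamp of the last commit, used as the deterministic mtime.
source_epoch = int(
    subprocess.check_output(["git", "log", "-1", "--format=%ct"]).strip()
)

def normalize(member: tarfile.TarInfo) -> tarfile.TarInfo:
    # Clamp timestamps and drop ownership so the result does not depend on
    # who built the archive or when.
    member.mtime = source_epoch
    member.uid = member.gid = 0
    member.uname = member.gname = ""
    return member

with tarfile.open("example-dist.tar.gz", "r:gz") as src:
    members = sorted(src.getmembers(), key=lambda m: m.name)
    with open("example-dist-repack.tar.gz", "wb") as raw, \
         gzip.GzipFile(fileobj=raw, mode="wb", mtime=source_epoch) as gz, \
         tarfile.open(fileobj=gz, mode="w") as dst:
        for m in members:
            fileobj = src.extractfile(m) if m.isfile() else None
            dst.addfile(normalize(m), fileobj)
The explicit GzipFile wrapper matters here: the gzip header contains its own timestamp, which also has to be pinned for the bytes to come out identical.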

Paul Tagliamonte: Domo Arigato, Mr. debugfs

Years ago, at what I think I remember was DebConf 15, I hacked for a while on debhelper to write build-ids to Debian binary control files, so that the build-id (more specifically, the ELF note .note.gnu.build-id) wound up in the Debian apt archive metadata. I've always thought this was super cool, and seeing as how Michael Stapelberg blogged some great pointers around the ecosystem, including the fancy new debuginfod service, and the find-dbgsym-packages helper, which uses these same headers, I don't think I'm the only one. At work I've been using a lot of Rust; specifically, async Rust using tokio. To try and work on my style, and to dig deeper into the how and why of the decisions made in these frameworks, I've decided to hack up a project that I've wanted to do ever since 2015: write a debug filesystem. Let's get to it.

Back to the Future Time to admit something. I really love Plan 9. It's just so good. So many ideas from Plan 9 are just so prescient, and everything just feels right. Not just right like, feels good; like, correct. The bit that I've always liked the most is 9p, the network protocol for serving a filesystem over a network. This leads to all sorts of fun programs, like the Plan 9 ftp client being a 9p server: you mount the ftp server and access files like any other files. It's kinda like if fuse were more fully a part of how the operating system worked, but fuse is all running client-side. With 9p there's a single client, and different servers that you can connect to, which may be backed by a hard drive, remote resources over something like SFTP, FTP, HTTP or even purely synthetic. The interesting (maybe sad?) part here is that 9p wound up outliving Plan 9 in terms of adoption: 9p is in all sorts of places folks don't usually expect. For instance, the Windows Subsystem for Linux uses the 9p protocol to share files between Windows and Linux. ChromeOS uses it to share files with Crostini, and qemu uses 9p (virtio-9p) to share files between guest and host. If you're noticing a pattern here, you'd be right; for some reason 9p is the go-to protocol to exchange files between hypervisor and guest. Why? I have no idea, except maybe that it is designed well, simple to implement, and makes it a lot easier to validate the data being shared and validate security boundaries. Simplicity has its value. As a result, there's a lot of lingering 9p support kicking around. Turns out Linux can even handle mounting 9p filesystems out of the box. This means that I can deploy a filesystem to my LAN or my localhost by running a process on top of a computer that needs nothing special, and mount it over the network on an unmodified machine, unlike fuse, where you'd need client-specific software to run in order to mount the directory. For instance, let's mount a 9p filesystem running on my localhost machine, serving requests on 127.0.0.1:564 (tcp), that goes by the name mountpointname, to /mnt.
$ mount -t 9p \
-o trans=tcp,port=564,version=9p2000.u,aname=mountpointname \
127.0.0.1 \
/mnt
Linux will mount away, and attach to the filesystem as the root user, and by default, attach to that mountpoint again for each local user that attempts to use it. Nifty, right? I think so. The server is able to keep track of per-user access and authorization along with the host OS.

WHEREIN I STYX WITH IT Since I wanted to push myself a bit more with Rust and tokio specifically, I opted to implement the whole stack myself, without third-party libraries on the critical path where I could avoid it. The 9p protocol (sometimes called Styx, the original name for it) is incredibly simple. It's a series of client-to-server requests, which receive a server-to-client response. These are, respectively, T messages, which transmit a request to the server, which trigger an R message in response (Reply messages). These messages are TLV payloads with a very straightforward structure; so straightforward, in fact, that I was able to implement a working server off nothing more than a handful of man pages. Later on, after the basics worked, I found a more complete spec page that contains more information about the unix-specific variant that I opted to use (9P2000.u rather than 9P2000) due to the level of Linux-specific support for the 9P2000.u variant over the 9P2000 protocol.
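To give a feel for just how simple the wire format is, here is a rough Python sketch (not arigato code) that frames a 9P2000.u Tversion message: every message is a little-endian size[4] that counts the whole message including itself, a type[1], a tag[2], and then type-specific fields, with strings encoded as a 2-byte length followed by the bytes.
import struct

TVERSION = 100   # T-message type for version negotiation
NOTAG = 0xFFFF   # Tversion conventionally uses the special NOTAG value

def p9_string(s: str) -> bytes:
    # 9p strings are a 2-byte little-endian length followed by the bytes.
    data = s.encode("utf-8")
    return struct.pack("<H", len(data)) + data

def tversion(msize: int, version: str = "9P2000.u") -> bytes:
    # Everything after the size field: type[1] tag[2] msize[4] version[s].
    rest = struct.pack("<BHI", TVERSION, NOTAG, msize) + p9_string(version)
    # size[4] counts the entire message, the size field included.
    return struct.pack("<I", 4 + len(rest)) + rest

print(tversion(msize=8192).hex())
An R message coming back is framed the same way, which is why a handful of man pages really is enough to get a server talking.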

MR ROBOTO The backend stack over at zoo is Rust and tokio running I/O for an HTTP and WebRTC server. I figured I'd pick something fairly similar to write my filesystem with, since 9P can be implemented on basically anything with I/O. That means tokio TCP server bits, which construct and use a 9p server, which has an idiomatic Rusty API that partially abstracts the raw R and T messages, but not so much as to cause issues with hiding implementation possibilities. At each abstraction level, there's an escape hatch allowing someone to implement any of the layers if required. I called this framework arigato, which can be found over on docs.rs and crates.io.
/// Simplified version of the arigato File trait; this isn't actually
/// the same trait; there's some small cosmetic differences. The
/// actual trait can be found at:
///
/// https://docs.rs/arigato/latest/arigato/server/trait.File.html
trait File {
/// OpenFile is the type returned by this File via an Open call.
 type OpenFile: OpenFile;
/// Return the 9p Qid for this file. A file is the same if the Qid is
 /// the same. A Qid contains information about the mode of the file,
 /// version of the file, and a unique 64 bit identifier.
 fn qid(&self) -> Qid;
/// Construct the 9p Stat struct with metadata about a file.
 async fn stat(&self) -> FileResult<Stat>;
/// Attempt to update the file metadata.
 async fn wstat(&mut self, s: &Stat) -> FileResult<()>;
/// Traverse the filesystem tree.
 async fn walk(&self, path: &[&str]) -> FileResult<(Option<Self>, Vec<Self>)>;
/// Request that a file's reference be removed from the file tree.
 async fn unlink(&mut self) -> FileResult<()>;
/// Create a file at a specific location in the file tree.
 async fn create(
&mut self,
name: &str,
perm: u16,
ty: FileType,
mode: OpenMode,
extension: &str,
) -> FileResult<Self>;
/// Open the File, returning a handle to the open file, which handles
 /// file i/o. This is split into a second type since it is genuinely
 /// unrelated -- and the fact that a file is Open or Closed can be
 /// handled by the  arigato  server for us.
 async fn open(&mut self, mode: OpenMode) -> FileResult<Self::OpenFile>;
}
/// Simplified version of the arigato OpenFile trait; this isn't actually
/// the same trait; there's some small cosmetic differences. The
/// actual trait can be found at:
///
/// https://docs.rs/arigato/latest/arigato/server/trait.OpenFile.html
trait OpenFile {
/// iounit to report for this file. The iounit reported is used for Read
 /// or Write operations to signal, if non-zero, the maximum size that is
 /// guaranteed to be transferred atomically.
 fn iounit(&self) -> u32;
/// Read some number of bytes up to  buf.len()  from the provided
 ///  offset  of the underlying file. The number of bytes read is
 /// returned.
 async fn read_at(
&mut self,
buf: &mut [u8],
offset: u64,
) -> FileResult<u32>;
/// Write some number of bytes up to  buf.len()  from the provided
 ///  offset  of the underlying file. The number of bytes written
 /// is returned.
 fn write_at(
&mut self,
buf: &mut [u8],
offset: u64,
) -> FileResult<u32>;
}

Thanks, decade-ago paultag! Let's do it! Let's use arigato to implement a 9p filesystem we'll call debugfs that will serve all the debug files shipped according to the Packages metadata from the apt archive. We'll fetch the Packages file and construct a filesystem based on the reported Build-Id entries. For those who don't know much about how an apt repo works, here's the 2-second crash course on what we're doing. The first step is to fetch the Packages file, which is specific to a binary architecture (such as amd64, arm64 or riscv64). That architecture is specific to a component (such as main, contrib or non-free). That component is specific to a suite, such as stable, unstable or any of its aliases (bullseye, bookworm, etc). Let's take a look at the Packages.xz file for the unstable-debug suite, main component, for all amd64 binaries.
$ curl \
https://deb.debian.org/debian-debug/dists/unstable-debug/main/binary-amd64/Packages.xz \
  | unxz
This will return the Debian-style rfc2822-like headers, which are an export of the metadata contained inside each .deb file, which apt (or other tools that can use the apt repo format) uses to fetch information about debs. Let's take a look at the debug headers for the netlabel-tools package in unstable, which is a package named netlabel-tools-dbgsym in unstable-debug.
Package: netlabel-tools-dbgsym
Source: netlabel-tools (0.30.0-1)
Version: 0.30.0-1+b1
Installed-Size: 79
Maintainer: Paul Tagliamonte <paultag@debian.org>
Architecture: amd64
Depends: netlabel-tools (= 0.30.0-1+b1)
Description: debug symbols for netlabel-tools
Auto-Built-Package: debug-symbols
Build-Ids: e59f81f6573dadd5d95a6e4474d9388ab2777e2a
Description-md5: a0e587a0cf730c88a4010f78562e6db7
Section: debug
Priority: optional
Filename: pool/main/n/netlabel-tools/netlabel-tools-dbgsym_0.30.0-1+b1_amd64.deb
Size: 62776
SHA256: 0e9bdb087617f0350995a84fb9aa84541bc4df45c6cd717f2157aa83711d0c60
So here, we can parse the package headers in the Packages.xz file and store, for each Build-Id, the Filename where we can fetch the .deb. Each .deb contains a number of files, but we're only really interested in the files inside the .deb located at or under /usr/lib/debug/.build-id/. You can find this code in debugfs under rfc822.rs. It's crude, and very single-purpose, but I'm feeling a bit lazy.
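For illustration only (the real implementation is the Rust code in rfc822.rs), a rough Python sketch of that Build-Id to Filename mapping could look like this; the URL is the Packages.xz location used above:
import lzma
import urllib.request

PACKAGES_URL = (
    "https://deb.debian.org/debian-debug/dists/"
    "unstable-debug/main/binary-amd64/Packages.xz"
)

def build_id_index(url: str = PACKAGES_URL) -> dict[str, str]:
    # Map each Build-Id to the pool Filename of the .deb that carries it.
    with urllib.request.urlopen(url) as response:
        text = lzma.decompress(response.read()).decode("utf-8")
    index: dict[str, str] = {}
    # Stanzas are separated by blank lines; we only need two fields of each.
    for stanza in text.split("\n\n"):
        fields = {}
        for line in stanza.splitlines():
            if line and not line.startswith(" "):  # skip continuation lines
                key, _, value = line.partition(": ")
                fields[key] = value
        for build_id in fields.get("Build-Ids", "").split():
            index[build_id] = fields.get("Filename", "")
    return index
With the stanza above, index["e59f81f6573dadd5d95a6e4474d9388ab2777e2a"] would give back the pool/main/n/netlabel-tools/... filename.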

Who needs dpkg?! For folks who haven't seen it yet, a .deb file is a special type of .ar file that contains (usually) three files inside: debian-binary, control.tar.xz and data.tar.xz. The core of an .ar file is a fixed-size (60 byte) entry header, followed by the specified number of bytes of data.
[8 byte .ar file magic]
[60 byte entry header]
[N bytes of data]
[60 byte entry header]
[N bytes of data]
[60 byte entry header]
[N bytes of data]
...
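As a rough illustration of that layout (the real parser is the Rust code in ar.rs mentioned just below), the common ar variant stores the 60-byte header as fixed-width ASCII fields, and member data is padded to an even offset; a Python sketch of walking the members might look like this:
import struct

AR_MAGIC = b"!<arch>\n"                            # the 8-byte file magic
HEADER = struct.Struct("16s 12s 6s 6s 8s 10s 2s")  # the 60-byte entry header

def ar_members(path: str):
    # Yield (name, data offset, size) for each member of an ar archive.
    with open(path, "rb") as f:
        assert f.read(8) == AR_MAGIC, "not an ar archive"
        while True:
            header = f.read(HEADER.size)
            if len(header) < HEADER.size:
                return
            name, _mtime, _uid, _gid, _mode, size, _end = HEADER.unpack(header)
            size = int(size)
            yield name.decode().rstrip(" /"), f.tell(), size
            # Member data is padded to an even byte boundary.
            f.seek(size + size % 2, 1)

for name, offset, size in ar_members("netlabel-tools-dbgsym_0.30.0-1+b1_amd64.deb"):
    print(name, offset, size)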
First up was to implement a basic ar parser in ar.rs. Before we get into using it to parse a deb, as a quick diversion, let's break apart a .deb file by hand, something that is a bit of a rite of passage (or at least it used to be? I'm getting old) during the Debian nm (new member) process, to take a look at where exactly the .debug file lives inside the .deb file.
$ ar x netlabel-tools-dbgsym_0.30.0-1+b1_amd64.deb
$ ls
control.tar.xz debian-binary
data.tar.xz netlabel-tools-dbgsym_0.30.0-1+b1_amd64.deb
$ tar --list -f data.tar.xz | grep '.debug$'
./usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
Since we know quite a bit about the structure of a .deb file, and I had to implement support from scratch anyway, I opted to implement a (very!) basic debfile parser using HTTP Range requests. HTTP Range requests, if supported by the server (denoted by an accept-ranges: bytes HTTP header in response to an HTTP HEAD request for that file), mean that we can add a header such as range: bytes=8-68 to specifically request that the returned GET body be the byte range provided (in the above case, the bytes starting from byte offset 8 until byte offset 68). This means we can fetch just the ar file entry from the .deb file until we get to the file inside the .deb we are interested in (in our case, the data.tar.xz file), at which point we can request the body of that file with a final range request. I wound up writing a struct to handle a read_at-style API surface in hrange.rs, which we can pair with ar.rs above and start to find our data in the .deb remotely, without downloading and unpacking the .deb at all. After we have the body of the data.tar.xz coming back through the HTTP response, we get to pipe it through an xz decompressor (this kinda sucked in Rust, since a tokio AsyncRead is not the same as an http Body response, is not the same as std::io::Read, is not the same as an async (or sync) Iterator, is not the same as what the xz2 crate expects; leading me to read blocks of data into a buffer and stuff them through the decoder by looping over the buffer for each lzma2 packet in a loop), and a tarfile parser (similarly troublesome). From there we get to iterate over all entries in the tarfile, stopping when we reach our file of interest. Since we can't seek, but gdb needs to, we'll pull it out of the stream into a Cursor<Vec<u8>> in-memory and pass a handle to it back to the user. From here on out it's a matter of gluing together a File traited struct in debugfs, and serving the filesystem over TCP using arigato. Done deal!
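The range-request dance itself is small enough that a Python sketch shows the shape of it; the real implementation is the Rust code in hrange.rs, and the URL below is just an example assembled from the Filename field in the stanza above:
import urllib.request

DEB_URL = ("https://deb.debian.org/debian-debug/pool/main/n/netlabel-tools/"
           "netlabel-tools-dbgsym_0.30.0-1+b1_amd64.deb")

def supports_ranges(url: str) -> bool:
    # A HEAD request tells us whether the server advertises byte ranges.
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req) as resp:
        return resp.headers.get("Accept-Ranges", "").lower() == "bytes"

def read_range(url: str, offset: int, length: int) -> bytes:
    # Ask for just the bytes we need; the server answers 206 Partial Content.
    req = urllib.request.Request(
        url, headers={"Range": f"bytes={offset}-{offset + length - 1}"}
    )
    with urllib.request.urlopen(req) as resp:
        return resp.read()

if supports_ranges(DEB_URL):
    # Fetch only the first 60-byte ar entry header that follows the magic.
    print(read_range(DEB_URL, 8, 60))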

A quick diversion about compression I was originally hoping to avoid transferring the whole tar file over the network (and therefore also reading the whole debug file into RAM, which objectively sucks), but quickly hit issues with figuring out a way around seeking around an xz file. What's interesting is that xz has a great primitive to solve this specific problem (specifically, use a block size that allows you to seek to the block as close to your desired seek position just before it, only discarding at most block size - 1 bytes), but data.tar.xz files generated by dpkg appear to have a single mega-huge block for the whole file. I don't know why I would have expected any different, in retrospect. That means that this now devolves into the base case of "How do I seek around an lzma2 compressed data stream?", which is a lot more complex of a question. Thankfully, the notoriously brilliant tianon was nice enough to introduce me to Jon Johnson, who did something super similar: he adapted a technique to seek inside a compressed gzip file, which lets his service oci.dag.dev seek through Docker container images super fast, based on some prior work such as soci-snapshotter, gztool, and zran.c. He also pulled this party trick off for apk-based distros over at apk.dag.dev, which seems apropos. Jon was nice enough to publish a lot of his work on this specifically in a central place under the name targz on his GitHub, which has been a ton of fun to read through. The gist is that, by dumping the decompressor's state (window of previous bytes, in-memory data derived from the last N-1 bytes) at specific checkpoints, along with the compressed data stream offset in bytes and decompressed offset in bytes, one can seek to that checkpoint in the compressed stream and pick up where you left off, creating a similar block mechanism against the wishes of gzip. It means you'd need to do an O(n) run over the file, but every request after that will be sped up according to the number of checkpoints you've taken. Given the complexity of xz and lzma2, I don't think this is possible for me at the moment, especially given most of the files I'll be requesting will not be loaded from again, and especially when I can just cache the debug header by Build-Id. I want to implement this (because I'm generally curious and Jon has a way of getting someone excited about compression schemes, which is not a sentence I thought I'd ever say out loud), but for now I'm going to move on without this optimization. Such a shame, since it kills a lot of the work that went into seeking around the .deb file in the first place, given the debian-binary and control.tar.gz members are so small.
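For the gzip case that Jon's targz work covers, the checkpoint idea can be sketched in Python using zlib's ability to copy a decompressor's state; this is only a toy to show the shape of the trick, not how oci.dag.dev or any of the tools mentioned above actually implement it:
import zlib

CHUNK = 64 * 1024  # compressed bytes fed to the decompressor per step

def index_gzip(path: str, every: int = 1 << 20):
    # The O(n) pass: save (compressed offset, decompressed offset, state)
    # checkpoints at roughly the requested spacing of decompressed bytes.
    d = zlib.decompressobj(wbits=31)  # wbits=31 expects the gzip wrapper
    checkpoints = [(0, 0, d.copy())]
    comp_off = decomp_off = 0
    next_mark = every
    with open(path, "rb") as f:
        while chunk := f.read(CHUNK):
            decomp_off += len(d.decompress(chunk))
            comp_off += len(chunk)
            if decomp_off >= next_mark:
                checkpoints.append((comp_off, decomp_off, d.copy()))
                next_mark = decomp_off + every
    return checkpoints

def read_at(path: str, checkpoints, target: int, length: int) -> bytes:
    # Resume from the nearest checkpoint at or before the target offset
    # instead of decompressing the stream from the very beginning.
    comp_off, decomp_off, saved = max(
        (c for c in checkpoints if c[1] <= target), key=lambda c: c[1]
    )
    d, out = saved.copy(), bytearray()
    with open(path, "rb") as f:
        f.seek(comp_off)
        while len(out) < (target - decomp_off) + length:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            out += d.decompress(chunk)
    skip = target - decomp_off
    return bytes(out[skip:skip + length])
Doing the same for xz/lzma2 is exactly the hard part the paragraph above describes.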

The Good First, the good news, right? It works! That's pretty cool. I'm positive my younger self would be amused and happy to see this working; as is current-day paultag. Let's take debugfs out for a spin! First, we need to mount the filesystem. It even works on an entirely unmodified, stock Debian box on my LAN, which is huge. Let's take it for a spin:
$ mount \
-t 9p \
-o trans=tcp,version=9p2000.u,aname=unstable-debug \
192.168.0.2 \
/usr/lib/debug/.build-id/
And, let s prove to ourselves that this actually mounted before we go trying to use it:
$ mount | grep build-id
192.168.0.2 on /usr/lib/debug/.build-id type 9p (rw,relatime,aname=unstable-debug,access=user,trans=tcp,version=9p2000.u,port=564)
Slick. We've got an open connection to the server, where our host will keep a connection alive as root, attached to the filesystem provided in aname. Let's take a look at it.
$ ls /usr/lib/debug/.build-id/
00 0d 1a 27 34 41 4e 5b 68 75 82 8E 9b a8 b5 c2 CE db e7 f3
01 0e 1b 28 35 42 4f 5c 69 76 83 8f 9c a9 b6 c3 cf dc E7 f4
02 0f 1c 29 36 43 50 5d 6a 77 84 90 9d aa b7 c4 d0 dd e8 f5
03 10 1d 2a 37 44 51 5e 6b 78 85 91 9e ab b8 c5 d1 de e9 f6
04 11 1e 2b 38 45 52 5f 6c 79 86 92 9f ac b9 c6 d2 df ea f7
05 12 1f 2c 39 46 53 60 6d 7a 87 93 a0 ad ba c7 d3 e0 eb f8
06 13 20 2d 3a 47 54 61 6e 7b 88 94 a1 ae bb c8 d4 e1 ec f9
07 14 21 2e 3b 48 55 62 6f 7c 89 95 a2 af bc c9 d5 e2 ed fa
08 15 22 2f 3c 49 56 63 70 7d 8a 96 a3 b0 bd ca d6 e3 ee fb
09 16 23 30 3d 4a 57 64 71 7e 8b 97 a4 b1 be cb d7 e4 ef fc
0a 17 24 31 3e 4b 58 65 72 7f 8c 98 a5 b2 bf cc d8 E4 f0 fd
0b 18 25 32 3f 4c 59 66 73 80 8d 99 a6 b3 c0 cd d9 e5 f1 fe
0c 19 26 33 40 4d 5a 67 74 81 8e 9a a7 b4 c1 ce da e6 f2 ff
Outstanding. Let's try using gdb to debug a binary that was provided by the Debian archive, and see if it'll load the ELF by build-id from the right .deb in the unstable-debug suite:
$ gdb -q /usr/sbin/netlabelctl
Reading symbols from /usr/sbin/netlabelctl...
Reading symbols from /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug...
(gdb)
Yes! Yes it will!
$ file /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
/usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter *empty*, BuildID[sha1]=e59f81f6573dadd5d95a6e4474d9388ab2777e2a, for GNU/Linux 3.2.0, with debug_info, not stripped

The Bad Linux's support for 9p is mainline, which is great, but it's not robust. Network issues or server restarts will wedge the mountpoint (Linux can't reconnect when the TCP connection breaks), and things that work fine on local filesystems get translated in a way that causes a lot of network chatter. For instance, just due to the way the syscalls are translated, doing an ls will result in a stat call for each file in the directory, even though Linux had just got a stat entry for every file while it was resolving directory names. On top of that, Linux will serialize all I/O with the server, so there are no concurrent requests for file information, writes, or reads pending at the same time to the server; and read and write throughput will degrade as latency increases due to increasing round-trip time, even though there are offsets included in the read and write calls. It works well enough, but is frustrating to run up against, since there's not a lot you can do server-side to help with this beyond implementing the 9P2000.L variant (which maybe is worth it).

The Ugly Unfortunately, we don't know the file size(s) until we've actually opened the underlying tar file and found the correct member, so for most files we don't know the real size to report when getting a stat. We can't parse the tarfiles for every stat call, since that'd make ls even slower (bummer). The only hiccup is that when I report a filesize of zero, gdb throws a bit of a fit; let's try with a size of 0 to start:
$ ls -lah /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
-r--r--r-- 1 root root 0 Dec 31 1969 /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
$ gdb -q /usr/sbin/netlabelctl
Reading symbols from /usr/sbin/netlabelctl...
Reading symbols from /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug...
warning: Discarding section .note.gnu.build-id which has a section size (24) larger than the file size [in module /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug]
[...]
This obviously won't work, since gdb will throw away all our hard work because of stat's output, and neither will loading the real size of the underlying file. That only leaves us with hardcoding a file size and hoping nothing else breaks significantly as a result. Let's try it again:
$ ls -lah /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
-r--r--r-- 1 root root 954M Dec 31 1969 /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
$ gdb -q /usr/sbin/netlabelctl
Reading symbols from /usr/sbin/netlabelctl...
Reading symbols from /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug...
(gdb)
Much better. I mean, terrible but better. Better for now, anyway.

Kilroy was here Do I think this is a particularly good idea? I mean; kinda. I'm probably going to make some fun 9p arigato-based filesystems for use around my LAN, but I don't think I'll be moving to use debugfs until I can figure out how to ensure the connection is more resilient to changing networks and server restarts, and until the I/O performance is fixed. I think it was a useful exercise and is a pretty great hack, but I don't think this'll be shipping anywhere anytime soon. Along with me publishing this post, I've pushed up all my repos, so you should be able to play along at home! There's a lot more work to be done on arigato, but it does handshake and successfully export a working 9P2000.u filesystem. Check it out on my GitHub at arigato, debugfs and also on crates.io and docs.rs. At least I can say I was here and I got it working after all these years.

Russell Coker: Software Needed for Work

When I first started studying computer science, setting up a programming project was easy: write source code files and a Makefile and that was it. IRC was the only IM system and email was the only other communications system that was used much. Writing Makefiles is difficult, but products like the Borland Turbo series of IDEs did all that for you, so you could just start typing code and press a function key to compile and run (F5 from memory). Over the years the requirements and expectations of computer use have grown significantly. The typical office worker is now doing many more things with computers than serious programmers used to do. Running an IM system, an online document editing system, and a series of web apps is standard for companies nowadays. Developers have to do all that in addition to tools for version control, continuous integration, bug reporting, and feature tracking. The development process is also more complex, with extra steps for reproducible builds, automated tests, and code coverage metrics for the tests. I wonder how many programmers who started in the 90s would have done something else if faced with GitHub as their introduction. How much of this is good? Having the ability to send instant messages all around the world is great. Having dozens of different ways of doing so is awful. When a company uses multiple IM systems such as MS-Teams and Slack and forces some of its employees to use them both, it's getting ridiculous. Having different friend groups on different IM systems is anti-social networking. In the EU the Digital Markets Act [1] forces some degree of interoperability between different IM systems, and as it's impossible to know who's actually in the EU, that will end up being world-wide. In corporations, document management often involves multiple ways of storing things: you have Google Docs, MS Office online, hosted Wikis like Confluence, and more. Large companies tend to use several such systems, which means that people need to learn multiple systems to be able to work, and they also need to know which systems are used by the various groups that they communicate with. Microsoft deserves some sort of award for the range of ways they have for managing documents: SharePoint, OneDrive, Office Online, attachments to Teams rooms, and probably lots more. During WW2 the predecessor to the CIA produced an excellent manual for simple sabotage [2]. If something like that was written today, the section General Interference with Organisations and Production would surely have something about using as many incompatible programs and web sites as possible in the work flow. The proliferation of software required for work is a form of denial of service attack against corporations. The efficiency of companies doesn't really bother me. It sucks that companies are creating a demoralising workplace that is unpleasant for workers. But the upside is that the biggest companies are the ones doing the worst things and are also the most afflicted by these problems. It's almost like the Bureau of Sabotage in some of Frank Herbert's fiction [3]. The thing that concerns me is the effect of multiple standards on free software development. We have IRC, the most traditional IM support system, which is getting replaced by Matrix, but we also have some projects using Telegram, and Jabber hasn't gone away. I'm sure there are others too. There are also multiple options for version control (although GitHub seems to dominate the market), forums, bug trackers, etc.
Reporting bugs or getting support in free software often requires interacting with several of them. Developing free software usually involves dealing with the bug tracking and documentation systems of the distribution you use as well as those of the upstream developers of the software. If the problem you have is related to compatibility between two different pieces of free software, then you can end up dealing with even more bug tracking systems. There are real benefits to some of the newer programs to track bugs, write documentation, etc. There is also going to be a cost in changing, which gives an incentive for the older projects to keep using what has worked well enough for them in the past. How can we improve things? Use only the latest tools? Prioritise ease of use? Aim more for entry-level contributors?
