Search Results: "voc"

28 March 2024

Joey Hess: the vulture in the coal mine

Turns out that VPS provider Vultr's terms of service were quietly changed some time ago to give them a "perpetual, irrevocable" license to use content hosted there in any way, including modifying it and commercializing it "for purposes of providing the Services to you." This is very similar to changes that Github made to their TOS in 2017. Since then, Github has been rebranded as "The world s leading AI-powered developer platform". The language in their TOS now clearly lets them use content stored in Github for training AI. (Probably this is their second line of defense if the current attempt to legitimise copyright laundering via generative AI fails.) Vultr is currently in damage control mode, accusing their concerned customers of spreading "conspiracy theories" (-- founder David Aninowsky) and updating the TOS to remove some of the problem language. Although it still allows them to "make derivative works", so could still allow their AI division to scrape VPS images for training data. Vultr claims this was the legalese version of technical debt, that it only ever applied to posts in a forum (not supported by the actual TOS language) and basically that they and their lawyers are incompetant but not malicious. Maybe they are indeed incompetant. But even if I give them the benefit of the doubt, I expect that many other VPS providers, especially ones targeting non-corporate customers, are watching this closely. If Vultr is not significantly harmed by customers jumping ship, if the latest TOS change is accepted as good enough, then other VPS providers will know that they can try this TOS trick too. If Vultr's AI division does well, others will wonder to what extent it is due to having all this juicy training data. For small self-hosters, this seems like a good time to make sure you're using a VPS provider you can actually trust to not be eyeing your disk image and salivating at the thought of stripmining it for decades of emails. Probably also worth thinking about moving to bare metal hardware, perhaps hosted at home. I wonder if this will finally make it worthwhile to mess around with VPS TPMs?

29 February 2024

Russell Coker: Links February 2024

In 2018 Charles Stross wrote an insightful blog post Dude You Broke the Future [1]. It covers AI in both fiction and fact and corporations (the real AIs) and the horrifying things they can do right now. LongNow has an interesting article about the concept of the Magnum Opus [2]. As an aside I ve been working on SE Linux for 22 years. Cory Doctorow wrote an insightful article about the incentives for enshittification of the Internet and how economic issues and regulations shape that [3]. CCC has a lot of great talks, and this talk from the latest CCC about the Triangulation talk on an attak on Kaspersky iPhones is particularly epic [4]. GoodCar is an online sales site for electric cars in Australia [5]. Ulrike wrote an insightful blog post about how the reliance on volunteer work in the FOSS community hurts diversity [6]. Cory Doctorow wrote an insightful article about The Internet s Original Sin which is misuse of copyright law [7]. He advocates for using copyright strictly for it s intended purpose and creating other laws for privacy, labor rights, etc. David Brin wrote an interesting article on neoteny and sexual selection in humans [8]. 37C3 has an interesting lecture about software licensing for a circular economy which includes environmental savings from better code [9]. Now they track efficiency in KDE bug reports!

23 February 2024

Gunnar Wolf: 10 things software developers should learn about learning

This post is a review for Computing Reviews for 10 things software developers should learn about learning , a article published in Communications of the ACM

As software developers, we understand the detailed workings of the different components of our computer systems. And probably due to how computers were presented since their appearance as digital brains in the 1940s we sometimes believe we can transpose that knowledge to how our biological brains work, be it as learners or as problem solvers. This article aims at making the reader understand several mechanisms related to how learning and problem solving actually work in our brains. It focuses on helping expert developers convey knowledge to new learners, as well as learners who need to get up to speed and start coding. The article s narrative revolves around software developers, but much of what it presents can be applied to different problem domains. The article takes this mission through ten points, with roughly the same space given to each of them, starting with wrong assumptions many people have about the similarities between computers and our brains. The first section, Human Memory Is Not Made of Bits, explains the brain processes of remembering as a way of strengthening the force of a memory ( reconsolidation ) and the role of activation in related network pathways. The second section, Human Memory Is Composed of One Limited and One Unlimited System, goes on to explain the organization of memories in the brain between long-term memory (functionally limitless, permanent storage) and working memory (storing little amounts of information used for solving a problem at hand). However, the focus soon shifts to how experience in knowledge leads to different ways of using the same concepts, the importance of going from abstract to concrete knowledge applications and back, and the role of skills repetition over time. Toward the end of the article, the focus shifts from the mechanical act of learning to expertise. Section 6, The Internet Has Not Made Learning Obsolete, emphasizes that problem solving is not just putting together the pieces of a puzzle; searching online for solutions to a problem does not activate the neural pathways that would get fired up otherwise. The final sections tackle the differences that expertise brings to play when teaching or training a newcomer: the same tools that help the beginner s productivity as training wheels will often hamper the expert user s as their knowledge has become automated. The article is written with a very informal and easy-to-read tone and vocabulary, and brings forward several issues that might seem like commonsense but do ring bells when it comes to my own experiences both as a software developer and as a teacher. The article closes by suggesting several books that further expand on the issues it brings forward. While I could not identify a single focus or thesis with which to characterize this article, the several points it makes will likely help readers better understand (and bring forward to consciousness) mental processes often taken for granted, and consider often-overlooked aspects when transmitting knowledge to newcomers.

21 January 2024

Debian Brasil: MiniDebConf BH 2024 - patroc nio e financiamento coletivo

J est rolando a inscri o de participante e a chamada de atividades para a MiniDebConf Belo Horizonte 2024, que acontecer de 27 a 30 de abril no Campus Pampulha da UFMG. Este ano estamos ofertando bolsas de alimenta o, hospedagem e passagens para contribuidores(as) ativos(as) do Projeto Debian. Patroc nio: Para a realiza o da MiniDebConf, estamos buscando patroc nio financeiro de empresas e entidades. Ent o se voc trabalha em uma empresa/entidade (ou conhece algu m que trabalha em uma) indique o nosso plano de patroc nio para ela. L voc ver os valores de cada cota e os seus benef cios. Financiamento coletivo: Mas voc tamb m pode ajudar a realiza o da MiniDebConf por meio do nosso financiamento coletivo! Fa a uma doa o de qualquer valor e tenha o seu nome publicado no site do evento como apoiador(a) da MiniDebConf Belo Horizonte 2024. Mesmo que voc n o pretenda vir a Belo Horizonte para participar do evento, voc pode doar e assim contribuir para o mais importante evento do Projeto Debian no Brasil. Contato Qualquer d vida, mande um email para contato@debianbrasil.org.br Organiza o

20 January 2024

Niels Thykier: Making debputy: Writing declarative parsing logic

In this blog post, I will cover how debputy parses its manifest and the conceptual improvements I did to make parsing of the manifest easier. All instructions to debputy are provided via the debian/debputy.manifest file and said manifest is written in the YAML format. After the YAML parser has read the basic file structure, debputy does another pass over the data to extract the information from the basic structure. As an example, the following YAML file:

manifest-version: "0.1"
installations:
  - install:
      source: foo
      dest-dir: usr/bin

would be transformed by the YAML parser into a structure resembling:

 
  "manifest-version": "0.1",
  "installations": [
      
       "install":  
         "source": "foo",
         "dest-dir": "usr/bin",
        
      
  ]

This structure is then what debputy does a pass on to translate this into an even higher level format where the "install" part is translated into an InstallRule. In the original prototype of debputy, I would hand-write functions to extract the data that should be transformed into the internal in-memory high level format. However, it was quite tedious. Especially because I wanted to catch every possible error condition and report "You are missing the required field X at Y" rather than the opaque KeyError: X message that would have been the default. Beyond being tedious, it was also quite error prone. As an example, in debputy/0.1.4 I added support for the install rule and you should allegedly have been able to add a dest-dir: or an as: inside it. Except I crewed up the code and debputy was attempting to look up these keywords from a dict that could never have them. Hand-writing these parsers were so annoying that it demotivated me from making manifest related changes to debputy simply because I did not want to code the parsing logic. When I got this realization, I figured I had to solve this problem better. While reflecting on this, I also considered that I eventually wanted plugins to be able to add vocabulary to the manifest. If the API was "provide a callback to extract the details of whatever the user provided here", then the result would be bad.

Most plugins would probably throw KeyError: X or ValueError style errors for quite a while. Worst case, they would end on my table because the user would have a hard time telling where debputy ends and where the plugins starts. "Best" case, I would teach debputy to say "This poor error message was brought to you by plugin foo. Go complain to them". Either way, it would be a bad user experience.

This even assumes plugin providers would actually bother writing manifest parsing code. If it is that difficult, then just providing a custom file in debian might tempt plugin providers and that would undermine the idea of having the manifest be the sole input for debputy.

So beyond me being unsatisfied with the current situation, it was also clear to me that I needed to come up with a better solution if I wanted externally provided plugins for debputy. To put a bit more perspective on what I expected from the end result:

It had to cover as many parsing errors as possible. An error case this code would handle for you, would be an error where I could ensure it sufficient degree of detail and context for the user.

It should be type-safe / provide typing support such that IDEs/mypy could help you when you work on the parsed result.

It had to support "normalization" of the input, such as

           # User provides
           - install: "foo"
           # Which is normalized into:
           - install:
               source: "foo"
4) It must be simple to tell  debputy  what input you expected.

At this point, I remembered that I had seen a Python (PYPI) package where you could give it a TypedDict and an arbitrary input (Sadly, I do not remember the name). The package would then validate the said input against the TypedDict. If the match was successful, you would get the result back casted as the TypedDict. If the match was unsuccessful, the code would raise an error for you. Conceptually, this seemed to be a good starting point for where I wanted to be. Then I looked a bit on the normalization requirement (point 3). What is really going on here is that you have two "schemas" for the input. One is what the programmer will see (the normalized form) and the other is what the user can input (the manifest form). The problem is providing an automatic normalization from the user input to the simplified programmer structure. To expand a bit on the following example:

# User provides
- install: "foo"
# Which is normalized into:
- install:
    source: "foo"

Given that install has the attributes source, sources, dest-dir, as, into, and when, how exactly would you automatically normalize "foo" (str) into source: "foo"? Even if the code filtered by "type" for these attributes, you would end up with at least source, dest-dir, and as as candidates. Turns out that TypedDict actually got this covered. But the Python package was not going in this direction, so I parked it here and started looking into doing my own. At this point, I had a general idea of what I wanted. When defining an extension to the manifest, the plugin would provide debputy with one or two definitions of TypedDict. The first one would be the "parsed" or "target" format, which would be the normalized form that plugin provider wanted to work on. For this example, lets look at an earlier version of the install-examples rule:

# Example input matching this typed dict.
#    
#       "source": ["foo"]
#       "into": ["pkg"]
#    
class InstallExamplesTargetFormat(TypedDict):
    # Which source files to install (dest-dir is fixed)
    sources: List[str]
    # Which package(s) that should have these files installed.
    into: NotRequired[List[str]]

In this form, the install-examples has two attributes - both are list of strings. On the flip side, what the user can input would look something like this:

# Example input matching this typed dict.
#    
#       "source": "foo"
#       "into": "pkg"
#    
#
class InstallExamplesManifestFormat(TypedDict):
    # Note that sources here is split into source (str) vs. sources (List[str])
    sources: NotRequired[List[str]]
    source: NotRequired[str]
    # We allow the user to write  into: foo  in addition to  into: [foo] 
    into: Union[str, List[str]]
FullInstallExamplesManifestFormat = Union[
    InstallExamplesManifestFormat,
    List[str],
    str,
]

The idea was that the plugin provider would use these two definitions to tell debputy how to parse install-examples. Pseudo-registration code could look something like:

def _handler(
    normalized_form: InstallExamplesTargetFormat,
) -> InstallRule:
    ...  # Do something with the normalized form and return an InstallRule.
concept_debputy_api.add_install_rule(
  keyword="install-examples",
  target_form=InstallExamplesTargetFormat,
  manifest_form=FullInstallExamplesManifestFormat,
  handler=_handler,
)

This was my conceptual target and while the current actual API ended up being slightly different, the core concept remains the same.

From concept to basic implementation Building this code is kind like swallowing an elephant. There was no way I would just sit down and write it from one end to the other. So the first prototype of this did not have all the features it has now. Spoiler warning, these next couple of sections will contain some Python typing details. When reading this, it might be helpful to know things such as Union[str, List[str]] being the Python type for either a str (string) or a List[str] (list of strings). If typing makes your head spin, these sections might less interesting for you. To build this required a lot of playing around with Python's introspection and typing APIs. My very first draft only had one "schema" (the normalized form) and had the following features:

Read TypedDict.__required_attributes__ and TypedDict.__optional_attributes__ to determine which attributes where present and which were required. This was used for reporting errors when the input did not match.

Read the types of the provided TypedDict, strip the Required / NotRequired markers and use basic isinstance checks based on the resulting type for str and List[str]. Again, used for reporting errors when the input did not match.

This prototype did not take a long (I remember it being within a day) and worked surprisingly well though with some poor error messages here and there. Now came the first challenge, adding the manifest format schema plus relevant normalization rules. The very first normalization I did was transforming into: Union[str, List[str]] into into: List[str]. At that time, source was not a separate attribute. Instead, sources was a Union[str, List[str]], so it was the only normalization I needed for all my use-cases at the time. There are two problems when writing a normalization. First is determining what the "source" type is, what the target type is and how they relate. The second is providing a runtime rule for normalizing from the manifest format into the target format. Keeping it simple, the runtime normalizer for Union[str, List[str]] -> List[str] was written as:

def normalize_into_list(x: Union[str, List[str]]) -> List[str]:
    return x if isinstance(x, list) else [x]

This basic form basically works for all types (assuming none of the types will have List[List[...]]). The logic for determining when this rule is applicable is slightly more involved. My current code is about 100 lines of Python code that would probably lose most of the casual readers. For the interested, you are looking for _union_narrowing in declarative_parser.py With this, when the manifest format had Union[str, List[str]] and the target format had List[str] the generated parser would silently map a string into a list of strings for the plugin provider. But with that in place, I had covered the basics of what I needed to get started. I was quite excited about this milestone of having my first keyword parsed without handwriting the parser logic (at the expense of writing a more generic parse-generator framework).

Adding the first parse hint With the basic implementation done, I looked at what to do next. As mentioned, at the time sources in the manifest format was Union[str, List[str]] and I considered to split into a source: str and a sources: List[str] on the manifest side while keeping the normalized form as sources: List[str]. I ended up committing to this change and that meant I had to solve the problem getting my parser generator to understand the situation:

# Map from
class InstallExamplesManifestFormat(TypedDict):
    # Note that sources here is split into source (str) vs. sources (List[str])
    sources: NotRequired[List[str]]
    source: NotRequired[str]
    # We allow the user to write  into: foo  in addition to  into: [foo] 
    into: Union[str, List[str]]
# ... into
class InstallExamplesTargetFormat(TypedDict):
    # Which source files to install (dest-dir is fixed)
    sources: List[str]
    # Which package(s) that should have these files installed.
    into: NotRequired[List[str]]

There are two related problems to solve here:

How will the parser generator understand that source should be normalized and then mapped into sources?

Once that is solved, the parser generator has to understand that while source and sources are declared as NotRequired, they are part of a exactly one of rule (since sources in the target form is Required). This mainly came down to extra book keeping and an extra layer of validation once the previous step is solved.

While working on all of this type introspection for Python, I had noted the Annotated[X, ...] type. It is basically a fake type that enables you to attach metadata into the type system. A very random example:

# For all intents and purposes,  foo  is a string despite all the  Annotated  stuff.
foo: Annotated[str, "hello world"] = "my string here"

The exciting thing is that you can put arbitrary details into the type field and read it out again in your introspection code. Which meant, I could add "parse hints" into the type. Some "quick" prototyping later (a day or so), I got the following to work:

# Map from
#      
#        "source": "foo"  # (or "sources": ["foo"])
#        "into": "pkg"
#      
class InstallExamplesManifestFormat(TypedDict):
    # Note that sources here is split into source (str) vs. sources (List[str])
    sources: NotRequired[List[str]]
    source: NotRequired[
        Annotated[
            str,
            DebputyParseHint.target_attribute("sources")
        ]
    ]
    # We allow the user to write  into: foo  in addition to  into: [foo] 
    into: Union[str, List[str]]
# ... into
#      
#        "source": ["foo"]
#        "into": ["pkg"]
#      
class InstallExamplesTargetFormat(TypedDict):
    # Which source files to install (dest-dir is fixed)
    sources: List[str]
    # Which package(s) that should have these files installed.
    into: NotRequired[List[str]]

Without me (as a plugin provider) writing a line of code, I can have debputy rename or "merge" attributes from the manifest form into the normalized form. Obviously, this required me (as the debputy maintainer) to write a lot code so other me and future plugin providers did not have to write it.

High level typing At this point, basic normalization between one mapping to another mapping form worked. But one thing irked me with these install rules. The into was a list of strings when the parser handed them over to me. However, I needed to map them to the actual BinaryPackage (for technical reasons). While I felt I was careful with my manual mapping, I knew this was exactly the kind of case where a busy programmer would skip the "is this a known package name" check and some user would typo their package resulting in an opaque KeyError: foo. Side note: "Some user" was me today and I was super glad to see debputy tell me that I had typoed a package name (I would have been more happy if I had remembered to use debputy check-manifest, so I did not have to wait through the upstream part of the build that happened before debhelper passed control to debputy...) I thought adding this feature would be simple enough. It basically needs two things:

Conversion table where the parser generator can tell that BinaryPackage requires an input of str and a callback to map from str to BinaryPackage. (That is probably lie. I think the conversion table came later, but honestly I do remember and I am not digging into the git history for this one)

At runtime, said callback needed access to the list of known packages, so it could resolve the provided string.

It was not super difficult given the existing infrastructure, but it did take some hours of coding and debugging. Additionally, I added a parse hint to support making the into conditional based on whether it was a single binary package. With this done, you could now write something like:

# Map from
class InstallExamplesManifestFormat(TypedDict):
    # Note that sources here is split into source (str) vs. sources (List[str])
    sources: NotRequired[List[str]]
    source: NotRequired[
        Annotated[
            str,
            DebputyParseHint.target_attribute("sources")
        ]
    ]
    # We allow the user to write  into: foo  in addition to  into: [foo] 
    into: Union[BinaryPackage, List[BinaryPackage]]
# ... into
class InstallExamplesTargetFormat(TypedDict):
    # Which source files to install (dest-dir is fixed)
    sources: List[str]
    # Which package(s) that should have these files installed.
    into: NotRequired[
        Annotated[
            List[BinaryPackage],
            DebputyParseHint.required_when_multi_binary()
        ]
    ]

Code-wise, I still had to check for into being absent and providing a default for that case (that is still true in the current codebase - I will hopefully fix that eventually). But I now had less room for mistakes and a standardized error message when you misspell the package name, which was a plus.

The added side-effect - Introspection A lovely side-effect of all the parsing logic being provided to debputy in a declarative form was that the generated parser snippets had fields containing all expected attributes with their types, which attributes were required, etc. This meant that adding an introspection feature where you can ask debputy "What does an install rule look like?" was quite easy. The code base already knew all of this, so the "hard" part was resolving the input the to concrete rule and then rendering it to the user. I added this feature recently along with the ability to provide online documentation for parser rules. I covered that in more details in my blog post Providing online reference documentation for debputy in case you are interested. :)

Wrapping it up This was a short insight into how debputy parses your input. With this declarative technique:

The parser engine handles most of the error reporting meaning users get most of the errors in a standard format without the plugin provider having to spend any effort on it. There will be some effort in more complex cases. But the common cases are done for you.

It is easy to provide flexibility to users while avoiding having to write code to normalize the user input into a simplified programmer oriented format.

The parser handles mapping from basic types into higher forms for you. These days, we have high level types like FileSystemMode (either an octal or a symbolic mode), different kind of file system matches depending on whether globs should be performed, etc. These types includes their own validation and parsing rules that debputy handles for you.

Introspection and support for providing online reference documentation. Also, debputy checks that the provided attribute documentation covers all the attributes in the manifest form. If you add a new attribute, debputy will remind you if you forget to document it as well. :)

In this way everybody wins. Yes, writing this parser generator code was more enjoyable than writing the ad-hoc manual parsers it replaced. :)

16 January 2024

Matthew Palmer: Pwned Certificates on the Fediverse

As well as the collection and distribution of compromised keys, the pwnedkeys project also matches those pwned keys against issued SSL certificates. I m excited to announce that, as of the beginning of 2024, all matched certificates are now being published on the Fediverse, thanks to the botsin.space Mastodon server. Want to know which sites are susceptible to interception and interference, in (near-)real time? Do you have a burning desire to know who is issuing certificates to people that post their private keys in public? Now you can.

How It Works The process for publishing pwned certs is, roughly, as follows:

All the certificates in Certificate Transparency (CT) logs are hoovered up (using my scrape-ct-log tool, the fastest log scraper in the west!), and the fingerprint of the public key of each certificate is stored in an LMDB datafile.

As new private keys are identified as having been compromised, the fingerprint of that key is checked against all the LMDB files, which map key fingerprints to certificates (actually to CT log entry IDs, from which the certificates themselves are retrieved).

If one or more matches are found, then the certificates using the compromised key are forwarded to the tooter , which publishes them for the world to marvel at.

This makes it sound all very straightforward, and it is in theory. The trick comes in optimising the pipeline so that the five million or so new certificates every day can get indexed on the one slightly middle-aged server I ve got, without getting backlogged.

Why Don t You Just Have the Certificates Revoked? Funny story about that I used to notify CAs of certificates they d issued using compromised keys, which had the effect of requiring them to revoke the associated certificates. However, several CAs disliked having to revoke all those certificates, because it cost them staff time (and hence money) to do so. They went so far as to change their procedures from the standard way of accepting problem reports (emailing a generic attestation of compromise), and instead required CA-specific hoop-jumping to notify them of compromised keys. Since the effectiveness of revocation in the WebPKI is, shall we say, homeopathic at best, I decided I couldn t be bothered to play whack-a-mole with CAs that just wanted to be difficult, and I stopped sending compromised key notifications to CAs. Instead, now I m publishing the details of compromised certificates to everyone, so that users can protect themselves directly should they choose to.

Further Work The astute amongst you may have noticed, in the above How It Works description, a bit of a gap in my scanning coverage. CAs can (and do!) issue certificates for keys that are already compromised, including weak keys that have been known about for a decade or more (1, 2, 3). However, as currently implemented, the pwnedkeys certificate checker does not automatically find such certificates. My plan is to augment the CT scraping / cert processing pipeline to check all incoming certificates against the existing (2M+) set of pwned keys. Though, with over five million new certificates to check every day, it s not necessarily as simple as just hit the pwnedkeys API for every new cert . The poor old API server might not like that very much.

Support My Work If you d like to see this extra matching happen a bit quicker, I ve setup a ko-fi supporters page, where you can support my work on pwnedkeys and the other open source software and projects I work on by buying me a refreshing beverage. I would be very appreciative, and your support lets me know I should do more interesting things with the giant database of compromised keys I ve accumulated.

14 January 2024

Debian Brasil: MiniDebConf BH 2024 - abertura de inscri o e chamada de atividades

Est aberta a inscri o de participantes e a chamada de atividades para a MiniDebConf Belo Horizonte 2024 e para o FLISOL - Festival Latino-americano de Instala o de Software Livre. Veja abaixo algumas informa es importantes: Data e local da MiniDebConf e do FLISOL A MiniDebConf acontecer de 27 a 30 de abril no Campus Pampulha da UFMG - Universidade Federal de Minas Gerais. No dia 27 (s bado) tamb m realizaremos uma edi o do FLISOL - Festival Latino-americano de Instala o de Software Livre, evento que acontece no mesmo dia em v rias cidades da Am rica Latina. Enquanto a MiniDebConf ter atividades focados no Debian, o FLISOL ter atividades gerais sobre Software Livre e temas relacionados como linguagem de programa o, CMS, administra o de redes e sistemas, filosofia, liberdade, licen as, etc. Inscri o gratuita e oferta de bolsas Voc j pode realizar a sua inscri o gratuita para a MiniDebConf Belo Horizonte 2024. A MiniDebConf um evento aberto a todas as pessoas, independente do seu n vel de conhecimento sobre Debian. O mais importante ser reunir a comunidade para celebrar um dos maiores projeto de Software Livre no mundo, por isso queremos receber desde usu rios(as) inexperientes que est o iniciando o seu contato com o Debian at Desenvolvedores(as) oficiais do projeto. Ou seja, est o todos(as) convidados(as)! Este ano estamos ofertando bolsas de hospedagem e passagens para viabilizar a vinda de pessoas de outras cidades que contribuem para o Projeto Debian. Contribuidores(as) n o oficiais, DMs e DDs podem solicitar as bolsas usando o formul rio de inscri o. Tamb m estamos ofertando bolsas de alimenta o para todos(as) os(as) participantes, mesmo n o contribuidores(as), e pessoas que moram na regi o de BH. Os recursos financeiros s o bastante limitados, mas tentaremos atender o m ximo de pedidos. Se voc pretende pedir alguma dessas bolsas, acesse este link e veja mais informa es antes de realizar a sua inscri o: A inscri o (sem bolsas) poder ser feita at a data do evento, mas temos uma data limite para o pedido de bolsas de hospedagem e passagens, por isso fique atento(a) ao prazo final: at 18 de fevereiro. Como estamos usando mesmo formul rio para os dois eventos, a inscri o ser v lida tanto para a MiniDebConf quanto para o FLISOL. Para se inscrever, acesse o site, v em Criar conta. Criei a sua conta (preferencialmente usando o Salsa) e acesse o seu perfil. L voc ver o bot o de Se inscrever. https://bh.mini.debconf.org Chamada de atividades Tamb m est aberta a chamada de atividades tanto para MiniDebConf quanto para o FLISOL. Para mais informa es, acesse este link. Fique atento ao prazo final para enviar sua proposta de atividade: at 18 de fevereiro. Contato Qualquer d vida, mande um email para contato@debianbrasil.org.br Organiza o

21 December 2023

Russell Coker: Links December 2023

David Brin wrote an insightful blog post about the latest round of UFO delusion [1]. There aren t a heap of scientists secretly working on UFOs. David Brin wrote an informative and insightful blog post about rich doomsday preppers who want to destroy democracy [2]. Cory Doctorow wrote an interesting article about how ChatGPT helps people write letters and how that decreases the value of the letter [3]. What can we do to show that letters mean something? Hand deliver them? Pay someone to hand deliver them? Cory concentrates on legal letters and petitions but this can apply to other things too. David Brin wrote an informative blog post about billionaires prepping for disaster and causing the disaster [4]. David Brin wrote an insightful Wired article about ways of dealing with potential rogue AIs [5]. David Brin has an interesting take on government funded science [6]. Bruce Schneier wrote an insightful article about AI Risks which is worth reading [7]. Ximion wrote a great blog post about how tp use AppStream metadata to indicate what type of hardware/environment is required to use an app [8]. This is great for the recent use of Debian on phones and can provide real benefits for more traditional uses (like all those servers that accidentally got LibreOffice etc installed). Also for Convergence it will be good to have the app launcher take note of this, when your phone isn t connected to a dock there s no point offering to launch apps that require a full desktop screen. Russ Albery wrote an interesting summary of the book Going Infinite about the Sam Bankman-Fried FTX fiasco [9]. That summary really makes Sam sound Autistic. Cory Doctorow wrote an insightful article Microincentives and Enshittification explaining why Google search has to suck [10]. Charles Stross posted the text of a lecture he gave titles We re Sorry We Created the Torment Nexus [11] about sci-fi ideas that shouldn t be implemented. The Daily WTF has many stories of corporate computer stupidity, but The White Appliphant is one of the most epic [12]. The Verge has an informative article on new laws in the US and the EU to give a right to repair and how this explains the sudden change to 7 year support for Pixel phones [13].

19 December 2023

Matthew Garrett: Making SSH host certificates more usable

Earlier this year, after Github accidentally committed their private RSA SSH host key to a public repository, I wrote about how better support for SSH host certificates would allow this sort of situation to be handled in a user-transparent way without any negative impact on security. I was hoping that someone would read this and be inspired to fix the problem but sadly that didn't happen so I've actually written some code myself.

The core part of this is straightforward - if a server presents you with a certificate associated with a host key, then make the trust in that host be whoever signed the certificate rather than just trusting the host key. This means that if someone needs to replace the host key for any reason (such as, for example, them having published the private half), you can replace the host key with a new key and a new certificate, and as long as the new certificate is signed by the same key that the previous certificate was, you'll trust the new key and key rotation can be carried out without any user errors. Hurrah!

So obviously I wrote that bit and then thought about the failure modes and it turns out there's an obvious one - if an attacker obtained both the private key and the certificate, what stops them from continuing to use it? The certificate isn't a secret, so we basically have to assume that anyone who possesses the private key has access to it. We may have silently transitioned to a new host key on the legitimate servers, but a hostile actor able to MITM a user can keep on presenting the old key and the old certificate until it expires.

There's two ways to deal with this - either have short-lived certificates (ie, issue a new certificate every 24 hours or so even if you haven't changed the key, and specify that the certificate is invalid after those 24 hours), or have a mechanism to revoke the certificates. The former is viable if you have a very well-engineered certificate issuing operation, but still leaves a window for an attacker to make use of the certificate before it expires. The latter is something SSH has support for, but the spec doesn't define any mechanism for distributing revocation data.

So, I've implemented a new SSH protocol extension that allows a host to send a key revocation list to a client. The idea is that the client authenticates to the server, receives a key revocation list, and will no longer trust any certificates that are contained within that list. This seems simple enough, but a naive implementation opens the client to various DoS attacks. For instance, if you simply revoke any key contained within the received KRL, a hostile server could revoke any certificates that were otherwise trusted by the client. The easy way around this is for the client to ensure that any revoked keys are associated with the same CA that signed the host certificate - that way a compromised host can only revoke certificates associated with that CA, and can't interfere with anyone else.

Unfortunately that still means that a single compromised host can still trigger revocation of certificates inside that trust domain (ie, a compromised host a.test.com could push a KRL that invalidated the certificate for b.test.com), because there's no way in the KRL format to indicate that a given revocation is associated with a specific hostname. This means we need a mechanism to verify that the KRL update is legitimate, and the easiest way to handle that is to sign it. The KRL format specifies an in-band signature but this was deprecated earlier this year - instead KRLs are supposed to be signed with the sshsig format. But we control both the server and the client, which means it's easy enough to send a detached signature as part of the extension data.

Putting this all together: you ssh to a server you've never contacted before, and it presents you with a host certificate. Instead of the host key being added to known_hosts, the CA key associated with the certificate is added. From now on, if you ssh to that host and it presents a certificate signed by that CA, it'll be trusted. Optionally, the host can also send you a KRL and a signature. If the signature is generated by the CA key that you already trust, any certificates in that KRL associated with that CA key will be incorporated into local storage. The expected flow if a key is compromised is that the owner of the host generates a new keypair, obtains a new certificate for the new key, and adds the old certificate to a KRL that is signed with the CA key. The next time the user connects to that host, they receive the new key and new certificate, trust it because it's signed by the same CA key, and also receive a KRL signed with the same CA that revokes trust in the old certificate.

Obviously this breaks down if a user is MITMed with a compromised key and certificate immediately after the host is compromised - they'll see a legitimate certificate and won't receive any revocation list, so will trust the host. But this is the same failure mode that would occur in the absence of keys, where the attacker simply presents the compromised key to the client before trust in the new key has been created. This seems no worse than the status quo, but means that most users will seamlessly transition to a new key and revoke trust in the old key with no effort on their part.

The work in progress tree for this is here - at the point of writing I've merely implemented this and made sure it builds, not verified that it actually works or anything. Cleanup should happen over the next few days, and I'll propose this to upstream if it doesn't look like there's any showstopper design issues.

comments

17 November 2023

Jonathan Dowland: denver luna

picture of the denver luna record on a turntable

I haven't done one of these in a while! Denver Luna is the latest single from Underworld, here on a pink 12" vinyl. The notable thing about this release was it was preceded by an "acapella" mix, consisting of just Karl Hyde's vocals: albeit treated and layered. Personally I prefer the "main" single mix, which calls back to their biggest hits. The vinyl also features an instrumental take, which is currently unavailable in any other formats. The previous single (presumably both from a forthcoming album) was and the colour red. In this crazy world we live in, this was limited to 1,000 copies. Flippers have sold 4 on eBay already, at between 55 and 75.

14 November 2023

John Goerzen: It s More Important To Recognize What Direction People Are Moving Than Where They Are

I recently read a post on social media that went something like this (paraphrased): If you buy an EV, you re part of the problem. You re advancing car culture and are actively hurting the planet. The only ethical thing to do is ditch your cars and put all your effort into supporting transit. Anything else is worthless. There is some truth there; supporting transit in areas it makes sense is better than having more cars, even EVs. But of course the key here is in areas it makes sense. My road isn t even paved. I live miles from the nearest town. And get into the remote regions of the western USA and you ll find people that live 40 miles from the nearest neighbor. There s no realistic way that mass transit is ever going to be a thing in these areas. And even if it were somehow usable, sending buses over miles where nobody lives just to reach the few that are there will be worse than private EVs. And because I can hear this argument coming a mile away, no, it doesn t make sense to tell these people to just not live in the country because the planet won t support that anymore, because those people are literally the ones that feed the ones that live in the cities. The funny thing is: the person that wrote that shares my concerns and my goals. We both care deeply about climate change. We both want positive change. And I, ahem, recently bought an EV. I have seen this play out in so many ways over the last few years. Drive a car? Get yelled at. Support the wrong politician? Get a shunning. Not speak up loudly enough about the right politician? That s a yellin too. The problem is, this doesn t make friends. In fact, it hurts the cause. It doesn t recognize this truth:

It is more important to recognize what direction people are moving than where they are.

I support trains and transit. I ve donated money and written letters to politicians. But, realistically, there will never be transit here. People in my county are unable to move all the way to transit. But what can we do? Plenty. We bought an EV. I ve been writing letters to the board of our local electrical co-op advocating for relaxation of rules around residential solar installations, and am planning one myself. It may well be that our solar-powered transportation winds up having a lower carbon footprint than the poster s transit use. Pick your favorite cause. Whatever it is, consider your strategy: What do you do with someone that is very far away from you, but has taken the first step to move an inch in your direction? Do you yell at them for not being there instantly? Or do you celebrate that they have changed and are moving?

8 October 2023

Niels Thykier: A new Debian package helper: debputy

I have made a new helper for producing Debian packages called debputy. Today, I uploaded it to Debian unstable for the first time. This enables others to migrate their package build using dh +debputy rather than the classic dh. Eventually, I hope to remove dh entirely from this equation, so you only need debputy. But for now, debputy still leverages dh support for managing upstream build systems. The debputy tool takes a radicially different approach to packaging compared to our existing packaging methods by using a single highlevel manifest instead of all the debian/install (etc.) and no hook targets in debian/rules. Here are some of the things that debputy can do or does:

debputy can perform installation similar to dh_install, dh_installdocs (etc.) plus a bit of the dh-exec support. Notably, debputy supports install /usr/bin/foo into foo and install everything else in /usr/bin into foo-utils without you having to resort to weird tricks. With debhelper, this would require dh-exec s => /usr/bin/foo operator.
debputy can assign mode to files without needing hooks and static file ownership can be assigned without resorting to fakeroot. If you request Rules-Requires-Root: no, debputy will assemble the deb without using fakeroot. The fragileness of fakeroot may some day just be a horror story we tell our unruly children that they do not really believe is true.
debputy defaults to all scripts with a #! /bin/tool or #! /usr/bin/tool to have mode 0755 (a+x) unless they are placed in directories known not to have executable files (such as the perl module dirs). As an example, scripts in the examples directory may now get an automatic executable bit if they have a proper #!-line for /usr/bin or /bin.
debputy supports the default flow of 48 debhelper tools. If you are using pure dh $@ with no sequence add-ons and no hook targets in debian/rules (or only hook targets for the upstream side), odds are that debputy got your needs covered.

There are also some features that debputy does not support at the moment:

Almost any debhelper sequence add-on. The debputy tool comes with a migration tool that will auto-detect any unsupported dh add-on from Build-Depends and will flag them as potential problematic. The migration tool works from an list of approved add-ons. Note that manually activated add-ons via dh $@ with are not detected.
Anything that installs or recently installed into /lib or another /usr-merged location. My life is too short to be directly involved in the /usr-merge transition. This means no udev and no systemd unit support at the moment (note tmpfiles and sysusers is supported). For the systemd side, I am also contemplating a different model for how to deal with services. Even without the /usr-merge transition, these would not have been supported. The migration tool will detect problematic file in the debian directory immediately and debputy will detect upstream provided systemd unit files at build time.
There is also no support for packager provided maintscript files at this time. If you have your own maintscripts, then you will not be able to migrate. The migration tool detects the debhelper based path and warns you (such as debian/postinst).
Additionally, if you need special cases in tools (such as perl-base dependency with dh_perl) or rely on dh_strip-nondeterminsm for reproducibility, then you cannot or is advised not to migrate at this time.

There are all limitations of the current work in progress. I hope to resolve them all in due time.

Trying debputy With the limitations aside, lets talk about how you would go about migrating a package:

# Assuming here you have already run: apt install dh-debputy
$ git clone https://salsa.debian.org/rra/kstart
[...]
$ cd kstart
# Add a Build-Dependency on dh-sequence-debputy
$ perl -n -i -e \
   'print; print " dh-sequence-debputy,\n" if m/debhelper-compat/;' \
    debian/control
$ debputy migrate-from-dh --apply-changes
debputy: info: Loading plugin debputy (version: archive/debian/4.3-1) ...
debputy: info: Verifying the generating manifest
debputy: info: Updated manifest debian/debputy.manifest
debputy: info: Removals:
debputy: info:   rm -f "./debian/docs"
debputy: info:   rm -f "./debian/examples"
debputy: info: Migrations performed successfully
debputy: info: Remember to validate the resulting binary packages after rebuilding with debputy
$ cat debian/debputy.manifest 
manifest-version: '0.1'
installations:
- install-docs:
    sources:
    - NEWS
    - README
    - TODO
- install-examples:
    source: examples/krenew-agent
$ git add debian/debputy.manifest
$ git commit --signoff -am"Migrate to debputy"
# Run build tool of choice to verify the output.

This is of course a specific example that works out of the box. If you were to try this on debianutils (from git), the output would look something like this:

$ debputy migrate-from-dh
debputy: info: Loading plugin debputy (version: 5.13-13-g9836721) ...
debputy: error: Unable to migrate automatically due to missing features in debputy.
  * The "debian/triggers" debhelper config file (used by dh_installdeb is currently not supported by debputy.
Use --acceptable-migration-issues=[...] to convert this into a warning [...]

And indeed, debianutils requires at least 4 debhelper features beyond what debputy can support at the moment (all related to maintscripts and triggers).

Rapid feedback Rapid feedback cycles are important for keeping developers engaged in their work. The debputy tool provides the following features to enable rapid feedback.

Immediate manifest validation It would be absolutely horrible if you had to do a full-rebuild only to realize you got the manifest syntax wrong. Therefore, debputy has a check-manifest command that checks the manifest for syntactical and semantic issues.

$ cat debian/debputy.manifest
manifest-version: '0.1'
installations:
- install-docs:
    sources:
    - GETTING-STARTED-WITH-dh-debputy.md
    - MANIFEST-FORMAT.md
    - MIGRATING-A-DH-PLUGIN.md
$ debputy check-manifest
debputy: info: Loading plugin debputy (version: 0.1.7-1-gf34bd66) ...
debputy: info: No errors detected.
$ cat <<EOF >> debian/debputy.manifest
- install:
    sourced: foo
    as: usr/bin/foo
EOF
# Did I typo anything?
$ debputy check-manifest
debputy: info: Loading plugin debputy (version: 0.1.7-2-g4ef8c2f) ...
debputy: warning: Possible typo: The key "sourced" at "installations[1].install" should probably have been 'source'
debputy: error: Unknown keys " 'sourced' " at installations[1].install".  Keys that could be used here are: sources, when, dest-dir, source, into.
debputy: info: Loading plugin debputy (version: 0.1.7-2-g4ef8c2f) ...
$ sed -i s/sourced:/source:/ debian/debputy.manifest
$ debputy check-manifest
debputy: info: Loading plugin debputy (version: 0.1.7-2-g4ef8c2f) ...
debputy: info: No errors detected.

The debputy check-manifest command is limited to the manifest itself and does not warn about foo not existing as it could be produced as apart of the upstream build system. Therefore, there are still issues that can only be detected at package build time. But where debputy can reliably give you immediate feedback, it will do so.

Idempotence: Clean re-runs of dh_debputy without clean/rebuild If you read the fine print of many debhelper commands, you may see the following note their manpage:
This command is not idempotent. dh_prep(1) should be called between invocations of this command Manpage of an anonymous debhelper tool
What this usually means, is that if you run the command twice, you will get its maintscript change (etc.) twice in the final deb. This fits into our single-use clean throw-away chroot builds on the buildds and CI as well as dpkg-buildpackage s no-clean (-nc) option. Single-use throw-away chroots are not very helpful for debugging though, so I rarely use them when doing the majority of my packaging work as I do not want to wait for the chroot initialization (including installing of build-depends). But even then, I have found that dpkg-buildpackage -nc has been useless for me in many cases as I am stuck between two options:

With -nc, you often still interact with the upstream build system. As an example, debhelper will do a dh_prep followed by dh_auto_install, so now we are waiting for upstream s install target to run again. What should have taken seconds now easily take 0.5-1 minute extra per attempt.

If you want to by-pass this, you have to manually call the helpers needed (in correct order) and every run accumulates cruft from previous runs to the point that cruft drowns out the actual change you want to see. Also, I am rarely in the mood to play human dh, when I am debugging an issue that I failed to fix in my first, second and third try.

As you can probably tell, neither option has worked that well for me. But with dh_debputy, I have made it a goal that it will not self-taint the final output. If dh_debputy fails, you should be able to tweak the manifest and re-run dh_debputy with the same arguments.

No waiting for dpkg-buildpackage -nc nor anything implied by that.

No self-tainting of the final deb. The result you get, is the result you would have gotten if the previous dh_debputy run never happened.

Because dh_debputy produces the final result, I do not have to run multiple tools in the right order.

Obviously, this is currently a lot easier, because debputy is not involved in the upstream build system at all. If this feature is useful to you, please do let me know and I will try to preserve it as debputy progresses in features.

Packager provided files On a different topic, have you ever wondered what kind of files you can place into the debian directory that debhelper automatically picks up or reacts too? I do not have an answer to that beyond it is over 80 files and that as the maintainer of debhelper, I am not willing to manually maintain such a list manually. However, I do know what the answer is in debputy, because I can just ask debputy:

$ debputy plugin list packager-provided-files
+-----------------------------+---------------------------------------------[...]
  Stem                          Installed As                                [...]
+-----------------------------+---------------------------------------------[...]
  NEWS                          /usr/share/doc/ name /NEWS.Debian           [...]
  README.Debian                 /usr/share/doc/ name /README.Debian         [...]
  TODO                          /usr/share/doc/ name /TODO.Debian           [...]
  bug-control                   /usr/share/bug/ name /control               [...]
  bug-presubj                   /usr/share/bug/ name /presubj               [...]
  bug-script                    /usr/share/bug/ name /script                [...]
  changelog                     /usr/share/doc/ name /changelog.Debian      [...]
  copyright                     /usr/share/doc/ name /copyright             [...]
[...]

This will list all file types (Stem column) that debputy knows about and it accounts for any plugin that debputy can find. Note to be deterministic, debputy will not auto-load plugins that have not been explicitly requested during package builds. So this list could list files that are available but not active for your current package. Note the output is not intended to be machine readable. That may come in later version. Feel free to chime in if you have a concrete use-case.

Take it for a spin As I started this blog post with, debputy is now available in unstable. I hope you will take it for a spin on some of your simpler packages and provide feedback on it. For documentation, please have a look at:

GETTING-STARTED-WITH-dh-debputy.md (how-to guide)

MANIFEST-FORMAT.md (reference documentation)

Thanks for considering PS: My deepest respect to the fakeroot maintainers. That game of whack-a-mole is not something I would have been willing to maintain. I think fakeroot is like the Python GIL in the sense that it has been important in getting Debian to where it is today. But at the same time, I feel it is time to let go of the crutch and find a proper solution.

30 September 2023

Ian Jackson: DKIM: rotate and publish your keys

If you are an email system administrator, you are probably using DKIM to sign your outgoing emails. You should be rotating the key regularly and automatically, and publishing old private keys. I have just released dkim-rotate 1.0; dkim-rotate is a tool to do this key rotation and publication. If you are an email user, your email provider ought to be doing this. If this is not done, your emails are non-repudiable , meaning that if they are leaked, anyone (eg, journalists, haters) can verify that they are authentic, and prove that to others. This is not desirable (for you). Non-repudiation of emails is undesirable This problem was described at some length in Matthew Green s article Ok Google: please publish your DKIM secret keys. Avoiding non-repudiation sounds a bit like lying. After all, I m advising creating a situation where some people can t verify that something is true, even though it is. So I m advocating casting doubt. Crucially, though, it s doubt about facts that ought to be private. When you send an email, that s between you and the recipient. Normally you don t intend for anyone, anywhere, who happens to get a copy, to be able to verify that it was really you that sent it. In practical terms, this verifiability has already been used by journalists to verify stolen emails. Associated Press provide a verification tool. Advice for all email users As a user, you probably don t want your emails to be non-repudiable. (Other people might want to be able to prove you sent some email, but your email system ought to serve your interests, not theirs.) So, your email provider ought to be rotating their DKIM keys, and publishing their old ones. At a rough guess, your provider probably isn t :-(. How to tell by looking at email headers A quick and dirty way to guess is to have a friend look at the email headers of a message you sent. (It is important that the friend uses a different email provider, since often DKIM signatures are not applied within a single email system.) If your friend sees a DKIM-Signature header then the message is DKIM signed. If they don t, then it wasn t. Most email traversing the public internet is DKIM signed nowadays; so if they don t see the header probably they re not looking using the right tools, or they re actually on the same email system as you. In messages signed by a system running dkim-rotate, there will also be a header about the key rotation, to notify potential verifiers of the situation. Other systems that avoid non-repudiation-through-DKIM might do something similar. dkim-rotate s header looks like this:

DKIM-Signature-Warning: NOTE REGARDING DKIM KEY COMPROMISE
 https://www.chiark.greenend.org.uk/dkim-rotate/README.txt
 https://www.chiark.greenend.org.uk/dkim-rotate/ae/aeb689c2066c5b3fee673355309fe1c7.pem

But an email system might do half of the job of dkim-rotate: regularly rotating the key would cause the signatures of old emails to fail to verify, which is a good start. In that case there probably won t be such a header. Testing verification of new and old messages You can also try verifying the signatures. This isn t entirely straightforward, especially if you don t have access to low-level mail tooling. Your friend will need to be able to save emails as raw whole headers and body, un-decoded, un-rendered. If your friend is using a traditional Unix mail program, they should save the message as an mbox file. Otherwise, ProPublica have instructions for attaching and transferring and obtaining the raw email. (Scroll down to How to Check DKIM and ARC .) Checking that recent emails are verifiable Firstly, have your friend test that they can in fact verify a DKIM signature. This will demonstrate that the next test, where the verification is supposed to fail, is working properly and fails for the right reasons. Send your friend a test email now, and have them do this on a Linux system:

    # save the message as test-email.mbox
    apt install libmail-dkim-perl # or equivalent on another distro
    dkimproxy-verify <test-email.mbox

You should see output containing something like this:

    originator address: ijackson@chiark.greenend.org.uk
    signature identity: @chiark.greenend.org.uk
    verify result: pass
    ...

If the output ontains verify result: fail (body has been altered) then probably your friend didn t manage to faithfully save the unalterered raw message. Checking old emails cannot be verified When you both have that working, have your friend find an older email of yours, from (say) month ago. Perform the same steps. Hopefully they will see something like this:

    originator address: ijackson@chiark.greenend.org.uk
    signature identity: @chiark.greenend.org.uk
    verify result: fail (bad RSA signature)

or maybe

    verify result: invalid (public key: not available)

This indicates that this old email can no longer be verified. That s good: it means that anyone who steals a copy, can t verify it either. If it s leaked, the journalist who receives it won t know it s genuine and unmodified; they should then be suspicious. If your friend sees verify result: pass, then they have verified that that old email of yours is genuine. Anyone who had a copy of the mail can do that. This is good for email thieves, but not for you. For email admins: announcing dkim-rotate 1.0 I have been running dkim-rotate 0.4 on my infrastructure, since last August. and I had entirely forgotten about it: it has run flawlessly for a year. I was reminded of the topic by seeing DKIM in other blog posts. Obviously, it is time to decreee that dkim-rotate is 1.0. If you re a mail system administrator, your users are best served if you use something like dkim-rotate. The package is available in Debian stable, and supports Exim out of the box, but other MTAs should be easy to support too, via some simple ad-hoc scripting. Limitation of this approach Even with this key rotation approach, emails remain nonrepudiable for a short period after they re sent - typically, a few days. Someone who obtains a leaked email very promptly, and shows it to the journalist (for example) right away, can still convince the journalist. This is not great, but at least it doesn t apply to the vast bulk of your email archive. There are possible email protocol improvements which might help, but they re quite out of scope for this article.

Edited 2023-10-01 00:20 +01:00 to fix some grammar

comments

Russell Coker: Links September 2023

Interesting article in Wired about adversarial attacks on ML systems to get them to do things that they are explicitely programmed not to do such as describe how to make illegal drugs [1]. The most interesting part of this is that the attacks work on most GPT systems which is probably due to the similar data used to train them. Vice has an interesting article about the Danish Synthetic Party , a political partyled by an AI [2]. Citizens can vote for candidates who will try to get laws passed that match the AI generated goals, there is no option of voting for an AI character. The policies they are advocating for are designed to appeal to the 20% of Danes who don t vote. They are also trying to inspire similar parties in other countries. I think this has the potential to improve democracy. Vice reports that in 2021 a man tried to assasinate the Queen of England with inspiration from Star Wars and an AI chat bot [3]. While someone who wants to be a real-life Sith is probably going to end up doing something bad we still don t want to have chat bots encourage it. Bruce Schneier wrote an interesting article about milestones for AI involvement in the political process [4]. Sam Varghese wrote an interesting article about the allegations that India is following the example of Saudi Arabia and assasinating people in other countries who disagree with their government [5]. We need to stop this. Ian Jackson wrote an interesting blog post advocating that DKIM PRIVATE keys be rotated and PUBLISHED [6]. The idea is that if a hostile party gets access to the mailbox of someone who received private email from you then in the normal DKIM setup of keys never changing they can prove that the email is authentic when they leak it. While if you mail server publishes the old keys as Ian advocates then the hostile party can t prove that you sent the email in question as anyone could have forged a signature. Anything that involves publishing a private key gets an immediate negative reaction but I can t fault the logic here.

25 August 2023

Debian Brasil: Debian Day 30 anos online no Brasil

Em 2023 o tradicional Debian Day est sendo celebrado de forma especial, afinal no dia 16 de agostoo Debian completou 30 anos! Para comemorar este marco especial na vida do Debian, a comunidade Debian Brasil organizou uma semana de palestras online de 14 a 18 de agosto. O evento foi chamado de Debian 30 anos. Foram realizadas 2 palestras por noite, das 19h s 22h, transmitidas pelo canal Debian Brasil no YouTube totalizando 10 palestras. As grava es j est o dispon veis tamb m no canal Debian Brasil no Peertube. Nas 10 atividades tivemos as participa es de 9 DDs, 1 DM, 3 contribuidores(as). A audi ncia ao vivo variou bastante, e o pico foi na palestra sobre preseed com o Eriberto Mota quando tivemos 47 pessoas assistindo. Obrigado a todos(as) participantes pela contribui o que voc s deram para o sucesso do nosso evento.

Antonio Terceiro
Aquila Macedo
Charles Melara
Daniel Lenharo de Souza
David Polverari
Eriberto Mota
Giovani Ferreira
Jefferson Maier
Lucas Kanashiro
Paulo Henrique de Lima Santana
Sergio Durigan Junior
Thais Araujo
Thiago Andrade

Veja abaixo as fotos de cada atividade:

Nova gera o: uma entrevista com iniciantes no projeto Debian

Instala o personalizada e automatizada do Debian com preseed

Manipulando patches com git-buildpackage

debian.social: Socializando Debian do jeito Debian

Proxy reverso com WireGuard

Celebra o dos 30 anos do Debian!

Instalando o Debian em disco criptografado com LUKS

O que a equipe de localiza o j conquistou nesses 30 anos

Debian - Projeto e Comunidade!

Design Gr fico e Software livre, o que fazer e por onde come ar

24 August 2023

Debian Brasil: Debian Day 30 anos em Belo Horizonte

Pela primeira vez a cidade de Belo Horizonte realizou um Debian Day para celebrar o anivers rio do Projeto Debian. As comunidades Debian Minas Gerais e Software Livre de BH e Regi o se sentiram motivadas para celebrar esta data especial devido aos 30 anos do Projeto Debian em 2023 e organizou um encontro no dia 12 de agosto dentro Espa o do Conhecimento da UFMG. A organiza o do Debian Day em Belo Horizonte recebeu o importante apoio do Departamento de Ci ncia da Computa o da UFMG para reservar a sala que foi utilizada para o evento. A programa o contou com tr s atividades:

Palestra O projeto Debian quer voc ! Paulo Henrique de Lima Santana
Palestra Personalizando o Debian para uso em escolas da PBH: a hist ria da Libertas - Fred Guimar es
Bate-papo sobre os pr ximos passos para formar uma comunidade de Software Livre em BH - Bruno Braga Fonseca

No total etiveram presentes 11 pessoas e fizemos uma foto com as que ficaram at o final. Presentes no Debian Day 2023 em BH

20 August 2023

Russell Coker: GPT Systems and Relationships

Sam Hartman wrote an interesting blog post about his work as a sex and intimacy educator and how GPT systems could impact that [1]. I ve read some positive reviews of Replika a commercial system that is somewhat promoted as a counsellor [2], so I decided to try it out. In my brief trial it seemed to be using all the methods that Android pay to play games are known for. Having multiple types of in-game currency, pay to buy new clothes etc for your friend, etc. Basically it seems pretty horrible. I didn t pay for it and the erotic and romantic features all require payment so I didn t test that. When thinking about this logically, having a system designed to deal with people when they are vulnerable (either being in a romantic relationship or getting counselling) that uses manipulative techniques to get money from them can t have a good result. So a free software system seems the best option. When I first learned of virtual girlfriends I never thought I would feel compelled to advocate for a free software virtual dating program, but that s where the world has got to. Virtual girlfriends have been around for years now. Several years ago I watched a documentary about their use in Japan. It seemed a bit strange when a group of men who had virtual girlfriends had a dinner party with their tablets and phones propped up so their girlfriends could join in as they all appeared to be dating the same girl. The documentary didn t go in to enough detail to cover whether the girlfriend app could learn or be customised enough that they would seem to have different personalities. Virtual boyfriends have also been around for a while apparently without most people noticing. I just Googled it and found a review of a virtual boyfriend app published in 2016! One thing that will probably concern people is the possibility for virtual dating systems to be used for inappropriate things. That is a reasonable thing to be concerned about but I don t think it s possible to prevent technology that has already been released from doing such things. As a general rule technology can always be used for good and bad things so we need to just make it easy to do good things and let the legal system develop ways of dealing with the bad things.

16 August 2023

Sam Hartman: A First Exercise with AI Training

Taking a hands-on low-level approach to learning AI has been incredibly rewarding. I wanted to create an achievable task that would motivate me to learn the tools and get practical experience training and using large language models. Just at the point when I was starting to spin up GPU instances, Llama2 was released to the public. So I elected to start with that model. As I mentioned, I m interested in exploring how sex-positive AI can help human connection in positive ways. For that reason, I suspected that Llama2 might not produce good results without training: some of Meta s safety goals run counter to what I m trying to explore. I suspected that there might be more attention paid to safety in the chat variants of Llama2 rather than the text generation variants, and working against that might be challenging for a first project, so I started with Llama-2-13b as a base. Preparing a Dataset I elected to generate a fine tuning dataset using fiction. Long term, that might not be a good fit. But I ve always wanted to understand how an LLM s tone is adjusted how you get an LLM to speak in a different voice. So much of fine tuning focuses on examples where a given prompt produces a particular result. I wanted to understand how to bring in data that wasn t structured as prompts. The Huggingface course actually gives an example of how to adjust a model set up for masked language modeling trained on wikitext to be better at predicting the vocabulary of movie reviews. There though, doing sample breaks in the dataset at movie review boundaries makes sense. There s another example of training an LLM from scratch based on a corpus of python code. Between these two examples, I figured out what I needed. It was relatively simple in retrospect: tokenize the whole mess, and treat everything as output. That is, compute loss on all the tokens. Long term, using fiction as a way to adjust how the model responds is likely to be the wrong starting point. However, it maximized focus on aspects of training I did not understand and allowed me to satisfy my curiosity. Rangling the Model I decided to actually try and add additional training to the model directly rather than building an adapter and fine tuning a small number of parameters. Partially this was because I had enough on my mind without understanding how LoRA adapters work. Partially, I wanted to gain an appreciation for the infrastructure complexity of AI training. I have enough of a cloud background that I ought to be able to work on distributed training. (As it turned out, using BitsAndBytes 8-bit optimizer, I was just able to fit my task onto a single GPU). I wasn t even sure that I could make a measurable difference in Llama-2-13b running 890,000 training tokens through a couple of training epochs. As it turned out I had nothing to fear on that front. Getting everything to work was more tricky than I expected. I didn t have an appreciation for exactly how memory intensive training was. The Transformers documentation points out that with typical parameters for mixed-precision training, it takes 18 bytes per model parameter. Using bfloat16 training and an 8-bit optimizer was enough to get things to fit. Of course then I got to play with convergence. My initial optimizer parameters caused the model to diverge, and before I knew it, my model had turned to NAN, and would only output newlines. Oops. But looking back over the logs, watching what happened to the loss, and looking at the math in the optimizer to understand how I ended up getting something that rounded to a divide by zero gave me a much better intuition for what was going on. The results. This time around I didn t do anything in the way of quantitative analysis of what I achieved. Empirically I definitely changed the tone of the model. The base Llama-2 model tends to steer away from sexual situations. It s relatively easy to get it to talk about affection and sometimes attraction. Unsurprisingly, given the design constraints, it takes a bit to get it to wonder into sexual situations. But if you hit it hard enough with your prompt, it will go there, and the results are depressing. At least for prompts I used, it tended to view sex fairly negatively. It tended to be less coherent than with other prompts. One inference managed to pop out in the middle of some text that wasn t hanging together well, Chapter 7 - Rape. With my training, I did manage to achieve my goal of getting the model to use more positive language and emotional signaling when talking about sexual situations. More importantly, I gained a practical understanding of many ways training can go wrong.

There were overfitting problems: names of characters from my dataset got more attention than I wished they did. As a model for interacting with some of the universes I used as input, that was kind of cool, but if I was looking to just adjust how the model talked about intimate situations, I massively got things to be too specific.
I gained a new appreciation for how easy it is to trigger catastrophic forgetting.
I begin to appreciate how this sort of unsupervised training could be best paired with supervised training to help correct model confusion. Playing with the model, I often ran into cases where my reaction was like Well, I don t want to train it to give that response, but if it ever does wander into this part of the state space, I d like to at least get it to respond more naturally. And I think I understand how to approach that either with custom loss functions or manipulating which tokens compute loss and which ones do not.
And of course realized I need to learn a lot about sanitizing and preparing datasets.

A lot of articles I ve been reading about training make more sense. I have better intuition for why you might want to do training a certain way, or why mechanisms for countering some problem will be important. Future Activities:

Look into LoRA adapters; having understood what happens when you manipulate the model directly, I can now move on to intelligent solutions.
Look into various mechanisms for rewards and supervised training.
See how hard it is to train a chat based model out of some of its safety constraints.
Construct datasets; possibly looking at sources like relationship questions/advice.

comments

9 August 2023

Antoine Beaupr : OpenPGP key transition

This is a short announcement to say that I have changed my main OpenPGP key. A signed statement is available with the cryptographic details but, in short, the reason is that I stopped using my old YubiKey NEO that I have worn on my keyring since 2015. I now have a YubiKey 5 which supports ED25519 which features much shorter keys and faster decryption. It allowed me to move all my secret subkeys on the key (including encryption keys) while retaining reasonable performance. I have written extensive documentation on how to do that OpenPGP key rotation and also YubiKey OpenPGP operations.

Warning on storing encryption keys on a YubiKey People wishing to move their private encryption keys to such a security token should be very careful as there are special precautions to take for disaster recovery. I am toying with the idea of writing an article specifically about disaster recovery for secrets and backups, dealing specifically with cases of death or disabilities.

Autocrypt changes One nice change is the impact on Autocrypt headers, which are considerably shorter. Before, the header didn't even fit on a single line in an email, it overflowed to five lines:

Autocrypt: addr=anarcat@torproject.org; prefer-encrypt=nopreference;
 keydata=xsFNBEogKJ4BEADHRk8dXcT3VmnEZQQdiAaNw8pmnoRG2QkoAvv42q9Ua+DRVe/yAEUd03EOXbMJl++YKWpVuzSFr7IlZ+/lJHOCqDeSsBD6LKBSx/7uH2EOIDizGwfZNF3u7X+gVBMy2V7rTClDJM1eT9QuLMfMakpZkIe2PpGE4g5zbGZixn9er+wEmzk2mt20RImMeLK3jyd6vPb1/Ph9+bTEuEXi6/WDxJ6+b5peWydKOdY1tSbkWZgdi+Bup72DLUGZATE3+Ju5+rFXtb/1/po5dZirhaSRZjZA6sQhyFM/ZhIj92mUM8JJrhkeAC0iJejn4SW8ps2NoPm0kAfVu6apgVACaNmFb4nBAb2k1KWru+UMQnV+VxDVdxhpV628Tn9+8oDg6c+dO3RCCmw+nUUPjeGU0k19S6fNIbNPRlElS31QGL4H0IazZqnE+kw6ojn4Q44h8u7iOfpeanVumtp0lJs6dE2nRw0EdAlt535iQbxHIOy2x5m9IdJ6q1wWFFQDskG+ybN2Qy7SZMQtjjOqM+CmdeAnQGVwxowSDPbHfFpYeCEb+Wzya337Jy9yJwkfa+V7e7Lkv9/OysEsV4hJrOh8YXu9a4qBWZvZHnIO7zRbz7cqVBKmdrL2iGqpEUv/x5onjNQwpjSVX5S+ZRBZTzah0w186IpXVxsU8dSk0yeQskblrwARAQABzSlBbnRvaW5lIEJlYXVwcsOpIDxhbmFyY2F0QHRvcnByb2plY3Qub3JnPsLBlAQTAQgAPgIbAwULCQgHAwUVCgkICwUWAgMBAAIeAQIXgBYhBI3JAc5kFGwEitUPu3khUlJ7dZIeBQJihnFIBQkacFLiAAoJEHkhUlJ7dZIeXNAP/RsX+27l9K5uGspEaMH6jabAFTQVWD8Ch1om9YvrBgfYtq2k/m4WlkMh9IpT89Ahmlf0eq+V1Vph4wwXBS5McK0dzoFuHXJa1WHThNMaexgHhqJOs
 S60bWyLH4QnGxNaOoQvuAXiCYV4amKl7hSuDVZEn/9etDgm/UhGn2KS3yg0XFsqI7V/3RopHiDT+k7+zpAKd3st2V74w6ht+EFp2Gj0sNTBoCdbmIkRhiLyH9S4B+0Z5dUCUEopGIKKOSbQwyD5jILXEi7VTZhN0CrwIcCuqNo7OXI6e8gJd8McymqK4JrVoCipJbLzyOLxZMxGz8Ki0b9O844/DTzwcYcg9I1qogCsGmZfgVze2XtGxY+9zwSpeCLeef6QOPQ0uxsEYSfVgS+onCesSRCgwAPmppPiva+UlGuIMun87gPpQpV2fqFg/V8zBxRvs6YTGcfcQjfMoBHmZTGb+jk1//QAgnXMO7fGG38YH7iQSSzkmodrH2s27ZKgUTHVxpBL85ptftuRqbR7MzIKXZsKdA88kjIKKXwMmez9L1VbJkM4k+1Kzc5KdVydwi+ujpNegF6ZU8KDNFiN9TbDOlRxK5R+AjwdS8ZOIa4nci77KbNF9OZuO3l/FZwiKp8IFJ1nK7uiKUjmCukL0od/6X2rJtAzJmO5Co93ZVrd5r48oqUvjklzzsBNBFmeC3oBCADEV28RKzbv3dEbOocOsJQWr1R0EHUcbS270CrQZfb9VCZWkFlQ/1ypqFFQSjmmUGbNX2CG5mivVsW6Vgm7gg8HEnVCqzL02BPY4OmylskYMFI5Bra2wRNNQBgjg39L9XU4866q3BQzJp3r0fLRVH8gHM54Jf0FVmTyHotR/Xiw5YavNy2qaQXesqqUv8HBIha0rFblbuYI/cFwOtJ47gu0QmgrU0ytDjlnmDNx4rfsNylwTIHS0Oc7Pezp7MzLmZxnTM9b5VMprAXnQr4rewXCOUKBSto+j4rD5/77DzXw96bbueNruaupb2Iy2OHXNGkB0vKFD3xHsXE2x75NBovtABEBAAHCwqwEGAEIACAWIQSNyQHOZBRsBIrVD7t5IVJSe3WSHgUCWZ4LegIbAgFACRB5IV
 JSe3WSHsB0IAQZAQgAHRYhBHsWQgTQlnI7AZY1qz6h3d2yYdl7BQJZngt6AAoJED6h3d2yYdl7CowH/Rp7GHEoPZTSUK8Ss7crwRmuAIDGBbSPkZbGmm4bOTaNs/gealc2tsVYpoMx7aYgqUW+t+84XciKHT+bjRv8uBnHescKZgDaomDuDKc2JVyx6samGFYuYPcGFReRcdmH0FOoPCn7bMW5mTPztV/wIA80LZD9kPKIXanfUyI3HLP0BPwZG4WTpKzJaalR1BNwu2oF6kEK0ymH3LfDiJ5Sr6emI2jrm4gH+/19ux/x+ST4tvm2PmH3BSQOPzgiqDiFd7RZoAIhmwr3FW4epsK9LtSxsi9gZ2vATBKO1oKtb6olW/keQT6uQCjqPSGojwzGRT2thEANH+5t6Vh0oDPZhrKUXRAAxHMBNHEaoo/M0sjZo+5OF3Ig1rMnI6XbKskLv6hu13cCymW0w/5E4XuYnyQ1cNC3pLvqDQbDx5mAPfBVHuqxJdRLQ3yDM/D2QIsxnkzQwi0FsJuni4vuJzWK/NHHDCvxMCh0YmSgbptUtgW8/niatd2Y6MbfRGxUHoctKtzqzivC8hKMTFrj4AbZhg/e9QVCsh5zSXtpWP0qFDJsxRMx0/432n9d4XUiy4U672r9Q09SsynB3QN6nTaCTWCIxGxjIb+8kJrRqTGwy/PElHX6kF0vQUWZNf2ITV1sd6LK/s/7sH+x4rzgUEHrsKr/qPvY3rUY/dQLd+owXesY83ANOu6oMWhSJnPMksbNa4tIKKbjmw3CFIOfoYHOWf3FtnydHNXoXfj4nBX8oSnkfhLILTJgf6JDFXfw6mTsv/jMzIfDs7PO1LK2oMK0+prSvSoM8bP9dmVEGIurzsTGjhTOBcb0zgyCmYVD3S48vZlTgHszAes1zwaCyt3/tOwrzU5JsRJVns+B/TUYaR/u3oIDMDygvE5ObWxXaFVnCC59r+zl0FazZ0ouyk2AYIR
 zHf+n1n98HCngRO4FRel2yzGDYO2rLPkXRm+NHCRvUA/i4zGkJs2AV0hsKK9/x8uMkBjHAdAheXhY+CsizGzsKjjfwvgqf84LwAzSDdZqLVE2yGTOwU0ESiArJwEQAJhtnC6pScWjzvvQ6rCTGAai6hrRiN6VLVVFLIMaMnlUp92EtgVSNpw6kANtRTpKXUB5fIPZVUrVdfEN06t96/6LE42tgifDAFyFTZY5FdHHri1GG/Cr39MpW2VqCDCtTTPVWHTUlU1ZG631BJ+9NB+ce58TmLr6wBTQrT+W367eRFBC54EsLNb7zQAspCn9pw1xf1XNHOGnrAQ4r9BXhOW5B8CzRd4nLRQwVgtw/c5M/bjemAOoq2WkwN+0mfJe4TSfHwFUozXuN274X+0Gr10fhp8xEDYuQM0qu6W3aDXMBBwIu0jTNudEELsTzhKUbqpsBc9WjwNMCZoCuSw/RTpFBV35mXbqQoQgbcU7uWZslLl9Wvv/C6rjXgd+GeX8SGBjTqq1ZkTv5UXLHTNQzPnbkNEExzqToi/QdSjFMIACnakeOSxc0ckfnsd9pfGv1PUyPyiwrHiqWFzBijzGIZEHxhNGFxAkXwTJR7Pd40a7RDxwbO6p/TSIIum41JtteehLHwTRDdQNMoyfLxuNLEtNYS0uR2jYI1EPQfCNWXCdT2ZK/l6GVP6jyB/olHBIOr+oVXqJh+48ki8cATPczhq3fUr7UivmguGwD67/4omZ4PCKtz1hNndnyYFS9QldEGo+AsB3AoUpVIA0XfQVkxD9IZr+Zu6aJ6nWq4M2bsoxABEBAAHCwXYEGAEIACACGwwWIQSNyQHOZBRsBIrVD7t5IVJSe3WSHgUCWPerZAAKCRB5IVJSe3WSHkIgEACTpxdn/FKrwH0/LDpZDTKWEWm4416l13RjhSt9CUhZ/Gm2GNfXcVTfoF/jKXXgjHcV1DHjfLUPmPVwMdqlf5ACOiFqIUM2ag/OEARh356w
 YG7YEobMjX0CThKe6AV2118XNzRBw/S2IO1LWnL5qaGYPZONUa9Pj0OaErdKIk/V1wge8Zoav2fQPautBcRLW5VA33PH1ggoqKQ4ES1hc9HC6SYKzTCGixu97mu/vjOa8DYgM+33TosLyNy+bCzw62zJkMf89X0tTSdaJSj5Op0SrRvfgjbC2YpJOnXxHr9qaXFbBZQhLjemZi6zRzUNeJ6A3Nzs+gIc4H7s/bYBtcd4ugPEhDeCGffdS3TppH9PnvRXfoa5zj5bsKFgjqjWolCyAmEvd15tXz5yNXtvrpgDhjF5ozPiNp/1EeWX4DxbH2i17drVu4fXwauFZ6lcsAcJxnvCA28RlQlmEQu/gFOx1axVXf6GIuXnQSjQN6qJbByUYrdc/cFCxPO2/lGuUxnufN9Tvb51Qh54laPgGLrlD2huQeSD9Sxa0MNUjNY0qLqaReT99Ygb2LPYGSLoFVx9iZz6sZNt07LqCx9qNgsJwsdmwYsNpMuFbc7nkWjtlEqzsXZHTvYN654p43S+hcAhmmOzQZcew6h71fAJLciiqsPBnCEdgCGFAWhZZdPkMA==

After the change, the entire key fits on a single line, neat!

Autocrypt: addr=anarcat@torproject.org; prefer-encrypt=nopreference;
 keydata=xjMEZHZPzhYJKwYBBAHaRw8BAQdAWdVzOFRW6FYVpeVaDo3sC4aJ2kUW4ukdEZ36UJLAHd7NKUFudG9pbmUgQmVhdXByw6kgPGFuYXJjYXRAdG9ycHJvamVjdC5vcmc+wpUEExYIAD4WIQS7ts1MmNdOE1inUqYCKTpvpOU0cwUCZHZgvwIbAwUJAeEzgAULCQgHAwUVCgkICwUWAgMBAAIeAQIXgAAKCRACKTpvpOU0c47SAPdEqfeHtFDx9UPhElZf7nSM69KyvPWXMocu9Kcu/sw1AQD5QkPzK5oxierims6/KUkIKDHdt8UcNp234V+UdD/ZB844BGR2UM4SCisGAQQBl1UBBQEBB0CYZha2IMY54WFXMG4S9/Smef54Pgon99LJ/hJ885p0ZAMBCAfCdwQYFggAIBYhBLu2zUyY104TWKdSpgIpOm+k5TRzBQJkdlDOAhsMAAoJEAIpOm+k5TRzBg0A+IbcsZhLx6FRIqBJCdfYMo7qovEo+vX0HZsUPRlq4HkBAIctCzmH3WyfOD/aUTeOF3tY+tIGUxxjQLGsNQZeGrQI

Note that I have implemented my own kind of ridiculous Autocrypt support for the Notmuch Emacs email client I use, see this elisp code. To import keys, I pipe the message into this script which is basically just:

sq autocrypt decode   gpg --import

... thanks to Sequoia best-of-class Autocrypt support.

Note on OpenPGP usage While some have claimed OpenPGP's death, I believe those are overstated. Maybe it's just me, but I still use OpenPGP for my password management, to authenticate users and messages, and it's the interface to my YubiKey for authenticating with SSH servers. I understand people feel that OpenPGP is possibly insecure, counter-intuitive and full of problems, but I think most of those problems should instead be attributed to its current flagship implementation, GnuPG. I have tried to work with GnuPG for years, and it keeps surprising me with evilness and oddities. I have high hopes that the Sequoia project can bring some sanity into this space, and I also hope that RFC4880bis can eventually get somewhere so we have a more solid specification with more robust crypto. It's kind of a shame that this has dragged on for so long, but Update: there's a separate draft called openpgp-crypto-refresh that might actually be adopted as the "OpenPGP RFC" soon! And it doesn't keep real work from happening in Sequoia and other implementations. Thunderbird rewrote their OpenPGP implementation with RNP (which was, granted, a bumpy road because it lost compatibility with GnuPG) and Sequoia now has a certificate store with trust management (but still no secret storage), preliminary OpenPGP card support and even a basic GnuPG compatibility layer. I'm also curious to try out the OpenPGP CA capabilities. So maybe it's just because I'm becoming an old fart that doesn't want to change tools, but so far I haven't seen a good incentive in switching away from OpenPGP, and haven't found a good set of tools that completely replace it. Maybe OpenSSH's keys and CA can eventually replace it, but I suspect they will end up rebuilding most of OpenPGP anyway, just more slowly. If they do, let's hope they avoid the mistakes our community has done in the past at least...

1 August 2023

Reproducible Builds: Supporter spotlight: Simon Butler on business adoption of Reproducible Builds

The Reproducible Builds project relies on several projects, supporters and sponsors for financial support, but they are also valued as ambassadors who spread the word about our project and the work that we do. This is the seventh instalment in a series featuring the projects, companies and individuals who support the Reproducible Builds project. We started this series by featuring the Civil Infrastructure Platform project, and followed this up with a post about the Ford Foundation as well as recent ones about ARDC, the Google Open Source Security Team (GOSST), Bootstrappable Builds, the F-Droid project and David A. Wheeler. Today, however, we will be talking with Simon Butler, an associate senior lecturer in the School of Informatics at the University of Sk vde, where he undertakes research in software engineering that focuses on IoT and open source software, and contributes to the teaching of computer science to undergraduates.

Chris: For those who have not heard of it before, can you tell us more about the School of Informatics at Sk vde University? Simon: Certainly, but I may be a little long-winded. Sk vde is a city in the area between the two large lakes in southern Sweden. The city is a busy place. Sk vde is home to the regional hospital, some of Volvo s manufacturing facilities, two regiments of the Swedish defence force, a lot of businesses in the Swedish computer games industry, other tech companies and more.

The University of Sk vde is relatively small. Sweden s large land area and low population density mean that regional centres such as Sk vde are important and local universities support businesses by training new staff and supporting innovation. The School of Informatics has two divisions. One focuses on teaching and researching computer games. The other division encompasses a wider range of teaching and research, including computer science, web development, computer security, network administration, data science and so on.

Chris: You recently had a open-access paper published in Software Quality Journal. Could you tell us a little bit more about it and perhaps briefly summarise its key findings? Simon: The paper is one output of a collaborative research project with six Swedish businesses that use open source software. There are two parts to the paper. The first consists of an analysis of what the group of businesses in the project know about Reproducible Builds (R-Bs), their experiences with R-Bs and their perception of the value of R-Bs to the businesses. The second part is an interview study with business practitioners and others with experience and expertise in R-Bs. We set out to try to understand the extent to which software-intensive businesses were aware of R-Bs, the technical and business reasons they were or were not using R-Bs and to document the business and technical use cases for R-Bs. The key findings were that businesses are aware of R-Bs, and some are using R-Bs as part of their day-to-day development process. Some of the uses for R-Bs we found were not previously documented. We also found that businesses understood the value R-Bs have as part of engineering and software quality processes. They are also aware of the costs of implementing R-Bs and that R-Bs are an intangible value proposition - in other words, businesses can add value through process improvement by using R-Bs. But, that, currently at least, R-Bs are not a selling point for software or products.
Chris: You performed a large number of interviews in order to prepare your paper. What was the most surprising response to you? Simon: Most surprising is a good question. Everybody I spoke to brought something new to my understanding of R-Bs, and many responses surprised me. The interviewees that surprised me most were I01 and I02 (interviews were anonymised and interviewees were assigned numeric identities). I02 described the sceptical perspective that there is a viable, pragmatic alternative to R-Bs - verifiable builds - which I was aware of before undertaking the research. The company had developed a sufficiently robust system for their needs and worked well. With a large archive of software used in production, they couldn t justify the cost of retrofitting a different solution that might only offer small advantages over the existing system. Doesn t really sound too surprising, but the interview was one of the first I did on this topic, and I was very focused on the value of, and need for, trust in a system that motivated the R-B. The solution used by the company requires trust, but they seem to have established sufficient trust for their needs by securing their build systems to the extent that they are more or less tamper-proof. The other big surprise for me was I01 s use of R-Bs to support the verification of system configuration in a system with multiple embedded components at boot time. It s such an obvious application of R-Bs, and exactly the kind of response I hoped to get from interviewees. However, it is another instance of a solution where trust is only one factor. In the first instance, the developer is using R-Bs to establish trust in the toolchain. There is also the second application that the developer can use a set of R-Bs to establish that deployed system consists of compatible components. While this might not sound too significant, there appear to be some important potential applications. One that came to mind immediately is a problem with firmware updates on nodes in IoT systems where the node needs to update quickly with limited downtime and without failure. The node also needs to be able to roll back any update proposed by a server if there are conflicts with the current configuration or if any tests on the node fail. Perhaps the chances of failure could be reduced, if a node can instead negotiate with a server to determine a safe path to migrate from its current configuration to a working configuration with the upgraded components the central system requires? Another potential application appears to be in the configuration management of AI systems, where decisions need to be explainable. A means of specifying validated configurations of training data, models and deployed systems might, perhaps, be leveraged to prevent invalid or broken configurations from being deployed in production.
Chris: One of your findings was that reproducible builds were perceived to be good engineering practice . To what extent do you believe cultural forces affect the adoption or rejection of a given technology or practice? Simon: To a large extent. People s decisions are informed by cultural norms, and business decisions are made by people acting collectively. Of course, decision-making, including assessments of risk and usefulness, is mediated by individual positions on the continuum from conformity to non-conformity, as well as individual and in-group norms. Whether a business will consider a given technology for adoption will depend on cultural forces. The decision to adopt may well be made on the grounds of cost and benefits.
Chris: Another conclusion implied by your research is that businesses are often dealing with software deployment lifespans (eg. 20+ years) that differ from widely from those of the typical hobbyist programmer. To what degree do you think this temporal mismatch is a problem for both groups? Simon: This is a fascinating question. Long-term software maintenance is a requirement in some industries because of the working lifespans of the products and legal requirements to maintain the products for a fixed period. For some other industries, it is less of a problem. Consequently, I would tend to divide developers into those who have been exposed to long-term maintenance problems and those who have not. Although, more professional than hobbyist developers will have been exposed to the problem. Nonetheless, there are areas, such as music software, where there are also long-term maintenance challenges for data formats and software.
Chris: Based on your research, what would you say are the biggest blockers for the adoption of reproducible builds within business ? And, based on this, would you have any advice or recommendations for the broader reproducible builds ecosystem? Simon: From the research, the main blocker appears to be cost. Not an absolute cost, but there is an overhead to introducing R-Bs. Businesses (and thus business managers) need to understand the business case for R-Bs.

Making decision-makers in businesses aware of R-Bs and that they are valuable will take time. Advocacy at multiple levels appears to be the way forward and this is being done. I would recommend being persistent while being patient and to keep talking about reproducible builds. The work done in Linux distributions raises awareness of R-Bs amongst developers. Guix, NixOS and Software Heritage are all providing practical solutions and getting attention - I ve been seeing progressively more mentions of all three during the last couple of years. Increased awareness amongst developers should lead to more interest within companies. There is also research money being assigned to supply chain security and R-B s. The CHAINS project at KTH in Stockholm is one example of a strategic research project. There may be others that I m not aware of. The policy-level advocacy is slowly getting results in some countries, and where CISA leads, others may follow.
Chris: Was there a particular reason you alighted on the question of the adoption of reproducible builds in business? Do you think there s any truth behind the shopworn stereotype of hacker types neglecting the resources that business might be able to offer? Simon: Much of the motivation for the research came from the contrast between the visibility of R-Bs in open source projects and the relative invisibility of R-Bs in industry. Where companies are known to be using R-Bs (e.g. Google, etc.) there is no fuss, no hype. They were not selling R-Bs as a solution; instead the documentation is very matter-of-fact that R-Bs are part of a customer-facing process in their cloud solutions. An obvious question for me was that if some people use R-B s in software development, why doesn t everybody? There are limits to the tooling for some programming languages that mean R-Bs are difficult or impossible. But where creating an R-B is practical, why are they not used more widely? So, to your second question. There is another factor, which seems to be more about a lack of communication rather than neglecting opportunities. Businesses may not always be willing to discuss their development processes and innovations. Though I do think the increasing number of conferences (big and small) for software practitioners is helping to facilitate more communication and greater exchange of ideas.
Chris: Has your personal view of reproducible builds changed since before you embarked on writing this paper? Simon: Absolutely! In the early stages of the research, I was interested in questions of trust and how R-Bs were applied to resolve build and supply chain security problems. As the research developed, however, I started to see there were benefits to the use of R-Bs that were less obvious and that, in some cases, an R-B can have more than a single application.
Chris: Finally, do you have any plans to do future research touching on reproducible builds? Simon: Yes, definitely. There are a set of problems that interest me. One already mentioned is the use of reproducible builds with AI systems. Interpretable or explainable AI (XAI) is a necessity, and I think that R-Bs can be used to support traceability in the configuration and testing of both deployed systems and systems used during model training and evaluation. I would also like to return to a problem discussed briefly in the article, which is to develop a deeper understanding of the elements involved in the application of R-Bs that can be used to support reasoning about existing and potential applications of R-Bs. For example, R-Bs can be used to establish trust for different groups of individuals at different times, say, between remote developers prior to the release of software and by users after release. One question is whether when an R-B is used might be a significant factor. Another group of questions concerns the ways in which trust (of some sort) propagates among users of an R-B. There is an example in the paper of a company that rebuilds Debian reproducibly for security reasons and is then able to collaborate on software projects where software is built reproducibly with other companies that use public distributions of Debian.

Chris: Many thanks for this interview, Simon. If someone wanted to get in touch or learn more about you and your colleagues at the School of Informatics, where might they go? Thank you for the opportunity. It has been a pleasure to reflect a little more widely on the research! Personally, you can find out about my work on my official homepage and on my personal site. The software systems research group (SSRG) has a website, and the University of Sk vde s English language pages are also available. Chris: Many thanks for this interview, Simon!

For more information about the Reproducible Builds project, please see our website at reproducible-builds.org. If you are interested in ensuring the ongoing security of the software that underpins our civilisation and wish to sponsor the Reproducible Builds project, please reach out to the project by emailing contact@reproducible-builds.org.

Next.