Search Results: "christophe"

21 December 2025

Russell Coker: Links December 2025

Russ Allbery wrote an interesting review of Politics on the Edge by Rory Stewart, who seems like one of the few conservative politicians I could respect and possibly even like [1]. It has some good insights about the problems with our current political environment.
The NY Times has an amusing article about the attempt to sell the solution to the CIA's encrypted artwork [2].
Wired has an interesting article about computer face recognition systems failing on people with facial disabilities or scars [3]. This is a major accessibility issue, potentially violating disability legislation, and a demonstration of the problems of fully automating systems when there should be a human in the loop.
The October 2025 report from the Debian Reproducible Builds team is particularly interesting [4]. kpcyrd forwarded a fascinating tidbit regarding so-called ninja and samurai build ordering, which uses data structures in which the pointer values returned from malloc determine some order of execution. LOL.
Louis Rossmann made an insightful YouTube video about the moral case for piracy of software and media [5].
Louis Rossmann made an insightful video about the way that Hyundai is circumventing Right to Repair laws to make repairs needlessly expensive [6]. Korean cars aren't much good nowadays; their prices keep increasing and the quality doesn't.
Brian Krebs wrote an interesting article about how Google is taking legal action against SMS phishing crime groups [7]. We need more of this!
Josh Griffiths wrote an informative blog post about how YouTube is awful [8]. I really should investigate PeerTube.
Louis Rossmann made an informative YouTube video about Right to Repair and the US military; if even the US military is getting ripped off by this, it's a bigger problem than most people realise [9]. He also asks the rhetorical question of whether politicians are bought or whether it's a subscription model.
Brian Krebs wrote an informative article about the US plans to ban TP-Link devices; OpenWRT seems like a good option [10].
Brian Krebs wrote an informative article about free streaming Android TV boxes that act as hidden residential VPN proxies [11]. Also, the free streaming violates copyright law.
Bruce Schneier and Nathan E. Sanders wrote an interesting article about ways that AI is being used to strengthen democracy [12].
Cory Doctorow wrote an insightful article about the incentives for making shitty goods and services and why we need legislation to protect consumers [13].
Linus Tech Tips has an interesting interview with Linus Torvalds [14].
Interesting video about the Kowloon Walled City [15]. It would be nice if a government deliberately created a hive city like that; the only example I know of is the Alaskan town in a single building.
David Brin wrote an insightful set of 3 blog posts about a Democratic American deal that could improve the situation there [16].

31 July 2025

Russell Coker: Links July 2025

Louis Rossmann made an informative YouTube video about Right to Repair and the US military [1]. This is really important as it helps promote free software and open standards.
The ACM has an insightful article about hidden controls [2]. We need EU regulations about hidden controls in safety-critical systems like cars.
This Daily WTF article has some interesting security implications for Windows [3].
Earth.com has an interesting article about the rubber hand illusion and how it works on octopuses [4]. For a long time I have been opposed to eating octopus because I think they are too intelligent.
The Washington Post has an insightful article about the future of spies when everything is tracked by technology [5].
Micah Lee wrote an informative guide to using Signal groups for activism [6].
David Brin wrote an insightful blog post about the phases of the ongoing US civil war [7].
Christian Kastner wrote an interesting blog post about using glibc hardware capabilities to select different builds of a shared library for a range of CPU features [8].
David Brin wrote an insightful and interesting blog post comparing President Carter with the criminals in the Republican party [9].

30 June 2025

Russell Coker: Links June 2025

Jonathan McDowell wrote part 2 of his blog series about setting up a voice assistant on Debian; I look forward to reading further posts [1]. I'm working on some related things for Debian that will hopefully work with this.
I'm testing out OpenSnitch on Trixie, inspired by this blog post; it's an interesting package [2].
Valerie wrote an informative article about creating mesh networks using LoRa for emergency use [3].
Interesting article about Signal and Windows Recall; that gives us some things to consider regarding ML features on Linux systems [4].
Insightful article about AI and the end of prestige [5]. We should all learn about LLMs.
Jonathan Dowland wrote an informative blog post about how to manage namespaces on Linux [6].
The Consumer Rights wiki is a great resource for raising awareness of corporations exploiting their customers for computer-related goods and services [7].
Interesting article about schizophrenia and the cliff-edge function of evolution [8].

24 June 2025

Matthew Garrett: Why is there no consistent single signon API flow?

Single signon is a pretty vital part of modern enterprise security. You have users who need access to a bewildering array of services, and you want to be able to avoid the fallout of one of those services being compromised and your users having to change their passwords everywhere (because they're clearly going to be using the same password everywhere), or you want to be able to enforce some reasonable MFA policy without needing to configure it in 300 different places, or you want to be able to disable all user access in one place when someone leaves the company, or, well, all of the above. There's any number of providers for this, ranging from it being integrated with a more general app service platform (eg, Microsoft or Google) to third party vendors (Okta, Ping, any number of bizarre companies). And, in general, they'll offer a straightforward mechanism to either issue OIDC tokens or manage SAML login flows, requiring users to present whatever set of authentication mechanisms you've configured.

This is largely optimised for web authentication, which doesn't seem like a huge deal - if I'm logging into Workday then being bounced to another site for auth seems entirely reasonable. The problem is when you're trying to gate access to a non-web app, at which point consistency in login flow is usually achieved by spawning a browser and somehow managing submitting the result back to the remote server. And this makes some degree of sense - browsers are where webauthn token support tends to live, and it also ensures the user always has the same experience.

But it works poorly for CLI-based setups. There's basically two options. You can use the device code authorisation flow, where you perform authentication on what is nominally a separate machine to the one requesting it (but in this case is actually the same), and as a result end up with a straightforward mechanism to have your users socially engineered into giving Johnny Badman a valid auth token despite webauthn nominally being unphishable (as described years ago). Or you reduce that risk somewhat by spawning a local server and POSTing the token back to it, which works locally but doesn't work well if you're trying to auth on a remote device. The user experience for both scenarios sucks, and it reduces a bunch of the worthwhile security properties that modern MFA supposedly gives us.
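For illustration, the second option looks roughly like the following minimal Python sketch. The identity provider URL and the query parameters are placeholders, not any real vendor's API; assume the provider redirects back to redirect_uri with the token in the query string.

    import http.server
    import urllib.parse
    import webbrowser

    # Sketch of the "spawn a local server and catch the redirect" flow.
    # idp.example.com and the parameter names are invented placeholders.
    tokens = {}

    class Callback(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            query = urllib.parse.urlparse(self.path).query
            tokens.update(urllib.parse.parse_qs(query))
            self.send_response(200)
            self.end_headers()
            self.wfile.write(b"Authenticated - you can close this tab.")

    server = http.server.HTTPServer(("127.0.0.1", 0), Callback)
    port = server.server_address[1]
    webbrowser.open(f"https://idp.example.com/authorize?redirect_uri=http://127.0.0.1:{port}/")
    server.handle_request()  # blocks until the browser is redirected back

This is also exactly why the approach breaks down over SSH: the browser and the listener no longer share a loopback interface.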

There's a third approach, which is in some ways the obviously good approach and in other ways is obviously a screaming nightmare. All the browser is doing is sending a bunch of requests to a remote service and handling the response locally. Why don't we just do the same? Okta, for instance, has an API for auth. We just need to submit the username and password to that and see what answer comes back. This is great until you enable any kind of MFA, at which point the additional authz step is something that's only supported via the browser. And basically everyone else is the same.

Of course, when we say "That's only supported via the browser", the browser is still just running some code of some form and we can figure out what it's doing and do the same. Which is how you end up scraping constants out of Javascript embedded in the API response in order to submit that data back in the appropriate way. This is all possible but it's incredibly annoying and fragile - the contract with the identity provider is that a browser is pointed at a URL, not that any of the internal implementation remains consistent.

I've done this. I've implemented code to scrape an identity provider's auth responses to extract the webauthn challenges and feed those to a local security token without using a browser. I've also written support for forwarding those challenges over the SSH agent protocol to make this work with remote systems that aren't running a GUI. This week I'm working on doing the same again, because every identity provider does all of this differently.

There's no fundamental reason all of this needs to be custom. It could be a straightforward "POST username and password, receive list of UUIDs describing MFA mechanisms, define how those MFA mechanisms work". That even gives space for custom auth factors (I'm looking at you, Okta Fastpass). But instead I'm left scraping JSON blobs out of Javascript and hoping nobody renames a field, even though I only care about extremely standard MFA mechanisms that shouldn't differ across different identity providers.
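To make that concrete, a client for such a hypothetical standardised flow could be as boring as the sketch below. Every endpoint, field name, and the sign_with_local_token helper are invented for illustration; no existing identity provider exposes this API.

    import requests

    BASE = "https://idp.example.com/api/v1"  # hypothetical provider
    session = requests.Session()

    # Step 1: primary authentication.
    resp = session.post(f"{BASE}/authn",
                        json={"username": "user", "password": "hunter2"})
    resp.raise_for_status()

    # Step 2: a machine-readable list of MFA mechanisms, identified by UUID.
    mechanisms = resp.json()["mfa_mechanisms"]
    webauthn = next(m for m in mechanisms if m["type"] == "webauthn")

    # Step 3: fetch the challenge, satisfy it with a local security token,
    # and post the assertion back. sign_with_local_token() is a stand-in
    # for something like the python-fido2 library talking to a USB key.
    challenge = session.post(f"{BASE}/mfa/{webauthn['id']}/challenge").json()
    assertion = sign_with_local_token(challenge)  # hypothetical helper
    resp = session.post(f"{BASE}/mfa/{webauthn['id']}/complete", json=assertion)
    token = resp.json()["access_token"]

Nothing in that sketch depends on the provider's internal implementation, which is the whole point.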

Someone, please, write a spec for this. Please don't make it be me.


31 May 2025

Russell Coker: Links May 2025

Christopher Biggs gave an informative Everything Open lecture about voice recognition [1]. We need this on Debian phones.
Guido wrote an informative blog post about booting a custom Android kernel on a Pixel 3a [2]. Good work in writing this up, but a pity that Google made the process so difficult.
Interesting to read about an expert being a victim of a phishing attack [3]. It can happen to anyone; everyone has moments when they aren't concentrating.
Interesting advice on how to leak to a journalist [4].
Brian Krebs wrote an informative article about the ways that Trump is deliberately reducing the cyber security of the US government [5].
Brian Krebs wrote an interesting article about the main smishing groups from China [6].
Louis Rossmann (who is known for high quality YouTube videos about computer repair) made an informative video about a scammy Australian company run by a child sex offender [7].
The Helmover was one of the wildest engineering projects of WW2: an over-the-horizon guided torpedo that could one-shot a battleship [8].
Interesting blog post about DDoSecrets and the utter failure of the company TeleMessage, which was used by the US government [9].
Jonathan McDowell wrote an interesting blog post about developing a free software competitor to Alexa etc; the listening hardware costs $13US per node [10].
Noema Magazine published an insightful article about Rewilding the Internet; it has some great ideas [11].

2 May 2025

Daniel Lange: Cleaning a broken GnuPG (gpg) key

I've long said that the main tools in the Open Source security space, OpenSSL and GnuPG (gpg), are broken and only a complete re-write will solve this. And that is still pending as nobody came forward with the funding. It's not a sexy topic, so it has to get really bad before it'll get better. Gpg has a UI that is close to useless. That won't substantially change with more bolted-on improvements. Now Robert J. Hansen and Daniel Kahn Gillmor had somebody add ~50k signatures (read 1, 2, 3, 4 for the gory details) to their keys and - oops - they say that breaks gpg. But does it? I downloaded Robert J. Hansen's key off the SKS-Keyserver network. It's a nice 45MB file when de-ascii-armored (gpg --dearmor broken_key.asc ; mv broken_key.asc.gpg broken_key.gpg). Now a friendly:
$ /usr/bin/time -v gpg --no-default-keyring --keyring ./broken_key.gpg --batch --quiet --edit-key 0x1DCBDC01B44427C7 clean save quit

pub rsa3072/0x1DCBDC01B44427C7
erzeugt: 2015-07-16 verfällt: niemals Nutzung: SC
Vertrauen: unbekannt Gültigkeit: unbekannt
sub ed25519/0xA83CAE94D3DC3873
erzeugt: 2017-04-05 verfällt: niemals Nutzung: S
sub cv25519/0xAA24CC81B8AED08B
erzeugt: 2017-04-05 verfällt: niemals Nutzung: E
sub rsa3072/0xDC0F82625FA6AADE
erzeugt: 2015-07-16 verfällt: niemals Nutzung: E
[ unbekannt ] (1). Robert J. Hansen <rjh@sixdemonbag.org>
[ unbekannt ] (2) Robert J. Hansen <rob@enigmail.net>
[ unbekannt ] (3) Robert J. Hansen <rob@hansen.engineering>

User-ID "Robert J. Hansen <rjh@sixdemonbag.org>": 49705 Signaturen entfernt
User-ID "Robert J. Hansen <rob@enigmail.net>": 49704 Signaturen entfernt
User-ID "Robert J. Hansen <rob@hansen.engineering>": 49701 Signaturen entfernt

pub rsa3072/0x1DCBDC01B44427C7
erzeugt: 2015-07-16 verfällt: niemals Nutzung: SC
Vertrauen: unbekannt Gültigkeit: unbekannt
sub ed25519/0xA83CAE94D3DC3873
erzeugt: 2017-04-05 verfällt: niemals Nutzung: S
sub cv25519/0xAA24CC81B8AED08B
erzeugt: 2017-04-05 verfällt: niemals Nutzung: E
sub rsa3072/0xDC0F82625FA6AADE
erzeugt: 2015-07-16 verfällt: niemals Nutzung: E
[ unbekannt ] (1). Robert J. Hansen <rjh@sixdemonbag.org>
[ unbekannt ] (2) Robert J. Hansen <rob@enigmail.net>
[ unbekannt ] (3) Robert J. Hansen <rob@hansen.engineering>

Command being timed: "gpg --no-default-keyring --keyring ./broken_key.gpg --batch --quiet --edit-key 0x1DCBDC01B44427C7 clean save quit"
User time (seconds): 3911.14
System time (seconds): 2442.87
Percent of CPU this job got: 99%
Elapsed (wall clock) time (h:mm:ss or m:ss): 1:45:56
Average shared text size (kbytes): 0
Average unshared data size (kbytes): 0
Average stack size (kbytes): 0
Average total size (kbytes): 0
Maximum resident set size (kbytes): 107660
Average resident set size (kbytes): 0
Major (requiring I/O) page faults: 1
Minor (reclaiming a frame) page faults: 26630
Voluntary context switches: 43
Involuntary context switches: 59439
Swaps: 0
File system inputs: 112
File system outputs: 48
Socket messages sent: 0
Socket messages received: 0
Signals delivered: 0
Page size (bytes): 4096
Exit status: 0
And the result is a nicely usable 3835-byte file of the clean public key. If you supply a keyring instead of --no-default-keyring it will also keep the non-self signatures that are useful for you (as you apparently know the signing party). So it does not break gpg. It does break things that call gpg synchronously at runtime rather than asynchronously. I heard Enigmail is affected, quelle surprise. Now the main problem here is the runtime. 1h45min is just ridiculous. As Filippo Valsorda puts it:
Someone added a few thousand entries to a list that lets anyone append to it. GnuPG, software supposed to defeat state actors, suddenly takes minutes to process entries. How big is that list you ask? 17 MiB. Not GiB, 17 MiB. Like a large picture. https://dev.gnupg.org/T4592
If I were a gpg / SKS keyserver developer, I'd require that any newly uploaded key carries at least one signature from a previously known strong-set key. That way another key can only be added to the keyserver network if it is already connected to the existing web of trust. Attacking the keyserver network would become at least non-trivial. And the web-of-trust thing may make sense again. Updates: 09.07.2019 GnuPG 2.2.17 has been released with another set of quickly bolted together fixes:
   gpg: Ignore all key-signatures received from keyservers.  This
    change is required to mitigate a DoS due to keys flooded with
    faked key-signatures.  The old behaviour can be achieved by adding
    keyserver-options no-self-sigs-only,no-import-clean
    to your gpg.conf.  [#4607]
   gpg: If an imported keyblocks is too large to be stored in the
    keybox (pubring.kbx) do not error out but fallback to an import
    using the options "self-sigs-only,import-clean".  [#4591]
   gpg: New command --locate-external-key which can be used to
    refresh keys from the Web Key Directory or via other methods
    configured with --auto-key-locate.
   gpg: New import option "self-sigs-only".
   gpg: In --auto-key-retrieve prefer WKD over keyservers.  [#4595]
   dirmngr: Support the "openpgpkey" subdomain feature from
    draft-koch-openpgp-webkey-service-07. [#4590].
   dirmngr: Add an exception for the "openpgpkey" subdomain to the
    CSRF protection.  [#4603]
   dirmngr: Fix endless loop due to http errors 503 and 504.  [#4600]
   dirmngr: Fix TLS bug during redirection of HKP requests.  [#4566]
   gpgconf: Fix a race condition when killing components.  [#4577]
Bug T4607 shows that these changes are all but well thought-out. They introduce artificial limits, like 64kB for WKD-distributed keys or 5MB for local signature imports (Bug T4591), which weaken the web-of-trust further. I recommend not running gpg 2.2.17 in production environments without extensive testing as these limits and the unverified network traffic may bite you. Do validate your upgrade with valid and broken keys that have segments (packet groups) surpassing the above-mentioned limits. You may be surprised what gpg does. On the upside: you can now refresh keys (sans signatures) via WKD. So if your buddies still believe in limiting their subkey validities, you can more easily update them bypassing the SKS keyserver network. NB: I have not tested that functionality. So test before deploying. 10.08.2019: Christopher Wellons (skeeto) has released his pgp-poisoner tool. It is a Go program that can add thousands of malicious signatures to a GnuPG key per second. He comments "[pgp-poisoner is] proof that such attacks are very easy to pull off. It doesn't take a nation-state actor to break the PGP ecosystem, just one person and couple evenings studying RFC 4880. This system is not robust." He also hints at the next likely attack vector: public subkeys can be bound to a primary key of choice.

18 March 2025

Matthew Garrett: Failing upwards: the Twitter encrypted DM failure

Almost two years ago, Twitter launched encrypted direct messages. I wrote about their technical implementation at the time, and to the best of my knowledge nothing has changed. The short story is that the actual encryption primitives used are entirely normal and fine - messages are encrypted using AES, and the AES keys are exchanged via NIST P-256 elliptic curve asymmetric keys. The asymmetric keys are each associated with a specific device or browser owned by a user, so when you send a message to someone you encrypt the AES key with all of their asymmetric keys and then each device or browser can decrypt the message again. As long as the keys are managed appropriately, this is infeasible to break.
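The overall construction is a fairly standard hybrid scheme: a fresh symmetric key per message, wrapped separately for each recipient device key. As a rough sketch using the Python cryptography library (a generic ECIES-style construction for illustration, not Twitter's actual code):

    import os
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import ec
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF

    def encrypt_dm(plaintext, device_public_keys):
        # One fresh AES key per message...
        message_key = AESGCM.generate_key(bit_length=256)
        nonce = os.urandom(12)
        ciphertext = AESGCM(message_key).encrypt(nonce, plaintext, None)
        # ...wrapped separately for every device key of every recipient.
        wrapped = []
        for device_pub in device_public_keys:
            eph = ec.generate_private_key(ec.SECP256R1())  # NIST P-256
            shared = eph.exchange(ec.ECDH(), device_pub)
            kek = HKDF(hashes.SHA256(), 32, None, b"dm-key-wrap").derive(shared)
            wrap_nonce = os.urandom(12)
            wrapped.append((eph.public_key(), wrap_nonce,
                            AESGCM(kek).encrypt(wrap_nonce, message_key, None)))
        return nonce, ciphertext, wrapped

Each recipient device runs the same ECDH against its static private key to recover the wrapping key and unwrap the message key. Note that the device keys are static, which is part of why the forward secrecy problem discussed below exists.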

But how do you know what a user's keys are? I also wrote about this last year - key distribution is a hard problem. In the Twitter DM case, you ask Twitter's server, and if Twitter wants to intercept your messages they replace your key. The documentation for the feature basically admits this - if people with guns showed up there, they could very much compromise the protection in such a way that all future messages you sent were readable. It's also impossible to prove that they're not already doing this without every user verifying that the public keys Twitter hands out to other users correspond to the private keys they hold, something that Twitter provides no mechanism to do.

This isn't the only weakness in the implementation. Twitter may not be able to read the messages, but every encrypted DM is sent through exactly the same infrastructure as the unencrypted ones, so Twitter can see the time a message was sent, who it was sent to, and roughly how big it was. And because pictures and other attachments in Twitter DMs aren't sent in-line but are instead replaced with links, the implementation would encrypt the links but not the attachments - this is "solved" by simply blocking attachments in encrypted DMs. There's no forward secrecy - if a key is compromised it allows access not only to all new messages created with that key, but also to all previous messages. If you log out of Twitter the keys are still stored by the browser, so they can potentially be extracted and used to decrypt your communications. And there's no group chat support at all, which is more a functional restriction than a conceptual one.

To be fair, these are hard problems to solve! Signal solves all of them, but Signal is the product of a large number of highly skilled experts in cryptography, and even so it's taken years to achieve all of this. When Elon announced the launch of encrypted DMs he indicated that new features would be developed quickly - he's since publicly mentioned the feature a grand total of once, promising further feature development that just didn't happen. None of the limitations mentioned in the documentation have been addressed in the 22 months since the feature was launched.

Why? Well, it turns out that the feature was developed by a total of two engineers, neither of whom is still employed at Twitter. The tech lead for the feature was Christopher Stanley, who was actually a SpaceX employee at the time. Since then he's ended up at DOGE, where he apparently set off alarms when attempting to install Starlink, and who today is apparently being appointed to the board of Fannie Mae, a government-backed mortgage company.

Anyway. Use Signal.


26 September 2024

Melissa Wen: Reflections on 2024 Linux Display Next Hackfest

Hey everyone! The 2024 Linux Display Next hackfest concluded in May, and its outcomes continue to shape the Linux Display stack. Igalia hosted this year's event in A Coruña, Spain, bringing together leading experts in the field. Samuel Iglesias and I organized this year's edition and this blog post summarizes the experience and its fruits. One of the highlights of this year's hackfest was the wide range of backgrounds represented by our 40 participants (both on-site and remote). Developers and experts from various companies and open-source projects came together to advance the Linux Display ecosystem. You can find the list of participants here. The event covered a broad spectrum of topics affecting the development of Linux projects, user experiences, and the future of display technologies on Linux. From cutting-edge topics to long-term discussions, you can check the event agenda here.

Organization Highlights The hackfest was marked by in-depth discussions and knowledge sharing among Linux contributors, leaving everyone inspired, informed, and connected to the community. Building on feedback from the previous year, we refined the unconference format to enhance participant preparation and engagement.
  • Structured Agenda and Timeboxes: Each session had a defined scope, a time limit (1h20 or 2h10), and began with an introductory talk on the topic.
  • Participant-Led Discussions: We pre-selected in-person participants to lead discussions, allowing them to prepare introductions, resources, and scope.
  • Transparent Scheduling: The schedule was shared in advance as GitHub issues, encouraging participants to review and prepare for sessions of interest.
Engaging Sessions: The hackfest featured a variety of topics, including presentations and discussions on how participants were addressing specific subjects within their companies.
  • No Breakout Rooms, No Overlaps: All participants chose to attend all sessions, eliminating the need for separate breakout rooms. We also adapted the run-time schedule to keep everybody involved in the same topics.
  • Real-time Updates: We provided notifications and updates through dedicated emails and the event matrix room.
Strengthening Community Connections: The hackfest offered ample opportunities for networking among attendees.
  • Social Events: Igalia sponsored coffee breaks, lunches, and a dinner at a local restaurant.
  • Museum Visit: Participants enjoyed a sponsored visit to the Museum of Estrela Galicia Beer (MEGA).

Fruitful Discussions and Follow-up The structured agenda and breaks allowed us to cover multiple topics during the hackfest. These discussions have led to new display feature development and improvements, as evidenced by patches, merge requests, and implementations in project repositories and mailing lists. With the KMS color management API taking shape, we discussed refinements and the best approaches to cover the variety of color pipelines from different hardware vendors. We are also investigating techniques for performant SDR<->HDR content reproduction and for reducing latency and power consumption when using the color blocks of the hardware.

Color Management/HDR Color Management and HDR continued to be the hottest topic of the hackfest. We had three sessions dedicated to discussing Color and HDR across the Linux Display stack layers.

Color/HDR (Kernel-Level) Harry Wentland (AMD) led this session. Here, kernel developers shared the color management pipelines of AMD, Intel and NVidia. We had diagrams and explanations from hardware vendors' developers, who discussed differences, constraints and paths to fit them into the KMS generic color management properties, such as advertising modeset needs, IN_FORMAT, segmented LUTs, interpolation types, etc. Developers from Qualcomm and ARM also added information regarding their hardware. Upstream work related to this session:

Color/HDR (Compositor-Level) Sebastian Wick (RedHat) led this session. It started with Sebastian's presentation covering Wayland color protocols and compositor implementation, followed by an explanation of the APIs provided by Wayland and how they can be used to achieve better color management for applications, and discussions around ICC profiles and color representation metadata. There was also an intensive Q&A about LittleCMS with Marti Maria. Upstream work related to this session:

Color/HDR (Use Cases and Testing) Christopher Cameron (Google) and Melissa Wen (Igalia) led this session. In contrast to the other sessions, here we focused less on implementation and more on brainstorming and reflections on real-world SDR and HDR transformations (use and validation) and gainmaps. Christopher gave a nice presentation explaining HDR gainmap images and how we should think of HDR. This presentation and Q&A were important to put participants on the same page about how to transition between SDR and HDR and how to emulate HDR. We also discussed the usage of a kernel background color property. Finally, we discussed a bit about Chamelium and the future of VKMS (future work and maintainership).

Power Savings vs Color/Latency Mario Limonciello (AMD) led this session. Mario gave an introductory presentation about AMD ABM (adaptive backlight management), which is similar to Intel DPST. After some discussions, we agreed on exposing a kernel property for power saving policy. This work has already been merged in the kernel and the userspace support is under development. Upstream work related to this session:

Strategy for video and gaming use-cases Leo Li (AMD) led this session. Miguel Casas (Google) started this session with a presentation on overlays in Chrome/OS video, explaining the main goal of saving power by switching off the GPU for accelerated compositing and the challenges of different colorspaces/HDR for video on Linux. Then Leo Li presented different strategies for video and gaming, and we discussed userspace's need for more detailed feedback mechanisms to understand failures when offloading. Creating a debugfs interface also came up as a tool for debugging and analysis.

Real-time scheduling and async KMS API Xaver Hugl (KDE/BlueSystems) led this session. Compositor developers have exposed some issues with doing real-time scheduling and async page flips. One is that the kernel limits the lifetime of realtime threads, and if a modeset takes too long, the thread will be killed and thus the compositor as well. Also, simple page flips take longer than expected and drivers should optimize them. Another issue is the lack of feedback to compositors about hardware programming time and commit deadlines (the latest possible time to commit). This is difficult to predict from drivers, since it varies greatly with the type of properties; for example, color management updates take much longer. In this regard, we discussed implementing a hw_done callback to timestamp when the hardware programming of the last atomic commit is complete, as well as an API to pre-program the color pipeline in a kind of A/B scheme. It may not be supported by all drivers, but might be useful in different ways.

VRR/Frame Limit, Display Mux, Display Control, and more... and beer We also had sessions to discuss a new KMS API to mitigate headaches on VRR and Frame Limit, such as different brightness levels at different refresh rates, abrupt changes of refresh rates, low frame rate compensation (LFC), precise timing in VRR, and more. On Display Control we discussed features missing in the current KMS interface for HDR mode, atomic backlight settings, source-based tone mapping, etc. We also discussed the need for a place where compositor developers can post TODOs to be developed by KMS people. The Content-adaptive Scaling and Sharpening session focused on sharpening and scaling filters. In the Display Mux session, we discussed proposals to expose the capability of dynamically mux-switching the display signal between discrete and integrated GPUs. In the last session of the 2024 Display Next Hackfest, participants representing different compositors summarized current and future work and built a Linux Display wish list, which includes: improvements to VTTY and HDR switching, a better dmabuf API for multi-GPU support, definition of tone mapping, blending and scaling semantics, and Wayland protocols for advertising to clients which colorspaces are supported. We closed this session with a status update on feature development by compositors, including but not limited to: plane offloading (from libcamera to output) / HDR video offloading (dma-heaps) / plane-based scrolling for web pages, color management / HDR / ICC profiles support, and addressing issues such as flickering when color primaries don't match. After three days of intensive discussions, all in-person participants went on a guided tour of the Museum of Estrela Galicia Beer (MEGA), pouring and tasting the most famous local beer.

Feedback and Future Directions Participants provided valuable feedback on the hackfest, including suggestions for future improvements.
  • Schedule and Break-time Setup: Having a pre-defined agenda and schedule provided a better balance between long discussions and mental refreshments, preventing the fatigue caused by endless discussions.
  • Action Points: Some participants recommended explicitly asking for action points at the end of each session and assigning people to follow-up tasks.
  • Remote Participation: Remote attendees appreciated the inclusive setup and opportunities to actively participate in discussions.
  • Technical Challenges: There were bandwidth and video streaming issues during some sessions due to the large number of participants.

Thank you for joining the 2024 Display Next Hackfest We can't help but thank the 40 participants, who engaged in-person or virtually in relevant discussions, for a collaborative evolution of the Linux display stack and for building an insightful agenda. A big thank you to the leaders and presenters of the nine sessions: Christopher Cameron (Google), Harry Wentland (AMD), Leo Li (AMD), Mario Limonciello (AMD), Sebastian Wick (RedHat) and Xaver Hugl (KDE/BlueSystems) for the effort in preparing the sessions, explaining the topics and guiding discussions. My acknowledgment to the other in-person participants who made such an effort to travel to A Coruña: Alex Goins (NVIDIA), David Turner (Raspberry Pi), Georges Stavracas (Igalia), Joan Torres (SUSE), Liviu Dudau (Arm), Louis Chauvet (Bootlin), Robert Mader (Collabora), Tian Mengge (GravityXR), Victor Jaquez (Igalia) and Victoria Brekenfeld (System76). It was an awesome opportunity to meet you and chat face-to-face. Finally, thanks to the virtual participants who couldn't make it in person but organized their days to actively participate in each discussion, adding different perspectives and valuable inputs even remotely: Abhinav Kumar (Qualcomm), Chaitanya Borah (Intel), Christopher Braga (Qualcomm), Dor Askayo (Red Hat), Jiri Koten (RedHat), Jonas Ådahl (Red Hat), Leandro Ribeiro (Collabora), Marti Maria (Little CMS), Marijn Suijten, Mario Kleiner, Martin Stransky (Red Hat), Michel Dänzer (Red Hat), Miguel Casas-Sanchez (Google), Mitulkumar Golani (Intel), Naveen Kumar (Intel), Niels De Graef (Red Hat), Pekka Paalanen (Collabora), Pichika Uday Kiran (AMD), Shashank Sharma (AMD), Sriharsha PV (AMD), Simon Ser, Uma Shankar (Intel) and Vikas Korjani (AMD). We look forward to another successful Display Next hackfest, continuing to drive innovation and improvement in the Linux display ecosystem!

21 July 2024

Mike Gabriel: Polis - a FLOSS Tool for Civic Participation -- Issues extending Polis and adjusting our Goals

Here comes the 3rd article of the 5-episode blog post series on Polis, written by Guido Berhörster, member of staff at my company Fre(i)e Software GmbH. Enjoy also this read on Guido's work on Polis,
Mike
Table of Contents of the Blog Post Series
  1. Introduction
  2. Initial evaluation and adaptation
  3. Issues extending Polis and adjusting our goals (this article)
  4. Creating (a) new frontend(s) for Polis
  5. Current status and roadmap
Polis - Issues extending Polis and adjusting our Goals After the initial implementation of limited branding support, user feedback and the involvement of a UX designer led to the conclusion that we needed more far-reaching changes to the user interface in order to reduce visual clutter, rearrange and improve UI elements, and provide better integration with the websites in which conversations are embedded. Challenges when visualizing Data in Polis Polis visualizes groups using a spatial projection of users based on similarities in voting behavior and places them in two to five groups using a clustering algorithm. During our testing and evaluation, users were rarely able to interpret the visualization and often intuitively made incorrect assumptions, e.g. by associating the filled area of a group with its significance or size. After consultation with a member of the Multi-Agent Systems (MAS) Group at the University of Groningen we chose to temporarily replace the visualization offered by Polis with simple bar charts representing agreement or disagreement with statements of a group or the majority. We intend to revisit this and explore different forms of visualization at a later point in time. The different factors playing into the weight attached to statements, which determines the pseudo-random order in which they are presented for voting ("comment routing"), proved difficult to explain to stakeholders and users, and the Polis authors' admission of the ad-hoc and heuristic nature of the algorithm used1 led to the decision to temporarily remove this feature. Instead, statements should be placed into three groups, namely
  1. metadata questions,
  2. seed statements,
  3. and participant statements
Statements should then be sorted by group but in a fully randomized order within the group, so that metadata questions would be presented before seed statements, which would in turn be presented before participants' statements (see the sketch below). This simpler method was deemed sufficient for the scale of our pilot projects; however, we intend to revisit this decision and explore different methods of comment routing in cooperation with our scientific partners at a later point in time. An evaluation of the requirements for implementing mandatory authentication and adding support for additional authentication methods to Polis showed that significant changes to both the administration and participation frontend were needed due to the lack of an abstraction layer or extension mechanism and the current authentication providers being hardcoded in many parts of the code base. A New Frontend is born: Particiapp Based on the implementation details of the participation frontend, the invasive nature of the changes required, and the overhead of keeping up with active upstream development, it became clear that a different, more flexible approach to development was needed. This ultimately led to the creation of Particiapp, a new Open Source project providing the building blocks and necessary abstraction layers for rapid prototyping and experimentation with different frontends which are compatible with but independent from Polis.
  1. Small, Christopher T., Bjorkegren, Michael, Erkkilä, Timo, Shaw, Lynette and Megill, Colin (2021). Polis: Scaling deliberation by mapping high dimensional opinion spaces. Recerca. Revista de Pensament i Anàlisi, 26(2), pp. 1-26.
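A minimal sketch of the simplified ordering described above (the statement kinds and field names are illustrative, not Polis or Particiapp code):

    import random

    # Bucket statements by kind, shuffle within each bucket, and present
    # the buckets in a fixed order: metadata, then seed, then participant.
    ORDER = ["metadata", "seed", "participant"]

    def order_statements(statements):
        buckets = {kind: [] for kind in ORDER}
        for statement in statements:
            buckets[statement["kind"]].append(statement)
        result = []
        for kind in ORDER:
            group = list(buckets[kind])
            random.shuffle(group)
            result.extend(group)
        return result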

30 November 2023

Bits from Debian: New Debian Developers and Maintainers (September and October 2023)

The following contributors got their Debian Developer accounts in the last two months: The following contributors were added as Debian Maintainers in the last two months: Congratulations!

8 February 2023

Chris Lamb: Most anticipated films of 2023

Very few highly-anticipated movies appear in January and February, as the bigger releases are timed so they can be considered for the Golden Globes in January and the Oscars in late February or early March, so film fans have the advantage of a few weeks after the New Year to collect their thoughts on the year ahead. In other words, I'm not actually late in outlining below the films I'm most looking forward to in 2023...

Barbie No, seriously! If anyone can make a good film about a doll franchise, it's probably Greta Gerwig. Not only was Little Women (2019) more than admirable, the same could be definitely said for Lady Bird (2017). More importantly, I can't help feel she was the real 'Driver' behind Frances Ha (2012), one of the better modern takes on Claudia Weill's revelatory Girlfriends (1978). Still, whenever I remember that Barbie will be a film about a billion-dollar toy and media franchise with a nettlesome history, I recall I rubbished the "Facebook film" that turned into The Social Network (2010). Anyway, the trailer for Barbie is worth watching, if only because it seems like a parody of itself.

Blitz It's difficult to overstate just how crucial the aerial bombing of London during World War II is to understanding the British psyche, despite it being a constructed phenomenon from the outset. Without wishing to underplay the deaths of over 40,000 civilians, Angus Calder pointed out in the 1990s that the modern mythology surrounding the event "did not evolve spontaneously; it was a propaganda construct directed as much at [then neutral] American opinion as at British." It will therefore be interesting to see how British Grenadian Trinidadian director Steve McQueen addresses a topic so essential to the British self-conception. (Remember the controversy in right-wing circles about the sole Indian soldier in Christopher Nolan's Dunkirk (2017)?) McQueen is perhaps best known for his 12 Years a Slave (2013), but he recently directed a six-part film anthology for the BBC which addressed the realities of post-Empire immigration to Britain, and this leads me to suspect he sees the Blitz and its surrounding mythology from a more critical perspective. But any attempt to complicate the story of World War II will be vigorously opposed in a way that will make the recent hullabaloo surrounding The Crown seem tame. All this is to say that the discourse surrounding this release may be as interesting as the film itself.

Dune, Part II Coming out of the cinema after the first part of Denis Villeneuve's adaptation of Dune (2021), I was struck by the conception that it was less a fresh adaptation of the 1965 novel by Frank Herbert than an attempt to rehabilitate David Lynch's 1984 version; in a broader sense, it was also an attempt to reestablish the primacy of cinema over streaming TV and the myriad of other distractions in our lives. I must admit I'm not a huge fan of the original novel, finding within it a certain prurience regarding hereditary military regimes and writing about them with a certain sense of glee that belies a secret admiration for them... not to mention an eyebrow-raising allegory for the Middle East. Still, Dune, Part II is going to be a fantastic spectacle.

Ferrari It'll be curious to see how this differs substantially from the recent Ford v Ferrari (2019), but given that Michael Mann's Heat (1995) so effectively re-energised the gangster/heist genre, I'm more than willing to kick the tires of this film about the founder of the eponymous car manufacturer. I'm in the minority for preferring Mann's Thief (1981) over Heat, in part because the former deals in more abstract themes, so I'd have perhaps preferred to look forward to a more conceptual film from Mann over a story about one specific guy.

How Do You Live There are a few directors one can look forward to watching almost without qualification, and Hayao Miyazaki (My Neighbor Totoro, Kiki's Delivery Service, Princess Mononoke, Howl's Moving Castle, etc.) is one of them. And this is especially so given that The Wind Rises (2013) was meant to be the last collaboration between Miyazaki and Studio Ghibli. Let's hope he is able to come out of retirement in another ten years.

Indiana Jones and the Dial of Destiny Given I had a strong dislike of Indiana Jones and the Kingdom of the Crystal Skull (2008), I seriously doubt I will enjoy anything this film has to show me, but with 1981's Raiders of the Lost Ark remaining one of my most treasured films (read my brief homage), I still feel a strong sense of obligation towards the Indiana Jones name, despite it feeling like the copper is being pulled out of the walls of this franchise today.

Kafka I only know Polish filmmaker Agnieszka Holland through her Spoor (2017), an adaptation of Olga Tokarczuk's 2009 eco-crime novel Drive Your Plow Over the Bones of the Dead. I wasn't an unqualified fan of Spoor (nor the book on which it is based), but I am interested in Holland's take on the life of Czech author Franz Kafka, an author enmeshed with twentieth-century art and philosophy, especially that of central Europe. Holland has mentioned she intends to tell the story "as a kind of collage," and I can hope that it is an adventurous take on the over-furrowed biopic genre. Or perhaps Gregor Samsa will awake from uneasy dreams to find himself transformed in his bed into a huge verminous biopic.

The Killer It'll be interesting to see what path David Fincher is taking today, especially after his puzzling and strangely cold Mank (2020) portraying the writing process behind Orson Welles' Citizen Kane (1941). The Killer is said to be a straight-to-Netflix thriller based on the graphic novel about a hired assassin, which makes me think of Fincher's Zodiac (2007), and, of course, Se7en (1995). I'm not as entranced by Fincher as I used to be, but any film with Michael Fassbender and Tilda Swinton (with a score by Trent Reznor) is always going to get my attention.

Killers of the Flower Moon In Killers of the Flower Moon, Martin Scorsese directs an adaptation of a book about the FBI's investigation into a conspiracy to murder Osage tribe members in the early years of the twentieth century in order to deprive them of their oil-rich land. (The only thing more quintessentially American than apple pie is a conspiracy combined with a genocide.) Separate from learning more about this disquieting chapter of American history, I'd love to discover what attracted Scorsese to this particular story: he's one of the few top-level directors who have the ability to lucidly articulate their intentions and motivations.

Napoleon It often strikes me that, despite all of his achievements and fame, it's somehow still possible to claim that Ridley Scott is relatively underrated compared to other directors working at the top level today. Besides that, though, I'm especially interested in this film, not least of all because I just read Tolstoy's War and Peace (read my recent review) and am working my way through the mind-boggling 431-minute Soviet TV adaptation, but also because several auteur filmmakers (including Stanley Kubrick) have tried to make a Napoleon epic and failed.

Oppenheimer In a way, a biopic about the scientist responsible for the atomic bomb and the Manhattan Project seems almost perfect material for Christopher Nolan. He can certainly rely on stars to queue up to be in his movies (Robert Downey Jr., Matt Damon, Kenneth Branagh, etc.), but whilst I'm certain it will be entertaining on many fronts, I fear it will fall into the well-established Nolan mould of yet another single man struggling with obsession, deception and guilt who is trying in vain to balance order and chaos in the world.

The Way of the Wind Marked by philosophical and spiritual overtones, all of Terrence Malick's films are perfumed with themes of transcendence, nature and the inevitable conflict between instinct and reason. My particular favourite is his stunning Days of Heaven (1978), but The Thin Red Line (1998) and A Hidden Life (2019) also touched me in ways difficult to relate, and are among the few films about the Second World War that don't touch off my sensitivity about them (see my remarks about Blitz above). It is therefore somewhat Malickian that his next film will be a biblical drama about the life of Jesus. Given Malick's filmography, I suspect this will be far more subdued than William Wyler's 1959 Ben-Hur and significantly more equivocal in its conviction compared to Pier Paolo Pasolini's ardently progressive The Gospel According to St. Matthew (1964). However, little beyond that can be guessed, and the film may not even appear until 2024 or even 2025.

Zone of Interest I was mesmerised by Jonathan Glazer's Under the Skin (2013), and there is much to admire in his borderline 'revisionist gangster' film Sexy Beast (2000), so I will definitely be on the lookout for this one. The only thing making me hesitate is that Zone of Interest is based on a book by Martin Amis about a romance set inside the Auschwitz concentration camp. I haven't read the book, but Amis has something of a history in his grappling with the history of the twentieth century, and he seems to do it in a way that never sits right with me. But if Paul Verhoeven's Starship Troopers (1997) proves anything at all, it's all in the adaptation.

5 February 2023

Jonathan Dowland: 2022 in reading

In 2022 I read 34 books (-19% on last year). In 2021 roughly a quarter of the books I read were written by women. I was determined to push that ratio in 2022, so I made an effort to try and only read books by women. I knew that I wouldn't manage that, but by trying to, I did get the ratio up to 58% (by page count). I'm not sure what will happen in 2023. My to-read pile has some back-pressure from books by male authors I postponed reading in 2022 (in particular new works by Christopher Priest and Adam Roberts). It's possible the ratio will swing back the other way, which would mean it would not be worth repeating the experiment. At least if the ratio is the point of the exercise. But perhaps it isn't: perhaps the useful outcome is more qualitative than quantitative. I tried to read some new (to me) authors. I really enjoyed Shirley Jackson (The Haunting of Hill House, We Have Always Lived In The Castle). I struggled with Angela Carter's Heroes and Villains, although I plan to return to her other work, in particular The Bloody Chamber. I also got through Donna Tartt's The Secret History on the recommendation of a friend. I had to push through the first 15% or so but it turned out to be worth it.
a book cover for Shirley Jackson's 'We have always lived in the castle'
a book cover for Margaret Atwood's 'The Handmaid's Tale'
a book cover for Adam Roberts' 'The This'
a book cover for Emily St. John Mandel's 'Sea of Tranquility'

I finally read (and loved) The Handmaid's Tale, which I had never read despite loving Atwood. My top non-fiction book was The Nanny State Made Me by Stuart Maconie. I still read far more fiction than non-fiction. Or perhaps I'm not keeping track of non-fiction as well. I feel non-fiction requires a different approach to reading: not necessarily linear; it's not always important to read the whole book; it's often important to re-read sections. It might not make sense to consider them in the same bracket. My favourite novels this year were Sea of Tranquility by Emily St. John Mandel, a standalone sort-of sequel to The Glass Hotel but in a very different genre; and The This by Adam Roberts, which was equally remarkable. The This has an interesting narrative device in the first third where a stream of tweets is presented in parallel with the main text. This works well, and does a good job of capturing the figurative river of tweet-like stuff that is woven into our lives at the moment. But I can't help but wonder how they tackle that in the audiobook.

8 December 2022

Reproducible Builds: Reproducible Builds in November 2022

Welcome to yet another report from the Reproducible Builds project, this time for November 2022. In all of these reports (which we have been publishing regularly since May 2015) we attempt to outline the most important things that we have been up to over the past month. As always, if you are interested in contributing to the project, please visit our Contribute page on our website.

Reproducible Builds Summit 2022 Following up from last month's report about our recent summit in Venice, Italy: a comprehensive report from the meeting has not been finalised yet, so watch this space! As a very small preview, however, we can link to several issues that were filed about the website during the summit (#38, #39, #40, #41, #42, #43, etc.). We also collectively learned about Software Bills of Materials (SBOMs) and how .buildinfo files can be seen and used as SBOMs. And, no less importantly, the Reproducible Builds t-shirt design has been updated.

Reproducible Builds at European Cyber Week 2022 During the European Cyber Week 2022, a Capture The Flag (CTF) cybersecurity challenge was created by Frédéric Pierret on the subject of Reproducible Builds. The challenge was pedagogical in nature, based on making a software release reproducible. To progress through the challenge, issues that affect the reproducibility of a build (such as build paths, timestamps, file ordering, etc.) had to be fixed in steps in order to obtain the final flag and win the challenge. At the end of the competition, five people succeeded in solving the challenge, all of whom were awarded a shirt. Frédéric Pierret intends to create a similar challenge in the form of a how-to in the Reproducible Builds documentation; two of the 2022 winners are shown here.

On business adoption and use of reproducible builds Simon Butler announced on the rb-general mailing list that the Software Quality Journal published an article called On business adoption and use of reproducible builds for open and closed source software. This article is an interview-based study of the adoption and use of Reproducible Builds in industry, investigating the reasons why organisations might not have adopted them:
[ ] industry application of R-Bs appears limited, and we seek to understand whether awareness is low or if significant technical and business reasons prevent wider adoption.
This is achieved through interviews with software practitioners and business managers, and touches on both the business and technical reasons supporting the adoption (or not) of Reproducible Builds. The article also begins with an excellent explanation and literature review, and even introduces a new helpful analogy for reproducible builds:
[Users are] able to perform a bitwise comparison of the two binaries to verify that they are identical and that the distributed binary is indeed built from the source code in the way the provider claims. Applied in this manner, R-Bs function as a canary, a mechanism that indicates when something might be wrong, and offer an improvement in security over running unverified binaries on computer systems.
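In concrete terms, the bitwise comparison described in the quote can be as simple as comparing cryptographic digests of the two artefacts. Here is a minimal sketch of the idea in Python; the file names are hypothetical:

import hashlib

def digest(path):
    # Hash the entire artefact; identical builds yield identical digests.
    with open(path, "rb") as f:
        return hashlib.sha256(f.read()).hexdigest()

# "vendor.bin" is the distributed binary; "rebuilt.bin" is our local rebuild.
if digest("vendor.bin") == digest("rebuilt.bin"):
    print("bit-for-bit identical: the canary is alive")
else:
    print("mismatch: do not trust the distributed binary without investigating")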
The full paper is available to download on an open access basis. Elsewhere in academia, Beatriz Michelson Reichert and Rafael R. Obelheiro have published a paper proposing a systematic threat model for a generic software development pipeline, identifying possible mitigations for each threat (PDF). Under the Tampering rubric of their paper, they discuss various attacks against Continuous Integration (CI) processes:
An attacker may insert a backdoor into a CI or build tool and thus introduce vulnerabilities into the software (resulting in an improper build). To avoid this threat, it is the developer's responsibility to take due care when making use of third-party build tools. Tampered compilers can be mitigated using diversity, as in the diverse double compiling (DDC) technique. Reproducible builds, a recent research topic, can also provide mitigation for this problem. (PDF)

Misc news
On our mailing list this month:

Debian & other Linux distributions Over 50 reviews of Debian packages were added this month, another 48 were updated and almost 30 were removed, all of which adds to our knowledge about identified issues. Two new issue types were added as well. [ ][ ]. Vagrant Cascadian announced on our mailing list another online sprint to help clear the huge backlog of reproducible builds patches submitted, by performing NMUs (Non-Maintainer Uploads). The first such sprint took place on September 22nd, with others held on October 6th and October 20th. Two additional sprints occurred in November, resulting in the following progress: Lastly, Roland Clobus posted his latest update on the status of reproducible Debian ISO images to our mailing list. This reports that all major desktops build reproducibly with bullseye, bookworm and sid, and that no custom patches needed to be applied to Debian unstable for this result to occur. During November, however, Roland proposed some modifications to live-setup and the rebuild script has been adjusted to fix the failing Jenkins tests for Debian bullseye [ ][ ].
In other news, Miro Hrončok proposed a change to clamp build modification times to the value of SOURCE_DATE_EPOCH. This was initially suggested and discussed on a devel@ mailing list post but was later written up on the Fedora Wiki as well as being officially proposed to the Fedora Engineering Steering Committee (FESCo).
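To illustrate what clamping means in practice, here is a minimal Python sketch of the idea (not the actual Fedora implementation): any file modified after SOURCE_DATE_EPOCH has its timestamp rewound to that value, while older timestamps are left alone.

import os
import sys

# SOURCE_DATE_EPOCH is the standard environment variable holding the
# timestamp (in seconds) of the last source modification.
epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "0"))

for root, _dirs, files in os.walk(sys.argv[1]):
    for name in files:
        path = os.path.join(root, name)
        st = os.stat(path)
        if st.st_mtime > epoch:
            # Clamp: cap newer mtimes at the epoch, keep older ones as-is.
            os.utime(path, (st.st_atime, epoch))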

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 226 and 227 to Debian:
  • Support both python3-progressbar and python3-progressbar2, two modules providing the progressbar Python module. [ ]
  • Don't run Python decompiling tests on Python bytecode that file(1) cannot detect yet and Python 3.11 cannot unmarshal. (#1024335)
  • Don't attempt to attach text-only differences notice if there are no differences to begin with. (#1024171)
  • Make sure we recommend apksigcopier. [ ]
  • Tidy generation of os_list. [ ]
  • Make the code clearer around generating the Debian "substvars". [ ]
  • Use our assert_diff helper in test_lzip.py. [ ]
  • Drop other copyright notices from lzip.py and test_lzip.py. [ ]
In addition to this, Christopher Baines added lzip support [ ], and FC Stegerman added an optimisation whereby we don't run apktool if no differences are detected before the signing block [ ].
A significant number of changes were made to the Reproducible Builds website and documentation this month, including Chris Lamb ensuring the openEuler logo is correctly visible with a white background [ ], FC Stegerman de-duplicating contributors by email address to avoid listing some of them twice [ ], Hervé Boutemy adding Apache Maven to the list of affiliated projects [ ] and boyska updating our Contribute page to remark that the Reproducible Builds presence on salsa.debian.org is not just the Git repository but is also for creating issues [ ][ ]. In addition to all this, however, Holger Levsen made the following changes:
  • Add a number of existing publications [ ][ ] and update metadata for some existing publications as well [ ].
  • Hide draft posts on the website homepage. [ ]
  • Add the Warpforge build tool as a participating project of the summit. [ ]
  • Clarify in the footer that we welcome patches to the website repository. [ ]

Testing framework The Reproducible Builds project operates a comprehensive testing framework at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In November, the following changes were made by Holger Levsen:
  • Improve the generation of meta package sets (used in grouping packages for reporting/statistical purposes) to treat Debian bookworm as equivalent to Debian unstable in this specific case [ ] and to parse the list of packages used in the Debian cloud images [ ][ ][ ].
  • Temporarily allow Frederic to ssh(1) into our snapshot server as the jenkins user. [ ]
  • Keep some reproducible jobs' Jenkins logs much longer [ ] (later reverted).
  • Improve the node health checks to detect failures to update the Debian cloud image package set [ ][ ] and to improve prioritisation of some kernel warnings [ ].
  • Always echo any IRC output to Jenkins output as well. [ ]
  • Deal gracefully with problems related to processing the cloud image package set. [ ]
Finally, Roland Clobus continued his work on testing Live Debian images, including adding support for specifying the origin of the Debian installer [ ] and warning when the image has unmet dependencies in the package list (e.g. due to a transition) [ ].
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can get in touch with us via:

30 November 2022

Matthew Garrett: Making unphishable 2FA phishable

One of the huge benefits of WebAuthn is that it makes traditional phishing attacks impossible. An attacker sends you a link to a site that looks legitimate but isn't, and you type in your credentials. With SMS or TOTP-based 2FA, you type in your second factor as well, and the attacker now has both your credentials and a legitimate (if time-limited) second factor token to log in with. WebAuthn prevents this by verifying that the site it's sending the secret to is the one that issued it in the first place - visit an attacker-controlled site and said attacker may get your username and password, but they won't be able to obtain a valid WebAuthn response.

But what if there was a mechanism for an attacker to direct a user to a legitimate login page, resulting in a happy WebAuthn flow, and obtain valid credentials for that user anyway? This seems like the lead-in to someone saying "The Aristocrats", but unfortunately it's (a) real, (b) RFC-defined, and (c) implemented in a whole bunch of places that handle sensitive credentials. The villain of this piece is RFC 8628, and while it exists for good reasons it can be used in a whole bunch of ways that have unfortunate security consequences.

What is the RFC 8628-defined Device Authorization Grant, and why does it exist? Imagine a device that you don't want to type a password into - either it has no input devices at all (eg, some IoT thing) or it's awkward to type a complicated password (eg, a TV with an on-screen keyboard). You want that device to be able to access resources on behalf of a user, so you want to ensure that that user authenticates the device. RFC 8628 describes an approach where the device requests the credentials, and then presents a code to the user (either on screen or over Bluetooth or something), and starts polling an endpoint for a result. The user visits a URL and types in that code (or is given a URL that has the code pre-populated) and is then guided through a standard auth process. The key distinction is that if the user authenticates correctly, the issued credentials are passed back to the device rather than the user - on successful auth, the endpoint the device is polling will return an oauth token.
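To make the flow concrete, here is a minimal sketch of the device side in Python. The endpoint paths and client_id are hypothetical (real providers publish their own endpoints), but the grant type URN and the polling behaviour come straight from RFC 8628:

import time
import requests  # third-party HTTP library

AUTH_SERVER = "https://auth.example.com"  # hypothetical identity provider
CLIENT_ID = "my-tv-app"                   # hypothetical client identifier

# Step 1: the device asks for a device code and a short user code.
grant = requests.post(f"{AUTH_SERVER}/device_authorization",
                      data={"client_id": CLIENT_ID}).json()
print(f"Visit {grant['verification_uri']} and enter {grant['user_code']}")

# Step 2: the device polls the token endpoint while the user authenticates
# in their own browser. Note that on success the token is returned to the
# device, not to the user.
interval = grant.get("interval", 5)
while True:
    time.sleep(interval)
    token = requests.post(f"{AUTH_SERVER}/token", data={
        "grant_type": "urn:ietf:params:oauth:grant-type:device_code",
        "device_code": grant["device_code"],
        "client_id": CLIENT_ID,
    }).json()
    if "access_token" in token:
        break                          # the device now holds the credentials
    if token.get("error") == "slow_down":
        interval += 5                  # back off, as the RFC requires
    elif token.get("error") != "authorization_pending":
        raise RuntimeError(token.get("error", "unexpected response"))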

But what happens if it's not a device that requests the credentials, but an attacker? What if said attacker obfuscates the URL in some way and tricks a user into clicking it? The user will be presented with their legitimate ID provider login screen, and if they're using a WebAuthn token for second factor it'll work correctly (because it's genuinely talking to the real ID provider!). The user will then typically be prompted to approve the request, but in every example I've seen the language used here is very generic and doesn't describe what's going on or ask the user to confirm what they are actually approving. AWS simply says "An application or device requested authorization using your AWS sign-in" and has a big "Allow" button, giving the user no indication at all that hitting "Allow" may give a third party their credentials.

This isn't novel! Christophe Tafani-Dereeper has an excellent writeup on this topic from last year, which builds on Nestori Syynimaa's earlier work. But whenever I've talked about this, people seem surprised at the consequences. WebAuthn is supposed to protect against phishing attacks, but this approach subverts that protection by presenting the user with a legitimate login page and then handing their credentials to someone else.

RFC 8628 actually recognises this vector and presents a set of mitigations. Unfortunately nobody actually seems to implement these, and most of the mitigations are based around the idea that this flow will only be used for physical devices. Sadly, AWS uses this for initial authentication for the aws-cli tool, so there's no device in that scenario. Another mitigation is that there's a relatively short window where the code is valid, and so sending a link via email is likely to result in it expiring before the user clicks it. An attacker could avoid this by directing the user to a domain under their control that triggers the flow and then redirects the user to the login page, ensuring that the code is only generated after the user has clicked the link.

Can this be avoided? The best way to do so is to ensure that you don't support this token issuance flow anywhere, or if you do then ensure that any tokens issued that way are extremely narrowly scoped. Unfortunately if you're an AWS user, that's probably not viable - this flow is required for the cli tool to perform SSO login, and users are going to end up with broadly scoped tokens as a result. The logs are also not terribly useful.

The infuriating thing is that this isn't necessary for CLI tooling. The reason this approach is taken is that you need a way to get the token to a local process even if the user is doing authentication in a browser. This can be avoided by having the process listen on localhost, and then have the login flow redirect to localhost (including the token) on successful completion. In this scenario the attacker can't get access to the token without having access to the user's machine, and if they have that they probably have access to the token anyway.
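As a comparison, here is a minimal sketch of that loopback-redirect approach in Python. The authorization URL and client parameters are hypothetical; the point is that the authorization code (and hence the token) only ever arrives on the user's own machine:

import http.server
import urllib.parse
import webbrowser

PORT = 8400  # arbitrary loopback port the CLI listens on

class CodeCatcher(http.server.BaseHTTPRequestHandler):
    code = None

    def do_GET(self):
        # The ID provider redirects the browser here with ?code=... attached.
        query = urllib.parse.urlparse(self.path).query
        CodeCatcher.code = urllib.parse.parse_qs(query).get("code", [None])[0]
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"You can close this tab now.")

    def log_message(self, *args):
        pass  # keep the console quiet

auth_url = ("https://auth.example.com/authorize"   # hypothetical endpoint
            "?client_id=my-cli&response_type=code"
            f"&redirect_uri=http://127.0.0.1:{PORT}/callback")
webbrowser.open(auth_url)

with http.server.HTTPServer(("127.0.0.1", PORT), CodeCatcher) as server:
    while CodeCatcher.code is None:
        server.handle_request()  # serve one request at a time until a code lands

# The CLI would now exchange CodeCatcher.code for a token. An attacker who
# cannot run code on this machine never sees either.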

There's no real moral here other than "Security is hard". Sorry.


4 November 2022

Reproducible Builds (diffoscope): diffoscope 226 released

The diffoscope maintainers are pleased to announce the release of diffoscope version 226. This version includes the following changes:
[ Christopher Baines ]
* Add an lzip comparator with tests.
[ Chris Lamb ]
* Add support for comparing the "text" content of HTML files using html2text.
  (Closes: #1022209, reproducible-builds/diffoscope#318)
* Misc/test improvements:
  * Drop the ALLOWED_TEST_FILES test; it's mostly just annoying.
  * Drop other copyright notices from lzip.py and test_lzip.py.
  * Use assert_diff helper in test_lzip.py.
  * Pylint tests/test_source.py.
[ Mattia Rizzolo ]
* Add lzip to debian dependencies.
You can find out more by visiting the project homepage.

29 July 2022

Bits from Debian: New Debian Developers and Maintainers (May and June 2022)

The following contributors got their Debian Developer accounts in the last two months: The following contributors were added as Debian Maintainers in the last two months: Congratulations!

8 January 2022

Jonathan Dowland: 2021 in Fiction

Cover for *This is How You Lose the Time War*
Cover for *Robot*
Cover for *The Glass Hotel*
Following on from last year's round-up of my reading, here's a look at the fiction I enjoyed in 2021. I managed to read 42 books in 2021, up from 31 last year. That's partly to do with buying an ereader: 33/36% of my reading (by pages/by books) was ebooks. I think this demonstrates that ebooks have mostly complemented paper books for me, rather than replacing them. My book of the year (although it was published in 2019) was This is How You Lose the Time War by Amal El-Mohtar and Max Gladstone: A short epistolary love story between warring time travellers and quite unlike anything else I've read for a long time. Other notables were The Glass Hotel by Emily St John Mandel and Robot by Adam Wiśniewski-Snerg. The biggest disappointment for me was The Ministry for the Future by Kim Stanley Robinson (KSR), which I haven't even finished. I love KSR's writing: I've written about him many times on this blog, at least in 2002, 2006 and 2009, and I think I've read every other novel he's published and most of his short stories. But this one was too much of something for me. He's described this novel as the end-point of a particular journey and approach to writing he's taken, which I felt relieved to learn; assuming he writes any more novels (and I really hope that he does), they will likely be in a different "mode". My "new author discovery" for 2021 was Chris Beckett: I tore through Two Tribes and America City before promptly buying all his other work. He fits roughly into the same bracket as Adam Roberts and Christopher Priest, two of my other favourite authors. 5 of the books I read (12%) were from my "backlog" of already-purchased physical books. I'd like to try and reduce my backlog further so I hope to push this figure up next year. I made a small effort to read more diverse authors this year. 24% of the books I read (by book count and page count) were by women. 15% by page count were (loosely) BAME (19% by book count). Again I'd like to increase these numbers modestly in 2022. Unlike 2020, I didn't complete any short story collections in 2021! This is partly because there was only one issue of Interzone published in all of 2021, a double-issue which I haven't yet finished. This is probably a sad data point in terms of Interzone's continued existence, but it's not dead yet.

25 May 2021

Vincent Bernat: Jerikan+Ansible: a configuration management system for network

There are many resources for network automation with Ansible. Most of them only expose the first steps or limit themselves to a narrow scope. They give no clue on how to expand from that. Real network environments may be large, versatile, heterogeneous, and filled with exceptions. The lack of real-world examples for Ansible deployments, unlike Puppet and SaltStack, leads many teams to build brittle and incomplete automation solutions. We have released, under an open-source license, our attempt to tackle this problem. Here is a quick demo to configure a new peering:
This work is the collective effort of Cédric Hascoët, Jean-Christophe Legatte, Loïc Pailhas, Sébastien Hurtel, Tchadel Icard, and Vincent Bernat. We are the network team of Blade, a French company operating Shadow, a cloud-computing product. In May 2021, our company was bought by Octave Klaba and the infrastructure is being transferred to OVHcloud, saving Shadow as a product, but making our team redundant. Our network was around 800 devices, spanning over 10 datacenters with more than 2.5 Tbps of available egress bandwidth. The released material is therefore a substantial example of managing a medium-scale network using Ansible. We have left out the handling of our legacy datacenters to make the final result more readable while keeping enough material to not turn it into a trivial example.

Jerikan The first component is Jerikan. As input, it takes a list of devices, configuration data, templates, and validation scripts. It generates a set of configuration files for each device. Ansible could cover this task, but it has the following limitations:
  • it is slow;
  • errors are difficult to debug;1 and
  • the hierarchy to look up a variable is rigid.
Jerikan inputs and outputs
If you want to follow the examples, you only need to have Docker and Docker Compose installed. Clone the repository and you are ready!

Source of truth We use YAML files, versioned with Git, as the single source of truth instead of using a database, like NetBox, or a mix of a database and text files. This provides many advantages:
  • anyone can use their preferred text editor;
  • the team prepares changes in branches;
  • the team reviews changes using merge requests;
  • the merge requests expose the changes to the generated configuration files;
  • rollback to a previous state is easy; and
  • it is fast.
The first file is devices.yaml. It contains the device list. The second file is classifier.yaml. It defines a scope for each device. A scope is a set of keys and values. It is used in templates and to look up data associated with a device.
$ ./run-jerikan scope to1-p1.sk1.blade-group.net
continent: apac
environment: prod
groups:
- tor
- tor-bgp
- tor-bgp-compute
host: to1-p1.sk1
location: sk1
member: '1'
model: dell-s4048
os: cumulus
pod: '1'
shorthost: to1-p1
The device name is matched against a list of regular expressions and the scope is extended by the result of each match. For to1-p1.sk1.blade-group.net, the following subset of classifier.yaml defines its scope:
matchers:
  - '^(([^.]*)\..*)\.blade-group\.net':
      environment: prod
      host: '\1'
      shorthost: '\2'
  - '\.(sk1)\.':
      location: '\1'
      continent: apac
  - '^to([12])-[as]?p(\d+)\.':
      member: '\1'
      pod: '\2'
  - '^to[12]-p\d+\.':
      groups:
        - tor
        - tor-bgp
        - tor-bgp-compute
  - '^to[12]-(p|ap)\d+\.sk1\.':
      os: cumulus
      model: dell-s4048
The third file is searchpaths.py. It describes which directories to search for a variable. A Python function provides a list of paths to look up in data/ for a given scope. Here is a simplified version:2
def searchpaths(scope):
    paths = [
        "host/ scope[location] / scope[shorthost] ",
        "location/ scope[location] ",
        "os/ scope[os] - scope[model] ",
        "os/ scope[os] ",
        'common'
    ]
    for idx in range(len(paths)):
        try:
            paths[idx] = paths[idx].format(scope=scope)
        except KeyError:
            paths[idx] = None
    return [path for path in paths if path]
With this definition, the data for to1-p1.sk1.blade-group.net is looked up in the following paths:
$ ./run-jerikan scope to1-p1.sk1.blade-group.net
[ ]
Search paths:
  host/sk1/to1-p1
  location/sk1
  os/cumulus-dell-s4048
  os/cumulus
  common
Variables are scoped using a namespace that should be specified when doing a lookup. We use the following ones:
  • system for accounts, DNS, syslog servers,
  • topology for ports, interfaces, IP addresses, subnets,
  • bgp for BGP configuration
  • build for templates and validation scripts
  • apps for application variables
When looking up a variable in a given namespace, Jerikan looks for a YAML file named after the namespace in each directory in the search paths. For example, if we look up a variable for to1-p1.sk1.blade-group.net in the bgp namespace, the following YAML files are processed: host/sk1/to1-p1/bgp.yaml, location/sk1/bgp.yaml, os/cumulus-dell-s4048/bgp.yaml, os/cumulus/bgp.yaml, and common/bgp.yaml. The search stops at the first match. The schema.yaml file allows us to override this behavior by asking to merge dictionaries and arrays across all matching files. Here is an excerpt of this file for the system namespace:
system:
  users:
    merge: hash
  sampling:
    merge: hash
  ansible-vars:
    merge: hash
  netbox:
    merge: hash
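As a rough illustration of these semantics, here is what the lookup logic could look like in Python. This is not Jerikan's actual code, only a sketch of the first-match rule and of the "merge: hash" behaviour for dictionaries:

import os
import yaml  # PyYAML

def lookup(search_paths, namespace, key, merge=False):
    """Walk the search paths (most specific first) and return the value."""
    result = None
    for directory in search_paths:
        path = os.path.join("data", directory, f"{namespace}.yaml")
        if not os.path.exists(path):
            continue
        with open(path) as f:
            data = yaml.safe_load(f) or {}
        if key not in data:
            continue
        if not merge:
            return data[key]  # first match wins
        # merge: hash -- values from more specific files take precedence
        result = {**data[key], **(result or {})}
    return result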
The last feature of the source of truth is the ability to use Jinja2 templates for keys and values by prefixing them with "~":
# In data/os/junos/system.yaml
netbox:
  manufacturer: Juniper
  model: "~  model upper  "
# In data/groups/tor-bgp-compute/system.yaml
netbox:
  role: net_tor_gpu_switch
Looking up netbox in the system namespace for to1-p2.ussfo03.blade-group.net yields the following result:
$ ./run-jerikan scope to1-p2.ussfo03.blade-group.net
continent: us
environment: prod
groups:
- tor
- tor-bgp
- tor-bgp-compute
host: to1-p2.ussfo03
location: ussfo03
member: '1'
model: qfx5110-48s
os: junos
pod: '2'
shorthost: to1-p2
[ ]
Search paths:
[ ]
  groups/tor-bgp-compute
[ ]
  os/junos
  common
$ ./run-jerikan lookup to1-p2.ussfo03.blade-group.net system netbox
manufacturer: Juniper
model: QFX5110-48S
role: net_tor_gpu_switch
This also works for structured data:
# In groups/adm-gateway/topology.yaml
interface-rescue:
  address: "~  lookup('topology', 'addresses').rescue  "
  up:
    - "~ip route add default via   lookup('topology', 'addresses').rescue ipaddr('first_usable')   table rescue"
    - "~ip rule add from   lookup('topology', 'addresses').rescue ipaddr('address')   table rescue priority 10"
# In groups/adm-gateway-sk1/topology.yaml
interfaces:
  ens1f0: "~  lookup('topology', 'interface-rescue')  "
This yields the following result:
$ ./run-jerikan lookup gateway1.sk1.blade-group.net topology interfaces
[ ]
ens1f0:
  address: 121.78.242.10/29
  up:
  - ip route add default via 121.78.242.9 table rescue
  - ip rule add from 121.78.242.10 table rescue priority 10
When putting data in the source of truth, we use the following rules:
  1. Don't repeat yourself.
  2. Put the data in the most specific place without breaking the first rule.
  3. Use templates with parsimony, mostly to help with the previous rules.
  4. Restrict the data model to what is needed for your use case.
The first rule is important. For example, when specifying IP addresses for a point-to-point link, only specify one side and deduce the other value in the templates. The last rule means you do not need to mimic a BGP YANG model to specify BGP peers and policies:
peers:
  transit:
    cogent:
      asn: 174
      remote:
        - 38.140.30.233
        - 2001:550:2:B::1F9:1
      specific-import:
        - name: ATT-US
          as-path: ".*7018$"
          lp-delta: 50
  ix-sfmix:
    rs-sfmix:
      monitored: true
      asn: 63055
      remote:
        - 206.197.187.253
        - 206.197.187.254
        - 2001:504:30::ba06:3055:1
        - 2001:504:30::ba06:3055:2
    blizzard:
      asn: 57976
      remote:
        - 206.197.187.42
        - 2001:504:30::ba05:7976:1
      irr: AS-BLIZZARD

Templates The list of templates to compile for each device is stored in the source of truth, under the build namespace:
$ ./run-jerikan lookup edge1.ussfo03.blade-group.net build templates
data.yaml: data.j2
config.txt: junos/main.j2
config-base.txt: junos/base.j2
config-irr.txt: junos/irr.j2
$ ./run-jerikan lookup to1-p1.ussfo03.blade-group.net build templates
data.yaml: data.j2
config.txt: cumulus/main.j2
frr.conf: cumulus/frr.j2
interfaces.conf: cumulus/interfaces.j2
ports.conf: cumulus/ports.j2
dhcpd.conf: cumulus/dhcp.j2
default-isc-dhcp: cumulus/default-isc-dhcp.j2
authorized_keys: cumulus/authorized-keys.j2
motd: linux/motd.j2
acl.rules: cumulus/acl.j2
rsyslog.conf: cumulus/rsyslog.conf.j2
Templates are using Jinja2. This is the same engine used in Ansible. Jerikan ships some custom filters but also reuses some of the useful filters from Ansible, notably ipaddr. Here is an excerpt of templates/junos/base.j2 to configure DNS and NTP servers on Juniper devices:
system {
  ntp {
{% for ntp in lookup("system", "ntp") %}
    server {{ ntp }};
{% endfor %}
  }
  name-server {
{% for dns in lookup("system", "dns") %}
    {{ dns }};
{% endfor %}
  }
}
The equivalent template for Cisco IOS-XR is:
{% for dns in lookup('system', 'dns') %}
domain vrf VRF-MANAGEMENT name-server {{ dns }}
{% endfor %}
!
{% for syslog in lookup('system', 'syslog') %}
logging {{ syslog }} vrf VRF-MANAGEMENT
{% endfor %}
!
There are three helper functions provided:
  • devices() returns the list of devices matching a set of conditions on the scope. For example, devices("location==ussfo03", "groups==tor-bgp") returns the list of devices in San Francisco in the tor-bgp group. You can also omit the operator if you want the specified value to be equal to the one in the local scope. For example, devices("location") returns devices in the current location.
  • lookup() does a key lookup. It takes the namespace, the key, and optionally, a device name. If not provided, the current device is assumed.
  • scope() returns the scope of the provided device.
Here is how you would define iBGP sessions between edge devices in the same location:
{% for neighbor in devices("location", "groups==edge") if neighbor != device %}
{%   for address in lookup("topology", "addresses", neighbor).loopback|tolist %}
protocols bgp group IPV{{ address|ipv }}-EDGES-IBGP {
  neighbor {{ address }} {
    description "IPv{{ address|ipv }}: iBGP to {{ neighbor }}";
  }
}
{%   endfor %}
{% endfor %}
We also have a global key-value store to save information to be reused in another template or device. This is quite useful to automatically build DNS records. First, capture the IP address inserted into a template with store() as a filter:
interface Loopback0
 description 'Loopback:'
{% for address in lookup('topology', 'addresses').loopback|tolist %}
 ipv{{ address|ipv }} address {{ address|store('addresses', 'Loopback0')|ipaddr('cidr') }}
{% endfor %}
!
Then, reuse it later to build DNS records by iterating over store():4
{% for device, ip, interface in store('addresses') %}
{%   set interface = interface|replace('/', '-')|replace('.', '-')|replace(':', '-') %}
{%   set name = '{}.{}'.format(interface|lower, device) %}
{{ name }}. IN {{ 'A' if ip|ipv4 else 'AAAA' }} {{ ip|ipaddr('address') }}
{% endfor %}
Templates are compiled locally with ./run-jerikan build. The --limit argument restricts the devices to generate configuration files for. Build is not done in parallel because a template may depend on the data collected by another template. Currently, it takes 1 minute to compile around 3000 files spanning over 800 devices.
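For example, restricting the build to a single device could look like this (the exact --limit syntax is an assumption on my part; the host name comes from the sample data above):
$ ./run-jerikan build --limit=to1-p1.sk1.blade-group.net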
Output of Jerikan after building configuration files for six devices
When an error occurs, a detailed traceback is displayed, including the template name, the line number and the value of all visible variables. This is a major time-saver compared to Ansible!
templates/opengear/config.j2:15: in top-level template code
    config.interfaces.{{ interface }}.netmask {{ adddress | ipaddr("netmask") }}
        continent  = 'us'
        device     = 'con1-ag2.ussfo03.blade-group.net'
        environment = 'prod'
        host       = 'con1-ag2.ussfo03'
        infos      = {'address': '172.30.24.19/21'}
        interface  = 'wan'
        location   = 'ussfo03'
        loop       = <LoopContext 1/2>
        member     = '2'
        model      = 'cm7132-2-dac'
        os         = 'opengear'
        shorthost  = 'con1-ag2'
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
value = JerikanUndefined, query = 'netmask', version = False, alias = 'ipaddr'
[ ]
        # Check if value is a list and parse each element
        if isinstance(value, (list, tuple, types.GeneratorType)):
            _ret = [ipaddr(element, str(query), version) for element in value]
            return [item for item in _ret if item]
>       elif not value or value is True:
E       jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'adddress'
We don't have general-purpose rules when writing templates. Like for the source of truth, there is no need to create generic templates able to produce any BGP configuration. There is a balance to be found between readability and avoiding duplication. Templates can become scary and complex: sometimes, it's better to write a filter or a function in jerikan/jinja.py. Mastering Jinja2 is a good investment. Take time to browse through our templates as some of them show interesting features.

Checks Optionally, each configuration file can be validated by a script in the checks/ directory. Jerikan looks up the key checks in the build namespace to know which checks to run:
$ ./run-jerikan lookup edge1.ussfo03.blade-group.net build checks
- description: Juniper configuration file syntax check
  script: checks/junoser
  cache:
    input: config.txt
    output: config-set.txt
- description: check YAML data
  script: checks/data.yaml
  cache: data.yaml
In the above example, checks/junoser is executed if there is a change to the generated config.txt file. It also outputs a transformed version of the configuration file which is easier to understand when using diff. Junoser checks a Junos configuration file using Juniper's XML schema definition for Netconf.5 On error, Jerikan displays:
jerikan/build.py:127: RuntimeError
-------------- Captured syntax check with Junoser call --------------
P: checks/junoser edge2.ussfo03.blade-group.net
C: /app/jerikan
O:
E: Invalid syntax:  set system syslog archive size 10m files 10 word-readable
S: 1

Integration into GitLab CI The next step is to compile the templates using a CI. As we are using GitLab, Jerikan ships with a .gitlab-ci.yml file. When we need to make a change, we create a dedicated branch and a merge request. GitLab compiles the templates using the same environment we use on our laptops and stores them as artifacts.
Merge request to add a new port in USSFO03. The templates were compiled successfully but approval from another team member is still required to merge.
Before approving the merge request, another team member looks at the changes in data and templates but also the differences for the generated configuration files:
The change configures a port on the Juniper device, adds records to DNS, and updates NetBox with the new IP addresses (not shown).

Ansible After Jerikan has built the configuration files, Ansible takes over. It is also packaged as a Docker image to avoid the trouble of maintaining the right Python virtual environment and to ensure everyone is using the same versions.

Inventory Jerikan has generated an inventory file. It contains all the managed devices, the variables defined for each of them and the groups converted to Ansible groups:
ob1-n1.sk1.blade-group.net ansible_host=172.29.15.12 ansible_user=blade ansible_connection=network_cli ansible_network_os=ios
ob2-n1.sk1.blade-group.net ansible_host=172.29.15.13 ansible_user=blade ansible_connection=network_cli ansible_network_os=ios
ob1-n1.ussfo03.blade-group.net ansible_host=172.29.15.12 ansible_user=blade ansible_connection=network_cli ansible_network_os=ios
none ansible_connection=local
[oob]
ob1-n1.sk1.blade-group.net
ob2-n1.sk1.blade-group.net
ob1-n1.ussfo03.blade-group.net
[os-ios]
ob1-n1.sk1.blade-group.net
ob2-n1.sk1.blade-group.net
ob1-n1.ussfo03.blade-group.net
[model-c2960s]
ob1-n1.sk1.blade-group.net
ob2-n1.sk1.blade-group.net
ob1-n1.ussfo03.blade-group.net
[location-sk1]
ob1-n1.sk1.blade-group.net
ob2-n1.sk1.blade-group.net
[location-ussfo03]
ob1-n1.ussfo03.blade-group.net
[in-sync]
ob1-n1.sk1.blade-group.net
ob2-n1.sk1.blade-group.net
ob1-n1.ussfo03.blade-group.net
none
in-sync is a special group for devices whose configuration should match the golden configuration. Daily and unattended, Ansible should be able to push configurations to this group. The mid-term goal is to cover all devices. none is a special device for tasks not related to a specific host. This includes synchronizing NetBox, IRR objects, and the DNS, updating the RPKI, and building the geofeed files.

Playbook We use a single playbook for all devices. It is described in the ansible/playbooks/site.yaml file. Here is a shortened version:
- hosts: adm-gateway:!done
  strategy: mitogen_linear
  roles:
    - blade.linux
    - blade.adm-gateway
    - done
- hosts: os-linux:!done
  strategy: mitogen_linear
  roles:
    - blade.linux
    - done
- hosts: os-junos:!done
  gather_facts: false
  roles:
    - blade.junos
    - done
- hosts: os-opengear:!done
  gather_facts: false
  roles:
    - blade.opengear
    - done
- hosts: none:!done
  gather_facts: false
  roles:
    - blade.none
    - done
A host executes only one of the plays. For example, a Junos device executes the blade.junos role. Once a play has been executed, the device is added to the done group and the other plays are skipped. The playbook can be executed with the configuration files generated by the GitLab CI using the ./run-ansible-gitlab command. This is a wrapper around Docker and the ansible-playbook command and it accepts the same arguments. To deploy the configuration on the edge devices for the SK1 datacenter in check mode, we use:
$ ./run-ansible-gitlab playbooks/site.yaml --limit='edge:&location-sk1' --diff --check
[ ]
PLAY RECAP *************************************************************
edge1.sk1.blade-group.net  : ok=6    changed=0    unreachable=0    failed=0    skipped=3    rescued=0    ignored=0
edge2.sk1.blade-group.net  : ok=5    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0
We have some rules when writing roles:
  • --check must detect if a change is needed;
  • --diff must provide a visualization of the planned changes;
  • --check and --diff must not display anything if there is nothing to change;
  • writing a custom module tailored to our needs is a valid solution;
  • the whole device configuration is managed;6
  • secrets must be stored in Vault;
  • templates should be avoided as we have Jerikan for that; and
  • avoid duplication and reuse tasks.7
We avoid using collections from Ansible Galaxy, the exception being collections to connect and interact with vendor devices, like the cisco.iosxr collection. The quality of Ansible Galaxy collections is quite random and they are an additional maintenance burden. It seems better to write roles tailored to our needs. The collections we use are in ci/ansible/ansible-galaxy.yaml. We use Mitogen to get a 10× speedup on Ansible executions on Linux hosts. We also have a few playbooks for operational purposes: upgrading the OS version, isolating an edge router, etc. We were also planning to add operational checks to roles: are all the BGP sessions up? They could have been used to validate a deployment and roll back if there is an issue. Currently, our playbooks are run from our laptops. To keep tabs, we are using ARA. A weekly dry-run on devices in the in-sync group also provides a dashboard of the devices we need to run Ansible on.

Configuration data and templates Jerikan ships with pre-populated data and templates matching the configuration of our USSFO03 and SK1 datacenters. They do not exist anymore but, we promise, all this was used in production back in the day!
The latest iteration of our network infrastructure for SK1, USSFO03, and future data centers. The production network uses BGPttH (BGP to the Host) over a spine-leaf fabric. The out-of-band network uses a simple L2 design with the spanning tree protocol, as well as a set of console servers.
Notably, you can find the configuration for:
our edge routers
Some are running on Junos, like edge2.ussfo03, the others on IOS-XR, like edge1.sk1. The implemented functionalities are similar in both cases and we could swap one for the other. It includes the BGP configuration for transits, peerings, and IX as well as the associated BGP policies. PeeringDB is queried to get the maximum number of prefixes to accept for peerings. bgpq3 and a containerized IRRd help to filter received routes. A firewall is added to protect the routing engine. Both IPv4 and IPv6 are configured.
our BGP-based fabric
BGP is used inside the datacenter8 and is extended on bare-metal hosts. The configuration is automatically derived from the device location and the port number.9 Top-of-the-rack devices are using passive BGP sessions for ports towards servers. They are also serving a provisioning network to let them boot using DHCP and PXE. They also act as a DHCP server. The design is multivendor. Some devices are running Cumulus Linux, like to1-p1.ussfo03, while some others are running Junos, like to1-p2.ussfo03.
our out-of-band fabric
We are using Cisco Catalyst 2960 switches to build an L2 out-of-band network. To provide redundancy and save a few bucks on wiring, we build small loops and run the spanning-tree protocol. See ob1-p1.ussfo03. It is redundantly connected to our gateway servers. We also use OpenGear devices for console access. See con1-n1.ussfo03.
our administrative gateways
These Linux servers have multiple purposes: SSH jump boxes, rescue connection, direct access to the out-of-band network, zero-touch provisioning of network devices,10 Internet access for management flows, centralization of the console servers using Conserver, and API for autoconfiguration of BGP sessions for bare-metal servers. They are the first servers installed in a new datacenter and are used to provision everything else. Check both the generated files and the associated Ansible tasks.

  1. Ansible does not even provide a line number when there is an error in a template. You may need to find the problem by bisecting.
    $ ansible --version
    ansible 2.10.8
    [ ]
    $ cat test.j2
    Hello {{ name }}!
    $ ansible all -i localhost, \
    >  --connection=local \
    >  -m template \
    >  -a "src=test.j2 dest=test.txt"
    localhost | FAILED! => {
        "changed": false,
        "msg": "AnsibleUndefinedVariable: 'name' is undefined"
    }
  2. You may recognize the same concepts as in Hiera, the hierarchical key-value store from Puppet. At first, we were using Jerakia, a similar independent store exposing an HTTP REST interface. However, the lookup overhead is too large for our use. Jerikan implements the same functionality within a Python function.
  3. The list of available filters is mangled inside jerikan/jinja.py. This is a remain of the fact we do not maintain Jerikan as a standalone software.
  4. This is a bit confusing: we have a store() filter and a store() function. With Jinja2, filters and functions live in two different namespaces.
  5. We are using a fork with some modifications to be able to validate our configurations and exposing an HTTP service to reduce the time spent on each configuration check.
  6. There is a trend in network automation to automate a configuration subset, for example by having a playbook to create a new BGP session. We believe this is wrong: with time, your configuration will get out-of-sync with its expected state, notably hand-made changes will be left undetected.
  7. See ansible/roles/blade.linux/tasks/firewall.yaml and ansible/roles/blade.linux/tasks/interfaces.yaml. They are meant to be called when needed, using import_role.
  8. We also have some datacenters using BGP EVPN VXLAN at medium-scale using Juniper devices. As they are still in production today, we didn't include this feature but we may publish it in the future.
  9. In retrospect, this may not be a good idea unless you are pretty sure everything is uniform (number of switches for each layer, number of ports). This was not our case. We now think it is a better idea to assign a prefix to each device and write it in the source of truth.
  10. Non-Linux-based devices are upgraded and configured unattended. Cumulus Linux devices are automatically upgraded on install but the final configuration has to be pushed using Ansible: we didn't want to duplicate the configuration process using another tool.

17 April 2021

Chris Lamb: Tour d'Orwell: Wallington

Previously in George Orwell travel posts: Sutton Courtenay, Marrakesh, Hampstead, Paris, Southwold & The River Orwell. Wallington is a small village in Hertfordshire, approximately fifty miles north of London and twenty-five miles from the outskirts of Cambridge. George Orwell lived at No. 2 Kits Lane, better known as 'The Stores', on a mostly-permanent basis from 1936 to 1940, but he would continue to journey up from London on occasional weekends until 1947. His first reference to The Stores can be found in early 1936, when Orwell wrote from Lancashire during research for The Road to Wigan Pier to lament that he would very much like "to do some work again; impossible, of course, in the [current] surroundings":
I am arranging to take a cottage at Wallington near Baldock in Herts, rather a pig in a poke because I have never seen it, but I am trusting the friends who have chosen it for me, and it is very cheap, only 7s. 6d. a week [£20 in 2021].
For those not steeped in English colloquialisms, "a pig in a poke" is an item bought without seeing it in advance. In fact, one general insight that may be drawn from reading Orwell's extant correspondence is just how much he relied on a close network of friends, belying the lazy and hagiographical picture of an independent and solitary figure. (Still, even Orwell cultivated this image at times, such as in a patently autobiographical essay he wrote in 1946. But note the off-hand reference to varicose veins here, for they would shortly re-appear as a symbol of Winston's repressed humanity in Nineteen Eighty-Four.) Nevertheless, the porcine reference in Orwell's idiom is particularly apt, given that he wrote the bulk of Animal Farm at The Stores: his 1945 novella, of course, portraying a revolution betrayed by allegorical pigs. Orwell even drew inspiration for his 'fairy story' from Wallington itself, principally by naming the novel's farm 'Manor Farm', just as it is in the village. But the allusion to the purchase of goods is just as appropriate, as Orwell returned The Stores to its former status as the village shop, even going so far as to drill peepholes in a door to keep an Orwellian eye on the jars of sweets. (Unfortunately, we cannot complete a tidy circle of references, as whilst it is certainly Napoleon, Animal Farm's substitute for Stalin, who is quoted as describing Britain as "a nation of shopkeepers", it was actually the maraisard Bertrand Barère who first used the phrase). "It isn't what you might call luxurious", he wrote in typical British understatement, but Orwell did warmly emote on his animals. He kept hens in Wallington (perhaps even inspiring the opening line of Animal Farm: "Mr Jones, of the Manor Farm, had locked the hen-houses for the night, but was too drunk to remember to shut the pop-holes.") and a photograph even survives of Orwell feeding his pet goat, Muriel. Orwell's goat was the eponymous inspiration for the white goat in Animal Farm, a decidedly under-analysed character who, to me, serves to represent an intelligentsia that is highly perceptive of the declining political climate but, seemingly content with merely observing it, does not offer any meaningful opposition. Muriel's aesthetic of resistance, particularly in her reporting on the changes made to the Seven Commandments of the farm, thus rehearses the well-meaning (yet functionally ineffective) affinity for 'fact checking' which proliferates today. But I digress. There is a tendency to "read Orwell backwards", so I must point out that Orwell wrote several other works whilst at The Stores as well. This includes his Homage to Catalonia, his aforementioned The Road to Wigan Pier, not to mention countless indispensable reviews and essays as well. Indeed, another result of focusing exclusively on Orwell's last works is that we only encounter his ideas in their highly-refined forms, whilst in reality, it often took many years for concepts to fully mature: we first see, for instance, the now-infamous idea of "2 + 2 = 5" in an essay written in 1939. This is important to understand for two reasons. Although the ostentatiously austere Barnhill might have housed the physical labour of its writing, it is refreshing to reflect that the philosophical heavy-lifting of Nineteen Eighty-Four may have been performed in a relatively undistinguished North Hertfordshire village.
But perhaps more importantly, it emphasises that Orwell was just a man, and that any of us is fully capable of equally significant insight, with, to quote Christopher Hitchens, "little except a battered typewriter and a certain resilience."
The red commemorative plaque not only limits Orwell's tenure to the time he was permanently in the village, it omits all reference to his first wife, Eileen O'Shaughnessy, whom he married in the village church in 1936.
Wallington's Manor Farm, the inspiration for the farm in Animal Farm. The lower sign enjoins the public to inform the police "if you see anyone on the [church] roof acting suspiciously". Non-UK-residents may be surprised to learn about the systematic theft of lead.
