Search Results: "vincent"

6 June 2025

Dirk Eddelbuettel: #49: The Two Cultures of Deploying Statistical Software

Welcome to post 49 in the R4 series. The Two Cultures is a term first used by C.P. Snow in a 1959 speech and monograph focused on the split between the humanities and the sciences. Decades later, the term was (quite famously) re-used by Leo Breiman in a (somewhat prophetic) 2001 article about the split between data models and algorithmic models. In this note, we argue that statistical computing practice and deployment can also be described via this Two Cultures moniker. Referring to the term linking these foundational pieces is of course headline bait. Yet when preparing for the discussion of r2u in the invited talk in Mons (video, slides), it occurred to me that there is in fact a wide gulf between two alternative approaches of using R and, specifically, deploying packages.

On the one hand we have the approach described by my friend Jeff as "you go to the Apple store, buy the nicest machine you can afford, install what you need and then never ever touch it". A computer / workstation / laptop is seen as an immutable object where every attempt at change may lead to breakage, instability, and general chaos, and is hence best avoided. If you know Jeff, you know he exaggerates. Maybe only slightly though. Similarly, an entire sub-culture of users striving for reproducibility (and sometimes also replicability) does the same. This is for example evidenced by the popularity of package renv by Rcpp collaborator and pal Kevin. The expressed hope is that by nailing down a (sub)set of packages, outcomes are constrained to be unchanged. Hope springs eternal, clearly. (Personally, if need be, I do the same with Docker containers and their respective Dockerfile.)

On the other hand, rolling is a fundamentally different approach. One (well-known) example is Google building everything at @HEAD. The entire (ginormous) code base is considered a mono-repo which at any point in time is expected to be buildable as is. All changes made are pre-tested to be free of side effects to other parts. This sounds hard, and likely is more involved than the alternative of a "whatever works" approach of independent changes and just hoping for the best. Another example is a rolling (Linux) distribution such as Debian. Changes are first committed to a staging place (Debian calls this the unstable distribution) and, if no side effects are seen, propagated after a fixed number of days to the rolling distribution (called testing). With this mechanism, testing should always be installable too. And based on the rolling distribution, at certain times (for Debian roughly every two years) a release is made from testing into stable (following more elaborate testing). The released stable version is then immutable (apart from fixes for seriously grave bugs and of course security updates). So this provides the connection between frequent and rolling updates, and produces an immutable fixed set: a release.

This Debian approach has been influential for many other projects, including CRAN, as can be seen in aspects of its system providing a rolling set of curated packages. Instead of a staging area for all packages, extensive tests are made for candidate packages before adding an update. This aims to ensure quality and consistency, and it has worked remarkably well. We argue that it has clearly contributed to the success and renown of CRAN. Now, when accessing CRAN from R, we fundamentally have two accessor functions. But seemingly only one is widely known and used.
In what we may call the Jeff model, everybody is happy to deploy install.packages() for initial installations. That sentiment is clearly expressed by this bsky post:
One of my #rstats coding rituals is that every time I load a @vincentab.bsky.social package I go check for a new version because invariably it's been updated with 18 new major features
And that is why we have two cultures. Because some of us, yours truly included, also use update.packages() at recurring (frequent!!) intervals: daily or near-daily for me. The goodness and, dare I say, gift of packages is not limited to those by my pal Vincent. CRAN updates all the time, and updates are (generally) full of (usually excellent) changes, fixes, or new features. So update frequently! Doing (many but small) updates (frequently) is less invasive than (large, infrequent) waterfall-style changes! But the fear of change, or disruption, is clearly pervasive. One can only speculate why. Is the experience of updating so painful on other operating systems? Is it maybe a lack of exposure / tutorials on best practices?

These Two Cultures coexist. When I delivered the talk in Mons, I briefly asked for a show of hands among all the R users in the audience to see who in fact does use update.packages() regularly. And maybe a handful of hands went up: surprisingly few!

Now back to the context of installing packages: clearly only installing has its uses. For continuous integration checks we generally install into ephemeral temporary setups. Some debugging work may be with one-off container or virtual machine setups. But all other uses may well be under maintained setups. So consider calling update.packages() once in a while. Or even weekly or daily. The rolling feature of CRAN is a real benefit, and it is there for the taking and enrichment of your statistical computing experience.

So to sum up, the real power is to use both: install.packages() for initial installations and update.packages() for regular maintenance. For both tasks, relying on binary installations accelerates and eases the process. And where available, using binary installation with system-dependency support as r2u does makes it easier still, following the r2u slogan of Fast. Easy. Reliable. Pick All Three. Give it a try!
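For reference, a minimal shell sketch of the two access patterns discussed above (the package name and CRAN mirror are only illustrations, and the calls assume Rscript from a standard R installation is on the PATH):
# one-time installation (the "Jeff model")
$ Rscript -e 'install.packages("ttdo", repos = "https://cloud.r-project.org")'
# recurring refresh of everything already installed, ideally daily or near-daily
$ Rscript -e 'update.packages(repos = "https://cloud.r-project.org", ask = FALSE)'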

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

18 March 2025

Christian Kastner: 15th Anniversary of My First Debian Upload

Time flies! 15 years ago, on 2010-03-18, my first upload to the Debian archive was accepted. Debian had replaced Windows as my primary OS in 2005, but it was only when I saw that the package zd1211-firmware had been orphaned that I thought of becoming a contributor. I owned a Zyxel G-202 USB WiFi fob that needed said firmware, and as is so often the case with open-source software, I was going to scratch my own itch. Bart Martens thankfully helped me adopt the package, and sponsored my upload. I then joined Javier Fernández-Sanguino Peña as a cron maintainer and upstream, and also worked within the Debian Python Applications, Debian Python Modules, and Debian Science Teams, where Jakub Wilk and Yaroslav Halchenko were kind enough to mentor me and eventually support my application to become a Debian Maintainer.

Life intervened, and I was mostly inactive in Debian for the next two years. Upon my return in 2014, I had Vincent Cheng to thank for sponsoring most of my newer work, and for eventually supporting my application to become a Debian Developer. It was around that time that I also attended my first DebConf, in Portland, which remains one of my fondest memories. I had never been to an open-source software conference before, and DebConf14 really knocked it out of the park in so many ways.

After another break, I returned in 2019 to work mostly on Python and machine learning libraries. In 2020, I finally completed a process that I had first started in 2012 but had never managed to finish before: converting cron from source format 1.0 (one big diff) to source format 3.0 (quilt) (a series of patches). This was a process where I converted 25 years' worth of organic growth into a minimal series of logically grouped changes (more here). This was my white whale.

In early 2023, shortly after the launch of ChatGPT, which triggered an unprecedented AI boom, I started contributing to the Debian ROCm Team, where over the following year, I bootstrapped our CI at ci.rocm.debian.net. Debian's current tooling lacks a way to express dependencies on specific hardware other than the CPU ISA, nor does it have the means to run autopkgtests using such hardware. To get autopkgtests to make use of AMD GPUs in QEMU VMs and in containers, I had to fork autopkgtest, debci, and a few other components, as well as create a fair share of new tooling for ourselves. This worked out pretty well, and the CI has grown to support 17 different AMD GPU architectures. I will share more on this in upcoming posts.

I have mentioned a few contributors by name, but I have countless others to thank for collaborations over the years. It has been a wonderful experience, and I look forward to many years more.

17 March 2025

Vincent Bernat: Offline PKI using 3 YubiKeys and an ARM single board computer

An offline PKI enhances security by physically isolating the certificate authority from network threats. A YubiKey is a low-cost solution to store a root certificate. You also need an air-gapped environment to operate the root CA.
PKI relying on a set of 3 YubiKeys: 2 for the root CA and 1 for the intermediate CA.
Offline PKI backed up by 3 YubiKeys
This post describes an offline PKI system built around the three YubiKeys, the offline-pki software, and an air-gapped single board computer covered in the following sections. It is possible to add more YubiKeys as a backup of the root CA if needed. This is not needed for the intermediate CA, as you can generate a new one if the current one gets destroyed.

The software part
offline-pki is a small Python application to manage an offline PKI. It relies on yubikey-manager to manage the YubiKeys and on cryptography for the cryptographic operations not executed on the YubiKeys. The application makes some opinionated design choices. Notably, the cryptography is hard-coded to use the NIST P-384 elliptic curve. The first step is to reset all your YubiKeys:
$ offline-pki yubikey reset
This will reset the connected YubiKey. Are you sure? [y/N]: y
New PIN code:
Repeat for confirmation:
New PUK code:
Repeat for confirmation:
New management key ('.' to generate a random one):
WARNING[pki-yubikey] Using random management key: e8ffdce07a4e3bd5c0d803aa3948a9c36cfb86ed5a2d5cf533e97b088ae9e629
INFO[pki-yubikey]  0: Yubico YubiKey OTP+FIDO+CCID 00 00
INFO[pki-yubikey] SN: 23854514
INFO[yubikit.management] Device config written
INFO[yubikit.piv] PIV application data reset performed
INFO[yubikit.piv] Management key set
INFO[yubikit.piv] New PUK set
INFO[yubikit.piv] New PIN set
INFO[pki-yubikey] YubiKey reset successful!
Then, generate the root CA and create as many copies as you want:
$ offline-pki certificate root --permitted example.com
Management key for Root X:
Plug YubiKey "Root X"...
INFO[pki-yubikey]  0: Yubico YubiKey CCID 00 00
INFO[pki-yubikey] SN: 23854514
INFO[yubikit.piv] Data written to object slot 0x5fc10a
INFO[yubikit.piv] Certificate written to slot 9C (SIGNATURE), compression=True
INFO[yubikit.piv] Private key imported in slot 9C (SIGNATURE) of type ECCP384
Copy root certificate to another YubiKey? [y/N]: y
Plug YubiKey "Root X"...
INFO[pki-yubikey]  0: Yubico YubiKey CCID 00 00
INFO[pki-yubikey] SN: 23854514
INFO[yubikit.piv] Data written to object slot 0x5fc10a
INFO[yubikit.piv] Certificate written to slot 9C (SIGNATURE), compression=True
INFO[yubikit.piv] Private key imported in slot 9C (SIGNATURE) of type ECCP384
Copy root certificate to another YubiKey? [y/N]: n
You can inspect the result:
$ offline-pki yubikey info
INFO[pki-yubikey]  0: Yubico YubiKey CCID 00 00
INFO[pki-yubikey] SN: 23854514
INFO[pki-yubikey] Slot 9C (SIGNATURE):
INFO[pki-yubikey]   Private key type: ECCP384
INFO[pki-yubikey]   Public key:
INFO[pki-yubikey]     Algorithm:  secp384r1
INFO[pki-yubikey]     Issuer:     CN=Root CA
INFO[pki-yubikey]     Subject:    CN=Root CA
INFO[pki-yubikey]     Serial:     1
INFO[pki-yubikey]     Not before: 2024-07-05T18:17:19+00:00
INFO[pki-yubikey]     Not after:  2044-06-30T18:17:19+00:00
INFO[pki-yubikey]     PEM:
-----BEGIN CERTIFICATE-----
MIIBcjCB+aADAgECAgEBMAoGCCqGSM49BAMDMBIxEDAOBgNVBAMMB1Jvb3QgQ0Ew
HhcNMjQwNzA1MTgxNzE5WhcNNDQwNjMwMTgxNzE5WjASMRAwDgYDVQQDDAdSb290
IENBMHYwEAYHKoZIzj0CAQYFK4EEACIDYgAERg3Vir6cpEtB8Vgo5cAyBTkku/4w
kXvhWlYZysz7+YzTcxIInZV6mpw61o8W+XbxZV6H6+3YHsr/IeigkK04/HJPi6+i
zU5WJHeBJMqjj2No54Nsx6ep4OtNBMa/7T9foyMwITAPBgNVHRMBAf8EBTADAQH/
MA4GA1UdDwEB/wQEAwIBhjAKBggqhkjOPQQDAwNoADBlAjEAwYKy/L8leJyiZSnn
xrY8xv8wkB9HL2TEAI6fC7gNc2bsISKFwMkyAwg+mKFKN2w7AjBRCtZKg4DZ2iUo
6c0BTXC9a3/28V5aydZj6rvx0JqbF/Ln5+RQL6wFMLoPIvCIiCU=
-----END CERTIFICATE-----
Then, you can create an intermediate certificate with offline-pki yubikey intermediate and use it to sign certificates by providing a CSR to offline-pki certificate sign. Be careful and inspect the CSR before signing it, as only the subject name can be overridden. Check the documentation for more details. Get the available options using the --help flag.
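A hedged sketch of that workflow (the CSR file name is illustrative, the openssl command is a generic way to inspect a CSR, and the exact offline-pki flags are assumptions to verify with --help):
$ offline-pki yubikey intermediate
$ openssl req -in server.csr -noout -text         # inspect the CSR before signing
$ offline-pki certificate sign --csr server.csr   # hypothetical flag names; check --help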

The hardware part
To ensure the operations on the root and intermediate CAs are air-gapped, a cost-efficient solution is to use an ARM64 single board computer. The Libre Computer Sweet Potato SBC is a more open alternative to the well-known Raspberry Pi.1
Libre Computer Sweet Potato single board computer relying on the Amlogic S905X SOC
Libre Computer Sweet Potato SBC, powered by the AML-S905X SOC
I interact with it through a USB-to-TTL UART converter:
$ tio /dev/ttyUSB0
[16:40:44.546] tio v3.7
[16:40:44.546] Press ctrl-t q to quit
[16:40:44.555] Connected to /dev/ttyUSB0
GXL:BL1:9ac50e:bb16dc;FEAT:ADFC318C:0;POC:1;RCY:0;SPI:0;0.0;CHK:0;
TE: 36574
BL2 Built : 15:21:18, Aug 28 2019. gxl g1bf2b53 - luan.yuan@droid15-sz
set vcck to 1120 mv
set vddee to 1000 mv
Board ID = 4
CPU clk: 1200MHz
[ ]

The Nix glue
To bring everything together, I am using Nix with a Flake providing:
  • a package for the offline-pki application, with shell completion,
  • a development shell, including an editable version of the offline-pki application,
  • a NixOS module to set up the offline PKI, resetting the system at each boot,
  • a QEMU image for testing, and
  • an SD card image to be used on the Sweet Potato or another ARM64 SBC.
# Execute the application locally
nix run github:vincentbernat/offline-pki -- --help
# Run the application inside a QEMU VM
nix run github:vincentbernat/offline-pki\#qemu
# Build a SD card for the Sweet Potato or for the Raspberry Pi
nix build --system aarch64-linux github:vincentbernat/offline-pki\#sdcard.potato
nix build --system aarch64-linux github:vincentbernat/offline-pki\#sdcard.generic
# Get a development shell with the application
nix develop github:vincentbernat/offline-pki

  1. The key for the root CA is not generated by the YubiKey. Using an air-gapped computer is all the more important. Put it in a safe with the YubiKeys when done!

8 March 2025

Vincent Bernat: Auto-expanding aliases in Zsh

To avoid needless typing, the fish shell features command abbreviations to expand some words after pressing space. We can emulate such a feature with Zsh:
# Definition of abbrev-alias for auto-expanding aliases
typeset -ga _vbe_abbrevations
abbrev-alias() {
    alias $1
    _vbe_abbrevations+=(${1%%\=*})
}
_vbe_zle-autoexpand() {
    local -a words; words=(${(z)LBUFFER})
    if (( ${#_vbe_abbrevations[(r)${words[-1]}]} )); then
        zle _expand_alias
    fi
    zle magic-space
}
zle -N _vbe_zle-autoexpand
bindkey -M emacs " " _vbe_zle-autoexpand
bindkey -M isearch " " magic-space
# Correct common typos
(( $+commands[git] )) && abbrev-alias gti=git
(( $+commands[grep] )) && abbrev-alias grpe=grep
(( $+commands[sudo] )) && abbrev-alias suod=sudo
(( $+commands[ssh] )) && abbrev-alias shs=ssh
# Save a few keystrokes
(( $+commands[git] )) && abbrev-alias gls="git ls-files"
(( $+commands[ip] )) && {
  abbrev-alias ip6='ip -6'
  abbrev-alias ipb='ip -brief'
}
# Hard to remember options
(( $+commands[mtr] )) && abbrev-alias mtrr='mtr -wzbe'
Here is a demo where gls is expanded to git ls-files after pressing space:
Auto-expanding gls to git ls-files
I don't auto-expand all aliases. I keep using regular aliases when slightly modifying the behavior of a command or for well-known abbreviations:
alias df='df -h'
alias du='du -h'
alias rm='rm -i'
alias mv='mv -i'
alias ll='ls -ltrhA'

21 January 2025

Dirk Eddelbuettel: ttdo 0.0.10 on CRAN: Small Extension

A new minor release of our ttdo package arrived on CRAN a few days ago. The ttdo package extends the excellent (and very minimal / zero-depends) unit testing package tinytest by Mark van der Loo with the very clever and well-done diffobj package by Brodie Gaslam to give us test results with visual diffs (as shown in the screenshot below), which seemingly is so compelling an idea that it eventually got copied by another package which shall remain unnamed. And as of this release, we also support visual diffs as provided by tinysnapshot by Vincent Arel-Bundock.
ttdo screenshot
This release chiefly adds support for visual diff plots. As we extend tinytest for use in the R autograder we maintain and deploy within the lovely PrairieLearn framework, we need to encode the diffplot that has been produced by the autograder (commonly in a separate Docker process) to communicate it back to the controlling (also Docker) instance of PrairieLearn. We also did the usual bits of package maintenance, keeping badges, URLs, continuous integration and minor R coding standards current. As usual, the NEWS entry follows.

Changes in ttdo version 0.0.10 (2025-01-21)
  • Regular packaging updates to badges and continuous integration setup
  • Add support for a 'visual difference' taking advantage of a tinysnapshot feature returning a ttvd-typed result
  • Added (versioned) dependency on tinysnapshot (to also get the plot 'style' enhancement in the most recent version) and base64enc
  • Return ttdo-typed result if diffobj-printed result is returned
  • Ensure all package-documentation links in manual page have anchors to the package

Courtesy of my CRANberries, there is also a diffstat report for this release. For questions, suggestions, or issues please use the issue tracker at the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

11 November 2024

Vincent Bernat: Customize Caddy's plugins with Nix

Caddy is an open-source web server written in Go. It handles TLS certificates automatically and comes with a simple configuration syntax. Users can extend its functionality through plugins1 to add features like rate limiting, caching, and Docker integration. While Caddy is available in Nixpkgs, adding extra plugins is not simple.2 The compilation process needs Internet access, which Nix denies during build to ensure reproducibility. When trying to build the following derivation using xcaddy, a tool for building Caddy with plugins, it fails with this error: dial tcp: lookup proxy.golang.org on [::1]:53: connection refused.
{ pkgs }:
pkgs.stdenv.mkDerivation {
  name = "caddy-with-xcaddy";
  nativeBuildInputs = with pkgs; [ go xcaddy cacert ];
  unpackPhase = "true";
  buildPhase =
    ''
      xcaddy build --with github.com/caddy-dns/powerdns@v1.0.1
    '';
  installPhase = ''
    mkdir -p $out/bin
    cp caddy $out/bin
  '';
}
Fixed-output derivations are an exception to this rule and get network access during build. They need to specify their output hash. For example, the fetchurl function produces a fixed-output derivation:
{ stdenv, fetchurl }:
stdenv.mkDerivation rec {
  pname = "hello";
  version = "2.12.1";
  src = fetchurl {
    url = "mirror://gnu/hello/hello-${version}.tar.gz";
    hash = "sha256-jZkUKv2SV28wsM18tCqNxoCZmLxdYH2Idh9RLibH2yA=";
  };
}
To create a fixed-output derivation, you need to set the outputHash attribute. The example below shows how to output Caddy's source code, with some plugins enabled, as a fixed-output derivation using xcaddy and go mod vendor.
pkgs.stdenvNoCC.mkDerivation rec {
  pname = "caddy-src-with-xcaddy";
  version = "2.8.4";
  nativeBuildInputs = with pkgs; [ go xcaddy cacert ];
  unpackPhase = "true";
  buildPhase =
    ''
      export GOCACHE=$TMPDIR/go-cache
      export GOPATH="$TMPDIR/go"
      XCADDY_SKIP_BUILD=1 TMPDIR="$PWD" \
        xcaddy build v${version} --with github.com/caddy-dns/powerdns@v1.0.1
      (cd buildenv* && go mod vendor)
    '';
  installPhase = ''
    mv buildenv* $out
  '';
  outputHash = "sha256-F/jqR4iEsklJFycTjSaW8B/V3iTGqqGOzwYBUXxRKrc=";
  outputHashAlgo = "sha256";
  outputHashMode = "recursive";
}
With a fixed-output derivation, it is up to us to ensure the output is always the same.3 You can use this derivation to override the src attribute in pkgs.caddy:
pkgs.caddy.overrideAttrs (prev: {
  src = pkgs.stdenvNoCC.mkDerivation { /* ... */ };
  vendorHash = null;
  subPackages = [ "." ];
});
Check out the complete example in the GitHub repository. To integrate into a Flake, add github:vincentbernat/caddy-nix as an overlay:
{
  inputs = {
    nixpkgs.url = "nixpkgs";
    flake-utils.url = "github:numtide/flake-utils";
    caddy.url = "github:vincentbernat/caddy-nix";
  };
  outputs = { self, nixpkgs, flake-utils, caddy }:
    flake-utils.lib.eachDefaultSystem (system:
      let
        pkgs = import nixpkgs {
          inherit system;
          overlays = [ caddy.overlays.default ];
        };
      in
      {
        packages = {
          default = pkgs.caddy.withPlugins {
            plugins = [ "github.com/caddy-dns/powerdns@v1.0.1" ];
            hash = "sha256-F/jqR4iEsklJFycTjSaW8B/V3iTGqqGOzwYBUXxRKrc=";
          };
        };
      });
}

Update (2024-11): This flake won't work with Nixpkgs 24.05 or older because it relies on this commit to properly override the vendorHash attribute.


  1. This article uses the term plugins, though Caddy documentation also refers to them as modules since they are implemented as Go modules.
  2. This has been a feature request for quite some time. A proposed solution has been rejected. The one described in this article is a bit different, and I have proposed it in another pull request.
  3. This is not perfect: if the source code produced by xcaddy changes, the hash would change and the build would fail.

31 August 2024

Vincent Bernat: Fixing layout shifts caused by web fonts

In 2020, Google introduced Core Web Vitals metrics to measure some aspects of real-world user experience on the web. This blog has consistently achieved good scores for two of these metrics: Largest Contentful Paint and Interaction to Next Paint. However, optimizing the third metric, Cumulative Layout Shift, which measures unexpected layout changes, has been more challenging. Let's face it: optimizing for this metric is not really useful for a site like this one. But getting a better score is always a good distraction.

To prevent the flash of invisible text when using web fonts, developers should set the font-display property to swap in @font-face rules. This method allows browsers to initially render text using a fallback font, then replace it with the web font after loading. While this improves the LCP score, it causes content reflow and layout shifts if the fallback and web fonts are not metrically compatible. These shifts negatively affect the CLS score. CSS provides properties to address this issue by overriding font metrics when using fallback fonts: size-adjust, ascent-override, descent-override, and line-gap-override. Two comprehensive articles explain each property and their computation methods in detail: Creating Perfect Font Fallbacks in CSS and Improved font fallbacks.

Interactive tuning tool
Instead of computing each property from average font metrics, I put together a tool for interactively tuning fallback fonts.1

Instructions
  1. Load your custom font.
  2. Select a fallback font to tune.
  3. Adjust the size-adjust property to match the width of your custom font with the fallback font. With a proportional font, it is not possible to achieve a perfect match.
  4. Fine-tune the ascent-override property. Aim to align the final dot of the last paragraph while monitoring the font's baseline. For more precise adjustment, disable the option.
  5. Modify the descent-override property. The goal is to make the two boxes match. You may need to alternate between this and the previous property for optimal results.
  6. If necessary, adjust the line-gap-override property. This step is typically not required.
The process needs to be repeated for each fallback font. Some platforms may not include certain fonts. Notably, Android lacks most fonts found in other operating systems. It replaces Georgia with Noto Serif, which is not metrically-compatible.

Tool

This tool is not available from the Atom feed.

Results
For the body text of this blog, I get the following CSS definition:
@font-face {
  font-family: Merriweather;
  font-style: normal;
  font-weight: 400;
  src: url("../fonts/merriweather.woff2") format("woff2");
  font-display: swap;
}
@font-face {
  font-family: "Fallback for Merriweather";
  src: local("Noto Serif"), local("Droid Serif");
  size-adjust: 98.3%;
  ascent-override: 99%;
  descent-override: 27%;
}
@font-face {
  font-family: "Fallback for Merriweather";
  src: local("Georgia");
  size-adjust: 106%;
  ascent-override: 90.4%;
  descent-override: 27.3%;
}
font-family: Merriweather, "Fallback for Merriweather", serif;
After a month, the CLS metric improved to 0:
Core Web Vitals scores for vincent.bernat.ch showing all 6 metrics as green. Notably the Cumulative Layout Shift is 0.
Recent Core Web Vitals scores for vincent.bernat.ch

About custom fonts
Using safe web fonts or a modern font stack is often simpler. However, I prefer custom web fonts. Merriweather and Iosevka, which are used in this blog, enhance the reading experience. An alternative approach could be to use Georgia as a serif option. Unfortunately, most default monospace fonts are ugly. Furthermore, paragraphs that combine proportional and monospace fonts can create visual disruption. This occurs due to mismatched vertical metrics or weights. To address this issue, I adjust Iosevka's metrics and weight to align with Merriweather's characteristics.

  1. Similar tools already exist, like the Fallback Font Generator, but they were missing a few features, such as the ability to load the fallback font or to have decimals for the CSS properties. And no source code.

28 July 2024

Vincent Bernat: Crafting endless AS paths in BGP

Combining BGP confederations and AS override can potentially create a BGP routing loop, resulting in an indefinitely expanding AS path.

BGP confederation is a technique used to reduce the number of iBGP sessions and improve scalability in large autonomous systems (AS). It divides an AS into sub-ASes. Most eBGP rules apply between sub-ASes, except that next-hop, MED, and local preferences remain unchanged. The AS path length ignores contributions from confederation sub-ASes. BGP confederation is rarely used, and BGP route reflection is typically preferred for scaling. AS override is a feature that allows a router to replace the ASN of a neighbor in the AS path of outgoing BGP routes with its own. It's useful when two distinct autonomous systems share the same ASN. However, it interferes with BGP's loop prevention mechanism and should be used cautiously. A safer alternative is the allowas-in directive.1

In the example below, we have four routers in a single confederation, each in its own sub-AS. R0 originates the 2001:db8::1/128 prefix. R1, R2, and R3 forward this prefix to the next router in the loop.
BGP routing loop involving 4 routers: R0 originates a prefix, R1, R2, R3 make it loop using next-hop-self and as-override
BGP routing loop using a confederation
The router configurations are available in a Git repository. They are running Cisco IOS XR. R2 uses the following configuration for BGP:
router bgp 64502
 bgp confederation peers
  64500
  64501
  64503
 !
 bgp confederation identifier 64496
 bgp router-id 1.0.0.2
 address-family ipv6 unicast
 !
 neighbor 2001:db8::2:0
  remote-as 64501
  description R1
  address-family ipv6 unicast
  !
 !
 neighbor 2001:db8::3:1
  remote-as 64503
  advertisement-interval 0
  description R3
  address-family ipv6 unicast
   next-hop-self
   as-override
  !
 !
!
The session with R3 uses both the as-override and next-hop-self directives. The latter is only necessary to make the announced prefix valid, as there is no IGP in this example.2 Here's the sequence of events leading to an infinite AS path:
  1. R0 sends the prefix to R1 with AS path (64500).3
  2. R1 selects it as the best path, forwarding it to R2 with AS path (64501 64500).
  3. R2 selects it as the best path, forwarding it to R3 with AS path (64502 64501 64500).
  4. R3 selects it as the best path. It would forward it to R1 with AS path (64503 64502 64501 64500), but due to AS override, it substitutes R1's ASN with its own, forwarding it with AS path (64503 64502 64503 64500).
  5. R1 accepts the prefix, as its own ASN is not in the AS path. It compares this new prefix with the one from R0. Both (64500) and (64503 64502 64503 64500) have the same length because confederation sub-ASes don't contribute to AS path length. The first tie-breaker is the router ID. R0's router ID (1.0.0.4) is higher than R3's (1.0.0.3). The new prefix becomes the best path and is forwarded to R2 with AS path (64501 64503 64501 64503 64500).
  6. R2 receives the new prefix, replacing the old one. It selects it as the best path and forwards it to R3 with AS path (64502 64501 64502 64501 64502 64500).
  7. R3 receives the new prefix, replacing the old one. It selects it as the best path and forwards it to R1 with AS path (64503 64502 64503 64502 64503 64502 64500).
  8. R1 receives the new prefix, replacing the old one. Again, it competes with the prefix from R0, and again the new prefix wins due to the lower router ID. The prefix is forwarded to R2 with AS path (64501 64503 64501 64503 64501 64503 64501 64500).
A few iterations later, R1 views the looping prefix as follows:4
RP/0/RP0/CPU0:R1#show bgp ipv6 u 2001:db8::1/128 bestpath-compare
BGP routing table entry for 2001:db8::1/128
Last Modified: Jul 28 10:23:05.560 for 00:00:00
Paths: (2 available, best #2)
  Path #1: Received by speaker 0
  Not advertised to any peer
  (64500)
    2001:db8::1:0 from 2001:db8::1:0 (1.0.0.4), if-handle 0x00000000
      Origin IGP, metric 0, localpref 100, valid, confed-external
      Received Path ID 0, Local Path ID 0, version 0
      Higher router ID than best path (path #2)
  Path #2: Received by speaker 0
  Advertised IPv6 Unicast paths to peers (in unique update groups):
    2001:db8::2:1
  (64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64503 64502 64500)
    2001:db8::4:0 from 2001:db8::4:0 (1.0.0.3), if-handle 0x00000000
      Origin IGP, metric 0, localpref 100, valid, confed-external, best, group-best
      Received Path ID 0, Local Path ID 1, version 37
      best of AS 64503, Overall best
There's no upper bound for an AS path, but BGP messages have size limits (4096 bytes per RFC 4271 or 65535 bytes per RFC 8654). At some point, BGP updates can't be generated. On Cisco IOS XR, the BGP process crashes well before reaching this limit.5
The main lessons from this tale are:

  1. When using BGP confederations with Cisco IOS XR, use allowconfedas-in instead. It's available since IOS XR 7.11.
  2. Using BGP confederations is already inadvisable. If you don't use the same IGP for all sub-ASes, you're inviting trouble! However, the scenario described here is also possible with an IGP.
  3. When an AS path segment is composed of ASNs from a confederation, it is displayed between parentheses.
  4. By default, IOS XR paces eBGP updates. This is controlled by the advertisement-interval directive. Its default value is 30 seconds for eBGP peers (even in the same confederation). R1 and R2 set this value to 0, while R3 sets it to 2 seconds. This gives some time to watch the AS path grow.
  5. This is CSCwk15887. It only happens when using as-override on an AS path with a too long AS_CONFED_SEQUENCE. This should be fixed around 24.3.1.

23 June 2024

Vincent Bernat: Why content providers need IPv6

IPv4 is an expensive resource. However, many content providers are still IPv4-only. The most common reason is that IPv4 is here to stay and IPv6 is an additional complexity.1 This mindset may seem selfish, but there are compelling reasons for a content provider to enable IPv6, even when they have enough IPv4 addresses available for their needs.

Disclaimer
It's been a while since this article has been sitting in my drafts. I started it when I was working at Shadow, a content provider, while I now work for Free, an internet service provider.

Why ISPs need IPv6?
Providing a public IPv4 address to each customer is quite costly when each IP address costs US$40 on the market. For fixed access, some consumer ISPs are still providing one IPv4 address per customer.2 Other ISPs provide, by default, an IPv4 address shared among several customers. For mobile access, most ISPs distribute a shared IPv4 address. There are several methods to share an IPv4 address:3
NAT44
The customer device is given a private IPv4 address, which is translated to a public one by a service provider device. This device needs to maintain a state for each translation.
464XLAT and DS-Lite
The customer device translates the private IPv4 address to an IPv6 address or encapsulates IPv4 traffic in IPv6 packets. The provider device then translates the IPv6 address to a public IPv4 address. It still needs to maintain a state for the NAT64 translation.
Lightweight IPv4 over IPv6, MAP-E, and MAP-T
The customer device encapsulates IPv4 in IPv6 packets or performs a stateless NAT46 translation. The provider device uses a binding table or an algorithmic rule to map IPv6 tunnels to IPv4 addresses and ports. It does not need to maintain a state.
Solutions to share an IPv4 address
Solutions to share an IPv4 address across several customers. Some of them require the ISP to keep state, some don't.
All these solutions require a translation device in the ISP's network. This device represents a non-negligible cost in terms of money and reliability. As half of the top 1000 websites support IPv6 and the biggest players can deliver most of their traffic using IPv6,4 ISPs have a clear path to reduce the cost of translation devices: provide IPv6 by default to their customers.

Why content providers need IPv6?
Content providers should expose their services over IPv6 primarily to avoid going through the ISP's translation devices. This doesn't help users who don't have IPv6 or users with a non-shared IPv4 address, but it provides a better service for all the others. Why would the service be better delivered over IPv6 than over IPv4 when a translation device is in the path? There are two main reasons for that:5
  1. Translation devices introduce additional latency due to their geographical placement inside the network: it is easier and cheaper to only install these devices at a few points in the network instead of putting them close to the users.
  2. Translation devices are an additional point of failure in the path between the user and the content. They can become overloaded or malfunction. Moreover, as they are not used for the five most visited websites, which serve their traffic over IPv6, the ISPs may not be incentivized to ensure they perform as well as the native IPv6 path.
Looking at Google statistics, half of the users reach Google over IPv6. Moreover, their latency is lower.6 In the US, all the nationwide mobile providers have IPv6 enabled. For France, we can refer to the annual ARCEP report: in 2022, 72% of fixed users and 60% of mobile users had IPv6 enabled, with projections of 94% and 88% for 2025. Starting from this projection, since all mobile users go through a network translation device, content providers can deliver a better service for 88% of them by exposing their services over IPv6. If we exclude Orange, which has 40% of the market share on consumer fixed access, enabling IPv6 should positively impact more than 55% of fixed access users.
In conclusion, content providers aiming for the best user experience should expose their services over IPv6. By avoiding translation devices, they can ensure fast and reliable content delivery. This is crucial for latency-sensitive applications, like live streaming, but also for websites in competitive markets, where even slight delays can lead to user disengagement.

  1. A way to limit this complexity is to build IPv6 services and only provide IPv4 through reverse proxies at the edge.
  2. In France, this includes non-profit ISPs, like FDN and Milkywan. Additionally, Orange, the previously state-owned telecom provider, supplies non-shared IPv4 addresses. Free also provides a dedicated IPv4 address for customers connected to the point-to-point FTTH access.
  3. I use the term NAT instead of the more correct term NAPT. Feel free to do a mental substitution. If you are curious, check RFC 2663. For a survey of the IPv6 transition technologies enumerated here, have a look at RFC 9313.
  4. For AS 12322, Google, Netflix, and Meta are delivering 85% of their traffic over IPv6. Also, more than half of our traffic is delivered over IPv6.
  5. An additional reason is for fighting abuse: blacklisting an IPv4 address may impact unrelated users who share the same IPv4 as the culprits.
  6. IPv6 may not be the sole reason the latency is lower: users with IPv6 generally have a better connection.

9 May 2024

Vincent Sanders: Bee to the blossom, moth to the flame; Each to his passion; what's in a name?

I like the sentiment of Helen Hunt Jackson in that quote, and it generally applies double for computer system names. However, I like to think that when I named the first NetSurf VM host server phoenix fourteen years ago, I captured the nature of its continuous cycle of replacement.
Image of the fourth phoenix server
We have been very fortunate to receive a donated server to replace the previous one every few years, and the very generous folks at Collabora continue to provide hosting for it. Recently I replaced the server for the third time. We were once again given a replacement by Huw Jones, in the form of a SuperServer 6017R-TDAF system with dual Intel Xeon Ivy Bridge E5-2680v2 processors. There were even rack rails!

The project bought some NVMe drives and adaptor cards, and I attempted to arrange to swap out the server in January.

The old phoenixiii server being replaced
Here we come to the slight disadvantage of an informal arrangement where access to the system depends upon a busy third party. Unfortunately, it took until May to arrange access (I must thank Vivek again for coming in on a Saturday to do this).

In the intervening time, once I realised access was going to become increasingly difficult, I decided to obtain as good a system as I could manage to reduce requirements for future access.

I turned to eBay and acquired a slightly more modern SuperServer with dual Intel Xeon Haswell E5-2680v3 processors, which required the purchase of 64G of new memory (Haswell is a DDR4 platform).

I had wanted to use Broadwell processors, but this exceeded my budget and would only have been a 10% performance uplift (the chassis, motherboard and memory cost 180, and another 50 for processors was just too much; maybe next time).

graph of cpu mark improvements in the phoenix servers over time
While making the decision on the processor selection I made a quick chart of previous processing capabilities (based on a passmark comparison) of phoenix servers and was startled to discover I needed a logarithmic vertical axis. Multi core performance of processors has improved at a startling rate in the last decade.

When the original replacement was donated, I checked where the performance was limited and noticed it was mainly in disc access, which is what prompted the upgrade to NVMe (2 gigabytes a second peak read throughput). This moved the bottleneck to the processors where, even with the upgrades, it remains.

I do not really know if there is a conclusion here beyond noting NetSurf is very fortunate as a project to have some generous benefactors both for donating hardware and hosting for which I know all the developers are grateful.

Now I just need to go and migrate a huge bunch of virtual machines and associated sysadmin to make use of these generous donations.

22 April 2024

Vincent Fourmond: QSoas version 3.3 is out

Version 3.3 brings in new features, including reverse Laplace transforms and fits, pH fits, commands for picking points from a dataset, averaging points with the same X value, or performing singular value decomposition. In addition to these new features, many previous commands were improved, like the addition of a bandcut filter in FFT filtering, better handling of the loading of files produced by QSoas itself, and a button to interrupt the processing of scripts. There are a lot of other new features, improvements and so on; look for the full list there.

About QSoas
QSoas is a powerful open source data analysis program that focuses on flexibility and powerful fitting capacities. It is released under the GNU General Public License. It is described in Fourmond, Anal. Chem., 2016, 88 (10), pp 5050 5052. Current version is 3.3. You can download for free its source code or precompiled versions for MacOS and Windows there. Alternatively, you can clone from the GitHub repository.

5 November 2023

Vincent Bernat: Non-interactive SSH password authentication

SSH offers several forms of authentication, such as passwords and public keys. The latter are considered more secure. However, password authentication remains prevalent, particularly with network equipment.1 A classic solution to avoid typing a password for each connection is sshpass, or its more correct variant passh. Here is a wrapper for Zsh, getting the password from pass, a simple password manager:2
pssh() {
  passh -p <(pass show network/ssh/password | head -1) ssh "$@"
}
compdef pssh=ssh
This approach is a bit brittle as it requires parsing the output of the ssh command to look for a password prompt. Moreover, if no password is required, the password manager is still invoked. Since OpenSSH 8.4, we can use SSH_ASKPASS and SSH_ASKPASS_REQUIRE instead:
ssh() {
  set -o localoptions -o localtraps
  local passname=network/ssh/password
  local helper=$(mktemp)
  trap "command rm -f $helper" EXIT INT
  > $helper <<EOF
#!$SHELL
pass show $passname | head -1
EOF
  chmod u+x $helper
  SSH_ASKPASS=$helper SSH_ASKPASS_REQUIRE=force command ssh "$@"
}
If the password is incorrect, we can display a prompt on the second attempt:
ssh() {
  set -o localoptions -o localtraps
  local passname=network/ssh/password
  local helper=$(mktemp)
  trap "command rm -f $helper" EXIT INT
  > $helper <<EOF
#!$SHELL
if [ -k $helper ]; then
  {
    oldtty=\$(stty -g)
    trap 'stty \$oldtty < /dev/tty 2> /dev/null' EXIT INT TERM HUP
    stty -echo
    print "\rpassword: "
    read password
    printf "\n"
  } > /dev/tty < /dev/tty
  printf "%s" "\$password"
else
  pass show $passname | head -1
  chmod +t $helper
fi
EOF
  chmod u+x $helper
  SSH_ASKPASS=$helper SSH_ASKPASS_REQUIRE=force command ssh "$@"
}
A possible improvement is to use a different password entry depending on the remote host:3
ssh() {
  # Grab login information
  local -A details
  details=(${=${(M)${:-"${(@f)$(command ssh -G "$@" 2>/dev/null)}"}:#(host|hostname|user) *}})
  local remote=${details[host]:-details[hostname]}
  local login=${details[user]}@${remote}
  # Get password name
  local passname
  case "$login" in
    admin@*.example.net)  passname=company1/ssh/admin ;;
    bernat@*.example.net) passname=company1/ssh/bernat ;;
    backup@*.example.net) passname=company1/ssh/backup ;;
  esac
  # No password name? Just use regular SSH
  [[ -z $passname ]] && {
    command ssh "$@"
    return $?
  }
  # Invoke SSH with the helper for SSH_ASKPASS
  # […]
}
It is also possible to make scp invoke our custom ssh function:
scp() {
  set -o localoptions -o localtraps
  local helper=$(mktemp)
  trap "command rm -f $helper" EXIT INT
  > $helper <<EOF
#!$SHELL
source ${(%):-%x}
ssh "\$@"
EOF
  command scp -S $helper "$@"
}
For the complete code, have a look at my zshrc. As an alternative, you can put the ssh() function body into its own script file and replace command ssh with /usr/bin/ssh to avoid an unwanted recursive call. In this case, the scp() function is not needed anymore.
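A minimal sketch of that standalone-script alternative, mirroring the simple ssh() wrapper above (the file name and location are assumptions):
#!/bin/zsh
# Hypothetical ~/.local/bin/ssh-with-pass: same helper trick as the ssh()
# function above, but calling /usr/bin/ssh directly so that invoking it
# from scp -S cannot recurse into this wrapper.
passname=network/ssh/password
helper=$(mktemp)
trap "rm -f $helper" EXIT INT
> $helper <<EOF
#!$SHELL
pass show $passname | head -1
EOF
chmod u+x $helper
SSH_ASKPASS=$helper SSH_ASKPASS_REQUIRE=force /usr/bin/ssh "$@"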

  1. First, some vendors make it difficult to associate an SSH key with a user. Then, many vendors do not support certificate-based authentication, making it difficult to scale. Finally, interactions between public-key authentication and finer-grained authorization methods like TACACS+ and Radius are still uncharted territory.
  2. The clear-text password never appears on the command line, in the environment, or on the disk, making it difficult for a third party without elevated privileges to capture it. On Linux, Zsh provides the password through a file descriptor.
  3. To decipher the fourth line, you may get help from print -l and the zshexpn(1) manual page. details is an associative array defined from an array alternating keys and values.

8 September 2023

Reproducible Builds: Reproducible Builds in August 2023

Welcome to the August 2023 report from the Reproducible Builds project! In these reports we outline the most important things that we have been up to over the past month. As a quick recap, whilst anyone may inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries. The motivation behind the reproducible builds effort is to ensure no flaws have been introduced during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised. If you are interested in contributing to the project, please visit our Contribute page on our website.

Rust serialisation library moving to precompiled binaries
Bleeping Computer reported that Serde, a popular Rust serialization framework, had decided to ship its serde_derive macro as a precompiled binary. As Ax Sharma writes:
The move has generated a fair amount of push back among developers who worry about its future legal and technical implications, along with a potential for supply chain attacks, should the maintainer account publishing these binaries be compromised.
After intensive discussions, use of the precompiled binary was phased out.

Reproducible builds, the first ten years
On August 4th, Holger Levsen gave a talk at BornHack 2023 on the Danish island of Funen titled "Reproducible Builds, the first ten years", which promised to contain:
[ ] an overview about reproducible builds, the past, the presence and the future. How it started with a small [meeting] at DebConf13 (and before), how it grew from being a Debian effort to something many projects work on together, until in 2021 it was mentioned in an executive order of the president of the United States. (HTML slides)
Holger repeated the talk later in the month at Chaos Communication Camp 2023 in Zehdenick, Germany: A video of the talk is available online, as are the HTML slides.

Reproducible Builds Summit
Just another reminder that our upcoming Reproducible Builds Summit is set to take place from October 31st to November 2nd 2023 in Hamburg, Germany. Our summits are a unique gathering that brings together attendees from diverse projects, united by a shared vision of advancing the Reproducible Builds effort. During this enriching event, participants will have the opportunity to engage in discussions, establish connections and exchange ideas to drive progress in this vital field. If you're interested in joining us this year, please make sure to read the event page, the news item, or the invitation email that Mattia Rizzolo sent out, which have more details about the event and location. We are also still looking for sponsors to support the event, so do reach out to the organizing team if you are able to help. (Also of note that PackagingCon 2023 is taking place in Berlin just before our summit, and their schedule has just been published.)

Vagrant Cascadian on the Sustain podcast
Vagrant Cascadian was interviewed on the SustainOSS podcast on reproducible builds:
Vagrant walks us through his role in the project where the aim is to ensure identical results in software builds across various machines and times, enhancing software security and creating a seamless developer experience. Discover how this mission, supported by the Software Freedom Conservancy and a broad community, is changing the face of Linux distros, Arch Linux, openSUSE, and F-Droid. They also explore the challenges of managing random elements in software, and Vagrant's vision to make reproducible builds a standard best practice that will ideally become automatic for users. Vagrant shares his work in progress and their commitment to the last mile problem.
The episode is available to listen to (or download) from the Sustain podcast website. As it happens, the episode was recorded at FOSSY 2023, and the video of Vagrant's talk from this conference (Breaking the Chains of Trusting Trust) is now available on Archive.org. It was also announced that Vagrant Cascadian will be presenting at the Open Source Firmware Conference in October on the topic of Reproducible Builds All The Way Down.

On our mailing list
Carles Pina i Estany wrote to our mailing list during August with an interesting question concerning the practical steps to reproduce the hello-traditional package from Debian. The entire thread can be viewed from the archive page, as can Vagrant Cascadian's reply.

Website updates
Rahul Bajaj updated our website to add a series of environment variations related to reproducible builds [ ], Russ Cox added the Go programming language to our projects page [ ] and Vagrant Cascadian fixed a number of broken links and typos around the website [ ][ ][ ].

Software development
In diffoscope development this month, versions 247, 248 and 249 were uploaded to Debian unstable by Chris Lamb, who also added documentation for the new specialize_as method and expanded the documentation of the existing specialize as well [ ]. In addition, Fay Stegerman added specialize_as and used it to optimise .smali comparisons when decompiling Android .apk files [ ], Felix Yan and Mattia Rizzolo corrected some typos in code comments [ , ], and Greg Chabala merged the RUN commands into a single layer in the package's Dockerfile [ ], thus greatly reducing the final image size. Lastly, Roland Clobus updated tool descriptions to mark that the xb-tool has moved package within Debian [ ].
reprotest is our tool for building the same source code twice in different environments and then checking the binaries produced by each build for any differences. This month, Vagrant Cascadian updated the packaging to be compatible with Tox version 4. This was originally filed as Debian bug #1042918, and Holger Levsen uploaded this change to Debian unstable as version 0.7.26 [ ].

Distribution work
In Debian, 28 reviews of Debian packages were added, 14 were updated and 13 were removed this month, adding to our knowledge about identified issues. A number of issue types were added, including Chris Lamb adding a new timestamp_in_documentation_using_sphinx_zzzeeksphinx_theme toolchain issue.
In August, F-Droid added 25 new reproducible apps and saw 2 existing apps switch to reproducible builds, making 191 apps in total that are published with Reproducible Builds and using the upstream developer s signature. [ ]
Bernhard M. Wiedemann published another monthly report about reproducibility within openSUSE.

Upstream patches
The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Testing framework
The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In August, a number of changes were made by Holger Levsen:
  • Debian-related changes:
    • Disable Debian live image creation jobs until an OpenQA credential problem has been fixed. [ ]
    • Run our maintenance scripts every 3 hours instead of every 2. [ ]
    • Export data for unstable to the reproducible-tracker.json data file. [ ]
    • Stop varying the build path, we want reproducible builds. [ ]
    • Temporarily stop updating the pbuilder.tgz for Debian unstable due to #1050784. [ ][ ]
    • Correctly document that we are not varying usrmerge. [ ][ ]
    • Mark two armhf nodes (wbq0 and jtx1a) as down; investigation is needed. [ ]
  • Misc:
    • Force reconfiguration of all Jenkins jobs, due to the recent rise of zombie processes. [ ]
    • In the node health checks, also try to restart failed ntpsec, postfix and vnstat services. [ ][ ][ ]
  • System health checks:
    • Detect Debian live build failures due to missing credentials. [ ][ ]
    • Ignore specific types of known zombie processes. [ ][ ]
In addition, Vagrant Cascadian updated the scripts to use a predictable build path that is consistent with the one used on buildd.debian.org. [ ][ ]

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

27 March 2023

Vincent Fourmond: QSoas version 3.2 is out

Version 3.2 of QSoas is out! It is mostly a bug-fix release, fixing the computation mistake found in the eecr-relay wave shape fit; see the correction to our initial article in JACS. We strongly encourage all the users of the eecr-relay wave shape fit to upgrade, and, unfortunately, refit previously fitted data as the results might change. The other wave shape fits are not affected by the issue.

New features
In addition to this important bug fix, new possibilities have been added, including a way to make fits with partially global parameters using the new define-indexed-fit command, to pick the best parameters dataset-by-dataset within fit trajectories, but also a parameter space explorer trying all possible permutations of one or more sets of parameters, and the possibility to save the results of a command to a global ruby variable. There are a lot of other new features, improvements and so on; look for the full list there.

About QSoas
QSoas is a powerful open source data analysis program that focuses on flexibility and powerful fitting capacities. It is released under the GNU General Public License. It is described in Fourmond, Anal. Chem., 2016, 88 (10), pp 5050 5052. Current version is 3.2. You can download for free its source code or precompiled versions for MacOS and Windows there. Alternatively, you can clone from the GitHub repository.

6 March 2023

Vincent Bernat: DDoS detection and remediation with Akvorado and Flowspec

Akvorado collects sFlow and IPFIX flows, stores them in a ClickHouse database, and presents them in a web console. Although it lacks built-in DDoS detection, it's possible to create one by crafting custom ClickHouse queries.

DDoS detection
Let's assume we want to detect DDoS attacks targeting our customers. As an example, we consider a DDoS attack as a collection of flows over one minute targeting a single customer IP address, from a single source port, and matching one of these conditions:
  • an average bandwidth of 1 Gbps,
  • an average bandwidth of 200 Mbps when the protocol is UDP,
  • more than 20 source IP addresses and an average bandwidth of 100 Mbps, or
  • more than 10 source countries and an average bandwidth of 100 Mbps.
Here is the SQL query to detect such attacks over the last 5 minutes:
SELECT *
FROM (
  SELECT
    toStartOfMinute(TimeReceived) AS TimeReceived,
    DstAddr,
    SrcPort,
    dictGetOrDefault('protocols', 'name', Proto, '???') AS Proto,
    SUM(((((Bytes * SamplingRate) * 8) / 1000) / 1000) / 1000) / 60 AS Gbps,
    uniq(SrcAddr) AS sources,
    uniq(SrcCountry) AS countries
  FROM flows
  WHERE TimeReceived > now() - INTERVAL 5 MINUTE
    AND DstNetRole = 'customers'
  GROUP BY
    TimeReceived,
    DstAddr,
    SrcPort,
    Proto
)
WHERE (Gbps > 1)
   OR ((Proto = 'UDP') AND (Gbps > 0.2)) 
   OR ((sources > 20) AND (Gbps > 0.1)) 
   OR ((countries > 10) AND (Gbps > 0.1))
ORDER BY
  TimeReceived DESC,
  Gbps DESC
Here is an example output1 where two of our users are under attack, one from what looks like an NTP amplification attack, the other from a DNS amplification attack:
TimeReceived DstAddr SrcPort Proto Gbps sources countries
2023-02-26 17:44:00 ::ffff:203.0.113.206 123 UDP 0.102 109 13
2023-02-26 17:43:00 ::ffff:203.0.113.206 123 UDP 0.130 133 17
2023-02-26 17:43:00 ::ffff:203.0.113.68 53 UDP 0.129 364 63
2023-02-26 17:43:00 ::ffff:203.0.113.206 123 UDP 0.113 129 21
2023-02-26 17:42:00 ::ffff:203.0.113.206 123 UDP 0.139 50 14
2023-02-26 17:42:00 ::ffff:203.0.113.206 123 UDP 0.105 42 14
2023-02-26 17:40:00 ::ffff:203.0.113.68 53 UDP 0.121 340 65

DDoS remediation Once detected, there are at least two ways to stop the attack at the network level:
  • blackhole the traffic to the targeted user (RTBH), or
  • selectively drop packets matching the attack patterns (Flowspec).

Traffic blackhole The easiest method is to sacrifice the attacked user. While this helps the attacker, it protects your network. It is a method supported by all routers. You can also offload this protection to many transit providers. This is useful if the attack volume exceeds your internet capacity. This works by advertising with BGP a route to the attacked user with a specific community. The border router modifies the next hop address of these routes to a specific IP address configured to forward the traffic to a null interface. RFC 7999 defines 65535:666 for this purpose. This is known as a remote-triggered blackhole (RTBH) and is explained in more detail in RFC 3882. It is also possible to blackhole the source of the attacks by leveraging unicast Reverse Path Forwarding (uRPF) from RFC 3704, as explained in RFC 5635. However, uRPF can be a serious tax on your router resources. See NCS5500 uRPF: Configuration and Impact on Scale for an example of the kind of restrictions you have to expect when enabling uRPF. On the advertising side, we can use BIRD. Here is a complete configuration file to allow any router to collect these blackhole routes:
log stderr all;
router id 192.0.2.1;
protocol device {
  scan time 10;
}
protocol bgp exporter {
  ipv4 {
    import none;
    export where proto = "blackhole4";
  };
  ipv6 {
    import none;
    export where proto = "blackhole6";
  };
  local as 64666;
  neighbor range 192.0.2.0/24 external;
  multihop;
  dynamic name "exporter";
  dynamic name digits 2;
  graceful restart yes;
  graceful restart time 0;
  long lived graceful restart yes;
  long lived stale time 3600;  # keep routes for 1 hour!
}
protocol static blackhole4 {
  ipv4;
  route 203.0.113.206/32 blackhole {
    bgp_community.add((65535, 666));
  };
  route 203.0.113.68/32 blackhole {
    bgp_community.add((65535, 666));
  };
}
protocol static blackhole6 {
  ipv6;
}
We use BGP long-lived graceful restart to ensure routes are kept for one hour, even if the BGP connection goes down, notably during maintenance. On the receiver side, if you have a Cisco router running IOS XR, you can use the following configuration to blackhole traffic received on the BGP session. As the BGP session is dedicated to this usage, the community is not used, but you can also forward these routes to your transit providers.
router static
 vrf public
  address-family ipv4 unicast
   192.0.2.1/32 Null0 description "BGP blackhole"
  !
  address-family ipv6 unicast
   2001:db8::1/128 Null0 description "BGP blackhole"
  !
 !
!
route-policy blackhole_ipv4_in_public
  if destination in (0.0.0.0/0 le 31) then
    drop
  endif
  set next-hop 192.0.2.1
  done
end-policy
!
route-policy blackhole_ipv6_in_public
  if destination in (::/0 le 127) then
    drop
  endif
  set next-hop 2001:db8::1
  done
end-policy
!
router bgp 12322
 neighbor-group BLACKHOLE_IPV4_PUBLIC
  remote-as 64666
  ebgp-multihop 255
  update-source Loopback10
  address-family ipv4 unicast
   maximum-prefix 100 90
   route-policy blackhole_ipv4_in_public in
   route-policy drop out
   long-lived-graceful-restart stale-time send 86400 accept 86400
  !
  address-family ipv6 unicast
   maximum-prefix 100 90
   route-policy blackhole_ipv6_in_public in
   route-policy drop out
   long-lived-graceful-restart stale-time send 86400 accept 86400
  !
 !
 vrf public
  neighbor 192.0.2.1
   use neighbor-group BLACKHOLE_IPV4_PUBLIC
   description akvorado-1
When the traffic is blackholed, it is still reported by IPFIX and sFlow. In Akvorado, use ForwardingStatus >= 128 as a filter. While this method is compatible with all routers, it makes the attack successful as the target is completely unreachable. If your router supports it, Flowspec can selectively filter flows to stop the attack without impacting the customer.

Flowspec Flowspec is defined in RFC 8955 and enables the transmission of flow specifications in BGP sessions. A flow specification is a set of matching criteria to apply to IP traffic. These criteria include the source and destination prefix, the IP protocol, the source and destination port, and the packet length. Each flow specification is associated with an action, encoded as an extended community: traffic shaping, traffic marking, or redirection. To announce flow specifications with BIRD, we extend our configuration. The extended community used shapes the matching traffic to 0 bytes per second.
flow4 table flowtab4;
flow6 table flowtab6;
protocol bgp exporter {
  flow4 {
    import none;
    export where proto = "flowspec4";
  };
  flow6 {
    import none;
    export where proto = "flowspec6";
  };
  # […]
}
protocol static flowspec4 {
  flow4;
  route flow4 {
    dst 203.0.113.68/32;
    sport = 53;
    length >= 1476 && <= 1500;
    proto = 17;
  }{
    bgp_ext_community.add((generic, 0x80060000, 0x00000000));
  };
  route flow4 {
    dst 203.0.113.206/32;
    sport = 123;
    length = 468;
    proto = 17;
  }{
    bgp_ext_community.add((generic, 0x80060000, 0x00000000));
  };
}
protocol static flowspec6 {
  flow6;
}
If you have a Cisco router running IOS XR, the configuration may look like this:
vrf public
 address-family ipv4 flowspec
 address-family ipv6 flowspec
!
router bgp 12322
 address-family vpnv4 flowspec
 address-family vpnv6 flowspec
 neighbor-group FLOWSPEC_IPV4_PUBLIC
  remote-as 64666
  ebgp-multihop 255
  update-source Loopback10
  address-family ipv4 flowspec
   long-lived-graceful-restart stale-time send 86400 accept 86400
   route-policy accept in
   route-policy drop out
   maximum-prefix 100 90
   validation disable
  !
  address-family ipv6 flowspec
   long-lived-graceful-restart stale-time send 86400 accept 86400
   route-policy accept in
   route-policy drop out
   maximum-prefix 100 90
   validation disable
  !
 !
 vrf public
  address-family ipv4 flowspec
  address-family ipv6 flowspec
  neighbor 192.0.2.1
   use neighbor-group FLOWSPEC_IPV4_PUBLIC
   description akvorado-1
Then, you need to enable Flowspec on all interfaces with:
flowspec
 vrf public
  address-family ipv4
   local-install interface-all
  !
  address-family ipv6
   local-install interface-all
  !
 !
!
As with the RTBH setup, you can filter dropped flows with ForwardingStatus >= 128.

DDoS detection (continued) In the example using Flowspec, the flows were also filtered on the length of the packet:
route flow4 {
  dst 203.0.113.68/32;
  sport = 53;
  length >= 1476 && <= 1500;
  proto = 17;
}{
  bgp_ext_community.add((generic, 0x80060000, 0x00000000));
};
This is an important addition: legitimate DNS requests are smaller than this and therefore not filtered.2 With ClickHouse, you can get the 10th and 90th percentiles of the packet sizes with quantiles(0.1, 0.9)(Bytes/Packets). The last issue we need to tackle is how to optimize the request: it may need several seconds to collect the data and it is likely to consume substantial resources from your ClickHouse database. One solution is to create a materialized view to pre-aggregate results:
CREATE TABLE ddos_logs (
  TimeReceived DateTime,
  DstAddr IPv6,
  Proto UInt32,
  SrcPort UInt16,
  Gbps SimpleAggregateFunction(sum, Float64),
  Mpps SimpleAggregateFunction(sum, Float64),
  sources AggregateFunction(uniqCombined(12), IPv6),
  countries AggregateFunction(uniqCombined(12), FixedString(2)),
  size AggregateFunction(quantiles(0.1, 0.9), UInt64)
) ENGINE = SummingMergeTree
PARTITION BY toStartOfHour(TimeReceived)
ORDER BY (TimeReceived, DstAddr, Proto, SrcPort)
TTL toStartOfHour(TimeReceived) + INTERVAL 6 HOUR DELETE ;
CREATE MATERIALIZED VIEW ddos_logs_view TO ddos_logs AS
  SELECT
    toStartOfMinute(TimeReceived) AS TimeReceived,
    DstAddr,
    Proto,
    SrcPort,
    sum(((((Bytes * SamplingRate) * 8) / 1000) / 1000) / 1000) / 60 AS Gbps,
    sum(((Packets * SamplingRate) / 1000) / 1000) / 60 AS Mpps,
    uniqCombinedState(12)(SrcAddr) AS sources,
    uniqCombinedState(12)(SrcCountry) AS countries,
    quantilesState(0.1, 0.9)(toUInt64(Bytes/Packets)) AS size
  FROM flows
  WHERE DstNetRole = 'customers'
  GROUP BY
    TimeReceived,
    DstAddr,
    Proto,
    SrcPort
The ddos_logs table is using the SummingMergeTree engine. When the table receives new data, ClickHouse replaces all the rows with the same sorting key, as defined by the ORDER BY directive, with one row which contains summarized values using either the sum() function or the explicitly specified aggregate function (uniqCombined and quantiles in our example).3 Finally, we can modify our initial query with the following one:
SELECT *
FROM (
  SELECT
    TimeReceived,
    DstAddr,
    dictGetOrDefault('protocols', 'name', Proto, '???') AS Proto,
    SrcPort,
    sum(Gbps) AS Gbps,
    sum(Mpps) AS Mpps,
    uniqCombinedMerge(12)(sources) AS sources,
    uniqCombinedMerge(12)(countries) AS countries,
    quantilesMerge(0.1, 0.9)(size) AS size
  FROM ddos_logs
  WHERE TimeReceived > now() - INTERVAL 60 MINUTE
  GROUP BY
    TimeReceived,
    DstAddr,
    Proto,
    SrcPort
)
WHERE (Gbps > 1)
   OR ((Proto = 'UDP') AND (Gbps > 0.2)) 
   OR ((sources > 20) AND (Gbps > 0.1)) 
   OR ((countries > 10) AND (Gbps > 0.1))
ORDER BY
  TimeReceived DESC,
  Gbps DESC

Gluing everything together To sum up, building an anti-DDoS system requires following these steps:
  1. define a set of criteria to detect a DDoS attack,
  2. translate these criteria into SQL requests,
  3. pre-aggregate flows into SummingMergeTree tables,
  4. query and transform the results to a BIRD configuration file, and
  5. configure your routers to pull the routes from BIRD.
A Python script like the following one can handle the fourth step. For each attacked target, it generates both a Flowspec rule and a blackhole route.
import logging
import os
import socket
import subprocess
import types
from filecmp import cmp as samefile  # assumption: compare the generated file with the current one
from clickhouse_driver import Client as CHClient

logger = logging.getLogger("ddos-mitigation")  # assumption: logging setup elided in the original
# Put your SQL query here!
SQL_QUERY = "…"
# How many anti-DDoS rules we want at the same time?
MAX_DDOS_RULES = 20
def empty_ruleset():
    ruleset = types.SimpleNamespace()
    ruleset.flowspec = types.SimpleNamespace()
    ruleset.blackhole = types.SimpleNamespace()
    ruleset.flowspec.v4 = []
    ruleset.flowspec.v6 = []
    ruleset.blackhole.v4 = []
    ruleset.blackhole.v6 = []
    return ruleset
current_ruleset = empty_ruleset()
client = CHClient(host="clickhouse.akvorado.net")
while True:
    results = client.execute(SQL_QUERY)
    seen = {}
    new_ruleset = empty_ruleset()
    for (t, addr, proto, port, gbps, mpps, sources, countries, size) in results:
        if (addr, proto, port) in seen:
            continue
        seen[(addr, proto, port)] = True
        # Flowspec
        if addr.ipv4_mapped:
            address = addr.ipv4_mapped
            rules = new_ruleset.flowspec.v4
            table = "flow4"
            mask = 32
            nh = "proto"
        else:
            address = addr
            rules = new_ruleset.flowspec.v6
            table = "flow6"
            mask = 128
            nh = "next header"
        if size[0] == size[1]:
            length = f"length = {int(size[0])}"
        else:
            length = f"length >= {int(size[0])} && <= {int(size[1])}"
        header = f"""
# Time: {t}
# Source: {address}, protocol: {proto}, port: {port}
# Gbps/Mpps: {gbps:.3}/{mpps:.3}, packet size: {int(size[0])}<=X<={int(size[1])}
# Sources: {sources}, countries: {countries}
"""
        rules.append(
                f"""{header}
route {table} {{
  dst {address}/{mask};
  sport = {port};
  {length};
  {nh} = {socket.getprotobyname(proto)};
}}{{
  bgp_ext_community.add((generic, 0x80060000, 0x00000000));
}};
"""
        )
        # Blackhole
        if addr.ipv4_mapped:
            rules = new_ruleset.blackhole.v4
        else:
            rules = new_ruleset.blackhole.v6
        rules.append(
            f"""{header}
route {address}/{mask} blackhole {{
  bgp_community.add((65535, 666));
}};
"""
        )
        new_ruleset.flowspec.v4 = list(
            set(new_ruleset.flowspec.v4[:MAX_DDOS_RULES])
        )
        new_ruleset.flowspec.v6 = list(
            set(new_ruleset.flowspec.v6[:MAX_DDOS_RULES])
        )
        # TODO: advertise changes by mail, chat, ...
        current_ruleset = new_ruleset
        changes = False
        for rules, path in (
            (current_ruleset.flowspec.v4, "v4-flowspec"),
            (current_ruleset.flowspec.v6, "v6-flowspec"),
            (current_ruleset.blackhole.v4, "v4-blackhole"),
            (current_ruleset.blackhole.v6, "v6-blackhole"),
        ):
            path = os.path.join("/etc/bird/", f"{path}.conf")
            with open(f"{path}.tmp", "w") as f:
                for r in rules:
                    f.write(r)
            changes = (
                changes or not os.path.exists(path) or not samefile(path, f"{path}.tmp")
            )
            os.rename(f"{path}.tmp", path)
        if not changes:
            continue
        proc = subprocess.Popen(
            ["birdc", "configure"],
            stdin=subprocess.DEVNULL,
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
        )
        stdout, stderr = proc.communicate(None)
        stdout = stdout.decode("utf-8", "replace")
        stderr = stderr.decode("utf-8", "replace")
        if proc.returncode != 0:
            logger.error(
                "{} error:\n{}\n{}".format(
                    "birdc reconfigure",
                    "\n".join(
                        [" O: {}".format(line) for line in stdout.rstrip().split("\n")]
                    ),
                    "\n".join(
                        [" E: {}".format(line) for line in stderr.rstrip().split("\n")]
                    ),
                )
            )

Until Akvorado integrates DDoS detection and mitigation, the ideas presented in this blog post provide a solid foundation to get started with your own anti-DDoS system.

  1. ClickHouse can export results using Markdown format when appending FORMAT Markdown to the query.
  2. While most DNS clients should retry with TCP on failures, this is not always the case: until recently, musl libc did not implement this.
  3. The materialized view also aggregates the data at hand, both for efficiency and to ensure we work with the right data types.

13 February 2023

Vincent Bernat: Building a SQL-like language to filter flows

Akvorado collects network flows using IPFIX or sFlow. It stores them in a ClickHouse database. A web console allows a user to query the data and plot some graphs. A nice aspect of this console is how we can filter flows with a SQL-like language:
Filter editor in Akvorado console
Often, web interfaces expose a query builder to build such filters. I think combining a SQL-like language with an editor supporting completion, syntax highlighting, and linting is a better approach.1 The language parser is built with pigeon (Go) from a parsing expression grammar or PEG. The editor component is CodeMirror (TypeScript).

Language parser PEG grammars are relatively recent2 and are an alternative to context-free grammars. They are easier to write and they can generate better error messages. Python switched from an LL(1)-based parser to a PEG-based parser in Python 3.9. pigeon generates a parser for Go. A grammar is a set of rules. Each rule is an identifier, with an optional user-friendly label for error messages, an expression, and an action in Go to be executed on match. You can find the complete grammar in parser.peg. Here is a simplified rule:
ConditionIPExpr "condition on IP" ←
  column:("ExporterAddress"i { return "ExporterAddress", nil }
        / "SrcAddr"i { return "SrcAddr", nil }
        / "DstAddr"i { return "DstAddr", nil }) _
  operator:("=" / "!=") _
  ip:IP {
    return fmt.Sprintf("%s %s IPv6StringToNum(%s)",
      toString(column), toString(operator), quote(ip)), nil
  }
The rule identifier is ConditionIPExpr. It case-insensitively matches ExporterAddress, SrcAddr, or DstAddr. The action for each case returns the proper case for the column name. That's what is stored in the column variable. Then, it matches one of the possible operators. As there is no code block, it stores the matched string directly in the operator variable. Then, it tries to match the IP rule, which is defined elsewhere in the grammar. If it succeeds, it stores the result of the match in the ip variable and executes the final action. The action turns the column, operator, and IP into a proper expression for ClickHouse. For example, if we have ExporterAddress = 203.0.113.15, we get ExporterAddress = IPv6StringToNum('203.0.113.15'). The IP rule uses a rudimentary regular expression but checks if the matched address is correct in the action block, thanks to netip.ParseAddr():
IP "IP address" ← [0-9A-Fa-f:.]+ {
  ip, err := netip.ParseAddr(string(c.text))
  if err != nil {
    return "", errors.New("expecting an IP address")
  }
  return ip.String(), nil
}
Our parser safely turns the filter into a WHERE clause accepted by ClickHouse:3
WHERE InIfBoundary = 'external' 
AND ExporterRegion = 'france' 
AND InIfConnectivity = 'transit' 
AND SrcAS = 15169 
AND DstAddr BETWEEN toIPv6('2a01:e0f:ffff::') 
                AND toIPv6('2a01:e0f:ffff:ffff:ffff:ffff:ffff:ffff')

Integration in CodeMirror CodeMirror is a versatile code editor that can be easily integrated into JavaScript projects. In Akvorado, the Vue.js component, InputFilter, uses CodeMirror as its foundation and leverages features such as syntax highlighting, linting, and completion. The source code for these capabilities can be found in the codemirror/lang-filter/ directory.

Syntax highlighting The PEG grammar for Go cannot be utilized directly4 and the requirements for parsers for editors are distinct: they should be error-tolerant and operate incrementally, as code is typically updated character by character. CodeMirror offers a solution through its own parser generator, Lezer. We don't need this additional parser to fully understand the filter language. Only the basic structure is needed: column names, comparison and logic operators, quoted and unquoted values. The grammar is therefore quite short and does not need to be updated often:
@top Filter {
  expression
}
expression {
 Not expression |
 "(" expression ")" |
 "(" expression ")" And expression |
 "(" expression ")" Or expression |
 comparisonExpression And expression |
 comparisonExpression Or expression |
 comparisonExpression
}
comparisonExpression {
 Column Operator Value
}
Value {
  String | Literal | ValueLParen ListOfValues ValueRParen
}
ListOfValues {
  ListOfValues ValueComma (String | Literal) |
  String | Literal
}
// […]
@tokens {
  // […]
  Column { std.asciiLetter (std.asciiLetter | std.digit)* }
  Operator { $[a-zA-Z!=><]+ }
  String {
    '"' (![\\\n"] | "\\" _)* '"'? |
    "'" (![\\\n'] | "\\" _)* "'"?
  }
  Literal { (std.digit | std.asciiLetter | $[.:/])+ }
  // […]
}
The expression SrcAS = 12322 AND (DstAS = 1299 OR SrcAS = 29447) is parsed to:
Filter(Column, Operator, Value(Literal),
  And, Column, Operator, Value(Literal),
  Or, Column, Operator, Value(Literal))
The last step is to teach CodeMirror how to map each token to a highlighting tag:
export const FilterLanguage = LRLanguage.define({
  parser: parser.configure({
    props: [
      styleTags({
        Column: t.propertyName,
        String: t.string,
        Literal: t.literal,
        LineComment: t.lineComment,
        BlockComment: t.blockComment,
        Or: t.logicOperator,
        And: t.logicOperator,
        Not: t.logicOperator,
        Operator: t.compareOperator,
        "( )": t.paren,
      }),
    ],
  }),
});

Linting We offload linting to the original parser in Go. The /api/v0/console/filter/validate endpoint accepts a filter and returns a JSON structure with the errors that were found:
{
  "message": "at line 1, position 12: string literal not terminated",
  "errors": [{
    "line":    1,
    "column":  12,
    "offset":  11,
    "message": "string literal not terminated"
  }]
}
The linter source for CodeMirror queries the API and turns each error into a diagnostic.
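For illustration, a linter source along these lines could be written as the following sketch. The endpoint and response shape are the ones shown above, but the request payload field name ("filter") is an assumption and the real implementation in codemirror/lang-filter/ may differ:
import { linter, Diagnostic } from "@codemirror/lint";
import { EditorView } from "@codemirror/view";

// Sketch only: send the current filter to the validation endpoint and turn
// each reported error into a CodeMirror diagnostic. The request body field
// name ("filter") is an assumption, not the documented API contract.
export const filterLinter = linter(async (view: EditorView): Promise<Diagnostic[]> => {
  const expression = view.state.doc.toString();
  const response = await fetch("/api/v0/console/filter/validate", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ filter: expression }),
  });
  const result = await response.json();
  return (result.errors ?? []).map(
    (err: { offset: number; message: string }): Diagnostic => ({
      // The API reports a character offset; highlight one character from it.
      from: err.offset,
      to: Math.min(err.offset + 1, expression.length),
      severity: "error",
      message: err.message,
    })
  );
});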

Completion The completion system takes a hybrid approach. It splits the work between the frontend and the backend to offer useful suggestions for completing filters. The frontend uses the parser built with Lezer to determine the context of the completion: do we complete a column name, an operator, or a value? It also extracts the column name if we are completing something else. It forwards the result to the backend through the /api/v0/console/filter/complete endpoint. Walking the syntax tree was not as easy as I thought, but unit tests helped a lot. The backend uses the parser generated by pigeon to complete a column name or a comparison operator. For values, the completions are either static or extracted from the ClickHouse database. A user can complete an AS number from an organization name thanks to the following snippet:
results := []struct {
  Label  string `ch:"label"`
  Detail string `ch:"detail"`
}{}
columnName := "DstAS"
sqlQuery := fmt.Sprintf(`
 SELECT concat('AS', toString(%s)) AS label, dictGet('asns', 'name', %s) AS detail
 FROM flows
 WHERE TimeReceived > date_sub(minute, 1, now())
 AND detail != ''
 AND positionCaseInsensitive(detail, $1) >= 1
 GROUP BY label, detail
 ORDER BY COUNT(*) DESC
 LIMIT 20
`, columnName, columnName)
if err := conn.Select(ctx, &results, sqlQuery, input.Prefix); err != nil {
  c.r.Err(err).Msg("unable to query database")
  break
}
for _, result := range results {
  completions = append(completions, filterCompletion{
    Label:  result.Label,
    Detail: result.Detail,
    Quoted: false,
  })
}
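On the frontend side, the context detection described above could look roughly like this sketch: it classifies the node under the cursor with the Lezer tree, walks back to find the column name, and forwards everything to the completion endpoint. The payload field names (what, column, prefix) and the response shape are assumptions; the actual code lives in codemirror/lang-filter/.
import { CompletionContext, CompletionResult } from "@codemirror/autocomplete";
import { syntaxTree } from "@codemirror/language";

// Sketch only: a completion source that delegates the suggestions to the backend.
export async function filterCompletionSource(
  context: CompletionContext
): Promise<CompletionResult | null> {
  const word = context.matchBefore(/[\w.:\/"']*/);
  if (!word || (word.from === word.to && !context.explicit)) return null;

  // Classify what is being completed from the innermost syntax node.
  const node = syntaxTree(context.state).resolveInner(context.pos, -1);
  let what = "column";
  let column: string | null = null;
  if (node.name === "Operator") what = "operator";
  if (node.name === "String" || node.name === "Literal" || node.name === "Value") {
    what = "value";
    // Walk back through the siblings to find the column being compared.
    let sibling = (node.name === "Value" ? node : node.parent)?.prevSibling ?? null;
    while (sibling && sibling.name !== "Column") sibling = sibling.prevSibling;
    if (sibling) column = context.state.sliceDoc(sibling.from, sibling.to);
  }

  // Ask the backend for suggestions; the field names are assumptions.
  const response = await fetch("/api/v0/console/filter/complete", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ what, column, prefix: word.text }),
  });
  const { completions } = await response.json();
  return {
    from: word.from,
    options: (completions ?? []).map((c: { label: string; detail?: string }) => ({
      label: c.label,
      detail: c.detail,
    })),
  };
}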
In my opinion, the completion system is a major factor in making the field editor an efficient way to select flows. While a query builder may have been more beginner-friendly, the completion system's ease of use and functionality make it more enjoyable to use once you become familiar with it.

  1. Moreover, building a query builder did not seem like a fun task for me.
  2. They were introduced in 2004 in Parsing Expression Grammars: A Recognition-Based Syntactic Foundation. LR parsers were introduced in 1965, LALR parsers in 1969, and LL parsers in the 1970s. Yacc, a popular parser generator, was written in 1975.
  3. The parser returns a string. It does not generate an intermediate AST. This makes it simpler and it currently fits our needs.
  4. It could be manually translated to JavaScript with PEG.js.

11 February 2023

Vincent Bernat: Hacking the Geberit Sigma 70 flush plate

My toilet is equipped with a Geberit Sigma 70 flush plate. The sales pitch for this hydraulic-assisted device praises the ingenious mount that acts like a rocker switch. In practice, the flush is very capricious and has a very high failure rate. Avoid this type of mechanism! Prefer a fully mechanical version like the Geberit Sigma 20. After several plumbers, exchanges with Geberit's technical department, and the expensive replacement of the entire mechanism, I was still getting a failure rate of over 50% for the small flush. I finally managed to decrease this rate to 5% by applying two 8 mm silicone bumpers on the back of the plate. Their locations are indicated by red circles on the picture below:
Geberit Sigma 70 flush plate. Above: the mechanism installed on the wall. Below: the back of the glass plate. In red, the two places where to apply the silicone bumpers.
Expect to pay about €5 and as many minutes for this operation.

6 February 2023

Vincent Bernat: Fast and dynamic encoding of Protocol Buffers in Go

Protocol Buffers are a popular choice for serializing structured data due to their compact size, fast processing speed, language independence, and compatibility. Other alternatives exist, including Cap'n Proto, CBOR, and Avro. Usually, data structures are described in a proto definition file (.proto). The protoc compiler and a language-specific plugin convert it into code:
$ head flow-4.proto
syntax = "proto3";
package decoder;
option go_package = "akvorado/inlet/flow/decoder";
message FlowMessagev4 {
  uint64 TimeReceived = 2;
  uint32 SequenceNum = 3;
  uint64 SamplingRate = 4;
  uint32 FlowDirection = 5;
$ protoc -I=. --plugin=protoc-gen-go --go_out=module=akvorado:. flow-4.proto
$ head inlet/flow/decoder/flow-4.pb.go
// Code generated by protoc-gen-go. DO NOT EDIT.
// versions:
//      protoc-gen-go v1.28.0
//      protoc        v3.21.12
// source: inlet/flow/data/schemas/flow-4.proto
package decoder
import (
        protoreflect "google.golang.org/protobuf/reflect/protoreflect"
Akvorado collects network flows using IPFIX or sFlow, decodes them with GoFlow2, encodes them to Protocol Buffers, and sends them to Kafka to be stored in a ClickHouse database. Collecting a new field, such as source and destination MAC addresses, requires modifications in multiple places, including the proto definition file and the ClickHouse migration code. Moreover, the cost is paid by all users.1 It would be nice to have an application-wide schema and let users enable or disable the fields they need. While the main goal is flexibility, we do not want to sacrifice performance. On this front, this is quite a success: when upgrading from 1.6.4 to 1.7.1, the decoding and encoding performance almost doubled!
goos: linux
goarch: amd64
pkg: akvorado/inlet/flow
cpu: AMD Ryzen 5 5600X 6-Core Processor
                              initial.txt                 final.txt               
                                 sec/op        sec/op     vs base                 
Netflow/with_encoding-12      12.963µ ± 2%   7.836µ ± 1%  -39.55% (p=0.000 n=10)
Sflow/with_encoding-12         19.37µ ± 1%   10.15µ ± 2%  -47.63% (p=0.000 n=10)

Faster Protocol Buffers encoding I use the following code to benchmark both the decoding and encoding process. Initially, the Decode() method is a thin layer above the GoFlow2 producer and stores the decoded data into the in-memory structure generated by protoc. Later, some of the data will be encoded directly during flow decoding. This is why we measure both the decoding and the encoding.2
func BenchmarkDecodeEncodeSflow(b *testing.B) {
    r := reporter.NewMock(b)
    sdecoder := sflow.New(r)
    data := helpers.ReadPcapPayload(b,
        filepath.Join("decoder", "sflow", "testdata", "data-1140.pcap"))
    for _, withEncoding := range []bool{true, false} {
        title := map[bool]string{
            true:  "with encoding",
            false: "without encoding",
        }[withEncoding]
        var got []*decoder.FlowMessage
        b.Run(title, func(b *testing.B) {
            for i := 0; i < b.N; i++ {
                got = sdecoder.Decode(decoder.RawFlow{
                    Payload: data,
                    Source: net.ParseIP("127.0.0.1"),
                })
                if withEncoding {
                    for _, flow := range got {
                        buf := []byte{}
                        buf = protowire.AppendVarint(buf, uint64(proto.Size(flow)))
                        proto.MarshalOptions{}.MarshalAppend(buf, flow)
                    }
                }
            }
        })
    }
}
The canonical Go implementation for Protocol Buffers, google.golang.org/protobuf, is not the most efficient one. For a long time, people were relying on gogoprotobuf. However, the project is now deprecated. A good replacement is vtprotobuf.3
goos: linux
goarch: amd64
pkg: akvorado/inlet/flow
cpu: AMD Ryzen 5 5600X 6-Core Processor
                              initial.txt               bench-2.txt              
                                sec/op        sec/op     vs base                 
Netflow/with_encoding-12      12.96µ ± 2%   10.28µ ± 2%  -20.67% (p=0.000 n=10)
Netflow/without_encoding-12   8.935µ ± 2%   8.975µ ± 2%        ~ (p=0.143 n=10)
Sflow/with_encoding-12        19.37µ ± 1%   16.67µ ± 2%  -13.93% (p=0.000 n=10)
Sflow/without_encoding-12     14.62µ ± 3%   14.87µ ± 1%   +1.66% (p=0.007 n=10)

Dynamic Protocol Buffers encoding We have our baseline. Let's see how to encode our Protocol Buffers without a .proto file. The wire format is simple and relies heavily on variable-width integers. Variable-width integers, or varints, are an efficient way of encoding unsigned integers using a variable number of bytes, from one to ten, with small values using fewer bytes. They work by splitting integers into 7-bit payloads and using the 8th bit as a continuation indicator, set to 1 for all payloads except the last.
Variable-width integer encoding in Protocol Buffers: conversion of 150 to a varint.
For our usage, we only need two types: variable-width integers and byte sequences. A byte sequence is encoded by prefixing it by its length as a varint. When a message is encoded, each key-value pair is turned into a record consisting of a field number, a wire type, and a payload. The field number and the wire type are encoded as a single variable-width integer called a tag.
Message encoded with Protocol Buffers: three varints, two sequences of bytes.
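As an illustration of the wire format just described (Akvorado itself does this in Go with the protowire package), here is a small TypeScript sketch encoding varints and the two kinds of records we need; encoding 150 yields the two bytes 0x96 0x01:
// Encode an unsigned integer (< 2^53 here) as a varint: 7-bit groups,
// least-significant first, with the 8th bit set on all bytes but the last.
function encodeVarint(value: number): number[] {
  const out: number[] = [];
  do {
    let byte = value % 128;
    value = Math.floor(value / 128);
    if (value > 0) byte |= 0x80; // continuation bit
    out.push(byte);
  } while (value > 0);
  return out;
}

// A record is a tag (field number and wire type packed into a varint),
// followed by either a varint payload (wire type 0) or a length-prefixed
// byte sequence (wire type 2).
function encodeTag(fieldNumber: number, wireType: number): number[] {
  return encodeVarint(fieldNumber * 8 + wireType);
}

function encodeVarintField(fieldNumber: number, value: number): number[] {
  return [...encodeTag(fieldNumber, 0), ...encodeVarint(value)];
}

function encodeBytesField(fieldNumber: number, payload: Uint8Array): number[] {
  return [...encodeTag(fieldNumber, 2), ...encodeVarint(payload.length), ...Array.from(payload)];
}

// encodeVarint(150)         => [0x96, 0x01]
// encodeVarintField(1, 150) => [0x08, 0x96, 0x01]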
We use low-level functions from the protowire package to build the output buffer. Our schema abstraction contains the appropriate information to encode a message (ProtobufIndex) and to generate a proto definition file (fields starting with Protobuf):
type Column struct {
    Key       ColumnKey
    Name      string
    Disabled  bool
    // […]
    // For protobuf.
    ProtobufIndex    protowire.Number
    ProtobufType     protoreflect.Kind // Uint64Kind, Uint32Kind, …
    ProtobufEnum     map[int]string
    ProtobufEnumName string
    ProtobufRepeated bool
}
We have a few helper methods around the protowire functions to directly encode the fields while decoding the flows. They skip disabled fields or non-repeated fields already encoded. Here is an excerpt of the sFlow decoder:
sch.ProtobufAppendVarint(bf, schema.ColumnBytes, uint64(recordData.Base.Length))
sch.ProtobufAppendVarint(bf, schema.ColumnProto, uint64(recordData.Base.Protocol))
sch.ProtobufAppendVarint(bf, schema.ColumnSrcPort, uint64(recordData.Base.SrcPort))
sch.ProtobufAppendVarint(bf, schema.ColumnDstPort, uint64(recordData.Base.DstPort))
sch.ProtobufAppendVarint(bf, schema.ColumnEType, helpers.ETypeIPv4)
For fields that are required later in the pipeline, like source and destination addresses, they are stored unencoded in a separate structure:
type FlowMessage struct {
    TimeReceived uint64
    SamplingRate uint32
    // For exporter classifier
    ExporterAddress netip.Addr
    // For interface classifier
    InIf  uint32
    OutIf uint32
    // For geolocation or BMP
    SrcAddr netip.Addr
    DstAddr netip.Addr
    NextHop netip.Addr
    // Core component may override them
    SrcAS     uint32
    DstAS     uint32
    GotASPath bool
    // protobuf is the protobuf representation for the information not contained above.
    protobuf      []byte
    protobufSet   bitset.BitSet
}
The protobuf slice holds encoded data. It is initialized with a capacity of 500 bytes to avoid resizing during encoding. There is also some reserved room at the beginning to be able to encode the total size as a variable-width integer. Upon finalizing encoding, the remaining fields are added and the message length is prefixed:
func (schema *Schema) ProtobufMarshal(bf *FlowMessage) []byte {
    schema.ProtobufAppendVarint(bf, ColumnTimeReceived, bf.TimeReceived)
    schema.ProtobufAppendVarint(bf, ColumnSamplingRate, uint64(bf.SamplingRate))
    schema.ProtobufAppendIP(bf, ColumnExporterAddress, bf.ExporterAddress)
    schema.ProtobufAppendVarint(bf, ColumnSrcAS, uint64(bf.SrcAS))
    schema.ProtobufAppendVarint(bf, ColumnDstAS, uint64(bf.DstAS))
    schema.ProtobufAppendIP(bf, ColumnSrcAddr, bf.SrcAddr)
    schema.ProtobufAppendIP(bf, ColumnDstAddr, bf.DstAddr)
    // Add length and move it as a prefix
    end := len(bf.protobuf)
    payloadLen := end - maxSizeVarint
    bf.protobuf = protowire.AppendVarint(bf.protobuf, uint64(payloadLen))
    sizeLen := len(bf.protobuf) - end
    result := bf.protobuf[maxSizeVarint-sizeLen : end]
    copy(result, bf.protobuf[end:end+sizeLen])
    return result
}
Minimizing allocations is critical for maintaining encoding performance. The benchmark tests should be run with the -benchmem flag to monitor allocation numbers. Each allocation incurs an additional cost to the garbage collector. The Go profiler is a valuable tool for identifying areas of code that can be optimized:
$ go test -run=__nothing__ -bench=Netflow/with_encoding \
>         -benchmem -cpuprofile profile.out \
>         akvorado/inlet/flow
goos: linux
goarch: amd64
pkg: akvorado/inlet/flow
cpu: AMD Ryzen 5 5600X 6-Core Processor
Netflow/with_encoding-12             143953              7955 ns/op            8256 B/op        134 allocs/op
PASS
ok      akvorado/inlet/flow     1.418s
$ go tool pprof profile.out
File: flow.test
Type: cpu
Time: Feb 4, 2023 at 8:12pm (CET)
Duration: 1.41s, Total samples = 2.08s (147.96%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) web
After using the internal schema instead of code generated from the proto definition file, the performance improved. However, this comparison is not entirely fair as less information is being decoded and previously GoFlow2 was decoding to its own structure, which was then copied to our own version.
goos: linux
goarch: amd64
pkg: akvorado/inlet/flow
cpu: AMD Ryzen 5 5600X 6-Core Processor
                              bench-2.txt                bench-3.txt              
                                 sec/op        sec/op     vs base                 
Netflow/with_encoding-12      10.284µ ± 2%   7.758µ ± 3%  -24.56% (p=0.000 n=10)
Netflow/without_encoding-12    8.975µ ± 2%   7.304µ ± 2%  -18.61% (p=0.000 n=10)
Sflow/with_encoding-12         16.67µ ± 2%   14.26µ ± 1%  -14.50% (p=0.000 n=10)
Sflow/without_encoding-12      14.87µ ± 1%   13.56µ ± 2%   -8.80% (p=0.000 n=10)
As for testing, we use github.com/jhump/protoreflect: the protoparse package parses the proto definition file we generate and the dynamic package decodes the messages. Check the ProtobufDecode() method for more details.4 To get the final figures, I have also optimized the decoding in GoFlow2. It was relying heavily on binary.Read(). This function may use reflection in certain cases and each call allocates a byte array to read data. Replacing it with a more efficient version provides the following improvement:
goos: linux
goarch: amd64
pkg: akvorado/inlet/flow
cpu: AMD Ryzen 5 5600X 6-Core Processor
                              bench-3.txt                bench-4.txt              
                                 sec/op        sec/op     vs base                 
Netflow/with_encoding-12       7.758µ ± 3%   7.365µ ± 2%   -5.07% (p=0.000 n=10)
Netflow/without_encoding-12    7.304µ ± 2%   6.931µ ± 3%   -5.11% (p=0.000 n=10)
Sflow/with_encoding-12        14.256µ ± 1%   9.834µ ± 2%  -31.02% (p=0.000 n=10)
Sflow/without_encoding-12     13.559µ ± 2%   9.353µ ± 2%  -31.02% (p=0.000 n=10)
It is now easier to collect new data and the inlet component is faster!

Notice Some paragraphs were editorialized by ChatGPT, using "editorialize and keep it short" as the prompt. The result was proofread by a human for correctness. The main idea is that ChatGPT should be better at English than me.


  1. While empty fields are not serialized to Protocol Buffers, empty columns in ClickHouse take some space, even if they compress well. Moreover, unused fields are still decoded and they may clutter the interface.
  2. There is a similar function using NetFlow. NetFlow and IPFIX protocols are less complex to decode than sFlow as they are using a simpler TLV structure.
  3. vtprotobuf generates more optimized Go code by removing an abstraction layer. It directly generates the code encoding each field to bytes:
    if m.OutIfSpeed != 0 {
        i = encodeVarint(dAtA, i, uint64(m.OutIfSpeed))
        i--
        dAtA[i] = 0x6
        i--
        dAtA[i] = 0xd8
    }
  4. There is also a protoprint package to generate proto definition file. I did not use it.

26 December 2022

Vincent Bernat: Managing infrastructure with Terraform, CDKTF, and NixOS

A few years ago, I downsized my personal infrastructure. Until 2018, there were a dozen containers running on a single Hetzner server.1 I migrated my emails to Fastmail and my DNS zones to Gandi. It left me with only my blog to self-host. As of today, my low-scale infrastructure is composed of 4 virtual machines running NixOS on Hetzner Cloud and Vultr, a handful of DNS zones on Gandi and Route 53, and a couple of Cloudfront distributions. It is managed by CDK for Terraform (CDKTF), while NixOS deployments are handled by NixOps. In this article, I provide a brief introduction to Terraform, CDKTF, and the Nix ecosystem. I also explain how to use Nix to access these tools within your shell, so you can quickly start using them.

CDKTF: infrastructure as code Terraform is an infrastructure-as-code tool. You can define your infrastructure by declaring resources with the HCL language. This language has some additional features like loops to declare several resources from a list, built-in functions you can call in expressions, and string templates. Terraform relies on a large set of providers to manage resources.

Managing servers Here is a short example using the Hetzner Cloud provider to spawn a virtual machine:
variable "hcloud_token" {
  sensitive = true
}
provider "hcloud" {
  token = var.hcloud_token
}
resource "hcloud_server" "web03" {
  name = "web03"
  server_type = "cpx11"
  image = "debian-11"
  datacenter = "nbg1-dc3"
}
resource "hcloud_rdns" "rdns4-web03" {
  server_id = hcloud_server.web03.id
  ip_address = hcloud_server.web03.ipv4_address
  dns_ptr = "web03.luffy.cx"
}
resource "hcloud_rdns" "rdns6-web03" {
  server_id = hcloud_server.web03.id
  ip_address = hcloud_server.web03.ipv6_address
  dns_ptr = "web03.luffy.cx"
}
HCL expressiveness is quite limited and I find a general-purpose language more convenient to describe all the resources. This is where CDK for Terraform comes in: you can manage your infrastructure using your preferred programming language, including TypeScript, Go, and Python. Here is the previous example using CDKTF and TypeScript:
import { App, TerraformStack, TerraformVariable, Fn } from "cdktf";
import { Construct } from "constructs";
import { HcloudProvider } from "./.gen/providers/hcloud/provider";
import * as hcloud from "./.gen/providers/hcloud";
class MyStack extends TerraformStack {
  constructor(scope: Construct, name: string) {
    super(scope, name);
    const hcloudToken = new TerraformVariable(this, "hcloudToken", {
      type: "string",
      sensitive: true,
    });
    const hcloudProvider = new HcloudProvider(this, "hcloud", {
      token: hcloudToken.value,
    });
    const web03 = new hcloud.server.Server(this, "web03", {
      name: "web03",
      serverType: "cpx11",
      image: "debian-11",
      datacenter: "nbg1-dc3",
      provider: hcloudProvider,
    });
    new hcloud.rdns.Rdns(this, "rdns4-web03", {
      serverId: Fn.tonumber(web03.id),
      ipAddress: web03.ipv4Address,
      dnsPtr: "web03.luffy.cx",
      provider: hcloudProvider,
    });
    new hcloud.rdns.Rdns(this, "rdns6-web03", {
      serverId: Fn.tonumber(web03.id),
      ipAddress: web03.ipv6Address,
      dnsPtr: "web03.luffy.cx",
      provider: hcloudProvider,
    });
  }
}
const app = new App();
new MyStack(app, "cdktf-take1");
app.synth();
Running cdktf synth generates a configuration file for Terraform, terraform plan previews the changes, and terraform apply applies them. Now that you have a general-purpose language, you can use functions.
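For example, several servers and their reverse DNS records could be generated with a plain loop. This is only a sketch, reusing the provider and imports from the previous listing inside the same constructor; the second server and its datacenter are made up for illustration:
// Sketch: declare several servers and their reverse DNS records with a loop.
// Assumes it runs inside the MyStack constructor above, so `this`,
// `hcloudProvider`, `hcloud`, and `Fn` are already available.
const machines: Record<string, string> = {
  web03: "nbg1-dc3",
  web04: "fsn1-dc14", // hypothetical second machine and datacenter
};
for (const [name, datacenter] of Object.entries(machines)) {
  const server = new hcloud.server.Server(this, name, {
    name,
    serverType: "cpx11",
    image: "debian-11",
    datacenter,
    provider: hcloudProvider,
  });
  new hcloud.rdns.Rdns(this, `rdns4-${name}`, {
    serverId: Fn.tonumber(server.id),
    ipAddress: server.ipv4Address,
    dnsPtr: `${name}.luffy.cx`,
    provider: hcloudProvider,
  });
}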

Managing DNS records While using CDKTF for 4 web servers may seem a tad overkill, this is quite different when it comes to managing a few DNS zones. With DNSControl, which is using JavaScript as a domain-specific language, I was able to define the bernat.ch zone with this snippet of code:
D("bernat.ch", REG_NONE, DnsProvider(DNS_BIND, 0), DnsProvider(DNS_GANDI),
  DefaultTTL('2h'),
  FastMailMX('bernat.ch', {subdomains: ['vincent']}),
  WebServers('@'),
  WebServers('vincent'));
This generated 38 records. With CDKTF, I use:
new Route53Zone(this, "bernat.ch", providers.aws)
  .sign(dnsCMK)
  .registrar(providers.gandiVB)
  .www("@", servers)
  .www("vincent", servers)
  .www("media", servers)
  .fastmailMX(["vincent"]);
All the magic is in the code that I did not show you. You can check the dns.ts file in the cdktf-take1 repository to see how it works. Here is a quick explanation:
  • Route53Zone() creates a new zone hosted by Route 53,
  • sign() signs the zone with the provided master key,
  • registrar() registers the zone to the registrar of the domain and sets up DNSSEC,
  • www() creates A and AAAA records for the provided name pointing to the web servers,
  • fastmailMX() creates the MX records and other support records to direct emails to Fastmail.
Here is the content of the fastmailMX() function. It generates a few records and returns the current zone for chaining:
fastmailMX(subdomains?: string[]) {
  (subdomains ?? [])
    .concat(["@", "*"])
    .forEach((subdomain) =>
      this.MX(subdomain, [
        "10 in1-smtp.messagingengine.com.",
        "20 in2-smtp.messagingengine.com.",
      ])
    );
  this.TXT("@", "v=spf1 include:spf.messagingengine.com ~all");
  ["mesmtp", "fm1", "fm2", "fm3"].forEach((dk) =>
    this.CNAME(`${dk}._domainkey`, `${dk}.${this.name}.dkim.fmhosted.com.`)
  );
  this.TXT("_dmarc", "v=DMARC1; p=none; sp=none");
  return this;
}
I encourage you to browse the repository if you need more information.

About Pulumi My first attempt around Terraform was to use Pulumi. You can find this attempt on GitHub. This is quite similar to what I currently do with CDKTF. The main difference is that I am using Python instead of TypeScript because I was not familiar with TypeScript at the time.2 Pulumi predates CDKTF and it uses a slightly different approach. CDKTF generates a Terraform configuration (in JSON format instead of HCL), delegating planning, state management, and deployment to Terraform. It is therefore bound to the limitations of what can be expressed by Terraform, notably when you need to transform data obtained from one resource to another.3 Pulumi needs specific providers for each resource. Many Pulumi providers are thin wrappers encapsulating Terraform providers. While Pulumi provides a good user experience, I switched to CDKTF because writing providers for Pulumi is a chore. CDKTF does not require such a step. Outside the big players (AWS, Azure, and Google Cloud), the existence, quality, and freshness of the Pulumi providers are inconsistent. Most providers rely on a Terraform provider and they may lag a few versions behind, miss a few resources, or have a few bugs of their own. When a provider does not exist, you can write one with the help of the pulumi-terraform-bridge library. The Pulumi project provides a boilerplate for this purpose. I had a bad experience with it when writing providers for Gandi and Vultr: the Makefile automatically installs Pulumi using a curl | sh pattern and does not work with /bin/sh. There is a lack of interest in community-based contributions4 or even in providers for smaller players.

NixOS & NixOps Nix is a purely functional programming language. Nix is also the name of the package manager built on top of the Nix language. It allows users to declaratively install packages. nixpkgs is a repository of packages. You can install Nix on top of a regular Linux distribution. If you want more details, a good resource is the official website, and notably the learn section. There is a steep learning curve, but the reward is tremendous.

NixOS: declarative Linux distribution NixOS is a Linux distribution built on top of the Nix package manager. Here is a configuration snippet to add some packages:
environment.systemPackages = with pkgs;
  [
    bat
    htop
    liboping
    mg
    mtr
    ncdu
    tmux
  ];
It is possible to alter an existing derivation5 to use a different version, enable a specific feature, or apply a patch. Here is how I enable and configure Nginx to disable the stream module, add the Brotli compression module, and add the IP address anonymizer module. Moreover, instead of using OpenSSL 3, I keep using OpenSSL 1.1.6
services.nginx = {
  enable = true;
  package = (pkgs.nginxStable.override {
    withStream = false;
    modules = with pkgs.nginxModules; [
      brotli
      ipscrub
    ];
    openssl = pkgs.openssl_1_1;
  });
};
If you need to add some patches, it is also possible. Here are the patches I added in 2019 to circumvent the DoS vulnerabilities in Nginx until they were fixed in NixOS:7
services.nginx.package = pkgs.nginxStable.overrideAttrs (old: {
  patches = old.patches ++ [
    # HTTP/2: reject zero length headers with PROTOCOL_ERROR.
    (pkgs.fetchpatch {
      url = https://github.com/nginx/nginx/commit/dbdd[…].patch;
      sha256 = "a48190[…]";
    })
    # HTTP/2: limited number of DATA frames.
    (pkgs.fetchpatch {
      url = https://github.com/nginx/nginx/commit/94c5[…].patch;
      sha256 = "af591a[…]";
    })
    # HTTP/2: limited number of PRIORITY frames.
    (pkgs.fetchpatch {
      url = https://github.com/nginx/nginx/commit/39bb[…].patch;
      sha256 = "1ad8fe[…]";
    })
  ];
});
If you are interested, have a look at my relatively small configuration: common.nix contains the configuration to be applied to any host (SSH, users, common software packages), web.nix contains the configuration for the web servers, isso.nix runs Isso into a systemd container.

NixOps: NixOS deployment tool On a single node, NixOS configuration is in the /etc/nixos/configuration.nix file. After modifying it, you have to run nixos-rebuild switch. Nix fetches all possible dependencies from the binary cache and builds the remaining packages. It creates a new entry in the boot loader menu and activates the new configuration. To manage several nodes, there exist several options, including NixOps, deploy-rs, Colmena, and morph. I do not know all of them, but from my point of view, the differences are not that important. It is also possible to build such a tool yourself, as Nix provides the most important building blocks: nix build and nix copy. NixOps is one of the first tools available, but I encourage you to explore the alternatives. NixOps configuration is written in Nix. Here is a simplified configuration to deploy znc01.luffy.cx, web01.luffy.cx, and web02.luffy.cx, with the help of the server and web functions:
let
  server = hardware: name: imports: {
    deployment.targetHost = "${name}.luffy.cx";
    networking.hostName = name;
    networking.domain = "luffy.cx";
    imports = [ (./hardware/. + "/${hardware}.nix") ] ++ imports;
  };
  web = hardware: idx: imports:
    server hardware "web${lib.fixedWidthNumber 2 idx}" ([ ./web.nix ] ++ imports);
in {
  network.description = "Luffy infrastructure";
  network.enableRollback = true;
  defaults = import ./common.nix;
  znc01 = server "exoscale" [ ./znc.nix ];
  web01 = web "hetzner" 1 [ ./isso.nix ];
  web02 = web "hetzner" 2 [];
}

Tying everything together with Nix The Nix ecosystem is a unified solution to the various problems around software and configuration management. A very interesting feature is the declarative and reproducible developer environments. This is similar to Python virtual environments, except it is not language-specific.

Brief introduction to Nix flakes I am using flakes, a new Nix feature improving reproducibility by pinning all dependencies and making the build hermetic. While the feature is marked as experimental,8 it is widely used and you may see flake.nix and flake.lock at the root of some repositories. As a short example, here is the flake.nix content shipped with Snimpy, an interactive SNMP tool for Python relying on libsmi, a C library:
{
  inputs = {
    nixpkgs.url = "nixpkgs";
    flake-utils.url = "github:numtide/flake-utils";
  };
  outputs = { self, ... }@inputs:
    inputs.flake-utils.lib.eachDefaultSystem (system:
      let
        pkgs = inputs.nixpkgs.legacyPackages."${system}";
      in
      {
        # nix build
        packages.default = pkgs.python3Packages.buildPythonPackage {
          name = "snimpy";
          src = self;
          preConfigure = ''echo "1.0.0-0-000000000000" > version.txt'';
          checkPhase = "pytest";
          checkInputs = with pkgs.python3Packages; [ pytest mock coverage ];
          propagatedBuildInputs = with pkgs.python3Packages; [ cffi pysnmp ipython ];
          buildInputs = [ pkgs.libsmi ];
        };
        # nix run + nix shell
        apps.default = {
          type = "app";
          program = "${self.packages."${system}".default}/bin/snimpy";
        };
        # nix develop
        devShells.default = pkgs.mkShell {
          name = "snimpy-dev";
          buildInputs = [
            self.packages."${system}".default.inputDerivation
            pkgs.python3Packages.ipython
          ];
        };
      });
}
If you have Nix installed on your system:
  • nix run github:vincentbernat/snimpy runs Snimpy,
  • nix shell github:vincentbernat/snimpy provides a shell with Snimpy ready-to-use,
  • nix build github:vincentbernat/snimpy builds the Python package, tests included, and
  • nix develop . provides a shell to hack around Snimpy when run from a fresh checkout.9
For more information about Nix flakes, have a look at the tutorial from Tweag.

Nix and CDKTF At the root of the repository I use for CDKTF, there is a flake.nix file to set up a shell with Terraform and CDKTF installed and with the appropriate environment variables to automate my infrastructure. Terraform is already packaged in nixpkgs. However, I need to apply a patch on top of the Gandi provider. Not a problem with Nix!
terraform = pkgs.terraform.withPlugins (p: [
  p.aws
  p.hcloud
  p.vultr
  (p.gandi.overrideAttrs
    (old: {
      src = pkgs.fetchFromGitHub {
        owner = "vincentbernat";
        repo = "terraform-provider-gandi";
        rev = "feature/livedns-key";
        hash = "sha256-V16BIjo5/rloQ1xTQrdd0snoq1OPuDh3fQNW7kiv/kQ=";
      };
    }))
]);
CDKTF is written in TypeScript. I have a package.json file with all the dependencies needed, including the ones to use TypeScript as the language to define infrastructure:
{
  "name": "cdktf-take1",
  "version": "1.0.0",
  "main": "main.js",
  "types": "main.ts",
  "private": true,
  "dependencies": {
    "@types/node": "^14.18.30",
    "cdktf": "^0.13.3",
    "cdktf-cli": "^0.13.3",
    "constructs": "^10.1.151",
    "eslint": "^8.27.0",
    "prettier": "^2.7.1",
    "ts-node": "^10.9.1",
    "typescript": "^3.9.10",
    "typescript-language-server": "^2.1.0"
  }
}
I use Yarn to get a yarn.lock file that can be used directly to declare a derivation containing all the dependencies:
nodeEnv = pkgs.mkYarnModules {
  pname = "cdktf-take1-js-modules";
  version = "1.0.0";
  packageJSON = ./package.json;
  yarnLock = ./yarn.lock;
};
The next step is to generate the CDKTF providers from the Terraform providers and turn them into a derivation:
cdktfProviders = pkgs.stdenvNoCC.mkDerivation {
  name = "cdktf-providers";
  nativeBuildInputs = [
    pkgs.nodejs
    terraform
  ];
  src = nix-filter {
    root = ./.;
    include = [ ./cdktf.json ./tsconfig.json ];
  };
  buildPhase = ''
    export HOME=$(mktemp -d)
    export CHECKPOINT_DISABLE=1
    export DISABLE_VERSION_CHECK=1
    export PATH=${nodeEnv}/node_modules/.bin:$PATH
    ln -nsf ${nodeEnv}/node_modules node_modules
    # Build all providers we have in terraform
    for provider in $(cd ${terraform}/libexec/terraform-providers; echo */*/*/*); do
      version=''${provider##*/}
      provider=''${provider%/*}
      echo "Build $provider@$version"
      cdktf provider add --force-local $provider@$version | cat
    done
    echo "Compile TS → JS"
    tsc
  '';
  installPhase = ''
    mv .gen $out
    ln -nsf ${nodeEnv}/node_modules $out/node_modules
  '';
};
Finally, we can define the development environment:
devShells.default = pkgs.mkShell {
  name = "cdktf-take1";
  buildInputs = [
    pkgs.nodejs
    pkgs.yarn
    terraform
  ];
  shellHook = ''
    # No telemetry
    export CHECKPOINT_DISABLE=1
    # No autoinstall of plugins
    export CDKTF_DISABLE_PLUGIN_CACHE_ENV=1
    # Do not check version
    export DISABLE_VERSION_CHECK=1
    # Access to node modules
    export PATH=$PWD/node_modules/.bin:$PATH
    ln -nsf ${nodeEnv}/node_modules node_modules
    ln -nsf ${cdktfProviders} .gen
    # Credentials
    for p in \
      njf.nznmba.pbz/Nqzvavfgengbe \
      urgmare.pbz/ivaprag@oreang.pu \
      ihyge.pbz/ihyge@ivaprag.oreang.pu; do
        eval $(pass show $(echo $p | tr 'A-Za-z' 'N-ZA-Mn-za-m') | grep '^export')
    done
    eval $(pass show personal/cdktf/secrets | grep '^export')
    export TF_VAR_hcloudToken="$HCLOUD_TOKEN"
    export TF_VAR_vultrApiKey="$VULTR_API_KEY"
    unset VULTR_API_KEY HCLOUD_TOKEN
  '';
};
The derivations listed in buildInputs are available in the provided shell. The content of shellHook is sourced when starting the shell. It sets up some symbolic links to make the JavaScript environment built at an earlier step available, as well as the generated CDKTF providers. It also exports all the credentials.10 I am also using direnv with an .envrc to automatically load the development environment. This also enables the environment to be available from inside Emacs, notably when using lsp-mode to get TypeScript completions. Without direnv, nix develop . can activate the environment. I use the following commands to deploy the infrastructure:11
$ cdktf synth
$ cd cdktf.out/stacks/cdktf-take1
$ terraform plan --out plan
$ terraform apply plan
$ terraform output -json > ~-automation/nixops-take1/cdktf.json
The last command generates a JSON file containing various data to complete the deployment with NixOps.

NixOps The JSON file exported by Terraform contains the list of servers with various attributes:
{
  "hardware": "hetzner",
  "ipv4Address": "5.161.44.145",
  "ipv6Address": "2a01:4ff:f0:b91::1",
  "name": "web05.luffy.cx",
  "tags": [
    "web",
    "continent:NA",
    "continent:SA"
  ]
}
In network.nix, this list is imported and transformed into an attribute set describing the servers. A simplified version looks like this:
let
  lib = inputs.nixpkgs.lib;
  shortName = name: builtins.elemAt (lib.splitString "." name) 0;
  domainName = name: lib.concatStringsSep "." (builtins.tail (lib.splitString "." name));
  server = hardware: name: imports: {
    networking = {
      hostName = shortName name;
      domain = domainName name;
    };
    deployment.targetHost = name;
    imports = [ (./hardware/. + "/${hardware}.nix") ] ++ imports;
  };
  cdktf-servers-json = (lib.importJSON ./cdktf.json).servers.value;
  cdktf-servers = map
    (s:
      let
        tags-maybe-import = map (t: ./. + "/${t}.nix") s.tags;
        tags-import = builtins.filter (t: builtins.pathExists t) tags-maybe-import;
      in
      {
        name = shortName s.name;
        value = server s.hardware s.name tags-import;
      })
    cdktf-servers-json;
in
{
  # […]
}
// builtins.listToAttrs cdktf-servers
For web05, this expands to:
web05 = {
  networking = {
    hostName = "web05";
    domain = "luffy.cx";
  };
  deployment.targetHost = "web05.luffy.cx";
  imports = [ ./hardware/hetzner.nix ./web.nix ];
};
As for CDKTF, at the root of the repository I use for NixOps, there is a flake.nix file to set up a shell with NixOps configured. Because NixOps does not support rollouts, I usually use the following commands to deploy on a single server:12
$ nix flake update
$ nixops deploy --include=web04
$ ./tests web04.luffy.cx
If the tests are OK, I deploy the remaining nodes gradually with the following command:
$ (set -e; for h in web{03..06}; do nixops deploy --include=$h; done)
nixops deploy rolls out all servers in parallel and could therefore cause a short outage during which all Nginx instances are down at the same time.
This post has been a work-in-progress for the past three years, with the content being updated and refined as I experimented with different solutions. There is still much to explore13 but I feel there is enough content to publish now.

  1. It was an AMD Athlon 64 X2 5600+ with 2 GB of RAM and two 400 GB disks in software RAID. I was paying around €59 per month for it. While it was a good deal in 2008, by 2018 it was no longer cost-effective. It was running Debian Wheezy with Linux-VServer for isolation, both of which were outdated in 2018.
  2. I also did not use Python because Poetry support in Nix was a bit broken around the time I started hacking around CDKTF.
  3. Pulumi can apply arbitrary functions with the apply() method on an output. It makes it easy to transform data that are not known during the planning stage. Terraform has functions to serve a similar purpose, but they are more limited.
  4. The two mentioned pull requests are not merged yet. The second one is superseded by PR #61, submitted two months later, which enforces the use of /bin/bash. I also submitted PR #56, which was merged four months later and quickly reverted without explanation.
  5. You may consider packages and derivations to be synonyms in the Nix ecosystem.
  6. OpenSSL 3 has outstanding performance regressions.
  7. NixOS can be a bit slow to integrate patches since they need to rebuild parts of the binary cache before releasing the fixes. In this specific case, they were fast: the vulnerability and patches were released on August 13th 2019 and available in NixOS on August 15th. As a comparison, Debian only released the fixed version on August 22nd, which is unusually late.
  8. Because flakes are experimental, much of the documentation does not use them, and they are an additional aspect to learn.
  9. It is possible to replace . with github:vincentbernat/snimpy, like in the other commands, but having Snimpy dependencies without Snimpy source code is less interesting.
  10. I am using pass as a password manager. The password names are only obfuscated to avoid spam.
  11. The cdktf command can wrap the terraform commands, but I prefer to use them directly as they are more flexible.
  12. If the change is risky, I disable the server with CDKTF. This removes it from the web service DNS records.
  13. I would like to replace NixOps with an alternative handling progressive rollouts and checks. I am also considering switching to Nomad or Kubernetes to deploy workloads.

11 December 2022

Vincent Bernat: Akvorado: a flow collector, enricher, and visualizer

Earlier this year, we released Akvorado, a flow collector, enricher, and visualizer. It receives network flows from your routers using NetFlow v9, IPFIX, or sFlow. Several pieces of information are added, like GeoIP data and interface names. The flows are exported to Apache Kafka, a distributed queue, then stored inside ClickHouse, a column-oriented database. A web frontend is provided to run queries. A live version is available for you to play with.
Akvorado's web frontend, displaying the result of a simple query as stacked areas.
Several alternatives exist, but Akvorado differentiates itself from them on a few points. The proposed deployment solution relies on Docker Compose to set up Akvorado, Zookeeper, Kafka, and ClickHouse. I hope it is enough for anyone to get started quickly. Akvorado is performant enough to handle 100 000 flows per second with 64 GB of RAM and 24 vCPUs. With 2 TB of disk, you should expect to keep data for a few years. I spent some time writing fairly complete documentation, and it seems redundant to repeat its content in this blog post. There is also a section about its internal design if you are interested in how it is built. I also gave a presentation at FRnOG earlier this year, as well as a ClickHouse meetup presentation that focuses more on how ClickHouse is used. I plan to write more detailed articles on specific aspects of Akvorado. Stay tuned!

  1. While the collector could write directly to the database, the queue buffers flows if the database is unavailable. It also enables you to process flows with another piece of software (like an anti-DDoS system).
