Search Results: "Vincent Bernat"

27 July 2022

Vincent Bernat: ClickHouse SF Bay Area Meetup: Akvorado

Here are the slides I presented for a ClickHouse SF Bay Area Meetup in July 2022, hosted by Altinity. They are about Akvorado, a network flow collector and visualizer, and notably on how it relies on ClickHouse, a column-oriented database.
The meetup was recorded and available on YouTube. Here is the part relevant to my presentation, with subtitles:1
I got a few questions about how to get information from the higher layers, like HTTP. As my use case for Akvorado was at the network edge, my answers were mostly negative. However, as sFlow is extensible, when collecting flows from Linux servers instead, you could embed additional data and they could be exported as well. I also got a question about doing aggregation in a single table. ClickHouse can aggregate automatically data using TTL. My answer for not doing that is partial. There is another reason: the retention periods of the various tables may overlap. For example, the main table keeps data for 15 days, but even in these 15 days, if I do a query on a 12-hour window, it is faster to use the flows_1m0s aggregated table, unless I request something about ports and IP addresses.

  1. To generate the subtitles, I have used Amazon Transcribe, the speech-to-text solution from Amazon AWS. Unfortunately, there is no en-FR language available, which would have been useful for my terrible accent. While the subtitles were 100% accurate when the host, Robert Hodge from Altinity, was speaking, the success rate on my talk was quite lower. I had to rewrite almost all sentences. However, using speech-to-text is still useful to get the timings, as it is also something requiring a lot of work to do manually.

26 December 2021

Vincent Bernat: Custom screen saver with XSecureLock

i3lock is a popular X11 screen lock utility. As far as customization goes, it only allows one to set a background from a PNG file. This limitation is part of the design of i3lock: its primary goal is to keep the screen locked, something difficult enough with X11. Each additional feature would increase the attack surface and move away from this goal.1 Many are frustrated with these limitations and extend i3lock through simple wrapper scripts or by forking it.2 The first solution is usually safe, but the second goes against the spirit of i3lock. XSecureLock is a less-known alternative to i3lock. One of the most attractive features of this locker is to delegate the screen saver feature to another process. This process can be anything as long it can attach to an existing window provided by XSecureLock, which won t pass any input to it. It will also put a black window below it to ensure the screen stays locked in case of a crash. XSecureLock is shipped with a few screen savers, notably one using mpv to display photos or videos, like the Apple TV aerial videos. I have written my own saver using Python and GTK.3 It shows a background image, a clock, and the current weather.4
Custom screen saver for XSecureLock, displaying a clock and the current weather
Custom screen saver for XSecureLock
I add two patches over XSecureLock: XSecureLock also delegates the authentication window to another process, but I was less comfortable providing a custom one as it is a bit more security-sensitive. While basic, the shipped authentication application is fine by me. I think people should avoid modifying i3lock code and use XSecureLock instead. I hope this post will help a bit.

Update (2022-01) XScreenSaver can also run arbitrary programs as a screen saver.


  1. See for example this comment or this one explaining the rationale.
  2. This Reddit post enumerates many of these alternatives.
  3. Using GTK makes it a bit difficult to use some low-level features, like embedding an application into an existing window. However, the high-level features are easier, notably drawing an image and a text with a shadow.
  4. Weather is retrieved by another script running on a timer and written to a file. The screen saver watches this file for updates.

24 December 2021

Vincent Bernat: Automatic login with startx and systemd

If your workstation is using full-disk encryption, you may want to jump directly to your desktop environment after entering the passphrase to decrypt the disk. Many display managers like GDM and LightDM have an autologin feature. However, only GDM can run Xorg with standard user privileges. Here is an alternative using startx and a systemd service:
[Unit]
Description=X11 session for bernat
After=graphical.target systemd-user-sessions.service
[Service]
User=bernat
WorkingDirectory=~
PAMName=login
Environment=XDG_SESSION_TYPE=x11
TTYPath=/dev/tty8
StandardInput=tty
UnsetEnvironment=TERM
UtmpIdentifier=tty8
UtmpMode=user
StandardOutput=journal
ExecStartPre=/usr/bin/chvt 8
ExecStart=/usr/bin/startx -- vt8 -keeptty -verbose 3 -logfile /dev/null
Restart=no
[Install]
WantedBy=graphical.target
Let me explain each block: Drop this unit in /etc/systemd/system/x11-autologin.service and enable it with systemctl enable x11-autologin.service. Xorg is now running rootless and logging into the journal. After using it for a few months, I didn t notice any regression compared to LightDM with autologin.

  1. For more information on how logind provides access to devices, see this blog post. The method names do not match the current implementation, but the concepts are still correct. Xorg takes control of the session when the TTY is active.
  2. Xorg could change the type of the session itself after taking control of it, but it does not.
  3. There is some code in Xorg to do that, but it is executed too late and fails with: xf86OpenConsole: VT_ACTIVATE failed: Operation not permitted.

15 November 2021

Vincent Bernat: Git as a source of truth for network automation

The first step when automating a network is to build the source of truth. A source of truth is a repository of data that provides the intended state: the list of devices, the IP addresses, the network protocols settings, the time servers, etc. A popular choice is NetBox. Its documentation highlights its usage as a source of truth:
NetBox intends to represent the desired state of a network versus its operational state. As such, automated import of live network state is strongly discouraged. All data created in NetBox should first be vetted by a human to ensure its integrity. NetBox can then be used to populate monitoring and provisioning systems with a high degree of confidence.
When introducing Jerikan, a common feedback we got was: you should use NetBox for this. Indeed, Jerikan s source of truth is a bunch of YAML files versioned with Git.

Why Git? If we look at how things are done with servers and services, in a datacenter or in the cloud, we are likely to find users of Terraform, a tool turning declarative configuration files into infrastructure. Declarative configuration management tools like Salt, Puppet,1 or Ansible take care of server configuration. NixOS is an alternative: it combines package management and configuration management with a functional language to build virtual machines and containers. When using a Kubernetes cluster, people use Kustomize or Helm, two other declarative configuration management tools. Tapped together, these tools implement the infrastructure as code paradigm.
Infrastructure as code is an approach to infrastructure automation based on practices from software development. It emphasizes consistent, repeatable routines for provisioning and changing systems and their configuration. You make changes to code, then use automation to test and apply those changes to your systems. Kief Morris, Infrastructure as Code, O Reilly.
A version control system is a central tool for infrastructure as code. The usual candidate is Git with a source code management system like GitLab or GitHub. You get:
Traceability and visibility
Git keeps a log of all changes: what, who, why, and when. With a bit of discipline, each change is explained and self-contained. It becomes part of the infrastructure documentation. When the support team complains about a degraded experience for some customers over the last two months or so, you quickly discover this may be related to a change to an incoming policy in New York.
Rolling back
If a change is defective, it can be reverted quickly, safely, and without much effort, even if other changes happened in the meantime. The policy change at the origin of the problem spanned over three routers. Reverting this specific change and deploying the configuration let you solve the situation until you find a better fix.
Branching, reviewing, merging
When working on a new feature or refactoring some part of the infrastructure, a team member creates a branch and works on their change without interfering with the work of other members. Once the branch is ready, a pull request is created and the change is ready to be reviewed by the other team members before merging. You discover the issue was related to diverting traffic through an IX where one ISP was connected without enough capacity. You propose and discuss a fix that includes a change of the schema and the templates used to declare policies to be able to handle this case.
Continuous integration
For each change, automated tests are triggered. They can detect problems and give more details on the effect of a change. Branches can be deployed to a test infrastructure where regression tests are executed. The results can be synthesized as a comment in the pull request to help the review. You check your proposed change does not modify the other existing policies.

Why not NetBox? NetBox does not share these features. It is a database with a REST and a GraphQL API. Traceability is limited: changes are not grouped into a transaction and they are not documented. You cannot fork the database. Usually, there is one staging database to test modifications before applying them to the production database. It does not scale well and reviews are difficult. Applying the same change to the production database can be hazardous. Rolling back a change is non-trivial.

Update (2021-11) Nautobot, a fork of NetBox, will soon address this point by using Dolt, an SQL database engine allowing you to clone, branch, and merge, like a Git repository. Dolt is compatible with MySQL clients. See Nautobots, Roll Back! for a preview of this feature.

Moreover, NetBox is not usually the single source of truth. It contains your hardware inventory, the IP addresses, and some topology information. However, this is not the place you put authorized SSH keys, syslog servers, or the BGP configuration. If you also use Ansible, this information ends in its inventory. The source of truth is therefore fragmented between several tools with different workflows. Since NetBox 2.7, you can append additional data with configuration contexts. This mitigates this point. The data is arranged hierarchically but the hierarchy cannot be customized.2 Nautobot can manage configuration contexts in a Git repository, while still allowing to use of the API to fetch them. You get some additional perks, thanks to Git, but the remaining data is still in a database with a different lifecycle. Lastly, the schema used by NetBox may not fit your needs and you cannot tweak it. For example, you may have a rule to compute the IPv6 address from the IPv4 address for dual-stack interfaces. Such a relationship cannot be easily expressed and enforced in NetBox. When changing the IPv4 address, you may forget the IPv6 address. The source of truth should only contain the IPv4 address but you also want the IPv6 address in NetBox because this is your IPAM and you need it to update your DNS entries.

Why not Git? There are some limitations when putting your source of truth in Git:
  1. If you want to expose a web interface to allow an external team to request a change, it is more difficult to do it with Git than with a database. Out-of-the-box, NetBox provides a nice web interface and a permission system. You can also write your own web interface and interact with NetBox through its API.
  2. YAML files are more difficult to query in different ways. For example, looking for a free IP address is complex if they are scattered in multiple places.
In my opinion, in most cases, you are better off putting the source of truth in Git instead of NetBox. You get a lot of perks by doing that and you can still use NetBox as a read-only view, usable by other tools. We do that with an Ansible module. In the remaining cases, Git could still fit the bill. Read-only access control can be done through submodules. Pull requests can restrict write access: a bot can check the changes only modify allowed files before auto-merging. This still requires some Git knowledge, but many teams are now comfortable using Git, thanks to its ubiquity.

  1. Wikimedia manages its infrastructure with Puppet. They publish everything on GitHub. Creative Commons uses Salt. They also publish everything on GitHub. Thanks to them for doing that! I wish I could provide more real-life examples.
  2. Being able to customize the hierarchy is key to avoiding repetition in the data. For example, if switches are paired together, some data should be attached to them as a group and not duplicated on each of them. Tags can be used to partially work around this issue but you lose the hierarchical aspect.

6 November 2021

Vincent Bernat: How to rsync files between two remotes?

scp -3 can copy files between two remote hosts through localhost. This comes in handy when the two servers cannot communicate directly or if they are unable to authenticate one to the other.1 Unfortunately, rsync does not support such a feature. Here is a trick to emulate the behavior of scp -3 with SSH tunnels. When syncing with a remote host, rsync invokes ssh to spawn a remote rsync --server process. It interacts with it through its standard input and output. The idea is to recreate the same setup using SSH tunnels and socat, a versatile tool to establish bidirectional data transfers. The first step is to connect to the source server and ask rsync the command-line to spawn the remote rsync --server process. The -e flag overrides the command to use to get a remote shell: instead of ssh, we use echo.
$ ssh web04
$ rsync -e 'sh -c ">&2 echo $@" echo' -aLv /data/. web05:/data/.
web05 rsync --server -vlogDtpre.iLsfxCIvu . /data/.
rsync: connection unexpectedly closed (0 bytes received so far) [sender]
rsync error: error in rsync protocol data stream (code 12) at io.c(228) [sender=3.2.3]
The second step is to connect to the destination server with local port forwarding. When connecting to the local port 5000, the TCP connection is forwarded through SSH to the remote port 5000 and handled by socat. When receiving the connection, socat spawns the rsync --server command we got at the previous step and connects its standard input and output to the incoming TCP socket.
$ ssh -L 127.0.0.1:5000:127.0.0.1:5000 web05
$ socat tcp-listen:5000,reuseaddr exec:"rsync --server -vlogDtpre.iLsfxCIvu . /data/."
The last step is to connect to the source with remote port forwarding. socat is used in place of a regular SSH connection and connects its standard input and output to a TCP socket connected to the remote port 5000. Thanks to the remote port forwarding, SSH forwards the data to the local port 5000. From there, it is relayed back to the destination, as described in the previous step.
$ ssh -R 127.0.0.1:5000:127.0.0.1:5000 web04
$ rsync -e 'sh -c "socat stdio tcp-connect:127.0.0.1:5000"' -aLv /data/. remote:/data/.
sending incremental file list
haproxy.debian.net/
haproxy.debian.net/dists/buster-backports-1.8/Contents-amd64.bz2
haproxy.debian.net/dists/buster-backports-1.8/Contents-i386.bz2
[ ]
media.luffy.cx/videos/2021-frnog34-jerikan/progressive.mp4
sent 921,719,453 bytes  received 26,939 bytes  7,229,383.47 bytes/sec
total size is 7,526,872,300  speedup is 8.17
This little diagram may help understand how everything fits together:
Diagram showing how all processes are connected together: rsync, socat and ssh
How each process is connected together. Arrows labeled stdio are implemented as two pipes connecting the process to the left to the standard input and output of the process to the right. Don't be fooled by the apparent symmetry!
The rsync manual page prohibits the use of --server. Use this hack at your own risk!
The options --server and --sender are used internally by rsync, and should never be typed by a user under normal circumstances. Some awareness of these options may be needed in certain scenarios, such as when setting up a login that can only run an rsync command. For instance, the support directory of the rsync distribution has an example script named rrsync (for restricted rsync) that can be used with a restricted ssh login.

Addendum I was hoping to get something similar with a one-liner. But this does not work!
$ socat \
>  exec:"ssh web04 rsync --server --sender -vlLogDtpre.iLsfxCIvu . /data/." \
>  exec:"ssh web05 rsync --server -vlogDtpre.iLsfxCIvu /data/. /data/." \
over-long vstring received (511 > 255)
over-long vstring received (511 > 255)
rsync error: requested action not supported (code 4) at compat.c(387) [sender=3.2.3]
rsync error: requested action not supported (code 4) at compat.c(387) [Receiver=3.2.3]
socat[878291] E waitpid(): child 878292 exited with status 4
socat[878291] E waitpid(): child 878293 exited with status 4

  1. And SSH agent forwarding is dangerous. Don t use it if you can.

1 October 2021

Vincent Bernat: FRnOG #34: how we deployed a datacenter in one click

Here are the slides I presented for FRnOG #34 in October 2021. They are about automating the deployment of Blade s datacenters using Jerikan and Ansible. For more information, have a look at Jerikan+Ansible: a configuration management system for network.
The presentation, in French, was recorded. I have added English subtitles.1

  1. Good thing if you don t understand French as my diction was poor with a lot of fillers.

12 September 2021

Vincent Bernat: Short feedback on Cisco pyATS and Genie Parser

Cisco pyATS is a framework for network automation and testing. It includes, among other things, an open-source multi-vendor set of parsers and models, Genie Parser. It features 2700 parsers for various commands over many network OS. On the paper, this seems a great tool!
>>> from genie.conf.base import Device
>>> device = Device("router", os="iosxr")
>>> # Hack to parse outputs without connecting to a device
>>> device.custom.setdefault("abstraction",  )["order"] = ["os", "platform"]
>>> cmd = "show route ipv4 unicast"
>>> output = """
... Tue Oct 29 21:29:10.924 UTC
...
... O    10.13.110.0/24 [110/2] via 10.12.110.1, 5d23h, GigabitEthernet0/0/0/0.110
... """
>>> device.parse(cmd, output=output)
 'vrf':  'default':  'address_family':  'ipv4':  'routes':  '10.13.110.0/24':  'route': '10.13.110.0/24',
       'active': True,
       'route_preference': 110,
       'metric': 2,
       'source_protocol': 'ospf',
       'source_protocol_codes': 'O',
       'next_hop':  'next_hop_list':  1:  'index': 1,
          'next_hop': '10.12.110.1',
          'outgoing_interface': 'GigabitEthernet0/0/0/0.110',
          'updated': '5d23h' 
First deception: pyATS is closed-source with some exceptions. This is quite annoying if you run into some issues outside Genie Parser. For example, although pyATS is using the ssh command, it cannot leverage my ssh_config file: pyATS resolves hostnames before providing them to ssh. There is no plan to open source pyATS. Then, Genie Parser has two problems:
  1. The data models used are dependent on the vendor and OS, despite the documentation saying otherwise. For example, the data model used for IPv4 interfaces is different between NX-OS and IOS-XR.
  2. The parsers rely on line-by-line regular expressions to extract data and some Python code as glue. This is fragile and may break silently.
To illustrate the second point, let s assume the output of show ipv4 vrf all interface is:
  Loopback10 is Up, ipv4 protocol is Up
    Vrf is default (vrfid 0x60000000)
    Internet protocol processing disabled
  Loopback30 is Up, ipv4 protocol is Down [VRF in FWD reference]
    Vrf is ran (vrfid 0x0)
    Internet address is 203.0.113.17/32
    MTU is 1500 (1500 is available to IP)
    Helper address is not set
    Directed broadcast forwarding is disabled
    Outgoing access list is not set
    Inbound  common access list is not set, access list is not set
    Proxy ARP is disabled
    ICMP redirects are never sent
    ICMP unreachables are always sent
    ICMP mask replies are never sent
    Table Id is 0x0
Because the regular expression to parse an interface name does not expect the extra data after the interface state, Genie Parser ignores the line starting the definition of Loopback30 and parses the output to this structure:1
 
  "Loopback10":  
    "int_status": "up",
    "oper_status": "up",
    "vrf": "ran",
    "vrf_id": "0x0",
    "ipv4":  
      "203.0.113.17/32":  
        "ip": "203.0.113.17",
        "prefix_length": "32"
       ,
      "mtu": 1500,
      "mtu_available": 1500,
      "broadcast_forwarding": "disabled",
      "proxy_arp": "disabled",
      "icmp_redirects": "never sent",
      "icmp_unreachables": "always sent",
      "icmp_replies": "never sent",
      "table_id": "0x0"
     
   
 
While this bug is simple to fix, this is an uphill battle. Any existing or future slight variation in the output of a command could trigger another similar undetected bug, despite the extended test coverage. I have reported and fixed several other silent parsing errors: #516, #529, and #530. A more robust alternative would have been to use TextFSM and to trigger a warning when some output is not recognized, like Batfish, a configuration analysis tool, does. In the future, we should rely on YANG for data extraction, but it is currently not widely supported. SNMP is still a valid possibility but much information is not accessible through this protocol. In the meantime, I would advise you to only use Genie Parser with caution.

  1. As an exercise, the astute reader is asked to write the code to extract the IPv4 from this structure.

6 September 2021

Vincent Bernat: Switching to the i3 window manager

I have been using the awesome window manager for 10 years. It is a tiling window manager, configurable and extendable with the Lua language. Using a general-purpose programming language to configure every aspect is a double-edged sword. Due to laziness and the apparent difficulty of adapting my configuration about 3000 lines to newer releases, I was stuck with the 3.4 version, whose last release is from 2013. It was time for a rewrite. Instead, I have switched to the i3 window manager, lured by the possibility to migrate to Wayland and Sway later with minimal pain. Using an embedded interpreter for configuration is not as important to me as it was in the past: it brings both complexity and brittleness.
i3 dual screen setup
Dual screen desktop running i3, Emacs, some terminals, including a Quake console, Firefox, Polybar as the status bar, and Dunst as the notification daemon.
The window manager is only one part of a desktop environment. There are several options for the other components. I am also introducing them in this post.

i3: the window manager i3 aims to be a minimal tiling window manager. Its documentation can be read from top to bottom in less than an hour. i3 organize windows in a tree. Each non-leaf node contains one or several windows and has an orientation and a layout. This information arbitrates the window positions. i3 features three layouts: split, stacking, and tabbed. They are demonstrated in the below screenshot:
Example of layouts
Demonstration of the layouts available in i3. The main container is split horizontally. The first child is split vertically. The second one is tabbed. The last one is stacking.
Tree representation of the previous screenshot
Tree representation of the previous screenshot.
Most of the other tiling window managers, including the awesome window manager, use predefined layouts. They usually feature a large area for the main window and another area divided among the remaining windows. These layouts can be tuned a bit, but you mostly stick to a couple of them. When a new window is added, the behavior is quite predictable. Moreover, you can cycle through the various windows without thinking too much as they are ordered. i3 is more flexible with its ability to build any layout on the fly, it can feel quite overwhelming as you need to visualize the tree in your head. At first, it is not unusual to find yourself with a complex tree with many useless nested containers. Moreover, you have to navigate windows using directions. It takes some time to get used to. I set up a split layout for Emacs and a few terminals, but most of the other workspaces are using a tabbed layout. I don t use the stacking layout. You can find many scripts trying to emulate other tiling window managers but I did try to get my setup pristine of these tentatives and get a chance to familiarize myself. i3 can also save and restore layouts, which is quite a powerful feature. My configuration is quite similar to the default one and has less than 200 lines.

i3 companion: the missing bits i3 philosophy is to keep a minimal core and let the user implements missing features using the IPC protocol:
Do not add further complexity when it can be avoided. We are generally happy with the feature set of i3 and instead focus on fixing bugs and maintaining it for stability. New features will therefore only be considered if the benefit outweighs the additional complexity, and we encourage users to implement features using the IPC whenever possible. Introduction to the i3 window manager
While this is not as powerful as an embedded language, it is enough for many cases. Moreover, as high-level features may be opinionated, delegating them to small, loosely coupled pieces of code keeps them more maintainable. Libraries exist for this purpose in several languages. Users have published many scripts to extend i3: automatic layout and window promotion to mimic the behavior of other tiling window managers, window swallowing to put a new app on top of the terminal launching it, and cycling between windows with Alt+Tab. Instead of maintaining a script for each feature, I have centralized everything into a single Python process, i3-companion using asyncio and the i3ipc-python library. Each feature is self-contained into a function. It implements the following components:
make a workspace exclusive to an application
When a workspace contains Emacs or Firefox, I would like other applications to move to another workspace, except for the terminal which is allowed to intrude into any workspace. The workspace_exclusive() function monitors new windows and moves them if needed to an empty workspace or to one with the same application already running.
implement a Quake console
The quake_console() function implements a drop-down console available from any workspace. It can be toggled with Mod+ . This is implemented as a scratchpad window.
back and forth workspace switching on the same output
With the workspace back_and_forth command, we can ask i3 to switch to the previous workspace. However, this feature is not restricted to the current output. I prefer to have one keybinding to switch to the workspace on the next output and one keybinding to switch to the previous workspace on the same output. This behavior is implemented in the previous_workspace() function by keeping a per-output history of the focused workspaces.
create a new empty workspace or move a window to an empty workspace
To create a new empty workspace or move a window to an empty workspace, you have to locate a free slot and use workspace number 4 or move container to workspace number 4. The new_workspace() function finds a free number and use it as the target workspace.
restart some services on output change
When adding or removing an output, some actions need to be executed: refresh the wallpaper, restart some components unable to adapt their configuration on their own, etc. i3 triggers an event for this purpose. The output_update() function also takes an extra step to coalesce multiple consecutive events and to check if there is a real change with the low-level library xcffib.
I will detail the other features as this post goes on. On the technical side, each function is decorated with the events it should react to:
@on(CommandEvent("previous-workspace"), I3Event.WORKSPACE_FOCUS)
async def previous_workspace(i3, event):
    """Go to previous workspace on the same output."""
The CommandEvent() event class is my way to send a command to the companion, using either i3-msg -t send_tick or binding a key to a nop command. The latter is used to avoid spawning a shell and a i3-msg process just to send a message. The companion listens to binding events and checks if this is a nop command.
bindsym $mod+Tab nop "previous-workspace"
There are other decorators to avoid code duplication: @debounce() to coalesce multiple consecutive calls, @static() to define a static variable, and @retry() to retry a function on failure. The whole script is a bit more than 1000 lines. I think this is worth a read as I am quite happy with the result.

dunst: the notification daemon Unlike the awesome window manager, i3 does not come with a built-in notification system. Dunst is a lightweight notification daemon. I am running a modified version with HiDPI support for X11 and recursive icon lookup. The i3 companion has a helper function, notify(), to send notifications using DBus. container_info() and workspace_info() uses it to display information about the container or the tree for a workspace.
Notification showing i3 tree for a workspace
Notification showing i3 s tree for a workspace

polybar: the status bar i3 bundles i3bar, a versatile status bar, but I have opted for Polybar. A wrapper script runs one instance for each monitor. The first module is the built-in support for i3 workspaces. To not have to remember which application is running in a workspace, the i3 companion renames workspaces to include an icon for each application. This is done in the workspace_rename() function. The icons are from the Font Awesome project. I maintain a mapping between applications and icons. This is a bit cumbersome but it looks great.
i3 workspaces in Polybar
i3 workspaces in Polybar
For CPU, memory, brightness, battery, disk, and audio volume, I am relying on the built-in modules. Polybar s wrapper script generates the list of filesystems to monitor and they get only displayed when available space is low. The battery widget turns red and blinks slowly when running out of power. Check my Polybar configuration for more details.
Various modules for Polybar
Polybar displaying various information: CPU usage, memory usage, screen brightness, battery status, Bluetooth status (with a connected headset), network status (connected to a wireless network and to a VPN), notification status, and speaker volume.
For Bluetooh, network, and notification statuses, I am using Polybar s ipc module: the next version of Polybar can receive an arbitrary text on an IPC socket. The module is defined with a single hook to be executed at the start to restore the latest status.
[module/network]
type = custom/ipc
hook-0 = cat $XDG_RUNTIME_DIR/i3/network.txt 2> /dev/null
initial = 1
It can be updated with polybar-msg action "#network.send.XXXX". In the i3 companion, the @polybar() decorator takes the string returned by a function and pushes the update through the IPC socket. The i3 companion reacts to DBus signals to update the Bluetooth and network icons. The @on() decorator accepts a DBusSignal() object:
@on(
    StartEvent,
    DBusSignal(
        path="/org/bluez",
        interface="org.freedesktop.DBus.Properties",
        member="PropertiesChanged",
        signature="sa sv as",
        onlyif=lambda args: (
            args[0] == "org.bluez.Device1"
            and "Connected" in args[1]
            or args[0] == "org.bluez.Adapter1"
            and "Powered" in args[1]
        ),
    ),
)
@retry(2)
@debounce(0.2)
@polybar("bluetooth")
async def bluetooth_status(i3, event, *args):
    """Update bluetooth status for Polybar."""
The middle of the bar is occupied by the date and a weather forecast. The latest also uses the IPC mechanism, but the source is a Python script triggered by a timer.
Date and weather in Polybar
Current date and weather forecast for the day in Polybar. The data is retrieved with the OpenWeather API.
I don t use the system tray integrated with Polybar. The embedded icons usually look horrible and they all behave differently. A few years back, Gnome has removed the system tray. Most of the problems are fixed by the DBus-based Status Notifier Item protocol also known as Application Indicators or Ayatana Indicators for GNOME. However, Polybar does not support this protocol. In the i3 companion, The implementation of Bluetooth and network icons, including displaying notifications on change, takes about 200 lines. I got to learn a bit about how DBus works and I get exactly the info I want.

picom: the compositor I like having slightly transparent backgrounds for terminals and to reduce the opacity of unfocused windows. This requires a compositor.1 picom is a lightweight compositor. It works well for me, but it may need some tweaking depending on your graphic card.2 Unlike the awesome window manager, i3 does not handle transparency, so the compositor needs to decide by itself the opacity of each window. Check my configuration for details.

systemd: the service manager I use systemd to start i3 and the various services around it. My xsession script only sets some environment variables and lets systemd handles everything else. Have a look at this article from Micha G ral for the rationale. Notably, each component can be easily restarted and their logs are not mangled inside the ~/.xsession-errors file.3 I am using a two-stage setup: i3.service depends on xsession.target to start services before i3:
[Unit]
Description=X session
BindsTo=graphical-session.target
Wants=autorandr.service
Wants=dunst.socket
Wants=inputplug.service
Wants=picom.service
Wants=pulseaudio.socket
Wants=policykit-agent.service
Wants=redshift.service
Wants=spotify-clean.timer
Wants=ssh-agent.service
Wants=xiccd.service
Wants=xsettingsd.service
Wants=xss-lock.service
Then, i3 executes the second stage by invoking the i3-session.target:
[Unit]
Description=i3 session
BindsTo=graphical-session.target
Wants=wallpaper.service
Wants=wallpaper.timer
Wants=polybar-weather.service
Wants=polybar-weather.timer
Wants=polybar.service
Wants=i3-companion.service
Wants=misc-x.service
Have a look on my configuration files for more details.

rofi: the application launcher Rofi is an application launcher. Its appearance can be customized through a CSS-like language and it comes with several themes. Have a look at my configuration for mine.
Rofi as an application launcher
Rofi as an application launcher
It can also act as a generic menu application. I have a script to control a media player and another one to select the wifi network. It is quite a flexible application.
Rofi as a wifi network selector
Rofi to select a wireless network

xss-lock and i3lock: the screen locker i3lock is a simple screen locker. xss-lock invokes it reliably on inactivity or before a system suspend. For inactivity, it uses the XScreenSaver events. The delay is configured using the xset s command. The locker can be invoked immediately with xset s activate. X11 applications know how to prevent the screen saver from running. I have also developed a small dimmer application that is executed 20 seconds before the locker to give me a chance to move the mouse if I am not away.4 Have a look at my configuration script.
Demonstration of xss-lock, xss-dimmer and i3lock with a 4 speedup.

The remaining components
  • autorandr is a tool to detect the connected display, match them against a set of profiles, and configure them with xrandr.
  • inputplug executes a script for each new mouse and keyboard plugged. This is quite useful to load the appropriate the keyboard map. See my configuration.
  • xsettingsd provides settings to X11 applications, not unlike xrdb but it notifies applications for changes. The main use is to configure the Gtk and DPI settings. See my article on HiDPI support on Linux with X11.
  • Redshift adjusts the color temperature of the screen according to the time of day.
  • maim is a utility to take screenshots. I use Prt Scn to trigger a screenshot of a window or a specific area and Mod+Prt Scn to capture the whole desktop to a file. Check the helper script for details.
  • I have a collection of wallpapers I rotate every hour. A script selects them using advanced machine learning algorithms and stitches them together on multi-screen setups. The selected wallpaper is reused by i3lock.

  1. Apart from the eye candy, a compositor also helps to get tear-free video playbacks.
  2. My configuration works with both Haswell (2014) and Whiskey Lake (2018) Intel GPUs. It also works with AMD GPU based on the Polaris chipset (2017).
  3. You cannot manage two different displays this way e.g. :0 and :1. In the first implementation, I did try to parametrize each service with the associated display, but this is useless: there is only one DBus user session and many services rely on it. For example, you cannot run two notification daemons.
  4. I have only discovered later that XSecureLock ships such a dimmer with a similar implementation. But mine has a cool countdown!

23 August 2021

Vincent Bernat: ThinkPad X1 Carbon (Gen 7): 2 years later

Two years ago, I replaced my ThinkPad X1 Carbon 2014 with the latest generation. The new configuration embeds an Intel Core i7-8565U, 16 Gib of RAM, a 1 Tib NVMe disk, and a WQHD display (2560 1440). I did not ask for a WWAN card. I think it is easier and more reliable to use the wifi hotspot feature of a phone instead: no unreliable firmware and unsupported drivers.1 Here is my opinion on this model.
ThinkPad X1 Carbon 7th Gen with the lid closed
ThinkPad X1 Carbon with its lid closed
While the second generation got a very odd keyboard, this one got a classic one with a full row of function keys. I don t know if my model was defective, but the keyboard skips one keypress from time to time. I have got used to it, but the space key still has a hard time registering when hitting it with my right thumb. The travel course is also shorter and it is less comfortable to type on it than it was on the 2014 version. The trackpoint2 works well. The physical buttons are a welcome addition. I am only using the trackpad for scrolling with the two-finger gesture.
Keyboard of the X1 Carbon 7th Gen
Keyboard with an ANSI QWERTY layout (aka English EU for Lenovo). The LEDs on the speaker and microphone keys work out of the box on Linux.
The screen does not suffer from ghosting effects like the one in my previous ThinkPad. To avoid leaving marks on the screen, I use a piece of cloth on top of the keyboard when closing the laptop. The 720p webcam has a built-in mechanical cover. Its quality is not great, but it does the job.
X1 Carbon 7th Gen
ThinkPad X1 Carbon with its lid opened
Battery life effortlessly reaches about 8 hours on a full charge. This is the main reason this laptop feels like a solid upgrade compared to the previous one: no need to carry a power brick during the day. This is the first Lenovo model with a sound card requiring the Sound Open Firmware. Without the appropriate firmware and the related userspace components (ALSA UCM and PulseAudio), the microphone does not work. If everything is set up correctly, the speakers produce a very decent sound, better than the 2014 model. It should now work out-of-the-box with Debian Bullseye. Just install the firmware-sof-signed package.3 The BIOS can be updated directly from Linux, thanks to the Linux Vendor Firmware Service. I was using the ThinkPad USB-C Dock Gen 2 as a docking station. Everything works out-of-the-box. However, from time to time, I got issues reliably getting an image on the two screens. I was using a couple of 10-year LG monitors with DVI connectors, so I relied on DP HDMI DVI adapters. This may have been the source of some of the problems.
ThinkPad USB-C Dock Gen 2
ThinkPad USB-C Dock Gen 2 with network, keyboard, mouse, power and two screens plugged

In summary, this is a fine laptop plagued with a bad keyboard. I did appreciate the good battery life and the fact there is still one HDMI and two USB-A ports, so I didn t need to travel with dongles. I was disappointed by how small the performance gap was with my 5-year older laptop. I am unsure to get a Lenovo for the next one: HiDPI screens are mostly unavailable and current prices are high. Other possibilities include: Until then, I am using my previous ThinkPad.

  1. The seventh generation moved from Sierra Wireless to Fibocom. While Linux support for modems is good, thanks to ModemManager, they are usually driven over the USB bus. The Fibocom L850-GL can use either the USB bus or the PCI bus but Lenovo s BIOS blacklists the former. It took quite some time to find how to switch it to USB after booting. A PCI driver is in progress.
  2. I did replace the trackpoint with a low-profile one from SaotoTech.
  3. The package is in non-free, despite being open-source: many platforms require the firmware to be signed by Intel.

1 July 2021

Vincent Bernat: Upgrading my desktop PC

I built my current desktop PC in 2014. A second SSD was added in 2015. The motherboard and the power supply were replaced after a fault1 in 2016. The memory was upgraded in 2018. A discrete AMD GPU was installed in 2019 to drive two 4K screens. An NVMe disk was added earlier this year to further increase storage performance. This is a testament to the durability of a desktop PC compared to a laptop: it s evolutive and you can keep it a long time. While fine for most usage, the CPU started to become a bottleneck during video conferences.2 So, it was set for an upgrade. The table below summarizes the change. This update cost me about 800 .
Before After
CPU Intel i5-4670K @ 3.4 GHz AMD Ryzen 5 5600X @ 3.7 GHz
CPU fan Zalman CNPS9900 Noctua NH-U12S
Motherboard Asus Z97-PRO Gamer Asus TUF Gaming B550-PLUS
RAM 2 8 GB + 2 4 GB DDR3 @ 1.6 GHz 2 16 GB DDR4 @ 3.6 GHz
GPU Asus Radeon PH RX 550 4G M7
Disks 500 GB Crucial P2 NVMe
256 GB Samsung SSD 850
256 GB Samsung SSD 840
PSU be quiet! Pure Power CM L8 @ 530 W
Case Antec P100
According to some benchmark, the new CPU should be 4 faster when all cores are used and 1.5 faster for a single-threaded workload. Compiling an arbitrary3 kernel provides a 3 speedup. Before:
$ lscpu -e
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ   MINMHZ
  0    0      0    0 0:0:0:0          yes 3800.0000 800.0000
  1    0      0    1 1:1:1:0          yes 3800.0000 800.0000
  2    0      0    2 2:2:2:0          yes 3800.0000 800.0000
  3    0      0    3 3:3:3:0          yes 3800.0000 800.0000
$ CCACHE_DISABLE=1 =time -f '  %E' make -j$(nproc)
[ ]
  OBJCOPY arch/x86/boot/vmlinux.bin
  AS      arch/x86/boot/header.o
  LD      arch/x86/boot/setup.elf
  OBJCOPY arch/x86/boot/setup.bin
  BUILD   arch/x86/boot/bzImage
Kernel: arch/x86/boot/bzImage is ready  (#1)
  4:54.32
After:
$ lscpu -e
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE    MAXMHZ    MINMHZ
  0    0      0    0 0:0:0:0          yes 5210.3511 2200.0000
  1    0      0    1 1:1:1:0          yes 4650.2920 2200.0000
  2    0      0    2 2:2:2:0          yes 5210.3511 2200.0000
  3    0      0    3 3:3:3:0          yes 5073.0459 2200.0000
  4    0      0    4 4:4:4:0          yes 4932.1279 2200.0000
  5    0      0    5 5:5:5:0          yes 4791.2100 2200.0000
  6    0      0    0 0:0:0:0          yes 5210.3511 2200.0000
  7    0      0    1 1:1:1:0          yes 4650.2920 2200.0000
  8    0      0    2 2:2:2:0          yes 5210.3511 2200.0000
  9    0      0    3 3:3:3:0          yes 5073.0459 2200.0000
 10    0      0    4 4:4:4:0          yes 4932.1279 2200.0000
 11    0      0    5 5:5:5:0          yes 4791.2100 2200.0000
$ CCACHE_DISABLE=1 =time -f '  %E' make -j$(nproc)
[ ]
  OBJCOPY arch/x86/boot/vmlinux.bin
  AS      arch/x86/boot/header.o
  LD      arch/x86/boot/setup.elf
  OBJCOPY arch/x86/boot/setup.bin
  BUILD   arch/x86/boot/bzImage
Kernel: arch/x86/boot/bzImage is ready  (#1)
  1:40.18
Here we go for another seven years!

  1. The original power supply was from an older configuration. It suddenly became unable to reliably start the PC. The motherboard got replaced as it was the first suspect: without load, the power supply was working correctly.
  2. On Linux, many programs are unable to leverage hardware acceleration. This is a pity. On a laptop, this can also draws the battery pretty fast.
  3. The kernel is configured with make defconfig on commit 15fae3410f1d.

10 June 2021

Vincent Bernat: Serving WebP & AVIF images with Nginx

WebP and AVIF are two image formats for the web. They aim to produce smaller files than JPEG and PNG. They both support lossy and lossless compression, as well as alpha transparency. WebP was developed by Google and is a derivative of the VP8 video format.1 It is supported on most browsers. AVIF is using the newer AV1 video format to achieve better results. It is supported by Chromium-based browsers and has experimental support for Firefox.2

Your browser supports WebP and AVIF image formats. Your browser supports none of these image formats. Your browser only supports the WebP image format. Your browser only supports the AVIF image format.

Without JavaScript, I can t tell what your browser supports.

Converting and optimizing images For this blog, I am using the following shell snippets to convert and optimize JPEG and PNG images. Skip to the next section if you are only interested in the Nginx setup.

JPEG images JPEG images are converted to WebP using cwebp.
find media/images -type f -name '*.jpg' -print0 \
    xargs -0n1 -P$(nproc) -i \
      cwebp -q 84 -af ' ' -o ' '.webp
They are converted to AVIF using avifenc from libavif:
find media/images -type f -name '*.jpg' -print0 \
    xargs -0n1 -P$(nproc) -i \
      avifenc --codec aom --yuv 420 --min 20 --max 25 ' ' ' '.avif
Then, they are optimized using jpegoptim built with Mozilla s improved JPEG encoder, via Nix. This is one reason I love Nix.
jpegoptim=$(nix-build --no-out-link \
      -E 'with (import <nixpkgs> ); jpegoptim.override   libjpeg = mozjpeg;  ')
find media/images -type f -name '*.jpg' -print0 \
    sort -z
    xargs -0n10 -P$(nproc) \
      $ jpegoptim /bin/jpegoptim --max=84 --all-progressive --strip-all

PNG images PNG images are down-sampled to 8-bit RGBA-palette using pngquant. The conversion reduces file sizes significantly while being mostly invisible.
find media/images -type f -name '*.png' -print0 \
    sort -z
    xargs -0n10 -P$(nproc) \
      pngquant --skip-if-larger --strip \
               --quiet --ext .png --force
Then, they are converted to WebP with cwebp in lossless mode:
find media/images -type f -name '*.png' -print0 \
    xargs -0n1 -P$(nproc) -i \
      cwebp -z 8 ' ' -o ' '.webp
No conversion is done to AVIF: lossless compression is not as efficient as pngquant and lossy compression is only marginally better than what I get with WebP.

Keeping only the smallest files I am only keeping WebP and AVIF images if they are at least 10% smaller than the original format: decoding is usually faster for JPEG and PNG; and JPEG images can be decoded progressively.3
for f in media/images/**/*. webp,avif ; do
  orig=$(stat --format %s $ f%.* )
  new=$(stat --format %s $f)
  (( orig*0.90 > new ))   rm $f
done
I only keep AVIF images if they are smaller than WebP.
for f in media/images/**/*.avif; do
  [[ -f $ f%.* .webp ]]   continue
  orig=$(stat --format %s $ f%.* .webp)
  new=$(stat --format %s $f)
  (( $orig > $new ))   rm $f
done
We can compare how many images are kept when converted to WebP or AVIF:
printf "     %10s %10s %10s\n" Original WebP AVIF
for format in png jpg; do
  printf " $ format:u  %10s %10s %10s\n" \
    $(find media/images -name "*.$format"   wc -l) \
    $(find media/images -name "*.$format.webp"   wc -l) \
    $(find media/images -name "*.$format.avif"   wc -l)
done
AVIF is better than MozJPEG for most JPEG files while WebP beats MozJPEG only for one file out of two:
       Original       WebP       AVIF
 PNG         64         47          0
 JPG         83         40         74

Further reading I didn t detail my choices for quality parameters and there is not much science in it. Here are two resources providing more insight on AVIF:

Serving WebP & AVIF with Nginx To serve WebP and AVIF images, there are two possibilities:
  1. use <picture> to let the browser pick the format it supports, or
  2. use content negotiation to let the server send the best-supported format.
I use the second approach. It relies on inspecting the Accept HTTP header in the request. For Chrome, it looks like this:
Accept: image/avif,image/webp,image/apng,image/*,*/*;q=0.8
I configure Nginx to serve AVIF image, then the WebP image, and fallback to the original JPEG/PNG image depending on what the browser advertises:4
http  
  map $http_accept $webp_suffix  
    default        "";
    "~image/webp"  ".webp";
   
  map $http_accept $avif_suffix  
    default        "";
    "~image/avif"  ".avif";
   
 
server  
  # [ ]
  location ~ ^/images/.*\.(png jpe?g)$  
    add_header Vary Accept;
    try_files $uri$avif_suffix$webp_suffix $uri$avif_suffix $uri$webp_suffix $uri =404;
   
 
For example, let s suppose the browser requests /images/ont-box-orange@2x.jpg. If it supports WebP but not AVIF, $webp_suffix is set to .webp while $avif_suffix is set to the empty string. The server tries to serve the first existing file in this list:
  • /images/ont-box-orange@2x.jpg.webp
  • /images/ont-box-orange@2x.jpg
  • /images/ont-box-orange@2x.jpg.webp
  • /images/ont-box-orange@2x.jpg
If the browser supports both AVIF and WebP, Nginx walks the following list:
  • /images/ont-box-orange@2x.jpg.webp.avif (it never exists)
  • /images/ont-box-orange@2x.jpg.avif
  • /images/ont-box-orange@2x.jpg.webp
  • /images/ont-box-orange@2x.jpg
Eugene Lazutkin explains in more detail how this works. I have only presented a variation of his setup supporting both WebP and AVIF.

  1. VP8 is only used for lossy compression. Lossless compression is using an unrelated format.
  2. Firefox support was scheduled for Firefox 86 but because of the lack of proper color space support, it is still not enabled by default.
  3. Progressive decoding is not planned for WebP but could be implemented using low-quality thumbnail images for AVIF. See this issue for a discussion.
  4. The Vary header ensures an intermediary cache (a proxy or a CDN) checks the Accept header before using a cached response. Internet Explorer has trouble with this header and may not be able to cache the resource properly. There is a workaround but Internet Explorer s market share is now so small that it is pointless to implement it.

25 May 2021

Vincent Bernat: Jerikan+Ansible: a configuration management system for network

There are many resources for network automation with Ansible. Most of them only expose the first steps or limit themselves to a narrow scope. They give no clue on how to expand from that. Real network environments may be large, versatile, heterogeneous, and filled with exceptions. The lack of real-world examples for Ansible deployments, unlike Puppet and SaltStack, leads many teams to build brittle and incomplete automation solutions. We have released under an open-source license our attempt to tackle this problem: Here is a quick demo to configure a new peering:
This work is the collective effort of C dric Hasco t, Jean-Christophe Legatte, Lo c Pailhas, S bastien Hurtel, Tchadel Icard, and Vincent Bernat. We are the network team of Blade, a French company operating Shadow, a cloud-computing product. In May 2021, our company was bought by Octave Klaba and the infrastructure is being transferred to OVHcloud, saving Shadow as a product, but making our team redundant. Our network was around 800 devices, spanning over 10 datacenters with more than 2.5 Tbps of available egress bandwidth. The released material is therefore a substantial example of managing a medium-scale network using Ansible. We have left out the handling of our legacy datacenters to make the final result more readable while keeping enough material to not turn it into a trivial example.

Jerikan The first component is Jerikan. As input, it takes a list of devices, configuration data, templates, and validation scripts. It generates a set of configuration files for each device. Ansible could cover this task, but it has the following limitations:
  • it is slow;
  • errors are difficult to debug;1 and
  • the hierarchy to look up a variable is rigid.
Jerikan inputs and outputs
Jerikan inputs and outputs
If you want to follow the examples, you only need to have Docker and Docker Compose installed. Clone the repository and you are ready!

Source of truth We use YAML files, versioned with Git, as the single source of truth instead of using a database, like NetBox, or a mix of a database and text files. This provides many advantages:
  • anyone can use their preferred text editor;
  • the team prepares changes in branches;
  • the team reviews changes using merge requests;
  • the merge requests expose the changes to the generated configuration files;
  • rollback to a previous state is easy; and
  • it is fast.
The first file is devices.yaml. It contains the device list. The second file is classifier.yaml. It defines a scope for each device. A scope is a set of keys and values. It is used in templates and to look up data associated with a device.
$ ./run-jerikan scope to1-p1.sk1.blade-group.net
continent: apac
environment: prod
groups:
- tor
- tor-bgp
- tor-bgp-compute
host: to1-p1.sk1
location: sk1
member: '1'
model: dell-s4048
os: cumulus
pod: '1'
shorthost: to1-p1
The device name is matched against a list of regular expressions and the scope is extended by the result of each match. For to1-p1.sk1.blade-group.net, the following subset of classifier.yaml defines its scope:
matchers:
  - '^(([^.]*)\..*)\.blade-group\.net':
      environment: prod
      host: '\1'
      shorthost: '\2'
  - '\.(sk1)\.':
      location: '\1'
      continent: apac
  - '^to([12])-[as]?p(\d+)\.':
      member: '\1'
      pod: '\2'
  - '^to[12]-p\d+\.':
      groups:
        - tor
        - tor-bgp
        - tor-bgp-compute
  - '^to[12]-(p ap)\d+\.sk1\.':
      os: cumulus
      model: dell-s4048
The third file is searchpaths.py. It describes which directories to search for a variable. A Python function provides a list of paths to look up in data/ for a given scope. Here is a simplified version:2
def searchpaths(scope):
    paths = [
        "host/ scope[location] / scope[shorthost] ",
        "location/ scope[location] ",
        "os/ scope[os] - scope[model] ",
        "os/ scope[os] ",
        'common'
    ]
    for idx in range(len(paths)):
        try:
            paths[idx] = paths[idx].format(scope=scope)
        except KeyError:
            paths[idx] = None
    return [path for path in paths if path]
With this definition, the data for to1-p1.sk1.blade-group.net is looked up in the following paths:
$ ./run-jerikan scope to1-p1.sk1.blade-group.net
[ ]
Search paths:
  host/sk1/to1-p1
  location/sk1
  os/cumulus-dell-s4048
  os/cumulus
  common
Variables are scoped using a namespace that should be specified when doing a lookup. We use the following ones:
  • system for accounts, DNS, syslog servers,
  • topology for ports, interfaces, IP addresses, subnets,
  • bgp for BGP configuration
  • build for templates and validation scripts
  • apps for application variables
When looking up for a variable in a given namespace, Jerikan looks for a YAML file named after the namespace in each directory in the search paths. For example, if we look up a variable for to1-p1.sk1.blade-group.net in the bgp namespace, the following YAML files are processed: host/sk1/to1-p1/bgp.yaml, location/sk1/bgp.yaml, os/cumulus-dell-s4048/bgp.yaml, os/cumulus/bgp.yaml, and common/bgp.yaml. The search stops at the first match. The schema.yaml file allows us to override this behavior by asking to merge dictionaries and arrays across all matching files. Here is an excerpt of this file for the topology namespace:
system:
  users:
    merge: hash
  sampling:
    merge: hash
  ansible-vars:
    merge: hash
  netbox:
    merge: hash
The last feature of the source of truth is the ability to use Jinja2 templates for keys and values by prefixing them with ~ :
# In data/os/junos/system.yaml
netbox:
  manufacturer: Juniper
  model: "~  model upper  "
# In data/groups/tor-bgp-compute/system.yaml
netbox:
  role: net_tor_gpu_switch
Looking up for netbox in the system namespace for to1-p2.ussfo03.blade-group.net yields the following result:
$ ./run-jerikan scope to1-p2.ussfo03.blade-group.net
continent: us
environment: prod
groups:
- tor
- tor-bgp
- tor-bgp-compute
host: to1-p2.ussfo03
location: ussfo03
member: '1'
model: qfx5110-48s
os: junos
pod: '2'
shorthost: to1-p2
[ ]
Search paths:
[ ]
  groups/tor-bgp-compute
[ ]
  os/junos
  common
$ ./run-jerikan lookup to1-p2.ussfo03.blade-group.net system netbox
manufacturer: Juniper
model: QFX5110-48S
role: net_tor_gpu_switch
This also works for structured data:
# In groups/adm-gateway/topology.yaml
interface-rescue:
  address: "~  lookup('topology', 'addresses').rescue  "
  up:
    - "~ip route add default via   lookup('topology', 'addresses').rescue ipaddr('first_usable')   table rescue"
    - "~ip rule add from   lookup('topology', 'addresses').rescue ipaddr('address')   table rescue priority 10"
# In groups/adm-gateway-sk1/topology.yaml
interfaces:
  ens1f0: "~  lookup('topology', 'interface-rescue')  "
This yields the following result:
$ ./run-jerikan lookup gateway1.sk1.blade-group.net topology interfaces
[ ]
ens1f0:
  address: 121.78.242.10/29
  up:
  - ip route add default via 121.78.242.9 table rescue
  - ip rule add from 121.78.242.10 table rescue priority 10
When putting data in the source of truth, we use the following rules:
  1. Don t repeat yourself.
  2. Put the data in the most specific place without breaking the first rule.
  3. Use templates with parsimony, mostly to help with the previous rules.
  4. Restrict the data model to what is needed for your use case.
The first rule is important. For example, when specifying IP addresses for a point-to-point link, only specify one side and deduce the other value in the templates. The last rule means you do not need to mimic a BGP YANG model to specify BGP peers and policies:
peers:
  transit:
    cogent:
      asn: 174
      remote:
        - 38.140.30.233
        - 2001:550:2:B::1F9:1
      specific-import:
        - name: ATT-US
          as-path: ".*7018$"
          lp-delta: 50
  ix-sfmix:
    rs-sfmix:
      monitored: true
      asn: 63055
      remote:
        - 206.197.187.253
        - 206.197.187.254
        - 2001:504:30::ba06:3055:1
        - 2001:504:30::ba06:3055:2
    blizzard:
      asn: 57976
      remote:
        - 206.197.187.42
        - 2001:504:30::ba05:7976:1
      irr: AS-BLIZZARD

Templates The list of templates to compile for each device is stored in the source of truth, under the build namespace:
$ ./run-jerikan lookup edge1.ussfo03.blade-group.net build templates
data.yaml: data.j2
config.txt: junos/main.j2
config-base.txt: junos/base.j2
config-irr.txt: junos/irr.j2
$ ./run-jerikan lookup to1-p1.ussfo03.blade-group.net build templates
data.yaml: data.j2
config.txt: cumulus/main.j2
frr.conf: cumulus/frr.j2
interfaces.conf: cumulus/interfaces.j2
ports.conf: cumulus/ports.j2
dhcpd.conf: cumulus/dhcp.j2
default-isc-dhcp: cumulus/default-isc-dhcp.j2
authorized_keys: cumulus/authorized-keys.j2
motd: linux/motd.j2
acl.rules: cumulus/acl.j2
rsyslog.conf: cumulus/rsyslog.conf.j2
Templates are using Jinja2. This is the same engine used in Ansible. Jerikan ships some custom filters but also reuse some of the useful filters from Ansible, notably ipaddr. Here is an excerpt of templates/junos/base.j2 to configure DNS and NTP servers on Juniper devices:
system  
  ntp  
 % for ntp in lookup("system", "ntp") % 
    server   ntp  ;
 % endfor % 
   
  name-server  
 % for dns in lookup("system", "dns") % 
      dns  ;
 % endfor % 
   
 
The equivalent template for Cisco IOS-XR is:
 % for dns in lookup('system', 'dns') % 
domain vrf VRF-MANAGEMENT name-server   dns  
 % endfor % 
!
 % for syslog in lookup('system', 'syslog') % 
logging   syslog   vrf VRF-MANAGEMENT
 % endfor % 
!
There are three helper functions provided:
  • devices() returns the list of devices matching a set of conditions on the scope. For example, devices("location==ussfo03", "groups==tor-bgp") returns the list of devices in San Francisco in the tor-bgp group. You can also omit the operator if you want the specified value to be equal to the one in the local scope. For example, devices("location") returns devices in the current location.
  • lookup() does a key lookup. It takes the namespace, the key, and optionally, a device name. If not provided, the current device is assumed.
  • scope() returns the scope of the provided device.
Here is how you would define iBGP sessions between edge devices in the same location:
 % for neighbor in devices("location", "groups==edge") if neighbor != device % 
   % for address in lookup("topology", "addresses", neighbor).loopback tolist % 
protocols bgp group IPV  address ipv  -EDGES-IBGP  
  neighbor   address    
    description "IPv  address ipv  : iBGP to   neighbor  ";
   
 
   % endfor % 
 % endfor % 
We also have a global key-value store to save information to be reused in another template or device. This is quite useful to automatically build DNS records. First, capture the IP address inserted into a template with store() as a filter:
interface Loopback0
 description 'Loopback:'
  % for address in lookup('topology', 'addresses').loopback tolist % 
 ipv  address ipv   address   address store('addresses', 'Loopback0') ipaddr('cidr')  
  % endfor % 
!
Then, reuse it later to build DNS records by iterating over store():4
 % for device, ip, interface in store('addresses') % 
   % set interface = interface replace('/', '-') replace('.', '-') replace(':', '-') % 
   % set name = ' . '.format(interface lower, device) % 
  name  . IN   'A' if ip ipv4 else 'AAAA'     ip ipaddr('address')  
 % endfor % 
Templates are compiled locally with ./run-jerikan build. The --limit argument restricts the devices to generate configuration files for. Build is not done in parallel because a template may depend on the data collected by another template. Currently, it takes 1 minute to compile around 3000 files spanning over 800 devices.
Jerikan outputs when building templates
Output of Jerikan after building configuration files for six devices
When an error occurs, a detailed traceback is displayed, including the template name, the line number and the value of all visible variables. This is a major time-saver compared to Ansible!
templates/opengear/config.j2:15: in top-level template code
    config.interfaces.  interface  .netmask   adddress   ipaddr("netmask")  
        continent  = 'us'
        device     = 'con1-ag2.ussfo03.blade-group.net'
        environment = 'prod'
        host       = 'con1-ag2.ussfo03'
        infos      =  'address': '172.30.24.19/21' 
        interface  = 'wan'
        location   = 'ussfo03'
        loop       = <LoopContext 1/2>
        member     = '2'
        model      = 'cm7132-2-dac'
        os         = 'opengear'
        shorthost  = 'con1-ag2'
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
value = JerkianUndefined, query = 'netmask', version = False, alias = 'ipaddr'
[ ]
        # Check if value is a list and parse each element
        if isinstance(value, (list, tuple, types.GeneratorType)):
            _ret = [ipaddr(element, str(query), version) for element in value]
            return [item for item in _ret if item]
>       elif not value or value is True:
E       jinja2.exceptions.UndefinedError: 'dict object' has no attribute 'adddress'
We don t have general-purpose rules when writing templates. Like for the source of truth, there is no need to create generic templates able to produce any BGP configuration. There is a balance to be found between readability and avoiding duplication. Templates can become scary and complex: sometimes, it s better to write a filter or a function in jerikan/jinja.py. Mastering Jinja2 is a good investment. Take time to browse through our templates as some of them show interesting features.

Checks Optionally, each configuration file can be validated by a script in the checks/ directory. Jerikan looks up the key checks in the build namespace to know which checks to run:
$ ./run-jerikan lookup edge1.ussfo03.blade-group.net build checks
- description: Juniper configuration file syntax check
  script: checks/junoser
  cache:
    input: config.txt
    output: config-set.txt
- description: check YAML data
  script: checks/data.yaml
  cache: data.yaml
In the above example, checks/junoser is executed if there is a change to the generated config.txt file. It also outputs a transformed version of the configuration file which is easier to understand when using diff. Junoser checks a Junos configuration file using Juniper s XML schema definition for Netconf.5 On error, Jerikan displays:
jerikan/build.py:127: RuntimeError
-------------- Captured syntax check with Junoser call --------------
P: checks/junoser edge2.ussfo03.blade-group.net
C: /app/jerikan
O:
E: Invalid syntax:  set system syslog archive size 10m files 10 word-readable
S: 1

Integration into GitLab CI The next step is to compile the templates using a CI. As we are using GitLab, Jerikan ships with a .gitlab-ci.yml file. When we need to make a change, we create a dedicated branch and a merge request. GitLab compiles the templates using the same environment we use on our laptops and store them as an artifact.
Merge requests in GitLab for a change
Merge request to add a new port in USSFO03. The templates were compiled successfully but approval from another team member is still required to merge.
Before approving the merge request, another team member looks at the changes in data and templates but also the differences for the generated configuration files:
Differences for generated configuration files
The change configures a port on the Juniper device, adds records to DNS, and updates NetBox with the new IP addresses (not shown).

Ansible After Jerikan has built the configuration files, Ansible takes over. It is also packaged as a Docker image to avoid the trouble to maintain the right Python virtual environment and ensure everyone is using the same versions.

Inventory Jerikan has generated an inventory file. It contains all the managed devices, the variables defined for each of them and the groups converted to Ansible groups:
ob1-n1.sk1.blade-group.net ansible_host=172.29.15.12 ansible_user=blade ansible_connection=network_cli ansible_network_os=ios
ob2-n1.sk1.blade-group.net ansible_host=172.29.15.13 ansible_user=blade ansible_connection=network_cli ansible_network_os=ios
ob1-n1.ussfo03.blade-group.net ansible_host=172.29.15.12 ansible_user=blade ansible_connection=network_cli ansible_network_os=ios
none ansible_connection=local
[oob]
ob1-n1.sk1.blade-group.net
ob2-n1.sk1.blade-group.net
ob1-n1.ussfo03.blade-group.net
[os-ios]
ob1-n1.sk1.blade-group.net
ob2-n1.sk1.blade-group.net
ob1-n1.ussfo03.blade-group.net
[model-c2960s]
ob1-n1.sk1.blade-group.net
ob2-n1.sk1.blade-group.net
ob1-n1.ussfo03.blade-group.net
[location-sk1]
ob1-n1.sk1.blade-group.net
ob2-n1.sk1.blade-group.net
[location-ussfo03]
ob1-n1.ussfo03.blade-group.net
[in-sync]
ob1-n1.sk1.blade-group.net
ob2-n1.sk1.blade-group.net
ob1-n1.ussfo03.blade-group.net
none
in-sync is a special group for devices which configuration should match the golden configuration. Daily and unattended, Ansible should be able to push configurations to this group. The mid-term goal is to cover all devices. none is a special device for tasks not related to a specific host. This includes synchronizing NetBox, IRR objects, and the DNS, updating the RPKI, and building the geofeed files.

Playbook We use a single playbook for all devices. It is described in the ansible/playbooks/site.yaml file. Here is a shortened version:
- hosts: adm-gateway:!done
  strategy: mitogen_linear
  roles:
    - blade.linux
    - blade.adm-gateway
    - done
- hosts: os-linux:!done
  strategy: mitogen_linear
  roles:
    - blade.linux
    - done
- hosts: os-junos:!done
  gather_facts: false
  roles:
    - blade.junos
    - done
- hosts: os-opengear:!done
  gather_facts: false
  roles:
    - blade.opengear
    - done
- hosts: none:!done
  gather_facts: false
  roles:
    - blade.none
    - done
A host executes only one of the play. For example, a Junos device executes the blade.junos role. Once a play has been executed, the device is added to the done group and the other plays are skipped. The playbook can be executed with the configuration files generated by the GitLab CI using the ./run-ansible-gitlab command. This is a wrapper around Docker and the ansible-playbook command and it accepts the same arguments. To deploy the configuration on the edge devices for the SK1 datacenter in check mode, we use:
$ ./run-ansible-gitlab playbooks/site.yaml --limit='edge:&location-sk1' --diff --check
[ ]
PLAY RECAP *************************************************************
edge1.sk1.blade-group.net  : ok=6    changed=0    unreachable=0    failed=0    skipped=3    rescued=0    ignored=0
edge2.sk1.blade-group.net  : ok=5    changed=0    unreachable=0    failed=0    skipped=1    rescued=0    ignored=0
We have some rules when writing roles:
  • --check must detect if a change is needed;
  • --diff must provide a visualization of the planned changes;
  • --check and --diff must not display anything if there is nothing to change;
  • writing a custom module tailored to our needs is a valid solution;
  • the whole device configuration is managed;6
  • secrets must be stored in Vault;
  • templates should be avoided as we have Jerikan for that; and
  • avoid duplication and reuse tasks.7
We avoid using collections from Ansible Galaxy, the exception being collections to connect and interact with vendor devices, like cisco.iosxr collection. The quality of Ansible Galaxy collections is quite random and it is an additional maintenance burden. It seems better to write roles tailored to our needs. The collections we use are in ci/ansible/ansible-galaxy.yaml. We use Mitogen to get a 10 speedup on Ansible executions on Linux hosts. We also have a few playbooks for operational purpose: upgrading the OS version, isolate an edge router, etc. We were also planning on how to add operational checks in roles: are all the BGP sessions up? They could have been used to validate a deployment and rollback if there is an issue. Currently, our playbooks are run from our laptops. To keep tabs, we are using ARA. A weekly dry-run on devices in the in-sync group also provides a dashboard on which devices we need to run Ansible on.

Configuration data and templates Jerikan ships with pre-populated data and templates matching the configuration of our USSFO03 and SK1 datacenters. They do not exist anymore but, we promise, all this was used in production back in the days!
Network architecture for Blade datacenter
The latest iteration of our network infrastructure for SK1, USSFO03, and future data centers. The production network is using BGPttH using a spine-leaf fabric. The out-of-band network is using a simple L2 design, using the spanning tree protocol, as well as a set of console servers.
Notably, you can find the configuration for:
our edge routers
Some are running on Junos, like edge2.ussfo03, the others on IOS-XR, like edge1.sk1. The implemented functionalities are similar in both cases and we could swap one for the other. It includes the BGP configuration for transits, peerings, and IX as well as the associated BGP policies. PeeringDB is queried to get the maximum number of prefixes to accept for peerings. bgpq3 and a containerized IRRd help to filter received routes. A firewall is added to protect the routing engine. Both IPv4 and IPv6 are configured.
our BGP-based fabric
BGP is used inside the datacenter8 and is extended on bare-metal hosts. The configuration is automatically derived from the device location and the port number.9 Top-of-the-rack devices are using passive BGP sessions for ports towards servers. They are also serving a provisioning network to let them boot using DHCP and PXE. They also act as a DHCP server. The design is multivendor. Some devices are running Cumulus Linux, like to1-p1.ussfo03, while some others are running Junos, like to1-p2.ussfo03.
our out-of-band fabric
We are using Cisco Catalyst 2960 switches to build an L2 out-of-band network. To provide redundancy and saving a few bucks on wiring, we build small loops and run the spanning-tree protocol. See ob1-p1.ussfo03. It is redundantly connected to our gateway servers. We also use OpenGear devices for console access. See con1-n1.ussfo03
our administrative gateways
These Linux servers have multiple purposes: SSH jump boxes, rescue connection, direct access to the out-of-band network, zero-touch provisioning of network devices,10 Internet access for management flows, centralization of the console servers using Conserver, and API for autoconfiguration of BGP sessions for bare-metal servers. They are the first servers installed in a new datacenter and are used to provision everything else. Check both the generated files and the associated Ansible tasks.

  1. Ansible does not even provide a line number when there is an error in a template. You may need to find the problem by bisecting.
    $ ansible --version
    ansible 2.10.8
    [ ]
    $ cat test.j2
    Hello   name  !
    $ ansible all -i localhost, \
    >  --connection=local \
    >  -m template \
    >  -a "src=test.j2 dest=test.txt"
    localhost   FAILED! =>  
        "changed": false,
        "msg": "AnsibleUndefinedVariable: 'name' is undefined"
     
    
  2. You may recognize the same concepts as in Hiera, the hierarchical key-value store from Puppet. At first, we were using Jerakia, a similar independent store exposing an HTTP REST interface. However, the lookup overhead is too large for our use. Jerikan implements the same functionality within a Python function.
  3. The list of available filters is mangled inside jerikan/jinja.py. This is a remain of the fact we do not maintain Jerikan as a standalone software.
  4. This is a bit confusing: we have a store() filter and a store() function. With Jinja2, filters and functions live in two different namespaces.
  5. We are using a fork with some modifications to be able to validate our configurations and exposing an HTTP service to reduce the time spent on each configuration check.
  6. There is a trend in network automation to automate a configuration subset, for example by having a playbook to create a new BGP session. We believe this is wrong: with time, your configuration will get out-of-sync with its expected state, notably hand-made changes will be left undetected.
  7. See ansible/roles/blade.linux/tasks/firewall.yaml and ansible/roles/blade.linux/tasks/interfaces.yaml. They are meant to be called when needed, using import_role.
  8. We also have some datacenters using BGP EVPN VXLAN at medium-scale using Juniper devices. As they are still in production today, we didn t include this feature but we may publish it in the future.
  9. In retrospect, this may not be a good idea unless you are pretty sure everything is uniform (number of switches for each layer, number of ports). This was not our case. We now think it is a better idea to assign a prefix to each device and write it in the source of truth.
  10. Non-linux based devices are upgraded and configured unattended. Cumulus Linux devices are automatically upgraded on install but the final configuration has to be pushed using Ansible: we didn t want to duplicate the configuration process using another tool.

24 May 2021

Vincent Bernat: Transient prompt with Zsh

Powerlevel10k is a theme for Zsh. It contains some powerful features, is astoundingly fast, and easy to customize. I am quite amazed at the skills of its main author. Be sure to also have a look at Zsh for Humans, a complete Zsh configuration including this theme. One of the nice features of Powerlevel10k is transient prompts: past prompts are reduced to a more minimal configuration to save space by removing unneeded information.
Demonstration of a transient prompt with Zsh: past prompts use a more compact form
My implementation of a transient prompt with Zsh. Past prompts are compact and include the time of the command execution, the hostname, and the status of the previous command while the complete prompt contains more information like the current directory and the Git branch.
When it comes to configuring my shell, I still prefer writing and understanding each line going into it. Therefore, I am still building my Zsh configuration from scratch. Here is how I have integrated the above transient feature into my prompt. The first step is to configure the appearance of the prompt in its compact form. Let s assume we have a variable, $_vbe_prompt_compact set to 1 when we want a compact prompt. We use the following function to define the prompt appearance:
_vbe_prompt ()  
    local retval=$?
    # When compact, just time + prompt sign
    if (( $_vbe_prompt_compact )); then
        # Current time (with timezone for remote hosts)
        _vbe_prompt_segment cyan default "%D %H:%M$ SSH_TTY+ %Z  "
        # Hostname for remote hosts
        [[ $SSH_TTY ]] && \
            _vbe_prompt_segment black magenta "%B%M%b"
        # Status of the last command
        if (( $retval )); then
            _vbe_prompt_segment red default $ PRCH[reta] 
        else
            _vbe_prompt_segment green cyan $ PRCH[ok] 
        fi
        # End of prompt
        _vbe_prompt_end
        return
    fi
    # Regular prompt with many information
    # [ ]
 
setopt prompt_subst
PS1='$(_vbe_prompt) '

Update (2021.05) The following part has been rewritten to be more robust. The code is stolen from Powerlevel10k s issue #888. See the comments for more details.

Our next step is to redraw the prompt after accepting a command. We wrap Zsh line editor into a function:1
_vbe-zle-line-init()  
    [[ $CONTEXT == start ]]   return 0
    # Start regular line editor
    (( $+zle_bracketed_paste )) && print -r -n - $zle_bracketed_paste[1]
    zle .recursive-edit
    local -i ret=$?
    (( $+zle_bracketed_paste )) && print -r -n - $zle_bracketed_paste[2]
    # If we received EOT, we exit the shell
    if [[ $ret == 0 && $KEYS == $'\4' ]]; then
        _vbe_prompt_compact=1
        zle .reset-prompt
        exit
    fi
    # Line edition is over. Shorten the current prompt.
    _vbe_prompt_compact=1
    zle .reset-prompt
    unset _vbe_prompt_compact
    if (( ret )); then
        # Ctrl-C
        zle .send-break
    else
        # Enter
        zle .accept-line
    fi
    return ret
 
zle -N zle-line-init _vbe-zle-line-init
That s all!
One downside of using the powerline fonts is that it messes with copy/paste. As I am using tmux, I use the following snippet to work around this issue and use only standard Unicode characters when copying from the terminal:
bind-key -T copy-mode M-w \
  send -X copy-pipe-and-cancel "sed 's/ .* /%/g'   xclip -i -selection clipboard" \;\
  display-message "Selection saved to clipboard!"
Copying and pasting the text from the screenshot above yields the following text:
14:21 % ssh eizo.luffy.cx
Linux eizo 4.19.0-16-amd64 #1 SMP Debian 4.19.181-1 (2021-03-19) x86_64
Last login: Fri Apr 23 14:20:39 2021 from 2a01:cb00:3f:b02:9db6:efa4:d85:7f9f
14:21 CEST % uname -a
Linux eizo 4.19.0-16-amd64 #1 SMP Debian 4.19.181-1 (2021-03-19) x86_64 GNU/Linux
14:21 CEST %
Connection to eizo.luffy.cx closed.
14:22 % git status
On branch article/zsh-transient
Untracked files:
  (use "git add <file>..." to include in what will be committed)
        ../../media/images/zsh-compact-prompt@2x.jpg
nothing added to commit but untracked files present (use "git add" to track)

  1. We have to manually enable bracketed paste because Zsh does it after zle-line-init.

16 February 2021

Michael Prokop: How to properly use 3rd party Debian repository signing keys with apt

(Blogging this, since this is a recurring anti-pattern I noticed at several customers and often comes up during deployments of 3rd party repositories.) Update on 2021-02-19: clarified, that Signed-By requires apt >= 1.1, thanks Vincent Bernat Many upstream projects provide Debian repository instructions like this:
curl -fsSL https://example.com/stable/debian.gpg   sudo apt-key add -
Do not follow this, for different reasons, including:
  1. You do not see what you get before adding the GPG key to your global apt trust store
  2. You can t easily script this via your preferred configuration management (the apt-key manpage clearly discourages programmatic usage)
  3. The signing key is considered valid for all your enabled Debian repositories (instead of only a specific one)
  4. You need GnuPG (either gnupg2 or gnupg1) on your system for usage with apt-key
There s a much better approach to this: download the GPG key, make sure it s in the appropriate format, then use it via deb [signed-by=/usr/share/keyrings/ ] in your apt s sources list configuration. Note and FTR: the Signed-By feature is available starting with apt 1.1 (so apt in Debian jessie/8 and older does not support it). TL;DR: As an example, let s demonstrate this with the Tailscale Debian repository for buster.
Downloading the GPG file will give you an ascii-armored GPG file:
% curl -fsSL -o buster.gpg https://pkgs.tailscale.com/stable/debian/buster.gpg
% gpg --keyid-format long buster.gpg 
gpg: WARNING: no command supplied.  Trying to guess what you mean ...
pub   rsa4096/458CA832957F5868 2020-02-25 [SC]
      2596A99EAAB33821893C0A79458CA832957F5868
uid                           Tailscale Inc. (Package repository signing key) <info@tailscale.com>
sub   rsa4096/B1547A3DDAAF03C6 2020-02-25 [E]
% file buster.gpg
buster.gpg: PGP public key block Public-Key (old)
If you have apt version >= 1.4 available (Debian >=stretch/9 and Ubuntu >=bionic/18.04), you can use this file directly as follows:
% sudo mv buster.gpg /usr/share/keyrings/tailscale.asc
% cat /etc/apt/sources.list.d/tailscale.list
deb [signed-by=/usr/share/keyrings/tailscale.asc] https://pkgs.tailscale.com/stable/debian buster main
% sudo apt update
[...]
And you re done! Iff your apt version really is older than 1.4, you need to convert the ascii-armored GPG file into a GPG key public ring file (AKA binary OpenPGP format), either by just dearmor-ing it (if you don t care about checking ID + fingerprint):
% gpg --dearmor < buster.gpg > tailscale.gpg
or if you prefer to go via GPG, you can also use a temporary GPG home directory (if you don t care about going through your personal GPG setup):
% mkdir --mode=700 /tmp/gpg-tmpdir
% gpg --homedir /tmp/gpg-tmpdir --import ./buster.gpg
gpg: keybox '/tmp/gpg-tmpdir/pubring.kbx' created
gpg: /tmp/gpg-tmpdir/trustdb.gpg: trustdb created
gpg: key 458CA832957F5868: public key "Tailscale Inc. (Package repository signing key) <info@tailscale.com>" imported
gpg: Total number processed: 1
gpg:               imported: 1
% gpg --homedir /tmp/gpg-tmpdir --output tailscale.gpg  --export-options=export-minimal --export 0x458CA832957F5868
% rm -rf /tmp/gpg-tmpdir
The resulting GPG key public ring file should look like that:
% file tailscale.gpg 
tailscale.gpg: PGP/GPG key public ring (v4) created Tue Feb 25 04:51:20 2020 RSA (Encrypt or Sign) 4096 bits MPI=0xc00399b10bc12858...
% gpg tailscale.gpg 
gpg: WARNING: no command supplied.  Trying to guess what you mean ...
pub   rsa4096/458CA832957F5868 2020-02-25 [SC]
      2596A99EAAB33821893C0A79458CA832957F5868
uid                           Tailscale Inc. (Package repository signing key) <info@tailscale.com>
sub   rsa4096/B1547A3DDAAF03C6 2020-02-25 [E]
Then you can use this GPG file on your system as follows:
% sudo mv tailscale.gpg /usr/share/keyrings/tailscale.gpg
% cat /etc/apt/sources.list.d/tailscale.list
deb [signed-by=/usr/share/keyrings/tailscale.gpg] https://pkgs.tailscale.com/stable/debian buster main
% sudo apt update
[...]
Such a setup ensures:
  1. You can verify the GPG key file (ID + fingerprint)
  2. You can easily ship files via /usr/share/keyrings/ and refer to it in your deployment scripts, configuration management, (and can also easily update or get rid of them again!)
  3. The GPG key is valid only for the repositories with the corresponding [signed-by=/usr/share/keyrings/ ] entry
  4. You don t need to install GnuPG (neither gnupg2 nor gnupg1) on the system which is using the 3rd party Debian repository
Thanks: Guillem Jover for reviewing an early draft of this blog article.

17 January 2021

Wouter Verhelst: Software available through Extrepo

Just over 7 months ago, I blogged about extrepo, my answer to the "how do you safely install software on Debian without downloading random scripts off the Internet and running them as root" question. I also held a talk during the recent "MiniDebConf Online" that was held, well, online. The most important part of extrepo is "what can you install through it". If the number of available repositories is too low, there's really no reason to use it. So, I thought, let's look what we have after 7 months... To cut to the chase, there's a bunch of interesting content there, although not all of it has a "main" policy. Each of these can be enabled by installing extrepo, and then running extrepo enable <reponame>, where <reponame> is the name of the repository. Note that the list is not exhaustive, but I intend to show that even though we're nowhere near complete, extrepo is already quite useful in its current state:

Free software
  • The debian_official, debian_backports, and debian_experimental repositories contain Debian's official, backports, and experimental repositories, respectively. These shouldn't have to be managed through extrepo, but then again it might be useful for someone, so I decided to just add them anyway. The config here uses the deb.debian.org alias for CDN-backed package mirrors.
  • The belgium_eid repository contains the Belgian eID software. Obviously this is added, since I'm upstream for eID, and as such it was a large motivating factor for me to actually write extrepo in the first place.
  • elastic: the elasticsearch software.
  • Some repositories, such as dovecot, winehq and bareos contain upstream versions of their respective software. These two repositories contain software that is available in Debian, too; but their upstreams package their most recent release independently, and some people might prefer to run those instead.
  • The sury, fai, and postgresql repositories, as well as a number of repositories such as openstack_rocky, openstack_train, haproxy-1.5 and haproxy-2.0 (there are more) contain more recent versions of software packaged in Debian already by the same maintainer of that package repository. For the sury repository, that is PHP; for the others, the name should give it away. The difference between these repositories and the ones above is that it is the official Debian maintainer for the same software who maintains the repository, which is not the case for the others.
  • The vscodium repository contains the unencumbered version of Microsoft's Visual Studio Code; i.e., the codium version of Visual Studio Code is to code as the chromium browser is to chrome: it is a build of the same softare, but without the non-free bits that make code not entirely Free Software.
  • While Debian ships with at least two browsers (Firefox and Chromium), additional browsers are available through extrepo, too. The iridiumbrowser repository contains a Chromium-based browser that focuses on privacy.
  • Speaking of privacy, perhaps you might want to try out the torproject repository.
  • For those who want to do Cloud Computing on Debian in ways that isn't covered by Openstack, there is a kubernetes repository that contains the Kubernetes stack, the as well as the google_cloud one containing the Google Cloud SDK.

Non-free software While these are available to be installed through extrepo, please note that non-free and contrib repositories are disabled by default. In order to enable these repositories, you must first enable them; this can be accomplished through /etc/extrepo/config.yaml.
  • In case you don't care about freedom and want the official build of Visual Studio Code, the vscode repository contains it.
  • While we're on the subject of Microsoft, there's also Microsoft Teams available in the msteams repository. And, hey, skype.
  • For those who are not satisfied with the free browsers in Debian or any of the free repositories, there's opera and google_chrome.
  • The docker-ce repository contains the official build of Docker CE. While this is the free "community edition" that should have free licenses, I could not find a licensing statement anywhere, and therefore I'm not 100% sure whether this repository is actually free software. For that reason, it is currently marked as a non-free one. Merge Requests for rectifying that from someone with more information on the actual licensing situation of Docker CE would be welcome...
  • For gamers, there's Valve's steam repository.
Again, the above lists are not meant to be exhaustive. Special thanks go out to Russ Allbery, Kim Alvefur, Vincent Bernat, Nick Black, Arnaud Ferraris, Thorsten Glaser, Thomas Goirand, Juri Grabowski, Paolo Greppi, and Josh Triplett, for helping me build the current list of repositories. Is your favourite repository not listed? Create a configuration based on template.yaml, and file a merge request!

15 November 2020

Vincent Bernat: Zero-Touch Provisioning for Juniper

Juniper s official documentation on ZTP explains how to configure the ISC DHCP Server to automatically upgrade and configure on first boot a Juniper device. However, the proposed configuration could be a bit more elegant. This note explains how.

TL;DR Do not redefine option 43. Instead, specify the vendor option space to use to encode parameters with vendor-option-space.


When booting for the first time, a Juniper device requests its IP address through a DHCP discover message, then request additional parameters for autoconfiguration through a DHCP request message:
Dynamic Host Configuration Protocol (Request)
    Message type: Boot Request (1)
    Hardware type: Ethernet (0x01)
    Hardware address length: 6
    Hops: 0
    Transaction ID: 0x44e3a7c9
    Seconds elapsed: 0
    Bootp flags: 0x8000, Broadcast flag (Broadcast)
    Client IP address: 0.0.0.0
    Your (client) IP address: 0.0.0.0
    Next server IP address: 0.0.0.0
    Relay agent IP address: 0.0.0.0
    Client MAC address: 02:00:00:00:00:01 (02:00:00:00:00:01)
    Client hardware address padding: 00000000000000000000
    Server host name not given
    Boot file name not given
    Magic cookie: DHCP
    Option: (54) DHCP Server Identifier (10.0.2.2)
    Option: (55) Parameter Request List
        Length: 14
        Parameter Request List Item: (3) Router
        Parameter Request List Item: (51) IP Address Lease Time
        Parameter Request List Item: (1) Subnet Mask
        Parameter Request List Item: (15) Domain Name
        Parameter Request List Item: (6) Domain Name Server
        Parameter Request List Item: (66) TFTP Server Name
        Parameter Request List Item: (67) Bootfile name
        Parameter Request List Item: (120) SIP Servers
        Parameter Request List Item: (44) NetBIOS over TCP/IP Name Server
        Parameter Request List Item: (43) Vendor-Specific Information
        Parameter Request List Item: (150) TFTP Server Address
        Parameter Request List Item: (12) Host Name
        Parameter Request List Item: (7) Log Server
        Parameter Request List Item: (42) Network Time Protocol Servers
    Option: (50) Requested IP Address (10.0.2.15)
    Option: (53) DHCP Message Type (Request)
    Option: (60) Vendor class identifier
        Length: 15
        Vendor class identifier: Juniper-mx10003
    Option: (51) IP Address Lease Time
    Option: (12) Host Name
    Option: (255) End
    Padding: 00
It requests several options, including the TFTP server address option 150, and the Vendor-Specific Information Option 43 or VSIO. The DHCP server can use option 60 to identify the vendor-specific information to send. For Juniper devices, option 43 encodes the image name and the configuration file name. They are fetched from the IP address provided in option 150. The official documentation on ZTP provides a valid configuration to answer such a request. However, it does not leverage the ability of the ISC DHCP Server to support several vendors and redefines option 43 to be Juniper-specific:
option NEW_OP-encapsulation code 43 = encapsulate NEW_OP;
Instead, it is possible to define an option space for Juniper, using a self-descriptive name, without overriding option 43:
# Juniper vendor option space
option space juniper;
option juniper.image-file-name     code 0 = text;
option juniper.config-file-name    code 1 = text;
option juniper.image-file-type     code 2 = text;
option juniper.transfer-mode       code 3 = text;
option juniper.alt-image-file-name code 4 = text;
option juniper.http-port           code 5 = text;
Then, when you need to set these suboptions, specify the vendor option space:
class "juniper-mx10003"  
  match if (option vendor-class-identifier = "Juniper-mx10003")  
  vendor-option-space juniper;
  option juniper.transfer-mode    "http";
  option juniper.image-file-name  "/images/junos-vmhost-install-mx-x86-64-19.3R2-S4.5.tgz";
  option juniper.config-file-name "/cfg/juniper-mx10003.txt";
 
This configuration returns the following answer:1
Dynamic Host Configuration Protocol (ACK)
    Message type: Boot Reply (2)
    Hardware type: Ethernet (0x01)
    Hardware address length: 6
    Hops: 0
    Transaction ID: 0x44e3a7c9
    Seconds elapsed: 0
    Bootp flags: 0x8000, Broadcast flag (Broadcast)
    Client IP address: 0.0.0.0
    Your (client) IP address: 10.0.2.15
    Next server IP address: 0.0.0.0
    Relay agent IP address: 0.0.0.0
    Client MAC address: 02:00:00:00:00:01 (02:00:00:00:00:01)
    Client hardware address padding: 00000000000000000000
    Server host name not given
    Boot file name not given
    Magic cookie: DHCP
    Option: (53) DHCP Message Type (ACK)
    Option: (54) DHCP Server Identifier (10.0.2.2)
    Option: (51) IP Address Lease Time
    Option: (1) Subnet Mask (255.255.255.0)
    Option: (3) Router
    Option: (6) Domain Name Server
    Option: (43) Vendor-Specific Information
        Length: 89
        Value: 00362f696d616765732f6a756e6f732d766d686f73742d69 
    Option: (150) TFTP Server Address
    Option: (255) End
Using vendor-option-space directive allows you to make different ZTP implementations coexist. For example, you can add the option space for PXE:
option space PXE;
option PXE.mtftp-ip    code 1 = ip-address;
option PXE.mtftp-cport code 2 = unsigned integer 16;
option PXE.mtftp-sport code 3 = unsigned integer 16;
option PXE.mtftp-tmout code 4 = unsigned integer 8;
option PXE.mtftp-delay code 5 = unsigned integer 8;
option PXE.discovery-control    code 6  = unsigned integer 8;
option PXE.discovery-mcast-addr code 7  = ip-address;
option PXE.boot-server code 8  =   unsigned integer 16, unsigned integer 8, ip-address  ;
option PXE.boot-menu   code 9  =   unsigned integer 16, unsigned integer 8, text  ;
option PXE.menu-prompt code 10 =   unsigned integer 8, text  ;
option PXE.boot-item   code 71 = unsigned integer 32;
class "pxeclients"  
  match if substring (option vendor-class-identifier, 0, 9) = "PXEClient";
  vendor-option-space PXE;
  option PXE.mtftp-ip 10.0.2.2;
  # [ ]
 
On the same topic, do not override option 125 VIVSO. See Zero-Touch Provisioning for Cisco IOS.

  1. Wireshark knows how to decode option 43 for some vendors, thanks to option 60, but not for Juniper.

2 November 2020

Vincent Bernat: My collection of vintage PC cards

Recently, I have been gathering some old hardware at my parents house, notably PC extension cards, as they don t take much room and can be converted to a nice display item. Unfortunately, I was not very concerned about keeping stuff around. Compared to all the hardware I have acquired over the years, only a few pieces remain.

Tseng Labs ET4000AX (1989) This SVGA graphics card was installed into a PC powered by a 386SX CPU running at 16 MHz. This was a good card at the time as it was pretty fast. It didn t feature 2D acceleration, unlike the later ET4000/W32. This version only features 512 KB of RAM. It can display 1024 768 images with 16 colors or 800 600 with 256 colors. It was also compatible with CGA, EGA, VGA, MDA, and Hercules modes. No contemporary games were using the SVGA modes but the higher resolutions were useful with Windows 3. This card was manufactured directly by Tseng Labs.
Carte Tseng Labs ET4000AX ISA au-dessus de la bo te "Plan te Aventure"
Tseng Labs ET4000 AX ISA card

AdLib clone (1992) My first sound card was an AdLib. My parents bought it in Canada during the summer holidays in 1992. It uses a Yamaha OPL2 chip to produce sound via FM synthesis. The first game I have tried is Indiana Jones and the Last Crusade. I think I gave this AdLib to a friend once I upgraded my PC with a Sound Blaster Pro 2. Recently, I needed one for a side project, but they are rare and expensive on eBay. Someone mentioned a cheap clone on Vogons, so I bought it. It was sold by Sun Moon Star in 1992 and shipped with a CD-ROM of Doom shareware.
AdLib clone on top of "Alone in the Dark" box
AdLib clone ISA card by Sun Moon Star
On this topic, take a look at OPL2LPT: an AdLib sound card for the parallel port and OPL2 Audio Board: an AdLib sound card for Arduino .

Sound Blaster Pro 2 (1992) Later, I switched the AdLib sound card with a Sound Blaster Pro 2. It features an OPL3 chip and was also able to output digital samples. At the time, this was a welcome addition, but not as important as the FM synthesis introduced earlier by the AdLib.
Sound Blaster Pro 2 on top of "Day of the Tentacle" box
Sound Blaster Pro 2 ISA card

Promise EIDE 2300 Plus (1995) I bought this card mostly for the serial port. I was using a 486DX2 running at 66 MHz with a Creatix LC 288 FC external modem. The serial port was driven by an 8250 UART with no buffer. Thanks to Terminate, I was able to connect to BBSes with DOS, but this was not possible with Windows 3 or OS/2. I needed one of these fancy new cards with a 16550 UART, featuring a 16-byte buffer. At the time, this was quite difficult to find in France. During a holiday trip, I convinced my parent to make a short detour from Los Angeles to San Diego to buy this Promise EIDE 2300 Plus controller card at a shop I located through an advertisement in a local magazine! The card also features an EIDE controller with multi-word DMA mode 2 support. In contrast with the older PIO modes, the CPU didn t have to copy data from disk to memory.
Promise EIDE 2300 Plus next to an OS/2 Warp CD
Promise EIDE 2300 Plus VLB card

3dfx Voodoo2 Magic 3D II (1998) The 3dfx Voodoo2 was one of the first add-in graphics cards implementing hardware acceleration of 3D graphics. I bought it from a friend along with his Pentium II box in 1999. It was a big evolutionary step in PC gaming, as games became more beautiful and fluid. A traditional video controller was still required for 2D. A pass-through VGA cable daisy-chained the video controller to the Voodoo, which was itself connected to the monitor.
3dfx Voodoo 2 Magic 3D II on top of "Jedi Knight: Dark Forces II" box
3dfx Voodoo2 Magic 3D II PCI card

3Com 3C905C-TX-M Tornado (1999) In the early 2000s, in college, the Internet connection on the campus was provided by a student association through a 100 Mbps Ethernet cable. If you wanted to reach the maximum speed, the 3Com 3C905C-TX-M PCI network adapter, nicknamed Tornado , was the card you needed. We would buy it second-hand by the dozen and sell them to other students for around 30 .
3COM 3C905C-TX-M on top of "Red Alert" box
3Com 3C905C-TX-M PCI card

1 November 2020

Vincent Bernat: Running Isso on NixOS in a Docker container

This short article documents how I run Isso, the commenting system used by this blog, inside a Docker container on NixOS, a Linux distribution built on top of Nix. Nix is a declarative package manager for Linux and other Unix systems.
While NixOS 20.09 includes a derivation for Isso, it is unfortunately broken and relies on Python 2. As I am also using a fork of Isso, I have built my own derivation, heavily inspired by the one in master:1
issoPackage = with pkgs.python3Packages; buildPythonPackage rec  
  pname = "isso";
  version = "custom";
  src = pkgs.fetchFromGitHub  
    # Use my fork
    owner = "vincentbernat";
    repo = pname;
    rev = "vbe/master";
    sha256 = "0vkkvjcvcjcdzdj73qig32hqgjly8n3ln2djzmhshc04i6g9z07j";
   ;
  propagatedBuildInputs = [
    itsdangerous
    jinja2
    misaka
    html5lib
    werkzeug
    bleach
    flask-caching
  ];
  buildInputs = [
    cffi
  ];
  checkInputs = [ nose ];
  checkPhase = ''
    $ python.interpreter  setup.py nosetests
  '';
 ;
I want to run Isso through Gunicorn. To this effect, I build a Python environment combining Isso and Gunicorn. Then, I can invoke the latter with "$ issoEnv /bin/gunicorn", like with a virtual environment.
issoEnv = pkgs.python3.buildEnv.override  
    extraLibs = [
      issoPackage
      pkgs.python3Packages.gunicorn
      pkgs.python3Packages.gevent
    ];
 ;
Before building a Docker image, I also need to specify the Isso configuration file for Isso:
issoConfig = pkgs.writeText "isso.conf" ''
  [general]
  dbpath = /db/comments.db
  host =
    https://vincent.bernat.ch
    http://localhost:8080
  notify = smtp
  [ ]
'';
NixOS comes with a convenient tool to build a Docker image without a Dockerfile:
issoDockerImage = pkgs.dockerTools.buildImage  
  name = "isso";
  tag = "latest";
  extraCommands = ''
    mkdir -p db
  '';
  config =  
    Cmd = [ "$ issoEnv /bin/gunicorn"
            "--name" "isso"
            "--bind" "0.0.0.0:$ port "
            "--worker-class" "gevent"
            "--workers" "2"
            "--worker-tmp-dir" "/dev/shm"
            "--preload"
            "isso.run"
          ];
    Env = [
      "ISSO_SETTINGS=$ issoConfig "
      "SSL_CERT_FILE=$ pkgs.cacert /etc/ssl/certs/ca-bundle.crt"
    ];
   ;
 ;
Because we refer to the issoEnv derivation in config.Cmd, the whole derivation, including Isso and Gunicorn, is copied inside the Docker image. The same applies for issoConfig, the configuration file we created earlier, and pkgs.cacert, the derivation containing trusted root certificates. The resulting image is 171 MB once installed, which is comparable to the Debian Buster image generated by the official Dockerfile. NixOS features an abstraction to run Docker containers. It is not currently documented in NixOS manual but you can look at the source code of the module for the available options. I choose to use Podman instead of Docker as the backend because it does not require running an additional daemon.
virtualisation.oci-containers =  
  backend = "podman";
  containers =  
    isso =  
      image = "isso";
      imageFile = issoDockerImage;
      ports = ["127.0.0.1:$ port :$ port "];
      volumes = [
        "/var/db/isso:/db"
      ];
     ;
   ;
 ;
A systemd unit file is automatically created to run and supervise the container:
$ systemctl status podman-isso.service
  podman-isso.service
     Loaded: loaded (/nix/store/a66gzqqwcdzbh99sz8zz5l5xl8r8ag7w-unit->
     Active: active (running) since Sun 2020-11-01 16:04:16 UTC; 4min 44s ago
    Process: 14564 ExecStartPre=/nix/store/95zfn4vg4867gzxz1gw7nxayqcl>
   Main PID: 14697 (podman)
         IP: 0B in, 0B out
      Tasks: 10 (limit: 2313)
     Memory: 221.3M
        CPU: 10.058s
     CGroup: /system.slice/podman-isso.service
              14697 /nix/store/pn52xgn1wb2vr2kirq3xj8ij0rys35mf-podma>
              14802 /nix/store/7vsba54k6ag4cfsfp95rvjzqf6rab865-conmo>
nov. 01 16:04:17 web03 podman[14697]: container init (image=localhost/isso:latest)
nov. 01 16:04:17 web03 podman[14697]: container start (image=localhost/isso:latest)
nov. 01 16:04:17 web03 podman[14697]: container attach (image=localhost/isso:latest)
nov. 01 16:04:19 web03 conmon[14802]: INFO: connected to SMTP server
nov. 01 16:04:19 web03 conmon[14802]: INFO: connected to https://vincent.bernat.ch
nov. 01 16:04:19 web03 conmon[14802]: [INFO] Starting gunicorn 20.0.4
nov. 01 16:04:19 web03 conmon[14802]: [INFO] Listening at: http://0.0.0.0:8080 (1)
nov. 01 16:04:19 web03 conmon[14802]: [INFO] Using worker: gevent
nov. 01 16:04:19 web03 conmon[14802]: [INFO] Booting worker with pid: 8
nov. 01 16:04:19 web03 conmon[14802]: [INFO] Booting worker with pid: 9
As the last step, we configure Nginx to forward requests for comments.luffy.cx to the container. NixOS provides a simple integration to grab a Let s Encrypt certificate.
services.nginx.virtualHosts."comments.luffy.cx" =  
  root = "/data/webserver/comments.luffy.cx";
  enableACME = true;
  forceSSL = true;
  extraConfig = ''
    access_log /var/log/nginx/comments.luffy.cx.log anonymous;
  '';
  locations."/" =  
    proxyPass = "http://127.0.0.1:$ port ";
    extraConfig = ''
      proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
      proxy_set_header Host $host;
      proxy_set_header X-Forwarded-Proto $scheme;
      proxy_hide_header Set-Cookie;
      proxy_hide_header X-Set-Cookie;
      proxy_ignore_headers Set-Cookie;
    '';
   ;
 ;
security.acme.certs."comments.luffy.cx" =  
  email = lib.concatStringsSep "@" [ "letsencrypt" "vincent.bernat.ch" ];
 ;

While I still struggle with Nix and NixOS, I am convinced this is how declarative infrastructure should be done. I like how in one single file, I can define the derivation to build Isso, the configuration, the Docker image, the container definition, and the Nginx configuration. The Nix language is used both for building packages and for managing configurations. Moreover, the Docker image is updated automatically like a regular NixOS host. This solves an issue plaguing the Docker ecosystem: no more stale images! My next step would be to combine this approach with Nomad, a simple orchestrator to deploy and manage containers.

  1. There is a subtle difference: I am using buildPythonPackage instead of buildPythonApplication. This is important for the next step. I didn t investigate if an application can be converted to a package easily.

29 September 2020

Vincent Bernat: Speeding up bgpq4 with IRRd in a container

When building route filters with bgpq4 or bgpq3, the speed of rr.ntt.net or whois.radb.net can be a bottleneck. Updating many filters may take several tens of minutes, depending on the load:
$ time bgpq4 -h whois.radb.net AS-HURRICANE   wc -l
909869
1.96s user 0.15s system 2% cpu 1:17.64 total
$ time bgpq4 -h rr.ntt.net AS-HURRICANE   wc -l
927865
1.86s user 0.08s system 12% cpu 14.098 total
A possible solution is to run your own IRRd instance in your network, mirroring the main routing registries. A close alternative is to bundle IRRd with all the data in a ready-to-use Docker image. This also has the advantage of easy integration into a Docker-based CI/CD pipeline.
$ git clone https://github.com/vincentbernat/irrd-legacy.git -b blade/master
$ cd irrd-legacy
$ docker build . -t irrd-snapshot:latest
[ ]
Successfully built 58c3e83a1d18
Successfully tagged irrd-snapshot:latest
$ docker container run --rm --detach --publish=43:43 irrd-snapshot
4879cfe7413075a0c217089dcac91ed356424c6b88808d8fcb01dc00eafcc8c7
$ time bgpq4 -h localhost AS-HURRICANE   wc -l
904137
1.72s user 0.11s system 96% cpu 1.881 total
The Dockerfile contains three stages:
  1. building IRRd,1
  2. retrieving various IRR databases, and
  3. assembling the final container with the result of the two previous stages.
The second stage fetches the databases used by rr.ntt.net: NTTCOM, RADB, RIPE, ALTDB, BELL, LEVEL3, RGNET, APNIC, JPIRR, ARIN, BBOI, TC, AFRINIC, ARIN-WHOIS, and REGISTROBR. However, it misses RPKI.2 Feel free to adapt! The image can be scheduled to be rebuilt daily or weekly, depending on your needs. The repository includes a .gitlab-ci.yaml file automating the build and triggering the compilation of all filters by your CI/CD upon success.

  1. Instead of using the latest version of IRRd, the image relies on an older version that does not require a PostgreSQL instance and uses flat files instead.
  2. Unlike the others, the RPKI database is built from the published RPKI ROAs. They can be retrieved with rpki-client and transformed into RPSL objects to be imported in IRRd.

28 September 2020

Vincent Bernat: Syncing RIPE, ARIN and APNIC objects with a custom Ansible module

Internet is split into five regional Internet registry: AFRINIC, ARIN, APNIC, LACNIC and RIPE. Each RIR maintains an Internet Routing Registry. An IRR allows one to publish information about the routing of Internet number resources.1 Operators use this to determine the owner of an IP address and to construct and maintain routing filters. To ensure your routes are widely accepted, it is important to keep the prefixes you announce up-to-date in an IRR. There are two common tools to query this database: whois and bgpq4. The first one allows you to do a query with the WHOIS protocol:
$ whois -BrG 2a0a:e805:400::/40
[ ]
inet6num:       2a0a:e805:400::/40
netname:        FR-BLADE-CUSTOMERS-DE
country:        DE
geoloc:         50.1109 8.6821
admin-c:        BN2763-RIPE
tech-c:         BN2763-RIPE
status:         ASSIGNED
mnt-by:         fr-blade-1-mnt
remarks:        synced with cmdb
created:        2020-05-19T08:04:58Z
last-modified:  2020-05-19T08:04:58Z
source:         RIPE
route6:         2a0a:e805:400::/40
descr:          Blade IPv6 - AMS1
origin:         AS64476
mnt-by:         fr-blade-1-mnt
remarks:        synced with cmdb
created:        2019-10-01T08:19:34Z
last-modified:  2020-05-19T08:05:00Z
source:         RIPE
The second one allows you to build route filters using the information contained in the IRR database:
$ bgpq4 -6 -S RIPE -b AS64476
NN = [
    2a0a:e805::/40,
    2a0a:e805:100::/40,
    2a0a:e805:300::/40,
    2a0a:e805:400::/40,
    2a0a:e805:500::/40
];
There is no module available on Ansible Galaxy to manage these objects. Each IRR has different ways of being updated. Some RIRs propose an API but some don t. If we restrict ourselves to RIPE, ARIN and APNIC, the only common method to update objects is email updates, authenticated with a password or a GPG signature.2 Let s write a custom Ansible module for this purpose!

Notice I recommend that you read Writing a custom Ansible module as an introduction, as well as Syncing MySQL tables for a more instructive example.

Code The module takes a list of RPSL objects to synchronize and returns the body of an email update if a change is needed:
- name: prepare RIPE objects
  irr_sync:
    irr: RIPE
    mntner: fr-blade-1-mnt
    source: whois-ripe.txt
  register: irr

Prerequisites The source file should be a set of objects to sync using the RPSL language. This would be the same content you would send manually by email. All objects should be managed by the same maintainer, which is also provided as a parameter. Signing and sending the result is not the responsibility of this module. You need two additional tasks for this purpose:
- name: sign RIPE objects
  shell:
    cmd: gpg --batch --user noc@example.com --clearsign
    stdin: "  irr.objects  "
  register: signed
  check_mode: false
  changed_when: false
- name: update RIPE objects by email
  mail:
    subject: "NEW: update for RIPE"
    from: noc@example.com
    to: "auto-dbm@ripe.net"
    cc: noc@example.com
    host: smtp.example.com
    port: 25
    charset: us-ascii
    body: "  signed.stdout  "
You also need to authorize the PGP keys used to sign the updates by creating a key-cert object and adding it as a valid authentication method for the corresponding mntner object:
key-cert:  PGPKEY-A791AAAB
certif:    -----BEGIN PGP PUBLIC KEY BLOCK-----
certif:    
certif:    mQGNBF8TLY8BDADEwP3a6/vRhEERBIaPUAFnr23zKCNt5YhWRZyt50mKq1RmQBBY
[ ]
certif:    -----END PGP PUBLIC KEY BLOCK-----
mnt-by:    fr-blade-1-mnt
source:    RIPE
mntner:    fr-blade-1-mnt
[ ]
auth:      PGPKEY-A791AAAB
mnt-by:    fr-blade-1-mnt
source:    RIPE

Module definition Starting from the skeleton described in the previous article, we define the module:
module_args = dict(
    irr=dict(type='str', required=True),
    mntner=dict(type='str', required=True),
    source=dict(type='path', required=True),
)
result = dict(
    changed=False,
)
module = AnsibleModule(
    argument_spec=module_args,
    supports_check_mode=True
)

Getting existing objects To grab existing objects, we use the whois command to retrieve all the objects from the provided maintainer.
# Per-IRR variations:
# - whois server
whois =  
    'ARIN': 'rr.arin.net',
    'RIPE': 'whois.ripe.net',
    'APNIC': 'whois.apnic.net'
 
# - whois options
options =  
    'ARIN': ['-r'],
    'RIPE': ['-BrG'],
    'APNIC': ['-BrG']
 
# - objects excluded from synchronization
excluded = ["domain"]
if irr == "ARIN":
    # ARIN does not return these objects
    excluded.extend([
        "key-cert",
        "mntner",
    ])
# Grab existing objects
args = ["-h", whois[irr],
        "-s", irr,
        *options[irr],
        "-i", "mnt-by",
        module.params['mntner']]
proc = subprocess.run("whois", *args, capture_output=True)
if proc.returncode != 0:
    raise AnsibleError(
        f"unable to query whois:  args ")
output = proc.stdout.decode('ascii')
got = extract(output, excluded)
The first part of the code setup some IRR-specific constants: the server to query, the options to provide to the whois command and the objects to exclude from synchronization. The second part invokes the whois command, requesting all objects whose mnt-by field is the provided maintainer. Here is an example of output:
$ whois -h whois.ripe.net -s RIPE -BrG -i mnt-by fr-blade-1-mnt
[ ]
inet6num:       2a0a:e805:300::/40
netname:        FR-BLADE-CUSTOMERS-FR
country:        FR
geoloc:         48.8566 2.3522
admin-c:        BN2763-RIPE
tech-c:         BN2763-RIPE
status:         ASSIGNED
mnt-by:         fr-blade-1-mnt
remarks:        synced with cmdb
created:        2020-05-19T08:04:59Z
last-modified:  2020-05-19T08:04:59Z
source:         RIPE
[ ]
route6:         2a0a:e805:300::/40
descr:          Blade IPv6 - PA1
origin:         AS64476
mnt-by:         fr-blade-1-mnt
remarks:        synced with cmdb
created:        2019-10-01T08:19:34Z
last-modified:  2020-05-19T08:05:00Z
source:         RIPE
[ ]
The result is passed to the extract() function. It parses and normalizes the results into a dictionary mapping object names to objects. We store the result in the got variable.
def extract(raw, excluded):
    """Extract objects."""
    # First step, remove comments and unwanted lines
    objects = "\n".join([obj
                         for obj in raw.split("\n")
                         if not obj.startswith((
                                 "#",
                                 "%",
                         ))])
    # Second step, split objects
    objects = [RPSLObject(obj.strip())
               for obj in re.split(r"\n\n+", objects)
               if obj.strip()
               and not obj.startswith(
                   tuple(f" x :" for x in excluded))]
    # Last step, put objects in a dict
    objects =  repr(obj): obj
               for obj in objects 
    return objects
RPSLObject() is a class enabling normalization and comparison of objects. Look at the module code for more details.
>>> output="""
... inet6num:       2a0a:e805:300::/40
... [ ]
... """
>>> pprint( k: str(v) for k,v in extract(output, excluded=[]) )
 '<Object:inet6num:2a0a:e805:300::/40>':
   'inet6num:       2a0a:e805:300::/40\n'
   'netname:        FR-BLADE-CUSTOMERS-FR\n'
   'country:        FR\n'
   'geoloc:         48.8566 2.3522\n'
   'admin-c:        BN2763-RIPE\n'
   'tech-c:         BN2763-RIPE\n'
   'status:         ASSIGNED\n'
   'mnt-by:         fr-blade-1-mnt\n'
   'remarks:        synced with cmdb\n'
   'source:         RIPE',
 '<Object:route6:2a0a:e805:300::/40>':
   'route6:         2a0a:e805:300::/40\n'
   'descr:          Blade IPv6 - PA1\n'
   'origin:         AS64476\n'
   'mnt-by:         fr-blade-1-mnt\n'
   'remarks:        synced with cmdb\n'
   'source:         RIPE' 

Comparing with wanted objects Let s build the wanted dictionary using the same structure, thanks to the extract() function we can use verbatim:
with open(module.params['source']) as f:
    source = f.read()
wanted = extract(source, excluded)
The next step is to compare got and wanted to build the diff object:
if got != wanted:
    result['changed'] = True
    if module._diff:
        result['diff'] = [
            dict(before_header=k,
                 after_header=k,
                 before=str(got.get(k, "")),
                 after=str(wanted.get(k, "")))
            for k in set((*wanted.keys(), *got.keys()))
            if k not in wanted or k not in got or wanted[k] != got[k]]

Returning updates The module does not have a side effect. If there is a difference, we return the updates to send by email. We choose to include all wanted objects in the updates (contained in the source variable) and let the IRR ignore unmodified objects. We also append the objects to be deleted by adding a delete: attribute to each them them.
# We send all source objects and deleted objects.
deleted_mark = f" 'delete:':16 deleted by CMDB"
deleted = "\n\n".join([f" got[k].raw \n deleted_mark "
                       for k in got
                       if k not in wanted])
result['objects'] = f" source \n\n deleted "
module.exit_json(**result)

The complete code is available on GitHub. The module supports both --diff and --check flags. It does not return anything if no change is detected. It can work with APNIC, RIPE and ARIN. It is not perfect: it may not detect some changes,3 it is not able to modify objects not owned by the provided maintainer4 and some attributes cannot be modified, requiring to manually delete and recreate the updated object.5 However, this module should automate 95% of your needs.

  1. Other IRRs exist without being attached to a RIR. The most notable one is RADb.
  2. ARIN is phasing out this method in favor of IRR-online. RIPE has an API available, but email updates are still supported and not planned to be deprecated. APNIC plans to expose an API.
  3. For ARIN, we cannot query key-cert and mntner objects and therefore we cannot detect changes in them. It is also not possible to detect changes to the auth mechanisms of a mntner object.
  4. APNIC do not assign top-level objects to the maintainer associated with the owner.
  5. Changing the status of an inetnum object requires deleting and recreating the object.

Next.

Previous.