Search Results: "beh"

11 May 2023

Simon Josefsson: Streamlined NTRU Prime sntrup761 goes to IETF

The OpenSSH project added support for a hybrid Streamlined NTRU Prime post-quantum key encapsulation method sntrup761 to strengthen their X25519-based default in their version 8.5 released on 2021-03-03. While there has been a lot of talk about post-quantum crypto generally, my impression has been that there has been a slowdown in implementing and deploying them in the past two years. Why is that? Regardless of the answer, we can try to collaboratively change things, and one effort that appears strangely missing are IETF documents for these algorithms. Building on some earlier work that added X25519/X448 to SSH, writing a similar document was relatively straight-forward once I had spent a day reading OpenSSH and TinySSH source code to understand how it worked. While I am not perfectly happy with how the final key is derived from the sntrup761/X25519 secrets it is a SHA512 call on the concatenated secrets I think the construct deserves to be better documented, to pave the road for increased confidence or better designs. Also, reusing the RFC5656 4 structs makes for a worse specification (one unnecessary normative reference), but probably a simpler implementation. I have published draft-josefsson-ntruprime-ssh-00 here. Credit here goes to Jan Moj of TinySSH that designed the earlier sntrup4591761x25519-sha512@tinyssh.org in 2018, Markus Friedl who added it to OpenSSH in 2019, and Damien Miller that changed it to sntrup761 in 2020. Does anyone have more to add to the history of this work? Once I had sharpened my xml2rfc skills, preparing a document describing the hybrid construct between the sntrup761 key-encapsulation mechanism and the X25519 key agreement method in a non-SSH fashion was easy. I do not know if this work is useful, but it may serve as a reference for further study. I published draft-josefsson-ntruprime-hybrid-00 here. Finally, how about a IETF document on the base Streamlined NTRU Prime? Explaining all the details, and especially the math behind it would be a significant effort. I started doing that, but realized it is a subjective call when to stop explaining things. If we can t assume that the reader knows about lattice math, is a document like this the best place to teach it? I settled for the most minimal approach instead, merely giving an introduction to the algorithm, included SageMath and C reference implementations together with test vectors. The IETF audience rarely understands math, so I think it is better to focus on the bits on the wire and the algorithm interfaces. Everything here was created by the Streamlined NTRU Prime team, I merely modified it a bit hoping I didn t break too much. I have now published draft-josefsson-ntruprime-streamlined-00 here. I maintain the IETF documents on my ietf-ntruprime GitLab page, feel free to open merge requests or raise issues to help improve them. To have confidence in the code was working properly, I ended up preparing a branch with sntrup761 for the GNU-project Nettle and have submitted it upstream for review. I had the misfortune of having to understand and implement NIST s DRBG-CTR to compute the sntrup761 known-answer tests, and what a mess it is. Why does a deterministic random generator support re-seeding? Why does it support non-full entropy derivation? What s with the key size vs block size confusion? What s with the optional parameters? What s with having multiple algorithm descriptions? Luckily I was able to extract a minimal but working implementation that is easy to read. I can t locate DRBG-CTR test vectors, anyone? Does anyone have sntrup761 test vectors that doesn t use DRBG-CTR? One final reflection on publishing known-answer tests for an algorithm that uses random data: are the test vectors stable over different ways to implement the algorithm? Just consider of some optimization moved one randomness-extraction call before another, then wouldn t the output be different? Are there other ways to verify correctness of implementations? As always, happy hacking!

9 May 2023

Dirk Eddelbuettel: crc32c 0.0.1 on CRAN: New Package

Happy to announce a new package: crc32c. This arose out of a user request to add crc32c (which is related to but differnt from crc32 without the trailing c) to my digest package. Which I did (for now in a branch), using the software-fallback version of crc32c from the reference implementation by Google at their crc32c repo. However, the Google repo also offers hardware-accelerated versions and switches at run-time. So I pondered a little about how to offer the additional performance without placing a dependency burden on all users of digest. Lo and behold, I think I found a solution by reusing what R offers. First off, the crc32c package wraps the Google repo cleanly and directly. We include all the repo code but not the logging or benchmarking code. This keeps external dependencies down to just cmake. Which while still rare in the CRAN world is at least not entirely uncommon. So now each time you build the crc32c R package, the upstream hardware detection is added transparently thanks in part to cmake. We build libcrc32c.a as a static library and include it in the R package and its own shared library. And we added exporting of three key functions, directly at the C level, along with exporting one C level function of package that other packages can call. The distinction matters. The second step of offering a function R can call (also from other packages) is used and documented. By means of an Imports: statement to instantiate the package providing the functionality, the client package obtains access to a compiled functions its R code can then call. A number of other R packages use this. But what is less well known is that we can do the same with C level functions. Again, it takes an imports statement but once that is done we can call C to C . Which is quite nice. I am working currently on the branch in digest which motivated this, and it can import the automagically hardware-accelerated functionality on the suitable hardware. Stay tuned for a change in digest. I also won and lost the CRAN lottery for once: the package made it through the new package checks without any revisions. Only to then immediately fail on the CRAN machines using gcc-13 as a -fPIC was seen as missing when making the shared library. Even though both my CI and the r-universe builds were all green. Ah well. So a 0.0.2 release will be coming in a day or two. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

2 May 2023

Neil Williams: Carrying Grief

This isn't a book review, although the reason that I am typing this now is because of a book, You Are Not Alone: from the creator and host of Griefcast, Cariad Lloyd, ISBN: 978-1526621870 and I include a handful of quotes from Cariad where there is really no better way of describing things. Many people experience death for the first time as a child, often relating to a family pet. Death is universal but every experience of death is unique. One of the myths of grief is the idea of the Five Stages but this is a misinterpretation. Denial, Anger, Bargaining, Depression and Acceptance represent the five stage model of death and have nothing to do with grief. The five stages were developed from studying those who are terminally ill, the dying, not those who then grieve for the dead person and have to go on living without them. Grief is for those who loved the person who has died and it varies between each of those people just as people vary in how they love someone. The Five Stages end at the moment of death, grief is what comes next and most people do not grieve in stages, it can be more like a tangled knot. Death has a date and time, so that is why the last stage of the model is Acceptance. Grief has no timetable, those who grieve will carry that grief for the rest of their lives. Death starts the process of grief in those who go on living just as it ends the life of the person who is loved. "Grief eases and changes and returns but it never disappears.". I suspect many will have already stopped reading by this point. People do not talk about death and grief enough and this only adds to the burden of those who carry their grief. It can be of enormous comfort to those who have carried grief for some time to talk directly about the dead, not in vague pleasantries but with specific and strong memories. Find a safe place without distractions and talk with the person grieving face to face. Name the dead person. Go to places with strong memories and be there alongside. Talk about the times with that person before their death. Early on, everything about grief is painful and sad. It does ease but it remains unpredictable. Closing it away in a box inside your head (as I did at one point) is like cutting off a damaged limb but keeping the pain in a box on the shelf. You still miss the limb and eventually, the box starts leaking. For me, there were family pets which died but my first job out of university was to work in hospitals, helping the nurses manage the medication regimen and providing specialist advice as a pharmacist. It will not be long in that environment before everyone on the ward gets direct experience of the death of a person. In some ways, this helped me to separate the process of death from the process of grief. I cared for these people as patients but these were not my loved ones. Later, I worked in specialist terminal care units, including providing potential treatments as part of clinical trials. Here, it was not expected for any patient to be discharged alive. The more aggressive chemotherapies had already been tried and had failed, this was about pain relief, symptom management and helping the loved ones. Palliative care is not just about the patient, it involves helping the loved ones to accept what is happening as this provides comfort to the patient by closing the loop. Grief is stressful. One of the most common causes of personal stress is bereavement. The death of your loved one is outside of your control, it has happened, no amount of regret can change that. Then come all the other stresses, maybe about money or having somewhere to live as a result of what else has changed after the death or having to care for other loved ones. In the early stages, the first two years, I found it helpful to imagine my life as a box containing a ball and a button. The button triggers new waves of pain and loss each time it is hit. The ball bounces around the box and hits the button at random. Initially, the button is large and the ball is enormous, so the button is hit almost constantly. Over time, both the button and the ball change size. Starting off at maximum, initially there is only one direction of change. There are two problems with this analogy. First is that the grief ball has infinite energy which does not happen in reality. The ball may get smaller and the button harder to hit but the ball will continue bouncing. Secondly, the life box is not a predictable shape, so the pattern of movement of the ball is unpredictable. A single stress is one thing, but what has happened since has just kept adding more stress for me. Shortly before my father died 5 years ago now, I had moved house. Then, I was made redundant on the day of the first anniversary of my father's death. A year or so later, my long term relationship failed and a few months after that COVID-19 appeared. As the country eased out of the pandemic in 2021, my mother died (unrelated to COVID itself). A year after that, I had to take early retirement. My brother and sister, of course, share a lot of those stressors. My brother, in particular, took the responsibility for organising both funerals and did most of the visits to my mother before her death. The grief is different for each of the surviving family. Cariad's book helped me understand why I was getting frequent ideas about going back to visit places which my father and I both knew. My parents encouraged each of us to work hard to leave Port Talbot (or Pong Toilet locally) behind, in no small part due to the unrestrained pollution and deprivation that is common to small industrial towns across Wales, the midlands and the north of the UK. It wasn't that I wanted to move house back to our ancestral roots. It was my grief leaking out of the box. Yes, I long for mountains and the sea because I'm now living in a remorselessly flat and landlocked region after moving here for employment. However, it was my grief driving those longings - not for the physical surroundings but out of the shared memories with my father. I can visit those memories without moving house, I just need to arrange things so that I can be undisturbed and undistracted. I am not alone with my grief and I am grateful to my friends who have helped whilst carrying their own grief. It is necessary for everyone to think and talk about death and grief. In respect of your own death, no matter how far ahead that may be, consider Advance Care Planning and Expressions of Wish as well as your Will. Talk to people, document what you want. Your loved ones will be grateful and they deserve that much whilst they try to cope with the first onslaught of grief. Talk to your loved ones and get them to do the same for themselves. Normalise talking about death with your family, especially children. None of us are getting out of this alive and we will all leave behind people who will grieve.

29 April 2023

Simon Josefsson: How To Trust A Machine

Let s reflect on some of my recent work that started with understanding Trisquel GNU/Linux, improving transparency into apt-archives, working on reproducible builds of Trisquel, strengthening verification of apt-archives with Sigstore, and finally thinking about security device threat models. A theme in all this is improving methods to have trust in machines, or generally any external entity. While I believe that everything starts by trusting something, usually something familiar and well-known, we need to deal with misuse of that trust that leads to failure to deliver what is desired and expected from the trusted entity. How can an entity behave to invite trust? Let s argue for some properties that can be quantitatively measured, with a focus on computer software and hardware: Essentially, this boils down to: Trust, Verify and Hold Accountable. To put this dogma in perspective, it helps to understand that this approach may be harmful to human relationships (which could explain the social awkwardness of hackers), but it remains useful as a method to improve the design of computer systems, and a useful method to evaluate safety of computer systems. When a system fails some of the criteria above, we know we have more work to do to improve it. How far have we come on this journey? Through earlier efforts, we are in a fairly good situation. Richard Stallman through GNU/FSF made us aware of the importance of free software, the Reproducible/Bootstrappable build projects made us aware of the importance of verifiability, and Certificate Transparency highlighted the need for accountable signature logs leading to efforts like Sigstore for software. None of these efforts would have seen the light of day unless people wrote free software and packaged them into distributions that we can use, and built hardware that we can run it on. While there certainly exists more work to be done on the software side, with the recent amazing full-source build of Guix based on a 357-byte hand-written seed, I believe that we are closing that loop on the software engineering side. So what remains? Some inspiration for further work: Onwards and upwards, happy hacking! Update 2023-05-03: Added the Liberating property regarding free software, instead of having it be part of the Verifiability and Transparency .

28 April 2023

Enrico Zini: Handling keyboard-like devices

CNC control panel and Bluetooth pedal page turner I acquired some unusual input devices to experiment with, like a CNC control panel and a bluetooth pedal page turner. These identify and behave like a keyboard, sending nice and simple keystrokes, and can be accessed with no drivers or other special software. However, their keystrokes appear together with keystrokes from normal keyboards, which is the expected default when plugging in a keyboard, but not what I want in this case. I'd also like them to be readable via evdev and accessible by my own user. Here's the udev rule I cooked up to handle this use case:
# Handle the CNC control panel
SUBSYSTEM=="input", ENV ID_VENDOR =="04d9", ENV ID_MODEL =="1203", \
   OWNER="enrico", ENV ID_INPUT =""
# Handle the Bluetooth page turner
SUBSYSTEM=="input", ENV ID_BUS =="bluetooth", ENV LIBINPUT_DEVICE_GROUP =="*/ mac ", ENV ID_INPUT_KEYBOARD ="1" \
   OWNER="enrico", ENV ID_INPUT ="", SYMLINK+="input/by-id/bluetooth- mac -kbd"
SUBSYSTEM=="input", ENV ID_BUS =="bluetooth", ENV LIBINPUT_DEVICE_GROUP =="*/ mac ", ENV ID_INPUT_TABLET ="1" \
   OWNER="enrico", ENV ID_INPUT ="", SYMLINK+="input/by-id/bluetooth- mac -tablet"
The bluetooth device didn't have standard rules to create /dev/input/by-id/ symlinks so I added them. In my own code, I watch /dev/input/by-id with inotify to handle when devices appear or disappear. I used udevadm info /dev/input/event to see what I could use to identify the device. The Static device configuration via udev page of libinput's documentation has documentation on the various elements specific to the input subsystem Grepping rule files in /usr/lib/udev/rules.d was useful to see syntax examples. udevadm test /dev/input/event was invaluable for syntax checking and testing my rule file while working on it. Finally, this is an extract of a quick prototype Python code to read keys from the CNC control panel:
import libevdev
KEY_MAP =  
    libevdev.EV_KEY.KEY_GRAVE: "EMERGENCY",
    # InputEvent(EV_KEY, KEY_LEFTALT, 1)
    libevdev.EV_KEY.KEY_R: "CYCLE START",
    libevdev.EV_KEY.KEY_F5: "SPINDLE ON/OFF",
    # InputEvent(EV_KEY, KEY_RIGHTCTRL, 1)
    libevdev.EV_KEY.KEY_W: "REDO",
    # InputEvent(EV_KEY, KEY_LEFTALT, 1)
    libevdev.EV_KEY.KEY_N: "SINGLE STEP",
    # InputEvent(EV_KEY, KEY_LEFTCTRL, 1)
    libevdev.EV_KEY.KEY_O: "ORIGIN POINT",
    libevdev.EV_KEY.KEY_ESC: "STOP",
    libevdev.EV_KEY.KEY_KPPLUS: "SPEED UP",
    libevdev.EV_KEY.KEY_KPMINUS: "SLOW DOWN",
    libevdev.EV_KEY.KEY_F11: "F+",
    libevdev.EV_KEY.KEY_F10: "F-",
    libevdev.EV_KEY.KEY_RIGHTBRACE: "J+",
    libevdev.EV_KEY.KEY_LEFTBRACE: "J-",
    libevdev.EV_KEY.KEY_UP: "+Y",
    libevdev.EV_KEY.KEY_DOWN: "-Y",
    libevdev.EV_KEY.KEY_LEFT: "-X",
    libevdev.EV_KEY.KEY_RIGHT: "+X",
    libevdev.EV_KEY.KEY_KP7: "+A",
    libevdev.EV_KEY.KEY_Q: "-A",
    libevdev.EV_KEY.KEY_PAGEDOWN: "-Z",
    libevdev.EV_KEY.KEY_PAGEUP: "+Z",
 
class KeyReader:
    def __init__(self, path: str):
        self.path = path
        self.fd: IO[bytes]   None = None
        self.device: libevdev.Device   None = None
    def __enter__(self):
        self.fd = open(self.path, "rb")
        self.device = libevdev.Device(self.fd)
        return self
    def __exit__(self, exc_type, exc, tb):
        self.device = None
        self.fd.close()
        self.fd = None
    def events(self) -> Iterator[dict[str, Any]]:
        for e in self.device.events():
            if e.type == libevdev.EV_KEY:
                if (val := KEY_MAP.get(e.code)):
                    yield  
                        "name": val,
                        "value": e.value,
                        "sec": e.sec,
                        "usec": e.usec,
                     
Edited: added rules to handle the Bluetooth page turner

Shirish Agarwal: John Grisham s books, Evolution removed from textbooks

Gray Mountain John Grisham I have been perusing John Grisham s books, some read and some re-read again. Almost all of the books that Mr. Grisham wrote are relevant even today. The Gray Mountain talks about how mountain top removal was done in Applachia, the U.S. (South). In fact NASA made a summary which either was borrowed from this book or the author borrowed from NASA, either could be true although seems it might be the former. And this is when GOI just made a new Forest Conservation Bill 2023 which does the opposite. There are many examples of the same, the latest from my own city as an e.g. Vetal Tekdi is and was a haven for people animals, birds all kinds of ecosystem and is a vital lung of the city, one of the few remaining green spaces in the city but BJP wants to commercialize it as it has been doing for everything, I haven t been to the Himalayas since 4-5 years back as I hear the rape of the land daily. Even after Joshimath tragedy, if people are not opening the eyes what can I do  The more I say, the more depressed I will become so will leave it for now. In many ways the destruction seems similar to the destruction that happened in Brazil under Jair Bolsonaro. So as can be seen from what I have shared this book was mostly about environment and punitive damages and also how punitive damages have been ceiling in America (South). This was done via lobbying by the coal groups and apparently destroyed people s lives, communities, even whole villages. It also shared how most people called black lung and how those claims had been denied by Coal Companies all the time. And there are hardly any unions. While the book itself is a fiction piece, there is a large amount of truth in it. And that is one of the reasons people write a fiction book. You could write about reality using fictional names and nobody can sue you while you set the reality as it is. In many ways, it is a tell-all.

The Testament John Grisham One of the things I have loved about John Grisham is he understands human condition. In this book it starts with an eccentric billionaire who makes a will which leaves all his property to an illegitimate child who coincidentally also lives in Brazil, she is a missionary. The whole book is about human failings as well as about finding the heir. I am not going to give much detail as the book itself is fun.

The Appeal John Grisham In this, we are introduced to a company that does a chemical spill for decades and how that leads to lobbying and funding Judicial elections. It does go into quite a bit of length how private money coming from Big business does all kind of shady things to get their person elected to the Supreme Court. Sadly, this seems to happen all the time, for e.g. two weeks ago. This piece from the Atlantic also says the same. Again, won t tell as there is a bit of irony at the end of the book.

The Rainmaker John Grisham This in short is about how Insurance Companies stiff people. It s a wonderful story that has all people in grey including our hero. An engaging book that sorta tells how the Insurance Industry plays the game. In India, the companies are safe as they ask for continuance for years together and their object is to delay the hearings till the grandchildren are not alive unlike in the U.S. or UK. It also does tell how more lawyers are there then required. Both of which are the same in my country.

The Litigators John Grisham This one is about Product Liability, both medicines as well as toys for young kids. What I do find funny a bit is how the law in States allows people to file cases but doesn t protect people while in EU they try their best that if there is any controversy behind any medicine or product, they will simply ban it. Huge difference between the two cultures.

Evolution no longer part of Indian Education A few days ago, NCERT (one of the major bodies) that looks into Indian Education due to BJP influence has removed Darwin s Theory of Evolution. You can t even debate because the people do not understand adaptability. So the only conclusion is that Man suddenly appeared out of nowhere. If that is so, then we are the true aliens. They discard the notion that we and Chimpanzees are similar. Then they also have to discard this finding that genetically we are 96% similar.

27 April 2023

Simon Josefsson: A Security Device Threat Model: The Substitution Attack

I d like to describe and discuss a threat model for computational devices. This is generic but we will narrow it down to security-related devices. For example, portable hardware dongles used for OpenPGP/OpenSSH keys, FIDO/U2F, OATH HOTP/TOTP, PIV, payment cards, wallets etc and more permanently attached devices like a Hardware Security Module (HSM), a TPM-chip, or the hybrid variant of a mostly permanently-inserted but removable hardware security dongles. Our context is cryptographic hardware engineering, and the purpose of the threat model is to serve as as a thought experiment for how to build and design security devices that offer better protection. The threat model is related to the Evil maid attack. Our focus is to improve security for the end-user rather than the traditional focus to improve security for the organization that provides the token to the end-user, or to improve security for the site that the end-user is authenticating to. This is a critical but often under-appreciated distinction, and leads to surprising recommendations related to onboard key generation, randomness etc below.

The Substitution Attack
An attacker is able to substitute any component of the device (hardware or software) at any time for any period of time.
Your takeaway should be that devices should be designed to mitigate harmful consequences if any component of the device (hardware or software) is substituted for a malicious component for some period of time, at any time, during the lifespan of that component. Some designs protect better against this attack than other designs, and the threat model can be used to understand which designs are really bad, and which are less so.

Terminology The threat model involves at least one device that is well-behaving and one that is not, and we call these Good Device and Bad Device respectively. The bad device may be the same physical device as the good key, but with some minor software modification or a minor component replaced, but could also be a completely separate physical device. We don t care about that distinction, we just care if a particular device has a malicious component in it or not. I ll use terms like security device , device , hardware key , security co-processor etc interchangeably. From an engineering point of view, malicious here includes unintentional behavior such as software or hardware bugs. It is not possible to differentiate an intentionally malicious device from a well-designed device with a critical bug. Don t attribute to malice what can be adequately explained by stupidity, but don t na vely attribute to stupidity what may be deniable malicious.

What is some period of time ? Some period of time can be any length of time: seconds, minutes, days, weeks, etc. It may also occur at any time: During manufacturing, during transportation to the user, after first usage by the user, or after a couple of months usage by the user. Note that we intentionally consider time-of-manufacturing as a vulnerable phase. Even further, the substitution may occur multiple times. So the Good Key may be replaced with a Bad Key by the attacker for one day, then returned, and later this repeats a month later.

What is harmful consequences ? Since a security key has a fairly well-confined scope and purpose, we can get a fairly good exhaustive list of things that could go wrong. Harmful consequences include:
  • Attacker learns any secret keys stored on a Good Key.
  • Attacker causes user to trust a public generated by a Bad Key.
  • Attacker is able to sign something using a Good Key.
  • Attacker learns the PIN code used to unlock a Good Key.
  • Attacker learns data that is decrypted by a Good Key.

Thin vs Deep solutions One approach to mitigate many issues arising from device substitution is to have the host (or remote site) require that the device prove that it is the intended unique device before it continues to talk to it. This require an authentication/authorization protocol, which usually involves unique device identity and out-of-band trust anchors. Such trust anchors is often problematic, since a common use-case for security device is to connect it to a host that has never seen the device before. A weaker approach is to have the device prove that it merely belongs to a class of genuine devices from a trusted manufacturer, usually by providing a signature generated by a device-specific private key signed by the device manufacturer. This is weaker since then the user cannot differentiate two different good devices. In both cases, the host (or remote site) would stop talking to the device if it cannot prove that it is the intended key, or at least belongs to a class of known trusted genuine devices. Upon scrutiny, this solution is still vulnerable to a substitution attack, just earlier in the manufacturing chain: how can the process that injects the per-device or per-class identities/secrets know that it is putting them into a good key rather than a malicious device? Consider also the consequences if the cryptographic keys that guarantee that a device is genuine leaks. The model of the thin solution is similar to the old approach to network firewalls: have a filtering firewall that only lets through intended traffic, and then run completely insecure protocols internally such as telnet. The networking world has evolved, and now we have defense in depth: even within strongly firewall ed networks, it is prudent to run for example SSH with publickey-based user authentication even on locally physical trusted networks. This approach requires more thought and adds complexity, since each level has to provide some security checking. I m arguing we need similar defense-in-depth for security devices. Security key designs cannot simply dodge this problem by assuming it is working in a friendly environment where component substitution never occur.

Example: Device authentication using PIN codes To see how this threat model can be applied to reason about security key designs, let s consider a common design. Many security keys uses PIN codes to unlock private key operations, for example on OpenPGP cards that lacks built-in PIN-entry functionality. The software on the computer just sends a PIN code to the device, and the device allows private-key operations if the PIN code was correct. Let s apply the substitution threat model to this design: the attacker replaces the intended good key with a malicious device that saves a copy of the PIN code presented to it, and then gives out error messages. Once the user has entered the PIN code and gotten an error message, presumably temporarily giving up and doing other things, the attacker replaces the device back again. The attacker has learnt the PIN code, and can later use this to perform private-key operations on the good device. This means a good design involves not sending PIN codes in clear, but use a stronger authentication protocol that allows the card to know that the PIN was correct without learning the PIN. This is implemented optionally for many OpenPGP cards today as the key-derivation-function extension. That should be mandatory, and users should not use setups that sends device authentication in the clear, and ultimately security devices should not even include support for that. Compare how I build Gnuk on my PGP card with the kdf_do=required option.

Example: Onboard non-predictable key-generation Many devices offer both onboard key-generation, for example OpenPGP cards that generate a Ed25519 key internally on the devices, or externally where the device imports an externally generated cryptographic key. Let s apply the subsitution threat model to this design: the user wishes to generate a key and trust the public key that came out of that process. The attacker substitutes the device for a malicious device during key-generation, imports the private key into a good device and gives that back to the user. Most of the time except during key generation the user uses a good device but still the attacker succeeded in having the user trust a public key which the attacker knows the private key for. The substitution may be a software modification, and the method to leak the private key to the attacker may be out-of-band signalling. This means a good design never generates key on-board, but imports them from a user-controllable environment. That approach should be mandatory, and users should not use setups that generates private keys on-board, and ultimately security devices should not even include support for that.

Example: Non-predictable randomness-generation Many devices claims to generate random data, often with elaborate design documents explaining how good the randomness is. Let s apply the substitution threat model to this design: the attacker replaces the intended good key with a malicious design that generates (for the attacker) predictable randomness. The user will never be able to detect the difference since the random output is, well, random, and typically not distinguishable from weak randomness. The user cannot know if any cryptographic keys generated by a generator was faulty or not. This means a good design never generates non-predictable randomness on the device. That approach should be mandatory, and users should not use setups that generates non-predictable randomness on the device, and ideally devices should not have this functionality.

Case-Study: Tillitis I have warmed up a bit for this. Tillitis is a new security device with interesting properties, and core to its operation is the Compound Device Identifier (CDI), essentially your Ed25519 private key (used for SSH etc) is derived from the CDI that is computed like this:
cdi = blake2s(UDS, blake2s(device_app), USS)
Let s apply the substitution threat model to this design: Consider someone replacing the Tillitis key with a malicious key during postal delivery of the key to the user, and the replacement device is identical with the real Tillitis key but implements the following key derivation function:
cdi = weakprng(UDS , weakprng(device_app), USS)
Where weakprng is a compromised algorithm that is predictable for the attacker, but still appears random. Everything will work correctly, but the attacker will be able to learn the secrets used by the user, and the user will typically not be able to tell the difference since the CDI is secret and the Ed25519 public key is not self-verifiable.

Conclusion Remember that it is impossible to fully protect against this attack, that s why it is merely a thought experiment, intended to be used during design of these devices. Consider an attacker that never gives you access to a good key and as a user you only ever use a malicious device. There is no way to have good security in this situation. This is not hypothetical, many well-funded organizations do what they can to derive people from having access to trustworthy security devices. Philosophically it does not seem possible to tell if these organizations have succeeded 100% already and there are only bad security devices around where further resistance is futile, but to end on an optimistic note let s assume that there is a non-negligible chance that they haven t succeeded. In these situations, this threat model becomes useful to improve the situation by identifying less good designs, and that s why the design mantra of mitigate harmful consequences is crucial as a takeaway from this. Let s improve the design of security devices that further the security of its users!

25 April 2023

B lint R czey: Improve build time of Rust, Java and Intel Fortran projects with Firebuild s new release!

Rust is a hugely popular compiled programming language and accelerating it was an important goal for Firebuild for some time. Firebuild s v0.8.0 release finally added Rust support in addition to numerous other improvements including support for Doxygen, Intel s Fortran compiler and restored javac and javadoc acceleration.

Firebuild s Rust + Cargo support Firebuild treats programs as black boxes intercepting C standard library calls and system calls. It shortcuts the program invocations that predictably generate the same outputs because the program itself is known to be deterministic and all inputs are known in advance. Rust s compiler, rustc is deterministic in itself and simple rustc invocations were already accelerated, but parallel builds driven by Cargo needed a few enhancements in Firebuild.

Cargo s jobserver Cargo uses the Rust variant of the GNU Make s jobserver to control the parallelism in a build. The jobserver creates a file descriptor from which descendant processes can read tokens and are allowed to run one extra thread or parallel process per token received. After the extra threads or processes are finished the tokens must be returned by writing to the other file descriptor the jobserver created. The jobserver s file descriptors are shared with the descendant processes via environment variables:
# rustc's environment variables
...
CARGO_MAKEFLAGS="-j --jobserver-fds=4,5 --jobserver-auth=4,5"
...
Since getting tokens from the jobserver involves reading them as nondeterministic bytes from an inherited file descriptor this is clearly an operation that would depend on input not known in advance. Firebuild needs to make an exception and ignore jobserver usage related reads and writes since they are not meant to change the build results. However, there are programs not caring at all about jobservers and their file descriptors. They happily close the inherited file descriptors and open new ones with the same id, to use them for entirely different purposes. One such program is the widely used ./configure script, thus the case is far from being theoretical. To stay on the safe side firebuild ignores jobserver fd usage only in programs which are known to use the jobserver properly. The list of the programs is now configurable in /etc/firebuild.conf and since rustc is on the list by default parallel Rust builds are accelerated out of the box!

Writable dependency dir The other issue that prevented highly accelerated Rust builds was rustc s -L dependency=<dir> parameter. This directory is populated in a not fully deterministic order in parallel builds. Firebuild on the other hand hashes directory listings of open()-ed directories treating them as inputs assuming that the directory content will influence the intercepted programs outputs. As rustc programs started in parallel scanned the dependency directory in different states depending on what other Rust compilations finished already Firebuild had to store the full directory content as an input for each rustc cache entry resulting low hit rate when rustc was started again with otherwise identical inputs. The solution here is ignoring rustc scanning the dependency directory, because the dependencies actually used are still treated as input and are checked when shortcutting rustc. With that implemented in firebuild, too, librsvg s build that uses Rust and Cargo can be accelerated by more than 90%, even on a system having 12 cores/24 threads!:
Firebuild accelerating librsvg s Rust + Cargo build from 38s to 2.8s on a Ryzen 5900X (12C/24T) system

On the way to accelerate anything Firebuild s latest release incorporated more than 100 changes just from the last two months. They unlocked acceleration of Rust builds with Cargo, fixed Firebuild to work with the latest Java update that slightly changed its behavior, started accelerating Intel s Fortran compiler in addition to accelerating gfortran that was already supported and included many smaller changes improving the acceleration of other compilers and tools. If your favorite toolchain is not mentioned, there is still a good chance that it is already supported. Give Firebuild a try and tell us about your experience! Update 1: Comparison to sccache came up in the reddit topic about Firebuild s Rust acceleration , thus by popular demand this is how sccache performs on the same project:
Firebuild 0.8.0 vs. sccache 0.4.2 accelerating librsvg s Rust + Cargo build
All builds took place on the same Ryzen 5900X system with 12 cores / 24 threads in LXC containers limited to using 1-12 virtual CPUs. A warm-up build took place before the vanilla (without any instrumentation) build to download and compile the dependency crates to measure only the project s build time. A git clean command cleared all the build artifacts from the project directory before each build and ./autogen.sh was run to measure only clean rebuilds (without autotools). See test configuration in the Firebuild performance test repository for more details and easy reproduction. Firebuild had lower overhead than sccache (2.83% vs. 6.10% on 1 CPU and 7.71% vs. 22.05% on 12 CPUs) and made the accelerated build finish much faster (2.26% vs. 19.41% of vanilla build s time on 1 CPU and 7.5% vs. 27.4% of vanilla build s time on 12 CPUs).

23 April 2023

Petter Reinholdtsen: Speech to text, she APTly whispered, how hard can it be?

While visiting a convention during Easter, it occurred to me that it would be great if I could have a digital Dictaphone with transcribing capabilities, providing me with texts to cut-n-paste into stuff I need to write. The background is that long drives often bring up the urge to write on texts I am working on, which of course is out of the question while driving. With the release of OpenAI Whisper, this seem to be within reach with Free Software, so I decided to give it a go. OpenAI Whisper is a Linux based neural network system to read in audio files and provide text representation of the speech in that audio recording. It handle multiple languages and according to its creators even can translate into a different language than the spoken one. I have not tested the latter feature. It can either use the CPU or a GPU with CUDA support. As far as I can tell, CUDA in practice limit that feature to NVidia graphics cards. I have few of those, as they do not work great with free software drivers, and have not tested the GPU option. While looking into the matter, I did discover some work to provide CUDA support on non-NVidia GPUs, and some work with the library used by Whisper to port it to other GPUs, but have not spent much time looking into GPU support yet. I've so far used an old X220 laptop as my test machine, and only transcribed using its CPU. As it from a privacy standpoint is unthinkable to use computers under control of someone else (aka a "cloud" service) to transcribe ones thoughts and personal notes, I want to run the transcribing system locally on my own computers. The only sensible approach to me is to make the effort I put into this available for any Linux user and to upload the needed packages into Debian. Looking at Debian Bookworm, I discovered that only three packages were missing, tiktoken, triton, and openai-whisper. For a while I also believed ffmpeg-python was needed, but as its upstream seem to have vanished I found it safer to rewrite whisper to stop depending on in than to introduce ffmpeg-python into Debian. I decided to place these packages under the umbrella of the Debian Deep Learning Team, which seem like the best team to look after such packages. Discussing the topic within the group also made me aware that the triton package was already a future dependency of newer versions of the torch package being planned, and would be needed after Bookworm is released. All required code packages have been now waiting in the Debian NEW queue since Wednesday, heading for Debian Experimental until Bookworm is released. An unsolved issue is how to handle the neural network models used by Whisper. The default behaviour of Whisper is to require Internet connectivity and download the model requested to ~/.cache/whisper/ on first invocation. This obviously would fail the deserted island test of free software as the Debian packages would be unusable for someone stranded with only the Debian archive and solar powered computer on a deserted island. Because of this, I would love to include the models in the Debian mirror system. This is problematic, as the models are very large files, which would put a heavy strain on the Debian mirror infrastructure around the globe. The strain would be even higher if the models change often, which luckily as far as I can tell they do not. The small model, which according to its creator is most useful for English and in my experience is not doing a great job there either, is 462 MiB (deb is 414 MiB). The medium model, which to me seem to handle English speech fairly well is 1.5 GiB (deb is 1.3 GiB) and the large model is 2.9 GiB (deb is 2.6 GiB). I would assume everyone with enough resources would prefer to use the large model for highest quality. I believe the models themselves would have to go into the non-free part of the Debian archive, as they are not really including any useful source code for updating the models. The "source", aka the model training set, according to the creators consist of "680,000 hours of multilingual and multitask supervised data collected from the web", which to me reads material with both unknown copyright terms, unavailable to the general public. In other words, the source is not available according to the Debian Free Software Guidelines and the model should be considered non-free. I asked the Debian FTP masters for advice regarding uploading a model package on their IRC channel, and based on the feedback there it is still unclear to me if such package would be accepted into the archive. In any case I wrote build rules for a OpenAI Whisper model package and modified the Whisper code base to prefer shared files under /usr/ and /var/ over user specific files in ~/.cache/whisper/ to be able to use these model packages, to prepare for such possibility. One solution might be to include only one of the models (small or medium, I guess) in the Debian archive, and ask people to download the others from the Internet. Not quite sure what to do here, and advice is most welcome (use the debian-ai mailing list). To make it easier to test the new packages while I wait for them to clear the NEW queue, I created an APT source targeting bookworm. I selected Bookworm instead of Bullseye, even though I know the latter would reach more users, is that some of the required dependencies are missing from Bullseye and I during this phase of testing did not want to backport a lot of packages just to get up and running. Here is a recipe to run as user root if you want to test OpenAI Whisper using Debian packages on your Debian Bookworm installation, first adding the APT repository GPG key to the list of trusted keys, then setting up the APT repository and finally installing the packages and one of the models:
curl https://geekbay.nuug.no/~pere/openai-whisper/D78F5C4796F353D211B119E28200D9B589641240.asc \
  -o /etc/apt/trusted.gpg.d/pere-whisper.asc
mkdir -p /etc/apt/sources.list.d
cat > /etc/apt/sources.list.d/pere-whisper.list <<EOF
deb https://geekbay.nuug.no/~pere/openai-whisper/ bookworm main
deb-src https://geekbay.nuug.no/~pere/openai-whisper/ bookworm main
EOF
apt update
apt install openai-whisper
The package work for me, but have not yet been tested on any other computer than my own. With it, I have been able to (badly) transcribe a 2 minute 40 second Norwegian audio clip to test using the small model. This took 11 minutes and around 2.2 GiB of RAM. Transcribing the same file with the medium model gave a accurate text in 77 minutes using around 5.2 GiB of RAM. My test machine had too little memory to test the large model, which I believe require 11 GiB of RAM. In short, this now work for me using Debian packages, and I hope it will for you and everyone else once the packages enter Debian. Now I can start on the audio recording part of this project. As usual, if you use Bitcoin and want to show your support of my activities, please send Bitcoin donations to my address 15oWEoG9dUPovwmUL9KWAnYRtNJEkP1u1b.

20 April 2023

Jamie McClelland: Electron doesn't like negative layout coordinates

I got a second external monitor. Overkill? Probably, but I like having a dedicated space to instant messaging (now left monitor) and also a dedicated space for a web browser (right monitor). But, when I moved signal-desktop to the left monitor, clicks stopped working. I moved it back to my laptop screen, clicks started working. Other apps (like gajim) worked fine. A real mystery. I spent a lot of time on the wrong thing. I turned this monitor into portrait mode. Maybe signal doesn t like portrait mode? Nope. Maybe signal doesn t think it has enough horizontal space? Nope. Maybe signal suddently doesn t like being on an external monitor? Nope. Maybe signal will output something useful if I start it via the terminal? Nope (but that was a real distraction). Then, I discovered that mattermost desktop behaves the same way. A clue! So, now I know it s an electron app limitation, not a signal limitation. Finally I hit on the problem: Via my sway desktop, I set my laptop screen to the x,y coordinates: 0,0 (since it s in the middle). I set the left monitor to negative coordinates so it would appear on the left. Well, electron does not like that, not sure why. Now, my left monitor occupies 0,0 and the rest are adjusted to that center. I originally used negative coordinates so when I unplugged my monitors, my laptop would still be 0,0 and display all desktops properly. Fortunately, sway magically figures that all out even when the center shifts by unplugging the monitors. Hooray for sway.

17 April 2023

Matthew Garrett: Booting modern Intel CPUs

CPUs can't do anything without being told what to do, which leaves the obvious problem of how do you tell a CPU to do something in the first place. On many CPUs this is handled in the form of a reset vector - an address the CPU is hardcoded to start reading instructions from when power is applied. The address the reset vector points to will typically be some form of ROM or flash that can be read by the CPU even if no other hardware has been configured yet. This allows the system vendor to ship code that will be executed immediately after poweron, configuring the rest of the hardware and eventually getting the system into a state where it can run user-supplied code.

The specific nature of the reset vector on x86 systems has varied over time, but it's effectively always been 16 bytes below the top of the address space - so, 0xffff0 on the 20-bit 8086, 0xfffff0 on the 24-bit 80286, and 0xfffffff0 on the 32-bit 80386. Convention on x86 systems is to have RAM starting at address 0, so the top of address space could be used to house the reset vector with as low a probability of conflicting with RAM as possible.

The most notable thing about x86 here, though, is that when it starts running code from the reset vector, it's still in real mode. x86 real mode is a holdover from a much earlier era of computing. Rather than addresses being absolute (ie, if you refer to a 32-bit address, you store the entire address in a 32-bit or larger register), they are 16-bit offsets that are added to the value stored in a "segment register". Different segment registers existed for code, data, and stack, so a 16-bit address could refer to different actual addresses depending on how it was being interpreted - jumping to a 16 bit address would result in that address being added to the code segment register, while reading from a 16 bit address would result in that address being added to the data segment register, and so on. This is all in order to retain compatibility with older chips, to the extent that even 64-bit x86 starts in real mode with segments and everything (and, also, still starts executing at 0xfffffff0 rather than 0xfffffffffffffff0 - 64-bit mode doesn't support real mode, so there's no way to express a 64-bit physical address using the segment registers, so we still start just below 4GB even though we have massively more address space available).

Anyway. Everyone knows all this. For modern UEFI systems, the firmware that's launched from the reset vector then reprograms the CPU into a sensible mode (ie, one without all this segmentation bullshit), does things like configure the memory controller so you can actually access RAM (a process which involves using CPU cache as RAM, because programming a memory controller is sufficiently hard that you need to store more state than you can fit in registers alone, which means you need RAM, but you don't have RAM until the memory controller is working, but thankfully the CPU comes with several megabytes of RAM on its own in the form of cache, so phew). It's kind of ugly, but that's a consequence of a bunch of well-understood legacy decisions.

Except. This is not how modern Intel x86 boots. It's far stranger than that. Oh, yes, this is what it looks like is happening, but there's a bunch of stuff going on behind the scenes. Let's talk about boot security. The idea of any form of verified boot (such as UEFI Secure Boot) is that a signature on the next component of the boot chain is validated before that component is executed. But what verifies the first component in the boot chain? You can't simply ask the BIOS to verify itself - if an attacker can replace the BIOS, they can replace it with one that simply lies about having done so. Intel's solution to this is called Boot Guard.

But before we get to Boot Guard, we need to ensure the CPU is running in as bug-free a state as possible. So, when the CPU starts up, it examines the system flash and looks for a header that points at CPU microcode updates. Intel CPUs ship with built-in microcode, but it's frequently old and buggy and it's up to the system firmware to include a copy that's new enough that it's actually expected to work reliably. The microcode image is pulled out of flash, a signature is verified, and the new microcode starts running. This is true in both the Boot Guard and the non-Boot Guard scenarios. But for Boot Guard, before jumping to the reset vector, the microcode on the CPU reads an Authenticated Code Module (ACM) out of flash and verifies its signature against a hardcoded Intel key. If that checks out, it starts executing the ACM. Now, bear in mind that the CPU can't just verify the ACM and then execute it directly from flash - if it did, the flash could detect this, hand over a legitimate ACM for the verification, and then feed the CPU different instructions when it reads them again to execute them (a Time of Check vs Time of Use, or TOCTOU, vulnerability). So the ACM has to be copied onto the CPU before it's verified and executed, which means we need RAM, which means the CPU already needs to know how to configure its cache to be used as RAM.

Anyway. We now have an ACM loaded and verified, and it can safely be executed. The ACM does various things, but the most important from the Boot Guard perspective is that it reads a set of write-once fuses in the motherboard chipset that represent the SHA256 of a public key. It then reads the initial block of the firmware (the Initial Boot Block, or IBB) into RAM (or, well, cache, as previously described) and parses it. There's a block that contains a public key - it hashes that key and verifies that it matches the SHA256 from the fuses. It then uses that key to validate a signature on the IBB. If it all checks out, it executes the IBB and everything starts looking like the nice simple model we had before.

Except, well, doesn't this seem like an awfully complicated bunch of code to implement in real mode? And yes, doing all of this modern crypto with only 16-bit registers does sound like a pain. So, it doesn't. All of this is happening in a perfectly sensible 32 bit mode, and the CPU actually switches back to the awful segmented configuration afterwards so it's still compatible with an 80386 from 1986. The "good" news is that at least firmware can detect that the CPU has already configured the cache as RAM and can skip doing that itself.

I'm skipping over some steps here - the ACM actually does other stuff around measuring the firmware into the TPM and doing various bits of TXT setup for people who want DRTM in their lives, but the short version is that the CPU bootstraps itself into a state where it works like a modern CPU and then deliberately turns a bunch of the sensible functionality off again before it starts executing firmware. I'm also missing out the fact that this entire process only kicks off after the Management Engine says it can, which means we're waiting for an entirely independent x86 to boot an entire OS before our CPU even starts pretending to execute the system firmware.

Of course, as mentioned before, on modern systems the firmware will then reprogram the CPU into something actually sensible so OS developers no longer need to care about this[1][2], which means we've bounced between multiple states for no reason other than the possibility that someone wants to run legacy BIOS and then boot DOS on a CPU with like 5 orders of magnitude more transistors than the 8086.

tl;dr why can't my x86 wake up with the gin protected mode already inside it

[1] Ha uh except that on ACPI resume we're going to skip most of the firmware setup code so we still need to handle the CPU being in fucking 16-bit mode because suspend/resume is basically an extremely long reboot cycle

[2] Oh yeah also you probably have multiple cores on your CPU and well bad news about the state most of the cores are in when the OS boots because the firmware never started them up so they're going to come up in 16-bit real mode even if your boot CPU is already in 64-bit protected mode, unless you were using TXT in which case you have a different sort of nightmare that if we're going to try to map it onto real world nightmare concepts is one that involves a lot of teeth. Or, well, that used to be the case, but ACPI 6.4 (released in 2021) provides a mechanism for the OS to ask the firmware to wake the CPU up for it so this is invisible to the OS, but you're still relying on the firmware to actually do the heavy lifting here

comment count unavailable comments

14 April 2023

John Goerzen: Easily Accessing All Your Stuff with a Zero-Trust Mesh VPN

Probably everyone is familiar with a regular VPN. The traditional use case is to connect to a corporate or home network from a remote location, and access services as if you were there. But these days, the notion of corporate network and home network are less based around physical location. For instance, a company may have no particular office at all, may have a number of offices plus a number of people working remotely, and so forth. A home network might have, say, a PVR and file server, while highly portable devices such as laptops, tablets, and phones may want to talk to each other regardless of location. For instance, a family member might be traveling with a laptop, another at a coffee shop, and those two devices might want to communicate, in addition to talking to the devices at home. And, in both scenarios, there might be questions about giving limited access to friends. Perhaps you d like to give a friend access to part of your file server, or as a company, you might have contractors working on a limited project. Pretty soon you wind up with a mess of VPNs, forwarded ports, and tricks to make it all work. With the increasing prevalence of CGNAT, a lot of times you can t even open a port to the public Internet. Each application or device probably has its own gateway just to make it visible on the Internet, some of which you pay for. Then you add on the question of: should you really trust your LAN anyhow? With possibilities of guests using it, rogue access points, etc., the answer is probably no . We can move the responsibility for dealing with NAT, fluctuating IPs, encryption, and authentication, from the application layer further down into the network stack. We then arrive at a much simpler picture for all. So this page is fundamentally about making the network work, simply and effectively.

How do we make the Internet work in these scenarios? We re going to combine three concepts:
  1. A VPN, providing fully encrypted and authenticated communication and stable IPs
  2. Mesh Networking, in which devices automatically discover optimal paths to reach each other
  3. Zero-trust networking, in which we do not need to trust anything about the underlying LAN, because all our traffic uses the secure systems in points 1 and 2.
By combining these concepts, we arrive at some nice results:
  • You can ssh hostname, where hostname is one of your machines (server, laptop, whatever), and as long as hostname is up, you can reach it, wherever it is, wherever you are.
    • Combined with mosh, these sessions will be durable even across moving to other host networks.
    • You could just as well use telnet, because the underlying network should be secure.
  • You don t have to mess with encryption keys, certs, etc., for every internal-only service. Since IPs are now trustworthy, that s all you need. hosts.allow could make a comeback!
  • You have a way of transiting out of extremely restrictive networks. Every tool discussed here has a way of falling back on routing things via a broker (relay) on TCP port 443 if all else fails.
There might sometimes be tradeoffs. For instance:
  • On LANs faster than 1Gbps, performance may degrade due to encryption and encapsulation overhead. However, these tools should let hosts discover the locality of each other and not send traffic over the Internet if the devices are local.
  • With some of these tools, hosts local to each other (on the same LAN) may be unable to find each other if they can t reach the control plane over the Internet (Internet is down or provider is down)
Some other features that some of the tools provide include:
  • Easy sharing of limited access with friends/guests
  • Taking care of everything you need, including SSL certs, for exposing a certain on-net service to the public Internet
  • Optional routing of your outbound Internet traffic via an exit node on your network. Useful, for instance, if your local network is blocking tons of stuff.
Let s dive in.

Types of Mesh VPNs I ll go over several types of meshes in this article:
  1. Fully decentralized with automatic hop routing This model has no special central control plane. Nodes discover each other in various ways, and establish routes to each other. These routes can be direct connections over the Internet, or via other nodes. This approach offers the greatest resilience. Examples I ll cover include Yggdrasil and tinc.
  2. Automatic peer-to-peer with centralized control In this model, nodes, by default, communicate by establishing direct links between them. A regular node never carries traffic on behalf of other nodes. Special-purpose relays are used to handle cases in which NAT traversal is impossible. This approach tends to offer simple setup. Examples I ll cover include Tailscale, Zerotier, Nebula, and Netmaker.
  3. Roll your own and hybrid approaches This is a grab bag of other ideas; for instance, running Yggdrasil over Tailscale.

Terminology For the sake of consistency, I m going to use common language to discuss things that have different terms in different ecosystems:
  • Every tool discussed here has a way of dealing with NAT traversal. It may assist with establishing direct connections (eg, STUN), and if that fails, it may simply relay traffic between nodes. I ll call such a relay a broker . This may or may not be the same system that is a control plane for a tool.
  • All of these systems operate over lower layers that are unencrypted. Those lower layers may be a LAN (wired or wireless, which may or may not have Internet access), or the public Internet (IPv4 and/or IPv6). I m going to call the unencrypted lower layer, whatever it is, the clearnet .

Evaluation Criteria Here are the things I want to see from a solution:
  • Secure, with all communications end-to-end encrypted and authenticated, and prevention of traffic from untrusted devices.
  • Flexible, adapting to changes in network topology quickly and automatically.
  • Resilient, without single points of failure, and with devices local to each other able to communicate even if cut off from the Internet or other parts of the network.
  • Private, minimizing leakage of information or metadata about me and my systems
  • Able to traverse CGNAT without having to use a broker whenever possible
  • A lesser requirement for me, but still a nice to have, is the ability to include others via something like Internet publishing or inviting guests.
  • Fully or nearly fully Open Source
  • Free or very cheap for personal use
  • Wide operating system support, including headless Linux on x86_64 and ARM.

Fully Decentralized VPNs with Automatic Hop Routing Two systems fit this description: Yggdrasil and Tinc. Let s dive in.

Yggdrasil I ll start with Yggdrasil because I ve written so much about it already. It featured in prior posts such as:

Yggdrasil can be a private mesh VPN, or something more Yggdrasil can be a private mesh VPN, just like the other tools covered here. It s unique, however, in that a key goal of the project is to also make it useful as a planet-scale global mesh network. As such, Yggdrasil is a testbed of new ideas in distributed routing designed to scale up to massive sizes and all sorts of connection conditions. As of 2023-04-10, the main global Yggdrasil mesh has over 5000 nodes in it. You can choose whether or not to participate. Every node in a Yggdrasil mesh has a public/private keypair. Each node then has an IPv6 address (in a private address space) derived from its public key. Using these IPv6 addresses, you can communicate right away. Yggdrasil differs from most of the other tools here in that it does not necessarily seek to establish a direct link on the clearnet between, say, host A and host G for them to communicate. It will prefer such a direct link if it exists, but it is perfectly happy if it doesn t. The reason is that every Yggdrasil node is also a router in the Yggdrasil mesh. Let s sit with that concept for a moment. Consider:
  • If you have a bunch of machines on your LAN, but only one of them can peer over the clearnet, that s fine; all the other machines will discover this route to the world and use it when necessary.
  • All you need to run a broker is just a regular node with a public IP address. If you are participating in the global mesh, you can use one (or more) of the free public peers for this purpose.
  • It is not necessary for every node to know about the clearnet IP address of every other node (improving privacy). In fact, it s not even necessary for every node to know about the existence of all the other nodes, so long as it can find a route to a given node when it s asked to.
  • Yggdrasil can find one or more routes between nodes, and it can use this knowledge of multiple routes to aggressively optimize for varying network conditions, including combinations of, say, downloads and low-latency ssh sessions.
Behind the scenes, Yggdrasil calculates optimal routes between nodes as necessary, using a mesh-wide DHT for initial contact and then deriving more optimal paths. (You can also read more details about the routing algorithm.) One final way that Yggdrasil is different from most of the other tools is that there is no separate control server. No node is special , in charge, the sole keeper of metadata, or anything like that. The entire system is completely distributed and auto-assembling.

Meeting neighbors There are two ways that Yggdrasil knows about peers:
  • By broadcast discovery on the local LAN
  • By listening on a specific port (or being told to connect to a specific host/port)
Sometimes this might lead to multiple ways to connect to a node; Yggdrasil prefers the connection auto-discovered by broadcast first, then the lowest-latency of the defined path. In other words, when your laptops are in the same room as each other on your local LAN, your packets will flow directly between them without traversing the Internet.

Unique uses Yggdrasil is uniquely suited to network-challenged situations. As an example, in a post-disaster situation, Internet access may be unavailable or flaky, yet there may be many local devices perhaps ones that had never known of each other before that could share information. Yggdrasil meets this situation perfectly. The combination of broadcast auto-detection, distributed routing, and so forth, basically means that if there is any physical path between two nodes, Yggdrasil will find and enable it. Ad-hoc wifi is rarely used because it is a real pain. Yggdrasil actually makes it useful! Its broadcast discovery doesn t require any IP address provisioned on the interface at all (it just uses the IPv6 link-local address), so you don t need to figure out a DHCP server or some such. And, Yggdrasil will tend to perform routing along the contours of the RF path. So you could have a laptop in the middle of a long distance relaying communications from people farther out, because it could see both. Or even a chain of such things.

Yggdrasil: Security and Privacy Yggdrasil s mesh is aggressively greedy. It will peer with any node it can find (unless told otherwise) and will find a route to anywhere it can. There are two main ways to make sure you keep unauthorized traffic out: by restricting who can talk to your mesh, and by firewalling the Yggdrasil interface. Both can be used, and they can be used simultaneously. I ll discuss firewalling more at the end of this article. Basically, you ll almost certainly want to do this if you participate in the public mesh, because doing so is akin to having a globally-routable public IP address direct to your device. If you want to restrict who can talk to your mesh, you just disable the broadcast feature on all your nodes (empty MulticastInterfaces section in the config), and avoid telling any of your nodes to connect to a public peer. You can set a list of authorized public keys that can connect to your nodes listening interfaces, which you ll probably want to do. You will probably want to either open up some inbound ports (if you can) or set up a node with a known clearnet IP on a place like a $5/mo VPS to help with NAT traversal (again, setting AllowedPublicKeys as appropriate). Yggdrasil doesn t allow filtering multicast clients by public key, only by network interface, so that s why we disable broadcast discovery. You can easily enough teach Yggdrasil about static internal LAN IPs of your nodes and have things work that way. (Or, set up an internal gateway node or two, that the clients just connect to when they re local). But fundamentally, you need to put a bit more thought into this with Yggdrasil than with the other tools here, which are closed-only. Compared to some of the other tools here, Yggdrasil is better about information leakage; nodes only know details, such as clearnet IPs, of directly-connected peers. You can obtain the list of directly-connected peers of any known node in the mesh but that list is the public keys of the directly-connected peers, not the clearnet IPs. Some of the other tools contain a limited integrated firewall of sorts (with limited ACLs and such). Yggdrasil does not, but is fully compatible with on-host firewalls. I recommend these anyway even with many other tools.

Yggdrasil: Connectivity and NAT traversal Compared to the other tools, Yggdrasil is an interesting mix. It provides a fully functional mesh and facilitates connectivity in situations in which no other tool can. Yet its NAT traversal, while it exists and does work, results in using a broker under some of the more challenging CGNAT situations more often than some of the other tools, which can impede performance. Yggdrasil s underlying protocol is TCP-based. Before you run away screaming that it must be slow and unreliable like OpenVPN over TCP it s not, and it is even surprisingly good around bufferbloat. I ve found its performance to be on par with the other tools here, and it works as well as I d expect even on flaky 4G links. Overall, the NAT traversal story is mixed. On the one hand, you can run a node that listens on port 443 and Yggdrasil can even make it speak TLS (even though that s unnecessary from a security standpoint), so you can likely get out of most restrictive firewalls you will ever encounter. If you join the public mesh, know that plenty of public peers do listen on port 443 (and other well-known ports like 53, plus random high-numbered ones). If you connect your system to multiple public peers, there is a chance though a very small one that some public transit traffic might be routed via it. In practice, public peers hopefully are already peered with each other, preventing this from happening (you can verify this with yggdrasilctl debug_remotegetpeers key=ABC...). I have never experienced a problem with this. Also, since latency is a factor in routing for Yggdrasil, it is highly unlikely that random connections we use are going to be competitive with datacenter peers.

Yggdrasil: Sharing with friends If you re open to participating in the public mesh, this is one of the easiest things of all. Have your friend install Yggdrasil, point them to a public peer, give them your Yggdrasil IP, and that s it. (Well, presumably you also open up your firewall you did follow my advice to set one up, right?) If your friend is visiting at your location, they can just hop on your wifi, install Yggdrasil, and it will automatically discover a route to you. Yggdrasil even has a zero-config mode for ephemeral nodes such as certain Docker containers. Yggdrasil doesn t directly support publishing to the clearnet, but it is certainly possible to proxy (or even NAT) to/from the clearnet, and people do.

Yggdrasil: DNS There is no particular extra DNS in Yggdrasil. You can, of course, run a DNS server within Yggdrasil, just as you can anywhere else. Personally I just add relevant hosts to /etc/hosts and leave it at that, but it s up to you.

Yggdrasil: Source code, pricing, and portability Yggdrasil is fully open source (LGPLv3 plus additional permissions in an exception) and highly portable. It is written in Go, and has prebuilt binaries for all major platforms (including a Debian package which I made). There is no charge for anything with Yggdrasil. Listed public peers are free and run by volunteers. You can run your own peers if you like; they can be public and unlisted, public and listed (just submit a PR to get it listed), or private (accepting connections only from certain nodes keys). A peer in this case is just a node with a known clearnet IP address. Yggdrasil encourages use in other projects. For instance, NNCP integrates a Yggdrasil node for easy communication with other NNCP nodes.

Yggdrasil conclusions Yggdrasil is tops in reliability (having no single point of failure) and flexibility. It will maintain opportunistic connections between peers even if the Internet is down. The unique added feature of being able to be part of a global mesh is a nice one. The tradeoffs include being more prone to need to use a broker in restrictive CGNAT environments. Some other tools have clients that override the OS DNS resolver to also provide resolution of hostnames of member nodes; Yggdrasil doesn t, though you can certainly run your own DNS infrastructure over Yggdrasil (or, for that matter, let public DNS servers provide Yggdrasil answers if you wish). There is also a need to pay more attention to firewalling or maintaining separation from the public mesh. However, as I explain below, many other options have potential impacts if the control plane, or your account for it, are compromised, meaning you ought to firewall those, too. Still, it may be a more immediate concern with Yggdrasil. Although Yggdrasil is listed as experimental, I have been using it for over a year and have found it to be rock-solid. They did change how mesh IPs were calculated when moving from 0.3 to 0.4, causing a global renumbering, so just be aware that this is a possibility while it is experimental.

tinc tinc is the oldest tool on this list; version 1.0 came out in 2003! You can think of tinc as something akin to an older Yggdrasil without the public option. I will be discussing tinc 1.0.36, the latest stable version, which came out in 2019. The development branch, 1.1, has been going since 2011 and had its latest release in 2021. The last commit to the Github repo was in June 2022. Tinc is the only tool here to support both tun and tap style interfaces. I go into the difference more in the Zerotier review below. Tinc actually provides a better tap implementation than Zerotier, with various sane options for broadcasts, but I still think the call for an Ethernet, as opposed to IP, VPN is small. To configure tinc, you generate a per-host configuration and then distribute it to every tinc node. It contains a host s public key. Therefore, adding a host to the mesh means distributing its key everywhere; de-authorizing it means removing its key everywhere. This makes it rather unwieldy. tinc can do LAN broadcast discovery and mesh routing, but generally speaking you must manually teach it where to connect initially. Somewhat confusingly, the examples all mention listing a public address for a node. This doesn t make sense for a laptop, and I suspect you d just omit it. I think that address is used for something akin to a Yggdrasil peer with a clearnet IP. Unlike all of the other tools described here, tinc has no tool to inspect the running state of the mesh. Some of the properties of tinc made it clear I was unlikely to adopt it, so this review wasn t as thorough as that of Yggdrasil.

tinc: Security and Privacy As mentioned above, every host in the tinc mesh is authenticated based on its public key. However, to be more precise, this key is validated only at the point it connects to its next hop peer. (To be sure, this is also the same as how the list of allowed pubkeys works in Yggdrasil.) Since IPs in tinc are not derived from their key, and any host can assign itself whatever mesh IP it likes, this implies that a compromised host could impersonate another. It is unclear whether packets are end-to-end encrypted when using a tinc node as a router. The fact that they can be routed at the kernel level by the tun interface implies that they may not be.

tinc: Connectivity and NAT traversal I was unable to find much information about NAT traversal in tinc, other than that it does support it. tinc can run over UDP or TCP and auto-detects which to use, preferring UDP.

tinc: Sharing with friends tinc has no special support for this, and the difficulty of configuration makes it unlikely you d do this with tinc.

tinc: Source code, pricing, and portability tinc is fully open source (GPLv2). It is written in C and generally portable. It supports some very old operating systems. Mobile support is iffy. tinc does not seem to be very actively maintained.

tinc conclusions I haven t mentioned performance in my other reviews (see the section at the end of this post). But, it is so poor as to only run about 300Mbps on my 2.5Gbps network. That s 1/3 the speed of Yggdrasil or Tailscale. Combine that with the unwieldiness of adding hosts and some uncertainties in security, and I m not going to be using tinc.

Automatic Peer-to-Peer Mesh VPNs with centralized control These tend to be the options that are frequently discussed. Let s talk about the options.

Tailscale Tailscale is a popular choice in this type of VPN. To use Tailscale, you first sign up on tailscale.com. Then, you install the tailscale client on each machine. On first run, it prints a URL for you to click on to authorize the client to your mesh ( tailnet ). Tailscale assigns a mesh IP to each system. The Tailscale client lets the Tailscale control plane gather IP information about each node, including all detectable public and private clearnet IPs. When you attempt to contact a node via Tailscale, the client will fetch the known contact information from the control plane and attempt to establish a link. If it can contact over the local LAN, it will (it doesn t have broadcast autodetection like Yggdrasil; the information must come from the control plane). Otherwise, it will try various NAT traversal options. If all else fails, it will use a broker to relay traffic; Tailscale calls a broker a DERP relay server. Unlike Yggdrasil, a Tailscale node never relays traffic for another; all connections are either direct P2P or via a broker. Tailscale, like several others, is based around Wireguard; though wireguard-go rather than the in-kernel Wireguard. Tailscale has a number of somewhat unique features in this space:
  • Funnel, which lets you expose ports on your system to the public Internet via the VPN.
  • Exit nodes, which automate the process of routing your public Internet traffic over some other node in the network. This is possible with every tool mentioned here, but Tailscale makes switching it on or off a couple of quick commands away.
  • Node sharing, which lets you share a subset of your network with guests
  • A fantastic set of documentation, easily the best of the bunch.
Funnel, in particular, is interesting. With a couple of tailscale serve -style commands, you can expose a directory tree (or a development webserver) to the world. Tailscale gives you a public hostname, obtains a cert for it, and proxies inbound traffic to you. This is subject to some unspecified bandwidth limits, and you can only choose from three public ports, so it s not really a production solution but as a quick and easy way to demonstrate something cool to a friend, it s a neat feature.

Tailscale: Security and Privacy With Tailscale, as with the other tools in this category, one of the main threats to consider is the control plane. What are the consequences of a compromise of Tailscale s control plane, or of the credentials you use to access it? Let s begin with the credentials used to access it. Tailscale operates no identity system itself, instead relying on third parties. For individuals, this means Google, Github, or Microsoft accounts; Okta and other SAML and similar identity providers are also supported, but this runs into complexity and expense that most individuals aren t wanting to take on. Unfortunately, all three of those types of accounts often have saved auth tokens in a browser. Personally I would rather have a separate, very secure, login. If a person does compromise your account or the Tailscale servers themselves, they can t directly eavesdrop on your traffic because it is end-to-end encrypted. However, assuming an attacker obtains access to your account, they could:
  • Tamper with your Tailscale ACLs, permitting new actions
  • Add new nodes to the network
  • Forcibly remove nodes from the network
  • Enable or disable optional features
Of note is that they cannot just commandeer an existing IP. I would say the riskiest possibility here is that could add new nodes to the mesh. Because they could also tamper with your ACLs, they could then proceed to attempt to access all your internal services. They could even turn on service collection and have Tailscale tell them what and where all the services are. Therefore, as with other tools, I recommend a local firewall on each machine with Tailscale. More on that below. Tailscale has a new alpha feature called tailnet lock which helps with this problem. It requires existing nodes in the mesh to sign a request for a new node to join. Although this doesn t address ACL tampering and some of the other things, it does represent a significant help with the most significant concern. However, tailnet lock is in alpha, only available on the Enterprise plan, and has a waitlist, so I have been unable to test it. Any Tailscale node can request the IP addresses belonging to any other Tailscale node. The Tailscale control plane captures, and exposes to you, this information about every node in your network: the OS hostname, IP addresses and port numbers, operating system, creation date, last seen timestamp, and NAT traversal parameters. You can optionally enable service data capture as well, which sends data about open ports on each node to the control plane. Tailscale likes to highlight their key expiry and rotation feature. By default, all keys expire after 180 days, and traffic to and from the expired node will be interrupted until they are renewed (basically, you re-login with your provider and do a renew operation). Unfortunately, the only mention I can see of warning of impeding expiration is in the Windows client, and even there you need to edit a registry key to get the warning more than the default 24 hours in advance. In short, it seems likely to cut off communications when it s most important. You can disable key expiry on a per-node basis in the admin console web interface, and I mostly do, due to not wanting to lose connectivity at an inopportune time.

Tailscale: Connectivity and NAT traversal When thinking about reliability, the primary consideration here is being able to reach the Tailscale control plane. While it is possible in limited circumstances to reach nodes without the Tailscale control plane, it is a fairly brittle setup and notably will not survive a client restart. So if you use Tailscale to reach other nodes on your LAN, that won t work unless your Internet is up and the control plane is reachable. Assuming your Internet is up and Tailscale s infrastructure is up, there is little to be concerned with. Your own comfort level with cloud providers and your Internet should guide you here. Tailscale wrote a fantastic article about NAT traversal and they, predictably, do very well with it. Tailscale prefers UDP but falls back to TCP if needed. Broker (DERP) servers step in as a last resort, and Tailscale clients automatically select the best ones. I m not aware of anything that is more successful with NAT traversal than Tailscale. This maximizes the situations in which a direct P2P connection can be used without a broker. I have found Tailscale to be a bit slow to notice changes in network topography compared to Yggdrasil, and sometimes needs a kick in the form of restarting the client process to re-establish communications after a network change. However, it s possible (maybe even probable) that if I d waited a bit longer, it would have sorted this all out.

Tailscale: Sharing with friends I touched on the funnel feature earlier. The sharing feature lets you give an invite to an outsider. By default, a person accepting a share can make only outgoing connections to the network they re invited to, and cannot receive incoming connections from that network this makes sense. When sharing an exit node, you get a checkbox that lets you share access to the exit node as well. Of course, the person accepting the share needs to install the Tailnet client. The combination of funnel and sharing make Tailscale the best for ad-hoc sharing.

Tailscale: DNS Tailscale s DNS is called MagicDNS. It runs as a layer atop your standard DNS taking over /etc/resolv.conf on Linux and provides resolution of mesh hostnames and some other features. This is a concept that is pretty slick. It also is a bit flaky on Linux; dueling programs want to write to /etc/resolv.conf. I can t really say this is entirely Tailscale s fault; they document the problem and some workarounds. I would love to be able to add custom records to this service; for instance, to override the public IP for a service to use the in-mesh IP. Unfortunately, that s not yet possible. However, MagicDNS can query existing nameservers for certain domains in a split DNS setup.

Tailscale: Source code, pricing, and portability Tailscale is almost fully open source and the client is highly portable. The client is open source (BSD 3-clause) on open source platforms, and closed source on closed source platforms. The DERP servers are open source. The coordination server is closed source, although there is an open source coordination server called Headscale (also BSD 3-clause) made available with Tailscale s blessing and informal support. It supports most, but not all, features in the Tailscale coordination server. Tailscale s pricing (which does not apply when using Headscale) provides a free plan for 1 user with up to 20 devices. A Personal Pro plan expands that to 100 devices for $48 per year - not a bad deal at $4/mo. A Community on Github plan also exists, and then there are more business-oriented plans as well. See the pricing page for details. As a small note, I appreciated Tailscale s install script. It properly added Tailscale s apt key in a way that it can only be used to authenticate the Tailscale repo, rather than as a systemwide authenticator. This is a nice touch and speaks well of their developers.

Tailscale conclusions Tailscale is tops in sharing and has a broad feature set and excellent documentation. Like other solutions with a centralized control plane, device communications can stop working if the control plane is unreachable, and the threat model of the control plane should be carefully considered.

Zerotier Zerotier is a close competitor to Tailscale, and is similar to it in a lot of ways. So rather than duplicate all of the Tailscale information here, I m mainly going to describe how it differs from Tailscale. The primary difference between the two is that Zerotier emulates an Ethernet network via a Linux tap interface, while Tailscale emulates a TCP/IP network via a Linux tun interface. However, Zerotier has a number of things that make it be a somewhat imperfect Ethernet emulator. For one, it has a problem with broadcast amplification; the machine sending the broadcast sends it to all the other nodes that should receive it (up to a set maximum). I wouldn t want to have a lot of programs broadcasting on a slow link. While in theory this could let you run Netware or DECNet across Zerotier, I m not really convinced there s much call for that these days, and Zerotier is clearly IP-focused as it allocates IP addresses and such anyhow. Zerotier provides special support for emulated ARP (IPv4) and NDP (IPv6). While you could theoretically run Zerotier as a bridge, this eliminates the zero trust principle, and Tailscale supports subnet routers, which provide much of the same feature set anyhow. A somewhat obscure feature, but possibly useful, is Zerotier s built-in support for multipath WAN for the public interface. This actually lets you do a somewhat basic kind of channel bonding for WAN.

Zerotier: Security and Privacy The picture here is similar to Tailscale, with the difference that you can create a Zerotier-local account rather than relying on cloud authentication. I was unable to find as much detail about Zerotier as I could about Tailscale - notably I couldn t find anything about how sticky an IP address is. However, the configuration screen lets me delete a node and assign additional arbitrary IPs within a subnet to other nodes, so I think the assumption here is that if your Zerotier account (or the Zerotier control plane) is compromised, an attacker could remove a legit device, add a malicious one, and assign the previous IP of the legit device to the malicious one. I m not sure how to mitigate against that risk, as firewalling specific IPs is ineffective if an attacker can simply take them over. Zerotier also lacks anything akin to Tailnet Lock. For this reason, I didn t proceed much further in my Zerotier evaluation.

Zerotier: Connectivity and NAT traversal Like Tailscale, Zerotier has NAT traversal with STUN. However, it looks like it s more limited than Tailscale s, and in particular is incompatible with double NAT that is often seen these days. Zerotier operates brokers ( root servers ) that can do relaying, including TCP relaying. So you should be able to connect even from hostile networks, but you are less likely to form a P2P connection than with Tailscale.

Zerotier: Sharing with friends I was unable to find any special features relating to this in the Zerotier documentation. Therefore, it would be at the same level as Yggdrasil: possible, maybe even not too difficult, but without any specific help.

Zerotier: DNS Unlike Tailscale, Zerotier does not support automatically adding DNS entries for your hosts. Therefore, your options are approximately the same as Yggdrasil, though with the added option of pushing configuration pointing to your own non-Zerotier DNS servers to the client.

Zerotier: Source code, pricing, and portability The client ZeroTier One is available on Github under a custom business source license which prevents you from using it in certain settings. This license would preclude it being included in Debian. Their library, libzt, is available under the same license. The pricing page mentions a community edition for self hosting, but the documentation is sparse and it was difficult to understand what its feature set really is. The free plan lets you have 1 user with up to 25 devices. Paid plans are also available.

Zerotier conclusions Frankly I don t see much reason to use Zerotier. The virtual Ethernet model seems to be a weird hybrid that doesn t bring much value. I m concerned about the implications of a compromise of a user account or the control plane, and it lacks a lot of Tailscale features (MagicDNS and sharing). The only thing it may offer in particular is multipath WAN, but that s esoteric enough and also solvable at other layers that it doesn t seem all that compelling to me. Add to that the strange license and, to me anyhow, I don t see much reason to bother with it.

Netmaker Netmaker is one of the projects that is making noise these days. Netmaker is the only one here that is a wrapper around in-kernel Wireguard, which can make a performance difference when talking to peers on a 1Gbps or faster link. Also, unlike other tools, it has an ingress gateway feature that lets people that don t have the Netmaker client, but do have Wireguard, participate in the VPN. I believe I also saw a reference somewhere to nodes as routers as with Yggdrasil, but I m failing to dig it up now. The project is in a bit of an early state; you can sign up for an upcoming closed beta with a SaaS host, but really you are generally pointed to self-hosting using the code in the github repo. There are community and enterprise editions, but it s not clear how to actually choose. The server has a bunch of components: binary, CoreDNS, database, and web server. It also requires elevated privileges on the host, in addition to a container engine. Contrast that to the single binary that some others provide. It looks like releases are frequent, but sometimes break things, and have a somewhat more laborious upgrade processes than most. I don t want to spend a lot of time managing my mesh. So because of the heavy needs of the server, the upgrades being labor-intensive, it taking over iptables and such on the server, I didn t proceed with a more in-depth evaluation of Netmaker. It has a lot of promise, but for me, it doesn t seem to be in a state that will meet my needs yet.

Nebula Nebula is an interesting mesh project that originated within Slack, seems to still be primarily sponsored by Slack, but is also being developed by Defined Networking (though their product looks early right now). Unlike the other tools in this section, Nebula doesn t have a web interface at all. Defined Networking looks likely to provide something of a SaaS service, but for now, you will need to run a broker ( lighthouse ) yourself; perhaps on a $5/mo VPS. Due to the poor firewall traversal properties, I didn t do a full evaluation of Nebula, but it still has a very interesting design.

Nebula: Security and Privacy Since Nebula lacks a traditional control plane, the root of trust in Nebula is a CA (certificate authority). The documentation gives this example of setting it up:
./nebula-cert sign -name "lighthouse1" -ip "192.168.100.1/24"
./nebula-cert sign -name "laptop" -ip "192.168.100.2/24" -groups "laptop,home,ssh"
./nebula-cert sign -name "server1" -ip "192.168.100.9/24" -groups "servers"
./nebula-cert sign -name "host3" -ip "192.168.100.10/24"
So the cert contains your IP, hostname, and group allocation. Each host in the mesh gets your CA certificate, and the per-host cert and key generated from each of these steps. This leads to a really nice security model. Your CA is the gatekeeper to what is trusted in your mesh. You can even have it airgapped or something to make it exceptionally difficult to breach the perimeter. Nebula contains an integrated firewall. Because the ability to keep out unwanted nodes is so strong, I would say this may be the one mesh VPN you might consider using without bothering with an additional on-host firewall. You can define static mappings from a Nebula mesh IP to a clearnet IP. I haven t found information on this, but theoretically if NAT traversal isn t required, these static mappings may allow Nebula nodes to reach each other even if Internet is down. I don t know if this is truly the case, however.

Nebula: Connectivity and NAT traversal This is a weak point of Nebula. Nebula sends all traffic over a single UDP port; there is no provision for using TCP. This is an issue at certain hotel and other public networks which open only TCP egress ports 80 and 443. I couldn t find a lot of detail on what Nebula s NAT traversal is capable of, but according to a certain Github issue, this has been a sore spot for years and isn t as capable as Tailscale. You can designate nodes in Nebula as brokers (relays). The concept is the same as Yggdrasil, but it s less versatile. You have to manually designate what relay to use. It s unclear to me what happens if different nodes designate different relays. Keep in mind that this always happens over a UDP port.

Nebula: Sharing with friends There is no particular support here.

Nebula: DNS Nebula has experimental DNS support. In contrast with Tailscale, which has an internal DNS server on every node, Nebula only runs a DNS server on a lighthouse. This means that it can t forward requests to a DNS server that s upstream for your laptop s particular current location. Actually, Nebula s DNS server doesn t forward at all. It also doesn t resolve its own name. The Nebula documentation makes reference to using multiple lighthouses, which you may want to do for DNS redundancy or performance, but it s unclear to me if this would make each lighthouse form a complete picture of the network.

Nebula: Source code, pricing, and portability Nebula is fully open source (MIT). It consists of a single Go binary and configuration. It is fairly portable.

Nebula conclusions I am attracted to Nebula s unique security model. I would probably be more seriously considering it if not for the lack of support for TCP and poor general NAT traversal properties. Its datacenter connectivity heritage does show through.

Roll your own and hybrid Here is a grab bag of ideas:

Running Yggdrasil over Tailscale One possibility would be to use Tailscale for its superior NAT traversal, then allow Yggdrasil to run over it. (You will need a firewall to prevent Tailscale from trying to run over Yggdrasil at the same time!) This creates a closed network with all the benefits of Yggdrasil, yet getting the NAT traversal from Tailscale. Drawbacks might be the overhead of the double encryption and double encapsulation. A good Yggdrasil peer may wind up being faster than this anyhow.

Public VPN provider for NAT traversal A public VPN provider such as Mullvad will often offer incoming port forwarding and nodes in many cities. This could be an attractive way to solve a bunch of NAT traversal problems: just use one of those services to get you an incoming port, and run whatever you like over that. Be aware that a number of public VPN clients have a kill switch to prevent any traffic from egressing without using the VPN; see, for instance, Mullvad s. You ll need to disable this if you are running a mesh atop it.

Other

Combining with local firewalls For most of these tools, I recommend using a local firewal in conjunction with them. I have been using firehol and find it to be quite nice. This means you don t have to trust the mesh, the control plane, or whatever. The catch is that you do need your mesh VPN to provide strong association between IP address and node. Most, but not all, do.

Performance I tested some of these for performance using iperf3 on a 2.5Gbps LAN. Here are the results. All speeds are in Mbps.
Tool iperf3 (default) iperf3 -P 10 iperf3 -R
Direct (no VPN) 2406 2406 2764
Wireguard (kernel) 1515 1566 2027
Yggdrasil 892 1126 1105
Tailscale 950 1034 1085
Tinc 296 300 277
You can see that Wireguard was significantly faster than the other options. Tailscale and Yggdrasil were roughly comparable, and Tinc was terrible.

IP collisions When you are communicating over a network such as these, you need to trust that the IP address you are communicating with belongs to the system you think it does. This protects against two malicious actor scenarios:
  1. Someone compromises one machine on your mesh and reconfigures it to impersonate a more important one
  2. Someone connects an unauthorized system to the mesh, taking over a trusted IP, and uses the privileges of the trusted IP to access resources
To summarize the state of play as highlighted in the reviews above:
  • Yggdrasil derives IPv6 addresses from a public key
  • tinc allows any node to set any IP
  • Tailscale IPs aren t user-assignable, but the assignment algorithm is unknown
  • Zerotier allows any IP to be allocated to any node at the control plane
  • I don t know what Netmaker does
  • Nebula IPs are baked into the cert and signed by the CA, but I haven t verified the enforcement algorithm
So this discussion really only applies to Yggdrasil and Tailscale. tinc and Zerotier lack detailed IP security, while Nebula expects IP allocations to be handled outside of the tool and baked into the certs (therefore enforcing rigidity at that level). So the question for Yggdrasil and Tailscale is: how easy is it to commandeer a trusted IP? Yggdrasil has a brief discussion of this. In short, Yggdrasil offers you both a dedicated IP and a rarely-used /64 prefix which you can delegate to other machines on your LAN. Obviously by taking the dedicated IP, a lot more bits are available for the hash of the node s public key, making collisions technically impractical, if not outright impossible. However, if you use the /64 prefix, a collision may be more possible. Yggdrasil s hashing algorithm includes some optimizations to make this more difficult. Yggdrasil includes a genkeys tool that uses more CPU cycles to generate keys that are maximally difficult to collide with. Tailscale doesn t document their IP assignment algorithm, but I think it is safe to say that the larger subnet you use, the better. If you try to use a /24 for your mesh, it is certainly conceivable that an attacker could remove your trusted node, then just manually add the 240 or so machines it would take to get that IP reassigned. It might be a good idea to use a purely IPv6 mesh with Tailscale to minimize this problem as well. So, I think the risk is low in the default configurations of both Yggdrasil and Tailscale (certainly lower than with tinc or Zerotier). You can drive the risk even lower with both.

Final thoughts For my own purposes, I suspect I will remain with Yggdrasil in some fashion. Maybe I will just take the small performance hit that using a relay node implies. Or perhaps I will get clever and use an incoming VPN port forward or go over Tailscale. Tailscale was the other option that seemed most interesting. However, living in a region with Internet that goes down more often than I d like, I would like to just be able to send as much traffic over a mesh as possible, trusting that if the LAN is up, the mesh is up. I have one thing that really benefits from performance in excess of Yggdrasil or Tailscale: NFS. That s between two machines that never leave my LAN, so I will probably just set up a direct Wireguard link between them. Heck of a lot easier than trying to do Kerberos! Finally, I wrote this intending to be useful. I dealt with a lot of complexity and under-documentation, so it s possible I got something wrong somewhere. Please let me know if you find any errors.
This blog post is a copy of a page on my website. That page may be periodically updated.

12 April 2023

Jamie McClelland: Doing whatever Gmail says

As we slowly move our members to our new email infrastructure, an unexpected twist turned up: One member reported getting the Gmail warning:
Be careful with this message The sender hasn t authenticated this message so Gmail can t verify that it actually came from them.
They have their email delivered to May First, but have configured Gmail to pull in that email using the Check mail from other accounts feature. It worked fine on our old infrastructure, but started giving this message when we transitioned. A further twist: he only receives this message from email sent by other people in his organization - in other words email sent via May First gets flagged, email sent from other people does not. These Gmail messages typically warn users about email that has failed (or lacks) both SPF and DKIM. However, before diving into the technical details, my first thought was: why is Gmail giving a warning on a message that wasn t even delivered to them? It s always nice to get confirmation from others that this is totally wrong behavior. Unfortunately, when it s Gmail, it doesn t matter if they are wrong. We all have to deal with it. So next, I decided to investigate why this message failed both DKIM (digital signature) and SPF (ensuring the message was sent from an authorized server). Examining the headers immediately turned up the SPF failure:
Authentication-Results: mx.google.com;
       spf=fail (google.com: domain of xxx@xxx.org does not designate n.n.n.n as permitted sender) smtp.mailfrom=xxx@xxx.org
The IP address Google checked to ensure the message was sent by an authorized server is the IP address of our internal mail filter server in our new email infrastructure. That s the last hop before delivery to the user s mailbox, so that s the last hop Gmail sees. This is why Gmail is totally wrong to run this check: all email messages retreived via their mail fecthing service are going to fail the SPF test because Gmail has no way of knowing what the actual last hop is. So why is this problem only showing up after we transitioned to our new infrastructure? Because our old infrastructure had only one mail server for every user. The one mail server was the MX server and the relay server, so it was included in their SPF record. And why does this only affect mail sent via May First and not other domains? Because we add our DKIM signature to outgoing email, not to email delivered internally. Therefore, these messages both fail the SPF check and also don t have a DKIM signature. Other messages have a DKIM signature. Ugggg. So what do we do now? Clearly, something dumb and simple is in order: I added the IP addresses of our internal filter servers to our global SPF record. Someday, years from now, after Gmail is long gone (or has fixed this dumb behavior), when I m doing whatever retired people like me do, someone will notice that our internal filter server IPs are included in our SPF record. Hopefully they will fix the problem, but instead they ll probably think: no idea why these are here - something will probably break if I remove them.

Russ Allbery: Review: The Last Hero

Review: The Last Hero, by Terry Pratchett
Illustrator: Paul Kidby
Series: Discworld #27
Publisher: Harper
Copyright: 2001, 2002
ISBN: 0-06-050777-2
Format: Graphic novel
Pages: 176
The Last Hero is the 27th Discworld novel and part of the Rincewind subseries. This is something of a sequel to Interesting Times and relies heavily on the cast that was built up in previous books. It's not a good place to start with the series. At last, the rare Rincewind novel that I enjoyed. It helps that Rincewind is mostly along for the ride. Cohen the Barbarian and his band of elderly heroes have decided they're tired of enjoying their spoils and are going on a final adventure. They're going to return fire to the gods, in the form of a giant bomb. The wizards in Ankh-Morpork get wind of this and realize that an explosion at the Hub where the gods live could disrupt the magical field of the entire Disc, effectively destroying it. The only hope seems to be to reach Cori Celesti before Cohen and head him off, but Cohen is already almost there. Enter Lord Vetinari, who has Leonard of Quirm design a machine that will get them there in time by slingshotting under the Disc itself. First off, let me say how much I love the idea of returning fire to the gods with interest. I kind of wish Pratchett had done more with their motivations, but I was laughing about that through the whole book. Second, this is the first of the illustrated Discworld books that I've read in the intended illustrated form (I read the paperback version of Eric), and this book is gorgeous. I enjoyed Paul Kidby's art far more than I had expected to. His style what I will call, for lack of better terminology due to my woeful art education, "highly detailed caricature." That's not normally a style that clicks with me, but it works incredibly well for Discworld. The Last Hero is richly illustrated, with some amount of art, if only subtle background behind the text, on nearly every page. There are several two-page spreads, but oddly I thought those (including the parody of The Scream on the cover) were the worst art of the book. None of them did much for me. The best art is in the figure studies and subtle details: Leonard of Quirm's beautiful calligraphy, his numerous sketches, the labeled illustration of the controls of the flying machine, and the portraits of Cohen's band and the people they encounter. The edition I got is printed on lovely, thick glossy paper, and the subtle art texture behind the writing makes this book a delight to read. I'm not sure if, like Eric, this book comes in other editions, but if so, I highly recommend getting or finding the high-quality illustrated edition for the best reading experience. The plot, like a lot of the Rincewind books, doesn't amount to much, but I enjoyed the mission to intercept Cohen. Leonard of Quirm is a great character, and the slow revelation of his flying machine design (which I will not spoil) is a delightful combination of Leonardo da Vinci parody, Discworld craziness, and NASA homage. Also, the Librarian is involved, which always improves a Discworld book. (The Luggage, sadly, is not; I would have liked to have seen a richly-illustrated story about it, but it looks like I'll have to find the illustrated version of Eric for that.) There is one of Pratchett's philosophical subtexts here, about heroes and stories and what it means for your story to live on. To be honest, it didn't grab me; it's mostly subtext, and this particular set of characters weren't quite introspective enough to make the philosophy central to the story. Also, I was perhaps too sympathetic to Cohen's goals, and thus not very interested in anyone successfully stopping him. But I had a lot more fun with this one than I usually do with Rincewind books, helped considerably by the illustrations. If you've been skipping Rincewind books in your Discworld read-through and have access to the illustrated edition of The Last Hero, consider making an exception for this one. Followed by The Amazing Maurice and His Educated Rodents in publication order and, thematically, by Unseen Academicals. Rating: 7 out of 10

6 April 2023

Reproducible Builds: Reproducible Builds in March 2023

Welcome to the March 2023 report from the Reproducible Builds project. In these reports we outline the most important things that we have been up to over the past month. As a quick recap, the motivation behind the reproducible builds effort is to ensure no malicious flaws have been introduced during compilation and distributing processes. It does this by ensuring identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised. If you are interested in contributing to the project, please do visit our Contribute page on our website.

News There was progress towards making the Go programming language reproducible this month, with the overall goal remaining making the Go binaries distributed from Google and by Arch Linux (and others) to be bit-for-bit identical. These changes could become part of the upcoming version 1.21 release of Go. An issue in the Go issue tracker (#57120) is being used to follow and record progress on this.
Arnout Engelen updated our website to add and update reproducibility-related links for NixOS to reproducible.nixos.org. [ ]. In addition, Chris Lamb made some cosmetic changes to our presentations and resources page. [ ][ ]
Intel published a guide on how to reproducibly build their Trust Domain Extensions (TDX) firmware. TDX here refers to an Intel technology that combines their existing virtual machine and memory encryption technology with a new kind of virtual machine guest called a Trust Domain. This runs the CPU in a mode that protects the confidentiality of its memory contents and its state from any other software.
A reproducibility-related bug from early 2020 in the GNU GCC compiler as been fixed. The issues was that if GCC was invoked via the as frontend, the -ffile-prefix-map was being ignored. We were tracking this in Debian via the build_path_captured_in_assembly_objects issue. It has now been fixed and will be reflected in GCC version 13.
Holger Levsen will present at foss-north 2023 in April of this year in Gothenburg, Sweden on the topic of Reproducible Builds, the first ten years.
Anthony Andreoli, Anis Lounis, Mourad Debbabi and Aiman Hanna of the Security Research Centre at Concordia University, Montreal published a paper this month entitled On the prevalence of software supply chain attacks: Empirical study and investigative framework:
Software Supply Chain Attacks (SSCAs) typically compromise hosts through trusted but infected software. The intent of this paper is twofold: First, we present an empirical study of the most prominent software supply chain attacks and their characteristics. Second, we propose an investigative framework for identifying, expressing, and evaluating characteristic behaviours of newfound attacks for mitigation and future defense purposes. We hypothesize that these behaviours are statistically malicious, existed in the past, and thus could have been thwarted in modernity through their cementation x-years ago. [ ]

On our mailing list this month:
  • Mattia Rizzolo is asking everyone in the community to save the date for the 2023 s Reproducible Builds summit which will take place between October 31st and November 2nd at Dock Europe in Hamburg, Germany. Separate announcement(s) to follow. [ ]
  • ahojlm posted an message announcing a new project which is the first project offering bootstrappable and verifiable builds without any binary seeds. That is to say, a way of providing a verifiable path towards trusted software development platform without relying on pre-provided binary code in order to prevent against various forms of compiler backdoors. The project s homepage is hosted on Tor (mirror).

The minutes and logs from our March 2023 IRC meeting have been published. In case you missed this one, our next IRC meeting will take place on Tuesday 25th April at 15:00 UTC on #reproducible-builds on the OFTC network.
and as a Valentines Day present, Holger Levsen wrote on his blog on 14th February to express his thanks to OSUOSL for their continuous support of reproducible-builds.org. [ ]

Debian Vagrant Cascadian developed an easier setup for testing debian packages which uses sbuild s unshare mode along and reprotest, our tool for building the same source code twice in different environments and then checking the binaries produced by each build for any differences. [ ]
Over 30 reviews of Debian packages were added, 14 were updated and 7 were removed this month, all adding to our knowledge about identified issues. A number of issues were updated, including the Holger Levsen updating build_path_captured_in_assembly_objects to note that it has been fixed for GCC 13 [ ] and Vagrant Cascadian added new issues to mark packages where the build path is being captured via the Rust toolchain [ ] as well as new categorisation for where virtual packages have nondeterministic versioned dependencies [ ].

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including: In addition, Vagrant Cascadian filed a bug with a patch to ensure GNU Modula-2 supports the SOURCE_DATE_EPOCH environment variable.

Testing framework The Reproducible Builds project operates a comprehensive testing framework (available at tests.reproducible-builds.org) in order to check packages and other artifacts for reproducibility. In March, the following changes were made by Holger Levsen:
  • Arch Linux-related changes:
    • Build Arch packages in /tmp/archlinux-ci/$SRCPACKAGE instead of /tmp/$SRCPACKAGE. [ ]
    • Start 2/3 of the builds on the o1 node, the rest on o2. [ ]
    • Add graphs for Arch Linux (and OpenWrt) builds. [ ]
    • Toggle Arch-related builders to debug why a specific node overloaded. [ ][ ][ ][ ]
  • Node health checks:
    • Detect SetuptoolsDeprecationWarning tracebacks in Python builds. [ ]
    • Detect failures do perform chdist calls. [ ][ ]
  • OSUOSL node migration.
    • Install megacli packages that are needed for hardware RAID. [ ][ ]
    • Add health check and maintenance jobs for new nodes. [ ]
    • Add mail config for new nodes. [ ][ ]
    • Handle a node running in the future correctly. [ ][ ]
    • Migrate some nodes to Debian bookworm. [ ]
    • Fix nodes health overview for osuosl3. [ ]
    • Make sure the /srv/workspace directory is owned by by the jenkins user. [ ]
    • Use .debian.net names everywhere, except when communicating with the outside world. [ ]
    • Grant fpierret access to a new node. [ ]
    • Update documentation. [ ]
    • Misc migration changes. [ ][ ][ ][ ][ ][ ][ ][ ]
  • Misc changes:
    • Enable fail2ban everywhere and monitor it with munin [ ].
    • Gracefully deal with non-existing Alpine schroots. [ ]
In addition, Roland Clobus is continuing his work on reproducible Debian ISO images:
  • Add/update openQA configuration [ ], and use the actual timestamp for openQA builds [ ].
  • Moved adding the user to the docker group from the janitor_setup_worker script to the (more general) update_jdn.sh script. [ ]
  • Use the (short-term) reproducible source when generating live-build images. [ ]

diffoscope development diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats as well. This month, Mattia Rizzolo released versions 238, and Chris Lamb released versions 239 and 240. Chris Lamb also made the following changes:
  • Fix compatibility with PyPDF 3.x, and correctly restore test data. [ ]
  • Rework PDF annotation handling into a separate method. [ ]
In addition, Holger Levsen performed a long-overdue overhaul of the Lintian overrides in the Debian packaging [ ][ ][ ][ ], and Mattia Rizzolo updated the packaging to silence an include_package_data=True [ ], fixed the build under Debian bullseye [ ], fixed tool name in a list of tools permitted to be absent during package build tests [ ] and as well as documented sending out an email upon [ ]. In addition, Vagrant Cascadian updated the version of GNU Guix to 238 [ and 239 [ ]. Vagrant also updated reprotest to version 0.7.23. [ ]

Other development work Bernhard M. Wiedemann published another monthly report about reproducibility within openSUSE


If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

28 March 2023

kpcyrd: Writing a Linux executable from scratch with x86_64-unknown-none and Rust

I recently mentioned on the internet I did work in this direction and a friend of mine asked me to write a blogpost on this. I didn t blog for a long time (keeping all the goodness for myself hehe), so here we go. To set the scene, let s assume we want to make an exectuable binary for x86_64 Linux that s supposed to be extremely portable. It should work on both Debian and Arch Linux. It should work on systems without glibc like Alpine Linux. It should even work in a FROM scratch Docker container. In a more serious setting you would statically link musl-libc with your Rust program, but today we re in a silly-goofy mood so we re going to try to make this work without a libc. And we re also going to use Rust for this, more specifically the stable release channel of Rust, so this blog post won t use any nightly-only features that might still change/break. If you re using a Rust 1.0 version that was recent at the time of writing or later (>= 1.68.0 according to my computer), you should be able to try this at home just fine . This tutorial assumes you have no prior programming experience in any programming language, but it s going to involve some x86_64 assembly. If you already know what a syscall is, you ll be just fine. If this is your first exposure to programming you might still be able to follow along, but it might be a wild ride. If you haven t already, install rustup (possibly also available in your package manager, who knows?)
# when asked, press enter to confirm default settings
curl --proto '=https' --tlsv1.2 -sSf https://sh.rustup.rs   sh
This is going to install everything you need to use Rust on Linux (this tutorial assumes you re following along on Linux btw). Usually it s still using a system linker (by calling the cc binary, and errors out if none is present), but instead we re going to use rustup to install an additional target:
rustup target add x86_64-unknown-none
I don t know if/how this is made available by Linux distributions, so I recommend following along with rust installed from rustup. Anyway, we re creating a new project with cargo, this creates a new directory that we can then change into (you might ve done this before):
cargo new hack-the-planet
cd hack-the-planet
There s going to be a file named Cargo.toml, we don t need to make any changes there, but the one that was auto-generated for me at the time of writing looks like this:
[package]
name = "hack-the-planet"
version = "0.1.0"
edition = "2021"
# See more keys and their definitions at https://doc.rust-lang.org/cargo/reference/manifest.html
[dependencies]
There s a second file named src/main.rs, it s going to contain some pre-generated hello world, but we re going to delete it and create a new, empty file:
rm src/main.rs
touch src/main.rs
Alrighty, leaving this file empty is not valid but we re going to walk through the individual steps so we re going to try to build with an empty file first. At this point I would like to credit this chapter of a fasterthanli.me series and a blogpost by Philipp Oppermann, this tutorial is merely an 2023 update and makes it work with stable Rust. Let s run the build:
$ cargo build --release --target x86_64-unknown-none
   Compiling hack-the-planet v0.1.0 (/hack-the-planet)
error[E0463]: can't find crate for  std 
   
  = note: the  x86_64-unknown-none  target may not support the standard library
  = note:  std  is required by  hack_the_planet  because it does not declare  #![no_std] 
error[E0601]:  main  function not found in crate  hack_the_planet 
   
  = note: consider adding a  main  function to  src/main.rs 
Some errors have detailed explanations: E0463, E0601.
For more information about an error, try  rustc --explain E0463 .
error: could not compile  hack-the-planet  due to 2 previous errors
Since this doesn t use a libc (oh right, I forgot to mention this up to this point actually), this also means there s no std standard library. Usually the standard library of Rust still uses the system libc to do syscalls, but since we specify our libc as none this means std won t be available (use std::fs::rename won t work). There are still other functions we can use and import, for example there s core that s effectively a second standard library, but much smaller. To opt-out of the std standard library, we can put #![no_std] into src/main.rs:
#![no_std]
Running the build again:
$ cargo build --release --target x86_64-unknown-none
   Compiling hack-the-planet v0.1.0 (/hack-the-planet)
error[E0601]:  main  function not found in crate  hack_the_planet 
 --> src/main.rs:1:11
   
1   #![no_std]
              ^ consider adding a  main  function to  src/main.rs 
For more information about this error, try  rustc --explain E0601 .
error: could not compile  hack-the-planet  due to previous error
Rust noticed we didn t define a main function and suggest we add one. This isn t what we want though so we ll politely decline and inform Rust we don t have a main and it shouldn t attempt to call it. We re adding #![no_main] to our file and src/main.rs now looks like this:
#![no_std]
#![no_main]
Running the build again:
$ cargo build
   Compiling hack-the-planet v0.1.0 (/hack-the-planet)
error:  #[panic_handler]  function required, but not found
error: language item required, but not found:  eh_personality 
   
  = note: this can occur when a binary crate with  #![no_std]  is compiled for a target where  eh_personality  is defined in the standard library
  = help: you may be able to compile for a target that doesn't need  eh_personality , specify a target with  --target  or in  .cargo/config 
error: could not compile  hack-the-planet  due to 2 previous errors
Rust is asking us for a panic handler, basically I m going to jump to this address if something goes terribly wrong and execute whatever you put there . Eventually we would put some code there to just exit the program, but for now an infinitely loop will do. This is likely going to get stripped away anyway by the compiler if it notices our program has no code-branches leading to a panic and the code is unused. Our src/main.rs now looks like this:
#![no_std]
#![no_main]
use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> !  
    loop  
 
Running the build again:
$ cargo build --release --target x86_64-unknown-none
   Compiling hack-the-planet v0.1.0 (/hack-the-planet)
    Finished release [optimized] target(s) in 0.16s
Neat, it worked! What happens if we run it?
$ target/x86_64-unknown-none/release/hack-the-planet
Segmentation fault (core dumped)
Oops. Let s try to disassemble it:
$ objdump -d target/x86_64-unknown-none/release/hack-the-planet
target/x86_64-unknown-none/release/hack-the-planet:     file format elf64-x86-64
Ok that looks pretty from scratch to me . The file contains no cpu instructions. Also note how our infinity loop is not present (as predicted).

Making a basic program and executing it Ok let s try to make a valid program that basically just cleanly exits. First let s try to add some cpu instructions and verify they re indeed getting executed. Lemme introduce, the INT 3 instruction in x86_64 assembly. In binary it s also known as the 0xCC opcode. It crashes our program in a slightly different way, so if the error message changes, we know it worked. The other tutorials use a #[naked] function for the entry point, but since this feature isn t stabilized at the time of writing we re going to use the global_asm! macro. Also don t worry, I m not going to introduce every assembly instruction individually. Our program now looks like this:
#![no_std]
#![no_main]
use core::arch::global_asm;
use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> !  
    loop  
 
global_asm!  
    ".global _start",
    "_start:",
    "int 3"
 
Running the build again (ok basically from now on the build is always going to be expected to work unless I say otherwise):
$ cargo build --release --target x86_64-unknown-none
   Compiling hack-the-planet v0.1.0 (/hack-the-planet)
    Finished release [optimized] target(s) in 0.11s
Let s try to disassemble the binary again:
$ objdump -d target/x86_64-unknown-none/release/hack-the-planet
target/x86_64-unknown-none/release/hack-the-planet:     file format elf64-x86-64
Disassembly of section .text:
0000000000001210 <_start>:
    1210:	cc                   	int3
And sure enough, there s a cc instruction that was identified as int3. Let s try to run this:
$ target/x86_64-unknown-none/release/hack-the-planet
Trace/breakpoint trap (core dumped)
The error message of the crash is now slightly different because it s hitting our breakpoint cpu instruction. Funfact btw, if you run this in strace you can see this isn t making any system calls (aka not talking to the kernel at all, it just crashes):
$ strace -f ./hack-the-planet
execve("./hack-the-planet", ["./hack-the-planet"], 0x74f12430d1d8 /* 39 vars */) = 0
--- SIGTRAP  si_signo=SIGTRAP, si_code=SI_KERNEL, si_addr=NULL  ---
+++ killed by SIGTRAP (core dumped) +++
[1]    2796457 trace trap (core dumped)  strace -f ./hack-the-planet
Let s try to make a program that does a clean shutdown. To do this we inform the kernel with a system call that we may like to exit. We can get more info on this with man 2 exit and it defines exit like this:
[[noreturn]] void _exit(int status);
On Linux this syscall is actually called _exit and exit is implemented as a libc function, but we don t care about any of that today, it s going to do the job just fine. Also note how it takes a single argument of type int. In C-speak this means signed 32 bit , i32 in Rust. Next we need to figure out the syscall number of this syscall. These numbers are cpu architecture specific for some reason (idk, idc). We re looking these numbers up with ripgrep in /usr/include/asm/:
$ rg __NR_exit /usr/include/asm
/usr/include/asm/unistd_64.h
64:#define __NR_exit 60
235:#define __NR_exit_group 231
/usr/include/asm/unistd_x32.h
53:#define __NR_exit (__X32_SYSCALL_BIT + 60)
206:#define __NR_exit_group (__X32_SYSCALL_BIT + 231)
/usr/include/asm/unistd_32.h
5:#define __NR_exit 1
253:#define __NR_exit_group 252
Since we re on x86_64 the correct value is the one in unistd_64.h, 60. Also, on x86_64 the syscall number goes into the rax cpu register, the status argument goes in the rdi register. The return value of the syscall is going to be placed in the rax register after the syscall is done, but for exit the execution is never given back to us. Let s try to write 60 into the rax register and 69 into the rdi register. To copy into registers we re going to use the mov destination, source instruction to copy from source to destination. With these registers setup we can use the syscall cpu instruction to hand execution over to the kernel. Don t worry, there s only one more assembly instruction coming and for everything else we re going to use Rust. Our code now looks like this:
#![no_std]
#![no_main]
use core::arch::global_asm;
use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> !  
    loop  
 
global_asm!  
    ".global _start",
    "_start:",
    "mov rax, 60",
    "mov rdi, 69",
    "syscall"
 
Build the binary, run it and print the exit code:
$ cargo build --release --target x86_64-unknown-none
$ target/x86_64-unknown-none/release/hack-the-planet; echo $?
69
Nice. Rust is quite literally putting these cpu instructions into the binary for us, nothing else.
$ objdump -d target/x86_64-unknown-none/release/hack-the-planet
target/x86_64-unknown-none/release/hack-the-planet:     file format elf64-x86-64
Disassembly of section .text:
0000000000001210 <_start>:
    1210:	48 c7 c0 3c 00 00 00 	mov    $0x3c,%rax
    1217:	48 c7 c7 45 00 00 00 	mov    $0x45,%rdi
    121e:	0f 05                	syscall
Running this with strace shows the program does exactly one thing.
$ strace -f ./hack-the-planet
execve("./hack-the-planet", ["./hack-the-planet"], 0x70699fe8c908 /* 39 vars */) = 0
exit(69)                                = ?
+++ exited with 69 +++

Writing Rust Ok but even though cpu instructions can be fun at times, I d rather not deal with them most of the time (this might strike you as odd, considering this blog post). Instead let s try to define a function in Rust and call into that instead. We re going to define this function as unsafe (btw none of this is taking advantage of the safety guarantees by Rust in case it wasn t obvious. This tutorial is mostly going to stick to unsafe Rust, but for bigger projects you can attempt to reduce your usage of unsafe to opt back into normal safe Rust), it also declares the function with #[no_mangle] so the function name is preserved as main and we can call it from our global_asm entry point. Lastely, when our program is started it s going to get the stack address passed in one of the cpu registers, this value is expected to be passed to our function as an argument. Our function declares ! as return type, which means it never returns:
#[no_mangle]
unsafe fn main(_stack_top: *const u8) -> !  
    // TODO: this is missing
 
This won t compile yet, we need to add our assembly for the exit syscall back in.
#[no_mangle]
unsafe fn main(_stack_top: *const u8) -> !  
    asm!(
        "syscall",
        in("rax") 60,
        in("rdi") 0,
        options(noreturn)
    );
 
This time we re using the asm! macro, this is a slightly more declarative approach. We want to run the syscall cpu instruction with 60 in the rax register, and this time we want the rdi register to be zero, to indicate a successful exit. We also use options(noreturn) so Rust knows it should assume execution does not resume after this assembly is executed (the Linux kernel guarantees this). We modify our global_asm! entrypoint to call our new main function, and to copy the stack address from rsp into the register for the first argument rdi because it would otherwise get lost forever:
global_asm!  
    ".global _start",
    "_start:",
    "mov rdi, rsp",
    "call main"
 
Our full program now looks like this:
#![no_std]
#![no_main]
use core::arch::asm;
use core::arch::global_asm;
use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> !  
    loop  
 
global_asm!  
    ".global _start",
    "_start:",
    "mov rdi, rsp",
    "call main"
 
#[no_mangle]
unsafe fn main(_stack_top: *const u8) -> !  
    asm!(
        "syscall",
        in("rax") 60,
        in("rdi") 0,
        options(noreturn)
    );
 
After building and disassembling this the Rust compiler is slowly starting to do work for us:
$ cargo build --release --target x86_64-unknown-none
$ objdump -d target/x86_64-unknown-none/release/hack-the-planet
target/x86_64-unknown-none/release/hack-the-planet:     file format elf64-x86-64
Disassembly of section .text:
0000000000001210 <_start>:
    1210:	48 89 e7             	mov    %rsp,%rdi
    1213:	e8 08 00 00 00       	call   1220 <main>
    1218:	cc                   	int3
    1219:	cc                   	int3
    121a:	cc                   	int3
    121b:	cc                   	int3
    121c:	cc                   	int3
    121d:	cc                   	int3
    121e:	cc                   	int3
    121f:	cc                   	int3
0000000000001220 <main>:
    1220:	50                   	push   %rax
    1221:	b8 3c 00 00 00       	mov    $0x3c,%eax
    1226:	31 ff                	xor    %edi,%edi
    1228:	0f 05                	syscall
    122a:	0f 0b                	ud2
The mov and syscall instructions are still the same, but it noticed it can XOR the rdi register with itself to set it to zero. It s using x86 assembly language (the 32 bit variant of x86_64, that also happens to work on x86_64) to do so, that s why the register is refered to as edi in the disassembly. You can also see it s inserting a bunch of 0xCC instructions (for alignment) and Rust puts the opcodes 0x0F 0x0B at the end of the function to force an invalid opcode exception so the program is guaranteed to crash in case the exit syscall doesn t do it. This code still executes as expected:
$ strace -f ./hack-the-planet
execve("./hack-the-planet", ["./hack-the-planet"], 0x72dae7e5dc08 /* 39 vars */) = 0
exit(0)                                 = ?
+++ exited with 0 +++

Adding functions Ok we re getting closer but we aren t quite there yet. Let s try to write an exit function for our assembly that we can then call like a normal function. Remember that it takes a signed 32 bit integer that s supposed to go into rdi.
unsafe fn exit(status: i32) -> !  
    asm!(
        "syscall",
        in("rax") 60,
        in("rdi") status,
        options(noreturn)
    );
 
Actually, since this function doesn t take any raw pointers and any i32 is valid for this syscall we re going to remove the unsafe marker of this function. When doing this we still need to use unsafe within the function for our inline assembly.
fn exit(status: i32) -> !  
    unsafe  
        asm!(
            "syscall",
            in("rax") 60,
            in("rdi") status,
            options(noreturn)
        );
     
 
Let s call this function from our main, and also remove the infinity loop of the panic handler with a call to exit(1):
#![no_std]
#![no_main]
use core::arch::asm;
use core::arch::global_asm;
use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> !  
    exit(1);
 
global_asm!  
    ".global _start",
    "_start:",
    "mov rdi, rsp",
    "call main"
 
fn exit(status: i32) -> !  
    unsafe  
        asm!(
            "syscall",
            in("rax") 60,
            in("rdi") status,
            options(noreturn)
        );
     
 
#[no_mangle]
unsafe fn main(_stack_top: *const u8) -> !  
    exit(0);
 
Running this still works, but interestingly the generated assembly didn t change at all:
$ cargo build --release --target x86_64-unknown-none
$ objdump -d target/x86_64-unknown-none/release/hack-the-planet
target/x86_64-unknown-none/release/hack-the-planet:     file format elf64-x86-64
Disassembly of section .text:
0000000000001210 <_start>:
    1210:	48 89 e7             	mov    %rsp,%rdi
    1213:	e8 08 00 00 00       	call   1220 <main>
    1218:	cc                   	int3
    1219:	cc                   	int3
    121a:	cc                   	int3
    121b:	cc                   	int3
    121c:	cc                   	int3
    121d:	cc                   	int3
    121e:	cc                   	int3
    121f:	cc                   	int3
0000000000001220 <main>:
    1220:	50                   	push   %rax
    1221:	b8 3c 00 00 00       	mov    $0x3c,%eax
    1226:	31 ff                	xor    %edi,%edi
    1228:	0f 05                	syscall
    122a:	0f 0b                	ud2
Rust noticed there s no need to make it a separate function at runtime and instead merged the instructions of the exit function directly into our main. It also noticed the 0 argument in exit(0) means rdi is supposed to be zero and uses the XOR optimization mentioned before. Since main is not calling any unsafe functions anymore we could mark it as safe too, but in the next few functions we re going to deal with file descriptors and raw pointers, so this is likely the only safe function we re going to write in this tutorial so let s just keep the unsafe marker.

Printing text Ok let s try to do a quick hello world, to do this we re going to call the write syscall. Looking it up with man 2 write:
ssize_t write(int fd, const void buf[.count], size_t count);
The write syscall takes 3 arguments and returns a signed size_t. In Rust this is called isize. In C size_t is an unsigned integer type that can hold any value of sizeof(...) for the given platform, ssize_t can only store half of that because it uses one of the bits to indicate an error has occured (the first s means signed, write returns -1 in case of an error). The arguments for write are:
  • the file descriptor to write to. stdout is located on file descriptor 1.
  • a pointer/address to some memory.
  • the number of bytes that should be written, starting at the given address.
Let s also lookup the syscall number of write:
% rg __NR_write /usr/include/asm
/usr/include/asm/unistd_64.h
5:#define __NR_write 1
24:#define __NR_writev 20
/usr/include/asm/unistd_32.h
8:#define __NR_write 4
150:#define __NR_writev 146
/usr/include/asm/unistd_x32.h
5:#define __NR_write (__X32_SYSCALL_BIT + 1)
323:#define __NR_writev (__X32_SYSCALL_BIT + 516)
The value we re looking for is 1. Let s write our write function (heh).
unsafe fn write(fd: i32, buf: *const u8, count: usize) -> isize  
    let r0;
    asm!(
        "syscall",
        inlateout("rax") 1 => r0,
        in("rdi") fd,
        in("rsi") buf,
        in("rdx") count,
        lateout("rcx") _,
        lateout("r11") _,
        options(nostack, preserves_flags)
    );
    r0
 
Now that s a lot of stuff at once. Since this syscall is actually going to hand execution back to our program we need to let Rust know which cpu registers the syscall is writing to, so Rust doesn t attempt to use them to store data (that would be silently overwritten by the syscall). inlateout("raw") 1 => r0 means we re writing a value to the register and want the result back in variable r0. in("rdi") fd means we want to write the value of fd into the rdi register. lateout("rcx") _ means the Linux kernel may write to that register (so the previous value may get lost), but we don t want to store the value anywhere (the underscore acts as a dummy variable name). This doesn t compile just yet though
$ cargo build --release --target x86_64-unknown-none
   Compiling hack-the-planet v0.1.0 (/hack-the-planet)
error: incompatible types for asm inout argument
  --> src/main.rs:35:26
    
35           inlateout("rax") 1 => r0,
                              ^    ^^ type  isize 
                               
                              type  i32 
    
   = note: asm inout arguments must have the same type, unless they are both pointers or integers of the same size
error: could not compile  hack-the-planet  due to previous error
Rust has inferred the type of r0 is isize since that s what our function returns, but the type of the input value for the register was inferred to be i32. We re going to select a specific number type to fix this.
unsafe fn write(fd: i32, buf: *const u8, count: usize) -> isize  
    let r0;
    asm!(
        "syscall",
        inlateout("rax") 1isize => r0,
        in("rdi") fd,
        in("rsi") buf,
        in("rdx") count,
        lateout("rcx") _,
        lateout("r11") _,
        options(nostack, preserves_flags)
    );
    r0
 
We can now call our new write function like this:
write(1, b"Hello world\n".as_ptr(), 12);
We need to set the number of bytes we want to write explicitly because there s no concept of null-byte termination in the write system call, it s quite literally write the next X bytes, starting from this address . Our program now looks like this:
#![no_std]
#![no_main]
use core::arch::asm;
use core::arch::global_asm;
use core::panic::PanicInfo;
#[panic_handler]
fn panic(_info: &PanicInfo) -> !  
    exit(1);
 
global_asm!  
    ".global _start",
    "_start:",
    "mov rdi, rsp",
    "call main"
 
fn exit(status: i32) -> !  
    unsafe  
        asm!(
            "syscall",
            in("rax") 60,
            in("rdi") status,
            options(noreturn)
        );
     
 
unsafe fn write(fd: i32, buf: *const u8, count: usize) -> isize  
    let r0;
    asm!(
        "syscall",
        inlateout("rax") 1isize => r0,
        in("rdi") fd,
        in("rsi") buf,
        in("rdx") count,
        lateout("rcx") _,
        lateout("r11") _,
        options(nostack, preserves_flags)
    );
    r0
 
#[no_mangle]
unsafe fn main(_stack_top: *const u8) -> !  
    write(1, b"Hello world\n".as_ptr(), 12);
    exit(0);
 
Let s try to build and disassemble it:
$ cargo build --release --target x86_64-unknown-none
$ objdump -d target/x86_64-unknown-none/release/hack-the-planet
target/x86_64-unknown-none/release/hack-the-planet:     file format elf64-x86-64
Disassembly of section .text:
0000000000001220 <_start>:
    1220:	48 89 e7             	mov    %rsp,%rdi
    1223:	e8 08 00 00 00       	call   1230 <main>
    1228:	cc                   	int3
    1229:	cc                   	int3
    122a:	cc                   	int3
    122b:	cc                   	int3
    122c:	cc                   	int3
    122d:	cc                   	int3
    122e:	cc                   	int3
    122f:	cc                   	int3
0000000000001230 <main>:
    1230:	50                   	push   %rax
    1231:	48 8d 35 d5 ef ff ff 	lea    -0x102b(%rip),%rsi        # 20d <_start-0x1013>
    1238:	b8 01 00 00 00       	mov    $0x1,%eax
    123d:	ba 0c 00 00 00       	mov    $0xc,%edx
    1242:	bf 01 00 00 00       	mov    $0x1,%edi
    1247:	0f 05                	syscall
    1249:	b8 3c 00 00 00       	mov    $0x3c,%eax
    124e:	31 ff                	xor    %edi,%edi
    1250:	0f 05                	syscall
    1252:	0f 0b                	ud2
This time there are 2 syscalls, first write, then exit. For write it s setting up the 3 arguments in our cpu registers (rdi, rsi, rdx). The lea instruction subtracts 0x102b from the rip register (the instruction pointer) and places the result in the rsi register. This is effectively saying an address relative to wherever this code was loaded into memory . The instruction pointer is going to point directly behind the opcodes of the lea instruction, so 0x1238 - 0x102b = 0x20d. This address is also pointed out in the disassembly as a comment. We don t see the string in our disassembly but we can convert our 0x20d hex to 525 in decimal and use dd to read 12 bytes from that offset, and sure enough:
$ dd bs=1 skip=525 count=12 if=target/x86_64-unknown-none/release/hack-the-planet
Hello world
12+0 records in
12+0 records out
Execute our binary with strace also shows the new write syscall (and the bytes that are being written mixed up in the output).
$ strace -f ./hack-the-planet
execve("./hack-the-planet", ["./hack-the-planet"], 0x74493abe64a8 /* 39 vars */) = 0
write(1, "Hello world\n", 12Hello world
)           = 12
exit(0)                                 = ?
+++ exited with 0 +++
After running strip on it to remove some symbols the binary is so small, if you open it in a text editor it fits on a screenshot:

25 March 2023

Russ Allbery: Review: Thief of Time

Review: Thief of Time, by Terry Pratchett
Series: Discworld #26
Publisher: Harper
Copyright: May 2001
Printing: August 2014
ISBN: 0-06-230739-8
Format: Mass market
Pages: 420
Thief of Time is the 26th Discworld novel and the last Death novel, although he still appears in subsequent books. It's the third book starring Susan Sto Helit, so I don't recommend starting here. Mort is the best starting point for the Death subseries, and Reaper Man provides a useful introduction to the villains. Jeremy Clockson was an orphan raised by the Guild of Clockmakers. He is very good at making clocks. He's not very good at anything else, particularly people, but his clocks are the most accurate in Ankh-Morpork. He is therefore the logical choice to receive a commission by a mysterious noblewoman who wants him to make the most accurate possible clock: a clock that can measure the tick of the universe, one that a fairy tale says had been nearly made before. The commission is followed by a surprise delivery of an Igor, to help with the clock-making. People who live in places with lots of fields become farmers. People who live where there is lots of iron and coal become blacksmiths. And people who live in the mountains near the Hub, near the gods and full of magic, become monks. In the highest valley are the History Monks, founded by Wen the Eternally Surprised. Like most monks, they take apprentices with certain talents and train them in their discipline. But Lobsang Ludd, an orphan discovered in the Thieves Guild in Ankh-Morpork, is proving a challenge. The monks decide to apprentice him to Lu-Tze the sweeper; perhaps that will solve multiple problems at once. Since Hogfather, Susan has moved from being a governess to a schoolteacher. She brings to that job the same firm patience, total disregard for rules that apply to other people, and impressive talent for managing children. She is by far the most popular teacher among the kids, and not only because she transports her class all over the Disc so that they can see things in person. It is a job that she likes and understands, and one that she's quite irate to have interrupted by a summons from her grandfather. But the Auditors are up to something, and Susan may be able to act in ways that Death cannot. This was great. Susan has quickly become one of my favorite Discworld characters, and this time around there is no (or, well, not much) unbelievable romance or permanently queasy god to distract. The clock-making portions of the book quickly start to focus on Igor, who is a delightful perspective through whom to watch events unfold. And the History Monks! The metaphysics of what they are actually doing (which I won't spoil, since discovering it slowly is a delight) is perhaps my favorite bit of Discworld world building to date. I am a sucker for stories that focus on some process that everyone thinks happens automatically and investigate the hidden work behind it. I do want to add a caveat here that the monks are in part a parody of Himalayan Buddhist monasteries, Lu-Tze is rather obviously a parody of Laozi and Daoism in general, and Pratchett's parodies of non-western cultures are rather ham-handed. This is not quite the insulting mess that the Chinese parody in Interesting Times was, but it's heavy on the stereotypes. It does not, thankfully, rely on the stereotypes; the characters are great fun on their own terms, with the perfect (for me) balance of irreverence and thoughtfulness. Lu-Tze refusing to be anything other than a sweeper and being irritatingly casual about all the rules of the order is a classic bit that Pratchett does very well. But I also have the luxury of ignoring stereotypes of a culture that isn't mine, and I think Pratchett is on somewhat thin ice. As one specific example, having Lu-Tze's treasured sayings be a collection of banal aphorisms from a random Ankh-Morpork woman is both hilarious and also arguably rather condescending, and I'm not sure where I landed. It's a spot-on bit of parody of how a lot of people who get very into "eastern religions" sound, but it's also equating the Dao De Jing with advice from the Discworld equivalent of a English housewife. I think the generous reading is that Lu-Tze made the homilies profound by looking at them in an entirely different way than the woman saying them, and that's not completely unlike Daoism and works surprisingly well. But that's reading somewhat against the grain; Pratchett is clearly making fun of philosophical koans, and while anything is fair game for some friendly poking, it still feels a bit weird. That isn't the part of the History Monks that I loved, though. Their actual role in the story doesn't come out of the parody. It's something entirely native to Discworld, and it's an absolute delight. The scene with Lobsang and the procrastinators is perhaps my favorite Discworld set piece to date. Everything about the technology of the History Monks, even the Bond parody, is so good. I grew up reading the Marvel Comics universe, and Thief of Time reminds me of a classic John Byrne or Jim Starlin story, where the heroes are dumped into the middle of vast interdimensional conflicts involving barely-anthropomorphized cosmic powers and the universe is revealed to work in ever more intricate ways at vastly expanding scales. The Auditors are villains in exactly that tradition, and just like the best of those stories, the fulcrum of the plot is questions about what it means to be human, what it means to be alive, and the surprising alliances these non-human powers make with humans or semi-humans. I devoured this kind of story as a kid, and it turns out I still love it. The one complaint I have about the plot is that the best part of this book is the middle, and the end didn't entirely work for me. Ronnie Soak is at his best as a supporting character about three quarters of the way through the book, and I found the ending of his subplot much less interesting. The cosmic confrontation was oddly disappointing, and there's a whole extended sequence involving chocolate that I think was funnier in Pratchett's head than it was in mine. The ending isn't bad, but the middle of this book is my favorite bit of Discworld writing yet, and I wish the story had carried that momentum through to the end. I had so much fun with this book. The Discworld novels are clearly getting better. None of them have yet vaulted into the ranks of my all-time favorite books there's always some lingering quibble or sagging bit but it feels like they've gone from reliably good books to more reliably great books. The acid test is coming, though: the next book is a Rincewind book, which are usually the weak spots. Followed by The Last Hero in publication order. There is no direct thematic sequel. Rating: 8 out of 10

24 March 2023

Matthew Garrett: We need better support for SSH host certificates

Github accidentally committed their SSH RSA private key to a repository, and now a bunch of people's infrastructure is broken because it needs to be updated to trust the new key. This is obviously bad, but what's frustrating is that there's no inherent need for it to be - almost all the technological components needed to both reduce the initial risk and to make the transition seamless already exist.

But first, let's talk about what actually happened here. You're probably used to the idea of TLS certificates from using browsers. Every website that supports TLS has an asymmetric pair of keys divided into a public key and a private key. When you contact the website, it gives you a certificate that contains the public key, and your browser then performs a series of cryptographic operations against it to (a) verify that the remote site possesses the private key (which prevents someone just copying the certificate to another system and pretending to be the legitimate site), and (b) generate an ephemeral encryption key that's used to actually encrypt the traffic between your browser and the site. But what stops an attacker from simply giving you a fake certificate that contains their public key? The certificate is itself signed by a certificate authority (CA), and your browser is configured to trust a preconfigured set of CAs. CAs will not give someone a signed certificate unless they prove they have legitimate ownership of the site in question, so (in theory) an attacker will never be able to obtain a fake certificate for a legitimate site.

This infrastructure is used for pretty much every protocol that can use TLS, including things like SMTP and IMAP. But SSH doesn't use TLS, and doesn't participate in any of this infrastructure. Instead, SSH tends to take a "Trust on First Use" (TOFU) model - the first time you ssh into a server, you receive a prompt asking you whether you trust its public key, and then you probably hit the "Yes" button and get on with your life. This works fine up until the point where the key changes, and SSH suddenly starts complaining that there's a mismatch and something awful could be happening (like someone intercepting your traffic and directing it to their own server with their own keys). Users are then supposed to verify whether this change is legitimate, and if so remove the old keys and add the new ones. This is tedious and risks users just saying "Yes" again, and if it happens too often an attacker can simply redirect target users to their own server and through sheer fatigue at dealing with this crap the user will probably trust the malicious server.

Why not certificates? OpenSSH actually does support certificates, but not in the way you might expect. There's a custom format that's significantly less complicated than the X509 certificate format used in TLS. Basically, an SSH certificate just contains a public key, a list of hostnames it's good for, and a signature from a CA. There's no pre-existing set of trusted CAs, so anyone could generate a certificate that claims it's valid for, say, github.com. This isn't really a problem, though, because right now nothing pays attention to SSH host certificates unless there's some manual configuration.

(It's actually possible to glue the general PKI infrastructure into SSH certificates. Please do not do this)

So let's look at what happened in the Github case. The first question is "How could the private key have been somewhere that could be committed to a repository in the first place?". I have no unique insight into what happened at Github, so this is conjecture, but I'm reasonably confident in it. Github deals with a large number of transactions per second. Github.com is not a single computer - it's a large number of machines. All of those need to have access to the same private key, because otherwise git would complain that the private key had changed whenever it connected to a machine with a different private key (the alternative would be to use a different IP address for every frontend server, but that would instead force users to repeatedly accept additional keys every time they connect to a new IP address). Something needs to be responsible for deploying that private key to new systems as they're brought up, which means there's ample opportunity for it to accidentally end up in the wrong place.

Now, best practices suggest that this should be avoided by simply placing the private key in a hardware module that performs the cryptographic operations, ensuring that nobody can ever get at the private key. The problem faced here is that HSMs typically aren't going to be fast enough to handle the number of requests per second that Github deals with. This can be avoided by using something like a Nitro Enclave, but you're still going to need a bunch of these in different geographic locales because otherwise your front ends are still going to be limited by the need to talk to an enclave on the other side of the planet, and now you're still having to deal with distributing the private key to a bunch of systems.

What if we could have the best of both worlds - the performance of private keys that just happily live on the servers, and the security of private keys that live in HSMs? Unsurprisingly, we can! The SSH private key could be deployed to every front end server, but every minute it could call out to an HSM-backed service and request a new SSH host certificate signed by a private key in the HSM. If clients are configured to trust the key that's signing the certificates, then it doesn't matter what the private key on the servers is - the client will see that there's a valid certificate and will trust the key, even if it changes. Restricting the validity of the certificate to a small window of time means that if a key is compromised an attacker can't do much with it - the moment you become aware of that you stop signing new certificates, and once all the existing ones expire the old private key becomes useless. You roll out a new private key with new certificates signed by the same CA and clients just carry on trusting it without any manual involvement.

Why don't we have this already? The main problem is that client tooling just doesn't handle this well. OpenSSH has no way to do TOFU for CAs, just the keys themselves. This means there's no way to do a git clone ssh://git@github.com/whatever and get a prompt asking you to trust Github's CA. Instead, you need to add a @cert-authority github.com (key) line to your known_hosts file by hand, and since approximately nobody's going to do that there's only marginal benefit in going to the effort to implement this infrastructure. The most important thing we can do to improve the security of the SSH ecosystem is to make it easier to use certificates, and that means improving the behaviour of the clients.

It should be noted that certificates aren't the only approach to handling key migration. OpenSSH supports a protocol for key rotation, basically by allowing the server to provide a set of multiple trusted keys that the client can cache, and then invalidating old ones. Unfortunately this still requires that the "new" private keys be deployed in the same way as the old ones, so any screwup that results in one private key being leaked may well also result in the additional keys being leaked. I prefer the certificate approach.

Finally, I've seen a couple of people imply that the blame here should be attached to whoever or whatever caused the private key to be committed to a repository in the first place. This is a terrible take. Humans will make mistakes, and your systems should be resilient against that. There's no individual at fault here - there's a series of design decisions that made it possible for a bad outcome to occur, and in a better universe they wouldn't have been necessary. Let's work on building that better universe.

comment count unavailable comments

22 March 2023

Michael Prokop: Automatically unlocking a LUKS encrypted root filesystem during boot

Update on 2023-03-23: thanks to Daniel Roschka for mentioning the Mandos and TPM approaches, which might be better alternatives, depending on your options and needs. Peter Palfrader furthermore pointed me towards clevis-initramfs and tang. A customer of mine runs dedicated servers inside a foreign data-center, remote hands only. In such an environment you might need a disk replacement because you need bigger or faster disks, though also a disk might (start to) fail and you need a replacement. One has to be prepared for such a scenario, but fully wiping your used disk then might not always be an option, especially once disks (start to) fail. On the other hand you don t want to end up with (partial) data on your disk handed over to someone unexpected. By encrypting the data on your disks upfront you can prevent against this scenario. But if you have a fleet of servers you might not want to manually jump on servers during boot and unlock crypto volumes manually. It s especially annoying if it s about the root filesystem where a solution like dropbear-initramfs needs to be used for remote access during initramfs boot stage. So my task for the customer was to adjust encrypted LUKS devices such that no one needs to manually unlock the encrypted device during server boot (with some specific assumptions about possible attack vectors one has to live with, see the disclaimer at the end). The documentation about this use-case was rather inconsistent, especially because special rules apply for the root filesystem (no key file usage), we see different behavior between what s supported by systemd (hello key file again), initramfs-tools and dracut, not to mention the changes between different distributions. Since tests with this tend to be rather annoying (better make sure to have a Grml live system available :)), I m hereby documenting what worked for us (Debian/bullseye with initramfs-tools and cryptsetup-initramfs). The system was installed with LVM on-top of an encrypted Software-RAID device, only the /boot partition is unencrypted. But even if you don t use Software-RAID nor LVM the same instructions apply. The system looks like this:
% mount -t ext4 -l
/dev/mapper/foobar-root_1 on / type ext4 (rw,relatime,errors=remount-ro)
% sudo pvs
  PV                    VG     Fmt  Attr PSize   PFree
  /dev/mapper/md1_crypt foobar lvm2 a--  445.95g 430.12g
% sudo vgs
  VG     #PV #LV #SN Attr   VSize   VFree
  foobar   1   2   0 wz--n- 445.95g 430.12g
% sudo lvs
  LV     VG     Attr       LSize   Pool Origin Data%  Meta%  Move Log Cpy%Sync Convert
  root_1 foobar -wi-ao---- <14.90g
% lsblk
NAME                  MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
[...]
sdd                     8:48   0 447.1G  0 disk
 sdd1                  8:49   0   571M  0 part  /boot/efi
 sdd2                  8:50   0   488M  0 part
   md0                 9:0    0   487M  0 raid1 /boot
 sdd3                  8:51   0 446.1G  0 part
   md1                 9:1    0   446G  0 raid1
     md1_crypt       253:0    0   446G  0 crypt
       foobar-root_1 253:1    0  14.9G  0 lvm   /
[...]
sdf                     8:80   0 447.1G  0 disk
 sdf1                  8:81   0   571M  0 part
 sdf2                  8:82   0   488M  0 part
   md0                 9:0    0   487M  0 raid1 /boot
 sdf3                  8:83   0 446.1G  0 part
   md1                 9:1    0   446G  0 raid1
     md1_crypt       253:0    0   446G  0 crypt
       foobar-root_1 253:1    0  14.9G  0 lvm   /
The actual crypsetup configuration is:
% cat /etc/crypttab
md1_crypt UUID=77246138-b666-4151-b01c-5a12db54b28b none luks,discard
Now, to automatically open the crypto device during boot we can instead use:
% cat /etc/crypttab 
md1_crypt UUID=77246138-b666-4151-b01c-5a12db54b28b none luks,discard,keyscript=/etc/initramfs-tools/unlock.sh
# touch /etc/initramfs-tools/unlock.sh
# chmod 0700 /etc/initramfs-tools/unlock.sh
# $EDITOR etc/initramfs-tools/unlock.sh
# cat /etc/initramfs-tools/unlock.sh
#!/bin/sh
echo -n "provide_the_actual_password_here"
# update-initramfs -k all -u
[...]
The server will then boot without prompting for a crypto password. Note that initramfs-tools by default uses an insecure umask of 0022, resulting in the initrd being accessible to everyone. But if you have the dropbear-initramfs package installed, its /usr/share/initramfs-tools/conf-hooks.d/dropbear sets UMASK=0077 , so the resulting /boot/initrd* file should automatically have proper permissions (0600). The cryptsetup hook warns about a permissive umask configuration during update-initramfs runs, but if you want to be sure, explicitly set it via e.g.:
# cat > /etc/initramfs-tools/conf.d/umask << EOF
# restrictive umask to avoid non-root access to initrd:
UMASK=0077
EOF
# update-initramfs -k all -u
Disclaimer: Of course you need to trust users with access to /etc/initramfs-tools/unlock.sh as well as the initramfs/initrd on your system. Furthermore you should wipe the boot partition (to destroy the keyfile information) before handing over such a disk. But that is a risk my customer can live with, YMMV.

15 March 2023

Freexian Collaborators: Debian Contributions: Core python package, Redmine backports, and more! (by Utkarsh Gupta, Stefano Rivera)

Contributing to Debian is part of Freexian s mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

Core Python Packages, by Stefano Rivera Just before the freeze, pip added support for PEP-668. This is a scheme devised by Debian with other distributions and the Python Packaging Authority, to allow distributors to mark Python installations as being managed by a distribution package manager. When this EXTERNALLY-MANAGED flag is present, installers like pip will refuse to install packages outside a virtual environment. This protects users from breaking unrelated software on their systems, when installing packages with pip or similar tools. Stefano quickly got this version of pip into the archive, marked Debian s Python interpreters as EXTERNALLY-MANAGED, and worked with the upstream to add a mechanism to allow users to override the restriction. Debian bookworm will likely be the first distro release to implement this change. The transition from Python 3.10 to 3.11 was one of the last to complete before the bookworm freeze (as 3.11 only released at the end of October 2022). Stefano helped port some Python packages to 3.11, in January, and also kicked off the final transition to remove Python 3.10 support. Stefano did a big round of bug triage in the cPython interpreter (and related) packages, applying some provided patches, and fixing some long-standing minor bugs in the packaging. To allow Debian packages to more accurately reflect upstream-specified dependencies that only apply under specific Python interpreter versions, in the future, Stefano added more metadata to the python3 binary package. Python s unittest runner would successfully exit with 0 passed tests, if it couldn t find any tests. This means that configuration / layout changes can cause test failures to go unnoticed, because the tests aren t being run any more in Debian packages. Stefano proposed a change to Python 3.12 to change this behavior and treat 0 tests as a kind of failure.

debvm, by Helmut Grohne With support from Johannes Schauer Marin Rodrigues, and Jochen Sprickerhof, Helmut Grohne wrote debvm, a tool for quickly creating and running Debian virtual machine images for various architectures and Debian and Ubuntu releases. This is meant for development and testing purposes and has already identified a number of bugs in e.g. fakechroot (#1029490), Linux (#1029270), and runit (#1028181).

Rails 6 and Redmine 5 available in bullseye-backports, by Utkarsh Gupta Bullseye users can now upgrade to the latest 6.1 branch of Rails, v6.1.7, and the latest Redmine version, v5.0.4. The Ruby team received numerous requests to backport the latest version of Rails and Redmine, especially since there was no redmine shipped in the bullseye release itself. So this is big news for all users as we ve not only successfully backported both the packages, but also fixed all the CVEs and RC bugs in the process! This work was sponsored by Entrouvert.

Patches metadata in the Package Tracker, by Rapha l Hertzog Building on the great Ultimate Debian Database work of Lucas Nussbaum and on his suggestion, Rapha l enhanced the Debian Package Tracker to display action items when the patches metadata indicate that some patches were not forwarded upstream, or when the metadata were invalid. One can now also browse the patches metadata from the Links panel on the right.

Fixed kernel bug that broke debian-installer on computers with Mediatek wifi devices, by Helmut Grohne As part of our regular work on Kali Linux for OffSec, they funded Helmut s work to fix the MT7921e driver. When being loaded without firmware available, it would not register itself, but upon module release it would unregister itself causing a kernel oops. This was commonly observed in Kali Linux when reloading the module to add firmware. Helmut Grohne identified the cause and sent a patch, a different variant of which is now heading into Linux and available from Kali Linux.

Printing in Debian, by Thorsten Alteholz There are about 40 packages in Debian that take care of sending output to printers, scan documents, or even send documents to fax machines. In the light of the upcoming/already ongoing freeze, these packages had to be updated to the latest version and bugs had to be fixed. Basically this applies to large packages like cups, cups-filters, hplip but also the smaller ones that shouldn t be neglected. All in all Thorsten uploaded 13 packages with new upstream versions or improved packaging and could resolve 14 bugs. Further triaging led to 35 bugs that could be closed, either because they were already fixed and not closed in an earlier upload or they could not be reproduced with current software versions. There is also work to do to prepare for the future. Historically, printing on Linux required finding a PPD file for your printer and finding some software that is able to render your documents with this PPD. These days, driverless printing is becoming more common and the use of PPD files has decreased. In the upcoming version 3.0 of cups, PPD files are no longer supported and so called printer applications need to be used. In order not to lose the ability to print documents, this big transition needs to be carefully planned. This started in the beginning of 2023 and will hopefully be finished with the release of Debian Trixie. More information can be found in this Debian Printing Wiki article. In preparation for this transition Thorsten created three new packages.

Yade update, by Anton Gladky Last month, Anton updated the yade package to the newest 2023.02a version, which includes new features. Yade is a software package for discrete element method (DEM) simulations, which are widely used in scientific and engineering fields for the simulation of granular systems. Yade is an open-source project that is being used worldwide for different tasks, such as geomechanics, civil engineering, mining, and materials science. The Yade package in Debian supports different precision levels for its simulations. This means that researchers and engineers can select the needed precision level without recompiling the package, saving time and effort.

Miscellaneous contributions
  • Helmut Grohne continues to improve cross building (mostly Qt) and architecture bootstrap (mostly loong64 and musl).

Next.

Previous.