In my home lab(s), I have a handful of machines spread around a few
points of presence, with mostly residential/commercial cable/DSL
uplinks, which means, generally, NAT. This makes monitoring those
devices kind of impossible. While I do punch holes for SSH, using jump
hosts gets old quick, so I'm considering adding a virtual private
network (a "VPN", not a VPN service) so that all machines can be
reachable from everywhere.
I see three ways this can work:
- a home-made Wireguard VPN, deployed with Puppet
- a Wireguard VPN overlay, with Tailscale or equivalent
- IPv6, native or with tunnels
So which one will it be?
Wireguard Puppet modules
As is (unfortunately) typical with Puppet, I found multiple different
modules to talk with Wireguard.
| module | score | downloads | release | stars | watch | forks | license | docs | contrib | issue | PR | notes |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| halyard | 3.1 | 1,807 | 2022-10-14 | 0 | 0 | 0 | MIT | no | | | | requires firewall and Configvault_Write modules? |
| voxpupuli | 5.0 | 4,201 | 2022-10-01 | 2 | 23 | 7 | AGPLv3 | good | 1/9 | 1/4 | 1/61 | optionally configures ferm, uses systemd-networkd, recommends systemd module with manage_systemd to true, purges unknown keys |
| abaranov | 4.7 | 17,017 | 2021-08-20 | 9 | 3 | 38 | MIT | okay | 1/17 | 4/7 | 4/28 | requires pre-generated private keys |
| arrnorets | 3.1 | 16,646 | 2020-12-28 | 1 | 2 | 1 | Apache-2 | okay | 1 | 0 | 0 | requires pre-generated private keys? |
The voxpupuli module seems to be the most promising. The abaranov
module is more popular and has more contributors, but it has more open
issues and PRs.
More critically, the voxpupuli module was written after the abaranov
author didn't respond to a PR from the voxpupuli author trying to add
more automation (namely private key management).
It looks like setting up a wireguard network would be as simple as
this on node A:
    wireguard::interface { 'wg0':
      source_addresses => ['2003:4f8:c17:4cf::1', '149.9.255.4'],
      public_key       => $facts['wireguard_pubkeys']['nodeB'],
      endpoint         => 'nodeB.example.com:53668',
      addresses        => [{'Address' => '192.168.123.6/30'}, {'Address' => 'fe80::beef:1/64'}],
    }
This configuration comes from this pull request I sent to the module
to document how to use that fact.
Note that the addresses used here are examples that shouldn't be
reused and do not conform to RFC5737 ("IPv4 Address Blocks Reserved
for Documentation", 192.0.2.0/24 (TEST-NET-1), 198.51.100.0/24
(TEST-NET-2), and 203.0.113.0/24 (TEST-NET-3)) or RFC3849 ("IPv6
Address Prefix Reserved for Documentation", 2001:DB8::/32), but that's
another story.
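A quick way to check whether an address falls in one of those documentation ranges is Python's standard `ipaddress` module. This is only a sketch: the `is_documentation` helper name is made up for this illustration, and only the prefixes listed above are considered.

```python
import ipaddress

# Documentation prefixes from RFC 5737 (IPv4) and RFC 3849 (IPv6).
DOC_NETWORKS = [
    ipaddress.ip_network(prefix)
    for prefix in ("192.0.2.0/24", "198.51.100.0/24", "203.0.113.0/24", "2001:db8::/32")
]


def is_documentation(address: str) -> bool:
    """Return True if the address sits inside a documentation prefix."""
    # Accept "address/prefixlen" notation by keeping only the address part.
    ip = ipaddress.ip_interface(address).ip
    return any(ip.version == net.version and ip in net for net in DOC_NETWORKS)


# The addresses from the Puppet example above, plus one that does conform.
for addr in ("2003:4f8:c17:4cf::1", "149.9.255.4", "192.168.123.6/30", "2001:db8::1"):
    print(addr, is_documentation(addr))
```

Only the last address is reported as a documentation address; the three taken from the example are not.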
(To avoid bootstrapping problems, the resubmit-facts configuration
could be used so that other nodes' facts are more immediately
available.)
One problem with the above approach is that you explicitly need to
take care of routing, network topology, and addressing. This can get
complicated quickly, especially if you have lots of devices, behind
NAT, in multiple locations (which is basically my life at home,
unfortunately).
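To get a feel for how quickly this grows, here is a rough sketch that allocates one /30 per pair of nodes for a full mesh of point-to-point tunnels. The node names and the 192.168.123.0/24 pool are placeholders, the `plan_full_mesh` helper is hypothetical, and a full mesh is only one possible topology.

```python
import ipaddress
from itertools import combinations


def plan_full_mesh(nodes, pool="192.168.123.0/24"):
    """Assign one /30 to each pair of nodes in a full mesh."""
    pairs = list(combinations(sorted(nodes), 2))
    subnets = list(ipaddress.ip_network(pool).subnets(new_prefix=30))
    if len(pairs) > len(subnets):
        raise ValueError(f"{len(pairs)} tunnels do not fit in {pool}")
    plan = {}
    for (a, b), subnet in zip(pairs, subnets):
        # A /30 has exactly two usable host addresses, one per tunnel end.
        first, second = subnet.hosts()
        plan[(a, b)] = {a: f"{first}/30", b: f"{second}/30"}
    return plan


for pair, ends in plan_full_mesh(["home", "office", "colo"]).items():
    print(pair, ends)
```

Three nodes need three tunnels, ten nodes need 45, and a /24 pool runs out at 64 tunnels, which is where the manual approach starts to hurt.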
Concretely, basic Wireguard only supports one peer behind NAT. There
are some workarounds for this, but they generally imply a relay server
of some sort or some custom registry; it's kind of a mess. And this is
where overlay networks like Tailscale come in.
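The constraint can be sketched as a simple rule: a pair of peers can only set up a plain Wireguard tunnel if at least one of them has a reachable endpoint. The helper below applies that rule to a hypothetical inventory (the node names and the boolean flag are invented for the example) and lists the pairs that would need a relay or some other workaround.

```python
from itertools import combinations


def pairs_needing_relay(nodes):
    """List pairs where both peers are behind NAT.

    `nodes` maps a node name to True when it has a reachable endpoint
    (public address or forwarded port) and False when it sits behind NAT.
    """
    return [
        (a, b)
        for a, b in combinations(sorted(nodes), 2)
        if not nodes[a] and not nodes[b]
    ]


inventory = {"colo": True, "home": False, "office": False, "laptop": False}
print(pairs_needing_relay(inventory))
```

With a single reachable node out of four, half of the six possible pairs cannot talk directly.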
Tailscale
Tailscale is basically designed to deal with this problem. It's
not fully opensource, but pretty close, and they have an interesting
philosophy behind that. The client is opensource, and there
is an opensource version of the server side, called
headscale. They have recently (late 2022) hired the main headscale
developer while promising to keep supporting it, which is pretty
amazing.
Tailscale provides an overlay network based on Wireguard, where each
peer basically has a peer-to-peer encrypted connection, with automatic
key rotation. They also ship a multitude of applications and features
on top of that like file sharing, keyless SSH access, and so on. The
authentication layer is based on an existing SSO provider: you don't
just register with Tailscale with a new account, you log in with
Google, Microsoft, or GitHub (which, really, is still Microsoft).
The Headscale server supports many of those features:
- Full "base" support of Tailscale's features
- Configurable DNS
- Split DNS
- MagicDNS (each user gets a name)
- Node registration
- Single-Sign-On (via Open ID Connect)
- Pre authenticated key
- Taildrop (File Sharing)
- Access control lists
- Support for multiple IP ranges in the tailnet
- Dual stack (IPv4 and IPv6)
- Routing advertising (including exit nodes)
- Ephemeral nodes
- Embedded DERP server (AKA NAT-to-NAT traversal)
Neither project (client or server) is in Debian (RFP 972439 for
the client, none filed yet for the server), which makes deploying this
for my use case rather problematic. Their install instructions are
basically a "curl | bash", but they also provide packages for
various platforms. Their Debian install instructions are
surprisingly good, and check most of the third party checklist
we're trying to establish. (It's missing a pin.)
There's also a Puppet module for tailscale, naturally.
What I find a little disturbing with Tailscale is that you not only
need to trust Tailscale with authorizing your devices, you also
basically delegate that trust to the SSO provider. So, in my
case, GitHub (or anyone who compromises my account there) can
penetrate the VPN. A little scary.
Tailscale is also kind of an "all or nothing" thing. They have
MagicDNS, file transfers, all sorts of things, but those things
require you to hook up your resolver with Tailscale. In fact,
Tailscale kind of assumes you will use their nameservers, and they
have gone to great lengths to figure out how to do that. And
naturally, here, it doesn't seem to work reliably; my resolv.conf
somehow gets replaced and the magic resolution of the ts.net domain
fails.
(I wonder why we can't opt in to just publicly resolve the ts.net
domain. I don't care if someone can enumerate the private IP addresses
or machines in use in my VPN, at least I don't care as much as
fighting with resolv.conf everywhere.)
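When debugging this, it helps to see which resolvers the system actually ended up with. The sketch below parses resolv.conf content and reports whether Tailscale's MagicDNS resolver (100.100.100.100) is among the nameservers. The function names are made up for this illustration, and it deliberately ignores systemd-resolved and other layers that may sit in front of that file.

```python
from pathlib import Path

TAILSCALE_RESOLVER = "100.100.100.100"  # Tailscale's MagicDNS resolver address


def parse_nameservers(text: str) -> list[str]:
    """Return the nameserver addresses found in resolv.conf content."""
    servers = []
    for line in text.splitlines():
        # Drop anything after a '#'; lines starting with ';' never match below.
        fields = line.split("#", 1)[0].split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def uses_magicdns(text: str) -> bool:
    """Tell whether the Tailscale resolver is configured."""
    return TAILSCALE_RESOLVER in parse_nameservers(text)


path = Path("/etc/resolv.conf")
if path.exists():
    content = path.read_text()
    print(parse_nameservers(content), uses_magicdns(content))
```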
Because I mostly have access to the routers on the networks I'm on, I
don't think I'll be using tailscale in the long term. But it's pretty
impressive stuff: in the time it took me to even review the Puppet
modules to configure Wireguard (which is what I'll probably end up
doing), I was up and running with Tailscale (but with a broken DNS,
naturally).
(And yes, basic Wireguard won't bring me DNS either, but at least I
won't have to trust Tailscale's Debian packages, and Tailscale, and
Microsoft, and GitHub with this thing.)
IPv6
IPv6 is actually what is supposed to solve this. Not NAT port
forwarding crap, just real IPs everywhere.
The problem is: even though IPv6 adoption is still growing, it's
kind of reaching a plateau at around 40% world-wide, with Canada
lagging behind at 34%. It doesn't help that major ISPs in Canada
(e.g. Bell Canada, Videotron) don't care at all about IPv6
(e.g. Videotron in beta since 2011). So we can't rely on
those companies to do the right thing here.
The typical solution here is often to use a tunnel like HE's
tunnelbroker.net. It's kind of tricky to configure, but once
it's done, it works. You get end-to-end connectivity as long as
everyone on the network is on IPv6.
And that's really where the problem lies here; the second one of
your nodes can't set up such a tunnel, you're kind of stuck and that
tool completely breaks down. IPv6 tunnels also don't give you the kind
of security a VPN provides, naturally.
The other downside of a tunnel is you don't really get peer-to-peer
connectivity: you go through the tunnel. So you can expect higher
latencies and possibly lower bandwidth as well. Also, HE.net doesn't
currently charge for this service (and they've been doing this for a
long time), but this could change in the future (just like
Tailscale, that said).
Concretely, the latency difference is rather minimal. For Google:
--- ipv6.l.google.com ping statistics ---
10 packets transmitted, 10 received, 0,00% packet loss, time 136,8ms
RTT[ms]: min = 13, median = 14, p(90) = 14, max = 15
--- google.com ping statistics ---
10 packets transmitted, 10 received, 0,00% packet loss, time 136,0ms
RTT[ms]: min = 13, median = 13, p(90) = 14, max = 14
In the case of GitHub, latency is actually lower, interestingly:
--- ipv6.github.com ping statistics ---
10 packets transmitted, 10 received, 0,00% packet loss, time 134,6ms
RTT[ms]: min = 13, median = 13, p(90) = 14, max = 14
--- github.com ping statistics ---
10 packets transmitted, 10 received, 0,00% packet loss, time 293,1ms
RTT[ms]: min = 29, median = 29, p(90) = 29, max = 30
That is because HE.net peers directly with my ISP and Fastly (which
is behind GitHub.com's IPv6, apparently?), so it's only 6 hops
away. Over IPv4, meanwhile, the ping goes over New York, before landing
in AWS's Ashburn, Virginia datacenters, for a whopping 13 hops...
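The comparison can also be done mechanically. As a rough sketch, the function below pulls the numbers out of the `RTT[ms]` summary lines from the GitHub runs above and compares medians. The regular expression is tailored to that exact output and is an assumption about the tool's format in general.

```python
import re

# Matches "name = number" pairs such as "median = 13" or "p(90) = 14".
RTT_FIELD = re.compile(r"([\w()]+) = (\d+)")


def parse_rtt(line: str) -> dict[str, int]:
    """Parse an 'RTT[ms]: min = 13, median = 14, ...' summary line."""
    _, _, fields = line.partition(":")
    return {name: int(value) for name, value in RTT_FIELD.findall(fields)}


ipv6 = parse_rtt("RTT[ms]: min = 13, median = 13, p(90) = 14, max = 14")
ipv4 = parse_rtt("RTT[ms]: min = 29, median = 29, p(90) = 29, max = 30")
print(f"IPv4 median is {ipv4['median'] - ipv6['median']} ms higher than IPv6")
```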
I managed to set up a HE.net tunnel at home, because I also need IPv6
for other reasons (namely debugging at work). My first attempt at
setting this up in the office failed, but now that I found the
openwrt.org guide, it worked... for a while, and I was able to
produce the above, encouraging, mini benchmarks.
Unfortunately, a few minutes later, IPv6 just went down again. And the
problem with that is that many programs (and especially
OpenSSH) do not respect the Happy Eyeballs protocol (RFC
8305), which means various mysterious "hangs" at random times on
random applications. It's kind of a terrible user experience, on top
of breaking the one thing it's supposed to do, of course, which is to
give me transparent access to all the nodes I maintain.
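For comparison, this is roughly what a client that does implement Happy Eyeballs looks like. Python's asyncio exposes it through the `happy_eyeballs_delay` argument; the `peer_address` helper below is a hypothetical name, and the 250ms delay is just the value suggested by RFC 8305, not something tuned for this setup.

```python
import asyncio


async def peer_address(host: str, port: int, delay: float = 0.25):
    """Connect to host:port and return the address of the winning connection.

    With happy_eyeballs_delay set, asyncio starts the next connection
    attempt if the current one has not completed after `delay` seconds,
    instead of waiting for it to fail or time out.
    """
    reader, writer = await asyncio.open_connection(
        host, port, happy_eyeballs_delay=delay
    )
    try:
        return writer.get_extra_info("peername")
    finally:
        writer.close()
        await writer.wait_closed()
```

Calling something like `asyncio.run(peer_address("nodeB.example.com", 22))` would then fall back to IPv4 within a fraction of a second when the IPv6 path is dead, rather than hanging.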
Even worse, it would still be a problem for other remote nodes I might
set up where I might not have access to the router to set up the
tunnel. It's also not absolutely clear what happens if you set up the
same tunnel in two places... Presumably, something is smart enough to
distribute only a part of the /48 block selectively, but I don't
really feel like going that far, considering how flaky the setup is
already.
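For what it's worth, carving a /48 into per-site prefixes is simple arithmetic: it holds 65536 /64 networks. Here is a sketch using a prefix from the documentation range and made-up site names. It says nothing about how HE.net would actually route parts of the block to multiple endpoints.

```python
import ipaddress


def assign_site_prefixes(block: str, sites: list[str]) -> dict[str, str]:
    """Give each site the next /64 out of the routed block, in order."""
    subnets = ipaddress.ip_network(block).subnets(new_prefix=64)
    return {site: str(net) for site, net in zip(sites, subnets)}


print(assign_site_prefixes("2001:db8:1234::/48", ["home", "office", "colo"]))
```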
Other options
If this post sounds a little biased towards IPv6 and Wireguard, it's
because it is. I would like everyone to migrate to IPv6 already, and
Wireguard seems like a simple and sound system.
I'm aware of many other options to make VPNs. So before anyone jumps
in and says "but what about...", do know that I have personally
experimented with:
- tinc: nice, automatic meshing, used for the Montreal
mesh, serious design flaws in the crypto that make it
generally unsafe to use; supposedly, v1.1 (or 2.0?) will fix this, but that's
been promised for over a decade by now
- ipsec, specifically strongswan: hard to configure
(especially configure correctly!), harder even to debug, otherwise
really nice because transparent (e.g. no need for special subnets),
used at work, but also considering a replacement there because it's
a major barrier to entry to train new staff
- OpenVPN: mostly used as a client for VPN services like
Riseup VPN or Mullvad, mostly relevant for client-server
configurations, not really peer-to-peer, shared secrets or TLS,
kind of a hassle to maintain, see also SoftEther for an
alternative implementation
All of those solutions have significant problems and I do not wish to
use any of those for this project.
Also note that Tailscale is only one of many projects laid over
Wireguard to do that kind of thing, see this LWN review for
others (basically NetBird, Firezone, and Netmaker).
Future work
Those are options that came up after writing this post, and might
warrant further examination in the future.
- Meshbird, a "distributed private networking" with little
information about how it actually works other than "encrypted with
strong AES-256"
- Nebula, "A scalable overlay networking tool with a focus on
performance, simplicity and security", written by Slack people to
replace IPsec, docs, runs as an overlay for Slack's 50k
node network, only packaged in Debian experimental, lagging behind
upstream (1.4.0, from May 2021 vs upstream's 1.6.1 from September
2022), requires a central CA, Golang, I'm in "wait and see" mode
for now
- n2n: "layer two VPN", seems packaged in Debian but inactive
- ouroboros: "peer-to-peer packet network prototype", sounds and
seems complicated
- QuickTUN is interesting because it's just a small wrapper
around NaCl, and it's in Debian... but maybe too obscure for my own
good
- unetd: Wireguard-based full mesh networking from OpenWRT, not
in Debian
- vpncloud: "high performance peer-to-peer mesh VPN over UDP
supporting strong encryption, NAT traversal and a simple
configuration", sounds interesting, not in Debian
- Yggdrasil: actually a pretty good match for my use case, but I
didn't think of it when starting the experiments here; packaged in
Debian, with the Golang version planned, Puppet
module; major caveat: nodes exposed publicly inside the global
mesh unless configured otherwise (firewall suggested),
requires port forwards, alpha status
Conclusion
Right now, I'm going to deploy Wireguard tunnels with Puppet. It seems
like kind of a pain in the back, but it's something I will be able to
reuse for work, possibly completely replacing strongswan.
I have another Puppet module for IPsec which I was planning to
publish, but now I'm thinking I should just abort that and replace
everything with Wireguard, assuming we still need VPNs at work in the
future. (I have a number of reasons to believe we might not need any
in the near future anyways...)