Arturo Borrero Gonz lez: Netfilter Workshop 2022 summary
This is my report from the Netfilter Workshop 2022. The event was held on 2022-10-20/2022-10-21 in Seville, and the venue
was the offices of Zevenet. We started on Thursday with Pablo Neira (head of the project) giving a short
welcome / opening speech. The previous iteration of this event was in virtual fashion in 2020, two years ago.
In the year 2021 we were unable to meet either in person or online.
This year, the number of participants was just eight people, and this allowed the setup to be a bit more informal.
We had kind of an un-conference style meeting, in which whoever had something prepared just went ahead and opened
a topic for debate.
In the opening speech, Pablo did a quick recap on the legal problems the Netfilter project had a few years ago, a
topic that was settled for good some months ago, in January 2022. There were no news in this front,
which was definitely a good thing.
Moving into the technical topics, the workshop proper, Pablo started to comment on the recent developments to
instrument a way to perform inner matching for tunnel protocols. The current implementation supports VXLAN, IPIP,
GRE and GENEVE. Using nftables you can match packet headers that are encapsulated inside these protocols.
He mentioned the design and the goals, that was to have a kernel space setup that allows adding more protocols by just
patching userspace. In that sense, more tunnel protocols will be supported soon, such as IP6IP, UDP, and ESP.
Pablo requested our opinion on whether if nftables should generate the matching dependencies. For example,
if a given tunnel is UDP-based, a dependency match should be there otherwise the rule won t work as expected. The
agreement was to assist the user in the setup when possible and if not, print clear error messages.
By the way, this inner thing is pure stateless packet filtering. Doing inner-conntracking is an open topic
that will be worked on in the future.
Pablo continued with the next topic: nftables automatic ruleset optimizations. The times of linear ruleset evaluation
are over, but some people have a hard time understanding / creating rulesets that leverage maps, sets, and
concatenations. This is where the ruleset optimizations kick in: it can transform a given ruleset to be more optimal
by using such advanced data structures. This is purely about optimizing the ruleset, not about validating
the usefulness of it, which could be another interesting project.
There were a couple of problems mentioned, however. The ruleset optimizer can be slow, O(n!) in worst case. And the
user needs to use nested syntax. More improvements to come in the future.
Next was Stefano Brivio s turn (Red Hat engineer). He had been involved lately in a couple of migrations to
nftables, in particular libvirt and KubeVirt. We were pointed to https://libvirt.org/firewall.html, and Stefano walked us
through the 3 or 4 different virtual networks that libvirt can create. He evaluated some options to generate efficient
rulesets in nftables to instrument such networks, and commented on a couple of ideas: having a null
matcher in nftables set expression. Or perhaps having kind of subsets, something similar to a view in a SQL
database. The room spent quite a bit of time debating how the nft_lookup API could be extended to support such new
search operations.
We also discussed if having intermediate facilities such as firewalld could provide the abstraction levels that
could make developers more comfortable. Using firewalld also may have the advantage that coordination between different
system components writing ruleset to nftables is handled by firewalld itself and developers are freed of the
responsibility of doing it right.
Next was Fernando F. Mancera (Red Hat engineer). He wanted to improve error reporting when deleting table/chain/rules
with nftables. In general, there are some inconsistencies on how tables can be deleted (or flushed). And there seems
to be no correct way to make a single table go away with all its content in a single command.
The room agreed in that the commands
destroy table
and delete table
should be defined consistently, with
the following meanings:
- destroy: nuke the table, don t fail if it doesn t exist
- delete: delete the table, but the command will fail if it doesn t exist
-m nft xyz
. Which feels ugly, but may work. We also explored playing with the semantics of
release version numbers. And another idea: store strings in the nft rule userdata area with the equivalent
matching information for older iptables-nft.
In fact, what Phil may have been looking for is not backwards but forward compatibility. Phil was undecided which path
to follow, but perhaps the most common-sense approach is to fall back to a major release version bump (2.x.y)
and declaring compatibility breakage with older iptables 1.x.y.
That was pretty much it for the first day. We had dinner together and went to sleep for the next day.
The second day was opened by Florian Westphal (Netfilter coreteam member and Red Hat engineer). Florian has been
trying to improve nftables performance in kernels with RETPOLINE mitigations enabled. He commented that several
workarounds have been collected over the years to avoid the performance penalty of such mitigations.
The basic strategy is to avoid function indirect calls in the kernel.
Florian also described how BPF programs work around this more effectively. And actually, Florian tried translating
nf_hook_slow()
to BPF. Some preliminary benchmarks results were showed, with about 2% performance improvement in
MB/s and PPS. The flowtable infrastructure is specially benefited from this approach. The software
flowtable infrastructure already offers a 5x performance improvement with regards the classic forwarding path, and the
change being researched by Florian would be an addition on top of that.
We then moved into discussing the meeting Florian had with Alexei in Zurich. My personal opinion was that
Netfilter offers interesting user-facing interfaces and semantics that BPF does not. Whereas BPF may be more performant
in certain scenarios. The idea of both things going hand in hand may feel natural for some people. Others also
shared my view, but no particular agreement was reached in this topic. Florian will probably continue exploring options
on that front.
The next topic was opened by Fernando. He wanted to discuss Netfilter involvement in Google Summer of Code and Outreachy.
Pablo had some personal stuff going on last year that prevented him from engaging in such projects. After all, GSoC is
not fundamental or a priority for Netfilter. Also, Pablo mentioned the lack of support from others in the project for
mentoring activities. There was no particular decision made here. Netfilter may be present again in such initiatives
in the future, perhaps under the umbrella of other organizations.
Again, Fernando proposed the next topic: nftables JSON support. Fernando shared his plan of going over all features
and introduce programmatic tests from them. He also mentioned that the nftables wiki was incomplete and couldn t be
used as a reference for missing tests. Phil suggested running the nftables python test-suite in JSON mode, which
should complain about missing features. The py test suite should cover pretty much all statements and variations on
how the nftables expression are invoked.
Next, Phil commented on nftables xtables support. This is, supporting legacy xtables extensions in nftables.
The most prominent problem was that some translations had some corner cases that resulted in a listed ruleset that
couldn t be fed back into the kernel. Also, iptables-to-nftables translations can be sloppy, and the resulting
rule won t work in some cases. In general, nft list ruleset nft -f
may fail in rulesets created by iptables-nft
and there is no trivial way to solve it.
Phil also commented on potential iptables-tests.py speed-ups. Running the test suite may take very long time
depending on the hardware. Phil will try to re-architect it, so it runs faster. Some alternatives had been
explored, including collecting all rules into a single iptables-restore run, instead of hundreds of individual
iptables calls.
Next topic was about documentation on the nftables wiki. Phil is interested in having all nftables
code-flows documented, and presented some improvements in that front. We are trying to organize all
developer-oriented docs on a mediawiki portal, but the extension was not active yet. Since I worked at the
Wikimedia Foundation, all the room stared at me, so at the end I kind of committed to exploring and enabling the
mediawiki portal extension. Note to self: is this perhaps https://www.mediawiki.org/wiki/Portals ?
Next presentation was by Pablo. He had a list of assorted topics for quick review and comment.
- We discussed nftables accept/drop semantics. People that gets two or more rulesets from different software are requesting additional semantics here. A typical case is fail2ban integration. One option is quick accept (no further evaluation if accepted) and the other is lazy drop (don t actually drop the packet, but delay decision until the whole ruleset has been evaluated). There was no clear way to move forward with this.
- A debate on nft userspace memory usage followed. Some people are running nftables on low end devices with very
little memory (such as 128 MB). Pablo was exploring a potential solution: introducing
struct constant_expr
, which can reduce 12.5% mem usage. - Next we talked about repository licensing (or better, relicensing to GPLv2+). Pablo went over a list of files in the nftables tree which had diverging licenses. All people in the room agreed on this relicensing effort. A mention to the libreadline situation was made.
- Another quick topic: a bogus EEXIST in nft_rbtree. Pablo & Stefano to work in a patch.
- Next one was conntrack early drop in flowtable. Pablo is studying use cases for some legitimate UDP unidirectional flows (like RTP traffic).
- Pablo and Stefano discussed pipapo not being atomic on updates. Stefano already looked into it, and one of the ideas was to introduce a new commit API for sets.
- The last of the quick topics was an idea to have a global table in nftables. Or some global items, like sets. Folk in the community keep asking for this. Some ideas were discussed, like perhaps adding a family agnostic family. But then there would be a challenge: nftables would need to generate byte code that works in any of the hooks. There was no immediate way of addressing this. The idea of having templated tables/sets circulated again as a way of reusing data across namespaces/families.
nft_tunnel
expression that can do this encapsulation for complete feature parity. It is only available in
the kernel, but it can be made available easily on the userspace utility too.
Also, we discussed some limitations of katran, for example, inability to handle IP fragmentation, IP options, and
potentially others not documented anywhere. This seems to be common with XDP/BPF programs, because handling all
possible network scenarios would over-complicate the BPF programs, and at that point you are probably better off by
using the normal Linux network stack and nftables.
In summary, we agreed that nftlb can pretty much offer the same as katran, in a more flexible way.
Finally, after many interesting debates over two days, the workshop ended. We all agreed on the need for extending
it to 3 days next time, since 2 days feel too intense and too short for all the topics worth discussing.
That s all on my side! I really enjoyed this Netfilter workshop round.