The Reproducible Builds project relies on several projects, supporters and sponsors for financial support, but they are also valued as ambassadors who spread the word about our project and the work that we do.
This is the eighth installment in a series featuring the projects, companies
and individuals who support the Reproducible Builds project. We started this
series by featuring the Civil Infrastructure Platform project, and followed
this up with a post about the Ford Foundation as well as recent ones about
ARDC, the Google Open Source Security Team (GOSST), Bootstrappable Builds, the
F-Droid project, David A. Wheeler and Simon Butler.
Today, however, we will be talking with Kees Cook, founder of the Kernel
Self-Protection Project.
Vagrant Cascadian: Could you tell me a bit about yourself? What sort
of things do you work on?
Kees Cook: I'm a Free Software junkie living in Portland, Oregon, USA.
I have been focusing on the upstream Linux kernel's protection
of itself. There is a lot of support that the kernel provides
userspace to defend itself, but when I first started focusing on this
there was not as much attention given to the kernel protecting
itself. As userspace got more hardened the kernel itself became a
bigger target. Almost 9 years ago I formally announced the Kernel
Self-Protection Project because the work necessary was way more than my time
and expertise could do alone. So I just try to get people to help as much as
possible: people who understand the ARM architecture, people who understand
the memory management subsystem, people who understand how to make the kernel
less buggy.
Vagrant: Could you describe the path that led you to working on this
sort of thing?
Kees: I have always been interested in security through the aspect of
exploitable flaws. I always thought it was like a magic trick to make a
computer do something that it was very much not designed to do and seeing how
easy it is to subvert bugs. I wanted to improve that fragility. In 2006, I
started working at Canonical on Ubuntu and was mainly focusing on bringing
Debian and Ubuntu up to what was the state of the art for Fedora and Gentoo's
security hardening efforts. Both had really pioneered a lot of userspace
hardening with compiler flags and ELF stuff and many other things for hardened
binaries. On the whole, Debian had not really paid attention to it. Debian's
package building process at the time was sort of a chaotic free-for-all as
there wasn't a centralized build methodology for defining things. Luckily that
did slowly change over the years. In Ubuntu we had the opportunity to apply
top-down build rules for hardening all the packages. In 2011 Chrome OS was
following along and took advantage of a bunch of the security hardening work,
as they were based on ebuild out of Gentoo, and when they looked for someone to
help out they reached out to me. We recognized the Linux kernel was pretty much
the weakest link in the Chrome OS security posture and I joined them to help
solve that. Their userspace was pretty well handled but the kernel had a lot
of weaknesses, so focusing on hardening was the next place to go. When I
compared notes with other users of the Linux kernel within Google there were a
number of common concerns and desires. Chrome OS already had an "upstream
first" requirement, so I tried to consolidate the concerns and solve them
upstream. It was challenging to land anything in other kernel team repos at
Google, as they (correctly) wanted to minimize their delta from upstream, so I
needed to work on any major improvements entirely in upstream and had a lot of
support from Google to do that. As such, my focus shifted further from working
directly on Chrome OS into being entirely upstream and being more of a
consultant to internal teams, helping with integration or sometimes
backporting. Since the volume of needed work was so gigantic I needed to find
ways to inspire other developers (both inside and outside of Google) to help.
Once I had a budget I tried to get folks paid (or hired) to work on these areas
when it wasn't already their job.
Vagrant: So my understanding of some of your recent work is basically
defining undefined behavior in the language or compiler?
Kees: I've found the term "undefined behavior" to have a really strict
meaning within the compiler community, so I have tried to redefine my goal as
eliminating "unexpected behavior" or "ambiguous language constructs". At the
end of the day ambiguity leads to bugs, and bugs lead to exploitable security
flaws. I've been taking a four-pronged approach: supporting the work people are
doing to get rid of ambiguity, identifying new areas where ambiguity needs to
be removed, actually removing that ambiguity from the C language, and then
dealing with any needed refactoring in the Linux kernel source to adapt to the
new constraints.
None of this is particularly novel; people have recognized how dangerous some
of these language constructs are for decades and decades but I think it is a
combination of hard problems and a lot of refactoring that nobody has the
interest/resources to do. So, we have been incrementally going after the lowest
hanging fruit. One clear example in recent years was the elimination of C's
implicit fall-through in switch statements. The language would just fall
through between adjacent cases if a break (or other code flow directive)
wasn't present. But this is ambiguous: is the code meant to fall through, or
did the author just forget a break statement? By defining the [[fallthrough]]
statement, and requiring its use in Linux, all switch statements now have
explicit code flow, and the entire class of bugs disappeared. During our
refactoring we actually found that 1 in 10 added [[fallthrough]] statements
were actually missing break statements. This was an extraordinarily common
bug!
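As a rough illustration of the explicit fall-through described above, here is a minimal sketch in C. It assumes GCC 7+ or Clang 10+, which provide the statement attribute used below; the `fallthrough` macro mirrors the pseudo-keyword the Linux kernel defines, while `cleanup_steps` and its stages are purely hypothetical.

```c
/* Assumption: GCC 7+ or Clang 10+, which support this statement attribute.
 * The Linux kernel defines a similar `fallthrough` pseudo-keyword. */
#define fallthrough __attribute__((__fallthrough__))

/* Hypothetical example: count how many teardown steps run when unwinding
 * from a given setup stage. Every case ends in either `fallthrough;` or
 * `break;`, so the intended code flow is always spelled out. */
int cleanup_steps(int stage)
{
    int steps = 0;

    switch (stage) {
    case 3:
        steps++;        /* undo stage 3 */
        fallthrough;    /* intentional: continue into stage 2 teardown */
    case 2:
        steps++;        /* undo stage 2 */
        fallthrough;    /* intentional: continue into stage 1 teardown */
    case 1:
        steps++;        /* undo stage 1 */
        break;
    default:
        break;          /* nothing to undo */
    }
    return steps;
}
```

Building with `-Wimplicit-fallthrough` on GCC or Clang should then flag any case that falls through without the annotation, which is how a forgotten break becomes visible at compile time.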
So getting rid of that ambiguity is where we have been. Another area I've been
spending a bit of time on lately is looking at how defensive security work has
challenges associated with metrics. How do you measure your defensive security
impact? You can't say "because we installed locks on the doors, 20% fewer
break-ins have happened." Much of our signal is always secondary or
retrospective, which is frustrating: "This class of flaw was used X much over
the last decade, so if we have eliminated that class of flaw and will never
see it again, what is the impact?" Is the impact infinity? Attackers will just
move to the next easiest thing. But it means that exploitation gets
incrementally more difficult. As attack surfaces are reduced, the expense of
exploitation goes up.
Vagrant: So it is hard to identify how effective this is. How bad would it be
if people just gave up?
Kees: I think it would be pretty bad, because as we have seen, using
secondary factors, the work we have done in the industry at large, not just the
Linux kernel, has had an impact. The work that we, Microsoft, Apple, and
everyone else are doing for our respective software ecosystems has driven up
the price of functional exploits on the black market, especially for really
egregious stuff like a zero-click remote code execution.
If those were cheap then obviously we are not doing something right, and it
becomes clear that it's trivial for anyone to attack the infrastructure that
our lives depend on. But thankfully we have seen over the last two decades that
prices for exploits keep going up and up into millions of dollars. I think it
is important to keep working on that because, as a central piece of modern
computer infrastructure, the Linux kernel has a giant target painted on it. If
we give up, we have to accept that our computers are not doing what they were
designed to do, which I can't accept. The safety of my grandparents shouldn't
be any different from the safety of journalists, and political activists, and
anyone else who might be the target of attacks. We need to be able to trust our
devices; otherwise, why use them at all?
Vagrant: What has been your biggest success in recent years?
Kees: I think with all these things I am not the only actor. Almost
everything that we have been successful at has been because of a lot
of people's work, and one of the big ones that has been coordinated
across the ecosystem and across compilers was initializing stack variables to
0 by default. This feature was added in Clang, GCC, and MSVC across the board
even though there were a lot of fears about forking the C language.
The worry was that developers would come to depend on zero-initialized stack
variables, but this hasn't been the case because we still warn about
uninitialized variables when the compiler can figure that out. So you still
get the warnings at compile time but now you can count on the contents of
your stack at run-time and we drop an entire class of uninitialized variable
flaws. While the exploitation of this class has mostly been around memory
content exposure, it has also been used for control flow attacks.
So that was politically and technically a large challenge: convincing people it
was necessary, showing its utility, and implementing it in a way that everyone
would be happy with, resulting in the elimination of a large and persistent
class of flaws in C.
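To make the class of flaw concrete, here is a minimal sketch in C of a stack variable that is only partly initialized before being copied out. The `struct reply` type and `build_reply` function are hypothetical, and the sketch assumes a compiler that accepts `-ftrivial-auto-var-init=zero` (GCC 12+ and recent Clang releases); it is an illustration, not the kernel's own code.

```c
#include <string.h>

/* Hypothetical reply structure handed back to a caller. */
struct reply {
    int status;
    char name[16];
};

/* Hypothetical handler with a classic bug: on the error path, `name` is
 * never written, so the caller receives whatever bytes were previously on
 * the stack. When built with -ftrivial-auto-var-init=zero (an assumption
 * about the build flags), `r` is zero-filled on entry and the stale bytes
 * are replaced by zeros. */
int build_reply(int ok, struct reply *out)
{
    struct reply r;             /* no explicit initializer */

    r.status = ok ? 0 : -1;
    if (ok)
        strcpy(r.name, "ready");

    *out = r;                   /* copies r.name even when it was never set */
    return r.status;
}
```

The source is unchanged either way, so compiler warnings about uninitialized use remain available; the flag only changes what ends up in the variable at run-time.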
Vagrant: In a world where things are generally Reproducible do you see ways
in which that might affect your work?
Kees: One of the questions I frequently get is, "What version of the Linux
kernel has feature $foo?" If I know how things are built, I can answer with
just a version number. In a Reproducible Builds scenario I can count on the
compiler version, compiler flags, kernel configuration, etc.; all those things
are known, so I can actually answer definitively that a certain feature exists.
So that is an area where Reproducible Builds affects me most directly.
Indirectly, just being able to trust that the binaries you are running are
going to behave the same for the same build environment is critical for sane
testing.
Vagrant: Have you used diffoscope?
Kees: I have! One subset of tree-wide refactoring that we do when getting
rid of ambiguous language usage in the kernel is when we have to make source
level changes to satisfy some new compiler requirement but where the binary
output is not expected to change at all. It is mostly about getting the
compiler to understand what is happening, what is intended in the cases where
the old ambiguity does actually match the new unambiguous description of what
is intended. The binary shouldn't change. We have used diffoscope to compare
the before and after binaries to confirm that "yep, there is no change in
binary".
Vagrant: You cannot just use checksums for that?
Kees: For the most part, we need to only compare the text segments. We try
to hold as much stable as we can, following the
Reproducible Builds documentation for the kernel,
but there are macros in the kernel that are sensitive to source line numbers
and as a result those will change the layout of the data segment (and sometimes
the text segment too). With diffoscope there's flexibility where I can exclude
or include different comparisons. Sometimes I just go look at what diffoscope
is doing and do that manually, because I can tweak that a little harder, but
diffoscope is definitely the default. Diffoscope is awesome!
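The line-number sensitivity mentioned above can be sketched in a few lines of C. The `call_site` structure, the `CALL_SITE_INIT` macro and the two functions are hypothetical stand-ins, not kernel code; they only illustrate how `__LINE__` ends up baked into read-only data.

```c
/* Hypothetical record of where a check was placed in the source. */
struct call_site {
    const char *file;
    int line;
};

/* Expands to the file name and line number of wherever it is used. */
#define CALL_SITE_INIT { __FILE__, __LINE__ }

const struct call_site *site_a(void)
{
    /* Stored as constant data; its contents depend on this source line. */
    static const struct call_site here = CALL_SITE_INIT;
    return &here;
}

const struct call_site *site_b(void)
{
    static const struct call_site here = CALL_SITE_INIT;
    return &here;
}
```

Adding or removing a single line above `site_a` shifts both recorded line numbers, so the constant data differs between builds even though the logic is untouched. A whole-file checksum would report a difference in that case, whereas comparing only the relevant sections can still show that the code itself did not change.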
Vagrant: Where has reproducible builds affected you?
Kees: One of the notable wins of reproducible builds lately was
dealing with the fallout of the XZ backdoor and just being able to ask
the question "is my build environment running the expected
code?" and to be able to compare the output generated from one
install that never had a vulnerable XZ and one that did have a
vulnerable XZ and compare the results of what you get. That was
important for kernel builds because the XZ threat actor was working to
expand their influence and capabilities to include Linux kernel
builds, but they didn't finish their work before they were noticed. I
think what happened with Debian proving the build infrastructure was not
affected is an important example of how people would have needed to verify the
kernel builds too.
Vagrant: What do you want to see for the near or distant future in security work?
Kees: For reproducible builds in the kernel, in the work that has been
going on in the ClangBuiltLinux project, one of the driving forces of
code and usability quality has been the continuous integration work. As soon
as something breaks, on the kernel side, the Clang side, or something in
between the two, we get a fast signal and can chase it and fix the bugs
quickly. I would like to see someone with funding to maintain a reproducible
kernel build CI. There have been places where there are certain
architecture configurations or certain build configurations where we lose
reproducibility, and right now we have sort of a standard open source
development feedback loop where those things get fixed, but the time
in between introduction and fix can be large. Getting a CI for
reproducible kernels would give us the opportunity to shorten that
time.
Vagrant: Well, thanks for that! Any last closing thoughts?
Kees: I am a big fan of reproducible builds; thank you for all your work.
The world is a safer place because of it.
Vagrant: Likewise for your work!
For more information about the Reproducible Builds project, please see our
website at reproducible-builds.org. If you are interested in
ensuring the ongoing security of the software that underpins our civilisation
and wish to sponsor the Reproducible Builds project, please reach out to the
project by emailing contact@reproducible-builds.org.