Linux v5.8 was released in August, 2020. Here's my summary of various security things that caught my attention:
arm64 Branch Target Identification
Dave Martin added support for ARMv8.5's Branch Target Instructions (BTI), which are enabled in userspace at execve() time, and all the time in the kernel (which required manually marking up a lot of non-C code, like assembly and JIT code).
With this in place, Jump-Oriented Programming (JOP, where code gadgets are chained together with jumps and calls) is no longer available to the attacker. An attacker's code must make direct function calls. This basically reduces the usable code available to an attacker from every word in the kernel text to only function entries (or jump targets). This is a low granularity forward-edge Control Flow Integrity (CFI) feature, which is important (since it greatly reduces the potential targets that can be used in an attack) and cheap (implemented in hardware). It's a good first step to strong CFI, but (as we've seen with things like CFG) it isn't usually strong enough to stop a motivated attacker. High granularity CFI (which uses a more specific branch-target characteristic, like function prototypes, to track expected call sites) is not yet a hardware supported feature, but the software version will be coming in the future by way of Clang's CFI implementation.
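To make the granularity difference concrete, here is a toy C model. It is only a sketch: it does not reflect how BTI hardware or Clang's CFI actually work internally, and every name in it (`toy_word`, `coarse_cfi_allows`, `fine_cfi_allows`, the `proto` enum) is hypothetical. Each "word" of text is either a valid landing pad or not, and a fine-grained check additionally compares an expected prototype.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical prototype classes used only for this illustration. */
enum proto { PROTO_NONE, PROTO_VOID_INT, PROTO_INT_PTR };

/* One "word" of toy kernel text. */
struct toy_word {
    bool landing_pad;  /* true if an indirect branch may land here */
    enum proto proto;  /* prototype of the function starting here, if any */
};

/* Coarse check: any marked landing pad is an acceptable target. */
bool coarse_cfi_allows(const struct toy_word *text, size_t n, size_t target)
{
    return target < n && text[target].landing_pad;
}

/* Fine check: the target must also match the call site's expected prototype. */
bool fine_cfi_allows(const struct toy_word *text, size_t n, size_t target,
                     enum proto expected)
{
    return coarse_cfi_allows(text, n, target) &&
           text[target].proto == expected;
}
```

In this model the coarse check rejects jumps into the middle of a function but still accepts every function entry, while the fine check narrows the set further to entries with a matching prototype.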
arm64 Shadow Call Stack
Sami Tolvanen landed the kernel implementation of Clang's Shadow Call Stack (SCS), which protects the kernel against Return-Oriented Programming (ROP) attacks (where code gadgets are chained together with returns). This backward-edge CFI protection is implemented by keeping a second dedicated stack pointer register and keeping a copy of the return addresses stored in a separate "shadow stack". In this way, manipulating the regular stack's return addresses will have no effect. (And since a copy of the return address continues to live in the regular stack, no changes are needed for back trace dumps, etc.)
It's worth noting that unlike BTI (which is hardware based), this is a software defense that relies on the location of the Shadow Stack (i.e. the value of the dedicated shadow stack pointer register) staying secret, since the memory could be written to directly. Intel's hardware ROP defense (CET) uses a hardware shadow stack that isn't directly writable. ARM's hardware defense against ROP is PAC (which is actually designed as an arbitrary CFI defense: it can be used for forward-edge too), but that depends on having ARMv8.3 hardware. The expectation is that SCS will be used until PAC is available.
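The shadow stack idea can be sketched with a small userspace simulation. This is an illustrative model only, not the kernel or Clang implementation: the structure and function names (`toy_stacks`, `toy_call`, `toy_return`) are made up, and real SCS works on machine return addresses rather than array slots.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_STACK_DEPTH 16

/*
 * Toy model: "regular" holds return addresses an attacker may overwrite;
 * "shadow" is the separate copy that returns are actually taken from.
 */
struct toy_stacks {
    uintptr_t regular[TOY_STACK_DEPTH];
    uintptr_t shadow[TOY_STACK_DEPTH];
    size_t depth;
};

/* Function entry: record the return address on both stacks. */
bool toy_call(struct toy_stacks *s, uintptr_t ret)
{
    if (s->depth >= TOY_STACK_DEPTH)
        return false;
    s->regular[s->depth] = ret;
    s->shadow[s->depth] = ret;
    s->depth++;
    return true;
}

/* Function exit: the return address comes from the shadow copy only.
 * Returns 0 if there is no frame to return from. */
uintptr_t toy_return(struct toy_stacks *s)
{
    if (s->depth == 0)
        return 0;
    s->depth--;
    return s->shadow[s->depth];
}
```

If an attacker overwrites `regular[0]` after a call, `toy_return()` still yields the original address, which mirrors why corrupting the regular stack has no effect. It also shows the weakness noted above: anyone who can find and write `shadow` directly defeats the scheme.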
Kernel Concurrency Sanitizer infrastructure added
Marco Elver landed support for the Kernel Concurrency Sanitizer (KCSAN), which is a new debugging infrastructure to find data races in the kernel. This immediately found real bugs, with some fixes having already landed too. For more details, see the KCSAN documentation.
Alexey Budankov added CAP_PERFMON, which is designed to allow access to performance monitoring. The idea is that this capability gives a process access to only read aspects of the running kernel and system. No longer will access be needed through the much more powerful abilities of CAP_SYS_ADMIN, which has many ways to change kernel internals. This allows for a split between controls over the confidentiality (read access via CAP_PERFMON) of the kernel vs control over integrity (write access via CAP_SYS_ADMIN).
Alexei Starovoitov added a dedicated BPF capability, which is designed to separate BPF access from the all-powerful CAP_SYS_ADMIN. It is designed to be used in combination with other capabilities for tracing-like activities and for networking-related activities. For things that could change kernel integrity (i.e. write access), CAP_SYS_ADMIN is still required.
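For readers who want to see which of these capabilities a process actually holds, here is a minimal C sketch that reads the effective capability mask from /proc/self/status and tests individual bits. The helper names are hypothetical, and the capability numbers are defined locally (21 for CAP_SYS_ADMIN, 38 for CAP_PERFMON, matching the kernel's uapi header) because older userspace headers may not define CAP_PERFMON.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Local copies of the capability numbers, named EX_* to avoid clashing
 * with any system header definitions. */
#define EX_CAP_SYS_ADMIN 21
#define EX_CAP_PERFMON   38

/* Test one capability bit in a 64-bit capability mask. */
bool mask_has_cap(uint64_t mask, unsigned int cap)
{
    return cap < 64 && ((mask >> cap) & 1u);
}

/* Read the "CapEff:" line of /proc/self/status into *mask.
 * Returns false if the file or the line is unavailable. */
bool read_effective_caps(uint64_t *mask)
{
    FILE *f = fopen("/proc/self/status", "r");
    char line[256];
    bool found = false;

    if (!f)
        return false;
    while (fgets(line, sizeof(line), f)) {
        unsigned long long value;

        if (sscanf(line, "CapEff: %llx", &value) == 1) {
            *mask = value;
            found = true;
            break;
        }
    }
    fclose(f);
    return found;
}
```

A caller would first use `read_effective_caps(&mask)` and then check `mask_has_cap(mask, EX_CAP_PERFMON)` separately from `mask_has_cap(mask, EX_CAP_SYS_ADMIN)`, which reflects the read-versus-write split described above.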
network random number generator improvements
Willy Tarreau made the network code's random number generator less predictable. This will further frustrate any attacker's attempts to recover the state of the RNG externally, which might lead to the ability to hijack network sessions (by correctly guessing packet states).
fix various kernel address exposures to unprivileged users
I fixed several situations where kernel addresses were still being exposed to unprivileged users, though usually only through odd corner cases. After refactoring how capabilities were being checked for files, the kernel modules sections and BPF exposures got fixed. (Though in doing so, I briefly made things much worse before getting it properly fixed.)
RISCV W^X detection
Following up on his recent work to enable strict kernel memory protections on RISCV, Zong Li has now added support for W^X detection, as seen for other architectures. Any writable and executable memory regions in the kernel (which are lovely targets for attackers) will be loudly noted at boot so they can get corrected.
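The kernel check walks kernel page tables at boot, which cannot be reproduced from userspace. As a loose analogue only, the sketch below scans a /proc/PID/maps-style file for mappings that are both writable and executable. The function names are hypothetical, and this is not the kernel's implementation.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* A maps permission field looks like "rwxp": index 1 is write,
 * index 2 is execute. */
bool perms_are_wx(const char *perms)
{
    return perms && perms[0] && perms[1] == 'w' && perms[2] == 'x';
}

/* Count writable+executable mappings listed in a maps-format file.
 * Returns -1 if the file cannot be opened. */
int count_wx_mappings(const char *maps_path)
{
    FILE *f = fopen(maps_path, "r");
    char line[512];
    bool at_line_start = true;
    int count = 0;

    if (!f)
        return -1;
    while (fgets(line, sizeof(line), f)) {
        char perms[5];
        size_t len = strlen(line);

        /* Only parse the chunk that begins a line: the address range
         * is skipped and the second field is the permission string. */
        if (at_line_start &&
            sscanf(line, "%*s %4s", perms) == 1 && perms_are_wx(perms))
            count++;
        at_line_start = len > 0 && line[len - 1] == '\n';
    }
    fclose(f);
    return count;
}
```

Calling `count_wx_mappings("/proc/self/maps")` on a typical hardened system is expected to return 0, though JIT-heavy processes may legitimately report more.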
Eric W. Biederman continued his work, including getting rid of the frequently problematic recursion used to locate binary handlers. I used the opportunity to dust off some old binfmt_script regression tests and get them into the kernel selftests.
Alexey Gladkov modernized procfs internals and provided a way to have multiple procfs instances mounted in the same PID namespace. This allows for having multiple views of procfs, with different features enabled (including the newly added hidepid=4).
Christoph Hellwig, with Eric W. Biederman, Arnd Bergmann, and others, have been diligently working to entirely remove one of the kernel's interfaces, which has long been a source of security flaws due to weird confusions about which address space the kernel thought it should be accessing. Beyond things like the lower-level per-architecture signal handling code, this has needed to touch various parts of the ELF loader and networking.
is no more for native 64-bit
This flag was a work-around for dealing with the addition of non-executable (NX) memory when x86_64 was introduced. It was designed as a way to mark a memory region as "well, since we don't know if this memory region was expected to be executable, we must assume that if we need to read it, we need to be allowed to execute it too". It was designed mostly for stack memory (where trampoline code might live), but it would carry over into all allocations, which would mean sometimes exposing a large attack surface to an attacker looking to find executable memory. While normally this didn't cause problems on modern systems that correctly marked their ELF sections as NX, there were still some awkward corner-cases. I fixed this by splitting the flag from the ELF marking on x86, and declaring that a native 64-bit process would never gain it, which matches the behavior of other native 64-bit architectures that correctly didn't ever implement it in the first place.
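The flag's name does not survive in the text above. Assuming it is the Linux READ_IMPLIES_EXEC personality flag (an assumption on my part, based on the "read implies execute" behavior described), a process can query its own state with a short C sketch like the following. The helper names are hypothetical; `personality(0xffffffff)` is the documented way to read the current persona without changing it.

```c
#include <stdbool.h>
#include <sys/personality.h>

/* Pure helper: does a persona value carry the (assumed) flag? */
bool persona_has_read_implies_exec(unsigned long persona)
{
    return (persona & READ_IMPLIES_EXEC) != 0;
}

/* Query the calling process: 1 if the flag is set, 0 if it is clear,
 * -1 if the persona could not be read. */
int read_implies_exec_state(void)
{
    /* 0xffffffff asks the kernel to return the persona unchanged. */
    int persona = personality(0xffffffff);

    if (persona == -1)
        return -1;
    return persona_has_read_implies_exec((unsigned long)persona) ? 1 : 0;
}
```

Under the change described above, a native 64-bit process would be expected to see this report 0 regardless of how its ELF file is marked.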
array index bounds checking continues
As part of the ongoing work to use modern flexible arrays in the kernel, Gustavo A. R. Silva added the
(as a cousin to
). The zero/one-member to flexible array conversions continue, with over a hundred commits, as we slowly get closer to being able to build with array bounds checking enabled.
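For readers unfamiliar with the conversion, here is a small userspace C sketch of a C99 flexible array member with an overflow-checked size calculation. It is not kernel code: `struct packet`, `packet_bytes()` and `packet_alloc()` are hypothetical names, and the kernel uses its own helpers for this arithmetic.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* The trailing array is declared with [] rather than [0] or [1], so the
 * compiler knows its length is decided at allocation time. */
struct packet {
    size_t count;
    uint32_t items[];
};

/* Bytes needed for a packet with "count" items, or 0 on overflow. */
size_t packet_bytes(size_t count)
{
    if (count > (SIZE_MAX - sizeof(struct packet)) / sizeof(uint32_t))
        return 0;
    return sizeof(struct packet) + count * sizeof(uint32_t);
}

/* Allocate a zeroed packet; returns NULL on overflow or allocation
 * failure. The caller frees it with free(). */
struct packet *packet_alloc(size_t count)
{
    size_t bytes = packet_bytes(count);
    struct packet *p;

    if (bytes == 0)
        return NULL;
    p = calloc(1, bytes);
    if (p)
        p->count = count;
    return p;
}
```

With the old `items[1]` style, `sizeof(struct packet)` already includes one element, which invites off-by-one size math and hides the real array length from bounds checking.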
Chen Zhou joined Takashi Iwai in continuing
to replace potentially unsafe uses of
. Fixing all of these will make sure the kernel avoids nasty buffer concatenation surprises.
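The function names are missing from the text above, so the following is an assumption about the kind of surprise meant: standard `snprintf()` returns the length the output would have had, not the number of bytes stored, so code that advances an offset by its return value can step past the end of the buffer. The sketch below is a hypothetical userspace helper (`bounded_printf`) that returns only what was actually written.

```c
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Format into buf and return the number of characters actually stored,
 * not counting the terminating NUL. Never returns a value >= size. */
int bounded_printf(char *buf, size_t size, const char *fmt, ...)
{
    va_list ap;
    int n;

    if (size == 0)
        return 0;
    va_start(ap, fmt);
    n = vsnprintf(buf, size, fmt, ap);
    va_end(ap);
    if (n < 0) {
        /* Encoding error: leave an empty string behind. */
        buf[0] = '\0';
        return 0;
    }
    /* vsnprintf reports the untruncated length; clamp it. */
    return (size_t)n >= size ? (int)(size - 1) : n;
}
```

Appending with `len += bounded_printf(buf + len, sizeof(buf) - len, ...)` keeps `len` strictly below the buffer size, whereas the same pattern with plain `snprintf()` can push `len` beyond it after a truncation.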
That's it for now! Let me know if there is anything else you think I should mention here. Next up: Linux v5.9.
2021, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 License.