Search Results: "Arnd Bergmann"

9 February 2021

Kees Cook: security things in Linux v5.8

Previously: v5.7

Linux v5.8 was released in August, 2020. Here's my summary of various security things that caught my attention:

arm64 Branch Target Identification
Dave Martin added support for ARMv8.5's Branch Target Identification (BTI), which is enabled in userspace at execve() time, and all the time in the kernel (which required manually marking up a lot of non-C code, like assembly and JIT code). With this in place, Jump-Oriented Programming (JOP, where code gadgets are chained together with jumps and calls) is no longer available to the attacker. An attacker's code must make direct function calls. This basically reduces the usable code available to an attacker from every word in the kernel text to only function entries (or jump targets).

This is a low granularity forward-edge Control Flow Integrity (CFI) feature, which is important (since it greatly reduces the potential targets that can be used in an attack) and cheap (implemented in hardware). It's a good first step to strong CFI, but (as we've seen with things like CFG) it isn't usually strong enough to stop a motivated attacker. High granularity CFI (which uses a more specific branch-target characteristic, like function prototypes, to track expected call sites) is not yet a hardware supported feature, but the software version will be coming in the future by way of Clang's CFI implementation.

arm64 Shadow Call Stack
Sami Tolvanen landed the kernel implementation of Clang's Shadow Call Stack (SCS), which protects the kernel against Return-Oriented Programming (ROP) attacks (where code gadgets are chained together with returns). This backward-edge CFI protection is implemented by keeping a second dedicated stack pointer register (x18) and keeping a copy of the return addresses stored in a separate "shadow stack". In this way, manipulating the regular stack's return addresses will have no effect. (And since a copy of the return address continues to live in the regular stack, no changes are needed for back trace dumps, etc.)

It's worth noting that unlike BTI (which is hardware based), this is a software defense that relies on the location of the Shadow Stack (i.e. the value of x18) staying secret, since the memory could be written to directly. Intel's hardware ROP defense (CET) uses a hardware shadow stack that isn't directly writable. ARM's hardware defense against ROP is PAC (which is actually designed as an arbitrary CFI defense; it can be used for forward-edge too), but that depends on having ARMv8.3 hardware. The expectation is that SCS will be used until PAC is available.

Kernel Concurrency Sanitizer infrastructure added
Marco Elver landed support for the Kernel Concurrency Sanitizer, which is a new debugging infrastructure to find data races in the kernel, via CONFIG_KCSAN. This immediately found real bugs, with some fixes having already landed too. For more details, see the KCSAN documentation.

new capabilities
Alexey Budankov added CAP_PERFMON, which is designed to allow access to perf(). The idea is that this capability gives a process access to only read aspects of the running kernel and system. No longer will access be needed through the much more powerful abilities of CAP_SYS_ADMIN, which has many ways to change kernel internals. This allows for a split between controls over the confidentiality (read access via CAP_PERFMON) of the kernel vs control over integrity (write access via CAP_SYS_ADMIN).

Alexei Starovoitov added CAP_BPF, which is designed to separate BPF access from the all-powerful CAP_SYS_ADMIN. It is designed to be used in combination with CAP_PERFMON for tracing-like activities and CAP_NET_ADMIN for networking-related activities. For things that could change kernel integrity (i.e. write access), CAP_SYS_ADMIN is still required.

network random number generator improvements
Willy Tarreau made the network code's random number generator less predictable. This will further frustrate any attacker's attempts to recover the state of the RNG externally, which might lead to the ability to hijack network sessions (by correctly guessing packet states).

fix various kernel address exposures to non-CAP_SYSLOG
I fixed several situations where kernel addresses were still being exposed to unprivileged (i.e. non-CAP_SYSLOG) users, though usually only through odd corner cases. After refactoring how capabilities were being checked for files in /sys and /proc, the kernel modules sections, kprobes, and BPF exposures got fixed. (Though in doing so, I briefly made things much worse before getting it properly fixed. Yikes!)

RISCV W^X detection
Following up on his recent work to enable strict kernel memory protections on RISCV, Zong Li has now added support for CONFIG_DEBUG_WX as seen for other architectures. Any writable and executable memory regions in the kernel (which are lovely targets for attackers) will be loudly noted at boot so they can get corrected.

execve() refactoring continues
Eric W. Biederman continued working on execve() refactoring, including getting rid of the frequently problematic recursion used to locate binary handlers. I used the opportunity to dust off some old binfmt_script regression tests and get them into the kernel selftests.

multiple /proc instances
Alexey Gladkov modernized /proc internals and provided a way to have multiple /proc instances mounted in the same PID namespace. This allows for having multiple views of /proc, with different features enabled. (Including the newly added hidepid=4 and subset=pid mount options.)

set_fs() removal continues
Christoph Hellwig, with Eric W. Biederman, Arnd Bergmann, and others, have been diligently working to entirely remove the kernel's set_fs() interface, which has long been a source of security flaws due to weird confusions about which address space the kernel thought it should be accessing. Beyond things like the lower-level per-architecture signal handling code, this has needed to touch various parts of the ELF loader, and networking code too.

READ_IMPLIES_EXEC is no more for native 64-bit
The READ_IMPLIES_EXEC flag was a work-around for dealing with the addition of non-executable (NX) memory when x86_64 was introduced. It was designed as a way to mark a memory region as "well, since we don't know if this memory region was expected to be executable, we must assume that if we need to read it, we need to be allowed to execute it too". It was designed mostly for stack memory (where trampoline code might live), but it would carry over into all mmap() allocations, which would mean sometimes exposing a large attack surface to an attacker looking to find executable memory. While normally this didn't cause problems on modern systems that correctly marked their ELF sections as NX, there were still some awkward corner-cases. I fixed this by splitting READ_IMPLIES_EXEC from the ELF PT_GNU_STACK marking on x86 and arm/arm64, and declaring that a native 64-bit process would never gain READ_IMPLIES_EXEC on x86_64 and arm64, which matches the behavior of other native 64-bit architectures that correctly didn't ever implement READ_IMPLIES_EXEC in the first place.

array index bounds checking continues
As part of the ongoing work to use modern flexible arrays in the kernel, Gustavo A. R. Silva added the flex_array_size() helper (as a cousin to struct_size()). The zero/one-member into flex array conversions continue with over a hundred commits as we slowly get closer to being able to build with -Warray-bounds.

scnprintf() replacement continues
Chen Zhou joined Takashi Iwai in continuing to replace potentially unsafe uses of sprintf() with scnprintf(). Fixing all of these will make sure the kernel avoids nasty buffer concatenation surprises.

That's it for now! Let me know if there is anything else you think I should mention here. Next up: Linux v5.9.
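
To make the scnprintf() point above concrete, here is a toy model in Python, not kernel code. The helper names (model_snprintf, model_scnprintf, append_all) are made up for this sketch. Plain sprintf() has no bound at all; the bounded snprintf() truncates, but returns the length the full string would have needed, so code that accumulates an offset from its return value can walk past the end of the buffer. scnprintf() returns only what was actually stored.

```python
def model_snprintf(size, text):
    """Model C snprintf(): store at most size-1 characters, but return
    the length the untruncated string would have needed."""
    stored = text[:max(size - 1, 0)]
    return stored, len(text)


def model_scnprintf(size, text):
    """Model the kernel's scnprintf(): same truncation, but return the
    number of characters actually stored."""
    stored = text[:max(size - 1, 0)]
    return stored, len(stored)


def append_all(fmt_fn, size, pieces):
    """Mimic the common C idiom: off += fn(buf + off, size - off, ...)."""
    buf, off = "", 0
    for piece in pieces:
        stored, ret = fmt_fn(size - off, piece)
        buf += stored
        off += ret
    return buf, off


pieces = ["hello", "world", "!"]
# With an 8-byte buffer, the snprintf-style offset ends up beyond the
# buffer; in C, "size - off" would then underflow as an unsigned value.
print(append_all(model_snprintf, 8, pieces))
# The scnprintf-style offset never exceeds size - 1.
print(append_all(model_scnprintf, 8, pieces))
```

Both calls store the same truncated text; only the running offset differs, and that offset is what the next write in real C code would trust.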

© 2021, Kees Cook. This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 License.

3 December 2016

Ben Hutchings: Linux Kernel Summit 2016, part 1

I attended this year's Linux Kernel Summit in Santa Fe, NM, USA and made notes on some of the sessions that were relevant to Debian. LWN also reported many of the discussions. This is the first of two parts of my notes; part 2 is here.

Stable process
Jiri Kosina, in his role as a distribution maintainer, sees too many unsuitable patches being backported - e.g. a fix for a bug that wasn't present or a change that depends on an earlier semantic change so that when cherry-picked it still compiles but isn't quite right. He thinks the current review process is insufficient to catch them.

As an example, a recent fix for a minor information leak (CVE-2016-9178) depended on an earlier change to page fault handling. When backported by itself, it introduced a much more serious security flaw (CVE-2016-9644). This could have been caught very quickly by a system call fuzzer.

Possible solutions: require 'Fixes' field, not just 'Cc: stable'. Deals with 'bug wasn't present', but not semantic changes.

There was some disagreement whether 'Fixes' without 'Cc: stable' should be sufficient for inclusion in stable. Ted Ts'o said he specifically does that in some cases where he thinks backporting is risky. Greg Kroah-Hartman said he takes it as a weaker hint for inclusion in stable.

Is it a good idea to keep 'Cc: stable' given the risk of breaking embargo? On balance, yes, it only happened once.

Sometimes it's hard to know exactly how/when the bug was introduced. Linus doesn't want people to guess and add incorrect 'Fixes' fields. There is still the option to give some explanation and hints for stable maintainers in the commit message. Ideally the upstream developer should provide a test case for the bug.

Is Linus happy?
Linus complained about minor fixes coming later in the release cycle. After rc2, all fixes should either be for new code introduced in the current release cycle or for important bugs.
However, new, production-ready drivers without new infrastructure dependencies are welcome at almost any point in the release cycle.

He was unhappy about some big changes in RDMA, but I'm not sure what those were.

Bugzilla and bug tracking
Laura Abbott started a discussion of bugzilla.kernel.org, talking about subsystems where maintainers ignore it and any responses come from random people giving bad advice. This is a terrible experience for users. Several maintainers are actively opposed to using it, and the email bridge no longer works (or not well?). She no longer recommends Fedora bug submitters to submit reports there.

Are there any alternatives? None were proposed.

Someone asked whether Bugzilla could tell reporters to use email for certain products/components instead of continuing with the bug entry process.

Konstantin Ryabitsev talked about the difficulty of upgrading a customised instance of Bugzilla. Much customisation requires patches which don't apply to the next version (maybe due to limitations of the extension mechanism?). He has had to drop many such patches.

Email is hard to track when a bug is handed over from one maintainer to another. Email archives are very unreliable. Linus: I'll take Bugzilla over mail-archive.

No-one is currently keeping track of bugs across the kernel and making sure they get addressed by an appropriate maintainer. It's (at least) a full-time job but no individual company has a business case for paying for this. Konstantin suggested (I think) that CII might pay for this.

There was some discussion of what information should be included in a bug report. The "Cut here" line in oops messages was said to be a mistake because there are often relevant messages before it. The model of computer is often important. Beyond that, there was not much interest in the automated information gathering that distributions do. Distribution maintainers should curate bugs before forwarding upstream.
There was a request for custom fields per component in Bugzilla. Konstantin says this is doable (possibly after upgrade to version 5); it doesn't require patches.

The future of the Kernel Summit
The kernel community is growing, and the invitation list for the core day is too small to include all the right people for technical subjects. For 2017, the core half-day will have an even smaller invitation list, only ~30 subsystem maintainers that Linus pulls from. The entire technical track will be open (I think).

Kernel Summit 2017 and some mini-summits will be held in Prague alongside Open Source Summit Europe (formerly LinuxCon Europe) and Embedded Linux Conference Europe. There were some complaints that LinuxCon is not that interesting to kernel developers, compared to Linux Plumbers Conference (which followed this year's Kernel Summit). However, the Linux Foundation is apparently soliciting more hardcore technical sessions.

Kernel Summit and Linux Plumbers Conference are quite small, and it's apparently hard to find venues for them in cities that also have major airports. It might be more practical to co-locate them both with Open Source Summit in future.

time_t and 2038
On 32-bit architectures the kernel's representation of real time (time_t etc.) will break in early 2038. Fixing this in a backward-compatible way is a complex problem.

Arnd Bergmann presented the current status of this process. There has not yet been much progress in mainline, but more fixes have been prepared. The changes to struct inode and to input events are proving to be particularly problematic. There is a need to add new system calls, and he intends to add these for all (32-bit) architectures at once.

Copyright retention
James Bottomley talked about how developers can retain copyright on their contributions. It's hard to renegotiate within an existing employment; much easier to do this when preparing to sign a new contract.
Some employers expect you to fill in a document disclosing 'prior inventions' you have worked on. Depending on how it's worded, this may require the employer to negotiate with you again whenever they want you to work on that same software. It's much easier for contractors to retain copyright on their work - customers expect to have a custom agreement and don't expect to get copyright on contractor's software.
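
As an aside on the time_t and 2038 session above, the arithmetic behind the "early 2038" date can be sketched in Python. This assumes a signed 32-bit time_t counting seconds since 1970-01-01 UTC; wrap_int32 is a helper made up for this sketch, not anything from the kernel.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit time_t can hold


def wrap_int32(value):
    """Reinterpret the low 32 bits of value as a signed 32-bit integer."""
    return struct.unpack("<i", struct.pack("<I", value & 0xFFFFFFFF))[0]


# Last moment representable in a signed 32-bit seconds counter.
last_ok = EPOCH + timedelta(seconds=INT32_MAX)

# One second later the counter wraps around to a large negative number,
# which corresponds to a date in 1901.
wrapped = wrap_int32(INT32_MAX + 1)
after_wrap = EPOCH + timedelta(seconds=wrapped)

print(last_ok.isoformat())
print(wrapped)
print(after_wrap.isoformat())
```

The sketch only shows where the limit comes from; the hard part discussed in the session is changing the kernel's interfaces without breaking existing 32-bit userspace.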

22 July 2013

Matthew Garrett: ARM and firmware specifications

Jon Masters, Chief ARM Architect at Red Hat, recently posted a description of his expectations for baseline arm64 servers. The quick summary is that systems should implement UEFI and ACPI, and any more traditional ARM boot mechanisms should be ignored. This is an interesting departure from the status quo in the ARM world, and it's worth thinking about the benefits and drawbacks of this approach.

It's very easy to build a generic kernel for most x86 systems, since the PC platform is fairly well defined even if not terribly well specified. Where system hardware does vary, it's almost always exposed on an enumerable bus (such as PCI or USB) which allows the OS to bind appropriate drivers. Things are different in the ARM world. Even once you're past the point of different SoC vendors requiring different kernel setup code and drivers, you still have to cope with the fact that system vendors can wire these SoCs up very differently. Hardware is often attached via GPIO lines without any means to enumerate them. The end result is that you've traditionally needed a different kernel for every ARM board. This is viable if you're selling the OS and hardware as a single product, but less viable if there's any desire to run a generic OS on the hardware.

The solution that's been adopted for this in the Linux world is called Device Tree. Device Tree actually has significant history, having been used as the device descriptor format in Open Firmware. Since there was already support for it in the Linux kernel, adapting it for use in ARM devices was straightforward. Device Tree aware devices can pass a descriptor blob to the kernel at startup[1], and devices without that knowledge can have a blob built into the kernel.

So, if this problem is already solved, why the push to move to UEFI and ACPI? This push didn't actually originate in the Linux world - Microsoft mandate that Windows RT devices implement UEFI and ACPI, and were they to launch a Windows ARM server product would probably carry that over. That makes sense for Microsoft, since recent versions of Windows have been x86 only and so have grown all kinds of support for ACPI and UEFI. Supporting Device Tree would require Microsoft to rewrite large parts of Windows, whereas mandating UEFI and ACPI allowed them to reuse most of their existing Windows boot and driver code. As a result, largely at Microsoft's behest, ACPI 5 has grown a range of additional features for describing things like GPIO pinouts and I2C connections. Whatever your weird device layout, you can probably express it via ACPI.

This argument works less well for Linux. Linux already supports Device Tree, whereas it currently doesn't support ACPI or UEFI on ARM[2]. Hardware vendors are already used to working with Device Tree. Moving to UEFI and ACPI has the potential to uncover a range of exciting new kernel issues and vendor bugs. It's not obviously an engineering win.

So how about users? There's an argument that since server vendors are now mostly shipping ACPI and UEFI systems, having ARM support these technologies makes it easier for customers to replace x86 systems with ARM systems. This really doesn't fly for ACPI, which is entirely invisible to the user. There are no standard ACPI entry points for system configuration, and the bits of ACPI that are generically useful (such as configuring system wakeup times) are already abstracted away to a standard interface by the kernel. It's somewhat more compelling for UEFI. UEFI supports a platform-independent bytecode language (EFI Byte Code, or EBC), which means that customers can write their own system management utilities, build them for EBC and then deploy them to their servers without caring about whether they're x86 or ARM. Want a bootloader that'll hit an internal HTTP server in order to determine which system image to deploy, and which works on both x86 and ARM? Straightforward.

Arnd Bergmann has an interesting counterargument. In a nutshell, ARM servers aren't currently aiming for the same market as x86 servers, and as a result customers are unlikely to gain any significant benefit from shared functionality between the two.

So if there's no real benefit to users, and if there's no benefit to kernel developers, what's the point? The main one that springs to mind is that there is a benefit to distributions. Moving to UEFI means that there's a standard mechanism for distributions to interact with the firmware and configure the bootloader. The traditional ARM approach has been for vendors to ship their own version of u-boot. If that's in flash then it's not much of a problem[3], but if it's on disk then you have to ship a range of different bootloaders and know which one to install (and let's not even talk about initial bootstrapping).

This seems like the most compelling argument. UEFI provides a genuine benefit for distributions, and long term it probably provides some benefit to customers. The question is whether that benefit is worth the flux. The same distribution benefit could be gained by simply mandating a minimum set of u-boot functionality, which would seem much more straightforward. The customer benefit is currently unclear.

In the end it'll probably be a market decision. If Red Hat produce an ARM product that has these requirements, and if Suse produce an ARM product that will work with u-boot and Device Tree, it'll be up to vendors to decide whether the additional work to support UEFI/ACPI is worth it in order to be able to sell to customers who want Red Hat. I expect that large vendors like HP and Dell will probably do it, but the smaller ones may not. The customer demand issue is also going to be unclear until we learn whether using UEFI is something that customers actually care about, rather than a theoretical benefit.

Overall, I'm on the fence as to whether a UEFI requirement is going to stick, and I suspect that the ACPI requirement is tilting at windmills. There's nothing stopping vendors from providing a Device Tree blob from UEFI, and I can't think of any benefits they gain from using ACPI instead. Vendor interest in the generic parts of the ACPI spec has been tepid even in the x86 world (the vast majority of ACPI spec updates come from Microsoft and Intel, not any of the system vendors), and I don't see that changing with the introduction of a range of ARM vendors who are already happy with Device Tree.

We'll see. Linux is going to need to gain the support for UEFI and ACPI on ARM in any case, since there's already hardware shipping in that configuration. But with ARM vendors still getting to grips with Device Tree, forcing them to go through another change in how they do things is going to be hard work. Red Hat may be successful in enforcing these requirements at the cost of some vendor unhappiness, or Red Hat may find that their product doesn't boot on most of the available hardware. It's an aggressive gamble, and while it'll be interesting to see how it plays out, I'm not that optimistic.

[1] The blob could be pulled from the firmware, but it's not uncommon for it to be built into u-boot instead. This does mean that you have a device-specific u-boot even if you have a generic kernel, but that's typically true anyway.
[2] Patches have been posted for ARM UEFI support. They're not mergeable in their current form, but they should be in the near future. ACPI support is in development.
[3] Although not all u-boots are created equal - some vendors ship versions that will only boot off FAT, some vendors ship versions that will only boot off ext2. Having to special case this stuff in your installer is a pain.


24 April 2009

Jon Dowland: my first attempt at hacking on Linux: a story

I have a little job that runs on my web server and tells me what URIs people have tried to fetch over the last 24 hours which were not successful. One of these URIs was for a page that used to contain my first ever Linux device driver. That device driver is now obsolete and so there was little point in tracking down the code and putting it back in place, but I did think there was a story to tell, so today I wrote down what happened with this driver at the URI. Thanks to the magic of ikiwiki's "inline" directive, here it is in the log:

This is a story about my first real attempt to contribute to the Linux kernel.

The Davicom DM9601 is a cheap USB NIC that I picked up on eBay at some point in late 2006. Surprisingly, at the time, there was no driver for it in the Linux kernel. I'd had some experience hacking on the kernel whilst working for IBM and I've always wanted to get more involved, so I thought this would be a good opportunity.

I searched around and managed to find a vendor-written driver for the 2.4.x kernel series, but nothing newer. I felt that the best approach would be to start porting this driver to 2.6, improving it along the way. I got to work, and after several hours I had something that would compile against 2.6 series headers, at least, so I put my progress online and wrote a short log post about it (dm9601). A really invaluable resource whilst doing this was LWN.net's "Porting device drivers to the 2.6 kernel".

Once I had the code up there, kernel hacker Arnd Bergmann got in touch with me. He said, quite frankly, that my driver had a long way to go, but that he would be willing to help mentor me and guide me in improving it so that it would make it into mainline. This was fantastic news: this is exactly what people need when trying to get up to speed with the customs of something as complex as the kernel development community. Arnd sent me a long list of things I needed to fix.
Whilst I had my code online I had several other people get in touch with me and provide things like device IDs for their NICs that they needed to add in order to get it to work for their hardware.

Several weeks later, after implementing a good chunk of Arnd's suggestions, I had a driver that worked, at least a little bit, with 2.6, so I got back in touch with Arnd. Bad news, for me at least: Peter Korsgaard, another kernel hacker, had implemented a dm9601 driver from scratch over a weekend. Apparently he was at a friend's house, his friend had this device, he noticed there wasn't a driver in the kernel already and sat down to write it as a fun project. From scratch. In one weekend.

This was a bit of a blow. I couldn't imagine having the skill and knowledge required to be able to pump out a driver like that, but I knew the first step on the road was what I was doing, and that road was now closed. I couldn't really blame Peter, I could only really feel quite jealous at his prowess, so that was the end of that.

His driver was much leaner, much tidier, fitted the kernel programming style properly, and was thus accepted. However, for my hardware at least, it didn't actually work, and he didn't have access to the device after that one weekend, so he wasn't in a position to fix it. This situation lasted about 4 months from 2007-02-23, when Peter's driver was accepted, until about 2007-06-27 with this patch.

From this whole experience, I did manage to get one small patch into the kernel: one which added the device IDs for my NIC and those of the people who had got in touch with me.