Kees Cook: non-executable kernel memory progress
The Linux kernel attempts to protect portions of its memory from unexpected modification (through potential future exploits) by setting areas read-only where the compiler has allowed it (CONFIG_DEBUG_RODATA). This, combined with marking function pointer tables const, reduces the number of easily writable kernel memory targets for attackers.
However, modules (which account for much of the kernel's code) were not handled, and remained read-write regardless of compiler markings. In 2.6.38, thanks to the efforts of many people (especially Siarhei Liakh and Matthieu Castet), CONFIG_DEBUG_SET_MODULE_RONX was created (and CONFIG_DEBUG_RODATA expanded).
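To see whether a given kernel was built with these two options, something like the following sketch can help. The helper name check_kconfig is made up for illustration, and the /boot/config-$(uname -r) path is an assumption: Debian and Ubuntu install the kernel config there, but other systems may not.

```shell
#!/bin/sh
# Report whether each named option is enabled (=y) in a kernel config file.
# Usage: check_kconfig CONFIG_FILE OPTION...
# (check_kconfig is a hypothetical helper, not an existing tool.)
check_kconfig() {
    kcfg_file=$1
    shift
    for opt in "$@"; do
        if grep -q "^${opt}=y\$" "$kcfg_file"; then
            echo "$opt: enabled"
        else
            echo "$opt: not set"
        fi
    done
}

# Assumption: the running kernel's config lives here (Debian/Ubuntu layout).
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    check_kconfig "$cfg" CONFIG_DEBUG_RODATA CONFIG_DEBUG_SET_MODULE_RONX
fi
```

If the config file is not readable at that path, the script prints nothing; pass a different file to check_kconfig in that case.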
To visualize the effects, I patched Arjan van de Ven's arch/x86/mm/dump_pagetables.c to be a loadable module so I could look at /sys/kernel/debug/kernel_page_tables without needing to rebuild my kernel with CONFIG_X86_PTDUMP.
Comparing Lucid (2.6.32), Maverick (2.6.35), and Natty (2.6.38), the effects of the RO/NX improvements are easy to see, especially in the Modules section, which has no NX markings at all before 2.6.38:

    lucid-amd64# awk '/Modules/,/End Modules/' /sys/kernel/debug/kernel_page_tables | grep NX | wc -l
    0
    maverick-amd64# awk '/Modules/,/End Modules/' /sys/kernel/debug/kernel_page_tables | grep NX | wc -l
    0
    natty-amd64# awk '/Modules/,/End Modules/' /sys/kernel/debug/kernel_page_tables | grep NX | wc -l
    76

2.6.38's memory region is much more granular, since each module has been chopped up for the various segment permissions:

    lucid-amd64# awk '/Modules/,/End Modules/' /sys/kernel/debug/kernel_page_tables | wc -l
    53
    maverick-amd64# awk '/Modules/,/End Modules/' /sys/kernel/debug/kernel_page_tables | wc -l
    67
    natty-amd64# awk '/Modules/,/End Modules/' /sys/kernel/debug/kernel_page_tables | wc -l
    155

For example, here's the large sunrpc module. RW is read-write, ro is read-only, x is executable, and NX is non-executable:

    maverick-amd64# awk '/^'$(awk '/^sunrpc/ {print $NF}' /proc/modules)'/','!/GLB/' /sys/kernel/debug/kernel_page_tables
    0xffffffffa005d000-0xffffffffa0096000   228K   RW   GLB x    pte
    0xffffffffa0096000-0xffffffffa0098000     8K                 pte

    natty-amd64# awk '/^'$(awk '/^sunrpc/ {print $NF}' /proc/modules)'/','!/GLB/' /sys/kernel/debug/kernel_page_tables
    0xffffffffa005d000-0xffffffffa007a000   116K   ro   GLB x    pte
    0xffffffffa007a000-0xffffffffa0083000    36K   ro   GLB NX   pte
    0xffffffffa0083000-0xffffffffa0097000    80K   RW   GLB NX   pte
    0xffffffffa0097000-0xffffffffa0099000     8K                 pte

The latter looks a whole lot more like a proper ELF (text segment is read-only and executable, rodata segment is read-only and non-executable, and data segment is read-write and non-executable). Just another reason to make sure you're using your CPU's NX bit (via 64bit or 32bit-PAE kernels)! (And no, PAE is not slower in any meaningful way.)
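
To get the same text/rodata/data picture for a whole region instead of one module, the dump can be tallied by permission pair. This is only a sketch: perm_summary is a hypothetical helper name, and it assumes the line layout shown in the listings here (address range, size, flags, page-table level), with sizes suffixed K, M, or G. The sample input is the Natty sunrpc listing from this post.

```shell
#!/bin/sh
# Sum mapped sizes per permission pair (ro/RW plus x/NX) from
# kernel_page_tables-style text on stdin. Output is in KiB.
# (perm_summary is a hypothetical helper, not an existing tool.)
perm_summary() {
    awk '
    # Only consider lines that start with an address range.
    /^0x/ {
        rw = ""; ex = ""
        for (i = 3; i <= NF; i++) {
            if ($i == "RW" || $i == "ro") rw = $i
            if ($i == "x"  || $i == "NX") ex = $i
        }
        # Skip unmapped holes, which carry no permission flags.
        if (rw == "" || ex == "") next
        size = $2 + 0                   # numeric prefix, e.g. 116 from "116K"
        unit = substr($2, length($2))   # trailing unit letter
        if (unit == "M") size *= 1024
        else if (unit == "G") size *= 1024 * 1024
        total[rw " " ex] += size
    }
    END {
        for (k in total) printf "%s %dK\n", k, total[k]
    }' | LC_ALL=C sort
}

# Sample input: the Natty sunrpc lines quoted in this post.
perm_summary <<'EOF'
0xffffffffa005d000-0xffffffffa007a000 116K ro GLB x pte
0xffffffffa007a000-0xffffffffa0083000 36K ro GLB NX pte
0xffffffffa0083000-0xffffffffa0097000 80K RW GLB NX pte
0xffffffffa0097000-0xffffffffa0099000 8K pte
EOF
```

On a live system the same function could be fed with the Modules range used earlier, for example awk '/Modules/,/End Modules/' /sys/kernel/debug/kernel_page_tables | perm_summary, run as root with debugfs mounted. A kernel without the module RO/NX work would be expected to show only an "RW x" total there.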