Wouter Verhelst: NBD: Write Zeroes and Rotational
The NBD protocol has grown a number of new features over the years.
Unfortunately, some of those features are not (yet?) supported by the
Linux kernel.

I suggested a few times over the years that the maintainer of the NBD
driver in the kernel, Josef Bacik, take a look at these features, but he
hasn't done so; presumably he has other priorities. As with anything in
the open source world, if you want it done, you must do it yourself.

I had been considering, off and on, working on the kernel driver so that
I could implement these new features, but I never really got anywhere.

A few months ago, however, Christoph Hellwig posted a patch set that
converted a number of block device drivers in the Linux kernel to a new
type of API. Since the NBD mailing list is listed in the kernel's
MAINTAINERS file, this patch series was crossposted to the NBD mailing
list too, and when I noticed that it explicitly disabled the
"rotational" flag on the NBD device, I suggested to Christoph that
perhaps "we" (meaning "he") might want to base the decision on whether
the device is rotational on whether the NBD server signals, through the
flag that exists for that very purpose, that the device is rotational.

To which he replied, "Can you send a patch".

That sent me down the rabbit hole, and now, for the first time in my
20+ years as a C programmer who uses Linux exclusively, I got a patch
merged into the Linux kernel... twice.

So, what do these things do?

The first patch adds support for the ROTATIONAL flag. If the NBD server
says that the device is rotational, the kernel will treat it as such,
and the elevator algorithm will be used to optimize accesses to the
device. For the reference implementation (nbd-server), you can enable
this by adding a line "rotational = true" to the section of the config
file for the export where you want it to be used.

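On the wire, the ROTATIONAL flag is a single bit in the 16-bit transmission flags that the server sends during negotiation, next to other bits such as the one advertising WRITE_ZEROES support. The C sketch below decodes those flags into a small struct. The struct and function names are hypothetical, and the bit positions follow the NBD protocol document as best understood here, so verify them against the current spec; this is an illustration, not the kernel driver's code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Transmission flag bits from the NBD protocol document
 * (verify against the current spec before relying on them). */
#define NBD_FLAG_HAS_FLAGS         (1u << 0)
#define NBD_FLAG_READ_ONLY         (1u << 1)
#define NBD_FLAG_ROTATIONAL        (1u << 4)
#define NBD_FLAG_SEND_TRIM         (1u << 5)
#define NBD_FLAG_SEND_WRITE_ZEROES (1u << 6)

/* Hypothetical summary of what a client learns from the flags. */
struct export_features {
    bool read_only;
    bool rotational;       /* server hints the medium seeks: use an elevator */
    bool can_trim;         /* NBD_CMD_TRIM (discard) may be sent */
    bool can_write_zeroes; /* NBD_CMD_WRITE_ZEROES may be sent */
};

/* Hypothetical helper: decode the transmission flags sent by the server. */
struct export_features parse_transmission_flags(uint16_t flags)
{
    struct export_features f = {
        .read_only        = (flags & NBD_FLAG_READ_ONLY) != 0,
        .rotational       = (flags & NBD_FLAG_ROTATIONAL) != 0,
        .can_trim         = (flags & NBD_FLAG_SEND_TRIM) != 0,
        .can_write_zeroes = (flags & NBD_FLAG_SEND_WRITE_ZEROES) != 0,
    };
    return f;
}
```

A client built along these lines would only mark its block device as rotational when `parse_transmission_flags(flags).rotational` is true, and would only issue WRITE_ZEROES requests when `can_write_zeroes` is set.
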
It's unlikely that this will be of much benefit in most cases: most
nbd-server installations export a file on a filesystem, so the elevator
algorithm already runs on the server side, and then it doesn't matter
whether the device has the rotational flag set. But it's there in case
you wish to use it.

The second set of patches adds support for the WRITE_ZEROES command.

Most devices these days allow you to tell them "please write N zeroes
starting at this offset", which is a lot more efficient than sending
over a buffer of N zeroes and having the device DMA-copy buffers and so
on, just to end up with zeroes.

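In NBD terms, the saving is easy to quantify. Assuming the classic request format (without the newer extended-headers extension), every request starts with a 28-byte header: magic, command flags, type, cookie, offset and length, all big-endian. An NBD_CMD_WRITE is followed by `length` bytes of data, while NBD_CMD_WRITE_ZEROES carries no payload at all. The sketch below encodes such a header; the helper names are hypothetical and the constants follow the protocol document, so check them against the spec before reuse.

```c
#include <stddef.h>
#include <stdint.h>

#define NBD_REQUEST_MAGIC    0x25609513u
#define NBD_REQUEST_HDR_SIZE 28u /* 4 + 2 + 2 + 8 + 8 + 4 bytes */

#define NBD_CMD_WRITE        1u
#define NBD_CMD_WRITE_ZEROES 6u

/* Store v big-endian (network byte order) in nbytes bytes at p. */
static void put_be(uint8_t *p, uint64_t v, int nbytes)
{
    for (int i = nbytes - 1; i >= 0; i--) {
        p[i] = (uint8_t)(v & 0xff);
        v >>= 8;
    }
}

/* Hypothetical helper: encode one request header; returns its size. */
size_t nbd_encode_request(uint8_t out[NBD_REQUEST_HDR_SIZE],
                          uint16_t cmd_flags, uint16_t type,
                          uint64_t cookie, uint64_t offset, uint32_t length)
{
    put_be(out + 0, NBD_REQUEST_MAGIC, 4);
    put_be(out + 4, cmd_flags, 2);
    put_be(out + 6, type, 2);
    put_be(out + 8, cookie, 8);
    put_be(out + 16, offset, 8);
    put_be(out + 24, length, 4);
    return NBD_REQUEST_HDR_SIZE;
}

/* Bytes sent for one request: of these two commands, only NBD_CMD_WRITE
 * carries `length` bytes of payload after the header. */
uint64_t nbd_request_wire_bytes(uint16_t type, uint32_t length)
{
    return NBD_REQUEST_HDR_SIZE +
           (type == NBD_CMD_WRITE ? (uint64_t)length : 0);
}
```

Zeroing 1 MiB this way costs 28 bytes on the wire instead of 28 bytes plus 1 MiB of zeroes.
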
The NBD protocol has supported its own WRITE_ZEROES command for a while
now, and hooking it up was reasonably simple in the end. The only
problem is that the protocol expresses lengths in bytes, whereas the
kernel expresses them in blocks. It took me a few tries to get that
right, and then I also fixed up the handling of discard messages, which
required the same conversion.
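The unit mismatch can be shown in isolation. Assuming the kernel-side unit in question is the block layer's 512-byte sector (the kernel's `SECTOR_SHIFT` is 9), and given that the length field of an NBD request is 32 bits wide and counts bytes, the conversion and the resulting per-request upper bound look like this. The helper names are hypothetical, `SECTOR_SHIFT` is redefined locally to mirror the kernel's, and this is a sketch rather than the code that was merged.

```c
#include <assert.h>
#include <stdint.h>

/* Mirrors the kernel's SECTOR_SHIFT: block-layer sectors are 512 bytes. */
#define SECTOR_SHIFT 9

/* NBD's request length is a 32-bit byte count, so the largest request
 * expressed in whole sectors is UINT32_MAX >> 9 (= 8388607). */
#define NBD_MAX_REQUEST_SECTORS (UINT32_MAX >> SECTOR_SHIFT)

/* Hypothetical helper: sector count -> byte length for an NBD request. */
uint32_t nbd_sectors_to_bytes(uint32_t nr_sectors)
{
    /* Larger counts would overflow the 32-bit length field. */
    assert(nr_sectors <= NBD_MAX_REQUEST_SECTORS);
    return nr_sectors << SECTOR_SHIFT;
}

/* Hypothetical helper: byte length -> whole sectors (rounds down). */
uint32_t nbd_bytes_to_sectors(uint32_t bytes)
{
    return bytes >> SECTOR_SHIFT;
}
```

Getting the direction wrong, that is passing a byte count where sectors are expected or vice versa, is off by a factor of 512, which matters for any limit the kernel advertises in sectors, such as a maximum discard or write-zeroes size.
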