OverloadedRecordFields
.
At that point, there was no secretary role yet, so how did I become one? It seems that in February 2017 I started to clean up and refine the process documentation, fixing bugs in the process (like requiring authors to set GitHub labels when they don't even have permissions to do that). This in particular meant that someone from the committee had to manually handle submissions and so on, and by the aforementioned principle that at every step there ought to be exactly one person in charge, the role of a secretary followed naturally. In the email in which I described that role I wrote:
"Simon already shoved me towards picking up the secretary hat, to reduce load on Ben."
So when I merged the updated process documentation, I already listed myself as "secretary". It wasn't just Simon's shoving that put me into the role, though. I dug out my original self-nomination email to Ben, and among other things I wrote:
"I also hope that there is going to be clear responsibilities and a clear workflow among the committee. E.g. someone (possibly rotating), maybe called the secretary, who is in charge of having an initial look at proposals and then assigning it to a member who shepherds the proposal."
So it is hardly a surprise that I became secretary, when it was dear to my heart to have a smooth continuous process here. I am rather content with the result: these three ingredients (single secretary, per-proposal shepherds, silence-is-consent) helped the committee to be effective throughout its existence, even as every once in a while individual members dropped out.
BlockArguments
, do
notation with ( foo)
expressions and Unicode), I'm still around, hosting the Haskell Interlude Podcast, writing on this blog and hanging out at ZuriHac etc.
"Well, I mean they're aliens, so I don't know if I wanna tell them much."
(Parents laughing in the background.) Let's assume they're friendly aliens.
"Well, I would say you can look anything up and play different games. And there are alien games. But mostly the enemies are aliens which you might be a little offended by. And you can get work done, if you needed to spy on humans. There's cameras, you can film yourself, yeah. And you can text people and call people who are far away."
And what would be in a drawing that would explain the internet? And here's what he explains about his drawing:
"First, I would draw what I see when you open a new tab, Google."
On the right side of the drawing we see something like Twitch.
"I don't wanna offend the aliens, but you can film yourself playing a game, so here is the alien and he's playing a game.
And then you can ask questions like: 'How did aliens come to the Earth?' And the answer will be here (below). And there'll be different websites that you can click on.
And you can also look up 'Who won the alien contest?' And that would be Usmushgagu, and that guy won the alien contest."
Do you think the information about alien intergalactic football is already on the internet?
"Yeah! That's how fast the internet is."
On the bottom of the drawing we see an iPhone and an instant messaging software.
"There's also a device called an iPhone and with it you can text your friends. So here's the alien asking: 'How was ur day?' and the friend might answer 'IDK' [I don't know]."
Imagine that a wise and friendly dragon could teach you one thing about the internet that you've always wanted to know. What would you ask the dragon to teach you about?
"Is there a way you don't have to pay for any channels or subscriptions and you can get through any firewall?"
Imagine you could make the internet better for everyone. What would you do first?
"Well you wouldn't have to pay for it [paywalls]."
Can you describe what happens between your device and a website when you visit a website?
"Well, it takes 0.025 seconds. […] It's connecting."
Wow, that's indeed fast! We were not able to obtain more details about what exactly that fast thing is that's happening:
"My dad knows everything."
The kid has a laptop and a mobile phone, both with parental control; they don't think that the controlling is fair. This kid uses the internet foremostly for listening to music and watching prank channels on YouTube, but also to work with Purple Mash (a teaching platform for the computing curriculum used at their school) and to find 3D printing models (that they ask their father to print with them because they did not manage to use the printer by themselves yet). Interestingly, and very differently from the non-tech-parent kids, this kid insists on using Firefox and Signal. The latter is not only used by their dad to tell them to come downstairs for dinner, but also to call their grandmother. This kid also shops online, with the help of the father who does the actual shopping for them using money that the kid earned by reading books. If you would need to explain to an alien who has landed on Earth what the internet is, what would you tell them?
"The internet is something where you search, for example, you can look for music. You can also watch videos from around the world, and you can program stuff."
Like most of the kids interviewed, this kid uses the internet mostly for media consumption, but with the difference that they also engage with technology by way of programming using Purple Mash. In their drawing we see a YouTube prank channel on a screen, an external trackpad on the right (likely it's not a touch screen), and headphones. Notice how there is no keyboard, or maybe it's folded away. If you could ask a nice and friendly dragon anything you'd like to learn about the internet, what would it be?
"How do I shutdown my dad's computer forever?"
And what is it that he would do to improve the internet for everyone? Contrary to the kid living in the US, they think that
"It takes too much time to load stuff!"
I wonder if this kid experiences the internet as being slow because they use the mobile network or because their connection somehow gets throttled as a way to control media consumption, or if the German internet infrastructure is just so much worse in certain regions. If you could improve the internet for everyone, what would you do first?
"I'd make a new Firefox app that loads the internet much faster."
"[The internet] is something that you can [use to] see someone who is far away, so that you don't need to take time to get to them."
Now, that's a great explanation, the internet providing the possibility for communication over a distance :) If she could ask a friendly dragon something she always wanted to know, she'd ask how to make her phone come alive:
"that it can talk to you, that it can see you, that it can smile and has eyes. It's like a new family member, you can talk to it."
Sounds a bit like Siri, Alexa, or Furby, doesn't it? If you could improve the internet for everyone, what would you do first? She'd have the phone be able to decide over her free time, her phone time. That would make the world better, not for the kids, but certainly for the parents.
"An invisible world. A virtual world. But there's also the darknet."
He told me he always watches that German show on public TV for kids that explains stuff: Checker Tobi. (In 2014, Checker Tobi actually produced an episode about the internet, which I'd criticize for having only male characters, except for one female character: a secretary, Google, a nice and friendly woman guiding the way through the huge library that's the internet.) This kid was the only one interviewed who managed to actually explain something about the internet, or rather about the hypertextual structure of the web. When I asked him to draw the internet, he made a drawing of a pin board. He explained:
"Many items are attached to the pin board, and on the top left corner there's a computer, for example with YouTube, and one can navigate like that between all the items, and start again from the beginning when done."
When I asked if he knew what actually happens between the device and a website he visits, he put forth the hypothesis of the existence of some kind of
"Waves, internet waves - all this stuff somehow needs to be transmitted."
What he'd like to learn:
"How to get into the darknet? How do you become a Whitehat? I've heard these words on the internet, the internet makes me clever."
And what would he change on the internet if he could?
"I want that right wing extreme stuff is not accessible anymore, or at least, that it rains turds (Kackwürste) whenever people watch such stuff. Or that people are always told: This video is scum."
I suspect that his father has been talking with him about these things, and maybe these are also subjects he heard about when listening to punk music (he told me he does), or browsing YouTube.
Multi-Arch: same file loss. It was found that the proposed mitigation
for ineffective diversions does not work as expected. Trying to fix it
up resulted in more problems, some of which remain unsolved as of this
writing. Initial work on moving shared libraries in the essential set
has been done.
Meanwhile, the wider Debian community worked on fixing all known
Multi-Arch: same file loss scenarios. This work is now being driven by
Christian Hofstaedtler and during the Mini DebConf in Cambridge, Chris Boot,
Étienne Mollier, Miguel Landaeta, Samuel Henrique, and Utkarsh Gupta sent
the other half of the necessary patches.
/dev/fd/N
from fuse3
to fuse2
.piuparts
amd64, arm64, armhf, i386, ppc64el, riscv64 and s390 for Debian trixie, unstable and experimental, this is only around 500GB, i.e. less than 1%. Although the new service is not yet ready for usage, it has already provided a promising outlook in this regard. More information is available on https://rebuilder-snapshot.debian.net and we hope that this service becomes usable in the coming weeks.
The adjacent picture shows a sticky note authored by Jan-Benedict Glaw at the summit in Hamburg, confirming Holger Levsen's theory that rebuilding all Debian packages needs a very small subset of packages. The text states that 69,200 packages (in Debian sid) list 24,850 packages in their .buildinfo files, in 8,0200 variations. This little piece of paper was the beginning of rebuilder-snapshot and is a direct outcome of the summit!
The Reproducible Builds team would like to thank our event sponsors who include Mullvad VPN, openSUSE, Debian, Software Freedom Conservancy, Allotropia and Aspiration Tech.
"[…] introduce the concepts of Reproducible Builds, including best practices for developing and releasing software, the tools available to help diagnose issues, and touch on progress towards solving decades-old deeply pervasive fundamental security issues. Learn how to verify and demonstrate trust, rather than simply hoping everything is OK!"
Germane to the contents of the talk, the slides for Vagrant's talk can be built reproducibly, resulting in a PDF with a SHA1 of cfde2f8a0b7e6ec9b85377eeac0661d728b70f34 when built on Debian bookworm and c21fab273232c550ce822c4b0d9988e6c49aa2c3 on Debian sid at the time of writing.
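Checking a locally built file against a published digest like the ones above can be sketched in a few lines of Python. This is only an illustration, not part of the talk: the function names and any file path are hypothetical, and only the standard library `hashlib` module is used.

```python
import hashlib


def sha1_of_file(path, chunk_size=65536):
    """Return the hex SHA1 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops at the first empty read (end of file).
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """True when the local file hashes to the published digest."""
    return sha1_of_file(path) == expected_hex.strip().lower()
```

Because the two distributions currently produce different PDFs, the digest to compare against has to be the one published for the release the slides were built on.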
[ ] today I hold in my hands the first two bit-identical LibreOffice rpm packages. And this is the success I wanted to share with you all today [and] it makes me feel as if we can solve anything.
esp32c3
microcontroller firmware reproducible with Rust, repro-env and Arch Linux:
I chose the esp32c3 [board] because it has good Rust support from the esp-rs project, and you can get a dev board for about 6-8 . To document my build environment I used repro-env together with Arch Linux because its archive is very reliable and contains all the different Rust development tools I needed.
dump
command and hopes that someone may be able to help.
amd64, arm64, i386 and armhf architectures, data from the Reproducible Builds testing framework is collected by this migration software even though, at the time of writing, it neither causes migration bonuses nor blocks migration. Indeed, the information-only results are visible on Britney's excuses as well as on individual packages' pages on tracker.debian.org.
.buildinfo
files
Back in 2017, Steve Langasek filed a bug against Ubuntu's Launchpad code hosting platform to report that .changes files (artifacts of building Ubuntu and Debian packages) reference .buildinfo files that aren't actually exposed by Launchpad itself. This was causing issues when attempting to process .changes files with tools such as Lintian. However, it was noticed last month that, in early August of this year, Simon Quigley had resolved this issue, and .buildinfo files are now available from the Launchpad system.
composer.lock
file, ensuring total reproducibility of the shipped binary file. Further details and the discussion that went into their particular implementation can be found on the associated GitHub pull request.
In addition, the presentation "Leveraging Nix in the PHP ecosystem" was given in late October at the PHP International Conference in Munich by Pol Dellaiera. While the video replay is not yet available, the (reproducible) presentation slides and speaker notes are available.
7z
. [ ]RequiredToolNotFound
import. [ ]252
to Debian unstable. [ ]SOURCE_DATE_EPOCH
and CMake, added iomart (née Bytemark) and DigitalOcean to our sponsors page and dropped an unnecessary link on some horizontal navigation buttons.
amber-cli (date-related issue)
bin86 (FTBFS-2038)
buildah (timestamp)
colord (CPU)
google-noto-fonts (file modification issue)
grub2 (directory-related metadata)
guile-fibers (parallelism issue)
guile-newt (parallelism issue)
gutenprint (embedded date/hostname)
hub (random build path)
ipxe (nondeterministic behaviour)
joker / joker
kopete (undefined behaviour)
kraft (embedded hostname)
libcamera (signature)
libguestfs (embeds build host file)
llvm (toolchain/Rust-related issue)
nfdump (date-related issue)
ovmf (unknown cause)
quazip (missing fonts)
rdflib (nondeterministic behaviour)
rpm (toolchain)
tigervnc (embedded an RSA signature)
whatsie (date-related issue)
xen (time-related issue)
policycoreutils (sort-related issue)

python-ansible-pygments
bidict
meson
radsecproxy
taffybar
php-doc
pelican
maildir-utils
openmrac-data
vectorscan
Priority: important in a new package set.
pool_buildinfos script to be re-run for a specific year.
osuosl4 node along with lynxis.
amd64 Ionos builders from 48 GiB to 64 GiB; thanks IONOS!
arm64 architecture workers from 24 to 16 in order to improve stability, reduce the workers for amd64 from 32 to 28 and, for i386, reduce from 12 down to 8.
cache_dir size setting to 16 GiB.
systemd-oomd as it unfortunately kills sshd.
debootstrap from backports when commissioning nodes.
live_build_debian_stretch_gnome, debsums-tests_buster and debsums-tests_buster jobs to the zombie list.
jekyll build with the --watch argument when building the Reproducible Builds website.
rc.local's Bash syntax so it can actually run, commenting away some file cleanup code that is (potentially) deleting too much and fixing the html_brekages page for Debian package builds. Finally, diagnosed and submitted a patch to add an AddEncoding gzip .gz line to the tests.reproducible-builds.org Apache configuration so that Gzip files aren't re-compressed as Gzip, which some clients can't deal with (as well as being a waste of time).
#reproducible-builds on irc.oftc.net.
rb-general@lists.reproducible-builds.org
Given a typical install of 3 generic kernel ABIs in the default configuration on a regular-sized VM (2 CPU cores, 8GB of RAM), the following metrics are achieved in Ubuntu 23.10 versus Ubuntu 22.04 LTS:
2x less disk space used (1,417MB vs 2,940MB, including initrd)
3x less peak RAM usage for the initrd boot (68MB vs 204MB)
0.5x increase in download size (949MB vs 600MB)
2.5x faster initrd generation (4.5s vs 11.3s)
approximately the same total time (103s vs 98s, hardware dependent)
For minimal cloud images that do not install either linux-firmware or modules extra the numbers are:
1.3x less disk space used (548MB vs 742MB)
2.2x less peak RAM usage for initrd boot (27MB vs 62MB)
0.4x increase in download size (207MB vs 146MB)
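The ratios quoted in both lists can be recomputed directly from the raw figures. The snippet below is a small sketch of that arithmetic; the helper names `times_less` and `fractional_increase` are made up for illustration and only the numbers come from the lists above.

```python
def times_less(old, new):
    """How many times smaller (or faster) `new` is compared to `old`."""
    return old / new


def fractional_increase(old, new):
    """Growth of `new` over `old`, as a fraction of `old`."""
    return (new - old) / old


# (Ubuntu 22.04 LTS value, Ubuntu 23.10 value), units as in the lists above.
generic_disk_mb = (2940, 1417)
generic_initrd_ram_mb = (204, 68)
generic_download_mb = (600, 949)
generic_initrd_gen_s = (11.3, 4.5)
minimal_disk_mb = (742, 548)
minimal_initrd_ram_mb = (62, 27)
minimal_download_mb = (146, 207)

if __name__ == "__main__":
    print(f"generic disk: {times_less(*generic_disk_mb):.2f}x less")
    print(f"generic initrd RAM: {times_less(*generic_initrd_ram_mb):.2f}x less")
    print(f"generic download: {fractional_increase(*generic_download_mb):.2f}x increase")
    print(f"generic initrd generation: {times_less(*generic_initrd_gen_s):.2f}x faster")
    print(f"minimal disk: {times_less(*minimal_disk_mb):.2f}x less")
    print(f"minimal initrd RAM: {times_less(*minimal_initrd_ram_mb):.2f}x less")
    print(f"minimal download: {fractional_increase(*minimal_download_mb):.2f}x increase")
```

Run this way, the generic download increase comes out slightly above the rounded 0.5x given in the list (949MB is about 58% more than 600MB); the other figures match the quoted ratios to within rounding.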
Hopefully, the compromise of download size, relative to the disk space & initrd savings is a win for the majority of platforms and use cases. For users on extremely expensive and metered connections, the likely best saving is to receive air-gapped updates or skip updates.
This was achieved by precompressing kernel modules & firmware files with the maximum level of Zstd compression at package build time; making actual .deb files uncompressed; assembling the initrd using split cpio archives (uncompressed for the pre-compressed files, whilst compressing only the userspace portions of the initrd); enabling in-kernel module decompression support with matching kmod; fixing bugs in all of the above; and landing all of these things in time for the feature freeze, whilst leveraging the experience and some of the design choices and implementations we have already been shipping on Ubuntu Core. Some of these changes are backported to Jammy, but only enough to support smooth upgrades to Mantic and later. Complete gains are only possible to experience on Mantic and later.
The discovered bugs in kernel module loading code likely affect systems that use LoadPin LSM with kernel space module decompression as used on ChromeOS systems. Hopefully, Kees Cook or other ChromeOS developers pick up the kernel fixes from the stable trees. Or you know, just use Ubuntu kernels as they do get fixes and features like these first.
The team that designed and delivered these changes is large: Benjamin Drung, Andrea Righi, Juerg Haefliger, Julian Andres Klode, Steve Langasek, Michael Hudson-Doyle, Robert Kratky, Adrien Nader, Tim Gardner, Roxana Nicolescu, and myself, Dimitri John Ledkov, ensuring the most optimal solution is implemented, everything lands on time, and even implementing portions of the final solution.
Hi, It's me, I am a Staff Engineer at Canonical and we are hiring https://canonical.com/careers.
Lots of additional technical details and benchmarks on a huge range of diverse hardware and architectures, and bikeshedding all the things below:
release elements that reference downloadable data without an artifact block, which has not been supported for a while. For all of these, I checked to remove only things that had close to no users and that were a significant maintenance burden. So as a rule of thumb: If your XML validated with no warnings with the 0.16.x branch of AppStream, it will still be 100% valid with the 1.0 release.
Another notable change is that the generated output of AppStream 1.0 will always be 1.0 compliant; you cannot make it generate data for versions below that (this greatly reduced the maintenance cost of the project).
developer_name tag. With AppStream 1.0, this is changed a bit. There is now a developer tag with a name child (that can be translated unless the translate="no" attribute is set on it). This allows future extensibility, and also allows setting a machine-readable id attribute in the developer element. This permits software centers to group software by developer more easily, without having to use heuristics. If we decide to extend the developer information per-app in future, this is also now possible. Do not worry though: the developer_name tag is also still read, so there is no high pressure to update. The old 0.16.x stable series also has this feature backported, so it can be available everywhere. Check out the developer tag specification for more details.
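To make the two layouts concrete, here is a rough Python sketch of how a consumer could read the developer information, preferring the new developer element and falling back to the legacy developer_name tag. It is not how AppStream itself is implemented; the function name, the sample component and the IDs are invented for the example, and only the standard library XML parser is used.

```python
import xml.etree.ElementTree as ET

# ElementTree exposes the xml:lang attribute under this qualified name.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def developer_info(component_xml):
    """Return (developer_id, developer_name) for a component document.

    Prefers the 1.0-style <developer id="..."><name>...</name></developer>
    element and falls back to the older <developer_name> tag.
    """
    root = ET.fromstring(component_xml)
    developer = root.find("developer")
    if developer is not None:
        for name in developer.findall("name"):
            # Skip translated names; take the untranslated one.
            if XML_LANG not in name.attrib and name.text:
                return developer.get("id"), name.text.strip()
        return developer.get("id"), None
    legacy = root.find("developer_name")
    if legacy is not None and legacy.text:
        return None, legacy.text.strip()
    return None, None


# Hypothetical metainfo snippets, for illustration only.
NEW_STYLE = """
<component type="desktop-application">
  <id>org.example.App</id>
  <developer id="org.example">
    <name xml:lang="de">Beispielprojekt</name>
    <name>Example Project</name>
  </developer>
</component>
"""

OLD_STYLE = """
<component type="desktop-application">
  <id>org.example.App</id>
  <developer_name>Example Project</developer_name>
</component>
"""
```

With the new layout, a software center can group components by the machine-readable id instead of comparing name strings.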
scale attribute, to indicate an (integer) scaling factor to apply. This feature was a breaking change and therefore we could not have it for the longest time, but it is now available. Please wait a bit for AppStream 1.0 to become more widely deployed though, as using it with older AppStream versions may lead to issues in some cases. Check out the screenshots tag specification for more details.
environment attribute on the respective screenshot tag. This was also a breaking change, so use it carefully for now! If projects want to, they can use this feature to supply dedicated screenshots depending on the environment the application page is displayed in. Check out the screenshots tag specification for more details.
references tag, you can associate the AppStream component with a DOI (Digital object identifier) or provide a link to a CFF file to provide citation information. It also allows linking to other scientific registries. Check out the references tag specification for more details.
appstreamcli utility also has much improved support for relation checks, and I wrote about these changes in a previous post. Check it out!
With these changes, I hope this feature will be used much more, and beyond just drivers and firmware.
10 years
100 countries
1000 maintainers
10000 packages
1 project
10 architectures
100 countries
1000 maintainers
10000 packages
100000 bugs fixed
1000000 installations
10000000 users
100000000 lines of code
"Cgroup v2 device controller has no interface files and is implemented on top of cgroup BPF." (https://www.kernel.org/doc/Documentation/admin-guide/cgroup-v2.rst)
That is just awesome, nothing to see here, go look at the BPF documents if you have cgroup v2. With cgroup v1 if you wanted to know what devices were permitted, you just would
cat /sys/fs/cgroup/XX/devices.allow
and you were done!
The kernel documentation is not very helpful: sure, it's something in BPF and has something to do with the cgroup BPF specifically, but what does that mean?
There doesn't seem to be an easy corresponding method to get the same information. So to see what restrictions a docker container has, we will have to:
docker ps command, you get the short ID. To get the long ID you can either use the --no-trunc flag or just guess from the short ID. I usually do the second.
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
a3c53d8aaec2 debian:minicom "/bin/bash" 19 minutes ago Up 19 minutes inspiring_shannon
So the short ID is a3c53d8aaec2 and the long ID is a big ugly hex string starting with that. I generally just paste the relevant part in the next step and hit tab. For this container the cgroup is /sys/fs/cgroup/system.slice/docker-a3c53d8aaec23c256124f03d208732484714219c8b5f90dc1c3b4ab00f0b7779.scope/
Notice that the last directory has docker- followed by the long ID, which starts with the short ID.
If you're not sure of the exact path: /sys/fs/cgroup is the cgroup v2 mount point, which can be found with mount -t cgroup2, and the rest is the actual cgroup name. If you know the process running in the container then the cgroup column in ps will show you.
$ ps -o pid,comm,cgroup 140064
PID COMMAND CGROUP
140064 bash 0::/system.slice/docker-a3c53d8aaec23c256124f03d208732484714219c8b5f90dc1c3b4ab00f0b7779.scope
Either way, you will have your cgroup path.
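Both ways of arriving at the path can be written down as a short Python sketch. It assumes the layout seen in this example (a cgroup v2 mount at /sys/fs/cgroup and Docker scopes under system.slice, as with the systemd cgroup driver); other setups may place containers elsewhere, and the function names are invented for illustration.

```python
from pathlib import PurePosixPath


def docker_cgroup_path(long_id, mount="/sys/fs/cgroup", slice_name="system.slice"):
    """Build the cgroup directory for a container from its long ID.

    Assumes the docker-<long id>.scope naming seen above.
    """
    return str(PurePosixPath(mount) / slice_name / f"docker-{long_id}.scope")


def cgroup_path_from_ps(cgroup_column, mount="/sys/fs/cgroup"):
    """Turn the CGROUP column of ps (e.g. '0::/system.slice/...') into a path."""
    hierarchy, controllers, name = cgroup_column.split(":", 2)
    # A cgroup v2 entry has hierarchy ID 0 and an empty controller list.
    if hierarchy != "0" or controllers != "":
        raise ValueError(f"not a cgroup v2 entry: {cgroup_column!r}")
    return str(PurePosixPath(mount) / name.lstrip("/"))
```

Either function yields the directory that is handed to bpftool in the next step.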
$ sudo bpftool cgroup list /sys/fs/cgroup/system.slice/docker-a3c53d8aaec23c256124f03d208732484714219c8b5f90dc1c3b4ab00f0b7779.scope/
ID AttachType AttachFlags Name
90 cgroup_device multi
Our cgroup is attached to an eBPF program with an ID of 90 and the type of program is cgroup_device.
sudo bpftool prog dump xlated id 90 > myebpf.txt
Congratulations! You now have the eBPF program in a human-readable (?) format.
0: (61) r2 = *(u32 *)(r1 +0)
1: (54) w2 &= 65535
2: (61) r3 = *(u32 *)(r1 +0)
3: (74) w3 >>= 16
4: (61) r4 = *(u32 *)(r1 +4)
5: (61) r5 = *(u32 *)(r1 +8)
What we find is that once we get past the first few lines filtering the given value, the comparison lines have:
63: (55) if r2 != 0x2 goto pc+4
64: (55) if r4 != 0x64 goto pc+3
65: (55) if r5 != 0x2a goto pc+2
66: (b4) w0 = 1
67: (95) exit
This is a container using the option --device-cgroup-rule='c 100:42 rwm'. It is checking if r2 (device type) is 2 (char) and r4 (major device number) is 0x64 or 100 and r5 (minor device number) is 0x2a or 42. If any of those are not true, move to the next section, otherwise return with 1 (permit). We have all access modes permitted so it doesn't check for it.
The previous example has all permissions for our device with id 100:42. What if we only want read access, with the option --device-cgroup-rule='c 100:42 r'? The resulting eBPF is:
63: (55) if r2 != 0x2 goto pc+7
64: (bc) w1 = w3
65: (54) w1 &= 2
66: (5d) if r1 != r3 goto pc+4
67: (55) if r4 != 0x64 goto pc+3
68: (55) if r5 != 0x2a goto pc+2
69: (b4) w0 = 1
70: (95) exit
The code is almost the same but we are checking that w3 only has the second bit set, which is for reading, effectively checking for X==X&2. It's a cautious approach, meaning no access still passes but multiple bits set will fail.
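The logic of one such rule block can be mirrored in Python to make the bit handling easier to follow. This is only a model of the dumped instructions, not code that Docker or the kernel runs: the function names are made up, while the constant values follow the BPF_DEVCG_* definitions in the kernel's linux/bpf.h header.

```python
# Device types and access bits, as defined in linux/bpf.h.
BPF_DEVCG_DEV_BLOCK = 1
BPF_DEVCG_DEV_CHAR = 2
BPF_DEVCG_ACC_MKNOD = 1
BPF_DEVCG_ACC_READ = 2
BPF_DEVCG_ACC_WRITE = 4


def decode_access_type(access_type):
    """Split the first context word like instructions 0-3 of the dump."""
    dev_type = access_type & 0xFFFF  # w2 &= 65535
    access = access_type >> 16       # w3 >>= 16
    return dev_type, access


def rule_matches(access_type, major, minor,
                 rule_type, rule_major, rule_minor, rule_access):
    """Model one rule block: True means the program would return 1 (permit).

    False only means this block does not match; the real program then
    falls through to the next block.
    """
    dev_type, access = decode_access_type(access_type)
    if dev_type != rule_type:
        return False
    # Same shape as "w1 = w3; w1 &= mask; if r1 != r3": every requested bit
    # must be inside the allowed mask. With a mask of all three bits this
    # check can never fail, which is why the rwm rule has no such test.
    if access & rule_access != access:
        return False
    return major == rule_major and minor == rule_minor


def request(dev_type, access):
    """Pack a device type and access bits the way the kernel context does."""
    return (access << 16) | dev_type
```

For the 'c 100:42 r' rule, a read request on a char device 100:42 matches, a combined read and write request does not, and a request with no access bits set still passes, which is the cautious behaviour described above.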
--device flag. This flag actually does two things. The first is to create the device file in the container's /dev directory, effectively doing a mknod command. The second thing is to adjust the eBPF program. If the device file we specified actually did have a major number of 100 and a minor of 42, the eBPF would look exactly like the above snippets.
--privileged flag do? This lets the container have full access to all the devices (if the user running the process is allowed). Like the --device flag, it makes the device files as well, but what does the filtering look like? We still have a cgroup but the eBPF program is greatly simplified, here it is in full:
0: (61) r2 = *(u32 *)(r1 +0)
1: (54) w2 &= 65535
2: (61) r3 = *(u32 *)(r1 +0)
3: (74) w3 >>= 16
4: (61) r4 = *(u32 *)(r1 +4)
5: (61) r5 = *(u32 *)(r1 +8)
6: (b4) w0 = 1
7: (95) exit
There are the usual setup lines and then, return 1. Everyone is a winner for all devices and access types!
.ikiwiki
directory of the site
in the hope of making things a little more
"standalone". Unfortunately, that didn't work either because the
theme must be shipped in the system-wide location: I couldn't figure
out how to put it to have it bundled with the main repository. At that
point I mostly gave up because I had spent too much time on this and I
had to do something about email otherwise it would start to bounce.
rsync'd my build in /var/www/html and boom, I had a website. The
Goatcounter analytics were timing out, but that was easy to turn
off.
rsync -v -n --files-from=<(ssh colette.anarc.at find Maildir -name '*colette*' ) colette.anarc.at: colette/
rsync -v -n --files-from=<(ssh colette.anarc.at find Maildir -name '*colette*' ) colette/ marcos.anarc.at:
Overall, the outage lasted about 24 hours, from 11:00EST (16:00UTC) on
2023-02-07 to the same time today.
X characters! There is a new encoding option to set the output encoding,
and new options groff (which uses the groff extension for Unicode code
points and is the default on EBCDIC systems) and roff (which does the
old, broken X substitution).
Since this was a major backward-incompatible change, I also finally
removed most of the formatting touch-ups that Pod::Man tried to do for
troff output but which would be invisible for the (by far more commonly
used) nroff output. These have been an endless source of bugs and are
very difficult to maintain, most of them were of marginal utility, and I
am dubious many people are using troff to print Perl manual pages these
days instead of, say, printing the rendered output from one of the many
excellent POD to HTML modules.
There is some remaining somewhat-Perl-specific guesswork applied to the
formatting, which is much simpler, but even that can now be turned off
with the new guesswork option. This will allow people using POD to
generate manual pages for things other than Perl modules to disable the
Perl-specific markup logic.
Pod::Text also now supports encoding and gets some major encoding
cleanups, including using Encode instead of PerlIO encoding layers for its
output.
There are also numerous other fixes and improvements: a new language
option to Pod::Man to configure (in an unfortunately groff-specific way)
the line-breaking rules for languages like Chinese and Japanese,
conversion of zero-width spaces to the *roff \: equivalent, a fix
for wrapping L<> inside S<>, and various other bug fixes.
Perhaps the most interesting is a fix to a long-standing problem with the
Pod::Man output where bold and italic text would extend too far if used in
combination with C<> fixed-width text. This bug has been around forever
without being noticed, and then two different people noticed it while I
was preparing this release.
You can get the latest release from CPAN or from the
podlators distribution page. These
changes should be incorporated into Perl core in due course, although
given the substantial changes, that may require a baking period.
i3 configuration to Sway, and adapt my systemd startup sequence to the
new environment. Screen sharing only works with Pipewire, so I also
did that migration, which basically requires an upgrade to Debian
bookworm to get a nice enough Pipewire release.
I'm testing Wayland on my laptop, but I'm not using it as a daily
driver because I first need to upgrade to Debian bookworm on my main
workstation.
Most irritants have been solved one way or the other. My main problem
with Wayland right now is that I spent a frigging week doing the
conversion: it's exciting and new, but it basically sucked the life
out of all my other projects and it's distracting, and I want it to
stop.
The rest of this page documents why I made the switch, how it
happened, and what's left to do. Hopefully it will keep you from
spending as much time as I did in fixing this.
TL;DR: Wayland is mostly ready. Main blockers you might find are
that you need to do manual configurations, DisplayLink (multiple
monitors on a single cable) doesn't work in Sway, HDR and color
management are still in development.
I had to install the following packages:
apt install \
brightnessctl \
foot \
gammastep \
gdm3 \
grim slurp \
pipewire-pulse \
sway \
swayidle \
swaylock \
wdisplays \
wev \
wireplumber \
wlr-randr \
xdg-desktop-portal-wlr
And did some tweaks in my $HOME, mostly dealing with my esoteric
systemd startup sequence, which you won't have to deal with if you are
not a fan.
"It's amazing. I have never experienced gaming on Linux that looked this smooth in my life."
... I'm not a gamer, but I do care about latency. The longer version is worth a read as well.
The point here is not to bash one side or the other, or even do a thorough comparison. I start with the premise that Xorg is likely going away in the future and that I will need to adapt some day. In fact, the last major Xorg release (21.1, October 2021) is rumored to be the last ("just like the previous release...", that said, minor releases are still coming out, e.g. 21.1.4). Indeed, it seems even core Xorg people have moved on to developing Wayland, or at least Xwayland, which was spun off into its own source tree.
X, or at least Xorg, is in maintenance mode and has been for years. Granted, the X Window System is getting close to forty years old at this point: it got us amazingly far for something that was designed around the time of the first graphical interfaces. Since Mac and (especially?) Windows released theirs, they have rebuilt their graphical backends numerous times, but UNIX derivatives have stuck on Xorg this entire time, which is a testament to the design and reliability of X. (Or our incapacity at developing meaningful architectural change across the entire ecosystem, take your pick I guess.)
What pushed me over the edge is that I had some pretty bad driver crashes with Xorg while screen sharing under Firefox, in Debian bookworm (around November 2022). The symptom would be that the UI would completely crash, reverting to a text-only console, while Firefox would keep running, audio and everything still working. People could still see my screen, but I couldn't, of course, let alone interact with it. All processes were still running, including Xorg.
(And no, sorry, I haven't reported that bug, maybe I should have, and it's actually possible it comes up again in Wayland, of course. But at first, screen sharing didn't work of course, so it's coming a much further way. After making screen sharing work, though, the bug didn't occur again, so I consider this an Xorg-specific problem until further notice.)
There were also frustrating glitches in the UI, in general. I actually had to set up a compositor alongside i3 to make things bearable at all. Video playback in a window was laggy, sluggish, and out of sync. Wayland fixed all of this.
man -k sway
to find what they need. I don't think we need that kind
of elitism in our communities, to put this bluntly.
But let's put that aside: Sway is still a no-brainer. It's the easiest
thing to migrate to, because it's mostly compatible with i3. I had
to immediately fix those resources to get a minimal session going:
i3 | Sway | note |
---|---|---|
set_from_resources | set | no support for X resources, naturally |
new_window pixel 1 | default_border pixel 1 | actually supported in i3 as well |
brightnessctl instead of xbacklight to change the backlight levels.
See a copy of my full sway/config for details.
Other options include:
nm-applet work. Based on this nm-applet.service, I found that you
need to pass --indicator for it to show up at all.
In theory, tray icon support was merged in 1.5, but in practice
there are still several limitations, like icons not being
clickable. Also, on startup, nm-applet --indicator triggers this
error in the Sway logs:
nov 11 22:34:12 angela sway[298938]: 00:49:42.325 [INFO] [swaybar/tray/host.c:24] Registering Status Notifier Item ':1.47/org/ayatana/NotificationItem/nm_applet'
nov 11 22:34:12 angela sway[298938]: 00:49:42.327 [ERROR] [swaybar/tray/item.c:127] :1.47/org/ayatana/NotificationItem/nm_applet IconPixmap: No such property IconPixmap
nov 11 22:34:12 angela sway[298938]: 00:49:42.327 [ERROR] [swaybar/tray/item.c:127] :1.47/org/ayatana/NotificationItem/nm_applet AttentionIconPixmap: No such property AttentionIconPixmap
nov 11 22:34:12 angela sway[298938]: 00:49:42.327 [ERROR] [swaybar/tray/item.c:127] :1.47/org/ayatana/NotificationItem/nm_applet ItemIsMenu: No such property ItemIsMenu
nov 11 22:36:10 angela sway[313419]: info: fcft.c:838: /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf: size=24.00pt/32px, dpi=96.00
... but that seems innocuous. The tray icon displays but is not
clickable.
Note that there is currently (November 2022) a pull request to
hook up a "Tray D-Bus Menu" which, according to Reddit, might fix
this, or at least be somewhat relevant.
If you don't see the icon, check the bar.tray_output property in the
Sway config, try: tray_output *.
The non-working tray was the biggest irritant in my migration. I have
used nmtui to connect to new WiFi hotspots or change connection
settings, but that doesn't support actions like "turn off WiFi".
I eventually fixed this by switching from py3status to
waybar, which was another yak horde shaving session, but
ultimately, it worked.
echo MOZ_ENABLE_WAYLAND=1 >> ~/.config/environment.d/firefox.conf && apt install xdg-desktop-portal-wlr
MOZ_ENABLE_WAYLAND=1 firefox
To make the change permanent, many recipes recommend adding this to an
environment startup script:
if [ "$XDG_SESSION_TYPE" = "wayland" ]; then
    export MOZ_ENABLE_WAYLAND=1
fi
At least that's the theory. In practice, Sway doesn't actually run any
startup shell script, so that can't possibly work. Furthermore,
XDG_SESSION_TYPE is not actually set when starting Sway from gdm3,
which I find really confusing, and I'm not the only one. So
the above trick doesn't actually work, even if the environment
(XDG_SESSION_TYPE) is set correctly, because we don't have
conditionals in environment.d(5).
(Note that systemd.environment-generator(7) does support running
arbitrary commands to generate environment, but for some reason does
not support user-specific configuration files... Even then it may be a
solution to have a conditional MOZ_ENABLE_WAYLAND environment, but
I'm not sure it would work because ordering between those two isn't
clear: maybe the XDG_SESSION_TYPE wouldn't be set just yet...)
At first, I made this ridiculous script to work around those
issues. Really, it seems to me Firefox should just parse the
XDG_SESSION_TYPE variable here... but then I realized that Firefox
works fine in Xorg when MOZ_ENABLE_WAYLAND is set.
So now I just set that variable in environment.d and It Just Works:
MOZ_ENABLE_WAYLAND=1
chromium -enable-features=UseOzonePlatform -ozone-platform=wayland
If it shows an ugly gray border, check the "Use system title bar and
borders" setting.
It can do some screensharing. Sharing a window and a tab seems to
work, but sharing a full screen doesn't: it's all black. Maybe not
ready for prime time.
And since Firefox can do what I need under Wayland now, I will not
need to fight with Chromium to work under Wayland:
apt purge chromium
Note that a similar fix was necessary for Signal Desktop, see this
commit. Basically you need to figure out a way to pass those same
flags to signal:
--enable-features=WaylandWindowDecorations --ozone-platform-hint=auto
$PATH in /etc! and certain things are simply not
working in my setup. For example, this hook never gets run on startup:
(add-hook 'after-init-hook 'server-start t)
Still, like many X11 applications, Emacs mostly works fine under
Xwayland. The clipboard works as expected, for example.
Scaling is a bit of an issue: fonts look fuzzy.
I have heard anecdotal evidence of hard lockups with Emacs running
under Xwayland as well, but haven't experienced any problem so far. I
did experience a Wayland crash with the snapshot version however.
TODO: look again at Wayland in Emacs 29.
redshift -m drm -PO 3000
This tip is from the arch wiki which also has other suggestions
for Wayland-based alternatives. Both KDE and GNOME have their own "red
shifters", and for wlroots-based compositors, they (currently,
Sept. 2022) list the following alternatives:
gammastep with a simple gammastep.service file
associated with the sway-session.target.
nov 16 16:41:43 angela sway[843121]: 00:00:00.002 [ERROR] [wlr] [libseat] [common/terminal.c:162] Could not open target tty: Permission denied
Possible alternatives:
foot-terminfo package on the remote host, which is available in
Debian stable.
This should eventually resolve itself, as Debian bookworm has a newer
version. Note that some corrections were also shipped in the
20211113 release, but that is also shipped in Debian bookworm.
That said, I am almost certain I will have to revert back to xterm
under Xwayland at some point in the future. Back when I was using
GNOME Terminal, it would mostly work for everything until I had to use
the serial console on a (HP ProCurve) network switch, which has a
fancy TUI that was basically unusable there. I fully expect such
problems with foot, or any other terminal than xterm, for that matter.
The foot wiki has good troubleshooting instructions as well.
Update: I did find one tiny thing to improve with foot, and it's the
default logging level which I found pretty verbose. After discussing
it with the maintainer on IRC, I submitted this patch to tweak
it, which I described like this on Mastodon:
today's reason why i will go to hell when i die (TRWIWGTHWID?): a 600-word, 63 lines commit log for a one line change: https://codeberg.org/dnkl/foot/pulls/1215
It's Friday.
Tool | In Debian | Notes |
---|---|---|
alfred | yes | general launcher/assistant tool |
bemenu | yes, bookworm+ | inspired by dmenu |
cerebro | no | Javascript ... uh... thing |
dmenu-wl | no | fork of dmenu, straight port to Wayland |
Fuzzel | ITP 982140 | dmenu/drun replacement, app icon overlay |
gmenu | no | drun replacement, with app icons |
kickoff | no | dmenu/run replacement, fuzzy search, "snappy", history, copy-paste, Rust |
krunner | yes | KDE's runner |
mauncher | no | dmenu/drun replacement, math |
nwg-launchers | no | dmenu/drun replacement, JSON config, app icons, nwg-shell project |
Onagre | no | rofi/alfred inspired, multiple plugins, Rust |
menu | no | dmenu/drun rewrite |
Rofi (lbonn's fork) | no | see above |
sirula | no | .desktop based app launcher |
Ulauncher | ITP 949358 | generic launcher like Onagre/rofi/alfred, might be overkill |
tofi | yes, bookworm+ | dmenu/drun replacement, C |
wmenu | no | fork of dmenu-wl, but mostly a rewrite |
Wofi | yes | dmenu/drun replacement, not actively maintained |
yofi | no | dmenu/drun replacement, Rust |
input-method-unstable-v2
protocol (sample
emoji picker, but is not packaged in Debian.
As it turns out, wtype just works as expected, and fixing this was
basically a two-line patch. Another alternative, not in Debian, is
wofi-pass.
The other problem is that I actually heavily modified rofi. I use
"modis" which are not actually implemented in wofi or tofi, so I'm
left with reinventing those wheels from scratch or using the rofi +
wayland fork... It's really too bad that fork isn't being
reintegrated...
For now, I'm actually still using rofi under Xwayland. The main
downside is that fonts are fuzzy, but it otherwise just works.
Note that wlogout could be a partial replacement (just for the
"power menu").
mpv seems to work fine under Wayland, better than Xorg on my new
laptop (as mentioned in the introduction), and that is before the
version which improves Wayland support significantly, by bringing
native Pipewire support and DMA-BUF support.
gmpc is more of a problem, mainly because it is abandoned. See
2022-08-22-gmpc-alternatives for the full discussion, one of
the alternatives there will likely support Wayland.
Finally, I might just switch to sublime-music instead... In any
case, not many changes here, thankfully.
swayidle with a configuration based on
the systemd integration wiki page but with additional tweaks from
this service, see the resulting swayidle.service file.
Interestingly, damjan also has a service for swaylock itself,
although it's not clear to me what its purpose is...
--audio
, duh). It's also
packaged in Debian.
One has to wonder how this works while keeping the "between app
security" that Wayland promises, however... Would installing such a
program make my system less secure?
Many other options are available, see the awesome Wayland
screencasting list.
.Xresources - just say goodbye to that old resource system, it
was used, in my case, only for rofi, xterm, and ... Xboard!?
swaymsg input 0:0:X11_keyboard xkb_layout de
or using this config:
input * {
    xkb_layout "ca,us"
    xkb_options "grp:sclk_toggle"
}
That works refreshingly well, even better than in Xorg, I must say.
swaykbdd is an alternative that supports per-window layouts
(in Debian).
nm-applet work. Based on this nm-applet.service, I found that you
need to pass --indicator. In theory, tray icon support was merged in
1.5, but in practice there are still several limitations, like icons
not being clickable. On startup, nm-applet --indicator triggers this
error in the Sway logs:
nov 11 22:34:12 angela sway[298938]: 00:49:42.325 [INFO] [swaybar/tray/host.c:24] Registering Status Notifier Item ':1.47/org/ayatana/NotificationItem/nm_applet'
nov 11 22:34:12 angela sway[298938]: 00:49:42.327 [ERROR] [swaybar/tray/item.c:127] :1.47/org/ayatana/NotificationItem/nm_applet IconPixmap: No such property IconPixmap
nov 11 22:34:12 angela sway[298938]: 00:49:42.327 [ERROR] [swaybar/tray/item.c:127] :1.47/org/ayatana/NotificationItem/nm_applet AttentionIconPixmap: No such property AttentionIconPixmap
nov 11 22:34:12 angela sway[298938]: 00:49:42.327 [ERROR] [swaybar/tray/item.c:127] :1.47/org/ayatana/NotificationItem/nm_applet ItemIsMenu: No such property ItemIsMenu
nov 11 22:36:10 angela sway[313419]: info: fcft.c:838: /usr/share/fonts/truetype/dejavu/DejaVuSans.ttf: size=24.00pt/32px, dpi=96.00
... but it seems innocuous. The tray icon displays but, as stated
above, is not clickable. If you don't see the icon, check the
bar.tray_output property in the Sway config, try: tray_output *.
Note that there is currently (November 2022) a pull request to
hook up a "Tray D-Bus Menu" which, according to Reddit, might
fix this, or at least be somewhat relevant.
This was the biggest irritant in my migration. I have used nmtui
to connect to new WiFi hotspots or change connection settings, but
that doesn't support actions like "turn off WiFi".
I eventually fixed this by switching from py3status to
waybar.
i3
I was using this bespoke i3-focus script, which doesn't work under
Sway. swayr is an option, but it is not in Debian. So I put together
this other bespoke hack from multiple sources, which works.
X11 | Wayland | In Debian |
---|---|---|
arandr | wdisplays | yes |
autorandr | kanshi | yes |
xdotool | wtype | yes |
xev | wev | yes |
xlsclients | swaymsg -t get_tree | yes |
xrandr | wlr-randr | yes |
xlsclients
but is not
packaged in Debian.
See also:
.xsession
like
this:
#!/bin/sh
. ~/.shenv
systemctl --user import-environment
exec systemctl --user start --wait xsession.target
But obviously, the xsession.target is not started by the Sway
session. It seems to just start a default.target, which is really
not what we want because we want to associate the services directly
with the graphical-session.target, so that they don't start when
logging in over (say) SSH.
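To make the goal concrete, here is a minimal sketch of a user unit tied to graphical-session.target rather than default.target. The unit name example-tray.service and the /usr/bin/example-tray binary are placeholders invented for illustration, not part of my setup, and the unit only helps if something in the session actually starts graphical-session.target.

```shell
# Write a hypothetical user unit bound to the graphical session.
# "example-tray.service" and "/usr/bin/example-tray" are placeholders.
write_graphical_unit() {
    unit_dir=${1:-"$HOME/.config/systemd/user"}
    mkdir -p "$unit_dir"
    cat > "$unit_dir/example-tray.service" <<'EOF'
[Unit]
Description=Example tray application (placeholder)
# Stop and restart together with the graphical session
PartOf=graphical-session.target
After=graphical-session.target

[Service]
ExecStart=/usr/bin/example-tray

[Install]
# Pulled in by the graphical session, not by default.target,
# so it does not start on an SSH login
WantedBy=graphical-session.target
EOF
}

# Usage, as a regular user:
#   write_graphical_unit
#   systemctl --user daemon-reload
#   systemctl --user enable example-tray.service
```

Since the function takes the target directory as an argument, it can be pointed at a scratch directory to inspect the result before touching ~/.config.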
damjan on #debian-systemd showed me his sway-setup which
features systemd integration. It involves starting a different session
in a completely new .desktop file. That work was submitted
upstream but refused on the grounds that "I'd rather not give a
preference to any particular init system." Another PR was
abandoned because "restarting sway does not makes sense: that
kills everything".
The work was therefore moved to the wiki.
So. Not a great situation. The upstream wiki systemd
integration suggests starting the systemd target from within
Sway, which has all sorts of problems:
$PATH
and environment.
So I went down that rabbit hole and managed to correctly configure
Sway to be started from the systemd --user session.
I have partly followed the wiki but also picked ideas from damjan's
sway-setup and xdbob's sway-services. Another option is
uwsm (not in Debian).
This is the config I have in .config/systemd/user/:
I have also configured those services, but that's somewhat optional:
You will also need at least part of my sway/config, which
sends the systemd notification (because, no, Sway doesn't support any
sort of readiness notification, that would be too easy). And you might
like to see my swayidle-config while you're there.
Finally, you need to hook this up somehow to the login manager. This
is typically done with a desktop file, so drop
sway-session.desktop in /usr/share/wayland-sessions and
sway-user-service somewhere in your $PATH (typically
/usr/bin/sway-user-service).
The session then looks something like this:
$ systemd-cgls | head -101
Control group /:
-.slice
user.slice (#472)
user.invocation_id: bc405c6341de4e93a545bde6d7abbeec
trusted.invocation_id: bc405c6341de4e93a545bde6d7abbeec
user-1000.slice (#10072)
user.invocation_id: 08f40f5c4bcd4fd6adfd27bec24e4827
trusted.invocation_id: 08f40f5c4bcd4fd6adfd27bec24e4827
user@1000.service (#10156)
user.delegate: 1
trusted.delegate: 1
user.invocation_id: 76bed72a1ffb41dca9bfda7bb174ef6b
trusted.invocation_id: 76bed72a1ffb41dca9bfda7bb174ef6b
session.slice (#10282)
xdg-document-portal.service (#12248)
9533 /usr/libexec/xdg-document-portal
9542 fusermount3 -o rw,nosuid,nodev,fsname=portal,auto_unmount,subt
xdg-desktop-portal.service (#12211)
9529 /usr/libexec/xdg-desktop-portal
pipewire-pulse.service (#10778)
6002 /usr/bin/pipewire-pulse
wireplumber.service (#10519)
5944 /usr/bin/wireplumber
gvfs-daemon.service (#10667)
5960 /usr/libexec/gvfsd
gvfs-udisks2-volume-monitor.service (#10852)
6021 /usr/libexec/gvfs-udisks2-volume-monitor
at-spi-dbus-bus.service (#11481)
6210 /usr/libexec/at-spi-bus-launcher
6216 /usr/bin/dbus-daemon --config-file=/usr/share/defaults/at-spi2
6450 /usr/libexec/at-spi2-registryd --use-gnome-session
pipewire.service (#10403)
5940 /usr/bin/pipewire
dbus.service (#10593)
5946 /usr/bin/dbus-daemon --session --address=systemd: --nofork --n
background.slice (#10324)
tracker-miner-fs-3.service (#10741)
6001 /usr/libexec/tracker-miner-fs-3
app.slice (#10240)
xdg-permission-store.service (#12285)
9536 /usr/libexec/xdg-permission-store
gammastep.service (#11370)
6197 gammastep
dunst.service (#11958)
7460 /usr/bin/dunst
wterminal.service (#13980)
69100 foot --title pop-up
69101 /bin/bash
77660 sudo systemd-cgls
77661 head -101
77662 wl-copy
77663 sudo systemd-cgls
77664 systemd-cgls
syncthing.service (#11995)
7529 /usr/bin/syncthing -no-browser -no-restart -logflags=0 --verbo
7537 /usr/bin/syncthing -no-browser -no-restart -logflags=0 --verbo
dconf.service (#10704)
5967 /usr/libexec/dconf-service
gnome-keyring-daemon.service (#10630)
5951 /usr/bin/gnome-keyring-daemon --foreground --components=pkcs11
gcr-ssh-agent.service (#10963)
6035 /usr/libexec/gcr-ssh-agent /run/user/1000/gcr
swayidle.service (#11444)
6199 /usr/bin/swayidle -w
nm-applet.service (#11407)
6198 /usr/bin/nm-applet --indicator
wcolortaillog.service (#11518)
6226 foot colortaillog
6228 /bin/sh /home/anarcat/bin/colortaillog
6230 sudo journalctl -f
6233 ccze -m ansi
6235 sudo journalctl -f
6236 journalctl -f
afuse.service (#10889)
6051 /usr/bin/afuse -o mount_template=sshfs -o transform_symlinks -
gpg-agent.service (#13547)
51662 /usr/bin/gpg-agent --supervised
51719 scdaemon --multi-server
emacs.service (#10926)
6034 /usr/bin/emacs --fg-daemon
33203 /usr/bin/aspell -a -m -d en --encoding=utf-8
xdg-desktop-portal-gtk.service (#12322)
9546 /usr/libexec/xdg-desktop-portal-gtk
xdg-desktop-portal-wlr.service (#12359)
9555 /usr/libexec/xdg-desktop-portal-wlr
sway.service (#11037)
6037 /usr/bin/sway
6181 swaybar -b bar-0
6209 py3status
6309 /usr/bin/i3status -c /tmp/py3status_oy4ntfnq
6969 Xwayland :0 -rootless -terminate -core -listen 29 -listen 30 -
init.scope (#10198)
5909 /lib/systemd/systemd --user
5911 (sd-pam)
session-7.scope (#10440)
5895 gdm-session-worker [pam/gdm-password]
6028 /usr/libexec/gdm-wayland-session --register-session sway-user-serv
[...]
I think that's pretty neat.
$PATH, which broke a lot of my workflow. It's hard to tell exactly
how Wayland gets started or where to inject environment. This
discussion suggests a few alternatives and this Debian bug report
discusses this issue as well.
I eventually picked environment.d(5) since I already manage my user
session with systemd, and it fixes a bunch of other problems. I used
to have a .shenv that I had to manually source everywhere. The only
problem with that approach is that it doesn't support conditionals,
but that's something that's rarely needed.
apt install pipewire pipewire-audio-client-libraries pipewire-pulse wireplumber
Then, as a regular user:
systemctl --user daemon-reload
systemctl --user --now disable pulseaudio.service pulseaudio.socket
systemctl --user --now enable pipewire pipewire-pulse
systemctl --user mask pulseaudio
An optional (but key, IMHO) configuration you should also make is to
"switch on connect", which will make your Bluetooth or USB headset
automatically be the default route for audio, when connected. In
~/.config/pipewire/pipewire-pulse.conf.d/autoconnect.conf:
context.exec = [
    { path = "pactl" args = "load-module module-always-sink" }
    { path = "pactl" args = "load-module module-switch-on-connect" }
    #{ path = "/usr/bin/sh" args = "~/.config/pipewire/default.pw" }
]
See the excellent (as usual) Arch wiki page about Pipewire for
that trick and more information about Pipewire. Note that you must
not put the file in ~/.config/pipewire/pipewire.conf (or
pipewire-pulse.conf, maybe) directly, as that will break your
setup. If you want to add to that file, first copy the template from
/usr/share/pipewire/pipewire-pulse.conf.
So far I'm happy with Pipewire in bookworm, but I've heard mixed
reports about it. I have high hopes it will become the standard media
server for Linux in the coming months or years, which is great because
I've been (rather boldly, I admit) on the record saying I don't like
PulseAudio.
Rereading this now, I feel it might have been a little unfair, as
"over-engineered and tries to do too many things at once" applies
probably even more to Pipewire than PulseAudio (since it also handles
video dispatching).
That said, I think Pipewire took the right approach by implementing
existing interfaces like Pulseaudio and JACK. That way we're not
adding a third (or fourth?) way of doing audio in Linux; we're just
making the server better.
déc 06 10:36:31 curie sway[343384]: 23:32:14.034 [ERROR] [wlr] [libinput] event5 - SONiX USB Keyboard: client bug: event processing lagging behind by 37ms, your system is too slow
... and corresponds to an open bug report in Sway. It seems the
"system is too slow" should really be "your compositor is too slow",
which seems to be the case here on this older system
(curie). It doesn't happen often, but it does happen,
particularly when a bunch of busy processes start in parallel (in my
case: a linter running inside a container and notmuch new).
The proposed fix for this in Sway is to gain real time privileges
and add the CAP_SYS_NICE capability to the binary. We'll see how
that goes in Debian once 1.8 gets released and shipped.
xeyes (in the x11-apps package) will run in Wayland, and can
actually be used to easily see if a given window is also in
Wayland. If the "eyes" follow the cursor, the app is actually running
in Xwayland, so not natively in Wayland.
Another way to see what is using Wayland in Sway is with the command:
swaymsg -t get_tree
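The raw tree is verbose, so here is a small sketch that boils it down to one line per window. It assumes jq is installed (it is not mentioned elsewhere in this post) and relies on the shell property that sway-ipc(7) documents for views, which is xdg_shell for native Wayland windows and xwayland for X11 ones. The function name is made up for this example.

```shell
# Print "<shell><TAB><window title>" for every view in a Sway tree
# read from stdin. Assumes jq is available.
list_window_shells() {
    # ".." walks every value in the JSON document; only objects that
    # carry a "shell" property (i.e. views) are kept.
    jq -r '.. | objects | select(.shell != null) | "\(.shell)\t\(.name)"'
}

# Usage, inside a Sway session:
#   swaymsg -t get_tree | list_window_shells
```

Reading from stdin keeps the function usable on a saved copy of the tree as well as on live swaymsg output.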
chooser installed, see xdg-desktop-portal-wlr(5))
apt install --yes gdisk zfs-dkms zfs zfs-initramfs zfsutils-linux
We also tell DKMS that we need to rebuild the initrd when upgrading:
echo REMAKE_INITRD=yes > /etc/dkms/zfs.conf
/dev/sdc with:
sgdisk --zap-all /dev/sdc
sgdisk -a1 -n1:24K:+1000K -t1:EF02 /dev/sdc
sgdisk -n2:1M:+512M -t2:EF00 /dev/sdc
sgdisk -n3:0:+1G -t3:BF01 /dev/sdc
sgdisk -n4:0:0 -t4:BF00 /dev/sdc
root@curie:/home/anarcat# sgdisk -p /dev/sdc
Disk /dev/sdc: 1953525168 sectors, 931.5 GiB
Model: ESD-S1C
Sector size (logical/physical): 512/512 bytes
Disk identifier (GUID): [REDACTED]
Partition table holds up to 128 entries
Main partition table begins at sector 2 and ends at sector 33
First usable sector is 34, last usable sector is 1953525134
Partitions will be aligned on 16-sector boundaries
Total free space is 14 sectors (7.0 KiB)
Number Start (sector) End (sector) Size Code Name
1 48 2047 1000.0 KiB EF02
2 2048 1050623 512.0 MiB EF00
3 1050624 3147775 1024.0 MiB BF01
4 3147776 1953525134 930.0 GiB BF00
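As a sanity check on that listing, the sizes can be recomputed from the start and end sectors. This is only a sketch: the helper name is made up, and it assumes the 512-byte logical sector size that sgdisk reports above.

```shell
# Size of a partition in MiB, given its first and last sector
# (inclusive) and the sector size in bytes (512 assumed by default).
part_size_mib() {
    start=$1 end=$2 sector_bytes=${3:-512}
    echo $(( (end - start + 1) * sector_bytes / 1048576 ))
}

part_size_mib 2048 1050623      # partition 2 (EF00): prints 512
part_size_mib 1050624 3147775   # partition 3 (BF01): prints 1024
```

Both results match the 512.0 MiB and 1024.0 MiB reported by sgdisk -p.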
Unfortunately, we can't be sure of the sector size here, because the
USB controller is probably lying to us about it. Normally, this
smartctl command should tell us the sector size as well:
root@curie:~# smartctl -i /dev/sdb -qnoserial
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-14-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Western Digital Black Mobile
Device Model: WDC WD10JPLX-00MBPT0
Firmware Version: 01.01H01
User Capacity: 1 000 204 886 016 bytes [1,00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 7200 rpm
Form Factor: 2.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS T13/1699-D revision 6
SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
Local Time is: Tue May 17 13:33:04 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Above is the example of the builtin HDD drive. But the SSD device
enclosed in that USB controller doesn't support SMART commands,
so we can't trust that it really has 512-byte sectors.
This matters because we need to tweak the ashift value
correctly. We're going to go ahead and assume the SSD drive has the
common 4KB sector size, which means ashift=12.
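For reference, ashift is the base-2 logarithm of the sector size ZFS will assume, which is where the 12 comes from. A minimal sketch of that arithmetic follows (the helper name is made up, and it expects a power of two):

```shell
# Compute the ashift value for a given sector size in bytes by
# counting how many times the size can be halved.
ashift_for_sector_size() {
    size=$1
    ashift=0
    while [ "$size" -gt 1 ]; do
        size=$((size / 2))
        ashift=$((ashift + 1))
    done
    echo "$ashift"
}

ashift_for_sector_size 512    # prints 9
ashift_for_sector_size 4096   # prints 12
```

So trusting the 512-byte sectors reported through the USB controller would have given ashift=9 instead.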
Note here that we are not creating a separate partition for
swap. Swap on ZFS volumes (AKA "swap on ZVOL") can trigger lockups and
that issue is still not fixed upstream. Ubuntu recommends using a
separate partition for swap instead. But since this is "just" a
workstation, we're betting that we will not suffer from this problem,
after hearing a report from another Debian developer running this
setup on their workstation successfully.
We do not recommend this setup though. In fact, if I were to redo this
partition scheme, I would probably use LUKS encryption and set up a
dedicated swap partition, as I had problems with ZFS encryption as
well.
zpool create \
-o cachefile=/etc/zfs/zpool.cache \
-o ashift=12 -d \
-o feature@async_destroy=enabled \
-o feature@bookmarks=enabled \
-o feature@embedded_data=enabled \
-o feature@empty_bpobj=enabled \
-o feature@enabled_txg=enabled \
-o feature@extensible_dataset=enabled \
-o feature@filesystem_limits=enabled \
-o feature@hole_birth=enabled \
-o feature@large_blocks=enabled \
-o feature@lz4_compress=enabled \
-o feature@spacemap_histogram=enabled \
-o feature@zpool_checkpoint=enabled \
-O acltype=posixacl -O canmount=off \
-O compression=lz4 \
-O devices=off -O normalization=formD -O relatime=on -O xattr=sa \
-O mountpoint=/boot -R /mnt \
bpool /dev/sdc3
I haven't investigated all those settings and just trust the upstream
guide on the above.
zpool create \
-o ashift=12 \
-O encryption=on -O keylocation=prompt -O keyformat=passphrase \
-O acltype=posixacl -O xattr=sa -O dnodesize=auto \
-O compression=zstd \
-O relatime=on \
-O canmount=off \
-O mountpoint=/ -R /mnt \
rpool /dev/sdc4
Breaking this down:
- -o ashift=12: mentioned above, 4k sector size
- -O encryption=on -O keylocation=prompt -O keyformat=passphrase: encryption, prompt for a password, default algorithm is aes-256-gcm, explicit in the guide, made implicit here
- -O acltype=posixacl -O xattr=sa: enable ACLs, with better performance (not enabled by default)
- -O dnodesize=auto: related to extended attributes, less compatibility with other implementations
- -O compression=zstd: enable zstd compression, can be disabled/enabled per dataset with zfs set compression=off rpool/example
- -O relatime=on: classic atime optimisation, another that could be used on a busy server is atime=off
- -O canmount=off: do not make the pool mount automatically with mount -a?
- -O mountpoint=/ -R /mnt: mount pool on / in the future, but /mnt for now
- -O normalization=formD: normalize file names on comparisons (not storage), implies utf8only=on, which is a bad idea (and effectively meant my first sync failed to copy some files, including this folder from a supysonic checkout), and this cannot be changed after the filesystem is created. Bad, bad, bad.
[...] any error can be detected, but cannot be corrected. This sounds like an acceptable compromise, but its actually not. The reason its not is that ZFS' metadata cannot be allowed to be corrupted. If it is it is likely the zpool will be impossible to mount (and will probably crash the system once the corruption is found). So a couple of bad sectors in the right place will mean that all data on the zpool will be lost. Not some, all. Also there's no ZFS recovery tools, so you cannot recover any data on the drives.
Compared with (say) ext4, where a single disk error can be recovered, this is pretty bad. But we are ready to live with this with the idea that we'll have hourly offline snapshots that we can easily recover from. It's a trade-off. Also, we're running this on an NVMe/M.2 drive which typically just blinks out of existence completely, and doesn't "bit rot" the way an HDD would.
Also, the FreeBSD handbook quick start doesn't have any warnings about their first example, which is with a single disk. So I am reassured at least.
ROOT
and BOOT
zfs create -o canmount=off -o mountpoint=none rpool/ROOT &&
zfs create -o canmount=off -o mountpoint=none bpool/BOOT
Note that it's unclear to me why those datasets are necessary, but
they seem common practice, also used in this FreeBSD
example. The OpenZFS guide mentions the Solaris upgrades and
Ubuntu's zsys that use that container for upgrades and rollbacks.
This blog post seems to explain a bit the layout behind the
installer.
zfs create -o canmount=noauto -o mountpoint=/ rpool/ROOT/debian &&
zfs mount rpool/ROOT/debian &&
zfs create -o mountpoint=/boot bpool/BOOT/debian
I guess the debian name here is because we could technically have
multiple operating systems with the same underlying datasets.
zfs create rpool/home &&
zfs create -o mountpoint=/root rpool/home/root &&
chmod 700 /mnt/root &&
zfs create rpool/var
zfs create -o com.sun:auto-snapshot=false rpool/var/cache &&
zfs create -o com.sun:auto-snapshot=false rpool/var/tmp &&
chmod 1777 /mnt/var/tmp
zfs create -o canmount=off rpool/var/lib &&
zfs create -o com.sun:auto-snapshot=false rpool/var/lib/docker
Notice here a peculiarity: we must create rpool/var/lib to
create rpool/var/lib/docker, otherwise we get this error:
cannot create 'rpool/var/lib/docker': parent does not exist
... and no, just creating /mnt/var/lib doesn't fix that
problem. In fact, it makes things even more confusing because an
existing directory shadows a mountpoint, which is the opposite of
how things normally work.
Also note that you will probably need to change the storage driver in
Docker, see the zfs-driver documentation for details but,
basically, I did:
echo '{ "storage-driver": "zfs" }' > /etc/docker/daemon.json
Note that podman has the same problem (and similar solution):
printf '[storage]\ndriver = "zfs"\n' > /etc/containers/storage.conf
tmpfs for /run:
mkdir /mnt/run &&
mount -t tmpfs tmpfs /mnt/run &&
mkdir /mnt/run/lock
/srv, as that's the HDD stuff.
Also mount the EFI partition:
mkfs.fat -F 32 /dev/sdc2 &&
mount /dev/sdc2 /mnt/boot/efi/
At this point, everything should be mounted in /mnt. It should look
like this:
root@curie:~# LANG=C df -h -t zfs -t vfat
Filesystem Size Used Avail Use% Mounted on
rpool/ROOT/debian 899G 384K 899G 1% /mnt
bpool/BOOT/debian 832M 123M 709M 15% /mnt/boot
rpool/home 899G 256K 899G 1% /mnt/home
rpool/home/root 899G 256K 899G 1% /mnt/root
rpool/var 899G 384K 899G 1% /mnt/var
rpool/var/cache 899G 256K 899G 1% /mnt/var/cache
rpool/var/tmp 899G 256K 899G 1% /mnt/var/tmp
rpool/var/lib/docker 899G 256K 899G 1% /mnt/var/lib/docker
/dev/sdc2 511M 4.0K 511M 1% /mnt/boot/efi
Now that we have everything setup and mounted, let's copy all files
over.
for fs in /boot/ /boot/efi/ / /home/; do
echo "syncing $fs to /mnt$fs..." &&
rsync -aSHAXx --info=progress2 --delete $fs /mnt$fs
done
You can check that the list is correct with:
mount -l -t ext4,btrfs,vfat | awk '{print $3}'
Note that we skip /srv as it's on a different disk.
On the first run, we had:
root@curie:~# for fs in /boot/ /boot/efi/ / /home/; do
echo "syncing $fs to /mnt$fs..." &&
rsync -aSHAXx --info=progress2 $fs /mnt$fs
done
syncing /boot/ to /mnt/boot/...
0 0% 0.00kB/s 0:00:00 (xfr#0, to-chk=0/299)
syncing /boot/efi/ to /mnt/boot/efi/...
16,831,437 100% 184.14MB/s 0:00:00 (xfr#101, to-chk=0/110)
syncing / to /mnt/...
28,019,293,280 94% 47.63MB/s 0:09:21 (xfr#703710, ir-chk=6748/839220)rsync: [generator] delete_file: rmdir(var/lib/docker) failed: Device or resource busy (16)
could not make way for new symlink: var/lib/docker
34,081,267,990 98% 50.71MB/s 0:10:40 (xfr#736577, to-chk=0/867732)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
syncing /home/ to /mnt/home/...
rsync: [sender] readlink_stat("/home/anarcat/.fuse") failed: Permission denied (13)
24,456,268,098 98% 68.03MB/s 0:05:42 (xfr#159867, ir-chk=6875/172377)
file has vanished: "/home/anarcat/.cache/mozilla/firefox/s2hwvqbu.quantum/cache2/entries/B3AB0CDA9C4454B3C1197E5A22669DF8EE849D90"
199,762,528,125 93% 74.82MB/s 0:42:26 (xfr#1437846, ir-chk=1018/1983979)rsync: [generator] recv_generator: mkdir "/mnt/home/anarcat/dist/supysonic/tests/assets/\#346" failed: Invalid or incomplete multibyte or wide character (84)
*** Skipping any contents from this failed directory ***
315,384,723,978 96% 76.82MB/s 1:05:15 (xfr#2256473, to-chk=0/2993950)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
Note the failure to transfer that supysonic file? It turns out they
had a weird filename in their source tree, since then removed,
but it still showed how the utf8only feature might not be such a bad
idea. At this point, the procedure was restarted all the way back to
"Creating pools", after unmounting all ZFS filesystems
(umount /mnt/run /mnt/boot/efi && umount -t zfs -a) and destroying
the pool, which, surprisingly, doesn't require any confirmation
(zpool destroy rpool).
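To find such filenames before a sync rather than during one, something like the following could work. This is a minimal sketch, not part of the procedure I actually ran: the function names are made up for illustration, and it assumes that "weird" means "not valid UTF-8", which matches the \#346 escape in the rsync error above (octal 346 is the lone byte 0xE6).

```python
import os


def is_valid_utf8(name: bytes) -> bool:
    """Return True if a raw filename decodes cleanly as UTF-8."""
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def find_invalid_names(root: bytes):
    """Yield paths under root whose last component is not valid UTF-8."""
    # Walking a bytes path makes os.walk return raw bytes names, so
    # nothing gets silently re-encoded on the way.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not is_valid_utf8(name):
                yield os.path.join(dirpath, name)


# The byte rsync printed as \#346 is not valid UTF-8 on its own:
print(is_valid_utf8(b"\xe6"))
# A properly encoded accented name is fine:
print(is_valid_utf8("é".encode("utf-8")))
```

Calling find_invalid_names(b"/home") would then list the offending paths, which can be renamed or removed before the pool is created with utf8only.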
The second run was cleaner:
root@curie:~# for fs in /boot/ /boot/efi/ / /home/; do
echo "syncing $fs to /mnt$fs..." &&
rsync -aSHAXx --info=progress2 --delete $fs /mnt$fs
done
syncing /boot/ to /mnt/boot/...
0 0% 0.00kB/s 0:00:00 (xfr#0, to-chk=0/299)
syncing /boot/efi/ to /mnt/boot/efi/...
0 0% 0.00kB/s 0:00:00 (xfr#0, to-chk=0/110)
syncing / to /mnt/...
28,019,033,070 97% 42.03MB/s 0:10:35 (xfr#703671, ir-chk=1093/833515)rsync: [generator] delete_file: rmdir(var/lib/docker) failed: Device or resource busy (16)
could not make way for new symlink: var/lib/docker
34,081,807,102 98% 44.84MB/s 0:12:04 (xfr#736580, to-chk=0/867723)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
syncing /home/ to /mnt/home/...
rsync: [sender] readlink_stat("/home/anarcat/.fuse") failed: Permission denied (13)
IO error encountered -- skipping file deletion
24,043,086,450 96% 62.03MB/s 0:06:09 (xfr#151819, ir-chk=15117/172571)
file has vanished: "/home/anarcat/.cache/mozilla/firefox/s2hwvqbu.quantum/cache2/entries/4C1FDBFEA976FF924D062FB990B24B897A77B84B"
315,423,626,507 96% 67.09MB/s 1:14:43 (xfr#2256845, to-chk=0/2994364)
rsync error: some files/attrs were not transferred (see previous errors) (code 23) at main.c(1333) [sender=3.2.3]
Also note the transfer speed: we seem capped at 76MB/s, or
608Mbit/s. This is not as fast as I was expecting: the USB connection
seems to be at around 5Gbps:
anarcat@curie:~$ lsusb -tv | head -4
/:  Bus 02.Port 1: Dev 1, Class=root_hub, Driver=xhci_hcd/6p, 5000M
    ID 1d6b:0003 Linux Foundation 3.0 root hub
    |__ Port 1: Dev 4, If 0, Class=Mass Storage, Driver=uas, 5000M
        ID 0b05:1932 ASUSTek Computer, Inc.
So it shouldn't cap at that speed. It's possible the USB adapter is
failing to give me the full speed though. It's not the M.2 SSD drive
either, as that has a ~500MB/s bandwidth, according to its spec.
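To put those numbers side by side, here is a quick back-of-the-envelope conversion. It is only a sketch: the 10-bits-per-byte figure assumes the 8b/10b line coding used by 5Gbps USB links and ignores protocol overhead, so the real ceiling is somewhat lower.

```python
def mb_per_s_to_mbit_per_s(mb_per_s: float) -> float:
    """Convert megabytes per second to megabits per second."""
    return mb_per_s * 8


observed_mb_per_s = 76    # peak rate reported by rsync above
usb_line_rate_mbit = 5000  # the "5000M" shown by lsusb

# Assumption: 8b/10b coding puts 10 bits on the wire per byte of data.
usb_payload_mb_per_s = usb_line_rate_mbit / 10

print(f"observed: {mb_per_s_to_mbit_per_s(observed_mb_per_s):.0f} Mbit/s")
print(f"USB payload ceiling: about {usb_payload_mb_per_s:.0f} MB/s")
print(f"link utilisation: {observed_mb_per_s / usb_payload_mb_per_s:.0%}")
```

Under those assumptions, the link and the SSD both top out around 500MB/s, so the transfer is using only a small fraction of either.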
At this point, we're about ready to do the final configuration. We
drop to single user mode and do the rest of the procedure. That used
to be shutdown now, but it seems like the systemd switch broke that,
so now you can reboot into grub and pick the "recovery"
option. Alternatively, you might try systemctl rescue, as I found
out.
I also wanted to copy the drive over to another new NVMe drive, but
that failed: it looks like the USB controller I have doesn't work with
older, non-NVME drives.
mount --rbind /dev /mnt/dev &&
mount --rbind /proc /mnt/proc &&
mount --rbind /sys /mnt/sys &&
chroot /mnt /bin/bash
Next we add an extra service that imports the bpool on boot, to make
sure it survives a zpool.cache destruction:
cat > /etc/systemd/system/zfs-import-bpool.service <<EOF
[Unit]
DefaultDependencies=no
Before=zfs-import-scan.service
Before=zfs-import-cache.service
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zpool import -N -o cachefile=none bpool
# Work-around to preserve zpool cache:
ExecStartPre=-/bin/mv /etc/zfs/zpool.cache /etc/zfs/preboot_zpool.cache
ExecStartPost=-/bin/mv /etc/zfs/preboot_zpool.cache /etc/zfs/zpool.cache
[Install]
WantedBy=zfs-import.target
EOF
Enable the service:
systemctl enable zfs-import-bpool.service
I had to trim down /etc/fstab and /etc/crypttab to only contain
references to the legacy filesystems (/srv is still BTRFS!).
If we don't already have a tmpfs defined in /etc/fstab:
ln -s /usr/share/systemd/tmp.mount /etc/systemd/system/ &&
systemctl enable tmp.mount
Rebuild the boot loader with support for ZFS, but also to work around
GRUB's missing zpool-features support:
grub-probe /boot | grep -q zfs &&
update-initramfs -c -k all &&
sed -i 's,GRUB_CMDLINE_LINUX.*,GRUB_CMDLINE_LINUX="root=ZFS=rpool/ROOT/debian",' /etc/default/grub &&
update-grub
For good measure, make sure the right disk is configured here; for
example, you might want to tag both drives in a RAID array:
dpkg-reconfigure grub-pc
Install grub to EFI while you're there:
grub-install --target=x86_64-efi --efi-directory=/boot/efi --bootloader-id=debian --recheck --no-floppy
Filesystem mount ordering. The rationale here in the OpenZFS
guide is a little strange, but I don't dare ignore that.
mkdir /etc/zfs/zfs-list.cache
touch /etc/zfs/zfs-list.cache/bpool
touch /etc/zfs/zfs-list.cache/rpool
zed -F &
Verify that zed updated the cache by making sure these are not empty:
cat /etc/zfs/zfs-list.cache/bpool
cat /etc/zfs/zfs-list.cache/rpool
Once the files have data, stop zed:
fg
Press Ctrl-C.
Fix the paths to eliminate /mnt:
sed -Ei "s|/mnt/?|/|" /etc/zfs/zfs-list.cache/*
Snapshot initial install:
zfs snapshot bpool/BOOT/debian@install
zfs snapshot rpool/ROOT/debian@install
Exit chroot:
exit
for fs in /boot/ /boot/efi/ / /home/; do
echo "syncing $fs to /mnt$fs..." &&
rsync -aSHAXx --info=progress2 --delete $fs /mnt$fs
done
Then we unmount all filesystems:
mount | grep -v zfs | tac | awk '/\/mnt/ {print $3}' | xargs -i{} umount -lf {}
zpool export -a
Reboot, swap the drives, and boot in ZFS. Hurray!
fio --name=randwrite4k1x --ioengine=posixaio --rw=randwrite --bs=4k --size=4g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
fio --name=randwrite64k16x --ioengine=posixaio --rw=randwrite --bs=64k --size=256m --numjobs=16 --iodepth=16 --runtime=60 --time_based --end_fsync=1
fio --name=randwrite1m1x --ioengine=posixaio --rw=randwrite --bs=1m --size=16g --numjobs=1 --iodepth=1 --runtime=60 --time_based --end_fsync=1
The fio tests run one by one, 60 seconds each. It should take about
12 minutes to run, as there are 3 pairs of tests, read/write, with
and without async.
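The 12 minute estimate can be checked by enumerating the test matrix. This is just a sketch: the workload labels are shorthand for the three block size and job count combinations, not fio options, and the empty string versus --fsync=1 pair stands in for the "with and without async" distinction.

```python
from itertools import product

# Shorthand labels for the three workloads (illustrative names only).
workloads = ["rand4k4g1x", "rand64k256m16x", "rand1m16g1x"]
directions = ["read", "write"]
fsync_modes = ["", "--fsync=1"]  # without and with a sync after each write
RUNTIME_SECONDS = 60             # matches --runtime=60 --time_based

matrix = list(product(workloads, directions, fsync_modes))
total_minutes = len(matrix) * RUNTIME_SECONDS / 60

for workload, direction, fsync in matrix:
    print(f"{direction:5} {workload}{fsync}")
print(f"{len(matrix)} runs, about {total_minutes:.0f} minutes")
```

That is 3 x 2 x 2 = 12 runs of one minute each, not counting the time fio spends laying out its test files.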
My bias, before building, running and analysing those results is that
ZFS should outperform the traditional stack on writes, but possibly
not on reads. It's also possible it outperforms it on both, because
it's a newer drive. A new test might be possible with a new external
USB drive as well, although I doubt I will find the time to do this.
systemctl rescue
The network might have been started before or after the test as well:
systemctl start systemd-networkd
So it should be fairly reliable as basically nothing else is running.
Raw numbers, from the job-curie-lvm.log, converted to MiB/s and
manually merged:
test | read I/O | read IOPS | write I/O | write IOPS |
---|---|---|---|---|
rand4k4g1x | 39.27 | 10052 | 212.15 | 54310 |
rand4k4g1x--fsync=1 | 39.29 | 10057 | 2.73 | 699 |
rand64k256m16x | 1297.00 | 20751 | 1068.57 | 17097 |
rand64k256m16x--fsync=1 | 1290.90 | 20654 | 353.82 | 5661 |
rand1m16g1x | 315.15 | 315 | 563.77 | 563 |
rand1m16g1x--fsync=1 | 345.88 | 345 | 157.01 | 157 |
test | read I/O | read IOPS | write I/O | write IOPS |
---|---|---|---|---|
rand4k4g1x | 77.20 | 19763 | 27.13 | 6944 |
rand4k4g1x--fsync=1 | 76.16 | 19495 | 6.53 | 1673 |
rand64k256m16x | 1882.40 | 30118 | 70.58 | 1129 |
rand64k256m16x--fsync=1 | 1865.13 | 29842 | 71.98 | 1151 |
rand1m16g1x | 921.62 | 921 | 102.21 | 102 |
rand1m16g1x--fsync=1 | 908.37 | 908 | 64.30 | 64 |
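The I/O and IOPS columns are two views of the same measurement: throughput is just IOPS multiplied by the block size. As a sanity check on the manual merge, here is a small sketch that recomputes a few cells from both tables (the helper name is mine, and small differences are expected since fio rounds IOPS).

```python
def mib_per_s(iops: int, block_size_kib: int) -> float:
    """Throughput in MiB/s for a given IOPS figure and block size in KiB."""
    return iops * block_size_kib / 1024


# (label, IOPS, block size in KiB, MiB/s as listed in the tables above)
samples = [
    ("first table, rand4k4g1x read", 10052, 4, 39.27),
    ("first table, rand4k4g1x write", 54310, 4, 212.15),
    ("first table, rand64k256m16x read", 20751, 64, 1297.00),
    ("second table, rand4k4g1x read", 19763, 4, 77.20),
    ("second table, rand1m16g1x read", 921, 1024, 921.62),
]

for label, iops, bs_kib, listed in samples:
    computed = mib_per_s(iops, bs_kib)
    print(f"{label}: computed {computed:.2f}, listed {listed:.2f}")
```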
May 16 14:42:52 curie systemd[1]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87\x2dinit-merged.mount: Succeeded.
May 16 14:42:52 curie systemd[5161]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87\x2dinit-merged.mount: Succeeded.
May 16 14:42:52 curie systemd[1]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
May 16 14:42:53 curie dockerd[1723]: time="2022-05-16T14:42:53.087219426-04:00" level=info msg="starting signal loop" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10 pid=151170
May 16 14:42:53 curie systemd[1]: Started libcontainer container af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10.
May 16 14:42:54 curie systemd[1]: docker-af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10.scope: Succeeded.
May 16 14:42:54 curie dockerd[1723]: time="2022-05-16T14:42:54.047297800-04:00" level=info msg="shim disconnected" id=af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10
May 16 14:42:54 curie dockerd[998]: time="2022-05-16T14:42:54.051365015-04:00" level=info msg="ignoring event" container=af22586fba07014a4d10ab19da10cf280db7a43cad804d6c1e9f2682f12b5f10 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
May 16 14:42:54 curie systemd[2444]: run-docker-netns-f5453c87c879.mount: Succeeded.
May 16 14:42:54 curie systemd[5161]: run-docker-netns-f5453c87c879.mount: Succeeded.
May 16 14:42:54 curie systemd[2444]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
May 16 14:42:54 curie systemd[5161]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
May 16 14:42:54 curie systemd[1]: run-docker-netns-f5453c87c879.mount: Succeeded.
May 16 14:42:54 curie systemd[1]: home-docker-overlay2-17e4d24228decc2d2d493efc401dbfb7ac29739da0e46775e122078d9daf3e87-merged.mount: Succeeded.
Translating this:
mai 30 15:31:39 curie systemd[1]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf\x2dinit.mount: Succeeded.
mai 30 15:31:39 curie systemd[5287]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf\x2dinit.mount: Succeeded.
mai 30 15:31:40 curie systemd[1]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded.
mai 30 15:31:40 curie systemd[5287]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded.
mai 30 15:31:41 curie dockerd[3199]: time="2022-05-30T15:31:41.551403693-04:00" level=info msg="starting signal loop" namespace=moby path=/run/docker/containerd/daemon/io.containerd.runtime.v2.task/moby/42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142 pid=141080
mai 30 15:31:41 curie systemd[1]: run-docker-runtime\x2drunc-moby-42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142-runc.ZVcjvl.mount: Succeeded.
mai 30 15:31:41 curie systemd[5287]: run-docker-runtime\x2drunc-moby-42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142-runc.ZVcjvl.mount: Succeeded.
mai 30 15:31:41 curie systemd[1]: Started libcontainer container 42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142.
mai 30 15:31:45 curie systemd[1]: docker-42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142.scope: Succeeded.
mai 30 15:31:45 curie dockerd[3199]: time="2022-05-30T15:31:45.883019128-04:00" level=info msg="shim disconnected" id=42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142
mai 30 15:31:45 curie dockerd[1726]: time="2022-05-30T15:31:45.883064491-04:00" level=info msg="ignoring event" container=42a1a1ed5912a7227148e997f442e7ab2e5cc3558aa3471548223c5888c9b142 module=libcontainerd namespace=moby topic=/tasks/delete type="*events.TaskDelete"
mai 30 15:31:45 curie systemd[1]: run-docker-netns-e45f5cf5f465.mount: Succeeded.
mai 30 15:31:45 curie systemd[5287]: run-docker-netns-e45f5cf5f465.mount: Succeeded.
mai 30 15:31:45 curie systemd[1]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded.
mai 30 15:31:45 curie systemd[5287]: var-lib-docker-zfs-graph-41ce08fb7a1d3a9c101694b82722f5621c0b4819bd1d9f070933fd1e00543cdf.mount: Succeeded.
That's double or triple the run time, from 2 seconds to 6
seconds. Most of the time is spent in run time, inside the
container. Here's the breakdown:
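A rough version of that breakdown can be rebuilt from the timestamps in the two journal excerpts above. This is a sketch with one-second resolution, so treat the phases as approximate; the phase names and the "overlay2" and "zfs" labels are mine, taken from the mount unit names in the logs.

```python
from datetime import datetime

FMT = "%H:%M:%S"


def seconds_between(start: str, end: str) -> int:
    """Whole seconds elapsed between two HH:MM:SS timestamps."""
    delta = datetime.strptime(end, FMT) - datetime.strptime(start, FMT)
    return int(delta.total_seconds())


# Timestamps copied from the journal excerpts above.
runs = {
    "overlay2": {"first_mount": "14:42:52", "started": "14:42:53",
                 "exited": "14:42:54", "last_unmount": "14:42:54"},
    "zfs": {"first_mount": "15:31:39", "started": "15:31:41",
            "exited": "15:31:45", "last_unmount": "15:31:45"},
}


def breakdown(events: dict) -> dict:
    """Split one container run into setup, runtime and teardown phases."""
    return {
        "setup": seconds_between(events["first_mount"], events["started"]),
        "inside container": seconds_between(events["started"], events["exited"]),
        "teardown": seconds_between(events["exited"], events["last_unmount"]),
        "total": seconds_between(events["first_mount"], events["last_unmount"]),
    }


for name, events in runs.items():
    print(name, breakdown(events))
```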
umount /mnt/boot/efi /mnt/boot/run
umount -a -t zfs
zpool export -a
And disconnected the drive, to see how I would recover this system
from another Linux system in case of a total motherboard failure.
To import an existing pool, plug in the device, then import the pool
with an alternate root so it doesn't mount over your existing
filesystems. Then mount the root filesystem and all the others:
zpool import -l -a -R /mnt &&
zfs mount rpool/ROOT/debian &&
zfs mount -a &&
mount /dev/sdc2 /mnt/boot/efi &&
mount -t tmpfs tmpfs /mnt/run &&
mkdir /mnt/run/lock
sgdisk, but I couldn't figure out how to do this with sgdisk, so
this uses sfdisk to dump the partition table from the first disk to
an external, identical drive:
sfdisk -d /dev/nvme0n1 | sfdisk --no-reread /dev/sda --force
zpool create \
-o cachefile=/etc/zfs/zpool.cache \
-o ashift=12 -d \
-o feature@async_destroy=enabled \
-o feature@bookmarks=enabled \
-o feature@embedded_data=enabled \
-o feature@empty_bpobj=enabled \
-o feature@enabled_txg=enabled \
-o feature@extensible_dataset=enabled \
-o feature@filesystem_limits=enabled \
-o feature@hole_birth=enabled \
-o feature@large_blocks=enabled \
-o feature@lz4_compress=enabled \
-o feature@spacemap_histogram=enabled \
-o feature@zpool_checkpoint=enabled \
-O acltype=posixacl -O xattr=sa \
-O compression=lz4 \
-O devices=off \
-O relatime=on \
-O canmount=off \
-O mountpoint=/boot -R /mnt \
bpool-tubman /dev/sdb3
The changes from the main boot pool are:
(sdb used to be the M.2 device, it's now nvme0n1)
zpool create \
-o ashift=12 \
-O encryption=on -O keylocation=prompt -O keyformat=passphrase \
-O acltype=posixacl -O xattr=sa -O dnodesize=auto \
-O compression=zstd \
-O relatime=on \
-O canmount=off \
-O mountpoint=/ -R /mnt \
rpool-tubman /dev/sdb4
sanoid command had a --readonly argument to simulate changes, but
syncoid didn't, so I tried to fix that with an upstream PR.
It seems it would be better to do this by hand, but this was much
easier. The full first sync was:
root@curie:/home/anarcat# ./bin/syncoid -r bpool bpool-tubman
CRITICAL ERROR: Target bpool-tubman exists but has no snapshots matching with bpool!
Replication to target would require destroying existing
target. Cowardly refusing to destroy your existing target.
NOTE: Target bpool-tubman dataset is < 64MB used - did you mistakenly run
zfs create bpool-tubman on the target? ZFS initial
replication must be to a NON EXISTENT DATASET, which will
then be CREATED BY the initial replication process.
INFO: Sending oldest full snapshot bpool/BOOT@test (~ 42 KB) to new target filesystem:
44.2KiB 0:00:00 [4.19MiB/s] [========================================================================================================================] 103%
INFO: Updating new target filesystem with incremental bpool/BOOT@test ... syncoid_curie_2022-05-30:12:50:39 (~ 4 KB):
2.13KiB 0:00:00 [ 114KiB/s] [===============================================================> ] 53%
INFO: Sending oldest full snapshot bpool/BOOT/debian@install (~ 126.0 MB) to new target filesystem:
126MiB 0:00:00 [ 308MiB/s] [=======================================================================================================================>] 100%
INFO: Updating new target filesystem with incremental bpool/BOOT/debian@install ... syncoid_curie_2022-05-30:12:50:39 (~ 113.4 MB):
113MiB 0:00:00 [ 315MiB/s] [=======================================================================================================================>] 100%
root@curie:/home/anarcat# ./bin/syncoid -r rpool rpool-tubman
CRITICAL ERROR: Target rpool-tubman exists but has no snapshots matching with rpool!
Replication to target would require destroying existing
target. Cowardly refusing to destroy your existing target.
NOTE: Target rpool-tubman dataset is < 64MB used - did you mistakenly run
zfs create rpool-tubman on the target? ZFS initial
replication must be to a NON EXISTENT DATASET, which will
then be CREATED BY the initial replication process.
INFO: Sending oldest full snapshot rpool/ROOT@syncoid_curie_2022-05-30:12:50:51 (~ 69 KB) to new target filesystem:
44.2KiB 0:00:00 [2.44MiB/s] [===========================================================================> ] 63%
INFO: Sending oldest full snapshot rpool/ROOT/debian@install (~ 25.9 GB) to new target filesystem:
25.9GiB 0:03:33 [ 124MiB/s] [=======================================================================================================================>] 100%
INFO: Updating new target filesystem with incremental rpool/ROOT/debian@install ... syncoid_curie_2022-05-30:12:50:52 (~ 3.9 GB):
3.92GiB 0:00:33 [ 119MiB/s] [======================================================================================================================> ] 99%
INFO: Sending oldest full snapshot rpool/home@syncoid_curie_2022-05-30:12:55:04 (~ 276.8 GB) to new target filesystem:
277GiB 0:27:13 [ 174MiB/s] [=======================================================================================================================>] 100%
INFO: Sending oldest full snapshot rpool/home/root@syncoid_curie_2022-05-30:13:22:19 (~ 2.2 GB) to new target filesystem:
2.22GiB 0:00:25 [90.2MiB/s] [=======================================================================================================================>] 100%
INFO: Sending oldest full snapshot rpool/var@syncoid_curie_2022-05-30:13:22:47 (~ 5.6 GB) to new target filesystem:
5.56GiB 0:00:32 [ 176MiB/s] [=======================================================================================================================>] 100%
INFO: Sending oldest full snapshot rpool/var/cache@syncoid_curie_2022-05-30:13:23:22 (~ 627.3 MB) to new target filesystem:
627MiB 0:00:03 [ 169MiB/s] [=======================================================================================================================>] 100%
INFO: Sending oldest full snapshot rpool/var/lib@syncoid_curie_2022-05-30:13:23:28 (~ 69 KB) to new target filesystem:
44.2KiB 0:00:00 [1.40MiB/s] [===========================================================================> ] 63%
INFO: Sending oldest full snapshot rpool/var/lib/docker@syncoid_curie_2022-05-30:13:23:28 (~ 442.6 MB) to new target filesystem:
443MiB 0:00:04 [ 103MiB/s] [=======================================================================================================================>] 100%
INFO: Sending oldest full snapshot rpool/var/lib/docker/05c0de7fabbea60500eaa495d0d82038249f6faa63b12914737c4d71520e62c5@266253254 (~ 6.3 MB) to new target filesystem:
6.49MiB 0:00:00 [12.9MiB/s] [========================================================================================================================] 102%
INFO: Updating new target filesystem with incremental rpool/var/lib/docker/05c0de7fabbea60500eaa495d0d82038249f6faa63b12914737c4d71520e62c5@266253254 ... syncoid_curie_2022-05-30:13:23:34 (~ 4 KB):
1.52KiB 0:00:00 [27.6KiB/s] [============================================> ] 38%
INFO: Sending oldest full snapshot rpool/var/lib/flatpak@syncoid_curie_2022-05-30:13:23:36 (~ 2.0 GB) to new target filesystem:
2.00GiB 0:00:17 [ 115MiB/s] [=======================================================================================================================>] 100%
INFO: Sending oldest full snapshot rpool/var/tmp@syncoid_curie_2022-05-30:13:23:55 (~ 57.0 MB) to new target filesystem:
61.8MiB 0:00:01 [45.0MiB/s] [========================================================================================================================] 108%
INFO: Clone is recreated on target rpool-tubman/var/lib/docker/ed71ddd563a779ba6fb37b3b1d0cc2c11eca9b594e77b4b234867ebcb162b205 based on rpool/var/lib/docker/05c0de7fabbea60500eaa495d0d82038249f6faa63b12914737c4d71520e62c5@266253254
INFO: Sending oldest full snapshot rpool/var/lib/docker/ed71ddd563a779ba6fb37b3b1d0cc2c11eca9b594e77b4b234867ebcb162b205@syncoid_curie_2022-05-30:13:23:58 (~ 218.6 MB) to new target filesystem:
219MiB 0:00:01 [ 151MiB/s] [=======================================================================================================================>] 100%
Funny how the CRITICAL ERROR doesn't actually stop syncoid: it just
carries on merrily while telling you it's "cowardly refusing to
destroy your existing target"... Maybe that's because my pull
request broke something though...
During the transfer, the computer was very sluggish: everything felt
like it had an extra ~30-50ms of latency:
anarcat@curie:sanoid$ LANG=C top -b -n 1 | head -20
top - 13:07:05 up 6 days, 4:01, 1 user, load average: 16.13, 16.55, 11.83
Tasks: 606 total, 6 running, 598 sleeping, 0 stopped, 2 zombie
%Cpu(s): 18.8 us, 72.5 sy, 1.2 ni, 5.0 id, 1.2 wa, 0.0 hi, 1.2 si, 0.0 st
MiB Mem : 15898.4 total, 1387.6 free, 13170.0 used, 1340.8 buff/cache
MiB Swap: 0.0 total, 0.0 free, 0.0 used. 1319.8 avail Mem
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
70 root 20 0 0 0 0 S 83.3 0.0 6:12.67 kswapd0
4024878 root 20 0 282644 96432 10288 S 44.4 0.6 0:11.43 puppet
3896136 root 20 0 35328 16528 48 S 22.2 0.1 2:08.04 mbuffer
3896135 root 20 0 10328 776 168 R 16.7 0.0 1:22.93 zfs
3896138 root 20 0 10588 788 156 R 16.7 0.0 1:49.30 zfs
350 root 0 -20 0 0 0 R 11.1 0.0 1:03.53 z_rd_int
351 root 0 -20 0 0 0 S 11.1 0.0 1:04.15 z_rd_int
3896137 root 20 0 4384 352 244 R 11.1 0.0 0:44.73 pv
4034094 anarcat 30 10 20028 13960 2428 S 11.1 0.1 0:00.70 mbsync
4036539 anarcat 20 0 9604 3464 2408 R 11.1 0.0 0:00.04 top
352 root 0 -20 0 0 0 S 5.6 0.0 1:03.64 z_rd_int
353 root 0 -20 0 0 0 S 5.6 0.0 1:03.64 z_rd_int
354 root 0 -20 0 0 0 S 5.6 0.0 1:04.01 z_rd_int
I wonder how much of that is due to syncoid, particularly because I
often saw mbuffer and pv in there, which are not strictly necessary
for that kind of operation, as far as I understand.
Once that's done, export the pools to disconnect the drive:
zpool export bpool-tubman
zpool export rpool-tubman
anarcat@curie:~$ sudo dd if=/dev/sdb of=/dev/sdc bs=4M status=progress conv=fdatasync
499944259584 bytes (500 GB, 466 GiB) copied, 1713 s, 292 MB/s
119235+1 records in
119235+1 records out
500107862016 bytes (500 GB, 466 GiB) copied, 1719.93 s, 291 MB/s
... while both over USB, whoohoo 300MB/s!
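The dd summary is easy to double-check, which also shows where the "+1" in the record counts comes from. A small sketch using only the numbers printed above:

```python
TOTAL_BYTES = 500_107_862_016  # final byte count reported by dd
ELAPSED_SECONDS = 1719.93      # final elapsed time reported by dd
BLOCK_SIZE = 4 * 1024**2       # bs=4M


def rate_mb_per_s(total_bytes: int, seconds: float) -> float:
    """Average throughput in decimal megabytes per second, as dd reports it."""
    return total_bytes / seconds / 1e6


full_records, leftover_bytes = divmod(TOTAL_BYTES, BLOCK_SIZE)

print(f"{rate_mb_per_s(TOTAL_BYTES, ELAPSED_SECONDS):.0f} MB/s")
print(f"{full_records} full records plus {leftover_bytes} leftover bytes")
```

The leftover bytes are the partial final record, which dd counts as the "+1".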
systemctl enable zfs-scrub-weekly@rpool.timer --now
systemctl enable zfs-scrub-monthly@rpool.timer --now
When the scrub runs, if it finds anything it will send an event, which
gets picked up by the zed daemon, which in turn sends a
notification. See below for an example.
TODO: deploy on curie, if possible (probably not because no RAID)
TODO: this should be in Puppet
Date: Sun, 09 Oct 2022 00:58:08 -0400
From: root <root@anarc.at>
To: root@anarc.at
Subject: ZFS scrub_finish event for rpool on tubman
ZFS has finished a scrub:
eid: 39536
class: scrub_finish
host: tubman
time: 2022-10-09 00:58:07-0400
pool: rpool
state: ONLINE
status: One or more devices has experienced an unrecoverable error. An
attempt was made to correct the error. Applications are unaffected.
action: Determine if the device needs to be replaced, and clear the errors
using 'zpool clear' or replace the device with 'zpool replace'.
see: https://openzfs.github.io/openzfs-docs/msg/ZFS-8000-9P
scan: scrub repaired 0B in 00:33:57 with 0 errors on Sun Oct 9 00:58:07 2022
config:
NAME STATE READ WRITE CKSUM
rpool ONLINE 0 0 0
mirror-0 ONLINE 0 0 0
sdb4 ONLINE 0 1 0
sdc4 ONLINE 0 0 0
cache
sda3 ONLINE 0 0 0
errors: No known data errors
This, in itself, is a little worrisome. But it helpfully links to this
more detailed documentation (and props up there: the link still
works) which explains this is a "minor" problem (something that could
be included in the report).
In this case, this happened on a server set up on 2021-04-28, but the
disks and server hardware are much older. The server itself
(marcos v1) was built around 2011, over 10 years ago now. The hard
drive in question is:
root@tubman:~# smartctl -i -qnoserial /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-15-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate BarraCuda 3.5
Device Model: ST4000DM004-2CV104
Firmware Version: 0001
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5425 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Tue Oct 11 11:02:32 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
Some more SMART stats:
root@tubman:~# smartctl -a -qnoserial /dev/sdb | grep -e Head_Flying_Hours -e Power_On_Hours -e Total_LBA -e 'Sector Sizes'
Sector Sizes: 512 bytes logical, 4096 bytes physical
9 Power_On_Hours 0x0032 086 086 000 Old_age Always - 12464 (206 202 0)
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 10966h+55m+23.757s
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 21107792664
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 3201579750
That's over a year of power-on time, which shouldn't be so bad. It has
written almost 11TB of data (21107792664 LBAs * 512 byte/LBA), which
is between two and three full writes of this 4TB drive. According to
its specification, this device is supposed to support 55 TB/year of
writes, so we're far below spec. Note that we are still far from the
"non-recoverable read error per bits" spec (1 per 10E15), as we've
basically read 13E12 bits
(3201579750 LBAs * 512 byte/LBA * 8 bit/byte = 13E12 bits).
It's likely this disk was made in 2018, so it is in its fourth
year.
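Here is the arithmetic behind those figures, as a sketch. It assumes, as the text above does, that the Total_LBAs counters are in units of the 512-byte logical sector size; the variable names are mine.

```python
LBA_BYTES = 512                      # logical sector size from smartctl
CAPACITY_BYTES = 4_000_787_030_016   # "User Capacity" from smartctl -i

lbas_written = 21_107_792_664        # Total_LBAs_Written
lbas_read = 3_201_579_750            # Total_LBAs_Read
power_on_hours = 12_464              # Power_On_Hours

written_bytes = lbas_written * LBA_BYTES
full_drive_writes = written_bytes / CAPACITY_BYTES
read_bits = lbas_read * LBA_BYTES * 8
power_on_years = power_on_hours / (24 * 365.25)

print(f"written: {written_bytes / 1e12:.1f} TB")
print(f"full drive writes: {full_drive_writes:.1f}")
print(f"read: {read_bits / 1e12:.0f}E12 bits")
print(f"powered on: {power_on_years:.2f} years")
```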
Interestingly, /dev/sdc
is also a Seagate drive, but of a different
series:
root@tubman:~# smartctl -qnoserial -i /dev/sdb
smartctl 7.2 2020-12-30 r5155 [x86_64-linux-5.10.0-15-amd64] (local build)
Copyright (C) 2002-20, Bruce Allen, Christian Franke, www.smartmontools.org
=== START OF INFORMATION SECTION ===
Model Family: Seagate BarraCuda 3.5
Device Model: ST4000DM004-2CV104
Firmware Version: 0001
User Capacity: 4,000,787,030,016 bytes [4.00 TB]
Sector Sizes: 512 bytes logical, 4096 bytes physical
Rotation Rate: 5425 rpm
Form Factor: 3.5 inches
Device is: In smartctl database [for details use: -P show]
ATA Version is: ACS-3 T13/2161-D revision 5
SATA Version is: SATA 3.1, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is: Tue Oct 11 11:21:35 2022 EDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled
It has seen many more reads than the other disk, which is also interesting:
root@tubman:~# smartctl -a -qnoserial /dev/sdc | grep -e Head_Flying_Hours -e Power_On_Hours -e Total_LBA -e 'Sector Sizes'
Sector Sizes: 512 bytes logical, 4096 bytes physical
9 Power_On_Hours 0x0032 059 059 000 Old_age Always - 36240
240 Head_Flying_Hours 0x0000 100 253 000 Old_age Offline - 33994h+10m+52.118s
241 Total_LBAs_Written 0x0000 100 253 000 Old_age Offline - 30730174438
242 Total_LBAs_Read 0x0000 100 253 000 Old_age Offline - 51894566538
That's almost 4 years of Head_Flying_Hours, and over 4 years (4 years
and 48 days) of Power_On_Hours. The copyright date on that drive's
specs goes back to 2016, so it's a much older drive.
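For comparison, this sketch converts the hour counters of the second drive into years and works out how many more reads it has seen than the first drive. The 365.25-day year is my choice, so the day counts can differ by a day or two from other ways of counting.

```python
HOURS_PER_YEAR = 24 * 365.25


def hours_to_years(hours: float) -> float:
    """Convert a SMART hour counter into (fractional) years."""
    return hours / HOURS_PER_YEAR


sdc_power_on_hours = 36_240       # Power_On_Hours on /dev/sdc
sdc_head_flying_hours = 33_994    # Head_Flying_Hours, minutes dropped
sdc_lbas_read = 51_894_566_538    # Total_LBAs_Read on /dev/sdc
sdb_lbas_read = 3_201_579_750     # Total_LBAs_Read on /dev/sdb

read_ratio = sdc_lbas_read / sdb_lbas_read

print(f"power on: {hours_to_years(sdc_power_on_hours):.2f} years")
print(f"head flying: {hours_to_years(sdc_head_flying_hours):.2f} years")
print(f"sdc has read {read_ratio:.1f} times as many LBAs as sdb")
```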
SMART self-test succeeded.
fio. Right now, I'm just cargo-culting stuff from other folks and I
don't really like it. stressant is a good example of my struggles,
in the sense that it doesn't really work that well for disk tests.
I would love to have just a single .fio job file that lists multiple
jobs to run serially. For example, this file describes the above
workload pretty well:
[global]
# cargo-culting Salter
fallocate=none
ioengine=posixaio
runtime=60
time_based=1
end_fsync=1
stonewall=1
group_reporting=1
# no need to drop caches, done by default
# invalidate=1
# Single 4KiB random read/write process
[randread-4k-4g-1x]
rw=randread
bs=4k
size=4g
numjobs=1
iodepth=1
[randwrite-4k-4g-1x]
rw=randwrite
bs=4k
size=4g
numjobs=1
iodepth=1
# 16 parallel 64KiB random read/write processes:
[randread-64k-256m-16x]
rw=randread
bs=64k
size=256m
numjobs=16
iodepth=16
[randwrite-64k-256m-16x]
rw=randwrite
bs=64k
size=256m
numjobs=16
iodepth=16
# Single 1MiB random read/write process
[randread-1m-16g-1x]
rw=randread
bs=1m
size=16g
numjobs=1
iodepth=1
[randwrite-1m-16g-1x]
rw=randwrite
bs=1m
size=16g
numjobs=1
iodepth=1
... except the jobs are actually started in parallel, even though they
are stonewall'd, as far as I can tell by the reports. I sent a
mail to the fio mailing list for clarification.
It looks like the jobs are started in parallel, but actually
(correctly) run serially. It seems like this might just be a matter of
reporting the right timestamps in the end, although it does feel like
starting all the processes (even if not doing any work yet) could
skew the results.
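If the jobs do run serially, the total wall-clock time should be roughly the number of job sections multiplied by the shared runtime. The sketch below estimates that from an abbreviated copy of the job file. It leans on Python's configparser, which is only an approximation of fio's own parser (an assumption that happens to hold for this simple file), and the function name is mine.

```python
import configparser

# Abbreviated copy of the job file above: same sections, fewer options.
JOB_FILE = """
[global]
ioengine=posixaio
runtime=60
time_based=1
stonewall=1

[randread-4k-4g-1x]
rw=randread
[randwrite-4k-4g-1x]
rw=randwrite
[randread-64k-256m-16x]
rw=randread
[randwrite-64k-256m-16x]
rw=randwrite
[randread-1m-16g-1x]
rw=randread
[randwrite-1m-16g-1x]
rw=randwrite
"""


def serial_runtime_seconds(text: str) -> int:
    """Sum the runtime of every job section, assuming they run one by one."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    default_runtime = parser.getint("global", "runtime", fallback=0)
    jobs = [name for name in parser.sections() if name != "global"]
    # A job inherits runtime from [global] unless it sets its own.
    return sum(
        parser.getint(name, "runtime", fallback=default_runtime)
        for name in jobs
    )


total = serial_runtime_seconds(JOB_FILE)
print(f"expected serial runtime: {total} seconds ({total / 60:.0f} minutes)")
```

If the reported timestamps add up to much less than that, the jobs overlapped; if they match, only the start times in the report are misleading.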
sdc to sdd, for example), and this would greatly confuse ZFS.
Here, for example, is sdd reappearing out of the blue:
May 19 11:22:53 curie kernel: [ 699.820301] scsi host4: uas
May 19 11:22:53 curie kernel: [ 699.820544] usb 2-1: authorized to connect
May 19 11:22:53 curie kernel: [ 699.922433] scsi 4:0:0:0: Direct-Access ROG ESD-S1C 0 PQ: 0 ANSI: 6
May 19 11:22:53 curie kernel: [ 699.923235] sd 4:0:0:0: Attached scsi generic sg2 type 0
May 19 11:22:53 curie kernel: [ 699.923676] sd 4:0:0:0: [sdd] 1953525168 512-byte logical blocks: (1.00 TB/932 GiB)
May 19 11:22:53 curie kernel: [ 699.923788] sd 4:0:0:0: [sdd] Write Protect is off
May 19 11:22:53 curie kernel: [ 699.923949] sd 4:0:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
May 19 11:22:53 curie kernel: [ 699.924149] sd 4:0:0:0: [sdd] Optimal transfer size 33553920 bytes
May 19 11:22:53 curie kernel: [ 699.961602] sdd: sdd1 sdd2 sdd3 sdd4
May 19 11:22:53 curie kernel: [ 699.996083] sd 4:0:0:0: [sdd] Attached SCSI disk
Next time I run a ZFS command (say zpool list), the command
completely hangs (D state) and this comes up in the logs:
May 19 11:34:21 curie kernel: [ 1387.914843] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=71344128 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.914859] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=205565952 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.914874] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=272789504 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.914906] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=270336 size=8192 flags=b08c1
May 19 11:34:21 curie kernel: [ 1387.914932] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=1073225728 size=8192 flags=b08c1
May 19 11:34:21 curie kernel: [ 1387.914948] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=1073487872 size=8192 flags=b08c1
May 19 11:34:21 curie kernel: [ 1387.915165] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=272793600 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.915183] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=339853312 size=4096 flags=184880
May 19 11:34:21 curie kernel: [ 1387.915648] WARNING: Pool 'bpool' has encountered an uncorrectable I/O failure and has been suspended.
May 19 11:34:21 curie kernel: [ 1387.915648]
May 19 11:37:25 curie kernel: [ 1571.558614] task:txg_sync state:D stack: 0 pid: 997 ppid: 2 flags:0x00004000
May 19 11:37:25 curie kernel: [ 1571.558623] Call Trace:
May 19 11:37:25 curie kernel: [ 1571.558640] __schedule+0x282/0x870
May 19 11:37:25 curie kernel: [ 1571.558650] schedule+0x46/0xb0
May 19 11:37:25 curie kernel: [ 1571.558670] schedule_timeout+0x8b/0x140
May 19 11:37:25 curie kernel: [ 1571.558675] ? __next_timer_interrupt+0x110/0x110
May 19 11:37:25 curie kernel: [ 1571.558678] io_schedule_timeout+0x4c/0x80
May 19 11:37:25 curie kernel: [ 1571.558689] __cv_timedwait_common+0x12b/0x160 [spl]
May 19 11:37:25 curie kernel: [ 1571.558694] ? add_wait_queue_exclusive+0x70/0x70
May 19 11:37:25 curie kernel: [ 1571.558702] __cv_timedwait_io+0x15/0x20 [spl]
May 19 11:37:25 curie kernel: [ 1571.558816] zio_wait+0x129/0x2b0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.558929] dsl_pool_sync+0x461/0x4f0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559032] spa_sync+0x575/0xfa0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559138] ? spa_txg_history_init_io+0x101/0x110 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559245] txg_sync_thread+0x2e0/0x4a0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559354] ? txg_fini+0x240/0x240 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559366] thread_generic_wrapper+0x6f/0x80 [spl]
May 19 11:37:25 curie kernel: [ 1571.559376] ? __thread_exit+0x20/0x20 [spl]
May 19 11:37:25 curie kernel: [ 1571.559379] kthread+0x11b/0x140
May 19 11:37:25 curie kernel: [ 1571.559382] ? __kthread_bind_mask+0x60/0x60
May 19 11:37:25 curie kernel: [ 1571.559386] ret_from_fork+0x22/0x30
May 19 11:37:25 curie kernel: [ 1571.559401] task:zed state:D stack: 0 pid: 1564 ppid: 1 flags:0x00000000
May 19 11:37:25 curie kernel: [ 1571.559404] Call Trace:
May 19 11:37:25 curie kernel: [ 1571.559409] __schedule+0x282/0x870
May 19 11:37:25 curie kernel: [ 1571.559412] ? __kmalloc_node+0x141/0x2b0
May 19 11:37:25 curie kernel: [ 1571.559417] schedule+0x46/0xb0
May 19 11:37:25 curie kernel: [ 1571.559420] schedule_preempt_disabled+0xa/0x10
May 19 11:37:25 curie kernel: [ 1571.559424] __mutex_lock.constprop.0+0x133/0x460
May 19 11:37:25 curie kernel: [ 1571.559435] ? nvlist_xalloc.part.0+0x68/0xc0 [znvpair]
May 19 11:37:25 curie kernel: [ 1571.559537] spa_all_configs+0x41/0x120 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559644] zfs_ioc_pool_configs+0x17/0x70 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559752] zfsdev_ioctl_common+0x697/0x870 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559758] ? _copy_from_user+0x28/0x60
May 19 11:37:25 curie kernel: [ 1571.559860] zfsdev_ioctl+0x53/0xe0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.559866] __x64_sys_ioctl+0x83/0xb0
May 19 11:37:25 curie kernel: [ 1571.559869] do_syscall_64+0x33/0x80
May 19 11:37:25 curie kernel: [ 1571.559873] entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 19 11:37:25 curie kernel: [ 1571.559876] RIP: 0033:0x7fcf0ef32cc7
May 19 11:37:25 curie kernel: [ 1571.559878] RSP: 002b:00007fcf0e181618 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
May 19 11:37:25 curie kernel: [ 1571.559881] RAX: ffffffffffffffda RBX: 000055b212f972a0 RCX: 00007fcf0ef32cc7
May 19 11:37:25 curie kernel: [ 1571.559883] RDX: 00007fcf0e181640 RSI: 0000000000005a04 RDI: 000000000000000b
May 19 11:37:25 curie kernel: [ 1571.559885] RBP: 00007fcf0e184c30 R08: 00007fcf08016810 R09: 00007fcf08000080
May 19 11:37:25 curie kernel: [ 1571.559886] R10: 0000000000080000 R11: 0000000000000246 R12: 000055b212f972a0
May 19 11:37:25 curie kernel: [ 1571.559888] R13: 0000000000000000 R14: 00007fcf0e181640 R15: 0000000000000000
May 19 11:37:25 curie kernel: [ 1571.559980] task:zpool state:D stack: 0 pid:11815 ppid: 3816 flags:0x00004000
May 19 11:37:25 curie kernel: [ 1571.559983] Call Trace:
May 19 11:37:25 curie kernel: [ 1571.559988] __schedule+0x282/0x870
May 19 11:37:25 curie kernel: [ 1571.559992] schedule+0x46/0xb0
May 19 11:37:25 curie kernel: [ 1571.559995] io_schedule+0x42/0x70
May 19 11:37:25 curie kernel: [ 1571.560004] cv_wait_common+0xac/0x130 [spl]
May 19 11:37:25 curie kernel: [ 1571.560008] ? add_wait_queue_exclusive+0x70/0x70
May 19 11:37:25 curie kernel: [ 1571.560118] txg_wait_synced_impl+0xc9/0x110 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560223] txg_wait_synced+0xc/0x40 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560325] spa_export_common+0x4cd/0x590 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560430] ? zfs_log_history+0x9c/0xf0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560537] zfsdev_ioctl_common+0x697/0x870 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560543] ? _copy_from_user+0x28/0x60
May 19 11:37:25 curie kernel: [ 1571.560644] zfsdev_ioctl+0x53/0xe0 [zfs]
May 19 11:37:25 curie kernel: [ 1571.560649] __x64_sys_ioctl+0x83/0xb0
May 19 11:37:25 curie kernel: [ 1571.560653] do_syscall_64+0x33/0x80
May 19 11:37:25 curie kernel: [ 1571.560656] entry_SYSCALL_64_after_hwframe+0x44/0xa9
May 19 11:37:25 curie kernel: [ 1571.560659] RIP: 0033:0x7fdc23be2cc7
May 19 11:37:25 curie kernel: [ 1571.560661] RSP: 002b:00007ffc8c792478 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
May 19 11:37:25 curie kernel: [ 1571.560664] RAX: ffffffffffffffda RBX: 000055942ca49e20 RCX: 00007fdc23be2cc7
May 19 11:37:25 curie kernel: [ 1571.560666] RDX: 00007ffc8c792490 RSI: 0000000000005a03 RDI: 0000000000000003
May 19 11:37:25 curie kernel: [ 1571.560667] RBP: 00007ffc8c795e80 R08: 00000000ffffffff R09: 00007ffc8c792310
May 19 11:37:25 curie kernel: [ 1571.560669] R10: 000055942ca49e30 R11: 0000000000000246 R12: 00007ffc8c792490
May 19 11:37:25 curie kernel: [ 1571.560671] R13: 000055942ca49e30 R14: 000055942aed2c20 R15: 00007ffc8c795a40
Here's another example, where you can see the USB controller dropping
out of existence while the ZFS commands stay blocked:
mai 19 11:38:39 curie kernel: usb 2-1: USB disconnect, device number 2
mai 19 11:38:39 curie kernel: sd 4:0:0:0: [sdd] Synchronizing SCSI cache
mai 19 11:38:39 curie kernel: sd 4:0:0:0: [sdd] Synchronize Cache(10) failed: Result: hostbyte=DID_ERROR driverbyte=DRIVER_OK
mai 19 11:39:25 curie kernel: INFO: task zed:1564 blocked for more than 241 seconds.
mai 19 11:39:25 curie kernel: Tainted: P IOE 5.10.0-14-amd64 #1 Debian 5.10.113-1
mai 19 11:39:25 curie kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mai 19 11:39:25 curie kernel: task:zed state:D stack: 0 pid: 1564 ppid: 1 flags:0x00000000
mai 19 11:39:25 curie kernel: Call Trace:
mai 19 11:39:25 curie kernel: __schedule+0x282/0x870
mai 19 11:39:25 curie kernel: ? __kmalloc_node+0x141/0x2b0
mai 19 11:39:25 curie kernel: schedule+0x46/0xb0
mai 19 11:39:25 curie kernel: schedule_preempt_disabled+0xa/0x10
mai 19 11:39:25 curie kernel: __mutex_lock.constprop.0+0x133/0x460
mai 19 11:39:25 curie kernel: ? nvlist_xalloc.part.0+0x68/0xc0 [znvpair]
mai 19 11:39:25 curie kernel: spa_all_configs+0x41/0x120 [zfs]
mai 19 11:39:25 curie kernel: zfs_ioc_pool_configs+0x17/0x70 [zfs]
mai 19 11:39:25 curie kernel: zfsdev_ioctl_common+0x697/0x870 [zfs]
mai 19 11:39:25 curie kernel: ? _copy_from_user+0x28/0x60
mai 19 11:39:25 curie kernel: zfsdev_ioctl+0x53/0xe0 [zfs]
mai 19 11:39:25 curie kernel: __x64_sys_ioctl+0x83/0xb0
mai 19 11:39:25 curie kernel: do_syscall_64+0x33/0x80
mai 19 11:39:25 curie kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
mai 19 11:39:25 curie kernel: RIP: 0033:0x7fcf0ef32cc7
mai 19 11:39:25 curie kernel: RSP: 002b:00007fcf0e181618 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
mai 19 11:39:25 curie kernel: RAX: ffffffffffffffda RBX: 000055b212f972a0 RCX: 00007fcf0ef32cc7
mai 19 11:39:25 curie kernel: RDX: 00007fcf0e181640 RSI: 0000000000005a04 RDI: 000000000000000b
mai 19 11:39:25 curie kernel: RBP: 00007fcf0e184c30 R08: 00007fcf08016810 R09: 00007fcf08000080
mai 19 11:39:25 curie kernel: R10: 0000000000080000 R11: 0000000000000246 R12: 000055b212f972a0
mai 19 11:39:25 curie kernel: R13: 0000000000000000 R14: 00007fcf0e181640 R15: 0000000000000000
mai 19 11:39:25 curie kernel: INFO: task zpool:11815 blocked for more than 241 seconds.
mai 19 11:39:25 curie kernel: Tainted: P IOE 5.10.0-14-amd64 #1 Debian 5.10.113-1
mai 19 11:39:25 curie kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mai 19 11:39:25 curie kernel: task:zpool state:D stack: 0 pid:11815 ppid: 2621 flags:0x00004004
mai 19 11:39:25 curie kernel: Call Trace:
mai 19 11:39:25 curie kernel: __schedule+0x282/0x870
mai 19 11:39:25 curie kernel: schedule+0x46/0xb0
mai 19 11:39:25 curie kernel: io_schedule+0x42/0x70
mai 19 11:39:25 curie kernel: cv_wait_common+0xac/0x130 [spl]
mai 19 11:39:25 curie kernel: ? add_wait_queue_exclusive+0x70/0x70
mai 19 11:39:25 curie kernel: txg_wait_synced_impl+0xc9/0x110 [zfs]
mai 19 11:39:25 curie kernel: txg_wait_synced+0xc/0x40 [zfs]
mai 19 11:39:25 curie kernel: spa_export_common+0x4cd/0x590 [zfs]
mai 19 11:39:25 curie kernel: ? zfs_log_history+0x9c/0xf0 [zfs]
mai 19 11:39:25 curie kernel: zfsdev_ioctl_common+0x697/0x870 [zfs]
mai 19 11:39:25 curie kernel: ? _copy_from_user+0x28/0x60
mai 19 11:39:25 curie kernel: zfsdev_ioctl+0x53/0xe0 [zfs]
mai 19 11:39:25 curie kernel: __x64_sys_ioctl+0x83/0xb0
mai 19 11:39:25 curie kernel: do_syscall_64+0x33/0x80
mai 19 11:39:25 curie kernel: entry_SYSCALL_64_after_hwframe+0x44/0xa9
mai 19 11:39:25 curie kernel: RIP: 0033:0x7fdc23be2cc7
mai 19 11:39:25 curie kernel: RSP: 002b:00007ffc8c792478 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
mai 19 11:39:25 curie kernel: RAX: ffffffffffffffda RBX: 000055942ca49e20 RCX: 00007fdc23be2cc7
mai 19 11:39:25 curie kernel: RDX: 00007ffc8c792490 RSI: 0000000000005a03 RDI: 0000000000000003
mai 19 11:39:25 curie kernel: RBP: 00007ffc8c795e80 R08: 00000000ffffffff R09: 00007ffc8c792310
mai 19 11:39:25 curie kernel: R10: 000055942ca49e30 R11: 0000000000000246 R12: 00007ffc8c792490
mai 19 11:39:25 curie kernel: R13: 000055942ca49e30 R14: 000055942aed2c20 R15: 00007ffc8c795a40
I understand those are rather extreme conditions: I would fully expect
the pool to stop working if the underlying drives disappear. What
doesn't seem acceptable is that a command would completely hang like
this.
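
When sifting through logs like the ones above, it can help to condense the `zio` error lines into counts per device and operation. The following is only a minimal sketch: the `summarize_zio_errors` helper and the `ZIO_TYPES` table are hypothetical names made up for illustration, and the mapping of `type=` numbers to operations assumes the `zio_type_t` enum ordering used by OpenZFS releases of that era (it may differ in other versions). The `error=` field is decoded with Python's standard `errno` table, so `error=5` shows up as `EIO`.

```python
import errno
import re
from collections import Counter

# Assumption: "type=" follows the OpenZFS zio_type_t enum ordering
# (0 = null, 1 = read, 2 = write, ...). Verify against your ZFS version.
ZIO_TYPES = {0: "null", 1: "read", 2: "write", 3: "free", 4: "claim", 5: "ioctl"}

# Matches the key=value fields of a "zio pool=... vdev=..." kernel message.
ZIO_RE = re.compile(
    r"zio pool=(?P<pool>\S+) vdev=(?P<vdev>\S+) error=(?P<error>\d+) "
    r"type=(?P<type>\d+) offset=(?P<offset>\d+) size=(?P<size>\d+)"
)


def summarize_zio_errors(lines):
    """Count zio errors per (pool, vdev, errno name, operation)."""
    counts = Counter()
    for line in lines:
        m = ZIO_RE.search(line)
        if m is None:
            continue  # not a zio error line (call traces, warnings, ...)
        err = errno.errorcode.get(int(m["error"]), m["error"])
        op = ZIO_TYPES.get(int(m["type"]), "type=" + m["type"])
        counts[(m["pool"], m["vdev"], err, op)] += 1
    return counts


# Three lines copied from the log above.
sample = [
    "May 19 11:34:21 curie kernel: [ 1387.914874] zio pool=bpool vdev=/dev/sdc3 error=5 type=2 offset=272789504 size=4096 flags=184880",
    "May 19 11:34:21 curie kernel: [ 1387.914906] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=270336 size=8192 flags=b08c1",
    "May 19 11:34:21 curie kernel: [ 1387.914932] zio pool=bpool vdev=/dev/sdc3 error=5 type=1 offset=1073225728 size=8192 flags=b08c1",
]

for (pool, vdev, err, op), n in summarize_zio_errors(sample).items():
    print(f"{pool} {vdev}: {n} x {err} on {op}")
```

In practice the input would be the output of `journalctl -k` or `dmesg` rather than a hard-coded list. This only summarizes what the kernel already logged; it does nothing about the hung `zpool` and `zed` processes themselves.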