Evgeni Golov: Fixing the volume control in an Alesis M1Active 330 USB Speaker System



wget -O /usr/local/hpePublicKey2048_key1.pub https://downloads.linux.hpe.com/SDR/hpePublicKey2048_key1.pub
echo "# HP monitoring" >> /etc/apt/sources.list
echo "deb [signed-by=/usr/local/hpePublicKey2048_key1.pub] http://downloads.linux.hpe.com/SDR/downloads/MCP/Debian/ stretch/current-gen9 non-free" >> /etc/apt/sources.list

The above commands will make the management utilities installable on Debian/Buster. If using Bullseye (Testing at the moment) then you need to have Buster repositories in APT for dependencies, as HP doesn't seem to have packaged all their utilities for Buster.
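With the repository and key in place, the obvious next step is something like the following sketch (assuming the sources.list entries above; hp-health is the package discussed below):

sudo apt-get update
sudo apt-get install hp-health    # provides hpasmcli and hplog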
wget -r -np -A Contents-amd64.bz2 http://downloads.linux.hpe.com/SDR/repo/mcp/debian/dists

To find out which repositories had the programs I need I ran the above recursive wget and then uncompressed them for grep -R (as an aside it would be nice if bzgrep supported -R). I installed the hp-health package which has hpasmcli for viewing and setting many configuration options and hplog for viewing event log data and thermal data (among a few other things). I've added a new monitor to etbemon, hp-temp.monitor, to monitor HP server temperatures. I haven't made a configuration option to change the thresholds for what is considered normal because I don't expect server class systems to be routinely running above the warning temperature. For the linux-temp.monitor script I added a command-line option for the percentage of the high temperature that is an error condition as well as an option for the number of CPU cores that need to be over-temperature; having one core permanently over the high temperature due to a web browser seems standard for white-box workstations nowadays.

The hp-health package depends on "libc6-i686 | lib32gcc1" even though none of the programs it contains use lib32gcc1. Depending on lib32gcc1 instead of "lib32gcc1 | lib32gcc-s1" means that installing hp-health requires removing mesa-opencl-icd, which probably means that BOINC can't use the GPU among other things. I solved this by editing /var/lib/dpkg/status and changing the package dependencies to what I desired. Note that this is not something for a novice to do; make a backup and make sure you know what you are doing!

Issues

The HPE Dynamic Smart Array B140i is a software RAID device. While it's convenient for some users that software RAID gets supported in the UEFI boot process, generally software RAID is a bad idea. Also my system has hot-swap drive caddies but the controller doesn't support hot-swap. So the first thing to do was to configure the array controller to run in AHCI mode and give up on using hot-swap drive caddies for hot-swap. I tested all the documented ways of scanning for new devices and nothing other than a reboot made the kernel recognise a new SATA disk.

According to specs provided by Dell and HP the ML110 Gen9 makes less noise than the PowerEdge T320; according to my own observations the reverse is the case. I don't know if this is because of Dell being more conservative in their specs than HP or because of how dBA is measured vs my own personal annoyance thresholds for sounds. As the system makes more noise than I'm comfortable with I plan to build a rubber enclosure for the rear of the system to reduce noise; that will be the subject of another post. For Australian readers, Bunnings has some good deals on rubber floor mats that can be used to reduce server noise.

The server doesn't have sound hardware. While one could argue that servers don't need sound, there are some server uses for sound hardware such as using line input as a source of entropy. Also for a manufacturer it might be a benefit to use the same motherboard for workstations and servers. Fortunately a friend gave me a nice set of Logitech USB speakers a few years ago that I hadn't previously had a cause to use, so that will solve the problem for me (I don't need line-in on a workstation).

UEFI and Memtest

I decided to try UEFI boot for something new (in the past I'd only used UEFI boot for a server that only had large disks).
In the past I've booted all my own systems with BIOS boot because I'm familiar with it and they all have SSDs for booting which are less than 2TB in size (until recently 2TB SSDs weren't affordable for my personal use). The Debian UEFI wiki page is worth reading [3]. The Debian Wiki page about ProLiant servers [4] is worth reading too.

Memtest86+ doesn't support EFI booting (it just goes to a black screen) even though Debian/Buster puts in a GRUB entry for it (Debian bug #695246 was filed for this in 2012). Also on my ML110 Memtest86+ doesn't report the RAM speed (a known issue on Memtest86+). Comments on the net say that Memtest86+ hasn't been maintained for a long time and Memtest86 (the non-free version) has been updated more recently. So far I haven't seen a system with ECC RAM have a memory problem that could be detected by Memtest86+; the memory problems I've seen on ECC systems have been things that prevent booting (RAM not being recognised correctly), that are detected by the BIOS as ECC errors before booting, or that are reported by the kernel as ECC errors at run time (happened years ago and I can't remember the details).

Overall I'm not a fan of EFI with the way it currently works in Debian. It seems to add some of the GRUB functionality into the BIOS and then use that to load GRUB. It seems that EFI can do everything you need and it would be better to just have a single boot loader, not two of them chained.

Power Supply

There is a range of PSUs for the ML110; the one I have has the smallest available PSU (350W) and doesn't have a PCIe power cable (the one used for video cards). Here is the HP document which shows the cabling for the various ML110 Gen8 PSUs [5]; I have the 350W PSU. One thing I've considered is whether I could make an adaptor from the drive bay power to the PCIe connector. A quick web search indicates that 4 SAS disks when active can take up to 75W more power than a system with no disks. If that's the case then the 2 spare drive bay connectors, which can each handle 4 disks, should be able to supply 150W. As a 6 pin PCIe power cable (GPU power cable) is rated at 75W that should be fine in theory (here's a page with the pinouts for PCIe power connectors [6]).

My video card is a Radeon R7 260X which apparently takes about 113W all up, so it should be taking less than 75W from the PCIe power cable. All I really want is YouTube, Netflix, and text editing at 4K resolution, so I don't need much in terms of 3D power. KDE uses some of the advanced features of modern video cards, but it doesn't compare to 3D gaming. According to the Wikipedia page for the Radeon RX 500 series [7] the RX560 supports DisplayPort 1.4 and HDMI 2.0 (both of which do 4K@60Hz) and has a TDP of 75W. So an RX560 video card seems like a good option that will work in any system that doesn't have a spare PCIe power cable. I've just ordered one of those for $246, so hopefully that will arrive in a week or so.

PCI Fan

The ML110 Gen9 has an optional PCIe fan and baffle to cool PCIe cards (part number 784580-B21). Extra cooling of PCIe cards is a good thing, but $400 list price (and about $50 ebay price) for the fan and baffle is unpleasant. When I boot the system with a PCIe dual-ethernet card and two PCIe NVMe cards it gives a BIOS warning on boot; when I add a video card it refuses to boot without the extra fan.
It's nice that the system makes sure it doesn't get into a thermal overload situation, but it would be nicer if they just shipped all necessary fans with it instead of trying to get more money out of customers. I just bought a PCI fan and baffle kit for $60.

Conclusion

In spite of the unexpected expense of a new video card and PCI fan the overall cost of this system is still low, particularly when considering that I'll find another use for the video card, which needs an extra power connector. It is disappointing that HP didn't supply a more capable PSU and fit all the fans to all models; the expectation of a server is that you can just do server stuff, not have to buy extra bits before you can do server stuff. If you want to install Tesla GPUs or something then it's expected that you might need to do something unusual with a server, but the basic stuff should just work. A single processor tower server should be designed to function as a deskside workstation and be able to handle an average video card. Generally it's a nice computer; I look forward to getting the next deliveries of parts so I can make it work properly.
remotes/private/attic/novena 822ca2bb add letter i sent to novena, never published
remotes/private/attic/secureboot de09d82b quick review, add note and graph
remotes/private/attic/wireguard 5c5340d1 wireguard review, tutorial and comparison with alternatives
remotes/private/backlog/dat 914c5edf Merge branch 'master' into backlog/dat
remotes/private/backlog/packet 9b2c6d1a ham radio packet innovations and primer
remotes/private/backlog/performance-tweaks dcf02676 config notes for http2
remotes/private/backlog/serverless 9fce6484 postponed until kubecon europe
remotes/private/fin/cost-of-hosting 00d8e499 cost-of-hosting article online
remotes/private/fin/kubecon f4fd7df2 remove published or spun off articles
remotes/private/fin/kubecon-overview 21fae984 publish kubecon overview article
remotes/private/fin/kubecon2018 1edc5ec8 add series
remotes/private/fin/netconf 3f4b7ece publish the netconf articles
remotes/private/fin/netdev 6ee66559 publish articles from netdev 2.2
remotes/private/fin/pgp-offline f841deed pgp offline branch ready for publication
remotes/private/fin/primes c7e5b912 publish the ROCA paper
remotes/private/fin/runtimes 4bee1d70 prepare publication of runtimes articles
remotes/private/fin/token-benchmarks 5a363992 regenerate timestamp automatically
remotes/private/ideas/astropy 95d53152 astropy or python in astronomy
remotes/private/ideas/avaneya 20a6d149 crowdfunded blade-runner-themed GPLv3 simcity-like simulator
remotes/private/ideas/backups-benchmarks fe2f1f13 review of backup software through performance and features
remotes/private/ideas/cumin 7bed3945 review of the cumin automation tool from WM foundation
remotes/private/ideas/future-of-distros d086ca0d modern packaging problems and complex apps
remotes/private/ideas/on-dying a92ad23f another dying thing
remotes/private/ideas/openpgp-discovery 8f2782f0 openpgp discovery mechanisms (WKD, etc), thanks to jonas meurer
remotes/private/ideas/password-bench 451602c0 bruteforce estimates for various password patterns compared with RSA key sizes
remotes/private/ideas/prometheus-openmetrics 2568dbd6 openmetrics standardizing prom metrics enpoints
remotes/private/ideas/telling-time f3c24a53 another way of telling time
remotes/private/ideas/wallabako 4f44c5da talk about wallabako, read-it-later + kobo hacking
remotes/private/stalled/bench-bench-bench 8cef0504 benchmarking http benchmarking tools
remotes/private/stalled/debian-survey-democracy 909bdc98 free software surveys and debian democracy, volunteer vs paid work
Wow, what a mess! Let's see if I can make sense of this:
novena
: the project is ooold now, didn't seem to fit a LWN
article. it was basically "how can i build my novena now" and "you
guys rock!" it seems like the MNT Reform is the brain child of
the Novena now, and I dare say it's even cooler!
secureboot
: my LWN editors were critical of my approach, and
probably rightly so - it's a really complex subject and I was
probably out of my depth... it's also out of date now, we did
manage secureboot in Debian
wireguard
: LWN ended up writing extensive coverage, and
I was biased against Donenfeld because of conflicts in a previous
project
dat
: I already had written Sharing and archiving data sets with
Dat, but
it seems I had more to say... mostly performance issues, beaker, no
streaming, limited adoption... to be investigated, I guess?
packet
: a primer on data communications over ham radio, and the
cool new tech that has emerged in the free software world. those
are mainly notes about Pat, Direwolf, APRS and so
on... just never got around to making sense of it or really using
the tech...
performance-tweaks
: "optimizing websites at the age of http2",
the unwritten story of the optimization of this website with HTTP/2
and friends
serverless
: god. one of the leftover topics at Kubecon, my notes
on this were thin, and the actual subject, possibly even
thinner... the only lie worse than the cloud is that there's no
server at all! concretely, that's a pile of notes about
Kubecon which I wanted to sort
through. Probably belongs in the attic now.
astropy
: "Python in astronomy" - had a chat with saimn
while writing about sigal, and it
turns out he actually works on free software in astronomy, in
Python... I actually expect LWN to cover this sooner than
later, after Lee Phillips's introduction to SciPy
avaneya
: crowdfunded blade-runner-themed GPLv3 simcity-like
simulator, i just have that link so far
backups-benchmarks
: review of backup software through performance
and features, possibly based on those benchmarks, maybe based
on this list from restic although they refused
casync. benchmark articles are hard though, especially when
you want to "cover them all"... I did write a silly Attic vs
Bup back when those programs existed (2014), in a related
note...
ideas/cumin
: review of the Cumin automation tool from
WikiMedia Foundation... I ended up using the tool at work and
writing service documentation for it
ideas/future-of-distros
: modern packaging problems and complex
apps, starting from this discussion about the removal of
Dolibarr from Debian, a summary of the thread from liw,
and ideas from joeyh (now from the outside of Debian), then
debates over the power of FTP masters - ugh, glad I
didn't step in that rat's nest
ideas/on-dying
: "what happens when a hacker dies?" rather grim
subject, but a more and more important one... joeyh has ideas
again, phk as well, then there's a protocol for dying
(really grim)... then there are site policies like GitHub,
Facebook, etc... more in the branch, but that one I can't help but
think about now that family has taken a bigger place in my life...
ideas/openpgp-discovery
: OpenPGP discovery mechanisms (WKD, etc),
suggested by Jonas Meurer (somewhere?), only links to
Mailveloppe, LEAP, WKD (or is it WKS?), another
standard, probably would need to talk about OpenPGP CA now
and how Debian and Tor manage their keyrings... pain in the back.
ideas/password-bench
: bruteforce estimates for various password
patterns compared with RSA key sizes, spinoff of my smartcard
article, in
the crypto-bench, look at this shiny graph, surely that
must mean an article, right?
ideas/prometheus-openmetrics
: "Evolving the Prometheus exposition
format into a standard", seems like this happened
ideas/telling-time
: telling time to users is hard. xclock vs
ttyclock, etc. maybe gameclock and undertime as well? syncing time
is hard, but it turns out showing it is non trivial as
well... basically turning this bug report into an article. for
some reason I linked to this meme, derived from this
meme, presumably a premonition of my stupid idea of writing
undertime TIMEZONES!
ideas/wallabako
: "talk about wallabako, read-it-later + kobo
hacking", that's it, not even a link to the project!stalled/bench-bench-bench
benchmarking http benchmarking tools, a
horrible mess of links, copy-paste from terminals, and ideas about
benchmarking... some of this trickled out into this benchmarking
guide at Tor, but not much more than the list of tools
stalled/debian-survey-democracy
: "free software surveys and
Debian democracy, volunteer vs paid work"... A long standing
concern of mine is that all Debian work is supposed to be
volunteer, and paying explicitly for work inside Debian has
traditionally been frowned upon, even leading to serious drama and
dissent (remember Dunc-Tank?). Back when I was writing for LWN,
I was also doing paid work for Debian LTS. I also learned
that a lot (most?) Debian Developers were actually being paid by
their job to work on Debian. So I was confused by this apparent
contradiction, especially given how the LTS project has been mostly
accepted, while Dunc-Tank was not... See also this talk at Debconf
16. I had hopes that this study would show the "hunch"
people have offered (that most DDs are paid to work on Debian) but
it seems to show the reverse (only 36% of DDs, and 18% of all
respondents paid). So I am still confused and worried about the
sustainability of Debian.

% sudo ceph osd tree
ID CLASS WEIGHT   TYPE NAME        STATUS REWEIGHT PRI-AFF
-1       65.44138 root default
-2       21.81310     host server1
 0   hdd  1.08989         osd.0      down  1.00000 1.00000
 1   hdd  1.08989         osd.1      down  1.00000 1.00000
 2   hdd  1.63539         osd.2      down  1.00000 1.00000
 3   hdd  1.63539         osd.3      down  1.00000 1.00000
 4   hdd  1.63539         osd.4      down  1.00000 1.00000
 5   hdd  1.63539         osd.5      down  1.00000 1.00000
18   hdd  2.18279         osd.18     down  1.00000 1.00000
20   hdd  2.18179         osd.20     down  1.00000 1.00000
28   hdd  2.18179         osd.28     down  1.00000 1.00000
29   hdd  2.18179         osd.29     down  1.00000 1.00000
30   hdd  2.18179         osd.30     down  1.00000 1.00000
31   hdd  2.18179         osd.31     down  1.00000 1.00000
-4       21.81409     host server2
 6   hdd  1.08989         osd.6      down  1.00000 1.00000
 7   hdd  1.08989         osd.7      down  1.00000 1.00000
 8   hdd  1.63539         osd.8      down  1.00000 1.00000
 9   hdd  1.63539         osd.9      down  1.00000 1.00000
10   hdd  1.63539         osd.10     down  1.00000 1.00000
11   hdd  1.63539         osd.11     down  1.00000 1.00000
19   hdd  2.18179         osd.19       up  1.00000 1.00000
21   hdd  2.18279         osd.21       up  1.00000 1.00000
22   hdd  2.18279         osd.22       up  1.00000 1.00000
32   hdd  2.18179         osd.32     down  1.00000 1.00000
33   hdd  2.18179         osd.33     down  1.00000 1.00000
34   hdd  2.18179         osd.34     down  1.00000 1.00000
-3       21.81419     host server3
12   hdd  1.08989         osd.12     down  1.00000 1.00000
13   hdd  1.08989         osd.13     down  1.00000 1.00000
14   hdd  1.63539         osd.14     down  1.00000 1.00000
15   hdd  1.63539         osd.15     down  1.00000 1.00000
16   hdd  1.63539         osd.16     down  1.00000 1.00000
17   hdd  1.63539         osd.17     down  1.00000 1.00000
23   hdd  2.18190         osd.23     down  1.00000 1.00000
24   hdd  2.18279         osd.24     down  1.00000 1.00000
25   hdd  2.18279         osd.25     down  1.00000 1.00000
35   hdd  2.18179         osd.35     down  1.00000 1.00000
36   hdd  2.18179         osd.36     down  1.00000 1.00000
37   hdd  2.18179         osd.37     down  1.00000 1.00000

Our blood pressure increased slightly! Did we just lose all of our cluster? What happened, and how can we get all the other OSDs back? We stumbled upon this beauty in our logs:
kernel: [   73.697957] XFS (sdl1): SB stripe unit sanity check failed
kernel: [   73.698002] XFS (sdl1): Metadata corruption detected at xfs_sb_read_verify+0x10e/0x180 [xfs], xfs_sb block 0xffffffffffffffff
kernel: [   73.698799] XFS (sdl1): Unmount and run xfs_repair
kernel: [   73.699199] XFS (sdl1): First 128 bytes of corrupted metadata buffer:
kernel: [   73.699677] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 62 00  XFSB..........b.
kernel: [   73.700205] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
kernel: [   73.700836] 00000020: 62 44 2b c0 e6 22 40 d7 84 3d e1 cc 65 88 e9 d8  bD+.."@..=..e...
kernel: [   73.701347] 00000030: 00 00 00 00 00 00 40 08 00 00 00 00 00 00 01 00  ......@.........
kernel: [   73.701770] 00000040: 00 00 00 00 00 00 01 01 00 00 00 00 00 00 01 02  ................
ceph-disk[4240]: mount: /var/lib/ceph/tmp/mnt.jw367Y: mount(2) system call failed: Structure needs cleaning.
ceph-disk[4240]: ceph-disk: Mounting filesystem failed: Command '['/bin/mount', '-t', u'xfs', '-o', 'noatime,inode64', '--', '/dev/disk/by-parttypeuuid/4fbd7e29-9d25-41b8-afd0-062c0ceff05d.cdda39ed-5 ceph/tmp/mnt.jw367Y']' returned non-zero exit status 32
kernel: [   73.702162] 00000050: 00 00 00 01 00 00 18 80 00 00 00 04 00 00 00 00  ................
kernel: [   73.702550] 00000060: 00 00 06 48 bd a5 10 00 08 00 00 02 00 00 00 00  ...H............
kernel: [   73.702975] 00000070: 00 00 00 00 00 00 00 00 0c 0c 0b 01 0d 00 00 19  ................
kernel: [   73.703373] XFS (sdl1): SB validate failed with error -117.

The same issue was present for the other failing OSDs. We hoped that the data itself was still there, and that only the mounting of the XFS partitions failed. The Ceph cluster was initially installed in 2017 with Ceph jewel/10.2 with the OSDs on filestore (nowadays a legacy approach to storing objects in Ceph). However, we migrated the disks to bluestore since then (with ceph-disk and not yet via ceph-volume, which is what's being used nowadays). Using ceph-disk introduces these 100MB XFS partitions containing basic metadata for the OSD.

Given that we had three working OSDs left, we decided to investigate how to rebuild the failing ones. Some folks on #ceph (thanks T1, ormandj + peetaur!) were kind enough to share what working XFS partitions looked like for them. After creating a backup (via dd), we tried to re-create such an XFS partition on server1. We noticed that even mounting a freshly created XFS partition failed:
synpromika@server1 ~ % sudo mkfs.xfs -f -i size=2048 -m uuid="4568c300-ad83-4288-963e-badcd99bf54f" /dev/sdc1
meta-data=/dev/sdc1              isize=2048   agcount=4, agsize=6272 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=1, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=25088, imaxpct=25
         =                       sunit=128    swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=1608, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

synpromika@server1 ~ % sudo mount /dev/sdc1 /mnt/ceph-recovery
SB stripe unit sanity check failed
Metadata corruption detected at 0x433840, xfs_sb block 0x0/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x1000
cache_node_purge: refcount was 1, not zero (node=0x1d3c400)
SB stripe unit sanity check failed
Metadata corruption detected at 0x433840, xfs_sb block 0x18800/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x18800/0x1000
SB stripe unit sanity check failed
Metadata corruption detected at 0x433840, xfs_sb block 0x0/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x1000
SB stripe unit sanity check failed
Metadata corruption detected at 0x433840, xfs_sb block 0x24c00/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x24c00/0x1000
SB stripe unit sanity check failed
Metadata corruption detected at 0x433840, xfs_sb block 0xc400/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0xc400/0x1000
releasing dirty buffer (bulk) to free list!
releasing dirty buffer (bulk) to free list!
releasing dirty buffer (bulk) to free list!
releasing dirty buffer (bulk) to free list!
found dirty buffer (bulk) on free list!
bad magic number
bad magic number
Metadata corruption detected at 0x433840, xfs_sb block 0x0/0x1000
libxfs_writebufr: write verifer failed on xfs_sb bno 0x0/0x1000
releasing dirty buffer (bulk) to free list!
mount: /mnt/ceph-recovery: wrong fs type, bad option, bad superblock on /dev/sdc1, missing codepage or helper program, or other error.

Ouch. This very much looked related to the actual issue we're seeing. So we tried to execute mkfs.xfs with a bunch of different sunit/swidth settings. Using
-d sunit=512 -d swidth=512
at least worked then, so we decided to force its usage in the creation of our OSD XFS partition. This brought us a working XFS partition. Please note: sunit must not be larger than swidth (more on that later!).
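For reference, a sketch of what the full mkfs.xfs invocation might look like with that forced stripe geometry (device and UUID reused from the failing example above; not necessarily the exact command used):

sudo mkfs.xfs -f -i size=2048 -d sunit=512 -d swidth=512 -m uuid="4568c300-ad83-4288-963e-badcd99bf54f" /dev/sdc1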
Then we reconstructed how to restore all the metadata for the OSD (activate.monmap, active, block_uuid, bluefs, ceph_fsid, fsid, keyring, kv_backend, magic, mkfs_done, ready, require_osd_release, systemd, type, whoami
). To identify the UUID, we can read the data from ceph --format json osd dump
, like this for all our OSDs (Zsh syntax ftw!):
synpromika@server1 ~ % for f in {0..37} ; printf "osd-$f: %s\n" "$(sudo ceph --format json osd dump | jq -r ".osds[] | select(.osd==$f) | .uuid")"
osd-0: 4568c300-ad83-4288-963e-badcd99bf54f
osd-1: e573a17a-ccde-4719-bdf8-eef66903ca4f
osd-2: 0e1b2626-f248-4e7d-9950-f1a46644754e
osd-3: 1ac6a0a2-20ee-4ed8-9f76-d24e900c800c
[...]

Identifying the corresponding raw device for each OSD UUID is possible via:
synpromika@server1 ~ % UUID="4568c300-ad83-4288-963e-badcd99bf54f"
synpromika@server1 ~ % readlink -f /dev/disk/by-partuuid/"${UUID}"
/dev/sdc1

The OSD's key ID can be retrieved via:
synpromika@server1 ~ % OSD_ID=0
synpromika@server1 ~ % sudo ceph auth get osd."${OSD_ID}" -f json 2>/dev/null | jq -r '.[] | .key'
AQCKFpZdm0We[...]

Now we also need to identify the underlying block device:
synpromika@server1 ~ % OSD_ID=0
synpromika@server1 ~ % sudo ceph osd metadata osd."${OSD_ID}" -f json | jq -r '.bluestore_bdev_partition_path'
/dev/sdc2

With all of this, we reconstructed the
keyring, fsid, whoami, block + block_uuid
files. All the other files inside the XFS metadata partition are identical on each OSD. So after placing and adjusting the corresponding metadata on the XFS partition for Ceph usage, we got a working OSD, hurray! Since we had to fix yet another 32 OSDs, we decided to automate this XFS partitioning and metadata recovery procedure.
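As a rough illustration of that metadata restore, here is a hypothetical sketch (not the actual recovery script from this incident; it assumes the OSD id, UUID, key and block device gathered with the commands above):

OSD_ID=0
UUID="4568c300-ad83-4288-963e-badcd99bf54f"
OSD_KEY="$(sudo ceph auth get osd."${OSD_ID}" -f json 2>/dev/null | jq -r '.[] | .key')"
BLOCK_DEV="$(sudo ceph osd metadata osd."${OSD_ID}" -f json | jq -r '.bluestore_bdev_partition_path')"
MNT="/var/lib/ceph/osd/ceph-${OSD_ID}"
sudo mount -o noatime,inode64 /dev/disk/by-partuuid/"${UUID}" "${MNT}"
echo "${OSD_ID}" | sudo tee "${MNT}/whoami"        # the OSD id
echo "${UUID}"   | sudo tee "${MNT}/fsid"          # the OSD's own UUID
sudo ceph osd dump -f json | jq -r '.fsid' | sudo tee "${MNT}/ceph_fsid"   # the cluster fsid
printf '[osd.%s]\n\tkey = %s\n' "${OSD_ID}" "${OSD_KEY}" | sudo tee "${MNT}/keyring"
sudo ln -sf "${BLOCK_DEV}" "${MNT}/block"          # assumption: block is a symlink to the bluestore device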
We had a network share available on /srv/backup for storing backups of existing partition data. On each server, we tested the procedure with one single OSD before iterating over the list of remaining failing OSDs. We started with a shell script on server1, then adjusted the script for server2 and server3. This is the script, as we executed it on the 3rd server.
Thanks to this, we managed to get the Ceph cluster up and running again. We didn't want to continue with the Ceph upgrade itself during the night though, as we wanted to know exactly what was going on and why the system behaved like that. Time for RCA!
Root Cause Analysis
So all but three OSDs on server2 failed, and the problem seemed to be related to XFS. Therefore, our starting point for the RCA was to identify what was different on server2, as compared to server1 + server3. My initial assumption was that this was related to some firmware issues with the involved controller (and as it turned out later, I was right!). The disks were attached as JBOD devices to a ServeRAID M5210 controller (with a stripe size of 512). Firmware state:
synpromika@server1 ~ % sudo storcli64 /c0 show all | grep '^Firmware'
Firmware Package Build = 24.16.0-0092
Firmware Version = 4.660.00-8156

synpromika@server2 ~ % sudo storcli64 /c0 show all | grep '^Firmware'
Firmware Package Build = 24.21.0-0112
Firmware Version = 4.680.00-8489

synpromika@server3 ~ % sudo storcli64 /c0 show all | grep '^Firmware'
Firmware Package Build = 24.16.0-0092
Firmware Version = 4.660.00-8156

This looked very promising, as server2 indeed runs with a different firmware version on the controller. But how so? Well, the motherboard of server2 got replaced by a Lenovo/IBM technician in January 2020, as we had a failing memory slot during a memory upgrade. As part of this procedure, the Lenovo/IBM technician installed the latest firmware versions. According to our documentation, some OSDs were rebuilt (due to the filestore->bluestore migration) in March and April 2020. It turned out that precisely those OSDs were the ones that survived the upgrade. So the surviving drives were created with a different firmware version running on the involved controller. All the other OSDs were created with an older controller firmware. But what difference does this make? Now let's check the firmware changelogs. For the 24.21.0-0097 release we found this:
- Cannot create or mount xfs filesystem using xfsprogs 4.19.x kernel 4.20 (SCGCQ02027889)
- xfs_info command run on an XFS file system created on a VD of strip size 1M shows sunit and swidth as 0 (SCGCQ02056038)

Our XFS problem certainly was related to the controller's firmware. We also recalled that our monitoring system reported different sunit settings for the OSDs that were rebuilt in March and April. For example, OSD 21 was recreated and got different sunit settings:
WARN  server2.example.org  Mount options of /var/lib/ceph/osd/ceph-21  WARN - Missing: sunit=1024, Exceeding: sunit=512

We compared the new OSD 21 with an existing one (OSD 25 on server3):
synpromika@server2 ~ % systemctl show var-lib-ceph-osd-ceph\\x2d21.mount | grep sunit
Options=rw,noatime,attr2,inode64,sunit=512,swidth=512,noquota

synpromika@server3 ~ % systemctl show var-lib-ceph-osd-ceph\\x2d25.mount | grep sunit
Options=rw,noatime,attr2,inode64,sunit=1024,swidth=512,noquota

Thanks to our documentation, we could compare execution logs of their creation:
% diff -u ceph-disk-osd-25.log ceph-disk-osd-21.log
-synpromika@server2 ~ % sudo ceph-disk -v prepare --bluestore /dev/sdj --osd-id 25
+synpromika@server3 ~ % sudo ceph-disk -v prepare --bluestore /dev/sdi --osd-id 21
[...]
-command_check_call: Running command: /sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdj1
-meta-data=/dev/sdj1              isize=2048   agcount=4, agsize=6272 blks
[...]
+command_check_call: Running command: /sbin/mkfs -t xfs -f -i size=2048 -- /dev/sdi1
+meta-data=/dev/sdi1              isize=2048   agcount=4, agsize=6336 blks
          =                       sectsz=4096  attr=2, projid32bit=1
          =                       crc=1        finobt=1, sparse=0, rmapbt=0, reflink=0
-data     =                       bsize=4096   blocks=25088, imaxpct=25
-         =                       sunit=128    swidth=64 blks
+data     =                       bsize=4096   blocks=25344, imaxpct=25
+         =                       sunit=64     swidth=64 blks
 naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
 log      =internal log           bsize=4096   blocks=1608, version=2
          =                       sectsz=4096  sunit=1 blks, lazy-count=1
 realtime =none                   extsz=4096   blocks=0, rtextents=0
[...]

So back then, we even tried to track this down but couldn't make sense of it yet. But now this sounds very much like it is related to the problem we saw with this Ceph/XFS failure. We follow Occam's razor, assuming the simplest explanation is usually the right one, so let's check the disk properties and see what differs:
synpromika@server1 ~ % sudo blockdev --getsz --getsize64 --getss --getpbsz --getiomin --getioopt /dev/sdk
4685545472
2398999281664
512
4096
524288
262144

synpromika@server2 ~ % sudo blockdev --getsz --getsize64 --getss --getpbsz --getiomin --getioopt /dev/sdk
4685545472
2398999281664
512
4096
262144
262144

See the difference between server1 and server2 for identical disks? The
getiomin
option now reports something different for them:
synpromika@server1 ~ % sudo blockdev --getiomin /dev/sdk
524288
synpromika@server1 ~ % cat /sys/block/sdk/queue/minimum_io_size
524288

synpromika@server2 ~ % sudo blockdev --getiomin /dev/sdk
262144
synpromika@server2 ~ % cat /sys/block/sdk/queue/minimum_io_size
262144

It doesn't make sense that the minimum I/O size (iomin, AKA
BLKIOMIN
) is bigger than the optimal I/O size (ioopt, AKA BLKIOOPT
). This leads us to Bug 202127 ("cannot mount or create xfs on a 597T device"), which matches our findings here. But why did this XFS partition work in the past and fail now with the newer kernel version?
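To quickly spot devices with this inconsistency, here is a small sketch (assuming the same /sys layout as used above) that flags any disk whose minimum I/O size exceeds its optimal I/O size:

for d in /sys/block/sd*/queue; do
  min=$(cat "$d"/minimum_io_size)
  opt=$(cat "$d"/optimal_io_size)
  # a non-zero optimal I/O size smaller than the minimum I/O size is suspicious
  if [ "$opt" -gt 0 ] && [ "$min" -gt "$opt" ]; then
    echo "suspicious: $d (iomin=$min > ioopt=$opt)"
  fi
done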
The XFS behaviour change
Now given that we have backups of all the XFS partitions, we wanted to track down a) when this XFS behaviour was introduced, and b) whether, and if so how, it would be possible to reuse the XFS partition without having to rebuild it from scratch (e.g. if you had no working Ceph OSD or backups left).
Let s look at such a failing XFS partition with the Grml live system:
root@grml ~ # grml-version
grml64-full 2020.06 Release Codename Ausgehfuahangl [2020-06-24]
root@grml ~ # uname -a
Linux grml 5.6.0-2-amd64 #1 SMP Debian 5.6.14-2 (2020-06-09) x86_64 GNU/Linux
root@grml ~ # grml-hostname grml-2020-06
Setting hostname to grml-2020-06: done
root@grml ~ # exec zsh
root@grml-2020-06 ~ # dpkg -l xfsprogs util-linux
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name           Version      Architecture Description
+++-==============-============-============-=========================================
ii  util-linux     2.35.2-4     amd64        miscellaneous system utilities
ii  xfsprogs       5.6.0-1+b2   amd64        Utilities for managing the XFS filesystem

There it's failing, no matter which mount option we try:
root@grml-2020-06 ~ # mount ./sdd1.dd /mnt
mount: /mnt: mount(2) system call failed: Structure needs cleaning.
root@grml-2020-06 ~ # dmesg | tail -30
[...]
[   64.788640] XFS (loop1): SB stripe unit sanity check failed
[   64.788671] XFS (loop1): Metadata corruption detected at xfs_sb_read_verify+0x102/0x170 [xfs], xfs_sb block 0xffffffffffffffff
[   64.788671] XFS (loop1): Unmount and run xfs_repair
[   64.788672] XFS (loop1): First 128 bytes of corrupted metadata buffer:
[   64.788673] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 62 00  XFSB..........b.
[   64.788674] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[   64.788675] 00000020: 32 b6 dc 35 53 b7 44 96 9d 63 30 ab b3 2b 68 36  2..5S.D..c0..+h6
[   64.788675] 00000030: 00 00 00 00 00 00 40 08 00 00 00 00 00 00 01 00  ......@.........
[   64.788675] 00000040: 00 00 00 00 00 00 01 01 00 00 00 00 00 00 01 02  ................
[   64.788676] 00000050: 00 00 00 01 00 00 18 80 00 00 00 04 00 00 00 00  ................
[   64.788677] 00000060: 00 00 06 48 bd a5 10 00 08 00 00 02 00 00 00 00  ...H............
[   64.788677] 00000070: 00 00 00 00 00 00 00 00 0c 0c 0b 01 0d 00 00 19  ................
[   64.788679] XFS (loop1): SB validate failed with error -117.
root@grml-2020-06 ~ # mount -t xfs -o rw,relatime,attr2,inode64,sunit=1024,swidth=512,noquota ./sdd1.dd /mnt/
mount: /mnt: wrong fs type, bad option, bad superblock on /dev/loop1, missing codepage or helper program, or other error.
32 root@grml-2020-06 ~ # dmesg | tail -1
[   66.342976] XFS (loop1): stripe width (512) must be a multiple of the stripe unit (1024)
root@grml-2020-06 ~ # mount -t xfs -o rw,relatime,attr2,inode64,sunit=512,swidth=512,noquota ./sdd1.dd /mnt/
mount: /mnt: mount(2) system call failed: Structure needs cleaning.
32 root@grml-2020-06 ~ # dmesg | tail -14
[   66.342976] XFS (loop1): stripe width (512) must be a multiple of the stripe unit (1024)
[   80.751277] XFS (loop1): SB stripe unit sanity check failed
[   80.751323] XFS (loop1): Metadata corruption detected at xfs_sb_read_verify+0x102/0x170 [xfs], xfs_sb block 0xffffffffffffffff
[   80.751324] XFS (loop1): Unmount and run xfs_repair
[   80.751325] XFS (loop1): First 128 bytes of corrupted metadata buffer:
[   80.751327] 00000000: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 62 00  XFSB..........b.
[   80.751328] 00000010: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
[   80.751330] 00000020: 32 b6 dc 35 53 b7 44 96 9d 63 30 ab b3 2b 68 36  2..5S.D..c0..+h6
[   80.751331] 00000030: 00 00 00 00 00 00 40 08 00 00 00 00 00 00 01 00  ......@.........
[   80.751331] 00000040: 00 00 00 00 00 00 01 01 00 00 00 00 00 00 01 02  ................
[   80.751332] 00000050: 00 00 00 01 00 00 18 80 00 00 00 04 00 00 00 00  ................
[   80.751333] 00000060: 00 00 06 48 bd a5 10 00 08 00 00 02 00 00 00 00  ...H............
[   80.751334] 00000070: 00 00 00 00 00 00 00 00 0c 0c 0b 01 0d 00 00 19  ................
[   80.751338] XFS (loop1): SB validate failed with error -117.

Also xfs_repair doesn't help either:
root@grml-2020-06 ~ # xfs_info ./sdd1.dd
meta-data=./sdd1.dd              isize=2048   agcount=4, agsize=6272 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=25088, imaxpct=25
         =                       sunit=128    swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=1608, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

root@grml-2020-06 ~ # xfs_repair ./sdd1.dd
Phase 1 - find and verify superblock...
bad primary superblock - bad stripe width in superblock !!!

attempting to find secondary superblock...
..............................................................................................Sorry, could not find valid secondary superblock
Exiting now.

With the SB stripe unit sanity check failed message, we could easily track this down to the following commit fa4ca9c:
% git show fa4ca9c5574605d1e48b7e617705230a0640b6da | cat
commit fa4ca9c5574605d1e48b7e617705230a0640b6da
Author: Dave Chinner <dchinner@redhat.com>
Date:   Tue Jun 5 10:06:16 2018 -0700

    xfs: catch bad stripe alignment configurations
    
    When stripe alignments are invalid, data alignment algorithms in the
    allocator may not work correctly. Ensure we catch superblocks with
    invalid stripe alignment setups at mount time. These data alignment
    mismatches are now detected at mount time like this:
    
    XFS (loop0): SB stripe unit sanity check failed
    XFS (loop0): Metadata corruption detected at xfs_sb_read_verify+0xab/0x110, xfs_sb block 0xffffffffffffffff
    XFS (loop0): Unmount and run xfs_repair
    XFS (loop0): First 128 bytes of corrupted metadata buffer:
    0000000091c2de02: 58 46 53 42 00 00 10 00 00 00 00 00 00 00 10 00  XFSB............
    0000000023bff869: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
    00000000cdd8c893: 17 32 37 15 ff ca 46 3d 9a 17 d3 33 04 b5 f1 a2  .27...F=...3....
    000000009fd2844f: 00 00 00 00 00 00 00 04 00 00 00 00 00 00 06 d0  ................
    0000000088e9b0bb: 00 00 00 00 00 00 06 d1 00 00 00 00 00 00 06 d2  ................
    00000000ff233a20: 00 00 00 01 00 00 10 00 00 00 00 01 00 00 00 00  ................
    000000009db0ac8b: 00 00 03 60 e1 34 02 00 08 00 00 02 00 00 00 00  ... .4..........
    00000000f7022460: 00 00 00 00 00 00 00 00 0c 09 0b 01 0c 00 00 19  ................
    XFS (loop0): SB validate failed with error -117.
    
    And the mount fails.
    
    Signed-off-by: Dave Chinner <dchinner@redhat.com>
    Reviewed-by: Carlos Maiolino <cmaiolino@redhat.com>
    Reviewed-by: Darrick J. Wong <darrick.wong@oracle.com>
    Signed-off-by: Darrick J. Wong <darrick.wong@oracle.com>

diff --git fs/xfs/libxfs/xfs_sb.c fs/xfs/libxfs/xfs_sb.c
index b5dca3c8c84d..c06b6fc92966 100644
--- fs/xfs/libxfs/xfs_sb.c
+++ fs/xfs/libxfs/xfs_sb.c
@@ -278,6 +278,22 @@ xfs_mount_validate_sb(
 		return -EFSCORRUPTED;
 
+	if (sbp->sb_unit) {
+		if (!xfs_sb_version_hasdalign(sbp) ||
+		    sbp->sb_unit > sbp->sb_width ||
+		    (sbp->sb_width % sbp->sb_unit) != 0) {
+			xfs_notice(mp, "SB stripe unit sanity check failed");
+			return -EFSCORRUPTED;
+		}
+	} else if (xfs_sb_version_hasdalign(sbp)) {
+		xfs_notice(mp, "SB stripe alignment sanity check failed");
+		return -EFSCORRUPTED;
+	} else if (sbp->sb_width) {
+		xfs_notice(mp, "SB stripe width sanity check failed");
+		return -EFSCORRUPTED;
+	}
+
 	if (xfs_sb_version_hascrc(&mp->m_sb) &&
 	    sbp->sb_blocksize < XFS_MIN_CRC_BLOCKSIZE) {
 		xfs_notice(mp, "v5 SB sanity check failed");

This change is included in kernel versions 4.18-rc1 and newer:
% git describe --contains fa4ca9c5574605d1e48 v4.18-rc1~37^2~14Now let s try with an older kernel version (4.9.0), using old Grml 2017.05 release:
root@grml ~ # grml-version
grml64-small 2017.05 Release Codename Freedatensuppe [2017-05-31]
root@grml ~ # uname -a
Linux grml 4.9.0-1-grml-amd64 #1 SMP Debian 4.9.29-1+grml.1 (2017-05-24) x86_64 GNU/Linux
root@grml ~ # lsb_release -a
No LSB modules are available.
Distributor ID: Debian
Description:    Debian GNU/Linux 9.0 (stretch)
Release:        9.0
Codename:       stretch
root@grml ~ # grml-hostname grml-2017-05
Setting hostname to grml-2017-05: done
root@grml ~ # exec zsh
root@grml-2017-05 ~ #
root@grml-2017-05 ~ # xfs_info ./sdd1.dd
xfs_info: ./sdd1.dd is not a mounted XFS filesystem
1 root@grml-2017-05 ~ # xfs_repair ./sdd1.dd
Phase 1 - find and verify superblock...
bad primary superblock - bad stripe width in superblock !!!

attempting to find secondary superblock...
..............................................................................................Sorry, could not find valid secondary superblock
Exiting now.
1 root@grml-2017-05 ~ # mount ./sdd1.dd /mnt
root@grml-2017-05 ~ # mount -t xfs
/root/sdd1.dd on /mnt type xfs (rw,relatime,attr2,inode64,sunit=1024,swidth=512,noquota)
root@grml-2017-05 ~ # ls /mnt
activate.monmap  active  block  block_uuid  bluefs  ceph_fsid  fsid  keyring  kv_backend  magic  mkfs_done  ready  require_osd_release  systemd  type  whoami
root@grml-2017-05 ~ # xfs_info /mnt
meta-data=/dev/loop1             isize=2048   agcount=4, agsize=6272 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1 spinodes=0 rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=25088, imaxpct=25
         =                       sunit=128    swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0 ftype=1
log      =internal               bsize=4096   blocks=1608, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0

Mounting there indeed works! Now, if we mount the filesystem with new and proper sunit/swidth settings using the older kernel, it should rewrite them on disk:
root@grml-2017-05 ~ # mount -t xfs -o sunit=512,swidth=512 ./sdd1.dd /mnt/
root@grml-2017-05 ~ # umount /mnt/

And indeed, mounting this rewritten filesystem then also works with newer kernels:
root@grml-2020-06 ~ # mount ./sdd1.rewritten /mnt/
root@grml-2020-06 ~ # xfs_info /root/sdd1.rewritten
meta-data=/dev/loop1             isize=2048   agcount=4, agsize=6272 blks
         =                       sectsz=4096  attr=2, projid32bit=1
         =                       crc=1        finobt=1, sparse=0, rmapbt=0
         =                       reflink=0
data     =                       bsize=4096   blocks=25088, imaxpct=25
         =                       sunit=64     swidth=64 blks
naming   =version 2              bsize=4096   ascii-ci=0, ftype=1
log      =internal log           bsize=4096   blocks=1608, version=2
         =                       sectsz=4096  sunit=1 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
root@grml-2020-06 ~ # mount -t xfs
/root/sdd1.rewritten on /mnt type xfs (rw,relatime,attr2,inode64,logbufs=8,logbsize=32k,sunit=512,swidth=512,noquota)

FTR: The
sunit=512,swidth=512
from the xfs mount option is identical to xfs_info's output sunit=64,swidth=64
(because mount.xfs's sunit value is given in 512-byte block units, see man 5 xfs, and the xfs_info output reported here is in blocks with a block size (bsize) of 4096, so sunit = 512*512 bytes = 64*4096 bytes).
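A quick arithmetic sketch of that unit conversion, just to make the equivalence explicit:

echo $(( 512 * 512 ))   # mount option: sunit of 512 sectors of 512 bytes = 262144 bytes
echo $(( 64 * 4096 ))   # xfs_info: sunit of 64 blocks of 4096 bytes = 262144 bytes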
mkfs uses minimum and optimal sizes for stripe unit and stripe width; you can check this e.g. via the following (note that server2 with the fixed firmware version reports proper values, whereas server3 with the broken controller firmware reports nonsense):
synpromika@server2 ~ % for i in /sys/block/sd*/queue/ ; do printf "%s: %s %s\n" "$i" "$(cat "$i"/minimum_io_size)" "$(cat "$i"/optimal_io_size)" ; done
[...]
/sys/block/sdc/queue/: 262144 262144
/sys/block/sdd/queue/: 262144 262144
/sys/block/sde/queue/: 262144 262144
/sys/block/sdf/queue/: 262144 262144
/sys/block/sdg/queue/: 262144 262144
/sys/block/sdh/queue/: 262144 262144
/sys/block/sdi/queue/: 262144 262144
/sys/block/sdj/queue/: 262144 262144
/sys/block/sdk/queue/: 262144 262144
/sys/block/sdl/queue/: 262144 262144
/sys/block/sdm/queue/: 262144 262144
/sys/block/sdn/queue/: 262144 262144
[...]
synpromika@server3 ~ % for i in /sys/block/sd*/queue/ ; do printf "%s: %s %s\n" "$i" "$(cat "$i"/minimum_io_size)" "$(cat "$i"/optimal_io_size)" ; done
[...]
/sys/block/sdc/queue/: 524288 262144
/sys/block/sdd/queue/: 524288 262144
/sys/block/sde/queue/: 524288 262144
/sys/block/sdf/queue/: 524288 262144
/sys/block/sdg/queue/: 524288 262144
/sys/block/sdh/queue/: 524288 262144
/sys/block/sdi/queue/: 524288 262144
/sys/block/sdj/queue/: 524288 262144
/sys/block/sdk/queue/: 524288 262144
/sys/block/sdl/queue/: 524288 262144
/sys/block/sdm/queue/: 524288 262144
/sys/block/sdn/queue/: 524288 262144
[...]

This is the underlying reason why the initially created XFS partitions were created with incorrect sunit/swidth settings. The broken firmware of server1 and server3 was the cause of the incorrect settings: they were ignored by old(er) XFS/kernel versions, but treated as an error by new ones. Make sure to also read the XFS FAQ regarding "How to calculate the correct sunit,swidth values for optimal performance". We also stumbled upon two interesting reads in Red Hat's knowledge base: 5075561 + 2150101 (requires an active subscription, though) and #1835947.

Am I affected? How to work around it?

To check whether your XFS mount points are affected by this issue, the following command line should be useful:
awk '$3 == "xfs" {print $2}' /proc/self/mounts | while read mount ; do echo -n "$mount " ; xfs_info $mount | awk '$0 ~ "swidth" {gsub(/.*=/,"",$2); gsub(/.*=/,"",$3); print $2,$3}' | awk '{ if ($1 > $2) print "impacted"; else print "OK" }' ; done

If you run into the above situation, the only known solution to get your original XFS partition working again is to boot into an older kernel version (4.17 or older), mount the XFS partition with correct sunit/swidth settings and then boot back into your new system (kernel version wise).

Lessons learned
deb https://download.opensuse.org/repositories/home:/npreining:/debian-kde:/apps2012/Debian_Unstable/ ./
Use Testing instead of Unstable in the URL above for Debian/testing.
For the adventurous, there is also the digikam-beta repository.
That's it. Have a nice Christmas, if you celebrate it, and a good start into a hopefully better 2021!
Enjoy.
Silicon Valley's elite are hatching plans to escape disaster, and when it comes, they'll leave the rest of us behind
Heteronomy refers to action that is influenced by a force outside the individual, in other words the state or condition of being ruled, governed, or under the sway of another, as in a military occupation.
Poster P590CW $9.00 Early Warning Signs Of Fascism Laurence W. Britt wrote about the common signs of fascism in April, 2003, after researching seven fascist regimes: Hitler's Nazi Germany; Mussolini's Italy; Franco's Spain; Salazar's Portugal; Papadopoulos' Greece; Pinochet's Chile; Suharto's Indonesia. Get involved! Text: Early Warning Signs of Fascism Powerful and Continuing Nationalism Disdain For Human Rights Identification of Enemies As a unifying cause Supremacy of the military Rampant Sexism Controlled Mass Media Obsession With National Security
Political and social scientist Stefania Milan writes about social movements, mobilization and organized collective action. On the one hand, interactions and networks achieve more visibility and become a proxy for a collective "we". On the other hand: Law enforcement can exercise preemptive monitoring
How new technologies and techniques pioneered by dictators will shape the 2020 election
A regional election offers lessons on combatting the rise of the far right, both across the Continent and in the United States.
The Italian diaspora is the large-scale emigration of Italians from Italy. There are two major Italian diasporas in Italian history. The first diaspora began more or less around 1880, a decade or so after the Unification of Italy (with most leaving after 1880), and ended in the 1920s to early-1940s with the rise of Fascism in Italy. The second diaspora started after the end of World War II and roughly concluded in the 1970s. These together constituted the largest voluntary emigration period in documented history. Between 1880-1980, about 15,000,000 Italians left the country permanently. By 1980, it was estimated that about 25,000,000 Italians were residing outside Italy. A third wave is being reported in present times, due to the socio-economic problems caused by the financial crisis of the early twenty-first century, especially amongst the youth. According to the Public Register of Italian Residents Abroad (AIRE), figures of Italians abroad rose from 3,106,251 in 2006 to 4,636,647 in 2015, growing by 49.3% in just ten years.
60
) continued in
git, taking
in contributions from:
0.023-2~bpo8+1
to
jessie-backports.
A new version of strip-nondeterminism 0.024-1
was uploaded to unstable by
Chris Lamb. It included
contributions
from:
GtkFileChooserDialog
. The intention is
that other environments such as KDE will substitute their own equivalent.
Other planned portals include:
GProxyResolver
) and availability (GNetworkMonitor
)

/etc/X11/Xsession.d
,
/etc/X11/xinit/xinitrc.d
, /etc/profile.d
), but these do not apply to
Wayland, or to noninteractive login environments like cron
and
systemd --user
. We are also keen to avoid requiring a Turing-complete shell
language during session startup, because it's difficult to reason about
and potentially rather inefficient.
Some uses of environment variables can be dismissed as unnecessary or even
unwanted, similar to the statement in Debian Policy 9.9: "A program must not
depend on environment variables to get reasonable defaults." However,
there are two common situations where environment variables can be necessary
for proper OS integration: search-paths like $PATH
, $XDG_DATA_DIRS
and
$PYTHONPATH
(particularly necessary for things like Flatpak), and
optionally-loaded modules like $GTK_MODULES
and $QT_ACCESSIBILITY
where a package influences the configuration of another package.
There is a stopgap solution in GNOME's gdm display manager,
/usr/share/gdm/env.d
, but this is gdm-specific and
insufficiently expressive to provide the functionality needed by
Flatpak: "set XDG_DATA_DIRS
to its specified default value if unset,
then add a couple of extra paths".
pam_env
comes closer (PAM is run at every transition from "no user logged
in" to "user can execute arbitrary code as themselves") but it doesn't
support .d
fragments, which are required if we want distribution packages
to be able to extend search paths. pam_env
also turns off per-user
configuration by default, citing security concerns.
I'll write more about this when I have a concrete proposal for how to solve it.
I think the best solution is probably a PAM module similar to pam_env
but
supporting .d
directories, either by modifying pam_env
directly or
out-of-tree, combined with clarifying what the security concerns for
per-user configuration are and how they can be avoided.
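For context, pam_env's existing configuration format already supports per-variable defaults and overrides; a purely hypothetical /etc/security/pam_env.conf fragment (illustrative only, not a proposal from this post) could look like:

# VARIABLE      [DEFAULT=value]                       [OVERRIDE=value]
XDG_DATA_DIRS   DEFAULT=/usr/local/share:/usr/share   OVERRIDE=/var/lib/flatpak/exports/share:${XDG_DATA_DIRS}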
Relocatable binary packages
On Windows and OS X, various GLib APIs automatically discover where the
application binary is located and use search paths relative to that;
for example, if C:\myprefix\bin\app.exe
is running, GLib might put
C:\myprefix\share
into the result of g_get_system_data_dirs()
,
so that the application can ask to load app/data.xml
from the data
directories and get C:\myprefix\share\app\data.xml
. We would like
to be able to do the same on Linux, for example so that the apps in a
Flatpak or Snap package can be constructed from RPM or dpkg packages without
needing to be recompiled for a different --prefix
, and so that other
third-party software packages like the games on Steam and gog.com can
easily locate their own resources.
Relatedly, there are currently no well-defined semantics for what happens
when a .desktop
file or a D-Bus .service
file has Exec=./bin/foo
.
The meaning of Exec=foo
is well-defined (it searches $PATH
) and the
meaning of Exec=/opt/whatever/bin/foo
is obvious. When this came up in
D-Bus previously, my assertion was that the meaning should be the same as
in .desktop
files, whatever that is.
We agreed to propose that the meaning of a non-absolute path in a .desktop
or .service
file should be interpreted relative to the directory
where the .desktop
or .service
file was found: for example, if
/opt/whatever/share/applications/foo.desktop
says Exec=../../bin/foo
,
then /opt/whatever/bin/foo
would be the right thing to execute.
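A shell sketch of that proposed resolution rule, using the example paths from above (illustrative only):

desktop=/opt/whatever/share/applications/foo.desktop
exec_value=../../bin/foo                          # a relative Exec= value
readlink -f "$(dirname "$desktop")/$exec_value"   # prints /opt/whatever/bin/foo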
While preparing a mail to the freedesktop and D-Bus mailing lists proposing
this, I found that I had
proposed the same thing
almost 2 years ago...
this time I hope I can actually make it happen!
Flatpak and OSTree bug fixing
On the way to the hackfest, and while the discussion moved to topics that I
didn't have useful input on, I spent some time fixing up the Debian packaging
for Flatpak and its dependencies. In particular, I did my first upload
as a co-maintainer of bubblewrap,
uploaded ostree to unstable (with the known limitation that the grub, dracut
and systemd integration is missing for now
since I haven't been able to test it yet), got
most of the way through packaging Flatpak 0.6.5 (which I'll upload soon),
cherry-picked the right patches to make ostree compile on Debian 8 in an effort
to make backports trivial, and spent some time disentangling
a flatpak test failure
which was breaking the Debian package's installed-tests.
I'm still looking into
ostree test failures on little-endian MIPS,
which I was able to reproduce on a Debian porterbox just before the end of the
hackfest.
OSTree + Debian
I also had some useful conversations with developers from
Endless, who recently opened up a version of their
OSTree build scripts
for public access. Hopefully that information brings me a bit closer to
being able to publish a walkthrough for
how to deploy a simple Debian derivative using OSTree
(help with that is very welcome of course!).
GTK life-cycle and versioning
The life-cycle of GTK releases has already been
mentioned here and elsewhere, and there are some interesting responses
in the comments on my earlier blog post.
It's important to note that what we
discussed at the hackfest is only a proposal: a hackfest discussion between
a subset of the GTK maintainers and a small number of other GTK users
(I am in the latter category)
doesn't, and shouldn't, set policy for all of GTK or for all of GNOME. I
believe the intention is that the GTK maintainers will discuss the proposals
further at GUADEC, and make a decision after that.
As I said before, I hope that being more realistic about API and ABI
guarantees can avoid GTK going too far towards either of the possible
extremes: either becoming unable to advance because it's too constrained
by compatibility, or breaking applications because it isn't constrained
enough. The current situation, where it is meant to be compatible within
the GTK 3 branch but in practice applications still sometimes break,
doesn't seem ideal for anyone, and I hope we can do better in future.
Acknowledgements
Thanks to everyone involved, particularly:
apt-get install tftpd-hpa
).
# apt-get install u-boot-tools
$ cat vmlinuz armada-370-netgear-rn104.dtb > vmlinuz-rn104
$ mkimage -A arm -O linux -T multi -C none -a 0x04000000 -e 0x04000000 -n "Debian Jessie armhf installer" -d vmlinuz-rn104:initrd.gz uImage-installer-rn104
# cp uImage-installer-rn104 /srv/tftp
dhcp
setenv serverip 192.168.1.17
tftp uImage-installer-rn104
bootm $load_addr
192.168.1.17 being the IPv4 address of the machine you set up the tftp server
on above; adapt accordingly to your setup.
While in U-Boot the default ethernet device is the lower jack, the installer is
only able to use the upper, so replug the ethernet cable to the upper
receptacle.
Go through the installation, and before rebooting do the following:
Select "Change debconf priority" and set it to "low". Then "Download installer components" and check "mtd-modules-3.16.0-4-armmp-di". After that "Execute a shell" and do:
# depmod -a
# modprobe pxa3xx_nand
# apt-install flash-kernel
# mount --bind /proc /target/proc
# chroot /target
# cat >> /etc/flash-kernel/db << EOF
Machine: NETGEAR ReadyNAS 104
DTB-Id: armada-370-netgear-rn104.dtb
DTB-Append: yes
Mtd-Kernel: uImage
Mtd-Initrd: minirootfs
U-Boot-Kernel-Address: 0x04000000
U-Boot-Initrd-Address: 0x05000000
Required-Packages: u-boot-tools
EOF
# flash-kernel
Add Bootloader-Sets-Incorrect-Root: yes to /etc/flash-kernel/db.
[1] Note this page is about a ReadyNAS 102, the pinout is identical though.
.buildinfo
files from being removed from the upload queue.
Toolchain fixes
dh_installinit
. Patch by Reiner Herrmann.debian/changelog
entry for LXC_GENERATE_DATE
.SOURCE_DATE_EPOCH
.armhf
. (h01ger)
Arch Linux packages in the multilib and community repositories (4,000 more source packages) are also being tested. All of these test results are better analyzed and nicely displayed together with each package. (h01ger)
For Fedora, build jobs can now run in parallel. Two are currently running, now testing reproducibility of 785 source packages from Fedora 23.
mock/1.2.3-1.1 has been uploaded to experimental to better build RPMs. (h01ger)
Work has started on having automatic build node pools to maximize use of armhf
build nodes. (Vagrant Cascadian)
diffoscope development
Version 43 has been released on December 15th. It has been dubbed as epic! as it contains many contributions that were written around the summit in Athens.
Baptiste Daroussin found that running diffoscope on some Tar archives could overwrite arbitrary files. This has been fixed by using libarchive instead of Python's internal Tar library and adding a sanity check for destination paths. In any case, until proper sandboxing is implemented, don't run diffoscope on untrusted inputs outside an isolated, throw-away system.
Mike Hommey identified that the CBFS comparator would needlessly waste time scanning big files. It will now not consider any files bigger than 24 MiB, 8 MiB more than the largest ROM created by coreboot at this time. An encoding issue related to Zip files has also been fixed. (Lunar)
New comparators have been added: Android dex files (Reiner Herrmann), filesystem images using libguestfs (Reiner Herrmann), icons and JPEG images using libcaca (Chris Lamb), and OS X binaries (Clemens Lang). The comparator for Free Pascal Compilation Unit will now only be used when the unit version matches the compiler one. (Levente Polyak)
A new multi-file HTML output with on-demand loading of long diffs is available through the --html-dir option. On-demand loading requires jQuery, whose path can be specified through the --jquery option. The diffs can also simply be browsed by non-JavaScript users or when jQuery is not available. (Joachim Breitner)
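For illustration, an invocation using these options might look like the following (file names are made up; the jQuery path shown is the one shipped by Debian's libjs-jquery package, adjust to whatever copy or URL you want the HTML to reference):
$ diffoscope --html-dir report/ \
    --jquery /usr/share/javascript/jquery/jquery.js \
    foo_1.0-1_amd64.deb foo_1.0-1.rebuild_amd64.deb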
en_US.UTF-8 (Ed Maste), the --list-tools option can now support multiple systems (Mattia Rizzolo, Levente Polyak, Lunar).
Many internal changes and code clean-ups have been made, paving the way for parallel processing. (Lunar)
Version 44 was released on December 18th, fixing an issue affecting .deb packages lacking an md5sums file that was introduced in a previous refactoring (Lunar). Support has been added for Mozilla optimized Zip files (Mike Hommey). The HTML output has been optimized in size (Mike Hommey, Esa Peuha, Lunar) and speed (Lunar), and will now properly number lines (Mike Hommey). A message will always be displayed when lines are ignored at the end of a diff (Lunar). For portability and consistency, Python's os.walk() function is now used instead of find to perform directory listings. (Lunar)
Documentation update
Package reviews
143 reviews have been removed, 69 added and 22 updated in the previous week.
Chris Lamb reported 12 new FTBFS issues.
New issues identified this week: random_order_in_init_py_generated_by_python-genpy, timestamps_in_copyright_added_by_perl_dist_zilla, random_contents_in_dat_files_generated_by_chasen-dictutils_makemat, timestamps_in_documentation_generated_by_pandoc.
Chris West did some improvements on the scripts used to manage notes in the misc repository.
Misc.
Accounts of the reproducible builds summit in Athens were written by Thomas Klausner from NetBSD and Hans-Christoph Steiner from The Guardian Project.
Some openSUSE developers are working on reproducible builds during a hackweek, which was discussed on the opensuse-packaging mailing-list.
debian/control file with all locales. Original patch by Chris Lamb. SOURCE_DATE_EPOCH. She uploaded a package with the enhancement to the experimental reproducible repository.
Packages fixed
The following 16 packages became reproducible due to changes in their
build dependencies:
dracut,
editorconfig-core,
elasticsearch,
fish,
libftdi1,
liblouisxml,
mk-configure,
nanoc,
octave-bim,
octave-data-smoothing,
octave-financial,
octave-ga,
octave-missing-functions,
octave-secs1d,
octave-splines,
valgrind.
The following packages became reproducible after getting fixed:
debian/changelog entry. debian/changelog entry in manpage. SOURCE_DATE_EPOCH. SOURCE_DATE_EPOCH. debian/changelog entry.
armhf
build hosts were provided by Vagrant Cascadian and have been configured to be used by jenkins.debian.net. Work on including armhf
builds in the reproducible.debian.net webpages has begun. So far the repository comparison page just shows us which armhf
binary packages are currently missing in our repo. (h01ger)
The scheduler has been changed to re-schedule more packages from stretch than sid, as the gcc5 transition has started. This mostly affects build log age. (h01ger)
A new depwait status has been introduced for packages which can't be built because of missing build dependencies. (Mattia Rizzolo)
debbindiff development
Finally, on August 31st, Lunar released debbindiff 27 containing a complete overhaul of the code for the comparison stage. The new architecture is more versatile and extensible while minimizing code duplication. libarchive is now used to handle cpio archives and iso9660 images through the newly packaged python-libarchive-c. This should also help support a couple of other archive formats in the future. Symlinks and devices are now properly compared. Text files are compared as Unicode after being decoded, and encoding differences are reported. Support for Sqlite3 and Mono/.NET executables has been added. Thanks to Valentin Lorentz, the test suite should now run on more systems. A small deficiency in unsquashfs has been identified in the process. A long-standing optimization is now performed on Debian packages: based on the content of the md5sums control file, we skip comparing files with matching hashes. This makes debbindiff usable on packages with many files. Fuzzy-matching is now performed for files in the same container (like a tarball) to handle renames. Also, for Debian .changes files, listed files are now compared without looking at the embedded version number. This makes debbindiff a lot more useful when comparing different versions of the same package.
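The idea behind the md5sums optimization can be sketched in shell (a rough illustration, not debbindiff's actual implementation; a.deb and b.deb are placeholders): any path whose hash matches in both packages needs no content comparison.
$ dpkg-deb --ctrl-tarfile a.deb | tar -xOf - ./md5sums | sort > a.md5
$ dpkg-deb --ctrl-tarfile b.deb | tar -xOf - ./md5sums | sort > b.md5
$ comm -12 a.md5 b.md5   # identical hash and path: safe to skip
$ comm -3 a.md5 b.md5    # these entries still need a real comparison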
Based on this rearchitecturing, work has been done to allow parallel processing. The branch now seems to work most of the time. More testing needs to be done before it can be merged.
The current fuzzy-matching algorithm, ssdeep, has shown disappointing results. One important use case is being able to properly compare debug symbols. Their path is made using the Build ID. As this identifier is made with a checksum of the binary content, finding things like CPP macros is much easier when a diff of the debug symbols is available. The good news is that TLSH, another fuzzy-matching algorithm, has been tested with much better results. A package is waiting in NEW and the code is ready for it to become available.
A follow-up release, 28, was made on August 2nd, fixing the content label used for gzip, bzip2 and xz files and an error on text files only differing in their encoding. It also contains a small code improvement in how comments on Difference objects are handled.
This is the last release under the name debbindiff. A new name has been chosen to better reflect that it is not a Debian-specific tool. Stay tuned!
Documentation update
Valentin Lorentz updated the patch submission template to suggest writing the kind of issue in the bug subject.
Small progress has been made on the Reproducible Builds HOWTO while preparing the related CCCamp15 talk.
Package reviews
235 obsolete reviews have been removed, 47 added and 113 updated this week.
42 reports for packages failing to build from source have been made by Chris West (Faux).
New issue added this week: haskell_devscripts_locale_substvars.
Misc.
Valentin Lorentz wrote a script to report packages tested as unreproducible installed on a system. We encourage everyone to run it on their systems and give feedback!
Since this was the first time ever that I was going to do less-than-trivial reverse engineering in order to find the addresses and signatures of interesting functions in the RedBoot code, it wasn't bad at all that I had a version of the RedBoot source code.
load -r -b 0x200000 -h 192.168.0.2 netbsd-nfs.bin
g
[Image: JTAG connections on Kinder (the NSLU2 targeting NetBSD)]
$ openocd -f interface/ftdi/olimex-arm-usb-ocd.cfg -f board/linksys_nslu2.cfg
Open On-Chip Debugger 0.8.0 (2015-04-14-09:12)
Licensed under GNU GPL v2
For bug reports, read
http://openocd.sourceforge.net/doc/doxygen/bugs.html
Info : only one transport option; autoselect 'jtag'
adapter speed: 300 kHz
Info : ixp42x.cpu: hardware has 2 breakpoints and 2 watchpoints
0
Info : clock speed 300 kHz
Info : JTAG tap: ixp42x.cpu tap/device found: 0x29277013 (mfg: 0x009, part: 0x9277, ver: 0x2)
[..]
$ telnet localhost 4444
Trying ::1...
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
Open On-Chip Debugger
> halt
target was in unknown state when halt was requested
in procedure 'halt'
> poll
background polling: on
TAP: ixp42x.cpu (enabled)
target state: unknown
Actually nobody single-stepped the NSLU2, no matter the version of the NSLU2 or the connections available on the JTAG interface! So I was back to square one: I had to either struggle with disassembly, re-evaluate my initial options, find another option, or drop the idea entirely. At that point I was already committed to the project, so dropping the idea entirely didn't seem like the reasonable thing to do.
Next.