Search Results: "guilhem"

21 November 2021

Antoine Beaupr : mbsync vs OfflineIMAP

After recovering from my latest email crash (previously, previously), I had to figure out which tool I should be using. I had many options but I figured I would start with a popular one (mbsync). But I also evaluated OfflineIMAP which was resurrected from the Python 2 apocalypse, and because I had used it before, for a long time. Read on for the details.

Benchmark setup All programs were tested against a Dovecot 1:2.3.13+dfsg1-2 server, running Debian bullseye. The client is a Purism 13v4 laptop with a Samsung SSD 970 EVO 1TB NVMe drive. The server is a custom build with a AMD Ryzen 5 2600 CPU, and a RAID-1 array made of two NVMe drives (Intel SSDPEKNW010T8 and WDC WDS100T2B0C). The mail spool I am testing against has almost 400k messages and takes 13GB of disk space:
$ notmuch count --exclude=false
372758
$ du -sh --exclude xapian Maildir
13G Maildir
The baseline we are comparing against is SMD (syncmaildir) which performs the sync in about 7-8 seconds locally (3.5 seconds for each push/pull command) and about 10-12 seconds remotely. Anything close to that or better is good enough. I do not have recent numbers for a SMD full sync baseline, but the setup documentation mentions 20 minutes for a full sync. That was a few years ago, and the spool has obviously grown since then, so that is not a reliable baseline. A baseline for a full sync might be also set with rsync, which copies files at nearly 40MB/s, or 317Mb/s!
anarcat@angela:tmp(main)$ time rsync -a --info=progress2 --exclude xapian  shell.anarc.at:Maildir/ Maildir/
 12,647,814,731 100%   37.85MB/s    0:05:18 (xfr#394981, to-chk=0/395815)    
72.38user 106.10system 5:19.59elapsed 55%CPU (0avgtext+0avgdata 15988maxresident)k
8816inputs+26305112outputs (0major+50953minor)pagefaults 0swaps
That is 5 minutes to transfer the entire spool. Incremental syncs are obviously pretty fast too:
anarcat@angela:tmp(main)$ time rsync -a --info=progress2 --exclude xapian  shell.anarc.at:Maildir/ Maildir/
              0   0%    0.00kB/s    0:00:00 (xfr#0, to-chk=0/395815)    
1.42user 0.81system 0:03.31elapsed 67%CPU (0avgtext+0avgdata 14100maxresident)k
120inputs+0outputs (3major+12709minor)pagefaults 0swaps
As an extra curiosity, here's the performance with tar, pretty similar with rsync, minus incremental which I cannot be bothered to figure out right now:
anarcat@angela:tmp(main)$ time ssh shell.anarc.at tar --exclude xapian -cf - Maildir/   pv -s 13G   tar xf - 
56.68user 58.86system 5:17.08elapsed 36%CPU (0avgtext+0avgdata 8764maxresident)k
0inputs+0outputs (0major+7266minor)pagefaults 0swaps
12,1GiO 0:05:17 [39,0MiB/s] [===================================================================> ] 92%
Interesting that rsync manages to almost beat a plain tar on file transfer, I'm actually surprised by how well it performs here, considering there are many little files to transfer. (But then again, this maybe is exactly where rsync shines: while tar needs to glue all those little files together, rsync can just directly talk to the other side and tell it to do live changes. Something to look at in another article maybe?) Since both ends are NVMe drives, those should easily saturate a gigabit link. And in fact, a backup of the server mail spool achieves much faster transfer rate on disks:
anarcat@marcos:~$ tar fc - Maildir   pv -s 13G > Maildir.tar
15,0GiO 0:01:57 [ 131MiB/s] [===================================] 115%
That's 131Mibyyte per second, vastly faster than the gigabit link. The client has similar performance:
anarcat@angela:~(main)$ tar fc - Maildir   pv -s 17G > Maildir.tar
16,2GiO 0:02:22 [ 116MiB/s] [==================================] 95%
So those disks should be able to saturate a gigabit link, and they are not the bottleneck on fast links. Which begs the question of what is blocking performance of a similar transfer over the gigabit link, but that's another question altogether, because no sync program ever reaches the above performance anyways. Finally, note that when I migrated to SMD, I wrote a small performance comparison that could be interesting here. It show SMD to be faster than OfflineIMAP, but not as much as we see here. In fact, it looks like OfflineIMAP slowed down significantly since then (May 2018), but this could be due to my larger mail spool as well.

mbsync The isync (AKA mbsync) project is written in C and supports syncing Maildir and IMAP folders, with possibly multiple replicas. I haven't tested this but I suspect it might be possible to sync between two IMAP servers as well. It supports partial mirorrs, message flags, full folder support, and "trash" functionality.

Complex configuration file I started with this .mbsyncrc configuration file:
SyncState *
Sync New ReNew Flags
IMAPAccount anarcat
Host imap.anarc.at
User anarcat
PassCmd "pass imap.anarc.at"
SSLType IMAPS
CertificateFile /etc/ssl/certs/ca-certificates.crt
IMAPStore anarcat-remote
Account anarcat
MaildirStore anarcat-local
# Maildir/top/sub/sub
#SubFolders Verbatim
# Maildir/.top.sub.sub
SubFolders Maildir++
# Maildir/top/.sub/.sub
# SubFolders legacy
# The trailing "/" is important
#Path ~/Maildir-mbsync/
Inbox ~/Maildir-mbsync/
Channel anarcat
# AKA Far, convert when all clients are 1.4+
Master :anarcat-remote:
# AKA Near
Slave :anarcat-local:
# Exclude everything under the internal [Gmail] folder, except the interesting folders
#Patterns * ![Gmail]* "[Gmail]/Sent Mail" "[Gmail]/Starred" "[Gmail]/All Mail"
# Or include everything
Patterns *
# Automatically create missing mailboxes, both locally and on the server
#Create Both
Create slave
# Sync the movement of messages between folders and deletions, add after making sure the sync works
#Expunge Both
Long gone are the days where I would spend a long time reading a manual page to figure out the meaning of every option. If that's your thing, you might like this one. But I'm more of a "EXAMPLES section" kind of person now, and I somehow couldn't find a sample file on the website. I started from the Arch wiki one but it's actually not great because it's made for Gmail (which is not a usual Dovecot server). So a sample config file in the manpage would be a great addition. Thankfully, the Debian packages ships one in /usr/share/doc/isync/examples/mbsyncrc.sample but I only found that after I wrote my configuration. It was still useful and I recommend people take a look if they want to understand the syntax. Also, that syntax is a little overly complicated. For example, Far needs colons, like:
Far :anarcat-remote:
Why? That seems just too complicated. I also found that sections are not clearly identified: IMAPAccount and Channel mark section beginnings, for example, which is not at all obvious until you learn about mbsync's internals. There are also weird ordering issues: the SyncState option needs to be before IMAPAccount, presumably because it's global. Using a more standard format like .INI or TOML could improve that situation.

Stellar performance A transfer of the entire mail spool takes 56 minutes and 6 seconds, which is impressive. It's not quite "line rate": the resulting mail spool was 12GB (which is a problem, see below), which turns out to be about 29Mbit/s and therefore not maxing the gigabit link, and an order of magnitude slower than rsync. The incremental runs are roughly 2 seconds, which is even more impressive, as that's actually faster than rsync:
===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.015       0.052       1.930       2.029       2.105       
user        0.660       0.040       0.592       0.661       0.722       
sys         0.338       0.033       0.268       0.341       0.387    
Those tests were performed with isync 1.3.0-2.2 on Debian bullseye. Tests with a newer isync release originally failed because of a corrupted message that triggered bug 999804 (see below). Running 1.4.3 under valgrind works around the bug, but adds a 50% performance cost, the full sync running in 1h35m. Once the upstream patch is applied, performance with 1.4.3 is fairly similar, considering that the new sync included the register folder with 4000 messages:
120.74user 213.19system 59:47.69elapsed 9%CPU (0avgtext+0avgdata 105420maxresident)k
29128inputs+28284376outputs (0major+45711minor)pagefaults 0swaps
That is ~13GB in ~60 minutes, which gives us 28.3Mbps. Incrementals are also pretty similar to 1.3.x, again considering the double-connect cost:
===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.500       0.087       2.340       2.491       2.629       
user        0.718       0.037       0.679       0.711       0.793       
sys         0.322       0.024       0.284       0.320       0.365
Those tests were all done on a Gigabit link, but what happens on a slower link? My server uplink is slow: 25 Mbps down, 6 Mbps up. There mbsync is worse than the SMD baseline:
===> multitime results
1: mbsync -a
Mean        Std.Dev.    Min         Median      Max
real        31.531      0.724       30.764      31.271      33.100      
user        1.858       0.125       1.721       1.818       2.131       
sys         0.610       0.063       0.506       0.600       0.695       
That's 30 seconds for a sync, which is an order of magnitude slower than SMD.

Great user interface Compared to OfflineIMAP and (ahem) SMD, the mbsync UI is kind of neat:
anarcat@angela:~(main)$ mbsync -a
Notice: Master/Slave are deprecated; use Far/Near instead.
C: 1/2  B: 204/205  F: +0/0 *0/0 #0/0  N: +1/200 *0/0 #0/0
(Note that nice switch away from slavery-related terms too.) The display is minimal, and yet informative. It's not obvious what does mean at first glance, but the manpage is useful at least for clarifying that:
This represents the cumulative progress over channels, boxes, and messages affected on the far and near side, respectively. The message counts represent added messages, messages with updated flags, and trashed messages, respectively. No attempt is made to calculate the totals in advance, so they grow over time as more information is gathered. (Emphasis mine).
In other words:
  • C 2/2: channels done/total (2 done out of 2)
  • B 204/205: mailboxes done/total (204 out of 205)
  • F: changes on the far side
  • N: +10/200 *0/0 #0/0: changes on the "near" side:
    • +10/200: 10 out of 200 messages downloaded
    • *0/0: no flag changed
    • #0/0: no message deleted
You get used to it, in a good way. It does not, unfortunately, show up when you run it in systemd, which is a bit annoying as I like to see a summary mail traffic in the logs.

Interoperability issue In my notmuch setup, I have bound key S to "mark spam", which basically assigns the tag spam to the message and removes a bunch of others. Then I have a notmuch-purge script which moves that message to the spam folder, for training purposes. It basically does this:
notmuch search --output=files --format=text0 "$search_spam" \
      xargs -r -0 mv -t "$HOME/Maildir/$ PREFIX junk/cur/"
This method, which worked fine in SMD (and also OfflineIMAP) created this error on sync:
Maildir error: duplicate UID 37578.
And indeed, there are now two messages with that UID in the mailbox:
anarcat@angela:~(main)$ find Maildir/.junk/ -name '*U=37578*'
Maildir/.junk/cur/1637427889.134334_2.angela,U=37578:2,S
Maildir/.junk/cur/1637348602.2492889_221804.angela,U=37578:2,S
This is actually a known limitation or, as mbsync(1) calls it, a "RECOMMENDATION":
When using the more efficient default UID mapping scheme, it is important that the MUA renames files when moving them between Maildir fold ers. Mutt always does that, while mu4e needs to be configured to do it:
(setq mu4e-change-filenames-when-moving t)
So it seems I would need to fix my script. It's unclear how the paths should be renamed, which is unfortunate, because I would need to change my script to adapt to mbsync, but I can't tell how just from reading the above. (A manual fix is actually to rename the file to remove the U= field: mbsync will generate a new one and then sync correctly.) Fortunately, someone else already fixed that issue: afew, a notmuch tagging script (much puns, such hurt), has a move mode that can rename files correctly, specifically designed to deal with mbsync. I had already been told about afew, but it's one more reason to standardize my notmuch hooks on that project, it looks like. Update: I have tried to use afew and found it has significant performance issues. It also has a completely different paradigm to what I am used to: it assumes all incoming mail has a new and lays its own tags on top of that (inbox, sent, etc). It can only move files from one folder at a time (see this bug) which breaks my spam training workflow. In general, I sync my tags into folders (e.g. ham, spam, sent) and message flags (e.g. inbox is F, unread is "not S", etc), and afew is not well suited for this (although there are hacks that try to fix this). I have worked hard to make my tagging scripts idempotent, and it's something afew doesn't currently have. Still, it would be better to have that code in Python than bash, so maybe I should consider my options here.

Stability issues The newer release in Debian bookworm (currently at 1.4.3) has stability issues on full sync. I filed bug 999804 in Debian about this, which lead to a thread on the upstream mailing list. I have found at least three distinct crashes that could be double-free bugs "which might be exploitable in the worst case", not a reassuring prospect. The thing is: mbsync is really fast, but the downside of that is that it's written in C, and with that comes a whole set of security issues. The Debian security tracker has only three CVEs on isync, but the above issues show there could be many more. Reading the source code certainly did not make me very comfortable with trusting it with untrusted data. I considered sandboxing it with systemd (below) but having systemd run as a --user process makes that difficult. I also considered using an apparmor profile but that is not trivial because we need to allow SSH and only some parts of it... Thankfully, upstream has been diligent at addressing the issues I have found. They provided a patch within a few days which did fix the sync issues. Update: upstream actually took the issue very seriously. They not only got CVE-2021-44143 assigned for my bug report, they also audited the code and found several more issues collectively identified as CVE-2021-3657, which actually also affect 1.3 (ie. Debian 11/bullseye/stable). Somehow my corpus doesn't trigger that issue, but it was still considered serious enough to warrant a CVE. So one the one hand: excellent response from upstream; but on the other hand: how many more of those could there be in there?

Automation with systemd The Arch wiki has instructions on how to setup mbsync as a systemd service. It suggests using the --verbose (-V) flag which is a little intense here, as it outputs 1444 lines of messages. I have used the following .service file:
[Unit]
Description=Mailbox synchronization service
ConditionHost=!marcos
Wants=network-online.target
After=network-online.target
Before=notmuch-new.service
[Service]
Type=oneshot
ExecStart=/usr/bin/mbsync -a
Nice=10
IOSchedulingClass=idle
NoNewPrivileges=true
[Install]
WantedBy=default.target
And the following .timer:
[Unit]
Description=Mailbox synchronization timer
ConditionHost=!marcos
[Timer]
OnBootSec=2m
OnUnitActiveSec=5m
Unit=mbsync.service
[Install]
WantedBy=timers.target
Note that we trigger notmuch through systemd, with the Before and also by adding mbsync.service to the notmuch-new.service file:
[Unit]
Description=notmuch new
After=mbsync.service
[Service]
Type=oneshot
Nice=10
ExecStart=/usr/bin/notmuch new
[Install]
WantedBy=mbsync.service
An improvement over polling repeatedly with a .timer would be to wake up only on IMAP notify, but neither imapnotify nor goimapnotify seem to be packaged in Debian. It would also not cover for the "sent folder" use case, where we need to wake up on local changes.

Password-less setup The sample file suggests this should work:
IMAPStore remote
Tunnel "ssh -q host.remote.com /usr/sbin/imapd"
Add BatchMode, restrict to IdentitiesOnly, provide a password-less key just for this, add compression (-C), find the Dovecot imap binary, and you get this:
IMAPAccount anarcat-tunnel
Tunnel "ssh -o BatchMode=yes -o IdentitiesOnly=yes -i ~/.ssh/id_ed25519_mbsync -o HostKeyAlias=shell.anarc.at -C anarcat@imap.anarc.at /usr/lib/dovecot/imap"
And it actually seems to work:
$ mbsync -a
Notice: Master/Slave are deprecated; use Far/Near instead.
C: 0/2  B: 0/1  F: +0/0 *0/0 #0/0  N: +0/0 *0/0 #0/0imap(anarcat): Error: net_connect_unix(/run/dovecot/stats-writer) failed: Permission denied
C: 2/2  B: 205/205  F: +0/0 *0/0 #0/0  N: +1/1 *3/3 #0/0imap(anarcat)<1611280><90uUOuyElmEQlhgAFjQyWQ>: Info: Logged out in=10808 out=15396642 deleted=0 expunged=0 trashed=0 hdr_count=0 hdr_bytes=0 body_count=1 body_bytes=8087
It's a bit noisy, however. dovecot/imap doesn't have a "usage" to speak of, but even the source code doesn't hint at a way to disable that Error message, so that's unfortunate. That socket is owned by root:dovecot so presumably Dovecot runs the imap process as $user:dovecot, which we can't do here. Oh well? Interestingly, the SSH setup is not faster than IMAP. With IMAP:
===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.367       0.065       2.220       2.376       2.458       
user        0.793       0.047       0.731       0.776       0.871       
sys         0.426       0.040       0.364       0.434       0.476
With SSH:
===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.515       0.088       2.274       2.532       2.594       
user        0.753       0.043       0.645       0.766       0.804       
sys         0.328       0.045       0.212       0.340       0.393
Basically: 200ms slower. Tolerable.

Migrating from SMD The above was how I migrated to mbsync on my first workstation. The work on the second one was more streamlined, especially since the corruption on mailboxes was fixed:
  1. install isync, with the patch:
    dpkg -i isync_1.4.3-1.1~_amd64.deb
    
  2. copy all files over from previous workstation to avoid a full resync (optional):
    rsync -a --info=progress2 angela:Maildir/ Maildir-mbsync/
    
  3. rename all files to match new hostname (optional):
    find Maildir-mbsync/ -type f -name '*.angela,*' -print0    rename -0 's/\.angela,/\.curie,/'
    
  4. trash the notmuch database (optional):
    rm -rf Maildir-mbsync/.notmuch/xapian/
    
  5. disable all smd and notmuch services:
    systemctl --user --now disable smd-pull.service smd-pull.timer smd-push.service smd-push.timer notmuch-new.service notmuch-new.timer
    
  6. do one last sync with smd:
    smd-pull --show-tags ; smd-push --show-tags ; notmuch new ; notmuch-sync-flagged -v
    
  7. backup notmuch on the client and server:
    notmuch dump   pv > notmuch.dump
    
  8. backup the maildir on the client and server:
    cp -al Maildir Maildir-bak
    
  9. create the SSH key:
    ssh-keygen -t ed25519 -f .ssh/id_ed25519_mbsync
    cat .ssh/id_ed25519_mbsync.pub
    
  10. add to .ssh/authorized_keys on the server, like this: command="/usr/lib/dovecot/imap",restrict ssh-ed25519 AAAAC...
  11. move old files aside, if present:
    mv Maildir Maildir-smd
    
  12. move new files in place (CRITICAL SECTION BEGINS!):
    mv Maildir-mbsync Maildir
    
  13. run a test sync, only pulling changes: mbsync --create-near --remove-none --expunge-none --noop anarcat-register
  14. if that works well, try with all mailboxes: mbsync --create-near --remove-none --expunge-none --noop -a
  15. if that works well, try again with a full sync: mbsync register mbsync -a
  16. reindex and restore the notmuch database, this should take ~25 minutes:
    notmuch new
    pv notmuch.dump   notmuch restore
    
  17. enable the systemd services and retire the smd-* services: systemctl --user enable mbsync.timer notmuch-new.service systemctl --user start mbsync.timer rm ~/.config/systemd/user/smd* systemctl daemon-reload
During the migration, notmuch helpfully told me the full list of those lost messages:
[...]
Warning: cannot apply tags to missing message: CAN6gO7_QgCaiDFvpG3AXHi6fW12qaN286+2a7ERQ2CQtzjSEPw@mail.gmail.com
Warning: cannot apply tags to missing message: CAPTU9Wmp0yAmaxO+qo8CegzRQZhCP853TWQ_Ne-YF94MDUZ+Dw@mail.gmail.com
Warning: cannot apply tags to missing message: F5086003-2917-4659-B7D2-66C62FCD4128@gmail.com
[...]
Warning: cannot apply tags to missing message: mailman.2.1316793601.53477.sage-members@mailman.sage.org
Warning: cannot apply tags to missing message: mailman.7.1317646801.26891.outages-discussion@outages.org
Warning: cannot apply tags to missing message: notmuch-sha1-000458df6e48d4857187a000d643ac971deeef47
Warning: cannot apply tags to missing message: notmuch-sha1-0079d8e0c3340e6f88c66f4c49fca758ea71d06d
Warning: cannot apply tags to missing message: notmuch-sha1-0194baa4cfb6d39bc9e4d8c049adaccaa777467d
Warning: cannot apply tags to missing message: notmuch-sha1-02aede494fc3f9e9f060cfd7c044d6d724ad287c
Warning: cannot apply tags to missing message: notmuch-sha1-06606c625d3b3445420e737afd9a245ae66e5562
Warning: cannot apply tags to missing message: notmuch-sha1-0747b020f7551415b9bf5059c58e0a637ba53b13
[...]
As detailed in the crash report, all of those were actually innocuous and could be ignored. Also note that we completely trash the notmuch database because it's actually faster to reindex from scratch than let notmuch slowly figure out that all mails are new and all the old mails are gone. The fresh indexing took:
nov 19 15:08:54 angela notmuch[2521117]: Processed 384679 total files in 23m 41s (270 files/sec.).
nov 19 15:08:54 angela notmuch[2521117]: Added 372610 new messages to the database.
While a reindexing on top of an existing database was going twice as slow, at about 120 files/sec.

Current config file Putting it all together, I ended up with the following configuration file:
SyncState *
Sync All
# IMAP side, AKA "Far"
IMAPAccount anarcat-imap
Host imap.anarc.at
User anarcat
PassCmd "pass imap.anarc.at"
SSLType IMAPS
CertificateFile /etc/ssl/certs/ca-certificates.crt
IMAPAccount anarcat-tunnel
Tunnel "ssh -o BatchMode=yes -o IdentitiesOnly=yes -i ~/.ssh/id_ed25519_mbsync -o HostKeyAlias=shell.anarc.at -C anarcat@imap.anarc.at /usr/lib/dovecot/imap"
IMAPStore anarcat-remote
Account anarcat-tunnel
# Maildir side, AKA "Near"
MaildirStore anarcat-local
# Maildir/top/sub/sub
#SubFolders Verbatim
# Maildir/.top.sub.sub
SubFolders Maildir++
# Maildir/top/.sub/.sub
# SubFolders legacy
# The trailing "/" is important
#Path ~/Maildir-mbsync/
Inbox ~/Maildir/
# what binds Maildir and IMAP
Channel anarcat
Far :anarcat-remote:
Near :anarcat-local:
# Exclude everything under the internal [Gmail] folder, except the interesting folders
#Patterns * ![Gmail]* "[Gmail]/Sent Mail" "[Gmail]/Starred" "[Gmail]/All Mail"
# Or include everything
#Patterns *
Patterns * !register  !.register
# Automatically create missing mailboxes, both locally and on the server
Create Both
#Create Near
# Sync the movement of messages between folders and deletions, add after making sure the sync works
Expunge Both
# Propagate mailbox deletion
Remove both
IMAPAccount anarcat-register-imap
Host imap.anarc.at
User register
PassCmd "pass imap.anarc.at-register"
SSLType IMAPS
CertificateFile /etc/ssl/certs/ca-certificates.crt
IMAPAccount anarcat-register-tunnel
Tunnel "ssh -o BatchMode=yes -o IdentitiesOnly=yes -i ~/.ssh/id_ed25519_mbsync -o HostKeyAlias=shell.anarc.at -C register@imap.anarc.at /usr/lib/dovecot/imap"
IMAPStore anarcat-register-remote
Account anarcat-register-tunnel
MaildirStore anarcat-register-local
SubFolders Maildir++
Inbox ~/Maildir/.register/
Channel anarcat-register
Far :anarcat-register-remote:
Near :anarcat-register-local:
Create Both
Expunge Both
Remove both
Note that it may be out of sync with my live (and private) configuration file, as I do not publish my "dotfiles" repository publicly for security reasons.

OfflineIMAP I've used OfflineIMAP for a long time before switching to SMD. I don't exactly remember why or when I started using it, but I do remember it became painfully slow as I started using notmuch, and would sometimes crash mysteriously. It's been a while, so my memory is hazy on that. It also kind of died in a fire when Python 2 stop being maintained. The main author moved on to a different project, imapfw which could serve as a framework to build IMAP clients, but never seemed to implement all of the OfflineIMAP features and certainly not configuration file compatibility. Thankfully, a new team of volunteers ported OfflineIMAP to Python 3 and we can now test that new version to see if it is an improvement over mbsync.

Crash on full sync The first thing that happened on a full sync is this crash:
Copy message from RemoteAnarcat:junk:
 ERROR: Copying message 30624 [acc: Anarcat]
  decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Thread 'Copy message from RemoteAnarcat:junk' terminated with exception:
Traceback (most recent call last):
  File "/usr/share/offlineimap3/offlineimap/imaputil.py", line 406, in utf7m_decode
    for c in binary.decode():
AttributeError: 'memoryview' object has no attribute 'decode'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/share/offlineimap3/offlineimap/threadutil.py", line 146, in run
    Thread.run(self)
  File "/usr/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 802, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 342, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 908, in _fetch_from_imap
    ndata1 = self.parser['8bit-RFC'].parsebytes(data[0][1])
  File "/usr/lib/python3.9/email/parser.py", line 123, in parsebytes
    return self.parser.parsestr(text, headersonly)
  File "/usr/lib/python3.9/email/parser.py", line 67, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/usr/lib/python3.9/email/parser.py", line 56, in parse
    feedparser.feed(data)
  File "/usr/lib/python3.9/email/feedparser.py", line 176, in feed
    self._call_parse()
  File "/usr/lib/python3.9/email/feedparser.py", line 180, in _call_parse
    self._parse()
  File "/usr/lib/python3.9/email/feedparser.py", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 298, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 256, in _parsegen
    if self._cur.get_content_type() == 'message/delivery-status':
  File "/usr/lib/python3.9/email/message.py", line 578, in get_content_type
    value = self.get('content-type', missing)
  File "/usr/lib/python3.9/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/usr/lib/python3.9/email/policy.py", line 163, in header_fetch_parse
    return self.header_factory(name, value)
  File "/usr/lib/python3.9/email/headerregistry.py", line 601, in __call__
    return self[name](name, value)
  File "/usr/lib/python3.9/email/headerregistry.py", line 196, in __new__
    cls.parse(value, kwds)
  File "/usr/lib/python3.9/email/headerregistry.py", line 445, in parse
    kwds['parse_tree'] = parse_tree = cls.value_parser(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2675, in parse_content_type_header
    ctype.append(parse_mime_parameters(value[1:]))
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2569, in parse_mime_parameters
    token, value = get_parameter(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2492, in get_parameter
    token, value = get_value(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2403, in get_value
    token, value = get_quoted_string(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1294, in get_quoted_string
    token, value = get_bare_quoted_string(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1223, in get_bare_quoted_string
    token, value = get_encoded_word(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1064, in get_encoded_word
    text, charset, lang, defects = _ew.decode('=?' + tok + '?=')
  File "/usr/lib/python3.9/email/_encoded_words.py", line 181, in decode
    string = bstring.decode(charset)
AttributeError: decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Last 1 debug messages logged for Copy message from RemoteAnarcat:junk prior to exception:
thread: Register new thread 'Copy message from RemoteAnarcat:junk' (account 'Anarcat')
ERROR: Exceptions occurred during the run!
ERROR: Copying message 30624 [acc: Anarcat]
  decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Traceback:
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 802, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 342, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 908, in _fetch_from_imap
    ndata1 = self.parser['8bit-RFC'].parsebytes(data[0][1])
  File "/usr/lib/python3.9/email/parser.py", line 123, in parsebytes
    return self.parser.parsestr(text, headersonly)
  File "/usr/lib/python3.9/email/parser.py", line 67, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/usr/lib/python3.9/email/parser.py", line 56, in parse
    feedparser.feed(data)
  File "/usr/lib/python3.9/email/feedparser.py", line 176, in feed
    self._call_parse()
  File "/usr/lib/python3.9/email/feedparser.py", line 180, in _call_parse
    self._parse()
  File "/usr/lib/python3.9/email/feedparser.py", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 298, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 256, in _parsegen
    if self._cur.get_content_type() == 'message/delivery-status':
  File "/usr/lib/python3.9/email/message.py", line 578, in get_content_type
    value = self.get('content-type', missing)
  File "/usr/lib/python3.9/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/usr/lib/python3.9/email/policy.py", line 163, in header_fetch_parse
    return self.header_factory(name, value)
  File "/usr/lib/python3.9/email/headerregistry.py", line 601, in __call__
    return self[name](name, value)
  File "/usr/lib/python3.9/email/headerregistry.py", line 196, in __new__
    cls.parse(value, kwds)
  File "/usr/lib/python3.9/email/headerregistry.py", line 445, in parse
    kwds['parse_tree'] = parse_tree = cls.value_parser(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2675, in parse_content_type_header
    ctype.append(parse_mime_parameters(value[1:]))
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2569, in parse_mime_parameters
    token, value = get_parameter(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2492, in get_parameter
    token, value = get_value(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2403, in get_value
    token, value = get_quoted_string(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1294, in get_quoted_string
    token, value = get_bare_quoted_string(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1223, in get_bare_quoted_string
    token, value = get_encoded_word(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1064, in get_encoded_word
    text, charset, lang, defects = _ew.decode('=?' + tok + '?=')
  File "/usr/lib/python3.9/email/_encoded_words.py", line 181, in decode
    string = bstring.decode(charset)
Folder junk [acc: Anarcat]:
 Copy message UID 30626 (29008/49310) RemoteAnarcat:junk -> LocalAnarcat:junk
Command exited with non-zero status 100
5252.91user 535.86system 3:21:00elapsed 47%CPU (0avgtext+0avgdata 846304maxresident)k
96344inputs+26563792outputs (1189major+2155815minor)pagefaults 0swaps
That only transferred about 8GB of mail, which gives us a transfer rate of 5.3Mbit/s, more than 5 times slower than mbsync. This bug is possibly limited to the bullseye version of offlineimap3 (the lovely 0.0~git20210225.1e7ef9e+dfsg-4), while the current sid version (the equally gorgeous 0.0~git20211018.e64c254+dfsg-1) seems unaffected.

Tolerable performance The new release still crashes, except it does so at the very end, which is an improvement, since the mails do get transferred:
 *** Finished account 'Anarcat' in 511:12
ERROR: Exceptions occurred during the run!
ERROR: Exception parsing message with ID (<20190619152034.BFB8810E07A@marcos.anarc.at>) from imaplib (response type: bytes).
 AttributeError: decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Traceback:
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 810, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 343, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 910, in _fetch_from_imap
    raise OfflineImapError(
ERROR: Exception parsing message with ID (<40A270DB.9090609@alternatives.ca>) from imaplib (response type: bytes).
 AttributeError: decoding with 'x-mac-roman' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Traceback:
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 810, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 343, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 910, in _fetch_from_imap
    raise OfflineImapError(
ERROR: IMAP server 'RemoteAnarcat' does not have a message with UID '32686'
Traceback:
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 810, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 343, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 889, in _fetch_from_imap
    raise OfflineImapError(reason, severity)
Command exited with non-zero status 1
8273.52user 983.80system 8:31:12elapsed 30%CPU (0avgtext+0avgdata 841936maxresident)k
56376inputs+43247608outputs (811major+4972914minor)pagefaults 0swaps
"offlineimap  -o " took 8 hours 31 mins 15 secs
This is 8h31m for transferring 12G, which is around 3.1Mbit/s. That is nine times slower than mbsync, almost an order of magnitude! Now that we have a full sync, we can test incremental synchronization. That is also much slower:
===> multitime results
1: sh -c "offlineimap -o   true"
            Mean        Std.Dev.    Min         Median      Max
real        24.639      0.513       23.946      24.526      25.708      
user        23.912      0.473       23.404      23.795      24.947      
sys         1.743       0.105       1.607       1.729       2.002
That is also an order of magnitude slower than mbsync, and significantly slower than what you'd expect from a sync process. ~30 seconds is long enough to make me impatient and distracted; 3 seconds, less so: I can wait and see the results almost immediately.

Integrity check That said: this is still on a gigabit link. It's technically possible that OfflineIMAP performs better than mbsync over a slow link, but I Haven't tested that theory. The OfflineIMAP mail spool is missing quite a few messages as well:
anarcat@angela:~(main)$ find Maildir-offlineimap -type f -type f -a \! -name '.*'   wc -l 
381463
anarcat@angela:~(main)$ find Maildir -type f -type f -a \! -name '.*'   wc -l 
385247
... although that's probably all either new messages or the register folder, so OfflineIMAP might actually be in a better position there. But digging in more, it seems like the actual per-folder diff is fairly similar to mbsync: a few messages missing here and there. Considering OfflineIMAP's instability and poor performance, I have not looked any deeper in those discrepancies.

Other projects to evaluate Those are all the options I have considered, in alphabetical order
  • doveadm-sync: requires dovecot on both ends, can tunnel over SSH, may have performance issues in incremental sync, written in C
  • fdm: fetchmail replacement, IMAP/POP3/stdin/Maildir/mbox,NNTP support, SOCKS support (for Tor), complex rules for delivering to specific mailboxes, adding headers, piping to commands, etc. discarded because no (real) support for keeping mail on the server, and written in C
  • getmail: fetchmail replacement, IMAP/POP3 support, supports incremental runs, classification rules, Python
  • interimap: syncs two IMAP servers, apparently faster than doveadm and offlineimap, but requires running an IMAP server locally, Perl
  • isync/mbsync: TLS client certs and SSH tunnels, fast, incremental, IMAP/POP/Maildir support, multiple mailbox, trash and recursion support, and generally has good words from multiple Debian and notmuch people (Arch tutorial), written in C, review above
  • mail-sync: notify support, happens over any piped transport (e.g. ssh), diff/patch system, requires binary on both ends, mentions UUCP in the manpage, mentions rsmtp which is a nice name for rsendmail. not evaluated because it seems awfully complex to setup, Haskell
  • nncp: treat the local spool as another mail server, not really compatible with my "multiple clients" setup, Golang
  • offlineimap3: requires IMAP, used the py2 version in the past, might just still work, first sync painful (IIRC), ways to tunnel over SSH, review above, Python
Most projects were not evaluated due to lack of time.

Conclusion I'm now using mbsync to sync my mail. I'm a little disappointed by the synchronisation times over the slow link, but I guess that's on par for the course if we use IMAP. We are bound by the network speed much more than with custom protocols. I'm also worried about the C implementation and the crashes I have witnessed, but I am encouraged by the fast upstream response. Time will tell if I will stick with that setup. I'm certainly curious about the promises of interimap and mail-sync, but I have ran out of time on this project.

29 June 2021

Antoine Beaupr : Another syncmaildir crash

So I had another major email crash with my syncmaildir setup. This time I was at least able to confirm the issue, and I still haven't lost mail thanks to backups and sheer luck (again).

The crash It is not really worth going over the crash in details, it's fairly similar to the last one: something bad happened and smd started destroying everything. The hint is that it takes a long time to do what usually takes seconds. It helps that I now have a second monitor showing logs. I still lost much more mail than the last time. I used to have "301 723 messages", according to notmuch. But then when I ran smd-pull by hand, it was telling me:
95K emails scanned
Oops. You can see notmuch happily noticing the destroyed files on the server:
jun 28 16:33:40 marcos notmuch[28532]: No new mail. Removed 65498 messages. Detected 1699 file renames.
jun 28 16:36:05 marcos notmuch[29746]: No new mail. Removed 68883 messages. Detected 2488 file renames.
jun 28 16:41:40 marcos notmuch[31972]: No new mail. Removed 118295 messages. Detected 3657 file renames.
The final count ended up being 81 042 messages, according to notmuch. A whopping 220 000 mails deleted. The interesting bit, this time around, is that I caught smd in the act of running two processes in parallel:
jun 28 16:30:09 curie systemd[2845]: Finished pull emails with syncmaildir. 
jun 28 16:30:09 curie systemd[2845]: Starting push emails with syncmaildir... 
jun 28 16:30:09 curie systemd[2845]: Starting pull emails with syncmaildir... 
So clearly that is the source of the bug.

Recovery Emergency stop on curie:
notmuch dump > notmuch.dump
systemctl --user --now disable smd-pull.service smd-pull.timer smd-push.service smd-push.timer notmuch-new.service notmuch-new.timer
On marcos (the server), guessed the number of messages delivered since the last backup to be 71, just looking at timestamps in the mail log. Made a list:
grep postfix/local /var/log/mail.log   tail -71 > lost-mail
Found postfix queue IDs:
sed 's/.*\]://;s/:.*//' lost-mail > qids
Turn those into message IDs, find those that are missing from the disk (had previously ran notmuch new just to be sure it's up to date):
while read qid ; do 
    grep "$qid: message-id" /var/log/mail.log
done < qids    sed 's/.*message-id=<//;s/>//'   while read msgid; do
    sudo -u anarcat notmuch count --exclude=false id:$msgid   grep -q 0 && echo $msgid
done
Copy this back on curie as missing-msgids and:
$ wc -l missing-msgids 
48 missing-msgids
$ while read msgid ; do notmuch count --exclude=false id:$msgid   grep -q 0 && echo $msgid ; done < missing-msgids
mailman.189.1624881611.23397.nodes-reseaulibre.ca@reseaulibre.ca
AnwMy7rdSpK-N-vt4AiOag@ismtpd0148p1mdw1.sendgrid.net
only two mails missing! whoohoo! Copy those back onto marcos as really-missing-msgids, and look at the full mail logs to see what they are:
~anarcat/src/koumbit-scripts/mail/postfix-trace --from-file really-missing-msgids2
I actually remembered deleting those, so no mail lost! Rebuild the list of msgids that were lost, on marcos:
while read qid ; do grep "$qid: message-id" /var/log/mail.log; done < qids    sed 's/.*message-id=<//;s/>//'
Copy that on curie as lost-mail-msgids, then copy the files over in a test dir:
while read msgid ; do
    notmuch search --output=files --exclude=false "id:$msgid"
done < lost-mail-msgids   sed 's#/home/anarcat/Maildir/##'   rsync -v  --files-from=- /home/anarcat/Maildir/ shell.anarc.at:restore/Maildir-angela/
If that looks about right, on marcos:
find restore/Maildir-angela/ -type f   wc -l
... should match the number of missing mails, roughly. Copy if in the real spool:
while read msgid ; do
    notmuch search --output=files --exclude=false "id:$msgid"
done < lost-mail-msgids   sed 's#/home/anarcat/Maildir/##'   rsync -v  --files-from=- /home/anarcat/Maildir/ shell.anarc.at:Maildir/
Then on the server, notmuch new should find the new emails, and we shouldn't have any lost mail anymore:
while read qid ; do grep "$qid: message-id" /var/log/mail.log; done < qids    sed 's/.*message-id=<//;s/>//'   while read msgid; do sudo -u anarcat notmuch count --exclude=false id:$msgid   grep -q 0 && echo $msgid ; done
Then, crucial moment, try to pull the new mails from the backups on curie:
anarcat@curie:~(main)$ smd-pull  -n  --show-tags -v
Found lockfile of a dead instance. Ignored.
Phase 0: handshake
Phase 1: changes detection
    5K emails scanned
   10K emails scanned
   15K emails scanned
   20K emails scanned
   25K emails scanned
   30K emails scanned
   35K emails scanned
   40K emails scanned
   45K emails scanned
   50K emails scanned
Phase 2: synchronization
Phase 3: agreement
default: smd-client@localhost: TAGS: stats::new-mails(49687), del-mails(0), bytes-received(215752279), xdelta-received(3703852)
"smd-pull  -n  --show-tags -v" took 3 mins 39 secs
This brought me back to the state after the backup plus the mails delivered during the day, which means I had to catchup with all my holiday's read emails (1440 mails!) but thankfully I made a dump of the notmuch database on curie at the start of the procedure, so this actually restored a sane state:
pv notmuch.dump   notmuch restore
Phew!

Workaround I have filed this as a bug in upstream issue 18. Considering I filed 11 issues and only 3 of those were closed, I'm not holding my breath. I nevertheless filed PR 19 in the hope that this will fix my particular issue, but I'm not even sure this is the right fix...

Fix At this point, I'm really ready to give up on SMD. It's really, really nice to be able to sync mail over SSH because I don't need to store my IMAP password on disk. But surely there are more reliable syncing mechanisms. I do not remember ever losing that much mail before. At worst, offlineimap would duplicate emails like mad, but never destroy my entire mail spool that way. As mentioned before, there are other programs that sync mail. I'm looking at:
  • offlineimap3: requires IMAP, used the py2 version in the past, might just still work, first sync painful (IIRC), ways to tunnel over SSH, see comment below
  • isync/mbsync: might be faster, I remember having trouble switching from offlineimap to this, has support for TLS client certs, running over SSH, and generally has good words from multiple Debian and notmuch people
  • getmail: just downloads email, might not be enough
  • nncp: treat the local spool as another mail server, might not be compatible with my "multiple clients" setup
  • doveadm-sync: requires dovecot on both ends, but supports using SSH to sync, will try this next, may have performance problems, see comment below
  • interimap: syncs two IMAP servers, apparently faster than doveadm and offlineimap
  • mail-sync: notify support, happens over any piped transport (e.g. ssh), diff/patch system, requires binary on both ends, mentions UUCP in the manpage, seems awfully complicated to setup, mentions rsmtp which is a nice name for rsendmail

1 January 2021

Utkarsh Gupta: FOSS Activites in December 2020

Here s my (fifteenth) monthly update about the activities I ve done in the F/L/OSS world.

Debian
This was my 24th month of contributing to Debian. I became a DM in late March last year and a DD last Christmas! \o/ Amongs a lot of things, this was month was crazy, hectic, adventerous, and the last of 2020 more on some parts later this month.
I finally finished my 7th semester (FTW!) and moved onto my last one! That said, I had been busy with other things but still did a bunch of Debian stuff Here are the following things I did this month:

Uploads and bug fixes:

Other $things:
  • Attended the Debian Ruby team meeting.
  • Mentoring for newcomers.
  • FTP Trainee reviewing.
  • Moderation of -project mailing list.
  • Sponsored golang-github-gorilla-css for Fedrico.

Debian (E)LTS
Debian Long Term Support (LTS) is a project to extend the lifetime of all Debian stable releases to (at least) 5 years. Debian LTS is not handled by the Debian security team, but by a separate group of volunteers and companies interested in making it a success. And Debian Extended LTS (ELTS) is its sister project, extending support to the Jessie release (+2 years after LTS support). This was my fifteenth month as a Debian LTS and sixth month as a Debian ELTS paid contributor.
I was assigned 26.00 hours for LTS and 38.25 hours for ELTS and worked on the following things:

LTS CVE Fixes and Announcements:
  • Issued DLA 2474-1, fixing CVE-2020-28928, for musl.
    For Debian 9 Stretch, these problems have been fixed in version 1.1.16-3+deb9u1.
  • Issued DLA 2481-1, fixing CVE-2020-25709 and CVE-2020-25710, for openldap.
    For Debian 9 Stretch, these problems have been fixed in version 2.4.44+dfsg-5+deb9u6.
  • Issued DLA 2484-1, fixing #969126, for python-certbot.
    For Debian 9 Stretch, these problems have been fixed in version 0.28.0-1~deb9u3.
  • Issued DLA 2487-1, fixing CVE-2020-27350, for apt.
    For Debian 9 Stretch, these problems have been fixed in version 1.4.11. The update was prepared by the maintainer, Julian.
  • Issued DLA 2488-1, fixing CVE-2020-27351, for python-apt.
    For Debian 9 Stretch, these problems have been fixed in version 1.4.2. The update was prepared by the maintainer, Julian.
  • Issued DLA 2495-1, fixing CVE-2020-17527, for tomcat8.
    For Debian 9 Stretch, these problems have been fixed in version 8.5.54-0+deb9u5.
  • Issued DLA 2488-2, for python-apt.
    For Debian 9 Stretch, these problems have been fixed in version 1.4.3. The update was prepared by the maintainer, Julian.
  • Issued DLA 2508-1, fixing CVE-2020-35730, for roundcube.
    For Debian 9 Stretch, these problems have been fixed in version 1.2.3+dfsg.1-4+deb9u8. The update was prepared by the maintainer, Guilhem.

ELTS CVE Fixes and Announcements:

Other (E)LTS Work:
  • Front-desk duty from 21-12 until 27-12 and from 28-12 until 03-01 for both LTS and ELTS.
  • Triaged openldap, python-certbot, lemonldap-ng, qemu, gdm3, open-iscsi, gobby, jackson-databind, wavpack, cairo, nsd, tomcat8, and bountycastle.
  • Marked CVE-2020-17527/tomcat8 as not-affected for jessie.
  • Marked CVE-2020-28052/bountycastle as not-affected for jessie.
  • Marked CVE-2020-14394/qemu as postponed for jessie.
  • Marked CVE-2020-35738/wavpack as not-affected for jessie.
  • Marked CVE-2020-3550 3-6 /qemu as postponed for jessie.
  • Marked CVE-2020-3550 3-6 /qemu as postponed for stretch.
  • Marked CVE-2020-16093/lemonldap-ng as no-dsa for stretch.
  • Marked CVE-2020-27837/gdm3 as no-dsa for stretch.
  • Marked CVE-2020- 13987, 13988, 17437 /open-iscsi as no-dsa for stretch.
  • Marked CVE-2020-35450/gobby as no-dsa for stretch.
  • Marked CVE-2020-35728/jackson-databind as no-dsa for stretch.
  • Marked CVE-2020-28935/nsd as no-dsa for stretch.
  • Auto EOL ed libpam-tacplus, open-iscsi, wireshark, gdm3, golang-go.crypto, jackson-databind, spotweb, python-autobahn, asterisk, nsd, ruby-nokogiri, linux, and motion for jessie.
  • General discussion on LTS private and public mailing list.

Other $things! \o/

Bugs and Patches Well, I did report some bugs and issues and also sent some patches:
  • Issue #44 for github-activity-readme, asking for a feature request to set custom committer s email address.
  • Issue #711 for git2go, reporting build failure for the library.
  • PR #89 for rubocop-rails_config, bumping RuboCop::Packaging to v0.5.
  • Issue #36 for rubocop-packaging, asking to try out mutant :)
  • PR #212 for cucumber-ruby-core, bumping RuboCop::Packaging to v0.5.
  • PR #213 for cucumber-ruby-core, enabling RuboCop::Packaging.
  • Issue #19 for behance, asking to relax constraints on faraday and faraday_middleware.
  • PR #37 for rubocop-packaging, enabling tests against ruby3.0! \o/
  • PR #489 for cucumber-rails, bumping RuboCop::Packaging to v0.5.
  • Issue #362 for nheko, reporting a crash when opening the application.
  • PR #1282 for paper_trail, adding RuboCop::Packaging amongst other used extensions.
  • Bug #978640 for nheko Debian package, reporting a crash, as a result of libfmt7 regression.

Misc and Fun Besides squashing bugs and submitting patches, I did some other things as well!
  • Participated in my first Advent of Code event! :)
    Whilst it was indeed fun, I didn t really complete it. No reason, really. But I ll definitely come back stronger next year, heh! :)
    All the solutions thus far could be found here.
  • Did a couple of reviews for some PRs and triaged some bugs here and there, meh.
  • Also did some cloud debugging, not so fun if you ask me, but cool enough to make me want to do it again! ^_^
  • Worked along with pollo, zigo, ehashman, rlb, et al for puppet and puppetserver in Debian. OMG, they re so lovely! <3
  • Ordered some interesting books to read January onward. New year resolution? Meh, not really. Or maybe. But nah.
  • Also did some interesting stuff this month but can t really talk about it now. Hopefully sooooon.

Until next time.
:wq for today.

14 May 2017

Bits from Debian: New Debian Developers and Maintainers (March and April 2017)

The following contributors got their Debian Developer accounts in the last two months: The following contributors were added as Debian Maintainers in the last two months: Congratulations!

7 December 2016

Jonas Meurer: On CVE-2016-4484, a (securiy)? bug in the cryptsetup initramfs integration

On CVE-2016-4484, a (security)? bug in the cryptsetup initramfs integration On November 4, I was made aware of a security vulnerability in the integration of cryptsetup into initramfs. The vulnerability was discovered by security researchers Hector Marco and Ismael Ripoll of CyberSecurity UPV Research Group and got CVE-2016-4484 assigned. In this post I'll try to reflect a bit on

What CVE-2016-4484 is all about Basically, the vulnerability is about two separate but related issues:

1. Initramfs rescue shell considered harmful The main topic that Hector Marco and Ismael Ripoll address in their publication is that Debian exits into a rescue shell in case of failure during initramfs, and that this can be triggered by entering a wrong password ~93 times in a row. Indeed the Debian initramfs implementation as provided by initramfs-tools exits into a rescue shell (usually a busybox shell) after a defined amount of failed attempts to make the root filesystem available. The loop in question is in local_device_setup() at the local initramfs script In general, this behaviour is considered as a feature: if the root device hasn't shown up after 30 rounds, the rescue shell is spawned to provide the local user/admin a way to debug and fix things herself. Hector Marco and Ismael Ripoll argue that in special environments, e.g. on public computers with password protected BIOS/UEFI and bootloader, this opens an attack vector and needs to be regarded as a security vulnerability:
It is common to assume that once the attacker has physical access to the computer, the game is over. The attackers can do whatever they want. And although this was true 30 years ago, today it is not. There are many "levels" of physical access. [...] In order to protect the computer in these scenarios: the BIOS/UEFI has one or two passwords to protect the booting or the configuration menu; the GRUB also has the possibility to use multiple passwords to protect unauthorized operations. And in the case of an encrypted system, the initrd shall block the maximum number of password trials and prevent the access to the computer in that case.
While Hector and Ismael have a valid point in that the rescue shell might open an additional attack vector in special setups, this is not true for the vast majority of Debian systems out there: in most cases a local attacker can alter the boot order, replace or add boot devices, modify boot options in the (GNU GRUB) bootloader menu or modify/replace arbitrary hardware parts. The required scenario to make the initramfs rescue shell an additional attack vector is indeed very special: locked down hardware, password protected BIOS and bootloader but still local keyboard (or serial console) access are required at least. Hector and Ismael argue that the default should be changed for enhanced security:
[...] But then Linux is used in more hostile environments, this helpful (but naive) recovery services shall not be the default option.
For the reasons explained about, I tend to disagree to Hectors and Ismaels opinion here. And after discussing this topic with several people I find my opinion reconfirmed: the Debian Security Team disputes the security impact of the issue and others agree. But leaving the disputable opinion on a sane default aside, I don't think that the cryptsetup package is the right place to change the default, if at all. If you want added security by a locked down initramfs (i.e. no rescue shell spawned), then at least the bootloader (GNU GRUB) needs to be locked down by default as well. To make it clear: if one wants to lock down the boot process, bootloader and initramfs should be locked down together. And the right place to do this would be the configurable behaviour of grub-mkconfig. Here, one can set a password for GRUB and the boot parameter 'panic=1' which disables the spawning of a rescue shell in initramfs. But as mentioned, I don't agree that this would be sane defaults. The vast majority of Debian systems out there don't have any security added by locked down bootloader and initramfs and the benefit of a rescue shell for debugging purposes clearly outrivals the minor security impact in my opinion. For the few setups which require the added security of a locked down bootloader and initramfs, we already have the relevant options documented in the Securing Debian Manual: After discussing the topic with initramfs-tools maintainers today, Guilhem and me (the cryptsetup maintainers) finally decided to not change any defaults and just add a 'sleep 60' after the maximum allowed attempts were reached. 2. tries=n option ignored, local brute-force slightly cheaper Apart from the issue of a rescue shell being spawned, Hector and Ismael also discovered a programming bug in the cryptsetup initramfs integration. This bug in the cryptroot initramfs local-top script allowed endless retries of passphrase input, ignoring the tries=n option of crypttab (and the default of 3). As a result, theoretically unlimited attempts to unlock encrypted disks were possible when processed during initramfs stage. The attack vector here was that local brute-force attacks are a bit cheaper. Instead of having to reboot after max tries were reached, one could go on trying passwords. Even though efficient brute-force attacks are mitigated by the PBKDF2 implementation in cryptsetup, this clearly is a real bug. The reason for the bug was twofold:
  • First, the condition in setup_mapping() responsible for making the function fail when the maximum amount of allowed attempts is reached, was never met:
    setup_mapping()
     
      [...]
      # Try to get a satisfactory password $crypttries times
      count=0                              
    while [ $crypttries -le 0 ] [ $count -lt $crypttries ]; do export CRYPTTAB_TRIED="$count" count=$(( $count + 1 )) [...] done if [ $crypttries -gt 0 ] && [ $count -gt $crypttries ]; then message "cryptsetup: maximum number of tries exceeded for $crypttarget" return 1 fi [...]
    As one can see, the while loop stops when $count -lt $crypttries. Thus the second condition $count -gt $crypttries is never met. This can easily be fixed by decreasing $count by one in case of a successful unlock attempt along with changing the second condition to $count -ge $crypttries:
    setup_mapping()
     
      [...]
      while [ $crypttries -le 0 ]   [ $count -lt $crypttries ]; do
          [...]
          # decrease $count by 1, apparently last try was successful.
          count=$(( $count - 1 ))
          [...]
      done
      if [ $crypttries -gt 0 ] && [ $count -ge $crypttries ]; then
          [...]
      fi
      [...]
     
    
    Christian Lamparter already spotted this bug back in October 2011 and provided a (incomplete) patch, but back then I even managed to merge the patch in an improper way, making it even more useless: The patch by Christian forgot to decrease $count by one in case of a successful unlock attempt, resulting in warnings about maximum tries exceeded even for successful attemps in some circumstances. But instead of adding the decrease myself and keeping the (almost correct) condition $count -eq $crypttries for detection of exceeded maximum tries, I changed back the condition to the wrong original $count -gt $crypttries that again was never met. Apparently I didn't test the fix properly back then. I definitely should do better in future!
  • Second, back in December 2013, I added a cryptroot initramfs local-block script as suggested by Goswin von Brederlow in order to fix bug #678692. The purpose of the cryptroot initramfs local-block script is to invoke the cryptroot initramfs local-top script again and again in a loop. This is required to support complex block device stacks. In fact, the numberless options of stacked block devices are one of the biggest and most inglorious reasons that the cryptsetup initramfs integration scripts became so complex over the years. After all we need to support setups like rootfs on top of LVM with two separate encrypted PVs or rootfs on top of LVM on top of dm-crypt on top of MD raid. The problem with the local-block script is that exiting the setup_mapping() function merely triggers a new invocation of the very same function. The guys who discovered the bug suggested a simple and good solution to this bug: When maximum attempts are detected (by second condition from above), the script sleeps for 60 seconds. This mitigates the brute-force attack options for local attackers - even rebooting after max attempts should be faster.

About disclosure, wording and clickbaiting I'm happy that Hector and Ismael brought up the topic and made their argument about the security impacts of an initramfs rescue shell, even though I have to admit that I was rather astonished about the fact that they got a CVE assigned. Nevertheless I'm very happy that they informed the Security Teams of Debian and Ubuntu prior to publishing their findings, which put me in the loop in turn. Also Hector and Ismael were open and responsive when it came to discussing their proposed fixes. But unfortunately the way they advertised their finding was not very helpful. They announced a speech about this topic at the DeepSec 2016 in Vienna with the headline Abusing LUKS to Hack the System. Honestly, this headline is missleading - if not wrong - in several ways:
  • First, the whole issue is not about LUKS, neither is it about cryptsetup itself. It's about Debians integration of cryptsetup into the initramfs, which is a compeletely different story.
  • Second, the term hack the system suggests that an exploit to break into the system is revealed. This is not true. The device encryption is not endangered at all.
  • Third - as shown above - very special prerequisites need to be met in order to make the mere existance of a LUKS encrypted device the relevant fact to be able to spawn a rescue shell during initramfs.
Unfortunately, the way this issue was published lead to even worse articles in the tech news press. Topics like Major security hole found in Cryptsetup script for LUKS disk encryption or Linux Flaw allows Root Shell During Boot-Up for LUKS Disk-Encrypted Systems suggest that a major security vulnerabilty was revealed and that it compromised the protection that cryptsetup respective LUKS offer. If these articles/news did anything at all, then it was causing damage to the cryptsetup project, which is not affected by the whole issue at all. After the cat was out of the bag, Marco and Ismael aggreed that the way the news picked up the issue was suboptimal, but I cannot fight the feeling that the over-exaggeration was partly intended and that clickbaiting is taking place here. That's a bit sad.

9 November 2015

Lunar: Reproducible builds: week 28 in Stretch cycle

What happened in the reproducible builds effort this week: Toolchain fixes Chris Lamb filled a bug on python-setuptools with a patch to make the generated requires.txt files reproducible. The patch has been forwarded upstream. Chris also understood why the she-bang in some Python scripts kept being undeterministic: setuptools as called by dh-python could skip re-installing the scripts if the build had been too fast (under one second). #804339 offers a patch fixing the issue by passing --force to setup.py install. #804141 reported on gettext asks for support of SOURCE_DATE_EPOCH in gettextize. Santiago Vila pointed out that it doesn't felt appropriate as gettextize is supposed to be an interactive tool. The problem reported seems to be in avahi build system instead. Packages fixed The following packages became reproducible due to changes in their build dependencies: celestia, dsdo, fonts-taml-tscu, fte, hkgerman, ifrench-gut, ispell-czech, maven-assembly-plugin, maven-project-info-reports-plugin, python-avro, ruby-compass, signond, thepeg, wagon2, xjdic. The following packages became reproducible after getting fixed: Some uploads fixed some reproducibility issues but not all of them: Patches submitted which have not made their way to the archive yet: Chris Lamb closed a wrongly reopened bug against haskell-devscripts that was actually a problem in haddock. reproducible.debian.net FreeBSD tests are now run for three branches: master, stable/10, release/10.2.0. (h01ger) diffoscope development Support has been added for Free Pascal unit files (.ppc). (Paul Gevers) The homepage is now available using HTTPS, thanks to Let's Encrypt!. Work has been done to be able to publish diffoscope on the Python Package Index (also known as PyPI): the tlsh module is now optional, compatibility with python-magic has been added, and the fallback code to handle RPM has been fixed. Documentation update Reiner Herrmann, Paul Gevers, Niko Tyni, opi, and Dhole offered various fixes and wording improvements to the reproducible-builds.org. A mailing-list is now available to receive change notifications. NixOS, Guix, and Baserock are featured as projects working on reproducible builds. Package reviews 70 reviews have been removed, 74 added and 17 updated this week. Chris Lamb opened 22 new fail to build from source bugs. New issues this week: randomness_in_ocaml_provides, randomness_in_qdoc_page_id, randomness_in_python_setuptools_requires_txt, gettext_creates_ChangeLog_files_and_entries_with_current_date. Misc. h01ger and Chris Lamb presented Beyond reproducible builds at the MiniDebConf in Cambridge on November 8th. They gave an overview of where we stand and the changes in user tools, infrastructure, and development practices that we might want to see happening. Feedback on these thoughts are welcome. Slides are already available, and the video should be online soon. At the same event, a meeting happened with some members of the release team to discuss the best strategy regarding releases and reproducibility. Minutes have been posted on the Debian reproducible-builds mailing-list.

18 October 2015

Lunar: Reproducible builds: week 25 in Stretch cycle

What happened in the reproducible builds effort this week: Toolchain fixes Niko Tyni wrote a new patch adding support for SOURCE_DATE_EPOCH in Pod::Man. This would complement or replace the previously implemented POD_MAN_DATE environment variable in a more generic way. Niko Tyni proposed a fix to prevent mtime variation in directories due to debhelper usage of cp --parents -p. Packages fixed The following 119 packages became reproducible due to changes in their build dependencies: aac-tactics, aafigure, apgdiff, bin-prot, boxbackup, calendar, camlmix, cconv, cdist, cl-asdf, cli-common, cluster-glue, cppo, cvs, esdl, ess, faucc, fauhdlc, fbcat, flex-old, freetennis, ftgl, gap, ghc, git-cola, globus-authz-callout-error, globus-authz, globus-callout, globus-common, globus-ftp-client, globus-ftp-control, globus-gass-cache, globus-gass-copy, globus-gass-transfer, globus-gram-client, globus-gram-job-manager-callout-error, globus-gram-protocol, globus-gridmap-callout-error, globus-gsi-callback, globus-gsi-cert-utils, globus-gsi-credential, globus-gsi-openssl-error, globus-gsi-proxy-core, globus-gsi-proxy-ssl, globus-gsi-sysconfig, globus-gss-assist, globus-gssapi-error, globus-gssapi-gsi, globus-net-manager, globus-openssl-module, globus-rsl, globus-scheduler-event-generator, globus-xio-gridftp-driver, globus-xio-gsi-driver, globus-xio, gnome-control-center, grml2usb, grub, guilt, hgview, htmlcxx, hwloc, imms, kde-l10n, keystone, kimwitu++, kimwitu-doc, kmod, krb5, laby, ledger, libcrypto++, libopendbx, libsyncml, libwps, lprng-doc, madwimax, maria, mediawiki-math, menhir, misery, monotone-viz, morse, mpfr4, obus, ocaml-csv, ocaml-reins, ocamldsort, ocp-indent, openscenegraph, opensp, optcomp, opus, otags, pa-bench, pa-ounit, pa-test, parmap, pcaputils, perl-cross-debian, prooftree, pyfits, pywavelets, pywbem, rpy, signify, siscone, swtchart, tipa, typerep, tyxml, unison2.32.52, unison2.40.102, unison, uuidm, variantslib, zipios++, zlibc, zope-maildrophost. The following packages became reproducible after getting fixed: Packages which could not be tested: Some uploads fixed some reproducibility issues but not all of them: Patches submitted which have not made their way to the archive yet: Lunar reported that test strings depend on default character encoding of the build system in ongl. reproducible.debian.net The 189 packages composing the Arch Linux core repository are now being tested. No packages are currently reproducible, but most of the time the difference is limited to metadata. This has already gained some interest in the Arch Linux community. An explicit log message is now visible when a build has been killed due to the 12 hours timeout. (h01ger) Remote build setup has been made more robust and self maintenance has been further improved. (h01ger) The minimum age for rescheduling of already tested amd64 packages has been lowered from 14 to 7 days, thanks to the increase of hardware resources sponsored by ProfitBricks last week. (h01ger) diffoscope development diffoscope version 37 has been released on October 15th. It adds support for two new file formats (CBFS images and Debian .dsc files). After proposing the required changes to TLSH, fuzzy hashes are now computed incrementally. This will avoid reading entire files in memory which caused problems for large packages. New tests have been added for the command-line interface. More character encoding issues have been fixed. Malformed md5sums will now be compared as binary files instead of making diffoscope crash amongst several other minor fixes. Version 38 was released two days later to fix the versioned dependency on python3-tlsh. strip-nondeterminism development strip-nondeterminism version 0.013-1 has been uploaded to the archive. It fixes an issue with nonconformant PNG files with trailing garbage reported by Roland Rosenfeld. disorderfs development disorderfs version 0.4.1-1 is a stop-gap release that will disable lock propagation, unless --share-locks=yes is specified, as it still is affected by unidentified issues. Documentation update Lunar has been busy creating a proper website for reproducible-builds.org that would be a common location for news, documentation, and tools for all free software projects working on reproducible builds. It's not yet ready to be published, but it's surely getting there. Homepage of the future reproducible-builds.org website  Who's involved?  page of the future reproducible-builds.org website Package reviews 103 reviews have been removed, 394 added and 29 updated this week. 72 FTBFS issues were reported by Chris West and Niko Tyni. New issues: random_order_in_static_libraries, random_order_in_md5sums.

7 November 2007

Roland Mas: Planet scores

Top posters in a few Debian-related Planets:
$ planet-scores.sh 
Planet Debian-FR :
     19 Rapha l Hertzog
      4 Roland Mas
      3 Jean-Christophe Dubacq
      2 Gr gory Colpart
      2 Alexis Sukrieh
Sometimes I think this should be renamed Planet Buxy.
Planet Debian-FR (utilisateurs) :
     10 Julien Candelier
      8 Emilien Macchi
      4 Guilhem Bonnefille
      3 Shams Fantar
      1 Rapha l Hertzog
      1 Olivier Berger (perso)
      1 Jean-Christophe Dubacq
      1 Jean-Baptiste H tier (djib)
      1 Eric Veiras Galisson
Newly added contributors to that planet have all their recent articles aggregated, not only the ones they wrote since they were added.
Planet Debian :
     40 Christian Perrier
      2 Russell Coker
      2 Raphael Geissert
      1 Wouter Verhelst
      1 Steve Kemp
      1 Romain Francoise
      1 NOKUBI Takatsugu
      1 Michal  iha 
      1 John Goerzen
      1 Joey Schulze
      1 Gerfried Fuchs
      1 Fathi Boudra
      1 Enrico Zini
      1 Emanuele Rocca
      1 Dirk Eddelbuettel
      1 David Welton
      1 Christine Spang
      1 Antti-Juhani Kaijanaho
      1 Adam Rosi-Kessel
Planet "Christian loves rugby".
debian-community.org :
      4 Holger Levsen
      3 Andrew Donnellan
      2 Evgeni Golov
      1 Wolfgang Lonien
      1 Rapha l Hertzog
      1 Martin Albisetti
      1 Marcos Marado
      1 Jean-Christophe Dubacq
      1 Cord Beermann
      1 Benjamin A'Lee
      1 Andreas Putzo
$
I know I have an encoding problem on some planets, but that script is a very basic curl+shell+sed+grep+recode+sort+uniq pipeline, and I only use it for the amusement value. Maybe I'll recode it with a proper RSS parser some day if I feel utterly bored.