Search Results: "mt"

16 January 2022

Chris Lamb: Favourite films of 2021

In my four most recent posts, I went over the memoirs and biographies, the non-fiction, the fiction and the 'classic' novels that I enjoyed reading the most in 2021. But in the very last of my 2021 roundup posts, I'll be going over some of my favourite movies. (Saying that, these are perhaps less of my 'favourite films' than the ones worth remarking on after all, nobody needs to hear that The Godfather is a good movie.) It's probably helpful to remark you that I took a self-directed course in film history in 2021, based around the first volume of Roger Ebert's The Great Movies. This collection of 100-odd movie essays aims to make a tour of the landmarks of the first century of cinema, and I watched all but a handul before the year was out. I am slowly making my way through volume two in 2022. This tome was tremendously useful, and not simply due to the background context that Ebert added to each film: it also brought me into contact with films I would have hardly come through some other means. Would I have ever discovered the sly comedy of Trouble in Paradise (1932) or the touching proto-realism of L'Atalante (1934) any other way? It also helped me to 'get around' to watching films I may have put off watching forever the influential Battleship Potemkin (1925), for instance, and the ur-epic Lawrence of Arabia (1962) spring to mind here. Choosing a 'worst' film is perhaps more difficult than choosing the best. There are first those that left me completely dry (Ready or Not, Written on the Wind, etc.), and those that were simply poorly executed. And there are those that failed to meet their own high opinions of themselves, such as the 'made for Reddit' Tenet (2020) or the inscrutable Vanilla Sky (2001) the latter being an almost perfect example of late-20th century cultural exhaustion. But I must save my most severe judgement for those films where I took a visceral dislike how their subjects were portrayed. The sexually problematic Sixteen Candles (1984) and the pseudo-Catholic vigilantism of The Boondock Saints (1999) both spring to mind here, the latter of which combines so many things I dislike into such a short running time I'd need an entire essay to adequately express how much I disliked it.

Dogtooth (2009) A father, a mother, a brother and two sisters live in a large and affluent house behind a very high wall and an always-locked gate. Only the father ever leaves the property, driving to the factory that he happens to own. Dogtooth goes far beyond any allusion to Josef Fritzl's cellar, though, as the children's education is a grotesque parody of home-schooling. Here, the parents deliberately teach their children the wrong meaning of words (e.g. a yellow flower is called a 'zombie'), all of which renders the outside world utterly meaningless and unreadable, and completely mystifying its very existence. It is this creepy strangeness within a 'regular' family unit in Dogtooth that is both socially and epistemically horrific, and I'll say nothing here of its sexual elements as well. Despite its cold, inscrutable and deadpan surreality, Dogtooth invites all manner of potential interpretations. Is this film about the artificiality of the nuclear family that the West insists is the benchmark of normality? Or is it, as I prefer to believe, something more visceral altogether: an allegory for the various forms of ontological violence wrought by fascism, as well a sobering nod towards some of fascism's inherent appeals? (Perhaps it is both. In 1972, French poststructuralists Gilles and F lix Guattari wrote Anti-Oedipus, which plays with the idea of the family unit as a metaphor for the authoritarian state.) The Greek-language Dogtooth, elegantly shot, thankfully provides no easy answers.

Holy Motors (2012) There is an infamous scene in Un Chien Andalou, the 1929 film collaboration between Luis Bu uel and famed artist Salvador Dal . A young woman is cornered in her own apartment by a threatening man, and she reaches for a tennis racquet in self-defence. But the man suddenly picks up two nearby ropes and drags into the frame two large grand pianos... each leaden with a dead donkey, a stone tablet, a pumpkin and a bewildered priest. This bizarre sketch serves as a better introduction to Leos Carax's Holy Motors than any elementary outline of its plot, which ostensibly follows 24 hours in the life of a man who must play a number of extremely diverse roles around Paris... all for no apparent reason. (And is he even a man?) Surrealism as an art movement gets a pretty bad wrap these days, and perhaps justifiably so. But Holy Motors and Un Chien Andalou serve as a good reminder that surrealism can be, well, 'good, actually'. And if not quite high art, Holy Motors at least demonstrates that surrealism can still unnerving and hilariously funny. Indeed, recalling the whimsy of the plot to a close friend, the tears of laughter came unbidden to my eyes once again. ("And then the limousines...!") Still, it is unclear how Holy Motors truly refreshes surrealism for the twenty-first century. Surrealism was, in part, a reaction to the mechanical and unfeeling brutality of World War I and ultimately sought to release the creative potential of the unconscious mind. Holy Motors cannot be responding to another continental conflagration, and so it appears to me to be some kind of commentary on the roles we exhibit in an era of 'post-postmodernity': a sketch on our age of performative authenticity, perhaps, or an idle doodle on the function and psychosocial function of work. Or perhaps not. After all, this film was produced in a time that offers the near-universal availability of mind-altering substances, and this certainly changes the context in which this film was both created. And, how can I put it, was intended to be watched.

Manchester by the Sea (2016) An absolutely devastating portrayal of a character who is unable to forgive himself and is hesitant to engage with anyone ever again. It features a near-ideal balance between portraying unrecoverable anguish and tender warmth, and is paradoxically grandiose in its subtle intimacy. The mechanics of life led me to watch this lying on a bed in a chain hotel by Heathrow Airport, and if this colourless circumstance blunted the film's emotional impact on me, I am probably thankful for it. Indeed, I find myself reduced in this review to fatuously recalling my favourite interactions instead of providing any real commentary. You could write a whole essay about one particular incident: its surfaces, subtexts and angles... all despite nothing of any substance ever being communicated. Truly stunning.

McCabe & Mrs. Miller (1971) Roger Ebert called this movie one of the saddest films I have ever seen, filled with a yearning for love and home that will not ever come. But whilst it is difficult to disagree with his sentiment, Ebert's choice of sad is somehow not quite the right word. Indeed, I've long regretted that our dictionaries don't have more nuanced blends of tragedy and sadness; perhaps the Ancient Greeks can loan us some. Nevertheless, the plot of this film is of a gambler and a prostitute who become business partners in a new and remote mining town called Presbyterian Church. However, as their town and enterprise booms, it comes to the attention of a large mining corporation who want to bully or buy their way into the action. What makes this film stand out is not the plot itself, however, but its mood and tone the town and its inhabitants seem to be thrown together out of raw lumber, covered alternatively in mud or frozen ice, and their days (and their personalities) are both short and dark in equal measure. As a brief aside, if you haven't seen a Roger Altman film before, this has all the trappings of being a good introduction. As Ebert went on to observe: This is not the kind of movie where the characters are introduced. They are all already here. Furthermore, we can see some of Altman's trademark conversations that overlap, a superb handling of ensemble casts, and a quietly subversive view of the tyranny of 'genre'... and the latter in a time when the appetite for revisionist portrays of the West was not very strong. All of these 'Altmanian' trademarks can be ordered in much stronger measures in his later films: in particular, his comedy-drama Nashville (1975) has 24 main characters, and my jejune interpretation of Gosford Park (2001) is that it is purposefully designed to poke fun those who take a reductionist view of 'genre', or at least on the audience's expectations. (In this case, an Edwardian-era English murder mystery in the style of Agatha Christie, but where no real murder or detection really takes place.) On the other hand, McCabe & Mrs. Miller is actually a poor introduction to Altman. The story is told in a suitable deliberate and slow tempo, and the two stars of the film are shown thoroughly defrocked of any 'star status', in both the visual and moral dimensions. All of these traits are, however, this film's strength, adding up to a credible, fascinating and riveting portrayal of the old West.

Detour (1945) Detour was filmed in less than a week, and it's difficult to decide out of the actors and the screenplay which is its weakest point.... Yet it still somehow seemed to drag me in. The plot revolves around luckless Al who is hitchhiking to California. Al gets a lift from a man called Haskell who quickly falls down dead from a heart attack. Al quickly buries the body and takes Haskell's money, car and identification, believing that the police will believe Al murdered him. An unstable element is soon introduced in the guise of Vera, who, through a set of coincidences that stretches credulity, knows that this 'new' Haskell (ie. Al pretending to be him) is not who he seems. Vera then attaches herself to Al in order to blackmail him, and the world starts to spin out of his control. It must be understood that none of this is executed very well. Rather, what makes Detour so interesting to watch is that its 'errors' lend a distinctively creepy and unnatural hue to the film. Indeed, in the early twentieth century, Sigmund Freud used the word unheimlich to describe the experience of something that is not simply mysterious, but something creepy in a strangely familiar way. This is almost the perfect description of watching Detour its eerie nature means that we are not only frequently second-guessed about where the film is going, but are often uncertain whether we are watching the usual objective perspective offered by cinema. In particular, are all the ham-fisted segues, stilted dialogue and inscrutable character motivations actually a product of Al inventing a story for the viewer? Did he murder Haskell after all, despite the film 'showing' us that Haskell died of natural causes? In other words, are we watching what Al wants us to believe? Regardless of the answers to these questions, the film succeeds precisely because of its accidental or inadvertent choices, so it is an implicit reminder that seeking the director's original intention in any piece of art is a complete mirage. Detour is certainly not a good film, but it just might be a great one. (It is a short film too, and, out of copyright, it is available online for free.)

Safe (1995) Safe is a subtly disturbing film about an upper-middle-class housewife who begins to complain about vague symptoms of illness. Initially claiming that she doesn't feel right, Carol starts to have unexplained headaches, a dry cough and nosebleeds, and eventually begins to have trouble breathing. Carol's family doctor treats her concerns with little care, and suggests to her husband that she sees a psychiatrist. Yet Carol's episodes soon escalate. For example, as a 'homemaker' and with nothing else to occupy her, Carol's orders a new couch for a party. But when the store delivers the wrong one (although it is not altogether clear that they did), Carol has a near breakdown. Unsure where to turn, an 'allergist' tells Carol she has "Environmental Illness," and so Carol eventually checks herself into a new-age commune filled with alternative therapies. On the surface, Safe is thus a film about the increasing about of pesticides and chemicals in our lives, something that was clearly felt far more viscerally in the 1990s. But it is also a film about how lack of genuine healthcare for women must be seen as a critical factor in the rise of crank medicine. (Indeed, it made for something of an uncomfortable watch during the coronavirus lockdown.) More interestingly, however, Safe gently-yet-critically examines the psychosocial causes that may be aggravating Carol's illnesses, including her vacant marriage, her hollow friends and the 'empty calorie' stimulus of suburbia. None of this should be especially new to anyone: the gendered Victorian term 'hysterical' is often all but spoken throughout this film, and perhaps from the very invention of modern medicine, women's symptoms have often regularly minimised or outright dismissed. (Hilary Mantel's 2003 memoir, Giving Up the Ghost is especially harrowing on this.) As I opened this review, the film is subtle in its messaging. Just to take one example from many, the sound of the cars is always just a fraction too loud: there's a scene where a group is eating dinner with a road in the background, and the total effect can be seen as representing the toxic fumes of modernity invading our social lives and health. I won't spoiler the conclusion of this quietly devasting film, but don't expect a happy ending.

The Driver (1978) Critics grossly misunderstood The Driver when it was first released. They interpreted the cold and unemotional affect of the characters with the lack of developmental depth, instead of representing their dissociation from the society around them. This reading was encouraged by the fact that the principal actors aren't given real names and are instead known simply by their archetypes instead: 'The Driver', 'The Detective', 'The Player' and so on. This sort of quasi-Jungian erudition is common in many crime films today (Reservoir Dogs, Kill Bill, Layer Cake, Fight Club), so the critics' misconceptions were entirely reasonable in 1978. The plot of The Driver involves the eponymous Driver, a noted getaway driver for robberies in Los Angeles. His exceptional talent has far prevented him from being captured thus far, so the Detective attempts to catch the Driver by pardoning another gang if they help convict the Driver via a set-up robbery. To give himself an edge, however, The Driver seeks help from the femme fatale 'Player' in order to mislead the Detective. If this all sounds eerily familiar, you would not be far wrong. The film was essentially remade by Nicolas Winding Refn as Drive (2011) and in Edgar Wright's 2017 Baby Driver. Yet The Driver offers something that these neon-noir variants do not. In particular, the car chases around Los Angeles are some of the most captivating I've seen: they aren't thrilling in the sense of tyre squeals, explosions and flying boxes, but rather the vehicles come across like wild animals hunting one another. This feels especially so when the police are hunting The Driver, which feels less like a low-stakes game of cat and mouse than a pack of feral animals working together a gang who will tear apart their prey if they find him. In contrast to the undercar neon glow of the Fast & Furious franchise, the urban realism backdrop of the The Driver's LA metropolis contributes to a sincere feeling of artistic fidelity as well. To be sure, most of this is present in the truly-excellent Drive, where the chase scenes do really communicate a credible sense of stakes. But the substitution of The Driver's grit with Drive's soft neon tilts it slightly towards that common affliction of crime movies: style over substance. Nevertheless, I can highly recommend watching The Driver and Drive together, as it can tell you a lot about the disconnected socioeconomic practices of the 1980s compared to the 2010s. More than that, however, the pseudo-1980s synthwave soundtrack of Drive captures something crucial to analysing the world of today. In particular, these 'sounds from the past filtered through the present' bring to mind the increasing role of nostalgia for lost futures in the culture of today, where temporality and pop culture references are almost-exclusively citational and commemorational.

The Souvenir (2019) The ostensible outline of this quietly understated film follows a shy but ambitious film student who falls into an emotionally fraught relationship with a charismatic but untrustworthy older man. But that doesn't quite cover the plot at all, for not only is The Souvenir a film about a young artist who is inspired, derailed and ultimately strengthened by a toxic relationship, it is also partly a coming-of-age drama, a subtle portrait of class and, finally, a film about the making of a film. Still, one of the geniuses of this truly heartbreaking movie is that none of these many elements crowds out the other. It never, ever feels rushed. Indeed, there are many scenes where the camera simply 'sits there' and quietly observes what is going on. Other films might smother themselves through references to 18th-century oil paintings, but The Souvenir somehow evades this too. And there's a certain ring of credibility to the story as well, no doubt in part due to the fact it is based on director Joanna Hogg's own experiences at film school. A beautifully observed and multi-layered film; I'll be happy if the sequel is one-half as good.

The Wrestler (2008) Randy 'The Ram' Robinson is long past his prime, but he is still rarin' to go in the local pro-wrestling circuit. Yet after a brutal beating that seriously threatens his health, Randy hangs up his tights and pursues a serious relationship... and even tries to reconnect with his estranged daughter. But Randy can't resist the lure of the ring, and readies himself for a comeback. The stage is thus set for Darren Aronofsky's The Wrestler, which is essentially about what drives Randy back to the ring. To be sure, Randy derives much of his money from wrestling as well as his 'fitness', self-image, self-esteem and self-worth. Oh, it's no use insisting that wrestling is fake, for the sport is, needless to say, Randy's identity; it's not for nothing that this film is called The Wrestler. In a number of ways, The Sound of Metal (2019) is both a reaction to (and a quiet remake of) The Wrestler, if only because both movies utilise 'cool' professions to explore such questions of identity. But perhaps simply when The Wrestler was produced makes it the superior film. Indeed, the role of time feels very important for the Wrestler. In the first instance, time is clearly taking its toll on Randy's body, but I felt it more strongly in the sense this was very much a pre-2008 film, released on the cliff-edge of the global financial crisis, and the concomitant precarity of the 2010s. Indeed, it is curious to consider that you couldn't make The Wrestler today, although not because the relationship to work has changed in any fundamentalway. (Indeed, isn't it somewhat depressing the realise that, since the start of the pandemic and the 'work from home' trend to one side, we now require even more people to wreck their bodies and mental health to cover their bills?) No, what I mean to say here is that, post-2016, you cannot portray wrestling on-screen without, how can I put it, unwelcome connotations. All of which then reminds me of Minari's notorious red hat... But I digress. The Wrestler is a grittily stark darkly humorous look into the life of a desperate man and a sorrowful world, all through one tragic profession.

Thief (1981) Frank is an expert professional safecracker and specialises in high-profile diamond heists. He plans to use his ill-gotten gains to retire from crime and build a life for himself with a wife and kids, so he signs on with a top gangster for one last big score. This, of course, could be the plot to any number of heist movies, but Thief does something different. Similar to The Wrestler and The Driver (see above) and a number of other films that I watched this year, Thief seems to be saying about our relationship to work and family in modernity and postmodernity. Indeed, the 'heist film', we are told, is an understudied genre, but part of the pleasure of watching these films is said to arise from how they portray our desired relationship to work. In particular, Frank's desire to pull off that last big job feels less about the money it would bring him, but a displacement from (or proxy for) fulfilling some deep-down desire to have a family or indeed any relationship at all. Because in theory, of course, Frank could enter into a fulfilling long-term relationship right away, without stealing millions of dollars in diamonds... but that's kinda the entire point: Frank needing just one more theft is an excuse to not pursue a relationship and put it off indefinitely in favour of 'work'. (And being Federal crimes, it also means Frank cannot put down meaningful roots in a community.) All this is communicated extremely subtly in the justly-lauded lowkey diner scene, by far the best scene in the movie. The visual aesthetic of Thief is as if you set The Warriors (1979) in a similarly-filthy Chicago, with the Xenophon-inspired plot of The Warriors replaced with an almost deliberate lack of plot development... and the allure of The Warriors' fantastical criminal gangs (with their alluringly well-defined social identities) substituted by a bunch of amoral individuals with no solidarity beyond the immediate moment. A tale of our time, perhaps. I should warn you that the ending of Thief is famously weak, but this is a gritty, intelligent and strangely credible heist movie before you get there.

Uncut Gems (2019) The most exhausting film I've seen in years; the cinematic equivalent of four cups of double espresso, I didn't even bother even trying to sleep after downing Uncut Gems late one night. Directed by the two Safdie Brothers, it often felt like I was watching two films that had been made at the same time. (Or do I mean two films at 2X speed?) No, whatever clumsy metaphor you choose to adopt, the unavoidable effect of this film's finely-tuned chaos is an uncompromising and anxiety-inducing piece of cinema. The plot follows Howard as a man lost to his countless vices mostly gambling with a significant side hustle in adultery, but you get the distinct impression he would be happy with anything that will give him another high. A true junkie's junkie, you might say. You know right from the beginning it's going to end in some kind of disaster, the only question remaining is precisely how and what. Portrayed by an (almost unrecognisable) Adam Sandler, there's an uncanny sense of distance in the emotional chasm between 'Sandler-as-junkie' and 'Sandler-as-regular-star-of-goofy-comedies'. Yet instead of being distracting and reducing the film's affect, this possibly-deliberate intertextuality somehow adds to the masterfully-controlled mayhem. My heart races just at the memory. Oof.

Woman in the Dunes (1964) I ended up watching three films that feature sand this year: Denis Villeneuve's Dune (2021), Lawrence of Arabia (1962) and Woman in the Dunes. But it is this last 1964 film by Hiroshi Teshigahara that will stick in my mind in the years to come. Sure, there is none of the Medician intrigue of Dune or the Super Panavision-70 of Lawrence of Arabia (or its quasi-orientalist score, itself likely stolen from Anton Bruckner's 6th Symphony), but Woman in the Dunes doesn't have to assert its confidence so boldly, and it reveals the enormity of its plot slowly and deliberately instead. Woman in the Dunes never rushes to get to the film's central dilemma, and it uncovers its terror in little hints and insights, all whilst establishing the daily rhythm of life. Woman in the Dunes has something of the uncanny horror as Dogtooth (see above), as well as its broad range of potential interpretations. Both films permit a wide array of readings, without resorting to being deliberately obscurantist or being just plain random it is perhaps this reason why I enjoyed them so much. It is true that asking 'So what does the sand mean?' sounds tediously sophomoric shorn of any context, but it somehow applies to this thoughtfully self-contained piece of cinema.

A Quiet Place (2018) Although A Quiet Place was not actually one of the best films I saw this year, I'm including it here as it is certainly one of the better 'mainstream' Hollywood franchises I came across. Not only is the film very ably constructed and engages on a visceral level, I should point out that it is rare that I can empathise with the peril of conventional horror movies (and perhaps prefer to focus on its cultural and political aesthetics), but I did here. The conceit of this particular post-apocalyptic world is that a family is forced to live in almost complete silence while hiding from creatures that hunt by sound alone. Still, A Quiet Place engages on an intellectual level too, and this probably works in tandem with the pure 'horrorific' elements and make it stick into your mind. In particular, and to my mind at least, A Quiet Place a deeply American conservative film below the surface: it exalts the family structure and a certain kind of sacrifice for your family. (The music often had a passacaglia-like strain too, forming a tombeau for America.) Moreover, you survive in this dystopia by staying quiet that is to say, by staying stoic suggesting that in the wake of any conflict that might beset the world, the best thing to do is to keep quiet. Even communicating with your loved ones can be deadly to both of you, so not emote, acquiesce quietly to your fate, and don't, whatever you do, speak up. (Or join a union.) I could go on, but The Quiet Place is more than this. It's taut and brief, and despite cinema being an increasingly visual medium, it encourages its audience to develop a new relationship with sound.

11 January 2022

Ritesh Raj Sarraf: ThinkPad AMD Debian

After a hiatus of 6 years, it was nice to be back with the ThinkPad. This blog post briefly touches upon my impressions with the current generation ThinkPad T14 Gen2 AMD variant.
ThinkPad T14 Gen2 AMD
ThinkPad T14 Gen2 AMD

Lenovo It took 8 weeks to get my hands on the machine. Given the pandemic, restrictions and uncertainities, not sure if I should call it an ontime delivery. This was a CTO - Customise-to-order; so was nice to get rid of things I really didn t care/use much. On the other side, it also meant I could save on some power. It also came comparatively cheaper overall.
  • No fingerprint reader
  • No Touch screen
There s still parts where Lenovo could improve. Or less frustate a customer. I don t understand why a company would provide a full customization option on their portal, while at the same time, not provide an explicit option to choose the make/model of the hardware one wants. Lenovo deliberately chooses to not show/specify which WiFi adapter one could choose. So, as I suspected, I ended up with a MEDIATEK Corp. Device 7961 wifi adapter.

AMD For the first time in my computing life, I m now using AMD at the core. I was pretty frustrated with annoying Intel Graphics bugs, so decided to take the plunge and give AMD/ATI a shot, knowing that the radeon driver does have decent support. So far, on the graphics side of things, I m glad that things look bright. The stock in-kernel radeon driver has been working perfect for my needs and I haven t had to tinker even once so far, in my 30 days of use. On the overall system performance, I have not done any benchmarks nor do I want to do. But wholly, the system performance is smooth.

Power/Thermal This is where things need more improvement on the AMD side. This AMD laptop terribly draws a lot of power in suspend mode. And it isn t just this machine, but also the previous T14 Gen1 which has similar problems. I m not sure if this is a generic ThinkPad problem, or an AMD specific problem. But coming from the Dell XPS 13 9370 Intel, this does draw a lot lot more power. So much, that I chose to use hibernation instead. Similarly, on the thermal side, this machine doesn t cool down well as compared the the Dell XPS Intel one. On an idle machine, its temperature are comparatively higher. Looking at powertop reports, it does show to consume an average of 10 watts power even while idle. I m hoping these are Linux ingeration issues and that Lenovo/AMD will improve things in the coming months. But given the user feedback on the ThinkPad T14 Gen1 thread, it may just be wishful thinking.

Linux The overall hardware support has been surprisingly decent. The MediaTek WiFi driver had some glitches but with Linux 5.15+, things have considerably improved. And I hope the trend will continue with forthcoming Linux releases. My previous device driver experience with MediaTek wasn t good but I took the plunge, considering that in the worst scenario I d have the option to swap the card. There s a lot of marketing about Linux + Intel. But I took a jibe with Linux + AMD. There are glitches but nothing so far that has been a dealbreaker. If anything, I wish Lenovo/AMD would seriously work on the power/thermal issues.

Migration Other than what s mentioned above, I haven t had any serious issues. I may have had some rare occassional hangs but they ve been so infrequent that I haven t spent time to investigate those. Upon receiving the machine, my biggest requirement was how to switch my current workstation from Dell XPS to Lenovo ThinkPad. I ve been using btrfs for some time now. And over the years, built my own practise on how to structure it. Things like, provisioning [sub]volumes, based on use cases is one thing I see. Like keeping separate subvols for: cache/temporary data, copy-on-write data , swap etc. I wish these things could be simplified; either on the btrfs tooling side or some different tool on top of it. Below is filtered list of subvols created over years, that were worthy of moving to the new machine.
rrs@priyasi:~$ cat btrfs-volume-layout 
ID 550 gen 19166 top level 5 path home/foo/.cache
ID 552 gen 1522688 top level 5 path home/rrs
ID 553 gen 1522688 top level 552 path home/rrs/.cache
ID 555 gen 1426323 top level 552 path home/rrs/rrs-home/Libvirt-Images
ID 618 gen 1522672 top level 5 path var/spool/news
ID 634 gen 1522670 top level 5 path var/tmp
ID 635 gen 1522688 top level 5 path var/log
ID 639 gen 1522226 top level 5 path var/cache
ID 992 gen 1522670 top level 5 path disk-tmp
ID 1018 gen 1522688 top level 552 path home/rrs/NoBackup
ID 1196 gen 1522671 top level 5 path etc
ID 23721 gen 775692 top level 5 path swap
18:54                      

btrfs send/receive This did come in handy but I sorely missed some feature. Maybe they aren t there, or are there and I didn t look close enough. Over the years, different attributes were set to different subvols. Over time I forget what feature was added where. But from a migration point of view, it d be nice to say, Take this volume and take it with all its attributes . I didn t find that functionality in send/receive. There s get/set-property which I noticed later but by then it was late. So some sort of tooling, ideally something like btrfs migrate or somesuch would be nicer. In the file system world, we already have nice tools to take care of similar scenarios. Like with rsync, I can request it to carry all file attributes. Also, iirc, send/receive works only on ro volumes. So there s more work one needs to do in:
  1. create ro vol
  2. send
  3. receive
  4. don t forget to set rw property
  5. And then somehow find out other properties set on each individual subvols and [re]apply the same on the destination
I wish this all be condensed into a sub-command. For my own sake, for this migration, the steps used were:
user@debian:~$ for volume in  sudo btrfs sub list /media/user/TOSHIBA/Migrate/   cut -d ' ' -f9   grep -v ROOTVOL   grep -v etc   grep -v btrbk ; do echo $volume; sud
o btrfs send /media/user/TOSHIBA/$volume   sudo btrfs receive /media/user/BTRFSROOT/ ; done            
Migrate/snapshot_disk-tmp
At subvol /media/user/TOSHIBA/Migrate/snapshot_disk-tmp
At subvol snapshot_disk-tmp
Migrate/snapshot-home_foo_.cache
At subvol /media/user/TOSHIBA/Migrate/snapshot-home_foo_.cache
At subvol snapshot-home_foo_.cache
Migrate/snapshot-home_rrs
At subvol /media/user/TOSHIBA/Migrate/snapshot-home_rrs
At subvol snapshot-home_rrs
Migrate/snapshot-home_rrs_.cache
At subvol /media/user/TOSHIBA/Migrate/snapshot-home_rrs_.cache
At subvol snapshot-home_rrs_.cache
ERROR: crc32 mismatch in command
Migrate/snapshot-home_rrs_rrs-home_Libvirt-Images
At subvol /media/user/TOSHIBA/Migrate/snapshot-home_rrs_rrs-home_Libvirt-Images
At subvol snapshot-home_rrs_rrs-home_Libvirt-Images
ERROR: crc32 mismatch in command
Migrate/snapshot-var_spool_news
At subvol /media/user/TOSHIBA/Migrate/snapshot-var_spool_news
At subvol snapshot-var_spool_news
Migrate/snapshot-var_lib_machines
At subvol /media/user/TOSHIBA/Migrate/snapshot-var_lib_machines
At subvol snapshot-var_lib_machines
Migrate/snapshot-var_lib_machines_DebianSidTemplate
..... snipped .....
And then, follow-up with:
user@debian:~$ for volume in  sudo btrfs sub list /media/user/BTRFSROOT/   cut -d ' ' -f9 ; do echo $volume; sudo btrfs property set -ts /media/user/BTRFSROOT/$volume ro false; done
ROOTVOL
ERROR: Could not open: No such file or directory
etc
snapshot_disk-tmp
snapshot-home_foo_.cache
snapshot-home_rrs
snapshot-var_spool_news
snapshot-var_lib_machines
snapshot-var_lib_machines_DebianSidTemplate
snapshot-var_lib_machines_DebSidArmhf
snapshot-var_lib_machines_DebianJessieTemplate
snapshot-var_tmp
snapshot-var_log
snapshot-var_cache
snapshot-disk-tmp
And then finally, renaming everything to match proper:
user@debian:/media/user/BTRFSROOT$ for x in snapshot*; do vol=$(echo $x   cut -d '-' -f2   sed -e "s _ / g"); echo $x $vol; sudo mv $x $vol; done
snapshot-var_lib_machines var/lib/machines
snapshot-var_lib_machines_Apertisv2020ospackTargetARMHF var/lib/machines/Apertisv2020ospackTargetARMHF
snapshot-var_lib_machines_Apertisv2021ospackTargetARM64 var/lib/machines/Apertisv2021ospackTargetARM64
snapshot-var_lib_machines_Apertisv2022dev3ospackTargetARMHF var/lib/machines/Apertisv2022dev3ospackTargetARMHF
snapshot-var_lib_machines_BusterArm64 var/lib/machines/BusterArm64
snapshot-var_lib_machines_DebianBusterTemplate var/lib/machines/DebianBusterTemplate
snapshot-var_lib_machines_DebianJessieTemplate var/lib/machines/DebianJessieTemplate
snapshot-var_lib_machines_DebianSidTemplate var/lib/machines/DebianSidTemplate
snapshot-var_lib_machines_DebianSidTemplate_var_lib_portables var/lib/machines/DebianSidTemplate/var/lib/portables
snapshot-var_lib_machines_DebSidArm64 var/lib/machines/DebSidArm64
snapshot-var_lib_machines_DebSidArmhf var/lib/machines/DebSidArmhf
snapshot-var_lib_machines_DebSidMips var/lib/machines/DebSidMips
snapshot-var_lib_machines_JenkinsApertis var/lib/machines/JenkinsApertis
snapshot-var_lib_machines_v2019 var/lib/machines/v2019
snapshot-var_lib_machines_v2019LinuxSupport var/lib/machines/v2019LinuxSupport
snapshot-var_lib_machines_v2020 var/lib/machines/v2020
snapshot-var_lib_machines_v2021dev3Slim var/lib/machines/v2021dev3Slim
snapshot-var_lib_machines_v2021dev3SlimTarget var/lib/machines/v2021dev3SlimTarget
snapshot-var_lib_machines_v2022dev2OspackMinimal var/lib/machines/v2022dev2OspackMinimal
snapshot-var_lib_portables var/lib/portables
snapshot-var_log var/log
snapshot-var_spool_news var/spool/news
snapshot-var_tmp var/tmp

snapper Entirely independent of this, but indirectly related. I use snapper as my snapshotting tool. It worked perfect on my previous machine. While everything got migrated, the only thing that fell apart was snapper. It just wouldn t start/run proper. Funny thing is that I just removed the snapper configs and reinitialized with the exact same config again, and voila snapper was happy.

Conclusion That was pretty much it. With the above and then also migrating /boot and then just chroot to install the boot loader. At some time, I d like to explore other boot options but given that that is such a non-essential task, it is low on the list. The good part was that I booted into my new machine with my exact workstation setup as it was. All the way to the user cache and the desktop session. So it was nice on that part. But I surely think there s room for a better migration experience here. If not directly as btrfs migrate, then maybe as an independent tool. The problem is that such a tool is going to be used once in years, so I didn t find the motivation to write one. But this surely would be a good use case for the distribution vendors.

9 January 2022

Dirk Eddelbuettel: Rblpapi 0.3.13: Some Fixes and Documentation

A new version, now at 0.3.13, of the Rblpapi package just arrived at CRAN. Rblpapi provides a direct interface between R and the Bloomberg Terminal via the C++ API provided by Bloomberg (but note that a valid Bloomberg license and installation is required). This is the thirteenth release since the package first appeared on CRAN in 2016. It comprises the PRs from three different contributors (with special thanks once again to Michael Kerber), and extends test and documentation, and extends two function interfaces to control explicitly whether returned lists of length one should be simplified. The list of changes follow below.

Changes in Rblpapi version 0.3.13 (2022-01-09)
  • Add a test for bds (Michael Kerber in #352)
  • Add simplify argument (and option) to bdh and bds (Dirk in #354 closing #353, #351)
  • Improve documentation for bsearch (John in #357 closing #356)

Courtesy of my CRANberries, there is also a diffstat report for the this release. As always, more detailed information is on the Rblpapi page. Questions, comments etc should go to the issue tickets system at the GitHub repo. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

8 January 2022

Jonathan Dowland: 2021 in Fiction

Cover for *This is How You Lose the Time War*
Cover for *Robot*
Cover for *The Glass Hotel*
Following on from last year's round-up of my reading, here's a look at the fiction I enjoyed in 2021. I managed to read 42 books in 2021, up from 31 last year. That's partly to do with buying an ereader: 33/36% of my reading (by pages/by books) was ebooks. I think this demonstrates that ebooks have mostly complemented paper books for me, rather than replacing them. My book of the year (although it was published in 2019) was This is How You Lose the Time War by Amal El-Mohtar and Max Gladstone: A short epistolary love story between warring time travellers and quite unlike anything else I've read for a long time. Other notables were The Glass Hotel by Emily St John Mandel and Robot by Adam Wi niewski-Snerg. The biggest disappointment for me was The Ministry for the Future by Kim Stanley Robinson (KSR), which I haven't even finished. I love KSRs writing: I've written about him many times on this blog, at least in 2002, 2006 and 2009, I think I've read every other novel he's published and most of his short stories. But this one was too much of something for me. He's described this novel a the end-point of a particular journey and approach to writing he's taken, which I felt relieved to learn, assuming he writes any more novels (and I really hope that he does) they will likely be in a different "mode". My "new author discovery" for 2021 was Chris Beckett: I tore through Two Tribes and America City before promptly buying all his other work. He fits roughly into the same bracket as Adam Roberts and Christopher Priest, two of my other favourite authors. 5 of the books I read (12%) were from my "backlog" of already-purchased physical books. I'd like to try and reduce my Backlog further so I hope to push this figure up next year. I made a small effort to read more diverse authors this year. 24% of the books I read (by book count and page count) were by women. 15% by page count were (loosely) BAME (19% by book count). Again I'd like to increase these numbers modestly in 2022. Unlike 2020, I didn't complete any short story collections in 2021! This is partly because there was only one issue of Interzone published in all of 2021, a double-issue which I haven't yet finished. This is probably a sad date point in terms of Interzone's continued existence, but it's not dead yet.

3 January 2022

Paul Wise: FLOSS Activities December 2021

Focus This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review
  • Spam: reported 166 Debian mailing list posts
  • Patches: reviewed libpst upstream patches
  • Debian packages: sponsored nsis, memtest86+
  • Debian wiki: RecentChanges for the month
  • Debian BTS usertags: changes for the month
  • Debian screenshots:

Administration
  • libpst: setup GitHub presence, migrate from hg to git, requested details from bug reporters
  • plac: cleaned up git repo anomalies
  • Debian BTS: unarchive/reopen/triage bugs for reintroduced packages: stardict, node-carto
  • Debian wiki: unblock IP addresses, approve accounts

Communication
  • Respond to queries from Debian users and contributors on the mailing lists and IRC

Sponsors The purple-discord, python-plac, sptag, smart-open, libpst, memtest86+, oci-python-sdk work was sponsored. All other work was done on a volunteer basis.

21 December 2021

Jonathan Dowland: Vim plugins by Tim Pope

I've been using Vim as my main text editor for 18 years, but for most of that time I've been using something very close to the default configuration: my vimrc contained not much more than preferences for indentation and how to visually indicate white space characters like tabs. Last but not least, I've used a single colour scheme for most of that time: Zenburn. In 20151 I started exploring a few Vim plugins2. To manage them, I started by choosing a plugin manager, Pathogen3. Recently I noticed that the plugin's author, Tim Pope, now recommends new users just use Vim's built in package management instead. I got curious about Tim Pope's other plugins: He has written a great deal of them. Given I've spent most of two decades with a barely-configured Vim, you can imagine that I don't want to radically alter the way it works, and so I did not expect to want to use a lot of plugins. The utility of a plugin would have to outweight the disadvantages of coming to rely on one. But when browsing Tim's plugins, time and time again, I found myself reacting to description with How did I manage without this?. And so, I've ended up installing all of the following, all by Tim Pope:

  1. I had to look this up, probably because I started at redhat
  2. I have opinions about Plugins, what supporting them means architecturally for your application, the interaction with Open Source, and stuff like that, perhaps for another post.
  3. To try out Vim plugins the first thing you need to figure out is how to manage them (install, activate, configure). Vim grew a Plugin manager in version 8 (2016). Prior to that, people wrote third-party ones. For this reason there is a frankly absurd number of Plugin managers, including: Vim's own; Vim Plug; Vundle; Dein; Volt and Pathogen.

17 December 2021

Jonathan Dowland: ereader

Kobo Libra H2O e-reader Kobo Libra H2O e-reader
This year I finally bought an e-reader: a "Kobo Libra H2O": a 7" 300ppi screen with an adjustable colour temperature front light, waterproof, physical buttons on a "margin"/spine which pushes the case dimensions up to 8", which is sadly a little too big for most jeans pockets. In fact this is the second time I've bought one. Sarah and I bought a pair of Amazon Kindle Paperwhite (2nd generation) readers as Honeymoon gifts for each other in 2013. Sarah took to hers, but after giving mine a try and reading a couple of books on it I ended up giving it to my Father-in-law. The main problem I had with it was the colour temperature of the backlight: it was too blue and "clinical". It's interesting to me to realise how the environment that I read in can become indelibly linked to the memory of that book. One book in particular that I read on the Kindle was James Smythe's "The Machine", a modern twist on a classic horror story. This was perfectly complemented by the somewhat pallid, sick colour of the backlight, especially on its lowest brightness settings. It didn't go so well with other stories.
Broken Libra H2O e-reader Broken Libra H2O e-reader
I got on much better with the Libra H2O. For me, e-books and e-reading has not replaced traditional paper books, but complemented them, giving me an opportunity to read in contexts and situations where I couldn't manage with a real book (such as in the pitch black, lying on my back with a toddler on my chest). I feel that being able to read more often thanks to the e-reader has spurred me on to read even paper books more frequently as well. Unfortunately the screen broke, about 10 months after I bought it! After a bit of soul-searching on what to do about replacing it, and briefly considering the latest Amazon Paperwhite (which now has a 300ppi display and adjustable-colour temperature front-light). This looks like a very nice e-reader: no physical buttons which is a bit of a shame but narrower without the margin/spine of the Kobo, so possibly more pocket-friendly. On the other hand, I really felt that I wanted to avoid giving Amazon the money, for a whole variety of reasons.
Libra 2 e-reader. Spot the difference? Libra 2 e-reader. Spot the difference?
So I instead settled on the "Kobo Libra 2", which is a slight refresh of the Libra H2O, but almost identical. I'm going to pair it with a nice cover for when its not in use and hopefully it will last longer than its predecessor. My current favourite feature on the Libra 2, that I don't remember on the H2O and I'm fairly sure doesn't exist on Kindles is it supports a "night mode", which is effectively a reverse video, white-on-black mode. This is fantastic for keeping the light levels down at night. I was already using the backlight at about 2% brightness and occasionally found even that too bright.

14 December 2021

Timo Jyrinki: Working and warming up cats

How to disable internal keyboard/touchpad when a cat arrives
I m using an external keyboard (1) and mouse (2), but the laptop lid is usually still open for better cooling. That means the internal keyboard (3) and touchpad (4) made of comfortable materials are open to be used by a cat searching for warmth (7), in the obvious every time case that a normal non-heated nest (6) is not enough.
The problem is, everything goes chaotic at that point in the default configuration. The solution is to have quick shortcuts in my Dash to Dock (8) to both disable (10) and enable (9) keyboard and touchpad at a very rapid pace.It is to be noted that I m not disabling the touch screen (5) by default, because most of the time the cat is not leaning on it there is also the added benefit that if one forgets about the internal keyboard and touchpad disabling and detaches the laptop from the USB-C monitor (11), there s the possibility of using the touch screen and on-screen keyboard to type in the password and tap on the keyboard/touchpad enabling shortcut button again. If also touch screen was disabled, the only way would be to go back to an external keyboard or reboot.So here are the scripts. First, the disabling script (pardon my copy-paste use of certain string manipulation tools):
dconf write /org/gnome/desktop/peripherals/touchpad/send-events "'disabled'"
sudo killall evtest
sudo evtest --grab $(sudo libinput list-devices grep -A 1 "AT Translated Set 2 keyboard" tail -n 1 sed 's/.*\/dev/\/dev/') &
sudo evtest --grab $(sudo libinput list-devices grep -A 1 "Dell WMI" tail -n 1 sed 's/.*\/dev/\/dev/') &
sudo evtest --grab $(sudo libinput list-devices grep -A 1 "Power" grep Kernel tail -n 1 sed 's/.*\/dev/\/dev/') &
sudo evtest --grab $(sudo libinput list-devices grep -A 1 "Power" grep Kernel head -n 1 sed 's/.*\/dev/\/dev/') &
sudo evtest --grab $(sudo libinput list-devices grep -A 1 "Sleep" grep Kernel tail -n 1 sed 's/.*\/dev/\/dev/') &
sudo evtest --grab $(sudo libinput list-devices grep -A 1 "HID" grep Kernel head -n 1 sed 's/.*\/dev/\/dev/') &
sudo evtest --grab $(sudo libinput list-devices grep -A 1 "HID" tail -n 1 sed 's/.*\/dev/\/dev/') &
#sudo evtest --grab $(sudo libinput list-devices grep -A 1 "ELAN" tail -n 1 sed 's/.*\/dev/\/dev/') # Touch screen
And the associated ~/.local/share/applications/disable-internal-input.desktop:
[Desktop Entry]
Version=1.0
Name=Disable internal input
GenericName=Disable internal input
Exec=/bin/bash -c /home/timo/Asiakirjat/helpers/disable-internal-input.sh
Icon=yast-keyboard
Type=Application
Terminal=false
Categories=Utility;Development;
Here s the enabling script:
dconf write /org/gnome/desktop/peripherals/touchpad/send-events "'enabled'"
sudo killall evtest
and the desktop file:
[Desktop Entry]
Version=1.0
Name=Enable internal input
GenericName=Enable internal input
Exec=/bin/bash -c /home/timo/Asiakirjat/helpers/enable-internal-input.sh
Icon=/home/timo/.local/share/icons/hicolor/scalable/apps/yast-keyboard-enable.png
Type=Application
Terminal=false
Categories=Utility;Development;
With these, if I sense a cat or am just proactive enough, I press Super+9. If I m about to detach my laptop from the monitor, I press Super+8. If I forget the latter (usually this is the case) and haven t yet locked the screen, I just tap the enabling icon on the touch screen.

9 December 2021

David Kalnischkies: APT for Advent of Code

Screenshot of my Advent of Code 2021 status page as of today Advent of Code 2021
Advent of Code, for those not in the know, is a yearly Advent calendar (since 2015) of coding puzzles many people participate in for a plenary of reasons ranging from speed coding to code golf with stops at learning a new language or practicing already known ones. I usually write boring C++, but any language and then some can be used. There are reports of people implementing it in hardware, solving them by hand on paper or using Microsoft Excel so, after solving a puzzle the easy way yesterday, this time I thought: CHALLENGE ACCEPTED! as I somehow remembered an old 2008 article about solving Sudoku with aptitude (Daniel Burrows via archive.org as the blog is long gone) and the good same old a package management system that can solve [puzzles] based on package dependency rules is not something that I think would be useful or worth having (Russell Coker). Day 8 has a rather lengthy problem description and can reasonably be approached in a bunch of different way. One unreasonable approach might be to massage the problem description into Debian packages and let apt help me solve the problem (specifically Part 2, which you unlock by solving Part 1. You can do that now, I will wait here.) Be warned: I am spoiling Part 2 in the following, so solve it yourself first if you are interested. I will try to be reasonable consistent in naming things in the following and so have chosen: The input we get are lines like acedgfb cdfbe gcdfa fbcad dab cefabd cdfgeb eafb cagedb ab cdfeb fcadb cdfeb cdbaf. The letters are wires mixed up and connected to the segments of the displays: A group of these letters is hence a digit (the first 10) which represent one of the digits 0 to 9 and (after the pipe) the four displays which match (after sorting) one of the digits which means this display shows this digit. We are interested in which digits are displayed to solve the puzzle. To help us we also know which segments form which digit, we just don't know the wiring in the back. So we should identify which wire maps to which segment! We are introducing the packages wire-X-connects-to-Y for this which each provide & conflict1 with the virtual packages segment-Y and wire-X-connects. The later ensures that for a given wire we can only pick one segment and the former ensures that not multiple wires map onto the same segment. As an example: wire a's possible association with segment b is described as:
Package: wire-a-connects-to-b
Provides: segment-b, wire-a-connects
Conflicts: segment-b, wire-a-connects
Note that we do not know if this is true! We generate packages for all possible (and then some) combinations and hope dependency resolution will solve the problem for us. So don't worry, the hard part will be done by apt, we just have to provide all (im)possibilities! What we need now is to translate the 10 digits (and 4 outputs) from something like acedgfb into digit-0-is-eight and not, say digit-0-is-one. A clever solution might realize that a one consists only of two segments so a digit wiring up seven segments can not be a 1 (and must be 8 instead), but again we aren't here to be clever: We want apt to figure that out for us! So what we do is simply making every digit-0-is-N (im)possible choice available as a package and apply constraints: A given digit-N can only display one number and each N is unique as digit so for both we deploy Provides & Conflicts again. We also need to reason about the segments in the digits: Each of the digit packages gets Depends on wire-X-connects-to-Y where X is each possible wire (e.g. acedgfb) and Y each segment forming the digit (e.g. cf for one). The different choices for X are or'ed together, so that either of them satisfies the Y. We know something else too through: The segments which are not used by the digit can not be wired to any of the Xs. We model this with Conflicts on wire-X-connects-to-Y. As an example: If digit-0s acedgfb would be displaying a one (remember, it can't) the following package would be installable:
Package: digit-0-is-one
Version: 1
Depends: wire-a-connects-to-c   wire-c-connects-to-c   wire-e-connects-to-c   wire-d-connects-to-c   wire-g-connects-to-c   wire-f-connects-to-c   wire-b-connects-to-c,
         wire-a-connects-to-f   wire-c-connects-to-f   wire-e-connects-to-f   wire-d-connects-to-f   wire-g-connects-to-f   wire-f-connects-to-f   wire-b-connects-to-f
Provides: digit-0, digit-is-one
Conflicts: digit-0, digit-is-one,
  wire-a-connects-to-a, wire-c-connects-to-a, wire-e-connects-to-a, wire-d-connects-to-a, wire-g-connects-to-a, wire-f-connects-to-a, wire-b-connects-to-a,
  wire-a-connects-to-b, wire-c-connects-to-b, wire-e-connects-to-b, wire-d-connects-to-b, wire-g-connects-to-b, wire-f-connects-to-b, wire-b-connects-to-b,
  wire-a-connects-to-d, wire-c-connects-to-d, wire-e-connects-to-d, wire-d-connects-to-d, wire-g-connects-to-d, wire-f-connects-to-d, wire-b-connects-to-d,
  wire-a-connects-to-e, wire-c-connects-to-e, wire-e-connects-to-e, wire-d-connects-to-e, wire-g-connects-to-e, wire-f-connects-to-e, wire-b-connects-to-e,
  wire-a-connects-to-g, wire-c-connects-to-g, wire-e-connects-to-g, wire-d-connects-to-g, wire-g-connects-to-g, wire-f-connects-to-g, wire-b-connects-to-g
Repeat such stanzas for all 10 possible digits for digit-0 and then repeat this for all the other nine digit-N. We produce pretty much the same stanzas for display-0(-is-one), just that we omit the second Provides & Conflicts from above (digit-is-one) as in the display digits can be repeated. The rest is the same (modulo using display instead of digit as name of course). Lastly we create a package dubbed solution which depends on all 10 digit-N and 4 display-N all of them virtual packages apt will have to choose an installable provider from and we are nearly done! The resulting Packages file2 we can give to apt while requesting to install the package solution and it will spit out not only the display values we are interested in but also which number each digit represents and which wire is connected to which segment. Nifty!
$ ./skip-aoc 'acedgfb cdfbe gcdfa fbcad dab cefabd cdfgeb eafb cagedb ab   cdfeb fcadb cdfeb cdbaf'
[ ]
The following additional packages will be installed:
  digit-0-is-eight digit-1-is-five digit-2-is-two digit-3-is-three
  digit-4-is-seven digit-5-is-nine digit-6-is-six digit-7-is-four
  digit-8-is-zero digit-9-is-one display-1-is-five display-2-is-three
  display-3-is-five display-4-is-three wire-a-connects-to-c
  wire-b-connects-to-f wire-c-connects-to-g wire-d-connects-to-a
  wire-e-connects-to-b wire-f-connects-to-d wire-g-connects-to-e
[ ]
0 upgraded, 22 newly installed, 0 to remove and 0 not upgraded.
We are only interested in the numbers on the display through, so grepping the apt output (-V is our friend here) a bit should let us end up with what we need as calculating3 is (unsurprisingly) not a strong suit of our package relationship language so we need a few shell commands to help us with the rest.
$ ./skip-aoc 'acedgfb cdfbe gcdfa fbcad dab cefabd cdfgeb eafb cagedb ab   cdfeb fcadb cdfeb cdbaf' -qq
5353
I have written the skip-aoc script as a testcase for apt, so to run it you need to place it in /path/to/source/of/apt/test/integration and built apt first, but that is only due to my laziness. We could write a standalone script interfacing with the system installed apt directly and in any apt version since ~2011. To hand in the solution for the puzzle we just need to run this on each line of the input (~200 lines) and add all numbers together. In other words: Behold this beautiful shell one-liner: parallel -I ' ' ./skip-aoc ' ' -qq < input.txt paste -s -d'+' - bc (You may want to run parallel with -P to properly grill your CPU as that process can take a while otherwise and it still does anyhow as I haven't optimized it at all the testing framework does a lot of pointless things wasting time here, but we aren't aiming for the leaderboard so ) That might or even likely will fail through as I have so far omitted a not unimportant detail: The default APT resolver is not able to solve this puzzle with the given problem description we need another solver! Thankfully that is as easy as installing apt-cudf (and with it aspcud) which the script is using via --solver aspcud to make apt hand over the puzzle to a "proper" solver (or better: A solver who is supposed to be good at "answering set" questions). The buildds are using this for experimental and/or backports builds and also for installability checks via dose3 btw, so you might have encountered it before. Be careful however: Just because aspcud can solve this puzzle doesn't mean it is a good default resolver for your day to day apt. One of the reasons the default resolver has such a hard time solving this here is that or-groups have usually an order in which the first is preferred over every later option and so fort. This is of no concern here as all these alternatives will collapse to a single solution anyhow, but if there are multiple viable solutions (which is often the case) picking the "wrong" alternative can have bad consequences. A classic example would be exim4 postfix nullmailer. They are all MTAs but behave very different. The non-default solvers also tend to lack certain features like keeping track of auto-installed packages or installing Recommends/Suggests. That said, Julian is working on another solver as I write this which might deal with more of these issues. And lastly: I am also relatively sure that with a bit of massaging the default resolver could be made to understand the problem, but I can't play all day with this maybe some other day. Disclaimer: Originally posted in the daily megathread on reddit, the version here is just slightly better understandable as I have hopefully renamed all the packages to have more conventional names and tried to explain what I am actually doing. No cows were harmed in this improved version, either.

  1. If you would upload those packages somewhere, it would be good style to add Replaces as well, but it is of minor concern for apt so I am leaving them out here for readability.
  2. We have generated 49 wires, 100 digits, 40 display and 1 solution package for a grant total of 190 packages. We are also making use of a few purely virtual ones, but that doesn't add up to many packages in total. So few packages are practically childs play for apt given it usually deals with thousand times more. The instability for those packages tends to be a lot better through as only 22 of 190 packages we generated can (and will) be installed. Britney will hate you if your uploads to Debian unstable are even remotely as bad as this.
  3. What we could do is introduce 10.000 packages which denote every possible display value from 0000 to 9999. We would then need to duplicate our 10.190 packages for each line (namespace them) and then add a bit more than a million packages with the correct dependencies for summing up the individual packages for apt to be able to display the final result all by itself. That would take a while through as at that point we are looking at working with ~22 million packages with a gazillion amount of dependencies probably overworking every solver we would throw at it a bit of shell glue seems the better option for now.
This article was written by David Kalnischkies on apt-get a life and republished here by pulling it from a syndication feed. You should check there for updates and more articles about apt and EDSP.

8 December 2021

Jonathan Dowland: Java in a Container World

me, delivering the talk me, delivering the talk
The redhat talk I gave at UK Systems '21 was entitled "Java in a Container World: What we've done and where we're going". The slides (with notes) for it are available:

7 December 2021

Jonathan Dowland: Cost models for evaluating stream-processing programs

title slide
As I wrote, last week I attended the UK Systems Research 2021 and gave two (or 2 , or 3) talks. my PhD talk is entitled "Picking a winner: cost models for evaluating stream-processing programs". The slides (with notes) are here:

Dirk Eddelbuettel: Rblpapi 0.3.12: Fixes and Updates

The Rblp team is happy to announce a new version 0.3.12 of Rblpapi which just arrived at CRAN. Rblpapi provides a direct interface between R and the Bloomberg Terminal via the C++ API provided by Bloomberg (but note that a valid Bloomberg license and installation is required). This is the twelveth release since the package first appeared on CRAN in 2016. Changes are detailed below and include both extensions to functionality, actual bug fixes and changes to the package setup. Special thanks goes to Michael Kerber, Yihui Xie and Kai Lin for contributing pull requests!

Changes in Rblpapi version 0.3.12 (2021-12-07)
  • bdh() supports new option returnAs (Michael Kerber and Dirk in #335 fixing #206)
  • Remove extra backtick in vignette (Yihui Xie in #343)
  • Fix a segfault from bulk access with bds (Kai Lin in #347 fixing #253)
  • Support REQUEST_STATUS in bdh (Kai Lin and John in #349 fixing #348)
  • Vignette now uses simplermarkdown (Dirk in #350)

Courtesy of my CRANberries, there is also a diffstat report for the this release. As always, more detailed information is on the Rblpapi page. Questions, comments etc should go to the issue tickets system at the GitHub repo. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

6 December 2021

Jonathan Dowland: Sixth Annual UK System Research Challenges Workshop lightning talk

me looking awkward, thanks [Mark Little](https://twitter.com/nmcl/status/1466148768043126791/photo/1) me looking awkward, thanks Mark Little
Last week I attended the UK Systems Research 2021 conference in County Durham, my first conference in nearly two years (since FOSDEM 2020, right on the cusp of the Pandemic). The Systems conference community is very pleasant and welcoming and so when I heard it was going to take place "physically" again this year I was so keen to attend I decided to hedge my bets and submit two talk proposals. I wasn't expecting them both to be accepted As well as the regular talks (more on those in another post) there is a tradition for people to give short, impromptu lightning talks after dinner on the second night. I've given two of these before, and I'd been considering whether to offer to one this time or not, but with two talks to deliver (and finish writing) I wasn't sure. Usually people talk about something interesting that they have been doing besides their research or day-jobs, but the last two years have been somewhat difficult and I didn't really think I had a topic to talk about. Then I wondered if that was a topic in itself During the first day of the conference (and especially one I'd got past one of my talks) I started to outline a lightning talk idea and it seemed to come out well enough that I thought I'd give it a go. Unusually I therefore had something written down and I was surprised how well it was received, so I thought I'd share it. Here it is:
I was anticipating the lightning talks and being cajoled into talking about something. I've done it twice before. So I've been racking my brains to figure out if I've done anything interesting enough to talk about. in 2018 I talked about some hack I'd made to the classic computer game Doom from 1993. I've done several hacks to Doom that I could probably talk about except I've become a bit uncomfortable about increasingly being thought of as "that doom guy". I'd been reflecting on why it was that I continued to mess about with that game in the first place and I realised it was a form of expression: I was treating Doom like a canvas. I've spent most of my career thinking about what I do in the frame of either science or engineering. I suffer from the creative urge and I've often expressed (and sated) that through my work. And that's possible because there's a craft in what we do. In 2019 I talked about a project I'd embarked on to resurrect my childhood computer, a Commodore Amiga 500, in order to rescue my childhood drawings and digital paintings. (There's the artistic thing again). I'd achieved that and I have ambitions to do some more Amiga stuff but again that's a work in progress and there's nothing much to talk about. In recent years I've been thinking more and more about art and became interested in the works and writings of people like Grayson Perry, Laurie Anderson and Brian Eno. I first learned about Eno through his music but he's also a visual artist. and a music producer. As a producer in the 70s he co-invented a system to try and break out of writer's block called "oblique strategies": A deck of cards with oblique suggestions written on them. When you're stuck, you pull a card and it might help you to reframe what you are working on and think about it in a completely different way. I love this idea and I think we should use more things like that in software engineering at least. So back to casting about for something to talk about. What have I been doing in the last couple of years? Frankly, surviving - I've just about managed to keep doing my day job, and keep working on the PhD, at home with two young kids and home schooling and the rest of it. Which is an achievement but makes for a boring lightning talk. But I'd like to say that for anyone here who might have been worrying similarly: I think surviving is more than enough. I'll close on the subject of thinking like an artist and not an engineer. I brought some of the Oblique Strategies deck with me and I thought I'd draw a card to perhaps help you out of a creative dilemma if you're in one. And I kid you not, the first card I drew was this one:
Card reading 'You are an Engineer'

Paul Tagliamonte: Proxying Ethernet Frames to PACKRAT (Part 5/5)

This post is part of a series called "PACKRAT". If this is the first post you've found, it'd be worth reading the intro post first and then looking over all posts in the series.
In the last post, we left off at being able to send and recieve PACKRAT frames to and from devices. Since we can transport IPv4 packets over the network, let s go ahead and see if we can read/write Ethernet frames from a Linux network interface, and on the backend, read and write PACKRAT frames over the air. This has the benifit of continuing to allow Linux userspace tools to work (like cURL, as we ll try!), which means we don t have to do a lot of work to implement higher level protocols or tactics to get a connection established over the link. Given that this post is less RF and more Linuxy, I m going to include more code snippits than in prior posts, and those snippits are closer to runable Go, but still not complete examples. There s also a lot of different ways to do this, I ve just picked the easiest one for me to implement and debug given my existing tooling for you, you may find another approach easier to implement! Again, deviation here is very welcome, and since this segment is the least RF centric post in the series, the pace and tone is going to feel different. If you feel lost here, that s OK. This isn t the most important part of the series, and is mostly here to give a concrete ending to the story arc. Any way you want to finish your own journy is the best way for you to finish it!

Implement Ethernet conversion code This assumes an importable package with a Frame struct, which we can use to convert a Frame to/from Ethernet. Given that the PACKRAT frame has a field that Ethernet doesn t (namely, Callsign), that will need to be explicitly passed in when turning an Ethernet frame into a PACKRAT Frame.
...
// ToPackrat will create a packrat frame from an Ethernet frame.
func ToPackrat(callsign [8]byte, frame *ethernet.Frame) (*packrat.Frame, error)  
var frameType packrat.FrameType
switch frame.EtherType  
case ethernet.EtherTypeIPv4:
frameType = packrat.FrameTypeIPv4
default:
return nil, fmt.Errorf("ethernet: unsupported ethernet type %x", frame.EtherType)
 
return &packrat.Frame 
Destination: frame.Destination,
Source: frame.Source,
Type: frameType,
Callsign: callsign,
Payload: frame.Payload,
 , nil
 
// FromPackrat will create an Ethernet frame from a Packrat frame.
func FromPackrat(frame *packrat.Frame) (*ethernet.Frame, error)  
var etherType ethernet.EtherType
switch frame.Type  
case packrat.FrameTypeRaw:
return nil, fmt.Errorf("ethernet: unsupported packrat type 'raw'")
case packrat.FrameTypeIPv4:
etherType = ethernet.EtherTypeIPv4
default:
return nil, fmt.Errorf("ethernet: unknown packrat type %x", frame.Type)
 
// We lose the Callsign here, which is sad.
 return &ethernet.Frame 
Destination: frame.Destination,
Source: frame.Source,
EtherType: etherType,
Payload: frame.Payload,
 , nil
 
Our helpers, ToPackrat and FromPackrat can now be used to transmorgify PACKRAT into Ethernet, or Ethernet into PACKRAT. Let s put them into use!

Implement a TAP interface On Linux, the networking stack can be exposed to userland using TUN or TAP interfaces. TUN devices allow a userspace program to read and write data at the Layer 3 / IP layer. TAP devices allow a userspace program to read and write data at the Layer 2 Data Link / Ethernet layer. Writing data at Layer 2 is what we want to do, since we re looking to transform our Layer 2 into Ethernet s Layer 2 Frames. Our first job here is to create the actual TAP interface, set the MAC address, and set the IP range to our pre-coordinated IP range.
...
import (
"net"
"github.com/mdlayher/ethernet"
"github.com/songgao/water"
"github.com/vishvananda/netlink"
)
...
config := water.Config DeviceType: water.TAP 
config.Name = "rat0"
iface, err := water.New(config)
...
netIface, err := netlink.LinkByName("rat0")
...
// Pick a range here that works for you!
 //
 // For my local network, I'm using some IPs
 // that AMPR (ampr.org) was nice enough to
 // allocate to me for ham radio use. Thanks,
 // AMPR!
 //
 // Let's just use 10.* here, though.
 //
 ip, cidr, err := net.ParseCIDR("10.0.0.1/24")
...
cidr.IP = ip
err = netlink.AddrAdd(netIface, &netlink.Addr 
IPNet: cidr,
Peer: cidr,
 )
...
// Add all our neighbors to the ARP table
 for _, neighbor := range neighbors  
netlink.NeighAdd(&netlink.Neigh 
LinkIndex: netIface.Attrs().Index,
Type: netlink.FAMILY_V4,
State: netlink.NUD_PERMANENT,
IP: neighbor.IP,
HardwareAddr: neighbor.MAC,
 )
 
// Pick a MAC that is globally unique here, this is
 // just used as an example!
 addr, err := net.ParseMAC("FA:DE:DC:AB:LE:01")
...
netlink.LinkSetHardwareAddr(netIface, addr)
...
err = netlink.LinkSetUp(netIface)
var frame = &ethernet.Frame 
var buf = make([]byte, 1500)
for  
n, err := iface.Read(buf)
...
err = frame.UnmarshalBinary(buf[:n])
...
// process frame here (to come)
  
...
Now that our network stack can resolve an IP to a MAC Address (via ip neigh according to our pre-defined neighbors), and send that IP packet to our daemon, it s now on us to send IPv4 data over the airwaves. Here, we re going to take packets coming in from our TAP interface, and marshal the Ethernet frame into a PACKRAT Frame and transmit it. As with the rest of the RF code, we ll leave that up to the implementer, of course, using what was built during Part 2: Transmitting BPSK symbols and Part 4: Framing data.
...
for  
// continued from above

n, err := iface.Read(buf)
...
err = frame.UnmarshalBinary(buf[:n])
...
switch frame.EtherType  
case 0x0800:
// ipv4 packet
 pack, err := ToPackrat(
// Add my callsign to all Frames, for now
 [8]byte 'K', '3', 'X', 'E', 'C' ,
frame,
)
...
err = transmitPacket(pack)
...
 
 
...
Now that we have transmitting covered, let s go ahead and handle the recieve path here. We re going to listen on frequency using the code built in Part 3: Receiving BPSK symbols and Part 4: Framing data. The Frames we decode from the airwaves are expected to come back from the call packratReader.Next in the code below, and the exact way that works is up to the implementer.
...
for  
// pull the next packrat frame from
 // the symbol stream as we did in the
 // last post
 packet, err := packratReader.Next()
...
// check for CRC errors and drop invalid
 // packets
 err = packet.Check()
...
if bytes.Equal(packet.Source, addr)  
// if we've heard ourself transmitting
 // let's avoid looping back
 continue
 
// create an ethernet frame
 frame, err := FromPackrat(packet)
...
buf, err := frame.MarshalBinary()
...
// and inject it into the tap
 err = iface.Write(buf)
...
 
...
Phew. Right. Now we should be able to listen for PACKRAT frames on the air and inject them into our TAP interface.

Putting it all Together After all this work weeks of work! we can finally get around to putting some real packets over the air. For me, this was an incredibly satisfying milestone, and tied together months of learning! I was able to start up a UDP server on a remote machine with an RTL-SDR dongle attached to it, listening on the TAP interface s host IP with my defined MAC address, and send UDP packets to that server via PACKRAT using my laptop, /dev/udp and an Ettus B210, sending packets into the TAP interface. Now that UDP was working, I was able to get TCP to work using two PlutoSDRs, which allowed me to run the cURL command I pasted in the first post (both simultaneously listen and transmit on behalf of my TAP interface). It s my hope that someone out there will be inspired to implement their own Layer 1 and Layer 2 as a learning exercise, and gets the same sense of gratification that I did! If you re reading this, and at a point where you ve been able to send IP traffic over your own Layer 1 / Layer 2, please get in touch! I d be thrilled to hear all about it. I d love to link to any posts or examples you publish here!

5 December 2021

Dirk Eddelbuettel: RcppSpdlog 0.0.7 on CRAN: Package Maintenance

A new version 0.0.7 of RcppSpdlog is now on CRAN. RcppSpdlog bundles spdlog, a wonderful header-only C++ logging library with all the bells and whistles you would want that was written by Gabi Melman, and also includes fmt by Victor Zverovich. This release brings upstream bugfix releases 1.9.1 and 1.9.2 of spdlog. We also removed the YAML file (and badge) for the disgraced former continuous integration service we shall not name (yet that we all used to use). And just like digest four days ago, drat three days ago, littler two days ago, and RcppAPT yesterday, we converted the vignettes from using the minidown package to the (fairly new) simplermarkdown package which is so much more appropriate for our use of the minimal water.css style. The (minimal) NEWS entry for this release follows.

Changes in RcppSpdlog version 0.0.7 (2021-12-05)
  • Upgraded to upstream bug fix releases spdlog 1.9.1 and 1.9.2
  • Travis artifacts and badges have been pruned
  • Vignette now uses simplermarkdown

Courtesy of my CRANberries, there is also a diffstat report. More detailed information is on the RcppSpdlog page, or the package documention site. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

Paul Tagliamonte: Framing data (Part 4/5)

This post is part of a series called "PACKRAT". If this is the first post you've found, it'd be worth reading the intro post first and then looking over all posts in the series.
In the last post, we we were able to build a functioning Layer 1 PHY where we can encode symbols to transmit, and receive symbols on the other end, we re now at the point where we can encode and decode those symbols as bits and frame blocks of data, marking them with a Sender and a Destination for routing to the right host(s). This is a Layer 2 scheme in the OSI model, which is otherwise known as the Data Link Layer. You re using one to view this website right now I m willing to bet your data is going through an Ethernet layer 2 as well as WiFi or maybe a cellular data protocol like 5G or LTE. Given that this entire exercise is hard enough without designing a complex Layer 2 scheme, I opted for simplicity in the hopes this would free me from the complexity and research that has gone into this field for the last 50 years. I settled on stealing a few ideas from Ethernet Frames namely, the use of MAC addresses to identify parties, and the EtherType field to indicate the Payload type. I also stole the idea of using a CRC at the end of the Frame to check for corruption, as well as the specific CRC method (crc32 using 0xedb88320 as the polynomial). Lastly, I added a callsign field to make life easier on ham radio frequencies if I was ever to seriously attempt to use a variant of this protocol over the air with multiple users. However, given this scheme is not a commonly used scheme, it s best practice to use a nearby radio to identify your transmissions on the same frequency while testing or use a Faraday box to test without transmitting over the airwaves. I added the callsign field in an effort to lean into the spirit of the Part 97 regulations, even if I relied on a phone emission to identify the Frames. As an aside, I asked the ARRL for input here, and their stance to me over email was I d be OK according to the regs if I were to stick to UHF and put my callsign into the BPSK stream using a widely understood encoding (even with no knowledge of PACKRAT, the callsign is ASCII over BPSK and should be easily demodulatable for followup with me). Even with all this, I opted to use FM phone to transmit my callsign when I was active on the air (specifically, using an SDR and a small bash script to automate transmission while I watched for interference or other band users). Right, back to the Frame:
sync
dest
source
callsign
type
payload
crc
With all that done, I put that layout into a struct, so that we can marshal and unmarshal bytes to and from our Frame objects, and work with it in software.
type FrameType [2]byte
type Frame struct  
Destination net.HardwareAddr
Source net.HardwareAddr
Callsign [8]byte
Type FrameType
Payload []byte
CRC uint32
 

Time to pick some consts I picked a unique and distinctive sync sequence, which the sender will transmit before the Frame, while the receiver listens for that sequence to know when it s in byte alignment with the symbol stream. My sync sequence is [3]byte 'U', 'f', '~' which works out to be a very pleasant bit sequence of 01010101 01100110 01111110. It s important to have soothing preambles for your Frames. We need all the good energy we can get at this point.
var (
FrameStart = [3]byte 'U', 'f', '~' 
FrameMaxPayloadSize = 1500
)
Next, I defined some FrameType values for the type field, which I can use to determine what is done with that data next, something Ethernet was originally missing, but has since grown to depend on (who needs Length anyway? Not me. See below!)
FrameType Description Bytes
Raw Bytes in the Payload field are opaque and not to be parsed. [2]byte 0x00, 0x01
IPv4 Bytes in the Payload field are an IPv4 packet. [2]byte 0x00, 0x02
And finally, I decided on a maximum length of the Payload, and decided on limiting it to 1500 bytes to align with the MTU of Ethernet.
var (
FrameTypeRaw = FrameType 0, 1 
FrameTypeIPv4 = FrameType 0, 2 
)
Given we know how we re going to marshal and unmarshal binary data to and from Frames, we can now move on to looking through the bit stream for our Frames.

Why is there no Length field? I was initially a bit surprised that Ethernet Frames didn t have a Length field in use, but the more I thought about it, the more it seemed like a big ole' failure mode without a good implementation outcome. Either the Length is right (resulting in no action and used bits on every packet) or the Length is not the length of the Payload and the driver needs to determine what to do with the packet does it try and trim the overlong payload and ignore the rest? What if both the end of the read bytes and the end of the subset of the packet denoted by Length have a valid CRC? Which is used? Will everyone agree? What if Length is longer than the Payload but the CRC is good where we detected a lost carrer? I decided on simplicity. The end of a Frame is denoted by the loss of the BPSK carrier when the signal is no longer being transmitted (or more correctly, when the signal is no longer received), we know we ve hit the end of a packet. Missing a single symbol will result in the Frame being finalized. This can cause some degree of corruption, but it s also a lot easier than doing tricks like bit stuffing to create an end of symbol stream delimiter.

Finding the Frame start in a Symbol Stream First thing we need to do is find our sync bit pattern in the symbols we re receiving from our BPSK demodulator. There s some smart ways to do this, but given that I m not much of a smart man, I again decided to go for simple instead. Given our incoming vector of symbols (which are still float values) prepend one at a time to a vector of floats that is the same length as the sync phrase, and compare against the sync phrase, to determine if we re in sync with the byte boundary within the symbol stream. The only trick here is that because we re using BPSK to modulate and demodulate the data, post phaselock we can be 180 degrees out of alignment (such that a +1 is demodulated as -1, or vice versa). To deal with that, I check against both the sync phrase as well as the inverse of the sync phrase (both [1, -1, 1] as well as [-1, 1, -1]) where if the inverse sync is matched, all symbols to follow will be inverted as well. This effectively turns our symbols back into bits, even if we re flipped out of phase. Other techniques like NRZI will represent a 0 or 1 by a change in phase state which is great, but can often cascade into long runs of bit errors, and is generally more complex to implement. That representation isn t ambiguous, given you look for a phase change, not the absolute phase value, which is incredibly compelling. Here s a notional example of how I ve been thinking about the phrase sliding window and how I ve been thinking of the checks. Each row is a new symbol taken from the BPSK receiver, and pushed to the head of the sliding window, moving all symbols back in the vector by one.
 var (
sync = []float  ...  
buf = make([]float, len(sync))
incomingSymbols = []float  ...  
)
for _, el := range incomingSymbols  
copy(buf, buf[1:])
buf[len(buf)-1] = el
if compare(sync, buf)  
// we're synced!
 break
 
 
Given the pseudocode above, let s step through what the checks would be doing at each step:
Buffer Sync Inverse Sync
[ ]float 0, ,0 [ ]float -1, ,-1 [ ]float 1, ,1
[ ]float 0, ,1 [ ]float -1, ,-1 [ ]float 1, ,1
[more bits in] [ ]float -1, ,-1 [ ]float 1, ,1
[ ]float 1, ,1 [ ]float -1, ,-1 [ ]float 1, ,1
After this notional set of comparisons, we know that at the last step, we are now aligned to the frame and byte boundary the next symbol / bit will be the MSB of the 0th Frame byte. Additionally, we know we re also 180 degrees out of phase, so we need to flip the symbol s sign to get the bit. From this point on we can consume 8 bits at a time, and re-assemble the byte stream. I don t know what this technique is called or even if this is used in real grown-up implementations, but it s been working for my toy implementation.

Next Steps Now that we can read/write Frames to and from PACKRAT, the next steps here are going to be implementing code to encode and decode Ethernet traffic into PACKRAT, coming next in Part 5!

4 December 2021

Jonathan Dowland: Haskell mortgage calculator

A few months ago I was trying to compare two mortgage offers, and ended up writing a small mortgage calculator to help me. Both mortgages were fixed-term for the same time period (5 years). One of the mortgages had a lower rate than the other, but much higher arrangement fees. A broker recommended the mortgage with the higher rate but lower fee, on an affordability basis for the fixed term: over all, we would spend less money within the fixed term on that deal than the other. (I thought) this left one bit of information missing: what remaining balance would there be at the end of the term? The mortgages I want to model are defined in terms of a monthly repayment figure and an annual interest rate for the fixed period. I think interest is usually recalculated on a daily basis, so I convert the annual rate down to a daily rate. Repayments only happen once a month. Months are not all the same size. Using mod 30 on the 'day' approximates a monthly payment. Over 5 years, there would be 60 months, meaning 60 repayments. (I'm ignoring leap years)
 > length . filter id .take (5*365) $ [ x mod 30==0   x <- [1..]]
60
Here's what I came up with. I was a little concerned the repayment approximation was too far out so I compared the output with a more precise (but boring) spreadsheet and they agreed to within an acceptable tolerance. The numbers that follow are all made up to illustrate the function and don't reflect my actual mortgage. :)
borrowed = 1000000 -- day 0 amount outstanding
aer   = 0.89
repay = 1000
der   = aer / 36
owed n   n == 0          = borrowed
         n  mod  30 == 0 = last + interest - repay
         otherwise       = last + interest
    where
        last     = owed (n - 1)
        interest = last * der

30 November 2021

Russell Coker: Links November 2021

The Guardian has an amusing article by Sophie Elmhirst about Libertarians buying a cruise ship to make a seasteading project off the coast of Panama [1]. It turns out that you need permits etc to do this and maintaining a ship is expensive. Also you wouldn t want to mine cryptocurrency in a ship cabin as most cabins are small and don t have enough airconditioning to remain pleasant if you dump 1kW or more into the air. NPR has an interesting article about the reaction of the NRA to the Columbine shootings [2]. Seems that some NRA person isn t a total asshole and is sharing their private information, maybe they are dying and are worried about going to hell. David Brin wrote an insightful blog post about the singleton hypothesis where he covers some of the evidence of autocratic societies failing [3]. I think he makes a convincing point about a single centralised government for human society not being viable. But something like the EU on a world wide scale could work well. Ken Shirriff wrote an interesting blog post about reverse engineering the Yamaha DX7 synthesiser [4]. The New York Times has an interesting article about a Baboon troop that became less aggressive after the alpha males all died at once from tuberculosis [5]. They established a new more peaceful culture that has outlived the beta males who avoided tuberculosis. The Guardian has an interesting article about how sequencing the genomes of the entire population can save healthcare costs while improving the health of the population [6]. This is somthing wealthy countries should offer for free to the world population. At a bit under $1000 per test that s only about $7 trillion to test everyone, and of course the price should drop significantly if there were billions of tests being done. The Strategy Bridge has an interesting article about SciFi books that have useful portrayals of military strategy [7]. The co-author is Major General Mick Ryan of the Australian Army which is noteworthy as Major General is the second highest rank in use by the Australian Army at this time. Vice has an interesting article about the co-evolution of penises and vaginas and how a lot of that evolution is based on avoiding impregnation from rape [8]. Cory Doctorow wrote an insightful Medium article about the way that governments could force interoperability through purchasing power [9]. Cory Doctorow wrote an insightful article for Locus Magazine about imagining life after capitalism and how capitalism might be replaced [10]. We need a Star Trek future! Arstechnica has an informative article about new developmenet in the rowhammer category of security attacks on DRAM [11]. It seems that DDR4 with ECC is the best current mitigation technique and that DDR3 with ECC is harder to attack than non-ECC RAM. So the thing to do is use ECC on all workstations and avoid doing security critical things on laptops because they can t juse ECC RAM.

21 November 2021

Antoine Beaupr : mbsync vs OfflineIMAP

After recovering from my latest email crash (previously, previously), I had to figure out which tool I should be using. I had many options but I figured I would start with a popular one (mbsync). But I also evaluated OfflineIMAP which was resurrected from the Python 2 apocalypse, and because I had used it before, for a long time. Read on for the details.

Benchmark setup All programs were tested against a Dovecot 1:2.3.13+dfsg1-2 server, running Debian bullseye. The client is a Purism 13v4 laptop with a Samsung SSD 970 EVO 1TB NVMe drive. The server is a custom build with a AMD Ryzen 5 2600 CPU, and a RAID-1 array made of two NVMe drives (Intel SSDPEKNW010T8 and WDC WDS100T2B0C). The mail spool I am testing against has almost 400k messages and takes 13GB of disk space:
$ notmuch count --exclude=false
372758
$ du -sh --exclude xapian Maildir
13G Maildir
The baseline we are comparing against is SMD (syncmaildir) which performs the sync in about 7-8 seconds locally (3.5 seconds for each push/pull command) and about 10-12 seconds remotely. Anything close to that or better is good enough. I do not have recent numbers for a SMD full sync baseline, but the setup documentation mentions 20 minutes for a full sync. That was a few years ago, and the spool has obviously grown since then, so that is not a reliable baseline. A baseline for a full sync might be also set with rsync, which copies files at nearly 40MB/s, or 317Mb/s!
anarcat@angela:tmp(main)$ time rsync -a --info=progress2 --exclude xapian  shell.anarc.at:Maildir/ Maildir/
 12,647,814,731 100%   37.85MB/s    0:05:18 (xfr#394981, to-chk=0/395815)    
72.38user 106.10system 5:19.59elapsed 55%CPU (0avgtext+0avgdata 15988maxresident)k
8816inputs+26305112outputs (0major+50953minor)pagefaults 0swaps
That is 5 minutes to transfer the entire spool. Incremental syncs are obviously pretty fast too:
anarcat@angela:tmp(main)$ time rsync -a --info=progress2 --exclude xapian  shell.anarc.at:Maildir/ Maildir/
              0   0%    0.00kB/s    0:00:00 (xfr#0, to-chk=0/395815)    
1.42user 0.81system 0:03.31elapsed 67%CPU (0avgtext+0avgdata 14100maxresident)k
120inputs+0outputs (3major+12709minor)pagefaults 0swaps
As an extra curiosity, here's the performance with tar, pretty similar with rsync, minus incremental which I cannot be bothered to figure out right now:
anarcat@angela:tmp(main)$ time ssh shell.anarc.at tar --exclude xapian -cf - Maildir/   pv -s 13G   tar xf - 
56.68user 58.86system 5:17.08elapsed 36%CPU (0avgtext+0avgdata 8764maxresident)k
0inputs+0outputs (0major+7266minor)pagefaults 0swaps
12,1GiO 0:05:17 [39,0MiB/s] [===================================================================> ] 92%
Interesting that rsync manages to almost beat a plain tar on file transfer, I'm actually surprised by how well it performs here, considering there are many little files to transfer. (But then again, this maybe is exactly where rsync shines: while tar needs to glue all those little files together, rsync can just directly talk to the other side and tell it to do live changes. Something to look at in another article maybe?) Since both ends are NVMe drives, those should easily saturate a gigabit link. And in fact, a backup of the server mail spool achieves much faster transfer rate on disks:
anarcat@marcos:~$ tar fc - Maildir   pv -s 13G > Maildir.tar
15,0GiO 0:01:57 [ 131MiB/s] [===================================] 115%
That's 131Mibyyte per second, vastly faster than the gigabit link. The client has similar performance:
anarcat@angela:~(main)$ tar fc - Maildir   pv -s 17G > Maildir.tar
16,2GiO 0:02:22 [ 116MiB/s] [==================================] 95%
So those disks should be able to saturate a gigabit link, and they are not the bottleneck on fast links. Which begs the question of what is blocking performance of a similar transfer over the gigabit link, but that's another question altogether, because no sync program ever reaches the above performance anyways. Finally, note that when I migrated to SMD, I wrote a small performance comparison that could be interesting here. It show SMD to be faster than OfflineIMAP, but not as much as we see here. In fact, it looks like OfflineIMAP slowed down significantly since then (May 2018), but this could be due to my larger mail spool as well.

mbsync The isync (AKA mbsync) project is written in C and supports syncing Maildir and IMAP folders, with possibly multiple replicas. I haven't tested this but I suspect it might be possible to sync between two IMAP servers as well. It supports partial mirorrs, message flags, full folder support, and "trash" functionality.

Complex configuration file I started with this .mbsyncrc configuration file:
SyncState *
Sync New ReNew Flags
IMAPAccount anarcat
Host imap.anarc.at
User anarcat
PassCmd "pass imap.anarc.at"
SSLType IMAPS
CertificateFile /etc/ssl/certs/ca-certificates.crt
IMAPStore anarcat-remote
Account anarcat
MaildirStore anarcat-local
# Maildir/top/sub/sub
#SubFolders Verbatim
# Maildir/.top.sub.sub
SubFolders Maildir++
# Maildir/top/.sub/.sub
# SubFolders legacy
# The trailing "/" is important
#Path ~/Maildir-mbsync/
Inbox ~/Maildir-mbsync/
Channel anarcat
# AKA Far, convert when all clients are 1.4+
Master :anarcat-remote:
# AKA Near
Slave :anarcat-local:
# Exclude everything under the internal [Gmail] folder, except the interesting folders
#Patterns * ![Gmail]* "[Gmail]/Sent Mail" "[Gmail]/Starred" "[Gmail]/All Mail"
# Or include everything
Patterns *
# Automatically create missing mailboxes, both locally and on the server
#Create Both
Create slave
# Sync the movement of messages between folders and deletions, add after making sure the sync works
#Expunge Both
Long gone are the days where I would spend a long time reading a manual page to figure out the meaning of every option. If that's your thing, you might like this one. But I'm more of a "EXAMPLES section" kind of person now, and I somehow couldn't find a sample file on the website. I started from the Arch wiki one but it's actually not great because it's made for Gmail (which is not a usual Dovecot server). So a sample config file in the manpage would be a great addition. Thankfully, the Debian packages ships one in /usr/share/doc/isync/examples/mbsyncrc.sample but I only found that after I wrote my configuration. It was still useful and I recommend people take a look if they want to understand the syntax. Also, that syntax is a little overly complicated. For example, Far needs colons, like:
Far :anarcat-remote:
Why? That seems just too complicated. I also found that sections are not clearly identified: IMAPAccount and Channel mark section beginnings, for example, which is not at all obvious until you learn about mbsync's internals. There are also weird ordering issues: the SyncState option needs to be before IMAPAccount, presumably because it's global. Using a more standard format like .INI or TOML could improve that situation.

Stellar performance A transfer of the entire mail spool takes 56 minutes and 6 seconds, which is impressive. It's not quite "line rate": the resulting mail spool was 12GB (which is a problem, see below), which turns out to be about 29Mbit/s and therefore not maxing the gigabit link, and an order of magnitude slower than rsync. The incremental runs are roughly 2 seconds, which is even more impressive, as that's actually faster than rsync:
===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.015       0.052       1.930       2.029       2.105       
user        0.660       0.040       0.592       0.661       0.722       
sys         0.338       0.033       0.268       0.341       0.387    
Those tests were performed with isync 1.3.0-2.2 on Debian bullseye. Tests with a newer isync release originally failed because of a corrupted message that triggered bug 999804 (see below). Running 1.4.3 under valgrind works around the bug, but adds a 50% performance cost, the full sync running in 1h35m. Once the upstream patch is applied, performance with 1.4.3 is fairly similar, considering that the new sync included the register folder with 4000 messages:
120.74user 213.19system 59:47.69elapsed 9%CPU (0avgtext+0avgdata 105420maxresident)k
29128inputs+28284376outputs (0major+45711minor)pagefaults 0swaps
That is ~13GB in ~60 minutes, which gives us 28.3Mbps. Incrementals are also pretty similar to 1.3.x, again considering the double-connect cost:
===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.500       0.087       2.340       2.491       2.629       
user        0.718       0.037       0.679       0.711       0.793       
sys         0.322       0.024       0.284       0.320       0.365
Those tests were all done on a Gigabit link, but what happens on a slower link? My server uplink is slow: 25 Mbps down, 6 Mbps up. There mbsync is worse than the SMD baseline:
===> multitime results
1: mbsync -a
Mean        Std.Dev.    Min         Median      Max
real        31.531      0.724       30.764      31.271      33.100      
user        1.858       0.125       1.721       1.818       2.131       
sys         0.610       0.063       0.506       0.600       0.695       
That's 30 seconds for a sync, which is an order of magnitude slower than SMD.

Great user interface Compared to OfflineIMAP and (ahem) SMD, the mbsync UI is kind of neat:
anarcat@angela:~(main)$ mbsync -a
Notice: Master/Slave are deprecated; use Far/Near instead.
C: 1/2  B: 204/205  F: +0/0 *0/0 #0/0  N: +1/200 *0/0 #0/0
(Note that nice switch away from slavery-related terms too.) The display is minimal, and yet informative. It's not obvious what does mean at first glance, but the manpage is useful at least for clarifying that:
This represents the cumulative progress over channels, boxes, and messages affected on the far and near side, respectively. The message counts represent added messages, messages with updated flags, and trashed messages, respectively. No attempt is made to calculate the totals in advance, so they grow over time as more information is gathered. (Emphasis mine).
In other words:
  • C 2/2: channels done/total (2 done out of 2)
  • B 204/205: mailboxes done/total (204 out of 205)
  • F: changes on the far side
  • N: +10/200 *0/0 #0/0: changes on the "near" side:
    • +10/200: 10 out of 200 messages downloaded
    • *0/0: no flag changed
    • #0/0: no message deleted
You get used to it, in a good way. It does not, unfortunately, show up when you run it in systemd, which is a bit annoying as I like to see a summary mail traffic in the logs.

Interoperability issue In my notmuch setup, I have bound key S to "mark spam", which basically assigns the tag spam to the message and removes a bunch of others. Then I have a notmuch-purge script which moves that message to the spam folder, for training purposes. It basically does this:
notmuch search --output=files --format=text0 "$search_spam" \
      xargs -r -0 mv -t "$HOME/Maildir/$ PREFIX junk/cur/"
This method, which worked fine in SMD (and also OfflineIMAP) created this error on sync:
Maildir error: duplicate UID 37578.
And indeed, there are now two messages with that UID in the mailbox:
anarcat@angela:~(main)$ find Maildir/.junk/ -name '*U=37578*'
Maildir/.junk/cur/1637427889.134334_2.angela,U=37578:2,S
Maildir/.junk/cur/1637348602.2492889_221804.angela,U=37578:2,S
This is actually a known limitation or, as mbsync(1) calls it, a "RECOMMENDATION":
When using the more efficient default UID mapping scheme, it is important that the MUA renames files when moving them between Maildir fold ers. Mutt always does that, while mu4e needs to be configured to do it:
(setq mu4e-change-filenames-when-moving t)
So it seems I would need to fix my script. It's unclear how the paths should be renamed, which is unfortunate, because I would need to change my script to adapt to mbsync, but I can't tell how just from reading the above. (A manual fix is actually to rename the file to remove the U= field: mbsync will generate a new one and then sync correctly.) Fortunately, someone else already fixed that issue: afew, a notmuch tagging script (much puns, such hurt), has a move mode that can rename files correctly, specifically designed to deal with mbsync. I had already been told about afew, but it's one more reason to standardize my notmuch hooks on that project, it looks like. Update: I have tried to use afew and found it has significant performance issues. It also has a completely different paradigm to what I am used to: it assumes all incoming mail has a new and lays its own tags on top of that (inbox, sent, etc). It can only move files from one folder at a time (see this bug) which breaks my spam training workflow. In general, I sync my tags into folders (e.g. ham, spam, sent) and message flags (e.g. inbox is F, unread is "not S", etc), and afew is not well suited for this (although there are hacks that try to fix this). I have worked hard to make my tagging scripts idempotent, and it's something afew doesn't currently have. Still, it would be better to have that code in Python than bash, so maybe I should consider my options here.

Stability issues The newer release in Debian bookworm (currently at 1.4.3) has stability issues on full sync. I filed bug 999804 in Debian about this, which lead to a thread on the upstream mailing list. I have found at least three distinct crashes that could be double-free bugs "which might be exploitable in the worst case", not a reassuring prospect. The thing is: mbsync is really fast, but the downside of that is that it's written in C, and with that comes a whole set of security issues. The Debian security tracker has only three CVEs on isync, but the above issues show there could be many more. Reading the source code certainly did not make me very comfortable with trusting it with untrusted data. I considered sandboxing it with systemd (below) but having systemd run as a --user process makes that difficult. I also considered using an apparmor profile but that is not trivial because we need to allow SSH and only some parts of it... Thankfully, upstream has been diligent at addressing the issues I have found. They provided a patch within a few days which did fix the sync issues. Update: upstream actually took the issue very seriously. They not only got CVE-2021-44143 assigned for my bug report, they also audited the code and found several more issues collectively identified as CVE-2021-3657, which actually also affect 1.3 (ie. Debian 11/bullseye/stable). Somehow my corpus doesn't trigger that issue, but it was still considered serious enough to warrant a CVE. So one the one hand: excellent response from upstream; but on the other hand: how many more of those could there be in there?

Automation with systemd The Arch wiki has instructions on how to setup mbsync as a systemd service. It suggests using the --verbose (-V) flag which is a little intense here, as it outputs 1444 lines of messages. I have used the following .service file:
[Unit]
Description=Mailbox synchronization service
ConditionHost=!marcos
Wants=network-online.target
After=network-online.target
Before=notmuch-new.service
[Service]
Type=oneshot
ExecStart=/usr/bin/mbsync -a
Nice=10
IOSchedulingClass=idle
NoNewPrivileges=true
[Install]
WantedBy=default.target
And the following .timer:
[Unit]
Description=Mailbox synchronization timer
ConditionHost=!marcos
[Timer]
OnBootSec=2m
OnUnitActiveSec=5m
Unit=mbsync.service
[Install]
WantedBy=timers.target
Note that we trigger notmuch through systemd, with the Before and also by adding mbsync.service to the notmuch-new.service file:
[Unit]
Description=notmuch new
After=mbsync.service
[Service]
Type=oneshot
Nice=10
ExecStart=/usr/bin/notmuch new
[Install]
WantedBy=mbsync.service
An improvement over polling repeatedly with a .timer would be to wake up only on IMAP notify, but neither imapnotify nor goimapnotify seem to be packaged in Debian. It would also not cover for the "sent folder" use case, where we need to wake up on local changes.

Password-less setup The sample file suggests this should work:
IMAPStore remote
Tunnel "ssh -q host.remote.com /usr/sbin/imapd"
Add BatchMode, restrict to IdentitiesOnly, provide a password-less key just for this, add compression (-C), find the Dovecot imap binary, and you get this:
IMAPAccount anarcat-tunnel
Tunnel "ssh -o BatchMode=yes -o IdentitiesOnly=yes -i ~/.ssh/id_ed25519_mbsync -o HostKeyAlias=shell.anarc.at -C anarcat@imap.anarc.at /usr/lib/dovecot/imap"
And it actually seems to work:
$ mbsync -a
Notice: Master/Slave are deprecated; use Far/Near instead.
C: 0/2  B: 0/1  F: +0/0 *0/0 #0/0  N: +0/0 *0/0 #0/0imap(anarcat): Error: net_connect_unix(/run/dovecot/stats-writer) failed: Permission denied
C: 2/2  B: 205/205  F: +0/0 *0/0 #0/0  N: +1/1 *3/3 #0/0imap(anarcat)<1611280><90uUOuyElmEQlhgAFjQyWQ>: Info: Logged out in=10808 out=15396642 deleted=0 expunged=0 trashed=0 hdr_count=0 hdr_bytes=0 body_count=1 body_bytes=8087
It's a bit noisy, however. dovecot/imap doesn't have a "usage" to speak of, but even the source code doesn't hint at a way to disable that Error message, so that's unfortunate. That socket is owned by root:dovecot so presumably Dovecot runs the imap process as $user:dovecot, which we can't do here. Oh well? Interestingly, the SSH setup is not faster than IMAP. With IMAP:
===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.367       0.065       2.220       2.376       2.458       
user        0.793       0.047       0.731       0.776       0.871       
sys         0.426       0.040       0.364       0.434       0.476
With SSH:
===> multitime results
1: mbsync -a
            Mean        Std.Dev.    Min         Median      Max
real        2.515       0.088       2.274       2.532       2.594       
user        0.753       0.043       0.645       0.766       0.804       
sys         0.328       0.045       0.212       0.340       0.393
Basically: 200ms slower. Tolerable.

Migrating from SMD The above was how I migrated to mbsync on my first workstation. The work on the second one was more streamlined, especially since the corruption on mailboxes was fixed:
  1. install isync, with the patch:
    dpkg -i isync_1.4.3-1.1~_amd64.deb
    
  2. copy all files over from previous workstation to avoid a full resync (optional):
    rsync -a --info=progress2 angela:Maildir/ Maildir-mbsync/
    
  3. rename all files to match new hostname (optional):
    find Maildir-mbsync/ -type f -name '*.angela,*' -print0    rename -0 's/\.angela,/\.curie,/'
    
  4. trash the notmuch database (optional):
    rm -rf Maildir-mbsync/.notmuch/xapian/
    
  5. disable all smd and notmuch services:
    systemctl --user --now disable smd-pull.service smd-pull.timer smd-push.service smd-push.timer notmuch-new.service notmuch-new.timer
    
  6. do one last sync with smd:
    smd-pull --show-tags ; smd-push --show-tags ; notmuch new ; notmuch-sync-flagged -v
    
  7. backup notmuch on the client and server:
    notmuch dump   pv > notmuch.dump
    
  8. backup the maildir on the client and server:
    cp -al Maildir Maildir-bak
    
  9. create the SSH key:
    ssh-keygen -t ed25519 -f .ssh/id_ed25519_mbsync
    cat .ssh/id_ed25519_mbsync.pub
    
  10. add to .ssh/authorized_keys on the server, like this: command="/usr/lib/dovecot/imap",restrict ssh-ed25519 AAAAC...
  11. move old files aside, if present:
    mv Maildir Maildir-smd
    
  12. move new files in place (CRITICAL SECTION BEGINS!):
    mv Maildir-mbsync Maildir
    
  13. run a test sync, only pulling changes: mbsync --create-near --remove-none --expunge-none --noop anarcat-register
  14. if that works well, try with all mailboxes: mbsync --create-near --remove-none --expunge-none --noop -a
  15. if that works well, try again with a full sync: mbsync register mbsync -a
  16. reindex and restore the notmuch database, this should take ~25 minutes:
    notmuch new
    pv notmuch.dump   notmuch restore
    
  17. enable the systemd services and retire the smd-* services: systemctl --user enable mbsync.timer notmuch-new.service systemctl --user start mbsync.timer rm ~/.config/systemd/user/smd* systemctl daemon-reload
During the migration, notmuch helpfully told me the full list of those lost messages:
[...]
Warning: cannot apply tags to missing message: CAN6gO7_QgCaiDFvpG3AXHi6fW12qaN286+2a7ERQ2CQtzjSEPw@mail.gmail.com
Warning: cannot apply tags to missing message: CAPTU9Wmp0yAmaxO+qo8CegzRQZhCP853TWQ_Ne-YF94MDUZ+Dw@mail.gmail.com
Warning: cannot apply tags to missing message: F5086003-2917-4659-B7D2-66C62FCD4128@gmail.com
[...]
Warning: cannot apply tags to missing message: mailman.2.1316793601.53477.sage-members@mailman.sage.org
Warning: cannot apply tags to missing message: mailman.7.1317646801.26891.outages-discussion@outages.org
Warning: cannot apply tags to missing message: notmuch-sha1-000458df6e48d4857187a000d643ac971deeef47
Warning: cannot apply tags to missing message: notmuch-sha1-0079d8e0c3340e6f88c66f4c49fca758ea71d06d
Warning: cannot apply tags to missing message: notmuch-sha1-0194baa4cfb6d39bc9e4d8c049adaccaa777467d
Warning: cannot apply tags to missing message: notmuch-sha1-02aede494fc3f9e9f060cfd7c044d6d724ad287c
Warning: cannot apply tags to missing message: notmuch-sha1-06606c625d3b3445420e737afd9a245ae66e5562
Warning: cannot apply tags to missing message: notmuch-sha1-0747b020f7551415b9bf5059c58e0a637ba53b13
[...]
As detailed in the crash report, all of those were actually innocuous and could be ignored. Also note that we completely trash the notmuch database because it's actually faster to reindex from scratch than let notmuch slowly figure out that all mails are new and all the old mails are gone. The fresh indexing took:
nov 19 15:08:54 angela notmuch[2521117]: Processed 384679 total files in 23m 41s (270 files/sec.).
nov 19 15:08:54 angela notmuch[2521117]: Added 372610 new messages to the database.
While a reindexing on top of an existing database was going twice as slow, at about 120 files/sec.

Current config file Putting it all together, I ended up with the following configuration file:
SyncState *
Sync All
# IMAP side, AKA "Far"
IMAPAccount anarcat-imap
Host imap.anarc.at
User anarcat
PassCmd "pass imap.anarc.at"
SSLType IMAPS
CertificateFile /etc/ssl/certs/ca-certificates.crt
IMAPAccount anarcat-tunnel
Tunnel "ssh -o BatchMode=yes -o IdentitiesOnly=yes -i ~/.ssh/id_ed25519_mbsync -o HostKeyAlias=shell.anarc.at -C anarcat@imap.anarc.at /usr/lib/dovecot/imap"
IMAPStore anarcat-remote
Account anarcat-tunnel
# Maildir side, AKA "Near"
MaildirStore anarcat-local
# Maildir/top/sub/sub
#SubFolders Verbatim
# Maildir/.top.sub.sub
SubFolders Maildir++
# Maildir/top/.sub/.sub
# SubFolders legacy
# The trailing "/" is important
#Path ~/Maildir-mbsync/
Inbox ~/Maildir/
# what binds Maildir and IMAP
Channel anarcat
Far :anarcat-remote:
Near :anarcat-local:
# Exclude everything under the internal [Gmail] folder, except the interesting folders
#Patterns * ![Gmail]* "[Gmail]/Sent Mail" "[Gmail]/Starred" "[Gmail]/All Mail"
# Or include everything
#Patterns *
Patterns * !register  !.register
# Automatically create missing mailboxes, both locally and on the server
Create Both
#Create Near
# Sync the movement of messages between folders and deletions, add after making sure the sync works
Expunge Both
# Propagate mailbox deletion
Remove both
IMAPAccount anarcat-register-imap
Host imap.anarc.at
User register
PassCmd "pass imap.anarc.at-register"
SSLType IMAPS
CertificateFile /etc/ssl/certs/ca-certificates.crt
IMAPAccount anarcat-register-tunnel
Tunnel "ssh -o BatchMode=yes -o IdentitiesOnly=yes -i ~/.ssh/id_ed25519_mbsync -o HostKeyAlias=shell.anarc.at -C register@imap.anarc.at /usr/lib/dovecot/imap"
IMAPStore anarcat-register-remote
Account anarcat-register-tunnel
MaildirStore anarcat-register-local
SubFolders Maildir++
Inbox ~/Maildir/.register/
Channel anarcat-register
Far :anarcat-register-remote:
Near :anarcat-register-local:
Create Both
Expunge Both
Remove both
Note that it may be out of sync with my live (and private) configuration file, as I do not publish my "dotfiles" repository publicly for security reasons.

OfflineIMAP I've used OfflineIMAP for a long time before switching to SMD. I don't exactly remember why or when I started using it, but I do remember it became painfully slow as I started using notmuch, and would sometimes crash mysteriously. It's been a while, so my memory is hazy on that. It also kind of died in a fire when Python 2 stop being maintained. The main author moved on to a different project, imapfw which could serve as a framework to build IMAP clients, but never seemed to implement all of the OfflineIMAP features and certainly not configuration file compatibility. Thankfully, a new team of volunteers ported OfflineIMAP to Python 3 and we can now test that new version to see if it is an improvement over mbsync.

Crash on full sync The first thing that happened on a full sync is this crash:
Copy message from RemoteAnarcat:junk:
 ERROR: Copying message 30624 [acc: Anarcat]
  decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Thread 'Copy message from RemoteAnarcat:junk' terminated with exception:
Traceback (most recent call last):
  File "/usr/share/offlineimap3/offlineimap/imaputil.py", line 406, in utf7m_decode
    for c in binary.decode():
AttributeError: 'memoryview' object has no attribute 'decode'
The above exception was the direct cause of the following exception:
Traceback (most recent call last):
  File "/usr/share/offlineimap3/offlineimap/threadutil.py", line 146, in run
    Thread.run(self)
  File "/usr/lib/python3.9/threading.py", line 892, in run
    self._target(*self._args, **self._kwargs)
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 802, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 342, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 908, in _fetch_from_imap
    ndata1 = self.parser['8bit-RFC'].parsebytes(data[0][1])
  File "/usr/lib/python3.9/email/parser.py", line 123, in parsebytes
    return self.parser.parsestr(text, headersonly)
  File "/usr/lib/python3.9/email/parser.py", line 67, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/usr/lib/python3.9/email/parser.py", line 56, in parse
    feedparser.feed(data)
  File "/usr/lib/python3.9/email/feedparser.py", line 176, in feed
    self._call_parse()
  File "/usr/lib/python3.9/email/feedparser.py", line 180, in _call_parse
    self._parse()
  File "/usr/lib/python3.9/email/feedparser.py", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 298, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 256, in _parsegen
    if self._cur.get_content_type() == 'message/delivery-status':
  File "/usr/lib/python3.9/email/message.py", line 578, in get_content_type
    value = self.get('content-type', missing)
  File "/usr/lib/python3.9/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/usr/lib/python3.9/email/policy.py", line 163, in header_fetch_parse
    return self.header_factory(name, value)
  File "/usr/lib/python3.9/email/headerregistry.py", line 601, in __call__
    return self[name](name, value)
  File "/usr/lib/python3.9/email/headerregistry.py", line 196, in __new__
    cls.parse(value, kwds)
  File "/usr/lib/python3.9/email/headerregistry.py", line 445, in parse
    kwds['parse_tree'] = parse_tree = cls.value_parser(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2675, in parse_content_type_header
    ctype.append(parse_mime_parameters(value[1:]))
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2569, in parse_mime_parameters
    token, value = get_parameter(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2492, in get_parameter
    token, value = get_value(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2403, in get_value
    token, value = get_quoted_string(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1294, in get_quoted_string
    token, value = get_bare_quoted_string(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1223, in get_bare_quoted_string
    token, value = get_encoded_word(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1064, in get_encoded_word
    text, charset, lang, defects = _ew.decode('=?' + tok + '?=')
  File "/usr/lib/python3.9/email/_encoded_words.py", line 181, in decode
    string = bstring.decode(charset)
AttributeError: decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Last 1 debug messages logged for Copy message from RemoteAnarcat:junk prior to exception:
thread: Register new thread 'Copy message from RemoteAnarcat:junk' (account 'Anarcat')
ERROR: Exceptions occurred during the run!
ERROR: Copying message 30624 [acc: Anarcat]
  decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Traceback:
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 802, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 342, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 908, in _fetch_from_imap
    ndata1 = self.parser['8bit-RFC'].parsebytes(data[0][1])
  File "/usr/lib/python3.9/email/parser.py", line 123, in parsebytes
    return self.parser.parsestr(text, headersonly)
  File "/usr/lib/python3.9/email/parser.py", line 67, in parsestr
    return self.parse(StringIO(text), headersonly=headersonly)
  File "/usr/lib/python3.9/email/parser.py", line 56, in parse
    feedparser.feed(data)
  File "/usr/lib/python3.9/email/feedparser.py", line 176, in feed
    self._call_parse()
  File "/usr/lib/python3.9/email/feedparser.py", line 180, in _call_parse
    self._parse()
  File "/usr/lib/python3.9/email/feedparser.py", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 298, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 385, in _parsegen
    for retval in self._parsegen():
  File "/usr/lib/python3.9/email/feedparser.py", line 256, in _parsegen
    if self._cur.get_content_type() == 'message/delivery-status':
  File "/usr/lib/python3.9/email/message.py", line 578, in get_content_type
    value = self.get('content-type', missing)
  File "/usr/lib/python3.9/email/message.py", line 471, in get
    return self.policy.header_fetch_parse(k, v)
  File "/usr/lib/python3.9/email/policy.py", line 163, in header_fetch_parse
    return self.header_factory(name, value)
  File "/usr/lib/python3.9/email/headerregistry.py", line 601, in __call__
    return self[name](name, value)
  File "/usr/lib/python3.9/email/headerregistry.py", line 196, in __new__
    cls.parse(value, kwds)
  File "/usr/lib/python3.9/email/headerregistry.py", line 445, in parse
    kwds['parse_tree'] = parse_tree = cls.value_parser(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2675, in parse_content_type_header
    ctype.append(parse_mime_parameters(value[1:]))
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2569, in parse_mime_parameters
    token, value = get_parameter(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2492, in get_parameter
    token, value = get_value(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 2403, in get_value
    token, value = get_quoted_string(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1294, in get_quoted_string
    token, value = get_bare_quoted_string(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1223, in get_bare_quoted_string
    token, value = get_encoded_word(value)
  File "/usr/lib/python3.9/email/_header_value_parser.py", line 1064, in get_encoded_word
    text, charset, lang, defects = _ew.decode('=?' + tok + '?=')
  File "/usr/lib/python3.9/email/_encoded_words.py", line 181, in decode
    string = bstring.decode(charset)
Folder junk [acc: Anarcat]:
 Copy message UID 30626 (29008/49310) RemoteAnarcat:junk -> LocalAnarcat:junk
Command exited with non-zero status 100
5252.91user 535.86system 3:21:00elapsed 47%CPU (0avgtext+0avgdata 846304maxresident)k
96344inputs+26563792outputs (1189major+2155815minor)pagefaults 0swaps
That only transferred about 8GB of mail, which gives us a transfer rate of 5.3Mbit/s, more than 5 times slower than mbsync. This bug is possibly limited to the bullseye version of offlineimap3 (the lovely 0.0~git20210225.1e7ef9e+dfsg-4), while the current sid version (the equally gorgeous 0.0~git20211018.e64c254+dfsg-1) seems unaffected.

Tolerable performance The new release still crashes, except it does so at the very end, which is an improvement, since the mails do get transferred:
 *** Finished account 'Anarcat' in 511:12
ERROR: Exceptions occurred during the run!
ERROR: Exception parsing message with ID (<20190619152034.BFB8810E07A@marcos.anarc.at>) from imaplib (response type: bytes).
 AttributeError: decoding with 'X-EUC-TW' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Traceback:
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 810, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 343, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 910, in _fetch_from_imap
    raise OfflineImapError(
ERROR: Exception parsing message with ID (<40A270DB.9090609@alternatives.ca>) from imaplib (response type: bytes).
 AttributeError: decoding with 'x-mac-roman' codec failed (AttributeError: 'memoryview' object has no attribute 'decode')
Traceback:
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 810, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 343, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 910, in _fetch_from_imap
    raise OfflineImapError(
ERROR: IMAP server 'RemoteAnarcat' does not have a message with UID '32686'
Traceback:
  File "/usr/share/offlineimap3/offlineimap/folder/Base.py", line 810, in copymessageto
    message = self.getmessage(uid)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 343, in getmessage
    data = self._fetch_from_imap(str(uid), self.retrycount)
  File "/usr/share/offlineimap3/offlineimap/folder/IMAP.py", line 889, in _fetch_from_imap
    raise OfflineImapError(reason, severity)
Command exited with non-zero status 1
8273.52user 983.80system 8:31:12elapsed 30%CPU (0avgtext+0avgdata 841936maxresident)k
56376inputs+43247608outputs (811major+4972914minor)pagefaults 0swaps
"offlineimap  -o " took 8 hours 31 mins 15 secs
This is 8h31m for transferring 12G, which is around 3.1Mbit/s. That is nine times slower than mbsync, almost an order of magnitude! Now that we have a full sync, we can test incremental synchronization. That is also much slower:
===> multitime results
1: sh -c "offlineimap -o   true"
            Mean        Std.Dev.    Min         Median      Max
real        24.639      0.513       23.946      24.526      25.708      
user        23.912      0.473       23.404      23.795      24.947      
sys         1.743       0.105       1.607       1.729       2.002
That is also an order of magnitude slower than mbsync, and significantly slower than what you'd expect from a sync process. ~30 seconds is long enough to make me impatient and distracted; 3 seconds, less so: I can wait and see the results almost immediately.

Integrity check That said: this is still on a gigabit link. It's technically possible that OfflineIMAP performs better than mbsync over a slow link, but I Haven't tested that theory. The OfflineIMAP mail spool is missing quite a few messages as well:
anarcat@angela:~(main)$ find Maildir-offlineimap -type f -type f -a \! -name '.*'   wc -l 
381463
anarcat@angela:~(main)$ find Maildir -type f -type f -a \! -name '.*'   wc -l 
385247
... although that's probably all either new messages or the register folder, so OfflineIMAP might actually be in a better position there. But digging in more, it seems like the actual per-folder diff is fairly similar to mbsync: a few messages missing here and there. Considering OfflineIMAP's instability and poor performance, I have not looked any deeper in those discrepancies.

Other projects to evaluate Those are all the options I have considered, in alphabetical order
  • doveadm-sync: requires dovecot on both ends, can tunnel over SSH, may have performance issues in incremental sync, written in C
  • fdm: fetchmail replacement, IMAP/POP3/stdin/Maildir/mbox,NNTP support, SOCKS support (for Tor), complex rules for delivering to specific mailboxes, adding headers, piping to commands, etc. discarded because no (real) support for keeping mail on the server, and written in C
  • getmail: fetchmail replacement, IMAP/POP3 support, supports incremental runs, classification rules, Python
  • interimap: syncs two IMAP servers, apparently faster than doveadm and offlineimap, but requires running an IMAP server locally, Perl
  • isync/mbsync: TLS client certs and SSH tunnels, fast, incremental, IMAP/POP/Maildir support, multiple mailbox, trash and recursion support, and generally has good words from multiple Debian and notmuch people (Arch tutorial), written in C, review above
  • mail-sync: notify support, happens over any piped transport (e.g. ssh), diff/patch system, requires binary on both ends, mentions UUCP in the manpage, mentions rsmtp which is a nice name for rsendmail. not evaluated because it seems awfully complex to setup, Haskell
  • nncp: treat the local spool as another mail server, not really compatible with my "multiple clients" setup, Golang
  • offlineimap3: requires IMAP, used the py2 version in the past, might just still work, first sync painful (IIRC), ways to tunnel over SSH, review above, Python
Most projects were not evaluated due to lack of time.

Conclusion I'm now using mbsync to sync my mail. I'm a little disappointed by the synchronisation times over the slow link, but I guess that's on par for the course if we use IMAP. We are bound by the network speed much more than with custom protocols. I'm also worried about the C implementation and the crashes I have witnessed, but I am encouraged by the fast upstream response. Time will tell if I will stick with that setup. I'm certainly curious about the promises of interimap and mail-sync, but I have ran out of time on this project.

20 November 2021

Jonathan Dowland: hledger footguns

I wrote in budgeting tools that I was taking a look at Plain Text Accounting and in particular, hledger. My Jury's still out on the tools, but in the time I've been looking at them I've come across a couple of foot-guns I thought it was worth writing down. hledger's ledger format is derived from that of its predecessor ledger, and so some of the problems might be inherited. 1. significant white space delimiters The basic syntax for a transaction looks like this
2020-03-15 client payment
    assets:checking         $ 2000
    income:consulting       $-2000
There's some significant white space delimiters in play. The most subtle is what separates the account names from the values: it is two or more spaces. A single space, and the value is treated as part of the account name. For some reason I hit this frequently with trying to encode opening balances: the account name used as the source of the initial balances is something not otherwise generally referred to again (something like equity:opening balances) and the transaction amount is inferred where possible, so I ended up with a bunch of accounts named equity:opening balances 100 and similar. 2. flexible decimal delimiter The value of transactions can be interspersed with commas and periods to make it more readable: e.g. $2000 could be written as $2,000. Different locales have different conventions here: It seems some(/most/all?) of Europe use periods to separate out the units and a comma to delimit the fractional part, whereas the US and the UK do the opposite. There is no built-in association between the currency symbol you are using and the period/comma convention: it's quite possible to accidentally write a number which is interpreted differently to how you intended, and it doesn't matter if you are using $ or etc. 3. new syntax has unexpected results in old versions Finally, my favourite. hledger has a notion of rules that can be used to match transactions when importing from CSV. The format looks like this:
if (match rule)
& (another rule)
account1 some:account:from
account2 some:account:to
By default, multiple rules in sequence like above are OR'd: any of them can match. The & prefix switches the behaviour to AND. But, & is a relatively new addition: it's not supported in 1.18.1, the version in Debian stable, which upstream released in June 2020. In prior versions the & prefix is not a syntax error, or at least, not one that's reported: it's silently ignored; meaning, the line with the & does nothing, and any of the other rules in the set will match. This is easy to miss, and means imports could be incorrectly posted.

Next.