Search Results: "radu"

1 January 2022

Russ Allbery: 2021 Book Reading in Review

In 2021, I finished and reviewed 43 books, yet another (tiny) increase over 2020 and once again the best year for reading since 2012 (which was the last time I averaged 5 books a month). The year got off to a good reading start and closed strong, but once again had sags in the spring and summer when I got behind on reviews and fell out of the habit of reading daily. This year, at least, the end-of-year catch-up was less dramatic; all but two of the books I reviewed in December were finished in December. The best books I read this year were Naomi Novik's magic boarding school fantasies A Deadly Education and The Last Graduate, which I rated a 9 and a 10 respectively. Memorable characters, some great world-building, truly exceptional characterization of a mother/daughter relationship, adroit avoidance of genre pitfalls, and two of my favorite fictional tropes: for me, this series has it all. The third and concluding book of that series is my most anticipated book of 2022. My large reviewing project of this year was a complete re-read of C.S. Lewis's The Chronicles of Narnia, starting with my 1000th published review. As you can see, I have a lot of opinions about those books; they were a huge part of my childhood, and I'd been talking about writing those reviews for years. They were the longest reviews I've published and, unusually for me, full-spoiler reviews, and they took up a lot of my reviewing energy for the year. Of the seven books in the series, I was pleased to see that The Voyage of the Dawn Treader and The Magician's Nephew held up and are still very much worth reading. The Voyage of the Dawn Treader, in particular, is an exceptional sense-of-wonder fantasy novel with a story structure that remains rare. The best non-fiction book I read in 2021 is a prosaic choice that's only of specialist interest, but JavaScript: The Definitive Guide is precisely the type of programming language manual that I look for when learning a new language. 
It taught me what I was hoping to learn when I picked it up. Honorable mentions are a crowded field this year; I read a lot of books that were good but not great. Worth calling out are Arkady Martine's A Desolation Called Peace (sequel to the excellent A Memory Called Empire), if for nothing else than Three Seagrass; Micaiah Johnson's impressive debut The Space Between Worlds; and Becky Chambers's last Wayfarer novel, The Galaxy, and the Ground Within. On the non-fiction side, Allie Brosh's Solutions and Other Problems is a much harder and sadder book than the exceptional Hyperbole and a Half, but it was still very much worth reading. This was another year spent reading mostly recently-published books, without much backfill of either award winners or my existing library. In 2022, I hope to balance keeping up with new books of interest with returning to series I left unfinished, award lists I left only partly explored, and books I snapped up in earlier years and then never got around to. The full analysis includes some additional personal reading statistics, probably only of interest to me.

29 December 2021

Russ Allbery: Review: A Spindle Splintered

Review: A Spindle Splintered, by Alix E. Harrow
Series: Fractured Fables #1
Publisher: Tordotcom
Copyright: 2021
ISBN: 1-250-76536-6
Format: Kindle
Pages: 121
Zinnia Gray lives in rural Ohio and is obsessed with Sleeping Beauty, even though the fairy tale objectively sucks. That has a lot to do with having Generalized Roseville Malady, an always-fatal progressive amyloidosis caused by teratogenic industrial waste. No one with GRM has ever lived to turn twenty-two. A Spindle Splintered opens on Zinnia's twenty-first birthday. For her birthday, her best (and only) friend Charm (Charmaine Baldwin) throws her a party at the tower. There aren't a lot of towers in Ohio; this one is a guard tower at an abandoned state penitentiary occasionally used by the local teenagers, which is not quite the image one would get from fairy tales. But Charm fills it with roses, guests wearing cheap fairy wings, beer, and even an honest-to-god spinning wheel. At the end of the night, Zinnia decides to prick her finger on the spindle on a whim. Much to both of their surprise, that's enough to trigger some form of magic in Zinnia's otherwise entirely mundane world. She doesn't fall asleep for a thousand years, but she does get dumped into an actual fairy-tale tower near an actual princess, just in time to prevent her from pricking her finger. This is, as advertised on the tin, a fractured fairy tale, but it's one that barely introduces the Sleeping Beauty story before driving it entirely off the rails. It's also a fractured fairy tale in which the protagonist knows exactly what sort of story she's in, given that she graduated early from high school and has a college degree in folk studies. (Dying girl rule #1: move fast.) And it's one in which the fairy tale universe still has cell reception, if not chargers, which means you can text your best friend sarcastic commentary on your multiversal travels. Also, cell phone pictures of the impossibly beautiful princess. 
I should mention up-front that I have not watched Spider-Man: Into the Spider-Verse (yes, I know, I'm sure it's wonderful, I just don't watch things, basically ever), which is a quite explicit inspiration for this story. I'm therefore not sure how obvious the story would be to people familiar with that movie. Even with my familiarity with the general genre of fractured fairy tales, nothing plot-wise here was all that surprising. What carries this story is the characters and the emotional core, particularly Zinnia's complex and sardonic feelings about dying and the note-perfect friendship between Zinnia and Charm.
"You know it wasn't originally a spinning wheel in the story?" I offer, because alcohol transforms me into a chatty Wikipedia page.
A Spindle Splintered is told from Zinnia's first-person perspective, and Zinnia is great. My favorite thing about Harrow's writing is the fierce and complex emotions of her characters. The overall tone is lighter than The Once and Future Witches or The Ten Thousand Doors of January, but Harrow doesn't shy away from showing the reader Zinnia's internal thought process about her illness (and her eye-rolling bemusement at some of the earlier emotional stages she went through).
Dying girl rule #3 is no romance, because my entire life is one long trolley problem and I don't want to put any more bodies on the tracks. (I've spent enough time in therapy to know that this isn't "a healthy attitude towards attachment," but I personally feel that accepting my own imminent mortality is enough work without also having a healthy attitude about it.)
There's a content warning for parents here, since Harrow spends some time on the reaction of Zinnia's parents and the complicated dance between hope, despair, smothering, and freedom that she and they had to go through. There were no easy answers and all balances were fragile, but Zinnia always finds her feet. For me, Harrow's character writing is like emotional martial arts: rolling with punches, taking falls, clear-eyed about the setbacks, but always finding a new point of stability to punch back at the world. Zinnia adds just enough teenage irreverence and impatience to blunt the hardest emotional hits. I really enjoy reading it. The one caution I will make about that part of the story is that the focus is on Zinnia's projected lifespan and not on her illness specifically. Harrow uses it as setup to dig into how she and her parents would react to that knowledge (and I thought those parts were done well), but it's told from the perspective of "what would you do if you knew your date of death," not from the perspective of someone living with a disability. It is to some extent disability as plot device, and like the fairy tale that it's based on, it's deeply invested in the "find a cure" approach to the problem. I'm not disabled and am not the person to ask about how well a story handles disability, but I suspect this one may leave something to be desired. I thought the opening of this story is great. Zinnia is a great first-person protagonist and the opening few chapters are overflowing with snark and acerbic commentary. Dumping Zinnia into another world but having text messaging still work is genius, and I kind of wish Harrow had made that even more central to the book. The rest of the story was good but not as good, and the ending was somewhat predictable and a bit of a deus ex machina. But the characters carried it throughout, and I will happily read more of this. Recommended, with the caveat about disability and the content warning for parents. 
Followed by A Mirror Mended, which I have already pre-ordered. Rating: 8 out of 10

7 December 2021

Daniel Lange: Gradual improvements at the Linux Foundation

After last year's blunder with trying to hide the Adobe toolchain and using hilarious stock photos, the Linux Foundation did much better in their 2021 annual report [1] published Dec. 6, 2021. Still they are using the Adobe toolchain (InDesign, Acrobat PDF) and my fellow Debian Kernel [2] Developer Geert was quick to point that out as the first comment to the LWN note on the publication:

[Image: LWN comment from Geert]

I think it is important to call the Linux Foundation (LF) out again and again. Adobe is a Silver member of the LF and they can motivate them to publish their applications for Linux. And if that is not an option, there are Free alternatives like Scribus that could well use the exposure and funds of LF to help catch up to the market leading product, Adobe InDesign.

[Image: Linux Foundation Annual report 2021, document properties]

Personally, as a photographer, I am very happy they used stock images from Unsplash to illustrate the 2021 edition over the cringeworthy Shutterstock footage from last year's report. And they gave proper credit:

[Image: Thank you section for Unsplash from the Linux Foundation 2021 annual report]

Now for next year ... find an editor that knows how to spell photographers, please. And consider Scribus. And make Adobe publish their apps for Linux. Thank you.

  1. Update 07.12.2021 22:00 CET: I had to replace the link to the Linux Foundation 2021 annual report with an archive.org one as they updated the report to fix the typo as per the comment from Melissa Schmidt below. Stable URLs are not a thing, apparently. You can find their new report at https://www.linuxfoundation.org/wp-content/uploads/2021_LF_Annual_Report_120721c.pdf. Unless somebody points out more typos. There is a Last-Modified Header in HTTP 1.1. Wordpress, Varnish and Nginx, serving the LF website, all support that. [Image: Diff of 2021_LF_Annual_Report_120621a and 2021_LF_Annual_Report_120721c]
  2. 08.12.2021: Geert Uytterhoeven wrote in that he is "geert" on LWN, both are very nice Geert's but different Geert's :-)

29 November 2021

Russ Allbery: Fall haul

It's been a while since I've posted one of these, and I also may have had a few moments of deciding to support authors by buying their books even if I'm not going to get a chance to read them soon. There's also a bit of work reading in here.

Ryka Aoki Light from Uncommon Stars (sff)
Frederick R. Chromey To Measure the Sky (non-fiction)
Neil Gaiman, et al. Sandman: Overture (graphic novel)
Alix E. Harrow A Spindle Splintered (sff)
Jordan Ifueko Raybearer (sff)
Jordan Ifueko Redemptor (sff)
T. Kingfisher Paladin's Hope (sff)
TJ Klune Under the Whispering Door (sff)
Kiese Laymon How to Slowly Kill Yourself and Others in America (non-fiction)
Yuna Lee Fox You (romance)
Tim Mak Misfire (non-fiction)
Naomi Novik The Last Graduate (sff)
Shelley Parker-Chan She Who Became the Sun (sff)
Gareth L. Powell Embers of War (sff)
Justin Richer & Antonio Sanso OAuth 2 in Action (non-fiction)
Dean Spade Mutual Aid (non-fiction)
Lana Swartz New Money (non-fiction)
Adam Tooze Shutdown (non-fiction)
Bill Watterson The Essential Calvin and Hobbes (strip collection)
Bill Willingham, et al. Fables: Storybook Love (graphic novel)
David Wong Real-World Cryptography (non-fiction)
Neon Yang The Black Tides of Heaven (sff)
Neon Yang The Red Threads of Fortune (sff)
Neon Yang The Descent of Monsters (sff)
Neon Yang The Ascent to Godhood (sff)
Xiran Jay Zhao Iron Widow (sff)

14 November 2021

Russ Allbery: Review: The Last Graduate

Review: The Last Graduate, by Naomi Novik
Series: The Scholomance #2
Publisher: Del Rey
Copyright: 2021
ISBN: 0-593-12887-7
Format: Kindle
Pages: 388
This is a direct sequel to A Deadly Education, by which I mean it starts in the same minute at which A Deadly Education ends (and let me say how grateful I am for a sequel that doesn't drop days, months, or years between books). You do not want to read this series out of order. This book is also very difficult to review without spoiling either it or the previous book, so please bear with me if I'm elliptical in my ravings. Because The Last Graduate is so good. So good, not only as a piece of writing, but as a combination of two of my favorite tropes in fiction, one of which I can't talk about because of spoilers. I adored this book in a way that is not entirely rational. I will attempt a review below anyway, but if you liked the first book, just stop reading here and go read the second one. It's more of everything I loved in the first book except even better, it did some things I was expecting and some things I didn't expect at all, and it's just so ridiculously good. Just be aware that it has another final-line cliffhanger. The third book is coming in (hopefully) 2022. Novik handles the cliffhanger at the end of the previous book beautifully, which is worth noting because there were so many ways in which it could have gone poorly. One of the best things about this series is Novik's skill at writing El's relationship with her mother, even though her mother has not appeared in the series so far. El argues with her mother's voice in her head, tells stories about her, wonders what her mother would think of her classmates (or in some cases knows exactly what her mother would think of her classmates), and sometimes makes the explicit decision to not be her mother. The relationship has the sort of messy complexity, shared history, and underlying respect that many people experience in life but that I've rarely seen portrayed this well in a fantasy novel. Novik's presentation of that relationship works because El's voice is so strong. 
Within fifteen minutes of starting The Last Graduate, I was already muttering "I love this book" to myself, mostly because of how much I enjoy El's sarcastic, self-deprecating internal commentary. Novik strikes a balance between self-awareness, snark, humor, and real character growth that rivals Murderbot in its effectiveness of first-person perspective. It carries the story over a few weak points, such as a romance that didn't do much for me. Even when I didn't care about part of the plot, I cared about El's opinion of the plot and what it said about El's growing understanding of how to navigate the world. A Deadly Education was scene and character establishment. El insisted on being herself and following her own morals and social rules, and through that found some allies. The Last Graduate gives El enough breathing space to make more nuanced decisions. This is the part of growing up where one realizes the limitations of one's knee-jerk reactions and innate moral judgment. It's also when it becomes hard to trust success that is entirely outside of one's previous experience. El was not a kid who had friends, so she doesn't know what to do with them now that she has them. She's barely able to convince herself that they are friends. This is one of the two fictional tropes I mentioned, the one that I can talk about (at least briefly) without major spoilers. I have such a soft spot for stubborn, sarcastic, principled characters who refuse to play by the social rules that they think are required to make friends and who then find friends who like them for themselves. The moment when they start realizing this has happened and have no idea how to deal with it or how to be a person who has friends is one I will happily read over and over again. I enjoyed this book from the beginning, but there were two points when it grabbed my heart and I was all in. The first one is a huge spoiler that I can't talk about. The second was this paragraph:
[She] came round to me and put her arm around my waist and said under her breath, "Hey, she can be taught," with a tease in her voice that wobbled a little, and when I looked at her, her eyes were bright and wet, and I put my arm around her shoulders and hugged her.
You'll know it when you get there. The Last Graduate also gives the characters other than El and Orion more room, which is part of how it handles the chosen one trope. It's been obvious since early in the first book that Orion is a sort of chosen one, and it becomes obvious to the reader that El may be as well. But Novik doesn't let the plot focus only on them; instead, she uses that trope to look at how alliances and collective action happen, and how no one can carry the weight by themselves. As El learns more and gains power, she also becomes less central to the plot resolution and has to learn how to be less self-reliant. This is not a book where one character is trained to save the world. It's a book where she manages to enlist the support of a kick-ass project manager and becomes part of a team. Middle books of a trilogy are notoriously challenging. Often they're travel books: the first book sets up a problem, the second book moves the characters both physically and emotionally into a position to solve the problem, and the third book is the payoff. Travel books often sag. They can feel obligatory but somewhat boring, like a chore on the way to the third-book climax. The Last Graduate is not a travel book; it is, instead, a pivot book, which is my favorite form of trilogy. It's a book that rewrites the problem the first book set up, both resolving it and expanding the scope beyond what the reader had expected. This is immensely satisfying when done well, and Novik does it extremely well. This is not a flawless book. There are some pacing hiccups, there is a romance angle that didn't work for me (although it does arrive at some character insights that I thought were spot on), and although I think Novik is doing something interesting with the trope, there is a lot of chosen one power escalation happening here. It's not the sort of book that I can claim is perfectly written. 
Instead, it's the sort of book that uses some of my favorite plot elements and emotional beats in such an effective way and with such a memorable character that I do not have it in me to care about any of the flaws. Your mileage may therefore vary, but I would be happy to read books like this until the end of time. As mentioned above, The Last Graduate ends on another cliffhanger. This time I was worried that Novik might have ended the series there, since there's enough of an internal climax that I could imagine some literary fiction (which often seems allergic to endings) would have stopped here. Thankfully, Novik's web site says this is not the case. The next year is going to be a difficult wait. The third book of this series is going to be incredibly difficult to write, and I hope Novik is up to the challenge she's made for herself. But she handled the transition between the first and second book so well, and this book is so good that I have a lot of hope. If the third book is half as good as I'm hoping, this is going to be one of my favorite fantasy series of all time. Followed by an as-yet-untitled third book. Rating: 10 out of 10

22 September 2021

Ian Jackson: Tricky compatibility issue - Rust's io::ErrorKind

This post is about some changes recently made to Rust's ErrorKind, which aims to categorise OS errors in a portable way.

Audiences for this post

Background and context

Error handling principles

Handling different errors differently is often important (although, sadly, often neglected). For example, if a program tries to read its default configuration file, and gets a "file not found" error, it can proceed with its default configuration, knowing that the user hasn't provided a specific config. If it gets some other error, it should probably complain and quit, printing the message from the error (and the filename). Otherwise, if the network fileserver is down (say), the program might erroneously run with the default configuration and do something entirely wrong.

Rust's portability aims

The Rust programming language tries to make it straightforward to write portable code. Portable error handling is always a bit tricky. One of Rust's facilities in this area is std::io::ErrorKind which is an enum which tries to categorise (and, sometimes, enumerate) OS errors. The idea is that a program can check the error kind, and handle the error accordingly. That these ErrorKinds are part of the Rust standard library means that to get this right, you don't need to delve down and get the actual underlying operating system error number, and write separate code for each platform you want to support. You can check whether the error is ErrorKind::NotFound (or whatever). Because ErrorKind is so important in many Rust APIs, some code which isn't really doing an OS call can still have to provide an ErrorKind. For this purpose, Rust provides a special category ErrorKind::Other, which doesn't correspond to any particular OS error.

Rust's stability aims and approach

Another thing Rust tries to do is keep existing code working. More specifically, Rust tries to:
  1. Avoid making changes which would contradict the previously-published documentation of Rust's language and features.
  2. Tell you if you accidentally rely on properties which are not part of the published documentation.
By and large, this has been very successful. It means that if you write code now, and it compiles and runs cleanly, it is quite likely that it will continue to work properly in the future, even as the language and ecosystem evolve.

This blog post is about a case where Rust failed to do (2), above, and, sadly, it turned out that several people had accidentally relied on something the Rust project definitely intended to change. Furthermore, it was something which needed to change. And the new (corrected) way of using the API is not so obvious.

Rust enums, as relevant to io::ErrorKind

(Very briefly:) When you have a value which is an io::ErrorKind, you can compare it with specific values:
    if error.kind() == ErrorKind::NotFound { ... }
  
But in Rust it's more usual to write something like this (which you can read like a switch statement):
    match error.kind() {
      ErrorKind::NotFound => use_default_configuration(),
      _ => panic!("could not read config file {}: {}", &file, &error),
    }
  
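As a self-contained sketch of the same pattern, the configuration-file example from earlier in the post might look like the following. The function name `read_config`, the `DEFAULT_CONFIG` constant and the file path are hypothetical, chosen only for illustration; only `std::fs::read_to_string` and `std::io::ErrorKind` come from the standard library.

```rust
use std::fs;
use std::io::ErrorKind;

/// Hypothetical built-in configuration, used when the user supplied none.
const DEFAULT_CONFIG: &str = "verbose = false\n";

/// Returns the file's contents, or the defaults if (and only if) the file
/// does not exist. Any other error is reported rather than papered over.
fn read_config(path: &str) -> Result<String, String> {
    match fs::read_to_string(path) {
        Ok(text) => Ok(text),
        Err(error) => match error.kind() {
            // "File not found": the user has not provided a config.
            ErrorKind::NotFound => Ok(DEFAULT_CONFIG.to_string()),
            // Anything else (fileserver down, permissions, ...): complain,
            // including the filename and the error message.
            _ => Err(format!("could not read config file {}: {}", path, error)),
        },
    }
}

fn main() {
    match read_config("/nonexistent/example.conf") {
        Ok(config) => print!("{}", config),
        Err(message) => eprintln!("{}", message),
    }
}
```

Returning an error instead of calling panic! is a choice made for this sketch so that the caller decides how to complain and quit; the match on error.kind() is the same as in the fragment above.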
Here _ means "anything else". Rust insists that match statements are exhaustive, meaning that each one covers all the possibilities. So if you left out the line with the _, it wouldn't compile. Rust enums can also be marked non_exhaustive, which is a declaration by the API designer that they plan to add more kinds. This has been done for ErrorKind, so the _ is mandatory, even if you write out all the possibilities that exist right now: this ensures that if new ErrorKinds appear, they won't stop your code compiling.

Improving the error categorisation

The set of error categories stabilised in Rust 1.0 was too small. It missed many important kinds of error. This makes writing error-handling code awkward. In any case, we expect to add new error categories occasionally. I set about trying to improve this by proposing new ErrorKinds. This obviously needed considerable community review, which is why it took about 9 months.

The trouble with Other and tests

Rust has to assign an ErrorKind to every OS error, even ones it doesn't really know about. Until recently, it mapped all errors it didn't understand to ErrorKind::Other - reusing the category for "not an OS error at all". Serious people who write serious code like to have serious tests. In particular, testing error conditions is really important. For example, you might want to test your program's handling of disk full, to make sure it didn't crash, or corrupt files. You would set up some contraption that would simulate a full disk. And then, in your tests, you might check that the error was correct. But until very recently (still now, in Stable Rust), there was no ErrorKind::StorageFull. You would get ErrorKind::Other. If you were diligent you would dig out the OS error code (and check for ENOSPC on Unix, corresponding Windows errors, etc.). But that's tiresome. The more obvious thing to do is to check that the kind is Other. Obvious but wrong.
ErrorKind is non_exhaustive, implying that more error kinds will appear, and, naturally, these would more finely categorise previously-Other OS errors. Unfortunately, the documentation note
Errors that are Other now may move to a different or a new ErrorKind variant in the future.
was only added in May 2020. So the wrongness of the "obvious" approach was, itself, not very obvious. And even with that docs note, there was no compiler warning or anything. The unfortunate result is that there is a body of code out there in the world which might break any time an error that was previously Other becomes properly categorised. Furthermore, there was nothing stopping new people writing new obvious-but-wrong code.

Chosen solution: Uncategorized

The Rust developers wanted an engineered safeguard against the bug of assuming that a particular error shows up as Other. They chose the following solution:

There is now a new ErrorKind::Uncategorized which is now used for all OS errors for which there isn't a more specific categorisation. The fallback translation of unknown errors was changed from Other to Uncategorized.

This is de jure justified by the fact that this enum has always been marked non_exhaustive. But in practice, because this bug wasn't previously detected, there is such code in the wild. That code now breaks (usually, in the form of failing test cases). Usually when Rust starts to detect a particular programming error, it is reported as a new warning, which doesn't break anything. But that's not possible here, because this is a behavioural change.

The new ErrorKind::Uncategorized is marked unstable. This makes it impossible to write code on Stable Rust which insists that an error comes out as Uncategorized. So, one cannot now write code that will break when new ErrorKinds are added. That's the intended effect.

The downside is that this does break old code, and, worse, it is not as clear as it should be what the fixed code looks like.

Alternatives considered and rejected by the Rust developers

Not adding more ErrorKinds

This was not tenable. The existing set is already too small, and error categorisation is in any case expected to improve over time.
Just adding ErrorKinds as had been done before

This would mean occasionally breaking test cases (or, possibly, production code) when an error that was previously Other becomes categorised. The broken code would have been "obvious", but de jure wrong, just as it is now. So this option amounts to expecting this broken code to continue to be written and continuing to break it occasionally.

Somehow using Rust's Edition system

The Rust language has a system to allow language evolution, where code declares its Edition (2015, 2018, 2021). Code from multiple editions can be combined, so that the ecosystem can upgrade gradually. It's not clear how this could be used for ErrorKind, though. Errors have to be passed between code with different editions. If those different editions had different categorisations, the resulting programs would have incoherent and broken error handling. Also some of the schemes for making this change would mean that new ErrorKinds could only be stabilised about once every 3 years, which is far too slow.

How to fix code broken by this change

Most main-line error handling code already has a fallback case for unknown errors. Simply replacing any occurrence of Other with _ is right.

How to fix thorough tests

The tricky problem is tests. Typically, a thorough test case wants to check that the error is "precisely as expected" (as far as the test can tell). Now that unknown errors come out as an unstable Uncategorized variant, that's not so easy. If the test is expecting an error that is currently not categorised, you want to write code that says "if the error is any of the recognised kinds, call it a test failure". What does "any of the recognised kinds" mean here? It doesn't mean any of the kinds recognised by the version of the Rust stdlib that is actually in use. That set might get bigger. When the test is compiled and run later, perhaps years later, the error in this test case might indeed be categorised.
What you actually mean is "the error must not be any of the kinds which existed when the test was written". IMO therefore the right solution for such a test case is to cut and paste the current list of stable ErrorKinds into your code. This will seem wrong at first glance, because the list in your code and in Rust can get out of step. But when they do get out of step you want your version, not the stdlib's. So freezing the list at a point in time is precisely right. You probably only want to maintain one copy of this list, so put it somewhere central in your codebase's test support machinery. Periodically, you can update the list deliberately - and fix any resulting test failures. Unfortunately this approach is not suggested by the documentation. In theory you could work all this out yourself from first principles, given even the situation prior to May 2020, but it seems unlikely that many people have done so. In particular, cutting and pasting the list of recognised errors would seem very unnatural.

Conclusions

This was not an easy problem to solve well. I think Rust has done a plausible job given the various constraints, and the result is technically good. It is a shame that this change to make the error handling stability more correct caused the most trouble for the most careful people who write the most thorough tests. I also think the docs could be improved.
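The frozen-list approach from "How to fix thorough tests" could be sketched roughly as follows. The names `KNOWN_KINDS` and `is_known_kind` are hypothetical, and the list itself is an assumption: it is meant as a snapshot of the stable variants around the time of this post, and should be checked against the documentation of whichever Rust version you freeze against. The use of raw OS error 28 assumes Linux, where it is ENOSPC.

```rust
use std::io::{self, ErrorKind};

/// Frozen snapshot of the `ErrorKind` variants that were stable when the
/// tests were written (assumed list; verify before relying on it).
/// It is deliberately NOT kept in sync with the standard library.
const KNOWN_KINDS: &[ErrorKind] = &[
    ErrorKind::NotFound,
    ErrorKind::PermissionDenied,
    ErrorKind::ConnectionRefused,
    ErrorKind::ConnectionReset,
    ErrorKind::ConnectionAborted,
    ErrorKind::NotConnected,
    ErrorKind::AddrInUse,
    ErrorKind::AddrNotAvailable,
    ErrorKind::BrokenPipe,
    ErrorKind::AlreadyExists,
    ErrorKind::WouldBlock,
    ErrorKind::InvalidInput,
    ErrorKind::InvalidData,
    ErrorKind::TimedOut,
    ErrorKind::WriteZero,
    ErrorKind::Interrupted,
    ErrorKind::UnexpectedEof,
    ErrorKind::Unsupported,
    ErrorKind::OutOfMemory,
    // OS errors no longer fall back to Other, so an error that was
    // uncategorised when the test was written must not be Other either.
    ErrorKind::Other,
];

/// True if `kind` is one of the kinds recognised when the list was frozen.
fn is_known_kind(kind: ErrorKind) -> bool {
    KNOWN_KINDS.contains(&kind)
}

fn main() {
    // A categorised error: a test expecting "uncategorised" should fail.
    let not_found = io::Error::from(ErrorKind::NotFound);
    println!("NotFound known: {}", is_known_kind(not_found.kind()));

    // Disk full (ENOSPC, assuming Linux): not in the frozen list, whether
    // the running stdlib reports it as uncategorised or as a newer kind.
    let disk_full = io::Error::from_raw_os_error(28);
    println!("ENOSPC known: {}", is_known_kind(disk_full.kind()));
}
```

A test for the disk-full case would then assert `!is_known_kind(error.kind())`, and keeps passing if a later Rust gives that error a more specific kind.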
edited shortly after posting, and again 2021-09-22 16:11 UTC, to fix HTML slips



6 September 2021

Jelmer Vernooij: Web Hooks for the Janitor

The Debian Janitor is an automated system that commits fixes for (minor) issues in Debian packages that can be fixed by software. It gradually started proposing merges in early December. The first set of changes sent out ran lintian-brush on sid packages maintained in Git. This post is part of a series about the progress of the Janitor. As covered in my post from last week, the Janitor now regularly tries to import new upstream git snapshots or upstream releases into packages in Sid.

Moving parts

There are about 30,000 packages in sid, and it usually takes a couple of weeks for the janitor to cycle through all of them. Generally speaking, there are up to three moving targets for each package:
  • The packaging repository; vcswatch regularly scans this for changes, and notifies the janitor when a repository has changed. For salsa repositories it is instantly notified through a web hook
  • The upstream release tarballs; the QA watch service regularly polls these, and the janitor scans for changes in the UDD tables with watch data (used for fresh-releases)
  • The upstream repository; there is no service in Debian that watches this at the moment (used for fresh-snapshots)
When the janitor notices that one of these three targets has changed, it prioritizes processing of a package - this means that a push to a packaging repository on salsa usually leads to a build being kicked off within 10 minutes. New upstream releases are usually noticed by QA watch within a day or so and then lead to a build. Now commits in upstream repositories don't get noticed today. Note that there are no guarantees; the scheduler tries to be clever and not e.g. rebuild the same package over and over again if it's constantly changing and takes a long time to build. Packages without priority are processed with a scoring system that takes into account perceived value (based on e.g. popcon), cost (based on wall-time duration of previous builds) and likelihood of success (whether recent builds were successful, and how frequently the repositories involved change).
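To make the trade-off between value, cost and likelihood of success concrete, here is a purely illustrative sketch of what such a score could look like. This is not the Janitor's actual formula, which is not given here: the struct, its field names, the logarithm and the way the three factors are combined are all assumptions made for the example.

```rust
/// Hypothetical per-package inputs; field names are illustrative only.
struct Candidate {
    name: &'static str,
    popcon_installs: f64,     // stand-in for perceived value
    last_build_secs: f64,     // stand-in for cost (wall-time of earlier builds)
    recent_success_rate: f64, // stand-in for likelihood of success, 0.0..=1.0
}

/// Assumed scoring rule: expected value per second of build time.
fn score(c: &Candidate) -> f64 {
    // Dampen popularity so a few very popular packages do not dominate.
    let value = (1.0 + c.popcon_installs).ln();
    // Guard against division by zero for packages with no build history.
    let cost = c.last_build_secs.max(1.0);
    value * c.recent_success_rate / cost
}

fn main() {
    let mut queue = vec![
        Candidate { name: "rare-slow", popcon_installs: 40.0, last_build_secs: 3600.0, recent_success_rate: 0.5 },
        Candidate { name: "popular-quick", popcon_installs: 50000.0, last_build_secs: 120.0, recent_success_rate: 0.9 },
    ];
    // Highest score first: these would be processed earliest.
    queue.sort_by(|a, b| score(b).partial_cmp(&score(a)).unwrap());
    for c in &queue {
        println!("{:<16} {:.4}", c.name, score(c));
    }
}
```

Under these assumptions a popular package that builds quickly and reliably is scheduled ahead of a rarely installed one that is slow and often fails, which matches the intent described above.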
webhooks for upstream repositories At the moment there is no service in Debian (yet - perhaps this is something that vcswatch or a sibling service could also do?) that scans upstream repositories for changes. However, if you maintain an upstream package, you can use a webhook to notify the janitor that commits have been made to your repository, and it will create a new package in fresh-snapshots. Webhooks from the following hosting site software are currently supported: You can simply use the URL https://janitor.debian.net/ as the target for hooks. There is no need to specify a secret, and the hook can either use a JSON or form encoding payload. The endpoint should tell you whether it understood a webhook request, and whether it took any action. It's fine to submit webhooks for repositories that the janitor does not (yet) know about.
GitHub For GitHub, you can do so in the Webhooks section of the Settings tab. Fill the form as shown below and click on Add webhook:
GitLab On GitLab instances, you can find the Webhooks tab under the Settings menu for each repository (under the gear symbol). Fill the form in as shown below and click Add Webhook:
Launchpad For Launchpad, go to the repository (for Git) web view and click Manage Webhooks. From there, you can add a new webhook; fill the form in as shown below and click Add Webhook:

2 September 2021

Ian Jackson: partial-borrow: references to restricted views of a Rust struct

tl;dr:
With these two crazy proc-macros you can hand out multiple (perhaps mutable) references to suitable subsets/views of the same struct. Why In Otter I have adopted a style where I try to avoid giving code mutable access that doesn't need it, and try to make mutable access come with some code structures to prevent "oh I forgot a thing" type mistakes. For example, mutable access to a game state is only available in contexts that have to return a value for the updates to send to the players. This makes it harder to forget to send the update. But there is a downside. The game state is inside another struct, an Instance, and much code needs (immutable) access to it. I can't pass both &Instance and &mut GameState because one is inside the other. My workaround involves passing separate references to the other fields of Instance, leading to some functions taking far too many arguments. 14 in one case. (They're all different types so argument ordering mistakes just result in compiler errors talking about arguments 9 and 11 having wrong types, rather than actual bugs.) I felt this problem was purely a restriction arising from limitations of the borrow checker. I thought it might be possible to improve on it. Weeks passed and the question gradually wormed its way into my consciousness. Eventually, I tried some experiments. Encouraged, I persisted. What and how partial-borrow is a Rust library which solves this problem. You sprinkle #[derive(PartialBorrow)] and partial!(...) and then you can pass a reference which grants mutable access to only some of the fields. You can also pass a reference through which some fields are inaccessible. You can even split a single mut reference into multiple compatible references, for example granting mut access to mutually non-overlapping subsets. The core type is Struct__Partial (for some Struct). It is a zero-sized type, but we prevent anyone from constructing one.
Instead we magic up references to it, always ensuring that they have the same address as some Struct. The fields of Struct__Partial are also ZSTs that exist only as references, and they Deref to the actual field (subject to compile-time borrow compatibility checking). Soundness and testing partial-borrow is primarily a nontrivial procedural macro which autogenerates reams of unsafe. Of course I think it's sound, but I thought that the last two times before I added a test which demonstrated otherwise. So it might be fairer to say that I have tried to make it sound and that I don't know of any problems... Reasoning about the correctness of macro-generated code is not so easy. One problem is that there is nowhere good to put the kind of soundness arguments you would normally add near uses of unsafe. I decided to solve this by annotating an instance of the macro output. There's a not very complicated script using diff3 to help fold in changes if the macro output changes - merge conflicts there mean a possible re-review of the argument text. Of course I also have test cases that run with miri, and test cases for expected compiler errors for uses that need to be forbidden for soundness. But this is quite hairy and I'm worried that it might be rather "my first insane unsafe contraption". Also the pointer/reference trickery is definitely subtle, and depends heavily on knowing what Rust's aliasing and pointer provenance rules really are. Stacked Borrows is not entirely trivial to reason about in fiddly corner cases. So for now I have only called it 0.1.0 and left a note in the docs. I haven't actually made Otter use it yet but that's the rather more boring software integration part, not the fun "can I do this mad thing" part so I will probably leave that for a rainy day. Possibly a rainy day after someone other than me has looked at partial-borrow (preferably someone who understands Stacked Borrows...). Fun! This was great fun. I even enjoyed writing the docs.
The proc-macro programming environment is not entirely straightforward and there are a number of things to watch out for. For my first non-adhoc proc-macro this was, perhaps, ambitious. But you don't learn anything without trying...
edited 2021-09-02 16:28 UTC to fix a typo



31 August 2021

Benjamin Mako Hill: Returning to DebConf

I first started using Debian sometime in the mid 90s and started contributing as a developer and package maintainer more than two decades ago. My very first scholarly publication, collaborative work led by Martin Michlmayr that I did when I was still an undergrad at Hampshire College, was about quality and the reliance on individuals in Debian. To this day, many of my closest friends are people I first met through Debian. I met many of them at Debian's annual conference, DebConf. Given my strong connections to Debian, I find it somewhat surprising that although all of my academic research has focused on peer production, free culture, and free software, I haven't actually published any Debian-related research since that first paper with Martin in 2003! So it felt like coming full circle when, several days ago, I was able to sit in the virtual DebConf audience and watch two of my graduate student advisees, Kaylea Champion and Wm Salt Hale, present their research about Debian at DebConf21. Salt presented his master's thesis work, which tried to understand the social dynamics behind organizational resilience among free software projects. Kaylea presented her work on a new technique she developed to identify at-risk software packages that are lower quality than we might hope given their popularity (you can read more about Kaylea's project in our blog post from earlier this year). If you missed either presentation, check out the blog post my research collective put up or watch the videos below. If you want to hear about new work we're doing, including work on Debian, you should follow our research group blog, and/or follow or engage with us in the Fediverse (@communitydata@social.coop), or on Twitter (@comdatasci). And if you're interested in joining us, perhaps to do more research on FLOSS and/or Debian and/or a graduate degree of your own, please be in touch with me directly!
Wm Salt Hale's presentation plus Q&A. (WebM available)
Kaylea Champion's presentation plus Q&A. (WebM available)

25 August 2021

Jelmer Vernooij: Thousands of Debian packages updated from their upstream Git repository

The Debian Janitor is an automated system that commits fixes for (minor) issues in Debian packages that can be fixed by software. It gradually started proposing merges in early December. The first set of changes sent out ran lintian-brush on sid packages maintained in Git. This post is part of a series about the progress of the Janitor. Linux distributions like Debian fulfill an important function in the FOSS ecosystem - they are system integrators that take existing free and open source software projects and adapt them where necessary to work well together. They also make it possible for users to install more software in an easy and consistent way and with some degree of quality control and review. One of the consequences of this model is that the distribution package often lags behind upstream releases. This is especially true for distributions that have tighter integration and standardization (such as Debian), and often new upstream code is only imported irregularly because it is a manual process - both updating the package, but also making sure that it still works together well with the rest of the system. The process of importing a new upstream used to be (well, back when I started working on Debian packages) fairly manual and something like this:

Ecosystem Improvements However, there have been developments over the last decade that make it easier to import new upstream releases into Debian packages.
Uscan and debian QA watch Uscan and debian/watch have been around for a while and make it possible to find upstream tarballs. A debian watch file usually looks something like this:
version=4
http://somesite.com/dir/filenamewithversion.tar.gz
The QA watch service regularly polls all watch locations in the archive and makes the information available, so it's possible to know which packages have changed without downloading each one of them.
Git Git is fairly ubiquitous nowadays, and most upstream projects and packages in Debian use it. There are still exceptions that do not use any version control system or that use a different control system, but they are becoming increasingly rare. [1]
debian/upstream/metadata DEP-12 specifies a file format with metadata about the upstream project that a package was based on. In particular relevant for our case is the fact it has fields for the location of the upstream version control location. debian/upstream/metadata files look something like this:
---
Repository: https://www.dulwich.io/code/dulwich/
Repository-Browse: https://www.dulwich.io/code/dulwich/
While DEP-12 is still a draft, it has already been widely adopted - there are about 10000 packages in Debian that ship a debian/upstream/metadata file with Repository information.
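To show how a tool might consume such a file, here is a minimal Python sketch that pulls the Repository field out of the example above. The function name is made up for illustration, and the line-based parsing is an assumption that only holds for flat "Key: value" files like this one; real debian/upstream/metadata files are YAML and are better read with a proper YAML parser.

```python
def read_upstream_metadata(text: str) -> dict:
    """Parse flat 'Key: value' lines; nested YAML is deliberately not handled."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the YAML document marker and comments.
        if not line or line == "---" or line.startswith("#"):
            continue
        # Split on the first colon only, so URLs in the value stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


sample = """---
Repository: https://www.dulwich.io/code/dulwich/
Repository-Browse: https://www.dulwich.io/code/dulwich/
"""
metadata = read_upstream_metadata(sample)
repository = metadata.get("Repository")
```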
Autopkgtest The Autopkgtest standard and associated tooling provide a way to run a defined set of tests against an installed package. This makes it possible to verify that a package is working correctly as part of the system as a whole. ci.debian.net regularly runs these tests against Debian packages to detect regressions.
Vcs-Git headers The Vcs-Git headers in debian/control are the equivalent of the Repository field in debian/upstream/metadata, but for the packaging repositories (as opposed to the upstream ones). They've been around for a while and are widely adopted, as can be seen from zack's stats: The vcswatch service that regularly polls packaging repositories to see whether they have changed makes it a lot easier to consume this information in a usable way.
Debhelper adoption Over the last couple of years, Debian has slowly been converging on a single build tool - debhelper's dh interface. Being able to rely on a single build tool makes it easier to write code to update packaging when upstream changes require it.
Debhelper DWIM Debhelper (and its helpers) increasingly can figure out how to do the Right Thing in many cases without being explicitly configured. This makes packaging less effort, but also means that it's less likely that importing a new upstream version will require updates to the packaging. With all of these improvements in place, it actually becomes feasible in a lot of situations to update a Debian package to a new upstream version automatically. Of course, this requires that all of this information is available, so it won't work for all packages. In some cases, the packaging for the older upstream version might not apply to the newer upstream version. The Janitor has attempted to import a new upstream Git snapshot and a new upstream release for every package in the archive where a debian/watch file or debian/upstream/metadata file is present. These are the steps it uses:
  • Find new upstream version
    • If release, use debian/watch - or maybe tagged in upstream repository
    • If snapshot, use debian/upstream/metadata's Repository field
    • If neither is available, use guess-upstream-metadata from upstream-ontologist to guess the upstream Repository
  • Merge upstream version into packaging repository, possibly importing tarballs using pristine-tar
  • Update the changelog file to mention the new upstream version
  • Run some checks to ensure there are no unintentional changes, e.g.:
    • Scan diff between old and new for surprising license changes
      • Today, abort if there are any - in the future, maybe update debian/copyright
    • Check for obvious compatibility breaks - e.g. sonames changing
  • Attempt to update the packaging to reflect upstream changes
    • Refresh patches
  • Attempt to build the package with deb-fix-build, to deal with any missing dependencies
  • Run the autopkgtests with deb-fix-build to deal with missing dependencies, and abort if any tests fail
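The first step in the list above, deciding where to look for a new upstream version, can be sketched in Python roughly as follows. This is only one reading of the bullets: the function name, its parameters and the returned labels are hypothetical, and the Janitor's real logic is certainly more involved.

```python
from typing import Optional, Tuple


def upstream_source(
    campaign: str,
    has_watch_file: bool,
    metadata_repository: Optional[str],
    guessed_repository: Optional[str] = None,
) -> Optional[Tuple[str, str]]:
    """Pick a place to look for new upstream code, or None if there is none."""
    # Releases prefer debian/watch when the package ships one.
    if campaign == "fresh-releases" and has_watch_file:
        return ("watch", "debian/watch")
    # Snapshots (and releases without a watch file, via tags) use the
    # Repository field from debian/upstream/metadata.
    if metadata_repository:
        return ("repository", metadata_repository)
    # Last resort: a guess, e.g. from guess-upstream-metadata.
    if guessed_repository:
        return ("guessed-repository", guessed_repository)
    return None
```

Returning None rather than raising keeps the "nothing to do" case cheap, since the post notes that the required information is not available for every package.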
Results When run over all packages in unstable (sid), this process works for a surprising number of them.
Fresh Releases For fresh-releases (aka imports of upstream releases), processing all packages maintained in Git for which QA watch reports new releases (about 11,000): That means about 2300 packages updated, and about 4000 unchanged.
Fresh Snapshots For fresh-snapshots (aka imports of latest Git commit from upstream), processing all packages maintained in Git (about 26,000): Or 5100 packages updated and 2100 for which there was nothing to do, i.e. no upstream commits since the last Debian upload. As can be seen, this works for a surprising fraction of packages. It's possible to get the numbers up even higher by improving the tooling, the autopkgtests and the metadata that is provided by packages.
Using these packages All the packages that have been built can be accessed from the Janitor APT repository. More information can be found at https://janitor.debian.net/fresh, but in short - run:
echo deb "[arch=amd64 signed-by=/usr/share/keyrings/debian-janitor-archive-keyring.gpg]" \
    https://janitor.debian.net/ fresh-snapshots main | sudo tee /etc/apt/sources.list.d/fresh-snapshots.list
echo deb "[arch=amd64 signed-by=/usr/share/keyrings/debian-janitor-archive-keyring.gpg]" \
    https://janitor.debian.net/ fresh-releases main | sudo tee /etc/apt/sources.list.d/fresh-releases.list
sudo curl -o /usr/share/keyrings/debian-janitor-archive-keyring.gpg https://janitor.debian.net/pgp_keys
apt update
And then you can install packages from the fresh-snapshots (upstream git snapshots) or fresh-releases suites on a case-by-case basis by running something like:
apt install -t fresh-snapshots r-cran-roxygen2
Most packages are updated based on information provided by vcswatch and qa watch, but it's also possible for upstream repositories to call a web hook to trigger a refresh of a package. These packages were built against unstable, but should in almost all cases also work for testing.
Caveats Of course, since these packages are built automatically without human supervision it's likely that some of them will have bugs in them that would otherwise have been caught by the maintainer.
[1] I'm not saying that a monoculture is great here, but it does help distributions.

17 August 2021

Ian Jackson: Releasing nailing-cargo 1.0.0

Summary I have just tagged nailing-cargo/1.0.0. nailing-cargo is a wrapper around the Rust build tool cargo. nailing-cargo can: Background and history It's not really possible to make a nontrivial Rust project without using cargo. But the build process automatically downloads and executes code from crates.io, which is a minimally-curated repository. I didn't want to expose my main account to that. And, at the time, I was working on a project for which I was also writing a library as a dependency, and I found that cargo couldn't cope with this unless I were to commit (to my git repository) the path (on my local laptop) of my dependency. I filed some bugs, including about the unpublished crate problem. But also, I was stubborn enough to try to find a workaround that didn't involve committing junk to my git history. The result was a short but horrific shell script. I wrote about this at the time (March 2019). Over the last few years the difficulties I have with cargo have remained unresolved. I found my interactions with upstream rather discouraging. It didn't seem like I would get anywhere by trying to help improve cargo to better support my needs. So instead I have gradually improved nailing-cargo. It is now a Perl script. It is rather less horrific, and has proper documentation (sorry, JS needed because GitLab). Why Perl? Rust would have been my language of choice. But I wanted to avoid a chicken-and-egg situation. When you're doing privsep, nailing-cargo has to run in your more privileged environment. I wanted something easy to get going with. nailing-cargo has to contain a TOML parser; and I found a small one, TOML-Tiny, which was good enough as a starting point, and small enough I could bundle it as a git subtree. Perl is nicely fast to start up (nailing-cargo --- true runs in about 170ms on my laptop), and it is easy to write a Perl script that will work on pretty much any Perl installation.
Still unsolved: embedding cargo in another build system A number of my projects contain a mixture of Rust code with other languages. Unfortunately, nailing-cargo doesn't help with the problems which arise trying to integrate cargo into another build system. I generally resort to find runes for finding Rust source files that might influence cargo, and stamp files for seeing if I have run it recently enough; and I simply live with the fact that cargo sometimes builds more stuff than I needed it to. Future There are a number of ways nailing-cargo could be improved. Notably, the need to overwrite your actual Cargo.toml is very annoying, even if nailing-cargo puts it back afterwards. A big problem with this is that it means that nailing-cargo has to take a lock, while your cargo rune runs. This effectively prevents using nailing-cargo with long-running processes. Notably, editor integrations like rls and racer. I could perhaps solve this with more linkfarm-juggling, but that wouldn't help in-tree builds and it's hard to keep things up to date. I am considering using LD_PRELOAD trickery or maybe bwrap(1) to "implement" the alternative Cargo.toml feature which was rejected by cargo upstream in 2019 (and again in April when someone else asked). Currently there is no support for using sudo for out-of-tree privsep. This should be easy to add but it needs someone who uses sudo to want it (and to test it!) The documentation has some other discussion of limitations, some of which aren't too hard to improve. Patches welcome!


14 August 2021

Andrew Cater: Still chasing through release testing Debian media for Bullseye release 202108141655

Lots of people - lots of effort - we're gradually closing in on the last few tests. It's been quite a long time but we're significantly ahead of where we would be on many tests for release candidates and main releases. It's always fun to do and chat back and forth. Having new testers check in from tomorrow (Australia) has also been a novelty. It's been a very long wait for this but "This is the best Debian release ever", as they say.

2 August 2021

Colin Watson: Launchpad now runs on Python 3!

After a very long porting journey, Launchpad is finally running on Python 3 across all of our systems. I wanted to take a bit of time to reflect on why my emotional responses to this port differ so much from those of some others who've done large ports, such as the Mercurial maintainers. It's hard to deny that we've had to burn a lot of time on this, which I'm sure has had an opportunity cost, and from one point of view it's essentially running to stand still: there is no single compelling feature that we get solely by porting to Python 3, although it's clearly a prerequisite for tidying up old compatibility code and being able to use modern language facilities in the future. And yet, on the whole, I found this a rewarding project and enjoyed doing it. Some of this may be because by inclination I'm a maintenance programmer and actually enjoy this sort of thing. My default view tends to be that software version upgrades may be a pain but it's much better to get that pain over with as soon as you can rather than trying to hold back the tide; you can certainly get involved and try to shape where things end up, but rightly or wrongly I can't think of many cases when a righteously indignant user base managed to arrange for the old version to be maintained in perpetuity so that they never had to deal with the new thing (OK, maybe Perl 5 counts here). I think a more compelling difference between Launchpad and Mercurial, though, may be that very few other people really had a vested interest in what Python version Launchpad happened to be running, because it's all server-side code (aside from some client libraries such as launchpadlib, which were ported years ago). As such, we weren't trying to do this with the internet having Strong Opinions at us.
We were doing this because it was obviously the only long-term-maintainable path forward, and in more recent times because some of our library dependencies were starting to drop support for Python 2 and so it was obviously going to become a practical problem for us sooner or later; but if we'd just stayed on Python 2 forever then fundamentally hardly anyone else would really have cared directly, only maybe about some indirect consequences of that. I don't follow Mercurial development so I may be entirely off-base, but if other people were yelling at me about how late my project was to finish its port, that in itself would make me feel more negatively about the project even if I thought it was a good idea. Having most of the pressure come from ourselves rather than from outside meant that wasn't an issue for us. I'm somewhat inclined to think of the process as an extreme version of paying down technical debt. Moving from Python 2.7 to 3.5, as we just did, means skipping over multiple language versions in one go, and if similar changes had been made more gradually it would probably have felt a lot more like the typical dependency update treadmill. I appreciate why not everyone might want to think of it this way: maybe this is just my own rationalization. Reflections on porting to Python 3 I'm not going to defend the Python 3 migration process; it was pretty rough in a lot of ways. Nor am I going to spend much effort relitigating it here, as it's already been done to death elsewhere, and as I understand it the core Python developers have got the message loud and clear by now. At a bare minimum, a lot of valuable time was lost early in Python 3's lifetime hanging on to flag-day-type porting strategies that were impractical for large projects, when it should have been providing for bilingual strategies (code that runs in both Python 2 and 3 for a transitional period) which is where most libraries and most large migrations ended up in practice.
For instance, the early advice to library maintainers to maintain two parallel versions or perhaps translate dynamically with 2to3 was entirely impractical in most non-trivial cases and wasn't what most people ended up doing, and yet the idea that 2to3 is all you need still floats around Stack Overflow and the like as a result. (These days, I would probably point people towards something more like Eevee's porting FAQ as somewhere to start.) There are various fairly straightforward things that people often suggest could have been done to smooth the path, and I largely agree: not removing the u'' string prefix only to put it back in 3.3, fewer gratuitous compatibility breaks in the name of tidiness, and so on. But if I had a time machine, the number one thing I would ask to have been done differently would be introducing type annotations in Python 2 before Python 3 branched off. It's true that it's technically possible to do type annotations in Python 2, but the fact that it's a different syntax that would have to be fixed later is offputting, and in practice it wasn't widely used in Python 2 code. To make a significant difference to the ease of porting, annotations would need to have been introduced early enough that lots of Python 2 library code used them so that porting code didn't have to be quite so much of an exercise of manually figuring out the exact nature of string types from context. Launchpad is a complex piece of software that interacts with multiple domains: for example, it deals with a database, HTTP, web page rendering, Debian-format archive publishing, and multiple revision control systems, and there's often overlap between domains. Each of these tends to imply different kinds of string handling.
Web page rendering is normally done mainly in Unicode, converting to bytes as late as possible; revision control systems normally want to spend most of their time working with bytes, although the exact details vary; HTTP is of course bytes on the wire, but Python's WSGI interface has some string type subtleties. In practice I found myself thinking about at least four string-like types (that is, things that in a language with a stricter type system I might well want to define as distinct types and restrict conversion between them): bytes, text, ordinary native strings (str in either language, encoded to UTF-8 in Python 2), and native strings with WSGI's encoding rules. Some of these are emergent properties of writing in the intersection of Python 2 and 3, which is effectively a specialized language of its own without coherent official documentation whose users must intuit its behaviour by comparing multiple sources of information, or by referring to unofficial porting guides: not a very satisfactory situation. Fortunately much of the complexity collapses once it becomes possible to write solely in Python 3. Some of the difficulties we ran into are not ones that are typically thought of as Python 2-to-3 porting issues, because they were changed later in Python 3's development process. For instance, the email module was substantially improved in around the 3.2/3.3 timeframe to handle Python 3's bytes/text model more correctly, and since Launchpad sends quite a few different kinds of email messages and has some quite picky tests for exactly what it emits, this entailed a lot of work in our email sending code and in our test suite to account for that. (It took me a while to work out whether we should be treating raw email messages as bytes or as text; bytes turned out to work best.)
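To make the last of the four string-like types above more concrete: under PEP 3333, WSGI strings on Python 3 are native str objects whose code points stand for raw bytes via ISO-8859-1. The two helper names below are made up for this sketch and are not taken from Launchpad or zope.publisher; they assume the underlying bytes are UTF-8.

```python
def text_to_wsgi(text: str) -> str:
    """Turn real text into a WSGI-style native string (Python 3 only)."""
    # Encode to the bytes that go on the wire, then map each byte to the
    # code point with the same number, as WSGI expects.
    return text.encode("utf-8").decode("iso-8859-1")


def wsgi_to_text(value: str) -> str:
    """Recover real text from a WSGI-style native string (Python 3 only)."""
    return value.encode("iso-8859-1").decode("utf-8")


path = "/caf\u00e9"                # ordinary text
wsgi_path = text_to_wsgi(path)     # what a PATH_INFO-like value would hold
```

The two values look alike for ASCII input and only diverge for non-ASCII text, which is part of why this distinction is easy to miss until a test with such input fails.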
3.4 made some tweaks to the implementation of quoted-printable encoding that broke a number of our tests in ways that took some effort to fix, because the tests needed to work on both 2.7 and 3.5. The list goes on. I got quite proficient at digging through Python's git history to figure out when and why some particular bit of behaviour had changed. One of the thorniest problems was parsing HTTP form data. We mainly rely on zope.publisher for this, which in turn relied on cgi.FieldStorage; but cgi.FieldStorage is badly broken in some situations on Python 3. Even if that bug were fixed in a more recent version of Python, we can't easily use anything newer than 3.5 for the first stage of our port due to the version of the base OS we're currently running, so it wouldn't help much. In the end I fixed some minor issues in the multipart module (and was kindly given co-maintenance of it) and converted zope.publisher to use it. Although this took a while to sort out, it seems to have gone very well. A couple of other interesting late-arriving issues were around pickle. For most things we normally prefer safer formats such as JSON, but there are a few cases where we use pickle, particularly for our session databases. One of my colleagues pointed out that I needed to remember to tell pickle to stick to protocol 2, so that we'd be able to switch back and forward between Python 2 and 3 for a while; quite right, and we later ran into a similar problem with marshal too. A more surprising problem was that datetime.datetime objects pickled on Python 2 require special care when unpickling on Python 3; rather than the approach that ended up being implemented and documented for Python 3.6, though, I preferred a custom unpickler, both so that things would work on Python 3.5 and so that I wouldn't have to risk affecting the decoding of other pickled strings in the session database.
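The protocol point can be shown with a short Python sketch. Protocol 2 is the newest pickle protocol that Python 2 can read, so pinning it keeps data written by Python 3 loadable on Python 2. The function names and the sample session data are hypothetical, and the custom unpickler for datetime objects mentioned above is not reproduced here.

```python
import pickle

# Highest pickle protocol understood by Python 2.
SHARED_PROTOCOL = 2


def dump_session(data) -> bytes:
    """Serialize session data so that both Python 2 and 3 can load it."""
    return pickle.dumps(data, protocol=SHARED_PROTOCOL)


def load_session(blob: bytes):
    """Load session data; only use this on data from a trusted store."""
    return pickle.loads(blob)


blob = dump_session({"user": "alice", "visits": 3})
```

Without the explicit protocol argument, Python 3 defaults to a newer protocol that Python 2 cannot read, which would break switching back during the transition.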
General lessons Writing this over a year after Python 2's end-of-life date, and certainly nowhere near the leading edge of Python 3 porting work, it's perhaps more useful to look at this in terms of the lessons it has for other large technical debt projects. I mentioned in my previous article that I used the approach of an enormous and frequently-rebased git branch as a working area for the port, committing often and sometimes combining and extracting commits for review once they seemed to be ready. A port of this scale would have been entirely intractable without a tool of similar power to git rebase, so I'm very glad that we finished migrating to git in 2019. I relied on this right up to the end of the port, and it also allowed for quick assessments of how much more there was to land. git worktree was also helpful, in that I could easily maintain working trees built for each of Python 2 and 3 for comparison. As is usual for most multi-developer projects, all changes to Launchpad need to go through code review, although we sometimes make exceptions for very simple and obvious changes that can be self-reviewed. Since I knew from the outset that this was going to generate a lot of changes for review, I structured my work to try to make it as easy as possible for my colleagues to review it. This generally involved keeping most changes to a somewhat manageable size of 800 lines or less (although this wasn't always possible), and arranging commits mainly according to the kind of change they made rather than their location. For example, when I needed to fix issues with / in Python 3 being true division rather than floor division, I did so in one commit across the various places where it mattered and took care not to mix it with other unrelated changes.
This is good practice for nearly any kind of development, but it was especially important here since it allowed reviewers to consider a clear explanation of what I was doing in the commit message and then skim-read the rest of it much more quickly. It was vital to keep the codebase in a working state at all times, and deploy to production reasonably often: this way if something went wrong the amount of code we had to debug to figure out what had happened was always tractable. (Although I can't seem to find it now to link to it, I saw an account a while back of a company that had taken a flag-day approach instead with a large codebase. It seemed to work for them, but I'm certain we couldn't have made it work for Launchpad.) I can't speak too highly of Launchpad's test suite, much of which originated before my time. Without a great deal of extensive coverage of all sorts of interesting edge cases at both the unit and functional level, and a corresponding culture of maintaining that test suite well when making new changes, it would have been impossible to be anything like as confident of the port as we were. As part of the porting work, we split out a couple of substantial chunks of the Launchpad codebase that could easily be decoupled from the core: its Mailman integration and its code import worker. Both of these had substantial dependencies with complex requirements for porting to Python 3, and arranging to be able to do these separately on their own schedule was absolutely worth it. Like disentangling balls of wool, any opportunity you can take to make things less tightly-coupled is probably going to make it easier to disentangle the rest. (I can see a tractable way forward to porting the code import worker, so we may well get that done soon. Our Mailman integration will need to be rewritten, though, since it currently depends on the Python-2-only Mailman 2, and Mailman 3 has a different architecture.)

Python lessons
Our database layer was already in pretty good shape for a port, since at least the modern bits of its table modelling interface were already strict about using Unicode for text columns. If you have any kind of pervasive low-level framework like this, then making it be pedantic at you in advance of a Python 3 port will probably incur much less swearing in the long run, as you won't be trying to deal with quite so many bytes/text issues at the same time as everything else.

Early in our port, we established a standard set of __future__ imports and started incrementally converting files over to them, mainly because we weren't yet sure what else to do and it seemed likely to be helpful. absolute_import was definitely reasonable (and not often a problem in our code), and print_function was annoying but necessary. In hindsight I'm not sure about unicode_literals, though. For files that only deal with bytes and text it was reasonable enough, but as I mentioned above there were also a number of cases where we needed literals of the language's native str type, i.e. bytes in Python 2 and text in Python 3: this was particularly noticeable in WSGI contexts, but also cropped up in some other surprising places. We generally either omitted unicode_literals or used six.ensure_str in such cases, but it was definitely a bit awkward and maybe I should have listened more to people telling me it might be a bad idea.

A lot of Launchpad's early tests used doctest, mainly in the style where you have text files that interleave narrative commentary with examples. The development team later reached consensus that this was best avoided in most cases, but by then there were far too many doctests to conveniently rewrite in some other form. Porting doctests to Python 3 is really annoying. You run into all the little changes in how objects are represented as text (particularly u'...'
versus '...', but plenty of other cases as well); you have next to no tools to do anything useful like skipping individual bits of a doctest that don't apply; using __future__ imports requires the rather obscure approach of adding the relevant names to the doctest's globals in the relevant DocFileSuite or DocTestSuite; dealing with many exception tracebacks requires something like zope.testing.renormalizing; and whatever code refactoring tools you're using probably don't work properly. Basically, don't have done that. It did all turn out to be tractable for us in the end, and I managed to avoid using much in the way of fragile doctest extensions aside from the aforementioned zope.testing.renormalizing, but it was not an enjoyable experience.

Regressions
I know of nine regressions that reached Launchpad's production systems as a result of this porting work; of course there were various other regressions caught by CI or in manual testing. (Considering the size of this project, I count it as a resounding success that there were only nine production issues, and that for the most part we were able to fix them quickly.)

Equality testing of removed database objects
One of the things we had to do while porting to Python 3 was to implement the __eq__, __ne__, and __hash__ special methods for all our database objects. This was quite conceptually fiddly, because doing this requires knowing each object's primary key, and that may not yet be available if we've created an object in Python but not yet flushed the actual INSERT statement to the database (most of our primary keys are auto-incrementing sequences). We thus had to take care to flush pending SQL statements in such cases in order to ensure that we know the primary keys. However, it's possible to have a problem at the other end of the object lifecycle: that is, a Python object might still be reachable in memory even though the underlying row has been DELETEd from the database.
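The flush-before-compare idea can be sketched as follows. This is a self-contained illustration only: `FakeStore` and `Record` are hypothetical stand-ins, not Launchpad's or Storm's actual classes, and the sketch does not attempt to handle the deleted-row case.

```python
class FakeStore:
    """Hypothetical stand-in for an ORM store that assigns keys on flush."""

    def __init__(self):
        self._next_id = 1
        self._pending = []

    def add(self, obj):
        self._pending.append(obj)

    def flush(self):
        # Pretend to send the pending INSERTs, which assigns primary keys.
        for obj in self._pending:
            obj.id = self._next_id
            self._next_id += 1
        self._pending = []


class Record:
    def __init__(self, store):
        self.store = store
        self.id = None  # unknown until the INSERT has been flushed
        store.add(self)

    def _key(self):
        if self.id is None:
            # Equality and hashing need the primary key, so flush first.
            self.store.flush()
        return (type(self), self.id)

    def __eq__(self, other):
        if not isinstance(other, Record):
            return NotImplemented
        return self._key() == other._key()

    def __ne__(self, other):
        # Python 2 does not derive __ne__ from __eq__, so define it explicitly.
        result = self.__eq__(other)
        return result if result is NotImplemented else not result

    def __hash__(self):
        # Defining __eq__ on Python 3 removes the default __hash__.
        return hash(self._key())
```

Comparing two freshly created `Record` objects triggers the flush, after which they compare unequal because their keys differ.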
In most cases we don't keep removed objects around for obvious reasons, but it can happen in caching code, and buildd-manager crashed as a result (in fact while it was still running on Python 2). We had to take extra care to avoid this problem.

Debian imports crashed on non-UTF-8 filenames
Python 2 has some unfortunate behaviour around passing bytes or Unicode strings (depending on the platform) to shutil.rmtree, and the combination of some porting work and a particular source package in Debian that contained a non-UTF-8 file name caused us to run into this. The fix was to ensure that the argument passed to shutil.rmtree is a str regardless of Python version. We'd actually run into something similar before: it's a subtle porting gotcha, since it's quite easy to end up passing Unicode strings to shutil.rmtree if you're in the process of porting your code to Python 3, and you might easily not notice if the file names in your tests are all encoded using UTF-8.

lazr.restful ETags
We eventually got far enough along that we could switch one of our four appserver machines (we have quite a number of other machines too, but the appservers handle web and API requests) to Python 3 and see what happened. By this point our extensive test suite had shaken out the vast majority of the things that could go wrong, but there was always going to be room for some interesting edge cases. One of the Ubuntu kernel team reported that they were seeing an increase in 412 Precondition Failed errors in some of their scripts that use our webservice API. These can happen when you're trying to modify an existing resource: the underlying protocol involves sending an If-Match header with the ETag that the client thinks the resource has, and if this doesn't match the ETag that the server calculates for the resource then the client has to refresh its copy of the resource and try again.
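A minimal sketch of that conditional-write check, using hypothetical helper names rather than lazr.restful's actual code. It assumes the ETag is a hash over a serialized form of the resource, so the check only behaves sensibly if every server produces byte-identical serializations for an unchanged resource.

```python
import hashlib
import json


def compute_etag(representation):
    # Serialize to a form that does not depend on the Python version:
    # sorted keys, fixed separators, UTF-8 bytes.
    canonical = json.dumps(
        representation, sort_keys=True, separators=(",", ":"))
    return '"%s"' % hashlib.sha1(canonical.encode("utf-8")).hexdigest()


def check_if_match(if_match_header, representation):
    """Return the HTTP status a server might send for a conditional write."""
    if (if_match_header is not None
            and if_match_header != compute_etag(representation)):
        return 412  # Precondition Failed: the client must refetch and retry
    return 200
```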
We initially thought that this might be legitimate since it can happen in normal operation if you collide with another client making changes to the same resource, but it soon became clear that something stranger was going on: we were getting inconsistent ETags for the same object even when it was unchanged. Since we'd recently switched a quarter of our appservers to Python 3, that was a natural suspect.

Our lazr.restful package provides the framework for our webservice API, and roughly speaking it generates ETags by serializing objects into some kind of canonical form and hashing the result. Unfortunately the serialization was dependent on the Python version in a few ways, and in particular it serialized lists of strings such as lists of bug tags differently: Python 2 used [u'foo', u'bar', u'baz'] where Python 3 used ['foo', 'bar', 'baz']. In lazr.restful 1.0.3 we switched to using JSON for this, removing the Python version dependency and ensuring consistent behaviour between appservers.

Memory leaks
This problem took the longest to solve. We noticed fairly quickly from our graphs that the appserver machine we'd switched to Python 3 had a serious memory leak. Our appservers had always been a bit leaky, but now it wasn't so much "a small hole that we can bail occasionally" as "the boat is sinking rapidly":

[Graph: A serious memory leak]

(Yes, this got in the way of working out what was going on with ETags for a while.)

I spent ages messing around with various attempts to fix this. Since only a quarter of our appservers were affected, and we could get by on 75% capacity for a while, it wasn't urgent but it was definitely annoying. After spending some quality time with objgraph, for some time I thought traceback reference cycles might be at fault, and I sent a number of fixes to various upstream projects for those (e.g. zope.pagetemplate).
Those didn't help the leaks much though, and after a while it became clear to me that this couldn't be the sole problem: Python has a cyclic garbage collector that will eventually collect reference cycles as long as there are no strong references to any objects in them, although it might not happen very quickly. Something else must be going on.

Debugging reference leaks in any non-trivial and long-running Python program is extremely arduous, especially with ORMs that naturally tend to end up with lots of cycles and caches. After a while I formed a hypothesis that zope.server might be keeping a strong reference to something, although I never managed to nail it down more firmly than that. This was an attractive theory as we were already in the process of migrating to Gunicorn for other reasons anyway, and Gunicorn also has a convenient max_requests setting that's good at mitigating memory leaks. Getting this all in place took some time, but once we did we found that everything was much more stable:

[Graph: A rather flat memory graph]

This isn't completely satisfying as we never quite got to the bottom of the leak itself, and it's entirely possible that we've only papered over it using max_requests: I expect we'll gradually back off on how frequently we restart workers over time to try to track this down. However, pragmatically, it's no longer an operational concern.

Mirror prober HTTPS proxy handling
After we switched our script servers to Python 3, we had several reports of mirror probing failures. (Launchpad keeps lists of Ubuntu archive and image mirrors, and probes them every so often to check that they're reasonably complete and up to date.) This only affected HTTPS mirrors when probed via a proxy server, support for which is a relatively recent feature in Launchpad and involved some code that we never managed to unit-test properly: of course this is exactly the code that went wrong. Sadly I wasn't able to sort out that gap, but at least the fix was simple.

Non-MIME-encoded email headers
As I mentioned above, there were substantial changes in the email package between Python 2 and 3, and indeed between minor versions of Python 3. Our test coverage here is pretty good, but it's an area where it's very easy to have gaps. We noticed that a script that processes incoming email was crashing on messages with headers that were non-ASCII but not MIME-encoded (and indeed then crashing again when it tried to send a notification of the crash!). The only examples of these I looked at were spam, but we still didn't want to crash on them. The fix involved being somewhat more careful about both the handling of headers returned by Python's email parser and the building of outgoing email notifications. This seems to be working well so far, although I wouldn't be surprised to find the odd other incorrect detail in this sort of area.

Failure to handle non-ISO-8859-1 URL-encoded form input
Remember how I said that parsing HTTP form data was thorny? After we finished upgrading all our appservers to Python 3, people started reporting that they couldn't post Unicode comments to bugs. This turned out to happen only if the attempt was made using JavaScript, and was because I hadn't quite managed to get URL-encoded form data working properly with zope.publisher and multipart. The current standard describes the URL-encoded format for form data as "in many ways an aberrant monstrosity", so this was no great surprise. Part of the problem was some very strange choices in zope.publisher dating back to 2004 or earlier, which I attempted to clean up and simplify. The rest was that Python 2's urlparse.parse_qs unconditionally decodes percent-encoded sequences as ISO-8859-1 if they're passed in as part of a Unicode string, so multipart needs to work around this on Python 2. I'm still not completely confident that this is correct in all situations, but at least now that we're on Python 3 everywhere the matrix of cases we need to care about is smaller.
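To make the ISO-8859-1 problem concrete, here is a small sketch using Python 3's urllib.parse.parse_qs, whose `encoding` parameter can reproduce the mangling. The form field name and value are made up for illustration, and this is not the zope.publisher or multipart code.

```python
from urllib.parse import parse_qs

# "caf\u00e9" encoded as UTF-8 and then percent-encoded, roughly as a
# browser would send it in a URL-encoded form body.
body = "comment=caf%C3%A9"

# Python 3's parse_qs decodes percent-encoded bytes as UTF-8 by default.
decoded_utf8 = parse_qs(body)

# Forcing ISO-8859-1 turns the two UTF-8 bytes into two separate
# characters, which is the kind of corruption described above.
decoded_latin1 = parse_qs(body, encoding="iso-8859-1")
```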

Inconsistent marshalling of Loggerhead's disk cache
We use Loggerhead for providing web browsing of Bazaar branches. When we upgraded one of its two servers to Python 3, we immediately noticed that the one still on Python 2 was failing to read back its revision information cache, which it stores in a database on disk. (We noticed this because it caused a deployment to fail: when we tried to roll out new code to the instance still on Python 2, Nagios checks had already caused an incompatible cache to be written for one branch from the Python 3 instance.) This turned out to be a similar problem to the pickle issue mentioned above, except this one was with marshal, which I didn't think to look for because it's a relatively obscure module mostly used for internal purposes by Python itself; I'm not sure that Loggerhead should really be using it in the first place. The fix was relatively straightforward, complicated mainly by now needing to cope with throwing away unreadable cache data. Ironically, if we'd just gone ahead and taken the nominally riskier path of upgrading both servers at the same time, we might never have had a problem here.

Intermittent bzr failures
Finally, after we upgraded one of our two Bazaar codehosting servers to Python 3, we had a report of intermittent bzr branch hangs. After some digging I found this in our logs:
Traceback (most recent call last):
  ...
  File "/srv/bazaar.launchpad.net/production/codehosting1-rev-20124175fa98fcb4b43973265a1561174418f4bd/env/lib/python3.5/site-packages/twisted/conch/ssh/channel.py", line 136, in addWindowBytes
    self.startWriting()
  File "/srv/bazaar.launchpad.net/production/codehosting1-rev-20124175fa98fcb4b43973265a1561174418f4bd/env/lib/python3.5/site-packages/lazr/sshserver/session.py", line 88, in startWriting
    resumeProducing()
  File "/srv/bazaar.launchpad.net/production/codehosting1-rev-20124175fa98fcb4b43973265a1561174418f4bd/env/lib/python3.5/site-packages/twisted/internet/process.py", line 894, in resumeProducing
    for p in self.pipes.itervalues():
builtins.AttributeError: 'dict' object has no attribute 'itervalues'
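The last frame of that traceback is a Python 2 dict idiom. A minimal sketch of the difference (this is not the actual Twisted patch, and the `pipes` contents here are invented placeholders):

```python
pipes = {0: "stdin", 1: "stdout", 2: "stderr"}


def values_py2_style(mapping):
    # dict.itervalues() only exists on Python 2; on Python 3 this raises
    # AttributeError, as in the traceback above.
    return [p for p in mapping.itervalues()]


def values_portable(mapping):
    # dict.values() exists on both versions (a list on Python 2, a view on
    # Python 3), so iterating over it works everywhere.
    return [p for p in mapping.values()]
```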
I'd seen this before in our git hosting service: it was a bug in Twisted's Python 3 port, fixed after 20.3.0 but unfortunately after the last release that supported Python 2, so we had to backport that patch. Using the same backport dealt with this. Onwards!

19 June 2021

Joachim Breitner: Leaving DFINITY

Last Thursday was my last working day at DFINITY. There are various reasons why I felt that after almost three years the DFINITY Foundation isn't quite the right place for me anymore, and this plan has been in the making for a while. Primarily, there are personal pull factors that strongly suggest that I'll take a break from full time employment, so I decided to see the launch of the Internet Computer through and then leave.

DFINITY has hired some amazing people, and it was a great pleasure to work with them. I learned a lot (some Rust, a lot of Nix, and just how merciless Conway's law is), and I dare say I had the opportunity to do some good work, contributing my part to make the Internet Computer a reality. I am especially proud of the Interface Specification and the specification-driven design principles behind it. It even comes with a model reference implementation and acceptance test suite, and although we didn't quite get to do formalization, those familiar with the DeepSpec project will recognize some influence of their concept of "deep specifications". Besides that, there is of course my work on the Motoko programming language, where I built the backend, and the Candid interoperability layer, where I helped with the formalism, formulated a generic soundness criterion for Interface Description Languages in a higher-order setting, and formally verified that in Coq. Fortunately, all of this work is now Free Software or at least Open Source.

With so much work poured into this project, I continue to care about it, and you'll see me post on the developer forum and hack on Motoko. As the Internet Computer becomes gradually more open, I hope I can be gradually more involved again. But even without me contributing full-time I am sure that DFINITY and the Internet Computer will do well; when I left there were still plenty of smart, capable and enthusiastic people forging ahead.

So what's next?
So far, I have rushed every professional transition in my life: When starting my PhD, when starting my postdoc, when starting my job at DFINITY, and every time I regretted it. So this time, I will take a proper break and will explore the world a bit (as far as that is possible given the pandemic). I will stresslessly contribute to various open source projects. I also hope to do more public outreach and teaching, writing more blog posts again, recording screencasts and giving talks and lectures. If you want to invite me to your user group/seminar/company/workshop, please let me know! Also, I might be up for small interesting projects in a while. Beyond these, I have no concrete plans and am looking forward to the inspiration I might get from hiking through the Scandinavian wilderness. If you happen to stumble across my tent, please stop for a tea.

15 May 2021

Utkarsh Gupta: Hello, Canonical! o/

Today marks the 90th day of me joining Canonical to work on Ubuntu full-time! So since it's been a while already, this blog post is long overdue. :)

The News
I joined Canonical, this February, to work on Ubuntu full-time! \o/
Those who know, they know that this is really very exciting for me because Canonical has been a dream company for me, for real (more about this below!). And hey, this is my first job, ever, so all the more reason to be psyched about it, isn't it? ^_^ P.S. Keep reading and we'll meet my squad really sooon!

The Story
Being an undergrad student (batch 2017-2021), I've been slightly worried during my last two semesters, naturally, thinking about how it's all gonna pan out and what I will be doing, et al, because I've been seeing all my friends and batchmates getting placed in companies or going for masters or at least having some sort of plans for their future and I, on the other hand, was hopelessly clueless. :D Well, to be fair, I did Google Summer of Code twice, in 2019 and 2020, became a Debian Developer in 2019, been a part of GCI and Outreachy, contributed to dozens of open-source projects, et al, et al. So I wasn't all completely hopeless but for sure was "completely clueless", heh. And for full disclosure, I was only slightly panicking because firstly, I did get placed in several companies and secondly, I didn't really need a job immediately since I was already getting paid to work on Debian stuff by Freexian, which was good enough. :)
(and honestly, Freexian has my whole heart! - more on that later sometime.)

But that's not the point. I was still confused and worried and my mom & dad, more so than anyone. Ugh. We were all figuring things out and she asked me about places that I was interested to work in. And whilst I wasn't clear about things I wanted to do (and still am!), I was (very) clear about this and so I told her about Canonical and also did tell her that it's a bit too ambitious for me to think about it now so I'll probably apply after some experience or something. And as they say, the world works in mysterious ways and well, it did for me!

So back during the Ruby sprints (Feb '20), Kanashiro, the guy, mentioned that his team was hiring and had a vacant position but I wouldn't be eligible since I was still in my junior year. Ever since then I've been actively praying for Cronus, the god of time, to wave his magic wand and align it in such a way that the next opening should be somewhere near my graduation. And guess what? IT HAPPENED! 9 months later, in November '20, Kanashiro told me his team was hiring yet again and that I could apply this time! Without much (since there was some) delay, I applied and started asking all sorts of questions to Kanashiro. No words are enough for him, he literally helped me throughout the process; from referring me to answering all sorts of doubts I had! And roughly after 2 months of interviewing, et al, my ambitious dream did come true and I finalyyyy signed my contract! \o/
(the interview process and what went on during those 10 weeks is a story for later ;))

The Server Team! \o
This position, which I didn't mention earlier, was for the Server Team which is a team of 15 people, working to make Ubuntu server the best! And as I tweeted sometime back, the team is absolutely lovely, super kind, and consists of the best of teammates one could possibly ask for! Here's a quick sneak peek into our weekly team meeting. Thanks to Rafael for taking such a lovely picture. And yes, the cat Luna is a part of our squad! And oh, did I mention that we're completely remote and distributed?
FUN FACT: Our team covers all the TZs, that is, at any point of time (during weekdays), you'll find someone or the other from the team around! \o/ Anyway, our squad, managed by Rick, is divided into two halves: Squeaky Wheels and Table Flip. Cool names, right?
Squeaky Wheels does the distro side of stuff and consists of Christian, Andreas, Rafael, Robie, Bryce, Sergio, Kanashiro, Athos, and now myself as well! And OTOH, Table Flip consists of Dan, Chad, Paride, Lucas, James, and Grant. Even though I interact w/ Squeaky Wheels more (basically daily), each of my teammates is absolutely lovely and equally awesome! Whilst I'll talk more about things here in the upcoming months, this is it for now! If there's anything, in particular, you'd like to know more about, let me know! And lastly, here's us vibing our way through, making Ubuntu server better, cause that's how we roll!
Until next time.
:wq for today.

14 May 2021

Jelmer Vernooij: Ognibuild

The Debian Janitor is an automated system that commits fixes for (minor) issues in Debian packages that can be fixed by software. It gradually started proposing merges in early December. The first set of changes sent out ran lintian-brush on sid packages maintained in Git. This post is part of a series about the progress of the Janitor. The FOSS world uses a wide variety of different build tools; given a git repository or tarball, it can be hard to figure out how to build and install a piece of software. Humans will generally know what build tool a project is using when they check out a project from git, or they can read the README. And even then, the answer may not always be straightforward to everybody. For automation, there is no obvious place to figure out how to build or install a project.

Debian
For Debian packages, Debian maintainers generally will have determined what the appropriate tools to invoke are, and added appropriate invocations to debian/rules. This is really nice when rebuilding all of Debian - one can just invoke debian/rules - a consistent interface - and it will in turn invoke the right tools to build the package, meeting a long list of requirements. With newer versions of debhelper and most common build systems, debhelper can figure a lot of this out automatically - the maintainer just has to add the appropriate build and run time dependencies. However, debhelper needs to be consistent in its behaviour per compat level - otherwise builds might start failing with different versions of debhelper, when the autodetection logic is changed. debhelper can also only do the right thing if all the necessary dependencies are present. debhelper also only functions in the context of a Debian package.
Ognibuild
Ognibuild is a new tool that figures out the build system in use by an upstream project, as well as the other dependencies it needs. This information can then be used to invoke said build system, or to e.g. add missing build dependencies to a Debian package. Ognibuild uses a variety of techniques to work out what the dependencies for an upstream package are:
  • Extracting dependencies and other requirements declared in build system metadata (e.g. setup.py)
  • Attempting builds and parsing build logs for missing dependencies (repeating until the build succeeds), calling out to buildlog-consultant
Once it is determined which dependencies are missing, they can be resolved in a variety of ways. Apt can be invoked to install missing dependencies on Debian systems (optionally in a chroot) or ecosystem-specific tools can be used to do so (e.g. pypi or cpan). Instead of installing packages, the tool can also simply inform the user about the missing packages and commands to install them, or update a Debian package appropriately (this is what deb-fix-build does). The target audience of ognibuild is people who need to (possibly from automation) build a variety of projects from different ecosystems, or users who are looking to just install a project from source. Developers who are just hacking on e.g. a Python project are better off directly invoking the ecosystem-native tools rather than a wrapper like ognibuild.
Supported ecosystems
(Partially) supported ecosystems currently include:
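The "attempt a build, parse the log for a missing dependency, install it, repeat" technique described above can be sketched as follows. This is not ognibuild's or buildlog-consultant's actual implementation: the function names, the single regular expression and the retry limit are all assumptions made for illustration.

```python
import re

# One example pattern; a real log parser would recognise many more.
MISSING_MODULE = re.compile(r"ModuleNotFoundError: No module named '([^']+)'")


def find_missing_dependency(log):
    """Return the name of a missing Python module mentioned in a build log."""
    match = MISSING_MODULE.search(log)
    return match.group(1) if match else None


def build_until_success(run_build, install, max_attempts=10):
    """Repeat the build, installing one missing dependency per failure.

    run_build() must return (succeeded, log_text); install(name) installs
    one dependency. Both are supplied by the caller.
    """
    installed = []
    for _ in range(max_attempts):
        ok, log = run_build()
        if ok:
            return installed
        missing = find_missing_dependency(log)
        if missing is None or missing in installed:
            raise RuntimeError("build failed for an unrecognised reason")
        install(missing)
        installed.append(missing)
    raise RuntimeError("gave up after %d attempts" % max_attempts)
```

The `install` callback is where the choice between apt and ecosystem-native tools would plug in; it could equally just record the name and print a suggestion instead of changing anything.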
  • Combinations of make and autoconf, automake or CMake
  • Python, including fetching packages from pypi
  • Perl, including fetching packages from cpan
  • Haskell, including fetching from hackage
  • Ninja/Meson
  • Maven
  • Rust, including fetching packages from crates.io
  • PHP Pear
  • R, including fetching packages from CRAN and Bioconductor
For a full list, see the README.
Usage
Ognibuild provides a couple of top-level subcommands that will seem familiar to anybody who has used a couple of other build systems:
  • ogni clean - remove build artifacts
  • ogni dist - create a dist tarball
  • ogni build - build the project in the current directory
  • ogni test - run the test suite
  • ogni install - install the project somewhere
  • ogni info - display project information including discovered build system and dependencies
  • ogni exec - run an arbitrary command but attempt to resolve issues like missing dependencies
These tools all take a couple of common options:
--resolve=apt|auto|native
Specifies how to resolve any missing dependencies:
  • apt: install the appropriate dependency using apt
  • native: install dependencies using native tools like pip or cpan
  • auto: invoke either apt or native package install, depending on whether the current user is allowed to invoke apt
--schroot=name
Run inside of a schroot.
--explain
Do not make any changes but tell the user which native or apt packages they could install.
There are also subcommand-specific options, e.g. to install to a specific directory or restrict which tests are run.
Examples
Creating a dist tarball
% git clone https://github.com/dulwich/dulwich
% cd dulwich
% ogni --schroot=unstable-amd64-sbuild dist
 
Writing dulwich-0.20.21/setup.cfg
creating dist
Creating tar archive
removing 'dulwich-0.20.21' (and everything under it)
Found new tarball dulwich-0.20.21.tar.gz in /var/run/schroot/mount/unstable-amd64-sbuild-974d32d7-6f10-4e77-8622-b6a091857e85/build/tmpucazj7j7/package/dist.
Installing ldb from source, resolving dependencies using apt
% wget https://download.samba.org/pub/ldb/ldb-2.3.0.tar.gz
% tar xvfz ldb-2.3.0.tar.gz
% cd ldb-2.3.0
% ogni install --prefix=/tmp/ldb
 
+ install /tmp/ldb/include/ldb.h (from include/ldb.h)
 
Waf: Leaving directory `/tmp/ldb-2.3.0/bin/default'
'install' finished successfully (11.395s)
Running all tests from XML::LibXML::LazyBuilder
% wget https://cpan.metacpan.org/authors/id/T/TO/TORU/XML-LibXML-LazyBuilder-0.08.tar.gz
% tar xvfz XML-LibXML-LazyBuilder-0.08.tar.gz
% cd XML-LibXML-LazyBuilder-0.08
% ogni test
 
Current Status
ognibuild is still in its early stages, but works well enough that it can detect and invoke the build system for most of the upstream projects packaged in Debian. If there are build systems that it currently lacks support for or other issues, then I'd welcome any bug reports.

24 April 2021

Gunnar Wolf: FLISOL Talking about Jitsi

Every year since 2005 there has been a very good, big and interesting Latin American gathering of free-software-minded people. Of course, Latin America is a big, big, big place, and it's not like we are the most economically buoyant region to meet in something comparable to FOSDEM. What we have is a distributed free software conference (originally a distributed Linux install-fest, which I never liked, as I am against install-fests) that gradually morphed into a proper conference: Festival Latinoamericano de Instalación de Software Libre (Latin American Free Software Installation Festival). This FLISOL was hosted by the always great and always interesting Rancho Electrónico, our favorite local hacklab, and has many other interesting talks. I like talking about projects where I am involved as a developer, but this time I decided to do otherwise: I presented a talk on the Jitsi videoconferencing server. Why? Because of the relevance videoconferences have had over the last year. So, without further ado: here is a video I recorded locally from the talk I gave (MKV), as well as the slides (PDF).

11 April 2021

Jelmer Vernooij: The upstream ontologist

The Debian Janitor is an automated system that commits fixes for (minor) issues in Debian packages that can be fixed by software. It gradually started proposing merges in early December. The first set of changes sent out ran lintian-brush on sid packages maintained in Git. This post is part of a series about the progress of the Janitor. The upstream ontologist is a project that extracts metadata about upstream projects in a consistent format. It does this with a combination of heuristics and reading ecosystem-specific metadata files, such as Python's setup.py and Rust's Cargo.toml, as well as e.g. scanning README files.

Supported Data Sources
It will extract information from a wide variety of sources, including:
Supported Fields
Fields that it currently provides include:
  • Homepage: homepage URL
  • Name: name of the upstream project
  • Contact: contact address of some sort of the upstream (e-mail, mailing list URL)
  • Repository: VCS URL
  • Repository-Browse: Web URL for viewing the VCS
  • Bug-Database: Bug database URL (for web viewing, generally)
  • Bug-Submit: URL to use to submit new bugs (either on the web or an e-mail address)
  • Screenshots: List of URLs with screenshots
  • Archive: Archive used - e.g. SourceForge
  • Security-Contact: e-mail or URL with instructions for reporting security issues
  • Documentation: Link to documentation on the web
  • Wiki: Wiki URL
  • Summary: one-line description of the project
  • Description: longer description of the project
  • License: Single line license description (e.g. "GPL 2.0") as declared in the metadata[1]
  • Copyright: List of copyright holders
  • Version: Current upstream version
  • Security-MD: URL to markdown file with security policy
All data fields have a certainty associated with them ("certain", "confident", "likely" or "possible"), which gets set depending on how the data was derived or where it was found. If multiple possible values were found for a specific field, then the value with the highest certainty is taken.
Interface
The ontologist provides a high-level Python API as well as two command-line tools that can write output in two different formats:
For example, running guess-upstream-metadata on dulwich:
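The selection rule can be sketched like this. It is only an illustration, not the upstream ontologist's API: the function name is hypothetical, and the ordering assumes the four levels are listed above from most to least certain.

```python
# Assumed ordering, from least to most certain.
CERTAINTY_ORDER = ["possible", "likely", "confident", "certain"]


def pick_best(candidates):
    """Given (value, certainty) pairs for one field, return the most certain.

    When several candidates share the highest certainty, the first one wins.
    """
    return max(candidates, key=lambda pair: CERTAINTY_ORDER.index(pair[1]))
```

For example, given a "likely" homepage guessed from a README and a "certain" one read from a metadata file, the latter is kept.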
 % guess-upstream-metadata
 <string>:2: (INFO/1) Duplicate implicit target name: "contributing".
 Name: dulwich
 Repository: https://www.dulwich.io/code/
 X-Security-MD: https://github.com/dulwich/dulwich/tree/HEAD/SECURITY.md
 X-Version: 0.20.21
 Bug-Database: https://github.com/dulwich/dulwich/issues
 X-Summary: Python Git Library
 X-Description:  
   This is the Dulwich project.
   It aims to provide an interface to git repos (both local and remote) that
   doesn't call out to git directly but instead uses pure Python.
 X-License: Apache License, version 2 or GNU General Public License, version 2 or later.
 Bug-Submit: https://github.com/dulwich/dulwich/issues/new
Lintian-Brush
lintian-brush can update DEP-12-style debian/upstream/metadata files that hold information about the upstream project that is packaged, as well as the Homepage in the debian/control file, based on information provided by the upstream ontologist. By default, it only imports data with the highest certainty - you can override this by specifying the --uncertain command-line flag.
[1] Obviously this won't be able to describe the full licensing situation for many projects. Projects like scancode-toolkit are more appropriate for that.

6 April 2021

Jelmer Vernooij: Automatic Fixing of Debian Build Dependencies

The Debian Janitor is an automated system that commits fixes for (minor) issues in Debian packages that can be fixed by software. It gradually started proposing merges in early December. The first set of changes sent out ran lintian-brush on sid packages maintained in Git. This post is part of a series about the progress of the Janitor. In my last blog post, I introduced the buildlog consultant - a tool that can identify many reasons why a Debian build failed. For example, here's a fragment of a build log where the Build-Depends lack python3-setuptools:
 849  dpkg-buildpackage: info: host architecture amd64
 850   fakeroot debian/rules clean
 851  dh clean --with python3,sphinxdoc --buildsystem=pybuild
 852     dh_auto_clean -O--buildsystem=pybuild
 853  I: pybuild base:232: python3.9 setup.py clean
 854  Traceback (most recent call last):
 855    File "/<<PKGBUILDDIR>>/setup.py", line 2, in <module>
 856      from setuptools import setup
 857  ModuleNotFoundError: No module named 'setuptools'
 858  E: pybuild pybuild:353: clean: plugin distutils failed with: exit code=1: python3.9 setup.py clean
The buildlog consultant can identify the ModuleNotFoundError line (line 857 of the log) as being key, and interprets it:

 % analyse-sbuild-log --json ~/build.log
 {
    "stage": "build",
    "section": "Build",
    "lineno": 857,
    "kind": "missing-python-module",
    "details": {"module": "setuptools", "python_version": 3, "minimum_version": null}
 }
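A script could consume that JSON along the following lines. This is a minimal sketch, assuming the tool prints a single JSON object with the keys shown in the example; the `describe_problem` helper is a hypothetical name and not part of the buildlog consultant.

```python
import json

# Sample input modelled on the example output, assuming one JSON object.
SAMPLE = """
{
  "stage": "build",
  "section": "Build",
  "lineno": 857,
  "kind": "missing-python-module",
  "details": {"module": "setuptools", "python_version": 3, "minimum_version": null}
}
"""


def describe_problem(raw: str) -> str:
    """Turn the JSON report into a one-line, human-readable summary."""
    problem = json.loads(raw)
    details = problem.get("details") or {}
    if problem["kind"] == "missing-python-module":
        return "line {}: Python {} module {!r} is missing".format(
            problem["lineno"], details["python_version"], details["module"]
        )
    # Other problem kinds are outside the scope of this sketch.
    return "line {}: unhandled problem kind {!r}".format(
        problem["lineno"], problem["kind"]
    )


print(describe_problem(SAMPLE))
```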
Automatically acting on buildlog problems
A common reason why Debian builds fail is missing dependencies or incorrect versions of dependencies declared in the package's Build-Depends. Based on the output of the buildlog consultant, it is possible in many cases to determine what dependency needs to be added to Build-Depends. In the example given above, we can use apt-file to look for the package that contains the path /usr/lib/python3/dist-packages/setuptools/__init__.py - and voila, we find python3-setuptools:
 % apt-file search /usr/lib/python3/dist-packages/setuptools/__init__.py
 python3-setuptools: /usr/lib/python3/dist-packages/setuptools/__init__.py
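The lookup can be sketched as follows. The helper names are hypothetical, and the sketch only handles modules shipped as a package directory under /usr/lib/python3/dist-packages, as in the example; single-file modules and other install locations are ignored. The only external command used is the `apt-file search` invocation shown above.

```python
import subprocess
from typing import List, Tuple


def module_init_path(module: str) -> str:
    """Path of a module's __init__.py, assuming the layout from the example."""
    return "/usr/lib/python3/dist-packages/{}/__init__.py".format(
        module.replace(".", "/")
    )


def parse_apt_file_output(output: str) -> List[Tuple[str, str]]:
    """Split 'package: path' lines into (package, path) pairs."""
    results = []
    for line in output.splitlines():
        package, sep, path = line.partition(": ")
        if sep:
            results.append((package.strip(), path.strip()))
    return results


def search_packages(path: str) -> List[str]:
    """Run apt-file and return the packages that ship exactly this path."""
    proc = subprocess.run(
        ["apt-file", "search", path],
        capture_output=True, text=True, check=False,
    )
    # apt-file matches patterns, so keep only exact path matches.
    return sorted(
        {pkg for pkg, found in parse_apt_file_output(proc.stdout) if found == path}
    )
```

Calling `search_packages(module_init_path("setuptools"))` requires apt-file and an up-to-date contents cache on the machine running it.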
The deb-fix-build command automates these steps:
  1. It builds the package using sbuild; if the package successfully builds then it just exits successfully
  2. It tries to identify the problem by looking through the build log; if it can't or if it's a problem it has seen before (but apparently failed to resolve), then it exits with a non-zero exit code
  3. It tries to find a dependency that can address the problem
  4. It updates Build-Depends in debian/control or Depends in debian/tests/control
  5. Go to step 1
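The five steps above amount to a small retry loop. The sketch below is an illustration of that control flow rather than ognibuild's implementation: the function name and the four callables it takes are hypothetical, and exiting with a non-zero code when no dependency can be found (step 3) is an assumption, since the list does not say what happens in that case.

```python
from typing import Callable, Hashable, Optional, Set


def fix_build_loop(
    build: Callable[[], Optional[str]],
    identify_problem: Callable[[str], Optional[Hashable]],
    find_dependency: Callable[[Hashable], Optional[str]],
    add_dependency: Callable[[str], None],
) -> int:
    """Return 0 once the build succeeds, 1 if the loop has to give up."""
    seen: Set[Hashable] = set()
    while True:
        # Step 1: build; by convention here, None means success,
        # anything else is the text of the failed build log.
        log = build()
        if log is None:
            return 0
        # Step 2: identify the problem; give up if it is unknown or repeated.
        problem = identify_problem(log)
        if problem is None or problem in seen:
            return 1
        seen.add(problem)
        # Step 3: look for a dependency that addresses the problem.
        dependency = find_dependency(problem)
        if dependency is None:
            return 1  # assumption: nothing to add, so stop
        # Step 4: record the dependency (e.g. in debian/control).
        add_dependency(dependency)
        # Step 5: go back to step 1.
```

Tracking the problems already seen is what keeps the loop from spinning forever when an added dependency does not actually resolve the failure.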
This takes away the tedious manual process of building a package, discovering that a dependency is missing, updating Build-Depends and trying again. For example, when I ran deb-fix-build while packaging saneyaml, the output looked something like this:
 % deb-fix-build
 Using output directory /tmp/tmpyz0nkgqq
 Using sbuild chroot unstable-amd64-sbuild
 Using fixers:  
 Building debian packages, running 'sbuild --no-clean-source -A -s -v'.
 Attempting to use fixer upstream requirement fixer(apt) to address MissingPythonDistribution('setuptools_scm', python_version=3, minimum_version='4')
 Using apt-file to search apt contents
 Adding build dependency: python3-setuptools-scm (>= 4)
 Building debian packages, running 'sbuild --no-clean-source -A -s -v'.
 Attempting to use fixer upstream requirement fixer(apt) to address MissingPythonDistribution('toml', python_version=3, minimum_version=None)
 Adding build dependency: python3-toml
 Building debian packages, running 'sbuild --no-clean-source -A -s -v'.
 Built 0.5.2-1- changes files at [ saneyaml_0.5.2-1_amd64.changes ].
And in our Git repository, we see these changes as well:
% git log -p
 commit 5a1715f4c7273b042818fc75702f2284034c7277 (HEAD -> master)
 Author: Jelmer Vernooij <jelmer@jelmer.uk>
 Date:   Sun Apr 4 02:35:56 2021 +0100
     Add missing build dependency on python3-toml.
 diff --git a/debian/control b/debian/control
 index 5b854dc..3b27b73 100644
 --- a/debian/control
 +++ b/debian/control
 @@ -1,6 +1,6 @@
  Rules-Requires-Root: no
  Standards-Version: 4.5.1
 -Build-Depends: debhelper-compat (= 12), dh-sequence-python3, python3-all, python3-setuptools (>= 50), python3-wheel, python3-setuptools-scm (>= 4)
 +Build-Depends: debhelper-compat (= 12), dh-sequence-python3, python3-all, python3-setuptools (>= 50), python3-wheel, python3-setuptools-scm (>= 4), python3-toml
  Testsuite: autopkgtest-pkg-python
  Source: python-saneyaml
  Priority: optional
 commit f03047da80fcd8468ee231fbc4cf8488d7a0acd1
 Author: Jelmer Vernooij <jelmer@jelmer.uk>
 Date:   Sun Apr 4 02:35:34 2021 +0100
     Add missing build dependency on python3-setuptools-scm (>= 4).
 diff --git a/debian/control b/debian/control
 index a476cc2..5b854dc 100644
 --- a/debian/control
 +++ b/debian/control
 @@ -1,6 +1,6 @@
  Rules-Requires-Root: no
  Standards-Version: 4.5.1
 -Build-Depends: debhelper-compat (= 12), dh-sequence-python3, python3-all, python3-setuptools (>= 50), python3-wheel
 +Build-Depends: debhelper-compat (= 12), dh-sequence-python3, python3-all, python3-setuptools (>= 50), python3-wheel, python3-setuptools-scm (>= 4)
  Testsuite: autopkgtest-pkg-python
  Source: python-saneyaml
  Priority: optional
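The edit made in each of these commits can be approximated with a few lines of string handling. This is a deliberately naive sketch with a hypothetical function name: it assumes Build-Depends sits on a single line, as in the diffs above, whereas real control files often fold the field over several lines and are better handled with a proper deb822 parser.

```python
def add_build_dependency(control_text: str, dependency: str) -> str:
    """Append a dependency to a single-line Build-Depends field."""
    prefix = "Build-Depends:"
    lines = control_text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(prefix):
            # Split the comma-separated relations and drop empty entries.
            existing = [e.strip() for e in line[len(prefix):].split(",") if e.strip()]
            if dependency not in existing:
                existing.append(dependency)
            lines[i] = prefix + " " + ", ".join(existing)
            break
    else:
        raise ValueError("no Build-Depends field found")
    return "\n".join(lines) + "\n"
```

Running the function a second time with the same dependency leaves the field unchanged, which matters when the same fixer may fire more than once.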
Using deb-fix-build
You can run deb-fix-build by installing the ognibuild package from unstable. The only requirements for using it are that:
  • The package is maintained in Git
  • An sbuild schroot is available for use
Caveats
deb-fix-build is fairly easy to understand, and if it doesn't work then you're no worse off than you were without it - you'll have to add your own Build-Depends. That said, there are a couple of things to keep in mind:
  • At the moment, it doesn't distinguish between general, Arch or Indep Build-Depends.
  • It can only add dependencies for things that are actually in the archive
  • Sometimes there are multiple packages that can provide a file, command or python package - it tries to find the right one with heuristics but doesn't always get it right

4 March 2021

Molly de Blanc: Vaccination

This is about why I decided to get vaccinated, and why that was a hard choice.
Note: If you have the opportunity to get vaccinated, you should. This is good for public health. If you're worried about being a bad person by getting vaccinated now, you're probably not a bad person. This is my professional opinion as a bioethics graduate student. Anyway, onward.
Not Great Reasons to Not Get Vaccinated
Reason one: Other people need them more.
There are people who have a much higher risk of dying from COVID or having long term consequences. I don't want to get a vaccine at the expense of someone who has much worse projected outcomes.
Reason two: I live a lowish risk life.
I have a low/medium risk lifestyle. I go to the grocery store, but I don't do things like indoor dining. I have drinks with friends, outside, generally maintaining distance and trying to be polite and careful. I go on walks or sit in parks with friends. I have three people I see inside, and we don't see anyone else inside. Through my school, I am tested regularly, though I am behind right now, I'll admit. I work from home, I take classes on my computer. My podmates also work from home. There are other people who live much higher risk lives and don't have a choice in the matter. They work outside of their homes, they are taking care of other people, they're incarcerated, their children go to school in-person. Those people need vaccines more than I do, or at least I feel like that's the case. Even though I know that, e.g., parents won't be able to get vaccinated unless they otherwise qualify, I still feel like I'd be doing them wrong by getting vaccinated first!
Reason three: I don't want to deal with other people's judgement.
When New Jersey allowed smokers to get vaccinated, wow, did people go off on how unfair that is. I've seen the same rhetoric applied to other preexisting conditions/qualifications. Boo.
Great Reasons to Get Vaccinated
I had a few good conversations with friends I respect a lot. They convinced me that I should get vaccinated, in spite of my concerns.
Reason one: I'm scared of COVID.
I actually find this the weakest of my reasons to get vaccinated: I'm scared of COVID. I get migraines. I downplay how bad they are, because I know other people who have it worse, but they're terrible. They're debilitating. COVID can increase your risk of migraines, especially if you're already prone to them. They can last months. Boo. I'm terrified of Long COVID. A part of my identity comes from doing things outside, and this past year without regularly swimming or going on bike trips or going up mountains has been really rough for me. For my own sake, I don't want to get sick.
Reason two: I want to protect the people in my life.
Being vaccinated is good for the people in my life. The current conversation I've heard is that if you're vaccinated, you're probably less likely to spread COVID to those around you. That sounds great! I'm not going to change my lifestyle anytime soon to be higher risk, but I like knowing that there's an even smaller chance I will become a disease vector.
Reason three: Seriously, everyone should get vaccinated.
Vaccinations are key to fighting COVID. I am not an epidemiologist (though I did once consider becoming an epistemologist). I'm not going to pretend to be one. But they tell me that vaccines are really important, and the Intro to Public Health class I took agrees. We need to vaccinate everyone we can, everywhere in the world, in order to create the best outcomes. We don't want some vaccine-resistant COVID variant to show up somewhere because we were jerkfaces and prevented people from getting vaccinated. Medical professionals and experts I talked with told me to get vaccinated as soon as the opportunity arose. Maybe they said this because they like me, but I think they're also concerned about public health.
So you're ready to get your vaccine!
I'm so excited for you! Sumana Harihareswara wrote this great blog post about getting vaccinated in New York City, though it is probably relevant for New York State in general. Please check out your state's guidelines and maybe do a little research or creative thinking about what counts. This Twitter thread Sumana shared talked about ADHD as a qualifying condition under developmental and learning disorders. Your doctor might be super helpful! Your doctor might also not be helpful at all. When I talked to mine they didn't know much about the vaccine roll out plan, criteria, or procedures around proof of medical condition.
Some vaccine sites also have waitlists for extra doses. A friend of mine is on one! For these, you generally don't have to meet the qualification criteria. These are doses left at the end of the day due to canceled appointments and things like that.
A lot of states have useful Twitter bots and web sites. We have TurboVax. It's great. Big fan. These are usually appointments for the day of or the next day or two.
