Search Results: "toma"

16 January 2022

Wouter Verhelst: Backing up my home server with Bacula and Amazon Storage Gateway

I have a home server. Initially conceived and sized so I could digitize my (rather sizeable) DVD collection, I started using it for other things; I added a few play VMs on it, started using it as a destination for the deja-dup-based backups of my laptop and the time machine-based ones of the various macs in the house, and used it as the primary location of all the photos I've taken with my cameras over the years (currently taking up somewhere around 500G) as well as those that were taking at our wedding (another 100G). To add to that, I've copied the data that my wife had on various older laptops and external hard drives onto this home server as well, so that we don't lose the data should something happen to one or more of these bits of older hardware. Needless to say, the server was running full, so a few months ago I replaced the 4x2T hard drives that I originally put in the server with 4x6T ones, and there was much rejoicing. But then I started considering what I was doing. Originally, the intent was for the server to contain DVD rips of my collection; if I were to lose the server, I could always re-rip the collection and recover that way (unless something happened that caused me to lose both at the same time, of course, but I consider that sufficiently unlikely that I don't want to worry about it). Much of the new data on the server, however, cannot be recovered like that; if the server dies, I lose my photos forever, with no way of recovering them. Obviously that can't be okay. So I started looking at options to create backups of my data, preferably in ways that make it easily doable for me to automate the backups -- because backups that have to be initiated are backups that will be forgotten, and backups that are forgotten are backups that don't exist. So let's not try that. When I was still self-employed in Belgium and running a consultancy business, I sold a number of lower-end tape libraries for which I then configured bacula, and I preferred a solution that would be similar to that without costing an arm and a leg. I did have a look at a few second-hand tape libraries, but even second hand these are still way outside what I can budget for this kind of thing, so that was out too. After looking at a few solutions that seemed very hackish and would require quite a bit of handholding (which I don't think is a good idea), I remembered that a few years ago, I had a look at the Amazon Storage Gateway for a customer. This gateway provides a virtual tape library with 10 drives and 3200 slots (half of which are import/export slots) over iSCSI. The idea is that you install the VM on a local machine, you connect it to your Amazon account, you connect your backup software to it over iSCSI, and then it syncs the data that you write to Amazon S3, with the ability to archive data to S3 Glacier or S3 Glacier Deep Archive. I didn't end up using it at the time because it required a VMWare virtualization infrastructure (which I'm not interested in), but I found out that these days, they also provide VM images for Linux KVM-based virtual machines (amongst others), so that changes things significantly. After making a few calculations, I figured out that for the amount of data that I would need to back up, I would require a monthly budget of somewhere between 10 and 20 USD if the bulk of the data would be on S3 Glacier Deep Archive. This is well within my means, so I gave it a try. The VM's technical requirements state that you need to assign four vCPUs and 16GiB of RAM, which just so happens to be the exact amount of RAM and CPU that my physical home server has. Obviously we can't do that. I tried getting away with 4GiB and 2 vCPUs, but that didn't work; the backup failed out after about 500G out of 2T had been written, due to the VM running out of resources. On the VM's console I found complaints that it required more memory, and I saw it mention something in the vicinity of 7GiB instead, so I decided to try again, this time with 8GiB of RAM rather than 4. This worked, and the backup was successful. As far as bacula is concerned, the tape library is just a (very big...) normal tape library, and I got data throughput of about 30M/s while the VM's upload buffer hadn't run full yet, with things slowing down to pretty much my Internet line speed when it had. With those speeds, Bacula finished the backup successfully in "1 day 6 hours 43 mins 45 secs", although the storage gateway was still uploading things to S3 Glacier for a few hours after that. All in all, this seems like a viable backup solution for large(r) amounts of data, although I haven't yet tried to perform a restore.

14 January 2022

Dirk Eddelbuettel: Rcpp 1.0.8: Updated, Strict Headers

rcpp logo The Rcpp team is thrilled to share the news of the newest release 1.0.8 of Rcpp which hit CRAN today, and has already been uploaded to Debian as well. Windows and macOS builds should appear at CRAN in the next few days. This release continues with the six-months cycle started with release 1.0.5 in July 2020. As a reminder, interim dev or rc releases will alwasys be available in the Rcpp drat repo; this cycle there were once again seven (!!) times two as we also tested the modified header (more below). These rolling release tend to work just as well, and are also fully tested against all reverse-dependencies. Rcpp has become the most popular way of enhancing R with C or C++ code. Right now, around 2478 packages on CRAN depend on Rcpp for making analytical code go faster and further, along with 242 in BioConductor. This release finally brings a change we have worked on quite a bit over the last few months. The idea of enforcing the setting of STRICT_R_HEADERS was prososed years ago in 2016 and again in 2018. But making such a chance against a widely-deployed code base has repurcussions, and we were not ready then. Last April, this was revisited in issue #1158. Over the course of numerous lengthy runs of tests of a changed Rcpp package against (essentially) all reverse-dependencies (i.e. packages which use Rcpp) we identified ninetyfour packages in total which needed a change. We provided either a patch we emailed, or a GitHub pull request, to all ninetyfour. And we are happy to say that eighty cases were resolved via a new CRAN upload, with a seven more having merged the pull request but not yet uploaded. Hence, we could make the case to CRAN (who were always CC ed on the monthly nag emails we sent to maintainers of packages needing a change) that an upload was warranted. And after a brief period for their checks and inspection, our January 11 release of Rcpp 1.0.8 arrived on CRAN on January 13. So with that, a big and heartfelt Thank You! to all eighty maintainers for updating their packages to permit this change at the Rcpp end, to CRAN for the extra checking, and to everybody else who I bugged with the numerous emails and updated to the seemingly never-ending issue #1158. We all got this done, and that is a Good Thing (TM). Other than the aforementioned change which will not automatically set STRICT_R_HEADERS (unless opted out which one can), a number of nice pull request by a number of contributors are included in this release: The full list of details follows.

Changes in Rcpp release version 1.0.8 (2022-01-11)
  • Changes in Rcpp API:
    • STRICT_R_HEADERS is now enabled by default, see extensive discussion in #1158 closing #898.
    • A new #define allows default setting of finalizer calls for external pointers (I aki in #1180 closing #1108).
    • Rcpp:::CxxFlags() now quotes the include path generated, (Kevin in #1189 closing #1188).
    • New header files Rcpp/Light, Rcpp/Lighter, Rcpp/Lightest and default Rcpp/Rcpp for fine-grained access to features (and compilation time) (Dirk #1191 addressing #1168).
  • Changes in Rcpp Attributes:
    • A new option signature allows customization of function signatures (Travers Ching in #1184 and #1187 fixing #1182)
  • Changes in Rcpp Documentation:
    • The Rcpp FAQ has a new entry on how not to grow a vector (Dirk in #1167).
    • Some long-spurious calls to RNGSope have been removed from examples (Dirk in #1173 closing #1172).
    • DOI reference in the bibtex files have been updated per JSS request (Dirk in #1186).
  • Changes in Rcpp Deployment:
    • Some continuous integration components have been updated (Dirk in #1174, #1181, and #1190).

Thanks to my CRANberries, you can also look at a diff to the previous release. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. Bugs reports are welcome at the GitHub issue tracker as well (where one can also search among open or closed issues); questions are also welcome under rcpp tag at StackOverflow which also allows searching among the (currently) 2822 previous questions. If you like this or other open-source work I do, you can sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

8 January 2022

John Goerzen: Make the Internet Yours Again With an Instant Mesh Network

I m going to lead with the technical punch line, and then explain it: Yggdrasil Network is an opportunistic mesh that can be deployed privately or as part of a global-scale network. Each node gets a stable IPv6 address (or even an entire /64) that is derived from its public key and is bound to that node as long as the node wants it (of course, it can generate a new keypair anytime) and is valid wherever the node joins the mesh. All traffic is end-to-end encrypted. Yggdrasil will automatically discover peers on a LAN via broadcast beacons, and requires zero configuration to peer in such a way. It can also run as an overlay network atop the public Internet. Public peers serve as places to join the global network, and since it s a mesh, if one device on your LAN joins the global network, the others will automatically have visibility on it also, thanks to the mesh routing. It neatly solves a lot of problems of portability (my ssh sessions stay live as I move networks, for instance), VPN (incoming ports aren t required since local nodes can connect to a public peer via an outbound connection), security, and so forth. Now on to the explanation: The Tyranny of IP rigidity Every device on the Internet, at one time, had its own globally-unique IP address. This number was its identifier to the world; with an IP address, you can connect to any machine anywhere. Even now, when you connect to a computer to download a webpage or send a message, under the hood, your computer is talking to the other one by IP address. Only, now it s hard to get one. The Internet protocol we all grew up with, version 4 (IPv4), didn t have enough addresses for the explosive growth we ve seen. Internet providers and IT departments had to use a trick called NAT (Network Address Translation) to give you a sort of fake IP address, so they could put hundreds or thousands of devices behind a single public one. That, plus the mobility of devices changing IPs whenever they change locations has meant that a fundamental rule of the old Internet is now broken: Every participant is an equal peer. (Well, not any more.) Nowadays, you can t you host your own website from your phone. Or share files from your house. (Without, that is, the use of some third-party service that locks you down and acts as an intermediary.) Back in the 90s, I worked at a university, and I, like every other employee, had a PC on my desk with an unfirewalled public IP. I installed a webserver, and poof instant website. Nowadays, running a website from home is just about impossible. You may not have a public IP, and if you do, it likely changes from time to time. And even then, your ISP probably blocks you from running servers on it. In short, you have to buy your way into the resources to participate on the Internet. I wrote about these problems in more detail in my article Recovering Our Lost Free Will Online. Enter Yggdrasil I already gave away the punch line at the top. But what does all that mean?
  • Every device that participates gets an IP address that is fully live on the Yggdrasil network.
  • You can host a website, or a mail server, or whatever you like with your Yggdrasil IP.
  • Encryption and authentication are smaller (though not nonexistent) worries thanks to the built-in end-to-end encryption.
  • You can travel the globe, and your IP will follow you: onto a plane, from continent to continent, wherever. Yggdrasil will find you.
  • I ve set up /etc/hosts on my laptop to use the Yggdrasil IPs for other machines on my LAN. Now I can just ssh foo and it will work from home, from a coffee shop, from a 4G tether, wherever. Now, other tools like tinc can do this, obviously. And I could stop there; I could have a completely closed, private Yggdrasil network. Or, I can join the global Yggdrasil network. Each device, in addition to accepting peers it finds on the LAN, can also be configured to establish outbound peering connections or accept inbound ones over the Internet. Put a public peer or two in your configuration and you ve joined the global network. Most people will probably want to do that on every device (because why not?), but you could also do that from just one device on your LAN. Again, there s no need to explicitly build routes via it; your other machines on the LAN will discover the route s existence and use it. This is one of many projects that are working to democratize and decentralize the Internet. So far, it has been quite successful, growing to over 2000 nodes. It is the direct successor to the earlier cjdns/Hyperboria and BATMAN networks, and aims to be a proof of concept and a viable tool for global expansion. Finally, think about how much easier development is when you don t have to necessarily worry about TLS complexity in every single application. When you don t have to worry about port forwarding and firewall penetration. It s what the Internet should be.

    5 January 2022

    Reproducible Builds: Reproducible Builds in December 2021

    Welcome to the December 2021 report from the Reproducible Builds project! In these reports, we try and summarise what we have been up to over the past month, as well as what else has been occurring in the world of software supply-chain security. As a quick recap of what reproducible builds is trying to address, whilst anyone may inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries. The motivation behind the reproducible builds effort is to ensure no flaws have been introduced during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised. As always, if you would like to contribute to the project, please get in touch with us directly or visit the Contribute page on our website.
    Early in December, Julien Voisin blogged about setting up a rebuilderd instance in order to reproduce Tails images. Working on previous work from 2018, Julien has now set up a public-facing instance which is providing build attestations. As Julien dryly notes in his post, Currently, this isn t really super-useful to anyone, except maybe some Tails developers who want to check that the release manager didn t backdoor the released image. Naturally, we would contend sincerely that this is indeed useful.
    The secure/anonymous Tor browser now supports reproducible source releases. According to the project s changelog, version of Tor can now build reproducible tarballs via the make dist-reprod command. This issue was tracked via Tor issue #26299.
    Fabian Keil posted a question to our mailing list this month asking how they might analyse differences in images produced with the FreeBSD and ElectroBSD s mkimg and makefs commands:
    After rebasing ElectroBSD from FreeBSD stable/11 to stable/12
    I recently noticed that the "memstick" images are unfortunately
    still not 100% reproducible.
    Fabian s original post generated a short back-and-forth with Chris Lamb regarding how diffoscope might be able to support the particular format of images generated by this command set.

    diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb prepared and uploading versions 195, 196, 197 and 198 to Debian, as well as made the following changes:
    • Support showing Ordering differences only within .dsc field values. [ ]
    • Add support for XMLb files. [ ]
    • Also add, for example, /usr/lib/x86_64-linux-gnu to our local binary search path. [ ]
    • Support OCaml versions 4.11, 4.12 and 4.13. [ ]
    • Drop some unnecessary has_same_content_as logging calls. [ ]
    • Replace token variable with an anonymously-named variable instead to remove extra lines. [ ]
    • Don t use the runtime platform s native endianness when unpacking .pyc files. This fixes test failures on big-endian machines. [ ]
    Mattia Rizzolo also made a number of changes to diffoscope this month as well, such as:
    • Also recognize GnuCash files as XML. [ ]
    • Support the pgpdump PGP packet visualiser version 0.34. [ ]
    • Ignore the new Lintian tag binary-with-bad-dynamic-table. [ ]
    • Fix the Enhances field in debian/control. [ ]
    Finally, Brent Spillner fixed the version detection for Black uncompromising code formatter [ ], Jelle van der Waa added an external tool reference for Arch Linux [ ] and Roland Clobus added support for reporting when the GNU_BUILD_ID field has been modified [ ]. Thank you for your contributions!

    Distribution work In Debian this month, 70 reviews of packages were added, 27 were updated and 41 were removed, adding to our database of knowledge about specific issues. A number of issue types were created as well, including: strip-nondeterminism version 1.13.0-1 was uploaded to Debian unstable by Holger Levsen. It included contributions already covered in previous months as well as new ones from Mattia Rizzolo, particularly that the dh_strip_nondeterminism Debian integration interface uses the new get_non_binnmu_date_epoch() utility when available: this is important to ensure that strip-nondeterminism does not break some kinds of binNMUs.
    In the world of openSUSE, however, Bernhard M. Wiedemann posted his monthly reproducible builds status report.
    In NixOS, work towards the longer-term goal of making the graphical installation image reproducible is ongoing. For example, Artturin made the gnome-desktop package reproducible.

    Upstream patches The Reproducible Builds project attempts to fix as many currently-unreproducible packages as possible. In December, we wrote a large number of such patches, including:

    Testing framework The Reproducible Builds project runs a significant testing framework at, to check packages and other artifacts for reproducibility. This month, the following changes were made:
    • Holger Levsen:
      • Run the Debian scheduler less often. [ ]
      • Fix the name of the Debian testing suite name. [ ]
      • Detect builds that are rescheduling due to problems with the diffoscope container. [ ]
      • No longer special-case particular machines having a different /boot partition size. [ ]
      • Automatically fix failed apt-daily and apt-daily-upgrade services [ ], failed e2scrub_all.service & user@ systemd units [ ][ ] as well as generic build failures [ ].
      • Simplify a script to powercycle arm64 architecture nodes hosted at/by [ ]
      • Detect if the service is down. [ ]
      • Various miscellaneous node maintenance. [ ][ ]
    • Roland Clobus (Debian live image generation):
      • If the latest snapshot is not complete yet, try to use the previous snapshot instead. [ ]
      • Minor: whitespace correction + comment correction. [ ]
      • Use unique folders and reports for each Debian version. [ ]
      • Turn off debugging. [ ]
      • Add a better error description for incorrect/missing arguments. [ ]
      • Report non-reproducible issues in Debian sid images. [ ]
    Lastly, Mattia Rizzolo updated the automatic logfile parsing rules in a number of ways (eg. to ignore a warning about the Python setuptools deprecation) [ ][ ] and Vagrant Cascadian adjusted the config for the Squid caching proxy on a node. [ ]

    If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

    4 January 2022

    Jonathan McDowell: Upgrading from a CC2531 to a CC2538 Zigbee coordinator

    Previously I setup a CC2531 as a Zigbee coordinator for my home automation. This has turned out to be a good move, with the 4 gang wireless switch being particularly useful. However the range of the CC2531 is fairly poor; it has a simple PCB antenna. It s also a very basic device. I set about trying to improve the range and scalability and settled upon a CC2538 + CC2592 device, which feature an MMCX antenna connector. This device also has the advantage that it s ARM based, which I m hopeful means I might be able to build some firmware myself using a standard GCC toolchain. For now I fetched the JetHome firmware from (JH_2538_2592_ZNP_UART_20211222.hex) - while it s possible to do USB directly with the CC2538 my board doesn t have those bits so going the external USB UART route is easier. The device had some existing firmware on it, so I needed to erase this to force a drop into the boot loader. That means soldering up the JTAG pins and hooking it up to my Bus Pirate for OpenOCD goodness.
    OpenOCD config
    source [find interface/buspirate.cfg]
    buspirate_port /dev/ttyUSB1
    buspirate_mode normal
    buspirate_vreg 1
    buspirate_pullup 0
    transport select jtag
    source [find target/cc2538.cfg]
    Steps to erase
    $ telnet localhost 4444
    Trying ::1...
    Connected to localhost.
    Escape character is '^]'.
    Open On-Chip Debugger
    > mww 0x400D300C 0x7F800
    > mww 0x400D3008 0x0205
    > shutdown
    shutdown command invoked
    Connection closed by foreign host.
    At that point I can switch to the UART connection (on PA0 + PA1) and flash using cc2538-bsl:
    $ git clone
    $ cc2538-bsl/ -p /dev/ttyUSB1 -e -w -v ~/JH_2538_2592_ZNP_UART_20211222.hex
    Opening port /dev/ttyUSB1, baud 500000
    Reading data from /home/noodles/JH_2538_2592_ZNP_UART_20211222.hex
    Firmware file: Intel Hex
    Connecting to target...
    CC2538 PG2.0: 512KB Flash, 32KB SRAM, CCFG at 0x0027FFD4
    Primary IEEE Address: 00:12:4B:00:22:22:22:22
        Performing mass erase
    Erasing 524288 bytes starting at address 0x00200000
        Erase done
    Writing 524256 bytes starting at address 0x00200000
    Write 232 bytes at 0x0027FEF88
        Write done
    Verifying by comparing CRC32 calculations.
        Verified (match: 0x74f2b0a1)
    I then wanted to migrate from the old device to the new without having to repair everything. So I shut down Home Assistant and backed up the CC2531 network information using zigpy-znp (which is already installed for Home Assistant):
    python3 -m /dev/zigbee > cc2531-network.json
    I copied the backup to cc2538-network.json and modified the coordinator_ieee to be the new device s MAC address (rather than end up with 2 devices claiming the same MAC if/when I reuse the CC2531) and did:
    python3 -m --input cc2538-network.json /dev/ttyUSB1
    The old CC2531 needed unplugged first, otherwise I got an RuntimeError: Network formation refused, RF environment is likely too noisy. Temporarily unscrew the antenna or shield the coordinator with metal until a network is formed. error. After that I updated my udev rules to map the CC2538 to /dev/zigbee and restarted Home Assistant. To my surprise it came up and detected the existing devices without any extra effort on my part. However that resulted in 2 coordinators being shown in the visualisation, with the old one turning up as unk_manufacturer. Fixing that involved editing /etc/homeassistant/.storage/core.device_registry and removing the entry which had the old MAC address, removing the device entry in /etc/homeassistant/.storage/ for the old MAC and then finally firing up sqlite to modify the Zigbee database:
    $ sqlite3 /etc/homeassistant/zigbee.db
    SQLite version 3.34.1 2021-01-20 14:10:07
    Enter ".help" for usage hints.
    sqlite> DELETE FROM devices_v6 WHERE ieee = '00:12:4b:00:11:11:11:11';
    sqlite> DELETE FROM endpoints_v6 WHERE ieee = '00:12:4b:00:11:11:11:11';
    sqlite> DELETE FROM in_clusters_v6 WHERE ieee = '00:12:4b:00:11:11:11:11';
    sqlite> DELETE FROM neighbors_v6 WHERE ieee = '00:12:4b:00:11:11:11:11' OR device_ieee = '00:12:4b:00:11:11:11:11';
    sqlite> DELETE FROM node_descriptors_v6 WHERE ieee = '00:12:4b:00:11:11:11:11';
    sqlite> DELETE FROM out_clusters_v6 WHERE ieee = '00:12:4b:00:11:11:11:11';
    sqlite> .quit
    So far it all seems a bit happier than with the CC2531; I ve been able to pair a light bulb that was previously detected but would not integrate, which suggests the range is improved. (This post another in the set of things I should write down so I can just grep my own website when I forget what I did to do foo .)

    3 January 2022

    Ian Jackson: Debian s approach to Rust - Dependency handling

    tl;dr: Faithfully following upstream semver, in Debian package dependencies, is a bad idea. Introduction I have been involved in Debian for a very long time. And I ve been working with Rust for a few years now. Late last year I had cause to try to work on Rust things within Debian. When I did, I found it very difficult. The Debian Rust Team were very helpful. However, the workflow and tooling require very large amounts of manual clerical work - work which it is almost impossible to do correctly since the information required does not exist. I had wanted to package a fairly straightforward program I had written in Rust, partly as a learning exercise. But, unfortunately, after I got stuck in, it looked to me like the effort would be wildly greater than I was prepared for, so I gave up. Since then I ve been thinking about what I learned about how Rust is packaged in Debian. I think I can see how to fix some of the problems. Although I don t want to go charging in and try to tell everyone how to do things, I felt I ought at least to write up my ideas. Hence this blog post, which may become the first of a series. This post is going to be about semver handling. I see problems with other aspects of dependency handling and source code management and traceability as well, and of course if my ideas find favour in principle, there are a lot of details that need to be worked out, including some kind of transition plan. How Debian packages Rust, and build vs runtime dependencies Today I will be discussing almost entirely build-dependencies; Rust doesn t (yet?) support dynamic linking, so built Rust binaries don t have Rusty dependencies. However, things are a bit confusing because even the Debian binary packages for Rust libraries contain pure source code. So for a Rust library package, building the Debian binary package from the Debian source package does not involve running the Rust compiler; it s just file-copying and format conversion. The library s Rust dependencies do not need to be installed on the build machine for this. So I m mostly going to be talking about Depends fields, which are Debian s way of talking about runtime dependencies, even though they are used only at build-time. The way this works is that some ultimate leaf package (which is supposed to produce actual executable code) Build-Depends on the libraries it needs, and those Depends on their under-libraries, so that everything needed is installed. What do dependencies mean and what are they for anyway? In systems where packages declare dependencies on other packages, it generally becomes necessary to support versioned dependencies. In all but the most simple systems, this involves an ordering (or similar) on version numbers and a way for a package A to specify that it depends on certain versions of B. Both Debian and Rust have this. Rust upstream crates have version numbers and can specify their dependencies according to semver. Debian s dependency system can represent that. So it was natural for the designers of the scheme for packaging Rust code in Debian to simply translate the Rust version dependencies to Debian ones. However, while the two dependency schemes seem equivalent in the abstract, their concrete real-world semantics are totally different. These different package management systems have different practices and different meanings for dependencies. (Interestingly, the Python world also has debates about the meaning and proper use of dependency versions.) The epistemological problem Consider some package A which is known to depend on B. In general, it is not trivial to know which versions of B will be satisfactory. I.e., whether a new B, with potentially-breaking changes, will actually break A. Sometimes tooling can be used which calculates this (eg, the Debian shlibdeps system for runtime dependencies) but this is unusual - especially for build-time dependencies. Which versions of B are OK can normally only be discovered by a human consideration of changelogs etc., or by having a computer try particular combinations. Few ecosystems with dependencies, in the Free Software community at least, make an attempt to precisely calculate the versions of B that are actually required to build some A. So it turns out that there are three cases for a particular combination of A and B: it is believed to work; it is known not to work; and: it is not known whether it will work. And, I am not aware of any dependency system that has an explicit machine-readable representation for the unknown state, so that they can say something like A is known to depend on B; versions of B before v1 are known to break; version v2 is known to work . (Sometimes statements like that can be found in human-readable docs.) That leaves two possibilities for the semantics of a dependency A depends B, version(s) V..W: Precise: A will definitely work if B matches V..W, and Optimistic: We have no reason to think B breaks with any of V..W. At first sight the latter does not seem useful, since how would the package manager find a working combination? Taking Debian as an example, which uses optimistic version dependencies, the answer is as follows: The primary information about what package versions to use is not only the dependencies, but mostly in which Debian release is being targeted. (Other systems using optimistic version dependencies could use the date of the build, i.e. use only packages that are current .)
    Precise Optimistic
    People involved in version management Package developers,
    downstream developers/users.
    Package developers,
    downstream developer/users,
    distribution QA and release managers.
    Package developers declare versions V and dependency ranges V..W so that It definitely works. A wide range of B can satisfy the declared requirement.
    The principal version data used by the package manager Only dependency versions. Contextual, eg, Releases - set(s) of packages available.
    Version dependencies are for Selecting working combinations (out of all that ever existed). Sequencing (ordering) of updates; QA.
    Expected use pattern by a downstream Downstream can combine any
    declared-good combination.
    Use a particular release of the whole system. Mixing-and-matching requires additional QA and remedial work.
    Downstreams are protected from breakage by Pessimistically updating versions and dependencies whenever anything might go wrong. Whole-release QA.
    A substantial deployment will typically contain Multiple versions of many packages. A single version of each package, except where there are actual incompatibilities which are too hard to fix.
    Package updates are driven by Top-down:
    Depending package updates the declared metadata.
    Depended-on package is updated in the repository for the work-in-progress release.
    So, while Rust and Debian have systems that look superficially similar, they contain fundamentally different kinds of information. Simply representing the Rust versions directly into Debian doesn t work. What is currently done by the Debian Rust Team is to manually patch the dependency specifications, to relax them. This is very labour-intensive, and there is little automation supporting either decisionmaking or actually applying the resulting changes. What to do Desired end goal To update a Rust package in Debian, that many things depend on, one need simply update that package. Debian s sophisticated build and CI infrastructure will try building all the reverse-dependencies against the new version. Packages that actually fail against the new dependency are flagged as suffering from release-critical problems. Debian Rust developers then update those other packages too. If the problems turn out to be too difficult, it is possible to roll back. If a problem with a depending packages is not resolved in a timely fashion, priority is given to updating core packages, and the depending package falls by the wayside (since it is empirically unmaintainable, given available effort). There is no routine manual patching of dependency metadata (or of anything else). Radical proposal Debian should not precisely follow upstream Rust semver dependency information. Instead, Debian should optimistically try the combinations of packages that we want to have. The resulting breakages will be discovered by automated QA; they will have to be fixed by manual intervention of some kind, but usually, simply updating the depending package will be sufficient. This no longer ensures (unlike the upstream Rust scheme) that the result is expected to build and work if the dependencies are satisfied. But as discussed, we don t really need that property in Debian. More important is the new property we gain: that we are able to mix and match versions that we find work in practice, without a great deal of manual effort. Or to put it another way, in Debian we should do as a Rust upstream maintainer does when they do the regular update dependencies for new semvers task: we should update everything, see what breaks, and fix those. (In theory a Rust upstream package maintainer is supposed to do some additional checks or something. But the practices are not standardised and any checks one does almost never reveal anything untoward, so in practice I think many Rust upstreams just update and see what happens. The Rust upstream community has other mechanisms - often, reactive ones - to deal with any problems. Debian should subscribe to those same information sources, eg RustSec.) Nobbling cargo Somehow, when cargo is run to build Rust things against these Debian packages, cargo s dependency system will have to be overridden so that the version of the package that is actually selected by Debian s package manager is used by cargo without complaint. We probably don t want to change the Rust version numbers of Debian Rust library packages, so this should be done by either presenting cargo with an automatically-massaged Cargo.toml where the dependency version restrictions are relaxed, or by using a modified version of cargo which has special option(s) to relax certain dependencies. Handling breakage Rust packages in Debian should already be provided with autopkgtests so that will detect build breakages. Build breakages will stop the updated dependency from migrating to the work-in-progress release, Debian testing. To resolve this, and allow forward progress, we will usually upload a new version of the dependency containing an appropriate Breaks, and either file an RC bug against the depending package, or update it. This can be done after the upload of the base package. Thus, resolution of breakage due to incompatibilities will be done collaboratively within the Debian archive, rather than ad-hoc locally. And it can be done without blocking. My proposal prioritises the ability to make progress in the core, over stability and in particular over retaining leaf packages. This is not Debian s usual approach but given the Rust ecosystem s practical attitudes to API design, versioning, etc., I think the instability will be manageable. In practice fixing leaf packages is not usually really that hard, but it s still work and the question is what happens if the work doesn t get done. After all we are always a shortage of effort - and we probably still will be, even if we get rid of the makework clerical work of patching dependency versions everywhere (so that usually no work is needed on depending packages). Exceptions to the one-version rule There will have to be some packages that we need to keep multiple versions of. We won t want to update every depending package manually when this happens. Instead, we ll probably want to set a version number split: rdepends which want version <X will get the old one. Details - a sketch I m going to sketch out some of the details of a scheme I think would work. But I haven t thought this through fully. This is still mostly at the handwaving stage. If my ideas find favour, we ll have to do some detailed review and consider a whole bunch of edge cases I m glossing over. The dependency specification consists of two halves: the depending .deb s Depends (or, for a leaf package, Build-Depends) and the base .deb Version and perhaps Breaks and Provides. Even though libraries vastly outnumber leaf packages, we still want to avoid updating leaf Debian source packages simply to bump dependencies. Dependency encoding proposal Compared to the existing scheme, I suggest we implement the dependency relaxation by changing the depended-on package, rather than the depending one. So we retain roughly the existing semver translation for Depends fields. But we drop all local patching of dependency versions. Into every library source package we insert a new Debian-specific metadata file declaring the earliest version that we uploaded. When we translate a library source package to a .deb, the binary package build adds Provides for every previous version. The effect is that when one updates a base package, the usual behaviour is to simply try to use it to satisfy everything that depends on that base package. The Debian CI will report the build or test failures of all the depending packages which the API changes broke. We will have a choice, then: Breakage handling - update broken depending packages individually If there are only a few packages that are broken, for each broken dependency, we add an appropriate Breaks to the base binary package. (The version field in the Breaks should be chosen narrowly, so that it is possible to resolve it without changing the major version of the dependency, eg by making a minor source change.) When can then do one of the following: Breakage handling - declare new incompatible API in Debian If the API changes are widespread and many dependencies are affected, we should represent this by changing the in-Debian-source-package metadata to arrange for fewer Provides lines to be generated - withdrawing the Provides lines for earlier APIs. Hopefully examination of the upstream changelog will show what the main compat break is, and therefore tell us which Provides we still want to retain. This is like declaring Breaks for all the rdepends. We should do it if many rdepends are affected. Then, for each rdependency, we must choose one of the responses in the bullet points above. In practice this will often be a mass bug filing campaign, or large update campaign. Breakage handling - multiple versions Sometimes there will be a big API rewrite in some package, and we can t easily update all of the rdependencies because the upstream ecosystem is fragmented and the work involved in reconciling it all is too substantial. When this happens we will bite the bullet and include multiple versions of the base package in Debian. The old version will become a new source package with a version number in its name. This is analogous to how key C/C++ libraries are handled. Downsides of this scheme The first obvious downside is that assembling some arbitrary set of Debian Rust library packages, that satisfy the dependencies declared by Debian, is no longer necessarily going to work. The combinations that Debian has tested - Debian releases - will work, though. And at least, any breakage will affect only people building Rust code using Debian-supplied libraries. Another less obvious problem is that because there is no such thing as Build-Breaks (in a Debian binary package), the per-package update scheme may result in no way to declare that a particular library update breaks the build of a particular leaf package. In other words, old source packages might no longer build when exposed to newer versions of their build-dependencies, taken from a newer Debian release. This is a thing that already happens in Debian, with source packages in other languages, though. Semver violation I am proposing that Debian should routinely compile Rust packages against dependencies in violation of the declared semver, and ship the results to Debian s millions of users. This sounds quite alarming! But I think it will not in fact lead to shipping bad binaries, for the following reasons: The Rust community strongly values safety (in a broad sense) in its APIs. An API which is merely capable of insecure (or other seriously bad) use is generally considered to be wrong. For example, such situations are regarded as vulnerabilities by the RustSec project, even if there is no suggestion that any actually-broken caller source code exists, let alone that actually-broken compiled code is likely. The Rust community also values alerting programmers to problems. Nontrivial semantic changes to APIs are typically accompanied not merely by a semver bump, but also by changes to names or types, precisely to ensure that broken combinations of code do not compile. Or to look at it another way, in Debian we would simply be doing what many Rust upstream developers routinely do: bump the versions of their dependencies, and throw it at the wall and hope it sticks. We can mitigate the risks the same way a Rust upstream maintainer would: when updating a package we should of course review the upstream changelog for any gotchas. We should look at RustSec and other upstream ecosystem tracking and authorship information. Difficulties for another day As I said, I see some other issues with Rust in Debian. Thanks all for your attention!

    comment count unavailable comments

    30 December 2021

    Chris Lamb: Favourite books of 2021: Non-fiction

    As a follow-up to yesterday's post listing my favourite memoirs and biographies I read in 2021, today I'll be outlining my favourite works of non-fiction. Books that just missed the cut include: The Unusual Suspect by Ben Machell for its thrilleresque narrative of a modern-day Robin Hood (and if you get to the end, a completely unexpected twist); Paul Fussell's Class: A Guide to the American Status System as an amusing chaser of sorts to Kate Fox's Watching the English; John Carey's Little History of Poetry for its exhilarating summation of almost four millennia of verse; David Graeber's Debt: The First 5000 Years for numerous historical insights, not least its rejoinder to our dangerously misleading view of ancient barter systems; and, although I didn't treasure everything about it, I won't hesitate to gift Pen Vogler's Scoff to a number of friends over the next year. The weakest book of non-fiction I read this year was undoubtedly Roger Scruton's How to Be a Conservative: I much preferred The Decadent Society for Ross Douthat for my yearly ration of the 'intellectual right'. I also very much enjoyed reading a number of classic texts from academic sociology, but they are difficult to recommend or even summarise. These included One-Dimensional Man by Herbert Marcuse, Postmodernism: Or, the Cultural Logic of Late Capitalism by Frederic Jameson and The Protestant Ethic and the Spirit of Capitalism by Max Weber. 'These are heavy books', remarks John Proctor in Arthur Miller's The Crucible... All round-up posts for 2021: Memoir/biography, Non-fiction (this post) & Fiction (coming soon).

    Hidden Valley Road (2020) Robert Kolker A compelling and disturbing account of the Galvin family six of whom were diagnosed with schizophrenia which details a journey through the study and misunderstanding of the condition. The story of the Galvin family offers a parallel history of the science of schizophrenia itself, from the era of institutionalisation, lobotomies and the 'schizo mother', to the contemporary search for genetic markers for the disease... all amidst fundamental disagreements about the nature of schizophrenia and, indeed, of all illnesses of the mind. Samples of the Galvins' DNA informed decades of research which, curiously, continues to this day, potentially offering paths to treatment, prediction and even eradication of the disease, although on this last point I fancy that I detect a kind of neo-Victorian hubris that we alone will be the ones to find a cure. Either way, a gentle yet ultimately tragic view of a curiously 'American' family, where the inherent lack of narrative satisfaction brings a frustration and sadness of its own.

    Islands of Abandonment: Life in the Post-Human Landscape (2021) Cat Flyn In this disarmingly lyrical book, Cat Flyn addresses the twin questions of what happens after humans are gone and how far can our damage to nature be undone. From the forbidden areas of post-war France to the mining regions of Scotland, Islands of Abandonment explores the extraordinary places where humans no longer live in an attempt to give us a glimpse into what happens when mankind's impact on nature is, for one reason or another, forced to stop. Needless to say, if anxieties in this area are not curdling away in your subconscious mind, you are probably in some kind of denial. Through a journey into desolate, eerie and ravaged areas in the world, this artfully-written study offers profound insights into human nature, eschewing the usual dry sawdust of Wikipedia trivia. Indeed, I summed it up to a close friend remarking that, through some kind of hilarious administrative error, the book's publisher accidentally dispatched a poet instead of a scientist to write this book. With glimmers of hope within the (mostly) tragic travelogue, Islands of Abandonment is not only a compelling read, but also a fascinating insight into the relationship between Nature and Man.

    The Anatomy of Fascism (2004) Robert O. Paxton Everyone is absolutely sure they know what fascism is... or at least they feel confident choosing from a buffet of features to suit the political mood. To be sure, this is not a new phenomenon: even as 'early' as 1946, George Orwell complained in Politics and the English Language that the word Fascism has now no meaning except in so far as it signifies something not desirable . Still, it has proved uncommonly hard to define the core nature of fascism and what differentiates it from related political movements. This is still of great significance in the twenty-first century, for the definition ultimately determines where the powerful label of 'fascist' can be applied today. Part of the enjoyment of reading this book was having my own cosy definition thoroughly dismantled and replaced with a robust system of abstractions and common themes. This is achieved through a study of the intellectual origins of fascism and how it played out in the streets of Berlin, Rome and Paris. Moreover, unlike Strongmen (see above), fascisms that failed to gain meaningful power are analysed too, including Oswald Mosley's British Union of Fascists. Curiously enough, Paxton's own definition of fascism is left to the final chapter, and by the time you reach it, you get an anti-climatic feeling of it being redundant. Indeed, whatever it actually is, fascism is really not quite like any other 'isms' at all, so to try and classify it like one might be a mistake. In his introduction, Paxton warns that many of those infamous images associated with fascism (eg. Hitler in Triumph of the Will, Mussolini speaking from a balcony, etc.) have the ability to induce facile errors about the fascist leader and the apparent compliance of the crowd. (Contemporary accounts often record how sceptical the common man was of the leader's political message, even if they were transfixed by their oratorical bombast.) As it happens, I thus believe I had something of an advantage of reading this via an audiobook, and completely avoided re-absorbing these iconic images. To me, this was an implicit reminder that, however you choose to reduce it to a definition, fascism is undoubtedly the most visual of all political forms, presenting itself to us in vivid and iconic primary images: ranks of disciplined marching youths, coloured-shirted militants beating up members of demonised minorities; the post-war pictures from the concentration camps... Still, regardless of you choose to read it, The Anatomy of Fascism is a powerful book that can teach a great deal about fascism in particular and history in general.

    What Good are the Arts? (2005) John Carey What Good are the Arts? takes a delightfully sceptical look at the nature of art, and cuts through the sanctimony and cant that inevitably surrounds them. It begins by revealing the flaws in lofty aesthetic theories and, along the way, debunks the claims that art makes us better people. They may certainly bring joy into your life, but by no means do the fine arts make you automatically virtuous. Carey also rejects the entire enterprise of separating things into things that are art and things that are not, making a thoroughly convincing case that there is no transcendental category containing so-called 'true' works of art. But what is perhaps equally important to what Carey is claiming is the way he does all this. As in, this is an extremely enjoyable book to read, with not only a fine sense of pace and language, but a devilish sense of humour as well. To be clear, What Good are the Arts? it is no crotchety monograph: Leo Tolstoy's *What Is Art? (1897) is hilarious to read in similar ways, but you can't avoid feeling its cantankerous tone holds Tolstoy's argument back. By contrast, Carey makes his argument in a playful sort of manner, in a way that made me slightly sad to read other polemics throughout the year. It's definitely not that modern genre of boomer jeremiad about the young, political correctness or, heaven forbid, 'cancel culture'... which, incidentally, made Carey's 2014 memoir, The Unexpected Professor something of a disappointing follow-up. Just for fun, Carey later undermines his own argument by arguing at length for the value of one art in particular. Literature, Carey asserts, is the only art capable of reasoning and the only art with the ability to criticise. Perhaps so, and Carey spends a chapter or so contending that fiction has the exclusive power to inspire the mind and move the heart towards practical ends... or at least far better than any work of conceptual art. Whilst reading this book I found myself taking down innumerable quotations and laughing at the jokes far more than I disagreed. And the sustained and intellectual style of polemic makes this a pretty strong candidate for my favourite overall book of the year.

    29 December 2021

    Noah Meyerhans: When You Could Hear Security Scans

    Have you ever wondered what a security probe of a computer sounded like? I d guess probably not, because on the face of it that doesn t make a whole lot of sense. But there was a time when I could very clearly discern the sound of a computer being scanned. It sounded like a small mechanical heart beat: Click-click click-click click-click Prior to 2010, I had a computer under my desk with what at the time were not unheard-of properties: Its storage was based on a stack of spinning metal platters (a now-antiquated device known as a hard drive ), and it had a publicly routable IPv4 address with an unfiltered connection to the Internet. Naturally it ran Linux and an ssh server. As was common in those days, service logging was handled by a syslog daemon. The syslog daemon would sort log messages based on various criteria and record them somewhere. In most simple environments, somewhere was simply a file on local storage. When writing to a local file, syslog daemons can be optionally configured to use the fsync() system call to ensure that writes are flushed to disk. Practically speaking, what this meant is that a page of disk-backed memory would be written to the disk as soon as an event occurred that triggered a log message. Because of potential performance implications, fsync() was not typically enabled for most log files. However, due to the more sensitive nature of authentication logs, it was often enabled for /var/log/auth.log. In the first decade of the 2000 s, there was a fairly unsophisticated worm loose on the Internet that would probe sshd with some common username/password combinations. The worm would pause for a second or so between login attempts, most likely in an effort to avoid automated security responses. The effect was that a system being probed by this worm would generate disk write every second, with a very distinct audible signature from the hard drive. I think this situation is a fun demonstration of a side-channel data leak. It s primitive and doesn t leak very much information, but it was certainly enough to make some inference about the state of the system in question. Of course, side-channel leakage issues have been a concern for ages, but I like this one for its simplicity. It was something that could be explained and demonstrated easily, even to somebody with relatively limited understanding of how computers work , unlike, for instance measuring electromagnetic emanations from CPU power management units. For a different take on the sounds of a computing infrastructure, Peep (The Network Auralizer) won an award at a USENIX conference long, long ago. I d love to see a modern deployment of such a system. I m sure you could build something for your cloud deployment using something like AWS EventBridge or Amazon SQS fairly easily. For more on research into actual real-world side-channel attacks, you can read A Survey of Microarchitectural Side-channel Vulnerabilities, Attacks and Defenses in Cryptography or A Survey of Electromagnetic Side-Channel Attacks and Discussion on their Case-Progressing Potential for Digital Forensics.

    27 December 2021

    Abiola Ajadi: Outreachy-Everyone Struggles!

    Hello Everyone! Three weeks into my internship and it s been great so far with Awesome Mentors. I am currently learning a new Language which is Ruby and this is the perfect time to remind myself that everyone struggles! I struggled a bit getting farmiliar with the codebase and pushing my first merge request during the internship. I won t say i have a perfect understanding of how everything works, but i am learning.

    What is Debci? Debci stands for Debian Continous Integration before i move on, what is continous integration? according to Atlassian Continuous integration (CI) is the practice of automating the integration of code changes from multiple contributors into a single software project. It s a primary DevOps best practice, allowing developers to frequently merge code changes into a central repository where builds and tests then run. Automated tools are used to assert the new code s correctness before integration. From the official documentation; The Debian continuous integration (debci) is an automated system that coordinates the execution of automated tests against packages in the Debian system. Debci exist to make sure packages work currently after an update, How it does this is by testing all of the packages that have tests written in them to make sure it works and nothing is broken. It has a UI for developers to see if the test passes or not.

    Progress This week; I have learnt how to write automated test and Squashing multiple commits into one single commit.

    24 December 2021

    Vincent Bernat: Automatic login with startx and systemd

    If your workstation is using full-disk encryption, you may want to jump directly to your desktop environment after entering the passphrase to decrypt the disk. Many display managers like GDM and LightDM have an autologin feature. However, only GDM can run Xorg with standard user privileges. Here is an alternative using startx and a systemd service:
    Description=X11 session for bernat systemd-user-sessions.service
    ExecStartPre=/usr/bin/chvt 8
    ExecStart=/usr/bin/startx -- vt8 -keeptty -verbose 3 -logfile /dev/null
    Let me explain each block: Drop this unit in /etc/systemd/system/x11-autologin.service and enable it with systemctl enable x11-autologin.service. Xorg is now running rootless and logging into the journal. After using it for a few months, I didn t notice any regression compared to LightDM with autologin.

    1. For more information on how logind provides access to devices, see this blog post. The method names do not match the current implementation, but the concepts are still correct. Xorg takes control of the session when the TTY is active.
    2. Xorg could change the type of the session itself after taking control of it, but it does not.
    3. There is some code in Xorg to do that, but it is executed too late and fails with: xf86OpenConsole: VT_ACTIVATE failed: Operation not permitted.

    20 December 2021

    Craig Small: WordPress 5.8.2 Debian packages

    After a bit of a delay, WordPress version 5.8.2 packages should be available now. This is a minor update from 5.8.1 which fixes two bugs but not the security bug. The security bug is due to WordPress shipping its own CA store, which is a list of certificates it trusts to sign for websites. Debian WordPress has used the system certificate store which uses /etc/ssl/certs/ca-certificates.crt for years so is not impacted by this change. That CA file is generated by update-ca-certificates and is part of the ca-certificates package. We have also had another go of tamping down the nagging WordPress does about updates, as you cannot use the automatic updates through WordPress but via the usual Debian system. I see we are not fully there as WordPress has a site health page that doesn t like things turned off. The two bugs fixed in 5.8.2 I ve not personally hit, but they might help someone out there. In any case, an update is always good. Next stop 5.9 The next planned release is in late January 2022. I m sure there will be a new default theme, but they are planning on making big changes around the blocks and styles to make it easier to customise the look.

    16 December 2021

    Dirk Eddelbuettel: RProtoBuf 0.4.18: Multiple Updates

    A new release 0.4.18 of RProtoBuf arrived on CRAN earlier today. RProtoBuf provides R with bindings for the Google Protocol Buffers ( ProtoBuf ) data encoding and serialization library used and released by Google, and deployed very widely in numerous projects as a language and operating-system agnostic protocol. This release, the first since March of last year, contains two contributed pull requests improving or extending the package, some internal maintance updating the CI setup as well as retiring an old-yet-unused stub interface for RPC, as well as an update for UCRT builds on Windows. The following section from the NEWS.Rd file has more details.

    Changes in RProtoBuf version 0.4.18 (2021-12-15)
    • Support string_view in FindMethodByName() (Adam Cozzette in #72).
    • CI use was updated first at Travis, later at GitHub and now uses r-ci (Dirk in #74 and (parts of) #76).
    • The (to the best of our knowledge) unused minimal RPC mechanism has been removed, retiring one method and one class as well as the import of the RCurl package (Dirk in #76).
    • The toJSON() method supports two (upstream) formatting toggles (Vitali Spinu in #79 with minor edit by Dirk).
    • Windows UCRT builds are now supported (Jeroen in #81, Dirk and Tomas Kalibera in #82).

    Thanks to my CRANberries, there is a diff to the previous release. The RProtoBuf page has copies of the (older) package vignette, the quick overview vignette, and the pre-print of our JSS paper. Questions, comments etc should go to the GitHub issue tracker off the GitHub repo. If you like this or other open-source work I do, you can now sponsor me at GitHub.

    This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

    15 December 2021

    Dirk Eddelbuettel: nanotime 0.3.5 on CRAN: Update

    Another (minor) nanotime release, now at version 0.3.5, just arrived at CRAN. It follows the updates RDieHarder 0.2.3 and RcppCCTZ 0.2.10 earlier today in bringing a patch kindly prepared by Tomas Kalibera for the upcoming (and very useful) UCRT changes for Windows involving small build changes for the updated Windows toolchain. nanotime relies on the RcppCCTZ package for (efficient) high(er) resolution time parsing and formatting up to nanosecond resolution, and the bit64 package for the actual integer64 arithmetic. Initially implemented using the S3 system, it has benefitted greatly from a rigorous refactoring by Leonardo who not only rejigged nanotime internals in S4 but also added new S4 types for periods, intervals and durations. The NEWS snippet adds more details.

    Changes in version 0.3.5 (2021-12-14)
    • Applied patch by Tomas Kalibera for Windows UCRT under the upcoming R 4.2.0 expected for April.

    Thanks to my CRANberries there is also a diff to the previous version. More details and examples are at the nanotime page; code, issue tickets etc at the GitHub repository. If you like this or other open-source work I do, you can now sponsor me at GitHub.

    This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

    Dirk Eddelbuettel: RcppCCTZ 0.2.10: Updates

    A new release 0.2.10 of RcppCCTZ is now on CRAN. RcppCCTZ uses Rcpp to bring CCTZ to R. CCTZ is a C++ library for translating between absolute and civil times using the rules of a time zone. In fact, it is two libraries. One for dealing with civil time: human-readable dates and times, and one for converting between between absolute and civil times via time zones. And while CCTZ is made by Google(rs), it is not an official Google product. The RcppCCTZ page has a few usage examples and details. This package was the first CRAN package to use CCTZ; by now four others packages include its sources too. Not ideal, but beyond our control. This version switches to r-ci, and just like RDieHarder includes a patch kindly prepared by Tomas Kalibera for the upcoming (and very useful) UCRT changes for Windows involving small build changes for the updated Windows toolchain.

    Changes in version 0.2.10 (2021-12-14)
    • Switch CI use to r-ci
    • Applied patch by Tomas Kalibera for Windows UCRT under the upcoming R 4.2.0 expected for April.

    We also have a diff to the previous version thanks to my CRANberries. More details are at the RcppCCTZ page; code, issue tickets etc at the GitHub repository. If you like this or other open-source work I do, you can now sponsor me at GitHub.

    This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

    14 December 2021

    Dirk Eddelbuettel: RDieHarder 0.2.3 on CRAN: Packaging Updates

    An new version 0.2.3 of the random-number generator tester RDieHarder (based on the DieHarder suite developed / maintained by Robert Brown with contributions by David Bauer and myself) is now on CRAN. This release comes only about one and half months after the previous release 0.2.2 and is once again related to R and CRAN changes. The upcoming (and very useful) UCRT changes for Windows involve small build changes for the updated Windows toolchain so this release includes a patch kindly prepared by Tomas Kalibera. And because compilers get cleverer and cleverer over time, I also address a warning and error found by the newest gcc in what is otherwise unchanged and years old C code In addition, two other warnings were fixed right after the previous release. Thanks to CRANberries, you can also look at the most recent diff. If you like this or other open-source work I do, you can now sponsor me at GitHub.

    This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

    12 December 2021

    Russell Coker: Some Ideas for Debian Security Improvements

    Debian security is pretty good, but there s always scope for improvement. Here are some ideas that I think could be used to improve things.
    1. A security wizard , basically a set of scripts with support for plugins that will investigate your system and look for things that can be improved. It could give suggestions on LSMs that could be used, sysctl settings, lists of daemons running as root that possibly don t need root privs, etc. Plugins could be for different daemons, so there could be a plugin for Apache that looks for potential issues with Apache configuration. It wouldn t be possible to cover everything, but it would be possible to cover many common cases.
      It appears that we used to have a harden package to do some of these things which disappeared. It appears that the only remnant of that is the hardening-runtime package.
    2. Kali Linux [1] is a distribution designed for penetration testing, I recently tried out many of it s features and I was very impressed. While I don t think that the aim should be to copy all Kali features into Debian there are probably some that are worthy of inclusion. Most Kali features run well in a VM, but the Wifi penetration testing tools need access to the hardware, so they could be a good candidate for inclusion in Debian (license permitting).
    3. We have a Securing Debian Manual [2] that is really good. It s a little out of date and needs some contributions, it also needs to be better known.
    4. The Security Management page of the Debian wiki [3] has links to a number of pages about improving system security. It needs some updates, it doesn t have a link to a page about SE Linux so there s some work for me to do there.
    5. Can training help people? I would be happy to run some Debian SE Linux training sessions over Matrix or Jitsi. We can probably find people to offer training on other aspects of Linux security that are implemented in Debian if there is an audience. I don t think that I and other DDs (Debian Developers) can train everyone, but we could train people who then go on to run other training sessions and make the session notes etc available under the GPL.
      There would also be some benefits to training other DDs as probably no-one has a good overview of all the security features that are supported.
    Any other ideas? Feel free to comment here or start a thread on a public mailing list. If you start a mailing list discussion please email me or comment here with the URL if it s a list that I m not on so I can track it via the archives. This post was inspired by a discussion on a private list of a related topic. I think it s better to have a public discussion instead.

    6 December 2021

    Matthias Klumpp: New things in AppStream 0.15

    On the road to AppStream 1.0, a lot of items from the long todo list have been done so far only one major feature is remaining, external release descriptions, which is a tricky one to implement and specify. For AppStream 1.0 it needs to be present or be rejected though, as it would be a major change in how release data is handled in AppStream. Besides 1.0 preparation work, the recent 0.15 release and the releases before it come with their very own large set of changes, that are worth a look and may be interesting for your application to support. But first, for a change that affects the implementation and not the XML format: 1. Completely rewritten caching code Keeping all AppStream data in memory is expensive, especially if the data is huge (as on Debian and Ubuntu with their large repositories generated from desktop-entry files as well) and if processes using AppStream are long-running. The latter is more and more the case, not only does GNOME Software run in the background, KDE uses AppStream in KRunner and Phosh will use it too for reading form factor information. Therefore, AppStream via libappstream provides an on-disk cache that is memory-mapped, so data is only consuming RAM if we are actually doing anything with it. Previously, AppStream used an LMDB-based cache in the background, with indices for fulltext search and other common search operations. This was a very fast solution, but also came with limitations, LMDB s maximum key size of 511 bytes became a problem quite often, adjusting the maximum database size (since it has to be set at opening time) was annoyingly tricky, and building dedicated indices for each search operation was very inflexible. In addition to that, the caching code was changed multiple times in the past to allow system-wide metadata to be cached per-user, as some distributions didn t (want to) build a system-wide cache and therefore ran into performance issues when XML was parsed repeatedly for generation of a temporary cache. In addition to all that, the cache was designed around the concept of one cache for data from all sources , which meant that we had to rebuild it entirely if just a small aspect changed, like a MetaInfo file being added to /usr/share/metainfo, which was very inefficient. To shorten a long story, the old caching code was rewritten with the new concepts of caches not necessarily being system-wide and caches existing for more fine-grained groups of files in mind. The new caching code uses Richard Hughes excellent libxmlb internally for memory-mapped data storage. Unlike LMDB, libxmlb knows about the XML document model, so queries can be much more powerful and we do not need to build indices manually. The library is also already used by GNOME Software and fwupd for parsing of (refined) AppStream metadata, so it works quite well for that usecase. As a result, search queries via libappstream are now a bit slower (very much depends on the query, roughly 20% on average), but can be mmuch more powerful. The caching code is a lot more robust, which should speed up startup time of applications. And in addition to all of that, the AsPool class has gained a flag to allow it to monitor AppStream source data for changes and refresh the cache fully automatically and transparently in the background. All software written against the previous version of the libappstream library should continue to work with the new caching code, but to make use of some of the new features, software using it may need adjustments. A lot of methods have been deprecated too now. 2. Experimental compose support Compiling MetaInfo and other metadata into AppStream collection metadata, extracting icons, language information, refining data and caching media is an involved process. The appstream-generator tool does this very well for data from Linux distribution sources, but the tool is also pretty heavyweight with lots of knobs to adjust, an underlying database and a complex algorithm for icon extraction. Embedding it into other tools via anything else but its command-line API is also not easy (due to D s GC initialization, and because it was never written with that feature in mind). Sometimes a simpler tool is all you need, so the libappstream-compose library as well as appstreamcli compose are being developed at the moment. The library contains building blocks for developing a tool like appstream-generator while the cli tool allows to simply extract metadata from any directory tree, which can be used by e.g. Flatpak. For this to work well, a lot of appstream-generator s D code is translated into plain C, so the implementation stays identical but the language changes. Ultimately, the generator tool will use libappstream-compose for any general data refinement, and only implement things necessary to extract data from the archive of distributions. New applications (e.g. for new bundling systems and other purposes) can then use the same building blocks to implement new data generators similar to appstream-generator with ease, sharing much of the code that would be identical between implementations anyway. 2. Supporting user input controls Want to advertise that your application supports touch input? Keyboard input? Has support for graphics tablets? Gamepads? Sure, nothing is easier than that with the new control relation item and supports relation kind (since 0.12.11 / 0.15.0, details):
    3. Defining minimum display size requirements Some applications are unusable below a certain window size, so you do not want to display them in a software center that is running on a device with a small screen, like a phone. In order to encode this information in a flexible way, AppStream now contains a display_length relation item to require or recommend a minimum (or maximum) display size that the described GUI application can work with. For example:
      <display_length compare="ge">360</display_length>
    This will make the application require a display length greater or equal to 300 logical pixels. A logical pixel (also device independent pixel) is the amount of pixels that the application can draw in one direction. Since screens, especially phone screens but also screens on a desktop, can be rotated, the display_length value will be checked against the longest edge of a display by default (by explicitly specifying the shorter edge, this can be changed). This feature is available since 0.13.0, details. See also Tobias Bernard s blog entry on this topic. 4. Tags This is a feature that was originally requested for the LVFS/fwupd, but one of the great things about AppStream is that we can take very project-specific ideas and generalize them so something comes out of them that is useful for many. The new tags tag allows people to tag components with an arbitrary namespaced string. This can be useful for project-internal organization of applications, as well as to convey certain additional properties to a software center, e.g. an application could mark itself as featured in a specific software center only. Metadata generators may also add their own tags to components to improve organization. AppStream gives no recommendations as to how these tags are to be interpreted except for them being a strictly optional feature. So any meaning is something clients and metadata authors need to negotiate. It therefore is a more specialized usecase of the already existing custom tag, and I expect it to be primarily useful within larger organizations that produce a lot of software components that need sorting. For example:
      <tag namespace="lvfs">vendor-2021q1</tag>
      <tag namespace="plasma">featured</tag>
    This feature is available since 0.15.0, details. 5. MetaInfo Creator changes The MetaInfo Creator (source) tool is a very simple web application that provides you with a form to fill out and will then generate MetaInfo XML to add to your project after you have answered all of its questions. It is an easy way for developers to add the required metadata without having to read the specification or any guides at all. Recently, I added support for the new control and display_length tags, resolved a few minor issues and also added a button to instantly copy the generated output to clipboard so people can paste it into their project. If you want to create a new MetaInfo file, this tool is the best way to do it! The creator tool will also not transfer any data out of your webbrowser, it is strictly a client-side application. And that is about it for the most notable changes in AppStream land! Of course there is a lot more, additional tags for the LVFS and content rating have been added, lots of bugs have been squashed, the documentation has been refined a lot and the library has gained a lot of new API to make building software centers easier. Still, there is a lot to do and quite a few open feature requests too. Onwards to 1.0!

    5 December 2021

    Reproducible Builds: Reproducible Builds in November 2021

    Welcome to the November 2021 report from the Reproducible Builds project. As a quick recap, whilst anyone may inspect the source code of free software for malicious flaws, almost all software is distributed to end users as pre-compiled binaries. The motivation behind the reproducible builds effort is therefore to ensure no flaws have been introduced during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised. If you are interested in contributing to our project, please visit our Contribute page on our website.
    On November 6th, Vagrant Cascadian presented at this year s edition of the SeaGL conference, giving a talk titled Debugging Reproducible Builds One Day at a Time:
    I ll explore how I go about identifying issues to work on, learn more about the specific issues, recreate the problem locally, isolate the potential causes, dissect the problem into identifiable parts, and adapt the packaging and/or source code to fix the issues.
    A video recording of the talk is available on
    Fedora Magazine published a post written by Zbigniew J drzejewski-Szmek about how to Use Diffoscope in packager workflows, specifically around ensuring that new versions of a package do not introduce breaking changes:
    In the role of a packager, updating packages is a recurring task. For some projects, a packager is involved in upstream maintenance, or well written release notes make it easy to figure out what changed between the releases. This isn t always the case, for instance with some small project maintained by one or two people somewhere on GitHub, and it can be useful to verify what exactly changed. Diffoscope can help determine the changes between package releases. [ ]

    kpcyrd announced the release of rebuilderd version 0.16.3 on our mailing list this month, adding support for builds to generate multiple artifacts at once.
    Lastly, we held another IRC meeting on November 30th. As mentioned in previous reports, due to the global events throughout 2020 etc. there will be no in-person summit event this year.

    diffoscope diffoscope is our in-depth and content-aware diff utility. Not only can it locate and diagnose reproducibility issues, it can provide human-readable diffs from many kinds of binary formats. This month, Chris Lamb made the following changes, including preparing and uploading versions 190, 191, 192, 193 and 194 to Debian:
    • New features:
      • Continue loading a .changes file even if the referenced files do not exist, but include a comment in the returned diff. [ ]
      • Log the reason if we cannot load a Debian .changes file. [ ]
    • Bug fixes:
      • Detect XML files as XML files if file(1) claims if they are XML files or if they are named .xml. (#999438)
      • Don t duplicate file lists at each directory level. (#989192)
      • Don t raise a traceback when comparing nested directories with non-directories. [ ]
      • Re-enable test_android_manifest. [ ]
      • Don t reject Debian .changes files if they contain non-printable characters. [ ]
    • Codebase improvements:
      • Avoid aliasing variables if we aren t going to use them. [ ]
      • Use isinstance over type. [ ]
      • Drop a number of unused imports. [ ]
      • Update a bunch of %-style string interpolations into f-strings or str.format. [ ]
      • When pretty-printing JSON, mark the difference as being reformatted, additionally avoiding including the full path. [ ]
      • Import itertools top-level module directly. [ ]
    Chris Lamb also made an update to the command-line client to trydiffoscope, a web-based version of the diffoscope in-depth and content-aware diff utility, specifically only waiting for 2 minutes for to respond in tests. (#998360) In addition Brandon Maier corrected an issue where parts of large diffs were missing from the output [ ], Zbigniew J drzejewski-Szmek fixed some logic in the assert_diff_startswith method [ ] and Mattia Rizzolo updated the packaging metadata to denote that we support both Python 3.9 and 3.10 [ ] as well as a number of warning-related changes[ ][ ]. Vagrant Cascadian also updated the diffoscope package in GNU Guix [ ][ ].

    Distribution work In Debian, Roland Clobus updated the wiki page documenting Debian reproducible Live images to mention some new bug reports and also posted an in-depth status update to our mailing list. In addition, 90 reviews of Debian packages were added, 18 were updated and 23 were removed this month adding to our knowledge about identified issues. Chris Lamb identified a new toolchain issue, absolute_path_in_cmake_file_generated_by_meson.
    Work has begun on classifying reproducibility issues in packages within the Arch Linux distribution. Similar to the analogous effort within Debian (outlined above), package information is listed in a human-readable packages.yml YAML file and a sibling file shows how to classify packages too. Finally, Bernhard M. Wiedemann posted his monthly reproducible builds status report for openSUSE and Vagrant Cascadian updated a link on our website to link to the GNU Guix reproducibility testing overview [ ].

    Software development The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including: Elsewhere, in software development, Jonas Witschel updated strip-nondeterminism, our tool to remove specific non-deterministic results from a completed build so that it did not fail on JAR archives containing invalid members with a .jar extension [ ]. This change was later uploaded to Debian by Chris Lamb. reprotest is the Reproducible Build s project end-user tool to build the same source code twice in widely different environments and checking whether the binaries produced by the builds have any differences. This month, Mattia Rizzolo overhauled the Debian packaging [ ][ ][ ] and fixed a bug surrounding suffixes in the Debian package version [ ], whilst Stefano Rivera fixed an issue where the package tests were broken after the removal of diffoscope from the package s strict dependencies [ ].

    Testing framework The Reproducible Builds project runs a testing framework at, to check packages and other artifacts for reproducibility. This month, the following changes were made:
    • Holger Levsen:
      • Document the progress in setting up [ ]
      • Add the packages required for debian-snapshot. [ ]
      • Make the dstat package available on all Debian based systems. [ ]
      • Mark virt32b-armhf and virt64b-armhf as down. [ ]
    • Jochen Sprickerhof:
      • Add SSH authentication key and enable access to the osuosl168-amd64 node. [ ][ ]
    • Mattia Rizzolo:
      • Revert reproducible Debian: mark virt(32 64)b-armhf as down - restored. [ ]
    • Roland Clobus (Debian live image generation):
      • Rename sid internally to unstable until an issue in the snapshot system is resolved. [ ]
      • Extend testing to include Debian bookworm too.. [ ]
      • Automatically create the Jenkins view to display jobs related to building the Live images. [ ]
    • Vagrant Cascadian:
      • Add a Debian package set group for the packages and tools maintained by the Reproducible Builds maintainers themselves. [ ]

    If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

    Paul Tagliamonte: Framing data (Part 4/5)

    This post is part of a series called "PACKRAT". If this is the first post you've found, it'd be worth reading the intro post first and then looking over all posts in the series.
    In the last post, we we were able to build a functioning Layer 1 PHY where we can encode symbols to transmit, and receive symbols on the other end, we re now at the point where we can encode and decode those symbols as bits and frame blocks of data, marking them with a Sender and a Destination for routing to the right host(s). This is a Layer 2 scheme in the OSI model, which is otherwise known as the Data Link Layer. You re using one to view this website right now I m willing to bet your data is going through an Ethernet layer 2 as well as WiFi or maybe a cellular data protocol like 5G or LTE. Given that this entire exercise is hard enough without designing a complex Layer 2 scheme, I opted for simplicity in the hopes this would free me from the complexity and research that has gone into this field for the last 50 years. I settled on stealing a few ideas from Ethernet Frames namely, the use of MAC addresses to identify parties, and the EtherType field to indicate the Payload type. I also stole the idea of using a CRC at the end of the Frame to check for corruption, as well as the specific CRC method (crc32 using 0xedb88320 as the polynomial). Lastly, I added a callsign field to make life easier on ham radio frequencies if I was ever to seriously attempt to use a variant of this protocol over the air with multiple users. However, given this scheme is not a commonly used scheme, it s best practice to use a nearby radio to identify your transmissions on the same frequency while testing or use a Faraday box to test without transmitting over the airwaves. I added the callsign field in an effort to lean into the spirit of the Part 97 regulations, even if I relied on a phone emission to identify the Frames. As an aside, I asked the ARRL for input here, and their stance to me over email was I d be OK according to the regs if I were to stick to UHF and put my callsign into the BPSK stream using a widely understood encoding (even with no knowledge of PACKRAT, the callsign is ASCII over BPSK and should be easily demodulatable for followup with me). Even with all this, I opted to use FM phone to transmit my callsign when I was active on the air (specifically, using an SDR and a small bash script to automate transmission while I watched for interference or other band users). Right, back to the Frame:
    With all that done, I put that layout into a struct, so that we can marshal and unmarshal bytes to and from our Frame objects, and work with it in software.
    type FrameType [2]byte
    type Frame struct  
    Destination net.HardwareAddr
    Source net.HardwareAddr
    Callsign [8]byte
    Type FrameType
    Payload []byte
    CRC uint32

    Time to pick some consts I picked a unique and distinctive sync sequence, which the sender will transmit before the Frame, while the receiver listens for that sequence to know when it s in byte alignment with the symbol stream. My sync sequence is [3]byte 'U', 'f', '~' which works out to be a very pleasant bit sequence of 01010101 01100110 01111110. It s important to have soothing preambles for your Frames. We need all the good energy we can get at this point.
    var (
    FrameStart = [3]byte 'U', 'f', '~' 
    FrameMaxPayloadSize = 1500
    Next, I defined some FrameType values for the type field, which I can use to determine what is done with that data next, something Ethernet was originally missing, but has since grown to depend on (who needs Length anyway? Not me. See below!)
    FrameType Description Bytes
    Raw Bytes in the Payload field are opaque and not to be parsed. [2]byte 0x00, 0x01
    IPv4 Bytes in the Payload field are an IPv4 packet. [2]byte 0x00, 0x02
    And finally, I decided on a maximum length of the Payload, and decided on limiting it to 1500 bytes to align with the MTU of Ethernet.
    var (
    FrameTypeRaw = FrameType 0, 1 
    FrameTypeIPv4 = FrameType 0, 2 
    Given we know how we re going to marshal and unmarshal binary data to and from Frames, we can now move on to looking through the bit stream for our Frames.

    Why is there no Length field? I was initially a bit surprised that Ethernet Frames didn t have a Length field in use, but the more I thought about it, the more it seemed like a big ole' failure mode without a good implementation outcome. Either the Length is right (resulting in no action and used bits on every packet) or the Length is not the length of the Payload and the driver needs to determine what to do with the packet does it try and trim the overlong payload and ignore the rest? What if both the end of the read bytes and the end of the subset of the packet denoted by Length have a valid CRC? Which is used? Will everyone agree? What if Length is longer than the Payload but the CRC is good where we detected a lost carrer? I decided on simplicity. The end of a Frame is denoted by the loss of the BPSK carrier when the signal is no longer being transmitted (or more correctly, when the signal is no longer received), we know we ve hit the end of a packet. Missing a single symbol will result in the Frame being finalized. This can cause some degree of corruption, but it s also a lot easier than doing tricks like bit stuffing to create an end of symbol stream delimiter.

    Finding the Frame start in a Symbol Stream First thing we need to do is find our sync bit pattern in the symbols we re receiving from our BPSK demodulator. There s some smart ways to do this, but given that I m not much of a smart man, I again decided to go for simple instead. Given our incoming vector of symbols (which are still float values) prepend one at a time to a vector of floats that is the same length as the sync phrase, and compare against the sync phrase, to determine if we re in sync with the byte boundary within the symbol stream. The only trick here is that because we re using BPSK to modulate and demodulate the data, post phaselock we can be 180 degrees out of alignment (such that a +1 is demodulated as -1, or vice versa). To deal with that, I check against both the sync phrase as well as the inverse of the sync phrase (both [1, -1, 1] as well as [-1, 1, -1]) where if the inverse sync is matched, all symbols to follow will be inverted as well. This effectively turns our symbols back into bits, even if we re flipped out of phase. Other techniques like NRZI will represent a 0 or 1 by a change in phase state which is great, but can often cascade into long runs of bit errors, and is generally more complex to implement. That representation isn t ambiguous, given you look for a phase change, not the absolute phase value, which is incredibly compelling. Here s a notional example of how I ve been thinking about the phrase sliding window and how I ve been thinking of the checks. Each row is a new symbol taken from the BPSK receiver, and pushed to the head of the sliding window, moving all symbols back in the vector by one.
     var (
    sync = []float  ...  
    buf = make([]float, len(sync))
    incomingSymbols = []float  ...  
    for _, el := range incomingSymbols  
    copy(buf, buf[1:])
    buf[len(buf)-1] = el
    if compare(sync, buf)  
    // we're synced!
    Given the pseudocode above, let s step through what the checks would be doing at each step:
    Buffer Sync Inverse Sync
    [ ]float 0, ,0 [ ]float -1, ,-1 [ ]float 1, ,1
    [ ]float 0, ,1 [ ]float -1, ,-1 [ ]float 1, ,1
    [more bits in] [ ]float -1, ,-1 [ ]float 1, ,1
    [ ]float 1, ,1 [ ]float -1, ,-1 [ ]float 1, ,1
    After this notional set of comparisons, we know that at the last step, we are now aligned to the frame and byte boundary the next symbol / bit will be the MSB of the 0th Frame byte. Additionally, we know we re also 180 degrees out of phase, so we need to flip the symbol s sign to get the bit. From this point on we can consume 8 bits at a time, and re-assemble the byte stream. I don t know what this technique is called or even if this is used in real grown-up implementations, but it s been working for my toy implementation.

    Next Steps Now that we can read/write Frames to and from PACKRAT, the next steps here are going to be implementing code to encode and decode Ethernet traffic into PACKRAT, coming next in Part 5!

    3 December 2021

    Evgeni Golov: Dependency confusion in the Ansible Galaxy CLI

    I hope you enjoyed my last post about Ansible Galaxy Namespaces. In there I noted that I originally looked for something completely different and the namespace takeover was rather accidental. Well, originally I was looking at how the different Ansible content hosting services and their client (ansible-galaxy) behave in regard to clashes in naming of the hosted content. "Ansible content hosting services"?! There are currently three main ways for users to obtain Ansible content:
    • Ansible Galaxy - the original, community oriented, free hosting platform
    • Automation Hub - the place for Red Hat certified and supported content, available only with a Red Hat subscription, hosted by Red Hat
    • Ansible Automation Platform - the on-premise version of Automation Hub, syncs content from there and allows customers to upload own content
    Now the question I was curious about was: how would the tooling behave if different sources would offer identically named content? This was inspired by Alex Birsan: Dependency Confusion: How I Hacked Into Apple, Microsoft and Dozens of Other Companies and zofrex: Bundler is Still Vulnerable to Dependency Confusion Attacks (CVE - 2020 - 36327), who showed that the tooling for Python, Node.js and Ruby can be tricked into fetching content from "the wrong source", thus allowing an attacker to inject malicious code into a deployment. For the rest of this article, it's not important that there are different implementations of the hosting services, only that users can configure and use multiple sources at the same time. The problem is that, if the user configures their server_list to contain multiple Galaxy-compatible servers, like Ansible Galaxy and Automation Hub, and then asks to install a collection, the Ansible Galaxy CLI will ask every server in the list, until one returns a successful result. The exact order seems to differ between versions, but this doesn't really matter for the issue at hand. Imagine someone wants to install the redhat.satellite collection from Automation Hub (using ansible-galaxy collection install redhat.satellite). Now if their configuration defines Galaxy as the first, and Automation Hub as the second server, Galaxy is always asked whether it has redhat.satellite and only if the answer is negative, Automation Hub is asked. Today there is no redhat namespace on Galaxy, but there is a redhat user on GitHub, so The canonical answer to this issue is to use a requirements.yml file and setting the source parameter. This parameter allows you to express "regardless which sources are configured, please fetch this collection from here". That's is nice, but I think this not being the default syntax (contrary to what e.g. Bundler does) is a bad approach. Users might overlook the security implications, as the shorter syntax without the source just "magically" works. However, I think this is not even the main problem here. The documentation says: Once a collection is found, any of its requirements are only searched within the same Galaxy instance as the parent collection. The install process will not search for a collection requirement in a different Galaxy instance. But as it turns out, the source behavior was changed and now only applies to the exact collection it is set for, not for any dependencies this collection might have. For the sake of the example, imagine two collections: evgeni.test1 and evgeni.test2, where test2 declares a dependency on test1 in its galaxy.yml. Actually, no need to imagine, both collections are available in version 1.0.0 from and test1 version 2.0.0 is available from Now, given our recent reading of the docs, we craft the following requirements.yml:
    - name: evgeni.test2
      version: '*'
    In a perfect world, following the documentation, this would mean that both collections are fetched from, right? However, this is not what ansible-galaxy does. It will fetch evgeni.test2 from the specified source, determine it has a dependency on evgeni.test1 and fetch that from the "first" available source from the configuration. Take for example the following ansible.cfg:
    server_list = test_galaxy, release_galaxy, test_galaxy
    And try to install collections, using the above requirements.yml:
    % ansible-galaxy collection install -r requirements.yml -vvv                 
    ansible-galaxy 2.9.27
      config file = /home/evgeni/Devel/ansible-wtf/collections/ansible.cfg
      configured module search path = ['/home/evgeni/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
      ansible python module location = /usr/lib/python3.10/site-packages/ansible
      executable location = /usr/bin/ansible-galaxy
      python version = 3.10.0 (default, Oct  4 2021, 00:00:00) [GCC 11.2.1 20210728 (Red Hat 11.2.1-1)]
    Using /home/evgeni/Devel/ansible-wtf/collections/ansible.cfg as config file
    Reading requirement file at '/home/evgeni/Devel/ansible-wtf/collections/requirements.yml'
    Found installed collection theforeman.foreman:3.0.0 at '/home/evgeni/.ansible/collections/ansible_collections/theforeman/foreman'
    Process install dependency map
    Processing requirement collection 'evgeni.test2'
    Collection 'evgeni.test2' obtained from server explicit_requirement_evgeni.test2
    Opened /home/evgeni/.ansible/galaxy_token
    Processing requirement collection 'evgeni.test1' - as dependency of evgeni.test2
    Collection 'evgeni.test1' obtained from server test_galaxy
    Starting collection install process
    Installing 'evgeni.test2:1.0.0' to '/home/evgeni/.ansible/collections/ansible_collections/evgeni/test2'
    Downloading to /home/evgeni/.ansible/tmp/ansible-local-133/tmp9uqyjgki
    Installing 'evgeni.test1:2.0.0' to '/home/evgeni/.ansible/collections/ansible_collections/evgeni/test1'
    Downloading to /home/evgeni/.ansible/tmp/ansible-local-133/tmp9uqyjgki
    As you can see, evgeni.test1 is fetched from, instead of Now, if those servers instead would be Galaxy and Automation Hub, and somebody managed to snag the redhat namespace on Galaxy, I would be now getting the wrong stuff Another problematic setup would be with Galaxy and on-prem Ansible Automation Platform, as you can have any namespace on the later and these most certainly can clash with namespaces on public Galaxy. I have reported this behavior to Ansible Security on 2021-08-26, giving a 90 days disclosure deadline, which expired on 2021-11-24. So far, the response was that this is working as designed, to allow cross-source dependencies (e.g. a private collection referring to one on Galaxy) and there is an issue to update the docs to match the code. If users want to explicitly pin sources, they are supposed to name all dependencies and their sources in requirements.yml. Alternatively they obviously can configure only one source in the configuration and always mirror all dependencies. I am not happy with this and I think this is terrible UX, explicitly inviting people to make mistakes.