Search Results: "pm"

23 October 2020

Birger Schacht: An Analysis of 5 Million OpenPGP Keys

In July I finished my Bachelor's degree in IT Security at the University of Applied Sciences in St. Poelten. During my studies I took some elective courses, one of which was about data analysis using Python, Pandas and Jupyter notebooks. I found it very interesting to do calculations on different data sets and to visualize them. Towards the end of the Bachelor I had to find a topic for my Bachelor's thesis, and as a long-time user of OpenPGP I thought it would be interesting to do an analysis of the collection of OpenPGP keys that are available on the keyservers of the SKS keyserver network. So in June 2019 I fetched a copy of the key dump of one of the keyservers (some keyservers publish these copies of their key database so that people who want to join the SKS keyserver network can do an initial import). At that time the copy of the key database contained 5,499,675 keys and was around 12 GB. Using the hockeypuck keyserver software I imported the keys into a PostgreSQL database. Hockeypuck uses a table called keys to store the keys, and in there the column doc stores the OpenPGP keys in JSON format (always with a data field containing the original unparsed data). For the thesis I split the analysis into three parts: first looking at the Public Key packets, then analysing the User ID packets and finally studying the Signature packets. To analyse the respective packets I used SQL to export the data to CSV files and then used the pandas read_csv method to create a DataFrame of the values. In a couple of cases I did some parsing before converting to a DataFrame to make the analysis step faster. The parsing was done using the pgpdump Python library. Together with my advisor I decided to submit the thesis to a journal, so we revised and condensed the whole paper, and the result has now been published in the Journal of Wireless Mobile Networks, Ubiquitous Computing, and Dependable Applications (JoWUA).
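As a rough illustration of that workflow (not the exact code from the thesis; the CSV file name and column names below are made-up stand-ins for the exported fields), loading such an export with pandas and counting algorithm usage per year could look like this:

# Minimal sketch: load a CSV export of the Public Key packets into a
# pandas DataFrame and count how often each algorithm appears per year.
# File name and column names are hypothetical.
import pandas as pd

df = pd.read_csv("public_key_packets.csv",
                 usecols=["creation_time", "pubkey_algorithm"],
                 parse_dates=["creation_time"])

per_year = (df.assign(year=df["creation_time"].dt.year)
              .groupby(["year", "pubkey_algorithm"])
              .size()
              .unstack(fill_value=0))

per_year.plot()  # e.g. visualize how DSA vs RSA usage developed over time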

I think the work gives some valuable insight into the development of the use of OpenPGP over the last 30 years. Looking at the public key packets, we were able to compare the different public key algorithms and, for example, visualize how DSA was the most used algorithm until around 2010, when it was replaced by RSA. When looking at the less used algorithms, a trend towards ECC-based cryptography is visible. What we also noticed was an increase in RSA keys with algorithm ID 3 (RSA Sign-Only), which are deprecated. When we took a deeper look at those keys, we realized that most of them used a specific User ID string in the User ID packets, which allowed us to attribute those keys to two software projects, both using the Bouncy Castle Java Cryptographic API (or the Spongy Castle version for Android, respectively). We also stumbled over a tutorial on how to create RSA keys with Bouncy Castle which describes how to do so with code that produces RSA Sign-Only keys. In one of those projects, this was then fixed. By looking at the User ID packets we did some statistics about the most used email providers among OpenPGP users. One domain stood out, because it is not the domain of an email provider: tellfinder.com is a domain used in around 45,000 keys. TellFinder is a Big Data analysis software, and the UID of all but two of those keys is TellFinder Page Archiver- Signing Key <support@tellfinder.com>. We also looked at the comments used in OpenPGP User ID fields. In 2013 Daniel Kahn Gillmor published a blog post titled "OpenPGP User ID Comments considered harmful", in which he pointed out that most of the comments in the User ID field of OpenPGP keys duplicate information that is already present somewhere in the User ID or the key itself. In our dataset 3,133 comments were exactly the same as the name, 3,346 were the same as the domain, and 18,246 comments were similar to the local part of the email address. Last but not least we looked at the signature subpackets and the development of some of the preferences (Preferred Symmetric Algorithm, Preferred Hash Algorithm) that are published using signature packets. Analysing this huge dataset of cryptographic keys spanning the last 20 to 30 years was very interesting, and I learned a lot about the history of PGP and OpenPGP and the evolution of cryptography overall. I think it would be interesting to look at even more properties of OpenPGP keys, and I also think it would be valuable for the OpenPGP ecosystem if these kinds of analyses could be done regularly. An approach like Tor Metrics could lead to interesting findings and could also help to back decisions regarding future developments of the OpenPGP standard.

21 October 2020

Bits from Debian: Debian donation for Peertube development

The Debian project is happy to announce a donation of 10,000 to help Framasoft reach the fourth stretch-goal of its Peertube v3 crowdfunding campaign -- Live Streaming. This year's iteration of the Debian annual conference, DebConf20, had to be held online, and while being a resounding success, it made clear to the project our need to have a permanent live streaming infrastructure for small events held by local Debian groups. As such, Peertube, a FLOSS video hosting platform, seems to be the perfect solution for us. We hope this unconventional gesture from the Debian project will help us make this year somewhat less terrible and give us, and thus humanity, better Free Software tooling to approach the future. Debian thanks the commitment of numerous Debian donors and DebConf sponsors, particularly all those that contributed to DebConf20 online's success (volunteers, speakers and sponsors). Our project also thanks Framasoft and the PeerTube community for developing PeerTube as a free and decentralized video platform. The Framasoft association warmly thanks the Debian Project for its contribution, from its own funds, towards making PeerTube happen. This contribution has a twofold impact. Firstly, it's a strong sign of recognition from an international project - one of the pillars of the Free Software world - towards a small French association which offers tools to liberate users from the clutches of the web's giant monopolies. Secondly, it's a substantial amount of help in these difficult times, supporting the development of a tool which equally belongs to and is useful to everyone. The strength of Debian's gesture proves, once again, that solidarity, mutual aid and collaboration are values which allow our communities to create tools to help us strive towards Utopia.

Reproducible Builds: Supporter spotlight: Civil Infrastructure Platform

The Reproducible Builds project depends on our many projects, supporters and sponsors. We rely on their financial support, but they are also valued ambassadors who spread the word about the Reproducible Builds project and the work that we do. This is the first installment in a series featuring the projects, companies and individuals who support the Reproducible Builds project. If you are a supporter of the Reproducible Builds project (of whatever size) and would like to be featured here, please get in touch with us at contact@reproducible-builds.org. However, we are kicking off this series by featuring Urs Gleim and Yoshi Kobayashi of the Civil Infrastructure Platform (CIP) project.
Chris Lamb: Hi Urs and Yoshi, great to meet you. How might you relate the importance of the Civil Infrastructure Platform to a user who is non-technical? A: The Civil Infrastructure Platform (CIP) project is focused on establishing an open source base layer of industrial-grade software that acts as building blocks in civil infrastructure projects. End-users of this critical code include systems for electric power generation and energy distribution, oil and gas, water and wastewater, healthcare, communications, transportation, and community management. These systems deliver essential services, provide shelter, and support social interactions and economic development. They are society's lifelines, and CIP aims to contribute to and support these important pillars of modern society.
Chris: We have entered an age where our civilisations have become reliant on technology to keep us alive. Does the CIP believe that the software that underlies our own safety (and the safety of our loved ones) receives enough scrutiny today? A: For companies developing systems running our infrastructure and keeping our factories working, it is part of their business to ensure the availability, uptime, and security of these very systems. However, software complexity continues to increase, and the effort spent on those systems is now exploding. What is missing is a common way of achieving this through refining the same tools, and cooperating on the hardening and maintenance of standard components such as the Linux operating system.
Chris: How does the Reproducible Builds effort help the Civil Infrastructure Platform achieve its goals? A: Reproducibility helps a great deal in software maintenance. We have a number of use-cases that should have long-term support of more than 10 years. During this period, we encounter issues that need to be fixed in the original source code. But before we make changes to the source code, we need to check whether it is actually the original source code or not. If we can reproduce exactly the same binary from the source code even after 10 years, we can start to invest time and energy into making these fixes.
Chris: Can you give us a brief history of the Civil Infrastructure Platform? Are there any specific success stories that the CIP is particularly proud of? A: The CIP Project formed in 2016 as a project hosted by the Linux Foundation. It was launched out of the necessity to establish an open source framework and software foundation that delivers services for civil infrastructure and economic development on a global scale. Some key milestones we have achieved as a project include our collaboration with Debian, where we are helping with the Debian Long Term Support (LTS) initiative, which aims to extend the lifetime of all Debian stable releases to at least 5 years. This is critical because most control systems for transportation, power plants, healthcare and telecommunications run on Debian-based embedded systems. In addition, CIP is focused on IEC 62443, a standards-based approach to counter security vulnerabilities in industrial automation and control systems. Our belief is that this work will help mitigate the risk of cyber attacks, but in order to deal with evolving attacks of this kind, all of the layers that make up these complex systems (such as system services and component functions, in addition to the countless operational layers) must be kept secure. For this reason, the IEC 62443 series is attracting attention as the de facto cyber-security standard.
Chris: The Civil Infrastructure Platform project comprises a number of project members from different industries, with stakeholders across multiple countries and continents. How does working together with a broad group of interests help in your effectiveness and efficiency? A: Although the members have different products, they share the requirements and issues when developing sustainable products. In the end, we are driven by common goals. For the project members, working internationally is simply daily business. We see this as an advantage over regional efforts or efforts that focus on narrower domains or markets.
Chris: The Civil Infrastructure Platform supports a number of other existing projects and initiatives in the open source world too. How much do you feel being a part of the broader free software community helps you achieve your aims? A: Collaboration with other projects is an essential part of how CIP operates: we want to enable commonly-used software components. It would not make sense to re-invent solutions that are already established and widely used in product development. To this end, we have an upstream first policy, which means that, if existing projects need to be modified to meet our needs or are already working on issues that we also need, we work directly with them.
Chris: Open source software in desktop or user-facing contexts receives a significant amount of publicity in the media. However, how do you see the future of free software from an industrial-oriented context? A: Open source software has already become an essential part of the industry and civil infrastructure, and the importance of open source software there is still increasing. Without open source software, we cannot achieve, run and maintain future complex systems, such as smart cities and other key pieces of civil infrastructure.
Chris: If someone wanted to know more about the Civil Infrastructure Platform (or even to get involved) where should they go to look? A: We have many avenues to participate and learn more! We have a website, a wiki and you can even follow us on Twitter.

For more about the Reproducible Builds project, please see our website at reproducible-builds.org. If you are interested in ensuring the ongoing security of the software that underpins our civilisation and wish to sponsor the Reproducible Builds project, please reach out to the project by emailing contact@reproducible-builds.org.

20 October 2020

Dirk Eddelbuettel: RcppArmadillo 0.10.1.0.0

Armadillo is a powerful and expressive C++ template library for linear algebra, aiming towards a good balance between speed and ease of use, with a syntax deliberately close to Matlab. RcppArmadillo integrates this library with the R environment and language and is widely used by (currently) 786 other packages on CRAN. A little while ago, Conrad released version 10.1.0 of Armadillo, a new major release. As before, given his initial heads-up we ran two full reverse-depends checks, and as a consequence contacted four package authors (two by email, two via PR) about a minuscule required change (as Armadillo now defaults to C++11, an old existing setting of avoiding C++11 led to an error). Our thanks to those who promptly updated their packages; it is truly appreciated. As it turns out, Conrad also softened the error by the time the release came around. But despite our best efforts, the release was delayed considerably by CRAN. We had made several Windows test builds, but as luck would have it, on the uploaded package CRAN got itself a completely spurious segfault (which can happen on a busy machine building many things at once). Sadly it took three or four days for CRAN to reply to our email. After which it took another number of days for them to ponder the behaviour of a few new deprecation messages tickled by at most ten or so (out of 786) packages. Oh well. So here we are, eleven days after I emailed the rcpp-devel list about the new package being on CRAN but possibly delayed (due to that segfault). But during all that time the package was of course available via the Rcpp drat. The changes in this release are summarized below as usual and are mostly upstream, along with an improved Travis CI setup due to the aforementioned use of the bspm package for binaries at Travis.

Changes in RcppArmadillo version 0.10.1.0.0 (2020-10-09)
  • Upgraded to Armadillo release 10.1.0 (Orchid Ambush)
    • C++11 is now the minimum required C++ standard
    • faster handling of compound expressions by trimatu() and trimatl()
    • faster sparse matrix addition, subtraction and element-wise multiplication
    • expanded sparse submatrix views to handle the non-contiguous form of X.cols(vector_of_column_indices)
    • expanded eigs_sym() and eigs_gen() with optional fine-grained parameters (subspace dimension, number of iterations, eigenvalues closest to specified value)
    • deprecated form of reshape() removed from Cube and SpMat classes
    • ignore and warn on use of the ARMA_DONT_USE_CXX11 macro
  • Switch Travis CI testing to focal and BSPM

Courtesy of my CRANberries, there is a diffstat report relative to previous release. More detailed information is on the RcppArmadillo page. Questions, comments etc should go to the rcpp-devel mailing list off the R-Forge page. If you like this or other open-source work I do, you can now sponsor me at GitHub. For the first year, GitHub will match your contributions.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

19 October 2020

Louis-Philippe Véronneau: Musings on long-term software support and economic incentives

Although I still read a lot, during my college sophomore years my reading habits shifted from novels to more academic works. Indeed, reading dry textbooks and economic papers for classes often kept me from reading anything else substantial. Nowadays, I tend to binge read novels: I won't touch a book for months on end, and suddenly, I'll read 10 novels back to back [1]. At the start of a novel binge, I always follow the same ritual: I take out my e-reader from its storage box, marvel at the fact the battery is still pretty full, turn on the WiFi and check if there are OS updates. And I have to admit, Kobo Inc. (now Rakuten Kobo) has done a stellar job of keeping my e-reader up to date. I've owned this model (a Kobo Aura 1st generation) for 7 years now and I'm still running the latest version of Kobo's Linux-based OS. Having recently had trouble updating my Nexus 5 (also manufactured 7 years ago) to Android 10 [2], I asked myself:
Why is my e-reader still getting regular OS updates, while Google stopped issuing security patches for my smartphone four years ago?
To try to answer this, let us turn to economic incentives theory. Although not the be-all and end-all some think it is [3], incentives theory is not a bad tool to analyse this particular problem. Executives at Google most likely followed a very business-centric logic when they decided to drop support for the Nexus 5. Likewise, Rakuten Kobo's decision to continue updating older devices certainly had very little to do with ethics or loyalty to their user base. So, what are the incentives that keep Kobo updating devices, and why are they different from smartphone manufacturers'?

A portrait of the current long-term software support offerings for smartphones and e-readers

Before delving deeper into economic theory, let's talk data. I'll be focusing on 2 brands of e-readers, Amazon's Kindle and Rakuten's Kobo. Although the e-reader market is highly segmented and differs a lot based on geography, Amazon was in 2015 the clear worldwide leader with 53% of worldwide e-reader sales, followed by Rakuten Kobo at 13% [4]. On the smartphone side, I'll be differentiating between Apple's iPhones and Android devices, taking Google as the barometer for that ecosystem. As mentioned below, Google is sadly the leader in long-term Android software support.

Rakuten Kobo

According to their website and to this Wikipedia table, the only e-readers Kobo has deprecated are the original Kobo eReader and the Kobo WiFi N289, both released in 2010. This makes their oldest still supported device the Kobo Touch, released in 2011. In my book, that's a pretty good track record. Long-term software support does not seem to be advertised or to be a clear selling point in their marketing.

Amazon

According to their website, Amazon has dropped support for all 8 devices produced before the Kindle Paperwhite 2nd generation, first sold in 2013. To put things in perspective, the first Kindle came out in 2007, 3 years before Kobo started selling devices. Like Rakuten Kobo, Amazon does not make promises of long-term software support as part of their marketing.

Apple

Apple has a very clear software support policy for all their devices:
Owners of iPhone, iPad, iPod or Mac products may obtain service and parts from Apple or Apple service providers for five years after the product is no longer sold, or longer where required by law.
This means in the worst-case scenario of buying an iPhone model just as it is discontinued, one would get a minimum of 5 years of software support.

Android

Google's policy for their Android devices is to provide software support for 3 years after the launch date. If you buy a Pixel device just before the new one launches, you could theoretically only get 2 years of support. In 2018, Google decided OEMs would have to provide security updates for at least 2 years after launch, threatening not to license Google Apps and the Play Store if they didn't comply.

A question of cost structure

From the previous section, we can conclude that in general, e-readers seem to be supported longer than smartphones, and that Apple does a better job than Android OEMs, providing support for about twice as long. Even Fairphone, whose entire business is to build phones designed to last and to be repaired, was not able to keep the Fairphone 1 (2013) updated for more than a couple of years and seems to be struggling to keep the Fairphone 2 (2015) running an up-to-date version of Android. Anyone who has ever worked in IT will tell you: maintaining software over time is hard work, and hard work by specialised workers is expensive. Most commercial electronic devices are sold and developed by for-profit enterprises, and software support all comes down to a question of cost structure. If companies like Google or Fairphone are to be expected to provide long-term support for the devices they manufacture, they have to be able to fund their work somehow. In a perfect world, people would be paying for the cost of said long-term support, as it would likely be cheaper than buying new devices every few years and would certainly be better for the planet. Problem is, manufacturers aren't making them pay for it. Economists call this type of problem externalities: things that should be part of the cost of a good, but aren't, for one reason or another. A classic example of an externality is pollution. Clearly pollution is bad and leads to horrendous consequences, like climate change. Sane people agree we should drastically cut our greenhouse gas emissions, and yet, we aren't. Neo-classical economic theory argues the way to fix externalities like pollution is to internalise these costs, in other words, to make people pay for the "real price" of the goods they buy. In the case of climate change and pollution, neo-classical economic theory is plain wrong (spoiler alert: it often is), but this is where band-aids like the carbon tax come from. Still, coming back to long-term software support, let's see what would happen if we were to try to internalise software maintenance costs. We can do this in multiple ways.

1 - Include the price of software maintenance in the cost of the device

This is the choice Fairphone makes. This might somewhat work out for them since they are a very small company, but it cannot scale for the following reasons:
  1. This strategy relies on you giving your money to an enterprise now, and trusting them to "Do the right thing" years later. As the years go by, they will eventually look at their books, see how much ongoing maintenance is costing them, drop support for the device, apologise and move on. That is to say, enterprises have a clear economic incentive to promise long-term support and not deliver. One could argue a company's reputation would suffer from this kind of behaviour. Maybe sometimes it does, but most often people forget. Political promises are a great example of this.
  2. Enterprises go bankrupt all the time. Even if company X promises 15 years of software support for their devices, if they cease to exist, your device will stop getting updates. The internet is full of stories of IoT devices getting bricked when the parent company goes bankrupt and their servers disappear. This is related to point number 1: to some degree, you have a disincentive to pay for long-term support in advance, as the future is uncertain and there are chances you won't get the support you paid for.
  3. Selling your devices at a higher price to cover maintenance costs does not necessarily mean you will make more money overall (raising more money to fund maintenance costs being the goal here). To a certain point, smartphone models are substitute goods and prices higher than market prices will tend to drive consumers to buy cheaper ones. There is thus a disincentive to include the price of software maintenance in the cost of the device.
  4. People tend to be bad at rationalising the total cost of ownership over a long period of time. Economists call this phenomenon hyperbolic discounting. In our case, it means people are far more likely to buy a 500$ phone every 3 years than a 1000$ phone every 10 years (see the quick calculation after this list). Again, this means OEMs have a clear disincentive to include the price of long-term software maintenance in their devices.
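A quick back-of-the-envelope calculation with the made-up prices from point 4 above shows the gap; this is just illustrative arithmetic, not data from the article:

# Cost per year of the two hypothetical phones from point 4 above.
cheap_per_year = 500 / 3        # 500$ phone replaced every 3 years
durable_per_year = 1000 / 10    # 1000$ phone replaced every 10 years
print(f"cheap:   {cheap_per_year:.0f}$/year ({cheap_per_year * 30:.0f}$ over 30 years)")
print(f"durable: {durable_per_year:.0f}$/year ({durable_per_year * 30:.0f}$ over 30 years)")
# cheap:   167$/year (5000$ over 30 years)
# durable: 100$/year (3000$ over 30 years)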
Clearly, life is more complex than how I portrayed it: enterprises are not perfect rational agents, altruism exists, not all enterprises aim solely for profit maximisation, etc. Still, in a capitalist economy, enterprises wanting to charge for software maintenance upfront have to overcome these hurdles one way or another if they want to avoid failing.

2 - The subscription model

Another way companies can try to internalise support costs is to rely on a subscription-based revenue model. This has multiple advantages over the previous option, mainly:
  1. It does not affect the initial purchase price of the device, making it easier to sell them at a competitive price.
  2. It provides a stable source of income, something that is very valuable to enterprises, as it reduces overall risks. This in return creates an incentive to continue providing software support as long as people are paying.
If this model is so interesting from an economic incentives point of view, why isn't any smartphone manufacturer offering that kind of program? The answer is, they are, but not explicitly [5]. Apple and Google can fund part of their smartphone software support via the 30% cut they take out of their respective app stores. A report from Sensor Tower shows that in 2019, Apple made an estimated US$ 16 billion from the App Store, while Google raked in US$ 9 billion from the Google Play Store. Although the Fortune 500 ranking tells us this is respectively "only" 5.6% and 6.5% of their gross annual revenue for 2019, the profit margins in this category are certainly higher than on any of their other products. This means Google and Apple have an important incentive to keep your device updated for some time: if your device works well and is updated, you are more likely to keep buying apps from their store. When software support for a device stops, there is a risk paying customers will buy a competitor device and leave their ecosystem. This also explains why OEMs who don't own app stores tend not to provide software support for very long periods of time. Most of them only make money when you buy a new phone. Providing long-term software support thus becomes a disincentive, as it directly reduces their sale revenues. Same goes for Kindles and Kobos: the longer your device works, the more money they make with their electronic book stores. In my opinion, it's likely Amazon and Rakuten Kobo produce quarterly cost-benefit reports to decide when to drop support for older devices, based on ongoing support costs and the recurring revenues these devices bring in. Rakuten Kobo is also in a more precarious situation than Amazon is: considering Amazon's very important market share, if your device stops getting new updates, there is a greater chance people will replace their old Kobo with a Kindle. Again, they have an important economic incentive to keep devices running as long as they are profitable.

Can Free Software fix this?

Yes and no. Free Software certainly isn't a magic wand one can wave to make everything better, but it does provide major advantages in terms of security, user freedom and sometimes costs. The last piece of the puzzle explaining why Rakuten Kobo's software support is better than Google's is technological choices. Smartphones are incredibly complex devices and have become the main computing platform of many. Similar to the web, there is a race for features and complexity that tends to create bloat and make older devices slow and painful to use. On the other hand, e-readers are simpler devices built for a single task: displaying electronic books. Control over the platform is also a key aspect of the cost structure of providing software updates. Whereas Apple controls both the software and hardware side of iPhones, Android is a sad mess of drivers and SoCs, all providing different levels of support over time [6]. If you take a look at the platforms the Kindle and Kobo are built on, you'll quickly see they both use Freescale i.MX SoCs. These processors are well known for their excellent upstream support in the Linux kernel and their relative longevity, chips being produced for either 10 or 15 years. This in turn makes updates much easier and less expensive to provide. So clearly, open architectures, free drivers and open hardware help tremendously, but they aren't enough on their own. One of the lessons we must learn from the (amazing) LineageOS project is how lack of funding hurts everyone.
If there is no one to do the volunteer work required to maintain a version of LOS for your device, it won't be supported. Worse, when purchasing a new device, users cannot know in advance how many years of LOS support they will get. This makes buying new devices a frustrating hit-and-miss experience. If you are lucky, you will get many years of support. Otherwise, you risk your device becoming an expensive, insecure paperweight. So how do we fix this? Anyone with a brain understands throwing away perfectly good devices every 2 years is not sustainable. Government regulations enforcing a minimum support life would be a step in the right direction, but at the end of the day, Capitalism is to blame. Like the aforementioned carbon tax, band-aid solutions can make things somewhat better, but won't fix our current economic system's underlying problems. For now though, I'll leave fixing the problem of Capitalism to someone else.

  1. My most recent novel binge has been focused on re-reading the Dune franchise. I first read the 6 novels written by Frank Herbert when I was 13 years old and only had vague and pleasant memories of his work. Great stuff.
  2. I'm back on LineageOS! Nice folks released an unofficial LOS 17.1 port for the Nexus 5 last January and have kept it updated since then. If you are to use it, I would also recommend updating TWRP to this version specifically patched for the Nexus 5.
  3. Very few serious economists actually believe neo-classical rational agent theory is a satisfactory explanation of human behavior. In my opinion, it's merely a (mostly flawed) lens to try to interpret certain behaviors, a tool amongst others that needs to be used carefully, preferably as part of a pluralism of approaches.
  4. Good data on the e-reader market is hard to come by and is mainly produced by specialised market research companies selling their findings at very high prices. Those particular statistics come from a MarketWatch analysis.
  5. If they were to tell people: You need to pay us 5$/month if you want to receive software updates, I'm sure most people would not pay. Would you?
  6. Coming back to Fairphones, if they had so many problems providing an Android 9 build for the Fairphone 2, it's because Qualcomm never provided Android 7+ support for the Snapdragon 801 SoC it uses.

17 October 2020

Dirk Eddelbuettel: digest 0.6.26: Blake3 and Tuning

And a new version of digest is now on CRAN and will go to Debian shortly. digest creates hash digests of arbitrary R objects (using the md5, sha-1, sha-256, sha-512, crc32, xxhash32, xxhash64, murmur32, spookyhash, and blake3 algorithms) permitting easy comparison of R language objects. It is a fairly widely-used package (currently listed at 896k monthly downloads, 279 direct reverse dependencies and 8057 indirect reverse dependencies, or just under half of CRAN) as many tasks may involve caching of objects, for which it provides convenient general-purpose hash key generation. This release brings two nice contributed updates. Dirk Schumacher added support for blake3 (though we could probably push this a little harder for performance, help welcome). Winston Chang benchmarked and tuned some of the key base R parts of the package. Last but not least, I flipped the vignette to the lovely minidown, updated the Travis CI setup using bspm (as previously blogged about in r4 #30), and added a package website using Material for MkDocs. My CRANberries provides the usual summary of changes to the previous version. For questions or comments use the issue tracker off the GitHub repo. If you like this or other open-source work I do, you can now sponsor me at GitHub. For the first year, GitHub will match your contributions.
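To illustrate the general idea of hash-keyed caching, here is a rough Python analogue of the concept only; it is not the API or implementation of the R digest package, and the cache location and helper name are invented:

# Hash-keyed caching sketch: derive a stable key from the (pickled)
# inputs and reuse the stored result on a cache hit.
import hashlib
import os
import pickle

CACHE_DIR = "/tmp/objcache"  # invented location for this example

def cached(compute, *args):
    key = hashlib.sha256(pickle.dumps(args)).hexdigest()
    path = os.path.join(CACHE_DIR, key)
    if os.path.exists(path):
        with open(path, "rb") as f:
            return pickle.load(f)          # cache hit: skip the computation
    result = compute(*args)
    os.makedirs(CACHE_DIR, exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(result, f)             # cache miss: store for next time
    return result

squares = cached(lambda n: [i * i for i in range(n)], 10_000)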

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

14 October 2020

Michael Stapelberg: Linux package managers are slow

I measured how long the most popular Linux distributions' package managers take to install small and large packages (the ack(1p) source code search Perl script and qemu, respectively). Where required, my measurements include metadata updates such as transferring an up-to-date package list. For me, requiring a metadata update is the more common case, particularly on live systems or within Docker containers. All measurements were taken on an Intel(R) Core(TM) i9-9900K CPU @ 3.60GHz running Docker 1.13.1 on Linux 4.19, backed by a Samsung 970 Pro NVMe drive boasting many hundreds of MB/s write performance. The machine is located in Zürich and connected to the Internet with a 1 Gigabit fiber connection, so the expected top download speed is 115 MB/s. See Appendix C for details on the measurement method and command outputs.

Measurements

Keep in mind that these are one-time measurements. They should be indicative of actual performance, but your experience may vary.

ack (small Perl program)
distribution | package manager | data   | wall-clock time | rate
Fedora       | dnf             | 114 MB | 33s             | 3.4 MB/s
Debian       | apt             | 16 MB  | 10s             | 1.6 MB/s
NixOS        | Nix             | 15 MB  | 5s              | 3.0 MB/s
Arch Linux   | pacman          | 6.5 MB | 3s              | 2.1 MB/s
Alpine       | apk             | 10 MB  | 1s              | 10.0 MB/s

qemu (large C program)
distribution | package manager | data   | wall-clock time | rate
Fedora       | dnf             | 226 MB | 4m37s           | 1.2 MB/s
Debian       | apt             | 224 MB | 1m35s           | 2.3 MB/s
Arch Linux   | pacman          | 142 MB | 44s             | 3.2 MB/s
NixOS        | Nix             | 180 MB | 34s             | 5.2 MB/s
Alpine       | apk             | 26 MB  | 2.4s            | 10.8 MB/s

(Looking for older measurements? See Appendix B (2019).) The difference between the slowest and fastest package managers is 30x! How can Alpine's apk and Arch Linux's pacman be an order of magnitude faster than the rest? They are doing a lot less than the others, and more efficiently, too.

Pain point: too much metadata

For example, Fedora transfers a lot more data than others because its main package list is 60 MB (compressed!) alone. Compare that with Alpine's 734 KB APKINDEX.tar.gz. Of course the extra metadata which Fedora provides helps some use cases, otherwise they hopefully would have removed it altogether. The amount of metadata seems excessive for the use case of installing a single package, which I consider the main use-case of an interactive package manager. I expect any modern Linux distribution to only transfer absolutely required data to complete my task.

Pain point: no concurrency

Because they need to sequence executing arbitrary package maintainer-provided code (hooks and triggers), all tested package managers need to install packages sequentially (one after the other) instead of concurrently (all at the same time). In my blog post "Can we do without hooks and triggers?", I outline that hooks and triggers are not strictly necessary to build a working Linux distribution.

Thought experiment: further speed-ups

Strictly speaking, the only required feature of a package manager is to make available the package contents so that the package can be used: a program can be started, a kernel module can be loaded, etc. By only implementing what's needed for this feature, and nothing more, a package manager could likely beat apk's performance. It could, for example (a rough sketch of this idea follows the list below):
  • skip archive extraction by mounting file system images (like AppImage or snappy)
  • use compression which is light on CPU, as networks are fast (like apk)
  • skip fsync when it is safe to do so, i.e.:
    • package installations don't modify system state
    • atomic package installation (e.g. an append-only package store)
    • automatically clean up the package store after crashes
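A rough sketch of what such a minimal package manager could look like; everything here is hypothetical (the store path, package names and URLs are invented), and the point is only to illustrate the atomic rename, the absence of fsync and the absence of hooks, which allows full concurrency:

# Hypothetical minimal installer: fetch package images concurrently and
# "install" each one with a single atomic rename, skipping fsync.
import concurrent.futures
import os
import tempfile
import urllib.request

STORE = "/tmp/pkgstore"  # invented package store location

def install(name, url):
    os.makedirs(STORE, exist_ok=True)
    fd, tmp = tempfile.mkstemp(dir=STORE, suffix=".partial")
    with os.fdopen(fd, "wb") as out, urllib.request.urlopen(url) as resp:
        out.write(resp.read())   # no fsync: durability is not needed here
    dest = os.path.join(STORE, name + ".img")
    os.rename(tmp, dest)         # atomic within one file system
    return dest

packages = {
    "ack-3.4.0": "https://example.org/pkg/ack-3.4.0.img",       # invented URL
    "perl-5.32.0": "https://example.org/pkg/perl-5.32.0.img",   # invented URL
}

# No hooks or triggers to serialize, so downloads can run fully concurrently.
with concurrent.futures.ThreadPoolExecutor() as pool:
    for path in pool.map(lambda item: install(*item), packages.items()):
        print("installed", path)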

Current landscape

Here's a table outlining how the various package managers listed on Wikipedia's list of software package management systems fare:
name        | scope  | package file format               | hooks/triggers
AppImage    | apps   | image: ISO9660, SquashFS          | no
snappy      | apps   | image: SquashFS                   | yes: hooks
FlatPak     | apps   | archive: OSTree                   | no
0install    | apps   | archive: tar.bz2                  | no
nix, guix   | distro | archive: nar.{bz2,xz}             | activation script
dpkg        | distro | archive: tar.{gz,xz,bz2} in ar(1) | yes
rpm         | distro | archive: cpio.{bz2,lz,xz}         | scriptlets
pacman      | distro | archive: tar.xz                   | install
slackware   | distro | archive: tar.{gz,xz}              | yes: doinst.sh
apk         | distro | archive: tar.gz                   | yes: .post-install
Entropy     | distro | archive: tar.bz2                  | yes
ipkg, opkg  | distro | archive: tar{,.gz}                | yes

Conclusion

As per the current landscape, there is no distribution-scoped package manager which uses images and leaves out hooks and triggers, not even in smaller Linux distributions. I think that space is really interesting, as it uses a minimal design to achieve significant real-world speed-ups. I have explored this idea in much more detail, and am happy to talk more about it in my post "Introducing the distri research linux distribution".

Appendix C: measurement details (2020)

ack

You can expand each of these:
Fedora's dnf takes almost 33 seconds to fetch and unpack 114 MB.
% docker run -t -i fedora /bin/bash
[root@62d3cae2e2f9 /]# time dnf install -y ack
Fedora 32 openh264 (From Cisco) - x86_64     1.9 kB/s   2.5 kB     00:01
Fedora Modular 32 - x86_64                   6.8 MB/s   4.9 MB     00:00
Fedora Modular 32 - x86_64 - Updates         5.6 MB/s   3.7 MB     00:00
Fedora 32 - x86_64 - Updates                 9.9 MB/s    23 MB     00:02
Fedora 32 - x86_64                            39 MB/s    70 MB     00:01
[ ]
real	0m32.898s
user	0m25.121s
sys	0m1.408s
NixOS's Nix takes a little over 5s to fetch and unpack 15 MB.
% docker run -t -i nixos/nix
39e9186422ba:/# time sh -c 'nix-channel --update && nix-env -iA nixpkgs.ack'
unpacking channels...
created 1 symlinks in user environment
installing 'perl5.32.0-ack-3.3.1'
these paths will be fetched (15.55 MiB download, 85.51 MiB unpacked):
  /nix/store/34l8jdg76kmwl1nbbq84r2gka0kw6rc8-perl5.32.0-ack-3.3.1-man
  /nix/store/9df65igwjmf2wbw0gbrrgair6piqjgmi-glibc-2.31
  /nix/store/9fd4pjaxpjyyxvvmxy43y392l7yvcwy1-perl5.32.0-File-Next-1.18
  /nix/store/czc3c1apx55s37qx4vadqhn3fhikchxi-libunistring-0.9.10
  /nix/store/dj6n505iqrk7srn96a27jfp3i0zgwa1l-acl-2.2.53
  /nix/store/ifayp0kvijq0n4x0bv51iqrb0yzyz77g-perl-5.32.0
  /nix/store/w9wc0d31p4z93cbgxijws03j5s2c4gyf-coreutils-8.31
  /nix/store/xim9l8hym4iga6d4azam4m0k0p1nw2rm-libidn2-2.3.0
  /nix/store/y7i47qjmf10i1ngpnsavv88zjagypycd-attr-2.4.48
  /nix/store/z45mp61h51ksxz28gds5110rf3wmqpdc-perl5.32.0-ack-3.3.1
copying path '/nix/store/34l8jdg76kmwl1nbbq84r2gka0kw6rc8-perl5.32.0-ack-3.3.1-man' from 'https://cache.nixos.org'...
copying path '/nix/store/czc3c1apx55s37qx4vadqhn3fhikchxi-libunistring-0.9.10' from 'https://cache.nixos.org'...
copying path '/nix/store/9fd4pjaxpjyyxvvmxy43y392l7yvcwy1-perl5.32.0-File-Next-1.18' from 'https://cache.nixos.org'...
copying path '/nix/store/xim9l8hym4iga6d4azam4m0k0p1nw2rm-libidn2-2.3.0' from 'https://cache.nixos.org'...
copying path '/nix/store/9df65igwjmf2wbw0gbrrgair6piqjgmi-glibc-2.31' from 'https://cache.nixos.org'...
copying path '/nix/store/y7i47qjmf10i1ngpnsavv88zjagypycd-attr-2.4.48' from 'https://cache.nixos.org'...
copying path '/nix/store/dj6n505iqrk7srn96a27jfp3i0zgwa1l-acl-2.2.53' from 'https://cache.nixos.org'...
copying path '/nix/store/w9wc0d31p4z93cbgxijws03j5s2c4gyf-coreutils-8.31' from 'https://cache.nixos.org'...
copying path '/nix/store/ifayp0kvijq0n4x0bv51iqrb0yzyz77g-perl-5.32.0' from 'https://cache.nixos.org'...
copying path '/nix/store/z45mp61h51ksxz28gds5110rf3wmqpdc-perl5.32.0-ack-3.3.1' from 'https://cache.nixos.org'...
building '/nix/store/m0rl62grplq7w7k3zqhlcz2hs99y332l-user-environment.drv'...
created 49 symlinks in user environment
real	0m 5.60s
user	0m 3.21s
sys	0m 1.66s
Debian's apt takes almost 10 seconds to fetch and unpack 16 MB.
% docker run -t -i debian:sid
root@1996bb94a2d1:/# time (apt update && apt install -y ack-grep)
Get:1 http://deb.debian.org/debian sid InRelease [146 kB]
Get:2 http://deb.debian.org/debian sid/main amd64 Packages [8400 kB]
Fetched 8546 kB in 1s (8088 kB/s)
[ ]
The following NEW packages will be installed:
  ack libfile-next-perl libgdbm-compat4 libgdbm6 libperl5.30 netbase perl perl-modules-5.30
0 upgraded, 8 newly installed, 0 to remove and 23 not upgraded.
Need to get 7341 kB of archives.
After this operation, 46.7 MB of additional disk space will be used.
[ ]
real	0m9.544s
user	0m2.839s
sys	0m0.775s
Arch Linux's pacman takes a little under 3s to fetch and unpack 6.5 MB.
% docker run -t -i archlinux/base
[root@9f6672688a64 /]# time (pacman -Sy && pacman -S --noconfirm ack)
:: Synchronizing package databases...
 core            130.8 KiB  1090 KiB/s 00:00
 extra          1655.8 KiB  3.48 MiB/s 00:00
 community         5.2 MiB  6.11 MiB/s 00:01
resolving dependencies...
looking for conflicting packages...
Packages (2) perl-file-next-1.18-2  ack-3.4.0-1
Total Download Size:   0.07 MiB
Total Installed Size:  0.19 MiB
[ ]
real	0m2.936s
user	0m0.375s
sys	0m0.160s
Alpine's apk takes a little over 1 second to fetch and unpack 10 MB.
% docker run -t -i alpine
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.12/community/x86_64/APKINDEX.tar.gz
(1/4) Installing libbz2 (1.0.8-r1)
(2/4) Installing perl (5.30.3-r0)
(3/4) Installing perl-file-next (1.18-r0)
(4/4) Installing ack (3.3.1-r0)
Executing busybox-1.31.1-r16.trigger
OK: 43 MiB in 18 packages
real	0m 1.24s
user	0m 0.40s
sys	0m 0.15s

qemu

You can expand each of these:
Fedora's dnf takes over 4 minutes to fetch and unpack 226 MB.
% docker run -t -i fedora /bin/bash
[root@6a52ecfc3afa /]# time dnf install -y qemu
Fedora 32 openh264 (From Cisco) - x86_64     3.1 kB/s   2.5 kB     00:00
Fedora Modular 32 - x86_64                   6.3 MB/s   4.9 MB     00:00
Fedora Modular 32 - x86_64 - Updates         6.0 MB/s   3.7 MB     00:00
Fedora 32 - x86_64 - Updates                 334 kB/s    23 MB     01:10
Fedora 32 - x86_64                            33 MB/s    70 MB     00:02
[ ]
Total download size: 181 M
Downloading Packages:
[ ]
real	4m37.652s
user	0m38.239s
sys	0m6.321s
NixOS's Nix takes almost 34s to fetch and unpack 180 MB.
% docker run -t -i nixos/nix
83971cf79f7e:/# time sh -c 'nix-channel --update && nix-env -iA nixpkgs.qemu'
unpacking channels...
created 1 symlinks in user environment
installing 'qemu-5.1.0'
these paths will be fetched (180.70 MiB download, 1146.92 MiB unpacked):
[ ]
real	0m 33.64s
user	0m 16.96s
sys	0m 3.05s
Debian's apt takes over 95 seconds to fetch and unpack 224 MB.
% docker run -t -i debian:sid
root@b7cc25a927ab:/# time (apt update && apt install -y qemu-system-x86)
Get:1 http://deb.debian.org/debian sid InRelease [146 kB]
Get:2 http://deb.debian.org/debian sid/main amd64 Packages [8400 kB]
Fetched 8546 kB in 1s (5998 kB/s)
[ ]
Fetched 216 MB in 43s (5006 kB/s)
[ ]
real	1m25.375s
user	0m29.163s
sys	0m12.835s
Arch Linux's pacman takes almost 44s to fetch and unpack 142 MB.
% docker run -t -i archlinux/base
[root@58c78bda08e8 /]# time (pacman -Sy && pacman -S --noconfirm qemu)
:: Synchronizing package databases...
 core          130.8 KiB  1055 KiB/s 00:00
 extra        1655.8 KiB  3.70 MiB/s 00:00
 community       5.2 MiB  7.89 MiB/s 00:01
[ ]
Total Download Size:   135.46 MiB
Total Installed Size:  661.05 MiB
[ ]
real	0m43.901s
user	0m4.980s
sys	0m2.615s
Alpine's apk takes only about 2.4 seconds to fetch and unpack 26 MB.
% docker run -t -i alpine
/ # time apk add qemu-system-x86_64
fetch http://dl-cdn.alpinelinux.org/alpine/v3.10/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.10/community/x86_64/APKINDEX.tar.gz
[ ]
OK: 78 MiB in 95 packages
real	0m 2.43s
user	0m 0.46s
sys	0m 0.09s

Appendix B: measurement details (2019)

ack

You can expand each of these:
Fedora's dnf takes almost 30 seconds to fetch and unpack 107 MB.
% docker run -t -i fedora /bin/bash
[root@722e6df10258 /]# time dnf install -y ack
Fedora Modular 30 - x86_64            4.4 MB/s   2.7 MB     00:00
Fedora Modular 30 - x86_64 - Updates  3.7 MB/s   2.4 MB     00:00
Fedora 30 - x86_64 - Updates           17 MB/s    19 MB     00:01
Fedora 30 - x86_64                     31 MB/s    70 MB     00:02
[ ]
Install  44 Packages
Total download size: 13 M
Installed size: 42 M
[ ]
real	0m29.498s
user	0m22.954s
sys	0m1.085s
NixOS's Nix takes 14s to fetch and unpack 15 MB.
% docker run -t -i nixos/nix
39e9186422ba:/# time sh -c 'nix-channel --update && nix-env -i perl5.28.2-ack-2.28'
unpacking channels...
created 2 symlinks in user environment
installing 'perl5.28.2-ack-2.28'
these paths will be fetched (14.91 MiB download, 80.83 MiB unpacked):
  /nix/store/57iv2vch31v8plcjrk97lcw1zbwb2n9r-perl-5.28.2
  /nix/store/89gi8cbp8l5sf0m8pgynp2mh1c6pk1gk-attr-2.4.48
  /nix/store/gkrpl3k6s43fkg71n0269yq3p1f0al88-perl5.28.2-ack-2.28-man
  /nix/store/iykxb0bmfjmi7s53kfg6pjbfpd8jmza6-glibc-2.27
  /nix/store/k8lhqzpaaymshchz8ky3z4653h4kln9d-coreutils-8.31
  /nix/store/svgkibi7105pm151prywndsgvmc4qvzs-acl-2.2.53
  /nix/store/x4knf14z1p0ci72gl314i7vza93iy7yc-perl5.28.2-File-Next-1.16
  /nix/store/zfj7ria2kwqzqj9dh91kj9kwsynxdfk0-perl5.28.2-ack-2.28
copying path '/nix/store/gkrpl3k6s43fkg71n0269yq3p1f0al88-perl5.28.2-ack-2.28-man' from 'https://cache.nixos.org'...
copying path '/nix/store/iykxb0bmfjmi7s53kfg6pjbfpd8jmza6-glibc-2.27' from 'https://cache.nixos.org'...
copying path '/nix/store/x4knf14z1p0ci72gl314i7vza93iy7yc-perl5.28.2-File-Next-1.16' from 'https://cache.nixos.org'...
copying path '/nix/store/89gi8cbp8l5sf0m8pgynp2mh1c6pk1gk-attr-2.4.48' from 'https://cache.nixos.org'...
copying path '/nix/store/svgkibi7105pm151prywndsgvmc4qvzs-acl-2.2.53' from 'https://cache.nixos.org'...
copying path '/nix/store/k8lhqzpaaymshchz8ky3z4653h4kln9d-coreutils-8.31' from 'https://cache.nixos.org'...
copying path '/nix/store/57iv2vch31v8plcjrk97lcw1zbwb2n9r-perl-5.28.2' from 'https://cache.nixos.org'...
copying path '/nix/store/zfj7ria2kwqzqj9dh91kj9kwsynxdfk0-perl5.28.2-ack-2.28' from 'https://cache.nixos.org'...
building '/nix/store/q3243sjg91x1m8ipl0sj5gjzpnbgxrqw-user-environment.drv'...
created 56 symlinks in user environment
real	0m 14.02s
user	0m 8.83s
sys	0m 2.69s
Debian's apt takes almost 10 seconds to fetch and unpack 16 MB.
% docker run -t -i debian:sid
root@b7cc25a927ab:/# time (apt update && apt install -y ack-grep)
Get:1 http://cdn-fastly.deb.debian.org/debian sid InRelease [233 kB]
Get:2 http://cdn-fastly.deb.debian.org/debian sid/main amd64 Packages [8270 kB]
Fetched 8502 kB in 2s (4764 kB/s)
[ ]
The following NEW packages will be installed:
  ack ack-grep libfile-next-perl libgdbm-compat4 libgdbm5 libperl5.26 netbase perl perl-modules-5.26
The following packages will be upgraded:
  perl-base
1 upgraded, 9 newly installed, 0 to remove and 60 not upgraded.
Need to get 8238 kB of archives.
After this operation, 42.3 MB of additional disk space will be used.
[ ]
real	0m9.096s
user	0m2.616s
sys	0m0.441s
Arch Linux's pacman takes a little over 3s to fetch and unpack 6.5 MB.
% docker run -t -i archlinux/base
[root@9604e4ae2367 /]# time (pacman -Sy && pacman -S --noconfirm ack)
:: Synchronizing package databases...
 core            132.2 KiB  1033K/s 00:00
 extra          1629.6 KiB  2.95M/s 00:01
 community         4.9 MiB  5.75M/s 00:01
[ ]
Total Download Size:   0.07 MiB
Total Installed Size:  0.19 MiB
[ ]
real	0m3.354s
user	0m0.224s
sys	0m0.049s
Alpine's apk takes only about 1 second to fetch and unpack 10 MB.
% docker run -t -i alpine
/ # time apk add ack
fetch http://dl-cdn.alpinelinux.org/alpine/v3.10/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.10/community/x86_64/APKINDEX.tar.gz
(1/4) Installing perl-file-next (1.16-r0)
(2/4) Installing libbz2 (1.0.6-r7)
(3/4) Installing perl (5.28.2-r1)
(4/4) Installing ack (3.0.0-r0)
Executing busybox-1.30.1-r2.trigger
OK: 44 MiB in 18 packages
real	0m 0.96s
user	0m 0.25s
sys	0m 0.07s

qemu

You can expand each of these:
Fedora's dnf takes over a minute to fetch and unpack 266 MB.
% docker run -t -i fedora /bin/bash
[root@722e6df10258 /]# time dnf install -y qemu
Fedora Modular 30 - x86_64            3.1 MB/s   2.7 MB     00:00
Fedora Modular 30 - x86_64 - Updates  2.7 MB/s   2.4 MB     00:00
Fedora 30 - x86_64 - Updates           20 MB/s    19 MB     00:00
Fedora 30 - x86_64                     31 MB/s    70 MB     00:02
[ ]
Install  262 Packages
Upgrade    4 Packages
Total download size: 172 M
[ ]
real	1m7.877s
user	0m44.237s
sys	0m3.258s
NixOS's Nix takes 38s to fetch and unpack 262 MB.
% docker run -t -i nixos/nix
39e9186422ba:/# time sh -c 'nix-channel --update && nix-env -i qemu-4.0.0'
unpacking channels...
created 2 symlinks in user environment
installing 'qemu-4.0.0'
these paths will be fetched (262.18 MiB download, 1364.54 MiB unpacked):
[ ]
real	0m 38.49s
user	0m 26.52s
sys	0m 4.43s
Debian's apt takes 51 seconds to fetch and unpack 159 MB.
% docker run -t -i debian:sid
root@b7cc25a927ab:/# time (apt update && apt install -y qemu-system-x86)
Get:1 http://cdn-fastly.deb.debian.org/debian sid InRelease [149 kB]
Get:2 http://cdn-fastly.deb.debian.org/debian sid/main amd64 Packages [8426 kB]
Fetched 8574 kB in 1s (6716 kB/s)
[ ]
Fetched 151 MB in 2s (64.6 MB/s)
[ ]
real	0m51.583s
user	0m15.671s
sys	0m3.732s
Arch Linux's pacman takes 1m2s to fetch and unpack 124 MB.
% docker run -t -i archlinux/base
[root@9604e4ae2367 /]# time (pacman -Sy && pacman -S --noconfirm qemu)
:: Synchronizing package databases...
 core       132.2 KiB   751K/s 00:00
 extra     1629.6 KiB  3.04M/s 00:01
 community    4.9 MiB  6.16M/s 00:01
[ ]
Total Download Size:   123.20 MiB
Total Installed Size:  587.84 MiB
[ ]
real	1m2.475s
user	0m9.272s
sys	0m2.458s
Alpine's apk takes only about 2.4 seconds to fetch and unpack 26 MB.
% docker run -t -i alpine
/ # time apk add qemu-system-x86_64
fetch http://dl-cdn.alpinelinux.org/alpine/v3.10/main/x86_64/APKINDEX.tar.gz
fetch http://dl-cdn.alpinelinux.org/alpine/v3.10/community/x86_64/APKINDEX.tar.gz
[ ]
OK: 78 MiB in 95 packages
real	0m 2.43s
user	0m 0.46s
sys	0m 0.09s

Michael Stapelberg: distri: a Linux distribution to research fast package management

Over the last year or so I have worked on a research Linux distribution in my spare time. It's not a distribution for researchers (like Scientific Linux), but my personal playground project to research Linux distribution development, i.e. to try out fresh ideas. This article focuses on the package format and its advantages, but there is more to distri, which I will cover in upcoming blog posts.

Motivation

I was a Debian Developer for the 7 years from 2012 to 2019, but using the distribution often left me frustrated, ultimately resulting in me winding down my Debian work. Frequently, I was noticing a large gap between the actual speed of an operation (e.g. doing an update) and the possible speed based on back-of-the-envelope calculations. I wrote more about this in my blog post "Package managers are slow". To me, this observation means that either there is potential to optimize the package manager itself (e.g. apt), or what the system does is just too complex. While I remember seeing some low-hanging fruit, through my work on distri I wanted to explore whether all the complexity we currently have in Linux distributions such as Debian or Fedora is inherent to the problem space. I have completed enough of the experiment to conclude that the complexity is not inherent: I can build a Linux distribution for general-enough purposes which is much less complex than existing ones. Those were low-hanging fruit from a user perspective. I'm not saying that fixing them is easy in the technical sense; I know too little about apt's code base to make such a statement.

Key idea: packages are images, not archives

One key idea is to switch from using archives to using images for package contents. Common package managers such as dpkg(1) use tar(1) archives with various compression algorithms. distri uses SquashFS images, a comparatively simple file system image format that I happen to be familiar with from my work on the gokrazy Raspberry Pi 3 Go platform. This idea is not novel: AppImage and snappy also use images, but only for individual, self-contained applications. distri however uses images for distribution packages with dependencies. In particular, there is no duplication of shared libraries in distri. A nice side effect of using read-only image files is that applications are immutable and can hence not be broken by accidental (or malicious!) modification.

Key idea: separate hierarchies

Package contents are made available under a fully-qualified path. E.g., all files provided by package zsh-amd64-5.6.2-3 are available under /ro/zsh-amd64-5.6.2-3. The mountpoint /ro stands for read-only, which is short yet descriptive. Perhaps surprisingly, building software with custom prefix values of e.g. /ro/zsh-amd64-5.6.2-3 is widely supported, thanks to:
  1. Linux distributions, which build software with prefix set to /usr, whereas FreeBSD (and the autotools default) builds with prefix set to /usr/local.
  2. Enthusiast users in corporate or research environments, who install software into their home directories.
Because using a custom prefix is a common scenario, upstream awareness for prefix-correctness is generally high, and the rarely required patch will be quickly accepted.

Key idea: exchange directories

Software packages often exchange data by placing or locating files in well-known directories. Here are just a few examples:
  • gcc(1) locates the libusb(3) headers via /usr/include
  • man(1) locates the nginx(1) manpage via /usr/share/man.
  • zsh(1) locates executable programs via PATH components such as /bin
In distri, these locations are called exchange directories and are provided via FUSE in /ro. Exchange directories come in two different flavors:
  1. global. The exchange directory, e.g. /ro/share, provides the union of the share sub-directories of all packages in the package store (a simplified sketch of this union follows the list below).
    Global exchange directories are largely used for compatibility, see below.
  2. per-package. Useful for tight coupling: e.g. irssi(1) does not provide any ABI guarantees, so plugins such as irssi-robustirc can declare that they want e.g. /ro/irssi-amd64-1.1.1-1/out/lib/irssi/modules to be a per-package exchange directory and contain files from their lib/irssi/modules.
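As a rough illustration of what the global flavor provides: distri builds this view dynamically via FUSE and resolves conflicts by revision number, whereas the sketch below just materializes a one-off symlink union in which the first package wins; all paths are invented for the example.

# Build a static union of every package's out/share directory, roughly
# what the global exchange directory /ro/share exposes.
import os

PACKAGE_STORE = "/tmp/pkgstore-unpacked"   # invented: one directory per package
EXCHANGE = "/tmp/ro-share"                 # invented stand-in for /ro/share

os.makedirs(EXCHANGE, exist_ok=True)
for pkg in sorted(os.listdir(PACKAGE_STORE)):
    share = os.path.join(PACKAGE_STORE, pkg, "out", "share")
    if not os.path.isdir(share):
        continue
    for entry in os.listdir(share):
        link = os.path.join(EXCHANGE, entry)
        if not os.path.lexists(link):     # first package wins in this sketch
            os.symlink(os.path.join(share, entry), link)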

Search paths sometimes need to be fixed

Programs which use exchange directories sometimes use search paths to access multiple exchange directories. In fact, the examples above were taken from gcc(1)'s INCLUDEPATH, man(1)'s MANPATH and zsh(1)'s PATH. These are prominent ones, but more examples are easy to find: zsh(1) loads completion functions from its FPATH. Some search path values are derived from --datadir=/ro/share and require no further attention, but others might derive from e.g. --prefix=/ro/zsh-amd64-5.6.2-3/out and need to be pointed to an exchange directory via a specific command line flag.

FHS compatibility

Global exchange directories are used to make distri provide enough of the Filesystem Hierarchy Standard (FHS) that third-party software largely just works. This includes a C development environment. I successfully ran a few programs from their binary packages such as Google Chrome, Spotify, or Microsoft's Visual Studio Code.

Fast package manager

I previously wrote about how Linux distribution package managers are too slow. distri's package manager is extremely fast. Its main bottleneck is typically the network link, even at high-speed links (I tested with a 100 Gbps link). Its speed comes largely from an architecture which allows the package manager to do less work. Specifically:
  1. Package images can be added atomically to the package store, so we can safely skip fsync(2). Corruption will be cleaned up automatically, and durability is not important: if an interactive installation is interrupted, the user can just repeat it, as it will be fresh on their mind.
  2. Because all packages are co-installable thanks to separate hierarchies, there are no conflicts at the package store level, and no dependency resolution (an optimization problem requiring SAT solving) is required at all.
    In exchange directories, we resolve conflicts by selecting the package with the highest monotonically increasing distri revision number.
  3. distri proves that we can build a useful Linux distribution entirely without hooks and triggers. Not having to serialize hook execution allows us to download packages into the package store with maximum concurrency.
  4. Because we are using images instead of archives, we do not need to unpack anything. This means installing a package is really just writing its package image and metadata to the package store. Sequential writes are typically the fastest kind of storage usage pattern.
Fast installation also makes other use-cases more bearable, such as creating disk images, be it for testing them in qemu(1), booting them on real hardware from a USB drive, or for cloud providers such as Google Cloud.

Fast package builder Contrary to how distribution package builders are usually implemented, the distri package builder does not actually install any packages into the build environment. Instead, distri makes available a filtered view of the package store (only declared dependencies are available) at /ro in the build environment. This means that even for large dependency trees, setting up a build environment happens in a fraction of a second! Such a low latency really makes a difference in how comfortable it is to iterate on distribution packages.

Package stores In distri, package images are installed from a remote package store into the local system package store /roimg, which backs the /ro mount. A package store is implemented as a directory of package images and their associated metadata files. You can easily make a package store available by using distri export. To provide a mirror for your local network, you can periodically distri update from the package store you want to mirror, and then distri export your local copy. Special tooling (e.g. debmirror in Debian) is not required because distri install is atomic (and update uses install). Producing derivatives is easy: just add your own packages to a copy of the package store. The package store is intentionally kept simple to manage and distribute. Its files could be exchanged via peer-to-peer file systems, or synchronized from an offline medium.
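A sketch of the mirroring workflow just described, using the distri subcommands named above (flags and remote configuration are not shown in the post, so treat this as pseudocode rather than exact syntax):
# Periodically pull from the package store you want to mirror, then serve
# your local copy to machines on your network.
distri update
distri export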

distri's first release distri works well enough to demonstrate the ideas explained above. I have branched this state into branch jackherer, distri's first release code name. This way, I can keep experimenting in the distri repository without breaking your installation. From the branch contents, our autobuilder creates:
  1. disk images,
  2. a package repository. Installations can pick up new packages with distri update.
  3. documentation for the release.
The project website can be found at https://distr1.org. The website is just the README for now, but we can improve that later. The repository can be found at https://github.com/distr1/distri

Project outlook Right now, distri is mainly a vehicle for my spare-time Linux distribution research. I don't recommend anyone use distri for anything but research, and there are no medium-term plans of that changing. At the very least, please contact me before basing anything serious on distri so that we can talk about limitations and expectations. I expect the distri project to live for as long as I have blog posts to publish, and we'll see what happens afterwards. Note that this is a hobby for me: I will continue to explore, at my own pace, parts that I find interesting. My hope is that established distributions might get a useful idea or two from distri.

There's more to come: subscribe to the distri feed I don't want to make this post too long, but there is much more! Please subscribe to the following URL in your feed reader to get all posts about distri: https://michael.stapelberg.ch/posts/tags/distri/feed.xml Next in my queue are articles about hermetic packages and good package maintainer experience (including declarative packaging).

Feedback or questions? I'd love to discuss these ideas in case you're interested! Please send feedback to the distri mailing list so that everyone can participate!

François Marier: Making an Apache website available as a Tor Onion Service

As part of the #MoreOnionsPorFavor campaign, I decided to follow brave.com's lead and make my homepage available as a Tor onion service.

Tor daemon setup I started by installing the Tor daemon locally:
apt install tor
and then setting the following in /etc/tor/torrc:
SocksPort 0
SocksPolicy reject *
HiddenServiceDir /var/lib/tor/hidden_service/
HiddenServicePort 80 [2600:3c04::f03c:91ff:fe8c:61ac]:80
HiddenServicePort 443 [2600:3c04::f03c:91ff:fe8c:61ac]:443
HiddenServiceVersion 3
HiddenServiceNonAnonymousMode 1
HiddenServiceSingleHopMode 1
in order to create a version 3 onion service without actually running a Tor relay. Note that since I am making a public website available over Tor, I do not need the location of the website to be hidden and so I used the same settings as Cloudflare in their public Tor proxy. Also, I explicitly used the external IPv6 address of my server in the configuration in order to prevent localhost bypasses. After restarting the Tor daemon to reload the configuration file:
systemctl restart tor.service
I looked for the address of my onion service:
$ cat /var/lib/tor/hidden_service/hostname 
ixrdj3iwwhkuau5tby5jh3a536a2rdhpbdbu6ldhng43r47kim7a3lid.onion

Apache configuration Next, I enabled a few required Apache modules:
a2enmod mpm_event
a2enmod http2
a2enmod headers
and configured my Apache vhosts in /etc/apache2/sites-enabled/www.conf:
<VirtualHost *:443>
    ServerName fmarier.org
    ServerAlias ixrdj3iwwhkuau5tby5jh3a536a2rdhpbdbu6ldhng43r47kim7a3lid.onion
    Protocols h2, http/1.1
    Header set Onion-Location "http://ixrdj3iwwhkuau5tby5jh3a536a2rdhpbdbu6ldhng43r47kim7a3lid.onion%{REQUEST_URI}s"
    Header set alt-svc 'h2="ixrdj3iwwhkuau5tby5jh3a536a2rdhpbdbu6ldhng43r47kim7a3lid.onion:443"; ma=315360000; persist=1'
    Header add Strict-Transport-Security: "max-age=63072000"
    Include /etc/fmarier-org/www-common.include
    SSLEngine On
    SSLCertificateFile /etc/letsencrypt/live/fmarier.org/fullchain.pem
    SSLCertificateKeyFile /etc/letsencrypt/live/fmarier.org/privkey.pem
</VirtualHost>
<VirtualHost *:80>
    ServerName fmarier.org
    Redirect permanent / https://fmarier.org/
</VirtualHost>
<VirtualHost *:80>
    ServerName ixrdj3iwwhkuau5tby5jh3a536a2rdhpbdbu6ldhng43r47kim7a3lid.onion
    Include /etc/fmarier-org/www-common.include
</VirtualHost>
Note that /etc/fmarier-org/www-common.include contains all of the configuration options that are common to both the HTTP and the HTTPS sites (e.g. document root, caching headers, aliases, etc.). Finally, I restarted Apache:
apache2ctl configtest
systemctl restart apache2.service

Testing In order to test that my website is correctly available at its .onion address, I opened the following URLs in a Brave Tor window: I also checked that the main URL (https://fmarier.org/) exposes a working Onion-Location header which triggers the display of a button in the URL bar (recently merged and available in Brave Nightly): Testing that the Alt-Svc is working required using the Tor Browser since that's not yet supported in Brave:
  1. Open https://fmarier.org.
  2. Wait 30 seconds.
  3. Reload the page.
On the server side, I saw the following:
2a0b:f4c2:2::1 - - [14/Oct/2020:02:42:20 +0000] "GET / HTTP/2.0" 200 2696 "-" "Mozilla/5.0 (Windows NT 10.0; rv:78.0) Gecko/20100101 Firefox/78.0"
2600:3c04::f03c:91ff:fe8c:61ac - - [14/Oct/2020:02:42:53 +0000] "GET / HTTP/2.0" 200 2696 "-" "Mozilla/5.0 (Windows NT 10.0; rv:78.0) Gecko/20100101 Firefox/78.0"
That first IP address is from a Tor exit node:
$ whois 2a0b:f4c2:2::1
...
inet6num:       2a0b:f4c2::/40
netname:        MK-TOR-EXIT
remarks:        -----------------------------------
remarks:        This network is used for Tor Exits.
remarks:        We do not have any logs at all.
remarks:        For more information please visit:
remarks:        https://www.torproject.org
which indicates that the first request was not using the .onion address. The second IP address is the one for my server:
$ dig +short -x 2600:3c04::f03c:91ff:fe8c:61ac
hafnarfjordur.fmarier.org.
which indicates that the second request to Apache came from the Tor relay running on my server, hence using the .onion address.
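As an aside (my addition, not part of the original write-up), the two headers discussed above can also be checked from the command line:
# Fetch only the response headers and look for Onion-Location and Alt-Svc.
curl -sI https://fmarier.org/ | grep -iE '^(onion-location|alt-svc):'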

8 October 2020

Dirk Eddelbuettel: littler 0.3.12: Exciting updates

The thirteenth release of littler as a CRAN package became available today (after a three day rest at CRAN for no real reason), following in the fourteen-ish year history as a package started by Jeff in 2006, and joined by me a few weeks later. littler is the first command-line interface for R as it predates Rscript. It allows for piping as well as for shebang scripting via #!, uses command-line arguments more consistently and still starts faster. It also always loaded the methods package, which Rscript only started to do in recent years. littler lives on Linux and Unix, has its difficulties on macOS due to yet-another-braindeadedness there (who ever thought case-insensitive filesystems as a default were a good idea?) and simply does not exist on Windows (yet the build system could be extended; see RInside for an existence proof, and volunteers are welcome!). See the FAQ vignette on how to add it to your PATH. A few examples are highlighted at the Github repo, as well as in the examples vignette. This release brings five new example scripts and command wrappers, and a number of commands were also extended or updated (see below for more). We have a new and very slick documentation website once again utilising Material for MkDocs. Last but not least the two included vignettes now use minidown and the fabulous water css theme, which reduced the file sizes of the two vignettes from, respectively, 884kb and 873kb to 47kb and 15kb. Yes, that is correct. That alone brought the package file size down from 641kb to 116kb. Incredible. The NEWS file entry is below.
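As a quick illustration of the piping and shebang use mentioned above (the expressions are my own, not taken from the release notes):
# Pipe an R expression into r:
echo 'cat(2 + 2, "\n")' | r
# Or use r as a shebang interpreter:
cat > hello.r <<'EOF'
#!/usr/bin/env r
cat("hello from littler\n")
EOF
chmod +x hello.r && ./hello.r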

Changes in littler version 0.3.12 (2020-10-04)
  • Changes in examples
    • Updates to scripts tt.r, cos.r, cow.r, c4r.r, com.r
    • New script installDeps.r to install dependencies
    • Several updates to script check.r
    • New scripts installBSPM.r and installRSPM.r for binary package installation (Dirk and Iñaki in #81)
    • New script cranIncoming.r to check in Incoming
    • New script urlUpdate.r validates URLs as R does
  • Changes in package
    • Travis CI now uses BSPM
    • A package documentation website was added
    • Vignettes now use minidown resulting in much reduced filesizes: from over 800kb to under 50kb (Dirk in #83)

My CRANberries provides a comparison to the previous release. Full details for the littler release are provided as usual at the ChangeLog page, and now also on the new package docs website. The code is available via the GitHub repo, from tarballs and now of course also from its CRAN page and via install.packages("littler"). Binary packages are available directly in Debian as well as soon via Ubuntu binaries at CRAN thanks to the tireless Michael Rutter. Comments and suggestions are welcome at the GitHub repo. If you like this or other open-source work I do, you can now sponsor me at GitHub. For the first year, GitHub will match your contributions.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

5 October 2020

Reproducible Builds: Reproducible Builds in September 2020

Welcome to the September 2020 report from the Reproducible Builds project. In our monthly reports, we attempt to summarise the things that we have been up to over the past month, but if you are interested in contributing to the project, please visit our main website. This month, the Reproducible Builds project was pleased to announce a donation from Amateur Radio Digital Communications (ARDC) in support of its goals. ARDC's contribution will propel the Reproducible Builds project's efforts in ensuring the future health, security and sustainability of our increasingly digital society. Amateur Radio Digital Communications (ARDC) is a non-profit which was formed to further research and experimentation with digital communications using radio, with a goal of advancing the state of the art of amateur radio and to educate radio operators in these techniques. You can view the full announcement as well as more information about ARDC on their website.
In August's report, we announced that Jennifer Helsby (redshiftzero) launched a new reproduciblewheels.com website to address the lack of reproducibility of Python wheels. This month, Kushal Das posted a brief follow-up to provide an update on reproducible sources as well. The Threema privacy and security-oriented messaging application announced that, "within the next months", their apps will become fully open source, supporting reproducible builds:
This is to say that anyone will be able to independently review Threema's security and verify that the published source code corresponds to the downloaded app.
You can view the full announcement on Threema's website.

Events Sadly, due to the unprecedented events in 2020, there will be no in-person Reproducible Builds event this year. However, the Reproducible Builds project intends to resume meeting regularly on IRC, starting on Monday, October 12th at 18:00 UTC (full announcement). The cadence of these meetings will probably be every two weeks, although this will be discussed and decided on at the first meeting. (An editable agenda is available.) On 18th September, Bernhard M. Wiedemann gave a presentation in German titled "Wie reproducible builds Software sicherer machen" ("How reproducible builds make software more secure") at the Internet Security Digital Days 2020 conference. (View video.) On Saturday 10th October, Morten Linderud will give a talk at Arch Conf Online 2020 on The State of Reproducible Builds in the Arch Linux distribution:
The previous year has seen great progress in Arch Linux to get reproducible builds in the hands of the users and developers. In this talk we will explore the current tooling that allows users to reproduce packages, the rebuilder software that has been written to check packages and the current issues in this space.
During the Reproducible Builds summit in Marrakesh, GNU Guix, NixOS and Debian were able to produce a bit-for-bit identical binary when building GNU Mes, despite using three different major versions of GCC. Since the summit, additional work resulted in a bit-for-bit identical Mes binary using tcc and this month, a fuller update was posted by the individuals involved.

Development work In openSUSE, Bernhard M. Wiedemann published his monthly Reproducible Builds status update.

Debian Chris Lamb uploaded a number of Debian packages to address reproducibility issues that he had previously provided patches for, including cfingerd (#831021), grap (#870573), splint (#924003) & schroot (#902804). Last month, an issue was identified where a large number of Debian .buildinfo build certificates had been tainted on the official Debian build servers, as these environments had files underneath the /usr/local/sbin directory to prevent the execution of system services during package builds. However, this month, Aurelien Jarno and Wouter Verhelst fixed this issue in varying ways, resulting in a special policy-rcd-declarative-deny-all package. Building on Chris Lamb's previous work on reproducible builds for Debian .ISO images, Roland Clobus announced his work in progress on making the Debian Live images reproducible. [ ] Lucas Nussbaum performed an archive-wide rebuild of packages to test enabling the reproducible=+fixfilepath Debian build flag by default. Enabling the fixfilepath feature will likely fix reproducibility issues in an estimated 500-700 packages. The test revealed only 33 packages (out of 30,000 in the archive) that fail to build with fixfilepath. Many of those will be fixed when the default LLVM/Clang version is upgraded. 79 reviews of Debian packages were added, 23 were updated and 17 were removed this month, adding to our knowledge about identified issues. Chris Lamb added and categorised a number of new issue types, including packages that capture their build path via quicktest.h and absolute build directories in documentation generated by Doxygen, etc. Lastly, Lukas Puehringer uploaded a new version of in-toto to Debian, which was sponsored by Holger Levsen. [ ]

diffoscope diffoscope is our in-depth and content-aware diff utility that can not only locate and diagnose reproducibility issues, it provides human-readable diffs of all kinds too. In September, Chris Lamb made the following changes to diffoscope, including preparing and uploading versions 159 and 160 to Debian:
  • New features:
    • Show ordering differences only in strings(1) output by applying the ordering check to all differences across the codebase. [ ]
  • Bug fixes:
    • Mark some PGP tests as requiring pgpdump, and check that the associated binary is actually installed before attempting to run it. (#969753)
    • Don't raise exceptions when cleaning up after a guestfs cleanup failure. [ ]
    • Ensure we check FALLBACK_FILE_EXTENSION_SUFFIX, otherwise we run pgpdump against all files that are recognised by file(1) as data. [ ]
  • Codebase improvements:
    • Add some documentation for the EXTERNAL_TOOLS dictionary. [ ]
    • Abstract out a variable we use a couple of times. [ ]
  • diffoscope.org website improvements:
    • Make the (long) demonstration GIF less prominent on the page. [ ]
In addition, Paul Spooren added support for automatically deploying Docker images. [ ]

Website and documentation This month, a number of updates were made to the main Reproducible Builds website and related documentation. Chris Lamb made the following changes: In addition, Holger Levsen re-added the documentation link to the top-level navigation [ ] and documented that the jekyll-polyglot package is required [ ]. Lastly, diffoscope.org and reproducible-builds.org were transferred to Software Freedom Conservancy. Many thanks to Brett Smith from Conservancy, Jérémy Bobbio (lunar) and Holger Levsen for their help with transferring and to Mattia Rizzolo for initiating this.

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of these patches, including: Bernhard M. Wiedemann also reported issues in git2-rs, pyftpdlib, python-nbclient, python-pyzmq & python-sidpy.

Testing framework The Reproducible Builds project operates a Jenkins-based testing framework to power tests.reproducible-builds.org. This month, Holger Levsen made the following changes:
  • Debian:
    • Shorten the subject of nodes have gone offline notification emails. [ ]
    • Also track bugs that have been usertagged with usrmerge. [ ]
    • Drop abort-related codepaths as that functionality has been removed from Jenkins. [ ]
    • Update the frequency we update base images and status pages. [ ][ ][ ][ ]
  • Status summary view page:
    • Add support for monitoring systemctl status [ ] and the number of diffoscope processes [ ].
    • Show the total number of nodes [ ] and colourise critical disk space situations [ ].
    • Improve the visuals with respect to vertical space. [ ][ ]
  • Debian rebuilder prototype:
    • Resume building random packages again [ ] and update the frequency that packages are rebuilt. [ ][ ]
    • Use --no-respect-build-path parameter until sbuild 0.81 is available. [ ]
    • Treat the inability to locate some packages as a debrebuild problem, and not as an issue with the rebuilder itself. [ ]
  • Arch Linux:
    • Update various components to be compatible with Arch Linux s move to the xz compression format. [ ][ ][ ]
    • Allow scheduling of old packages to catch up on the backlog. [ ][ ][ ]
    • Improve formatting on the summary page. [ ][ ]
    • Update HTML pages once every hour, not every 30 minutes. [ ]
    • Use the Ubuntu (!) GPG keyserver to validate packages. [ ]
  • System health checks:
    • Highlight important bad conditions in colour. [ ][ ]
    • Add support for detecting more problems, including Jenkins shutdown issues [ ], failure to upgrade Arch Linux packages [ ], kernels with wrong permissions [ ], etc.
  • Misc:
    • Delete old schroot sessions after 2 days, not 3. [ ]
    • Use sudo to cleanup diffoscope schroot sessions. [ ]
In addition, stefan0xC fixed a query for unknown results in the handling of Arch Linux packages [ ] and Mattia Rizzolo updated the template that notifies maintainers by email of their newly-unreproducible packages to ensure that it did not get caught in junk/spam folders [ ]. Finally, build node maintenance was performed by Holger Levsen [ ][ ][ ][ ], Mattia Rizzolo [ ][ ] and Vagrant Cascadian [ ][ ][ ].
If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

3 October 2020

Julian Andres Klode: Google Pixel 4a: Initial Impressions

Yesterday I got a fresh new Pixel 4a, to replace my dying OnePlus 6. The OnePlus had developed some faults over time: It repeatedly loses connection to the AP and the network, and it got a bunch of scratches and scuffs from falling on various surfaces without any protection over the past year.

Why get a Pixel? Camera: OnePlus focuses on stuffing as many sensors as it can into a phone, rather than a good main sensor, resulting in pictures that are mediocre blurry messes - the dreaded oil painting effect. Pixels have some of the best cameras in the smartphone world. Sure, other hardware is far more capable, but the Pixels manage consistent results, so you need to take fewer pictures because they don't come out blurry half the time, and the post processing is so good that the pictures you get are just great. Other phones can shoot better pictures, sure - on a tripod. Security updates: Pixels provide 3 years of monthly updates, with security updates being published on the 5th of each month. OnePlus only provides updates every 2 months, and then the updates they do release are almost a month out of date, not counting that they are only 1st-of-month patches, meaning vendor blob updates included in the 5th-of-month updates are even a month older. Given that all my banking runs on the phone, I don't want it to be constantly behind. Feature updates: Of course, Pixels also get Beta Android releases and the newest Android release faster than any other phone, which is advantageous for Android development and being nerdy. Size and weight: OnePlus phones keep getting bigger and bigger. By today's standards, the OnePlus 6 at 6.18" and 177g is a small and lightweight device. Their latest phone, the Nord, has 6.44" and weighs 184g, the OnePlus 8 comes in at 180g with a 6.55" display. This is becoming unwieldy. Eschewing glass and aluminium for plastic, the Pixel 4a comes in at 144g.

First impressions

Accessories The Pixel 4a comes in a small box with a charger, a USB-C to USB-C cable, a USB-OTG adapter, and a sim tray ejector. No pre-installed screen protector or bumper is provided, as we've grown accustomed to from Chinese manufacturers like OnePlus or Xiaomi. The sim tray ejector has a circular end instead of the standard oval one - I assume so it looks like the o in Google? Google sells you fabric cases for 45 €. That seems a bit excessive, although I like that a lot of it is recycled.

Haptics Coming from a 6.18" phablet, the Pixel 4a with its 5.81" feels tiny. In fact, it's so tiny my thumb and my index finger can touch while holding it. Cute! Bezels are a bit bigger, resulting in slightly less screen to body. The bottom chin is probably impracticably small; this was already a problem on the OnePlus 6, but this one is even smaller. Oh well, form over function. The buttons on the side are very loud and clicky. As is the vibration motor. I wonder if this Pixel thinks it's a Model M. It just feels great. The plastic back feels really good, it's that sort of high quality smooth plastic you used to see on those high-end Nokia devices. The fingerprint reader is super fast. Setup just takes a few seconds per finger, and it works reliably. Other phones (OnePlus 6, Mi A1/A2) take like half a minute or a minute to set up.

Software The software - stock Android 11 - is fairly similar to OnePlus' OxygenOS. It's a clean experience, without a ton of added bloatware (even OnePlus now ships Facebook out of the box, eww). It's cleaner than OxygenOS in some ways - there are no duplicate photos apps, for example. On the other hand, it also has quite a bunch of Google stuff I could not care less about, like YT Music. To be fair, those are minor noise once all 130 apps were transferred from the old phone. There are various things I miss coming from OnePlus, such as off-screen gestures, a network transfer rate indicator in quick settings, or a circular battery icon. But the Pixel has an always-on display, which is kind of nice. Most of the cool Pixel features, like call screening or live transcriptions, are unfortunately not available in Germany. The display is set to display the same amount of content as my 6.18" OnePlus 6 did, so everything is a bit tinier. This usually takes me a week or two to adjust to, and then when I look at the OnePlus again I'll be like "Oh the font is huge", but right now, it feels a bit small on the Pixel. You can configure three colour profiles for the Pixel 4a: Natural, Boosted, and Adaptive. I have mine set to Adaptive. I'd love to see stock Android learn what OnePlus has here: the ability to adjust the colour temperature manually, as I prefer to keep my devices closer to 5500K than 6500K, as I feel it's a bit easier on the eyes. Or well, just give me the ability to load an ICM profile (though I'd need to calibrate the screen then - work!).

Migration experience Restoring the apps from my old phone only restored settings for a handful of the 130, which is disappointing. I had to spend an hour or two logging in to all the other apps, and I had to fiddle far too long with openScale to get it to take its data over. It's a mystery to me why people do not allow their apps to be backed up, especially something innocent like a weight tracking app. One of my banking apps restored its logins, which I did not really like. KeePass2Android settings were restored as well, but at least the key file was not restored. I did not opt in to restoring my device settings, as I feel that restoring device settings when changing manufacturers is bound to mess up some things. For example, I remember people migrating to OnePlus phones and getting their old DND schedule without any way to change it, because OnePlus had hidden the DND stuff. I assume that's the reason some accounts, like my work GSuite account, were not migrated (it said it would migrate accounts during setup). I've set up Bitwarden as my auto-fill service, so I could log in to most of my apps and websites using the stored credentials. I found that often that did not work. Like Chrome does autofill fine once, but if I then want to autofill again, I have to kill and restart it, otherwise I don't get the auto-fill menu. Other apps did not allow any auto-fill at all, and only gave me the option to copy and paste. Yikes - auto-fill on Android still needs a lot of work.

Performance It hangs a bit sometimes, but this was likely due to me having set 2 million iterations on my Bitwarden KDF and using Bitwarden a lot, and then opening up all 130 apps to log into them which overwhelmed the phone a bit. Apart from that, it does not feel worse than the OnePlus 6 which was to be expected, given that the benchmarks only show a slight loss in performance. Photos do take a few seconds to process after taking them, which is annoying, but understandable given how much Google relies on computation to provide decent pictures.

Audio The Pixel has dual speakers, with the earpiece delivering a tiny sound and the bottom-firing speaker doing most of the work. Still, it's better than just having the bottom-firing speaker, as it does provide a more immersive experience. Bass makes this thing vibrate a lot. It does not feel like a resonance sort of thing, but you can feel the bass in your hands. I've never had this before, and it will take some time getting used to.

Final thoughts This is a boring phone. There's no wow factor at all. It's neither huge, nor does it have high-res 48 or 64 MP cameras, nor does it have a ton of sensors. But everything it does, it does well. It does not pretend to be a flagship like its competition, it doesn't want to wow you, it just wants to be the perfect phone for you. The build is solid, the buttons make you think of a Model M, the camera is one of the best in any smartphone, and you of course get the latest updates before anyone else. It does not feel like an only 350 € phone, and yet it is. 128GB storage is plenty, 1080p resolution is plenty, 12.2MP is, you guessed it, plenty. The same applies to the other two Pixel phones - the 4a 5G and the 5. Neither are particularly exciting phones, and I personally find it hard to justify spending 620 € on the Pixel 5 when the Pixel 4a does the job for me, but the 4a 5G might appeal to users looking for larger phones. As to 5G, I wouldn't get much use out of it, seeing as it's not available anywhere I am. Because I'm on Vodafone. If you have a Telekom contract or live outside of Germany, you might just have good 5G coverage already and it might make sense to get a 5G phone rather than sticking to the budget choice.

Outlook The big question for me is whether I'll be able to adjust to the smaller display. I now have a tablet, so I'm less often using the phone (which my hands thank me for), which means that a smaller phone is probably a good call. Oh, while we're talking about calls - I only have a data-only SIM in it, so I could not test calling. I'm transferring to a new phone contract this month, and I'll give it a go then. This will be the first time I get VoLTE and WiFi calling, although it is Vodafone, so quality might just be worse than Telekom on 2G, who knows. A big shoutout to congstar for letting me cancel with a simple button click, and to @vodafoneservice on twitter for quickly setting up my benefits of an additional 5GB per month and a 10 € discount for being an existing cable customer. I'm also looking forward to playing around with the camera (especially night sight), and eSIM. And I'm getting a case from China, which was handed over to the airline on Sep 17 according to Aliexpress, so I guess it should arrive in the next weeks. Oh, and the screen protector is not here yet, so I can't really judge the screen quality much, as I still have the factory protection film on it, and that's just a blurry mess - but good enough for setting it up. Please Google, pre-apply a screen protector on future phones and include a simple bumper case. I might report back in two weeks when I have spent some more time with the device.

30 September 2020

Paul Wise: FLOSS Activities September 2020

Focus This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

Administration
  • Debian wiki: unblock IP addresses, approve accounts

Communication

Sponsors The gensim, cython-blis, python-preshed, pytest-rerunfailures, morfessor, nmslib, visdom and pyemd work was sponsored by my employer. All other work was done on a volunteer basis.

Chris Lamb: Free software activities in September 2020

Here is my monthly update covering what I have been doing in the free software world during September 2020 (previous month):

Reproducible Builds One of the original promises of open source software is that distributed peer review and transparency of process results in enhanced end-user security. However, whilst anyone may inspect the source code of free and open source software for malicious flaws, almost all software today is distributed as pre-compiled binaries. This allows nefarious third-parties to compromise systems by injecting malicious code into ostensibly secure software during the various compilation and distribution processes. The motivation behind the Reproducible Builds effort is to ensure no flaws have been introduced during this compilation process by promising identical results are always generated from a given source, thus allowing multiple third-parties to come to a consensus on whether a build was compromised. The project is proud to be a member project of the Software Freedom Conservancy. Conservancy acts as a corporate umbrella allowing projects to operate as non-profit initiatives without managing their own corporate structure. If you like the work of the Conservancy or the Reproducible Builds project, please consider becoming an official supporter. This month, I:

diffoscope I made the following changes to diffoscope, including preparing and uploading versions 159 and 160 to Debian:

Debian Lintian For Lintian, the static analysis tool for Debian packages, I uploaded versions 2.93.0, 2.94.0, 2.95.0 & 2.96.0 (not counting uploads to the backports repositories), as well as: Debian LTS This month I've worked 18 hours on Debian Long Term Support (LTS) and 12 hours on its sister Extended LTS project. You can find out more about the project via the following video:
Uploads Bugs filed

27 September 2020

Dirk Eddelbuettel: pkgKitten 0.2.0: Now with tinytest and new docs

A new release 0.2.0 of pkgKitten just hit on CRAN today, or about eleven months after the previous release. This release brings support for tinytest by having pkgKitten::kitten() automagically call tinytest::puppy() if the latter package is installed (and the user did not opt out of calling it). So your newly created minimal package now also uses a wonderful yet tiny testing framework. We also added a new documentation site using the previously tweeted-about wrapper for Material for MkDocs I really dig. And last but not least we switched to BSPM-based Continuous Integration (which I wrote about yesterday in R4 #30) and fixed one bug regarding the default NAMESPACE file.
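As a minimal hedged example of the workflow described above ("somePackage" is just a placeholder name):
# Create a package skeleton; if tinytest is installed (and not opted out),
# kitten() also calls tinytest::puppy() to set up the tiny test framework.
Rscript -e 'pkgKitten::kitten("somePackage")'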

Changes in version 0.2.0 (2020-09-27)
  • Continuous Integration uses the updated BSPM-based script on Travis and with GitHub Actions (Dirk in #11 plus earlier commits).
  • A new default NAMESPACE file is now installed (Dirk in #12).
  • A package documentation website was added (Dirk in #13).
  • Call tinytest::puppy if installed and not opted out (Dirk in #14).

More details about the package are at the pkgKitten webpage, the (new) pkgKitten docs site, and the pkgKitten GitHub repo. Courtesy of my CRANberries site, there is also a diffstat report for this release. If you like this or other open-source work I do, you can now sponsor me at GitHub. For the first year, GitHub will match your contributions.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

26 September 2020

François Marier: Repairing a corrupt ext4 root partition

I ran into filesystem corruption (ext4) on the root partition of my backup server which caused it to go into read-only mode. Since it's the root partition, it's not possible to unmount it and repair it while it's running. Normally I would boot from an Ubuntu live CD / USB stick, but in this case the machine is using the mipsel architecture and so that's not an option.

Repair using a USB enclosure I had to shut down the server and then pull the SSD drive out. I then moved it to an external USB enclosure and connected it to my laptop. I started with an automatic filesystem repair:
fsck.ext4 -pf /dev/sde2
which failed for some reason and so I moved to an interactive repair:
fsck.ext4 -f /dev/sde2
Once all of the errors were fixed, I ran a full surface scan to update the list of bad blocks:
fsck.ext4 -c /dev/sde2
Finally, I forced another check to make sure that everything was fixed at the filesystem level:
fsck.ext4 -f /dev/sde2

Fix invalid alternate GPT The other thing I noticed is this message in my dmesg log:
scsi 8:0:0:0: Direct-Access     KINGSTON  SA400S37120     SBFK PQ: 0 ANSI: 6
sd 8:0:0:0: Attached scsi generic sg4 type 0
sd 8:0:0:0: [sde] 234441644 512-byte logical blocks: (120 GB/112 GiB)
sd 8:0:0:0: [sde] Write Protect is off
sd 8:0:0:0: [sde] Mode Sense: 31 00 00 00
sd 8:0:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't support DPO or FUA
sd 8:0:0:0: [sde] Optimal transfer size 33553920 bytes
Alternate GPT is invalid, using primary GPT.
 sde: sde1 sde2
I therefore checked to see if the partition table looked fine and got the following:
$ fdisk -l /dev/sde
GPT PMBR size mismatch (234441643 != 234441647) will be corrected by write.
The backup GPT table is not on the end of the device. This problem will be corrected by write.
Disk /dev/sde: 111.8 GiB, 120034123776 bytes, 234441648 sectors
Disk model: KINGSTON SA400S3
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 799CD830-526B-42CE-8EE7-8C94EF098D46
Device       Start       End   Sectors   Size Type
/dev/sde1     2048   8390655   8388608     4G Linux swap
/dev/sde2  8390656 234441614 226050959 107.8G Linux filesystem
It turns out that all I had to do, since only the backup / alternate GPT partition table was corrupt and the primary one was fine, was to re-write the partition table:
$ fdisk /dev/sde
Welcome to fdisk (util-linux 2.33.1).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.
GPT PMBR size mismatch (234441643 != 234441647) will be corrected by write.
The backup GPT table is not on the end of the device. This problem will be corrected by write.
Command (m for help): w
The partition table has been altered.
Syncing disks.

Run SMART checks Since I still didn't know what caused the filesystem corruption in the first place, I decided to do one last check: SMART errors. I couldn't do this via the USB enclosure since the SMART commands aren't forwarded to the drive and so I popped the drive back into the backup server and booted it up. First, I checked whether any SMART errors had been reported using smartmontools:
smartctl -a /dev/sda
That didn't show any errors and so I kicked off an extended test:
smartctl -t long /dev/sda
which ran for 30 minutes and then passed without any errors. The mystery remains unsolved.

Dirk Eddelbuettel: #30: Easy, Reliable, Fast and Portable Linux and macOS Continuous Integration

Welcome to the 30th post in the rarified R recommendation resources series, or R4 for short. The last post introduced BSPM. In the four weeks since, we have worked some more on BSPM to bring it to the point where it is ready for use with continuous integration. Building on this, it is now used inside the run.sh script that has driven our CI use for many years (via the r-travis repo). Which we actually use right now on three different platforms: All three use the exact same script facilitating this, and run a matrix over Linux and macOS. You read this right: one CI setup that is portable and which you can take to your CI provider of choice. No lock-in or tie-in. Use what works, change at will. Or run on all three if you like burning extra cycles. This is already used by a handful of my repos as well as by at least two repos of friends also deploying r-travis. How does it work? In a nutshell we are There are several customizations that are possible via environment variables We find this setup compelling. The scheme is simple: there really is just one shell script behind it which can also be downloaded and altered. The scheme is also portable as we can (as shown) rotate between CI providers. The scheme is also more flexible: in case of debugging needs one can simply run the script on a local Docker or VM instance. Lastly, the scheme moves away from single points of failure or breakage. Currently the script uses only BSPM as I had the hunch that it would a) work and b) be compelling. Adding support for RSPM would be equally easy, but I have no immediate need to do so. Adding BioConductor installation may be next. That is easy when BioConductor uses r-release; it may be a little more challenging under r-devel, but it should work too. Stay tuned. In the meantime, if the above sounds compelling, give run.sh from r-travis a go! If you like this or other open-source work I do, you can now sponsor me at GitHub. For the first year, GitHub will match your contributions.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

25 September 2020

Colin Watson: Porting Launchpad to Python 3: progress report

Launchpad still requires Python 2, which in 2020 is a bit of a problem. Unlike a lot of the rest of 2020, though, there's good reason to be optimistic about progress. I've been porting Python 2 code to Python 3 on and off for a long time, from back when I was on the Ubuntu Foundations team and maintaining things like the Ubiquity installer. When I moved to Launchpad in 2015 it was certainly on my mind that this was a large body of code still stuck on Python 2. One option would have been to just accept that and leave it as it is, maybe doing more backporting work over time as support for Python 2 fades away. I've long been of the opinion that this would doom Launchpad to being unmaintainable in the long run, and since I genuinely love working on Launchpad - I find it an incredibly rewarding project - this wasn't something I was willing to accept. We're already seeing some of our important dependencies dropping support for Python 2, which is perfectly reasonable on their terms but which is starting to become a genuine obstacle to delivering important features when we need new features from newer versions of those dependencies. It also looks as though it may be difficult for us to run on Ubuntu 20.04 LTS (we're currently on 16.04, with an upgrade to 18.04 in progress) as long as we still require Python 2, since we have some system dependencies that 20.04 no longer provides. And then there are exciting new features like type hints and async/await that we'd like to be able to use. However, until last year there were so many blockers that even considering a port was barely conceivable. What changed in 2019 was sorting out a trifecta of core dependencies. We ported our database layer, Storm. We upgraded to modern versions of our Zope Toolkit dependencies (after contributing various fixes upstream, including some substantial changes to Zope's test runner that we'd carried as local patches for some years). And we ported our Bazaar code hosting infrastructure to Breezy. With all that in place, a port seemed more of a realistic possibility. Still, even with this, it was never going to be a matter of just following some standard porting advice and calling it good. Launchpad has almost a million lines of Python code in its main git tree, and around 250 dependencies of which a number are quite Launchpad-specific. In a project that size, not only is following standard porting advice an extremely time-consuming task in its own right, but just about every strange corner case is going to show up somewhere. (Did you know that StringIO.StringIO(None) and io.StringIO(None) do different things even after you account for the native string vs. Unicode text difference? How about the behaviour of .union() on a subclass of frozenset?) Launchpad's test suite is fortunately extremely thorough, but even just starting up the test suite involves importing most of the data model code, so before you can start taking advantage of it you have to make a large fraction of the codebase be at least syntactically-correct Python 3 code and use only modules that exist in Python 3 while still working in Python 2; in a project this size that turns out to be a large effort on its own, and can be quite risky in places. Canonical's product engineering teams work on a six-month cycle, but it just isn't possible to cram this sort of thing into six months unless you do literally nothing else, and "please can we put all feature development on hold while we run to stand still" is a pretty tough sell to even the most understanding management.
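As an aside (my illustration, not from the original post; it assumes both interpreters are installed), the StringIO corner case mentioned above can be reproduced from a shell:
# Python 2's StringIO.StringIO() coerces a non-string argument with str(),
# while Python 3's io.StringIO() treats None as "no initial value".
python2 -c "import StringIO; print(repr(StringIO.StringIO(None).getvalue()))"   # expected: 'None'
python3 -c "import io; print(repr(io.StringIO(None).getvalue()))"               # expected: ''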
Fortunately, we've been able to grow the Launchpad team in the last year or so, and so it's been possible to put Python 3 on our roadmap on the understanding that we aren't going to get all the way there in one cycle, while still being able to do other substantial feature development work as well. So, with all that preamble, what have we done this cycle? We've taken a two-pronged approach. From one end, we identified 147 classes that needed to be ported away from some compatibility code in our database layer that was substantially less friendly to Python 3: we've ported 38 of those, so there's clearly a fair bit more to do, but we were able to distribute this work out among the team quite effectively. From the other end, it was clear that it would be very inefficient to do general porting work when any attempt to even run the test suite would run straight into the same crashes in the same order, so I set myself a target of getting the test suite to start up, and started hacking on an enormous git branch that I never expected to try to land directly: instead, I felt free to commit just about anything that looked reasonable and moved things forward even if it was very rough, and every so often went back to tidy things up and cherry-pick individual commits into a form that included some kind of explanation and passed existing tests so that I could propose them for review. This strategy has been dramatically more successful than anything I've tried before at this scale. So far this cycle, considering only Launchpad's main git tree, we've landed 137 Python-3-relevant merge proposals for a total of 39552 lines of git diff output, keeping our existing tests passing along the way and deploying incrementally to production. We have about 27000 more lines of patch at varying degrees of quality to tidy up and merge. Our main development branch is only perhaps 10 or 20 more patches away from the test suite being able to start up, at which point we'll be able to get a buildbot running so that multiple developers can work on this much more easily and see the effect of their work. With the full unlanded patch stack, about 75% of the test suite passes on Python 3! This still leaves a long tail of several thousand tests to figure out and fix, but it's a much more incrementally-tractable kind of problem than where we started. Finally: the funniest (to me) bug I've encountered in this effort was the one I encountered in the test runner and fixed in zopefoundation/zope.testrunner#106: IDs of failing tests were written to a pipe, so if you have a test suite that's large enough and broken enough then eventually that pipe would reach its capacity and your test runner would just give up and hang. Pretty annoying when it meant an overnight test run didn't give useful results, but also eloquent commentary of sorts.

Reproducible Builds: ARDC sponsors the Reproducible Builds project

The Reproducible Builds project is pleased to announce a donation from Amateur Radio Digital Communications (ARDC) in support of its goals. ARDC's contribution will propel the Reproducible Builds project's efforts in ensuring the future health, security and sustainability of our increasingly digital society.

About Amateur Radio Digital Communications (ARDC) Amateur Radio Digital Communications (ARDC) is a non-profit that was formed to further research and experimentation with digital communications using radio, with a goal of advancing the state of the art of amateur radio and to educate radio operators in these techniques. It does this by managing the allocation of network resources, encouraging research and experimentation with networking protocols and equipment, publishing technical articles and a number of other activities to promote the public good of amateur radio and other related fields. ARDC has recently begun to contribute funding to organisations, groups, individuals and projects towards these and related goals, and their grant to the Reproducible Builds project is part of this new initiative. Amateur radio is an entirely volunteer activity performed by knowledgeable hobbyists who have proven their ability by passing the appropriate government examinations. No remuneration is permitted. Ham radio, as it is also known, has proven its value in advancements of the state of the communications arts, as well as in public service during disasters and in times of emergency. For more information about ARDC, please see their website at ampr.org.

About the Reproducible Builds project One of the original promises of open source software was that peer review would result in greater end-user security and stability of our digital ecosystem. However, although it is theoretically possible to inspect and build the original source code in order to avoid maliciously-inserted flaws, almost all software today is distributed in prepackaged form. This disconnect allows third-parties to compromise systems by injecting code into seemingly secure software during the build process, as well as by manipulating copies distributed from app stores and other package repositories. In order to address this, Reproducible builds are a set of software development practices, ideas and tools that create an independently-verifiable path from the original source code, all the way to what is actually running on our machines. Reproducible builds can reveal the injection of backdoors introduced by the hacking of developers' own computers, build servers and package repositories, but can also expose where volunteers or companies have been coerced into making changes via blackmail, government order, and so on. A world without reproducible builds is a world where our digital infrastructure cannot be trusted and where online communities are slower to grow, collaborate less and are increasingly fragile. Without reproducible builds, we leave space for greater encroachments on our liberties both by individuals as well as powerful, unaccountable actors such as governments, large corporations and autocratic regimes. The Reproducible Builds project began as a project within the Debian community, but is now working with many crucial and well-known free software projects such as Coreboot, openSUSE, OpenWrt, Tails, GNU Guix, Arch Linux, Tor, and many others. It is now an entirely Linux distribution independent effort and serves as the central clearing house for all issues related to securing build systems and software supply chains of all kinds. For more about the Reproducible Builds project, please see their website at reproducible-builds.org.
If you are interested in ensuring the ongoing security of the software that underpins our civilisation, and wish to sponsor the Reproducible Builds project, please reach out to the project by emailing contact@reproducible-builds.org.

20 September 2020

Dirk Eddelbuettel: RcppSpdlog 0.0.2: New upstream, awesome new stopwatch

Following up on the initial RcppSpdlog 0.0.1 release earlier this week, we are pumped to announce release 0.0.2. It contains upstream version 1.8.0 for spdlog which utilizes (among other things) a new feature in the embedded fmt library, namely completely automated formatting of high resolution time stamps which allows for gems like this (taken from this file in the package and edited down for brevity):
    spdlog::stopwatch sw;                                   // instantiate a stop watch

    // some other code

    sp->info("Elapsed time: {}", sw);
What we see is all there is: one instantiates a stopwatch object, and simply references it. The rest, as they say, is magic. And we get tic / toc like behaviour in modern C++ at essentially no cost to us (as code authors). So nice. Output from the (included in the package) function exampleRsink() (again edited down just a little):
R> RcppSpdlog::exampleRsink()
[13:03:23.845114] [fromR] [I] [thread 793264] Welcome to spdlog!
[...]
[13:03:23.845205] [fromR] [I] [thread 793264] Elapsed time: 0.000103611
[...]
[13:03:23.845281] [fromR] [I] [thread 793264] Elapsed time: 0.00018203
R> 
We see that the two simple logging instances come 10 and 18 microseconds into the call. RcppSpdlog bundles spdlog, a wonderful header-only C++ logging library with all the bells and whistles you would want that was written by Gabi Melman, and also includes fmt by Victor Zverovic. The NEWS entry for this release follows.

Changes in RcppSpdlog version 0.0.2 (2020-09-17)
  • Upgraded to upstream release 1.8.0
  • Switched Travis CI to using BSPM, also test on macOS
  • Added 'stopwatch' use to main R sink example

Courtesy of my CRANberries, there is a diffstat report relative to the previous release. More detailed information is on the RcppSpdlog page. The only sour grapes, again, are once more over the CRAN processing. And just as 0.0.1 was delayed for no good reason for three weeks, 0.0.2 was delayed by three days just because, well, that is how CRAN rules sometimes. I'd be even more mad if I had an alternative, but I don't. We remain grateful for all they do, but they really could have let this one through even at a one-day update delta. Ah well, now we're three days wiser and of course nothing changed in the package. If you like this or other open-source work I do, you can now sponsor me at GitHub. For the first year, GitHub will match your contributions.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.
