Search Results: "data"

4 March 2024

Colin Watson: Free software activity in January/February 2024

Two months into my new gig and it s going great! Tracking my time has taken a bit of getting used to, but having something that amounts to a queryable database of everything I ve done has also allowed some helpful introspection. Freexian sponsors up to 20% of my time on Debian tasks of my choice. In fact I ve been spending the bulk of my time on debusine which is itself intended to accelerate work on Debian, but more details on that later. While I contribute to Freexian s summaries now, I ve also decided to start writing monthly posts about my free software activity as many others do, to get into some more detail. January 2024 February 2024

3 March 2024

Paul Wise: FLOSS Activities Feb 2024

Focus This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review
  • Spam: reported 1 Debian bug report
  • Debian BTS usertags: changes for the month

Administration
  • Debian BTS: unarchive/reopen/triage bugs for reintroduced packages: ovito, tahoe-lafs, tpm2-tss-engine
  • Debian wiki: produce HTML dump for a user, unblock IP addresses, approve accounts

Communication
  • Respond to queries from Debian users and contributors on the mailing lists and IRC

Sponsors The SWH work was sponsored. All other work was done on a volunteer basis.

2 March 2024

Ravi Dwivedi: Malaysia Trip

Last month, I had a trip to Malaysia and Thailand. I stayed for six days in each of the countries. The selection of these countries was due to both of them granting visa-free entry to Indian tourists for some time window. This post covers the Malaysia part and Thailand part will be covered in the next post. If you want to travel to any of these countries in the visa-free time period, I have written all the questions asked during immigration and at airports during this trip here which might be of help. I mostly stayed in Kuala Lumpur and went to places around it. Although before the trip, I planned to visit Ipoh and Cameron Highlands too, but could not cover it during the trip. I found planning a trip to Malaysia a little difficult. The country is divided into two main islands - Peninsular Malaysia and Borneo. Then there are more islands - Langkawi, Penang island, Perhentian and Redang Islands. Reaching those islands seemed a little difficult to plan and I wish to visit more places in my next Malaysia trip. My first day hostel was booked in Chinatown part of Kuala Lumpur, near Pasar Seni LRT station. As soon as I checked-in and entered my room, I met another Indian named Fletcher, and after that we accompanied each other in the trip. That day, we went to Muzium Negara and Little India. I realized that if you know the right places to buy what you want, Malaysia could be quite cheap. Malaysian currency is Malaysian Ringgit (MYR). 1 MYR is equal to 18 INR. For 2 MYR, you can get a good masala tea in Little India and it costs like 4-5 MYR for a masala dosa. The vegetarian food has good availability in Kuala Lumpur, thanks to the Tamil community. I also tried Mee Goreng, which was vegetarian, and I found it fine in terms of taste. When I checked about Mee Goreng on Wikipedia, I found out that it is unique to Indian immigrants in Malaysia (and neighboring countries) but you don t get it in India!
Mee Goreng, a dish made of noodles in Malaysia.
For the next day, Fletcher had planned a trip to Genting Highlands and pre booked everything. I also planned to join him but when we went to KL Sentral to take the bus, his bus tickets were sold out. I could take a bus at a different time, but decided to visit some other place for the day and cover Genting Highlands later. At the ticket counter, I met a family from Delhi and they wanted to go to Genting Highlands but due to not getting bus tickets for that day, they decided to buy a ticket for the next day and instead planned for Batu Caves that day. I joined them and went to Batu Caves. After returning from Batu Caves, we went our separate ways. I went back and took rest at my hostel and later went to Petronas Towers at night. Petronas Towers is the icon of Kuala Lumpur. Having a photo there was a must. I was at Petronas Towers at around 9 PM. Around that time, Fletcher came back from Genting Highlands and we planned to meet at KL Sentral to head for dinner.
Me at Petronas Towers.
We went back to the same place as the day before where I had Mee Goreng. This time we had dosa and a masala tea. Their masala tea from the last day was tasty and that s why I was looking for them in the first place. We also met a Malaysian family having Indian ancestry dining there and had a nice conversation. Then we went to a place to eat roti canai in Pasar Seni market. Roti canai is a popular non-vegetarian dish in Malaysia but I took the vegetarian version.
Photo with Malaysians.
The next day, we went to Berjaya Time Square shopping place which sells pretty cheap items for daily use and souveniers too. However, I bought souveniers from Petaling Street, which is in Chinatown. At night, we explored Bukit Bintang, which is the heart of Kuala Lumpur and is famous for its nightlife. After that, Fletcher went to Bangkok and I was in Malaysia for two more days. Next day, I went to Genting Highlands and took the cable car, which had awesome views. I came back to Kuala Lumpur by the night. The remaining day I just roamed around in Bukit Bintang. Then I took a flight for Bangkok on 7th Feb, which I will cover in the next post. In Malaysia, I met so many people from different countries - apart from people from Indian subcontinent, I met Syrians, Indonesians (Malaysia seems to be a popular destination for Indonesian tourists) and Burmese people. Meeting people from other cultures is an integral part of travel for me. My expenses for Food + Accommodation + Travel added to 10,000 INR for a week in Malaysia, while flight costs were: 13,000 INR (Delhi to Kuala Lumpur) + 10,000 INR (Kuala Lumpur to Bangkok) + 12,000 INR (Bangkok to Delhi). For OpenStreetMap users, good news is Kuala Lumpur is fairly well-mapped on OpenStreetMap.

Tips
  • I bought local SIM from a shop at KL Sentral station complex which had news in their name (I forgot the exact name and there are two shops having news in their name) and it was the cheapest option I could find. The SIM was 10 MYR for 5 GB data for a week. If you want to make calls too, then you need to spend extra 5 MYR.
  • 7-Eleven and KK Mart convenience stores are everywhere in the city and they are open all the time (24 hours a day). If you are a vegetarian, you can at least get some bread and cheese from there to eat.
  • A lot of people know English (and many - Indians, Pakistanis, Nepalis - know Hindi) in Kuala Lumpur, so I had no language problems most of the time.
  • For shopping on budget, you can go to Petaling Street, Berjaya Time Square or Bukit Bintang. In particular, there is a shop named I Love KL Gifts in Bukit Bintang which had very good prices. just near the metro/monorail stattion. Check out location of the shop on OpenStreetMap.

1 March 2024

Guido G nther: Free Software Activities February 2024

A short status update what happened last month. Work in progress is marked as WiP: GNOME Calls Phosh and Phoc As this often overlaps I've put them in a common section: Phosh Tour Phosh Mobile Settings Phosh OSK Stub Livi Video Player Phosh.mobi Website If you want to support my work see donations.

Ravi Dwivedi: Fixing Mobile Data issue on Lineage OS

I have used Lineage OS on a couple of phones, but I noticed that internet using my mobile data was not working well on it. I am not sure why. This was the case in Xiaomi MI A2 and OnePlus 9 Pro phones. One day I met contrapunctus and they looked at their phone settings and used the same in mine and it worked. So, I am going to write here what worked for me. The trick is to add an access point. Go to Settings -> Network Settings -> Your SIM settings -> Access Point Names -> Click on + symbol. In the Name section, you can write anything, I wrote test. And in the APN section write www, then save. Below is a screenshot demonstrating the settings you have to change.
APN settings screenshot. Notice the circled entries.
This APN will show in the list of APNs and you need to select this one. After this, my mobile data started working well and I started getting speeds according to my data plan. This is what worked for me in Lineage OS. Hopefully, it was of help to you :D I will meet you in the next post.

25 February 2024

Ben Hutchings: Converted from Pyblosxom to Jekyll

I ve been using Pyblosxom here for nearly 17 years, but have become increasingly dissatisfied with having to write HTML instead of Markdown. Today I looked at upgrading my web server and discovered that Pyblosxom was removed from Debian after Debian 10, presumably because it wasn t updated for Python 3. I keep hearing about Jekyll as a static site generator for blogs, so I finally investigated how to use that and how to convert my existing entries. Fortunately it supports both HTML and Markdown (and probably other) input formats, so this was mostly a matter of converting metadata. I have my own crappy script for drafting, publishing, and listing blog entries, which also needed a bit of work to update, but that is now done. If all has gone to plan, you should be seeing just one new entry in the feed but all permalinks to older entries still working.

21 February 2024

Niels Thykier: Expanding on the Language Server (LSP) support for debian/control

I have spent some more time on improving my language server for debian/control. Today, I managed to provide the following features:
  • The X- style prefixes for field names are now understood and handled. This means the language server now considers XC-Package-Type the same as Package-Type.

  • More diagnostics:

    • Fields without values now trigger an error marker
    • Duplicated fields now trigger an error marker
    • Fields used in the wrong paragraph now trigger an error marker
    • Typos in field names or values now trigger a warning marker. For field names, X- style prefixes are stripped before typo detection is done.
    • The value of the Section field is now validated against a dataset of known sections and trigger a warning marker if not known.
  • The "on-save trim end of line whitespace" now works. I had a logic bug in the server side code that made it submit "no change" edits to the editor.

  • The language server now provides "hover" documentation for field names. There is a small screenshot of this below. Sadly, emacs does not support markdown or, if it does, it does not announce the support for markdown. For now, all the documentation is always in markdown format and the language server will tag it as either markdown or plaintext depending on the announced support.

  • The language server now provides quick fixes for some of the more trivial problems such as deprecated fields or typos of fields and values.

  • Added more known fields including the XS-Autobuild field for non-free packages along with a link to the relevant devref section in its hover doc.

This covers basically all my known omissions from last update except spellchecking of the Description field. An image of emacs showing documentation for the Provides field from the language server.
Spellchecking Personally, I feel spellchecking would be a very welcome addition to the current feature set. However, reviewing my options, it seems that most of the spellchecking python libraries out there are not packaged for Debian, or at least not other the name I assumed they would be. The alternative is to pipe the spellchecking to another program like aspell list. I did not test this fully, but aspell list does seem to do some input buffering that I cannot easily default (at least not in the shell). Though, either way, the logic for this will not be trivial and aspell list does not seem to include the corrections either. So best case, you would get typo markers but no suggestions for what you should have typed. Not ideal. Additionally, I am also concerned with the performance for this feature. For d/control, it will be a trivial matter in practice. However, I would be reusing this for d/changelog which is 99% free text with plenty of room for typos. For a regular linter, some slowness is acceptable as it is basically a batch tool. However, for a language server, this potentially translates into latency for your edits and that gets annoying. While it is definitely on my long term todo list, I am a bit afraid that it can easily become a time sink. Admittedly, this does annoy me, because I wanted to cross off at least one of Otto's requested features soon.
On wrap-and-sort support The other obvious request from Otto would be to automate wrap-and-sort formatting. Here, the problem is that "we" in Debian do not agree on the one true formatting of debian/control. In fact, I am fairly certain we do not even agree on whether we should all use wrap-and-sort. This implies we need a style configuration. However, if we have a style configuration per person, then you get style "ping-pong" for packages where the co-maintainers do not all have the same style configuration. Additionally, it is very likely that you are a member of multiple packaging teams or groups that all have their own unique style. Ergo, only having a personal config file is doomed to fail. The only "sane" option here that I can think of is to have or support "per package" style configuration. Something that would be committed to git, so the tooling would automatically pick up the configuration. Obviously, that is not fun for large packaging teams where you have to maintain one file per package if you want a consistent style across all packages. But it beats "style ping-pong" any day of the week. Note that I am perfectly open to having a personal configuration file as a fallback for when the "per package" configuration file is absent. The second problem is the question of which format to use and what to name this file. Since file formats and naming has never been controversial at all, this will obviously be the easy part of this problem. But the file should be parsable by both wrap-and-sort and the language server, so you get the same result regardless of which tool you use. If we do not ensure this, then we still have the style ping-pong problem as people use different tools. This also seems like time sink with no end. So, what next then...?
What next? On the language server front, I will have a look at its support for providing semantic hints to the editors that might be used for syntax highlighting. While I think most common Debian editors have built syntax highlighting already, I would like this language server to stand on its own. I would like us to be in a situation where we do not have implement yet another editor extension for Debian packaging files. At least not for editors that support the LSP spec. On a different front, I have an idea for how we go about relationship related substvars. It is not directly related to this language server, except I got triggered by the language server "missing" a diagnostic for reminding people to add the magic Depends: $ misc:Depends [, $ shlibs:Depends ] boilerplate. The magic boilerplate that you have to write even though we really should just fix this at a tooling level instead. Energy permitting, I will formulate a proposal for that and send it to debian-devel. Beyond that, I think I might start adding support for another file. I also need to wrap up my python-debian branch, so I can get the position support into the Debian soon, which would remove one papercut for using this language server. Finally, it might be interesting to see if I can extract a "batch-linter" version of the diagnostics and related quickfix features. If nothing else, the "linter" variant would enable many of you to get a "mini-Lintian" without having to do a package build first.

19 February 2024

Matthew Garrett: Debugging an odd inability to stream video

We have a cabin out in the forest, and when I say "out in the forest" I mean "in a national forest subject to regulation by the US Forest Service" which means there's an extremely thick book describing the things we're allowed to do and (somewhat longer) not allowed to do. It's also down in the bottom of a valley surrounded by tall trees (the whole "forest" bit). There used to be AT&T copper but all that infrastructure burned down in a big fire back in 2021 and AT&T no longer supply new copper links, and Starlink isn't viable because of the whole "bottom of a valley surrounded by tall trees" thing along with regulations that prohibit us from putting up a big pole with a dish on top. Thankfully there's LTE towers nearby, so I'm simply using cellular data. Unfortunately my provider rate limits connections to video streaming services in order to push them down to roughly SD resolution. The easy workaround is just to VPN back to somewhere else, which in my case is just a Wireguard link back to San Francisco.

This worked perfectly for most things, but some streaming services simply wouldn't work at all. Attempting to load the video would just spin forever. Running tcpdump at the local end of the VPN endpoint showed a connection being established, some packets being exchanged, and then nothing. The remote service appeared to just stop sending packets. Tcpdumping the remote end of the VPN showed the same thing. It wasn't until I looked at the traffic on the VPN endpoint's external interface that things began to become clear.

This probably needs some background. Most network infrastructure has a maximum allowable packet size, which is referred to as the Maximum Transmission Unit or MTU. For ethernet this defaults to 1500 bytes, and these days most links are able to handle packets of at least this size, so it's pretty typical to just assume that you'll be able to send a 1500 byte packet. But what's important to remember is that that doesn't mean you have 1500 bytes of packet payload - that 1500 bytes includes whatever protocol level headers are on there. For TCP/IP you're typically looking at spending around 40 bytes on the headers, leaving somewhere around 1460 bytes of usable payload. And if you're using a VPN, things get annoying. In this case the original packet becomes the payload of a new packet, which means it needs another set of TCP (or UDP) and IP headers, and probably also some VPN header. This still all needs to fit inside the MTU of the link the VPN packet is being sent over, so if the MTU of that is 1500, the effective MTU of the VPN interface has to be lower. For Wireguard, this works out to an effective MTU of 1420 bytes. That means simply sending a 1500 byte packet over a Wireguard (or any other VPN) link won't work - adding the additional headers gives you a total packet size of over 1500 bytes, and that won't fit into the underlying link's MTU of 1500.

And yet, things work. But how? Faced with a packet that's too big to fit into a link, there are two choices - break the packet up into multiple smaller packets ("fragmentation") or tell whoever's sending the packet to send smaller packets. Fragmentation seems like the obvious answer, so I'd encourage you to read Valerie Aurora's article on how fragmentation is more complicated than you think. tl;dr - if you can avoid fragmentation then you're going to have a better life. You can explicitly indicate that you don't want your packets to be fragmented by setting the Don't Fragment bit in your IP header, and then when your packet hits a link where your packet exceeds the link MTU it'll send back a packet telling the remote that it's too big, what the actual MTU is, and the remote will resend a smaller packet. This avoids all the hassle of handling fragments in exchange for the cost of a retransmit the first time the MTU is exceeded. It also typically works these days, which wasn't always the case - people had a nasty habit of dropping the ICMP packets telling the remote that the packet was too big, which broke everything.

What I saw when I tcpdumped on the remote VPN endpoint's external interface was that the connection was getting established, and then a 1500 byte packet would arrive (this is kind of the behaviour you'd expect for video - the connection handshaking involves a bunch of relatively small packets, and then once you start sending the video stream itself you start sending packets that are as large as possible in order to minimise overhead). This 1500 byte packet wouldn't fit down the Wireguard link, so the endpoint sent back an ICMP packet to the remote telling it to send smaller packets. The remote should then have sent a new, smaller packet - instead, about a second after sending the first 1500 byte packet, it sent that same 1500 byte packet. This is consistent with it ignoring the ICMP notification and just behaving as if the packet had been dropped.

All the services that were failing were failing in identical ways, and all were using Fastly as their CDN. I complained about this on social media and then somehow ended up in contact with the engineering team responsible for this sort of thing - I sent them a packet dump of the failure, they were able to reproduce it, and it got fixed. Hurray!

(Between me identifying the problem and it getting fixed I was able to work around it. The TCP header includes a Maximum Segment Size (MSS) field, which indicates the maximum size of the payload for this connection. iptables allows you to rewrite this, so on the VPN endpoint I simply rewrote the MSS to be small enough that the packets would fit inside the Wireguard MTU. This isn't a complete fix since it's done at the TCP level rather than the IP level - so any large UDP packets would still end up breaking)

I've no idea what the underlying issue was, and at the client end the failure was entirely opaque: the remote simply stopped sending me packets. The only reason I was able to debug this at all was because I controlled the other end of the VPN as well, and even then I wouldn't have been able to do anything about it other than being in the fortuitous situation of someone able to do something about it seeing my post. How many people go through their lives dealing with things just being broken and having no idea why, and how do we fix that?

(Edit: thanks to this comment, it sounds like the underlying issue was a kernel bug that Fastly developed a fix for - under certain configurations, the kernel fails to associate the MTU update with the egress interface and so it continues sending overly large packets)

comment count unavailable comments

17 February 2024

James Valleroy: snac2: a minimalist ActivityPub server in Debian

snac2, currently available in Debian testing and unstable, is described by its upstream as A simple, minimalistic ActivityPub instance written in portable C. It provides an ActivityPub server with a bare-bones web interface. It does not use JavaScript or require a database.
Basic forms for creating a new post, or following someone
ActivityPub is the protocol for federated social networks that is implemented by Mastodon, Pleroma, and other similar server software. Federated social networks are most often used for micro-blogging , or making many small posts. You can decide to follow another user (or bot) to see their posts, even if they happen to be on a different server (as long as the server software is compatible with the ActivityPub standard).
The timeline shows posts from accounts that you follow
In addition, snac2 has preliminary support for the Mastodon Client API. This allows basic support for mobile apps that support Mastodon, but you should expect that many features are not available yet. If you are interested in running a minimalist ActivityPub server on Debian, please try out snac2, and report any bugs that you find.

8 February 2024

Sven Hoexter: Use GitHub CLI to List all Repository Secrets

Write it down before I forget about it again:
for x in $(gh api graphql --paginate -f query='query($endCursor:String)   organization(login:"myorg")  
    repositories(first: 100, after: $endCursor, isArchived:false)  
        pageInfo  
            hasNextPage
            endCursor
         
        nodes  
            name
         
     
     
     ' --jq '.data.organization.repositories.nodes[].name'); do
    secrets=$(gh secret list --json name --jq '.[].name' -R "myorg/$ x "   tr '\n' ',')
    if ! [ -z "$ secrets " ]; then
        echo "$ x ,$ secrets "
    fi
done
Requests a list of all not archived repositories in a GitHub org and queries repository secrets. If we find some we output the repo name and the secrets in a comma separated list. Not real CSV, but good enough for further processing. I've to admit it's kinda beautiful what you can do with the gh cli by now. Sadly it seems the secrets are not yet available via GraphQL (or I missed it in the docs), so I just use the gh cli to do the REST calls.

6 February 2024

Robert McQueen: Flathub: Pros and Cons of Direct Uploads

I attended FOSDEM last weekend and had the pleasure to participate in the Flathub / Flatpak BOF on Saturday. A lot of the session was used up by an extensive discussion about the merits (or not) of allowing direct uploads versus building everything centrally on Flathub s infrastructure, and related concerns such as automated security/dependency scanning. My original motivation behind the idea was essentially two things. The first was to offer a simpler way forward for applications that use language-specific build tools that resolve and retrieve their own dependencies from the internet. Flathub doesn t allow network access during builds, and so a lot of manual work and additional tooling is currently needed (see Python and Electron Flatpak guides). And the second was to offer a maybe more familiar flow to developers from other platforms who would just build something and then run another command to upload it to the store, without having to learn the syntax of a new build tool. There were many valid concerns raised in the room, and I think on reflection that this is still worth doing, but might not be as valuable a way forward for Flathub as I had initially hoped. Of course, for a proprietary application where Flathub never sees the source or where it s built, whether that binary is uploaded to us or downloaded by us doesn t change much. But for an FLOSS application, a direct upload driven by the developer causes a regression on a number of fronts. We re not getting too hung up on the malicious developer inserts evil code in the binary case because Flathub already works on the model of verifying the developer and the user makes a decision to trust that app we don t review the source after all. But we do lose other things such as our infrastructure building on multiple architectures, and visibility on whether the build environment or upload credentials have been compromised unbeknownst to the developer. There is now a manual review process for when apps change their metadata such as name, icon, license and permissions which would apply to any direct uploads as well. It was suggested that if only heavily sandboxed apps (eg no direct filesystem access without proper use of portals) were permitted to make direct uploads, the impact of such concerns might be somewhat mitigated by the sandboxing. However, it was also pointed out that my go-to example of Electron app developers can upload to Flathub with one command was also a bit of a fiction. At present, none of them would pass that stricter sandboxing requirement. Almost all Electron apps run old versions of Chromium with less complete portal support, needing sandbox escapes to function correctly, and Electron (and Chromium s) sandboxing still needs additional tooling/downstream patching to run inside a Flatpak. Buh-boh. I think for established projects who already ship their own binaries from their own centralised/trusted infrastructure, and for developers who have understandable sensitivities about binary integrity such such as encryption, password or financial tools, it s a definite improvement that we re able to set up direct uploads with such projects with less manual work. There are already quite a few applications including verified ones where the build recipe simply fetches a binary built elsewhere and unpacks it, and if this already done centrally by the developer, repeating the exercise on Flathub s server adds little value. However for the individual developer experience, I think we need to zoom out a bit and think about how to improve this from a tools and infrastructure perspective as we grow Flathub, and as we seek to raise funds for different sources for these improvements. I took notes for everything that was mentioned as a tooling limitation during the BOF, along with a few ideas about how we could improve things, and hope to share these soon as part of an RFP/RFI (Request For Proposals/Request for Information) process. We don t have funding yet but if we have some prospective collaborators to help refine the scope and estimate the cost/effort, we can use this to go and pursue funding opportunities.

31 January 2024

Dirk Eddelbuettel: dtts 0.1.2 on CRAN: Maintenance

Leonardo and I are happy to announce the release of a very minor maintenance release 0.1.2 of our dtts package which has been on CRAN for a little under two years now. dtts builds upon our nanotime package as well as the beloved data.table to bring high-performance and high-resolution indexing at the nanosecond level to data frames. dtts aims to offers the time-series indexing versatility of xts (and zoo) to the immense power of data.table while supporting highest nanosecond resolution. This release follows yesterday s long-awaited release of data.table version 1.5.0 which had been some time in the making as the first new major.minor release since Matt drifted into being less active and the forefront. The release also renamed the one C-level API accessor to data.table (which was added, if memory serves, by Leonardo with our use in mind). So we have to catch up to the renamed identifier; this release does that, and adds a versioned imports statement on data.table. The short list of changes follows.

Changes in version 0.1.2 (2024-01-31)
  • Update the one exported C-level identifier from data.table following its 1.5.0 release and a renaming
  • Routine continuous integration updates

Courtesy of my CRANberries, there is also a report with diffstat for this release. Questions, comments, issue tickets can be brought to the GitHub repo. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

30 January 2024

Antoine Beaupr : router archeology: the Soekris net5001

Roadkiller was a Soekris net5501 router I used as my main gateway between 2010 and 2016 (for r seau and t l phone). It was upgraded to FreeBSD 8.4-p12 (2014-06-06) and pkgng. It was retired in favor of octavia around 2016. Roughly 10 years later (2024-01-24), I found it in a drawer and, to my surprised, it booted. After wrangling with a RS-232 USB adapter, a null modem cable, and bit rates, I even logged in:
comBIOS ver. 1.33  20070103  Copyright (C) 2000-2007 Soekris Engineering.
net5501
0512 Mbyte Memory                        CPU Geode LX 500 Mhz 
Pri Mas  WDC WD800VE-00HDT0              LBA Xlt 1024-255-63  78 Gbyte
Slot   Vend Dev  ClassRev Cmd  Stat CL LT HT  Base1    Base2   Int 
-------------------------------------------------------------------
0:01:2 1022 2082 10100000 0006 0220 08 00 00 A0000000 00000000 10
0:06:0 1106 3053 02000096 0117 0210 08 40 00 0000E101 A0004000 11
0:07:0 1106 3053 02000096 0117 0210 08 40 00 0000E201 A0004100 05
0:08:0 1106 3053 02000096 0117 0210 08 40 00 0000E301 A0004200 09
0:09:0 1106 3053 02000096 0117 0210 08 40 00 0000E401 A0004300 12
0:20:0 1022 2090 06010003 0009 02A0 08 40 80 00006001 00006101 
0:20:2 1022 209A 01018001 0005 02A0 08 00 00 00000000 00000000 
0:21:0 1022 2094 0C031002 0006 0230 08 00 80 A0005000 00000000 15
0:21:1 1022 2095 0C032002 0006 0230 08 00 00 A0006000 00000000 15
 4 Seconds to automatic boot.   Press Ctrl-P for entering Monitor.
 
                                            
                                                  ______
                                                    ____  __ ___  ___ 
            Welcome to FreeBSD!                     __   '__/ _ \/ _ \
                                                    __       __/  __/
                                                                      
    1. Boot FreeBSD [default]                     _     _   \___ \___ 
    2. Boot FreeBSD with ACPI enabled             ____   _____ _____
    3. Boot FreeBSD in Safe Mode                    _ \ / ____   __ \
    4. Boot FreeBSD in single user mode             _)   (___         
    5. Boot FreeBSD with verbose logging            _ < \___ \        
    6. Escape to loader prompt                      _)  ____)    __   
    7. Reboot                                                         
                                                  ____/ _____/ _____/
                                            
                                            
                                            
    Select option, [Enter] for default      
    or [Space] to pause timer  5            
  
Copyright (c) 1992-2013 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD is a registered trademark of The FreeBSD Foundation.
FreeBSD 8.4-RELEASE-p12 #5: Fri Jun  6 02:43:23 EDT 2014
    root@roadkiller.anarc.at:/usr/obj/usr/src/sys/ROADKILL i386
gcc version 4.2.2 20070831 prerelease [FreeBSD]
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: Geode(TM) Integrated Processor by AMD PCS (499.90-MHz 586-class CPU)
  Origin = "AuthenticAMD"  Id = 0x5a2  Family = 5  Model = a  Stepping = 2
  Features=0x88a93d<FPU,DE,PSE,TSC,MSR,CX8,SEP,PGE,CMOV,CLFLUSH,MMX>
  AMD Features=0xc0400000<MMX+,3DNow!+,3DNow!>
real memory  = 536870912 (512 MB)
avail memory = 506445824 (482 MB)
kbd1 at kbdmux0
K6-family MTRR support enabled (2 registers)
ACPI Error: A valid RSDP was not found (20101013/tbxfroot-309)
ACPI: Table initialisation failed: AE_NOT_FOUND
ACPI: Try disabling either ACPI or apic support.
cryptosoft0: <software crypto> on motherboard
pcib0 pcibus 0 on motherboard
pci0: <PCI bus> on pcib0
Geode LX: Soekris net5501 comBIOS ver. 1.33 20070103 Copyright (C) 2000-2007
pci0: <encrypt/decrypt, entertainment crypto> at device 1.2 (no driver attached)
vr0: <VIA VT6105M Rhine III 10/100BaseTX> port 0xe100-0xe1ff mem 0xa0004000-0xa00040ff irq 11 at device 6.0 on pci0
vr0: Quirks: 0x2
vr0: Revision: 0x96
miibus0: <MII bus> on vr0
ukphy0: <Generic IEEE 802.3u media interface> PHY 1 on miibus0
ukphy0:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto, auto-flow
vr0: Ethernet address: 00:00:24:cc:93:44
vr0: [ITHREAD]
vr1: <VIA VT6105M Rhine III 10/100BaseTX> port 0xe200-0xe2ff mem 0xa0004100-0xa00041ff irq 5 at device 7.0 on pci0
vr1: Quirks: 0x2
vr1: Revision: 0x96
miibus1: <MII bus> on vr1
ukphy1: <Generic IEEE 802.3u media interface> PHY 1 on miibus1
ukphy1:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto, auto-flow
vr1: Ethernet address: 00:00:24:cc:93:45
vr1: [ITHREAD]
vr2: <VIA VT6105M Rhine III 10/100BaseTX> port 0xe300-0xe3ff mem 0xa0004200-0xa00042ff irq 9 at device 8.0 on pci0
vr2: Quirks: 0x2
vr2: Revision: 0x96
miibus2: <MII bus> on vr2
ukphy2: <Generic IEEE 802.3u media interface> PHY 1 on miibus2
ukphy2:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto, auto-flow
vr2: Ethernet address: 00:00:24:cc:93:46
vr2: [ITHREAD]
vr3: <VIA VT6105M Rhine III 10/100BaseTX> port 0xe400-0xe4ff mem 0xa0004300-0xa00043ff irq 12 at device 9.0 on pci0
vr3: Quirks: 0x2
vr3: Revision: 0x96
miibus3: <MII bus> on vr3
ukphy3: <Generic IEEE 802.3u media interface> PHY 1 on miibus3
ukphy3:  none, 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto, auto-flow
vr3: Ethernet address: 00:00:24:cc:93:47
vr3: [ITHREAD]
isab0: <PCI-ISA bridge> at device 20.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <AMD CS5536 UDMA100 controller> port 0x1f0-0x1f7,0x3f6,0x170-0x177,0x376,0xe000-0xe00f at device 20.2 on pci0
ata0: <ATA channel> at channel 0 on atapci0
ata0: [ITHREAD]
ata1: <ATA channel> at channel 1 on atapci0
ata1: [ITHREAD]
ohci0: <OHCI (generic) USB controller> mem 0xa0005000-0xa0005fff irq 15 at device 21.0 on pci0
ohci0: [ITHREAD]
usbus0 on ohci0
ehci0: <AMD CS5536 (Geode) USB 2.0 controller> mem 0xa0006000-0xa0006fff irq 15 at device 21.1 on pci0
ehci0: [ITHREAD]
usbus1: EHCI version 1.0
usbus1 on ehci0
cpu0 on motherboard
pmtimer0 on isa0
orm0: <ISA Option ROM> at iomem 0xc8000-0xd27ff pnpid ORM0000 on isa0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
atkbd0: [ITHREAD]
atrtc0: <AT Real Time Clock> at port 0x70 irq 8 on isa0
ppc0: parallel port not found.
uart0: <16550 or compatible> at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
uart0: [FILTER]
uart0: console (19200,n,8,1)
uart1: <16550 or compatible> at port 0x2f8-0x2ff irq 3 on isa0
uart1: [FILTER]
Timecounter "TSC" frequency 499903982 Hz quality 800
Timecounters tick every 1.000 msec
IPsec: Initialized Security Association Processing.
usbus0: 12Mbps Full Speed USB v1.0
usbus1: 480Mbps High Speed USB v2.0
ad0: 76319MB <WDC WD800VE-00HDT0 09.07D09> at ata0-master UDMA100 
ugen0.1: <AMD> at usbus0
uhub0: <AMD OHCI root HUB, class 9/0, rev 1.00/1.00, addr 1> on usbus0
ugen1.1: <AMD> at usbus1
uhub1: <AMD EHCI root HUB, class 9/0, rev 2.00/1.00, addr 1> on usbus1
GEOM: ad0s1: geometry does not match label (255h,63s != 16h,63s).
uhub0: 4 ports with 4 removable, self powered
Root mount waiting for: usbus1
Root mount waiting for: usbus1
uhub1: 4 ports with 4 removable, self powered
Trying to mount root from ufs:/dev/ad0s1a
The last log rotation is from 2016:
[root@roadkiller /var/log]# stat /var/log/wtmp      
65 61783 -rw-r--r-- 1 root wheel 208219 1056 "Nov  1 05:00:01 2016" "Jan 18 22:29:16 2017" "Jan 18 22:29:16 2017" "Nov  1 05:00:01 2016" 16384 4 0 /var/log/wtmp
Interestingly, I switched between eicat and teksavvy on December 11th. Which year? Who knows!
Dec 11 16:38:40 roadkiller mpd: [eicatL0] LCP: authorization successful
Dec 11 16:41:15 roadkiller mpd: [teksavvyL0] LCP: authorization successful
Never realized those good old logs had a "oh dear forgot the year" issue (that's something like Y2K except just "Y", I guess). That was probably 2015, because the log dates from 2017, and the last entry is from November of the year after the above:
[root@roadkiller /var/log]# stat mpd.log 
65 47113 -rw-r--r-- 1 root wheel 193008 71939195 "Jan 18 22:39:18 2017" "Jan 18 22:39:59 2017" "Jan 18 22:39:59 2017" "Apr  2 10:41:37 2013" 16384 140640 0 mpd.log
It looks like the system was installed in 2010:
[root@roadkiller /var/log]# stat /
63 2 drwxr-xr-x 21 root wheel 2120 512 "Jan 18 22:34:43 2017" "Jan 18 22:28:12 2017" "Jan 18 22:28:12 2017" "Jul 18 22:25:00 2010" 16384 4 0 /
... so it lived for about 6 years, but still works after almost 14 years, which I find utterly amazing. Another amazing thing is that there's tuptime installed on that server! That is a software I thought I discovered later and then sponsored in Debian, but turns out I was already using it then!
[root@roadkiller /var]# tuptime 
System startups:        19   since   21:20:16 11/07/15
System shutdowns:       0 ok   -   18 bad
System uptime:          85.93 %   -   1 year, 11 days, 10 hours, 3 minutes and 36 seconds
System downtime:        14.07 %   -   61 days, 15 hours, 22 minutes and 45 seconds
System life:            1 year, 73 days, 1 hour, 26 minutes and 20 seconds
Largest uptime:         122 days, 9 hours, 17 minutes and 6 seconds   from   08:17:56 02/02/16
Shortest uptime:        5 minutes and 4 seconds   from   21:55:00 01/18/17
Average uptime:         19 days, 19 hours, 28 minutes and 37 seconds
Largest downtime:       57 days, 1 hour, 9 minutes and 59 seconds   from   20:45:01 11/22/16
Shortest downtime:      -1 years, 364 days, 23 hours, 58 minutes and 12 seconds   from   22:30:01 01/18/17
Average downtime:       3 days, 5 hours, 51 minutes and 43 seconds
Current uptime:         18 minutes and 23 seconds   since   22:28:13 01/18/17
Actual up/down times:
[root@roadkiller /var]# tuptime -t
No.        Startup Date                                         Uptime       Shutdown Date   End                                                  Downtime
1     21:20:16 11/07/15      1 day, 0 hours, 40 minutes and 12 seconds   22:00:28 11/08/15   BAD                                  2 minutes and 37 seconds
2     22:03:05 11/08/15      1 day, 9 hours, 41 minutes and 57 seconds   07:45:02 11/10/15   BAD                                  3 minutes and 24 seconds
3     07:48:26 11/10/15    20 days, 2 hours, 41 minutes and 34 seconds   10:30:00 11/30/15   BAD                        4 hours, 50 minutes and 21 seconds
4     15:20:21 11/30/15                      19 minutes and 40 seconds   15:40:01 11/30/15   BAD                                   6 minutes and 5 seconds
5     15:46:06 11/30/15                      53 minutes and 55 seconds   16:40:01 11/30/15   BAD                           1 hour, 1 minute and 38 seconds
6     17:41:39 11/30/15     6 days, 16 hours, 3 minutes and 22 seconds   09:45:01 12/07/15   BAD                4 days, 6 hours, 53 minutes and 11 seconds
7     16:38:12 12/11/15   50 days, 17 hours, 56 minutes and 49 seconds   10:35:01 01/31/16   BAD                                 10 minutes and 52 seconds
8     10:45:53 01/31/16     1 day, 21 hours, 28 minutes and 16 seconds   08:14:09 02/02/16   BAD                                  3 minutes and 48 seconds
9     08:17:56 02/02/16    122 days, 9 hours, 17 minutes and 6 seconds   18:35:02 06/03/16   BAD                                 10 minutes and 16 seconds
10    18:45:18 06/03/16   29 days, 17 hours, 14 minutes and 43 seconds   12:00:01 07/03/16   BAD                                 12 minutes and 34 seconds
11    12:12:35 07/03/16   31 days, 17 hours, 17 minutes and 26 seconds   05:30:01 08/04/16   BAD                                 14 minutes and 25 seconds
12    05:44:26 08/04/16     15 days, 1 hour, 55 minutes and 35 seconds   07:40:01 08/19/16   BAD                                  6 minutes and 51 seconds
13    07:46:52 08/19/16     7 days, 5 hours, 23 minutes and 10 seconds   13:10:02 08/26/16   BAD                                  3 minutes and 45 seconds
14    13:13:47 08/26/16   27 days, 21 hours, 36 minutes and 14 seconds   10:50:01 09/23/16   BAD                                  2 minutes and 14 seconds
15    10:52:15 09/23/16   60 days, 10 hours, 52 minutes and 46 seconds   20:45:01 11/22/16   BAD                 57 days, 1 hour, 9 minutes and 59 seconds
16    21:55:00 01/18/17                        5 minutes and 4 seconds   22:00:04 01/18/17   BAD                                 11 minutes and 15 seconds
17    22:11:19 01/18/17                       8 minutes and 42 seconds   22:20:01 01/18/17   BAD                                   1 minute and 20 seconds
18    22:21:21 01/18/17                       8 minutes and 40 seconds   22:30:01 01/18/17   BAD   -1 years, 364 days, 23 hours, 58 minutes and 12 seconds
19    22:28:13 01/18/17                      20 minutes and 17 seconds
The last few entries are actually the tests I'm running now, it seems this machine thinks we're now on 2017-01-18 at ~22:00, while we're actually 2024-01-24 at ~12:00 local:
Wed Jan 18 23:05:38 EST 2017
FreeBSD/i386 (roadkiller.anarc.at) (ttyu0)
login: root
Password:
Jan 18 23:07:10 roadkiller login: ROOT LOGIN (root) ON ttyu0
Last login: Wed Jan 18 22:29:16 on ttyu0
Copyright (c) 1992-2013 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
        The Regents of the University of California. All rights reserved.
FreeBSD 8.4-RELEASE-p12 (ROADKILL) #5: Fri Jun  6 02:43:23 EDT 2014
Reminders:
 * commit stuff in /etc
 * reload firewall (in screen!):
    pfctl -f /etc/pf.conf ; sleep 1
 * vim + syn on makes pf.conf more readable
 * monitoring the PPPoE uplink:
   tail -f /var/log/mpd.log
Current problems:
 * sometimes pf doesn't start properly on boot, if pppoe failed to come up, use
   this to resume:
     /etc/rc.d/pf start
   it will kill your shell, but fix NAT (2012-08-10)
 * babel fails to start on boot (2013-06-15):
     babeld -D -g 33123 tap0 vr3
 * DNS often fails, tried messing with unbound.conf (2014-10-05) and updating
   named.root (2016-01-28) and performance tweaks (ee63689)
 * asterisk and mpd4 are deprecated and should be uninstalled when we're sure
   their replacements (voipms + ata and mpd5) are working (2015-01-13)
 * if IPv6 fails, it's because netblocks are not being routed upstream. DHCPcd
   should do this, but doesn't start properly, use this to resume (2015-12-21):
     /usr/local/sbin/dhcpcd -6 --persistent --background --timeout 0 -C resolv.conf ng0
This machine is doomed to be replaced with the new omnia router, Indiegogo
campaign should ship in april 2016: http://igg.me/at/turris-omnia/x
(I really like the motd I left myself there. In theory, I guess this could just start connecting to the internet again if I still had the same PPPoE/ADSL link I had almost a decade ago; obviously, I do not.) Not sure how the system figured the 2017 time: the onboard clock itself believes we're in 1980, so clearly the CMOS battery has (understandably) failed:
> ?
comBIOS Monitor Commands
boot [drive][:partition] INT19 Boot
reboot                   cold boot
download                 download a file using XMODEM/CRC
flashupdate              update flash BIOS with downloaded file
time [HH:MM:SS]          show or set time
date [YYYY/MM/DD]        show or set date
d[b w d] [adr]           dump memory bytes/words/dwords
e[b w d] adr value [...] enter bytes/words/dwords
i[b w d] port            input from 8/16/32-bit port
o[b w d] port value      output to 8/16/32-bit port
run adr                  execute code at adr
cmosread [adr]           read CMOS RAM data
cmoswrite adr byte [...] write CMOS RAM data
cmoschecksum             update CMOS RAM Checksum
set parameter=value      set system parameter to value
show [parameter]         show one or all system parameters
?/help                   show this help
> show
ConSpeed = 19200
ConLock = Enabled
ConMute = Disabled
BIOSentry = Enabled
PCIROMS = Enabled
PXEBoot = Enabled
FLASH = Primary
BootDelay = 5
FastBoot = Disabled
BootPartition = Disabled
BootDrive = 80 81 F0 FF 
ShowPCI = Enabled
Reset = Hard
CpuSpeed = Default
> time
Current Date and Time is: 1980/01/01 00:56:47
Another bit of archeology: I had documented various outages with my ISP... back in 2003!
[root@roadkiller ~/bin]# cat ppp_stats/downtimes.txt
11/03/2003 18:24:49 218
12/03/2003 09:10:49 118
12/03/2003 10:05:57 680
12/03/2003 10:14:50 106
12/03/2003 10:16:53 6
12/03/2003 10:35:28 146
12/03/2003 10:57:26 393
12/03/2003 11:16:35 5
12/03/2003 11:16:54 11
13/03/2003 06:15:57 18928
13/03/2003 09:43:36 9730
13/03/2003 10:47:10 23
13/03/2003 10:58:35 5
16/03/2003 01:32:36 338
16/03/2003 02:00:33 120
16/03/2003 11:14:31 14007
19/03/2003 00:56:27 11179
19/03/2003 00:56:43 5
19/03/2003 00:56:53 0
19/03/2003 00:56:55 1
19/03/2003 00:57:09 1
19/03/2003 00:57:10 1
19/03/2003 00:57:24 1
19/03/2003 00:57:25 1
19/03/2003 00:57:39 1
19/03/2003 00:57:40 1
19/03/2003 00:57:44 3
19/03/2003 00:57:53 0
19/03/2003 00:57:55 0
19/03/2003 00:58:08 0
19/03/2003 00:58:10 0
19/03/2003 00:58:23 0
19/03/2003 00:58:25 0
19/03/2003 00:58:39 1
19/03/2003 00:58:42 2
19/03/2003 00:58:58 5
19/03/2003 00:59:35 2
19/03/2003 00:59:47 3
19/03/2003 01:00:34 3
19/03/2003 01:00:39 0
19/03/2003 01:00:54 0
19/03/2003 01:01:11 2
19/03/2003 01:01:25 1
19/03/2003 01:01:48 1
19/03/2003 01:02:03 1
19/03/2003 01:02:10 2
19/03/2003 01:02:20 3
19/03/2003 01:02:44 3
19/03/2003 01:03:45 3
19/03/2003 01:04:39 2
19/03/2003 01:05:40 2
19/03/2003 01:06:35 2
19/03/2003 01:07:36 2
19/03/2003 01:08:31 2
19/03/2003 01:08:38 2
19/03/2003 01:10:07 3
19/03/2003 01:11:05 2
19/03/2003 01:12:03 3
19/03/2003 01:13:01 3
19/03/2003 01:13:58 2
19/03/2003 01:14:59 5
19/03/2003 01:15:54 2
19/03/2003 01:16:55 2
19/03/2003 01:17:50 2
19/03/2003 01:18:51 3
19/03/2003 01:19:46 2
19/03/2003 01:20:46 2
19/03/2003 01:21:42 3
19/03/2003 01:22:42 3
19/03/2003 01:23:37 2
19/03/2003 01:24:38 3
19/03/2003 01:25:33 2
19/03/2003 01:26:33 2
19/03/2003 01:27:30 3
19/03/2003 01:28:55 2
19/03/2003 01:29:56 2
19/03/2003 01:30:50 2
19/03/2003 01:31:42 3
19/03/2003 01:32:36 3
19/03/2003 01:33:27 2
19/03/2003 01:34:21 2
19/03/2003 01:35:22 2
19/03/2003 01:36:17 3
19/03/2003 01:37:18 2
19/03/2003 01:38:13 3
19/03/2003 01:39:39 2
19/03/2003 01:40:39 2
19/03/2003 01:41:35 3
19/03/2003 01:42:35 3
19/03/2003 01:43:31 3
19/03/2003 01:44:31 3
19/03/2003 01:45:53 3
19/03/2003 01:46:48 3
19/03/2003 01:47:48 2
19/03/2003 01:48:44 3
19/03/2003 01:49:44 2
19/03/2003 01:50:40 3
19/03/2003 01:51:39 1
19/03/2003 11:04:33 19   
19/03/2003 18:39:36 2833 
19/03/2003 18:54:05 825  
19/03/2003 19:04:00 454  
19/03/2003 19:08:11 210  
19/03/2003 19:41:44 272  
19/03/2003 21:18:41 208  
24/03/2003 04:51:16 6
27/03/2003 04:51:20 5
30/03/2003 04:51:25 5
31/03/2003 08:30:31 255  
03/04/2003 08:30:36 5
06/04/2003 01:16:00 621  
06/04/2003 22:18:08 17   
06/04/2003 22:32:44 13   
09/04/2003 22:33:12 28   
12/04/2003 22:33:17 6
15/04/2003 22:33:22 5
17/04/2003 15:03:43 18   
20/04/2003 15:03:48 5
23/04/2003 15:04:04 16   
23/04/2003 21:08:30 339  
23/04/2003 21:18:08 13   
23/04/2003 23:34:20 253  
26/04/2003 23:34:45 25   
29/04/2003 23:34:49 5
02/05/2003 13:10:01 185  
05/05/2003 13:10:06 5
08/05/2003 13:10:11 5
09/05/2003 14:00:36 63928
09/05/2003 16:58:52 2
11/05/2003 23:08:48 2
14/05/2003 23:08:53 6
17/05/2003 23:08:58 5
20/05/2003 23:09:03 5
23/05/2003 23:09:08 5
26/05/2003 23:09:14 5
29/05/2003 23:00:10 3
29/05/2003 23:03:01 10   
01/06/2003 23:03:05 4
04/06/2003 23:03:10 5
07/06/2003 23:03:38 28   
10/06/2003 23:03:50 12   
13/06/2003 23:03:55 6
14/06/2003 07:42:20 3
14/06/2003 14:37:08 3
15/06/2003 20:08:34 3
18/06/2003 20:08:39 6
21/06/2003 20:08:45 6
22/06/2003 03:05:19 138  
22/06/2003 04:06:28 3
25/06/2003 04:06:58 31   
28/06/2003 04:07:02 4
01/07/2003 04:07:06 4
04/07/2003 04:07:11 5
07/07/2003 04:07:16 5
12/07/2003 04:55:20 6
12/07/2003 19:09:51 1158 
12/07/2003 22:14:49 8025 
15/07/2003 22:14:54 6
16/07/2003 05:43:06 18   
19/07/2003 05:43:12 6
22/07/2003 05:43:17 5
23/07/2003 18:18:55 183  
23/07/2003 18:19:55 9
23/07/2003 18:29:15 158  
23/07/2003 19:48:44 4604 
23/07/2003 20:16:27 3
23/07/2003 20:37:29 1079 
23/07/2003 20:43:12 342  
23/07/2003 22:25:51 6158
Fascinating. I suspect the (IDE!) hard drive might be failing as I saw two new files created in /var that I didn't remember seeing before:
-rw-r--r--   1 root    wheel        0 Jan 18 22:55 3@T3
-rw-r--r--   1 root    wheel        0 Jan 18 22:55 DY5
So I shutdown the machine, possibly for the last time:
Waiting (max 60 seconds) for system process  bufdaemon' to stop...done
Waiting (max 60 seconds) for system process  syncer' to stop...
Syncing disks, vnodes remaining...3 3 0 1 1 0 0 done
All buffers synced.
Uptime: 36m43s
usbus0: Controller shutdown
uhub0: at usbus0, port 1, addr 1 (disconnected)
usbus0: Controller shutdown complete
usbus1: Controller shutdown
uhub1: at usbus1, port 1, addr 1 (disconnected)
usbus1: Controller shutdown complete
The operating system has halted.
Please press any key to reboot.
I'll finally note this was the last FreeBSD server I personally operated. I also used FreeBSD to setup the core routers at Koumbit but those were replaced with Debian recently as well. Thanks Soekris, that was some sturdy hardware. Hopefully this new Protectli router will live up to that "decade plus" challenge. Not sure what the fate of this device will be: I'll bring it to the next Montreal Debian & Stuff to see if anyone's interested, contact me if you can't show up and want this thing.

Matthew Palmer: Why Certificate Lifecycle Automation Matters

If you ve perused the ActivityPub feed of certificates whose keys are known to be compromised, and clicked on the Show More button to see the name of the certificate issuer, you may have noticed that some issuers seem to come up again and again. This might make sense after all, if a CA is issuing a large volume of certificates, they ll be seen more often in a list of compromised certificates. In an attempt to see if there is anything that we can learn from this data, though, I did a bit of digging, and came up with some illuminating results.

The Procedure I started off by finding all the unexpired certificates logged in Certificate Transparency (CT) logs that have a key that is in the pwnedkeys database as having been publicly disclosed. From this list of certificates, I removed duplicates by matching up issuer/serial number tuples, and then reduced the set by counting the number of unique certificates by their issuer. This gave me a list of the issuers of these certificates, which looks a bit like this:
/C=BE/O=GlobalSign nv-sa/CN=AlphaSSL CA - SHA256 - G4
/C=GB/ST=Greater Manchester/L=Salford/O=Sectigo Limited/CN=Sectigo RSA Domain Validation Secure Server CA
/C=GB/ST=Greater Manchester/L=Salford/O=Sectigo Limited/CN=Sectigo RSA Organization Validation Secure Server CA
/C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certs.godaddy.com/repository//CN=Go Daddy Secure Certificate Authority - G2
/C=US/ST=Arizona/L=Scottsdale/O=Starfield Technologies, Inc./OU=http://certs.starfieldtech.com/repository//CN=Starfield Secure Certificate Authority - G2
/C=AT/O=ZeroSSL/CN=ZeroSSL RSA Domain Secure Site CA
/C=BE/O=GlobalSign nv-sa/CN=GlobalSign GCC R3 DV TLS CA 2020
Rather than try to work with raw issuers (because, as Andrew Ayer says, The SSL Certificate Issuer Field is a Lie), I mapped these issuers to the organisations that manage them, and summed the counts for those grouped issuers together.

The Data
Lieutenant Commander Data from Star Trek: The Next Generation Insert obligatory "not THAT data" comment here
The end result of this work is the following table, sorted by the count of certificates which have been compromised by exposing their private key:
IssuerCompromised Count
Sectigo170
ISRG (Let's Encrypt)161
GoDaddy141
DigiCert81
GlobalSign46
Entrust3
SSL.com1
If you re familiar with the CA ecosystem, you ll probably recognise that the organisations with large numbers of compromised certificates are also those who issue a lot of certificates. So far, nothing particularly surprising, then. Let s look more closely at the relationships, though, to see if we can get more useful insights.

Volume Control Using the issuance volume report from crt.sh, we can compare issuance volumes to compromise counts, to come up with a compromise rate . I m using the Unexpired Precertificates colume from the issuance volume report, as I feel that s the number that best matches the certificate population I m examining to find compromised certificates. To maintain parity with the previous table, this one is still sorted by the count of certificates that have been compromised.
IssuerIssuance VolumeCompromised CountCompromise Rate
Sectigo88,323,0681701 in 519,547
ISRG (Let's Encrypt)315,476,4021611 in 1,959,480
GoDaddy56,121,4291411 in 398,024
DigiCert144,713,475811 in 1,786,586
GlobalSign1,438,485461 in 31,271
Entrust23,16631 in 7,722
SSL.com171,81611 in 171,816
If we now sort this table by compromise rate, we can see which organisations have the most (and least) leakiness going on from their customers:
IssuerIssuance VolumeCompromised CountCompromise Rate
Entrust23,16631 in 7,722
GlobalSign1,438,485461 in 31,271
SSL.com171,81611 in 171,816
GoDaddy56,121,4291411 in 398,024
Sectigo88,323,0681701 in 519,547
DigiCert144,713,475811 in 1,786,586
ISRG (Let's Encrypt)315,476,4021611 in 1,959,480
By grouping by order-of-magnitude in the compromise rate, we can identify three bands :
  • The Super Leakers: Customers of Entrust and GlobalSign seem to love to lose control of their private keys. For Entrust, at least, though, the small volumes involved make the numbers somewhat untrustworthy. The three compromised certificates could very well belong to just one customer, for instance. I m not aware of anything that GlobalSign does that would make them such an outlier, either, so I m inclined to think they just got unlucky with one or two customers, but as CAs don t include customer IDs in the certificates they issue, it s not possible to say whether that s the actual cause or not.
  • The Regular Leakers: Customers of SSL.com, GoDaddy, and Sectigo all have compromise rates in the 1-in-hundreds-of-thousands range. Again, the low volumes of SSL.com make the numbers somewhat unreliable, but the other two organisations in this group have large enough numbers that we can rely on that data fairly well, I think.
  • The Low Leakers: Customers of DigiCert and Let s Encrypt are at least three times less likely than customers of the regular leakers to lose control of their private keys. Good for them!
Now we have some useful insights we can think about.

Why Is It So?
Professor Julius Sumner Miller If you don't know who Professor Julius Sumner Miller is, I highly recommend finding out
All of the organisations on the list, with the exception of Let s Encrypt, are what one might term traditional CAs. To a first approximation, it s reasonable to assume that the vast majority of the customers of these traditional CAs probably manage their certificates the same way they have for the past two decades or more. That is, they generate a key and CSR, upload the CSR to the CA to get a certificate, then copy the cert and key somewhere. Since humans are handling the keys, there s a higher risk of the humans using either risky practices, or making a mistake, and exposing the private key to the world. Let s Encrypt, on the other hand, issues all of its certificates using the ACME (Automatic Certificate Management Environment) protocol, and all of the Let s Encrypt documentation encourages the use of software tools to generate keys, issue certificates, and install them for use. Given that Let s Encrypt has 161 compromised certificates currently in the wild, it s clear that the automation in use is far from perfect, but the significantly lower compromise rate suggests to me that lifecycle automation at least reduces the rate of key compromise, even though it doesn t eliminate it completely.

Explaining the Outlier The difference in presumed issuance practices would seem to explain the significant difference in compromise rates between Let s Encrypt and the other organisations, if it weren t for one outlier. This is a largely traditional CA, with the manual-handling issues that implies, but with a compromise rate close to that of Let s Encrypt. We are, of course, talking about DigiCert. The thing about DigiCert, that doesn t show up in the raw numbers from crt.sh, is that DigiCert manages the issuance of certificates for several of the biggest hosted TLS providers, such as CloudFlare and AWS. When these services obtain a certificate from DigiCert on their customer s behalf, the private key is kept locked away, and no human can (we hope) get access to the private key. This is supported by the fact that no certificates identifiably issued to either CloudFlare or AWS appear in the set of certificates with compromised keys. When we ask for all certificates issued by DigiCert , we get both the certificates issued to these big providers, which are very good at keeping their keys under control, as well as the certificates issued to everyone else, whose key handling practices may not be quite so stringent. It s possible, though not trivial, to account for certificates issued to these hosted TLS providers, because the certificates they use are issued from intermediates branded to those companies. With the crt.sh psql interface we can run this query to get the total number of unexpired precertificates issued to these managed services:
SELECT SUM(sub.NUM_ISSUED[2] - sub.NUM_EXPIRED[2])
  FROM (
    SELECT ca.name, max(coalesce(coalesce(nullif(trim(cc.SUBORDINATE_CA_OWNER), ''), nullif(trim(cc.CA_OWNER), '')), cc.INCLUDED_CERTIFICATE_OWNER)) as OWNER,
           ca.NUM_ISSUED, ca.NUM_EXPIRED
      FROM ccadb_certificate cc, ca_certificate cac, ca
     WHERE cc.CERTIFICATE_ID = cac.CERTIFICATE_ID
       AND cac.CA_ID = ca.ID
  GROUP BY ca.ID
  ) sub
 WHERE sub.name ILIKE '%Amazon%' OR sub.name ILIKE '%CloudFlare%' AND sub.owner = 'DigiCert';
The number I get from running that query is 104,316,112, which should be subtracted from DigiCert s total issuance figures to get a more accurate view of what DigiCert s regular customers do with their private keys. When I do this, the compromise rates table, sorted by the compromise rate, looks like this:
IssuerIssuance VolumeCompromised CountCompromise Rate
Entrust23,16631 in 7,722
GlobalSign1,438,485461 in 31,271
SSL.com171,81611 in 171,816
GoDaddy56,121,4291411 in 398,024
"Regular" DigiCert40,397,363811 in 498,732
Sectigo88,323,0681701 in 519,547
All DigiCert144,713,475811 in 1,786,586
ISRG (Let's Encrypt)315,476,4021611 in 1,959,480
In short, it appears that DigiCert s regular customers are just as likely as GoDaddy or Sectigo customers to expose their private keys.

What Does It All Mean? The takeaway from all this is fairly straightforward, and not overly surprising, I believe.

The less humans have to do with certificate issuance, the less likely they are to compromise that certificate by exposing the private key. While it may not be surprising, it is nice to have some empirical evidence to back up the common wisdom. Fully-managed TLS providers, such as CloudFlare, AWS Certificate Manager, and whatever Azure s thing is called, is the platonic ideal of this principle: never give humans any opportunity to expose a private key. I m not saying you should use one of these providers, but the security approach they have adopted appears to be the optimal one, and should be emulated universally. The ACME protocol is the next best, in that there are a variety of standardised tools widely available that allow humans to take themselves out of the loop, but it s still possible for humans to handle (and mistakenly expose) key material if they try hard enough. Legacy issuance methods, which either cannot be automated, or require custom, per-provider automation to be developed, appear to be at least four times less helpful to the goal of avoiding compromise of the private key associated with a certificate.

Humans Are, Of Course, The Problem
Bender, the robot from Futurama, asking if we'd like to kill all humans No thanks, Bender, I'm busy tonight
This observation that if you don t let humans near keys, they don t get leaked is further supported by considering the biggest issuers by volume who have not issued any certificates whose keys have been compromised: Google Trust Services (fourth largest issuer overall, with 57,084,529 unexpired precertificates), and Microsoft Corporation (sixth largest issuer overall, with 22,852,468 unexpired precertificates). It appears that somewhere between most and basically all of the certificates these organisations issue are to customers of their public clouds, and my understanding is that the keys for these certificates are managed in same manner as CloudFlare and AWS the keys are locked away where humans can t get to them. It should, of course, go without saying that if a human can never have access to a private key, it makes it rather difficult for a human to expose it. More broadly, if you are building something that handles sensitive or secret data, the more you can do to keep humans out of the loop, the better everything will be.

Your Support is Appreciated If you d like to see more analysis of how key compromise happens, and the lessons we can learn from examining billions of certificates, please show your support by buying me a refreshing beverage. Trawling CT logs is thirsty work.

Appendix: Methodology Limitations In the interests of clarity, I feel it s important to describe ways in which my research might be flawed. Here are the things I know of that may have impacted the accuracy, that I couldn t feasibly account for.
  • Time Periods: Because time never stops, there is likely to be some slight mismatches in the numbers obtained from the various data sources, because they weren t collected at exactly the same moment.
  • Issuer-to-Organisation Mapping: It s possible that the way I mapped issuers to organisations doesn t match exactly with how crt.sh does it, meaning that counts might be skewed. I tried to minimise that by using the same data sources (the CCADB AllCertificates report) that I believe that crt.sh uses for its mapping, but I cannot be certain of a perfect match.
  • Unwarranted Grouping: I ve drawn some conclusions about the practices of the various organisations based on their general approach to certificate issuance. If a particular subordinate CA that I ve grouped into the parent organisation is managed in some unusual way, that might cause my conclusions to be erroneous. I was able to fairly easily separate out CloudFlare, AWS, and Azure, but there are almost certainly others that I didn t spot, because hoo boy there are a lot of intermediate CAs out there.

29 January 2024

Michael Ablassmeier: qmpbackup 0.28

Over the last weekend i had some spare time to improve qmpbackup a little more, the new version: and some minor code reworks. Hope its useful for someone.

28 January 2024

Niels Thykier: Annotating the Debian packaging directory

In my previous blog post Providing online reference documentation for debputy, I made a point about how debhelper documentation was suboptimal on account of being static rather than online. The thing is that debhelper is not alone in this problem space, even if it is a major contributor to the number of packaging files you have to to know about. If we look at the "competition" here such as Fedora and Arch Linux, they tend to only have one packaging file. While most Debian people will tell you a long list of cons about having one packaging file (such a Fedora's spec file being 3+ domain specific languages "mashed" into one file), one major advantage is that there is only "the one packaging file". You only need to remember where to find the documentation for one file, which is great when you are running on wetware with limited storage capacity. Which means as a newbie, you can dedicate less mental resources to tracking multiple files and how they interact and more effort understanding the "one file" at hand. I started by asking myself how can we in Debian make the packaging stack more accessible to newcomers? Spoiler alert, I dug myself into rabbit hole and ended up somewhere else than where I thought I was going. I started by wanting to scan the debian directory and annotate all files that I could with documentation links. The logic was that if debputy could do that for you, then you could spend more mental effort elsewhere. So I combined debputy's packager provided files detection with a static list of files and I quickly had a good starting point for debputy-based packages.
Adding (non-static) dpkg and debhelper files to the mix Now, I could have closed the topic here and said "Look, I did debputy files plus couple of super common files". But I decided to take it a bit further. I added support for handling some dpkg files like packager provided files (such as debian/substvars and debian/symbols). But even then, we all know that debhelper is the big hurdle and a major part of the omission... In another previous blog post (A new Debian package helper: debputy), I made a point about how debputy could list all auxiliary files while debhelper could not. This was exactly the kind of feature that I would need for this feature, if this feature was to cover debhelper. Now, I also remarked in that blog post that I was not willing to maintain such a list. Also, I may have ranted about static documentation being unhelpful for debhelper as it excludes third-party provided tooling. Fortunately, a recent update to dh_assistant had provided some basic plumbing for loading dh sequences. This meant that getting a list of all relevant commands for a source package was a lot easier than it used to be. Once you have a list of commands, it would be possible to check all of them for dh's NOOP PROMISE hints. In these hints, a command can assert it does nothing if a given pkgfile is not present. This lead to the new dh_assistant list-guessed-dh-config-files command that will list all declared pkgfiles and which helpers listed them. With this combined feature set in place, debputy could call dh_assistant to get a list of pkgfiles, pretend they were packager provided files and annotate those along with manpage for the relevant debhelper command. The exciting thing about letting debpputy resolve the pkgfiles is that debputy will resolve "named" files automatically (debhelper tools will only do so when --name is passed), so it is much more likely to detect named pkgfiles correctly too. Side note: I am going to ignore the elephant in the room for now, which is dh_installsystemd and its package@.service files and the wide-spread use of debian/foo.service where there is no package called foo. For the latter case, the "proper" name would be debian/pkg.foo.service. With the new dh_assistant feature done and added to debputy, debputy could now detect the ubiquitous debian/install file. Excellent. But less great was that the very common debian/docs file was not. Turns out that dh_installdocs cannot be skipped by dh, so it cannot have NOOP PROMISE hints. Meh... Well, dh_assistant could learn about a new INTROSPECTABLE marker in addition to the NOOP PROMISE and then I could sprinkle that into a few commands. Indeed that worked and meant that debian/postinst (etc.) are now also detectable. At this point, debputy would be able to identify a wide range of debhelper related configuration files in debian/ and at least associate each of them with one or more commands. Nice, surely, this would be a good place to stop, right...?
Adding more metadata to the files The debhelper detected files only had a command name and manpage URI to that command. It would be nice if we could contextualize this a bit more. Like is this file installed into the package as is like debian/pam or is it a file list to be processed like debian/install. To make this distinction, I could add the most common debhelper file types to my static list and then merge the result together. Except, I do not want to maintain a full list in debputy. Fortunately, debputy has a quite extensible plugin infrastructure, so added a new plugin feature to provide this kind of detail and now I can outsource the problem! I split my definitions into two and placed the generic ones in the debputy-documentation plugin and moved the debhelper related ones to debhelper-documentation. Additionally, third-party dh addons could provide their own debputy plugin to add context to their configuration files. So, this gave birth file categories and configuration features, which described each file on different fronts. As an example, debian/gbp.conf could be tagged as a maint-config to signal that it is not directly related to the package build but more of a tool or style preference file. On the other hand, debian/install and debian/debputy.manifest would both be tagged as a pkg-helper-config. Files like debian/pam were tagged as ppf-file for packager provided file and so on. I mentioned configuration features above and those were added because, I have had a beef with debhelper's "standard" configuration file format as read by filearray and filedoublearray. They are often considered simple to understand, but it is hard to know how a tool will actually read the file. As an example, consider the following:
  • Will the debhelper use filearray, filedoublearray or none of them to read the file? This topic has about 2 bits of entropy.
  • Will the config file be executed if it is marked executable assuming you are using the right compat level? If it is executable, does dh-exec allow renaming for this file? This topic adds 1 or 2 bit of entropy depending on the context.
  • Will the config file be subject to glob expansions? This topic sounds like a boolean but is a complicated mess. The globs can be handled either by debhelper as it parses the file for you. In this case, the globs are applied to every token. However, this is not what dh_install does. Here the last token on each line is supposed to be a directory and therefore not subject to globs. Therefore, dh_install does the globbing itself afterwards but only on part of the tokens. So that is about 2 bits of entropy more. Actually, it gets worse...
    • If the file is executed, debhelper will refuse to expand globs in the output of the command, which was a deliberate design choice by the original debhelper maintainer took when he introduced the feature in debhelper/8.9.12. Except, dh_install feature interacts with the design choice and does enable glob expansion in the tool output, because it does so manually after its filedoublearray call.
So these "simple" files have way too many combinations of how they can be interpreted. I figured it would be helpful if debputy could highlight these difference, so I added support for those as well. Accordingly, debian/install is tagged with multiple tags including dh-executable-config and dh-glob-after-execute. Then, I added a datatable of these tags, so it would be easy for people to look up what they meant. Ok, this seems like a closed deal, right...?
Context, context, context However, the dh-executable-config tag among other are only applicable in compat 9 or later. It does not seem newbie friendly if you are told that this feature exist, but then have to read in the extended description that that it actually does not apply to your package. This problem seems fixable. Thanks to dh_assistant, it is easy to figure out which compat level the package is using. Then tweak some metadata to enable per compat level rules. With that tags like dh-executable-config only appears for packages using compat 9 or later. Also, debputy should be able to tell you where packager provided files like debian/pam are installed. We already have the logic for packager provided files that debputy supports and I am already using debputy engine for detecting the files. If only the plugin provided metadata gave me the install pattern, debputy would be able tell you where this file goes in the package. Indeed, a bit of tweaking later and setting install-pattern to usr/lib/pam.d/ name , debputy presented me with the correct install-path with the package name placing the name placeholder. Now, I have been using debian/pam as an example, because debian/pam is installed into usr/lib/pam.d in compat 14. But in earlier compat levels, it was installed into etc/pam.d. Well, I already had an infrastructure for doing compat file tags. Off we go to add install-pattern to the complat level infrastructure and now changing the compat level would change the path. Great. (Bug warning: The value is off-by-one in the current version of debhelper. This is fixed in git) Also, while we are in this install-pattern business, a number of debhelper config files causes files to be installed into a fixed directory. Like debian/docs which causes file to be installed into /usr/share/docs/ package . Surely, we can expand that as well and provide that bit of context too... and done. (Bug warning: The code currently does not account for the main documentation package context) It is rather common pattern for people to do debian/foo.in files, because they want to custom generation of debian/foo. Which means if you have debian/foo you get "Oh, let me tell you about debian/foo ". Then you rename it to debian/foo.in and the result is "debian/foo.in is a total mystery to me!". That is suboptimal, so lets detect those as well as if they were the original file but add a tag saying that they are a generate template and which file we suspect it generates. Finally, if you use debputy, almost all of the standard debhelper commands are removed from the sequence, since debputy replaces them. It would be weird if these commands still contributed configuration files when they are not actually going to be invoked. This mostly happened naturally due to the way the underlying dh_assistant command works. However, any file mentioned by the debhelper-documentation plugin would still appear unfortunately. So off I went to filter the list of known configuration files against which dh_ commands that dh_assistant thought would be used for this package.
Wrapping it up I was several layers into this and had to dig myself out. I have ended up with a lot of data and metadata. But it was quite difficult for me to arrange the output in a user friendly manner. However, all this data did seem like it would be useful any tool that wants to understand more about the package. So to get out of the rabbit hole, I for now wrapped all of this into JSON and now we have a debputy tool-support annotate-debian-directory command that might be useful for other tools. To try it out, you can try the following demo: In another day, I will figure out how to structure this output so it is useful for non-machine consumers. Suggestions are welcome. :)
Limitations of the approach As a closing remark, I should probably remind people that this feature relies heavily on declarative features. These include:
  • When determining which commands are relevant, using Build-Depends: dh-sequence-foo is much more reliable than configuring it via the Turing complete configuration we call debian/rules.
  • When debhelper commands use NOOP promise hints, dh_assistant can "see" the config files listed those hints, meaning the file will at least be detected. For new introspectable hint and the debputy plugin, it is probably better to wait until the dust settles a bit before adding any of those.
You can help yourself and others to better results by using the declarative way rather than using debian/rules, which is the bane of all introspection!

Russell Coker: Links January 2024

Long Now has an insightful article about domestication that considers whether humans have evolved to want to control nature [1]. The OMG Elite hacker cable is an interesting device [2]. A Wifi device in a USB cable to allow remote control and monitoring of data transfer, including remote keyboard control and sniffing. Pity that USB-C cables have chips in them so you can t use a spark to remove unwanted chips from modern cables. David Brin s blog post The core goal of tyrants: The Red-Caesar Cult and a restored era of The Great Man has some insightful points about authoritarianism [3]. Ron Garret wrote an interesting argument against Christianity [4], and a follow-up titled Why I Don t Believe in Jesus [5]. He has a link to a well written article about the different theologies of Jesus and Paul [6]. Dimitri John Ledkov wrote an interesting blog post about how they reduced disk space for Ubuntu kernel packages and RAM for the initramfs phase of boot [7]. I hope this gets copied to Debian soon. Joey Hess wrote an interesting blog post about trying to make LLM systems produce bad code if trained on his code without permission [8]. Arstechnica has an interesting summary of research into the security of fingerprint sensors [9]. Not surprising that the products of the 3 vendors that supply almost all PC fingerprint readers are easy to compromise. Bruce Schneier wrote an insightful blog post about how AI will allow mass spying (as opposed to mass surveillance) [10]. ZDnet has an informative article How to Write Better ChatGPT Prompts in 5 Steps [11]. I sent this to a bunch of my relatives. AbortRetryFail has an interesting article about the Itanic Saga [12]. Erberus sounds interesting, maybe VLIW designs could give a good ration of instructions to power unlike the Itanium which was notorious for being power hungry. Bruce Schneier wrote an insightful article about AI and Trust [13]. We really need laws controlling these things! David Brin wrote an interesting blog post on the obsession with historical cycles [14].

26 January 2024

Bastian Venthur: Investigating popularity of Python build backends over time

Inspired by a Mastodon post by Fran oise Conil, who investigated the current popularity of build backends used in pyproject.toml files, I wanted to investigate how the popularity of build backends used in pyproject.toml files evolved over the years since the introduction of PEP-0517 in 2015. Getting the data Tom Forbes provides a huge dataset that contains information about every file within every release uploaded to PyPI. To get the current dataset, we can use:
curl -L --remote-name-all $(curl -L "https://github.com/pypi-data/data/raw/main/links/dataset.txt")
This will download approximately 30GB of parquet files, providing detailed information about each file included in a PyPI upload, including:
  1. project name, version and release date
  2. file path, size and line count
  3. hash of the file
The dataset does not contain the actual files themselves though, more on that in a moment. Querying the dataset using duckdb We can now use duckdb to query the parquet files directly. Let s look into the schema first:
describe select * from '*.parquet';
 
    column_name     column_type    null    
      varchar         varchar     varchar  
 
  project_name      VARCHAR       YES      
  project_version   VARCHAR       YES      
  project_release   VARCHAR       YES      
  uploaded_on       TIMESTAMP     YES      
  path              VARCHAR       YES      
  archive_path      VARCHAR       YES      
  size              UBIGINT       YES      
  hash              BLOB          YES      
  skip_reason       VARCHAR       YES      
  lines             UBIGINT       YES      
  repository        UINTEGER      YES      
 
  11 rows                       6 columns  
 
From all files mentioned in the dataset, we only care about pyproject.toml files that are in the project s root directory. Since we ll still have to download the actual files, we need to get the path and the repository to construct the corresponding URL to the mirror that contains all files in a bunch of huge git repositories. Some files are not available on the mirrors; to skip these, we only take files where the skip_reason is empty. We also care about the timestamp of the upload (uploaded_on) and the hash to avoid processing identical files twice:
select
    path,
    hash,
    uploaded_on,
    repository
from '*.parquet'
where
    skip_reason == '' and
    lower(string_split(path, '/')[-1]) == 'pyproject.toml' and
    len(string_split(path, '/')) == 5
order by uploaded_on desc
This query runs for a few minutes on my laptop and returns ~1.2M rows. Getting the actual files Using the repository and path, we can now construct an URL from which we can fetch the actual file for further processing:
url = f"https://raw.githubusercontent.com/pypi-data/pypi-mirror- repository /code/ path "
We can download the individual pyproject.toml files and parse them to read the build-backend into a dictionary mapping the file-hash to the build backend. Downloads on GitHub are rate-limited, so downloading 1.2M files will take a couple of days. By skipping files with a hash we ve already processed, we can avoid downloading the same file more than once, cutting the required downloads by circa 50%. Results Assuming the data is complete and my analysis is sound, these are the findings: There is a surprising amount of build backends in use, but the overall amount of uploads per build backend decreases quickly, with a long tail of single uploads:
>>> results.backend.value_counts()
backend
setuptools        701550
poetry            380830
hatchling          56917
flit               36223
pdm                11437
maturin             9796
jupyter             1707
mesonpy              625
scikit               556
                   ...
postry                 1
tree                   1
setuptoos              1
neuron                 1
avalon                 1
maturimaturinn         1
jsonpath               1
ha                     1
pyo3                   1
Name: count, Length: 73, dtype: int64
We pick only the top 4 build backends, and group the remaining ones (including PDM and Maturin) into other so they are accounted for as well. The following plot shows the relative distribution of build backends over time. Each bin represents a time span of 28 days. I chose 28 days to reduce visual clutter. Within each bin, the height of the bars corresponds to the relative proportion of uploads during that time interval: Relative distribution of build backends over time Looking at the right side of the plot, we see the current distribution. It confirms Fran oise s findings about the current popularity of build backends: Between 2018 and 2020 the graph exhibits significant fluctuations, due to the relatively low amount uploads utizing pyproject.toml files. During that early period, Flit started as the most popular build backend, but was eventually displaced by Setuptools and Poetry. Between 2020 and 2020, the overall usage of pyproject.toml files increased significantly. By the end of 2022, the share of Setuptools peaked at 70%. After 2020, other build backends experienced a gradual rise in popularity. Amongh these, Hatch emerged as a notable contender, steadily gaining traction and ultimately stabilizing at 10%. We can also look into the absolute distribution of build backends over time: Absolute distribution of build backends over time The plot shows that Setuptools has the strongest growth trajectory, surpassing all other build backends. Poetry and Hatch are growing at a comparable rate, but since Hatch started roughly 4 years after Poetry, it s lagging behind in popularity. Despite not being among the most widely used backends anymore, Flit maintains a steady and consistent growth pattern, indicating its enduring relevance in the Python packaging landscape. The script for downloading and analyzing the data can be found in my GitHub repository. It contains the results of the duckb query (so you don t have to download the full dataset) and the pickled dictionary, mapping the file hashes to the build backends, saving you days for downloading and analyzing the pyproject.toml files yourself.

25 January 2024

Joachim Breitner: GHC Steering Committee Retrospective

After seven years of service as member and secretary on the GHC Steering Committee, I have resigned from that role. So this is a good time to look back and retrace the formation of the GHC proposal process and committee. In my memory, I helped define and shape the proposal process, optimizing it for effectiveness and throughput, but memory can be misleading, and judging from the paper trail in my email archives, this was indeed mostly Ben Gamari s and Richard Eisenberg s achievement: Already in Summer of 2016, Ben Gamari set up the ghc-proposals Github repository with a sketch of a process and sent out a call for nominations on the GHC user s mailing list, which I replied to. The Simons picked the first set of members, and in the fall of 2016 we discussed the committee s by-laws and procedures. As so often, Richard was an influential shaping force here.

Three ingredients For example, it was him that suggested that for each proposal we have one committee member be the Shepherd , overseeing the discussion. I believe this was one ingredient for the process effectiveness: There is always one person in charge, and thus we avoid the delays incurred when any one of a non-singleton set of volunteers have to do the next step (and everyone hopes someone else does it). The next ingredient was that we do not usually require a vote among all members (again, not easy with volunteers with limited bandwidth and occasional phases of absence). Instead, the shepherd makes a recommendation (accept/reject), and if the other committee members do not complain, this silence is taken as consent, and we come to a decision. It seems this idea can also be traced back on Richard, who suggested that once a decision is requested, the shepherd [generates] consensus. If consensus is elusive, then we vote. At the end of the year we agreed and wrote down these rules, created the mailing list for our internal, but publicly archived committee discussions, and began accepting proposals, starting with Adam Gundry s OverloadedRecordFields. At that point, there was no secretary role yet, so how I did become one? It seems that in February 2017 I started to clean-up and refine the process documentation, fixing bugs in the process (like requiring authors to set Github labels when they don t even have permissions to do that). This in particular meant that someone from the committee had to manually handle submissions and so on, and by the aforementioned principle that at every step there ought to be exactly one person in change, the role of a secretary followed naturally. In the email in which I described that role I wrote:
Simon already shoved me towards picking up the secretary hat, to reduce load on Ben.
So when I merged the updated process documentation, I already listed myself secretary . It wasn t just Simon s shoving that put my into the role, though. I dug out my original self-nomination email to Ben, and among other things I wrote:
I also hope that there is going to be clear responsibilities and a clear workflow among the committee. E.g. someone (possibly rotating), maybe called the secretary, who is in charge of having an initial look at proposals and then assigning it to a member who shepherds the proposal.
So it is hardly a surprise that I became secretary, when it was dear to my heart to have a smooth continuous process here. I am rather content with the result: These three ingredients single secretary, per-proposal shepherds, silence-is-consent helped the committee to be effective throughout its existence, even as every once in a while individual members dropped out.

Ulterior motivation I must admit, however, there was an ulterior motivation behind me grabbing the secretary role: Yes, I did want the committee to succeed, and I did want that authors receive timely, good and decisive feedback on their proposals but I did not really want to have to do that part. I am, in fact, a lousy proposal reviewer. I am too generous when reading proposals, and more likely mentally fill gaps in a specification rather than spotting them. Always optimistically assuming that the authors surely know what they are doing, rather than critically assessing the impact, the implementation cost and the interaction with other language features. And, maybe more importantly: why should I know which changes are good and which are not so good in the long run? Clearly, the authors cared enough about a proposal to put it forward, so there is some need and I do believe that Haskell should stay an evolving and innovating language but how does this help me decide about this or that particular feature. I even, during the formation of the committee, explicitly asked that we write down some guidance on Vision and Guideline ; do we want to foster change or innovation, or be selective gatekeepers? Should we accept features that are proven to be useful, or should we accept features so that they can prove to be useful? This discussion, however, did not lead to a concrete result, and the assessment of proposals relied on the sum of each member s personal preference, expertise and gut feeling. I am not saying that this was a mistake: It is hard to come up with a general guideline here, and even harder to find one that does justice to each individual proposal. So the secret motivation for me to grab the secretary post was that I could contribute without having to judge proposals. Being secretary allowed me to assign most proposals to others to shepherd, and only once in a while myself took care of a proposal, when it seemed to be very straight-forward. Sneaky, ain t it?

7 Years later For years to come I happily played secretary: When an author finished their proposal and public discussion ebbed down they would ping me on GitHub, I would pick a suitable shepherd among the committee and ask them to judge the proposal. Eventually, the committee would come to a conclusion, usually by implicit consent, sometimes by voting, and I d merge the pull request and update the metadata thereon. Every few months I d summarize the current state of affairs to the committee (what happened since the last update, which proposals are currently on our plate), and once per year gathered the data for Simon Peyton Jones annually GHC Status Report. Sometimes some members needed a nudge or two to act. Some would eventually step down, and I d sent around a call for nominations and when the nominations came in, distributed them off-list among the committee and tallied the votes. Initially, that was exciting. For a long while it was a pleasant and rewarding routine. Eventually, it became a mere chore. I noticed that I didn t quite care so much anymore about some of the discussion, and there was a decent amount of naval-gazing, meta-discussions and some wrangling about claims of authority that was probably useful and necessary, but wasn t particularly fun. I also began to notice weaknesses in the processes that I helped shape: We could really use some more automation for showing proposal statuses, notifying people when they have to act, and nudging them when they don t. The whole silence-is-assent approach is good for throughput, but not necessary great for quality, and maybe the committee members need to be pushed more firmly to engage with each proposal. Like GHC itself, the committee processes deserve continuous refinement and refactoring, and since I could not muster the motivation to change my now well-trod secretarial ways, it was time for me to step down. Luckily, Adam Gundry volunteered to take over, and that makes me feel much less bad for quitting. Thanks for that! And although I am for my day job now enjoying a language that has many of the things out of the box that for Haskell are still only language extensions or even just future proposals (dependent types, BlockArguments, do notation with ( foo) expressions and Unicode), I m still around, hosting the Haskell Interlude Podcast, writing on this blog and hanging out at ZuriHac etc.

24 January 2024

Thomas Lange: FAI 6.2 released

After more than one a year, a new minor FAI version is available, but it includes some interesting new features. Here a the items from the NEWS file: fai (6.2) unstable; urgency=low In the past the command fai-cd was only used for creating installation ISOs, that could be used from CD or USB stick. Now it possible to create a live ISO. Therefore you create your live chroot environment using 'fai dirinstall' and then convert it to a bootable live ISO using fai-cd. See man fai-cd(8) for an example. Years ago I had the idea to use the remaining disk space on an USB stick after copying an ISO onto it. I've blogged about this recently: https://blog.fai-project.org/posts/extending-iso-images/ The new FAI version includes the tool mk-data-partition for adding a data partition to the ISO itself or to an USB stick. FAI detects this data partition, mounts it to /media/data and can then use various configurations from it. You may want to copy your own set of .deb packages or your whole FAI config space to this partition. FAI now automatically searches this partition for usable FAI configuration data and packages. FAI will install all packages from pkgs/<CLASSNAME> if the equivalent class is defined. Setting FAI_CONFIG_SRC=detect:// now looks into the data partition for the subdirectory 'config' and uses this as the config space. So it's now possible to modify an existing ISO (that is read-only) and make changes to the config space. If there's no config directory in the data partition FAI uses the default location on the ISO. The tool fai-kvm, which starts virtual machines can now boot an ISO not only as CD but also as USB stick. Sometimes users want to adjust the list of disks before the partitioning is startet. Therefore FAI provides several new functions including You can select individual disks by their model name or even the serial number. Two new FAI flags were added (tmux and screen) that make it easy to run FAI inside a tmux or screen session. And finally FAI uses systemd. Yeah! This technical change was waiting since 2015 in a merge request from Moritz 'Morty' Str be, that would enable using systemd during the installation. Before FAI still was using old-style SYSV init scripts and did not started systemd. I didn't tried to apply the patch, because I was afraid that it would need much time to make it work. But then in may 2023 Juri Grabowski just gave it a try at MiniDebConf Hamburg, and voil it just works! Many, many thanks to Moritz and Juri for their bravery. The whole changelog can be found at https://tracker.debian.org/media/packages/f/fai/changelog-6.2 New ISOs for FAI are also available including an example of a Xfce desktop live ISO: https://fai-project.org/fai-cd/ The FAIme service for creating customized installation ISOs will get its update later. The new packages are available for bookworm by adding this line to your sources.list: deb https://fai-project.org/download bookworm koeln

Next.