Search Results: "nas"

11 April 2024

Reproducible Builds: Reproducible Builds in March 2024

Welcome to the March 2024 report from the Reproducible Builds project! In our reports, we attempt to outline what we have been up to over the past month, as well as mentioning some of the important things happening more generally in software supply-chain security. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website.

Arch Linux minimal container userland now 100% reproducible

In remarkable news, Reproducible Builds developer kpcyrd reported that the Arch Linux minimal container userland is now 100% reproducible after work by developers dvzv and Foxboron on the one remaining package. This represents a "real world", widely used Linux distribution being reproducible. Their post, which kpcyrd suffixed with the question "now what?", goes on to outline some potential next steps, including validating whether the container image itself could be reproduced bit-for-bit. The post, which was itself a follow-up to an Arch Linux update earlier in the month, generated a significant number of replies.

Validating Debian's build infrastructure after the XZ backdoor

From our mailing list this month, Vagrant Cascadian wrote about being asked to perform concrete reproducibility checks for recent Debian security updates. The aim was to gain some confidence in Debian's build infrastructure, given that builds had been performed in environments running the version of XZ affected by the high-profile backdoor. Vagrant reports (with some caveats):
So far, I have not found any reproducibility issues; everything I tested I was able to get to build bit-for-bit identical with what is in the Debian archive.
That is to say, reproducibility testing permitted Vagrant and Debian to claim with some confidence that builds performed when this vulnerable version of XZ was installed were not interfered with.
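The final step of such a check comes down to comparing cryptographic digests of the rebuilt artifacts with those in the archive. The sketch below illustrates only that comparison and is not Vagrant's actual tooling. It assumes the rebuilt files and the archive copies have been placed in two local directories. The function names `sha256_of` and `compare_builds` are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_builds(rebuilt_dir: Path, reference_dir: Path) -> dict:
    """Map each file in reference_dir to True if the rebuild is bit-for-bit identical."""
    results = {}
    for reference in sorted(reference_dir.iterdir()):
        if not reference.is_file():
            continue
        rebuilt = rebuilt_dir / reference.name
        # A missing rebuilt file counts as a mismatch.
        results[reference.name] = rebuilt.is_file() and sha256_of(rebuilt) == sha256_of(reference)
    return results
```

Any `False` entry marks a file worth inspecting in detail, for example with diffoscope.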

Making Fedora Linux (more) reproducible

In March, Davide Cavalca gave a talk at the 2024 Southern California Linux Expo (aka SCALE 21x) about the ongoing effort to make the Fedora Linux distribution reproducible. Documented in more detail on Fedora's website, the talk touched on topics such as the specifics of implementing reproducible builds in Fedora, the challenges encountered, the current status and what's coming next. (YouTube video)

Increasing Trust in the Open Source Supply Chain with Reproducible Builds and Functional Package Management

Julien Malka published a brief but interesting paper in the HAL open archive on "Increasing Trust in the Open Source Supply Chain with Reproducible Builds and Functional Package Management":
Functional package managers (FPMs) and reproducible builds (R-B) are technologies and methodologies that are conceptually very different from the traditional software deployment model, and that have promising properties for software supply chain security. This thesis aims to evaluate the impact of FPMs and R-B on the security of the software supply chain and propose improvements to the FPM model to further improve trust in the open source supply chain.
Julien's paper poses a number of research questions on how the model of distributions such as GNU Guix and NixOS can be "leveraged to further improve the safety of the software supply chain", etc.

Software and source code identification with GNU Guix and reproducible builds

In a long line of commendably detailed blog posts, Ludovic Courtès, Maxim Cournoyer, Jan Nieuwenhuizen and Simon Tournier have published two interesting posts on the GNU Guix blog this month. In early March, the four of them wrote about software and source code identification and how it might be performed using Guix. They rhetorically posed the questions: "What does it take to 'identify software'? How can we tell what software is running on a machine to determine, for example, what security vulnerabilities might affect it?" Later in the month, Ludovic Courtès wrote a solo post describing adventures on the quest for long-term reproducible deployment. Ludovic's post touches on GNU Guix's aim to support "time travel", the ability to reliably (and reproducibly) revert to an earlier point in time. It employs the iconic image of Harold Lloyd hanging off the clock in Safety Last! (1925) to poetically illustrate both the slapstick nature of modern technology and the gymnastics required to navigate hazards of our own making.

Two new Rust-based tools for post-processing determinism

Zbigniew Jędrzejewski-Szmek announced add-determinism, a work-in-progress reimplementation of the Reproducible Builds project's own strip-nondeterminism tool in the Rust programming language. It is intended to be used as a post-processor in RPM-based distributions such as Fedora. In addition, Yossi Kreinin published a blog post titled "refix: fast, debuggable, reproducible builds". It describes a tool that post-processes binaries in such a way that they are still debuggable with gdb, etc. Yossi's post details the motivation and techniques behind the (fast) performance of the tool.
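strip-nondeterminism and add-determinism each handle many file formats in their own way. To give a rough idea of what such post-processing involves, the following sketch rewrites a ZIP archive so that entry order and timestamps no longer depend on when or where it was built. It is not code from either tool, and `normalize_zip` is a hypothetical name. It assumes a single clamp timestamp in the spirit of `SOURCE_DATE_EPOCH`, raised to 1980 because ZIP (DOS) timestamps cannot represent earlier dates.

```python
import io
import time
import zipfile

# 1980-01-01T00:00:00Z, the earliest date a ZIP (DOS) timestamp can represent.
DOS_EPOCH = 315532800


def normalize_zip(data: bytes, epoch: int) -> bytes:
    """Rewrite a ZIP archive with sorted entries and one fixed timestamp."""
    fixed_time = time.gmtime(max(epoch, DOS_EPOCH))[:6]
    out = io.BytesIO()
    with zipfile.ZipFile(io.BytesIO(data)) as src, zipfile.ZipFile(out, "w") as dst:
        for name in sorted(src.namelist()):
            original = src.getinfo(name)
            # A fresh ZipInfo carries no extra fields (such as extended
            # timestamps), so only the fixed DOS time survives.
            entry = zipfile.ZipInfo(name, date_time=fixed_time)
            entry.compress_type = original.compress_type
            entry.external_attr = original.external_attr
            dst.writestr(entry, src.read(name))
    return out.getvalue()
```

This sketch ignores many other possible sources of variation, such as compression settings.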

Distribution work

In Debian this month, since the testing framework no longer varies the build path, James Addison performed a bulk downgrade of the severity of issues filed at `normal` to `wishlist`. In addition, 28 reviews of Debian packages were added, 38 were updated and 23 were removed this month, adding to our ever-growing knowledge about identified issues. As part of this effort, a number of issue types were updated:

- Chris Lamb added a new `ocaml_include_directories` toolchain issue [ ].
- James Addison added a new `filesystem_order_in_java_jar_manifest_mf_include_resource` issue [ ].
- James Addison also updated the `random_uuid_in_notebooks_generated_by_nbsphinx` issue to reference a relevant discussion thread [ ].

In addition, Roland Clobus posted his 24th status update of reproducible Debian ISO images. Roland highlights that the images for Debian unstable often cannot be generated due to changes in that distribution related to the 64-bit `time_t` transition. Lastly, Bernhard M. Wiedemann posted another monthly update for his reproducibility work in openSUSE.
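Some context on the `time_t` transition: a signed 32-bit `time_t` can only count up to 2^31 - 1 seconds after the Unix epoch. The same limit is why some of the upstream patches listed below concern builds that fail in 2038. The sketch below simulates a 32-bit counter in Python to show where it overflows and what it wraps around to. The helper name `as_time_t32` is hypothetical and is not taken from any Debian tooling.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
INT32_MAX = 2**31 - 1  # largest value a signed 32-bit time_t can hold


def as_time_t32(seconds: int) -> datetime:
    """Interpret `seconds` the way a signed 32-bit time_t would store it."""
    # Keep the low 32 bits, then reinterpret them as a signed integer.
    wrapped = struct.unpack("<i", struct.pack("<I", seconds & 0xFFFFFFFF))[0]
    return EPOCH + timedelta(seconds=wrapped)


print(as_time_t32(INT32_MAX))      # 2038-01-19 03:14:07+00:00
print(as_time_t32(INT32_MAX + 1))  # 1901-12-13 20:45:52+00:00
```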

Mailing list highlights

Elsewhere on our mailing list this month:

Alexander Railean of Siemens asked the list for help in understanding how one can independently verify the reproducibility of Java projects from the Maven Central repository. Having explored those repositories, Alexander could not find any examples where a `buildinfo` file was present. Arnout Engelen responded with some details.

Fay Stegerman resuscitated a long-dormant thread to report that she added support in her `diff-zip-meta.py` tool to expose extra timestamps embedded in `.zip` and `.apk` metadata.
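A ZIP entry can carry timestamps beyond the DOS date and time in its header. One common example is the Info-ZIP "extended timestamp" extra field (header ID `0x5455`), which holds a Unix modification time. The following sketch lists such values using Python's `zipfile` module. It is an illustration under that assumption only and is not how `diff-zip-meta.py` works. It also ignores other extra-field types that may carry timestamps. The helper names are hypothetical.

```python
import io
import struct
import zipfile

EXTENDED_TIMESTAMP = 0x5455  # Info-ZIP "UT" extra field


def iter_extra_fields(extra: bytes):
    """Yield (header_id, data) pairs from a raw ZIP extra-field blob."""
    offset = 0
    while offset + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, offset)
        yield header_id, extra[offset + 4 : offset + 4 + size]
        offset += 4 + size


def extra_mtimes(data: bytes) -> dict:
    """Map entry names to the Unix mtime stored in their extended-timestamp field."""
    found = {}
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        for info in zf.infolist():
            for header_id, payload in iter_extra_fields(info.extra):
                # Bit 0 of the leading flags byte says a modification time follows.
                if header_id == EXTENDED_TIMESTAMP and len(payload) >= 5 and payload[0] & 1:
                    found[info.filename] = struct.unpack_from("<I", payload, 1)[0]
    return found
```

Because these values live outside the DOS timestamp, two archives can agree on every visible date yet still differ byte-for-byte.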

Website updates

There were a number of improvements made to our website this month, including:

Pol Dellaiera noticed the frequent need to correctly cite the website itself in academic work. To facilitate easier citation across multiple formats, Pol contributed a Citation File Format (CFF) file. As a result, an export in BibTeX format is now available in the Academic Publications section. Pol encourages community contributions to further refine the `CITATION.cff` file. Pol also added a substantial new section to the "Buy-in" page documenting the role of Software Bill of Materials (SBOMs) and ephemeral development environments. [ ][ ]

Bernhard M. Wiedemann added a new commandments page to the documentation [ ][ ] and fixed some incorrect YAML elsewhere on the site [ ].

Chris Lamb added three recent academic papers to the publications page of the website. [ ]

Mattia Rizzolo and Holger Levsen collaborated to add Infomaniak as a sponsor of `amd64` virtual machines. [ ][ ][ ]

Roland Clobus updated the stable outputs page, dropping version numbers from Python documentation pages [ ] and noting that Python's `set` data structure is also affected by the `PYTHONHASHSEED` functionality. [ ]
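Here is a quick illustration of the `PYTHONHASHSEED` point. Iteration order over a `set` of strings depends on string hashing, which Python randomizes per process unless `PYTHONHASHSEED` is fixed. The sketch below runs the same one-liner in fresh interpreters with a chosen seed. `set_order` is a hypothetical helper, and the five strings are arbitrary examples. Sorting before output is the usual way to remove the dependency.

```python
import os
import subprocess
import sys

# Prints the iteration order of a small set of strings.
SNIPPET = "print(list({'alpha', 'beta', 'gamma', 'delta', 'epsilon'}))"


def set_order(seed: str) -> str:
    """Run SNIPPET in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=seed)
    result = subprocess.run(
        [sys.executable, "-c", SNIPPET],
        env=env, capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    # The same seed always gives the same order; different seeds may not.
    for seed in ("0", "1", "2"):
        print(seed, set_order(seed))
    # Sorting makes the output independent of the seed.
    print(sorted({"alpha", "beta", "gamma", "delta", "epsilon"}))
```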

Delta chat clients now reproducible

Delta Chat, an open source messaging application that can work over email, announced this month that the Rust-based core library underlying the Delta Chat application is now reproducible.

diffoscope

diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made a number of changes such as uploading versions `259`, `260` and `261` to Debian and made the following additional changes:

New features:

Add support for the `zipdetails` tool from the Perl distribution. Thanks to Fay Stegerman and Larry Doolittle et al. for the pointer and thread about this tool. [ ]

Bug fixes:

Don't identify Redis database dumps as GNU R database files based simply on their filename. [ ]

Add a missing call to `File.recognizes` so we actually perform the filename check for GNU R data files. [ ]

Don't crash if we encounter an `.rdb` file without an equivalent `.rdx` file. (#1066991)

Correctly check for 7z being available and not lz4 when testing 7z. [ ]

Prevent a traceback when comparing a contentful `.pyc` file with an empty one. [ ]

Testsuite improvements:

Fix `.epub` tests after supporting the new `zipdetails` tool. [ ]

Don't use parentheses within test skipping messages, as PyTest adds its own parentheses. [ ]

Factor out Python version checking in `test_zip.py`. [ ]

Skip some Zip-related tests under Python 3.10.14, as a potential regression may have been backported to the 3.10.x series. [ ]

Actually test 7z support in the test_7z set of tests, not the lz4 functionality. (Closes: reproducible-builds/diffoscope#359). [ ]

In addition, Fay Stegerman updated diffoscope's monkey patch for supporting the unusual Mozilla ZIP file format after Python's `zipfile` module changed to detect potentially insecure overlapping entries within `.zip` files. (#362) Chris Lamb also updated the `trydiffoscope` command line client, dropping a build-dependency on the deprecated `python3-distutils` package to fix Debian bug #1065988 [ ], taking a moment to also refresh the packaging to the latest Debian standards [ ]. Finally, Vagrant Cascadian submitted an update for diffoscope version 260 in GNU Guix. [ ]

Upstream patches

This month, we wrote a large number of patches, including:

Bernhard M. Wiedemann:

`helm` (SSL-related build failure)

`java-21-openjdk` (parallelism)

`libressl` (SSL-related build failure)

`nfdump` (date issue)

`python-django-q` (avoid stuck build)

`python-smart-open` (fails to build on single-CPU machines)

`python-stdnum` (fails to build in 2039)

`python-yarl` (regression)

`qemu` (build failure)

`rabbitmq-java-client` (with Fridrich Strba; Maven timestamp issue)

`rmw` (build fails in 2038)

`warewulf` (with Egbert Eich; `cpio` modification time and inode issue)

`wxWidgets` (fails to build in 2038)

Chris Lamb:

#1066042 filed against `python-quantities`.

#1066083 filed against `gnome-maps`.

#1066084 filed against `tox`.

#1066085 filed against `q2cli`.

#1067098 filed against `mpl-sphinx-theme`.

#1067099 filed against `woof-doom`.

#1067100 filed against `bochs`.

#1067101 filed against `storm-lang`.

#1067102 filed against `librsvg`.

#1067218 filed against `gretl`.

#1067483 filed against `postfix`.

#1067484 filed against `node-function-bind`.

#1067485 filed against `python-pysaml2`.

#1067947 filed against `golang-github-stvp-tempredis`.

James Addison:

#1065124 filed against `matplotlib`.

#1066014 filed against `pathos`.

#1066016 filed against `rdflib`.

#1066017 filed against `xonsh`.

#1066045 filed against `maven-bundle-plugin`. (This patch was then uploaded by Mattia Rizzolo.)

Jiří Techet:

`geany` (toolchain-related issue for `glfw`)

Bernhard M. Wiedemann used reproducibility-tooling to detect and fix packages that added changes in their `%check` section, thus failing when built with the `--no-checks` option. Only half of all openSUSE packages have been tested so far, but a large number of bugs were filed, including ones against `caddy`, `exiv2`, `gnome-disk-utility`, `grisbi`, `gsl`, `itinerary`, `kosmindoormap`, `libQuotient`, `med-tools`, `plasma6-disks`, `pspp`, `python-pypuppetdb`, `python-urlextract`, `rsync`, `vagrant-libvirt` and `xsimd`. Similarly, Jean-Pierre De Jesus DIAZ employed reproducible builds techniques in order to test a proposed refactor of the `ath9k-htc-firmware` package. The change produced binaries bit-for-bit identical to the previously shipped pre-built binaries, and Jean-Pierre noted:
I don't have the hardware to test this firmware, but the build produces the same hashes for the firmware so it's safe to say that the firmware should keep working.

Reproducibility testing framework

The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In March, an enormous number of changes were made by Holger Levsen:

Debian-related changes:

Sleep less after a so-called 404 package state has occurred. [ ]

Schedule package builds more often. [ ][ ]

Regenerate all our HTML indexes every hour, but only every 12h for the released suites. [ ]

Create and update unstable and experimental base systems on `armhf` again. [ ][ ]

Don't reschedule so many depwait packages due to the current size of the `i386` architecture queue. [ ]

Redefine our scheduling thresholds and amounts. [ ]

Schedule untested packages with a higher priority, as otherwise slow architectures cannot keep up with the growth of the experimental distribution. [ ]

Only create the `stats_buildinfo.png` graph once per day. [ ][ ]

Reproducible Debian dashboard: refactoring, update several more static stats only every 12h. [ ]

Document how to use `systemctl` with new systemd-based services. [ ]

Temporarily disable `armhf` and `i386` continuous integration tests in order to get some stability back. [ ]

Use the `deb.debian.org` CDN everywhere. [ ]

Remove the rsyslog logging facility on bookworm systems. [ ]

Add `zst` to the list of packages which are false-positive diskspace issues. [ ]

Detect failures to bootstrap Debian base systems. [ ]

Arch Linux-related changes:

Temporarily disable builds because the pacman package manager is broken. [ ][ ]

Split `reproducible_html_live_status` and split the scheduling "timing". [ ][ ][ ]

Improve handling when database is locked. [ ][ ]

Misc changes:

Show failed services that require manual cleanup. [ ][ ]

Integrate two new Infomaniak nodes. [ ][ ][ ][ ]

Improve IRC notifications for artifacts. [ ]

Run diffoscope in different systemd slices. [ ]

Run the node health check more often, as it can now repair some issues. [ ][ ]

Also include the string `Bot` in the `userAgent` for Git. (Re: #929013). [ ]

Document increased `tmpfs` size on our OUSL nodes. [ ]

Disable memory accounting for the `reproducible_build` service. [ ][ ]

Allow 10 times as many open files for the Jenkins service. [ ]

Set `OOMPolicy=continue` and `OOMScoreAdjust=-1000` for both the Jenkins and the `reproducible_build` service. [ ]

Mattia Rizzolo also made the following changes:

Debian-related changes:

Define a `systemd` slice to group all relevant services. [ ][ ]

Add a bunch of quotes in scripts to assuage the `shellcheck` tool. [ ]

Add stats on how many packages have been built today so far. [ ]

Instruct `systemd-run` to handle diffoscope's exit codes specially. [ ]

Prefer the `pgrep` tool over grepping the output of `ps`. [ ]

Re-enable a couple of `i386` and `armhf` architecture builders. [ ][ ]

Fix some stylistic issues flagged by the Python flake8 tool. [ ]

Cease scheduling Debian unstable and experimental on the `armhf` architecture due to the `time_t` transition. [ ]

Start a few more `i386` & `armhf` workers. [ ][ ][ ]

Temporarily skip `pbuilder` updates in the unstable distribution, but only on the `armhf` architecture. [ ]

Other changes:

Perform some large-scale refactoring on how the `systemd` service operates. [ ][ ]

Move the list of workers into a separate file so it's accessible to a number of scripts. [ ]

Refactor the `powercycle_x86_nodes.py` script to use the new IONOS API and its new Python bindings. [ ]

Also fix nph-logwatch after the worker changes. [ ]

Stop installing the `stunnel` tool, as it shouldn't be needed by anything anymore. [ ]

Move temporary directories related to Arch Linux into a single directory for clarity. [ ]

Update the `arm64` architecture host keys. [ ]

Use a common Postfix configuration. [ ]

The following changes were also made by:

Jan-Benedict Glaw:

Initial work to clean up a messy NetBSD-related script. [ ][ ]

Roland Clobus:

Show the installer log if the installer fails to build. [ ]

Avoid the minus character (i.e. `-`) in a variable in order to allow for tags in openQA. [ ]

Update the schedule of Debian live image builds. [ ]

Vagrant Cascadian:

Maintenance on the `virt` nodes is completed so bring them back online. [ ]

Use the fully qualified domain name in configuration. [ ]

Node maintenance was also performed by Holger Levsen, Mattia Rizzolo [ ][ ] and Vagrant Cascadian [ ][ ][ ][ ].

If you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. You can also get in touch with us via:

IRC: `#reproducible-builds` on `irc.oftc.net`.

Twitter: @ReproBuilds

Mastodon: @reproducible_builds@fosstodon.org

Mailing list: `rb-general@lists.reproducible-builds.org`

29 March 2024

Patryk Cisek: Sanoid on TrueNAS

syncoid to TrueNAS

In my homelab, I have 2 NAS systems:

Linux (Debian)

TrueNAS Core (based on FreeBSD)

On my Linux box, I use Jim Salter's sanoid to periodically take snapshots of my ZFS pool. I also want to have a proper backup of the whole pool, so I use syncoid to transfer those snapshots to another machine. Sanoid itself is responsible only for taking new snapshots and pruning old ones you no longer care about.
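The post does not describe sanoid's pruning rules in detail. To illustrate what "pruning old ones" can mean, the sketch below keeps the newest snapshot in each of the most recent N hourly and M daily buckets and marks everything else for deletion. This is a hypothetical simplification, not sanoid's code or its exact policy. The function names and the three-day hourly schedule are invented for the example.

```python
from datetime import datetime, timedelta


def snapshots_to_keep(snapshots, hourly: int, daily: int) -> set:
    """Keep the newest snapshot in each of the most recent hourly and daily buckets."""
    keep = set()
    for bucket_format, count in (("%Y-%m-%d %H", hourly), ("%Y-%m-%d", daily)):
        newest_in_bucket = {}
        for taken in sorted(snapshots, reverse=True):
            # Iterating newest-first means the first snapshot seen per bucket is the newest.
            newest_in_bucket.setdefault(taken.strftime(bucket_format), taken)
        # Zero-padded bucket keys sort chronologically as strings.
        for bucket in sorted(newest_in_bucket, reverse=True)[:count]:
            keep.add(newest_in_bucket[bucket])
    return keep


def snapshots_to_prune(snapshots, hourly: int, daily: int) -> list:
    """Everything the policy does not keep, oldest first."""
    keep = snapshots_to_keep(snapshots, hourly, daily)
    return sorted(set(snapshots) - keep)


if __name__ == "__main__":
    # Hypothetical: one snapshot per hour over three days.
    start = datetime(2024, 3, 1)
    taken = [start + timedelta(hours=h) for h in range(72)]
    print(len(snapshots_to_keep(taken, hourly=24, daily=3)))   # 26
    print(len(snapshots_to_prune(taken, hourly=24, daily=3)))  # 46
```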

21 March 2024

Ravi Dwivedi: Thailand Trip

This post is the second and final part of my Malaysia-Thailand trip. Feel free to check out the Malaysia part here if you haven't already. Kuala Lumpur to Bangkok is around 1500 km by road, so I took a Malaysia Airlines flight to Bangkok. The flight staff at Kuala Lumpur airport only asked me for a return/onward flight. Thailand immigration asked a few questions but did not check any documents (obviously they checked and stamped my passport ;)). The currency of Thailand is the Thai baht, and 1 Thai baht = 2.5 Indian Rupees. Thailand time is 1.5 hours ahead of Indian time (for example, if it is 12 noon in India, it will be 13:30 in Thailand). I landed in Bangkok at around 3 PM local time. Fletcher was in Bangkok at that time, leaving for Pattaya, and we had booked the same hostel. So I took a bus to Pattaya from the airport. The next bus with tickets available was at 7 PM, so I bought a ticket for that one. The bus ticket cost 143 Thai baht. I didn't buy a SIM at the airport, thinking there must be better deals in the city. As a consequence, there was no way to contact Fletcher over the internet, although I had a few minutes of calls remaining on my international roaming pack.

Our accommodation was near Jomtien beach, so I got off at the last stop, where the bus terminates. Then I decided to walk towards my accommodation. I was using OsmAnd for navigation. However, the place was not marked on OpenStreetMap. It turned out I had missed the street my hostel was on and walked around 1 km further, chasing a similarly named but incorrect hostel on OpenStreetMap. Then I asked for help from two men sitting at a café. One of them said he would help me find the street my hostel was on. So, I walked with him, and he told me he had lived in Thailand for many years, but he was from Kuwait. He also gave me valuable information. For example, he told me about the shared hail-and-ride songthaews which run along Jomtien Second Road and charge 10 baht for any distance on their route. This tip significantly reduced our expenses. Further, he suggested 7-Eleven shops for buying a local SIM. Like Malaysia, Thailand has 24/7 7-Eleven convenience stores, a lot of them not even 100 m apart. The Kuwaiti man dropped me at the address where my hostel was. I tried searching for a person in charge of the hostel, and soon I realized there was no reception. After asking locals for help for some time, I bumped into Fletcher, who had also come to this address and was searching for the same place. Having found a friend, I breathed a sigh of relief. Next to the property, there was a hairdresser's shop. We went there and asked about this property. The woman there called the owner, and she also told us the passcodes required to get inside. Our accommodation was a room on the second floor, which required a passcode to open. We entered the passcode and went into the room. So, we stayed at this hostel which had no reception. Because of this, it took 2 hours to find our room and get in. It reminded me of a difficult experience I had in Albania, where Akshat and I could not find our apartment on one of the hottest days and the owner didn't know our language.

On the way from where the bus dropped me to the hostel, I saw that the streets were filled with bars and massage parlors, which was expected. Prostitutes were everywhere. We went out at night towards the beach and also looked around 7-Elevens to buy a SIM card for me. I got a SIM with 7 days of unlimited internet for 399 baht. It turned out that SIM card prices at the airport were not so different from those in the city.

As for English, locals in both Pattaya and Bangkok didn't know it at all. I normally don't expect locals to know English in a non-English-speaking country. But Bangkok is one of the most visited places by tourists, which made me expect locals to know some English. Talking to locals is an integral part of travel for me, and I couldn't do much of it in Thailand. This aspect is much more important to me than going to touristy places.

So, we were in Pattaya. The next morning, Fletcher and I went to Tiger Park using a shared songthaew. After that, we planned to visit Pattaya Floating Market, which is near Tiger Park, but we felt the ticket prices were higher than it was worth. Fletcher had to leave for Bangkok that day. I suggested that he take the bus to Suvarnabhumi Airport from the Jomtien beach bus terminal (the route I had taken the previous day, in the opposite direction). That way he could avoid traffic congestion inside Bangkok and continue by metro once he reached the airport. From the floating market, we were walking in sweltering heat to reach Jomtien beach. I tried asking for a lift and eventually succeeded when a scooter stopped, and surprisingly the rider gave a ride to both of us. He was from Delhi, so maybe that's the reason he stopped for us. Then we took a songthaew to the bus terminal, and after lunch, Fletcher left for Bangkok.

The next day I went to Bangkok, but Fletcher had already left for Kuala Lumpur. Here I had booked a private room in a hotel (instead of a hostel) for four nights, mainly because of my luggage. This cost 5600 INR for four nights. It was 2 km from the metro station, and I used to walk both ways. In Bangkok, I visited Sukhumvit and Siam by metro. Going to some areas requires crossing the Chao Phraya river. For this, I took the Chao Phraya Express Boat to places like Khao San Road and Wat Arun. I would recommend taking the boat ride, as it had very good views. In Bangkok, I met a person from Pakistan staying in my hotel, so here too I got some company. But by the time I met him, my days there were almost over. So, we went to a random restaurant selling Indian food, where we ate a paneer dish with naan, and the person running that restaurant was from Myanmar.

For food, I mainly relied on fruits and convenience stores. Bananas were very tasty. This was the first time I had seen bananas with yellow flesh. Mangoes were delicious, and pineapples were smaller and flavorful. I also ate rose apple, which I had never had before. I had Chhole Kulche once in Sukhumvit. That was a little expensive, as it cost 164 baht. I also used to buy premix coffee packets from 7-Eleven convenience stores and prepare them inside the stores.

My flight from Bangkok to Delhi was with Air India, and they were serving alcohol on board. I chose red wine, and this was my first time having alcohol on a flight.

Notes

During this whole trip spanning two weeks, I did not pay for drinking water (except once in Pattaya, when it was 9 baht) or for toilets. Bangkok and Kuala Lumpur have plenty of malls, so you should find a free toilet nearby. For drinking water, I relied mainly on my accommodation providing refillable water for my bottle.

Thailand seemed more expensive than Malaysia on average. Malaysia had discounted prices due to Chinese New Year.

I liked Pattaya more than Bangkok, maybe because Pattaya has a beach and Bangkok doesn't. Pattaya seemed more lively, and I could meet and talk to a few people there, unlike in Bangkok.

The Chao Phraya River express boat costs 150 baht for one day, during which you can hop on and off any boat.

2 March 2024

Ravi Dwivedi: Malaysia Trip

Last month, I went on a trip to Malaysia and Thailand. I stayed for six days in each country. I chose these countries because both were granting visa-free entry to Indian tourists for a limited time window. This post covers the Malaysia part, and the Thailand part will be covered in the next post. If you want to travel to either of these countries during the visa-free period, I have written down all the questions asked at immigration and at airports during this trip here, which might be of help.

I mostly stayed in Kuala Lumpur and went to places around it. Before the trip, I had planned to visit Ipoh and Cameron Highlands too, but could not cover them. I found planning a trip to Malaysia a little difficult. The country is divided into two main parts: Peninsular Malaysia and Malaysian Borneo. Then there are more islands, such as Langkawi, Penang Island, and the Perhentian and Redang Islands. Reaching those islands seemed a little difficult to plan, and I wish to visit more places on my next Malaysia trip.

My hostel for the first day was in the Chinatown part of Kuala Lumpur, near Pasar Seni LRT station. As soon as I checked in and entered my room, I met another Indian named Fletcher, and after that we traveled together. That day, we went to Muzium Negara and Little India. I realized that if you know the right places to buy what you want, Malaysia can be quite cheap. The Malaysian currency is the Malaysian Ringgit (MYR), and 1 MYR is roughly 18 INR. For 2 MYR, you can get a good masala tea in Little India, and a masala dosa costs around 4-5 MYR. Vegetarian food is widely available in Kuala Lumpur, thanks to the Tamil community. I also tried Mee Goreng, which was vegetarian, and I found it fine in terms of taste. When I read about Mee Goreng on Wikipedia, I found out that it is unique to Indian immigrants in Malaysia (and neighboring countries), but you don't get it in India!

For the next day, Fletcher had planned a trip to Genting Highlands and pre-booked everything. I also planned to join him, but when we went to KL Sentral to take the bus, the tickets for his bus were sold out. I could have taken a bus at a different time, but decided to visit some other place that day and cover Genting Highlands later. At the ticket counter, I met a family from Delhi who wanted to go to Genting Highlands. They couldn't get bus tickets for that day either, so they bought tickets for the next day and planned to visit Batu Caves instead. I joined them and went to Batu Caves. After returning from Batu Caves, we went our separate ways. I went back and rested at my hostel, and later went to Petronas Towers at night. Petronas Towers is the icon of Kuala Lumpur, so having a photo there was a must. I was at Petronas Towers at around 9 PM. Around that time, Fletcher came back from Genting Highlands, and we planned to meet at KL Sentral to head for dinner.

We went back to the same place as the day before, where I had had Mee Goreng. This time we had dosa and masala tea. Their masala tea from the previous day was tasty, and that's why I was looking for them in the first place. We also met a Malaysian family of Indian ancestry dining there and had a nice conversation. Then we went to a place in Pasar Seni market to eat roti canai. Roti canai is a popular non-vegetarian dish in Malaysia, but I had the vegetarian version.

The next day, we went to Berjaya Times Square, a shopping center which sells pretty cheap items for daily use, and souvenirs too. However, I bought souvenirs from Petaling Street, which is in Chinatown. At night, we explored Bukit Bintang, which is the heart of Kuala Lumpur and is famous for its nightlife. After that, Fletcher went to Bangkok and I stayed in Malaysia for two more days. The next day, I went to Genting Highlands and took the cable car, which had awesome views. I came back to Kuala Lumpur by night. On the remaining day I just roamed around Bukit Bintang. Then I took a flight to Bangkok on 7th Feb, which I will cover in the next post.

In Malaysia, I met so many people from different countries. Apart from people from the Indian subcontinent, I met Syrians, Indonesians (Malaysia seems to be a popular destination for Indonesian tourists) and Burmese people. Meeting people from other cultures is an integral part of travel for me. My expenses for Food + Accommodation + Travel added up to 10,000 INR for a week in Malaysia, while flight costs were: 13,000 INR (Delhi to Kuala Lumpur) + 10,000 INR (Kuala Lumpur to Bangkok) + 12,000 INR (Bangkok to Delhi). For OpenStreetMap users, the good news is that Kuala Lumpur is fairly well mapped on OpenStreetMap.

Tips

I bought a local SIM from a shop in the KL Sentral station complex that had "news" in its name (I forget the exact name, and there are two shops with "news" in their names). It was the cheapest option I could find. The SIM was 10 MYR for 5 GB of data for a week. If you want to make calls too, you need to spend an extra 5 MYR.

7-Eleven and KK Mart convenience stores are everywhere in the city and they are open all the time (24 hours a day). If you are a vegetarian, you can at least get some bread and cheese from there to eat.

A lot of people know English (and many - Indians, Pakistanis, Nepalis - know Hindi) in Kuala Lumpur, so I had no language problems most of the time.

For shopping on a budget, you can go to Petaling Street, Berjaya Times Square or Bukit Bintang. In particular, there is a shop named I Love KL Gifts in Bukit Bintang, just near the metro/monorail station, which had very good prices. Check out the location of the shop on OpenStreetMap.

19 February 2024

Matthew Garrett: Debugging an odd inability to stream video

We have a cabin out in the forest, and when I say "out in the forest" I mean "in a national forest subject to regulation by the US Forest Service" which means there's an extremely thick book describing the things we're allowed to do and (somewhat longer) not allowed to do. It's also down in the bottom of a valley surrounded by tall trees (the whole "forest" bit). There used to be AT&T copper but all that infrastructure burned down in a big fire back in 2021 and AT&T no longer supply new copper links, and Starlink isn't viable because of the whole "bottom of a valley surrounded by tall trees" thing along with regulations that prohibit us from putting up a big pole with a dish on top. Thankfully there are LTE towers nearby, so I'm simply using cellular data. Unfortunately my provider rate limits connections to video streaming services in order to push them down to roughly SD resolution. The easy workaround is just to VPN back to somewhere else, which in my case is just a Wireguard link back to San Francisco.

This worked perfectly for most things, but some streaming services simply wouldn't work at all. Attempting to load the video would just spin forever. Running tcpdump at the local end of the VPN endpoint showed a connection being established, some packets being exchanged, and then nothing. The remote service appeared to just stop sending packets. Tcpdumping the remote end of the VPN showed the same thing. It wasn't until I looked at the traffic on the VPN endpoint's external interface that things began to become clear.

This probably needs some background. Most network infrastructure has a maximum allowable packet size, which is referred to as the Maximum Transmission Unit or MTU. For ethernet this defaults to 1500 bytes, and these days most links are able to handle packets of at least this size, so it's pretty typical to just assume that you'll be able to send a 1500 byte packet. But what's important to remember is that that doesn't mean you have 1500 bytes of packet payload - that 1500 bytes includes whatever protocol level headers are on there. For TCP/IP you're typically looking at spending around 40 bytes on the headers, leaving somewhere around 1460 bytes of usable payload. And if you're using a VPN, things get annoying. In this case the original packet becomes the payload of a new packet, which means it needs another set of TCP (or UDP) and IP headers, and probably also some VPN header. This still all needs to fit inside the MTU of the link the VPN packet is being sent over, so if the MTU of that is 1500, the effective MTU of the VPN interface has to be lower. For Wireguard, this works out to an effective MTU of 1420 bytes. That means simply sending a 1500 byte packet over a Wireguard (or any other VPN) link won't work - adding the additional headers gives you a total packet size of over 1500 bytes, and that won't fit into the underlying link's MTU of 1500.
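The header arithmetic above can be written out as a small Rust sketch. It assumes the standard WireGuard data-message layout (a 16-byte header plus a 16-byte authentication tag) carried over UDP, and an IPv6 outer header as the worst case, which is where the usual 1420-byte default comes from; the constant and function names are illustrative, not taken from any real tool.

```rust
// Header sizes in bytes.
const IPV4_HEADER: u32 = 20;
const IPV6_HEADER: u32 = 40;
const TCP_HEADER: u32 = 20;
const UDP_HEADER: u32 = 8;
// WireGuard data message: 1-byte type + 3 reserved bytes, 4-byte receiver
// index, 8-byte counter, then a 16-byte Poly1305 tag after the ciphertext.
const WIREGUARD_OVERHEAD: u32 = 4 + 4 + 8 + 16;

/// Largest inner packet that still fits in one outer packet on a link with
/// `link_mtu`, when the outer packet uses an IP header of `outer_ip_header` bytes.
fn tunnel_mtu(link_mtu: u32, outer_ip_header: u32) -> u32 {
    link_mtu - outer_ip_header - UDP_HEADER - WIREGUARD_OVERHEAD
}

/// On-the-wire size of the outer packet carrying an inner packet of `inner` bytes.
fn outer_size(inner: u32, outer_ip_header: u32) -> u32 {
    inner + outer_ip_header + UDP_HEADER + WIREGUARD_OVERHEAD
}

fn main() {
    // Plain TCP over IPv4 on a 1500-byte link leaves 1460 bytes of payload.
    println!("TCP payload on a 1500 MTU link: {}", 1500 - IPV4_HEADER - TCP_HEADER);
    // Worst case (IPv6 outer header) gives the familiar 1420-byte tunnel MTU.
    println!("WireGuard MTU: {}", tunnel_mtu(1500, IPV6_HEADER));
    // A full 1500-byte packet pushed into the tunnel no longer fits the link.
    println!("1500-byte packet becomes {} bytes", outer_size(1500, IPV6_HEADER));
}
```

With an IPv4 outer header the tunnel could carry 1440 bytes, but the default is sized for the IPv6 case so the same configuration works over either.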

And yet, things work. But how? Faced with a packet that's too big to fit into a link, there are two choices - break the packet up into multiple smaller packets ("fragmentation") or tell whoever's sending the packet to send smaller packets. Fragmentation seems like the obvious answer, so I'd encourage you to read Valerie Aurora's article on how fragmentation is more complicated than you think. tl;dr - if you can avoid fragmentation then you're going to have a better life. You can explicitly indicate that you don't want your packets to be fragmented by setting the Don't Fragment bit in your IP header, and then when your packet hits a link whose MTU it exceeds, the router in front of that link will send back an ICMP packet telling the sender that the packet is too big and what the actual MTU is, and the sender will resend a smaller packet. This avoids all the hassle of handling fragments in exchange for the cost of a retransmit the first time the MTU is exceeded. It also typically works these days, which wasn't always the case - people had a nasty habit of dropping the ICMP packets telling the remote that the packet was too big, which broke everything.
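The feedback loop described above can be modelled as a toy simulation. This is a sketch under simplifying assumptions: a path is just a list of link MTUs, the first link that is too small "replies" with its MTU, and the `honour_icmp = false` case stands in for a sender that never acts on that reply (because it was dropped or ignored). None of these names correspond to a real networking API.

```rust
/// Outcome of sending one packet with the Don't Fragment bit set.
#[derive(Debug, PartialEq)]
enum SendResult {
    Delivered,
    /// Stands in for the ICMP "packet too big" reply, carrying the MTU of
    /// the link that refused the packet.
    TooBig { mtu: u32 },
}

/// Hypothetical path model: the packet crosses each link in order, and the
/// first link whose MTU is smaller than the packet rejects it.
fn send_df(path: &[u32], size: u32) -> SendResult {
    match path.iter().find(|&&mtu| size > mtu) {
        Some(&mtu) => SendResult::TooBig { mtu },
        None => SendResult::Delivered,
    }
}

/// Path MTU discovery: start at `initial` and shrink to whatever MTU each
/// "too big" reply reports. Returns the size that got through and how many
/// retransmits that cost. With `honour_icmp = false` the sender behaves as if
/// the reply never arrived and just resends the same oversized packet.
fn discover(path: &[u32], initial: u32, honour_icmp: bool, max_tries: u32) -> Option<(u32, u32)> {
    let mut size = initial;
    for attempt in 0..max_tries {
        match send_df(path, size) {
            SendResult::Delivered => return Some((size, attempt)),
            SendResult::TooBig { mtu } if honour_icmp => size = mtu,
            SendResult::TooBig { .. } => {} // retry with the same size
        }
    }
    None
}

fn main() {
    // Ethernet, then a WireGuard hop with a 1420-byte MTU, then Ethernet again.
    let path: [u32; 3] = [1500, 1420, 1500];
    println!("{:?}", discover(&path, 1500, true, 5));
    println!("{:?}", discover(&path, 1500, false, 5));
}
```

When the reply is honoured, delivery succeeds after one retransmit at 1420 bytes; when it is ignored, the sender never gets a packet through, which from the receiving side looks exactly like a connection that simply stops.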

What I saw when I tcpdumped on the remote VPN endpoint's external interface was that the connection was getting established, and then a 1500 byte packet would arrive (this is kind of the behaviour you'd expect for video - the connection handshaking involves a bunch of relatively small packets, and then once you start sending the video stream itself you start sending packets that are as large as possible in order to minimise overhead). This 1500 byte packet wouldn't fit down the Wireguard link, so the endpoint sent back an ICMP packet to the remote telling it to send smaller packets. The remote should then have sent a new, smaller packet - instead, about a second after sending the first 1500 byte packet, it sent that same 1500 byte packet. This is consistent with it ignoring the ICMP notification and just behaving as if the packet had been dropped.

All the services that were failing were failing in identical ways, and all were using Fastly as their CDN. I complained about this on social media and then somehow ended up in contact with the engineering team responsible for this sort of thing - I sent them a packet dump of the failure, they were able to reproduce it, and it got fixed. Hurray!

(Between me identifying the problem and it getting fixed I was able to work around it. TCP connections announce a Maximum Segment Size (MSS), carried as an option in the TCP header of the initial SYN packets, which indicates the maximum size of the payload for this connection. iptables allows you to rewrite this, so on the VPN endpoint I simply rewrote the MSS to be small enough that the packets would fit inside the Wireguard MTU. This isn't a complete fix since it's done at the TCP level rather than the IP level - so any large UDP packets would still end up breaking.)
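The post doesn't say exactly which iptables rule was used (the TCPMSS target supports both `--set-mss` and `--clamp-mss-to-pmtu`), so the following is only a sketch of the arithmetic an MSS clamp performs on a SYN passing through the VPN endpoint, assuming no IP or TCP options and a 1420-byte tunnel MTU.

```rust
const TCP_HEADER: u16 = 20;

/// Lower an advertised MSS so that a full segment plus its IP and TCP
/// headers fits inside `tunnel_mtu`. Smaller advertisements pass unchanged.
fn clamp_mss(advertised_mss: u16, tunnel_mtu: u16, ip_header: u16) -> u16 {
    let max_mss = tunnel_mtu - ip_header - TCP_HEADER;
    advertised_mss.min(max_mss)
}

fn main() {
    // A client on 1500-byte Ethernet advertises 1460; over a 1420-byte
    // WireGuard link, IPv4 traffic has to be clamped to 1380.
    println!("IPv4: {}", clamp_mss(1460, 1420, 20));
    // IPv6 headers are 20 bytes larger, so the clamp is 20 bytes lower.
    println!("IPv6: {}", clamp_mss(1440, 1420, 40));
    // An already-small MSS is left alone.
    println!("small: {}", clamp_mss(1200, 1420, 20));
}
```

Because this only rewrites a TCP option, UDP datagrams never pass through it, which is why the workaround does nothing for large UDP packets.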

I've no idea what the underlying issue was, and at the client end the failure was entirely opaque: the remote simply stopped sending me packets. The only reason I was able to debug this at all was because I controlled the other end of the VPN as well, and even then I wouldn't have been able to do anything about it if someone able to fix it hadn't fortuitously seen my post. How many people go through their lives dealing with things just being broken and having no idea why, and how do we fix that?

(Edit: thanks to this comment, it sounds like the underlying issue was a kernel bug that Fastly developed a fix for - under certain configurations, the kernel fails to associate the MTU update with the egress interface and so it continues sending overly large packets)


13 February 2024

Matthew Palmer: Not all TLDs are Created Equal

In light of the recent cancellation of the queer.af domain registration by the Taliban, the fragile and difficult nature of country-code top-level domains (ccTLDs) has once again been comprehensively demonstrated. Since many people may not be aware of the risks, I thought I'd give a solid explainer of the whole situation, and explain why you should, in general, not have anything to do with domains which are registered under ccTLDs.

Top-level What-Now? A top-level domain (TLD) is the last part of a domain name (the collection of words, separated by periods, after the `https://` in your web browser's location bar). It's the "com" in `example.com`, or the "af" in `queer.af`. There are two kinds of TLDs: country-code TLDs (ccTLDs) and generic TLDs (gTLDs). Despite all being TLDs, they're very different beasts under the hood.

What's the Difference? Generic TLDs are what most organisations and individuals register their domains under: old-school technobabble like "com", "net", or "org", historical oddities like "gov", and the new-fangled world of words like "tech", "social", and "bank". These gTLDs are all regulated under a set of rules created and administered by ICANN (the "Internet Corporation for Assigned Names and Numbers"), which try to ensure that things aren't a complete wild-west, limiting things like price hikes (well, sometimes, anyway), and providing means for disputes over names¹. Country-code TLDs, in contrast, are all two letters long², and are given out to countries to do with as they please. While ICANN kinda-sorta has something to do with ccTLDs (in the sense that it makes them exist on the Internet), it has no authority to control how a ccTLD is managed. If a country decides to raise prices by 100x, or cancel all registrations that were made on the 12th of the month, there's nothing anyone can do about it. If that sounds bad, that's because it is. Also, it's not a theoretical problem: the Taliban deciding to assert its bigotry over the little corner of the Internet namespace it has taken control of is far from the first time that ccTLDs have caused grief.
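As a rough illustration of the definitions above, here is a Rust sketch that pulls the TLD off a domain name and applies the "two letters means country code" rule. It is a heuristic under a stated assumption: it only looks at ASCII labels, so internationalised ccTLDs (which appear in the DNS as "xn--" labels) are deliberately misclassified as generic; the type and function names are illustrative.

```rust
/// Coarse split: two-letter TLDs are treated as country codes.
#[derive(Debug, PartialEq)]
enum TldKind {
    CountryCode,
    Generic,
}

/// The TLD is the last dot-separated label; a trailing root dot is ignored.
fn tld(domain: &str) -> Option<String> {
    let name = domain.trim_end_matches('.');
    let last = name.rsplit('.').next()?;
    if last.is_empty() {
        None
    } else {
        Some(last.to_ascii_lowercase())
    }
}

/// Heuristic only: exactly two ASCII letters means ccTLD.
fn classify(tld: &str) -> TldKind {
    if tld.len() == 2 && tld.bytes().all(|b| b.is_ascii_alphabetic()) {
        TldKind::CountryCode
    } else {
        TldKind::Generic
    }
}

fn main() {
    for domain in ["queer.af", "example.com", "example.io", "example.tech"] {
        if let Some(t) = tld(domain) {
            println!("{domain}: .{t} is {:?}", classify(&t));
        }
    }
}
```

Note that "eu" passes this test too, even though the EU is not a country.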

Shifting Sands The `queer.af` cancellation is interesting because, at the time the domain was reportedly registered (2018), Afghanistan had what one might describe as, at least, a different political climate. Since then, of course, things have changed, and the new bosses have decided to get a bit more active. Those running `queer.af` seem to have seen the writing on the wall, and were planning on moving to another, less fraught, domain, but hadn't completed that move when the Taliban came knocking.

The Curious Case of Brexit When the United Kingdom decided to leave the European Union, it fell foul of the EU's rules for the registration of domains under the "eu" ccTLD³. To register (and maintain) a domain name ending in `.eu`, you have to be a resident of the EU. When the UK ceased to be part of the EU, residents of the UK were no longer EU residents. Cue much unhappiness, wailing, and gnashing of teeth when this was pointed out to Britons. Some decided to give up their domains, and move to other parts of the Internet, while others managed to hold onto them by various legal sleight-of-hand (like having an EU company maintain the registration on their behalf). In any event, all very unpleasant for everyone involved.

Geopolitics on the Internet?!? After Russia invaded Ukraine in February 2022, the Ukrainian Vice Prime Minister asked ICANN to suspend ccTLDs associated with Russia. While ICANN said that it wasn't going to do that, because it wouldn't do anything useful, some domain registrars (the companies you pay to register domain names) ceased to deal in Russian ccTLDs, and some websites restricted links to domains with Russian ccTLDs. Whether or not you agree with the sort of activism implied by these actions, the fact remains that even the actions of a government that aren't directly related to the Internet can have grave consequences for your domain name if it's registered under a ccTLD. I don't think any gTLD operator will be invading a neighbouring country any time soon.

Money, Money, Money, Must Be Funny When you register a domain name, you pay a registration fee to a registrar, who does administrative gubbins and causes you to be able to control the domain name in the DNS. However, you don't own that domain name⁴; you're only renting it. When the registration period comes to an end, you have to renew the domain name, or you'll cease to be able to control it. Given that a domain name is typically your brand or identity online, the chances are you'd prefer to keep it over time, because moving to a new domain name is a massive pain, having to tell all your customers or users that now you're somewhere else, plus having to accept the risk of someone registering the domain name you used to have and capturing your traffic: it's all a gigantic hassle. For gTLDs, ICANN has various rules around price increases and bait-and-switch pricing that try to keep a lid on the worst excesses of registries. While there are any number of reasonable criticisms of the rules, and the Internet community has to stay on their toes to keep ICANN from totally succumbing to regulatory capture, at least in the gTLD space there's some degree of control over price gouging. On the other hand, ccTLDs have no effective controls over their pricing. For example, in 2008 the Seychelles increased the price of `.sc` domain names from US$25 to US$75. No reason, no warning, just "pay up".

Who Is Even Getting That Money? A closely related concern about ccTLDs is that some of the cool ones are assigned to countries that are not great. The poster child for this is almost certainly Libya, which has the ccTLD "ly". While Libya was being run by a terrorist-supporting extremist, companies thought it was a great idea to have domain names that ended in `.ly`. These domain registrations weren't (and aren't) cheap, and it's hard to imagine that at least some of that money wasn't going to benefit the Gaddafi regime. Similarly, the British Indian Ocean Territory, which has the "io" ccTLD, was created in a colonialist piece of chicanery that expelled thousands of native Chagossians from Diego Garcia. Money from the registration of `.io` domains doesn't go to the (former) residents of the Chagos islands; instead it gets paid to the UK government. Again, I'm not trying to suggest that all gTLD operators are wonderful people, but it's not particularly likely that the direct beneficiaries of the operation of a gTLD stole an island chain and evicted the residents.

Are ccTLDs Ever Useful? The answer to that question is an unqualified "maybe". I certainly don't think it's a good idea to register a domain under a ccTLD for vanity purposes: because it makes a word, is the same as a file extension you like, or because it looks cool. Those ccTLDs that clearly represent and are associated with a particular country are more likely to be OK, because there is less impetus for the registry to try a naked cash grab. Unfortunately, ccTLD registries have a disconcerting habit of changing their minds on whether they serve their geographic locality, such as when auDA decided to declare an open season in the `.au` namespace some years ago. Essentially, while a ccTLD may have geographic connotations now, there's not a lot of guarantee that they won't fall victim to scope creep in the future. Finally, it might be somewhat safer to register under a ccTLD if you live in the location involved. At least then you might have a better idea of whether your domain is likely to get pulled out from underneath you. Unfortunately, as the `.eu` example shows, living somewhere today is no guarantee you'll still be living there tomorrow, even if you don't move house. In short, I'd suggest sticking to gTLDs. They're at least lower risk than ccTLDs.

+1, Helpful If you've found this post informative, why not buy me a refreshing beverage? My typing fingers (both of them) thank you in advance for your generosity.

Footnotes

¹ don't make the mistake of thinking that I approve of ICANN or how it operates; it's an omnishambles of poor governance and incomprehensible decision-making.

² corresponding roughly, though not precisely (because everything has to be complicated, because humans are complicated), to the entries in the ISO standard for "Codes for the representation of names of countries and their subdivisions", ISO 3166.

³ yes, the EU is not a country; it's part of the "roughly, though not precisely" caveat mentioned previously.

⁴ despite what domain registrars try very hard to imply, without falling foul of deceptive advertising regulations.

22 January 2024

Paul Tagliamonte: Writing a simulator to check phased array beamforming

Interested in future updates? Follow me on mastodon at @paul@soylent.green. Posts about hz.tools will be tagged #hztools.

If you're on the Fediverse, I'd very much appreciate boosts on my toot!

While working on hz.tools, I started to move my beamforming code from 2-D (meaning, beamforming to some specific angle on the X-Y plane for waves on the X-Y plane) to 3-D. I'll have more to say about that once I get around to publishing the code (as soon as I'm sure it's not completely wrong), but in the meantime I decided to write a simple simulator to visually check the beamformer against the textbooks. The results were pretty rad, so I figured I'd throw together a post since it's interesting all on its own outside of beamforming as a general topic. I figured I'd write this in Rust, since I've been using Rust as my primary language over at zoo, and it's a good chance to learn the language better.

This post has some large GIFs

It may take a little bit to load depending on your internet connection. Sorry about that, I'm not clever enough to do better without doing tons of complex engineering work. They may be choppy while they load or something. I tried to compress and ensmall them, so if they're loaded but fuzzy, click on them to load a slightly larger version.

This post won't cover the basics of how phased arrays work or the specifics of calculating the phase offsets for each antenna, but I'll dig into how I wrote a simple simulator and how I wound up checking my phase offsets to generate the renders below.

Assumptions I didn't want to build a general purpose RF simulator, anything particularly generic, or something that would solve for any more than the things right in front of me. To do this as simply and quickly as possible (all this code took about a day to write, including the beamforming math), I had to reduce the amount of work in front of me. Given that I was concerned with visualizing what the antenna pattern would look like in 3-D given some antenna geometry, operating frequency and configured beam, I made the following assumptions: All antennas are perfectly isotropic: they receive a signal that is exactly the same strength no matter what direction the signal originates from. There's a single point-source isotropic emitter in the far-field of the antenna system (I modeled this as being 1 million meters, or 1000 kilometers, away). There is no noise, multipath, loss or distortion in the signal as it travels through space. Antennas will never interfere with each other.

2-D Polar Plots The last time I wrote something like this, I generated 2-D GIFs which show a radiation pattern, not unlike the polar plots you'd see on a microphone. These are handy because they let you visualize what the directionality of the antenna looks like, as well as in what directions emissions are captured, and in what directions emissions are nulled out. You can see these plots on spec sheets for antennas in both 2-D and 3-D form. Now, let's port the 2-D approach to 3-D and see how well it works out.

Writing the 3-D simulator As an EM wave travels through free space, the place at which you sample the wave controls the phase you observe at each time-step. This means, assuming perfectly synchronized clocks, a transmitter and receiver exactly one RF wavelength apart will observe a signal in-phase, but a transmitter and receiver half a wavelength apart will observe a signal 180 degrees out of phase. So if we take the distance between our point-source and an antenna element and divide it by the wavelength, we can use the fractional part of the resulting number to determine the phase observed. If we multiply that number (in the range of 0 to just under 1) by tau, we can generate a complex number by taking the cos and sin of the multiplied phase (in the range of 0 to tau), assuming the transmitter is emitting a carrier wave at a static amplitude and all clocks are in perfect sync.

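```rust
use std::f64::consts::TAU;

// Expected complex sample at each antenna for a carrier emitted from `tx`.
let observed_phases: Vec<Complex> = antennas
    .iter()
    .map(|antenna| {
        // Distance in wavelengths; only the fractional part affects the phase.
        let cycles = (antenna - tx).magnitude() / wavelength;
        (cycles - cycles.floor()) * TAU
    })
    .map(|phase| Complex(phase.cos(), phase.sin()))
    .collect();
```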

At this point, given some synthetic transmission point and each antenna, we know what the expected complex sample would be at each antenna. We can then adjust the phase of each antenna according to the beamforming phase offset configuration, and add up every sample to determine the single sample the entire system would collectively produce.

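```rust
let beamformed_phases: Vec<Complex> = ...;
// Rotate each observed sample by that antenna's beamforming phase offset,
// sum across the array, and take the magnitude of the combined sample.
let magnitude = beamformed_phases
    .iter()
    .zip(observed_phases.iter())
    .map(|(beamformed, observed)| observed * beamformed)
    .reduce(|acc, el| acc + el)
    .unwrap()
    .abs();
```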

Armed with this information, it's straightforward to generate some number of (Azimuth, Elevation) points to sample, generate a transmission point far away in that direction, resolve what the resulting Complex sample would be, take its magnitude, and use that to create an (x, y, z) point at (azimuth, elevation, magnitude). The color attached to that point is based on its distance from (0, 0, 0). I opted to use the Life Aquatic table for this one. After this process is complete, I have a point cloud of ((x, y, z), (r, g, b)) points. I wrote a small program using kiss3d to render the point cloud using tons of small spheres, and write out the frames to a set of PNGs, which get compiled into a GIF. Now for the fun part, let's take a look at some radiation patterns!
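The (azimuth, elevation, magnitude) to (x, y, z) step is an ordinary spherical-to-Cartesian conversion. The sketch below assumes a convention the post doesn't spell out (azimuth measured in the x-y plane from +x, elevation measured up from that plane, both in radians), uses a made-up pattern in place of the beamformer output, and substitutes a simple blue-to-yellow gradient for the Life Aquatic palette; it does no rendering.

```rust
use std::f64::consts::{PI, TAU};

/// Turn one (azimuth, elevation, magnitude) sample into an (x, y, z) point.
fn to_cartesian(azimuth: f64, elevation: f64, magnitude: f64) -> [f64; 3] {
    let (sin_el, cos_el) = elevation.sin_cos();
    let (sin_az, cos_az) = azimuth.sin_cos();
    [magnitude * cos_el * cos_az, magnitude * cos_el * sin_az, magnitude * sin_el]
}

/// Colour by distance from the origin, normalised to `max_distance`:
/// blue at the origin blending to yellow at the maximum (a stand-in palette).
fn colour(point: [f64; 3], max_distance: f64) -> [f32; 3] {
    let distance = point.iter().map(|c| c * c).sum::<f64>().sqrt();
    let t = (distance / max_distance).clamp(0.0, 1.0) as f32;
    [t, t, 1.0 - t]
}

fn main() {
    // Hypothetical pattern: strongest broadside to an array lying on the x axis
    // (the sine of the angle between the sample direction and the x axis).
    let pattern = |az: f64, el: f64| (1.0 - (el.cos() * az.cos()).powi(2)).sqrt();
    let steps = 36;
    let mut cloud: Vec<([f64; 3], [f32; 3])> = Vec::new();
    for i in 0..steps {
        for j in 0..=steps / 2 {
            let az = TAU * i as f64 / steps as f64; // 0 up to (not including) 2π
            let el = PI * j as f64 / steps as f64 - PI / 2.0; // -π/2 to π/2
            let point = to_cartesian(az, el, pattern(az, el));
            cloud.push((point, colour(point, 1.0)));
        }
    }
    println!("{} points in the cloud", cloud.len());
}
```

Each `(point, colour)` pair is the ((x, y, z), (r, g, b)) tuple described above, ready to hand to whatever draws the spheres.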

1x4 Phased Array The first configuration is a phased array where all the elements are in perfect alignment on the `y` and `z` axis, and separated by some offset in the `x` axis. This configuration can sweep 180 degrees (not the full 360), but can't be steered in elevation at all. Let's take a look at what this looks like for a well constructed 1x4 phased array. And now let's take a look at the renders as we play with the configuration of this array and make sure things look right. Our initial quarter-wavelength spacing is very effective and has some outstanding performance characteristics. Let's check to see that everything looks right as a first test. Nice. Looks perfect. When pointing forward at `(0, 0)`, we'd expect to see a torus, which we do. As we sweep between 0 and 360, astute observers will notice the pattern is mirrored along the axis of the antennas: when the beam is facing forward to 0 degrees, it'll also receive at 180 degrees just as strongly. There's a small sidelobe that forms when it's configured along the array, but it also becomes the most directional, and the sidelobes remain fairly small.

Long compared to the wavelength (1¼λ) Let's try again, but rather than spacing each antenna ¼ of a wavelength apart, let's see about spacing each antenna 1¼ of a wavelength apart instead. The main lobe is a lot narrower (not a bad thing!), but some significant sidelobes have formed (not ideal). This can cause a lot of confusion when doing things that require a lot of directional resolution unless they're compensated for.
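A quick numeric cross-check of why wide spacing grows extra lobes: for a uniform linear array with equal weights (beam pointed broadside), the summed response reaches its full value N wherever the phase step between neighbouring elements is a whole number of cycles, i.e. at cos θ = m / spacing for every integer m with |m| ≤ spacing (θ measured from the array axis, spacing in wavelengths). Antenna texts usually call the extra full-strength lobes "grating lobes". This sketch is a simplified 1-D model, not the post's 3-D simulator.

```rust
use std::f64::consts::TAU;

/// Magnitude of the summed response of an `n`-element uniform linear array
/// along the x axis, all elements weighted equally (beam at broadside), for a
/// far-field source at angle `theta` (radians) from the array axis.
/// `spacing` is the element spacing in wavelengths.
fn array_response(n: usize, spacing: f64, theta: f64) -> f64 {
    // Phase step between neighbouring elements, in radians.
    let psi = TAU * spacing * theta.cos();
    let (mut re, mut im) = (0.0, 0.0);
    for k in 0..n {
        let phase = psi * k as f64;
        re += phase.cos();
        im += phase.sin();
    }
    (re * re + im * im).sqrt()
}

/// Angles (degrees, 0 to 180) where the response reaches its full value:
/// the main lobe plus any grating lobes.
fn full_strength_angles(spacing: f64) -> Vec<f64> {
    let max_m = spacing.floor() as i64;
    (-max_m..=max_m)
        .map(|m| (m as f64 / spacing).acos().to_degrees())
        .collect()
}

fn main() {
    for spacing in [0.25, 1.25, 5.0] {
        let angles = full_strength_angles(spacing);
        println!("{spacing} wavelengths: {} full-strength lobes", angles.len());
        // Sanity check the closed form against the direct 4-element sum.
        for a in &angles {
            let r = array_response(4, spacing, a.to_radians());
            assert!((r - 4.0).abs() < 1e-9);
        }
    }
}
```

At ¼λ only the broadside lobe at 90° exists; at 1¼λ two more appear at roughly 37° and 143°; by 5λ there are eleven, including lobes pointing straight out of the ends of the array.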

Going from (¼λ to 5λ) The last model raises the question: what do things look like when you separate the antennas from each other but without moving the beam? Let's simulate moving our antennas but not adjusting the configured beam or operating frequency. Very cool. As the spacing becomes longer in relation to the operating wavelength, we can see the sidelobes start to form out of the end of the antenna system.

2x2 Phased Array The second configuration I want to try is a phased array where the elements are in perfect alignment on the `z` axis, and separated by a fixed offset in either the `x` or `y` axis from their neighbor, forming a square when viewed from above (along the `z` axis). Let's take a look at what this looks like for a well constructed 2x2 phased array. Let's do the same as above and take a look at the renders as we play with the configuration of this array and see what things look like. This configuration should suppress the sidelobes and give us good performance, and even give us some amount of control in elevation while we're at it. Sweet. Heck yeah. The array is quite directional in the configured direction, and can even sweep a little bit in elevation, a definite improvement from the 1x4 above.
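For reference, both array geometries can be generated by one small helper that lays elements out on a grid in the x-y plane. This is a hypothetical sketch: the 2.4 GHz operating frequency and the ¼λ spacing used in `main` are illustrative choices, not values from the post.

```rust
/// Speed of light in m/s, used to turn a frequency into a wavelength.
const C: f64 = 299_792_458.0;

/// Wavelength in metres for a frequency in Hz.
fn wavelength(freq_hz: f64) -> f64 {
    C / freq_hz
}

/// Element positions for a `rows` x `cols` array in the x-y plane (z = 0),
/// with `spacing` between neighbours along both x and y. Units follow `spacing`.
fn grid(rows: usize, cols: usize, spacing: f64) -> Vec<[f64; 3]> {
    let mut positions = Vec::with_capacity(rows * cols);
    for r in 0..rows {
        for c in 0..cols {
            positions.push([c as f64 * spacing, r as f64 * spacing, 0.0]);
        }
    }
    positions
}

fn main() {
    // Hypothetical operating frequency, for illustration only.
    let lambda = wavelength(2.4e9);
    let linear = grid(1, 4, 0.25 * lambda); // 1x4: elements spread along x only
    let square = grid(2, 2, 0.25 * lambda); // 2x2: a square in the x-y plane
    println!("1x4: {linear:?}");
    println!("2x2: {square:?}");
}
```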

Long compared to the wavelength (1¼λ) Let's do the same thing as the 1x4 and take a look at what happens when the distance between elements is long compared to the wavelength of operation: say, 1¼ of a wavelength apart. What happens to the sidelobes given this spacing when the wavelength of operation is much different from the physical geometry? Mesmerising. This is my favorite render. The sidelobes are very fun to watch come in and out of existence. It looks absolutely other-worldly.

Going from (¼λ to 5λ) Finally, for completeness' sake, what do things look like when you separate the antennas from each other just as we did with the 1x4? Let's simulate moving our antennas but not adjusting the configured beam or operating frequency. Very very cool. The sidelobes wind up turning the very blobby cardioid into an electromagnetic dog toy. I think we've proven to ourselves that using a phased array much outside its designed frequency of operation seems like a real bad idea.

Future Work Now that I have a system to test things out, I'm a bit more confident that my beamforming code is close to right! I'd love to push that code over the line and blog about it, since it's a really interesting topic on its own. Once I'm sure the code involved isn't full of lies, I'll put it up on the hztools org, and post about it here and on mastodon.

Russell Coker: Storage Trends 2024

It has been less than a year since my last post about storage trends [1] and enough has changed to make it worth writing again. My previous analysis was that for <2TB only SSD made sense, for 4TB SSD made sense for business use while hard drives were still a good option for home use, and for 8TB+ hard drives were clearly the best choice for most uses. I will start by looking at MSY prices; they aren't the cheapest (you can get cheaper online) but they are competitive and they make it easy to compare the different options. I'll also compare the cheapest options in each size; there are more expensive options, but usually if you want to pay more then the performance benefits of SSD (both SATA and NVMe) are even more appealing. All prices are in Australian dollars and of parts that are readily available in Australia, but the relative prices of the parts are probably similar in most countries. The main issue here is when to use SSD and when to use hard disks, and then if SSD is chosen which variety to use. Small Storage For my last post the cheapest storage devices from MSY were $19 for a 128G SSD, now it's $24 for a 128G SSD or NVMe device. I don't think the Australian dollar has dropped much against foreign currencies, so I guess this is partly companies wanting more profits and partly due to the demand for more storage. Items that can't sell in quantity need higher profit margins if they are to have them in stock. 500G SSDs are around $33 and 500G NVMe devices around $36, so for most use cases it wouldn't make sense to buy anything smaller than 500G. The cheapest hard drive is $45 for a 1TB disk. A 1TB SATA SSD costs $61 and a 1TB NVMe costs $79. So 1TB disks aren't a good option for any use case. A 2TB hard drive is $89. A 2TB SATA SSD is $118 and a 2TB NVMe is $145. I don't think the small savings you can get from using hard drives makes them worth using for 2TB. For most people, if you have a system that's important to you then $145 on storage isn't a lot to spend. 
It seems hardly worth buying less than 2TB of storage, even for a laptop. Even if you don't use all the space, larger storage devices tend to support more writes before wearing out so you still gain from it. A 2TB NVMe device you buy for a laptop now could be used in every replacement laptop for the next 10 years. I only have 512G of storage in my laptop because I have a collection of SSD/NVMe devices that have been replaced in larger systems, so the 512G is essentially free for my laptop as I bought a larger device for a server. For small business use it doesn't make sense to buy anything smaller than 2TB for any system other than a router. If you buy smaller devices then you will sometimes have to pay people to install bigger ones, and when the price is $145 it's best to just pay that up front and be done with it. Medium Storage A 4TB hard drive is $135. A 4TB SATA SSD is $319 and a 4TB NVMe is $299. The prices haven't changed a lot since last year, but a small increase in hard drive prices and a small decrease in SSD prices makes SSD more appealing for this market segment. A common size range for home servers and small business servers is 4TB or 8TB of storage. To do that on SSD means about $600 for 4TB of RAID-1 or $900 for 8TB of RAID-5/RAID-Z. That's quite affordable for that use. For 8TB of less important storage an 8TB hard drive costs $239 and an 8TB SATA SSD costs $899, so a hard drive clearly wins for the specific case of non-RAID single device storage. Note that the U.2 devices are more competitive for 8TB than SATA but I included them in the next section because they are more difficult to install. Serious Storage With 8TB being an uncommon and expensive option for consumer SSDs, the cheapest price is for multiple 4TB devices. To have multiple NVMe devices in one PCIe slot you need PCIe bifurcation (treating the PCIe slot as multiple slots). Most of the machines I use don't support bifurcation and most affordable systems with ECC RAM don't have it. 
For cheap NVMe type storage there are U.2 devices (the enterprise form of NVMe). Until recently they were too expensive to use for desktop systems but now there are PCIe cards for internal U.2 devices: $14 for a card that takes a single U.2 is a common price on AliExpress, and prices below $600 for a 7.68TB U.2 device are common; that's cheaper on a per-TB basis than SATA SSD and NVMe! There are PCIe cards that take up to 4*U.2 devices (which probably require bifurcation) which means you could have 8+ U.2 devices in one not particularly high end PC for 56TB of RAID-Z NVMe storage. Admittedly $4200 for 56TB is moderately expensive, but it's in the price range for a small business server or a high end home server. A more common configuration might be 2*7.68TB U.2 on a single PCIe card (or 2 cards if you don't have bifurcation) for 7.68TB of RAID-1 storage. For SATA SSD AliExpress has a 6*2.5" hot-swap device that fits in a 5.25" bay for $63, so if you have 2*5.25" bays you could have 12*4TB SSDs for 44TB of RAID-Z storage. That wouldn't be much cheaper than 8*7.68TB U.2 devices and would be slower and have less space. But it would be a good option if PCIe bifurcation isn't possible. 16TB SATA hard drives cost $559 which is almost exactly half the price per TB of U.2 storage. That doesn't seem like a good deal. If you want 16TB of RAID storage then 3*7.68TB U.2 devices only cost about 50% more than 2*16TB SATA disks. In most cases paying 50% more to get NVMe instead of hard disks is a good option. As sizes go above 16TB prices go up in a more than linear manner; I guess they don't sell much volume of larger drives. 15.36TB U.2 devices are on sale for about $1300, slightly more than twice the price of a 16TB disk. It's within the price range of small businesses and serious home users. Also it should be noted that the U.2 devices are designed for enterprise levels of reliability and the hard disk prices I'm comparing to are the cheapest available. 
If NAS hard disks were compared then the price benefit of hard disks would be smaller. Probably the biggest problem with U.2 for most people is that it's an uncommon technology that few people have much experience with or spare parts for testing. Also you can't buy U.2 gear at your local computer store, which might mean that you want to have spare parts on hand, which is an extra expense. For enterprise use I've recently been involved in discussions with a vendor that sells multiple petabyte arrays of NVMe. Apparently NVMe is cheap enough that there's no need to use anything else if you want a well performing file server. Do Hard Disks Make Sense? There are specific cases like comparing an 8TB hard disk to an 8TB SATA SSD or a 16TB hard disk to a 15.36TB U.2 device where hard disks have an apparent advantage. But when comparing RAID storage and counting the performance benefits of SSD the savings of using hard disks don't seem to be that great. Is now the time that hard disks are going to die in the market? If they can't get volume sales then prices will go up due to lack of economy of scale in manufacture and increased stock time for retailers. 8TB hard drives are now more expensive than they were 9 months ago when I wrote my previous post; has a hard drive price death spiral already started? SSDs are cheaper than hard disks at the smallest sizes, faster (apart from some corner cases with contiguous IO), take less space in a computer, and make less noise. At worst they are a bit over twice the cost per TB. But the most common requirements for storage are small enough and cheap enough that being twice as expensive as hard drives isn't a problem for most people. I predict that hard disks will become less popular in future and offer less of a price advantage. 
The vendors are talking about 50TB hard disks being available in future but right now you can fit more than 50TB of NVMe or U.2 devices in a volume less than that of a 3.5" hard disk, so for storage density SSD can clearly win. Maybe in future hard disks will be used in arrays of 100TB devices for large scale enterprise storage. But for home users and small businesses the current sizes of SSD cover most uses. At the moment it seems that the one case where hard disks can really compare well is for backup devices. For backups you want large storage, good contiguous write speeds, and low prices so you can buy plenty of them. Further Issues The prices I've compared for SATA SSD and NVMe devices are all based on the cheapest devices available. I think it's a bit of a market for lemons [2] as devices often don't perform as well as expected and the incidence of fake products purporting to be from reputable companies is high on the cheaper sites. So you might as well buy the cheaper devices. An advantage of the U.2 devices is that you know that they will be reliable and perform well. One thing that concerns me about SSDs is the lack of knowledge of their failure cases. Filesystems like ZFS were specifically designed to cope with common failure cases of hard disks and I don't think we have that much knowledge about how SSDs fail. But with 3 copies of metadata BTRFS or ZFS should survive unexpected SSD failure modes. I still have some hard drives in my home server; they keep working well enough and the prices on SSDs keep dropping. But if I was buying new storage for such a server now I'd get U.2. I wonder if tape will make a comeback for backup. Does anyone know of other good storage options that I missed?
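Most of the comparisons in this post come down to two small calculations: price per TB, and usable capacity after redundancy. The sketch below reruns a few of them using prices quoted in the post; it assumes idealised RAID arithmetic (a mirror yields one device's capacity, single parity loses one device) and ignores filesystem overhead and the TB/TiB distinction.

```rust
/// Price per TB (AUD) for one device.
fn per_tb(price: f64, capacity_tb: f64) -> f64 {
    price / capacity_tb
}

/// Redundancy layouts discussed in the post.
enum Layout {
    /// RAID-1: every device holds a full copy.
    Mirror,
    /// RAID-5 or single-parity RAID-Z: one device's worth of parity.
    SingleParity,
}

/// Usable capacity in TB for `devices` equal devices of `size_tb` each.
fn usable_tb(layout: Layout, devices: u32, size_tb: f64) -> f64 {
    match layout {
        Layout::Mirror => size_tb,
        Layout::SingleParity => (devices - 1) as f64 * size_tb,
    }
}

fn main() {
    // Cheapest MSY prices quoted in the post (AUD).
    let options = [
        ("4TB hard drive", 135.0, 4.0),
        ("4TB NVMe", 299.0, 4.0),
        ("8TB hard drive", 239.0, 8.0),
        ("8TB SATA SSD", 899.0, 8.0),
    ];
    for (name, price, tb) in options {
        println!("{name}: ${:.2}/TB", per_tb(price, tb));
    }
    // The "about $600 for 4TB of RAID-1" and "$900 for 8TB of RAID-Z" figures.
    println!("RAID-1, 2x4TB NVMe: ${} for {}TB", 2.0 * 299.0, usable_tb(Layout::Mirror, 2, 4.0));
    println!("RAID-Z, 3x4TB NVMe: ${} for {}TB", 3.0 * 299.0, usable_tb(Layout::SingleParity, 3, 4.0));
    // The 12x4TB SATA SSD hot-swap example.
    println!("RAID-Z, 12x4TB SATA SSD: {}TB", usable_tb(Layout::SingleParity, 12, 4.0));
}
```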

21 January 2024

Debian Brasil: MiniDebConf BH 2024 - sponsorship and crowdfunding

Participant registration and the call for activities for MiniDebConf Belo Horizonte 2024 are already underway. The event will take place from April 27 to 30 at the Pampulha Campus of UFMG. This year we are offering food, accommodation and travel bursaries to active contributors to the Debian Project. Sponsorship: To make MiniDebConf happen, we are seeking financial sponsorship from companies and organizations. So if you work at a company/organization (or know someone who does), point them to our sponsorship plan. There you will find the amount for each tier and its benefits. Crowdfunding: You can also help make MiniDebConf happen through our crowdfunding campaign! Make a donation of any amount and have your name published on the event website as a supporter of MiniDebConf Belo Horizonte 2024. Even if you don't plan to come to Belo Horizonte for the event, you can donate and thereby contribute to the most important Debian Project event in Brazil. Contact: For any questions, send an email to contato@debianbrasil.org.br The Organizers

14 January 2024

Debian Brasil: MiniDebConf BH 2024 - registration opens and call for activities

Participant registration and the call for activities are open for MiniDebConf Belo Horizonte 2024 and for FLISOL - Latin American Free Software Installation Festival. Below is some important information: Dates and venue of MiniDebConf and FLISOL: MiniDebConf will take place from April 27 to 30 at the Pampulha Campus of UFMG - Federal University of Minas Gerais. On the 27th (Saturday) we will also hold an edition of FLISOL - Latin American Free Software Installation Festival, an event that takes place on the same day in several cities across Latin America. While MiniDebConf will have activities focused on Debian, FLISOL will have general activities about Free Software and related topics such as programming languages, CMS, network and systems administration, philosophy, freedom, licenses, etc. Free registration and bursaries: You can already register for free for MiniDebConf Belo Horizonte 2024. MiniDebConf is an event open to everyone, regardless of their level of knowledge about Debian. The most important thing will be to bring the community together to celebrate one of the largest Free Software projects in the world, so we want to welcome everyone, from inexperienced users who are just getting started with Debian to official Developers of the project. In other words, everyone is invited! This year we are offering accommodation and travel bursaries to enable people from other cities who contribute to the Debian Project to attend. Non-official contributors, DMs and DDs can request bursaries using the registration form. We are also offering food bursaries to all participants, including non-contributors and people who live in the BH region. Financial resources are quite limited, but we will try to meet as many requests as possible.
If you plan to request any of these bursaries, follow this link for more information before registering: Registration (without bursaries) will be possible up until the date of the event, but there is a deadline for requesting accommodation and travel bursaries, so keep an eye on it: February 18. Since we are using the same form for both events, your registration will be valid for both MiniDebConf and FLISOL. To register, go to the website and select Create account. Create your account (preferably using Salsa) and go to your profile. There you will see the Register button. https://bh.mini.debconf.org Call for activities: The call for activities is also open for both MiniDebConf and FLISOL. For more information, follow this link. Keep an eye on the deadline for submitting your activity proposal: February 18. Contact: For any questions, send an email to contato@debianbrasil.org.br The Organizers

2 January 2024

Matthew Garrett: Dealing with weird ELF libraries

Libraries are collections of code that are intended to be usable by multiple consumers (if you're interested in the etymology, watch this video). In the old days we had what we now refer to as "static" libraries, collections of code that existed on disk but which would be copied into newly compiled binaries. We've moved beyond that, thankfully, and now make use of what we call "dynamic" or "shared" libraries - instead of the code being copied into the binary, a reference to the library function is incorporated, and at runtime the code is mapped from the on-disk copy of the shared object[1]. This allows libraries to be upgraded without needing to modify the binaries using them, and if multiple applications are using the same library at once it only requires that one copy of the code be kept in RAM.

But for this to work, two things are necessary: when we build a binary, there has to be a way to reference the relevant library functions in the binary; and when we run a binary, the library code needs to be mapped into the process.

(I'm going to somewhat simplify the explanations from here on - things like symbol versioning make this a bit more complicated but aren't strictly relevant to what I was working on here)

For the first of these, the goal is to replace a call to a function (eg, printf()) with a reference to the actual implementation. This is the job of the linker rather than the compiler (eg, if you use the -c argument to tell gcc to simply compile to an object rather than linking an executable, it's not going to care about whether or not every function called in your code actually exists or not - that'll be figured out when you link all the objects together), and the linker needs to know which symbols (which aren't just functions - libraries can export variables or structures and so on) are available in which libraries. You give the linker a list of libraries, it extracts the symbols available, and resolves the references in your code with references to the library.

But how is that information extracted? Each ELF object has a fixed-size header that contains references to various things, including a reference to a list of "section headers". Each section has a name and a type, but the ones we're interested in are .dynstr and .dynsym. .dynstr contains a list of strings, representing the name of each exported symbol. .dynsym is where things get more interesting - it's a list of structs that contain information about each symbol. This includes a bunch of fairly complicated stuff that you need to care about if you're actually writing a linker, but the relevant entries for this discussion are an index into .dynstr (which means the .dynsym entry isn't sufficient to know the name of a symbol, you need to extract that from .dynstr), along with the location of that symbol within the library. The linker can parse this information and obtain a list of symbol names and addresses, and can now replace the call to printf() with a reference to libc instead.
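A minimal Python sketch of the lookup described above, assuming a little-endian 64-bit ELF whose .dynsym and .dynstr bytes have already been pulled out of the file. The function names are hypothetical and this is not the tool described in this post; it only shows that each .dynsym entry carries an offset (st_name) into .dynstr plus the symbol's value, never the name itself.

```python
import struct

# Elf64_Sym layout for a little-endian 64-bit ELF:
# st_name (u32), st_info (u8), st_other (u8), st_shndx (u16), st_value (u64), st_size (u64)
ELF64_SYM = struct.Struct("<IBBHQQ")  # 24 bytes per entry


def read_cstring(strtab: bytes, offset: int) -> str:
    """Return the NUL-terminated string that starts at `offset` in a string table."""
    end = strtab.index(b"\0", offset)
    return strtab[offset:end].decode()


def decode_dynsym(dynsym: bytes, dynstr: bytes) -> list:
    """Pair each .dynsym entry's name (looked up in .dynstr) with its st_value."""
    symbols = []
    for off in range(0, len(dynsym), ELF64_SYM.size):
        st_name, _info, _other, _shndx, st_value, _size = ELF64_SYM.unpack_from(dynsym, off)
        # st_name is only an index into .dynstr; the entry itself holds no name
        symbols.append((read_cstring(dynstr, st_name), st_value))
    return symbols


# Usage with a tiny hand-built pair of sections
# (st_info 0x12 = global function; section index 12 is arbitrary)
dynstr = b"\0printf\0puts\0"
dynsym = (
    ELF64_SYM.pack(0, 0, 0, 0, 0, 0)             # index 0 is always the null symbol
    + ELF64_SYM.pack(1, 0x12, 0, 12, 0x1000, 0)  # "printf" at .dynstr offset 1
    + ELF64_SYM.pack(8, 0x12, 0, 12, 0x2000, 0)  # "puts" at .dynstr offset 8
)
print(decode_dynsym(dynsym, dynstr))  # [('', 0), ('printf', 4096), ('puts', 8192)]
```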

(Note that it's not possible to simply encode this as "Call this address in this library" - if the library is rebuilt or is a different version, the function could move to a different location)

Experimentally, .dynstr and .dynsym appear to be sufficient for linking a dynamic library at build time - there are other sections related to dynamic linking, but you can link against a library that's missing them. Runtime is where things get more complicated.

When you run a binary that makes use of dynamic libraries, the code from those libraries needs to be mapped into the resulting process. This is the job of the runtime dynamic linker, or RTLD[2]. The RTLD needs to open every library the process requires, map the relevant code into the process's address space, and then rewrite the references in the binary into calls to the library code. This requires more information than is present in .dynstr and .dynsym - at the very least, it needs to know the list of required libraries.

There's a separate section called .dynamic that contains another list of structures, and it's the data here that's used for this purpose. For example, .dynamic contains a bunch of entries of type DT_NEEDED - this is the list of libraries that an executable requires. There's also a bunch of other stuff that's required to actually make all of this work, but the only thing I'm going to touch on is DT_HASH. Doing all this re-linking at runtime involves resolving the locations of a large number of symbols, and if the only way you can do that is by reading a list from .dynsym and then looking up every name in .dynstr that's going to take some time. The DT_HASH entry points to a hash table - the RTLD hashes the symbol name it's trying to resolve, looks it up in that hash table, and gets the symbol entry directly (it still needs to resolve that against .dynstr to make sure it hasn't hit a hash collision - if it has it needs to look up the next hash entry, but this is still generally faster than walking the entire .dynsym list to find the relevant symbol). There's also DT_GNU_HASH which fulfills the same purpose as DT_HASH but uses a more complicated algorithm that performs even better. .dynamic also contains entries pointing at .dynstr and .dynsym, which seems redundant but will become relevant shortly.
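To illustrate the DT_HASH lookup, here is a Python sketch. The hash function is the one given in the System V ABI, and the table layout (nbucket, nchain, then the bucket and chain arrays) is the classic DT_HASH format. As simplifying assumptions, the table is little-endian and a plain list of names stands in for resolving each symbol's st_name against .dynstr; build_sysv_hash and sysv_lookup are hypothetical helpers, and DT_GNU_HASH is not covered.

```python
import struct
from typing import List, Optional


def elf_hash(name: bytes) -> int:
    """The System V ABI hash function used by DT_HASH (32-bit arithmetic)."""
    h = 0
    for c in name:
        h = ((h << 4) + c) & 0xFFFFFFFF
        g = h & 0xF0000000
        if g:
            h ^= g >> 24
        h &= ~g & 0xFFFFFFFF
    return h


def build_sysv_hash(names: List[bytes], nbucket: int) -> bytes:
    """Build a DT_HASH table: nbucket, nchain, bucket[nbucket], chain[nchain]."""
    buckets = [0] * nbucket
    chains = [0] * len(names)           # nchain == number of symbols
    for index in range(1, len(names)):  # symbol 0 (STN_UNDEF) is never hashed
        b = elf_hash(names[index]) % nbucket
        chains[index] = buckets[b]      # push onto the front of this bucket's chain
        buckets[b] = index
    return struct.pack(f"<II{nbucket}I{len(names)}I", nbucket, len(names), *buckets, *chains)


def sysv_lookup(table: bytes, names: List[bytes], target: bytes) -> Optional[int]:
    """Return the symbol index for `target`, or None if it is not exported."""
    nbucket, nchain = struct.unpack_from("<II", table, 0)
    buckets = struct.unpack_from(f"<{nbucket}I", table, 8)
    chains = struct.unpack_from(f"<{nchain}I", table, 8 + 4 * nbucket)
    index = buckets[elf_hash(target) % nbucket]
    while index != 0:                   # 0 (STN_UNDEF) terminates the chain
        # Compare the real name: different names can land in the same bucket
        if names[index] == target:
            return index
        index = chains[index]
    return None


names = [b"", b"printf", b"puts", b"malloc"]
table = build_sysv_hash(names, nbucket=3)
print(hex(elf_hash(b"printf")))  # 0x77905a6
print(sysv_lookup(table, names, b"puts"), sysv_lookup(table, names, b"free"))  # 2 None
```

The name comparison inside the loop is the hash-collision check mentioned above: landing in the right bucket only narrows the search, so the string still has to be compared.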

So, .dynsym and .dynstr are required at build time, and both are required along with .dynamic at runtime. This seems simple enough, but obviously there's a twist and I'm sorry it's taken so long to get to this point.

I bought a Synology NAS for home backup purposes (my previous solution was a single external USB drive plugged into a small server, which had uncomfortable single point of failure properties). Obviously I decided to poke around at it, and I found something odd - all the libraries Synology ships were entirely lacking any ELF section headers. This meant no .dynstr, .dynsym or .dynamic sections, so how was any of this working? nm asserted that the libraries exported no symbols, and readelf agreed. If I wrote a small app that called a function in one of the libraries and built it, gcc complained that the function was undefined. But executables on the device were clearly resolving the symbols at runtime, and if I loaded them into ghidra the exported functions were visible. If I dlopen()ed them, dlsym() couldn't resolve the symbols - but if I hardcoded the offset into my code, I could call them directly.

Things finally made sense when I discovered that if I passed the --use-dynamic argument to readelf, I did get a list of exported symbols. It turns out that ELF is weirder than I realised. As well as the aforementioned section headers, ELF objects also include a set of program headers. One of the program header types is PT_DYNAMIC. This typically points to the same data that's present in the .dynamic section. Remember when I mentioned that .dynamic contained references to .dynsym and .dynstr? This means that simply pointing at .dynamic is sufficient, there's no need to have separate entries for them.
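The following Python sketch recovers the dynamic information using only the program headers, in the spirit of readelf's --use-dynamic option (this is an illustration, not readelf's code). It assumes a little-endian ELFCLASS64 file, and the helper names are hypothetical. One detail worth noting: the DT_STRTAB, DT_SYMTAB and DT_HASH values are virtual addresses, so they have to be translated to file offsets through the PT_LOAD segments.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")  # Elf64_Ehdr (64 bytes)
PHDR = struct.Struct("<IIQQQQQQ")          # Elf64_Phdr (56 bytes)
DYN = struct.Struct("<qQ")                 # Elf64_Dyn (16 bytes): d_tag, d_val/d_ptr

PT_LOAD, PT_DYNAMIC = 1, 2
DT_NULL, DT_NEEDED, DT_HASH, DT_STRTAB, DT_SYMTAB = 0, 1, 4, 5, 6
DT_GNU_HASH = 0x6FFFFEF5


def program_headers(image: bytes) -> list:
    """Return every Elf64_Phdr tuple; section headers are never consulted."""
    hdr = EHDR.unpack_from(image, 0)
    ident = hdr[0]
    if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
        raise ValueError("only little-endian ELFCLASS64 is handled here")
    e_phoff, e_phentsize, e_phnum = hdr[5], hdr[9], hdr[10]
    return [PHDR.unpack_from(image, e_phoff + i * e_phentsize) for i in range(e_phnum)]


def vaddr_to_offset(phdrs: list, vaddr: int) -> int:
    """DT_* pointers are virtual addresses; map them to file offsets via PT_LOAD."""
    for p_type, _flags, p_offset, p_vaddr, _paddr, p_filesz, _memsz, _align in phdrs:
        if p_type == PT_LOAD and p_vaddr <= vaddr < p_vaddr + p_filesz:
            return vaddr - p_vaddr + p_offset
    raise ValueError(f"address {vaddr:#x} is not backed by file data")


def read_dynamic(image: bytes) -> dict:
    """Recover the .dynamic information using only the PT_DYNAMIC program header."""
    phdrs = program_headers(image)
    dyn = next((p for p in phdrs if p[0] == PT_DYNAMIC), None)
    if dyn is None:
        raise ValueError("no PT_DYNAMIC segment")
    tags, needed = {}, []
    off, end = dyn[2], dyn[2] + dyn[5]  # p_offset, p_offset + p_filesz
    while off + DYN.size <= end:
        tag, val = DYN.unpack_from(image, off)
        if tag == DT_NULL:
            break
        if tag == DT_NEEDED:
            needed.append(val)          # offset into the DT_STRTAB string table
        else:
            tags[tag] = val
        off += DYN.size
    strtab = vaddr_to_offset(phdrs, tags[DT_STRTAB])
    result = {
        "needed": [image[strtab + n:image.index(b"\0", strtab + n)].decode() for n in needed],
        "strtab": strtab,
        "symtab": vaddr_to_offset(phdrs, tags[DT_SYMTAB]),
    }
    for name, tag in (("hash", DT_HASH), ("gnu_hash", DT_GNU_HASH)):
        if tag in tags:
            result[name] = vaddr_to_offset(phdrs, tags[tag])
    return result
```

Calling read_dynamic(open(path, "rb").read()) on a library with no section headers should then yield its DT_NEEDED list and the file offsets of the dynamic string and symbol tables, which is the raw material needed to synthesise equivalent section headers.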

The same information can be reached from two different locations. The information in the section headers is used at build time, and the information in the program headers at run time[3]. I do not have an explanation for this. But if the information is present in two places, it seems obvious that it should be possible to reconstruct the missing section headers in my weird libraries? So that's what this does. It extracts information from the DYNAMIC entry in the program headers and creates equivalent section headers.

There's one thing that makes this more difficult than it might seem. The section header for .dynsym has to contain the number of symbols present in the section. And that information doesn't directly exist in DYNAMIC - to figure out how many symbols exist, you're expected to walk the hash tables and keep track of the largest number you've seen. Since every symbol has to be referenced in the hash table, once you've hit every entry the largest number is the number of exported symbols. This seemed annoying to implement, so instead I cheated, added code to simply pass in the number of symbols on the command line, and then just parsed the output of readelf against the original binaries to extract that information and pass it to my tool.
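For anyone who would rather not pass the count in by hand, here is a hedged Python sketch of computing it from the hash tables instead. With a classic DT_HASH table no walk is needed, since the System V ABI specifies that nchain equals the number of symbol table entries. DT_GNU_HASH tables require the walk described above: find the largest bucket value, then follow that chain until an entry with the low bit set marks its end. It assumes little-endian data and 8-byte bloom-filter words (64-bit ELF); the function names are hypothetical.

```python
import struct


def count_symbols_sysv(hash_table: bytes) -> int:
    """For DT_HASH, nchain equals the number of symbol table entries."""
    _nbucket, nchain = struct.unpack_from("<II", hash_table, 0)
    return nchain


def count_symbols_gnu(hash_table: bytes, bloom_word_size: int = 8) -> int:
    """Walk a DT_GNU_HASH table and return the highest symbol index plus one.

    Layout: nbuckets, symoffset, bloom_size, bloom_shift (all u32), then
    bloom_size bloom words, nbuckets u32 buckets, then one u32 chain value
    per hashed symbol, starting at symbol index symoffset.
    """
    nbuckets, symoffset, bloom_size, _shift = struct.unpack_from("<IIII", hash_table, 0)
    buckets_off = 16 + bloom_word_size * bloom_size
    buckets = struct.unpack_from(f"<{nbuckets}I", hash_table, buckets_off)
    chain_off = buckets_off + 4 * nbuckets
    index = max(buckets, default=0)  # start of the last non-empty chain
    if index == 0:
        return symoffset             # nothing hashed; only the unhashed prefix exists
    while True:
        (value,) = struct.unpack_from("<I", hash_table, chain_off + 4 * (index - symoffset))
        if value & 1:                # low bit set marks the end of a chain
            return index + 1
        index += 1
```

The GNU variant relies on the linker sorting hashed symbols by bucket, so the bucket with the largest starting index always holds the final chain; symbols below symoffset are not hashed at all but still count toward the total.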

Somehow, this worked. I now have a bunch of library files that I can link into my own binaries to make it easier to figure out how various things on the Synology work. Now, could someone explain (a) why this information is present in two locations, and (b) why the build-time linker and run-time linker disagree on the canonical source of truth?

[1] "Shared object" is the source of the .so filename extension used in various Unix-style operating systems
[2] You'll note that "RTLD" is not an acronym for "runtime dynamic linker", because reasons
[3] For environments using the GNU RTLD, at least - I have no idea whether this is the case in all ELF environments

comments

31 December 2023

Chris Lamb: Favourites of 2023

This post should have marked the beginning of my yearly roundups of the favourite books and movies I read and watched in 2023. However, due to coming down with a nasty bout of flu recently and other sundry commitments, I wasn't able to undertake writing the necessary four or five blog posts. In lieu of this, however, I will simply present my (unordered and unadorned) highlights for now. Do get in touch if this (or any of my previous posts) has spurred you into picking something up yourself.

Books

Films Recent releases

The Blue Caftan (Maryam Touzani, 2022)
The Eight Mountains (Felix van Groeningen & Charlotte Vandermeersch, 2022)
Evil Does Not Exist (Ryusuke Hamaguchi, 2023)
Killers of the Flower Moon (Martin Scorsese, 2023)
Monster (Hirokazu Kore-eda, 2023)
Passages (Ira Sachs, 2023)
Poor Things (Yorgos Lanthimos, 2023)
The Tuba Thieves (Alison O'Daniel, 2023)
Theater Camp (Molly Gordon and Nick Lieberman, 2023)
TÁR (Todd Field, 2022)

Unenjoyable experiences included Alejandro Gómez Monteverde's Sound of Freedom (2023), Alex Garland's Men (2022) and Steven Spielberg's The Fabelmans (2022).
Older releases (Films released before 2022, and not including rewatches from previous years.)

Brief Encounter (David Lean, 1945)
Clouds of Sils Maria (Olivier Assayas, 2014)
Daisy Miller (Peter Bogdanovich, 1974)
First Reformed (Paul Schrader, 2017)
Forbidden Games (René Clément, 1952)
La Noire de... (Ousmane Sembène, 1966)
The Queen of Spades (Thorold Dickinson, 1949)
The River (Jean Renoir, 1951)
Topsy-Turvy (Mike Leigh, 1999)
Le Trou (Jacques Becker, 1960)

Distinctly unenjoyable watches included Ocean's Eleven (1960), El Topo (1970), Léolo (1992), Hotel Mumbai (2018), Bulworth (1998) and The Big Red One (1980).

28 December 2023

Simon Josefsson: Validating debian/copyright: licenserecon

Recently I noticed a new tool called licenserecon written by Peter Blackman, and I helped get licenserecon into Debian. The purpose of licenserecon is to reconcile licenses from debian/copyright against the output from licensecheck, a tool written by Jonas Smedegaard. It assumes DEP5 copyright files. You run the tool in a directory that has a debian/ sub-directory, and its output when it notices mismatches (this is for resolv-wrapper):

# sudo apt install licenserecon
jas@kaka:~/dpkg/resolv-wrapper$ lrc
Parsing Source Tree ....
Running licensecheck ....
d/copyright       licensecheck
BSD-3-Clauses     BSD-3-clause     src/resolv_wrapper.c
BSD-3-Clauses     BSD-3-clause     tests/dns_srv.c
BSD-3-Clauses     BSD-3-clause     tests/test_dns_fake.c
BSD-3-Clauses     BSD-3-clause     tests/test_res_query_search.c
BSD-3-Clauses     BSD-3-clause     tests/torture.c
BSD-3-Clauses     BSD-3-clause     tests/torture.h
jas@kaka:~/dpkg/resolv-wrapper$

Noticing one-character typos like this may not bring satisfaction except to the most obsessive-compulsive among us, however the tool has the potential to discover more serious mistakes. Using it manually once in a while may be useful, however I tend to forget QA steps that are not automated. Could we add this to the Salsa CI/CD pipeline? I recently proposed a merge request to add a wrap-and-sort job to the Salsa CI/CD pipeline (disabled by default) and learned how easy it was to extend it. I think licenserecon is still a bit rough around the edges, and I haven't been able to successfully use it on any but the simplest packages yet. I wouldn't want to suggest it be added to the normal Salsa CI/CD pipeline, even if disabled. If you maintain a Debian package on Salsa and wish to add a licenserecon job to your pipeline, I wrote licenserecon.yml for you. The simplest way to use licenserecon.yml is to replace recipes/debian.yml@salsa-ci-team/pipeline as the Salsa CI/CD configuration file setting with debian/salsa-ci.yml@debian/licenserecon. If you use a debian/salsa-ci.yml file you may put something like this in it instead:

---
include:
  - https://salsa.debian.org/salsa-ci-team/pipeline/raw/master/recipes/debian.yml
  - https://salsa.debian.org/debian/licenserecon/raw/main/debian/licenserecon.yml

Once you trigger the pipeline, this will result in a new job licenserecon that validates debian/copyright against licensecheck output on every build! I have added this to the libcpucycles package on Salsa and the pipeline contains a new job licenserecon whose output currently ends with:

$ cd ${WORKING_DIR}/${SOURCE_DIR}
$ lrc
Parsing Source Tree ....
Running licensecheck ....
No differences found
Cleaning up project directory and file based variables

If upstream releases a new version with files not matching our debian/copyright file, we will detect that on the next Salsa build job rather than months later when somebody happens to run the tools manually or there is some license conflict. Incidentally licenserecon is written in Pascal which brought back old memories with Turbo Pascal back in the MS-DOS days. Thanks Peter for licenserecon, and Jonas for licensecheck making this possible!

Antonio Terceiro: Debian CI: 10 years later

It was 2013, and I was on a break from work between Christmas and New Year. I had been working at Linaro for well over a year, on the LAVA project. I was living and breathing automated testing infrastructure, mostly for testing low-level components such as kernels and bootloaders, on real hardware. At this point I had also been a Debian contributor for quite some years, and had become an official project member two years prior. Most of my involvement was in the Ruby team, where we were already consistently running upstream test suites during package builds. During that break, I put these two contexts together, and came to the conclusion that Debian needed a dedicated service that would test the contents of the Debian archive. I was aware of the existence of autopkgtest, and started working on a very simple service that would later become Debian CI. In January 2014, debci was initially announced on that month's Misc Developer News, and later uploaded to Debian. It's been continuously developed for the last 10 years: it evolved from a single shell script running tests in a loop into a distributed system with 47 geographically distributed machines as of writing this piece, became part of the official Debian release process gating migrations to testing, has had 5 Summer of Code and Outreachy interns working on it, and has processed more than 40 million test runs. Over these years, Debian CI has received contributions from a lot of people, but I would like to give special credit to the following:

Ian Jackson - created autopkgtest.
Martin Pitt - was the maintainer of autopkgtest when Debian CI launched and helped a lot for some time.
Paul Gevers - decided that he wanted Debian CI test runs to control testing migration. While at it, became a member of the Debian Release Team and the other half of the permanent Debian CI team together with me.
Lucas Kanashiro - Google Summer of Code intern, 2014.
Brandon Fairchild - Google Summer of Code intern, 2014.
Candy Tsai - Outreachy intern, 2019.
Pavit Kaur - Google Summer of Code intern, 2021.
Abiola Ajadi - Outreachy intern, December 2021-2022.

24 December 2023

Russ Allbery: Review: Liberty's Daughter

Review: Liberty's Daughter, by Naomi Kritzer

Publisher:	Fairwood Press
Copyright:	November 2023
ISBN:	1-958880-16-7
Format:	Kindle
Pages:	257

Liberty's Daughter is a stand-alone near-future science fiction fix-up novel. The original stories were published in Fantasy and Science Fiction between 2012 and 2015. Beck Garrison lives on New Minerva (Min), one of a cluster of libertarian seasteads 220 nautical miles off the coast of Los Angeles. Her father brought her to Min when she was four, so it's the only life she knows. As this story opens, she's picked up a job for pocket change: finding very specific items that people want to buy. Since any new goods have to be shipped in and the seasteads have an ambiguous legal status, they don't get Amazon deliveries, but there are enough people (and enough tourists who bring high-value goods for trade) that someone probably has whatever someone else is looking for. Even sparkly high-heeled sandals size eight. Beck's father is high in the informal power structure of the seasteads for reasons that don't become apparent until very late in this book. Beck therefore has a comfortable, albeit cramped, life. The social protections, self-confidence, and feelings of invincibility that come with that wealth serve her well as a finder. After the current owner of the sandals bargains with her to find a person rather than an object, that privilege also lets her learn quite a lot before she starts getting into trouble. The political background of this novel is going to require some suspension of disbelief. The premise is that one of those harebrained libertarian schemes to form a freedom utopia has been successful enough to last for 49 years and attract 80,000 permanent residents. (It's a libertarian seastead so a lot of those residents are indentured slaves, as one does in libertarian philosophy. The number of people with shares, like Beck's father, is considerably smaller.) 
By the end of the book, Kritzer has offered some explanations for why the US would allow such a place to continue to exist, but the chances of the famously fractious con artists and incompetents involved in these types of endeavors creating something that survived internal power struggles for that long seem low. One has to roll with it for story reasons: Kritzer needs the population to be large enough for a plot, and the history to be long enough for Beck to exist as a character. The strength of this book is Beck, and specifically the fact that Beck is a second-generation teenager who grew up on the seastead. Unlike a lot of her age peers with their Cayman Islands vacations, she's never left and has no experience with life on land. She considers many things to be perfectly normal that are not at all normal to the reader and the various reader surrogates who show up over the course of the book. She also has the instinctive feel for seastead politics of the child of a prominent figure in a small town. And, most importantly, she has formed her own sense of morality and social structure that matches neither that of the reader nor that of her father. Liberty's Daughter is told in first-person by Beck. Judging the authenticity of Gen-Z thought processes is not one of my strengths, but Beck felt right to me. Her narration is dryly matter-of-fact, with only brief descriptions of her emotional reactions, but her personality shines in the occasional sarcasm and obstinacy. Kritzer has the teenage bafflement at the stupidity of adults down pat, as well as the tendency to jump head-first into ideas and make some decisions through sheer stubbornness. This is not one of those fix-up novels where the author has reworked the stories sufficiently that the original seams don't show. It is very episodic; compared to a typical novel of this length, there's more plot but less character growth. It's a good book when you want to be pulled into a stream of events that moves right along. 
This is not the book for deep philosophical examinations of the basis of a moral society, but around the edges it does make the point that humans build human societies and develop elaborate social conventions and senses of belonging no matter how stupid the original philosophical foundations were. Even societies built on nasty exploitation can engender a sort of loyalty. Beck doesn't support the worst parts of her weird society, but she wants to fix it, not burn it to the ground. I thought there was a profound observation there. That brings me to my complaint: I hated the ending. Liberty's Daughter is in part Beck's fight for her own autonomy, both moral and financial, and the beginnings of an effort to turn her home into the sort of home she wants. By the end of the book, she's testing the limits of what she can accomplish, solidifying her own moral compass, and deciding how she wants to use the social position she inherited. It felt like the ending undermined all of that and treated her like a child. I know adolescence comes with those sorts of reversals, but I was still so mad. This is particularly annoying since I otherwise want to recommend this book. It's not ground-breaking, it's not that deep, but it was a thoroughly enjoyable day's worth of entertainment with a likable protagonist. Just don't read the last chapter, I guess? Or have more tolerance than I have for people treating sixteen-year-olds as if they're not old enough to make decisions. Content warnings: pandemic. Rating: 7 out of 10

22 December 2023

Russ Allbery: Review: Wintersmith

Review: Wintersmith, by Terry Pratchett

Series:	Discworld #35
Publisher:	Clarion Books
Copyright:	2006
Printing:	2007
ISBN:	0-06-089033-9
Format:	Mass market
Pages:	450

Wintersmith is the 35th Discworld novel and the 3rd Tiffany Aching novel. You could probably start here, since understanding the backstory isn't vital for following the plot, but I'm not sure why you would. Tiffany is now training with Miss Treason, a 113-year-old witch who is quite different in her approach from Miss Level, Tiffany's mentor in A Hat Full of Sky. Miss Level was the unassuming and constantly helpful glue that held the neighborhood together. Miss Treason is the judge; her neighbors are scared of her and proud of being scared of her, since that means they have a proper witch who can see into their heads and sort out their problems. On the surface, they're quite different; part of the story of this book is Tiffany learning to see the similarities. First, though, Miss Treason rushes Tiffany to a strange midnight Morris Dance, without any explanation. The Morris Dance usually celebrates the coming of spring and is at the center of a village party, so Tiffany is quite confused by seeing it danced on a dark and windy night in late autumn. But there is a hole in the dance where the Fool normally is, and Tiffany can't keep herself from joining it. This proves to be a mistake. That space was left for someone very different from Tiffany, and now she's entangled herself in deep magic that she doesn't understand. This is another Pratchett novel where the main storyline didn't do much for me. All the trouble stems from Miss Treason being maddeningly opaque, and while she did warn Tiffany, she did so in the way that a protagonist of a middle-grade novel is guaranteed to ignore. The Wintersmith is a boring, one-note quasi-villain, and the plot mainly revolves around elemental powers being dumber than a sack of hammers.
The one thing I will say about the main plot is that the magic Tiffany danced into is entangled with courtship and romance, Tiffany turns thirteen over the course of this book, and yet this is not weird and uncomfortable reading the way it would be in the hands of many other authors. Pratchett has a keen eye for the age range that he's targeting. The first awareness that there is such a thing as romance that might be relevant to oneself pairs nicely with the Wintersmith's utter confusion at how Tiffany's intrusion unbalanced his dance. This is a very specific age and experience that I think a lot of authors would shy away from, particularly with a female protagonist, and I thought Pratchett handled it adroitly. I personally found the Wintersmith's awkward courting tedious and annoying, but that's more about me than about the book. As with A Hat Full of Sky, though, everything other than the main plot was great. It is becoming obvious how much Tiffany and Granny Weatherwax have in common, and that Granny Weatherwax recognizes this and is training Tiffany herself. This is high-quality coming-of-age material, not in the traditional fantasy sense of chosen ones and map explorations, but in the sense of slowly-developing empathy and understanding of people who think differently than you do. Tiffany, like Granny Weatherwax, has very little patience with nonsense, and her irritation with stupidity is one of her best characteristics. But she's learning how to blunt it long enough to pay attention, and to understand how people she doesn't like can still be the right people for specific situations. I particularly loved how Granny carries on with a feud at the same time that Tiffany is learning to let go of one. It's not a contradiction or hypocrisy; it's a sign that Tiffany is entitled to her judgments and feelings, but has to learn how to keep them in their place and not let them take over. 
One of the great things about the Tiffany Aching books is that the villages are also characters. We don't see that much of the individual people, but one of the things Tiffany is learning is how to see the interpersonal dynamics and patterns of village life. Somehow the feelings of irritation and exasperation fade once you understand people's motives and see more sides to their character. There is a lot more Nanny Ogg in this book than there has been in the last few, and that reminded me of how much I love her character. She has a completely different approach than Granny Weatherwax, but it's just as effective in different ways. She's also the perfect witch to have around when you've stumbled into a stylized love story that you don't want to be a part of, and yet find oddly fascinating. It says something about the skill of Pratchett's characterization that I could enjoy a book this much while having no interest in the main plot. The Witches have always been great characters, but somehow they're even better when seen through Tiffany's perspective. Good stuff; if you liked any of the other Tiffany Aching books, you will like this as well. Followed by Making Money in publication order. The next Tiffany Aching novel is I Shall Wear Midnight. Rating: 8 out of 10

20 December 2023

Russell Coker: Abuse and Free Software

People in positions of power can get away with mistreating other people. For any organisation to operate effectively there have to be mechanisms to address bad behaviour, both to help the organisation to achieve its goals and to protect people who work for it. When an organisation operates in the public interest there is a greater reason to try to prevent bad behaviour, as hurting people is not in the public interest. There are many forms of power; in the free software community a reputation for doing good technical work or work related to supporting software development gives some power and influence. We have seen examples of technical contributions used to excuse mistreatment of other people. The latest example of using a professional reputation to cover for abuse is Eben Moglen, who has done some good legal work in the past while also treating members of the community badly (as documented by Matthew Garrett) [1]. Matthew has also documented how since 2016 Eben has not been doing good work for the free software community [2]. When news comes out about people who did good work while abusing other people, they are usually defended with claims such as "we can't lose the great contributions of this one person, so it's worth losing the contributions of everyone who can't work with them", but in such situations it's very common to discover that they haven't been doing great work. This might be partly due to abusive people being better at self-promoting than actually doing good work, and might be partly due to the fact that people who are afraid to speak out while the person is doing good work might suddenly feel ready to go public once the person's work (their defence) is decreasing. Bradley Kuhn's article about this situation is worth reading [3]. I don't have as much knowledge of the people involved in these disputes as Matthew, but I know enough about what is happening to be confident that Matthew's summary is accurate.

2 November 2023

Reproducible Builds: Farewell from the Reproducible Builds Summit 2023!

Farewell from the Reproducible Builds Summit, which just took place in Hamburg, Germany! This year, we were thrilled to host the seventh edition of this exciting event. Topics covered this year included:

Project updates from openSUSE, Fedora, Debian, ElectroBSD, Reproducible Central and NixOS
Mapping the big picture
Towards a snapshot service
Understanding user-facing needs and personas
Language-specific package managers
Defining our definitions
Creating a Ten Commandments of reproducibility
Embedded systems
Next steps in GNU Guix reproducibility
Signature storage and sharing
Public verification services
Verification use cases
Web site audiences
Enabling new projects to be born reproducible
Collecting reproducibility success stories
Reproducibility's relationship to SBOMs
SBOMs for RPM-based distributions
Filtering diffoscope output
Reproducibility of filesystem images, filesystems and containers
Using verification data
A deep-dive on Fedora and Arch Linux package reproducibility
Debian rebuild archive service discussion

as well as countless informal discussions and hacking sessions into the night. Projects represented at the venue included:

Debian, openSUSE, QubesOS, GNU Guix, Arch Linux, phosh, Mobian, PureOS, JustBuild, LibreOffice, Warpforge, OpenWrt, F-Droid, NixOS, ElectroBSD, Apache Security, Buildroot, Systemd, Apache Maven, Fedora, Privoxy, CHAINS (KTH Royal Institute of Technology), coreboot, GitHub, Tor Project, Ubuntu, rebuilderd, repro-env, spytrap-adb, arch-repro-status, etc.

A huge thanks to our sponsors and partners for making the event possible:

Event facilitation

Platinum sponsor

If you weren't able to make it this year, don't worry; just look out for an announcement in 2024 for the next event.

22 October 2023

Aigars Mahinovs: Figuring out finances part 3

So now that I have something that looks very much like a budgeting setup going, I am going to ... delete it! Why? Well, at the end of the last part of this series, the Firefly III instance was running on a tiny Debian server in a Docker container, right next to another Docker container running the main user of this server - a Home Assistant instance that has been managing my home for several years already. So why change that? See, there is one bit of knowledge that is crucial to your Home Assistant experience, and it is not emphasised enough in the Home Assistant documentation. In fact, back when I was getting into Home Assistant, both the main documentation and basically all the guides around were just coming off the hype of Docker disrupting everything, and that is a big reason why everyone suggested installing and using Home Assistant as a Docker container on top of any kind of stable OS. In fact I used to run it for years on my TerraMaster NAS, just so that I didn't need a separate home server running 24/7 at home and could keep everything inside the very compact NAS case. So here is the thing you NEED to know - Home Assistant Container is the DEMO version of Home Assistant! If you want to have the full Home Assistant experience and use the knowledge of the huge community around the HA space, you have to use Home Assistant OS. Ideally on dedicated hardware. Ideally on an HA Green box, but any tiny PC would also work great. A Raspberry Pi 4 or newer is common, but it is quite weak as the network grows, and in particular the SD card used for storage wears out very fast. Get a real small x86 PC with at least 4 GB of RAM and an NVMe SSD (eMMC is fine too). You want to have an Ethernet port and a few free USB ports. I would also suggest immediately getting an HA SkyConnect adapter, which can do Zigbee networking and will do Matter soon (tm).
I am making do with a Sonoff Zigbee gateway, but it is quite hacky to get working, and your whole Zigbee communication breaks down if the WiFi goes down - suboptimal. So I took a backup of the Home Assistant instance using its built-in tools. I took an export of my fully configured Firefly III instance and proceeded to wipe the drive of the NUC. That was not a smart idea. :D On the Home Assistant side I was really frustrated by the documentation, which was heavily focused on users who are (likely) using Windows and an SD card in something like a Raspberry Pi to get Home Assistant OS running. It recommended downloading Etcher to write the image to the boot medium. That is a really weird piece of software that managed to crash consistently when I was trying to run it from Debian Live or Ubuntu Live on my NUC. It took me way too long to give up and try something much simpler - dd:

xzcat haos_generic-x86-64-11.0.img.xz | dd of=/dev/mmcblk0 bs=1M

That just worked, perfectly and really fast. If you want to use a GUI in a live environment, then gnome-disk-utility ("Disks" in the GNOME menu) and its "Restore Disk Image ..." action on a partition would work just as well. It even supports decompressing XZ images directly while writing. But that image is small, so will it not leave a ton of unused disk space behind the fixed install partition? Yes, it will ... until first boot. On the first boot-up, HA OS takes over the empty space after its install partition and simply grows its main partition to take up all the remaining space. Smart. After the first boot is completed, the first-boot wizard can be accessed via your web browser, and one of the prominent buttons there is for restoring from a backup. So you just give it the backup file and wait.
Sadly the restore does not give any kind of progress indication, so your only way to figure out when it is done is to open the same web address in another browser tab and refresh periodically. After restoring from backup it just boots into the same config as it had before - all the settings, all the devices, all the history are preserved. Even authentication tokens are preserved, so if you had the Home Assistant mobile app installed on your phone (both for remote access and to send location info and phone state, like charging, to HA to trigger automations), it will just suddenly start working again without any further action needed on your side. That is an almost perfect backup/restore experience. The first thing you get for using the OS version of HA is easy automatic updates that also automatically take a backup before each upgrade, so if anything breaks you can roll back with one click. There is also a command-line tool that can upgrade, but also downgrade, ha-core and other modules. I had to use it today, as HA version 23.10.4 broke support for the Sonoff bridge that I am using to control Zigbee devices, which make up about 90% of all the smart devices in my home. Really helpful stuff, but not a must-have. What is a must-have, and what you can (really) only get with Home Assistant Operating System, is Addons. Some addons are just normal servers you can run alongside HA on the same HA OS server, like MariaDB or Plex or a file server. That is not the most important bit, but even there the software comes pre-configured for a home server setup and has a very simple config UI for key settings, like users, passwords and database access for MariaDB - you can literally, with a few clicks and a few strings, make several users, each with access to its own database. A couple more clicks and the DB is running, and it will be restarted automatically in case of failures.
But the real gems in the Home Assistant Addon Store are modules that extend Home Assistant core functionality in ways that would be really hard or near impossible to configure in Home Assistant Container manually, especially because no documentation has ever existed for such manual config - everyone just tells you to install the addon from the HA Addon Store or from HACS. Or you can read the addon metadata in various repos and figure out what containers it actually runs, with what settings and configs, and what hooks it puts into HA Core to make them cooperate. And then do it all over again when a new version breaks everything 6 months later, when you have already forgotten everything. Among the addons that show up immediately after installation are ones for the new Matter server, MariaDB and MQTT servers (which other addons can use for data storage and message exchange), Z-Wave support, ESPHome integration, a very handy File manager that includes editors for Home Assistant configs directly in the browser, and an SSH/Terminal addon that allows both SSH connections and a web-based terminal, giving access to the OS itself and to a command-line interface, for example to do package downgrades if needed or to see detailed logs. This is also where you get the features that are this year's focus for HA developers - voice enablers. However, that is only the beginning. Like in Debian, you can add additional repositories to expand your list of available addons. Unlike Debian, most of the amazing software that is available for Home Assistant is outside the main, official addon store. For now I have added the most popular addon repository - HACS (Home Assistant Community Store) - and the repository maintained by Alexbelgium. The first includes things like NodeRED (a workflow-based automation programming UI), Tailscale/Wirescale for VPN servers, motionEye for CCTV control, and Plex for home streaming.
HACS also includes a lot of HA UI enhancement modules, like themes, custom UI control panels like Mushroom or mini-graph-card, and integrations that provide more advanced functions but also require more knowledge to use, like Local Tuya - that is harder to set up, but allows fully local control of (normally) cloud-based devices. And it has AppDaemon - basically a Python-based automation framework where you put in Python scripts that get run in a special environment, where they get fed events from Home Assistant and can trigger events back that can control everything HA can, and also do anything Python can do. This I will need to explore later. And the repository by Alex includes the thing that is actually the focus of this blog post (I know :D) - the Firefly III addon and the Firefly Importer addon, which you can then add to your Home Assistant OS with a few clicks. It also has all kinds of addons for NAS management, photo/video servers, book servers, and Portainer, which lets us set up and run any Docker container inside the HA OS structure. HA OS will detect this and warn you about unsupported processes running on your HA OS instance (nice security feature!), but you can just dismiss that. This will be very helpful very soon. This whole environment of OS and containers and apps really made me think - what was missing in Debian that made the talented developers behind all of that spend the immense time and effort to set up a completely new OS and app infrastructure, and to develop a completely parallel developer community for Home Assistant apps, interfaces and configurations? Is there anything that can still be done to bring the HA community and the general open source and Debian communities closer together? HA devs are not doing anything wrong: they are using the best open source can provide, they bring it to people who could not install and use it otherwise, and they are contributing fixes and improvements as well. But there must be some way to do this better, together.
So I installed MariaDB and created a user and database for Firefly. I installed Firefly III and configured it to use the MariaDB with the web config UI. When I went into the Firefly III web UI I was confronted with the normal wizard to set up a new instance, and no reference to any backup restore. Hmm, ok. Maybe that goes via the Importer? So I made an access token again, configured the Importer to use that, and configured the Nordlinger bank connection settings. Then I tried to import the export that I had downloaded from Firefly III before. The importer did not auto-recognise the format. Turns out it is just a list of transactions ... It is only barely useful if you first manually create all the asset accounts with the same names as before, and even then you'll again have to deal with resolving the problem of transfers showing up twice. And all of your categories (that have not been used yet) are gone, your automation rules and bills are gone, your budgets and piggy banks are gone. Boooo. It will be easier for me to recreate my account data from bank exports again than to resolve the data in that transaction export. Turns out that the Firefly III documentation explicitly recommends making a mysqldump of your own and not relying on anything in the app itself for backup purposes. Kind of sad this was not mentioned on the export page that sure looked a lot like a backup :D After doing all that work all over again, I needed to make something new so as not to feel like I had wasted days of work for no real gain. So I started solving a problem I had had for a while already - how do I add cash transactions to the system when I am out of the house with just my phone in my hand? So far my workaround has been just sending myself messages in WhatsApp with the amount and description of any cash expenses. Two solutions are possible: app and bot. There are actually multiple Android-based phone apps that work with the Firefly III API to do full financial management from the phone.
However, after trying it out, that is not what I will be using most of the time. First of all, this requires your Firefly III instance to be accessible from the Internet, either via direct API access using some port forwarding, secured with HTTPS and good access tokens, or via a VPN server redirect that is installed on both HA and your phone. Tailscale was really easy to get working. But the power has its drawbacks - adding a new cash transaction requires opening the app, choosing the new transaction view, entering the description and amount, choosing "Cash" as the source account, optionally choosing a destination expense account, choosing a category and budget, and then submitting the form to the server. Sadly none of that really works if you have no Internet or bad Internet at the place where you are using cash. And it's just too many steps. Annoying. An easier alternative is setting up a Telegram bot - it runs in a custom Docker container right next to your Firefly (via Portainer), and you talk to it via a custom Telegram chat channel that you create very easily and quickly. Then you can just tell it "Coffee 5" and it will create a transaction from the (default) cash account for an amount of 5 with the description "Coffee". This part also works if you are offline at the moment - the bot will receive the message once you get back online. You can use the Telegram bot menu system to edit the transaction to add categories or expense accounts, but this part only works if you are online. And the Firefly instance does not have to be online at all. Really nifty. So next week I will need to write up all the regular payments as bills in Firefly (again), and then I can start writing a Python script to predict my (financial) future!
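The post doesn't say how the bot turns "Coffee 5" into a transaction. As a rough illustration only, here is a minimal Rust sketch of the parsing half of that convention, assuming the last word is the amount and everything before it is the description. The names `CashExpense` and `parse_cash_message` are hypothetical, and the sketch deliberately makes no Telegram or Firefly III API calls.

```rust
/// A cash expense parsed from a chat message such as "Coffee 5".
/// Hypothetical type: the real bot's internals are not described in the post.
#[derive(Debug, PartialEq)]
pub struct CashExpense {
    pub description: String,
    /// Amount in cents, to avoid floating-point rounding.
    pub amount_cents: u64,
}

/// Parse "<description> <amount>", treating the last word as the amount.
pub fn parse_cash_message(msg: &str) -> Option<CashExpense> {
    let msg = msg.trim();
    // Split at the last whitespace: "Flat white 3.50" -> ("Flat white", "3.50").
    let (description, amount) = msg.rsplit_once(char::is_whitespace)?;
    let description = description.trim();
    if description.is_empty() {
        return None;
    }
    Some(CashExpense {
        description: description.to_string(),
        amount_cents: parse_cents(amount)?,
    })
}

/// Accept "5", "5.5", "5.50" or "5,50" (either '.' or ',' as decimal separator).
fn parse_cents(amount: &str) -> Option<u64> {
    let normalised = amount.replace(',', ".");
    let (whole, frac) = normalised
        .split_once('.')
        .unwrap_or((normalised.as_str(), ""));
    let all_digits = |s: &str| s.bytes().all(|b| b.is_ascii_digit());
    if whole.is_empty() || frac.len() > 2 || !all_digits(whole) || !all_digits(frac) {
        return None;
    }
    let whole: u64 = whole.parse().ok()?;
    // "5" after the separator means 50 cents; "05" means 5 cents.
    let frac_cents: u64 = match frac.len() {
        0 => 0,
        1 => frac.parse::<u64>().ok()? * 10,
        _ => frac.parse().ok()?,
    };
    whole.checked_mul(100)?.checked_add(frac_cents)
}
```

Keeping the amount as integer cents sidesteps floating-point surprises; a real bot would presumably then submit the result as a withdrawal from the cash account.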

Ian Jackson: DigiSpark (ATTiny85) - Arduino, C, Rust, build systems

Recently I completed a small project, including an embedded microcontroller. For me, using the popular Arduino IDE, and C, was a mistake. The experience with Rust was better, but still very exciting, and not in a good way. Here follows the rant.

Introduction

In a recent project (I'll write about the purpose, and the hardware, in another post) I chose to use a DigiSpark board. This is a small board with a USB-A tongue (but not a proper plug), and an ATTiny85 microcontroller. This chip has 8 pins and is quite small really, but it was plenty for my application. By choosing something popular, I hoped for convenient hardware, and an uncomplicated experience. Convenient hardware, I got.

Arduino IDE

The usual way to program these boards is via an IDE. I thought I'd go with the flow and try that. I knew these were closely related to actual Arduinos and saw that the IDE package arduino was in Debian. But it turns out that the Debian package's version doesn't support the DigiSpark. (AFAICT from the list it offered me, I'm not sure it supports any ATTiny85 board.) Also, disturbingly, its "board manager" seemed to be offering to install board support, suggesting it would download stuff from the internet and run it. That wouldn't be acceptable for my main laptop. I didn't expect to be doing much programming or debugging, and the project didn't have significant security requirements: the chip, in my circuit, has only a very narrow ability to do anything to the real world, and no network connection of any kind. So I thought it would be tolerable to do the project on my "low-security video laptop". That's the machine where I'm prepared to say yes to installing random software off the internet. So I went to the upstream Arduino site and downloaded a tarball containing the Arduino IDE. After unpacking that in /opt it ran and produced a pointy-clicky IDE, as expected. I had already found a 3rd-party tutorial saying I needed to add a magic URL (from the DigiSpark's vendor) in the preferences. That indeed allowed it to download a whole pile of stuff. Compilers, bootloader clients, god knows what. However, my tiny test program didn't make it to the board.
Half-buried in a too-small window was an error message about the board's bootloader ("Micronucleus") being too new. The boards I had came pre-flashed with micronucleus 2.2, which is hardly new. But even so, the official Arduino IDE (or maybe the DigiSpark's board package?) still contains an old version. So now we have all the downsides of curl | bash-ware, but we're lacking the "it's up to date and it just works" upsides. Further digging found some random forum posts which suggested simply downloading a newer micronucleus and manually stuffing it into the right place: one overwrites a specific file, in the middle of the heaps of stuff that the Arduino IDE's board support downloader squirrels away in your home directory. (In my case, the home directory of the untrusted shared user on the video laptop.) So, "whatever". I did that. And it worked! Having demo'd my ability to run code on the board, I set about writing my program.

Writing C again

The programming language offered via the Arduino IDE is C. It's been a little while since I started a new thing in C, after having spent so much of the last several years writing Rust. C's primitiveness quickly started to grate, and the program couldn't easily be as DRY as I wanted (Don't Repeat Yourself, see Wilson et al, 2012, §4, p.6). But I carried on; after all, this was going to be quite a small job. Soon enough I had a program that looked right and compiled. Before testing it in circuit, I wanted to do some QA. So I wrote a simulator harness that #included my Arduino source file, and provided imitations of the few Arduino library calls my program used. As a side advantage, I could build and run the simulation on my main machine, in my normal development environment (Emacs, make, etc.). The simulator runs confirmed the correct behaviour. (Perhaps there would have been some more faithful simulation tool, but the Arduino IDE didn't seem to offer it, and I wasn't inclined to go further down that kind of path.)
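In Rust, the same simulator-harness idea can be expressed without #include tricks: put the handful of hardware calls behind a trait, and run the program logic against a simulated implementation that logs what happens. A minimal sketch, assuming a program that only needs an LED and a millisecond delay; the names `Board`, `blink` and `SimBoard` are hypothetical, not taken from the post.

```rust
use std::fmt::Write;

/// The few hardware operations the program needs (hypothetical set).
pub trait Board {
    fn set_led(&mut self, on: bool);
    fn delay_ms(&mut self, ms: u16);
}

/// Program logic written against the trait, so the same code can run on
/// the microcontroller or in a simulator on a desktop machine.
pub fn blink<B: Board>(board: &mut B, times: u8, period_ms: u16) {
    for _ in 0..times {
        board.set_led(true);
        board.delay_ms(period_ms / 2);
        board.set_led(false);
        board.delay_ms(period_ms / 2);
    }
}

/// Simulated board: advances a virtual clock and logs each LED change.
#[derive(Default)]
pub struct SimBoard {
    pub now_ms: u32,
    pub log: String,
}

impl Board for SimBoard {
    fn set_led(&mut self, on: bool) {
        let state = if on { "on" } else { "off" };
        writeln!(self.log, "{} led {}", self.now_ms, state).unwrap();
    }
    fn delay_ms(&mut self, ms: u16) {
        // Simulated time only: nothing actually sleeps.
        self.now_ms += u32::from(ms);
    }
}
```

For example, `blink(&mut sim, 2, 1000)` on a fresh `SimBoard` produces lines like "0 led on" and "500 led off". A fixed, line-oriented log format like this is what makes it easy to diff the output of two simulators (say, a C one and a Rust one) against each other.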
So I got the video laptop out, and used the Arduino IDE to flash the program. It didn't run properly. It hung almost immediately. Some very ad-hoc debugging via LED-blinking (like printf debugging, only much worse) convinced me that my problem was as follows: Arduino C has 16-bit ints. My test harness was on my 64-bit Linux machine. C was autoconverting things (when building for the microcontroller). The way the Arduino IDE ran the compiler didn't pass the warning options necessary to spot narrowing implicit conversions. Those warnings aren't the default in C in general ~~because C compilers hate us all~~ for compatibility reasons. I don't know why those warnings are not the default in the Arduino IDE, but my guess is that they didn't want to bother poor novice programmers with messages from the compiler explaining how their program is quite possibly wrong. After all, users don't like error messages, so we shouldn't report errors. And novice programmers are especially fazed by error messages, so it's better to just let them struggle by themselves with the arcane mysteries of undefined behaviour in C? The Arduino IDE does offer a dropdown for "compiler warnings". The default is None. Setting it to All didn't produce anything about my integer overflow bugs. And the output was very hard to find anyway, because the log window has a constant stream of strange messages from javax.jmdns, with hex DNS packet dumps. WTF.

Other things that were vexing about the Arduino IDE: it has fairly fixed notions (which don't seem to be documented) about how your files and directories ought to be laid out, and magical machinery for finding things you put "nearby" its "sketch" (as it calls them) and sticking them in its ear, causing lossage. It has a tendency to become confused if you edit files under its feet (e.g. with git checkout). It wasn't really very suited to a workflow where principal development occurs elsewhere.
And important settings such as the project's clock speed, or even the target board, or the compiler warning settings to use, weren't stored in the project directory along with the actual code. I didn't look too hard, but I presume they must be in a dotfile somewhere. This is madness. Apparently there is an Arduino CLI too. But I was already quite exasperated, and I didn't like the idea of going so far off the beaten path, when the whole point of using all this was to stay with popular tooling and share fate with others. (How do these others cope? I have no idea.) As for the integer overflow bug: I didn't seriously consider trying to figure out how to control in detail the C compiler options passed by the Arduino IDE. (Perhaps this is possible, but not really documented?) I did consider trying to run a cross-compiler myself from the command line, with appropriate warning options, but that would have involved providing (or stubbing, again) the Arduino/DigiSpark libraries (and bugs could easily lurk at that interface). Instead, I thought, "if only I had written the thing in Rust". But that wasn't possible, was it? Does Rust even support this board?

Rust on the DigiSpark

I did a cursory web search and found a very useful blog post by Dylan Garrett. This encouraged me to think it might be a workable strategy. I looked at the instructions there. It seemed like I could run them via the privsep arrangement I use to protect myself when developing using upstream cargo packages from crates.io. I got surprisingly far surprisingly quickly. It did, rather startlingly, cause my rustup to download a random recent Nightly Rust, but I have six of those already for other Reasons. Very quickly I got the trinket LED blink example, referenced by Dylan's blog post, to compile. Manually copying the file to the video laptop allowed me to run the previously-downloaded micronucleus executable and successfully run the blink example on my board!
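The trap behind the integer overflow bug: on AVR, C's int is 16 bits, so arithmetic that is harmless on a 64-bit desktop can overflow on the board, and C converts silently between integer widths. Rust has no implicit integer conversions, so the width and the overflow policy must be spelled out. A minimal sketch; the post doesn't show the actual buggy expression, so the seconds-to-milliseconds example and all function names here are hypothetical.

```rust
use std::convert::TryFrom;

// In AVR C, with 16-bit `int`, `int ms = secs * 1000;` is undefined behaviour
// for secs > 32, since 33 * 1000 exceeds INT16_MAX (32767).

/// Option 1: stay in 16 bits, but make overflow an explicit, checked case.
pub fn secs_to_ms_checked(secs: u16) -> Option<u16> {
    secs.checked_mul(1000) // None instead of silent wraparound
}

/// Option 2: widen before multiplying; u16::MAX * 1000 fits easily in u32.
pub fn secs_to_ms_wide(secs: u16) -> u32 {
    // Note: `let ms: u16 = u32::from(secs) * 1000;` would not compile:
    // Rust refuses the implicit narrowing that C performs silently.
    u32::from(secs) * 1000
}

/// Narrowing must be requested explicitly, and it can fail.
pub fn narrow_to_i16(x: i32) -> Option<i16> {
    i16::try_from(x).ok()
}
```

Which option is right depends on the program; the point is that the compiler forces the choice to be made, rather than letting a desktop simulation and the real chip quietly disagree.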
I thought a more principled approach to the bootloader client might allow a more convenient workflow. I found the upstream Micronucleus git releases and tags, and had a look over its source code, release dates, etc. It seemed plausible, so I compiled v2.6 from source. That was a success: now I could build and install a Rust program onto my board, from the command line, on my main machine. No more pratting about with the video laptop. I had got further, more quickly, with Rust than with the Arduino IDE, and the outcome and workflow were superior. So, basking in my success, I copied the directory containing the example into my own project, renamed it, and adjusted the path references. That didn't work. Now it didn't build. Even after I copied over .cargo/config.toml and rust-toolchain.toml it didn't build, producing a variety of exciting messages, depending on what precisely I tried. I don't have detailed logs of my flailing: the instructions say to build it by cd'ing to the subdirectory, and, given that what I was trying to do was to not follow those instructions, it didn't seem sensible to try to prepare a proper repro so I could file a ticket. I wasn't optimistic about investigating it more deeply myself: I have some experience of fighting cargo, and it's not usually fun. Looking at some of the build control files, things seemed quite complicated. Additionally, not all of the crates are on crates.io. I have no idea why not. So I would need to supply local copies of them anyway. I decided to just git subtree add the avr-hal git tree. (That seemed better than the approach taken by the avr-hal project's cargo template, since that template involves a cargo dependency on a foreign git repository. Perhaps it would be possible to turn them into path dependencies, but given that I had evidence of file-location-sensitive behaviour, which I didn't feel like spending time investigating, that seemed likely to invite more trouble.
Also, I don't like package templates very much. They're a form of clone-and-hack: you end up stuck with whatever bugs or oddities exist in the version of the template which was current when you started.) Since I couldn't get things to build outside avr-hal, I edited the example, within avr-hal, to refer to my (one) program.rs file outside avr-hal, with a #[path] attribute. That's not pretty, but it worked. I also had to write a nasty shell script to work around the lack of good support in my nailing-cargo privsep tool for builds where cargo must be invoked in a deep subdirectory, and/or Cargo.lock isn't where it expects, and/or the target directory containing build products is in a weird place. It also has to filter the output from cargo to adjust the pathnames in the error messages. Otherwise, running both "cd A; cargo build" and "cd B; cargo build" from a Makefile produces confusing sets of error messages, some of which contain filenames relative to A and some relative to B, making it impossible for my Emacs to reliably find the right file.

RIIR (Rewrite It In Rust)

Having got my build tooling sorted out, I could go back to my actual program. I translated the main program, and the simulator, from C to Rust, more or less line-by-line. I made the Rust version of the simulator produce the same output format as the C one. That let me check that the two programs had the same (simulated) behaviour. Which they did (after fixing a few glitches in the simulator log formatting). Emboldened, I flashed the Rust version of my program to the DigiSpark. It worked right away! RIIR had caused the bug to vanish. Of course, to rewrite the program in Rust, and get it to compile, it was necessary to be careful about the types of all the various integers, so that's not so surprising. Indeed, it was the point. I was then able to refactor the program to be a bit more natural and DRY, and improve some internal interfaces.
Rust's greater power, compared to C, made those cleanups easier, and so worthwhile. However, when doing real-world testing I found a weird problem: my timings were off. Measured, the real program was too fast by a factor of slightly more than 2. A bit of searching (and searching my memory) revealed the cause: I was using a board template for an Adafruit Trinket. The Trinket has a clock speed of 8MHz. But the DigiSpark runs at 16.5MHz. (This is discussed in a ticket against one of the C/C++ libraries supporting the ATTiny85 chip.) The Arduino IDE had offered me a choice of clock speeds. I have no idea how that dropdown menu took effect; I suspect it was adding prelude code to adjust the clock prescaler. But my attempts to mess with the CPU clock prescaler register by hand at the start of my Rust program didn't bear fruit. So instead, I adopted a bodge: since my code has (for code structure reasons, amongst others) only one place where it deals with the underlying hardware's notion of time, I simply changed my delay function to adjust the passed-in delay values, compensating for the wrong clock speed. There was probably a more principled way. For example, I could have (re)based my work on either of the two unmerged open MRs which added proper support for the DigiSpark board, rather than abusing the Adafruit Trinket definition. But, having a nearly-working setup, and an explanation for the behaviour, I preferred the narrower fix to reopening any cans of worms.

An offer of help

As will be obvious from this posting, I'm not an expert in dev tools for embedded systems. Far from it. This area seems like quite a deep swamp, and I'm probably not the person to help drain it. (Frankly, much of the improvement work ought to be done, and paid for, by hardware vendors.) But, as a full Member of the Debian Project, I have considerable gatekeeping authority there. I also have much experience of software packaging, build systems, and release management.
If anyone wants to try to improve the situation with embedded tooling in Debian, and is willing to do the actual packaging work, I would be happy to advise, and to review and sponsor your contributions. An obvious candidate: it seems to me that micronucleus could easily be in Debian. Possibly a DigiSpark board definition could be provided to go with the arduino package. Unfortunately, IMO Debian's Rust packaging tooling and workflows are very poor, and the first of my suggestions for improvement wasn't well received. So if you need help with improving Rust packages in Debian, please talk to the Debian Rust Team yourself.

Conclusions

Embedded programming is still rather a mess and probably always will be. Embedded build systems can be bizarre. Documentation is scant. You're often expected to download "board support packages" full of mystery binaries, from the board vendor (or others). Dev tooling is maddening, especially if aimed at novice programmers. You want version control? Hermetic tracking of your project's build and install configuration? Actually to be told by the compiler when you write obvious bugs? You're way off the beaten track. As ever, Free Software is under-resourced and the maintainers are often busy, or (reasonably) have other things to do with their lives.

All is not lost

Rust can be a significantly better bet than C for embedded software: the Rust compiler will catch a good proportion of programming errors, and an experienced Rust programmer can arrange (by suitable internal architecture) to catch nearly all of them. When writing for a chip in the middle of some circuit, where debugging involves staring at an LED or a multimeter, that's precisely what you want. Rust embedded dev tooling was, in this case, considerably better. Still quite chaotic and strange, and less mature, perhaps. But: significantly fewer mystery downloads, and significantly less crazy deviations from the language's normal build system.
Overall, less bad software supply chain integrity. The ATTiny85 chip, and the DigiSpark board, served my hardware needs very well. (More about the hardware aspects of this project in a future posting.)
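To make the clock-speed bodge described above concrete: the Trinket definition assumes 8 MHz, while the DigiSpark runs at 16.5 MHz, a ratio of 16.5 / 8 = 2.0625, which matches the "slightly more than 2" measured error. A delay function can compensate by scaling each requested delay by that ratio before handing it to the HAL. A minimal sketch, assuming a millisecond-based delay; `compensate_delay_ms` and the constant names are hypothetical, not the author's code.

```rust
/// Clock the HAL was told about (the Adafruit Trinket board definition).
pub const ASSUMED_HZ: u64 = 8_000_000;
/// Clock the DigiSpark actually runs at.
pub const ACTUAL_HZ: u64 = 16_500_000;

/// Convert a wanted real-time delay into the value to pass to a HAL delay
/// function that believes the CPU runs at ASSUMED_HZ.
pub fn compensate_delay_ms(wanted_ms: u32) -> u32 {
    // The HAL busy-waits for a cycle count computed for ASSUMED_HZ, but the
    // chip executes those cycles ACTUAL_HZ / ASSUMED_HZ (= 33/16) times
    // faster, so ask for proportionally more. u64 avoids overflow here.
    let scaled = u64::from(wanted_ms) * ACTUAL_HZ / ASSUMED_HZ;
    // Saturate rather than wrap if the result does not fit in u32.
    scaled.min(u64::from(u32::MAX)) as u32
}
```

On an 8-bit AVR the 64-bit arithmetic is comparatively expensive; multiplying by the reduced ratio 33/16 in a narrower type (with an overflow check) would be a cheaper variant of the same idea.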

