Search Results: "nore"

15 January 2025

Dirk Eddelbuettel: RcppFastFloat 0.0.5 on CRAN: New Upstream, Updates

A new minor release of RcppFastFloat just arrived on CRAN. The package wraps fast_float, another nice library by Daniel Lemire. For details, see the arXiv preprint or published paper showing that one can convert character representations of numbers into floating point at rates at or exceeding one gigabyte per second. This release updates the underlying fast_float library version to the current version 7.0.0, and updates a few packaging aspects.

Changes in version 0.0.5 (2025-01-15)
  • No longer set a compilation standard
  • Updates to continuous integration, badges, URLs, DESCRIPTION
  • Update to fast_float 7.0.0
  • Per CRAN Policy comment-out compiler 'diagnostic ignore' instances

Courtesy of my CRANberries, there is also a diffstat report for this release. For questions, suggestions, or issues please use the issue tracker at the GitHub repo.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. If you like this or other open-source work I do, you can now sponsor me at GitHub.

7 January 2025

Jonathan Wiltshire: Using TPM for Automatic Disk Decryption in Debian 12

These days it's straightforward to have reasonably secure, automatic decryption of your root filesystem at boot time on Debian 12. Here's how I did it on an existing system which already had a stock kernel, secure boot enabled, grub2 and an encrypted root filesystem with the passphrase in key slot 0. There's no need to switch to systemd-boot for this setup but you will use systemd-cryptenroll to manage the TPM-sealed key. If that offends you, there are other ways of doing this.

Caveat The parameters I'll seal a key against in the TPM include a hash of the initial ramdisk. This is essential to prevent an attacker from swapping the image for one which discloses the key. However, it also means the key has to be re-sealed every time the image is rebuilt. This can be frequent, for example when installing/upgrading/removing packages which include a kernel module. You won't get locked out (as long as you still have a passphrase in another slot), but will need to re-seal the key to restore the automation. You can also choose not to include this parameter for the seal, but that opens the door to such an attack.

Caution: these are the steps I took on my own system. You may need to adjust them to avoid ending up with a non-booting system.

Check for a usable TPM device We'll bind the secure boot state, kernel parameters, and other boot measurements to a decryption key. Then, we'll seal it using the TPM. This prevents the disk from being moved to another system, the boot chain from being tampered with, and various other attacks.
# apt install tpm2-tools
# systemd-cryptenroll --tpm2-device list
PATH        DEVICE     DRIVER 
/dev/tpmrm0 STM0125:00 tpm_tis

Clean up older kernels including leftover configurations I found that previously-removed (but not purged) kernel packages sometimes cause dracut to try installing files to the wrong paths. Identify them with:
# apt install aptitude
# aptitude search '~c'
Change search to purge, or be more selective; this part is an exercise for the reader.
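As a rough sketch, the blunt version is to purge everything that search matched; review the list first, since '~c' selects every package that was removed but still has configuration files left behind:
# aptitude purge '~c'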

Switch to dracut for initramfs images Unless you have a particular requirement for the default initramfs-tools, replace it with dracut and customise:
# mkdir /etc/dracut.conf.d
# echo 'add_dracutmodules+=" tpm2-tss crypt "' > /etc/dracut.conf.d/crypt.conf
# apt install dracut

Remove root device from crypttab, configure grub Remove (or comment) the root device from /etc/crypttab and rebuild the initial ramdisk with dracut -f. Edit /etc/default/grub and add rd.auto rd.luks=1 to GRUB_CMDLINE_LINUX. Re-generate the config with update-grub. At this point it's a good idea to sanity-check the initrd contents with lsinitrd. Then, reboot using the new image to ensure there are no issues. This will also have up-to-date TPM measurements ready for the next step.
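Condensed into commands, that looks roughly like the following; the grep is just one way to spot-check that the tpm2 and crypt dracut modules made it into the image:
# vi /etc/crypttab                  # comment out the root device entry
# dracut -f                         # rebuild the initial ramdisk
# vi /etc/default/grub              # append rd.auto rd.luks=1 to GRUB_CMDLINE_LINUX
# update-grub
# lsinitrd | grep -E 'tpm2|crypt'   # sanity-check the new initrd contents
# reboot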

Identify device and seal a decryption key
# lsblk -ip -o NAME,TYPE,MOUNTPOINTS
NAME                                                    TYPE  MOUNTPOINTS
/dev/nvme0n1p4                                          part  /boot
/dev/nvme0n1p5                                          part  
`-/dev/mapper/luks-deff56a9-8f00-4337-b34a-0dcda772e326 crypt 
  |-/dev/mapper/lv-var                                  lvm   /var
  |-/dev/mapper/lv-root                                 lvm   /
  `-/dev/mapper/lv-home                                 lvm   /home
In this example my root filesystem is in a container on /dev/nvme0n1p5. The existing passphrase key is in slot 0.
# systemd-cryptenroll --tpm2-device=auto --tpm2-pcrs=7+8+9+14 /dev/nvme0n1p5
Please enter current passphrase for disk /dev/nvme0n1p5: ********
New TPM2 token enrolled as key slot 1.
The PCRs I chose (7, 8, 9 and 14) correspond to the secure boot policy, kernel command line (to prevent init=/bin/bash-style attacks), files read by grub including that crucial initrd measurement, and secure boot MOK certificates and hashes. You could also include PCR 5 for the partition table state, and any others appropriate for your setup.
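If the initial ramdisk is rebuilt later and the measurements change (the caveat above), the TPM slot needs re-sealing. A minimal sketch, re-using the same device and PCR set as above, wipes the old TPM2 enrollment and enrolls again in one step; you will be asked for the passphrase from the remaining key slot:
# systemd-cryptenroll --wipe-slot=tpm2 --tpm2-device=auto --tpm2-pcrs=7+8+9+14 /dev/nvme0n1p5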

Reboot You should now be able to reboot and the root device will be unlocked automatically, provided the secure boot measurements remain consistent. The key slot protected by a passphrase (mine is slot 0) is now your recovery key. Do not remove it!
Please consider supporting my work in Debian and elsewhere through Liberapay.

31 December 2024

Scarlett Gately Moore: KDE: Application snaps 24.12.0 release and more

https://kde.org/announcements/gear/24.12.0 I hope everyone had a wonderful holiday! Your present from me is shiny new application snaps! There are several new qt6 ports in this release. Please visit https://snapcraft.io/store?q=kde I have also fixed the Krita snap unable to open/save bug. Please test edge! I am continuing work on core24 support and hope to be done before next release. I do look forward to 2025! Begone 2024! If you can help with gas, I still have 3 weeks of treatments to go. Thank you for your continued support. https://gofund.me/573cc38e

12 December 2024

Matthew Garrett: When should we require that firmware be free?

The distinction between hardware and software has historically been relatively easy to understand - hardware is the physical object that software runs on. This is made more complicated by the existence of programmable logic like FPGAs, but by and large things tend to fall into fairly neat categories if we're drawing that distinction.

Conversations usually become more complicated when we introduce firmware, but should they? According to Wikipedia, Firmware is software that provides low-level control of computing device hardware, and basically anything that's generally described as firmware certainly fits into the "software" side of the above hardware/software binary. From a software freedom perspective, this seems like something where the obvious answer to "Should this be free" is "yes", but it's worth thinking about why the answer is yes - the goal of free software isn't freedom for freedom's sake, but because the freedoms embodied in the Free Software Definition (and by proxy the DFSG) are grounded in real world practicalities.

How do these line up for firmware? Firmware can fit into two main classes - it can be something that's responsible for initialisation of the hardware (such as, historically, BIOS, which is involved in initialisation and boot and then largely irrelevant for runtime[1]) or it can be something that makes the hardware work at runtime (wifi card firmware being an obvious example). The role of free software in the latter case feels fairly intuitive, since the interface and functionality the hardware offers to the operating system is frequently largely defined by the firmware running on it. Your wifi chipset is, these days, largely a software defined radio, and what you can do with it is determined by what the firmware it's running allows you to do. Sometimes those restrictions may be required by law, but other times they're simply because the people writing the firmware aren't interested in supporting a feature - they may see no reason to allow raw radio packets to be provided to the OS, for instance. We also shouldn't ignore the fact that sufficiently complicated firmware exposed to untrusted input (as is the case in most wifi scenarios) may contain exploitable vulnerabilities allowing attackers to gain arbitrary code execution on the wifi chipset - and potentially use that as a way to gain control of the host OS (see this writeup for an example). Vendors being in a unique position to update that firmware means users may never receive security updates, leaving them with a choice between discarding hardware that otherwise works perfectly or leaving themselves vulnerable to known security issues.

But even the cases where firmware does nothing other than initialise the hardware cause problems. A lot of hardware has functionality controlled by registers that can be locked during the boot process. Vendor firmware may choose to disable (or, rather, never to enable) functionality that may be beneficial to a user, and then lock out the ability to reconfigure the hardware later. Without any ability to modify that firmware, the user lacks the freedom to choose what functionality their hardware makes available to them. Again, the ability to inspect this firmware and modify it has a distinct benefit to the user.

So, from a practical perspective, I think there's a strong argument that users would benefit from most (if not all) firmware being free software, and I don't think that's an especially controversial argument. So I think this is less of a philosophical discussion, and more of a strategic one - is spending time focused on ensuring firmware is free worthwhile, and if so what's an appropriate way of achieving this?

I think there are two consistent ways to view this. One is to view free firmware as desirable but not necessary. This approach basically argues that code that's running on hardware that isn't the main CPU would benefit from being free, in the same way that code running on a remote network service would benefit from being free, but that this is much less important than ensuring that all the code running in the context of the OS on the primary CPU is free. The other is the maximalist position: not to compromise at all - all software on a system, whether it's running at boot or during runtime, and whether it's running on the primary CPU or any other component on the board, should be free.

Personally, I lean towards the former and think there's a reasonably coherent argument here. I think users would benefit from the ability to modify the code running on hardware that their OS talks to, in the same way that I think users would benefit from the ability to modify the code running on hardware the other side of a network link that their browser talks to. I also think that there's enough that remains to be done in terms of what's running on the host CPU that it's not worth having that fight yet. But I think the latter is absolutely intellectually consistent, and while I don't agree with it from a pragmatic perspective I think things would undeniably be better if we lived in that world.

This feels like a thing you'd expect the Free Software Foundation to have opinions on, and it does! There are two primarily relevant things - the Respects your Freedoms campaign focused on ensuring that certified hardware meets certain requirements (including around firmware), and the Free System Distribution Guidelines, which define a baseline for an OS to be considered free by the FSF (including requirements around firmware).

RYF requires that all software on a piece of hardware be free other than under one specific set of circumstances. If software runs on (a) a secondary processor and (b) within which software installation is not intended after the user obtains the product, then the software does not need to be free. (b) effectively means that the firmware has to be in ROM, since any runtime interface that allows the firmware to be loaded or updated is intended to allow software installation after the user obtains the product.

The Free System Distribution Guidelines require that all non-free firmware be removed from the OS before it can be considered free. The recommended mechanism to achieve this is via linux-libre, a project that produces tooling to remove anything that looks plausibly like a non-free firmware blob from the Linux source code, along with any incitement to the user to load firmware - including even removing suggestions to update CPU microcode in order to mitigate CPU vulnerabilities.

For hardware that requires non-free firmware to be loaded at runtime in order to work, linux-libre doesn't do anything to work around this - the hardware will simply not work. In this respect, linux-libre reduces the amount of non-free firmware running on a system in the same way that removing the hardware would. This presumably encourages users to purchase RYF compliant hardware.

But does that actually improve things? RYF doesn't require that a piece of hardware have no non-free firmware, it simply requires that any non-free firmware be hidden from the user. CPU microcode is an instructive example here. At the time of writing, every laptop listed here has an Intel CPU. Every Intel CPU has microcode in ROM, typically an early revision that is known to have many bugs. The expectation is that this microcode is updated in the field by either the firmware or the OS at boot time - the updated version is loaded into RAM on the CPU, and vanishes if power is cut. The combination of RYF and linux-libre doesn't reduce the amount of non-free code running inside the CPU, it just means that the user (a) is more likely to hit since-fixed bugs (including security ones!), and (b) has less guidance on how to avoid them.

As long as RYF permits hardware that makes use of non-free firmware I think it hurts more than it helps. In many cases users aren't guided away from non-free firmware - instead it's hidden away from them, leaving them less aware that their freedom is constrained. Linux-libre goes further, refusing to even inform the user that the non-free firmware that their hardware depends on can be upgraded to improve their security.

Out of sight shouldn't mean out of mind. If non-free firmware is a threat to user freedom then allowing it to exist in ROM doesn't do anything to solve that problem. And if it isn't a threat to user freedom, then what's the point of requiring linux-libre for a Linux distribution to be considered free by the FSF? We seem to have ended up in the worst case scenario, where nothing is being done to actually replace any of the non-free firmware running on people's systems and where users may even end up with a reduced awareness that the non-free firmware even exists.

[1] Yes yes SMM


11 December 2024

Divine Attah-Ohiemi: From Sisterly Wisdom to Debian Dreams: My Outreachy Journey

Discovering Open Source: How I Got Introduced
Hey there! I'm Divine Attah-Ohiemi, a sophomore studying Computer Science. My journey into the world of open source was anything but grand. It all started with a simple question to my sister: How do people get jobs without experience? Her answer? Open source! I dove into this vibrant community, and it felt like discovering a hidden treasure chest filled with knowledge and opportunities.

Choosing Debian: Why This Community?
Why Debian, you ask? Well, I applied to Outreachy twice, and both times, I chose Debian. It's not just my first operating system; it feels like home. The Debian community is incredibly welcoming, like a big family gathering where everyone supports each other. Whether I was updating my distro or poring over documentation, the care and consideration in this community were palpable. It reminded me of the warmth of homeschooling with relatives. Plus, knowing that Debian's name comes from its creator Ian and his wife Debra adds a personal touch that makes me feel even more honored to contribute to making the website better!

Why I Applied to Outreachy: What Inspired Me
Outreachy is my golden ticket to the open source world! As a 19-year-old, I see this internship as a unique opportunity to gain invaluable experience while contributing to something meaningful. It's the perfect platform for me to learn, grow, and connect with like-minded individuals who share my passion for technology and community. I'm excited for this journey and can't wait to see where it takes me!

9 December 2024

Paul Wise: FLOSS Activities November 2024

Focus This month I didn't have any particular focus. I just worked on issues in my info bubble.

Changes

Issues

Review

Communication
  • Respond to queries from Debian users and contributors on IRC

Sponsors The SWH work was sponsored. All other work was done on a volunteer basis.

6 December 2024

Bálint Réczey: Firebuild 0.8.3 is out with 100+ fixes and experimental macOS support!

The new Firebuild release contains plenty of small fixes and a few notable improvements.

Experimental macOS support The most frequently asked question from people getting to know Firebuild was whether it worked on their Mac, and the answer sadly used to be that, well, it did, but only in a Linux VM. This was far from what they were looking for. Linux and macOS have common UNIX roots, but porting Firebuild to macOS included bigger challenges, like ensuring that dyld(1), macOS's dynamic loader, initializes the preloaded interceptor library early enough to catch all interesting calls, and avoiding the use of anything that relies on malloc() or thread-local variables, which are not yet set up at that point. Preloading libraries on Linux is really easy: running LD_PRELOAD=my_lib.so ls just works if the library exports the symbols to be interposed, while macOS employs multiple lines of defense to prevent applications from using unknown libraries. Firebuild's guide for making DYLD_INSERT_LIBRARIES honored on Macs can be helpful for other projects that rely on injecting libraries as well. Since GitHub's Arm64 macOS runners don't allow intercepting binaries with the arm64e ABI yet, Firebuild's Apple Silicon tests are run at Bitrise, who are proud to be first to provide the latest Xcode stacks and were also quick to make the needed changes to their infrastructure to support Firebuild (thanks!). Firebuild on macOS can already accelerate simple projects and rebuild itself with Xcode. Since Xcode introduces a lot of nondeterminism to the build, Firebuild can't shine in acceleration with Xcode yet, but it can provide nice reports to show which part of the build is the most time consuming and how each sub-command is called. If you would like to try Firebuild on macOS, please compile it from the GitHub repository for now. Precompiled binaries will be distributed on the Mac App Store and via CI providers. Contact us to get notified when those channels become available.

Dealing with the Epochalypse Glibc's API provides many functions with time parameters, and some of those functions are intercepted by Firebuild. Time parameters used to be passed as 32-bit values on 32-bit systems, preventing them from accurately representing timestamps after the year 2038, which is known as the Y2038 problem or the Epochalypse. To deal with the problem, glibc 2.34 started providing new function symbol variants with 64-bit time parameters, e.g. clock_gettime64() in addition to clock_gettime(). The new 64-bit variants are used when compiling consumers of the API with _TIME_BITS=64 defined. Processes intercepted by Firebuild may have been compiled with or without _TIME_BITS=64, thus libfirebuild now provides both variants on affected systems running glibc >= 2.34 to work safely with binaries using 64-bit and 32-bit time representation. Many Linux distributions have already stopped supporting 32-bit architectures, but Debian and Ubuntu still support armhf, for example, where the Y2038 problem still applies. Both Debian and Ubuntu performed a transition rebuilding every library (and their reverse dependencies) with -D_FILE_OFFSET_BITS=64 set where the libraries exported symbols that changed when switching to 64-bit time representation (thanks to Steve Langasek for driving this!). Thanks to the transition most programs are ready for 2038, but interposer libraries are trickier to fix, and if you maintain one it might be a good idea to check if it works well with both 32-bit and 64-bit libraries. Faketime, for example, is not fixed yet; see #1064555.
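If you maintain an interposer library, one quick way to see which time variants a given 32-bit binary actually references (and therefore which symbols you need to provide) is to inspect its dynamic symbol table; a rough sketch, with a placeholder path:
objdump -T /path/to/32bit-binary-or-library | grep clock_gettime
# references to __clock_gettime64 indicate a build with _TIME_BITS=64;
# plain clock_gettime means the binary still uses 32-bit time_t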

Select passed through environment variables with regular expressions Firebuild filters out most of the environment variables set when starting a build to make the build more reproducible and achieve higher cache hit rate. Extra environment variables to pass through can be specified on the command line one by one, but with many similarly named variables this may become hard to maintain. With regular expressions this just became easier:
firebuild -o 'env_vars.pass_through += "MY_VARS_.*"' my_build_command
If you are not interested in acceleration just would like to explore what the build does by generating a report you can simply pass all variables:
firebuild -r -o 'env_vars.pass_through += ".*"' my_build_command

Other highlights from the 0.8.3 release
  • Fixed and nicer report in Chrome and other WebKit based browsers
  • Support GLibc 2.39 by intercepting pidfd_spawn() and pidfd_spawnp()
  • Even faster Rust build acceleration
For all the changes please check out the release page on GitHub!  (This post is also published on The Firebuild blog.)

2 December 2024

Dirk Eddelbuettel: anytime 0.3.10 on CRAN: Multiple Enhancements

A new release of the anytime package arrived on CRAN today, the first in well over four years. The package is fairly feature-complete, and code and functionality remain mature and stable, of course. anytime is a very focused package aiming to do just one thing really well: to convert anything in integer, numeric, character, factor, or ordered input format to either POSIXct (when called as anytime) or Date objects (when called as anydate), and to do so without requiring a format string, as well as accommodating different formats in one input vector. See the anytime page, or the GitHub repo for a few examples, and the beautiful documentation site for all documentation. This release slowly matured over four years. It combines a number of strictly internal repository maintenance changes, such as changes to continuous integration, and small enhancements (adding for example some new formats, responding better to an error condition, dealing with logical input as an error) with a relaxation of the C++ compilation standard. While we once needed C++11, that constraint can now be relaxed as R itself is quite proactive (the last two releases already defaulted to C++17, suitable compiler permitting). The documentation site is new, as are some other small changes. See the full list of changes which follows.

Changes in anytime version 0.3.10 (2024-12-02)
  • A new documentation site was added.
  • Continuous Integration now uses run.sh from r-ci with bspm
  • Logical input vectors are now recognised as an error (#121)
  • Additional dot-separated format '%Y.%m.%d' is supported
  • Other small updates were made throughout the package
  • No longer set a C++ compilation standard as the default choices by R are sufficient for the package
  • Switch Rcpp include file to Rcpp/Lightest
  • We recommend ~/.R/Makevars compiler flag options -Wno-ignored-attributes -Wno-nonnull -Wno-parentheses
  • The tinytest runner was simplified
  • NA values from conversion now trigger a warning

Courtesy of my CRANberries, there is also a diffstat report of changes relative to the previous release. The issue tracker off the GitHub repo can be used for questions and comments. More information about the package is at the package page, the GitHub repo and the documentation site. If you like this or other open-source work I do, you can now sponsor me at GitHub.

This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.

30 November 2024

Dima Kogan: Strava track filtering validation

After years of seeing people's strava tracks, I became convinced that they insufficiently filter the data, resulting in over-estimating the effort. Today I did a bit of lazy analysis, and half-confirmed this: in the one case I looked at, strava reported reasonable elevation gain numbers, but greatly overestimated the distance traveled. I looked at a single gps track of a long bike ride. This was uploaded to strava manually, as a .gpx file. I can imagine that different things happen if you use the strava app or some device that integrates with the service (the filtering might happen before the data hits the server, and the server could decide to not apply any more filtering). I processed the data with a simple hysteretic filter, ignoring small changes in position and elevation, trying out different thresholds in the process. I completely ignore the timestamps, and only look at the differences between successive points. This handles the usual GPS noise; it does not handle GPS jumps, which I completely ignore in this analysis. Ignoring these would produce inflated elevation/gain numbers, but I'm working with a looong track, so hopefully this is a small effect. Clearly this is not scientific, but it's something.

The code
Parsing .gpx is slow (this is a big file), so I cache that into a .vnl:
import sys
import gpxpy
filename_in  = 'INPUT.gpx'
filename_out = 'OUTPUT.vnl'
with open(filename_in, 'r') as f:
    gpx = gpxpy.parse(f)
f_out = open(filename_out, 'w')
tracks = gpx.tracks
if len(tracks) != 1:
    print("I want just one track", file=sys.stderr)
    sys.exit(1)
track = tracks[0]
segments = track.segments
if len(segments) != 1:
    print("I want just one segment", file=sys.stderr)
    sys.exit(1)
segment = segments[0]
time0 = segment.points[0].time
print("# time lat lon ele_m", file = f_out)
for p in segment.points:
    print(f"{(p.time - time0).seconds} {p.latitude} {p.longitude} {p.elevation}",
          file = f_out)
And I process this data with the different filters (this is a silly Python loop, and is slow):
#!/usr/bin/python3
import sys
import numpy as np
import numpysane as nps
import gnuplotlib as gp
import vnlog
import pyproj
geod = None
def dist_ft(lat0,lon0, lat1,lon1):
    global geod
    if geod is None:
        geod = pyproj.Geod(ellps='WGS84')
    return \
        geod.inv(lon0,lat0, lon1,lat1)[2] * 100./2.54/12.
f = 'OUTPUT.vnl'
track,list_keys,dict_key_index = \
    vnlog.slurp(f)
t      = track[:,dict_key_index['time' ]]
lat    = track[:,dict_key_index['lat'  ]]
lon    = track[:,dict_key_index['lon'  ]]
ele_ft = track[:,dict_key_index['ele_m']] * 100./2.54/12.
@nps.broadcast_define( ( (), ()),
                       (2,))
def filter_track(ele_hysteresis_ft,
                 dxy_hysteresis_ft):
    dist        = 0.0
    ele_gain_ft = 0.0
    lon_accepted = None
    lat_accepted = None
    ele_accepted = None
    for i in range(len(lat)):
        if ele_accepted is not None:
            dxy_here  = dist_ft(lat_accepted,lon_accepted, lat[i],lon[i])
            dele_here = np.abs( ele_ft[i] - ele_accepted )
            if dxy_here < dxy_hysteresis_ft and dele_here < ele_hysteresis_ft:
                continue
            if ele_ft[i] > ele_accepted:
                ele_gain_ft += dele_here;
            dist += np.sqrt(dele_here * dele_here +
                            dxy_here  * dxy_here)
        lon_accepted = lon[i]
        lat_accepted = lat[i]
        ele_accepted = ele_ft[i]
    # lose the last point. It simply doesn't matter
    dist_mi = dist / 5280.
    return np.array((ele_gain_ft, dist_mi))
Nele_hysteresis_ft    = 20
ele_hysteresis0_ft    = 5
ele_hysteresis1_ft    = 100
ele_hysteresis_ft_all = np.linspace(ele_hysteresis0_ft,
                                    ele_hysteresis1_ft,
                                    Nele_hysteresis_ft)
Ndxy_hysteresis_ft = 20
dxy_hysteresis0_ft = 5
dxy_hysteresis1_ft = 1000
dxy_hysteresis_ft  = np.linspace(dxy_hysteresis0_ft,
                                 dxy_hysteresis1_ft,
                                 Ndxy_hysteresis_ft)
# shape (Nele,Ndxy,2)
gain,distance = \
    nps.mv( filter_track( nps.dummy(ele_hysteresis_ft_all,-1),
                          dxy_hysteresis_ft),
            -1,0 )
# Stolen from mrcal
def options_heatmap_with_contours( plotoptions, # we update this on output
                                   *,
                                   contour_min           = 0,
                                   contour_max,
                                   contour_increment     = None,
                                   do_contours           = True,
                                   contour_labels_styles = 'boxed',
                                   contour_labels_font   = None):
    r'''Update plotoptions, return curveoptions for a contoured heat map'''
    gp.add_plot_option(plotoptions,
                       'set',
                       ('view equal xy',
                        'view map'))
    if do_contours:
        if contour_increment is None:
            # Compute a "nice" contour increment. I pick a round number that gives
            # me a reasonable number of contours
            Nwant = 10
            increment = (contour_max - contour_min)/Nwant
            # I find the nearest 1eX or 2eX or 5eX
            base10_floor = np.power(10., np.floor(np.log10(increment)))
            # Look through the options, and pick the best one
            m   = np.array((1., 2., 5., 10.))
            err = np.abs(m * base10_floor - increment)
            contour_increment = -m[ np.argmin(err) ] * base10_floor
        gp.add_plot_option(plotoptions,
                           'set',
                           ('key box opaque',
                            'style textbox opaque',
                            'contour base',
                            f'cntrparam levels incremental {contour_max},{contour_increment},{contour_min}'))
        if contour_labels_font is not None:
            gp.add_plot_option(plotoptions,
                               'set',
                               f'cntrlabel format "%d" font "{contour_labels_font}"' )
        else:
            gp.add_plot_option(plotoptions,
                               'set',
                               f'cntrlabel format "%.0f"' )
        plotoptions['cbrange'] = [contour_min, contour_max]
        # I plot 3 times:
        # - to make the heat map
        # - to make the contours
        # - to make the contour labels
        _with = np.array(('image',
                          'lines nosurface',
                          f'labels {contour_labels_styles} nosurface'))
    else:
        gp.add_plot_option(plotoptions, 'unset', 'key')
        _with = 'image'
    using = \
        f'({dxy_hysteresis0_ft}+$1*{float(dxy_hysteresis1_ft-dxy_hysteresis0_ft)/(Ndxy_hysteresis_ft-1)}):' + \
        f'({ele_hysteresis0_ft}+$2*{float(ele_hysteresis1_ft-ele_hysteresis0_ft)/(Nele_hysteresis_ft-1)}):3'
    plotoptions['_3d']     = True
    plotoptions['_xrange'] = [dxy_hysteresis0_ft,dxy_hysteresis1_ft]
    plotoptions['_yrange'] = [ele_hysteresis0_ft,ele_hysteresis1_ft]
    plotoptions['ascii']   = True # needed for using to work
    gp.add_plot_option(plotoptions, 'unset', 'grid')
    return \
        dict( tuplesize=3,
              legend = "", # needed to force contour labels
              using = using,
              _with=_with)
contour_granularity = 1000
plotoptions = dict()
curveoptions = \
    options_heatmap_with_contours( plotoptions, # we update this on output
                                   # round down to the nearest contour_granularity
                                   contour_min = (np.min(gain) // contour_granularity)*contour_granularity,
                                   # round up to the nearest contour_granularity
                                   contour_max = ((np.max(gain) + (contour_granularity-1)) // contour_granularity) * contour_granularity,
                                   do_contours = True)
gp.add_plot_option(plotoptions, 'unset', 'key')
gp.add_plot_option(plotoptions, 'set', 'size square')
gp.plot(gain,
        xlabel  = "Distance hysteresis (ft)",
        ylabel  = "Elevation hysteresis (ft)",
        cblabel = "Elevation gain (ft)",
        wait = True,
        **curveoptions,
        **plotoptions,
        title    = 'Computed gain vs filtering parameters')
contour_granularity = 10
plotoptions = dict()
curveoptions = \
    options_heatmap_with_contours( plotoptions, # we update this on output
                                   # round down to the nearest contour_granularity
                                   contour_min = (np.min(distance) // contour_granularity)*contour_granularity,
                                   # round up to the nearest contour_granularity
                                   contour_max = ((np.max(distance) + (contour_granularity-1)) // contour_granularity) * contour_granularity,
                                   do_contours = True)
gp.add_plot_option(plotoptions, 'unset', 'key')
gp.add_plot_option(plotoptions, 'set', 'size square')
gp.plot(distance,
        xlabel  = "Distance hysteresis (ft)",
        ylabel  = "Elevation hysteresis (ft)",
        cblabel = "Distance (miles)",
        wait = True,
        **curveoptions,
        **plotoptions,
        title    = 'Computed distance vs filtering parameters')

Results: gain
Strava says the gain was 46307ft. The analysis says:
strava-gain.png
strava-gain-zoom.png
These show the filtered gain for different values of the distance and gain hysteresis thresholds. The same data is shown at different zoom levels. There's no sweet spot, but we get 46307ft with a reasonable amount of filtering. Maybe 46307ft is a bit low even.

Results: distance
Strava says the distance covered was 322 miles. The analysis says:
strava-distance.png
strava-distance-zoom.png
Once again, there's no sweet spot, but we get 322 miles only if we apply no filtering at all. That's clearly too high, and is not reasonable. From the map (and from other people's strava routes) the true distance is closer to 305 miles. Why those people's strava numbers are more believable is anybody's guess.

29 November 2024

Raju Devidas: Finding all sub domains of a main domain

Problem: Need to know all the sub domains of a main domain, e.g. example.com has a sub domain dev.example.com, and I also want to know the other sub domains. Solution: Install the package called sublist3r, written by Ahmed Aboul-Ela
$ sudo apt install sublist3r
run the command
$ sublist3r -d example.com -o subdomains-example.com.txt
                 [Sublist3r ASCII art banner]
                 # Coded By Ahmed Aboul-Ela - @aboul3la
[-] Enumerating subdomains now for example.com
[-] Searching now in Baidu..
[-] Searching now in Yahoo..
[-] Searching now in Google..
[-] Searching now in Bing..
[-] Searching now in Ask..
[-] Searching now in Netcraft..
[-] Searching now in DNSdumpster..
[-] Searching now in Virustotal..
[-] Searching now in ThreatCrowd..
[-] Searching now in SSL Certificates..
[-] Searching now in PassiveDNS..
Process DNSdumpster-8:
Traceback (most recent call last):
  File "/usr/lib/python3.12/multiprocessing/process.py", line 314, in _bootstrap
    self.run()
  File "/usr/lib/python3/dist-packages/sublist3r.py", line 269, in run
    domain_list = self.enumerate()
                  ^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/sublist3r.py", line 649, in enumerate
    token = self.get_csrftoken(resp)
            ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib/python3/dist-packages/sublist3r.py", line 644, in get_csrftoken
    token = csrf_regex.findall(resp)[0]
            ~~~~~~~~~~~~~~~~~~~~~~~~^^^
IndexError: list index out of range
[!] Error: Google probably now is blocking our requests
[~] Finished now the Google Enumeration ...
[!] Error: Virustotal probably now is blocking our requests
[-] Saving results to file: subdomains-example.com.txt
[-] Total Unique Subdomains Found: 7
AS207960 Test Intermediate - example.com
www.example.com
dev.example.com
m.example.com
products.example.com
support.example.com
m.testexample.com
We can see the subdomains listed at the end of the command output. enjoy, have fun, drink water!

22 November 2024

Matthew Palmer: Invalid Excuses for Why Your Release Process Sucks

In my companion article, I made the bold claim that your release process should consist of no more than two steps:
  1. Create an annotated Git tag;
  2. Run a single command to trigger the release pipeline.
As I have been on the Internet for more than five minutes, I'm aware that a great many people will have a great many objections to this simple and straightforward idea. In the interests of saving them a lot of wear and tear on their keyboards, I present this list of common reasons why these objections are invalid. If you have an objection I don't cover here, the comment box is down the bottom of the article. If you think you've got a real stumper, I'm available for consulting engagements, and if you turn out to have a release process which cannot feasibly be reduced to the above two steps for legitimate technical reasons, I'll waive my fees.

But I automatically generate my release notes from commit messages! This one is really easy to solve: have the release note generation tool feed directly into the annotation. Boom! Headshot.
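A sketch of what that can look like, with generate-release-notes standing in for whatever hypothetical tool you already use:
generate-release-notes > /tmp/notes    # hypothetical tool producing the notes
git tag -a v1.2.3 -F /tmp/notes        # the notes become the tag annotation
git push origin v1.2.3                 # and this kicks off the release pipeline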

But all these files need to be edited to make a release! No, they absolutely don't. But I can see why you might think you do, given how inflexible some packaging environments can seem, and since "that's how we've always done it".

Language Packages Most languages require you to encode the version of the library or binary in a file that you want to revision control. This is teh suck, but I'm yet to encounter a situation that can't be worked around some way or another. In Ruby, for instance, gemspec files are actually executable Ruby code, so I call code (that's part of git-version-bump, as an aside) to calculate the version number from the git tags. The Rust build tool, Cargo, uses a TOML file, which isn't as easy, but a small amount of release automation is used to take care of that.

Distribution Packages If you're building Linux distribution packages, you can easily apply similar automation faffery. For example, Debian packages take their metadata from the debian/changelog file in the build directory. Don't keep that file in revision control, though: build it at release time. Everything you need to construct a Debian (or RPM) changelog is in the tag: version numbers, dates, times, authors, release notes. Use it for much good.
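One possible sketch of building it at release time; the package name, target distribution, and bullet formatting here are assumptions, while the version, notes, and tagger identity come straight from the annotated tag:
VERSION=$(git describe --tags --abbrev=0 | sed 's/^v//')
NOTES=$(git tag -l --format='%(contents)' "v$VERSION" | sed 's/^/  * /')
TAGGER=$(git tag -l --format='%(taggername) %(taggeremail)' "v$VERSION")
cat > debian/changelog <<EOF
mypackage ($VERSION-1) unstable; urgency=medium

$NOTES

 -- $TAGGER  $(date -R)
EOF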

The Dreaded Changelog Finally, there's the CHANGELOG file. If it's maintained during the development process, it typically has an archive of all the release notes, under version numbers, with an "Unreleased" heading at the top. It's one more place to remember to have to edit when making that "preparing release X.Y.Z" commit, and it is a gift to the Demon of Spurious Merge Conflicts if you follow the policy of "every commit must add a changelog entry". My solution: just burn it to the ground. Add a line to the top with a link to wherever the contents of annotated tags get published (such as GitHub Releases, if that's your bag) and never open it ever again.

But I need to know other things about my release, too! For some reason, you might think you need some other metadata about your releases. You're probably wrong; it's amazing how much information you can obtain or derive from the humble tag, so think creatively about your situation before you start making unnecessary complexity for yourself. But, on the off chance you're in a situation that legitimately needs some extra release-related information, here's the secret: structured annotation. The annotation on a tag can be literally any sequence of octets you like. How that data is interpreted is up to you. So, require that annotations on release tags use some sort of structured data format (say YAML or TOML or even XML if you hate your release manager), and mandate that it contain whatever information you need. You can make sure that the annotation has a valid structure and contains all the information you need with an update hook, which can reject the tag push if it doesn't meet the requirements, and you're sorted.
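For instance, a rough sketch of such an update hook; the v* tag pattern and the required approver field are made-up conventions for illustration:
#!/bin/sh
# hooks/update on the server -- called with: refname oldrev newrev
ref="$1" new="$3"
case "$ref" in
refs/tags/v*)
    # the annotation is the tag object's body, i.e. everything after the header block
    annotation=$(git cat-file tag "$new" | sed '1,/^$/d')
    echo "$annotation" | grep -q '^approver:' || {
        echo "rejected: release tag annotation must contain an approver: field" >&2
        exit 1
    }
    ;;
esac
exit 0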

But I have multiple packages in my repo, with different release cadences and versions! This one is common enough that I just refer to it as "the monorepo drama". Personally, I'm not a huge fan of monorepos, but you do you, boo. Annotated tags can still handle it just fine. The trick is to include the package name being released in the tag name. So rather than a release tag being named vX.Y.Z, you use foo/vX.Y.Z, bar/vX.Y.Z, and baz/vX.Y.Z. The release automation for each package just triggers on tags that match the pattern for that particular package, and limits itself to those tags when figuring out what the version number is.
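The trigger side then only needs a little tag parsing; a sketch in plain shell, with foo as the hypothetical package:
tag="foo/v1.2.3"                   # the tag that fired this pipeline
package="${tag%/v*}"               # -> foo
version="${tag#*/v}"               # -> 1.2.3
[ "$package" = "foo" ] || exit 0   # this package's automation ignores other packages' tags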

But we don't semver our releases! Oh, that's easy. The tag pattern that marks a release doesn't have to be vX.Y.Z. It can be anything you want. Relatedly, there is a (rare, but existent) need for packages that don't really have a conception of releases in the traditional sense. The example I've hit most often is automatically generated bindings packages, such as protobuf definitions. The source of truth for these is a bunch of .proto files, but to be useful, they need to be packaged into code for the various language(s) you're using. But those packages need versions, and while someone could manually make releases, the best option is to build new per-language packages automatically every time any of those definitions change. The versions of those packages, then, can be datestamps (I like something like YYYY.MM.DD.N, where N starts at 0 each day and increments if there are multiple releases in a single day). This process allows all the code that needs the definitions to declare the minimum version of the definitions that it relies on, and everything is kept in sync and tracked almost like magic.
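Computing that datestamp version is little more than a date plus a count of how many releases already happened today; a sketch:
today=$(date -u +%Y.%m.%d)
n=$(git tag -l "$today.*" | wc -l)    # prior releases today, so N starts at 0
git tag -a "$today.$n" -m "Automated bindings release $today.$n"
git push origin "$today.$n"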

Th-th-th-th-that's all, folks! I hope you've enjoyed this bit of mild debunking. Show your gratitude by buying me a refreshing beverage, or purchase my professional expertise and I'll answer all of your questions and write all your CI jobs.

18 November 2024

Russ Allbery: Review: Delilah Green Doesn't Care

Review: Delilah Green Doesn't Care, by Ashley Herring Blake
Series: Bright Falls #1
Publisher: Jove
Copyright: February 2022
ISBN: 0-593-33641-0
Format: Kindle
Pages: 374
Delilah Green Doesn't Care is a sapphic romance novel. It's the first of a trilogy, although in the normal romance series fashion each book follows a different protagonist and has its own happy ending. It is apparently classified as romantic comedy, which did not occur to me while reading but which I suppose I can see in retrospect. Delilah Green got the hell out of Bright Falls as soon as she could and tried not to look back. After her father died, her step-mother lavished all of her perfectionist attention on her overachiever step-sister, leaving Delilah feeling like an unwanted ghost. She escaped to New York where there was space for a queer woman with an acerbic personality and a burgeoning career in photography. Her estranged step-sister's upcoming wedding was not a good enough reason to return to the stifling small town of her childhood. The pay for photographing the wedding was, since it amounted to three months of rent and trying to sell photographs in galleries was not exactly a steady living. So back to Bright Falls Delilah goes. Claire never left Bright Falls. She got pregnant young and ended up with a different life than she expected, although not a bad one. Now she's raising her daughter as a single mom, running the town bookstore, and dealing with her unreliable ex. She and Iris are Astrid Parker's best friends and have been since fifth grade, which means she wants to be happy for Astrid's upcoming wedding. There's only one problem: the groom. He's a controlling, boorish ass, but worse, Astrid seems to turn into a different person around him. Someone Claire doesn't like. Then, to make life even more complicated, Claire tries to pick up Astrid's estranged step-sister in Bright Falls's bar without recognizing her. I have a lot of things to say about this novel, but here's the core of my review: I started this book at 4pm on a Saturday because I hadn't read anything so far that day and wanted to at least start a book. I finished it at 11pm, having blown off everything else I had intended to do that evening, completely unable to put it down. It turns out there is a specific type of romance novel protagonist that I absolutely adore: the sarcastic, confident, no-bullshit character who is willing to pick the fights and say the things that the other overly polite and anxious characters aren't able to get out. Astrid does not react well to criticism, for reasons that are far more complicated than it may first appear, and Claire and Iris have been dancing around the obvious problems with her surprise engagement. As the title says, Delilah thinks she doesn't care: she's here to do a job and get out, and maybe she'll get to tweak her annoying step-sister a bit in the process. But that also means that she is unwilling to play along with Astrid's obsessively controlling mother or her obnoxious fiance, and thus, to the barely disguised glee of Claire and Iris, is a direct threat to the tidy life that Astrid's mother is trying to shoehorn her daughter into. This book is a great example of why I prefer sapphic romances: I think this character setup would not work, at least for me, in a heterosexual romance. Delilah's role only works if she's a woman; if a male character were the sarcastic conversational bulldozer, it would be almost impossible to avoid falling into the gender stereotype of a male rescuer. If this were a heterosexual romance trying to avoid that trap, the long-time friend who doesn't know how to directly confront Astrid would have to be the male protagonist. 
That could work, but it would be a tricky book to write without turning it into a story focused primarily on the subversion of gender roles. Making both protagonists women dodges the problem entirely and gives them so much narrative and conceptual space to simply be themselves, rather than characters obscured by the shadows of societal gender rules. This is also, at its core, a book about friendship. Claire, Astrid, and Iris have the sort of close-knit friend group that looks exclusive and unapproachable from the outside. Delilah was the stereotypical outsider, mocked and excluded when they thought of her at all. This, at least, is how the dynamics look at the start of the book, but Blake did an impressive job of shifting my understanding of those relationships without changing their essential nature. She fleshes out all of the characters, not just the romantic leads, and adds complexity, nuance, and perspective. And, yes, past misunderstanding, but it's mostly not the cheap sort that sometimes drives romance plots. It's the misunderstanding rooted in remembered teenage social dynamics, the sort of misunderstanding that happens because communication is incredibly difficult, even more difficult when one has no practice or life experience, and requires knowing oneself well enough to even know what to communicate. The encounter between Delilah and Claire in the bar near the start of the book is the cornerstone of the plot, but the moment that grabbed me and pulled me in was Delilah's first interaction with Claire's daughter Ruby. That was the point when I knew these were characters I could trust, and Blake never let me down. I love how Ruby is handled throughout this book, with all of the messy complexity of a kid of divorced parents with her own life and her own personality and complicated relationships with both parents that are independent of the relationship their parents have with each other. This is not a perfect book. There's one prank scene that I thought was excessively juvenile and should have been counter-productive, and there's one tricky question of (nonsexual) consent that the book raises and then later seems to ignore in a way that bugged me after I finished it. There is a third-act breakup, which is not my favorite plot structure, but I think Blake handles it reasonably well. I would probably find more niggles and nitpicks if I re-read it more slowly. But it was utterly engrossing reading that exactly matched my mood the day that I picked it up, and that was a fantastic reading experience. I'm not much of a romance reader and am not the traditional audience for sapphic romance, so I'm probably not the person you should be looking to for recommendations, but this is the sort of book that got me to immediately buy all of the sequels and start thinking about a re-read. It's also the sort of book that dragged me back in for several chapters when I was fact-checking bits of my review. Take that recommendation for whatever it's worth. Content note: Reviews of Delilah Green Doesn't Care tend to call it steamy or spicy. I have no calibration for this for romance novels. I did not find it very sex-focused (I have read genre fantasy novels with more sex), but there are several on-page sex scenes if that's something you care about one way or the other. Followed by Astrid Parker Doesn't Fail. Rating: 9 out of 10

12 November 2024

Paul Tagliamonte: Complex for Whom?

In basically every engineering organization I've ever regarded as particularly high functioning, I've sat through one specific recurring conversation which is not a conversation about "complexity". Things are good or bad because they are or aren't complex, architectures need to be redone because "it's too complex", some refactor of whatever it is won't work because "it's too complex". You may have even been a part of some of these conversations, or even been the one advocating for simple, light-weight solutions. I've done it. Many times. Rarely, if ever, do we talk about complexity within its rightful context: complexity for whom. Is a solution complex because it's complex for the end user? Is it complex if it's complex for an API consumer? Is it complex if it's complex for the person maintaining the API service? Is it complex if it's complex for someone outside the team maintaining it to understand? Complexity within a problem domain, I've come to believe, is fairly zero-sum: there's a fixed amount of complexity in the problem to be solved, and you can choose to either solve it, or leave it for those downstream of you to solve that problem on their own. That being said, while I believe there is a lower bound in complexity to contend with for a problem, I do not believe there is an upper bound to the complexity of solutions possible. It is always possible, and in fact very likely, that teams create problems for themselves while trying to solve a problem. The rest of this post is talking to the lower bound. When getting feedback on an early draft of this blog post, I was informed that Fred Brooks coined a term for what I call lower-bound complexity: "Essential Complexity", in the paper "No Silver Bullet - Essence and Accident in Software Engineering", which is a better term and can be used interchangeably.

Complexity Culture In a large enough organization, where the team is high functioning enough to have and maintain trust amongst peers, members of the team will specialize. People will begin to engage with subsets of the work to be done, and begin to have their efficacy measured against that part of the organization's problems. Incentives shift, and over time it becomes increasingly likely that two engineers may have two very different priorities when working on the same system together. Someone accountable for uptime and tasked with responding to outages will begin to resist changes. Someone accountable for rapidly delivering features will resist gates between them and their users. Companies (either wittingly or unwittingly) will deal with this by tasking engineers with both production (feature development) and operational tasks (maintenance), so the difference in incentives isn't usually as bad as it could be. When we get a bunch of folks from far-flung corners of an organization in a room, fire up a slide deck and throw up some aspirational to-be architecture diagram in order to get a sign-off to solve some problem (be it that someone needs a credible promotion packet, a new feature needs to get delivered, or the system has begun to fail and needs fixing), the initial reaction will, more often than I'd like, start to devolve into a discussion of how this is going to introduce a bunch of complexity, is going to be hard to maintain, and why can't you make it less complex? Right around here is when I start to try and contextualize the conversation happening around me: understand what complexity it is that's being discussed, and understand who is taking on that burden. Think about who should be owning that problem, and work through the tradeoffs involved. Is it best solved here, or left to consumers (be they other systems, developers, or users)? Should something become an API call's optional param, taking on all the edge-cases and so on, or should users have to implement the logic using the data you return (leaving everyone else to take on all the edge-cases and maintenance)? Should you process the data, or require the user to preprocess it for you? Frequently it's right to make an active and explicit decision to simplify and leave problems to be solved downstream, since they may not actually need to be solved, or perhaps you expect consumers will want to own the specifics of how the problem is solved, in which case you leave lots of documentation and examples. Many other times, especially when it's something downstream consumers are likely to hit, it's best solved internal to the system, since the only thing that can come of leaving it unsolved is bugs, frustration and half-correct solutions. This is a grey-space of tradeoffs, not a clear decision tree. No one wants the software manifestation of a katamari ball or a junk drawer, nor does anyone want a half-baked service unable to handle the simplest use-case.

Head-in-sand as a Service Popoffs about how complex something is are, to a first approximation, best understood as meaning "complicated for the person making the comments". A lot of the #thoughtleadership believe that an AWS hosted EKS k8s cluster running images built by CI talking to an AWS hosted PostgreSQL RDS is not complex. They're right. Mostly right. This is less complex: less complex for them. It's not, however, without complexity and its own tradeoffs; it's just complexity that they do not have to deal with. Now they don't have to maintain machines that have pesky operating systems or hard drive failures. They don't have to deal with updating the version of k8s, nor ensuring the backups work. No one has to push some artifact to prod manually. Deployments happen unattended. You click a button and get a cluster. On the other hand, developers outside the ops function need to deal with troubleshooting CI, debugging access control rules encoded in turing complete YAML, permissions issues inside the cluster due to whatever the fuck a service mesh is, everyone needs to learn how to use some k8s tools they only actually use during a bad day, likely while doing some x.509 troubleshooting to connect to the cluster (an internal only endpoint; just port forward it), not to mention all sorts of rules to route packets to their project (a single repo's binary being run in 3 containers on a single vm host). Beyond that, there's the invisible complexity: complexity on the interior of a service you depend on. I think about the dozens of teams maintaining the EKS service (which is either run on EC2 instances, or alternately, EC2 instances in a trench coat, moustache and even more shell scripts), the RDS service (also EC2 and shell scripts, but this time accounting for redundancy, backups, availability zones), scores of hypervisors pulled off the shelf (xen, kvm) smashed together with the ones built in-house (firecracker, nitro, etc) running on hardware that has to be refreshed and maintained continuously. Every request is processed by network ACL rules, AWS IAM rules, security group rules, using IP space announced to the internet wired through IXPs directly into ISPs. I don't even want to begin to think about the complexity inherent in how those switches are designed. Shitloads of complexity to solve problems you may or may not have, or even know you had. What's more complex? An app running in an in-house 4u server racked in the office's telco closet in the back running off the office Verizon line, or an app running four hypervisors deep in an AWS datacenter? Which is more complex to you? What about to your organization? In total? Which is more prone to failure? Which is more secure? Is the complexity good or bad? What type of complexity can you manage effectively? Which threatens the system? Which threatens your users?

COMPLEXIVIBES This extends beyond engineering. Decisions regarding what tools we are able to use (be they existing contracts with cloud providers, CIO-mandated SaaS products, or a list of the only permissible open source projects) will incur costs in terms of expressed "complexity". Pinning open source projects to a fixed set makes SBOM production "less complex". Using only one SaaS provider's product suite (even if it's terrible, because it has all the types of tools you need) makes accreditation "less complex". If all you have is a contract with Pauly T's lowest-price-technically-acceptable artisanal cloudary and haberdashery, the way you pay for your compute is "less complex" for the CIO shop, though you will find yourself building your own hosted database template, your own mechanism to spin up a k8s cluster, and all the operational and technical burden that comes with it. Or you won't, and you'll make it everyone else's problem in the organization. Nothing you can do will solve the fact that you must now deal with this problem somewhere, because it was less complicated for the business to put the workloads on the existing contract with a cut-rate vendor. Suddenly, the decision to reduce complexity because of an existing contract vehicle has resulted in a huge amount of technical risk and maintenance burden being onboarded. Complexity you would otherwise externalize has now been taken on internally. In a large enough organization (specifically, in this case, I'm talking about you, bureaucracies), this is largely ignored or accepted as normal, since the personnel cost is understood to be free to everyone involved. Doing it this way is more expensive, more work, less reliable and less maintainable, and yet, somehow, is, in a lot of ways, less complex to the organization. It's particularly bad with bureaucracies, since screwing up a contract will get you into much more trouble than delivering a broken product, leaving basically no reason for anyone to care to fix this. I can't shake the feeling that for every story of technical mandates gone awry, somewhere just out of sight there's a decisionmaker optimizing, as best they can, for what they believe to be the least amount of complexity: least hassle, fewest unique cases, most consistency. They freely offload complexity from their accreditation and risk acceptance functions through mandates. They will never have to deal with it. That does not change the fact that someone does.

TC;DR (TOO COMPLEX; DIDN'T REVIEW) We wish to rid ourselves of systemic complexity; after all, complexity is bad, simplicity is good. Removing upper-bound, own-goal complexity ("accidental complexity" in Brooks's terms) is important, but once you hit the lower-bound complexity, the tradeoffs become zero-sum. Removing complexity from one part of the system means that somewhere else, maybe outside your organization or in a non-engineering function, it must grow back. Sometimes the opposite is the case, such as when a previously manual business process is automated. Maybe that's a good idea. Maybe it's not. All I know is that what doesn't help the situation is conflating complexity with everything we don't like: legacy code, maintenance burden or toil, cost, delivery velocity.
  • Complexity is not the same as proclivity to failure. The most reliable systems I've interacted with are unimaginably complex, with layers of internal protection to prevent complete failure. This has its own set of costs, which other people have written about extensively.
  • Complexity is not cost. Sometimes the cost of taking all the complexity in-house is less, for whatever value of cost you choose to use.
  • Complexity is not absolute. Something simple from one perspective may be wildly complex from another. The impulse to burn down complex sections of code is helpful to have generally, but sometimes things are complicated for a reason, even if that reason exists outside your codebase or organization.
  • Complexity is not something you can remove without introducing complexity elsewhere. Just as not making a decision is itself a decision, choosing to require someone else to deal with a problem rather than dealing with it internally is a choice that needs to be considered in its full context.
Next time you're sitting through a discussion and someone starts to talk about all the complexity about to be introduced, I want to pop up in the back of your head, politely asking: what does "complex" mean in this context? Is it lower-bound complexity? Is this complexity desirable? Does what they're saying mean something along the lines of "I don't understand the problems being solved", or something along the lines of "this problem should be solved elsewhere"? Do they believe this will result in more work for them in a way that you don't see? Should this not be solved at all, by changing the bounds of what we should accept or by redefining the understood limits of this system? Is the perceived complexity a result of a decision elsewhere? Who's taking this complexity on, or, more to the point, is failing to address complexity required by the problem just leaving it to others? Does it impact others? How, specifically? What are you not seeing? What can change? What should change?

Sven Hoexter: fluxcd: Validate flux-system Root Kustomization

Not entirely sure how people use fluxcd, but I guess most people have something like a flux-system flux kustomization as the root to add more flux kustomizations to their kubernetes cluster. Here all of that is living in a monorepo, and as we're all humans, people figure out different ways to break it, which brings the reconciliation of the flux controllers down. Thus we set out to do some pre-flight validations. Note1: We do not use flux variable substitutions for those root kustomizations, so if you use those, you'll have to put additional work into the validation and pipe things through flux envsubst (a possible variant is sketched after the first snippet below). First Iteration: Just Run kustomize Like Flux Would Do It With a folder structure where we have a clusters folder with subfolders per cluster, we just run a for loop over all of them:
for CLUSTER in ${CLUSTERS}; do
    pushd clusters/${CLUSTER}
    # validate if we can create and build a flux-system like kustomization file
    kustomize create --autodetect --recursive
    if ! kustomize build . -o /dev/null 2> error.log; then
        echo "Error building flux-system kustomization for cluster ${CLUSTER}"
        cat error.log
    fi
    popd
done
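If you do rely on flux variable substitutions for those root kustomizations (see Note1), one possible extension is to pipe the rendered manifests through flux envsubst before discarding them, so substitution problems also surface during validation (newer flux versions also offer a strict mode for unset variables; check flux envsubst --help). This is only a minimal sketch under the assumption that the substitution variables are exported in the environment of the check; CLUSTER_DOMAIN below is a purely hypothetical example, not something from our setup:
# enable pipefail so a failure of either kustomize or flux envsubst is caught
set -o pipefail
export CLUSTER_DOMAIN="example.org"   # hypothetical substitution variable, for illustration only
for CLUSTER in ${CLUSTERS}; do
    pushd clusters/${CLUSTER}
    kustomize create --autodetect --recursive
    # render as before, but additionally pipe the output through flux envsubst
    if ! kustomize build . 2> error.log | flux envsubst > /dev/null 2>> error.log; then
        echo "Error building or substituting flux-system kustomization for cluster ${CLUSTER}"
        cat error.log
    fi
    popd
done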
Second Iteration: Make Sure Our Workload Subfolders Have a kustomization.yaml Next someone figured out that you can delete some yaml files from a workload subfolder, including the kustomization.yaml, but not all of them. That left behind a resource definition which lacks some other referenced objects, but is still happily included in the root kustomization by kustomize create and flux, which of course did not work. Thus we started to catch that as well in our growing for loop:
for CLUSTER in ${CLUSTERS}; do
    pushd clusters/${CLUSTER}
    # validate if we can create and build a flux-system like kustomization file
    kustomize create --autodetect --recursive
    if ! kustomize build . -o /dev/null 2> error.log; then
        echo "Error building flux-system kustomization for cluster ${CLUSTER}"
        cat error.log
    fi
    # validate if we always have a kustomization file in folders with yaml files
    for CLFOLDER in $(find . -type d); do
        test -f ${CLFOLDER}/kustomization.yaml && continue
        test -f ${CLFOLDER}/kustomization.yml && continue
        if [[ $(find ${CLFOLDER} -maxdepth 1 \( -name '*.yaml' -o -name '*.yml' \) -type f | wc -l) != 0 ]]; then
            echo "Error Cluster ${CLUSTER} folder ${CLFOLDER} lacks a kustomization.yaml"
        fi
    done
    popd
done
Note2: I shortened those snippets to the core parts. In our case some things are a bit specific to how we implemented the execution of those checks in GitHub Actions workflows. Hope that's enough to convey the idea of what to check for.
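One more practical detail when wiring these checks into CI (GitHub Actions in our case, as mentioned in Note2): the snippets above only print errors, so a job running them verbatim would still finish successfully. A minimal sketch of a wrapper, under a hypothetical name of validate-clusters.sh, that records failures and exits non-zero so the pipeline actually fails:
#!/usr/bin/env bash
# validate-clusters.sh (hypothetical name): exit non-zero if any cluster check fails,
# so a CI job running this script is marked as failed.
# CLUSTERS is expected to be set by the caller (see the snippets above).
set -u
RC=0
for CLUSTER in ${CLUSTERS}; do
    pushd clusters/${CLUSTER} > /dev/null
    kustomize create --autodetect --recursive
    if ! kustomize build . -o /dev/null 2> error.log; then
        echo "Error building flux-system kustomization for cluster ${CLUSTER}"
        cat error.log
        RC=1
    fi
    popd > /dev/null
done
exit ${RC}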

11 November 2024

Gunnar Wolf: Why academics under-share research data - A social relational theory

This post is a review for Computing Reviews of "Why academics under-share research data - A social relational theory", an article published in the Journal of the Association for Information Science and Technology.
As an academic, I have cheered for and welcomed the open access (OA) mandates that, slowly but steadily, have been accepted in one way or another throughout academia. It is now often accepted that public funds mean public research. Many of our universities or funding bodies will demand it, with varying intensities: sometimes they demand research to be published in an OA venue, sometimes a mandate will only prefer it. Lately, some journals and funding bodies have expanded this mandate toward open science, requiring not only research outputs (that is, articles and books) to be published openly but for the data backing the results to be made public as well. As a person who has been involved with free software promotion since the mid-1990s, it was natural for me to join the OA movement and to celebrate when various universities adopt such mandates. Now, what happens after a university or funding body adopts such a mandate? Many individual academics cheer, as it is the right thing to do. However, the authors observe that this is not really followed thoroughly by academics. What can be observed, rather, is the slow pace or feet-dragging of academics when they are compelled to comply with OA mandates, or even an outright refusal to do so. If OA and open science are close to the ethos of academia, why aren't more academics enthusiastically sharing the data used for their research? This paper finds a subversive practice embodied in the refusal to comply with such mandates, and explores a hypothesis based on Karl Marx's productive worker theory and Pierre Bourdieu's ideas of symbolic capital. The paper explains that academics, as productive workers, become targets for exploitation: given that it's not only the academics' sharing ethos but also private industry's push for data collection and industry-aligned research, they adapt to technological changes and jump through all kinds of hurdles to create more products, in a result that can be understood as a neoliberal productivity measurement strategy. Neoliberalism assumes that mechanisms that produce more profit for academic institutions will result in better research; it also leads to the disempowerment of academics as a class, although they are rewarded as individuals due to the specific value they produce. The authors continue by explaining how open science mandates seem to ignore the historical ways of collaboration in different scientific fields, and by exploring different angles of how and why data can be seen as under-shared, failing to comply with different aspects of said mandates. This paper, built on the social sciences tradition, is clearly a controversial work that can spark interesting discussions. While it does not specifically touch on computing, it is relevant to Computing Reviews readers due to the relatively high percentage of academics among us.

10 November 2024

Reproducible Builds: Reproducible Builds in October 2024

Welcome to the October 2024 report from the Reproducible Builds project. Our reports attempt to outline what we've been up to over the past month, highlighting news items from elsewhere in tech where they are related. As ever, if you are interested in contributing to the project, please visit our Contribute page on our website. Table of contents:
  1. Beyond bitwise equality for Reproducible Builds?
  2. Two Ways to Trustworthy at SeaGL 2024
  3. Number of cores affected Android compiler output
  4. On our mailing list
  5. diffoscope
  6. IzzyOnDroid passed 25% reproducible apps
  7. Distribution work
  8. Website updates
  9. Reproducibility testing framework
  10. Supply-chain security at Open Source Summit EU
  11. Upstream patches

Beyond bitwise equality for Reproducible Builds? Jens Dietrich and Tim White of Victoria University of Wellington, New Zealand, along with Behnaz Hassanshahi and Paddy Krishnan of Oracle Labs Australia, published a paper entitled "Levels of Binary Equivalence for the Comparison of Binaries from Alternative Builds":
The availability of multiple binaries built from the same sources creates new challenges and opportunities, and raises questions such as: "Does build A confirm the integrity of build B?" or "Can build A reveal a compromised build B?". To answer such questions requires a notion of equivalence between binaries. We demonstrate that the obvious approach based on bitwise equality has significant shortcomings in practice, and that there is value in opting for alternative notions. We conceptualise this by introducing levels of equivalence, inspired by clone detection types.
A PDF of the paper is freely available.

Two Ways to Trustworthy at SeaGL 2024 On Friday 8th November, Vagrant Cascadian will present a talk entitled "Two Ways to Trustworthy" at SeaGL in Seattle, WA. Founded in 2013, SeaGL is a free, grassroots technical summit dedicated to spreading awareness and knowledge about free source software, hardware and culture. Vagrant's talk:
[ ] delves into how two project[s] approaches fundamental security features through Reproducible Builds, Bootstrappable Builds, code auditability, etc. to improve trustworthiness, allowing independent verification; trustworthy projects require little to no trust. Exploring the challenges that each project faces due to very different technical architectures, but also contextually relevant social structure, adoption patterns, and organizational history should provide a good backdrop to understand how different approaches to security might evolve, with real-world merits and downsides.

Number of cores affected Android compiler output Fay Stegerman wrote that the cause of the Android toolchain bug from September's report, which she had reported to the Android issue tracker, has been found and the bug has been fixed.
the D8 Java to DEX compiler (part of the Android toolchain) eliminated a redundant field load if running the class's static initialiser was known to be free of side effects, which ended up accidentally depending on the sharding of the input, which is dependent on the number of CPU cores used during the build.
To make it easier to understand the bug and the patch, Fay also made a small example to illustrate when and why the optimisation involved is valid.

On our mailing list On our mailing list this month:

diffoscope diffoscope is our in-depth and content-aware diff utility that can locate and diagnose reproducibility issues. This month, Chris Lamb made the following changes, including preparing and uploading versions 279, 280, 281 and 282 to Debian:
  • Ignore errors when listing .ar archives (#1085257). [ ]
  • Don't try and test with systemd-ukify in the Debian stable distribution. [ ]
  • Drop Depends on the deprecated python3-pkg-resources (#1083362). [ ]
In addition, Jelle van der Waa added support for Unified Kernel Image (UKI) files. [ ][ ][ ] Furthermore, Vagrant Cascadian updated diffoscope in GNU Guix to version 282. [ ][ ]

IzzyOnDroid passed 25% reproducible apps The IzzyOnDroid project has reached a significant milestone: over 25% of the ~1,200 Android apps provided by their repository (of official APKs built by the original application developers) have now been confirmed to be reproducible by a rebuilder.

Distribution work In Debian this month:
  • Holger Levsen uploaded devscripts version 2.24.2, including many changes to the debootsnap, debrebuild and reproducible-check scripts. This is the first time that debrebuild actually works (using sbuild's unshare backend). As part of this, Holger also fixed an issue in the reproducible-check script where a typo in the code led to incorrect results. [ ]
  • Recently, a news entry was added to snapshot.debian.org's homepage, describing the recent changes that made the system stable again:
    The new server has no problems keeping up with importing the full archives on every update, as each run finishes comfortably in time before it's time to run again. [While] the new server is the one doing all the importing of updated archives, the HTTP interface is being served by both the new server and one of the VMs at LeaseWeb.
    The entry lists a number of specific updates surrounding the API endpoints and rate limiting.
  • Lastly, 12 reviews of Debian packages were added, 3 were updated and 18 were removed this month, adding to our knowledge about identified issues.
Elsewhere in distribution news, Zbigniew Jędrzejewski-Szmek performed another rebuild of Fedora 42 packages, with the headline result being that 91% of the packages are reproducible. Zbigniew also reported a reproducibility problem with QImage. Finally, in openSUSE, Bernhard M. Wiedemann published another report for that distribution.

Website updates There were an enormous number of improvements made to our website this month, including:
  • Alba Herrerias:
    • Improve consistency across distribution-specific guides. [ ]
    • Fix a number of links on the Contribute page. [ ]
  • Chris Lamb:
  • hulkoba
  • James Addison:
    • Huge and significant work on an (as-yet-unmerged) quickstart guide to be linked from the homepage [ ][ ][ ][ ][ ]
    • On the homepage, link directly to the Projects subpage. [ ]
    • Relocate dependency-drift notes to the Volatile inputs page. [ ]
  • Ninette Adhikari:
    • Add a brand new Success stories page that highlights the success stories of Reproducible Builds, showcasing real-world examples of projects shipping with verifiable, reproducible builds . [ ][ ][ ][ ][ ][ ]
  • Pol Dellaiera:
    • Update the website's README page for building the website under NixOS. [ ][ ][ ][ ][ ]
    • Add a new academic paper citation. [ ]
Lastly, Holger Levsen filed an extensive issue detailing a request to create an overview of recommendations and standards in relation to reproducible builds.

Reproducibility testing framework The Reproducible Builds project operates a comprehensive testing framework running primarily at tests.reproducible-builds.org in order to check packages and other artifacts for reproducibility. In October, a number of changes were made by Holger Levsen, including:
  • Add a basic index.html for rebuilderd. [ ]
  • Update the nginx.conf configuration file for rebuilderd. [ ]
  • Document how to use a rescue system for Infomaniak's OpenStack cloud. [ ]
  • Update usage info for two particular nodes. [ ]
  • Fix up a version skew check to fix the name of the riscv64 architecture. [ ]
  • Update the rebuilderd-related TODO. [ ]
In addition, Mattia Rizzolo added a new IP address for the inos5 node [ ] and Vagrant Cascadian brought 4 virt nodes back online [ ].

Supply-chain security at Open Source Summit EU The Open Source Summit EU took place recently, and covered plenty of topics related to supply-chain security, including:

Upstream patches The Reproducible Builds project detects, dissects and attempts to fix as many currently-unreproducible packages as possible. We endeavour to send all of our patches upstream where appropriate. This month, we wrote a large number of such patches, including:

Finally, if you are interested in contributing to the Reproducible Builds project, please visit our Contribute page on our website. However, you can get in touch with us via:

25 October 2024

Reproducible Builds (diffoscope): diffoscope 282 released

The diffoscope maintainers are pleased to announce the release of diffoscope version 282. This version includes the following changes:
[ Chris Lamb ]
* Ignore errors when listing .ar archives. (Closes: #1085257)
* Update copyright years.
You can find out more by visiting the project homepage.

13 October 2024

Andy Simpkins: The state of the art

A long time ago... A long time ago a computer was a woman (I think almost exclusively women, not men) who was employed to do a lot of repetitive mathematics, typically for accounting and stock / order processing. Then along came Lyons, who deployed an artificial computer to perform the same task, only with fewer errors in less time. Modern day computing was born; we had entered the age of the Digital Computer. These computers were large and consumed huge amounts of power, but were precise, and gave repeatable, verifiable results. Over time the huge mainframe digital computers have shrunk in size, increased in performance, and consume far less power, so much so that they often didn't need the specialist CFC-based, refrigerated liquid cooling systems of their bigger mainframe counterparts, only requiring forced air flow, and occasionally just convection cooling. They shrank so far and became cheap enough that the Personal Computer came to be, replacing the mainframe with its time-shared resources with a machine per user. Desktop or even portable laptop computers were everywhere. We networked them together, so now we can share information around the office; a few computers were given the specialist task of being available all the time so we could share documents, or host databases. These servers were basically PCs designed to operate 24/7, usually more powerful than their desktop counterparts (or at least with faster storage and networking). Next we joined these networks together and the internet was born. The dream of a paperless office might actually become realised: we can now send email (and documents) from one organisation (or individual) to another via email. We can make our specialist computer applications available outside just the office, and web servers / web apps come of age. Fast forward a few years and all of a sudden we need huge data-halls filled with rack-scale machines augmented with exotic GPUs and NPUs, again with refrigerated liquid cooling, all to do the same task that we were doing previously without the magical buzzword that has been named AI; because we all need another dot-com bubble or blockchain bandwagon to jump aboard. Our AI-enabled searches take slightly longer, consume magnitudes more power, and, best of all, the results we are given may or may not be correct. Progress: less precise answers, taking longer, consuming more power, without any verification, and often giving a different result if you repeat your question. AND we still need a personal computing device to access this wondrous thing. Remind me again why we are here? (Timelines and huge swathes of history simply ignored to make an attempted comic point; this is intended to make a point and not be scholarly work.)

9 October 2024

Ben Hutchings: FOSS activity in September 2024

10 September 2024

Freexian Collaborators: Debian Contributions: Python 3 patches, OpenSSH GSS-API split, rebootstrap, salsa CI, etc. (by Anupa Ann Joseph)

Debian Contributions: 2024-08 Contributing to Debian is part of Freexian's mission. This article covers the latest achievements of Freexian and their collaborators. All of this is made possible by organizations subscribing to our Long Term Support contracts and consulting services.

Debian Python 3 patch review, by Stefano Rivera Last month, at DebConf, Stefano reviewed the current patch set of Debian's cPython packages with Matthias Klose, the primary maintainer until now. As a result of that review, Stefano re-reviewed the patchset, updating descriptions, etc. A few patches were able to be dropped, and a few others were forwarded upstream. One finds all sorts of skeletons doing reviews like this. One of the patches had been inactive (fortunately, because it was buggy) since the day it was applied, 13 years ago. One is a cleanup that probably only fixes a bug on HPUX, and is a result of copying code from xfree86 into Python 25 years ago. It was fixed in xfree86 a year later. Others support just Debian-specific functionality and probably never seemed worth forwarding. Or they are good cleanups that only really apply to Debian. A trivial new patch would allow Debian to multiarch co-install Python stable ABI dynamic extensions (like we can with regular dynamic extensions). Performance concerns are stalling it in review at the moment.

DebConf 24 Organization, by Stefano Rivera Stefano helped organize DebConf 24, which concluded in early August. The event is run by a large entirely volunteer team. The work involved in making this happen is far too varied to describe here. While Freexian provides funding for 20% of collaborator time to spend on Debian-related work, it only covers a small fraction of contributions to time-intensive tasks like this. Since the end of the event, Stefano has been doing some work on the conference finances, and initiated the reimbursement process for travel bursaries.

Archive rebuilds on Debusine, by Stefano Rivera The recent setuptools 73 upload to Debian unstable removed the test subcommand, breaking many packages that were using python3 setup.py test in their Debian packaging. Stefano did a partial archive-rebuild using debusine.debian.net to find the regressions and file bugs. Debusine will be a powerful tool to do QA work like this for Debian in the future, but it doesn't have all the features needed to coordinate rebuild-testing yet. They are planned to be fleshed out in the next year. In the meantime, Debusine has the building blocks to work through a queue of package building tasks and store the results; it just needs to be driven from outside the system. So, Stefano started working on a set of tools using the Debusine client API to perform archive rebuilds, found and tagged existing bugs, and filed many more.

OpenSSH GSS-API split, by Colin Watson Colin landed the first stage of the planned split of GSS-API authentication and key exchange support in Debian's OpenSSH packaging. In order to allow for smooth upgrades, the second stage will have to wait until after the Debian 13 (trixie) release; but once that's done, as upstream puts it, this "substantially reduces the amount of pre-authentication attack surface exposed on your users' sshd by default".

OpenSSL vs. cryptography, by Colin Watson Colin facilitated a discussion between Debian's OpenSSL team and the upstream maintainers of Python cryptography about a new incompatibility between Debian's OpenSSL packaging and cryptography's handling of OpenSSL's legacy provider, which was causing a number of build and test failures. While the issue remains open, the Debian OpenSSL maintainers have effectively reverted the change now, so it's no longer a pressing problem.

/usr-move, by Helmut Grohne There are fewer than 40 source packages left to move files to /usr, so what we're left with is the long tail of the transition. Rather than fix all of them, Helmut started a discussion on removing packages from unstable and filed a first batch. As libvirt is being restructured in experimental, we're handling the fallout in collaboration with its maintainer Andrea Bolognani. Since base-files validates the aliasing symlinks before upgrading, it was discovered that systemd has its own ideas, with no solution as of yet. Helmut also proposed that dash check for ineffective diversions of /bin/sh and that lintian warn about aliased files.

rebootstrap by Helmut Grohne Bootstrapping Debian for a new or existing CPU architecture is still quite a manual process. The rebootstrap project attempts to automate part of the early stage, but it is still very sensitive to changes in unstable. We have had a number of fairly intrusive changes this year already. August included a little more fallout from the earlier gcc-for-host work, where the C++ include search path would end up being wrong in the generated cross toolchain. A number of packages such as util-linux (twice), libxml2, libcap-ng or systemd had their stage profiles broken. e2fsprogs gained a cycle with libarchive-dev due to having gained support for creating an ext4 filesystem from a tar archive. The restructuring of glib2.0 remains an unsolved problem for now, but libxt and cdebconf should be buildable without glib2.0.

Salsa CI, by Santiago Ruano Rincón Santiago completed the initial RISC-V support (!523) in the Salsa CI pipeline. The main work started in July, but it was necessary to take into account some comments from the review (thanks to Ahmed!) and some final details in [!534]. riscv64 is the most recently supported port in Debian, and will be part of trixie. As its name suggests, the new build-riscv64 job makes it possible to test that a package successfully builds for the riscv64 architecture. The RISC-V runner (salsaci riscv64 runner 01) runs on a couple of machines generously provided by lab.rvperf.org. Debian Developers interested in running this job in their projects should enable the runner (salsaci riscv64 runner 01) in Settings / CI / Runners, and follow the instructions available at https://salsa.debian.org/salsa-ci-team/pipeline/#build-job-on-risc-v. Santiago also took part in discussions about how to optimize the build jobs and reviewed !537, by Andrea Pappacoda, to make the build-source job only satisfy the Build-Depends and Build-Conflicts fields. Thanks a lot to him!

Miscellaneous contributions
  • Stefano submitted patches for BeautifulSoup to support the latest soupsieve and lxml.
  • Stefano uploaded pypy3 7.3.17, upgrading the cPython compatibility from 3.9 to 3.10. He then ran into a GCC-14-related regression, which had to be ignored for now as it's proving hard to fix.
  • Colin released libpipeline 1.5.8 and man-db 2.13.0; the latter included the foundations for adding an autopkgtest for man-db.
  • Colin upgraded 19 Python packages to new upstream versions (fixing 5 CVEs), fixed several other build failures, fixed a Python 3.12 compatibility issue in zope.security, and made python-nacl build reproducibly.
  • Colin tracked down test failures in python-asyncssh and Ruby resulting from certain odd /etc/hosts configurations.
  • Carles upgraded the packages python-ring-doorbell and simplemonitor to new upstream versions.
  • Carles started discussions on, and the implementation of, a tool (still in its early days) named "po-debconf-manager": a way for translators and reviewers to collaborate using git as a backend instead of a mailing list, and to submit the translations using salsa MRs. More information next month.
  • Carles (dog-fooding "po-debconf-manager") reviewed debconf templates translated by a collaborator.
  • Carles reviewed and submitted the translation of "apt".
  • Helmut sent 19 patches for improving cross building.
  • Helmut implemented the cross-exe-wrapper proposed by Simon McVittie for use with glib2.0.
  • Helmut detailed what it takes to make Perl's ExtUtils::PkgConfig suitable for cross building.
  • Helmut made the deletion of the root password work in debvm in all situations and implemented a test case using expect.
  • Anupa attended the Debian Publicity team meeting and is moderating and posting on the Debian Administrators LinkedIn group.
  • Thorsten uploaded package gutenprint to fix a FTBFS with gcc14 and package ipp-usb to fix a /usr-merge issue.
  • Santiago updated bzip2 to fix a long-standing bug that requested the inclusion of a pkg-config file. An important impact of this change is that it makes it possible to use the Rust bindings for libbz2 by Sequoia, an implementation of OpenPGP.
