cabal install
, or a fun VSCode extension gets updated, or anything like that, I am running code that could be malicious or buggy.
In a way it is surprising and reassuring that, as far as I can tell, this commonly does not happen. Most open source developers out there seem to be nice and well-meaning, after all.
git push
, git pull
from private repositories, gh pr create
) from the outside, and the actual build environment can do without access to these secrets.
The user experience I thus want is a quick way to enter a development environment where I can do most of the things I need to do while programming (network access, running command line and GUI programs), with access to the current project, but without access to my actual /home
directory.
I initially followed the blog post Application Isolation using NixOS Containers by Marcin Sucharski and got something working that mostly did what I wanted, but then a colleague pointed out that tools like firejail
can achieve roughly the same with a less global setup. I tried to use firejail
, but found it to be a bit too inflexible for my particular whims, so I ended up writing a small wrapper around the lower level sandboxing tool https://github.com/containers/bubblewrap.
dev
and included below, builds a new filesystem namespace with minimal /proc
and /dev
directories, its own /tmp
directories. It then bind-mounts some directories to make the host's NixOS system available inside the container (/bin
, /usr
, the nix store including domain socket, stuff for OpenGL applications). My user's home directory is taken from ~/.dev-home
and some configuration files are bind-mounted for convenient sharing. I intentionally don't share most of the configuration: for example, a direnv enable
in the dev environment should not affect the main environment. The X11 socket for graphical applications and the corresponding .Xauthority
file is made available. And finally, if I run dev
in a project directory, this project directory is bind mounted writable, and the current working directory is preserved.
The effect is that I can type dev
on the command line to enter dev mode rather conveniently. I can run development tools, including graphical ones like VSCode, and especially the latter with its extensions is part of the sandbox. To do a git push
I either exit the development environment (Ctrl-D) or open a separate terminal. Overall, the inconvenience of switching back and forth seems worth the extra protection.
Clearly, this isn't going to hold against a determined and maybe targeted attacker (e.g. access to the X11 and the nix daemon socket can probably be used to escape easily). But I hope it will help against a compromised dev dependency that just deletes or exfiltrates data, like keys or passwords, from the usual places in $HOME
.
xdg-desktop-portal
to heed my default browser settings). For now I will live with manually copying and pasting URLs; we'll see how long this lasts.
evolution
or firefox
inside the container, and if I do not even have VSCode or cabal
available outside, so that it's less likely that I forget to enter dev
before using these tools.
It shouldn't be too hard to cargo-cult some of the NixOS Containers infrastructure to be able to have a separate system configuration that I can manage as part of my normal system configuration and make available to bubblewrap
here.
dev
and going back to what I did before
dev
script (at the time of writing)
#!/usr/bin/env bash

extra=()
# Only expose the current directory if it lies inside one of the project trees.
if [[ "$PWD" == /home/jojo/build/* ]] || [[ "$PWD" == /home/jojo/projekte/programming/* ]]
then
  extra+=(--bind "$PWD" "$PWD" --chdir "$PWD")
fi

# Run the given command, or an interactive shell if none was given.
if [ -n "$1" ]
then
  cmd=( "$@" )
else
  cmd=( bash )
fi

# Caveats:
# * access to all of /etc
# * access to /nix/var/nix/daemon-socket/socket, and is trusted user (but needed to run nix)
# * access to X11

# Comments inside the command are written as `# ...` (an empty command
# substitution), because a plain # would end the continued command line.
exec bwrap \
  --unshare-all \
  \
  `# blank slate` \
  --share-net \
  --proc /proc \
  --dev /dev \
  --tmpfs /tmp \
  --tmpfs /run/user/1000 \
  \
  `# Needed for GLX applications, in particular alacritty` \
  --dev-bind /dev/dri /dev/dri \
  --ro-bind /sys/dev/char /sys/dev/char \
  --ro-bind /sys/devices/pci0000:00 /sys/devices/pci0000:00 \
  --ro-bind /run/opengl-driver /run/opengl-driver \
  \
  --ro-bind /bin /bin \
  --ro-bind /usr /usr \
  --ro-bind /run/current-system /run/current-system \
  --ro-bind /nix /nix \
  --ro-bind /etc /etc \
  --ro-bind /run/systemd/resolve/stub-resolv.conf /run/systemd/resolve/stub-resolv.conf \
  \
  --bind ~/.dev-home /home/jojo \
  --ro-bind ~/.config/alacritty ~/.config/alacritty \
  --ro-bind ~/.config/nvim ~/.config/nvim \
  --ro-bind ~/.local/share/nvim ~/.local/share/nvim \
  --ro-bind ~/.bin ~/.bin \
  \
  --bind /tmp/.X11-unix/X0 /tmp/.X11-unix/X0 \
  --bind ~/.Xauthority ~/.Xauthority \
  --setenv DISPLAY :0 \
  \
  --setenv container dev \
  "${extra[@]}" \
  -- \
  "${cmd[@]}"
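The decision at the top of the script (expose the current directory only when it sits below a known project tree) can also be sketched in Python, which makes the rule easy to test in isolation. This is an illustration only: the function name `extra_bwrap_args` and the `PROJECT_ROOTS` constant are hypothetical, and the two prefixes simply mirror the ones tested in the script.

```python
from pathlib import PurePosixPath

# Hypothetical constant: the two project trees tested in the shell script.
PROJECT_ROOTS = ("/home/jojo/build", "/home/jojo/projekte/programming")


def extra_bwrap_args(cwd, roots=PROJECT_ROOTS):
    """Return the bwrap arguments that expose cwd, or [] outside a project tree."""
    path = PurePosixPath(cwd)
    for root in roots:
        # Like the shell pattern "$root"/*, require cwd to be strictly below root.
        if PurePosixPath(root) in path.parents:
            return ["--bind", cwd, cwd, "--chdir", cwd]
    return []
```

Comparing path components rather than string prefixes means that a sibling directory such as /home/jojo/buildx is not treated as part of /home/jojo/build.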
Package maintainers can guarantee package authorship through software signing [but] it is unclear how common this practice is, and whether the resulting signatures are created properly. Prior work has provided raw data on signing practices, but measured single platforms, did not consider time, and did not provide insight on factors that may influence signing. We lack a comprehensive, multi-platform understanding of signing adoption and relevant factors. This study addresses this gap. (arXiv, full PDF)
[The] principle of reusability […] makes it harder to reproduce projects' build environments, even though reproducibility of build environments is essential for collaboration, maintenance and component lifetime. In this work, we argue that functional package managers provide the tooling to make build environments reproducible in space and time, and we produce a preliminary evaluation to justify this claim.
The abstract continues with the claim that "Using historical data, we show that we are able to reproduce build environments of about 7 million Nix packages, and to rebuild 99.94% of the 14 thousand packages from a 6-year-old Nixpkgs revision." (arXiv, full PDF)
This paper thus proposes an approach to automatically identify configuration options causing non-reproducibility of builds. It begins by building a set of builds in order to detect non-reproducible ones through binary comparison. We then develop automated techniques that combine statistical learning with symbolic reasoning to analyze over 20,000 configuration options. Our methods are designed to both detect options causing non-reproducibility, and remedy non-reproducible configurations, two tasks that are challenging and costly to perform manually. (HAL Portal, full PDF)
fedora-repro-build
that attempts to reproduce an existing package within a koji build environment. Although the project's README
file lists a number of fields that will always or almost always vary, and there is a non-zero list of other known issues, this is an excellent first step towards full Fedora reproducibility.
256
, 257
and 258
to Debian and made the following additional changes:
gpg
s use-embedded-filenames. Many thanks to Daniel Kahn Gillmor dkg@debian.org for reporting this issue and providing feedback. [ ][ ]struct.unpack
-related errors when parsing Python .pyc
files. (#1064973). [ ]rdb_expected_diff
on non-GNU systems as %p
formatting can vary, especially with respect to MacOS. [ ]pytest
8.0. [ ]7zip
package (over p7zip-full
) after a Debian package transition. (#1063559). [ ]test_zip
black clean. [ ]diff(1)
correctly [ ][ ] thanks! And lastly, Vagrant Cascadian pushed updates in GNU Guix for diffoscope to version 255, 256, and 258, and updated trydiffoscope to 67.0.6.
README.rst
to match. [ ][ ]--vary=build_path.path
option. [ ][ ][ ][ ]SOURCE_DATE_EPOCH
page. [ ]SOURCE_DATE_EPOCH
documentation re. datetime.datetime.fromtimestamp
. Thanks, James Addison. [ ]/usr/bin/du --apparent-size
in the Jenkins shell monitor. [ ]arm64
nodes. [ ]/proc/$pid/oom_score_adj
to -1000 if it has not already been done. [ ]opemwrt-target-tegra
and jtx
task to the list of zombie jobs. [ ][ ]armhf
architecture build nodes, virt32z
and virt64z
, and insert them into the Munin monitoring. [ ][ ] [ ][ ]tegra
target with mpc85xx
, Jan-Benedict Glaw updated the NetBSD build script to use a separate $TMPDIR
to mitigate out of space issues on a tmpfs-backed /tmp
and Zheng Junjie added a link to the GNU Guix tests.
Lastly, node maintenance was performed by Holger Levsen and Vagrant Cascadian.
- gimagereader (date)
- grass (date-related issue)
- grub2 (filesystem ordering issue)
- latex2html (drop a non-deterministic log)
- mhvtl (tar)
- obs (build-tool issue)
- ollama (GZip embedding the modification time)
- presenterm (filesystem-ordering issue)
- qt6-quick3d (parallelism)
- flask-limiter.
- python-parsl-doc (disable dynamic argument evaluation by Sphinx autodoc extension)
- python3-pytest-repeat (remove entry_points.txt creation that varied by shell)
- python3-selinux (remove packaged direct_url.json file that embeds build path)
- python3-sepolicy (remove packaged direct_url.json file that embeds build path)
- pyswarms.
- python-x2go.
- snapd (fix timestamp header in packaged manual-page)
- zzzeeksphinx (existing RB patch forwarded and merged (with modifications))
#reproducible-builds
on irc.oftc.net
.
rb-general@lists.reproducible-builds.org
Publisher: St. Martin's Press
Copyright: 2023
ISBN: 1-250-27694-2
Format: Kindle
Pages: 310
A lawyer for Dalio said he "treated all employees equally, giving people at all levels the same respect and extending them the same perks."
Uh-huh. Anyway, I personally know nothing about Bridgewater other than what I learned here and the occasional mention in Matt Levine's newsletter (which is where I got the recommendation for this book). I have no independent information whether anything Copeland describes here is true, but Copeland provides the typical extensive list of notes and sourcing one expects in a book like this, and Levine's comments indicated it's generally consistent with Bridgewater's industry reputation. I think this book is true, but since the clear implication is that the world's largest hedge fund was primarily a deranged cult whose employees mostly spied on and rated each other rather than doing any real investment work, I also have questions, not all of which Copeland answers to my satisfaction. But more on that later.
The center of this book are the Principles. These were an ever-changing list of rules and maxims for how people should conduct themselves within Bridgewater. Per Copeland, although Dalio later published a book by that name, the version of the Principles that made it into the book was sanitized and significantly edited down from the version used inside the company. Dalio was constantly adding new ones and sometimes changing them, but the common theme was radical, confrontational "honesty": never being silent about problems, confronting people directly about anything that they did wrong, and telling people all of their faults so that they could "know themselves better."
If this sounds like textbook abusive behavior, you have the right idea. This part Dalio admits to openly, describing Bridgewater as a firm that isn't for everyone but that achieves great results because of this culture. But the uncomfortably confrontational vibes are only the tip of the iceberg of dysfunction.
Here are just a few of the ways this played out according to Copeland:
This covers basically all my known omissions from last update except spellchecking of the Description field.
The X- style prefixes for field names are now understood and handled. This means the language server now considers XC-Package-Type the same as Package-Type.
More diagnostics:
- Fields without values now trigger an error marker
- Duplicated fields now trigger an error marker
- Fields used in the wrong paragraph now trigger an error marker
- Typos in field names or values now trigger a warning marker. For field names, X- style prefixes are stripped before typo detection is done.
- The value of the Section field is now validated against a dataset of known sections and triggers a warning marker if not known.
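To illustrate the prefix handling and typo detection described in the list above, here is a minimal Python sketch. It is not taken from the language server: the `KNOWN_FIELDS` subset, the function names, the use of `difflib`, and the 0.8 similarity cutoff are all assumptions chosen for the example.

```python
import difflib
import re

# Illustrative subset only; a real server would know many more fields.
KNOWN_FIELDS = ["Package", "Architecture", "Section", "Priority",
                "Package-Type", "Multi-Arch"]

# Matches "X-" style prefixes such as X-, XC-, XB-, XS- or XBS-.
_X_PREFIX = re.compile(r"^X[BCS]*-", re.IGNORECASE)


def normalize_field_name(name):
    """Strip an X- style prefix, so XC-Package-Type becomes Package-Type."""
    return _X_PREFIX.sub("", name)


def check_field_name(name, known=KNOWN_FIELDS):
    """Classify a field name as ("ok", canonical), ("typo", suggestion) or ("unknown", None)."""
    stripped = normalize_field_name(name).lower()
    by_lower = {field.lower(): field for field in known}
    if stripped in by_lower:
        return ("ok", by_lower[stripped])
    # difflib is a stand-in here for whatever typo detection is really used.
    close = difflib.get_close_matches(stripped, list(by_lower), n=1, cutoff=0.8)
    if close:
        return ("typo", by_lower[close[0]])
    return ("unknown", None)
```

With this sketch, `check_field_name("XC-Package-Type")` is treated exactly like `Package-Type`, while a misspelling such as `Pakage` yields a suggestion that a quick fix could offer.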
The "on-save trim end of line whitespace" now works. I had a logic bug in the server side code that made it submit "no change" edits to the editor.
The language server now provides "hover" documentation for field names. There is a small screenshot of this below. Sadly, emacs does not support markdown or, if it does, it does not announce the support for markdown. For now, all the documentation is always in markdown format and the language server will tag it as either markdown or plaintext depending on the announced support.
The language server now provides quick fixes for some of the more trivial problems such as deprecated fields or typos of fields and values.
Added more known fields including the XS-Autobuild field for non-free packages along with a link to the relevant devref section in its hover doc.
Despite its very limited feature set, I feel editing debian/control in emacs is now a much more pleasant experience. Coming back to the features that Otto requested, the above covers a grand total of zero. Sorry, Otto. It is not you, it is me.
- Diagnostics or linting of basic issues.
- Completion suggestions for all known field names that I could think of and values for some fields.
- Folding ranges (untested). This feature enables the editor to "fold" multiple lines. It is often used with multi-line comments and that is the feature currently supported.
- On save, trim trailing whitespace at the end of lines (untested). Might not be registered correctly on the server end.
Notable omissions at this time:
- An error marker for syntax errors.
- An error marker for missing a mandatory field like Package or Architecture. This also includes Standards-Version, which is admittedly mandatory by policy rather than tooling falling apart.
- An error marker for adding Multi-Arch: same to an Architecture: all package.
- Error marker for providing an unknown value to a field with a set of known values. As an example, writing foo in Multi-Arch would trigger this one.
- Warning marker for using deprecated fields such as DM-Upload-Allowed, or when setting a field to its default value for fields like Essential. The latter rule only applies to selected fields and notably Multi-Arch: no does not trigger a warning.
- Info level marker if a field like Priority duplicates the value of the Source paragraph.
- No errors are raised if a field does not have a value.
- No errors are raised if a field is duplicated inside a paragraph.
- No errors are raised if a field is used in the wrong paragraph.
- No spellchecking of the Description field.
- No understanding that Foo and X[CBS]-Foo are related. As an example, XC-Package-Type is completely ignored despite being the old name for Package-Type.
- Quick fixes to solve these problems... :)
Obviously, the setup should get easier over time. The first three bullet points should eventually get resolved by merges and uploads, meaning you end up with an apt install command instead of them. For the editor part, I would obviously love it if we can add snippets for editors to make them automatically pick up the language server when the relevant file is installed.
- Build and install the deb of the main branch of pygls from https://salsa.debian.org/debian/pygls The package is in NEW and hopefully this step will soon just be a regular apt install.
- Build and install the deb of the rts-locatable branch of my python-debian fork from https://salsa.debian.org/nthykier/python-debian There is a draft MR of it as well on the main repo.
- Build and install the deb of the lsp-support branch of debputy from https://salsa.debian.org/debian/debputy
- Configure your editor to run debputy lsp debian/control as the language server for debian/control. This depends on your editor. I figured out how to do it for emacs (see below). I also found a guide for neovim at https://neovim.io/doc/user/lsp. Note that debputy can be run from any directory here. The debian/control is a reference to the file format and not a concrete file in this case.
(with-eval-after-load 'eglot
(add-to-list 'eglot-server-programs
'(debian-control-mode . ("debputy" "lsp" "debian/control"))))
- [X] No errors are raised if a field does not have a value.
- [X] No errors are raised if a field is duplicated inside a paragraph.
- [X] No errors are raised if a field is used in the wrong paragraph.
- [ ] No spellchecking of the Description field.
- [X] No understanding that Foo and X[CBS]-Foo are related. As an example, XC-Package-Type is completely ignored despite being the old name for Package-Type.
- [X] Fixed the on-save trim end of line whitespace. Bug in the server end.
- [X] Hover text for field names
queer.af
domain registration by the Taliban, the fragile and difficult nature of country-code top-level domains (ccTLDs) has once again been comprehensively demonstrated.
Since many people may not be aware of the risks, I thought I'd give a solid explainer of the whole situation, and explain why you should, in general, not have anything to do with domains which are registered under ccTLDs.
https://
in your web browser's location bar).
It's the com in example.com
, or the af in queer.af
.
There are two kinds of TLDs: country-code TLDs (ccTLDs) and generic TLDs (gTLDs).
Despite all being TLDs, they're very different beasts under the hood.
queer.af
cancellation is interesting because, at the time the domain was reportedly registered, 2018, Afghanistan had what one might describe as, at least, a different political climate.
Since then, of course, things have changed, and the new bosses have decided to get a bit more active.
Those running queer.af
seem to have seen the writing on the wall, and were planning on moving to another, less fraught, domain, but hadn't completed that move when the Taliban came knocking.
.eu
, you have to be a resident of the EU.
When the UK ceased to be part of the EU, residents of the UK were no longer EU residents.
Cue much unhappiness, wailing, and gnashing of teeth when this was pointed out to Britons.
Some decided to give up their domains, and move to other parts of the Internet, while others managed to hold onto them by various legal sleight-of-hand (like having an EU company maintain the registration on their behalf).
In any event, all very unpleasant for everyone involved.
.sc
domain names from US$25 to US$75. No reason, no warning, just pay up.
.ly
.
These domain registrations weren't (and aren't) cheap, and it's hard to imagine that at least some of that money wasn't going to benefit the Gaddafi regime.
Similarly, the British Indian Ocean Territory, which has the io ccTLD, was created in a colonialist piece of chicanery that expelled thousands of native Chagossians from Diego Garcia.
Money from the registration of .io
domains doesn't go to the (former) residents of the Chagos islands, instead it gets paid to the UK government.
Again, I'm not trying to suggest that all gTLD operators are wonderful people, but it's not particularly likely that the direct beneficiaries of the operation of a gTLD stole an island chain and evicted the residents.
.au
namespace some years ago.
Essentially, while a ccTLD may have geographic connotations now, there's not a lot of guarantee that they won't fall victim to scope creep in the future.
Finally, it might be somewhat safer to register under a ccTLD if you live in the location involved.
At least then you might have a better idea of whether your domain is likely to get pulled out from underneath you.
Unfortunately, as the .eu
example shows, living somewhere today is no guarantee you'll still be living there tomorrow, even if you don't move house.
In short, I'd suggest sticking to gTLDs.
They're at least lower risk than ccTLDs.
apt install git gpg python3-debian python3-dacite
apt build-dep linux
linux
and kernel-team
repos:
git clone --depth 1 https://salsa.debian.org/kernel-team/linux
git clone --depth 1 https://salsa.debian.org/kernel-team/kernel-team
linux
directory:
debian/bin/genorig.py https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable.git
torvalds
repo if you're building an RC version instead:
debian/bin/genorig.py https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
debian/control
. The first
command will take a bit, and the second command will fail, but that's success, just as the output says.
debian/rules orig
debian/rules debian/control
debian/rules source
debian/config/
to see where your
changes should go. If it's a setting shared among multiple architectures, that
may be debian/config/config
. For x86-specific things, the file is
debian/config/amd64/config
. On aarch64, it is debian/config/arm64/config
. If in
doubt, you could try asking #debian-kernel
on IRC.
../kernel-team/utils/kconfigeditor2/process.py .
debian/rules source
. The
debian/build/source_rt/Kconfig
file is needed by the script:
Traceback (most recent call last):
  File "/tmp/linux/../kernel-team/utils/kconfigeditor2/process.py", line 19, in __init__
    menu = fs_menu[featureset or 'none']
           ~~~~~~~^^^^^^^^^^^^^^^^^^^^^^
KeyError: 'rt'

During handling of the above exception, another exception occurred:

[...]
FileNotFoundError: [Errno 2] No such file or directory: './debian/build/source_rt/Kconfig'
debian/rules source
process.py
should work fine and fix your config
file.
export MAKEFLAGS=-j$(nproc)
export DEB_BUILD_PROFILES='pkg.linux.nokerneldbg pkg.linux.nokerneldbginfo pkg.linux.notools nodoc'
dpkg-buildpackage -b -nc -uc
for x in $(gh api graphql --paginate -f query='query($endCursor:String) { organization(login:"myorg") {
    repositories(first: 100, after: $endCursor, isArchived:false) {
      pageInfo {
        hasNextPage
        endCursor
      }
      nodes {
        name
      }
    }
  }
}' --jq '.data.organization.repositories.nodes[].name'); do
    # Join the names of the repository's secrets into a comma-separated list.
    secrets=$(gh secret list --json name --jq '.[].name' -R "myorg/${x}" | tr '\n' ',')
    if ! [ -z "${secrets}" ]; then
        echo "${x},${secrets}"
    fi
done
Requests a list of all not archived repositories in a GitHub org and queries
repository secrets. If we find some we output the repo name and the
secrets in a comma separated list. Not real CSV, but good enough for further
processing. I've to admit it's kinda beautiful what you can do with the
gh cli by now. Sadly it seems the secrets are not yet available via GraphQL
(or I missed it in the docs), so I just use the gh cli to do the REST calls.
pyproject.toml
files, I wanted to investigate how the popularity of build
backends used in pyproject.toml
files evolved over the years since the
introduction of PEP-0517 in 2015.
Getting the data
Tom Forbes provides a huge
dataset that contains information
about every file within every release uploaded to PyPI. To
get the current dataset, we can use:
curl -L --remote-name-all $(curl -L "https://github.com/pypi-data/data/raw/main/links/dataset.txt")
describe select * from '*.parquet';
column_name      column_type  null
varchar          varchar      varchar
project_name     VARCHAR      YES
project_version  VARCHAR      YES
project_release  VARCHAR      YES
uploaded_on      TIMESTAMP    YES
path             VARCHAR      YES
archive_path     VARCHAR      YES
size             UBIGINT      YES
hash             BLOB         YES
skip_reason      VARCHAR      YES
lines            UBIGINT      YES
repository       UINTEGER     YES
11 rows          6 columns
pyproject.toml
files that are in the project's root directory. Since we'll still have to
download the actual files, we need to get the path
and the repository
to
construct the corresponding URL to the mirror that contains all files in a
bunch of huge git repositories. Some files are not available on the mirrors; to
skip these, we only take files where the skip_reason
is empty. We also care
about the timestamp of the upload (uploaded_on
) and the hash
to avoid
processing identical files twice:
select
path,
hash,
uploaded_on,
repository
from '*.parquet'
where
skip_reason == '' and
lower(string_split(path, '/')[-1]) == 'pyproject.toml' and
len(string_split(path, '/')) == 5
order by uploaded_on desc
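The two path conditions in the query can be restated as a small Python predicate, which is handy for spot-checking individual rows. This is only a sketch: the function name is made up, and the example paths in the usage note are hypothetical values that merely have the right number of components.

```python
def is_root_pyproject(path):
    """Mirror the query's filter: five path components, the last one pyproject.toml."""
    parts = path.split("/")
    # len(...) == 5 keeps only files directly in the project's root directory;
    # the name comparison is case-insensitive, like lower(...) in the query.
    return len(parts) == 5 and parts[-1].lower() == "pyproject.toml"
```

For instance, a hypothetical path such as `packages/foo/foo-1.0.tar.gz/foo-1.0/pyproject.toml` passes, while the same file one directory deeper is rejected.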
repository
and path
, we can now construct a URL from which we
can fetch the actual file for further processing:
url = f"https://raw.githubusercontent.com/pypi-data/pypi-mirror-{repository}/code/{path}"
pyproject.toml
files and parse them to read
the build-backend
into a dictionary mapping the file-hash
to the build
backend. Downloads on GitHub are rate-limited, so downloading 1.2M files
will take a couple of days. By skipping files with a hash we've already
processed, we can avoid downloading the same file more than once, cutting the
required downloads by circa 50%.
Results
Assuming the data is complete and my analysis is sound, these are the findings:
There is a surprising number of build backends in use, but the overall number
of uploads per build backend decreases quickly, with a long tail of single
uploads:
>>> results.backend.value_counts()
backend
setuptools 701550
poetry 380830
hatchling 56917
flit 36223
pdm 11437
maturin 9796
jupyter 1707
mesonpy 625
scikit 556
...
postry 1
tree 1
setuptoos 1
neuron 1
avalon 1
maturimaturinn 1
jsonpath 1
ha 1
pyo3 1
Name: count, Length: 73, dtype: int64
pyproject.toml
files. During that early
period, Flit started as the most popular build backend, but was eventually
displaced by Setuptools and Poetry.
From 2020 onwards, the overall usage of pyproject.toml
files increased
significantly. By the end of 2022, the share of Setuptools peaked at 70%.
After 2020, other build backends experienced a gradual rise in popularity.
Among these, Hatch emerged as a notable contender, steadily gaining
traction and ultimately stabilizing at 10%.
We can also look into the absolute distribution of build backends over time:
The plot shows that Setuptools has the strongest growth trajectory, surpassing
all other build backends. Poetry and Hatch are growing at a comparable rate,
but since Hatch started roughly 4 years after Poetry, it's lagging behind in
popularity. Despite not being among the most widely used backends anymore, Flit
maintains a steady and consistent growth pattern, indicating its enduring
relevance in the Python packaging landscape.
The script for downloading and analyzing the data can be found in my GitHub
repository. It contains the results of the DuckDB query (so you
don't have to download the full dataset) and the pickled dictionary, mapping
the file hashes to the build backends, saving you days for downloading and
analyzing the pyproject.toml
files yourself.