build-id
(more specifically, the ELF note
.note.gnu.build-id
) wound up in the Debian apt archive metadata.
I ve always thought this was super cool, and seeing as how Michael Stapelberg
blogged
some great pointers around the ecosystem, including the fancy new debuginfod
service, and the
find-dbgsym-packages
helper, which uses these same headers, I don t think I m the only one.
At work I ve been using a lot of rust,
specifically, async rust using tokio. To try and work on
my style, and to dig deeper into the how and why of the decisions made in these
frameworks, I ve decided to hack up a project that I ve wanted to do ever
since 2015 write a debug filesystem. Let s get to it.
9p
, the network protocol for serving
a filesystem over a network. This leads to all sorts of fun programs, like the
Plan 9 ftp
client being a 9p server you mount the ftp server and access
files like any other files. It s kinda like if fuse were more fully a part
of how the operating system worked, but fuse is all running client-side. With
9p there s a single client, and different servers that you can connect to,
which may be backed by a hard drive, remote resources over something like SFTP, FTP, HTTP or even purely synthetic.
The interesting (maybe sad?) part here is that 9p wound up outliving Plan 9
in terms of adoption 9p
is in all sorts of places folks don t usually expect.
For instance, the Windows Subsystem for Linux uses the 9p protocol to share
files between Windows and Linux. ChromeOS uses it to share files with Crostini,
and qemu uses 9p (virtio-p9
) to share files between guest and host. If you re
noticing a pattern here, you d be right; for some reason 9p is the go-to protocol
to exchange files between hypervisor and guest. Why? I have no idea, except maybe
due to being designed well, simple to implement, and it s a lot easier to validate the data being shared
and validate security boundaries. Simplicity has its value.
As a result, there s a lot of lingering 9p support kicking around. Turns out
Linux can even handle mounting 9p filesystems out of the box. This means that I
can deploy a filesystem to my LAN or my localhost
by running a process on top
of a computer that needs nothing special, and mount it over the network on an
unmodified machine unlike fuse
, where you d need client-specific software
to run in order to mount the directory. For instance, let s mount a 9p
filesystem running on my localhost machine, serving requests on 127.0.0.1:564
(tcp) that goes by the name mountpointname
to /mnt
.
$ mount -t 9p \ -o trans=tcp,port=564,version=9p2000.u,aname=mountpointname \ 127.0.0.1 \ /mntLinux will mount away, and attach to the filesystem as the root user, and by default, attach to that mountpoint again for each local user that attempts to use it. Nifty, right? I think so. The server is able to keep track of per-user access and authorization along with the host OS.
rust
and tokio
specifically,
I opted to implement the whole stack myself, without third party libraries on
the critical path where I could avoid it. The 9p protocol (sometimes called
Styx
, the original name for it) is incredibly simple. It s a series of client
to server requests, which receive a server to client response. These are,
respectively, T
messages, which t
ransmit a request to the server, which
trigger an R
message in response (R
eply messages). These messages are
TLV payload
with a very straight forward structure so straight forward, in fact, that I
was able to implement a working server off nothing more than a handful of man
pages.
Later on after the basics worked, I found a more complete
spec page
that contains more information about the
unix specific variant
that I opted to use (9P2000.u
rather than 9P2000
) due to the level
of Linux
specific support for the 9P2000.u
variant over the 9P2000
protocol.
rust
and tokio
running i/o for an HTTP
and WebRTC
server. I figured I d pick something
fairly similar to write my filesystem with, since 9P
can be implemented
on basically anything with I/O. That means tokio
tcp server bits, which
construct and use a 9p
server, which has an idiomatic Rusty API that
partially abstracts the raw R
and T
messages, but not so much as to
cause issues with hiding implementation possibilities. At each abstraction
level, there s an escape hatch allowing someone to implement any of
the layers if required. I called this framework
arigato which can be found over on
docs.rs and
crates.io.
/// Simplified version of the arigato File trait; this isn't actually
/// the same trait; there's some small cosmetic differences. The
/// actual trait can be found at:
///
/// https://docs.rs/arigato/latest/arigato/server/trait.File.html
trait File
/// OpenFile is the type returned by this File via an Open call.
type OpenFile: OpenFile;
/// Return the 9p Qid for this file. A file is the same if the Qid is
/// the same. A Qid contains information about the mode of the file,
/// version of the file, and a unique 64 bit identifier.
fn qid(&self) -> Qid;
/// Construct the 9p Stat struct with metadata about a file.
async fn stat(&self) -> FileResult<Stat>;
/// Attempt to update the file metadata.
async fn wstat(&mut self, s: &Stat) -> FileResult<()>;
/// Traverse the filesystem tree.
async fn walk(&self, path: &[&str]) -> FileResult<(Option<Self>, Vec<Self>)>;
/// Request that a file's reference be removed from the file tree.
async fn unlink(&mut self) -> FileResult<()>;
/// Create a file at a specific location in the file tree.
async fn create(
&mut self,
name: &str,
perm: u16,
ty: FileType,
mode: OpenMode,
extension: &str,
) -> FileResult<Self>;
/// Open the File, returning a handle to the open file, which handles
/// file i/o. This is split into a second type since it is genuinely
/// unrelated -- and the fact that a file is Open or Closed can be
/// handled by the arigato server for us.
async fn open(&mut self, mode: OpenMode) -> FileResult<Self::OpenFile>;
/// Simplified version of the arigato OpenFile trait; this isn't actually
/// the same trait; there's some small cosmetic differences. The
/// actual trait can be found at:
///
/// https://docs.rs/arigato/latest/arigato/server/trait.OpenFile.html
trait OpenFile
/// iounit to report for this file. The iounit reported is used for Read
/// or Write operations to signal, if non-zero, the maximum size that is
/// guaranteed to be transferred atomically.
fn iounit(&self) -> u32;
/// Read some number of bytes up to buf.len() from the provided
/// offset of the underlying file. The number of bytes read is
/// returned.
async fn read_at(
&mut self,
buf: &mut [u8],
offset: u64,
) -> FileResult<u32>;
/// Write some number of bytes up to buf.len() from the provided
/// offset of the underlying file. The number of bytes written
/// is returned.
fn write_at(
&mut self,
buf: &mut [u8],
offset: u64,
) -> FileResult<u32>;
arigato
to implement a 9p
filesystem we ll call
debugfs that will serve all the debug
files shipped according to the Packages
metadata from the apt
archive. We ll
fetch the Packages
file and construct a filesystem based on the reported
Build-Id
entries. For those who don t know much about how an apt
repo
works, here s the 2-second crash course on what we re doing. The first is to
fetch the Packages
file, which is specific to a binary architecture (such as
amd64
, arm64
or riscv64
). That architecture
is specific to a
component
(such as main
, contrib
or non-free
). That component
is
specific to a suite
, such as stable
, unstable
or any of its aliases
(bullseye
, bookworm
, etc). Let s take a look at the Packages.xz
file for
the unstable-debug
suite
, main
component
, for all amd64
binaries.
$ curl \
https://deb.debian.org/debian-debug/dists/unstable-debug/main/binary-amd64/Packages.xz \
unxz
This will return the Debian-style
rfc2822-like headers,
which is an export of the metadata contained inside each .deb
file which
apt
(or other tools that can use the apt
repo format) use to fetch
information about debs. Let s take a look at the debug headers for the
netlabel-tools
package in unstable
which is a package named
netlabel-tools-dbgsym
in unstable-debug
.
Package: netlabel-tools-dbgsym
Source: netlabel-tools (0.30.0-1)
Version: 0.30.0-1+b1
Installed-Size: 79
Maintainer: Paul Tagliamonte <paultag@debian.org>
Architecture: amd64
Depends: netlabel-tools (= 0.30.0-1+b1)
Description: debug symbols for netlabel-tools
Auto-Built-Package: debug-symbols
Build-Ids: e59f81f6573dadd5d95a6e4474d9388ab2777e2a
Description-md5: a0e587a0cf730c88a4010f78562e6db7
Section: debug
Priority: optional
Filename: pool/main/n/netlabel-tools/netlabel-tools-dbgsym_0.30.0-1+b1_amd64.deb
Size: 62776
SHA256: 0e9bdb087617f0350995a84fb9aa84541bc4df45c6cd717f2157aa83711d0c60
So here, we can parse the package headers in the Packages.xz
file, and store,
for each Build-Id
, the Filename
where we can fetch the .deb
at. Each
.deb
contains a number of files but we re only really interested in the
files inside the .deb
located at or under /usr/lib/debug/.build-id/
,
which you can find in debugfs
under
rfc822.rs. It s
crude, and very single-purpose, but I m feeling a bit lazy.
.deb
file is a special type of
.ar file, that contains (usually)
three files inside debian-binary
, control.tar.xz
and data.tar.xz
.
The core of an .ar
file is a fixed size (60 byte
) entry header,
followed by the specified size
number of bytes.
[8 byte .ar file magic]
[60 byte entry header]
[N bytes of data]
[60 byte entry header]
[N bytes of data]
[60 byte entry header]
[N bytes of data]
...
First up was to implement a basic ar
parser in
ar.rs. Before we get
into using it to parse a deb, as a quick diversion, let s break apart a .deb
file by hand something that is a bit of a rite of passage (or at least it
used to be? I m getting old) during the Debian nm (new member) process, to take
a look at where exactly the .debug
file lives inside the .deb
file.
$ ar x netlabel-tools-dbgsym_0.30.0-1+b1_amd64.deb
$ ls
control.tar.xz debian-binary
data.tar.xz netlabel-tools-dbgsym_0.30.0-1+b1_amd64.deb
$ tar --list -f data.tar.xz grep '.debug$'
./usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
Since we know quite a bit about the structure of a .deb
file, and I had to
implement support from scratch anyway, I opted to implement a (very!) basic
debfile parser using HTTP Range requests. HTTP Range requests, if supported by
the server (denoted by a accept-ranges: bytes
HTTP header in response to an
HTTP HEAD
request to that file) means that we can add a header such as
range: bytes=8-68
to specifically request that the returned GET
body be the
byte range provided (in the above case, the bytes starting from byte offset 8
until byte offset 68
). This means we can fetch just the ar file entry from
the .deb
file until we get to the file inside the .deb
we are interested in
(in our case, the data.tar.xz
file) at which point we can request the body
of that file with a final range
request. I wound up writing a struct to
handle a read_at
-style API surface in
hrange.rs, which
we can pair with ar.rs
above and start to find our data in the .deb
remotely
without downloading and unpacking the .deb
at all.
After we have the body of the data.tar.xz
coming back through the HTTP
response, we get to pipe it through an xz
decompressor (this kinda sucked in
Rust, since a tokio
AsyncRead
is not the same as an http
Body response is
not the same as std::io::Read
, is not the same as an async (or sync)
Iterator
is not the same as what the xz2
crate expects; leading me to read
blocks of data to a buffer and stuff them through the decoder by looping over
the buffer for each lzma2
packet in a loop), and tar
file parser (similarly
troublesome). From there we get to iterate over all entries in the tarfile,
stopping when we reach our file of interest. Since we can t seek, but gdb
needs to, we ll pull it out of the stream into a Cursor<Vec<u8>>
in-memory
and pass a handle to it back to the user.
From here on out its a matter of
gluing together a File traited struct
in debugfs
, and serving the filesystem over TCP using arigato
. Done
deal!
xz
file. What s interesting is xz
has a great primitive
to solve this specific problem (specifically, use a block size that allows you
to seek to the block as close to your desired seek position just before it,
only discarding at most block size - 1
bytes), but data.tar.xz
files
generated by dpkg
appear to have a single mega-huge block for the whole file.
I don t know why I would have expected any different, in retrospect. That means
that this now devolves into the base case of How do I seek around an lzma2
compressed data stream ; which is a lot more complex of a question.
Thankfully, notoriously brilliant tianon was
nice enough to introduce me to Jon Johnson
who did something super similar adapted a technique to seek inside a
compressed gzip
file, which lets his service
oci.dag.dev
seek through Docker container images super fast based on some prior work
such as soci-snapshotter
, gztool
, and
zran.c.
He also pulled this party trick off for apk based distros
over at apk.dag.dev, which seems apropos.
Jon was nice enough to publish a lot of his work on this specifically in a
central place under the name targz
on his GitHub, which has been a ton of fun to read through.
The gist is that, by dumping the decompressor s state (window of previous
bytes, in-memory data derived from the last N-1 bytes
) at specific
checkpoints along with the compressed data stream offset in bytes and
decompressed offset in bytes, one can seek to that checkpoint in the compressed
stream and pick up where you left off creating a similar block mechanism
against the wishes of gzip. It means you d need to do an O(n)
run over the
file, but every request after that will be sped up according to the number
of checkpoints you ve taken.
Given the complexity of xz
and lzma2
, I don t think this is possible
for me at the moment especially given most of the files I ll be requesting
will not be loaded from again especially when I can just cache the debug
header by Build-Id
. I want to implement this (because I m generally curious
and Jon has a way of getting someone excited about compression schemes, which
is not a sentence I thought I d ever say out loud), but for now I m going to
move on without this optimization. Such a shame, since it kills a lot of the
work that went into seeking around the .deb
file in the first place, given
the debian-binary
and control.tar.gz
members are so small.
debugfs
out for a spin! First, we need
to mount the filesystem. It even works on an entirely unmodified, stock
Debian box on my LAN, which is huge. Let s take it for a spin:
$ mount \
-t 9p \
-o trans=tcp,version=9p2000.u,aname=unstable-debug \
192.168.0.2 \
/usr/lib/debug/.build-id/
And, let s prove to ourselves that this actually mounted before we go
trying to use it:
$ mount grep build-id
192.168.0.2 on /usr/lib/debug/.build-id type 9p (rw,relatime,aname=unstable-debug,access=user,trans=tcp,version=9p2000.u,port=564)
Slick. We ve got an open connection to the server, where our host
will keep a connection alive as root, attached to the filesystem provided
in aname
. Let s take a look at it.
$ ls /usr/lib/debug/.build-id/
00 0d 1a 27 34 41 4e 5b 68 75 82 8E 9b a8 b5 c2 CE db e7 f3
01 0e 1b 28 35 42 4f 5c 69 76 83 8f 9c a9 b6 c3 cf dc E7 f4
02 0f 1c 29 36 43 50 5d 6a 77 84 90 9d aa b7 c4 d0 dd e8 f5
03 10 1d 2a 37 44 51 5e 6b 78 85 91 9e ab b8 c5 d1 de e9 f6
04 11 1e 2b 38 45 52 5f 6c 79 86 92 9f ac b9 c6 d2 df ea f7
05 12 1f 2c 39 46 53 60 6d 7a 87 93 a0 ad ba c7 d3 e0 eb f8
06 13 20 2d 3a 47 54 61 6e 7b 88 94 a1 ae bb c8 d4 e1 ec f9
07 14 21 2e 3b 48 55 62 6f 7c 89 95 a2 af bc c9 d5 e2 ed fa
08 15 22 2f 3c 49 56 63 70 7d 8a 96 a3 b0 bd ca d6 e3 ee fb
09 16 23 30 3d 4a 57 64 71 7e 8b 97 a4 b1 be cb d7 e4 ef fc
0a 17 24 31 3e 4b 58 65 72 7f 8c 98 a5 b2 bf cc d8 E4 f0 fd
0b 18 25 32 3f 4c 59 66 73 80 8d 99 a6 b3 c0 cd d9 e5 f1 fe
0c 19 26 33 40 4d 5a 67 74 81 8e 9a a7 b4 c1 ce da e6 f2 ff
Outstanding. Let s try using gdb
to debug a binary that was provided by
the Debian
archive, and see if it ll load the ELF by build-id
from the
right .deb
in the unstable-debug
suite:
$ gdb -q /usr/sbin/netlabelctl
Reading symbols from /usr/sbin/netlabelctl...
Reading symbols from /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug...
(gdb)
Yes! Yes it will!
$ file /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
/usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug: ELF 64-bit LSB shared object, x86-64, version 1 (SYSV), dynamically linked, interpreter *empty*, BuildID[sha1]=e59f81f6573dadd5d95a6e4474d9388ab2777e2a, for GNU/Linux 3.2.0, with debug_info, not stripped
9p
is mainline, which is great, but it s not robust.
Network issues or server restarts will wedge the mountpoint (Linux can t
reconnect when the tcp connection breaks), and things that work fine on local
filesystems get translated in a way that causes a lot of network chatter for
instance, just due to the way the syscalls are translated, doing an ls
, will
result in a stat
call for each file in the directory, even though linux had
just got a stat
entry for every file while it was resolving directory names.
On top of that, Linux will serialize all I/O with the server, so there s no
concurrent requests for file information, writes, or reads pending at the same
time to the server; and read
and write
throughput will degrade as latency
increases due to increasing round-trip time, even though there are offsets
included in the read
and write
calls. It works well enough, but is
frustrating to run up against, since there s not a lot you can do server-side
to help with this beyond implementing the 9P2000.L
variant (which, maybe is
worth it).
tar
file and found the correct member, so for most files, we don t
know the real size to report when getting a stat
. We can t parse the tarfiles
for every stat
call, since that d make ls
even slower (bummer). Only
hiccup is that when I report a filesize of zero, gdb
throws a bit of a
fit; let s try with a size of 0
to start:
$ ls -lah /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
-r--r--r-- 1 root root 0 Dec 31 1969 /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
$ gdb -q /usr/sbin/netlabelctl
Reading symbols from /usr/sbin/netlabelctl...
Reading symbols from /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug...
warning: Discarding section .note.gnu.build-id which has a section size (24) larger than the file size [in module /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug]
[...]
This obviously won t work since gdb
will throw away all our hard work because
of stat
s output, and neither will loading the real size of the underlying
file. That only leaves us with hardcoding a file size and hope nothing else
breaks significantly as a result. Let s try it again:
$ ls -lah /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
-r--r--r-- 1 root root 954M Dec 31 1969 /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug
$ gdb -q /usr/sbin/netlabelctl
Reading symbols from /usr/sbin/netlabelctl...
Reading symbols from /usr/lib/debug/.build-id/e5/9f81f6573dadd5d95a6e4474d9388ab2777e2a.debug...
(gdb)
Much better. I mean, terrible but better. Better for now, anyway.
9p
arigato
-based filesystems for use around my LAN, but I
don t think I ll be moving to use debugfs
until I can figure out how to
ensure the connection is more resilient to changing networks, server restarts
and fixes on i/o performance. I think it was a useful exercise and is a pretty
great hack, but I don t think this ll be shipping anywhere anytime soon.
Along with me publishing this post, I ve pushed up all my repos; so you
should be able to play along at home! There s a lot more work to be done
on arigato
; but it does handshake and successfully export a working
9P2000.u
filesystem. Check it out on on my github at
arigato,
debugfs
and also on crates.io
and docs.rs.
At least I can say I was here and I got it working after all these years.
So far, I have not found any reproducibility issues; everything I tested I was able to get to build bit-for-bit identical with what is in the Debian archive.That is to say, reproducibility testing permitted Vagrant and Debian to claim with some confidence that builds performed when this vulnerable version of XZ was installed were not interfered with.
Functional package managers (FPMs) and reproducible builds (R-B) are technologies and methodologies that are conceptually very different from the traditional software deployment model, and that have promising properties for software supply chain security. This thesis aims to evaluate the impact of FPMs and R-B on the security of the software supply chain and propose improvements to the FPM model to further improve trust in the open source supply chain. PDFJulien s paper poses a number of research questions on how the model of distributions such as GNU Guix and NixOS can be leveraged to further improve the safety of the software supply chain , etc.
normal
to a new level of wishlist
. In addition, 28 reviews of Debian packages were added, 38 were updated and 23 were removed this month adding to ever-growing knowledge about identified issues. As part of this effort, a number of issue types were updated, including Chris Lamb adding a new ocaml_include_directories
toolchain issue [ ] and James Addison adding a new filesystem_order_in_java_jar_manifest_mf_include_resource
issue [ ] and updating the random_uuid_in_notebooks_generated_by_nbsphinx
to reference a relevant discussion thread [ ].
In addition, Roland Clobus posted his 24th status update of reproducible Debian ISO images. Roland highlights that the images for Debian unstable often cannot be generated due to changes in that distribution related to the 64-bit time_t
transition.
Lastly, Bernhard M. Wiedemann posted another monthly update for his reproducibility work in openSUSE.
buildinfo
file was present. Arnout Engelen responded with some details.
diff-zip-meta.py
tool to expose extra timestamps embedded in .zip
and .apk
metadata.
CITATION.cff
file. Pol also added an substantial new section to the buy in page documenting the role of Software Bill of Materials (SBOMs) and ephemeral development environments. [ ][ ]
amd64
virtual machines. [ ][ ][ ]
set
data structure is also affected by the PYTHONHASHSEED
functionality. [ ]
259
, 260
and 261
to Debian and made the following additional changes:
zipdetails
tool from the Perl distribution. Thanks to Fay Stegerman and Larry Doolittle et al. for the pointer and thread about this tool. [ ]File.recognizes
so we actually perform the filename check for GNU R data files. [ ].rdb
file without an equivalent .rdx
file. (#1066991).pyc
file with an empty one. [ ].epub
tests after supporting the new zipdetails
tool. [ ]test_zip.py
. [ ]zipfile
module changed to detect potentially insecure overlapping entries within .zip
files. (#362)
Chris Lamb also updated the trydiffoscope
command line client, dropping a build-dependency on the deprecated python3-distutils
package to fix Debian bug #1065988 [ ], taking a moment to also refresh the packaging to the latest Debian standards [ ]. Finally, Vagrant Cascadian submitted an update for diffoscope version 260 in GNU Guix. [ ]
helm
(SSL-related build failure)java-21-openjdk
(parallelism)libressl
(SSL-related build failure)nfdump
(date issue)python-django-q
(avoid stuck build)python-smart-open
(fails to build on single-CPU machines)python-stdnum
(fails to build in 2039)python-yarl
(regression)qemu
(build failure)rabbitmq-java-client
(with Fridrich Strba; Maven timestamp issue)rmw
(build fails in 2038)warewulf
(with Egbert Eich; cpio
modification time and inode issue)wxWidgets
(fails to build in 2038)python-quantities
.gnome-maps
.tox
.q2cli
.mpl-sphinx-theme
.woof-doom
.bochs
.storm-lang
.librsvg
.gretl
.postfix
.node-function-bind
.python-pysaml2
.golang-github-stvp-tempredis
.matplotlib
.pathos
.rdflib
.xonsh
.maven-bundle-plugin
. (This patch was then uploaded by Mattia Rizzollo.)geany
(toolchain-related issue for glfw
)%check
section, thus failing when built with the --no-checks
option. Only half of all openSUSE packages were tested so far, but a large number of bugs were filed, including ones against caddy
, exiv2
, gnome-disk-utility
, grisbi
, gsl
, itinerary
, kosmindoormap
, libQuotient
, med-tools
, plasma6-disks
, pspp
, python-pypuppetdb
, python-urlextract
, rsync
, vagrant-libvirt
and xsimd
.
Similarly, Jean-Pierre De Jesus DIAZ employed reproducible builds techniques in order to test a proposed refactor of the ath9k-htc-firmware
package. As the change produced bit-for-bit identical binaries to the previously shipped pre-built binaries:
I don t have the hardware to test this firmware, but the build produces the same hashes for the firmware so it s safe to say that the firmware should keep working.
armhf
again. [ ][ ]i386
architecture queue. [ ]stats_buildinfo.png
graph once per day. [ ][ ]systemctl
with new systemd-based services. [ ]armhf
and i386
continuous integration tests in order to get some stability back. [ ]deb.debian.org
CDN everywhere. [ ]zst
to the list of packages which are false-positive diskspace issues. [ ]Bot
in the userAgent
for Git. (Re: #929013). [ ]tmpfs
size on our OUSL nodes. [ ]reproducible_build
service. [ ][ ]OOMPolicy=continue
and OOMScoreAdjust=-1000
for both the Jenkins and the reproducible_build
service. [ ]systemd
slice to group all relevant services. [ ][ ]shellcheck
tool. [ ]systemd-run
to handle diffoscope s exit codes specially. [ ]pgrep
tool over grepping the output of ps
. [ ]i386
and armhf
architecture builders. [ ][ ]armhf
architecture due to the time_t
transition. [ ]i386
& armhf
workers. [ ][ ][ ]pbuilder
updates in the unstable distribution, but only on the armhf
architecture. [ ]systemd
service operates. [ ][ ]powercycle_x86_nodes.py
script to use the new IONOS API and its new Python bindings. [ ]stunnel
tool anymore, it shouldn t be needed by anything anymore. [ ]arm64
architecture host keys. [ ]-
) in a variable in order to allow for tags in openQA. [ ]#reproducible-builds
on irc.oftc.net
.
rb-general@lists.reproducible-builds.org
I usually talk about me whenever I am talking about answering the question who am I, I usually say like I am a librarian turned free software enthusiast and a Debian Developer. So I had no technical background and I learned, I was introduced to free software through my husband and then I learned Debian packaging, and eventually I became a Debian Developer. So I always give my example to people who say I am not technically inclined, I don't have technical background so I can't contribute to free software. So yeah, that's what I refer to myself.For the next question, could you tell me what do you do in Debian, and could you mention your story up until here today? [Sruthi]:
Okay, so let me start from my initial days in Debian. I started contributing to Debian, my first contribution was a Tibetan font. We went to a Tibetan place and they were saying they didn't have a font in Linux. So that's how I started contributing. Then I moved on to Ruby packages, then I have some JavaScript and Go packages, all dependencies of GitLab. So I was involved with maintaining GitLab for some time, now I'm not very active there. But yeah, so GitLab was the main package I was contributing to since I contributed since 2016 to maybe like 2020 or something. Later I have come [over to] packaging. Now I am part of some of the teams, delegated teams, like community team and outreach team, as well as the Debconf committee. And the biggest, I think, my activity in Debian, I would say is organizing Debconf 2023. So it was a great experience and yeah, so that's my story in Debian.So what are three key terms about you and your candidacy? [Sruthi]:
Okay, let me first think about it. For candidacy, I can start with diversity is one point I started expressing from the first time I contested for DPL. But to be honest, that's the main point I want to bring.[Yashraj]:
So for diversity, if you could break down your thoughts on diversity and make them, [about] your three points including diversity.[Sruthi]:
So in addition to, eventually when starting it was just diversity. Now I have like a bit more ideas, like community, like I want to be a leader for the Debian community. More than, I don't know, maybe people may not agree, but I would say I want to be a leader of Debian community rather than a Debian operating system. I connect to community more and third point I would say.The term of a DPL lasts for an year. So what do you think during, what would you try to do during that, that you can't do from your position now? [Sruthi]:
Okay. So I, like, I am very happy with the structure of Debian and how things work in Debian. Like you can do almost a lot of things, like almost all things without being a DPL. Whatever change you want to bring about or whatever you want to do, you can do without being a DPL. Anyone, like every DD has the same rights. Only things I feel [the] DPL has hold on are mainly the budget or the funding part, which like, that's where they do the decision making part. And then comes like, and one advantage of DPL driving some idea is that somehow people tend to listen to that with more, like, tend to give more attention to what DPL is saying rather than a normal DD. So I wanted to, like, I have answered some of the questions on how to, how I plan to do the financial budgeting part, how I want to handle, like, and the other thing is using the extra attention that I get as a DPL, I would like to obviously start with the diversity aspect in Debian. And yeah, like, I, what I want to do is not, like, be a leader and say, like, take Debian to one direction where I want to go, but I would rather take suggestions and inputs from the whole community and go about with that. So yes, that's what I would say.And taking a less serious question now, what is your preferred text editor? [Sruthi]:
Vim.[Yashraj]:
Vim, wholeheartedly team Vim?[Sruthi]:
Yes.[Yashraj]:
Great. Well, this was made in Vim, all the text for this.[Sruthi]:
So, like, since you mentioned extra data, I'll give my example, like, it's just a fun note, when I started contributing to Debian, as I mentioned, I didn't have any knowledge about free software, like Debian, and I was not used to even using Linux. So, and I didn't have experience with these text editors. So, when I started contributing, I used to do the editing part using gedit. So, that's how I started. Eventually, I moved to Nano, and once I reached Vim, I didn't move on.Team Vim. Next question. What, what do you think is the importance of the Debian project in the world today? And where would you like to see it in 10 years, like 10 years into the future? [Sruthi]:
Okay. So, Debian, as we all know, is referred to as the universal operating system without, like, it is said for a reason. We have hundreds and hundreds of operating systems, like Linux, distributions based on Debian. So, I believe Debian, like even now, Debian has good influence on the, at least on the Linux or Linux ecosystem. So, what we implement in Debian has, like, is going to affect quite a lot of, like, a very good percentage of people using Linux. So, yes. So, I think Debian is one of the leading Linux distributions. And I think in 10 years, we should be able to reach a position, like, where we are not, like, even now, like, even these many years after having Linux, we face a lot of problems in newer and newer hardware coming up and installing on them is a big problem. Like, firmwares and all those things are getting more and more complicated. Like, it should be getting simpler, but it's getting more and more complicated. So, I, one thing I would imagine, like, I don't know if we will ever reach there, but I would imagine that eventually with the Debian, we should be able to have some, at least a few of the hardware developers or hardware producers have Debian pre-installed and those kind of things. Like, not, like, become, I'm not saying it's all, it's also available right now. What I'm saying is that it becomes prominent enough to be opted as, like, default distro.What part of Debian has made you And what part of the project has kept you going all through these years? [Sruthi]:
Okay. So, I started to contribute in 2016, and I was part of the team doing GitLab packaging, and we did have a lot of training workshops and those kind of things within India. And I was, like, I had interacted with some of the Indian DDs, but I never got, like, even through chat or mail. I didn't have a lot of interaction with the rest of the world, DDs. And the 2019 Debconf changed my whole perspective about Debian. Before that, I wasn't, like, even, I was interested in free software. I was doing the technical stuff and all. But after DebConf, my whole idea has been, like, my focus changed to the community. Debian community is a very welcoming, very interesting community to be with. And so, I believe that, like, 2019 DebConf was a for me. And that kept, from 2019, my focus has been to how to support, like, how, I moved to the community part of Debian from there. Then in 2020 I became part of the community team, and, like, I started being part of other teams. So, these, I would say, the Debian community is the one, like, aspect of Debian that keeps me whole, keeps me held on to the Debian ecosystem as a whole.Continuing to speak about Debian, what do you think, what is the first thing that comes to your mind when you think of Debian, like, the word, the community, what's the first thing? [Sruthi]:
I think I may sound like a broken record or something.[Yashraj]:
No, no.[Sruthi]:
Again, I would say the Debian community, like, it's the people who makes Debian, that makes Debian special. Like, apart from that, if I say, I would say I'm very, like, one part of Debian that makes me very happy is the, how the governing system of Debian works, the Debian constitution and all those things, like, it's a very unique thing for Debian. And, and it's like, when people say you can't work without a proper, like, establishment or even somebody deciding everything for you, it's difficult. When people say, like, we have been, Debian has been proving it for quite a long time now, that it's possible. So, so that's one thing I believe, like, that's one unique point. And I am very proud about that.What areas do you think Debian is failing in, how can it (that standing) be improved? [Sruthi]:
So, I think where Debian is failing now is getting new people into Debian. Like, I don't remember, like, exactly the answer. But I remember hearing someone mention, like, the average age of a Debian Developer is, like, above 40 or 45 or something, like, exact age, I don't remember. But it's like, Debian is getting old. Like, the people in Debian are getting old and we are not getting enough of new people into Debian. And that's very important to have people, like, new people coming up. Otherwise, eventually, like, after a few years, nobody, like, we won't have enough people to take the project forward. So, yeah, I believe that is where we need to work on. We are doing some efforts, like, being part of GSOC or outreachy and having maybe other events, like, local events. Like, we used to have a lot of Debian packaging workshops in India. And those kind of, I think, in Brazil and all, they all have, like, local communities are doing. But we are not very successful in retaining the people who maybe come and try out things. But we are not very good at retaining the people, like, retaining people who come. So, we need to work on those things. Right now, I don't have a solid answer for that. But one thing, like, I was thinking about is, like, having a Debian specific outreach project, wherein the focus will be about the Debian, like, starting will be more on, like, usually what happens in GSOC and outreach is that people come, have the, do the contributions, and they go back. Like, they don't have that connection with the Debian, like, Debian community or Debian project. So, what I envision with these, the Debian outreach, the Debian specific outreach is that we have some part of the internship, like, even before starting the internship, we have some sessions and, like, with the people in Debian having, like, getting them introduced to the Debian philosophy and Debian community and Debian, how Debian works. And those things, we focus on that. And then we move on to the technical internship parts. So, I believe this could do some good in having, like, when you have people you can connect to, you tend to stay back in a project mode. When you feel something more than, like, right now, we have so many technical stuff to do, like, the choice for a college student is endless. So, if they want, if they stay back for something, like, maybe for Debian, I would say, we need to have them connected to the Debian project before we go into technical parts. Like, technical parts, like, there are other things as well, where they can go and do the technical part, but, like, they can come here, like, yeah. So, that's what I was saying. Focused outreach projects is one thing. That's just one. That's not enough. We need more of, like, more ideas to have more new people come up. And I'm very happy with, like, the DebConf thing. We tend to get more and more people from the places where we have a DebConf. Brazil is an example. After the Debconf, they have quite a good improvement on Debian contributors. And I think in India also, it did give a good result. Like, we have more people contributing and staying back and those things. So, yeah. So, these were the things I would say, like, we can do to improve.For the final question, what field in free software do you, what field in free software generally do you think requires the most work to be put into it? What do you think is Debian's part in that field? [Sruthi]:
Okay. Like, right now, what comes to my mind is the free software licenses parts. Like, we have a lot of free software licenses, and there are non-free software licenses. But currently, I feel free software is having a big problem in enforcing these licenses. Like, there are, there may be big corporations or like some people who take up the whole, the code and may not follow the whole, for example, the GPL licenses. Like, we don't know how much of those, how much of the free softwares are used in the bigger things. Yeah, I agree. There are a lot of corporations who are afraid to touch free software. But there would be good amount of free software, free work that converts into property, things violating the free software licenses and those things. And we do not have the kind of like, we have SFLC, SFC, etc. But still, we do not have the ability to go behind and trace and implement the licenses. So, enforce those licenses and bring people who are violating the licenses forward and those kind of things is challenging because one thing is it takes time, like, and most importantly, money is required for the legal stuff. And not always people who like people who make small software, or maybe big, but they may not have the kind of time and money to have these things enforced. So, that's a big challenge free software is facing, especially in our current scenario. I feel we are having those, like, we need to find ways how we can get it sorted. I don't have an answer right now what to do. But this is a challenge I felt like and Debian's part in that. Yeah, as I said, I don't have a solution for that. But the Debian, so DFSG and Debian sticking on to the free software licenses is a good support, I think.So, that was the final question, Do you have anything else you want to mention for anyone watching this? [Sruthi]:
Not really, like, I am happy, like, I think I was able to answer the questions. And yeah, I would say who is watching. I won't say like, I'm the best DPL candidate, you can't have a better one or something. I stand for a reason. And if you believe in that, or the Debian community and Debian diversity, and those kinds of things, if you believe it, I hope you would be interested, like, you would want to vote for me. That's it. Like, I'm not, I'll make it very clear. I'm not doing a technical leadership part here. So, those, I can't convince people who want technical leadership to vote for me. But I would say people who connect with me, I hope they vote for me.
How am I? Well, I'm, as I wrote in my platform, I'm a proud grandfather doing a lot of free software stuff, doing a lot of sports, have some goals in mind which I like to do and hopefully for the best of Debian.And How are you today? [Andreas]:
How I'm doing today? Well, actually I have some headaches but it's fine for the interview. So, usually I feel very good. Spring was coming here and today it's raining and I plan to do a bicycle tour tomorrow and hope that I do not get really sick but yeah, for the interview it's fine.What do you do in Debian? Could you mention your story here? [Andreas]:
Yeah, well, I started with Debian kind of an accident because I wanted to have some package salvaged which is called WordNet. It's a monolingual dictionary and I did not really plan to do more than maybe 10 packages or so. I had some kind of training with xTeddy which is totally unimportant, a cute teddy you can put on your desktop. So, and then well, more or less I thought how can I make Debian attractive for my employer which is a medical institute and so on. It could make sense to package bioinformatics and medicine software and it somehow evolved in a direction I did neither expect it nor wanted to do, that I'm currently the most busy uploader in Debian, created several teams around it. DebianMate is very well known from me. I created the Blends team to create teams and techniques around what we are doing which was Debian TIS, Debian Edu, Debian Science and so on and I also created the packaging team for R, for the statistics package R which is technically based and not topic based. All these blends are covering a certain topic and R is just needed by lots of these blends. So, yeah, and to cope with all this I have written a script which is routing an update to manage all these uploads more or less automatically. So, I think I had one day where I uploaded 21 new packages but it's just automatically generated, right? So, it's on one day more than I ever planned to do.What is the first thing you think of when you think of Debian? Editors' note: The question was misunderstood as the worst thing you think of when you think of Debian [Andreas]:
The worst thing I think about Debian, it's complicated. I think today on Debian board I was asked about the technical progress I want to make and in my opinion we need to standardize things inside Debian. For instance, bringing all the packages to salsa, follow some common standards, some common workflow which is extremely helpful. As I said, if I'm that productive with my own packages we can adopt this in general, at least in most cases I think. I made a lot of good experience by the support of well-formed teams. Well-formed teams are those teams where people support each other, help each other. For instance, how to say, I'm a physicist by profession so I'm not an IT expert. I can tell apart what works and what not but I'm not an expert in those packages. I do and the amount of packages is so high that I do not even understand all the techniques they are covering like Go, Rust and something like this. And I also don't speak Java and I had a problem once in the middle of the night and I've sent the email to the list and was a Java problem and I woke up in the morning and it was solved. This is what I call a team. I don't call a team some common repository that is used by random people for different packages also but it's working together, don't hesitate to solve other people's problems and permit people to get active. This is what I call a team and this is also something I observed in, it's hard to give a percentage, in a lot of other teams but we have other people who do not even understand the concept of the team. Why is working together make some advantage and this is also a tough thing. I [would] like to tackle in my term if I get elected to form solid teams using the common workflow. This is one thing. The other thing is that we have a lot of good people in our infrastructure like FTP masters, DSA and so on. I have the feeling they have a lot of work and are working more or less on their limits, and I like to talk to them [to ask] what kind of change we could do to move that limits or move their personal health to the better side.The DPL term lasts for a year, What would you do during that you couldn't do now? [Andreas]:
Yeah, well this is basically what I said are my main issues. I need to admit I have no really clear imagination what kind of tasks will come to me as a DPL because all these financial issues and law issues possible and issues [that] people who are not really friendly to Debian might create. I'm afraid these things might occupy a lot of time and I can't say much about this because I simply don't know.What are three key terms about you and your candidacy? [Andreas]:
As I said, I like to work on standards, I d like to make Debian try [to get it right so] that people don't get overworked, this third key point is be inviting to newcomers, to everybody who wants to come. Yeah, I also mentioned in my term this diversity issue, geographical and from gender point of view. This may be the three points I consider most important.Preferred text editor? [Andreas]:
Yeah, my preferred one? Ah, well, I have no preferred text editor. I'm using the Midnight Commander very frequently which has an internal editor which is convenient for small text. For other things, I usually use VI but I also use Emacs from time to time. So, no, I have not preferred text editor. Whatever works nicely for me.What is the importance of the community in the Debian Project? How would like to see it evolving over the next few years? [Andreas]:
Yeah, I think the community is extremely important. So, I was on a lot of DebConfs. I think it's not really 20 but 17 or 18 DebCons and I really enjoyed these events every year because I met so many friends and met so many interesting people that it's really enriching my life and those who I never met in person but have read interesting things and yeah, Debian community makes really a part of my life.And how do you think it should evolve specifically? [Andreas]:
Yeah, for instance, last year in Kochi, it became even clearer to me that the geographical diversity is a really strong point. Just discussing with some women from India who is afraid about not coming next year to Busan because there's a problem with Shanghai and so on. I'm not really sure how we can solve this but I think this is a problem at least I wish to tackle and yeah, this is an interesting point, the geographical diversity and I'm running the so-called mentoring of the month. This is a small project to attract newcomers for the Debian Med team which has the focus on medical packages and I learned that we had always men applying for this and so I said, okay, I dropped the constraint of medical packages. Any topic is fine, I teach you packaging but it must be someone who does not consider himself a man. I got only two applicants, no, actually, I got one applicant and one response which was kind of strange if I'm hunting for women or so. I did not understand but I got one response and interestingly, it was for me one of the least expected counters. It was from Iran and I met a very nice woman, very open, very skilled and gifted and did a good job or have even lose contact today and maybe we need more actively approach groups that are underrepresented. I don't know if what's a good means which I did but at least I tried and so I try to think about these kind of things.What part of Debian has made you smile? What part of the project has kept you going all through the years? [Andreas]:
Well, the card game which is called Mao on the DebConf made me smile all the time. I admit I joined only two or three times even if I really love this kind of games but I was occupied by other stuff so this made me really smile. I also think the first online DebConf in 2020 made me smile because we had this kind of short video sequences and I tried to make a funny video sequence about every DebConf I attended before. This is really funny moments but yeah, it's not only smile but yeah. One thing maybe it's totally unconnected to Debian but I learned personally something in Debian that we have a do-ocracy and you can do things which you think that are right if not going in between someone else, right? So respect everybody else but otherwise you can do so. And in 2020 I also started to take trees which are growing widely in my garden and plant them into the woods because in our woods a lot of trees are dying and so I just do something because I can. I have the resource to do something, take the small tree and bring it into the woods because it does not harm anybody. I asked the forester if it is okay, yes, yes, okay. So everybody can do so but I think the idea to do something like this came also because of the free software idea. You have the resources, you have the computer, you can do something and you do something productive, right? And when thinking about this I think it was also my Debian work. Meanwhile I have planted more than 3,000 trees so it's not a small number but yeah, I enjoy this.What part of Debian would you have some criticisms for? [Andreas]:
Yeah, it's basically the same as I said before. We need more standards to work together. I do not want to repeat this but this is what I think, yeah.What field in Free Software generally do you think requires the most work to be put into it? What do you think is Debian's part in the field? [Andreas]:
It's also in general, the thing is the fact that I'm maintaining packages which are usually as modern software is maintained in Git, which is fine but we have some software which is at Sourceport, we have software laying around somewhere, we have software where Debian somehow became Upstream because nobody is caring anymore and free software is very different in several things, ways and well, I in principle like freedom of choice which is the basic of all our work. Sometimes this freedom goes in the way of productivity because everybody is free to re-implement. You asked me for the most favorite editor. In principle one really good working editor would be great to have and would work and we have maybe 500 in Debian or so, I don't know. I could imagine if people would concentrate and say five instead of 500 editors, we could get more productive, right? But I know this will not happen, right? But I think this is one thing which goes in the way of making things smooth and productive and we could have more manpower to replace one person who's [having] children, doing some other stuff and can't continue working on something and maybe this is a problem I will not solve, definitely not, but which I see.What do you think is Debian's part in the field? [Andreas]:
Yeah, well, okay, we can bring together different Upstreams, so we are building some packages and have some general overview about similar things and can say, oh, you are doing this and some other person is doing more or less the same, do you want to join each other or so, but this is kind of a channel we have to our Upstreams which is probably not very successful. It starts with code copies of some libraries which are changed a little bit, which is fine license-wise, but not so helpful for different things and so I've tried to convince those Upstreams to forward their patches to the original one, but for this and I think we could do some kind of, yeah, [find] someone who brings Upstream together or to make them stop their forking stuff, but it costs a lot of energy and we probably don't have this and it's also not realistic that we can really help with this problem.Do you have any questions for me? [Andreas]:
I enjoyed the interview, I enjoyed seeing you again after half a year or so. Yeah, actually I've seen you in the eating room or cheese and wine party or so, I do not remember we had to really talk together, but yeah, people around, yeah, for sure. Yeah.
upstream
, mark it forwarded
; perhaps taking
5-10 minutes.
Often I'll stumble across something that has already been fixed but not recorded
as such as I go.
Despite this minimal level of work, I'm quite satisfied with the cumulative
progress. It's notable to me how much my perspective has shifted by becoming a
maintainer: I'm considering everything through a different lens to that of being
just one user.
Eventually I will put some time aside to scratch some of my own itches (html5 by
default; support dark mode; duckduckgo plugin; use the details
tag...) but for
now this minimal exercise is of broader use.
remote: fatal: pack exceeds maximum allowed size (4.88 GiB)
however breaking up the commit into smaller commits for parts of the archive made it possible to push the entire archive. Here are the commands to create this repository:
git init
git lfs install
git lfs track 'dists/**' 'pool/**'
git add .gitattributes
git commit -m"Add Git-LFS track attributes." .gitattributes
time debmirror --method=rsync --host ftp.se.debian.org --root :debian --arch=amd64 --source --dist=bookworm,bookworm-updates --section=main --verbose --diff=none --keyring /usr/share/keyrings/debian-archive-keyring.gpg --ignore .git .
git add dists project
git commit -m"Add." -a
git remote add origin git@gitlab.com:debdistutils/archives/debian/mirror.git
git push --set-upstream origin --all
for d in pool//; do
echo $d;
time git add $d;
git commit -m"Add $d." -a
git push
done
The resulting repository size is around 27MB with Git LFS object storage around 174GB. I think this approach would scale to handle all architectures for one release, but working with a single git repository for all releases for all architectures may lead to a too large git repository (>1GB). So maybe one repository per release? These repositories could also be split up on a subset of pool/ files, or there could be one repository per release per architecture or sources.
Finally, I have concerns about using SHA1 for identifying objects. It seems both Git and Debian s snapshot service is currently using SHA1. For Git there is SHA-256 transition and it seems GitLab is working on support for SHA256-based repositories. For serious long-term deployment of these concepts, it would be nice to go for SHA256 identifiers directly. Git-LFS already uses SHA256 but Git internally uses SHA1 as does the Debian snapshot service.
What do you think? Happy Hacking!
Publisher: | St. Martin's Press |
Copyright: | 2023 |
ISBN: | 1-250-27694-2 |
Format: | Kindle |
Pages: | 310 |
A lawyer for Dalio said he "treated all employees equally, giving people at all levels the same respect and extending them the same perks."Uh-huh. Anyway, I personally know nothing about Bridgewater other than what I learned here and the occasional mention in Matt Levine's newsletter (which is where I got the recommendation for this book). I have no independent information whether anything Copeland describes here is true, but Copeland provides the typical extensive list of notes and sourcing one expects in a book like this, and Levine's comments indicated it's generally consistent with Bridgewater's industry reputation. I think this book is true, but since the clear implication is that the world's largest hedge fund was primarily a deranged cult whose employees mostly spied on and rated each other rather than doing any real investment work, I also have questions, not all of which Copeland answers to my satisfaction. But more on that later. The center of this book are the Principles. These were an ever-changing list of rules and maxims for how people should conduct themselves within Bridgewater. Per Copeland, although Dalio later published a book by that name, the version of the Principles that made it into the book was sanitized and significantly edited down from the version used inside the company. Dalio was constantly adding new ones and sometimes changing them, but the common theme was radical, confrontational "honesty": never being silent about problems, confronting people directly about anything that they did wrong, and telling people all of their faults so that they could "know themselves better." If this sounds like textbook abusive behavior, you have the right idea. This part Dalio admits to openly, describing Bridgewater as a firm that isn't for everyone but that achieves great results because of this culture. But the uncomfortably confrontational vibes are only the tip of the iceberg of dysfunction. Here are just a few of the ways this played out according to Copeland:
In this talk Holger h01ger Levsen will give an overview about Reproducible Builds: How it started with a small BoF at DebConf13 (and before), then grew from being a Debian effort to something many projects work on together, until in 2021 it was mentioned in an Executive Order of the President of the United States. And of course, the talk will not end there, but rather outline where we are today and where we still need to be going, until Debian stable (and other distros!) will be 100% reproducible, verified by many. h01ger has been involved in reproducible builds since 2014 and so far has set up automated reproducibility testing for Debian, Fedora, Arch Linux, FreeBSD, NetBSD and coreboot.More information can be found on FOSDEM s own page for the talk, including a video recording and slides.
Courtesy of my CRANberries, there is also a report with diffstat for this release. Questions, comments, issue tickets can be brought to the GitHub repo. If you like this or other open-source work I do, you can now sponsor me at GitHub.Changes in version 0.1.2 (2024-01-31)
- Update the one exported C-level identifier from data.table following its 1.5.0 release and a renaming
- Routine continuous integration updates
This post by Dirk Eddelbuettel originated on his Thinking inside the box blog. Please report excessive re-aggregation in third-party for-profit settings.
/C=BE/O=GlobalSign nv-sa/CN=AlphaSSL CA - SHA256 - G4 /C=GB/ST=Greater Manchester/L=Salford/O=Sectigo Limited/CN=Sectigo RSA Domain Validation Secure Server CA /C=GB/ST=Greater Manchester/L=Salford/O=Sectigo Limited/CN=Sectigo RSA Organization Validation Secure Server CA /C=US/ST=Arizona/L=Scottsdale/O=GoDaddy.com, Inc./OU=http://certs.godaddy.com/repository//CN=Go Daddy Secure Certificate Authority - G2 /C=US/ST=Arizona/L=Scottsdale/O=Starfield Technologies, Inc./OU=http://certs.starfieldtech.com/repository//CN=Starfield Secure Certificate Authority - G2 /C=AT/O=ZeroSSL/CN=ZeroSSL RSA Domain Secure Site CA /C=BE/O=GlobalSign nv-sa/CN=GlobalSign GCC R3 DV TLS CA 2020Rather than try to work with raw issuers (because, as Andrew Ayer says, The SSL Certificate Issuer Field is a Lie), I mapped these issuers to the organisations that manage them, and summed the counts for those grouped issuers together.
Issuer | Compromised Count |
---|---|
Sectigo | 170 |
ISRG (Let's Encrypt) | 161 |
GoDaddy | 141 |
DigiCert | 81 |
GlobalSign | 46 |
Entrust | 3 |
SSL.com | 1 |
Issuer | Issuance Volume | Compromised Count | Compromise Rate |
---|---|---|---|
Sectigo | 88,323,068 | 170 | 1 in 519,547 |
ISRG (Let's Encrypt) | 315,476,402 | 161 | 1 in 1,959,480 |
GoDaddy | 56,121,429 | 141 | 1 in 398,024 |
DigiCert | 144,713,475 | 81 | 1 in 1,786,586 |
GlobalSign | 1,438,485 | 46 | 1 in 31,271 |
Entrust | 23,166 | 3 | 1 in 7,722 |
SSL.com | 171,816 | 1 | 1 in 171,816 |
Issuer | Issuance Volume | Compromised Count | Compromise Rate |
---|---|---|---|
Entrust | 23,166 | 3 | 1 in 7,722 |
GlobalSign | 1,438,485 | 46 | 1 in 31,271 |
SSL.com | 171,816 | 1 | 1 in 171,816 |
GoDaddy | 56,121,429 | 141 | 1 in 398,024 |
Sectigo | 88,323,068 | 170 | 1 in 519,547 |
DigiCert | 144,713,475 | 81 | 1 in 1,786,586 |
ISRG (Let's Encrypt) | 315,476,402 | 161 | 1 in 1,959,480 |
SELECT SUM(sub.NUM_ISSUED[2] - sub.NUM_EXPIRED[2]) FROM ( SELECT ca.name, max(coalesce(coalesce(nullif(trim(cc.SUBORDINATE_CA_OWNER), ''), nullif(trim(cc.CA_OWNER), '')), cc.INCLUDED_CERTIFICATE_OWNER)) as OWNER, ca.NUM_ISSUED, ca.NUM_EXPIRED FROM ccadb_certificate cc, ca_certificate cac, ca WHERE cc.CERTIFICATE_ID = cac.CERTIFICATE_ID AND cac.CA_ID = ca.ID GROUP BY ca.ID ) sub WHERE sub.name ILIKE '%Amazon%' OR sub.name ILIKE '%CloudFlare%' AND sub.owner = 'DigiCert';The number I get from running that query is 104,316,112, which should be subtracted from DigiCert s total issuance figures to get a more accurate view of what DigiCert s regular customers do with their private keys. When I do this, the compromise rates table, sorted by the compromise rate, looks like this:
Issuer | Issuance Volume | Compromised Count | Compromise Rate |
---|---|---|---|
Entrust | 23,166 | 3 | 1 in 7,722 |
GlobalSign | 1,438,485 | 46 | 1 in 31,271 |
SSL.com | 171,816 | 1 | 1 in 171,816 |
GoDaddy | 56,121,429 | 141 | 1 in 398,024 |
"Regular" DigiCert | 40,397,363 | 81 | 1 in 498,732 |
Sectigo | 88,323,068 | 170 | 1 in 519,547 |
All DigiCert | 144,713,475 | 81 | 1 in 1,786,586 |
ISRG (Let's Encrypt) | 315,476,402 | 161 | 1 in 1,959,480 |
The less humans have to do with certificate issuance, the less likely they are to compromise that certificate by exposing the private key. While it may not be surprising, it is nice to have some empirical evidence to back up the common wisdom. Fully-managed TLS providers, such as CloudFlare, AWS Certificate Manager, and whatever Azure s thing is called, is the platonic ideal of this principle: never give humans any opportunity to expose a private key. I m not saying you should use one of these providers, but the security approach they have adopted appears to be the optimal one, and should be emulated universally. The ACME protocol is the next best, in that there are a variety of standardised tools widely available that allow humans to take themselves out of the loop, but it s still possible for humans to handle (and mistakenly expose) key material if they try hard enough. Legacy issuance methods, which either cannot be automated, or require custom, per-provider automation to be developed, appear to be at least four times less helpful to the goal of avoiding compromise of the private key associated with a certificate.
manifest-version: "0.1"
installations:
- install:
source: foo
dest-dir: usr/bin
"manifest-version": "0.1",
"installations": [
"install":
"source": "foo",
"dest-dir": "usr/bin",
]
So beyond me being unsatisfied with the current situation, it was also clear to me that I needed to come up with a better solution if I wanted externally provided plugins for debputy. To put a bit more perspective on what I expected from the end result:
- Most plugins would probably throw KeyError: X or ValueError style errors for quite a while. Worst case, they would end on my table because the user would have a hard time telling where debputy ends and where the plugins starts. "Best" case, I would teach debputy to say "This poor error message was brought to you by plugin foo. Go complain to them". Either way, it would be a bad user experience.
- This even assumes plugin providers would actually bother writing manifest parsing code. If it is that difficult, then just providing a custom file in debian might tempt plugin providers and that would undermine the idea of having the manifest be the sole input for debputy.
- It had to cover as many parsing errors as possible. An error case this code would handle for you, would be an error where I could ensure it sufficient degree of detail and context for the user.
- It should be type-safe / provide typing support such that IDEs/mypy could help you when you work on the parsed result.
- It had to support "normalization" of the input, such as
# User provides
- install: "foo"
# Which is normalized into:
- install:
source: "foo"
4) It must be simple to tell debputy what input you expected.
# User provides
- install: "foo"
# Which is normalized into:
- install:
source: "foo"
# Example input matching this typed dict.
#
# "source": ["foo"]
# "into": ["pkg"]
#
class InstallExamplesTargetFormat(TypedDict):
# Which source files to install (dest-dir is fixed)
sources: List[str]
# Which package(s) that should have these files installed.
into: NotRequired[List[str]]
# Example input matching this typed dict.
#
# "source": "foo"
# "into": "pkg"
#
#
class InstallExamplesManifestFormat(TypedDict):
# Note that sources here is split into source (str) vs. sources (List[str])
sources: NotRequired[List[str]]
source: NotRequired[str]
# We allow the user to write into: foo in addition to into: [foo]
into: Union[str, List[str]]
FullInstallExamplesManifestFormat = Union[
InstallExamplesManifestFormat,
List[str],
str,
]
def _handler(
normalized_form: InstallExamplesTargetFormat,
) -> InstallRule:
... # Do something with the normalized form and return an InstallRule.
concept_debputy_api.add_install_rule(
keyword="install-examples",
target_form=InstallExamplesTargetFormat,
manifest_form=FullInstallExamplesManifestFormat,
handler=_handler,
)
This prototype did not take a long (I remember it being within a day) and worked surprisingly well though with some poor error messages here and there. Now came the first challenge, adding the manifest format schema plus relevant normalization rules. The very first normalization I did was transforming into: Union[str, List[str]] into into: List[str]. At that time, source was not a separate attribute. Instead, sources was a Union[str, List[str]], so it was the only normalization I needed for all my use-cases at the time. There are two problems when writing a normalization. First is determining what the "source" type is, what the target type is and how they relate. The second is providing a runtime rule for normalizing from the manifest format into the target format. Keeping it simple, the runtime normalizer for Union[str, List[str]] -> List[str] was written as:
- Read TypedDict.__required_attributes__ and TypedDict.__optional_attributes__ to determine which attributes where present and which were required. This was used for reporting errors when the input did not match.
- Read the types of the provided TypedDict, strip the Required / NotRequired markers and use basic isinstance checks based on the resulting type for str and List[str]. Again, used for reporting errors when the input did not match.
def normalize_into_list(x: Union[str, List[str]]) -> List[str]:
return x if isinstance(x, list) else [x]
# Map from
class InstallExamplesManifestFormat(TypedDict):
# Note that sources here is split into source (str) vs. sources (List[str])
sources: NotRequired[List[str]]
source: NotRequired[str]
# We allow the user to write into: foo in addition to into: [foo]
into: Union[str, List[str]]
# ... into
class InstallExamplesTargetFormat(TypedDict):
# Which source files to install (dest-dir is fixed)
sources: List[str]
# Which package(s) that should have these files installed.
into: NotRequired[List[str]]
While working on all of this type introspection for Python, I had noted the Annotated[X, ...] type. It is basically a fake type that enables you to attach metadata into the type system. A very random example:
- How will the parser generator understand that source should be normalized and then mapped into sources?
- Once that is solved, the parser generator has to understand that while source and sources are declared as NotRequired, they are part of a exactly one of rule (since sources in the target form is Required). This mainly came down to extra book keeping and an extra layer of validation once the previous step is solved.
# For all intents and purposes, foo is a string despite all the Annotated stuff.
foo: Annotated[str, "hello world"] = "my string here"
# Map from
#
# "source": "foo" # (or "sources": ["foo"])
# "into": "pkg"
#
class InstallExamplesManifestFormat(TypedDict):
# Note that sources here is split into source (str) vs. sources (List[str])
sources: NotRequired[List[str]]
source: NotRequired[
Annotated[
str,
DebputyParseHint.target_attribute("sources")
]
]
# We allow the user to write into: foo in addition to into: [foo]
into: Union[str, List[str]]
# ... into
#
# "source": ["foo"]
# "into": ["pkg"]
#
class InstallExamplesTargetFormat(TypedDict):
# Which source files to install (dest-dir is fixed)
sources: List[str]
# Which package(s) that should have these files installed.
into: NotRequired[List[str]]
It was not super difficult given the existing infrastructure, but it did take some hours of coding and debugging. Additionally, I added a parse hint to support making the into conditional based on whether it was a single binary package. With this done, you could now write something like:
- Conversion table where the parser generator can tell that BinaryPackage requires an input of str and a callback to map from str to BinaryPackage. (That is probably lie. I think the conversion table came later, but honestly I do remember and I am not digging into the git history for this one)
- At runtime, said callback needed access to the list of known packages, so it could resolve the provided string.
# Map from
class InstallExamplesManifestFormat(TypedDict):
# Note that sources here is split into source (str) vs. sources (List[str])
sources: NotRequired[List[str]]
source: NotRequired[
Annotated[
str,
DebputyParseHint.target_attribute("sources")
]
]
# We allow the user to write into: foo in addition to into: [foo]
into: Union[BinaryPackage, List[BinaryPackage]]
# ... into
class InstallExamplesTargetFormat(TypedDict):
# Which source files to install (dest-dir is fixed)
sources: List[str]
# Which package(s) that should have these files installed.
into: NotRequired[
Annotated[
List[BinaryPackage],
DebputyParseHint.required_when_multi_binary()
]
]
In this way everybody wins. Yes, writing this parser generator code was more enjoyable than writing the ad-hoc manual parsers it replaced. :)
- The parser engine handles most of the error reporting meaning users get most of the errors in a standard format without the plugin provider having to spend any effort on it. There will be some effort in more complex cases. But the common cases are done for you.
- It is easy to provide flexibility to users while avoiding having to write code to normalize the user input into a simplified programmer oriented format.
- The parser handles mapping from basic types into higher forms for you. These days, we have high level types like FileSystemMode (either an octal or a symbolic mode), different kind of file system matches depending on whether globs should be performed, etc. These types includes their own validation and parsing rules that debputy handles for you.
- Introspection and support for providing online reference documentation. Also, debputy checks that the provided attribute documentation covers all the attributes in the manifest form. If you add a new attribute, debputy will remind you if you forget to document it as well. :)
auto end0 iface end0 inet dhcpThen to get sshd to start you have to run the following commands to generate ssh host keys that aren t zero bytes long:
rm /etc/ssh/ssh_host_* systemctl restart ssh.serviceIt appears to have wifi hardware but the OS doesn t recognise it. This isn t a priority for me as I mostly want to use it as a server. Performance For the first test of performance I created a 100MB file from /dev/urandom and then tried compressing it on various systems. With zstd -9 it took 16.893 user seconds on the LicheePi4A, 0.428s on my Thinkpad X1 Carbon Gen5 with a i5-6300U CPU (Debian/Unstable), 1.288s on my E5-2696 v3 workstation (Debian/Bookworm), 0.467s on the E5-2696 v3 running Debian/Unstable, 2.067s on a E3-1271 v3 server, and 7.179s on the E3-1271 v3 system emulating a RISC-V system via QEMU running Debian/Unstable. It s very impressive that the QEMU emulation is fast enough that emulating a different CPU architecture is only 3.5* slower for this test (or maybe 10* slower if it was running Debian/Unstable on the AMD64 code)! The emulated RISC-V is also more than twice as fast as real RISC-V hardware and probably of comparable speed to real RISC-V hardware when running the same versions (and might be slightly slower if running the same version of zstd) which is a tribute to the quality of emulation. One performance issue that most people don t notice is the time taken to negotiate ssh sessions. It s usually not noticed because the common CPUs have got faster at about the same rate as the algorithms for encryption and authentication have become more complex. On my i5-6300U laptop it takes 0m0.384s to run ssh -i ~/.ssh/id_ed25519 localhost id with the below server settings (taken from advice on ssh-audit.com [3] for a secure ssh configuration). On the E3-1271 v3 server it is 0.336s, on the QMU system it is 28.022s, and on the LicheePi it is 0.592s. By this metric the LicheePi is about 80% slower than decent x86 systems and the QEMU emulation of RISC-V is 73* slower than the x86 system it runs on. Does crypto depend on instructions that are difficult to emulate?
HostKey /etc/ssh/ssh_host_ed25519_key KexAlgorithms -ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group14-sha256 MACs -umac-64-etm@openssh.com,hmac-sha1-etm@openssh.com,umac-64@openssh.com,umac-128@openssh.com,hmac-sha2-256,hmac-sha2-512,hmac-sha1I haven t yet tested the performance of Ethernet (what routing speed can you get through the 2 gigabit ports?), emmc storage, and USB. At the moment I ve been focused on using RISC-V as a test and development platform. My conclusion is that I m glad I don t plan to compile many kernels or anything large like LibreOffice. But that for typical development that I do it will be quite adequate. The speed of Chromium seems adequate in basic tests, but the video output hasn t worked reliably enough to do advanced tests. Hardware Features Having two Gigabit Ethernet ports, 4 USB-3 ports, and Wifi on board gives some great options for using this as a router. It s disappointing that they didn t go with 2.5Gbit as everyone seems to be doing that nowadays but Gigabit is enough for most things. Having only a single HDMI port and not supporting USB-C docks (the USB-C port appears to be power only) limits what can be done for workstation use and for controlling displays. I know of people using small ARM computers attached to the back of large TVs for advertising purposes and that isn t going to be a great option for this. The CPU and RAM apparently uses a lot of power (which is relative the entire system draws up to 2A at 5V so the CPU would be something below 5W). To get this working a cooling fan has to be stuck to the CPU and RAM chips via a layer of thermal stuff that resembles a fine sheet of blu-tack in both color and stickyness. I am disappointed that there isn t any more solid form of construction, to mount this on a wall or ceiling some extra hardware would be needed to secure this. Also if they just had a really big copper heatsink I think that would be better. 80386 CPUs with similar TDP were able to run without a fan. I wonder how things would work with all USB ports in use. It s expected that a USB port can supply a minimum of 2.5W which means that all the ports could require 10W if they were active. Presumably something significantly less than 5W is available for the USB ports. Other Devices Sipheed has a range of other devices in the works. They currently sell the LicheeCluster4A which support 7 compute modules for a cluster in a box. This has some interesting potential for testing and demonstrating cluster software but you could probably buy an AMD64 system with more compute power for less money. The Lichee Console 4A is a tiny laptop which could be useful for people who like the 7 laptop form factor, unfortunately it only has a 1280*800 display if it had the same resolution display as a typical 7 phone I would have bought one. The next device that appeals to me is the soon to be released Lichee Pad 4A which is a 10.1 tablet with 1920*1200 display, Wifi6, Bluetooth 5.4, and 16G of RAM. It also has 1 USB-C connection, 2*USB-3 sockets, and support for an external card with 2*Gigabit ethernet. It s a tablet as a laptop without keyboard instead of the more common larger phone design model. They are also about to release the LicheePadMax4A which is similar to the other tablet but with a 14 2240*1400 display and which ships with a keyboard to make it essentially a laptop with detachable keyboard. Conclusion At this time I wouldn t recommend that this device be used as a workstation or laptop, although the people who want to do such things will probably do it anyway regardless of my recommendations. I think it will be very useful as a test system for RISC-V development. I have some friends who are interested in this sort of thing and I can give them VMs. It is a bit expensive. The Sipheed web site boasts about the LicheePi4 being faster than the RaspberryPi4, but it s not a lot faster and the RaspberryPi4 is much cheaper ($127 or $129 for one with 8G of RAM). The RaspberryPi4 has two HDMI ports but a limit of 8G of RAM while the LicheePi has up to 16G of RAM and two Gigabit Ethernet ports but only a single HDMI port. It seems that the RaspberryPi4 might win if you want a cheap low power desktop system. At this time I think the reason for this device is testing out RISC-V as an alternative to the AMD64 and ARM64 architectures. An open CPU architecture goes well with free software, but it isn t just people who are into FOSS who are testing such things. I know some corporations are trying out RISC-V as a way of getting other options for embedded systems that don t involve paying monopolists. The Lichee Console 4A is probably a usable tiny laptop if the resolution is sufficient for your needs. As an aside I predict that the tiny laptop or pocket computer segment will take off in the near future. There are some AMD64 systems the size of a phone but thicker that run Windows and go for reasonable prices on AliExpress. Hopefully in the near future this device will have better video drivers and be usable as a small and quiet workstation. I won t rule out the possibility of making this my main workstation in the not too distant future, all it needs is reliable 4K display and the ability to decode 4K video. It s performance for web browsing and as an ssh client seems adequate, and that s what matters for my workstation use. But for the moment it s just for server use.
Next.